Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

Dhruv-22
by Contributor
  • 78 Views
  • 3 replies
  • 2 kudos

Reading empty json file in serverless gives error

I ran a Databricks notebook to do incremental loads from files in the raw layer to bronze layer tables. Today, I encountered a case where the delta file was empty. I tried running it manually on serverless compute and encountered an error. df = spark....

Latest Reply
K_Anudeep
Databricks Employee
  • 2 kudos

Hello @Dhruv-22, can you share the schema of the df? Do you have a _corrupt_record column in your dataframe? If yes, where are you getting it from, given that you said it's an empty file? As per the design, Spark blocks queries that only referen...

2 More Replies
VaDim
by New Contributor III
  • 77 Views
  • 1 replies
  • 0 kudos

transformWithStateInPandas. Invalid pickle opcode when updating ValueState with large (float) array

I am getting an error when the entity I need to store in a ValueState is a large array (over 15k-20k items). There is no error (and it works correctly) if I trim the array to under 10k samples. The same error is raised when using it as a value for MapState or as...

Latest Reply
mark_ott
Databricks Employee
  • 0 kudos

The error you’re facing, specifically PySparkRuntimeError: Error updating value state: invalid pickle opcode, usually points to a serialization (pickling) problem when storing large arrays in Flink/Spark state such as ValueState, ListState, or MapStat...

SamAdams
by Contributor
  • 41 Views
  • 1 replies
  • 0 kudos

Time window for "All tables are updated" option in job Table Update Trigger

I've been using the Table Update Trigger for some SQL alert workflows. I have a job that uses 3 tables with an "All tables updated" trigger:
Table 1 was updated at 07:20 UTC
Table 2 was updated at 16:48 UTC
Table 3 was updated at 16:50 UTC
-> Job is trig...

Data Engineering
jobs
TableUpdateTrigger
Latest Reply
mark_ott
Databricks Employee
  • 0 kudos

There is no fixed or documented time “window” within which all monitored tables must be updated before a job with an "All tables updated" trigger runs in Databricks. The job is triggered as soon as every table in the set has seen at least one ...

deano2025
by New Contributor II
  • 21 Views
  • 0 replies
  • 0 kudos

Databricks asset bundles CI/CD design for github actions

We want to use Databricks asset bundles and deploy code changes and tests using GitHub Actions. We have seen lots of content online, but nothing concrete on how this is done at scale. So I'm wondering, if we have many changes and therefore man...

Data Engineering
asset bundles
ak5har
by New Contributor II
  • 2700 Views
  • 9 replies
  • 2 kudos

Databricks connection to on-prem cloudera

Hello, we are trying to evaluate Databricks as a solution to extract data from an existing Cloudera schema hosted on a physical server. We are using the Databricks serverless compute provided by the Databricks express setup and we assume we will not need t...

Latest Reply
Adrian_Ashley
New Contributor
  • 2 kudos

I work for a Databricks partner called Cirata. Our Data Migrator offering allows both data and metadata replication from Cloudera to be delivered to the Databricks environment, whether this is just delivering it to the ADLS2 object storage or to ...

8 More Replies
pepco
by New Contributor II
  • 35 Views
  • 2 replies
  • 0 kudos

Resolved! Environment in serverless

I'm playing around a bit with the Databricks free environment and I'm super confused by the documentation vs. the actual behavior. Maybe you could help me understand it better. For the workspace I can define a base environment which I can use in serverless ...

Data Engineering
base environment
serverless
Latest Reply
K_Anudeep
Databricks Employee
  • 0 kudos

Hello @pepco, is it possible to use environments with notebook tasks? Yes, but only in a very specific way. Notebook tasks can use base environments, but you don’t attach them in the job’s YAML. You pick the base env in the notebook’s Environment sid...

1 More Replies
JanFalta
by New Contributor
  • 22 Views
  • 0 replies
  • 0 kudos

Data Masking

Hi all, I need some help with this masking problem. Suppose you create a view that applies a masking function to a table. The user reading this view has to have read access to the underlying table, so theoretically they can access the unmasked data in the table. I would...

KKo
by Contributor III
  • 325 Views
  • 1 replies
  • 0 kudos

On Prem MS sql to Azure Databricks

Hi all, I need to ingest data from on-prem MS SQL tables into Azure cloud using Databricks. Previously, for the ingest, I used notebooks and JDBC connectors to read the SQL tables and write to Unity Catalog tables. Now, I want to experiment with Databricks connectors f...

Latest Reply
AbhaySingh
Databricks Employee
  • 0 kudos

This feature is good to go... I can't think of any disadvantages. Here is a guide: https://landang.ca/2025/01/31/simple-data-ingestion-from-sql-server-to-databricks-using-lakeflow-connect/

Suheb
by New Contributor
  • 25 Views
  • 1 replies
  • 0 kudos

How have you set up a governance structure (data access control, workspace management, cluster polic

If your company uses Databricks with many people, how do you manage security, organize teams, and control costs — and what tools do you use to make it all work smoothly?

Latest Reply
AbhaySingh
Databricks Employee
  • 0 kudos

Please take a look here to get some initial ideas. https://medium.com/databricks-unity-catalog-sme/a-practical-guide-to-catalog-layout-data-sharing-and-distribution-with-databricks-unity-catalog-763e4c7b7351  

him
by New Contributor III
  • 25002 Views
  • 14 replies
  • 10 kudos

I am getting the below error while making a GET request to a job in Databricks after successfully running it

{"error_code": "INVALID_PARAMETER_VALUE", "message": "Retrieving the output of runs with multiple tasks is not supported. Please retrieve the output of each individual task run instead."}

Latest Reply
Octavian1
Contributor
  • 10 kudos

Hi @Debayan, I'd suggest also mentioning this explicitly in the documentation of the workspace client for get_run_output. One has to pay extra attention to the example run_id=run.tasks[0].run_id, otherwise it can be easily missed.
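
For anyone hitting the same INVALID_PARAMETER_VALUE error, here is a minimal sketch of the per-task approach the error message asks for. It assumes a client shaped like the Databricks SDK for Python WorkspaceClient, whose jobs.get_run and jobs.get_run_output methods are the ones referred to above. The helper name collect_task_outputs is hypothetical and not part of the SDK.

```python
def collect_task_outputs(client, parent_run_id):
    """Return {task_key: run output} for every task of a multi-task job run.

    `client` is assumed to behave like databricks.sdk.WorkspaceClient.
    """
    # Asking for the output of the parent run is what raises the error,
    # so fetch the parent run only to discover its task runs.
    run = client.jobs.get_run(run_id=parent_run_id)

    outputs = {}
    for task in run.tasks or []:
        # Each task has its own run_id; the output is retrieved per task run.
        outputs[task.task_key] = client.jobs.get_run_output(run_id=task.run_id)
    return outputs


# Usage against a real workspace (needs the databricks-sdk package and
# configured authentication; 123 is a placeholder run id):
#   from databricks.sdk import WorkspaceClient
#   outputs = collect_task_outputs(WorkspaceClient(), parent_run_id=123)
```

Looping over run.tasks instead of hard-coding run.tasks[0] avoids silently dropping the output of every task after the first.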

13 More Replies
toproximahk
by New Contributor
  • 134 Views
  • 3 replies
  • 0 kudos

Inquiry on GraphFrame Library Upgrade Timeline for Databricks Runtime for Machine Learning

Thanks to the Databricks community for maintaining such a valuable platform. I would like to inquire if there is a planned timeline for upgrading the GraphFrame library. We’ve noticed that the latest release on GitHub is v0.9.3, while the Databricks ...

Latest Reply
Sem-Sinchenko
New Contributor
  • 0 kudos

You can try adding the Maven dependency to your cluster manually ... For example, for Spark 3.5.x it will be like: io.graphframes:graphframes-spark3_2.12:0.10.0 and add a PyPI dependency graphframes-py. Adding Maven coordinates should download and install al...

2 More Replies
alhuelamo
by New Contributor II
  • 10301 Views
  • 5 replies
  • 1 kudos

Getting non-traceable NullPointerExceptions

We're running a job that's issuing NullPointerException without traces of our job's code. Does anybody know what would be the best course of action when it comes to debugging these issues? The job is a Scala job running on DBR 11.3 LTS. In case it's rel...

Latest Reply
Amora
New Contributor
  • 1 kudos

You could try enabling full stack traces and checking the Spark executor logs for hidden errors. NullPointerExceptions in Scala on DBR often come from lazy evaluations or missing schema fields during I/O. Reviewing your DataFrame transformations a...

4 More Replies
Phani1
by Valued Contributor II
  • 4604 Views
  • 4 replies
  • 2 kudos

Convert EBCDIC (Binary) file format to ASCII

Hi Team, how can we convert the EBCDIC (binary) file format to ASCII in Databricks? Do we have any libraries in Databricks?

Latest Reply
amulight
New Contributor
  • 2 kudos

Hi Phani1, were you able to do that successfully? Can you share the details and steps, please? Thanks.

3 More Replies
67
by New Contributor
  • 49 Views
  • 1 replies
  • 1 kudos

Simple integration to push data from third-party into a client's Databricks instance

Hi there, we have an industry data platform with multiple customers using it. We provide each customer with their own data every night via .csv. Some of our customers use Databricks and import their data from us into it. We would like to offer a more...

Latest Reply
jeffreyaven
Databricks Employee
  • 1 kudos

You could use external volumes with a Cloudflare R2 bucket as an intermediary - you write the nightly data files to R2 (using S3-compatible API), and your customers create external volumes in their Databricks workspace pointing to their designated R2...

