Data Engineering

Forum Posts

by Rasputin312 • Databricks Partner

03-02-2025 3:42:15 PM

4083 Views
1 replies
1 kudos

Resolved! How To Save a File as a Pickle Object to the Databricks File System

I tried running this code:

```python
import pickle

def save_file(name, obj):
    # Serialize obj with pickle and write it to the local path `name`
    with open(name, 'wb') as f:
        pickle.dump(obj, f)
```

One file was saved in the local file system, but the second was too large, so I need to save it to DBFS instead. Unfortunately, I d...

Data Engineering

Latest Reply

JissMathew
Valued Contributor

03-03-2025 2:58:44 AM

1 kudos

To save a Python object to the Databricks File System (DBFS), you can use the dbutils.fs module to write files to DBFS. Since you are dealing with a Python object and not a DataFrame, you can use the pickle module to serialize the object and then wri...
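A minimal sketch of the approach described above, assuming the code runs in a Databricks notebook where dbutils is available. It pickles the object to a temporary file on the driver's local disk, then copies that file into DBFS with dbutils.fs.cp. The helper names save_pickle/load_pickle and the target path dbfs:/tmp/my_object.pkl are hypothetical. Passing dbutils.fs in as the fs argument keeps the function testable outside Databricks.

```python
import os
import pickle
import tempfile


def save_pickle(obj, dbfs_path=None, fs=None):
    """Pickle obj to a local temp file and optionally copy it to DBFS.

    fs is expected to be dbutils.fs (or anything with a cp(src, dst) method);
    dbfs_path is a hypothetical target such as "dbfs:/tmp/my_object.pkl".
    Returns the path of the local pickle file.
    """
    # Stream the pickle straight to the driver's local disk in binary mode.
    fd, local_path = tempfile.mkstemp(suffix=".pkl")
    with os.fdopen(fd, "wb") as f:
        pickle.dump(obj, f)

    # Copy from the local file system ("file:" scheme) into DBFS.
    if fs is not None and dbfs_path is not None:
        fs.cp("file:" + local_path, dbfs_path)
    return local_path


def load_pickle(path):
    """Read a pickled object back from a local file path."""
    with open(path, "rb") as f:
        return pickle.load(f)
```

In a notebook this would be called as save_pickle(my_object, "dbfs:/tmp/my_object.pkl", fs=dbutils.fs). Copying a finished file is used here because dbutils.fs.put expects string contents, which is not a good fit for binary pickle data.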

by JonathanFlint • New Contributor III

10-24-2024 4:16:36 AM

10391 Views
9 replies
2 kudos

Asset bundle doesn't sync files to workspace

I've created a completely fresh project with a completely empty workspace. Locally I have the Databricks CLI version 0.230.0 installed. I run databricks bundle init default-python. I have auth set up with a PAT generated by an account which has workspace ad...

Data Engineering

Latest Reply

pherrera
New Contributor II

03-03-2025 1:47:21 AM

2 kudos

OK, I feel silly. Despite reading the other messages in this thread, I didn't twig to the fact that I had in fact added the subfolder I had created the DAB in to my top-level project .gitignore, since I was just playing around and didn't want to comm...

by aa_204 • Databricks Partner

07-28-2023 5:38:24 AM

4618 Views
4 replies
0 kudos

Reading excel file using pandas on spark api not rendering #N/A values correctly

I am trying to read a .xlsx file using ps.read_excel(), and some string-type columns contain #N/A as a value. But in the DataFrame, I am getting "null" in place of #N/A. Is there any option that lets us read #N/A as a string from an .xlsx file?

Data Engineering

Latest Reply

Soumik
Databricks Partner

03-03-2025 12:51:39 AM

0 kudos

Did you find a solution or workaround for this? I am facing the same issue even after using dtype=str, na_filter=False, keep_default_na=False.
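pandas treats the text #N/A as one of its default missing-value markers, and ps.read_excel exposes the same keep_default_na, na_values and na_filter parameters. A minimal sketch of how those parameters behave, using pandas read_csv on an in-memory CSV instead of read_excel so it runs without a workbook (a swap made only for self-containment). If the cells in the .xlsx are genuine Excel error values (for example produced by a formula) rather than the literal text #N/A, the Excel reader engine may convert them before these options apply; that is an assumption worth checking against the actual file.

```python
import io

import pandas as pd

# Tiny in-memory stand-in for the spreadsheet: one "#N/A", one normal value.
CSV_TEXT = "code,label\nA,#N/A\nB,ok\n"

# Default behaviour: "#N/A" is in pandas' built-in NA list, so it becomes NaN.
default_df = pd.read_csv(io.StringIO(CSV_TEXT), dtype=str)

# keep_default_na=False with no na_values: no strings are treated as missing.
literal_df = pd.read_csv(io.StringIO(CSV_TEXT), dtype=str, keep_default_na=False)

# na_filter=False switches missing-value detection off entirely.
unfiltered_df = pd.read_csv(io.StringIO(CSV_TEXT), dtype=str, na_filter=False)

print(default_df["label"].tolist())
print(literal_df["label"].tolist())
```

If the text-based options behave as shown but the Excel read still yields nulls, that points at how the cells are stored in the workbook rather than at the NA options.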

by ah0896 • New Contributor III

06-13-2023 11:29:25 AM

22991 Views
18 replies
10 kudos

Using init scripts on UC enabled shared access mode clusters

I know that UC-enabled shared access mode clusters do not allow init script usage, and I have tried multiple workarounds to use the required init script in the cluster (pyodbc-install.sh, in my case), including installing the pyodbc package as a workspa...

Data Engineering

Latest Reply

praveenVP
New Contributor III

03-02-2025 8:41:01 PM

10 kudos

Hello all,

The workaround below worked for me:
1) pyodbc-install.sh is uploaded to a Volume
2) the shared cluster is able to navigate to the Volume to select the init script
3) the Databricks runtime is 15.4 LTS
4) the Allowlist has been updated to allo...
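Steps 1 to 3 of that workaround map onto a cluster definition. A minimal sketch, assuming the Clusters API JSON shape, where data_security_mode "USER_ISOLATION" is the API value for shared access mode and "15.4.x-scala2.12" is the runtime key for 15.4 LTS. The Volume path, cluster name and node type are hypothetical placeholders, and the allowlist change in step 4 is a separate admin action that this sketch does not cover.

```python
import json

# Hypothetical Volume path: replace with wherever pyodbc-install.sh was uploaded.
INIT_SCRIPT_PATH = "/Volumes/main/default/scripts/pyodbc-install.sh"


def shared_cluster_spec(name, init_script_path, node_type_id, num_workers=1):
    """Build a Clusters API-style spec for a shared access mode cluster
    that runs an init script stored in a Unity Catalog Volume."""
    return {
        "cluster_name": name,
        "spark_version": "15.4.x-scala2.12",  # Databricks Runtime 15.4 LTS
        "node_type_id": node_type_id,
        "num_workers": num_workers,
        "data_security_mode": "USER_ISOLATION",  # shared access mode
        "init_scripts": [{"volumes": {"destination": init_script_path}}],
    }


if __name__ == "__main__":
    # Hypothetical node type; print the JSON to paste into the API or UI.
    spec = shared_cluster_spec("pyodbc-shared", INIT_SCRIPT_PATH, "Standard_DS3_v2")
    print(json.dumps(spec, indent=2))
```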

by Benni • Databricks Partner

11-05-2024 1:42:37 PM

7420 Views
8 replies
0 kudos

UC, pyodbc, Shared Cluster, and :Can't open lib 'ODBC Driver 17 for SQL Server' : file not found

Hey Databricks! I'm trying to use the pyodbc init script stored in a Volume in UC on a shared compute cluster, but I receive this error: "[01000] [unixODBC][Driver Manager]Can't open lib 'ODBC Driver 17 for SQL Server' : file not found (0) (SQLDriverConnect)". I fo...
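That message means unixODBC could not load a driver under the name ODBC Driver 17 for SQL Server on the node where the connection was attempted, which typically suggests the init script did not install or register the driver there. A minimal sketch that checks which drivers pyodbc actually sees before connecting. The preference order (Driver 18 before 17) and the connection-string fields are assumptions to adapt; pyodbc is only imported when no driver list is passed in, so the helpers can be tested without it.

```python
# Assumed preference order; adjust to the driver your init script installs.
PREFERRED_DRIVERS = ("ODBC Driver 18 for SQL Server", "ODBC Driver 17 for SQL Server")


def pick_sql_server_driver(available=None, preferred=PREFERRED_DRIVERS):
    """Return the first preferred driver name that is actually registered."""
    if available is None:
        import pyodbc  # lists drivers registered with unixODBC on this node

        available = pyodbc.drivers()
    for name in preferred:
        if name in available:
            return name
    raise LookupError(
        f"None of {list(preferred)} is registered; found {list(available)}. "
        "Check that the init script ran on this cluster."
    )


def build_conn_str(driver, server, database, user, password):
    """Assemble a basic SQL Server ODBC connection string."""
    return (
        f"DRIVER={{{driver}}};SERVER={server};DATABASE={database};"
        f"UID={user};PWD={password}"
    )
```

On the cluster this would be used as pyodbc.connect(build_conn_str(pick_sql_server_driver(), server, database, user, password)), ideally with credentials read from a secret scope rather than hard-coded. A LookupError listing an empty driver set is a strong hint that the init script never ran on that cluster.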

Data Engineering

Latest Reply

praveenVP
New Contributor III

03-02-2025 8:38:45 PM

0 kudos

by poorni_sm • New Contributor

08-20-2024 3:57:49 AM

4442 Views
2 replies
0 kudos

Get failed records from Salesforce write target tool in AWS GLUE job

I am working in the AWS Glue service, where we are trying to migrate data from S3 to Salesforce using the Salesforce write target tool (using a Salesforce connection). The intended process is that, once the load is done, Salesforce provides the jobId...
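If the Glue job's Salesforce write runs as a Salesforce Bulk API 2.0 ingest job (an assumption; the thread does not say how the Glue connector submits data), the failed rows for a given jobId can be downloaded from that job's failedResults endpoint, which returns CSV with sf__Id and sf__Error columns in front of the original fields. A minimal standard-library sketch; the API version 62.0 and the example instance URL are hypothetical placeholders, and obtaining the OAuth access token is out of scope.

```python
import csv
import io
import urllib.request


def failed_results_request(instance_url, job_id, access_token, api_version="62.0"):
    """Build a GET request for a Bulk API 2.0 ingest job's failed records.

    api_version is a placeholder; use a version your org supports.
    """
    url = (
        f"{instance_url.rstrip('/')}/services/data/v{api_version}"
        f"/jobs/ingest/{job_id}/failedResults/"
    )
    headers = {"Authorization": f"Bearer {access_token}", "Accept": "text/csv"}
    return urllib.request.Request(url, headers=headers, method="GET")


def parse_failed_results(csv_text):
    """Turn the failedResults CSV into a list of dicts (one per failed row)."""
    return list(csv.DictReader(io.StringIO(csv_text)))


def fetch_failed_results(instance_url, job_id, access_token):
    """Download and parse failed records (performs a network call)."""
    request = failed_results_request(instance_url, job_id, access_token)
    with urllib.request.urlopen(request) as response:
        return parse_failed_results(response.read().decode("utf-8"))
```

Each returned dict carries the per-row error message under sf__Error, so the list can be logged or written back to S3 for reprocessing.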

Data Engineering

Latest Reply

jhonm_839
New Contributor III

03-01-2025 9:10:14 PM

0 kudos

Thank you so much emillion. This helps me a lot. Keep it up!

by tobyevans • New Contributor II

02-07-2024 7:33:12 AM

10524 Views
1 replies
1 kudos

Ingesting complex/unstructured data

Hi there, my company is reasonably new to using Databricks, and we're running our first PoCs. Some of our data is structured or reasonably structured, so it drops into a bucket, we point a notebook at it, and all is well and Delta. The problem is ari...

Data Engineering

Latest Reply

mark5
New Contributor II

03-01-2025 10:39:55 AM

1 kudos

Hi Toby,

Managing diverse, unstructured data can be challenging. At Know2Ledge (ShareArchiver), we specialize in unstructured data management to streamline this process.

To handle your scenario efficiently:
1️⃣ Pre-Process Before Ingestion – Use AI-power...

by sachin_kanchan • New Contributor III

02-18-2025 10:38:06 PM

2884 Views
2 replies
0 kudos

Community Edition? More Like Community Abandonment - Thanks for NOTHING, Databricks!

To the Databricks Team (or whoever is pretending to care),

Let me get this straight. You offer a "Community Edition" to supposedly help people learn, right? Well, congratulations, you've created the most frustrating, useless signup process I've ever s...

Data Engineering

Latest Reply

Advika_
Databricks Employee

02-19-2025 2:13:37 AM

0 kudos

Hello @sachin_kanchan! I understand the frustration, and I appreciate you sharing your experience. The Community Edition is meant to provide a smooth experience, and this shouldn’t be happening. We usually ask users to drop an email to help@databrick...

by mstfkmlbsbdk • New Contributor II

02-27-2025 3:16:54 PM

5323 Views
1 replies
1 kudos

Resolved! Access ADLS with serverless. CONFIG_NOT_AVAILABLE error

I have my own Autoloader repo, which is responsible for ingesting data from the landing layer (ADLS) and loading it into the raw layer in Databricks. In that repo, I created a couple of workflows and run them on a serverless cluster. And I u...

Data Engineering

ADLS

autoloader

dbt

NCC

serverless cluster

Latest Reply

cgrant
Databricks Employee

02-28-2025 8:39:10 AM

1 kudos

The recommended approach for accessing cloud storage is to create Databricks storage credentials. These storage credentials can refer to Microsoft Entra ID service principals, managed identities, etc. After a credential is created, create an external location. Wh...
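The external-location step can be scripted from a notebook. A minimal sketch, assuming a storage credential already exists (created in Catalog Explorer or via the API) and that the caller has permission to create external locations. It only builds the abfss:// URL and the SQL text, which would then be passed to spark.sql(...). All names used below (landing, mystorageacct, landing_loc, landing_cred, etl@example.com) are hypothetical.

```python
def abfss_url(container, storage_account, path=""):
    """ADLS Gen2 URL: abfss://<container>@<account>.dfs.core.windows.net/<path>."""
    return f"abfss://{container}@{storage_account}.dfs.core.windows.net/{path.lstrip('/')}"


def create_external_location_sql(location_name, url, credential_name):
    """SQL registering url as a Unity Catalog external location backed by an
    existing storage credential."""
    return (
        f"CREATE EXTERNAL LOCATION IF NOT EXISTS `{location_name}` "
        f"URL '{url}' "
        f"WITH (STORAGE CREDENTIAL `{credential_name}`)"
    )


def grant_read_files_sql(location_name, principal):
    """SQL letting the principal that runs the workflow read files there."""
    return f"GRANT READ FILES ON EXTERNAL LOCATION `{location_name}` TO `{principal}`"
```

For example, spark.sql(create_external_location_sql("landing_loc", abfss_url("landing", "mystorageacct", "raw"), "landing_cred")) followed by the matching GRANT. The idea is that the serverless workflow then reads the abfss:// path through Unity Catalog instead of through storage keys set in Spark configuration.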

by Sjoshi • New Contributor

02-26-2025 5:22:58 PM

2086 Views
2 replies
1 kudos

How to make the write operation faster for writing a spark dataframe to a delta table

So, I am doing 4 spatial join operations on files with the following sizes: Base_road_file, which is 1 GB; the Telematics file, which is 1.2 GB; and the state boundary file, BH road file, client_geofence file and kpmg_geofence_file, which are not too large. My...

Data Engineering

Latest Reply

cgrant
Databricks Employee

02-28-2025 8:34:58 AM

1 kudos

We recommend using spatial frameworks such as Databricks Mosaic or Apache Sedona to speed up operations like spatial joins and point-in-polygon tests. Without these frameworks, many of these operations result in unoptimized, explosive cross joins.
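The cross-join problem comes from testing every row on one side against every row on the other. Spatial frameworks generally avoid it with some form of spatial index. The following is a toy, pure-Python illustration of that idea, not the Mosaic or Sedona API: axis-aligned bounding boxes stand in for polygons, and a uniform grid cell id acts as a join key so each point is only tested against boxes that share its cell. cell_size is a hypothetical tuning parameter.

```python
import math
from collections import defaultdict


def cell_of(x, y, cell_size):
    """Grid cell id containing the point (x, y)."""
    return (math.floor(x / cell_size), math.floor(y / cell_size))


def cells_for_bbox(bbox, cell_size):
    """All grid cells a bounding box (min_x, min_y, max_x, max_y) overlaps."""
    min_x, min_y, max_x, max_y = bbox
    cx0, cy0 = cell_of(min_x, min_y, cell_size)
    cx1, cy1 = cell_of(max_x, max_y, cell_size)
    return [(cx, cy) for cx in range(cx0, cx1 + 1) for cy in range(cy0, cy1 + 1)]


def contains(bbox, x, y):
    min_x, min_y, max_x, max_y = bbox
    return min_x <= x <= max_x and min_y <= y <= max_y


def naive_join(points, boxes):
    """Cross join: every point is tested against every box."""
    return sorted(
        (pid, bid)
        for pid, (x, y) in points.items()
        for bid, box in boxes.items()
        if contains(box, x, y)
    )


def grid_join(points, boxes, cell_size):
    """Index boxes by grid cell, then test each point only against its cell."""
    index = defaultdict(list)
    for bid, box in boxes.items():
        for cell in cells_for_bbox(box, cell_size):
            index[cell].append(bid)
    matches = []
    for pid, (x, y) in points.items():
        for bid in index.get(cell_of(x, y, cell_size), []):
            if contains(boxes[bid], x, y):  # exact check after the cheap filter
                matches.append((pid, bid))
    return sorted(matches)
```

In Spark terms the same idea means exploding each geometry into the cell ids it covers and joining on the cell id (an equi-join) before running the exact geometric test, rather than a cross join followed by a filter.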

by lprevost • Contributor III

02-26-2025 9:56:09 AM

1824 Views
2 replies
1 kudos

Resolved! Autoloader streaming table - how to determine if new rows were updated from query?

If I'm running a scheduled batch Autoloader query which reads from CSV files on S3 and incrementally loads a Delta table, how can I determine if new rows were added? I'm currently trying to do this from the streaming query.lastProgress as follows. s...
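query.lastProgress only describes the most recent micro-batch, so for a scheduled run that processes several micro-batches (for example with trigger(availableNow=True)) it undercounts what the run ingested. A minimal sketch, assuming the progress entries are the dicts PySpark 3.x returns from query.recentProgress, that sums numInputRows across all of them once the query has finished. Note that recentProgress keeps only a bounded number of entries (controlled by spark.sql.streaming.numRecentProgressUpdates). The helper names are hypothetical.

```python
def rows_ingested(progress_entries):
    """Total input rows across streaming progress dicts.

    progress_entries: e.g. query.recentProgress (a list of dicts in PySpark 3.x).
    Entries may be None or lack numInputRows; both cases count as 0.
    """
    return sum((entry or {}).get("numInputRows", 0) for entry in progress_entries)


def new_rows_added(progress_entries):
    """True if any micro-batch in the run read at least one row."""
    return rows_ingested(progress_entries) > 0
```

Typical use: start the Auto Loader stream with an availableNow trigger, call query.awaitTermination(), then check new_rows_added(query.recentProgress) before running any downstream step.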

Data Engineering

Latest Reply

lprevost
Contributor III

02-28-2025 7:02:01 AM

1 kudos

Thank you!

by aladda • Databricks Employee

06-23-2021 8:37:09 PM

5685 Views
2 replies
0 kudos

What is the difference between View and Table in Delta Live Table pipeline

Data Engineering

Latest Reply

aladda
Databricks Employee

06-23-2021 8:38:44 PM

0 kudos

Here's the difference between a View and a Table in the context of a Delta Live Tables pipeline: Views are similar to a temporary view in SQL and are an alias for some computation. A view allows you to break a complicated query into smaller or easier-to-understan...

by BillBishop • New Contributor III

02-27-2025 1:21:21 PM

890 Views
1 replies
0 kudos

Resolved! Using initcap function in materialized view fails

This query works: select order_date, initcap(customer_name), count(*) AS number_of_orders from ... The initcap does as advertised and capitalizes the customer_name column. However, if I wrap the same exact select in a create materialized view I get an...

Data Engineering

Latest Reply

BillBishop
New Contributor III

02-27-2025 1:29:54 PM

0 kudos

NOTE: I got it to work by aliasing the customer_name column; it's documented here: https://docs.databricks.com/aws/en/sql/language-manual/sql-ref-syntax-ddl-create-materialized-view#limitations

However, it wasn't clear that "Non-column reference expre...

by devpdi • New Contributor

09-05-2024 5:42:37 PM

3575 Views
3 replies
0 kudos

Re-use jobs as tasks with the same cluster.

Hello,

I am facing an issue with my workflow. I have a job (name it main job) that, among others, runs 5 concurrent tasks, which are defined as jobs (not notebooks). Each of these jobs is identical to the others (name them sub-job-1), with the only diff...

Data Engineering

Latest Reply

razi9126
New Contributor II

02-27-2025 12:14:10 PM

0 kudos

Did you find any solution?

by diguid • New Contributor III

11-22-2022 2:22:46 PM

5793 Views
3 replies
13 kudos

Using foreachBatch within Delta Live Tables framework

Hey there! I was wondering if there's any way of declaring a Delta Live Table where we use foreachBatch to process the output of a streaming query. Here's a simplification of my code:

def join_data(df_1, df_2):
    df_joined = (
        df_1 ...

Data Engineering

Latest Reply

cgrant
Databricks Employee

02-27-2025 11:50:42 AM

13 kudos

foreachBatch support in DLT is coming soon, and you can now write to non-DLT sinks as well.

Databricks Community

File Arrival Trigger - Multiple tables

Issue while handling Deletes and Inserts in Struct...

DLT with CDC and schema changes in streaming pipel...

how to update not tracked column only in new row v...

Databricks Cost Estimation Template