Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

alonisser
by Contributor II
  • 1638 Views
  • 4 replies
  • 0 kudos

Controlling the name of the downloaded csv file from a notebook

I have a notebook with multiple display() commands in various cells, and the users are currently downloading the result CSV from each cell. I want the downloads to be named after the name of the cell (or any other methods I can make each download have a dif...

Latest Reply
Isi
Honored Contributor III
  • 0 kudos

Hey @alonisser Once the file is stored in the volume (whether in S3, GCS, or ADLS), you’ll be able to see it with a custom name defined by the customer or project. Additionally, the files may be saved in different folders, making it easier to identify...

3 More Replies
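The approach in the reply can be sketched as follows. The catalog, Volume path, and cell names below are placeholders for illustration, not values from the thread:

```python
from datetime import date

def volume_csv_path(volume_root: str, cell_name: str) -> str:
    """Build a per-cell CSV path so each exported file has an identifiable name."""
    return f"{volume_root}/{cell_name}_{date.today().isoformat()}.csv"

# In the notebook, alongside display(df), one might write (hypothetical path):
# df.toPandas().to_csv(
#     volume_csv_path("/Volumes/main/default/exports", "sales_summary"),
#     index=False,
# )
```

Users then download the file from the Volume, where its name identifies the originating cell.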
korasino
by New Contributor II
  • 1288 Views
  • 2 replies
  • 0 kudos

Photon and Predictive I/O vs. Liquid Clustering

Hi, quick question about optimizing our Delta tables: Photon and Predictive I/O vs. Liquid Clustering (LC). We have UUIDv4 columns (random, high-cardinality) used in both WHERE uuid = … filters and joins. From what I understand Photon (on Serverless wa...

Latest Reply
korasino
New Contributor II
  • 0 kudos

Hey, thanks for the reply. Could you share some documentation links around those bullet points in your answer? thanks!

1 More Replies
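For reference, enabling Liquid Clustering keyed on the high-cardinality UUID column uses the documented CLUSTER BY clause; the catalog, table, and column names below are assumptions:

```python
# Hypothetical DDL sketch; on Databricks, run it with spark.sql(ddl).
ddl = """
CREATE TABLE main.default.events (
  uuid STRING,
  payload STRING
)
CLUSTER BY (uuid)
"""
```

Photon and Predictive I/O apply at query time regardless; CLUSTER BY changes only how data is laid out for file skipping.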
seefoods
by Valued Contributor
  • 1693 Views
  • 3 replies
  • 1 kudos

Resolved! Build Auto Loader PySpark job

Hello guys, I have built an ETL in PySpark which uses Auto Loader. I want to know: what is the best way to use Auto Loader on Databricks? What is the best way to vacuum checkpoint files on /Volumes? Hope to have your ideas about that. Cordially, 

Latest Reply
seefoods
Valued Contributor
  • 1 kudos

Hello @intuz, thanks for your reply. Cordially 

2 More Replies
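A minimal Auto Loader skeleton looks like the sketch below. The option names are the documented cloudFiles options, but every path and table name is a placeholder assumption:

```python
# Small helper building the documented Auto Loader source options.
def autoloader_options(fmt: str, schema_location: str) -> dict:
    return {
        "cloudFiles.format": fmt,                      # source file format, e.g. "json"
        "cloudFiles.schemaLocation": schema_location,  # where schema/evolution state lives
    }

# On Databricks (hypothetical paths):
# (spark.readStream.format("cloudFiles")
#     .options(**autoloader_options("json", "/Volumes/main/default/_schemas/raw"))
#     .load("/Volumes/main/default/landing")
#     .writeStream
#     .option("checkpointLocation", "/Volumes/main/default/_checkpoints/raw")
#     .trigger(availableNow=True)
#     .toTable("main.default.raw_events"))
```

Checkpoint directories should generally be left alone rather than vacuumed while the stream is live; deleting them forces a full reprocess.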
yathish
by New Contributor II
  • 4141 Views
  • 6 replies
  • 0 kudos

upstream request timeout in databricks apps when using databricks sql connector

Hi, I am building an application in Databricks Apps. Sometimes when I try to fetch data using the Databricks SQL connector in an API, it takes time to hit the SQL warehouse, and if the time exceeds 60 seconds it gives an upstream timeout error. I h...

Latest Reply
epistoteles
Databricks Partner
  • 0 kudos

@Alberto_Umana Any news on this? I am having similar issues and am also using a (running) serverless SQL warehouse.

5 More Replies
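While waiting for a fix, a common workaround is to retry the query when the gateway times out. This is a generic retry sketch, not a connector-specific API; wrap the actual cursor.execute/fetch call in it:

```python
import time

def with_retries(fn, attempts=3, backoff_s=1.0):
    """Retry a flaky call, e.g. a warehouse query that trips the ~60 s gateway timeout.
    A workaround sketch only; it does not remove the underlying timeout."""
    for i in range(attempts):
        try:
            return fn()
        except Exception:
            if i == attempts - 1:
                raise
            time.sleep(backoff_s * (i + 1))  # linear backoff between attempts

# result = with_retries(lambda: run_warehouse_query(sql))  # run_warehouse_query is hypothetical
```

Keeping the warehouse warm (serverless, already running) and paginating large results also reduce the chance of hitting the limit.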
EndreM
by New Contributor III
  • 2834 Views
  • 8 replies
  • 1 kudos

Replay a stream after converting to liquid clustering fails

I have a problem replaying a stream. I need to replay it because the conversion from liquid clustering to partitioning doesn't work. I see a lot of garbage collection and memory maxes out immediately. Then the driver restarts. To debug the problem I try to force only ...

Latest Reply
EndreM
New Contributor III
  • 1 kudos

After increasing the compute to one with 500 GB memory, the job was able to transfer ca. 300 GB of data, but it produced a large number of files: 26,000. The old table, with partitioning and no liquid clustering, had 4,000 files with a total of 1.2 TB of ...

7 More Replies
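When replaying a Delta-sourced stream that overwhelms the driver, one documented lever is rate-limiting the micro-batches. A small sketch, where the file count per trigger is a guess to be tuned:

```python
def throttled_delta_source_options(max_files_per_trigger: int = 50) -> dict:
    """Limit how many files each micro-batch reads from a Delta source,
    so memory stays bounded during a full replay (maxFilesPerTrigger is
    a documented Delta streaming source option)."""
    return {"maxFilesPerTrigger": str(max_files_per_trigger)}

# On Databricks (hypothetical table name):
# stream = (spark.readStream
#           .options(**throttled_delta_source_options(20))
#           .table("main.default.events"))
```

Afterwards, OPTIMIZE on the liquid-clustered table compacts the many small files the rewrite produced.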
anilsampson
by New Contributor III
  • 820 Views
  • 1 replies
  • 0 kudos

Resolved! databricks dashboard via workflow task.

Hello, I am trying to trigger a Databricks dashboard via a workflow task. 1. When I deploy the job triggering the dashboard task via a local "Deploy bundle" command, deployment is successful. 2. When I try to deploy to a different environment via CICD while ...

Latest Reply
anilsampson
New Contributor III
  • 0 kudos

I think I figured out the issue; it had to do with the version of the CLI. Updated the CICD to use the latest version of the CLI: curl -fsSL https://raw.githubusercontent.com/databricks/setup-cli/main/install.sh

arendon
by New Contributor II
  • 1410 Views
  • 2 replies
  • 1 kudos

Resolved! Asset Bundles: How to mute job failure notifications until final retry?

I'm trying to configure a job to only send failure notifications on the final retry failure (not on intermediate retry failures). This feature is available in the Databricks UI as "Mute notifications until the last retry", but I can't get this to wor...

Latest Reply
arendon
New Contributor II
  • 1 kudos

Thank you for the response, @lingareddy_Alva! I'll take a look at the workarounds you shared. 

1 More Replies
dataminion01
by New Contributor II
  • 764 Views
  • 1 replies
  • 0 kudos

create streaming table using variable file path

Is it possible to use a variable for the file path based on dates? Files are stored in folders in this format: yyyy/mm. CREATE OR REFRESH STREAMING TABLE test AS SELECT * FROM STREAM read_files( "/Volumes/.....", format => "parquet" );

Latest Reply
Rishabh-Pandey
Databricks MVP
  • 0 kudos

@dataminion01  Yes, it is possible to use a variable or dynamic file path based on dates in some data processing frameworks, but not directly in static SQL DDL statements like CREATE OR REFRESH STREAMING TABLE unless the environment you're working e...

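Parameterizing the path outside the static DDL can be sketched with a small builder; the volume root below is a placeholder, not the asker's real path:

```python
from datetime import date

def monthly_source_path(root: str, d: date) -> str:
    """Build the yyyy/mm folder layout described in the question."""
    return f"{root}/{d.year:04d}/{d.month:02d}"

# On Databricks, render the DDL and run it via spark.sql (hypothetical root):
# base = monthly_source_path("/Volumes/main/default/raw", date.today())
# spark.sql(f"CREATE OR REFRESH STREAMING TABLE test AS "
#           f"SELECT * FROM STREAM read_files('{base}', format => 'parquet')")
```

Note that a streaming table's source path is fixed at creation; pointing it at a broader root and letting the stream discover new yyyy/mm folders avoids re-creating it each month.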
Akshay_Petkar
by Valued Contributor
  • 1738 Views
  • 1 replies
  • 1 kudos

How to Convert MySQL SELECT INTO OUTFILE and LOAD DATA INFILE to Databricks SQL?

Hi Community, I have some existing MySQL code: SELECT * FROM [table_name] INTO OUTFILE 'file_path' FIELDS TERMINATED BY '\t' OPTIONALLY ENCLOSED BY '"' LINES TERMINATED BY '\n'; LOAD DATA INFILE 'file_path' REPLACE INTO TABLE [database].[table_name] FIELDS ...

Latest Reply
krishnakhadka28
New Contributor II
  • 1 kudos

Databricks SQL does not directly support MySQL’s SELECT INTO OUTFILE or LOAD DATA INFILE syntax. However, equivalent functionality can be achieved using Databricks features like saving to and reading from external locations like dbfs, s3 etc. I have ...

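One way to translate those MySQL clauses is to map them onto Spark's documented CSV reader/writer options; the table names and Volume paths in the comments are placeholders:

```python
def csv_options_from_mysql(fields_terminated="\t", enclosed='"', lines_terminated="\n"):
    """Map MySQL OUTFILE/INFILE clauses onto Spark CSV options:
    FIELDS TERMINATED BY -> sep, ENCLOSED BY -> quote, LINES TERMINATED BY -> lineSep."""
    return {"sep": fields_terminated, "quote": enclosed, "lineSep": lines_terminated}

# Export, replacing SELECT ... INTO OUTFILE (hypothetical names):
# spark.table("db.t").write.options(**csv_options_from_mysql()).csv("/Volumes/main/default/export")
# Re-import, replacing LOAD DATA INFILE ... REPLACE:
# (spark.read.options(**csv_options_from_mysql()).csv("/Volumes/main/default/export")
#      .write.mode("overwrite").saveAsTable("db.t"))
```

MySQL's REPLACE semantics roughly correspond to mode("overwrite") here; for row-level upserts, MERGE INTO would be the Delta equivalent.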
erigaud
by Honored Contributor
  • 10738 Views
  • 3 replies
  • 2 kudos

Dynamically specify pivot column in SQL

Hello everyone! I am looking for a way to dynamically specify pivot columns in a SQL query, so it can be used in a view. However, we don't want to hard code the values that need to become columns, and would rather extract them from another table. I've se...

Latest Reply
lprevost
Contributor III
  • 2 kudos

I agree, it's not clear how to do this. I'm thinking of using the pandas API.

2 More Replies
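Because SQL's PIVOT ... IN (...) clause requires literal values, a common pattern is to fetch the values from the other table and render the query. A sketch with placeholder table and column names:

```python
def pivot_sql(table: str, pivot_col: str, value_col: str, values: list) -> str:
    """Render a PIVOT query from a list of values fetched elsewhere,
    since the IN (...) list must contain literals."""
    in_list = ", ".join(f"'{v}'" for v in values)
    return (f"SELECT * FROM {table} "
            f"PIVOT (SUM({value_col}) FOR {pivot_col} IN ({in_list}))")

# On Databricks (hypothetical names):
# values = [r[0] for r in spark.sql("SELECT DISTINCT category FROM main.default.dims").collect()]
# spark.sql(pivot_sql("main.default.sales", "category", "amount", values)).display()
```

The trade-off is that this cannot live inside a static view definition; a view's column set is fixed, so the query has to be regenerated when the value list changes.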
Gecofer
by Contributor II
  • 1367 Views
  • 2 replies
  • 1 kudos

Resolved! Inconsistent query results between dbt ETL run and SQL editor in Databricks

Hi everyone, I’m running into a strange issue in one of my ETL pipelines using dbt on Databricks, and I’d appreciate any insights or ideas. I have a query that is part of my dbt model. When I run the ETL process, the results from this query are incorr...

Latest Reply
Gecofer
Contributor II
  • 1 kudos

Hi @Isi, thanks so much for your insight! It turned out to be a combination of the two things you mentioned: there was a data masking policy applied to one of the columns, and while I had permissions to view the unmasked data, the service principal runn...

1 More Replies
Debashisrajib
by New Contributor II
  • 905 Views
  • 1 replies
  • 0 kudos

Resolved! 65 technical questions.

I recently took the Databricks Data Engineer Professional exam and got a really lengthy set of 65 technical questions. These questions are different from statistical questions. 65 lengthy technical questions in 120 minutes is too much, and this number is not men...

Latest Reply
Advika
Community Manager
  • 0 kudos

Hello @Debashisrajib! I’m sorry to hear that. As outlined in the exam details on Webassessor, Databricks certification exams include 60 scored questions, along with additional unscored questions that appear like regular ones but do not affect your fin...

Abhimanyu
by Databricks Partner
  • 1163 Views
  • 2 replies
  • 0 kudos

Why does df.dropna(how="all") fail when there is a . in a column name?

I'm working in a Databricks notebook and using Spark to query a Delta table. Here's the code I ran: df = spark.sql("select * from catalog.schema.table"); df = df.dropna(how="all"); display(df). This works fine unless the DataFrame has a column name that ...

Latest Reply
MujtabaNoori
New Contributor III
  • 0 kudos

Hi @Abhimanyu, yes. In Spark, a '.' (dot) in a column name is interpreted as StructType field access for nested objects. But you can definitely rename the columns that contain a '.' dynamically. Attached a few screenshots for your reference tha...

1 More Replies
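The dynamic rename in the reply can be sketched as a small helper; the DataFrame usage in the comments assumes a Spark session and is not runnable locally. Escaping names in backticks is the other documented workaround:

```python
def undot_renames(columns):
    """Map column names containing '.' to safe names, because Spark parses
    an unquoted dot as struct field access rather than a literal name."""
    return {c: c.replace(".", "_") for c in columns if "." in c}

# On Databricks:
# for old, new in undot_renames(df.columns).items():
#     df = df.withColumnRenamed(old, new)
# df = df.dropna(how="all")
```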
chsoni12
by New Contributor II
  • 1269 Views
  • 1 replies
  • 0 kudos

Resolved! Limitation in Managed Volumes Recovery — UNDROP Should Be Supported

Hello Databricks Community, while reviewing the Databricks official documentation and performing a POC on managed volumes, I observed that volumes cannot be recovered using the UNDROP command if accidentally deleted, unlike managed tables. Technically...

Latest Reply
Vidhi_Khaitan
Databricks Employee
  • 0 kudos

Thank you for highlighting this issue! Databricks is already working on implementing this.

farazahmad372
by New Contributor II
  • 2705 Views
  • 3 replies
  • 0 kudos

TypeError: 'JavaPackage' object is not callable

from pyspark.sql import *

if __name__ == "__main__":
    spark = SparkSession.builder \
        .appName("hello Spark") \
        .master("local[2]") \
        .getOrCreate()
    data_list = [("Ravi",28), ("David",45), ("Abd...

Latest Reply
nikhilj0421
Databricks Employee
  • 0 kudos

@farazahmad372 May I know the DBR version and type of cluster? Are you using serverless?

2 More Replies