Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

ws4100e
by New Contributor III
  • 2894 Views
  • 8 replies
  • 0 kudos

DLT pipelines with UC

I'm trying to run a (very simple) DLT pipeline in which the resulting materialized table is published to a UC schema with a managed storage location defined (within an existing EXTERNAL LOCATION). According to the documentation: Publishing to schemas that speci...

Latest Reply
DataGeek_JT
New Contributor II
  • 0 kudos

Did this get resolved? I am getting the same issue.

7 More Replies
Phani1
by Valued Contributor
  • 304 Views
  • 1 reply
  • 0 kudos

Databricks Platform Cleanup and baseline activities.

Hi Team, Kindly share the best practices for managing Databricks Platform Cleanup and baseline activities.

Latest Reply
Kaniz_Fatma
Community Manager
  • 0 kudos

Hi @Phani1, Here are some best practices for managing Databricks Platform Cleanup and baseline activities: Platform Administration: Regularly monitor and manage your Databricks platform to ensure optimal performance. Compute Creation: Choose the ri...

dataslicer
by Contributor
  • 649 Views
  • 2 replies
  • 0 kudos

How to export/clone Databricks Notebook without results via web UI?

When a Databricks Notebook exceeds the size limit, it suggests to `clone/export without results`. This is exactly what I want to do, but the current web UI does not provide the ability to bypass/skip the results in either the `clone` or `export` context...

Latest Reply
dataslicer
Contributor
  • 0 kudos

Thank you @Yeshwanth for the response. I am looking for a way to do this without clearing the current outputs. This is necessary because I want to preserve the existing outputs and fork off another notebook instance to run with a few parameter changes and come...

1 More Reply
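
A note for readers looking for a workaround: the web UI does not expose this today, but the Workspace Export REST API can export a notebook in SOURCE format, which contains only the code (no cell results) and leaves the original notebook and its outputs untouched. A minimal sketch, assuming a personal access token and workspace URL in environment variables and a hypothetical notebook path:

import base64
import os
import requests

host = os.environ["DATABRICKS_HOST"]    # assumption: e.g. https://adb-xxxx.azuredatabricks.net
token = os.environ["DATABRICKS_TOKEN"]  # assumption: personal access token

resp = requests.get(
    f"{host}/api/2.0/workspace/export",
    headers={"Authorization": f"Bearer {token}"},
    params={"path": "/Users/me@example.com/my_notebook",  # hypothetical notebook path
            "format": "SOURCE"},                          # SOURCE = code only, no results
)
resp.raise_for_status()

# The notebook content comes back base64-encoded
with open("my_notebook.py", "wb") as f:
    f.write(base64.b64decode(resp.json()["content"]))
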
Ramana
by Contributor
  • 1278 Views
  • 3 replies
  • 0 kudos

SHOW GROUPS is not giving groups available at the account level

I am trying to capture all the Databricks groups and their mapping to user/AD group(s). I tried to do this by using show groups, show users, and show grants, following the examples mentioned in the article below, but the show groups command only fetc...

Latest Reply
Ramana
Contributor
  • 0 kudos

Yes, I can use the REST API, but I am looking for a SQL or programmatic way to do this rather than making the API calls, building the complex-datatype DataFrame, and then saving it as a table. Thanks, Ramana

2 More Replies
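
For anyone who does fall back to the REST route mentioned above, a minimal sketch of flattening group memberships into a table via the workspace-level SCIM Groups endpoint; host/token handling and the target table name are assumptions, and spark is the notebook-provided session:

import os
import requests

host = os.environ["DATABRICKS_HOST"]    # assumption: workspace URL
token = os.environ["DATABRICKS_TOKEN"]  # assumption: personal access token

resp = requests.get(
    f"{host}/api/2.0/preview/scim/v2/Groups",
    headers={"Authorization": f"Bearer {token}"},
)
resp.raise_for_status()

# Flatten group -> member pairs into rows
rows = [
    (g["displayName"], m.get("display"))
    for g in resp.json().get("Resources", [])
    for m in g.get("members", [])
]

df = spark.createDataFrame(rows, "group_name STRING, member_name STRING")
df.write.mode("overwrite").saveAsTable("admin.group_memberships")  # hypothetical target table
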
kseyser
by New Contributor II
  • 689 Views
  • 2 replies
  • 1 kudos

Predicting compute required to run Spark jobs

I'm working on a project to predict the compute (cores) required to run Spark jobs. Has anyone worked on this or something similar before? How did you get started?

Latest Reply
Yeshwanth
Honored Contributor
  • 1 kudos

@kseyser good day. This documentation might help you in your use case: https://docs.databricks.com/en/compute/cluster-config-best-practices.html#compute-sizing-considerations Kind regards, Yesh

1 More Reply
Lea
by New Contributor II
  • 5913 Views
  • 1 reply
  • 2 kudos

Resolved! Advice for generic file processing for ingestion of multiple data formats

Hello, we are using Delta Live Tables to ingest data from multiple business groups, each with different input file formats and parsing requirements. The input files are ingested from Azure Blob Storage. Right now, we are only servicing three busines...

Latest Reply
raphaelblg
Honored Contributor
  • 2 kudos

Hello @Lea, I'd like to inform you that our platform does not currently provide a built-in feature for ingesting multiple or interchangeable file formats. However, we highly value your input and encourage you to share your ideas through Databricks' ...

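
One pattern the thread hints at, sketched here under assumed paths and formats, is to drive the DLT table definitions from a small config so each business group gets its own bronze table with its own Auto Loader format:

import dlt

# Hypothetical config: one entry per business group
sources = [
    {"name": "group_a_bronze", "path": "abfss://raw@acct.dfs.core.windows.net/group_a/", "format": "csv"},
    {"name": "group_b_bronze", "path": "abfss://raw@acct.dfs.core.windows.net/group_b/", "format": "json"},
]

def make_bronze(cfg):
    # Wrapping the decorated function avoids late-binding issues inside the loop
    @dlt.table(name=cfg["name"])
    def bronze():
        return (
            spark.readStream.format("cloudFiles")
            .option("cloudFiles.format", cfg["format"])
            .option("cloudFiles.inferColumnTypes", "true")
            .load(cfg["path"])
        )

for cfg in sources:
    make_bronze(cfg)

Per-group parsing rules can live in the same config (for example an extra select or withColumn step per entry), so onboarding a new business group becomes a config change rather than new pipeline code.
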
thiagoawstest
by Contributor
  • 8476 Views
  • 3 replies
  • 2 kudos

Resolved! Migration Azure to AWS

Hello, today I use Azure Databricks and I want to migrate my workspaces to AWS Databricks. What is the best practice, and which path should I follow? I didn't find anything in the documentation. Thanks.

Latest Reply
thiagoawstest
Contributor
  • 2 kudos

Hello, as I already have a working Databricks environment on Azure, would the best way be to use the databricks-migrate tool?

2 More Replies
orangepepino
by New Contributor II
  • 8322 Views
  • 2 replies
  • 1 kudos

SFTP connection using private key on Azure Databricks

I need to connect to a server to retrieve some files using Spark and a private SSH key. However, to manage the private key safely I need to store it as a secret in Azure Key Vault, which means I don't have the key as a file to pass down in the keyFil...

Latest Reply
Kaniz_Fatma
Community Manager
  • 1 kudos

Hi @orangepepino, Instead of specifying the keyFilePath, you can pass the private key as a PEM string directly. This approach avoids the need for a physical key file. Since you’re already using Azure Key Vault, consider storing the private key as a s...

1 More Reply
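
To illustrate the "key as a string instead of a file" idea, here is a minimal sketch that reads the PEM from a Databricks secret and opens the SFTP session with paramiko on the driver, rather than the Spark SFTP connector discussed in the thread; the scope, key, host, user, and an RSA-format key are all assumptions:

import io
import paramiko

# Assumption: the PEM private key is stored in a secret scope backed by Azure Key Vault
pem_str = dbutils.secrets.get(scope="my-keyvault-scope", key="sftp-private-key")
pkey = paramiko.RSAKey.from_private_key(io.StringIO(pem_str))

client = paramiko.SSHClient()
client.set_missing_host_key_policy(paramiko.AutoAddPolicy())
client.connect(hostname="sftp.example.com", username="svc_user", pkey=pkey)  # hypothetical host/user

sftp = client.open_sftp()
sftp.get("/remote/path/data.csv", "/tmp/data.csv")  # download to the driver's local disk
sftp.close()
client.close()

# Move the file somewhere Spark can read it from all nodes, then load it
dbutils.fs.cp("file:/tmp/data.csv", "dbfs:/tmp/data.csv")
df = spark.read.csv("dbfs:/tmp/data.csv", header=True)
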
deng_dev
by New Contributor III
  • 649 Views
  • 3 replies
  • 0 kudos

Autoloader ignore one folder in path

Hi everyone! I am trying to set up Autoloader to read JSON files with a specific name from all subfolders under the path except one. Could someone advise how this can be achieved? For example, I need to read from .../*/specific_name.json, but ignore test f...

Latest Reply
standup1
New Contributor III
  • 0 kudos

I think you can use REGEXP to achieve this. This might not be the best way, but it should get the job done. It's all about filtering that file in the df from getting loaded. Try something like this: df.select("*", "_metadata").select("*", "_metadata.file...

2 More Replies
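
A minimal sketch of the filtering idea from the reply above, applied to an Auto Loader stream via the _metadata.file_path column; the base path and the folder to exclude are placeholders, and _metadata assumes a runtime where the column is available:

from pyspark.sql.functions import col

df = (
    spark.readStream.format("cloudFiles")
    .option("cloudFiles.format", "json")
    .load("abfss://container@acct.dfs.core.windows.net/base/*/specific_name.json")  # placeholder base path
    .withColumn("source_file", col("_metadata.file_path"))
    .filter(~col("source_file").contains("/excluded_folder/"))  # hypothetical folder to skip
)
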
Devsql
by New Contributor III
  • 876 Views
  • 3 replies
  • 2 kudos

How to find whether a given Parquet file got imported into the Bronze Layer?

Hi Team, recently we created a new Databricks project/solution (based on the Medallion architecture) with Bronze-Silver-Gold layer tables. We have created a Delta Live Tables pipeline for the Bronze layer implementation. Source files are Parqu...

Data Engineering
Azure Databricks
Bronze Job
Delta Live Table
Delta Live Table Pipeline
Latest Reply
raphaelblg
Honored Contributor
  • 2 kudos

Hello @Devsql, It appears that you are creating DLT bronze tables using a standard spark.read operation. This may explain why the DLT table doesn't include "new files" during a REFRESH operation. For incremental ingestion of bronze layer data into y...

2 More Replies
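
Following the reply above, a minimal sketch of a bronze DLT table that ingests incrementally with Auto Loader instead of spark.read and keeps the source file path per row, so you can check later whether a given Parquet file was imported; the table name and source path are assumptions:

import dlt
from pyspark.sql.functions import col

@dlt.table(name="bronze_orders")  # hypothetical table name
def bronze_orders():
    return (
        spark.readStream.format("cloudFiles")
        .option("cloudFiles.format", "parquet")
        .load("abfss://landing@acct.dfs.core.windows.net/orders/")  # placeholder source path
        .withColumn("source_file", col("_metadata.file_path"))     # record which file each row came from
    )

A simple SELECT DISTINCT source_file on the bronze table then shows which Parquet files have been picked up.
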
shreya_20202
by New Contributor II
  • 904 Views
  • 1 reply
  • 1 kudos

Copy file structure including files from one storage to another incrementally using PySpark

I have a storage account dexflex and two containers, source and destination. The source container has directories and files as below: results search 03 Module19111.json Module19126.json 04 Module11291...

Latest Reply
Kaniz_Fatma
Community Manager
  • 1 kudos

Hi @shreya_20202, It looks like you’re trying to incrementally copy data from the source container to the destination container in Azure Databricks. To achieve this, you’ll need to compare the files in the source and destination directories and co...

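
A minimal sketch of the compare-and-copy approach described in the reply, using the notebook-provided dbutils.fs; the container URLs follow the names from the question but should be treated as placeholders:

src_root = "abfss://source@dexflex.dfs.core.windows.net/results/"       # placeholder
dst_root = "abfss://destination@dexflex.dfs.core.windows.net/results/"  # placeholder

def list_files(root):
    # Recursively collect file paths under root
    out = []
    for info in dbutils.fs.ls(root):
        if info.isDir():
            out.extend(list_files(info.path))
        else:
            out.append(info.path)
    return out

# Relative paths already present in the destination
existing = {p.replace(dst_root, "") for p in list_files(dst_root)}

# Copy only the files that are missing, preserving the folder structure
for src_path in list_files(src_root):
    rel = src_path.replace(src_root, "")
    if rel not in existing:
        dbutils.fs.cp(src_path, dst_root + rel)
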
youssefmrini
by Honored Contributor III
  • 728 Views
  • 0 replies
  • 2 kudos

Delta Lake Liquid Clustering

Support for liquid clustering is now generally available with Databricks Runtime 15.2 and above. Getting started with Delta Lake liquid clustering: https://lnkd.in/eaCZyhbF #DeltaLake #Databricks

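
For reference, a minimal sketch of creating and optimizing a liquid clustered table, run through spark.sql from a notebook; the table and clustering columns are made up:

# CLUSTER BY (rather than PARTITIONED BY or ZORDER) turns on liquid clustering
spark.sql("""
    CREATE TABLE IF NOT EXISTS sales_events (
        event_id   BIGINT,
        event_date DATE,
        region     STRING
    )
    CLUSTER BY (event_date, region)
""")

# OPTIMIZE clusters newly written data according to the clustering keys
spark.sql("OPTIMIZE sales_events")
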
thecodecache
by New Contributor II
  • 673 Views
  • 3 replies
  • 0 kudos

Transpile a SQL Script into PySpark DataFrame API equivalent code

Input SQL Script (assume any dialect): SELECT b.se10, b.se3, b.se_aggrtr_indctr, b.key_swipe_ind FROM (SELECT se10, se3, se_aggrtr_indctr, ROW_NUMBER() OVER (PARTITION BY SE10 ...

Latest Reply
thecodecache
New Contributor II
  • 0 kudos

Hi @Kaniz_Fatma, thanks for your response. I'm looking for a utility or an automated way of translating any generic SQL into PySpark DataFrame code. So, the translate function should look like below: def translate(input_sql): # translate/convert it ...

2 More Replies
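
This does not emit DataFrame API code, but a partially automated route is to transpile the source dialect into Spark SQL with the open-source sqlglot package and run the result through spark.sql, after which you are back in the DataFrame API anyway; the dialect names below are assumptions:

import sqlglot

def translate(input_sql: str, source_dialect: str = "tsql") -> str:
    # Transpile from the source dialect into Spark SQL (not DataFrame API code)
    return sqlglot.transpile(input_sql, read=source_dialect, write="spark")[0]

spark_sql = translate("SELECT TOP 5 se10, se3 FROM swipes")  # hypothetical input
df = spark.sql(spark_sql)  # df can now be manipulated with the DataFrame API
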
pjv
by New Contributor III
  • 869 Views
  • 2 replies
  • 0 kudos

Asynchronous API calls from Databricks Workflow job

Hi all, I have many API calls to run in a Python Databricks notebook, which I then run regularly as a Databricks Workflow job. When I test the following code on an all-purpose cluster interactively, i.e. not via a job, it runs perfectly fine. However, when I ...

Latest Reply
pjv
New Contributor III
  • 0 kudos

I actually got it to work, though I do see that if I run two jobs of the same code in parallel, the async execution time slows down. Does the number of workers of the cluster on which the parallel jobs are run affect the execution time of async calls of...

1 More Reply
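
For context, a minimal sketch of the usual asyncio/aiohttp pattern for concurrent API calls from a notebook (the URLs are placeholders). These calls run entirely on the driver, so the number of workers in the cluster does not change their execution time:

import asyncio
import aiohttp

urls = ["https://api.example.com/item/1", "https://api.example.com/item/2"]  # placeholders

async def fetch(session, url):
    async with session.get(url) as resp:
        resp.raise_for_status()
        return await resp.json()

async def fetch_all(urls):
    async with aiohttp.ClientSession() as session:
        return await asyncio.gather(*(fetch(session, u) for u in urls))

# In a notebook cell that already has a running event loop, use `await fetch_all(urls)` instead
results = asyncio.run(fetch_all(urls))
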
Harsh-dataB
by New Contributor II
  • 541 Views
  • 2 replies
  • 1 kudos

Cluster termination using Python script returns exit code 1

I have used cluster termination logic for terminating a cluster. The issue is that the cluster is not terminating gracefully and returns an exit code of 1. The cluster is completing all the Spark jobs, but it stays in a long-running state, hence I create...

Latest Reply
Kaniz_Fatma
Community Manager
  • 1 kudos

Hi @Harsh-dataB, First, review your cluster termination logic. Make sure it accounts for all necessary cleanup tasks and allows sufficient time for Spark jobs to complete. If you’re using custom scripts or logic, ensure that it gracefully handles a...

1 More Reply
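
A minimal sketch of terminating a cluster explicitly through the Clusters REST API once the work is finished; host, token, and cluster ID handling are assumptions. Note that if the script runs on the cluster it is terminating, the process is killed mid-run, which may be where the exit code 1 comes from, so it is safer to call this from outside the cluster or to rely on auto-termination:

import os
import requests

host = os.environ["DATABRICKS_HOST"]    # assumption: workspace URL
token = os.environ["DATABRICKS_TOKEN"]  # assumption: personal access token
cluster_id = os.environ["CLUSTER_ID"]   # assumption: id of the cluster to terminate

# POST /api/2.0/clusters/delete terminates the cluster (it is not permanently deleted)
resp = requests.post(
    f"{host}/api/2.0/clusters/delete",
    headers={"Authorization": f"Bearer {token}"},
    json={"cluster_id": cluster_id},
)
resp.raise_for_status()
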