Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

Akshay_Petkar
by Valued Contributor
  • 802 Views
  • 1 reply
  • 1 kudos

Resolved! How Auto Loader works – file level or row level?

Does Auto Loader work at the file level or the row level? If it works at the file level and does not process the same file again, then how can we make it pick up only the new rows when data is appended to that file?

Latest Reply
szymon_dybczak
Esteemed Contributor III
  • 1 kudos

Hi @Akshay_Petkar, Auto Loader works at the file level. By default, Auto Loader is configured with the following option: cloudFiles.allowOverwrites = false. This option causes files to be processed exactly once. But when you switch this option to true, t...
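For reference, the option described in the reply can be set on a stream like this; a minimal sketch with hypothetical paths, assuming a Databricks runtime where `spark` is defined:

```python
# Sketch only: an Auto Loader stream that reprocesses overwritten files.
# The paths below are hypothetical placeholders.
df = (
    spark.readStream.format("cloudFiles")
    .option("cloudFiles.format", "json")
    # Default is false: each file is picked up exactly once.
    # Setting it to true makes Auto Loader reprocess a file when it is
    # overwritten -- note the whole file is re-read; there is no row-level diff.
    .option("cloudFiles.allowOverwrites", "true")
    .option("cloudFiles.schemaLocation", "/tmp/_schemas/landing")  # hypothetical
    .load("/mnt/landing/")                                         # hypothetical
)
```

In practice, appending rows to an existing file therefore re-ingests the full file; the common workaround is to land each batch of new rows as a new file.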

sensanjoy
by Contributor II
  • 3220 Views
  • 8 replies
  • 1 kudos

Resolved! Accessing parameter defined in python notebook into sql notebook.

Hi All, I have one Python notebook (../../config/param_notebook) where all parameters are defined, like: dbutils.widgets.text("catalog", "catalog_de"); spark.conf.set("catalog.name", dbutils.widgets.get("catalog")); dbutils.widgets.text("schema", "emp"...

Latest Reply
Rupal_P
New Contributor II
  • 1 kudos

Hi all, I have a SQL notebook that contains the following statement: CREATE OR REPLACE MATERIALIZED VIEW ${catalog_name}.${schema_name}.emp_table AS SELECT ... I’ve configured the values for catalog_name and schema_name as pipeline parameters in my DLT p...
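As a related sketch (not from the thread): on recent runtimes, named parameter markers combined with the IDENTIFIER clause can replace ${} text substitution. The names below mirror the post, but verify the syntax against your DBR version:

```sql
-- Sketch: qualify the table name from parameters instead of ${} substitution.
-- :catalog_name and :schema_name would come from widgets or pipeline parameters.
CREATE OR REPLACE MATERIALIZED VIEW
  IDENTIFIER(:catalog_name || '.' || :schema_name || '.emp_table')
AS SELECT ...
```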

7 More Replies
amrim
by New Contributor III
  • 851 Views
  • 1 reply
  • 1 kudos

Resolved! Notebook dashboard export unavailable

Hello, Recent changes in the Databricks notebook dashboards have removed the option to download the dashboard as HTML. Previously it was possible to download it from the notebook dashboard view. Currently it's only possible to download the notebook its...

Latest Reply
Advika
Community Manager
  • 1 kudos

Hello @amrim! You're right to flag this, thank you for bringing it up. I’ll check internally for any upcoming changes regarding this feature or alternative ways to download the notebook dashboard in HTML format. I’ll get back to you once I have an up...

surajtr
by New Contributor
  • 858 Views
  • 1 reply
  • 0 kudos

Reading a large ZIP file containing NDJSON files in Databricks

Hi, We have a 5 GB ZIP file stored in ADLS. When uncompressed, it expands to approximately 115 GB and contains multiple NDJSON files, each around 200 MB in size. We need to read this data and write it to a Delta table in Databricks on a weekly basis. W...

Latest Reply
chetan-mali
Contributor
  • 0 kudos

Unzip the archive file: Apache Spark cannot directly read compressed ZIP archives, so the first step is to decompress the 5 GB file. Since the uncompressed size is substantial (115 GB), the process must be handled carefully to avoid overwhelming the dr...
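The decompression step described above can be sketched with Python's standard zipfile module, streaming each member in fixed-size chunks so memory use stays bounded regardless of member size (paths and names here are illustrative, not from the original reply):

```python
import os
import shutil
import zipfile

def extract_zip_streaming(zip_path: str, dest_dir: str, chunk_size: int = 1 << 20) -> list:
    """Extract every file in a ZIP archive by copying it in fixed-size chunks,
    keeping memory use near chunk_size instead of the member size."""
    os.makedirs(dest_dir, exist_ok=True)
    extracted = []
    with zipfile.ZipFile(zip_path) as zf:
        for info in zf.infolist():
            if info.is_dir():
                continue
            target = os.path.join(dest_dir, os.path.basename(info.filename))
            with zf.open(info) as src, open(target, "wb") as dst:
                shutil.copyfileobj(src, dst, chunk_size)  # chunked streaming copy
            extracted.append(target)
    return extracted
```

Each extracted NDJSON file can then be read with spark.read.json and appended to the Delta table.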

sachamourier
by Contributor
  • 1966 Views
  • 5 replies
  • 3 kudos

Resolved! Unable to use library GraphFrames

Hello, I am trying to install and use the GraphFrames library but keep receiving the following error: "AttributeError: 'SparkSession' object has no attribute '_sc'". I have tried to install the library on my all-purpose cluster (Access mode: Standard)....

Latest Reply
sachamourier
Contributor
  • 3 kudos

@szymon_dybczak Thanks for the responses. I indeed changed my all-purpose cluster access mode and it worked. I figured that was a nicer option than changing the runtime.

4 More Replies
jar
by Contributor
  • 2218 Views
  • 2 replies
  • 0 kudos

Resolved! Use of Python variable in SQL cell

If using spark.conf.set(<variable_name>, <variable_value>), or just referencing a widget value directly, in a Python cell and then referring to it in a SQL cell with ${variable_name}, one gets the warning: "SQL query contains a dollar sign parameter, $p...

Latest Reply
jar
Contributor
  • 0 kudos

Frustrating indeed. Thank you, @lingareddy_Alva 

1 More Replies
pavlosskev
by New Contributor III
  • 3448 Views
  • 1 reply
  • 0 kudos

Oracle JDBC Load Fails with Timestamp Partitioning (lowerBound/upperBound)

Hi everyone, I'm trying to read data from an Oracle database into Databricks using JDBC with timestamp-based partitioning. However, it seems that the partitioning doesn't work as expected when I specify lowerBound and upperBound using timestamp string...

Latest Reply
mani_22
Databricks Employee
  • 0 kudos

@pavlosskev Could you try adding the following option as well to your read? .option("sessionInitStatement", "ALTER SESSION SET NLS_TIMESTAMP_FORMAT = 'YYYY-MM-DD HH24:MI:SS'") df = ( spark.read.format("jdbc") .option("url", jdbcUrl) .opti...
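The truncated snippet might be written out in full roughly as follows; the connection values, table, and partition column are hypothetical placeholders, and this assumes the Oracle JDBC driver is attached to the cluster:

```python
# Sketch only: a partitioned JDBC read with the suggested session-init option.
jdbcUrl = "jdbc:oracle:thin:@//dbhost:1521/ORCLPDB1"   # hypothetical

df = (
    spark.read.format("jdbc")
    .option("url", jdbcUrl)
    .option("dbtable", "MY_SCHEMA.MY_TABLE")           # hypothetical
    .option("user", "scott")                           # hypothetical
    .option("password", "tiger")                       # hypothetical
    # Make Oracle parse the lowerBound/upperBound strings as timestamps:
    .option("sessionInitStatement",
            "ALTER SESSION SET NLS_TIMESTAMP_FORMAT = 'YYYY-MM-DD HH24:MI:SS'")
    .option("partitionColumn", "UPDATED_TS")           # hypothetical timestamp column
    .option("lowerBound", "2024-01-01 00:00:00")
    .option("upperBound", "2024-12-31 23:59:59")
    .option("numPartitions", 8)
    .load()
)
```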

Sainath368
by Contributor
  • 1139 Views
  • 1 reply
  • 1 kudos

Resolved! E-series vs F-series VMs

Hi all, I need to run weekly maintenance on approximately 7,000 tables in my Databricks environment, involving OPTIMIZE, VACUUM, and ANALYZE TABLE (for statistics calculation) on all tables. My question is: between the Ev4, Edv4, and Fsv2 VM series, wh...

Latest Reply
mani_22
Databricks Employee
  • 1 kudos

@Sainath368 OPTIMIZE and VACUUM are compute-intensive operations, so you can choose a compute-optimized instance like the F series for both drivers and workers, which has a higher CPU-to-memory ratio. If it's a UC managed table, I recommend enabling Pr...

Eyespoop
by New Contributor II
  • 30787 Views
  • 4 replies
  • 4 kudos

Resolved! PySpark: Writing Parquet Files to the Azure Blob Storage Container

Currently I am having some issues with writing the parquet file to the storage container. I do have the code running, but whenever the DataFrame writer puts the parquet in blob storage, instead of the parquet file type it is created as a f...

Latest Reply
amarv
New Contributor II
  • 4 kudos

This is my approach: from databricks.sdk.runtime import dbutils; from pyspark.sql import DataFrame; output_base_url = "abfss://..."; def write_single_parquet_file(df: DataFrame, filename: str): print(f"Writing '{filename}.parquet' to ABFS") ...
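A helper like this typically writes with coalesce(1) into a temporary folder and then moves the single part-*.parquet file to the final name, since Spark always writes a directory rather than a lone file. The move step can be sketched in plain Python (on ABFSS, dbutils.fs.mv would play this role; all names here are illustrative, not from the reply):

```python
import glob
import os
import shutil

def promote_single_part_file(tmp_dir: str, final_path: str) -> str:
    """After a coalesce(1) write, move the lone part-*.parquet file out of the
    temporary output folder to final_path, then remove the folder."""
    parts = glob.glob(os.path.join(tmp_dir, "part-*.parquet"))
    if len(parts) != 1:
        raise ValueError(f"expected exactly one part file, found {len(parts)}")
    shutil.move(parts[0], final_path)
    shutil.rmtree(tmp_dir)  # also drops _SUCCESS / _committed marker files
    return final_path
```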

3 More Replies
yhu126
by New Contributor
  • 943 Views
  • 1 reply
  • 0 kudos

How to create a SparkSession in jobs run-unit-tests

I'm converting my Python unit tests to run with databricks jobs run-unit-tests. Each test needs a SparkSession, but every pattern I try runs into problems. What I tried: 1. Create my own local Spark: spark = (SparkSession.builder.master("local[*]").appName("unit-test").getOr...

Latest Reply
szymon_dybczak
Esteemed Contributor III
  • 0 kudos

Hi @yhu126, Maybe the blog post below gives you some inspiration: Writing Unit Tests for PySpark in Databricks: Appr... - Databricks Community - 122398

nkrom456
by New Contributor III
  • 2437 Views
  • 7 replies
  • 1 kudos

Resolved! Unable to resolve column error while trying to query the view

I have a federated table from Snowflake in Databricks, say employee. When I executed printSchema I am able to see the schema as "employeeid": long, "employeename": string. I tried to create a view as: create view vw_emp with schema binding as select `"employeei...

Latest Reply
szymon_dybczak
Esteemed Contributor III
  • 1 kudos

Hi @nkrom456, Try something like this. If you are using backticks, Spark treats the column name exactly as you type it (in this case it treats the double quotes as part of the column name): create view vw_emp with schema binding as select `employeeid` from employee ...

6 More Replies
RyHubb
by New Contributor III
  • 7155 Views
  • 6 replies
  • 1 kudos

Resolved! Databricks asset bundles job and pipeline

Hello, I'm looking to create a job which is linked to a delta live table. Given the job code like this:
my_job_name:
  name: thejobname
  schedule:
    quartz_cron_expression: 56 30 12 * * ?
    timezone_id: UTC
    pause_stat...

Latest Reply
Laurens1
New Contributor II
  • 1 kudos

This ended a frustrating search! It would be great to add this to the documentation instead of "go to the portal and copy-paste the id"!

5 More Replies
noorbasha534
by Valued Contributor II
  • 661 Views
  • 1 reply
  • 2 kudos

Machine type for different operations in Azure Databricks

Dear all, do we have a general recommendation for the virtual machine type to be used for different operations in Azure Databricks? We are looking for the below: 1. VACUUM 2. OPTIMIZE 3. ANALYZE STATS 4. DESCRIBE TABLE HISTORY. I understood at a high lev...

Latest Reply
szymon_dybczak
Esteemed Contributor III
  • 2 kudos

Hi @noorbasha534, Here's a general recommendation from Databricks: they recommend running OPTIMIZE on compute-optimized VMs and VACUUM on general-purpose VMs. Comprehensive Guide to Optimize Data Workloads | Databricks. But as you said, VACUUM is co...

xhead
by New Contributor II
  • 29521 Views
  • 15 replies
  • 3 kudos

Does "databricks bundle deploy" clean up old files?

I'm looking at this page (Databricks Asset Bundles development work tasks) in the Databricks documentation. When repo assets are deployed to a Databricks workspace, it is not clear whether "databricks bundle deploy" will remove files from the target wo...

Data Engineering
bundle
cli
deploy
Latest Reply
ganapati
New Contributor III
  • 3 kudos

@JamesGraham this issue is related to the "databricks bundle deploy" command itself. When run inside a CI/CD pipeline, I am still seeing old configs in bundle.tf.json. Ideally it should be updated with the changes from the previous run, but I am still seeing er...

14 More Replies
Aidonis
by New Contributor III
  • 26011 Views
  • 4 replies
  • 4 kudos

Resolved! Load Data from Sharepoint Site to Delta table in Databricks

Hi, new to the community, so sorry if my post lacks detail. I am trying to create a connection between Databricks and a SharePoint site to read Excel files into a Delta table. I can see there is a FiveTran partner connection that we can use to get sharepo...

Latest Reply
gaurav_singh_14
New Contributor II
  • 4 kudos

@Ajay-Pandey can we connect using a user ID, without using a client ID and secrets?

3 More Replies