Data Engineering

Forum Posts

by oussamak • New Contributor II

05-27-2022 6:54:12 AM

4620 Views
1 reply
2 kudos

How to install JAR libraries from ADLS? I'm having an error

I mounted the ADLS to my Azure Databricks resource and I keep on getting this error when I try to install a JAR from a container: Library installation attempted on the driver node of cluster 0331-121709-buk0nvsq and failed. Please refer to the followi...

by chandan_a_v • Valued Contributor

05-05-2022 11:23:48 PM

20603 Views
6 replies
6 kudos

Resolved! Spark Driver Out of Memory Issue

Hi, I am executing a simple job in Databricks for which I am getting the error below. I increased the driver size but still faced the same issue. Spark config: from pyspark.sql import SparkSession spark_session = SparkSession.builder.appName("Demand Forecasting...

Latest Reply

chandan_a_v
Valued Contributor

05-08-2022 12:05:48 PM

6 kudos

I am getting the above issue while writing a Spark DF as a parquet file to AWS S3. Not doing any broadcast join actually.

5 More Replies

by William_Scardua • Valued Contributor

04-25-2022 12:30:05 PM

3684 Views
1 reply
2 kudos

Resolved! Best way to encrypt PII data

Hi guys, I have around 600 GB per load. In your opinion, what is the best way to encrypt PII data in terms of performance (library, cluster type, etc.)? Thank you, William

Latest Reply

Prabakar
Databricks Employee

06-01-2022 11:08:38 PM

2 kudos

Hello @William Scardua, please check whether this blog helps you: https://databricks.com/blog/2020/11/20/enforcing-column-level-encryption-and-avoiding-data-duplication-with-pii.html

by rahul3 • Databricks Partner

04-26-2022 3:33:36 AM

3923 Views
1 reply
1 kudos

Facing mount/unmount issue while running the same job in parallel with Scala

Using the above configuration in the cluster, when I run a Databricks job in parallel with multiple requests at the same time, I get a mount/unmount issue. For example: when I make three requests to the Databricks job, it runs 3 jobs in parallel but somet...

Latest Reply

Prabakar
Databricks Employee

06-01-2022 11:04:50 PM

1 kudos

hi @rahul upadhyay are you using the same mount path /mnt/rahul in all the 3 jobs? Could you please add the full error message?
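
If the parallel runs do share one mount path, one way to reduce collisions is to mount only when the path is not already mounted, and to avoid unmounting at the end of each run. The following is a minimal sketch, not the poster's code: it assumes `dbutils` is the object Databricks provides in notebooks, and `ensure_mounted`, `source`, and `extra_configs` are hypothetical names chosen for illustration. The check-then-mount sequence is not atomic, so two runs starting at the same instant can still race.

```python
def ensure_mounted(dbutils, source, mount_point, extra_configs=None):
    """Mount `source` at `mount_point` only if nothing is mounted there yet.

    Returns True if a mount was performed, False if the path already existed.
    """
    # dbutils.fs.mounts() lists current mounts; each entry exposes .mountPoint
    existing = {m.mountPoint for m in dbutils.fs.mounts()}
    if mount_point in existing:
        return False
    dbutils.fs.mount(
        source=source,
        mount_point=mount_point,
        extra_configs=extra_configs or {},
    )
    return True
```

Called at the start of each job with the same `/mnt/...` path, this leaves an existing mount in place instead of remounting it while another run is reading from it.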

by cfregly • Contributor

03-08-2015 4:13:07 PM

17449 Views
8 replies
3 kudos

How do I handle a task not serializable exception?

Latest Reply

RajatS
New Contributor II

06-01-2022 9:03:43 PM

3 kudos

Hi @Nick Studenski, could you share how you solved your problem?

7 More Replies

by Devarsh • Contributor

05-23-2022 10:47:58 PM

11968 Views
3 replies
7 kudos

Resolved! Getting the error 'No such file or directory', when trying to access the json file

I am trying to write to my Google Sheet through Databricks, but when it comes to reading the JSON file containing the credentials, I get an error that no such file or directory exists. import gspread gc = gspread.service_account(filename='...

Latest Reply

Noopur_Nigam
Databricks Employee

06-01-2022 8:57:45 PM

7 kudos

Hi @Devarsh Shah, the issue is not with the JSON file but with the location you are specifying when reading it. As suggested by @Werner Stinckens, please start using the Spark API to read the JSON file, as below: spark.read.format("json").load("testjson") Please check ...

2 More Replies
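
The location problem named in the reply can be illustrated with a small path helper. Libraries such as gspread open files through Python's local file APIs, which on Databricks see DBFS under the `/dbfs` mount rather than through `dbfs:/` URIs. This is a minimal sketch that assumes the credentials file was uploaded to DBFS; the helper name `to_local_path` and the example path are hypothetical.

```python
def to_local_path(path):
    """Translate a dbfs:/ URI into the path that local file APIs can open."""
    prefix = "dbfs:/"
    if path.startswith(prefix):
        # dbfs:/FileStore/x.json -> /dbfs/FileStore/x.json
        return "/dbfs/" + path[len(prefix):].lstrip("/")
    # Anything else (already local, or a different scheme) is returned as-is
    return path


# Hypothetical usage with the call from the question:
# gc = gspread.service_account(filename=to_local_path("dbfs:/FileStore/creds.json"))
```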

by BhagS • New Contributor II

05-17-2022 6:36:03 AM

7642 Views
2 replies
5 kudos

Resolved! Write Empty Delta file in Datalake

Hi all, currently I am trying to write an empty Delta file in the data lake. To do this, I am doing the following: reading a parquet file from my landing zone (this file consists only of the schema of SQL tables) df=spark.read.format('parquet').load(landingZ...

Latest Reply

Noopur_Nigam
Databricks Employee

06-01-2022 8:20:42 PM

5 kudos

Hi @bhagya s, since your source file is empty, there is no data file inside the centralizedZonePath directory, i.e. no .parquet file is created in the target location. However, _delta_log is the transaction log that holds the metadata of the delta for...

1 More Replies
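
To see the situation the reply describes (a transaction log but no data files), the target directory can be inspected directly. This is a minimal sketch using only the standard library; it assumes the table directory is reachable through a local path (for example under `/dbfs/...`), and `describe_delta_dir` is a hypothetical helper name.

```python
from pathlib import Path


def describe_delta_dir(table_path):
    """Report whether a directory has a Delta transaction log and data files."""
    root = Path(table_path)
    log_dir = root / "_delta_log"
    has_log = log_dir.is_dir()
    # Commit files in the transaction log are JSON files named by version
    commits = sorted(p.name for p in log_dir.glob("*.json")) if has_log else []
    # Data files are parquet files outside _delta_log (partition folders included)
    data_files = sorted(
        str(p.relative_to(root))
        for p in root.rglob("*.parquet")
        if "_delta_log" not in p.relative_to(root).parts
    )
    return {"has_delta_log": has_log, "commit_files": commits, "data_files": data_files}
```

For an empty write, the expectation under these assumptions is `has_delta_log` set to True with at least one commit file and an empty `data_files` list.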

by Krishscientist • New Contributor III

05-13-2022 3:00:52 PM

3247 Views
2 replies
0 kudos

How to merge delta data..

Data was converted from Parquet to Delta, and the Delta files were written into different folders based on SRC_SYS_ID. Can anyone help me merge the Delta data from multiple folders? Regards.

Latest Reply

Noopur_Nigam
Databricks Employee

06-01-2022 7:10:02 PM

0 kudos

Hi @Krishna Kommineni, is the table partitioned on the SRC_SYS_ID column?

1 More Replies
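
If "merge" here means stacking the rows of several independent Delta folders into one DataFrame (rather than a Delta MERGE upsert), a sketch could look like the following. It assumes each folder is its own Delta table with compatible column names; `union_delta_folders` and the folder paths are hypothetical, and `spark` is the session object available in a notebook.

```python
from functools import reduce


def union_delta_folders(spark, paths):
    """Read each Delta folder and stack the rows into one DataFrame."""
    frames = [spark.read.format("delta").load(p) for p in paths]
    if not frames:
        raise ValueError("at least one path is required")
    # unionByName matches columns by name instead of by position
    return reduce(lambda left, right: left.unionByName(right), frames)


# Hypothetical usage:
# combined = union_delta_folders(spark, ["/mnt/delta/SRC_A", "/mnt/delta/SRC_B"])
```

If the folders are instead partitions of a single table, as the reply asks, loading the table root once would be the simpler route.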

by scholar • New Contributor II

05-07-2022 2:17:16 PM

4091 Views
3 replies
2 kudos

How to read data from kafka topic using spark streaming

I have installed kafka-2.10-0.10.2 and am using a cluster with this configuration: Runtime: 6.4 Extended Support (Scala 2.11, Spark 2.4.5). After this I am able to get messages on the producer and consumer. But when I try to read data from spark.readStream and tr...

Latest Reply

Hubert-Dudek
Databricks MVP

05-08-2022 2:25:47 AM

2 kudos

You can just use display(orders_df3) for debugging purposes

2 More Replies
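
For reference, a Structured Streaming read from a Kafka topic generally follows the shape below. This is a minimal sketch, not the poster's code: `read_kafka_topic` is a hypothetical helper, the broker address and topic name are placeholders, and `spark` is the notebook session.

```python
def read_kafka_topic(spark, bootstrap_servers, topic):
    """Build a streaming DataFrame over one Kafka topic with string key/value."""
    raw = (
        spark.readStream.format("kafka")
        .option("kafka.bootstrap.servers", bootstrap_servers)
        .option("subscribe", topic)
        .option("startingOffsets", "earliest")
        .load()
    )
    # Kafka delivers key and value as binary; cast them to strings to inspect
    return raw.selectExpr(
        "CAST(key AS STRING) AS key",
        "CAST(value AS STRING) AS value",
    )


# Hypothetical usage, combined with the display() tip from the reply:
# display(read_kafka_topic(spark, "<broker-host>:9092", "<topic>"))
```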

by palzor • New Contributor III

04-20-2022 11:24:05 PM

13526 Views
4 replies
4 kudos

Getting error when using CDC in delta live table

Hi, I am trying to use CDC for a Delta Live Table, and when I run the pipeline a second time I get an error: org.apache.spark.sql.streaming.StreamingQueryException: Query tbl_cdc [id = ***-xx-xx-bf7e-6cb8b0deb690, runId = ***-xxxx-4031-ba74-b4b22be05...

Latest Reply

jose_gonzalez
Databricks Employee

06-01-2022 4:55:13 PM

4 kudos

Hi @Palzor Lama, a streaming live table can only process append queries; that is, queries where new rows are inserted into the source table. Processing updates from source tables, for example, merges and deletes, is not supported. To process updates,...

3 More Replies

by JeromeB974 • New Contributor II

04-07-2022 11:29:31 PM

9274 Views
5 replies
6 kudos

can we use spark-xml with delta live tables ?

Hi, is there a way to use spark-xml with Delta Live Tables (Azure Databricks)? I've tried something like this without any success so far: CREATE LIVE TABLE df17 USING com.databricks.spark.xml AS SELECT * FROM cloud_files("/mnt/dev/bronze/xml/s432799...

Latest Reply

Zachary_Higgins
Contributor

05-16-2022 10:48:18 AM

6 kudos

This is a tough one since the only magic command available is %pip, but spark-xml is a maven package. The only way I found to do this was to install the spark-xml jar from the maven repo using the databricks-cli. You can reference the cluster ID usin...

4 More Replies
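
The reply installs the Maven package through the databricks-cli. As an illustration of what such an install request carries, the sketch below builds the JSON body for the Databricks Libraries REST endpoint (`POST /api/2.0/libraries/install`) instead of calling the CLI. The helper name `maven_install_payload`, the cluster ID placeholder, and the spark-xml coordinates in the usage comment are assumptions for illustration; pick the coordinates that match the cluster's Scala version.

```python
import json


def maven_install_payload(cluster_id, coordinates):
    """Build the request body for POST /api/2.0/libraries/install."""
    body = {
        "cluster_id": cluster_id,
        # One entry per library; Maven packages are identified by coordinates
        "libraries": [{"maven": {"coordinates": coordinates}}],
    }
    return json.dumps(body)


# Hypothetical usage:
# payload = maven_install_payload("<cluster-id>", "com.databricks:spark-xml_2.12:0.14.0")
```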

by Taha_Hussain • Databricks Employee

06-01-2022 12:47:04 PM

1379 Views
0 replies
1 kudos

Databricks Office Hours

Register for Office Hours to participate in a live Q&A session with Databricks experts! Our next events are scheduled for June 8th & June 22 from 8:00 am - 9:00 am PT. This is your opportunity to connect directly with our experts...

by thaipham • New Contributor III

05-31-2022 5:00:39 PM

3427 Views
3 replies
4 kudos

Resolved! How would I export the latest revision of a notebook?

I've been trying to export some notebooks from my Databricks workspace to my laptop. I can't use Git Repos because the company restricted access to external services from the control plane. However, it looks to me like I always exported the previous re...

Latest Reply

-werners-
Esteemed Contributor III

06-01-2022 5:50:08 AM

4 kudos

Too bad you are not allowed to use Repos; it can be a life saver. Can you mark your answer as the best answer so the question is marked as solved?

2 More Replies

by Ruby8376 • Valued Contributor

05-24-2022 10:37:23 AM

2730 Views
2 replies
0 kudos

Primary/Foreign key Constraints on Delta tables?

Hi all! I am using Databricks in a data migration project. We need to transform the data before loading it into Salesforce. Can we define primary key/foreign key constraints on Databricks Delta tables?

Latest Reply

Anonymous
Not applicable

05-31-2022 11:59:14 PM

0 kudos

Hi @Ruby Rubi, following up: did you get a chance to check @Werner Stinckens' previous comments, or do you need any further help with this?

1 More Replies

by laurencewells • New Contributor III

01-11-2022 5:21:51 AM

5091 Views
3 replies
1 kudos

Resolved! Log4J Custom Filter Not Working

Hi all, hoping you can help. I am looking to set up a custom logging process that captures application ETL logs and streaming logs. I have set up multiple custom logging appenders using the guide here: https://kb.databricks.com/clusters/overwrite-log4...

Latest Reply

Anonymous
Not applicable

05-31-2022 9:44:43 AM

1 kudos

Hey there @Laurence Wells, hope you are doing great. Does @Kaniz Fatma's response answer your question? If yes, would you be happy to mark it as best so that other members can find the solution more quickly? Thanks!

2 More Replies
