Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

semsim
by Contributor
  • 5468 Views
  • 9 replies
  • 1 kudos

Resolved! Init Script Failing

I am getting an error when I try to run the cluster-scoped init script. The script itself is as follows:
#!/bin/bash
sudo apt update && sudo apt upgrade -y
sudo apt install libreoffice-common libreoffice-java-common libreoffice-writer openjdk-8-jre-head...

Latest Reply
zmsoft
New Contributor III
  • 1 kudos

Hi @semsim, @jacovangelder, I added the code you mentioned at the beginning of the script, but I still got errors.
#!/bin/bash
sudo rm -r /var/lib/apt/lists/*
sudo apt clean && sudo apt update --fix-missing -y
if ! [[ "18.04 20.04 22.04 23.04 24.04...

8 More Replies
pragarwal
by New Contributor II
  • 3474 Views
  • 6 replies
  • 1 kudos

Adding Member to group using account databricks rest api

Hi All, I want to add a member to a group at the Databricks account level using the REST API (https://docs.databricks.com/api/azure/account/accountgroups/patch) as mentioned in this link. I am able to authenticate but not able to add a member while using belo...
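The account-level group endpoints follow SCIM 2.0, so a minimal sketch of the PatchOp request body for adding a member might look like the following. The user ID is a hypothetical placeholder, and the exact endpoint path and field names should be verified against the linked docs:

```python
import json

# Sketch of a SCIM 2.0 PatchOp body for adding a member to an account-level
# group. The user ID is a hypothetical placeholder; check the request shape
# against the Databricks accountgroups/patch documentation before relying on it.
def build_add_member_body(user_id: str) -> dict:
    return {
        "schemas": ["urn:ietf:params:scim:api:messages:2.0:PatchOp"],
        "Operations": [
            {
                "op": "add",
                "path": "members",
                "value": [{"value": user_id}],  # ID of the user to add
            }
        ],
    }

body = build_add_member_body("12345")
print(json.dumps(body, indent=2))
```

The resulting JSON would be sent as the body of the PATCH request to the account groups endpoint, authenticated the same way as the other endpoints that already work.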

Latest Reply
Nikos
New Contributor II
  • 1 kudos

Does the above work? I still can't quite figure it out. Any help would be much appreciated. I know authentication is not an issue, as I can use a lot of the other endpoints. I just can't figure out the correct body syntax to add a member to a group. url...

5 More Replies
sashikanth
by New Contributor II
  • 623 Views
  • 2 replies
  • 0 kudos

Streaming or Batch Processing

How do you decide whether to go for streaming or batch processing when the upstream is a Delta table? Please share suggestions to optimize the load timings.

Latest Reply
gchandra
Databricks Employee
  • 0 kudos

Structured Streaming is one of the options: spark.readStream.format("delta")
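Building on that reply, a common middle ground when strict low latency is not required is a streaming read with an availableNow trigger, which drains whatever is new and then stops. This is a sketch only, assuming a Databricks/Spark session and hypothetical paths:

```python
# Sketch: incremental processing of an upstream Delta table with Structured
# Streaming. All paths and table names are hypothetical placeholders; `spark`
# is assumed to be an existing SparkSession on a Databricks cluster.
def run_incremental(spark, source_path, target_table, checkpoint_path):
    return (
        spark.readStream.format("delta")
        .load(source_path)                              # upstream Delta table
        .writeStream
        .option("checkpointLocation", checkpoint_path)  # remembers progress between runs
        .trigger(availableNow=True)                     # process new data, then stop
        .toTable(target_table)
    )
```

Run on a schedule, this gives batch-like cost with streaming's checkpoint bookkeeping; drop the trigger for a continuously running query when latency matters more.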

1 More Reply
priyanananthram
by New Contributor II
  • 7824 Views
  • 4 replies
  • 1 kudos

Delta live tables for large number of tables

Hi there, I am hoping for some guidance. I have some 850 tables that I need to ingest using a DLT pipeline. When I do this, my event log shows that the driver node becomes unresponsive, likely due to GC. Can DLT be used to ingest a large number of tables? I...

Latest Reply
Sidhant07
Databricks Employee
  • 1 kudos

Delta Live Tables (DLT) can indeed be used to ingest a large number of tables. However, if you're experiencing issues with the driver node becoming unresponsive due to garbage collection (GC), it might be a sign that the resources allocated to the dr...

3 More Replies
badari_narayan
by New Contributor II
  • 1221 Views
  • 6 replies
  • 1 kudos

How to create SQL functions using PySpark on a local machine

I am trying to create a Spark SQL function in a particular schema, i.e. spark.sql("CREATE OR REPLACE FUNCTION <spark_catalog>.<schema_name>.<function_name()> RETURNS STRING RETURN <value>"). This works perfectly fine on Databricks using notebooks. But I n...
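For illustration, the DDL from the post can be assembled as a plain string before handing it to spark.sql. The catalog, schema, and function names below are hypothetical placeholders:

```python
# Sketch: build the CREATE FUNCTION statement used in the post.
# All identifiers are hypothetical; executing the statement still requires a
# SparkSession whose catalog supports SQL UDFs.
def make_function_ddl(catalog: str, schema: str, name: str, value: str) -> str:
    return (
        f"CREATE OR REPLACE FUNCTION {catalog}.{schema}.{name}() "
        f"RETURNS STRING RETURN '{value}'"
    )

ddl = make_function_ddl("spark_catalog", "my_schema", "greeting", "hello")
# The statement would then be run with spark.sql(ddl).
```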

Latest Reply
filipniziol
Contributor III
  • 1 kudos

Hi @badari_narayan, in general you may run a PySpark project locally, but with limitations:
  • Create a virtual environment
  • Install PySpark in your virtual environment (the same version you have on your cluster)
Since Spark version 2.x you do not even need to ...

5 More Replies
meret
by New Contributor II
  • 999 Views
  • 2 replies
  • 0 kudos

Trouble Accessing Trust Store for Oracle JDBC Connection on Shared Compute Cluster

Hi, I am trying to read data from an Oracle DB using the Oracle JDBC driver:
df = (spark.read.format("jdbc")
    .option("url", "jdbc:oracle:thin:@(DESCRIPTION=(ADDRESS=(PROTOCOL=TCPS)(PORT=xxx)(HOST=xxx))(CONNECT_DATA=(SID=xxx)))")
    .option("dbTable", "schema...

Latest Reply
NandiniN
Databricks Employee
  • 0 kudos

The trust store file needs to be accessible from all nodes in the shared compute cluster. You can achieve this by storing the trust store file in a location that is accessible to all nodes, such as a mounted volume or a distributed file system. Here'...
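As an illustration of that reply, the reader options might look like the sketch below. All paths and passwords are hypothetical placeholders; the Oracle thin driver accepts the standard JDK javax.net.ssl.* names as connection properties, but this should be verified against the driver version in use:

```python
# Sketch: JDBC reader options pointing the Oracle driver at a trust store kept
# in a location visible to every node (e.g. a Unity Catalog volume).
# All values below are hypothetical placeholders.
def oracle_jdbc_options(url, table, truststore_path, truststore_password):
    return {
        "url": url,
        "dbtable": table,
        "driver": "oracle.jdbc.OracleDriver",
        # Standard JDK SSL properties, passed through as connection properties:
        "javax.net.ssl.trustStore": truststore_path,
        "javax.net.ssl.trustStoreType": "JKS",
        "javax.net.ssl.trustStorePassword": truststore_password,
    }

opts = oracle_jdbc_options(
    "jdbc:oracle:thin:@(DESCRIPTION=...)",        # truncated URL from the post
    "schema.table",
    "/Volumes/main/certs/files/truststore.jks",   # node-visible location
    "changeit",
)
# The options would be applied with spark.read.format("jdbc").options(**opts).load()
```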

1 More Reply
sms101
by New Contributor
  • 874 Views
  • 1 reply
  • 0 kudos

Table lineage visibility in Databricks

I’ve observed differences in table lineage visibility in Databricks based on how data is referenced, and I would like to confirm whether this is the expected behavior. 1. When referencing a Delta table as the source in a query (e.g., df = spark.table("cata...

Latest Reply
Brahmareddy
Honored Contributor
  • 0 kudos

Hi @sms101, how are you doing today? As per my understanding, it is correct that lineage tracking in Databricks works primarily at the table level, meaning when you reference a Delta table directly, the lineage is properly captured. However, when you u...

Bilel
by New Contributor
  • 989 Views
  • 1 reply
  • 1 kudos

Python library not installed when compute is resized

Hi, I have a Python notebook workflow that uses a job cluster. The cluster lost at least one node (due to spot instance termination) and did an upsize. After that I got an error in my job, "Module not found", but the Python module was being used before ...

Latest Reply
Brahmareddy
Honored Contributor
  • 1 kudos

Hi @Bilel, how are you doing today? As per my understanding, consider installing the library at the cluster level to ensure it's automatically applied across all nodes when a new one is added. You could also try using init scripts to guarantee the requ...

fperry
by New Contributor II
  • 524 Views
  • 1 reply
  • 0 kudos

Question about stateful processing

I'm experiencing an issue that I don't understand. I am using Python's arbitrary stateful processing with structured streaming to calculate metrics for each item/ID. A timeout is set, after which I clear the state for that item/ID and display each ID...

Latest Reply
Brahmareddy
Honored Contributor
  • 0 kudos

Hi @fperry, how are you doing today? As per my understanding, consider checking for any differences in how the stateful streaming function is writing and persisting data. It's possible that while the state is cleared after the timeout, some state might...

gabrieleladd
by New Contributor II
  • 2313 Views
  • 3 replies
  • 1 kudos

Clearing data stored by pipelines

Hi everyone! I'm new to Databricks and moving my first steps with Delta Live Tables, so please forgive my inexperience. I'm building my first DLT pipeline and there's something that I can't really grasp: how to clear all the objects generated or upda...

Data Engineering
Data Pipelines
Delta Live Tables
Latest Reply
ChKing
New Contributor II
  • 1 kudos

To clear all objects generated or updated by the DLT pipeline, you can drop the tables manually using the DROP command as you've mentioned. However, to get a completely clean slate, including metadata like the tracking of already processed files in t...

2 More Replies
aniruth1000
by New Contributor II
  • 1362 Views
  • 3 replies
  • 2 kudos

Resolved! Delta Live Tables - CDC - Batching - Delta Tables

Hey folks, I'm trying to implement CDC (apply changes) from one Delta table to another. The source is a Delta table named table_latest and the target is another Delta table named table_old. Both are Delta tables in Databricks. I'm trying to cascade the incre...

Latest Reply
filipniziol
Contributor III
  • 2 kudos

Hi @aniruth1000, when using Delta Live Table pipelines, only the source table can be a plain Delta table. The target table must be fully managed by the DLT pipeline, including its creation and lifecycle. Let's say that you modified the code as suggested by...

2 More Replies
vishwanath_1
by New Contributor III
  • 2962 Views
  • 4 replies
  • 1 kudos

Reading a 130 GB CSV file with multiLine=true takes 4 hours

Reading the 130 GB file without multiLine=true takes 6 minutes, but my file has multi-line data. How can I speed up the reading time here? I am using the below command:
InputDF = spark.read.option("delimiter","^").option("header",false).option("encoding","UTF-8"...

Latest Reply
Lakshay
Databricks Employee
  • 1 kudos

Hi @vishwanath_1, can you try setting the below config to see if it resolves the issue? set spark.databricks.sql.csv.edgeParserSplittable=true;
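Putting the reply together with the original read, a sketch (assuming a Databricks session available as `spark`; the input path is a hypothetical placeholder, and the edgeParserSplittable flag is a Databricks-specific setting from the reply above):

```python
# Sketch: set the config suggested in the reply, then repeat the multi-line
# CSV read from the question. The input path is a hypothetical placeholder.
def read_multiline_csv(spark, path):
    # Config from the reply; intended to let the CSV parser split work
    # across tasks even when multiLine is enabled.
    spark.conf.set("spark.databricks.sql.csv.edgeParserSplittable", "true")
    return (
        spark.read.option("delimiter", "^")
        .option("header", False)
        .option("encoding", "UTF-8")
        .option("multiLine", True)   # needed because records span lines
        .csv(path)
    )
```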

3 More Replies
vishu4rall
by New Contributor II
  • 606 Views
  • 4 replies
  • 0 kudos

Copy files from Azure file share to S3 bucket

Kindly help us with code to upload a text/CSV file from an Azure file share to an S3 bucket.

Latest Reply
gchandra
Databricks Employee
  • 0 kudos

Did you try using azcopy?  https://learn.microsoft.com/en-us/azure/storage/common/storage-use-azcopy-v10?tabs=dnf

3 More Replies
LeoGaller
by New Contributor II
  • 4069 Views
  • 3 replies
  • 1 kudos

What are the options for "spark_conf.spark.databricks.cluster.profile"?

Hey guys, I'm trying to find out what options we can pass to spark_conf.spark.databricks.cluster.profile. I know from looking around that some of the available values are singleNode and serverless, but are there others? Where is the documentation for it?...

lprevost
by Contributor
  • 1081 Views
  • 5 replies
  • 0 kudos

Large/complex Incremental Autoloader Job -- Seeking Experience on approach

I'm experimenting with several approaches to implement an incremental Auto Loader query, either in DLT or in a pipeline job. The complexities:
- Moving approximately 30B records from a nasty set of nested folders on S3 in several thousand CSV files. ...

Latest Reply
lprevost
Contributor
  • 0 kudos

Crickets....

4 More Replies
