Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

confused_dev
by New Contributor II
  • 42603 Views
  • 7 replies
  • 5 kudos

Python mocking dbutils in unittests

I am trying to write some unit tests using pytest, but I am coming across the problem of how to mock my dbutils method when dbutils isn't being defined in my notebook. Is there a way to do this so that I can unit test individual functions that are uti...

Latest Reply
pavlosskev
New Contributor III
  • 5 kudos

Fermin_vicente's answer is pretty good already. Below is how you can do something similar with conftest.py: # conftest.py import pytest from unittest.mock import MagicMock from pyspark.sql import SparkSession @pytest.fixture(scope="session") def dbuti...
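For reference, a minimal, self-contained sketch of that conftest.py approach (the fixture names and the stubbed secrets call are illustrative, not the poster's exact code):

# conftest.py -- shared pytest fixtures (sketch)
import pytest
from unittest.mock import MagicMock
from pyspark.sql import SparkSession

@pytest.fixture(scope="session")
def spark():
    # Local SparkSession so functions under test can run outside Databricks
    return SparkSession.builder.master("local[1]").appName("tests").getOrCreate()

@pytest.fixture(scope="session")
def dbutils():
    # dbutils does not exist outside Databricks, so stand in a MagicMock
    mock = MagicMock()
    mock.secrets.get.return_value = "fake-secret"  # illustrative stub
    return mock

A test can then accept dbutils as a fixture argument, pass it into the function under test, and assert on the mock, e.g. dbutils.secrets.get.assert_called_once().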

6 More Replies
johnb1
by Contributor
  • 34007 Views
  • 16 replies
  • 15 kudos

Problems with pandas.read_parquet() and path

I am doing the "Data Engineering with Databricks V2" learning path. I cannot run "DE 4.2 - Providing Options for External Sources", as the first code cell does not run successfully: %run ../Includes/Classroom-Setup-04.2 Screenshot 1: Inside the setup note...
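One hedged note on the pandas.read_parquet() part of this: pandas reads through the local filesystem, so a dbfs:/ URI typically has to be rewritten as a /dbfs/ FUSE path (or read with Spark instead). A minimal sketch with a hypothetical dataset path:

import pandas as pd

spark_path = "dbfs:/mnt/training/example/users.parquet"   # hypothetical path from the course setup
fuse_path = spark_path.replace("dbfs:/", "/dbfs/")         # FUSE form that pandas can open

pdf = pd.read_parquet(fuse_path)       # pandas via the local /dbfs mount
sdf = spark.read.parquet(spark_path)   # equivalent read with Spark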

Latest Reply
hebied
New Contributor II
  • 15 kudos

Thanks for sharing, bro. It really helped.

15 More Replies
SRK
by Contributor III
  • 5620 Views
  • 5 replies
  • 7 kudos

How to handle schema validation for a JSON file using Databricks Auto Loader?

Following are the details of the requirement:
1. I am using a Databricks notebook to read data from a Kafka topic and write it into an ADLS Gen2 container, i.e., my landing layer.
2. I am using Spark code to read data from Kafka and write into landing...
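For the schema-validation part of this requirement, a minimal Auto Loader sketch (the storage account, containers, checkpoint/schema locations, target table, and column hints are placeholders): records that don't match the inferred or hinted schema carry their original JSON in the _rescued_data column, which can be routed to a quarantine table.

df = (spark.readStream
      .format("cloudFiles")
      .option("cloudFiles.format", "json")
      .option("cloudFiles.schemaLocation", "abfss://landing@<account>.dfs.core.windows.net/_schemas/events")
      .option("cloudFiles.schemaHints", "event_id BIGINT, event_ts TIMESTAMP")   # hypothetical columns
      .load("abfss://landing@<account>.dfs.core.windows.net/events/"))

# Rows whose fields did not fit the schema are captured in _rescued_data
quarantine = df.filter("_rescued_data IS NOT NULL")

(df.writeStream
   .option("checkpointLocation", "abfss://landing@<account>.dfs.core.windows.net/_checkpoints/events")
   .trigger(availableNow=True)
   .toTable("bronze.kafka_events"))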

Latest Reply
maddy08
New Contributor II
  • 7 kudos

Just to clarify, are you reading from Kafka and writing into ADLS as JSON files? I.e., is each message from Kafka one JSON file in ADLS?

4 More Replies
rubenesanchez
by New Contributor II
  • 7936 Views
  • 6 replies
  • 0 kudos

How to dynamically pass a string parameter to a Delta Live Tables pipeline when calling from Azure Data Factory using the REST API

I want to pass some context information to the Delta Live Tables pipeline when calling from Azure Data Factory. I know the body of the API call supports the Full Refresh parameter, but I wonder if I can add my own custom parameters and how this can be re...

Latest Reply
BLM
New Contributor II
  • 0 kudos

In case this helps anyone, I could only use the refresh_selection parameter, setting it to [] by default. Then, in the notebook, I derived the custom parameter values from the refresh_selection value.
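For anyone wiring this up, a sketch of the call that starts the pipeline update (shown in Python for clarity; in ADF this would be the Web activity body). The host, pipeline ID, and token are placeholders, and refresh_selection is the only field being repurposed to carry context, as described above:

import requests

host = "https://adb-1234567890123456.7.azuredatabricks.net"   # placeholder workspace URL
pipeline_id = "<pipeline-id>"
token = "<databricks-token>"

resp = requests.post(
    f"{host}/api/2.0/pipelines/{pipeline_id}/updates",
    headers={"Authorization": f"Bearer {token}"},
    # full_refresh is the documented flag; refresh_selection normally lists tables to refresh
    json={"full_refresh": False, "refresh_selection": []},
)
resp.raise_for_status()
print(resp.json())   # includes the update_id of the triggered run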

5 More Replies
NickCBZ
by New Contributor III
  • 1301 Views
  • 1 reply
  • 0 kudos

Resolved! AWS Config cost increased after switching to Job Compute

I was looking for opportunities to decrease the cost of my Databricks ETLs and, following the documentation, I started to use Job Compute for my ETLs. Earlier, I used only all-purpose compute to do the ETLs because I needed them to run every 10 minutes...

Latest Reply
NickCBZ
New Contributor III
  • 0 kudos

If someone has this problem in the future, the solution is simple: just disable AWS Config. That's all.

Confused
by New Contributor III
  • 54305 Views
  • 6 replies
  • 3 kudos

Resolved! Configuring pip index-url and using artifacts-keyring

Hi, I would like to use the Azure Artifacts feed as my default index-url when doing a pip install on a Databricks cluster. I understand I can achieve this by updating the pip.conf file with my artifact feed as the index-url. Does anyone know where I...

Latest Reply
murtazahzaveri
New Contributor II
  • 3 kudos

For authentication, you can provide the below config in the cluster's Spark environment variables: PIP_EXTRA_INDEX_URL=https://username:password@pkgs.sample.com/sample/_packaging/artifactory_name/pypi/simple/. Also, you can store the value in a Databricks secret.
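As a notebook-scoped variation on the same idea, a sketch that pulls the feed credential from a Databricks secret and lets pip pick it up from the environment (the secret scope/key, feed URL, and package name are placeholders):

import os
import subprocess
import sys

token = dbutils.secrets.get(scope="artifact-feed", key="feed-token")   # hypothetical scope/key
os.environ["PIP_EXTRA_INDEX_URL"] = (
    f"https://user:{token}@pkgs.sample.com/sample/_packaging/artifactory_name/pypi/simple/"
)

# pip reads PIP_EXTRA_INDEX_URL from the environment of the process that runs it
subprocess.check_call([sys.executable, "-m", "pip", "install", "my-private-package"])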

5 More Replies
Brad
by Contributor II
  • 2089 Views
  • 2 replies
  • 0 kudos

Why is the delta log checkpoint created in different formats?

Hi, I'm using runtime 15.4 LTS or 14.3 LTS. When loading a Delta Lake table from Kinesis, I found the delta log checkpoints are in mixed formats like: 7616 00000000000003291896.checkpoint.b1c24725-....json 7616 00000000000003291906.checkpoint.873e1b3e-....

Latest Reply
Brad
Contributor II
  • 0 kudos

Thanks. We use a job to load data from Kinesis to a Delta table. I added spark.databricks.delta.checkpoint.writeFormat parquet and spark.databricks.delta.checkpoint.writeStatsAsStruct true in the job cluster, but the checkpoints still show different formats...
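For anyone comparing notes, these settings would be applied either in the job cluster's Spark config or at runtime before the write; a short sketch (the table path is a placeholder, and the config names are taken from the reply above rather than independently verified):

# Session-level version of the job-cluster Spark config from this thread
spark.conf.set("spark.databricks.delta.checkpoint.writeFormat", "parquet")
spark.conf.set("spark.databricks.delta.checkpoint.writeStatsAsStruct", "true")

# Inspect which checkpoint files the writer is actually producing
for f in dbutils.fs.ls("s3://my-bucket/my_table/_delta_log/"):   # hypothetical table location
    if "checkpoint" in f.name:
        print(f.name)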

1 More Replies
billykimber
by New Contributor
  • 694 Views
  • 1 reply
  • 0 kudos

Datamart creation

In a scenario where multiple teams access overlapping but not identical datasets from a shared data lake, is it better to create separate datamarts for each team (despite data redundancy) or to maintain a single datamart and use views for team-specif...

Latest Reply
-werners-
Esteemed Contributor III
  • 0 kudos

IMO there is no single best scenario. It depends on the case, I would say. Both have pros and cons. If the difference between teams is really small, views could be a solution. But on the other hand, if you work on massive data, the views first have to b...

pankajshaw
by New Contributor II
  • 1427 Views
  • 2 replies
  • 3 kudos

Duplicates in CSV Export to ADLS

Hello everyone, I'm facing an issue when writing data in CSV format to Azure Data Lake Storage (ADLS). Before writing, there are no duplicates in the DataFrame, and all the records look correct. However, after writing the CSV files to ADLS, I notice d...
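A hedged way to rule out the usual suspects (a non-deterministic upstream transform being recomputed per output file, or stale part files left in the target folder) is to deduplicate explicitly on a business key and overwrite the destination; the key column and ADLS path below are placeholders:

deduped = df.dropDuplicates(["order_id"]).persist()   # hypothetical business key; persist avoids recomputation

(deduped
    .coalesce(1)                                      # optional: a single CSV part file
    .write.mode("overwrite")
    .option("header", True)
    .csv("abfss://exports@<account>.dfs.core.windows.net/orders_csv/"))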

Latest Reply
bhanu_gautam
Valued Contributor III
  • 3 kudos

@Kaniz_Fatma, great explanation.

1 More Replies
L1000
by New Contributor III
  • 1848 Views
  • 4 replies
  • 2 kudos

DLT Serverless incremental refresh of materialized view

I have a materialized view that always does a "COMPLETE_RECOMPUTE", but I can't figure out why. I found how I can get the logs: SELECT * FROM event_log(pipeline_id) WHERE event_type = 'planning_information' ORDER BY timestamp DESC; And for my table...

Latest Reply
L1000
New Contributor III
  • 2 kudos

I split up the materialized view into 3 separate ones. Step 1: @dlt.table(name="step1", table_properties={"delta.enableRowTracking": "true"}) def step1(): isolate_names = dlt.read("soruce").select("Name").groupBy("Name").count() return isolate_names st...
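A cleaned-up sketch of that pattern (the source table name is a placeholder; the decorator is dlt.table from the Delta Live Tables Python API):

import dlt

@dlt.table(
    name="step1",
    table_properties={"delta.enableRowTracking": "true"},   # row tracking helps the planner refresh incrementally
)
def step1():
    return dlt.read("source_table").select("Name").groupBy("Name").count()

Splitting the original materialized view into smaller steps like this, each with row tracking enabled, is the approach the poster landed on.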

3 More Replies
RobDineen
by Contributor
  • 2233 Views
  • 2 replies
  • 2 kudos

Resolved! %SQL delete from temp table driving me mad

Hello there, I have a temp table where I want to remove null / empty values (see below). If there are no rows to delete, then shouldn't it just say zero rows affected?

Latest Reply
daniel_sahal
Esteemed Contributor
  • 2 kudos

@RobDineen This should answer your question: https://community.databricks.com/t5/get-started-discussions/how-to-create-temporary-table-in-databricks/m-p/67774/highlight/true#M2956 Long story short, don't use it.
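If the data genuinely needs to be mutable, a hedged alternative to a temp view is to materialize it as a Delta table first, since DELETE is only supported on Delta tables; the schema, table, and column names below are placeholders:

# Materialize the working data as a real Delta table instead of a temp view
df.write.format("delta").mode("overwrite").saveAsTable("scratch.temp_cleanup")

# DELETE works on Delta tables; on recent runtimes the result reports the rows affected
spark.sql("DELETE FROM scratch.temp_cleanup WHERE some_col IS NULL OR some_col = ''").show()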

1 More Replies
sandy311
by New Contributor III
  • 12893 Views
  • 7 replies
  • 4 kudos

Resolved! Databricks asset bundle does not create a new job if I change the configuration of an existing Databricks yaml

When deploying multiple jobs using the `Databricks.yml` file via the asset bundle, the process either overwrites the same job or renames it, instead of creating separate, distinct jobs.

Latest Reply
Ncolin1999
New Contributor II
  • 4 kudos

@filipniziol my requirement is to just deploy notebooks in the Databricks workspace. I don't want to create any job. Can I still use Databricks Asset Bundles?

6 More Replies
Stephanos
by New Contributor
  • 1925 Views
  • 1 reply
  • 0 kudos

Sequencing Job Deployments with Databricks Asset Bundles

Hello Databricks Community! I'm working on a project where I need to deploy jobs in a specific sequence using Databricks Asset Bundles. Some of my jobs (let's call them coordination jobs) depend on other jobs (base jobs) and need to look up their job ...

Latest Reply
MohcineRouessi
New Contributor II
  • 0 kudos

Hey Steph, have you found anything here, please? I'm currently stuck here, trying to achieve the same thing.
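Not a bundle-native answer, but one hedged workaround for the job-ID lookup part is to resolve the base job at runtime by name with the Databricks SDK (the job name below is hypothetical, and it assumes job names are unique in the workspace):

from databricks.sdk import WorkspaceClient

w = WorkspaceClient()   # picks up authentication from the environment / notebook context

base_jobs = list(w.jobs.list(name="base-ingestion-job"))   # hypothetical base job name
if not base_jobs:
    raise ValueError("Base job not found; deploy the base bundle first")
base_job_id = base_jobs[0].job_id
print(f"Coordination job will reference job_id={base_job_id}")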

amelia1
by New Contributor II
  • 2378 Views
  • 2 replies
  • 0 kudos

pyspark read data using jdbc url returns column names only

Hello, I have a remote Azure SQL warehouse serverless instance that I can access using databricks-sql-connector. I can read/write/update tables no problem. But I'm also trying to read/write/update tables using local pyspark + jdbc drivers. But when I ...

Latest Reply
infodeliberatel
New Contributor II
  • 0 kudos

I added `UseNativeQuery=0` to the URL. It works for me.
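For context, a minimal local PySpark sketch of that fix (the workspace host, HTTP path, token, and table are placeholders; the relevant part is UseNativeQuery=0 appended to the JDBC URL):

jdbc_url = (
    "jdbc:databricks://adb-1234567890123456.7.azuredatabricks.net:443/default;"
    "transportMode=http;ssl=1;AuthMech=3;UID=token;PWD=<personal-access-token>;"
    "httpPath=/sql/1.0/warehouses/<warehouse-id>;UseNativeQuery=0"
)

df = (spark.read.format("jdbc")
      .option("driver", "com.databricks.client.jdbc.Driver")
      .option("url", jdbc_url)
      .option("dbtable", "samples.nyctaxi.trips")   # any table you can query from the warehouse
      .load())

df.show(5)   # should return rows rather than just the column names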

1 More Replies
RobertWalsh
by New Contributor II
  • 11229 Views
  • 7 replies
  • 2 kudos

Resolved! Hive Table Creation - Parquet does not support Timestamp Datatype?

Good afternoon, Attempting to run this statement: %sql CREATE EXTERNAL TABLE IF NOT EXISTS dev_user_login ( event_name STRING, datetime TIMESTAMP, ip_address STRING, acting_user_id STRING ) PARTITIONED BY (date DATE) STORED AS PARQUET ...

Latest Reply
source2sea
Contributor
  • 2 kudos

1. Changing to the Spark-native catalog approach (not the Hive metastore) works. The syntax is essentially: CREATE TABLE IF NOT EXISTS dbName.tableName (column names and types) USING parquet PARTITIONED BY (runAt STRING) LOCA...
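A fuller sketch of that Spark-native syntax, wrapped in spark.sql for a notebook (the schema name and storage location are placeholders, and the columns mirror the original post):

spark.sql("""
    CREATE TABLE IF NOT EXISTS dev.dev_user_login (
        event_name     STRING,
        datetime       TIMESTAMP,   -- TIMESTAMP works with the Spark-native Parquet path
        ip_address     STRING,
        acting_user_id STRING
    )
    USING parquet
    PARTITIONED BY (date DATE)
    LOCATION 'abfss://lake@<account>.dfs.core.windows.net/dev_user_login'
""")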

6 More Replies
