Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

LauJohansson
by Contributor
  • 2179 Views
  • 3 replies
  • 3 kudos

Resolved! Delta live table: Retrieve CDF columns

I want to use the apply_changes feature from a bronze table to a silver table. The bronze table has no "natural" sequence_by column. Therefore, I want to use the CDF column "_commit_timestamp" as the sequence_by. How do I retrieve the columns in ...

Latest Reply
LauJohansson
Contributor
  • 3 kudos

Thank you @raphaelblg! I chose to write an article on the subject after this discussion: https://www.linkedin.com/pulse/databricks-delta-live-tables-merging-lau-johansson-cdtce/?trackingId=L872gj0yQouXgJudM75gdw%3D%3D

2 More Replies
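For readers landing on this thread: a minimal sketch of using the CDF `_commit_timestamp` as the `sequence_by` column in `apply_changes` might look like the following. This is a sketch built on assumptions, not the linked article's exact code; the table names, key column, and view name are placeholders, and change data feed is assumed to be enabled on the bronze table.

```python
# Hedged sketch: feed the bronze table's change feed into apply_changes,
# ordering by _commit_timestamp. Names below are placeholders.
import dlt
from pyspark.sql.functions import col

@dlt.view
def bronze_changes():
    return (
        spark.readStream
        .option("readChangeFeed", "true")  # exposes _commit_timestamp, _change_type, ...
        .table("catalog.schema.bronze")    # placeholder bronze table
    )

dlt.create_streaming_table("silver")

dlt.apply_changes(
    target="silver",
    source="bronze_changes",
    keys=["id"],                           # placeholder primary key
    sequence_by=col("_commit_timestamp"),  # CDF commit time as the ordering column
    except_column_list=["_change_type", "_commit_version", "_commit_timestamp"],
)
```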
BillMarshall
by New Contributor
  • 3725 Views
  • 2 replies
  • 0 kudos

workflow permissions errors

I have a notebook that outputs an Excel file. Through trial and error, and after consulting various forums, I discovered that the .xlsx file needed to be written to a temp file and then copied to the volume in Unity Catalog. When I run the notebook by...

Latest Reply
emora
New Contributor III
  • 0 kudos

Hello, yes, you need to write the Excel file to the tmp folder first, but then you can move it wherever you want without a problem. In my current project we implemented this method to create the file in the tmp folder, and then move it to one spe...

1 More Replies
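The write-to-tmp-then-copy pattern discussed in this thread can be sketched as below. This is a generic illustration, not the poster's notebook: the payload and paths are placeholders, `shutil.copy` stands in for the copy into the Unity Catalog volume (on Databricks you would typically use `dbutils.fs.cp` with a `/Volumes/...` destination), and the binary write stands in for `df.to_excel(...)`.

```python
# Sketch of the pattern: write the file to local tmp storage first, then
# copy it to the destination directory (a UC volume path on Databricks).
import os
import shutil
import tempfile

def write_then_copy(data: bytes, volume_dir: str, filename: str) -> str:
    """Write bytes to a local tmp file, then copy the file into volume_dir."""
    tmp_path = os.path.join(tempfile.gettempdir(), filename)
    with open(tmp_path, "wb") as f:
        f.write(data)  # in the real notebook: df.to_excel(tmp_path)
    dest = os.path.join(volume_dir, filename)
    shutil.copy(tmp_path, dest)  # on Databricks: dbutils.fs.cp(f"file:{tmp_path}", dest)
    return dest

# Example with a stand-in "volume" directory:
target_dir = tempfile.mkdtemp()
result = write_then_copy(b"fake xlsx bytes", target_dir, "report.xlsx")
print(os.path.exists(result))
```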
Subhasis
by New Contributor III
  • 2319 Views
  • 5 replies
  • 0 kudos

Autoloader checkpoint fails, and after changing the checkpoint path all data has to be reloaded

The Autoloader checkpoint fails, and after changing the checkpoint path I need to reload all data. I want to load only the data that has not been processed; I don't want to reload all the data.

Latest Reply
Subhasis
New Contributor III
  • 0 kudos

Does the checkpoint have some benchmark capacity, after which it stops writing data?

4 More Replies
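One option sometimes used in this situation (a sketch under assumptions, not a confirmed fix for the poster's pipeline): when a checkpoint is lost and a new checkpoint path is used, Auto Loader's `modifiedAfter` option can limit the initial backfill to files modified after a cutoff, so already-processed files are skipped. The paths, format, and timestamp below are placeholders.

```python
# Hedged sketch: fresh checkpoint, but restrict discovery to recent files
# so the stream does not reprocess everything from the landing path.
df = (
    spark.readStream.format("cloudFiles")
    .option("cloudFiles.format", "json")                  # placeholder format
    .option("modifiedAfter", "2024-09-01 00:00:00")       # cutoff: skip older files
    .load("/mnt/landing/")                                # placeholder source path
)

(
    df.writeStream
    .option("checkpointLocation", "/mnt/checkpoints/new_path/")  # new checkpoint
    .toTable("catalog.schema.bronze")                             # placeholder target
)
```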
SowmyaDesai
by New Contributor II
  • 2340 Views
  • 3 replies
  • 2 kudos

Run pyspark queries from outside databricks

I have written a notebook that executes a PySpark query. I then trigger it remotely from outside the Databricks environment using /api/2.1/jobs/run-now, which runs the notebook. I also want to retrieve the results from this job execution. H...

Latest Reply
SowmyaDesai
New Contributor II
  • 2 kudos

Thanks for responding. I did go through this link. It talks about executing on a SQL warehouse, though. Is there a way we can execute queries on Databricks clusters instead? Databricks has this connector for SQL: https://docs.databricks.com/en/dev-tools/p...

2 More Replies
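For the retrieval part of the question, one common pattern (a sketch; the host, token, and IDs are placeholders) is to trigger the run with `/api/2.1/jobs/run-now` and then read the notebook's output from `/api/2.1/jobs/runs/get-output`, which surfaces whatever the notebook passed to `dbutils.notebook.exit(...)`. Only the request construction is shown here; no HTTP call is made.

```python
# Sketch of the two Jobs API calls involved: run-now to start the job,
# runs/get-output to fetch the notebook's exit value once the run finishes.
import json
from urllib import request

HOST = "https://<workspace>.cloud.databricks.com"  # placeholder workspace URL

def run_now_request(job_id: int, params: dict) -> request.Request:
    """Build the POST request for /api/2.1/jobs/run-now."""
    body = json.dumps({"job_id": job_id, "notebook_params": params}).encode()
    return request.Request(
        f"{HOST}/api/2.1/jobs/run-now",
        data=body,
        headers={"Authorization": "Bearer <token>",  # placeholder PAT
                 "Content-Type": "application/json"},
    )

def get_output_url(run_id: int) -> str:
    """runs/get-output returns the value passed to dbutils.notebook.exit()."""
    return f"{HOST}/api/2.1/jobs/runs/get-output?run_id={run_id}"

req = run_now_request(123, {"date": "2024-09-01"})
print(req.full_url, get_output_url(456))
```

In practice you would poll `/api/2.1/jobs/runs/get` until the run reaches a terminal state before calling runs/get-output.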
FrancisApel
by New Contributor II
  • 9594 Views
  • 4 replies
  • 0 kudos

[TASK_WRITE_FAILED] Task failed while writing rows to abfss

I am trying to insert into an already created Delta table in Unity Catalog. I am getting the error: [TASK_WRITE_FAILED] Task failed while writing rows to abfss://xxxx@xxxxxxxxxxxxxxxx.dfs.core.windows.net/__unitystorage/catalogs/xxxxxxxx-c6c8-45d8-ac3...

Latest Reply
NikunjKakadiya
New Contributor II
  • 0 kudos

Any chance this issue got resolved? I am also seeing the same error when I am trying to incrementally read the system tables using the readStream method and write them using the writeStream method. This generally happens for the audit table, but other t...

3 More Replies
Gilg
by Contributor II
  • 5010 Views
  • 1 replies
  • 0 kudos

DLT: Waiting for resources took a long time

Hi Team, I have a DLT pipeline that has been running in production for quite some time now. When I check the pipeline, a couple of jobs took longer than expected. Usually, one job only takes 10-15 minutes to complete, with 2 to 3 minutes to provision a resource. Then I ha...

Latest Reply
speaker_city
New Contributor II
  • 0 kudos

I am currently trying projects from dbdemos [Full Delta Live Tables Pipeline - Loan]. I keep running into this error. How do I resolve this?

Saf4Databricks
by New Contributor III
  • 1541 Views
  • 2 replies
  • 1 kudos

Resolved! Testing PySpark - Document links broken

The top paragraph of this Testing PySpark page from the Apache Spark team states the following, where it points to some links titled 'see here', but no link is provided to click on. Can someone please provide those links the document is referring to...

Latest Reply
szymon_dybczak
Esteemed Contributor III
  • 1 kudos

Hi @Saf4Databricks, sure, here they are:
- To view the docs for the PySpark test utils, see here: spark.apache.org
- To see the code for the PySpark built-in test utils, check out the Spark repository: pyspark.testing.utils — PySpark 3.5.2 documentation (apache....

1 More Replies
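For context on what those utilities do: a minimal usage example of `assertDataFrameEqual` (available from PySpark 3.5) might look like the following. This assumes a local SparkSession and made-up data.

```python
# Minimal example of the PySpark testing utility the links above point to.
from pyspark.sql import SparkSession
from pyspark.testing import assertDataFrameEqual

spark = SparkSession.builder.master("local[1]").getOrCreate()

expected = spark.createDataFrame([(1, "a"), (2, "b")], ["id", "val"])
actual = spark.createDataFrame([(1, "a"), (2, "b")], ["id", "val"])

assertDataFrameEqual(actual, expected)  # raises AssertionError on mismatch
```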
hanish
by New Contributor II
  • 5372 Views
  • 5 replies
  • 2 kudos

Job cluster support in jobs/runs/submit API

We are using the jobs/runs/submit API of Databricks to create and trigger a one-time run with new_cluster and existing_cluster configurations. We would like to check if there is a provision to pass "job_clusters" in this API to reuse the same cluster across...

Latest Reply
Nagrjuna
New Contributor II
  • 2 kudos

Hi, any update on the above-mentioned issue? We are unable to submit a one-time job run (api/2.0 or 2.1 jobs/runs/submit) with a shared job cluster; either that, or one new cluster has to be used for all tasks in the job.

4 More Replies
sakuraDev
by New Contributor II
  • 1960 Views
  • 1 replies
  • 1 kudos

Resolved! schema is not enforced when using autoloader

Hi everyone, I am currently trying to enforce the following schema:
StructType([
    StructField("site", StringType(), True),
    StructField("meter", StringType(), True),
    StructField("device_time", StringType(), True),
    StructField("data", St...

Latest Reply
szymon_dybczak
Esteemed Contributor III
  • 1 kudos

Hi @sakuraDev, I'm afraid your assumption is wrong. Here you define the data field as a struct type, and the result is as expected. Once you have this column as a struct type, you can refer to nested objects using dot notation. So if you would like to get e...

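The dot-notation access described in the reply can be sketched as follows; this assumes a local SparkSession, and the field names are placeholders echoing the truncated schema above.

```python
# Sketch: "data" is declared as a struct, so nested fields are reached
# with data.<field> rather than appearing as top-level columns.
from pyspark.sql import SparkSession
from pyspark.sql.types import StructType, StructField, StringType

spark = SparkSession.builder.master("local[1]").getOrCreate()

schema = StructType([
    StructField("site", StringType(), True),
    StructField("data", StructType([
        StructField("energy", StringType(), True),  # hypothetical nested field
    ]), True),
])

df = spark.createDataFrame([("site-1", ("42.0",))], schema)
df.select("site", "data.energy").show()  # dot notation into the struct
```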
anirudh286
by New Contributor
  • 2079 Views
  • 2 replies
  • 0 kudos

Info on Databricks AWS High Availability during zone selection

Hi Team, during zone selection in the Databricks environment, there is an option for High Availability (HA), which selects instances from other zones to ensure prolonged uptime. My question is: does the HA option only select instances from other a...

Latest Reply
fredy-herrera
New Contributor II
  • 0 kudos

No, it is not.

1 More Replies
delson
by New Contributor II
  • 1702 Views
  • 4 replies
  • 0 kudos

Data Ingestion from GCP

Hi, I'm ingesting data from GCP into Databricks, and I think I've noticed a bug: any data tables whose names start with a numerical character are not ingested at all. Has anyone else experienced this? Please let me know if there is a way around this apa...

Latest Reply
delson
New Contributor II
  • 0 kudos

Hi Slash, thanks for getting back to me. For instance, I have data tables such as "20240901_demographics_data_v1" that I'm trying to move from BQ. Other data tables that don't include a date (or other numerical characters) at the front are being ing...

3 More Replies
drag7ter
by Contributor
  • 6118 Views
  • 4 replies
  • 1 kudos

Resolved! Not able to set run_as service_principal_name

I'm trying to run: databricks bundle deploy -t prod --profile PROD_Service_Principal
My bundle looks like:
bundle:
  name: myproject
include:
  - resources/jobs/bundles/*.yml
targets:
  # The 'dev' target, for development purposes. This target is the de...

Latest Reply
reidwil
New Contributor II
  • 1 kudos

Building on this situation, I am seeing that if I deploy a job using a service principal this way, something gets prepended to the job name, like `[dev f46583c2_8c9e_499f_8d41_823332bfd4473]`. Is there a different way for me via bundling to change this...

3 More Replies
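For reference, a target-level `run_as` block along the lines the original question is after might look like this. This is a sketch of one way to structure it in `databricks.yml`; the application ID is a placeholder.

```yaml
# databricks.yml (sketch): run the prod target's resources as a service principal.
targets:
  prod:
    mode: production
    run_as:
      service_principal_name: "00000000-0000-0000-0000-000000000000"  # SP application ID
```

Note that deploying a target with `mode: development` is what prepends the `[dev ...]` prefix to resource names; production-mode targets do not get that prefix.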
anshi_t_k
by New Contributor III
  • 2842 Views
  • 3 replies
  • 1 kudos

Data engineering professional exam

Each configuration below is identical in that each cluster has 400 GB total of RAM, 160 total cores, and only one executor per VM. Given an extremely long-running job for which completion must be guaranteed, which cluster configuration will be able to...

Latest Reply
filipniziol
Esteemed Contributor
  • 1 kudos

Hi @anshi_t_k, the key consideration here is fault tolerance. How do you protect against a VM failure? By having more VMs, since the impact of a single VM failure is then the lowest. For example, with answer C, the crash of one VM loses 1/1, so 100% of capa...

2 More Replies
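The reasoning in the reply can be made concrete with some arithmetic. The VM counts below are hypothetical, since the original answer options are truncated; the point is only that with total capacity fixed, the fraction lost to a single VM crash shrinks as the VM count grows.

```python
# Illustrative arithmetic: fraction of a fixed-size cluster (e.g. 400 GB RAM /
# 160 cores split across identical VMs) lost when a single VM crashes.
def capacity_lost_fraction(num_vms: int) -> float:
    """With identical VMs, one crash removes 1/num_vms of total capacity."""
    return 1 / num_vms

for vms in (1, 2, 4, 8):
    print(f"{vms} VM(s): one crash loses {capacity_lost_fraction(vms):.0%} of capacity")
```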
Stellar
by New Contributor II
  • 6030 Views
  • 1 replies
  • 0 kudos

Databricks CI/CD Azure Devops

Hi all, I am looking for advice on what would be the best approach to CI/CD in Databricks and repos in general. Would it be best to have a main branch and branch off of it, or something else? How will changes be propagated from dev to QA an...

Harsha777
by New Contributor III
  • 3034 Views
  • 3 replies
  • 2 kudos

Resolved! Sub-Query behavior in sql statements

Hi Team, I have a query with the construct below in my project:
SELECT count(*) FROM `catalog`.`schema`.`t_table`
WHERE _col_check IN (SELECT DISTINCT _col_check FROM `catalog`.`schema`.`t_check_table`)
Actually, there is no column "_col_check" in the sub-que...

Latest Reply
filipniziol
Esteemed Contributor
  • 2 kudos

Hi @Harsha777, what occurs here is called column shadowing. The column names in the main query and the sub-query are identical, and the Databricks engine, after not finding the column in the sub-query, searches for it in the main query. The simplest way to avoid the...

2 More Replies
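The shadowing behavior described in the reply is standard SQL scoping rather than anything Databricks-specific, so it can be reproduced with sqlite3 from the Python standard library. The table and column names below are made up for the demo; the shape mirrors the thread's query.

```python
# Demo of column shadowing: the sub-query's table has no column named
# "check_col", so the engine resolves it against the OUTER table, turning
# the filter into "check_col IN (check_col)" -- true for every non-null row.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t_table (check_col INTEGER)")
conn.execute("CREATE TABLE t_check_table (other_col INTEGER)")
conn.executemany("INSERT INTO t_table VALUES (?)", [(1,), (2,), (3,)])
conn.execute("INSERT INTO t_check_table VALUES (99)")

# Runs without error even though t_check_table has no check_col:
shadowed = conn.execute(
    "SELECT count(*) FROM t_table "
    "WHERE check_col IN (SELECT DISTINCT check_col FROM t_check_table)"
).fetchone()[0]

# Qualifying the column with the sub-query's alias surfaces the mistake:
try:
    conn.execute(
        "SELECT count(*) FROM t_table WHERE check_col IN "
        "(SELECT DISTINCT c.check_col FROM t_check_table AS c)"
    )
    qualified_errors = False
except sqlite3.OperationalError:
    qualified_errors = True

print(shadowed, qualified_errors)  # 3 True
```

Qualifying every column in the sub-query with a table alias is the usual defense, as the reply suggests.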