Data Engineering

Forum Posts

by Gilg • Contributor II

12-13-2022 9:07:10 PM

9135 Views
4 replies
5 kudos

Avro Deserialization from Event Hub capture and Autoloader

Hi All, I am getting data from Event Hub capture in Avro format and using Auto Loader to process it. I got to the point where I can read the Avro by casting the Body into a string. Now I want to deserialize the Body column so it will be in table forma...

Latest Reply

UmaMahesh1
Honored Contributor III

12-13-2022 9:43:46 PM

5 kudos

If you still want to go with the above approach and don't want to provide schema manually, then you can fetch a tiny batch with 1 record and build the schema into a variable using a .schema option. Once done, you can add a new Body column by providin...

3 More Replies

by kskistad • Databricks Partner

12-14-2022 5:14:00 PM

2797 Views
0 replies
1 kudos

Set and use variables in DLT pipeline notebooks

Using DLT, I have two streaming sources coming from Auto Loader. Source1 contains a single row of data in the file and Source2 has thousands of rows. There is a common key column between the two sources to join them together. So far, so good. I have a ...

by mikaellognseth • New Contributor III

06-21-2022 2:48:06 AM

15644 Views
7 replies
0 kudos

Resolved! Databricks cluster start-up: Self Bootstrap Failure

When attempting to deploy/start an Azure Databricks cluster through the UI, the following error consistently occurs: { "reason": { "code": "SELF_BOOTSTRAP_FAILURE", "parameters": { "databricks_error_message": "Self-bootstrap failure d...

Latest Reply

mikaellognseth
New Contributor III

06-29-2022 11:07:48 PM

0 kudos

Hi, in our case the issue turned out to be DNS: the DNS servers set on the Databricks workspace VNet are only reachable once the "management" VNet is peered in our setup. It took a while to figure out, as the error didn't exactly give a lot of clarity...

6 More Replies

by NavyaD • New Contributor III

12-07-2022 6:27:15 AM

3951 Views
2 replies
4 kudos

How to read a sql notebook in python notebook on workspace

I have a notebook named ecom_sellout.sql under the path notebooks/python/dataloader/queries. I have another notebook (named dataloader, under the path notebooks/python/dataloader) in which I am calling this SQL notebook. My code runs perfectly fine on re...

Latest Reply

Aviral-Bhardwaj
Esteemed Contributor III

12-10-2022 7:27:01 AM

4 kudos

Use magic commands; alternatively, you can use Python with the SQL formatted inside it. It will work.

1 More Replies

by Sun1 • New Contributor II

05-04-2022 8:37:57 AM

6969 Views
2 replies
2 kudos

How to find out the databricks driver IP from Ganglia metrics UI?

Latest Reply

Ajay-Pandey
Databricks MVP

12-13-2022 11:38:50 PM

2 kudos

Thanks @Kaniz Fatma. Actually, I was looking for the same thing and then found this blog.

1 More Replies

by rami-lv • New Contributor II

12-13-2022 7:02:12 AM

5412 Views
3 replies
3 kudos

What gets overwritten when overwriting a Delta Lake table?

I just tried to write to a Delta Lake table using overwrite mode, and I found that history is preserved. It's unclear to me how the data is overwritten, and how long the history could be preserved. As they say, code is better than a thousand words: my...

Latest Reply

Ajay-Pandey
Databricks MVP

12-13-2022 10:40:09 PM

3 kudos

Hi @Rami ALZEBAK, overwrite means it first removes the existing data and then writes the whole dataset again. If you want to see the history, you can use the DESCRIBE HISTORY command.

2 More Replies
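
As a rough illustration of the DESCRIBE HISTORY suggestion in the reply above, the sketch below builds the two SQL statements involved: one that lists a Delta table's recorded versions and one that reads an earlier version back. The table name `events` and the helper names are hypothetical, and `spark` is assumed to be an active SparkSession on a runtime that supports Delta time travel in SQL. It also assumes, from general Delta Lake behavior rather than from the thread, that an overwrite adds a new table version and that earlier versions stay readable until they are cleaned up (for example by VACUUM).

```python
def history_sql(table_name: str) -> str:
    """Return the statement that lists the recorded versions of a Delta table."""
    return f"DESCRIBE HISTORY {table_name}"


def version_sql(table_name: str, version: int) -> str:
    """Return a query that reads the table as it was at an earlier version."""
    return f"SELECT * FROM {table_name} VERSION AS OF {int(version)}"


def show_history(spark, table_name: str):
    """Run DESCRIBE HISTORY through an existing SparkSession (assumed: `spark`)."""
    return spark.sql(history_sql(table_name))


# Hypothetical table name, used only for illustration.
print(history_sql("events"))
print(version_sql("events", 0))
```

In a notebook, `show_history(spark, "events").show()` would display one row per write, including the overwrite itself, which is one way to check what the overwrite actually recorded.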

by Chris_Shehu • Valued Contributor III

12-12-2022 1:36:26 PM

2192 Views
1 replies
2 kudos

What are the options for extracting data from the delta lake for a vendor?

Our vendor is looking to use Microsoft API Manager to retrieve data from a variety of sources. Is it possible to extract records from the Delta Lake by using an API? What I've tried: I reviewed the available Databricks APIs; it looks like most of them ...

Latest Reply

Chris_Shehu
Valued Contributor III

12-13-2022 6:25:36 AM

2 kudos

Another possibility is to stand up a cluster and have a notebook running Flask to create an API interface. I'm still looking into options, but it seems like there should be a baked-in solution besides the JDBC connector. I'm not ...

by gauthamchettiar • New Contributor II

12-13-2022 5:57:27 AM

2796 Views
0 replies
1 kudos

Spark always performing broadcasts irrespective of spark.sql.autoBroadcastJoinThreshold during streaming merge operation with DeltaTable.

I am trying to do a streaming merge between Delta tables using this guide: https://docs.delta.io/latest/delta-update.html#upsert-from-streaming-queries-using-foreachbatch Our code sample (Java): Dataset<Row> sourceDf = sparkSession ...

by same213 • New Contributor III

10-18-2022 8:44:36 AM

7227 Views
4 replies
8 kudos

Is it possible to create a SQLite database and export it?

I am trying to create a SQLite database in Databricks and add a few tables to it. Ultimately, I want to export this using Azure. Is this possible?

Latest Reply

same213
New Contributor III

12-13-2022 5:04:54 AM

8 kudos

@Hubert Dudek We currently have a process in place that reads in a SQLite file. We recently transitioned to using Databricks. We were hoping to be able to create a SQLite file so we didn't have to alter the current process we have in place.

3 More Replies
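
For the question above, a minimal sketch of building a SQLite file with Python's standard `sqlite3` module might look like the following. It is not taken from the thread: the helper `build_sqlite_file`, the table `customers`, and the sample rows are all hypothetical, and the file is written to a temporary local directory rather than to any Databricks or Azure location.

```python
import os
import sqlite3
import tempfile


def build_sqlite_file(db_path, tables):
    """Create a SQLite file at db_path.

    tables maps a table name to (column definitions, rows). Names are
    interpolated into SQL, so they must come from trusted code, not user input.
    """
    conn = sqlite3.connect(db_path)
    try:
        for name, (columns, rows) in tables.items():
            column_sql = ", ".join(columns)
            placeholders = ", ".join("?" for _ in columns)
            conn.execute(f"CREATE TABLE IF NOT EXISTS {name} ({column_sql})")
            conn.executemany(f"INSERT INTO {name} VALUES ({placeholders})", rows)
        conn.commit()
    finally:
        conn.close()
    return db_path


# Hypothetical table and rows, used only for illustration.
db_path = os.path.join(tempfile.mkdtemp(), "export.db")
build_sqlite_file(
    db_path,
    {"customers": (["id INTEGER", "name TEXT"], [(1, "Ada"), (2, "Lin")])},
)
print(os.path.exists(db_path))
```

On Databricks the file would presumably be written to the driver's local disk first and then copied to cloud storage (for example with `dbutils.fs.cp`); that step depends on the workspace setup and is not shown here.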

by URJ24 • Databricks Partner

12-04-2022 9:17:05 AM

4465 Views
3 replies
1 kudos

I have attended Data + AI World Tour Asia Pacific this week but did not receive post confirmation email.

I have attended Data + AI World Tour Asia Pacific this week but did not receive post confirmation email. After the webinar I received a short survey and then a thank-you note for participation. But unexpectedly I did not receive any email with feedback link ...

Latest Reply

URJ24
Databricks Partner

12-13-2022 2:11:55 AM

1 kudos

Emailing apacevents@databricks.com helped.

2 More Replies

by antonyj453 • New Contributor II

12-12-2022 7:01:36 AM

4836 Views
1 replies
3 kudos

How to extract a JSON object from a PySpark data frame. I was able to extract data from another column which is in array format using the "Explode" function, but Explode is not working for Object type. It's returning a type mismatch error.

I have tried the code below to extract data which is in an array: df2 = df_deidentifieddocuments_tst.select(F.explode('annotationId').alias('annotationId')).select('annotationId.$oid') It was working fine, but it's not working for the JSON object type. Below is colu...

Latest Reply

UmaMahesh1
Honored Contributor III

12-12-2022 11:28:56 PM

3 kudos

Did you try extracting that column's data using the from_json function?
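
A hedged sketch of what the from_json suggestion could look like follows. The post's column definition is truncated, so the column name `annotation`, the output name `oid`, and the single-field schema are assumptions made only for illustration. The sketch also assumes the column holds a JSON string; if it is already a struct, `from_json` does not apply and selecting the nested field directly (as the post does with 'annotationId.$oid') may be enough.

```python
def extract_json_field(df, json_col="annotation", field="$oid", out_col="oid"):
    """Parse a JSON-string column and pull one field out as a new column.

    json_col, field and out_col are hypothetical defaults, not names from the post.
    """
    # Imported inside the function so this module still loads where pyspark is absent.
    from pyspark.sql import functions as F
    from pyspark.sql.types import StringType, StructField, StructType

    # Assumed schema: an object with a single string field, e.g. {"$oid": "abc"}.
    schema = StructType([StructField(field, StringType(), True)])
    parsed = F.from_json(F.col(json_col), schema)
    return df.withColumn(out_col, parsed.getField(field))
```

With a DataFrame `df` that has such a string column, `extract_json_field(df).select("oid")` would be the intended usage; rows whose JSON does not parse should come back as null rather than raising an error.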

by gpzz • New Contributor III

12-12-2022 9:24:57 PM

3564 Views
1 replies
3 kudos

pyspark code error

rdd4 = rdd3.reducByKey(lambda x,y: x+y)
AttributeError: 'PipelinedRDD' object has no attribute 'reducByKey'
Please help me out with this.

Latest Reply

UmaMahesh1
Honored Contributor III

12-12-2022 10:44:40 PM

3 kudos

Is it a typo, or are you really using reducByKey instead of reduceByKey?
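
If the typo is the cause, the corrected call would be `rdd3.reduceByKey(lambda x, y: x + y)`. For readers without a cluster at hand, the sketch below emulates in plain Python what that call computes: values sharing a key are combined pairwise with the given function. The helper `reduce_by_key` and the sample pairs are hypothetical, and this is an emulation of the semantics, not Spark's implementation.

```python
from collections import defaultdict
from functools import reduce


def reduce_by_key(pairs, func):
    """Combine the values of each key with func, mimicking RDD.reduceByKey."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    # Fold each key's values left to right with the supplied function.
    return {key: reduce(func, values) for key, values in grouped.items()}


# Hypothetical (key, value) pairs standing in for the contents of rdd3.
pairs = [("a", 1), ("b", 2), ("a", 3)]
totals = reduce_by_key(pairs, lambda x, y: x + y)
print(totals)  # {'a': 4, 'b': 2}
```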

by Axserv • New Contributor II

12-08-2022 5:47:37 AM

4584 Views
4 replies
1 kudos

How do I "Earn 100 points to the Databricks Community Rewards Store" ? (As advertised on Databricks Academy)

Hello, how do I join the Databricks Community study group for 100 points, as advertised on the Databricks Academy website?

Latest Reply

Harun
Honored Contributor

12-08-2022 9:10:42 AM

1 kudos

@Alex Serlovsky You need to earn the Lakehouse Fundamentals credential certification; then you can join this community group. Within 24 to 48 hours you will get 100 reward points. But as per Databricks, you need to earn the credential on or before Nov...

3 More Replies

by Dave_Nithio • Contributor II

12-12-2022 3:18:35 PM

2364 Views
0 replies
1 kudos

Natively Query Delta Lake with R

I have a large delta table that I need to analyze in native R. The only option I have currently is to query the delta table then use collect() to bring that spark dataframe into an R dataframe. Is there an alternative method that would allow me to qu...

by lawrence009 • Contributor

12-08-2022 8:39:44 PM

4579 Views
4 replies
4 kudos

Cannot CREATE TABLE with 'No Isolation Shared' cluster

Recently I ran into a number of issues running our notebooks in Interactive Mode. For example, we can't create a (Delta) table. The command would run and then idle with no apparent exception. The path is created on AWS S3 but the Delta log is never create...

Latest Reply

youssefmrini
Databricks Employee

12-09-2022 7:54:31 AM

4 kudos

The admin can disable the option to use the No Isolation Shared cluster. I recommend switching to Single User, where UC is enabled. Don't worry, you won't need to change your code. If you encounter this kind of issue, make sure to open a tick...

3 More Replies
