Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

gauthamchettiar
by New Contributor II
  • 2802 Views
  • 0 replies
  • 1 kudos

Spark always performing broadcasts irrespective of spark.sql.autoBroadcastJoinThreshold during streaming merge operation with DeltaTable.

I am trying to do a streaming merge between delta tables using this guide - https://docs.delta.io/latest/delta-update.html#upsert-from-streaming-queries-using-foreachbatch. Our code sample (Java): Dataset<Row> sourceDf = sparkSession ...

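For readers hitting the same behavior: automatic broadcasting can be disabled cluster-wide by setting the threshold to -1. Whether this affects the Delta merge path depends on the runtime version, so treat this as a sketch, not a confirmed fix:

```python
# Sketch (requires a Databricks/Spark runtime with a live `spark` session;
# not runnable standalone): disable automatic broadcast joins before the
# foreachBatch merge runs.
spark.conf.set("spark.sql.autoBroadcastJoinThreshold", "-1")

# Spark 3 also supports per-join strategy hints, e.g. forcing a sort-merge
# join instead of a broadcast (hint support varies by version):
# source_df.join(target_df.hint("MERGE"), "id")
```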
same213
by New Contributor III
  • 7230 Views
  • 4 replies
  • 8 kudos

Is it possible to create a sqlite database and export it?

I am trying to create a SQLite database in Databricks and add a few tables to it. Ultimately, I want to export this using Azure. Is this possible?

Latest Reply
same213
New Contributor III
  • 8 kudos

@Hubert Dudek​  We currently have a process in place that reads in a SQLite file. We recently transitioned to using Databricks. We were hoping to be able to create a SQLite file so we didn't have to alter the current process we have in place.

3 More Replies
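To the question above: a SQLite database is just a local file, so the standard-library sqlite3 module works on the driver node. A minimal sketch (the path, table, and contents are illustrative):

```python
import os
import sqlite3
import tempfile

# Build a SQLite database file on local disk; on Databricks this is the
# driver's filesystem, from which the file can be copied to DBFS/Azure.
db_path = os.path.join(tempfile.gettempdir(), "export_demo.db")
if os.path.exists(db_path):
    os.remove(db_path)  # start fresh so the demo is repeatable

conn = sqlite3.connect(db_path)
conn.execute("CREATE TABLE sales (id INTEGER PRIMARY KEY, amount REAL)")
conn.executemany("INSERT INTO sales (amount) VALUES (?)", [(10.5,), (20.0,)])
conn.commit()
row_count = conn.execute("SELECT COUNT(*) FROM sales").fetchone()[0]
conn.close()

# On Databricks one could then copy the file out (Databricks-only API):
# dbutils.fs.cp(f"file:{db_path}", "dbfs:/FileStore/export_demo.db")
```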
URJ24
by Databricks Partner
  • 4487 Views
  • 3 replies
  • 1 kudos

I attended Data + AI World Tour Asia Pacific this week but did not receive a post-event confirmation email.

I attended Data + AI World Tour Asia Pacific this week but did not receive a post-event confirmation email. After the webinar I received a short survey and then a thank-you note for participation. But unexpectedly I did not receive any email with a feedback link ...

Latest Reply
URJ24
Databricks Partner
  • 1 kudos

Emailing apacevents@databricks.com helped.

2 More Replies
antonyj453
by New Contributor II
  • 4845 Views
  • 1 replies
  • 3 kudos

How to extract a JSON object from a PySpark DataFrame? I was able to extract data from another column in array format using the "Explode" function, but Explode is not working for the Object type; it returns a type-mismatch error.

I have tried the below code to extract data that is in an array: df2 = df_deidentifieddocuments_tst.select(F.explode('annotationId').alias('annotationId')).select('annotationId.$oid') It was working fine, but it's not working for the JSON object type. Below is colu...

Latest Reply
UmaMahesh1
Honored Contributor III
  • 3 kudos

Did you try extracting that column data using from_json function ?

gpzz
by New Contributor III
  • 3568 Views
  • 1 replies
  • 3 kudos

PySpark code error

rdd4 = rdd3.reducByKey(lambda x,y: x+y) raises AttributeError: 'PipelinedRDD' object has no attribute 'reducByKey'. Please help me out with this.

Latest Reply
UmaMahesh1
Honored Contributor III
  • 3 kudos

Is it a typo or are you really using reducByKey instead of reduceByKey ?

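For readers hitting the same error: the attribute is spelled reduceByKey, so `rdd4 = rdd3.reduceByKey(lambda x, y: x + y)` is the fix. Its per-key fold semantics can be sketched in plain Python (no Spark needed; names here are illustrative):

```python
def reduce_by_key(pairs, fn):
    # Fold the values of each key with fn, as RDD.reduceByKey does
    # (first within partitions, then again across partitions).
    acc = {}
    for key, value in pairs:
        acc[key] = fn(acc[key], value) if key in acc else value
    return acc

counts = reduce_by_key([("a", 1), ("b", 2), ("a", 3)], lambda x, y: x + y)
```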
Axserv
by New Contributor II
  • 4588 Views
  • 4 replies
  • 1 kudos

How do I "Earn 100 points to the Databricks Community Rewards Store" ? (As advertised on Databricks Academy)

Hello, how do I join the Databricks Community study group for 100 points, as advertised on the Databricks Academy website?

Latest Reply
Harun
Honored Contributor
  • 1 kudos

@Alex Serlovsky​ You need to earn the Lakehouse Fundamentals credential; then you can join this community group. Within 24 to 48 hours you will get 100 reward points. But as per Databricks, you need to earn the credential on or before Nov...

3 More Replies
Dave_Nithio
by Contributor II
  • 2367 Views
  • 0 replies
  • 1 kudos

Natively Query Delta Lake with R

I have a large delta table that I need to analyze in native R. The only option I have currently is to query the delta table then use collect() to bring that spark dataframe into an R dataframe. Is there an alternative method that would allow me to qu...

lawrence009
by Contributor
  • 4593 Views
  • 4 replies
  • 4 kudos

Cannot CREATE TABLE with 'No Isolation Shared' cluster

Recently I ran into a number of issues running our notebooks in Interactive Mode. For example, we can't create a (delta) table. The command would run and then idle with no apparent exception. The path is created on AWS S3 but the delta log is never create...

Latest Reply
youssefmrini
Databricks Employee
  • 4 kudos

The Admin can disable the ability to use the No Isolation Shared cluster. I recommend switching to Single User where UC is activated. Don't worry, you won't need to change your code. If you encounter this kind of issue, make sure to open a tick...

3 More Replies
Hunter
by New Contributor III
  • 24393 Views
  • 7 replies
  • 6 kudos

Resolved! How to programmatically download png files from matplotlib plots in notebook?

I am creating plots in Databricks using Python and matplotlib. These look great in the notebook and I can save them to the DBFS using plt.savefig("/dbfs/FileStore/tables/[plot_name].png"). I can then download the png files to my computer individually by pas...

Latest Reply
Hunter
New Contributor III
  • 6 kudos

Thanks everyone! I am already at a place where I can download a png to FileStore and use a url to download that file locally. What I was wondering was if there is some databricks function I can use to launch the url that references the png file and d...

6 More Replies
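Related to the approach in this thread: files written under /dbfs/FileStore/ are served by the workspace at a /files/ URL, so a small helper can turn a save path into a download link. The URL mapping is a Databricks behavior; the helper itself is an illustrative sketch:

```python
def filestore_url(dbfs_path):
    # Map a /dbfs/FileStore/... path to the /files/... URL the workspace
    # serves it under (relative to the workspace root).
    prefix = "/dbfs/FileStore/"
    if not dbfs_path.startswith(prefix):
        raise ValueError(f"not under {prefix}: {dbfs_path}")
    return "/files/" + dbfs_path[len(prefix):]

url = filestore_url("/dbfs/FileStore/tables/plot_name.png")

# In a notebook one could then render a clickable link (Databricks-only):
# displayHTML(f'<a href="{url}" download>Download plot</a>')
```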
successhawk
by New Contributor II
  • 3509 Views
  • 3 replies
  • 2 kudos

Resolved! Is there a way to tell if a created job is not compliant against configured cluster policies before it runs?

As a DevOps engineer, I want to enforce cluster policies at deployment time when the job is deployed/created, well before it is time to actually use it (i.e. before its scheduled/triggered run time without actually running it).

Latest Reply
irfanaziz
Contributor II
  • 2 kudos

Is it not the linked service that defines the kind of cluster created or used for any job? So I believe you could control the configuration via the linked service settings.

2 More Replies
labtech
by Valued Contributor II
  • 2846 Views
  • 3 replies
  • 20 kudos

Resolved! Create Databricks Workspace with different email address on Azure

Hi team, I wonder if we can create a Databricks workspace that is not related to the Azure email address. Thanks.

Latest Reply
Aviral-Bhardwaj
Esteemed Contributor III
  • 20 kudos

Yes, I have done this multiple times.

2 More Replies
labtech
by Valued Contributor II
  • 2727 Views
  • 3 replies
  • 14 kudos

Get a new badge or certificate for version 3 of the DE exam

I took the DE certification exam (version 2). Do I receive a new badge or certificate when I pass the newest version of the DE exam? I'm going to take it and review my knowledge.

Latest Reply
Ajay-Pandey
Databricks MVP
  • 14 kudos

Hi @Gam Nguyen​ I think there is no new badge for this one

2 More Replies
cmilligan
by Contributor II
  • 1442 Views
  • 0 replies
  • 1 kudos

Fail a multi-task job successfully

I have a multi-task job that runs everyday where the first notebook in the job checks if the run should be continued based on the date that the job is run. The majority of the time the answer to that is no and I'm raising an exception for the job to ...

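One common pattern for the situation described above: have the first task exit cleanly instead of raising, so the run is recorded as successful. The date gate below is illustrative; dbutils.notebook.exit is the Databricks API that ends a notebook without failing it:

```python
import datetime

def should_run(today):
    # Illustrative gate: only continue the job on the first day of the month.
    return today.day == 1

decision = should_run(datetime.date(2023, 3, 1))

# In the first notebook task of the job (Databricks-only, shown commented):
# if not should_run(datetime.date.today()):
#     dbutils.notebook.exit("skipped")  # ends the task, marked successful
# Downstream tasks can then depend on this task's result instead of
# inheriting a raised exception.
```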
Harun
by Honored Contributor
  • 2294 Views
  • 1 replies
  • 1 kudos

Hi Community members and Databricks Officials, nowadays I am seeing a lot of spam posts in our groups and discussions. Forum admins and Databricks offi...

Hi Community members and Databricks Officials, nowadays I am seeing a lot of spam posts in our groups and discussions. Forum admins and Databricks officials, please take action on the users who are spamming the timeline with promotional content. As...

Latest Reply
Ajay-Pandey
Databricks MVP
  • 1 kudos

Yes @Databricks Forum Admin​ please take action on this.

DB_developer
by New Contributor III
  • 10815 Views
  • 2 replies
  • 3 kudos

How to optimize storage for sparse data in data lake?

I have a lot of tables with 80% of the columns filled with nulls. I understand SQL Server provides a way to handle this kind of data in the table definition (with the SPARSE keyword). Do data lakes provide something similar?

Latest Reply
-werners-
Esteemed Contributor III
  • 3 kudos

The data lake itself does not, but the file format you use to store the data does. E.g., parquet uses column compression, so sparse data will compress pretty well. CSV, on the other hand: total disaster.

1 More Replies
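The compression point above can be illustrated without parquet: a mostly-null column stored contiguously (as columnar formats store it) compresses to a small fraction of its raw size. A toy sketch with zlib standing in for parquet's per-column compression:

```python
import json
import zlib

# A sparse column: 80% nulls, matching the ratio in the question.
column = [None] * 800 + list(range(200))

raw = json.dumps(column).encode()
compressed = zlib.compress(raw)
ratio = len(compressed) / len(raw)
# The long run of nulls compresses extremely well; row-oriented formats
# like CSV interleave columns and lose much of this benefit.
```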