Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

gauthamchettiar
by New Contributor II
  • 2802 Views
  • 0 replies
  • 1 kudos

Spark always performing broadcasts irrespective of spark.sql.autoBroadcastJoinThreshold during streaming merge operation with DeltaTable.

I am trying to do a streaming merge between delta tables using this guide - https://docs.delta.io/latest/delta-update.html#upsert-from-streaming-queries-using-foreachbatch. Our code sample (Java): Dataset<Row> sourceDf = sparkSession ...

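For readers hitting the same behavior: automatic broadcasting can be disabled cluster-wide by setting the threshold to -1. Whether this affects the Delta merge path depends on the runtime version, so treat this as a sketch, not a confirmed fix:

```python
# Sketch (requires a Databricks/Spark runtime with a live `spark` session;
# not runnable standalone): disable automatic broadcast joins before the
# foreachBatch merge runs.
spark.conf.set("spark.sql.autoBroadcastJoinThreshold", "-1")

# Spark 3 also supports per-join strategy hints, e.g. forcing a sort-merge
# join instead of a broadcast (hint support varies by version):
# source_df.join(target_df.hint("MERGE"), "id")
```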
same213
by New Contributor III
  • 7230 Views
  • 4 replies
  • 8 kudos

Is it possible to create a sqlite database and export it?

I am trying to create a SQLite database in Databricks and add a few tables to it. Ultimately, I want to export this using Azure. Is this possible?

Latest Reply
same213
New Contributor III
  • 8 kudos

@Hubert Dudek​  We currently have a process in place that reads in a SQLite file. We recently transitioned to using Databricks. We were hoping to be able to create a SQLite file so we didn't have to alter the current process we have in place.

3 More Replies
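To the question above: a SQLite database is just a local file, so the standard-library sqlite3 module works on the driver node. A minimal sketch (the path, table, and contents are illustrative):

```python
import os
import sqlite3
import tempfile

# Build a SQLite database file on local disk; on Databricks this is the
# driver's filesystem, from which the file can be copied to DBFS/Azure.
db_path = os.path.join(tempfile.gettempdir(), "export_demo.db")
if os.path.exists(db_path):
    os.remove(db_path)  # start fresh so the demo is repeatable

conn = sqlite3.connect(db_path)
conn.execute("CREATE TABLE sales (id INTEGER PRIMARY KEY, amount REAL)")
conn.executemany("INSERT INTO sales (amount) VALUES (?)", [(10.5,), (20.0,)])
conn.commit()
row_count = conn.execute("SELECT COUNT(*) FROM sales").fetchone()[0]
conn.close()

# On Databricks one could then copy the file out (Databricks-only API):
# dbutils.fs.cp(f"file:{db_path}", "dbfs:/FileStore/export_demo.db")
```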
URJ24
by Databricks Partner
  • 4487 Views
  • 3 replies
  • 1 kudos

I attended Data + AI World Tour Asia Pacific this week but did not receive a post-event confirmation email.

I attended Data + AI World Tour Asia Pacific this week but did not receive a post-event confirmation email. After the webinar I received a short survey and then a thank-you note for participation. But unexpectedly I did not receive any email with a feedback link ...

Latest Reply
URJ24
Databricks Partner
  • 1 kudos

Emailing apacevents@databricks.com helped.

2 More Replies
antonyj453
by New Contributor II
  • 4845 Views
  • 1 replies
  • 3 kudos

How to extract a JSON object from a PySpark DataFrame? I was able to extract data from another column in array format using the "Explode" function, but Explode is not working for the Object type; it returns a type-mismatch error.

I have tried the below code to extract data that is in an array: df2 = df_deidentifieddocuments_tst.select(F.explode('annotationId').alias('annotationId')).select('annotationId.$oid') It was working fine, but it's not working for the JSON object type. Below is colu...

Latest Reply
UmaMahesh1
Honored Contributor III
  • 3 kudos

Did you try extracting that column data using from_json function ?

gpzz
by New Contributor III
  • 3568 Views
  • 1 replies
  • 3 kudos

PySpark code error

rdd4 = rdd3.reducByKey(lambda x,y: x+y) raises AttributeError: 'PipelinedRDD' object has no attribute 'reducByKey'. Please help me out with this.

Latest Reply
UmaMahesh1
Honored Contributor III
  • 3 kudos

Is it a typo or are you really using reducByKey instead of reduceByKey ?

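For readers hitting the same error: the attribute is spelled reduceByKey, so `rdd4 = rdd3.reduceByKey(lambda x, y: x + y)` is the fix. Its per-key fold semantics can be sketched in plain Python (no Spark needed; names here are illustrative):

```python
def reduce_by_key(pairs, fn):
    # Fold the values of each key with fn, as RDD.reduceByKey does
    # (first within partitions, then again across partitions).
    acc = {}
    for key, value in pairs:
        acc[key] = fn(acc[key], value) if key in acc else value
    return acc

counts = reduce_by_key([("a", 1), ("b", 2), ("a", 3)], lambda x, y: x + y)
```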
Axserv
by New Contributor II
  • 4588 Views
  • 4 replies
  • 1 kudos

How do I "Earn 100 points to the Databricks Community Rewards Store" ? (As advertised on Databricks Academy)

Hello, how do I join the Databricks Community study group for 100 points, as advertised on the Databricks Academy website?

Latest Reply
Harun
Honored Contributor
  • 1 kudos

@Alex Serlovsky​ You need to earn the Lakehouse Fundamentals credential; then you can join this community group. Within 24 to 48 hours you will get 100 reward points. But as per Databricks, you need to earn the credential on or before Nov...

3 More Replies
Dave_Nithio
by Contributor II
  • 2367 Views
  • 0 replies
  • 1 kudos

Natively Query Delta Lake with R

I have a large delta table that I need to analyze in native R. The only option I have currently is to query the delta table then use collect() to bring that spark dataframe into an R dataframe. Is there an alternative method that would allow me to qu...

lawrence009
by Contributor
  • 4593 Views
  • 4 replies
  • 4 kudos

Cannot CREATE TABLE with 'No Isolation Shared' cluster

Recently I ran into a number of issues running our notebooks in Interactive Mode. For example, we can't create a (delta) table. The command would run and then idle with no apparent exception. The path is created on AWS S3 but the delta log is never create...

Latest Reply
youssefmrini
Databricks Employee
  • 4 kudos

The Admin can disable the ability to use the No Isolation Shared cluster. I recommend switching to Single User where UC is activated. Don't worry, you won't need to change your code. If you encounter this kind of issue, make sure to open a tick...

3 More Replies
Hunter
by New Contributor III
  • 24393 Views
  • 7 replies
  • 6 kudos

Resolved! How to programmatically download png files from matplotlib plots in notebook?

I am creating plots in Databricks using Python and matplotlib. These look great in the notebook and I can save them to the DBFS using plt.savefig("/dbfs/FileStore/tables/[plot_name].png"). I can then download the png files to my computer individually by pas...

Latest Reply
Hunter
New Contributor III
  • 6 kudos

Thanks everyone! I am already at a place where I can download a png to FileStore and use a url to download that file locally. What I was wondering was if there is some databricks function I can use to launch the url that references the png file and d...

6 More Replies
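Related to the approach in this thread: files written under /dbfs/FileStore/ are served by the workspace at a /files/ URL, so a small helper can turn a save path into a download link. The URL mapping is a Databricks behavior; the helper itself is an illustrative sketch:

```python
def filestore_url(dbfs_path):
    # Map a /dbfs/FileStore/... path to the /files/... URL the workspace
    # serves it under (relative to the workspace root).
    prefix = "/dbfs/FileStore/"
    if not dbfs_path.startswith(prefix):
        raise ValueError(f"not under {prefix}: {dbfs_path}")
    return "/files/" + dbfs_path[len(prefix):]

url = filestore_url("/dbfs/FileStore/tables/plot_name.png")

# In a notebook one could then render a clickable link (Databricks-only):
# displayHTML(f'<a href="{url}" download>Download plot</a>')
```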
successhawk
by New Contributor II
  • 3509 Views
  • 3 replies
  • 2 kudos

Resolved! Is there a way to tell if a created job is not compliant against configured cluster policies before it runs?

As a DevOps engineer, I want to enforce cluster policies at deployment time when the job is deployed/created, well before it is time to actually use it (i.e. before its scheduled/triggered run time without actually running it).

Latest Reply
irfanaziz
Contributor II
  • 2 kudos

Is it not the linked service that defines the kind of cluster created or used for any job? So I believe you could control the configuration via the linked service settings.

2 More Replies
labtech
by Valued Contributor II
  • 2846 Views
  • 3 replies
  • 20 kudos

Resolved! Create Databricks Workspace with different email address on Azure

Hi team, I wonder if we can create a Databricks workspace that is not related to the Azure email address. Thanks.

Latest Reply
Aviral-Bhardwaj
Esteemed Contributor III
  • 20 kudos

Yes, I have done this multiple times.

2 More Replies
labtech
by Valued Contributor II
  • 2727 Views
  • 3 replies
  • 14 kudos

Get a new badge or certificate for version 3 of the DE exam

I took the DE certification exam (version 2). Do I receive a new badge or certificate when I pass the newest version of the DE exam? I'm going to take it and review my knowledge.

Latest Reply
Ajay-Pandey
Databricks MVP
  • 14 kudos

Hi @Gam Nguyen​ I think there is no new badge for this one

2 More Replies
cmilligan
by Contributor II
  • 1442 Views
  • 0 replies
  • 1 kudos

Fail a multi-task job successfully

I have a multi-task job that runs everyday where the first notebook in the job checks if the run should be continued based on the date that the job is run. The majority of the time the answer to that is no and I'm raising an exception for the job to ...

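One common pattern for the situation described above: have the first task exit cleanly instead of raising, so the run is recorded as successful. The date gate below is illustrative; dbutils.notebook.exit is the Databricks API that ends a notebook without failing it:

```python
import datetime

def should_run(today):
    # Illustrative gate: only continue the job on the first day of the month.
    return today.day == 1

decision = should_run(datetime.date(2023, 3, 1))

# In the first notebook task of the job (Databricks-only, shown commented):
# if not should_run(datetime.date.today()):
#     dbutils.notebook.exit("skipped")  # ends the task, marked successful
# Downstream tasks can then depend on this task's result instead of
# inheriting a raised exception.
```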
Harun
by Honored Contributor
  • 2294 Views
  • 1 replies
  • 1 kudos

Hi Community members and Databricks Officials, nowadays I am seeing a lot of spam posts in our groups and discussions. Forum admins and Databricks offi...

Hi Community members and Databricks Officials, nowadays I am seeing a lot of spam posts in our groups and discussions. Forum admins and Databricks officials, please take action on the users who are spamming the timeline with promotional content. As...

Latest Reply
Ajay-Pandey
Databricks MVP
  • 1 kudos

Yes @Databricks Forum Admin​ please take action on this.

DB_developer
by New Contributor III
  • 10815 Views
  • 2 replies
  • 3 kudos

How to optimize storage for sparse data in data lake?

I have a lot of tables with 80% of the columns filled with nulls. I understand SQL Server provides a way to handle this kind of data in the table definition (with the SPARSE keyword). Do data lakes provide something similar?

Latest Reply
-werners-
Esteemed Contributor III
  • 3 kudos

The data lake itself does not, but the file format you use to store the data does. E.g., parquet uses column compression, so sparse data will compress pretty well. CSV, on the other hand: total disaster.

1 More Replies
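The compression point above can be illustrated without parquet: a mostly-null column stored contiguously (as columnar formats store it) compresses to a small fraction of its raw size. A toy sketch with zlib standing in for parquet's per-column compression:

```python
import json
import zlib

# A sparse column: 80% nulls, matching the ratio in the question.
column = [None] * 800 + list(range(200))

raw = json.dumps(column).encode()
compressed = zlib.compress(raw)
ratio = len(compressed) / len(raw)
# The long run of nulls compresses extremely well; row-oriented formats
# like CSV interleave columns and lose much of this benefit.
```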