Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

BasavarajAngadi
by Contributor
  • 5136 Views
  • 6 replies
  • 6 kudos

Resolved! Hi Experts, what is the difference between connecting a BI tool to Spark SQL versus a Databricks SQL endpoint?

It's all about spinning up a Spark cluster, and both the Spark SQL API and Databricks SQL perform the same operations, so what difference does it make to BI tools?

Latest Reply
Anonymous
Not applicable
  • 6 kudos

Thanks @Bilal Aslam and @Aman Sehgal for jumping in! @Basavaraj Angadi I want to make sure you got your question(s) answered! Will you let us know? Don't forget, you can select any reply as the "best answer"!

5 More Replies
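In practical terms, a BI tool talks to a Databricks SQL endpoint over JDBC/ODBC and the endpoint manages the compute behind it, whereas pointing the tool at an interactive cluster's Spark SQL means sharing that cluster with everything else running on it. A minimal sketch of the same kind of connection from Python, assuming the databricks-sql-connector package is installed; the hostname, HTTP path, and token are placeholders:

    from databricks import sql

    # Placeholders: copy the real values from the endpoint's connection details.
    with sql.connect(
        server_hostname="adb-1234567890123456.7.azuredatabricks.net",
        http_path="/sql/1.0/warehouses/abc123def456",
        access_token="dapiXXXXXXXX",
    ) as connection:
        with connection.cursor() as cursor:
            cursor.execute("SELECT current_date()")
            print(cursor.fetchall())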
hare
by New Contributor III
  • 2771 Views
  • 4 replies
  • 8 kudos

Azure DBR - Have to load a list of JSON files, but the columns have special characters (e.g. {"hydra:xxxx": {"hydra:value":"yyyy", "hydra:value1":"zzzzz"}})

Azure DBR - Have to load a list of JSON files into a data frame and then from the DF into a Databricks table, but the columns have special characters and we get the below error. Both the column (key) and the value (as a JSON record) have special characters in the JSON file. # Can...

Latest Reply
Hubert-Dudek
Esteemed Contributor III
  • 8 kudos

The best option is just to define the schema manually. There is a nice article from a person who had exactly the same problem: https://towardsdev.com/create-a-spark-hive-meta-store-table-using-nested-json-with-invalid-field-names-505f215eb5bf

3 More Replies
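To make the suggested workaround concrete, here is a hedged sketch of defining the schema by hand and aliasing the invalid names away; the path, table name, and field names are hypothetical, mirroring the question:

    from pyspark.sql import functions as F
    from pyspark.sql.types import StringType, StructField, StructType

    # Define the schema by hand so inference doesn't trip over the "hydra:" keys.
    schema = StructType([
        StructField("hydra:xxxx", StructType([
            StructField("hydra:value", StringType()),
            StructField("hydra:value1", StringType()),
        ])),
    ])

    df = spark.read.schema(schema).json("/mnt/raw/hydra/*.json")  # hypothetical path

    # Backticks let Spark address fields whose names contain a colon;
    # alias the invalid names away before saving as a table.
    clean = df.select(
        F.col("`hydra:xxxx`.`hydra:value`").alias("value"),
        F.col("`hydra:xxxx`.`hydra:value1`").alias("value1"),
    )
    clean.write.saveAsTable("raw_hydra")  # hypothetical table name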
alejandrofm
by Valued Contributor
  • 1190 Views
  • 1 reply
  • 2 kudos

Resolved! Feature request for Spark performance tuning

Hi, I don't think there's a place to see this, please correct me if I'm wrong. Now, to see performance tuning tips I have to go to the Spark UI, then to the SQL view, and at the top I can see performance alerts that help me know if I need to apply a Spark config, co...

Latest Reply
-werners-
Esteemed Contributor III
  • 2 kudos

I think that can be requested at ideas.databricks.com

LukaszJ
by Contributor III
  • 10686 Views
  • 4 replies
  • 0 kudos

Resolved! Send UPDATE from Databricks to Azure SQL Database

Hello. I want to know how to run an UPDATE on an Azure SQL Database from Azure Databricks using PySpark. I know how to run a SELECT query and turn it into a DataFrame, but how do I send some data back (as an UPDATE on rows)? I want to use built-in PySpark instead...

Latest Reply
-werners-
Esteemed Contributor III
  • 0 kudos

This is discussed on Stack Overflow. As you can see, for Azure Synapse there is a way, but for a plain SQL database you will have to use some kind of driver like ODBC/JDBC.

3 More Replies
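To flesh out the driver suggestion: Spark's JDBC writer only appends or overwrites, so a row-level UPDATE usually means opening a plain database connection from the driver node. A minimal sketch with pyodbc, assuming that library and the Microsoft ODBC driver are installed on the cluster; the server, table, and credentials are placeholders:

    import pyodbc

    # Placeholders throughout; "ODBC Driver 17 for SQL Server" and the
    # pyodbc package are assumed to be installed on the cluster.
    conn = pyodbc.connect(
        "DRIVER={ODBC Driver 17 for SQL Server};"
        "SERVER=myserver.database.windows.net;"
        "DATABASE=mydb;UID=myuser;PWD=mysecret"
    )
    try:
        with conn:  # commits on success, rolls back on an exception
            conn.execute(
                "UPDATE dbo.customers SET status = ? WHERE customer_id = ?",
                "active", 42,
            )
    finally:
        conn.close()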
Atacama
by New Contributor II
  • 2536 Views
  • 3 replies
  • 1 kudos
Latest Reply
-werners-
Esteemed Contributor III
  • 1 kudos

The spilled data is written to some object store on the cloud provider. I believe all of them apply encryption by default. Of course it is up to you (or your colleagues) to restrict access to the storage.

2 More Replies
KKo
by Contributor III
  • 5263 Views
  • 3 replies
  • 4 kudos

Resolved! Reading multiple parquet files from same _delta_log under a path

I have a path where there is a _delta_log and 3 snappy.parquet files. I am trying to read all those .parquet files using spark.read.format('delta').load(path), but I am getting data from only the same one file every time. Can't I read from all these files? If s...

Latest Reply
KKo
Contributor III
  • 4 kudos

@Werner Stinckens Thanks for the reply and explanation, that was helpful to understand the Delta feature.

2 More Replies
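The underlying behaviour, in short: the _delta_log decides which parquet files make up the current version of the table, so format('delta') may legitimately ignore files that belong only to older versions. A sketch for verifying that; the path is a placeholder:

    from delta.tables import DeltaTable

    path = "/mnt/data/my_delta_table"  # placeholder

    current = spark.read.format("delta").load(path)    # current snapshot only
    DeltaTable.forPath(spark, path).history().show()   # versions the log tracks

    # Time travel if the "missing" parquet files belong to an earlier version:
    v0 = spark.read.format("delta").option("versionAsOf", 0).load(path)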
SailajaB
by Valued Contributor III
  • 4234 Views
  • 5 replies
  • 4 kudos

Resolved! when and otherwise issue

Hi, in our scenario we are reading JSON files as input, and they contain a nested structure. A few of the attributes are array-type structs, where we need to change the names of the nested ones. So we created a new structure and are doing a cast. We are facing the below pr...

Latest Reply
AmanSehgal
Honored Contributor III
  • 4 kudos

Can you provide the structure that you're using? Also, a more elaborate sample input and output.

4 More Replies
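While the sample data is pending, one pattern that often resolves this class of problem is renaming nested fields with a cast, since casting a struct (or an array of structs) to a new struct type renames its fields by position. A sketch with hypothetical column and field names:

    from pyspark.sql import functions as F

    # Casting renames struct fields positionally. Note that if this sits inside
    # F.when(...).otherwise(...), both branches must produce the same struct
    # type, a frequent cause of type-mismatch errors in when/otherwise.
    df2 = df.withColumn(
        "attributes",
        F.col("attributes").cast("array<struct<newName:string,newValue:string>>"),
    )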
SailajaB
by Valued Contributor III
  • 18607 Views
  • 4 replies
  • 4 kudos

Unable to mount the blob storage account as soft delete got enabled

Hi Team, when we try to mount or access blob storage where soft delete is enabled, it fails with the below error: org.apache.hadoop.fs.FileAlreadyExistsException: Operation failed: "This endpoint does not support BlobStorageEvents or So...

Latest Reply
-werners-
Esteemed Contributor III
  • 4 kudos

Jeez, I was planning on enabling soft delete on our adls gen2, but I think I will wait a while after reading this.

3 More Replies
JoeWMP
by New Contributor III
  • 3600 Views
  • 5 replies
  • 1 kudos

Resolved! Databricks Job IDs increasing in massive sequence gaps

Has anyone seen something like this before? Today around midnight, our Job IDs started increasing in increments of quadrillions. Was this a new change to how Job IDs are generated?

Latest Reply
JoeWMP
New Contributor III
  • 1 kudos

Thank you Ravi! Glad that this confirms my understanding

4 More Replies
Edmondo
by New Contributor III
  • 6190 Views
  • 7 replies
  • 3 kudos

Resolved! Limiting parallelism when external APIs are invoked (e.g. MLflow)

We are applying a groupBy operation to a pyspark.sql.DataFrame and then training a single model for MLflow on each group. We see intermittent failures because the MLflow server replies with a 429 because of too many requests/s. What are the best practice...

Latest Reply
Edmondo
New Contributor III
  • 3 kudos

To me it's already resolved through professional services. The question I do have is how useful this community is if people with the right background aren't here, and if it takes a month to get a non-answer.

6 More Replies
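For anyone hitting the same 429s: the number of simultaneous MLflow calls tracks the number of concurrently running tasks, so capping the shuffle partition count before the grouped training, and retrying with backoff inside the training function, is one way to keep the request rate down. A sketch under those assumptions; train_one_group and the column names are hypothetical:

    import random
    import time

    MAX_CONCURRENT = 8  # tune to stay under the MLflow server's requests/s budget
    spark.conf.set("spark.sql.shuffle.partitions", str(MAX_CONCURRENT))

    def with_backoff(fn, retries=5):
        # Exponential backoff with jitter, for use inside train_one_group
        # around the MLflow calls that may return 429.
        for attempt in range(retries):
            try:
                return fn()
            except Exception:
                if attempt == retries - 1:
                    raise
                time.sleep(2 ** attempt + random.random())

    result = (
        df.groupBy("group_id")
          .applyInPandas(train_one_group, schema="group_id string, run_id string")
    )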
thushar
by Contributor
  • 4524 Views
  • 5 replies
  • 3 kudos

Resolved! dataframe.rdd.isEmpty() is throwing error in 9.1 LTS

Loaded a CSV file with five columns into a dataframe, and then added around 15+ columns using the dataframe.withColumn method. After adding these many columns, when I run the query df.rdd.isEmpty() it throws the below error: org.apache.spark.SparkExc...

Latest Reply
Anonymous
Not applicable
  • 3 kudos

@Thushar R - Thank you for your patience. We are looking for the best person to help you.

4 More Replies
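For readers with the same error: df.rdd.isEmpty() converts the DataFrame to an RDD and re-runs the whole plan, which is both expensive and where errors like this tend to surface. A sketch of a check that stays in the DataFrame API:

    # Cheap emptiness check without converting to an RDD:
    is_empty = len(df.take(1)) == 0

    # Spark 3.3+ (newer DBR runtimes than the 9.1 LTS in the question) also
    # has a built-in: df.isEmpty()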
hari
by Contributor
  • 2472 Views
  • 3 replies
  • 3 kudos

Resolved! Multi-cluster write for delta tables with s3 as the datastore

Does Delta currently support multi-cluster writes to a Delta table in S3? I see in the Databricks documentation that Databricks doesn't support writing to the same table from multiple Spark drivers and thus multiple clusters. But S3Guard was also added...

Latest Reply
nastasiya09
New Contributor II
  • 3 kudos

that's really good post for memobdroverizon wifi

2 More Replies
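On the question itself: Databricks coordinates concurrent S3 writers through its own commit service, while open-source Delta Lake 1.2+ offers a DynamoDB-backed LogStore for the same purpose. A sketch of the open-source setup, which has to be in place before the session touches any Delta table; the configuration keys come from the Delta Lake docs, the values are placeholders, and the delta-storage-s3-dynamodb artifact is assumed to be on the classpath:

    from pyspark.sql import SparkSession

    # Keys per the Delta Lake docs; table name and region are placeholders.
    spark = (
        SparkSession.builder
        .config("spark.delta.logStore.s3a.impl",
                "io.delta.storage.S3DynamoDBLogStore")
        .config("spark.io.delta.storage.S3DynamoDBLogStore.ddb.tableName",
                "delta_log")
        .config("spark.io.delta.storage.S3DynamoDBLogStore.ddb.region",
                "us-east-1")
        .getOrCreate()
    )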
tonykun
by New Contributor
  • 3715 Views
  • 0 replies
  • 0 kudos

A dumb general question - why doesn't Databricks support a Java REPL?

I'm a new student to the programming world and have a strong interest in data engineering and Databricks technology. I've tried this product; the UI, notebooks, and DBFS are very user-friendly and powerful. Recently, a doubt came to my mind why Databricks doesn't s...

GMO
by New Contributor III
  • 2703 Views
  • 4 replies
  • 1 kudos

Resolved! Trigger.AvailableOnce in PySpark?

There’s a new Trigger.AvailableOnce option in runtime 10.1 that we need to process a large folder bit by bit using Autoloader. But I don’t see how to engage this from PySpark. Is this accessible from Scala only, or is it available in PySpark? Thanks...

Latest Reply
pottsork
New Contributor II
  • 1 kudos

Any update on this issue? I can see that one can use .trigger(availableNow=True) in DBR 10.3 (on Azure Databricks)... Unfortunately I can't get it to work with Autoloader. Is this supported? Additionally, I can't find any answers when skimming through ...

3 More Replies
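For completeness, the shape of the PySpark call once availableNow reached the Python API, per the reply above; the paths, file format, and table name are placeholders:

    (spark.readStream
        .format("cloudFiles")                   # Auto Loader
        .option("cloudFiles.format", "json")
        .load("/mnt/landing/events/")
        .writeStream
        .option("checkpointLocation", "/mnt/chk/events/")
        .trigger(availableNow=True)             # drain the backlog, then stop
        .toTable("bronze.events"))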
