Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

DavideCagnoni
by Contributor
  • 5162 Views
  • 4 replies
  • 1 kudos

How to force pandas_on_spark plots to use all dataframe data?

When I load a table as a `pandas_on_spark` dataframe, and try to e.g. scatterplot two columns, what I obtain is a subset of the desired points. For example, if I try to plot two columns from a table with 1000000 rows, I only see some of the data - i...

Latest Reply
DavideCagnoni
Contributor
  • 1 kudos

@Kaniz Fatma The problem is not about performance or plotly. It is about the pandas_on_spark dataframe arbitrarily subsampling the input data when plotting, without notifying the user about it. While subsampling is understandable and maybe even nece...

  • 1 kudos
3 More Replies
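
A minimal sketch for the subsampling question above, assuming the dataframe comes from `pyspark.pandas` (the current home of pandas-on-Spark). Its documented options `plotting.max_rows` (default 1000) and `plotting.sample_ratio` limit how much data a plot uses; when `plotting.sample_ratio` is unset, it is derived as `plotting.max_rows` divided by the row count. Which of the two a given plot kind honors depends on the kind and possibly the Spark version, so the sketch sets both. The helper names `derived_sample_ratio` and `plot_all_rows` are illustrative, not library functions.

```python
from typing import Optional


def derived_sample_ratio(total_rows: int, max_rows: int = 1000,
                         sample_ratio: Optional[float] = None) -> float:
    """Fraction of rows a sample-based plot would use under the documented rule."""
    if sample_ratio is not None:
        return sample_ratio
    if total_rows <= 0:
        return 1.0
    return min(1.0, max_rows / total_rows)


def plot_all_rows(psdf, x: str, y: str):
    """Scatter-plot every row by lifting both plotting limits for one call."""
    # Imported lazily: this part needs a Spark environment to run.
    import pyspark.pandas as ps

    n_rows = len(psdf)  # triggers a count job on the cluster
    with ps.option_context("plotting.max_rows", n_rows,
                           "plotting.sample_ratio", 1.0):
        return psdf.plot.scatter(x=x, y=y)


# With the defaults, a 1,000,000-row table is reduced to about 1000 points.
print(derived_sample_ratio(1_000_000))
```

Plotting every row pulls the full data to the driver, so this is only sensible when the table fits in driver memory.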
SailajaB
by Valued Contributor III
  • 9007 Views
  • 12 replies
  • 4 kudos

Resolved! JSON validation is getting failed after writing Pyspark dataframe to json format

Hi, we have to convert a transformed dataframe to JSON format, so we used write with the json format on the final dataframe. But when we validate the output JSON, it is not in proper JSON format. Could you please provide your suggestio...

Latest Reply
Anonymous
Not applicable
  • 4 kudos

@Sailaja B​ - Does @Aman Sehgal​'s most recent answer help solve the problem? If it does, would you be happy to mark their answer as best?

  • 4 kudos
11 More Replies
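
The excerpt is truncated, so the actual cause in this thread is not known here. One common reason for this symptom is that Spark's JSON writer produces JSON Lines (one object per line) rather than a single JSON document, which strict validators reject. A small sketch of that difference using only the standard library; the sample records and the helper name `json_lines_to_array` are made up for illustration.

```python
import json


def json_lines_to_array(text: str) -> str:
    """Convert JSON Lines text into a single JSON array document."""
    records = [json.loads(line) for line in text.splitlines() if line.strip()]
    return json.dumps(records)


# Hypothetical content of one part file written by df.write.json(...)
sample = '{"id": 1, "name": "a"}\n{"id": 2, "name": "b"}\n'

try:
    json.loads(sample)  # a strict parser sees two top-level values
    valid_as_single_document = True
except json.JSONDecodeError:
    valid_as_single_document = False

array_text = json_lines_to_array(sample)
print(valid_as_single_document, array_text)
```

If the consumer accepts JSON Lines, no conversion is needed; the conversion only matters when a single array or object is required.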
aladda
by Databricks Employee
  • 3277 Views
  • 2 replies
  • 3 kudos
Latest Reply
User16255483290
Contributor
  • 3 kudos

@Anand Ladda @André Monteiro From comments in the code: Indicates whether the task should be run in a REPL. This value must be true to run on an existing cluster. Please ignore the 'run_as_repl' parameter; it will be removed from public docs as it i...

  • 3 kudos
1 More Replies
al_joe
by Contributor
  • 4160 Views
  • 2 replies
  • 0 kudos

Where / how does DBFS store files?

I tried to use %fs head to print the contents of a CSV file used in a training: %fs head "/mnt/path/file.csv" but got an error saying cannot head a directory!? Then I did %fs ls on the same CSV file and got a list of 4 files under a directory named as a ...

Latest Reply
User16753725182
Databricks Employee
  • 0 kudos

Hi @Al Jo, are you still seeing the error while printing the contents of the CSV file?

  • 0 kudos
1 More Replies
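
A possible explanation, assuming the CSV was written by Spark rather than uploaded as a single file: Spark writes each output as a directory named after the target path, holding one or more `part-` data files plus marker files. The listing below is hypothetical (the thread's actual file names are not shown), and `data_part_files` is an illustrative helper.

```python
from typing import Iterable, List


def data_part_files(names: Iterable[str]) -> List[str]:
    """Keep only the data files from a Spark output directory listing."""
    return sorted(name for name in names if name.startswith("part-"))


# Hypothetical result of listing /mnt/path/file.csv (a directory, not a file)
listing = [
    "_SUCCESS",
    "_committed_123",
    "_started_123",
    "part-00000-abc.csv",
]

parts = data_part_files(listing)
print(parts)
```

Under that assumption, `%fs head` would need the path of one of the `part-` files inside the directory rather than the directory itself.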
digitalinstitut
by New Contributor
  • 605 Views
  • 0 replies
  • 0 kudos

www.amritsardigitalacademy.in

Amritsar Digital Academy is the best https://www.amritsardigitalacademy.in/ digital marketing institute In Punjab. if you want to do a digital marketing course. you can enroll now!

Infosys_128139
by New Contributor III
  • 7045 Views
  • 8 replies
  • 5 kudos

Resolved! Unable to start SQL End point in DATABRICKS SQL

Hello All, I am trying to use Databricks SQL, but somehow the SQL endpoint is not starting. It stays in the starting state for a long time and then the session expires. Please note, the default SQL endpoint is also not starting. I am using...

Latest Reply
BilalAslamDbrx
Databricks Employee
  • 5 kudos

@AMZ DUD did you get this working? With a quota of 500, 43 mins is a long time for a cluster to launch. Perhaps something in the account isn’t set up correctly. Can you please email me your workspace ID at bilal dot aslam at databricks dot ...

  • 5 kudos
7 More Replies
BasavarajAngadi
by Contributor
  • 5370 Views
  • 6 replies
  • 6 kudos

Resolved! Hi Experts I want to know the difference between connecting any BI Tool to Spark SQL and Databricks SQL end point?

It's all about spinning up the Spark cluster, and both the Spark SQL API and Databricks do the same operation, so what difference does it make to BI tools?

Latest Reply
Anonymous
Not applicable
  • 6 kudos

Thanks @Bilal Aslam and @Aman Sehgal for jumping in! @Basavaraj Angadi I want to make sure you got your question(s) answered! Will you let us know? Don't forget, you can select any reply as the "best answer"!

  • 6 kudos
5 More Replies
hare
by New Contributor III
  • 2929 Views
  • 4 replies
  • 8 kudos

Azure DBR - Have to load list of json files but the column has special character.(ex: {"hydra:xxxx": {"hydra:value":"yyyy", "hydra:value1":"zzzzz"}

Azure DBR - Have to load a list of JSON files into a data frame and then from the DF into a Databricks table, but the column has a special character and I am getting the below error. Both the column (key) and the value (as JSON record) have special characters in the JSON file. # Can...

Latest Reply
Hubert-Dudek
Esteemed Contributor III
  • 8 kudos

The best approach is to define the schema manually. There is a nice article from a person who had exactly the same problem: https://towardsdev.com/create-a-spark-hive-meta-store-table-using-nested-json-with-invalid-field-names-505f215eb5bf

  • 8 kudos
3 More Replies
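
As an alternative to writing the schema by hand (a different technique from the one suggested in the reply), the keys can be renamed before loading. A minimal sketch that assumes the colon in names like `hydra:value` is what triggers the error, which the truncated message does not confirm; `sanitize_keys` is a hypothetical helper.

```python
import re
from typing import Any

# Anything other than letters, digits and underscore is replaced.
_INVALID = re.compile(r"[^0-9A-Za-z_]")


def sanitize_keys(value: Any) -> Any:
    """Recursively replace special characters in dict keys with underscores."""
    if isinstance(value, dict):
        return {_INVALID.sub("_", key): sanitize_keys(item)
                for key, item in value.items()}
    if isinstance(value, list):
        return [sanitize_keys(item) for item in value]
    return value  # scalar values are left untouched


record = {"hydra:xxxx": {"hydra:value": "yyyy", "hydra:value1": "zzzzz"}}
cleaned = sanitize_keys(record)
print(cleaned)
```

Two different keys can collapse to the same sanitized name (for example `a:b` and `a_b`), so the renaming rule should be checked against the real data first.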
alejandrofm
by Valued Contributor
  • 1282 Views
  • 1 replies
  • 2 kudos

Resolved! Feature request for spark performance tuning

Hi, I don't think there's a place to see this, please correct me if I'm wrong. Now, to see performance tuning tips I have to go to the Spark UI, then to the SQL view, and at the top I can see performance alerts that help me know if I need to apply a Spark config, co...

Latest Reply
-werners-
Esteemed Contributor III
  • 2 kudos

I think that can be requested at ideas.databricks.com

  • 2 kudos
LukaszJ
by Contributor III
  • 11654 Views
  • 4 replies
  • 0 kudos

Resolved! Send UPDATE from Databricks to Azure SQL DataBase

Hello. I want to know how to do an UPDATE on Azure SQL Database from Azure Databricks using PySpark. I know how to run a SELECT query and turn it into a DataFrame, but how do I send back some data (as an UPDATE on rows)? I want to use built-in pyspark instead...

Latest Reply
-werners-
Esteemed Contributor III
  • 0 kudos

This is discussed on Stack Overflow. As you can see there, for Azure Synapse there is a way, but for a plain SQL database you will have to use some kind of driver like ODBC/JDBC.

  • 0 kudos
3 More Replies
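
A sketch of the driver-based approach mentioned in the reply: sending a parameterized UPDATE through a Python DB-API connection. The standard-library `sqlite3` module stands in for a real driver connection here (for example one opened with pyodbc, which uses the same `?` placeholder style); the `orders` table, its columns, and `apply_updates` are hypothetical.

```python
import sqlite3
from typing import Iterable, Tuple


def apply_updates(conn, rows: Iterable[Tuple[int, str]]) -> None:
    """Run one parameterized UPDATE per (id, status) pair, then commit."""
    params = [(status, order_id) for order_id, status in rows]
    cur = conn.cursor()
    cur.executemany("UPDATE orders SET status = ? WHERE id = ?", params)
    conn.commit()


# In-memory stand-in for the target database
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT)")
conn.executemany("INSERT INTO orders VALUES (?, ?)", [(1, "new"), (2, "new")])

apply_updates(conn, [(1, "shipped")])
result = conn.execute("SELECT id, status FROM orders ORDER BY id").fetchall()
print(result)
```

The rows to update would typically be a small, collected result from the DataFrame; for large volumes, writing to a staging table and merging on the database side is usually the safer pattern.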
Atacama
by New Contributor II
  • 2680 Views
  • 3 replies
  • 1 kudos
Latest Reply
-werners-
Esteemed Contributor III
  • 1 kudos

The spilled data is written to some object store on the cloud provider. I believe all of them apply encryption by default. Of course it is up to you (or your colleagues) to restrict access to the storage.

  • 1 kudos
2 More Replies
KKo
by Contributor III
  • 5545 Views
  • 3 replies
  • 4 kudos

Resolved! Reading multiple parquet files from same _delta_log under a path

I have a path where there is _delta_log and 3 snappy.parquet files. I am trying to read all those .parquet files using spark.read.format('delta').load(path) but I am getting data from only the same one file all the time. Can't I read from all these files? If s...

Latest Reply
KKo
Contributor III
  • 4 kudos

@Werner Stinckens​ Thanks for the reply and explanation, that was helpful to understand the delta feature.

  • 4 kudos
2 More Replies
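
A sketch of why a Delta read can return data from only one of several Parquet files: the reader follows the transaction log in `_delta_log`, whose commits record `add` and `remove` actions, and only files that are still added belong to the current table version. The commit contents and file names below are invented, and the sketch ignores checkpoints and all other action types.

```python
import json
from typing import List


def active_files(commits: List[str]) -> List[str]:
    """Replay commit files (JSON Lines, in version order) and return live paths."""
    active = set()
    for commit in commits:
        for line in commit.splitlines():
            if not line.strip():
                continue
            action = json.loads(line)
            if "add" in action:
                active.add(action["add"]["path"])
            elif "remove" in action:
                active.discard(action["remove"]["path"])
    return sorted(active)


# Hypothetical log: each overwrite removes the previous file and adds a new one.
commits = [
    '{"add": {"path": "part-0.snappy.parquet", "dataChange": true}}',
    '{"remove": {"path": "part-0.snappy.parquet", "dataChange": true}}\n'
    '{"add": {"path": "part-1.snappy.parquet", "dataChange": true}}',
    '{"remove": {"path": "part-1.snappy.parquet", "dataChange": true}}\n'
    '{"add": {"path": "part-2.snappy.parquet", "dataChange": true}}',
]

live = active_files(commits)
print(live)
```

In this made-up history three Parquet files sit in the directory, but only the last one is part of the current version; the older files remain on disk for time travel until they are vacuumed.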
SailajaB
by Valued Contributor III
  • 19459 Views
  • 4 replies
  • 4 kudos

Unable to mount the blob storage account as soft delete got enabled

Hi Team, when we try to mount or access the blob storage where soft delete is enabled, it fails with the below error: org.apache.hadoop.fs.FileAlreadyExistsException: Operation failed: "This endpoint does not support BlobStorageEvents or So...

Latest Reply
-werners-
Esteemed Contributor III
  • 4 kudos

Jeez, I was planning on enabling soft delete on our ADLS Gen2, but I think I will wait a while after reading this.

  • 4 kudos
3 More Replies
JoeWMP
by New Contributor III
  • 3848 Views
  • 5 replies
  • 1 kudos

Resolved! Databricks Job ID's increasing in massive sequence gaps

Has anyone seen something like this before? Today around midnight, our Job IDs started increasing in increments of quadrillions - was this a new change to how Job IDs are generated?

Latest Reply
JoeWMP
New Contributor III
  • 1 kudos

Thank you Ravi! Glad that this confirms my understanding

  • 1 kudos
4 More Replies
