Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
Data + AI Summit 2024 - Data Engineering & Streaming

Forum Posts

Ben_Spark
by New Contributor III
  • 6743 Views
  • 4 replies
  • 2 kudos

Resolved! Databricks Spark XML parser : support for namespace declared at the ancestor level.

I'm trying to use the Spark-XML API and I'm facing an issue with the XSD validation option. When I parse an XML file using the "rowValidationXSDPath" option, the parser can't recognize the prefixes/namespaces declared at the root level. For this to...

Latest Reply
Ben_Spark
New Contributor III
  • 2 kudos

Hi, sorry for the late response; I got busy looking for a permanent solution to this problem. In the end we are giving up on the XSD path parser. This option does not work when prefixes/namespaces are declared at the ancestor level. Thank you anyway for ...

  • 2 kudos
3 More Replies
CrisBerg_65149
by New Contributor III
  • 3927 Views
  • 4 replies
  • 6 kudos

Resolved! SELECT * FROM delta doesn't work on Spark 3.2

Using DBR 10 or later, I’m getting an error when running the following query: SELECT * FROM delta.`s3://some_path` The error is org.apache.spark.SparkException: Unable to fetch tables of db delta. For 3.2.0+ they recommend reading like this: CREATE TEMPORAR...

Latest Reply
CrisBerg_65149
New Contributor III
  • 6 kudos

Got support from Databricks. Unfortunately, someone had created a DB called delta, so the query ran against that DB instead. The issue was solved.
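A quick way to check for this kind of name collision is to list the databases in the metastore and look for one called `delta`. The sketch below is only an illustration: the helper names `shadows_delta_prefix` and `check_session` are hypothetical, and it assumes an existing `spark` session is passed in; `spark.catalog.listDatabases()` is the standard PySpark catalog call.

```python
def shadows_delta_prefix(database_names, prefix="delta"):
    """Return True if any database name equals the path-query prefix (case-insensitive)."""
    return any(name.lower() == prefix.lower() for name in database_names)


def check_session(spark):
    """Hypothetical helper: inspect the databases visible to a live SparkSession."""
    names = [db.name for db in spark.catalog.listDatabases()]
    return shadows_delta_prefix(names)
```

If `check_session(spark)` returns True, a query of the form SELECT * FROM delta.`<path>` may be resolved against that database rather than read as a path, which matches what the reply describes.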

  • 6 kudos
3 More Replies
Gerhard
by New Contributor III
  • 5080 Views
  • 9 replies
  • 5 kudos

Overall security/access rights concept needed (combine Table Access Control and Credential Passthrough), how to allow users the benefits of both worlds

What we have: Databricks Workspace Premium on Azure; ADLS Gen2 storage for raw data, processed data (tables) and files like CSV, models, etc. What we want to do: We have users that want to work on Databricks to create and work with Python algorithms. We d...

Latest Reply
Gerhard
New Contributor III
  • 5 kudos

Hey @Vartika Nain, we are still in the same situation as described above. The Hive Metastore is a weak point. I would love to have the functionality that a mount can be dedicated to a given cluster. Regards, Gerhard

  • 5 kudos
8 More Replies
Reza
by New Contributor III
  • 6609 Views
  • 4 replies
  • 4 kudos

Datepicker widget

There are textbox and dropdown list widgets in Databricks. Is there any datepicker widget? If not, is there any plan to add it?

Latest Reply
Hubert-Dudek
Esteemed Contributor III
  • 4 kudos

@Reza Rajabi, no, there is not. I think someone discussed it at some meeting. We can ask about it during the following office hours: https://databricks.com/p/webinar/databricks-office-hours?utm_source=databricks&utm_medium=email&utm_campaign=7013f0000...

  • 4 kudos
3 More Replies
laus
by New Contributor III
  • 20437 Views
  • 3 replies
  • 2 kudos

Resolved! get a "Py4JJavaError: An error occurred while calling o5082.csv." when trying to save to csv file.

Hi, I'm trying to save a dataframe to CSV with the code below: output.coalesce(1).write.mode('overwrite').option('header', 'true').csv(tmp_file_path) But I get a "Py4JJavaError: An error occurred while calling o5082.csv." error. Any idea how to solve...

2 More Replies
Rahul_Samant
by Contributor
  • 10953 Views
  • 4 replies
  • 4 kudos

Resolved! Bucketing on Delta Tables

I am getting the error below while creating buckets on a Delta table. Error in SQL statement: AnalysisException: Delta bucketed tables are not supported. I have had to fall back to a Parquet table for some use cases because of this. Is there any alternative? I have...

Latest Reply
Anonymous
Not applicable
  • 4 kudos

Hi @Rahul Samant, we checked internally on this. Due to certain limitations, bucketing is not supported on Delta tables; the only alternative to bucketing is to leverage Z-ordering. Below is the link for reference: https://docs.databricks.com/de...

  • 4 kudos
3 More Replies
Michael_Galli
by Contributor III
  • 4354 Views
  • 3 replies
  • 2 kudos

Resolved! Spark Streaming - only process new files in streaming path?

In our streaming jobs, we currently run streaming (cloudFiles format) on a directory with sales transactions coming every 5 minutes. In this directory, the transactions are ordered in the following format: <streaming-checkpoint-root>/<transaction_date>...

Latest Reply
Michael_Galli
Contributor III
  • 2 kudos

Update: It seems that maxFileAge was not a good idea. The following, with the option "includeExistingFiles" = False, solved my problem: streaming_df = ( spark.readStream.format("cloudFiles") .option("cloudFiles.format", extension) .option("...
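Since the snippet above is cut off, here is a minimal sketch of the same idea, not the poster's exact code. It assumes the full option key is `cloudFiles.includeExistingFiles` (the Auto Loader option name); the helper names `new_files_only_options` and `read_new_files`, and the `input_path` and `schema_location` parameters, are hypothetical.

```python
def new_files_only_options(file_format, schema_location=None):
    """Build Auto Loader options that skip files already present in the input path."""
    options = {
        "cloudFiles.format": file_format,
        # Assumption: ignore files that existed before the stream first started.
        "cloudFiles.includeExistingFiles": "false",
    }
    if schema_location is not None:
        options["cloudFiles.schemaLocation"] = schema_location
    return options


def read_new_files(spark, input_path, file_format, schema_location=None):
    """Hypothetical helper: start a cloudFiles stream on an existing SparkSession."""
    return (
        spark.readStream.format("cloudFiles")
        .options(**new_files_only_options(file_format, schema_location))
        .load(input_path)
    )
```

As far as I know, this option is only evaluated the first time a stream starts, so it would not change the behavior of a stream that already has a checkpoint.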

  • 2 kudos
2 More Replies
LightUp
by New Contributor III
  • 6714 Views
  • 2 replies
  • 4 kudos

Converting SQL Code to SQL Databricks

I am new to Databricks. Please excuse my ignorance. My requirement is to convert the SQL query below into Databricks SQL. The query reads from the EventLog table and its output goes into EventSummary. These queries can be found here. CREATE TABL...

image
Latest Reply
LightUp
New Contributor III
  • 4 kudos

Thank you @Joseph Kambourakis. The part that is not clear to me is how to rework the part circled in the image above. Even this part of the code does not work in Databricks: DATEADD(month, DATEDIFF(month, 0, DATEADD(month , 1 , EventStartDateTi...
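The expression is cut off, so the following is a guess at its intent. In T-SQL, DATEADD(month, DATEDIFF(month, 0, x), 0) is a common idiom for truncating x to the first day of its month; wrapped around DATEADD(month, 1, ...), it would give the first day of the following month. Assuming that is what the query wants, the same calculation can be sketched in plain Python to make the expected result concrete (the function name `first_of_next_month` is hypothetical):

```python
from datetime import datetime


def first_of_next_month(ts):
    """Return midnight on the first day of the month after ts."""
    if ts.month == 12:
        # December rolls over into January of the next year.
        return datetime(ts.year + 1, 1, 1)
    return datetime(ts.year, ts.month + 1, 1)
```

In Databricks SQL, one way to express the same thing, under the same assumption, would be date_trunc('month', add_months(ts, 1)), where ts stands for the timestamp column; both are built-in Spark SQL functions.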

  • 4 kudos
1 More Replies
AvijitDey
by New Contributor III
  • 4599 Views
  • 3 replies
  • 4 kudos

Resolved! Azure Databricks SQL bulk insert to AZ SQL

Env: Azure Databricks, version: 9.1 LTS (includes Apache Spark 3.1.2, Scala 2.12). Worker type: 56 GB memory, 2-8 nodes (Standard D13_V2). No. of rows: 2470350, with 115 columns. Size: 2.2 GB. Time taken: approx. 9 min. Python code. What will be the best approach for...

Latest Reply
AvijitDey
New Contributor III
  • 4 kudos

Any further suggestions?

  • 4 kudos
2 More Replies
reedzhang
by New Contributor III
  • 3977 Views
  • 4 replies
  • 3 kudos

Resolved! uninstalled libraries continue to get installed on cluster startup

We have been trying to update some library versions by uninstalling the old versions and installing new ones. However, the old libraries continue to get installed on cluster startup despite not showing up in the "libraries" tab of the cluster page. W...

Latest Reply
reedzhang
New Contributor III
  • 3 kudos

The issue seemed to go away on its own. At some point the libraries page started showing what was getting installed to the cluster, and removing libraries from the page caused them to stop getting installed on cluster startup. I'm guessing there was ...

  • 3 kudos
3 More Replies
tomnguyen_195
by New Contributor III
  • 3398 Views
  • 4 replies
  • 7 kudos

Resolved! Increase input rate in Delta Live Tables

Hi, I need to ingest 60 million JSON files from S3 and have created a Delta Live Tables pipeline to ingest this data into a Delta table with Auto Loader. However, the input rate in my DLT is always around 8 records/second no matter how many workers I add to the DLT...

Latest Reply
Hubert-Dudek
Esteemed Contributor III
  • 7 kudos

Please consider the following: use a driver 2 times bigger than the workers; check whether S3 is in the same region and is communicating via the private gateway (local IPs); enable S3 transfer acceleration; for ingestion, please use Auto Loader as described here...

  • 7 kudos
3 More Replies
Bill
by New Contributor III
  • 2448 Views
  • 5 replies
  • 2 kudos

Resolved! How to access tables created in 2017

In 2017, while working on my Master's degree, I created some tables that I would like to access again. Back then I could just write SQL and find them, but today that doesn't work. I suspect it has something to do with Delta Lake. What do I have to do to...

Latest Reply
Bill
New Contributor III
  • 2 kudos

That did it. Thanks

  • 2 kudos
4 More Replies
Anonymous
by Not applicable
  • 1247 Views
  • 1 replies
  • 1 kudos

Resolved! Unable to start cluster on E2 Workspace

Hello Community, I'm trying to create and start my first cluster on my E2 Databricks Workspace on AWS; however, the cluster is created but immediately after STARTING its status goes to TERMINATING. Logs provided by Databricks show ...

Latest Reply
Anonymous
Not applicable
  • 1 kudos

Update: It was an error on my side with the KMS key.

  • 1 kudos
Taha_Hussain
by Databricks Employee
  • 1184 Views
  • 1 replies
  • 6 kudos

Databricks Office Hours Our next Office Hours session is scheduled for May 18th from 8:00 am - 9:00am PT. Do you have questions about how to set up or...

Databricks Office Hours. Our next Office Hours session is scheduled for May 18th from 8:00 am - 9:00 am PT. Do you have questions about how to set up or use Databricks? Do you want to learn more about the best practices for deploying your use case or tip...

Latest Reply
Hubert-Dudek
Esteemed Contributor III
  • 6 kudos

Just registered!

  • 6 kudos
