Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

ernijed
by New Contributor II
  • 9610 Views
  • 3 replies
  • 3 kudos

Resolved! Error in SQL statement: SparkFatalException. How to fix it?

When I try to execute a SQL query (2 joins) I get the message below: com.databricks.backend.common.rpc.DatabricksExceptions$SQLExecutionException: org.apache.spark.util.SparkFatalException at org.apache.spark.sql.execution.exchange.BroadcastExchangeExec.$a...

Latest Reply
Hubert-Dudek
Databricks MVP
  • 3 kudos

@Erni Jed, I tested your query and it is OK, so it has to be some other issue. Maybe you could try it on a smaller data set. Please also analyze/debug using the Spark UI.
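For reference, a hedged sketch (not the confirmed fix) of two common mitigations when a query fails inside BroadcastExchangeExec: raising the broadcast timeout, or disabling automatic broadcast joins. The table names in the query are hypothetical placeholders.

```python
# A hedged sketch: two common mitigations for failures raised in
# BroadcastExchangeExec. Table names below are placeholders.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Option 1: give the broadcast exchange more time (default is 300 s).
spark.conf.set("spark.sql.broadcastTimeout", "1200")

# Option 2: disable automatic broadcast joins so Spark falls back to a
# shuffle join instead of broadcasting one side of the join.
spark.conf.set("spark.sql.autoBroadcastJoinThreshold", "-1")

# Re-run the failing two-join query (placeholder names).
spark.sql("""
    SELECT a.id, b.val, c.val
    FROM table_a a
    JOIN table_b b ON a.id = b.id
    JOIN table_c c ON a.id = c.id
""").show()
```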

2 More Replies
Surendra
by New Contributor III
  • 12794 Views
  • 3 replies
  • 6 kudos

Resolved! Databricks notebook is taking 2 hours to write to /dbfs/mnt (blob storage). The same job takes 8 minutes to write to /dbfs/FileStore. I would like to understand why write performance differs between the two.

Problem statement:
  • Source file format: .tar.gz
  • Avg size: 10 MB
  • Number of tar.gz files: 1000
  • Each tar.gz file contains around 20,000 CSV files.
Requirement: Untar the tar.gz files and write the CSV files to blob storage / an intermediate storage layer for further...
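For illustration, a minimal sketch of the untar step, assuming the archives are reachable through the /dbfs FUSE mount; all paths are hypothetical placeholders.

```python
# A minimal sketch, assuming the archives are reachable through the /dbfs
# FUSE mount. All paths are hypothetical placeholders.
import os
import tarfile

SRC_DIR = "/dbfs/mnt/landing/archives"   # ~1000 .tar.gz files, ~10 MB each
DST_DIR = "/dbfs/mnt/intermediate/csv"   # extracted CSV files land here

os.makedirs(DST_DIR, exist_ok=True)

for name in os.listdir(SRC_DIR):
    if not name.endswith(".tar.gz"):
        continue
    with tarfile.open(os.path.join(SRC_DIR, name), "r:gz") as tar:
        # Each archive contains around 20,000 small CSV files.
        tar.extractall(path=DST_DIR)
```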

Latest Reply
Surendra
New Contributor III
  • 6 kudos

@Hubert Dudek Thanks for your suggestions. After creating a storage account in the same region as Databricks, I can see that performance is as expected. Now it is clear that the issue was the /mnt/ location being in a different region than Databricks. I would ...
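For reference, one way to confirm which storage account (and hence which region) backs each mount is to list the mount sources from a notebook:

```python
# Runs in a Databricks notebook, where dbutils is available by default.
# The source URL of each mount reveals the backing storage account, which
# you can then compare against the workspace's region.
for m in dbutils.fs.mounts():
    print(m.mountPoint, "->", m.source)
```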

2 More Replies
arkadiuszr
by New Contributor III
  • 4325 Views
  • 3 replies
  • 1 kudos

Resolved! Failure during cluster launch

Hi all, I am migrating to Databricks E2 from an older deployment. I moved the cluster definitions from the old Databricks instance as well as creating new ones. Databricks tries to start a cluster for an hour and then fails. This happens for modes: Single Node ...

Latest Reply
Hubert-Dudek
Databricks MVP
  • 1 kudos

Please check:
  • CPU quotas - please request to increase them anyway (https://go.aws/3EvY1fX) and use pools to have better control, as old instances can linger for a moment after termination.
  • The network configuration. Maybe it is downloading somethin...

2 More Replies
rohit2
by New Contributor
  • 1207 Views
  • 0 replies
  • 0 kudos

Getting this issue - how to resolve it?

Run result unavailable: job failed with error message: Unexpected failure while waiting for the cluster (0425-153803-z370dv77) to be ready. Cause: Unexpected state for cluster (job-1136322-run-1778866): Init scripts failed. instance_id: i-00d2e3661a2420...

Maverick1
by Valued Contributor II
  • 4837 Views
  • 3 replies
  • 6 kudos

How to deploy MLflow models to SageMaker endpoints where SageMaker refers to a private Docker registry?

Is it possible to deploy an MLflow model to a SageMaker endpoint where the image URL does not refer to an image in ECR, but the image is actually present in a private Docker registry?

Latest Reply
Atanu
Databricks Employee
  • 6 kudos

@Saurabh Verma, use this to create the endpoint; also, check this out - https://github.com/mlflow/mlflow/blob/0fa849ad75e5733bf76cc14a4455657c5c32f107/mlflow/sagemaker/__init__.py#L361
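For illustration, a hedged sketch using the (older) mlflow.sagemaker.deploy API defined in the linked source file. The endpoint name, model URI, image URL, and IAM role ARN are all hypothetical placeholders; note that SageMaker normally pulls images from ECR, so serving from an arbitrary private registry may require additional SageMaker-side configuration.

```python
# A hedged sketch, assuming the older mlflow.sagemaker.deploy API from the
# linked source file. All names, ARNs, and the image URL are placeholders.
import mlflow.sagemaker

mlflow.sagemaker.deploy(
    app_name="my-endpoint",                                  # placeholder
    model_uri="models:/my_model/Production",                 # placeholder
    image_url="my-registry.example.com/mlflow-pyfunc:1.0",   # private image
    execution_role_arn="arn:aws:iam::123456789012:role/sagemaker-role",
    region_name="us-east-1",
    mode="create",
)
```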

2 More Replies
IgnacioCastinei
by New Contributor III
  • 67289 Views
  • 9 replies
  • 5 kudos

Resolved! Download a dbfs:/FileStore File to my Local Machine?

Hi all, I am using saveAsTextFile() to store the results of a Spark job in the folder dbfs:/FileStore/my_result. I can access the different "part-xxxxx" files using the web browser, but I would like to automate the process of downloading all fil...

Latest Reply
CraigJ
New Contributor II
  • 5 kudos

This works well if the file is stored in FileStore. However, if it is stored in the mnt folder, you will need something like this: https://community.cloud.databricks.com/dbfs/mnt/blob/<file_name>.csv?o=<your_number_here> Note that this will prompt you for yo...
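To automate the download rather than use the browser, a minimal sketch that pulls a DBFS file to the local machine through the DBFS REST API, reading in 1 MB chunks (the API's per-call limit). HOST, TOKEN, and the file path are placeholders.

```python
# A minimal sketch: download a DBFS file locally via the DBFS REST API.
# HOST, TOKEN, and SRC are placeholders for your workspace URL, a personal
# access token, and the DBFS path to fetch.
import base64
import requests

HOST = "https://<your-workspace>.cloud.databricks.com"  # placeholder
TOKEN = "<personal-access-token>"                        # placeholder
SRC = "/FileStore/my_result/part-00000"                  # placeholder
CHUNK = 1024 * 1024  # dbfs/read returns at most 1 MB per request

offset = 0
with open("part-00000", "wb") as out:
    while True:
        resp = requests.get(
            f"{HOST}/api/2.0/dbfs/read",
            headers={"Authorization": f"Bearer {TOKEN}"},
            params={"path": SRC, "offset": offset, "length": CHUNK},
        )
        resp.raise_for_status()
        payload = resp.json()
        if payload["bytes_read"] == 0:
            break  # end of file reached
        out.write(base64.b64decode(payload["data"]))
        offset += payload["bytes_read"]
```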

8 More Replies
dewan
by New Contributor
  • 1027 Views
  • 0 replies
  • 0 kudos

SIMEXBangladesh

SIMEX Bangladesh is one of the trusted construction companies in Bangladesh, always striving to build a safe ecosystem in the construction industry. For more details: https://simex.com.bd/highway-construction-company-in-bangladesh/

gideonvos
by New Contributor
  • 1178 Views
  • 0 replies
  • 0 kudos

Databricks workspace API metadata

Hi, the API works great. However, when listing workspaces via the API, it would be great to also be able to get back extra metadata, for example the last modification date. Is this possible?

User16826992666
by Databricks Employee
  • 3253 Views
  • 3 replies
  • 2 kudos

Resolved! What is the best method for bringing an already trained model into MLflow?

I already have a trained and saved model that was created outside of MLflow. What is the best way to handle it if I want this model to be added to an MLflow experiment?
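For illustration, a minimal sketch of logging an externally trained model into an MLflow experiment, assuming a scikit-learn model that was pickled outside MLflow; the file name and experiment path are hypothetical placeholders.

```python
# A minimal sketch, assuming a scikit-learn model pickled outside MLflow;
# the file name and experiment path are hypothetical placeholders.
import pickle
import mlflow
import mlflow.sklearn

with open("model.pkl", "rb") as f:  # model trained and saved elsewhere
    model = pickle.load(f)

mlflow.set_experiment("/Shared/imported-models")  # placeholder experiment
with mlflow.start_run(run_name="import-pretrained"):
    # Logging attaches the existing model to an MLflow run, after which it
    # can be registered or served like any model tracked from the start.
    mlflow.sklearn.log_model(model, artifact_path="model")
```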

Latest Reply
Anonymous
Not applicable
  • 2 kudos

Hi @Trevor Bishop, just wanted to check in: were you able to resolve your issue, or do you need more help? We'd love to hear from you. Thanks!

2 More Replies
sgannavaram
by New Contributor III
  • 13969 Views
  • 6 replies
  • 4 kudos

Resolved! How to get the last (previous) Databricks job run time?

How do I get the last Databricks job run time? I have a requirement where I need to pass the last job run time as an argument in SQL, and this SQL gets the records from a Snowflake database based on this timestamp.
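For reference, a hedged sketch that fetches the most recent completed run's start time through the Jobs REST API; HOST, TOKEN, and JOB_ID are placeholders. The resulting timestamp can then be bound into the Snowflake query.

```python
# A hedged sketch: fetch the most recent completed run's start time via
# the Jobs REST API. HOST, TOKEN, and JOB_ID are placeholders.
import datetime
import requests

HOST = "https://<your-workspace>.cloud.databricks.com"  # placeholder
TOKEN = "<personal-access-token>"                        # placeholder
JOB_ID = 123                                             # placeholder

resp = requests.get(
    f"{HOST}/api/2.1/jobs/runs/list",
    headers={"Authorization": f"Bearer {TOKEN}"},
    params={"job_id": JOB_ID, "completed_only": "true", "limit": 1},
)
resp.raise_for_status()
runs = resp.json().get("runs", [])
if runs:
    # start_time is epoch milliseconds.
    last_run = datetime.datetime.fromtimestamp(runs[0]["start_time"] / 1000)
    print(last_run.isoformat())
```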

Latest Reply
Anonymous
Not applicable
  • 4 kudos

Hey there @Srinivas Gannavaram, hope you are well. Just wanted to see if you were able to find an answer to your question, and if so, would you like to mark an answer as best? It would be really helpful for the other members. Cheers!

5 More Replies
athjain
by New Contributor III
  • 4661 Views
  • 5 replies
  • 9 kudos

Resolved! Control visibility of Delta tables at the SQL endpoint

Hi Community, let's take a scenario where data from S3 is read to create Delta tables that are then stored on DBFS. To query these Delta tables we used the SQL endpoint, from where all the Delta tables are visible, but we need to control which all ...
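For illustration, a minimal sketch using table access control statements, assuming table ACLs are enabled for the endpoint; the table and group names are hypothetical placeholders.

```python
# A minimal sketch, assuming table access control is enabled for the
# endpoint; table and group names are hypothetical. Once ACLs are
# enforced, only granted objects are visible and queryable.
# Runs in a Databricks notebook, where spark is predefined.
spark.sql("GRANT SELECT ON TABLE sales_db.orders TO `analysts`")
spark.sql("REVOKE SELECT ON TABLE sales_db.orders FROM `interns`")
```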

Latest Reply
Anonymous
Not applicable
  • 9 kudos

Hey @Athlestan Jain, just checking in. Do you think you were able to find a solution to your problem from the above answers? If yes, would you be happy to mark it as best so that other members can find the solution more quickly? Thank you!

4 More Replies
Michael_Galli
by Databricks Partner
  • 5541 Views
  • 1 reply
  • 1 kudos

Resolved! Pipelines with a lot of Spark caching - best practices for cleanup?

We have a situation where many concurrent Azure Data Factory notebooks are running in one single Databricks interactive cluster (Azure E8-series driver, 1-10 E4-series workers autoscaling). Each notebook reads data and does a dataframe.cache(), just to ...

Latest Reply
Hubert-Dudek
Databricks MVP
  • 1 kudos

This cache is dynamically saved to disk if there is no room in memory, so I don't see it as an issue. However, the best practice is to call the unpersist() method in your code after caching. As in the example below my answer, the cache/persist method ...
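For reference, a minimal sketch of the cache, use, unpersist pattern this reply recommends; the input path is a hypothetical placeholder.

```python
# A minimal sketch of cache -> use -> unpersist; the input path is a
# placeholder. Runs in a Databricks notebook, where spark is predefined.
df = spark.read.parquet("/mnt/data/events")  # placeholder input

df.cache()                            # cached lazily, on the first action
total = df.count()                    # first action populates the cache
daily = df.groupBy("date").count()    # subsequent work reuses the cache

df.unpersist()                        # release memory/disk when done

# Or, to drop every cached object in the session at the end of a notebook:
spark.catalog.clearCache()
```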

hiral_jasani
by Databricks Employee
  • 961 Views
  • 0 replies
  • 0 kudos

Hands-On Workshop: Simplify Data Integration for the Modern Data Stack

Do you have a lot of data that is stuck in your source systems? Data engineers too bottlenecked to build another ingest pipeline? Join us for a live, hands-on workshop on building...
