Data Engineering

Forum Posts

Sorted by:

by LearningDatabri • Contributor II

07-05-2022 6:36:26 AM

9361 Views
7 replies
2 kudos

Resolved! Unable to read file from S3

I tried to read a file from S3, but facing the below error:org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 53.0 failed 4 times, most recent failure: Lost task 0.3 in stage 53.0 (TID 82, xx.xx.xx.xx, executor 0): com...

Data Engineering

9361 Views
7 replies
2 kudos

07-05-2022 6:36:26 AM

View Replies

Latest Reply

Sivaprasad1
Databricks Employee

07-05-2022 12:33:36 PM

2 kudos

Which DBR version are you using? Could you please test it with a different DBR version probably DBR 9.x?

2 kudos

07-05-2022 12:33:36 PM

6 More Replies

by THIAM_HUATTAN • Valued Contributor

06-29-2022 1:57:34 AM

3532 Views
4 replies
6 kudos

Resolved! why this not able to go through?

https://textdoc.co/index.php/UFEQdwxWn60LtOVfError:https://textdoc.co/index.php/3JisnHKGkvLIaAOF

Data Engineering

3532 Views
4 replies
6 kudos

06-29-2022 1:57:34 AM

View Replies

Latest Reply

Hubert-Dudek
Esteemed Contributor III

06-29-2022 7:20:54 AM

6 kudos

It would be best if you used Databricks ML runtime (in cluster settings), not the standard one.

6 kudos

06-29-2022 7:20:54 AM

3 More Replies

by THIAM_HUATTAN • Valued Contributor

06-29-2022 5:42:51 AM

2620 Views
2 replies
0 kudos

Resolved! Save data from Spark DataFrames to TFRecords

https://docs.microsoft.com/en-us/azure/databricks/_static/notebooks/deep-learning/tfrecords-save-load.htmlI could not run the Cell # 2java.lang.ClassNotFoundException: --------------------------------------------------------------------------- Py4JJ...

Data Engineering

2620 Views
2 replies
0 kudos

06-29-2022 5:42:51 AM

View Replies

Latest Reply

jose_gonzalez
Databricks Employee

07-05-2022 10:47:39 AM

0 kudos

Hi @THIAM HUAT TAN,Which DBR version are you using? are you using the ML runtime?

0 kudos

07-05-2022 10:47:39 AM

1 More Replies

by User16826994223 • Databricks Employee

06-28-2021 4:31:13 AM

2999 Views
1 replies
1 kudos

Unity Catalog will allow you to bring your own HMS

Anyone know more about how the Unity Catalog will allow you to bring your own HMS (eg Glue)?Will this be treated as a separate 'catalog', which you can access but you can't use the other features of Unity Catalog on eg ABAC etcAny reading on this top...

Data Engineering

2999 Views
1 replies
1 kudos

06-28-2021 4:31:13 AM

View Replies

Latest Reply

zpappa
Databricks Employee

07-05-2022 5:27:05 PM

1 kudos

@Kunal Gaurav yes, it is treated as a synthetic catalog. You can query it by using the convention "hive_metastore" as the catalog name. i.e. SELECT * FROM hive_metastore.schema_name.table_nameThis will work for internal HMS, external HMS and Glue.Yo...

1 kudos

07-05-2022 5:27:05 PM

by PP1 • New Contributor II

06-29-2022 8:14:39 AM

3781 Views
2 replies
2 kudos

Resolved! How would you integrate Unity Catalog with existing Data governance solutions like Collibra?

Data Engineering

3781 Views
2 replies
2 kudos

06-29-2022 8:14:39 AM

View Replies

Latest Reply

zpappa
Databricks Employee

07-05-2022 5:23:20 PM

2 kudos

@Prashanth P We offer a fully featured REST API with Unity Catalog that provides the ability to CRUD objects such as catalogs/schemas/tables/acls/lineage etc.Companies like Colliba/Alation/MS Purview etc use these in middleware integrations to integ...

2 kudos

07-05-2022 5:23:20 PM

1 More Replies

by bsun • New Contributor

06-28-2022 3:44:57 PM

2312 Views
2 replies
1 kudos

Dbt vs databricks sql

Can I use db sql to replace dbt?

Data Engineering

2312 Views
2 replies
1 kudos

06-28-2022 3:44:57 PM

View Replies

Latest Reply

anthony_tellez
New Contributor II

06-28-2022 4:15:35 PM

1 kudos

Yes. Just use sql in the notebooks to orchestrate your sql job.

1 kudos

06-28-2022 4:15:35 PM

1 More Replies

by OliverLewis • New Contributor

06-29-2022 9:51:16 AM

3183 Views
2 replies
1 kudos

Parallelize spark jobs on the same cluster?

Whats the best way to parallelize multiple spark jobs on the same cluster during a backfill?

Data Engineering

3183 Views
2 replies
1 kudos

06-29-2022 9:51:16 AM

View Replies

Latest Reply

ron_defreitas
Contributor

06-29-2022 11:45:45 AM

1 kudos

In the past I used direct multi-threaded orchestration inside of driver applications, but that was prior to Databricks supporting multi-task jobs.If you create a job through the workflows tab, you can set up multiple notebooks, python, or jar tasks t...

1 kudos

06-29-2022 11:45:45 AM

1 More Replies

by ofirski • New Contributor

06-29-2022 9:56:37 AM

3345 Views
3 replies
3 kudos

Resolved! Databricks Workflows

How do I get started with Databricks Workflows?

Data Engineering

3345 Views
3 replies
3 kudos

06-29-2022 9:56:37 AM

View Replies

Latest Reply

Mohit_m
Databricks Employee

07-05-2022 3:30:49 AM

3 kudos

One more example herehttps://www.youtube.com/watch?v=H2FS4ijpFZA

3 kudos

07-05-2022 3:30:49 AM

2 More Replies

by mortenhaga • Contributor

06-29-2022 1:35:01 AM

5229 Views
4 replies
4 kudos

Resolved! SQL Serverless Endpoint failing to start with Instance Profile

Hi allSuper stoked about the PP of SQL Serverless, but it does seem that the instance profile Im using doesnt have the required trust relationship for it to work with the Sererless Endpoint. Altough on "classic" mode it works fine. Does Serverless re...

Data Engineering

5229 Views
4 replies
4 kudos

06-29-2022 1:35:01 AM

View Replies

Latest Reply

Anonymous
Not applicable

07-05-2022 4:23:16 AM

4 kudos

Thank you for sharing your valuable solution, it's work properly.mymilestonecard

4 kudos

07-05-2022 4:23:16 AM

3 More Replies

by Prabakar • Databricks Employee

10-25-2021 9:03:24 AM

9494 Views
2 replies
7 kudos

Resolved! Library installation fails with mirror sync issue

While trying to install ffmpeg package using an init script on Databricks cluster, it fails with the below error.Init script:#! /bin/bash set -e sudo apt-get update sudo apt-get -y install ffmpegError message:E: Failed to fetch http://security.ubuntu...

Data Engineering

9494 Views
2 replies
7 kudos

10-25-2021 9:03:24 AM

View Replies

Latest Reply

Prabakar
Databricks Employee

10-25-2021 9:08:38 AM

7 kudos

Cause: The VMs are pointing to the cached old mirror which is not up-to-date. Hence there is a problem with downloading the package and it's failing. Workaround: Use the below init script to install the package "ffmpeg". To revert to the original lis...

7 kudos

10-25-2021 9:08:38 AM

1 More Replies

by Sunny • New Contributor III

06-08-2022 10:55:55 AM

9565 Views
7 replies
4 kudos

Resolved! Retrieve job id and run id from scala

I need to retrieve job id and run id of the job from a jar file in Scala.When I try to compile below code in IntelliJ, below error is shown.import com.databricks.dbutils_v1.DBUtilsHolder.dbutils object MainSNL { @throws(classOf[Exception]) de...

Data Engineering

9565 Views
7 replies
4 kudos

06-08-2022 10:55:55 AM

View Replies

Latest Reply

Mohit_m
Databricks Employee

07-05-2022 3:08:22 AM

4 kudos

Maybe its worth going through the Task Parameter variables section of the below dochttps://docs.databricks.com/data-engineering/jobs/jobs.html#task-parameter-variables

4 kudos

07-05-2022 3:08:22 AM

6 More Replies

by Mohit_m • Databricks Employee

07-05-2022 3:03:27 AM

5922 Views
1 replies
2 kudos

Resolved! Databricks jobs create API throws unexpected error

Databricks jobs create API throws unexpected errorError response :{"error_code": "INVALID_PARAMETER_VALUE","message": "Cluster validation error: Missing required field: settings.cluster_spec.new_cluster.size"}Any idea on this?

Data Engineering

5922 Views
1 replies
2 kudos

07-05-2022 3:03:27 AM

View Replies

Latest Reply

Mohit_m
Databricks Employee

07-05-2022 3:04:20 AM

2 kudos

Could you please specify num_workers in the json body and try API again.Also, another recommendation can be configuring what you want in UI, and then pressing “JSON” button that should show corresponding JSON which you can use for API

2 kudos

07-05-2022 3:04:20 AM

by lav • New Contributor III

07-04-2022 10:24:57 PM

1716 Views
1 replies
1 kudos

Correlated Column Exception in Spark SQL

Hi Johan,Were you able to resolve the correlated column exception issue? I have been stuck on this since past week. If you can guide me that will be alot of help.Thanks.

Data Engineering

1716 Views
1 replies
1 kudos

07-04-2022 10:24:57 PM

View Replies

Latest Reply

Johan_Van_Noten
New Contributor III

07-05-2022 12:24:11 AM

1 kudos

Seems to be a duplicate of your comment on https://community.databricks.com/s/question/0D53f00001XCuCACA1/correlated-column-exception-in-sql-udf-when-using-udf-parameters. I guess you did that to be able to put other tags?

1 kudos

07-05-2022 12:24:11 AM

by darshan • New Contributor III

06-27-2022 6:29:42 AM

26308 Views
13 replies
12 kudos

Resolved! Is there a way to run notebooks concurrently in same session?

tried using-dbutils.notebook.run(notebook.path, notebook.timeout, notebook.parameters)but it takes 20 seconds to start new session. %run uses same session but cannot figure out how to use it to run notebooks concurrently.

Data Engineering

26308 Views
13 replies
12 kudos

06-27-2022 6:29:42 AM

View Replies

Latest Reply

rudesingh56
New Contributor II

07-04-2022 6:49:31 AM

12 kudos

I’ve been struggling with opening multiple browser sessions to open more than one notebook at a time.

12 kudos

07-04-2022 6:49:31 AM

12 More Replies

by Hubert-Dudek • Esteemed Contributor III

07-03-2022 10:52:20 AM

1331 Views
0 replies
20 kudos

My favorite feature announced at Data+AI Summit is Connect from everywhere. I have been dreaming for a long time about sending SQL queries to SQL endp...

My favorite feature announced at Data+AI Summit is Connect from everywhere. I have been dreaming for a long time about sending SQL queries to SQL endpoint via API/SDK

Data Engineering

1331 Views
0 replies
20 kudos

07-03-2022 10:52:20 AM

Databricks Community

Forum Posts

Resolved! Unable to read file from S3

Resolved! why this not able to go through?

Resolved! Save data from Spark DataFrames to TFRecords

Unity Catalog will allow you to bring your own HMS

Resolved! How would you integrate Unity Catalog with existing Data governance solutions like Collibra?

Dbt vs databricks sql

Parallelize spark jobs on the same cluster?

Resolved! Databricks Workflows

Resolved! SQL Serverless Endpoint failing to start with Instance Profile

Resolved! Library installation fails with mirror sync issue

Resolved! Retrieve job id and run id from scala

Resolved! Databricks jobs create API throws unexpected error

Correlated Column Exception in Spark SQL

Resolved! Is there a way to run notebooks concurrently in same session?

My favorite feature announced at Data+AI Summit is Connect from everywhere. I have been dreaming for a long time about sending SQL queries to SQL endp...

Join Us as a Local Community Builder!

Databricks AutoLoader IncrementalListing mode chan...

Establishing a Connection between ADLS Gen2, Datab...

Simple append only in DLT

How to restrict the values permitted in a job or t...

how to avoid extra column after retry upon Unknown...