Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

FabriceDeseyn
by Contributor
  • 816 Views
  • 1 replies
  • 0 kudos

Bug - data profile internal code

Hi, I am not sure how to post a potential bug, but I stumbled upon the following issue on DBR 13.2. The same code 'sometimes' works on DBR 12.2 LTS, but if I run it on a real table, this issue always occurs.

Latest Reply
mathan_pillai
Databricks Employee
  • 0 kudos

I tried reproducing the issue on DBR 13.2 but was unable to; find the screenshot attached. How intermittently is the issue occurring?

Remit
by New Contributor III
  • 3263 Views
  • 1 replies
  • 0 kudos

Resolved! Merge error in streaming case

I have a streaming case where I stream from two sources: source1 and source2. I write two separate streams to pick the data up from the landing area (step 1). Then I write two extra streams to apply some transformations in order to give them the same schem...
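
For context, a hedged sketch of the usual pattern for running a MERGE from a stream via foreachBatch; the table, source, and key names below are hypothetical:

```python
from delta.tables import DeltaTable

def upsert_batch(batch_df, batch_id):
    # MERGE each micro-batch into the target; "id" is a hypothetical key column.
    target = DeltaTable.forName(spark, "target_table")
    (
        target.alias("t")
        .merge(batch_df.alias("s"), "t.id = s.id")
        .whenMatchedUpdateAll()
        .whenNotMatchedInsertAll()
        .execute()
    )

source_stream = spark.readStream.table("source1_clean")  # hypothetical transformed source

(
    source_stream.writeStream
    .foreachBatch(upsert_batch)
    .option("checkpointLocation", "/tmp/checkpoints/merge")  # hypothetical path
    .start()
)
```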

Data Engineering
MERGE
streaming
Latest Reply
Remit
New Contributor III
  • 0 kudos

Solved the problem by changing the cluster settings. The whole thing works when disabling Photon Acceleration...

nyck33
by New Contributor II
  • 4016 Views
  • 0 replies
  • 0 kudos

snowflake python connector import error

```
---------------------------------------------------------------------------
ImportError                               Traceback (most recent call last)
File <command-1961894174266859>:1
----> 1 con = snowflake.connector.connect(
      2     user=USER,
      3     password=SNOWSQL_PWD,
      4     account=A...
```
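
The traceback suggests the connector package is missing or the import never ran. A minimal sketch of a working setup, assuming the snowflake-connector-python package is installed on the cluster; the credentials and account identifier below are hypothetical placeholders:

```python
# %pip install snowflake-connector-python   (run once on the cluster)
import snowflake.connector

# Hypothetical credentials; in practice pull these from a secret scope.
USER = "my_user"
SNOWSQL_PWD = dbutils.secrets.get("scope", "snowsql-pwd")
ACCOUNT = "xy12345.west-europe.azure"

con = snowflake.connector.connect(user=USER, password=SNOWSQL_PWD, account=ACCOUNT)
cur = con.cursor()
cur.execute("SELECT current_version()")
print(cur.fetchone())
```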

mwoods
by New Contributor III
  • 2283 Views
  • 2 replies
  • 2 kudos

Delta Live Tables error with Kafka SSL

We have a Spark streaming job that consumes data from a Kafka topic and writes out to Delta tables in Unity Catalog. Looking to refactor it to use Delta Live Tables, but it appears that it is not possible at present to have a DLT pipeline that can acc...
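
For reference, a hedged sketch of the kind of DLT source in question, assuming the SSL certificates are reachable from the pipeline; the broker, topic, paths, and secret names are hypothetical:

```python
import dlt
from pyspark.sql.functions import col

@dlt.table(name="kafka_raw")
def kafka_raw():
    return (
        spark.readStream.format("kafka")
        .option("kafka.bootstrap.servers", "broker1:9093")  # hypothetical broker
        .option("subscribe", "events")                      # hypothetical topic
        .option("kafka.security.protocol", "SSL")
        .option("kafka.ssl.truststore.location", "/dbfs/certs/truststore.jks")
        .option("kafka.ssl.truststore.password",
                dbutils.secrets.get("scope", "truststore-pwd"))
        .load()
        .select(col("key").cast("string"), col("value").cast("string"))
    )
```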

Latest Reply
gabriall
New Contributor II
  • 2 kudos

Indeed, it's already patched. You just have to configure your pipeline on the "preview" channel.

1 More Replies
Noosphera
by New Contributor III
  • 9177 Views
  • 0 replies
  • 0 kudos

Resolved! How to reinstantiate the Cloudformation template for AWS

Hi everyone! I am new to Databricks and had chosen to use the CloudFormation template to create my AWS workspace. I regretfully must admit I felt creative in the process and varied the suggested stack name, and that must have created errors which ended...

Data Engineering
AWS
Cloudformation template
Unity Catalog
Erik
by Valued Contributor III
  • 2357 Views
  • 0 replies
  • 0 kudos

Why not enable "decommissioning" in spark?

You can enable "decommissioning" in Spark, which causes it to remove work from a worker when it gets a notification from the cloud that the instance is going away (e.g. spot instances). This is disabled by default, but it seems like such a no-brainer to...
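
For reference, a sketch of the open-source Spark properties behind this feature, as they might appear in a cluster's Spark config box (an assumption on my part; availability and exact behavior vary by DBR version and cloud):

```
spark.decommission.enabled true
spark.storage.decommission.enabled true
spark.storage.decommission.rddBlocks.enabled true
spark.storage.decommission.shuffleBlocks.enabled true
```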

jimbo
by New Contributor II
  • 7704 Views
  • 0 replies
  • 0 kudos

Pyspark datatype missing microsecond precision last three SSS: h:mm:ss:SSSSSS - datetype

Hi all, we are having issues with the datetype data type in Spark when ingesting files. Effectively the source data has six digits of sub-second precision (microseconds), but the most we can extract from the data type is three. For example 12:03:23.123, but what is requ...
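
A minimal sketch, assuming the values arrive as strings and Spark 3.x datetime patterns: TimestampType itself stores microseconds, so precision is usually lost in parsing or display rather than in the type.

```python
from pyspark.sql.functions import col, date_format, to_timestamp

df = spark.createDataFrame([("2024-01-01 12:03:23.123456",)], ["raw"])

# Parse with a six-'S' fraction pattern, then format it back out.
parsed = df.withColumn("ts", to_timestamp(col("raw"), "yyyy-MM-dd HH:mm:ss.SSSSSS"))
(
    parsed
    .withColumn("display", date_format(col("ts"), "HH:mm:ss.SSSSSS"))
    .show(truncate=False)  # display -> 12:03:23.123456, all six digits intact
)
```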

Data Engineering
pyspark datetype precision missing
Sangram
by New Contributor III
  • 2026 Views
  • 0 replies
  • 0 kudos

Unable to mount ADLS gen2 to databricks file system

I am unable to mount an ADLS Gen2 storage path onto a Databricks storage path. It is throwing the error "unsupported azure scheme: abfss". May I know the reason? Below are the steps that I followed:
1. Create a service principal.
2. Store the service principal's s...
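
For comparison, a hedged sketch of the standard OAuth mount for ADLS Gen2; the secret scope, keys, tenant, container, and account names are hypothetical placeholders:

```python
configs = {
    "fs.azure.account.auth.type": "OAuth",
    "fs.azure.account.oauth.provider.type":
        "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
    "fs.azure.account.oauth2.client.id": dbutils.secrets.get("scope", "sp-client-id"),
    "fs.azure.account.oauth2.client.secret": dbutils.secrets.get("scope", "sp-secret"),
    "fs.azure.account.oauth2.client.endpoint":
        "https://login.microsoftonline.com/<tenant-id>/oauth2/token",
}

dbutils.fs.mount(
    source="abfss://<container>@<account>.dfs.core.windows.net/",
    mount_point="/mnt/mydata",
    extra_configs=configs,
)
```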

Rdipak
by New Contributor II
  • 1258 Views
  • 2 replies
  • 0 kudos

Delta live table blocks pipeline autoloader rate limit

I have created an ETL pipeline with DLT. My first step is to ingest into a raw Delta table using Auto Loader file notifications. When I have 20k notifications, the pipeline runs well across all stages. But when we have a surge in the number of messages, the pipeline waits...

Latest Reply
kulkpd
Contributor
  • 0 kudos

Did you try the following options: .option('cloudFiles.maxFilesPerTrigger', 10000) or maxBytesPerTrigger?
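
A sketch of where those rate-limit options would sit in an Auto Loader read, assuming S3 file notifications; the path is hypothetical:

```python
df = (
    spark.readStream.format("cloudFiles")
    .option("cloudFiles.format", "json")
    .option("cloudFiles.useNotifications", "true")
    .option("cloudFiles.maxFilesPerTrigger", 10000)  # cap files per micro-batch
    .option("cloudFiles.maxBytesPerTrigger", "10g")  # or cap by data volume
    .load("s3://my-bucket/landing/")                 # hypothetical path
)
```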

1 More Replies
kulkpd
by Contributor
  • 2044 Views
  • 2 replies
  • 2 kudos

Resolved! Autoloader with filenotification

I am using DLT with file notification, and the DLT job is fetching just one notification from the SQS queue at a time. My pipeline is expected to process 500K notifications per day but it is running hours behind. Any recommendations? spark.readStream.format("cloudFi...

Latest Reply
Rdipak
New Contributor II
  • 2 kudos

Can you set this value to a higher number and try? cloudFiles.fetchParallelism is 1 by default.
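
A sketch of that option in context (hypothetical path); the default of a single fetch thread is what throttles the SQS reads:

```python
df = (
    spark.readStream.format("cloudFiles")
    .option("cloudFiles.format", "json")
    .option("cloudFiles.useNotifications", "true")
    .option("cloudFiles.fetchParallelism", 8)  # threads fetching from SQS; default 1
    .load("s3://my-bucket/landing/")           # hypothetical path
)
```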

1 More Replies
AndrewSilver
by New Contributor II
  • 1016 Views
  • 1 replies
  • 1 kudos

Uncertainty on Databricks job variables: {{run_id}}, {{parent_run_id}}.

In Azure's Databricks jobs, {{run_id}} and {{parent_run_id}} serve as variables. In jobs with multiple tasks, {{run_id}} aligns with task_run_id, while {{parent_run_id}} matches job_run_id. In single-task jobs, {{parent_run_id}} aligns with task_run_...

Latest Reply
kulkpd
Contributor
  • 1 kudos

I am using a job with a single task and multiple retries. Upon job retry the run_id gets changed. I tried using {{parent_run_id}} but it never worked, so I switched to: val parentRunId = dbutils.notebook.getContext.tags("jobRunOriginalAttempt")
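
For comparison, a hedged sketch of the common pattern of passing the reference in explicitly as a task parameter; the parameter name parent_run_id is a choice here, not a fixed API:

```python
# In the job task's parameters, map a notebook widget to the variable, e.g.:
#   {"parent_run_id": "{{parent_run_id}}"}
# Databricks substitutes the value at run time; the notebook then reads it:
parent_run_id = dbutils.widgets.get("parent_run_id")
print(f"parent run id: {parent_run_id}")
```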

Shawn_Eary
by Contributor
  • 1390 Views
  • 0 replies
  • 0 kudos

Streaming Delta Live Tables Cluster Management

If I use code like this (at 8:56 in https://youtu.be/PIFL7W3DmaY?si=MWDSiC_bftoCh4sH&t=536):

CREATE STREAMING LIVE TABLE report
AS SELECT * FROM cloud_files("/mydata", "json")

to create a STREAMING Delta Live Table through the Workflows section of...
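
For reference, a hedged sketch of the same table in DLT's Python API (the table name and path come from the snippet above; the rest is standard Auto Loader usage):

```python
import dlt

@dlt.table(name="report")
def report():
    # Equivalent of: CREATE STREAMING LIVE TABLE report AS
    #   SELECT * FROM cloud_files("/mydata", "json")
    return (
        spark.readStream.format("cloudFiles")
        .option("cloudFiles.format", "json")
        .load("/mydata")
    )
```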

Direo
by Contributor
  • 7748 Views
  • 2 replies
  • 3 kudos

Resolved! JavaPackage object is not callable - pydeequ

Hi! When I run a notebook on Databricks, it throws the error "'JavaPackage' object is not callable", which points to the pydeequ library: /local_disk0/.ephemeral_nfs/envs/pythonEnv-3abbb1aa-ee5b-48da-aaf2-18f273299f52/lib/python3.8/site-packages/pydeequ/che...

Latest Reply
JSatiro
New Contributor II
  • 3 kudos

Hi. If you are struggling like I was, these were the steps I followed to make it work:
1. Created a cluster with Runtime 10.4 LTS, which has Spark version 3.2.1 (it should work with more recent runtimes, but be aware of the Spark version).
2. When cre...
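
A sketch of the environment piece that usually trips this up; it assumes the deequ JAR matching the cluster's Spark version is installed as a cluster library, since the 'JavaPackage' error typically means the JVM side is missing:

```python
import os

# pydeequ reads SPARK_VERSION to pick the matching deequ artifact; set it
# before importing pydeequ (assumption: Spark 3.2 per the runtime above).
os.environ["SPARK_VERSION"] = "3.2"

import pydeequ
from pydeequ.checks import Check, CheckLevel

# The deequ JAR (e.g. com.amazon.deequ:deequ:2.0.1-spark-3.2) must also be
# installed on the cluster; without it the JVM classes resolve to 'JavaPackage'.
check = Check(spark, CheckLevel.Error, "basic check")
```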

1 More Replies
NathanE
by New Contributor II
  • 2102 Views
  • 1 replies
  • 1 kudos

Time travel on views

Hello, at my company we design an application to analyze data, and we can do so on top of external databases such as Databricks. Our application caches some data in memory, and to avoid synchronization issues with the data on Databricks, we rely heavil...
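
For context, a hedged sketch of Delta time travel as it works on tables today (table name and timestamp are illustrative); views are the part that lacks this:

```python
# Reading a Delta *table* as of a point in time is supported:
df = (
    spark.read.format("delta")
    .option("timestampAsOf", "2024-01-01 00:00:00")
    .table("catalog.schema.my_table")  # hypothetical table
)

# The same option against a view fails, which is what makes consistent
# snapshots through views hard today.
```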

Latest Reply
karthik_p
Esteemed Contributor
  • 1 kudos

@NathanE As you said, based on the article below this may not be supported currently: https://docs.databricks.com/en/sql/user/materialized-views.html. But at the same time it looks as if a materialized view is built on top of a table and it is a synchronous operation (when...

