Data Engineering

Forum Posts

by OLAPTrader • New Contributor III

02-04-2024 1:26:32 PM

3989 Views
3 replies
1 kudos

Resolved! autoloader stops working if I do not drop table each time

I first create a catalog and schema and ingest some data into it as follows:
catalogName = 'neo'
schemaName = 'indicators'
locationPath = 's3a://databricks-workspace-olaptrader-stack-1-bucket/unity-catalog/99999999xxx'
sqlContext.sql(f"CREATE CATALOG IF N...

Latest Reply

OLAPTrader
New Contributor III

02-06-2024 7:29:42 PM

1 kudos

My issue was that the table has over 300 columns and, because of datatype mismatches, the rows were written to the table but all values were null. That's why I didn't get any errors. I am doing manual datatype mapping now and I am able t...
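The "manual datatype mapping" mentioned in this reply could look roughly like the sketch below. It is not the poster's code: the column names and types are hypothetical stand-ins (the real table has over 300 columns that the thread does not show), and the helper only builds a DDL-style schema string.

```python
# Hypothetical mapping of column name -> Spark SQL type.
# These four columns are illustrative; they do not come from the thread.
COLUMN_TYPES = {
    "symbol": "STRING",
    "trade_date": "DATE",
    "close": "DOUBLE",
    "volume": "BIGINT",
}


def build_schema_ddl(column_types):
    """Return a DDL schema string such as '`a` INT, `b` STRING'."""
    return ", ".join(f"`{name}` {dtype}" for name, dtype in column_types.items())


schema_ddl = build_schema_ddl(COLUMN_TYPES)
print(schema_ddl)
```

Assuming a Spark session is available, a string like this can be passed to a reader's `.schema(...)` call so that column types are declared explicitly rather than inferred.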

by Anonymous • Not applicable

06-17-2021 11:40:04 AM

3611 Views
2 replies
0 kudos

Resolved! Auto optimize config

Does auto-optimize work for existing tables only, or will it work for both existing and new tables when we enable it at the cluster config level?

Latest Reply

Mooune_DBU
Databricks Employee

06-17-2021 1:03:37 PM

0 kudos

If you're referring to Delta tables, Auto-Optimize will work for both.
For new tables:
CREATE TABLE student (id INT, name STRING, age INT) TBLPROPERTIES (delta.autoOptimize.optimizeWrite = true, delta.autoOptimize.autoCompact = true)
For existing tables...
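The reply is cut off before the existing-table case, so the following is not the reply's text. It is a small sketch, assuming the standard `ALTER TABLE ... SET TBLPROPERTIES` syntax, that generates a statement applying the same two properties shown in the `CREATE TABLE` example. The helper name `alter_table_statement` is made up for illustration.

```python
# The two table properties quoted in the reply's CREATE TABLE example.
AUTO_OPTIMIZE_PROPS = {
    "delta.autoOptimize.optimizeWrite": "true",
    "delta.autoOptimize.autoCompact": "true",
}


def alter_table_statement(table_name):
    """Build an ALTER TABLE statement that sets the auto-optimize properties."""
    props = ", ".join(f"{key} = {value}" for key, value in AUTO_OPTIMIZE_PROPS.items())
    return f"ALTER TABLE {table_name} SET TBLPROPERTIES ({props})"


# 'student' is the table name used in the reply.
print(alter_table_statement("student"))
```

In a notebook the resulting string could be run with `spark.sql(...)`; whether a cluster-level setting also covers tables without these properties is the open question of the thread and is not answered by this sketch.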

by karthik-kobai • New Contributor II

02-06-2024 11:18:49 AM

2378 Views
0 replies
0 kudos

Databricks-jdbc and vulnerabilities CVE-2021-36090 CVE-2023-6378 CVE-2023-6481

The latest version of Databricks-jdbc available through Maven (2.6.36) now has these three vulnerabilities:
https://www.cve.org/CVERecord?id=CVE-2021-36090
https://www.cve.org/CVERecord?id=CVE-2023-6378
https://www.cve.org/CVERecord?id=CVE-2023-6481
All ...

by Christoph • Databricks Partner

02-05-2024 7:55:16 AM

2096 Views
3 replies
0 kudos

Internal Error when querying a doubleType column of a delta table using ">" "<" operators

Hi there, we are currently facing a pretty confusing issue: We have a Delta table (~2TB) which has been working just fine over the last few years and months. For a few days or weeks now, querying the table on one of its columns, let's call it double_co...

Latest Reply

-werners-
Esteemed Contributor III

02-06-2024 5:22:45 AM

0 kudos

It might be a bug that is already logged, or a new one. You can check the Spark Jira pages.

by Mohamednazeer • New Contributor III

01-26-2024 6:34:38 AM

3792 Views
1 replies
0 kudos

Resolved! IllegalArgumentException: Mount failed due to invalid mount source

We are trying to create mounts for containers from two different storage accounts. We are using Azure Storage Account and Azure Databricks. We were able to create a mount for containers from one storage account, but when we try to create the mount for ...

Latest Reply

Mohamednazeer
New Contributor III

02-06-2024 3:41:05 AM

0 kudos

Hi community, the issue was caused by cross-VNet access. The storage account and the Databricks workspace are in different VNets, so we had to create a private endpoint to access the cross-VNet resources. Once we created the private endpoint ...

by rhevarr • New Contributor II

02-06-2024 2:19:39 AM

1918 Views
0 replies
0 kudos

Course: Apache Spark Programming with Databricks ID: E-P0W7ZV // Issue Classroom-Setup

Hello, I am trying to run the Classroom-Setup from the course files notebook (ASP 1.1 - Databricks Platform) (Course: Apache Spark™ Programming with Databricks, ID: E-P0W7ZV). Instructions: "Setup: Run classroom setup to mount Databricks training datasets an...

academy

Course

Databricks

spark

by Hardy • New Contributor III

02-05-2024 2:07:38 AM

14977 Views
6 replies
3 kudos

upload files to dbfs:/volume using databricks cli

In our Azure pipeline we use a databricks-cli command to upload jar files to the dbfs:/FileStore location, and that works perfectly fine. But when we try to use the same command to upload files to dbfs:/Volume/dev/default/files, it does not work and g...

Latest Reply

saikumar246
Databricks Employee

02-05-2024 10:34:34 PM

3 kudos

@Hardy I think you are using the word Volume in the path, but it should be Volumes (plural), not Volume (singular). Try copying the volume path directly from the Workspace.
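The singular/plural slip described above is easy to catch before the upload runs. The sketch below is a hypothetical pre-flight check, not part of the Databricks CLI. It assumes volume paths have the form /Volumes/<catalog>/<schema>/<volume>/..., optionally prefixed with `dbfs:` as in the original post.

```python
def check_volume_path(path):
    """Return a list of problems found in a Unity Catalog volume path."""
    problems = []
    # Drop an optional 'dbfs:' scheme, as used in the command from the post.
    local = path[len("dbfs:"):] if path.startswith("dbfs:") else path
    parts = [part for part in local.split("/") if part]
    if not parts or parts[0] != "Volumes":
        problems.append("path must start with /Volumes (plural, capital V)")
    if len(parts) < 4:
        problems.append("expected /Volumes/<catalog>/<schema>/<volume>/...")
    return problems


# The path from the question (singular 'Volume') versus the corrected form.
print(check_volume_path("dbfs:/Volume/dev/default/files"))
print(check_volume_path("dbfs:/Volumes/dev/default/files"))
```

A pipeline step could fail fast when the returned list is non-empty instead of waiting for the CLI error.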

by Volker • Databricks Partner

01-29-2024 3:24:05 AM

3253 Views
2 replies
2 kudos

Preferred compression format for ingesting large amounts of JSON files with Autoloader

Hello Databricks Community, in an IoT context we plan to ingest a large number of JSON files (~2 million per day). The JSON files are in JSON Lines format and need to be compressed on the IoT devices. We can provide suggestions for the type of compres...

Latest Reply

Volker
Databricks Partner

02-06-2024 1:13:09 AM

2 kudos

Hi, sorry, I guess my response wasn't sent. The source is JSON files that are uploaded to an S3 bucket. The sink will be a Delta table and we are using Auto Loader. The question was about the compression format of the incoming JSON files, e.g. if it wo...
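For readers unfamiliar with the setup being discussed, here is a minimal sketch of what device-side compression of a JSON Lines file might look like. The thread does not say which format is preferred; gzip is used only because it is in the Python standard library, and the record fields are invented for the example.

```python
import gzip
import json
import os
import tempfile


def write_jsonl_gz(records, path):
    """Write one JSON object per line into a gzip-compressed file."""
    with gzip.open(path, "wt", encoding="utf-8") as handle:
        for record in records:
            handle.write(json.dumps(record) + "\n")


def read_jsonl_gz(path):
    """Read the file back into a list of dicts, skipping blank lines."""
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        return [json.loads(line) for line in handle if line.strip()]


# Hypothetical sensor readings; the field names are not from the thread.
sample = [{"device": "d1", "temp": 21.5}, {"device": "d2", "temp": 19.0}]
target = os.path.join(tempfile.mkdtemp(), "readings.json.gz")
write_jsonl_gz(sample, target)
print(read_jsonl_gz(target) == sample)
```

The round trip only shows that the compressed file preserves the line-delimited layout; it says nothing about ingestion throughput, which is what the original question is about.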

by FerArribas • Contributor

02-02-2024 8:01:47 AM

1977 Views
1 replies
0 kudos

Custom JobGroup in Spark UI for cluster with multiple executions

Does anyone know what the first digits of the jobgroup shown in the Spark UI mean when using all-purpose clusters to launch multiple jobs? Right now the pattern is something like: [id_random]_job_[job_id]_run-[run_id]_action_[action].

Latest Reply

saikumar246
Databricks Employee

02-05-2024 9:46:02 PM

0 kudos

Hi @FerArribas, the first digits of the jobgroup shown in the Spark UI are the execContextId and cmdId (Command_ID). You can think of the execContextId as a kind of “REPL ID”. For example, if you take the below job group ID as an example, jobGr...
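To make the pattern from the question concrete, the sketch below splits a job group string into its parts with a regular expression. The sample value is invented, and because the reply is truncated before it shows how execContextId and cmdId are laid out inside the leading digits, the prefix is kept as one opaque field.

```python
import re

# Mirrors the pattern quoted in the question:
# [id_random]_job_[job_id]_run-[run_id]_action_[action]
JOB_GROUP_RE = re.compile(
    r"^(?P<prefix>.+?)_job_(?P<job_id>[^_]+)_run-(?P<run_id>[^_]+)_action_(?P<action>.+)$"
)


def parse_job_group(job_group):
    """Return the named parts of a job group string, or None if it does not match."""
    match = JOB_GROUP_RE.match(job_group)
    return match.groupdict() if match else None


# Made-up example value, not taken from a real Spark UI.
print(parse_job_group("1234567890123_job_42_run-7_action_99"))
```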

by luriveros • New Contributor

12-19-2023 2:01:51 PM

6145 Views
1 replies
0 kudos

implementing liquid clustering for DataFrames directly

Hi! I have a question: is it possible to implement liquid clustering for DataFrames saved directly to Delta files (df.write.format("delta").save("path")), rather than the conventional approach involving table creation?

Latest Reply

brockb
Databricks Employee

02-05-2024 8:10:30 PM

0 kudos

Hi,
Hopefully this question is related to testing and any production data would get persisted to a table, but one example is:
df = (spark.range(10).write.format("delta").mode("append").save("file:/tmp/data"))
ALTER TABLE delta.`file:/tmp/data` CLUSTER BY...

by pshuk • New Contributor III

12-17-2023 2:24:19 PM

1907 Views
2 replies
0 kudos

file transfer through CLI to DBFS, working manually but not in python code...

Hi, I ran my code successfully in the past but suddenly it stopped working. I have Python code that transfers local files to a DBFS location using the CLI. When I run the command manually on the screen, it works, but in the code it gives me the error "retu...

Latest Reply

feiyun0112
Honored Contributor

02-05-2024 7:56:21 PM

0 kudos

The 127 error code indicates “command not found”. Try using the full path of the databricks command.
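One way to follow this advice from Python is to resolve the executable's full path before calling it, so that a missing command fails with a clear message instead of exit code 127. This is a sketch using only the standard library. The helper names are made up, and the CLI arguments are left to the caller because the original command is not shown in the thread.

```python
import os
import shutil
import subprocess


def resolve_cli(name="databricks"):
    """Return the absolute path of an executable, or raise if it is not on PATH."""
    path = shutil.which(name)
    if path is None:
        raise FileNotFoundError(
            f"{name!r} not found on PATH: {os.environ.get('PATH', '')}"
        )
    return path


def run_cli(args, name="databricks"):
    """Run the CLI through its full path and return the CompletedProcess."""
    return subprocess.run(
        [resolve_cli(name), *args], capture_output=True, text=True
    )
```

Calling `run_cli([...])` with the same arguments used on the command line then either runs the resolved binary or raises `FileNotFoundError` showing the PATH the Python process actually sees, which often differs from the interactive shell's PATH.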

by kazinahian • New Contributor III

01-30-2024 8:13:32 AM

4382 Views
2 replies
1 kudos

Resolved! How can I Learn Databricks Data Pipeline in Azure environment?

Hello Esteemed Community, I have a fundamental question to ask, and I approach it with a sense of humility. Your guidance in my learning journey would be greatly appreciated. I am eager to learn how to build a hands-on data pipeline within the Databri...

Latest Reply

Palash01
Valued Contributor

02-03-2024 10:44:09 PM

1 kudos

Hey @kazinahian I completely understand your hesitation and appreciate your approach to seeking guidance! Embarking on a learning journey can be daunting, especially when financial considerations are involved. I'm happy to offer some advice on buildi...

by Tripalink • New Contributor III

05-23-2023 10:11:21 AM

9188 Views
4 replies
4 kudos

Error. Another git operation is in progress.

I am getting an error every time I try to view another branch or create a branch. Sometimes this has happened in the past, but usually seems to fix itself after about 10-30 minutes. This error has been lasting for over 12 hours, so I am now concerned...

Latest Reply

Hakuna_Madata
New Contributor II

02-05-2024 5:00:54 AM

4 kudos

I had the same problem and I could resolve it by creating the repo again with a trailing ".git" in the Git repository URL. For example, use this:
https://gitlab.mycompany.com/my-project/my-repo.git
not this:
https://gitlab.mycompany.com/my-project/my-repo...
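If repository URLs are assembled by a script before the repo is created, the workaround reported here can be applied automatically. The helper below is a hypothetical sketch. It only normalizes the URL string and reflects this one commenter's experience rather than a documented requirement.

```python
def ensure_git_suffix(url):
    """Return the URL with a trailing '.git', adding it if it is missing."""
    url = url.rstrip("/")
    return url if url.endswith(".git") else url + ".git"


# Example URL taken from the reply above.
print(ensure_git_suffix("https://gitlab.mycompany.com/my-project/my-repo"))
```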

by Arnold_Souza • New Contributor III

06-21-2023 10:44:37 AM

5678 Views
3 replies
0 kudos

Unable to enable entitlements to account groups in a workspace

Currently, I am both an account administrator and a workspace administrator in Databricks. When I try to enable the entitlements "Workspace access" and "Databricks SQL access" for account groups, I receive the error "Failed to enable entitlem...

Latest Reply

saikumar246
Databricks Employee

02-05-2024 5:00:17 AM

0 kudos

Hi @Arnold_Souza, The error "Failed to enable entitlement.: Group not found" that you're experiencing when trying to enable the entitlements “Workspace access” and “Databricks SQL access” for account groups is likely due to the fact that Identity Fed...

by Martinitus • Databricks Partner

02-02-2024 4:19:35 AM

7686 Views
4 replies
0 kudos

CSV Reader reads quoted fields inconsistently in last column

I just opened another issue: https://issues.apache.org/jira/browse/SPARK-46959
It corrupts data even when read with mode="FAILFAST". I consider it critical, because basic stuff like this should just work!

Latest Reply

Martinitus
Databricks Partner

02-05-2024 3:56:17 AM

0 kudos

either: [ 'some text', 'some text"', 'some text"' ]
alternatively: [ '"some text"', 'some text"', 'some text"' ]
Probably the most sane behavior would be a parser error (with mode="FAILFAST"). Just parsing garbage without warning the user is certainly not...

Databricks Community

