Data Engineering

Forum Posts

by Jennifer • New Contributor III

3 weeks ago

121 Views
1 reply
0 kudos

Optimization failed for timestampNtz

We have a table that uses the timestampNtz type for its timestamp column, which is also a cluster key for the table using liquid clustering. I ran OPTIMIZE <table-name>, and it failed with the error "Unsupported datatype 'TimestampNTZType'". But the failed optimization also broke ...

Latest Reply

Kaniz
Community Manager

3 weeks ago

0 kudos

Hi @Jennifer, Since TimestampNTZType is not currently supported for optimization, you can try a workaround by converting the timestamp column to a different data type before running the OPTIMIZE command. For example, you could convert the timestampNt...

by vpacik • New Contributor

3 weeks ago

288 Views
1 reply
0 kudos

Databricks-connect OpenSSL Handshake failed on WSL2

When trying to set up databricks-connect on WSL2 using a 13.3 cluster, I receive the following error regarding OpenSSL CERTIFICATE_VERIFY_FAILED. The authentication is done via the SPARK_REMOTE env. variable. E0415 11:24:26.646129568 142172 ssl_transport_sec...

Latest Reply

Kaniz
Community Manager

3 weeks ago

0 kudos

Hi @jp_allard, One approach to resolve this is to disable SSL certificate verification. However, keep in mind that this approach may compromise security. In your Databricks configuration file (usually located at ~/.databrickscfg), add the following l...

by pernilak • New Contributor III

3 weeks ago

171 Views
1 reply
0 kudos

Working with Unity Catalog from VSCode using the Databricks Extension

Hi! As suggested by Databricks, we are working with Databricks from VSCode using Databricks bundles for our deployment and using the VSCode Databricks Extension and Databricks Connect during development. However, there are some limitations that we are ...

Latest Reply

Kaniz
Community Manager

3 weeks ago

0 kudos

Hi @pernilak, It’s great that you’re using Databricks with Visual Studio Code (VSCode) for your development workflow! Let’s address the limitations you’ve encountered when working with files from Unity Catalog using native Python. When running Python...

by jp_allard • New Contributor

3 weeks ago

150 Views
1 reply
0 kudos

Selective Overwrite to a Unity Catalog Table

I have been able to perform a selective overwrite using replaceWhere to a hive_metastore table, but when I use the same code for the same table in a unity catalog, no data is written. Has anyone else had this issue, or are there common mistakes that ar...

Latest Reply

Kaniz
Community Manager

3 weeks ago

0 kudos

Hi @jp_allard, The Unity Catalog is a newer feature in Databricks, designed to replace the traditional Hive Metastore. When transitioning from Hive Metastore to Unity Catalog, there might be differences in behavior due to underlying architectural ch...

by CDICSteph • New Contributor

01-13-2024 4:30:22 PM

888 Views
5 replies
0 kudos

permission denied listing external volume when using vscode databricks extension

hey, i'm using the Db extension for vscode (Databricks connect v2). When using dbutils to list an external volume defined in UC like so: dbutils.fs.ls("/Volumes/dev/bronze/rawdatafiles/") i get this error: "databricks.sdk.errors.mapping.PermissionD...

Latest Reply

lukasjh
New Contributor II

3 weeks ago

0 kudos

We still face the problem (UC enabled shared cluster). Is there any resolution? @Kaniz

4 More Replies

by JeanT • New Contributor

3 weeks ago

158 Views
1 reply
0 kudos

Help with Identifying and Parsing Varying Date Formats in Spark DataFrame

Hello Spark Community, I'm encountering an issue with parsing dates in a Spark DataFrame due to inconsistent date formats across my datasets. I need to identify and parse dates correctly, irrespective of their format. Below is a brief outline of my p...

Latest Reply

-werners-
Esteemed Contributor III

3 weeks ago

0 kudos

How about not specifying the format? This will already match common formats. When you still have nulls, you can use your list with known exotic formats. Another solution is working with regular expressions, looking for 2 digit numbers not larger than...
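
The "try the known formats in order" idea from this reply can be sketched in plain Python. This is only an illustration of the fallback logic, not Spark code and not the poster's solution; the format list and the name parse_date are hypothetical.

```python
from datetime import date, datetime
from typing import Optional

# Hypothetical list of known formats, tried in order (not from the original post).
# Order matters: an ambiguous value such as 01/02/2024 is claimed by the
# first format that accepts it.
KNOWN_FORMATS = ["%Y-%m-%d", "%d/%m/%Y", "%m-%d-%Y"]


def parse_date(text: str) -> Optional[date]:
    """Return the first successful parse, or None when no format matches."""
    cleaned = text.strip()
    for fmt in KNOWN_FORMATS:
        try:
            return datetime.strptime(cleaned, fmt).date()
        except ValueError:
            # This format did not match; fall through to the next one.
            continue
    return None


samples = ["2024-01-15", "15/01/2024", "01-15-2024", "not a date"]
parsed = [parse_date(s) for s in samples]
```

Values that come back as None are the ones that still need a new entry in the format list, which mirrors the "when you still have nulls" step in the reply.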

by Phuonganh • New Contributor

3 weeks ago

173 Views
1 reply
0 kudos

Databricks SDK for Python: Errors with parameters for Statement Execution

Hi team, I'm using the Databricks SDK for Python to run SQL queries. I created a variable as below: param = [{'name' : 'a', 'value' :x'}, {'name' : 'b', 'value' : 'y'}] and passed it to the statement as below: _ = w.statement_execution.execute_statement( warehous...

Latest Reply

Kaniz
Community Manager

3 weeks ago

0 kudos

Hi @Phuonganh, This error is not directly related to the Databricks SDK, but rather a misunderstanding of how to pass parameters in your SQL query. The param dictionary you’ve defined seems to have a typo in the value for the ‘a’ parameter. It should...
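
The typo mentioned in this reply can be checked without a workspace. The sketch below parses the param literal exactly as posted in the question and an assumed correction that quotes the value as 'x'. Since both the post and the reply are cut off, it is not certain that this typo is the whole cause of the reported error. The helper name try_parse is hypothetical.

```python
import ast

# The literal exactly as posted: the opening quote before x is missing.
as_posted = "[{'name' : 'a', 'value' :x'}, {'name' : 'b', 'value' : 'y'}]"

# Assumed correction (not confirmed by the truncated reply): quote the value.
corrected = "[{'name' : 'a', 'value' : 'x'}, {'name' : 'b', 'value' : 'y'}]"


def try_parse(source: str):
    """Parse a Python literal; return None if the text is not valid."""
    try:
        return ast.literal_eval(source)
    except (SyntaxError, ValueError):
        return None


posted_result = try_parse(as_posted)  # None: the posted text does not parse
param = try_parse(corrected)          # a list of two name/value dicts
```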

by AnkithP • New Contributor

3 weeks ago

185 Views
1 reply
1 kudos

Infer schema eliminating leading zeros.

Upon reading a CSV file with schema inference enabled, I've noticed that a column originally designated as string datatype contains numeric values with leading zeros. However, upon reading the data into a PySpark data frame, it undergoes automatic conver...

Latest Reply

-werners-
Esteemed Contributor III

3 weeks ago

1 kudos

If you set .option("inferSchema", "false"), all columns will be read as string. You will have to cast all the other columns to their appropriate type though. So passing a schema seems easier to me.
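
The effect described here can be reproduced outside Spark. The plain-Python sketch below reads every CSV field as a string and then applies a declared type per column, which plays the role of an explicit schema. The sample data and column names are hypothetical, and this is an illustration of the idea rather than PySpark code.

```python
import csv
import io

# Hypothetical sample data; the column names are illustrative only.
raw = "zip_code,amount\n00501,12.50\n02134,7.25\n"

# Declared type per column, standing in for an explicit schema.
schema = {"zip_code": str, "amount": float}

rows = []
for record in csv.DictReader(io.StringIO(raw)):
    # csv always yields strings; cast each field to its declared type.
    rows.append({name: cast(record[name]) for name, cast in schema.items()})

# What numeric inference would do to the same value: the zeros are lost.
inferred = int("00501")
```

Keeping zip_code declared as a string preserves "00501", while the inferred integer becomes 501.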

by zmsoft • New Contributor

3 weeks ago

162 Views
1 reply
0 kudos

Why is DLT pipeline processing streaming data so slow?

Running a single table is fast, but running 80 tables at the same time takes a long time. Is it serial, queued execution? Isn't it concurrent?

Latest Reply

Kaniz
Community Manager

3 weeks ago

0 kudos

Hi @zmsoft, The processing power of the nodes running your DLT pipeline matters. Using more powerful node types can significantly impact performance. Consider using a more robust node type, such as the Standard_E16ds_v4 or Standard_E32ds_v4.

by PrebenOlsen • New Contributor III

3 weeks ago

233 Views
2 replies
0 kudos

Job stuck while utilizing all workers

Hi! I started a job yesterday. It was iterating over data, 2 months at a time, and writing to a table. It did this successfully for 4 out of 6 time periods. The 5th time period, however, got stuck 5 hours in. I can find one Failed Stage that reads ...

Tags: job failed, Job froze, need help

Latest Reply

-werners-
Esteemed Contributor III

3 weeks ago

0 kudos

As Spark is lazily evaluated, using only small clusters for reads and large ones for writes is not something that will happen. The data is read when you apply an action (a write, for example). That being said: I have no knowledge of a bug in Databricks on clusters...

1 More Replies

by laurenskuiper97 • New Contributor

3 weeks ago

228 Views
1 reply
0 kudos

JDBC / SSH-tunnel to connect to PostgreSQL not working on multi-node clusters

Hi everybody, I'm trying to set up a connection between Databricks' Notebooks and an external PostgreSQL database through an SSH tunnel. On a single-node cluster, this works perfectly fine. However, when this is run on a multi-node cluster, this co...

Tags: clusters, JDBC, spark, SSH

Latest Reply

-werners-
Esteemed Contributor III

3 weeks ago

0 kudos

I doubt it is possible. The driver runs the program and sends tasks to the executors. But since creating the SSH tunnel is not a Spark task, I don't think it will be established on any executor.

by Jotav93 • New Contributor II

4 weeks ago

304 Views
2 replies
0 kudos

Move a delta table from a non UC metastore to a UC metastore preserving history

Hi, I am using Azure Databricks and we recently enabled UC in our workspace. We have some tables in our non-UC metastore that we want to move to a UC-enabled metastore. Is there any way we can move these tables without losing the delta table history...

Tags: delta, unity

Latest Reply

ThomazRossito
New Contributor III

3 weeks ago

0 kudos

Hello, It is possible to get the expected result with dbutils.fs.cp("Origin location", "Destination location", True) and then create the table with the LOCATION set to the destination location. Hope this helps.

1 More Replies

by MathewDRitch • New Contributor II

3 weeks ago

249 Views
3 replies
1 kudos

Connecting from Databricks to Network Path

Hi All, I would appreciate it if someone could help me with some reference links on connecting from Databricks to an external network path. I have Databricks on AWS and previously connected to files on an external network path using the Mount method. Now Databri...

Latest Reply

-werners-
Esteemed Contributor III

3 weeks ago

1 kudos

I don't think that it is possible at the moment. UC focuses on cloud data. You might want to try to use Minio, but apparently UC does not support Minio yet. Pity, because that would be an awesome solution.

2 More Replies

by Dp15 • Contributor

a month ago

304 Views
2 replies
2 kudos

Using UDF in an insert command

Hi, I am trying to use a UDF to get the last day of the month and use the boolean result of the function in an insert command. Please find herewith the function and my query.
function:
import calendar
from datetime import datetime, date, timedelta
def...
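
The poster's function is cut off in this excerpt, so its actual body is unknown. As a hypothetical stand-in, a boolean last-day-of-month check can be written with the same calendar module. The name is_last_day_of_month is an assumption, and registering it as a UDF for the insert command is left out.

```python
import calendar
from datetime import date


def is_last_day_of_month(d: date) -> bool:
    """Return True when d falls on the final day of its month."""
    # monthrange returns (weekday of the first day, number of days in the month).
    _, days_in_month = calendar.monthrange(d.year, d.month)
    return d.day == days_in_month


leap_day = is_last_day_of_month(date(2024, 2, 29))       # True, 2024 is a leap year
day_before_leap = is_last_day_of_month(date(2024, 2, 28))  # False
```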

Latest Reply

Dp15
Contributor

3 weeks ago

2 kudos

Thank you @Kaniz for your detailed explanation

1 More Replies

by Kroy • Contributor

01-12-2024 4:38:45 AM

2104 Views
8 replies
1 kudos

Resolved! What is difference between streaming and streaming live table

Can anyone explain in layman's terms what the difference is between streaming and a streaming live table?

Latest Reply

CharlesReily
New Contributor III

01-18-2024 5:22:24 AM

1 kudos

Streaming, in a broad sense, refers to the continuous flow of data over a network. It allows you to watch or listen to content in real-time without having to download the entire file first. A "Streaming Live Table" might refer to a specific type of ...

7 More Replies

Databricks

Forum Posts

Load multiple delta tables at once from Sql server

Starting Serverless sql cluster on GCP

"Can't login to databricks socket is closed" when ...

Temporary views no longer working for Share Comput...

Does DLT use one single SparkSession?