Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

hkmodi
by New Contributor II
  • 430 Views
  • 3 replies
  • 0 kudos

Perform row_number() filter in autoloader

I have created an Auto Loader job that reads JSON data from S3 (files with no extension) using cloudFiles.format set to text. This job is supposed to run every 4 hours and read all the new data that has arrived. But before writing into a delta table...

Latest Reply
szymon_dybczak
Esteemed Contributor III
  • 0 kudos

Hi @hkmodi, basically, as @daniel_sahal said, the bronze layer should reflect the source system. The silver layer is dedicated to deduplication/cleaning/enrichment of the dataset. If you still need to deduplicate at the bronze layer, you have 2 options: use me...

2 More Replies
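For context, a minimal sketch of the deduplication pattern discussed in this thread, keeping only the latest row per key with row_number() before (or after) landing in the silver layer. The table names and the file_key/event_ts columns are assumptions, not from the original post:

    from pyspark.sql import functions as F
    from pyspark.sql.window import Window

    # Read the bronze table that Auto Loader appends to (name is a placeholder)
    bronze_df = spark.read.table("bronze.events_raw")

    # Keep only the most recent record per business key (columns are placeholders)
    w = Window.partitionBy("file_key").orderBy(F.col("event_ts").desc())

    deduped_df = (
        bronze_df
        .withColumn("rn", F.row_number().over(w))
        .filter("rn = 1")
        .drop("rn")
    )

    deduped_df.write.mode("overwrite").saveAsTable("silver.events")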
vibhakar
by New Contributor
  • 4622 Views
  • 3 replies
  • 1 kudos

Not able to mount ADLS Gen2 in Databricks

py4j.security.Py4JSecurityException: Method public com.databricks.backend.daemon.dbutils.DBUtilsCore$Result com.databricks.backend.daemon.dbutils.DBUtilsCore.mount(java.lang.String,java.lang.String,java.lang.String,java.lang.String,java.util.Map) is ...

Latest Reply
cpradeep
New Contributor III
  • 1 kudos

Hi, have you sorted this issue? Can you please let me know the solution?

2 More Replies
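For reference, a sketch of the standard dbutils.fs.mount call for ADLS Gen2 with a service principal; all angle-bracket values are placeholders. Note that the Py4JSecurityException above typically appears when the cluster blocks mount() (for example with table access control or credential passthrough enabled), in which case this call is not permitted regardless of the options used:

    configs = {
        "fs.azure.account.auth.type": "OAuth",
        "fs.azure.account.oauth.provider.type":
            "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
        "fs.azure.account.oauth2.client.id": "<application-id>",
        "fs.azure.account.oauth2.client.secret": dbutils.secrets.get(scope="<scope>", key="<key>"),
        "fs.azure.account.oauth2.client.endpoint":
            "https://login.microsoftonline.com/<tenant-id>/oauth2/token",
    }

    dbutils.fs.mount(
        source="abfss://<container>@<storage-account>.dfs.core.windows.net/",
        mount_point="/mnt/<mount-name>",
        extra_configs=configs,
    )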
fabien_arnaud
by New Contributor II
  • 983 Views
  • 6 replies
  • 0 kudos

Data shifted when a pyspark dataframe column only contains a comma

I have a dataframe containing several columns, one of which contains, for one specific record, just a comma and nothing else. When displaying the dataframe with the command display(df_input.where(col("erp_vendor_cd") == 'B6SA-VEN0008838')), the data is dis...

Latest Reply
MilesMartinez
New Contributor II
  • 0 kudos

Thank you so much for the solution.

5 More Replies
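If the dataframe is read from a delimited text file (an assumption, since the thread does not say where df_input comes from), a column shift like this usually comes from quoting/escaping options on read. A sketch with a hypothetical path:

    # Make the reader respect quoted fields so a lone comma inside a value
    # is not treated as a field delimiter.
    df_input = (
        spark.read
        .option("header", "true")
        .option("quote", '"')
        .option("escape", '"')
        .option("multiLine", "true")
        .csv("/mnt/raw/vendors/")
    )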
oakhill
by New Contributor III
  • 229 Views
  • 1 reply
  • 0 kudos

How to optimize queries on a 150B table? ZORDER, LC or partitioning?

Hi! I am struggling to understand how to properly manage my table to make queries efficient. My table has columns date_time_utc, car_id, car_owner, etc. date_time_utc, car_id and position are usually the ZORDER or Liquid Clustering columns. Selecting max...

Latest Reply
szymon_dybczak
Esteemed Contributor III
  • 0 kudos

1. According to Databricks, yes. But as always, I recommend performing benchmarks yourself. There are a lot of blog posts saying that it's not always the case. Yesterday I was at a data community event and the presenter did several benchmarks and ...

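As a rough illustration of the two options being compared, run via spark.sql; the clustering columns come from the question, while the table name car_positions is a placeholder:

    # Liquid Clustering: declare clustering keys once, then let OPTIMIZE maintain the layout.
    spark.sql("ALTER TABLE car_positions CLUSTER BY (date_time_utc, car_id)")
    spark.sql("OPTIMIZE car_positions")

    # Z-ORDER alternative (not combinable with liquid clustering on the same table):
    spark.sql("OPTIMIZE car_positions ZORDER BY (date_time_utc, car_id)")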
AlvaroCM
by New Contributor III
  • 640 Views
  • 2 replies
  • 0 kudos

Resolved! DLT error at validation

Hello, I'm creating a DLT pipeline with Databricks on AWS. After creating an external location for my bucket, I encountered the following error: DataPlaneException: [DLT ERROR CODE: CLUSTER_LAUNCH_FAILURE.CLIENT_ERROR] Failed to launch pipeline cluster...

Latest Reply
AlvaroCM
New Contributor III
  • 0 kudos

Hi! The error was related to the roles and permissions created when the workspace was set up. I reloaded the setup script in a new workspace, and it worked without problems. Hope it helps anyone in the future. Thanks!

1 More Replies
AntonDBUser
by New Contributor III
  • 448 Views
  • 1 reply
  • 1 kudos

Lakehouse Federation with OAuth connection to Snowflake

Hi! We have a lot of use cases where we need to load data from Snowflake into Databricks, where users are using both R and Python for further analysis and machine learning. For this we have been using Lakehouse Federation combined with basic auth, but are...

Latest Reply
AntonDBUser
New Contributor III
  • 1 kudos

For anyone interested: we solved this by building an OAuth integration to Snowflake ourselves using Entra ID: https://community.snowflake.com/s/article/External-oAuth-Token-Generation-using-Azure-AD. We also created some simple Python and R packages tha...

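A rough sketch of the approach described in the reply: acquire a client-credentials token from Entra ID and pass it to the Snowflake Python connector. The tenant, client, scope, account and warehouse values are placeholders, and the exact OAuth setup depends on the External OAuth configuration from the linked article:

    import requests
    import snowflake.connector

    # Placeholder Entra ID app registration values
    token_resp = requests.post(
        "https://login.microsoftonline.com/<tenant-id>/oauth2/v2.0/token",
        data={
            "grant_type": "client_credentials",
            "client_id": "<client-id>",
            "client_secret": "<client-secret>",
            "scope": "<snowflake-application-id-uri>/.default",
        },
    )
    access_token = token_resp.json()["access_token"]

    # Pass the external OAuth token straight to the Snowflake connector
    conn = snowflake.connector.connect(
        account="<account-identifier>",
        user="<user>",
        authenticator="oauth",
        token=access_token,
        warehouse="<warehouse>",
    )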
JonHMDavis
by New Contributor II
  • 4873 Views
  • 5 replies
  • 2 kudos

Graphframes not importing on Databricks 9.1 LTS ML

Is Graphframes for python meant to be installed by default on Databricks 9.1 LTS ML? Previously I was running the attached python command on 7.3 LTS ML with no issue, however now I am getting "no module named graphframes" when trying to import the pa...

Latest Reply
malz
New Contributor II
  • 2 kudos

Hi @MuthuLakshmi, as per the documentation, graphframes comes preinstalled in the Databricks Runtime for Machine Learning, but when trying to import the Python module of graphframes I am getting a "no module found" error. from graphframes i...

4 More Replies
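If the module really is missing on the runtime, installing the Python bindings as a notebook-scoped library and building a small graph is a quick check. A sketch with made-up vertex/edge data (the JVM GraphFrames package still has to be available on the cluster):

    %pip install graphframes

    from graphframes import GraphFrame

    # Toy vertex and edge DataFrames just to confirm the import and JVM package work
    v = spark.createDataFrame([("a", "Alice"), ("b", "Bob")], ["id", "name"])
    e = spark.createDataFrame([("a", "b", "follows")], ["src", "dst", "relationship"])

    g = GraphFrame(v, e)
    g.inDegrees.show()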
naveenreddy1
by New Contributor II
  • 18310 Views
  • 4 replies
  • 0 kudos

Reason: Remote RPC client disassociated. Likely due to containers exceeding thresholds, or network issues. Check driver logs for WARN messages. Driver stacktrace

We are using a Databricks 3-node cluster with 32 GB memory. It works fine, but sometimes it throws the error: Remote RPC client disassociated. Likely due to containers exceeding thresholds, or network issues.

Latest Reply
RodrigoDe_Freit
New Contributor II
  • 0 kudos

If your job fails, follow this: according to https://docs.databricks.com/jobs.html#jar-job-tips: "Job output, such as log output emitted to stdout, is subject to a 20MB size limit. If the total output has a larger size, the run will be canceled and ma...

3 More Replies
him
by New Contributor III
  • 17371 Views
  • 10 replies
  • 7 kudos

I am getting the below error while making a GET request to a job in Databricks after successfully running it

"error_code": "INVALID_PARAMETER_VALUE",  "message": "Retrieving the output of runs with multiple tasks is not supported. Please retrieve the output of each individual task run instead."}

Latest Reply
SANKET
New Contributor II
  • 7 kudos

Use https://<databricks-instance>/api/2.1/jobs/runs/get?run_id=xxxx. "get-output" gives the details of a single task run ID, which is associated with a task and not the job.

9 More Replies
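A sketch of the pattern the reply describes: fetch the multi-task run with runs/get, then call runs/get-output for each task's own run_id. The host, token and run ID are placeholders:

    import requests

    HOST = "https://<databricks-instance>"
    HEADERS = {"Authorization": "Bearer <personal-access-token>"}

    # The parent run of a multi-task job
    run = requests.get(
        f"{HOST}/api/2.1/jobs/runs/get",
        headers=HEADERS,
        params={"run_id": 12345},
    ).json()

    # Each task has its own run_id; get-output only works on those, not the parent run
    for task in run.get("tasks", []):
        output = requests.get(
            f"{HOST}/api/2.1/jobs/runs/get-output",
            headers=HEADERS,
            params={"run_id": task["run_id"]},
        ).json()
        print(task["task_key"], output.get("notebook_output"))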
ArturOA
by New Contributor III
  • 1339 Views
  • 7 replies
  • 0 kudos

Attaching to Serverless from Azure Data Factory via Service Principal

Hi, we have issues trying to run Databricks notebooks orchestrated with Azure Data Factory. We have been doing this for a while now without any issues when we use Job Clusters, Existing General Purpose Clusters, or Cluster Pools. We use an Azure Data ...

Latest Reply
h_h_ak
Contributor
  • 0 kudos

Does the service principal have access and permissions for the notebook?

6 More Replies
HamidHamid_Mora
by New Contributor II
  • 3222 Views
  • 4 replies
  • 3 kudos

ganglia is unavailable on DBR 13.0

We created a library in Databricks to ingest Ganglia metrics for all jobs into our Delta tables. However, endpoint 8652 is no longer available on DBR 13.0. Is there any other endpoint available? We need to log all metrics for all executed jobs, not on...

Latest Reply
h_h_ak
Contributor
  • 3 kudos

You should have a look here: https://community.databricks.com/t5/data-engineering/azure-databricks-metrics-to-prometheus/td-p/71569

3 More Replies
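The linked thread discusses replacing the old Ganglia endpoint with Spark's built-in Prometheus support. Purely as a sketch, these are the standard Apache Spark settings that expose metrics over HTTP (shown as a Python dict for readability; they are set as cluster Spark configuration, and whether they cover everything Ganglia provided on DBR 13 needs verifying):

    spark_conf = {
        # Exposes executor metrics at /metrics/executors/prometheus on the driver's Spark UI
        "spark.ui.prometheus.enabled": "true",
        # Exposes metrics from the Spark metrics system via the Prometheus servlet sink
        "spark.metrics.conf.*.sink.prometheusServlet.class":
            "org.apache.spark.metrics.sink.PrometheusServlet",
        "spark.metrics.conf.*.sink.prometheusServlet.path": "/metrics/prometheus",
    }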
amanda3
by New Contributor II
  • 459 Views
  • 3 replies
  • 0 kudos

Flattening JSON while also keeping embedded types

I'm attempting to create DLT tables from a source table that includes a "data" column that is a JSON string. I'm doing something like this: sales_schema = StructType([ StructField("customer_id", IntegerType(), True), StructField("order_numbers",...

Latest Reply
Walter_C
Databricks Employee
  • 0 kudos

To ensure that the "value" field retains its integer type, you can explicitly cast it after parsing the JSON. from pyspark.sql.functions import col, from_json, expr from pyspark.sql.types import StructType, StructField, IntegerType, ArrayType, LongTy...

2 More Replies
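A minimal sketch of the approach in the reply: parse the JSON string with an explicit schema so numeric fields keep their types. The column and field names follow the snippet in the question; the source dataframe name is assumed:

    from pyspark.sql.functions import col, from_json
    from pyspark.sql.types import StructType, StructField, IntegerType, ArrayType, LongType

    sales_schema = StructType([
        StructField("customer_id", IntegerType(), True),
        StructField("order_numbers", ArrayType(LongType()), True),
    ])

    parsed_df = (
        df.withColumn("parsed", from_json(col("data"), sales_schema))
          .select("parsed.customer_id", "parsed.order_numbers")
    )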
xhudik
by New Contributor III
  • 406 Views
  • 1 reply
  • 1 kudos

Resolved! does stream.stop() generates "ERROR: Query termination received for []" automatically?

Whenever code contains stream.stop(), I get an error like this in STDERR (in the cluster logs): ERROR: Query termination received for [id=b7e14d07-f8ad-4ae6-99de-8a7cbba89d86, runId=5c01fd71-2d93-48ca-a53c-5f46fab726ff]. No other message, even if I try to try-cat...

Latest Reply
MuthuLakshmi
Databricks Employee
  • 1 kudos

@xhudik Does stream.stop() generate "ERROR: Query termination received for []" automatically? Yes, this is generated whenever there is a stream.stop(), and it is written to stderr. Is "ERROR: Query termination received for []" dangerous, or is it just info that the stream was closed?...

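For context, a small sketch of the sequence that produces the message: stopping a structured streaming query triggers the termination log entry, which is informational rather than a failure. The rate/noop source and sink are just placeholders to have a running stream:

    # Start a throwaway stream (rate source, no-op sink)
    query = (
        spark.readStream.format("rate").option("rowsPerSecond", 1).load()
        .writeStream.format("noop").start()
    )

    # ... later: stop() ends the query, and the driver logs
    # "ERROR: Query termination received for [id=..., runId=...]" to stderr.
    query.stop()
    query.awaitTermination()  # returns once the query has fully shut down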
roberta_cereda
by New Contributor
  • 432 Views
  • 1 reply
  • 0 kudos

Describe history operationMetrics['materializeSourceTimeMs']

Hi, during some checks on MERGE execution, I was running the DESCRIBE HISTORY command and in the operationMetrics column I noticed this information: operationMetrics['materializeSourceTimeMs']. I haven't found that metric in the documentation, so I...

Latest Reply
MuthuLakshmi
Databricks Employee
  • 0 kudos

@roberta_cereda  If it’s specific to “materializeSourceTimeMs” then it’s “time taken to materialize source (or determine it's not needed)”

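To inspect the metric yourself, DESCRIBE HISTORY returns operationMetrics as a map column that can be queried directly; a sketch with a placeholder table name:

    history_df = spark.sql("DESCRIBE HISTORY my_schema.my_table")

    (history_df
        .where("operation = 'MERGE'")
        .selectExpr(
            "version",
            "operationMetrics['materializeSourceTimeMs'] AS materialize_source_time_ms",
        )
        .show())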
pranav_k1
by New Contributor III
  • 716 Views
  • 3 replies
  • 1 kudos

Resolved! Error while loading mosaic in notebook - TimeoutException: Futures timed out after [80 seconds]

I am working on reading spatial data with Mosaic and GDAL. Previously I used databricks-mosaic 0.3.9 with a Databricks 12.2 LTS cluster, installed with the following command: %pip install databricks-mosaic==0.3.9 --quiet. Now it's giving a timeout er...

Latest Reply
Alberto_Umana
Databricks Employee
  • 1 kudos

Hi @pranav_k1, thanks for confirming it worked for you now! I see that the usual %pip install databricks-mosaic cannot install due to the fact that it has thus far allowed geopandas to essentially install the latest... As of geopandas==0.14.4, the vers...

2 More Replies
