Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
Data + AI Summit 2024 - Data Engineering & Streaming

Forum Posts

r0nald
by New Contributor II
  • 3327 Views
  • 3 replies
  • 1 kudos

UDF not working inside transform() & lambda (SQL)

Below is a toy example of what I'm trying to achieve, but I don't understand why it fails. Can anyone explain why, and suggest a fix or a workaround that isn't overly bloated?

%sql
create or replace function status_map(status int)
returns string
return map(10, "STATU...

Latest Reply
-werners-
Esteemed Contributor III
  • 1 kudos

The transform function in SQL is not the same as the Scala/PySpark counterpart; it is in fact a map(). Here is some interesting info. I agree that functions are essential for code modularity, hence I prefer not to use SQL but Scala/PySpark instead.
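A minimal PySpark sketch of that approach (the status codes and labels are assumptions, since the original example is truncated): instead of calling a UDF inside the lambda, express the mapping directly as column expressions, which higher-order functions like transform() can evaluate.

```python
from pyspark.sql import SparkSession
import pyspark.sql.functions as F

spark = SparkSession.builder.getOrCreate()

# Hypothetical data: arrays of integer status codes.
df = spark.createDataFrame([([10, 20],)], ["statuses"])

# Inline the mapping as column expressions inside the lambda; UDFs
# cannot be applied to the lambda variable of a higher-order function.
df = df.withColumn(
    "status_names",
    F.transform("statuses", lambda s: F.when(s == 10, "STATUS_A")
                                       .when(s == 20, "STATUS_B")),
)
df.show(truncate=False)
```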

2 More Replies
SaraCorralLou
by New Contributor III
  • 3407 Views
  • 7 replies
  • 2 kudos

Bad performance with UDF functions

Hello, I am contacting you because I am having a problem with the performance of my notebooks on Databricks. My notebook is written in Python (PySpark); in it I read a Delta table that I copy to a dataframe and do several transformations and create sever...

Latest Reply
-werners-
Esteemed Contributor III
  • 2 kudos

Looping over records is a performance killer, to be avoided at all costs. See: beware the for-loop (databricks.com)
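To illustrate the point, a small sketch (the column name and arithmetic are hypothetical) contrasting a driver-side loop with the equivalent distributed column expression:

```python
from pyspark.sql import SparkSession
import pyspark.sql.functions as F

spark = SparkSession.builder.getOrCreate()
df = spark.range(1_000_000).withColumnRenamed("id", "amount")

# Anti-pattern: collect() pulls every row to the driver and the loop
# then runs single-threaded in Python.
# total = sum(row["amount"] * 1.21 for row in df.collect())

# Preferred: the same logic as a column expression runs distributed.
total = df.select(F.sum(F.col("amount") * 1.21)).first()[0]
print(total)
```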

6 More Replies
ricard98
by New Contributor II
  • 4814 Views
  • 6 replies
  • 5 kudos

How to integrate SAP ERP with Databricks

Is there a way to integrate SAP ERP with a Databricks notebook through Python?

Latest Reply
Kong
New Contributor II
  • 5 kudos

I've connected Databricks directly to S4/HANA ABAP layers, but will reiterate that it is extremely challenging if you do not have a background in system administration, networking, DevOps, programming, and SAP.
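For a first experiment, a common entry point is plain JDBC against the underlying HANA database. A hedged sketch (host, port, schema/table, and secret names are placeholders; it assumes the SAP HANA JDBC driver, ngdbc.jar, is installed on the cluster, and `spark`/`dbutils` are the Databricks notebook built-ins):

```python
# All connection details below are placeholders for illustration only.
df = (
    spark.read.format("jdbc")
    .option("url", "jdbc:sap://sap-host.example.com:30015")   # hypothetical host
    .option("driver", "com.sap.db.jdbc.Driver")               # requires ngdbc.jar
    .option("dbtable", "SAPABAP1.MARA")                       # hypothetical table
    .option("user", dbutils.secrets.get("sap-scope", "user"))
    .option("password", dbutils.secrets.get("sap-scope", "password"))
    .load()
)
df.display()
```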

5 More Replies
Chris_Shehu
by Valued Contributor III
  • 1949 Views
  • 2 replies
  • 1 kudos

Resolved! Custom Libraries (Unity Catalog Enabled Clusters)

I'm trying to use a custom library that I created from a .whl file in the workspace/shared location. The library attaches to the cluster without any issues and I can see it when I list the modules using pip. When I try to call the module I get an error t...

Latest Reply
Szpila
New Contributor II
  • 1 kudos

Hello guys, I am working on a project where we need to use the spark-excel library (Maven) in order to ingest data from Excel files. As those 3rd-party libraries are not allowed on shared clusters, do you have any workaround other than using pandas, for exa...
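For reference, the pandas workaround mentioned above can be as small as this sketch (file path, sheet name, and target table are placeholders; openpyxl must be pip-installed, and `spark` is the notebook built-in):

```python
import pandas as pd

# Read the workbook on the driver, then hand it to Spark. Suitable for
# files that fit in driver memory; path and names are hypothetical.
pdf = pd.read_excel(
    "/dbfs/FileStore/data/report.xlsx", sheet_name="Sheet1", engine="openpyxl"
)
df = spark.createDataFrame(pdf)
df.write.format("delta").mode("append").saveAsTable("bronze.report")
```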

1 More Replies
User15986662700
by New Contributor III
  • 3378 Views
  • 4 replies
  • 1 kudos
Connecting Databricks to a kerberized HBase cluster
Latest Reply
User15986662700
New Contributor III
  • 1 kudos

Yes, it is possible to connect Databricks to a kerberized HBase cluster. The attached article explains the steps. It consists of setting up a Kerberos client using a keytab on the cluster nodes, installing the hbase-spark integration library, and set...

3 More Replies
naga_databricks
by Contributor
  • 1977 Views
  • 2 replies
  • 0 kudos

Reading BigQuery data using a query

To read BigQuery data using spark.read, I'm using a query. This query executes and creates a table on the materializationDataset.

df = spark.read.format("bigquery") \
    .option("query", query) \
    .option("materializationProject", materializationProject) \
    ...

Latest Reply
Kaniz_Fatma
Community Manager
  • 0 kudos

Hi @naga_databricks, The Databricks documentation does not explicitly state that spark.read BigQuery format will create a Materialized View. Instead, it mentions that it can read from a BigQuery table or the result of a BigQuery SQL query. When you ...
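A sketch of the query-based read for comparison (project, dataset, and query are placeholders): the connector materializes the query result as a temporary table in materializationDataset before reading it, and viewsEnabled must be true for query reads.

```python
# GCP project, dataset, and SQL below are hypothetical; the service
# account needs permission to create the temporary materialization table.
df = (
    spark.read.format("bigquery")
    .option("viewsEnabled", "true")
    .option("materializationProject", "my-gcp-project")
    .option("materializationDataset", "tmp_dataset")
    .option("query", "SELECT id, amount FROM sales.orders WHERE amount > 0")
    .load()
)
```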

1 More Replies
EDDatabricks
by Contributor
  • 828 Views
  • 2 replies
  • 2 kudos

Appropriate storage account type for reference data (Azure)

Hello, we are using a reference dataset for our Production applications. We would like to create a Delta table for this dataset to be used by our applications. Currently, manual updates occur on this dataset through a script on a weekly basis. ...

Labels: Data Engineering, Delta Live Table, Storage account
Latest Reply
-werners-
Esteemed Contributor III
  • 2 kudos

+1 for ADLS: hierarchical namespace and hot/cold/premium storage tiers, things not possible in plain blob storage.

1 More Replies
irispan
by New Contributor II
  • 2576 Views
  • 4 replies
  • 1 kudos

Recommended Hive metastore pattern for Trino integration

Hi, I have several questions regarding Trino integration: Is it recommended to use an external Hive metastore or leverage the Databricks-maintained Hive metastore when it comes to enabling external query engines such as Trino? When I tried to use ex...

Latest Reply
JunlinZeng
New Contributor II
  • 1 kudos

> Is it recommended to use an external Hive metastore or leverage the Databricks-maintained Hive metastore when it comes to enabling external query engines such as Trino?

The Databricks-maintained Hive metastore is not suggested to be used externally. ...

3 More Replies
Agus1
by New Contributor III
  • 3321 Views
  • 3 replies
  • 3 kudos

Update destination table when using Spark Structured Streaming and Delta tables

I’m trying to implement a streaming pipeline that will run hourly using Spark Structured Streaming, Scala, and Delta tables. The pipeline will process different items with their details. The sources are Delta tables that already exist, written hourly u...

Latest Reply
Tharun-Kumar
Honored Contributor II
  • 3 kudos

@Agus1 Could you try using CDC in Delta? You could use readChangeFeed to read only the changes that got applied on the source table. This is also explained here: https://learn.microsoft.com/en-us/azure/databricks/delta/delta-change-data-feed
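A minimal sketch of reading the change feed (table name is a placeholder; the source table must have delta.enableChangeDataFeed = true before changes are recorded):

```python
# Stream only the changes from the source Delta table.
changes = (
    spark.readStream.format("delta")
    .option("readChangeFeed", "true")
    .table("source_db.items")  # hypothetical table name
)
# Each row carries _change_type, _commit_version, and _commit_timestamp
# columns, which can drive a MERGE into the destination table.
```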

2 More Replies
Eric_Kieft
by New Contributor III
  • 1886 Views
  • 2 replies
  • 1 kudos

Unity Catalog Table/View Column Data Type Changes

When changing a Delta table column data type in Unity Catalog, we noticed that a view referencing that table did not automatically update to reflect the new data type. Is there a way to update the Delta table column data type so that it also update...

Latest Reply
Lakshay
Esteemed Contributor
  • 1 kudos

Can you try refreshing the view by running the command: REFRESH TABLE <viewname>

1 More Replies
suresh1122
by New Contributor III
  • 10024 Views
  • 11 replies
  • 7 kudos

Dataframe takes an unusually long time (around 2 hrs) to save as a Delta table for a very small dataset with 30k rows. Is there a solution for this problem?

I am trying to save a dataframe, after a series of data manipulations using UDF functions, to a Delta table. I tried using this code:

df.write \
    .format('delta') \
    .mode('overwrite') \
    .option('overwriteSchema', 'true') \
    .saveAsTable('output_table')

but this...

Latest Reply
Lakshay
Esteemed Contributor
  • 7 kudos

You should also look into the SQL plan to verify whether the writing phase is indeed the part that is taking the time. Since Spark works on lazy evaluation, some other phase might be the actual bottleneck.
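One way to isolate the slow phase, as a sketch (assumes `df` is the dataframe from the question):

```python
# Materialize the full lineage once; if this count is already slow, the
# upstream UDF transformations are the bottleneck, not the Delta write.
df.cache()
df.count()
df.explain(mode="formatted")  # inspect the physical plan for expensive stages
```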

10 More Replies
NDK
by New Contributor II
  • 1287 Views
  • 1 reply
  • 0 kudos

Soft stop a Streaming Job

I have an Auto Loader streaming job with continuous run; I want to stop that job on weekends for some time and restart it again.

Vibhor
by Contributor
  • 2740 Views
  • 5 replies
  • 4 kudos

Resolved! Cluster Performance

Facing an issue with cluster performance; in the event log we can see that the cluster is not responsive, likely due to GC. The number of pipelines (Databricks notebooks) running and the cluster configuration are the same as before, but we started seeing this issue sin...

Latest Reply
jose_gonzalez
Moderator
  • 4 kudos

Hi @Vibhor Sethi, do you see any other error messages? Did your data volume increase? What kind of job are you running?

4 More Replies
ajain80
by New Contributor III
  • 11894 Views
  • 6 replies
  • 10 kudos

Resolved! SFTP Connect

How can I connect to an SFTP server from Databricks, so I can write files into tables directly?

Latest Reply
Hubert-Dudek
Esteemed Contributor III
  • 10 kudos

The classic solution is to copy the data from the (S)FTP server to ADLS storage using Azure Data Factory, and after the copy is done in the ADF pipeline, trigger the Databricks notebook.
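If you want to stay inside Databricks instead, a hedged sketch using paramiko (host, credentials, paths, and table name are placeholders; paramiko must be pip-installed, and `dbutils`/`spark` are the notebook built-ins):

```python
import paramiko

# Connection details are hypothetical; credentials come from a secret scope.
transport = paramiko.Transport(("sftp.example.com", 22))
transport.connect(
    username="loader", password=dbutils.secrets.get("sftp-scope", "password")
)
sftp = paramiko.SFTPClient.from_transport(transport)
sftp.get("/outbound/data.csv", "/tmp/data.csv")  # pull file to local disk
sftp.close()
transport.close()

# Read the landed file and write it into a table.
df = spark.read.csv("file:/tmp/data.csv", header=True)
df.write.mode("append").saveAsTable("bronze.sftp_data")
```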

5 More Replies