Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
Data + AI Summit 2024 - Data Engineering & Streaming

Forum Posts

etao
by New Contributor II
  • 140 Views
  • 2 replies
  • 1 kudos

Resolved! How to distribute pyspark dataframe repartition and row count on Databricks?

Trying to compare large datasets for discrepancies. The datasets come from two database tables, each with around 500 million rows. I use PySpark subtract and joins (leftanti, leftsemi) to sort out the differences. To distribute the workload, I need to repar...

Latest Reply
Kaniz_Fatma
Community Manager
  • 1 kudos

Hi @etao, To distribute the workload effectively, try repartitioning by the join key column or increasing the number of partitions. Use coalesce to reduce partitions without shuffling data. For better performance, consider broadcast joins for smaller...

1 More Replies
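The advice in the reply above can be made concrete with a small back-of-envelope helper. This is a sketch, not from the thread: the function name, the 128 MB per-partition target, and the bytes-per-row estimate are all assumptions; the resulting count would then be passed to a call like df.repartition(n, "join_key").

```python
# Rough sketch (assumptions labeled): pick a partition count aiming at
# ~128 MB per partition, with a floor so highly compressed data still
# spreads across the cluster.

def suggest_partitions(total_size_bytes,
                       target_partition_bytes=128 * 1024 * 1024,
                       min_partitions=200):
    """Return a partition count targeting ~128 MB per partition."""
    by_size = -(-total_size_bytes // target_partition_bytes)  # ceil division
    return max(by_size, min_partitions)

# e.g. ~500M rows at a guessed ~100 bytes/row is roughly 50 GB
print(suggest_partitions(50 * 1024**3))  # 400
```

Repartitioning both DataFrames by the join key with the same count (df.repartition(n, "join_key")) co-locates matching rows before the leftanti/leftsemi joins, which is usually the main lever for this kind of comparison.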
ToReSa
by New Contributor
  • 156 Views
  • 5 replies
  • 1 kudos

Read each cell containing SQL from one notebook, execute it in another notebook, and export the result

Hi, I'm new to Databricks, so excuse me if the question is a silly one. I have a requirement to read cell by cell from one notebook (say notebookA) and execute the contents of each cell in another notebook (say notebookB) using a Python script. All the...

Latest Reply
Kaniz_Fatma
Community Manager
  • 1 kudos

Hi @ToReSa, If you just want to execute the notebook, calling another notebook would be easier. You can even exchange some data between the notebooks. But if you specifically want to pick each SQL from one notebook and execute it in another notebook,...

4 More Replies
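One way to do the cell-by-cell pickup the reply hints at is to export notebookA in source format and split on the cell separator. This is a hedged sketch, not from the thread: it assumes the standard Databricks "source" export format for SQL notebooks (header line plus "-- COMMAND ----------" separators); the sample source is hand-made, so verify the marker against your own exported notebook.

```python
# Sketch: split an exported Databricks SQL notebook into per-cell SQL
# strings; notebookB could then loop over them with spark.sql(cell).

CELL_SEP = "-- COMMAND ----------"
HEADER = "-- Databricks notebook source"

def split_sql_cells(source: str):
    """Return the SQL text of each cell, header and separators stripped."""
    body = source.replace(HEADER, "", 1)
    return [cell.strip() for cell in body.split(CELL_SEP) if cell.strip()]

sample = """-- Databricks notebook source
SELECT 1 AS a

-- COMMAND ----------

SELECT 2 AS b
"""
print(split_sql_cells(sample))  # ['SELECT 1 AS a', 'SELECT 2 AS b']
```

The source text itself can be fetched with the Workspace Export API (format=SOURCE); results of each spark.sql call can then be collected or written out from notebookB.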
sharukh_lodhi
by New Contributor III
  • 39 Views
  • 0 replies
  • 0 kudos

Azure IMDS is not accessible when selecting the shared compute policy

Hi, Databricks community, I recently encountered an issue while using the 'azure.identity' Python library on a cluster set to the personal compute policy in Databricks. In this case, Databricks successfully returns the Azure Databricks managed user id...

Data Engineering
azure IMDS
DefaultAzureCredential
PiotrU
by Contributor
  • 876 Views
  • 5 replies
  • 1 kudos

Resolved! Adding extra libraries to databricks (rosbag)

Hello, I have an interesting challenge: I am required to install a few libraries that are part of the rosbag packages, to allow some data deserialization tasks. While creating the cluster I use an init_script that installs this software using apt: sudo apt upd...

Latest Reply
amandaK
New Contributor
  • 1 kudos

@PiotrU did adding the path to sys.path resolve all of your ModuleNotFoundErrors? I'm trying to do something similar, and adding the path to sys.path resolved the ModuleNotFoundError for rclpy, but I continue to see others related to ros

4 More Replies
subhas_1729
by New Contributor
  • 24 Views
  • 1 reply
  • 0 kudos

CSV file and partitions

Hi, I want to know whether CSV files can be partitioned or not. I read in a book that only certain file types, such as Parquet and Avro, can be partitioned. Regards, Subhas

Latest Reply
Witold
Contributor
  • 0 kudos

Basically every file type can be partitioned, as technically partitions are just subfolders.

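The reply's point, that partitioning is just a folder convention, can be shown without Spark at all. This sketch builds the Hive-style key=value layout by hand in a temp directory; it mirrors what a call like df.write.partitionBy("country").csv(path) produces (column name and data are made up for illustration).

```python
# Sketch: Hive-style partitioning is a directory layout, so it applies
# to CSV just as well as to Parquet or Avro. The partition column value
# lives in the folder name, not in the file contents.
import csv, os, tempfile

root = tempfile.mkdtemp()
rows = [("alice", "US"), ("bob", "DE"), ("carol", "US")]

for name, country in rows:
    part_dir = os.path.join(root, f"country={country}")
    os.makedirs(part_dir, exist_ok=True)
    with open(os.path.join(part_dir, "part-0000.csv"), "a", newline="") as f:
        csv.writer(f).writerow([name])  # partition column stays in the path

print(sorted(os.listdir(root)))  # ['country=DE', 'country=US']
```

When Spark reads such a tree back (spark.read.csv(root)), partition discovery recovers the country column from the directory names, and filters on it prune whole folders.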
Wenhui
by New Contributor II
  • 98 Views
  • 3 replies
  • 0 kudos

How to troubleshoot in a user's environment

Hi team, I want to do a POC, but I have a question that confuses me: if your team's engineers need to access our data plane environment to troubleshoot for us, how do they get permission to access our environment? Could you help me? Thank you very much.

Latest Reply
Slash
New Contributor III
  • 0 kudos

Hi @Wenhui, what's your setup? Which cloud provider? Do you use Unity Catalog?

2 More Replies
labromb
by Contributor
  • 10813 Views
  • 9 replies
  • 4 kudos

How to pass configuration values to a Delta Live Tables job through the Delta Live Tables API

Hi Community, I have successfully run a job through the API, but I need to be able to pass parameters (configuration) to the DLT workflow via the API. I have tried passing JSON in this format: { "full_refresh": "true", "configuration": [ ...

Latest Reply
Manjula_Ganesap
Contributor
  • 4 kudos

@Mo - it worked. Thank you so much.

8 More Replies
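Two details in the question's JSON are worth flagging. This is a hedged sketch based on my reading of the public Pipelines API, not the fix the thread confirmed; treat the endpoint shapes as assumptions and verify against the current API reference.

```python
# Sketch: two likely issues with the payload in the question.
import json

# 1) In the start-update call (POST /api/2.0/pipelines/{id}/updates),
#    full_refresh is a JSON boolean, not the string "true".
start_update = {"full_refresh": True}

# 2) "configuration" appears to be a string-to-string map on the pipeline
#    settings (edited via the pipeline-settings endpoint), not a list
#    passed to the update call. Key/value here are placeholders.
settings_patch = {"configuration": {"my_param": "my_value"}}

payload = json.dumps(start_update)
print(json.loads(payload))  # {'full_refresh': True}
```

Inside the pipeline's notebooks, such configuration values are then read back with spark.conf.get("my_param") (the key name here is a placeholder).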
seeker
by Visitor
  • 38 Views
  • 1 reply
  • 0 kudos

Get metadata of files present in a zip

I have a .zip file present on an ADLS path which contains multiple files of different formats. I want to get metadata of the files like file name, modification time present in it without unzipping it. I have a code which works for smaller zip but run...

Latest Reply
seeker
Visitor
  • 0 kudos

Here is the code which I am using:

def register_udf():
    def extract_file_metadata_from_zip(binary_content):
        metadata_list = []
        with io.BytesIO(binary_content) as bio:
            with zipfile.ZipFile(bio, "r") as zip_ref:
                ...

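The core of the approach in the question can be shown self-contained: a zip's central directory already holds each member's name, size, and modification time, so ZipFile.infolist() never decompresses anything. This sketch builds a tiny in-memory zip to inspect (file names and contents are made up).

```python
# Sketch: read zip member metadata without extracting the members.
import io, zipfile

buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:   # build a small zip to inspect
    zf.writestr("a.csv", "x" * 100)
    zf.writestr("b.json", "{}")

with zipfile.ZipFile(buf, "r") as zf:
    meta = [(i.filename, i.file_size, i.date_time) for i in zf.infolist()]

print([m[0] for m in meta])  # ['a.csv', 'b.json']
```

For very large archives the likely bottleneck is not the listing but shipping the whole blob into the UDF: the central directory sits at the end of the zip, so a ranged read of the file's tail from ADLS can avoid downloading the entire archive. That is a design direction to verify, not something this snippet implements.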
Ulman
by New Contributor II
  • 1344 Views
  • 8 replies
  • 0 kudos

Switching to File Notification Mode with ADLS Gen2 - Encountering StorageException

Hello, we are currently utilizing Auto Loader in file listing mode for a stream, which is experiencing significant latency due to the non-incremental naming of files in the directory, a condition that cannot be altered. In an effort to mitigate this...

Data Engineering
ADLS gen2
autoloader
file notification mode
Latest Reply
Rah_Cencora
  • 0 kudos

You should also reevaluate your use of premium storage for your landing area files. Typically, storage for raw files does not need to be the fastest and most resilient and expensive tier. Unless you have a compelling reason for premium storage for la...

7 More Replies
ibrahim21124
by New Contributor III
  • 566 Views
  • 7 replies
  • 0 kudos

Autoloader File Notification Mode not working as expected

I am using this given code to read from a source location in an ADLS Gen2 Azure Storage container:

core_df = (
    spark.readStream.format("cloudFiles")
    .option("cloudFiles.format", "json")
    .option("multiLine", "false")
    .option(...

Latest Reply
Rishabh_Tiwari
Community Manager
  • 0 kudos

Hi @ibrahim21124 , Thank you for reaching out to our community! We're here to help you.  To ensure we provide you with the best support, could you please take a moment to review the response and choose the one that best answers your question? Your fe...

6 More Replies
YFL
by New Contributor III
  • 4887 Views
  • 12 replies
  • 6 kudos

Resolved! When delta is a streaming source, how can we get the consumer lag?

Hi, I want to keep track of the streaming lag from the source table, which is a Delta table. I see that in the query progress logs there is some information about the last version and the last file in the version for the end offset, but this doesn't give ...

Latest Reply
Anonymous
Not applicable
  • 6 kudos

Hey @Yerachmiel Feltzman, I hope all is well. Just wanted to check in: were you able to resolve your issue, or do you need more help? We'd love to hear from you. Thanks!

11 More Replies
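One way to turn the progress-log information the question mentions into a lag number is to compare the Delta source's endOffset with the table's latest version. This is a hedged sketch: the progress dict below is hand-made, the exact shape of endOffset (JSON string vs. nested object, and the reservoirVersion field) varies by runtime version, and the latest table version would come from something like DESCRIBE HISTORY ... LIMIT 1.

```python
# Sketch: estimate consumer lag in *table versions* for a Delta streaming
# source from a query.lastProgress-style dict.
import json

last_progress = {          # hand-made example of the reported shape
    "sources": [{
        "description": "DeltaSource[dbfs:/tables/events]",
        "endOffset": json.dumps({"reservoirVersion": 40, "index": 2}),
    }]
}

def version_lag(progress, latest_table_version):
    """Versions between the table head and the stream's end offset."""
    end = json.loads(progress["sources"][0]["endOffset"])
    return latest_table_version - end["reservoirVersion"]

print(version_lag(last_progress, 45))  # 5 versions behind
```

Version lag is a coarse proxy; if versions land at a steady cadence it can be converted to an approximate time lag, which is usually what alerting actually needs.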
lshar
by New Contributor III
  • 20315 Views
  • 8 replies
  • 5 kudos

Resolved! How do I pass arguments/variables from widgets to notebooks?

Hello, I am looking for a solution to this problem, which has been known for 7 years: https://community.databricks.com/s/question/0D53f00001HKHZfCAP/how-do-i-pass-argumentsvariables-to-notebooks What I need is to parametrize my notebooks using widget infor...

Latest Reply
T_Ash
Visitor
  • 5 kudos

Can we create paginated reports with multiple parameters (where one parameter dynamically changes another), or can we pass one variable from one dataset to another dataset, like a Power BI paginated report, using a Databricks dashboard? Please let me know...

7 More Replies
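The pattern this thread converges on is widgets plus dbutils.notebook.run, whose arguments map onto the callee's widgets. A hedged sketch of the callee side, with a fallback so the same code also runs outside a notebook (dbutils is only defined in a Databricks notebook context; the parameter name and default are placeholders):

```python
# Sketch: read a widget value, falling back to a default when dbutils
# is not defined (i.e. when running outside a Databricks notebook).

def get_param(name, default):
    try:
        return dbutils.widgets.get(name)  # noqa: F821 - notebook global
    except NameError:
        return default

print(get_param("env", "dev"))  # 'dev' outside a notebook
```

The caller side would then look like dbutils.notebook.run("notebookA", 60, {"env": "prod"}), with the passed mapping showing up as widget values in the called notebook.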