Data Engineering

Forum Posts

Sorted by:

Start a conversation

by prasadvaze • Valued Contributor

12-23-2022 12:30:19 PM

2670 Views
2 replies
1 kudos

Resolved! How to start local/city databricks user group?

Hello Lindsey, I would like to start Richmond, VA databricks user group (chapter) . How do I go about doing this?

Data Engineering

2670 Views
2 replies
1 kudos

12-23-2022 12:30:19 PM

View Replies

Latest Reply

Kaniz
Community Manager

11-19-2023 10:43:26 PM

1 kudos

Hi @prasad_vaze, Thank you for your interest in starting a Databricks user group in Richmond, VA! It’s a great initiative to foster collaboration and knowledge sharing among Databricks enthusiasts. I will let my team reach out to you on the same.

1 kudos

11-19-2023 10:43:26 PM

1 More Replies

by Rdipak • New Contributor II

11-16-2023 4:29:56 AM

671 Views
2 replies
0 kudos

Delta live table blocks pipeline autoloader rate limit

I have created a ETL pipeline with DLT. My first step is to ingest into raw delta table using autoloader file notification. when I have 20k notification pipe line run well across all stages. But when we have surge in number of messages pipeline waits...

Data Engineering

671 Views
2 replies
0 kudos

11-16-2023 4:29:56 AM

View Replies

Latest Reply

kulkpd
Contributor

11-17-2023 2:51:57 PM

0 kudos

Did you try following options:.option('cloudFiles.maxFilesPerTrigger', 10000) or maxBytesPerTrigger ?

0 kudos

11-17-2023 2:51:57 PM

1 More Replies

by AndrewSilver • New Contributor II

11-17-2023 5:06:38 AM

504 Views
1 replies
1 kudos

Uncertainty on Databricks job variables: {{run_id}}, {{parent_run_id}}.

In Azure's Databricks jobs, {{run_id}} and {{parent_run_id}} serve as variables. In jobs with multiple tasks, {{run_id}} aligns with task_run_id, while {{parent_run_id}} matches job_run_id. In single-task jobs, {{parent_run_id}} aligns with task_run_...

Data Engineering

504 Views
1 replies
1 kudos

11-17-2023 5:06:38 AM

View Replies

Latest Reply

kulkpd
Contributor

11-17-2023 9:40:23 AM

1 kudos

I am using job with single task and multiple retry.Upon job retry the run_id get changed, I tried to using {{parent_run_id}} but never worked so switched to val parentRunId = dbutils.notebook.getContext.tags("jobRunOriginalAttempt")

1 kudos

11-17-2023 9:40:23 AM

by Direo • Contributor

04-14-2022 2:24:25 AM

5562 Views
3 replies
1 kudos

Resolved! JavaPackage object is not callable - pydeequ

Hi!When I run a notebook on databricks, it throws error - " 'JavaPackage' object is not callable" which points to pydeequ library:/local_disk0/.ephemeral_nfs/envs/pythonEnv-3abbb1aa-ee5b-48da-aaf2-18f273299f52/lib/python3.8/site-packages/pydeequ/che...

Data Engineering

5562 Views
3 replies
1 kudos

04-14-2022 2:24:25 AM

View Replies

Latest Reply

JSatiro
New Contributor II

11-17-2023 7:39:08 AM

1 kudos

Hi. If you are struggling like I was, these were the steps I followed to make it work:1 - Created a cluster with Runtime 10.4 LTS, which has spark version 3.2.1 (it should work with more recent runtimes, but be aware of the spark version)2 - When cre...

1 kudos

11-17-2023 7:39:08 AM

2 More Replies

by NathanE • New Contributor II

11-17-2023 1:00:47 AM

735 Views
1 replies
1 kudos

Time travel on views

Hello,At my company, we design an application to analyze data, and we can do so on top of external databases such as Databricks. Our application cache some data in-memory and to avoid synchronization issues with the data on Databricks, we rely heavil...

Data Engineering

735 Views
1 replies
1 kudos

11-17-2023 1:00:47 AM

View Replies

Latest Reply

karthik_p
Esteemed Contributor

11-17-2023 7:00:45 AM

1 kudos

@NathanE As you said, based on below article it may not support currenlty https://docs.databricks.com/en/sql/user/materialized-views.html, but at the same time looks as Materialized View is built on top of table and It is synchronous operation ( when...

1 kudos

11-17-2023 7:00:45 AM

by DatabricksNIN • New Contributor II

11-17-2023 5:18:43 AM

704 Views
2 replies
0 kudos

Pulling data from Azure Boards (Specifically 'Analytics Views' into databricks

Building upon a previous post/topic from one year ago.. I am looking for best practises/examples on how to pull data from Azure Boards and specifically from 'Analytics Views' into databricks for analysis.I have succeeded in doing so with 'Work Items...

Data Engineering

704 Views
2 replies
0 kudos

11-17-2023 5:18:43 AM

View Replies

Latest Reply

Kaniz
Community Manager

11-17-2023 6:12:37 AM

0 kudos

Hi @DatabricksNIN , To pull data from Azure Boards and specifically from ‘Analytics Views’ into Databricks for analysis, you can use the Azure DevOps REST API.

0 kudos

11-17-2023 6:12:37 AM

1 More Replies

by erigaud • Honored Contributor

10-27-2023 12:04:48 AM

1505 Views
3 replies
0 kudos

Combining DLT and workflow - MATERIALIZED_VIEW_OPERATION_NOT_ALLOWED

Hello everyone !I currently have a DLT pipeline that loads into several Delta LIVE tables (both streaming and materialized view).The end table of my DLT pipeline is a materialized view called "silver.my_view".In a later step I need to join/union/merg...

Data Engineering

1505 Views
3 replies
0 kudos

10-27-2023 12:04:48 AM

View Replies

Latest Reply

Kaniz
Community Manager

11-17-2023 3:30:09 AM

0 kudos

Hi @erigaud , To read a table from a DLT pipeline with a regular non-shared cluster, you can use the dlt.table function in Databricks. This function reads data from a table registered in the Hive metastore.

0 kudos

11-17-2023 3:30:09 AM

2 More Replies

by JonLaRose • New Contributor III

10-30-2023 5:21:59 AM

1063 Views
2 replies
0 kudos

Adding custom Jars to SQL Warehouses

Hi there,I want to add custom JARs to an SQL warehouse (Pro if that matters) like I can in an interactive cluster, yet I don't see a way.Is that a degraded functionality when transitioning to a SQL warehouse, or have I missed something? Thank you.

Data Engineering

1063 Views
2 replies
0 kudos

10-30-2023 5:21:59 AM

View Replies

Latest Reply

Kaniz
Community Manager

11-17-2023 3:16:36 AM

0 kudos

Hi @JonLaRose , You can add custom JARs to an SQL warehouse in Databricks. The ADD JAR command is used to add a JAR file to the list of resources in Databricks Runtime. Here’s an example of how to use the ADD JAR command: ADD JAR /tmp/test.jar; Th...

0 kudos

11-17-2023 3:16:36 AM

1 More Replies

by chari • Contributor

10-29-2023 11:22:46 PM

2745 Views
3 replies
1 kudos

Cant connect power BI desktop to Azure databricks

Hello,I am trying to connect Power BI desktop to azure databricks (source: delta table) by downloading a connection file from Databricks. I see an error message like below when I open the connection file with power BI. Repeated attempts have given th...

Data Engineering

2745 Views
3 replies
1 kudos

10-29-2023 11:22:46 PM

View Replies

Latest Reply

Kaniz
Community Manager

11-17-2023 2:52:51 AM

1 kudos

Hi @chari , To resolve this issue, I would recommend checking the following: Ensure that the connection file you downloaded from Databricks is correct and up-to-date.Check if the Databricks server is up and running.Verify that the Databricks server...

1 kudos

11-17-2023 2:52:51 AM

2 More Replies

by Michael_Appiah • New Contributor III

10-14-2023 9:48:08 AM

4445 Views
3 replies
1 kudos

Resolved! Hashing Functions in PySpark

Hashes are commonly used in SCD2 merges to determine whether data has changed by comparing the hashes of the new rows in the source with the hashes of the existing rows in the target table. PySpark offers multiple different hashing functions like:MD5...

Data Engineering

4445 Views
3 replies
1 kudos

10-14-2023 9:48:08 AM

View Replies

Latest Reply

Michael_Appiah
New Contributor III

10-17-2023 9:19:30 AM

1 kudos

Hi @Kaniz ,thank you for your comprehensive answer. What is your opinion on the trade-off between using a hash like xxHASH64 which returns a LongType column and thus would offer good performance when there is a need to join on the hash column versus ...

1 kudos

10-17-2023 9:19:30 AM

2 More Replies

by VtotheG • New Contributor

11-17-2023 2:03:12 AM

752 Views
0 replies
0 kudos

Problem Visual Studio Plugin with custom modules

We are using the Databricks Visual Studio Plugin to write our python / spark code.We are using the upload file to databricks functionality because our organisation has turned unity catelog off. We are now running into a weird bug with custom modules....

Data Engineering

databricks visual studio plug in

visual studio code

752 Views
0 replies
0 kudos

11-17-2023 2:03:12 AM

by alj_a • New Contributor III

11-16-2023 10:32:45 PM

736 Views
1 replies
1 kudos

Connect databricks delta lake which is hosted in AWS from PowerBI - conn str/push dataset

Hi,I have a requirement. databricks has been hosted in AWS. and, i need to read the delta table from powerbi. tried push dataset but not working. is there any way to connect.we are using Active Directory as company wide

Data Engineering

aws databrics

736 Views
1 replies
1 kudos

11-16-2023 10:32:45 PM

View Replies

Latest Reply

Kaniz
Community Manager

11-17-2023 1:59:17 AM

1 kudos

Hi @alj_a, it is possible to connect Power BI to Delta Lake tables hosted on Databricks on AWS. You can use the Azure Databricks Power BI connector to connect Power BI Desktop to your Azure Databricks clusters and Databricks SQL warehouses 12. Here...

1 kudos

11-17-2023 1:59:17 AM

by sriradh • New Contributor

10-05-2023 4:12:17 PM

719 Views
1 replies
0 kudos

Resolved! ACID properties in delta?

How are locks maintained within a Delta Lake? For instance, lets say there are 2 simple tables, customer_details and say orders. Lets say I am running a job that will say insert an order in the orders table for say $100 for a specific customerId, it ...

Data Engineering

acid

delta

719 Views
1 replies
0 kudos

10-05-2023 4:12:17 PM

View Replies

Latest Reply

Kaniz
Community Manager

11-17-2023 1:54:26 AM

0 kudos

Hi @sriradh, In Delta Lake, ACID transaction guarantees are provided between reads and writes. This means multiple writers across multiple clusters can modify a table partition simultaneously. Writers see a consistent snapshot view of the table, and...

0 kudos

11-17-2023 1:54:26 AM

by RiyuLite • New Contributor III

10-05-2023 1:06:12 AM

1380 Views
7 replies
4 kudos

Where do I get Account level logs after enabling diagnostic logs for Azure databricks?

I need to retrieve the accountBillage usage from Audit logsI have enabled Diagnostic logs, and it's been 36 hours. While enabling the logs , I selected every possible logs in this image. But still i am not able to see the containers for account level...

Data Engineering

1380 Views
7 replies
4 kudos

10-05-2023 1:06:12 AM

View Replies

Latest Reply

RiyuLite
New Contributor III

10-06-2023 3:05:03 AM

4 kudos

Hi @Kaniz , I checked Azure Monitoring and log delivery documentations, The log delivery is same as workspace level.What is the procedure to enable account level service in audit logs for Azure ?

4 kudos

10-06-2023 3:05:03 AM

6 More Replies

by sanjay • Valued Contributor II

10-05-2023 12:34:49 AM

1082 Views
3 replies
4 kudos

Resolved! Trigger Events in data pipeline

Hi,I am running datapipeline in databrick using matillion architecture. I am facing inconsistent events in silver to gold layer in case any row deleted/updated from a partition. Let me explain with example.e.g. I have data in silver layer with partit...

Data Engineering

1082 Views
3 replies
4 kudos

10-05-2023 12:34:49 AM

View Replies

Latest Reply

sanjay
Valued Contributor II

10-11-2023 3:08:37 AM

4 kudos

Thank you Kaniz. Further queries on this.1. If I have nested partitions e.g. on department & date, finance->09, finance->10 and if am updating one record in finance->09 then will then updates partition finance->10 as well2. Is it good idea to have sm...

4 kudos

10-11-2023 3:08:37 AM

2 More Replies

User

Count

1602

736

344

284

247

Databricks

Forum Posts

Resolved! How to start local/city databricks user group?

Delta live table blocks pipeline autoloader rate limit

Uncertainty on Databricks job variables: {{run_id}}, {{parent_run_id}}.

Resolved! JavaPackage object is not callable - pydeequ

Time travel on views

Pulling data from Azure Boards (Specifically 'Analytics Views' into databricks

Combining DLT and workflow - MATERIALIZED_VIEW_OPERATION_NOT_ALLOWED

Adding custom Jars to SQL Warehouses

Cant connect power BI desktop to Azure databricks

Resolved! Hashing Functions in PySpark

Problem Visual Studio Plugin with custom modules

Connect databricks delta lake which is hosted in AWS from PowerBI - conn str/push dataset

Resolved! ACID properties in delta?

Where do I get Account level logs after enabling diagnostic logs for Azure databricks?

Resolved! Trigger Events in data pipeline

Best way to parse Google Analytics data in Databri...

DELTA_EXCEED_CHAR_VARCHAR_LIMIT

Not able to set run_as service_principal_name

Pyspark operations slowness in CLuster 14.3LTS as ...

[Databricks Assets Bundles] Workflow trigger on fi...