Data Engineering
Forum Posts

Nisha2
by New Contributor II
  • 417 Views
  • 2 replies
  • 0 kudos

Databricks spark_jar_task failed when submitted via API

Hello, we are submitting jobs to the Databricks cluster using the /api/2.0/jobs/create API and running a Spark Java application (a JAR submitted via this API). We notice that the Java application executes as expected; however, we see that the...
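For context, a minimal sketch of the kind of jobs/create payload being described, assuming a spark_jar_task; the workspace URL, token, cluster spec, JAR path, and main class are placeholders, not details from the post:

```python
import requests

# Hypothetical values throughout; only the endpoint and field names come from the Jobs 2.0 API.
payload = {
    "name": "spark-jar-job",
    "new_cluster": {
        "spark_version": "13.3.x-scala2.12",
        "node_type_id": "Standard_DS3_v2",
        "num_workers": 2,
    },
    "libraries": [{"jar": "dbfs:/FileStore/jars/my-app.jar"}],
    "spark_jar_task": {
        "main_class_name": "com.example.Main",
        "parameters": ["--env", "dev"],
    },
}

resp = requests.post(
    "https://<workspace-url>/api/2.0/jobs/create",
    headers={"Authorization": "Bearer <personal-access-token>"},
    json=payload,
)
print(resp.json())  # {"job_id": ...} on success
```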

Data Engineering
API
Databricks
spark
Latest Reply
Kaniz
Community Manager
  • 0 kudos

Hi @Nisha2 , It appears that you’re encountering issues with your Spark Java application running on Databricks. Let’s break down the error message and explore potential solutions: Spark Down Exception: The log indicates that Spark is detected to b...

1 More Replies
Nurota
by New Contributor
  • 814 Views
  • 1 reply
  • 0 kudos

Describe table extended on materialized views - UC, DLT and cluster access modes

We have a daily job with a notebook that loops through all the databases and tables, and optimizes and vacuums them. Since DLT tables in UC are materialized views, the "optimize" and "vacuum" commands do not work on them, and they need to be excluded. ...
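One way to express that exclusion, sketched below, is to check the table type in Unity Catalog's information_schema before issuing OPTIMIZE/VACUUM; the catalog name is a placeholder and the exact table_type values should be verified against your metastore:

```python
# Sketch only: skip views, materialized views, and streaming tables before
# running OPTIMIZE and VACUUM. "my_catalog" is a placeholder.
tables = spark.sql("""
    SELECT table_catalog, table_schema, table_name, table_type
    FROM my_catalog.information_schema.tables
""").collect()

for t in tables:
    full_name = f"{t.table_catalog}.{t.table_schema}.{t.table_name}"
    if t.table_type in ("VIEW", "MATERIALIZED_VIEW", "STREAMING_TABLE"):
        continue  # OPTIMIZE/VACUUM are not supported on these objects
    spark.sql(f"OPTIMIZE {full_name}")
    spark.sql(f"VACUUM {full_name}")
```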

Data Engineering
cluster access mode
dlt
materialized views
optimize
Unity Catalog
Latest Reply
Kaniz
Community Manager
  • 0 kudos

Hi @Nurota, Let’s delve into the intricacies of Databricks and explore why scenario 3 throws an error despite the shared access mode cluster and the service principal ownership. Cluster Type and Materialized Views: In Databricks, the type of clus...

Kaniz
by Community Manager
  • 528 Views
  • 2 replies
  • 0 kudos

Passing Parameters Between Nested 'Run Job' Tasks in Databricks Workflows

Posting this on behalf of zaheer.abbas. I'm dealing with a similar scenario as mentioned here where I have jobs composed of tasks that need to pass parameters to each other, but all my tasks are configured as "Run Job" tasks rather than directly runn...

Latest Reply
zaheerabbas
New Contributor II
  • 0 kudos

Thanks, @Kaniz, I have tried the above approach by setting values in the notebooks within the `Job Run` type tasks. But when retrieving them - the notebook runs into errors saying the task name is not defined in the workflow. The above approach of se...

1 More Replies
ElaPG
by New Contributor III
  • 443 Views
  • 2 replies
  • 2 kudos

Cluster creation / unrestricted policy option

Hi, as a workspace admin I would like to disable cluster creation with the "No isolation" access mode. I created a custom policy for that, but I still have the option to create a cluster with the "unrestricted" policy. How can I make sure that nobody will creat...
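For reference, a cluster-policy definition along these lines can pin the access mode so that "No isolation shared" cannot be selected; a minimal sketch, with the allowed values purely illustrative:

```python
import json

# Illustrative policy definition: restricts data_security_mode so that
# "No isolation shared" (data_security_mode = "NONE") cannot be chosen.
policy_definition = {
    "data_security_mode": {
        "type": "allowlist",
        "values": ["SINGLE_USER", "USER_ISOLATION"],
    }
}

# This JSON string is what would go into the policy's "definition" field
# (via the UI or the Cluster Policies API).
print(json.dumps(policy_definition, indent=2))
```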

Latest Reply
ElaPG
New Contributor III
  • 2 kudos

Hi, thank you for a very informative reply. To sum up, in order to enforce these suggestions:
- the first solution must be executed at the account level
- the second solution must be executed at the workspace level (workspace-level admin settings)

1 More Replies
Coders
by New Contributor II
  • 302 Views
  • 1 reply
  • 0 kudos

New delta log folder is not getting created

I have the following code, which reads the stream of data, processes it in foreachBatch, and writes to the provided path as shown below: public static void writeToDatalake(SparkSession session, Configuration config, Dataset<Row> data, Entity enti...
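The posted code is Java, but as a rough PySpark sketch of the same foreachBatch pattern, the key detail is that the batch write must use the delta format for a _delta_log folder to appear; the paths and names below are placeholders:

```python
# Minimal sketch of the pattern described above (placeholder paths/names).
def write_to_datalake(batch_df, batch_id):
    (batch_df.write
        .format("delta")  # writing as "parquet" here would not create a _delta_log folder
        .mode("append")
        .save("abfss://container@account.dfs.core.windows.net/target/path"))

(spark.readStream.table("source_table")
    .writeStream
    .foreachBatch(write_to_datalake)
    .option("checkpointLocation", "/checkpoints/target_path")
    .start())
```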

Latest Reply
Kaniz
Community Manager
  • 0 kudos

Hi @Coders, It seems you’re encountering an issue while writing data to Delta Lake in Azure Databricks. The error message indicates that the format is incompatible, and it’s related to the absence of a transaction log. Let’s troubleshoot this togethe...

Gilg
by Contributor II
  • 442 Views
  • 1 reply
  • 0 kudos

DLT Performance

Hi, context: I have created a Delta Live Tables pipeline in a UC-enabled workspace that is set to Continuous. Within this pipeline, I have a bronze layer that uses Auto Loader and reads files stored in an ADLS Gen2 storage account in JSON format. We received ...
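For context, a bronze table of the kind described here typically looks roughly like the sketch below; the storage path and table name are made up:

```python
import dlt

# Illustrative bronze table using Auto Loader over JSON files in ADLS Gen2.
@dlt.table(name="bronze_events")
def bronze_events():
    return (
        spark.readStream.format("cloudFiles")
        .option("cloudFiles.format", "json")
        .load("abfss://landing@storageaccount.dfs.core.windows.net/events/")
    )
```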

Latest Reply
Kaniz
Community Manager
  • 0 kudos

Hi @Gilg, It’s great that you’ve set up a Delta Live Table (DLT) pipeline! However, it’s not uncommon to encounter performance degradation as your data grows. Let’s explore some strategies to optimize your DLT pipeline: Partitioning and Clusterin...

William_Scardua
by Valued Contributor
  • 9240 Views
  • 3 replies
  • 0 kudos

How to estimate dataframe size in bytes?

Hi guys, how do I estimate the size in bytes of my DataFrame (PySpark)? Any ideas? Thank you
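One rough approach that often comes up is to read Catalyst's own size estimate from the optimized plan; this relies on internal (_jdf) APIs and returns an estimate rather than an exact size, so treat the sketch below as an approximation that may vary across Spark versions:

```python
# Rough estimate via the optimizer's statistics (internal API, approximate).
df = spark.range(1_000_000)

size_in_bytes = df._jdf.queryExecution().optimizedPlan().stats().sizeInBytes()
print(f"Estimated size: {size_in_bytes} bytes")
```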

Latest Reply
Enneagram1w2
New Contributor II
  • 0 kudos

Unveil the Enneagram 1w9 mix: merging Type 1's perfectionism with Type 9's calm. Explore their key traits, hurdles, and development path. https://www.enneagramzoom.com/EnneagramTypes/EnneagramType1/Enneagram1w2

2 More Replies
Abdul1
by New Contributor
  • 268 Views
  • 1 reply
  • 0 kudos

How to output data from Databricks?

Hello, I am just starting with Databricks in Azure and I need to output the data to an Affinity CRM system. Affinity has an API, and I am wondering whether there is any sort of automated / data-pipeline way to tell Databricks to just pump the data into ...

Latest Reply
Edthehead
New Contributor III
  • 0 kudos

We need more info on what kind of data, the volume, and what the called API can handle. Calling an API for single records in parallel can be achieved using a UDF (see THIS). You need to be careful to batch the records so that the target API can handle the pa...
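As a rough illustration of that batching point, assuming a hypothetical REST endpoint and batch size (neither comes from the thread):

```python
import requests

# Illustrative only: posts records to a made-up endpoint in batches per
# partition, instead of one HTTP call per row.
def post_partition(rows, batch_size=100):
    batch = []
    for row in rows:
        batch.append(row.asDict())
        if len(batch) >= batch_size:
            requests.post("https://api.example.com/records", json=batch, timeout=30)
            batch = []
    if batch:
        requests.post("https://api.example.com/records", json=batch, timeout=30)

spark.table("gold.records_to_export").foreachPartition(post_partition)
```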

Edthehead
by New Contributor III
  • 389 Views
  • 2 replies
  • 0 kudos

Parameterized Delta live table pipeline

I'm trying to create an ETL framework on Delta Live Tables and basically use the same pipeline for all the transformations from bronze to silver to gold. This works absolutely fine when I hard-code the tables and the SQL transformations as an array wi...

Data Engineering
Databricks
Delta Live Table
dlt
Latest Reply
Kaniz
Community Manager
  • 0 kudos

Hi @Edthehead, Configuring your ETL framework for Delta Live Tables (DLT) can be done in a flexible and maintainable way. Let’s explore some options: Pipeline Settings in DLT: DLT provides a user-friendly interface for configuring pipeline settin...
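As a small illustration of that pipeline-settings option, key/value pairs defined in the DLT pipeline configuration can be read from the pipeline code; the configuration keys and table names below are assumptions, not taken from the thread:

```python
import dlt

# Hypothetical configuration keys set in the pipeline's settings.
source_table = spark.conf.get("etl.source_table")
target_table = spark.conf.get("etl.target_table")

@dlt.table(name=target_table)
def parameterized_table():
    return spark.read.table(source_table)
```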

1 More Replies
george_ognyanov
by New Contributor III
  • 363 Views
  • 1 reply
  • 1 kudos

Orchestrate jobs using a parameter set in a notebook

I am trying to orchestrate my Databricks Workflows tasks using a parameter I would set in a notebook. Given the workflow below, I am trying to set a parameter in the Cinderella task, which is a Python notebook. Once set, I would like to use this paramete...

Latest Reply
Panda
New Contributor II
  • 1 kudos

Here's how we can proceed; follow the instructions below. In your previous task, depending on whether you're using Python or Scala, set the task value like this: dbutils.jobs.taskValues.set("check_value", "2"). In your if-else task, you must reference th...
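A compact sketch of both sides of that exchange; the task name follows the Cinderella example above, while the default and debug values are illustrative:

```python
# In the upstream notebook task (e.g. the "Cinderella" task):
dbutils.jobs.taskValues.set(key="check_value", value="2")

# In the downstream task, reference the upstream task by its task name:
check_value = dbutils.jobs.taskValues.get(
    taskKey="Cinderella",  # must match the upstream task's name in the job
    key="check_value",
    default="0",           # used if the key was never set
    debugValue="0",        # used when running the notebook outside a job
)
```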

tyas
by New Contributor II
  • 785 Views
  • 1 reply
  • 1 kudos

Defining Keys

Hello, I have a DataFrame in a Databricks notebook that I've already read and transformed using PySpark (Python). I want to create a table with defined keys (primary and foreign). What is the best method to do this: create a table and directly define key...

Latest Reply
Hubert-Dudek
Esteemed Contributor III
  • 1 kudos

Remember that keys are for informational purposes (they don't validate data integrity). They are used for information in a few places (feature tables, online tables, Power BI modelling). The best approach is to define them in CREATE TABLE syntax, for example: CRE...
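A minimal sketch of that syntax in Unity Catalog; the catalog, schema, table, and column names are made up:

```python
# Illustrative informational PRIMARY KEY / FOREIGN KEY constraints.
spark.sql("""
    CREATE TABLE IF NOT EXISTS main.sales.customers (
        customer_id BIGINT NOT NULL,
        name STRING,
        CONSTRAINT customers_pk PRIMARY KEY (customer_id)
    )
""")

spark.sql("""
    CREATE TABLE IF NOT EXISTS main.sales.orders (
        order_id BIGINT NOT NULL,
        customer_id BIGINT,
        CONSTRAINT orders_pk PRIMARY KEY (order_id),
        CONSTRAINT orders_customers_fk FOREIGN KEY (customer_id)
            REFERENCES main.sales.customers (customer_id)
    )
""")
```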

Jorge3
by New Contributor III
  • 767 Views
  • 3 replies
  • 1 kudos

Dynamic partition overwrite with Streaming Data

Hi, I'm working on a job that propagates updates of data from a Delta table to Parquet files (a requirement of the consumer). The data is partitioned by day (year > month > day), and the daily data is updated every hour. I'm using table read streaming w...

Latest Reply
Hubert-Dudek
Esteemed Contributor III
  • 1 kudos

Why not migrate to Delta and just use MERGE inside forEachBatch?
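A rough sketch of that suggestion; the source table, target path, and join key are assumptions:

```python
from delta.tables import DeltaTable

# Illustrative MERGE-inside-foreachBatch upsert (placeholder names/paths).
def upsert_batch(batch_df, batch_id):
    target = DeltaTable.forPath(spark, "/mnt/silver/target_table")
    (target.alias("t")
        .merge(batch_df.alias("s"), "t.id = s.id")
        .whenMatchedUpdateAll()
        .whenNotMatchedInsertAll()
        .execute())

(spark.readStream.table("source_delta_table")
    .writeStream
    .foreachBatch(upsert_batch)
    .option("checkpointLocation", "/checkpoints/target_table")
    .start())
```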

2 More Replies
kickbuttowski
by New Contributor II
  • 426 Views
  • 1 reply
  • 1 kudos

Resolved! issue in loading the json files in same container with different schemas

Could you tell me whether this scenario will work or not? Scenario: I have a container with two different JSON files with different schemas, which will be arriving in a streaming manner. I am using Auto Loader here to load the files incrementall...

Latest Reply
MichTalebzadeh
Contributor
  • 1 kudos

The short answer is no. A single Spark Auto Loader stream typically cannot handle JSON files in a container with two different schemas by default. Auto Loader relies on schema inference to determine the data structure. It analyses a sample of data from files ass...
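In practice this is often handled by running one Auto Loader stream per schema and restricting which files each stream picks up; a sketch under the assumption that the two file types can be told apart by filename pattern (patterns, paths, and schema locations are made up):

```python
# Illustrative: two separate Auto Loader streams over the same container,
# each limited to one filename pattern with its own schema location.
orders_stream = (
    spark.readStream.format("cloudFiles")
    .option("cloudFiles.format", "json")
    .option("pathGlobFilter", "orders_*.json")
    .option("cloudFiles.schemaLocation", "/schemas/orders")
    .load("abfss://landing@account.dfs.core.windows.net/data/")
)

customers_stream = (
    spark.readStream.format("cloudFiles")
    .option("cloudFiles.format", "json")
    .option("pathGlobFilter", "customers_*.json")
    .option("cloudFiles.schemaLocation", "/schemas/customers")
    .load("abfss://landing@account.dfs.core.windows.net/data/")
)
```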

Sans1
by New Contributor II
  • 348 Views
  • 2 replies
  • 1 kudos

Delta table vs dynamic views

Hi, my current design is to host the gold layer as dynamic views with masking. I will have a couple of use cases that need the views to be queried with filters. Does this provide performance equal to tables (which have data skipping based on transactio...

Latest Reply
Ajay-Pandey
Esteemed Contributor III
  • 1 kudos

Hi @Sans1, have you only used masking, or have you used any row- or column-level access control? If it's only masking, then you should go with a Delta table; if it's row- or column-level access control, then you should prefer dynamic views.
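For reference, a dynamic view of the kind discussed here usually masks columns with is_account_group_member(); a minimal sketch, with the group, table, and column names made up:

```python
# Illustrative dynamic view: masks a column for users outside a given group.
spark.sql("""
    CREATE OR REPLACE VIEW main.gold.customers_masked AS
    SELECT
        customer_id,
        CASE
            WHEN is_account_group_member('pii_readers') THEN email
            ELSE '***MASKED***'
        END AS email
    FROM main.gold.customers
""")
```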

1 More Replies
Sans
by New Contributor III
  • 820 Views
  • 7 replies
  • 3 kudos

Unable to create new compute in community databricks

Hi Team, I am unable to create compute in Databricks Community Edition due to the error below. Please advise. Bootstrap Timeout: Node daemon ping timeout in 780000 ms for instance i-0ab6798b2c762fb25 @ 10.172.246.217. Please check network connectivity between the ...

Latest Reply
Sans
New Contributor III
  • 3 kudos

This issue was resolved for some time but has been recurring again since yesterday. Please advise.

6 More Replies