Data Engineering

Forum Posts

by pokus • New Contributor III

03-21-2023 2:23:27 AM

2880 Views
3 replies
2 kudos

Resolved! use DeltaLog class in databricks cluster

I need to use the DeltaLog class in my code to get the AddFiles dataset. I have to keep the implemented code in a repo and run it on a Databricks cluster. Some docs say to use the org.apache.spark.sql.delta.DeltaLog class, but it seems Databricks gets rid of ...

Latest Reply

dbal
New Contributor III

03-18-2024 1:02:39 PM

2 kudos

Thanks for providing a solution, @pokus. What I don't understand is why Databricks cannot provide DeltaLog at runtime. How can this be the official solution? We need a better solution for this instead of depending on reflection.

by RobsonNLPT • Contributor

03-14-2024 8:40:29 AM

972 Views
3 replies
0 kudos

Resolved! scala-xml : how to move child to another parent node

Hi all. The mandatory rowTag for writing to XML doesn't make any sense, as I have the complete nested dataframe schema. In my case I need to implement an extra step to remove that extra node (default: Row) after XML generation. I need some examples ...

Latest Reply

Kaniz
Community Manager

03-15-2024 2:26:03 AM

0 kudos

Hi @RobsonNLPT, Working with XML in Scala using the scala-xml library can be powerful and flexible. Let’s break down your requirements and provide an example of how to achieve this. Removing the “Row” Node: When converting a DataFrame to XML, th...

by LoiNguyen • New Contributor II

07-27-2021 3:32:32 AM

10961 Views
5 replies
2 kudos

The authentication type 10 is not supported

I use the code below to connect to PostgreSQL. df = spark.read \ .jdbc("jdbc:postgresql://hostname:5432/dbname", "schema.table", properties={"user": "user", "password": "password"})\ .load() df.printSchema() However, I got the ...
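
For reference, a minimal sketch of how the arguments in the snippet above could be assembled. It assumes the usual PySpark behaviour that `spark.read.jdbc(...)` already returns a DataFrame, so the trailing `.load()` is not needed. The helper name `build_jdbc_args` is hypothetical, and the explicit `driver` entry is an illustrative addition that is not in the original post. This error is commonly reported when the PostgreSQL JDBC driver on the cluster predates SCRAM-SHA-256 support, but the thread snippet is truncated, so treat that as an assumption rather than the confirmed cause here.

```python
def build_jdbc_args(host, port, database, table, user, password):
    """Return (url, table, properties) in the shape spark.read.jdbc expects.

    Hypothetical helper for illustration; it does not open a connection.
    """
    url = f"jdbc:postgresql://{host}:{port}/{database}"
    properties = {
        "user": user,
        "password": password,
        # Assumption: naming the driver class explicitly; the post omits this.
        "driver": "org.postgresql.Driver",
    }
    return url, table, properties


url, table, properties = build_jdbc_args(
    "hostname", 5432, "dbname", "schema.table", "user", "password"
)

# On a cluster with an active SparkSession named `spark`:
# df = spark.read.jdbc(url, table, properties=properties)  # returns a DataFrame directly
# df.printSchema()
```

Keeping the URL and properties in one place makes it easier to check which driver class and credentials the cluster is actually using when the connection fails.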

Latest Reply

simboss
New Contributor II

09-04-2023 11:54:50 PM

2 kudos

But how are we going to do this for those who use Windows?

by Hertz • New Contributor

03-18-2024 6:37:11 AM

350 Views
1 reply
0 kudos

Structured Streaming Event in Audit Logs

I am trying to monitor when a table is created or updated using the audit logs. I have found that structured streaming writes/appends are not captured in the audit logs. Am I missing something? Shouldn't this be captured as a Unity Catalog event? Eith...

Audit Logs

structured streaming

Latest Reply

Kaniz
Community Manager

03-18-2024 7:16:53 AM

0 kudos

Hi @Hertz, Monitoring table creation and updates using audit logs is essential for maintaining data governance and security. Let’s explore this further. Databricks, being a cloud-native platform, provides audit logs that allow administrators to t...

by Floody • New Contributor II

03-18-2024 6:31:34 AM

376 Views
1 reply
1 kudos

Delta Live Tables use case

Hi all, we have the following use case and are wondering if DLT is the correct approach. Landing area with daily dumps of Parquet files into our Data Lake container. The daily dump does a full overwrite of the Parquet file each time, keeping the same file name. T...

Delta Live Tables

Latest Reply

Kaniz
Community Manager

03-18-2024 7:10:06 AM

1 kudos

Hi @Floody, Let’s explore how Delta Live Tables (DLT) can be a suitable approach for your use case. Delta Lake Overview: Delta Lake is an open source storage layer that brings ACID transactions to Apache Spark™ and big data workloads. It provides reliab...

by PassionateDBD • New Contributor II

03-15-2024 9:03:12 AM

253 Views
2 replies
1 kudos

Is it possible to create/update non dlt table in init phase of dlt task?

We have a DLT task that is written in Python. Is it possible to create or update a Delta table programmatically from inside a DLT task? The Delta table would not be managed from inside the DLT task because we never want to fully refresh that table. Th...

Latest Reply

PassionateDBD
New Contributor II

03-18-2024 7:05:17 AM

1 kudos

Thanks for your reply, @Kaniz! I'm aware of the possibility to create or not create a table based on some parameter. What I'm trying to figure out is basically how to achieve the following: - DLT pipeline starts and logs some information to a Delta table. - On...

by Pragati_17 • New Contributor II

03-18-2024 1:45:14 AM

497 Views
1 reply
0 kudos

Parameters Passing to dataset in Databricks Lakeview Dashboard

I have a date range filter in a Lakeview dashboard, and I want to count the distinct months in the selected date range and use that count in a division with one of the columns; that column is used in a counter visualization. But passing parameters is not possible...

Latest Reply

Kaniz
Community Manager

03-18-2024 6:58:04 AM

0 kudos

Hi @Pragati_17, Let’s break down the steps to achieve this in Databricks Lakeview Dashboard: Define Your Datasets: Use the Data tab in your Lakeview dashboard to define the underlying datasets. You can define datasets as follows: An existing Unit...

by srinivas_001 • New Contributor III

03-15-2024 8:27:21 AM

263 Views
2 replies
1 kudos

File trigger options -- cloudFiles.allowOverwrites

I have a job configured to run on file arrival. I have provided the file arrival path as s3://test_bucket/test_cat/test_schema/. When a new Parquet file arrived in this path, the job was triggered automatically and processed the file. In case of...

Latest Reply

srinivas_001
New Contributor III

03-18-2024 3:40:55 AM

1 kudos

Hi Kaniz, thank you for the response. I am using Databricks Runtime 11.3, and I also checked the checkpoint and data source locations, which are properly configured. Still, I am unable to trigger the job. NOTE: Incoming files are pushed to the AWS S3 location fr...

by Nisha2 • New Contributor II

02-22-2024 9:25:20 PM

426 Views
2 replies
0 kudos

Databricks spark_jar_task failed when submitted via API

Hello, we are submitting jobs to the Databricks cluster using the /api/2.0/jobs/create API and running a Spark Java application (a JAR that is submitted to this API). We are noticing the Java application is executing as expected. However, we see that the...

API

Databricks

spark

Latest Reply

Kaniz
Community Manager

03-15-2024 4:04:10 AM

0 kudos

Hi @Nisha2, It appears that you’re encountering issues with your Spark Java application running on Databricks. Let’s break down the error message and explore potential solutions: Spark Down Exception: The log indicates that Spark is detected to b...

by Nurota • New Contributor

01-24-2024 10:22:54 AM

831 Views
1 reply
0 kudos

Describe table extended on materialized views - UC, DLT and cluster access modes

We have a daily job with a notebook that loops through all the databases and tables, and optimizes and vacuums them. Since in UC DLT tables are materialized views, the "optimize" or "vacuum" commands do not work on them, and they need to be excluded. ...

cluster access mode

dlt

materialized views

optimize

Unity Catalog

Latest Reply

Kaniz
Community Manager

03-18-2024 1:52:55 AM

0 kudos

Hi @Nurota, Let’s delve into the intricacies of Databricks and explore why scenario 3 throws an error despite the shared access mode cluster and the service principal ownership. Cluster Type and Materialized Views: In Databricks, the type of clus...

by Kaniz • Community Manager

03-15-2024 4:15:06 AM

553 Views
2 replies
0 kudos

Passing Parameters Between Nested 'Run Job' Tasks in Databricks Workflows

Posting this on behalf of zaheer.abbas. I'm dealing with a similar scenario as mentioned here where I have jobs composed of tasks that need to pass parameters to each other, but all my tasks are configured as "Run Job" tasks rather than directly runn...

Latest Reply

zaheerabbas
New Contributor II

03-18-2024 1:37:28 AM

0 kudos

Thanks, @Kaniz, I have tried the above approach by setting values in the notebooks within the `Job Run` type tasks. But when retrieving them - the notebook runs into errors saying the task name is not defined in the workflow. The above approach of se...

by ElaPG • New Contributor III

03-15-2024 8:16:43 AM

456 Views
2 replies
2 kudos

Cluster creation / unrestricted policy option

Hi, as a workspace admin I would like to disable cluster creation with the "no isolation" access mode. I created a custom policy for that, but I still have the option to create a cluster with the "unrestricted" policy. How can I make sure that nobody will creat...
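
As an illustration of the custom policy mentioned above, here is a minimal sketch that builds a policy definition restricting the access mode. It assumes the cluster policy attribute `data_security_mode` with an `allowlist` rule and the values `SINGLE_USER` and `USER_ISOLATION` (with `NONE` standing for "no isolation"). The actual policy from the thread is not shown, so these names are assumptions to verify against the policy reference for your workspace. The helper name `build_policy_definition` is hypothetical.

```python
import json


def build_policy_definition(allowed_modes):
    """Build a policy definition that only permits the given access modes.

    Hypothetical helper: the attribute name and the mode values are
    assumptions, not taken from the original thread.
    """
    return {
        "data_security_mode": {
            "type": "allowlist",
            "values": list(allowed_modes),
        }
    }


# Leaving "NONE" out of the allowlist is what excludes the "no isolation" mode.
definition = build_policy_definition(["SINGLE_USER", "USER_ISOLATION"])
policy_json = json.dumps(definition, indent=2)
print(policy_json)
```

A policy like this only constrains clusters created under that policy. Whether users can still pick the "unrestricted" option is controlled separately, which is the gap the question describes.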

Latest Reply

ElaPG
New Contributor III

03-18-2024 1:25:04 AM

2 kudos

Hi, thank you for a very informative reply. To sum up, in order to enforce these suggestions:
- the first solution must be executed at the account level
- the second solution must be executed at the workspace level (workspace-level admin settings)

by Coders • New Contributor II

03-15-2024 10:23:08 AM

314 Views
1 reply
0 kudos

New delta log folder is not getting created

I have the following code, which reads a stream of data, processes the data in foreachBatch, and writes to the provided path as shown below. public static void writeToDatalake(SparkSession session, Configuration config, Dataset<Row> data, Entity enti...

Latest Reply

Kaniz
Community Manager

03-17-2024 10:57:47 PM

0 kudos

Hi @Coders, It seems you’re encountering an issue while writing data to Delta Lake in Azure Databricks. The error message indicates that the format is incompatible, and it’s related to the absence of a transaction log. Let’s troubleshoot this togethe...

by Gilg • Contributor II

03-17-2024 1:51:34 PM

446 Views
1 reply
0 kudos

DLT Performance

Hi, Context: I have created a Delta Live Tables pipeline in a UC-enabled workspace that is set to Continuous. Within this pipeline, I have a bronze layer which uses Auto Loader and reads files stored in an ADLS Gen2 storage account in JSON format. We received ...

Latest Reply

Kaniz
Community Manager

03-17-2024 10:40:26 PM

0 kudos

Hi @Gilg, It’s great that you’ve set up a Delta Live Table (DLT) pipeline! However, it’s not uncommon to encounter performance degradation as your data grows. Let’s explore some strategies to optimize your DLT pipeline: Partitioning and Clusterin...

by William_Scardua • Valued Contributor

11-28-2023 10:25:45 AM

9328 Views
3 replies
0 kudos

How to estimate dataframe size in bytes?

Hi guys, how do I estimate the size in bytes of my DataFrame (PySpark)? Any ideas? Thank you.
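
One rough way to approach this, sketched under stated assumptions: take a sample of rows, measure their serialized size, and extrapolate to the full row count. The helper `estimate_size_bytes` is hypothetical, and it measures JSON-serialized size, which is only a proxy. It will not match Spark's in-memory size or the compressed on-disk size of Parquet or Delta files.

```python
import json


def estimate_size_bytes(sample_rows, total_rows):
    """Extrapolate a rough serialized size from a sample of rows.

    sample_rows: list of dicts, one per sampled row
    total_rows:  total number of rows in the DataFrame
    """
    if not sample_rows:
        return 0
    # default=str lets dates, decimals, and similar values serialize as text.
    sample_bytes = sum(
        len(json.dumps(row, default=str).encode("utf-8")) for row in sample_rows
    )
    average_row_bytes = sample_bytes / len(sample_rows)
    return int(average_row_bytes * total_rows)


rows = [{"id": 1, "name": "a"}, {"id": 2, "name": "b"}]
estimate = estimate_size_bytes(rows, total_rows=100)

# With PySpark, the inputs could come from an existing DataFrame `df`:
# sample = [r.asDict() for r in df.limit(1000).collect()]
# estimate = estimate_size_bytes(sample, df.count())
```

Note that `limit` does not return a random sample, so skewed data may need `df.sample(...)` instead. Both `collect()` and `count()` trigger Spark jobs, so keep the sample small.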
