Data Engineering

Forum Posts

pokus
by New Contributor III
  • 2880 Views
  • 3 replies
  • 2 kudos

Resolved! use DeltaLog class in databricks cluster

I need to use the DeltaLog class in my code to get the AddFiles dataset. I have to keep the implemented code in a repo and run it on a Databricks cluster. Some docs say to use the org.apache.spark.sql.delta.DeltaLog class, but it seems Databricks gets rid of ...

Latest Reply
dbal
New Contributor III
  • 2 kudos

Thanks for providing a solution, @pokus. What I don't understand is why Databricks cannot provide the DeltaLog at runtime. How can this be the official solution? We need a better solution for this instead of depending on reflection.

2 More Replies
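For readers hitting the same wall: the reflection workaround discussed in this thread can also be reached from PySpark through the py4j JVM gateway. The sketch below assumes a Databricks runtime that repackages the open-source DeltaLog under com.databricks.sql.transaction.tahoe; that class is internal and unsupported, and the table path is illustrative, so verify everything against your own runtime.

```python
# Sketch only: runs on a Databricks cluster where `spark` is the live SparkSession.
# The shaded class below is internal and may change between runtime versions.
table_path = "/mnt/data/my_delta_table"  # hypothetical table location

# Resolve the shaded DeltaLog class through the py4j JVM gateway.
DeltaLog = spark._jvm.com.databricks.sql.transaction.tahoe.DeltaLog
delta_log = DeltaLog.forTable(spark._jsparkSession, table_path)

# The current snapshot exposes the table's AddFile entries as a Dataset.
add_files = delta_log.snapshot().allFiles()
print(add_files.count())
```

Because this goes through internal classes rather than a public API, pin your runtime version and expect to revisit this code on upgrades.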
RobsonNLPT
by Contributor
  • 972 Views
  • 3 replies
  • 0 kudos

Resolved! scala-xml : how to move child to another parent node

Hi all. The mandatory rowTag for writing to XML doesn't make sense in my case, as I have the complete nested DataFrame schema. I need to implement an extra step to remove that extra node (default: Row) after XML generation. I need some examples ...

Latest Reply
Kaniz
Community Manager
  • 0 kudos

Hi @RobsonNLPT, Working with XML in Scala using the scala-xml library can be powerful and flexible. Let’s break down your requirements and provide an example of how to achieve this. Removing the “Row” Node: When converting a DataFrame to XML, th...

2 More Replies
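The transformation being asked for, lifting the children of a wrapper node (the default "Row") up into its parent after XML generation, can be illustrated with Python's stdlib xml.etree. This is a sketch of the idea only, not the scala-xml RuleTransformer approach the thread discusses, and the element names are made up:

```python
import xml.etree.ElementTree as ET

def unwrap(root, wrapper_tag):
    """Replace every <wrapper_tag> child of root with that wrapper's own children."""
    new_children = []
    for child in list(root):
        if child.tag == wrapper_tag:
            new_children.extend(list(child))  # lift grandchildren up one level
        else:
            new_children.append(child)
    # Rebuild the parent's child list without the wrapper nodes.
    for child in list(root):
        root.remove(child)
    root.extend(new_children)
    return root

doc = ET.fromstring("<rows><Row><id>1</id></Row><Row><id>2</id></Row></rows>")
unwrap(doc, "Row")
print(ET.tostring(doc, encoding="unicode"))
# -> <rows><id>1</id><id>2</id></rows>
```

In scala-xml the equivalent move is typically done with a RewriteRule applied by a RuleTransformer; the structural idea (replace the wrapper element with its child sequence) is the same.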
LoiNguyen
by New Contributor II
  • 10961 Views
  • 5 replies
  • 2 kudos

The authentication type 10 is not supported

I use the code below to connect to PostgreSQL:

df = spark.read \
    .jdbc("jdbc:postgresql://hostname:5432/dbname", "schema.table",
          properties={"user": "user", "password": "password"})
df.printSchema()

However, I got the ...

Latest Reply
simboss
New Contributor II
  • 2 kudos

But how are we going to do this for those who use Windows?

4 More Replies
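For context on the error itself: "authentication type 10" is PostgreSQL's SCRAM-SHA-256 authentication, which very old pgjdbc drivers cannot negotiate; attaching a recent PostgreSQL JDBC driver (42.x) to the cluster is the usual fix, regardless of the client OS. A corrected read might then look like the sketch below (hostname, table, and credentials are placeholders; note that spark.read.jdbc already returns a DataFrame, so no trailing .load() call is needed):

```python
# Sketch: assumes a cluster with a recent PostgreSQL JDBC driver (42.x) attached.
df = spark.read.jdbc(
    url="jdbc:postgresql://hostname:5432/dbname",
    table="schema.table",
    properties={
        "user": "user",
        "password": "password",
        "driver": "org.postgresql.Driver",
    },
)
df.printSchema()
```

If the error persists after upgrading the driver, check which driver jar actually wins on the cluster classpath, since an older bundled version can shadow the new one.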
Hertz
by New Contributor
  • 350 Views
  • 1 reply
  • 0 kudos

Structured Streaming Event in Audit Logs

I am trying to monitor when a table is created or updated using the audit logs. I have found that Structured Streaming writes/appends are not captured in the audit logs. Am I missing something? Shouldn't this be captured as a Unity Catalog event? Eith...

Data Engineering
Audit Logs
structured streaming
Latest Reply
Kaniz
Community Manager
  • 0 kudos

Hi @Hertz, Monitoring table creation and updates using audit logs is essential for maintaining data governance and security. Let’s explore this further. Databricks, being a cloud-native platform, provides audit logs that allow administrators to t...

Floody
by New Contributor II
  • 376 Views
  • 1 reply
  • 1 kudos

Delta Live Tables use case

Hi all, we have the following use case and are wondering if DLT is the correct approach. Landing area with daily dumps of Parquet files into our Data Lake container. The daily dump does a full overwrite of the Parquet each time, keeping the same file name. T...

Data Engineering
Delta Live Tables
Latest Reply
Kaniz
Community Manager
  • 1 kudos

Hi @Floody, Let’s explore how Delta Live Tables (DLT) can be a suitable approach for your use case. Delta Lake Overview: Delta Lake is an open-source storage layer that brings ACID transactions to Apache Spark™ and big data workloads. It provides reliab...

PassionateDBD
by New Contributor II
  • 253 Views
  • 2 replies
  • 1 kudos

Is it possible to create/update non dlt table in init phase of dlt task?

We have a DLT task written in Python. Is it possible to create or update a Delta table programmatically from inside a DLT task? The Delta table would not be managed from inside the DLT task, because we never want to fully refresh that table. Th...

Latest Reply
PassionateDBD
New Contributor II
  • 1 kudos

Thanks for your reply, @Kaniz! I'm aware of the possibility to create or not create a table based on some parameter. What I'm trying to figure out is basically how to achieve the following: the DLT pipeline starts and logs some information to a Delta table; on...

1 More Replies
Pragati_17
by New Contributor II
  • 497 Views
  • 1 reply
  • 0 kudos

Parameters Passing to dataset in Databricks Lakeview Dashboard

I have a date range filter in a Lakeview Dashboard, and I want to get a distinct count of the number of months in the selected date range and divide it by one of the columns, which is used in a counter visualization. But passing parameters is not possible...

Latest Reply
Kaniz
Community Manager
  • 0 kudos

Hi @Pragati_17, Let’s break down the steps to achieve this in Databricks Lakeview Dashboard: Define Your Datasets: Use the Data tab in your Lakeview dashboard to define the underlying datasets. You can define datasets as follows: An existing Unit...

srinivas_001
by New Contributor III
  • 263 Views
  • 2 replies
  • 1 kudos

File trigger options -- cloudFiles.allowOverwrites

I have a job configured to run on file arrival. I provided the path as: s3://test_bucket/test_cat/test_schema/. When a new Parquet file arrived in this path, the job triggered automatically and processed the file. In case of...

Latest Reply
srinivas_001
New Contributor III
  • 1 kudos

Hi Kaniz, thank you for the response. I am using Databricks Runtime 11.3, and I also checked the checkpoint and data source location, which are properly configured. Still, I am unable to trigger the job. NOTE: Incoming files are pushed to the AWS S3 location fr...

1 More Replies
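One detail worth noting for this thread: Auto Loader tracks which filenames it has already ingested, so a file overwritten in place with the same name is skipped by default; the cloudFiles.allowOverwrites option opts in to reprocessing such files. A minimal sketch (the path is the placeholder from the question; exact behaviour depends on runtime version, so treat this as an assumption to verify):

```python
# Sketch: Auto Loader stream that also re-ingests files overwritten in place.
df = (
    spark.readStream.format("cloudFiles")
    .option("cloudFiles.format", "parquet")
    .option("cloudFiles.allowOverwrites", "true")  # reprocess same-name files
    .load("s3://test_bucket/test_cat/test_schema/")
)
```

With this option enabled, an overwritten file may be processed more than once, so downstream logic should be idempotent or deduplicate.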
Nisha2
by New Contributor II
  • 426 Views
  • 2 replies
  • 0 kudos

Databricks spark_jar_task failed when submitted via API

Hello, we are submitting jobs to the Databricks cluster using the /api/2.0/jobs/create API and running a Spark Java application (a jar submitted to this API). We notice the Java application executes as expected; however, we see that the...

Data Engineering
API
Databricks
spark
Latest Reply
Kaniz
Community Manager
  • 0 kudos

Hi @Nisha2 , It appears that you’re encountering issues with your Spark Java application running on Databricks. Let’s break down the error message and explore potential solutions: Spark Down Exception: The log indicates that Spark is detected to b...

1 More Replies
Nurota
by New Contributor
  • 831 Views
  • 1 reply
  • 0 kudos

Describe table extended on materialized views - UC, DLT and cluster access modes

We have a daily job with a notebook that loops through all the databases and tables, and optimizes and vacuums them. Since UC DLT tables are materialized views, the "optimize" and "vacuum" commands do not work on them, and they need to be excluded. ...

Data Engineering
cluster access mode
dlt
materialized views
optimize
Unity Catalog
Latest Reply
Kaniz
Community Manager
  • 0 kudos

Hi @Nurota, Let’s delve into the intricacies of Databricks and explore why scenario 3 throws an error despite the shared access mode cluster and the service principal ownership. Cluster Type and Materialized Views: In Databricks, the type of clus...

Kaniz
by Community Manager
  • 553 Views
  • 2 replies
  • 0 kudos

Passing Parameters Between Nested 'Run Job' Tasks in Databricks Workflows

Posting this on behalf of zaheer.abbas. I'm dealing with a similar scenario as mentioned here where I have jobs composed of tasks that need to pass parameters to each other, but all my tasks are configured as "Run Job" tasks rather than directly runn...

Latest Reply
zaheerabbas
New Contributor II
  • 0 kudos

Thanks, @Kaniz, I have tried the above approach by setting values in the notebooks within the `Job Run` type tasks. But when retrieving them - the notebook runs into errors saying the task name is not defined in the workflow. The above approach of se...

1 More Replies
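For readers unfamiliar with the mechanism under discussion, this is roughly what the task-values API looks like between two tasks of a single job run. Task and key names here are hypothetical, and note that task values are scoped to one job run: values set inside a child job invoked via a "Run Job" task are not automatically visible to the parent job's tasks, which may be why the retrieval above fails with "task name is not defined".

```python
# Sketch only: dbutils is available on Databricks, not in plain Python.

# In the upstream task of a job:
dbutils.jobs.taskValues.set(key="record_count", value=42)

# In a downstream task of the SAME job run:
count = dbutils.jobs.taskValues.get(
    taskKey="upstream_task",  # hypothetical upstream task name
    key="record_count",
    default=0,
    debugValue=0,             # used when running interactively outside a job
)
```

When values must cross a "Run Job" boundary, passing them explicitly as job parameters of the child job is a more reliable pattern than task values.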
ElaPG
by New Contributor III
  • 456 Views
  • 2 replies
  • 2 kudos

Cluster creation / unrestricted policy option

Hi, as a workspace admin I would like to disable cluster creation with "No isolation" access mode. I created a custom policy for that, but I still have the option to create a cluster with the "Unrestricted" policy. How can I make sure that nobody will creat...

Latest Reply
ElaPG
New Contributor III
  • 2 kudos

Hi, thank you for a very informative reply. To sum up, in order to enforce these suggestions:
- the first solution must be executed at the account level
- the second solution must be executed at the workspace level (workspace-level admin settings)

1 More Replies
Coders
by New Contributor II
  • 314 Views
  • 1 reply
  • 0 kudos

New delta log folder is not getting created

I have the following code, which reads a stream of data, processes it in foreachBatch, and writes to the provided path as shown below: public static void writeToDatalake(SparkSession session, Configuration config, Dataset<Row> data, Entity enti...

Latest Reply
Kaniz
Community Manager
  • 0 kudos

Hi @Coders, It seems you’re encountering an issue while writing data to Delta Lake in Azure Databricks. The error message indicates that the format is incompatible, and it’s related to the absence of a transaction log. Let’s troubleshoot this togethe...

Gilg
by Contributor II
  • 446 Views
  • 1 reply
  • 0 kudos

DLT Performance

Hi. Context: I have created a Delta Live Tables pipeline in a UC-enabled workspace that is set to Continuous. Within this pipeline, I have a bronze table that uses Auto Loader and reads JSON files stored in an ADLS Gen2 storage account. We received ...

Latest Reply
Kaniz
Community Manager
  • 0 kudos

Hi @Gilg, It’s great that you’ve set up a Delta Live Table (DLT) pipeline! However, it’s not uncommon to encounter performance degradation as your data grows. Let’s explore some strategies to optimize your DLT pipeline: Partitioning and Clusterin...

William_Scardua
by Valued Contributor
  • 9328 Views
  • 3 replies
  • 0 kudos

How to estimate dataframe size in bytes ?

Hi guys, how do I estimate the size in bytes of my DataFrame (PySpark)? Any ideas? Thank you.

2 More Replies
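Two common approaches to this question: ask Spark's optimizer for its own size estimate via df._jdf.queryExecution().optimizedPlan().stats().sizeInBytes() (an internal API, so treat it as a sketch), or collect a sample of rows and extrapolate. The helper below is plain Python; the PySpark usage shown in the comment is an assumption about how you would feed it:

```python
import pickle

def estimate_df_bytes(sample_rows, total_rows):
    """Rough size estimate: average pickled size of a sampled row, scaled to the full row count."""
    if not sample_rows or total_rows <= 0:
        return 0
    avg_row = sum(len(pickle.dumps(row)) for row in sample_rows) / len(sample_rows)
    return int(avg_row * total_rows)

# Hypothetical PySpark usage:
#   sample = df.limit(1000).collect()
#   approx_bytes = estimate_df_bytes(sample, df.count())
print(estimate_df_bytes([(1, "a"), (2, "b")], 1000))
```

Pickled-row size is a proxy, not the on-disk or in-memory footprint (columnar compression can shrink it dramatically), so use this for rough capacity planning only.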