by pokus • New Contributor III
- 2880 Views
- 3 replies
- 2 kudos
I need to use the DeltaLog class in my code to get the AddFiles dataset. I have to keep the implemented code in a repo and run it on a Databricks cluster. Some docs say to use the org.apache.spark.sql.delta.DeltaLog class, but it seems Databricks gets rid of ...
Latest Reply
Thanks for providing a solution, @pokus. What I don't understand is why Databricks cannot provide the DeltaLog at runtime. How can this be the official solution? We need a better option than depending on reflection.
2 More Replies
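For illustration, a minimal PySpark sketch that reads AddFile actions straight out of the _delta_log JSON files, sidestepping the shaded DeltaLog class; the table path is a placeholder, and this is a raw view of the log rather than the reconstructed current snapshot.

from pyspark.sql import functions as F

table_path = "/mnt/delta/my_table"  # placeholder
# Each commit JSON holds one action per line; rows with a non-null "add"
# struct are AddFile entries.
log = spark.read.json(f"{table_path}/_delta_log/*.json")
add_files = log.where(F.col("add").isNotNull()).select("add.*")
add_files.select("path", "size", "modificationTime").show(truncate=False)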
- 972 Views
- 3 replies
- 0 kudos
Hi all, the mandatory rowTag for writing to XML doesn't make any sense, as I have the complete nested dataframe schema. In my case I need to implement an extra step to remove that extra node (default: Row) after XML generation. I need some examples ...
Latest Reply
Hi @RobsonNLPT, Working with XML in Scala using the scala-xml library can be powerful and flexible.
Let’s break down your requirements and provide an example of how to achieve this.
Removing the “Row” Node: When converting a DataFrame to XML, th...
2 More Replies
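As a starting point, a hedged PySpark sketch using the XML writer's rootTag/rowTag options so no extra default "Row" node appears; the element names and output path are assumptions, and the spark-xml library (or a runtime with native XML support) must be available on the cluster.

# Write XML with explicit element names instead of the default "Row" wrapper.
(df.write
    .format("xml")
    .option("rootTag", "Orders")  # outermost element (assumed name)
    .option("rowTag", "Order")    # per-row element, replaces the default "Row"
    .mode("overwrite")
    .save("/mnt/out/orders_xml"))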
- 10961 Views
- 5 replies
- 2 kudos
I use the code below to connect to PostgreSQL.
# spark.read.jdbc() already returns a DataFrame, so the original trailing
# .load() call was invalid and has been dropped.
df = spark.read.jdbc(
    "jdbc:postgresql://hostname:5432/dbname",
    "schema.table",
    properties={"user": "user", "password": "password"},
)
df.printSchema()
However, I got the ...
Latest Reply
But how are we going to do this for those who use Windows?
4 More Replies
by Hertz • New Contributor
- 350 Views
- 1 reply
- 0 kudos
I am trying to monitor when a table is created or updated using the audit logs. I have found that structured streaming writes/appends are not captured in the audit logs. Am I missing something? Shouldn't this be captured as a Unity Catalog event? Eith...
Latest Reply
Hi @Hertz, Monitoring table creation and updates using audit logs is essential for maintaining data governance and security.
Let’s explore this further.
Databricks, being a cloud-native platform, provides audit logs that allow administrators to t...
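To make this concrete, a hedged sketch that inspects Unity Catalog events in the audit system table, assuming system tables are enabled in the workspace; the action-name filter is an example rather than a complete list, and purely data-plane streaming appends may not surface here at all.

# Query recent UC audit events related to table DDL.
events = spark.sql("""
    SELECT event_time, action_name, request_params
    FROM system.access.audit
    WHERE service_name = 'unityCatalog'
      AND action_name IN ('createTable')  -- extend with other actions as needed
    ORDER BY event_time DESC
""")
events.show(truncate=False)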
by Floody • New Contributor II
- 376 Views
- 1 reply
- 1 kudos
Hi all, we have the following use case and are wondering if DLT is the correct approach. Landing area with daily dumps of parquet files into our Data Lake container. The daily dump does a full overwrite of the parquet each time, keeping the same file name. T...
Latest Reply
Hi @Floody, Let's explore how Delta Live Tables (DLT) can be a suitable approach for your use case.
Delta Lake Overview:
Delta Lake is an open source storage layer that brings ACID transactions to Apache Spark™ and big data workloads. It provides reliab...
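A minimal DLT sketch for this pattern, assuming Auto Loader with cloudFiles.allowOverwrites so a file rewritten in place under the same name is ingested again; the landing path is a placeholder.

import dlt
from pyspark.sql import functions as F

@dlt.table(name="bronze_daily_dump")
def bronze_daily_dump():
    # allowOverwrites makes Auto Loader pick up a file that is overwritten
    # in place, which matches the daily full-overwrite dump described above.
    return (
        spark.readStream.format("cloudFiles")
        .option("cloudFiles.format", "parquet")
        .option("cloudFiles.allowOverwrites", "true")
        .load("/mnt/landing/daily_dump/")  # placeholder path
        .withColumn("ingest_time", F.current_timestamp())
    )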
- 253 Views
- 2 replies
- 1 kudos
We have a DLT task that is written in Python. Is it possible to create or update a Delta table programmatically from inside a DLT task? The Delta table would not be managed from inside the DLT task, because we never want to fully refresh that table. Th...
Latest Reply
Thanks for your reply, @Kaniz! I'm aware of the possibility to create or not create a table based on some parameter. What I'm trying to figure out is basically how to achieve the following:
- The DLT pipeline starts and logs some information to a Delta table.
- On...
1 More Replies
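One hedged way to do the logging half of this: a plain Delta append issued from pipeline code, so the table stays outside DLT's management and is never full-refreshed. The table name is an assumption.

from datetime import datetime, timezone
from pyspark.sql import Row

# Append a run-log row to an external Delta table; since this is an ordinary
# write rather than a @dlt.table, DLT will not refresh or truncate it.
row = spark.createDataFrame([Row(run_ts=datetime.now(timezone.utc), status="started")])
row.write.format("delta").mode("append").saveAsTable("ops.dlt_run_log")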
- 497 Views
- 1 reply
- 0 kudos
I have a date range filter in a Lakeview Dashboard, and I want a distinct count of the number of months in the selected date range and to divide it by one of the columns; that column is used in a counter visualization. But passing parameters is not possible...
Latest Reply
Hi @Pragati_17, Let’s break down the steps to achieve this in Databricks Lakeview Dashboard:
Define Your Datasets:
Use the Data tab in your Lakeview dashboard to define the underlying datasets. You can define datasets as follows:
An existing Unit...
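For the month-count piece, a hedged sketch of the dataset query, shown here through spark.sql with named parameters (PySpark 3.4+); in a Lakeview dataset the same SQL would reference the dashboard's date-range parameters. Table and column names are assumptions.

# Distinct months in a parameterized date range.
months = spark.sql("""
    SELECT COUNT(DISTINCT date_trunc('MONTH', event_date)) AS month_cnt
    FROM sales.events
    WHERE event_date BETWEEN :start AND :end
""", args={"start": "2024-01-01", "end": "2024-06-30"})
months.show()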
- 263 Views
- 2 replies
- 1 kudos
I have a job configured to run on file arrival. I have provided the file arrival path as: s3://test_bucket/test_cat/test_schema/ When a new parquet file arrived in this path, the job triggered automatically and processed the file. In case of...
Latest Reply
Hi Kaniz, thank you for the response. I am using Databricks Runtime 11.3 and have checked the checkpoint and data source location, which are properly configured. Still, I am unable to trigger the job. NOTE: Incoming files are pushed to the AWS S3 location fr...
1 More Replies
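For reference, a hedged sketch of attaching (or re-applying) a file-arrival trigger through the Jobs 2.1 API; host, token, and job_id are placeholders, and the exact payload should be checked against the Jobs API docs for your workspace.

import requests

payload = {
    "job_id": 123,  # placeholder
    "new_settings": {
        "trigger": {
            "file_arrival": {"url": "s3://test_bucket/test_cat/test_schema/"}
        }
    },
}
resp = requests.post(
    "https://<workspace-host>/api/2.1/jobs/update",
    headers={"Authorization": "Bearer <token>"},
    json=payload,
)
resp.raise_for_status()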
by Nisha2 • New Contributor II
- 426 Views
- 2 replies
- 0 kudos
Hello, we are submitting jobs to the Databricks cluster using the /api/2.0/jobs/create API and running a Spark Java application (a JAR that is submitted to this API). We notice the Java application executes as expected; however, we see that the...
Latest Reply
Hi @Nisha2 , It appears that you’re encountering issues with your Spark Java application running on Databricks.
Let’s break down the error message and explore potential solutions:
Spark Down Exception:
The log indicates that Spark is detected to b...
1 More Replies
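For context, a hedged minimal /api/2.0/jobs/create payload for a JAR task; the cluster size, paths, and main class are placeholders. One common cause of "Spark down" errors in JAR jobs is the application creating or stopping its own context: on Databricks, JAR jobs should obtain the shared session via SparkSession.builder().getOrCreate() and must not call stop().

# Minimal spark_jar_task job definition (all values are placeholders).
job_spec = {
    "name": "java-app",
    "new_cluster": {
        "spark_version": "11.3.x-scala2.12",
        "node_type_id": "i3.xlarge",
        "num_workers": 2,
    },
    "libraries": [{"jar": "dbfs:/jars/app.jar"}],
    "spark_jar_task": {"main_class_name": "com.example.Main"},
}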
- 831 Views
- 1 reply
- 0 kudos
We have a daily job with a notebook that loops through all the databases and tables, and optimizes and vacuums them. Since in UC, DLT tables are materialized views, the "optimize" and "vacuum" commands do not work on them, and they need to be excluded. ...
Latest Reply
Hi @Nurota, Let’s delve into the intricacies of Databricks and explore why scenario 3 throws an error despite the shared access mode cluster and the service principal ownership.
Cluster Type and Materialized Views:
In Databricks, the type of clus...
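A hedged sketch of the exclusion itself: filter on table_type in information_schema so materialized views never reach OPTIMIZE or VACUUM. The catalog name is a placeholder.

# Enumerate tables and skip anything that is not a plain managed/external table.
tables = spark.sql("""
    SELECT table_catalog, table_schema, table_name
    FROM my_catalog.information_schema.tables
    WHERE table_type IN ('MANAGED', 'EXTERNAL')
""").collect()

for t in tables:
    fq = f"`{t.table_catalog}`.`{t.table_schema}`.`{t.table_name}`"
    spark.sql(f"OPTIMIZE {fq}")
    spark.sql(f"VACUUM {fq}")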
by Kaniz • Community Manager
- 553 Views
- 2 replies
- 0 kudos
Posting this on behalf of zaheer.abbas.
I'm dealing with a scenario similar to the one mentioned here, where I have jobs composed of tasks that need to pass parameters to each other, but all my tasks are configured as "Run Job" tasks rather than directly runn...
Latest Reply
Thanks, @Kaniz. I have tried the above approach by setting values in the notebooks within the "Run Job" type tasks. But when retrieving them, the notebook runs into errors saying the task name is not defined in the workflow. The above approach of se...
1 More Replies
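For reference, the basic task-values pattern in notebook code; task names here are placeholders. The taskKey must name a task in the same job run, and a nested "Run Job" task executes as its own run, which is the likely reason the lookup reports an undefined task name.

# In the upstream notebook task:
dbutils.jobs.taskValues.set(key="run_id", value="abc123")

# In a downstream notebook task of the same job run:
run_id = dbutils.jobs.taskValues.get(
    taskKey="upstream_task", key="run_id", default="", debugValue=""
)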
by ElaPG • New Contributor III
- 456 Views
- 2 replies
- 2 kudos
Hi, as a workspace admin I would like to disable cluster creation with the "No Isolation" access mode. I created a custom policy for that, but I still have the option to create a cluster with the "Unrestricted" policy. How can I make sure that nobody will creat...
Latest Reply
ElaPG • New Contributor III
Hi, thank you for a very informative reply. To sum up, in order to enforce these suggestions:
- the first solution must be executed at the account level
- the second solution must be executed at the workspace level (workspace-level admin settings)
1 More Replies
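As a sketch of the policy side, a fragment that pins data_security_mode so "No Isolation" (NONE) cannot be selected under that policy; note a policy only binds users who are restricted to it, which is why the account- and workspace-level settings above are still needed.

# Cluster-policy fragment: allow only isolation-preserving access modes.
policy_definition = {
    "data_security_mode": {
        "type": "allowlist",
        "values": ["SINGLE_USER", "USER_ISOLATION"],
    }
}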
by Coders • New Contributor II
- 314 Views
- 1 reply
- 0 kudos
I have the following code, which reads a stream of data, processes it in foreachBatch, and writes to the provided path as shown below. public static void writeToDatalake(SparkSession session, Configuration config, Dataset<Row> data, Entity enti...
Latest Reply
Hi @Coders, It seems you’re encountering an issue while writing data to Delta Lake in Azure Databricks. The error message indicates that the format is incompatible, and it’s related to the absence of a transaction log. Let’s troubleshoot this togethe...
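A hedged PySpark analogue of that foreachBatch writer: the "incompatible format" error usually means the target directory already contains non-Delta files (no _delta_log), so the path must be empty or already a Delta table. The paths and the stream_df source are placeholders.

def write_batch(batch_df, batch_id):
    # Append each micro-batch as Delta; this fails if the directory already
    # holds plain parquet written without a _delta_log.
    (batch_df.write
        .format("delta")
        .mode("append")
        .save("abfss://data@account.dfs.core.windows.net/tables/entity"))

(stream_df.writeStream
    .foreachBatch(write_batch)
    .option("checkpointLocation", "abfss://data@account.dfs.core.windows.net/_chk/entity")
    .start())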
- 446 Views
- 1 reply
- 0 kudos
Hi, context: I have created a Delta Live Tables pipeline in a UC-enabled workspace that is set to Continuous. Within this pipeline, I have bronze, which uses Auto Loader and reads files stored in an ADLS Gen2 storage account in JSON format. We received ...
Latest Reply
Hi @Gilg, It’s great that you’ve set up a Delta Live Table (DLT) pipeline! However, it’s not uncommon to encounter performance degradation as your data grows.
Let’s explore some strategies to optimize your DLT pipeline:
Partitioning and Clusterin...
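To make the partitioning suggestion concrete, a hedged DLT sketch that partitions bronze by an ingest date; the landing path and partition column are assumptions.

import dlt
from pyspark.sql import functions as F

@dlt.table(
    name="bronze_events",
    partition_cols=["ingest_date"],  # bounds file listing to recent partitions
)
def bronze_events():
    return (
        spark.readStream.format("cloudFiles")
        .option("cloudFiles.format", "json")
        .load("abfss://landing@account.dfs.core.windows.net/events/")  # placeholder
        .withColumn("ingest_date", F.current_date())
    )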
- 9328 Views
- 3 replies
- 0 kudos
Hi guys, how do I estimate the size in bytes of my dataframe (PySpark)? Any ideas? Thank you
2 More Replies
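One rough way to estimate this, as a hedged sketch: sample rows into pandas, measure their in-memory footprint, and extrapolate by the row count. This is only an estimate; on-disk size will differ with encoding and compression.

# Rough in-memory size estimate by sampling and extrapolating.
n = df.count()
sample = df.limit(1000).toPandas()
per_row = sample.memory_usage(deep=True).sum() / max(len(sample), 1)
est_bytes = int(per_row * n)
print(f"~{est_bytes:,} bytes across {n:,} rows")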