I have a use case to create a table from JSON files. There are 36 million files in the upstream S3 bucket. I just created a volume on top of it, so the volume has 36M files. I'm trying to form a data frame by reading this volume using the below sp...
Hi @Sampath_Kumar, Let’s delve into the limitations and best practices related to Databricks volumes.
Volume Limitations:
Managed Volumes: These are Unity Catalog-governed storage volumes created within the default storage location of the contain...
Hi, I'm trying to create some 3D charts. With the same code and the same cluster, the charts sometimes render and sometimes don't. Previously they would not display, but last week I opened a notebook with a failed run and found the result rendered by itself (as ...
I'm just getting started with Databricks and wondering if it is possible to ingest a GeoJSON or GeoParquet file into a new table without writing code? My goal here is to load vector data into a table and perform H3 polyfill operations on all the vect...
I'm looking at this page (Databricks Asset Bundles development work tasks) in the Databricks documentation. When repo assets are deployed to a Databricks workspace, it is not clear whether "databricks bundle deploy" will remove files from the target wo...
One further question: the purpose of "databricks bundle destroy" is to remove all previously deployed jobs, pipelines, and artifacts that are defined in the bundle configuration files. Which bundle configuration files? The ones in the repo? Or are ther...
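For context, "databricks bundle destroy" reads the same bundle configuration that "databricks bundle deploy" does: the databricks.yml at the bundle root of your repo, plus any files it includes. A minimal hypothetical databricks.yml might look like this (the bundle name and workspace host are placeholders):

```yaml
# Hypothetical minimal bundle configuration (databricks.yml at the repo root).
# Both `databricks bundle deploy` and `databricks bundle destroy` act on the
# resources declared here, for the selected target.
bundle:
  name: my_bundle

targets:
  dev:
    mode: development
    workspace:
      host: https://example.cloud.databricks.com
```

Destroy removes the deployed resources and artifacts for the chosen target; it does not modify the files in your repo.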
I need to use the DeltaLog class in my code to get the AddFiles dataset. I have to keep the implemented code in a repo and run it on a Databricks cluster. Some docs say to use the org.apache.spark.sql.delta.DeltaLog class, but it seems Databricks gets rid of ...
Thanks for providing a solution, @pokus. What I don't understand is why Databricks cannot provide DeltaLog at runtime. How can this be the official solution? We need a better approach than relying on reflection.
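One workaround that avoids private Spark classes entirely is to read the Delta transaction log directly: each `_delta_log/*.json` commit file is newline-delimited JSON whose "add" entries list the table's data files. A minimal pure-Python sketch (it replays commits in order and honors "remove" actions, but ignores Parquet checkpoints, so it is illustrative rather than production-ready):

```python
import json
import os
from glob import glob

def active_add_files(table_path: str) -> set[str]:
    """Replay _delta_log JSON commits and return paths of live data files.

    Illustrative only: real Delta logs also use Parquet checkpoint files,
    which this sketch does not read.
    """
    log_dir = os.path.join(table_path, "_delta_log")
    active: set[str] = set()
    # Commit files are zero-padded version numbers, so lexical sort is
    # also version order.
    for commit in sorted(glob(os.path.join(log_dir, "*.json"))):
        with open(commit) as f:
            for line in f:
                action = json.loads(line)
                if "add" in action:
                    active.add(action["add"]["path"])
                elif "remove" in action:
                    active.discard(action["remove"]["path"])
    return active
```

This trades the reflection hack for a dependency on the (documented) Delta log format, which may be preferable when only file paths are needed.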
Hey folks, I have a .dbc file in a Git repo, and I cloned it into Databricks. When I try to open the .dbc file it says ```Failed to load file. The file encoding is not supported```. Can anyone please advise me on this? #help #beginner
Hi all, the mandatory rowTag for writing to XML doesn't make sense in my case, as I have the complete nested dataframe schema. I need to implement an extra step to remove that extra node (default: Row) after XML generation. I need some examples ...
Hi @RobsonNLPT, Working with XML in Scala using the scala-xml library can be powerful and flexible.
Let’s break down your requirements and provide an example of how to achieve this.
Removing the “Row” Node: When converting a DataFrame to XML, th...
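The thread discusses scala-xml, but the post-processing idea (unwrapping the default Row element after generation) is language-agnostic. A minimal sketch in Python with xml.etree.ElementTree, assuming a hypothetical document where each record is wrapped in a Row node whose children should be hoisted up:

```python
import xml.etree.ElementTree as ET

def unwrap(parent: ET.Element, tag: str = "Row") -> ET.Element:
    """Replace every <Row> element with its children, in place.

    Hypothetical post-processing step: Spark's XML writer wraps each
    record in a rowTag element (default "Row"); this hoists the wrapped
    content up into the enclosing element.
    """
    # Recurse first so nested wrappers are handled bottom-up.
    for child in list(parent):
        unwrap(child, tag)
    for child in list(parent):
        if child.tag == tag:
            idx = list(parent).index(child)
            parent.remove(child)
            for offset, grandchild in enumerate(list(child)):
                parent.insert(idx + offset, grandchild)
    return parent
```

For example, `<items><Row><a>1</a></Row><Row><a>2</a></Row></items>` becomes `<items><a>1</a><a>2</a></items>`. The same traversal can be written almost verbatim with scala-xml's transform/RewriteRule machinery.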
I use the below code to connect to PostgreSQL.
# spark.read.jdbc() already returns a DataFrame, so no trailing .load() call is needed
df = spark.read.jdbc(
    "jdbc:postgresql://hostname:5432/dbname",
    "schema.table",
    properties={"user": "user", "password": "password"},
)
df.printSchema()
However, I got the ...
We are facing an issue integrating our Spring Boot JPA application with Databricks. Below are the steps and settings we used for the integration. When we start the Spring Boot application, we get a warning: HikariPool-1 - Driver doe...
Hi @satishnavik, It seems you’re encountering issues while integrating your Spring Boot JPA application with Databricks.
Let’s address the warnings and exceptions you’re facing.
Warning: Driver Does Not Support Network Timeout for Connections
The...
I am trying to monitor when a table is created or updated using the audit logs. I have found that structured streaming writes/appends are not captured in the audit logs. Am I missing something? Shouldn't this be captured as a Unity Catalog event? Eith...
Hi @Hertz, Monitoring table creation and updates using audit logs is essential for maintaining data governance and security.
Let’s explore this further.
Databricks, being a cloud-native platform, provides audit logs that allow administrators to t...
Hi all, we have the following use case and are wondering if DLT is the correct approach. A landing area receives daily dumps of parquet files into our Data Lake container. The daily dump does a full overwrite of the parquet each time, keeping the same file name. T...
Hi @Floody, let's explore how Delta Live Tables (DLT) can be a suitable approach for your use case.
Delta Lake Overview:
Delta Lake is an open source storage layer that brings ACID transactions to Apache Spark™ and big data workloads. It provides reliab...
We have a DLT task that is written in Python. Is it possible to create or update a delta table programmatically from inside a DLT task? The delta table would not be managed from inside the DLT task, because we never want to fully refresh that table. Th...
Thanks for your reply, @Kaniz! I'm aware of the possibility to create or not create a table based on some parameter. What I'm trying to figure out is basically how to achieve the following: the DLT pipeline starts and logs some information to a delta table; on...
I have a date range filter in a Lakeview Dashboard, and I want to count the distinct number of months in the selected date range and divide one of the columns by that count; the column is used in a counter visualization. But passing parameters is not possible...
Hi @Pragati_17, Let’s break down the steps to achieve this in Databricks Lakeview Dashboard:
Define Your Datasets:
Use the Data tab in your Lakeview dashboard to define the underlying datasets. You can define datasets as follows:
An existing Unit...
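When dashboard parameters are unavailable, the distinct-month count for a date range can instead be computed inside the dataset query itself. The underlying arithmetic is simple enough to sketch in plain Python (hypothetical helper; both endpoints inclusive):

```python
from datetime import date

def months_in_range(start: date, end: date) -> int:
    """Number of distinct calendar months touched by [start, end]."""
    if end < start:
        return 0
    # Year difference in months, plus the month offset, plus one to make
    # the range inclusive of the starting month.
    return (end.year - start.year) * 12 + (end.month - start.month) + 1
```

For example, `months_in_range(date(2024, 1, 15), date(2024, 3, 2))` covers January, February, and March, giving 3. The equivalent SQL expression can then be embedded in the Lakeview dataset so the counter divides by it directly.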
I have a job configured to run on file arrival, with the file arrival path set to s3://test_bucket/test_cat/test_schema/. When a new parquet file arrived in this path, the job triggered automatically and processed the file. In case of...
Hi Kaniz, thank you for the response. I am using Databricks Runtime 11.3 and have also checked the checkpoint and data source locations, which are properly configured. Still, I am unable to trigger the job. NOTE: Incoming files are pushed to the AWS S3 location fr...