Data Engineering

Forum Posts

Sampath_Kumar
by Visitor
  • 73 Views
  • 2 replies
  • 1 kudos

Volume Limitations

I have a use case to create a table using JSON files. There are 36 million files in the upstream S3 bucket, and I just created a volume on top of it, so the volume has 36M files. I'm trying to form a data frame by reading this volume using the below sp...
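For reference, a minimal sketch of reading a volume into a DataFrame; the volume path and names below are placeholders, not from the post, and at 36M files a plain read spends most of its time just listing, so Auto Loader is usually the better fit:

```
# Hypothetical UC volume path; adjust catalog/schema/volume names.
df = spark.read.json("/Volumes/main/default/raw_json/")

# With tens of millions of small files, incremental ingestion with
# Auto Loader avoids re-listing the whole volume on every run:
stream = (
    spark.readStream.format("cloudFiles")
    .option("cloudFiles.format", "json")
    .option("cloudFiles.schemaLocation", "/Volumes/main/default/_schemas/raw_json")  # placeholder
    .load("/Volumes/main/default/raw_json/")
)
```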

Latest Reply
Kaniz
Community Manager
  • 1 kudos

Hi @Sampath_Kumar, Let’s delve into the limitations and best practices related to Databricks volumes. Volume Limitations: Managed Volumes: These are Unity Catalog-governed storage volumes created within the default storage location of the contain...

1 More Reply
Brad
by New Contributor II
  • 73 Views
  • 3 replies
  • 0 kudos

Inconsistent behavior when displaying chart in notebook

Hi, I'm trying to create some 3D charts. With the same code and same cluster, sometimes they show and sometimes they don't. Previously a chart would not display, but last week I opened a notebook with a failed run and found the result could be shown by itself (as ...
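For anyone reproducing this, a minimal 3D chart that normally renders inline in a notebook (assuming matplotlib, which the post does not name):

```
import numpy as np
import matplotlib.pyplot as plt

# Simple 3D surface; projection="3d" is built in since matplotlib 3.2.
fig = plt.figure()
ax = fig.add_subplot(projection="3d")
xs, ys = np.meshgrid(np.linspace(-2, 2, 50), np.linspace(-2, 2, 50))
ax.plot_surface(xs, ys, np.exp(-(xs**2 + ys**2)))
plt.show()  # in Databricks notebooks, display(fig) also renders the figure
```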

Latest Reply
Brad
New Contributor II
  • 0 kudos

Is it possible that the iframe used in the cell removed sandboxing and caused this?

2 More Replies
cpd
by Visitor
  • 11 Views
  • 0 replies
  • 0 kudos

Ingesting geospatial data into a table

I'm just getting started with Databricks and wondering if it is possible to ingest a GeoJSON or GeoParquet file into a new table without writing code? My goal here is to load vector data into a table and perform H3 polyfill operations on all the vect...
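Whether this works entirely without code depends on the ingest UI, but the code path itself is short. A sketch assuming the H3 SQL functions available on recent DBR (11.3+); the paths, table name, and geometry column are hypothetical:

```
# GeoParquet reads as ordinary parquet; the geometry column is typically WKB.
spark.read.parquet("/Volumes/main/default/geo/vectors.parquet") \
    .write.saveAsTable("main.default.vectors")

# h3_polyfillash3 accepts WKT/WKB/GeoJSON geographies and returns an array of cells.
cells = spark.sql("""
    SELECT explode(h3_polyfillash3(geometry, 8)) AS h3_cell
    FROM main.default.vectors
""")
```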

xhead
by New Contributor II
  • 1692 Views
  • 3 replies
  • 0 kudos

Resolved! Does "databricks bundle deploy" clean up old files?

I'm looking at this page (Databricks Asset Bundles development work tasks) in the Databricks documentation. When repo assets are deployed to a Databricks workspace, it is not clear whether "databricks bundle deploy" will remove files from the target wo...

Data Engineering
bundle
cli
deploy
Latest Reply
xhead
New Contributor II
  • 0 kudos

One further question: the purpose of "databricks bundle destroy" is to remove all previously-deployed jobs, pipelines, and artifacts that are defined in the bundle configuration files. Which bundle configuration files? The ones in the repo? Or are ther...

2 More Replies
pokus
by New Contributor III
  • 1940 Views
  • 3 replies
  • 2 kudos

Resolved! use DeltaLog class in databricks cluster

I need to use the DeltaLog class in my code to get the AddFiles dataset. I have to keep the implemented code in a repo and run it on a Databricks cluster. Some docs say to use the org.apache.spark.sql.delta.DeltaLog class, but it seems Databricks gets rid of ...
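For context, the workaround discussed in this thread goes through the Databricks-internal package rather than the OSS one. A heavily hedged py4j sketch; the com.databricks.sql.transaction.tahoe package name and method shapes are assumptions based on community reports, not a public API, and may break between DBR versions:

```
# Access the JVM DeltaLog via py4j; internal API, use at your own risk.
jvm_log = spark._jvm.com.databricks.sql.transaction.tahoe.DeltaLog.forTable(
    spark._jsparkSession, "/path/to/delta/table"  # hypothetical path
)
add_files = jvm_log.snapshot().allFiles()  # Java Dataset[AddFile]
print(add_files.count())
```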

Latest Reply
dbal
New Contributor
  • 2 kudos

Thanks for providing a solution @pokus. What I don't understand is why Databricks cannot provide the DeltaLog at runtime. How can this be the official solution? We need a better solution for this instead of depending on reflection.

2 More Replies
VGS777
by New Contributor II
  • 26 Views
  • 0 replies
  • 0 kudos

Regarding Cloning dbc file from git

Hey folks, I have a .dbc file in a git repo, and when I cloned it in Databricks and tried to open the .dbc file it said ```Failed to load file. The file encoding is not supported```. Can anyone please advise me on this? #help #beginner

RobsonNLPT
by Contributor
  • 98 Views
  • 3 replies
  • 0 kudos

Resolved! scala-xml : how to move child to another parent node

Hi all. The mandatory rowTag for writing to XML doesn't make sense to me, as I have the complete nested dataframe schema. In my case I need to implement an extra step to remove that extra node (default: Row) after XML generation. I need some examples ...

Latest Reply
Kaniz
Community Manager
  • 0 kudos

Hi @RobsonNLPT, Working with XML in Scala using the scala-xml library can be powerful and flexible. Let’s break down your requirements and provide an example of how to achieve this. Removing the “Row” Node: When converting a DataFrame to XML, th...

2 More Replies
LoiNguyen
by New Contributor II
  • 9442 Views
  • 5 replies
  • 2 kudos

The authentication type 10 is not supported

I use the below code to connect to PostgreSQL:

    df = spark.read \
        .jdbc("jdbc:postgresql://hostname:5432/dbname", "schema.table",
              properties={"user": "user", "password": "password"})
    df.printSchema()

However, I got the ...
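"Authentication type 10" is PostgreSQL's SCRAM-SHA-256, and it usually means the JDBC driver on the cluster is too old: SCRAM needs org.postgresql:postgresql 42.2.x or later. With a current driver installed, a minimal sketch of the same read (host, database, and credentials are the placeholders from the post):

```
# Standard JDBC read against PostgreSQL.
df = (
    spark.read.format("jdbc")
    .option("url", "jdbc:postgresql://hostname:5432/dbname")
    .option("dbtable", "schema.table")
    .option("user", "user")
    .option("password", "password")
    .option("driver", "org.postgresql.Driver")
    .load()
)
df.printSchema()
```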

Latest Reply
simboss
New Contributor II
  • 2 kudos

But how are we going to do this for those who use Windows?

4 More Replies
satishnavik
by Visitor
  • 55 Views
  • 1 reply
  • 0 kudos

How to connect Databricks Database with Springboot application using JPA

Facing an issue integrating our Spring Boot JPA-supported application with Databricks. Below are the steps and settings we did for the integration. When starting the Spring Boot application we get a warning: HikariPool-1 - Driver doe...

Latest Reply
Kaniz
Community Manager
  • 0 kudos

Hi @satishnavik, It seems you’re encountering issues while integrating your Spring Boot JPA application with Databricks. Let’s address the warnings and exceptions you’re facing. Warning: Driver Does Not Support Network Timeout for Connections The...

Hertz
by New Contributor
  • 48 Views
  • 1 reply
  • 0 kudos

Structured Streaming Event in Audit Logs

I am trying to monitor when a table is created or updated using the audit logs, but I have found that structured streaming writes/appends are not captured in the audit logs. Am I missing something? Shouldn't this be captured as a Unity Catalog event? Eith...
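If system tables are enabled in the workspace, one way to check what Unity Catalog actually recorded is to query the audit system table directly. A rough sketch; the action names filtered on below are assumptions to adapt, not a definitive list:

```
events = spark.sql("""
    SELECT event_time, action_name, request_params
    FROM system.access.audit
    WHERE service_name = 'unityCatalog'
      AND action_name IN ('createTable', 'updateTables')  -- adjust action names as needed
    ORDER BY event_time DESC
    LIMIT 100
""")
display(events)
```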

Data Engineering
Audit Logs
structured streaming
Latest Reply
Kaniz
Community Manager
  • 0 kudos

Hi @Hertz, Monitoring table creation and updates using audit logs is essential for maintaining data governance and security. Let’s explore this further. Databricks, being a cloud-native platform, provides audit logs that allow administrators to t...

Floody
by New Contributor
  • 57 Views
  • 1 reply
  • 0 kudos

Delta Live Tables use case

Hi all, we have the following use case and are wondering if DLT is the correct approach. A landing area receives daily dumps of parquet files into our Data Lake container. The daily dump does a full overwrite of the parquet each time, keeping the same file name. T...
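A minimal sketch of what that pipeline could look like with DLT plus Auto Loader; the table name and landing path are hypothetical. cloudFiles.allowOverwrites matters here precisely because the daily dump rewrites the same file name:

```
import dlt
from pyspark.sql import functions as F

@dlt.table(name="landing_bronze")  # hypothetical table name
def landing_bronze():
    return (
        spark.readStream.format("cloudFiles")
        .option("cloudFiles.format", "parquet")
        .option("cloudFiles.allowOverwrites", "true")  # re-ingest overwritten files
        .load("abfss://landing@storage.dfs.core.windows.net/daily/")  # hypothetical path
        .withColumn("ingested_at", F.current_timestamp())
    )
```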

Data Engineering
Delta Live Tables
Latest Reply
Kaniz
Community Manager
  • 0 kudos

Hi @Floody, Let’s explore how Delta Live Tables (DLT) can be a suitable approach for your use case. DLT builds on Delta Lake, an open source storage layer that brings ACID transactions to Apache Spark™ and big data workloads. It provides reliab...

PassionateDBD
by New Contributor II
  • 60 Views
  • 2 replies
  • 1 kudos

Is it possible to create/update non dlt table in init phase of dlt task?

We have a DLT task that is written in Python. Is it possible to create or update a delta table programmatically from inside a DLT task? The delta table would not be managed from inside the DLT task, because we never want to fully refresh that table. Th...
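One hedged pattern: run a plain Delta write at the top level of the pipeline's Python source, outside any @dlt.table function, so the table never enters DLT's dependency graph and is never refreshed by it. The table name is hypothetical; note that DLT may evaluate the source more than once (e.g. during validation), so such a write should tolerate repeats:

```
from pyspark.sql import functions as F

# Hypothetical audit table, appended to outside DLT's managed graph.
(spark.createDataFrame([("pipeline_start",)], "event string")
     .withColumn("ts", F.current_timestamp())
     .write.format("delta").mode("append")
     .saveAsTable("ops.dlt_run_log"))
```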

Latest Reply
PassionateDBD
New Contributor II
  • 1 kudos

Thanks for your reply @Kaniz! I'm aware of the possibility to create or not create a table based on some parameter. What I'm trying to figure out is basically how to achieve the following: the DLT pipeline starts and logs some information to a delta table. On...

1 More Reply
Pragati_17
by Visitor
  • 70 Views
  • 1 reply
  • 0 kudos

Parameters Passing to dataset in Databricks Lakeview Dashboard

I have a date range filter in a Lakeview Dashboard, and I want to take a distinct count of the number of months in the selected date range and divide one of the columns by it; that column is used in a counter visualization. But passing parameters is not possible...
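Lakeview dataset SQL can reference named parameters (:start_date / :end_date style). The month arithmetic itself can be prototyped in a notebook, as in this sketch; the parameter names, sample values, and the inclusive "+1" convention are assumptions:

```
# spark.sql with named parameter markers (Spark 3.4+ / recent DBR).
months = spark.sql(
    """
    SELECT CAST(months_between(CAST(:end_date AS DATE),
                               CAST(:start_date AS DATE)) AS INT) + 1 AS month_count
    """,
    args={"start_date": "2024-01-01", "end_date": "2024-06-30"},
)
months.show()
```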

Latest Reply
Kaniz
Community Manager
  • 0 kudos

Hi @Pragati_17, Let’s break down the steps to achieve this in Databricks Lakeview Dashboard: Define Your Datasets: Use the Data tab in your Lakeview dashboard to define the underlying datasets. You can define datasets as follows: An existing Unit...

srinivas_001
by New Contributor II
  • 85 Views
  • 2 replies
  • 1 kudos

File trigger options -- cloudFiles.allowOverwrites

I have a job configured to run on file arrival. I have provided the File arrival path: s3://test_bucket/test_cat/test_schema/. When a new parquet file arrived in this path, the job triggered automatically and processed the file. In case of...
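If the trigger only fires for genuinely new file names, one hedged angle is to make the ingest itself tolerant of overwrites via Auto Loader. The allowOverwrites option is real; the schema location below is a placeholder:

```
df = (
    spark.readStream.format("cloudFiles")
    .option("cloudFiles.format", "parquet")
    .option("cloudFiles.allowOverwrites", "true")  # pick up re-written file names
    .option("cloudFiles.schemaLocation", "s3://test_bucket/_schemas/")  # placeholder
    .load("s3://test_bucket/test_cat/test_schema/")
)
```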

Latest Reply
srinivas_001
New Contributor II
  • 1 kudos

Hi Kaniz, thank you for the response. I am using Databricks Runtime 11.3 and have checked the checkpoint and data source locations, which are properly configured. Still I am unable to trigger the job. NOTE: Incoming files are pushed to the AWS S3 location fr...

1 More Reply