Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

DB_developer
by New Contributor III
  • 5756 Views
  • 3 replies
  • 7 kudos

Resolved! How are nulls stored in Delta Lake and Databricks?

In my findings I have found a lot of delta tables in the lakehouse to be sparse, so I am wondering how much space Delta Lake takes to store null data; any suggestions for handling sparse data tables in the lakehouse would also be appreciated. I also want to o...

Latest Reply
Hubert-Dudek
Esteemed Contributor III
  • 7 kudos

As Delta uses Parquet files to store data internally: "Nullity is encoded in the definition levels (which is run-length encoded). NULL values are not encoded in the data. For example, in a non-nested schema, a column with 1000 NULLs would be encoded...

2 More Replies
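Hubert-Dudek's point about run-length-encoded definition levels can be sketched in plain Python. This is a simplified illustration of the idea, not Parquet's actual binary format:

```python
# Simplified sketch of Parquet-style definition levels with
# run-length encoding (RLE). Illustration only, not the real format.

def definition_levels(column):
    # In a flat (non-nested) schema: 0 = NULL, 1 = value present.
    return [0 if v is None else 1 for v in column]

def rle_encode(levels):
    # Collapse runs of identical levels into (level, run_length) pairs.
    runs = []
    for level in levels:
        if runs and runs[-1][0] == level:
            runs[-1][1] += 1
        else:
            runs.append([level, 1])
    return [tuple(r) for r in runs]

# A sparse column: 1000 NULLs followed by two values.
column = [None] * 1000 + [42, 7]
print(rle_encode(definition_levels(column)))  # [(0, 1000), (1, 2)]
```

The 1000 NULLs collapse to a single (level, count) pair, which is why sparse Delta tables cost far less storage than a row-per-null layout would suggest.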
ridrasura
by New Contributor III
  • 2963 Views
  • 1 reply
  • 5 kudos

Optimal Batch Size for Batch Insert Queries using JDBC for Delta Tables

Hi, I am currently experimenting with databricks-jdbc 2.6.29 and trying to execute batch insert queries. What is the optimal batch size recommended by Databricks for performing batch insert queries? Currently it seems that values are inserted row by r...

Latest Reply
ridrasura
New Contributor III
  • 5 kudos

Just an observation: by using the auto optimize table-level property, I was able to see batch inserts writing records into a single file. https://docs.databricks.com/optimizations/auto-optimize.html

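Databricks does not publish a single optimal batch size, so the usual approach is to group rows into fixed-size batches and tune the size empirically. A minimal sketch of the batching idea (the batch size below is illustrative, not a Databricks recommendation):

```python
# Hypothetical sketch: group rows into fixed-size batches before
# inserting, the same idea as JDBC addBatch()/executeBatch().

def chunked(rows, batch_size):
    # Yield successive batches of at most `batch_size` rows.
    for i in range(0, len(rows), batch_size):
        yield rows[i:i + batch_size]

rows = [(i, f"name_{i}") for i in range(25)]
print([len(b) for b in chunked(rows, 10)])  # [10, 10, 5]

# With a DB-API style connection this would look roughly like:
# for batch in chunked(rows, 10_000):
#     cursor.executemany("INSERT INTO t VALUES (?, ?)", batch)
# conn.commit()
```

As the reply notes, enabling auto optimize on the target table helps compact the many small files such inserts otherwise produce.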
Priyanka48
by Contributor
  • 16860 Views
  • 4 replies
  • 11 kudos

The functionality of table property delta.logRetentionDuration

We have one project requirement where we have to store only 14 days of history for delta tables. So for testing, I have set delta.logRetentionDuration = 2 days using the command spark.sql("alter table delta.`[delta_file_path]` set TBLPROPER...

Latest Reply
-werners-
Esteemed Contributor III
  • 11 kudos

Hi, by default there is a safety interval enabled, so if you set a retention period lower than that interval (7 days), data in that interval will not be deleted. You have to specifically override this safety interval by setting spark.databricks.delta.r...

3 More Replies
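The retention properties and the safety-check override that -werners- mentions can be sketched as follows (a notebook sketch, not a definitive recipe: the table name is a placeholder, and it assumes a Databricks notebook where `spark` is defined):

```python
# Sketch for a Databricks notebook (assumes the global `spark` session).
# Table name is a placeholder.

# Keep roughly 14 days of history in the transaction log and data files:
spark.sql("""
    ALTER TABLE my_table SET TBLPROPERTIES (
        'delta.logRetentionDuration' = 'interval 14 days',
        'delta.deletedFileRetentionDuration' = 'interval 14 days'
    )
""")

# VACUUM refuses retention below 7 days unless the safety check is
# disabled explicitly. Do this with care: it can break time travel
# and concurrent readers that rely on older files.
spark.conf.set("spark.databricks.delta.retentionDurationCheck.enabled", "false")
spark.sql("VACUUM my_table RETAIN 48 HOURS")
```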
nevoezov
by New Contributor II
  • 1868 Views
  • 0 replies
  • 2 kudos

java.lang.SecurityException: Could not verify permissions for OverwritePartitionsDynamic RelationV2 - Delta tables dynamic partition overwrite on Databricks ACL enabled clusters

I'm working on Databricks ACL-enabled clusters and having trouble performing dynamic partition overwrite to Delta tables. I have created a test table using the following query: CREATE TABLE IF NOT EXISTS test_01 ( id STRING, name STRING, c...

hari
by Contributor
  • 23113 Views
  • 3 replies
  • 7 kudos

How to add the partition for an existing delta table

We didn't need to set partitions for our delta tables as we didn't have many performance concerns, and Delta Lake's out-of-the-box optimization worked great for us. But there is now a need to set a specific partition column for some tables to allow conc...

Latest Reply
hari
Contributor
  • 7 kudos

Updated the description

2 More Replies
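Delta Lake cannot change the partitioning of an existing table in place; the usual approach is a rewrite. A hedged sketch of that rewrite (table and column names are placeholders; assumes a Databricks notebook where `spark` is defined):

```python
# Sketch: rewrite an existing Delta table with a new partition column.
# Names are placeholders; `spark` is the notebook's session.

df = spark.read.table("events")

(df.write
   .format("delta")
   .mode("overwrite")
   .option("overwriteSchema", "true")   # required when partitioning changes
   .partitionBy("event_date")           # the new partition column
   .saveAsTable("events_repartitioned"))
```

Writing to a new table name and swapping it in after validation avoids reading and overwriting the same table in one operation.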
jakubk
by Contributor
  • 4019 Views
  • 2 replies
  • 0 kudos

spark.read.parquet() - how to check for file lock before reading? (azure)

I have some Python code which takes parquet files from an ADLS Gen2 location and merges them into delta tables (run as a workflow job on a schedule). I have a try/catch wrapper around this so that any files that fail get moved into a failed folder using dbu...

Latest Reply
jakubk
Contributor
  • 0 kudos

That's the problem - it's not being locked (or fs.mv() isn't checking/honoring the lock). The upload process/tool is a third-party external tool. I can see via the upload tool that the file upload is 'in progress'. I can also see the 0-byte destination file...

1 More Replies
Malcoln_Dandaro
by New Contributor
  • 1984 Views
  • 0 replies
  • 0 kudos

Is there any way to navigate/access cloud files using the direct abfss URI (no mount) with default python functions/libs like open() or os.listdir()?

Hello, today on our workspace we access everything via mount points; we plan to change to "abfss://" for security, governance and performance reasons. The problem is that sometimes we interact with files using "python only" code, and apparently ...

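Built-in `open()` and `os.listdir()` cannot speak `abfss://` directly. One possible approach is the third-party `fsspec`/`adlfs` stack - this is an assumption, not a Databricks-documented pattern, and all account/container names below are placeholders:

```python
# Assumes the third-party `fsspec` and `adlfs` packages are installed.
# Account, container, and path names are placeholders.
import fsspec

fs = fsspec.filesystem(
    "abfss",
    account_name="mystorageaccount",
    anon=False,  # fall back to the environment's Azure credentials
)

# Rough equivalent of os.listdir():
print(fs.ls("mycontainer/path/to/files"))

# Rough equivalent of open():
with fs.open("mycontainer/path/to/files/data.csv", "r") as f:
    print(f.read())
```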
Ashok1
by New Contributor II
  • 1538 Views
  • 2 replies
  • 1 kudos
Latest Reply
Anonymous
Not applicable
  • 1 kudos

Hey there @Ashok ch. Hope everything is going great. Does @Ivan Tang's response answer your question? If yes, would you be happy to mark it as best so that other members can find the solution more quickly? Else please let us know if you need more hel...

1 More Replies
palzor
by New Contributor III
  • 1063 Views
  • 0 replies
  • 2 kudos

What is the best practice while loading a delta table: do I infer the schema or provide the schema?

I am loading Avro files into delta tables. I am doing this for multiple tables; some files are big (2-3 GB) and most of them are small, a few MBs. I am using Auto Loader to load the data into the delta tables. My question is: What is the ...

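For reference, providing an explicit schema to Auto Loader looks roughly like the sketch below (paths and names are placeholders; assumes a Databricks notebook where `spark` is defined). For Avro the schema travels with each file, so inference is relatively cheap; an explicit schema mainly buys stability and skips inference on the large files:

```python
# Sketch: Auto Loader with an explicit schema instead of inference.
# Paths and table names are placeholders.
from pyspark.sql.types import StructType, StructField, StringType, LongType

schema = StructType([
    StructField("id", LongType()),
    StructField("name", StringType()),
])

(spark.readStream
    .format("cloudFiles")
    .option("cloudFiles.format", "avro")
    .schema(schema)                      # explicit schema, no inference
    .load("/mnt/landing/avro/")
    .writeStream
    .option("checkpointLocation", "/mnt/checkpoints/my_table")
    .toTable("my_table"))
```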
Zair
by New Contributor III
  • 2163 Views
  • 2 replies
  • 2 kudos

How to handle 100+ tables ETL through spark structured streaming?

I am writing a streaming job which will be performing ETL for more than 130 tables. I would like to know if there is any better way to do this. Another solution I am thinking of is to write a separate streaming job for each table. Source data is coming...

Latest Reply
artsheiko
Databricks Employee
  • 2 kudos

Hi, I guess to answer your question it might be helpful to get more details on what you're trying to achieve and the bottleneck that you encounter now. Indeed, handling the processing of 130 tables in one monolith could be challenging, as the business rul...

1 More Replies
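A middle ground between one monolith and 130 separate jobs is to drive many similar streams from one job by looping over a config list. A hedged sketch (table names and paths are placeholders; assumes a Databricks notebook where `spark` is defined):

```python
# Sketch: one job, one stream per table, driven by configuration.
# Names and paths are placeholders.
tables = ["orders", "customers", "payments"]  # ... up to 130 entries

for t in tables:
    (spark.readStream
        .format("cloudFiles")
        .option("cloudFiles.format", "json")
        .load(f"/mnt/landing/{t}/")
        .writeStream
        .option("checkpointLocation", f"/mnt/checkpoints/{t}")  # one per stream
        .toTable(f"bronze_{t}"))
```

Each stream needs its own checkpoint location; tables with very different SLAs or transformation logic are usually better split into separate jobs, as the reply suggests.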
Dicer
by Valued Contributor
  • 3872 Views
  • 5 replies
  • 5 kudos

Resolved! Azure Databricks: AnalysisException: Database 'bf' not found

I wanted to save my delta tables in my Databricks database. When I saveAsTable, there is an error message: Azure Databricks: AnalysisException: Database 'bf' not found. Yes, there is no database named "bf" in my database. Here is my full code: import os i...

Latest Reply
Dicer
Valued Contributor
  • 5 kudos

Some data can be saved as delta tables while some cannot.

4 More Replies
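The usual fix for this class of error is to create the database before calling saveAsTable and to qualify the table name. A sketch (names are placeholders; assumes a Databricks notebook where `spark` is defined):

```python
# Sketch: create the database first, then save a fully qualified table.
spark.sql("CREATE DATABASE IF NOT EXISTS bf")

df = spark.range(10)  # placeholder DataFrame

(df.write
   .format("delta")
   .mode("overwrite")
   .saveAsTable("bf.my_table"))  # qualify with the database name
```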
prasadvaze
by Valued Contributor II
  • 42001 Views
  • 10 replies
  • 8 kudos

When to use delta lake versus relational database as a source for BI reporting?

Assume all of your data exists in delta tables and also in SQL Server, so you have a choice to report from either. Can someone share thoughts on "In what scenario would you not want a report created from a delta table and instead use the traditional rel...

Latest Reply
PCJ
New Contributor II
  • 8 kudos

Hi @Kaniz Fatma - I would like to follow up on @prasad vaze's question regarding unsupported referential integrity. How does one work around that, using best practices as Databricks sees it?

9 More Replies
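One common workaround for unenforced referential integrity in the lakehouse is to validate foreign keys yourself with an anti join as part of the load. A sketch (table and column names are placeholders; assumes a Databricks notebook where `spark` is defined):

```python
# Sketch: detect fact rows whose foreign key has no matching dimension
# row. Names are placeholders.
orphans = spark.sql("""
    SELECT f.*
    FROM fact_orders f
    LEFT ANTI JOIN dim_customers d
      ON f.customer_id = d.customer_id
""")

if orphans.count() > 0:
    raise ValueError(
        "fact_orders contains customer_ids missing from dim_customers")
```

Running a check like this as a data-quality step gives integrity guarantees comparable to an enforced constraint, at the cost of doing the check yourself.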
Ruby8376
by Valued Contributor
  • 2281 Views
  • 2 replies
  • 0 kudos

Primary/Foreign key Constraints on Delta tables?

Hi All! I am using Databricks in a data migration project. We need to transform the data before loading it to Salesforce. Can we put primary key/foreign key constraints on Databricks delta tables?

Latest Reply
Anonymous
Not applicable
  • 0 kudos

Hi @Ruby Rubi, following up: did you get a chance to check @Werner Stinckens' previous comments, or do you need any further help on this?

1 More Replies
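For reference, on Unity Catalog, Delta tables support informational primary/foreign key constraints - they document relationships but are not enforced. A sketch (table and column names are placeholders; assumes a Databricks notebook where `spark` is defined):

```python
# Sketch: informational (unenforced) PK/FK constraints on Unity Catalog
# Delta tables. Names are placeholders.

# PRIMARY KEY columns must be NOT NULL first:
spark.sql("ALTER TABLE customers ALTER COLUMN customer_id SET NOT NULL")
spark.sql("""
    ALTER TABLE customers
    ADD CONSTRAINT pk_customers PRIMARY KEY (customer_id)
""")
spark.sql("""
    ALTER TABLE orders
    ADD CONSTRAINT fk_orders_customer
    FOREIGN KEY (customer_id) REFERENCES customers (customer_id)
""")
```

Because the constraints are informational, duplicate or orphan keys still need to be caught by your own validation before loading to Salesforce.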
Braxx
by Contributor II
  • 5672 Views
  • 2 replies
  • 1 kudos

Resolved! delta table storage

I couldn't find it clearly explained anywhere, so I hope somebody here can shed some light on it. A few questions: 1) Where are delta tables stored? The docs say: "Delta Lake uses versioned Parquet files to store your data in your cloud storage". So where exactly i...

Latest Reply
Braxx
Contributor II
  • 1 kudos

thanks, very helpful

1 More Replies
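A quick way to answer the "where exactly" question for any given table is to ask Delta itself. A notebook sketch (table name is a placeholder; assumes a Databricks notebook where `spark` and `dbutils` are defined):

```python
# Sketch: find the cloud-storage path backing a Delta table and inspect
# its layout. Table name is a placeholder.
detail = spark.sql("DESCRIBE DETAIL my_table").collect()[0]
print(detail.location)  # the storage path holding the table's files

# That path contains versioned Parquet data files plus a _delta_log/
# folder holding the JSON transaction log:
print(dbutils.fs.ls(detail.location))
```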