Data Engineering

Forum Posts

DB_developer
by New Contributor III
  • 2778 Views
  • 4 replies
  • 7 kudos

Resolved! How nulls are stored in delta lake and databricks?

In my findings, I have found a lot of Delta tables in the lakehouse to be sparse, so I am wondering how much space the data lake takes to store null data. Any suggestions for handling sparse tables in the lakehouse would also be appreciated. I also want to o...

Latest Reply
Kaniz
Community Manager
  • 7 kudos

Hi @Akash Ragothu, we haven’t heard from you since the last response from @Ajay Pandey, and I was checking back to see if his suggestions helped you. Otherwise, if you have any solution, please share it with the community, as it can be helpful to othe...

  • 7 kudos
3 More Replies
ridrasura
by New Contributor III
  • 1442 Views
  • 1 replies
  • 5 kudos

Optimal Batch Size for Batch Insert Queries using JDBC for Delta Tables

Hi, I am currently experimenting with databricks-jdbc 2.6.29 and trying to execute batch insert queries. What is the optimal batch size recommended by Databricks for performing batch insert queries? Currently it seems that values are inserted row by r...

Latest Reply
ridrasura
New Contributor III
  • 5 kudos

Just an observation: by using the auto optimize table-level property, I was able to see batch inserts writing records into a single file. https://docs.databricks.com/optimizations/auto-optimize.html

  • 5 kudos
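
The thread does not state a recommended batch size, so the following is only a minimal Python sketch of the general idea behind batching: group rows into chunks and build one multi-row INSERT per chunk instead of one statement per row. The helper names `chunked` and `multi_row_insert`, the `batch_size` parameter, and the table and column names are hypothetical, and the sketch only builds the SQL text and parameter list; it does not call the Databricks JDBC driver.

```python
from itertools import islice
from typing import Any, Iterable, Iterator, List, Sequence, Tuple


def chunked(rows: Iterable[Sequence[Any]], batch_size: int) -> Iterator[List[Sequence[Any]]]:
    """Yield consecutive lists of at most batch_size rows."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    it = iter(rows)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch


def multi_row_insert(
    table: str, columns: Sequence[str], batch: Sequence[Sequence[Any]]
) -> Tuple[str, List[Any]]:
    """Build one parameterized INSERT covering every row in the batch."""
    # One "(?, ?, ...)" group per row, using JDBC-style positional placeholders.
    row_placeholder = "(" + ", ".join("?" for _ in columns) + ")"
    values_clause = ", ".join(row_placeholder for _ in batch)
    sql = f"INSERT INTO {table} ({', '.join(columns)}) VALUES {values_clause}"
    # Flatten the rows so the parameters line up with the placeholders.
    params = [value for row in batch for value in row]
    return sql, params


# Hypothetical usage: three rows split into batches of two.
rows = [(1, "a"), (2, "b"), (3, "c")]
for batch in chunked(rows, batch_size=2):
    sql, params = multi_row_insert("example_table", ["id", "name"], batch)
```

A suitable `batch_size` would have to be found by measurement; the observation in the reply above suggests the table's auto optimize setting also affects how many files each insert produces.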
Priyanka48
by New Contributor III
  • 6969 Views
  • 5 replies
  • 10 kudos

The functionality of table property delta.logRetentionDuration

We have a project requirement to store only 14 days of history for Delta tables. So for testing, I have set delta.logRetentionDuration = 2 days using the command below: spark.sql("alter table delta.`[delta_file_path]` set TBLPROPER...

Latest Reply
Kaniz
Community Manager
  • 10 kudos

Hi @Priyanka Mane, we haven’t heard from you since the last response from @Werner Stinckens and @Uma Maheswara Rao Desula, and I was checking back to see if their suggestions helped you. Otherwise, if you have any solution, please share it with the c...

  • 10 kudos
4 More Replies
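
The command in the post is cut off, so the exact statement the poster ran is unknown. As a rough sketch of what a statement of this kind looks like, the hypothetical helper below builds an ALTER TABLE ... SET TBLPROPERTIES string for a path-based Delta table. The function name `set_tblproperties_sql` and the example path are assumptions for illustration; on a cluster the resulting string would be passed to `spark.sql(...)`.

```python
from typing import Mapping


def set_tblproperties_sql(table_path: str, properties: Mapping[str, str]) -> str:
    """Build an ALTER TABLE statement that sets Delta table properties."""
    if not properties:
        raise ValueError("at least one property is required")
    # Each property becomes a quoted 'key' = 'value' pair.
    assignments = ", ".join(f"'{key}' = '{value}'" for key, value in properties.items())
    return f"ALTER TABLE delta.`{table_path}` SET TBLPROPERTIES ({assignments})"


# Hypothetical path; the 14-day value mirrors the requirement in the post.
statement = set_tblproperties_sql(
    "/tmp/example_delta_table",
    {"delta.logRetentionDuration": "interval 14 days"},
)
# On Databricks this would be run as: spark.sql(statement)
```

Note that delta.logRetentionDuration governs how long transaction log history is kept. Removal of old data files is controlled separately (VACUUM and delta.deletedFileRetentionDuration), so setting this property alone may not shrink the table's storage.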
nevoezov
by New Contributor II
  • 1034 Views
  • 0 replies
  • 2 kudos

java.lang.SecurityException: Could not verify permissions for OverwritePartitionsDynamic RelationV2 - Delta tables dynamic partition overwrite on Databricks ACL enabled clusters

I'm working on Databricks ACL-enabled clusters and having trouble performing a dynamic partition overwrite to Delta tables. I have created a test table using the following query: CREATE TABLE IF NOT EXISTS test_01 ( id STRING, name STRING, c...

hari
by Contributor
  • 15075 Views
  • 5 replies
  • 5 kudos

How to add the partition for an existing delta table

We didn't need to set partitions for our delta tables as we didn't have many performance concerns and delta lake out-of-the-box optimization worked great for us. But there is now a need to set a specific partition column for some tables to allow conc...

Latest Reply
Kaniz
Community Manager
  • 5 kudos

Hi @Harikrishnan P H, we haven’t heard from you since the last response from @Hubert Dudek, and I was checking back to see if you have a resolution yet. If you have any solution, please share it with the community as it can be helpful to others. ...

  • 5 kudos
4 More Replies
jakubk
by Contributor
  • 2111 Views
  • 2 replies
  • 0 kudos

spark.read.parquet() - how to check for file lock before reading? (azure)

I have some Python code that takes Parquet files from an ADLS Gen2 location and merges them into Delta tables (run as a workflow job on a schedule). I have a try/catch wrapper around this so that any files that fail get moved into a failed folder using dbu...

Latest Reply
jakubk
Contributor
  • 0 kudos

That's the problem - it's not being locked (or fs.mv() isn't checking/honoring the lock). The upload process/tool is a 3rd-party external tool. I can see via the upload tool that the file upload is 'in progress'. I can also see the 0 byte destination file...

  • 0 kudos
1 More Replies
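
Since the reply says no lock is visible and a 0 byte destination file appears while the upload is in progress, one possible workaround is a size-stability heuristic rather than a lock check. The sketch below is only that: the function name `is_upload_complete` and its parameters are hypothetical, and `os.path.getsize` stands in for whatever size lookup the storage layer offers (on ADLS a different lookup would have to be supplied through `get_size`).

```python
import os
import time
from typing import Callable


def is_upload_complete(
    path: str,
    wait_seconds: float = 1.0,
    get_size: Callable[[str], int] = os.path.getsize,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Heuristic: a file is treated as ready if it is non-empty and its size
    does not change across a short wait. This is not a lock check."""
    first_size = get_size(path)
    if first_size == 0:
        # Matches the 0 byte placeholder seen while the upload is running.
        return False
    sleep(wait_seconds)
    return get_size(path) == first_size
```

A job could skip files for which this returns False and pick them up on the next scheduled run instead of moving them to the failed folder. A slow or stalled upload can still fool the heuristic, so a longer wait or a completion marker written by the upload tool, if it supports one, would be more reliable.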
Malcoln_Dandaro
by New Contributor
  • 1228 Views
  • 0 replies
  • 0 kudos

Is there any way to navigate/access cloud files using the direct abfss URI (no mount) with default python functions/libs like open() or os.listdir()?

Hello, today in our workspace we access everything via mount points. We plan to change this to "abfss://" for security, governance and performance reasons. The problem is that sometimes we interact with files using "python only" code, and apparently ...

Ashok1
by New Contributor II
  • 842 Views
  • 2 replies
  • 1 kudos
Latest Reply
Anonymous
Not applicable
  • 1 kudos

Hey there @Ashok ch, hope everything is going great. Does @Ivan Tang's response answer your question? If yes, would you be happy to mark it as best so that other members can find the solution more quickly? Else please let us know if you need more hel...

  • 1 kudos
1 More Replies
palzor
by New Contributor III
  • 485 Views
  • 0 replies
  • 2 kudos

What is the best practice when loading a Delta table: do I infer the schema or provide the schema?

I am loading Avro files into Delta tables. I am doing this for multiple tables; some files are big (2-3 GB) and most of them are small (a few MB). I am using Auto Loader to load the data into the Delta tables. My question is: What is the ...

Zair
by New Contributor II
  • 1027 Views
  • 2 replies
  • 2 kudos

How to handle 100+ tables ETL through spark structured streaming?

I am writing a streaming job that will perform ETL for more than 130 tables. I would like to know whether there is a better way to do this. Another solution I am considering is to write a separate streaming job for each table. Source data is coming...

Latest Reply
artsheiko
Valued Contributor III
  • 2 kudos

Hi, I guess to answer your question it might be helpful to get more details on what you're trying to achieve and the bottleneck that you encounter now. Indeed, handling the processing of 130 tables in one monolith could be challenging as the business rul...

  • 2 kudos
1 More Replies
Dicer
by Valued Contributor
  • 1932 Views
  • 5 replies
  • 5 kudos

Resolved! Azure Databricks: AnalysisException: Database 'bf' not found

I wanted to save my Delta tables in my Databricks database. When I call saveAsTable, I get the error message: Azure Databricks: AnalysisException: Database 'bf' not found. Yes, there is no database named "bf" in my database. Here is my full code: import os i...

Latest Reply
Dicer
Valued Contributor
  • 5 kudos

Some data can be saved as delta tables while some cannot.

  • 5 kudos
4 More Replies
prasadvaze
by Valued Contributor II
  • 21087 Views
  • 11 replies
  • 9 kudos

When to use delta lake versus relational database as a source for BI reporting?

Assume all of your data exists in Delta tables and also in SQL Server, so you have a choice to report from either. Can someone share thoughts on "In what scenario would you not want a report created from a Delta table and instead use the traditional rel...

Latest Reply
PCJ
New Contributor II
  • 9 kudos

Hi @Kaniz Fatma - I would like to follow up on @prasad vaze's question regarding unsupported referential integrity. How does one work around that, using best practices as Databricks sees them?

  • 9 kudos
10 More Replies
Ruby8376
by Valued Contributor
  • 1565 Views
  • 2 replies
  • 0 kudos

Primary/Foreign key Constraints on Delta tables?

Hi All! I am using Databricks in a data migration project. We need to transform the data before loading it into Salesforce. Can we define primary key/foreign key constraints on Databricks Delta tables?

Latest Reply
Anonymous
Not applicable
  • 0 kudos

Hi @Ruby Rubi, following up: did you get a chance to check @Werner Stinckens's previous comments, or do you need any further help on this?

  • 0 kudos
1 More Replies
Braxx
by Contributor II
  • 2716 Views
  • 2 replies
  • 1 kudos

Resolved! delta table storage

I couldn't find it clearly explained anywhere, so I hope somebody here can shed some light on it. A few questions: 1) Where are Delta tables stored? The docs say: "Delta Lake uses versioned Parquet files to store your data in your cloud storage". So where exactly i...

Latest Reply
Braxx
Contributor II
  • 1 kudos

thanks, very helpful

  • 1 kudos
1 More Replies