Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
Data + AI Summit 2024 - Data Engineering & Streaming

Forum Posts

DB_developer
by New Contributor III
  • 4820 Views
  • 3 replies
  • 7 kudos

Resolved! How are nulls stored in Delta Lake and Databricks?

In my findings, a lot of the Delta tables in the lakehouse are sparse, so I am wondering how much space the data lake takes to store null data; any suggestions for handling sparse tables in the lakehouse would also be appreciated. I also want to o...

Latest Reply
Hubert-Dudek
Esteemed Contributor III
  • 7 kudos

As Delta uses Parquet files to store data internally: "Nullity is encoded in the definition levels (which is run-length encoded). NULL values are not encoded in the data. For example, in a non-nested schema, a column with 1000 NULLs would be encoded...
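
A quick way to see this on your own data, as a minimal sketch (the table name sparse_demo is made up):

# Write a mostly-null column as Delta and inspect its size; the run-length-encoded
# definition levels mean the nulls themselves add almost no storage.
from pyspark.sql import functions as F

df = (spark.range(1000000)
      .withColumn("sparse_col", F.when(F.col("id") % 1000 == 0, F.col("id"))))  # ~99.9% nulls

df.write.format("delta").mode("overwrite").saveAsTable("sparse_demo")

spark.sql("DESCRIBE DETAIL sparse_demo").select("sizeInBytes", "numFiles").show()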

2 More Replies
Philblakeman
by New Contributor III
  • 4970 Views
  • 4 replies
  • 5 kudos

How to %run a list of notebooks in Databricks

I'd like to %run a list of notebooks from another Databricks notebook.
my_notebooks = ["./setup", "./do_the_main_thing", "./check_results"]
for notebook in my_notebooks:
    %run notebook
This doesn't work, of course. I don't want to use dbutils.notebook....

Latest Reply
Ajay-Pandey
Esteemed Contributor III
  • 5 kudos

Please refer to the code below:
import scala.concurrent.{Future, Await}
import scala.concurrent.duration._
import scala.util.control.NonFatal

case class NotebookData(path: String, timeout: Int, parameters: Map[String, String] = Map.empty[String, String])
...
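
If Python is an option, a minimal sketch of the same idea with dbutils.notebook.run (the notebook paths are the asker's; %run cannot be driven from a loop, so some form of dbutils.notebook.run is the usual workaround, with the caveat that each child notebook runs separately and its variables are not imported into the caller):

# Run each notebook in sequence; dbutils.notebook.run returns the exit value
# the child notebook passes to dbutils.notebook.exit (or an empty string).
my_notebooks = ["./setup", "./do_the_main_thing", "./check_results"]

for nb in my_notebooks:
    result = dbutils.notebook.run(nb, 600)  # 600-second timeout per notebook
    print(nb, "->", result)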

3 More Replies
brickster_2018
by Databricks Employee
  • 7371 Views
  • 2 replies
  • 2 kudos

Resolved! How to get the count of files/partition for a Delta table?

I have a Delta table and I run the OPTIMIZE command regularly. However, I still see a large number of files in the table. I wanted to get a breakdown of the files in each partition and identify which partition has more files. What is the easiest way to ge...

Latest Reply
brickster_2018
Databricks Employee
  • 2 kudos

The below code snippet will give details about the file count per partition:
import com.databricks.sql.transaction.tahoe.DeltaLog
import org.apache.hadoop.fs.Path

val deltaPath = "<table_path>"
val deltaLog = DeltaLog(spark, new Path(deltaPath + "/_d...
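
For a similar breakdown without touching the internal DeltaLog API, a minimal sketch in Python (assumes the table is named my_delta_table and partitioned by a column named date; adjust to your schema):

# Count distinct underlying files per partition value.
from pyspark.sql import functions as F

(spark.table("my_delta_table")
 .withColumn("file_path", F.input_file_name())
 .groupBy("date")
 .agg(F.countDistinct("file_path").alias("num_files"))
 .orderBy(F.desc("num_files"))
 .show())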

1 More Replies
Senthil1
by Contributor
  • 1101 Views
  • 1 reply
  • 0 kudos
Latest Reply
Ajay-Pandey
Esteemed Contributor III
  • 0 kudos

Hi @SENTHIL KUMARR MALLI SUDARSAN, the link below might help you: Link

164079
by Contributor II
  • 6056 Views
  • 14 replies
  • 2 kudos

Resolved! Terraform keeps showing changes for databricks_sql_permissions on plan and apply

Hi team, a very weird behaviour when using databricks_sql_permissions with Terraform: the changes keep showing up on plan and apply. They reappear even after I apply the changes... Please advise.

Latest Reply
Pat
Honored Contributor III
  • 2 kudos

Hi @Avi Edri, I can see from the screenshot that you are using id = "any file/"; it seems to be related to the import: https://registry.terraform.io/providers/databricks/databricks/0.5.3/docs/resources/sql_permissions#import
Can you try the below:
resourc...

13 More Replies
THIAM_HUATTAN
by Valued Contributor
  • 4926 Views
  • 6 replies
  • 5 kudos

Error in Databricks code?

https://www.databricks.com/notebooks/recitibikenycdraft/data-preparation.html
Could someone help to look at Step 3: Prepare Calendar Info?
# derive complete list of dates between first and last dates
dates = (spark.range(0, days_between).withCol...

Latest Reply
UmaMahesh1
Honored Contributor III
  • 5 kudos

Hi @THIAM HUAT TAN, in your notebook you are creating an integer days_between with the code
days_between = (last_date - first_date).days + 10
Logically speaking, what the notebook is trying to do is fetch all the dates between two dates to do a foreca...
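
For context, a self-contained sketch of the date-generation pattern that notebook uses (the first and last dates here are made up; the original adds extra days for its forecast horizon):

from datetime import date
from pyspark.sql import functions as F

first_date, last_date = date(2020, 1, 1), date(2020, 3, 1)
days_between = (last_date - first_date).days + 1  # +1 so the last date is included

# spark.range yields offsets 0..days_between-1; date_add shifts the start date by each offset
dates = (spark.range(0, days_between)
         .withColumn("date", F.expr(f"date_add(DATE'{first_date}', cast(id AS int))"))
         .drop("id"))
dates.show()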

5 More Replies
vr
by Contributor
  • 3779 Views
  • 7 replies
  • 7 kudos

Where can I report a problem on community.databricks.com?

I tried the contact details at the bottom, but they seem to be generic Databricks contact and support links. The issue I faced was this: I think this word made its way to the stop list by mistake.

[image: wrong stop word]
Latest Reply
Vartika
Databricks Employee
  • 7 kudos

Hey @Vladimir Ryabtsev and @Hubert Dudek, thank you for highlighting this. It seems the words were added to the block list in combination with other words. We will have this fixed as soon as possible. It's always great to have help from our community members....

6 More Replies
Rishabh-Pandey
by Esteemed Contributor
  • 2513 Views
  • 6 replies
  • 6 kudos

Delta Live Tables

If I have two stages, bronze and silver, and when I create Delta Live Tables we need to give the target schema to store the results, but I need to store tables in two databases, bronze AND silver. For this I need to create two different Delta Live tab...

Latest Reply
Geeta1
Valued Contributor
  • 6 kudos

Hi @Rishabh Pandey, yes, you have to create 2 DLT tables
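
For reference, a minimal sketch of what each pipeline's source notebook might contain (the table name and source path are made up; the bronze or silver target schema itself is set in each DLT pipeline's configuration, not in the code):

import dlt

@dlt.table(comment="Raw events landing in the bronze database")
def events_raw():
    # Hypothetical Auto Loader source
    return (spark.readStream.format("cloudFiles")
            .option("cloudFiles.format", "json")
            .load("/mnt/raw/events"))

The second pipeline would define its silver tables the same way, with its target set to the silver database.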

5 More Replies
LavaLiah_85929
by New Contributor II
  • 2053 Views
  • 2 replies
  • 1 kudos

Resolved! Log has failed integrity check error when altering a table property

Below is the integrity check error we are getting when trying to set the deletedRetentionFileDuration table property to 10 days. Observation: The table data is sitting in S3. The size of all the files in S3 is in TB. There are millions of files for t...

[image: integrity check error screenshot]
Latest Reply
Hubert-Dudek
Esteemed Contributor III
  • 1 kudos

Please back up your table, then run the file repair:
FSCK REPAIR TABLE table_name
You can also do a dry run first:
FSCK REPAIR TABLE table_name DRY RUN
If the data is partitioned, it can be helpful to refresh the metastore:
MSCK REPAIR TABLE mytable
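
As a minimal sketch, the same sequence from a notebook cell (table_name is the asker's placeholder):

# Dry run first: lists file entries missing from storage without changing the log.
spark.sql("FSCK REPAIR TABLE table_name DRY RUN").show(truncate=False)

# Then remove the dangling entries from the Delta transaction log.
spark.sql("FSCK REPAIR TABLE table_name")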

1 More Replies
Sreekanth1
by New Contributor II
  • 1337 Views
  • 2 replies
  • 0 kudos

How to pass job task parameters to another task in Scala

Hi Team, I have a requirement in a workflow job. The job has two tasks: one is a Python task and the other is a Scala task (each running on its own cluster). I have set a value with dbutils.jobs.taskValues in Python, but it cannot be read in Scala because o...

Latest Reply
Ajay-Pandey
Esteemed Contributor III
  • 0 kudos

Hi @Sreekanth Nallapa, please refer to this link; it might help you with this.
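
For anyone landing here later, a minimal sketch of the task-values API on the Python side (task and key names are made up; reading these values from a Scala task is exactly the limitation the question runs into):

# In the upstream Python task: publish a value for downstream tasks.
dbutils.jobs.taskValues.set(key="record_count", value=42)

# In a downstream Python task: read it back, addressed by the upstream task's name.
count = dbutils.jobs.taskValues.get(taskKey="python_task", key="record_count", default=0)
print(count)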

1 More Replies
ridrasura
by New Contributor III
  • 2402 Views
  • 1 reply
  • 5 kudos

Optimal Batch Size for Batch Insert Queries using JDBC for Delta Tables

Hi, I am currently experimenting with databricks-jdbc 2.6.29 and trying to execute batch insert queries. What is the optimal batch size recommended by Databricks for performing batch insert queries? Currently it seems that values are inserted row by r...

Latest Reply
ridrasura
New Contributor III
  • 5 kudos

Just an observation: by using the auto optimize table-level property, I was able to see batch inserts writing records into a single file. https://docs.databricks.com/optimizations/auto-optimize.html
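
For reference, a minimal sketch of enabling that behaviour on an existing table (my_table is a placeholder; the properties are the documented auto optimize settings):

# Optimized writes coalesce inserts into fewer, larger files;
# auto compaction rewrites small files after writes.
spark.sql("""
    ALTER TABLE my_table SET TBLPROPERTIES (
        'delta.autoOptimize.optimizeWrite' = 'true',
        'delta.autoOptimize.autoCompact' = 'true'
    )
""")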

BkP
by Contributor
  • 7028 Views
  • 14 replies
  • 9 kudos

Suggestion Needed for an Orchestrator/Scheduler to schedule and execute Jobs in an automated way

Hello Friends, we have an application which extracts data from various tables in Azure Databricks into Postgres tables (Postgres installed on top of Azure VMs). After extraction we apply transformations to those datasets in Postgres tabl...

Latest Reply
VaibB
Contributor
  • 9 kudos

You can leverage Airflow, which provides a connector for the Databricks Jobs API, or you can use Databricks Workflows to orchestrate your jobs, defining several tasks and setting dependencies accordingly.
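
A minimal sketch of the Airflow route (requires the apache-airflow-providers-databricks package; the DAG name, connection id, and job id are placeholders):

from datetime import datetime
from airflow import DAG
from airflow.providers.databricks.operators.databricks import DatabricksRunNowOperator

with DAG(
    dag_id="databricks_extract",
    start_date=datetime(2023, 1, 1),
    schedule_interval="@daily",  # "schedule" on newer Airflow versions
    catchup=False,
) as dag:
    # Triggers an existing Databricks job through the Jobs API.
    run_extract = DatabricksRunNowOperator(
        task_id="run_extract",
        databricks_conn_id="databricks_default",
        job_id=12345,
    )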

13 More Replies
nk76
by New Contributor III
  • 7396 Views
  • 7 replies
  • 5 kudos

Resolved! Custom library import fails randomly with error: not found: value it

Hello, I have an issue with the import of a custom library in Azure Databricks. Roughly 95% of the time it works fine, but sometimes it fails. I have searched the internet and this community with no luck so far. It is a Scala library in a Scala notebook,...

Latest Reply
Naskar
New Contributor II
  • 5 kudos

I also encountered the same error. While importing a file, I get the error "Import failed with error: Could not deserialize: Exceeded 16777216 bytes (current = 16778609)".

6 More Replies

Connect with Databricks Users in Your Area

Join a Regional User Group to connect with local Databricks users. Events will be happening in your city, and you won’t want to miss the chance to attend and share knowledge.

If there isn’t a group near you, start one and help create a community that brings people together.

Request a New Group