Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

User16765131552
by Contributor III
  • 1689 Views
  • 1 reply
  • 3 kudos

Delta Sharing Costs

When Delta Sharing is enabled and a link is shared, I understand that the data transfer happens directly and not through the sharing server. I'm curious how costs are calculated. Is the entity making the share available charged for data egress and ...

Latest Reply
Databricks_love
New Contributor II

Any news?

blackcoffeeAR
by Contributor
  • 4131 Views
  • 5 replies
  • 2 kudos

Cannot install com.microsoft.azure.kusto:kusto-spark

Hello, I'm trying to install/update the library com.microsoft.azure.kusto:kusto-spark_3.0_2.12:3.1.x. I tried to install it from the Maven central repository and using Terraform. It was working previously, and now the installation always ends with the error: │ Error: c...

Latest Reply
phisolani
New Contributor II

I have the same problem with a slightly different version of the connector (a change in the minor version). I have a job that runs every hour, and this started happening from the 23rd of January onwards. The error indeed says the same: Ru...

4 More Replies
Dipesh
by New Contributor II
  • 4558 Views
  • 4 replies
  • 2 kudos

Pausing a scheduled Azure Databricks job after failure

Hi All, I have a job/workflow scheduled in Databricks to run every hour. How can I configure my job to pause whenever a run fails? (Pause the job/workflow on the first failure.) I would want to prevent triggering multiple runs due to the scheduled/...

Latest Reply
Dipesh
New Contributor II

Hi @Hubert Dudek, thank you for your suggestion. I understand that we can use the Jobs API to change the pause_status of a job on errors, but sometimes we observed that the workflow/job fails due to cluster issues (while the job clusters are getting creat...
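For reference, a minimal sketch of pausing a job's schedule through the Jobs 2.1 API, as suggested above. The host, token, job ID, and cron values are placeholders; note that the schedule object is replaced as a whole, so the job's existing cron expression and timezone must be resent.

import requests

HOST = "https://<your-workspace>.azuredatabricks.net"   # placeholder workspace URL
TOKEN = "<personal-access-token>"                        # placeholder PAT
JOB_ID = 12345                                           # placeholder job ID

# Flip the schedule to PAUSED; quartz_cron_expression and timezone_id must
# match the job's current schedule because the schedule block is overwritten.
resp = requests.post(
    f"{HOST}/api/2.1/jobs/update",
    headers={"Authorization": f"Bearer {TOKEN}"},
    json={
        "job_id": JOB_ID,
        "new_settings": {
            "schedule": {
                "quartz_cron_expression": "0 0 * * * ?",
                "timezone_id": "UTC",
                "pause_status": "PAUSED",
            }
        },
    },
)
resp.raise_for_status()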

3 More Replies
User16783853906
by Contributor III
  • 2148 Views
  • 1 reply
  • 1 kudos

Understanding file retention with Vacuum

I have seen a few instances where users reported that they run OPTIMIZE on the past week's worth of data and follow with VACUUM with RETAIN of 168 HOURS (for example), yet the old files aren't being deleted: "VACUUM is not removing old files from the tab...

Latest Reply
Priyanka_Biswas
Databricks Employee

Hello @Venkatesh Kottapalli, VACUUM removes all files from the table directory that are not managed by Delta, as well as data files that are no longer in the latest state of the transaction log for the table and are older than a retention threshold. ...
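To make that retention behavior concrete, a minimal sketch (the table name is a placeholder):

# OPTIMIZE rewrites data into compacted files; the originals become unreferenced.
spark.sql("OPTIMIZE my_schema.events")

# VACUUM only deletes files that are BOTH unreferenced by the current table
# state AND older than the retention threshold, so files written within the
# last 168 hours survive even if they are no longer referenced.
spark.sql("VACUUM my_schema.events RETAIN 168 HOURS")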

User16826992666
by Valued Contributor
  • 1345 Views
  • 1 reply
  • 3 kudos

When developing Delta Live Tables, is there a way to see the query history?

I am not sure where I can look currently to see how my DLT queries are performing. How can I investigate the query plan for past DLT runs?

Latest Reply
Priyanka_Biswas
Databricks Employee

Hello @Trevor Bishop, you can check the query plan in the Spark UI, under the SQL tab. You would need to select the past run from the dropdown and click on Spark UI. Additionally, an event log is created and maintained for every Delta Live Tables pipeline. The event ...
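As a hedged sketch of querying that event log (this assumes the pipeline was created with an explicit storage location; the path is a placeholder):

# The DLT event log is kept as a Delta table under the pipeline's storage location.
event_log = spark.read.format("delta").load(
    "dbfs:/pipelines/<pipeline-id>/system/events"  # placeholder path
)
display(event_log.orderBy("timestamp", ascending=False))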

databicky
by Contributor II
  • 1917 Views
  • 2 replies
  • 1 kudos

How to get the status of a notebook in a different notebook

I want to run two notebooks: if the count is not equal to zero, first I want to trigger the first notebook and check whether that particular notebook succeeded or not. Until it succeeds it needs to wait (like a sleep), and if it succeeded then ...

Latest Reply
Hubert-Dudek
Esteemed Contributor III

You can use dbutils.notebook.run() to execute a notebook from another notebook if conditions are met in your custom logic; you can also use dbutils.jobs.taskValues to pass values between notebooks: https://docs.databricks.com/workflows/jobs/how-to-sha...
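A rough illustration of that pattern (the notebook paths and the record_count condition are hypothetical placeholders):

import time

def run_until_success(path, timeout_seconds=3600, retry_wait=60):
    # dbutils.notebook.run raises an exception when the child notebook fails,
    # so keep retrying (with a sleep) until it returns successfully.
    while True:
        try:
            return dbutils.notebook.run(path, timeout_seconds)
        except Exception:
            time.sleep(retry_wait)

if record_count != 0:                                  # hypothetical condition
    run_until_success("/Workspace/first_notebook")     # placeholder path
    dbutils.notebook.run("/Workspace/second_notebook", 3600)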

1 More Replies
Dipesh
by New Contributor II
  • 2140 Views
  • 1 reply
  • 1 kudos

Resolved! Bulk updating Delta tables in Databricks

Hi All, I have some data in a Delta table with multiple columns, and each record has a unique identifier. I want to update some columns as per the new values coming in for each of these unique records. However, updating one record at a time is taking a lot...

Latest Reply
Hubert-Dudek
Esteemed Contributor III

Yes, by using the MERGE statement.
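For concreteness, a minimal MERGE sketch (table and column names are placeholders): it applies a whole batch of updates keyed on the unique identifier in a single statement instead of updating one record at a time.

spark.sql("""
    MERGE INTO target t
    USING updates u
    ON t.id = u.id
    WHEN MATCHED THEN UPDATE SET t.col1 = u.col1, t.col2 = u.col2
    WHEN NOT MATCHED THEN INSERT *
""")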

venkat09
by New Contributor III
  • 1196 Views
  • 1 reply
  • 1 kudos

Schema Evolution - Auto Loader for Avro format is not working as expected

  • Reading Avro files from S3 and then writing to the Delta table
  • Ingested sample data of 10 files, which contain four columns, and it infers the schema automatically as expected
  • Introducing a new file which contains a new column [foo] along wi...

Latest Reply
venkat09
New Contributor III

I am attaching the sample code notebook that helps to reproduce the issue.
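Since the notebook attachment is not reproduced here, a hedged sketch of the setup described in the question (all paths and the table name are placeholders):

# Auto Loader reading Avro from S3 with schema evolution enabled; with
# addNewColumns the stream stops on a new column and picks it up on restart.
(spark.readStream.format("cloudFiles")
    .option("cloudFiles.format", "avro")
    .option("cloudFiles.schemaLocation", "s3://bucket/_schemas/")
    .option("cloudFiles.schemaEvolutionMode", "addNewColumns")
    .load("s3://bucket/avro-input/")
 .writeStream
    .option("checkpointLocation", "s3://bucket/_checkpoints/")
    .option("mergeSchema", "true")   # let the Delta sink add the new column
    .toTable("target_table"))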

KuldeepChitraka
by New Contributor III
  • 1637 Views
  • 3 replies
  • 6 kudos

Performance issue: Create DELTA table from 2 TB PARQUET file

We are trying to create a DELTA table (CTAS statement) from a 2 TB PARQUET file, and it's taking a huge amount of time, around 12 hrs. Is that normal? What are the options to tune/optimize this? Are we doing anything wrong? Cluster: Interactive / 30 cores / 320 GB ...

Latest Reply
Hubert-Dudek
Esteemed Contributor III

Please use COPY INTO (first create an empty Delta table) or CONVERT TO DELTA instead of CTAS; it will be much faster, and the process will be auto-optimized.
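A hedged sketch of both suggestions (paths, table name, and schema are placeholders):

# Option 1: CONVERT TO DELTA writes a transaction log over the existing
# Parquet files in place, so the 2 TB of data is not rewritten.
spark.sql("CONVERT TO DELTA parquet.`s3://bucket/path/to/parquet`")

# Option 2: create an empty Delta table, then load it with COPY INTO.
spark.sql("CREATE TABLE my_table (id BIGINT, value STRING) USING DELTA")  # placeholder schema
spark.sql("""
    COPY INTO my_table
    FROM 's3://bucket/path/to/parquet'
    FILEFORMAT = PARQUET
""")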

2 More Replies
mriccardi
by New Contributor II
  • 3259 Views
  • 1 reply
  • 0 kudos

Structured Streaming Checkpoint corrupted.

Hello, we are experiencing an error with one Structured Streaming job: the checkpoint gets corrupted and we are unable to continue with the execution. I've checked the errors, and this happens when it triggers an autocompact,...

Latest Reply
jose_gonzalez
Databricks Employee

Hi @Martin Riccardi, could you please share the following:
1) What's your source?
2) What's your sink?
3) Could you share your readStream() and writeStream() code?
4) The full error stack trace
5) Did you stop and re-run your query after weeks of not being acti...

Sameer_876675
by New Contributor III
  • 4893 Views
  • 3 replies
  • 2 kudos

How to efficiently process a 100 GiB nested JSON file and store it in Delta?

Hi, I'm a fairly new user and I am using Azure Databricks to process a ~1000 GiB nested JSON file containing insurance policy data. I uploaded the JSON file to Azure Data Lake Gen2 storage and read it into a dataframe: df=spark.read.option("...

Attachments: Cluster Summary, OOM Error
Latest Reply
Annapurna_Hiriy
Databricks Employee

Hi Sameer, please refer to the following documents on how to work with nested JSON: https://docs.databricks.com/optimizations/semi-structured.html and https://learn.microsoft.com/en-us/azure/databricks/kb/_static/notebooks/scala/nested-json-to-dataframe.html
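In the spirit of those docs, a hedged sketch of flattening nested JSON before writing to Delta (the path and field names are invented for illustration):

from pyspark.sql.functions import col, explode

# multiLine is needed when the whole file is one large JSON document.
df = spark.read.option("multiLine", True).json(
    "abfss://container@account.dfs.core.windows.net/policies.json"  # placeholder
)

# Pull nested fields up with dot paths and fan out arrays with explode().
flat = df.select(
    col("policy.id").alias("policy_id"),                 # hypothetical fields
    explode(col("policy.coverages")).alias("coverage"),
)
flat.write.format("delta").mode("overwrite").saveAsTable("policies_flat")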

2 More Replies
pramalin
by New Contributor
  • 3027 Views
  • 3 replies
  • 2 kudos
Latest Reply
shan_chandra
Databricks Employee

@prudhvi ramalingam - Please refer to the below example code.

import org.apache.spark.sql.functions.expr
val person = Seq(
  (0, "Bill Chambers", 0, Seq(100)),
  (1, "Matei Zaharia", 1, Seq(500, 250, 100)),
  (2, "Michael Armbrust", 1, Seq(250, ...
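Since the snippet is cut off above, here is a hedged Python equivalent completed so it runs end to end (the values after the truncation point are illustrative):

from pyspark.sql.functions import explode

person = spark.createDataFrame(
    [(0, "Bill Chambers", 0, [100]),
     (1, "Matei Zaharia", 1, [500, 250, 100]),
     (2, "Michael Armbrust", 1, [250, 100])],   # completion is illustrative
    ["id", "name", "graduate_program", "spark_status"],
)

# explode() turns each array element into its own row.
person.select("name", explode("spark_status").alias("status")).show()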

2 More Replies
KVNARK
by Honored Contributor II
  • 1550 Views
  • 2 replies
  • 2 kudos

Encrypt in azure SQL DB and decrypt in Power BI

Some columns are encrypted in Azure SQL DB, and I need to decrypt them in Power BI. Are there any prerequisites to implement this?

Latest Reply
Nhan_Nguyen
Valued Contributor

Could you describe your case in more detail?

1 More Replies
LidorAbo
by New Contributor II
  • 2113 Views
  • 1 reply
  • 0 kudos

Databricks can write to S3 bucket through pandas but not from Spark

Hey, I have a problem with access to an S3 bucket using cross-account bucket permissions; I got the following error. Steps to reproduce: Checking the role that is associated with the EC2 instance: { "Version": "2012-10-17", "Statement": [ { ...

Attachment: Access_Denied_S3_Bucket
Latest Reply
Nhan_Nguyen
Valued Contributor

Could you try to map the S3 bucket location with the Databricks File System (DBFS), then write the output to this new location instead of writing directly to the S3 location?
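A hedged sketch of that suggestion (the bucket name and mount point are placeholders; mounting typically relies on an instance profile or keys supplied via extra configs):

# Mount the bucket under DBFS once, e.g. from a setup notebook.
dbutils.fs.mount("s3a://my-cross-account-bucket", "/mnt/shared-bucket")

# Then write through the mount instead of the raw S3 URI.
(df.write                      # df is an existing DataFrame (placeholder)
   .format("delta")
   .mode("append")
   .save("/mnt/shared-bucket/output/"))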

sedat
by New Contributor II
  • 2004 Views
  • 2 replies
  • 2 kudos

Is there any documentation for Databricks about performance tuning and reporting?

Hi, I need to analyse performance issues for Databricks. Is there any document or monitoring tool I can run to see what is happening in Databricks? I am very new to Databricks. Best

Latest Reply
Nhan_Nguyen
Valued Contributor

You could try some courses at https://customer-academy.databricks.com/: "What's New In Apache Spark 3.0" and "Optimizing Apache Spark on Databricks".

1 More Replies

Connect with Databricks Users in Your Area

Join a Regional User Group to connect with local Databricks users. Events will be happening in your city, and you won’t want to miss the chance to attend and share knowledge.

If there isn’t a group near you, start one and help create a community that brings people together.
