Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

Asterol
by New Contributor III
  • 3966 Views
  • 4 replies
  • 5 kudos

Data Engineer Associate and Professional title holders count

How many people currently hold the certified Databricks Data Engineer Associate/Professional titles? Is there any place I can check the global certificate count?

Latest Reply
sher
Valued Contributor II
  • 5 kudos

check here: https://credentials.databricks.com/collection/da21363e-5c7d-410a-b144-dd07d3e22942?_ga=2.163643839.1823848454.1674389186-2106443313.1667211405&_gac=1.49521364.1672812437.CjwKCAiAwc-dBhA7EiwAxPRylBN9S-JeQ8779ec3GXJYBQPfnu_qkv5l_MKO1u4jw2w-...

3 More Replies
Ogi
by New Contributor II
  • 3986 Views
  • 3 replies
  • 1 kudos

Resolved! Azure CosmosDB change feed ingestion via DLT

Is there a way to ingest Azure CosmosDB data via Delta Live Tables? If I use regular workflows it works well, but with DLT I'm not able to set the CosmosDB connector on a cluster.

Latest Reply
Ogi
New Contributor II
  • 1 kudos

Thanks a lot! Just wanted to double-check whether this natively exists.

2 More Replies
andrew0117
by Contributor
  • 3414 Views
  • 2 replies
  • 0 kudos

depth of view exceeds the maximum view resolution depth (100).

I got this error after updating a view. How can I increase the value of spark.sql.view.maxNestedViewDepth to work around this? Thanks!
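For reference, a minimal sketch of raising that limit for the current session; the default cap is 100 and the new value below is illustrative:

```python
# Sketch: raise the nested-view resolution limit for the current Spark
# session before querying the deeply nested view (value is illustrative).
spark.conf.set("spark.sql.view.maxNestedViewDepth", "200")
```

Flattening the view chain so it stays well under the limit is usually the more durable fix.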

Latest Reply
Debayan
Databricks Employee
  • 0 kudos

Hi, could you please confirm whether you are showing the view (https://docs.databricks.com/sql/language-manual/sql-ref-syntax-aux-show-views.html)? Also, it would be helpful if you could post a screenshot of the error.

1 More Replies
User16765131552
by Contributor III
  • 2144 Views
  • 1 reply
  • 3 kudos

Delta Sharing Costs

When Delta Sharing is enabled and a link is shared, I understand that the data transfer happens directly and not through the sharing server. I'm curious how costs are calculated. Is the entity making the share available charged for data egress and ...

Latest Reply
Databricks_love
New Contributor II
  • 3 kudos

Any news?

blackcoffeeAR
by Contributor
  • 5246 Views
  • 5 replies
  • 2 kudos

Cannot install com.microsoft.azure.kusto:kusto-spark

Hello, I'm trying to install/update the library com.microsoft.azure.kusto:kusto-spark_3.0_2.12:3.1.x. I tried installing it from the Maven central repository and using Terraform. It was working previously, and now the installation always ends with an error: │ Error: c...

Latest Reply
phisolani
New Contributor II
  • 2 kudos

I have the same problem with a slightly different version of the connector (a change in the minor version). I have a job that runs every hour, and this specifically started happening from the 23rd of January onwards. The error indeed says the same: Ru...

4 More Replies
Dipesh
by New Contributor II
  • 5964 Views
  • 4 replies
  • 2 kudos

Pausing a scheduled Azure Databricks job after failure

Hi All, I have a job/workflow scheduled in Databricks to run every hour. How can I configure my job to pause whenever a run fails (pause the job/workflow on the first failure)? I would want to prevent triggering multiple runs due to the scheduled/...

Latest Reply
Dipesh
New Contributor II
  • 2 kudos

Hi @Hubert Dudek​, thank you for your suggestion. I understand that we can use the Jobs API to change the pause_status of a job on errors, but sometimes we observed that the workflow/job fails due to cluster issues (while the job clusters are getting creat...
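For context, a hedged sketch of the Jobs API call discussed above; the host, token, job ID, and cron values are placeholders:

```python
# Sketch: set pause_status to PAUSED via the Jobs 2.1 update endpoint.
import requests

HOST = "https://<workspace-url>"    # placeholder
TOKEN = "<personal-access-token>"   # placeholder
JOB_ID = 123                        # placeholder

resp = requests.post(
    f"{HOST}/api/2.1/jobs/update",
    headers={"Authorization": f"Bearer {TOKEN}"},
    json={
        "job_id": JOB_ID,
        "new_settings": {
            "schedule": {
                "quartz_cron_expression": "0 0 * * * ?",  # keep your existing cron
                "timezone_id": "UTC",
                "pause_status": "PAUSED",
            }
        },
    },
)
resp.raise_for_status()
```

A final task in the workflow (or a failure notification hook) could make this call so the schedule pauses itself after the first failed run.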

3 More Replies
User16783853906
by Contributor III
  • 3105 Views
  • 1 reply
  • 1 kudos

Understanding file retention with Vacuum

I have seen a few instances where users reported that they ran OPTIMIZE on the past week's worth of data and followed it with VACUUM with RETAIN of 168 HOURS (for example), yet the old files aren't being deleted: "VACUUM is not removing old files from the tab...

Latest Reply
Priyanka_Biswas
Databricks Employee
  • 1 kudos

Hello @Venkatesh Kottapalli​, VACUUM removes all files from the table directory that are not managed by Delta, as well as data files that are no longer in the latest state of the transaction log for the table and are older than a retention threshold. ...
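To make the retention behaviour concrete, a minimal sketch (table name illustrative). Files rewritten by a recent OPTIMIZE become unreferenced immediately, but VACUUM only deletes unreferenced files older than the retention threshold, which is why a VACUUM run right after an OPTIMIZE appears to remove nothing:

```python
# Sketch: OPTIMIZE leaves the old files unreferenced; VACUUM deletes only
# unreferenced files OLDER than the threshold, so files rewritten within
# the last 168 hours survive this call.
spark.sql("OPTIMIZE my_table")
spark.sql("VACUUM my_table RETAIN 168 HOURS")
```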

User16826992666
by Valued Contributor
  • 1910 Views
  • 1 reply
  • 3 kudos

When developing Delta Live Tables, is there a way to see the query history?

I am not sure where I can look currently to see how my DLT queries are performing. How can I investigate the query plan for past DLT runs?

Latest Reply
Priyanka_Biswas
Databricks Employee
  • 3 kudos

Hello @Trevor Bishop​, you can check the query plan in the Spark UI, SQL tab. You would need to select the past run from the dropdown and click on Spark UI. Additionally, an event log is created and maintained for every Delta Live Tables pipeline. The event ...
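As a hedged sketch of querying that event log (the storage path below is a placeholder; the log lives under the pipeline's configured storage location):

```python
# Sketch: the DLT event log is a Delta table under <storage>/system/events.
events = spark.read.format("delta").load("/<pipeline-storage>/system/events")

(events
 .select("timestamp", "event_type", "message")
 .orderBy("timestamp", ascending=False)
 .show(truncate=False))
```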

databicky
by Contributor II
  • 2584 Views
  • 2 replies
  • 1 kudos

How to get the status of a notebook in a different notebook

I want to run two notebooks: if the count is not equal to zero, I first want to trigger the first notebook and check whether that particular notebook succeeded or not. Until it succeeds, it needs to wait (like a sleep); if it succeeded, then ...

Latest Reply
Hubert-Dudek
Esteemed Contributor III
  • 1 kudos

You can use dbutils.notebook.run() to execute a notebook from another notebook if conditions are met in your custom logic; you can also use dbutils.jobs.taskValues to pass values between notebooks: https://docs.databricks.com/workflows/jobs/how-to-sha...
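A minimal sketch of that pattern; the notebook paths, the count check, and the "success" return value are illustrative (the child notebook would report it via dbutils.notebook.exit):

```python
# Sketch: run the first notebook only when the count is non-zero, then
# trigger the second once it reports success. dbutils.notebook.run()
# blocks until the child finishes, so no explicit sleep loop is needed.
count = spark.table("my_source_table").count()  # illustrative condition

if count != 0:
    # The child notebook ends with dbutils.notebook.exit("success")
    result = dbutils.notebook.run("/path/to/first_notebook", 3600)  # 3600 = timeout (s)
    if result == "success":
        dbutils.notebook.run("/path/to/second_notebook", 3600)
```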

1 More Replies
Dipesh
by New Contributor II
  • 3006 Views
  • 1 reply
  • 1 kudos

Resolved! Bulk updating Delta tables in Databricks

Hi All, I have some data in a Delta table with multiple columns, and each record has a unique identifier. I want to update some columns as per the new values coming in for each of these unique records. However, updating one record at a time is taking a lot...

Latest Reply
Hubert-Dudek
Esteemed Contributor III
  • 1 kudos

Yes, by using the MERGE statement.
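A minimal MERGE sketch, assuming the incoming values arrive as an updates_df DataFrame keyed on the unique identifier; the table and column names are illustrative:

```python
# Sketch: bulk-update all matching rows in one pass instead of row by row.
from delta.tables import DeltaTable

target = DeltaTable.forName(spark, "my_delta_table")  # illustrative name

(target.alias("t")
 .merge(updates_df.alias("u"), "t.id = u.id")  # 'id' = the unique identifier
 .whenMatchedUpdateAll()
 .whenNotMatchedInsertAll()
 .execute())
```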

venkat09
by New Contributor III
  • 1690 Views
  • 1 reply
  • 1 kudos

Schema Evolution - Auto Loader for Avro format is not working as expected

* Reading Avro files from S3 and then writing to the Delta table
* Ingested sample data of 10 files, which contain four columns, and it infers the schema automatically as expected
* Introducing a new file which contains a new column [foo] along wi...
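For context, a hedged sketch of the Auto Loader setup being described; the bucket, schema, and checkpoint paths are placeholders:

```python
# Sketch: Auto Loader reading Avro from S3 with schema evolution. With
# addNewColumns, a new column like [foo] fails the stream once with an
# UnknownFieldException; on restart the stream picks up the evolved schema.
df = (spark.readStream.format("cloudFiles")
      .option("cloudFiles.format", "avro")
      .option("cloudFiles.schemaLocation", "s3://my-bucket/_schema/")  # placeholder
      .option("cloudFiles.schemaEvolutionMode", "addNewColumns")
      .load("s3://my-bucket/raw/"))                                    # placeholder

(df.writeStream
   .option("checkpointLocation", "s3://my-bucket/_checkpoint/")        # placeholder
   .option("mergeSchema", "true")
   .trigger(availableNow=True)
   .table("my_delta_table"))
```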

Latest Reply
venkat09
New Contributor III
  • 1 kudos

I am attaching the sample code notebook that helps to reproduce the issue.

KuldeepChitraka
by New Contributor III
  • 2380 Views
  • 3 replies
  • 6 kudos

Performance Issue: Create DELTA table from 2 TB PARQUET file

We are trying to create a DELTA table (CTAS statement) from a 2 TB PARQUET file and it's taking a huge amount of time, around ~12 hrs. Is this normal? What are the options to tune/optimize this? Are we doing anything wrong? Cluster: Interactive / 30 Cores / 320 GB ...

Latest Reply
Hubert-Dudek
Esteemed Contributor III
  • 6 kudos

Please use COPY INTO (first create an empty Delta table) or CONVERT TO DELTA instead of CTAS; it will be much faster, and the process will be auto-optimized.
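Illustrative sketches of both suggestions; the path and table names are placeholders:

```python
# Option 1: convert the Parquet directory in place (no data rewrite).
spark.sql("CONVERT TO DELTA parquet.`/mnt/data/events`")

# Option 2: create an empty Delta table first, then load it with COPY INTO.
spark.sql("""
    COPY INTO my_delta_table
    FROM '/mnt/data/events'
    FILEFORMAT = PARQUET
""")
```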

2 More Replies
mriccardi
by New Contributor II
  • 4449 Views
  • 1 reply
  • 0 kudos

Structured Streaming Checkpoint corrupted.

Hello, we are experiencing an error with one Structured Streaming job that we have: basically, the checkpoint gets corrupted and we are unable to continue with the execution. I've checked the errors, and this happens when it triggers an autocompact,...

Latest Reply
jose_gonzalez
Databricks Employee
  • 0 kudos

Hi @Martin Riccardi​, could you share the following, please:
1) What's your source?
2) What's your sink?
3) Could you share your readStream() and writeStream() code?
4) The full error stack trace
5) Did you stop and re-run your query after weeks of not being acti...

Sameer_876675
by New Contributor III
  • 6903 Views
  • 3 replies
  • 2 kudos

How to efficiently process a 100GiB JSON nested file and store it in Delta?

Hi, I'm a fairly new user and I am using Azure Databricks to process a ~1000GiB nested JSON file containing insurance policy data. I uploaded the JSON file to Azure Data Lake Gen2 storage and read it into a dataframe: df = spark.read.option("...

[Attached screenshots: cluster summary, OOM error]
Latest Reply
Annapurna_Hiriy
Databricks Employee
  • 2 kudos

Hi Sameer, please refer to the following documents on how to work with nested JSON:
https://docs.databricks.com/optimizations/semi-structured.html
https://learn.microsoft.com/en-us/azure/databricks/kb/_static/notebooks/scala/nested-json-to-dataframe.html
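A small sketch of the dot-path approach those docs describe; the storage path and nested field names are invented for illustration:

```python
# Sketch: project nested JSON fields with dot paths, then land in Delta.
from pyspark.sql.functions import col

df = spark.read.json("abfss://<container>@<account>.dfs.core.windows.net/policies/")

flat = df.select(
    col("policy_id"),
    col("holder.name").alias("holder_name"),          # invented nested fields
    col("holder.address.city").alias("holder_city"),
)

flat.write.format("delta").mode("overwrite").saveAsTable("policies_bronze")
```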

2 More Replies
pramalin
by New Contributor
  • 4105 Views
  • 3 replies
  • 2 kudos
Latest Reply
shan_chandra
Databricks Employee
  • 2 kudos

@prudhvi ramalingam​ - Please refer to the below example code.

```scala
import org.apache.spark.sql.functions.expr

val person = Seq(
  (0, "Bill Chambers", 0, Seq(100)),
  (1, "Matei Zaharia", 1, Seq(500, 250, 100)),
  (2, "Michael Armbrust", 1, Seq(250,...
```

2 More Replies
