Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

mai_luca
by Contributor
  • 1249 Views
  • 5 replies
  • 2 kudos

Resolved! Validation with views - DLT pipeline expectations

I have a question about how expectations work when applied to views inside a Delta Live Tables (DLT) pipeline. For instance, suppose we define this view inside a pipeline to stop the pipeline if we spot some duplicates: @dlt.view( name=view_name, ...

Latest Reply
Yogesh_Verma_
Contributor II

In DLT, expectations defined with dlt.expect_or_fail() on views are only evaluated if the view is used downstream by a materialized table. Since views are logical and lazily evaluated, if no table depends on the view, the expectation is skipped and t...
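A minimal sketch of that behavior, using hypothetical dataset names (orders_raw, orders_dedup_check, orders_dedup_gate): the expect_or_fail on the view only takes effect once a downstream table materializes it.

```python
import dlt
from pyspark.sql import functions as F

# Hypothetical view: one row per order_id with its occurrence count.
# The expectation fails the update if any order_id appears more than once,
# but only when a downstream table actually consumes this view.
@dlt.view(name="orders_dedup_check")
@dlt.expect_or_fail("no_duplicate_order_ids", "cnt = 1")
def orders_dedup_check():
    return (
        spark.read.table("orders_raw")      # hypothetical source table
        .groupBy("order_id")
        .agg(F.count("*").alias("cnt"))
    )

# Materializing a table from the view forces the expectation to be evaluated.
@dlt.table(name="orders_dedup_gate")
def orders_dedup_gate():
    return dlt.read("orders_dedup_check")
```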

4 More Replies
bgerhardi
by New Contributor III
  • 15641 Views
  • 13 replies
  • 13 kudos

Surrogate Keys with Delta Live

We are considering moving to Delta Live Tables from a traditional SQL-based data warehouse. What worries me is this FAQ on identity columns (Delta Live Tables frequently asked questions | Databricks on AWS), which seems to suggest that we basically can't cre...
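For what it's worth, identity-based surrogate keys can be declared in a DLT table's schema DDL; a minimal sketch with hypothetical names, noting that identity columns are generally not supported on tables that are the target of APPLY CHANGES INTO.

```python
import dlt

# Hypothetical dimension table with an identity-backed surrogate key declared
# in the schema DDL. The identity column is populated automatically on insert.
@dlt.table(
    name="dim_customer",
    schema="""
        customer_sk BIGINT GENERATED ALWAYS AS IDENTITY,
        customer_id STRING,
        customer_name STRING
    """,
)
def dim_customer():
    return (
        spark.readStream.table("customers_raw")   # hypothetical source
        .select("customer_id", "customer_name")
    )
```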

Latest Reply
tmaund1704
New Contributor II

Hi, is there any resolution for the above? Thanks

12 More Replies
sahil_s_jain
by New Contributor III
  • 1174 Views
  • 3 replies
  • 0 kudos

How to Exclude or Overwrite Specific JARs in Databricks

Spark Version in Databricks 15.5 LTS: The runtime includes Apache Spark 3.5.x, which defines the SparkListenerApplicationEnd constructor as: public SparkListenerApplicationEnd(long time). This constructor takes a single long parameter. Conflicting Spark ...
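As a diagnostic (not a fix), one way to confirm which jar wins on the classpath and which constructor signatures are actually visible at runtime is to reflect on the class from a notebook; a hedged sketch:

```python
# Check which jar provides SparkListenerApplicationEnd and which constructors
# it exposes, to confirm whether a bundled Spark jar is shadowing the runtime's.
jvm = spark.sparkContext._jvm
cls = jvm.java.lang.Class.forName("org.apache.spark.scheduler.SparkListenerApplicationEnd")

# Where the winning copy of the class was loaded from (may be None for
# classes served by the boot classloader).
code_source = cls.getProtectionDomain().getCodeSource()
print(code_source.getLocation() if code_source else "no code source reported")

# Constructor signatures visible at runtime.
for ctor in cls.getConstructors():
    print(ctor.toGenericString())
```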

Latest Reply
baljeetyadav_23
New Contributor II

Hi Alberto_Umana, do we have a fix for this issue in 16.4 LTS?

2 More Replies
pooja_bhumandla
by New Contributor III
  • 1440 Views
  • 3 replies
  • 0 kudos

Data file size

"numRemovedFiles": "2099","numRemovedBytes": "29658974681","p25FileSize": "29701688","numDeletionVectorsRemoved": "0","minFileSize": "19920357","numAddedFiles": "883","maxFileSize": "43475356","p75FileSize": "34394580","p50FileSize": "31978037","numA...

Latest Reply
pooja_bhumandla
New Contributor III

What are the criteria based on which max and min file sizes vary from the target file size?
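For anyone comparing these numbers against the target, the per-OPTIMIZE metrics (minFileSize, maxFileSize, the pXXFileSize percentiles, and so on) can be read back from the table history; a minimal sketch with a hypothetical table name:

```python
from pyspark.sql import functions as F

# Pull operationMetrics for each OPTIMIZE run from the Delta table history.
history = spark.sql("DESCRIBE HISTORY my_catalog.my_schema.my_table")
(history
    .filter(F.col("operation") == "OPTIMIZE")
    .select("version", "timestamp", "operationMetrics")
    .show(truncate=False))
```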

2 More Replies
Alex79
by New Contributor II
  • 1340 Views
  • 2 replies
  • 0 kudos

Get Job Run output through REST API call

I have a simple notebook reading a dataframe as input and returning another dataframe, which is as follows:
from pyspark.sql import SparkSession
import pandas as pd, json
spark = SparkSession.builder \
    .appName("Pandas to Spark DataFrame Conversion")...

Latest Reply
Vidhi_Khaitan
Databricks Employee

Hi team, {"error_code": "INVALID_PARAMETER_VALUE", "message": "Retrieving the output of runs with multiple tasks is not supported..."} means the job you're triggering (job_id = 'my_job_id') is a multi-task job (even if it has only one task). In such cas...
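A minimal sketch of that two-step flow against the Jobs API (host, token, and run id are placeholders): fetch the parent run to find the per-task run_id, then request the output for that task run.

```python
import requests

HOST = "https://<workspace-host>"          # placeholder
TOKEN = "<personal-access-token>"          # placeholder
headers = {"Authorization": f"Bearer {TOKEN}"}

# 1) Get the parent run and locate the task whose output you need.
run = requests.get(f"{HOST}/api/2.1/jobs/runs/get",
                   headers=headers, params={"run_id": 1234567890}).json()
task_run_id = run["tasks"][0]["run_id"]    # assumes the first task is the one of interest

# 2) Fetch that task run's output (e.g. the value passed to dbutils.notebook.exit).
output = requests.get(f"{HOST}/api/2.1/jobs/runs/get-output",
                      headers=headers, params={"run_id": task_run_id}).json()
print(output.get("notebook_output", {}).get("result"))
```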

1 More Replies
cool_cool_cool
by New Contributor II
  • 2604 Views
  • 3 replies
  • 0 kudos

Databricks Workflow is stuck on the first task and doesn't do any workload

Heya, I have a workflow in Databricks with 2 tasks. They are configured to run on the same job cluster, and the second task depends on the first. I have a weird behavior that happened twice now - the job takes a long time (it usually finishes within 30...

Latest Reply
Sri_M
New Contributor II

@cool_cool_cool I am facing the same issue as well. Is this issue resolved for you? If yes, can you please let me know what action you have taken?

2 More Replies
lorenz
by New Contributor III
  • 13355 Views
  • 8 replies
  • 3 kudos

Resolved! Databricks approaches to CDC

I'm interested in learning more about Change Data Capture (CDC) approaches with Databricks. Can anyone provide insights on the best practices and recommendations for utilizing CDC effectively in Databricks? Are there any specific connectors or tools ...

Latest Reply
Deekay
New Contributor II

Hi @jcozar, thank you so much for your response. I have some queries; it would be really helpful if you could share your thoughts. How are you segregating the tables from raw to bronze? Suppose Debezium is capturing CDCs from 100 tables, all changes are ...
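For reference, the DLT side of a Debezium-style feed usually reduces to one APPLY CHANGES flow per source table; a minimal sketch with hypothetical names and an assumed event layout (an op flag plus an event_ts ordering column):

```python
import dlt
from pyspark.sql import functions as F

# Hypothetical raw change feed for one source table (e.g. Debezium events landed in storage).
@dlt.view(name="customers_changes")
def customers_changes():
    return spark.readStream.table("raw.customers_cdc")

# Target streaming table that DLT keeps in sync from the change feed.
dlt.create_streaming_table("customers_bronze")

dlt.apply_changes(
    target="customers_bronze",
    source="customers_changes",
    keys=["customer_id"],                  # primary key in the source system
    sequence_by=F.col("event_ts"),         # ordering column in the CDC payload (assumed)
    apply_as_deletes=F.expr("op = 'd'"),   # Debezium-style delete marker (assumed)
)
```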

7 More Replies
lezwon
by Contributor
  • 1071 Views
  • 2 replies
  • 3 kudos

Resolved! Install custom wheel from DBFS in serverless environment

Hey folks, I have a job that runs on serverless compute. I have also created a wheel file with custom functions, which I require in this job. I see that from here, we cannot install libraries for a task and must use notebook-scoped libraries. So wha...

Latest Reply
loui_wentzel
Contributor

Is your DBFS mounted? Otherwise, try uploading it to your workspace's "Shared" folder - this is a common place to put these sorts of files. DBFS is slowly being phased out and isn't really part of any best practices.
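A minimal sketch of that route for a serverless job notebook (path, wheel, and module names are hypothetical): copy the wheel into workspace files and install it notebook-scoped.

```python
# Cell 1 of the job notebook: notebook-scoped install from workspace files.
%pip install /Workspace/Shared/libs/my_helpers-0.1.0-py3-none-any.whl

# Cell 2: the custom package is now importable for the rest of the run.
from my_helpers import build_features      # hypothetical module/function
df = build_features(spark.read.table("raw.events"))
```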

1 More Replies
pooja_bhumandla
by New Contributor III
  • 753 Views
  • 3 replies
  • 0 kudos

Auto tuning of file size

Why are maxFileSize and minFileSize different from targetFileSize after optimization? What is the significance of targetFileSize? "numRemovedFiles": "2099", "numRemovedBytes": "29658974681", "p25FileSize": "29701688", "numDeletionVectorsRemoved": "0", "m...

Latest Reply
loui_wentzel
Contributor

There could be several different reasons, but mainly it's because grouping arbitrary data into some target file size is, well... arbitrary. Imagine I gave you a large container of sand and some empty buckets, and asked you to move the sand from the co...
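Concretely, targetFileSize is only the goal the optimizer bins toward, not a guarantee; a minimal sketch (hypothetical table name) of pinning it explicitly instead of relying on auto-tuning, then re-running OPTIMIZE:

```python
# Pin an explicit target file size, then re-optimize. Output files will still
# land above and below the target, since data can't be split arbitrarily.
spark.sql("""
    ALTER TABLE my_catalog.my_schema.my_table
    SET TBLPROPERTIES ('delta.targetFileSize' = '32mb')
""")
spark.sql("OPTIMIZE my_catalog.my_schema.my_table")
```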

2 More Replies
SreedharVengala
by New Contributor III
  • 30361 Views
  • 11 replies
  • 7 kudos

PGP Encryption / Decryption in Databricks

Is there a way to decrypt / encrypt blob files in Databricks using a key stored in Key Vault? What libraries need to be used? Any code snippets? Links?
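One pattern that comes up often (a hedged sketch, not an official recipe): keep the armored private key and passphrase in a Key Vault-backed secret scope and decrypt with the python-gnupg package. Scope, key, and file paths below are hypothetical, and both the gpg binary and python-gnupg must be available on the cluster.

```python
import gnupg  # python-gnupg, e.g. installed via %pip install python-gnupg

# Pull the PGP private key and passphrase from a Key Vault-backed secret scope.
private_key = dbutils.secrets.get(scope="kv-scope", key="pgp-private-key")
passphrase = dbutils.secrets.get(scope="kv-scope", key="pgp-passphrase")

gpg = gnupg.GPG()
gpg.import_keys(private_key)

# Decrypt a blob file exposed through a DBFS mount (path is hypothetical).
with open("/dbfs/mnt/landing/data.csv.pgp", "rb") as encrypted:
    result = gpg.decrypt_file(encrypted, passphrase=passphrase,
                              output="/dbfs/mnt/landing/data.csv")
print(result.ok, result.status)
```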

Latest Reply
Junpei_Liang
New Contributor II

Does anyone have an update on this?

10 More Replies
Ramki
by New Contributor
  • 361 Views
  • 1 reply
  • 0 kudos

Lakeflow clarification

Are there options to modify the streaming table after it has been created by the Lakeflow pipeline? In the use case I'm trying to solve, I need to add delta.enableIcebergCompatV2 and delta.universalFormat.enabledFormats to the target streaming table....

Latest Reply
lingareddy_Alva
Honored Contributor III

Hi @Ramki, yes, you can modify a streaming table created by a Lakeflow pipeline, especially when the pipeline is in triggered mode (not running continuously). In your case, you want to add the following Delta table properties: TBLPROPERTIES ( 'delta....
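A minimal sketch of that route (table name hypothetical), run from a regular notebook or the SQL editor while the triggered pipeline is idle; the properties may also need to be declared on the pipeline's table definition so a full refresh doesn't drop them.

```python
# Set the Iceberg/UniForm properties on the pipeline-managed streaming table.
spark.sql("""
    ALTER TABLE my_catalog.my_schema.events_streaming SET TBLPROPERTIES (
        'delta.enableIcebergCompatV2' = 'true',
        'delta.universalFormat.enabledFormats' = 'iceberg'
    )
""")
```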

michelleliu
by New Contributor III
  • 1909 Views
  • 3 replies
  • 2 kudos

Resolved! DLT Performance Issue

I've been seeing patterns in DLT process time in all my pipelines, as in the attached screenshot. Each data point is an "update" that's set to "continuous". The process time keeps increasing up to a point and then drops back to where it should be. This w...

Latest Reply
lingareddy_Alva
Honored Contributor III

Hi @michelleliu, this sawtooth pattern in DLT processing times is actually quite common and typically indicates one of several underlying issues. Here are the most likely causes and solutions. Common Causes: 1. Memory Pressure & Garbage Collection: Process...

2 More Replies
alau131
by New Contributor
  • 878 Views
  • 2 replies
  • 2 kudos

How to dynamically have the parent notebook call a child notebook?

Hi! I would like some help on how to dynamically call one notebook from another in Databricks and have the parent notebook get the dataframe results from the child notebook. Some background info is that I have a main Python notebook and multiple SQ...

Latest Reply
jameshughes
Contributor II

What you are looking to do is really not the intent of notebooks, and you cannot pass complex data types between notebooks. You would need to persist your data frame from the child notebook so your parent notebook could retrieve the results after the ...
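A minimal sketch of that persist-and-hand-back pattern (notebook path, table name, and arguments are hypothetical): the child writes its result and exits with the table name, and the parent reads it back.

```python
# --- child notebook (e.g. /Shared/child_sql_runner), shown as comments ---
# result_df = spark.sql("SELECT ...")                       # build the result
# result_df.write.mode("overwrite").saveAsTable("tmp.child_result")
# dbutils.notebook.exit("tmp.child_result")                 # hand the location back

# --- parent notebook ---
# Run the dynamically chosen child, then load the persisted result.
table_name = dbutils.notebook.run("/Shared/child_sql_runner", 600, {"run_date": "2024-01-01"})
result_df = spark.read.table(table_name)
display(result_df)
```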

1 More Replies
Abel_Martinez
by Contributor
  • 20569 Views
  • 10 replies
  • 10 kudos

Resolved! Why am I getting a connection timeout when connecting to MongoDB using MongoDB Connector for Spark 10.x from Databricks

I'm able to connect to MongoDB using org.mongodb.spark:mongo-spark-connector_2.12:3.0.2 and this code: df = spark.read.format("com.mongodb.spark.sql.DefaultSource").option("uri", jdbcUrl). It works well, but if I install the latest MongoDB Spark Connector ve...

Latest Reply
ravisharma1024
New Contributor II

I was facing the same issue; it is now resolved, thanks to @Abel_Martinez. I am using code like the below: df = spark.read.format("mongodb") \ .option('spark.mongodb.read.connection.uri', "mongodb+srv://*****:*****@******/?retryWrites=true&w=majori...
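Expanded slightly, the working 10.x pattern looks like the sketch below (URI, database, and collection are placeholders; the 10.x connector must be installed on the cluster):

```python
# Connector 10.x uses the short format name "mongodb" and spark.mongodb.read.* options.
df = (spark.read.format("mongodb")
      .option("spark.mongodb.read.connection.uri",
              "mongodb+srv://<user>:<password>@<cluster-host>/?retryWrites=true&w=majority")
      .option("spark.mongodb.read.database", "my_db")            # placeholder
      .option("spark.mongodb.read.collection", "my_collection")  # placeholder
      .load())
df.printSchema()
```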

9 More Replies
vanverne
by New Contributor II
  • 2286 Views
  • 3 replies
  • 1 kudos

Assistance with Capturing Auto-Generated IDs in Databricks SQL

Hello, I am currently working on a project where I need to insert multiple rows into a table and capture the auto-generated IDs for each row. I am using the Databricks SQL connector. Here is a simplified version of my current workflow: I create a temporary...

Latest Reply
vanverne
New Contributor II

Thanks for the reply, Alfonso. I noticed you mentioned "Below are a few alternatives...", but I am not seeing those. Please let me know if I am missing something. Also, do you know if Databricks is working on supporting the RETURNING clause soon...
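Since INSERT ... RETURNING isn't available, one hedged workaround (a sketch only, with hypothetical table and columns, assuming a recent databricks-sql-connector with named parameter support) is to tag each batch with a client-generated batch_id and select the identity values back by that tag:

```python
import uuid
from databricks import sql  # databricks-sql-connector

# Connection details are placeholders.
connection = sql.connect(server_hostname="<workspace-host>",
                         http_path="<warehouse-http-path>",
                         access_token="<token>")

batch_id = str(uuid.uuid4())
names = ["alice", "bob"]

with connection.cursor() as cursor:
    # Hypothetical table: id BIGINT GENERATED ALWAYS AS IDENTITY, name STRING, batch_id STRING
    for name in names:
        cursor.execute(
            "INSERT INTO my_schema.people (name, batch_id) VALUES (:name, :batch_id)",
            {"name": name, "batch_id": batch_id},
        )
    # Read back the auto-generated ids for exactly this batch.
    cursor.execute(
        "SELECT id, name FROM my_schema.people WHERE batch_id = :batch_id",
        {"batch_id": batch_id},
    )
    generated = cursor.fetchall()

print(generated)
```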

2 More Replies
