Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

chitrar
by New Contributor III
  • 779 Views
  • 9 replies
  • 4 kudos

workflow/lakeflow -why does it not capture all the metadata of the jobs/tasks

Hi, I see that with Unity Catalog we have the workflow and now the lakeflow schema. I guess the intention is to capture audit logs of changes and to monitor runs, but I wonder why we don't have all the metadata info on the jobs/tasks too for a given job ...

Latest Reply
chitrar
New Contributor III
  • 4 kudos

@Sujitha so, we can expect these enhancements in the "near" future?

8 More Replies
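As a hedged illustration of the thread above: the job and task metadata that does exist today can be pulled from the system tables. The table and column names below (system.lakeflow.jobs, system.lakeflow.job_tasks, workspace_id, job_id, task_key) are assumptions based on the public system-tables docs; verify them in your own workspace before relying on the query.

```python
# Sketch: build a query over the lakeflow system tables; table and column
# names are assumptions -- check them against your workspace's system catalog.
def lakeflow_jobs_query(job_name_filter: str) -> str:
    """Join job metadata with its task definitions, filtered by job name."""
    return (
        "SELECT j.job_id, j.name, t.task_key\n"
        "FROM system.lakeflow.jobs AS j\n"
        "JOIN system.lakeflow.job_tasks AS t\n"
        "  ON j.job_id = t.job_id AND j.workspace_id = t.workspace_id\n"
        f"WHERE j.name LIKE '%{job_name_filter}%'"
    )

query = lakeflow_jobs_query("nightly")
# On Databricks this would be executed as: spark.sql(query)
```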
antoniomf
by New Contributor
  • 332 Views
  • 0 replies
  • 0 kudos

Bug Delta Live Tables - Checkpoint

Hello, I've encountered an issue with Delta Live Table in both my Development and Production Workspaces. The data is arriving correctly in my Azure Storage Account; however, the checkpoint is being stored in the path dbfs:/. I haven't modified the St...

jeremy98
by Contributor III
  • 291 Views
  • 1 reply
  • 0 kudos

if else condition task doubt

Hi community, can the if/else condition task not be used as a real if condition? It seems that if the condition evaluates to False, the entire job stops. Is that the correct behaviour?

Latest Reply
jeremy98
Contributor III
  • 0 kudos

Hi, I found that the problem is here:

- task_key: get_email_infos
  max_retries: 3
  min_retry_interval_millis: 150000
  depends_on:
    - task_key: check_type_of_trigger
      outcome: "true"
...

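For readers hitting the same behaviour, here is a hedged sketch of the job YAML shape the reply points at (task names and notebook paths are illustrative): each branch depends on the condition task with an explicit outcome, so the non-matching branch is skipped rather than failing the whole job.

```yaml
# Illustrative sketch -- task keys, parameter names and paths are invented.
tasks:
  - task_key: check_type_of_trigger
    condition_task:
      op: EQUAL_TO
      left: "{{job.parameters.trigger_type}}"
      right: "scheduled"

  - task_key: get_email_infos          # runs only when the condition is true
    depends_on:
      - task_key: check_type_of_trigger
        outcome: "true"
    notebook_task:
      notebook_path: /Workspace/jobs/get_email_infos

  - task_key: handle_manual_run        # runs only when the condition is false
    depends_on:
      - task_key: check_type_of_trigger
        outcome: "false"
    notebook_task:
      notebook_path: /Workspace/jobs/handle_manual_run
```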
htu
by New Contributor III
  • 10517 Views
  • 19 replies
  • 23 kudos

Installing Databricks Connect breaks pyspark local cluster mode

Hi, it seems that when databricks-connect is installed, pyspark is modified at the same time so that it no longer works with a local master node. A local master has been especially useful in testing, when running unit tests for spark-related code without any remot...

Latest Reply
Martinitus
New Contributor III
  • 23 kudos

I agree with most of the comments above that the current approach of databricks-connect is not great (it sucks, to be frank). It's an issue that has been bugging me for more than two years now. By the way, I checked how this could be done with poetry and ...

18 More Replies
turagittech
by New Contributor III
  • 314 Views
  • 3 replies
  • 1 kudos

External Table refresh

Hi, I have a blob storage area in Azure where json files are being created. I can create an external table on the storage blob container, but when new files are added I don't see extra rows when I query the data. Is there a better approach to accessing th...

Latest Reply
Nivethan_Venkat
Contributor
  • 1 kudos

Hi @turagittech, the above error indicates that your table seems to be in DELTA format. Please check the table creation statement to see whether the table format is JSON or DELTA. PS: By default, if you are not specifying any format while creating the table on to...

2 More Replies
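A minimal sketch of the refresh step implied above, assuming the table really is JSON-backed: Spark caches the file listing of an external table, so new files typically only show up after a REFRESH TABLE. The `spark` argument here is any object with a `.sql` method, so the helper can be exercised without a cluster.

```python
def refresh_and_count(spark, table_name: str):
    """Re-list the external table's files, then count its rows."""
    spark.sql(f"REFRESH TABLE {table_name}")
    return spark.sql(f"SELECT COUNT(*) FROM {table_name}")

class _RecordingSpark:
    """Stand-in for a SparkSession, used only to show the call order."""
    def __init__(self):
        self.calls = []

    def sql(self, query):
        self.calls.append(query)
        return len(self.calls)

demo = _RecordingSpark()
refresh_and_count(demo, "ext_json_table")
```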
Walter_N
by New Contributor II
  • 284 Views
  • 2 replies
  • 0 kudos

Resolved! DLT pipeline task with full refresh once in a while

Hi all, I'm using Databricks workflow with some dlt pipeline tasks. These tasks require a full refresh from time to time due to schema changes in the source. I've been doing the full refresh manually or setting the full refresh option in the job settings, t...

Latest Reply
MariuszK
Contributor III
  • 0 kudos

Hi, did you check the possibility of using an if/else task? You could define some criteria and pass them from a notebook that checks whether it's time for a full refresh or just a regular refresh.

1 More Reply
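One hedged way to implement the if/else suggestion is to have the deciding task start the pipeline update itself through the REST API. The helper below only builds the request; the endpoint path and `full_refresh` payload follow the public Pipelines API, but verify them against current docs before use.

```python
def build_update_request(pipeline_id: str, full_refresh: bool):
    """Return (method, path, json_body) for starting a pipeline update.

    Path and payload shape mirror the Databricks Pipelines REST API as an
    assumption; a real caller would send this with its own auth headers,
    e.g. requests.post(host + path, headers=auth, json=body).
    """
    return (
        "POST",
        f"/api/2.0/pipelines/{pipeline_id}/updates",
        {"full_refresh": full_refresh},
    )

method, path, body = build_update_request("abc123", full_refresh=True)
```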
scorpusfx1
by New Contributor II
  • 327 Views
  • 4 replies
  • 0 kudos

Delta Live Table SCD2 performance issue

Hi Community, I am working on ingestion pipelines that take data from Parquet files (200 MB per day) and integrate them into my Lakehouse. This data is used to create an SCD Type 2 using apply_changes, with the row ID as the key and the file date as t...

Labels: Data Engineering, apply_change, dlt, SCD2
Latest Reply
Stefan-Koch
Valued Contributor II
  • 0 kudos

Hi @scorpusfx1, what kind of source data do you have? Are these parquet files daily full snapshots of source tables? If so, you should use apply_changes_from_snapshot, which is exactly built for this use case. https://docs.databricks.com/aws/en/dlt/py...

3 More Replies
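To make the suggested switch concrete, here is a plain-Python illustration (not DLT) of what merging daily full snapshots into an SCD2 table amounts to: unchanged keys keep their open record, while changed or new keys close the old record and open a new one. apply_changes_from_snapshot manages exactly this bookkeeping inside DLT.

```python
def scd2_merge(open_rows, snapshot, as_of):
    """Merge one full snapshot into SCD2 state.

    open_rows: {key: value} for currently-open records.
    snapshot:  {key: value} from today's full snapshot.
    Returns (closed, new_open): rows closed with an end date, plus the new
    open state. Deletes are ignored to keep the sketch short.
    """
    closed = []
    new_open = dict(open_rows)
    for key, value in snapshot.items():
        if open_rows.get(key) != value:
            if key in open_rows:
                closed.append((key, open_rows[key], as_of))  # close old version
            new_open[key] = value  # open new version
    return closed, new_open

closed, open_rows = scd2_merge({"a": 1, "b": 2}, {"a": 1, "b": 3}, "2025-01-02")
```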
soumiknow
by Contributor
  • 2103 Views
  • 16 replies
  • 1 kudos

Resolved! BQ partition data deleted fully even though 'spark.sql.sources.partitionOverwriteMode' is DYNAMIC

We have a date (DD/MM/YYYY) partitioned BQ table. We want to update a specific partition data in 'overwrite' mode using PySpark. So to do this, I applied 'spark.sql.sources.partitionOverwriteMode' to 'DYNAMIC' as per the spark bq connector documentat...

Latest Reply
VZLA
Databricks Employee
  • 1 kudos

@soumiknow , Just checking if there are any further questions, and did my last comment help?

15 More Replies
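As a plain-Python illustration of the setting under discussion (not the connector itself): with static overwrite mode the write replaces everything it targets, while dynamic mode only replaces the partitions present in the incoming data, which is why a silent fallback to static behaviour deletes the other partitions.

```python
def overwrite(existing, incoming, mode):
    """Model partition-overwrite semantics; dicts map partition key -> rows."""
    if mode == "STATIC":
        return dict(incoming)  # partitions absent from the write are dropped
    merged = dict(existing)
    merged.update(incoming)    # DYNAMIC: only incoming partitions are replaced
    return merged

existing = {"2025-01-01": ["a"], "2025-01-02": ["b"]}
incoming = {"2025-01-02": ["c"]}
```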
Livingstone
by New Contributor II
  • 1352 Views
  • 3 replies
  • 3 kudos

Install maven package to serverless cluster

My task is to export data from CSV/SQL into Excel format with minimal latency. To achieve this, I used a Serverless cluster.Since PySpark does not support saving in XLSX format, it is necessary to install the Maven package spark-excel_2.12. However, ...

Latest Reply
GalenSwint
New Contributor II
  • 3 kudos

I also have this question and wondered what the options were/are.

2 More Replies
analytics_eng
by New Contributor II
  • 1718 Views
  • 4 replies
  • 1 kudos

Connection reset by peer logging when importing custom package

Hi! I'm trying to import a custom package I published to Azure Artifacts, but I keep seeing the INFO logging below, which I don't want to display. The package was installed correctly on the cluster, and it imports successfully, but the log still appe...

Latest Reply
siklosib
New Contributor II
  • 1 kudos

What solved this problem for me is to remove the root logger configuration from the logging config and create another one within the loggers section. See below.

{
    'version': 1,
    'disable_existing_loggers': False,
    'formatters': {
        'simple...

3 More Replies
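A runnable sketch of the fix described in the reply, using only the standard library: the root logger is left out of the config entirely, and the noisy logger (py4j here, as an assumption about the source of the INFO lines) gets its own entry with a higher threshold, so its chatter is dropped without silencing your own logs.

```python
import logging
import logging.config

# Sketch of the reply's approach: no "root" key, per-logger entries instead.
# "py4j" and "myapp" are assumed names -- match them to your actual loggers.
LOGGING_CONFIG = {
    "version": 1,
    "disable_existing_loggers": False,
    "formatters": {"simple": {"format": "%(levelname)s %(name)s: %(message)s"}},
    "handlers": {
        "console": {"class": "logging.StreamHandler", "formatter": "simple"},
    },
    "loggers": {
        # Chatty library logger: only ERROR and above get through.
        "py4j": {"level": "ERROR", "handlers": ["console"], "propagate": False},
        # Your own code keeps INFO-level logging.
        "myapp": {"level": "INFO", "handlers": ["console"], "propagate": False},
    },
}

logging.config.dictConfig(LOGGING_CONFIG)
```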
nhuthao
by New Contributor II
  • 526 Views
  • 5 replies
  • 1 kudos

SQL is not enabled

Hi all, I have registered on Databricks successfully; however, SQL is not enabled. Please help me activate SQL. Thank you very much.

Latest Reply
Stefan-Koch
Valued Contributor II
  • 1 kudos

@nhuthao How did you solve it? What was the problem?

4 More Replies
cgrant
by Databricks Employee
  • 14200 Views
  • 3 replies
  • 4 kudos

What is the difference between OPTIMIZE and Auto Optimize?

I see that Delta Lake has an OPTIMIZE command and also table properties for Auto Optimize. What are the differences between these and when should I use one over the other?

Latest Reply
basit
New Contributor II
  • 4 kudos

Is this still a valid answer in 2025? https://docs.databricks.com/aws/en/delta/tune-file-size#auto-compaction-for-delta-lake-on-databricks

2 More Replies
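For reference, the split the linked docs draw can be sketched in Delta SQL (hedged: verify property names and syntax against the current documentation, and `events`/`event_date` are illustrative names): Auto Optimize is a pair of table properties applied during writes, while OPTIMIZE is a command you run or schedule yourself.

```sql
-- Table properties: optimized writes and auto compaction act at write time.
ALTER TABLE events SET TBLPROPERTIES (
  'delta.autoOptimize.optimizeWrite' = 'true',
  'delta.autoOptimize.autoCompact'   = 'true'
);

-- Explicit command: compacts existing small files on demand or on a schedule,
-- optionally co-locating data by commonly filtered columns.
OPTIMIZE events ZORDER BY (event_date);
```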
ShivangiB
by New Contributor III
  • 413 Views
  • 2 replies
  • 0 kudos

Resolved! Factors deciding between Z-ORDER, partitioning and liquid clustering

What are the factors on which we should choose the optimization approach?

Latest Reply
canadiandataguy
New Contributor III
  • 0 kudos

I have built a decision tree on how to think about it https://www.canadiandataguy.com/p/optimizing-delta-lake-tables-liquid?triedRedirect=true

1 More Reply
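A rough rule of thumb from the linked decision tree, sketched in Delta SQL (hedged: table and column names are illustrative, and syntax should be verified against current docs): liquid clustering covers most new-table cases that Z-ORDER and partitioning used to, partly because the clustering keys can be changed later without rewriting the table.

```sql
-- Liquid clustering: declared at creation time.
CREATE TABLE trips (trip_id BIGINT, pickup_date DATE, fare DOUBLE)
CLUSTER BY (pickup_date);

-- Clustering keys can evolve without recreating the table.
ALTER TABLE trips CLUSTER BY (pickup_date, fare);

-- OPTIMIZE incrementally clusters newly written data.
OPTIMIZE trips;
```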

Join Us as a Local Community Builder!

Passionate about hosting events and connecting people? Help us grow a vibrant local community—sign up today to get started!

Sign Up Now