Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

Alby091
by New Contributor
  • 2384 Views
  • 2 replies
  • 0 kudos

Multiple schedules in workflow with different parameters

I have a notebook that takes a file from the landing zone, processes it, and saves a Delta table. This notebook contains a parameter (time_prm) that lets you run this process for the different versions of files that arrive every day. Specifically, for eac...

Data Engineering
parameters
Workflows
Latest Reply
ImranA
Contributor
  • 0 kudos

You can do multiple schedules with a cron expression. If you are using a cron expression in a Databricks asset bundle YAML, the limitation is that you can't have one run at 0 past the hour and another at 25 past, i.e.: quartz_cron_expression: 0 45 9,23 ...

1 More Replies
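A common workaround for the "multiple schedules with different parameters" need is a single quartz cron that fires at the union of the desired times, with the notebook deriving time_prm from the hour it was triggered. A minimal pure-Python sketch of that dispatch logic; the hours and parameter labels are hypothetical, not from the original post:

```python
from datetime import datetime, timezone

# Hypothetical mapping from trigger hour to the time_prm value for that
# day's file version; adjust hours/labels to the real schedule.
HOUR_TO_TIME_PRM = {9: "morning", 23: "evening"}

def resolve_time_prm(now=None):
    """Pick the time_prm value based on the hour the job fired.

    A single quartz cron such as `0 45 9,23 * * ?` triggers the job at
    both hours; this helper then selects the matching parameter.
    """
    now = now or datetime.now(timezone.utc)
    try:
        return HOUR_TO_TIME_PRM[now.hour]
    except KeyError:
        raise ValueError(f"No time_prm configured for hour {now.hour}")

print(resolve_time_prm(datetime(2024, 1, 1, 9, 45, tzinfo=timezone.utc)))
```

Inside the actual notebook, the resolved value would replace the widget/parameter lookup for time_prm, so one job definition serves every daily file version.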
Spenyo
by New Contributor II
  • 1883 Views
  • 1 reply
  • 1 kudos

Delta table size not shrinking after Vacuum

Hi team. Every day we overwrite the last X months of data in the tables, so each day it generates a large amount of history. We don't use time travel, so we don't need it. What we did: SET spark.databricks.delta.retentionDurationCheck.enabled = false ALT...

Latest Reply
pabloaschieri
New Contributor II
  • 1 kudos

Hi, any update on this? Thanks

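One frequent cause of "table size not shrinking" is that VACUUM only deletes files that are both no longer referenced and older than the retention window. A pure-Python sketch of that eligibility rule, mirroring in spirit what `VACUUM ... RETAIN n HOURS` removes; the file list and timestamps are hypothetical:

```python
from datetime import datetime, timedelta, timezone

def vacuum_eligible(files, retain_hours, now=None):
    """Return the paths of files that are unreferenced by the current
    table version AND older than the retention cutoff -- the two
    conditions a VACUUM pass checks before deleting anything.

    `files` is a list of (path, last_modified, referenced) tuples.
    """
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(hours=retain_hours)
    return [path for path, modified, referenced in files
            if not referenced and modified < cutoff]

now = datetime(2024, 1, 10, tzinfo=timezone.utc)
files = [
    ("part-0001.parquet", now - timedelta(hours=200), False),  # old, unreferenced
    ("part-0002.parquet", now - timedelta(hours=1), False),    # too recent to remove
    ("part-0003.parquet", now - timedelta(hours=500), True),   # still referenced
]
print(vacuum_eligible(files, retain_hours=168, now=now))
```

So even with the retention check disabled and a short retention set, recently rewritten files stay on disk until they age past the cutoff, which is why the size drop lags the overwrite.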
vamsi_simbus
by Contributor
  • 1332 Views
  • 2 replies
  • 2 kudos

Resolved! Migrating Talend ETL Jobs to Databricks – Best Practices & Challenges

Hi All, I’m currently working on a Proof of Concept (POC) to migrate existing Talend ETL jobs to Databricks. The goal is to leverage Databricks for data processing and orchestration while moving away from Talend. I’d appreciate insights on the followin...

Data Engineering
migration
Talend
Latest Reply
vamsi_simbus
Contributor
  • 2 kudos

@AbhaySingh Thank you for your insights.

1 More Replies
fjrodriguez
by New Contributor III
  • 835 Views
  • 2 replies
  • 1 kudos

Resolved! Ingestion Framework

I would like to update my ingestion framework, which is orchestrated by ADF, running a couple of Databricks notebooks and copying the data to the DB afterwards. I want to rely entirely on Databricks; I thought this could be the design: Step 1. Expose target t...

Latest Reply
fjrodriguez
New Contributor III
  • 1 kudos

Hey @saurabh18cs, it is taking longer than expected to expose Azure SQL tables in UC. I can do that through a Foreign Catalog, but this is not what I want since it is read-only. As far as I can see, an external connection is for cloud object storage paths (ADLS...

1 More Replies
Rjdudley
by Honored Contributor
  • 920 Views
  • 3 replies
  • 0 kudos

Resolved! AUTO CDC API and sequence column

The docs for the AUTO CDC API state: You must specify a column in the source data on which to sequence records, which Lakeflow Declarative Pipelines interprets as a monotonically increasing representation of the proper ordering of the source data. Can this ...

Latest Reply
Rjdudley
Honored Contributor
  • 0 kudos

Thanks Szymon, I'm familiar with the PostgreSQL implementation and was hoping Databricks would behave the same.

2 More Replies
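The sequencing requirement above boils down to: given several change events for the same key, the one with the highest sequence value wins. A small pure-Python sketch of that collapse; the field names (`seq`, `op`, `id`) are illustrative, not the actual AUTO CDC API:

```python
def apply_cdc(events):
    """Collapse CDC events to a final state by replaying them in
    sequence-column order and keeping the latest event per key --
    the ordering role the sequence column plays in AUTO CDC.
    """
    state = {}
    for e in sorted(events, key=lambda e: e["seq"]):
        if e["op"] == "DELETE":
            state.pop(e["id"], None)
        else:  # INSERT / UPDATE both upsert the latest value
            state[e["id"]] = e["value"]
    return state

events = [
    {"id": 1, "seq": 3, "op": "UPDATE", "value": "c"},
    {"id": 1, "seq": 1, "op": "INSERT", "value": "a"},
    {"id": 2, "seq": 2, "op": "INSERT", "value": "b"},
]
print(apply_cdc(events))
```

This is also why the docs insist the column be monotonically increasing per key: any column whose ordering matches arrival order (a timestamp, a log offset) works, while a column that can repeat or decrease makes the "latest" choice ambiguous.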
ankit001mittal
by New Contributor III
  • 2434 Views
  • 1 reply
  • 2 kudos

DLT schema evolution/changes in the logs

Hi all, I want to figure out when schema evolution/changes happen in the objects in DLT pipelines by looking through the DLT logs. Could you please share some sample DLT logs that explain the schema changes? Thank you for your help.

Latest Reply
mark_ott
Databricks Employee
  • 2 kudos

To find when schema evolution or changes are happening in objects within DLT (Delta Live Tables) pipelines, you need to monitor certain entries within the DLT logs or Delta transaction logs that signal modifications to the underlying schema of a table...

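In the Delta transaction log itself, a schema change shows up as a new `metaData` action whose `schemaString` differs from the previous one. A hedged pure-Python sketch of scanning log actions for such transitions; the sample JSON lines are hypothetical, not real pipeline output:

```python
import json

def schema_changes(log_lines):
    """Scan Delta transaction-log JSON actions for `metaData` entries,
    which carry the table schema; a `schemaString` that differs from
    the previously seen one marks a schema change.

    Returns the list of new schemas in the order they appeared.
    """
    changes, last = [], None
    for line in log_lines:
        action = json.loads(line)
        meta = action.get("metaData")
        if meta and meta["schemaString"] != last:
            if last is not None:       # first metaData is the initial schema
                changes.append(meta["schemaString"])
            last = meta["schemaString"]
    return changes

lines = [
    '{"metaData": {"schemaString": "id INT"}}',
    '{"add": {"path": "part-0.parquet"}}',
    '{"metaData": {"schemaString": "id INT, name STRING"}}',
]
print(schema_changes(lines))
```

In a real pipeline the lines would come from the table's `_delta_log/*.json` files; the same diffing idea applies to the schema recorded in DLT event logs.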
minhhung0507
by Valued Contributor
  • 3045 Views
  • 3 replies
  • 0 kudos

DLT Flow Failed Due to Missing Flow Checkpoints Directory When Using Unity Catalog

I’m encountering an issue while running a Delta Live Tables (DLT) pipeline that is managed using Unity Catalog on Databricks. The pipeline has failed and is not restarting, showing the following error: java.lang.IllegalArgumentException: flow checkpoi...

Latest Reply
mark_ott
Databricks Employee
  • 0 kudos

The best practices for setting up checkpointing in Delta Live Tables (DLT) pipelines when using Unity Catalog are largely centered on leveraging Databricks' managed services, adhering to Unity Catalog's table management conventions, and minimizing th...

2 More Replies
sumitkumar_284
by New Contributor II
  • 1447 Views
  • 4 replies
  • 1 kudos

Not able to refresh Power BI dashboard from Databricks jobs

I am trying to refresh a Power BI dashboard using Databricks jobs and constantly getting this error, even though I am providing the optional parameters, which include catalog and database. Also note that I am able to refresh in the Power BI UI using both...

Latest Reply
szymon_dybczak
Esteemed Contributor III
  • 1 kudos

Hi @sumitkumar_284, can you provide us more details? Are you using Unity Catalog? Which authentication mechanism do you have? In which version of Power BI Desktop did you develop your semantic model/dashboard? Do you meet all the requirements below? Publish...

3 More Replies
maninegi05
by New Contributor II
  • 881 Views
  • 3 replies
  • 1 kudos

Resolved! DLT Pipeline Stopped working

Hello, suddenly our DLT pipelines are getting failures saying that: LookupError: Traceback (most recent call last): result_df = result_df.withColumn("input_file_path", col("_metadata.file_path")).withColumn( ...

Latest Reply
Louis_Frolio
Databricks Employee
  • 1 kudos

Greetings @maninegi05, I did some digging internally and I believe some recent changes to the DLT image may be to blame. We are aware of the regression issue and are actively working to address it. TL;DR: Why you might see “LookupError: ContextVar 'par...

2 More Replies
devpavan
by New Contributor III
  • 1184 Views
  • 7 replies
  • 0 kudos

Resolved! Encountering an error while setting up a single-node cluster on AWS

Hi Team, I'm trying to create a single-node cluster in Databricks on AWS, but I'm encountering an error. Could you please assist me with this? { "reason": { "code": "INVALID_ARGUMENT", "type": "CLIENT_ERROR", "parameters": { "databr...

Latest Reply
nayan_wylde
Esteemed Contributor II
  • 0 kudos

@devpavan Are you using the API or Terraform to create it? Can you please share the JSON config that you are passing?

6 More Replies
lezwon
by Contributor
  • 3114 Views
  • 1 reply
  • 1 kudos

Resolved! Can't view DAB-deployed pipelines in Databricks UI

I am using Databricks asset bundles to version control the jobs and pipelines in my workspace. I recently pulled these pipelines from the workspace using the `databricks bundle generate pipeline` command and deployed them back using `databricks ...

Latest Reply
Louis_Frolio
Databricks Employee
  • 1 kudos

Hey @lezwon, thanks for the details and screenshots. This looks like a permissions/ownership issue with your newly deployed Delta Live Tables pipelines. What's going on: pipelines run under the pipeline owner's identity (Databricks recommends a ser...

mahfooz_iiitian
by New Contributor III
  • 438 Views
  • 3 replies
  • 0 kudos

Databricks serverless cluster and Poetry private repository

Currently we are evaluating Databricks serverless. It supports a public repository in Poetry as a dependency path, but it does not support a private repository, and we are not sure where to put the credential details for the private repository.

Latest Reply
szymon_dybczak
Esteemed Contributor III
  • 0 kudos

Hi @mahfooz_iiitian, Databricks supports private repositories only for notebook-scoped libraries. In serverless you can do it using pip install (of course, store your token in a safe place): Notebook-scoped Python libraries - Azure Databricks | Micro...

2 More Replies
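With the notebook-scoped pip route, the private repository's credentials are usually injected into the index URL passed to `--index-url`. A minimal sketch of building that URL; the host name and `__token__` username convention are assumptions for illustration, and the token itself should come from a secret scope, never be hard-coded:

```python
def private_index_url(token, host="pypi.example.com"):
    """Build the --index-url value for a pip install against a private
    package repository, embedding the token as basic-auth credentials.

    Both the host and the `__token__` username here are hypothetical
    placeholders; substitute your repository's actual scheme.
    """
    return f"https://__token__:{token}@{host}/simple/"

# The resulting notebook command would then look like (hypothetical host):
#   %pip install --index-url https://__token__:<token>@pypi.example.com/simple/ mypackage
print(private_index_url("s3cr3t"))
```

Keeping the URL construction in one helper makes it easy to swap the token source for a secret lookup without touching each install line.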
VaDim
by New Contributor III
  • 2171 Views
  • 2 replies
  • 3 kudos

Resolved! ModuleNotFound error when using transformWithStateInPandas via a class defined outside the notebook

As per Databricks documentation, when I define the class that extends `StatefulProcessor` in a notebook, everything works OK; however, execution fails with a ModuleNotFound error as soon as the class definition is moved to a file (module) of its own in a...

Data Engineering
transformWithState
Latest Reply
VaDim
New Contributor III
  • 3 kudos

This is no longer an issue; some patch version of Databricks Runtime 16.4 must have fixed it, and it works now without any changes to the original code. Thanks.

1 More Replies
smoortema
by Contributor
  • 1054 Views
  • 5 replies
  • 4 kudos

Resolved! When automatic liquid clustering is enabled, how can I know which columns are used for clustering?

Let's say a table is configured to have automatic liquid clustering: ALTER TABLE table1 CLUSTER BY AUTO; How can I find out which columns were chosen by Databricks?

Latest Reply
smoortema
Contributor
  • 4 kudos

From the documentation, it seems that in Python there is such an option, but only when creating or replacing a table: # To set clustering columns and auto, which serves as a way to give a hint # for the initial selection. df.writeTo(...).using("delta") ...

4 More Replies
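The columns Databricks actually chose can also be read back from table metadata: for Delta tables, `DESCRIBE DETAIL table1` exposes a `clusteringColumns` field. A small sketch of extracting it from a result row; the sample row dict is hypothetical, standing in for what `spark.sql("DESCRIBE DETAIL table1").collect()[0].asDict()` would return:

```python
def clustering_columns(describe_detail_row):
    """Pull the clustering columns out of a DESCRIBE DETAIL result row.

    `describe_detail_row` is a plain dict here; in a notebook it would
    come from collecting the DESCRIBE DETAIL DataFrame and converting
    the single row with .asDict().
    """
    return describe_detail_row.get("clusteringColumns", [])

# Hypothetical sample of a DESCRIBE DETAIL row for a clustered table.
row = {"format": "delta", "clusteringColumns": ["customer_id", "event_date"]}
print(clustering_columns(row))
```

For a table clustered BY AUTO, re-checking this field after OPTIMIZE runs shows whether the automatic selection has changed over time.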
tarunnagpal
by New Contributor III
  • 2042 Views
  • 7 replies
  • 3 kudos

Lakebridge questions

We have a few questions before we propose Lakebridge as the migration tooling for one of our customers, where the requirement is to migrate from Redshift to Databricks. We would appreciate a quick response so we can proceed with the next steps: Our u...

Latest Reply
sky_bricks
New Contributor II
  • 3 kudos

Hi community, We’re currently planning a migration from an on-premise SQL Server data warehouse (with associated SSIS packages) to Databricks Unity Catalog. As part of this effort, we’re evaluating the use of Lakebridge for assessment, conversion, and...

6 More Replies