Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

yit
by Databricks Partner
  • 1118 Views
  • 3 replies
  • 2 kudos

Resolved! Autoloader: Trigger batch vs micro-batch (as in .forEachBatch)

Hey everyone, I'm trying to clear up some confusion in Auto Loader regarding trigger batches and micro-batches when using .forEachBatch. Here's what I understand so far: Trigger batch – controlled by cloudFiles.maxFilesPerTrigger and cloudFiles.maxBytesPerTr...

Data Engineering
autoloader
batch
micro-batch
spark
Latest Reply
szymon_dybczak
Esteemed Contributor III

Hi @yit, 1. They are not quite the same. Trigger batch defines how many new files Auto Loader lists for ingestion per streaming trigger (this is controlled, as you correctly pointed out, by cloudFiles.maxFilesPerTrigger and cloudFiles.maxBytesPerTrigge...

2 More Replies
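The distinction discussed in this thread can be sketched in PySpark. This is a minimal sketch, not the poster's actual pipeline: the paths, option values, and table name are hypothetical, and it assumes a Databricks runtime where a `SparkSession` is available. The two trigger-sizing options cap how much Auto Loader lists per streaming trigger, while `foreachBatch` receives each resulting micro-batch as a DataFrame.

```python
# Sketch of Auto Loader trigger sizing vs. foreachBatch.
# Option values, paths, and the target table are hypothetical.
AUTOLOADER_OPTIONS = {
    "cloudFiles.format": "json",
    # Cap how many new files / bytes are listed per streaming trigger.
    "cloudFiles.maxFilesPerTrigger": "1000",
    "cloudFiles.maxBytesPerTrigger": "10g",
}

def process_batch(batch_df, batch_id):
    """Called once per micro-batch; batch_df holds that trigger's files."""
    batch_df.write.mode("append").saveAsTable("raw.events")

def start_stream(spark, source_path, checkpoint_path):
    """Wire the options and the per-batch handler onto one streaming query."""
    reader = spark.readStream.format("cloudFiles")
    for key, value in AUTOLOADER_OPTIONS.items():
        reader = reader.option(key, value)
    return (
        reader.load(source_path)
        .writeStream
        .option("checkpointLocation", checkpoint_path)
        .foreachBatch(process_batch)
        .start()
    )
```
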
xavier_db
by Databricks Partner
  • 482 Views
  • 1 reply
  • 0 kudos

Postgres Lakeflow Connect

I want to get data from Postgres using Lakeflow Connect every 10 minutes. How do I set up Lakeflow Connect? Can you give a step-by-step process for creating a Lakeflow Connect pipeline?

Latest Reply
szymon_dybczak
Esteemed Contributor III

Hi @xavier_db, the Postgres Lakeflow connector is currently in private preview according to the thread below: Solved: Lakeflow Connect - Postgres connector - Databricks Community - 127633. But the thing is, I cannot see it in Workspace Preview and Account Previe...

ck7007
by Contributor II
  • 704 Views
  • 3 replies
  • 3 kudos

Advanced Technique

Reduced Monthly Databricks Bill from $47K to $12.7K. The Problem: We were scanning 2.3TB for queries needing only 8GB of data. Three Quick Wins: 1. Multi-dimensional Partitioning (30% savings) # Before: df.write.partitionBy("date").parquet(path) # After: parti...

Latest Reply
BS_THE_ANALYST
Databricks Partner

@ck7007 no worries. I asked a question on the other thread: https://community.databricks.com/t5/data-engineering/cost/td-p/130078. I'm not sure if you're classing this thread as the duplicate or the other one, so I'll repost. I didn't see you mention ...

2 More Replies
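The "multi-dimensional partitioning" win in the excerpt can be sketched as follows. The column names are hypothetical (the excerpt is truncated before the full "after" code); the idea is simply to partition on the columns queries actually filter by, so the engine can prune files instead of scanning everything.

```python
# Sketch of the before/after partitioning change from the excerpt.
# Column names and the path are hypothetical.

def partition_columns(multi_dimensional: bool) -> list:
    """Single-column layout vs. partitioning on the columns queries filter by."""
    if multi_dimensional:
        # More pruning dimensions -> fewer files scanned per filtered query.
        return ["date", "region", "event_type"]
    return ["date"]

def write_events(df, path, multi_dimensional=True):
    """Write the DataFrame partitioned by the chosen columns."""
    df.write.partitionBy(*partition_columns(multi_dimensional)).parquet(path)
```

Whether this saves money depends on the filters real queries use; partitioning on a high-cardinality column can backfire by creating many small files.
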
Pratikmsbsvm
by Contributor
  • 856 Views
  • 2 replies
  • 2 kudos

Resolved! Read Files from Adobe and Push to Delta table ADLS Gen2

The upstream is sending 2 files of different schemas. The Storage Account has Private Endpoints; there is no public access. No public IP (NPIP) = yes. How to design using only Databricks: 1. Databricks API to read data file from Adobe and push it to AD...

Latest Reply
szymon_dybczak
Esteemed Contributor III

Hi @Pratikmsbsvm, okay, since you're going to use Databricks compute for data extraction and you wrote that your workspace is deployed with the secure cluster connectivity (NPIP) option enabled, you first need to make sure that you have a stable egre...

1 More Replies
brian999
by Contributor
  • 4646 Views
  • 5 replies
  • 2 kudos

Resolved! Managing libraries in workflows with multiple tasks - need to configure a list of libs for all tasks

I have workflows with multiple tasks, each of which needs 5 different libraries to run. When I have to update those libraries, I have to go in and make the update in each and every task. So for one workflow I have 20 different places where I have to g...

Latest Reply
brian999
Contributor

Actually, I think I found most of a solution here in one of the replies: https://community.databricks.com/t5/administration-architecture/installing-libraries-on-job-clusters/m-p/37365/highlight/true#M245. It seems like I only have to define libs for the...

4 More Replies
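One way to avoid updating the same library list in 20 places, if the workflow is managed as a Databricks Asset Bundle, is a plain YAML anchor: define the list once and reuse it per task. This is a hedged sketch, not the solution from the linked reply; the job name, notebook paths, and package versions are hypothetical.

```yaml
# Sketch: one YAML anchor (&shared_libs) defines the library list once,
# and every task reuses it (*shared_libs). Names/versions are hypothetical.
resources:
  jobs:
    my_workflow:
      name: my_workflow
      tasks:
        - task_key: ingest
          notebook_task:
            notebook_path: ./notebooks/ingest.py
          libraries: &shared_libs
            - pypi:
                package: requests==2.32.3
            - pypi:
                package: pandas==2.2.2
        - task_key: transform
          notebook_task:
            notebook_path: ./notebooks/transform.py
          libraries: *shared_libs
```

Updating the anchor's list then updates every task on the next `databricks bundle deploy`.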
guilhermecs001
by New Contributor II
  • 461 Views
  • 1 reply
  • 2 kudos

How to work with 300 billion rows and 5 columns?

Hi guys! I'm having a problem at work where I need to process a customer dataset with 300 billion rows and 5 columns. The transformations I need to perform are "simple," like joins to assign characteristics to customers. And at the end of the pro...

Latest Reply
szymon_dybczak
Esteemed Contributor III

Hi @guilhermecs001, wow, that's a massive amount of rows. Can you preprocess this huge CSV file first? For example, read the CSV, partition by some columns that make sense (maybe the country the customer is coming from) and save that data as de...

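The preprocessing idea in the reply can be sketched as a one-time conversion: read the raw CSV once, write it out as a Delta table partitioned by a low-cardinality column, then run the joins against the partitioned copy. The paths and the `country` column are hypothetical, and this assumes a `SparkSession` is available.

```python
# Sketch of the reply's suggestion: convert the huge CSV into partitioned
# Delta once, so later joins read only relevant partitions.
# Paths and the partition column are hypothetical.

def preprocess(spark, csv_path, delta_path):
    """One-time conversion from raw CSV to country-partitioned Delta."""
    (
        spark.read.option("header", "true").csv(csv_path)
        .repartition("country")            # co-locate rows per partition value
        .write.partitionBy("country")
        .format("delta")
        .mode("overwrite")
        .save(delta_path)
    )
```

At 300 billion rows, the partition column should have modest cardinality (hundreds of values, not millions) to avoid a small-files problem.
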
Sainath368
by Contributor
  • 434 Views
  • 1 reply
  • 2 kudos

Is Photon Acceleration Helpful for All Maintenance Tasks (OPTIMIZE, VACUUM, ANALYZE_COMPUTE_STATS)?

Hi everyone, we're currently reviewing the performance impact of enabling Photon acceleration on our Databricks jobs, particularly those involving table maintenance tasks. Our job includes three main operations: OPTIMIZE, VACUUM, and ANALYZE_COMPUTE_S...

Latest Reply
szymon_dybczak
Esteemed Contributor III

Hi @Sainath368, I wouldn't use Photon for this kind of task. You should use it primarily for ETL transformations, where it shines. VACUUM and OPTIMIZE are more maintenance tasks, and using Photon would be pricey overkill here. According to documentatio...

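For reference, the three maintenance operations the thread discusses are plain SQL statements, so per the reply they can be scheduled on a cheaper non-Photon jobs cluster. The table name below is hypothetical.

```sql
-- The three maintenance operations from the thread, runnable on a
-- non-Photon jobs cluster. The table name is hypothetical.
OPTIMIZE main.sales.orders;
VACUUM main.sales.orders RETAIN 168 HOURS;   -- 168 hours = the default 7-day retention
ANALYZE TABLE main.sales.orders COMPUTE STATISTICS;
```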
merca
by Valued Contributor II
  • 14634 Views
  • 13 replies
  • 7 kudos

Value array {{QUERY_RESULT_ROWS}} in Databricks SQL alerts custom template

Please include in documentation an example how to incorporate the `QUERY_RESULT_ROWS` variable in the custom template.

Latest Reply
CJK053000
New Contributor III

Databricks confirmed this was an issue on their end and it should be resolved now. It is working for me.

12 More Replies
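Since the original request was for a documentation example, here is a hedged sketch of a custom alert template iterating `QUERY_RESULT_ROWS` mustache-style. The `city` column name is hypothetical, and the exact section syntax should be verified against the current Databricks SQL alerts documentation rather than taken from this sketch.

```html
<!-- Hedged sketch: iterate QUERY_RESULT_ROWS mustache-style and emit one
     list item per row. The column name `city` is hypothetical. -->
<ul>
  {{#QUERY_RESULT_ROWS}}
    <li>{{city}}</li>
  {{/QUERY_RESULT_ROWS}}
</ul>
```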
Phani1
by Databricks MVP
  • 826 Views
  • 2 replies
  • 1 kudos

Resolved! cosmosdb metadata integration with unity catalog

Hi Team, how can we integrate Cosmos DB metadata with Unity Catalog? Can you please provide some insights on this? Regards, Phani

Latest Reply
Khaja_Zaffer
Esteemed Contributor

Hello @Phani1, good day. I have found a whole document on your requirements: https://community.databricks.com/t5/technical-blog/optimising-data-integration-and-serving-patterns-with-cosmos-db/ba-p/91977 It has a project with it as well.

1 More Replies
Datalight
by Contributor
  • 550 Views
  • 1 reply
  • 0 kudos

Resolved! How to build Data Pipeline to consume data from Adobe Campaign to Azure Databricks

Could someone please help me design the pipeline with Databricks? I don't have any control over Adobe. How to set up a data pipeline that moves CSV files from Adobe to ADLS Gen2 via a cron job, using Databricks? Where will this cron job execute? How ADLS ...

Latest Reply
Khaja_Zaffer
Esteemed Contributor

Hello @Datalight, good day! Can I please ask what you mean by "you don't have any control over Adobe"? I found a similar case study over here: https://learn.microsoft.com/en-us/answers/questions/5533633/data-pipeline-to-push-files-from-external-system...

SangNguyen
by New Contributor III
  • 1793 Views
  • 8 replies
  • 5 kudos

Resolved! Cannot deploy DAB with the Job branch using a feature branch in Workspace UI

Hi, I tried to deploy DAB on Workspace UI with a feature branch (sf-trans-seq) targeted to Dev. After deploying successfully, the Job branch is, however, using the master branch (see the screenshot below).Is there any option to force the Job branch t...

Screenshot: Issue - DAB Deployment on Workspace UI.png
Latest Reply
-werners-
Esteemed Contributor III

I agree. Can you mark your (or someone else's) answer as solved? Because I think you won't be the only one with this issue/feature.

7 More Replies
xavier_db
by Databricks Partner
  • 955 Views
  • 1 reply
  • 1 kudos

Resolved! Mongodb connection in GCP Databricks

I am trying to connect to MongoDB from Databricks which is UC enabled, and both MongoDB and Databricks are in the same VPC. I am using the below code: df = ( spark.read.format("mongodb") .option( "connection.uri", f'''mongodb://{username}:{password...

Latest Reply
szymon_dybczak
Esteemed Contributor III

Hi @xavier_db, standard access mode has more limitations compared to dedicated access mode. For example, look at the limitations list of standard access mode: Standard compute requirements and limitations | Databricks on AWS. Now, compare it to dedicated...

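The read pattern from the post can be sketched with the connection string assembled separately, which also makes the URI easy to test outside a cluster. The host, database, and collection names are hypothetical, and this assumes the Spark MongoDB connector is installed on the cluster and a `SparkSession` exists.

```python
# Sketch of the Spark MongoDB connector read from the post.
# Host/database/collection names are hypothetical.

def mongo_uri(username, password, host="mongo.internal:27017"):
    """Assemble the connection string used by the connector."""
    return f"mongodb://{username}:{password}@{host}"

def read_collection(spark, username, password, database, collection):
    """Read one MongoDB collection into a DataFrame."""
    return (
        spark.read.format("mongodb")
        .option("connection.uri", mongo_uri(username, password))
        .option("database", database)
        .option("collection", collection)
        .load()
    )
```
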
fix_databricks
by New Contributor II
  • 4602 Views
  • 3 replies
  • 0 kudos

Cannot run another notebook from same directory

Hello, I am having a similar problem to this thread, which was never resolved: https://community.databricks.com/t5/data-engineering/unexpected-error-while-calling-notebook-string-matching-regex-w/td-p/18691 I renamed a notebook (utility_data_wrangli...

Latest Reply
ddundovic
New Contributor III

I am running into the same issue. It seems like the `%run` magic command is trying to parse the entire cell content as its arguments. So if you have %run "my_notebook" print("hello") in the same cell, you will get the following error: `Failed to parse...

2 More Replies
Raj_DB
by Contributor
  • 3630 Views
  • 9 replies
  • 12 kudos

Resolved! Pass Notebook parameters dynamically in Job task.

Hi Everyone, I'm working on scheduling a job and would like to pass parameters that I've defined in my notebook. Ideally, I'd like these parameters to be dynamic, meaning that if I update their values in the notebook, the scheduled job should automati...

Latest Reply
ck7007
Contributor II

I see you're using dbutils.widgets.text and dropdown, which is perfect. You're already on the right track. Quick solution: your widgets are already dynamic! Just pass parameters in your job configuration. In your notebook (slight refactor of your code): # Define w...

8 More Replies
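The widget pattern the reply describes can be sketched as below. This assumes a Databricks notebook where `dbutils` exists; the widget names and default values are hypothetical. Notebook widgets double as job parameters: a scheduled run that passes a task parameter with the same name overrides the widget's default.

```python
# Sketch of notebook widgets that also serve as job parameters.
# Widget names and defaults are hypothetical.
WIDGET_DEFAULTS = {"env": "dev", "run_date": "2025-01-01"}

def define_widgets(dbutils):
    """Create one text widget per parameter, seeded with its default."""
    for name, default in WIDGET_DEFAULTS.items():
        dbutils.widgets.text(name, default)

def get_params(dbutils):
    """Read current values; job task parameters with the same names win."""
    return {name: dbutils.widgets.get(name) for name in WIDGET_DEFAULTS}
```
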
Erik
by Valued Contributor III
  • 20437 Views
  • 13 replies
  • 8 kudos

Grafana + databricks = True?

We have some timeseries in Databricks, and we are reading them into Power BI through SQL compute endpoints. For timeseries Power BI is ... not optimal. Earlier I have used Grafana with various backends, and quite like it, but I can't find any way to con...

Latest Reply
frugson
New Contributor II

@Erik wrote:We have some timeseries in databricks, and we are reading them into powerbi through sql compute endpoints. For timeseries powerbi is ... not optimal. Earlier I have used grafana with various backends, and quite like it, but I cant find an...

12 More Replies