Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

DRock
by New Contributor II
  • 3300 Views
  • 7 replies
  • 0 kudos

Resolved! ODBC data source to connect to a Databricks catalog.database via MS Access Not Working

When using an ODBC data source to connect to a Databricks catalog database via Microsoft Access, the tables are not listed/do not appear in the MS Access database for selection. However, when using the same ODBC data source to connect via Microsoft Excel,...

Latest Reply
Senefelder
New Contributor II
  • 0 kudos

Why do "Databricks employees" keep answering with the same AI-generated reply when that is obviously not the solution? Has anyone been able to come up with a solution that actually works?

6 More Replies
noorbasha534
by Valued Contributor II
  • 226 Views
  • 2 replies
  • 0 kudos

Databricks job calling DBT - persist job name

Hello all, is it possible to persist the Databricks job name into the Brooklyn audit tables data model when a Databricks job calls a dbt model? Currently, my colleagues persist audit information into fact and dimension tables of the Brooklyn data model...

Latest Reply
Yogesh_378691
Contributor
  • 0 kudos

Yes, it’s possible to include the Databricks job name in your Brooklyn audit tables, but it won’t happen automatically. Right now, only the job run ID is being logged, so you’d need to extend your audit logic a bit. One common approach is to pass the...

1 More Replies
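A minimal sketch of that approach, assuming the truncated advice is to pass the job name into the run as a job parameter (for example via the {{job.name}} dynamic value reference, if it is available in your workspace) and persist it from a notebook task. The audit table, column, and parameter names below are hypothetical:

# Read the job name and run id passed in as job parameters
dbutils.widgets.text("job_name", "")
dbutils.widgets.text("run_id", "")
job_name = dbutils.widgets.get("job_name")
run_id = dbutils.widgets.get("run_id")

# Stamp the job name onto the existing audit row for this run
spark.sql(f"""
    UPDATE brooklyn_audit.fact_job_run
    SET job_name = '{job_name}'
    WHERE job_run_id = '{run_id}'
""")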
auso
by New Contributor
  • 2362 Views
  • 3 replies
  • 2 kudos

Asset Bundles: Shared libraries and notebooks in monorepo multi-bundle setup

I am part of a small team of data engineers that started using Databricks Asset Bundles one year ago. Our code base consists of typical ETL workloads written primarily in Jupyter notebooks (.ipynb) and jobs (.yaml), with our codebase spanning across...

Latest Reply
-werners-
Esteemed Contributor III
  • 2 kudos

1. The easiest way to do this is to package your shared libraries into a wheel (assuming you use Python). That way you do not have to mess with the Python path, and you can install these libs automatically on any cluster (via policies or DABs or what...

2 More Replies
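For reference, a minimal sketch of the wheel-packaging approach suggested above, with a hypothetical package name and a src/ layout; build it with "python -m build --wheel" and reference the resulting .whl as a library in your bundle or cluster policy:

# setup.py (placeholder names; a pyproject.toml works just as well)
from setuptools import find_packages, setup

setup(
    name="shared_etl_lib",
    version="0.1.0",
    packages=find_packages(where="src"),
    package_dir={"": "src"},
)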
yit
by Contributor
  • 293 Views
  • 3 replies
  • 2 kudos

Resolved! Autoloader: Trigger batch vs micro-batch (as in .forEachBatch)

Hey everyone, I'm trying to clear up some confusion in Auto Loader regarding trigger batches and micro-batches when using .forEachBatch. Here's what I understand so far: Trigger batch – controlled by cloudFiles.maxFilesPerTrigger and cloudFiles.maxBytesPerTr...

Data Engineering
autoloader
batch
micro-batch
spark
Latest Reply
szymon_dybczak
Esteemed Contributor III
  • 2 kudos

Hi @yit, 1. They are not quite the same. The trigger batch defines how many new files Auto Loader lists for ingestion per streaming trigger (this is controlled, as you correctly pointed out, by cloudFiles.maxFilesPerTrigger and cloudFiles.maxBytesPerTrigge...

2 More Replies
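To make the two knobs concrete, here is a small sketch (paths and table names are placeholders): cloudFiles.maxFilesPerTrigger bounds the trigger batch, and each trigger's worth of files then arrives in the foreachBatch function as a single micro-batch DataFrame.

# Process one micro-batch per trigger; batch_df holds that trigger's files
def process_batch(batch_df, batch_id):
    batch_df.write.mode("append").saveAsTable("bronze.events")

(spark.readStream.format("cloudFiles")
    .option("cloudFiles.format", "json")
    .option("cloudFiles.maxFilesPerTrigger", 100)  # caps files listed per trigger
    .load("/Volumes/main/raw/events")
    .writeStream
    .foreachBatch(process_batch)
    .option("checkpointLocation", "/Volumes/main/chk/events")
    .start())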
xavier_db
by New Contributor III
  • 146 Views
  • 1 replies
  • 0 kudos

Postgres Lakeflow Connect

I want to get data from Postgres using Lakeflow Connect every 10 minutes. How do I set up Lakeflow Connect? Can you give a step-by-step process for creating a Lakeflow Connect pipeline?

Latest Reply
szymon_dybczak
Esteemed Contributor III
  • 0 kudos

Hi @xavier_db, the Postgres Lakeflow connector is currently in private preview, according to the thread below: Solved: Lakeflow Connect - Postgres connector - Databricks Community - 127633. But the thing is, I cannot see it in Workspace Preview and Account Previe...

ck7007
by New Contributor III
  • 293 Views
  • 3 replies
  • 3 kudos

Advanced Technique

Reduced Monthly Databricks Bill from $47K to $12.7K
The Problem: We were scanning 2.3 TB for queries needing only 8 GB of data.
Three Quick Wins
1. Multi-dimensional Partitioning (30% savings)
# Before
df.write.partitionBy("date").parquet(path)
# After
parti...

Latest Reply
BS_THE_ANALYST
Esteemed Contributor II
  • 3 kudos

@ck7007 no worries. I asked a question on the other thread: https://community.databricks.com/t5/data-engineering/cost/td-p/130078 . I'm not sure if you're classing this thread as the duplicate or the other one, so I'll repost. I didn't see you mention ...

2 More Replies
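The post's before/after snippet is truncated above; a reconstructed sketch of the multi-dimensional partitioning idea, with illustrative column names, looks like this. Partitioning on several low-cardinality columns lets queries that filter on them skip whole directories instead of scanning every date partition:

# Before: only date-based pruning is possible
df.write.partitionBy("date").parquet(path)

# After: queries filtering on region or event_type prune far more data
(df.write
    .partitionBy("date", "region", "event_type")
    .parquet(path))

Note the trade-off: every added partition column multiplies the directory count, so this only pays off while the columns stay low-cardinality; otherwise you trade scan cost for a small-files problem.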
Pratikmsbsvm
by Contributor
  • 326 Views
  • 2 replies
  • 2 kudos

Resolved! Read Files from Adobe and Push to Delta table ADLS Gen2

The upstream is sending 2 files with different schemas. The storage account has private endpoints; there is no public access. No public IP (NPIP) = yes. How to design this using only Databricks: 1. Databricks API to read the data file from Adobe and push it to AD...

Latest Reply
szymon_dybczak
Esteemed Contributor III
  • 2 kudos

Hi @Pratikmsbsvm, okay, since you're going to use Databricks compute for data extraction and you wrote that your workspace is deployed with the secure cluster connectivity (NPIP) option enabled, you first need to make sure that you have a stable egre...

1 More Replies
brian999
by Contributor
  • 3516 Views
  • 5 replies
  • 2 kudos

Resolved! Managing libraries in workflows with multiple tasks - need to configure a list of libs for all tasks

I have workflows with multiple tasks, each of which needs 5 different libraries to run. When I have to update those libraries, I have to go in and make the update in each and every task. So for one workflow I have 20 different places where I have to g...

Latest Reply
brian999
Contributor
  • 2 kudos

Actually, I think I found most of a solution here in one of the replies: https://community.databricks.com/t5/administration-architecture/installing-libraries-on-job-clusters/m-p/37365/highlight/true#M245 . It seems like I only have to define libs for the...

4 More Replies
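A related sketch: if you create or update the job programmatically through the Jobs 2.1 API, the library list can live in one place and be referenced by every task, so an update touches a single definition. The endpoint and field names follow the public API; the job, task, and package names are placeholders.

import requests

SHARED_LIBRARIES = [
    {"pypi": {"package": "great-expectations==0.18.12"}},
    {"pypi": {"package": "pydantic==2.7.1"}},
]

tasks = [
    {
        "task_key": f"task_{i}",
        "notebook_task": {"notebook_path": f"/Jobs/task_{i}"},
        "job_cluster_key": "shared_cluster",
        "libraries": SHARED_LIBRARIES,  # one source of truth for all tasks
    }
    for i in range(4)
]

requests.post(
    "https://<workspace-url>/api/2.1/jobs/reset",
    headers={"Authorization": "Bearer <token>"},
    json={"job_id": 123, "new_settings": {"name": "my_job", "tasks": tasks}},
)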
guilhermecs001
by New Contributor II
  • 137 Views
  • 1 replies
  • 2 kudos

How to work with 300 billion rows and 5 columns?

Hi guys! I'm having a problem at work where I need to process a customer dataset with 300 billion rows and 5 columns. The transformations I need to perform are "simple," like joins to assign characteristics to customers. And at the end of the pro...

Latest Reply
szymon_dybczak
Esteemed Contributor III
  • 2 kudos

Hi @guilhermecs001, wow, that's a massive number of rows. Can you preprocess this huge CSV file first? For example, read the CSV, partition it by some columns that make sense (maybe the country the customer comes from) and save that data as de...

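A minimal sketch of that suggestion (paths, table, and partition column are placeholders): read the raw CSV once, write it out as partitioned Delta, and run the joins against the Delta copy so each query only touches the partitions it needs.

(spark.read
    .option("header", "true")
    .csv("/Volumes/main/raw/customers.csv")  # hypothetical source path
    .write
    .format("delta")
    .partitionBy("country")                  # a low-cardinality column
    .mode("overwrite")
    .saveAsTable("bronze.customers"))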
Sainath368
by New Contributor III
  • 171 Views
  • 1 replies
  • 1 kudos

Is Photon Acceleration Helpful for All Maintenance Tasks (OPTIMIZE, VACUUM, ANALYZE_COMPUTE_STATS)?

Hi everyone, we're currently reviewing the performance impact of enabling Photon acceleration on our Databricks jobs, particularly those involving table maintenance tasks. Our job includes three main operations: OPTIMIZE, VACUUM, and ANALYZE_COMPUTE_S...

Latest Reply
szymon_dybczak
Esteemed Contributor III
  • 1 kudos

Hi @Sainath368, I wouldn't use Photon for this kind of task. You should use it primarily for ETL transformations, where it shines. VACUUM and OPTIMIZE are more maintenance tasks, and using Photon would be pricey overkill here. According to the documentatio...

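Following that advice, a maintenance job can opt out of Photon by setting runtime_engine to STANDARD in its job cluster spec (a Clusters API field); the values below are placeholders:

maintenance_cluster = {
    "spark_version": "15.4.x-scala2.12",
    "node_type_id": "i3.xlarge",
    "num_workers": 2,
    "runtime_engine": "STANDARD",  # "PHOTON" would enable Photon
}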
merca
by Valued Contributor II
  • 12432 Views
  • 13 replies
  • 7 kudos

Value array {{QUERY_RESULT_ROWS}} in Databricks SQL alerts custom template

Please include in the documentation an example of how to incorporate the `QUERY_RESULT_ROWS` variable in a custom template.

Latest Reply
CJK053000
New Contributor III
  • 7 kudos

Databricks confirmed this was an issue on their end and it should be resolved now. It is working for me.

12 More Replies
Phani1
by Valued Contributor II
  • 262 Views
  • 2 replies
  • 1 kudos

Resolved! cosmosdb metadata integration with unity catalog

Hi Team, how can we integrate Cosmos DB metadata with Unity Catalog? Can you please provide some insights on this? Regards, Phani

Latest Reply
Khaja_Zaffer
Contributor
  • 1 kudos

Hello @Phani1, good day! I found a whole document on your requirement: https://community.databricks.com/t5/technical-blog/optimising-data-integration-and-serving-patterns-with-cosmos-db/ba-p/91977 . It has a project with it as well.

1 More Replies
Datalight
by New Contributor III
  • 207 Views
  • 1 replies
  • 0 kudos

Resolved! How to build Data Pipeline to consume data from Adobe Campaign to Azure Databricks

Could someone please help me design this pipeline with Databricks? I don't have any control over Adobe. How do I set up a data pipeline that moves CSV files from Adobe to ADLS Gen2 via a cron job, using Databricks? Where will this cron job execute? How ADLS ...

Latest Reply
Khaja_Zaffer
Contributor
  • 0 kudos

Hello @Datalight, good day! May I ask what you mean by "you don't have any control over Adobe"? I found a similar case study here: https://learn.microsoft.com/en-us/answers/questions/5533633/data-pipeline-to-push-files-from-external-system...

SangNguyen
by New Contributor III
  • 919 Views
  • 8 replies
  • 5 kudos

Resolved! Cannot deploy DAB with the Job branch using a feature branch in Workspace UI

Hi, I tried to deploy a DAB in the Workspace UI with a feature branch (sf-trans-seq) targeted to Dev. After deploying successfully, the Job branch is, however, using the master branch (see the screenshot below). Is there any option to force the Job branch t...

Latest Reply
-werners-
Esteemed Contributor III
  • 5 kudos

I agree. Can you mark your (or someone else's) answer as solved? Because I think you won't be the only one with this issue/feature request.

7 More Replies
xavier_db
by New Contributor III
  • 223 Views
  • 1 replies
  • 1 kudos

Resolved! Mongodb connection in GCP Databricks

I am trying to connect to MongoDB from Databricks, which is UC-enabled; both MongoDB and Databricks are in the same VPC. I am using the code below: df = ( spark.read.format("mongodb") .option( "connection.uri", f'''mongodb://{username}:{password...

Latest Reply
szymon_dybczak
Esteemed Contributor III
  • 1 kudos

Hi @xavier_db, standard access mode has more limitations compared to dedicated access mode. For example, look at the limitations list for standard access mode: Standard compute requirements and limitations | Databricks on AWS. Now, compare it to dedicated...

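For reference, a completed version of the truncated snippet above, using the MongoDB Spark Connector's "mongodb" source (v10+); the host, credentials, database, and collection are placeholders:

df = (
    spark.read.format("mongodb")
    .option("connection.uri", "mongodb://user:password@mongo-host:27017")
    .option("database", "mydb")
    .option("collection", "customers")
    .load()
)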
