Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

guilhermecs001
by New Contributor
  • 66 Views
  • 1 reply
  • 2 kudos

How to work with 300 billion rows and 5 columns?

Hi guys! I'm having a problem at work where I need to process a customer dataset with 300 billion rows and 5 columns. The transformations I need to perform are "simple," like joins to assign characteristics to customers. And at the end of the pro...

Latest Reply
szymon_dybczak
Esteemed Contributor III

Hi @guilhermecs001, wow, that's a massive number of rows. Can you preprocess this huge CSV file first? For example, read the CSV, partition by some columns that make sense (maybe the country the customer comes from), and save that data as de...
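A minimal sketch of that preprocessing step, assuming a hypothetical input path and a country column (adjust names to your data):

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Read the raw CSV once (path, header, and column names are assumptions).
raw = (spark.read
       .option("header", "true")
       .csv("/mnt/raw/customers.csv"))

# Persist as Delta, partitioned by a column of sensible cardinality, so
# later joins can prune partitions instead of scanning all 300B rows.
(raw.write
    .format("delta")
    .partitionBy("country")
    .mode("overwrite")
    .saveAsTable("main.bronze.customers"))
```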

felix4572
by New Contributor
  • 130 Views
  • 6 replies
  • 2 kudos

transformWithStateInPandas throws "Spark connect directory is not ready" error

Hello, we employ arbitrary stateful aggregations in our data processing streams on Azure Databricks, and would like to migrate from applyInPandasWithState to transformWithStateInPandas. We employ the Python API throughout our solution, and some of our...
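For context, a minimal sketch of the target transformWithStateInPandas API, assuming a hypothetical streaming DataFrame `events` with an `id` column; verify the exact signatures against the PySpark docs for your runtime:

```python
import pandas as pd
from pyspark.sql.streaming import StatefulProcessor, StatefulProcessorHandle
from pyspark.sql.types import StructType, StructField, StringType, LongType

class RunningCount(StatefulProcessor):
    def init(self, handle: StatefulProcessorHandle) -> None:
        # One long counter per grouping key, kept in the state store.
        schema = StructType([StructField("n", LongType())])
        self._count = handle.getValueState("count", schema)

    def handleInputRows(self, key, rows, timerValues):
        # rows is an iterator of pandas DataFrames for this key.
        n = sum(len(pdf) for pdf in rows)
        if self._count.exists():
            n += self._count.get()[0]
        self._count.update((n,))
        yield pd.DataFrame({"id": [key[0]], "n": [n]})

    def close(self) -> None:
        pass

out_schema = StructType([StructField("id", StringType()),
                         StructField("n", LongType())])

counts = (events.groupBy("id")
                .transformWithStateInPandas(statefulProcessor=RunningCount(),
                                            outputStructType=out_schema,
                                            outputMode="Update",
                                            timeMode="None"))
```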

Latest Reply
Advika
Databricks Employee

Hello @felix4572! Could you please share the driver log, or even better, the executor log (without any sensitive details)?

5 More Replies
DataDev
by New Contributor
  • 84 Views
  • 4 replies
  • 3 kudos

Schedule Databricks job based on a custom calendar

I want to schedule Databricks jobs based on a custom calendar, e.g., skip the job run on arbitrary days or holidays. #databricks @DataBricks @DATA

Latest Reply
Pilsner
Contributor

Hello @DataDev, nice idea. I haven't thought about this before, but I like the suggestion. If I had to implement a custom schedule, there are two ways that come to mind. Firstly, if the schedule is relatively regular, with just an occasional day missed,...
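One way to sketch the second idea: keep the regular cron schedule and short-circuit the run on excluded dates. The holiday set below is a hypothetical stand-in for a real custom calendar:

```python
import datetime

# Hypothetical custom calendar: dates on which the job must not run.
SKIP_DATES = {datetime.date(2025, 12, 25), datetime.date(2026, 1, 1)}

today = datetime.date.today()
if today in SKIP_DATES:
    # Ends this notebook run immediately; nothing below executes.
    dbutils.notebook.exit(f"Skipped: {today} is excluded by the custom calendar")

print("Proceeding with the scheduled work...")
```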

3 More Replies
Sainath368
by New Contributor III
  • 91 Views
  • 1 reply
  • 0 kudos

Is Photon Acceleration Helpful for All Maintenance Tasks (OPTIMIZE, VACUUM, ANALYZE_COMPUTE_STATS)?

Hi everyone, we're currently reviewing the performance impact of enabling Photon acceleration on our Databricks jobs, particularly those involving table maintenance tasks. Our job includes three main operations: OPTIMIZE, VACUUM, and ANALYZE_COMPUTE_S...

Latest Reply
szymon_dybczak
Esteemed Contributor III

Hi @Sainath368, I wouldn't use Photon for this kind of task. You should use it primarily for ETL transformations, where it shines. VACUUM and OPTIMIZE are maintenance tasks, and using Photon for them would be pricey overkill. According to the documentatio...
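For reference, all three maintenance operations can run as plain SQL on a small non-Photon job cluster (table name hypothetical):

```python
# Table maintenance as SQL; no Photon needed for these commands.
spark.sql("OPTIMIZE main.sales.orders")
spark.sql("VACUUM main.sales.orders")  # default 7-day retention window
spark.sql("ANALYZE TABLE main.sales.orders COMPUTE STATISTICS")
```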

merca
by Valued Contributor II
  • 12033 Views
  • 13 replies
  • 7 kudos

Value array {{QUERY_RESULT_ROWS}} in Databricks SQL alerts custom template

Please include in documentation an example how to incorporate the `QUERY_RESULT_ROWS` variable in the custom template.

Latest Reply
CJK053000
New Contributor III

Databricks confirmed this was an issue on their end and it should be resolved now. It is working for me.

12 More Replies
dbdev
by New Contributor II
  • 568 Views
  • 8 replies
  • 3 kudos

Maven libraries in VNet injected, UC enabled workspace on Standard Access Mode Cluster

Hi! As the title suggests, I want to install Maven libraries on my cluster with access mode 'Standard'. Our workspace is VNet injected and has Unity Catalog enabled. The coordinates have been allowlisted by the account team according to these instructio...

Latest Reply
dbdev
New Contributor II

@nayan_wylde @szymon_dybczak I just tried using a JAR I uploaded to an allowlisted Volume (Oracle's ojdbc8) and I get the same error. It seems like I'm able to install a JAR, but once it's installed my cluster is broken.

7 More Replies
shan-databricks
by New Contributor III
  • 62 Views
  • 2 replies
  • 2 kudos

How to load only the previous day's data into the newly added column of an existing Delta table

How to load only the previous day's data into the newly added column of an existing Delta table? Is there any option to do that without writing any logic?

Latest Reply
BS_THE_ANALYST
Honored Contributor III

@shan-databricks there are certainly ways for the schema to evolve within your Delta tables that are supported out of the box: https://docs.databricks.com/aws/en/delta/update-schema#enable-schema-evolution To update older records, they'd likely have NULL...
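A minimal sketch of that manual backfill, with hypothetical table, column, and predicate names: schema evolution adds the column (existing rows get NULL), then one targeted UPDATE fills only the previous day's rows:

```python
# 1) Add the new column; all existing rows get NULL in it.
spark.sql("ALTER TABLE main.bronze.events ADD COLUMNS (region STRING)")

# 2) Backfill only yesterday's rows, leaving everything else untouched.
spark.sql("""
    UPDATE main.bronze.events
    SET region = 'EMEA'
    WHERE region IS NULL
      AND event_date = date_sub(current_date(), 1)
""")
```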

1 More Replies
Phani1
by Valued Contributor II
  • 102 Views
  • 2 replies
  • 1 kudos

Cosmos DB metadata integration with Unity Catalog

Hi Team, how can we integrate Cosmos DB metadata with Unity Catalog? Can you please provide some insights on this? Regards, Phani

Latest Reply
Khaja_Zaffer
Contributor

Hello @Phani1, good day! I found a whole document on your requirement: https://community.databricks.com/t5/technical-blog/optimising-data-integration-and-serving-patterns-with-cosmos-db/ba-p/91977 It has a project with it as well.

1 More Replies
Datalight
by New Contributor II
  • 68 Views
  • 1 reply
  • 0 kudos

Resolved! How to build Data Pipeline to consume data from Adobe Campaign to Azure Databricks

Could someone please help me design the pipeline with Databricks? I don't have any control over Adobe. How do I set up a data pipeline that moves CSV files from Adobe to ADLS Gen2 via a cron job, using Databricks? Where will this cron job execute? How ADLS ...

Latest Reply
Khaja_Zaffer
Contributor

Hello @Datalight, good day! Can I ask what you mean by "you don't have any control over Adobe"? I found a similar case study here: https://learn.microsoft.com/en-us/answers/questions/5533633/data-pipeline-to-push-files-from-external-system...
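Whatever schedules the export on the Adobe side, the Databricks half is commonly an Auto Loader stream over the ADLS Gen2 landing path; a sketch with hypothetical storage paths and table name:

```python
# Incrementally pick up CSV files dropped into ADLS Gen2 by the external job.
df = (spark.readStream
      .format("cloudFiles")
      .option("cloudFiles.format", "csv")
      .option("cloudFiles.schemaLocation",
              "abfss://meta@myaccount.dfs.core.windows.net/schemas/adobe")
      .option("header", "true")
      .load("abfss://landing@myaccount.dfs.core.windows.net/adobe/"))

(df.writeStream
   .option("checkpointLocation",
           "abfss://meta@myaccount.dfs.core.windows.net/checkpoints/adobe")
   .trigger(availableNow=True)  # batch-style runs on the job's schedule
   .toTable("main.bronze.adobe_campaign"))
```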

SangNguyen
by New Contributor II
  • 528 Views
  • 8 replies
  • 5 kudos

Resolved! Cannot deploy DAB with the Job branch using a feature branch in Workspace UI

Hi, I tried to deploy a DAB in the Workspace UI with a feature branch (sf-trans-seq) targeted to Dev. After deploying successfully, the Job branch is, however, using the master branch (see the screenshot below). Is there any option to force the Job branch t...

Latest Reply
-werners-
Esteemed Contributor III

I agree. Can you mark your (or someone else's) answer as solved? Because I think you won't be the only one with this issue/feature request.

7 More Replies
xavier_db
by New Contributor
  • 54 Views
  • 1 reply
  • 0 kudos

MongoDB connection in GCP Databricks

I am trying to connect to MongoDB from Databricks, which is UC enabled; both MongoDB and Databricks are in the same VPC. I am using the code below: df = ( spark.read.format("mongodb") .option( "connection.uri", f'''mongodb://{username}:{password...

Latest Reply
szymon_dybczak
Esteemed Contributor III

Hi @xavier_db, standard access mode has more limitations than dedicated access mode. For example, look at the limitations list for standard access mode: Standard compute requirements and limitations | Databricks on AWS. Now, compare it to dedicated...
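For comparison, a minimal read with the MongoDB Spark connector v10 (all connection details hypothetical); on standard access mode, check the linked limitations before assuming this will work:

```python
df = (spark.read
      .format("mongodb")
      .option("connection.uri", "mongodb://user:pass@10.0.0.5:27017")  # hypothetical URI
      .option("database", "mydb")
      .option("collection", "customers")
      .load())

df.show(5)
```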

fix_databricks
by New Contributor II
  • 3476 Views
  • 3 replies
  • 0 kudos

Cannot run another notebook from same directory

Hello, I am having a problem similar to this thread, which was never resolved: https://community.databricks.com/t5/data-engineering/unexpected-error-while-calling-notebook-string-matching-regex-w/td-p/18691 I renamed a notebook (utility_data_wrangli...

Latest Reply
ddundovic
New Contributor III

I am running into the same issue. It seems like the `%run` magic command tries to parse the entire cell content as its arguments. So if you have %run "my_notebook" print("hello") in the same cell, you will get the following error: `Failed to parse...
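So the workaround is to give `%run` a cell of its own, for example:

```
# Cell 1: nothing else in this cell
%run ./my_notebook

# Cell 2:
print("hello")
```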

2 More Replies
Anubhav2011
by New Contributor II
  • 302 Views
  • 2 replies
  • 1 kudos

What is the power of a DLT pipeline to read streaming data?

I am getting thousands of records every second in my bronze table from Qlik, and every second the bronze table is truncated and loaded with new data by Qlik itself. How do I process this much data every second into my silver streaming table before...

Latest Reply
ManojkMohan
Contributor III

Core problem: the bronze table is not append-only, but truncate + insert every second. DLT (Delta Live Tables) in continuous mode assumes append-only streaming sources (like Kafka). Because Qlik wipes and replaces data every second, DLT cannot guarantee no d...
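If the stream must keep reading from such a source, one option to evaluate is Delta's skipChangeCommits reader flag, which lets a stream ignore non-append commits (such as the truncates) and consume only the inserts; a hedged DLT sketch with hypothetical table names, to be validated against the actual Qlik load pattern:

```python
import dlt
from pyspark.sql import functions as F

@dlt.table(name="silver_events")
def silver_events():
    return (
        spark.readStream
             .option("skipChangeCommits", "true")  # skip the truncate commits
             .table("bronze_events")
             .withColumn("ingested_at", F.current_timestamp())
    )
```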

1 More Replies
Raj_DB
by New Contributor III
  • 364 Views
  • 9 replies
  • 12 kudos

Resolved! Pass Notebook parameters dynamically in Job task.

Hi everyone, I'm working on scheduling a job and would like to pass parameters that I've defined in my notebook. Ideally, I'd like these parameters to be dynamic, meaning that if I update their values in the notebook, the scheduled job should automati...

Latest Reply
ck7007
New Contributor II

I see you're using dbutils.widgets.text and dropdown. You're already on the right track. Quick solution: your widgets are already dynamic! Just pass parameters in your job configuration. In your notebook (slight refactor of your code): # Define w...
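The usual shape of that pattern, with hypothetical widget names: the notebook declares defaults for interactive runs, and job task parameters with the same names override them at run time:

```python
# Defaults for interactive use; job parameters with matching names win.
dbutils.widgets.text("run_date", "2025-01-01")
dbutils.widgets.dropdown("env", "dev", ["dev", "staging", "prod"])

run_date = dbutils.widgets.get("run_date")
env = dbutils.widgets.get("env")
print(f"Running for {run_date} in {env}")
```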

8 More Replies
Erik
by Valued Contributor III
  • 17101 Views
  • 13 replies
  • 8 kudos

Grafana + Databricks = True?

We have some time series in Databricks, and we are reading them into Power BI through SQL compute endpoints. For time series, Power BI is ... not optimal. Earlier I have used Grafana with various backends, and quite like it, but I can't find any way to con...

Latest Reply
frugson
New Contributor

@Erik wrote: We have some time series in Databricks, and we are reading them into Power BI through SQL compute endpoints. For time series, Power BI is ... not optimal. Earlier I have used Grafana with various backends, and quite like it, but I can't find an...

12 More Replies
