Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

Brendon_Daugher
by New Contributor II
  • 2203 Views
  • 0 replies
  • 0 kudos

Understanding Dependency Update Failure

Heyooooo! I'm using Azure Databricks and sparklyr to do some geospatial analysis. Before I actually work with Spark DataFrames, I've been using the R packages stars and sf to do some preprocessing on my data so that it's easier to interact with later. In or...

Anonymous
by Not applicable
  • 13404 Views
  • 0 replies
  • 1 kudos


Yay! It's September! On September 22nd we are hosting another Community Social - we're doing these monthly ! We want to make sure that we all have the chance to connect as a community often. Come network, talk data, and just get social! Join us for o...

arthur_wang
by Databricks Partner
  • 5776 Views
  • 2 replies
  • 1 kudos

How does Task Orchestration compare to Airflow (for Databricks-only jobs)?

One of my clients has been orchestrating Databricks notebooks using Airflow + the REST API. They're curious about the pros/cons of switching these jobs to Databricks jobs with Task Orchestration. I know there are all sorts of considerations - for example,...

Latest Reply
Shourya
Databricks Partner
  • 1 kudos

@Kaniz Fatma​ Hello Kaniz, I'm currently working with a major Enterprise Client looking to make the choice between the Airflow vs Databricks for Jobs scheduling. Our Entire code base is in Databricks and we are trying to figure out the complexities t...

1 More Replies
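As a rough sketch of what the Databricks-native side of this comparison might look like, here is a hypothetical multi-task job spec (Jobs API 2.1 style) built as a plain Python dict. The job name, notebook paths, and cluster key are made-up placeholders, not taken from the thread.

```python
import json

# Hypothetical sketch: two notebooks previously chained in an Airflow DAG,
# expressed as one multi-task Databricks job (Jobs API 2.1 "tasks" format).
job_settings = {
    "name": "nightly-etl",
    "tasks": [
        {
            "task_key": "extract",
            "notebook_task": {"notebook_path": "/Repos/etl/extract"},
            "job_cluster_key": "shared_cluster",
        },
        {
            "task_key": "transform",
            # depends_on plays the role an Airflow DAG edge would
            "depends_on": [{"task_key": "extract"}],
            "notebook_task": {"notebook_path": "/Repos/etl/transform"},
            "job_cluster_key": "shared_cluster",
        },
    ],
}

# Serialized once and submitted to the Jobs API, instead of Airflow
# driving each notebook run individually over the REST API.
payload = json.dumps(job_settings)
```

The dependency graph then lives inside one job definition rather than in an external scheduler, which is the core trade-off the thread is asking about.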
datatello
by New Contributor II
  • 3715 Views
  • 3 replies
  • 1 kudos

Exponentially slower joins using Pyspark

I'm new to Pyspark, but I've stumbled across an odd issue when I perform joins, where the action seems to take exponentially longer every time I add a new join to a function I'm writing. I'm trying to join a dataset of ~3 million records to one of ~17...

Latest Reply
Vidula
Databricks Partner
  • 1 kudos

Hi @Lee Bevers​ Hope all is well! Just wanted to check in if you were able to resolve your issue and would you be happy to share the solution or mark an answer as best? Else please let us know if you need more help. We'd love to hear from you. Thanks!

2 More Replies
byrdman
by New Contributor III
  • 3601 Views
  • 3 replies
  • 1 kudos

Databricks multi-stage jobs in workflow: params not passing

I am using a multi-stage job calling different notebooks, all of which have the same PARAMNAME that needs to be passed in. On the second and third jobs, I input a different value for the PARAM, but those values do not show up when the task runs. I...

Latest Reply
Prabakar
Databricks Employee
  • 1 kudos

Hi @David Byrd​ this is a known issue and we have raised it with our engineering team. If you have the same key but different values in the parameters, it most likely takes the first value for the key and uses the same for all the tas...

2 More Replies
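Given the reply's point that duplicate parameter keys collapse to the first value, one hedged workaround is to set the value per task through each task's own base_parameters block (Jobs API 2.1 style), so no two tasks compete over a shared key. The task keys, paths, and values below are illustrative only.

```python
# Hypothetical sketch: each stage carries its own PARAMNAME value in its
# per-task base_parameters, instead of one job-level key reused three times.
tasks = [
    {
        "task_key": f"stage{i}",
        "notebook_task": {
            "notebook_path": f"/Jobs/stage{i}",      # placeholder paths
            "base_parameters": {"PARAMNAME": value}, # distinct value per task
        },
    }
    for i, value in enumerate(["alpha", "beta", "gamma"], start=1)
]
```

Each notebook then reads PARAMNAME via its widget as before, but the value is scoped to the task rather than resolved job-wide.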
pmt
by New Contributor III
  • 5400 Views
  • 7 replies
  • 1 kudos

Handling Changing Schema in CDC DLT

We are building a DLT pipeline and the Auto Loader is handling schema evolution fine. However, further down the pipeline we are trying to load that streamed data with the apply_changes() function into a new table and, from the looks of it, it doesn't see...

Latest Reply
Vidula
Databricks Partner
  • 1 kudos

Hey there @Palani Thangaraj​ Hope all is well! Just wanted to check in if you were able to resolve your issue and would you be happy to share the solution or mark an answer as best? Else please let us know if you need more help. We'd love to hear fro...

6 More Replies
liamod
by Databricks Partner
  • 6548 Views
  • 2 replies
  • 8 kudos

Resolved! AWS and Azure on the same Databricks workspace.

Can I use compute instances from different providers i.e. AWS and Azure on the same Databricks workspace?

Latest Reply
Prabakar
Databricks Employee
  • 8 kudos

Hi @Liam ODonoghue​ there are a few methods you can use to connect AWS and Azure resources. In such cases, only your own accounts are involved. But with Databricks, you need to handle two accounts, one for each cloud provider. Let's say if you create a w...

1 More Replies
Rayan
by New Contributor II
  • 1618 Views
  • 2 replies
  • 2 kudos

Terraform managing workspaces

Hi, do we have any references? I am looking to manage workspaces and other Databricks resources with Terraform, creating them in the form of modules so that every time I need to create a new workspace I just need to call the worksp...

Latest Reply
Vidula
Databricks Partner
  • 2 kudos

Hi there @Rayan D​ Hope all is well! Just wanted to check in if you were able to resolve your issue and would you be happy to share the solution or mark an answer as best? Else please let us know if you need more help. We'd love to hear from you.Than...

1 More Replies
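For the module-per-workspace idea in the question, a minimal Terraform sketch might look like the following. This is a config fragment only; the module path and variable name are hypothetical, not from an official Databricks module.

```hcl
# Hypothetical layout: workspace resources wrapped in a local module,
# called once per workspace that is needed.
module "workspace_dev" {
  source         = "./modules/databricks-workspace"
  workspace_name = "dev"
}

module "workspace_prod" {
  source         = "./modules/databricks-workspace"
  workspace_name = "prod"
}
```

The module body would hold the provider-specific workspace and networking resources, so adding a workspace becomes one more module block.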
niels
by New Contributor III
  • 2406 Views
  • 2 replies
  • 0 kudos

Why is my Plotly graph squished?

fig = make_subplots(1, 4)
cols = ['OrderValue', 'TransactionPrice', 'ProductPrice', 'ProductUnits']
for i, col in enumerate(cols):
    fig.add_trace(
        go.Histogram(x=silver_df.select(col).toPandas()[col]),
        row=1, col=i+1
    )
p =...

Latest Reply
Vidula
Databricks Partner
  • 0 kudos

Hey there @Niels Ota​ Hope all is well! Just wanted to check in if you were able to resolve your issue and would you be happy to share the solution or mark an answer as best? Else please let us know if you need more help. We'd love to hear from you.T...

1 More Replies
askme
by New Contributor II
  • 3159 Views
  • 2 replies
  • 2 kudos

Databricks jobs Update/Reset API throws an unexpected error

{ "error_code": "INVALID_PARAMETER_VALUE", "message": "Missing required field: job_id" }
I have a test job cluster and I need to update the docker image field with another version using the reset/update job API. I went through the documentation of data b...

Latest Reply
Vidula
Databricks Partner
  • 2 kudos

Hey there @radha kilaru​ Hope all is well! Just wanted to check in if you were able to resolve your issue and would you be happy to share the solution or mark an answer as best? Else please let us know if you need more help. We'd love to hear from yo...

1 More Replies
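The "Missing required field: job_id" error above is consistent with job_id being omitted from, or nested inside, the request body. Below is a minimal sketch of a reset payload with job_id at the top level, next to new_settings rather than inside it; the job id, image URL, and cluster settings are placeholders, not values from the thread.

```python
import json

# Hypothetical sketch of a jobs/reset request body. The key point is that
# "job_id" sits at the TOP level of the payload, alongside "new_settings".
payload = {
    "job_id": 123,  # placeholder id; must not be nested inside new_settings
    "new_settings": {
        "name": "docker-test-job",
        "new_cluster": {
            "spark_version": "11.3.x-scala2.12",
            "num_workers": 1,
            "docker_image": {"url": "myrepo/myimage:2.0"},  # placeholder image
        },
    },
}

body = json.dumps(payload)
```

Note that reset replaces the whole job settings with new_settings, so the full desired configuration, not just the changed field, goes in that block.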
Venkat1
by New Contributor II
  • 5840 Views
  • 3 replies
  • 1 kudos

Resolved! Error: "Backend service unavailable" when trying to start a cluster in community edition.

I am using the Community Edition of Databricks for learning and hands-on projects. However, when I try to create a cluster today, I am getting an error popup: "Backend service unavailable". I would like to know if it is a problem with my account or a bac...

Latest Reply
Vidula
Databricks Partner
  • 1 kudos

Hey there @Venkat K​ Hope all is well! Just wanted to check in if you were able to resolve your issue and would you be happy to share the solution or mark an answer as best? Else please let us know if you need more help. We'd love to hear from you.Th...

2 More Replies
Tahseen0354
by Valued Contributor
  • 5271 Views
  • 3 replies
  • 1 kudos

Resolved! Can I add custom cluster tag from init script ?

Hi, is it possible to add custom tags from an init script during cluster initialization? We would like to automatically add custom tags whenever someone creates a new cluster in Databricks.

Latest Reply
Prabakar
Databricks Employee
  • 1 kudos

Hi @Md Tahseen Anam​ I don't think it is possible to use an init script for custom tags. But the easiest way is to use cluster policies. You can specify a list of custom tags in the policy so that you can simply attach the policy to the cluster wh...

2 More Replies
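Following the reply's suggestion of cluster policies rather than init scripts, here is a hedged sketch of a policy definition that pins custom tags as fixed values. The tag names and values are invented for illustration.

```python
import json

# Hypothetical sketch: a cluster policy that forces two custom tags onto
# every cluster created under it, using fixed-value policy entries.
policy_definition = {
    "custom_tags.CostCenter": {"type": "fixed", "value": "data-eng"},
    "custom_tags.Owner": {"type": "fixed", "value": "platform-team"},
}

# The Cluster Policies API expects the definition as a JSON string
# inside the create/edit payload.
policy_payload = {
    "name": "tagged-clusters",
    "definition": json.dumps(policy_definition),
}
```

With "fixed" entries the tags are applied automatically and users cannot override them, which matches the "add tags whenever someone creates a cluster" goal in the question.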
Paramesh
by New Contributor II
  • 5521 Views
  • 3 replies
  • 2 kudos

Resolved! How to read multiple tiny XML files in parallel

Hi team, we are trying to read multiple tiny XML files. We are able to parse them using the Databricks XML jar, but is there any way to read these files in parallel and distribute the load across the cluster? Right now our job is taking 90% of the time rea...

Latest Reply
Paramesh
New Contributor II
  • 2 kudos

Thank you @Hubert Dudek​ for the suggestion. Similar to your recommendation, we added a step in our pipeline to merge the small files into large files and make them available for the Spark job.

2 More Replies
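The merge step described in the reply can be sketched with the Python standard library alone: collect each tiny file's root element under one wrapper element and write a single larger file for the Spark job to read. The element names and file layout below are invented for illustration; a real pipeline would batch by target file size.

```python
import os
import tempfile
import xml.etree.ElementTree as ET

def merge_xml_files(paths, out_path, batch_root="records"):
    """Combine many small XML files into one file under a shared root."""
    merged = ET.Element(batch_root)
    for p in paths:
        merged.append(ET.parse(p).getroot())  # re-parent each file's root
    ET.ElementTree(merged).write(out_path, encoding="utf-8")

# Tiny demo with three throwaway single-record files.
tmp = tempfile.mkdtemp()
paths = []
for i in range(3):
    p = os.path.join(tmp, f"rec{i}.xml")
    with open(p, "w", encoding="utf-8") as f:
        f.write(f"<record id='{i}'/>")
    paths.append(p)

out = os.path.join(tmp, "merged.xml")
merge_xml_files(paths, out)
count = len(ET.parse(out).getroot())  # number of merged records
```

Fewer, larger files avoid the per-file open overhead that dominates when Spark reads thousands of tiny inputs.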