Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
One of my clients has been orchestrating Databricks notebooks using Airflow + REST API. They're curious about the pros/cons of switching these jobs to Databricks jobs with Task Orchestration. I know there are all sorts of considerations - for example,...
@Kaniz Fatma Hello Kaniz, I'm currently working with a major enterprise client that is choosing between Airflow and Databricks for job scheduling. Our entire code base is in Databricks and we are trying to figure out the complexities t...
I'm new to PySpark, but I've stumbled across an odd issue when I perform joins, where the action seems to take exponentially longer every time I add a new join to a function I'm writing. I'm trying to join a dataset of ~3 million records to one of ~17...
Hi @Lee Bevers Hope all is well! Just wanted to check in if you were able to resolve your issue and would you be happy to share the solution or mark an answer as best? Else please let us know if you need more help. We'd love to hear from you. Thanks!
I am using a multi-stage job that calls different notebooks, all of which take the same PARAMNAME as a parameter. On the second and third job, I enter a different value for PARAMNAME, but those values do not show up when the task runs. I...
Hi @David Byrd this is a known issue and we have raised it with our engineering team. If you have the same key but different values in the parameters, then it most likely takes the first value for the key and will use the same for all the tas...
We are building a DLT pipeline and the autoloader is handling schema evolution fine. However, further down the pipeline we are trying to load that streamed data with the apply_changes() function into a new table and, from the looks of it, doesn't see...
Hey there @Palani Thangaraj Hope all is well! Just wanted to check in if you were able to resolve your issue and would you be happy to share the solution or mark an answer as best? Else please let us know if you need more help. We'd love to hear fro...
Hi @Liam ODonoghue there are a few methods you can use to connect AWS and Azure resources. In such cases, it involves only your accounts. But with Databricks, you need to handle two accounts for both cloud providers. Let's say if you create a w...
Hi, do we have any references? I am looking to manage workspaces and other Databricks-related resources with Terraform, and I am trying to build modules so that every time I need to create a new workspace I just need to call the worksp...
Hi there @Rayan D Hope all is well! Just wanted to check in if you were able to resolve your issue and would you be happy to share the solution or mark an answer as best? Else please let us know if you need more help. We'd love to hear from you.Than...
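from plotly.subplots import make_subplots
import plotly.graph_objects as go

# One row of four histograms, one per numeric column of silver_df
fig = make_subplots(rows=1, cols=4)
cols = ['OrderValue', 'TransactionPrice', 'ProductPrice', 'ProductUnits']
for i, col in enumerate(cols):
    fig.add_trace(
        go.Histogram(x=silver_df.select(col).toPandas()[col]),
        row=1, col=i+1
    )
p =...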
Hey there @Niels Ota Hope all is well! Just wanted to check in if you were able to resolve your issue and would you be happy to share the solution or mark an answer as best? Else please let us know if you need more help. We'd love to hear from you.T...
{ "error_code": "INVALID_PARAMETER_VALUE", "message": "Missing required field: job_id"} I have a test job cluster and I need to update the Docker image field to another version using the reset/update job API. I went through the documentation of data b...
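The error message in the post above says the request body lacks a top-level `job_id`. A minimal sketch of how such a body could be assembled in Python, assuming the Jobs API `update`/`reset` request shape of `job_id` plus `new_settings`; the cluster spec, image URL, and job ID below are hypothetical placeholders, not values from the post:

```python
import copy
import json


def set_docker_image(cluster_spec, image_url):
    """Return a copy of a cluster spec with its Docker image URL replaced."""
    updated = copy.deepcopy(cluster_spec)
    updated.setdefault("docker_image", {})["url"] = image_url
    return updated


def build_update_body(job_id, new_settings):
    """Wrap new settings in a request body that carries job_id at the top level."""
    if job_id is None:
        raise ValueError("job_id is required at the top level of the request body")
    return {"job_id": job_id, "new_settings": new_settings}


# Hypothetical existing cluster spec, e.g. copied from the job's current settings.
current_cluster = {
    "spark_version": "hypothetical-runtime-version",
    "docker_image": {"url": "example-registry/image:1.0"},
}

new_cluster = set_docker_image(current_cluster, "example-registry/image:2.0")
body = build_update_body(12345, {"new_cluster": new_cluster})  # 12345 is a placeholder
print(json.dumps(body, indent=2))
```

Copying the existing cluster spec before changing the image keeps the other cluster fields in the request, which matters if the endpoint replaces the cluster block as a whole.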
Hey there @radha kilaru Hope all is well! Just wanted to check in if you were able to resolve your issue and would you be happy to share the solution or mark an answer as best? Else please let us know if you need more help. We'd love to hear from yo...
I am getting errors while executing scripts 6.1 onward, with various error messages. All the earlier scripts were fine, but not anymore. I am not sure what the problem is or whom to contact. Multiple images are attached below from examples 6.1 and 6.2...
I am using the Community Edition of Databricks for learning and hands-on projects. However, when I try to create a cluster today, I am getting an error popup: "Backend service unavailable". I would like to know if it is a problem with my account or a bac...
Hey there @Venkat K Hope all is well! Just wanted to check in if you were able to resolve your issue and would you be happy to share the solution or mark an answer as best? Else please let us know if you need more help. We'd love to hear from you.Th...
Hi, is it possible to add custom tags from an init script during cluster initialization? We would like to automatically add custom tags whenever someone creates a new cluster in Databricks.
Hi @Md Tahseen Anam I don't think it is possible to use an init script for custom tags. The easiest way is to use cluster policies. You can list custom tags in the policy so that you can simply add the policy to the cluster wh...
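As a rough illustration of the cluster-policy suggestion, the sketch below builds a policy definition that pins custom tags to fixed values. It assumes the policy definition format in which each tag is addressed as `custom_tags.<name>` with a `fixed` type; the tag names and values are made up for the example:

```python
import json


def build_tag_policy(fixed_tags):
    """Return a cluster policy definition that fixes each custom tag to a value."""
    return {
        f"custom_tags.{name}": {"type": "fixed", "value": value}
        for name, value in fixed_tags.items()
    }


# Hypothetical tags; replace with whatever your organization requires.
policy_definition = build_tag_policy({"team": "data-eng", "cost_center": "1234"})
print(json.dumps(policy_definition, indent=2))
```

The resulting JSON is what would go into the policy's definition field; clusters created under that policy would then carry the tags without users setting them by hand.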
Hi team, we are trying to read multiple tiny XML files and are able to parse them using the Databricks XML jar, but is there any way to read these files in parallel and distribute the load across the cluster? Right now our job is taking 90% of the time rea...
Thank you @Hubert Dudek for the suggestion. Similar to your recommendation, we added a step in our pipeline to merge the small files into larger files and make them available to the Spark job.
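The post does not say how the merge step was implemented. One possible way to plan it, shown here only as a sketch, is to group small files into batches by cumulative size and then concatenate each batch into one larger file. The function name, the size threshold, and the file names are all assumptions for illustration:

```python
def plan_merge_batches(file_sizes, target_bytes):
    """Group (path, size) pairs into batches whose total size stays within target_bytes.

    A single file larger than target_bytes ends up alone in its own batch.
    """
    batches = []
    current, current_size = [], 0
    for path, size in file_sizes:
        # Start a new batch when adding this file would exceed the target.
        if current and current_size + size > target_bytes:
            batches.append(current)
            current, current_size = [], 0
        current.append(path)
        current_size += size
    if current:
        batches.append(current)
    return batches


# Hypothetical file listing: (path, size in bytes).
files = [("a.xml", 40), ("b.xml", 40), ("c.xml", 40), ("d.xml", 10)]
print(plan_merge_batches(files, target_bytes=100))
```

Each batch can then be written out as one larger file before the Spark job reads it, so the job opens far fewer files.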
I need to export some data from the database to CSV, which will be downloaded to another application. What would be the procedure for that? I don't have a lot of knowledge of Databricks and I didn't find much information in the documentation. Thanks.
You can manually download data to your local machine as CSV from a Databricks notebook cell and pass it to your other application. Your application can also run a Databricks notebook inside a workflow via an API that writes data to an S3 bucket in CSV and in response y...
Hi @rajat kumar Hope all is well! Just wanted to check in if you were able to resolve your issue and would you be happy to share the solution or mark an answer as best? Else please let us know if you need more help. We'd love to hear from you. Thanks...
I am attempting to load an Excel file that's located in a blob storage container that I've mounted. In the first cell, when I use the dbutils.fs.ls command, I can see the file I want to load. However, when I try to actually load it, it can't find the file. It ...
Hi @Niels Ota Hope all is well! Just wanted to check in if you were able to resolve your issue and would you be happy to share the solution or mark an answer as best? Else please let us know if you need more help. We'd love to hear from you. Thanks!