Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
Data + AI Summit 2024 - Data Engineering & Streaming

Forum Posts

bedrosa
by New Contributor
  • 787 Views
  • 1 reply
  • 0 kudos

How to upload a Spark Dataframe to Azure Table Storage?

Is it possible to create a table in Azure Table Storage from a Spark DataFrame using Python? Any ideas?

Latest Reply
Kaniz_Fatma
Community Manager
  • 0 kudos

Hi @bedrosa, Yes, making a table in Azure Table Storage based on a Spark DataFrame using Python is possible. You can load data from any data source supported by Databricks using Delta Live Tables. You can define data sets (tables and views) in Delta ...

  • 0 kudos
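Azure Table Storage is not a built-in Spark data source, so besides the Delta Live Tables route mentioned in the reply, a common alternative is writing from the driver with the `azure-data-tables` SDK. The sketch below assumes that package is installed (`%pip install azure-data-tables`); the connection string, table name, and column names are placeholders, not values from the thread.

```python
# Sketch: write a modest-sized Spark DataFrame to Azure Table Storage on the driver.
# Assumes the `azure-data-tables` package; every entity needs string PartitionKey/RowKey.

def rows_to_entities(rows, partition_key, row_key_col):
    """Convert plain row dicts into Azure Table entities."""
    entities = []
    for row in rows:
        entity = {"PartitionKey": partition_key, "RowKey": str(row[row_key_col])}
        entity.update({k: v for k, v in row.items() if k != row_key_col})
        entities.append(entity)
    return entities

def upload_dataframe(df, conn_str, table_name, partition_key, row_key_col):
    # Collecting to the driver only works for modest row counts.
    from azure.data.tables import TableServiceClient
    service = TableServiceClient.from_connection_string(conn_str)
    table = service.create_table_if_not_exists(table_name)
    for entity in rows_to_entities([r.asDict() for r in df.collect()],
                                   partition_key, row_key_col):
        table.upsert_entity(entity)

# The conversion itself is plain Python and works without Spark:
sample = [{"id": 1, "name": "a"}, {"id": 2, "name": "b"}]
entities = rows_to_entities(sample, "events", "id")
```

On Databricks you would call `upload_dataframe(df, conn_str, "mytable", "events", "id")` after fetching the connection string from a secret scope rather than hard-coding it.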
alesventus
by New Contributor III
  • 853 Views
  • 1 reply
  • 0 kudos

Performance issue: Running 50 notebooks from ADF

I have a process in Data Factory that loads CDC changes from SQL Server and then triggers a notebook with a merge to the bronze and silver zones. A single notebook takes about 1 minute to run, but when all 50 notebooks are fired at once the whole process takes 25 ...

Data Engineering
performance issue
Latest Reply
Kaniz_Fatma
Community Manager
  • 0 kudos

Hi @alesventus, Running multiple notebooks simultaneously can overload the cluster and increase processing time.
- The current cluster configuration may not be sufficient for handling 50 resource-intensive notebooks.
- Databricks recommends specific...

  • 0 kudos
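One workaround for the ADF fan-out overhead is to trigger a single Databricks job and do the fan-out inside it with bounded parallelism. The scheduling logic below is plain Python; on Databricks the `runner` would wrap `dbutils.notebook.run`, and the paths and parallelism limit are illustrative.

```python
# Sketch: run many notebooks from one driver with a cap on concurrency, instead of
# 50 separate ADF activities all hitting the cluster at once.
from concurrent.futures import ThreadPoolExecutor

def run_all(notebook_paths, runner, max_parallel=8):
    """Run `runner(path)` for each path, at most `max_parallel` at a time."""
    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        return list(pool.map(runner, notebook_paths))

# On Databricks you would pass something like:
#   runner = lambda p: dbutils.notebook.run(p, timeout_seconds=3600)
# For illustration, a stand-in runner:
results = run_all([f"/cdc/table_{i}" for i in range(5)], runner=lambda p: f"done:{p}")
```

Capping `max_parallel` below 50 trades a little wall-clock time for far less contention on the driver and executors.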
pygreg
by New Contributor
  • 2143 Views
  • 1 reply
  • 1 kudos

Resolved! Workflows: pass parameters to a "run job" task

Hi folks! I would like to know if there is a way to pass parameters to a "run job" task. For example, let's have a Job A with:
- a notebook task A.1 that takes as input a parameter year-month in the format yyyymm
- a "run job" task A.2 that calls a Job B
I wou...

Latest Reply
btafur
Contributor III
  • 1 kudos

This feature will be available soon as part of Job Parameters. Right now it is not possible to easily pass parameters to a child job.

  • 1 kudos
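Job Parameters have since shipped. A sketch of what triggering a child job with parameters can look like via the Jobs API 2.1 `run-now` endpoint is below; the job ID, parameter name, and the host/token in the comment are placeholders, and the exact field shape should be verified against the Jobs API reference.

```python
# Sketch: build a `run-now` request body that passes parameters to a child job
# via the `job_parameters` field introduced with Job Parameters.
import json

def build_run_now_payload(job_id, params):
    return {"job_id": job_id, "job_parameters": params}

payload = build_run_now_payload(123, {"year_month": "202310"})
body = json.dumps(payload)

# To actually send it (requires the `requests` package and a real token):
#   requests.post(f"{host}/api/2.1/jobs/run-now",
#                 headers={"Authorization": f"Bearer {token}"},
#                 data=body)
```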
peterwishart
by New Contributor III
  • 2973 Views
  • 4 replies
  • 0 kudos

Resolved! Programmatically updating the “run_as_user_name” parameter for jobs

I am trying to write a process that will programmatically update the "run_as_user_name" parameter for all jobs in an Azure Databricks workspace, using PowerShell to interact with the Jobs API. I have been trying to do this with a test job without suc...

Latest Reply
baubleglue
New Contributor II
  • 0 kudos

The solution you've submitted is a solution for a different topic (permission to run the job; the job still runs as the user in the run_as_user_name field). Here is an example of changing "run_as_user_name". Docs: https://docs.databricks.com/api/azure/workspace/job...

  • 0 kudos
3 More Replies
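As the last reply notes, the run-as identity is part of the job settings, so it is changed through the Jobs API 2.1 `jobs/update` endpoint rather than through permissions. The payload builder below is a sketch; the job ID and user name are made up, and the exact placement of `run_as` should be confirmed against the Jobs API reference.

```python
# Sketch: build a `jobs/update` request body that changes a job's run-as user.
def build_update_payload(job_id, user_name):
    return {"job_id": job_id, "new_settings": {"run_as": {"user_name": user_name}}}

payload = build_update_payload(42, "etl-service@example.com")

# PowerShell users can send the same JSON body, roughly:
#   Invoke-RestMethod -Method Post -Uri "$workspaceUrl/api/2.1/jobs/update" `
#     -Headers @{Authorization = "Bearer $token"} `
#     -Body ($payload | ConvertTo-Json -Depth 5)
```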
Hubert-Dudek
by Esteemed Contributor III
  • 1406 Views
  • 1 reply
  • 0 kudos

Spark Configuration Parameter for Cluster Downscaling

spark.databricks.aggressiveWindowDownS: This parameter is designed to determine the frequency, in seconds, at which the cluster decides to downscale. By adjusting this setting, you can fine-tune how rapidly clusters release workers. A higher value will...

Latest Reply
Haiyangl104
New Contributor III
  • 0 kudos

I wish there was a configuration to toggle upscaling behavior. I want the clusters to scale up only if the bottleneck is approaching 70% memory usage. Currently the autoscaling is only based on CPU not Memory (RAM).

  • 0 kudos
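The parameter from the post would go into the cluster's Spark config (cluster UI > Advanced options > Spark), alongside any other settings. A minimal fragment, with the 600-second value as an illustrative choice and the key name taken as stated in the post:

```python
# Sketch: cluster Spark config entry; the value is seconds, and a higher value
# makes the cluster release workers less aggressively.
spark_conf = {
    "spark.databricks.aggressiveWindowDownS": "600",  # wait 10 minutes before downscaling
}
```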
Hubert-Dudek
by Esteemed Contributor III
  • 18084 Views
  • 10 replies
  • 30 kudos

Selenium chrome driver on databricks driver

On the Databricks community, I see repeated problems regarding the Selenium installation on the Databricks driver. Installing Selenium on Databricks can be surprising, but for example, sometimes we need to g...

init install_library results import
Latest Reply
aa_204
New Contributor II
  • 30 kudos

@Hubert Dudek: I am trying to run the above script, but my Chrome driver installation is failing intermittently. Can you please suggest a solution?

  • 30 kudos
9 More Replies
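Whatever install route is used, the browser itself has to run headless on the driver node. The flags below are the commonly needed ones for containerized/root environments; the binary path in the comment is illustrative, and chromium plus a matching chromedriver are assumed to be installed already (e.g. via an init script).

```python
# Sketch: typical headless Chrome flags for running Selenium on a Databricks driver.
def headless_chrome_flags():
    return [
        "--headless",
        "--no-sandbox",             # needed when the browser runs as root
        "--disable-dev-shm-usage",  # /dev/shm is small on cluster nodes
    ]

flags = headless_chrome_flags()

# With selenium installed you would apply them roughly like this:
#   from selenium import webdriver
#   opts = webdriver.ChromeOptions()
#   opts.binary_location = "/usr/bin/chromium-browser"  # illustrative path
#   for f in flags:
#       opts.add_argument(f)
#   driver = webdriver.Chrome(options=opts)
```

Intermittent failures often come from a chromedriver/browser version mismatch, so pinning both versions in the init script is worth trying.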
SaraCorralLou
by New Contributor III
  • 1017 Views
  • 1 reply
  • 0 kudos

Clean driver during notebook execution

Is there any way to clear the driver memory during the execution of my notebook? I have several functions that are executed on the driver and that generate different dataframes in it that are not necessary (these dataframes are created just to do som...

Latest Reply
-werners-
Esteemed Contributor III
  • 0 kudos

Since Spark uses lazy execution, those dataframes you do not need cannot be cleared unless you do use them (why define them otherwise?). So when doing an action, Spark will execute all code that is necessary. If you run into memory issues, you can do...

  • 0 kudos
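For the driver-side memory you can reclaim yourself, the usual pattern is dropping Python references and forcing a garbage-collection pass, plus unpersisting any cached Spark data. The helper below is plain Python; the Spark calls in the comment are the standard cache-clearing APIs.

```python
# Sketch: reclaim driver memory by dropping references and collecting garbage.
import gc

def drop_intermediates(*objects):
    """Delete local references and force a GC cycle; returns the count dropped."""
    count = len(objects)
    del objects
    gc.collect()
    return count

big_list = list(range(100_000))  # stand-in for a large intermediate object
dropped = drop_intermediates(big_list)
del big_list  # the caller's reference must go too, or nothing is freed

# For Spark DataFrames you would additionally call:
#   df.unpersist()              # if the DataFrame was cached
#   spark.catalog.clearCache()  # drop all cached tables/DataFrames
```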
Biswa9
by New Contributor
  • 1012 Views
  • 1 reply
  • 0 kudos

Why am I getting a parse syntax error while running SQL?

Can anyone please help me with what is wrong in the syntax here: dbutils.fs.ls; Error in SQL statement: ParseException: [PARSE_SYNTAX_ERROR] Syntax error at or near 'dbutils'. (line 1, pos 0)

Latest Reply
Kaniz_Fatma
Community Manager
  • 0 kudos

Hi @Biswa9 , dbutils.fs.ls is typically used in Databricks Notebooks or scripts, and it's not a valid SQL statement. To list files and directories in the Databricks File System (DBFS), you should use it within a Databricks Notebook or script, not in ...

  • 0 kudos
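In other words, `dbutils.fs.ls` is Python (there is also the `%fs ls <path>` cell magic), so a SQL cell rejects it with exactly that ParseException. In a Python cell it returns FileInfo objects; the small helper below works on plain data, with a stand-in type since the real FileInfo only exists on Databricks.

```python
# Sketch: list DBFS files in a Python cell, not a SQL cell, and pull out the names.
from collections import namedtuple

FileInfo = namedtuple("FileInfo", ["path", "name", "size"])  # stand-in for the real type

def names_of(listing):
    return [f.name for f in listing]

# On Databricks: names_of(dbutils.fs.ls("/databricks-datasets/"))
sample = [FileInfo("dbfs:/tmp/a.csv", "a.csv", 10),
          FileInfo("dbfs:/tmp/b.csv", "b.csv", 20)]
```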
GrossoKubo
by New Contributor II
  • 1239 Views
  • 2 replies
  • 3 kudos

Resolved! Export Job Schedules

Hello, I have to export all my notebooks from DEV to PROD. My problem is that I can't find a way to export my jobs (not the outputs, the actual notebook scheduling). Is it even possible? I have hundreds of jobs to export and have to keep the same pa...

Latest Reply
Kaniz_Fatma
Community Manager
  • 3 kudos

Hi @GrossoKubo, We haven't heard from you since the last response from @btafur, and I was checking back to see if his suggestions helped you. If you have a solution, please share it with the community, as it can be helpful to others. Also, P...

  • 3 kudos
1 More Replies
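Exporting job definitions (including schedules) is typically done by reading each job's settings from the Jobs API 2.1 (`jobs/list` then `jobs/get`) and re-creating them in PROD with `jobs/create`. The sketch below works on plain dicts; the job structure shown is illustrative, and workspace-specific fields like IDs and timestamps are dropped so the settings can be re-created elsewhere.

```python
# Sketch: keep only the re-creatable part of a job definition for export.
import json

def extract_portable_settings(job):
    """Return a copy of the job's settings, without IDs or timestamps."""
    return dict(job.get("settings", {}))

job = {
    "job_id": 7,
    "created_time": 1690000000,
    "settings": {
        "name": "nightly",
        "schedule": {"quartz_cron_expression": "0 0 2 * * ?", "timezone_id": "UTC"},
    },
}
exported = json.dumps(extract_portable_settings(job))

# The exported JSON can then be POSTed to /api/2.1/jobs/create in the PROD workspace.
```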
Greg
by New Contributor III
  • 1232 Views
  • 1 reply
  • 4 kudos

How to reduce storage space consumed by delta with many updates

I have one delta table that I continuously append events into, and a second delta table that I continuously merge into (streamed from the first table) that has unique IDs whose properties are updated from the events (an ID represents a unique thing that ge...

Latest Reply
Jb11
New Contributor II
  • 4 kudos

Did you already solve this problem?

  • 4 kudos
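Since the question went unanswered: frequent MERGEs rewrite files, and the old file versions are retained for time travel, so storage grows until VACUUM removes them. The statements below are standard Delta maintenance SQL; the table name and the 7-day (168-hour) retention are illustrative choices.

```python
# Sketch: compact small files and remove unreferenced old file versions.
def cleanup_statements(table, retain_hours=168):
    return [
        f"OPTIMIZE {table}",                            # compact small files
        f"VACUUM {table} RETAIN {retain_hours} HOURS",  # delete unreferenced files
    ]

stmts = cleanup_statements("silver.things")

# On Databricks: for s in stmts: spark.sql(s)
# A retention shorter than the 7-day default also needs the table property
# delta.deletedFileRetentionDuration (and lowering it reduces time travel range).
```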
bfridley
by New Contributor II
  • 1945 Views
  • 2 replies
  • 0 kudos

DLT Pipeline Out Of Memory Errors

I have a DLT pipeline that has been running for weeks. Now, trying to rerun the pipeline with the same code and same data fails. I've even tried updating the compute on the cluster to about 3x of what was previously working and it still fails with ou...

Latest Reply
rajib_bahar_ptg
New Contributor III
  • 0 kudos

I'd focus on understanding the codebase first. It'll help you decide what logic or data asset to keep or not keep when you try to optimize it. If you share the architecture of the application, the problem it solves, and some sample code here, it'll h...

  • 0 kudos
1 More Replies
gkrilis
by New Contributor
  • 3621 Views
  • 1 reply
  • 0 kudos

How to stop SparkSession within notebook without error

I want to run an ETL job and when the job ends I would like to stop SparkSession to free my cluster's resources, by doing this I could avoid restarting the cluster, but when calling spark.stop() the job returns with status failed even though it has f...

Data Engineering
cluster
SparkSession
Latest Reply
PremadasV
New Contributor II
  • 0 kudos

Please refer to this article: Job fails, but Apache Spark tasks finish - Databricks

  • 0 kudos
Martin1
by New Contributor II
  • 6967 Views
  • 3 replies
  • 1 kudos

Referring to Azure Keyvault secrets in spark config

Hi all, In the Spark config for a cluster, it works well to refer to an Azure Key Vault secret in the "value" part of the name/value combo on a config row/setting. For example, this works fine (I've removed the string that is our specific storage account name...

Latest Reply
kp12
New Contributor II
  • 1 kudos

Hello, is there any update on this issue, please? Databricks no longer recommends mounting external locations, so the other way to access Azure storage is to use the Spark config, as mentioned in this document: https://learn.microsoft.com/en-us/azure/databri...

  • 1 kudos
2 More Replies
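For reference, the pattern the thread is discussing uses the `{{secrets/<scope>/<key>}}` placeholder in the value side of a cluster Spark config row. The fragment below is illustrative: the storage account, scope, and key names are made up, and the secret scope is assumed to be backed by the Azure Key Vault in question.

```python
# Sketch: cluster Spark config entry whose value is resolved from a secret scope
# at cluster start, instead of embedding the storage key in plain text.
spark_conf = {
    "fs.azure.account.key.mystorageacct.dfs.core.windows.net":
        "{{secrets/kv-scope/storage-account-key}}",
}
```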
marvin1
by New Contributor III
  • 118 Views
  • 0 replies
  • 0 kudos

Bamboolib error

What is the status of bamboolib? I understand that it is in public preview, but I'm unable to find any support references. I am getting the error below. I've tried installing in a notebook, on a cluster, creating a pandas dataframe and running bam, etc. ...

mbvb_py
by New Contributor II
  • 2128 Views
  • 4 replies
  • 0 kudos

Create cluster error: Backend service unavailable

Hello, I'm new to Databricks (Community Edition account) and encountered a problem just now. When creating a new cluster (default 10.4 LTS), it fails with the following error: Backend service unavailable. I've tried a different runtime > same issue. I've ...

Latest Reply
stefnhuy
New Contributor III
  • 0 kudos

Hey mbvb_py, I'm sorry to hear you're facing this "Backend service unavailable" issue with Databricks. I've encountered similar problems in the past, and it can be frustrating. Don't worry; you're not alone in this! From my experience, this error can o...

  • 0 kudos
3 More Replies
