Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

ChristianWuerdi
by New Contributor III
  • 18271 Views
  • 4 replies
  • 5 kudos

Resolved! How can I back up my Databricks instance?

We have a Databricks instance on Azure that has grown somewhat organically, with dozens of users and hundreds of notebooks. How do I conveniently back up this environment so that the notebooks aren't lost if disaster strikes? The data itself is backed by Azure ...

Latest Reply
ChristianWuerdi
New Contributor III
  • 5 kudos

@Kaniz Fatma All good, thanks. A combination of the CLI and gradually migrating everything to Git is a viable solution.

  • 5 kudos
3 More Replies
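
The CLI half of that approach could look roughly like the sketch below. It assumes the legacy `databricks-cli` package is installed and configured, and that its `workspace export_dir` subcommand is used to export a workspace folder recursively; newer CLI releases may spell the subcommand differently, so check `databricks workspace --help` first. The helper names, the `/Users` source path, and the dated backup folder layout are hypothetical.

```python
import subprocess
from datetime import date
from pathlib import Path


def build_export_command(workspace_path: str, target_dir: Path) -> list[str]:
    """Build the CLI call that recursively exports a workspace folder.

    Assumes the legacy databricks-cli syntax:
        databricks workspace export_dir SOURCE_PATH TARGET_PATH
    """
    return ["databricks", "workspace", "export_dir", workspace_path, str(target_dir)]


def backup_workspace(workspace_path: str, backup_root: Path, dry_run: bool = True) -> list[str]:
    """Export notebooks into a dated folder under backup_root (hypothetical layout)."""
    target = backup_root / date.today().isoformat()
    command = build_export_command(workspace_path, target)
    if not dry_run:
        target.mkdir(parents=True, exist_ok=True)
        # Raises CalledProcessError if the CLI exits with a non-zero status.
        subprocess.run(command, check=True)
    return command


if __name__ == "__main__":
    # Dry run: only print the command that would be executed.
    print(" ".join(backup_workspace("/Users", Path("notebook-backups"))))
```

Running this on a schedule and committing the exported folder to Git would cover the interim period while notebooks are being migrated to source control.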
StephanieAlba
by Databricks Employee
  • 9421 Views
  • 2 replies
  • 5 kudos

Resolved! How to add a select all option in a Databricks SQL parameter? I would like to use a query-based drop-down list.

I want to create a "select all" option in a parameter. The parameter has around 200 options because of the size of the database. However, if I want a general summary that covers all the options, I would have to select them one by one and that...

Latest Reply
StephanieAlba
Databricks Employee
  • 5 kudos

You could add '--- All Stores ---' to your list. Here is the query I would use to populate the drop-down (S.O. answer here): SELECT store AS store_name FROM (SELECT DISTINCT store FROM Table UNION ALL SELECT ...

  • 5 kudos
1 More Replies
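
The sentinel-row idea from the reply can be tried end to end with the sketch below. It uses Python's built-in `sqlite3` only so that it runs anywhere; in Databricks SQL the selected value would come from the query parameter rather than a bind variable. The `sales` table, the `amount` column, and the helper names are hypothetical and not taken from the thread.

```python
import sqlite3

ALL_STORES = "--- All Stores ---"

# Populates the drop-down: every distinct store plus the sentinel entry.
DROPDOWN_QUERY = """
SELECT store AS store_name FROM (
    SELECT DISTINCT store FROM sales
    UNION ALL
    SELECT :all_stores AS store
)
ORDER BY store_name
"""

# Honours the sentinel: the store filter applies only when a real store is chosen.
SUMMARY_QUERY = """
SELECT store, SUM(amount) AS total
FROM sales
WHERE :selected = :all_stores OR store = :selected
GROUP BY store
ORDER BY store
"""


def demo_connection() -> sqlite3.Connection:
    """Create an in-memory table with a few hypothetical rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (store TEXT, amount INTEGER)")
    conn.executemany(
        "INSERT INTO sales VALUES (?, ?)",
        [("North", 10), ("South", 5), ("North", 7)],
    )
    return conn


def dropdown_options(conn: sqlite3.Connection) -> list[str]:
    rows = conn.execute(DROPDOWN_QUERY, {"all_stores": ALL_STORES}).fetchall()
    return [row[0] for row in rows]


def summary(conn: sqlite3.Connection, selected: str) -> list[tuple]:
    params = {"selected": selected, "all_stores": ALL_STORES}
    return conn.execute(SUMMARY_QUERY, params).fetchall()


if __name__ == "__main__":
    conn = demo_connection()
    print(dropdown_options(conn))
    print(summary(conn, ALL_STORES))
    print(summary(conn, "South"))
```

The leading dashes make the sentinel sort ahead of ordinary store names, so it appears at the top of the list.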
pantelis_mare
by Contributor III
  • 7410 Views
  • 4 replies
  • 5 kudos

Resolved! Slow imports for concurrent notebooks

Hello all, I have a large number of light notebooks to run, so I am taking the concurrent approach and launching notebook runs with dbutils.notebook.run in parallel. The more I increase parallelism, the more I see the duration of each notebook increasing. I ...

Latest Reply
pantelis_mare
Contributor III
  • 5 kudos

Hello @Kaniz Fatma, yes, it is clear. Following some tests on my side using a ***** notebook that does nothing but import modules and sleep for 15 seconds (so nothing to do with Spark), I figured that even with a 32-core driver, the fatigue point is clo...

  • 5 kudos
3 More Replies
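
One way to look for that fatigue point is to cap the number of runs in flight and time each one. The sketch below is a minimal, hedged illustration: `run_all` and its parameters are hypothetical names, and `run_notebook` stands in for whatever launches a notebook (on Databricks it could wrap `dbutils.notebook.run`, which is not available outside a notebook and is therefore not called here).

```python
import time
from concurrent.futures import ThreadPoolExecutor


def run_all(paths, run_notebook, max_parallel=8):
    """Run every notebook path with at most max_parallel runs in flight.

    run_notebook is any callable that takes a path and returns a result.
    Returns a dict mapping each path to (result, elapsed_seconds).
    """
    def timed(path):
        start = time.perf_counter()
        result = run_notebook(path)
        return path, result, time.perf_counter() - start

    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        return {path: (result, secs) for path, result, secs in pool.map(timed, paths)}


if __name__ == "__main__":
    def fake_notebook(path):
        # Stand-in for a light notebook: just sleeps briefly.
        time.sleep(0.05)
        return f"done:{path}"

    # Compare wall-clock time at different parallelism levels.
    for parallelism in (1, 4):
        start = time.perf_counter()
        run_all([f"/nb/{i}" for i in range(8)], fake_notebook, parallelism)
        print(parallelism, round(time.perf_counter() - start, 2))
```

Sweeping `max_parallel` and plotting the per-notebook durations would show where additional parallelism stops paying off on a given driver.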
Anonymous
by Not applicable
  • 3857 Views
  • 3 replies
  • 2 kudos

Resolved! Job API keeps saying the job is running

I have a library that waits until the job goes into the "TERMINATED" / "SKIPPED" state before continuing. It polls the Jobs API. Unfortunately, I'm experiencing cases where the job is terminated in the GUI but the API still keeps saying "RUNNING". There i...

Latest Reply
Prabakar
Databricks Employee
  • 2 kudos

@Alessio Palma could you please tell us which API you are using? Please also share some sample output and logs to help us investigate.

  • 2 kudos
2 More Replies
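
A polling loop of the kind described in the question is usually safer with a timeout, so that a run stuck in "RUNNING" cannot block the caller forever. The sketch below is an assumption-laden illustration, not the poster's library: `get_state` is a hypothetical callable that returns the run's life-cycle state as a string (for example, read from the `state.life_cycle_state` field of a Jobs API run response), and treating "INTERNAL_ERROR" as terminal alongside the two states named in the post is an assumption.

```python
import time

# Life-cycle states after which a run is not expected to change any more.
# TERMINATED and SKIPPED come from the post; INTERNAL_ERROR is an assumption.
TERMINAL_STATES = {"TERMINATED", "SKIPPED", "INTERNAL_ERROR"}


def wait_for_run(get_state, poll_seconds=10.0, timeout_seconds=3600.0, sleep=time.sleep):
    """Poll get_state() until it returns a terminal state or the timeout passes.

    Raises TimeoutError if the run is still non-terminal at the deadline.
    """
    deadline = time.monotonic() + timeout_seconds
    while True:
        state = get_state()
        if state in TERMINAL_STATES:
            return state
        if time.monotonic() >= deadline:
            raise TimeoutError(f"run still in state {state!r} after {timeout_seconds}s")
        sleep(poll_seconds)


if __name__ == "__main__":
    # Simulated sequence of API responses, polled without real waiting.
    states = iter(["PENDING", "RUNNING", "RUNNING", "TERMINATED"])
    print(wait_for_run(lambda: next(states), poll_seconds=0))
```

Logging each polled state together with the run ID would also produce the kind of sample output requested in the reply.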
Serhii
by Contributor
  • 4306 Views
  • 2 replies
  • 6 kudos

Resolved! DBFS FileStore html document not showing in the browser

Hello all! I am using the guide https://docs.databricks.com/data/filestore.html to save a folder of static HTML content to the DBFS FileStore directory (as a sub-directory) and have the "enable DBFS web browsing" setting turned on, but I still can't view the web p...

Latest Reply
Prabakar
Databricks Employee
  • 6 kudos

@Sergii Ivakhno​ In FileStore you can save files, such as images and libraries, that are accessible within HTML and JavaScript when you call displayHTML. However when you try to access the link it will download the file to your local desktop.

  • 6 kudos
1 More Replies
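
To illustrate the reply, the sketch below builds the kind of HTML string one might pass to displayHTML for an image stored in FileStore. It assumes, following the FileStore guide linked in the question, that a file saved under `/FileStore/` is referenced from HTML with a relative `files/` path; verify that mapping against the guide for your workspace. The helper names and example paths are hypothetical.

```python
from html import escape


def filestore_src(dbfs_path: str) -> str:
    """Map a DBFS FileStore path to the relative 'files/...' form used in HTML.

    Assumption: /FileStore/<path> is served as files/<path>.
    """
    path = dbfs_path.removeprefix("dbfs:")
    prefix = "/FileStore/"
    if not path.startswith(prefix):
        raise ValueError(f"not a FileStore path: {dbfs_path!r}")
    return "files/" + path[len(prefix):]


def image_tag(dbfs_path: str) -> str:
    """Build an <img> tag that could be passed to displayHTML in a notebook."""
    return f'<img src="{escape(filestore_src(dbfs_path), quote=True)}">'


if __name__ == "__main__":
    print(image_tag("dbfs:/FileStore/site/logo.png"))
```

This covers assets embedded in notebook output; as the reply notes, opening the file link directly downloads the file instead of rendering it.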
my_community2
by New Contributor III
  • 11185 Views
  • 8 replies
  • 1 kudos

Running notebooks on DataBricks in Azure blowing up all over since morning of Apr 5 (MST). Was there another poor deployment at DataBricks? This reall...

Notebooks running on Databricks in Azure have been blowing up all over since the morning of Apr 5 (MST). Was there another poor deployment at Databricks? This really needs to stop. We are running premium Databricks on Azure and calling notebooks from ADF. 10.2 (inc...

Latest Reply
Prabakar
Databricks Employee
  • 1 kudos

@Maciej G try using the init script below to increase the REPL timeout.
--------------------------------------
#!/bin/bash
cat > /databricks/common/conf/set_repl_timeout.conf << EOL
{
  databricks.daemon.driver.launchTimeout = 150
}
EOL
----------------...

  • 1 kudos
7 More Replies
mo91
by New Contributor III
  • 6913 Views
  • 4 replies
  • 9 kudos

Resolved! Community edition - RestException: PERMISSION_DENIED: Model Registry is not enabled for organization 2183541758974102.

Currently running this command:
model_name = "Quality"
model_version = mlflow.register_model(f"runs:/{run_id}/random_forest_model", model_name)
# Registering the model takes a few seconds, so add a small delay
time.sleep(15)
However, I get this error:
RestEx...

Latest Reply
Prabakar
Databricks Employee
  • 9 kudos

@Martin Olowe​ There are certain limitations with the community edition and you do not have this feature there. To use this you need to go with the commercial version of Databricks as mentioned by @Hubert Dudek​ .

  • 9 kudos
3 More Replies
harish_s
by New Contributor II
  • 7628 Views
  • 3 replies
  • 4 kudos

Resolved! Hi, I get the following error when I enable model serving for spacy model via MLFLOW.

+ echo 'GUNICORN_CMD_ARGS=--timeout 63 --workers 4 '
GUNICORN_CMD_ARGS=--timeout 63 --workers 4
+ mlflow models serve --no-conda -m /tmp/tmp1a4ltdrk/spacymodelv1 -h unix:/tmp/3.sock -p1
2022/03/01 08:26:37 INFO mlflow.models.cli: Selected backend for f...

Latest Reply
Prabakar
Databricks Employee
  • 4 kudos

Hi @Harish S​ this error could happen if the backend services are not updated. Are you doing this test in a PVC environment or a standard workspace?

  • 4 kudos
2 More Replies
rsp334
by New Contributor II
  • 2231 Views
  • 0 replies
  • 3 kudos

Databricks quickstart cloudformation error

Has anyone recently encountered the following error in a CloudFormation stack while attempting to create a Databricks quickstart workspace in AWS?
[ERROR] 2022-05-17T16:25:35.920Z 6593c6c0-677c-4918-bcb2-0f5fc9a1c482 Exception: An error occurred (AccessDen...

Doaa_Rashad
by New Contributor III
  • 14733 Views
  • 7 replies
  • 8 kudos

Resolved! Import GitHub repo into Databricks

I am trying to import some data from a public repo in GitHub so that I can use it from my Databricks notebooks. So far I have tried to connect my Databricks account with my GitHub account as described here, without results though, since it seems that GitHub support co...

Latest Reply
User16753725182
Databricks Employee
  • 8 kudos

Hi @Doaa MohamedRashad, to access this setting you must be an Admin. Please check whether you have 'Repos' enabled in the Admin Console --> Workspace settings --> Repos.

  • 8 kudos
6 More Replies
FRG96
by New Contributor III
  • 28870 Views
  • 4 replies
  • 7 kudos

Resolved! How to programmatically get the Spark Job ID of a running Spark Task?

In Spark we can get the Spark Application ID inside the Task programmatically using:
SparkEnv.get.blockManager.conf.getAppId
and we can get the Stage ID and Task Attempt ID of the running Task using:
TaskContext.get.stageId
TaskContext.get.taskAttemptId...

Latest Reply
FRG96
New Contributor III
  • 7 kudos

Hi @Gaurav Rupnar​ , I have Spark SQL UDFs (implemented as Scala methods) in which I want to get the details of the Spark SQL query that called the UDF, especially a unique query ID, which in SparkSQL is the Spark Job ID. That's why I wanted a way to...

  • 7 kudos
3 More Replies
Bharat105
by New Contributor
  • 1311 Views
  • 0 replies
  • 0 kudos

Unable to complete signup

I am trying to sign up on Databricks for my organization's use. I am unable to complete the signup because I am not receiving any email. Please help.

Thom
by New Contributor
  • 745 Views
  • 0 replies
  • 0 kudos

There seem to be missing lesson files in the repo I downloaded for the Data Engineering with Databricks course. The lesson Advanced SQL Transformati...

There seem to be lesson files missing from the repo I downloaded for the Data Engineering with Databricks course. The lesson Advanced SQL Transformations refers to files that aren't in the repo. One or two other lessons were missing as well.

NicolasJ
by New Contributor
  • 5821 Views
  • 0 replies
  • 0 kudos

How to use Apache Sedona on Delta Live tables?

Hello, I am trying to run some geospatial transformations in Delta Live Tables using Apache Sedona. I tried defining a minimal example pipeline that demonstrates the problem I encounter. In the first cell of my notebook, I install the apache-sedona Python package: %pip...

alejandrofm
by Valued Contributor
  • 3961 Views
  • 1 replies
  • 5 kudos

Resolved! How to set a global checkpoint for all notebooks?

I have several users doing data analysis in Databricks Spark notebooks and everything runs smoothly. Now I want to make sure that the checkpoint directory is configured at cluster start, so that users don't have to set it in each notebook (ending up in a lot o...

Latest Reply
Hubert-Dudek
Databricks MVP
  • 5 kudos

@Alejandro Martinez, for streaming jobs there is such a setting, but for other jobs I couldn't find one. See the Spark conf reference, Configuration - Spark 3.2.1 Documentation (apache.org): spark.sql.streaming.checkpointLocation

  • 5 kudos