Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
Forum Posts

kinsun
by New Contributor II
  • 5744 Views
  • 2 replies
  • 3 kudos

Resolved! Azure Key Vault Keys client library for Python - keys list permission issue

Dear Databricks Expert, I am trying to get a key stored in the Azure Key Vault using the Azure Key Vault Keys client library for Python, but I ran into an error. Python code:
#from azure.identity import DefaultAzureCredential
from azure.identity impor...

Latest Reply
Anonymous
Not applicable
  • 3 kudos

Hi @KS LAU​, thank you for posting your question in our community! We are happy to assist you. To help us provide you with the most accurate information, could you please take a moment to review the responses and select the one that best answers your q...

1 More Replies
AnuVat
by New Contributor III
  • 53178 Views
  • 7 replies
  • 13 kudos

Resolved! How to read data from a table into a dataframe outside of Databricks environment?

Hi, I am working on an ML project and I need to access the data in tables hosted in my Databricks cluster through a notebook that I am running locally. This has been very easy while running the notebooks in Databricks, but I cannot figure out how to do ...

Latest Reply
chakri
New Contributor III
  • 13 kudos

We can use APIs and pyodbc to achieve this. Going through the official Databricks documentation might also be helpful for accessing data from outside the Databricks environment.

6 More Replies
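The reply above mentions APIs and pyodbc in passing; a common concrete route is the databricks-sql-connector package. Below is a hedged sketch of that approach — the function names, helper, and all connection values are illustrative placeholders, not a tested recipe from the thread:

```python
# Sketch: read a Databricks table into a local pandas DataFrame using the
# databricks-sql-connector package (pip install databricks-sql-connector pandas).
# Hostname, HTTP path, and token are placeholders for your workspace values.

def build_query(table: str, limit: int = 1000) -> str:
    """Build a simple SELECT for the given table (illustrative helper)."""
    return f"SELECT * FROM {table} LIMIT {limit}"

def read_table_locally(server_hostname: str, http_path: str,
                       access_token: str, table: str):
    """Fetch a table over a Databricks SQL warehouse endpoint into pandas."""
    # Imports live inside the function so the sketch can be read without
    # the packages installed.
    from databricks import sql
    import pandas as pd

    with sql.connect(server_hostname=server_hostname,
                     http_path=http_path,
                     access_token=access_token) as conn:
        with conn.cursor() as cur:
            cur.execute(build_query(table))
            rows = cur.fetchall()
            columns = [desc[0] for desc in cur.description]
    return pd.DataFrame(rows, columns=columns)
```

The connection needs a personal access token and the warehouse's HTTP path, both found in the workspace UI.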
sensanjoy
by Contributor II
  • 6619 Views
  • 3 replies
  • 1 kudos

Resolved! Loading data from dataframe to Azure Storage Queue/Message Queue.

Hi Experts, we have one use case where a batch load creates a dataframe at the end, and now we want to load this data into an Azure Storage Queue/Message Queue so that some REST API can read the data/messages from the queue later and process it acc...

Latest Reply
sensanjoy
Contributor II
  • 1 kudos

@Suteja Kanuri​, looking for your input here. Thanks.

2 More Replies
arlok
by New Contributor
  • 6978 Views
  • 4 replies
  • 1 kudos

Partner session - Data Engineer Associate Course Schedule help

I have been wanting to enroll in the Data Engineer Associate Course, as my company is a Databricks partner. I have been unsuccessful thus far whenever the session has happened in North America and have always been waitlisted. 1. Is there a hack to get a slo...

Latest Reply
jose_gonzalez
Databricks Employee
  • 1 kudos

Hi @Lokesh AR​, just a friendly follow-up: do you still need help? Please let us know.

3 More Replies
SrinuM
by New Contributor III
  • 1823 Views
  • 1 reply
  • 0 kudos

Not able to log in to Databricks Community Edition

I forgot my password, and when I tried to reset it, I got stuck at the reset password page. Could you please help me with it?

Latest Reply
jose_gonzalez
Databricks Employee
  • 0 kudos

Adding @Vidula Khanna​ and @Kaniz Fatma​ for visibility to help with this request

jasperputs
by New Contributor III
  • 18543 Views
  • 5 replies
  • 0 kudos

Databricks SQL Dashboard refresh not updating

I am trying to create a SQL Dashboard on top of a streaming dataset (Delta format). I created multiple queries referencing the file on the datalake, not a hive table. With these queries I created multiple visualizations in a Databricks SQL Dashboard....

Latest Reply
Anonymous
Not applicable
  • 0 kudos

@Jasper Puts​: Here are some potential solutions or next steps you can try: Check the query refresh rate: Confirm that the queries are also set to refresh every minute or less. If the queries are not refreshing at the same rate as the dashboard, this ...

4 More Replies
MarsSu
by New Contributor II
  • 11431 Views
  • 5 replies
  • 1 kudos

Resolved! Zero-downtime deployment of a Spark Structured Streaming Databricks job with Terraform

I would like to ask how to implement zero-downtime deployment of Spark Structured Streaming on Databricks job compute with Terraform, because we will upgrade the Spark application code version. Currently we find that every deployment cancels the original...

Latest Reply
Anonymous
Not applicable
  • 1 kudos

@Mars Su​: Yes, you can implement zero-downtime deployment of Spark Structured Streaming in Databricks job compute using Terraform. One way to achieve this is by using Databricks' "job clusters" feature, which allows you to create a cluster specifica...

4 More Replies
MRTN
by Contributor
  • 2700 Views
  • 1 reply
  • 1 kudos

Columns commit_time and archive_time always NULL when running cloud_files_state

I am attempting to find the commit_time for a given file for a Delta table using the cloud_files_state command. However, the commit_time and archive_time columns are always NULL. I am running Databricks Runtime 11.3 and have also verified ...

Latest Reply
Anonymous
Not applicable
  • 1 kudos

@Morten Stakkeland​: The issue you are facing with the cloud_files_state command is a known limitation in Delta Lake as of the latest stable release (Delta Lake 1.0). The commit_time and protocol columns are always null, and the archive_time column i...

Pbarbosa154
by New Contributor III
  • 2321 Views
  • 2 replies
  • 0 kudos

What is the best way to ingest GCS data into Databricks and apply Anomaly Detection Model?

I recently started exploring the field of Data Engineering and came across some difficulties. I have a bucket in GCS with millions of parquet files and I want to create an Anomaly Detection model with them. I was trying to ingest that data into Datab...

Latest Reply
Anonymous
Not applicable
  • 0 kudos

@Pedro Barbosa​: It seems like you are running out of memory when trying to convert the PySpark dataframe to an H2O frame. One possible approach to solve this issue is to partition the PySpark dataframe before converting it to an H2O frame. You can us...

1 More Replies
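The partition-before-converting advice in the reply can be sketched as follows. The `chunk()` helper shows the batching idea in plain Python; `convert_in_partitions()` hints at the PySpark side (it assumes pyspark is installed, and the actual H2O conversion call depends on your pysparkling setup, so it is left as a comment; all names are illustrative):

```python
# Illustration of "partition before converting". chunk() demonstrates the
# batching idea on a plain list; convert_in_partitions() sketches the
# PySpark equivalent using DataFrame.repartition().

def chunk(items, size):
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]

def convert_in_partitions(spark_df, num_partitions=64):
    """Repartition a PySpark DataFrame into smaller pieces so each one
    fits in memory before it is handed to H2O (requires pyspark)."""
    repartitioned = spark_df.repartition(num_partitions)
    # e.g. h2o_frame = h2o_context.asH2OFrame(repartitioned)  # pysparkling
    return repartitioned
```

Sizing `num_partitions` so that each partition fits comfortably in executor memory is the key tuning knob here.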
joao_albuquerqu
by New Contributor II
  • 14872 Views
  • 2 replies
  • 2 kudos

Is it possible to have Cluster with pre-installed dependencies?

I run some jobs in the Databricks environment where some resources need authentication. I do this (and I need to) through the vault-cli in the init script. However, every time in the init script I need to install vault-cli and other libraries. Is ther...

Latest Reply
Anonymous
Not applicable
  • 2 kudos

@João Victor Albuquerque​: Yes, there are a few ways to pre-install libraries and tools in the Databricks environment: Cluster-scoped init scripts: You can specify a shell script to be run when a cluster is created or restarted. This script can includ...

1 More Replies
__Databricks_Su
by Databricks Employee
  • 109322 Views
  • 17 replies
  • 20 kudos
Latest Reply
luis_herrera
Databricks Employee
  • 20 kudos

To pass arguments/variables to a notebook, you can use a JSON file to temporarily store the arguments and then pass it as one argument to the notebook. After passing the JSON file to the notebook, you can parse it with json.loads(). The argument list...

16 More Replies
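The JSON-file hand-off described in the reply can be shown end to end in a few lines. This is a minimal runnable sketch — the argument names and the temp-file location are made up for illustration, and on Databricks the path would typically arrive via a widget rather than a local variable:

```python
# Minimal sketch: serialize the notebook arguments to a JSON file, pass the
# file path as the single argument, and parse it back with json.loads().
import json
import os
import tempfile

args = {"input_table": "sales.raw", "run_date": "2023-05-01", "retries": 3}

# Caller side: write the arguments to a temporary JSON file.
fd, path = tempfile.mkstemp(suffix=".json")
with os.fdopen(fd, "w") as f:
    json.dump(args, f)

# Notebook side: receive `path` (e.g. via a widget) and parse the contents.
with open(path) as f:
    parsed = json.loads(f.read())

os.remove(path)
assert parsed == args  # round-trip preserves every argument
```

This keeps the notebook interface to a single string parameter no matter how many arguments the job needs.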
Data_Analytics1
by Contributor III
  • 23813 Views
  • 17 replies
  • 24 kudos

Fatal error: The Python kernel is unresponsive.

I am using multithreading in this job, which creates 8 parallel jobs. It fails a few times a day and sometimes gets stuck in one of the Python notebook cell processes, with "The Python process exited with an unknown exit code." The last 10 KB of the process's...

Latest Reply
luis_herrera
Databricks Employee
  • 24 kudos

Hey, it seems that the issue is related to the driver undergoing a memory bottleneck, which causes it to crash with an out of memory (OOM) condition and gets restarted or becomes unresponsive due to frequent full garbage collection. The reason for th...

16 More Replies
source2sea
by Contributor
  • 9851 Views
  • 2 replies
  • 0 kudos

Resolved! pass application.conf file into databricks jobs

I copied my question from a very old post that I responded to and decided to move it here. Context: I have a JAR (Scala) using scala pureconfig (a wrapper of Typesafe Config), and I uploaded an application.conf file to a path which is mounted to the wor...

Latest Reply
source2sea
Contributor
  • 0 kudos

We had to put the conf in the root folder of the mounted path, and that works. Maybe the mounted storage account being Blob instead of ADLS2 is causing the issues.

1 More Replies
mbaumga
by New Contributor III
  • 9377 Views
  • 3 replies
  • 2 kudos

Performance issues when loading an Excel file from DBFS using R

I have uploaded small Excel files on my DBFS. I then use function read_xlsx() from the "readxl" package in R to import the file into the R memory. I use a standard cluster (12.1, non ML). The function works but it takes ages. E.g. a simple Excel tabl...

Latest Reply
Anonymous
Not applicable
  • 2 kudos

Hi @Marcel Baumgartner​, thank you for posting your question in our community! We are happy to assist you. To help us provide you with the most accurate information, could you please take a moment to review the responses and select the one that best an...

2 More Replies