Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
Forum Posts

kinsun
by New Contributor II
  • 5744 Views
  • 2 replies
  • 3 kudos

Resolved! Azure Key Vault Keys client library for Python - keys list permission issue

Dear Databricks Expert, I am trying to get a key stored in the Azure Key Vault using the Azure Key Vault Keys client library for Python, but I ran into an error. Python code:
#from azure.identity import DefaultAzureCredential
from azure.identity impor...

Latest Reply
Anonymous
Not applicable
  • 3 kudos

Hi @KS LAU​, thank you for posting your question in our community! We are happy to assist you. To help us provide you with the most accurate information, could you please take a moment to review the responses and select the one that best answers your q...

1 More Replies
AnuVat
by New Contributor III
  • 53178 Views
  • 7 replies
  • 13 kudos

Resolved! How to read data from a table into a dataframe outside of Databricks environment?

Hi, I am working on an ML project and I need to access the data in tables hosted in my Databricks cluster through a notebook that I am running locally. This has been very easy while running the notebooks in Databricks, but I cannot figure out how to do ...

Latest Reply
chakri
New Contributor III
  • 13 kudos

We can use APIs and pyodbc to achieve this. Going through the official Databricks documentation might also be helpful for accessing data from outside the Databricks environment.

6 More Replies
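The reply above mentions APIs and pyodbc in passing; a common concrete route is the databricks-sql-connector package. Below is a hedged sketch of that approach — the function names, helper, and all connection values are illustrative placeholders, not a tested recipe from the thread:

```python
# Sketch: read a Databricks table into a local pandas DataFrame using the
# databricks-sql-connector package (pip install databricks-sql-connector pandas).
# Hostname, HTTP path, and token are placeholders for your workspace values.

def build_query(table: str, limit: int = 1000) -> str:
    """Build a simple SELECT for the given table (illustrative helper)."""
    return f"SELECT * FROM {table} LIMIT {limit}"

def read_table_locally(server_hostname: str, http_path: str,
                       access_token: str, table: str):
    """Fetch a table over a Databricks SQL warehouse endpoint into pandas."""
    # Imports live inside the function so the sketch can be read without
    # the packages installed.
    from databricks import sql
    import pandas as pd

    with sql.connect(server_hostname=server_hostname,
                     http_path=http_path,
                     access_token=access_token) as conn:
        with conn.cursor() as cur:
            cur.execute(build_query(table))
            rows = cur.fetchall()
            columns = [desc[0] for desc in cur.description]
    return pd.DataFrame(rows, columns=columns)
```

The connection needs a personal access token and the warehouse's HTTP path, both found in the workspace UI.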
sensanjoy
by Contributor II
  • 6619 Views
  • 3 replies
  • 1 kudos

Resolved! Loading data from dataframe to Azure Storage Queue/Message Queue.

Hi Experts, we have one use case where a batch load creates a dataframe at the end, and now we want to load this data into an Azure Storage Queue/Message Queue so that some REST API can read the data/messages from the queue later and process it acc...

Latest Reply
sensanjoy
Contributor II
  • 1 kudos

@Suteja Kanuri​, looking for your input here. Thanks.

2 More Replies
arlok
by New Contributor
  • 6978 Views
  • 4 replies
  • 1 kudos

Partner session - Data Engineer Associate Course Schedule help

I have been wanting to enroll in the Data Engineer Associate Course, as my company is a Databricks partner. I have been unsuccessful thus far whenever the session has happened in North America and have always been waitlisted. 1. Is there a hack to get a slo...

Latest Reply
jose_gonzalez
Databricks Employee
  • 1 kudos

Hi @Lokesh AR​, just a friendly follow-up: do you still need help? Please let us know.

3 More Replies
SrinuM
by New Contributor III
  • 1823 Views
  • 1 reply
  • 0 kudos

Not able to log in to Databricks Community Edition

I forgot my password, and when I tried to reset it, I got stuck at the reset password page. Could you please help me with it?

Latest Reply
jose_gonzalez
Databricks Employee
  • 0 kudos

Adding @Vidula Khanna​ and @Kaniz Fatma​ for visibility to help with this request

jasperputs
by New Contributor III
  • 18543 Views
  • 5 replies
  • 0 kudos

Databricks SQL Dashboard refresh not updating

I am trying to create a SQL Dashboard on top of a streaming dataset (Delta format). I created multiple queries referencing the file on the datalake, not a hive table. With these queries I created multiple visualizations in a Databricks SQL Dashboard....

Latest Reply
Anonymous
Not applicable
  • 0 kudos

@Jasper Puts​: Here are some potential solutions or next steps you can try: Check the query refresh rate: Confirm that the queries are also set to refresh every minute or less. If the queries are not refreshing at the same rate as the dashboard, this ...

4 More Replies
MarsSu
by New Contributor II
  • 11431 Views
  • 5 replies
  • 1 kudos

Resolved! Zero-downtime deployment of a Spark Structured Streaming Databricks job with Terraform

I would like to ask how to implement zero-downtime deployment of Spark Structured Streaming on Databricks job compute with Terraform, because we will upgrade the Spark application code version. Currently we find that every deployment cancels the original...

Latest Reply
Anonymous
Not applicable
  • 1 kudos

@Mars Su​: Yes, you can implement zero-downtime deployment of Spark Structured Streaming in Databricks job compute using Terraform. One way to achieve this is by using Databricks' "job clusters" feature, which allows you to create a cluster specifica...

4 More Replies
MRTN
by Contributor
  • 2700 Views
  • 1 reply
  • 1 kudos

Columns commit_time and archive_time always NULL when running cloud_files_state

I am attempting to find the commit_time for a given file for a Delta table using the cloud_files_state command. However, the commit_time and archive_time columns are always NULL. I am running Databricks Runtime 11.3 and have also verified ...

Latest Reply
Anonymous
Not applicable
  • 1 kudos

@Morten Stakkeland​: The issue you are facing with the cloud_files_state command is a known limitation in Delta Lake as of the latest stable release (Delta Lake 1.0). The commit_time and protocol columns are always null, and the archive_time column i...

Pbarbosa154
by New Contributor III
  • 2321 Views
  • 2 replies
  • 0 kudos

What is the best way to ingest GCS data into Databricks and apply Anomaly Detection Model?

I recently started exploring the field of Data Engineering and came across some difficulties. I have a bucket in GCS with millions of parquet files and I want to create an Anomaly Detection model with them. I was trying to ingest that data into Datab...

Latest Reply
Anonymous
Not applicable
  • 0 kudos

@Pedro Barbosa​: It seems like you are running out of memory when trying to convert the PySpark dataframe to an H2O frame. One possible approach to solve this issue is to partition the PySpark dataframe before converting it to an H2O frame. You can us...

1 More Replies
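The partition-before-converting advice in the reply can be sketched as follows. The `chunk()` helper shows the batching idea in plain Python; `convert_in_partitions()` hints at the PySpark side (it assumes pyspark is installed, and the actual H2O conversion call depends on your pysparkling setup, so it is left as a comment; all names are illustrative):

```python
# Illustration of "partition before converting". chunk() demonstrates the
# batching idea on a plain list; convert_in_partitions() sketches the
# PySpark equivalent using DataFrame.repartition().

def chunk(items, size):
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]

def convert_in_partitions(spark_df, num_partitions=64):
    """Repartition a PySpark DataFrame into smaller pieces so each one
    fits in memory before it is handed to H2O (requires pyspark)."""
    repartitioned = spark_df.repartition(num_partitions)
    # e.g. h2o_frame = h2o_context.asH2OFrame(repartitioned)  # pysparkling
    return repartitioned
```

Sizing `num_partitions` so that each partition fits comfortably in executor memory is the key tuning knob here.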
joao_albuquerqu
by New Contributor II
  • 14872 Views
  • 2 replies
  • 2 kudos

Is it possible to have Cluster with pre-installed dependencies?

I run some jobs in the Databricks environment where some resources need authentication. I do this (and I need to) through the vault-cli in the init script. However, every time in the init script I need to install vault-cli and other libraries. Is ther...

Latest Reply
Anonymous
Not applicable
  • 2 kudos

@João Victor Albuquerque​: Yes, there are a few ways to pre-install libraries and tools in the Databricks environment: Cluster-scoped init scripts: You can specify a shell script to be run when a cluster is created or restarted. This script can includ...

1 More Replies
__Databricks_Su
by Databricks Employee
  • 109322 Views
  • 17 replies
  • 20 kudos
Latest Reply
luis_herrera
Databricks Employee
  • 20 kudos

To pass arguments/variables to a notebook, you can use a JSON file to temporarily store the arguments and then pass it as one argument to the notebook. After passing the JSON file to the notebook, you can parse it with json.loads(). The argument list...

16 More Replies
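The JSON-file hand-off described in the reply can be shown end to end in a few lines. This is a minimal runnable sketch — the argument names and the temp-file location are made up for illustration, and on Databricks the path would typically arrive via a widget rather than a local variable:

```python
# Minimal sketch: serialize the notebook arguments to a JSON file, pass the
# file path as the single argument, and parse it back with json.loads().
import json
import os
import tempfile

args = {"input_table": "sales.raw", "run_date": "2023-05-01", "retries": 3}

# Caller side: write the arguments to a temporary JSON file.
fd, path = tempfile.mkstemp(suffix=".json")
with os.fdopen(fd, "w") as f:
    json.dump(args, f)

# Notebook side: receive `path` (e.g. via a widget) and parse the contents.
with open(path) as f:
    parsed = json.loads(f.read())

os.remove(path)
assert parsed == args  # round-trip preserves every argument
```

This keeps the notebook interface to a single string parameter no matter how many arguments the job needs.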
Data_Analytics1
by Contributor III
  • 23813 Views
  • 17 replies
  • 24 kudos

Fatal error: The Python kernel is unresponsive.

I am using multithreading in this job, which creates 8 parallel jobs. It fails a few times a day and sometimes gets stuck in one of the Python notebook cell processes, with "The Python process exited with an unknown exit code." The last 10 KB of the process's...

Latest Reply
luis_herrera
Databricks Employee
  • 24 kudos

Hey, it seems that the issue is related to the driver undergoing a memory bottleneck, which causes it to crash with an out of memory (OOM) condition and gets restarted or becomes unresponsive due to frequent full garbage collection. The reason for th...

16 More Replies
source2sea
by Contributor
  • 9851 Views
  • 2 replies
  • 0 kudos

Resolved! pass application.conf file into databricks jobs

I copied my question from a very old post that I responded to and decided to move it here. Context: I have a JAR (Scala) using scala pureconfig (a wrapper of Typesafe Config), and I uploaded an application.conf file to a path which is mounted to the wor...

Latest Reply
source2sea
Contributor
  • 0 kudos

We had to put the conf in the root folder of the mounted path, and that works. Maybe the mounted storage account being Blob instead of ADLS2 is causing the issues.

1 More Replies
mbaumga
by New Contributor III
  • 9377 Views
  • 3 replies
  • 2 kudos

Performance issues when loading an Excel file from DBFS using R

I have uploaded small Excel files on my DBFS. I then use function read_xlsx() from the "readxl" package in R to import the file into the R memory. I use a standard cluster (12.1, non ML). The function works but it takes ages. E.g. a simple Excel tabl...

Latest Reply
Anonymous
Not applicable
  • 2 kudos

Hi @Marcel Baumgartner​, thank you for posting your question in our community! We are happy to assist you. To help us provide you with the most accurate information, could you please take a moment to review the responses and select the one that best an...

2 More Replies