Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

CDICSteph
by New Contributor
  • 2449 Views
  • 2 replies
  • 0 kudos

Need pattern for loading a million small XML files

Hi, looking for the right solution pattern for this scenario: We have millions of relatively small XML files (currently sitting in ADLS) that we have to load into delta lake. Each XML file has to be read, parsed, and pivoted before writing to a delta...

Latest Reply
Anonymous
Not applicable
  • 0 kudos

Hi @Steph Swierenga, thank you for posting your question in our community! We are happy to assist you. To help us provide you with the most accurate information, could you please take a moment to review the responses and select the one that best answe...

  • 0 kudos
1 More Replies
Krish1
by New Contributor II
  • 8913 Views
  • 4 replies
  • 0 kudos

Error while mounting ADLS in python using AccountKey

I'm using the code below with an account key to mount ADLS in Python, but I'm running into this error: shaded.databricks.org.apache.hadoop.fs.azure.AzureException: java.lang.IllegalArgumentException: The String is not a valid Base64-encoded string. Can you pleas...

Latest Reply
Anonymous
Not applicable
  • 0 kudos

Hi @Krish Lam, thank you for posting your question in our community! We are happy to assist you. To help us provide you with the most accurate information, could you please take a moment to review the responses and select the one that best answers you...

  • 0 kudos
3 More Replies
Tripalink
by New Contributor III
  • 4297 Views
  • 6 replies
  • 2 kudos

Resolved! Auto-Suggestion Completes with Return, want only Tab

How do I adjust the settings for the auto-suggestion? If I click Return, then it fills in the suggestion. If I click Tab, then it fills in the suggestion. I would really like it to only use the auto-suggestion value when I click Tab. How can I change...

Latest Reply
Anonymous
Not applicable
  • 2 kudos

Hi @Dagart Allison, thank you for posting your question in our community! We are happy to assist you. To help us provide you with the most accurate information, could you please take a moment to review the responses and select the one that best answer...

  • 2 kudos
5 More Replies
kinsun
by New Contributor II
  • 3836 Views
  • 2 replies
  • 3 kudos

Resolved! Azure Key Vault Keys client library for Python - keys list permission issue

Dear Databricks Expert, I am trying to get a key that is stored in Azure Key Vault, using the Azure Key Vault Keys client library for Python. However, I ran into an error. Python Code: #from azure.identity import DefaultAzureCredential from azure.identity impor...

Latest Reply
Anonymous
Not applicable
  • 3 kudos

Hi @KS LAU, thank you for posting your question in our community! We are happy to assist you. To help us provide you with the most accurate information, could you please take a moment to review the responses and select the one that best answers your q...

  • 3 kudos
1 More Replies
AnuVat
by New Contributor III
  • 39266 Views
  • 7 replies
  • 13 kudos

Resolved! How to read data from a table into a dataframe outside of Databricks environment?

Hi, I am working on an ML project and I need to access the data in tables hosted in my Databricks cluster through a notebook that I am running locally. This has been very easy while I run the notebooks in Databricks but I cannot figure out how to do ...

Latest Reply
chakri
New Contributor III
  • 13 kudos

We can use APIs and pyodbc to achieve this. Please go through the official Databricks documentation, which may help with accessing the data from outside the Databricks environment.

  • 13 kudos
6 More Replies
sensanjoy
by Contributor
  • 4586 Views
  • 3 replies
  • 1 kudos

Resolved! Loading data from dataframe to Azure Storage Queue/Message Queue.

Hi Experts, we have a use case where a batch load creates a dataframe at the end, and we now want to load this data into an Azure Storage Queue/Message Queue so that a REST API can read the data/messages from the queue later and process it acc...

Latest Reply
sensanjoy
Contributor
  • 1 kudos

@Suteja Kanuri, looking for your input here. Thanks.

  • 1 kudos
2 More Replies
arlok
by New Contributor
  • 6122 Views
  • 4 replies
  • 1 kudos

Partner session - Data Engineer Associate Course Schedule help

I have been wanting to enroll in the Data Engineer Associate Course, as my company is a Databricks partner. I have been unsuccessful thus far whenever the session has happened in North America and have always been waitlisted. 1. Is there a hack to get a slo...

Latest Reply
jose_gonzalez
Databricks Employee
  • 1 kudos

Hi @Lokesh AR, just a friendly follow-up. Do you still need help? Please let us know if you do.

  • 1 kudos
3 More Replies
SrinuM
by New Contributor III
  • 1230 Views
  • 1 replies
  • 0 kudos

Not able to log in to Databricks Community Edition

I forgot my password, and when I try to reset it, the process gets stuck on the reset password page. Could you please help me with this?

Latest Reply
jose_gonzalez
Databricks Employee
  • 0 kudos

Adding @Vidula Khanna and @Kaniz Fatma for visibility to help with this request.

  • 0 kudos
jasperputs
by New Contributor III
  • 12561 Views
  • 5 replies
  • 0 kudos

Databricks SQL Dashboard refresh not updating

I am trying to create a SQL Dashboard on top of a streaming dataset (Delta format). I created multiple queries referencing the file on the datalake, not a hive table. With these queries I created multiple visualizations in a Databricks SQL Dashboard....

Latest Reply
Anonymous
Not applicable
  • 0 kudos

@Jasper Puts: Here are some potential solutions or next steps you can try: Check the query refresh rate: Confirm that the queries are also set to refresh every minute or less. If the queries are not refreshing at the same rate as the dashboard, this ...

  • 0 kudos
4 More Replies
MarsSu
by New Contributor II
  • 9209 Views
  • 5 replies
  • 1 kudos

Resolved! Databricks job about spark structured streaming zero downtime deployment in terraform.

I would like to ask how to implement zero-downtime deployment of Spark Structured Streaming on Databricks job compute with Terraform, because we will be upgrading the Spark application code version. Currently, we have found that every deployment will cancel original...

Latest Reply
Anonymous
Not applicable
  • 1 kudos

@Mars Su: Yes, you can implement zero-downtime deployment of Spark Structured Streaming in Databricks job compute using Terraform. One way to achieve this is by using Databricks' "job clusters" feature, which allows you to create a cluster specifica...

  • 1 kudos
4 More Replies
MRTN
by New Contributor III
  • 1612 Views
  • 1 replies
  • 1 kudos

Columns archive_time, commit_time, archive_time always NULL when running cloud_files_state

I am attempting to find the commit_time for a given file for a Delta table using the cloud_files_state command. However, the archive_time, commit_time, and archive_time columns are always NULL. I am running Databricks Runtime 11.3 and have also verified ...

Latest Reply
Anonymous
Not applicable
  • 1 kudos

@Morten Stakkeland: The issue you are facing with the cloud_files_state command is a known limitation in Delta Lake as of the latest stable release (Delta Lake 1.0). The commit_time and protocol columns are always null, and the archive_time column i...

  • 1 kudos
Pbarbosa154
by New Contributor III
  • 1380 Views
  • 2 replies
  • 0 kudos

What is the best way to ingest GCS data into Databricks and apply Anomaly Detection Model?

I recently started exploring the field of Data Engineering and came across some difficulties. I have a bucket in GCS with millions of parquet files and I want to create an Anomaly Detection model with them. I was trying to ingest that data into Datab...

Latest Reply
Anonymous
Not applicable
  • 0 kudos

@Pedro Barbosa: It seems like you are running out of memory when trying to convert the PySpark dataframe to an H2O frame. One possible approach to solve this issue is to partition the PySpark dataframe before converting it to an H2O frame. You can us...

  • 0 kudos
1 More Replies
joao_albuquerqu
by New Contributor II
  • 12769 Views
  • 2 replies
  • 2 kudos

Is it possible to have Cluster with pre-installed dependencies?

I run some jobs in the Databricks environment where some resources need authentication. I do this (and I need to) through the vault-cli in the init script. However, every time in the init script I need to install vault-cli and other libraries. Is ther...

Latest Reply
Anonymous
Not applicable
  • 2 kudos

@João Victor Albuquerque: Yes, there are a few ways to pre-install libraries and tools in the Databricks environment: Cluster-scoped init scripts: You can specify a shell script to be run when a cluster is created or restarted. This script can includ...

  • 2 kudos
1 More Replies
__Databricks_Su
by Contributor
  • 101087 Views
  • 17 replies
  • 20 kudos
Latest Reply
luis_herrera
Databricks Employee
  • 20 kudos

To pass arguments/variables to a notebook, you can use a JSON file to temporarily store the arguments and then pass it as one argument to the notebook. After passing the JSON file to the notebook, you can parse it with json.loads(). The argument list...
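A minimal sketch of the JSON-file approach described in this reply, using only the Python standard library. The helper names `write_args` and `read_args`, the file name `notebook_args.json`, and the sample arguments are all hypothetical; how the file path actually reaches the notebook (for example, as a single notebook parameter) depends on your setup and is not shown here.

```python
import json
import tempfile
from pathlib import Path


def write_args(args: dict, directory: str) -> str:
    """Serialize all arguments into one JSON file and return its path."""
    path = Path(directory) / "notebook_args.json"  # hypothetical file name
    path.write_text(json.dumps(args))
    return str(path)


def read_args(path: str) -> dict:
    """Read the JSON file back and parse its contents with json.loads()."""
    return json.loads(Path(path).read_text())


# Caller side: bundle every argument into a single JSON file, so only
# one value (the file path) has to be handed to the notebook.
with tempfile.TemporaryDirectory() as tmp:
    args_path = write_args({"table": "sales", "limit": 100}, tmp)

    # Notebook side: receive the path as the one argument and parse it.
    params = read_args(args_path)

print(params["table"], params["limit"])
```

Bundling everything into one file keeps the notebook's interface to a single parameter, at the cost of having to manage (and clean up) the temporary file yourself.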

  • 20 kudos
16 More Replies
