Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

carlosna
by New Contributor II
  • 43980 Views
  • 0 replies
  • 0 kudos

Recover files from previous cluster execution

I saved a file with results by just opening a file via fopen("filename.csv", "a"). Once the execution ended (and the cluster shut down) I couldn't retrieve the file. I found that the file was stored in "/databricks/driver", and that folder empties w...
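A minimal sketch of one way to avoid losing such a file, assuming the cluster exposes a persistent mount such as `/dbfs/` (the `RESULTS_DIR` path and the `append_result` helper are hypothetical, not taken from the post): append results under a directory that outlives the cluster instead of the driver's working directory.

```python
import csv
import os

# Hypothetical persistent location (assumption); adjust it to whatever
# durable storage your workspace exposes. Per the post, files left in
# "/databricks/driver" are gone once the cluster shuts down.
RESULTS_DIR = "/dbfs/tmp/results"


def append_result(row, filename="filename.csv", base_dir=RESULTS_DIR):
    """Append one row to a CSV file under base_dir and return the full path."""
    os.makedirs(base_dir, exist_ok=True)
    path = os.path.join(base_dir, filename)
    # Mode "a" appends, so rows from earlier runs are preserved.
    with open(path, "a", newline="") as f:
        csv.writer(f).writerow(row)
    return path
```

Calling `append_result(["run-1", 0.93])` would then add a row to the file under `RESULTS_DIR`, provided that path is writable on your cluster.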

Databricks143
by New Contributor III
  • 1243 Views
  • 0 replies
  • 0 kudos

Failure to initialize configuration

Hi team, When we read a CSV file from Azure Blob using Databricks, we do not get any key error and are able to read the data from the blob. But if we try to read an XML file, it fails with a key issue: invalid configuration. Error: Failure to inti...

Liliana
by New Contributor
  • 2360 Views
  • 1 replies
  • 0 kudos

Updated sys.path not working any more

We have a monorepo so our pyspark notebooks do not use namespace relative to the root of the repo. Thus the default sys.path of repo root and cwd does not work. We used to package a whl dependency but recently moved to having code update sys.path wit...
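As a point of reference for the approach described in this question, here is a minimal sketch of updating `sys.path` from code. The `add_to_sys_path` helper is hypothetical, and the sketch does not diagnose why the poster's setup stopped working.

```python
import os
import sys


def add_to_sys_path(directory):
    """Prepend directory to sys.path once and return the absolute path used."""
    abs_dir = os.path.abspath(directory)
    # Skip the insert if the path is already present to avoid duplicates
    # when a notebook cell is re-run.
    if abs_dir not in sys.path:
        sys.path.insert(0, abs_dir)
    return abs_dir
```

The call has to run before the first import that depends on it, since modules that were already imported are cached in `sys.modules`.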

Latest Reply
jose_gonzalez
Databricks Employee
  • 0 kudos

Hi @Liliana, Just a friendly follow-up. Have you had a chance to review my colleague's response to your inquiry? Did it prove helpful, or are you still in need of assistance? Your response would be greatly appreciated.

hal-qna
by New Contributor
  • 3173 Views
  • 1 replies
  • 0 kudos

Unable to instantiate Hive Meta Store Client

A Databricks Python SQL script gives the error below: Error in SQL statement: AnalysisException: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.metastore.HiveMetaStoreClient

Latest Reply
jose_gonzalez
Databricks Employee
  • 0 kudos

Hi @hal-qna, Just a friendly follow-up. Have you had a chance to review my colleague's response to your inquiry? Did it prove helpful, or are you still in need of assistance? Your response would be greatly appreciated.

Srikanth_Gupta_
by Databricks Employee
  • 4661 Views
  • 2 replies
  • 1 kudos
Latest Reply
BilalAslamDbrx
Databricks Employee
  • 1 kudos

I'll try to answer this in the simplest possible way: 1. Spark is an imperative programming framework. You tell it what to do, it does it. DLT is declarative - you describe what you want the datasets to be (i.e. the transforms), and it takes care ...

1 More Replies
rt-slowth
by Contributor
  • 7505 Views
  • 1 replies
  • 0 kudos

Resolved! how to run @dlt pipeline in vscode

I want to test a pipeline created using dlt and python in vscode.

Latest Reply
BilalAslamDbrx
Databricks Employee
  • 0 kudos

Hey @rt-slowth check out this tutorial. You won't get debugging in VSCode yet, but this workflow is pretty nice.

Gilg
by Contributor II
  • 2158 Views
  • 1 replies
  • 0 kudos

Data Encryption in DLT

Hi Team, We have a requirement to encrypt PII data in the Silver layer. What is the best way to implement this in DLT, so that only users who have security privileges are able to decrypt the PII info? I have done this in the past using Structured Streaming but...

Data Engineering
Delta Live Table
Encryption
Latest Reply
Gilg
Contributor II
  • 0 kudos

Can you show me how to use the functions built into PySpark with DLT, please? Also, I am trying to implement column/row level security in silver tables generated by DLT, but it gives me the following error: [RequestId=35024c5d-ad05-4f68-a4cb-f3a723f66e1c...

T_1
by New Contributor III
  • 28239 Views
  • 13 replies
  • 3 kudos

Resolved! displayHTML can't seem to be used from Python code, only hand typed into a cell???

Trying to use displayHTML from within a Python module gets a Python exception: NameError: name 'displayHTML' is not defined, and I've found no way around this. It seems to be something at the UI layer or something, not a Python function that can be refere...

Latest Reply
T_1
New Contributor III
  • 3 kudos

Holy Guacamole Batman! It works finally!!!! Wow, thanks @ptweir That's awesome! I can go back and update my doc (and code, to just use databricks the same, now, and Jupyter!) and it'll work by default. It's great they fixed it, shame they never told ...
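The fix mentioned in this thread is not visible in the excerpt. Independently of it, one way to sidestep a `NameError` like the one in the question is to pass the notebook's display function into the module as an argument. The sketch below assumes that approach; `render_report` and its `display_fn` parameter are hypothetical names.

```python
import html


def render_report(title, display_fn=print):
    """Build an HTML snippet, pass it to display_fn, and return the snippet."""
    # Escape the title so characters like "<" are shown literally.
    snippet = f"<h1>{html.escape(title)}</h1>"
    # display_fn defaults to print; a notebook cell can pass displayHTML.
    display_fn(snippet)
    return snippet
```

From a notebook cell this would be called as `render_report("Results", display_fn=displayHTML)`, so the module never has to look up `displayHTML` itself.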

12 More Replies
pavlos_skev
by New Contributor III
  • 5788 Views
  • 2 replies
  • 0 kudos

Resolved! Invalid configuration value detected for fs.azure.account.key only when trying to save RDD

Hello, We have encountered a weird issue in our (old) set-up that looks like a bug in Unity Catalog. The storage account to which we are trying to persist is configured via External Volumes. We have a pipeline that gets XML data and stores it in an RD...

Latest Reply
pavlos_skev
New Contributor III
  • 0 kudos

I will post here what resolved this error for us, in case someone else encounters it in the future. It turns out that in our case the error appears when we use the command below while the directory 'staging2' already exists. To avo...

1 More Replies
Braxx
by Contributor II
  • 12066 Views
  • 3 replies
  • 1 kudos

Resolved! How to kill the execution of a notebook on a specific cell?

Let's say I want to check if a condition is false and then stop the execution of the rest of the script. I tried two approaches: 1) raising an exception: if not data_input_cols.issubset(data.columns): raise Exception("Missing column or column's name mis...

Latest Reply
Invasioned
New Contributor II
  • 1 kudos

In Jupyter notebooks or similar environments, you can stop the execution of a notebook at a specific cell by raising an exception. However, you need to handle the exception properly to ensure the execution stops. The issue you're encountering could b...
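The exception-based approach from the question can be sketched as a small helper. The `require_columns` name is hypothetical, and the argument names only mirror the identifiers visible in the excerpt.

```python
def require_columns(required, available):
    """Raise ValueError naming any required columns missing from available."""
    missing = sorted(set(required) - set(available))
    if missing:
        # Raising stops the current cell at this point; nothing after the
        # call runs unless the caller catches the exception.
        raise ValueError(f"Missing columns: {missing}")
```

In a notebook this would be called as `require_columns(data_input_cols, data.columns)`. Whether later cells still run after the failure depends on how the notebook is being executed.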

2 More Replies
ashdam
by New Contributor III
  • 8210 Views
  • 9 replies
  • 2 kudos

Resolved! How to version your workflows/jobs

We would like to version control workflows/jobs over git: not the underlying notebooks, but the job logic itself. Is it possible?
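One possible approach, not taken from the replies to this thread, is to keep each job definition as a text file in the repository. The sketch below assumes the definition is available as a Python dictionary; the `job_settings_to_json` helper and the example values are hypothetical.

```python
import json


def job_settings_to_json(settings):
    """Serialize job settings deterministically so git diffs stay small."""
    # Sorted keys and fixed indentation keep the output stable between runs.
    return json.dumps(settings, indent=2, sort_keys=True) + "\n"


# Hypothetical job definition; the name and notebook path are illustrative.
example_job = {
    "name": "nightly-etl",
    "tasks": [
        {
            "task_key": "ingest",
            "notebook_task": {"notebook_path": "/Repos/etl/ingest"},
        }
    ],
}
```

Writing the returned string to a file under version control means each change to the job shows up as an ordinary diff, and rolling back is a matter of re-applying an older revision of that file.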

Latest Reply
ashdam
New Contributor III
  • 2 kudos

Thank you very much for all your answers

8 More Replies
madhav_dhruve
by New Contributor III
  • 4672 Views
  • 1 replies
  • 0 kudos

Move Files from S3 to Local File System with Unity Catalog Enabled

Dear Databricks Community Experts, I am working on Databricks on AWS with Unity Catalog. One use case for me is to uncompress files with many extensions in an S3 bucket. Below is my strategy: - Move files from S3 to the local file system (where spark driver...

Latest Reply
rvadali2
New Contributor II
  • 0 kudos

Did you find a solution to this?

pratik21
by New Contributor II
  • 6575 Views
  • 3 replies
  • 1 kudos

Unexpected error while calling Notebook string matching regex `\$[\w_]+' expected but `M' found

Run result unavailable: job failed with error message INVALID_PARAMETER_VALUE: Failed to parse %run command: string matching regex `\$[\w_]+' expected but `M' found) Stacktrace: /Notebookpath: scala. To call the notebook we are using dbutils.notebook.run("N...

Latest Reply
wise_owl
New Contributor III
  • 1 kudos

Not sure about @pratik21, but for me cloning the notebook to a different location worked, and it stopped giving me the error altogether.

2 More Replies
User16826990884
by New Contributor III
  • 3717 Views
  • 3 replies
  • 0 kudos

Version control jobs

How do engineering teams out there version control their jobs? If there is a production issue, can I revert to an older version of the job?

Latest Reply
Rom
New Contributor III
  • 0 kudos

You can use version-controlled source code for your Databricks job; each time you need to roll back to an older version of the job, you just move to the older version of the code. For version-controlled source code you have multiple choices: - Use a noteb...

2 More Replies
Diogo_W
by New Contributor III
  • 4964 Views
  • 2 replies
  • 1 kudos

Resolved! Spark is not executing any tasks

I have an issue where Spark is not submitting any tasks, on any workspace or cluster, even a SQL Warehouse. Even for very simple code it hangs forever. Has anyone ever faced something similar? Our infra is AWS.

Latest Reply
Diogo_W
New Contributor III
  • 1 kudos

Found the solution: Turned out to be an issue with the Security Groups. The internal security group communication was not open to all ports for TCP and UDP. After fixing that the jobs ran fine. Seems like we did require more workers too.

1 More Replies
