Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
Data + AI Summit 2024 - Data Engineering & Streaming

Forum Posts

anirudh_a
by New Contributor II
  • 8579 Views
  • 8 replies
  • 3 kudos

Resolved! 'No file or Directory' error when using pandas.read_excel in Databricks

I am baffled by the behaviour of Databricks: Below you can see the contents of the directory using dbutils in Databricks. It shows the `test.xlsx` file clearly in the directory (and I can even open it using `dbutils.fs.head`). But when I go to use panda.re...

Data Engineering
dbfs
panda
spark
spark config
Latest Reply
DamnKush
New Contributor II
  • 3 kudos

Hey, I encountered it recently. I can see you are using a shared cluster; try switching to a single user cluster and it will fix it. Can someone let me know why it wasn't working with a shared cluster? Thanks.
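
One common cause of a "No such file or directory" error with pandas, though not necessarily the cause in this thread, is that pandas uses local file APIs and does not understand `dbfs:/` URIs. A minimal sketch, assuming the cluster exposes DBFS through the `/dbfs` mount (which may not be available in every cluster access mode); the helper name `to_local_dbfs_path` and the example path are hypothetical:

```python
def to_local_dbfs_path(path: str) -> str:
    """Translate a dbfs:/ URI into the /dbfs/... form used by local file APIs."""
    prefix = "dbfs:"
    if path.startswith(prefix):
        # Drop the scheme and normalise the slashes that follow it.
        return "/dbfs/" + path[len(prefix):].lstrip("/")
    # Already-local paths and other schemes are returned unchanged.
    return path


# Hypothetical example path, not taken from the original post.
local_path = to_local_dbfs_path("dbfs:/FileStore/test.xlsx")
print(local_path)

# On a cluster where the /dbfs mount is available, the file could then be
# read with pandas (assumption: pandas and an Excel engine are installed):
# import pandas as pd
# df = pd.read_excel(local_path)
```

If the `/dbfs` mount is not reachable from the cluster, this conversion alone will not help, which would be consistent with the reply above about switching cluster types.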

7 More Replies
priyanananthram
by New Contributor II
  • 6017 Views
  • 4 replies
  • 1 kudos

Delta live tables for large number of tables

Hi there, I am hoping for some guidance. I have some 850 tables that I need to ingest using a DLT pipeline. When I do this, my event log shows that the driver node dies or becomes unresponsive, likely due to GC. Can DLT be used to ingest a large number of tables? I...

Latest Reply
Sidhant07
New Contributor III
  • 1 kudos

Delta Live Tables (DLT) can indeed be used to ingest a large number of tables. However, if you're experiencing issues with the driver node becoming unresponsive due to garbage collection (GC), it might be a sign that the resources allocated to the dr...

3 More Replies
Databricks143
by New Contributor III
  • 1008 Views
  • 1 replies
  • 0 kudos

Failure to initialize configuration

Hi team, When reading a CSV file from Azure Blob Storage using Databricks, we do not get any key error and are able to read the data from the blob. But if we try to read an XML file, it fails with an invalid configuration key issue. Error: Failure to inti...

Latest Reply
Kaniz_Fatma
Community Manager
  • 0 kudos

Hi @Databricks143, Please check this link here. Please LMK if that helps.

Graham
by New Contributor III
  • 4875 Views
  • 5 replies
  • 2 kudos

"MERGE" always slower than "CREATE OR REPLACE"

Overview: To update our Data Warehouse tables, we have tried two methods: "CREATE OR REPLACE" and "MERGE". With every query we've tried, "MERGE" is slower. My question is this: Has anyone successfully gotten a "MERGE" to perform faster than a "CREATE OR...

Latest Reply
Manisha_Jena
New Contributor III
  • 2 kudos

Hi @Graham, can you please try Low Shuffle Merge [LSM] and see if it helps? LSM is a new MERGE algorithm that aims to maintain the existing data organization (including z-order clustering) for unmodified data, while simultaneously improving performan...

4 More Replies
Liliana
by New Contributor
  • 1670 Views
  • 2 replies
  • 1 kudos

Updated sys.path not working any more

We have a monorepo so our pyspark notebooks do not use namespace relative to the root of the repo. Thus the default sys.path of repo root and cwd does not work. We used to package a whl dependency but recently moved to having code update sys.path wit...
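
For readers unfamiliar with the approach described, a minimal sketch of updating `sys.path` from code might look like the following. The helper name `add_to_sys_path` and the directory `src/python` are hypothetical, and this is not the poster's actual code, which is not shown in the excerpt:

```python
import sys
from pathlib import Path


def add_to_sys_path(directory: str) -> str:
    """Put `directory` at the front of sys.path once and return its absolute form."""
    resolved = str(Path(directory).resolve())
    if resolved not in sys.path:
        # Insert at the front so this directory wins over later entries.
        sys.path.insert(0, resolved)
    return resolved


# Hypothetical package root inside the monorepo, relative to the current directory.
added = add_to_sys_path("src/python")
print(added in sys.path)
```

The membership check keeps repeated notebook runs from stacking duplicate entries; whether the interpreter honours the updated path in a given runtime is exactly what the thread is asking about.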

Latest Reply
jose_gonzalez
Moderator
  • 1 kudos

Hi @Liliana, Just a friendly follow-up. Have you had a chance to review my colleague's response to your inquiry? Did it prove helpful, or are you still in need of assistance? Your response would be greatly appreciated.

1 More Replies
hal-qna
by New Contributor
  • 1979 Views
  • 2 replies
  • 0 kudos

Unable to instantiate Hive Meta Store Client

A Databricks Python SQL script gives the error below: Error in SQL statement: AnalysisException: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.metastore.HiveMetaStoreClient

Latest Reply
jose_gonzalez
Moderator
  • 0 kudos

Hi @hal-qna, Just a friendly follow-up. Have you had a chance to review my colleague's response to your inquiry? Did it prove helpful, or are you still in need of assistance? Your response would be greatly appreciated.

1 More Replies
Srikanth_Gupta_
by Valued Contributor
  • 3804 Views
  • 2 replies
  • 0 kudos
Latest Reply
BilalAslamDbrx
Honored Contributor III
  • 0 kudos

I'll try to answer this in the simplest possible way: 1. Spark is an imperative programming framework. You tell it what to do, and it does it. DLT is declarative - you describe what you want the datasets to be (i.e. the transforms), and it takes care ...

1 More Replies
Gilg
by Contributor II
  • 1548 Views
  • 2 replies
  • 0 kudos

Data Encryption in DLT

Hi Team, We have a requirement to encrypt PII data in the Silver layer. What is the best way to implement this in DLT, so that only users who have security privileges are able to decrypt the PII info? I have done this in the past using Structured Streaming but...

Data Engineering
Delta Live Table
Encryption
Latest Reply
Gilg
Contributor II
  • 0 kudos

Can you show me how to use the functions built into PySpark with DLT, please? Also, I am trying to implement column/row-level security in silver tables that are generated by DLT, but it is giving me the following error: [RequestId=35024c5d-ad05-4f68-a4cb-f3a723f66e1c...

1 More Replies
T_1
by New Contributor III
  • 22129 Views
  • 13 replies
  • 3 kudos

Resolved! displayHTML can't seem to be used from Python code, only hand typed into a cell???

Trying to use displayHTML from within a Python module raises a Python exception: NameError: name 'displayHTML' is not defined, and I've found no way around this. It seems to be something at the UI layer or something, not a Python function that can be refere...

Latest Reply
T_1
New Contributor III
  • 3 kudos

Holy Guacamole Batman! It works finally!!!! Wow, thanks @ptweir That's awesome! I can go back and update my doc (and code, to just use databricks the same, now, and Jupyter!) and it'll work by default. It's great they fixed it, shame they never told ...

12 More Replies
pavlos_skev
by New Contributor III
  • 3876 Views
  • 3 replies
  • 0 kudos

Resolved! Invalid configuration value detected for fs.azure.account.key only when trying to save RDD

Hello, We have encountered a weird issue in our (old) set-up that looks like a bug in the Unity Catalog. The storage account which we are trying to persist to is configured via External Volumes. We have a pipeline that gets XML data and stores it in an RD...

Latest Reply
pavlos_skev
New Contributor III
  • 0 kudos

I will post here what resolved this error for us, in case someone else encounters it in the future. It turns out that, in our case, this error appears when we use the command below while the directory 'staging2' already exists. To avo...

2 More Replies
Braxx
by Contributor II
  • 9681 Views
  • 3 replies
  • 1 kudos

Resolved! How to kill the execution of a notebook on a specific cell?

Let's say I want to check if a condition is false and then stop the execution of the rest of the script. I tried two approaches: 1) raising an exception: if not data_input_cols.issubset(data.columns): raise Exception("Missing column or column's name mis...

Latest Reply
Invasioned
New Contributor II
  • 1 kudos

In Jupyter notebooks or similar environments, you can stop the execution of a notebook at a specific cell by raising an exception. However, you need to handle the exception properly to ensure the execution stops. The issue you're encountering could b...
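
As a rough illustration of the exception-based approach discussed here, the sketch below checks for required columns and raises when any are missing. The helper name `require_columns` and the sample column names are hypothetical; `data_input_cols` mirrors the variable name in the question:

```python
def require_columns(required, available) -> None:
    """Raise ValueError if any required column name is absent from `available`."""
    missing = set(required) - set(available)
    if missing:
        # An uncaught exception stops the current cell and, in a
        # "run all" or job run, the cells that would follow it.
        raise ValueError(f"Missing column(s): {sorted(missing)}")


# Hypothetical column names for illustration.
data_input_cols = {"id", "amount"}
columns = ["id", "amount", "ts"]

require_columns(data_input_cols, columns)  # all present, so nothing is raised
print("validation passed")

# In a Databricks notebook, dbutils.notebook.exit("some message") is another
# option when the run should end early without raising an error.
```

The exception must not be swallowed by a surrounding `try`/`except`, otherwise later code keeps running.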

2 More Replies
ashdam
by New Contributor III
  • 5670 Views
  • 10 replies
  • 1 kudos

Resolved! How to version your workflows/jobs

We would like to version control workflows/jobs in Git: not the underlying notebooks, but the job logic itself. Is it possible?

Latest Reply
ashdam
New Contributor III
  • 1 kudos

Thank you very much for all your answers

9 More Replies
madhav_dhruve
by New Contributor III
  • 3344 Views
  • 1 replies
  • 0 kudos

Move Files from S3 to Local File System with Unity Catalog Enabled

Dear Databricks Community Experts, I am working with Databricks on AWS with Unity Catalog. One use case for me is to uncompress files with many extensions that are in an S3 bucket. Below is my strategy: - Move files from S3 to Local file system (where spark driver...

Latest Reply
rvadali2
New Contributor II
  • 0 kudos

Did you find a solution to this?

dfoard
by New Contributor
  • 2137 Views
  • 1 replies
  • 0 kudos

ERROR: No matching distribution found for databricks-smolder

I'm trying to follow along with the blog post Gaining Insights Into Your HL7 Data With Smolder and Databricks-#1 of 3. I was able to finally get a jar file built from the repo using Java 17 and it successfully imports into the cluster. However, when ...

Latest Reply
Kaniz_Fatma
Community Manager
  • 0 kudos

Hi @dfoard, It appears that the error is due to attempting to import a Java package in Python code, which isn't supported. The Smolder library is designed to work with Scala code in a Databricks Notebook environment. To use the com.data...

Akash2
by Contributor
  • 844 Views
  • 1 replies
  • 0 kudos

Data Engineer Professional Exam Suspended

Hi team, I was taking my exam today, and 40 minutes into the exam I was interrupted by the proctor to show the test area. The table had a guitar E string and an almost eaten apple. Nothing else was on the table. Then the proctor asked me to show the ro...

Latest Reply
Kaniz_Fatma
Community Manager
  • 0 kudos

Hi @Akash2, Thank you for posting your concern on Community! To expedite your request, please list your concerns on our ticketing portal. Our support staff would be able to act faster on the resolution (our standard resolution time is 24-48 hours).
