Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

johnb1
by Contributor
  • 22927 Views
  • 16 replies
  • 15 kudos

Problems with pandas.read_parquet() and path

I am doing the "Data Engineering with Databricks V2" learning path. I cannot run "DE 4.2 - Providing Options for External Sources", as the first code cell does not run successfully: %run ../Includes/Classroom-Setup-04.2 Screenshot 1: Inside the setup note...

Latest Reply
hebied
New Contributor II
  • 15 kudos

Thanks for sharing, bro. It really helped.

  • 15 kudos
15 More Replies
dg
by New Contributor II
  • 16881 Views
  • 7 replies
  • 3 kudos

Trying to use pdf2image on databricks

Trying to use pdf2image on Databricks, but it's failing with "PDFInfoNotInstalledError: Unable to get page count. Is poppler installed and in PATH?" I've installed pdf2image & poppler-utils by running the following in a cell: %pip install pdf2image %pip ...

Latest Reply
Slalom_Tobias
New Contributor III
  • 3 kudos

Seems like this thread has died, but for posterity, Databricks provides the following code for installing poppler on a cluster. The code is sourced from the dbdemos accelerators, specifically the "LLM Chatbot With Retrieval Augmented Generation (RAG)...

  • 3 kudos
6 More Replies
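
Editor's sketch for the thread above: the error message asks whether poppler is "installed and in PATH", so one quick diagnostic is to check whether the poppler command-line tools are visible to the Python process. This is a minimal sketch, not the fix given in the thread (the replies are truncated here). The helper name missing_binaries is hypothetical, and the check only covers the machine the code runs on (assumed to be the driver node), not the workers.

```python
import shutil


def missing_binaries(names):
    """Return the names that cannot be found on the current PATH."""
    return [name for name in names if shutil.which(name) is None]


# pdf2image shells out to poppler's command-line tools.
POPPLER_TOOLS = ["pdfinfo", "pdftoppm"]

missing = missing_binaries(POPPLER_TOOLS)
if missing:
    print("Not on PATH for this Python process:", ", ".join(missing))
else:
    print("poppler tools found")
```

The same helper can be pointed at other external binaries, for example missing_binaries(["tesseract"]) for the TesseractNotFoundError thread that follows.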
feed
by New Contributor III
  • 9008 Views
  • 6 replies
  • 3 kudos

TesseractNotFoundError

Getting this error in Databricks: TesseractNotFoundError: tesseract is not installed or it's not in your PATH. See README file for more information.

Latest Reply
neha_ayodhya
New Contributor II
  • 3 kudos

The command %sh apt-get install -y tesseract-ocr is not working in my new Databricks free trial account; earlier it worked fine in my old Databricks instance. I get the error below: E: Could not open lock file /var/lib/dpkg/lock-frontend - open (13: Per...

  • 3 kudos
5 More Replies
harraz
by New Contributor III
  • 1493 Views
  • 2 replies
  • 0 kudos

How to set up the path to a remote notebook in Bitbucket to run as a job? I tried everything in the path and nothing is working. I keep getting this error:...

How to set up the path to a remote notebook in Bitbucket to run as a job? I tried everything in the path and nothing is working. I keep getting this error: Run result unavailable: run failed with error message Notebook not found: Note that I already connec...

Latest Reply
Debayan
Databricks Employee
  • 0 kudos

Hi @mohamed harraz, could you please confirm whether Files in Repos has been enabled (https://docs.databricks.com/files/workspace.html#configure-support-for-files-in-repos)? You can use the command %sh pwd in a notebook inside a repo to check if Files in...

  • 0 kudos
1 More Replies
Dilorom
by New Contributor
  • 6244 Views
  • 3 replies
  • 4 kudos

What is a recommended directory for creating a database with a specified path?

I was going through Data Engineering with Databricks training, and in DE 3.3L - Databases, Tables & Views Lab section, it says "Defining database directories for groups of users can greatly reduce the chances of accidental data exfiltration." I agree...

Latest Reply
Anonymous
Not applicable
  • 4 kudos

Hi @Dilorom A, hope everything is going great. Just wanted to check in if you were able to resolve your issue. If yes, would you be happy to mark an answer as best so that other members can find the solution more quickly? If not, please tell us so we ...

  • 4 kudos
2 More Replies
pankajBhatt
by New Contributor II
  • 1609 Views
  • 1 replies
  • 1 kudos

Databricks not able to access latest files in Azure ADLS Gen1

I have mounted my path from Databricks to Azure ADLS Gen1, using an SPN as the service account. Until yesterday everything was OK, but today I can still see all the older, deleted folders. I cannot see them in ADLS, but dbutils.fs.ls() in Databricks still shows them....

Latest Reply
Anonymous
Not applicable
  • 1 kudos

Hi @pankaj bhatt, hope everything is going great. Just wanted to check in if you were able to resolve your issue. If yes, would you be happy to mark an answer as best so that other members can find the solution more quickly? If not, please tell us so ...

  • 1 kudos
tinendra
by New Contributor III
  • 4032 Views
  • 7 replies
  • 8 kudos

Can we run pandas dataframe inside databricks?

Hi, I want to run df=pd.read_csv('/dbfs/FileStore/airlines1.csv'), but when I run it I get an error like FileNotFoundError: [Errno 2] No such file or directory: '/dbfs/FileStore/airlines1.csv'. Could you please help me out how to run pandas dataframe in...

Latest Reply
Anonymous
Not applicable
  • 8 kudos

Hi @Tinendra Kumar, hope all is well! Just wanted to check in if you were able to resolve your issue and would you be happy to share the solution or mark an answer as best? Else please let us know if you need more help. We'd love to hear from you. Tha...

  • 8 kudos
6 More Replies
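
Editor's sketch for the thread above: pandas reads through the local file system, so it needs the '/dbfs/...' form of a path rather than the 'dbfs:/...' URI form that Spark and dbutils accept. The snippet below is a minimal sketch, assuming the cluster exposes DBFS at the /dbfs mount point (this is an assumption; the thread's accepted answer is not shown in this listing). The helper name to_local_dbfs_path is hypothetical.

```python
import os


def to_local_dbfs_path(path):
    """Map a 'dbfs:/...' URI to the '/dbfs/...' path used by local file APIs."""
    prefix = "dbfs:/"
    if path.startswith(prefix):
        return "/dbfs/" + path[len(prefix):].lstrip("/")
    # Already a local-style path (or something else): leave it unchanged.
    return path


csv_path = to_local_dbfs_path("dbfs:/FileStore/airlines1.csv")

# Check that the file is visible locally before handing the path to pandas.
print(csv_path, "exists:", os.path.exists(csv_path))
```

If the check prints False on a cluster, the file is either not at that location or the /dbfs mount is not available there, and pd.read_csv(csv_path) would raise the same FileNotFoundError.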
sunil_smile
by Contributor
  • 8516 Views
  • 8 replies
  • 10 kudos

Resolved! How can I add ADLS Gen2 - OAuth 2.0 at cluster scope for my High Concurrency shared cluster (without Unity Catalog)?

Hi all, kindly help me: how can I add the ADLS Gen2 OAuth 2.0 authentication to my High Concurrency shared cluster? I want to scope this authentication to the entire cluster, not to a particular notebook. Currently I have added them as Spark configuration o...

Latest Reply
Hubert-Dudek
Esteemed Contributor III
  • 10 kudos

The error is because of missing default settings (create a new cluster and do not remove them). The warning is because secrets should be put in a secret scope, and then you should reference the secrets in the settings.

  • 10 kudos
7 More Replies
KKo
by Contributor III
  • 2390 Views
  • 2 replies
  • 2 kudos

delete and append in delta path

I am deleting data from the curated path based on a date column and appending staged data to it on each run, using the script below. My fear is that if a network issue appears just after the delete operation and the job stops before it appends the staged da...

Latest Reply
Aviral-Bhardwaj
Esteemed Contributor III
  • 2 kudos

thanks man

  • 2 kudos
1 More Replies
KrishZ
by Contributor
  • 4812 Views
  • 4 replies
  • 1 kudos

How to print the path of a .py file or a notebook?

I have stored a test.py in DBFS at the location "/dbfs/FileStore/shared_uploads/krishna@company.com/Project_Folder/test.py". I have a print statement in test.py, print( os.getcwd() ), and it prints '/databricks/drive...

Latest Reply
Aviral-Bhardwaj
Esteemed Contributor III
  • 1 kudos

Hey @Krishna Zanwar, please use the code below; this will work. As you want the specific location, you can write custom code and format the path using a Python formatter, and it will give you the desired result.

  • 1 kudos
3 More Replies
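
Editor's sketch for the thread above: os.getcwd() reports the working directory of the running process, which need not be the folder where the script is stored. In plain Python, a script's own location comes from __file__ instead. This is a minimal sketch and not the code from the reply (which is not shown in this listing); the helper name location_of is hypothetical, and whether __file__ is defined depends on how Databricks executes the file, which is an assumption here.

```python
import os


def location_of(file_path):
    """Return the absolute path of a file and the directory that contains it."""
    absolute = os.path.abspath(file_path)
    return absolute, os.path.dirname(absolute)


# os.getcwd() is the process working directory, not the script's folder.
print("working directory:", os.getcwd())

# __file__ is only defined when the code runs from a .py file,
# so fall back gracefully inside a notebook cell.
if "__file__" in globals():
    print("script path and folder:", location_of(__file__))
else:
    print("__file__ is not defined here (for example, in a notebook cell)")
```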
KVNARK
by Honored Contributor II
  • 4480 Views
  • 4 replies
  • 11 kudos

Resolved! PySpark learning path

Can anyone suggest the best series of courses offered by Databricks for learning PySpark for ETL purposes, either in the Databricks partner learning portal or the Databricks learning portal?

Latest Reply
Hubert-Dudek
Esteemed Contributor III
  • 11 kudos

To learn Databricks ETL, I highly recommend the videos made by Simon on this channel: https://www.youtube.com/@AdvancingAnalytics

  • 11 kudos
3 More Replies
jd1
by New Contributor II
  • 835 Views
  • 1 replies
  • 3 kudos

Hello, when working in a Python notebook and using tab-complete to navigate the file system, I find that pressing enter on a partially completed path ...

Hello, when working in a Python notebook and using tab-complete to navigate the file system, I find that pressing enter on a partially completed path will add the full path to the cell in the notebook. This is annoying behaviour, since you end up with...

Latest Reply
UmaMahesh1
Honored Contributor III
  • 3 kudos

Someone heard you. In the experimental Monaco editor, I found that this particular issue does not appear.

  • 3 kudos
Mado
by Valued Contributor II
  • 1686 Views
  • 2 replies
  • 3 kudos

When should I use ".start()" with writeStream?

Hi, I am practicing with Databricks. In sample notebooks, I have seen writeStream used both with and without the ".start()" method. Samples are below: Without .start(): spark.readStream .format("cloudFiles") .option("cloudFiles.f...

Latest Reply
Anonymous
Not applicable
  • 3 kudos

Hi @Mohammad Saber, great to meet you, and thanks for your question! Let's see if your peers in the community have an answer to your question first. Otherwise, Bricksters will get back to you soon. Thanks

  • 3 kudos
1 More Replies
Kavin
by New Contributor II
  • 1744 Views
  • 1 replies
  • 2 kudos

Issue converting the datasets into JSON

I'm a newbie to Databricks and I need to convert the datasets into JSON. I tried both FOR JSON AUTO and FOR JSON PATH; however, I'm getting an issue - [PARSE_SYNTAX_ERROR] Syntax error at or near 'json'line My query works fine without FOR JSON AUTO and FOR...

Latest Reply
Debayan
Databricks Employee
  • 2 kudos

Hi @Kavin Natarajan, could you please go through https://www.tutorialkart.com/apache-spark/spark-write-dataset-to-json-file-example/ ? It looks like the steps are okay.

  • 2 kudos
Ank
by New Contributor II
  • 1200 Views
  • 1 replies
  • 2 kudos

Why am I getting a FileNotFoundError after providing the file path?

I used "Copy file path" to get the path of the notebook I am trying to run from another notebook. file_path = "/Users/ankur.lohiya@workday.com/PAS/Training/Ingest/TrainingQueries-Cloned.py/" ddi = DatabricksDataIngestion(file_path=file_path,        ...

Latest Reply
Vidula
Honored Contributor
  • 2 kudos

Hello @Ankur Lohiya, hope all is well! Just wanted to check in if you were able to resolve your issue and would you be happy to share the solution or mark an answer as best? Else please let us know if you need more help. We'd love to hear from you. Th...

  • 2 kudos