Data Engineering

Forum Posts

dg
by New Contributor II
  • 7449 Views
  • 7 replies
  • 1 kudos

Trying to use pdf2image on databricks

Trying to use pdf2image on Databricks, but it's failing with "PDFInfoNotInstalledError: Unable to get page count. Is poppler installed and in PATH?" I've installed pdf2image & poppler-utils by running the following in a cell: %pip install pdf2image %pip ...

Latest Reply
Slalom_Tobias
New Contributor III
  • 1 kudos

Seems like this thread has died, but for posterity, databricks provides the following code for installing poppler on a cluster. The code is sourced from the dbdemos accelerators, specifically the "LLM Chatbot With Retrieval Augmented Generation (RAG)...
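
The error in this thread is raised when pdf2image cannot find poppler's command-line tools; `%pip` installs Python packages only, while poppler-utils is a system package. As a minimal sketch (not the dbdemos code mentioned above), the following check reports which executables are missing from the PATH of the Python process. The helper name `missing_binaries` is hypothetical, and the assumption is that `pdfinfo` and `pdftoppm` are the tools that need to be visible.

```python
import shutil

# Executables shipped with poppler-utils that pdf2image shells out to
# (assumption for this sketch: these two are the ones that matter here).
POPPLER_BINARIES = ("pdfinfo", "pdftoppm")


def missing_binaries(names=POPPLER_BINARIES):
    """Return the executable names that cannot be found on the current PATH."""
    return [name for name in names if shutil.which(name) is None]


if __name__ == "__main__":
    missing = missing_binaries()
    if missing:
        print("Not on PATH:", ", ".join(missing))
    else:
        print("All poppler binaries were found on PATH")
```

Running this in a notebook cell after the install step shows whether the binaries are actually visible to the driver's Python process. The same check can be pointed at other tools, for example `missing_binaries(["tesseract"])`.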

johnb1
by New Contributor III
  • 7758 Views
  • 12 replies
  • 7 kudos

Problems with pandas.read_parquet() and path

I am doing the "Data Engineering with Databricks V2" learning path. I cannot run "DE 4.2 - Providing Options for External Sources", as the first code cell does not run successfully: %run ../Includes/Classroom-Setup-04.2 Screenshot 1: Inside the setup note...

Latest Reply
jonathanchcc
New Contributor III
  • 7 kudos

Thanks for sharing, this helped me too.

feed
by New Contributor III
  • 3818 Views
  • 7 replies
  • 3 kudos

TesseractNotFoundError

TesseractNotFoundError: tesseract is not installed or it's not in your PATH. See README file for more information. in databricks

Latest Reply
neha_ayodhya
New Contributor II
  • 3 kudos

%sh apt-get install -y tesseract-ocr: this command is not working in my new Databricks free trial account; earlier it worked fine in my old Databricks instance. I get the error below: E: Could not open lock file /var/lib/dpkg/lock-frontend - open (13: Per...

harraz
by New Contributor III
  • 705 Views
  • 2 replies
  • 0 kudos

How to set up the path to a remote notebook in Bitbucket to run as a job

How do I set up the path to a remote notebook in Bitbucket to run as a job? I tried everything in the path and nothing is working. I keep getting this error: Run result unavailable: run failed with error message Notebook not found: Note that I already connec...

Latest Reply
Debayan
Esteemed Contributor III
  • 0 kudos

Hi @mohamed harraz, could you please confirm whether Files in Repos has been enabled? https://docs.databricks.com/files/workspace.html#configure-support-for-files-in-repos. You can use the command %sh pwd in a notebook inside a repo to check if Files in...

Dilorom
by New Contributor
  • 3346 Views
  • 5 replies
  • 3 kudos

What is a recommended directory for creating a database with a specified path?

I was going through Data Engineering with Databricks training, and in DE 3.3L - Databases, Tables & Views Lab section, it says "Defining database directories for groups of users can greatly reduce the chances of accidental data exfiltration." I agree...

Latest Reply
Anonymous
Not applicable
  • 3 kudos

Hi @Dilorom A, hope everything is going great. Just wanted to check in if you were able to resolve your issue. If yes, would you be happy to mark an answer as best so that other members can find the solution more quickly? If not, please tell us so we ...

pankajBhatt
by New Contributor II
  • 889 Views
  • 2 replies
  • 1 kudos

Databricks not able to access latest files in Azure ADLS Gen1

I have mounted my path from Databricks to Azure ADLS Gen1, using an SPN as the service account. Until yesterday everything was OK, but today I can see all the older, deleted folders. I cannot see them in ADLS, but my Databricks dbutils.fs.ls() shows them....

Latest Reply
Anonymous
Not applicable
  • 1 kudos

Hi @pankaj bhatt, hope everything is going great. Just wanted to check in if you were able to resolve your issue. If yes, would you be happy to mark an answer as best so that other members can find the solution more quickly? If not, please tell us so ...

tinendra
by New Contributor III
  • 1777 Views
  • 7 replies
  • 8 kudos

Can we run pandas dataframe inside databricks?

Hi, I want to run df=pd.read_csv('/dbfs/FileStore/airlines1.csv'), but while trying to run it I get an error like FileNotFoundError: [Errno 2] No such file or directory: '/dbfs/FileStore/airlines1.csv'. Could you please help me out how to run pandas dataframe in...

Latest Reply
Anonymous
Not applicable
  • 8 kudos

Hi @Tinendra Kumar, hope all is well! Just wanted to check in if you were able to resolve your issue and would you be happy to share the solution or mark an answer as best? Else please let us know if you need more help. We'd love to hear from you. Tha...
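
pandas reads through the driver's local file API, so it needs the `/dbfs/...` form of a DBFS path rather than the `dbfs:/...` URI form. One way to narrow down a FileNotFoundError like the one in the question is to check whether that local path exists before calling `pd.read_csv`. This is only a sketch: the helper names `to_local_dbfs_path` and `check_readable` are hypothetical, and it assumes the `/dbfs` mount is available on the cluster, which may not be the case everywhere.

```python
import os


def to_local_dbfs_path(path):
    """Map a 'dbfs:/...' URI to its '/dbfs/...' driver-local form.

    Paths that do not start with 'dbfs:/' are returned unchanged.
    """
    prefix = "dbfs:/"
    if path.startswith(prefix):
        return "/dbfs/" + path[len(prefix):].lstrip("/")
    return path


def check_readable(path):
    """Return (local_path, exists) so a missing file is caught before pandas runs."""
    local = to_local_dbfs_path(path)
    return local, os.path.exists(local)


if __name__ == "__main__":
    local, exists = check_readable("dbfs:/FileStore/airlines1.csv")
    print(local, "exists" if exists else "does not exist on the driver")
```

If the path does not exist on the driver, the cause is either that the file is not at that location or that the `/dbfs` mount is not exposed on that cluster; the check alone cannot tell which.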

sunil_smile
by Contributor
  • 3572 Views
  • 9 replies
  • 11 kudos

Resolved! How can I add ADLS Gen2 OAuth 2.0 at cluster scope for my High Concurrency shared cluster (without Unity Catalog)?

Hi All, kindly help me: how can I add ADLS Gen2 OAuth 2.0 authentication to my high concurrency shared cluster? I want to scope this authentication to the entire cluster, not to a particular notebook. Currently I have added them as spark configuration o...

Latest Reply
Kaniz
Community Manager
  • 11 kudos

Hi @Sunilprasath Elangovan, we haven't heard from you since the last response from @Hubert Dudek, and I was checking back to see if his suggestions helped you. Or else, if you have any solution, please do share that with the community as it can b...

KKo
by Contributor III
  • 1278 Views
  • 3 replies
  • 3 kudos

delete and append in delta path

I am deleting data from curated path based on date column and appending staged data on it on each run, using below script. My fear is, just after the delete operation, if any network issue appeared and the job stopped before it appended the staged da...

Latest Reply
Kaniz
Community Manager
  • 3 kudos

Hi @Kris Koirala, we haven't heard from you since the last response from @Hubert Dudek, and I was checking back to see if his suggestions helped you. Or else, if you have any solution, please do share that with the community as it can be helpful ...

KrishZ
by Contributor
  • 2379 Views
  • 4 replies
  • 1 kudos

How to print the path of a .py file or a notebook?

I have stored a test.py in DBFS at the location below: "/dbfs/FileStore/shared_uploads/krishna@company.com/Project_Folder/test.py". I have a print statement in test.py that reads print( os.getcwd() ), and it prints the following: '/databricks/drive...

Latest Reply
Aviral-Bhardwaj
Esteemed Contributor III
  • 1 kudos

Hey @Krishna Zanwar, please use the code below; this will work. As you want the specific location, you can create custom code and format the path using a Python formatter, and it will give you the desired result.
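
The code referred to in this reply is not shown in the preview. As a separate, hedged sketch of the general distinction: `os.getcwd()` returns the working directory of the running process, not the folder that holds the script, while `__file__` (when a script is run as a file) holds the script's own path. The helper name `describe_paths` is hypothetical, and `__file__` is typically not defined inside notebook cells, so this applies to `.py` files only.

```python
import os


def describe_paths(script_file):
    """Compare the process working directory with a script's own location.

    Pass __file__ from inside the script whose location you want to print.
    """
    script_path = os.path.abspath(script_file)
    return {
        "cwd": os.getcwd(),                          # where the process was started
        "script_path": script_path,                  # absolute path of the script
        "script_dir": os.path.dirname(script_path),  # folder containing the script
    }


# Inside test.py this would be called as: print(describe_paths(__file__))
```

The two values differ whenever the script is launched from a directory other than its own, which is consistent with the behaviour described in the question.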

KVNARK
by Honored Contributor II
  • 1634 Views
  • 4 replies
  • 11 kudos

Resolved! Pyspark learning path

Can anyone suggest the best series of courses offered by Databricks to learn PySpark for ETL purposes, either in the Databricks partner learning portal or the Databricks learning portal?

Latest Reply
Hubert-Dudek
Esteemed Contributor III
  • 11 kudos

To learn Databricks ETL, I highly recommend the videos made by Simon on this channel: https://www.youtube.com/@AdvancingAnalytics

jd1
by New Contributor II
  • 361 Views
  • 1 replies
  • 3 kudos

Hello, When working in a python notebook and using tab-complete to navigate the file system, I find that pressing enter on a partially completed path ...

Hello, when working in a Python notebook and using tab-complete to navigate the file system, I find that pressing enter on a partially completed path will add the full path to the cell in the notebook. This is annoying behaviour, since you end up with...

Latest Reply
UmaMahesh1
Honored Contributor III
  • 3 kudos

Someone heard you. In the experimental Monaco editor, I found that this particular issue does not appear.

Mado
by Valued Contributor II
  • 650 Views
  • 2 replies
  • 3 kudos

When should I use ".start()" with writeStream?

Hi, I am practicing with Databricks. In sample notebooks, I have seen writeStream used both with and without the ".start()" method. Samples are below: Without .start(): spark.readStream .format("cloudFiles") .option("cloudFiles.f...

Latest Reply
Anonymous
Not applicable
  • 3 kudos

Hi @Mohammad Saber​ Great to meet you, and thanks for your question! Let's see if your peers in the community have an answer to your question first. Or else bricksters will get back to you soon. Thanks

Kavin
by New Contributor II
  • 959 Views
  • 2 replies
  • 2 kudos

Issue converting the datasets into JSON

I'm a newbie to Databricks and I need to convert the data sets into JSON. I tried both FOR JSON AUTO and FOR JSON PATH; however, I'm getting an issue: [PARSE_SYNTAX_ERROR] Syntax error at or near 'json'line My query works fine without FOR JSON AUTO and FOR...

Latest Reply
Kaniz
Community Manager
  • 2 kudos

Hi @Kavin Natarajan, we haven't heard from you since the last response from @Debayan Mukherjee, and I was checking back to see if their suggestions helped you. Or else, if you have any solution, please share it with the community, as it can be hel...

Ank
by New Contributor II
  • 601 Views
  • 2 replies
  • 2 kudos

Why am I getting a FileNotFoundError after providing the file path?

I used "copy file path" to get the path of the notebook I am trying to run from another notebook. file_path = "/Users/ankur.lohiya@workday.com/PAS/Training/Ingest/TrainingQueries-Cloned.py/" ddi = DatabricksDataIngestion(file_path=file_path,        ...

Latest Reply
Vidula
Honored Contributor
  • 2 kudos

Hello @Ankur Lohiya, hope all is well! Just wanted to check in if you were able to resolve your issue and would you be happy to share the solution or mark an answer as best? Else please let us know if you need more help. We'd love to hear from you. Th...
