Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

johnb1
by Contributor
  • 16559 Views
  • 13 replies
  • 8 kudos

Problems with pandas.read_parquet() and path

I am doing the "Data Engineering with Databricks V2" learning path. I cannot run "DE 4.2 - Providing Options for External Sources", as the first code cell does not run successfully: %run ../Includes/Classroom-Setup-04.2 Screenshot 1: Inside the setup note...

Latest Reply
jonathanchcc
New Contributor III
  • 8 kudos

Thanks for sharing, this helped me too.

12 More Replies
dg
by New Contributor II
  • 13851 Views
  • 7 replies
  • 1 kudos

Trying to use pdf2image on databricks

Trying to use pdf2image on Databricks, but it's failing with "PDFInfoNotInstalledError: Unable to get page count. Is poppler installed and in PATH?" I've installed pdf2image & poppler-utils by running the following in a cell: %pip install pdf2image %pip ...
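One way to narrow down this kind of error is to check whether the poppler command-line tools are visible on the PATH of the Python process. A minimal sketch using only the standard library; `missing_binaries` is a hypothetical helper name, and `pdfinfo` and `pdftoppm` are the poppler-utils executables that pdf2image relies on:

```python
import shutil


def missing_binaries(names):
    """Return the names from `names` that cannot be found on PATH."""
    return [name for name in names if shutil.which(name) is None]


# pdfinfo and pdftoppm ship with poppler-utils; an empty list means both were found.
print(missing_binaries(["pdfinfo", "pdftoppm"]))
```

A non-empty result suggests the system package is not installed on the machine running the code, which installing a Python package alone would not change.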

Latest Reply
Slalom_Tobias
New Contributor III
  • 1 kudos

Seems like this thread has died, but for posterity, Databricks provides the following code for installing poppler on a cluster. The code is sourced from the dbdemos accelerators, specifically the "LLM Chatbot With Retrieval Augmented Generation (RAG)...

6 More Replies
feed
by New Contributor III
  • 5612 Views
  • 7 replies
  • 3 kudos

TesseractNotFoundError

TesseractNotFoundError: tesseract is not installed or it's not in your PATH. See README file for more information. in databricks

Latest Reply
neha_ayodhya
New Contributor II
  • 3 kudos

%sh apt-get install -y tesseract-ocr: this command is not working in my new Databricks free trial account; earlier it worked fine in my old Databricks instance. I get the error below: E: Could not open lock file /var/lib/dpkg/lock-frontend - open (13: Per...

6 More Replies
harraz
by New Contributor III
  • 1089 Views
  • 2 replies
  • 0 kudos

How to set up the path to a remote notebook in Bitbucket to run as a job. I tried everything in the path and nothing is working. I keep getting this error:...

How to set up the path to a remote notebook in Bitbucket to run as a job. I tried everything in the path and nothing is working. I keep getting this error: Run result unavailable: run failed with error message Notebook not found: Note that I already connec...

Latest Reply
Debayan
Esteemed Contributor III
  • 0 kudos

Hi @mohamed harraz, could you please confirm whether Files in Repos has been enabled (https://docs.databricks.com/files/workspace.html#configure-support-for-files-in-repos)? You can use the command %sh pwd in a notebook inside a repo to check if Files in...

1 More Replies
Dilorom
by New Contributor
  • 4998 Views
  • 5 replies
  • 3 kudos

What is a recommended directory for creating a database with a specified path?

I was going through Data Engineering with Databricks training, and in DE 3.3L - Databases, Tables & Views Lab section, it says "Defining database directories for groups of users can greatly reduce the chances of accidental data exfiltration." I agree...

Latest Reply
Anonymous
Not applicable
  • 3 kudos

Hi @Dilorom A, hope everything is going great. Just wanted to check in if you were able to resolve your issue. If yes, would you be happy to mark an answer as best so that other members can find the solution more quickly? If not, please tell us so we ...

4 More Replies
pankajBhatt
by New Contributor II
  • 1270 Views
  • 2 replies
  • 1 kudos

Databricks not able to access latest files in Azure ADLS Gen1

I have mounted my path from Databricks to Azure ADLS Gen1, using an SPN as the service account. Until yesterday everything was OK, but today I can still view all the older deleted folders. I cannot see them in ADLS, but dbutils.fs.ls() in Databricks shows them....

Latest Reply
Anonymous
Not applicable
  • 1 kudos

Hi @pankaj bhatt, hope everything is going great. Just wanted to check in if you were able to resolve your issue. If yes, would you be happy to mark an answer as best so that other members can find the solution more quickly? If not, please tell us so ...

1 More Replies
tinendra
by New Contributor III
  • 2789 Views
  • 7 replies
  • 8 kudos

Can we run pandas dataframe inside databricks?

Hi, I want to run df=pd.read_csv('/dbfs/FileStore/airlines1.csv'). While trying to run it, I am getting an error like FileNotFoundError: [Errno 2] No such file or directory: '/dbfs/FileStore/airlines1.csv'. Could you please help me out with how to run a pandas dataframe in...
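A quick check before calling pd.read_csv is whether the file is visible through the local /dbfs mount. A minimal sketch, assuming the cluster exposes DBFS at /dbfs; `to_local_dbfs_path` is a hypothetical helper, not a Databricks API:

```python
import os


def to_local_dbfs_path(path):
    """Map a 'dbfs:/...' URI to the '/dbfs/...' form used by local file APIs."""
    if path.startswith("dbfs:/"):
        return "/dbfs/" + path[len("dbfs:/"):].lstrip("/")
    return path


# Report the local path and whether the Python process can actually see it.
local_path = to_local_dbfs_path("dbfs:/FileStore/airlines1.csv")
print(local_path, os.path.exists(local_path))
```

If os.path.exists returns False, pandas would raise the same FileNotFoundError, so the problem is the file location or the mount rather than pandas itself.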

Latest Reply
Anonymous
Not applicable
  • 8 kudos

Hi @Tinendra Kumar, hope all is well! Just wanted to check in if you were able to resolve your issue and would you be happy to share the solution or mark an answer as best? Else please let us know if you need more help. We'd love to hear from you. Tha...

6 More Replies
sunil_smile
by Contributor
  • 5523 Views
  • 9 replies
  • 11 kudos

Resolved! How can I add ADLS Gen2 - OAuth 2.0 as cluster scope for my High Concurrency shared cluster (without Unity Catalog)?

Hi all, kindly help me: how can I add the ADLS Gen2 OAuth 2.0 authentication to my high concurrency shared cluster? I want to scope this authentication to the entire cluster, not to a particular notebook. Currently I have added them as spark configuration o...

Latest Reply
Kaniz_Fatma
Community Manager
  • 11 kudos

Hi @Sunilprasath Elangovan, we haven’t heard from you since the last response from @Hubert Dudek, and I was checking back to see if his suggestions helped you. Otherwise, if you have a solution, please do share it with the community as it can b...

8 More Replies
KKo
by Contributor III
  • 1833 Views
  • 3 replies
  • 3 kudos

delete and append in delta path

I am deleting data from the curated path based on a date column and appending staged data to it on each run, using the script below. My fear is, just after the delete operation, if any network issue appeared and the job stopped before it appended the staged da...

Latest Reply
Kaniz_Fatma
Community Manager
  • 3 kudos

Hi @Kris Koirala, we haven’t heard from you since the last response from @Hubert Dudek, and I was checking back to see if his suggestions helped you. Otherwise, if you have a solution, please do share it with the community as it can be helpful ...

2 More Replies
KrishZ
by Contributor
  • 3402 Views
  • 4 replies
  • 1 kudos

How to print the path of a .py file or a notebook?

I have stored a test.py in DBFS at the location below: "/dbfs/FileStore/shared_uploads/krishna@company.com/Project_Folder/test.py". I have a print statement in test.py, print( os.getcwd() ), and it prints the below: '/databricks/drive...
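For a plain .py file, os.getcwd() reports the working directory of the process, not the location of the file. A minimal sketch of the difference, assuming the file runs as an ordinary Python script where `__file__` is defined (that may not hold inside a notebook cell); `script_dir` is a hypothetical helper:

```python
import os


def script_dir(file_attr):
    """Return the absolute directory that contains the given script file."""
    return os.path.dirname(os.path.abspath(file_attr))


# The working directory depends on where the process was started.
print("working directory:", os.getcwd())
# Inside test.py, pass __file__ to get the folder the script lives in:
# print("script directory:", script_dir(__file__))
```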

Latest Reply
Aviral-Bhardwaj
Esteemed Contributor III
  • 1 kudos

Hey @Krishna Zanwar, please use the code below; this will work. As you want the specific location, you can write custom code and format the path using a Python formatter; it will give you the desired result.

3 More Replies
KVNARK
by Honored Contributor II
  • 2764 Views
  • 4 replies
  • 11 kudos

Resolved! Pyspark learning path

Can anyone suggest the best series of courses offered by Databricks to learn PySpark for ETL purposes, either in the Databricks partner learning portal or the Databricks learning portal?

Latest Reply
Hubert-Dudek
Esteemed Contributor III
  • 11 kudos

To learn Databricks ETL, I highly recommend the videos made by Simon on this channel: https://www.youtube.com/@AdvancingAnalytics

3 More Replies
jd1
by New Contributor II
  • 584 Views
  • 1 reply
  • 3 kudos

Hello, When working in a python notebook and using tab-complete to navigate the file system, I find that pressing enter on a partially completed path ...

Hello, When working in a python notebook and using tab-complete to navigate the file system, I find that pressing enter on a partially completed path will add the full path to the cell in the notebook. This is annoying behaviour, since you end up with...

Latest Reply
UmaMahesh1
Honored Contributor III
  • 3 kudos

Someone heard you. In the experimental Monaco editor, I found that this particular issue does not appear.

Mado
by Valued Contributor II
  • 1091 Views
  • 2 replies
  • 3 kudos

When should I use ".start()" with writeStream?

Hi, I am practicing with Databricks. In sample notebooks, I have seen different uses of writeStream with or without the ".start()" method. Samples are below: Without .start() spark.readStream .format("cloudFiles") .option("cloudFiles.f...

Latest Reply
Anonymous
Not applicable
  • 3 kudos

Hi @Mohammad Saber​ Great to meet you, and thanks for your question! Let's see if your peers in the community have an answer to your question first. Or else bricksters will get back to you soon. Thanks

1 More Replies
Kavin
by New Contributor II
  • 1364 Views
  • 2 replies
  • 2 kudos

Issue converting the datasets into JSON

I'm a newbie to Databricks, and I need to convert the data sets into JSON. I tried both FOR JSON AUTO and FOR JSON PATH; however, I'm getting an issue - [PARSE_SYNTAX_ERROR] Syntax error at or near 'json'line My query works fine without FOR JSON AUTO AND FOR...

Latest Reply
Kaniz_Fatma
Community Manager
  • 2 kudos

Hi @Kavin Natarajan, we haven’t heard from you since the last response from @Debayan Mukherjee, and I was checking back to see if their suggestions helped you. Otherwise, if you have a solution, please share it with the community, as it can be hel...

1 More Replies
Ank
by New Contributor II
  • 836 Views
  • 2 replies
  • 2 kudos

Why am I getting a FileNotFoundError after providing the file path?

I used "copy file path" to get the file path of the notebook I am trying to run from another notebook. file_path = "/Users/ankur.lohiya@workday.com/PAS/Training/Ingest/TrainingQueries-Cloned.py/" ddi = DatabricksDataIngestion(file_path=file_path, ...

Latest Reply
Vidula
Honored Contributor
  • 2 kudos

Hello @Ankur Lohiya, hope all is well! Just wanted to check in if you were able to resolve your issue and would you be happy to share the solution or mark an answer as best? Else please let us know if you need more help. We'd love to hear from you. Th...

1 More Replies