Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
After making some changes in my feature branch, I committed and pushed some work to Azure DevOps (note that I have not yet raised a PR or merged to any other branch). Many of the files I committed are data files and so I would like to reverse the co...
Trying to use pdf2image on Databricks, but it's failing with "PDFInfoNotInstalledError: Unable to get page count. Is poppler installed and in PATH?" I've installed pdf2image & poppler-utils by running the following in a cell:
%pip install pdf2image
%pip ...
Seems like this thread has died, but for posterity, Databricks provides the following code for installing poppler on a cluster. The code is sourced from the dbdemos accelerators, specifically the "LLM Chatbot With Retrieval Augmented Generation (RAG)...
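Before reinstalling anything, it can help to confirm whether the poppler binaries are actually visible to the Python process, since pdf2image shells out to poppler's `pdfinfo` to get the page count. The following is a minimal diagnostic sketch, not the dbdemos install code; the helper name `poppler_status` is hypothetical, and `pdftoppm` is included on the assumption that it is the other poppler tool you care about.

```python
import shutil


def poppler_status(binaries=("pdfinfo", "pdftoppm")):
    """Map each binary name to its resolved path on PATH, or None if not found."""
    return {name: shutil.which(name) for name in binaries}


status = poppler_status()
missing = [name for name, path in status.items() if path is None]

if missing:
    # A missing pdfinfo is consistent with the PDFInfoNotInstalledError above.
    print("Not found on PATH:", ", ".join(missing))
else:
    print("poppler binaries found:", status)
```

If `pdfinfo` is reported as missing here, the Python package installed fine but the system-level poppler tools are not on the PATH of the node running the code.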
Hi everyone, I'm planning to use the Databricks Python CLI function "install_libraries". Can someone please post examples of how to call install_libraries? https://github.com/databricks/databricks-cli/blob/main/databricks_cli/libraries/api.py
Here you go, using the Python SDK:
from databricks.sdk import WorkspaceClient
from databricks.sdk.service import compute
w = WorkspaceClient(host="yourhost", token="yourtoken")
# Create an array of Library objects to be installed
libraries_to_install = [compute...
Yes, still illegal. And I also don’t understand why it is equated with drugs, but alcohol is not! Not a single murder has yet been committed under cannabis, not a single war has been unleashed. It's just that people who don't use don't understand how...
You are absolutely right! I have found it to be a big relief medically. I have nerve conditions which is not operable. The legal medical pills almost literally killed me, and if it wasn't for my husband's quick thinking, I wouldn't be here to share t...
Hi @THIAM HUAT TAN, thank you for reaching out, and we’re sorry to hear about this log-in issue! We have a Community Edition login troubleshooting post on Community. Please take a look and follow the troubleshooting steps. If the steps do not res...
When I use dbutils.secrets.get in my code, spaces in the log are replaced by the literal "[REDACTED]". This is very annoying and makes the log difficult to read. Any idea how to avoid this? See my screenshot...
I ran into the same issue and found that the reason was that the notebook included some test keys with values of "A" and "B" for simple testing. I noticed that any string containing "A" or "B" as a substring had that substring replaced with "[REDACTED]". So, in my case, it was an eas...
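To see why a one-character secret value causes this, here is a small simulation of substring-based redaction. It is an illustrative sketch only, not Databricks' actual implementation; the function name `redact` and the sample strings are made up for the example.

```python
import re


def redact(text, secret_values, mask="[REDACTED]"):
    """Replace every occurrence of any secret value in text with the mask."""
    # Ignore empty values, and try longer secrets first so a short secret
    # does not split a longer one that contains it.
    values = sorted({v for v in secret_values if v}, key=len, reverse=True)
    if not values:
        return text
    # A single regex pass avoids re-redacting letters inside the mask itself.
    pattern = re.compile("|".join(re.escape(v) for v in values))
    return pattern.sub(mask, text)


# Secrets with trivial test values, as in the post above.
print(redact("LOAD TABLE ABC", ["A", "B"]))

# A secret whose value is a single space mangles every space in the output.
print(redact("job finished ok", [" "]))
```

Under this model, the fix described above follows directly: using longer, realistic test values means ordinary log text no longer contains them as substrings.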
Hello @asdf fdsa, The NodeJS connector is built for the NodeJS environment; it will not integrate with ReactJS. For cases where web execution is needed, we advise using the SQL Exec API. Please check the documentation here: https://docs.databricks.com/sql/a...
Hi! I currently use this old generic template, amended over time, to optimize Databricks Spark execution. Can you help me find out whether it still makes sense for v10-11-12, or whether there are new recommendations? Maybe some of this is making my pr...
@Alejandro Martinez: Hi! Your template seems to be a good starting point for configuring a SparkSession in Databricks. However, there are some new recommendations that you can consider for Databricks runtime versions v10-11-12. Here are some suggest...
I have a custom application/executable that I upload to DBFS and transfer to my cluster's local storage for execution. I want to call multiple instances of this application in parallel, which I've only been able to successfully do with Python's subpr...
Autoscaling works for Spark jobs only. It works by monitoring the job queue, which plain Python code never enters. If it's just Python code, try a single-node cluster. https://docs.databricks.com/clusters/configure.html#cluster-size-and-autoscaling
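For the single-node route, one way to run several instances of an executable side by side is a thread pool around `subprocess.run`. This is a minimal sketch under assumptions: the helper names `run_instance` and `run_parallel` are hypothetical, and the Python interpreter stands in for the custom application, whose path and arguments are not given in the thread.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor


def run_instance(args):
    """Run one process to completion and return (return code, stripped stdout)."""
    completed = subprocess.run(args, capture_output=True, text=True, check=False)
    return completed.returncode, completed.stdout.strip()


def run_parallel(commands, max_workers=4):
    """Run the commands concurrently; results come back in input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(run_instance, commands))


# Stand-in for the custom executable: each "instance" prints the square of its index.
commands = [[sys.executable, "-c", f"print({i} * {i})"] for i in range(4)]
results = run_parallel(commands)
print(results)
```

Threads are enough here because each worker only waits on an external process. All of the processes run on the one machine, so nothing enters the Spark job queue and, as the reply notes, autoscaling has nothing to react to.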
Hi all, I would like to improve the way I handle JDBC credential information (ID/PW, host, port, etc.). Where do you usually store the JDBC credentials, and how do you use them? Thanks in advance for your help!
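One common pattern is to keep each value in a Databricks secret scope and read it with `dbutils.secrets.get` at run time rather than hard-coding it. The sketch below assumes that approach; the scope name, the key names, and the PostgreSQL URL format are all hypothetical placeholders, and the secret getter is passed in as a function so the example also runs outside a notebook.

```python
def build_jdbc_options(get_secret, scope):
    """Assemble JDBC connection options from secrets stored in one scope.

    get_secret is any callable taking (scope, key) and returning a string.
    The key names below are hypothetical; use whatever your scope contains.
    """
    host = get_secret(scope, "jdbc-host")
    port = get_secret(scope, "jdbc-port")
    database = get_secret(scope, "jdbc-database")
    return {
        # Assumes a PostgreSQL source; adjust the URL prefix for other databases.
        "url": f"jdbc:postgresql://{host}:{port}/{database}",
        "user": get_secret(scope, "jdbc-user"),
        "password": get_secret(scope, "jdbc-password"),
    }


# Local stand-in for a secret scope, for illustration only.
fake_store = {
    ("my-scope", "jdbc-host"): "db.example.com",
    ("my-scope", "jdbc-port"): "5432",
    ("my-scope", "jdbc-database"): "sales",
    ("my-scope", "jdbc-user"): "reader",
    ("my-scope", "jdbc-password"): "not-a-real-password",
}


def fake_get_secret(scope, key):
    return fake_store[(scope, key)]


options = build_jdbc_options(fake_get_secret, "my-scope")
print(options["url"])
```

In a notebook, the getter could be `lambda scope, key: dbutils.secrets.get(scope=scope, key=key)`, and the resulting dictionary can be passed to a Spark JDBC reader as its options.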
Hi @Kwangwon Yi, hope all is well! Just wanted to check in: were you able to resolve your issue, and would you be happy to share the solution or mark an answer as best? Otherwise, please let us know if you need more help. We'd love to hear from you. Thanks...
Hi team, we tried to use the proxy options for BigQuery Spark connector as mentioned in this documentation. However, we keep getting "connect timed out" error. The proxy host is working on our end. This made us wonder if by chance Databricks does not...
Hi @Ayushi Pandey, hope all is well! Just wanted to check in: were you able to resolve your issue, and would you be happy to share the solution or mark an answer as best? Otherwise, please let us know if you need more help. We'd love to hear from you. Than...
Hey there Community!! I have a client that will produce a CSV file daily that needs to be moved from Bronze -> Silver. Unfortunately, this source file will always be a full set of data, not incremental. I was thinking of using AutoLoader/cloudFil...
I upvoted all of @werners' suggestions because they are all very valid ways of addressing my need (the true power/flexibility of the Databricks UDAP!!!). However, it turns out I'm going to end up getting incremental data after all :). So now the flow wi...
I've been trying to use the HiveMetastoreClient class in Scala to extract some metadata from the Databricks internal Metastore, without success. I'm currently using the 7.3 LTS runtime. The error seems to be related to some kind of inconsistency between...
Thanks for the reference, @Atanu Sarkar. It seems a little odd to me that I'd need to change the internal Databricks Metastore table to add a column expected by the default Scala client. I'm afraid this could cause issues with other users/jobs ...
Hello. I'm trying to connect to Databricks from my IDE (PyCharm) and then run delta table queries from there. However, the cluster I'm trying to access has to give me permission. In this case, I'd go to my cluster, run the cell which gives me permiss...
"I'm trying to connect to Databricks from my IDE (PyCharm) and then run delta table queries from there." If you are going to deploy your code to Databricks later, the only solutions I see are to use databricks-connect or just make development envi...