Get Started Discussions
Start your journey with Databricks by joining discussions on getting started guides, tutorials, and introductory topics. Connect with beginners and experts alike to kickstart your Databricks experience.

Forum Posts

kro
by New Contributor II
  • 612 Views
  • 2 replies
  • 2 kudos

OCRmyPDF in Databricks

Hello, do any of you have experience with using OCRmyPDF in Databricks? I have tried to install it in various ways with different versions, but my notebook keeps crashing with the error: The Python process exited with exit code 139 (SIGSEGV: Segmentation...

Get Started Discussions
ocr
ocrmypdf
pdf
segmentation fault
tesseract
  • 612 Views
  • 2 replies
  • 2 kudos
Latest Reply
sridharplv
Contributor
  • 2 kudos

Refer to this link as well: https://community.databricks.com/t5/data-engineering/pdf-parsing-in-notebook/td-p/14636

  • 2 kudos
1 More Replies
EllaClark
by New Contributor II
  • 417 Views
  • 2 replies
  • 0 kudos

Can I automate notebook tagging based on workspace folder structure?

Hi all, I’m currently organizing a growing number of notebooks in our Databricks workspace and trying to keep things manageable with proper tagging and metadata. One idea I had was to automatically apply tags to notebooks based on their folder structu...

  • 417 Views
  • 2 replies
  • 0 kudos
Latest Reply
Renu_
New Contributor III
  • 0 kudos

Hi @EllaClark, yes, you can automate tagging of Databricks notebooks based on folder structure using the REST API and a script. Use the Workspace API to list notebook paths, extract folder names, and treat them as tags. If the API supports metadata up...
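The folder-to-tag step described above can be sketched as a small pure function. This is a minimal sketch: the notebook path is a made-up example, and in practice the paths would come from the Workspace API list endpoint (`GET /api/2.0/workspace/list`).

```python
# Hypothetical helper: derive tags from a notebook's workspace path.
# Real paths would come from the Workspace API list endpoint; here we
# only show the tag-extraction step.

def path_to_tags(notebook_path: str) -> list[str]:
    """Treat each folder in the path (except the notebook name itself) as a tag."""
    parts = [p for p in notebook_path.split("/") if p]
    return parts[:-1]  # drop the final component, which is the notebook name

print(path_to_tags("/Teams/etl/daily_load"))  # → ['Teams', 'etl']
```

A tagging script would then loop over every listed notebook and attach these tags via whatever metadata mechanism your workspace uses.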

  • 0 kudos
1 More Replies
Kuchnhi
by New Contributor III
  • 783 Views
  • 10 replies
  • 6 kudos

Facing issues while upgrading DBR version from 9.1 LTS to 15.4 LTS

Dear all,I am upgrading DBR version from 9.1 LTS to 15.4 LTS in Azure Databricks. for that I have created a new cluster with 15.4 DBR attached init script for installing application dependencies. Cluster has started successfully but it takes 30 min. ...

  • 783 Views
  • 10 replies
  • 6 kudos
Latest Reply
SmithPoll
New Contributor II
  • 6 kudos

Hey, this error usually happens when the cluster isn't fully ready before your application starts running. Since your init script takes about 30 minutes, it’s likely that your job starts before all dependencies are properly installed. The ModuleNotFo...

  • 6 kudos
9 More Replies
Kabi
by New Contributor II
  • 211 Views
  • 1 replies
  • 1 kudos

Resolved! Simple notebook sync

Hi, is there a simple way to sync a local notebook with a Databricks notebook? For example, is it possible to just connect to the Databricks kernel or something similar? I know there are IDE extensions for this, but unfortunately, they use the local d...

  • 211 Views
  • 1 replies
  • 1 kudos
Latest Reply
Renu_
New Contributor III
  • 1 kudos

Hi @Kabi, to my knowledge Databricks doesn’t support directly connecting to its kernel. However, here are practical ways to sync your local notebook with Databricks: you can use Git to version control your notebooks. Clone your repo into Dat...

  • 1 kudos
Mani2105
by New Contributor II
  • 148 Views
  • 1 replies
  • 1 kudos

Databricks Dashboard ,passing Prompt Values from one page to another

Hi guys, I have a dashboard with a main page where I have a base query, and I added a date-time range widget and linked it to filter the base query. Now I have a Page 2 where I use a different summarized query as a source, base query 2. I need this qu...

  • 148 Views
  • 1 replies
  • 1 kudos
Latest Reply
Renu_
New Contributor III
  • 1 kudos

Hi @Mani2105, as far as I know, Databricks dashboards currently don’t support sharing widget parameters like date range filters across pages. Each page is isolated, so filters must be recreated manually per page. Manual configuration remains the only way to m...

  • 1 kudos
Judith
by New Contributor III
  • 1045 Views
  • 1 replies
  • 1 kudos

Connect to Onelake using Service Principal, Unity Catalog and Databricks Access Connector

We are trying to connect Databricks to OneLake, to read data from a Fabric workspace into Databricks, using a notebook. We also use Unity Catalog. We are able to read data from the workspace with a Service Principal like this: from pyspark.sql.types i...

  • 1045 Views
  • 1 replies
  • 1 kudos
Latest Reply
behema1074
New Contributor II
  • 1 kudos

Hi, I am facing the same problem. Have you already been able to solve the problem?

  • 1 kudos
htd350
by New Contributor II
  • 279 Views
  • 1 replies
  • 2 kudos

Resolved! Cluster by auto pyspark

I can find documentation to enable automatic liquid clustering with SQL code: CLUSTER BY AUTO. But how do I do this with PySpark? I know I can do it with spark.sql("ALTER TABLE CLUSTER BY AUTO"), but ideally I want to pass it as an .option(). Thanks in...

  • 279 Views
  • 1 replies
  • 2 kudos
Latest Reply
BigRoux
Databricks Employee
  • 2 kudos

To enable automatic liquid clustering with PySpark and pass it as an `.option()` during table creation or modification, you currently cannot directly use a `.clusterBy("AUTO")` method in PySpark's `DataFrameWriter` API. However, there are workarounds...
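Pending a native `.option()`, the `spark.sql` workaround mentioned in the question can be wrapped in small helpers. This is a sketch with a placeholder table name; only the SQL strings are built here, so the statements can be inspected without a running Spark session.

```python
# Helpers that build the liquid-clustering SQL statements. In a notebook
# you would pass the result to spark.sql(...), e.g.:
#   spark.sql(alter_to_auto_clustering("my_table"))

def create_with_auto_clustering(table: str, columns_ddl: str) -> str:
    """CREATE TABLE statement with automatic liquid clustering enabled."""
    return f"CREATE TABLE {table} ({columns_ddl}) CLUSTER BY AUTO"

def alter_to_auto_clustering(table: str) -> str:
    """ALTER an existing table to use automatic liquid clustering."""
    return f"ALTER TABLE {table} CLUSTER BY AUTO"

print(alter_to_auto_clustering("my_table"))  # → ALTER TABLE my_table CLUSTER BY AUTO
```

Calling the ALTER helper right after `df.write.saveAsTable(...)` approximates the missing writer option.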

  • 2 kudos
Rjdudley
by Honored Contributor
  • 224 Views
  • 2 replies
  • 1 kudos

Asinine bad word detection

Are you kidding me here--I couldn't post this reply because (see arrows because I can't say the words)?  I've run afoul of this several times before, bad word detection was a solved problem in the 1990s and there is even a term for errors like this--...

  • 224 Views
  • 2 replies
  • 1 kudos
Latest Reply
Advika
Databricks Employee
  • 1 kudos

Hello @Rjdudley! Thank you for bringing this to our attention. We understand how frustrating it can be to have your message incorrectly flagged, especially when you're contributing meaningfully. While our filters are in place to maintain a safe space...

  • 1 kudos
1 More Replies
tw1
by New Contributor III
  • 238 Views
  • 5 replies
  • 1 kudos

AI/BI Dashboard - Hide Column in Table Visualization, but not in exported data

How can I hide a specific column from a table visualization, but not in the exported data? I have over 200 columns in my query result and the UI freezes when I show it in a table visualization. So I want to hide specific columns, but if I export ...

  • 238 Views
  • 5 replies
  • 1 kudos
Latest Reply
tw1
New Contributor III
  • 1 kudos

.

  • 1 kudos
4 More Replies
tejas8196
by New Contributor II
  • 1807 Views
  • 3 replies
  • 0 kudos

DAB not updating zone_id when redeployed

Hey folks, facing an issue with zone_id not getting overridden when redeploying the DAB template to the Databricks workspace. The Databricks job is already deployed and has the "ap-south-1a" zone_id. I wanted to make it "auto", so I have made the changes to th...

Get Started Discussions
data engineering
  • 1807 Views
  • 3 replies
  • 0 kudos
Latest Reply
KungFuMaster
New Contributor II
  • 0 kudos

Hello. I had a similar issue when using DAB in CI/CD but was able to fix it.

  • 0 kudos
2 More Replies
mancosta
by New Contributor
  • 174 Views
  • 0 replies
  • 0 kudos

Joblib with optuna and SB3

Hi everyone, I am training some reinforcement learning models and I am trying to automate the hyperparameter search using Optuna. I saw in the documentation that you can use joblib with Spark as a backend to train in parallel. I got that working with t...
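The parallel-trials pattern the question describes can be illustrated without Optuna or Spark. This is a stdlib stand-in, not the Optuna API: `concurrent.futures` plays the role of joblib's backend, and the objective is a toy quadratic rather than an RL training run.

```python
# Stdlib sketch of the "evaluate trials in parallel, keep the best" pattern.
# With Optuna this would be study.optimize(objective, n_trials=..., n_jobs=N)
# over a Spark-backed joblib backend; here threads stand in for workers.
from concurrent.futures import ThreadPoolExecutor

def objective(lr: float) -> float:
    # Toy "loss", minimized at lr = 0.1.
    return (lr - 0.1) ** 2

def run_trials(candidates: list[float]) -> float:
    """Evaluate all candidate hyperparameters in parallel, return the best one."""
    with ThreadPoolExecutor(max_workers=4) as pool:
        losses = list(pool.map(objective, candidates))
    best = min(range(len(candidates)), key=losses.__getitem__)
    return candidates[best]

print(run_trials([0.001, 0.01, 0.1, 1.0]))  # → 0.1
```

For real RL objectives the workers must be process- or executor-based, since training is CPU/GPU-bound rather than I/O-bound.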

  • 174 Views
  • 0 replies
  • 0 kudos
ChristianRRL
by Valued Contributor
  • 533 Views
  • 2 replies
  • 1 kudos

Resolved! DBX Community Pending Answers

Hi there, in the past I've posted questions in this community and I would consistently get responses back in a very reasonable time frame. Typically I think most of my posts have an initial response back within 1-2 days, or just a few days (I don't t...

  • 533 Views
  • 2 replies
  • 1 kudos
Latest Reply
ChristianRRL
Valued Contributor
  • 1 kudos

Thank you for clarifying. I know some questions may be a bit more technical, but I hope I get some feedback/suggestions, particularly to my UMF Best Practice question!

  • 1 kudos
1 More Replies
sys08001
by New Contributor II
  • 380 Views
  • 1 replies
  • 1 kudos

Resolved! Is there a way to iterate over a combination of parameters using a "for each" task?

Hi, I have a notebook with two input widgets set up ("current_month" and "current_year") that the notebook grabs values from and uses for processing. I want to be able to provide a list of input values in the "for each" task where each value is actual...

  • 380 Views
  • 1 replies
  • 1 kudos
Latest Reply
ashraf1395
Honored Contributor
  • 1 kudos

Hi there @sys08001, yes, it is possible. You can pass the input values for the for_each task in JSON format, somewhat like this: [ { "tableName": "product_2", "id": "1", "names": "John Doe", "created_at": "2025-02-22T10:00:00.000Z" },...
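For the two widgets named in the question, the for-each input could look like the fragment below (values are illustrative). Each object becomes one iteration of the loop, and the inner notebook task maps the keys onto its widget parameters:

```json
[
  { "current_month": "1", "current_year": "2025" },
  { "current_month": "2", "current_year": "2025" },
  { "current_month": "3", "current_year": "2025" }
]
```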

  • 1 kudos
rodneyc8063
by New Contributor II
  • 691 Views
  • 2 replies
  • 0 kudos

Azure Databricks - Databricks AI Assistant missing on Azure Student Subscription?

I am going through a course learning Azure Databricks and I had created a new Azure Databricks workspace. I am the owner of the subscription and created everything, so I assume I should have full admin rights. The following is my setup: Azure Student S...

  • 691 Views
  • 2 replies
  • 0 kudos
Latest Reply
Takuya-Omi
Valued Contributor III
  • 0 kudos

@rodneyc8063 According to Azure’s documentation: Tip: Admins: If you’re unable to enable Databricks Assistant, you might need to disable the "Enforce data processing within workspace Geography for Designated Services" setting. See “For an ac...

  • 0 kudos
1 More Replies
Prashanthkumar
by New Contributor III
  • 7856 Views
  • 12 replies
  • 2 kudos

Is it possible to view Databricks cluster metrics using REST API

I am looking for some help on getting Databricks cluster metrics such as memory utilization, CPU utilization, memory swap utilization, and free file system using the REST API. I am trying it in Postman using a Databricks token and with my Service Principal bear...

  • 7856 Views
  • 12 replies
  • 2 kudos
Latest Reply
Prashanthkumar
New Contributor III
  • 2 kudos

Thank you Walter, will keep an eye.

  • 2 kudos
11 More Replies
