Get Started Discussions
Start your journey with Databricks by joining discussions on getting started guides, tutorials, and introductory topics. Connect with beginners and experts alike to kickstart your Databricks experience.

Forum Posts

abin-bcgov
by New Contributor III
  • 2218 Views
  • 4 replies
  • 4 kudos

Resolved! using Azure Databricks vs using Databricks directly

Hi friends, a quick question regarding how data and workspace controls work while using "Azure Databricks". I am planning to use the Azure Databricks that comes as part of my employer's Azure subscriptions. I work for a public sector organization, which is ...

Latest Reply
abin-bcgov
New Contributor III
  • 4 kudos

Thanks a ton, @SP_6721 

3 More Replies
MLEngineer
by New Contributor
  • 613 Views
  • 1 reply
  • 0 kudos

Right course for ML engineer

Hi, I would like to learn Databricks so that I can look for job opportunities as an ML engineer. I have a background in Python programming and computer vision (OpenCV), but not much experience with Azure, AWS, and so on. Which course here is good with ...

Latest Reply
pedrotramos97
Databricks Employee
  • 0 kudos

Given your background in Python programming and computer vision but limited experience with cloud platforms, the best pathway to enter the job market as an MLE using Databricks is to pursue the Databricks Certified Machine Learning Associate certificati...

VaderK
by New Contributor
  • 3542 Views
  • 1 reply
  • 1 kudos

Resolved! Why does .collect() cause a shuffle while .show() does not?

I’m learning Spark using the book Spark: The Definitive Guide and came across some behavior I’m trying to understand. I am reading a CSV file which has 3 columns: DEST_COUNTRY_NAME, ORIGIN_COUNTRY_NAME, count. The dataset has a total of 256 rows. Here’...

Get Started Discussions
collect
pyspark
shuffle
Latest Reply
-werners-
Esteemed Contributor III
  • 1 kudos

Q1: collect() moves all data to the driver, hence a shuffle. show() just shows x records from the df, from a partition (or more partitions if x > partition size). No shuffling needed. For display purposes the results are of course gathered on the driv...

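The difference the reply describes can be sketched in plain Python. This is an analogy, not actual Spark code: the lists stand in for partitions, and the point is that taking the first n rows can stop after one partition, while collecting must touch every partition.

```python
# Plain-Python analogy (not actual Spark) of why show(n) can stop after
# one partition while collect() must gather every partition to the driver.

partitions = [[1, 2, 3], [4, 5, 6], [7, 8]]  # pretend each list is a partition

def show(n):
    """Take the first n rows, reading as few partitions as possible."""
    rows = []
    for part in partitions:          # stops early once n rows are found
        for row in part:
            rows.append(row)
            if len(rows) == n:
                return rows
    return rows

def collect():
    """Gather every row from every partition (all data moves to the driver)."""
    return [row for part in partitions for row in part]

print(show(2))    # reads only the first partition -> [1, 2]
print(collect())  # touches all partitions -> [1, 2, 3, 4, 5, 6, 7, 8]
```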
aniket07
by New Contributor II
  • 1954 Views
  • 2 replies
  • 2 kudos

Lazy evaluation in serverless vs all purpose compute ?

As you can see, right now I am connected to serverless compute, and when I give a wrong path, Spark does lazy evaluation and gives the error on display. However, when I switch from serverless to my all-purpose cluster, I get the error when I create the df its...

Latest Reply
sridharplv
Valued Contributor II
  • 2 kudos

Based on the scenario, what https://community.databricks.com/t5/user/viewprofilepage/user-id/156441 is saying is correct: though the eager evaluation property is false in both cases, for All-Purpose clusters Spark is checking the path immediately whe...

1 More Replies
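The lazy-versus-eager behavior discussed above can be mimicked in plain Python (this is a sketch, not Spark itself): a lazy reader returns a generator, so a bad path only fails when results are actually consumed, which mirrors the error surfacing at display() rather than at DataFrame creation.

```python
# Plain-Python sketch of lazy vs. eager evaluation. The "/valid" prefix check
# is a stand-in for a real path-existence check.

def read_lazy(path):
    """Returns a generator; the path is not checked until iteration starts."""
    def rows():
        if not path.startswith("/valid"):
            raise FileNotFoundError(path)
        yield from range(3)
    return rows()

def read_eager(path):
    """Checks the path up front, like a cluster that validates on creation."""
    if not path.startswith("/valid"):
        raise FileNotFoundError(path)
    return list(range(3))

df = read_lazy("/wrong/path")      # no error yet: evaluation is deferred
try:
    list(df)                       # "display" step: now the error surfaces
except FileNotFoundError as e:
    print("failed only on display:", e)
```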
tommyhmt
by New Contributor II
  • 892 Views
  • 1 reply
  • 0 kudos

Unable to access external table created by DLT

I originally set the Storage location in my DLT as abfss://{container}@{storageaccount}.dfs.core.windows.net/... But when running the DLT I got the following error: So I decided to leave the above Storage location blank and define the path parameter in...

Latest Reply
brockb
Databricks Employee
  • 0 kudos

Hi @Tommy , Thanks for your question. I would encourage you to verify once using a Pro SQL Warehouse temporarily instead of a Serverless SQL Warehouse given the compute differences between the two - Pro compute resides in your data plane, Serverless ...

kro
by New Contributor II
  • 2612 Views
  • 2 replies
  • 2 kudos

OCRmyPDF in Databricks

Hello, do any of you have experience with using OCRmyPDF in Databricks? I have tried to install it in various ways with different versions, but my notebook keeps crashing with the error: The Python process exited with exit code 139 (SIGSEGV: Segmentation...

Get Started Discussions
ocr
ocrmypdf
pdf
segmentation fault
tesseract
Latest Reply
sridharplv
Valued Contributor II
  • 2 kudos

Refer to this link too https://community.databricks.com/t5/data-engineering/pdf-parsing-in-notebook/td-p/14636

1 More Replies
EllaClark
by New Contributor II
  • 3476 Views
  • 2 replies
  • 0 kudos

Can I automate notebook tagging based on workspace folder structure?

Hi all, I’m currently organizing a growing number of notebooks in our Databricks workspace and trying to keep things manageable with proper tagging and metadata. One idea I had was to automatically apply tags to notebooks based on their folder structu...

Latest Reply
Renu_
Valued Contributor II
  • 0 kudos

Hi @EllaClark, yes, you can automate tagging of Databricks notebooks based on folder structure using the REST API and a script. Use the Workspace API to list notebook paths, extract folder names, and treat them as tags. If the API supports metadata up...

1 More Replies
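The folder-to-tag idea in the reply can be sketched as follows. This is a hypothetical example: in practice the notebook paths would come from the Workspace API (GET /api/2.0/workspace/list); here they are hard-coded so only the tag-derivation step is shown, and the paths themselves are made up.

```python
# Hypothetical sketch: derive tags for a notebook from its workspace path.
# Real paths would be fetched via the Databricks Workspace API; these are
# invented examples.

def tags_from_path(path):
    """Treat each folder between the root folder and the notebook as a tag."""
    parts = path.strip("/").split("/")
    return parts[1:-1]   # drop the root folder and the notebook name itself

notebooks = [
    "/Workspace/finance/reporting/monthly_close",
    "/Workspace/ml/experiments/churn_model",
]

for nb in notebooks:
    print(nb, "->", tags_from_path(nb))
```

Each derived list could then be written back as tags or tracked in a separate metadata table, depending on what the platform exposes.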
Kabi
by New Contributor III
  • 990 Views
  • 1 reply
  • 1 kudos

Resolved! Simple notebook sync

Hi, is there a simple way to sync a local notebook with a Databricks notebook? For example, is it possible to just connect to the Databricks kernel or something similar? I know there are IDE extensions for this, but unfortunately, they use the local d...

Latest Reply
Renu_
Valued Contributor II
  • 1 kudos

Hi @Kabi, to my knowledge Databricks doesn’t support directly connecting to the Databricks kernel. However, here are practical ways to sync your local notebook with Databricks: You can use Git to version control your notebooks. Clone your repo into Dat...

Mani2105
by New Contributor II
  • 607 Views
  • 1 reply
  • 1 kudos

Databricks Dashboard ,passing Prompt Values from one page to another

Hi guys, I have a dashboard with a main page where I have a base query, and I added a date-time range widget and linked it to filter the base query. Now I have a Page 2 where I use a different summarized query as a source, base query 2. I need this qu...

Latest Reply
Renu_
Valued Contributor II
  • 1 kudos

Hi @Mani2105, I guess that currently Databricks dashboards don’t support sharing widget parameters like date range filters across pages. Each page is isolated, so filters must be recreated manually per page. Manual configuration remains the only way to m...

Rjdudley
by Honored Contributor
  • 1201 Views
  • 2 replies
  • 1 kudos

Asinine bad word detection

Are you kidding me here--I couldn't post this reply because (see arrows because I can't say the words)?  I've run afoul of this several times before, bad word detection was a solved problem in the 1990s and there is even a term for errors like this--...

Latest Reply
Advika
Databricks Employee
  • 1 kudos

Hello @Rjdudley! Thank you for bringing this to our attention. We understand how frustrating it can be to have your message incorrectly flagged, especially when you're contributing meaningfully. While our filters are in place to maintain a safe space...

1 More Replies
tw1
by New Contributor III
  • 1171 Views
  • 5 replies
  • 1 kudos

AI/BI Dashboard - Hide Column in Table Visualization, but not in exported data

How can I hide specific columns from a table visualization, but not in the exported data? I have over 200 columns in my query result and the UI freezes when I want to show it in a table visualization. So I want to hide specific columns, but if I export ...

Latest Reply
tw1
New Contributor III
  • 1 kudos

.

4 More Replies
tejas8196
by New Contributor II
  • 2815 Views
  • 3 replies
  • 0 kudos

DAB not updating zone_id when redeployed

Hey folks, facing an issue with zone_id not getting overridden when redeploying the DAB template to the Databricks workspace. The Databricks job is already deployed and has the "ap-south-1a" zone_id. I wanted to make it "auto", so I have made the changes to th...

Get Started Discussions
data engineering
Latest Reply
KungFuMaster
New Contributor II
  • 0 kudos

Hello. I had a similar issue when using DAB in CI/CD but was able to fix it.

2 More Replies
ChristianRRL
by Valued Contributor III
  • 2177 Views
  • 2 replies
  • 1 kudos

Resolved! DBX Community Pending Answers

Hi there, in the past I've posted questions in this community and I would consistently get responses back in a very reasonable time frame. Typically I think most of my posts have an initial response back within 1-2 days, or just a few days (I don't t...

Latest Reply
ChristianRRL
Valued Contributor III
  • 1 kudos

Thank you for clarifying. I know some questions may be a bit more technical, but I hope I get some feedback/suggestions, particularly to my UMF Best Practice question!

1 More Replies
sys08001
by New Contributor II
  • 1036 Views
  • 1 reply
  • 1 kudos

Resolved! Is there a way to iterate over a combination of parameters using a "for each" task?

Hi, I have a notebook with two input widgets set up ("current_month" and "current_year") that the notebook grabs values from and uses for processing. I want to be able to provide a list of input values in the "for each" task where each value is actual...

Latest Reply
ashraf1395
Honored Contributor
  • 1 kudos

Hi there @sys08001, yup, it is possible. You can pass the input values for the for_each task in JSON format, somewhat like this: [ { "tableName": "product_2", "id": "1", "names": "John Doe", "created_at": "2025-02-22T10:00:00.000Z" },...

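For the month/year combinations from the question, the for-each input could look like the sketch below. This is an assumption-laden example: the widget names "current_month" and "current_year" come from the question, and the exact way each element is referenced inside the task depends on the For Each task's parameter syntax.

```python
# Sketch of a "for each" task input: a JSON array of objects, where each
# object carries one combination of the notebook's widget parameters.

import json

combinations = [
    {"current_month": "01", "current_year": "2025"},
    {"current_month": "02", "current_year": "2025"},
]

# The For Each task would iterate over this array, with each iteration
# receiving one object's fields as its task parameters.
print(json.dumps(combinations, indent=2))
```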
rodneyc8063
by New Contributor II
  • 2127 Views
  • 2 replies
  • 0 kudos

Azure Databricks - Databricks AI Assistant missing on Azure Student Subscription?

I am going through a course learning Azure Databricks and I had created a new Azure Databricks workspace. I am the owner of the subscription and created everything, so I assume I should have full admin rights. The following is my setup: Azure Student S...

Latest Reply
Takuya-Omi
Valued Contributor III
  • 0 kudos

@rodneyc8063 According to Azure’s documentation, it states: Tip: Admins: If you’re unable to enable Databricks Assistant, you might need to disable the "Enforce data processing within workspace Geography for Designated Services" setting. See "For an ac...

1 More Replies
