Databricks Platform Discussions
Dive into comprehensive discussions covering various aspects of the Databricks platform. Join the conversation to deepen your understanding and maximize your usage of the Databricks platform.

Browse the Community

Data Engineering

Join discussions on data engineering best practices, architectures, and optimization strategies with...

11813 Posts

Data Governance

Join discussions on data governance practices, compliance, and security within the Databricks Commun...

509 Posts

Generative AI

Explore discussions on generative artificial intelligence techniques and applications within the Dat...

332 Posts

Machine Learning

Dive into the world of machine learning on the Databricks platform. Explore discussions on algorithm...

997 Posts

Warehousing & Analytics

Engage in discussions on data warehousing, analytics, and BI solutions within the Databricks Communi...

660 Posts

Activity in Databricks Platform Discussions

Suheb
by > Contributor
  • 108 Views
  • 4 replies
  • 2 kudos

How do I improve the performance of my Random Forest model on Databricks?

How can I make these people smarter or faster so the final answer is better?

Latest Reply
jameswood32
Contributor
  • 2 kudos

Improving the performance of a Random Forest model on Databricks is usually about data quality, feature engineering, and hyperparameter tuning. Some tips: Feature Engineering: Create meaningful features and remove irrelevant ones. Encode categorical var...

3 More Replies
jasmin_mbi
by > New Contributor
  • 155 Views
  • 2 replies
  • 1 kudos

Impossible to create classic warehouse

Hello, we have already spent surprisingly many DBUs, although we have only uploaded a few tiny tables (9 tables with approx. 10 rows). We had the idea to change the warehouse from the serverless starter warehouse to a classic 2X-Small in order to save DBUs....

Latest Reply
Advika
Community Manager
  • 1 kudos

Hello @jasmin_mbi! Did the suggestion shared above help resolve the issue with creating a classic SQL warehouse? If yes, please consider marking the response as the accepted solution.

1 More Replies
Sunil_Patidar
by > New Contributor II
  • 187 Views
  • 2 replies
  • 1 kudos

Unable to read from or write to Snowflake Open Catalog via Databricks

I have Snowflake Iceberg tables whose metadata is stored in Snowflake Open Catalog. I am trying to read these tables from the Open Catalog and write back to the Open Catalog using Databricks. I have explored the available documentation but haven’t bee...

Latest Reply
Louis_Frolio
Databricks Employee
  • 1 kudos

Greetings @Sunil_Patidar, Databricks and Snowflake can interoperate cleanly around Iceberg today, but how you do it matters. At a high level, interoperability works because both platforms meet at Apache Iceberg and the Iceberg REST Catalog API. Wh...

1 More Replies
ciaran
by > New Contributor
  • 44 Views
  • 1 reply
  • 0 kudos

Is GCP Workload Identity Federation supported for BigQuery connections in Azure Databricks?

I’m trying to set up a BigQuery connection in Azure Databricks (Unity Catalog / Lakehouse Federation) using GCP Workload Identity Federation (WIF) instead of a GCP service account key. Environment: Azure Databricks workspace; BigQuery query federation via...

Latest Reply
Hubert-Dudek
Databricks MVP
  • 0 kudos

I guess that is the only accepted option, as the docs say "Google service account key json".

pavelhym
by > New Contributor
  • 54 Views
  • 1 reply
  • 1 kudos

Usage of MLflow models inside a Streamlit app in Databricks

I have an issue with loading a registered MLflow model into a Streamlit app inside Databricks. This is the sample code used for model load:
import mlflow
from mlflow.tracking import MlflowClient
mlflow.set_tracking_uri("databricks")
mlflow.set_registry_uri...

Latest Reply
iyashk-DB
Databricks Employee
  • 1 kudos

Authentication context isn’t automatically available in Apps. Notebooks automatically inject workspace host and token for mlflow when you use mlflow.set_tracking_uri("databricks") and mlflow.set_registry_uri("databricks-uc"). In Databricks Apps, you ...
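The reply above is cut off, so its recommended fix is not shown here. As a minimal sketch for debugging this kind of problem, the helper below reports which Databricks credential environment variables are visible to the app process before any MLflow call is made. The function name `describe_databricks_auth` is hypothetical, and the assumption that credentials arrive through `DATABRICKS_HOST`, `DATABRICKS_TOKEN`, `DATABRICKS_CLIENT_ID`, and `DATABRICKS_CLIENT_SECRET` should be verified against your own app configuration.

```python
import os


def describe_databricks_auth(env=None):
    """Summarize which Databricks credential variables are set.

    `env` defaults to os.environ; passing a dict makes the helper easy to test.
    The variable names checked here are assumptions, not a complete list.
    """
    env = os.environ if env is None else env
    has_host = bool(env.get("DATABRICKS_HOST"))
    has_token = bool(env.get("DATABRICKS_TOKEN"))
    has_oauth = bool(env.get("DATABRICKS_CLIENT_ID")) and bool(
        env.get("DATABRICKS_CLIENT_SECRET")
    )
    if not has_host:
        return "missing host"
    if has_token:
        return "host + token"
    if has_oauth:
        return "host + oauth client"
    return "host only"


if __name__ == "__main__":
    # Print the summary so it shows up in the app logs at startup.
    print(describe_databricks_auth())
```

Logging this summary at app startup can help distinguish "no credentials were injected" from "credentials are present but the model URI or permissions are wrong".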

Suheb
by > Contributor
  • 30 Views
  • 1 reply
  • 1 kudos

How do I implement and train a custom PyTorch model on Databricks using distributed training?

How can I build my own PyTorch machine-learning model and train it faster on Databricks by using multiple machines/GPUs instead of just one?

Latest Reply
KaushalVachhani
Databricks Employee
  • 1 kudos

@Suheb, you may want to look at TorchDistributor. It provides multiple distributed training options, including single-node multi-GPU training and multi-node training. Below are the references for you. https://docs.databricks.com/aws/en/machine-...

JothyGanesan
by > New Contributor III
  • 81 Views
  • 2 replies
  • 4 kudos

Resolved! Vacuum on DLT

We are currently using DLT tables as our target tables. The tables are loaded by continuous job pipelines. Liquid clustering is enabled on the tables. Will VACUUM work on these tables while they are being loaded in continuous mode? How to run t...

Latest Reply
iyashk-DB
Databricks Employee
  • 4 kudos

VACUUM works fine on DLT tables running in continuous mode. DLT does automatic maintenance (OPTIMIZE + VACUUM) roughly every 24 hours if the pipeline has a maintenance cluster configured. Q: The liquid cluster is enabled in the tables. Will Vacuum wo...

1 More Replies
RodrigoE
by > New Contributor II
  • 58 Views
  • 2 replies
  • 0 kudos

Vector search index very slow

Hello, I have created a vector search index for a Delta table with 1,400 rows. Using this vector index to find matching records on a table with 52M records with the query below ran for 20 hours and failed with: 'HTTP request failed with status: {"error_c...

Machine Learning
vector search index
Latest Reply
iyashk-DB
Databricks Employee
  • 0 kudos

Hi @RodrigoE, your LATERAL subquery calls the Vector Search function once for every row of the 52M-row table, which results in tens of millions of remote calls to the Vector Search endpoint. This is an inefficient pattern and will be extremely slow, leadin...

1 More Replies
DylanStout
by > Contributor
  • 221 Views
  • 4 replies
  • 1 kudos

Catalog tag filter error

When trying to filter in the catalog on "Tag", it throws an error that it failed to load values. The other filters do load. I have tried it with different computes and I have a view that has a tag (as shown in the screenshot). I have the following privi...

[Screenshots: DylanStout_0-1764581602517.png, DylanStout_1-1764581693590.png, DylanStout_2-1764581879449.png]
Latest Reply
Advika
Community Manager
  • 1 kudos

Hello @DylanStout! Did the suggestions shared above help resolve your concern? If so, please consider marking the response as the accepted solution. If you found a different approach that worked, sharing it would be helpful for others in the community...

3 More Replies
D_Science
by > New Contributor
  • 68 Views
  • 1 reply
  • 1 kudos

Local LLMs available in Databricks for email classification

Hello everyone, I am currently working on an email classification model in Azure Databricks. Since I work for an international company, the emails contain PII data. Because of this, I need to be very careful about compliance and data privacy, especial...

Latest Reply
emma_s
Databricks Employee
  • 1 kudos

Hi, It is absolutely acceptable. Here are some details that you may want to consider. I'd also think about GPU availability in your cloud and region and whether there is GPU available for you to deploy these models to. You should be able to easily te...

TFV
by > New Contributor
  • 105 Views
  • 1 reply
  • 1 kudos

Regression: Dashboard slicer paste now commits invalid filter values instead of searching

Hi Team, we appear to be experiencing a recent regression in the AI/BI dashboard filter slicer behaviour. Steps to reproduce: (1) Open a dashboard containing a single-select or multi-select filter slicer. (2) Click into the slicer’s text input. (3) Paste text from the...

Latest Reply
emma_s
Databricks Employee
  • 1 kudos

Hi Tim, I can't find any mention of this internally, but I suspect it is related to this change: "Multi-select filter paste: Viewers can now copy a column of values from a spreadsheet and paste them into a multi-select filter." My recommendation w...

sher_1222
by > New Contributor
  • 114 Views
  • 3 replies
  • 0 kudos

Data ingestion errors

I was trying to ingest data from a website into Databricks, but it shows a "Public DBFS is not enabled" message. Is there any other way to automate data ingestion into Databricks?

Latest Reply
saurabh18cs
Honored Contributor II
  • 0 kudos

Hi @sher_1222, yes, you can upload to cloud storage and then connect using Unity Catalog (see "Connect to cloud object storage using Unity Catalog - Azure Databricks | Microsoft Learn"), and then use Auto Loader (see "What is Auto Loader? | Databricks on AWS") to automatically ing...

2 More Replies
dj4
by > New Contributor
  • 150 Views
  • 3 replies
  • 1 kudos

Azure Databricks UI consuming way too much memory & laggy

This especially happens when the notebook is large with many cells. Even if I clear all the outputs, scrolling the notebook is way too laggy. When I start running the code, the memory consumption is 3-4 GB minimum even if I am not displaying any data/ta...

Latest Reply
emma_s
Databricks Employee
  • 1 kudos

Hi, these are the recommended troubleshooting steps we have. Troubleshooting & Immediate Workarounds. Browser Recommendations: Use an incognito/private window to avoid interference from browser extensions/ad blockers. Monitor memory consumption; close...

2 More Replies
bek04
by > New Contributor
  • 122 Views
  • 3 replies
  • 0 kudos

Serverless notebook DNS failure (gai error / name resolution)

I’m using a Databricks workspace on AWS (region: us-west-2). My Serverless notebook (CPU) cannot access any external URL; every outbound request fails at DNS resolution.
Minimal test in a notebook:
import urllib.request
urllib.request.urlopen("https://...

Latest Reply
emma_s
Databricks Employee
  • 0 kudos

Hi, here are some troubleshooting steps: 1. Network Connectivity Configuration (NCC): Confirm that the correct NCC (such as ncc_public_internet) is attached specifically to Serverless compute, not just to SQL Warehouses or other resources. After making...

2 More Replies
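For the DNS thread above, it can help to separate name resolution from the HTTP request itself. This is a minimal sketch using only the Python standard library; the helper name `check_dns` is hypothetical and is not part of any Databricks API. It reports whether a hostname resolves and, if not, the resolver's error message.

```python
import socket


def check_dns(hostname, port=443):
    """Resolve `hostname` and return the addresses, or the resolver error."""
    try:
        infos = socket.getaddrinfo(hostname, port, proto=socket.IPPROTO_TCP)
    except socket.gaierror as exc:
        # This is the "gai error" class of failure: resolution itself failed.
        return {"host": hostname, "resolved": False, "error": str(exc), "addresses": []}
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    addresses = sorted({info[4][0] for info in infos})
    return {"host": hostname, "resolved": True, "error": None, "addresses": addresses}


if __name__ == "__main__":
    # Replace the host with whichever external endpoint the notebook needs.
    print(check_dns("pypi.org"))
```

If `resolved` is false for every external host, the problem is in name resolution or egress configuration rather than in the code making the request.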
confused_dev
by > New Contributor II
  • 43582 Views
  • 8 replies
  • 5 kudos

Python mocking dbutils in unittests

I am trying to write some unit tests using pytest, but I am coming across the problem of how to mock my dbutils method when dbutils isn't defined in my notebook. Is there a way to do this so that I can unit test individual functions that are uti...

Latest Reply
kenmyers-8451
Contributor
  • 5 kudos

If this helps anyone, here is how we do this: We rely on databricks_test for injecting dbutils into the notebooks that we're testing (a third-party package, mind you, which hasn't been updated in a while but still works). And in our notebooks we put...

7 More Replies
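The reply above relies on the third-party databricks_test package. A different approach, sketched here under stated assumptions, is to pass dbutils into functions as a parameter and substitute a `unittest.mock.MagicMock` in tests. The function `read_api_key` and the scope and key names are hypothetical and only illustrate the pattern.

```python
from unittest.mock import MagicMock


def read_api_key(dbutils, scope="my-scope", key="api-key"):
    """Function under test: receives dbutils explicitly instead of using the notebook global."""
    return dbutils.secrets.get(scope=scope, key=key)


def test_read_api_key():
    # MagicMock creates attributes on access, so dbutils.secrets.get exists automatically.
    fake_dbutils = MagicMock()
    fake_dbutils.secrets.get.return_value = "s3cr3t"

    assert read_api_key(fake_dbutils) == "s3cr3t"
    fake_dbutils.secrets.get.assert_called_once_with(scope="my-scope", key="api-key")
```

In a notebook, the real dbutils object would be passed as the first argument; under pytest, the mock stands in for it, so the test runs without a cluster.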