Machine Learning
Dive into the world of machine learning on the Databricks platform. Explore discussions on algorithms, model training, deployment, and more. Connect with ML enthusiasts and experts.

Forum Posts

jeremy98
by Honored Contributor
  • 0 Views
  • 0 replies
  • 0 kudos

How to speed up inference?

Hi guys, I'm new to this concept, but we have several ML models that follow the same structure in the code. What I don’t fully understand is how to handle different types of models efficiently — right now, I need to loop through my items to get the ...
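
A common way to speed this up is to score in batches rather than looping item by item. A minimal sketch under assumed names (`batch_score` and the `pred_` columns are illustrative; each model here is anything with a vectorized `predict`):

```python
import pandas as pd

def batch_score(models: dict, df: pd.DataFrame) -> pd.DataFrame:
    """Apply each model once to the whole frame instead of looping over rows."""
    out = df.copy()
    for name, model in models.items():
        # One vectorized predict() per model is far cheaper than calling
        # predict() inside a Python loop over individual items.
        out[f"pred_{name}"] = model.predict(df)
    return out
```

If the models are MLflow pyfunc models the same pattern applies: load each model once, then call `predict` on the full DataFrame.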

spearitchmeta
by Contributor
  • 22 Views
  • 1 reply
  • 0 kudos

How does Databricks AutoML handle null imputation for categorical features by default?

Hi everyone, I’m using Databricks AutoML (classification workflow) on Databricks Runtime 10.4 LTS ML+, and I’d like to clarify how missing (null) values are handled for categorical (string) columns by default. From the AutoML documentation, I see that:...

Latest Reply
Louis_Frolio
Databricks Employee
  • 0 kudos

Hello @spearitchmeta, I looked internally to see if I could help with this and I found some information that will shed light on your question. Here’s how missing (null) values in categorical (string) columns are handled in Databricks AutoML on Dat...
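
The full answer above is truncated, but a common default for string columns in this setting is to treat null as its own category rather than drop rows. A rough pandas approximation of that idea, useful for comparing against what AutoML actually produces (the function name and sentinel are illustrative, not AutoML internals):

```python
import pandas as pd

def impute_categoricals(df: pd.DataFrame, placeholder: str = "__missing__") -> pd.DataFrame:
    """Fill nulls in string columns with a sentinel category, so missing
    values become their own level instead of being dropped."""
    out = df.copy()
    for col in out.columns:
        if out[col].dtype == object:  # string/object columns only
            out[col] = out[col].fillna(placeholder)
    return out
```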

VivekWV
by Visitor
  • 31 Views
  • 1 reply
  • 0 kudos

Safe Update Strategy for Online Feature Store Without Endpoint Disruption

Hi Team, We are implementing Databricks Online Feature Store using Lakebase architecture and have run into some constraints during development. Requirements: Deploy an offline table as a synced online table and create a feature spec that queries from th...

Latest Reply
VivekWV
Visitor
  • 0 kudos

Additional context: The feature spec created from the synced table is served through an endpoint, and we need to keep the same endpoint URL for consumers. After schema changes, we currently recreate the synced table and feature spec with the same names...

AlbertWang
by Valued Contributor
  • 2599 Views
  • 1 reply
  • 1 kudos

Can I Replicate Azure Document Intelligence's Custom Table Extraction in Databricks?

I am using Azure Document Intelligence to get data from a table in a PDF file. The table's headers do not visually align with the values. Therefore, the standard and pre-built models cannot correctly read the data. I have built a custom-trained Azure ...

Latest Reply
dkushari
Databricks Employee
  • 1 kudos

Hi @AlbertWang, you can easily achieve this using Agent Bricks - Information Extraction. Your PDFs will be converted to text using the ai_parse_document function and saved in a Databricks table. You can then create the agent using that text table to ge...

MightyMasdo
by New Contributor III
  • 3108 Views
  • 3 replies
  • 7 kudos

Spark context not implemented Error when using Databricks connect

I am developing an application using Databricks Connect, and when I try to use VectorAssembler I get the error "sc is not None" (AssertionError). Is there a workaround for this?

Latest Reply
pibe1
New Contributor II
  • 7 kudos

Ran into exactly the same issue as @Łukasz1. After some googling, I found this SO post explaining the issue: later versions of Databricks Connect no longer support the SparkContext API. Our code is failing because the underlying library is trying to f...

tarunnagar
by New Contributor II
  • 189 Views
  • 1 reply
  • 1 kudos

Best Practices for Collaborative Notebook Development in Databricks

Hi everyone! I’m looking to learn more about effective strategies for collaborative development in Databricks notebooks. Since notebooks are often used by multiple data scientists, analysts, and engineers, managing collaboration efficiently is critic...

Latest Reply
AbhaySingh
New Contributor II
  • 1 kudos

For version control, use this approach: Git integration with Databricks Repos. Core features: Databricks Git Folders (Repos) provides native Git integration with a visual UI and REST API access. Supports all major providers: GitHub, GitLab, Azure DevOps, Bi...

gg5
by New Contributor II
  • 2173 Views
  • 4 replies
  • 2 kudos

Resolved! Unable to Access Delta View from Azure Machine Learning via Delta Sharing – Is View Access Supported?

Unable to Access Delta View from Azure Machine Learning via Delta Sharing – Is View Access Supported? I am able to access the tables, but while accessing the view I am getting the below error. Response from server: { 'details': [ { '@type': 'type.googleapis...

Latest Reply
ericwang52
New Contributor II
  • 2 kudos

View sharing is supported (now GA) in Databricks. See https://docs.databricks.com/aws/en/delta-sharing/create-share#add-views-to-a-share. You likely need a workspace ID override. Creating the recipient from a workspace with proper access and res...

juandados
by New Contributor
  • 215 Views
  • 1 reply
  • 0 kudos

GenAI experiment tracing does not render markdown images

When traces include base64-encoded images in Markdown, they do not render properly. This makes the analysis of traces that include images difficult. Just for context, the same trace in other tracing tools like LangSmith renders as expected. An example of...

Latest Reply
sarahbhord
Databricks Employee
  • 0 kudos

Thank you for the flag, @juandados! I will ping my product team to get a timeline for you.

ostae911
by New Contributor
  • 749 Views
  • 1 reply
  • 1 kudos

AutoML Forecast fails when using feature_store_lookups with timestamp key

We are running AutoML Forecast on Databricks Runtime 15.4 ML LTS and 16.4 ML LTS, using a time series dataset with temporal covariates from the Feature Store (e.g. a corona_dummy feature). We use feature_store_lookups with lookup_key and timestamp_lo...

Latest Reply
jamesl
Databricks Employee
  • 1 kudos

Hi @ostae911, are you still facing this issue? It looks like your usage of the timestamp column is correct. It can be used as a primary key on the time series feature table. Is it possible that there are other duplicate columns between the training ...
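
One quick way to act on this suggestion is to check for colliding column names before handing the training set and lookups to AutoML. A small sketch (the helper is hypothetical, not a Databricks API):

```python
def overlapping_columns(training_cols, feature_cols, join_keys):
    """Columns present in both the training set and a feature lookup,
    excluding the join keys, will collide when the lookup is joined in."""
    return sorted((set(training_cols) & set(feature_cols)) - set(join_keys))
```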

prashant_089
by New Contributor II
  • 1399 Views
  • 3 replies
  • 1 kudos

Resolved! Serving Endpoint Disappears After One Day

I'm encountering an issue where a serving endpoint I create disappears from the list of serving endpoints after a day. This has happened both when I created the endpoint from the Databricks UI and using the Databricks SDK.

Latest Reply
Louis_Frolio
Databricks Employee
  • 1 kudos

Hey @prashant_089, what you are experiencing should not happen on its own except in some extremely unusual circumstances. If you are using Databricks Free Edition, you should ignore everything below. Here are some troubleshooting suggestions/tips: ...

AmineM
by New Contributor II
  • 2187 Views
  • 3 replies
  • 0 kudos

Resolved! Problem loading a pyfunc model in job run

Hi, I'm currently working on an automated job to predict forecasts using a notebook that works just fine when I run it manually but keeps failing when scheduled. Here is my code: import mlflow # Load model as a PyFuncModel. loaded_model = mlflow.pyf...

Latest Reply
sarahbhord
Databricks Employee
  • 0 kudos

Hey AmineM! If your MLflow model loads fine in a Databricks notebook but fails in a scheduled job on serverless compute with an error like TypeError: code() argument 13 must be str, not int, the root cause is almost always a mismatch between the ...
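
One quick diagnostic for the mismatch described above is to compare the job's interpreter against the Python version recorded in the model's MLmodel/conda.yaml. A minimal sketch (the helper name is mine, not an MLflow API):

```python
import sys

def interpreter_matches(logged_python: str) -> bool:
    """True when the running interpreter's major.minor matches the Python
    version the model was logged with (e.g. "3.10.12")."""
    current = f"{sys.version_info.major}.{sys.version_info.minor}"
    return logged_python.startswith(current)
```

If the versions differ, re-logging the model from an environment matching the job cluster (or pinning the job to a matching runtime) is the usual fix.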

excavator-matt
by New Contributor III
  • 1018 Views
  • 4 replies
  • 2 kudos

Resolved! What is the most efficient way of running sentence-transformers on a Spark DataFrame column?

We're trying to run the bundled sentence-transformers library from SBERT in a notebook running Databricks ML 16.4 on an AWS g4dn.2xlarge [T4] instance. However, we're experiencing out-of-memory crashes and are wondering what the optimal way is to run sentenc...

Machine Learning
memory issues
sentence-transformers
vector embeddings
Latest Reply
jamesl
Databricks Employee
  • 2 kudos

If you didn't get this to work with Pandas API on Spark, you might also try importing and instantiating the SentenceTransformer model inside the pandas UDF for proper distributed execution. Each executor runs code independently, and when Spark execut...
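
A sketch of the per-batch logic this reply describes, written so the model loads lazily inside the function (names and the model string are illustrative):

```python
import pandas as pd

_model = None  # cached once per worker process

def embed_batch(texts: pd.Series, load_model) -> pd.Series:
    """Per-batch logic you would wrap in a Spark pandas UDF.

    `load_model` is a stand-in for something like
    `lambda: SentenceTransformer("all-MiniLM-L6-v2")`; instantiating it lazily
    inside the UDF gives each executor its own copy instead of shipping the
    model in the task closure.
    """
    global _model
    if _model is None:
        _model = load_model()
    # A bounded batch_size keeps peak GPU memory flat regardless of
    # partition size.
    vectors = _model.encode(texts.tolist(), batch_size=32)
    return pd.Series(list(vectors))
```

On Spark you would wrap this with `@pandas_udf(ArrayType(FloatType()))`, closing over the loader; repartitioning into more, smaller partitions also helps keep peak memory down.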

salesbrj
by New Contributor
  • 251 Views
  • 1 reply
  • 0 kudos

Inference Tables Empty

Hello, I have been using the Databricks Free Platform for a while. Everything seems to work well. However, I've been trying to generate the payload from the deployed endpoint, and I always get an empty inference table. When I check the configuration, I got ...

Latest Reply
szymon_dybczak
Esteemed Contributor III
  • 0 kudos

Hi @salesbrj, most probably this is related to a limitation of Free Edition. In the limitations section I can see the following entry: "No custom models on GPU or batch inference" (https://docs.databricks.com/aws/en/getting-started/free-edition-limitations).

spicysheep
by New Contributor II
  • 1381 Views
  • 3 replies
  • 1 kudos

Distributed SparkXGBRanker training: failed barrier ResultStage

I'm following a variation of the tutorial [here](https://assets.docs.databricks.com/_extras/notebooks/source/xgboost-pyspark-new.html) to train a `SparkXGBRanker` in distributed mode. However, the line pipeline_model = pipeline.fit(data) is throwing...

Latest Reply
NandiniN
Databricks Employee
  • 1 kudos

You already mentioned that you turned off autoscaling; please try setting num_workers too. Step 1: Disable dynamic resource allocation (spark.dynamicAllocation.enabled = false). Step 2: Configure num_workers to match the fixed resources. After disabling dy...
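
The two steps above look roughly like this on a fixed-size cluster (a configuration sketch; the values and column names are illustrative, and `num_workers` must not exceed the task slots actually available, since the XGBoost barrier stage needs all its tasks to start together):

```python
# Step 1: fix the cluster size so the barrier stage can acquire all slots.
spark.conf.set("spark.dynamicAllocation.enabled", "false")

# Step 2: match num_workers to the fixed resources, e.g. a 4-worker cluster.
from xgboost.spark import SparkXGBRanker

ranker = SparkXGBRanker(
    qid_col="query_id",      # illustrative column names
    label_col="label",
    features_col="features",
    num_workers=4,           # one distributed training task per worker
)
```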

Ritchie
by New Contributor II
  • 6602 Views
  • 7 replies
  • 3 kudos

Unable to Use VectorAssembler in PySpark 3.5.0 Due to Whitelisting

Hi, I am currently using PySpark version 3.5.0 on my Databricks cluster. Despite setting the required configuration using the command spark.conf.set("spark.databricks.ml.whitelist", "true"), I am still encountering an issue while trying to use the Ve...

Latest Reply
anderaraujo92
New Contributor II
  • 3 kudos

I also had this error trying to use ML on Free Edition. Are ML features supported on Free Edition?

