- 29 Views
- 1 replies
- 0 kudos
How to store & update a FAISS Index in Databricks
I’m currently using FAISS in a Databricks notebook to perform semantic search on text data. My current workflow looks like this: encode ~10k text entries using an embedding model, build a FAISS index in memory, run similarity searches using index.search...
Hello @ashfire, here’s a practical path to scale your FAISS workflow on Databricks, along with patterns to persist indexes, incrementally add embeddings, and keep metadata aligned. Best practice to persist/load FAISS indexes on Databricks: use faiss...
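The persist/load pattern the answer starts to describe is usually "serialize to fast local disk first, then copy the finished file to durable storage in one step". Here is a minimal sketch of that pattern; `persist_index` is a hypothetical helper name, and for FAISS the `serialize_fn` would be wired as `lambda p: faiss.write_index(index, p)` (faiss itself is deliberately not imported here, so the sketch stays self-contained):

```python
import os
import shutil
import tempfile

def persist_index(serialize_fn, dest_path):
    """Serialize an in-memory index to local scratch disk, then copy
    the finished file to durable storage in a single step.

    serialize_fn: callable taking a local file path; for FAISS this
                  would be lambda p: faiss.write_index(index, p).
    dest_path:    durable location, e.g. a Unity Catalog Volume path
                  like /Volumes/main/default/indexes/texts.index
                  (hypothetical path; adjust to your workspace).
    """
    with tempfile.TemporaryDirectory() as tmp:
        local_path = os.path.join(tmp, os.path.basename(dest_path))
        serialize_fn(local_path)            # write to fast local disk first
        shutil.copy(local_path, dest_path)  # then one copy to durable storage
    return dest_path
```

Loading is the mirror image: copy the file from the Volume to local disk, then `faiss.read_index(local_path)`. Writing locally first avoids streaming a large binary file directly against remote storage.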
- 3506 Views
- 2 replies
- 0 kudos
DAB - Add/remove task depending on workspace.
I use DAB for deploying jobs, and I want to add a specific task in dev only, but not in staging or prod. Is there any way to achieve this using DAB?
You can define target-specific resources in DAB as shown here; this is valid for jobs and/or tasks. For instance, in my case: I think the best option (but not available as far as I know) would be to be able to define "include" sections by target, inste...
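The target-override approach can be sketched roughly like this in `databricks.yml`. All job and task names below are hypothetical, and the exact merge semantics (tasks merging by `task_key` under a target override) can vary between bundle CLI versions, so verify the resolved config with `databricks bundle validate -t dev`:

```yaml
# databricks.yml (sketch; job/task names are hypothetical)
resources:
  jobs:
    my_job:
      tasks:
        - task_key: main_task
          notebook_task:
            notebook_path: ./main.py

targets:
  dev:
    resources:
      jobs:
        my_job:
          tasks:
            - task_key: dev_only_task   # present only when deploying the dev target
              notebook_task:
                notebook_path: ./debug_checks.py
```

Deploying with `-t dev` should yield a job with both tasks, while staging/prod targets deploy only `main_task`.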
- 1024 Views
- 2 replies
- 0 kudos
Distributed Training quits if any worker node fails
Hi, I'm training a PyTorch model in a distributed environment using PyTorch's DistributedDataParallel (DDP) library. I have spun up 10 worker nodes. The issue I'm facing is that during training, if any worker node fails and exits, the ent...
Distributed training with PyTorch’s DistributedDataParallel (DDP) is not inherently fault-tolerant—if any node fails, the whole job crashes, and, as you noted, checkpointing cannot auto-recover the process without hypervisor or application-level orch...
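Since DDP itself won't survive a node loss, recovery has to come from orchestration (for example, elastic launches via `torchrun --max-restarts`) plus application-level checkpointing so a restarted job resumes rather than starting over. Here is a minimal sketch of the resume logic using plain-Python stand-ins (`json` file in place of `torch.save`/`torch.load`, a dict in place of model/optimizer state); the real version would save from rank 0 only, to shared storage:

```python
import json
import os

def save_checkpoint(epoch, state, path):
    # Real DDP code: rank 0 calls torch.save({"epoch": ..., "model": ...}, path)
    with open(path, "w") as f:
        json.dump({"epoch": epoch, "state": state}, f)

def load_checkpoint(path):
    # Returns (next_epoch_to_run, state); fresh start if no checkpoint exists
    if not os.path.exists(path):
        return 0, {}
    with open(path) as f:
        ckpt = json.load(f)
    return ckpt["epoch"] + 1, ckpt["state"]

def train(total_epochs, path):
    start, state = load_checkpoint(path)      # resume point after a restart
    for epoch in range(start, total_epochs):
        state["last_epoch"] = epoch           # stand-in for a training step
        save_checkpoint(epoch, state, path)   # checkpoint every epoch
    return state
```

When the launcher restarts the job after a worker failure, `load_checkpoint` makes the loop continue from the last completed epoch instead of epoch 0.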
- 3877 Views
- 1 replies
- 0 kudos
FeatureEngineeringClient and Unity Catalog
When testing this code: fe.score_batch(df=dataset.drop("Target").limit(10), model_uri=f"models:/{model_name}/{mv.version}").select("prediction").limit(10).display(), I get the error: “MlflowException: The...
Your issues are tied to authentication and network/configuration differences between Unity Catalog and Workspace models in Databricks, specifically when using the FeatureEngineeringClient. Key Issues FeatureEngineeringClient + Unity Catalog: You get...
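A common fix direction here is to point MLflow at the Unity Catalog registry with `mlflow.set_registry_uri("databricks-uc")` and use three-level model names in the URI. A tiny helper for building those URIs (the helper name itself is hypothetical; only the URI format matters):

```python
def uc_model_uri(catalog: str, schema: str, model_name: str, version: int) -> str:
    """Build a Unity Catalog model URI.

    Workspace registry uses two parts:  models:/my_model/3
    Unity Catalog needs three parts:    models:/main.ml.my_model/3

    Before loading or scoring, point the MLflow client at UC:
        mlflow.set_registry_uri("databricks-uc")
    """
    return f"models:/{catalog}.{schema}.{model_name}/{version}"
```

If `model_name` in the original snippet is only a short name while the model is registered in UC, the two-part URI is exactly the kind of mismatch that surfaces as an MlflowException.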
- 3696 Views
- 1 replies
- 0 kudos
Why is Spark MLlib not encouraged on the platform? / Why is ML dependent on .toPandas() on Databricks?
I'm new to Spark and Databricks, and am surprised at how the Databricks tutorials for ML use pandas DF > Spark DF. Of the tutorials I've seen, most data processing is done in a distributed manner, but then it's just cast to a pandas dataframe. From...
You are noticing a common pattern in Databricks ML tutorials: data is often processed with Spark for scalability, but training and modeling are frequently done on pandas DataFrames using single-node libraries like scikit-learn. This workflow can be c...
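The `.toPandas()` step is reasonable as long as the collected data actually fits on the driver, which is the usual justification in these tutorials: heavy preprocessing is distributed, but the final training table is model-sized. A hedged heuristic guard before collecting (all names and the 50% headroom factor are assumptions, not a Databricks API):

```python
def fits_on_driver(row_count: int, est_bytes_per_row: int,
                   driver_mem_bytes: int, headroom: float = 0.5) -> bool:
    """Heuristic check before calling df.toPandas(): only collect when
    the estimated in-memory size fits within a fraction (headroom) of
    driver memory.  These are rough estimates, not guarantees --
    pandas and JVM overheads can easily double the footprint."""
    return row_count * est_bytes_per_row <= driver_mem_bytes * headroom
```

When this returns False, that is the signal to stay distributed: use Spark MLlib, `pandas_udf`/`applyInPandas` for per-group training, or a distributed trainer, rather than collecting.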
- 22 Views
- 1 replies
- 0 kudos
Resolved! No option for create compute in trial version
Hi, I don't see an option for "Create Compute". I have a trial version. I am trying to build a machine learning model on Databricks for the first time. Please check the attached screenshot.
Hello @nitinjain26! Free trials only offer serverless/SQL compute clusters (due to resource and cost controls). Please check out this post for more details: [FREE TRIAL] Missing All-Purpose Clusters Access - New Account
- 4311 Views
- 1 replies
- 0 kudos
Feature tables & Null Values
Hi! I was wondering if any of you have ever dealt with feature tables and null values (more specifically, via feature engineering objects rather than the feature store, although I don't think it really matters). In brief, null values are allowed to be stor...
When dealing with feature tables and null values—especially via Databricks Feature Engineering objects (but also more broadly in Spark or feature platforms)—there are some nuanced behaviors when schema inference is required. Here are clear answers to...
- 1291 Views
- 2 replies
- 2 kudos
Model Serving - Shadow Deployment - Azure
Hey, I'm designing an architecture around Model Serving endpoints, and one of the needs we're aiming to address is shadow deployment. Currently, it seems that the traffic configurations available in model serving do not allow this type...
@ryojikn and @irtizak , you’re right. Databricks Model Serving allows splitting traffic between model versions, but it doesn’t have a true shadow deployment where live production traffic is mirrored to a new model for monitoring without affecting use...
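Until native shadow deployment exists, one common workaround is to mirror at the caller (or at an API gateway in front of the endpoints): return the production response, send a copy of the request to the challenger, and never let challenger failures reach the user. A minimal sketch, with callables standing in for HTTP calls to two Model Serving endpoints (the function and parameter names are hypothetical):

```python
def serve_with_shadow(request, primary, shadow, log=print):
    """Answer from the primary model; mirror the request to a shadow
    model whose result is only recorded for offline comparison.
    primary/shadow stand in for requests to two serving endpoints."""
    response = primary(request)
    try:
        shadow_response = shadow(request)
        log({"request": request, "live": response, "shadow": shadow_response})
    except Exception:
        pass  # a broken challenger must never affect user traffic
    return response
```

In production you would make the shadow call asynchronous (queue or background thread) so it adds no latency, and write the logged pairs to a Delta table for drift/quality comparison.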
- 98 Views
- 4 replies
- 1 kudos
What Are the Key Challenges in Developing ETL Pipelines Using Databricks?
I’m looking to understand the practical challenges that professionals face when building ETL (Extract, Transform, Load) pipelines on Databricks. Specifically, I’m curious about issues related to scalability, performance, data quality, integration wit...
Developing ETL pipelines in Databricks comes with challenges like managing diverse data sources, optimizing Spark performance, and controlling cloud costs. Ensuring data quality, handling errors, and maintaining security and compliance add complexity...
- 79 Views
- 3 replies
- 3 kudos
course material access
Hi, where do I find the notebooks used in the training? I am doing the Machine Learning Practitioner Learning Plan. Regards, Nitin
Then the instructor should specify that in the video. This (https://partner-academy.databricks.com/learn/learning-plans/11/machine-learning-practitioner-learning-plan/courses/2343/data-preparation-for-machine-learning/lessons/17941/demo-load-and-ex...
- 154 Views
- 2 replies
- 2 kudos
Resolved! Model Registration and hosting
I have trained & tested a model in Databricks; now I want to register and host it, but I am unable to do so. Please find attached a snapshot of the code & error.
Hi @intelliconnectq, the above code will fail with AttributeError: 'NoneType' object has no attribute 'info' on the line model_uri = f"runs:/{mlflow.active_run().info.run_id}/xgboost-model". This happens because once the with mlflow.start_run(): bloc...
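The failure mode and the fix can be shown with a minimal stand-in for MLflow's run context (the classes below are illustrative only; the real objects live in the mlflow package). Once the `with` block exits, `active_run()` returns None, so the run id must be captured while the run is still open:

```python
from contextlib import contextmanager

# Minimal stand-in for MLflow's run-tracking context (illustrative only).
_current = {"run": None}

class _Info:
    def __init__(self, run_id):
        self.run_id = run_id

class _Run:
    def __init__(self, run_id):
        self.info = _Info(run_id)

@contextmanager
def start_run(run_id="abc123"):
    _current["run"] = _Run(run_id)
    try:
        yield _current["run"]
    finally:
        _current["run"] = None  # after the block, there is no active run

def active_run():
    return _current["run"]

# Buggy: calling active_run() outside the block yields None -> AttributeError.
# Fixed: capture the run id while the run is still open.
with start_run() as run:
    run_id = run.info.run_id
model_uri = f"runs:/{run_id}/xgboost-model"
```

In real code the same shape applies: `with mlflow.start_run() as run: run_id = run.info.run_id`, then build the `runs:/` URI from the saved `run_id` afterwards.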
- 3947 Views
- 1 replies
- 0 kudos
Model serving with custom pip index URL
An MLflow model was logged with a custom pip requirements file which contains package versions (mlflow==2.11.3), as well as a custom --index-url. However, during the "Initializing model environment" step, model serving tries to pip install mlflow==2.2.2...
Hi @ScyLukb , This is a common and frustrating problem that occurs when the Model Serving environment's built-in dependencies conflict with your model's specific requirements. The root cause is that the Model Serving environment tries to install its ...
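One commonly suggested direction is to make the custom index part of the requirements logged with the model, so the serving environment resolves packages from it. A hedged sketch of such a requirements file (the index URL is a placeholder, and whether serving honors `--index-url` lines can depend on your workspace's network configuration):

```
# requirements.txt logged with the model (index URL is a placeholder)
--index-url https://pypi.example.internal/simple
--extra-index-url https://pypi.org/simple
mlflow==2.11.3
```

This file can be attached at logging time via the `pip_requirements` argument of `mlflow.<flavor>.log_model` (it accepts a list of requirement strings or a path to a requirements file). Pinning mlflow explicitly is what keeps the serving container from falling back to its own default version.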
- 3526 Views
- 1 replies
- 2 kudos
Bug: MLflow recipe
I'm not sure whether this is the right place, but we've encountered a bug in datasets.py (https://github.com/mlflow/mlflow/blob/master/mlflow/recipes/steps/ingest/datasets.py). Anyone using recipes, beware of the aforementioned def _convert_spark_df_to...
Hi @Mario_D , Thanks for bringing this to our attention, I will pass this information along to the appropriate team!
- 3938 Views
- 1 replies
- 2 kudos
Rolling predictions with FeatureEngineeringClient
I am performing a time series analysis using an XGBoostRegressor with rolling predictions. I am doing so using the FeatureEngineeringClient (in combination with Unity Catalog), where I create and load my features during training and inference, as ...
You’re running into a fundamental limitation: score_batch does point‑in‑time feature lookups and batch scoring, but it doesn’t support recursive multi‑step forecasting where predictions update features for subsequent timesteps. Feature Store looks up...
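The recursive pattern that score_batch lacks can be sketched generically: each step's prediction is appended to the history so it becomes a lag feature for the next step. Here `model` is any callable over the lag window, standing in for the trained regressor plus whatever feature lookups you do per step (names are illustrative):

```python
def recursive_forecast(model, history, horizon, n_lags):
    """Multi-step forecast by feeding each prediction back in as a lag
    feature for the next step -- the loop score_batch cannot do for you.
    model: callable mapping the last n_lags values to the next value;
           in the real pipeline you would also recompute any derived
           features (rolling means, calendar features) at each step."""
    series = list(history)
    preds = []
    for _ in range(horizon):
        lags = series[-n_lags:]
        y_hat = model(lags)
        preds.append(y_hat)
        series.append(y_hat)  # prediction becomes the next step's lag feature
    return preds
```

In practice this means driving the loop yourself in a notebook or job, calling the model (or even score_batch on a one-row frame) once per timestep, rather than expecting a single batch call to roll forward.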
- 3666 Views
- 1 replies
- 0 kudos
TorchDistributor: installation of custom python package via wheel across all nodes in cluster
I am trying to set up a training pipeline for a distributed PyTorch model using TorchDistributor. I have defined a train_object (in my case, a Callable) that runs my training code. However, this method requires custom code from modules that I hav...
Hi @tooooods, this is a classic challenge in distributed computing, and your observation is spot on. The ModuleNotFoundError on the workers, despite the UI and API showing the library as "Installed," is the key symptom. This happens because TorchDis...
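A defensive pattern is to guard the import at the top of the train function itself, since the worker processes spawned by the distributor don't always inherit libraries shown as installed in the cluster UI. A sketch (function name and wheel path are hypothetical; the wheel would typically live on a UC Volume or DBFS path reachable from every node):

```python
import importlib
import subprocess
import sys

def ensure_importable(module_name: str, wheel_path: str) -> None:
    """Call at the top of the TorchDistributor train function on each
    worker: if the custom module cannot be imported in this process,
    install its wheel into the current interpreter, then re-import.
    wheel_path is hypothetical, e.g. a wheel copied to a UC Volume."""
    try:
        importlib.import_module(module_name)
    except ImportError:
        subprocess.check_call(
            [sys.executable, "-m", "pip", "install", "--no-deps", wheel_path]
        )
        importlib.invalidate_caches()
        importlib.import_module(module_name)  # fail loudly if still missing
```

This keeps the fix inside the code path that actually runs on the workers, instead of depending on cluster-scoped installs reaching every spawned process.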