- 4776 Views
- 1 replies
- 0 kudos
Why is Spark MLlib not encouraged on the platform? / Why is ML dependent on .toPandas() on Databricks?
I'm new to Spark and Databricks, and I'm surprised that the Databricks ML tutorials favor pandas DataFrames over Spark DataFrames. In the tutorials I've seen, most data processing is done in a distributed manner, but then the result is just cast to a pandas DataFrame. From...
You are noticing a common pattern in Databricks ML tutorials: data is often processed with Spark for scalability, but training and modeling are frequently done on pandas DataFrames using single-node libraries like scikit-learn. This workflow can be c...
- 663 Views
- 1 replies
- 0 kudos
Resolved! No option for create compute in trial version
Hi, I don't see an option for "Create Compute". I have a trial version, and I am trying to build a machine learning model on Databricks for the first time. Please check the attached screenshot.
Hello @nitinjain26! Free trials only offer serverless/SQL compute clusters (due to resource and cost controls). Please check out this post for more details: [FREE TRIAL] Missing All-Purpose Clusters Access - New Account
- 6429 Views
- 1 replies
- 0 kudos
Feature tables & Null Values
Hi! I was wondering if any of you have ever dealt with feature tables and null values (more specifically, via Feature Engineering objects rather than Feature Store, although I don't think it really matters). In brief, null values are allowed to be stor...
When dealing with feature tables and null values—especially via Databricks Feature Engineering objects (but also more broadly in Spark or feature platforms)—there are some nuanced behaviors when schema inference is required. Here are clear answers to...
- 1989 Views
- 4 replies
- 1 kudos
What Are the Key Challenges in Developing ETL Pipelines Using Databricks?
I’m looking to understand the practical challenges that professionals face when building ETL (Extract, Transform, Load) pipelines on Databricks. Specifically, I’m curious about issues related to scalability, performance, data quality, integration wit...
Developing ETL pipelines in Databricks comes with challenges like managing diverse data sources, optimizing Spark performance, and controlling cloud costs. Ensuring data quality, handling errors, and maintaining security and compliance add complexity...
- 726 Views
- 2 replies
- 2 kudos
Resolved! Model Registration and hosting
I have trained and tested a model in Databricks, and now I want to register and host it, but I am unable to do so. Please find attached a snapshot of the code and the error.
Hi @intelliconnectq, the above code will fail with `AttributeError: 'NoneType' object has no attribute 'info'` on the line `model_uri = f"runs:/{mlflow.active_run().info.run_id}/xgboost-model"`. This happens because once the `with mlflow.start_run():` bloc...
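One possible way around this, as a minimal sketch rather than the poster's actual code: capture the run ID while the run is still active and build the URI afterwards. The artifact path `xgboost-model` comes from the line quoted above; the function names `build_model_uri` and `train_and_register` and the `registered_name` parameter are placeholders introduced here for illustration, and the training and logging step is left as a comment.

```python
def build_model_uri(run_id: str, artifact_path: str = "xgboost-model") -> str:
    """Build a runs:/ URI from a run ID that was captured while the run was active."""
    return f"runs:/{run_id}/{artifact_path}"


def train_and_register(registered_name: str) -> str:
    """Hypothetical flow: log the model inside the run, register it after the run ends."""
    import mlflow  # imported lazily so build_model_uri works without MLflow installed

    with mlflow.start_run() as run:
        # ... train and log the model here under the artifact path "xgboost-model" ...
        run_id = run.info.run_id  # capture the ID while the run is still active

    # Outside the with block mlflow.active_run() returns None,
    # so the saved run_id is used instead of calling it again.
    model_uri = build_model_uri(run_id)
    mlflow.register_model(model_uri, registered_name)
    return model_uri
```

Registration only succeeds if a model was actually logged under that artifact path during the run, so the commented step has to be filled in before calling `train_and_register`.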
- 5034 Views
- 1 replies
- 0 kudos
Model serving with custom pip index URL
An MLflow model was logged with a custom pip requirements file that contains package versions (mlflow==2.11.3) as well as a custom --index-url. However, during the "Initializing model environment" step, Model Serving tries to pip install mlflow==2.2.2...
Hi @ScyLukb , This is a common and frustrating problem that occurs when the Model Serving environment's built-in dependencies conflict with your model's specific requirements. The root cause is that the Model Serving environment tries to install its ...
- 4268 Views
- 1 replies
- 2 kudos
Bug: MLflow recipe
I'm not sure whether this is the right place, but we've encountered a bug in datasets.py (https://github.com/mlflow/mlflow/blob/master/mlflow/recipes/steps/ingest/datasets.py). Anyone using recipes, beware of the aforementioned. def _convert_spark_df_to...
Hi @Mario_D , Thanks for bringing this to our attention, I will pass this information along to the appropriate team!
- 5147 Views
- 1 replies
- 2 kudos
Rolling predictions with FeatureEngineeringClient
I am performing a time series analysis, using an XGBoostRegressor with rolling predictions. I am doing so using the FeatureEngineeringClient (in combination with Unity Catalog), where I create and load in my features during training and inference, as ...
You’re running into a fundamental limitation: score_batch does point‑in‑time feature lookups and batch scoring, but it doesn’t support recursive multi‑step forecasting where predictions update features for subsequent timesteps. Feature Store looks up...
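To make the distinction concrete, here is a minimal, library-free sketch of recursive multi-step forecasting, where each prediction is fed back in as a lag feature for the next step. It does not use the FeatureEngineeringClient at all; `recursive_forecast`, `mean_model`, and their parameters are hypothetical names, and `mean_model` is only a toy stand-in for a trained regressor.

```python
from typing import Callable, List, Sequence


def recursive_forecast(
    predict_one: Callable[[Sequence[float]], float],
    history: Sequence[float],
    n_lags: int,
    horizon: int,
) -> List[float]:
    """Forecast `horizon` steps, feeding each prediction back in as a lag feature."""
    if n_lags < 1 or len(history) < n_lags:
        raise ValueError("need n_lags >= 1 and at least n_lags historical values")
    window = list(history[-n_lags:])  # most recent observations, oldest first
    forecasts: List[float] = []
    for _ in range(horizon):
        y_hat = predict_one(window)    # one-step-ahead prediction from the current lags
        forecasts.append(y_hat)
        window = window[1:] + [y_hat]  # slide the window: drop oldest, append prediction
    return forecasts


def mean_model(lags: Sequence[float]) -> float:
    """Toy stand-in for a trained model: predicts the mean of the lag window."""
    return sum(lags) / len(lags)
```

For example, `recursive_forecast(mean_model, [1.0, 2.0, 3.0], n_lags=2, horizon=2)` predicts from the window `[2.0, 3.0]` first and then from a window that already contains that first prediction, which is the step a single point-in-time lookup cannot reproduce.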
- 4690 Views
- 1 replies
- 0 kudos
TorchDistributor: installation of custom python package via wheel across all nodes in cluster
I am trying to set up a training pipeline of a distributed PyTorch model using TorchDistributor. I have defined a train_object (in my case it is a Callable) that runs my training code. However, this method requires custom code from modules that I hav...
Hi @tooooods, this is a classic challenge in distributed computing, and your observation is spot on. The ModuleNotFoundError on the workers, despite the UI and API showing the library as "Installed," is the key symptom. This happens because TorchDis...
- 8143 Views
- 5 replies
- 2 kudos
Problem serving a langchain model on Databricks
Hi, I've encountered a problem serving a LangChain model I just created successfully on Databricks. I was using the following code to set up a model in Unity Catalog:
from mlflow.models import infer_signature
import mlflow
import langchain
mlflow.set_r...
Greetings @hawa , Thanks for sharing the details—this looks like a combination of registration and configuration issues that commonly surface with the MLflow LangChain flavor on Databricks. What’s going wrong The registered model name should be a fu...
- 4480 Views
- 1 replies
- 0 kudos
How to load a Synapse/Maven package in a Databricks Model Serving Endpoint
Hi! This is very similar to this 2021 post: https://community.databricks.com/t5/data-engineering/how-to-include-a-third-party-maven-package-in-mlflow-model/td-p/17060 I'm attempting to serve a SynapseML model (Maven dependencies) using Databricks Model Serv...
You are encountering issues serving a SynapseML model (with Maven dependencies) via Databricks Model Serving Endpoints, and the deployment works fine on general-purpose clusters but fails for the serving endpoint. This is a well-known issue with Data...
- 4643 Views
- 1 replies
- 1 kudos
Proper mlflow run logging with SparkTrials and Hyperopt
Hello! I'm attempting to run a hyperparameter search using Hyperopt and SparkTrials(), and log the resulting runs to an existing experiment (experiment A). I can see on this page that Databricks suggests wrapping the `fmin()` call within a `mlflow.sta...
Both the parent and child runs of a Hyperopt sweep in Databricks are, by default, influenced by the experiment associated with the notebook context rather than the explicit experiment passed to mlflow.start_run(). As you noticed, child runs remain in...
- 4562 Views
- 1 replies
- 0 kudos
Vector Index Creation for external embedding model takes a lot of time
I have an embedding model endpoint created and served. It is a Hugging Face model that Databricks doesn't provide. I am using this model to create a vector search index; however, the index takes a long time to create. I observed that when I use Databricks of...
The main reason your Hugging Face embedding model endpoint is taking much longer than Databricks’ own large_bge_en model to build a vector search index is likely due to differences in operational architecture and performance optimizations between ext...
- 5692 Views
- 1 replies
- 0 kudos
Pickle/joblib.dump a pre-processing function defined in a notebook
I've built a custom MLflow model class which I know functions. As part of a given run, the model class uses `joblib.dump` to store necessary parameters on the Databricks DBFS before logging them as artifacts in the MLflow run. This works fine when usi...
The error you’re seeing—SPARK-5063 CONTEXT_ONLY_VALID_ON_DRIVER—arises when trying to serialize or use objects (such as functions) defined in Databricks notebooks from workers rather than the driver. This issue is especially common with Python functi...
- 643 Views
- 1 replies
- 1 kudos
Resolved! Options for sporadic (and cost-efficient) Model Serving on Databricks?
Hi all, I'm new to Databricks, so I would appreciate some advice. I have an ML model deployed using Databricks Model Serving. My use case is very sporadic: I only need to make 5–15 prediction requests per day (industrial application), and there can be long...
Hi @cbossi , You are right! A 30-minute idle period precedes the endpoint's scaling down. You are billed for the compute resources used during this period, plus the actual serving time when requests are made. This is the current expected behaviour. Y...
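As a rough way to reason about what this means for a sporadic workload, the sketch below estimates how many minutes per day an endpoint stays up when every request restarts a 30-minute idle window. It is an approximation under stated assumptions: the function name `billed_minutes`, its parameters, and the request times are hypothetical, and cold-start time, billing granularity, and actual rates are not modeled.

```python
from typing import Iterable


def billed_minutes(
    request_minutes: Iterable[float],
    idle_timeout: float = 30.0,
    serve_minutes: float = 0.0,
) -> float:
    """Estimate endpoint uptime in minutes for request times given in minutes.

    Each request keeps the endpoint up for serve_minutes + idle_timeout;
    overlapping uptime windows are merged so they are not counted twice.
    """
    total = 0.0
    window_start = window_end = None
    for t in sorted(request_minutes):
        end = t + serve_minutes + idle_timeout
        if window_end is None or t > window_end:
            # Endpoint had already scaled down: close the previous window, open a new one.
            if window_end is not None:
                total += window_end - window_start
            window_start, window_end = t, end
        else:
            # Request arrived while the endpoint was still up: extend the window.
            window_end = max(window_end, end)
    if window_end is not None:
        total += window_end - window_start
    return total
```

With requests at minutes 0, 10, and 120, the first two share one window (0 to 40) and the third opens another (120 to 150), so the estimate is 70 minutes of uptime for three requests, which is why batching sporadic requests close together can reduce the idle time that is billed.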