- 4182 Views
- 1 replies
- 0 kudos
Hyperopt (15.4 LTS ML) ignores autologger settings
I use MLflow Experiments to store models once they leave very early tests and development. I recently switched to 15.4 LTS ML and was hit by unexpected Hyperopt behavior: it was creating experiment logs while ignoring i) that the autologger is off at the workspace level...
- 0 kudos
Hey @art1, sorry this post got lost in the shuffle. Thanks for flagging this: what you're seeing is expected given how Databricks integrates Hyperopt with MLflow, and there are clear ways t...
- 4646 Views
- 1 replies
- 0 kudos
Working with pyspark dataframe with machine learning libraries / statistical model libraries
Hi Team, I am working with a huge volume of data (50 GB) and I decompose the time series data using statsmodels. The major challenge I am facing is the compatibility of the PySpark DataFrame with the machine learning algorithms. Altho...
- 0 kudos
Greetings @javeed , You’re right to call out the friction between a PySpark DataFrame and many Python ML libraries like statsmodels; most Python ML stacks expect pandas, while Spark is distributed-first. Here’s how to bridge that gap efficiently fo...
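The bridge described above is usually Spark's `applyInPandas`: each group arrives as an in-memory pandas DataFrame, so only one series (not the full 50 GB) must fit on an executor. A pandas-only sketch of the grouped function, with a rolling-mean trend standing in for statsmodels' `seasonal_decompose`; the column names (`series_id`, `ts`, `y`) are hypothetical:

```python
# Pandas-only sketch of the function you would hand to applyInPandas.
# A rolling-mean trend stands in for statsmodels' seasonal_decompose here;
# column names (series_id, ts, y) are assumed for illustration.
import pandas as pd

def decompose_group(pdf: pd.DataFrame) -> pd.DataFrame:
    pdf = pdf.sort_values("ts").copy()
    pdf["trend"] = pdf["y"].rolling(window=3, center=True, min_periods=1).mean()
    pdf["resid"] = pdf["y"] - pdf["trend"]
    return pdf

# Local check on a single group
sample = pd.DataFrame({
    "series_id": ["a"] * 5,
    "ts": range(5),
    "y": [1.0, 2.0, 3.0, 4.0, 5.0],
})
result = decompose_group(sample)
```

On Spark, the same function would run per key with something like `df.groupBy("series_id").applyInPandas(decompose_group, schema="series_id string, ts long, y double, trend double, resid double")`, which distributes the per-series work across the cluster.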
- 4388 Views
- 1 replies
- 1 kudos
databricks-vectorsearch 0.53 unable to use similarity_search()
I have an issue with the databricks-vectorsearch package. Version 0.51 suddenly stopped working this week because: it now expected me to provide azure_tenant_id in addition to the service principal's client ID and secret. After supplying the tenant ID, it showed s...
- 1 kudos
Hi @snaveedgm, this is interesting. Can you double-check that the service principal has CAN QUERY on the embedding endpoint used for ingestion and/or querying (databricks-bge-large-en in your case)? Even though your direct REST test works, double-c...
- 4399 Views
- 1 replies
- 1 kudos
Resolved! ML Solution for unstructured data containing Images and videos
Hi, I have a use case of developing an entire ML solution within Databricks, from ingestion to inference and monitoring, but the problem is that we have unstructured data containing images and video for training the model using frameworks such...
- 1 kudos
Hi @aswinkks, this is a very broad question, but generally, when dealing with video data, you convert the videos to images and have one system in place for training and another for inference. This Databricks blog post explains how to set up a video ...
- 4365 Views
- 1 replies
- 1 kudos
Resolved! notebook stuck at "filtering data" or waiting to run
Hi, my data is in a sparse vector representation and it was working fine (display and training ML models). I added a few features that converted the data from a sparse to a dense representation, and after that anything I try to perform on the data gets stuck (display or ml...
- 1 kudos
Greetings @harry_dfe , Thanks for the details: this almost certainly stems from your data flipping from a sparse vector representation to a dense one, which explodes per-row memory and stalls actions like display, writes, and ML training. Why t...
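The memory blowup the reply describes can be illustrated with the standard library alone: a sparse row stores only its non-zero entries, while a dense row materializes every slot. The 10,000-feature width and three non-zeros below are assumed, illustrative numbers.

```python
# Stdlib-only illustration of the sparse-to-dense memory blowup.
# 10,000 features with 3 non-zeros per row is an assumed, illustrative shape.
import sys

n_features = 10_000
nonzero = {17: 1.0, 512: 3.5, 9_001: 2.0}  # sparse row: index -> value

# Dense representation materializes every slot, zeros included
dense_row = [0.0] * n_features
for idx, val in nonzero.items():
    dense_row[idx] = val

sparse_bytes = sys.getsizeof(nonzero) + sum(
    sys.getsizeof(k) + sys.getsizeof(v) for k, v in nonzero.items()
)
dense_bytes = sys.getsizeof(dense_row) + sum(sys.getsizeof(v) for v in dense_row)

ratio = dense_bytes / sparse_bytes  # typically hundreds of times larger
```

Multiply that per-row ratio across millions of rows and it is easy to see why `display` and training stall once the representation densifies.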
- 4337 Views
- 1 replies
- 0 kudos
How to transpose spark dataframe using R API?
Hello, I recently discovered the sparklyr package and found it quite useful. After setting up the Spark connection, I can apply dplyr functions to manipulate large tables. However, it seems that any functions outside of dplyr cannot be used on Spark v...
- 0 kudos
Greetings @Paddy_chu , You’re right that sparklyr gives you most dplyr verbs on Spark, but many tidyr verbs (including pivot_wider/pivot_longer) aren’t translated to Spark SQL and thus won’t run lazily on Spark tables. The practical options are to...
- 5002 Views
- 1 replies
- 2 kudos
Resolved! Experiences with CatBoost Spark Integration in Production on Databricks?
Hi Community, I am currently evaluating various gradient boosting options on Databricks using production-level data, including the CatBoost Spark integration (ai.catboost:catboost-spark). I would love to hear from others who have successfully used this...
- 2 kudos
Hi @moh3th1 , I can't personally speak to using CatBoost, but I can discuss preferred libraries and recommendations per approach with various gradient-boosting libraries within Databricks. Preferred for robust distributed GBM on Databricks: XGBoost ...
- 4581 Views
- 1 replies
- 1 kudos
Resolved! MLflow Nested run with applyInPandas does not execute
I am trying to train a forecasting model with hyperparameter tuning via Hyperopt. I have multiple time series, one per "KEY", and I want to train a separate model for each. To do this I am using Spark's applyInPandas to tune and train a model for ea...
- 1 kudos
Hi @shubham_lekhwar , This is a common context-passing issue when using Spark with MLflow. The problem is that the nested=True flag in mlflow.start_run relies on an active run being present in the current process context. Your Parent_RUN is active on...
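Since the driver's active-run context does not survive into Spark workers, one common pattern is to pass the parent's run ID into the grouped function and link child runs explicitly via the `mlflow.parentRunId` tag (which is what the MLflow UI uses for nesting), instead of relying on `nested=True`. A small sketch of the tag-building part; the driver/worker outline in the comments is a hypothetical shape, not verbatim from the thread:

```python
# Sketch: link worker-side runs to a driver-side parent without nested=True,
# which requires an active run in the *same* process.
# The "mlflow.parentRunId" tag is what the MLflow UI uses for run nesting.

def child_run_tags(parent_run_id: str) -> dict:
    """Tags that make a run appear nested under parent_run_id in the UI."""
    return {"mlflow.parentRunId": parent_run_id}

# Hypothetical outline on the driver:
#   with mlflow.start_run() as parent:
#       parent_id = parent.info.run_id
#       df.groupBy("KEY").applyInPandas(make_train_fn(parent_id), schema=...)
#
# Inside the worker function, instead of mlflow.start_run(nested=True):
#   with mlflow.start_run(tags=child_run_tags(parent_id)):
#       ...train and log the per-KEY model...

tags = child_run_tags("abc123")
```

The key point is that `parent_id` is a plain string, so it serializes cleanly into the closure shipped to executors, whereas the active-run object does not.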
- 4958 Views
- 1 replies
- 1 kudos
Databricks app and R shiny
Hello, I've been testing the Databricks app and have the following questions: 1. My organization currently uses Catalog Explorer instead of Unity Catalog. I want to develop a Shiny app and was able to run code from the template under New > App. However, t...
- 1 kudos
Thanks for the detailed context. Here's how to get Shiny-based apps working with your current setup and data. 1) Accessing data from “Catalog Explorer” in Databricks Apps. A few key points about the Databricks Apps environment and data access: Apps su...
- 3687 Views
- 1 replies
- 1 kudos
Nested experiments and UC
I have a general problem. I run a nested experiment in MLflow, training and logging several models in a loop. Then I want to register the best one in UC. No problem so far. But when I load the registered model and run prediction, it doesn't work. If I o...
- 1 kudos
Hey @Henrik_ , There are a few things that could be happening here; if you share the error message/stack trace you get when it doesn't work, I can help figure out which of these could be biting you and tailor the fix. In the meantime, here's a quick ...
- 2450 Views
- 2 replies
- 4 kudos
Resolved! Best practices for structuring databricks workspaces for CI/CD and ML workflows
Hi everyone, I'm designing the CI/CD process for our environment, focused on machine learning and data science projects, and I'd like to understand the best practices for workspace organization, especially when using Unity Cat...
- 4 kudos
When designing a CI/CD process for Databricks environments, especially for machine learning and data science projects using Unity Catalog, enterprise-scale workspace organization should balance isolation, governance, and collaboration. The recommen...
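One concrete way to implement that kind of environment separation is a Databricks Asset Bundle with one target per environment, so the same project definition deploys to dev and prod workspaces under different modes. This is a minimal sketch; the bundle name and workspace hostnames are placeholders, not values from the thread.

```yaml
# Hypothetical databricks.yml sketch: one bundle, one target per environment.
bundle:
  name: ml_project

targets:
  dev:
    mode: development        # resources get per-user prefixes for isolation
    workspace:
      host: https://dev-workspace.cloud.databricks.com
  prod:
    mode: production
    workspace:
      host: https://prod-workspace.cloud.databricks.com
```

A CI pipeline can then run `databricks bundle deploy -t dev` on feature branches and `-t prod` from the main branch, keeping the promotion path explicit.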
- 1306 Views
- 3 replies
- 1 kudos
Safe Update Strategy for Online Feature Store Without Endpoint Disruption
Hi Team, we are implementing the Databricks Online Feature Store using the Lakebase architecture and have run into some constraints during development. Requirements: deploy an offline table as a synced online table and create a feature spec that queries from th...
- 1 kudos
Hi Mark, thanks for your response. I followed the steps you suggested: created the table and set primary key + time series key constraints; enabled Change Data Feed; created the feature table and deployed the online endpoint (this worked fine); removed s...
- 1065 Views
- 2 replies
- 1 kudos
Offline Feature Store in Databricks Serving
Hi, I am planning to deploy a model (pyfunc) with Databricks Serving. During inference, my model needs to retrieve some data from Delta tables. I could turn these tables into an offline feature store as well. Latency is not so important. It doesn't matt...
- 1 kudos
There is a ready feature engineering function for that:
# on non-ML runtimes, install databricks-feature-engineering>=0.13.0a3
from databricks.feature_engineering import FeatureEngineeringClient
fe = FeatureEngineeringClient()
from databrick...
- 768 Views
- 2 replies
- 0 kudos
Resolved! how to speed up inference?
Hi guys, I'm new to this concept, but we have several ML models that follow the same structure in the code. What I don't fully understand is how to handle different types of models efficiently: right now, I need to loop through my items to get the ...
- 0 kudos
Hi @jeremy98, I have not tried this, but could you try using Python's multiprocessing library to assign the inference for different models to different CPU cores? Also, here's a useful blog: https://docs.datab...
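A stdlib-only sketch of that suggestion: fan the per-model inference out across workers with `concurrent.futures` instead of looping sequentially. The "models" here are stand-in callables, not real loaded models.

```python
# Stdlib sketch of parallel inference across several models.
# The "models" are stand-in callables; swap in real predict() methods.
from concurrent.futures import ThreadPoolExecutor

def make_model(scale):
    # stand-in for a loaded ML model's predict()
    return lambda xs: [scale * x for x in xs]

models = {"m1": make_model(2), "m2": make_model(3), "m3": make_model(10)}
batch = [1, 2, 3]

def run_inference(item):
    name, model = item
    return name, model(batch)

# Threads suit I/O-bound predict calls (e.g. endpoints); for CPU-bound local
# models, ProcessPoolExecutor sidesteps the GIL at the cost of pickling.
with ThreadPoolExecutor(max_workers=4) as pool:
    results = dict(pool.map(run_inference, models.items()))
```

The thread-vs-process choice is the main design decision: threads share memory (cheap for large models) but serialize CPU work, while processes parallelize CPU work but must pickle inputs and models.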
- 621 Views
- 1 replies
- 1 kudos
Resolved! How does Databricks AutoML handle null imputation for categorical features by default?
Hi everyone, I'm using Databricks AutoML (classification workflow) on Databricks Runtime 10.4 LTS ML+, and I'd like to clarify how missing (null) values are handled for categorical (string) columns by default. From the AutoML documentation, I see that:...
- 1 kudos
Hello @spearitchmeta , I looked internally to see if I could help with this and I found some information that will shed light on your question. Here’s how missing (null) values in categorical (string) columns are handled in Databricks AutoML on Dat...
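The common pattern for string columns, which this kind of pipeline typically follows, can be sketched in plain pandas: impute nulls with a sentinel category so that "missing" becomes its own level before encoding. This is an illustration of the concept, not AutoML's internals, and the column/sentinel names are made up.

```python
# Illustrative pandas sketch (not AutoML internals): treat nulls in a string
# column as their own category before one-hot encoding.
import pandas as pd

df = pd.DataFrame({"color": ["red", None, "blue", None, "red"]})

# Impute missing strings with a sentinel category
df["color_imputed"] = df["color"].fillna("missing")

# One-hot encode; the sentinel becomes a regular indicator column
encoded = pd.get_dummies(df["color_imputed"], prefix="color")
```

The upshot is that downstream models see missingness as a signal of its own rather than dropping or erroring on null rows.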