- 8076 Views
- 3 replies
- 1 kudos
Resolved! Tuning `CrossValidator` spark job performance
I am running a 3-fold cross validation of an ML pipeline that utilizes `GBTClassifier` as the final step. It takes 18 hours to run and I am looking for feedback into how to improve the performance as I expect this to go faster.For context here is the...
- 8076 Views
- 3 replies
- 1 kudos
- 1 kudos
Hello @Assaad Mrad​ , So this looks like trying to decide between putting the pipeline in the cross validator or the cross validator in the pipeline. Since you are doing the polynomial expansion as part of the pipeline you might want to consider putt...
- 1 kudos
- 5169 Views
- 4 replies
- 0 kudos
Training Job Failure (Driver Error)
We have a new model training job that was running fine for a few days and then started failing. I have attached images for more details.I am wondering if 'can't reach driver cluster' is a red herring. It says the driver is healthy right before execut...
- 5169 Views
- 4 replies
- 0 kudos
- 0 kudos
In our case, we needed to correct our dependent libraries. We had an incorrect path referenced.
- 0 kudos
- 2114 Views
- 2 replies
- 0 kudos
Resolved! Vector search index stops at 45406
I am trying to create a vector search index for a table, but it stops at 45406 rows. I can see that the writeback table has all the records but the indexing stops. Is there a hard limit on index?
- 2114 Views
- 2 replies
- 0 kudos
- 0 kudos
There are some limits that you can be hitting: Row Size for Delta Sync Index: The maximum row size is 100KB.Embedding Source Column Size for Delta Sync Index: The maximum size is 32764 bytes.Bulk Upsert Request Size Limit for Direct Vector Index: The...
- 0 kudos
- 4906 Views
- 2 replies
- 1 kudos
AutoML Runs Failing
After the Data Exploration notebook runs successfully, all AutoML trials fail without providing a source notebook. I have ensured that the training data labels have no null values or any labels with 16 or less occurrences associated with them. I cann...
- 4906 Views
- 2 replies
- 1 kudos
- 1 kudos
@AnNg Have there been any updates on this feature?
- 1 kudos
- 1664 Views
- 2 replies
- 0 kudos
Python running far slower than locally, even with large cluster and multiple workers
I have a notebook that is running extremely slowly even when I try to do pretty basic python functions. It is running far slower than locally no matter what I try, this is in spite of using a 32gb 4 core cluster with 4-8 workers. For context, my data...
- 1664 Views
- 2 replies
- 0 kudos
- 0 kudos
Please share more information, for example: Type of data sourceType of operations being executed (sharing code if possible)Timings of local runs and Databricks runs
- 0 kudos
- 4479 Views
- 3 replies
- 5 kudos
Deploy a ML model, trained and registered in Databricks to AKS
Hi,I can train, registered a ML Model in my Datbricks Workspace.Then, to deploy it on AKS, I need to register the model in Azure ML, and then, deploy to AKS.Is it possible to skip the Azure ML step?I would like to deploy directly into my AKS instance...
- 4479 Views
- 3 replies
- 5 kudos
- 5 kudos
Is it still the case, can't we serve the model in Databricks. I am new to this, so I am just wondering the capabilities.
- 5 kudos
- 2499 Views
- 2 replies
- 1 kudos
Endpoint creation without scale-to-zero
Hi, I've got a question about deploying an endpoint for Llama 3.1 8b. The following code should create the endpoint without scale-to-zero. The endpoint is being created, but with scale-to-zero, although scale_to_zero_enabled is set to False. Instead ...
- 2499 Views
- 2 replies
- 1 kudos
- 1 kudos
Thanks for the reply @Walter_C. This didn't quite work, since it used a CPU and didn't consider the max_provisioned_throughput, but I finally got it to work like this: from mlflow.deployments import get_deploy_client client = get_deploy_client("data...
- 1 kudos
- 10215 Views
- 5 replies
- 2 kudos
Issue with Multi-column In predicates are not supported in the DELETE condition.
I'm trying to delete rows from a table with the same date or id as records in another table. I'm using the below query and get the error 'Multi-column In predicates are not supported in the DELETE condition'. delete from cost_model.cm_dispatch_consol...
- 10215 Views
- 5 replies
- 2 kudos
- 2 kudos
I seem to get this error on some DeltaTables and not others:df.createOrReplaceTempView("channels_to_delete") spark.sql(""" delete from lake.something.earnings where TenantId = :tenantId and ChannelId = in ( select ChannelId ...
- 2 kudos
- 3811 Views
- 3 replies
- 1 kudos
Resolved! Extracting Topics From Text Data Using PySpark
Hi EveryoneI tried to follow the same steps in Topic from Text on similar data as example. However, when I tri to fit the model with data I get this error.IllegalArgumentException: requirement failed: Column features must be of type equal to one of t...
- 3811 Views
- 3 replies
- 1 kudos
- 1 kudos
Hi @amirA ,The LDA model expects the features column to be of type Vector from the pyspark.ml.linalg module, specifically either a SparseVector or DenseVector, whereas you have provided Row type.You need to convert your Row object to SparseVector.Che...
- 1 kudos
- 9680 Views
- 15 replies
- 2 kudos
Serving Endpoint Container Image Creation Fails
Hello, I trained a model using MLFlow, and saved the model as an artifact. I can load the model from a notebook and it works as expected (i.e. I can load the model using its URI).However, when I want to deploy it using Databricks endpoints, container...
- 9680 Views
- 15 replies
- 2 kudos
- 2 kudos
@ivan_calvo The problem still exists. Surely there has to be some other option than downgrading the ML cluster to DBR 14.3 LTS ML?
- 2 kudos
- 1307 Views
- 2 replies
- 0 kudos
I want to develop an automated lead allocation system to prospect sales representatives.
I want to develop an automated lead allocation system to prospect sales representatives. Please suggest a suitable solution also any links if available.
- 1307 Views
- 2 replies
- 0 kudos
- 0 kudos
Hi jamesl,My use case is related to match the prospect sales agent for the customer entering retail store, when a customer enters a store based on the inputs provided and checking on if the customer is existing or new customer, I want to create a rea...
- 0 kudos
- 4269 Views
- 6 replies
- 4 kudos
- 4269 Views
- 6 replies
- 4 kudos
- 4 kudos
There could be multiple reasone why you're getting this error @avishkarborkar . If the course you're following requires Unity Catalog, first you need to check if you have a premium workspace. Next you need to make sure that your workspace is enabled ...
- 4 kudos
- 3235 Views
- 1 replies
- 0 kudos
Unable to Check Experiment Existence with path starting with /Workspace/ Directory in Databricks Pla
https://github.com/mlflow/mlflow/issues/11077 In Databricks, when attempting to set an experiment with an experiment_name specified as an absolute path from /Workspace/Shared/mlflow_experiment/<experiment_name>, the mlflow.set_experiment() function ...
- 3235 Views
- 1 replies
- 0 kudos
- 0 kudos
Before setting the experiment, use mlflow.get_experiment_by_name() to check if the experiment already exists. If it does, you can set the experiment without attempting to create it again.
- 0 kudos
- 941 Views
- 1 replies
- 0 kudos
What is the best to way to not deploy/run a workflow in production?
I am building and MLOps architecture.I do not want to deploy the training workflow to prod. My first approach was to selectively not deploy the workflow to prod, but this does not seem to be possible as in this thread:https://community.databricks.com...
- 941 Views
- 1 replies
- 0 kudos
- 0 kudos
Target Override Feature: You can use the target override feature to specify different configurations for different environments. However, this does not provide a direct way to exclude specific job resources.Environment-Specific Folders: Another app...
- 0 kudos
- 1358 Views
- 1 replies
- 0 kudos
request for exam certification voucher
Hi , I've completed the course Machine Learning with Databricks ! Looking forward to learn more .
- 1358 Views
- 1 replies
- 0 kudos
-
Access control
3 -
Access Data
2 -
AccessKeyVault
1 -
ADB
2 -
Airflow
1 -
Amazon
2 -
Apache
1 -
Apache spark
3 -
APILimit
1 -
Artifacts
1 -
Audit
1 -
Autoloader
6 -
Autologging
2 -
Automation
2 -
Automl
44 -
Aws databricks
1 -
AWSSagemaker
1 -
Azure
32 -
Azure active directory
1 -
Azure blob storage
2 -
Azure data lake
1 -
Azure Data Lake Storage
3 -
Azure data lake store
1 -
Azure databricks
32 -
Azure event hub
1 -
Azure key vault
1 -
Azure sql database
1 -
Azure Storage
2 -
Azure synapse
1 -
Azure Unity Catalog
1 -
Azure vm
1 -
AzureML
2 -
Bar
1 -
Beta
1 -
Better Way
1 -
BI Integrations
1 -
BI Tool
1 -
Billing and Cost Management
1 -
Blob
1 -
Blog
1 -
Blog Post
1 -
Broadcast variable
1 -
Business Intelligence
1 -
CatalogDDL
1 -
Centralized Model Registry
1 -
Certification
2 -
Certification Badge
1 -
Change
1 -
Change Logs
1 -
Check
2 -
Classification Model
1 -
Cloud Storage
1 -
Cluster
10 -
Cluster policy
1 -
Cluster Start
1 -
Cluster Termination
2 -
Clustering
1 -
ClusterMemory
1 -
CNN HOF
1 -
Column names
1 -
Community Edition
1 -
Community Edition Password
1 -
Community Members
1 -
Company Email
1 -
Condition
1 -
Config
1 -
Configure
3 -
Confluent Cloud
1 -
Container
2 -
ContainerServices
1 -
Control Plane
1 -
ControlPlane
1 -
Copy
1 -
Copy into
2 -
CosmosDB
1 -
Courses
2 -
Csv files
1 -
Dashboards
1 -
Data
8 -
Data Engineer Associate
1 -
Data Engineer Certification
1 -
Data Explorer
1 -
Data Ingestion
2 -
Data Ingestion & connectivity
11 -
Data Quality
1 -
Data Quality Checks
1 -
Data Science & Engineering
2 -
databricks
5 -
Databricks Academy
3 -
Databricks Account
1 -
Databricks AutoML
9 -
Databricks Cluster
3 -
Databricks Community
5 -
Databricks community edition
4 -
Databricks connect
1 -
Databricks dbfs
1 -
Databricks Feature Store
1 -
Databricks Job
1 -
Databricks Lakehouse
1 -
Databricks Mlflow
4 -
Databricks Model
2 -
Databricks notebook
10 -
Databricks ODBC
1 -
Databricks Platform
1 -
Databricks Pyspark
1 -
Databricks Python Notebook
1 -
Databricks Runtime
9 -
Databricks SQL
8 -
Databricks SQL Permission Problems
1 -
Databricks Terraform
1 -
Databricks Training
2 -
Databricks Unity Catalog
1 -
Databricks V2
1 -
Databricks version
1 -
Databricks Workflow
2 -
Databricks Workflows
1 -
Databricks workspace
2 -
Databricks-connect
1 -
DatabricksContainer
1 -
DatabricksML
6 -
Dataframe
3 -
DataSharing
1 -
Datatype
1 -
DataVersioning
1 -
Date Column
1 -
Dateadd
1 -
DB Notebook
1 -
DB Runtime
1 -
DBFS
5 -
DBFS Rest Api
1 -
Dbt
1 -
Dbu
1 -
DDL
1 -
DDP
1 -
Dear Community
1 -
DecisionTree
1 -
Deep learning
4 -
Default Location
1 -
Delete
1 -
Delt Lake
4 -
Delta lake table
1 -
Delta Live
1 -
Delta Live Tables
6 -
Delta log
1 -
Delta Sharing
3 -
Delta-lake
1 -
Deploy
1 -
DESC
1 -
Details
1 -
Dev
1 -
Devops
1 -
Df
1 -
Different Notebook
1 -
Different Parameters
1 -
DimensionTables
1 -
Directory
3 -
Disable
1 -
Distribution
1 -
DLT
6 -
DLT Pipeline
3 -
Dolly
5 -
Dolly Demo
2 -
Download
2 -
EC2
1 -
Emr
2 -
Ensemble Models
1 -
Environment Variable
1 -
Epoch
1 -
Error handling
1 -
Error log
2 -
Eventhub
1 -
Example
1 -
Experiments
4 -
External Sources
1 -
Extract
1 -
Fact Tables
1 -
Failure
2 -
Feature Lookup
2 -
Feature Store
61 -
Feature Store API
2 -
Feature Store Table
1 -
Feature Table
6 -
Feature Tables
4 -
Features
2 -
FeatureStore
2 -
File Path
2 -
File Size
1 -
Fine Tune Spark Jobs
1 -
Forecasting
2 -
Forgot Password
2 -
Garbage Collection
1 -
Garbage Collection Optimization
1 -
Github
2 -
Github actions
2 -
Github Repo
2 -
Gitlab
1 -
GKE
1 -
Global Init Script
1 -
Global init scripts
4 -
Governance
1 -
Hi
1 -
Horovod
1 -
Html
1 -
Hyperopt
4 -
Hyperparameter Tuning
2 -
Iam
1 -
Image
3 -
Image Data
1 -
Inference Setup Error
1 -
INFORMATION
1 -
Input
1 -
Insert
1 -
Instance Profile
1 -
Int
2 -
Interactive cluster
1 -
Internal error
1 -
Invalid Type Code
1 -
IP
1 -
Ipython
1 -
Ipywidgets
1 -
JDBC Connections
1 -
Jira
1 -
Job
4 -
Job Parameters
1 -
Job Runs
1 -
Join
1 -
Jsonfile
1 -
Kafka consumer
1 -
Key Management
1 -
Kinesis
1 -
Lakehouse
1 -
Large Datasets
1 -
Latest Version
1 -
Learning
1 -
Limit
3 -
LLM
3 -
LLMs
3 -
Local computer
1 -
Local Machine
1 -
Log Model
2 -
Logging
1 -
Login
1 -
Logs
1 -
Long Time
2 -
Low Latency APIs
2 -
LTS ML
3 -
Machine
3 -
Machine Learning
24 -
Machine Learning Associate
1 -
Managed Table
1 -
Max Retries
1 -
Maximum Number
1 -
Medallion Architecture
1 -
Memory
3 -
Metadata
1 -
Metrics
3 -
Microsoft azure
1 -
ML Lifecycle
4 -
ML Model
4 -
ML Practioner
3 -
ML Runtime
1 -
MlFlow
75 -
MLflow API
5 -
MLflow Artifacts
2 -
MLflow Experiment
6 -
MLflow Experiments
3 -
Mlflow Model
10 -
Mlflow registry
3 -
Mlflow Run
1 -
Mlflow Server
5 -
MLFlow Tracking Server
3 -
MLModels
2 -
Model Deployment
4 -
Model Lifecycle
6 -
Model Loading
2 -
Model Monitoring
1 -
Model registry
5 -
Model Serving
28 -
Model Serving Cluster
2 -
Model Serving REST API
6 -
Model Training
2 -
Model Tuning
1 -
Models
8 -
Module
3 -
Modulenotfounderror
1 -
MongoDB
1 -
Mount Point
1 -
Mounts
1 -
Multi
1 -
Multiline
1 -
Multiple users
1 -
Nested
1 -
New Feature
1 -
New Features
1 -
New Workspace
1 -
Nlp
3 -
Note
1 -
Notebook
6 -
Notification
2 -
Object
3 -
Onboarding
1 -
Online Feature Store Table
1 -
OOM Error
1 -
Open Source MLflow
4 -
Optimization
2 -
Optimize Command
1 -
OSS
3 -
Overwatch
1 -
Overwrite
2 -
Packages
2 -
Pandas udf
4 -
Pandas_udf
1 -
Parallel
1 -
Parallel processing
1 -
Parallel Runs
1 -
Parallelism
1 -
Parameter
2 -
PARAMETER VALUE
2 -
Partner Academy
1 -
Pending State
2 -
Performance Tuning
1 -
Photon Engine
1 -
Pickle
1 -
Pickle Files
2 -
Pip
2 -
Points
1 -
Possible
1 -
Postgres
1 -
Pricing
2 -
Primary Key
1 -
Primary Key Constraint
1 -
Progress bar
2 -
Proven Practices
2 -
Public
2 -
Pymc3 Models
2 -
PyPI
1 -
Pyspark
6 -
Python
21 -
Python API
1 -
Python Code
1 -
Python Function
3 -
Python Libraries
1 -
Python Packages
1 -
Python Project
1 -
Pytorch
3 -
Reading-excel
2 -
Redis
2 -
Region
1 -
Remote RPC Client
1 -
RESTAPI
1 -
Result
1 -
Runtime update
1 -
Sagemaker
1 -
Salesforce
1 -
SAP
1 -
Scalability
1 -
Scalable Machine
2 -
Schema evolution
1 -
Script
1 -
Search
1 -
Security
2 -
Security Exception
1 -
Self Service Notebooks
1 -
Server
1 -
Serverless
1 -
Serving
1 -
Shap
2 -
Size
1 -
Sklearn
1 -
Slow
1 -
Small Scale Experimentation
1 -
Source Table
1 -
Spark config
1 -
Spark connector
1 -
Spark Error
1 -
Spark MLlib
2 -
Spark Pandas Api
1 -
Spark ui
1 -
Spark Version
2 -
Spark-submit
1 -
SparkML Models
2 -
Sparknlp
3 -
Spot
1 -
SQL
19 -
SQL Editor
1 -
SQL Queries
1 -
SQL Visualizations
1 -
Stage failure
2 -
Storage
3 -
Stream
2 -
Stream Data
1 -
Structtype
1 -
Structured streaming
2 -
Study Material
1 -
Summit23
2 -
Support
1 -
Support Team
1 -
Synapse
1 -
Synapse ML
1 -
Table
4 -
Table access control
1 -
Tableau
1 -
Task
1 -
Temporary View
1 -
Tensor flow
1 -
Test
1 -
Timeseries
1 -
Timestamps
1 -
TODAY
1 -
Training
6 -
Transaction Log
1 -
Trying
1 -
Tuning
2 -
UAT
1 -
Ui
1 -
Unexpected Error
1 -
Unity Catalog
12 -
Use Case
2 -
Use cases
1 -
Uuid
1 -
Validate ML Model
2 -
Values
1 -
Variable
1 -
Vector
1 -
Versioncontrol
1 -
Visualization
2 -
Web App Azure Databricks
1 -
Weekly Release Notes
2 -
Whl
1 -
Worker Nodes
1 -
Workflow
2 -
Workflow Jobs
1 -
Workspace
2 -
Write
1 -
Writing
1 -
Z-ordering
1 -
Zorder
1
- « Previous
- Next »
| User | Count |
|---|---|
| 90 | |
| 41 | |
| 38 | |
| 28 | |
| 25 |