- 32496 Views
- 25 replies
- 0 kudos
Problem when serving a langchain model on Databricks
I'm trying to serve an LLM LangChain model and every time it fails with this message: [6b6448zjll] [2024-02-06 14:09:55 +0000] [1146] [INFO] Booting worker with pid: 1146 [6b6448zjll] An error occurred while loading the model. You haven't confi...
Hi @DataWrangler and team. I managed to solve the initial problem thanks to some of the tips you gave. I used your code as a base and made some modifications adapted to my setup, i.e., no UC enabled and unable to use DatabricksEmbeddings, DatabricksVectorSearch ...
- 9420 Views
- 2 replies
- 2 kudos
Resolved! Running multiple linear regressions in parallel (speeding up for loop)
Hi, I am running several linear regressions on my dataframe: I fit a regression for every unique value in the column "item", apply the model to a new dataset (vector_new), and union the results as the loop runs. The problem is th...
@Marcela Bejarano: One approach to speed up the process is to avoid the loop and instead use Spark's groupBy and map functions. Here is an example:

from pyspark.ml import Pipeline
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.reg...
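A minimal sketch of that grouped approach, fitting one regression per "item" group in parallel with applyInPandas and scikit-learn; the source table name and the feature/label columns (x, y) are illustrative assumptions, not the original poster's data:

```python
import pandas as pd
from sklearn.linear_model import LinearRegression
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

def fit_group(pdf: pd.DataFrame) -> pd.DataFrame:
    # Spark calls this once per "item" group, so the per-item fits run in parallel.
    model = LinearRegression().fit(pdf[["x"]], pdf["y"])
    return pd.DataFrame({
        "item": [pdf["item"].iloc[0]],
        "intercept": [float(model.intercept_)],
        "coef_x": [float(model.coef_[0])],
    })

df = spark.table("my_training_data")  # hypothetical source table
results = df.groupBy("item").applyInPandas(
    fit_group, schema="item string, intercept double, coef_x double"
)
results.show()
```

The fitted coefficients can then be joined back onto vector_new to score each item, replacing the unioned results built up by the loop.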
- 1554 Views
- 1 reply
- 0 kudos
COPY INTO command to load a Delta table with a predefined schema when the CSV file has no headers
How do I use the COPY INTO command to load 200+ tables with 50+ columns into a Delta Lake table with a predefined schema? I am looking for a more generic approach that can be handled in PySpark code. I am aware that we can pass the column expression into the sele...
Does your source data have the same number of columns as your target Delta tables? In that case, you can do it this way:

COPY INTO my_pipe_data
FROM 's3://my-bucket/pipeData'
FILEFORMAT = CSV
FORMAT_OPTIONS ('mergeSchema' = 'true', 'delimiter' = '|', 'header' ...
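A sketch of that COPY INTO pattern run from a PySpark notebook, using the same target table and path as the reply; 'header' = 'false' is an assumption matching the headerless CSVs described in the question:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# COPY INTO is idempotent: files already loaded into my_pipe_data are skipped
# on subsequent runs.
spark.sql("""
    COPY INTO my_pipe_data
    FROM 's3://my-bucket/pipeData'
    FILEFORMAT = CSV
    FORMAT_OPTIONS ('mergeSchema' = 'true', 'delimiter' = '|', 'header' = 'false')
    COPY_OPTIONS ('mergeSchema' = 'true')
""")
```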
- 4665 Views
- 2 replies
- 1 kudos
Resolved! How do I use the Copy Into command to copy data into a Delta Table? Looking for examples where you want to have a pre-defined schema
I've reviewed the COPY INTO docs here - https://docs.databricks.com/spark/latest/spark-sql/language-manual/delta-copy-into.html#examples but there's only one simple example. Looking for some additional examples that show loading data from CSV - with ...
Here's an example for a predefined schema. Using COPY INTO with a predefined table schema – the trick here is to CAST the CSV dataset into your desired schema in the SELECT statement of COPY INTO. Example below:

%sql
CREATE OR REPLACE TABLE copy_into_bronze_te...
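A sketch of that CAST-in-the-SELECT trick end to end; the table name, S3 path, and columns below are illustrative assumptions (headerless CSV columns surface as _c0, _c1, ...):

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Target table with the predefined schema.
spark.sql("""
    CREATE TABLE IF NOT EXISTS copy_into_bronze_example (
        id INT, amount DOUBLE, created_at TIMESTAMP
    )
""")

# Cast each positional CSV column to the target type inside the SELECT.
spark.sql("""
    COPY INTO copy_into_bronze_example
    FROM (
        SELECT CAST(_c0 AS INT)       AS id,
               CAST(_c1 AS DOUBLE)    AS amount,
               CAST(_c2 AS TIMESTAMP) AS created_at
        FROM 's3://my-bucket/csv-data'
    )
    FILEFORMAT = CSV
    FORMAT_OPTIONS ('header' = 'false')
""")
```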
- 5815 Views
- 1 reply
- 0 kudos
How to download a PyTorch model created via notebook and saved in a folder?
I have created a PyTorch model using Databricks notebooks and saved it in a folder in the workspace; MLflow is not used. When I try to download the files from the folder, it exceeds the download limit. Is there a way to download the model locally onto my s...
- 3389 Views
- 1 reply
- 0 kudos
Resolved! Query ML Endpoint with R and Curl
I am trying to get a prediction by querying the ML endpoint on Azure Databricks with R. I'm not sure what the format of the expected data is. Is there any other problem with this code? Thanks!
Hi Kaniz, I was able to find the solution. You should post this in the examples shown when you click "Query Endpoint" – you only have code for Browser, Curl, Python, and SQL; you should add a tab for R. Here is the solution:

library(httr)
url <- "https://adb-********...
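For reference, a Python sketch of the same request; the part the question asks about is the payload shape, which for Databricks model serving is a JSON body such as "dataframe_split" (or "dataframe_records"). The URL, token, and feature names are placeholders rather than values from the thread:

```python
import requests

# Placeholders – fill in your workspace host, endpoint name, and a personal access token.
url = "https://<workspace-host>/serving-endpoints/<endpoint-name>/invocations"
headers = {
    "Authorization": "Bearer <databricks-token>",
    "Content-Type": "application/json",
}

# "dataframe_split" mirrors a pandas DataFrame: column names plus row values.
payload = {
    "dataframe_split": {
        "columns": ["feature_1", "feature_2"],
        "data": [[1.0, 2.0], [3.0, 4.0]],
    }
}

response = requests.post(url, headers=headers, json=payload)
print(response.json())
```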
- 2173 Views
- 1 reply
- 0 kudos
Security controls to implement for the Machine Learning persona
Hello, hope everyone is doing well. You may be aware that we are using a Table ACL-enabled cluster to ensure adequate security controls on Databricks. You may also be aware that we cannot use a Table ACL-enabled cluster with the Machine Learning persona. ...
- 1710 Views
- 0 replies
- 0 kudos
DBR CLI v0.216.0 failed to pass bundle variable for notebook task
After installing the new version of the CLI (v0.216.0), the bundle variable for the notebook task is not parsed correctly; see the code below:

tasks:
  - task_key: notebook_task
    job_cluster_key: job_cluster
    notebook_task:
      ...
- 2432 Views
- 0 replies
- 1 kudos
MLflow Experiments in Unity Catalog
Will MLflow experiments be incorporated into Unity Catalog, as models and feature tables have been? I feel like this is the final piece missing in a comprehensive Unity Catalog-backed MLOps workflow. Currently it seems they can only be stored in a dbfs ...
- 5021 Views
- 1 reply
- 0 kudos
pdb debugger on Databricks
I am new to Databricks and am trying to debug my Python application with the variable explorer by following the instructions from https://www.databricks.com/blog/new-debugging-features-databricks-notebooks-variable-explorer. I added "import pdb" in the fi...
I tested with some simple applications, and it works as you described. However, the application I am debugging uses PySpark Structured Streaming, which runs continuously. After inserting pdb.set_trace(), the application paused at the breakpoint, but t...
- 7401 Views
- 1 reply
- 0 kudos
Optimizing for Recall in Azure AutoML UI
Hi all, I've been using Azure AutoML and noticed that I can choose 'recall' as my optimization metric in the notebook but not in the Azure AutoML UI. The Databricks documentation also doesn't list 'recall' as an optimization metric. Is there a reason ...
In the Databricks notebook itself, I can see that databricks.automl supports using recall as a primary metric. Help on function classify in module databricks.automl:

:param primary_metric: primary metric to select the best model. Each trial will...
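A sketch of calling the Python API with recall directly, since the UI does not expose it; the table and target column are illustrative assumptions, and passing "recall" rests on the help text quoted in the reply:

```python
from pyspark.sql import SparkSession
from databricks import automl

spark = SparkSession.builder.getOrCreate()

df = spark.table("my_catalog.my_schema.training_data")  # hypothetical table

# primary_metric drives trial selection; the AutoML UI only surfaces a subset
# of the values the API accepts.
summary = automl.classify(
    dataset=df,
    target_col="label",
    primary_metric="recall",
    timeout_minutes=30,
)
```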
- 6397 Views
- 6 replies
- 7 kudos
How to save a model produced by distributed training?
I am trying to save the model after distributed training via the following code:

import sys
from spark_tensorflow_distributor import MirroredStrategyRunner
import mlflow.keras

mlflow.keras.autolog()
mlflow.log_param("learning_rate", 0.001)
import...
I think I finally worked this out. Here is the extra code to save out the model only once, from the first node:

context = pyspark.BarrierTaskContext.get()
if context.partitionId() == 0:
    mlflow.keras.log_model(model, "mymodel")
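Putting that snippet in context, a sketch of where the partition check sits inside the training function passed to MirroredStrategyRunner; the model definition and fit step are illustrative stand-ins for the code elided in the question:

```python
import pyspark
import mlflow.keras
from spark_tensorflow_distributor import MirroredStrategyRunner

def train():
    import tensorflow as tf

    # Illustrative model; the real network and fit call come from the
    # question's elided training code.
    model = tf.keras.Sequential([tf.keras.layers.Dense(1, input_shape=(4,))])
    model.compile(optimizer="adam", loss="mse")
    # ... model.fit(...) on each worker's shard ...

    # Log the model from the first barrier task only, so it is saved once.
    context = pyspark.BarrierTaskContext.get()
    if context.partitionId() == 0:
        mlflow.keras.log_model(model, "mymodel")

MirroredStrategyRunner(num_slots=2).run(train)
```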
- 2719 Views
- 0 replies
- 0 kudos
'error_code': 'INVALID_PARAMETER_VALUE', 'message': 'Too many sources. It cannot be more than 100'
I am getting the following error while saving a Delta table in the Feature Store:

WARNING databricks.feature_store._catalog_client_helper: Failed to record data sources in the catalog. Exception: {'error_code': 'INVALID_PARAMETER_VALUE', 'message': 'To...
- 3462 Views
- 2 replies
- 1 kudos
AutoML dataset too large
Hello community, I have the following problem: I am using AutoML on a regression problem, but during preprocessing my dataset is sampled down to ~30% of the original amount. I am using Runtime 14.2 ML. Driver: Standard_DS4_v2, 28 GB memory, 8 cores. Worker: S...
I am pretty sure I know what the problem was. I had a timestamp column (with second precision) as a feature. If such columns get one-hot encoded, the dataset can get pretty large.
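One way to avoid that blow-up, sketched below: derive a few coarse features from the timestamp and drop the raw column before handing the DataFrame to AutoML (table and column names are illustrative assumptions):

```python
from pyspark.sql import SparkSession
import pyspark.sql.functions as F

spark = SparkSession.builder.getOrCreate()

df = spark.table("my_training_data")  # hypothetical source table

# Replace the second-precision timestamp with low-cardinality derived features
# so AutoML's one-hot encoding stays small.
df = (
    df.withColumn("event_hour", F.hour("event_ts"))
      .withColumn("event_dayofweek", F.dayofweek("event_ts"))
      .drop("event_ts")
)
```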
- 2552 Views
- 2 replies
- 0 kudos
Error: batch scoring with mlflow.keras flavor model
I am logging a trained Keras model using the following:

fe.log_model(
    model=model,
    artifact_path="wine_quality_prediction",
    flavor=mlflow.keras,
    training_set=training_set,
    registered_model_name=model_name,
)

And when I call the following: predictions_...