Machine Learning
Dive into the world of machine learning on the Databricks platform. Explore discussions on algorithms, model training, deployment, and more. Connect with ML enthusiasts and experts.
Data + AI Summit 2024 - Data Science & Machine Learning

Forum Posts

Spencer_Kent
by New Contributor III
  • 2407 Views
  • 2 replies
  • 1 kudos

Resolved! Lacking support for column-level select grants or attribute-based access control

In the Unity Catalog launch and its accompanying blog post, one of the primary selling points was a set of granular access control features that would at least partially eliminate the need to create a multitude of separate table views and the attenda...

Latest Reply
Spencer_Kent
New Contributor III
  • 1 kudos

Simply amazing that two years on from the initial announcement, this feature is not available. You released Unity Catalog missing one of its most-hyped features.

1 More Replies
karthik_p
by Esteemed Contributor
  • 3448 Views
  • 6 replies
  • 2 kudos

Getting a forbidden error in AWS when trying to create a folder/file or list files using dbutils

Hi Team, we have created a new premium workspace with a customer-managed VPC, and the workspace deployed successfully in AWS. When we try to create a folder in DBFS, we get the below error. We have compared the cross-account custom managed role (Customer-managed VP...

Latest Reply
karthik_p
Esteemed Contributor
  • 2 kudos

@Debayan Mukherjee Issue resolved. It looks like the cloud team had not updated the required security groups that were shared; after revisiting them, we found the missing security groups and added them.

5 More Replies
ammarchalifah
by New Contributor
  • 3177 Views
  • 1 reply
  • 0 kudos

DeltaFileNotFoundException in a multi-cluster conflict

I have several parallel data pipelines running in different Airflow DAGs. All of these pipelines execute two dbt selectors in a dedicated Databricks cluster; one of them is a common selector executed in all DAGs. This selector includes a test that is d...

Latest Reply
Anonymous
Not applicable
  • 0 kudos

@Ammar Ammar: The error message you're seeing suggests that the Delta Lake transaction log for the common model's test table has been truncated or deleted, either manually or due to the retention policies set in your cluster. This can happen if the ...
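
Where retention turns out to be the culprit, the relevant knobs are the Delta table properties that control log and tombstone retention. A minimal sketch, assuming a Databricks notebook where `spark` is available; the table name and intervals below are placeholders, not values from this thread:

```python
# Hypothetical example: lengthen Delta log / deleted-file retention so that
# concurrent pipelines reading older table versions are less likely to hit
# missing transaction-log files. Table name and intervals are illustrative only.
spark.sql("""
    ALTER TABLE my_catalog.my_schema.common_model_test
    SET TBLPROPERTIES (
        'delta.logRetentionDuration' = 'interval 30 days',
        'delta.deletedFileRetentionDuration' = 'interval 14 days'
    )
""")
```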

DK
by New Contributor II
  • 1678 Views
  • 1 reply
  • 1 kudos

Unable to call logged ML model from a different notebook when using Spark ML

Hi, I am an R user and I am experimenting with building an ML model in R using Spark-flavoured algorithms in Databricks. However, I am struggling to call a model that is logged as part of the experiment from a different notebook when I use spark flavo...

Latest Reply
Anonymous
Not applicable
  • 1 kudos

@Dip Kundu: It seems like the error you are facing is related to sparklyr, which is used to interact with Apache Spark from R, and not directly related to mlflow. The error message suggests that an object could not be found, but it's not clear which...

Anonymous
by Not applicable
  • 1459 Views
  • 1 reply
  • 1 kudos

Hive Catalog DDL: DESCRIBE EXTENDED returns "... n more fields" when detailing a many-column array<struct<

I am using the Hackolade data modelling tool to reverse engineer (using a cluster connection) deployed databases and their table and view definitions. Some of our tables contain large multi-column structs, and these can only be partially described as a char...

Latest Reply
Anonymous
Not applicable
  • 1 kudos

Yes, it is possible to configure the Hive Catalog in Databricks to return full descriptions of tables with large multi-column structs. One way to achieve this is to increase the value of the Hive configuration property "hive.metastore.client.record.ma...
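
As an alternative to tuning what DESCRIBE prints, the complete nested schema can also be pulled programmatically from the DataFrame API. A minimal sketch, assuming a Databricks notebook where `spark` is available; the table name is a placeholder:

```python
# Hypothetical example: read the full nested schema (no "... n more fields"
# truncation) directly from the DataFrame rather than via DESCRIBE EXTENDED.
# The table name is a placeholder.
df = spark.table("my_db.my_wide_struct_table")

# Human-readable tree of every field, including deeply nested struct members
df.printSchema()

# Machine-readable form, e.g. for feeding into a modelling tool
schema_json = df.schema.json()
print(schema_json)
```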

thomasm
by New Contributor II
  • 3314 Views
  • 3 replies
  • 1 kudos

Resolved! Online Feature Store MLflow serving problem

When I try to serve a model stored with FeatureStoreClient().log_model using the feature-store-online-example-cosmosdb tutorial Notebook, I get errors suggesting that the primary key schema is not configured properly. However, if I look in the Featur...

Latest Reply
NandiniN
Databricks Employee
  • 1 kudos

Hello @Thomas Michielsen, this error typically occurs when you have created the table yourself. You must use publish_table() to create the table in the online store. Do not manually create a database or container inside Cosmos DB. publish_table()...
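
For reference, a minimal sketch of publishing a feature table to the Cosmos DB online store with publish_table(), following the pattern in the tutorial notebook; all names, URIs, and secret prefixes below are placeholders:

```python
# Hypothetical example: publish an offline feature table to an online Cosmos DB
# store with publish_table(), so the container and primary-key schema are created
# by the Feature Store client rather than by hand. All names are placeholders.
from databricks.feature_store import FeatureStoreClient
from databricks.feature_store.online_store_spec import AzureCosmosDBSpec

fs = FeatureStoreClient()

online_store = AzureCosmosDBSpec(
    account_uri="https://my-account.documents.azure.com:443/",
    write_secret_prefix="my-scope/cosmosdb-write",
    read_secret_prefix="my-scope/cosmosdb-read",
)

fs.publish_table(
    name="feature_store_db.user_features",  # existing offline feature table
    online_store=online_store,
    mode="merge",
)
```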

2 More Replies
lurban
by New Contributor
  • 1245 Views
  • 1 reply
  • 0 kudos

CloudFilesIllegalStateException: Found mismatched event: key old_file_path doesn't have the prefix: new_file_path

My team currently uses Autoloader and Delta Live Tables to process incremental data from ADLS storage. We need to keep the same table and history, but switch the file path to a different location in storage. When I test a file path change, I rec...

Latest Reply
DD_Sharma
New Contributor III
  • 0 kudos

Autoloader doesn't support changing the source path for a running job, so if you change your source path the stream fails because the source path has changed. However, if you really want to change the path, you can do so by using a new checkpoint ...
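
A minimal sketch of restarting the stream against the new location with a fresh checkpoint (paths, file format, and table name are placeholders; note this re-discovers files under the new prefix rather than carrying over the old Autoloader ingestion state):

```python
# Hypothetical example: point Autoloader at the new storage path with a NEW
# checkpoint directory while writing into the same existing Delta table.
# Paths, format and table name are placeholders.
new_source_path = "abfss://container@account.dfs.core.windows.net/new/landing/path"
new_checkpoint = "abfss://container@account.dfs.core.windows.net/checkpoints/my_table_v2"

df = (
    spark.readStream.format("cloudFiles")
    .option("cloudFiles.format", "json")
    .option("cloudFiles.schemaLocation", new_checkpoint)
    .load(new_source_path)
)

(
    df.writeStream
    .option("checkpointLocation", new_checkpoint)
    .trigger(availableNow=True)
    .toTable("my_catalog.my_schema.my_table")  # existing table and history are kept
)
```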

ryojikn
by New Contributor III
  • 4331 Views
  • 2 replies
  • 0 kudos

How to use spark-submit python task with the usage of --archives parameter passing a .tar.gz conda env?

We've been trying to launch a spark-submit Python task using the parameter "archives", similar to the one used in YARN. However, we've not been able to successfully make it work in Databricks. We know that for our on-prem installation we can use som...

Latest Reply
Anonymous
Not applicable
  • 0 kudos

@Ryoji Kuwae Neto: To use the --archives parameter with a conda environment in Databricks, you can follow these steps: 1) Create a conda environment for your project and export it as a .tar.gz file: conda create --name myenv, conda activate myenv, conda...

1 More Replies
Vish1
by New Contributor II
  • 9158 Views
  • 3 replies
  • 1 kudos

pyspark: Stage failure due to one-hot encoding

I am facing the below error while fitting my model. I am trying to run a model with cross-validation with a pipeline inside of it. Below is the code snippet for data transformation: qd = QuantileDiscretizer(relativeError=0.01, handleInvalid="error", n...
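
For context, a minimal sketch of the kind of discretize, one-hot-encode, and cross-validate pipeline described above; the column names, bucket count, and the estimator are assumptions, not taken from the post:

```python
# Hypothetical sketch of a QuantileDiscretizer + OneHotEncoder pipeline wrapped
# in cross-validation. Column names, parameters and estimator are placeholders.
from pyspark.ml import Pipeline
from pyspark.ml.classification import LogisticRegression
from pyspark.ml.evaluation import BinaryClassificationEvaluator
from pyspark.ml.feature import OneHotEncoder, QuantileDiscretizer, VectorAssembler
from pyspark.ml.tuning import CrossValidator, ParamGridBuilder

qd = QuantileDiscretizer(
    inputCol="amount", outputCol="amount_bucket",
    numBuckets=10, relativeError=0.01, handleInvalid="error",
)
ohe = OneHotEncoder(inputCols=["amount_bucket"], outputCols=["amount_ohe"])
assembler = VectorAssembler(inputCols=["amount_ohe"], outputCol="features")
lr = LogisticRegression(labelCol="label", featuresCol="features")

pipeline = Pipeline(stages=[qd, ohe, assembler, lr])
grid = ParamGridBuilder().addGrid(lr.regParam, [0.01, 0.1]).build()
cv = CrossValidator(
    estimator=pipeline,
    estimatorParamMaps=grid,
    evaluator=BinaryClassificationEvaluator(labelCol="label"),
    numFolds=3,
)

# cv_model = cv.fit(train_df)  # train_df is a placeholder training DataFrame
```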

Latest Reply
shyam_9
Databricks Employee
  • 1 kudos

Hi @Vishnu P, could you please share the full stack trace? Also, could you check how the workers' memory is being utilized?

2 More Replies
Cristianmarja
by New Contributor
  • 669 Views
  • 1 reply
  • 0 kudos

Hi everyone, please note that I am stuck on exercise 2.0 Train and Validate ML Model because when I run the code a NameError appears with the following label...

Hi everyone, please note that I am stuck on exercise 2.0 Train and Validate ML Model because when I run the code a NameError appears with the following label: name 'DoubleType' is not defined. I put the code below for your reference. I would like any help ab...

Latest Reply
Anonymous
Not applicable
  • 0 kudos

@Cristian Martinez: The error you are seeing occurs because the DoubleType class has not been imported. To fix this, add the following line to the top of your code to import DoubleType: from pyspark.sql.types import DoubleType. This should resolv...
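
That is, assuming the exercise uses DoubleType in a cast or schema, something like the following; the column name is illustrative, not from the exercise:

```python
# The missing import: DoubleType lives in pyspark.sql.types.
from pyspark.sql.types import DoubleType

# Illustrative use only; df and the column name are placeholders.
df = df.withColumn("prediction", df["prediction"].cast(DoubleType()))
```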

invalidargument
by New Contributor III
  • 924 Views
  • 1 reply
  • 0 kudos

Model storage requirements management

Hi. We have around 30 models in model storage that we use for batch scoring. These were created at different times, by different people, and on different cluster runtimes. Now we have run into problems where we can't de-serialize the models and use them for in...

Latest Reply
Anonymous
Not applicable
  • 0 kudos

@Jonas Lindberg: To address the issues you are facing with model serialization and versioning, I would recommend the following approach: use MLflow to manage the lifecycle of your models, including versioning, deployment, and monitoring. MLflow is an...
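
A minimal sketch of the kind of MLflow logging this implies, with dependencies pinned at log time so that later deserialization doesn't depend on whichever runtime the model happened to be trained on; the model, names, and versions are placeholders:

```python
# Hypothetical example: log and register a model with pinned dependencies so it
# can be reloaded consistently for batch scoring. Names/versions are placeholders.
import mlflow
import mlflow.sklearn
from sklearn.linear_model import LogisticRegression

model = LogisticRegression().fit([[0.0], [1.0]], [0, 1])  # toy stand-in model

with mlflow.start_run():
    mlflow.sklearn.log_model(
        sk_model=model,
        artifact_path="model",
        registered_model_name="batch_scoring_model",
        pip_requirements=["scikit-learn==1.3.2", "mlflow==2.9.2"],
    )

# Later, load a specific registered version for batch scoring
loaded = mlflow.pyfunc.load_model("models:/batch_scoring_model/1")
```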

Cristianmarja
by New Contributor
  • 934 Views
  • 1 reply
  • 0 kudos

2.0 Train and Validate ML Model - Exercise / Double Type is not defined

Hi everyone, please note that I am stuck on exercise 2.0 Train and Validate ML Model because when I run the code a NameError appears with the following label: name 'DoubleType' is not defined. I would like any help about this subject.

Latest Reply
Anonymous
Not applicable
  • 0 kudos

@Cristian Martinez: In Databricks, you need to import the necessary classes from the pyspark.sql.types module in order to use them in your code. To fix the NameError you're encountering with the label "name 'DoubleType' is not defined" in Exercise 2...

Orianh
by Valued Contributor II
  • 2449 Views
  • 1 reply
  • 2 kudos

MLflow log pytorch distributed training

Hey guys, I have a few questions that I hope you can help me with. I started to train a PyTorch model with distributed training using petastorm + Horovod, as Databricks suggests in the docs. Q1: I can see that each worker is training the model, but when the epochs are done...

Latest Reply
Anonymous
Not applicable
  • 2 kudos

@orian hindi: Regarding your questions: Q1: The error message you are seeing is likely related to a segmentation fault, which can occur due to various reasons such as memory access violations or stack overflows. It could be caused by several factors,...
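
On the MLflow side of the question, a common pattern is to log only from rank 0 so that the workers don't race on the same run. A minimal sketch, assuming a Horovod training function; the metric name and training loop are placeholders:

```python
# Hypothetical sketch: inside a Horovod training function, only the rank-0
# worker talks to MLflow so a single run collects the metrics.
import horovod.torch as hvd
import mlflow


def train_one_epoch(epoch):
    # ... distributed PyTorch training step; placeholder for the real loop ...
    return 0.123  # placeholder loss value


def training_fn(num_epochs=3):
    hvd.init()
    for epoch in range(num_epochs):
        loss = train_one_epoch(epoch)
        if hvd.rank() == 0:  # only the chief worker logs to MLflow
            mlflow.log_metric("train_loss", loss, step=epoch)
```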

Anonymous
by Not applicable
  • 1206 Views
  • 2 replies
  • 3 kudos

www.databricks.com

Hello Dolly: Democratizing the magic of ChatGPT with open models. Databricks has just released a groundbreaking new blog post introducing Dolly, an open-source language model with the potential to transform the way we interact with technology. From cha...

Latest Reply
Anonymous
Not applicable
  • 3 kudos

Let's get candid! Let me know your initial thoughts about LLMs, ChatGPT, and Dolly.

1 More Replies
Idan
by New Contributor II
  • 2416 Views
  • 2 replies
  • 1 kudos

Using code_path in mlflow.pyfunc models on Databricks

We are using Databricks over AWS infra, registering models on mlflow. We write our in-project imports as from src.(module location) import (objects). Following examples online, I expected that when I use mlflow.pyfunc.log_model(...code_path=['PROJECT_...
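
For reference, a minimal sketch of how code_path is typically passed so the project package is bundled with the pyfunc model; the paths, wrapper class, and src.scoring import are placeholders, not the poster's actual project layout:

```python
# Hypothetical sketch: bundling in-project code with a pyfunc model via code_path
# so "from src...." imports resolve at load/serving time. Paths are placeholders.
import mlflow
import mlflow.pyfunc


class MyWrapper(mlflow.pyfunc.PythonModel):
    def predict(self, context, model_input):
        # Placeholder: delegate to whatever lives under src/
        from src.scoring import score  # resolves because src/ is in code_path
        return score(model_input)


with mlflow.start_run():
    mlflow.pyfunc.log_model(
        artifact_path="model",
        python_model=MyWrapper(),
        code_path=["src"],  # copies the local src/ package into the model artifact
    )
```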

Latest Reply
Anonymous
Not applicable
  • 1 kudos

Hi @Idan Reshef Thank you for posting your question in our community! We are happy to assist you. To help us provide you with the most accurate information, could you please take a moment to review the responses and select the one that best answers y...

1 More Replies

Connect with Databricks Users in Your Area

Join a Regional User Group to connect with local Databricks users. Events will be happening in your city, and you won’t want to miss the chance to attend and share knowledge.

If there isn’t a group near you, start one and help create a community that brings people together.
