Machine Learning
Dive into the world of machine learning on the Databricks platform. Explore discussions on algorithms, model training, deployment, and more. Connect with ML enthusiasts and experts.
Data + AI Summit 2024 - Data Science & Machine Learning

Forum Posts

aranyics
by New Contributor
  • 865 Views
  • 1 replies
  • 1 kudos

Is it possible to start Databricks AutoML experiment remotely? (Azure Databricks)

Currently I am using Azure Machine Learning Studio for my work and would like to compare the performance of the Azure and Databricks AutoML algorithms. Is it possible to write a notebook in Azure that starts the AutoML algorithm in Databricks? My data is found...

Latest Reply
Anonymous
Not applicable
  • 1 kudos

Hi @Csaba Aranyi, great to meet you, and thanks for your question! Let's see if your peers in the community have an answer to your question. Thanks.

  • 1 kudos
StephanieAlba
by Databricks Employee
  • 911 Views
  • 1 replies
  • 2 kudos

How do I move the template files into my own repo when cloning the MLflow recipes templates into Databricks?

Here, https://mlflow.org/docs/latest/recipes.html#model-development-workflow, there are directions to add the repo. Is this best practice in Databricks? I tried exporting the repo code (inside of a Databricks notebook). My DBC export was successful. ...

Latest Reply
Anonymous
Not applicable
  • 2 kudos

Hi @Stephanie Rivera, great to meet you, and thanks for your question! Let's see if your peers in the community have an answer to your question. Thanks.

  • 2 kudos
Saurabh707344
by New Contributor III
  • 798 Views
  • 1 replies
  • 2 kudos

AWS Databricks - Distributed ML Models in SageMaker and Databricks

While using Databricks on AWS, what will be the impact if a few ML models are built using SageMaker Pipelines while other models are built on Databricks ML itself? Is there any other impact apart from infrastructure maintenance cost? Are there any preferred tools that can eas...

Latest Reply
Anonymous
Not applicable
  • 2 kudos

Hi @Saurabh Singh, great to meet you, and thanks for your question! Let's see if your peers in the community have an answer to your question. Thanks.

  • 2 kudos
rgbuckley
by New Contributor III
  • 10636 Views
  • 5 replies
  • 6 kudos

Resolved! Fix Hanging Task in Databricks

I am applying a pandas UDF to a grouped DataFrame in Databricks. When I do this, a couple of tasks hang forever, while the rest complete quickly. I start by repartitioning my dataset so that each group is in one partition: group_factors = ['a','b','c'] #m...

Attachments: Spark UI for compute cluster; stderr for hanging task; stdout for hanging task
Latest Reply
rgbuckley
New Contributor III
  • 6 kudos

Thank you Suteja. I had watched the resources and had never reached capacity for any. The data was evenly distributed across partitions and groups as well. I did end up taking your advice in (1). I set a timer and killed the process if the group took...

  • 6 kudos
4 More Replies
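
The latest reply above mentions setting a timer and killing the process when a group takes too long. The post's actual code is not shown, so the following is only a minimal sketch of that general idea in plain Python. The names `run_with_timeout` and `slow_group` are hypothetical, and the sketch assumes a Unix-like worker where the `fork` start method is available.

```python
import multiprocessing as mp
import queue
import time


def _child(func, args, out):
    # Runs in the child process: send back either the result or the error text.
    try:
        out.put(("ok", func(*args)))
    except Exception as exc:  # report failures instead of leaving the parent waiting
        out.put(("error", repr(exc)))


def run_with_timeout(func, args=(), timeout_s=60.0, default=None):
    """Run func(*args) in a child process and kill it if it exceeds timeout_s."""
    ctx = mp.get_context("fork")  # assumption: Unix-like OS (no fork on Windows)
    out = ctx.Queue()
    proc = ctx.Process(target=_child, args=(func, args, out))
    proc.start()
    try:
        status, value = out.get(timeout=timeout_s)
    except queue.Empty:
        # Nothing arrived in time: treat the group as hung.
        status, value = "timeout", default
    finally:
        if proc.is_alive():
            proc.terminate()  # kill the hung (or still exiting) child
        proc.join()
    if status == "error":
        raise RuntimeError(f"group processing failed: {value}")
    return value


def slow_group(seconds):
    """Hypothetical stand-in for a per-group computation that may hang."""
    time.sleep(seconds)
    return seconds
```

Calling `run_with_timeout(slow_group, (30,), timeout_s=0.5, default="skipped")` gives up after half a second and returns the default, so one stuck group does not block the rest. Whether spawning a child process inside a pandas UDF is acceptable depends on the cluster setup, which the thread excerpt does not describe.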
smedegaard
by New Contributor III
  • 4077 Views
  • 3 replies
  • 5 kudos

Resolved! Difference between MLflow recipes and projects?

MLflow Projects are described as follows: An MLflow Project is a format for packaging data science code in a reusable and reproducible way, based primarily on conventions. In addition, the Projects component includes an API and command-line tools for running p...

Latest Reply
smedegaard
New Contributor III
  • 5 kudos

Thanks for the answer @Priyadarshini G​ . Although a project has a pre-defined folder structure and standard files, it also "... includes an API and command-line tools for running projects, making it possible to chain together projects into workflows...

  • 5 kudos
2 More Replies
js54123875
by New Contributor III
  • 3755 Views
  • 4 replies
  • 3 kudos

Resolved! How to enforce schema with Auto Loader?

I have a number of CSV files that I am working to ingest using Auto Loader. There is an ID field that I want to require to be a STRING, but using SchemaHints is not working, and the field is instead being set as an INT. The first few CSV files have just integer va...

Latest Reply
Anonymous
Not applicable
  • 3 kudos

Hi @Jennette Shepard, we haven't heard from you since the last response from @Suteja Kanuri. Kindly share the information with us, and in return, we will provide you with the necessary solution. Thanks and regards

  • 3 kudos
3 More Replies
reachbharathan
by New Contributor III
  • 2855 Views
  • 4 replies
  • 5 kudos

Resolved! Authenticating GitLab with Databricks via username & password?

Currently we have Azure Databricks and GitLab in our project, and GitLab is our only code repository. Integrating with a personal access token is possible, but it was flagged as a potential risk of personal access token exposure, wanted...

Latest Reply
reachbharathan
New Contributor III
  • 5 kudos

Thank you, folks. Currently the only way to integrate with GitLab is with a Personal Access Token; there is no way to integrate GitLab via password. As per our security recommendation, we need to have an additional mechanism to integrate, as exposure of Pe...

  • 5 kudos
3 More Replies
Jaeseon
by New Contributor II
  • 2833 Views
  • 3 replies
  • 3 kudos

Resolved! Distributed training of an object detection model with PyTorch and PySpark.

I'm currently immersed in a project where I'm leveraging PyTorch to develop an object detection model using satellite imagery. My immediate objective is to perform distributed training on this model using PySpark. While I have found several tutorials...

Latest Reply
Anonymous
Not applicable
  • 3 kudos

Hi @Jaeseon Song, thank you for posting your question in our community! We are happy to assist you. To help us provide you with the most accurate information, could you please take a moment to review the responses and select the one that best answers ...

  • 3 kudos
2 More Replies
fsimoes
by New Contributor II
  • 2488 Views
  • 2 replies
  • 1 kudos

Resolved! Docker image with libraries + MLflow Experiments

Hi everybody, I have a scenario where we have multiple teams working with Python and R, and these teams use a lot of different libraries. Because of these dozens of libraries, cluster start took a long time. Then I created a Docker image, where I can ...

Latest Reply
Anonymous
Not applicable
  • 1 kudos

Hi @Fabio Simoes, thank you for posting your question in our community! We are happy to assist you. To help us provide you with the most accurate information, could you please take a moment to review the responses and select the one that best answers ...

  • 1 kudos
1 More Replies
rusty
by New Contributor II
  • 5107 Views
  • 2 replies
  • 2 kudos

Resolved! "Photon ran out of memory" when trying to get the unique ID from a SQL query

I am trying to get all unique IDs from a SQL query, and I always run out of memory: select concat_ws(';',view.MATNR,view.WERKS) from hive_metastore.dqaas.temp_view as view join hive_metastore.dqaas.t_dqaas_marc as marc on marc.MATNR = view.MATNR where view...

Latest Reply
Anonymous
Not applicable
  • 2 kudos

Hi @Anil Kumar Chauhan, we haven't heard from you since the last response from @Werner Stinckens. Kindly share the information with us, and in return, we will provide you with the necessary solution. Thanks and regards

  • 2 kudos
1 More Replies
Databricks3
by Contributor
  • 3671 Views
  • 4 replies
  • 1 kudos

Resolved! Issue in Converting Pyspark Dataframe to dictionary

I have 3 questions listed below. Q1. I need to install a third-party library on a Unity Catalog-enabled shared cluster, but I am not able to install it. It is not accepting the DBFS path dbfs:/FileStore/jars/ Q2. I have a requirement to load the data to salesforc...

Latest Reply
Anonymous
Not applicable
  • 1 kudos

Hi @SK ASIF ALI, we haven't heard from you since the last response from @werners (Customer). Kindly share the information with us, and in return, we will provide you with the necessary solution. Thanks and regards

  • 1 kudos
3 More Replies
DataBRObin
by New Contributor III
  • 1958 Views
  • 2 replies
  • 0 kudos

Running Keras model training with HorovodRunner works until the training function is exited ("The MPI_Query_thread() function was called after MPI_FINALIZE was invoked.")

I am running training of a Keras/TensorFlow deep learning model on a cluster of (for now) 2 workers and 1 driver (T4 GPU, 28GB, 4 core) using the Databricks-provided HorovodRunner. It all seems to go well and the performance scales quite nicely over ...

Latest Reply
sean_owen
Databricks Employee
  • 0 kudos

I personally suspect it's your callbacks. Can you remove all those state callbacks and see if that is it?

  • 0 kudos
1 More Replies
DipakBachhav
by New Contributor III
  • 13449 Views
  • 1 replies
  • 4 kudos

Resolved! [Databricks][DatabricksJDBCDriver](500593) Communication link failure. Failed to connect to server. Reason: HTTP Response code: 403

I am trying to connect to Databricks using Java code. Can someone help me, please? Here is the code I have so far: import java.sql.Connection; import java.sql.DriverManager; import java.sql.SQLException; import java.util.Properties; ...

Latest Reply
karthik_p
Esteemed Contributor
  • 4 kudos

@Dipak Bachhav, do you have any IP restrictions on access to Databricks? If so, you need to allow the particular IP in your security groups.

  • 4 kudos
Thanapat_S
by Contributor
  • 3016 Views
  • 2 replies
  • 5 kudos

Resolved! Is it possible to use both `Dynamic partition overwrites` and `overwriteSchema` options when writing a DataFrame to a Delta table?

In my ETL case, I want to be able to adjust the table schema as needed, meaning the number of columns may increase or decrease depending on the ETL script. Additionally, I would like to use dynamic partition overwrite to avoid potential errors when u...

Latest Reply
Vartika
Databricks Employee
  • 5 kudos

Hi @Thanapat Sontayasara, does @Werner Stinckens's response answer your question? If yes, would you be happy to mark it as best so that other members can find the solution more quickly? If not, would you be happy to give us more information? Thanks!

  • 5 kudos
1 More Replies
AleksandraFrolo
by New Contributor III
  • 6578 Views
  • 5 replies
  • 6 kudos

Resolved! Merge 12 CSV files in Databricks.

Hello everybody, I am absolutely new to Databricks, so I need your help. Details: Task: merge 12 CSV files in Databricks in the best way. Location of files: I will describe it in detail, because I cannot orient myself well yet. If I go to Data -> Browse ...

Latest Reply
Lakshay
Databricks Employee
  • 6 kudos

It seems that all your CSV files are present under one folder, and since you are able to union them, all these files must have the same schema as well. Given the above conditions, you can simply read all the data by referring to the folder name instead of ref...

  • 6 kudos
4 More Replies
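
The reply above suggests reading the whole folder rather than each file. The rest of the reply is truncated, so this is only a small sketch of that idea in PySpark. The helper name `read_csv_folder` and the example path are hypothetical, and the header and schema-inference options are assumptions about the files.

```python
def read_csv_folder(spark, folder_path):
    """Read every CSV file under folder_path into a single DataFrame.

    Assumes all files share the same columns, as the reply states.
    `spark` is the SparkSession that Databricks notebooks predefine.
    """
    return (
        spark.read
        .option("header", True)       # assumption: first row of each file holds column names
        .option("inferSchema", True)  # optional: let Spark infer column types
        .csv(folder_path)             # a directory path makes Spark read all files inside it
    )


# Hypothetical usage in a notebook (placeholder path, not taken from the post):
# df = read_csv_folder(spark, "dbfs:/path/to/csv_folder/")
```

Pointing the reader at the directory avoids 12 separate reads followed by unions. If the files' columns differ, the combined result would need checking.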
