Machine Learning
Dive into the world of machine learning on the Databricks platform. Explore discussions on algorithms, model training, deployment, and more. Connect with ML enthusiasts and experts.

Train machine learning models: How can I take my ML lifecycle from experimentation to production?

Anonymous
Not applicable

Note: the following guide is primarily for Python users. For other languages, please view the following links:

Table batch reads and writes

Create a table in SQL

Visualizing data with DBSQL

This step-by-step guide will get your data science projects underway by enabling you to:

• Use display() commands to quickly understand your data

• Process and save data efficiently

• Import any machine learning framework 

To start, use the persona switcher to open your Machine Learning homepage.

Part 1: Use display() commands to quickly understand your data

View your data in an interactive output and quickly create visualizations by calling the display() command on your DataFrame.

1. Create a notebook. Give it a name, set the default language to Python, and select a cluster

2. Write a command to load your data into a DataFrame, or load the following sample DataFrame:

raw_data = spark.read.format("delta").load("/databricks-datasets/nyctaxi-with-zipcodes/subsampled")

3. Use the Python display() command to view your DataFrame:

     display(raw_data)

4. Above the displayed results, to the right of the Table tab, click + and select "Visualization"

5. In the Visualization type drop-down, choose a chart type 

Recommendation: Use a scatter plot for this data

6. Select the data to appear in the visualization

Recommendation: X column = trip_distance; Y column = fare_amount

7. Click Save 

You are now ready to discover new insights from your data. 
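
If you'd rather explore the same relationship in code, here is a minimal sketch (it assumes the trip_distance and fare_amount columns from the sample dataset; the bucketing logic is illustrative, not part of the original guide):

# Average fare by whole-mile trip distance; display() can chart the aggregate too
from pyspark.sql import functions as F

fares_by_distance = (
    raw_data
    .withColumn("distance_bucket", F.round("trip_distance"))
    .groupBy("distance_bucket")
    .agg(F.avg("fare_amount").alias("avg_fare"))
    .orderBy("distance_bucket")
)
display(fares_by_distance)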

Part 2: Process and save data efficiently

Save the results of your analysis by persisting them to storage:

• SQL DDL commands: You can use standard SQL DDL commands supported in Apache Spark (for example, CREATE TABLE AS SELECT) to create Delta tables; a sketch follows below

• Table batch writes guide: 

# Create a table in the metastore using the DataFrame's schema and write data to it

df.write.format("delta").saveAsTable("default.people10m")
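
For the SQL DDL route mentioned above, a minimal sketch from Python using spark.sql() (the view and table names here are illustrative, not from the original guide):

# Expose the DataFrame to SQL, then create a Delta table with CREATE TABLE AS SELECT
raw_data.createOrReplaceTempView("raw_trips")
spark.sql("CREATE TABLE IF NOT EXISTS default.trips_delta AS SELECT * FROM raw_trips")

# Read the persisted table back into a DataFrame
trips = spark.read.table("default.trips_delta")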

Part 3: Import any machine learning framework 

1. Import the necessary libraries. These libraries are preinstalled on Databricks Runtime for Machine Learning (AWS|Azure|GCP) clusters and are tuned for compatibility and performance.

import mlflow
import numpy as np
import pandas as pd
import sklearn.datasets
import sklearn.metrics
import sklearn.model_selection
import sklearn.ensemble
from hyperopt import fmin, tpe, hp, SparkTrials, Trials, STATUS_OK
from hyperopt.pyll import scope
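
To see how these pieces fit together, here is a minimal training sketch (not part of the original guide: it uses scikit-learn's built-in diabetes dataset rather than the taxi data, and MLflow autologging; hyperopt would come in later for hyperparameter tuning):

# Enable MLflow autologging for scikit-learn, train a model, and log a test metric
mlflow.sklearn.autolog()

X, y = sklearn.datasets.load_diabetes(return_X_y=True)
X_train, X_test, y_train, y_test = sklearn.model_selection.train_test_split(
    X, y, test_size=0.2, random_state=42
)

with mlflow.start_run():
    model = sklearn.ensemble.RandomForestRegressor(n_estimators=100, random_state=42)
    model.fit(X_train, y_train)
    rmse = sklearn.metrics.mean_squared_error(y_test, model.predict(X_test)) ** 0.5
    mlflow.log_metric("test_rmse", rmse)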

Now that you're set up to train machine learning models, check out the links below for more.

Learn more:

• Databricks introduction to notebooks 

• Documentation on how to import, read and modify data

• Guide to creating visualizations 

• Data Science getting started guide

• Apache Spark Programming with Databricks course

• Ask a Databricks expert live in Office Hours 

• Feel free to contact us

Drop your questions, feedback and tips below!

1 REPLY

Priyag1
Honored Contributor II

I gained good knowledge from your post. It is very clear. Thank you. Keep sharing posts like this; they will be helpful.
