
Train machine learning models: How can I take my ML lifecycle from experimentation to production?


Note: the following guide is primarily for Python users. For other languages, please view the following links:

Table batch reads and writes

Create a table in SQL

Visualizing data with DBSQL

This step-by-step guide will get your data science projects underway by enabling you to:

• Use display() commands to quickly understand your data

• Process and save data efficiently

• Import any machine learning framework 

To start, use the persona switcher to open your Machine Learning homepage.


Part 1: Use display() commands to quickly understand your data

Use the display() command to view your DataFrame in an interactive output and quickly create visualizations.

1. Create a notebook. Give it a name, set the default language to Python, and select a cluster.


2. Write a command to load your data into a DataFrame, or load the following sample DataFrame:

# Load the sample NYC taxi dataset (stored as a Delta table) into a Spark DataFrame
raw_data = spark.read.format("delta").load("/databricks-datasets/nyctaxi-with-zipcodes/subsampled")


3. Use the Python display() command to view your DataFrame:

display(raw_data)


4. Above the displayed results, to the right of Table, click + and select Visualization


5. In the Visualization type drop-down, choose a chart type 

Recommendation: Use a scatter plot for this data


6. Select the data to appear in the visualization

Recommendation: X column = trip_distance; Y column = fare_amount


7. Click Save 


You are now ready to discover new insights from your data. 
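For example, a quick aggregation can surface patterns before you build a chart. Below is a minimal sketch, assuming the raw_data DataFrame from step 2; the column names trip_distance and fare_amount come from the sample dataset, and the alias names are illustrative:

from pyspark.sql import functions as F

# Average fare by trip distance, rounded to the nearest mile;
# display() renders the result as an interactive table you can chart directly
display(
    raw_data
    .groupBy(F.round("trip_distance").alias("distance_miles"))
    .agg(F.avg("fare_amount").alias("avg_fare"))
    .orderBy("distance_miles")
)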


Part 2: Process and save data efficiently

Save the results of your analysis by persisting them to storage:

• SQL DDL commands: You can use standard SQL DDL commands supported in Apache Spark (for example, CREATE TABLE AS SELECT) to create Delta tables; see the sketch after this list

• Table batch writes guide: 

# Create table in the metastore using DataFrame's schema and write data to it

df.write.format("delta").saveAsTable("default.people10m")
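As a sketch of the SQL DDL option above, you can run the same DDL from a Python notebook with spark.sql. The table name nyctaxi_sample is illustrative; the path is the sample dataset from Part 1:

# Create a Delta table from a query result (CREATE TABLE AS SELECT);
# the table name below is illustrative
spark.sql("""
    CREATE TABLE IF NOT EXISTS default.nyctaxi_sample
    AS SELECT * FROM delta.`/databricks-datasets/nyctaxi-with-zipcodes/subsampled`
""")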

Part 3: Import any machine learning framework 

1. Import the necessary libraries. These libraries are preinstalled on Databricks Runtime for Machine Learning (AWS|Azure|GCP) clusters and are tuned for compatibility and performance.

import mlflow
import numpy as np
import pandas as pd
import sklearn.datasets
import sklearn.metrics
import sklearn.model_selection
import sklearn.ensemble
from hyperopt import fmin, tpe, hp, SparkTrials, Trials, STATUS_OK
from hyperopt.pyll import scope
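2. Train and track a model. To show how these libraries fit together, here is a minimal sketch (not from the original post) that trains a scikit-learn model, tunes one hyperparameter with Hyperopt, and tracks runs with MLflow; the dataset, search space, and parameter values are illustrative:

# Minimal sketch: the dataset, search space, and values below are illustrative
mlflow.sklearn.autolog()

# Small example dataset bundled with scikit-learn
X, y = sklearn.datasets.load_diabetes(return_X_y=True)
X_train, X_test, y_train, y_test = sklearn.model_selection.train_test_split(
    X, y, test_size=0.2, random_state=42
)

def objective(params):
    # Each evaluation trains one model and reports its test error to Hyperopt
    model = sklearn.ensemble.RandomForestRegressor(
        n_estimators=int(params["n_estimators"]), random_state=42
    )
    model.fit(X_train, y_train)
    mse = sklearn.metrics.mean_squared_error(y_test, model.predict(X_test))
    return {"loss": mse, "status": STATUS_OK}

# SparkTrials distributes trials across the cluster; swap in Trials() to run locally
with mlflow.start_run():
    best = fmin(
        fn=objective,
        space={"n_estimators": scope.int(hp.quniform("n_estimators", 50, 300, 50))},
        algo=tpe.suggest,
        max_evals=8,
        trials=SparkTrials(parallelism=4),
    )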


Now that you've trained your machine learning model, check out the links below for more.

Learn more:

• Databricks introduction to notebooks 

• Documentation on how to import, read and modify data

• Guide to creating visualizations 

• Data Science getting started guide

• Apache Spark Programming with Databricks course

• Ask a Databricks expert live in Office Hours 

• Feel free to contact us

Drop your questions, feedback and tips below!

1 REPLY

Priyag1
Honored Contributor II

I gained good knowledge from your post. It is very clear. Thank you. Keep sharing posts like this; they will be helpful.
