Machine Learning
Dive into the world of machine learning on the Databricks platform. Explore discussions on algorithms, model training, deployment, and more. Connect with ML enthusiasts and experts.

2.0 Train and Validate ML Model - Exercise / Double Type is not defined

Cristianmarja
New Contributor

Hi everyone,

I am stuck on exercise 2.0 Train and Validate ML Model: when I run the code, a NameError appears with the following message: name 'DoubleType' is not defined.

I would appreciate any help with this.

1 REPLY

Anonymous

@Cristian Martinez:

In Databricks, you need to import classes from the pyspark.sql.types module before you can use them in your code. To fix the error "NameError: name 'DoubleType' is not defined" in Exercise 2.0, add the following line at the beginning of your notebook:

from pyspark.sql.types import DoubleType

This imports the DoubleType class and makes it available in your notebook. You can then use it like this:

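# StructType and StructField are used in the schema below, so import them too;
# otherwise the same NameError is raised for those names
from pyspark.sql.types import StructType, StructField, DoubleType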
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.regression import LinearRegression
from pyspark.ml.evaluation import RegressionEvaluator
 
# Define the schema for the input data
schema = StructType([
  StructField("x1", DoubleType(), True),
  StructField("x2", DoubleType(), True),
  StructField("x3", DoubleType(), True),
  StructField("y", DoubleType(), True)
])
 
# Load the input data from a CSV file
data = spark.read.csv("dbfs:/path/to/your/data.csv", header=True, schema=schema)
 
# Create a VectorAssembler to combine the input columns into a single feature column
assembler = VectorAssembler(inputCols=["x1", "x2", "x3"], outputCol="features")
 
# Transform the input data using the VectorAssembler
data = assembler.transform(data)
 
# Split the input data into training and testing sets
train, test = data.randomSplit([0.7, 0.3])
 
# Train a linear regression model on the training data
lr = LinearRegression(featuresCol="features", labelCol="y")
model = lr.fit(train)
 
# Evaluate the model on the testing data
evaluator = RegressionEvaluator(labelCol="y", predictionCol="prediction", metricName="rmse")
rmse = evaluator.evaluate(model.transform(test))
 
print("RMSE on testing data: %g" % rmse)

Note that you should replace "dbfs:/path/to/your/data.csv" with the actual path to your input data file.
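
As an alternative that sidesteps this kind of NameError, spark.read.csv also accepts the schema as a DDL-formatted string, so nothing from pyspark.sql.types has to be imported. The sketch below is only an illustration and assumes that all four columns are doubles, as in the example above. The helper build_ddl_schema is a hypothetical name introduced here, not part of PySpark.

```python
# Hypothetical helper (not a PySpark function): builds a DDL schema string
# such as "x1 DOUBLE, x2 DOUBLE" from a list of column names.
def build_ddl_schema(columns, type_name="DOUBLE"):
    return ", ".join(f"{name} {type_name}" for name in columns)


# Same four columns as in the StructType example above
ddl_schema = build_ddl_schema(["x1", "x2", "x3", "y"])
print(ddl_schema)

# In a Databricks notebook, where `spark` is predefined, the string can be
# passed in place of the StructType (the path is a placeholder, as above):
# data = spark.read.csv("dbfs:/path/to/your/data.csv", header=True, schema=ddl_schema)
```

The rest of the pipeline (VectorAssembler, LinearRegression, RegressionEvaluator) stays the same. Only the schema definition changes.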
