
Having trouble with ARC (Automated Record Connector) Python Notebook

Isolated
New Contributor

I'm trying to use Databricks ARC (Automated Record Connector) and running into an object issue. I assume I'm missing something rather trivial that's not related to ARC.

 

#Databricks Python notebook

#CMD1 import AutoLinker
from arc.autolinker import AutoLinker
import arc

arc.enable_arc()

#CMD2 create dataframe from table data
data_1 = spark.read.table("temp.list_all")

#CMD3 run autolinker
autolinker = AutoLinker()

attribute_columns = ["first_name", "last_name", "dob", "address_line_1", "zip_code"]
#runs fine up to this point
autolinker.auto_link(
  data=data_1, 
  attribute_columns=attribute_columns,
  unique_id="pid",  
  comparison_size_limit=100000,
  max_evals=100
)

 

I then receive the following error when running autolinker.auto_link(), and I'm not sure how to troubleshoot it.

 

AttributeError: 'DataFrame' object has no attribute 'sparkSession'

 

My cluster runtime version is 10.4 LTS (includes Apache Spark 3.2.1, Scala 2.12). I have not set any Spark configurations on the cluster. I'm not sure whether this needs to change and, if so, which properties to set. Currently researching.
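
One thing worth checking, given the runtime above: the DataFrame.sparkSession property was added to PySpark in 3.3.0, so a library that calls it on a Spark 3.2.1 cluster would raise exactly this AttributeError. The sketch below is only a diagnostic, not part of ARC. The helper names (spark_major_minor, has_dataframe_spark_session, session_of) are hypothetical, and the assumption that ARC reads data.sparkSession internally is inferred from the error message rather than confirmed from its source.

```python
import re

# DataFrame.sparkSession was introduced in PySpark 3.3.0.
MIN_SPARK_FOR_DF_SESSION = (3, 3)


def spark_major_minor(version):
    """Parse a version string such as '3.2.1' into the tuple (3, 2)."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized Spark version: {version!r}")
    return int(match.group(1)), int(match.group(2))


def has_dataframe_spark_session(version):
    """True if this Spark version exposes DataFrame.sparkSession."""
    return spark_major_minor(version) >= MIN_SPARK_FOR_DF_SESSION


def session_of(df):
    """Hypothetical shim: get the SparkSession behind a DataFrame.

    Uses df.sparkSession where it exists (Spark >= 3.3) and falls back
    to df.sql_ctx.sparkSession on older versions.
    """
    session = getattr(df, "sparkSession", None)
    if session is not None:
        return session
    return df.sql_ctx.sparkSession
```

In the notebook, has_dataframe_spark_session(spark.version) should return False on 10.4 LTS, which would point to a runtime mismatch rather than a missing Spark configuration. The session_of shim only helps in your own code; it cannot change calls made inside the library, so moving the cluster to a runtime built on Spark 3.3 or later (and checking the runtime listed in ARC's own requirements) is probably the more practical fix.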

 

2 REPLIES

-werners-
Esteemed Contributor III

Debayan
Databricks Employee