
Having trouble with ARC (Automated Record Connector) Python Notebook

Isolated
New Contributor

I'm trying to use Databricks ARC (Automated Record Connector) and am running into an AttributeError on a DataFrame object. I assume I'm missing something fairly trivial that isn't specific to ARC.

# Databricks Python notebook

# CMD 1: import ARC and its AutoLinker, then enable ARC
import arc
from arc.autolinker import AutoLinker

arc.enable_arc()

# CMD 2: create a DataFrame from the table data
data_1 = spark.read.table("temp.list_all")

# CMD 3: run the AutoLinker
autolinker = AutoLinker()

attribute_columns = ["first_name", "last_name", "dob", "address_line_1", "zip_code"]
# Everything runs fine up to this point
autolinker.auto_link(
    data=data_1,
    attribute_columns=attribute_columns,
    unique_id="pid",
    comparison_size_limit=100000,
    max_evals=100,
)

When I run autolinker.auto_link(), I get the following error, and I'm not sure how to troubleshoot it:

AttributeError: 'DataFrame' object has no attribute 'sparkSession'

My cluster runtime version is 10.4 LTS (includes Apache Spark 3.2.1, Scala 2.12), and I have not set any Spark configurations on the cluster. I'm not sure whether any need to be changed and, if so, which properties to set. Still researching.
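One thing worth checking: in PySpark, the `DataFrame.sparkSession` property was only added in version 3.3.0, while Runtime 10.4 LTS ships Spark 3.2.1. My assumption (not confirmed from the ARC source) is that `auto_link()` reads `data.sparkSession` internally, which would produce exactly this AttributeError on Spark 3.2. If that holds, no cluster Spark configuration would fix it, and a runtime built on Spark 3.3 or later would be needed. Below is a minimal sketch to confirm the version in the notebook; `has_dataframe_sparksession` and `spark_version_tuple` are hypothetical helpers, not part of ARC or PySpark.

```python
import re

# PySpark added the DataFrame.sparkSession property in 3.3.0.
MIN_SPARK_FOR_DF_SPARKSESSION = (3, 3, 0)


def spark_version_tuple(version: str) -> tuple:
    """Parse the leading major.minor[.patch] numbers of a Spark version string."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", version)
    if match is None:
        raise ValueError(f"Unrecognized Spark version: {version!r}")
    # A missing patch component is treated as 0, e.g. "3.4" -> (3, 4, 0)
    major, minor, patch = match.groups(default="0")
    return (int(major), int(minor), int(patch))


def has_dataframe_sparksession(version: str) -> bool:
    """True if DataFrame.sparkSession should exist for this Spark version."""
    return spark_version_tuple(version) >= MIN_SPARK_FOR_DF_SPARKSESSION


# In the notebook (hypothetical usage, requires the `spark` session):
# print(spark.version, has_dataframe_sparksession(spark.version))
# print(hasattr(data_1, "sparkSession"))
```

On 10.4 LTS both checks should come back False, which would point to the runtime version rather than to the table, the column list, or the auto_link() arguments.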

2 Replies

-werners-
Esteemed Contributor III

Debayan
Databricks Employee
