AttributeError: 'SparkSession' object has no attribute '_wrapped' when attempting CoNLL.readDataset()

New Contributor III

I'm getting the error...

AttributeError: 'SparkSession' object has no attribute '_wrapped'


AttributeError                            Traceback (most recent call last)
<command-2311820097584616> in <cell line: 2>()
      1 from import CoNLL
----> 2 trainingData = CoNLL().readDataset(spark, 'dbfs:/FileStore/Users/')
      3 trainingData.selectExpr(
      4     "text",
      5     "token.result as tokens",

/local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.9/site-packages/sparknlp/training/ in readDataset(self, spark, path, read_as, partitions, storage_level)
    141         jdf = self._java_obj.readDataset(jSession, path, read_as, partitions,
    142                                          spark.sparkContext._getJavaStorageLevel(storage_level))
--> 143         return DataFrame(jdf, spark._wrapped)


When executing the following code...

from import CoNLL

trainingData = CoNLL().readDataset(spark, 'dbfs:/FileStore/eng.train')
trainingData.selectExpr(
    "text",
    "token.result as tokens",
    "pos.result as pos",
    "label.result as label"
).show(3, False)

Can anyone help?


Esteemed Contributor III

This can happen on Databricks Runtime 10.x. Try 7.3 LTS instead and share what you observe.
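For context, the traceback in the question shows readDataset passing spark._wrapped to the DataFrame constructor. As far as I know, newer PySpark releases (3.3 and later) no longer define that private attribute on SparkSession, which would explain why the runtime version matters. Below is a minimal sketch of a version-tolerant lookup. It uses stand-in classes rather than a real Spark session, and the helper name sql_context_for and both Fake... classes are hypothetical, for illustration only.

```python
def sql_context_for(session):
    """Return session._wrapped when it exists, otherwise the session itself.

    Assumption: on PySpark versions without `_wrapped`, DataFrame accepts
    the SparkSession directly as its second constructor argument.
    """
    return getattr(session, "_wrapped", session)


class FakeLegacySession:
    """Stand-in for an older SparkSession that still exposes `_wrapped`."""

    def __init__(self):
        self._wrapped = "legacy-sql-context"


class FakeModernSession:
    """Stand-in for a newer SparkSession that has no `_wrapped` attribute."""


legacy = FakeLegacySession()
modern = FakeModernSession()

print(sql_context_for(legacy))            # the wrapped context object
print(sql_context_for(modern) is modern)  # falls back to the session itself
```

On a real cluster, running hasattr(spark, "_wrapped") in a notebook cell should tell you which case applies before you switch runtimes.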

If it still does not work there, try creating an init script that installs the library and attaching it to your Databricks cluster, so the library is available every time the cluster starts. Sometimes network issues prevent a library from loading on the cluster.
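A rough sketch of the init-script idea: build a small shell script that pip-installs the library each time the cluster starts. The package name spark-nlp, the pip path /databricks/python/bin/pip, and the DBFS destination in the final comment are assumptions on my part, so adjust them to your workspace. The helper build_init_script is hypothetical. This only covers the Python package, not any JVM dependency the library may need.

```python
def build_init_script(package: str,
                      pip_path: str = "/databricks/python/bin/pip") -> str:
    """Return the text of a bash init script that installs one Python package.

    Both the default pip path and the package passed in are assumptions;
    check them against your own cluster before relying on this.
    """
    lines = [
        "#!/bin/bash",
        "set -e  # exit on the first failing command",
        f"{pip_path} install {package}",
    ]
    return "\n".join(lines) + "\n"


script_text = build_init_script("spark-nlp")
print(script_text)

# In a Databricks notebook you could then store the script
# (the destination path below is hypothetical):
# dbutils.fs.put("dbfs:/databricks/init-scripts/install-sparknlp.sh", script_text, True)
```

After saving the script, register it under the cluster's init script settings and restart the cluster so it runs at start-up.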


Aviral Bhardwaj

