Machine Learning
Dive into the world of machine learning on the Databricks platform. Explore discussions on algorithms, model training, deployment, and more. Connect with ML enthusiasts and experts.

Issue converting a PySpark DataFrame to a dictionary

Databricks3
Contributor

I have 3 questions listed below.

Q1. I need to install a third-party library on a Unity Catalog enabled shared cluster, but the installation fails: the cluster does not accept the DBFS path dbfs:/FileStore/jars/

Q2. I have a requirement to load data from S3 files into Salesforce. I am using the simple-salesforce library to read from and write to Salesforce from Databricks. Per its documentation, the write function expects dictionary data. When I try to convert the PySpark DataFrame, I get the error below.

from pyspark.sql.types import StructType, StructField, StringType

# Phone values are strings throughout, to match the StringType field below.
data2 = [
    ("Test_Conv1", "testmailconv1@yopmail.com", "Olivia", "A", "3000000000"),
    ("Test_Conv2", "testmailconv2@yopmail.com", "Jack", "B", "4000000000"),
    ("Test_Conv3", "testmailconv3@yopmail.com", "Williams", "C", "5000000000"),
    ("Test_Conv4", "testmailconv4@yopmail.com", "Jones", "D", "6000000000"),
    ("Test_Conv5", "testmailconv5@yopmail.com", "Brown", None, "9000000000"),
]
schema = StructType([
    StructField("LastName", StringType(), True),
    StructField("Email", StringType(), True),
    StructField("FirstName", StringType(), True),
    StructField("MiddleName", StringType(), True),
    StructField("Phone", StringType(), True),
])
df = spark.createDataFrame(data=data2, schema=schema)
df_contact = df.rdd.map(lambda row: row.asDict()).collect()  # raises the error below
# sf is an authenticated simple_salesforce.Salesforce client
sf.bulk.Contact.insert(df_contact, batch_size=20000, use_serial=True)

Error message :

py4j.security.Py4JSecurityException: Method public org.apache.spark.rdd.RDD org.apache.spark.api.java.JavaRDD.rdd() is not whitelisted on class class org.apache.spark.api.java.JavaRDD

Could you please help me convert the DataFrame to a dictionary?
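The df.rdd call is what trips the Py4JSecurityException on a Unity Catalog shared cluster, since the RDD API is not whitelisted there. A minimal sketch of building the same list of dicts without touching df.rdd (assuming df has been created as above; nothing here is Databricks-specific except the commented lines):

```python
def rows_to_dicts(rows):
    """Convert collected PySpark Row objects into plain dicts."""
    return [row.asDict() for row in rows]

# On the cluster (not runnable here without a SparkSession), either of these
# avoids the blocked df.rdd accessor:
#   df_contact = rows_to_dicts(df.collect())
#   df_contact = df.toPandas().to_dict("records")
```

Both variants pull all rows to the driver, so they only make sense for data that fits in driver memory.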

Q3. Even if there is a way to convert the DataFrame to a dictionary, it could hurt performance for a large dataset. Is there a more optimized way to load the data into Salesforce?
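Regarding Q3, one common mitigation (a sketch, not Databricks-specific) is to send the records to Salesforce in fixed-size batches rather than one giant insert call. The chunking helper below is plain Python; the sf.bulk usage is an assumption based on the simple-salesforce call shown in the question:

```python
def chunked(records, batch_size):
    """Yield successive batch_size-sized slices from a list of records."""
    for i in range(0, len(records), batch_size):
        yield records[i:i + batch_size]

# Hypothetical usage, with sf an authenticated simple_salesforce client:
#   for batch in chunked(df_contact, 10000):
#       sf.bulk.Contact.insert(batch, batch_size=10000, use_serial=True)
```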

1 ACCEPTED SOLUTION

Accepted Solutions

-werners-
Esteemed Contributor III

1. https://docs.databricks.com/dbfs/unity-catalog.html

To interact with files directly using DBFS, you must have the ANY FILE permission granted.
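For reference, the ANY FILE privilege is granted with a SQL statement; the principal below is a placeholder, and the spark.sql call is left commented because it only works inside a Databricks notebook with sufficient privileges:

```python
# Placeholder principal; substitute a real user or group name.
grant_sql = "GRANT SELECT ON ANY FILE TO `user@example.com`"
# spark.sql(grant_sql)  # run in a Databricks notebook as a workspace admin
```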

2. Can you try one of these methods?

3. Depending on the size of the data this will have an impact, but I think the bottleneck will be on the Salesforce side.


4 REPLIES


This is not a permission issue. I have uploaded the third-party libraries to Databricks, but the cluster is not accepting the jar paths.

-werners-
Esteemed Contributor III

Your third-party libs are in DBFS, so it might still be that permission issue.

Anonymous
Not applicable

Hi @SK ASIF ALI

We haven't heard from you since the last response from @werners. Kindly share the requested information with us so that we can help you find a solution.

Thanks and Regards
