I have three questions, listed below.
Q1. I need to install a third-party library on a Unity Catalog-enabled shared cluster, but the installation fails: the cluster does not accept the DBFS path dbfs:/FileStore/jars/.
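For reference, this is the kind of install I expect to need instead: a notebook-scoped install from a Unity Catalog volume rather than DBFS. The volume path and wheel filename below are placeholders, not my actual paths:

```python
# Placeholder volume path and wheel name -- substitute your own.
%pip install /Volumes/my_catalog/my_schema/my_volume/my_library-1.0-py3-none-any.whl
```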
Q2. I have a requirement to load data from S3 files into Salesforce. I am using the simple-salesforce library to read from and write to Salesforce from Databricks. Per its documentation, the write function expects the data as a list of dictionaries, so I tried to convert my PySpark DataFrame, but I get the error below.
from pyspark.sql.types import StructType, StructField, StringType
data2 = [("Test_Conv1","testmailconv1@yopmail.com","Olivia","A","3000000000"),
("Test_Conv2","testmailconv2@yopmail.com","Jack","B","4000000000"),
("Test_Conv3","testmailconv3@yopmail.com","Williams","C","5000000000"),
("Test_Conv4","testmailconv4@yopmail.com","Jones","D","6000000000"),
("Test_Conv5","testmailconv5@yopmail.com","Brown",None,"9000000000")
]
schema = StructType([
    StructField("LastName", StringType(), True),
    StructField("Email", StringType(), True),
    StructField("FirstName", StringType(), True),
    StructField("MiddleName", StringType(), True),
    StructField("Phone", StringType(), True)
])
df = spark.createDataFrame(data=data2,schema=schema)
df_contact = df.rdd.map(lambda row: row.asDict()).collect()
sf.bulk.Contact.insert(df_contact,batch_size=20000,use_serial=True)
Error message:
py4j.security.Py4JSecurityException: Method public org.apache.spark.rdd.RDD org.apache.spark.api.java.JavaRDD.rdd() is not whitelisted on class class org.apache.spark.api.java.JavaRDD
Could you please help me convert the DataFrame to a list of dictionaries?
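For context, one workaround I am evaluating replaces the blocked `.rdd` access with `df.toPandas().to_dict("records")`. Below is a minimal sketch of the pandas half only; I am assuming the Spark-to-pandas step itself is permitted on the shared cluster:

```python
import pandas as pd

# Stand-in for df.toPandas(): the pandas DataFrame Spark would return
# for the sample data above (one row shown for brevity).
pdf = pd.DataFrame(
    [("Test_Conv1", "testmailconv1@yopmail.com", "Olivia", "A", "3000000000")],
    columns=["LastName", "Email", "FirstName", "MiddleName", "Phone"],
)

# "records" orientation produces the list-of-dicts shape that
# simple-salesforce's bulk insert expects.
df_contact = pdf.to_dict("records")
```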
Q3. Even if there is a way to convert the DataFrame to a list of dictionaries, doing so could hurt performance for large data sets. Is there a more optimized way to load the data into Salesforce?
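For scale, the pattern I am considering is streaming rows to the Bulk API in fixed-size chunks instead of collecting the whole DataFrame at once. The helper below is pure Python; the commented usage assumes `sf` is an authenticated simple_salesforce client as in Q2, and that iterating the DataFrame locally is allowed on the cluster:

```python
from typing import Dict, Iterable, Iterator, List

def chunked(records: Iterable[Dict], size: int) -> Iterator[List[Dict]]:
    """Yield successive batches of at most `size` records."""
    batch: List[Dict] = []
    for rec in records:
        batch.append(rec)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:  # final partial batch
        yield batch

# Usage sketch (not run here):
# rows = (row.asDict() for row in df.toLocalIterator())
# for batch in chunked(rows, 10000):
#     sf.bulk.Contact.insert(batch, batch_size=10000)
```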