Machine Learning

Issue in Converting PySpark DataFrame to Dictionary

Databricks3
Contributor

I have 3 questions listed below.

Q1. I need to install a third-party library on a Unity Catalog-enabled shared cluster, but I am not able to: the cluster is not accepting the DBFS path dbfs:/FileStore/jars/.

Q2. I have a requirement to load data from S3 files into Salesforce. I am using the simple-salesforce library to read from and write to Salesforce from Databricks. As per its documentation, we need to pass the data as dictionaries to the write function. When I try to convert the PySpark DataFrame, I get the error below.

from pyspark.sql.types import StructType, StructField, StringType

# Phone is declared as StringType below, so the phone numbers are passed as strings.
data2 = [
    ("Test_Conv1", "testmailconv1@yopmail.com", "Olivia", "A", "3000000000"),
    ("Test_Conv2", "testmailconv2@yopmail.com", "Jack", "B", "4000000000"),
    ("Test_Conv3", "testmailconv3@yopmail.com", "Williams", "C", "5000000000"),
    ("Test_Conv4", "testmailconv4@yopmail.com", "Jones", "D", "6000000000"),
    ("Test_Conv5", "testmailconv5@yopmail.com", "Brown", None, "9000000000"),
]
schema = StructType([
    StructField("LastName", StringType(), True),
    StructField("Email", StringType(), True),
    StructField("FirstName", StringType(), True),
    StructField("MiddleName", StringType(), True),
    StructField("Phone", StringType(), True),
])
df = spark.createDataFrame(data=data2, schema=schema)

# This line raises the security exception below on the UC-enabled shared cluster:
df_contact = df.rdd.map(lambda row: row.asDict()).collect()

# sf is a simple_salesforce Salesforce client created earlier
sf.bulk.Contact.insert(df_contact, batch_size=20000, use_serial=True)

Error message:

py4j.security.Py4JSecurityException: Method public org.apache.spark.rdd.RDD org.apache.spark.api.java.JavaRDD.rdd() is not whitelisted on class class org.apache.spark.api.java.JavaRDD

Could you please help me convert the DataFrame to a dictionary?

Q3. Even if there is a way to convert the DataFrame to a dictionary, it could hurt performance for large datasets. Is there a more optimized way to load the data into Salesforce?


4 REPLIES

-werners-
Esteemed Contributor III
(Accepted solution)

1. https://docs.databricks.com/dbfs/unity-catalog.html

To interact with files directly using DBFS, you must have ANY FILE permissions granted.
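For reference, a minimal sketch of how a workspace admin might grant that permission; the principal email is a placeholder:

# Hypothetical principal; run by a workspace admin.
spark.sql("GRANT SELECT ON ANY FILE TO `someone@example.com`")
spark.sql("GRANT MODIFY ON ANY FILE TO `someone@example.com`")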

2. Can you try one of these methods for converting the DataFrame to a list of dictionaries? (A sketch of two options follows below.)
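For example, a minimal sketch of two driver-side conversions; both assume the DataFrame fits in driver memory, and neither goes through df.rdd, the call that the security exception blocks:

# Option 1: collect Rows on the driver and convert each one to a dict.
df_contact = [row.asDict() for row in df.collect()]

# Option 2: go through pandas (requires pandas on the cluster).
df_contact = df.toPandas().to_dict(orient="records")

sf.bulk.Contact.insert(df_contact, batch_size=20000, use_serial=True)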

3. Depending on the size of the data, this will have an impact, but I think the bottleneck will be on the Salesforce side.
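If collecting everything to the driver becomes the bottleneck, a rough sketch of a more distributed pattern is below. It is hypothetical: it assumes simple-salesforce credentials are available to the workers (the values shown are placeholders), and it may require a cluster access mode that allows executor-side Python functions, since shared clusters restrict parts of this API:

from simple_salesforce import Salesforce

def push_partition(rows):
    # One Salesforce connection per partition; credentials are placeholders.
    sf = Salesforce(username="someone@example.com",
                    password="<password>",
                    security_token="<token>")
    batch = [row.asDict() for row in rows]
    if batch:
        sf.bulk.Contact.insert(batch, batch_size=10000, use_serial=True)

df.foreachPartition(push_partition)

This keeps the upload distributed instead of materializing the full dataset on the driver, at the cost of one Salesforce connection per partition.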

Databricks3
Contributor

This is not a permission issue. I have uploaded the third-party libraries to Databricks, but the cluster is not accepting the JAR paths.

-werners-
Esteemed Contributor III

Your third-party libs are in DBFS, so it might still be that issue.

Anonymous
Not applicable

Hi @SK ASIF ALI,

We haven't heard from you since the last response from @werners. Kindly share the requested information with us, and we will provide you with the necessary solution.

Thanks and Regards
