1. Access the S3 bucket directly using AWS credentials by passing them to the s3a filesystem connector through the Spark configuration:

from pyspark.sql import SparkSession

# Configure the s3a connector with your AWS access key and secret key
spark = SparkSession.builder \
    .appName("S3Access") \
    .config("spark.hadoop.fs.s3a.access.key", "<your-access-key-id>") \
    .config("spark.hadoop.fs.s3a.secret.key", "<your-secret-access-key>") \
    .getOrCreate()
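With the keys configured, data in the bucket can be read through an s3a:// path. A minimal sketch follows; the bucket name, path, and file format are placeholders for illustration:

# Read a Parquet dataset from the bucket (bucket name and path are placeholders)
df = spark.read.parquet("s3a://<your-bucket-name>/path/to/data/")
df.show(5)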
2. Databricks encourages the use of the SparkSession object, which provides a unified entry point for working with Spark.
from pyspark.sql import SparkSession
# Create a SparkSession
spark = SparkSession.builder.appName("ServerlessComputeExample").getOrCreate()
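The same session object exposes the DataFrame and SQL APIs, so no separate contexts are needed. A small sketch with made-up sample data:

# Create a DataFrame and query it through the same session
df = spark.createDataFrame([(1, "alpha"), (2, "beta")], ["id", "label"])
df.createOrReplaceTempView("examples")
spark.sql("SELECT id, label FROM examples WHERE id > 1").show()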
3. Serverless compute automatically scales resources to match workload demand. This can be beneficial when working with Delta tables, because it helps ensure that sufficient resources are available for large-scale data processing tasks without manual cluster sizing.
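For example, writing and reading a Delta table works the same way on serverless compute; the three-level table name below is a placeholder for your own catalog, schema, and table:

# Write a DataFrame to a Delta table and read it back (table name is a placeholder)
df = spark.createDataFrame([(1, "2024-01-01"), (2, "2024-01-02")], ["id", "event_date"])
df.write.format("delta").mode("overwrite").saveAsTable("my_catalog.my_schema.events")

events = spark.read.table("my_catalog.my_schema.events")
events.show()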