mits1
New Contributor III

Hi @lingareddy_Alva ,

Thank you for your response.

Just to inform you that

1. I am using Databrick's free edition to execute code using Serverless which doesnt allow me to get the partition numbers.

2. I intentionaly did not want to use/specify schema to know the schema inference behaviour.

3. As mention in your reply,

Option 2 — Use cloudFiles.inferColumnTypes

I have configured this property too but no good.

4.I did try option 1 but looks like it still creates 33 partitions.

My code : 

from pyspark.sql.types import StructType, StructField, StringType, IntegerType
schema = StructType([
    StructField("Name", StringType()),
    StructField("Gender", StringType()),
    StructField("Age", IntegerType())
])
 
df = spark.readStream.\
    format("cloudFiles")\
    .option("cloudFiles.format", "json")\
    .option("cloudFiles.schemaLocation", "/Volumes/workspace/default/sys/schema5")\
    .schema(schema)\
    .load('/Volumes/workspace/dev/input/')\
    .writeStream\
    .format("delta")\
    .option("checkpointLocation", "/Volumes/workspace/default/sys/checkpoint5")\
    .option("mergeSchema", "true")\
    .trigger(availableNow=True)\
    .toTable("workspace.default.json_null")
 
Table Output : Attached
 
I don't find google answers helpful too.