@Jonathan_ Good that you have given some extra information. Based on that, I think there might be a memory issue: since it is a single-node cluster, both the driver and the executor reside on the same machine and share its memory. It would be better if you could tell me the R...
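A quick way to see how that shared memory is currently split is to inspect the Spark configuration. A minimal sketch, assuming a plain PySpark session; the two property names are standard Spark settings, nothing specific to your cluster:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# On a single-node cluster both settings carve memory out of the same
# machine, so an oversized executor can starve the driver (and vice versa).
for key in ("spark.driver.memory", "spark.executor.memory"):
    # conf.get returns the fallback when the key was never set explicitly
    print(key, "=", spark.conf.get(key, "not set (using default)"))
```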
@bunny1174 It is a common issue that small files get created during streaming. Since you are using the Delta file format, I would suggest two solutions:
1. Try using liquid clustering. This auto-compacts small files into a bigger chunk, mostly of 1...
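A minimal sketch of what enabling liquid clustering and compacting a Delta table can look like (requires a recent Delta/Databricks runtime); the table name `events` and clustering column `event_date` are placeholders, not from the original thread:

```python
# Hypothetical table and column names; adjust to your schema.
# CLUSTER BY switches the Delta table to liquid clustering.
spark.sql("ALTER TABLE events CLUSTER BY (event_date)")

# OPTIMIZE rewrites and compacts the small files; with liquid clustering
# enabled it also incrementally clusters the data by the chosen column.
spark.sql("OPTIMIZE events")
```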
@szymon_dybczak your solution was crisp. @SuMiT1 since you have mentioned your JSON is dynamic, get one of your JSON bodies into a variable:
json_body = df.select("content").take(1)[0][0]
Then get the schema of the JSON:
schema = schema_of_json(json_...
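Putting the pieces together, a minimal sketch of the whole pattern, assuming a DataFrame `df` with the raw JSON strings in a `content` column (the column name comes from the snippet above; everything else is illustrative):

```python
from pyspark.sql.functions import schema_of_json, from_json, col

# Grab one sample JSON body as a plain Python string.
# take(1) already returns a list of Rows, so index into it; there is
# no need to call collect() on the result.
json_body = df.select("content").take(1)[0][0]

# Derive the schema from the sample; schema_of_json accepts a literal
# string and yields the schema as a DDL-formatted string once evaluated.
schema = df.select(schema_of_json(json_body).alias("s")).take(1)[0][0]

# Parse every row's JSON using the inferred schema.
parsed = df.withColumn("parsed", from_json(col("content"), schema))
parsed.select("parsed.*").show(truncate=False)
```

One caveat: inferring from a single sample only captures the fields present in that row, so for truly dynamic JSON you may want to sample several rows and merge the resulting schemas.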
@Hritik_Moon Try to read the file as Delta. The directory layout looks like:
path/delta_file_name/
- parquet files
- _delta_log/
Since you are using Spark, use this: spark.read.format("delta").load("path/delta_file_name"). Delta internally stores the data as parquet, and the delta log contain...
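For completeness, a minimal sketch of reading the table and peeking at the transaction log's history; the path is the placeholder from the reply above, and `DeltaTable` assumes the delta-spark package is installed:

```python
from delta.tables import DeltaTable

# Read via the Delta format (not the bare parquet files) so the
# transaction log in _delta_log/ is honored and you get a consistent snapshot.
df = spark.read.format("delta").load("path/delta_file_name")
df.show(5)

# The same log powers table history and time travel.
dt = DeltaTable.forPath(spark, "path/delta_file_name")
dt.history().select("version", "timestamp", "operation").show()
```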