@Tamoor Mirza :
You can use the to_json method of a DataFrame to convert each chunk to a JSON string and then collect the results in a list. Note that the chunk size here is a row count, not a byte size, so "1MB" is really "one million rows". Here is an example that splits a DataFrame into fixed-size row chunks and builds a list of JSON arrays, with each row in each chunk becoming an array element:
import json

# assume df is your DataFrame
chunk_size = 1_000_000  # number of rows per chunk (a row count, not 1MB of data)

json_arrays = []
for start in range(0, len(df), chunk_size):
    end = min(start + chunk_size, len(df))
    chunk = df.iloc[start:end]
    json_str = chunk.to_json(orient='records')
    json_array = json.loads(json_str)
    json_arrays.append(json_array)

# merge all JSON arrays into a single array
merged_json_array = sum(json_arrays, [])

# convert the merged JSON array to a JSON string
merged_json_str = json.dumps(merged_json_array)
In the code above, we first define the chunk size as a number of rows. We then loop over the DataFrame, slicing it into chunks of that size with iloc. For each chunk, to_json with orient='records' produces a JSON array of objects (one per row). We then use json.loads to parse that string into a list of dictionaries, and append each such list (one per chunk) to json_arrays.
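To make the effect of orient='records' concrete, here is a minimal, self-contained sketch; the tiny DataFrame is made up purely for illustration:

```python
import json
import pandas as pd

# a tiny made-up DataFrame for illustration
df = pd.DataFrame({"a": [1, 2], "b": ["x", "y"]})

# orient='records' serializes each row as a JSON object
json_str = df.to_json(orient='records')
records = json.loads(json_str)
# records is a list of dicts, one per row:
# [{'a': 1, 'b': 'x'}, {'a': 2, 'b': 'y'}]
```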
After all the chunks have been processed, we merge the JSON arrays into a single list using the built-in sum function (with [] as the start value). Finally, we serialize the merged list back to a JSON string with json.dumps.
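One caveat: sum(json_arrays, []) copies the growing result on every step, which is quadratic in the total number of rows. When there are many chunks, a linear-time alternative is itertools.chain; here is a minimal sketch with a made-up stand-in for json_arrays:

```python
import json
from itertools import chain

# stand-in for the json_arrays built in the loop above
json_arrays = [[{"a": 1}], [{"a": 2}, {"a": 3}]]

# flatten the list of lists in a single linear pass
merged_json_array = list(chain.from_iterable(json_arrays))
merged_json_str = json.dumps(merged_json_array)
# merged_json_array is [{'a': 1}, {'a': 2}, {'a': 3}]
```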