Whenever I try to load multiple files into a single DataFrame for processing (the combined size is more than 15 GB in one DataFrame by the end of the loop), my code crashes every time with the error below:
ConnectException error: This is often caused by an OOM error that causes the connection to the Python REPL to be closed. Check your query's memory usage.
Please help me fix it. My code is below:
df2 = pd.DataFrame()
for i in range(0, k):
    df1 = pd.DataFrame()
    for j in pd.date_range(start_date, periods=5):
        print(i, start_date)
        path = r'/dbfs/mnt/xxxx/***/Ixxxx/***/'
        path1 = os.path.join(path, 'XXXX_' + start_date + '.csv')
        if os.path.isfile(path1):
            df = pd.read_csv(path1, low_memory=False)
            df = df.drop(['Var1', 'Var2', 'Var3'], axis=1)
            df = df.drop_duplicates(keep='first')
            df.reset_index(drop=True, inplace=True)
            df.set_index('VmsNo', inplace=True)
            df1 = df1.append(df)
        start_date = (pd.Timestamp(start_date) - pd.DateOffset(days=1)).strftime('%Y%m%d')
    df2 = df2.append(df1)
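
For reference, here is a minimal sketch of the same loop with the repeated append calls replaced by a list of frames and a single pd.concat. DataFrame.append copies all accumulated data on every call, and it was removed in pandas 2.0. The function name load_daily_csvs and its parameters are hypothetical, introduced only for illustration. The sketch assumes the files are named prefix + YYYYMMDD + '.csv' and that the total number of days equals 5 * k from the loop above. It reduces intermediate copies but does not by itself guarantee that 15 GB of data fits in the driver's memory.

```python
import os

import pandas as pd


def load_daily_csvs(base_dir, prefix, start_date, n_days,
                    drop_cols=("Var1", "Var2", "Var3"), index_col="VmsNo"):
    """Read up to n_days daily CSVs, walking backwards from start_date.

    Hypothetical helper: the name and parameters are illustrative only.
    """
    frames = []
    day = pd.Timestamp(start_date)
    for _ in range(n_days):
        path = os.path.join(base_dir, f"{prefix}{day.strftime('%Y%m%d')}.csv")
        if os.path.isfile(path):
            df = pd.read_csv(
                path,
                # Skip the unwanted columns at parse time instead of dropping them later.
                usecols=lambda name: name not in drop_cols,
            )
            df = df.drop_duplicates(keep="first").set_index(index_col)
            frames.append(df)
        # Step back one day whether or not the file existed.
        day -= pd.Timedelta(days=1)
    # One concat copies the data once; append inside a loop copies it on every pass.
    return pd.concat(frames) if frames else pd.DataFrame()
```

With the variables from the code above, the call would look like load_daily_csvs(path, 'XXXX_', start_date, 5 * k).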