How can I save a Parquet file using pandas from a notebook orchestrated by Azure Data Factory?

ricperelli
New Contributor II

Hi guys,

This is my first question, so feel free to correct me if I'm doing something wrong.

Anyway, I'm facing a really strange problem. I have a notebook in which I perform some pandas analysis, and then I save the resulting DataFrame to a Parquet file. Everything works fine.

But things change when I run the notebook from Azure Data Factory.

The saved file is empty: it contains 0 bytes.

This is the function I'm using to save the file:

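def save_file(obj, filename, kind="csv"):
    # Back up the DataFrame to DBFS through the local /dbfs mount
    outdir = '/dbfs/FileStore/'
    outname = outdir + filename + '.' + kind
    if kind == "csv":
        obj.to_csv(outname, index=False, encoding="utf-8")
    elif kind == "xlsx":
        obj.to_excel(outname, index=False)  # encoding="utf-8"
    elif kind == "parquet":
        obj.to_parquet(outname)
    elif kind == "pkl":
        # Not implemented yet: fail instead of reporting a save that never happened
        raise NotImplementedError("pkl output is not implemented")
    else:
        raise ValueError(f"Unsupported file kind: {kind!r}")
    print(filename, "saved")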
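One possible cause, offered as an assumption rather than a confirmed diagnosis: `/dbfs/FileStore/` goes through the DBFS FUSE mount, and Databricks' documentation for the local file API on `/dbfs` has noted that random writes are not supported on some runtimes. If the Parquet writer's access pattern is not a purely sequential append, a write through `/dbfs` could misbehave. A commonly suggested workaround is to let pandas write to local disk first and then copy the finished file to `/dbfs`. The helper below, `write_via_local_temp`, is a hypothetical name (not a pandas or Databricks API); it is a minimal sketch of that idea using only the standard library.

```python
import os
import shutil
import tempfile
from typing import Callable


def write_via_local_temp(write_fn: Callable[[str], None], dest_path: str) -> int:
    """Write a file on local disk first, then copy it to dest_path.

    write_fn is any callable that writes a complete file to the path it is
    given (for example a DataFrame's to_parquet). Returns the size in bytes
    of the copied file. Hypothetical helper, not part of pandas or Databricks.
    """
    filename = os.path.basename(dest_path)
    with tempfile.TemporaryDirectory() as tmpdir:
        local_path = os.path.join(tmpdir, filename)
        # Let the writer (e.g. pandas/pyarrow) work on the local filesystem.
        write_fn(local_path)
        # Copy the finished file to the destination in one sequential pass,
        # e.g. a path under /dbfs/FileStore/.
        shutil.copyfile(local_path, dest_path)
    size = os.path.getsize(dest_path)
    if size == 0:
        # Fail loudly instead of leaving a silent 0-byte file behind.
        raise OSError(f"{dest_path} was written but is empty")
    return size
```

In the notebook, the `parquet` branch could then become `write_via_local_temp(obj.to_parquet, outname)`: `DataFrame.to_parquet` takes the target path as its first argument, so the bound method can be passed directly. Raising on a 0-byte result makes an uncaught exception fail the notebook, so the Data Factory activity should report a failure instead of succeeding with an empty file.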

One more note that may be useful:

Even a normal run, without Azure Data Factory, saves an empty file, but if I display the pandas DataFrame before saving, it works correctly.

This is weird.
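Because the empty file appears both with and without Data Factory, it may help to log the DataFrame's row count just before saving and the file size right after, so the run output shows whether the data was already missing or the write itself produced nothing. A valid Parquet file always contains the `PAR1` magic bytes at its start and end, so even an empty DataFrame should never produce a 0-byte file; 0 bytes points at the write, not the data. `check_saved_file` below is a hypothetical helper, a minimal sketch using only `os`.

```python
import os


def check_saved_file(path: str, expected_rows: int) -> int:
    """Return the size of path in bytes, failing loudly on a 0-byte file.

    expected_rows is the row count of the DataFrame just before saving
    (e.g. len(df)); printing it next to the file size helps separate
    "the DataFrame was already empty" from "the write produced nothing".
    Hypothetical helper for diagnosis only.
    """
    if not os.path.exists(path):
        raise FileNotFoundError(path)
    size = os.path.getsize(path)
    print(f"{path}: {size} bytes, {expected_rows} rows in the DataFrame")
    if size == 0:
        raise RuntimeError(
            f"{path} is empty although the DataFrame had {expected_rows} rows"
        )
    return size
```

For example, after `save_file(df, "result", kind="parquet")` one could call `check_saved_file("/dbfs/FileStore/result.parquet", len(df))` and compare the printed output of an interactive run with that of a Data Factory run.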

Thank you 🙂
