Run MD5 using CLI
05-14-2024 09:00 AM
Hi,
I want to run an MD5 checksum on a file I uploaded to Databricks. I can generate the MD5 for the local file, but how do I generate one for the uploaded file on Databricks using the CLI (command-line interface)? Any help would be appreciated.
I tried running databricks fs md5, but it reports that md5 is not supported.
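For reference, the local side of this check can be sketched in Python roughly as follows. This is a minimal illustration, not a Databricks API: the function name `md5_of_file` and the chunk size are assumptions chosen for the example, and it only uses the standard `hashlib` module.

```python
import hashlib


def md5_of_file(path, chunk_size=1024 * 1024):
    """Return the hex MD5 digest of the file at `path`.

    The file is read in chunks so large files do not have to fit in memory.
    `md5_of_file` and `chunk_size` are illustrative names, not part of any
    Databricks tooling.
    """
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel keeps calling read() until it returns b"".
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

If the volume is exposed as a regular file path inside a Databricks notebook (an assumption about the workspace setup), the same function could in principle be pointed at that path and its result compared with the local digest.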
05-15-2024 07:21 AM
Thanks Kaniz. I do get the MD5 hash of the file locally and then upload the file to a Databricks Volume. I suppose it is Delta Lake Gen 2 storage type, but I am not able to generate the MD5 of the uploaded file using my code (running on my local machine).
If we take a step back, the only reason I am doing an MD5 checksum is to check data integrity. If there is any other way I can confirm that the file uploaded from on-prem to the Databricks volume is exactly the same, my problem would be solved. Any ideas or suggestions?
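One checksum-free way to frame the integrity check is a round trip: copy the uploaded file back to the local machine and compare it byte for byte with the original. The sketch below assumes that download has already happened by some means (for example a CLI copy command, if the installed CLI supports volume paths); the function name `files_identical` and both path parameters are hypothetical, and only the standard `filecmp` and `os` modules are used.

```python
import filecmp
import os


def files_identical(original_path, roundtrip_path):
    """Return True if both files have the same size and the same bytes.

    `original_path` is the on-prem file; `roundtrip_path` is a copy that was
    downloaded back from the volume (assumed to exist already).
    """
    # Cheap first check: different sizes can never be identical content.
    if os.path.getsize(original_path) != os.path.getsize(roundtrip_path):
        return False
    # shallow=False forces a full content comparison instead of
    # trusting file metadata such as size and modification time.
    return filecmp.cmp(original_path, roundtrip_path, shallow=False)
```

The trade-off is the extra download; for very large files, comparing digests computed on each side separately would avoid moving the data twice.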