07-19-2021 07:16 PM
Hello:
As you can see from the link below, Auto Loader supports seven file formats. I am dealing with geospatial shapefiles, and I want to know whether Auto Loader can support shapefiles.
Any help on this is greatly appreciated. Thanks.
avro: Avro file
binaryFile: Binary file
csv: CSV file
json: JSON file
orc: ORC file
parquet: Parquet file
text: Text file
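For context, these values are passed to Auto Loader through the `cloudFiles.format` option. Below is a minimal sketch, assuming PySpark on Databricks, of a hypothetical helper that accepts only the seven formats listed above, so a request for shapefiles fails fast instead of silently misreading files. The schema path in the usage line is also hypothetical.

```python
# Hypothetical helper: the seven cloudFiles.format values listed in the post above.
SUPPORTED_FORMATS = {"avro", "binaryFile", "csv", "json", "orc", "parquet", "text"}


def autoloader_options(fmt, schema_location=None):
    """Build an options dict for spark.readStream.format("cloudFiles").options(**opts)."""
    if fmt not in SUPPORTED_FORMATS:
        # e.g. "shapefile" or "shp": no cloudFiles.format value for it in this list
        raise ValueError(f"unsupported cloudFiles.format: {fmt!r}")
    opts = {"cloudFiles.format": fmt}
    if schema_location is not None:
        # Where Auto Loader keeps the inferred schema (relevant for formats such as json/csv)
        opts["cloudFiles.schemaLocation"] = schema_location
    return opts
```

On Databricks this would be used as `spark.readStream.format("cloudFiles").options(**autoloader_options("json", "/mnt/schemas/geo")).load(input_path)`.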
09-27-2021 11:46 AM
Hi @Jay DAVE,
Currently, shapefiles are not a supported file type in Auto Loader. Would you be willing to share more about your use case? I am the Product Manager responsible for Geospatial at Databricks, and I need help from customers like you to better understand what you are doing with spatial data, how often you refresh it, how big it is, etc. Any information you can share would be great!
Kent Marten
PM Databricks
09-29-2021 09:50 PM
Hello Kent:
Thanks for your reply. We receive shapefiles from satellites, LiDAR, drones, etc. They contain geometry data along with other data points, which helps us derive insights based on the geolocation of assets; we then do further EDA for machine learning.
It depends on the requirements, but the data will be refreshed weekly.
FYI, if you read this link, you will get a better idea of what Oil & Gas companies are going to do with shapefiles.
Thanks
09-30-2021 09:19 AM
@Jay DAVE,
How are you solving your analysis with geo-data today?
Are you using a GIS or spatial-ETL tool?
If you could upload shapefiles, what else would you want to do with that data? Run spatial operations against it, like spatially joining your point datasets to your boundaries?
Can I email you and ask more questions? 🙂
Kent Marten
PM Databricks
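Kent's example of spatially joining point datasets to boundaries boils down to a point-in-polygon test. Below is a minimal, brute-force sketch using the even-odd (ray casting) rule, assuming planar coordinates and simple polygons without holes. The function names and sample data are hypothetical, and a production join would use a spatial library and index rather than checking every point against every polygon.

```python
def point_in_polygon(x, y, ring):
    """Even-odd (ray casting) test: is (x, y) inside the ring of (x, y) vertices?"""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]  # wrap around to close the ring
        # Only edges that straddle the horizontal line through y can be crossed.
        if (y1 > y) != (y2 > y):
            # x coordinate where this edge crosses that horizontal line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside  # each crossing to the right flips the state
    return inside


def spatial_join(points, boundaries):
    """Map each point id to the first boundary name containing it (None if none)."""
    return {
        pid: next(
            (name for name, ring in boundaries.items() if point_in_polygon(x, y, ring)),
            None,
        )
        for pid, (x, y) in points.items()
    }
```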
09-29-2021 01:56 AM
You could try the binaryFile format. The disadvantage is that the content of each shapefile will be put into a single binary column, which might not be what you want.
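To make the binaryFile suggestion concrete: that format yields one row per file with `path`, `modificationTime`, `length` and `content` columns, where `content` holds the raw bytes. Below is a minimal sketch, assuming the stream is restricted to `.shp` files (e.g. with `.option("pathGlobFilter", "*.shp")`) and that a function like the hypothetical `parse_shp_header` is then applied to `content`, for instance inside a UDF. It only decodes the fixed 100-byte header described in the ESRI shapefile specification; the geometry records, the `.shx` index and the `.dbf` attributes would still need a full reader such as pyshp or GeoPandas.

```python
import struct

# Shape type codes from the ESRI shapefile specification.
SHAPE_TYPES = {0: "Null", 1: "Point", 3: "PolyLine", 5: "Polygon", 8: "MultiPoint",
               11: "PointZ", 13: "PolyLineZ", 15: "PolygonZ", 18: "MultiPointZ",
               21: "PointM", 23: "PolyLineM", 25: "PolygonM", 28: "MultiPointM",
               31: "MultiPatch"}


def parse_shp_header(content: bytes) -> dict:
    """Parse the fixed 100-byte header from the raw bytes of a .shp file."""
    if len(content) < 100:
        raise ValueError("not a .shp file: header shorter than 100 bytes")
    # Bytes 0-27 are big-endian: file code (9994), five unused ints, file length.
    (file_code,) = struct.unpack(">i", content[0:4])
    if file_code != 9994:
        raise ValueError(f"bad file code {file_code}, expected 9994")
    (file_length_words,) = struct.unpack(">i", content[24:28])
    # Bytes 28-99 are little-endian: version, shape type, then bounding-box doubles.
    version, shape_type = struct.unpack("<ii", content[28:36])
    xmin, ymin, xmax, ymax = struct.unpack("<4d", content[36:68])
    return {
        "file_length_bytes": file_length_words * 2,  # stored in 16-bit words
        "version": version,
        "shape_type": SHAPE_TYPES.get(shape_type, f"unknown({shape_type})"),
        "bbox": (xmin, ymin, xmax, ymax),
    }
```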
If you absolutely want to use Auto Loader, maybe some thinking outside the box can help.
What if you convert the shapefiles to GeoJSON or TopoJSON (this is not hard to do) and then use the json file format in Auto Loader? I have not tried this, but it might just work, and in the meantime Kent can do his thing 🙂
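One wrinkle with the GeoJSON route: a typical `.geojson` file is a single FeatureCollection object spread over many lines, while Spark's JSON reader by default expects one JSON object per line (otherwise the `multiLine` option is needed). Below is a minimal sketch, assuming the shapefile geometries have already been converted to GeoJSON geometry dicts by some other tool (e.g. GDAL's `ogr2ogr`) and reprojected to WGS 84 longitude/latitude as GeoJSON (RFC 7946) expects, of writing one Feature per line. `to_geojson_lines` and the `asset_id` property are hypothetical.

```python
import json


def to_geojson_lines(records):
    """Serialize (geometry, properties) pairs as one GeoJSON Feature per line.

    geometry is a GeoJSON geometry dict, e.g. {"type": "Point", "coordinates": [lon, lat]}.
    """
    lines = []
    for geometry, properties in records:
        feature = {"type": "Feature", "geometry": geometry, "properties": properties}
        # One compact object per line, so a line-oriented JSON reader yields one row per feature.
        lines.append(json.dumps(feature, separators=(",", ":")))
    return "\n".join(lines) + "\n"
```

Keeping each Feature on its own line also lets Spark split a large file across tasks, which a single multi-line FeatureCollection does not allow.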
09-29-2021 09:55 PM
Hello Werners:
Thanks for your reply. I agree to an extent, but shapefiles are the best way to handle geometry/geography data. Converting or translating them needs careful consideration, keeping data integrity and the risk of corruption in mind.
For now, I am using Azure Event Grid and Azure Functions to automate the processing of shapefiles.
Thanks
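A shapefile is really several files (at minimum `.shp`, `.shx` and `.dbf`) that land in storage one at a time, so an event-driven setup like Event Grid plus a Function has to decide when a set is complete before processing it. Below is a minimal sketch, assuming Event Grid `Microsoft.Storage.BlobCreated` events whose `subject` ends in `/blobs/<path>`. `ShapefileSetTracker` is hypothetical and keeps its state in memory only; separate Function invocations would need durable state (e.g. a table or blob) between events.

```python
from collections import defaultdict
from pathlib import PurePosixPath

# Mandatory parts of a shapefile dataset; .prj, .cpg, etc. are treated as optional here.
REQUIRED_PARTS = {".shp", ".shx", ".dbf"}


class ShapefileSetTracker:
    """Hypothetical helper: collect BlobCreated events until a shapefile set is complete."""

    def __init__(self):
        self._seen = defaultdict(set)  # dataset path without extension -> parts seen

    def on_event(self, event: dict):
        """Return the dataset path (without extension) once all required parts arrived."""
        if event.get("eventType") != "Microsoft.Storage.BlobCreated":
            return None
        # subject looks like /blobServices/default/containers/<container>/blobs/<path>
        blob_path = PurePosixPath(event["subject"].split("/blobs/", 1)[-1])
        ext = blob_path.suffix.lower()
        if ext not in REQUIRED_PARTS:
            return None
        stem = str(blob_path.with_suffix(""))
        self._seen[stem].add(ext)
        if self._seen[stem] == REQUIRED_PARTS:
            del self._seen[stem]  # forget the set once it is handed off for processing
            return stem
        return None
```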