Save Spark DataFrame to shape file (.shp format)
01-27-2023 05:07 AM
Hello,
I know how to create a .shp file from a GeoPandas GeoDataFrame using code similar to this (also mentioned on SO):
import geopandas
# pandas_df must already hold shapely geometries in its 'geom' column
gpd_df = geopandas.GeoDataFrame(pandas_df, geometry='geom')
gpd_df.to_file("username/nh.shp")
However, I have .parquet files that I can load directly into a Spark DataFrame, and I want to create and save a shapefile from there. Unfortunately, I'm not sure this is possible: I can't see .shp among the supported formats. I also checked Sedona but found only ShapefileReader, which doesn't support saving/writing. What is the state of the art for working with shapefiles?

03-08-2023 08:14 PM
@Bartosz Maciejewski :
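Spark has no native support for writing Shapefiles. However, you can collect the data to the driver and use a third-party library such as GeoPandas or PyShp to write it out as a Shapefile. Keep in mind that collecting (for example with toPandas()) requires all selected rows to fit in driver memory.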
Here's an example of how to use GeoPandas to convert a Spark DataFrame to a GeoDataFrame and save it to a Shapefile.
import geopandas as gpd
from pyspark.sql import SparkSession
# create SparkSession
spark = SparkSession.builder.appName("SparkGeoPandas").getOrCreate()
# create sample Spark DataFrame; Spark has no column type for shapely objects such as Point,
# so the geometry is kept as a WKT string column
df = spark.createDataFrame([(1, "POINT (0 0)"), (2, "POINT (1 1)")], ["id", "wkt"])
# collect to the driver as a pandas DataFrame (all rows must fit in driver memory)
pdf = df.toPandas()
# convert to a GeoDataFrame by parsing the WKT strings into shapely geometries
gdf = gpd.GeoDataFrame(pdf.drop(columns="wkt"), geometry=gpd.GeoSeries.from_wkt(pdf["wkt"]))
# save GeoDataFrame to Shapefile
gdf.to_file("path/to/shapefile.shp", driver="ESRI Shapefile")
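For the Parquet files from the question, the same collect-then-convert pattern should apply. The sketch below assumes, hypothetically, that the geometry column is named geom and holds WKB bytes (the encoding GeoParquet uses), that coordinates are in EPSG:4326, and that the selected rows fit in driver memory. to_geodataframe and parquet_to_shapefile are illustrative helper names, not library functions.

```python
import geopandas as gpd
import pandas as pd


def to_geodataframe(pdf: pd.DataFrame, geom_col: str = "geom",
                    crs: str = "EPSG:4326") -> gpd.GeoDataFrame:
    """Turn a pandas DataFrame with a WKB geometry column into a GeoDataFrame.

    Assumes geom_col holds WKB bytes; for WKT strings use GeoSeries.from_wkt instead.
    """
    # Spark can hand back BinaryType values as bytearray; normalize to bytes for shapely.
    wkb = pdf[geom_col].map(lambda v: None if v is None else bytes(v))
    geometry = gpd.GeoSeries.from_wkb(wkb, crs=crs)
    # Keep the attribute columns and attach the decoded geometries.
    return gpd.GeoDataFrame(pdf.drop(columns=geom_col), geometry=geometry)


def parquet_to_shapefile(spark, parquet_path: str, out_path: str,
                         geom_col: str = "geom") -> None:
    """Read Parquet with an existing SparkSession, collect it, and write a Shapefile."""
    # Filter/select in Spark before toPandas() to keep the collected data small.
    pdf = spark.read.parquet(parquet_path).toPandas()
    to_geodataframe(pdf, geom_col).to_file(out_path, driver="ESRI Shapefile")
```

Doing the filtering in Spark and only the final conversion in GeoPandas keeps the heavy lifting distributed; only the last, reduced result has to pass through the driver.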
You can also use the PyShp library (imported as shapefile) instead of GeoPandas.

