LZO codec not working on Graviton instances
03-12-2024 05:56 AM
Hi Databricks,
I have a job that saves data as LZO-compressed JSON, which requires the lzo-codec library. After shifting to Graviton instances, I noticed that the same job started throwing the following exception:
Caused by: java.lang.RuntimeException: native-lzo library not available
    at com.hadoop.compression.lzo.LzoCodec.getCompressorType(LzoCodec.java:155)
    at org.apache.hadoop.io.compress.CodecPool.getCompressor(CodecPool.java:150)
    at com.hadoop.compression.lzo.LzopCodec.getCompressor(LzopCodec.java:171)
    at com.hadoop.compression.lzo.LzopCodec.createOutputStream(LzopCodec.java:72)
Current config:
Worker: c6g.4xlarge
Driver: c6g.xlarge
Older config:
Worker: r4.8xlarge
Driver: r4.xlarge
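One way to narrow this down is to check, from the driver, whether the native LZO bindings ever load on the Graviton cluster. This is only a sketch: it assumes the hadoop-lzo jar is on the driver classpath and that it exposes the usual LzoCodec.isNativeLzoLoaded() helper (treat that method name as an assumption).
# Sketch only: probe from the driver whether the native LZO bindings are loadable.
# Assumes the hadoop-lzo jar is on the classpath and exposes the isNativeLzoLoaded()
# static helper; on Graviton (ARM64) nodes this is expected to return False unless an
# aarch64 build of the native library has been installed.
jvm = spark.sparkContext._jvm
native_loaded = jvm.com.hadoop.compression.lzo.LzoCodec.isNativeLzoLoaded()
print("native-lzo loaded:", native_loaded)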
Please suggest a solution.
1 REPLY
03-13-2024 01:39 AM
For more context, please use the following code to reproduce the error:
# Create a Python list containing JSON objects
json_data = [
    {
        "id": 1,
        "name": "John",
        "age": 25
    },
    {
        "id": 2,
        "name": "Jane",
        "age": 30
    },
    {
        "id": 3,
        "name": "Mike",
        "age": 35
    }
]

# Create a DataFrame using the JSON data
df = spark.createDataFrame(json_data)

# Save the DataFrame to S3 as LZO-compressed JSON
df.write.format('json').save('s3://path', compression='com.hadoop.compression.lzo.LzopCodec')
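To sanity-check the output, you can try reading it back. This is a hedged sketch: reading the .lzo files also requires the LzopCodec to be registered with Hadoop (io.compression.codecs), so it only works on a cluster where the codec and its native library actually load.
# Sketch only: read the LZO-compressed JSON back to verify the write.
# The codec is chosen from the .lzo file extension at read time, so this also
# depends on the LzopCodec being registered with Hadoop.
spark.read.format('json').load('s3://path').show()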
Make sure the lzo-codec library is installed on your cluster.
I tried this with both R-class instances and Graviton C-class instances, and it only failed on the Graviton instances.
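If an aarch64 build of the native LZO library cannot be provisioned for the Graviton nodes, one possible fallback (a sketch, not an official fix) is to write with a codec that ships with the JVM and needs no native library, such as gzip:
# Sketch only: fall back to a built-in codec with no native-library dependency.
# The S3 path is a placeholder; adjust the mode and path as needed.
df.write.format('json').mode('overwrite').save('s3://path', compression='gzip')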

