LZO codec not working on Graviton instances
03-12-2024 05:56 AM
Hi Databricks,
I have a job that saves data as LZO-compressed JSON, which requires the lzo-codec library. After shifting to Graviton instances, I noticed that the same job started throwing the following exception:
Caused by: java.lang.RuntimeException: native-lzo library not available
    at com.hadoop.compression.lzo.LzoCodec.getCompressorType(LzoCodec.java:155)
    at org.apache.hadoop.io.compress.CodecPool.getCompressor(CodecPool.java:150)
    at com.hadoop.compression.lzo.LzopCodec.getCompressor(LzopCodec.java:171)
    at com.hadoop.compression.lzo.LzopCodec.createOutputStream(LzopCodec.java:72)
Current config:
Worker: c6g.4xlarge
Driver: c6g.xlarge
Older config:
Worker: r4.8xlarge
Driver: r4.xlarge
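One way to narrow this down is to check, from the driver, whether the native LZO bindings ever load on the Graviton cluster. This is only a sketch: it assumes the hadoop-lzo jar is on the driver classpath and that it exposes the usual LzoCodec.isNativeLzoLoaded() helper (treat that method name as an assumption).
# Sketch only: probe from the driver whether the native LZO bindings are loadable.
# Assumes the hadoop-lzo jar is on the classpath and exposes the isNativeLzoLoaded()
# static helper; on Graviton (ARM64) nodes this is expected to return False unless an
# aarch64 build of the native library has been installed.
jvm = spark.sparkContext._jvm
native_loaded = jvm.com.hadoop.compression.lzo.LzoCodec.isNativeLzoLoaded()
print("native-lzo loaded:", native_loaded)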
Please suggest a solution.
1 REPLY
03-13-2024 01:39 AM
For more context, please use the following code to reproduce the error:
# Create a Python list containing JSON objects
json_data = [
    {
        "id": 1,
        "name": "John",
        "age": 25
    },
    {
        "id": 2,
        "name": "Jane",
        "age": 30
    },
    {
        "id": 3,
        "name": "Mike",
        "age": 35
    }
]

# Create a DataFrame using the JSON data
df = spark.createDataFrame(json_data)

# Save the DataFrame to S3 as LZO-compressed JSON
df.write.format('json').save('s3://path', compression='com.hadoop.compression.lzo.LzopCodec')
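To sanity-check the output, you can try reading it back. This is a hedged sketch: reading the .lzo files also requires the LzopCodec to be registered with Hadoop (io.compression.codecs), so it only works on a cluster where the codec and its native library actually load.
# Sketch only: read the LZO-compressed JSON back to verify the write.
# The codec is chosen from the .lzo file extension at read time, so this also
# depends on the LzopCodec being registered with Hadoop.
spark.read.format('json').load('s3://path').show()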
Make sure the lzo-codec library is installed on your cluster.
I tried this with both R-class instances and Graviton C-class instances, and it only failed on the Graviton instances.
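If an aarch64 build of the native LZO library cannot be provisioned for the Graviton nodes, one possible fallback (a sketch, not an official fix) is to write with a codec that ships with the JVM and needs no native library, such as gzip:
# Sketch only: fall back to a built-in codec with no native-library dependency.
# The S3 path is a placeholder; adjust the mode and path as needed.
df.write.format('json').mode('overwrite').save('s3://path', compression='gzip')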

