10-14-2021 01:59 AM
Hello guys, I'm trying to read JSON files from an S3 bucket, but no matter what I try I get "Query returned no results", or, if I don't specify the schema, "Unable to infer a schema".
I also tried mounting the S3 bucket; that still doesn't work.
Here is some of the code I tried:
# Attempt 1: explicit schema, multi-line JSON, via the DBFS mount
df = spark.read.json('dbfs:/mnt/path_to_json', multiLine=True, schema=json_schema)
# Attempt 2: load() with the multiline option
df = spark.read.option('multiline', 'true').format('json').load(path_to_json)
# Attempt 3: reading straight from S3
df = spark.read.json('s3a://path_to_json', multiLine=True)
display(df)
The JSON file looks like this:
{
  "key1": "value1",
  "key2": "value2",
  ...
}
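For context, json_schema in the snippets above is a StructType along these lines (sketched from the sample keys only; the real files have more fields):

from pyspark.sql.types import StructType, StructField, StringType

# Sketch based only on the sample keys above; the real schema has more fields.
json_schema = StructType([
    StructField("key1", StringType(), True),
    StructField("key2", StringType(), True),
])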
Hope you guys can help me. Thanks!
**EDIT**: Inside the JSON I have a string value that contains a backslash ("\"), which throws a corrupted-record error. Is there any way to overcome this without changing the value for that specific key?
10-14-2021 03:26 AM
Please try the code below and let me know if it helps you.
%scala
val mdf = spark.read.option("multiline", "true").json("s3://<path-to-jsonfile>/sample.json")
mdf.show(false)
10-14-2021 03:31 AM
Thanks for your answer. I still get the "unable to infer a schema" error:
org.apache.spark.sql.AnalysisException: Unable to infer schema for JSON. It must be specified manually.
I tried both s3:// and s3a:// -- neither worked.
10-14-2021 03:34 AM
Please verify the JSON in an online JSON validator. Try double quotes in the JSON; I once had an issue with single quotes.
Your code examples are correct.
10-14-2021 03:44 AM
The JSON is valid. When I tried writing a JSON file to DBFS and then reading it back, everything went fine:
dbutils.fs.put("/tmp/test.json", """
{"string":"string1",
"int":1,
"array":[1,2,3],
"dict": {"key": "value1"}}
""", True)
df = spark.read.json('/tmp/test.json')
But when I tried to read from the S3 bucket, or from the mount, it failed.
10-14-2021 03:55 AM
Other idea: read the raw file as plain text first to inspect it:
spark.read.text()
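For example, something like this sketch (path_to_json as in your snippets) to eyeball the raw content for anything the JSON parser might choke on:

# Read the raw file as plain text to see exactly what the JSON parser sees;
# handy for spotting stray characters such as backslashes or odd quoting.
raw = spark.read.text(path_to_json)
raw.show(20, truncate=False)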
10-14-2021 04:24 AM
I wrote the real JSON into /tmp/test.json and tried to read it now. When I didn't define the schema, I got this error:
Since Spark 2.3, the queries from raw JSON/CSV files are disallowed when the
referenced columns only include the internal corrupt record column
(named _corrupt_record by default). For example:
spark.read.schema(schema).json(file).filter($"_corrupt_record".isNotNull).count()
and spark.read.schema(schema).json(file).select("_corrupt_record").show().
Instead, you can cache or save the parsed results and then send the same query.
For example, val df = spark.read.schema(schema).json(file).cache() and then
df.filter($"_corrupt_record".isNotNull).count().;
But when I defined the schema, I got a DataFrame with all columns null.
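For what it's worth, the cache-then-inspect workaround from that error message looks roughly like this in PySpark (a sketch; json_schema must also declare a nullable string _corrupt_record field for that column to be populated):

# Cache the parsed result first, as the error message suggests, then inspect
# what the parser actually rejected.
df = spark.read.schema(json_schema).json(path_to_json).cache()
df.filter(df["_corrupt_record"].isNotNull()).show(truncate=False)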
I do have access to the S3 bucket, since I've already read text files from there, and the JSON files do have data in them (~800 KB).
Thanks a lot for your help!
10-14-2021 03:42 AM
Please refer to the documentation that helps you read JSON.
If you are getting this error, the problem is likely with the JSON itself; please validate it.
As a test, create a simple JSON file (you can find one on the internet), upload it to your S3 bucket, and try to read that. If it works, then your JSON file's schema has to be checked.
Further, the methods that you tried should also work if the JSON format is valid; a minimal version of the sanity check is sketched below.
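A minimal version of that sanity check might look like this ("/mnt/my-bucket" is a placeholder for your actual mount point):

# Write a trivially valid JSON file to the mounted bucket and read it back.
dbutils.fs.put("/mnt/my-bucket/tmp/sanity.json",
               '{"key1": "value1", "key2": "value2"}', True)
spark.read.json("/mnt/my-bucket/tmp/sanity.json").show()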
10-14-2021 04:51 AM
I think I found the problem: inside the JSON I have a string value that contains '\', and it throws a corrupted-record error. Any idea how to overcome this without changing all the JSON files?
10-14-2021 04:59 AM
Try experimenting with these options:
df = spark.read \
    .option("mode", "PERMISSIVE") \
    .option("columnNameOfCorruptRecord", "_corrupt_record") \
    .json(path_to_json)
10-14-2021 05:36 AM
Still not working -- same corrupted-record error. I uploaded the same JSON to the S3 bucket, just without the problematic value, and everything went well.
10-14-2021 06:20 AM
So the last resort is just to replace the '\' in the files, as you did. You can do that programmatically before reading the JSON.
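A sketch of one way to do it, assuming path_to_json and json_schema from earlier in the thread (note it blindly doubles every backslash, which is only safe if the files contain no legitimate escape sequences such as \" or \n):

from pyspark.sql import functions as F

# Load each file whole (so multi-line JSON stays in one row), double the
# backslashes so they become valid JSON escapes, then parse with the schema.
raw = spark.read.text(path_to_json, wholetext=True)
cleaned = raw.select(F.regexp_replace("value", r"\\", r"\\\\").alias("value"))
df = cleaned.select(F.from_json("value", json_schema).alias("data")).select("data.*")
df.show()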