Can't read large multiline JSON

espenol
New Contributor III

Hey! So I'm struggling to read a multiline JSON file. Some details:

  • It comes gzipped from the API I get it from
  • It's just a single file in the folder currently
  • It's stored in ADLS Gen2 storage
  • 95 MB zipped, approximately 1.2 GB unzipped

I can read it just fine using the text read:

[Screenshot: the file read as text, showing multiline JSON]

But if I try to read it as JSON without the multiline option, I get a corrupted string after reading for a while (perhaps as expected):

[Screenshot: corrupted string]

But if I use the multiline option instead, I immediately get an error:

[Screenshot: "Relative path in absolute URI" error]

Can anyone give me some pointers towards what is wrong? If I uncomment the last line I get the same error:

IllegalArgumentException: java.net.URISyntaxException: Relative path in absolute URI: enheter_full_2022-12-22T13:07:10.4988008Z_b744d2f4-5eb5-41a0-a546-e0514c7db325.json.gz

Does anyone know how to fix this? All the googling I've done suggests this is a problem with my path starting with "/", but my path starts with abfss:...

Accepted Solutions

daniel_sahal
Honored Contributor III

That's a long-standing issue with having a ':' character in a file name.

As of now there's no perfect workaround other than renaming the file, or passing the file names explicitly as a list (that second option needs to be tested).

You can read more here:

https://stackoverflow.com/questions/48909921/struggling-with-colon-in-file-names
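
To illustrate why the colon matters, here is a rough plain-Python imitation of how Hadoop's `Path` class splits a scheme off a path string. It is a simplified sketch, not the real implementation: the function names and the `container@account.dfs.core.windows.net/raw` location are made up for illustration. It also assumes the file name gets parsed on its own at some point during the read, which is what the error message (showing only the file name, not the full abfss path) seems to suggest.

```python
from typing import Optional, Tuple


def split_scheme(path_string: str) -> Tuple[Optional[str], str]:
    """Treat the text before the first ':' as a scheme if no '/' comes before it."""
    colon = path_string.find(":")
    slash = path_string.find("/")
    if colon != -1 and (slash == -1 or colon < slash):
        return path_string[:colon], path_string[colon + 1:]
    return None, path_string


def check_path(path_string: str) -> str:
    """Return 'ok', or the error a scheme followed by a relative path would trigger."""
    scheme, rest = split_scheme(path_string)
    # Skip an authority part such as //container@account.dfs.core.windows.net
    if rest.startswith("//"):
        end = rest.find("/", 2)
        rest = rest[end:] if end != -1 else ""
    if scheme is not None and rest and not rest.startswith("/"):
        return "Relative path in absolute URI"
    return "ok"


file_name = (
    "enheter_full_2022-12-22T13:07:10.4988008Z_"
    "b744d2f4-5eb5-41a0-a546-e0514c7db325.json.gz"
)
# Hypothetical container, account and folder, for illustration only
full_path = "abfss://container@account.dfs.core.windows.net/raw/" + file_name

print(check_path(full_path))  # the colons sit after a '/', so they are not a scheme
print(check_path(file_name))  # everything before the first ':' looks like a scheme
```

In this sketch the full abfss path passes, while the bare file name is read as a scheme (`enheter_full_2022-12-22T13`) followed by a relative path. That would explain why the error shows up even though the path passed in starts with abfss.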


4 REPLIES

-werners-
Esteemed Contributor III

It seems to be your URL, or more precisely the last part of it.

What happens if you pass the complete path?

espenol
New Contributor III

Thanks a lot for the help! Removing the colon fixed it. Now I need to fix the Data Factory instance that writes to my storage container. Hope it's easy, Data Factory is such a hassle.
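
For anyone landing here later, a minimal sketch of the renaming idea in plain Python. The helper name `colon_free_name` and the choice of '-' as the replacement character are assumptions for illustration; the actual rename has to happen in whatever tool writes the file (Data Factory in this thread).

```python
def colon_free_name(file_name: str, replacement: str = "-") -> str:
    """Return the file name with every ':' swapped for a safer character."""
    return file_name.replace(":", replacement)


original = (
    "enheter_full_2022-12-22T13:07:10.4988008Z_"
    "b744d2f4-5eb5-41a0-a546-e0514c7db325.json.gz"
)
safe = colon_free_name(original)
print(safe)
# enheter_full_2022-12-22T13-07-10.4988008Z_b744d2f4-5eb5-41a0-a546-e0514c7db325.json.gz
```

The timestamp stays readable and the rest of the name is untouched, so anything downstream that matches on the `enheter_full_` prefix or the `.json.gz` suffix should keep working.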

Please mark any of the given responses as best. Thank you in advance.
