Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Can't read large multiline JSON

espenol
New Contributor III

Hey! So I'm struggling to read a multiline json. Some details:

  • It's gzipped by the API I get it from
  • Just a single file in the folder currently
  • Stored in ADLS Gen2 storage
  • 95 MB gzipped, approximately 1.2 GB uncompressed

I can read it just fine using the text read:

[Screenshot: can be read as text, multiline JSON]

But if I try to read it normally as JSON without the multiline option, I get a corrupted string after some time reading (perhaps as expected):

[Screenshot: corrupted string]

But if I instead use the multiline option, I immediately get an error:

[Screenshot: relative path in absolute URI]

Can anyone give me some pointers towards what is wrong? If I uncomment the last line I get the same error:

IllegalArgumentException: java.net.URISyntaxException: Relative path in absolute URI: enheter_full_2022-12-22T13:07:10.4988008Z_b744d2f4-5eb5-41a0-a546-e0514c7db325.json.gz

Does anyone know how to fix this? All the googling I've done suggests this is a problem with my path starting with "/", but my path starts with abfss:...

Accepted Solutions

daniel_sahal
Esteemed Contributor

That's a pretty old issue with having a ':' character in a file name.
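
To see why a colon can trigger this message, here is a minimal Python sketch of the kind of scheme check a Hadoop-style path parser performs. It is a simplified approximation for illustration only, not Spark's or Hadoop's actual code; `split_scheme` and `check_path` are hypothetical names.

```python
from typing import Optional, Tuple


def split_scheme(path_string: str) -> Tuple[Optional[str], str]:
    """Treat the text before the first ':' as a URI scheme,
    but only when that colon comes before any '/'."""
    colon = path_string.find(":")
    slash = path_string.find("/")
    if colon != -1 and (slash == -1 or colon < slash):
        return path_string[:colon], path_string[colon + 1:]
    return None, path_string


def check_path(path_string: str) -> str:
    """Raise if the string looks like 'scheme:relative/path'."""
    scheme, rest = split_scheme(path_string)
    # Skip an authority part such as '//host' if one is present.
    if scheme is not None and rest.startswith("//"):
        next_slash = rest.find("/", 2)
        rest = rest[next_slash:] if next_slash != -1 else ""
    # A scheme followed by a path that does not start with '/' is rejected.
    if scheme is not None and rest and not rest.startswith("/"):
        raise ValueError(f"Relative path in absolute URI: {path_string}")
    return path_string
```

With this sketch, the bare file name from the error message raises `ValueError`, because everything before the first colon (`enheter_full_2022-12-22T13`) is taken as a scheme. The same name with the colons replaced passes the check.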

As of now there's no perfect workaround other than simply renaming the file or moving file names into a list (needs to be tested).
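
As a rough sketch of the renaming workaround, the following Python helpers compute colon-free names and a rename plan. The function names and the choice of '-' as the replacement character are assumptions for illustration; the actual rename in ADLS would still have to be done with whatever storage tooling you use.

```python
from typing import List, Tuple


def colon_free_name(file_name: str, replacement: str = "-") -> str:
    """Return file_name with every ':' replaced (hypothetical helper)."""
    if ":" in replacement:
        raise ValueError("replacement must not contain ':'")
    return file_name.replace(":", replacement)


def rename_plan(file_names: List[str]) -> List[Tuple[str, str]]:
    """List (old, new) pairs for the names that contain a colon."""
    return [(name, colon_free_name(name)) for name in file_names if ":" in name]


if __name__ == "__main__":
    # File name taken from the error message in the question.
    original = (
        "enheter_full_2022-12-22T13:07:10.4988008Z_"
        "b744d2f4-5eb5-41a0-a546-e0514c7db325.json.gz"
    )
    for old, new in rename_plan([original]):
        print(f"{old} -> {new}")
```

Names without a colon are left out of the plan, so it is safe to run over a whole folder listing.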

You can read more here:

https://stackoverflow.com/questions/48909921/struggling-with-colon-in-file-names

Replies

-werners-
Esteemed Contributor III

It seems to be your URL, the last part to be more precise.

What happens if you pass the complete path?

espenol
New Contributor III

Thanks a lot for the help! Removing the colon fixed it. Now I need to fix the Data Factory instance that writes to my storage container. Hope it's easy, Data Factory is such a hassle.

Please mark any of the given responses as best. Thank you in advance.
