07-16-2023 07:47 AM
Hi!
I'm trying to read a file using Scala from gcs that has square brackets in the file path.
I keep getting the following error: URISyntaxException: Illegal character in path at index 209
I tried putting an extra forward slash in front of the brackets, but it still didn't work.
I would really appreciate your help here!
07-18-2023 04:17 AM
Hi @Sparktaculer, When reading a file from GCS using Scala, if the file path contains square brackets, you can try encoding the square brackets using URL encoding. For example, replace "[" with "%5B" and "]" with "%5D". Then use the encoded file path in your code.
Here's an example:
import org.apache.hadoop.fs.{FileSystem, Path}
import java.net.URI
val path = "gs://my-bucket/path/with/%5Bsquare%5D/brackets.csv"
val fs = FileSystem.get(new URI(path), sc.hadoopConfiguration)
val file = fs.open(new Path(path))
val lines = scala.io.Source.fromInputStream(file).getLines()
In this example, the file path contains the square brackets encoded as "%5B" and "%5D". The FileSystem.get method is used to get a handle to the file system, and the fs.open method is used to open the file. Finally, the scala.io.Source.fromInputStream method is used to read the contents of the file.
Sources:
- https://docs.databricks.com/data/data-sources/read-gcs.html
- https://en.wikipedia.org/wiki/Percent-encoding
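For anyone wondering where the exception comes from, here is a minimal sketch using only java.net.URI from the JDK. The bucket and file names are hypothetical, copied from the example above. It only shows that a raw "[" is rejected by the URI parser and that the percent-encoded form parses; it does not show whether the GCS connector then resolves the encoded name to the real object, so treat it as an illustration rather than a confirmed fix.

```scala
import java.net.URI
import scala.util.Try

object BracketPathDemo {
  // Hypothetical path, for illustration only.
  val rawPath = "gs://my-bucket/path/with/[square]/brackets.csv"

  // Percent-encode only the square brackets; every other character is left untouched.
  def encodeBrackets(path: String): String =
    path.replace("[", "%5B").replace("]", "%5D")

  def main(args: Array[String]): Unit = {
    // Parsing the raw path fails with a URISyntaxException (wrapped in a Failure here).
    val rawAttempt = Try(new URI(rawPath))
    println(s"raw path parses: ${rawAttempt.isSuccess}")

    // The encoded path parses; getRawPath keeps the escapes, getPath decodes them.
    val encoded = new URI(encodeBrackets(rawPath))
    println(s"raw form:     ${encoded.getRawPath}")
    println(s"decoded form: ${encoded.getPath}")
  }
}
```

Note that the decoded form contains the brackets again, so any later step that rebuilds a URI from the decoded string can hit the same exception.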
07-19-2023 01:47 AM - edited 07-19-2023 01:53 AM
Hi @Kaniz_Fatma! Thank you for your help.
However, when I try using your code I still get an error: "URISyntaxException: Illegal character in path at index "
I'm trying to read a txt file. This is the file path:
07-18-2023 04:34 AM
In Spark, you can disable the option globPaths. This will skip the pattern matching that happens during file reads.
spark.read.option("__globPaths__", false).format("").load("path[]")
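To illustrate why pattern matching gets in the way of bracketed file names, here is a small sketch that uses the JDK's own glob matcher as a stand-in. This is an assumption for illustration: Hadoop's globbing is a separate implementation, and the file names below are hypothetical. The point is only that in glob syntax "[1]" is a character class that matches the single character "1", not the literal text "[1]".

```scala
import java.nio.file.{FileSystems, Paths}

object GlobBracketDemo {
  // Hypothetical file name that contains literal square brackets.
  val pattern = "report[1].txt"

  // Returns true when `name` matches `pattern` under the JDK's glob rules.
  def globMatches(pattern: String, name: String): Boolean =
    FileSystems.getDefault.getPathMatcher("glob:" + pattern).matches(Paths.get(name))

  def main(args: Array[String]): Unit = {
    // The brackets are read as a character class, so a different file name matches.
    println(globMatches(pattern, "report1.txt"))
    // The file that is literally named with brackets is not matched by its own name.
    println(globMatches(pattern, "report[1].txt"))
    // Escaping the brackets with a backslash makes the JDK matcher treat them literally.
    println(globMatches("report\\[1\\].txt", "report[1].txt"))
  }
}
```

Turning globbing off, as suggested above, avoids this interpretation altogether, since the path is then taken as written.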
07-19-2023 03:27 AM
Hi @Sparktaculer,
We haven't heard from you since the last responses from @Tharun-Kumar and @Kaniz_Fatma, and I was checking back to see if their suggestions helped you.
If you have found a solution of your own, please share it with the community, as it can be helpful to others.
Also, please don't forget to click the "Select As Best" button whenever the information provided helps resolve your question.
07-19-2023 03:50 AM
Hi @Kaniz! Thank you for your help.
However, when I try using your code I still get an error: "URISyntaxException: Illegal character in path at index "
I'm trying to read a txt file. This is the file path: