Databricks Community

Sparktaculer · ‎07-16-2023

Hi!

I'm trying to read a file from GCS using Scala, and the file path contains square brackets.

I keep getting the following error: URISyntaxException: Illegal character in path at index 209

I tried putting an extra forward slash in front of the brackets, but it still didn't work.

Would really appreciate your help here!

Kaniz_Fatma · ‎07-18-2023

Hi @Sparktaculer, When reading a file from GCS using Scala, if the file path contains square brackets, you can try encoding the square brackets using URL encoding. For example, replace "[" with "%5B" and "]" with "%5D". Then use the encoded file path in your code.

Here's an example:

import org.apache.hadoop.fs.{FileSystem, Path}
import java.net.URI

val path = "gs://my-bucket/path/with/%5Bsquare%5D/brackets.csv"
val fs = FileSystem.get(new URI(path), sc.hadoopConfiguration)
val file = fs.open(new Path(path))
val lines = scala.io.Source.fromInputStream(file).getLines()

In this example, the file path contains the square brackets encoded as "%5B" and "%5D". The FileSystem.get method is used to get a handle to the file system, and the fs.open method is used to open the file. Finally, the scala.io.Source.fromInputStream method is used to read the contents of the file.

Sources:
- https://docs.databricks.com/data/data-sources/read-gcs.html
- https://en.wikipedia.org/wiki/Percent-encoding
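
For paths with several problem characters (spaces as well as brackets), hand-replacing each one is error-prone. The sketch below uses only the JDK's java.net.URI to show two things: the single-string constructor rejects the raw path, which is one plausible source of the URISyntaxException, and the multi-argument constructor percent-encodes illegal characters for you. The bucket and path are the ones posted in this thread; the names `bucket`, `rawPath`, `strict`, and `encoded` are illustrative. Whether the GCS connector accepts the encoded form is an assumption that this thread does not confirm.

```scala
import java.net.URI
import scala.util.Try

val bucket = "my-bucket"
val rawPath = "/my Data/sparkTests/GM-1220, reading a txt/Version1/3 Model Creation/3 models_to_check/[no_country] (2)/test.txt"

// The single-string constructor parses strictly: spaces and square
// brackets are illegal in a URI path, so this attempt fails.
val strict = Try(new URI(s"gs://$bucket$rawPath"))
println(s"strict parse failed: ${strict.isFailure}")

// The five-argument constructor (scheme, authority, path, query, fragment)
// quotes illegal characters in the path instead of throwing.
val encoded = new URI("gs", bucket, rawPath, null, null)
println(encoded.toString) // percent-encoded form
println(encoded.getPath)  // decoded path, same as rawPath
```

Printing both forms makes it easy to check exactly which string is being handed to the file system call.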

Sparktaculer · ‎07-19-2023

Hi @Kaniz_Fatma ! Thank you for your help.

However, when I try using your code I still get an error: "URISyntaxException: Illegal character in path at index "

I'm trying to read a txt file. This is the file path:

"gs://my-bucket/my Data/sparkTests/GM-1220, reading a txt/Version1/3 Model Creation/3 models_to_check/[no_country] (2)/test.txt"

This is how I'm trying to read the file:

def loadFromGCS(gcsUrl: String): (String, Boolean, RecordClassifier) = {

val content = spark.sparkContext.textFile(gcsUrl).collect().mkString("\n")

print(content)}

Tharun-Kumar · ‎07-18-2023

Hi @Sparktaculer

In Spark, you can disable the option globPaths. This will skip the pattern matching that happens during file reads.

spark.read.option("__globPaths__", false).format("").load("path[]")
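
The reader option above is not available on calls such as sparkContext.textFile, which take only a path string. One alternative sometimes used there is to escape the glob metacharacters so the pattern matcher treats them literally. The following is a minimal sketch, assuming the backslash escape described for Hadoop's FileSystem.globStatus is honored by the GCS connector (not verified in this thread); `escapeGlob` is a hypothetical helper name, not a Spark or Hadoop API.

```scala
// Hypothetical helper: prefix each glob metacharacter with a backslash
// so that a glob matcher reads it as a literal character.
def escapeGlob(path: String): String = {
  val special = Set('[', ']', '{', '}', '*', '?', '\\')
  path.flatMap(c => if (special(c)) "\\" + c else c.toString)
}

// Path as posted in this thread
val raw = "gs://my-bucket/my Data/sparkTests/GM-1220, reading a txt/Version1/3 Model Creation/3 models_to_check/[no_country] (2)/test.txt"
val escaped = escapeGlob(raw)
println(escaped)
```

The escaped string would then be passed in place of the raw path. If the exception is raised while parsing the URI rather than while globbing, this escaping alone is unlikely to help.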

Anonymous · ‎07-19-2023

Hi @Sparktaculer,

We haven't heard from you since the last responses from @Tharun-Kumar and @Kaniz_Fatma, and I was checking back to see if their suggestions helped you.

If you have found a solution of your own, please share it with the community, as it can be helpful to others.

Also, please don't forget to click the "Select As Best" button whenever the information provided helps resolve your question.

How to read GCS paths with square brackets?
