Community Discussions
Connect with fellow community members to discuss general topics related to the Databricks platform, industry trends, and best practices. Share experiences, ask questions, and foster collaboration within the community.

How to read GCS paths with square brackets?

Sparktaculer
New Contributor II

Hi!

I'm trying to read a file from GCS using Scala, and the file path contains square brackets.

I keep getting the following error: URISyntaxException: Illegal character in path at index 209

I tried putting an extra forward slash in front of them, but it still didn't work.

I'd really appreciate your help here!
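For context: the single-argument java.net.URI constructor only accepts strings that are already valid, percent-encoded URIs, and both square brackets and spaces are illegal in the path part. The following minimal, JDK-only Scala sketch (the object name UriBracketDemo and the sample paths are hypothetical) shows that parsing a raw path containing "[" or a space fails with this kind of URISyntaxException, while the percent-encoded form parses. It does not show which component in the Spark/GCS stack performs the failing parse.

```scala
import java.net.{URI, URISyntaxException}
import scala.util.{Failure, Success, Try}

// Hypothetical demo object; only the JDK is needed.
object UriBracketDemo {
  // Parse with the single-argument constructor, which expects an
  // already-encoded URI string. Returns the error message on failure.
  def parse(s: String): Either[String, URI] =
    Try(new URI(s)) match {
      case Success(uri)                   => Right(uri)
      case Failure(e: URISyntaxException) => Left(e.getMessage)
      case Failure(other)                 => throw other
    }

  def main(args: Array[String]): Unit = {
    // '[' and ']' are not legal path characters, so this is a Left
    println(parse("gs://my-bucket/a/[x]/test.txt"))
    // A space is not legal either, so a path like "my Data/..." also fails
    println(parse("gs://my-bucket/my Data/test.txt"))
    // The percent-encoded form parses successfully
    println(parse("gs://my-bucket/a/%5Bx%5D/test.txt"))
  }
}
```

Note that the path in the follow-up post below also contains spaces, so brackets may not be the only characters such a parse rejects.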


Kaniz
Community Manager

Hi @Sparktaculer,

When reading a file from GCS using Scala, if the file path contains square brackets, you can try encoding them using URL (percent) encoding: replace "[" with "%5B" and "]" with "%5D", then use the encoded file path in your code.

Here's an example:

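import org.apache.hadoop.fs.{FileSystem, Path}
import java.net.URI
import scala.io.Source

// Square brackets percent-encoded as %5B and %5D
val path = "gs://my-bucket/path/with/%5Bsquare%5D/brackets.csv"
val fs = FileSystem.get(new URI(path), sc.hadoopConfiguration)
val file = fs.open(new Path(path))
// Read all lines into memory, then close the stream
val lines = try Source.fromInputStream(file).getLines().toList finally file.close()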

In this example, the file path contains the square brackets encoded as "%5B" and "%5D". The FileSystem.get method is used to get a handle to the file system, the fs.open method is used to open the file, and the scala.io.Source.fromInputStream method is used to read its contents.

Sources:
https://docs.databricks.com/data/data-sources/read-gcs.html
https://en.wikipedia.org/wiki/Percent-encoding
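Rather than encoding the brackets by hand, the multi-argument java.net.URI constructor can produce the percent-encoded string: it quotes characters that are illegal in a path component, such as "[", "]" and spaces, as %XX escapes. Below is a minimal sketch; the helper name encodedGcsUri is hypothetical. It only shows the encoding step in the JDK and does not establish how Hadoop's Path or the GCS connector interpret an encoded object name.

```scala
import java.net.URI

object PercentEncodeDemo {
  // Hypothetical helper: build a gs:// URI from a bucket and an unencoded
  // object path. The (scheme, authority, path, query, fragment) constructor
  // percent-encodes characters that are illegal in a URI path.
  def encodedGcsUri(bucket: String, objectPath: String): URI =
    new URI("gs", bucket, objectPath, null, null)

  def main(args: Array[String]): Unit = {
    val uri = encodedGcsUri("my-bucket", "/path/with/[square]/brackets.csv")
    println(uri.toString) // gs://my-bucket/path/with/%5Bsquare%5D/brackets.csv
    println(uri.getPath)  // /path/with/[square]/brackets.csv (decoded)
  }
}
```

The same call also encodes spaces as %20, which matters for paths like "my Data/...".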

Sparktaculer
New Contributor II

Hi @Kaniz! Thank you for your help.

However, when I try using your code, I still get an error: "URISyntaxException: Illegal character in path at index "

I'm trying to read a txt file. This is the file path: 

"gs://my-bucket/my Data/sparkTests/GM-1220, reading a txt/Version1/3 Model Creation/3 models_to_check/[no_country] (2)/test.txt"

This is how I'm trying to read the file:

def loadFromGCS(gcsUrl: String): (String, Boolean, RecordClassifier) = {
  val content = spark.sparkContext.textFile(gcsUrl).collect().mkString("\n")
  print(content)
}

Tharun-Kumar
Honored Contributor II

Hi @Sparktaculer 

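In Spark, you can disable the __globPaths__ option. This skips the glob pattern matching that normally happens when file paths are resolved during reads, so characters such as [ and ] are not interpreted as a pattern.

spark.read.option("__globPaths__", false).format("text").load("gs://my-bucket/path/with/[brackets]/test.txt")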
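To illustrate why that pattern matching matters here: in glob syntax, a bracket expression such as [no_country] matches a single character from the set, not the literal text, so a folder named "[no_country] (2)" does not match itself when treated as a pattern. The sketch below uses the JDK's PathMatcher glob as a stand-in; it is not the code path Spark or Hadoop use, although Hadoop's documented glob syntax also treats [abc] as a single-character class. The object and helper names are hypothetical.

```scala
import java.nio.file.{FileSystems, Paths}

// Hypothetical demo; uses the JDK glob matcher as a stand-in for the
// glob expansion that Spark/Hadoop apply to input paths.
object GlobBracketDemo {
  // True if `name` matches `pattern` interpreted as a glob.
  def globMatches(pattern: String, name: String): Boolean =
    FileSystems.getDefault.getPathMatcher("glob:" + pattern).matches(Paths.get(name))

  def main(args: Array[String]): Unit = {
    val folder = "[no_country] (2)"
    // "[no_country]" is a character class matching ONE of n, o, _, c, u, t, r, y,
    // so the folder name does not match itself (prints false)...
    println(globMatches(folder, folder))
    // ...but a folder called "n (2)" would (prints true).
    println(globMatches(folder, "n (2)"))
  }
}
```

With globbing disabled, the path is used as-is, so this character-class interpretation does not apply.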

 

Anonymous
Not applicable

Hi @Sparktaculer,

We haven't heard from you since the last responses from @Tharun-Kumar and @Kaniz, and I was checking back to see if their suggestions helped you.

Otherwise, if you have found a solution, please share it with the community, as it can be helpful to others.

Also, please don't forget to click the "Select As Best" button whenever the information provided helps resolve your question.
