How to read a file in PySpark with a “]|[” delimiter
01-18-2017 01:14 PM
The data looks like this:
pageId]|[page]|[Position]|[sysId]|[carId
0005]|[bmw]|[south]|[AD6]|[OP4
There are at least 50 columns and millions of rows.
I tried to read it with the code below:
dff = sqlContext.read.format("com.databricks.spark.csv").option("header", "true").option("inferSchema", "true").option("delimiter", "]|[").load(trainingdata+"part-00000")
It gives me the following error:
IllegalArgumentException: u'Delimiter cannot be more than one character: ]|['
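Since the error says the CSV reader only accepts a single-character delimiter, one possible workaround is to read the file as plain text and split each line on the literal string "]|[". The sketch below is only an illustration, not a confirmed fix: `split_row` and `DELIMITER` are hypothetical names, the sample lines are taken from the data shown above, and the Spark part is left in comments because it assumes an existing SparkContext `sc` and a file path `path`.

```python
# Hypothetical helper (not part of Spark): split one raw line on the
# literal multi-character delimiter.
DELIMITER = "]|["

def split_row(line, delimiter=DELIMITER):
    # str.split treats the delimiter as a plain string, not a regex,
    # so the characters ], | and [ need no escaping here.
    return line.rstrip("\r\n").split(delimiter)

# Sample lines from the data shown above.
header = split_row("pageId]|[page]|[Position]|[sysId]|[carId")
row = split_row("0005]|[bmw]|[south]|[AD6]|[OP4")
print(dict(zip(header, row)))

# With Spark, the same function could be mapped over the lines of the file
# (assumption: `sc` is an existing SparkContext and `path` points at the file):
#   lines = sc.textFile(path)
#   first = lines.first()                      # header line
#   rows = lines.filter(lambda l: l != first).map(split_row)
#   df = rows.toDF(split_row(first))           # every column is a string
```

With this approach every column comes back as a string, so there is no schema inference; numeric columns would have to be cast afterwards.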