Data Engineering

Forum Posts

THIAM_HUATTAN
by Valued Contributor
  • 25942 Views
  • 8 replies
  • 2 kudos

Skip number of rows when reading CSV files

I read my CSV files with: staticDataFrame = spark.read.format("csv").option("header", "true").option("inferSchema", "true").load("/FileStore/tables/Consumption_2019/*.csv"). With this read, I need an option to skip, say, the first 4 lines of each CSV file. How do I do that?

Latest Reply
Michael_Appiah
New Contributor III
  • 2 kudos

The option... .option("skipRows", <number of rows to skip>) ...works for me as well. However, I am surprised that the official Spark doc does not list it as a CSV Data Source Option: https://spark.apache.org/docs/latest/sql-data-sources-csv.html#data...
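A minimal sketch of the two approaches this thread points at, not a definitive implementation. `read_with_skip_rows` uses the `skipRows` option exactly as reported above; it is assumed to be available on Databricks runtimes and, as noted, is not listed in the Apache Spark CSV options. `read_csv_skipping` and `drop_leading_lines` are hypothetical helper names for a fallback that drops the leading lines of each file before parsing. The fallback splits on line breaks, so it assumes no quoted field spans multiple lines and that each file fits in executor memory.

```python
def drop_leading_lines(content, n):
    """Return the lines of one file's text with the first n lines removed."""
    return content.splitlines()[n:]


def read_with_skip_rows(spark, path_glob, n):
    """Read CSV files using the skipRows option mentioned in this thread."""
    return (
        spark.read.format("csv")
        .option("skipRows", n)  # reported here to work on Databricks; an assumption elsewhere
        .option("header", "true")
        .option("inferSchema", "true")
        .load(path_glob)
    )


def read_csv_skipping(spark, path_glob, n):
    """Hypothetical fallback: drop n leading lines per file, then parse as CSV."""
    # wholeTextFiles yields one (file_path, file_content) pair per file.
    lines = spark.sparkContext.wholeTextFiles(path_glob).flatMap(
        lambda pair: drop_leading_lines(pair[1], n)
    )
    # DataFrameReader.csv also accepts an RDD of CSV row strings.
    return spark.read.csv(lines, header=True, inferSchema=True)
```

With the path from the question, a call would look like `staticDataFrame = read_csv_skipping(spark, "/FileStore/tables/Consumption_2019/*.csv", 4)`. Because every file contributes its own header line after skipping, it is worth checking that no header rows remain in the resulting data.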

7 More Replies
shamly
by New Contributor III
  • 1701 Views
  • 3 replies
  • 2 kudos

How to remove extra ENTER line in csv UTF-16 while reading

Dear Friends, I have a csv and it looks like this: ‡‡Id‡‡,‡‡Version‡‡,‡‡Questionnaire‡‡,‡‡Date‡‡‡‡123456‡‡,‡‡Version2‡‡,‡‡All questions have been answered accurately and the guidance in the questionnaire was understood and followed‡‡,‡‡2010-12-16 00:01:...

Latest Reply
Aviral-Bhardwaj
Esteemed Contributor III
  • 2 kudos

This is working fine: from pyspark.sql.functions import regexp_replace path="dbfs:/FileStore/df/test.csv" dff = spark.read.option("header", "true").option("inferSchema", "true").option('multiline', 'true').option('encoding', 'UTF-8').option("delimi...
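The reply above is truncated, so the following is only a sketch of the general idea it imports `regexp_replace` for, not a reconstruction of the author's code. It assumes the unwanted ENTER characters end up inside string values after a multiline read and replaces each run of line breaks with a single space. `strip_line_breaks` and `strip_line_breaks_in_columns` are hypothetical helper names, and the column list is whatever the caller chooses.

```python
import re

# The same pattern is valid for Python's re module and for Spark's Java regex engine.
LINE_BREAKS = r"[\r\n]+"


def strip_line_breaks(value):
    """Pure-Python version: collapse runs of line breaks into one space."""
    return re.sub(LINE_BREAKS, " ", value).strip(" ")


def strip_line_breaks_in_columns(df, column_names):
    """Apply the same clean-up to the named string columns of a Spark DataFrame."""
    # Imported here so the pure-Python helper works without pyspark installed.
    from pyspark.sql.functions import col, regexp_replace, trim

    for name in column_names:
        df = df.withColumn(name, trim(regexp_replace(col(name), LINE_BREAKS, " ")))
    return df
```

For the sample data in the question, a call might be `strip_line_breaks_in_columns(dff, ["Questionnaire"])`, assuming that column is where the stray line breaks appear.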

2 More Replies
Raagavi
by New Contributor
  • 1418 Views
  • 2 replies
  • 1 kudos

Is there a way to read the CSV files automatically from on-premises network locations and write back to the same from Databricks?

Is there a way to read the CSV files automatically from on-premises network locations and write back to the same from Databricks? 

Latest Reply
Kaniz
Community Manager
  • 1 kudos

Hi @Raagavi Rajagopal, We haven’t heard from you since the last response from @Debayan Mukherjee, and I was checking back to see if you have a resolution yet. If you have any solution, please share it with the community as it can be helpful to ot...

1 More Replies
hisham1
by New Contributor
  • 1010 Views
  • 2 replies
  • 2 kudos

Resolved! unable to read Csv files from Databricks Database Tables

I am trying to read a CSV file stored in Databricks database tables, but I am getting an error. It runs fine for DBFS, but the same format does not work for database tables.

Latest Reply
Kaniz
Community Manager
  • 2 kudos

Hi @Sayed Ali, We haven’t heard from you on the last response from me, and I was checking back to see if my suggestions helped you. Otherwise, if you have any solution, please share it with the community as it can be helpful to others. Also, Pleas...

1 More Replies
Shay
by New Contributor III
  • 3595 Views
  • 8 replies
  • 6 kudos

Resolved! How do you Upload TXT and CSV files into Shared Workspace in Databricks?

I am trying to upload the needed files under the right directory of the project so that it works. The files are zipped first, as that is an accepted format. I have a Python project which requires the TXT and CSV format files as they are called and used via .py files ...

Latest Reply
-werners-
Esteemed Contributor III
  • 6 kudos

@Shay Alam, can you share the code with which you read the files? Apparently Python interprets the file format as a language, so it seems like some options are not filled in correctly.

7 More Replies
lprevost
by New Contributor II
  • 1535 Views
  • 1 replies
  • 1 kudos

Resolved! Schema inference CSV picks up \r carriage returns

I'm using: frame = spark.read.csv(path=bucket+folder, inferSchema = True, header = True, multiLine=True ) to read in a series of CSV ...

Latest Reply
Hubert-Dudek
Esteemed Contributor III
  • 1 kudos

Files saved on the Windows operating system contain a carriage return and a line feed at the end of every line. Please add the following option; it may help: .option("ignoreTrailingWhiteSpace", True)
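A small sketch of how that option could be combined with the read from the question, under the assumption that the stray `\r` shows up at the end of the last column name and its values. `read_windows_csv` and `strip_carriage_returns` are hypothetical helper names; renaming the columns is an extra safeguard added here, not something the reply suggests.

```python
def strip_carriage_returns(names):
    """Remove trailing carriage returns from a list of column names."""
    return [name.rstrip("\r") for name in names]


def read_windows_csv(spark, path):
    """Read CSV files written with Windows (CRLF) line endings."""
    df = spark.read.csv(
        path,
        header=True,
        inferSchema=True,
        multiLine=True,
        ignoreTrailingWhiteSpace=True,  # the option suggested in the reply above
    )
    # Safeguard: rename any column whose name still ends in a carriage return.
    return df.toDF(*strip_carriage_returns(df.columns))
```

With the variables from the question, this would be called as `frame = read_windows_csv(spark, bucket + folder)`.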
