cancel
Showing results for 
Search instead for 
Did you mean: 
Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
cancel
Showing results for 
Search instead for 
Did you mean: 

Forum Posts

THIAM_HUATTAN
by Valued Contributor
  • 39798 Views
  • 8 replies
  • 2 kudos

Skip number of rows when reading CSV files

staticDataFrame = spark.read.format("csv")\ .option("header", "true").option("inferSchema", "true").load("/FileStore/tables/Consumption_2019/*.csv") when above, I need an option to skip say first 4 lines on each CSV file, How do I do that?

  • 39798 Views
  • 8 replies
  • 2 kudos
Latest Reply
Michael_Appiah
Contributor
  • 2 kudos

The option... .option("skipRows", <number of rows to skip>) ...works for me as well. However, I am surprised that the official Spark doc does not list it as a CSV Data Source Option: https://spark.apache.org/docs/latest/sql-data-sources-csv.html#data...

  • 2 kudos
7 More Replies
shamly
by New Contributor III
  • 3188 Views
  • 3 replies
  • 2 kudos

How to remove extra ENTER line in csv UTF-16 while reading

Dear Friends,I have a csv and it looks like this‡‡Id‡‡,‡‡Version‡‡,‡‡Questionnaire‡‡,‡‡Date‡‡‡‡123456‡‡,‡‡Version2‡‡,‡‡All questions have been answered accurately and the guidance in the questionnaire was understood and followed‡‡,‡‡2010-12-16 00:01:...

  • 3188 Views
  • 3 replies
  • 2 kudos
Latest Reply
Aviral-Bhardwaj
Esteemed Contributor III
  • 2 kudos

This is working fine, from pyspark.sql.functions import regexp_replace   path="dbfs:/FileStore/df/test.csv" dff = spark.read.option("header", "true").option("inferSchema", "true").option('multiline', 'true').option('encoding', 'UTF-8').option("delimi...

  • 2 kudos
2 More Replies
Raagavi
by New Contributor
  • 2201 Views
  • 1 replies
  • 1 kudos

Is there a way to read the CSV files automatically from on-premises network locations and write back to the same from Databricks?

Is there a way to read the CSV files automatically from on-premises network locations and write back to the same from Databricks? 

  • 2201 Views
  • 1 replies
  • 1 kudos
Latest Reply
Debayan
Esteemed Contributor III
  • 1 kudos

Hi @Raagavi Rajagopal​ , you can access files on mounted object storage (just an example) or files, please refer: https://docs.databricks.com/files/index.html#access-files-on-mounted-object-storageAnd in the DBFS , CSV files can be read and write fr...

  • 1 kudos
Shay
by New Contributor III
  • 6168 Views
  • 8 replies
  • 6 kudos

Resolved! How do you Upload TXT and CSV files into Shared Workspace in Databricks?

I try to upload the needed files under the right directory of the project to work.The files are zipped first as that is an accepted format. I have a Python project which requires the TXT and CSV format files as they are called and used via .py files ...

  • 6168 Views
  • 8 replies
  • 6 kudos
Latest Reply
-werners-
Esteemed Contributor III
  • 6 kudos

@Shay Alam​, can you share the code with which you read the files? Apparently python interprets the file format as a language, so it seems like some options are not filled in correctly.

  • 6 kudos
7 More Replies
lprevost
by Contributor
  • 2286 Views
  • 1 replies
  • 1 kudos

Resolved! Schema inferrence CSV picks up \r carriage returns

I'm using: frame = spark.read.csv(path=bucket+folder, inferSchema = True, header = True, multiLine=True ) to read in a series of CSV ...

  • 2286 Views
  • 1 replies
  • 1 kudos
Latest Reply
Hubert-Dudek
Esteemed Contributor III
  • 1 kudos

Files saved in Windows operation system contain carriage return and line feed in every line.Please add following option it can help: .option("ignoreTrailingWhiteSpace", true)

  • 1 kudos
Labels