Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

THIAM_HUATTAN
by Valued Contributor
  • 44577 Views
  • 8 replies
  • 2 kudos

Skip number of rows when reading CSV files

staticDataFrame = (spark.read.format("csv")
    .option("header", "true")
    .option("inferSchema", "true")
    .load("/FileStore/tables/Consumption_2019/*.csv"))

With the above, I need an option to skip, say, the first 4 lines of each CSV file. How do I do that?

Latest Reply
Michael_Appiah
Contributor
  • 2 kudos

The option .option("skipRows", <number of rows to skip>) works for me as well. However, I am surprised that the official Spark doc does not list it as a CSV Data Source Option: https://spark.apache.org/docs/latest/sql-data-sources-csv.html#data...

7 More Replies
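Since skipRows does not appear in the official CSV option list, a portable way to get the same effect is to drop the first N lines of each file before parsing. The sketch below shows the idea with plain Python on an in-memory sample (the SKIP count and sample data are made up for illustration); in Spark, the same line-skipping can be done by reading the files as text and filtering before CSV parsing.

```python
import csv
import io
import itertools

SKIP = 4  # hypothetical number of preamble lines to drop

# In-memory stand-in for one CSV file with 4 junk lines before the header.
sample = "\n".join(
    ["junk 1", "junk 2", "junk 3", "junk 4",
     "id,value", "1,10", "2,20"]
)

buf = io.StringIO(sample)
# islice skips the first SKIP lines; the CSV reader then sees the real header.
rows = list(csv.DictReader(itertools.islice(buf, SKIP, None)))
print(rows)  # [{'id': '1', 'value': '10'}, {'id': '2', 'value': '20'}]
```

The same filter-then-parse shape applies in Spark: read with spark.read.text, drop the leading lines per file, and parse the remainder as CSV.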
shamly
by New Contributor III
  • 3885 Views
  • 3 replies
  • 2 kudos

How to remove extra ENTER line in csv UTF-16 while reading

Dear Friends, I have a csv and it looks like this:
‡‡Id‡‡,‡‡Version‡‡,‡‡Questionnaire‡‡,‡‡Date‡‡
‡‡123456‡‡,‡‡Version2‡‡,‡‡All questions have been answered accurately and the guidance in the questionnaire was understood and followed‡‡,‡‡2010-12-16 00:01:...

Latest Reply
Aviral-Bhardwaj
Esteemed Contributor III
  • 2 kudos

This is working fine:

from pyspark.sql.functions import regexp_replace

path = "dbfs:/FileStore/df/test.csv"
dff = spark.read.option("header", "true").option("inferSchema", "true").option('multiline', 'true').option('encoding', 'UTF-8').option("delimi...

2 More Replies
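The accepted approach (read with an explicit encoding, then strip the stray characters with regexp_replace) can be sketched outside Spark with Python's re module. The double-dagger "quotes" and the extra blank line below mimic the data from the question and are illustrative only.

```python
import re

# Sample mimicking the question: fields wrapped in double daggers,
# plus a spurious blank line between records.
raw = "\u2021\u2021Id\u2021\u2021,\u2021\u2021Version\u2021\u2021\n\n\u2021\u2021123456\u2021\u2021,\u2021\u2021Version2\u2021\u2021\n"

# Strip the double-dagger quoting, then collapse runs of newlines so the
# extra ENTER lines disappear.
cleaned = re.sub(r"\u2021\u2021", "", raw)
cleaned = re.sub(r"\n+", "\n", cleaned)
print(cleaned)  # Id,Version\n123456,Version2\n
```

In Spark the equivalent is applying regexp_replace with the same patterns to each column (or to the raw text) after reading with the correct encoding option.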
Raagavi
by New Contributor
  • 2503 Views
  • 1 reply
  • 1 kudos

Is there a way to read the CSV files automatically from on-premises network locations and write back to the same from Databricks?


Latest Reply
Debayan
Databricks Employee
  • 1 kudos

Hi @Raagavi Rajagopal​, you can access files on mounted object storage (just an example); please refer to https://docs.databricks.com/files/index.html#access-files-on-mounted-object-storage. And in DBFS, CSV files can be read and write fr...

Shay
by New Contributor III
  • 7046 Views
  • 8 replies
  • 6 kudos

Resolved! How do you Upload TXT and CSV files into Shared Workspace in Databricks?

I try to upload the needed files under the right directory of the project to work. The files are zipped first, as that is an accepted format. I have a Python project which requires the TXT and CSV format files, as they are called and used via .py files ...

Latest Reply
-werners-
Esteemed Contributor III
  • 6 kudos

@Shay Alam​, can you share the code with which you read the files? Apparently python interprets the file format as a language, so it seems like some options are not filled in correctly.

7 More Replies
lprevost
by Contributor
  • 2576 Views
  • 1 reply
  • 1 kudos

Resolved! Schema inference for CSV picks up \r carriage returns

I'm using:

frame = spark.read.csv(path=bucket + folder, inferSchema=True, header=True, multiLine=True)

to read in a series of CSV ...

Latest Reply
Hubert-Dudek
Esteemed Contributor III
  • 1 kudos

Files saved on the Windows operating system end every line with a carriage return and a line feed. Please add the following option; it can help: .option("ignoreTrailingWhiteSpace", true)

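A minimal illustration of what goes wrong, using plain Python on a made-up Windows-style sample: when lines are split on \n alone, each record keeps a trailing \r, which sticks to the last column and corrupts the inferred type. Stripping that trailing whitespace, which is what Spark's ignoreTrailingWhiteSpace option does, restores clean values.

```python
# Windows line endings are "\r\n"; splitting on "\n" alone leaves "\r"
# attached to the last field of every record.
raw = "id,score\r\n1,3.5\r\n2,4.0\r\n"

rows = [
    [field.rstrip("\r") for field in line.split(",")]  # drop the stray \r
    for line in raw.split("\n")
    if line  # skip the empty trailing piece after the final newline
]
print(rows)  # [['id', 'score'], ['1', '3.5'], ['2', '4.0']]
```

Without the rstrip, the last column would contain "score\r", "3.5\r", and "4.0\r", so a schema-inference pass would see strings instead of doubles.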