Data Engineering

Databricks pyspark - Find columns in xls file.

weldermartins
Honored Contributor

Hello everyone. Every day I extract data into xls files, but the column positions change from day to day. Is there any way to find these columns within the file?

Here's a snippet of my code.

df = spark.read.format("com.crealytics.spark.excel")\
  .option("header", "true")\
  .schema(schema)\
  .option("dataAddress", "'releases'!A27:D78") \
  .load("dbfs:/FileStore/tables/invoice_september.xls")
df.display()

1 ACCEPTED SOLUTION

You can also run df.printSchema() to check the column names and types, or dbutils.fs.head(<file_path>) to inspect the beginning of the file and check the header's position. Docs: https://docs.databricks.com/dev-tools/databricks-utils.html

4 REPLIES

Debayan
Esteemed Contributor III

Hi, Thanks for reaching out to community.databricks.com.

Please see the following article on finding column names and data types in a DataFrame, and let us know if it helps: https://sparkbyexamples.com/pyspark/pyspark-find-datatype-column-names-of-dataframe/

You can also run df.printSchema() to check the column names and types, or dbutils.fs.head(<file_path>) to inspect the beginning of the file and check the header's position. Docs: https://docs.databricks.com/dev-tools/databricks-utils.html
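
Since the column order changes from day to day, one option is to look the columns up by header name instead of by position. The following is only a minimal sketch: it assumes the names in df.columns come from the file's header row (not from a positional schema) and that the header names themselves stay stable. The helper locate_columns and the column names used here ("invoice", "amount", "customer") are hypothetical and not taken from the original file.

```python
def locate_columns(available, wanted):
    """Map each wanted column name to (position, actual name) in `available`.

    Matching ignores case and surrounding whitespace, since header cells
    exported to Excel are often inconsistently formatted.
    """
    normalized = {
        name.strip().lower(): (idx, name) for idx, name in enumerate(available)
    }
    found, missing = {}, []
    for name in wanted:
        key = name.strip().lower()
        if key in normalized:
            found[name] = normalized[key]
        else:
            missing.append(name)
    return found, missing


# Hypothetical header for one day's file; in a notebook this would be df.columns.
todays_columns = ["Amount ", "Date", "INVOICE"]
found, missing = locate_columns(todays_columns, ["invoice", "amount", "customer"])

# In a notebook, the matched names could then be selected in a fixed order
# (assumption: df was read with option("header", "true")):
#   df.select(*[actual for _, actual in found.values()])
```

Any name that ends up in missing was not found in that day's header, which is worth checking before the select so that a renamed column does not go unnoticed.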

Vidula
Honored Contributor

Hi @welder martins

Hope all is well! Just wanted to check in: were you able to resolve your issue? If so, would you be happy to share the solution or mark an answer as best? Otherwise, please let us know if you need more help.

We'd love to hear from you.

Thanks!

weldermartins
Honored Contributor

Hello, come shape. Thanks!