showing only a limited number of lines from the CSV file

Yyyyy — Wed, 24 Jul 2024 10:48:50 GMT

Expected number of lines: 16400

Only 20 records are shown.

Script

spark.conf.set(
    "~~REDACTED~~",
    "~~REDACTED~~"
)

# File location

file_location = "~~REDACTED~~"

# Read in the data to dataframe df

df = spark.read.format("CSV").option("inferSchema", "true").option("header", "true").option("delimiter", ",").load(file_location)

# show() returns None, so call it separately instead of assigning its result to df
df.show()

Re: showing only a limited number of lines from the CSV file

romy — Wed, 24 Jul 2024 09:45:03 GMT

Hi, the show() method prints only the top 20 rows by default: DataFrame.show(n: int = 20, truncate: Union[bool, int] = True, vertical: bool = False) (see https://spark.apache.org/docs/latest/api/python/reference/pyspark.sql/api/pyspark.sql.DataFrame.show.html)

You can either use show() with a bigger n parameter, or use the Databricks display() command to print the dataframe in a tabular format:

df = spark.read.format("CSV").option("inferSchema", "true").option("header", "true").option("delimiter", ",").load(file_location)
display(df)

https://www.databricks.com/spark/getting-started-with-apache-spark/dataframes#view-the-dataframe
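For the first option, here is a minimal sketch of calling show() with a larger n. The helper name show_rows and its max_rows parameter are made up for illustration and are not part of the Spark API; it only relies on DataFrame.count() and the DataFrame.show() signature quoted above, and it assumes df is the DataFrame loaded from the CSV file.

```python
def show_rows(df, max_rows=None, truncate=False):
    """Print up to max_rows rows of a Spark DataFrame.

    Hypothetical helper: if max_rows is None, every row is printed.
    Returns the number of rows requested from show().
    """
    # count() triggers a Spark job, so this reads the whole file once
    total = df.count()
    n = total if max_rows is None else min(total, max_rows)
    # Pass n explicitly; the default of 20 is why only 20 rows appeared
    df.show(n=n, truncate=truncate)
    return n


# Usage in the notebook, with df loaded as in the original script:
# show_rows(df)                 # prints all rows (16400 expected here)
# show_rows(df, max_rows=100)   # prints at most 100 rows
```

Printing thousands of rows as text can be slow and hard to read, so a cap such as max_rows=100 is usually more practical for a quick check, with df.count() used to confirm the total.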

Re: showing only a limited number of lines from the CSV file

Yyyyy — Wed, 24 Jul 2024 11:14:04 GMT

Hi, please take a look and help me.

spark.conf.set(
    "REDACTED",
    "REDACTED"
)

# File location

file_location = "REDACTED"

# Read in the data to dataframe df

df = spark.read.format("CSV").option("inferSchema", "true").option("header", "true").option("delimiter", ",").load(file_location)

# Display the dataframe

display(df)

error - Failed to upload command result to DBFS. Error message: Status code: 301 Moved Permanently, Error message: <?xml version="1.0" encoding="UTF-8"?> <Error><Code>PermanentRedirect</Code><Message>The bucket you are attempting to access must be addressed using the specified endpoint. Please send all future requests to this endpoint.</Message><Endpoint>command-results.s3.amazonaws.com</Endpoint><Bucket>command-results</Bucket><RequestId>AMNJ84M2CZ0G4MFK</RequestId><HostId>rXnbI5MLQZdZmhOfF/SbvNDErLlAqj92hFAxcTi4cwGqo2Qe2E1VIDkMoyAOUpIkBLePYy4+up4=</HostId></Error>

When I try to use display(), I get the error above.

Re: showing only a limited number of lines from the CSV file

szymon_dybczak — Wed, 24 Jul 2024 10:56:36 GMT

Hi @Yyyyy ,

You should edit your question and redact the key you're setting in the Spark session.