Databricks Community

Yyyyy · ‎07-24-2024

Expected number of rows: 16400

Only 20 records are shown.

Script

spark.conf.set(
    "~~REDACTED~~",
    "~~REDACTED~~"
)

# File location
file_location = "~~REDACTED~~"

# Read in the data to dataframe df
df = spark.read.format("CSV").option("inferSchema", "true").option("header", "true").option("delimiter", ",").load(file_location).show()

romy · ‎07-24-2024

Hi, the show() method prints only the first 20 rows by default. Its signature is DataFrame.show(n: int = 20, truncate: Union[bool, int] = True, vertical: bool = False) (see https://spark.apache.org/docs/latest/api/python/reference/pyspark.sql/api/pyspark.sql.DataFrame.show...)

You can either use show() with a bigger n parameter, or use the Databricks display() command to print the dataframe in a tabular format:

df = spark.read.format("CSV").option("inferSchema", "true").option("header", "true").option("delimiter", ",").load(file_location)

display(df)

https://www.databricks.com/spark/getting-started-with-apache-spark/dataframes#view-the-dataframe
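If you would rather stay with show(), here is a minimal sketch of the "bigger n" option. The helper names read_csv and show_all_rows are made up for illustration and are not part of Spark. It assumes the notebook's predefined spark session and your own file_location. Also note that show() returns None, so ending the read chain with .show() leaves df set to None rather than to a DataFrame.

```python
def read_csv(spark, file_location):
    """Load a CSV file into a DataFrame (no trailing .show(), so a DataFrame is returned)."""
    return (
        spark.read.format("csv")
        .option("inferSchema", "true")
        .option("header", "true")
        .option("delimiter", ",")
        .load(file_location)
    )


def show_all_rows(df, truncate=False):
    """Print every row of df instead of the default 20 and return the row count."""
    row_count = df.count()  # triggers a full pass over the data
    df.show(n=row_count, truncate=truncate)
    return row_count


# In a Databricks notebook, `spark` and `file_location` are assumed to exist already:
# df = read_csv(spark, file_location)
# print(show_all_rows(df))
```

Printing all rows as text can be slow and hard to read for large files, so display(df) is usually the more comfortable choice when it is available.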

Yyyyy · ‎07-24-2024

Hi, please take a look and help me.

spark.conf.set(
    "REDACTED",
    "REDACTED"
)

# File location
file_location = "REDACTED"

# Read in the data to dataframe df
df = spark.read.format("CSV").option("inferSchema", "true").option("header", "true").option("delimiter", ",").load(file_location)

# Display the dataframe
display(df)

error - Failed to upload command result to DBFS. Error message: Status code: 301 Moved Permanently, Error message: <?xml version="1.0" encoding="UTF-8"?> <Error><Code>PermanentRedirect</Code><Message>The bucket you are attempting to access must be addressed using the specified endpoint. Please send all future requests to this endpoint.</Message><Endpoint>command-results.s3.amazonaws.com</Endpoint><Bucket>command-results</Bucket><RequestId>AMNJ84M2CZ0G4MFK</RequestId><HostId>rXnbI5MLQZdZmhOfF/SbvNDErLlAqj92hFAxcTi4cwGqo2Qe2E1VIDkMoyAOUpIkBLePYy4+up4=</HostId></Error>

When I try to use display(), I get the error above.