<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic How to infer CSV schema with all columns as string by default using spark-csv? in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/how-to-infer-csv-schema-default-all-columns-like-string-using/m-p/29560#M21283</link>
    <description>&lt;P&gt;I am using the spark-csv utility, but when it infers the schema I need all columns to be read as strings by default.&lt;/P&gt;
&lt;P&gt;Thanks in advance.&lt;/P&gt;</description>
    <pubDate>Tue, 19 Jul 2016 15:17:07 GMT</pubDate>
    <dc:creator>Jasam</dc:creator>
    <dc:date>2016-07-19T15:17:07Z</dc:date>
    <item>
      <title>How to infer CSV schema with all columns as string by default using spark-csv?</title>
      <link>https://community.databricks.com/t5/data-engineering/how-to-infer-csv-schema-default-all-columns-like-string-using/m-p/29560#M21283</link>
      <description>&lt;P&gt;I am using the spark-csv utility, but when it infers the schema I need all columns to be read as strings by default.&lt;/P&gt;
&lt;P&gt;Thanks in advance.&lt;/P&gt;</description>
      <pubDate>Tue, 19 Jul 2016 15:17:07 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/how-to-infer-csv-schema-default-all-columns-like-string-using/m-p/29560#M21283</guid>
      <dc:creator>Jasam</dc:creator>
      <dc:date>2016-07-19T15:17:07Z</dc:date>
    </item>
    <item>
      <title>Re: How to infer CSV schema with all columns as string by default using spark-csv?</title>
      <link>https://community.databricks.com/t5/data-engineering/how-to-infer-csv-schema-default-all-columns-like-string-using/m-p/29561#M21284</link>
      <description>&lt;P&gt;You can manually specify the schema, e.g. from https://github.com/databricks/spark-csv:&lt;/P&gt;
import org.apache.spark.sql.SQLContext
import org.apache.spark.sql.types.{StructType, StructField, StringType, IntegerType}

val sqlContext = new SQLContext(sc)
val customSchema = StructType(Array(
  StructField("year", IntegerType, true),
  StructField("make", StringType, true),
  StructField("model", StringType, true),
  StructField("comment", StringType, true),
  StructField("blank", StringType, true)))

val df = sqlContext.read
  .format("com.databricks.spark.csv")
  .option("header", "true") // Use first line of all files as header
  .schema(customSchema)
  .load("cars.csv")
val selectedData = df.select("year", "model")
selectedData.write
  .format("com.databricks.spark.csv")
  .option("header", "true")
  .save("newcars.csv")</description>
      <pubDate>Fri, 22 Jul 2016 16:30:33 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/how-to-infer-csv-schema-default-all-columns-like-string-using/m-p/29561#M21284</guid>
      <dc:creator>User16789201666</dc:creator>
      <dc:date>2016-07-22T16:30:33Z</dc:date>
    </item>
    <item>
      <title>Re: How to infer CSV schema with all columns as string by default using spark-csv?</title>
      <link>https://community.databricks.com/t5/data-engineering/how-to-infer-csv-schema-default-all-columns-like-string-using/m-p/29562#M21285</link>
      <description>&lt;P&gt;I was solving the same issue: I wanted all the columns as text and to deal with the correct cast later, which I solved by recasting every column to string after the schema was inferred. I'm not sure if it's efficient, but it works.&lt;/P&gt;
#pyspark
path = '...'
df = spark.read \
    .option("inferSchema", "true") \
    .csv(path)

for column in df.columns:
    df = df.withColumn(column, df[column].cast('string'))
&lt;B&gt;Then you have to read again with the changed schema:&lt;/B&gt;
&lt;P&gt;df = spark.read.schema(df.schema).csv(path)&lt;/P&gt;
&lt;P&gt;This however doesn't deal with nested columns, though CSV shouldn't create any nested structs.&lt;/P&gt;
</description>
      <pubDate>Thu, 15 Nov 2018 13:57:47 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/how-to-infer-csv-schema-default-all-columns-like-string-using/m-p/29562#M21285</guid>
      <dc:creator>vadeka</dc:creator>
      <dc:date>2018-11-15T13:57:47Z</dc:date>
    </item>
    <item>
      <title>Re: How to infer CSV schema with all columns as string by default using spark-csv?</title>
      <link>https://community.databricks.com/t5/data-engineering/how-to-infer-csv-schema-default-all-columns-like-string-using/m-p/29563#M21286</link>
      <description>&lt;P&gt;@peyman What if I don't want to manually specify the schema?&lt;/P&gt;
&lt;P&gt;For example, I have a vendor that can't build a valid .csv file. I just need to import it somewhere so I can explore the data and find the errors.&lt;/P&gt;
&lt;P&gt;Just like the original author's question: how do I tell Spark to read all columns as string?&lt;/P&gt;
</description>
      <pubDate>Mon, 19 Apr 2021 21:09:25 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/how-to-infer-csv-schema-default-all-columns-like-string-using/m-p/29563#M21286</guid>
      <dc:creator>jhoop2002</dc:creator>
      <dc:date>2021-04-19T21:09:25Z</dc:date>
    </item>
  </channel>
</rss>

