<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>Error when querying a table created by a DLT pipeline; "Couldn't find value of a column" (Data Engineering)</title>
    <link>https://community.databricks.com/t5/data-engineering/error-when-query-a-table-created-by-dlt-pipeline-quot-couldn-t/m-p/15860#M10133</link>
    <description>Querying a table created by a DLT pipeline fails with java.lang.IllegalStateException: "Couldn't find Indicator_Latest_Value_Date" after the column is added with F.lit(None), while adding it with F.lit('') works.</description>
    <pubDate>Mon, 19 Dec 2022 22:09:59 GMT</pubDate>
    <dc:creator>Mado</dc:creator>
    <dc:date>2022-12-19T22:09:59Z</dc:date>
    <item>
      <title>Error when querying a table created by a DLT pipeline; "Couldn't find value of a column"</title>
      <link>https://community.databricks.com/t5/data-engineering/error-when-query-a-table-created-by-dlt-pipeline-quot-couldn-t/m-p/15860#M10133</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;&lt;P&gt;I create a table using a DLT pipeline (triggered once). In the ETL process, I add a new column of null values to the table with:&lt;/P&gt;&lt;PRE&gt;&lt;CODE&gt;output = output.withColumn('Indicator_Latest_Value_Date', F.lit(None))&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;The pipeline runs without any error, but when I query the resulting table with:&lt;/P&gt;&lt;PRE&gt;&lt;CODE&gt;stage_df = spark.sql('select * from marketing.stage_macroeconomics_manual_indicators')
display(stage_df)&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;I get the following error (I couldn't copy the whole message; it is too long):&lt;/P&gt;&lt;PRE&gt;&lt;CODE&gt;java.lang.IllegalStateException: Couldn't find Indicator_Latest_Value_Date#9439 in [Country#9420,Indicator_Name#9421,Indicator_Source#9422,Indicator_Source_URL#9423,Indicator_Unit#9424,Indicator_Category_Group#9425,Indicator_Adjustment#9426,Indicator_Frequency#9427,Calendar_Year#9428,Month_Number#9429,Indicator_Date#9430,Indicator_Value#9431,Excel_Input_Column_Names#9432,Excel_Input_File#9433,Excel_Input_Sheet#9434,Excel_Ingest_Datetime#9435,ETL_InputFile#9436,ETL_LoadDate#9437,ETL_SourceFile_Date#9438,Source_System#9440,Indicator_Title#9441]&lt;/CODE&gt;&lt;/PRE&gt;&lt;PRE&gt;&lt;CODE&gt;---------------------------------------------------------------------------
Py4JJavaError                             Traceback (most recent call last)
&amp;lt;command-1891366018233962&amp;gt; in &amp;lt;cell line: 2&amp;gt;()
      1 stage_df = spark.sql('select * from marketing.stage_macroeconomics_manual_indicators')
----&amp;gt; 2 display(stage_df)
&amp;nbsp;
/databricks/python_shell/dbruntime/display.py in display(self, input, *args, **kwargs)
     81                     raise Exception('Triggers can only be set for streaming queries.')
     82 
---&amp;gt; 83                 self.add_custom_display_data("table", input._jdf)
     84 
     85         elif isinstance(input, list):
&amp;nbsp;
/databricks/python_shell/dbruntime/display.py in add_custom_display_data(self, data_type, data)
     34     def add_custom_display_data(self, data_type, data):
     35         custom_display_key = str(uuid.uuid4())
---&amp;gt; 36         return_code = self.entry_point.addCustomDisplayData(custom_display_key, data_type, data)
     37         ip_display({
     38             "application/vnd.databricks.v1+display": custom_display_key,
&amp;nbsp;
/databricks/spark/python/lib/py4j-0.10.9.5-src.zip/py4j/java_gateway.py in __call__(self, *args)
   1319 
   1320         answer = self.gateway_client.send_command(command)
-&amp;gt; 1321         return_value = get_return_value(
   1322             answer, self.gateway_client, self.target_id, self.name)
   1323 
&amp;nbsp;
/databricks/spark/python/pyspark/sql/utils.py in deco(*a, **kw)
    194     def deco(*a: Any, **kw: Any) -&amp;gt; Any:
    195         try:
--&amp;gt; 196             return f(*a, **kw)
    197         except Py4JJavaError as e:
    198             converted = convert_exception(e.java_exception)
&amp;nbsp;
/databricks/spark/python/lib/py4j-0.10.9.5-src.zip/py4j/protocol.py in get_return_value(answer, gateway_client, target_id, name)
    324             value = OUTPUT_CONVERTER[type](answer[2:], gateway_client)
    325             if answer[1] == REFERENCE_TYPE:
--&amp;gt; 326                 raise Py4JJavaError(
    327                     "An error occurred while calling {0}{1}{2}.\n".
    328                     format(target_id, ".", name), value)
&amp;nbsp;
Py4JJavaError: An error occurred while calling t.addCustomDisplayData.
: java.lang.IllegalStateException: Couldn't find Indicator_Latest_Value_Date#9439 in [Country#9420,Indicator_Name#9421,Indicator_Source#9422,Indicator_Source_URL#9423,Indicator_Unit#9424,Indicator_Category_Group#9425,Indicator_Adjustment#9426,Indicator_Frequency#9427,Calendar_Year#9428,Month_Number#9429,Indicator_Date#9430,Indicator_Value#9431,Excel_Input_Column_Names#9432,Excel_Input_File#9433,Excel_Input_Sheet#9434,Excel_Ingest_Datetime#9435,ETL_InputFile#9436,ETL_LoadDate#9437,ETL_SourceFile_Date#9438,Source_System#9440,Indicator_Title#9441]
	at org.apache.spark.sql.catalyst.expressions.BindReferences$$anonfun$bindReference$1.applyOrElse(BoundAttribute.scala:80)
	at org.apache.spark.sql.catalyst.expressions.BindReferences$$anonfun$bindReference$1.applyOrElse(BoundAttribute.scala:73)
	at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDownWithPruning$1(TreeNode.scala:512)
	at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:99)
	at org.apache.spark.sql.catalyst.trees.TreeNode.transformDownWithPruning(TreeNode.scala:512)
	at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:488)
	at org.apache.spark.sql.catalyst.trees.TreeNode.transform(TreeNode.scala:456)
	at org.apache.spark.sql.catalyst.expressions.BindReferences$.bindReference(BoundAttribute.scala:73)
	at org.apache.spark.sql.catalyst.expressions.BindReferences$.$anonfun$bindReferences$1(BoundAttribute.scala:94)
	at scala.collection.immutable.List.map(List.scala:297)
	at org.apache.spark.sql.catalyst.expressions.BindReferences$.bindReferences(BoundAttribute.scala:94)
	at org.apache.spark.sql.execution.ProjectExec.doConsume(basicPhysicalOperators.scala:70)
	at org.apache.spark.sql.execution.CodegenSupport.consume(WholeStageCodegenExec.scala:197)
	at org.apache.spark.sql.execution.CodegenSupport.consume$(WholeStageCodegenExec.scala:152)
	at org.apache.spark.sql.execution.ColumnarToRowExec.consume(Columnar.scala:71)
	at org.apache.spark.sql.execution.ColumnarToRowExec.doProduce(Columnar.scala:202)
	at org.apache.spark.sql.execution.CodegenSupport.$anonfun$produce$1(WholeStageCodegenExec.scala:98)
	at org.apache.spark.sql.execution.SparkPlan.$anonfun$executeQuery$1(SparkPlan.scala:269)
	at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:165)
	at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:265)
	at org.apache.spark.sql.execution.CodegenSupport.produce(WholeStageCodegenExec.scala:93)
	at org.apache.spark.sql.execution.CodegenSupport.produce$(WholeStageCodegenExec.scala:93)
	at org.apache.spark.sql.execution.ColumnarToRowExec.produce(Columnar.scala:71)
	at org.apache.spark.sql.execution.ProjectExec.doProduce(basicPhysicalOperators.scala:55)
	at org.apache.spark.sql.execution.CodegenSupport.$anonfun$produce$1(WholeStageCodegenExec.scala:98)
	at org.apache.spark.sql.execution.SparkPlan.$anonfun$executeQuery$1(SparkPlan.scala:269)
	at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:165)
	at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:265)
	at org.apache.spark.sql.execution.CodegenSupport.produce(WholeStageCodegenExec.scala:93)
	at org.apache.spark.sql.execution.CodegenSupport.produce$(WholeStageCodegenExec.scala:93)
	at org.apache.spark.sql.execution.ProjectExec.produce(basicPhysicalOperators.scala:45)
	at org.apache.spark.sql.execution.WholeStageCodegenExec.doCodeGen(WholeStageCodegenExec.scala:661)
	at org.apache.spark.sql.execution.WholeStageCodegenExec.doExecute(WholeStageCodegenExec.scala:724)
	at org.apache.spark.sql.execution.SparkPlan.$anonfun$execute$1(SparkPlan.scala:225)
	at org.apache.spark.sql.execution.SparkPlan.$anonfun$executeQuery$1(SparkPlan.scala:269)
	at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:165)
	at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:265)
	at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:221)
	at org.apache.spark.sql.execution.collect.Collector$.collect(Collector.scala:97)
	at org.apache.spark.sql.execution.collect.Collector$.collect(Collector.scala:108)
	at &lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;Note that when I set the column to a fixed string value instead, such as an empty string:&lt;/P&gt;&lt;PRE&gt;&lt;CODE&gt;output = output.withColumn('Indicator_Latest_Value_Date', F.lit(''))&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;the query runs without any error.&lt;/P&gt;&lt;P&gt;What is the possible reason for this error, and how can I solve it? Thanks for your help.&lt;/P&gt;</description>
      <pubDate>Mon, 19 Dec 2022 22:09:59 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/error-when-query-a-table-created-by-dlt-pipeline-quot-couldn-t/m-p/15860#M10133</guid>
      <dc:creator>Mado</dc:creator>
      <dc:date>2022-12-19T22:09:59Z</dc:date>
    </item>
    <item>
      <title>Re: Error when querying a table created by a DLT pipeline; "Couldn't find value of a column"</title>
      <link>https://community.databricks.com/t5/data-engineering/error-when-query-a-table-created-by-dlt-pipeline-quot-couldn-t/m-p/15861#M10134</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;&lt;P&gt;Try casting the None literal to an explicit type, like this:&lt;/P&gt;&lt;PRE&gt;&lt;CODE&gt;output = output.withColumn('Indicator_Latest_Value_Date', F.lit(None).cast("string"))&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;Without the cast, F.lit(None) creates a column of NullType, which has no concrete data type for readers of the table schema to bind against.&lt;/P&gt;</description>
      <pubDate>Tue, 13 Jun 2023 12:17:10 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/error-when-query-a-table-created-by-dlt-pipeline-quot-couldn-t/m-p/15861#M10134</guid>
      <dc:creator>josruiz22</dc:creator>
      <dc:date>2023-06-13T12:17:10Z</dc:date>
    </item>
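    The reply's fix can be reproduced outside DLT: F.lit(None) yields a column whose data type is NullType (rendered as "void" in a schema), whereas casting the null literal gives it a concrete StringType. A minimal sketch, assuming a local PySpark installation (the column name mirrors the thread; everything else here is illustrative):

    ```python
    # Sketch: compare the schema of an untyped null column vs. a cast null column.
    from pyspark.sql import SparkSession
    import pyspark.sql.functions as F

    spark = SparkSession.builder.master("local[1]").appName("nulltype-demo").getOrCreate()

    df = spark.createDataFrame([(1,)], ["id"])

    # Untyped null literal: the column's data type is NullType ("void").
    untyped = df.withColumn("Indicator_Latest_Value_Date", F.lit(None))
    print(untyped.schema["Indicator_Latest_Value_Date"].dataType)  # NullType()

    # Cast null literal: the column gets a concrete StringType, which
    # downstream readers of the stored table can bind to safely.
    typed = df.withColumn("Indicator_Latest_Value_Date", F.lit(None).cast("string"))
    print(typed.schema["Indicator_Latest_Value_Date"].dataType)  # StringType()

    spark.stop()
    ```

    This is why writing F.lit('') also avoided the error in the thread: an empty string is already a StringType literal, so the persisted schema carries a real type either way.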
  </channel>
</rss>

