<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Why does @dlt.table from a table give different results than from a view? in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/why-does-dlt-table-from-a-table-give-different-results-than-from/m-p/33246#M24287</link>
    <description>&lt;P&gt;I have some data in silver that I read in as a view using the __apply_changes function on. I create a table based on this, and I then want to create my gold-table, after doing a .groupBy() and .pivot(). The transformations I do in the gold-table aren't giving my any error messages, they simply aren't being done. If I create my gold-table based on a view instead of a table, the transformations are done. See bottom for fabricated example.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Here's my code:&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;PRE&gt;&lt;CODE&gt;import dlt
from pyspark.sql.functions import col, concat_ws, explode, split, explode_outer, posexplode, concat, lit, expr, first, create_map
&amp;nbsp;
@dlt.view
def silver_messages():
    """Silver messages table"""
    return spark.readStream.table("eqs_cloud.__apply_changes_storage_silver_messages")
&amp;nbsp;
@dlt.view
def groups_hierarchy_vw():
    """Get all groups' hierarchy"""
    
    assigned = dlt.read_stream("silver_messages").select("assignedToUnit.*")
    from_unit = dlt.read_stream("silver_messages").select("fromUnit.*")
    to_unit = dlt.read_stream("silver_messages").select("toUnit.*")
    
    groups_hierarchy_vw = assigned.unionByName(from_unit).unionByName(to_unit)
&amp;nbsp;
    return groups_hierarchy_vw
&amp;nbsp;
@dlt.table
def groups_hierarchy_tbl():
    return dlt.read_stream("groups_hierarchy_vw")
&amp;nbsp;
@dlt.table
def groups_hierarchy_tbl_pivoted():
    return(dlt.read("groups_hierarchy_tbl")
        .select(
            "id",
            "name",
            split("path","/").alias("groups_in_path"),
            posexplode(split("path","/")).alias("pos","value"))
        .select(
            "id",
            "name",
            concat(lit("group"),"pos").alias("group_name"),
            expr("groups_in_path[pos]").alias("val"))
        .groupBy("id", "name")
        .pivot("group_name")
        .agg(first("val"))
    )&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Is there a way I can use the groups_hierarchy_vw view directly in the gold-table at the bottom? If I enter it with dlt.read_stream("groups_hierarchy_vw") I get an error "pyspark.sql.utils.AnalysisException: Queries with streaming sources must be executed with writeStream.start();"&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Here's a fabricated example you can try out. &lt;/P&gt;&lt;PRE&gt;&lt;CODE&gt;import pandas as pd
from pyspark.sql.functions import col, concat_ws, explode, split, explode_outer, posexplode, concat, lit, expr, first, create_map
&amp;nbsp;
pdf = pd.DataFrame({"id": ["1001", "1002", "1003"],
                    "name": ["Dep1", "Dep2", "Dep3"],
                    "path": ["1001", "1001/1002", "1001/1002/1003"]
})
                
df = spark.createDataFrame(pdf)
df.write.mode('overwrite').saveAsTable('fabricated_testtable')
&amp;nbsp;
@dlt.view
def fabricated_hierarchy_vw():
    return spark.read.table('fabricated_testtable')
&amp;nbsp;
@dlt.table
def fabricated_tbl_pivoted():
    return (dlt.read('fabricated_hierarchy_vw')
        .select(
            "id",
            "name",
            split("path","/").alias("groups_in_path"),
            posexplode(split("path","/")).alias("pos","value"))
        .select(
            "id",
            "name",
            concat(lit("group"),"pos").alias("group_name"),
            expr("groups_in_path[pos]").alias("val"))
        .groupBy("id", "name")
        .pivot("group_name")
        .agg(first("val"))        
    )&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;This gives the expected outcome:&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper" image-alt="image"&gt;&lt;img src="https://community.databricks.com/t5/image/serverpage/image-id/1548i814C3CBF9A1143BB/image-size/large?v=v2&amp;amp;px=999" role="button" title="image" alt="image" /&gt;&lt;/span&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;But if you change line 12 to @dlt.table, the transformations are not performed and the outcome is this:&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper" image-alt="image"&gt;&lt;img src="https://community.databricks.com/t5/image/serverpage/image-id/1574i69693D7E47BC6B53/image-size/large?v=v2&amp;amp;px=999" role="button" title="image" alt="image" /&gt;&lt;/span&gt;&lt;/P&gt;</description>
    <pubDate>Mon, 29 Aug 2022 15:49:25 GMT</pubDate>
    <dc:creator>PrebenOlsen</dc:creator>
    <dc:date>2022-08-29T15:49:25Z</dc:date>
    <item>
      <title>Why does @dlt.table from a table give different results than from a view?</title>
      <link>https://community.databricks.com/t5/data-engineering/why-does-dlt-table-from-a-table-give-different-results-than-from/m-p/33246#M24287</link>
      <description>&lt;P&gt;I have some data in silver that I read in as a view using the __apply_changes function on. I create a table based on this, and I then want to create my gold-table, after doing a .groupBy() and .pivot(). The transformations I do in the gold-table aren't giving my any error messages, they simply aren't being done. If I create my gold-table based on a view instead of a table, the transformations are done. See bottom for fabricated example.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Here's my code:&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;PRE&gt;&lt;CODE&gt;import dlt
from pyspark.sql.functions import col, concat_ws, explode, split, explode_outer, posexplode, concat, lit, expr, first, create_map
&amp;nbsp;
@dlt.view
def silver_messages():
    """Silver messages table"""
    return spark.readStream.table("eqs_cloud.__apply_changes_storage_silver_messages")
&amp;nbsp;
@dlt.view
def groups_hierarchy_vw():
    """Get all groups' hierarchy"""
    
    assigned = dlt.read_stream("silver_messages").select("assignedToUnit.*")
    from_unit = dlt.read_stream("silver_messages").select("fromUnit.*")
    to_unit = dlt.read_stream("silver_messages").select("toUnit.*")
    
    groups_hierarchy_vw = assigned.unionByName(from_unit).unionByName(to_unit)
&amp;nbsp;
    return groups_hierarchy_vw
&amp;nbsp;
@dlt.table
def groups_hierarchy_tbl():
    return dlt.read_stream("groups_hierarchy_vw")
&amp;nbsp;
@dlt.table
def groups_hierarchy_tbl_pivoted():
    return(dlt.read("groups_hierarchy_tbl")
        .select(
            "id",
            "name",
            split("path","/").alias("groups_in_path"),
            posexplode(split("path","/")).alias("pos","value"))
        .select(
            "id",
            "name",
            concat(lit("group"),"pos").alias("group_name"),
            expr("groups_in_path[pos]").alias("val"))
        .groupBy("id", "name")
        .pivot("group_name")
        .agg(first("val"))
    )&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Is there a way I can use the groups_hierarchy_vw view directly in the gold-table at the bottom? If I enter it with dlt.read_stream("groups_hierarchy_vw") I get an error "pyspark.sql.utils.AnalysisException: Queries with streaming sources must be executed with writeStream.start();"&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Here's a fabricated example you can try out. &lt;/P&gt;&lt;PRE&gt;&lt;CODE&gt;import pandas as pd
from pyspark.sql.functions import col, concat_ws, explode, split, explode_outer, posexplode, concat, lit, expr, first, create_map
&amp;nbsp;
pdf = pd.DataFrame({"id": ["1001", "1002", "1003"],
                    "name": ["Dep1", "Dep2", "Dep3"],
                    "path": ["1001", "1001/1002", "1001/1002/1003"]
})
                
df = spark.createDataFrame(pdf)
df.write.mode('overwrite').saveAsTable('fabricated_testtable')
&amp;nbsp;
@dlt.view
def fabricated_hierarchy_vw():
    return spark.read.table('fabricated_testtable')
&amp;nbsp;
@dlt.table
def fabricated_tbl_pivoted():
    return (dlt.read('fabricated_hierarchy_vw')
        .select(
            "id",
            "name",
            split("path","/").alias("groups_in_path"),
            posexplode(split("path","/")).alias("pos","value"))
        .select(
            "id",
            "name",
            concat(lit("group"),"pos").alias("group_name"),
            expr("groups_in_path[pos]").alias("val"))
        .groupBy("id", "name")
        .pivot("group_name")
        .agg(first("val"))        
    )&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;This gives the expected outcome:&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper" image-alt="image"&gt;&lt;img src="https://community.databricks.com/t5/image/serverpage/image-id/1548i814C3CBF9A1143BB/image-size/large?v=v2&amp;amp;px=999" role="button" title="image" alt="image" /&gt;&lt;/span&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;But if you change line 12 to @dlt.table, the transformations are not performed and the outcome is this:&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper" image-alt="image"&gt;&lt;img src="https://community.databricks.com/t5/image/serverpage/image-id/1574i69693D7E47BC6B53/image-size/large?v=v2&amp;amp;px=999" role="button" title="image" alt="image" /&gt;&lt;/span&gt;&lt;/P&gt;</description>
      <pubDate>Mon, 29 Aug 2022 15:49:25 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/why-does-dlt-table-from-a-table-give-different-results-than-from/m-p/33246#M24287</guid>
      <dc:creator>PrebenOlsen</dc:creator>
      <dc:date>2022-08-29T15:49:25Z</dc:date>
    </item>
    <item>
      <title>Re: Why does @dlt.table from a table give different results than from a view?</title>
      <link>https://community.databricks.com/t5/data-engineering/why-does-dlt-table-from-a-table-give-different-results-than-from/m-p/33247#M24288</link>
      <description>&lt;P&gt;I have found a temporary solution to solve this. &lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;The .pivot("columnName") should automatically grab all the values it can find, but for some reason it does not. I need to specify the values, using&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;.pivot("group_name", "group0", "group1", "group2"...) &lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;or &lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;.pivot("group_name", ["group{}".format(i) for i in range(8)])&lt;/P&gt;</description>
      <pubDate>Tue, 30 Aug 2022 10:44:51 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/why-does-dlt-table-from-a-table-give-different-results-than-from/m-p/33247#M24288</guid>
      <dc:creator>PrebenOlsen</dc:creator>
      <dc:date>2022-08-30T10:44:51Z</dc:date>
    </item>
  </channel>
</rss>

