<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Convert the TensorFlow dataset to NumPy tuples in Machine Learning</title>
    <link>https://community.databricks.com/t5/machine-learning/convert-the-tensorflow-datatset-to-numpy-tuples/m-p/97838#M3781</link>
    <description>&lt;P&gt;Hello everyone,&lt;BR /&gt;&lt;BR /&gt;Here is the sequence of steps I followed:&lt;BR /&gt;1. I used Petastorm to convert the Spark DataFrame to a tf.data.Dataset.&lt;/P&gt;
&lt;DIV&gt;
&lt;DIV&gt;&lt;FONT color="#993300"&gt;import numpy as np&lt;/FONT&gt;&lt;/DIV&gt;
&lt;DIV&gt;&lt;FONT color="#993300"&gt;# Read the Petastorm dataset and convert it to a TensorFlow Dataset&lt;/FONT&gt;&lt;/DIV&gt;
&lt;DIV&gt;&lt;FONT color="#993300"&gt;with converter.make_tf_dataset() as tf_dataset:&lt;/FONT&gt;&lt;/DIV&gt;
&lt;DIV&gt;&lt;FONT color="#993300"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;# Shuffle and batch the TensorFlow Dataset&lt;/FONT&gt;&lt;/DIV&gt;
&lt;DIV&gt;&lt;FONT color="#993300"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;tf_dataset = tf_dataset.shuffle(buffer_size=1024).batch(32)&lt;/FONT&gt;&lt;/DIV&gt;
&lt;BR /&gt;
&lt;DIV&gt;&lt;FONT color="#993300"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;# Convert the TensorFlow Dataset to a NumPy array&lt;/FONT&gt;&lt;/DIV&gt;
&lt;DIV&gt;&lt;FONT color="#993300"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;numpy_data = []&lt;/FONT&gt;&lt;/DIV&gt;
&lt;DIV&gt;&lt;FONT color="#993300"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;for batch in tf_dataset:&lt;/FONT&gt;&lt;/DIV&gt;
&lt;DIV&gt;&lt;FONT color="#993300"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;numpy_data.append(batch.numpy())&lt;/FONT&gt;&lt;/DIV&gt;
&lt;BR /&gt;
&lt;DIV&gt;&lt;FONT color="#993300"&gt;# Concatenate all batches into a single NumPy array&lt;/FONT&gt;&lt;/DIV&gt;
&lt;DIV&gt;&lt;FONT color="#993300"&gt;numpy_data = np.concatenate(numpy_data, axis=0)&lt;/FONT&gt;&lt;/DIV&gt;
&lt;DIV&gt;&lt;FONT color="#993300"&gt;print(numpy_data)&lt;/FONT&gt;&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;P&gt;But with this code I get an error saying:&lt;BR /&gt;'inferred_schema_view' object has no attribute numpy&lt;BR /&gt;My goal is to convert the tf.data.Dataset to NumPy tuples.&lt;/P&gt;</description>
    <pubDate>Wed, 06 Nov 2024 06:00:39 GMT</pubDate>
    <dc:creator>javeed</dc:creator>
    <dc:date>2024-11-06T06:00:39Z</dc:date>
    <item>
      <title>Convert the TensorFlow dataset to NumPy tuples</title>
      <link>https://community.databricks.com/t5/machine-learning/convert-the-tensorflow-datatset-to-numpy-tuples/m-p/97838#M3781</link>
      <description>&lt;P&gt;Hello everyone,&lt;BR /&gt;&lt;BR /&gt;Here is the sequence of steps I followed:&lt;BR /&gt;1. I used Petastorm to convert the Spark DataFrame to a tf.data.Dataset.&lt;/P&gt;
&lt;DIV&gt;
&lt;DIV&gt;&lt;FONT color="#993300"&gt;import numpy as np&lt;/FONT&gt;&lt;/DIV&gt;
&lt;DIV&gt;&lt;FONT color="#993300"&gt;# Read the Petastorm dataset and convert it to a TensorFlow Dataset&lt;/FONT&gt;&lt;/DIV&gt;
&lt;DIV&gt;&lt;FONT color="#993300"&gt;with converter.make_tf_dataset() as tf_dataset:&lt;/FONT&gt;&lt;/DIV&gt;
&lt;DIV&gt;&lt;FONT color="#993300"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;# Shuffle and batch the TensorFlow Dataset&lt;/FONT&gt;&lt;/DIV&gt;
&lt;DIV&gt;&lt;FONT color="#993300"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;tf_dataset = tf_dataset.shuffle(buffer_size=1024).batch(32)&lt;/FONT&gt;&lt;/DIV&gt;
&lt;BR /&gt;
&lt;DIV&gt;&lt;FONT color="#993300"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;# Convert the TensorFlow Dataset to a NumPy array&lt;/FONT&gt;&lt;/DIV&gt;
&lt;DIV&gt;&lt;FONT color="#993300"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;numpy_data = []&lt;/FONT&gt;&lt;/DIV&gt;
&lt;DIV&gt;&lt;FONT color="#993300"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;for batch in tf_dataset:&lt;/FONT&gt;&lt;/DIV&gt;
&lt;DIV&gt;&lt;FONT color="#993300"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;numpy_data.append(batch.numpy())&lt;/FONT&gt;&lt;/DIV&gt;
&lt;BR /&gt;
&lt;DIV&gt;&lt;FONT color="#993300"&gt;# Concatenate all batches into a single NumPy array&lt;/FONT&gt;&lt;/DIV&gt;
&lt;DIV&gt;&lt;FONT color="#993300"&gt;numpy_data = np.concatenate(numpy_data, axis=0)&lt;/FONT&gt;&lt;/DIV&gt;
&lt;DIV&gt;&lt;FONT color="#993300"&gt;print(numpy_data)&lt;/FONT&gt;&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;P&gt;But with this code I get an error saying:&lt;BR /&gt;'inferred_schema_view' object has no attribute numpy&lt;BR /&gt;My goal is to convert the tf.data.Dataset to NumPy tuples.&lt;/P&gt;</description>
      <pubDate>Wed, 06 Nov 2024 06:00:39 GMT</pubDate>
      <guid>https://community.databricks.com/t5/machine-learning/convert-the-tensorflow-datatset-to-numpy-tuples/m-p/97838#M3781</guid>
      <dc:creator>javeed</dc:creator>
      <dc:date>2024-11-06T06:00:39Z</dc:date>
    </item>
    <item>
      <title>Re: Convert the TensorFlow dataset to NumPy tuples</title>
      <link>https://community.databricks.com/t5/machine-learning/convert-the-tensorflow-datatset-to-numpy-tuples/m-p/112825#M3995</link>
      <description>&lt;P&gt;The error occurs because each element yielded by the dataset from &lt;STRONG&gt;&lt;FONT face="andale mono,times"&gt;make_tf_dataset()&lt;/FONT&gt;&lt;/STRONG&gt; is an &lt;FONT face="andale mono,times"&gt;inferred_schema_view&lt;/FONT&gt; object: a namedtuple that Petastorm builds from the DataFrame schema, holding one tensor per column. The namedtuple itself has no &lt;STRONG&gt;&lt;FONT face="andale mono,times"&gt;.numpy()&lt;/FONT&gt;&lt;/STRONG&gt; method, so calling &lt;STRONG&gt;&lt;FONT face="andale mono,times"&gt;batch.numpy()&lt;/FONT&gt;&lt;/STRONG&gt; raises the AttributeError.&amp;nbsp;&lt;A href="https://petastorm.readthedocs.io/en/latest/api.html#petastorm.spark.spark_dataset_converter.SparkDatasetConverter.make_tf_dataset" target="_self"&gt;Reference link&lt;/A&gt;&lt;/P&gt;
&lt;P&gt;Instead of calling &lt;FONT face="andale mono,times"&gt;&lt;STRONG&gt;.numpy()&lt;/STRONG&gt;&lt;/FONT&gt; directly on batch, iterate over its fields and convert each individual tensor using &lt;FONT face="andale mono,times"&gt;&lt;STRONG&gt;.numpy()&lt;/STRONG&gt;&lt;/FONT&gt;.&lt;/P&gt;
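&lt;P&gt;A minimal sketch of that per-field conversion, assuming each batch is a namedtuple whose fields are either NumPy arrays or objects exposing .numpy(). It does not call Petastorm or TensorFlow: the batches below are stand-ins, the field names features and label are hypothetical, and the helper names to_numpy and batches_to_numpy_tuple are only illustrative.&lt;/P&gt;
&lt;PRE&gt;
```python
from collections import namedtuple

import numpy as np


def to_numpy(value):
    """Return value as a NumPy array, calling .numpy() when the object offers it."""
    if hasattr(value, "numpy"):
        value = value.numpy()  # eager TensorFlow tensors expose .numpy()
    return np.asarray(value)


def batches_to_numpy_tuple(batches):
    """Concatenate an iterable of namedtuple batches into one tuple of arrays."""
    columns = None
    for batch in batches:
        if columns is None:
            # One accumulator list per field of the namedtuple.
            columns = [[] for _ in batch]
        for column, field in zip(columns, batch):
            column.append(to_numpy(field))
    if columns is None:
        return ()  # no batches were produced
    return tuple(np.concatenate(column, axis=0) for column in columns)


# Stand-in batches; the field names are hypothetical, not taken from the thread.
Batch = namedtuple("inferred_schema_view", ["features", "label"])
fake_batches = [
    Batch(np.ones((2, 3)), np.array([0, 1])),
    Batch(np.zeros((1, 3)), np.array([1])),
]

features, label = batches_to_numpy_tuple(fake_batches)
print(features.shape, label.shape)
```
&lt;/PRE&gt;
&lt;P&gt;With the real dataset, tf_dataset would be passed in place of fake_batches while still inside the with block, assuming TensorFlow runs eagerly. According to the Petastorm documentation linked above, make_tf_dataset() already batches its output and repeats indefinitely unless num_epochs is set, so the extra .batch(32) is probably unnecessary and num_epochs=1 is likely needed for the loop to terminate.&lt;/P&gt;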
</description>
      <pubDate>Mon, 17 Mar 2025 16:46:16 GMT</pubDate>
      <guid>https://community.databricks.com/t5/machine-learning/convert-the-tensorflow-datatset-to-numpy-tuples/m-p/112825#M3995</guid>
      <dc:creator>Ismael-K</dc:creator>
      <dc:date>2025-03-17T16:46:16Z</dc:date>
    </item>
  </channel>
</rss>

