Convert the TensorFlow dataset to NumPy tuples

javeed — Wed, 06 Nov 2024 06:00:39 GMT

Hello everyone,

Here is the sequence of steps I followed:
1. I used Petastorm to convert the Spark DataFrame to a tf.data.Dataset, then tried to collect it into NumPy:

import numpy as np

# Read the Petastorm dataset and convert it to TensorFlow Dataset
with converter.make_tf_dataset() as tf_dataset:
    # Batch and shuffle the TensorFlow Dataset
    tf_dataset = tf_dataset.shuffle(buffer_size=1024).batch(32)

    # Convert TensorFlow Dataset to NumPy array
    numpy_data = []
    for batch in tf_dataset:
        numpy_data.append(batch.numpy())

    # Concatenate all batches into a single NumPy array
    numpy_data = np.concatenate(numpy_data, axis=0)
    print(numpy_data)

With this code I get the following error:
'inferred_schema_view' object has no attribute 'numpy'
My goal is to convert the tf.data dataset to NumPy tuples.

Re: Convert the TensorFlow dataset to NumPy tuples

Ismael-K — Mon, 17 Mar 2025 16:46:16 GMT

The error occurs because each element yielded by the dataset from make_tf_dataset() is an inferred_schema_view object: a named tuple generated by Petastorm from the inferred schema, with one field per column. The named tuple itself has no .numpy() method (only the tensors stored in its fields do), so calling batch.numpy() raises the AttributeError.

Instead of calling .numpy() directly on batch, iterate over its fields and call .numpy() on each individual tensor.
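
As a rough illustration of that suggestion, here is a minimal sketch of a helper that turns each named-tuple batch into a plain tuple of arrays. The helper names (batch_to_numpy_tuple, dataset_to_numpy_tuples), the FakeTensor stand-in, and the column names features and label are all hypothetical. The stand-in exists only so the sketch runs without TensorFlow or Petastorm installed; it assumes the real batch fields are eager tensors exposing .numpy().

```python
from collections import namedtuple


def batch_to_numpy_tuple(batch):
    """Convert one named-tuple batch into a plain tuple, one entry per field."""
    # Assumption: every field exposes .numpy(), as eager tf.Tensor objects do.
    return tuple(field.numpy() for field in batch)


def dataset_to_numpy_tuples(dataset):
    """Collect every batch of an iterable dataset as a tuple of arrays."""
    return [batch_to_numpy_tuple(batch) for batch in dataset]


# --- Stand-in data so the sketch runs on its own (hypothetical) ---
class FakeTensor:
    """Mimics only the .numpy() method of an eager tensor."""

    def __init__(self, values):
        self._values = values

    def numpy(self):
        return self._values


# Hypothetical column names; the real fields come from the Spark DataFrame.
Row = namedtuple("inferred_schema_view", ["features", "label"])
fake_dataset = [
    Row(FakeTensor([1.0, 2.0]), FakeTensor([0, 1])),
    Row(FakeTensor([3.0]), FakeTensor([1])),
]

numpy_tuples = dataset_to_numpy_tuples(fake_dataset)
print(numpy_tuples)
```

With the real dataset, the same helper would be called inside the `with converter.make_tf_dataset() as tf_dataset:` block, for example `numpy_tuples = dataset_to_numpy_tuples(tf_dataset)`. It is worth checking whether the dataset repeats indefinitely by default; if it does, the loop needs a bounded number of epochs or an explicit stopping condition to terminate.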
