Machine Learning
Dive into the world of machine learning on the Databricks platform. Explore discussions on algorithms, model training, deployment, and more. Connect with ML enthusiasts and experts.

Convert the TensorFlow dataset to NumPy tuples

javeed
New Contributor

Hello everyone,

Here is the sequence of steps I have followed:
1. I used Petastorm to convert the Spark DataFrame to a tf.data.Dataset:

import numpy as np

# `converter` is assumed to come from petastorm.spark.make_spark_converter(df)
# Read the Petastorm dataset and convert it to a TensorFlow Dataset
with converter.make_tf_dataset() as tf_dataset:
    # Batch and shuffle the TensorFlow Dataset
    tf_dataset = tf_dataset.shuffle(buffer_size=1024).batch(32)

    # Convert the TensorFlow Dataset to a NumPy array
    numpy_data = []
    for batch in tf_dataset:
        numpy_data.append(batch.numpy())  # this line raises the AttributeError below

    # Concatenate all batches into a single NumPy array
    numpy_data = np.concatenate(numpy_data, axis=0)
    print(numpy_data)

But with this code I get the following error:
AttributeError: 'inferred_schema_view' object has no attribute 'numpy'
My goal is to convert the tf.data.Dataset to NumPy tuples.





Ismael-K
Databricks Employee

The error occurs because the elements of the tf.data.Dataset yielded by make_tf_dataset() are not single tensors: each batch is an inferred_schema_view object, a namedtuple that Petastorm builds from the inferred dataset schema, holding one tensor per column. The namedtuple itself has no .numpy() attribute, so calling batch.numpy() raises the AttributeError.
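A minimal reproduction of the failure that runs without Spark or TensorFlow. FakeTensor is a hypothetical stand-in for an eager tf.Tensor (only its .numpy() method matters here), and the column names features and label are invented for illustration; the namedtuple type is simply named after the object in the error message.

```python
from collections import namedtuple

import numpy as np


class FakeTensor:
    """Hypothetical stand-in for an eager tf.Tensor: it only exposes .numpy()."""

    def __init__(self, values):
        self._values = np.asarray(values)

    def numpy(self):
        return self._values


# One batch shaped like a Petastorm element: a namedtuple with one tensor per column.
# The column names "features" and "label" are hypothetical.
InferredSchemaView = namedtuple("inferred_schema_view", ["features", "label"])
batch = InferredSchemaView(
    features=FakeTensor([[0.1, 0.2], [0.3, 0.4]]),
    label=FakeTensor([0, 1]),
)

error_message = None
try:
    batch.numpy()  # what the original loop does: the namedtuple has no .numpy()
except AttributeError as err:
    error_message = str(err)

print(error_message)  # prints: 'inferred_schema_view' object has no attribute 'numpy'

# Each field, by contrast, is a tensor and can be converted on its own.
label_array = batch.label.numpy()
print(label_array)
```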


Instead of calling .numpy() on the batch itself, iterate over its fields (one tensor per column) and call .numpy() on each tensor individually.
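A sketch of that per-field conversion, collecting everything into a single namedtuple of NumPy arrays (one array per column). batches_to_numpy_tuple and FakeTensor are hypothetical helpers, and the demo batches only imitate what Petastorm would yield. Two assumptions about the original loop, based on the Petastorm converter's documented defaults: make_tf_dataset() already returns batched data, so the extra .batch(32) is unnecessary, and its num_epochs parameter defaults to None, which repeats the data indefinitely, so a loop that collects all batches needs num_epochs=1 to terminate.

```python
from collections import namedtuple

import numpy as np


class FakeTensor:
    """Hypothetical stand-in for an eager tf.Tensor; only .numpy() is needed."""

    def __init__(self, values):
        self._values = np.asarray(values)

    def numpy(self):
        return self._values


def batches_to_numpy_tuple(batches):
    """Turn an iterable of namedtuple batches (one tensor per column) into a
    namedtuple of NumPy arrays, concatenated along the batch axis (axis 0)."""
    batch_type = None
    columns = []  # columns[i] collects the arrays of field i across all batches
    for batch in batches:
        if batch_type is None:
            batch_type = type(batch)
            columns = [[] for _ in batch]
        # Call .numpy() on each tensor inside the namedtuple, not on the batch.
        for column, tensor in zip(columns, batch):
            column.append(tensor.numpy())
    if batch_type is None:
        raise ValueError("the dataset yielded no batches")
    return batch_type(*(np.concatenate(column, axis=0) for column in columns))


# Demo with two hypothetical batches of sizes 2 and 1.
Row = namedtuple("inferred_schema_view", ["features", "label"])
fake_batches = [
    Row(FakeTensor([[0.1, 0.2], [0.3, 0.4]]), FakeTensor([0, 1])),
    Row(FakeTensor([[0.5, 0.6]]), FakeTensor([1])),
]
numpy_data = batches_to_numpy_tuple(fake_batches)
print(numpy_data.features.shape, numpy_data.label)  # prints: (3, 2) [0 1 1]

# With a real converter (not run here), the helper would be called inside the
# context manager, for example:
#     with converter.make_tf_dataset(num_epochs=1) as tf_dataset:
#         numpy_data = batches_to_numpy_tuple(tf_dataset)
```

As an alternative to the manual loop, tf.data.Dataset.as_numpy_iterator() yields each element with its nested structure (here the namedtuple) preserved and every tensor already converted to a NumPy array. Either way, concatenating everything only makes sense when the whole dataset fits in driver memory.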

 
