Hello everyone,
Here is the sequence of steps I have followed:
1. I used Petastorm to convert the Spark DataFrame to a tf.data.Dataset:
import numpy as np

# Read the Petastorm dataset and convert it to TensorFlow Dataset
with converter.make_tf_dataset() as tf_dataset:
    # Batch and shuffle the TensorFlow Dataset
    tf_dataset = tf_dataset.shuffle(buffer_size=1024).batch(32)
    # Convert TensorFlow Dataset to NumPy array
    numpy_data = []
    for batch in tf_dataset:
        numpy_data.append(batch.numpy())
    # Concatenate all batches into a single NumPy array
    numpy_data = np.concatenate(numpy_data, axis=0)
    print(numpy_data)
But with this code I get an error saying:
'inferred_schema_view' object has no attribute numpy
My goal is to convert the tf.data.Dataset into tuples of NumPy arrays.
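
To make the goal concrete, here is a rough sketch of the result I am after. It assumes that each batch is a namedtuple with one field per column (which is what the 'inferred_schema_view' in the error suggests, though I have not confirmed it), and that each field is either an eager tensor with a .numpy() method or already a NumPy array. The names batches_to_numpy_tuple, Row, features and label are made up for illustration and are not part of Petastorm; the fake batches stand in for the real dataset.

```python
from collections import namedtuple

import numpy as np


def batches_to_numpy_tuple(batches):
    """Concatenate an iterable of namedtuple batches field by field.

    Hypothetical helper: returns one namedtuple of the same type whose
    fields are NumPy arrays, or None if there were no batches.
    """
    columns = None
    batch_type = None
    for batch in batches:
        if columns is None:
            # Remember the namedtuple type and start one list per field.
            batch_type = type(batch)
            columns = {name: [] for name in batch._fields}
        for name in batch._fields:
            value = getattr(batch, name)
            # Eager tensors expose .numpy(); plain arrays are used as-is.
            if hasattr(value, "numpy"):
                value = value.numpy()
            columns[name].append(np.asarray(value))
    if columns is None:
        return None
    merged = {name: np.concatenate(parts, axis=0) for name, parts in columns.items()}
    return batch_type(**merged)


# Stand-in data (assumption): two batches with a "features" and a "label" column.
Row = namedtuple("Row", ["features", "label"])
fake_batches = [
    Row(features=np.ones((2, 3)), label=np.array([0, 1])),
    Row(features=np.zeros((1, 3)), label=np.array([1])),
]

result = batches_to_numpy_tuple(fake_batches)
print(result.features.shape, result.label.shape)
```

With the real dataset I would expect to pass tf_dataset (inside the with block) in place of fake_batches, but I have not verified that this works with the Petastorm converter.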