Model Training Data Adapter Error.

NathanLaw
New Contributor III

We are converting a PySpark DataFrame to TensorFlow using Petastorm and have encountered a “data adapter” error. What do you recommend for diagnosing and fixing this error?

https://docs.microsoft.com/en-us/azure/databricks/applications/machine-learning/load-data/petastorm

https://docs.microsoft.com/en-us/azure/databricks/_static/notebooks/deep-learning/petastorm-spark-co...

(Attached screenshots: DataAdapterError, cluster configuration)

Thanks for the help.

8 REPLIES

Kaniz
Community Manager

Hi @Nathan Law, did you already check these requirements?

Requirements

  1. Databricks Runtime 7.3 LTS ML or above. On Databricks Runtime 6.x ML, you need to install petastorm==0.9.0 and pyarrow==0.15.0 on the cluster (see the install sketch after this list).
  2. Node type: one driver and two workers. Databricks recommends using GPU instances.
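On Databricks Runtime 6.x ML, one way to pin those versions from the notebook is a notebook-scoped install along these lines (a sketch only; attaching the packages as cluster libraries works just as well):

# Notebook-scoped install of the pinned versions required on DBR 6.x ML.
dbutils.library.installPyPI("petastorm", version="0.9.0")
dbutils.library.installPyPI("pyarrow", version="0.15.0")
dbutils.library.restartPython()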

This notebook demonstrates the following workflow on Databricks:

  1. Load data using Spark.
  2. Convert the Spark DataFrame to a TensorFlow Dataset using the Petastorm spark_dataset_converter (sketched below).
  3. Feed the data into a single-node TensorFlow model for training.
  4. Feed the data into a distributed hyperparameter tuning function.
  5. Feed the data into a distributed TensorFlow model for training.

The example in this notebook is based on the transfer learning tutorial from TensorFlow. It applies the pretrained MobileNetV2 model to the flowers dataset.
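For reference, a minimal sketch of what steps 2 and 3 can look like with the Petastorm Spark converter. The DataFrame df, the column names features and label, the cache path, and the compiled Keras model are placeholders rather than the notebook's exact code; the namedtuple-to-tuple map is a common fix when Keras reports that it cannot find a data adapter for the input it was given.

from petastorm.spark import SparkDatasetConverter, make_spark_converter

# Petastorm caches the DataFrame as Parquet files under this directory.
spark.conf.set(SparkDatasetConverter.PARENT_CACHE_DIR_URL_CONF,
               "file:///dbfs/tmp/petastorm/cache")

converter = make_spark_converter(df)  # df: your Spark DataFrame

with converter.make_tf_dataset(batch_size=32) as dataset:
    # make_tf_dataset yields batches as namedtuples; Keras' fit() expects
    # (features, labels) tuples, so map the fields explicitly.
    dataset = dataset.map(lambda batch: (batch.features, batch.label))
    model.fit(dataset, steps_per_epoch=100, epochs=1)  # model: a compiled Keras model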

Anonymous
Not applicable

Hi @Nathan Law, following up: did you get a chance to check @Kaniz Fatma's previous comments?

Kaniz
Community Manager

Hi @Nathan Law, we haven't heard back from you since my last response, and I was checking to see whether you have a resolution yet. If you do, please share it with the community, as it can be helpful to others. Otherwise, we will follow up with more details and try to help.

NathanLaw
New Contributor III

Hi,

From the Petastorm example:

# Make sure the number of partitions is at least the number of workers which is required for distributed training.

I am testing a recommendation to not use autoscaling. I'll report back with findings.

  • Nathan
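For anyone following along, a minimal sketch of the repartitioning that comment refers to; the worker count and DataFrame name are placeholders for your own (fixed-size) cluster and data:

from petastorm.spark import make_spark_converter

# Make sure the number of partitions is at least the number of workers,
# which is required for distributed training.
num_workers = 2  # placeholder: match your cluster's fixed worker count
df = df.repartition(num_workers)

converter = make_spark_converter(df)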

Kaniz
Community Manager

@Nathan Law​ , Please don't forget to click on the "Select as Best" option whenever the information provided helps you. Original posters help the community find answers faster by identifying the correct answer.

Anonymous
Not applicable

Hey there @Nathan Law​ 

Hope all is well!

Just wanted to check in to see if you were able to resolve your issue. If so, would you be happy to share the solution or mark an answer as best? It would be really helpful for the other members too.

We'd love to hear from you.

Cheers!

NathanLaw
New Contributor III

Making progress but still working through issues. I'll post findings when completed.

Anonymous
Not applicable

Hey @Nathan Law​ 

Thank you so much for getting back to us. We will await your response.

We really appreciate your time.
