
Model Training Data Adapter Error.

NathanLaw
New Contributor III

We are converting a PySpark DataFrame to TensorFlow using Petastorm and have encountered a “data adapter” error. What do you recommend for diagnosing and fixing this error?
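
For context, the conversion path looks roughly like the sketch below (the cache path, table, and column names are placeholders; this assumes the petastorm.spark converter API from the linked docs):

  from pyspark.sql import SparkSession
  from petastorm.spark import SparkDatasetConverter, make_spark_converter

  spark = SparkSession.builder.getOrCreate()

  # Petastorm materializes the DataFrame as Parquet under this cache directory.
  spark.conf.set(SparkDatasetConverter.PARENT_CACHE_DIR_URL_CONF,
                 "file:///dbfs/tmp/petastorm/cache")

  df = spark.table("my_training_table")   # placeholder source table
  converter = make_spark_converter(df)    # writes the DataFrame once for reuse

  with converter.make_tf_dataset(batch_size=32) as tf_dataset:
      # Each element is a batched namedtuple of the DataFrame's columns.
      for batch in tf_dataset.take(1):
          print(batch)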

https://docs.microsoft.com/en-us/azure/databricks/applications/machine-learning/load-data/petastorm

https://docs.microsoft.com/en-us/azure/databricks/_static/notebooks/deep-learning/petastorm-spark-co...

(Attached screenshots: DataAdpaterError, ClusterDataAdpaterError)

Thanks for the help.

8 REPLIES

Kaniz_Fatma
Community Manager

Hi @Nathan Law, did you already check these requirements?

Requirements

  1. Databricks Runtime 7.3 LTS ML or above. On Databricks Runtime 6.x ML, you need to install petastorm==0.9.0 and pyarrow==0.15.0 on the cluster (see the install sketch after this list).
  2. Node type: one driver and two workers. Databricks recommends using GPU instances.
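
On Databricks Runtime 6.x ML, one way to pin those versions from a notebook is dbutils.library (cluster-scoped libraries configured through the UI work as well); this is only a sketch:

  # Notebook-scoped installs on Databricks Runtime 6.x ML.
  dbutils.library.installPyPI("petastorm", version="0.9.0")
  dbutils.library.installPyPI("pyarrow", version="0.15.0")
  dbutils.library.restartPython()   # restart Python so the pinned versions are picked up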

This notebook demonstrates the following workflow on Databricks:

  1. Load data using Spark.
  2. Convert the Spark DataFrame to a TensorFlow Dataset using the Petastorm spark_dataset_converter (a condensed sketch follows this list).
  3. Feed the data into a single-node TensorFlow model for training.
  4. Feed the data into a distributed hyperparameter tuning function.
  5. Feed the data into a distributed TensorFlow model for training.
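
A condensed, hypothetical sketch of steps 2 and 3 (single-node training; column names are placeholders, and df is the DataFrame loaded in step 1). As an aside, TensorFlow's "failed to find data adapter" message generally means the object handed to fit() is not one of the input types Keras recognizes, so it is worth confirming that fit() receives the tf.data.Dataset produced by make_tf_dataset(), mapped into (inputs, targets) form:

  import tensorflow as tf
  from petastorm.spark import make_spark_converter

  converter = make_spark_converter(df)   # df loaded with Spark in step 1

  with converter.make_tf_dataset(batch_size=64) as ds:
      # Petastorm yields batches as namedtuples of columns; Keras' fit() expects
      # (inputs, targets), so map each namedtuple into that shape first.
      ds = ds.map(lambda batch: (tf.reshape(batch.features, [-1, 1]), batch.label))
      model = tf.keras.Sequential([
          tf.keras.layers.Dense(8, activation="relu"),
          tf.keras.layers.Dense(1),
      ])
      model.compile(optimizer="adam", loss="mse")
      # make_tf_dataset() repeats indefinitely by default, so bound each epoch.
      model.fit(ds, steps_per_epoch=100, epochs=1)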

The example in this notebook is based on the transfer learning tutorial from TensorFlow. It applies the pretrained MobileNetV2 model to the flowers dataset.
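
The transfer-learning setup referred to above looks roughly like this (a sketch, not the notebook's exact code): a pretrained MobileNetV2 base kept frozen, with a new classification head for the five flower classes.

  import tensorflow as tf

  IMG_SHAPE = (160, 160, 3)
  NUM_CLASSES = 5   # the flowers dataset has five classes

  base = tf.keras.applications.MobileNetV2(input_shape=IMG_SHAPE,
                                           include_top=False,
                                           weights="imagenet")
  base.trainable = False   # keep the pretrained ImageNet weights frozen

  model = tf.keras.Sequential([
      base,
      tf.keras.layers.GlobalAveragePooling2D(),
      tf.keras.layers.Dense(NUM_CLASSES),
  ])
  model.compile(optimizer="adam",
                loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
                metrics=["accuracy"])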

Anonymous
Not applicable

Hi @Nathan Law, following up: did you get a chance to check @Kaniz Fatma's previous comments?

Kaniz_Fatma
Community Manager

Hi @Nathan Law, we haven't heard back from you since my last response, and I wanted to check whether you have a resolution yet. If you do have a solution, please share it with the community, as it can be helpful to others. Otherwise, we will respond with more details and try to help.

NathanLaw
New Contributor III

Hi,

From the Petastorm example:

# Make sure the number of partitions is at least the number of workers which is required for distributed training.
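
In code, that requirement amounts to something like this (the worker count is a placeholder for the cluster's size):

  from petastorm.spark import make_spark_converter

  num_workers = 2                     # e.g. the two-worker cluster from the requirements
  df = df.repartition(num_workers)    # at least one partition per training worker
  converter = make_spark_converter(df)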

I am testing a recommendation not to use autoscaling. I'll report back with my findings.

  • Nathan

Kaniz_Fatma
Community Manager

@Nathan Law, please don't forget to click the "Select as Best" option whenever the information provided helps you. Original posters help the community find answers faster by identifying the correct answer.

Anonymous
Not applicable

Hey there @Nathan Law​ 

Hope all is well!

Just wanted to check in: were you able to resolve your issue? If so, would you be happy to share the solution or mark an answer as best? It would be really helpful for the other members too.

We'd love to hear from you.

Cheers!

NathanLaw
New Contributor III

Making progress but still working through issues. I'll post findings when completed.

Anonymous
Not applicable

Hey @Nathan Law​ 

Thank you so much for getting back to us. We will await your response.

We really appreciate your time.
