How to use SparkNLP library and JohnSnowLabs maven coordinates in cluster which is not connected to internet

ssy
New Contributor II

Hi,

I am trying the Spark NLP library for the first time. The cluster I'm using is corporate and cannot be connected to the internet. I can only install packages that are provided to us or supplied as a JAR file.

I have three questions:

  1. What JAR files do I need to install the Spark NLP library? I will need BERT transformers and encoders, as well as the other packages required for NER work with Spark NLP.
  2. How can I add the proper JohnSnowLabs Maven coordinates and JAR file to my cluster when it is not connected to the internet?
  3. How can I reference these installed libraries in a notebook running on the cluster?

Thanks!

2 REPLIES

Kaniz
Community Manager

Hi @Samy Syed,

The Spark NLP library and all of its pretrained models/pipelines can be used entirely offline, without internet access. If you are behind a proxy or a firewall with no access to the Maven repository (to download packages) and/or no access to S3 (to download models and pipelines automatically), you can follow these instructions to use Spark NLP offline without any limitations:

  • Instead of using the Maven package, you need to load the Fat JAR.
  • Instead of using PretrainedPipeline for pretrained pipelines or the .pretrained() function to download pretrained models, you must manually download your pipeline/model from Models Hub, extract it, and load it.

Example of a SparkSession with the Fat JAR to use Spark NLP offline:

from pyspark.sql import SparkSession

spark = SparkSession.builder \
    .appName("Spark NLP") \
    .master("local[*]") \
    .config("spark.driver.memory", "16G") \
    .config("spark.driver.maxResultSize", "0") \
    .config("spark.kryoserializer.buffer.max", "2000M") \
    .config("spark.jars", "/tmp/spark-nlp-assembly-4.3.0.jar") \
    .getOrCreate()
  • You can download the provided Fat JARs from each release note; pay attention to pick the one that suits your environment, depending on the device (CPU/GPU) and Apache Spark™ version (3.x).
  • If you are running locally, you can load the Fat JAR from your local FileSystem; however, if you are in a cluster setup, you need to put the Fat JAR on a distributed FileSystem such as HDFS, DBFS, S3, etc. (e.g., hdfs:///tmp/spark-nlp-assembly-4.3.0.jar), as sketched below for a Databricks/DBFS setup.
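
For a Databricks cluster specifically, one way to apply this is to stage the Fat JAR on DBFS first and then point the cluster at it. A minimal sketch, where the DBFS path and JAR file name are placeholders for the release you actually download:

# Copy the Fat JAR (brought into the workspace through whatever offline channel you
# have, e.g. the file upload UI or the Databricks CLI) onto DBFS so every node can see it.
dbutils.fs.cp("file:/tmp/spark-nlp-assembly-4.3.0.jar",
              "dbfs:/FileStore/jars/spark-nlp-assembly-4.3.0.jar")

# Then either install dbfs:/FileStore/jars/spark-nlp-assembly-4.3.0.jar as a cluster
# library (cluster page > Libraries > Install new > JAR), or reference it in the
# cluster's Spark config so it is on the classpath at startup:
#   spark.jars dbfs:/FileStore/jars/spark-nlp-assembly-4.3.0.jar

After that, import sparknlp and the annotator classes are available in any notebook attached to the cluster, which also covers your third question.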

Example of using pretrained Models and Pipelines offline:

# instead of using pretrained() for online:
# french_pos = PerceptronModel.pretrained("pos_ud_gsd", lang="fr")
# you download this model, extract it, and use .load()
from sparknlp.annotator import PerceptronModel

french_pos = PerceptronModel.load("/tmp/pos_ud_gsd_fr_2.0.2_2.4_1556531457346/") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("pos")
 
# example for pipelines
# instead of using PretrainedPipeline:
# pipeline = PretrainedPipeline('explain_document_dl', lang='en')
# you download this pipeline, extract it, and load it with PipelineModel
from pyspark.ml import PipelineModel

pipeline = PipelineModel.load("/tmp/explain_document_dl_en_2.0.2_2.4_1556530585689/")
  • Since you are downloading and loading models/pipelines manually, Spark NLP does not pick the most recent and compatible models/pipelines for you; choosing the proper model/pipeline is up to you.
  • If you are running locally, you can load the model/pipeline from your local FileSystem; however, if you are in a cluster setup, you need to put the model/pipeline on a distributed FileSystem such as HDFS, DBFS, S3, etc. (e.g., hdfs:///tmp/explain_document_dl_en_2.0.2_2.4_1556530585689/); see the sketch below.
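
Once the extracted model directory is on a distributed FileSystem, the loaded annotator plugs into a regular Spark ML pipeline from a notebook attached to the cluster. A minimal sketch, assuming the model was staged at a hypothetical dbfs:/models/ path and reusing the French POS model from above:

from sparknlp.base import DocumentAssembler
from sparknlp.annotator import Tokenizer, PerceptronModel
from pyspark.ml import Pipeline

# hypothetical DBFS location of the manually downloaded and extracted model
pos_model_path = "dbfs:/models/pos_ud_gsd_fr_2.0.2_2.4_1556531457346"

document_assembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token")

french_pos = PerceptronModel.load(pos_model_path) \
    .setInputCols(["document", "token"]) \
    .setOutputCol("pos")

pipeline = Pipeline(stages=[document_assembler, tokenizer, french_pos])

df = spark.createDataFrame([("Bonjour de Paris",)], ["text"])
pipeline.fit(df).transform(df).select("pos.result").show(truncate=False)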

SOURCE

Anonymous
Not applicable

Hi @Samy Syed,

Hope everything is going great.

Just wanted to check in to see whether you were able to resolve your issue. If so, would you be happy to mark an answer as best so that other members can find the solution more quickly? If not, please let us know so we can help you further.

Cheers!
