
UDF LLM Databricks pickle error

llmnerd
New Contributor

Hi there,

I am trying to parallelize a text extraction task with a Spark UDF that calls a Databricks foundation model.

Any pointers, suggestions, or examples are welcome.

The code and the error are below.

model = "databricks-meta-llama-3-1-70b-instruct"
temperature=0.0
max_tokens=1024

schema_llm = StructType([
    StructField("contains_vulnerability", BooleanType(), True),
])

chat_model = ChatDatabricks(
            endpoint=model,
            temperature=temperature,
            max_tokens=max_tokens
        )

chain_llm: LLMChain = (chat_prompt | chat_model.with_structured_output(VulnerabilityReport))

@udf(returnType=schema_llm) 
def CheckContent(text:str): 
    out = chain_llm.invoke({"content":text})
    return (out["contains_vulnerability"])
    
expand_df = sample_df.withColumn("content_check", CheckContent("file_content"))
display(expand_df)<div><span>And I am getting a pickle error:<div> <li-code lang="markup">Traceback (most recent call last):
  File "/databricks/spark/python/pyspark/serializers.py", line 559, in dumps
    return cloudpickle.dumps(obj, pickle_protocol)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/databricks/spark/python/pyspark/cloudpickle/cloudpickle_fast.py", line 73, in dumps
    cp.dump(obj)
  File "/databricks/spark/python/pyspark/cloudpickle/cloudpickle_fast.py", line 632, in dump
    return Pickler.dump(self, obj)
           ^^^^^^^^^^^^^^^^^^^^^^^
  File "/databricks/spark/python/pyspark/core/context.py", line 525, in __getnewargs__
    raise PySparkRuntimeError(
pyspark.errors.exceptions.base.PySparkRuntimeError: [CONTEXT_ONLY_VALID_ON_DRIVER] It appears that you are attempting to reference SparkContext from a broadcast variable, action, or transformation. SparkContext can only be used on the driver, not in code that it run on workers. For more information, see SPARK-5063.
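
One workaround I am considering (untested sketch, so corrections are welcome): construct the chat model and the chain inside the UDF rather than capturing them from the driver, so that nothing holding a reference to the driver-side SparkContext ends up in the pickled closure. This assumes chat_prompt and VulnerabilityReport pickle cleanly and that the workers can reach and authenticate to the serving endpoint (e.g. via DATABRICKS_HOST / DATABRICKS_TOKEN environment variables).

from pyspark.sql.functions import udf
from pyspark.sql.types import BooleanType

def _build_chain():
    # Imported and constructed on the worker, never captured from the driver
    from databricks_langchain import ChatDatabricks
    chat_model = ChatDatabricks(
        endpoint="databricks-meta-llama-3-1-70b-instruct",
        temperature=0.0,
        max_tokens=1024,
    )
    return chat_prompt | chat_model.with_structured_output(VulnerabilityReport)

_chain_cache = {}

@udf(returnType=BooleanType())
def check_content_lazy(text: str):
    # Build the chain on first use and reuse it for subsequent rows in the task
    if "chain" not in _chain_cache:
        _chain_cache["chain"] = _build_chain()
    out = _chain_cache["chain"].invoke({"content": text})
    # assuming with_structured_output returns a VulnerabilityReport instance
    return bool(out.contains_vulnerability)

expand_df = sample_df.withColumn("content_check", check_content_lazy("file_content"))

Is this the right direction, or would a pandas UDF with batched requests (or ai_query from SQL) be a better way to parallelize this?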

 

