Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

ae20cg
by New Contributor III
  • 17275 Views
  • 17 replies
  • 12 kudos

How to instantiate Databricks spark context in a python script?

I want to run a block of code in a script, not in a notebook, on Databricks, but I cannot instantiate the Spark context without an error. I have tried `SparkContext.getOrCreate()`, but this does not work. Is there a simple way to do t...
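For readers landing on this thread, one commonly used pattern is to go through `SparkSession` rather than constructing a `SparkContext` directly. The following is a minimal sketch, not a confirmed answer from the thread: the helper name `get_spark_context` is hypothetical, and it assumes `pyspark` is importable in the script's environment. The thread's latest reply reports that the Spark context cannot be obtained on a cluster with Unity Catalog, so the final attribute access may still fail there.

```python
def get_spark_context():
    """Return (spark, sc) from inside a plain Python script.

    Hypothetical helper; assumes pyspark is installed where the script runs.
    """
    # Imported lazily so the module can be loaded even where pyspark is absent.
    from pyspark.sql import SparkSession

    # getOrCreate() reuses an already running session, or starts one if none exists.
    spark = SparkSession.builder.getOrCreate()

    # The SparkContext hangs off the session; RDD calls such as
    # sc.parallelize() are reached through it.
    return spark, spark.sparkContext
```

A script would call `spark, sc = get_spark_context()` once at startup and pass the two objects to whatever code needs them.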

Latest Reply
ayush007
New Contributor II
  • 12 kudos

Is there a solution for this? We are stuck: a cluster with Unity Catalog is not able to get the Spark context. This prevents us from using the distributed nature of Spark in Databricks.

16 More Replies
del1000
by New Contributor III
  • 826 Views
  • 0 replies
  • 0 kudos

Problem with sparkContext.parallelize and volatile functions?

I have the following code: from time import sleep from random import random from operator import add def f(a: int) -> float: sleep(0.1) return random() rdd1 = sc.parallelize(range(20), 2) rdd2 = sc.parallelize(range(20), 2) rdd3 = sc.parallelize(rang...

Fed
by New Contributor III
  • 7959 Views
  • 1 replies
  • 0 kudos

Setting checkpoint directory for checkpointInterval argument of estimators in pyspark.ml

Tree-based estimators in pyspark.ml have an argument called checkpointInterval: checkpointInterval = Param(parent='undefined', name='checkpointInterval', doc='set checkpoint interval (>= 1) or disable checkpoint (-1). E.g. 10 means that the cache will ...

Latest Reply
Anonymous
Not applicable
  • 0 kudos

@Federico Trifoglio: If sc.getCheckpointDir() returns None, it means that no checkpoint directory is set in the SparkContext. In this case, the checkpointInterval argument will indeed be ignored. To set a checkpoint directory, you can use the SparkC...
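The check described in this reply can be sketched as a small guard. This is an illustration under assumptions, not the reply's own code: the helper name `ensure_checkpoint_dir` is hypothetical, and `sc` is assumed to be a live `SparkContext` exposing `getCheckpointDir()` and `setCheckpointDir()`.

```python
def ensure_checkpoint_dir(sc, path):
    """Set a checkpoint directory on the SparkContext only if none is configured.

    Hypothetical helper; `path` is whatever location the caller chooses.
    """
    current = sc.getCheckpointDir()
    if current is None:
        # Without a directory, checkpointInterval on the estimator is ignored.
        sc.setCheckpointDir(path)
        current = sc.getCheckpointDir()
    # Return the directory that is actually in effect.
    return current
```

It would be called once before fitting a tree-based estimator, for example `ensure_checkpoint_dir(sc, "dbfs:/tmp/checkpoints")`, where the path is a made-up placeholder. Leaving an existing directory untouched avoids redirecting checkpoints that other code on the same context may rely on.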

KateK
by New Contributor II
  • 2348 Views
  • 2 replies
  • 1 kudos

How do you correctly access the spark context in DLT pipelines?

I have some code that uses RDDs, and the sc.parallelize() and rdd.toDF() methods to get a dataframe back out. The code works in a regular notebook (and if I run the notebook as a job) but fails if I do the same thing in a DLT pipeline. The error mess...

Latest Reply
KateK
New Contributor II
  • 1 kudos

Thanks for your help, Alex. I ended up rewriting my code with Spark UDFs; maybe there is a better solution with only the DataFrame API, but I couldn't find it. To summarize my problem: I was trying to un-nest a large json blob (the fake data in my f...

1 More Replies
cfregly
by Contributor
  • 8131 Views
  • 5 replies
  • 0 kudos
Latest Reply
MatthewValenti
New Contributor II
  • 0 kudos

This is an old post; however, is this still accurate for the latest version of Databricks in 2019? If so, how should one approach the following? 1. Connect to many MongoDBs. 2. Connect to MongoDB when connection string information is dynamic (i.e. stored in s...

4 More Replies
lau_thiamkok
by New Contributor II
  • 14828 Views
  • 5 replies
  • 0 kudos

Spark + Python - Java gateway process exited before sending the driver its port number?

Why do I get this error on my browser screen, <type 'exceptions.Exception'>: Java gateway process exited before sending the driver its port number args = ('Java gateway process exited before sending the driver its port number',) message = 'Java gat...

Latest Reply
EricaLi
New Contributor II
  • 0 kudos

I'm facing the same problem. Does anybody know how to connect Spark in an IPython notebook? I created this issue: https://github.com/jupyter/notebook/issues/743

4 More Replies