Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

Nastasia
by New Contributor II
  • 3733 Views
  • 3 replies
  • 1 kudos

Why is Spark creating multiple jobs for one action?

I noticed that when launching this bunch of code with only one action, I have three jobs that are launched.

from pyspark.sql import DataFrame
from pyspark.sql.types import StructType, StructField, StringType
from pyspark.sql.functions import avg

data:...
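The snippet above is cut off; a minimal sketch of the shape of code being described, where the data, schema, and column names are illustrative guesses (only the imports survive in the excerpt), would be:

from pyspark.sql import DataFrame, SparkSession
from pyspark.sql.types import StructType, StructField, StringType
from pyspark.sql.functions import avg

spark = SparkSession.builder.getOrCreate()

# Hypothetical data and schema standing in for the truncated originals.
data = [("Alice", "10"), ("Bob", "20"), ("Alice", "30")]
schema = StructType([
    StructField("name", StringType(), True),
    StructField("score", StringType(), True),
])

dataframe: DataFrame = spark.createDataFrame(data=data, schema=schema)

# The single explicit action; the Spark UI may still show several jobs for it.
dataframe.groupBy("name").agg(avg("score")).show()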

[Attached image: https___i.stack.imgur.com_xfYDe.png]
Latest Reply
RKNutalapati
Valued Contributor
  • 1 kudos

The above code will create two jobs.

JOB-1. dataframe: DataFrame = spark.createDataFrame(data=data, schema=schema)
The createDataFrame function is responsible for inferring the schema from the provided data or using the specified schema. Depending on the...
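A hedged way to check what this reply describes is to compare an inferred-schema call with an explicit-schema call; whether inference shows up as its own job depends on the data source and Spark version:

from pyspark.sql import SparkSession
from pyspark.sql.types import StructType, StructField, StringType

spark = SparkSession.builder.getOrCreate()

# Hypothetical rows and schema, as in the sketch under the question.
data = [("Alice", "10"), ("Bob", "20")]
schema = StructType([
    StructField("name", StringType(), True),
    StructField("score", StringType(), True),
])

inferred_df = spark.createDataFrame(data)           # column types inferred from the rows
explicit_df = spark.createDataFrame(data, schema)   # explicit schema, nothing to infer

# Compare the Jobs tab of the Spark UI after running each call to see
# how many jobs, if any, each one triggers.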

Charmin
by New Contributor
  • 887 Views
  • 1 reply
  • 0 kudos

Why does the 'runCommand' action NOT show up in the DatabricksNotebook audit log table?

I understand Databricks can send diagnostic/audit logs to Log Analytics in Azure. There is a standard 'DatabricksNotebook' table that provides an audit log for notebook actions. In this table there is an action called 'runCommand', but this does not show...

Latest Reply
rsamant07
New Contributor III
  • 0 kudos

Hi @Charmin patel, you need to enable verbose audit logging in the workspace settings for runCommand to appear in the audit logs.
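For reference, the same toggle that lives under the workspace admin settings can also be flipped through the workspace configuration REST API; a rough sketch (the host, token, and config key are assumptions to verify against the current Databricks docs):

import requests

HOST = "https://<your-workspace>.azuredatabricks.net"  # placeholder workspace URL
TOKEN = "<personal-access-token>"                      # placeholder PAT

# Assumed config key for the verbose audit logging toggle.
resp = requests.patch(
    f"{HOST}/api/2.0/workspace-conf",
    headers={"Authorization": f"Bearer {TOKEN}"},
    json={"enableVerboseAuditLogs": "true"},
)
resp.raise_for_status()  # succeeds with an empty 204 response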

Mradul07
by New Contributor II
  • 701 Views
  • 0 replies
  • 1 kudos

Spark behavior while dealing with Actions & Transformations?

Hi, my question is: what happens to the initial RDD after an action is performed on it? Does it disappear, stay in memory, or does it need to be explicitly cached with cache() if we want to use it again? For example, if I execute this in sequence: df_outp...
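The example is cut off above, but the general point can be sketched; df_output below is a hypothetical stand-in for the truncated DataFrame:

from pyspark.sql import SparkSession
from pyspark.sql.functions import col

spark = SparkSession.builder.getOrCreate()

# Hypothetical stand-in for the truncated df_outp... above.
df_output = spark.range(1_000_000).withColumn("doubled", col("id") * 2)

# Without cache(), every action re-computes the lineage from spark.range.
df_output.cache()            # marks the DataFrame for caching (lazy)
print(df_output.count())     # first action: computes and fills the cache
print(df_output.count())     # second action: reads from the cached data

df_output.unpersist()        # free the cached blocks when finished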

aladda
by Honored Contributor II
  • 3053 Views
  • 1 reply
  • 0 kudos
Latest Reply
aladda
Honored Contributor II
  • 0 kudos

Spark's execution engine is designed to be lazy. In effect, you first build up your analytics/data-processing request through a series of Transformations, which are then executed by an Action. Transformations are the kind of operations which will tran...
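A small illustration of that laziness (the names here are made up for the example):

from pyspark.sql import SparkSession
from pyspark.sql.functions import col

spark = SparkSession.builder.getOrCreate()

df = spark.range(100)                            # no job runs yet
evens = df.filter(col("id") % 2 == 0)            # transformation: lazy
doubled = evens.withColumn("x2", col("id") * 2)  # transformation: lazy

# Nothing has executed so far; Spark has only built a logical plan.
doubled.show(5)                                  # action: triggers the computation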
