Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

Data_Engineer3
by Contributor III
  • 2270 Views
  • 2 replies
  • 1 kudos

Unable to access Scala and Python variables between cells in the same notebook.

I am facing an issue while accessing a Python DataFrame in a Scala cell and vice versa. I am getting a "variable not defined" error.

Latest Reply
tomasz
Databricks Employee
  • 1 kudos

The context is not shared between Scala and Python, so you won't be able to access the same variables directly. However, you can use createOrReplaceTempView to create a temporary view of your dataframe and read it in the other language with read_df = s...
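For illustration, here is a minimal Scala sketch of that approach (the view name shared_df and the sample DataFrame are placeholders; a %python cell in the same notebook could read the view with spark.table("shared_df")):

// Scala cell: register a DataFrame as a temporary view so other languages can see it
val df = spark.range(5).toDF("id")
df.createOrReplaceTempView("shared_df")

// Scala cell: read a view that was registered in a Python cell of the same notebook
val readDf = spark.table("shared_df")
readDf.show()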

1 More Replies
Sunny
by New Contributor III
  • 7807 Views
  • 7 replies
  • 4 kudos

Resolved! Retrieve job id and run id from Scala

I need to retrieve the job id and run id of the job from a JAR file in Scala. When I try to compile the code below in IntelliJ, the error below is shown. import com.databricks.dbutils_v1.DBUtilsHolder.dbutils object MainSNL { @throws(classOf[Exception]) de...

Latest Reply
Mohit_m
Valued Contributor II
  • 4 kudos

Maybe it's worth going through the Task Parameter variables section of the doc below: https://docs.databricks.com/data-engineering/jobs/jobs.html#task-parameter-variables
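As a hedged sketch of what that section describes: a Spark JAR task can pass the built-in {{job_id}} and {{run_id}} task parameter variables as program arguments, and the Scala main method simply reads them (the object name and argument order here are assumptions):

object MainSNL {
  // Configure the JAR task's parameters as ["{{job_id}}", "{{run_id}}"];
  // Databricks substitutes the real values when the job runs.
  def main(args: Array[String]): Unit = {
    val jobId = args(0)
    val runId = args(1)
    println(s"jobId=$jobId, runId=$runId")
  }
}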

6 More Replies
abd
by Contributor
  • 17332 Views
  • 12 replies
  • 11 kudos

Resolved! Is there any difference in performance between Python and SQL?

I read somewhere that Python code is converted to SQL in the end. So is that true, or is there any difference in performance when working with Scala, Python, or SQL?

Latest Reply
Rheiman
Contributor II
  • 11 kudos

To add to the consideration of UDFs, try to use HOFs (higher-order functions) whenever possible first, as there is a significant performance benefit, as seen here.
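As a rough Scala sketch of the difference, assuming an array column named values (a hypothetical example, not from the thread): the built-in transform higher-order function stays inside Spark's optimized execution, while the equivalent UDF pushes every row through user code.

import spark.implicits._
import org.apache.spark.sql.functions.{expr, udf}

val df = Seq(Seq(1, 2, 3), Seq(4, 5)).toDF("values")

// Higher-order function: runs inside Spark's optimized execution engine
val withHof = df.withColumn("doubled", expr("transform(values, x -> x * 2)"))

// Equivalent UDF: each row is serialized through user code, which is typically slower
val doubleAll = udf((xs: Seq[Int]) => xs.map(_ * 2))
val withUdf = df.withColumn("doubled", doubleAll($"values"))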

11 More Replies
Vibhor
by Contributor
  • 6428 Views
  • 5 replies
  • 2 kudos

Get current date as a string in Databricks using Scala

I want to get the current date in Scala as a string. For example, today's date is 3rd Jan, and I want to store it as a new variable dynamically as below; how do I get it? val currdate : String = "20220103" When I am using val currdate = Calendar.getInstance.ge...
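For reference, one common way to get that format in Scala with java.time (a sketch, not necessarily how the thread was resolved):

import java.time.LocalDate
import java.time.format.DateTimeFormatter

// Formats today's date as e.g. "20220103"
val currdate: String = LocalDate.now.format(DateTimeFormatter.ofPattern("yyyyMMdd"))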

Latest Reply
Anonymous
Not applicable
  • 2 kudos

Hey @Vibhor Sethi, hope you are well! Thank you for posting your question and letting us know that you were able to resolve the issue. Would you be happy to mark it as the best solution? It would be really helpful for the other members too. Cheers!

4 More Replies
LukaszJ
by Contributor III
  • 1846 Views
  • 2 replies
  • 1 kudos

Table access control cluster with R language

Hello, I want to have a high concurrency cluster with table access control, and I want to use the R language on it. I know that the documentation says that R and Scala are not available with table access control. But maybe you have some tricks or best practic...

Latest Reply
Aashita
Databricks Employee
  • 1 kudos

@Łukasz Jaremek, currently it is only available in Python and SQL.

1 More Replies
p42af
by New Contributor
  • 5555 Views
  • 4 replies
  • 1 kudos

Resolved! rdd.foreachPartition() does nothing?

I expected the code below to print "hello" for each partition and "world" for each record. But when I ran it, the code ran but had no printouts of any kind. No errors either. What is happening here? %scala val rdd = spark.sparkContext.parallelize(S...

Latest Reply
Hubert-Dudek
Esteemed Contributor III
  • 1 kudos

It is lazily evaluated, so I guess you need to trigger an action.
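A hedged note alongside that reply: foreachPartition is itself an action, but the println calls inside it run on the executors, so their output goes to the executor logs rather than the notebook. One way to see per-partition output on the driver is to collect the data first, as in this sketch:

val rdd = spark.sparkContext.parallelize(Seq(1, 2, 3, 4), numSlices = 2)

// Runs on the executors: output appears in the executor stdout/stderr logs
rdd.foreachPartition { part =>
  println("hello")
  part.foreach(x => println(s"world $x"))
}

// Bring each partition's contents back to the driver so the prints show in the notebook
rdd.glom().collect().foreach { part =>
  println("hello")
  part.foreach(x => println(s"world $x"))
}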

3 More Replies
TS
by New Contributor III
  • 4160 Views
  • 3 replies
  • 3 kudos

Resolved! Turn a spark.sql query into a Scala function

Hello, I'm learning Scala / Spark and am trying to understand what's wrong with my function. I have a spark.sql query stored in a variable: val uViewName = spark.sql(""" SELECT v.Data_View_Name FROM apoHierarchy AS h INNER JOIN apoView AS v ON h.View_N...

Latest Reply
Hubert-Dudek
Esteemed Contributor III
  • 3 kudos

Try adding .first()(0); it will return only the value from the first row/column, as currently you are returning a Dataset: var uViewName = spark.sql(s""" SELECT v.Data_View_Name FROM apoHierarchy AS h INNER JOIN apoView AS v ON h.View_Name = v.Context_View_N...
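A compact Scala sketch of that suggestion, using a simplified query in place of the truncated one from the post:

val uViewName: String =
  spark.sql("SELECT v.Data_View_Name FROM apoView AS v LIMIT 1")  // stand-in for the original query
    .first()(0)                                                   // first Row, first column (Any)
    .toString                                                     // plain String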

2 More Replies
Databach
by New Contributor
  • 4152 Views
  • 0 replies
  • 0 kudos

How to resolve "java.lang.ClassNotFoundException: com.databricks.spark.util.RegexBasedAWSSecretKeyRedactor" when running a Scala Spark project using databricks-connect?

Currently I am learning how to use databricks-connect to develop Scala code locally using an IDE (VS Code). The set-up of databricks-connect as described here https://docs.microsoft.com/en-us/azure/databricks/dev-tools/databricks-connect was succues...

emanuele_maffeo
by New Contributor III
  • 3768 Views
  • 5 replies
  • 8 kudos

Resolved! Trigger.AvailableNow on scala - compile issue

Hi everybody, Trigger.AvailableNow was released with the Databricks 10.1 runtime, and we would like to use this new feature with Auto Loader. We write all our data pipelines in Scala, and our projects import Spark as a provided dependency. If we try to sw...
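For reference, a minimal Scala sketch of using the trigger (the rate source and paths are placeholders); Trigger.AvailableNow only appears in the open-source Spark API from Apache Spark 3.3 onwards, so compiling against an older provided Spark dependency fails even though the Databricks 10.1 runtime supports it:

import org.apache.spark.sql.streaming.Trigger

// Any streaming DataFrame will do; "rate" is just a self-contained stand-in source
val stream = spark.readStream.format("rate").load()

stream.writeStream
  .format("delta")
  .option("checkpointLocation", "/tmp/checkpoints/availablenow-demo")  // placeholder
  .trigger(Trigger.AvailableNow())  // process everything currently available, then stop
  .start("/tmp/delta/availablenow-demo")                               // placeholder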

Latest Reply
Anonymous
Not applicable
  • 8 kudos

You can switch to Python. Depending on what you're doing, and provided you're not using UDFs, there shouldn't be any difference at all in terms of performance.

4 More Replies
lecardozo
by New Contributor II
  • 5549 Views
  • 5 replies
  • 1 kudos

Resolved! Problems with HiveMetastoreClient and internal Databricks Metastore.

I've been trying to use the HiveMetastoreClient class in Scala to extract some metadata from the Databricks internal metastore, without success. I'm currently using the 7.3 LTS runtime. The error seems to be related to some kind of inconsistency between...

Latest Reply
lecardozo
New Contributor II
  • 1 kudos

Thanks for the reference, @Atanu Sarkar. It seems a little odd to me that I'd need to change the internal Databricks metastore table to add a column expected by the default Scala client. I'm afraid this could cause issues with other users/jobs ...

4 More Replies
sonali1996
by New Contributor
  • 2976 Views
  • 0 replies
  • 0 kudos

Multithreading in Scala on Databricks

Hi Team, I was trying to call/run multiple notebooks from one notebook concurrently, but the called notebooks are executing one by one, whereas I need to run them all concurrently. I have also tried using threading in Scala Databri...
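One pattern that is often used for this (a sketch; the notebook paths and timeout are placeholders) is to wrap each dbutils.notebook.run call in a Scala Future so the notebooks are launched concurrently:

import scala.concurrent.{Await, Future}
import scala.concurrent.duration._
import scala.concurrent.ExecutionContext.Implicits.global

val notebooks = Seq("/Repos/project/notebook_a", "/Repos/project/notebook_b")  // placeholders

// Each run is submitted on its own thread, so the notebooks execute in parallel
val runs: Seq[Future[String]] = notebooks.map { path =>
  Future(dbutils.notebook.run(path, 3600, Map.empty[String, String]))
}

// Wait for all runs to finish and print their exit values
Await.result(Future.sequence(runs), 2.hours).foreach(println)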

MadelynM
by Databricks Employee
  • 862 Views
  • 0 replies
  • 1 kudos

vimeo.com

Auto Loader provides Python and Scala methods to ingest new data from a folder location into a Delta Lake table by using directory listing or file notifications. Here's a quick video (7:00) on how to use Auto Loader for Databricks on AWS with Databri...
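A minimal Scala sketch of that flow (paths, file format, and schema location are placeholders; cloudFiles.useNotifications toggles between directory listing and file-notification mode):

val input = spark.readStream
  .format("cloudFiles")                                            // Auto Loader source
  .option("cloudFiles.format", "json")                             // format of the incoming files
  .option("cloudFiles.schemaLocation", "/tmp/autoloader/schema")   // placeholder
  .option("cloudFiles.useNotifications", "false")                  // false = directory listing mode
  .load("s3://my-bucket/landing/")                                 // placeholder folder

input.writeStream
  .format("delta")
  .option("checkpointLocation", "/tmp/autoloader/checkpoint")      // placeholder
  .start("/tmp/autoloader/delta-table")                            // placeholder Delta table path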

as999
by New Contributor III
  • 1741 Views
  • 3 replies
  • 1 kudos

Python dataframe or Hive SQL update based on predecessor value?

I have a million rows that I need to update: look for the highest count of the predecessor from the same source data and replace the same value on a different row. For example, the original DF: sno Object Name shape rating / 1 Fruit apple round ...

Latest Reply
-werners-
Esteemed Contributor III
  • 1 kudos

Basically, you have to create a dataframe (or use a window function, which will also work) that gives you the group combination with the most occurrences. So a window/groupBy on object, name, shape with a count(). Then you have to determine which shape...
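A rough Scala sketch of that idea, assuming the original DataFrame is called df and has columns named object, name, and shape (names taken loosely from the thread):

import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.{col, count, desc, lit, row_number}

// Count the occurrences of each (object, name, shape) combination
val counted = df.groupBy("object", "name", "shape")
  .agg(count(lit(1)).as("cnt"))

// Per (object, name), keep the shape with the highest count
val w = Window.partitionBy("object", "name").orderBy(desc("cnt"))
val mostCommon = counted
  .withColumn("rn", row_number().over(w))
  .filter(col("rn") === 1)
  .drop("rn", "cnt")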

2 More Replies
sarvesh
by Contributor III
  • 7482 Views
  • 9 replies
  • 8 kudos

Resolved! Getting null values in place of data that was removed manually from an Excel file (solved)

I was reading an Excel file with one column, country: india, India, india, India, india. The dataframe I got from this data: df.show() +-------+ |country| +-------+ | india | | India | | india | | India | | india | +-------+ In the next step I removed the last value ...

Latest Reply
Anonymous
Not applicable
  • 8 kudos

@sarvesh singh - Thank you for letting us know. Would you be happy to mark the best answer so others can find the solution easily?

8 More Replies