cancel
Showing results for 
Search instead for 
Did you mean: 
Data Engineering
cancel
Showing results for 
Search instead for 
Did you mean: 

Turn spark.sql query into scala function

TS
New Contributor III

Hello,

I'm learning Scala / Spark and try to understand what's wrong with my function:

I have a spark.sql query, stored in a variable:

val uViewName = spark.sql("""
SELECT
  v.Data_View_Name
FROM
  apoHierarchy AS h
  INNER JOIN apoView AS v ON h.View_Name = v.Context_View_Name
WHERE
  1 = 1
  AND h.Step_Number = 1
""")
uViewName.show()

This gives me the right result:

+-------------------------------+
|      Data_View_Name|
+-------------------------------+
|APO_S_VN_E1_APR_U...|
+-------------------------------+

I then want to pack it into a function like this:

def getUniverse(x: Int): String = {
  var sn = x
  var stepNumber = x.toString()
  var uViewName = spark.sql(s"""
SELECT
  v.Data_View_Name
FROM
  apoHierarchy AS h
  INNER JOIN apoView AS v ON h.View_Name = v.Context_View_Name
WHERE
  1 = 1
  AND h.Step_Number = $stepNumber
""")
  return uViewName
}

So, what I try to achieve is:

  • the StepNumber should be given as parameter
  • the query takes this parameter and queries the db accordingly

But, why do I get this error message?

command-4245198:14: error: type mismatch;
 found   : org.apache.spark.sql.DataFrame
    (which expands to)  org.apache.spark.sql.Dataset[org.apache.spark.sql.Row]
 required: String
  return uViewName
         ^

And when I change the return statement to:

return uViewName.toString()

it seems to work:

getUniverse: (x: Int)String

And, finally when running the def, shouldn't I get a string "APO_S_VN_E1_APR_U.."?

val x = getUniverse(1)
println(x)
[Data_View_Name: string]
x: String = [Data_View_Name: string]

Thanks for any advice / hint!

1 ACCEPTED SOLUTION

Accepted Solutions

Hubert-Dudek
Esteemed Contributor III

try add .first()(0) it will return only value from first row/column as currently you are returning Dataset:

  var uViewName = spark.sql(s"""
SELECT
  v.Data_View_Name
FROM
  apoHierarchy AS h
  INNER JOIN apoView AS v ON h.View_Name = v.Context_View_Name
WHERE
  1 = 1
  AND h.Step_Number = $stepNumber
""").first()(0)

View solution in original post

3 REPLIES 3

Hubert-Dudek
Esteemed Contributor III

try add .first()(0) it will return only value from first row/column as currently you are returning Dataset:

  var uViewName = spark.sql(s"""
SELECT
  v.Data_View_Name
FROM
  apoHierarchy AS h
  INNER JOIN apoView AS v ON h.View_Name = v.Context_View_Name
WHERE
  1 = 1
  AND h.Step_Number = $stepNumber
""").first()(0)

TS
New Contributor III

Yep, solved. Thank you!

(1) Spark Jobs
APR_S_VN_E1_APR_UNIVERSE
x: String = APR_S_VN_E1_APR_UNIVERSE

Hubert-Dudek
Esteemed Contributor III

Great to hear. If it is possible, please mark my answer as the Best one.

Welcome to Databricks Community: Lets learn, network and celebrate together

Join our fast-growing data practitioner and expert community of 80K+ members, ready to discover, help and collaborate together while making meaningful connections. 

Click here to register and join today! 

Engage in exciting technical discussions, join a group with your peers and meet our Featured Members.