cancel
Showing results for 
Search instead for 
Did you mean: 
Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
cancel
Showing results for 
Search instead for 
Did you mean: 

Turn spark.sql query into scala function

TS
New Contributor III

Hello,

I'm learning Scala / Spark and try to understand what's wrong with my function:

I have a spark.sql query, stored in a variable:

val uViewName = spark.sql("""
SELECT
  v.Data_View_Name
FROM
  apoHierarchy AS h
  INNER JOIN apoView AS v ON h.View_Name = v.Context_View_Name
WHERE
  1 = 1
  AND h.Step_Number = 1
""")
uViewName.show()

This gives me the right result:

+-------------------------------+
|      Data_View_Name|
+-------------------------------+
|APO_S_VN_E1_APR_U...|
+-------------------------------+

I then want to pack it into a function like this:

def getUniverse(x: Int): String = {
  var sn = x
  var stepNumber = x.toString()
  var uViewName = spark.sql(s"""
SELECT
  v.Data_View_Name
FROM
  apoHierarchy AS h
  INNER JOIN apoView AS v ON h.View_Name = v.Context_View_Name
WHERE
  1 = 1
  AND h.Step_Number = $stepNumber
""")
  return uViewName
}

So, what I try to achieve is:

  • the StepNumber should be given as parameter
  • the query takes this parameter and queries the db accordingly

But, why do I get this error message?

command-4245198:14: error: type mismatch;
 found   : org.apache.spark.sql.DataFrame
    (which expands to)  org.apache.spark.sql.Dataset[org.apache.spark.sql.Row]
 required: String
  return uViewName
         ^

And when I change the return statement to:

return uViewName.toString()

it seems to work:

getUniverse: (x: Int)String

And, finally when running the def, shouldn't I get a string "APO_S_VN_E1_APR_U.."?

val x = getUniverse(1)
println(x)
[Data_View_Name: string]
x: String = [Data_View_Name: string]

Thanks for any advice / hint!

1 ACCEPTED SOLUTION

Accepted Solutions

Hubert-Dudek
Esteemed Contributor III

try add .first()(0) it will return only value from first row/column as currently you are returning Dataset:

  var uViewName = spark.sql(s"""
SELECT
  v.Data_View_Name
FROM
  apoHierarchy AS h
  INNER JOIN apoView AS v ON h.View_Name = v.Context_View_Name
WHERE
  1 = 1
  AND h.Step_Number = $stepNumber
""").first()(0)

View solution in original post

3 REPLIES 3

Hubert-Dudek
Esteemed Contributor III

try add .first()(0) it will return only value from first row/column as currently you are returning Dataset:

  var uViewName = spark.sql(s"""
SELECT
  v.Data_View_Name
FROM
  apoHierarchy AS h
  INNER JOIN apoView AS v ON h.View_Name = v.Context_View_Name
WHERE
  1 = 1
  AND h.Step_Number = $stepNumber
""").first()(0)

TS
New Contributor III

Yep, solved. Thank you!

(1) Spark Jobs
APR_S_VN_E1_APR_UNIVERSE
x: String = APR_S_VN_E1_APR_UNIVERSE

Hubert-Dudek
Esteemed Contributor III

Great to hear. If it is possible, please mark my answer as the Best one.

Join 100K+ Data Experts: Register Now & Grow with Us!

Excited to expand your horizons with us? Click here to Register and begin your journey to success!

Already a member? Login and join your local regional user group! If there isn’t one near you, fill out this form and we’ll create one for you to join!