cancel
Showing results for 
Search instead for 
Did you mean: 
Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
cancel
Showing results for 
Search instead for 
Did you mean: 

Turn spark.sql query into scala function

TS
New Contributor III

Hello,

I'm learning Scala / Spark and try to understand what's wrong with my function:

I have a spark.sql query, stored in a variable:

val uViewName = spark.sql("""
SELECT
  v.Data_View_Name
FROM
  apoHierarchy AS h
  INNER JOIN apoView AS v ON h.View_Name = v.Context_View_Name
WHERE
  1 = 1
  AND h.Step_Number = 1
""")
uViewName.show()

This gives me the right result:

+-------------------------------+
|      Data_View_Name|
+-------------------------------+
|APO_S_VN_E1_APR_U...|
+-------------------------------+

I then want to pack it into a function like this:

def getUniverse(x: Int): String = {
  var sn = x
  var stepNumber = x.toString()
  var uViewName = spark.sql(s"""
SELECT
  v.Data_View_Name
FROM
  apoHierarchy AS h
  INNER JOIN apoView AS v ON h.View_Name = v.Context_View_Name
WHERE
  1 = 1
  AND h.Step_Number = $stepNumber
""")
  return uViewName
}

So, what I try to achieve is:

  • the StepNumber should be given as parameter
  • the query takes this parameter and queries the db accordingly

But, why do I get this error message?

command-4245198:14: error: type mismatch;
 found   : org.apache.spark.sql.DataFrame
    (which expands to)  org.apache.spark.sql.Dataset[org.apache.spark.sql.Row]
 required: String
  return uViewName
         ^

And when I change the return statement to:

return uViewName.toString()

it seems to work:

getUniverse: (x: Int)String

And, finally when running the def, shouldn't I get a string "APO_S_VN_E1_APR_U.."?

val x = getUniverse(1)
println(x)
[Data_View_Name: string]
x: String = [Data_View_Name: string]

Thanks for any advice / hint!

1 ACCEPTED SOLUTION

Accepted Solutions

Hubert-Dudek
Esteemed Contributor III

try add .first()(0) it will return only value from first row/column as currently you are returning Dataset:

  var uViewName = spark.sql(s"""
SELECT
  v.Data_View_Name
FROM
  apoHierarchy AS h
  INNER JOIN apoView AS v ON h.View_Name = v.Context_View_Name
WHERE
  1 = 1
  AND h.Step_Number = $stepNumber
""").first()(0)

View solution in original post

3 REPLIES 3

Hubert-Dudek
Esteemed Contributor III

try add .first()(0) it will return only value from first row/column as currently you are returning Dataset:

  var uViewName = spark.sql(s"""
SELECT
  v.Data_View_Name
FROM
  apoHierarchy AS h
  INNER JOIN apoView AS v ON h.View_Name = v.Context_View_Name
WHERE
  1 = 1
  AND h.Step_Number = $stepNumber
""").first()(0)

TS
New Contributor III

Yep, solved. Thank you!

(1) Spark Jobs
APR_S_VN_E1_APR_UNIVERSE
x: String = APR_S_VN_E1_APR_UNIVERSE

Hubert-Dudek
Esteemed Contributor III

Great to hear. If it is possible, please mark my answer as the Best one.

Connect with Databricks Users in Your Area

Join a Regional User Group to connect with local Databricks users. Events will be happening in your city, and you won’t want to miss the chance to attend and share knowledge.

If there isn’t a group near you, start one and help create a community that brings people together.

Request a New Group