Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Importing a Python function with spark.read.jdbc into Repos

mortenhaga
Contributor

Hi all!

Before we used Databricks Repos, we used the %run magic to run various utility Python functions from one notebook inside other notebooks, e.g. reading from a JDBC connection. We now plan to switch to Repos to take advantage of the fantastic CI/CD possibilities it gives us, but we have run into some challenges.

First, as I have understood it, you can't use the %run magic to run notebooks; only arbitrary files can be used. So we rewrote the notebook as .py files and successfully imported the functions, but a Spark-related error occurred.

This is the overview of my setup:

The python file with the function resides in the folder "utils" and the notebook I want to make use of it is inside the folder "landing".

The function looks like this:

class qybele_db_connection:

  # this class creates a secure connection to the seas database
  jdbcHostname = "x"
  jdbcPort = 3306
  jdbcDatabase = "x"
  jdbcUrl = "jdbc:mysql://{0}:{1}/{2}".format(jdbcHostname, jdbcPort, jdbcDatabase)
  # always use the Databricks secrets manager to store the password
  jdbcpassword = 'x'
  connectionProps = { "user": 'x', "password": jdbcpassword }

  @staticmethod
  def read(spark_session, query: str):
    """
    This static method uses the variables above in a spark.read.jdbc call to create the connection.
    Note that it takes a string, which is a query, and passes it to the "table" argument of the Spark function.
    """
    try:
      print('Executing query')
      # use the SparkSession passed in by the caller, not a global `spark`
      df = spark_session.read.jdbc(
        url=qybele_db_connection.jdbcUrl,
        table=query,
        properties=qybele_db_connection.connectionProps,
      )
    except Exception as e:
      # chain the original exception so the root cause isn't lost
      raise Exception("Issue with reading from seas database") from e

    return df

This is how I import it into the notebook:

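The import itself was only shown in a screenshot, so here is a rough sketch of what it typically looks like. The module name utils.db_utils is a hypothetical stand-in (the actual file name isn't visible here), and it assumes that notebooks in Repos have the repo root on sys.path:

# Hypothetical sketch -- the real module name was only visible in the screenshot.
# In Databricks Repos the repo root is typically importable, so a file at
# <repo root>/utils/db_utils.py can be imported like this:
from utils.db_utils import qybele_db_connection

# pass the notebook's built-in SparkSession into the static method;
# a subquery must be wrapped and aliased to be used as the "table" argument
df = qybele_db_connection.read(spark, "(SELECT * FROM some_table) AS q")
display(df)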

  • Do I really have to deal with the whole SparkSession setup when using "spark.read.jdbc" in Python files for this kind of workflow? (See the sketch after this list.)
  • How can I stick to the %run magic like before, i.e. run the notebook instead of importing the Python function?
  • How does this affect the results when you use this notebook from Repos in jobs/workflows?
  • How do Databricks secrets behave when using Repos like this?
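On the first question, one common pattern (a hedged sketch, not something confirmed in this thread) is to let the .py file resolve the already-running session itself instead of receiving it as an argument. SparkSession.getActiveSession() is available from PySpark 3.0 onward; read_jdbc is a hypothetical helper:

# Hedged sketch: fetch the SparkSession the notebook or job already started,
# so callers don't have to pass it in.
from pyspark.sql import SparkSession

def read_jdbc(query: str):
  # reuse the session the notebook (or job cluster) already started
  spark_session = SparkSession.getActiveSession()
  return spark_session.read.jdbc(
    url=qybele_db_connection.jdbcUrl,
    table=query,
    properties=qybele_db_connection.connectionProps,
  )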

2 REPLIES

Anonymous
Not applicable

If you have a notebook in a repo, that notebook can include a %run. I attached a screenshot where you can see it in my notebook. You can tell it's a repo because of the Published button at the top.
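For illustration, a sketch of such a %run cell (the notebook name is hypothetical; the path is relative to the calling notebook, so a notebook in "landing" reaches a sibling "utils" folder with ".."):

%run ../utils/db_connection_notebook

Note that %run has to stand alone in its cell, and the path can be wrapped in double quotation marks, e.g. %run "../utils/db_connection_notebook".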

mortenhaga
Contributor

That's... odd. I was sure I had tried that, but now it works somehow. I guess it must be that this time I did it with double quotation marks. Thanks anyway! Works like a charm.
