Databricks Community

zmwaris1 · ‎10-07-2024

I am using Apache Kylin for data analytics and Databricks for data modelling and filtering. My final data is in gold tables, and I would like to integrate it with Apache Kylin over JDBC, with the gold table as the data source. Is there a way to achieve this? If so, please share any resources I can use.

Sidhant07 · ‎12-05-2024

Yes, it is possible to integrate your Databricks gold tables with Apache Kylin using JDBC. This integration allows you to use Apache Kylin's OLAP capabilities on the data stored in your Databricks environment. Here's how you can achieve this:

## Connecting Apache Kylin to Databricks

Apache Kylin supports connecting to various data sources using JDBC drivers, including Databricks. To set up this connection, you'll need to follow these steps:

1. **Configure JDBC Connection in Kylin**

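In your Kylin configuration file (`$KYLIN_HOME/conf/kylin.properties`), add the JDBC connection properties for Databricks[1]:

```
kylin.source.default=8
# httpPath is required by the Databricks driver; AuthMech=3 selects token authentication
kylin.source.jdbc.connection-url=jdbc:databricks://<server-hostname>:<port>/<schema>;httpPath=<http-path>;AuthMech=3
kylin.source.jdbc.driver=com.databricks.client.jdbc.Driver
kylin.source.jdbc.dialect=default
# With a personal access token, the user is the literal string "token"
kylin.source.jdbc.user=token
kylin.source.jdbc.pass=<personal-access-token>
```

Replace the placeholders with your Databricks connection details. The server hostname and HTTP path are shown in the connection details of your SQL warehouse or cluster, and the port is normally 443. If the driver does not pick up the token from the user and pass properties, append `UID=token;PWD=<personal-access-token>` to the connection URL instead.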

2. **Databricks JDBC Driver**

Make sure the Databricks JDBC driver jar is on Kylin's classpath; in Kylin 2.x/3.x this usually means copying it into `$KYLIN_HOME/ext`. You can download the driver from the Databricks website[3].

3. **Configure Authentication**

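Databricks supports several authentication methods over JDBC. Personal access tokens and OAuth are the recommended options[3][5]; Databricks has retired username/password (basic) authentication, so do not plan around it. With a personal access token, the JDBC user is the literal string `token` and the password is the token itself.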
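
To make steps 1 and 3 concrete, here is a minimal Python sketch that assembles the connection URL and renders the Kylin properties, reading a personal access token from the environment instead of hard-coding it. The function names and the `DATABRICKS_TOKEN` variable are illustrative assumptions, as is the choice of token authentication (`AuthMech=3` with the user `token`); check the URL options against the driver version you install.

```python
import os


def databricks_jdbc_url(hostname, http_path, schema="default", port=443):
    """Assemble a jdbc:databricks:// URL (token auth via AuthMech=3 is an assumption)."""
    return (
        f"jdbc:databricks://{hostname}:{port}/{schema}"
        f";httpPath={http_path};AuthMech=3"
    )


def kylin_jdbc_properties(hostname, http_path, schema="default", env=None):
    """Render the JDBC source lines for kylin.properties as a single string."""
    env = os.environ if env is None else env
    token = env.get("DATABRICKS_TOKEN")  # assumed variable name, keeps the secret out of the file
    if not token:
        raise ValueError("DATABRICKS_TOKEN is not set")
    props = {
        "kylin.source.default": "8",
        "kylin.source.jdbc.connection-url": databricks_jdbc_url(hostname, http_path, schema),
        "kylin.source.jdbc.driver": "com.databricks.client.jdbc.Driver",
        "kylin.source.jdbc.dialect": "default",
        "kylin.source.jdbc.user": "token",  # literal string used with personal access tokens
        "kylin.source.jdbc.pass": token,
    }
    return "\n".join(f"{key}={value}" for key, value in props.items())
```

Calling `print(kylin_jdbc_properties("<server-hostname>", "<http-path>", schema="gold"))` prints the lines to paste into your Kylin configuration.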

## Setting Up the Data Source in Kylin

Once the connection is established, you can set up your Databricks gold table as a data source in Kylin:

1. In the Kylin web interface, go to the "Data Source" section.
2. Add a new table and select "JDBC" as the source type.
3. Provide the necessary details, including the schema and table name of your gold table in Databricks.

## Building Cubes

After setting up the data source, you can proceed to build cubes in Kylin based on your Databricks gold table:

1. Define dimensions and measures from your gold table.
2. Set up the appropriate aggregations and hierarchies.
3. Build the cube to precompute the data for fast OLAP queries.
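
Once the cube is defined, the build in step 3 can also be triggered through Kylin's REST API instead of the web interface. The sketch below only constructs the `PUT /kylin/api/cubes/{cubeName}/build` request described in the Kylin 2.x/3.x REST documentation; the base URL, cube name, and credentials are placeholders, and the endpoint may differ in other Kylin versions.

```python
import base64
import json
from urllib.parse import quote
from urllib.request import Request


def cube_build_request(base_url, cube_name, start_ms, end_ms, user, password):
    """Build (but do not send) a Kylin cube build request.

    start_ms and end_ms are epoch milliseconds bounding the segment to build.
    """
    body = json.dumps(
        {"startTime": start_ms, "endTime": end_ms, "buildType": "BUILD"}
    ).encode("utf-8")
    credentials = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return Request(
        url=f"{base_url.rstrip('/')}/kylin/api/cubes/{quote(cube_name, safe='')}/build",
        data=body,
        method="PUT",
        headers={
            "Authorization": f"Basic {credentials}",  # Kylin's REST API uses HTTP Basic auth
            "Content-Type": "application/json",
        },
    )


# To actually submit the job:
#   from urllib.request import urlopen
#   with urlopen(cube_build_request(...)) as response:
#       job = json.load(response)
```

The request is kept separate from the network call so you can inspect the URL and payload before submitting a build against real data.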

## Querying Data

Once your cube is built, you can query the data using Kylin's SQL interface, which will now include data from your Databricks gold table.
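
If you want to query from a script rather than the web interface, Kylin also exposes a `POST /kylin/api/query` endpoint. The following is a minimal sketch that builds such a request; the base URL, project name, credentials, and the table name in the usage example are hypothetical, so substitute the names from your own Kylin project.

```python
import base64
import json
from urllib.request import Request


def kylin_query_request(base_url, project, sql, user, password, limit=1000):
    """Build (but do not send) a Kylin SQL query request."""
    body = json.dumps(
        {
            "sql": sql,
            "project": project,
            "offset": 0,
            "limit": limit,
            "acceptPartial": False,
        }
    ).encode("utf-8")
    credentials = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return Request(
        url=f"{base_url.rstrip('/')}/kylin/api/query",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Basic {credentials}",
            "Content-Type": "application/json",
        },
    )


# Hypothetical usage; GOLD.SALES stands in for your gold table as loaded into Kylin:
#   from urllib.request import urlopen
#   req = kylin_query_request(
#       "http://<kylin-host>:7070", "<project>",
#       "SELECT region, SUM(amount) FROM GOLD.SALES GROUP BY region",
#       "<user>", "<password>",
#   )
#   with urlopen(req) as response:
#       result = json.load(response)
```

Queries that match a built cube are answered from the precomputed aggregates, so this is a quick way to confirm the cube covers the dimensions and measures you expect.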

## Additional Resources

1. Apache Kylin documentation on JDBC data sources: https://kylin.apache.org/docs/
2. Databricks JDBC driver documentation: https://docs.databricks.com/integrations/jdbc-odbc-bi.html

With these steps in place, Kylin should be able to build cubes on your Databricks gold tables over JDBC and serve OLAP queries from them.