Structured Streaming from TimescaleDB?

Erik_L
Contributor II

I realize that the best practice would be to integrate our service with Kafka as a streaming source for Databricks, but given that the service already stores its data in TimescaleDB, how can I stream data from TimescaleDB into Databricks (DBX)? Debezium doesn't work with TimescaleDB because of hypertables. Or is there some trivial way to integrate Kafka into our service for consuming data? I also need the historical data in TimescaleDB to be available for aggregate statistics.

2 REPLIES

-werners-
Esteemed Contributor III

Kaniz
Community Manager

Hi @Erik_L, currently there is no direct way to stream data from TimescaleDB into Databricks.

However, there are a couple of ways you can approach this:

1. **Kafka Integration**: You can integrate Kafka into your service. Kafka is a popular choice for real-time data streaming, and Databricks supports Apache Kafka as both a source and a sink for Structured Streaming workloads. Kafka then acts as a bridge: your service pushes data into Kafka, and Databricks consumes it from there.

2. **Historical Data**: For the historical data in TimescaleDB, you would need to export it from TimescaleDB and import it into Databricks for aggregate statistics. Databricks supports various data sources for ingestion; because TimescaleDB is built on PostgreSQL, a batch read over JDBC with the PostgreSQL driver is one option.
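As a rough illustration of the two approaches above, here is a minimal PySpark sketch. It assumes your service already publishes events to a Kafka topic and that the TimescaleDB instance is reachable from the cluster over JDBC. Every host, topic, table name, and credential is a hypothetical placeholder, and the helper function names are made up for this example. Treat it as a starting point, not a tested pipeline.

```python
from typing import Dict


def kafka_source_options(bootstrap_servers: str, topic: str) -> Dict[str, str]:
    """Build options for Spark's built-in Kafka Structured Streaming source."""
    return {
        "kafka.bootstrap.servers": bootstrap_servers,
        "subscribe": topic,
        # Assumption: only new events are streamed; history is loaded via JDBC.
        "startingOffsets": "latest",
    }


def timescale_jdbc_options(
    host: str, port: int, database: str, table: str, user: str, password: str
) -> Dict[str, str]:
    """Build options for a batch JDBC read.

    TimescaleDB is a PostgreSQL extension, so the standard PostgreSQL
    JDBC driver is used and a hypertable is queried like a regular table.
    """
    return {
        "url": f"jdbc:postgresql://{host}:{port}/{database}",
        "dbtable": table,
        "user": user,
        "password": password,
        "driver": "org.postgresql.Driver",
    }


def read_live_stream(spark, bootstrap_servers: str, topic: str):
    """Return a streaming DataFrame of raw Kafka records (key, value, offset, ...)."""
    options = kafka_source_options(bootstrap_servers, topic)
    return spark.readStream.format("kafka").options(**options).load()


def read_history(spark, host: str, port: int, database: str,
                 table: str, user: str, password: str):
    """Return a batch DataFrame with the historical rows from TimescaleDB."""
    options = timescale_jdbc_options(host, port, database, table, user, password)
    return spark.read.format("jdbc").options(**options).load()
```

In a Databricks notebook, where `spark` is already defined, you could call `read_live_stream(spark, "broker:9092", "sensor-events")` for the live feed and `read_history(spark, "db.example.com", 5432, "metrics", "public.readings", user, password)` for the backfill. In practice the credentials should come from a secret scope instead of being hard-coded.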

I hope this helps! Feel free to ask if you have further questions or need more clarification.