Administration & Architecture
Explore discussions on Databricks administration, deployment strategies, and architectural best practices. Connect with administrators and architects to optimize your Databricks environment for performance, scalability, and security.

Delta Jira data import to Databricks

greengil
Contributor

We need to import a large amount of Jira data into Databricks, and we should import only the delta changes. What's the best approach? Should we use the Fivetran Jira connector or develop our own Python scripts/pipeline code? Thanks.

17 REPLIES

There are other errors, like this one:

api.atlassian.com
Error class: UnknownHostException
Table: issues_without_deletes
Message: api.atlassian.com

abhi_dabhi
Databricks Partner

Hi @greengil, good question. I went through something similar recently, so I'm sharing what I found.

My instinct was also to build it in Python, but once I dug in, the "just write a script" path hides a lot of pain:

  • Deletions are invisible. Jira's REST API doesn't return deleted issues. Without webhooks, you'll have ghost records in Delta forever.
  • Field history isn't free. The API gives you current state, not change history. Reporting usually needs history, which means building and maintaining it yourself.
  • Archived issues aren't returned in JQL queries, only by ID.
  • Rate limits, pagination, and schema drift for custom fields are all real work.
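For anyone who does go the custom route, here is a minimal sketch of the incremental pull itself, just to show the moving parts. Everything named here is an assumption for illustration: `fetch_page` is a hypothetical callable that wraps whatever Jira search endpoint you use, and the response shape (`issues`, `total`, offset-based paging) is assumed. If your endpoint pages with a token instead, the loop needs adjusting. Rate-limit handling and retries are left out.

```python
from datetime import datetime, timedelta
from typing import Callable, Dict, Iterator

# Hypothetical signature: (jql, start_at, max_results) -> parsed JSON page.
FetchPage = Callable[[str, int, int], Dict]


def build_incremental_jql(project: str, watermark: datetime,
                          overlap: timedelta = timedelta(minutes=5)) -> str:
    """Build a JQL filter for issues updated since the last successful sync."""
    # JQL timestamps are written to the minute, so step back a little and let
    # the downstream upsert absorb any duplicates.
    since = (watermark - overlap).strftime("%Y-%m-%d %H:%M")
    return f'project = "{project}" AND updated >= "{since}" ORDER BY updated ASC'


def iter_updated_issues(fetch_page: FetchPage, jql: str,
                        page_size: int = 100) -> Iterator[Dict]:
    """Yield every issue matching the JQL, one page at a time."""
    start_at = 0
    while True:
        page = fetch_page(jql, start_at, page_size)
        issues = page.get("issues", [])
        if not issues:
            return  # empty page: nothing left to read
        yield from issues
        start_at += len(issues)
        # Stop early when the (assumed) total says we have read everything.
        if start_at >= page.get("total", start_at + 1):
            return
```

Note that this only covers created and updated issues. Deleted and archived issues never show up in the results, which is exactly the gap described in the list above.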

Fivetran's Jira connector handles all of this natively: JQL-based incremental sync, webhook-based deletion capture, auto-populated ISSUE_FIELD_HISTORY tables, schema drift detection, and MERGE into Delta. It's also available through Databricks Partner Connect for quick setup. There's a free dbt package as well (fivetran/dbt_jira) with pre-built analytics models.
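To give a feel for what a custom pipeline would have to replicate, here is a rough sketch of two of those pieces: the upsert and a fallback for deletions. This is not how Fivetran implements it. The table names and the `id` key are placeholders, and the set-difference check is a swapped-in alternative to webhooks that assumes you can periodically list every issue ID still visible in Jira.

```python
from typing import Iterable, Set


def find_ghost_ids(ids_in_delta: Iterable[str],
                   ids_in_jira: Iterable[str]) -> Set[str]:
    """Return IDs present in the Delta table but no longer visible in Jira."""
    # Caveat: archived issues are not returned by JQL, so they would be
    # flagged here too unless they are looked up by ID first.
    return set(ids_in_delta) - set(ids_in_jira)


def render_merge_sql(target: str, source: str, key: str = "id") -> str:
    """Render a Delta MERGE statement that upserts staged rows into target."""
    return (
        f"MERGE INTO {target} AS t\n"
        f"USING {source} AS s\n"
        f"ON t.{key} = s.{key}\n"
        "WHEN MATCHED THEN UPDATE SET *\n"
        "WHEN NOT MATCHED THEN INSERT *"
    )


# Hypothetical table names, for illustration only.
print(render_merge_sql("jira.issues", "jira.issues_staging"))
```

In a notebook the rendered statement could be passed to `spark.sql(...)`, with the IDs from `find_ghost_ids` deleted or soft-deleted in a separate step.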

My take: I'd suggest going with Fivetran unless you have a specific reason not to, such as cost concerns at high volume, a need for archived issues, or data residency restrictions. Custom Python makes sense for narrow use cases, but it's weeks of build time plus ongoing maintenance.

References: I did some research to come up with this solution. Please take a look; I think you will find these really helpful:

Happy to dig in further if you're leaning one way.

I'll keep this in mind.  Thanks!