<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Delta Jira data import to Databricks in Administration &amp; Architecture</title>
    <link>https://community.databricks.com/t5/administration-architecture/delta-jira-data-import-to-databricks/m-p/154855#M5163</link>
    <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/200255"&gt;@greengil&lt;/a&gt;,&amp;nbsp;good question. I went through something similar recently, so I'm sharing what I found.&lt;/P&gt;&lt;P class=""&gt;My instinct was also to build it in Python, but once I dug in, the "just write a script" path hides a lot of pain:&lt;/P&gt;&lt;UL class=""&gt;&lt;LI&gt;&lt;STRONG&gt;Deletions are invisible.&lt;/STRONG&gt; Jira's REST API doesn't return deleted issues. Without webhooks, you'll have ghost records in Delta forever.&lt;/LI&gt;&lt;LI&gt;&lt;STRONG&gt;Field history isn't free.&lt;/STRONG&gt; The API gives you current state, not change history. Reporting usually needs history, which means building and maintaining it yourself.&lt;/LI&gt;&lt;LI&gt;&lt;STRONG&gt;Archived issues&lt;/STRONG&gt; aren't returned in JQL queries, only by ID.&lt;/LI&gt;&lt;LI&gt;&lt;STRONG&gt;Rate limits, pagination, and schema drift for custom fields&lt;/STRONG&gt;&amp;nbsp;are all real work.&lt;/LI&gt;&lt;/UL&gt;&lt;P class=""&gt;Fivetran's Jira connector handles all of this natively: JQL-based incremental sync, webhook-based deletion capture, auto-populated ISSUE_FIELD_HISTORY&amp;nbsp;tables, schema drift detection, and MERGE into Delta. It's available through Databricks Partner Connect for quick setup, and there's also a free dbt package (fivetran/dbt_jira) with pre-built analytics models.&lt;/P&gt;&lt;P class=""&gt;&lt;STRONG&gt;My take:&lt;/STRONG&gt;&amp;nbsp;go with Fivetran unless you have a specific reason not to: high-volume cost concerns, a need for archived issues, or data residency restrictions. Custom Python makes sense for narrow use cases, but it's weeks of build plus ongoing maintenance.&lt;/P&gt;&lt;P class=""&gt;References from my research; I think you'll find them helpful:&lt;/P&gt;&lt;UL class=""&gt;&lt;LI&gt;Fivetran Jira:&amp;nbsp;&lt;A href="https://fivetran.com/docs/connectors/applications/jira" rel="noopener" target="_blank"&gt;Jira connector by Fivetran | Fivetran documentation&lt;/A&gt;&lt;/LI&gt;&lt;LI&gt;Fivetran Databricks destination:&amp;nbsp;&lt;A href="https://fivetran.com/docs/connectors/databases/databricks" rel="noopener" target="_blank"&gt;Databricks database connector by Fivetran | Fivetran documentation&lt;/A&gt;&lt;/LI&gt;&lt;LI&gt;dbt Jira package: &lt;A href="https://github.com/fivetran/dbt_jira" rel="noopener" target="_blank"&gt;fivetran/dbt_jira: Data models for Fivetran's Jira connector built using dbt.&lt;/A&gt;&lt;/LI&gt;&lt;/UL&gt;&lt;P class=""&gt;Happy to dig in further if you're leaning one way.&lt;/P&gt;</description>
    <pubDate>Sat, 18 Apr 2026 04:49:58 GMT</pubDate>
    <dc:creator>abhi_dabhi</dc:creator>
    <dc:date>2026-04-18T04:49:58Z</dc:date>
    <item>
      <title>Delta Jira data import to Databricks</title>
      <link>https://community.databricks.com/t5/administration-architecture/delta-jira-data-import-to-databricks/m-p/154394#M5137</link>
      <description>&lt;P&gt;We need to import a large amount of Jira data into Databricks, importing only the delta changes.&amp;nbsp; What's the best approach: the Fivetran Jira connector, or developing our own Python scripts/pipeline code?&amp;nbsp; Thanks.&lt;/P&gt;</description>
      <pubDate>Mon, 13 Apr 2026 22:00:23 GMT</pubDate>
      <guid>https://community.databricks.com/t5/administration-architecture/delta-jira-data-import-to-databricks/m-p/154394#M5137</guid>
      <dc:creator>greengil</dc:creator>
      <dc:date>2026-04-13T22:00:23Z</dc:date>
    </item>
    <item>
      <title>Re: Delta Jira data import to Databricks</title>
      <link>https://community.databricks.com/t5/administration-architecture/delta-jira-data-import-to-databricks/m-p/154402#M5138</link>
      <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/200255"&gt;@greengil&lt;/a&gt;,&lt;/P&gt;
&lt;P&gt;Have you considered Lakeflow Connect?&amp;nbsp; Databricks now has a native &lt;A href="https://docs.databricks.com/aws/en/ingestion/lakeflow-connect/jira" target="_blank"&gt;Jira connector&lt;/A&gt; in Lakeflow Connect that can achieve what you are looking for. It's in beta, but something you may want to consider.&amp;nbsp;&lt;/P&gt;
&lt;P&gt;It ingests Jira into Delta with incremental (delta) loads out of the box, supports SCD1/SCD2, handles deletes via audit logs, and runs fully managed on serverless with Unity Catalog governance.&amp;nbsp;This is lower-effort and better integrated than both Fivetran and custom Python, and directly targets your large volume + only changes requirement.&lt;/P&gt;
&lt;P&gt;If you can’t use the Databricks Jira connector, prefer Fivetran Jira --&amp;gt; Databricks over custom code for a managed, low-maintenance ELT path.&amp;nbsp;Only build custom Python pipelines if you have very specific requirements that neither managed option can meet.&lt;/P&gt;
&lt;P class="p1"&gt;&lt;FONT size="2" color="#FF6600"&gt;&lt;STRONG&gt;&lt;I&gt;If this answer resolves your question, could you mark it as “Accept as Solution”? That helps other users quickly find the correct fix.&lt;/I&gt;&lt;/STRONG&gt;&lt;/FONT&gt;&lt;/P&gt;</description>
      <pubDate>Tue, 14 Apr 2026 05:59:58 GMT</pubDate>
      <guid>https://community.databricks.com/t5/administration-architecture/delta-jira-data-import-to-databricks/m-p/154402#M5138</guid>
      <dc:creator>Ashwin_DSA</dc:creator>
      <dc:date>2026-04-14T05:59:58Z</dc:date>
    </item>
    <item>
      <title>Re: Delta Jira data import to Databricks</title>
      <link>https://community.databricks.com/t5/administration-architecture/delta-jira-data-import-to-databricks/m-p/154838#M5161</link>
      <description>&lt;P&gt;Hi &lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/216690"&gt;@Ashwin_DSA&lt;/a&gt;&amp;nbsp;- Thank you for the information.&amp;nbsp; Appreciate it.&amp;nbsp; Regarding the built-in Lakeflow Connect, I see that it will ingest all the Jira tables into Databricks.&amp;nbsp; Is there a way to ingest only a subset of the data?&amp;nbsp; For example, instead of all issues, I want only a subset.&amp;nbsp; Thanks.&lt;/P&gt;</description>
      <pubDate>Fri, 17 Apr 2026 18:59:59 GMT</pubDate>
      <guid>https://community.databricks.com/t5/administration-architecture/delta-jira-data-import-to-databricks/m-p/154838#M5161</guid>
      <dc:creator>greengil</dc:creator>
      <dc:date>2026-04-17T18:59:59Z</dc:date>
    </item>
    <item>
      <title>Re: Delta Jira data import to Databricks</title>
      <link>https://community.databricks.com/t5/administration-architecture/delta-jira-data-import-to-databricks/m-p/154843#M5162</link>
      <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/200255"&gt;@greengil&lt;/a&gt;,&lt;/P&gt;
&lt;P&gt;Yes, you can restrict what Lakeflow Connect for Jira ingests, both by table and (partially) by row.&lt;/P&gt;
&lt;P&gt;In the UI, on the Source step, you can select only the tables you care about (for example, just issues, or issues + projects) instead of all source tables. In DABs/API, only list the tables you want under objects.&lt;/P&gt;
&lt;P&gt;The Jira connector supports filtering by Jira project/space via jira_options.include_jira_spaces (list of project keys).&amp;nbsp;In the UI, this is exposed as an option to filter the data by Jira spaces or projects (you enter project keys, not names or IDs).&lt;/P&gt;
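&lt;P&gt;As a rough illustration, a bundle definition restricting tables and projects might look like the sketch below. Only &lt;STRONG&gt;objects&lt;/STRONG&gt; and &lt;STRONG&gt;jira_options.include_jira_spaces&lt;/STRONG&gt; come from the docs; every other key and name here is a guess, so verify the exact spec against the Jira pipeline reference before using it.&lt;/P&gt;

```yaml
# Illustrative sketch only: check the Lakeflow Connect Jira pipeline
# documentation for the exact schema before use.
resources:
  pipelines:
    jira_ingest:
      name: jira_ingest                       # assumed pipeline name
      ingestion_definition:
        connection_name: my_jira_connection   # assumed connection name
        objects:                              # list only the tables you need
          - table:
              source_table: issues
          - table:
              source_table: projects
        jira_options:
          include_jira_spaces:                # project keys, not names or IDs
            - PROJ1
            - PROJ2
```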
&lt;P&gt;If you are looking for anything more granular than project/space (e.g. specific issue types, statuses, labels), that's not supported as of now. The connector ingests all matching issues for those projects/spaces, and you then filter downstream in silver/gold tables. More general row-level filtering for Jira is on the backlog but not yet available.&lt;/P&gt;
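&lt;P&gt;The downstream-filtering pattern is simple enough to sketch. It's shown here on plain Python dicts for clarity; in a real pipeline it would be a WHERE clause or DataFrame filter when building the silver table, and the field names are assumptions, not the connector's actual schema.&lt;/P&gt;

```python
# Sketch of filtering below the project level after ingestion: the
# connector lands every issue for the selected projects, and narrower
# cuts (issue type, status, labels) are applied downstream.
# Field names ("issue_type", "status") are assumptions.
WANTED_TYPES = {"Bug", "Story"}

def silver_filter(issues):
    """Keep only the issue subset the reports need."""
    return [
        i for i in issues
        if i["issue_type"] in WANTED_TYPES and i["status"] != "Done"
    ]

sample = [
    {"key": "PROJ1-1", "issue_type": "Bug", "status": "Open"},
    {"key": "PROJ1-2", "issue_type": "Epic", "status": "Open"},
    {"key": "PROJ2-7", "issue_type": "Story", "status": "Done"},
]
# silver_filter(sample) keeps only PROJ1-1
```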
&lt;P&gt;Refer to these pages:&amp;nbsp;&lt;A href="https://docs.databricks.com/aws/en/ingestion/lakeflow-connect/jira-pipeline" target="_blank"&gt;Jira pipeline&lt;/A&gt;&amp;nbsp;and&amp;nbsp;&lt;A href="https://docs.databricks.com/aws/en/ingestion/lakeflow-connect/jira-limits" target="_blank"&gt;limitations&lt;/A&gt;.&amp;nbsp;&lt;/P&gt;
&lt;P class="p1"&gt;&lt;FONT size="2" color="#FF6600"&gt;&lt;STRONG&gt;&lt;I&gt;If this answer resolves your question, could you mark it as “Accept as Solution”? That helps other users quickly find the correct fix.&lt;/I&gt;&lt;/STRONG&gt;&lt;/FONT&gt;&lt;I&gt;&lt;/I&gt;&lt;/P&gt;</description>
      <pubDate>Fri, 17 Apr 2026 21:56:21 GMT</pubDate>
      <guid>https://community.databricks.com/t5/administration-architecture/delta-jira-data-import-to-databricks/m-p/154843#M5162</guid>
      <dc:creator>Ashwin_DSA</dc:creator>
      <dc:date>2026-04-17T21:56:21Z</dc:date>
    </item>
    <item>
      <title>Re: Delta Jira data import to Databricks</title>
      <link>https://community.databricks.com/t5/administration-architecture/delta-jira-data-import-to-databricks/m-p/154855#M5163</link>
      <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/200255"&gt;@greengil&lt;/a&gt;,&amp;nbsp;good question. I went through something similar recently, so I'm sharing what I found.&lt;/P&gt;&lt;P class=""&gt;My instinct was also to build it in Python, but once I dug in, the "just write a script" path hides a lot of pain:&lt;/P&gt;&lt;UL class=""&gt;&lt;LI&gt;&lt;STRONG&gt;Deletions are invisible.&lt;/STRONG&gt; Jira's REST API doesn't return deleted issues. Without webhooks, you'll have ghost records in Delta forever.&lt;/LI&gt;&lt;LI&gt;&lt;STRONG&gt;Field history isn't free.&lt;/STRONG&gt; The API gives you current state, not change history. Reporting usually needs history, which means building and maintaining it yourself.&lt;/LI&gt;&lt;LI&gt;&lt;STRONG&gt;Archived issues&lt;/STRONG&gt; aren't returned in JQL queries, only by ID.&lt;/LI&gt;&lt;LI&gt;&lt;STRONG&gt;Rate limits, pagination, and schema drift for custom fields&lt;/STRONG&gt;&amp;nbsp;are all real work.&lt;/LI&gt;&lt;/UL&gt;&lt;P class=""&gt;Fivetran's Jira connector handles all of this natively: JQL-based incremental sync, webhook-based deletion capture, auto-populated ISSUE_FIELD_HISTORY&amp;nbsp;tables, schema drift detection, and MERGE into Delta. It's available through Databricks Partner Connect for quick setup, and there's also a free dbt package (fivetran/dbt_jira) with pre-built analytics models.&lt;/P&gt;&lt;P class=""&gt;&lt;STRONG&gt;My take:&lt;/STRONG&gt;&amp;nbsp;go with Fivetran unless you have a specific reason not to: high-volume cost concerns, a need for archived issues, or data residency restrictions.
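&lt;/P&gt;&lt;P class=""&gt;For a sense of the moving parts, here is a minimal, hypothetical sketch of the "just write a script" path: an incremental pull against the classic Jira Cloud search endpoint. The domain, auth scheme, and watermark handling are placeholders, and even this happy-path loop still cannot see deletions or archived issues.&lt;/P&gt;

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "https://your-domain.atlassian.net"  # placeholder domain
PAGE_SIZE = 100

def build_incremental_jql(last_sync: str) -> str:
    # Only issues updated since the last watermark, oldest first.
    return f'updated >= "{last_sync}" ORDER BY updated ASC'

def fetch_changed_issues(token: str, last_sync: str):
    # Page through /rest/api/2/search. What this loop cannot see:
    # deletions (the API returns no tombstones) and archived issues
    # (excluded from JQL results), which is exactly the pain above.
    start_at = 0
    while True:
        params = urllib.parse.urlencode({
            "jql": build_incremental_jql(last_sync),
            "startAt": start_at,
            "maxResults": PAGE_SIZE,
        })
        req = urllib.request.Request(
            f"{BASE_URL}/rest/api/2/search?{params}",
            headers={"Authorization": f"Bearer {token}"},
        )
        # Rate-limit (HTTP 429) retry/backoff deliberately omitted here,
        # but is mandatory in any production version.
        with urllib.request.urlopen(req, timeout=30) as resp:
            page = json.load(resp)
        yield from page["issues"]
        start_at += PAGE_SIZE
        if start_at >= page["total"]:
            break
```
&lt;P class=""&gt;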
Custom Python makes sense for narrow use cases, but it's weeks of build plus ongoing maintenance.&lt;/P&gt;&lt;P class=""&gt;References from my research; I think you'll find them helpful:&lt;/P&gt;&lt;UL class=""&gt;&lt;LI&gt;Fivetran Jira:&amp;nbsp;&lt;A href="https://fivetran.com/docs/connectors/applications/jira" rel="noopener" target="_blank"&gt;Jira connector by Fivetran | Fivetran documentation&lt;/A&gt;&lt;/LI&gt;&lt;LI&gt;Fivetran Databricks destination:&amp;nbsp;&lt;A href="https://fivetran.com/docs/connectors/databases/databricks" rel="noopener" target="_blank"&gt;Databricks database connector by Fivetran | Fivetran documentation&lt;/A&gt;&lt;/LI&gt;&lt;LI&gt;dbt Jira package: &lt;A href="https://github.com/fivetran/dbt_jira" rel="noopener" target="_blank"&gt;fivetran/dbt_jira: Data models for Fivetran's Jira connector built using dbt.&lt;/A&gt;&lt;/LI&gt;&lt;/UL&gt;&lt;P class=""&gt;Happy to dig in further if you're leaning one way.&lt;/P&gt;</description>
      <pubDate>Sat, 18 Apr 2026 04:49:58 GMT</pubDate>
      <guid>https://community.databricks.com/t5/administration-architecture/delta-jira-data-import-to-databricks/m-p/154855#M5163</guid>
      <dc:creator>abhi_dabhi</dc:creator>
      <dc:date>2026-04-18T04:49:58Z</dc:date>
    </item>
  </channel>
</rss>

