Generative AI
Explore discussions on generative artificial intelligence techniques and applications within the Databricks Community. Share ideas, challenges, and breakthroughs in this cutting-edge field.

GenAI Cookbook - how to add source documents to output and open pdf file on a page

karavyu1
New Contributor II

Hello,

I am implementing a RAG solution as per the Databricks cookbook. The Review App is working, and references are provided as text chunks.

I need to build functionality to open a PDF file at a specific page as a reference. Is there a way to change the Review App to open the PDF file instead of showing text?
Or do I need to build a Chainlit app with PdfViewer elements to provide this functionality?
How can I call the serving endpoint so that it returns source documents?
I am using this code at the moment and sources are not returned:

 

 

import os
import requests
import numpy as np
import pandas as pd
import json

def create_tf_serving_json(data):
    return {'inputs': {name: data[name] for name in data.keys()} if isinstance(data, dict) else data}

def score_model(dataset):
    url = 'https://XXX/invocations'
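    # Read the token from the environment rather than relying on an undefined global.
    headers = {'Authorization': f'Bearer {os.environ.get("DATABRICKS_TOKEN")}', 'Content-Type': 'application/json'}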
    ds_dict = {'dataframe_split': dataset.to_dict(orient='split')} if isinstance(dataset, pd.DataFrame) else create_tf_serving_json(dataset)
    data_json = json.dumps(ds_dict, allow_nan=True)
    print(data_json)
    response = requests.request(method='POST', headers=headers, url=url, data=data_json)
    if response.status_code != 200:
        raise Exception(f'Request failed with status {response.status_code}, {response.text}')
    return response.json()

response = score_model({
  "messages": [
    {
      "role": "user",
      "content": "What is LLM?"
    }
  ]
})
for item in response.items():
    print(item)

 

 

 
1 REPLY

mark_ott
Databricks Employee

You want your RAG solution (based on Databricks Cookbook) to display PDF files at specific pages as references in your Review App, rather than plain text chunks. You also wish to retrieve source documents from your serving endpoint, but your current code only returns model results, not sources.

Here’s a step-by-step review and actionable improvements to meet your requirements:

1. Opening PDFs at Specific Pages

The standard ReviewApp in Databricks' RAG cookbook is text-focused, and does not support direct PDF page referencing out of the box. To display PDFs and open them at a specified page, you must:

  • Use a PDF viewer component (like Chainlit’s PdfViewer), or another frontend framework.

  • When a reference is clicked, pass both the file path (or URL) and the page number to the viewer.

  • Store a mapping (e.g., pdf_path, page_no) during retrieval so references in the UI link directly to the PDF at the page you want.
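One lightweight way to carry that mapping into the UI is a link with a `#page=N` fragment, which common browser PDF viewers honour (other viewers may ignore it). The sketch below assumes each chunk carries the `pdf_path` and `page_no` metadata mentioned above; `pdf_page_link` and the `base_url` value are hypothetical names used only for illustration.

```python
from urllib.parse import quote


def pdf_page_link(base_url: str, pdf_path: str, page_no: int) -> str:
    """Build a link that asks the viewer to open a PDF at a 1-based page."""
    if page_no < 1:
        raise ValueError("page_no is 1-based and must be >= 1")
    # Percent-encode the path (spaces etc.) but keep '/' separators intact.
    encoded_path = quote(pdf_path.lstrip("/"))
    return f"{base_url.rstrip('/')}/{encoded_path}#page={page_no}"


# Hypothetical metadata stored alongside a chunk at indexing time.
chunk_meta = {"pdf_path": "docs/llm guide.pdf", "page_no": 11}
link = pdf_page_link(
    "https://example.com/files", chunk_meta["pdf_path"], chunk_meta["page_no"]
)
print(link)
```

The same `(pdf_path, page_no)` pair can instead be handed to an embedded viewer component if you do not want to rely on the browser's built-in one.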

Databricks ReviewApp Modification

  • Not natively supported; it would require custom frontend code to integrate a PDF viewer and navigate to a page based on the reference metadata.

  • It is easier and more modular to use a framework like Chainlit or Streamlit (with PyPDF2, pdfplumber, or frontend PDF components) to achieve this directly.

2. Do You Need to Build Chainlit App?

Yes, building with Chainlit (or similar) is recommended for:

  • Embedding a PDF viewer.

  • Navigating to a specific page programmatically.

  • Making the app interactive and reference-friendly.

With Chainlit, you can show the PDF with its PDF element (cl.Pdf, called PdfViewer in this thread) and set the page to open from your reference metadata.
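A rough sketch of how that could look, assuming Chainlit's `Pdf` element accepts `name`, `display`, `path` and `page` keyword arguments (please confirm against the Chainlit version you have installed). `pdf_element_args` and `send_answer` are hypothetical helper names, and the `file`/`page` keys follow the example response format used in this reply.

```python
from typing import Any


def pdf_element_args(src: dict[str, Any]) -> dict[str, Any]:
    """Translate one source record into keyword arguments for a PDF element."""
    return {
        "name": f"{src['file']} (p. {src['page']})",
        "display": "inline",
        "path": src["file"],       # assumption: a local path the app can read
        "page": int(src["page"]),  # assumption: the element accepts a start page
    }


async def send_answer(answer: str, sources: list[dict[str, Any]]) -> None:
    """Send the answer with one PDF element per source (needs Chainlit)."""
    # Imported lazily so the helper above can be used without Chainlit installed.
    import chainlit as cl

    elements = [cl.Pdf(**pdf_element_args(s)) for s in sources]
    await cl.Message(content=answer, elements=elements).send()


args = pdf_element_args({"file": "mydoc.pdf", "page": "11"})
print(args)
```

`send_answer` would be awaited from inside a Chainlit message handler, after the serving endpoint has returned the answer and its sources.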

3. Calling the Serving Endpoint to Return Source Documents

Your model serving endpoint (/invocations) needs to return not just the answer, but also the source document metadata (file path, page, chunk) for referencing.

  • If the model endpoint doesn’t return sources, check:

    • Is your retrieval chain constructed to include sources? In RAG, output should include something like sources or references.

    • Is your serving endpoint returning source_documents in the JSON?

Typical structure for returning sources:

json
{
  "output": "LLM answer...",
  "source_documents": [
    { "file": "mydoc.pdf", "page": 11, "chunk": "The LLM is..." }
  ]
}

Your current code prints every top-level item in the response dictionary. If no source_documents key (or similar) shows up in that output, the endpoint is not returning sources and the chain itself needs to change.

How to Get Source Documents

  • Back-end changes:
    In your Databricks RAG chain/code, ensure your chain/app is set up to return references:

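    • For LangChain: set return_source_documents=True when constructing a chain that supports it (for example RetrievalQA); the retrieved documents then come back under source_documents in the chain output.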

    • For custom solutions: append metadata (file, page) to the returned list.

  • Serving endpoint:
    Must be configured to return sources as part of its response.

  • Frontend handling:
    Parse the source_documents in the returned JSON and use this info to display or link the correct PDF/page.
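To make the back-end part concrete, here is a framework-agnostic sketch of a chain function that returns the answer together with source metadata. It is not the cookbook's implementation: `answer_with_sources`, `fake_retrieve` and `fake_generate` are hypothetical stand-ins for your retriever and LLM call, and the chunk keys (`file`, `page`, `text`) are assumptions about how your chunks are stored.

```python
from typing import Callable


def answer_with_sources(
    question: str,
    retrieve: Callable[[str], list[dict]],
    generate: Callable[[str, list[str]], str],
) -> dict:
    """Run retrieval and generation, returning the answer plus source metadata."""
    chunks = retrieve(question)
    answer = generate(question, [c["text"] for c in chunks])
    return {
        "output": answer,
        # Keep the metadata next to each chunk so the UI can link to file/page.
        "source_documents": [
            {"file": c["file"], "page": c["page"], "chunk": c["text"]}
            for c in chunks
        ],
    }


# Stand-ins for a real retriever and LLM call, for illustration only.
def fake_retrieve(question: str) -> list[dict]:
    return [{"file": "mydoc.pdf", "page": 11, "text": "The LLM is..."}]


def fake_generate(question: str, context: list[str]) -> str:
    return f"Answer based on {len(context)} chunk(s)."


result = answer_with_sources("What is LLM?", fake_retrieve, fake_generate)
print(result)
```

Whatever framework you use, the point is the same: the function that the endpoint serves has to put the metadata into its return value, otherwise the client never sees it.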

4. Practical Code Improvements

Model Request

Ensure your backend is returning the sources:

python
def score_model(dataset):
    ...
    # Call endpoint as before
    response = requests.request(...)
    if response.status_code != 200:
        ...
    # Ideally: response.json() contains 'output' and 'source_documents'
    return response.json()

Example Response Handling

python
result = score_model(...)
print("Answer:", result.get("output"))
for src in result.get("source_documents", []):
    print(f"File: {src['file']} Page: {src['page']}")
    # In frontend: Pass src['file'] and src['page'] to PdfViewer
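Several chunks often come from the same page, so before rendering links it may help to collapse the sources into distinct (file, page) references. A small sketch, assuming the response shape shown earlier; `unique_references` is a hypothetical helper name.

```python
def unique_references(response: dict) -> list[tuple[str, int]]:
    """Return distinct (file, page) pairs in first-seen order."""
    seen: set[tuple[str, int]] = set()
    refs: list[tuple[str, int]] = []
    for src in response.get("source_documents") or []:
        file, page = src.get("file"), src.get("page")
        if file is None or page is None:
            continue  # not enough metadata to link to a page
        key = (file, int(page))
        if key not in seen:
            seen.add(key)
            refs.append(key)
    return refs


sample = {
    "output": "LLM answer...",
    "source_documents": [
        {"file": "mydoc.pdf", "page": 11, "chunk": "The LLM is..."},
        {"file": "mydoc.pdf", "page": 11, "chunk": "Another chunk..."},
        {"file": "other.pdf", "page": 3, "chunk": "..."},
        {"file": "broken.pdf"},
    ],
}
refs = unique_references(sample)
print(refs)
```

Each resulting pair can then be passed to whatever viewer you use in the frontend.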

5. Recommendation Table

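| Option       | PDF Page Support | Effort  | Flexibility | Source Metadata Required |
|--------------|------------------|---------|-------------|--------------------------|
| ReviewApp    | No (custom hack) | Medium  | Low         | Yes                      |
| Chainlit App | Yes (PdfViewer)  | Low/Med | High        | Yes                      |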



Summary:
Change your app to use Chainlit (or extend ReviewApp using a PDF viewer library) for easy PDF page referencing. Ensure your backend returns source metadata, and update frontend code to send file/page info to the viewer.
