Genie PDF export corrupts non-ASCII characters (Polish diacritics ł, ż, ś, ź, ę, ą)

MWojcicki — Mon, 11 May 2026 17:14:46 GMT

When exporting a Genie conversation response to PDF, Polish diacritical characters are systematically replaced with wrong ASCII characters or dropped entirely, making the document unreadable for Polish-speaking users.

Character substitution pattern

Expected	Rendered as	Example in PDF

ł	B	gBównych instead of głównych
ż	|	ró|nica instead of różnica
ś	[	bezpo[rednie instead of bezpośrednie
Ź	y	yródBa instead of Źródła
ę	(dropped)	midzy instead of między
ą	(dropped)	rosn instead of rosną

This affects Polish text across the entire document: titles, paragraphs, and table cells. In the examples above, ó is the only diacritic that survives.
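
One observation that may help narrow this down: every substitution in the table equals the low byte of the expected character's Unicode code point. For example, ł is U+0142 and 0x42 is B, while ę is U+0119 and 0x19 is a non-printing control character, which would explain the dropped letters. The Python sketch below reproduces the garbled examples under that assumption. The function name `drop_high_byte` is made up for illustration, and the code only models the symptom; it says nothing about where in the export pipeline the truncation happens.

```python
import unicodedata


def drop_high_byte(text: str) -> str:
    """Model the observed corruption: keep only the low byte of each code point.

    Assumption (not confirmed): code points above U+00FF lose their high byte,
    and results that land in the control range (below 0x20) render as nothing.
    """
    out = []
    # Normalize first so each diacritic is one precomposed code point.
    for ch in unicodedata.normalize("NFC", text):
        cp = ord(ch)
        if cp <= 0xFF:
            out.append(ch)  # ASCII and Latin-1 (including ó) pass through
            continue
        low = cp & 0xFF
        if low < 0x20:
            continue  # control character: nothing visible in the PDF
        out.append(chr(low))
    return "".join(out)


for word in ["głównych", "różnica", "bezpośrednie", "Źródła", "między", "rosną"]:
    print(f"{word} -> {drop_high_byte(word)}")
```

Characters at or below U+00FF, such as ó (U+00F3), are left alone by this model, which matches gBównych keeping its ó.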

PDF metadata & font analysis

I analyzed the generated PDF with pypdf. Key findings:

Producer: PDFium
Creator: PDFium

Fonts used:
/F1: BaseFont=/Helvetica, Subtype=/Type1, Encoding=/WinAnsiEncoding, ToUnicode=False, Embedded=False
/F2: BaseFont=/PMWIBM+SourceHanSansJP-Bold, Subtype=/Type0, Encoding=/Identity-H, ToUnicode=True, Embedded=False
/F5: BaseFont=/KUZLZL+SourceHanSansJP-Normal, Subtype=/Type0, Encoding=/Identity-H, ToUnicode=True, Embedded=False
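
For anyone who wants to repeat the check on their own export, a listing like the one above could be produced roughly as follows with pypdf. This is a minimal sketch, not the exact script behind the numbers above: the helper names and the file name `genie_export.pdf` are hypothetical. For Type0 fonts it also looks at the descendant font's descriptor, because in the PDF font model that is where an embedded font program is referenced; the top-level Type0 dictionary has no font descriptor of its own.

```python
FONT_FILE_KEYS = ("/FontFile", "/FontFile2", "/FontFile3")


def resolve(obj):
    """Follow an indirect reference if the object supports it (pypdf objects do)."""
    return obj.get_object() if hasattr(obj, "get_object") else obj


def describe_font(name, font):
    """Summarize one font dictionary (a pypdf dictionary or a plain dict)."""
    font = resolve(font)
    # Collect the font itself plus any descendant CIDFonts (Type0 case).
    candidates = [font]
    if "/DescendantFonts" in font:
        candidates += [resolve(d) for d in resolve(font["/DescendantFonts"])]
    embedded = False
    for cand in candidates:
        if "/FontDescriptor" in cand:
            desc = resolve(cand["/FontDescriptor"])
            embedded = embedded or any(key in desc for key in FONT_FILE_KEYS)

    def field(key):
        return str(resolve(font[key])) if key in font else None

    return {
        "name": name,
        "BaseFont": field("/BaseFont"),
        "Subtype": field("/Subtype"),
        "Encoding": field("/Encoding"),
        "ToUnicode": "/ToUnicode" in font,
        "Embedded": embedded,
    }


def list_fonts(pdf_path):
    """Return one summary per font resource name found on any page."""
    from pypdf import PdfReader  # third-party dependency, imported lazily

    seen = {}
    for page in PdfReader(pdf_path).pages:
        resources = resolve(page["/Resources"]) if "/Resources" in page else {}
        fonts = resolve(resources["/Font"]) if "/Font" in resources else {}
        for name in fonts:
            if str(name) not in seen:
                seen[str(name)] = describe_font(str(name), fonts[name])
    return list(seen.values())


# Usage (hypothetical file name):
# for row in list_fonts("genie_export.pdf"):
#     print(row)
```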

Root cause analysis

SourceHanSansJP is a Japanese CJK font (JP = Japanese). It uses Identity-H encoding with a ToUnicode CMap.
The font is not embedded in the PDF — only referenced.
The ToUnicode CMap appears to incorrectly map Polish diacritical glyphs (Latin Extended-A range: U+0142 ł, U+017C ż, U+015B ś, etc.) to wrong code points, producing the garbled output.
The /Helvetica Type1 font with WinAnsiEncoding could handle only a few of these characters (of the Polish diacritics, just ó), and the affected text appears to be routed through the CJK font instead.
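
To make the WinAnsiEncoding point concrete, the sketch below lists which Polish letters a WinAnsi-encoded font could represent at all. It uses Python's `cp1252` codec as a stand-in for WinAnsiEncoding, which is an assumption: the two are closely related, but this is not a PDF-level check. The names `POLISH` and `winansi_encodable` are illustrative.

```python
import unicodedata

POLISH = "ąćęłńóśźżĄĆĘŁŃÓŚŹŻ"


def winansi_encodable(chars: str) -> str:
    """Return the characters that the cp1252 codec can encode."""
    ok = []
    # Normalize so each letter is a single precomposed code point.
    for ch in unicodedata.normalize("NFC", chars):
        try:
            ch.encode("cp1252")
        except UnicodeEncodeError:
            continue  # not representable in this single-byte encoding
        ok.append(ch)
    return "".join(ok)


print("Encodable in cp1252:", winansi_encodable(POLISH))
```

Under this assumption only ó and Ó are covered, so a correct export needs a font and encoding that reach Latin Extended-A regardless of what Helvetica is used for.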

Expected behavior

Polish diacritical characters (ą, ć, ę, ł, ń, ó, ś, ź, ż) should render correctly in exported PDFs. Apart from ó (U+00F3, Latin-1 Supplement), these are standard Latin Extended-A characters (Unicode range U+0100–U+017F), supported by virtually all modern fonts.

Steps to reproduce

1. Open a Genie space
2. Ask a question in Polish (or get a response containing Polish text)
3. Export the conversation/response to PDF
4. Open the PDF: the diacritical characters are corrupted

Environment

Databricks on Azure
Genie (AI/BI) PDF export
PDF generated by PDFium engine
Language: Polish (likely affects other Central/Eastern European languages using Latin Extended: Czech, Hungarian, Romanian, etc.)