09-08-2025 12:00 AM
Hello Team,
I am facing an issue with the Lakebridge transpiler.
The Analyzer step runs successfully and produces the expected analysis files. However, when I run the Transpiler, it fails with the following error:
ERROR [src/databricks/labs/Lakebridge.transpile] UnicodeDecodeError: 'charmap' codec can't decode byte 0x8f in position 71: character maps to <undefined>
Error: unexpected end of JSON input
Lakebridge Transpile failed with exit code 1
Command I executed:
databricks labs lakebridge transpile --input-source "C:\Users\user_name\Downloads\segment_pioneer" --source-dialect synapse --output-folder "C:\Users\user_name\Downloads\segment_pioneer\output\Converted_Code"
What confuses me is that:
The Analyzer works fine and completes successfully.
The Transpiler fails immediately with an encoding-related error.
If there were a problem in the SQL code itself, I would expect the Analyzer to fail as well. So it seems related to how the transpiler reads files or paths (possibly an encoding issue on Windows).
Could you please help clarify:
- Why does the Analyzer run but the Transpiler fail on the same input?
- Is there a known workaround for the UnicodeDecodeError on Windows (e.g., forcing UTF-8)?
- Should I try running this with a different CLI encoding setting?
Thanks in advance.
09-08-2025 12:17 AM
Hi @shashankB ,
Maybe the Analyzer reads files with encoding-tolerant methods? The code is open source, so you can check it when you have time.
Could you open your input file in VSCode and check its encoding? Also, does the file contain any unusual characters, perhaps in comments?
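To run the same check over the whole input folder rather than one file at a time, a short script can help. This is a minimal sketch and not part of Lakebridge: the function names `decodes_as` and `scan_folder` and the `*.sql` pattern are assumptions for illustration. It lists every file that contains non-ASCII bytes and reports whether those bytes decode as UTF-8 and as cp1252 (a common Windows default).

```python
from pathlib import Path

# Encodings to compare: UTF-8 and a common Windows default code page.
ENCODINGS = ("utf-8", "cp1252")


def decodes_as(data, encoding):
    """Return True if the raw bytes decode cleanly with the given encoding."""
    try:
        data.decode(encoding)
        return True
    except UnicodeDecodeError:
        return False


def scan_folder(root, pattern="*.sql"):
    """Map each non-ASCII file under root to {encoding: decodes_cleanly}.

    Hypothetical helper; adjust `pattern` to match your source files.
    """
    root = Path(root)
    report = {}
    for path in sorted(root.rglob(pattern)):
        data = path.read_bytes()
        if data.isascii():
            continue  # pure ASCII decodes identically under both encodings
        key = path.relative_to(root).as_posix()
        report[key] = {enc: decodes_as(data, enc) for enc in ENCODINGS}
    return report
```

A file reported as `{"utf-8": True, "cp1252": False}` is valid UTF-8 but would fail when read with the Windows default, which matches the error in the original post.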
09-08-2025 04:29 AM
Root Cause
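- The 'charmap' codec in the error indicates that Python decoded the file with the Windows default code page (typically cp1252) rather than UTF-8. Byte 0x8f has no mapping in cp1252, so decoding fails.
- This mismatch between the file content (likely UTF-8 or containing special characters) and the default Windows decoding causes the issue.
- The trailing “unexpected end of JSON input” suggests the failed decode aborted the run midway, leaving incomplete JSON.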
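The failure mode can be reproduced outside Lakebridge with plain Python. This is an illustrative sketch only: the helper `try_decode` and the sample text are made up, and the actual character in the failing file is unknown. It encodes a string containing U+23F0, whose UTF-8 form includes the byte 0x8f, and then decodes it once as UTF-8 and once as cp1252.

```python
def try_decode(data, encoding):
    """Return the decoded text, or the error message if decoding fails."""
    try:
        return data.decode(encoding)
    except UnicodeDecodeError as exc:
        return f"UnicodeDecodeError: {exc}"


# U+23F0 is encoded as E2 8F B0 in UTF-8; any character whose UTF-8 form
# contains 0x8f would behave the same way. The sample text is hypothetical.
sample = "-- \u23f0 reminder".encode("utf-8")

# ascii() keeps the output printable even on a cp1252 console.
print(ascii(try_decode(sample, "utf-8")))   # decodes cleanly
print(ascii(try_decode(sample, "cp1252")))  # fails on byte 0x8f
```

The second call fails because cp1252 leaves 0x8f unassigned, while UTF-8 treats it as a valid continuation byte.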
Suggested Solutions
1. Force UTF-8 decoding in the Transpiler
If you have control over the CLI or the transpiler's Python code, make sure files are opened with an explicit encoding: open(filename, 'r', encoding='utf-8')
2. Set Python's environment to use UTF-8 by default
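You can try running the transpiler in Python's UTF-8 mode. Setting the environment variable PYTHONUTF8=1 before running the databricks CLI enables it for any Python process started from that shell. If the package can be launched as a module, the equivalent is: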
py -Xutf8 -m databricks.labs.lakebridge transpile ...
3. Convert files to UTF-8 before transpiling
If possible, ensure your source files are encoded in UTF-8. For example, to re-encode a cp1252 file:
# src and dst are placeholder paths; errors='ignore' silently drops undecodable bytes
with open(src, 'r', encoding='cp1252', errors='ignore') as f_in, \
     open(dst, 'w', encoding='utf-8') as f_out:
    f_out.write(f_in.read())
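If some files are already UTF-8 and others are not, re-encoding everything from cp1252 would corrupt the UTF-8 ones. A more cautious variant tries UTF-8 first and only falls back when that fails. This is a rough sketch under assumptions: `to_utf8` is a hypothetical helper, and cp1252 as the fallback is a guess about the legacy encoding, not something confirmed in this thread.

```python
from pathlib import Path


def to_utf8(src, dst, fallback="cp1252"):
    """Write a UTF-8 copy of src to dst; return the encoding src was read as.

    Hypothetical helper: the fallback encoding is an assumption.
    """
    data = Path(src).read_bytes()
    try:
        # utf-8-sig also strips a UTF-8 byte order mark if one is present
        text = data.decode("utf-8-sig")
        used = "utf-8"
    except UnicodeDecodeError:
        # errors="replace" marks undecodable bytes with U+FFFD instead of
        # silently dropping them, so any loss stays visible in the output
        text = data.decode(fallback, errors="replace")
        used = fallback
    # Writing bytes avoids any newline translation on Windows
    Path(dst).write_bytes(text.encode("utf-8"))
    return used
```

Writing the converted copies to a separate folder keeps the originals intact in case the fallback guess turns out to be wrong.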
Please let me know if any of the above works.