Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Lakebridge Transpiler Fails with UnicodeDecodeError while Analyzer Works Successfully

shashankB
New Contributor III

 

Hello Team,

I am facing an issue with the Lakebridge transpiler.
The Analyzer step runs successfully and produces the expected analysis files. However, when I run the Transpiler, it fails with the following error:

 

 
ERROR [src/databricks/labs/Lakebridge.transpile] UnicodeDecodeError: 'charmap' codec can't decode byte 0x8f in position 71: character maps to <undefined>
Error: unexpected end of JSON input
Lakebridge Transpile failed with exit code 1
Command I executed:

databricks labs lakebridge transpile --input-source "C:\Users\user_name\Downloads\segment_pioneer" --source-dialect synapse --output-folder "C:\Users\user_name\Downloads\segment_pioneer\output\Converted_Code"
 
What confuses me is that:

The Analyzer works fine and completes successfully.

The Transpiler fails immediately with an encoding-related error.

If there were a code issue in the SQL, I would expect the Analyzer to fail as well. So it seems related to how the transpiler reads the files/paths (maybe an encoding issue on Windows).

Could you please help clarify:

  1. Why does the Analyzer run but the Transpiler fail on the same input?
  2. Is there a known workaround for the UnicodeDecodeError on Windows (e.g., forcing UTF-8)?
  3. Should I try running this with a different CLI encoding setting?

Thanks in advance.

 

1 ACCEPTED SOLUTION


ManojkMohan
Honored Contributor

Root Cause

  • The 'charmap' codec in the traceback means the transpiler read a file with the default Windows code page (typically cp1252) instead of UTF-8; byte 0x8f has no mapping in cp1252, so the read aborted.
  • The trailing “unexpected end of JSON input” suggests the decoder aborted midway, leaving truncated, invalid JSON behind.
  • In short, the mismatch between the file content (likely UTF-8 or containing special characters) and the default Windows decoding causes the issue.
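
A minimal sketch (illustration only, not Lakebridge's own code) of how reading a UTF-8 file with the Windows default code page produces exactly this error; the file name and SQL content are made up:

# Byte 0x8f is valid inside a UTF-8 sequence (e.g. "Ï" is 0xC3 0x8F) but has
# no mapping in cp1252, the 'charmap'-based code page Windows often defaults to.
path = "sample.sql"  # hypothetical test file

with open(path, "wb") as f:
    f.write("SELECT 'Ï' AS label;".encode("utf-8"))

try:
    # Reading without an explicit encoding behaves like this on many Windows setups:
    with open(path, "r", encoding="cp1252") as f:
        f.read()
except UnicodeDecodeError as e:
    print(e)  # 'charmap' codec can't decode byte 0x8f in position ...: character maps to <undefined>

with open(path, "r", encoding="utf-8") as f:  # an explicit UTF-8 read succeeds
    print(f.read())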

Suggested Solutions
1. Force UTF-8 decoding in the Transpiler

If you have control over the CLI or the transpiler's Python code, ensure files are opened with an explicit encoding:

open(filename, 'r', encoding='utf-8')

2. Set Python's environment to use UTF-8 by default

You can try running the transpiler in UTF-8 mode (PEP 540), for example:

py -Xutf8 -m databricks.labs.lakebridge transpile ...

If the CLI launches the transpiler through a standard Python interpreter, setting the environment variable PYTHONUTF8=1 before invoking it enables the same UTF-8 mode without changing the command.
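
To confirm whether UTF-8 mode is actually in effect for the interpreter that ends up running the transpiler, a quick check (a sketch, assuming Python 3.7+):

import locale
import sys

# 1 when -Xutf8 or PYTHONUTF8=1 is active, 0 otherwise (available since Python 3.7).
print("utf8_mode:", sys.flags.utf8_mode)

# 'UTF-8' in UTF-8 mode; otherwise the Windows ANSI code page, e.g. 'cp1252'.
print("default text encoding:", locale.getpreferredencoding(False))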

3. Convert files to UTF-8 before transpiling

If possible, ensure your source files are encoded in UTF-8 before running the transpiler, for example:

import codecs

# src: path to the original file; dst: path for the UTF-8 copy to feed the transpiler.
# Read with the Windows code page and rewrite as UTF-8 (errors='ignore' drops unmappable bytes).
with codecs.open(src, 'r', encoding='cp1252', errors='ignore') as f_in, \
        codecs.open(dst, 'w', encoding='utf-8') as f_out:
    f_out.write(f_in.read())
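
Since --input-source in the command above points at a whole folder, a batch variant of the same idea may be handier (a sketch: the utf8 output folder name and the *.sql pattern are assumptions about your layout):

from pathlib import Path

SRC_DIR = Path(r"C:\Users\user_name\Downloads\segment_pioneer")  # folder from the original command
OUT_DIR = SRC_DIR / "utf8"                                       # hypothetical output folder
OUT_DIR.mkdir(exist_ok=True)

for src in SRC_DIR.glob("*.sql"):                                # assumes .sql inputs
    raw = src.read_bytes()
    try:
        text = raw.decode("utf-8")                               # files are most likely already UTF-8
    except UnicodeDecodeError:
        text = raw.decode("cp1252", errors="replace")            # fall back to the Windows code page
    (OUT_DIR / src.name).write_text(text, encoding="utf-8")
    print("re-encoded:", src.name)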

Please let me know if any of the above works.


2 REPLIES

szymon_dybczak
Esteemed Contributor III

Hi @shashankB ,

Maybe the analyzer uses encoding-tolerant read methods? The code is open source, so you can check it in your free time.

Could you open your input file in VS Code and check its encoding? Also, do you have any unusual characters in your input file, maybe in comments? (A quick stdlib-only check is sketched below.)
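
For reference, a stdlib-only sketch that lists which files in the input folder contain non-ASCII bytes and where (the folder path is taken from the command above; the *.sql pattern is an assumption about the inputs):

from pathlib import Path

INPUT_DIR = Path(r"C:\Users\user_name\Downloads\segment_pioneer")  # folder from the original command

for path in INPUT_DIR.rglob("*.sql"):                # assumes .sql inputs; adjust the pattern as needed
    raw = path.read_bytes()
    for offset, byte in enumerate(raw):
        if byte >= 0x80:                             # first non-ASCII byte in this file
            context = raw[max(offset - 20, 0):offset + 20]
            print(f"{path.name}: byte 0x{byte:02x} at offset {offset}, context: {context!r}")
            break
    else:
        print(f"{path.name}: plain ASCII")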

 

