MalformedInputException when using extended ascii characters in dbutils.notebook.exit()
10-03-2024 11:49 AM
I have a specific use case where I call another notebook using the
10-03-2024 09:16 PM - edited 10-03-2024 09:19 PM
@jfpatenaude wrote:
I have a specific use case where I call another notebook using the dbutils.notebook.run() function. The other notebook does some processing and returns a string through dbutils.notebook.exit() to the caller notebook. The returned string contains French accented characters such as à, é, and è. Because of that, the calling notebook runs about 5 minutes longer than the called notebook and eventually fails with an exception:
com.databricks.WorkflowException: java.nio.charset.MalformedInputException: Input length = 1
If I remove the special characters, everything works fine. The same is true if I call .encode('ascii', 'ignore') on my string, but I need the correct string returned, accents included. Is there a way to preserve my string intact? I'm on Databricks Runtime Version 13.3 LTS.
You can reproduce the behavior using these two simple notebooks.
Caller Notebook:
output = dbutils.notebook.run('called_notebook', 600)
Called Notebook:
dbutils.notebook.exit("La mise à jour des tables des données raffinées est terminée")
- Called Notebook
# Called Notebook
output_string = "La mise à jour des tables des données raffinées est terminée"
# Encode to UTF-8
output_bytes = output_string.encode('utf-8')
# Convert back to a string before returning
dbutils.notebook.exit(output_bytes.decode('utf-8'))
- Caller Notebook
# Caller Notebook
output = dbutils.notebook.run('called_notebook', 600)
# Optional sanity check: a UTF-8 round trip leaves a well-formed str unchanged
output_string = output.encode('utf-8').decode('utf-8')
print(output_string)
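If the round trip above does not change the behavior, another option is to keep the exit value pure ASCII and restore the accents in the caller. Below is a minimal sketch of that idea, assuming the exception is triggered by non-ASCII characters in the value passed to dbutils.notebook.exit(). The helper names encode_exit_value and decode_exit_value are hypothetical and not part of any Databricks API. The dbutils calls appear only as comments so the snippet also runs outside a notebook.

```python
import json


def encode_exit_value(text: str) -> str:
    """Return an ASCII-only JSON string; accents become \\uXXXX escapes."""
    return json.dumps(text, ensure_ascii=True)


def decode_exit_value(payload: str) -> str:
    """Turn the ASCII-only JSON string back into the original text."""
    return json.loads(payload)


message = "La mise à jour des tables des données raffinées est terminée"

# Called notebook (assumed usage):
#   dbutils.notebook.exit(encode_exit_value(message))
payload = encode_exit_value(message)

# Caller notebook (assumed usage):
#   payload = dbutils.notebook.run('called_notebook', 600)
restored = decode_exit_value(payload)
print(restored)
```

The payload that crosses the notebook boundary contains only ASCII characters, so it sidesteps any charset mismatch on the way back, and json.loads recovers the accented string exactly. Whether this avoids the exception on Runtime 13.3 LTS would still need to be confirmed on the cluster.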

