emmafrost1
New Contributor II

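The special characters show up as '?????' because the view definition is stored as text in the Hive Metastore. The most likely cause is that the metastore's backing database uses a character set that cannot represent non-ASCII characters (for example, latin1, which the Hive schema scripts for MySQL use by default). When you create a view, any character in the view text that this character set cannot represent is replaced with '?'.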
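A minimal sketch of how non-ASCII characters can turn into question marks when text is forced into a character set that cannot represent them. It uses plain Python, and latin-1 is only an assumed target character set for illustration; it does not reproduce what the metastore does internally.

```python
# Illustration only: latin-1 is an assumed target character set.
text = "シミュレータに接続されていません"

# errors="replace" substitutes "?" for every character that
# latin-1 cannot represent, so the original text is lost.
lossy = text.encode("latin-1", errors="replace").decode("latin-1")

print(lossy)
```

Every character in the string is replaced, which matches the symptom of seeing only question marks in the view text.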

To preserve the special characters in the view representation, you can use the following workaround:

  1. Create a new column in the view that contains the encoded version of the string attribute.
  2. Filter on the new column in the query against the view, so that the literal with special characters is not part of the view definition.

The following code shows how to do this:

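-- Expose a UTF-8 encoded copy of the string column alongside the original data.
CREATE VIEW my_view AS
SELECT
  *,
  encode(string_attribute, 'utf-8') AS encoded_string_attribute
FROM my_table;

-- Filter in the query, not in the view definition, so the literal is not stored in the metastore.
-- encode() returns binary, so the literal is encoded the same way before comparing.
SELECT
  *
FROM my_view
WHERE
  encoded_string_attribute != encode('シミュレータに接続されていません', 'utf-8');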

This code creates a view called my_view that contains the original data from my_table plus a new column called encoded_string_attribute, which holds the UTF-8 encoded version of the string_attribute column. The WHERE clause of the query against the view then uses encoded_string_attribute to filter out rows where the value of string_attribute is equal to 'シミュレータに接続されていません'.

This workaround will preserve the special characters in the view representation. However, it will also make the view slightly larger, because the encoded version of the string_attribute column will be slightly larger than the original string_attribute column.
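To make the size difference concrete, here is a small sketch in Python. Python's str.encode stands in for the SQL encode function, which is an assumption about equivalent behaviour rather than something taken from the Spark or Hive implementation.

```python
text = "シミュレータに接続されていません"

# UTF-8 encoding is lossless: decoding returns the original string.
encoded = text.encode("utf-8")
restored = encoded.decode("utf-8")

# Each character in this particular string needs 3 bytes in UTF-8,
# so the byte count is three times the character count.
print(len(text), len(encoded), restored == text)
```

The encoded value keeps all the information, but it is measured in bytes rather than characters, which is why it takes more space than the character count suggests.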

If you want to avoid the performance penalty of storing the encoded version of the string_attribute column, you can use the following alternative workaround:

  1. Create a new column in the view that contains the escaped version of the string attribute.
  2. Filter on the new column in the query against the view, so that the literal with special characters is not part of the view definition.

The following code shows how to do this:

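-- escape() is assumed to be available in your environment (for example, as a UDF).
CREATE VIEW my_view AS
SELECT
  *,
  escape(string_attribute) AS escaped_string_attribute
FROM my_table;

-- Escape the literal the same way so both sides of the comparison match.
SELECT
  *
FROM my_view
WHERE
  escaped_string_attribute != escape('シミュレータに接続されていません');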

This code creates a view called my_view that contains the original data from my_table plus a new column called escaped_string_attribute, which holds the escaped version of the string_attribute column. The WHERE clause of the query against the view then uses escaped_string_attribute to filter out rows where the value of string_attribute is equal to 'シミュレータに接続されていません'. Note that escape is not a built-in function I can confirm in Spark SQL or Hive, so you may need to substitute whatever escaping function or UDF your environment provides.

This workaround will preserve the special characters in the view representation without making the view any larger. However, it will make queries that filter on the view slightly slower, because the query engine has to do some extra work to escape the special characters for every row.
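As a rough illustration of what an "escaped" string can look like, the sketch below uses Python's unicode_escape codec. This is only one possible reading of escaping and is an assumption; the escape function used in the SQL above may behave differently.

```python
text = "シミュレータに接続されていません"

# unicode_escape rewrites each non-ASCII character as a \uXXXX sequence,
# so the result contains only ASCII characters.
escaped = text.encode("unicode_escape").decode("ascii")

# The escaping is reversible: decoding the escape sequences
# gives back the original string.
restored = escaped.encode("ascii").decode("unicode_escape")

print(escaped)
print(restored == text)
```

Because the escaped form is pure ASCII, it survives storage in a character set that cannot hold the original characters, at the cost of an extra conversion step whenever the value is compared.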

Which workaround you choose depends on your specific needs. Both preserve the special characters in the view representation. If you need the view to be as fast as possible, use the first workaround. If speed matters less to you, use the second workaround.