08-18-2022 12:09 PM
Basically, I have a large CSV file that does not fit in a single worksheet; I can only use it in Power Query. I am trying to import this file into my Databricks notebook. I imported it and created a table from that file, but when I view the table it is full of random symbols and not the data I imported. Is there a way to convert these symbols back into my data?
The symbols look something like this:
6��@#W&���9�`�ϻ��U1�ѵL�T���E)�N�9;��l01H�O���>�4Q+(�2�wiɆ�������%?-2��7��A�ze�C��H��r+�;�>���(�2����~Y���D����[�2g�����eϢ��ԯy�ir#��~�
08-19-2022 05:19 AM
Hello, if you manually open one of the parts of the CSV file, does the view look different?
08-19-2022 06:01 AM
I don't really understand what you mean. If you mean opening the CSV in a worksheet, it does not fit because the data is over 1 million rows.
08-19-2022 06:17 AM
If you open the CSV, it will show a message that it is not possible to read all lines, but you can still preview the file. Please post more information than you already have.
08-19-2022 06:54 AM
It does show what you are describing. But what can I do to import this CSV into Databricks? As I said before, I uploaded the data and created a table from it, but it is displaying these random symbols instead of the data I imported.
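As an aside (this is an assumption on my part, not something confirmed in the thread): output full of symbols like the ones you posted often means the file is actually compressed (e.g. gzipped) or in a non-text encoding rather than plain CSV. A quick way to check a local copy is to inspect its first bytes; the sample bytes below stand in for reading the real file, whose path would be up to you:

```python
# Sketch: check whether a file's leading bytes look like a gzip stream
# rather than plain CSV text. The byte strings below are illustrative
# stand-ins for open("yourfile.csv", "rb").read(2) on a local copy.
GZIP_MAGIC = b"\x1f\x8b"  # the first two bytes of any gzip stream

def looks_gzipped(first_bytes: bytes) -> bool:
    """Return True if the byte prefix matches the gzip magic number."""
    return first_bytes.startswith(GZIP_MAGIC)

print(looks_gzipped(b"\x1f\x8b\x08\x00"))  # gzip header -> True
print(looks_gzipped(b"id,description"))    # plain CSV text -> False
```

If the check comes back True, decompressing the file (or letting Spark read the compressed file directly) would be worth trying before any character cleanup.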
08-19-2022 09:50 AM
You can use PySpark to remove the unwanted Unicode characters.
This example removes the null Unicode character (U+0000). You will have to search for or match the characters in your own data, or you can find another solution online.
# Replace the null Unicode character with an empty string in a DataFrame
from pyspark.sql.functions import regexp_replace

null = u'\u0000'
dfCnae = df\
    .withColumn('id', regexp_replace(df['id'], null, ''))\
    .withColumn('description', regexp_replace(df['description'], null, ''))
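For what it's worth, the effect of that regexp_replace on each cell value can be sketched in plain Python; the sample string here is made up for illustration:

```python
import re

null = "\u0000"  # the null Unicode character being stripped

# Hypothetical cell value containing embedded null characters:
raw = "Data\u0000bricks\u0000"
cleaned = re.sub(null, "", raw)
print(cleaned)  # -> Databricks
```

The same pattern works for any other stray character: put its escape sequence in the pattern and replace it with an empty string.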
08-19-2022 10:44 AM
Thanks