08-18-2022 12:09 PM
Basically, I have a large CSV file that does not fit in a single worksheet; I can only use it in Power Query. I am trying to import this file into my Databricks notebook. I imported it and created a table from that file, but when I view the table it is full of random symbols and not the data I imported. Is there a way to convert these symbols back into my data?
The symbols look something like this:
6��@#W&���9�`�ϻ��U1�ѵL�T���E)�N�9;��l01H�O���>�4Q+(�2�wiɆ�������%?-2��7��A�ze�C��H��r+�;�>���(�2����~Y���D����[�2g�����eϢ��ԯy�ir#��~�
08-19-2022 05:19 AM
Hello, if you manually open one of the parts of the CSV file, does the view look different?
08-19-2022 06:01 AM
I don't really understand what you mean. If you mean opening the CSV in a worksheet, it does not fit because the data is over 1 million rows.
08-19-2022 06:17 AM
If you open the CSV, it will show a message that it is not possible to read all lines, but you can still preview the file. Please post more information than you already have.
08-19-2022 06:54 AM
It does show what you are describing. But what can I do to import this CSV into Databricks? As I said before, I uploaded the data and created a table from it, but it is displaying these random symbols instead of the data I imported.
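As an aside (this is an assumption on my part, not something confirmed in the thread): output full of symbols like the ones you posted often means the file is actually compressed (e.g. gzipped) or in a non-text encoding rather than plain CSV. A quick way to check a local copy is to inspect its first bytes; the sample bytes below stand in for reading the real file, whose path would be up to you:

```python
# Sketch: check whether a file's leading bytes look like a gzip stream
# rather than plain CSV text. The byte strings below are illustrative
# stand-ins for open("yourfile.csv", "rb").read(2) on a local copy.
GZIP_MAGIC = b"\x1f\x8b"  # the first two bytes of any gzip stream

def looks_gzipped(first_bytes: bytes) -> bool:
    """Return True if the byte prefix matches the gzip magic number."""
    return first_bytes.startswith(GZIP_MAGIC)

print(looks_gzipped(b"\x1f\x8b\x08\x00"))  # gzip header -> True
print(looks_gzipped(b"id,description"))    # plain CSV text -> False
```

If the check comes back True, decompressing the file (or letting Spark read the compressed file directly) would be worth trying before any character cleanup.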
08-19-2022 09:50 AM
You can use PySpark to remove the unwanted Unicode characters.
This example removes the null Unicode character (U+0000). You will have to search for or match the characters in your own data, or you can find another solution online.
# Replace the null Unicode character with an empty string in a DataFrame
from pyspark.sql.functions import regexp_replace

null = u'\u0000'
dfCnae = df\
    .withColumn('id', regexp_replace(df['id'], null, ''))\
    .withColumn('description', regexp_replace(df['description'], null, ''))
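For what it's worth, the effect of that regexp_replace on each cell value can be sketched in plain Python; the sample string here is made up for illustration:

```python
import re

null = "\u0000"  # the null Unicode character being stripped

# Hypothetical cell value containing embedded null characters:
raw = "Data\u0000bricks\u0000"
cleaned = re.sub(null, "", raw)
print(cleaned)  # -> Databricks
```

The same pattern works for any other stray character: put its escape sequence in the pattern and replace it with an empty string.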
08-19-2022 10:44 AM
Thanks