Importing a large CSV file into Databricks Free
08-18-2022 12:09 PM
Basically, I have a large CSV file that does not fit in a single worksheet; I can only use it in Power Query. I am trying to import this file into my Databricks notebook. I imported it and created a table from the file, but when I viewed the table, it was full of random symbols and not the data I imported. Is there a way to convert these symbols back into my data?
The symbols look something like this:
6��@#W&���9�`�ϻ��U1�ѵL�T���E)�N�9;��l01H�O���>�4Q+(�2�wiɆ�������%?-2��7��A�ze�C��H��r+�;�>���(�2����~Y���D����[�2g�����eϢ��ԯy�ir#��~�
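Garbled output like this usually means the raw bytes of the file are being decoded with the wrong character encoding (or the file is actually compressed, e.g. gzipped, rather than plain text). A minimal pure-Python illustration of the effect, assuming a hypothetical UTF-16 file mistakenly decoded as Latin-1:

```python
# Hypothetical illustration: decoding bytes with the wrong codec
# produces unreadable symbols; the right codec recovers the text.
data = "id,description\n1,hello".encode("utf-16")

garbled = data.decode("latin-1")   # wrong codec: mojibake
recovered = data.decode("utf-16")  # right codec: original CSV text

print(garbled)
print(recovered)
```

If the recovered text still looks like random bytes under every common encoding, the file is likely compressed or binary, not a text CSV.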
- Labels:
  - Databricks notebook
  - Importing
08-19-2022 05:19 AM
Hello, if you manually open one of the parts of the CSV file, does the view look different?
08-19-2022 06:01 AM
I don't really understand what you mean. If you mean opening the CSV in a worksheet, it does not fit because the data is over 1 million rows.
08-19-2022 06:17 AM
If you open the CSV, it will show a message that it is not possible to read all the lines, but you can still preview the file. Please post more information about what you have so far.
08-19-2022 06:54 AM
It does show the message you describe. But what can I do to import this CSV into Databricks? As I said before, I uploaded the data and created a table from it, but it displays these random symbols instead of the data I imported.
08-19-2022 09:50 AM
You can use PySpark to remove the stray Unicode characters.
This example removes the null Unicode character. You will have to search for and match the characters in your own data, or look up a solution specific to your encoding.
from pyspark.sql.functions import regexp_replace

# Replace the null Unicode character with an empty string in each column
null = u'\u0000'
dfCnae = df\
    .withColumn('id', regexp_replace(df['id'], null, ''))\
    .withColumn('description', regexp_replace(df['description'], null, ''))
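The same idea can be checked on a small sample of the file outside Spark with plain Python. This is only a sketch; the sample string is made up to show the null character being stripped:

```python
import re

# Strip the null Unicode character (\u0000) from a sample line,
# mirroring what regexp_replace does per column in Spark.
sample = "6\u0000,widget\u0000 description"
cleaned = re.sub("\u0000", "", sample)

print(cleaned)  # 6,widget description
```

Testing on a few lines like this first helps confirm which characters actually need removing before running the full Spark job.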
08-19-2022 10:44 AM
Thanks

