Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Importing a large CSV file into Databricks Free

Mamdouh_Dabjan
New Contributor III

Basically, I have a large CSV file that does not fit in a single Excel worksheet; I can only work with it through Power Query. I am trying to import this file into my Databricks notebook. I uploaded it and created a table from the file, but when I viewed the table, it was full of random symbols instead of the data I imported. Is there a way to recover my data from these symbols?

The symbols look something like this:

6��@#W&���9�`�ϻ��U1�ѵL�T���E)�N�9;��l01H�O���>�4Q+(�2�wiɆ�������%?-2��7��A�ze�C��H��r+�;�>���(�2����~Y���D����[�2g�����eϢ��ԯy�ir#��~�

6 REPLIES

weldermartins
Honored Contributor

Hello, if you manually open one of the parts of the CSV file, does the content look different?

I don't quite understand what you mean. If you mean opening the CSV in a worksheet, it does not fit because the data is over 1 million rows.

If you open the CSV, it will show a message that not all lines can be read, but you can still preview the file. Please post more details about what you have already tried.

It does show the message you describe. But what can I do to import this CSV into Databricks? As I said before, I uploaded the data and created a table from it, but it displays these random symbols instead of the data I imported.

You can use PySpark to remove the unwanted Unicode characters.

This example removes the null Unicode character (U+0000). You will have to adapt the pattern to match the characters appearing in your own data.

from pyspark.sql.functions import regexp_replace

# Replace the null character (U+0000) with an empty string in two columns
null_char = '\u0000'
dfCnae = (
    df
    .withColumn('id', regexp_replace(df['id'], null_char, ''))
    .withColumn('description', regexp_replace(df['description'], null_char, ''))
)
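For intuition, the Spark transformation above amounts to a per-value string replacement. A minimal plain-Python sketch of the same null-stripping idea, using made-up sample rows:

```python
# Plain-Python illustration of stripping embedded null characters (U+0000)
# from string fields, mirroring what regexp_replace does column-wise in Spark.

NULL_CHAR = "\u0000"

def strip_nulls(value: str) -> str:
    """Remove embedded null characters from a single string value."""
    return value.replace(NULL_CHAR, "")

# Hypothetical rows containing embedded nulls
rows = [{"id": "1\u00002", "description": "wid\u0000get"}]
cleaned = [{k: strip_nulls(v) for k, v in row.items()} for row in rows]
# cleaned -> [{"id": "12", "description": "widget"}]
```

Note that this only hides the symptom: if the whole table is garbled, the underlying file is more likely compressed or in the wrong encoding, and fixing the upload is the better cure.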

Thanks
