shamly
New Contributor III

Hi Werners,

I have a CSV file that uses a double-dagger delimiter and UTF-16 encoding. It has extra lines and spaces: some rows end with CRLF and some end with LF. So I created a shell script to handle this. Now I want to integrate this shell script with my larger set of Python commands.

%sh tr '\n' ' ' <'/dbfs/mnt/datalake/data/file.csv' > '/dbfs/mnt/datalake/data/file_new.csv'

dff = spark.read.option("header", "true") \
    .option("inferSchema", "true") \
    .option('encoding', 'UTF-16') \
    .option("delimiter", "‡‡,‡‡") \
    .option("multiLine", True) \
    .csv("/mnt/datalake/data/file_new.csv")

from pyspark.sql.functions import regexp_replace

dffs_headers = dff.dtypes
for i in dffs_headers:
  columnLabel = i[0]
  # Remove the dagger markers from the column name
  newColumnLabel = columnLabel.replace('‡‡','')
  # Remove the leftover dagger markers from the column values
  dff=dff.withColumn(newColumnLabel,regexp_replace(columnLabel,'^\\‡‡|\\‡‡$|\\ ‡‡',''))
  if columnLabel != newColumnLabel:
    dff = dff.drop(columnLabel)

display(dff)

Now I want to parameterise every path, which is why I wrote the widgets, the calls that get the widget values, and so on.
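One way to make the preprocessing step take paths as parameters is to do the newline replacement in Python instead of in the %sh cell. The following is only a minimal sketch: the function name replace_newlines and the chunk_size parameter are made up for illustration, and it assumes the same byte-for-byte behaviour as the tr command above is what is wanted.

```python
def replace_newlines(src_path, dst_path, chunk_size=1024 * 1024):
    # Byte-level stand-in for the shell step: every LF byte becomes a space.
    # Like tr, this works on raw bytes and does not decode the UTF-16 text.
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        while True:
            chunk = src.read(chunk_size)  # read in chunks to keep memory use bounded
            if not chunk:
                break
            dst.write(chunk.replace(b"\n", b" "))
```

In a notebook, the two arguments could then come from widgets, for example dbutils.widgets.text("input_path", "") followed by dbutils.widgets.get("input_path") (the widget name input_path is hypothetical). As in the code above, the local file API needs the /dbfs prefix on the path, while spark.read takes the path without it.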