Hi Werners,
I have a CSV file with a double-dagger delimiter and UTF-16 encoding. It has extra lines and spaces, and the line endings are mixed: some rows end with CRLF and some with LF. So I created a shell script to handle this. Now I want to integrate this shell script with my bigger Python code.
%sh tr '\n' ' ' <'/dbfs/mnt/datalake/data/file.csv' > '/dbfs/mnt/datalake/data/file_new.csv'
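Since the goal is to fold the shell step into the Python code, here is a minimal sketch of doing the same `tr` transformation in plain Python instead of a `%sh` cell. This is an assumption about what the shell step needs to do (replace each LF with a space), not a Databricks-specific API:

```python
def flatten_newlines(src, dst):
    r"""Replace every LF byte with a space, like `tr '\n' ' '`.

    Working on raw bytes keeps it safe for the UTF-16 file: only the
    0x0A byte is touched, so a UTF-16LE newline (0A 00) becomes a
    UTF-16LE space (20 00). The same caveat as `tr` applies: any rare
    character whose code units contain an 0x0A byte would also be hit.
    """
    with open(src, "rb") as f_in, open(dst, "wb") as f_out:
        f_out.write(f_in.read().replace(b"\n", b" "))

# In a Databricks notebook, shell-visible paths use the /dbfs prefix:
# flatten_newlines("/dbfs/mnt/datalake/data/file.csv",
#                  "/dbfs/mnt/datalake/data/file_new.csv")
```

This removes the need for a separate `%sh` cell, so the whole pipeline can run as one Python cell.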
dff = spark.read.option("header", "true") \
    .option("inferSchema", "true") \
    .option("encoding", "UTF-16") \
    .option("delimiter", "‡‡,‡‡") \
    .option("multiLine", True) \
    .csv("/mnt/datalake/data/file_new.csv")
from pyspark.sql.functions import regexp_replace

dffs_headers = dff.dtypes
for i in dffs_headers:
    columnLabel = i[0]
    newColumnLabel = columnLabel.replace('‡‡', '')
    dff = dff.withColumn(newColumnLabel, regexp_replace(columnLabel, '^\\‡‡|\\‡‡$|\\ ‡‡', ''))
    if columnLabel != newColumnLabel:
        dff = dff.drop(columnLabel)
display(dff)
Now I want to parameterise every path, which is why I added the widgets, the dbutils.widgets.get calls, etc.
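The parameterisation could be sketched like this, assuming a Databricks notebook where `dbutils` and `spark` exist; the widget names `input_path` and `output_path` are hypothetical:

```python
def dbfs_local(path):
    # Spark reads "/mnt/..." through DBFS directly, but shell commands and
    # plain Python file I/O need the "/dbfs/mnt/..." local mount prefix.
    return "/dbfs" + path if path.startswith("/mnt") else path

# In the notebook:
# dbutils.widgets.text("input_path", "/mnt/datalake/data/file.csv")
# dbutils.widgets.text("output_path", "/mnt/datalake/data/file_new.csv")
# input_path = dbutils.widgets.get("input_path")
# output_path = dbutils.widgets.get("output_path")
#
# Run the newline-flattening step against the widget values:
# import subprocess
# subprocess.run(
#     f"tr '\\n' ' ' < '{dbfs_local(input_path)}' > '{dbfs_local(output_path)}'",
#     shell=True, check=True)
#
# dff = spark.read.option("header", "true").option("encoding", "UTF-16") \
#     .option("delimiter", "‡‡,‡‡").option("multiLine", True).csv(output_path)
```

The one wrinkle is the path translation: the widget can hold a single `/mnt/...` value, and `dbfs_local` maps it to the `/dbfs/...` form whenever a shell command or local file API needs it.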