vishwanath_1
New Contributor III
since 01-11-2024
04-01-2024

User Stats

  • 7 Posts
  • 0 Solutions
  • 0 Kudos given
  • 0 Kudos received

User Activity

I am using the below command to push a DataFrame to a Mongo collection. There are a few null values in String and Double datatype columns, and we see these are getting dropped when pushed to Mongo even after using the option("ignoreNullValues", false). inputproddata...
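As a rough illustration of why null columns can vanish on write (a plain-Python sketch, not the MongoDB Spark connector's actual code path; `to_document` and `keep_nulls` are invented names for this example): a writer that skips `None` values produces documents with those keys missing entirely, while an explicit-null writer keeps them as BSON nulls.

```python
def to_document(row, keep_nulls):
    """Convert a row dict to a Mongo-style document.

    keep_nulls=False mimics a writer that silently drops null fields;
    keep_nulls=True stores them as explicit nulls (None).
    """
    if keep_nulls:
        return dict(row)
    return {k: v for k, v in row.items() if v is not None}

row = {"name": "widget", "price": None, "sku": "A1"}

dropped = to_document(row, keep_nulls=False)
kept = to_document(row, keep_nulls=True)

print("price" in dropped)  # False - field is absent from the document
print("price" in kept)     # True  - stored as an explicit null
```

If the connector option is not taking effect, checking which side drops the field (the Spark-to-BSON conversion vs. the collection itself) with a small two-row DataFrame is usually the quickest way to narrow it down.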
I have the below steps to perform: 1. Read a CSV file (considerably huge, ~100 GB). 2. Add an index using the zipWithIndex function. 3. Repartition the DataFrame. 4. Pass it on to another function. Can you suggest the best optimized caching strategy to execute these c...
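One reason caching matters in this pipeline: `zipWithIndex` needs a first pass just to count each partition before it can assign globally contiguous indices, so without persisting, the expensive CSV read can be recomputed. A minimal plain-Python sketch of that two-pass mechanism (lists stand in for partitions; `zip_with_index` is an invented name, not the Spark API itself):

```python
def zip_with_index(partitions):
    """Mimic RDD.zipWithIndex(): a first pass counts every partition
    (in Spark this triggers an extra job), then a second pass assigns
    each element its partition offset plus its local position."""
    counts = [len(p) for p in partitions]      # first pass over the data
    offsets = [0]
    for c in counts[:-1]:
        offsets.append(offsets[-1] + c)        # cumulative start index per partition
    return [
        [(item, offsets[i] + j) for j, item in enumerate(p)]
        for i, p in enumerate(partitions)
    ]

parts = [["a", "b"], ["c"], ["d", "e"]]
print(zip_with_index(parts))
# [[('a', 0), ('b', 1)], [('c', 2)], [('d', 3), ('e', 4)]]
```

Because of that extra counting pass, a common approach (worth benchmarking on your data) is to persist the DataFrame once after the read, before the index step, so the ~100 GB file is scanned a single time.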
Reading a 130 GB file without multiLine true takes 6 minutes; my file has multi-line data. How do I speed up the reading time here? I am using the below command: InputDF=spark.read.option("delimiter","^").option("header",false).option("encoding","UTF-8"...
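The slowdown with multiLine is structural: when records can contain embedded newlines, a newline is no longer a safe record boundary, so the file cannot be chunked at arbitrary byte offsets for parallel reads the way a single-line CSV can. A small sketch using Python's stdlib `csv` module (not Spark's parser, but the same boundary problem) shows why:

```python
import csv
import io

raw = 'id^note\n1^"line one\nline two"\n2^plain\n'

# Naive newline splitting - roughly what default (multiLine=false) record
# boundaries assume - cuts the quoted record in half:
print(raw.splitlines())
# ['id^note', '1^"line one', 'line two"', '2^plain']

# A quote-aware parser (what multiLine=true must do) keeps the record whole,
# but it has to scan sequentially, since record boundaries depend on quoting:
rows = list(csv.reader(io.StringIO(raw), delimiter="^"))
print(rows)
# [['id', 'note'], ['1', 'line one\nline two'], ['2', 'plain']]
```

A common mitigation (an assumption to test against your data, not a guaranteed fix) is to restore parallelism at the file level: split the input into many smaller files upstream, so each file gets its own task even with multiLine enabled.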