Data Engineering

java.lang.ArithmeticException: long overflow Exception while writing to table | pyspark

Divyanshu
New Contributor

Hey,

I am trying to fetch data from MongoDB and write it to a Databricks table.

I read the data from MongoDB using the pymongo library, flattened the nested struct objects, renamed columns (there were a few duplicates), and then wrote the result to a Databricks table. On write, Databricks throws the following error:

"org.apache.spark.SparkException: Job aborted due to stage failure: Task 2 in stage 12.0 failed 4 times, most recent failure: Lost task 2.3 in stage 12.0 (TID 53) (10.2.64.17 executor 0): org.apache.spark.SparkRuntimeException: Error while encoding: java.lang.ArithmeticException: long overflow"

I get the same error for 197 of the 492 columns, which include string, double, boolean, and integer columns.

Any leads on what I should look into? Almost all the answers I could find online are about storing long data in an int column, but our case seems to be different.
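For context on the message itself: "long overflow" is the text the JVM attaches to an ArithmeticException raised by checked 64-bit arithmetic such as Math.multiplyExact. The sketch below emulates that kind of checked multiplication in plain Python and applies it to a milliseconds-to-microseconds scaling. The helper names (multiply_exact, millis_to_micros) are made up for illustration, and the idea that an out-of-range timestamp is what triggers the error in this job is only an assumption; nothing in the post confirms it.

```python
INT64_MIN = -(2**63)
INT64_MAX = 2**63 - 1


def multiply_exact(a: int, b: int) -> int:
    """Multiply like a checked 64-bit signed operation: raise instead of wrapping."""
    result = a * b
    if not INT64_MIN <= result <= INT64_MAX:
        raise OverflowError("long overflow")
    return result


def millis_to_micros(millis: int) -> int:
    """Scale an epoch value from milliseconds to microseconds with overflow checking."""
    return multiply_exact(millis, 1000)


# An ordinary 2023 timestamp (2023-08-01T00:00:00Z in milliseconds) scales fine.
ok = millis_to_micros(1_690_848_000_000)

# The smallest positive millisecond value that no longer fits after scaling.
too_big = INT64_MAX // 1000 + 1
try:
    millis_to_micros(too_big)
    overflowed = False
except OverflowError:
    overflowed = True

print(ok, overflowed)
```

If this assumption were relevant, the columns to inspect first would be the timestamp ones (created, lastUpdated, and the ENTRY/EXIT fields), looking for values far outside a realistic date range.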

Below are the code and a sample of the schema:

 

 

import re

from pyspark.sql import DataFrame
from pyspark.sql.functions import explode

# Function to flatten a nested JSON struct
def flatten_json_df(_df: DataFrame) -> DataFrame:
    # List to hold the dynamically generated column names
    flattened_col_list = []
    
    # Inner method to iterate over Data Frame to generate the column list
    def get_flattened_cols(df: DataFrame, struct_col: str = None) -> None:
        for col in df.columns:
            if df.schema[col].dataType.typeName() != 'struct':
                if struct_col is None:
                    flattened_col_list.append(f"{col} as {col.replace('.','_')}")
                else:
                    t = struct_col + "." + col
                    flattened_col_list.append(f"{t} as {t.replace('.','_')}")
            else:
                chained_col = struct_col +"."+ col if struct_col is not None else col
                get_flattened_cols(df.select(col+".*"), chained_col)
    
    get_flattened_cols(_df)
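    for i, col in enumerate(flattened_col_list):
        # Backtick-quote source names with special characters and sanitize the aliases
        col=re.sub(r'([0-9]+-[0-9]+) as',r'`\1` as',col)         # quote "0-20"-style source names
        col=re.sub(r'([0-9]+)-([0-9]+)$',r'\1_\2',col)           # alias: "0-20" -> "0_20"
        col=re.sub(r'\.(.*[0-9]+, [0-9]+) as',r'.`\1` as',col)   # quote "August 01, 2023"-style source names
        flattened_col_list[i]=re.sub(r' ([0-9]+), ([0-9]+)$',r'_\1__\2',col)  # alias: " 01, 2023" -> "_01__2023"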
        
    return _df.selectExpr(flattened_col_list)

# Function to write data
def new_write_delta(type,df1,tgt_db_name,tgt_table_name,full_databricks_table_name,pk,operation=""):
    targetTable = tgt_table_name
    delta_path = ""
    df1=df1.dropDuplicates()
    # Keep only non-empty primary-key column names
    pk_arr = [pk_val for pk_val in pk.split(",") if pk_val != ""]
    if len(pk_arr) != 0:
        df1=df1.dropDuplicates(pk_arr)
    if type == 'test':
        path = 'abfss://ft-data-container@ftdatastorage.dfs.core.windows.net/test'
        delta_path = path+'/'+tgt_db_name+'/'+targetTable
    print(f"{full_databricks_table_name} Start writing in overwrite.....")
    print("Writing in the location of :")
    # delta_path is used here in place of destination_table_location, which is not
    # defined anywhere in this snippet (assumed to refer to the same location)
    print(f"{delta_path} in the container")
    (
        df1
            .write
            .mode('overwrite')
            .format('delta')
            .option('path',delta_path)
            .saveAsTable(f"{full_databricks_table_name}")
    )


type="test"
full_databricks_table_name = tgt_db_name + '.' + 'mongo_test_' + table
query = """[{ $match: { "lastUpdated" : { $gte : {$date : "%s"} } } }]"""%(start_date)
spark.conf.set('spark.sql.caseSensitive', True)

df = spark.read.format("mongo")\
    .option("database", database)\
    .option("spark.mongodb.input.uri", connectionString)\
    .option("collection",table)\
    .option("pipeline", query)\
    .load()

# Explode the array column and copy the struct columns under new names
dfnew1 = (
    df.withColumn("wayPoint_value", explode("wayPoint"))
      .withColumn("origin_value", df["origin"])
      .withColumn("destination_value", df["destination"])
)
# Remove the original columns that were duplicated above
dfnew = dfnew1.drop("wayPoint").drop("origin").drop("destination")
dfnew = flatten_json_df(dfnew)  # Flatten the schema
# Drop duplicate columns
dfnew = dfnew.drop("destination_value_TAT").drop("origin_value_TAT").drop("wayPoint_value_TAT")
pk = '_id_oid'
df = dfnew

tgt_table_name = "mongo_test_" + table

#Calling write function defined above
new_write_delta(type,df,tgt_db_name,tgt_table_name,full_databricks_table_name,pk)
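To make the renaming step easier to follow, here is a small standalone sketch (no Spark needed) that applies the same four regular expressions from flatten_json_df to two field names taken from the schema. The helper names build_expr and quote_and_alias are only for illustration and are not part of the original code.

```python
import re


def build_expr(struct_col: str, col: str) -> str:
    """Build the 'source as alias' expression the same way flatten_json_df does."""
    t = f"{struct_col}.{col}"
    return f"{t} as {t.replace('.', '_')}"


def quote_and_alias(expr: str) -> str:
    """Apply the four substitutions from flatten_json_df to one expression."""
    expr = re.sub(r'([0-9]+-[0-9]+) as', r'`\1` as', expr)        # quote "0-20" source name
    expr = re.sub(r'([0-9]+)-([0-9]+)$', r'\1_\2', expr)          # alias "0-20" -> "0_20"
    expr = re.sub(r'\.(.*[0-9]+, [0-9]+) as', r'.`\1` as', expr)  # quote "August 01, 2023" source name
    return re.sub(r' ([0-9]+), ([0-9]+)$', r'_\1__\2', expr)      # alias " 01, 2023" -> "_01__2023"


speed = quote_and_alias(build_expr("averageSpeedDistribution", "0-20"))
daily = quote_and_alias(build_expr("dailyDistanceTravelled", "August 01, 2023"))

print(speed)  # averageSpeedDistribution.`0-20` as averageSpeedDistribution_0_20
print(daily)  # dailyDistanceTravelled.`August 01, 2023` as dailyDistanceTravelled_August_01__2023
```

So the source field stays backtick-quoted with its original name, and only the alias (the final column name in the table) is rewritten with underscores.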

 

 

Schema:

 

 

 |-- _id: struct (nullable = true)
 |    |-- oid: string (nullable = true)
 |-- averageSpeedDistribution: struct (nullable = true)
 |    |-- 0-20: double (nullable = true)
 |    |-- 20-40: double (nullable = true)
 |    |-- 40-60: double (nullable = true)
 |    |-- moreThan60: double (nullable = true)
 |-- created: timestamp (nullable = true)
 |-- dailyDistanceTravelled: struct (nullable = true)
 |    |-- August 01, 2023: double (nullable = true)
 |    |-- August 02, 2023: double (nullable = true)
 |    |-- August 03, 2023: double (nullable = true)
 |    |-- August 04, 2023: double (nullable = true)
 |    |-- August 05, 2023: double (nullable = true)
 |    |-- August 06, 2023: double (nullable = true)
 |    |-- August 07, 2023: double (nullable = true)
 |    |-- August 08, 2023: double (nullable = true)
 |    |-- August 09, 2023: double (nullable = true)
 |    |-- August 10, 2023: double (nullable = true)
 |    |-- August 11, 2023: double (nullable = true)
 |    |-- August 12, 2023: double (nullable = true)
 |    |-- August 13, 2023: double (nullable = true)
 |    |-- August 14, 2023: double (nullable = true)
 |    |-- August 15, 2023: double (nullable = true)
 |    |-- August 16, 2023: double (nullable = true)
 |    |-- August 17, 2023: double (nullable = true)
 |    |-- August 18, 2023: double (nullable = true)
 |    |-- August 19, 2023: double (nullable = true)
 |    |-- August 20, 2023: double (nullable = true)
 |    |-- August 21, 2023: double (nullable = true)
 |    |-- August 22, 2023: double (nullable = true)
 |    |-- August 23, 2023: double (nullable = true)
 |    |-- August 24, 2023: double (nullable = true)
 |    |-- August 25, 2023: double (nullable = true)
 |    |-- August 26, 2023: double (nullable = true)
 |    |-- August 27, 2023: double (nullable = true)
 |    |-- August 28, 2023: double (nullable = true)
 |    |-- August 29, 2023: double (nullable = true)
 |    |-- August 30, 2023: double (nullable = true)
 |    |-- August 31, 2023: double (nullable = true)
 |    |-- July 01, 2023: double (nullable = true)
 |    |-- July 02, 2023: double (nullable = true)
 |    |-- July 03, 2023: double (nullable = true)
 |    |-- July 04, 2023: double (nullable = true)
 |    |-- July 05, 2023: double (nullable = true)
 |    |-- July 06, 2023: double (nullable = true)
 |    |-- July 07, 2023: double (nullable = true)
 |    |-- July 08, 2023: double (nullable = true)
 |    |-- July 09, 2023: double (nullable = true)
 |    |-- July 10, 2023: double (nullable = true)
 |    |-- July 11, 2023: double (nullable = true)
 |    |-- July 12, 2023: double (nullable = true)
 |    |-- July 13, 2023: double (nullable = true)
 |    |-- July 14, 2023: double (nullable = true)
 |    |-- July 15, 2023: double (nullable = true)
 |    |-- July 16, 2023: double (nullable = true)
 |    |-- July 17, 2023: double (nullable = true)
 |    |-- July 18, 2023: double (nullable = true)
 |    |-- July 19, 2023: double (nullable = true)
 |    |-- July 20, 2023: double (nullable = true)
 |    |-- July 21, 2023: double (nullable = true)
 |    |-- July 22, 2023: double (nullable = true)
 |    |-- July 23, 2023: double (nullable = true)
 |    |-- July 24, 2023: double (nullable = true)
 |    |-- July 25, 2023: double (nullable = true)
 |    |-- July 26, 2023: double (nullable = true)
 |    |-- July 27, 2023: double (nullable = true)
 |    |-- July 28, 2023: double (nullable = true)
 |    |-- July 29, 2023: double (nullable = true)
 |    |-- July 30, 2023: double (nullable = true)
 |    |-- July 31, 2023: double (nullable = true)
 |    |-- June 07, 2023: double (nullable = true)
 |    |-- June 08, 2023: double (nullable = true)
 |    |-- June 09, 2023: double (nullable = true)
 |    |-- June 10, 2023: double (nullable = true)
 |    |-- June 11, 2023: double (nullable = true)
 |    |-- June 12, 2023: double (nullable = true)
 |    |-- June 13, 2023: double (nullable = true)
 |    |-- June 14, 2023: double (nullable = true)
 |    |-- June 15, 2023: double (nullable = true)
 |    |-- June 16, 2023: double (nullable = true)
 |    |-- June 17, 2023: double (nullable = true)
 |    |-- June 18, 2023: double (nullable = true)
 |    |-- June 19, 2023: double (nullable = true)
 |    |-- June 20, 2023: double (nullable = true)
 |    |-- June 21, 2023: double (nullable = true)
 |    |-- June 22, 2023: double (nullable = true)
 |    |-- June 23, 2023: double (nullable = true)
 |    |-- June 24, 2023: double (nullable = true)
 |    |-- June 25, 2023: double (nullable = true)
 |    |-- June 26, 2023: double (nullable = true)
 |    |-- June 27, 2023: double (nullable = true)
 |    |-- June 28, 2023: double (nullable = true)
 |    |-- June 29, 2023: double (nullable = true)
 |    |-- June 30, 2023: double (nullable = true)
 |    |-- May 02, 2023: double (nullable = true)
 |    |-- May 03, 2023: double (nullable = true)
 |    |-- May 04, 2023: double (nullable = true)
 |    |-- May 05, 2023: double (nullable = true)
 |    |-- May 06, 2023: double (nullable = true)
 |    |-- May 07, 2023: double (nullable = true)
 |    |-- May 08, 2023: double (nullable = true)
 |    |-- May 09, 2023: double (nullable = true)
 |    |-- May 10, 2023: double (nullable = true)
 |    |-- May 11, 2023: double (nullable = true)
 |    |-- May 12, 2023: double (nullable = true)
 |    |-- May 13, 2023: double (nullable = true)
 |    |-- May 14, 2023: double (nullable = true)
 |    |-- May 15, 2023: double (nullable = true)
 |    |-- May 16, 2023: double (nullable = true)
 |    |-- September 01, 2023: double (nullable = true)
 |    |-- September 02, 2023: double (nullable = true)
 |    |-- September 03, 2023: double (nullable = true)
 |    |-- September 04, 2023: double (nullable = true)
 |    |-- September 05, 2023: double (nullable = true)
 |    |-- September 06, 2023: double (nullable = true)
 |    |-- September 07, 2023: double (nullable = true)
 |    |-- September 08, 2023: double (nullable = true)
 |    |-- September 09, 2023: double (nullable = true)
 |    |-- September 10, 2023: double (nullable = true)
 |    |-- September 11, 2023: double (nullable = true)
 |    |-- September 12, 2023: double (nullable = true)
 |    |-- September 13, 2023: double (nullable = true)
 |    |-- September 14, 2023: double (nullable = true)
 |    |-- September 15, 2023: double (nullable = true)
 |    |-- September 16, 2023: double (nullable = true)
 |    |-- September 17, 2023: double (nullable = true)
 |    |-- September 18, 2023: double (nullable = true)
 |    |-- September 19, 2023: double (nullable = true)
 |    |-- September 20, 2023: double (nullable = true)
 |    |-- September 21, 2023: double (nullable = true)
 |    |-- September 22, 2023: double (nullable = true)
 |    |-- September 23, 2023: double (nullable = true)
 |    |-- September 24, 2023: double (nullable = true)
 |    |-- September 25, 2023: double (nullable = true)
 |    |-- September 26, 2023: double (nullable = true)
 |    |-- September 27, 2023: double (nullable = true)
 |    |-- September 28, 2023: double (nullable = true)
 |-- destination: struct (nullable = true)
 |    |-- TAT: double (nullable = true)
 |    |-- currentState: string (nullable = true)
 |    |-- d1: double (nullable = true)
 |    |-- d2: double (nullable = true)
 |    |-- d3: double (nullable = true)
 |    |-- d4: double (nullable = true)
 |    |-- distance: double (nullable = true)
 |    |-- entryTime: string (nullable = true)
 |    |-- exitTime: string (nullable = true)
 |    |-- hasMultipleEntry: boolean (nullable = true)
 |    |-- hasMultipleExit: boolean (nullable = true)
 |    |-- latitude: double (nullable = true)
 |    |-- longitude: double (nullable = true)
 |    |-- multipleEntryTimestamp: array (nullable = true)
 |    |    |-- element: struct (containsNull = true)
 |    |    |    |-- ENTRY: timestamp (nullable = true)
 |    |    |    |-- EXIT: timestamp (nullable = true)
 |    |-- multipleGeofence: array (nullable = true)
 |    |    |-- element: struct (containsNull = true)
 |    |    |    |-- code: string (nullable = true)
 |    |    |    |-- entryTime: timestamp (nullable = true)
 |    |    |    |-- exitTime: timestamp (nullable = true)
 |    |    |    |-- radius: double (nullable = true)
 |    |    |    |-- shape: string (nullable = true)
 |    |    |    |-- type: string (nullable = true)
 |    |-- noOfEntries: integer (nullable = true)
 |    |-- noOfExits: integer (nullable = true)
 |    |-- p1: struct (nullable = true)
 |    |    |-- attributes: struct (nullable = true)
 |    |    |    |-- External_Location_Source: string (nullable = true)
 |    |    |    |-- appPackage: string (nullable = true)
 |    |    |    |-- appVersion: string (nullable = true)
 |    |    |    |-- battery: integer (nullable = true)
 |    |    |    |-- external_api_id: integer (nullable = true)
 |    |    |    |-- is_charging: integer (nullable = true)
 |    |    |    |-- is_mocked_location: boolean (nullable = true)
 |    |    |    |-- locationSource: string (nullable = true)
 |    |    |    |-- location_source: string (nullable = true)
 |    |    |    |-- orientation: integer (nullable = true)
 |    |    |    |-- os: string (nullable = true)
 |    |    |    |-- osVersion: string (nullable = true)
 |    |    |    |-- provider_id: string (nullable = true)
 |    |    |-- deviceId: string (nullable = true)
 |    |    |-- lat: double (nullable = true)
 |    |    |-- lng: double (nullable = true)
 |    |    |-- ls: string (nullable = true)
 |    |    |-- ts: string (nullable = true)
 |    |-- p2: struct (nullable = true)
 |    |    |-- attributes: struct (nullable = true)
 |    |    |    |-- External_Location_Source: string (nullable = true)
 |    |    |    |-- appPackage: string (nullable = true)
 |    |    |    |-- appVersion: string (nullable = true)
 |    |    |    |-- battery: integer (nullable = true)
 |    |    |    |-- external_api_id: integer (nullable = true)
 |    |    |    |-- is_charging: integer (nullable = true)
 |    |    |    |-- is_mocked_location: boolean (nullable = true)
 |    |    |    |-- locationSource: string (nullable = true)
 |    |    |    |-- location_source: string (nullable = true)
 |    |    |    |-- orientation: integer (nullable = true)
 |    |    |    |-- os: string (nullable = true)
 |    |    |    |-- osVersion: string (nullable = true)
 |    |    |    |-- provider_id: string (nullable = true)
 |    |    |-- deviceId: string (nullable = true)
 |    |    |-- lat: double (nullable = true)
 |    |    |-- lng: double (nullable = true)
 |    |    |-- ls: string (nullable = true)
 |    |    |-- ts: string (nullable = true)
 |    |-- p3: struct (nullable = true)
 |    |    |-- attributes: struct (nullable = true)
 |    |    |    |-- External_Location_Source: string (nullable = true)
 |    |    |    |-- appPackage: string (nullable = true)
 |    |    |    |-- appVersion: string (nullable = true)
 |    |    |    |-- battery: integer (nullable = true)
 |    |    |    |-- external_api_id: integer (nullable = true)
 |    |    |    |-- is_charging: integer (nullable = true)
 |    |    |    |-- is_mocked_location: boolean (nullable = true)
 |    |    |    |-- locationSource: string (nullable = true)
 |    |    |    |-- location_source: string (nullable = true)
 |    |    |    |-- orientation: integer (nullable = true)
 |    |    |    |-- os: string (nullable = true)
 |    |    |    |-- osVersion: string (nullable = true)
 |    |    |    |-- provider_id: string (nullable = true)
 |    |    |-- deviceId: string (nullable = true)
 |    |    |-- lat: double (nullable = true)
 |    |    |-- lng: double (nullable = true)
 |    |    |-- ls: string (nullable = true)
 |    |    |-- ts: string (nullable = true)
 |    |-- p4: struct (nullable = true)
 |    |    |-- attributes: struct (nullable = true)
 |    |    |    |-- External_Location_Source: string (nullable = true)
 |    |    |    |-- external_api_id: integer (nullable = true)
 |    |    |    |-- locationSource: string (nullable = true)
 |    |    |    |-- location_source: string (nullable = true)
 |    |    |    |-- orientation: integer (nullable = true)
 |    |    |    |-- provider_id: string (nullable = true)
 |    |    |-- deviceId: string (nullable = true)
 |    |    |-- lat: double (nullable = true)
 |    |    |-- lng: double (nullable = true)
 |    |    |-- ls: string (nullable = true)
 |    |    |-- ts: string (nullable = true)
 |    |-- placeInfo: struct (nullable = true)
 |    |    |-- lat: double (nullable = true)
 |    |    |-- lng: double (nullable = true)
 |    |    |-- name: string (nullable = true)
 |    |    |-- geofenceType: long (nullable = true)
 |    |    |-- boundaryData: double (nullable = true)
 |    |    |-- placeType: long (nullable = true)
 |    |    |-- thresholds: struct (nullable = true)
 |    |    |    |-- boundaryTime: double (nullable = true)
 |    |    |    |-- boundaryDistance: double (nullable = true)
 |    |    |    |-- tatMargin: double (nullable = true)
 |    |-- tat: double (nullable = true)
 |    |-- trackingStatus: boolean (nullable = true)
 |-- distanceTravelled: double (nullable = true)
 |-- firstPingTime: string (nullable = true)
 |-- lastKnownAddress: string (nullable = true)
 |-- lastMileEntered: boolean (nullable = true)
 |-- lastMileThreshold: double (nullable = true)
 |-- lastPing: struct (nullable = true)
 |    |-- attributes: struct (nullable = true)
 |    |    |-- External_Location_Source: string (nullable = true)
 |    |    |-- appPackage: string (nullable = true)
 |    |    |-- appVersion: string (nullable = true)
 |    |    |-- battery: integer (nullable = true)
 |    |    |-- external_api_id: integer (nullable = true)
 |    |    |-- is_charging: integer (nullable = true)
 |    |    |-- is_mocked_location: boolean (nullable = true)
 |    |    |-- locationSource: string (nullable = true)
 |    |    |-- location_source: string (nullable = true)
 |    |    |-- orientation: integer (nullable = true)
 |    |    |-- os: string (nullable = true)
 |    |    |-- osVersion: string (nullable = true)
 |    |    |-- provider_id: string (nullable = true)
 |    |-- averageSpeed: double (nullable = true)
 |    |-- currentState: string (nullable = true)
 |    |-- deviceId: string (nullable = true)
 |    |-- distanceFromDestination: double (nullable = true)
 |    |-- distanceFromLastPing: double (nullable = true)
 |    |-- distanceFromOrigin: double (nullable = true)
 |    |-- lat: double (nullable = true)
 |    |-- lng: double (nullable = true)
 |    |-- ls: string (nullable = true)
 |    |-- timeFromLastPing: double (nullable = true)
 |    |-- ts: string (nullable = true)
 |-- lastPingTime: string (nullable = true)
 |-- lastUpdated: timestamp (nullable = true)
 |-- manualPositionCount: long (nullable = true)
 |-- origin: struct (nullable = true)
 |    |-- TAT: double (nullable = true)
 |    |-- currentState: string (nullable = true)
 |    |-- d1: double (nullable = true)
 |    |-- d2: double (nullable = true)
 |    |-- d3: double (nullable = true)
 |    |-- d4: double (nullable = true)
 |    |-- distance: double (nullable = true)
 |    |-- entryTime: string (nullable = true)
 |    |-- exitTime: string (nullable = true)
 |    |-- hasMultipleEntry: boolean (nullable = true)
 |    |-- hasMultipleExit: boolean (nullable = true)
 |    |-- latitude: double (nullable = true)
 |    |-- longitude: double (nullable = true)
 |    |-- multipleEntryTimestamp: array (nullable = true)
 |    |    |-- element: struct (containsNull = true)
 |    |    |    |-- ENTRY: timestamp (nullable = true)
 |    |    |    |-- EXIT: timestamp (nullable = true)
 |    |-- multipleGeofence: array (nullable = true)
 |    |    |-- element: struct (containsNull = true)
 |    |    |    |-- code: string (nullable = true)
 |    |    |    |-- entryTime: timestamp (nullable = true)
 |    |    |    |-- exitTime: timestamp (nullable = true)
 |    |    |    |-- radius: double (nullable = true)
 |    |    |    |-- shape: string (nullable = true)
 |    |    |    |-- type: string (nullable = true)
 |    |-- noOfEntries: integer (nullable = true)
 |    |-- noOfExits: integer (nullable = true)
 |    |-- p1: struct (nullable = true)
 |    |    |-- attributes: struct (nullable = true)
 |    |    |    |-- External_Location_Source: string (nullable = true)
 |    |    |    |-- external_api_id: integer (nullable = true)
 |    |    |    |-- locationSource: string (nullable = true)
 |    |    |    |-- location_source: string (nullable = true)
 |    |    |    |-- orientation: integer (nullable = true)
 |    |    |    |-- provider_id: string (nullable = true)
 |    |    |-- deviceId: string (nullable = true)
 |    |    |-- lat: double (nullable = true)
 |    |    |-- lng: double (nullable = true)
 |    |    |-- ls: string (nullable = true)
 |    |    |-- ts: string (nullable = true)
 |    |-- p2: struct (nullable = true)
 |    |    |-- attributes: struct (nullable = true)
 |    |    |    |-- External_Location_Source: string (nullable = true)
 |    |    |    |-- appPackage: string (nullable = true)
 |    |    |    |-- appVersion: string (nullable = true)
 |    |    |    |-- battery: integer (nullable = true)
 |    |    |    |-- external_api_id: integer (nullable = true)
 |    |    |    |-- is_charging: integer (nullable = true)
 |    |    |    |-- is_mocked_location: boolean (nullable = true)
 |    |    |    |-- locationSource: string (nullable = true)
 |    |    |    |-- location_source: string (nullable = true)
 |    |    |    |-- orientation: integer (nullable = true)
 |    |    |    |-- os: string (nullable = true)
 |    |    |    |-- osVersion: string (nullable = true)
 |    |    |    |-- provider_id: string (nullable = true)
 |    |    |-- deviceId: string (nullable = true)
 |    |    |-- lat: double (nullable = true)
 |    |    |-- lng: double (nullable = true)
 |    |    |-- ls: string (nullable = true)
 |    |    |-- ts: string (nullable = true)
 |    |-- p3: struct (nullable = true)
 |    |    |-- attributes: struct (nullable = true)
 |    |    |    |-- External_Location_Source: string (nullable = true)
 |    |    |    |-- appPackage: string (nullable = true)
 |    |    |    |-- appVersion: string (nullable = true)
 |    |    |    |-- battery: integer (nullable = true)
 |    |    |    |-- external_api_id: integer (nullable = true)
 |    |    |    |-- is_charging: integer (nullable = true)
 |    |    |    |-- is_mocked_location: boolean (nullable = true)
 |    |    |    |-- locationSource: string (nullable = true)
 |    |    |    |-- location_source: string (nullable = true)
 |    |    |    |-- orientation: integer (nullable = true)
 |    |    |    |-- os: string (nullable = true)
 |    |    |    |-- osVersion: string (nullable = true)
 |    |    |    |-- provider_id: string (nullable = true)
 |    |    |-- deviceId: string (nullable = true)
 |    |    |-- lat: double (nullable = true)
 |    |    |-- lng: double (nullable = true)
 |    |    |-- ls: string (nullable = true)
 |    |    |-- ts: string (nullable = true)
 |    |-- p4: struct (nullable = true)
 |    |    |-- attributes: struct (nullable = true)
 |    |    |    |-- External_Location_Source: string (nullable = true)
 |    |    |    |-- appPackage: string (nullable = true)
 |    |    |    |-- appVersion: string (nullable = true)
 |    |    |    |-- battery: integer (nullable = true)
 |    |    |    |-- external_api_id: integer (nullable = true)
 |    |    |    |-- is_charging: integer (nullable = true)
 |    |    |    |-- is_mocked_location: boolean (nullable = true)
 |    |    |    |-- locationSource: string (nullable = true)
 |    |    |    |-- location_source: string (nullable = true)
 |    |    |    |-- orientation: integer (nullable = true)
 |    |    |    |-- os: string (nullable = true)
 |    |    |    |-- osVersion: string (nullable = true)
 |    |    |    |-- provider_id: string (nullable = true)
 |    |    |-- deviceId: string (nullable = true)
 |    |    |-- lat: double (nullable = true)
 |    |    |-- lng: double (nullable = true)
 |    |    |-- ls: string (nullable = true)
 |    |    |-- ts: string (nullable = true)
 |    |-- placeInfo: struct (nullable = true)
 |    |    |-- lat: double (nullable = true)
 |    |    |-- lng: double (nullable = true)
 |    |    |-- name: string (nullable = true)
 |    |    |-- geofenceType: long (nullable = true)
 |    |    |-- boundaryData: double (nullable = true)
 |    |    |-- placeType: long (nullable = true)
 |    |    |-- thresholds: struct (nullable = true)
 |    |    |    |-- boundaryTime: double (nullable = true)
 |    |    |    |-- boundaryDistance: double (nullable = true)
 |    |    |    |-- tatMargin: double (nullable = true)
 |    |-- tat: double (nullable = true)
 |    |-- trackingStatus: boolean (nullable = true)
 |-- originDestinationTransitDistance: double (nullable = true)
 |-- originDestinationTransitTime: double (nullable = true)
 |-- positionCount: long (nullable = true)
 |-- runningTime: double (nullable = true)
 |-- speedTimeDistribution: struct (nullable = true)
 |    |-- 0-20: double (nullable = true)
 |    |-- 20-40: double (nullable = true)
 |    |-- 40-60: double (nullable = true)
 |    |-- moreThan60: double (nullable = true)
 |-- stoppageTime: double (nullable = true)
 |-- totalDistance: double (nullable = true)
 |-- totalIntervals: integer (nullable = true)
 |-- totalTrackedIntervals: integer (nullable = true)
 |-- trackingHealth: double (nullable = true)
 |-- trackingStatus: string (nullable = true)
 |-- transitDuration: double (nullable = true)
 |-- transitPing: struct (nullable = true)
 |    |-- firstTransitPing: struct (nullable = true)
 |    |    |-- attributes: struct (nullable = true)
 |    |    |    |-- External_Location_Source: string (nullable = true)
 |    |    |    |-- appPackage: string (nullable = true)
 |    |    |    |-- appVersion: string (nullable = true)
 |    |    |    |-- battery: integer (nullable = true)
 |    |    |    |-- external_api_id: integer (nullable = true)
 |    |    |    |-- is_charging: integer (nullable = true)
 |    |    |    |-- is_mocked_location: boolean (nullable = true)
 |    |    |    |-- locationSource: string (nullable = true)
 |    |    |    |-- location_source: string (nullable = true)
 |    |    |    |-- orientation: integer (nullable = true)
 |    |    |    |-- os: string (nullable = true)
 |    |    |    |-- osVersion: string (nullable = true)
 |    |    |    |-- provider_id: string (nullable = true)
 |    |    |-- deviceId: string (nullable = true)
 |    |    |-- lat: double (nullable = true)
 |    |    |-- lng: double (nullable = true)
 |    |    |-- ls: string (nullable = true)
 |    |    |-- ts: string (nullable = true)
 |    |-- lastTransitPing: struct (nullable = true)
 |    |    |-- attributes: struct (nullable = true)
 |    |    |    |-- External_Location_Source: string (nullable = true)
 |    |    |    |-- appPackage: string (nullable = true)
 |    |    |    |-- appVersion: string (nullable = true)
 |    |    |    |-- battery: integer (nullable = true)
 |    |    |    |-- external_api_id: integer (nullable = true)
 |    |    |    |-- is_charging: integer (nullable = true)
 |    |    |    |-- is_mocked_location: boolean (nullable = true)
 |    |    |    |-- locationSource: string (nullable = true)
 |    |    |    |-- location_source: string (nullable = true)
 |    |    |    |-- orientation: integer (nullable = true)
 |    |    |    |-- os: string (nullable = true)
 |    |    |    |-- osVersion: string (nullable = true)
 |    |    |    |-- provider_id: string (nullable = true)
 |    |    |-- deviceId: string (nullable = true)
 |    |    |-- lat: double (nullable = true)
 |    |    |-- lng: double (nullable = true)
 |    |    |-- ls: string (nullable = true)
 |    |    |-- ts: string (nullable = true)
 |-- transitTime: double (nullable = true)
 |-- tripId: long (nullable = true)
 |-- tripStatus: integer (nullable = true)
 |-- tripTrackedAtDestination: boolean (nullable = true)
 |-- tripTrackedAtOrigin: boolean (nullable = true)
 |-- tripTrackedDuringTransit: boolean (nullable = true)
 |-- wayPoint: array (nullable = true)
 |    |-- element: struct (containsNull = true)
 |    |    |-- TAT: double (nullable = true)
 |    |    |-- currentState: string (nullable = true)
 |    |    |-- d1: double (nullable = true)
 |    |    |-- d2: double (nullable = true)
 |    |    |-- d3: double (nullable = true)
 |    |    |-- d4: double (nullable = true)
 |    |    |-- distance: double (nullable = true)
 |    |    |-- entryTime: string (nullable = true)
 |    |    |-- exitTime: string (nullable = true)
 |    |    |-- hasMultipleEntry: boolean (nullable = true)
 |    |    |-- hasMultipleExit: boolean (nullable = true)
 |    |    |-- latitude: double (nullable = true)
 |    |    |-- longitude: double (nullable = true)
 |    |    |-- multipleEntryTimestamp: array (nullable = true)
 |    |    |    |-- element: struct (containsNull = true)
 |    |    |    |    |-- ENTRY: timestamp (nullable = true)
 |    |    |    |    |-- EXIT: timestamp (nullable = true)
 |    |    |-- noOfEntries: integer (nullable = true)
 |    |    |-- noOfExits: integer (nullable = true)
 |    |    |-- p1: struct (nullable = true)
 |    |    |    |-- attributes: struct (nullable = true)
 |    |    |    |    |-- appPackage: string (nullable = true)
 |    |    |    |    |-- appVersion: string (nullable = true)
 |    |    |    |    |-- battery: integer (nullable = true)
 |    |    |    |    |-- is_charging: integer (nullable = true)
 |    |    |    |    |-- is_mocked_location: boolean (nullable = true)
 |    |    |    |    |-- locationSource: string (nullable = true)
 |    |    |    |    |-- location_source: string (nullable = true)
 |    |    |    |    |-- os: string (nullable = true)
 |    |    |    |    |-- osVersion: string (nullable = true)
 |    |    |    |    |-- provider_id: string (nullable = true)
 |    |    |    |-- deviceId: string (nullable = true)
 |    |    |    |-- lat: double (nullable = true)
 |    |    |    |-- lng: double (nullable = true)
 |    |    |    |-- ls: string (nullable = true)
 |    |    |    |-- ts: string (nullable = true)
 |    |    |-- p2: struct (nullable = true)
 |    |    |    |-- attributes: struct (nullable = true)
 |    |    |    |    |-- appPackage: string (nullable = true)
 |    |    |    |    |-- appVersion: string (nullable = true)
 |    |    |    |    |-- battery: integer (nullable = true)
 |    |    |    |    |-- is_charging: integer (nullable = true)
 |    |    |    |    |-- is_mocked_location: boolean (nullable = true)
 |    |    |    |    |-- locationSource: string (nullable = true)
 |    |    |    |    |-- location_source: string (nullable = true)
 |    |    |    |    |-- os: string (nullable = true)
 |    |    |    |    |-- osVersion: string (nullable = true)
 |    |    |    |    |-- provider_id: string (nullable = true)
 |    |    |    |-- deviceId: string (nullable = true)
 |    |    |    |-- lat: double (nullable = true)
 |    |    |    |-- lng: double (nullable = true)
 |    |    |    |-- ls: string (nullable = true)
 |    |    |    |-- ts: string (nullable = true)
 |    |    |-- p3: struct (nullable = true)
 |    |    |    |-- attributes: struct (nullable = true)
 |    |    |    |    |-- appPackage: string (nullable = true)
 |    |    |    |    |-- appVersion: string (nullable = true)
 |    |    |    |    |-- battery: integer (nullable = true)
 |    |    |    |    |-- is_charging: integer (nullable = true)
 |    |    |    |    |-- is_mocked_location: boolean (nullable = true)
 |    |    |    |    |-- locationSource: string (nullable = true)
 |    |    |    |    |-- location_source: string (nullable = true)
 |    |    |    |    |-- os: string (nullable = true)
 |    |    |    |    |-- osVersion: string (nullable = true)
 |    |    |    |    |-- provider_id: string (nullable = true)
 |    |    |    |-- deviceId: string (nullable = true)
 |    |    |    |-- lat: double (nullable = true)
 |    |    |    |-- lng: double (nullable = true)
 |    |    |    |-- ls: string (nullable = true)
 |    |    |    |-- ts: string (nullable = true)
 |    |    |-- p4: struct (nullable = true)
 |    |    |    |-- attributes: struct (nullable = true)
 |    |    |    |    |-- appPackage: string (nullable = true)
 |    |    |    |    |-- appVersion: string (nullable = true)
 |    |    |    |    |-- battery: integer (nullable = true)
 |    |    |    |    |-- is_charging: integer (nullable = true)
 |    |    |    |    |-- is_mocked_location: boolean (nullable = true)
 |    |    |    |    |-- locationSource: string (nullable = true)
 |    |    |    |    |-- location_source: string (nullable = true)
 |    |    |    |    |-- os: string (nullable = true)
 |    |    |    |    |-- osVersion: string (nullable = true)
 |    |    |    |    |-- provider_id: string (nullable = true)
 |    |    |    |-- deviceId: string (nullable = true)
 |    |    |    |-- lat: double (nullable = true)
 |    |    |    |-- lng: double (nullable = true)
 |    |    |    |-- ls: string (nullable = true)
 |    |    |    |-- ts: string (nullable = true)
 |    |    |-- placeInfo: struct (nullable = true)
 |    |    |    |-- lat: double (nullable = true)
 |    |    |    |-- lng: double (nullable = true)
 |    |    |    |-- name: string (nullable = true)
 |    |    |    |-- geofenceType: long (nullable = true)
 |    |    |    |-- boundaryData: double (nullable = true)
 |    |    |    |-- placeType: long (nullable = true)
 |    |    |    |-- thresholds: struct (nullable = true)
 |    |    |    |    |-- boundaryTime: double (nullable = true)
 |    |    |    |    |-- boundaryDistance: double (nullable = true)
 |    |    |    |    |-- tatMargin: double (nullable = true)
 |    |    |-- tat: double (nullable = true)
 |    |    |-- trackingStatus: boolean (nullable = true)

1 REPLY

Kaniz
Community Manager

Hi @Divyanshu,

The error message "org.apache.spark.SparkException: Job aborted due to stage failure: Task 2 in stage 12.0 failed 4 times, most recent failure: Lost task 2.3 in stage 12.0 (TID 53) (10.2.64.17 executor 0): org.apache.spark.SparkRuntimeException: Error while encoding: java.lang.ArithmeticException: long overflow" indicates a possible issue with the data types of the columns being written to the Databricks table.


Consider the following:


 - **Check column data types**: Make sure each column's data type is one that Spark supports and that it matches the values the column holds. For example, problems can arise if a MongoDB string is written as a varchar in Databricks, because varchar can be used only in a table schema, not in functions or operators.


 - **Check for data overflow**: The error is an arithmetic overflow, which occurs when a value is too large for its data type. The message "long overflow" is what Java's exact-arithmetic methods (such as Math.multiplyExact) report when a result does not fit in a 64-bit long. Narrower types can overflow in the same way: casting a long to an integer can fail if the value exceeds the integer's range.


 - **Check data size**: A job can also be aborted with a stage failure when the total size of serialized task results exceeds spark.driver.maxResultSize. That failure names the size limit in its message rather than a long overflow, so increase spark.driver.maxResultSize, sized to the data you fetch, only if you see that message.


Always validate data types and sizes before writing them into a Databricks table.
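
As a rough illustration of that kind of validation, here is a minimal sketch in plain Python that walks documents already fetched with pymongo (ordinary nested dicts) and reports integer values that would not fit in a 32-bit integer or a 64-bit long. The function name `find_integer_overflows` and the sample values are made up for this example; only the field names are borrowed from the schema in the question. It checks integer ranges only, so it will not catch every possible cause of the error.

```python
from typing import Any, Iterator, Tuple

# Ranges of Spark's IntegerType (32-bit) and LongType (64-bit).
INT32_MIN, INT32_MAX = -2**31, 2**31 - 1
INT64_MIN, INT64_MAX = -2**63, 2**63 - 1


def find_integer_overflows(doc: Any, path: str = "") -> Iterator[Tuple[str, int, str]]:
    """Yield (path, value, problem) for ints outside the 32- or 64-bit range."""
    if isinstance(doc, dict):
        for key, value in doc.items():
            child = f"{path}.{key}" if path else str(key)
            yield from find_integer_overflows(value, child)
    elif isinstance(doc, (list, tuple)):
        for i, value in enumerate(doc):
            yield from find_integer_overflows(value, f"{path}[{i}]")
    elif isinstance(doc, bool):
        return  # bool is a subclass of int in Python, so skip it explicitly
    elif isinstance(doc, int):
        if not INT64_MIN <= doc <= INT64_MAX:
            yield (path, doc, "exceeds 64-bit long")
        elif not INT32_MIN <= doc <= INT32_MAX:
            yield (path, doc, "exceeds 32-bit integer")


# Invented sample documents; the values are not taken from the real collection.
sample_docs = [
    {"tat": 1.5, "placeInfo": {"geofenceType": 2, "placeType": 2**40}},
    {"p1": {"attributes": {"battery": 2**63, "is_mocked_location": True}}},
]

report = [hit for d in sample_docs for hit in find_integer_overflows(d)]
for path, value, problem in report:
    print(f"{path}: {value} ({problem})")
```

A path flagged as exceeding the 32-bit range only matters if the target column is declared as integer; declaring it as long in the schema passed to Spark would be one way to handle it.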
