cancel
Showing results for 
Search instead for 
Did you mean: 
Get Started Discussions
Start your journey with Databricks by joining discussions on getting started guides, tutorials, and introductory topics. Connect with beginners and experts alike to kickstart your Databricks experience.
cancel
Showing results for 
Search instead for 
Did you mean: 

Improve Query Performance

Krubug
New Contributor

Hello

I have a query in one of my notebooks that took around 3.5 hours on D12_V2 cluster and workers between 5 to 25 .

is there a way to write the query in diffrenet way in order to improve performance and cost :

 

select /*+ BROADCAST(b) */ MD5(CONCAT(NVL(client_id, ''), '|', NVL(event_name, ''), '|', NVL(CAST(event_date as STRING),''), '|', NVL(CAST(client_id_type as STRING),''), '|', NVL(CAST(policy_number as STRING), ''))) as event_gk,client_id,client_id_type,event_date,event_name,event_group,event_domain,event_sub_domain,source_system,policy_number,agent_id,event_details,event_status,event_performed_by,event_addressed_to,event_application_date,fund_number,loan_number,cancel_reason,amount_bruto,amount_neto,premia_bruto,anaf,kod_pnia,premia_neto,premia_discount,premia_paid,premia_late,premia_preserved,vif_policy,policy_owner_id,policy_owner_id_type,policy_payer_id,policy_payer_id_type,application_code,application_name,action_code,action_name,page_title,page_info,page_details,page_url,page_variables,page_arrived_from_url,page_action,user_agent,user_ip,agent_user_name,log_id,session_id,instance_guid,flow_guid,addressing_system_code,addressing_system_name,call_center_name,call_center_type,case_number,case_topic,case_sub_topic,case_source,case_status,case_next_step,case_connect_type,media,case_number_of_actions,case_is_recurrent,case_is_justified,case_preservation_details,treatment_number,treatment_type,treatment_description,treatment_status,treatment_transfer,next_treatment_number,next_treatment_days_to,client_type,client_level,client_asiron,call_center_team_id,call_center_emp_id,call_center_emp_id_next_treatment,campaign_code,campaign_name,campaign_objective,channel_name,call_center_rtdm_action,call_center_rtdm_remark,verint_categories,cti_interaction_id,cti_session_id,cti_call_date,cti_interaction_count,cti_direction,cti_ivr_time_sec,cti_wait_time_sec,cti_talk_time_sec,cti_held_time_sec,cti_conference_time_sec,cti_external_time_sec,rtdm_request_comment,rtdm_response_comment,rtdm_result,preservation_tool,treatment_channel,rtdm_process_id,is_last_record_on_process,rtdm_call_center_name,rtdm_call_center_details,document_id,document_type_id,document_type_name,document_create_date,document_receive_date,document_source,request_number,process_name,track_name,department_name,message_id,message_type_id,message_type_name,message_content,claim_number,claim_type,claim_coverage_group,claim_classificaton,claim_specialty,claim_diagnosis,claim_hearing_decision,claim_legal_status,claim_decision_line_number,claim_payment_line_number,claim_line_type,claim_line_coverage_group,claim_line_coverage_name,claim_line_opertion,claim_line_covered_event,claim_line_supplier_type,claim_line_supplier_id,claim_line_decision_type,claim_line_decision_comment,claim_line_required_amount,claim_line_decision_amount,claim_line_waiver_amount,claim_line_expense_amount,claim_line_paid_amount,claim_line_required_percentage,claim_line_decided_percentage,claim_line_canceled,claim_line_storno,claim_sub_decision_line_number,hdl_create_date,hdl_update_date,source_script

from ( select a.* FROM raw_df a LEFT JOIN prc.clients_to_delete b ON a.client_id = b.client_id AND a.client_id_type = b.client_id_type WHERE b.client_id IS NULL )final

 

thank in advanced

Amir

0 REPLIES 0