Using AutoML to predict completion dates of a project management dataset

User100024
New Contributor II

Hello! I am fairly new to Databricks. I'm trying to do a proof of concept with AutoML in Databricks at my organization, and the dataset I am using is a project management dataset. Here's a sample:


| project_id | market | general_contractor | project_type | permit_date | permit_status | construction_date | construction_status | completion_date | completion_status |
|---|---|---|---|---|---|---|---|---|---|
| project_1 | NY | acme inc | rehab | 2/1/2024 | complete | 3/1/2024 | projected | 4/1/2024 | projected |
| project_2 | LA | xyz inc | build to suit | 1/1/2020 | complete | 2/2/2023 | complete | 3/4/2023 | complete |

Based on this dataset, I want to see how I can shorten the time it takes to reach completion_date. For example, if I use acme inc in LA, will that bring my completion date forward, and if so, by how much? Or, if I move my permit_date 2 days earlier, how big an impact will that have on completion_date? Of course, I can only rely on historical data, so all the status fields must be set to "complete".
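To make the goal concrete, here is a minimal sketch of how the historical rows might be prepared. It assumes the dates are in M/D/YYYY format (as in the sample) and that the quantity of interest can be expressed as a number of days counted from permit_date. The names `build_training_rows`, `permit_to_construction_days`, and `permit_to_completion_days` are hypothetical and not part of the dataset.

```python
from datetime import datetime

DATE_FORMAT = "%m/%d/%Y"  # assumption: dates look like 2/1/2024, as in the sample
STATUS_COLUMNS = ("permit_status", "construction_status", "completion_status")


def days_between(start: str, end: str) -> int:
    """Number of whole days from the start date to the end date."""
    delta = datetime.strptime(end, DATE_FORMAT) - datetime.strptime(start, DATE_FORMAT)
    return delta.days


def build_training_rows(rows):
    """Keep only fully completed projects and derive numeric durations.

    Hypothetical helper: the two *_days columns are illustrative names.
    """
    training = []
    for row in rows:
        # Rows with any projected (not yet complete) milestone are skipped.
        if any(row[col] != "complete" for col in STATUS_COLUMNS):
            continue
        training.append({
            "project_id": row["project_id"],
            "market": row["market"],
            "general_contractor": row["general_contractor"],
            "project_type": row["project_type"],
            "permit_to_construction_days": days_between(
                row["permit_date"], row["construction_date"]),
            "permit_to_completion_days": days_between(
                row["permit_date"], row["completion_date"]),
        })
    return training


# The two sample rows from the table above.
sample = [
    {"project_id": "project_1", "market": "NY", "general_contractor": "acme inc",
     "project_type": "rehab", "permit_date": "2/1/2024", "permit_status": "complete",
     "construction_date": "3/1/2024", "construction_status": "projected",
     "completion_date": "4/1/2024", "completion_status": "projected"},
    {"project_id": "project_2", "market": "LA", "general_contractor": "xyz inc",
     "project_type": "build to suit", "permit_date": "1/1/2020", "permit_status": "complete",
     "construction_date": "2/2/2023", "construction_status": "complete",
     "completion_date": "3/4/2023", "completion_status": "complete"},
]

training_rows = build_training_rows(sample)
```

With the sample above, only project_2 survives the filter, because project_1 still has projected milestones. A numeric duration like this is one possible way to phrase the prediction target; whether it is the right framing for the proof of concept is an open question.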

How do I go about doing this? Also, is there a way to output the results so that stakeholders can analyze them in a visual tool like Tableau or Power BI?

Thanks!


Hello Kaniz, thank you so much for your reply! I am trying to follow your steps, but I cannot select completion_date as my target. The dropdown only shows the SYS_ID columns (which are numeric):

DBricks1.jpg

What should I add to "Time Column for Training/Validation/Testing Split"? Is that where completion_date goes?

One more question if you don't mind. On the right side, it lists all the columns that I have available with an "Impute with" function:

DBricks2.jpg

Is this where I select which columns I need in my dataset? I am not sure what "Impute with" means here.

I appreciate all your help. Thank you so much 🙂 🙂