Answer the following questions:
- Which of the variables listed above (second paragraph) would you use to define success of sales employees in order to develop your model? (10 points)
- The choice of independent variables for your model has to be based on their power to predict the dependent variable AND their availability in job candidates’ resumes that you intend to screen. Which of the variables listed above (second paragraph) would you test as predictors of success in your model? (10 points)
- Which parameter of the model, i.e., sensitivity, specificity, precision, or accuracy, describes the ability of your model to identify candidates who are actually qualified? Which parameter of the model describes the ability of your model to identify candidates who are actually unqualified? Which parameter describes the ability of your model to correctly predict unknown candidates as being qualified? (5 points)
- What are the false positive rate and the false negative rate? Describe what false positive and false negative mean. (10 points)
- If your goal is to develop a model to screen resumes and identify candidates to be invited for an interview, which type of error is worse – false positive or false negative? Explain the rationale for your answer. (10 points)
- If you want to improve the performance of your model to identify candidates to be invited for an interview, which parameter (sensitivity, specificity, precision, accuracy) would you use to guide the selection of candidates? Explain your rationale based on what you are trying to achieve with your predictions. (10 points)
- If your goal is to develop a model to identify candidates who will receive a job offer, which type of error is worse – false positive or false negative? Explain the rationale for your answer. (10 points)
- If you want to improve the performance of your model to identify candidates to receive a job offer, which parameter would you use? Explain your rationale based on what you are trying to achieve with your predictions. (10 points)
- Fast forward one year. The company deployed the model that you developed with the 3 years of data on current employees, and people were hired based on your predictions. The Head of HR has now come back to you with a concern that not all of the new hires were "good". Twelve of the 100 people hired were not qualified and did not work out. What parameter in the confusion matrix would you use to understand whether your model worked better than you expected, as well as you expected, or worse than you expected? How well did the model work? Do you agree that accuracy is not the best parameter to use? If so, why not? Explain the rationale for your answers. (15 points)
- Describe any limitations and/or concerns associated with your approach for this new business opportunity. (10 points)
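For reference, the parameters named in the questions above are all computed from the four cells of a confusion matrix. The following is a minimal sketch using the standard textbook definitions. The function name `confusion_metrics` and the counts passed to it are hypothetical and chosen only for illustration; they are not taken from the case data.

```python
def confusion_metrics(tp, fp, tn, fn):
    """Compute standard metrics from confusion-matrix counts.

    tp, fp, tn, fn are the numbers of true positives, false positives,
    true negatives, and false negatives (hypothetical inputs).
    """
    return {
        "sensitivity": tp / (tp + fn),            # TP / (TP + FN)
        "specificity": tn / (tn + fp),            # TN / (TN + FP)
        "precision": tp / (tp + fp),              # TP / (TP + FP)
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "false_positive_rate": fp / (fp + tn),    # equals 1 - specificity
        "false_negative_rate": fn / (fn + tp),    # equals 1 - sensitivity
    }


# Hypothetical counts, for illustration only.
metrics = confusion_metrics(tp=40, fp=10, tn=45, fn=5)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Note that the sketch assumes every denominator is nonzero; a real evaluation would need to handle an empty row or column of the matrix.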
Assessment
Point values for each question are stated above. Each question will be evaluated based on the following criteria:
- Your ability to derive insights from the data provided.
- The clarity of your logic, and the accuracy of your answers.