What is a drawback of performing data cleansing (imputation, transformations, etc.) on the raw data before partitioning it for honest assessment, as opposed to performing the data cleansing after partitioning?
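As a small illustration of the issue behind this question, the sketch below compares mean imputation done before and after partitioning. The toy values, the 4/2 split, and the helper name `impute_mean` are assumptions made for illustration only; they are not part of the question.

```python
from statistics import mean


def impute_mean(values, fill):
    """Replace missing entries (None) with the supplied fill value."""
    return [fill if v is None else v for v in values]


# Hypothetical toy data: the last two rows act as the holdout partition.
raw = [10.0, None, 14.0, 12.0, None, 100.0]
train, holdout = raw[:4], raw[4:]

# Cleansing before partitioning: the fill value is computed from all rows,
# so the holdout value 100.0 leaks into the training data.
fill_all = mean(v for v in raw if v is not None)

# Cleansing after partitioning: the fill value is learned from the training
# rows only and then applied unchanged to the holdout rows.
fill_train = mean(v for v in train if v is not None)

train_leaky = impute_mean(train, fill_all)
train_honest = impute_mean(train, fill_train)
holdout_honest = impute_mean(holdout, fill_train)

print(fill_all, fill_train)
print(train_leaky)
print(train_honest, holdout_honest)
```

In this sketch the fill value computed from all rows is pulled toward the extreme holdout observation, so the training data has already "seen" the holdout data and the later assessment is likely to be optimistic.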
A company has branch offices in eight regions. Customers within each region are classified as either "High Value" or "Medium Value" and are coded using the variable name VALUE. The total amount of purchases per customer in the last year is used as the response variable.
Suppose there is a significant interaction between REGION and VALUE. What can you conclude?
Given the following output from the LOGISTIC procedure:
Which variables, among those that are statistically significant at an alpha of 0.05, have the greatest and least relative importance on the fitted model?
Refer to the lift chart:
At a depth of 0.1, Lift = 3.14. What does this mean?
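For reference, lift at a given depth is commonly computed as the event rate among the top-scoring fraction of cases divided by the overall event rate. The sketch below shows that calculation on made-up data; the function name `lift_at_depth`, the scores, and the outcomes are illustrative assumptions and do not reproduce the chart in the question.

```python
def lift_at_depth(scores, outcomes, depth):
    """Event rate in the top `depth` fraction of cases, ranked by score,
    divided by the overall event rate."""
    ranked = sorted(zip(scores, outcomes), key=lambda pair: pair[0], reverse=True)
    n_top = max(1, round(len(ranked) * depth))
    top_rate = sum(y for _, y in ranked[:n_top]) / n_top
    overall_rate = sum(outcomes) / len(outcomes)
    return top_rate / overall_rate


# Hypothetical data: 20 cases with distinct scores, 5 events overall.
scores = [i / 20 for i in range(20, 0, -1)]
outcomes = [1, 0, 1, 1, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]

lift = lift_at_depth(scores, outcomes, depth=0.1)
print(lift)
```

Under this definition, a lift of 3.14 at a depth of 0.1 would mean that the top 10% of cases ranked by the model contain events at 3.14 times the overall rate.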
The total modeling data has been split into training, validation, and test data.
What is the best data to use for model assessment?
This question will ask you to provide a missing option. Given the following SAS program:
What option must be added to the program to obtain a data set containing Pearson statistics?
Which of the following describes a concordant pair of observations in the LOGISTIC procedure?
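The usual definition can be sketched as follows: every event observation is paired with every non-event observation, and the pair is concordant when the event observation has the higher predicted event probability, discordant when it has the lower one, and tied otherwise. The function name and the data below are illustrative assumptions, and the sketch compares raw probabilities without any of the binning that a particular software implementation may apply.

```python
def classify_pairs(probs, outcomes):
    """Count concordant, discordant, and tied (event, non-event) pairs."""
    events = [p for p, y in zip(probs, outcomes) if y == 1]
    nonevents = [p for p, y in zip(probs, outcomes) if y == 0]
    concordant = discordant = tied = 0
    for p_event in events:
        for p_nonevent in nonevents:
            if p_event > p_nonevent:
                concordant += 1
            elif p_event < p_nonevent:
                discordant += 1
            else:
                tied += 1
    return concordant, discordant, tied


# Hypothetical predicted event probabilities and observed outcomes (1 = event).
probs = [0.9, 0.7, 0.4, 0.4, 0.2]
outcomes = [1, 0, 1, 0, 0]

concordant, discordant, tied = classify_pairs(probs, outcomes)
total_pairs = concordant + discordant + tied

# The c statistic counts each tied pair as half a concordant pair.
c_statistic = (concordant + 0.5 * tied) / total_pairs
print(concordant, discordant, tied, c_statistic)
```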
A researcher is planning a logistic regression to model the probability of disease occurrence. The researcher determines that the rate of disease occurrence in the population is 1%.
For which of the following would this study be a candidate?
Drag the adjustment formulas for oversampling from the left and place them into the correct location in the confusion matrix shown on the right.
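One common way to adjust a confusion matrix for oversampling is to combine the sensitivity and specificity measured on the oversampled data with the population priors. The sketch below assumes that approach; the function name, the counts, and the 1% prior are hypothetical values chosen for illustration, and the original exhibit is not reproduced here.

```python
def adjust_confusion_matrix(tp, fn, fp, tn, pi1):
    """Rescale confusion-matrix cells to population proportions.

    tp, fn, fp, tn: counts observed on the oversampled data.
    pi1: assumed population proportion of events (pi0 = 1 - pi1).
    """
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    pi0 = 1 - pi1
    return {
        "TP": pi1 * sensitivity,
        "FN": pi1 * (1 - sensitivity),
        "FP": pi0 * (1 - specificity),
        "TN": pi0 * specificity,
    }


# Hypothetical oversampled counts (50% events) and a 1% population event rate.
adjusted = adjust_confusion_matrix(tp=80, fn=20, fp=10, tn=90, pi1=0.01)
for cell, proportion in adjusted.items():
    print(cell, round(proportion, 4))
```

The adjusted cells are proportions of the population rather than counts, so they sum to 1.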
A researcher has several variables that could be possible predictors for the final model. There is interest in checking all 2-way interactions for possible entry to the model. The researcher has decided to use forward selection within PROC LOGISTIC. Fill in the missing code option that will ensure that all 2-way interactions will be considered for entry.