How do I use test and train datasets in SPSS?

Answer

In all of the code below, replace the blank (_______) with the name of your binary dependent variable. No other changes to the code should be necessary.

 

1. Create Testing and Training Groups

a. Create a random sample indicator.

This will set a seed for the random number generator and create a new variable with random numbers between 0 and 1.

SET SEED=123456. 
COMPUTE random_sample = RV.UNIFORM(0,1).
EXECUTE.

 

b. Select 30% of the cases.

This will create a variable, train_dv, which contains the original value of your dependent variable, then replace that value with a system-missing value (.) if the random number generated in the previous step is 0.3 or less. This should be about 30% of the cases. If you want to hold out a different percentage of cases, change that value. The group of observations with the missing value becomes your "test" set, which is indicated by a 1 in the test variable. Note that any case whose original dependent variable is already missing will also be flagged as test.

COMPUTE train_dv = __________ .
IF (random_sample <= 0.3) train_dv = $SYSMIS.
COMPUTE test = SYSMIS(train_dv).
EXECUTE.
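
Because the split is random, the test set will only be approximately 30% of the cases. As a quick check, a minimal sketch using the test variable created above is to tabulate it and read the percentages:

```spss
* Sketch: check how many cases landed in each group.
* test = 1 marks the held-out test set and test = 0 the training set.
FREQUENCIES VARIABLES=test.
```

If the share is far from what you intended, it may be worth checking for missing values in the original dependent variable, since those cases are also counted as test.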

 

2. Prepare the logistic regression model

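You will re-run your logistic regression with train_dv as the dependent variable instead. Cases with a missing train_dv (the test set) are left out when the model is estimated, but the steps below rely on SPSS still saving predictions for them. Copy the original code and make the following changes: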

  1. Change the dependent variable to train_dv in the first line.
  2. Save the model predictions by adding the line starting with /SAVE. If you already have a /SAVE line, add PRED and PGROUP to it as shown.
LOGISTIC REGRESSION VARIABLES train_dv 
/SAVE=PRED PGROUP COOK ZRESID 
....
Using the menu:
  1. In Analyze > Regression > Binary Logistic, set up the regression you are evaluating with the following additions:
  2. Put train_dv in the Dependent variable box.
  3. Click Save and check Probabilities and Group membership in the Predicted Values group.

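The /SAVE subcommand will create some new variables. On the first run they are named as follows; the numeric suffix increases on later runs (PRE_2, PGR_2, and so on):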

  • PGR_1 : Predicted Group
  • PRE_1 : Predicted Probability
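
Before moving on, it can help to confirm that predictions were actually saved for the test cases as well as the training cases. The following is a minimal sketch, assuming the saved probability is named PRE_1 and the test variable from step 1 exists:

```spss
* Sketch: count the saved probabilities in the training (0) and test (1) groups.
MEANS TABLES=PRE_1 BY test
  /CELLS=COUNT MEAN.
```

A count of zero in the test group would suggest that the predictions were not saved for those cases, for example because of missing values in the predictors.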

 

Testing Model Fit

Select the Testing Data

This will make SPSS use only those cases with a 1 in the test variable.

FILTER BY test.
EXECUTE.
Using the menu:
  1. Go to Data > Select Cases.
  2. Select Use filter variable.
  3. Put the test variable into the box.
  4. Keep Filter out unselected cases selected.

 

Classification Accuracy
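  • Compare the predicted groups with the actual outcomes in the test set.
  • Calculate the percentage of correctly classified cases: add the counts in the cells where the actual and predicted values match, then divide by the total. The row percentages show the share classified correctly within each actual outcome.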
CROSSTABS
  /TABLES=__________ BY PGR_1
  /FORMAT=AVALUE TABLES
  /CELLS=COUNT ROW  
  /COUNT ROUND CELL.

Using the menu:

  1. Go to Analyze > Descriptive Statistics > Crosstabs.
  2. Put the actual outcomes (original DV) in the Row and the predicted outcome/group (PGR_1) in the Column.
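
If you would rather have SPSS report the overall accuracy directly, one possible approach is sketched below. It assumes the test filter is still on and that PGR_1 uses the same codes as your dependent variable; the variable name correct is hypothetical and not part of the original steps. As elsewhere, replace the blank with your dependent variable.

```spss
* Sketch: flag each case as classified correctly (1) or not (0).
* The variable name correct is hypothetical and chosen for illustration.
COMPUTE correct = (__________ = PGR_1).
EXECUTE.
* With the filter on, the percent of 1s is the accuracy in the test set.
FREQUENCIES VARIABLES=correct.
```

If you use this, add correct to the DELETE VARIABLES list when you clean up at the end.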

AUC (Area Under the Curve)

  • The AUC is reported in the Area Under the Curve table in the output.
ROC PRE_1 BY __________(1)
  /PLOT=CURVE(REFERENCE)
  /PRINT=SE
  /CRITERIA=CUTOFF(INCLUDE) TESTPOS(LARGE) DISTRIBUTION(FREE) CI(95).
Using the menu:
  1. Go to Analyze > Classify > ROC Curve.
  2. Put the predicted probabilities (PRE_1) into the Test Variable box.
  3. Put the actual outcomes (original DV) into the State Variable box.
  4. Type 1 in the Value of State Variable box.
  5. In the Display group, check With diagonal reference line and Standard error and confidence interval.

 

Hosmer–Lemeshow Test
  • See the test result in the Hosmer and Lemeshow Test table in the output.
LOGISTIC REGRESSION VARIABLES _______
  /METHOD=ENTER PRE_1
  /PRINT=GOODFIT.
Using the menu:
  1. Go to Analyze > Regression > Binary Logistic.
  2. Select the actual outcomes (original DV) as the dependent variable.
  3. Select the predicted probabilities (PRE_1) as the independent variable.
  4. Click Options and select Hosmer–Lemeshow goodness-of-fit.

 

Return to Normal

Turn off the filter and return to using all cases:

FILTER OFF.
USE ALL.

If desired, remove the extra variables:

DELETE VARIABLES test train_dv random_sample PRE_1 PGR_1. 

 

  • Last Updated Apr 29, 2025
  • Answered By Debby Kermer
