Loading and Saving Data Files

SPSS® Reference Manual: A guide for market researchers

Prepared by Paul Hartzer

Introduction

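A good place to start learning SPSS syntax is with loading ASCII data files. This section walks through the basic steps (getting the data, labeling variables and values, cleaning the data, and saving the data), followed by other commands and keywords you're likely to run into.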

The SPSS Reference Guide lists commands in alphabetical order; in this section, I organize commands in the order that you're likely to need them. For best results, open an existing SPSS syntax file (in, for instance, \\[server]\data\SPSS\[report]\Code\[wave]\), then walk through it using this guide.

Note that all SPSS commands start on a new line and end with a period. Also, because files run with INCLUDE only allow the first line of a command to start in the first column, it's good practice to indent all the lines of a command except the first one.

Also note that if you want to break a string, such as a file name, onto multiple lines, you have to use single quotes around each piece and end all but the last one with a plus sign. For instance:

DATA LIST FILE='C:\DATA\SPSS\'+
 'RAWDATA\Q310\vehdat.dat'
 /

Otherwise, SPSS doesn't particularly care about spacing, or about how many lines a command takes up: You can put it all on one line, or put each word on a new line.

SPSS also doesn't care if you use all capital letters, all lower case letters, or a mix (except in strings). It's typical, though, to write SPSS in all capital letters.

Historically, SPSS variable names have been limited to eight characters. This is no longer true, but it might be a good idea to bear in mind whether the SPSS file is likely to be shared before deciding to use longer names.

Getting data

NEW FILE command

NEW FILE.

While it's not always necessary, most SPSS syntax files for loading data start with NEW FILE. This clears out the current working file.

DATA LIST command

DATA LIST FILE='[filename]' /
 [varname] [columns]
 [varname] [columns]
.

This command is used to load data from files where each row corresponds to a respondent. If you don't specify a format, SPSS will assume that it's a number and that the rightmost digit represents the "1"s column.

For example:

DATA LIST FILE='C:\DATA\SPSS\'+
 'RAWDATA\Q310\vehdat.dat' /
 RESPID   1-8
 INTWGHT  10-15 (F,4)
 TIMEINT  16
 MONTH    17-18 (A).

This loads data from the raw data file vehdat.dat into four variables.
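To see how the column specifications map onto a row, the same layout can be tried on a single line of inline data. This is only a sketch: the data row is made up, and BEGIN DATA / END DATA stands in for the external file.

```spss
* Same layout as above, read from one made-up inline row instead of a file.
DATA LIST FIXED /
 RESPID   1-8
 INTWGHT  10-15 (F,4)
 TIMEINT  16
 MONTH    17-18 (A).
BEGIN DATA
00000001 012500310
END DATA.
* RESPID is 1, INTWGHT is 1.2500 (four implied decimals), TIMEINT is 3, MONTH is the string '10'.
LIST.
```

Column 9 is never read, so whatever sits there is ignored. LIST simply prints the cases so you can check that each variable picked up the columns you intended.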

FILE TYPE command

FILE TYPE GROUPED FILE='[filename]' 
RECORD=[columns] CASE=[variable] [columns] WILD=NOWARN.
RECORD TYPE [n]. 
DATA LIST / 
[variable] [columns].
RECORD TYPE [n].
DATA LIST /
[variable] [columns].
END FILE TYPE.

This syntax is used for loading data where each respondent's records are spread across multiple rows. This type of data is becoming less and less common. In the "old days," data files were limited to 80 columns wide, and so respondent data had to be put on multiple lines. These days, SPSS can read data files much, much wider than that (so wide that it's rarely an issue), so data files are generally delivered in flat format.

Note that this is actually not a single command. Instead, it starts with the FILE TYPE command, which specifies the name of the file, where to find the row number or "record type," and which variable to use as the respondent ID.

Following the FILE TYPE command, there is a series of command pairs, one pair for each row of the data you'd like to load in. RECORD TYPE [n] specifies the number to look for in the RECORD columns you specified in the FILE TYPE command, and the DATA LIST command works the same as in the previous section, except that you don't specify a file name.

Finally, END FILE TYPE tells SPSS you're done describing the file.

For example:

FILE TYPE GROUPED FILE='C:\Data\SPSS\'+
 'OldStudy\1999\DATA\Study-1999.DAT' 
 RECORD=7-8 CASE=PANELID 1-6 WILD=NOWARN.

RECORD TYPE 1. 
DATA LIST / 
 PANELID 1-6
 C1MODEL 9-12
 C1MDLYR 13-14.

RECORD TYPE 2.
DATA LIST / 
 T1MODEL 9-12
 T1MDLYR 13-14.

END FILE TYPE.

This loads the file shown. Rows with 1 or 01 in columns 7-8 will be loaded according to the first data list command; rows with 2 or 02 in columns 7-8 will be loaded according to the second data list command. All rows with the same number in 1-6 will be grouped into a single record in SPSS.
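The grouping can be tried out without the original file by feeding a few inline rows through the same structure. This is a sketch with made-up data: two respondents with two records each. I've assumed here that the case variable only needs to be named on FILE TYPE, so PANELID is left out of the DATA LIST.

```spss
* Sketch with made-up inline rows: two respondents, two records each.
FILE TYPE GROUPED RECORD=7-8 CASE=PANELID 1-6 WILD=NOWARN.
RECORD TYPE 1.
DATA LIST /
 C1MODEL 9-12
 C1MDLYR 13-14.
RECORD TYPE 2.
DATA LIST /
 T1MODEL 9-12
 T1MDLYR 13-14.
END FILE TYPE.
BEGIN DATA
00000101123405
00000102567898
00000201432107
00000202876599
END DATA.
* Each PANELID should come out as one case holding all four model variables.
LIST.
```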

Labeling variables and values

VARIABLE LABELS command

VARIABLE LABELS
 [varname] '[description]'
 [varname] '[description]'
.

Use this to give longer, more descriptive labels to variable names. For example:

VARIABLE LABELS
 PANELID 'RESPONDENT ID'
 C1MODEL 'MODEL - 1ST CAR'.

You can also use double quotation marks around the label. Note that, whichever mark you use, the label can't contain that mark. For instance, this would cause an error:

VARIABLE LABELS
 PANELID 'RESPONDENT'S ID'.

SPSS wouldn't know what to do with S ID' because it would think the label for PANELID was just "RESPONDENT".
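There are two ways around this, sketched below: switch to the other kind of quotation mark, or double the apostrophe inside a single-quoted label.

```spss
* Either form keeps the apostrophe inside the label.
VARIABLE LABELS
 PANELID "RESPONDENT'S ID".
VARIABLE LABELS
 PANELID 'RESPONDENT''S ID'.
```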

VALUE LABELS command

VALUE LABELS
 [varname] [varname]
 [value1] '[description]'
 [value2] '[description]'
/[varname]
 [value1] '[description]'
 [value2] '[description]'
.

Use this command to label each value for the variables. If several variables share the same set of value labels, you can list all the variables together. Separate each set of variables and labels with /. For example:

VALUE LABELS
C1PRCODE C2PRCODE C3PRCODE C4PRCODE
 1 'BUY NEW'
 2 'BUY USED'
 3 'LEASE NEW'
 4 'LEASE USED'
/C1BSTYLE TO C4BSTYLE
 1 '3DR HATCH'
 2 '5DR HATCH'
 3 '2DR TRUNK'
.

If C1PRCODE = 1, then its value label will be "BUY NEW"; this is also true for C2PRCODE, and so on.

If the variables you want to label are consecutive, you can use TO rather than listing them all: In the example, all the variables from C1BSTYLE to C4BSTYLE will have the same value labels. This function can be efficient, but it can also be dangerous, if you're likely to change the order of variables or to insert variables later in between existing ones.

Cleaning the Data

MISSING VALUES command

MISSING VALUES [varname] [varname] ([values]).

Use this command to tell SPSS to disregard certain values. For instance, it might be that 99 for INCOME is a code for "no answer," and you don't want that included in frequency counts. In this example, use this command:

MISSING VALUES INCOME (99).

To clear all the missing-value definitions for a variable, leave the parentheses empty.
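A few common variations are sketched here. The codes 98 and 99 and the use of AGE are assumptions for illustration, not taken from a real study.

```spss
* Sketch only: the codes 98 and 99 and the variable AGE are assumed for illustration.
* Treat two discrete codes as missing.
MISSING VALUES INCOME (98, 99).
* Treat a range of codes as missing.
MISSING VALUES AGE (98 THRU HI).
* Clear the missing-value definitions for INCOME again.
MISSING VALUES INCOME ().
```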

FORMATS command

FORMATS [varname] [varname] ([format]).

Use this command to change how SPSS displays a numeric variable. Note that this does not change the value of the variable, only what's displayed. For instance, if the values are all two-digit numbers, use this:

FORMATS INCOME AGE (F2.0).

When displaying INCOME and AGE, SPSS will show a total of 2 characters, none to the right of the decimal point ("99" will be shown as "99"). Note that the value to the left is the total width to be shown, and that the decimal point itself takes up one of those characters. For instance, to display 99 as 99.0, use:

FORMATS INCOME AGE (F4.1).

RECODE command

RECODE [varname] ([values] = [value])([values] = [value])
 (ELSE = [value]) INTO [varname].

Use this command to modify scales or to group responses into an aggregate variable. If INTO [varname] is left off, then the new values replace the old ones in the existing variable. You can recode multiple variables at once; if you include INTO at this point, make sure you have the same number of variables on each side.

Example:

RECODE ATT01 TO ATT08 (3=1)(1=3)(ELSE=COPY).

This example reverses the three-point scale in all the variables from ATT01 to ATT08. 2 stays 2, as do any other values.

RECODE AGE (1 THRU 21=1)(22 THRU 45=2)(46 THRU 99=3)
 INTO AGEGROUP.

This example groups ages into three groups as AGEGROUP.
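It's worth checking a recode like this on values that sit right on the group boundaries. The sketch below uses made-up ages and ranges that don't overlap; if two ranges did share a value, SPSS would use the first one that matches.

```spss
* Made-up ages, chosen to sit on the group boundaries.
DATA LIST FREE / AGE.
BEGIN DATA
18 21 22 45 46 70
END DATA.
RECODE AGE (1 THRU 21=1)(22 THRU 45=2)(46 THRU 99=3)
 INTO AGEGROUP.
* With INTO, any age not covered by a range is left system-missing in AGEGROUP.
LIST.
```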

COMPUTE command

COMPUTE [varname] = [equation].

This command allows you to calculate variables, usually based on the values of other variables. This is sometimes more efficient than RECODE. For example:

COMPUTE YEAR4 = 1900 + YEAR2.

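This is also used to create a new variable with a set value. If the new variable holds text rather than a number, declare it with the STRING command first. For instance:

STRING WAVE (A4).
COMPUTE WAVE = 'Q107'.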

IF command

IF ([condition]) [varname] = [equation].

Use this command if you have a single condition you want to test, and a single variable to change. For instance, let's say you want to add 100 to the brand code if the vehicle code is over 8000. This would be the code:

IF (VEHCODE > 8000) BRAND = BRAND + 100.

DO IF command set

DO IF ([condition1]).
[commands1].
ELSE IF ([condition2]).
[commands2].
ELSE.
[commands3].
END IF.

Use this command set if you have multiple conditions to test, or if you want to do more than just change a variable. For example:

DO IF (YEAR2 < 9).
COMPUTE YEAR4 = 2000 + YEAR2.
ELSE.
COMPUTE YEAR4 = 1900 + YEAR2.
END IF.

Note that each of these lines is a command. DO IF requires END IF; you can have as many ELSE IF statements as you like in the middle, and the ELSE is optional. Also note that IF does not use COMPUTE (it's assumed), while DO IF requires COMPUTE (you could run most commands, not just COMPUTE, in a DO IF command set).
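Here is a slightly fuller sketch with an ELSE IF branch and more than one command per branch. The data are made up, and CENTURY is a hypothetical variable added only to show a second command inside each branch.

```spss
* Made-up two-digit years, and CENTURY is a hypothetical extra variable.
DATA LIST FREE / YEAR2.
BEGIN DATA
5 8 50 99
END DATA.
DO IF (YEAR2 < 9).
COMPUTE YEAR4 = 2000 + YEAR2.
COMPUTE CENTURY = 21.
ELSE IF (YEAR2 <= 99).
COMPUTE YEAR4 = 1900 + YEAR2.
COMPUTE CENTURY = 20.
ELSE.
COMPUTE YEAR4 = $SYSMIS.
COMPUTE CENTURY = $SYSMIS.
END IF.
LIST.
```

Only the first branch whose condition is true runs for a given case, so the order of the conditions matters.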

Saving the Data (and related commands)

EXECUTE command

EXECUTE.

Some commands, such as VALUE LABELS, take effect immediately. Others, like DATA LIST, don't. EXECUTE causes all pending commands to be executed. This can be run at any time (the more often, the better), but it's a good idea to remember to include it before the SAVE command.

WEIGHT command

WEIGHT BY [varname].

This command turns the weight on. While it's useful at any time, it's especially important to set the weight on or off before you save, because that state will be saved. Most users expect the weight to be set to the appropriate variable, and often won't check before running frequencies. Therefore, this command is included here as a reminder to turn your weights on! For example:

WEIGHT BY INTWGHT.

To turn the weight off, use:

WEIGHT OFF.

SAVE command

SAVE OUTFILE='[filename]'
 /DROP=[varname] [varname]
 /KEEP=[varname] [varname]
.

Use this command to save the file in SPSS's data format. /DROP deletes the listed variables; /KEEP saves only the listed variables. Both are optional, and using both at the same time is unusual. It's most common not to use either:

SAVE OUTFILE='C:\DATA\SPSS\REPORT\'+
 'DATA\Q310\FinalData.sav'.
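Putting the last three commands together, the end of a load script might look like the sketch below. The output file name and the /KEEP list are hypothetical; the variables are the ones from the DATA LIST example earlier.

```spss
* Sketch of a typical ending, with a hypothetical output file name and variable list.
WEIGHT BY INTWGHT.
EXECUTE.
SAVE OUTFILE='C:\DATA\SPSS\REPORT\'+
 'DATA\Q310\FinalSubset.sav'
 /KEEP=RESPID INTWGHT TIMEINT MONTH.
```

If you use /KEEP, remember to list the weight variable, or the saved file won't have it.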

Other Commands and Keywords

INCLUDE FILE

INCLUDE FILE='[filename]'.

This command inserts the text of the specified filename at the location of the command, as if you'd copy-pasted the contents of the file into your SPSS syntax. This is useful if you have code that's repeated for different files, such as long lists of value labels (like vehicle codes!). Unfortunately, SPSS insists that each line of a command other than the first one has to be indented. Also, if an INCLUDE file experiences a problem, SPSS will stop processing and roll the data back to the last EXECUTE statement. This is another argument in favor of using EXECUTE statements often.

In general, until you're more comfortable with other aspects of SPSS syntax, INCLUDE files are best avoided for your own code, but I include the information here because many existing SPSS syntax files have INCLUDE statements.
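As a sketch of what this looks like in practice, here is a small include file and the line that calls it. The path and file name are hypothetical.

```spss
* Contents of a hypothetical include file, C:\DATA\SPSS\LABELS\prcode.sps.
* Only the first line of each command starts in column 1.
VALUE LABELS
 C1PRCODE C2PRCODE
 1 'BUY NEW'
 2 'BUY USED'.

* The next line belongs in the main syntax file, not in the include file.
INCLUDE FILE='C:\DATA\SPSS\LABELS\prcode.sps'.
```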

SYSMIS

SYSMIS stands for SPSS's special value for null, no data. It occurs sometimes in existing SPSS syntax files.
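The places you're most likely to see it are sketched below. The variable names and the code 99 are assumptions for illustration.

```spss
* Sketch only: the variables and the code 99 are assumed for illustration.
* Recode the system-missing value to an explicit code.
RECODE INCOME (SYSMIS=99).
* Test for it with the SYSMIS function.
IF (SYSMIS(AGE)) AGEFLAG = 1.
* Assign it with the system variable $SYSMIS.
COMPUTE YEAR4 = $SYSMIS.
```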

FILTER, FREQUENCIES, and CROSSTABS commands

These commands are related to auditing the data; they're not related to the data load and save, but often appear in the syntax files.
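In case you run across them, a minimal sketch of each is shown here. The variable names are borrowed from earlier examples, and RECENT is a hypothetical filter variable.

```spss
* Sketch only: variable names are borrowed from earlier examples, RECENT is hypothetical.
FREQUENCIES VARIABLES=AGEGROUP.
CROSSTABS /TABLES=AGEGROUP BY TIMEINT.
* FILTER BY keeps only cases where the filter variable is nonzero and not missing.
COMPUTE RECENT = (YEAR4 >= 2000).
FILTER BY RECENT.
FREQUENCIES VARIABLES=AGEGROUP.
FILTER OFF.
```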

WRITE command

The WRITE command creates an ASCII file from an SPSS file.
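A minimal sketch, assuming a hypothetical output path and the variables from the DATA LIST example earlier:

```spss
* Sketch only: the output path is hypothetical.
WRITE OUTFILE='C:\DATA\SPSS\REPORT\DATA\Q310\export.dat' /
 RESPID TIMEINT MONTH.
EXECUTE.
```

Like DATA LIST, WRITE doesn't do anything until the data are actually read, which is why the EXECUTE follows it.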
