The Little SAS Book Chapters 1 & 2 (seminar by Anna Johansson, 25 March 2003)

The Little SAS Book, 2nd ed.,
by Lora Delwiche & Susan Slaughter,
SAS Publishing

Chpt 1: Getting started using the SAS System

1.1 The SAS language

SAS programs consist of data steps, proc steps and comments.

Example of a SAS program:

* 2003-03-10 / Anna Johansson, MEP;
* uses: annaj.diet_raw; 
* (originally diet.sd7 from BiostatIII_course,
  Statistical models in epidemiology,
  Clayton and Hills, 1993, p 274);
* creates:; 
* Examples for SAS seminar 2003-03-25;
libname annaj 'h:\sas\sas seminars';
* read in a working copy, so that the raw data
  are untouched if I mess up the data;
data diet;
  set annaj.diet_raw;
run;
* checking outcome: chd, Coronary Heart Disease;
proc freq data=diet;
  tables chd;
run;


1.2 SAS data sets

SAS data sets consist of observations (rows) and variables (columns).

Variables are either numeric (NUM) or character (CHAR).

x=42 NUM
x='42' CHAR
x='042' CHAR, eg occupational code, SES code
x=042 NUM >> x=42

x='dead' CHAR
x=dead MISTAKE, dead will be interpreted as the variable DEAD (the log notes that DEAD is uninitialized, and x becomes missing)

Missing values are represented as follows:

x=. NUM
x=' ' CHAR
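
As a small sketch of the rules above, the data step below creates a numeric and a character version of the same code, plus a missing value of each type. The data set name types_demo and the variable names are made up for illustration.

```sas
data types_demo;
  code_num  = 042;    /* numeric: stored as 42, the leading zero is lost */
  code_char = '042';  /* character: stored exactly as written */
  miss_num  = .;      /* missing numeric value */
  miss_char = ' ';    /* missing character value */
run;

/* print the single observation to check the stored values */
proc print data=types_demo;
run;
```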

A data set has two parts: the DATA PORTION, which is the data itself, and the DESCRIPTOR PORTION, which is metadata (descriptive information about the data) such as the variable list, the number of observations and the date of creation. You can view the descriptor portion with PROC CONTENTS. See also section 2.8.

proc contents data=annaj.diet_raw;
run;

1.3 The two parts of a SAS program

SAS programs are made up of data steps and proc (procedure) steps.

Data steps read and modify data, and create a new data set.

data ...;

Proc steps use a data set and can produce output (results).

proc ...;

Data steps are used for actions on rows (e.g. creating a new variable from another variable).
Proc steps are used for actions on columns (e.g. calculating the mean of a variable).
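
The row/column distinction can be sketched with a tiny made-up data set. The name people, the three data lines and the assumption that height is in metres are all hypothetical. The data step works through the data one row at a time, while PROC MEANS summarises a whole column.

```sas
data people;
  input id weight height;
  /* row action: computed separately for each observation */
  bmi = weight / height**2;
  datalines;
1 70 1.75
2 82 1.80
3 55 1.62
;
run;

/* column action: one summary value for the whole variable */
proc means data=people mean;
  var bmi;
run;
```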

Good rule 1: use as few data steps as possible (in most cases one step is enough!)

data diet;
  set annaj.diet_raw;
  * here I create all my variables for the analyses;
  bmi = weight/height**2;
run;

Good rule 2: keep the code that creates the main data set in a separate program and do the analyses in other programs. Give the programs proper, understandable names, use dates and use comments (a good program is a green program, since comments are shown in green in the SAS editor)!

1.8 Reading the SAS log

When a program is executed, a log is generated in the log window. ALWAYS read the log! It contains useful information.

There are three types of log messages, coloured blue, green and red.

NOTE: blue. General (good) information that is often useful, e.g. the number of observations.

WARNING: green. Not an error, but SAS informs you that you may have a problem. Processing does not stop and the data set is still created.

ERROR: red. There is an error in the code, so SAS cannot process the data step and stops. If you are running the data step to replace an older version of a data set, the old version has NOT been replaced!
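
To see the message types in practice, a sequence like the sketch below can be submitted in an interactive session. The data set names log_demo and no_such_data are hypothetical, and the exact wording of the log messages may differ between SAS versions.

```sas
/* Step 1: runs cleanly; the log shows a NOTE with the number of
   observations and variables in WORK.LOG_DEMO */
data log_demo;
  x = 1;
run;

/* Step 2: the input data set does not exist, so the log shows an ERROR
   and the step stops; the LOG_DEMO created in step 1 is not replaced */
data log_demo;
  set no_such_data;
run;

/* Step 3: LOG_DEMO still holds the single observation from step 1 */
proc print data=log_demo;
run;
```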

1.10 Using SAS System options

You can change the SAS environment by using system options.

Change font for output: Choose from the menu File > Print setup > Font
Center|Nocenter output: Choose from the menu Tools > Options > System > Log & Procedure Output Control > Procedure Output > Center=1
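
The same kind of setting can also be made in code with the OPTIONS statement, which is convenient at the top of a program. A minimal sketch follows; the particular values are only examples.

```sas
/* left-align procedure output instead of centering it */
options nocenter;

/* several options can be set in one statement */
options linesize=80 pagesize=60 nodate;

/* write the current setting of an option to the log */
proc options option=center;
run;
```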

An easy way to work with SAS is to use the function keys (F1-F12) instead of clicking with the mouse. You can define the keys any way you like; below is a suggestion.

To change the key settings: type "keys" on the command line
F3 clear log; clear output; wpgm
F4 recall
F5 wpgm
F6 log
F7 output
F8 submit
F12 clear

Chpt 2: Getting your Data into the SAS System

This is not a big problem for MEP users, since we usually already have SAS data sets
(.sd7, .sas7bdat, .sd2). Then you only need the SET statement.

If you do not have a SAS data set, ask a SAS programmer or use the Import Wizard.

For other data formats we can use DBMS/Copy to convert data files; it is installed on the computer in the biostat library. Do not spend hours trying to convert a file.
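
For a simple text file, the import can also be written directly as a PROC IMPORT step. The sketch below assumes a comma-separated file with the variable names in the first row; the file name diet.csv and the output name diet_csv are hypothetical.

```sas
proc import datafile='h:\sas\sas seminars\diet.csv'  /* hypothetical file */
            out=work.diet_csv                         /* hypothetical output name */
            dbms=csv
            replace;
  getnames=yes;   /* first row holds the variable names */
run;
```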

2.9 Temporary vs. permanent data sets

Temporary data sets disappear when you exit SAS.
Permanent data sets are stored on disk, so you can use them again. You need to specify the path to the data set in the SAS code.

/* temporary data set */
data diet_temp;
  set diet_raw;
run;
/* permanent data set */
data 'h:\sas\seminar\diet_perm';
  set diet_raw;
run;
proc contents data='h:\sas\seminar\diet_perm';
run;

2.10 Using LIBNAME statements with permanent data sets

To avoid the extra work of writing the path in the code every time you use a permanent data set, there is a shortcut called a LIBNAME or, more correctly, a LIBREF.

The libname is a little label that you define as the path, and then you write the label in the code instead of the path.

/* permanent data set */
data 'h:\sas\sas seminars\diet';
  set 'h:\sas\sas seminars\diet_raw';
run;
* create libname, i.e. path;
libname annaj 'h:\sas\sas seminars';
data annaj.diet; /* permanent data set */
  set annaj.diet_raw;
run;

LIBREFS/LIBNAMES can be used in both data steps and proc steps

proc print data=annaj.diet_raw;
run;

But, even a temporary data set must be stored physically on the disk.

WORK library: 'c:\documents and settings\annaj\SAS temporary files\_TD840\diet_temp'

The libname for the temporary library is WORK.

data work.diet;
  set annaj.diet_raw;

You do not need to specify the WORK library.

data diet;
  set annaj.diet_raw;

The WORK library is emptied automatically when you end the SAS session, so no temporary data sets are kept between sessions.
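
To check what is currently stored in WORK, the library can be listed with PROC CONTENTS. A small sketch follows, in which work_demo is a made-up data set name.

```sas
data work_demo;          /* one-level name, so it is stored in WORK */
  x = 1;
run;

/* list all data sets in the WORK library, without the variable details */
proc contents data=work._all_ nods;
run;

/* the same data set addressed with its full two-level name */
proc print data=work.work_demo;
run;
```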

Each version of SAS uses its own engine to read and write data sets. Because the engine is version-specific, problems can arise when you want to use data sets created in a different version.

Relationship between file extensions and versions:
.sd2 (v6)
.sd7 (v8)
.sas7bdat (v8)

Libnames are engine-specific, i.e. a libname can only be used for one type of file extension. You specify the engine in the LIBNAME statement. If no engine is specified, SAS chooses the one that is most common among the data set files in the directory.

libname annaj6 v612 'h:\sas\sas seminars\';
libname annaj v8 'h:\sas\sas seminars\';
* v6 >> v6 (output name diet is an assumption);
data annaj6.diet;
  set annaj6.diet_raw;
run;
* v8 >> v8 (output name diet is an assumption);
data annaj.diet;
  set annaj.diet_raw;
run;
* if I want to change versions of data set;
* v6 >> v8;
data annaj.diet_raw;
  set annaj6.diet_raw;
run;
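
If it is unclear which engine a libref ended up with, the LIST option of the LIBNAME statement writes the libref's attributes, including the engine and the physical path, to the log. A short sketch using the same librefs and path as above:

```sas
libname annaj6 v612 'h:\sas\sas seminars\';
libname annaj  v8   'h:\sas\sas seminars\';

/* write engine, physical name and other attributes to the log */
libname annaj6 list;
libname annaj  list;

/* release a libref that is no longer needed */
libname annaj6 clear;
```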