SAS Video 2.1

Name: SAS Video 2.1
Uploaded: 2019-07-09T15:37:06.883Z
Duration: 41 min 39 s
Description: Review, Arithmetic Operations, Functions, Date Variables

July 09, 2019

Review, Arithmetic Operations, Functions, Date Variables

Information

ID: 3396
To Cite: DCA Citation Guide

Credits

Maria Ciargleglio

Download Transcript

00:01Welcome to video 2.1.
00:03The first of our lab 2 videos in our SAS lab video series.
00:07My name is Maria Ciarleglio, and I'm from the Department of
00:10Biostatistics at the Yale School of Public Health.
00:14In this lab we'll cover how to work with variables in SAS.
00:17Specifically, how to create new variables from existing variables
00:21in a dataset.
00:26The topics covered in this lab will be presented over three videos.
00:30Video 2.1 will begin by reviewing proc import to read
00:34a data file into SAS.
00:36We learned how to use this technique in video 1.2.
00:40Then we will begin looking at a number of different ways of creating
00:43variables.
00:45We will cover arithmetic operations, such as addition, subtraction,
00:49multiplication, division, and exponentiation.
00:53Next, we'll look at some commonly used mathematical functions that
00:56can transform a variable.
00:58For example, the exponential function, square root, and logarithmic
01:01functions.
01:03We'll also discuss some arithmetic and statistical functions that
01:07allow us to sum and find the mean standard deviation,
01:10minimum or maximum value of several variables for each
01:14subject.
01:15At the end of video 2.1.
01:17We will discuss how to work with date variables.
01:20We will add a date variable to a SAS data set, calculate the length
01:24of time between two date variables, and discuss different ways
01:27of displaying or formatting date variables.
01:31In video 2.2, we'll continue to learn about creating variables
01:35using comparison operators such as less than,
01:39greater than, or equal to.
01:41These are commonly used in if-then statements.
01:44For example, if a certain condition is true, then a
01:47new variable will take on a certain value.
01:51And lastly, we'll discuss logical operators such as
01:54and, or, and not.
01:57Video 2.3 will show you how to save this new analysis
02:01dataset that you've created containing all of your new variables.
02:06We will again look at data from the Framingham Heart Study.
02:10The data file that we will use in this video is an Excel file called
02:14Framingham subset 2 which is available in the lab 2
02:17Resources folder on the course web page.
02:20This dataset is similar to Framingham subset that we used
02:24in videos 1.2 and 1.3, but some
02:27of the variables have been modified and I've also added a few new
02:31variables.
02:33There are 30 variables in the data set shown in columns and
02:384434 individual observations shown as rows.
02:42Notice that the number of rows in our Excel spreadsheet is
02:474435. This is because the first row of the spreadsheet contains
02:50the variable names, and the individual observations begin
02:54on row 2. This version of Framingham subset includes
02:58a variable I created describing participant race.
03:01And for those who died, I have generated a date of death
03:05in order to give us an opportunity to work with missing data, the 10th
03:08subject shown in row 11 has been modified to give
03:12him some missing data values.
03:14This will allow you to see how missing values in the data affect the
03:17creation of new variables.
03:19Additionally instead of only having information on baseline total
03:23cholesterol, we now have total cholesterol information collected
03:27during three examination periods.
03:30In this study, the three examination periods were approximately six
03:33years apart.
03:35Touching again on missing data.
03:36Notice that when an individual does not have cholesterol level
03:40measured at one of the follow up examinations, the cell is
03:43left blank. When SAS imports these data, SAS
03:47knows that a blank cell indicates a missing value.
03:50So if you're collecting data in an Excel spreadsheet simply leave
03:54the cell blank if a subject has a missing value.
03:57We will use the import procedure to read this Excel file
04:01into SAS which will create a SAS dataset.
04:04The SAS dataset will be given the name Framingham in SAS.
04:08The imported data set will have the same number of variables and
04:11observations as the original Excel spreadsheet.
04:15The first step to working with data and adding new variables to
04:19a data set is to get your data into SAS.
04:22We will use the import procedure to reading our external data file.
04:26This method was discussed in video 1.2.
04:30Here is what typical SAS code for this procedure looks
04:33like. Remember our trick of using the import wizard
04:37and saving the code generated by the wizard for use later.
04:42The generated code is what I have pasted here.
04:45The first line begins with proc import which is the SAS
04:49procedure name. Out = work.Framingham
04:53is where we can specify the library where the imported SAS
04:56data set will be saved, and we can also name the data set.
05:00Here we're saving this data set to the temporary work library
05:04that we discussed in video 1.3.
05:06Remember that the contents of the work library will disappear once
05:10we exit SAS. This means that if we exit our current SAS
05:14session, any data sets that we created in the work library
05:17will need to be re-imported, and any new variables that we created
05:21will need to be recreated once we launch SAS if we want
05:25to continue analysing these data.
05:28We can also create a permanent library using the lib name
05:31statement to save our imported data as a permanent SAS
05:35data set. In video 2.3, I will give you a few
05:39options for saving your modified data set containing all of
05:42your new variables, so that you can directly use that analysis data
05:46set later without having to repeat your data management
05:49steps.
05:51Note that if we had omitted the library name here and just
05:54written out equals Framingham, the Framingham dataset would
05:58have by default been saved to the work library anyway.
06:03The name we are giving the data set is the name Framingham.
06:06So when we refer to this data set in our code we can simply
06:10call it Framingham and SAS will know to look in the
06:13work library if no other library is specified.
06:18On the second line we see the word data file followed by the equal
06:21sign and the physical location of the file.
06:24In this case, the excel file that we're going to import into SAS.
06:28Notice that we put the path of our excel file in double quotes.
06:31You can also use single quotes but just be sure to be consistent
06:35on both ends.
06:37On the 3rd line of code, we have the acronym DBMS
06:40for database management systems.
06:42We need this line of code to tell SAS what type of file we are
06:46importing. Here we're importing an Excel file in this example.
06:50The word replace is necessary to overwrite
06:54any other existing files in SAS by the same name.
06:58Here is where we also put our first semicolon.
07:01The fourth line of code tells SAS which Excel worksheet to
07:05read in. Here we're specifying the name that appears
07:08on the tab of the specific worksheet of interest.
07:12In our excel file Framingham subset 2, our worksheet
07:16is named Framingham. So that is what we specify in the range
07:19statement. We also end this line with a semicolon.
07:23On the fifth line, get names equals yes tell SAS
07:27that there is a header row in the Excel file which contains the
07:30variable names. Line 6 through 9 specify how
07:34text, date, and time variables are imported.
07:38Finally we end the import procedure with the word run,
07:41followed by a semicolon.
07:44Now let's open SAS and run this procedure to
07:47import Framingham subset 2.
07:50I'm going to open SAS by directly double clicking on the existing
07:53program file for this lab called SAS code Video
07:572. This program file is available in the lab 2 Resources
08:01folder on the course web page so that you can follow along.
08:06In the program editor, I like to begin my code with some comments.
08:10I've also pasted the proc import code generated by the
08:14import wizard and inserted a comment about the procedure
08:18to document what I'm doing and to help me remember my thought process
08:21if I return to this code later.
08:24Most of the comments in this file are to help you better
08:28understand and locate the different pieces of code.
08:31But the use of comments is completely up to you.
08:34You can make them as detailed as you want or omit them altogether.
08:38You will likely have to modify the data file statement to point
08:41to the location where you have Framingham subset 2
08:45saved on your computer.
08:47I have the Framingham subset 2 excel files saved
08:50in the SAS video 2 folder on my USP flash drive
08:54which is mapped as drive G on my computer.
08:58I'm going to highlight and submit or run the import procedure.
09:02Remember you can run your highlighted code by either clicking on the
09:05running man button in the menu bar or by using the keyboard
09:09shortcuts F3 or F8.
09:12Let's check our log window.
09:13The lines in black are the lines of code we highlighted and submitted
09:17and the lines in blue tell us that our SAS data set worked up
09:21Framingham was successfully created.
09:23The log also reports the number of observations and the number
09:27of variables in the imported dataset.
09:29This should match the number in the original imported excel file.
09:34Next let's check the contents of the Framingham dataset.
09:37We will use the contents procedure to give us a list of the variables
09:41in our dataset.
09:43We'll also be able to see if the variable was imported as
09:46a numeric or character variable.
09:49I would like the variables to be listed in the order they are included
09:52in the file rather than in alphabetical order, so I specified
09:56the order equals var num option in the first line of
10:00PROC CONTENTS. I find this option useful because there tends to
10:03be a reason for the ordering of variables in a dataset.
10:07Although if I'm looking to see if the dataset contains a specific
10:10variable, then looking at the variable list in alphabetical order
10:14would be more helpful. Again highlight and run the code.
10:19Checking the log, I don't see any errors.
10:22But I do see a new window.
10:24The results viewer window.
10:26This is the window that contains our output.
10:29Here we see the data file Framingham is saved in the work
10:32library because under dataset name we see work.Framingham.
10:37This dataset has 30 variables and
10:414434 observations.
10:43I also see the intake and date of death variables were
10:46imported correctly as a date, and the sex and race
10:50variables were imported correctly as character variables.
10:54It's always a good idea to check that your data are being imported
10:58correctly.
11:00We will take the Framingham data set that we just imported and
11:03create several new variables.
11:05The new variables will appear as new columns at the end of
11:09the data set. This process creates a new SAS
11:12data file that contains both the original variables and
11:16the new variables.
11:18I call this new data set my analysis data set because
11:21oftentimes we are performing this data management and creating
11:25these new variables for the purpose of conducting data analysis
11:29later on. The data set that then contains the variables we
11:32need for a future analysis is my analysis data set.
11:36We have two options for saving this new data set.
11:39We can overwrite the existing Framingham data set by giving
11:43the new data set. the name Framingham, the same as the original.
11:47Alternatively, we can create a completely new dataset by
11:51assigning it a different name.
11:52In our example we will name our new dataset Framingham
11:56new. This approach allows us to keep the Framingham
12:00SAS data file in its original form.
12:03Our new dataset is a copy of Framingham plus any newly
12:07created or modified variables.
12:09Be careful when using option one because you will overwrite
12:13the version of Framingham that you originally imported.
12:16I don't mean that you will overwrite the excel file; you will
12:19overwrite the SAS data file that was created as a result of
12:22the import procedure.
12:24Once you overwrite a data set you cannot simply go back to the
12:28previous version. For example, if we overwrite Framingham
12:31and add 10 new variables, the dataset will now contain 40 variables.
12:36This means that we've lost the original version of the dataset with
12:3930 variables. To get the original back, we would have to rerun
12:43the import procedure.
12:44My preference is to always leave the original data file as
12:48is and create a second analysis dataset that contains any
12:51modifications changes or new variables.
12:55This way if I make a mistake in creating a new variable or if I
12:58accidentally overwrite an existing variable in the original dataset
13:02I can correct my code and use the original dataset to try
13:06again.
13:07We will create new variables using a data step.
13:11We used a data step in video 1.2 to read data
13:14into SAS using manual data entry.
13:17On line one, we begin the DATA step with the word data
13:21followed by the name of the data set that we want to create in this
13:24data step. Because I prefer option 2, I'm
13:28giving the modified data set a new name, Framingham new.
13:32Because I'm not specifying a library, this new data set will
13:36be saved in the work library.
13:38You could explicitly specify the work library as the location
13:42of Framingham new as we see here but it's not necessary.
13:46When you don't specify a library., SAS will default to
13:50the work library. We end the data statement with a
13:53semicolon.
13:55On the second line I begin the SET statement.
13:58This is where I identify the dataset that will be read
14:01into the data step.
14:03We want to read the original Framingham data set into
14:07the DATA step, create a copy called Framingham new,
14:10and make any of the modifications to the dataset specified
14:14within the body of the DATA step.
14:16This is where we will define new variables, usually by performing
14:20operations on existing variables.
14:23Again, we are reading the Framingham data set in from the work
14:27library. You could have also put the SET statement on the same
14:30line as the data statement.
14:33In SAS, it's not the line breaks that indicate the end
14:36of each statement, it's the semicolons.
14:39End the DATA step with the word run followed by a semicolon.
14:43Now we can discuss SAS operators and functions that are
14:47used in SAS programming statements to create new variables.
14:51We will begin by discussing arithmetic operators.
14:55The first arithmetic operation we will discuss is addition.
14:59We will use addition to add two or more variables together
15:03to make a new variable, or to add a constant
15:06to one or more variables.
15:08We type the plus symbol between the terms we are summing.
15:12We have five binary variables in our Framingham dataset
15:16that report the presence of prevalent disease at the time of entry
15:19into the study: prevalent coronary heart disease, angina
15:23pectoris, myocardial infarction stroke, and prevalent
15:27hypertension.
15:28The variable equals one, if the individual has the prevalent disease,
15:32and zero if the individual is free from the disease.
15:36We see the count and percentage of individuals with each prevalent
15:39condition reported in these frequency tables with prevalent
15:43hypertension being the most common prevalent condition in these
15:46individuals. We want to create a new variable in the dataset
15:50that counts the number of prevalent conditions that a participant
15:54has. Because the variables indicating presence or
15:57absence of disease are binary, if we sum these indicator
16:01variables, we will be counting the number of prevalent conditions
16:05for each individual.
16:06Here we are naming this new account variable prev cond,
16:10and this variable will be in Framingham new.
16:13Also notice that when it comes to variable names, SAS is
16:17not case sensitive.
16:18Although the prevalent condition variables are all capitalized
16:22in the dataset Framingham, I do not need to use all caps
16:25when referencing the variables.
16:28Just so it's clear I want you to see that these operations will be
16:31applied to each subject.
16:33The subjects are represented in the rows of the data set.
16:36Seven subjects are shown here.
16:38The last subject happens to be the 10th subject in the original
16:42dataset. This subject has several missing data values so that
16:45you can see how variable creation behaves in the presence of
16:49missing input values.
16:51When you follow along in print some of the new variables we will
16:54create using the code in the SAS program file, the obs
16:58equals ten option in PROC PRINT will print the first
17:0210 subjects in the dataset.
17:04The new variable we're creating sums the values of
17:08prevalent CHD, AP, MY, stroke, and hypertension
17:12for each subject, essentially counting the number of prevalent
17:16conditions each subject has.
17:18Prev Cond equals zero for the first subject shown because
17:22he has no prevalent conditions. One for the second subject,
17:26three for the third, two for the fourth, zero
17:29for the fifth, and one for the six.
17:32When it comes to subject seven we see that his value of
17:36Prev Cond is missing.
17:38This is because one of the variables involved in calculating
17:41the sum is missing.
17:43Let's move on to subtraction.
17:45Here we create a new variable called pulse pressure in Framingham
17:49new which is the difference between systolic and diastolic blood
17:53pressure. The next operation is multiplication which
17:57uses the asterisk symbol.
17:59We are creating a variable age months which is simply
18:02age expressed in months and not in years.
18:06We do this by multiplying the variable age in the Framingham
18:09dataset which is aging years by the number 12.
18:13The constant twelve will be multiplied by each individual's
18:17age to give us our new variable.
18:19Again age in months is missing when age itself is missing.
18:24Looking next at Division we can calculate mean arterial pressure
18:28the average pressure in a patient's arteries during one cardiac
18:31cycle. By adding systolic blood pressure to twice diastolic
18:35blood pressure and dividing that quantity by three.
18:39This example illustrates how you can use multiple arithmetic
18:42operations together when defining a variable.
18:45Here we are using addition, multiplication, and division.
18:49Also notice the use of parentheses.
18:52Here, we want to divide the quantity systolic plus
18:55twice diastolic by three.
18:58If we had admitted the parentheses, SAS would have only divided
19:02the quantity, two times diastolic by three.
19:06A second formula for map involves first calculating
19:10pulse pressure, which we have, and dividing that by three, then
19:13adding diastolic blood pressure.
19:15Even though we have not yet run the DATA step to formally create
19:19pulse pressure, you can still reference variables that you
19:22defined on an earlier line in that same data step.
19:27Finally, we apply exponentiation using two asterisks.
19:31We calculate age squared or age to the power 2.
19:36As a reminder if one of the arguments for an arithmetic
19:39operator is missing, the result is missing.
19:44I want to take a moment before we get into the next topic to show
19:47you how to use SAS as a calculator.
19:50SAS can be used to perform calculations such as simple arithmetic
19:54operations that we've seen or more complex calculations
19:58involving different built in functions which we'll discuss shortly.
20:02First let me show you how not to perform calculations
20:05in your SAS program.
20:07Suppose you wanted SAS to compute the sum of 1 in 1,
20:11so 1 plus 1.
20:12Typing 1 plus 1 in your program and running this line of
20:16code will not return an answer.
20:18In fact, you'll receive an error in your log.
20:21Maybe we need to assign the result to a variable.
20:24We want the variable called answer to equal the result of the
20:28calculation. Unfortunately, this also isn't the way to
20:31do it. You will again receive an error in your log.
20:35SAS is different from other object oriented programming languages.
20:39For the most part you can't have variables floating around.
20:42You need to work within data sets which means we
20:46need to perform our calculations within a data step.
20:50We do this even though we're not reading in a data file using
20:54the SET statement. We still need to work within a data
20:57step when using SAS as a calculator .This means
21:01that we need to print the data set that we create in
21:05our data step or navigate to the created data set
21:08in the Explorer tab in order to see the results of
21:12our calculations.
21:14The calculated value which is the output of the function
21:17or the operation, will need to be assigned a variable name,
21:21and that variable which holds the result will be the last value
21:25in your created data set.
21:26Or perhaps it will be the only value in your data set, as we'll see
21:29with this first example.
21:31As an example of how to use SAS as a calculator, let's use
21:35some of the arithmetic operations that we're already familiar with.
21:39I'm naming the data set that will contain the results of my
21:42calculations, calc.
21:44I want to compute an individual's body mass index, or
21:48BMI, using the formula, weight in kilograms divided
21:51by height in meters squared.
21:53The BMI I want to calculate is for an individual with a weight
21:57of 70 kilograms and a height equal to 1.6
22:01meters. The result of the calculation will be in a
22:05variable called BMI.
22:07After running the DATA step when I run the print procedure to print
22:11the data set called calc, I see a single row in a single
22:14variable, the variable that we created called BMI.
22:18The no OBS option in PROC PRINT suppresses SAS
22:22is customary observation number.
22:24Another way of performing this calculation is to define a variable
22:28for weight and a variable for height and then use those
22:32variables in the formula for BMI.
22:35We can compare this to the calculation that we performed on line 2
22:39where we plug the raw values directly into the formula.
22:42As you would expect, we get the same result.
22:45Notice here that the data set Calc 2 contains all
22:48of the variables that we created in this second data step.
22:53Next we'll discuss using mathematical functions in SAS.
22:57Functions can be applied to a single variable or to a set of variables
23:01to help us transform or analyze data.
23:04There are many built in functions in SAS, and we'll be looking at some
23:07of the most frequently used mathematical and statistical functions.
23:12Functions are applied in a data step in the same way we carried
23:16out arithmetic operations.
23:18Each function takes one or more arguments.
23:21These arguments can be existing variables in a data set.
23:24In this case you want to use the values of those variables to
23:28evaluate the function. SAS will take the input values, the
23:31arguments for each subject, apply the function, and create the
23:35new variable in Framingham new.
23:37For example, we have BMI in our Framingham dataset.
23:41If I want to compute the natural log of BMI for all subjects
23:44in my dataset, I would create a variable in Framingham new
23:48that applies the log function, which is natural log in SAS, to
23:51the existing variable BMI.
23:54The name I chose for this new variable is log BMI.
23:57Notice that the log function applied to a missing value, as we see
24:01here in the last subject, returns a missing value, as you would
24:04expect. A function that can take multiple arguments
24:08is the max function.
24:10Recall that we have total cholesterol collected over three study
24:14periods. The largest non missing value of total cholesterol
24:17for each subject would be the max of existing variables,
24:21Total cholesterol 1, 2, and 3.
24:24The arguments of the function are separated by commas.
24:27Again, notice that if all of the input values of the function are
24:31missing, as we see in the last subject, and the maximum of
24:34a set of missing values is also a missing value.
24:38In this data step we are producing new variables as a result
24:42of the calculations.
24:43The new variables will be appended as columns to the dataset
24:47created in the DATA step.
24:49Log BMI and Max cholesterol are the final columns
24:53of Framingham new.
24:56Here is a list of some commonly used mathematical functions that can
24:59be applied to a number.
25:01Here X is a real number.
25:03The first function we'll discuss is the exponential function.
25:07The exponential function E to the X calculates the value
25:10of E raised to the power of your argument where E is the base
25:14of the natural logarithm 2.718.
25:17Let's.switch to SAS to apply the exponential function to
25:21a variable.
25:23Here I used a DATA step to create a variable X
25:27in a data set that I call Example 1.
25:29I have nine observations here but one observation is missing.
25:33I'm including the missing X value to show you how the functions deal
25:37with missing values. Normally in a data step I enter each
25:40observation on a separate line after the word data lines.
25:44SAS will run a complete iteration of the DATA step to construct
25:48one observation using one raw line of data.
25:51The one line of data can contain one or more variables.
25:55The data step then repeats this process again and again until there
25:58are no raw data lines left to read.
26:01But you see here that I list all nine values of X on the same line.
26:05I am able to do this because I include the double trailing at sign
26:09in the input statement.
26:10Basically this allows me to create several observations from
26:14one line of data.
26:15I could also break the data across more than one line and SAS will
26:19continue to read the next observation in the order the values are
26:22entered. Let's run the data step and check the log for errors.
26:30SAS tells us that we have nine observations and one variable in
26:33our data set Example 1.
26:36We can either print the data set or navigate to it in the Explorer
26:40tab to view it as a table.
26:42Since it's a small data set, let's print it to our results viewer
26:45window.
26:47I see that all nine of my observations have been read in and are
26:51contained in the data setmExample 1.
26:54Notice that we see SAS's automatic observation numbering
26:57because I did not use the no OBS option in PROC PRINT.
27:01Let's go back to our code.
27:03We want to transform X by exponentiating.
27:06I can do this by writing a second data step that reads in the data
27:10set Example 1, and creates a new variable which is the
27:14exponentiated x value, and this would be a fine way to do it.
27:17I'm preserving the original data set Example 1 and creating
27:21a copy called example 1A which will contain e
27:25to the X.
27:26When we print our new data set
27:28Example 1A, We see it contains both X and E
27:31to the X. Notice that exponentiating a missing value
27:35results in a missing value.
27:37SAS also tells us that we try to perform an operation on a missing
27:41value in the log.
27:43I want to show you that you can also create new variables in the
27:46original data step where I'm reading in the data.
27:49I can define any new variables that I also want this dataset
27:52to contain right after the input statement and before the data
27:56line statement. For example I know that this dataset will contain
28:00the variable x because I specified this variable in the
28:03input statement.
28:05I want this data set to also contain e to the X.
28:08So I define a new variable which I name EXPX,
28:12equal to the exponential function applied to the variable x.
28:16This saves us from having to run an additional data step.
28:21Let's print the data set to see if it contains our two variables.
28:25And it does.
28:27The function for natural log or log base e is simply
28:31log in SAS.
28:33You might be used to using LN to refer to natural log
28:37but know that in SAS log is natural log.
28:39The function for log base ten is log ten
28:43both log and log ten must have positive arguments, otherwise
28:47the function will return a missing value and you will see a note about
28:50this in the SAS log.
28:53Next we see absolute value and square root.
28:56Square root of a negative number will return a missing value.
28:59In order to round a number to a certain number of decimal places, we
29:03use the round function.
29:05Specify the argument and then specify to how many decimals
29:08we'd like to round the number.
29:10In this example we would like to round X to the hundredth place.
29:15Int basically cuts off the decimal from a number and returns
29:19the integer only. This function does not round.
29:22Ceiling returns the smallest integer that is greater than
29:26or equal to the argument.
29:28Floor returns the largest integer that is less than or
29:31equal to the argument.
29:33We'll see how these functions behave when we apply them to our x
29:36variable in SAS.
29:39I'm going to repeat the process of creating these new transformed
29:43X variables by applying the remainder of the functions that we just
29:46covered in the original data step where we entered the raw data.
29:50I'm naming this dataset Example 1.
29:52So after I run this data step SAS will overwrite
29:56Example 1 that currently exists in the work library.
30:00To keep our original example 1, we would need to assign
30:03a different name to this dataset.
30:06Let's run this code.
30:09Checking the log we do see several messages telling us
30:12when we are trying to perform a calculation that is not defined.
30:16Each time SAS is asked to perform a calculation using
30:20an invalid argument, it lets you know in the log.
30:23SAS also notifies you when it's asked to perform calculations
30:27using missing values and tells you that this action generated
30:30a missing value. However, these are not program halting errors
30:35and do not stop SAS from running the DATA step.
30:38As we see at the very bottom, Example One was successfully created
30:42and it contains nine observations and 10 variables.
30:46Again, keep an eye on your log after each run because it can help
30:49you identify problems with your data.
30:54Here we see some common mainly statistical functions
30:57that are applied to either a list of raw numbers entered as
31:01arguments or to more than one variable in an existing dataset.
31:05The first function listed here is the sum function.
31:09I'll point out the difference between the SUM function and simply
31:12using the addition arithmetic operation shortly.
31:15Next we have mean, median, min max, number
31:19of non missing values, number of missing values, standard
31:23deviation, and variance.
31:24The arguments are listed inside the parentheses separated by commas.
31:29Note that the result of these functions is based on the non missing
31:33values of the arguments.
31:35Even if a subject has a missing value for one of the arguments
31:38these functions will use the non missing values to perform the
31:42calculation.
31:43This is the difference between the basic arithmetic operations we
31:47performed previously and the sum function, for example.
31:50The SUM function will add all available data whereas if we
31:54use the plus sign to define the sum if any argument of
31:57the arithmetic operation is missing the sum itself will be missing.
32:02Moving to SAS, we will apply these functions to our three total
32:06cholesterol measurements taken over the three periods in this study.
32:10I'm going to read in the original Framingham dataset and apply each
32:14of these functions. The dataset Framingham new will
32:17contain the results of the calculations.
32:20It might be interesting to also create a few duplicate variables
32:24that check the calculations performed by the functions.
32:27For example let's compare a manual arithmetic calculation
32:31of the sum of total cholesterol 1, 2, and 3 to
32:35the sum calculated using the SUM function.
32:38The end function reports the number of non missing observations
32:42in the arguments for each, and Miss reports the number
32:46of missing observations.
32:48Since we have three total cholesterol variables the sum of
32:52N and N Miss should equal three.
32:55Let's also check if the mean cholesterol calculated using the
32:58mean function equals the sum of all observations divided
33:02by the number of observations used in the sum.
33:05That is the number of non missing values in total cholesterol 1,
33:092, and 3. Printing the first 10 observations in Framingham
33:12new, we see that individuals tend not to have complete
33:16cholesterol data in all three periods.
33:19But the arithmetic and statistical functions apply to these variables
33:22are not equal to missing.
33:24In contrast, the manual SUM check Sum is missing if
33:28any of the arguments in the manual some is missing.
33:30Notice that these functions analyze the data within each subject.
33:34We are not finding the means, say, over all subjects.
33:39You can again use SAS as a calculator to manually enter a list
33:42of values as arguments into the function.
33:45Here, I'm computing the mean of the values 1 2 3
33:494 5 and the standard deviation of those data values.
33:53You can also nest functions.
33:55For example suppose I want to compute the square root of the sum
33:59of 1 2 3 4 5.
34:01We can do that without creating an intermediate variable for
34:04the sum. But you could also do it that way.
34:07Again print the data set to view the results of the calculation.
34:11As we would expect, the mean of that sequence is 3 and
34:15the standard deviation of those five numbers is reported along
34:18with the square root of the sum.
34:22Lastly, we're going to talk about working with date variables.
34:25In SAS, date variables are stored as numbers
34:29and are displayed to us using different date formats.
34:33We see here that our 2 date variables in Framingham,
34:37intake and date of death, were imported as numeric variables.
34:41The numeric value of the date is the number of days between
34:45January 1st 1960 and the specified date,
34:49that is day 0 in SAS is January 1st
34:521960.
34:54Dates before January 1st 1960 are represented
34:57as negative numbers, and dates after as positive numbers.
35:01The variables date and SAS date are the same.
35:04The variable on the left has been formatted to display in a format
35:08we're used to seeing, while the variable on the right is an
35:11unformatted date showing you how SAS represents a date
35:15numerically.
35:17This numeric representation of dates allows us to easily
35:20find the number of days between two dates.
35:24You can use the arithmetic operation of subtraction to find the
35:27number of days between an end date and a start date.
35:31For example, to find the number of days between January 30th
35:351948 and April 18th 1964, we see
35:39that the end date is
35:421569 days after January 1st 1960, and
35:46the start date is 4354 days
35:50before January 1st 1960.
35:53To find the number of days in between we subtract the start date
35:56from the end date, which gives us 5923
36:00days. We can also convert those days to months
36:04or years to give a more meaningful timescale.
36:08In the Framingham data we have two dates, date of intake into
36:11the study and date of death when a death was observed during study
36:15follow up. To find the number of days between a participant's
36:19date of death and their intake date, we subtract the start
36:22date from the end date or intake from date of death.
36:27The new variable we're defining is called days to death, and we
36:30see that it is only defined for those individuals with non missing
36:34values for both intake and date of death.
36:37We can convert days to years by dividing the number of days
36:41by 365.25.
36:44With this many days a timescale of years is more appropriate.
36:48Finally, since we also have the variable called age that tells
36:52us the subject's age in years at intake, we can calculate
36:56an approximate age at death by adding years to death to
37:00the age variable.
37:01Note this is an approximate age of death because we do not have the
37:05individual's date of birth in these data.
37:08If we had date of birth we could calculate age of death by subtracting
37:12date of birth from date of death.
37:15If we want to define a new date variable in our data set to use in
37:18calculations, we can do so directly in a data step.
37:23For example, if I know that follow up for long term endpoints ended
37:26on December 31st 1972 in this study, I
37:30can define a new variable and follow up equals December
37:3431st 1972.
37:37I will show you two ways of defining the date variable.
37:40In the first method you enclose the date in quotes followed by
37:43the letter D. It is important to specify the day,
37:47followed by month, and then year.
37:49You must also use the three letter abbreviation for the month.
37:54The second method of defining a date variable uses the N
37:58D Y function in SAS.
38:00The arguments of the function are numerical month, followed by da,y
38:04and then year with each argument separated by a comma.
38:08Both methods produce identical results.
38:11Another useful function is the today function in SAS.
38:15This function will define a date variable using the current date.
38:19There are no arguments of this function.
38:21Because we're reading in the Framingham dataset you're using the SET
38:25statement, these new date variables will be defined for every
38:28subject in the dataset.
38:30We will see three new columns at the end of our dataset but the value
38:34of each date will be constant for all subjects.
38:37This is necessary if we want to use these newly defined variables
38:40to perform calculations involving existing date variables
38:45such as date of intake because SAS evaluates expressions
38:48involving variables in the data set for each observation.
38:53All of these methods for defining date variables will produce
38:56the date in the form of a SAS date value.
38:59This is fine if you just want to use the date to perform calculations.
39:03However, if you want to print the date variables it's helpful
39:06to apply a date format to the variables so that you can
39:10see the date presented in a way we are used to seeing.
39:14For example, the way intake is displayed here.
39:17Since the date variable intake was an existing variable
39:21in the imported Excel workbook, SAS recognized that
39:24this variable was a date and applied a date format to
39:28the variable when reading in the data.
39:30This is why it appears as a formatted date.
39:34We see this when we look at PROC CONTENTS.
39:36The format date 9 was applied to the variable intake.
39:40Formats are applied to variables in a data step using a format
39:44statement. After the word format specify the name of the
39:47variable that you would like to format followed by the name of the
39:50format. The format name must be followed by a period
39:54or decimal point. We closed the format statement with a
39:58semicolon.
39:59You can apply the same format to more than one variable.
40:03By including the format name after more than one variable name.
40:07You can also apply formats to multiple variables in your dataset
40:11using one format statement.
40:13SAS has many predefined date formats to choose from,
40:17even more than I have listed here.
40:19We are applying a different date format to each of these 4 date
40:23variables. Intake is formatted as date 9.
40:279 refers to the length of the date variable under this format.
40:31End follow up is formatted as MMDDyy.
40:35The thing to notice about this format is that it uses a two
40:38digit year and follow up version to use its
40:42date format MMDDyy10.
40:4610 again refers to the length of the resulting formatted date
40:49including the forward slashes between month day and year.
40:54Finally today date uses date format word date
40:58which displays the date as we would in text form.
41:01Formats do not change the value of the variable.
41:04These dates are still thought of as numbers by SAS.
41:08Formats only change the way values of the variable are displayed.
41:13This concludes video 2.1 In this video, we reviewed
41:17PROC import. We also covered how to perform calculations
41:20and construct new variables using arithmetic operations and
41:24using mathematical, arithmetic, and statistical functions.
41:28Finally we discussed working with dates in SAS.
41:32In video 2.2 We will discuss creating new variables
41:36using comparison and logical operators.