Problem Set 2, Problem 4

Econ 741, Spring 2011

The last problem on Problem Set 2 is applied in nature and requires you to use your favorite statistical software to run a simple Ordinary Least Squares regression with one independent variable and a constant (intercept). I also intend for you to learn how to load and manipulate datasets in ASCII format. Given the plethora of software packages and file formats, plain ASCII text is one of the most common formats in which datasets are stored. Comma-separated values (CSV) is another common format.

While you are free to use any software you like that is capable of answering the questions, I will provide some pointers for using Matlab and Stata. I would recommend using Stata by default, or Matlab if you prefer to “do it yourself” to get more intuition about what is happening inside the “black box.”

Stata

To load the data into Stata, use the infile command in the following form:

infile var1 var2 var3 ... using filename.dat

where var1, var2, and so on are the variable names in the order they are stored in the data file and filename.dat is the name of the file containing the raw data. You can find additional help by typing help infile; the help command works the same way for other commands. To carry out the regression itself, use the regress command, listing the dependent variable first and the independent variable after it. A constant is included by default.

Matlab

In Matlab, use the load command to read the raw data into a matrix. Then look at the data description file to determine which columns you need to use for the regression. For example:

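% Matlab variable names cannot begin with a digit, so store the matrix under a valid name.
data = load('401k.raw');
prate = data(:,1);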

One option is to use Paul Ruud’s OLS code for Matlab to run the regression: ols.m. If you choose this option, it is instructive to read the code, which is quite short, to understand what is happening.

Alternatively, you can do it by hand, using the matrix form for the OLS estimator β̂, which we will cover in class on Tuesday. We will also cover R² on Tuesday.
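
As a rough sketch of the by-hand route, the Matlab snippet below computes the OLS coefficients from the standard matrix formula β̂ = (X'X)⁻¹X'y and then R² as one minus the ratio of the residual sum of squares to the total sum of squares. The data are made up purely for illustration, and the names x, y, X, b, e, and R2 are assumptions chosen for this example rather than anything defined in the problem set. For the assignment you would replace x and y with the appropriate columns of the data matrix you loaded.

```matlab
% Toy data, made up for illustration only (not taken from 401k.raw).
x = [1; 2; 3; 4; 5];               % independent variable
y = [2.1; 3.9; 6.2; 7.8; 10.1];    % dependent variable

n = length(y);
X = [ones(n,1) x];                 % column of ones provides the intercept

% OLS coefficients: solve the normal equations (X'X) b = X'y.
% The backslash operator avoids forming inv(X'*X) explicitly.
b = (X'*X) \ (X'*y);               % b(1) = intercept, b(2) = slope

e   = y - X*b;                     % residuals
rss = e'*e;                        % residual sum of squares
tss = (y - mean(y))'*(y - mean(y));% total sum of squares
R2  = 1 - rss/tss;                 % R-squared (valid with a constant in X)

disp(b');
disp(R2);
```

Comparing b and R2 against the output of regress in Stata, or of ols.m, on the same data is a useful check that the formula has been coded correctly.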