Problem Set 2, Problem 4
Econ 741, Spring 2011
The last problem on Problem Set 2 is applied in nature and requires you to use your favorite statistical software to perform a simple Ordinary Least Squares regression with one independent variable (and a constant or intercept). My intention is also that you learn how to load and manipulate datasets in ASCII format. Given the plethora of different software and file formats, plain ASCII is one of the most common formats in which datasets are stored. Comma-separated values (CSV) is another common format.
While you are free to use any software you like that is capable of answering the questions, I will provide some pointers for using Matlab and Stata. I would recommend using Stata by default, or Matlab if you prefer to “do it yourself” to get more intuition about what is happening inside the “black box.”
Stata
To load the data into Stata, use the infile command in the following form:
infile var1 var2 var3 ... using filename.dat
where var1, var2, and so on denote the variable names in the order they are stored in the data file and filename.dat is the name of the file containing the raw data. You can find additional help by typing help infile. (And of course the help feature works for other commands as well.) To carry out the actual regression, use the regress command.
Matlab
In Matlab, use the load command to read the raw data into a matrix. Then look at the data description file to determine which columns you need to use for the regression. For example:
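% Matlab variable names cannot begin with a digit, so assign the loaded matrix to a name
data = load('401k.raw');   % read the ASCII file into a numeric matrix
prate = data(:,1);         % first column of the data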
One option is to use Paul Ruud’s OLS code for Matlab to run the regression: ols.m. If you choose this option, it is instructive to read the code, which is quite short, to understand what is happening.
Alternatively, you can do it by hand, using the matrix form of the OLS estimator, which we will cover in class on Tuesday. We will also cover on Tuesday.
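As a rough illustration of the by-hand route, the sketch below computes the OLS coefficients and conventional (homoskedastic) standard errors with matrix operations. It assumes the raw file loads as a numeric matrix. Using column 1 as the dependent variable and column 2 as the regressor is purely hypothetical, so check the data description file for the columns you actually need. This is not Ruud's ols.m, only a minimal sketch under those assumptions.

```matlab
% Minimal OLS-by-hand sketch. The column choices below are assumptions.
data = load('401k.raw');   % read the ASCII data into a numeric matrix

y = data(:,1);             % assumed dependent variable (column 1)
x = data(:,2);             % hypothetical regressor column; check the data description

n = length(y);             % number of observations
X = [ones(n,1) x];         % regressor matrix: constant plus one variable
k = size(X,2);             % number of estimated coefficients

b  = (X'*X) \ (X'*y);      % OLS coefficients, solving the normal equations
e  = y - X*b;              % residuals
s2 = (e'*e) / (n - k);     % estimate of the error variance
V  = s2 * inv(X'*X);       % coefficient covariance matrix under homoskedasticity
se = sqrt(diag(V));        % standard errors

disp([b se])               % column 1: coefficients, column 2: standard errors
```

The backslash operator solves the normal equations directly, which is generally preferable numerically to forming inv(X'*X)*X'*y. The explicit inverse appears only where the covariance matrix itself is needed. Comparing the output with what regress reports in Stata is a quick way to check the result.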