# Problem Set 2, Problem 4

*Econ 741, Spring 2011*

The last problem on Problem Set 2 is applied in nature and requires you to use your favorite statistical software to perform a simple Ordinary Least Squares regression with one independent variable (and a constant or intercept). My intention is also that you learn how to load and manipulate datasets in ASCII format. Given the plethora of different software and file formats, plain ASCII text is one of the most common formats in which datasets are stored. Comma-separated values (CSV) is another common format.

While you are free to use any software you like that is capable of answering the questions, I will provide some pointers for using Matlab and Stata. I would recommend using Stata by default, or Matlab if you prefer to “do it yourself” to get more intuition about what is happening inside the “black box.”

## Stata

To load the data into Stata, use the `infile` command in the following form:

```
infile var1 var2 var3 ... using filename.dat
```

where `var1`, `var2`, and so on denote the variable names in the order they are stored in the data file and `filename.dat` is the name of the file containing the raw data. You can find additional help by typing `help infile`. (And of course the `help` feature works for other commands as well.) To carry out the actual regression, use the `regress` command.

## Matlab

In Matlab, use the `load` command to read the raw data into a matrix. Then look at the data description file to determine which columns you need to use for the regression. For example:

```
% Matlab variable names cannot begin with a digit, so assign the matrix a name.
data = load('401k.raw');
prate = data(:,1);
```

One option is to use Paul Ruud’s OLS code for Matlab, `ols.m`, to run the regression. If you choose this option, it is instructive to read the code, which is quite short, to understand what is happening.

Alternatively, you can do it by hand, using the matrix formula for ${\hat{\beta}}_{\text{OLS}}$, which we will cover in class on Tuesday. We will also cover ${R}^{2}$ on Tuesday.
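
As a rough illustration of the by-hand approach, here is a minimal sketch. It assumes the standard textbook expressions ${\hat{\beta}}_{\text{OLS}} = (X'X)^{-1}X'y$ and ${R}^{2} = 1 - e'e / \sum_i (y_i - \bar{y})^2$, where $e = y - X{\hat{\beta}}_{\text{OLS}}$. The vectors `x` and `y` are toy numbers made up for illustration, not values from the problem set data, and the variable names are my own choices rather than anything required by the assignment.

```matlab
% Toy data, made up purely for illustration (NOT taken from 401k.raw).
x = [1; 2; 3; 4; 5];
y = [2; 4; 5; 4; 5];

n = length(y);
X = [ones(n,1) x];               % regressor matrix: constant plus one variable

% Solve the normal equations (X'X) b = X'y; backslash avoids forming inv(X'X).
beta_hat = (X'*X) \ (X'*y);

y_hat = X*beta_hat;              % fitted values
e     = y - y_hat;               % residuals

% R^2 = 1 - (sum of squared residuals) / (total sum of squares around the mean)
R2 = 1 - (e'*e) / sum((y - mean(y)).^2);

fprintf('intercept = %.4f, slope = %.4f, R^2 = %.4f\n', ...
        beta_hat(1), beta_hat(2), R2);
```

To apply this to the problem set, you would replace `x` and `y` with the appropriate columns of the matrix you loaded, as identified in the data description file. Comparing the output with Stata's `regress` or with `ols.m` is a useful sanity check.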