## Beginning Work in LaTeX

This week I finished breaking down the simplest form of the panel data model and typed it up in LaTeX. At first I wanted to write everything out using no symbols other than x’s and y’s, but the resulting matrices were too large to fit on a page. Instead, I defined each variable separately so that it is clear what goes into each matrix. The results can be seen in the attached PDF.

Capstone

## Fixed Effects

In one of my earlier posts, I discussed panel data modeling and my intention to use it for my capstone. A panel data model is rather general, however, so I want to consider specific variations of it. The first variation that I have decided to take a closer look at is the fixed effects model.

The fixed effects model operates under the assumption that unobserved, unit-specific variables may be correlated with the variables included in the regression. Consequently, studies conducted using this model can only describe the effects of the included independent variables on the dependent variable; their results cannot be extended to explain the effects of other variables on the dependent variable.

Recall the general form of a panel data model: $y_{it} = \alpha_i + \beta'x_{it} + \epsilon_{it}$. The fixed effects model assumes that the individual effects coefficients ($\alpha_i$) vary across cross-sectional units, while the slope coefficients $\beta$ are the same for every unit. Because each unit has its own intercept, estimating the regression requires a dummy variable for each cross-sectional unit. Stacking all observations, the resulting equation is $y = D\alpha + X\beta + \epsilon$, where $D$ is the matrix of dummy variables (one column per cross-sectional unit), $\alpha$ is the vector of individual effects, and $X$ is the matrix whose rows are the $x_{it}'$.
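The dummy-variable regression described above can be sketched numerically. The example below is only an illustration under stated assumptions: the panel is simulated (4 units, 6 periods, one regressor), the "true" values of $\alpha$ and $\beta$ are made up, and ordinary least squares is carried out with NumPy's `lstsq`. It is not the county data or the method used in the capstone itself.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical panel dimensions and parameters (made up for illustration).
n_units, n_periods = 4, 6
true_alpha = np.array([1.0, -2.0, 0.5, 3.0])  # one intercept per unit
true_beta = np.array([2.0])                   # slope shared by all units

# Unit index i for each stacked observation: 0,0,...,0,1,1,...,3
unit = np.repeat(np.arange(n_units), n_periods)

# Regressor that is correlated with the individual effects,
# which is the situation the fixed effects model is meant to handle.
x = rng.normal(size=(n_units * n_periods, 1)) + true_alpha[unit, None]
eps = 0.1 * rng.normal(size=n_units * n_periods)
y = true_alpha[unit] + x @ true_beta + eps

# Dummy matrix D: row (i, t) has a 1 in column i and 0 elsewhere.
D = np.eye(n_units)[unit]

# Regress y on [D, x]; the first n_units coefficients estimate alpha,
# the remaining coefficient estimates beta.
Z = np.hstack([D, x])
coef, *_ = np.linalg.lstsq(Z, y, rcond=None)
alpha_hat, beta_hat = coef[:n_units], coef[n_units:]

print("estimated alpha:", alpha_hat)
print("estimated beta: ", beta_hat)
```

Because no separate constant column is included, all of the unit dummies can be kept without making the columns of the regressor matrix linearly dependent.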

## Panel Data

For my capstone I intend to look at the mathematics behind regression analysis using panel data, or panel data modeling. A panel of data consists of two components: a cross-section and a time-series. For instance, the data I am using consists of individual observations for each of the 58 counties of California over a span of 8 years (2000-2007). This means that each variable in the regression has 464 ($58 \cdot 8$) observations. An advantage of this approach is the ability to account for variability over time as well as across the cross-section. Also, it allows for analysis of data with a limited number of observations over time (provided there are substantial cross-sectional observations) or a limited number of observations over the cross-section (provided there are sufficient time-series observations).
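As a quick check of the observation count, the layout of such a panel can be sketched as a list of (county, year) pairs. The integer county ids below are placeholders, not real county names or codes.

```python
from itertools import product

n_counties = 58                  # counties of California
years = range(2000, 2008)        # 2000 through 2007, i.e. 8 years

# One (county, year) pair per observation; county ids 0..57 are placeholders.
index = list(product(range(n_counties), years))

n_obs = len(index)
print(n_obs)          # 464
print(index[:3])      # [(0, 2000), (0, 2001), (0, 2002)]
```

Each variable in the regression then has one value per pair in `index`.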

The general form of a panel data model is $y_{it} = \alpha_i + \beta'x_{it} + \epsilon_{it}$. In the model, $i$ indexes the cross-sectional units, $t$ indexes the time periods, $y$ represents the dependent variable, $x$ represents the independent variables, $\alpha$ represents the individual effects coefficients, $\beta$ represents the vector of coefficients for the independent variables (it appears transposed, as $\beta'$, in the model), and $\epsilon$ represents the error terms. The specific variations of the model that I will be looking at will be discussed in a later blog post.
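To make the indices concrete, here is one way to write the general model out for a single cross-sectional unit $i$ over all $T$ time periods (for the county data described above, $T = 8$). This is a sketch of the notation only; it assumes the `amsmath` package for `bmatrix` and uses the fact that $\beta'x_{it} = x_{it}'\beta$ because the product is a scalar.

```latex
\[
\begin{bmatrix} y_{i1} \\ y_{i2} \\ \vdots \\ y_{iT} \end{bmatrix}
=
\alpha_i
\begin{bmatrix} 1 \\ 1 \\ \vdots \\ 1 \end{bmatrix}
+
\begin{bmatrix} x_{i1}' \\ x_{i2}' \\ \vdots \\ x_{iT}' \end{bmatrix}
\beta
+
\begin{bmatrix} \epsilon_{i1} \\ \epsilon_{i2} \\ \vdots \\ \epsilon_{iT} \end{bmatrix}
\]
```

Each row is the equation $y_{it} = \alpha_i + \beta'x_{it} + \epsilon_{it}$ for one value of $t$.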