For my capstone I intend to look at the mathematics behind regression analysis using panel data, also known as panel data modeling. A panel data set has two dimensions: a cross-section and a time series. For instance, the data I am using consists of one observation per year for each of the 58 counties of California over a span of 8 years (2000-2007). This means that each variable in the regression has $58 \cdot 8 = 464$ observations. An advantage of this approach is the ability to account for variability over time as well as across the cross-section. It also allows for analysis of data with a limited number of observations over time (provided there are enough cross-sectional units) or a limited number of cross-sectional units (provided there are enough time periods).
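
To make the panel structure concrete, here is a minimal sketch of how the county-by-year index could be laid out. The county labels are placeholders I made up for illustration (they are not the actual county names or the real data); only the counts, 58 counties and the years 2000-2007, come from the description above.

```python
from itertools import product

# Placeholder labels (assumption): 58 generic county IDs, not real county names.
counties = [f"county_{i:02d}" for i in range(1, 59)]
# The 8 years covered by the data set, 2000 through 2007.
years = list(range(2000, 2008))

# Each (county, year) pair is one observation in the panel.
panel_index = list(product(counties, years))

n_units = len(counties)      # cross-sectional dimension
n_periods = len(years)       # time-series dimension
n_obs = len(panel_index)     # total observations per variable

print(n_units, n_periods, n_obs)  # prints: 58 8 464
```

Every variable in the regression would then have one value for each entry of `panel_index`.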

The general form of a panel data model is $y_{it} = \alpha_i + \beta'x_{it} + \epsilon_{it}$. In the model, $i$ indexes the cross-sectional units, $t$ indexes the time periods, $y_{it}$ is the dependent variable, $x_{it}$ is the vector of independent variables, $\alpha_i$ is the individual effect for unit $i$, $\beta'$ is the (transposed) vector of coefficients on the independent variables, and $\epsilon_{it}$ is the error term. The specific variations of the model that I will be looking at will be discussed in a later blog post.
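
As a rough illustration of how the pieces of the general form fit together, the sketch below simulates data from $y_{it} = \alpha_i + \beta x_{it} + \epsilon_{it}$ with a single independent variable and then recovers $\beta$ by subtracting each unit's time average from $y$ and $x$. This is one standard approach and not necessarily one of the variations I will end up using. All of the numbers (the slope `true_beta`, the noise levels, the random seed) are assumptions chosen only for the example, not values from my data.

```python
import random

random.seed(0)

N, T = 58, 8       # cross-sectional units and time periods, as in the text
true_beta = 2.0    # assumed slope, chosen only for illustration

# Simulate y_it = alpha_i + beta * x_it + eps_it with one regressor.
# alpha_i is drawn once per unit; x_it is allowed to depend on alpha_i.
alpha = [random.gauss(0, 5) for _ in range(N)]
x = [[0.5 * alpha[i] + random.gauss(0, 1) for _ in range(T)] for i in range(N)]
y = [[alpha[i] + true_beta * x[i][t] + random.gauss(0, 0.1) for t in range(T)]
     for i in range(N)]


def demean(rows):
    """Subtract each unit's time average from its own observations."""
    return [[v - sum(row) / len(row) for v in row] for row in rows]


# Because alpha_i is constant over t, it cancels out of the demeaned equation.
x_dm = demean(x)
y_dm = demean(y)

# Least-squares slope on the demeaned data (no intercept needed).
num = sum(xd * yd for xr, yr in zip(x_dm, y_dm) for xd, yd in zip(xr, yr))
den = sum(xd * xd for xr in x_dm for xd in xr)
beta_hat = num / den

print(f"true beta = {true_beta}, estimated beta = {beta_hat:.3f}")
```

The point of the sketch is only to show the roles of $i$, $t$, $\alpha_i$, and $\beta$ in the equation; with several independent variables, $x_{it}$ and $\beta$ become vectors and the same idea carries over using matrix algebra.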