This week I finished breaking down the simplest form of the panel data model and typed it up in LaTeX. At first I wanted to write it all out without using any variables other than x’s and y’s. However, doing so produced matrices that were too large to fit on a page. Instead, I defined each variable separately so that it is clear what goes into each one. The results can be seen in the attached PDF.

## Tag Archive: model

My last post was on the fixed effects model. I established that the fixed effects model assumes that variables not included in the regression are correlated with the variables included in the regression, and thus the results of the regression cannot be used to assess the effects of unobserved variables. The random effects model, on the other hand, assumes that unobserved variables are not correlated with observed variables, and allows the regression to be used to investigate the effects of variables not included in the regression.

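In my past posts on the panel data model and its specific variations, I explained that the general form of the panel data model is $y_{it} = \alpha_i + X_{it}\beta + \varepsilon_{it}$, and the general form of the fixed effects model is $y_{it} = D_{it}\alpha + X_{it}\beta + \varepsilon_{it}$. With the random effects model, the general form is $y_{it} = \alpha + X_{it}\beta + u_i + \varepsilon_{it}$. In this model, $\alpha$ is taken to be constant, and $u_i$ is a measurement of random disturbance for each cross-sectional unit.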

In choosing whether to use a fixed effects model or a random effects model, one must first test to see if individual effects exist. This is done using a Lagrange Multiplier (LM) test. If they do indeed exist, then a Hausman test can be used. The Hausman test is a hypothesis test of whether the coefficient estimates from the fixed effects model and the random effects model differ systematically. If the estimates do not differ significantly, then a random effects model may be used. If they do, the more restrictive fixed effects model must be used.
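To make the comparison concrete, here is a minimal sketch of the Hausman statistic for the simplest case of a single coefficient. The function name `hausman_scalar` and the input numbers are hypothetical, chosen only for illustration. With one coefficient the statistic is the squared difference of the two estimates divided by the difference of their variances, and it is compared against a chi-squared distribution with one degree of freedom.

```python
import math


def hausman_scalar(b_fe, b_re, var_fe, var_re):
    """Hausman statistic and p-value for one coefficient.

    b_fe, b_re     : fixed effects and random effects estimates
    var_fe, var_re : their estimated variances
    """
    diff = b_fe - b_re
    var_diff = var_fe - var_re
    if var_diff <= 0:
        # The textbook formula expects the fixed effects estimate
        # to have the larger variance.
        raise ValueError("var_fe must exceed var_re")
    h = diff * diff / var_diff
    # Upper tail of a chi-squared distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(h / 2.0))
    return h, p_value


# Hypothetical estimates, not taken from any real regression.
h, p = hausman_scalar(b_fe=2.0, b_re=1.5, var_fe=0.05, var_re=0.03)
print(f"H = {h:.2f}, p = {p:.4f}")
```

A small p-value suggests the two sets of estimates disagree, which points toward the fixed effects model. With several coefficients the same idea uses vectors and covariance matrices instead of scalars.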

In one of my earlier posts, I discussed panel data modeling and my intention to use it for my capstone. A panel data model is rather general, however, so I want to consider specific variations of it. The first variation that I have decided to take a closer look at is the fixed effects model.

The fixed effects model operates under the assumption that unobserved variables are correlated with variables included in the regression. Consequently, studies conducted using this model can only be used to describe the effects of the included independent variables on the dependent variable, and thus cannot extend their results to explain the effects of other variables on the dependent variable.

Recall the general form of a panel data model: $y_{it} = \alpha_i + X_{it}\beta + \varepsilon_{it}$. The fixed effects model assumes that the individual effects coefficients ($\alpha_i$) vary across each cross-sectional unit, while the coefficient $\beta$ is held constant. As a result of the variability in the individual effects coefficients, it becomes necessary to use dummy variables representing each cross-sectional unit in order to properly estimate the regression. The resulting equation is $y_{it} = D_{it}\alpha + X_{it}\beta + \varepsilon_{it}$, where $D_{it}$ is the set of dummy variables.
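As a rough sketch of what this estimation does, the code below simulates a small panel with one independent variable and estimates its coefficient. Everything here is an assumption made for illustration: the data are synthetic, and the function names are hypothetical. Instead of building the dummy variables explicitly, it subtracts each unit's mean from its observations (the "within" transformation), which gives the same slope estimate as the regression with dummies. For contrast it also computes the pooled slope that ignores the individual effects.

```python
import random


def simulate_panel(n_units=58, n_periods=8, beta=2.0, seed=0):
    """Synthetic panel where x is correlated with the unit effect."""
    rng = random.Random(seed)
    y, x, unit = [], [], []
    for i in range(n_units):
        alpha_i = rng.gauss(0.0, 5.0)  # individual effect of unit i
        for _ in range(n_periods):
            x_it = 0.5 * alpha_i + rng.gauss(0.0, 1.0)
            y_it = alpha_i + beta * x_it + rng.gauss(0.0, 0.1)
            y.append(y_it)
            x.append(x_it)
            unit.append(i)
    return y, x, unit


def pooled_slope(y, x):
    """Ordinary least squares slope that ignores the unit effects."""
    n = len(y)
    mx, my = sum(x) / n, sum(y) / n
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = sum((a - mx) ** 2 for a in x)
    return num / den


def within_slope(y, x, unit):
    """Slope after subtracting each unit's own mean from y and x."""
    totals = {}
    for yi, xi, u in zip(y, x, unit):
        t = totals.setdefault(u, [0.0, 0.0, 0])
        t[0] += yi
        t[1] += xi
        t[2] += 1
    num = den = 0.0
    for yi, xi, u in zip(y, x, unit):
        sy, sx, n = totals[u]
        yd, xd = yi - sy / n, xi - sx / n
        num += xd * yd
        den += xd * xd
    return num / den


y, x, unit = simulate_panel()
beta_pooled = pooled_slope(y, x)
beta_within = within_slope(y, x, unit)
print(f"pooled: {beta_pooled:.3f}, within: {beta_within:.3f}")
```

In this simulated setting the within estimate should land close to the true coefficient of 2.0, while the pooled estimate is pulled away from it because the individual effects are correlated with the independent variable.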

For my capstone I intend to look at the mathematics behind regression analysis using panel data, or panel data modeling. A panel of data consists of two components: a cross-section and a time-series. For instance, the data I am using consists of individual observations for each of the 58 counties of California over a span of 8 years (2000-2007). This means that each variable in the regression has 464 ($58 \times 8$) observations. An advantage of this approach is the ability to account for variability over time as well as across the cross-section. Also, it allows for analysis of data with a limited number of observations over time (provided there are substantial cross-sectional observations) or a limited number of observations over the cross-section (provided there are sufficient time-series observations).

The general form of a panel data model is $y_{it} = \alpha_i + X_{it}\beta + \varepsilon_{it}$. In the model, $i$ represents the cross-sectional units, $t$ represents the time-series units, $y_{it}$ represents the dependent variable, $X_{it}$ represents the independent variables, $\alpha_i$ represents the individual effects coefficients, $\beta$ represents the set of coefficients for the independent variables, and $\varepsilon_{it}$ represents the error terms. This is just the general form of the panel data model. The specific variations of the model that I will be looking at will be discussed in a later blog post.
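To show what the model looks like once the observations are written out, here is a sketch of the equations for a single cross-sectional unit $i$ stacked over its $T$ time periods. The notation is an assumption, following the common textbook convention: $K$ is the number of independent variables, $x_{it,k}$ is the value of the $k$-th independent variable for unit $i$ at time $t$, $\alpha_i$ is the individual effect, and $\varepsilon_{it}$ is the error term.

$$
\begin{bmatrix} y_{i1} \\ y_{i2} \\ \vdots \\ y_{iT} \end{bmatrix}
= \alpha_i \begin{bmatrix} 1 \\ 1 \\ \vdots \\ 1 \end{bmatrix}
+ \begin{bmatrix}
x_{i1,1} & \cdots & x_{i1,K} \\
x_{i2,1} & \cdots & x_{i2,K} \\
\vdots & \ddots & \vdots \\
x_{iT,1} & \cdots & x_{iT,K}
\end{bmatrix}
\begin{bmatrix} \beta_1 \\ \vdots \\ \beta_K \end{bmatrix}
+ \begin{bmatrix} \varepsilon_{i1} \\ \varepsilon_{i2} \\ \vdots \\ \varepsilon_{iT} \end{bmatrix}
$$

For the California data, $T = 8$ and there are 58 such blocks, one per county, which together give the 464 rows.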

In the TED Talk *Sean Gourley on the Mathematics of War*, Sean Gourley discusses using mathematics to track and interpret war.

Sean Gourley, a physicist from New Zealand, began his project by assembling a team of scientists, economists, and mathematicians. They used various media sources to obtain information on the war in Iraq, then used a computer to filter all of it and pull out the pieces in which they were interested. Using this data, they produced and graphed the distribution of attack sizes in Iraq. The vertical axis was the frequency of attacks, and the horizontal axis was the number of deaths. For instance, the ordered pair (1, 47) would mean there were 47 attacks with 1 death each.

They then applied the same technique to other wars, and surprisingly, the same distribution emerged. As they expanded their study further and further, every war produced a similar distribution. Furthermore, each war had a slope that was within 0.75 of the mean (which was -2.5).

Using this data, the team produced the equation $P(x) = Cx^{-\alpha}$, where $P(x)$ is the probability of an attack, $x$ is the number killed, $C$ is a constant, and $\alpha$ gives the slope of the line (on a log-log plot the slope is $-\alpha$, so a slope of -2.5 corresponds to $\alpha = 2.5$). The group theorized that this pattern is a result of necessity when a group is fighting against a much stronger force. In order for the resistance to exist, it has to follow the discovered pattern.
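The link between the equation and the slope of the line can be sketched with a short example. Taking logarithms of $P(x) = Cx^{-\alpha}$ gives $\log P = \log C - \alpha \log x$, which is a straight line in log-log space. The code below fits that line by least squares to synthetic points generated from the formula itself. The data and the function name `fit_power_law` are assumptions for illustration only, and this is not necessarily the fitting method Gourley's team used.

```python
import math


def fit_power_law(sizes, probs):
    """Fit log P = log C - alpha * log x by least squares.

    Returns the pair (C, alpha).
    """
    log_x = [math.log(s) for s in sizes]
    log_p = [math.log(p) for p in probs]
    n = len(log_x)
    mean_x = sum(log_x) / n
    mean_p = sum(log_p) / n
    num = sum((a - mean_x) * (b - mean_p) for a, b in zip(log_x, log_p))
    den = sum((a - mean_x) ** 2 for a in log_x)
    slope = num / den
    intercept = mean_p - slope * mean_x
    return math.exp(intercept), -slope


# Synthetic points generated from the formula, not real conflict data.
sizes = [1, 2, 5, 10, 20, 50, 100]
probs = [0.6 * s ** -2.5 for s in sizes]

C, alpha = fit_power_law(sizes, probs)
print(f"C = {C:.3f}, alpha = {alpha:.3f}")
```

Because the points lie exactly on the curve, the fit should recover the constant and the exponent used to generate them. Real data would scatter around the line, and the fitted slope would only approximate it.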

Gourley concludes that we may be able to use this model to interpret the progress of a war, and in theory try to push it in the right direction, whatever that may be.