This week I finished breaking down the simplest form of the panel data model and typed it up in LaTeX. At first I wanted to write it all out without using any variables other than x’s and y’s. However, doing so made matrices that were too large to fit on a page. Thus I wrote out each variable so that it could be understood what was going into them. The results can be seen in the attached PDF.

## Category: Capstone

Last weekend I participated in the Mathematical Contest in Modeling (MCM). This event took place over a 96-hour period starting on Thursday night at 5 PM and ending Monday night at 5 PM. Having dedicated my entire weekend (and then some) to this endeavor, I was forced to spend the rest of the week playing catch-up on my homework. Consequently, very little progress was made on my capstone project.

Over the next week I hope to finish constructing the basic forms of the panel data model at their simplest levels. I then will type them using LaTeX so they are easier to read and understand.

Over winter break and J-Term I thought of my capstone project seldom and worked on it even less. However, towards the end of J-Term I did make a trip to PLU’s library. I checked out three books on data analysis and panel data modeling, and began breaking the panel data model down to its simplest form. Ironically, its “simplest” form was too big to fit on a single piece of paper, and involved several matrices filled with summations.

One interesting thing that I learned was that the basic form of the panel data model actually has four variations, each based on its own set of assumptions. Needless to say I have a lot of work ahead of me.

I also met with Professor Munson last Wednesday to discuss the error term of the basic form(s) of the panel data model. We agreed we needed to do some more research and meet again at a later time.

My last post was on the fixed effects model. I established that the fixed effects model assumes that variables not included in the regression are correlated with the variables included in the regression, and thus the results of the regression cannot be used to assess the effects of unobserved variables. The random effects model, on the other hand, assumes that unobserved variables are not correlated with observed variables, and allows the regression to be used to investigate the effects of variables not included in the regression.

In my past posts on the panel data model and its specific variations, I explained that the general form of the panel data model is $y_{it} = \alpha_i + \beta' x_{it} + \varepsilon_{it}$, and the general form of the fixed effects model is $y = D\alpha + X\beta + \varepsilon$. With the random effects model, the general form is $y_{it} = \alpha + \beta' x_{it} + u_i + \varepsilon_{it}$. In this model, $\alpha$ is taken to be constant, and $u_i$ is a measurement of random disturbance for each cross-sectional unit.

In choosing whether to use a fixed effects model or a random effects model, one must first test to see if individual effects exist. This is done using a Lagrange Multiplier (LM) test. If they do indeed exist, then a Hausman test can be used. The Hausman test is a hypothesis test of whether the coefficient estimates from the fixed effects model and the random effects model differ systematically. If they do not differ significantly, then a random effects model may be used. If they do, the more restrictive fixed effects model must be used.
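To make the Hausman decision rule concrete, here is a minimal sketch for the case of a single coefficient. In that case the statistic is the squared difference between the two estimates divided by the difference of their variances, and under the null hypothesis it is approximately chi-square with one degree of freedom. The function name and all of the numbers below are made up for illustration; they are not results from my data.

```python
def hausman_statistic(b_fe, var_fe, b_re, var_re):
    """Hausman statistic for a single coefficient.

    b_fe, var_fe: fixed effects estimate and its variance.
    b_re, var_re: random effects estimate and its variance.
    """
    var_diff = var_fe - var_re
    if var_diff <= 0:
        raise ValueError("var_fe must exceed var_re for the statistic to be defined")
    return (b_fe - b_re) ** 2 / var_diff


# 5% critical value of a chi-square distribution with 1 degree of freedom.
CHI2_CRIT_5PCT_DF1 = 3.841

# Hypothetical estimates, chosen only to illustrate the arithmetic.
h = hausman_statistic(b_fe=0.50, var_fe=0.010, b_re=0.44, var_re=0.006)

# Failing to reject the null means random effects may be used.
use_random_effects = h < CHI2_CRIT_5PCT_DF1
print(h, use_random_effects)
```

With more than one coefficient the same idea applies, but the differences and variances become a vector and a matrix, and the degrees of freedom equal the number of coefficients compared.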

In one of my earlier posts, I discussed panel data modeling and my intentions to use it as my capstone. A panel data model is rather general, however, and thus I want to consider specific variations of it. The first variation that I have decided to take a closer look at is the fixed effects model.

The fixed effects model operates under the assumption that unobserved variables are correlated with variables included in the regression. Consequently, studies conducted using this model can only be used to describe the effects of the included independent variables on the dependent variable, and thus cannot extend their results to explain the effects of other variables on the dependent variable.

Recall the general form of a panel data model: $y_{it} = \alpha_i + \beta' x_{it} + \varepsilon_{it}$. The fixed effects model assumes that the individual effects coefficients ($\alpha_i$) vary across each cross-sectional unit, while the coefficient $\beta$ is held constant. As a result of the variability in the individual effects coefficients, it becomes necessary to use dummy variables representing each cross-sectional unit in order to properly estimate the regression. The resulting equation is $y = D\alpha + X\beta + \varepsilon$, where $D$ is the set of dummy variables.
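As a rough illustration of the idea, the sketch below estimates a fixed effects model with one independent variable. Rather than building the dummy variables explicitly, it subtracts each unit's own mean from its observations (the "within" transformation), which gives the same slope estimate as the dummy-variable regression. The function name and the data are made up, and the data are noise-free so the true values are recovered exactly.

```python
def within_estimator(panel):
    """Estimate the common slope and the unit-specific intercepts.

    `panel` maps each cross-sectional unit to a list of (x, y) pairs
    observed over time. Only one regressor is handled in this sketch.
    """
    num = 0.0
    den = 0.0
    means = {}
    for unit, obs in panel.items():
        x_bar = sum(x for x, _ in obs) / len(obs)
        y_bar = sum(y for _, y in obs) / len(obs)
        means[unit] = (x_bar, y_bar)
        # Demeaning within the unit removes that unit's intercept.
        num += sum((x - x_bar) * (y - y_bar) for x, y in obs)
        den += sum((x - x_bar) ** 2 for x, _ in obs)
    beta = num / den
    # Each intercept is recovered from the unit means and the common slope.
    alphas = {u: y_bar - beta * x_bar for u, (x_bar, y_bar) in means.items()}
    return beta, alphas


# Made-up data: y = alpha_i + 2 * x, with a different intercept per unit.
true_alpha = {"A": 1.0, "B": 5.0, "C": -2.0}
xs = {"A": [1.0, 2.0, 4.0], "B": [0.0, 3.0, 6.0], "C": [2.0, 5.0, 7.0]}
panel = {u: [(x, true_alpha[u] + 2.0 * x) for x in xs[u]] for u in xs}

beta_hat, alpha_hat = within_estimator(panel)
print(round(beta_hat, 6), {u: round(a, 6) for u, a in alpha_hat.items()})
```

A pooled regression that ignored the unit-specific intercepts would mix the differences between units into the slope, which is why the intercepts have to be accounted for one way or another.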

For my capstone I intend to look at the mathematics behind regression analysis using panel data, or panel data modeling. A panel of data consists of two components: a cross-section and a time-series. For instance, the data I am using consists of individual observations for each of the 58 counties of California over a span of 8 years (2000-2007). This means that each variable in the regression has 464 ($58 \times 8$) observations. An advantage of this approach is the ability to account for variability over time as well as across the cross-section. Also, it allows for analysis of data with a limited number of observations over time (provided there are substantial cross-sectional observations) or a limited number of observations over the cross-section (provided there are sufficient time-series observations).
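A quick way to see the shape of such a panel is to enumerate every county-year pair. This is only a sketch: the county labels are placeholders, not the names in my actual data set.

```python
from itertools import product

# Placeholder labels standing in for California's 58 counties.
counties = [f"county_{i:02d}" for i in range(1, 59)]
years = range(2000, 2008)  # 2000 through 2007, i.e. 8 years

# Each (county, year) pair is one observation in the panel.
panel_index = list(product(counties, years))

n_units = len(counties)
n_periods = len(years)
n_obs = len(panel_index)
print(n_units, n_periods, n_obs)
```

Every variable in the regression then has one value per entry of `panel_index`.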

The general form of a panel data model is $y_{it} = \alpha_i + \beta' x_{it} + \varepsilon_{it}$. In the model, $i$ represents the cross-sectional units, $t$ represents the time-series units, $y_{it}$ represents the dependent variable, $x_{it}$ represents the independent variables, $\alpha_i$ represents the individual effects coefficients, $\beta$ represents the set of coefficients for the independent variables, and $\varepsilon_{it}$ represents the error terms. This is just the general form of the panel data model. The specific variations of the model that I will be looking at will be discussed in a later blog post.

As everyone (in MATH 499A) knows, last week we were instructed on how to search for articles relating to math, and eventually our specific capstone topic. While helpful, my time in the library proved to be more frustrating than fruitful. That, however, was mostly my fault. Let me try to explain.

I am also an Economics major. The ECON department has a 4-credit Capstone course that takes place in one semester. This makes for a bit of an accelerated pace relative to the MATH Capstone. Therefore, I have already chosen a topic, which is: Efficacy of the influenza vaccination against flu-related death in adults in the United States using time-series data. For the purpose of the MATH Capstone, with guidance from Prof. Munson and the use of some (hopefully advanced) statistics, I will try to determine how effective the flu vaccine is against flu-related death. Already knowing my topic is what caused my frustration in the library.

While in the library, my search terms were far too specific. Already having my topic narrowed down has made searching for articles difficult. While there are copious articles out there similar to what I am trying to do, there is nothing exactly like what I want to do (I suppose this is good, in a way, because it means my work will be somewhat original). Given the specificity of my topic, I had to learn to broaden my search horizons. For instance, instead of searching specifically for the effectiveness of the influenza vaccination, I simply searched for vaccination. From there I added a search term, like efficacy or effectiveness. In doing so, I have been able to find numerous articles that I am interested in. One, for example, is titled *Estimation of Vaccine Efficacy and the Vaccination Threshold*. This article discusses how to measure vaccine efficacy, and points out that the number of people who would have to be vaccinated to avoid an epidemic varies with vaccine efficacy and virus reproduction.

Unfortunately, PLU does not have direct access to many of the articles I have found. Thank goodness for Interlibrary Loan.