Thursday, December 2, 2010

Logistic regression analysis - Introduction to the logit and the odds ratio

Researchers are often interested in modeling the structure of the relationship between predictors (i.e., independent variables) and a response (i.e., a dependent variable). Linear regression is often used when the response variable is continuous. One assumption of linear models is that the residuals follow a normal distribution. This assumption fails when the response variable is categorical, so a normal linear model is not appropriate. This article presents a regression model for a response variable that is dichotomous, that is, one with two categories. Examples: whether a plant lives or dies, whether a respondent agrees or disagrees with a statement, or whether an at-risk child graduates or drops out of school.

In linear regression, the response variable (Y) is a linear function of the coefficients (B0, B1, etc.) that correspond to the predictor variables (X1, X2, etc.). A typical model looks as follows:

Y = B0 + B1 * X1 + B2 * X2 + B3 * X3 + ... + E

For a dichotomous response variable, we could use a similar linear model to predict an individual's category membership, provided numerical values are used to represent the two categories. The values 1 and 0 are arbitrary and are chosen for mathematical convenience. Using the first example, we assign Y = 1 if a plant lives and Y = 0 if the plant dies.

This linear model does not work well, for a few reasons. First, because the response values 0 and 1 are arbitrary, modeling the actual values of Y is not very interesting. Second, what we are really interested in modeling is the probability that each individual in the population responds with 0 or 1. For example, we may find that plants with a high level of fungal infection (X1) fall into the category "the plant lives" (Y = 1) less often than plants with a low level of infection. Thus, as the level of infection rises, the probability that a plant lives decreases.
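To make the problem concrete, here is a minimal sketch that fits a straight line directly to 0/1 responses. The data values and the helper names `fit_line` and `predict` are made up for illustration; they do not come from any real study.

```python
# Hypothetical data: infection level (X1) and survival (Y = 1 lives, Y = 0 dies).
infection = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
survived = [1, 1, 1, 1, 0, 1, 0, 0, 0, 0]


def fit_line(x, y):
    """Ordinary least squares with one predictor; returns (b0, b1)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    return b0, b1


b0, b1 = fit_line(infection, survived)


def predict(x):
    """Predicted Y from the fitted straight line."""
    return b0 + b1 * x


for x in (0, 9):
    print(f"X1 = {x}: predicted Y = {predict(x):.3f}")
```

With these made-up numbers the line predicts a value above 1 at the lowest infection level and a value below 0 at the highest, neither of which can be read as a probability.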

Thus, we might consider modeling P, the probability, as the response variable. Again, there are problems. Although a general decrease in probability accompanies a general increase in infection level, we know that P, like all probabilities, can only fall within the boundaries of 0 and 1. Consequently, it is better to assume that the relationship between X1 and P is sigmoidal (S-shaped), rather than a straight line.
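The S-shape can be seen by tabulating one common sigmoidal curve, the logistic function. This is only a sketch: the function name `survival_probability` and the coefficient values b0 = 4 and b1 = -1 are assumptions picked for illustration, not estimates from data.

```python
import math


def survival_probability(x1, b0=4.0, b1=-1.0):
    """Logistic curve: P = 1 / (1 + exp(-(b0 + b1 * x1))).

    b0 and b1 are hypothetical values chosen only to show the shape.
    """
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * x1)))


levels = list(range(0, 11))
probs = [survival_probability(x) for x in levels]

for x, p in zip(levels, probs):
    print(f"X1 = {x:2d}  P = {p:.3f}")
```

The probabilities flatten out near 1 at low infection levels and near 0 at high levels, and change fastest in between, so they never leave the 0 to 1 range.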

It is possible, however, to find a linear relationship between X1 and a function of P. Although a number of functions work, one of the most useful is the logit function. It is the natural log of the odds that Y equals 1, where the odds are simply the probability that Y is 1 divided by the probability that Y is 0. The relationship between the logit of P and P itself is sigmoidal in shape. The resulting regression equation is:

ln [P / (1-P)] = B0 + B1 * X1 + B2 * X2 + ...
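As a rough check on the equation above, the sketch below computes the logit and its inverse and shows that the logit of P moves in a straight line with X1 even though P itself does not. The coefficients b0 = 4 and b1 = -1 and the function names are hypothetical.

```python
import math


def logit(p):
    """Natural log of the odds p / (1 - p), for 0 < p < 1."""
    return math.log(p / (1.0 - p))


def inverse_logit(z):
    """Map a value on the logit scale back to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical coefficients, chosen only for illustration.
b0, b1 = 4.0, -1.0

for x1 in range(0, 9, 2):
    p = inverse_logit(b0 + b1 * x1)
    print(f"X1 = {x1}  P = {p:.3f}  logit(P) = {logit(p):.3f}")
```

Each two-unit step in X1 changes logit(P) by the same amount (2 * b1), while the change in P differs from step to step.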

Although the left side of this equation looks intimidating, expressing the probability this way makes the right side of the equation linear and familiar-looking. This helps us understand the meaning of the regression coefficients. The coefficients can easily be transformed so that their interpretation makes sense.
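The usual transformation is to exponentiate a coefficient, which gives an odds ratio: the factor by which the odds that Y = 1 are multiplied for each one-unit increase in the predictor. A small sketch, again with made-up coefficients b0 = 4 and b1 = -1 and hypothetical function names:

```python
import math

# Hypothetical coefficients, chosen only for illustration.
b0, b1 = 4.0, -1.0


def probability(x1):
    """P(Y = 1) implied by the logistic regression equation."""
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * x1)))


def odds(x1):
    """Odds that Y = 1, i.e. P / (1 - P)."""
    p = probability(x1)
    return p / (1.0 - p)


# The odds ratio for a one-unit increase in X1.
odds_ratio = math.exp(b1)

print(f"exp(b1)           = {odds_ratio:.4f}")
print(f"odds(3) / odds(2) = {odds(3) / odds(2):.4f}")
print(f"odds(7) / odds(6) = {odds(7) / odds(6):.4f}")
```

The ratio is the same wherever it is measured along X1. Here it is below 1, meaning that under these assumed coefficients each additional unit of infection lowers the odds of survival.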

The logistic regression equation can be extended beyond the case of a dichotomous response variable to responses with ordered categories and to polytomous responses (more than two categories).
