We use multiple regression when we study the possible relationship between several independent variables and one dependent variable.
For example, we could study human intelligence, taking IQ as the response variable, and we might believe IQ to be related to other variables, such as body size, brain volume, and sex, which we could add to the study as independent variables. A multiple regression model could produce an equation such as:
IQ = 80 + 0.02 Size + 0.15 Brain volume – 0.8 Sex,
where the sex variable is a dichotomous or indicator variable, coded as 0 for women and 1 for men. We must be very cautious when interpreting such a model: multiple regression models tell us about the presence of relationships, but not about their causal mechanism.
Another source of interpretation problems is the relationship between independent variables, known as collinearity. For example, sex may appear to influence intelligence in the equation, but consider that women are usually smaller than men. If we look at the signs, we can see that the effect of one variable compensates for the effect of the other.
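A model of this form can be fitted by ordinary least squares. The sketch below uses NumPy with synthetic, invented data (all variable names and numbers are illustrative, not real measurements):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200

# Synthetic illustrative data: every number here is made up
size = rng.normal(170.0, 10.0, n)     # body size (cm)
brain = rng.normal(1300.0, 100.0, n)  # brain volume (cm^3)
sex = rng.integers(0, 2, n)           # indicator: 0 = woman, 1 = man

# Simulate IQ from a known linear model plus random noise
iq = 80 + 0.02 * size + 0.15 * brain - 0.8 * sex + rng.normal(0.0, 5.0, n)

# Design matrix with an intercept column, then least-squares fit
X = np.column_stack([np.ones(n), size, brain, sex])
coef, *_ = np.linalg.lstsq(X, iq, rcond=None)
print(coef)  # estimated [intercept, size, brain volume, sex] coefficients
```

Because the data were simulated from the model itself, the estimates land near the true coefficients; with real data we would only ever observe the estimates.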
APPLICATIONS OF MULTIPLE REGRESSION
The multiple regression technique can be used to predict responses from explanatory variables, but prediction is not actually its most common application in research. Its most common uses are as follows:
- Identification of explanatory variables. It helps us build a model by selecting the variables that may influence the response and discarding those that provide no information.
- Detection of interactions between independent variables that affect the response variable.
- Identification of confounding variables. Although this is a difficult problem, it is of interest in non-experimental research.
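An interaction can be detected by adding the product of two predictors as an extra column in the design matrix. A minimal sketch with invented data, where the effect of x1 on the response differs between the two groups coded by x2:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 300
x1 = rng.normal(0.0, 1.0, n)
x2 = rng.integers(0, 2, n)  # dichotomous variable, e.g. a group indicator

# The slope of x1 depends on the group: a genuine interaction
y = 1.0 + 2.0 * x1 + 0.5 * x2 + 1.5 * (x1 * x2) + rng.normal(0.0, 0.5, n)

# Include the product x1*x2 as its own column in the design matrix
X = np.column_stack([np.ones(n), x1, x2, x1 * x2])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
print(coef[3])  # estimate of the interaction coefficient (true value: 1.5)
```

If the interaction term's coefficient is clearly different from zero, the effect of one variable depends on the level of the other.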
REQUIREMENTS AND LIMITATIONS OF MULTIPLE REGRESSION
There are certain requirements for using the multiple regression technique:
Linearity: it is assumed that the response variable depends linearly on the explanatory variables. If the response does not appear to be linear, we must introduce nonlinear components into the model.
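"Linear" refers to the coefficients, so a curved relationship can often be handled by adding a transformed predictor such as a squared term. A small sketch with simulated data:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200
x = rng.uniform(-3.0, 3.0, n)
y = 1.0 + 0.5 * x + 0.8 * x**2 + rng.normal(0.0, 0.3, n)  # curved relationship

# Adding x**2 as a column keeps the model linear in the coefficients
X = np.column_stack([np.ones(n), x, x**2])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
print(coef)  # estimates of [intercept, linear term, quadratic term]
```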
Normality and equal distribution of the residuals: for a multiple regression model to be good, it is not enough that the residuals are small. The validity of the model requires that they be normally distributed and show the same dispersion for each combination of values of the independent variables.
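An informal way to check this requirement is to inspect the residuals after fitting: they should centre on zero and show similar spread across the predictor's range. A sketch with simulated data:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 400
x = rng.normal(0.0, 1.0, n)
y = 2.0 + 1.0 * x + rng.normal(0.0, 1.0, n)

X = np.column_stack([np.ones(n), x])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
residuals = y - X @ coef

# With an intercept in the model, OLS residuals average exactly zero;
# comparing their spread in two halves of the data probes equal dispersion
low, high = x < np.median(x), x >= np.median(x)
print(residuals.mean())
print(residuals[low].std(), residuals[high].std())  # should be similar
```

In practice a residual plot (residuals against fitted values) makes the same check visually.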
Number of independent variables: a commonly recommended rule is to include at least 20 observations for each independent variable considered interesting a priori for the model. Smaller numbers may keep us from drawing conclusions and lead to type II errors.
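Under that rule of thumb, the minimum sample size is simply the number of candidate predictors times 20. A tiny hypothetical helper (not a standard function):

```python
def minimum_observations(n_predictors, per_variable=20):
    """Rule-of-thumb sample size: ~20 observations per candidate predictor."""
    return n_predictors * per_variable

# For the IQ example with three predictors (size, brain volume, sex):
print(minimum_observations(3))  # 60 observations at minimum
```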
Collinearity: if two independent variables are closely related and both are included in a model, quite possibly neither will be considered significant, even though either one would have been had we included it alone. A very simple technique for detecting collinearity is to examine the coefficients of the model to see whether they become unstable when a new variable is introduced.
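This instability is easy to provoke with simulated data: make a second predictor that is almost a copy of the first and watch the individual coefficients, even though their combined effect remains well estimated. A sketch:

```python
import numpy as np

rng = np.random.default_rng(4)
n = 100
x1 = rng.normal(0.0, 1.0, n)
x2 = x1 + rng.normal(0.0, 0.05, n)  # nearly a copy of x1: strong collinearity
y = 3.0 * x1 + rng.normal(0.0, 1.0, n)

# With x1 alone, the coefficient is stable and close to the true value 3
X1 = np.column_stack([np.ones(n), x1])
b1, *_ = np.linalg.lstsq(X1, y, rcond=None)

# With both, the individual coefficients can swing wildly and even change
# sign, although their sum still estimates the joint effect reasonably well
X12 = np.column_stack([np.ones(n), x1, x2])
b12, *_ = np.linalg.lstsq(X12, y, rcond=None)
print(b1[1], b12[1], b12[2])
```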
Anomalous observations: we must take special care to identify them and, if necessary, discard them, as they can have a great influence on the result. Sometimes they are merely data-entry errors, but ones with serious consequences for the analysis.
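A single data-entry error illustrates the point: copy a clean simulated dataset, corrupt one value as if an extra digit had been typed, and compare the two fits. A sketch:

```python
import numpy as np

rng = np.random.default_rng(5)
n = 50
x = rng.uniform(0.0, 10.0, n)
y = 2.0 + 1.0 * x + rng.normal(0.0, 0.5, n)

X = np.column_stack([np.ones(n), x])
clean_fit, *_ = np.linalg.lstsq(X, y, rcond=None)

# Simulate a data-entry error: the largest response typed with an extra digit
y_bad = y.copy()
i = int(np.argmax(y))
y_bad[i] = y_bad[i] * 10
bad_fit, *_ = np.linalg.lstsq(X, y_bad, rcond=None)

print(clean_fit)  # close to the true [2.0, 1.0]
print(bad_fit)    # one bad point distorts intercept and slope
```

Plotting the data, or scanning the residuals for extreme values, usually exposes such points before they damage the analysis.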