
Number of Parameters Counting in a Hierarchically Multiple Regression Model

H.J. Zainodin, Noraini Abdullah and S.J. Yap

*Science International,
2014, 2(2), 37-43.*

**Background:** When a dependent variable is affected by a
large number of independent variables, the number of parameters helps researchers
determine the number of independent variables to be considered in an analysis.
However, when there are many parameters to be estimated in a model, manual
counting is tedious and time-consuming. Thus, this study derives a method to
determine the number of parameters in a model systematically. **Methods:**
The model building procedure in this study involves removing variables due to
multicollinearity and eliminating insignificant variables. Eventually, a selected model
is obtained with significant variables. **Results:** The findings of this
study enable researchers to count the number of parameters in a resulting
model (selected model) with ease and speed. On top of that, only models which fulfil
the assumptions are considered in the statistical analysis. In addition, human
errors caused by manual counting can be minimised and avoided by implementing
the proposed procedure. **Conclusion:** These findings will also undoubtedly
help many researchers save time when their analyses involve complex iterations.

**ASCI-ID: 5233-66**

**INTRODUCTION**

In a linear regression model^{1}, regression coefficients are the unknown parameters to be estimated. In
a simple linear regression, only two unknown parameters have to be estimated.
However, problems arise in multiple linear regression, when the number of
parameters in the model is large and the model is more complex, with three or more unknown
parameters to be estimated. Challenges arise when a computer programme has
to be written and complex iterations have to be performed under certain criteria.
Thus, the exact number of parameters involved should be known in order to prepare
the amount of data to suit such large and complex model. It is also important
to note that in order to have a unique solution in finding the estimated parameters,
according to the assumptions of multiple regression model stated by Gujarati
and Porter^{2}, the number of estimated parameters
must be less than the total number of observations.

According to Zainodin *et al.*^{3}, Yahaya
*et al.*^{4}, in a multiple linear regression
analysis, there are four phases in getting the best model, namely: listing out
all possible models, getting the selected models, getting the best model and checking
the validity via the goodness-of-fit test. In phase 1, the number of parameters for a
possible model is denoted by NP. To get the selected models, after listing out
all of the possible models in phase 1, multicollinearity test and coefficient
test are conducted on the possible models in phase 2. Before continuing to phase
2, the number of parameters in each model must be less than the sample size, n.
Models which fail this initial criterion are discarded. In the multicollinearity
test, multicollinearity source variables are removed from each of the possible
models. Then, coefficient test is conducted on the possible models that are
free from multicollinearity problem. Detailed procedure of this phase is explained
in Zainodin *et al.*^{3}. This is to eliminate
insignificant variables from each of the possible models.

For a general model Ma.b.c with parent model number “a”, the number
of variables removed due to the multicollinearity problem is denoted by b, the number
of variables eliminated due to insignificance is denoted by c and the resulting
number of parameters for the selected model is represented by (k+1). In most of
the cases, if the number of the unknown parameters to be estimated for a possible
model is large, then the number of parameters for a selected model will most
probably be large too^{5,6}.
In these cases, the manual counting on the large number of parameters is found
to be time consuming. Furthermore, some of the parameters might be missed out
due to human error in manual counting. Thus, the objective of this study is
to propose a method to count the number of parameters for a selected model,
(k+1). The information on NP, b and c is useful in obtaining the number of
parameters for the selected model, (k+1).

According to Gujarati and Porter^{2}, a simple
linear regression model without any interaction variable can be written as follows:

Y = β_{0} + β_{1}X_{1} + u    (1)

where, Y is dependent variable, β_{0} and β_{1} are
regression coefficients and they are the unknown parameters to be estimated,
X_{1} is single quantitative independent variable and u is error term.
So, it can be observed that there are two unknown parameters to be estimated
in Eq. 1. This equation can also be written in the general-model form as
follows:

Y = Ω_{0} + Ω_{1}W_{1} + u

Next, a hierarchically multiple linear regression models^{7}
with interaction variable can be written as follows:

Y = β_{0} + β_{1}X_{1} + β_{2}X_{2} + β_{12}X_{12} + u    (2)

where, Y is dependent variable, β_{0}, β_{1}, β_{2} and β_{12}
are regression coefficients and they are the unknown parameters to be estimated,
X_{1} and X_{2 }are single quantitative independent variables,
X_{12} is first-order interaction variable and u is error term. Thus,
it can be seen that there are four unknown parameters to be estimated in Eq.
2. This equation can also be written in the general-model form:

Y = Ω_{0} + Ω_{1}W_{1} + Ω_{2}W_{2} + Ω_{3}W_{3} + u

Next, an example for a linear regression model with interaction variables and dummy variables is shown as follows:

Y = β_{0} + β_{1}X_{1} + β_{2}X_{2} + β_{12}X_{12} + β_{D}D + β_{1D}X_{1}D + β_{2D}X_{2}D + u    (3)

where, Y is dependent variable, β_{0}, β_{1}, β_{2},
β_{12}, β_{D}, β_{1D} and β_{2D}
are regression coefficients and they are the unknown parameters to be estimated,
X_{1} and X_{2 }are single quantitative independent variables,
X_{12} is first-order interaction variable, D is single independent
dummy variable, X_{1}D is first-order interaction variable of X_{1}
and D, X_{2}D is first-order interaction variable of X_{2} and
D and u is error term. Therefore, it can be observed that there are seven unknown
parameters to be estimated in Eq. 3. This equation can also
be written in the general-model form:

Y = Ω_{0} + Ω_{1}W_{1} + Ω_{2}W_{2} + Ω_{3}W_{3} + Ω_{4}W_{4} + Ω_{5}W_{5} + Ω_{6}W_{6} + u

Equations 1-3 can be written as a general model in the form:

Y = Ω_{0} + Ω_{1}W_{1} + Ω_{2}W_{2} +…+ Ω_{k}W_{k} + u    (4)

where, Ω_{0} is the intercept and Ω_{j} is the jth partial regression coefficient
of the corresponding independent variable W_{j} for j = 1, 2,…, k.

According to Zainodin *et al.*^{3}, the independent
variables W_{j} include the single independent variables, interaction
variables, generated variables, dummy variables and transformed variables. In
this study, (k+1) denotes the number of parameters for a selected model. The corresponding
labels between the general model in Eq. 4 and the model in Eq. 3 are
shown in Table 1.

From Table 1, it is known that β_{0} represents Ω_{0} in the general model,
β_{1} represents Ω_{1} and the same goes for the other estimated parameters in Table 1.
Variable X_{1} in Eq. 3 represents W_{1} in the general model and the same
goes for the other variables in Table 1.

Instead of counting the number of parameters one by one as above, an equation to count the number of parameters, both in models without interaction variables and in models with interaction variables, is proposed in this study. Eq. 5 is presented as follows:

NP = g + h + 1 (for v = 0);  NP = 1 + Σ_{j=1}^{v+1} C(g, j) + h(g + 1) (for v = 1, 2,…)    (5)

with C(g, j) denoting the number of combinations of j variables chosen from the g quantitative variables,

where, NP is the number of parameters for a possible model, g is the number of single independent quantitative variables, h is the number of single independent dummy variables and v is the highest order of interaction (among the single independent quantitative variables) in the model.

Here, v = 0 denotes a model without interaction variables (or a model with zero-order interaction) and v = 1, 2,… denotes a model with first- or higher-order interaction variable(s). Hence, Eq. 5 can now be tested in the following instances to prove its validity in counting the number of parameters in a hierarchically multiple regression model.
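As a quick check, the counting rule of Eq. 5 can be sketched in Python. The helper name `np_count` is hypothetical; the v ≥ 1 branch is inferred from the paper's worked examples (all interactions among the quantitative variables up to order v, plus first-order quantitative-dummy interactions):

```python
from math import comb

def np_count(g, h, v):
    """Number of parameters NP of a possible model.

    g: number of single quantitative independent variables
    h: number of single dummy independent variables
    v: highest order of interaction among the quantitative variables
       (v = 0 means the model has no interaction variables)
    """
    if v == 0:
        # only the single variables plus the intercept
        return g + h + 1
    # intercept + quantitative terms of orders 1..v+1
    # + single dummies and their first-order interactions with each X
    return 1 + sum(comb(g, j) for j in range(1, v + 2)) + h * (g + 1)

print(np_count(1, 5, 0))  # 1 + 5 + 1 = 7
print(np_count(2, 5, 1))  # 1 + (2 + 1) + 5*3 = 19
```

Note that `comb(g, j)` is 0 for j > g, so the sum simply stops contributing once the interaction order exhausts the available quantitative variables.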


**MATERIALS AND METHODS**

This section will help to better understand the application of the equation mentioned
in the previous section. In order to achieve a model free from multicollinearity
effects and insignificant variables, the following four-phase model building procedure
is implemented (details can be found in^{3,4}):

• Phase 1: All possible models

• Phase 2: Selected model (multicollinearity test and coefficient test, including NPM, the Near Perfect Multicollinearity test, and NPC, the Near Perfect Collinearity test)

• Phase 3: Best model

• Phase 4: Goodness-of-fit

**Randomness test and normality test:** These tests are conducted in the goodness-of-fit
phase. This study also examines the parameters of the independent variables, the
multicollinearity effects between the independent variables and the structural parameters.
As discussed in the earlier section, the number of parameters is very important
before arriving at a selected and best model. Thus, some illustrations follow:

**Models without interaction variable:** Here, models without interaction variables
(or with zero-order interaction) are considered.
As pointed out earlier in Eq. 5, the number of parameters
for a model without interaction variables (v = 0) can be computed
using g+h+1. For instance, consider Eq. 6, a model with zero-order
interaction (v = 0), where the number of single quantitative independent
variables g equals 1 and the number of single dummy variables h equals 5,
as follows:

Y = β_{0} + β_{1}X_{1} + β_{D}D + β_{B}B + β_{R}R + β_{A}A + β_{G}G + u    (6)

where, Y is a dependent variable, X_{1} is a single quantitative independent
variable (g =1) and D, B, R, A and G are 5 single dummy variables (h =5). Then,
the total number of parameters involved is NP = g + h + 1 = 1 + 5 + 1 = 7.

For simplicity, consider another example of calculating the number of parameters in a model without interaction variables. Consider Eq. 7, a model with zero-order interaction (v = 0), where the number of single quantitative independent variables g equals 8 and the number of single dummy variables h equals 10, as follows:

Y = β_{0} + β_{1}X_{1} + β_{2}X_{2} +…+ β_{8}X_{8} + β_{B}B + β_{C}C +…+ β_{S}S + u    (7)

where, X_{1}, X_{2}, X_{3}, X_{4}, X_{5},
X_{6}, X_{7} and X_{8} are single quantitative independent
variables and B, C, L, E, W, K, A, G, H and S are 10 single dummy variables.
Then, the following is obtained: NP = g + h + 1 = 8 + 10 + 1 = 19.

**Models with interaction variable:** Now, consider models with interaction
variables (i.e., v = 1, 2,…) in this subsection. According to Eq. 5,
the number of parameters in a model with interaction
variables is calculated in a different way from that of a model without interaction variables.
A few examples are presented to provide a better understanding of this equation.
For instance, a model with interaction variables up to first order (v = 1),
where the number of single quantitative independent variables g equals 2 and
the number of single dummy variables h equals 5, is presented in Eq. 8:

Y = β_{0} + β_{2}X_{2} + β_{4}X_{4} + β_{D}D + β_{B}B + β_{R}R + β_{A}A + β_{G}G + β_{24}X_{24} + β_{2D}X_{2}D +…+ β_{4G}X_{4}G + u    (8)

where, X_{2} and X_{4} are single quantitative independent
variables, D, B, R, A and G are single dummy variables and X_{24}, X_{2}D,
X_{2}B, X_{2}R, X_{2}A, X_{2}G, X_{4}D,
X_{4}B, X_{4}R, X_{4}A and X_{4}G are first-order
interaction variables. Then, this leads to NP = 1 + [C(2, 1) + C(2, 2)] + 5(2 + 1) = 1 + 3 + 15 = 19.

Next, consider a larger model with higher-order interaction variables, for instance, a model with interactions up to fifth order (v = 5), where the number of single quantitative independent variables g equals 6 and the number of single dummy variables h equals 5, as presented in Eq. 9:

Y = β_{0} + β_{1}X_{1} +…+ β_{6}X_{6} + β_{D}D +…+ β_{G}G + β_{12}X_{12} +…+ β_{123456}X_{123456} + u    (9)

In Eq. 9, X_{1}, X_{2}, X_{3}, X_{4},
X_{5} and X_{6} are single quantitative independent variables,
X_{12}, X_{13}, X_{14}, X_{15}, X_{16},
X_{23}, X_{24}, X_{25}, X_{26}, X_{34},
X_{35}, X_{36}, X_{45}, X_{46} X_{56},
X_{1}D, X_{1}B, X_{1}R, X_{1}A, X_{1}G,
X_{2}D, X_{2}B, X_{2}R, X_{2}A, X_{2}G,
X_{3}D, X_{3}B, X_{3}R, X_{3}A, X_{3}G,
X_{4}D, X_{4}B, X_{4}R, X_{4}A, X_{4}G,
X_{5}D, X_{5}B, X_{5}R, X_{5}A, X_{5}G,
X_{6}D, X_{6}B, X_{6}R, X_{6}A and X_{6}G
are first-order interaction variables, X_{123}, X_{124}, X_{125},
X_{126}, X_{134}, X_{135}, X_{136}, X_{145},
X_{146}, X_{156}, X_{234}, X_{235}, X_{236},
X_{245}, X_{246}, X_{256}, X_{345}, X_{346},
X_{356} and X_{456} are second-order interaction variables,
X_{1234}, X_{1235}, X_{1236}, X_{1245}, X_{1246},
X_{1256}, X_{1345}, X_{1346}, X_{1356}, X_{1456},
X_{2345}, X_{2346}, X_{2356}, X_{2456} and X_{3456}
are third-order interaction variables, X_{12345}, X_{12346},
X_{12356}, X_{12456}, X_{13456} and X_{23456}
are fourth-order interaction variables and X_{123456} is fifth-order
interaction variable. Then, the total number of parameters is NP = 1 + Σ_{j=1}^{6} C(6, j) + 5(6 + 1) = 1 + 63 + 35 = 99.
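The term-by-term count in Eq. 9 can also be verified mechanically. The sketch below (hypothetical code, not from the paper) enumerates every term of a model with this structure — all products of 2 to v+1 distinct quantitative variables, plus first-order quantitative-dummy interactions — and counts them:

```python
from itertools import combinations

def count_by_enumeration(g, h, v):
    """List every term of the model explicitly and count the list."""
    quant = [f"X{i}" for i in range(1, g + 1)]
    dummies = [f"D{i}" for i in range(1, h + 1)]  # generic dummy names
    terms = ["intercept"] + quant + dummies
    if v >= 1:
        # interaction variables among the quantitative X's, up to order v
        # (i.e., products of 2 .. v+1 distinct X's)
        for size in range(2, v + 2):
            terms += ["".join(c) for c in combinations(quant, size)]
        # first-order interactions between each X and each dummy
        terms += [x + d for x in quant for d in dummies]
    return len(terms)

print(count_by_enumeration(6, 5, 5))  # Eq. 9: 1 + 6 + 5 + 57 + 30 = 99
```

Agreement between this brute-force enumeration and the closed-form count is exactly the "tally with manual counting" that the illustrations demonstrate.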

Lastly, consider another example of a larger model which has interaction variables up to sixth order and nine single dummy variables. Consider Eq. 10, a model with highest order of interaction v = 6, number of single quantitative independent variables g = 7 and number of single dummy variables h = 9.

Y = β_{0} + β_{1}X_{1} +…+ β_{7}X_{7} +…+ β_{1234567}X_{1234567} + u    (10)

Here, X_{123456}, X_{123457}, X_{123467}, X_{123567},
X_{124567}, X_{134567} and X_{234567} are fifth-order
interaction variables and X_{1234567} is the sixth-order interaction variable.
Then, the total number of parameters is:

NP = 1 + Σ_{j=1}^{7} C(7, j) + 9(7 + 1) = 1 + 127 + 72 = 200    (11)

As can be seen from the illustrations, the number of parameters calculated using the derived equation tallies with that obtained from manual counting.

**RESULTS AND DISCUSSION**

The proposed equation defined in Eq. 5 is especially useful for counterchecking the variables when listing all of the possible models in an analysis. This is because some of the variables might be missed out when a large number of parameters is involved in a possible model.

Table 2 shows all the possible models for an analysis that
has two single quantitative independent variables (X_{1} and X_{2})
and one single dummy variable (D).

In Table 2, with the information on g, h and v, the NP for possible models M1-M12 can be computed by using Eq. 5. Then, the number of parameters for each of the possible models can be counterchecked by using the computed NP values. For simplicity, each of the 12 models can be written in general form as in Eq. 4.

After introducing the equation for counting the number of parameters for a possible
model, the way of getting the number of parameters for a selected model, (k+1),
is presented^{8,9}.
Model M32 is used as an illustration; it was also mentioned earlier
in Eq. 3. The multicollinearity test is conducted on the possible
model M32. The removal of multicollinearity source variables from this
model is shown in Tables 3-5. This study
uses the modified method in removing multicollinearity source variables (Excel
command: COUNTIF()).

In Table 3, all of the multicollinearity source variables
(variables with absolute correlation coefficient values greater than or equal
to 0.9500) are circled. It is then found that variables X_{12}, D,
X_{1}D and X_{2}D each have a frequency of 2. So, according to Zainodin
*et al.*^{3}, model M32 belongs to case
B. To avoid confusion between dummy variable B and case B, case B is represented
by case 2 in this study.

Similarly, case C is represented by case 3 in this study. Based on the removal
steps for case 2, variable X_{12}, which has the weakest absolute correlation coefficient
with the dependent variable Y compared with variables D, X_{1}D and X_{2}D,
is removed from model M32. The same removal steps are carried out on the reduced
model, M32.1, as presented in Table 4.

After removing variable X_{2}D from model M32.1 in Table
4, the detailed correlation coefficients of the reduced model M32.2 are shown
in Table 5. It is observed that the variables D and
X_{1}D each have the highest frequency of one. So, it is identified
that model M32.2 belongs to case 3. Therefore, variable X_{1}D, which
has the weaker absolute correlation coefficient with the dependent variable Y,
is then removed from this model. Thus, the resulting model free from multicollinearity
is M32.3.

Table 6 shows that model M32.3 is free from multicollinearity source variables because all the absolute correlation coefficient values between all the independent variables are less than 0.9500 (except the diagonal values).

By observing model M32.3, it is found that the number of variables removed
due to the multicollinearity problem is 3. Therefore, b for model M32.3 is 3. More
details on the definition of the model name can be found in^{3}
and^{4,6,10}.
Since the resulting model M32.3 is free from multicollinearity effects, the
coefficient test is conducted on it. The task then is to eliminate insignificant
variables from model M32.3.

In Table 7, it is found that variable X_{2} has the
highest p-value among the independent variables and that this p-value is greater than 0.05 (since
the number of single quantitative independent variables is greater than 5 and
the coefficient test is two-tailed, the level of significance is set at 10%,
based on the recommendations in^{11}). Thus,
variable X_{2} is eliminated from model M32.3 and the reduced model
is called model M32.3.1, as 1 variable is eliminated due to insignificance. Details
on the coefficient test can be found in^{8,12,13}.

Table 8 shows that model M32.3.1 is free from insignificant
variables because the p-values of variables X_{1} and D are both less
than 0.05. From Table 8, it is found that 3 parameters (the constant
of the model and the coefficients of variables X_{1} and D) are left in model
M32.3.1; in other words, (k+1) equals 3. From the model name M32.3.1,
it is noticed that 1 variable is eliminated in the coefficient test, so c equals
1. As mentioned earlier in Eq. 3, there are seven unknown
parameters to be estimated for model M32, so NP equals 7. Thus, by knowing
the NP, b and c for model M32.3.1 (i.e., Ma.b.c), the number of parameters (k+1)
for model M32.3.1 can be counterchecked using the equation proposed in this study
as:

(k + 1) = NP − b − c = 7 − 3 − 1 = 3    (12)

Therefore, it is shown in Table 8 that the number of parameters left in the selected model M32.3.1 is the same as the value obtained from the equation proposed in Eq. 12.
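This countercheck amounts to a single subtraction; a minimal sketch (the helper name is hypothetical), applied to the M32.3.1 example:

```python
def selected_parameters(NP, b, c):
    """Parameters left in a selected model Ma.b.c:
    (k + 1) = NP - b - c, where b variables were removed for
    multicollinearity and c were eliminated as insignificant."""
    return NP - b - c

# Model M32.3.1: NP = 7 (Eq. 3), b = 3, c = 1
print(selected_parameters(7, 3, 1))  # (k + 1) = 3
```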

In line with the above discussion, other researchers have also highlighted the
importance of this parameter counting. They ranked the parameters of a model
(by importance, significance, dependency, etc.) based on the magnitude of the
coefficients^{11,14,15}.

**CONCLUSION**

This study is new and groundbreaking. It has succeeded in proposing an equation
for counting the number of parameters for each of all possible models (details
can be found in phase 1 and in^{3,4}).
This equation helps to countercheck the number of parameters left in the selected
model, which is free from multicollinearity and from insignificant variables.
The study presented an equation to calculate the number of parameters and demonstrated
its application on models with and without interaction variables.

As can be seen from the previous section, it takes a long time to calculate the number of parameters in a model manually, especially for bigger models like those in Eq. 9-10. Instead of counting the number of parameters one by one manually, the equation established in this study allows researchers to obtain the number of parameters in an easier, faster yet accurate way. Besides, human errors caused by manual counting (which have happened during model development) can also be minimised and avoided. The proposed equation also helps to save a tremendous amount of time when an analysis involves complex iterations or repeated tasks, especially in software development.
