There are three main sources of statistical data

The three main sources of statistical data are statistical surveys, censuses and registers.

Statistical surveys

A statistical survey is an investigation into the characteristics of a phenomenon: data are collected from a sample of the population, and the characteristics of the whole population are estimated through the systematic use of statistical methodology. Surveys mainly collect data through interviews, which may be conducted face to face, by telephone, by email or through an online questionnaire.

Surveys are an advantageous source of data because the researcher has direct control over the data and can ask questions that follow statistical definitions during collection.

The disadvantages of surveys are their high cost, especially when direct interviews are conducted, and the risk that the quality of the feedback is compromised, for instance by non-response and response errors.


Censuses

A census is a complete enumeration of a population at a point in time with respect to well-defined characteristics, for example population or production. For instance, Kenya has been taking censuses since 1948, when its first census was taken. The censuses conducted after independence (1969, 1979, 1989, 1999 and 2009) have been held at ten-year intervals.

In a census, the data are mostly collected through questionnaires.

A census provides more complete data than a survey because it targets the entire population of a country or region. It also provides a basis for sampling frames that may be used in subsequent surveys. On the other hand, a census is expensive to plan and conduct, and the resulting data are expensive to process.


Registers

A register is a database that is updated continuously for a specific purpose and from which statistics can be collected and produced. Examples include administrative registers (e.g. those kept by government departments) and private registers (such as those kept by insurance companies and other private entities).

Registers have a cost advantage as a source of statistical data: collecting and processing data from them is cheap. However, possible under-coverage of the information that some users need may make registers unsuitable for those users.

Measures of Central tendency

They are also known as statistical averages. They are statistical values that tend to lie near the centre of well-ordered data. They do not necessarily mark the exact centre of the data; rather, they tell us the point about which items have a tendency to cluster. Such measures are considered the most representative figures for the entire mass of data.

Measures of central tendency include the mean, the mode and the median.

The Mean

The mean is also known as the arithmetic average. It may be defined as the value obtained by dividing the sum of the given observations in a series by the total number of observations.
It is calculated using the equation:

$$\bar{x} = \frac{\sum x}{n}$$

where $\sum x$ is the sum of the observed values and $n$ is the number of observations.

The arithmetic mean gives a single value that is representative of the observations in a given population.
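
As a minimal sketch of the definition above, the arithmetic mean can be computed directly as the sum of the observations divided by their count. The function name `arithmetic_mean` and the sample marks are illustrative assumptions, not part of the original text; the standard-library function `statistics.fmean` is used only as a cross-check.

```python
from statistics import fmean


def arithmetic_mean(values):
    """Return the sum of the observations divided by their number."""
    if not values:
        raise ValueError("the mean of an empty data set is undefined")
    return sum(values) / len(values)


# Illustrative marks only (an assumption made for this sketch).
marks = [65, 55, 89, 92, 56, 35]

print(arithmetic_mean(marks))  # hand-written formula
print(fmean(marks))            # standard-library cross-check
```

Both calls should agree, since they apply the same formula to the same observations.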

In grouped data, arithmetic mean is calculated as:

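$$\bar{x} = A + \frac{\sum f d}{\sum f}$$
where $A$ is the assumed mean, $f$ is the frequency of each class and $d = x - A$ is the deviation of each class midpoint $x$ from the assumed mean.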
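
The assumed-mean calculation for grouped data can be sketched as follows. The function name `grouped_mean`, the class midpoints and the frequencies are hypothetical values chosen for illustration; the deviation d is taken as the class midpoint minus the assumed mean.

```python
def grouped_mean(midpoints, frequencies, assumed_mean):
    """Mean of grouped data: assumed mean + (sum of f*d) / (sum of f)."""
    if len(midpoints) != len(frequencies):
        raise ValueError("each class needs one midpoint and one frequency")
    total_f = sum(frequencies)
    if total_f == 0:
        raise ValueError("total frequency must be greater than zero")
    # d = x - A is the deviation of each class midpoint from the assumed mean.
    sum_fd = sum(f * (x - assumed_mean) for x, f in zip(midpoints, frequencies))
    return assumed_mean + sum_fd / total_f


# Hypothetical grouped data: class midpoints and their frequencies.
midpoints = [5, 15, 25, 35, 45]
frequencies = [2, 5, 8, 4, 1]

print(grouped_mean(midpoints, frequencies, assumed_mean=25))  # 23.5
print(grouped_mean(midpoints, frequencies, assumed_mean=15))  # same mean, different A
```

The choice of assumed mean does not change the result; it only keeps the intermediate deviations small, which is why the method is convenient for hand calculation.
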
The statistical mean is widely used in experiments, because averaging repeated measurements reduces the effect of random errors. The researcher can thus derive a more accurate result than he or she would have derived from a single experiment.
It is also used in statistical data interpretation, because it serves as a reference point from which the distance of each value can be measured. For instance, if the numbers $X_1, \dots, X_n$ have average $X$, then:

$$\sum_{i=1}^{n} (X_i - X) = 0$$

Here $X_i - X$ is the signed distance (the residual) from a given number to the average, so the numbers to the left of the mean are balanced by the numbers to the right of the mean. The residuals sum to zero only if $X$ is the statistical mean. Moreover, if a single number $X$ is used as an estimate for the value of the numbers, then the statistical mean minimizes the sum of the squares $(X_i - X)^2$ of the residuals.
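
The two properties described above (residuals that sum to zero, and a minimal sum of squared residuals) can be checked numerically. This is only an illustration on a small made-up data set; the helper names `residuals` and `sum_of_squares` are assumptions introduced for the sketch.

```python
def residuals(values, centre):
    """Signed distances X_i - X from each value to the chosen centre."""
    return [x - centre for x in values]


def sum_of_squares(values, centre):
    """Sum of the squared residuals around the chosen centre."""
    return sum((x - centre) ** 2 for x in values)


# Made-up data whose mean is exactly 5.
data = [2, 4, 4, 6, 9]
mean = sum(data) / len(data)

# Around the mean, the negative and positive residuals cancel out.
print(sum(residuals(data, mean)))

# Any other candidate centre gives a larger sum of squares than the mean does.
for candidate in (4.0, 4.5, 5.0, 5.5, 6.0):
    print(candidate, sum_of_squares(data, candidate))
```

Trying a few candidate centres is not a proof, but it shows the sum of squares rising on both sides of the mean.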

The advantage of the statistical mean is that it includes every item in the data set and can easily be used with other statistical measurements. Its major disadvantage is that it can be distorted by extreme values in the data set and may then be unrepresentative.

Other types of means are:
Geometric mean
Harmonic mean
The geometric mean is an average that is useful for sets of positive numbers that are interpreted according to their product and not their sum (as is the case with the arithmetic mean), e.g. rates of growth.
It is calculated using the equation:

$$G = \left( \prod_{i=1}^{n} x_i \right)^{1/n} = \sqrt[n]{x_1 x_2 \cdots x_n}$$

The harmonic mean is an average that is useful for sets of numbers that are defined in relation to some unit, for example speed (distance per unit of time).
It is calculated using the equation:

$$H = \frac{n}{\sum_{i=1}^{n} \frac{1}{x_i}}$$
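
A short sketch of the geometric and harmonic means follows. The function names `geometric` and `harmonic`, the growth factors and the speeds are hypothetical; the standard-library functions `statistics.geometric_mean` and `statistics.harmonic_mean` are shown only as cross-checks.

```python
import math
from statistics import geometric_mean, harmonic_mean


def geometric(values):
    """n-th root of the product of n positive numbers, computed via logarithms."""
    if not values or any(v <= 0 for v in values):
        raise ValueError("the geometric mean needs a non-empty set of positive numbers")
    return math.exp(sum(math.log(v) for v in values) / len(values))


def harmonic(values):
    """Number of values divided by the sum of their reciprocals."""
    if not values or any(v <= 0 for v in values):
        raise ValueError("the harmonic mean needs a non-empty set of positive numbers")
    return len(values) / sum(1 / v for v in values)


# Hypothetical yearly growth factors (+10%, +20%, -5%).
growth_factors = [1.10, 1.20, 0.95]
print(geometric(growth_factors), geometric_mean(growth_factors))

# Hypothetical speeds in km/h over two legs of equal distance.
speeds_kmh = [60, 40]
print(harmonic(speeds_kmh), harmonic_mean(speeds_kmh))  # about 48 km/h
```

For the speed example the harmonic mean, not the arithmetic mean of 50, gives the average speed over the whole trip, because more time is spent on the slower leg.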

The Mode
This is the value within a frequency distribution that has the highest frequency, that is, the most commonly occurring value in a series. It is the value that is most likely to be sampled from a set of data. On a histogram, it corresponds to the highest bar. The mode of a distribution is the item around which there is maximum concentration.
In grouped data, where no single value can be identified as the mode, the class with the highest frequency is treated as such and is referred to as the modal class.
Unlike other measures of central tendency such as the mean and the median, the mode is applicable to nominal data. For instance, in the Muslim world the name ‘Mohamed’ occurs more often than any other name in the male population, and is thus the mode of male names.
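
The following sketch finds the mode by counting frequencies, and works for numeric and nominal data alike. The helper name `modes` and the list of names are hypothetical; `statistics.multimode` from the standard library is used as a cross-check.

```python
from collections import Counter
from statistics import multimode


def modes(values):
    """Return every value that shares the highest frequency."""
    counts = Counter(values)
    highest = max(counts.values())
    return [value for value, count in counts.items() if count == highest]


# Numeric data: several values can tie for the highest frequency.
marks = [65, 55, 89, 92, 56, 35, 14, 56, 55, 87, 45, 92]
print(modes(marks))      # [55, 92, 56]
print(multimode(marks))  # standard-library cross-check

# Nominal data (hypothetical sample of names): the mode is still defined.
names = ["Ali", "Mohamed", "Omar", "Mohamed", "Yusuf"]
print(modes(names))      # ['Mohamed']
```

Returning a list rather than a single value makes ties explicit, which matters for data sets such as the marks above where three values occur twice each.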

The Median
The median is the value located at the centre of a set of data that has been arranged in ascending or descending order of magnitude. It divides the series into two halves, separating the higher half from the lower half.
The median is not skewed much by extremely large or small values, so it may give a better idea of a “typical” value than the mean. This is useful for statistics that vary greatly, such as household income or assets: median income, for example, may be a better way to suggest what a “typical” household income is.
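
To illustrate the point about extreme values, the sketch below compares the mean and the median of a small set of household incomes. The figures are entirely hypothetical and chosen so that one household is far richer than the rest.

```python
from statistics import mean, median

# Hypothetical annual household incomes; the last one is an extreme value.
incomes = [30_000, 32_000, 35_000, 38_000, 40_000, 1_000_000]

print(mean(incomes))    # pulled far upwards by the single large income
print(median(incomes))  # 36500.0, close to what most households earn
```

Five of the six households earn 40,000 or less, so the median describes the typical household much better here than the mean does.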

Suppose we are asked to calculate the median in the data below:
65 55 89 92 56 35 14 56 55 87 45 92

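We first need to rearrange the data in order of magnitude (smallest first):
14 35 45 55 55 56 56 65 87 89 92 92

There are twelve marks, an even number, so the median is the average of the two middle marks (the sixth and seventh). Both are 56, so the median mark is 56.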
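
The worked example can be reproduced with a short sketch. The function name `median_of` is an assumption made for illustration; it sorts the data and then takes the middle value, or the average of the two middle values when the number of observations is even.

```python
def median_of(values):
    """Middle value of the sorted data; average of the two middle values if even."""
    if not values:
        raise ValueError("the median of an empty data set is undefined")
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


marks = [65, 55, 89, 92, 56, 35, 14, 56, 55, 87, 45, 92]

print(sorted(marks))     # the data in order of magnitude
print(median_of(marks))  # 56.0
```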