METHODOLOGICAL OUTLINE

It is often said that in order to measure something one must first set up a model. This idea, which probably comes from econometrics, is not true, however.

Physicists have been measuring the boiling temperatures of various liquids for almost three centuries. This became possible as soon as reliable thermometers were available. In this respect one can recall that it was around 1735 that Anders Celsius (1701-1744), a Swedish astronomer, proposed the temperature scale which bears his name. He determined the dependence of the boiling point of water on atmospheric pressure in a way which is accurate even by modern-day standards. In short, in order to measure a physical (or social) variable, what is needed is an accurate measurement device (or procedure). One can even accept some uncertainty as to the interpretation of what is being measured. Around 1730, physicists had a fairly poor understanding of the notions of heat and temperature, but this did not prevent them from making accurate measurements.

Naturally, after having measured the boiling temperatures of many liquids, physicists tried to understand what the main factors determining them are. They tried to establish correlations with density, viscosity, and several other variables, but all these attempts were unsuccessful. We now know that the major factor is the interaction strength, but physicists were not able to measure this variable independently until the 1930s. Thus, boiling temperatures remained a mystery for almost two centuries. Indeed, even nowadays we still do not have a comprehensive statistical theory of the boiling phase transition. Yet, the accurate measurements of the boiling temperatures of thousands of organic and inorganic substances have had (and still have) great usefulness. These data can be found in all handbooks of physical data, along with other important physical variables, e.g. heat of vaporization, thermal expansion coefficients, viscosity, etc.

We should also keep in mind that what made physics successful is the fact that its experimental results were checked and rechecked until they became truly trustworthy. Even after a result had gained widespread acceptance, experiments were repeated again and again to improve the accuracy of the measurements. Similarly, a major guideline of our approach will be that it is better to provide a measurement for a single interaction which can be confirmed by different methods than to provide numerous results whose reliability is fragile, unconvincing and questionable. In the first case one has a solid foundation to which further results can be added in a cumulative way, whereas in the second case one builds on sand.

What would be the analog in the social sciences of measuring the boiling temperatures of thousands of substances? It would be collecting a large set of comparative data for a specific variable. However, it must be realized that much care is required to make comparative data truly comparable. As an illustration, consider turnout in elections. As shown by the following ratios, turnout can be defined in many different ways.

1) As the ratio R1 of the number of voters (V) to the voting-age population, that is to say, for most countries, the population over the age of 18.

2) As the ratio R2 of V to the number of voting-age citizens. For instance, in the United States around 2000 only about 60% of the voting-age population were American citizens (Current Population Survey published by the Bureau of the Census, Voting and Registration in the Election of November 2000, hereafter referred to as CPS 2000), which shows that R2 can be very different from R1.

3) As the ratio R3 of V to the number of registered voters. Not all citizens are registered to vote. For instance, in the United States around 2000 only 52% of citizens of Asian origin were registered to vote (in Hawaii the percentage was 45%), as compared with 68% for the whole population (CPS 2000). Thus, R3 can be very different from R2.

4) As the ratio R4 of V to the number of eligible voters. Not all registered citizens are allowed to vote. For instance, in the United States some felons, that is to say persons who have been sentenced for a serious crime but may no longer be in jail, are barred from voting. According to a recent book on this topic (Locked Out by C. Uggen and J. Manza, 2006), in 2004 these persons represented about 2.5% of the voting-age population; the entry ``Voter turnout'' of Wikipedia says that in 2004 ``ineligible voters constituted nearly 10%'' of the voting-age population, but does not give the source of this figure. Whether it is 2.5% or 10%, it appears that R4 can be substantially different from R3.
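The four definitions above can be sketched numerically. The counts below are hypothetical placeholders (not official statistics for any country), chosen only to show how the successive restrictions of the denominator make the four ratios diverge.

```python
# The four turnout ratios defined above, computed from hypothetical counts.
# All numbers are illustrative placeholders, not official statistics.

V = 1_000_000                     # number of voters (votes actually cast)
voting_age_pop = 2_000_000        # residents over the voting age
voting_age_citizens = 1_700_000   # voting-age residents who are citizens
registered_voters = 1_400_000     # citizens actually registered to vote
eligible_voters = 1_350_000       # registered voters not barred from voting

R1 = V / voting_age_pop
R2 = V / voting_age_citizens
R3 = V / registered_voters
R4 = V / eligible_voters

for name, r in [("R1", R1), ("R2", R2), ("R3", R3), ("R4", R4)]:
    print(f"{name} = {r:.2f}")
```

Since each denominator is a subset of the previous one, R1 < R2 < R3 < R4 always holds; with these placeholder counts the gap between R1 and R4 is about 24 percentage points, which illustrates why a compilation must state which definition it uses.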

Therefore, to be really useful and trustworthy, a comparative compilation should provide detailed data for ALL these different categories. Depending on their respective objectives, researchers may wish to use any of the previous ratios, and it is therefore important that they have all the relevant data. Just to emphasize that one must keep one's eyes wide open when using data, even when they come from respectable institutions such as, in the present case, the US Bureau of the Census, it can be mentioned that the CPS 2000 source cited above is not really accurate. In fact, it is even misleading. To begin with, it must be realized that these data were collected through a survey of about 47,000 households (i.e. 4 in 10,000) in which interviews were conducted. Such data are therefore affected by a sampling error, which is duly documented. Thus, for the state of Alabama the number of voters at the election of November 2000, as derived from the answers given in the interviews, is reported as (in thousands): 1,953 ± 85. However, the real number of votes cast at this election was only 1,666 (Statistical Abstract of the United States 2007, p. 243), which shows that the survey data over-estimate the number of voters by as much as 17%. This observation is not specific to Alabama. For the whole country the survey gives 110,826 ± 697 voters, as compared with 105,594 votes cast (a relative error of 5.0%). In other words, some people who declared that they had voted in fact did not vote. This is perhaps because the persons who declared that they did not vote were asked why; the reasons given range from ``too busy'' to ``ill'' or ``not interested''. Thus, in order to avoid embarrassing questions, it was simpler to declare that one had voted. In physics a systematic error of 17% would be considered unacceptable.
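The size of these discrepancies can be checked with a few lines of arithmetic. The figures are the ones quoted above (numbers of voters, in thousands); the function name is ours, introduced only for this sketch.

```python
# Relative over-estimation of turnout by the CPS 2000 survey,
# using the figures quoted in the text (numbers of voters, in thousands).

def overestimate_pct(reported, actual):
    """Percentage by which the reported count exceeds the actual count."""
    return 100.0 * (reported - actual) / actual

alabama = overestimate_pct(1_953, 1_666)        # survey estimate vs. votes cast
whole_us = overestimate_pct(110_826, 105_594)

print(f"Alabama: {alabama:.1f}%")   # ~17%
print(f"USA:     {whole_us:.1f}%")  # ~5%
```

Note that the Alabama discrepancy (about 17%) is more than three times the documented sampling error of ±85 thousand (about 5% of the reported count), which is what marks it as a systematic rather than a sampling error.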

We have described this case in some detail in the hope that it will convince our readers that great vigilance is required, especially when working with data provided by surveys. In the present case we were lucky enough to be able to make a check using data from another source, but this is not always possible.

This website will provide data on interaction strengths in different countries. To make them acceptable, the data for each individual country must be accurate, and for comparison purposes they must in addition represent the same variable in each country. It follows from the previous discussion that, in order to ensure these two conditions, one must: (i) ascertain the accuracy of the data at the level of each country; (ii) provide detailed disaggregated data so that users can check for themselves the comparability of the data they are using. These conditions are only seldom realized in the international databases about voter turnout that are currently available. For instance, the International IDEA Voter Turnout Website of the ``International Institute for Democracy and Electoral Assistance'' at http://www.idea.int/vt provides no information about the number of ineligible voters.