Is the Central Valley of Costa Rica a genetic isolate?

In the last decade, the Costa Rican Central Valley (CRCV) population has received considerable scientific attention, in part because of its particularly interesting population structure. Two different and contradictory explanations have emerged: (1) A European-Amerindian-African admixed population, with some regional genetic heterozygosity and moderate degrees of consanguinity, similar to other Latin American populations. (2) A genetic isolate, with a recent founder effect of European origin, genetically homogeneous, with a high intermarriage rate and a high degree of consanguinity. Extensive civil and religious documentation, available since the settlement of the current population, allows wide genealogy and isonymy studies that are useful in the analysis of both hypotheses. This paper reviews temporal and spatial aspects of endogamy and consanguinity in the CRCV as a key to understanding population history. The average inbreeding coefficients (α) between 1860 and 1969 show a general decrease over time. Consanguinity in the CRCV population is not homogeneous, and it follows a variable geographic pattern. Results indicate that endogamy frequencies are high but in general are not correlated with α values. The general tendency shows consanguinity decreasing over time and from rural to urban communities, repeating the tendencies observed in other countries with the same degree of development and following the general Western World tendency. Few human areas or communities in the world can be considered true genetic isolates. As shown, during the last century the CRCV population has had consanguinity values that definitively do not match those of true genetic isolates. A clear knowledge of the Costa Rican population genetic structure is needed to explain the origin of genetic diseases and its implications for the health system. Rev. Biol. Trop. 52(3): 629-644. Epub 2004 Dic 15.

Few studies have been made on the genetic history of the Costa Rican general population, even though this majority population is today in clear expansion. As a result, valuable information about its genetic variability and origins is missing. We are also losing the possibility of applying such information in research and in solving genetic epidemiology and health problems.
Two different explanations of the CRCV population structure are emerging. Curiously, they are contradictory and probably mutually exclusive:

(1) An admixed population
On the one hand, diverse ethnohistoric (Thiel 1902, Sanabria 1957, Meléndez 1982, 1985, Acuña-León and Chavarría-López 1991, Meléndez Obando 1993, 1997, Madrigal and Ware 1997) and genetic sources have described the CRCV population as having a multiracial origin, the product of an important admixture between ethnic groups from Europe, Africa and the American continent (Roberts 1978, Morera and Barrantes 1995, Morera et al. 2003), with a regional genetic heterozygosity that does not support the idea of genetic homogeneity or the idea that this population is very different from other Costa Rican regional populations. In summary, the CRCV population is similar to other Latin American populations (Sans 2000, Salzano and Bortolini 2002, Salzano 2004) and it has a moderate degree of consanguinity (Freire Maia 1968, Barrantes 1978, Zumbado and Barrantes 1991, Madrigal and Ware 1997).

(2) A genetic isolate
On the other hand, by analogy with other population models, it has been postulated that the CRCV population is genetically homogeneous (Uhrhammer et al. 1995, Freimer et al. 1996b), with a recent founder effect (León et al. 1992, Freimer et al. 1996a, 1996b) that is specifically of European origin (León et al. 1992, Saborío 1992, Freimer et al. 1996a). The ethnic admixture has been mentioned (Freimer et al. 1996b), but it is generally considered that it can be removed in genetic studies (Frants 1999). The CRCV population is considered genetically isolated (Uhrhammer et al. 1995, Freimer et al. 1996a, 1996b), with a high intermarriage rate (Meléndez 1982, Stone 1982, Saborío 1993) and with a high degree of consanguinity (Freimer et al. 1996b).
This second hypothesis is frequently present in recent literature (McInnes et al. 1996, Reus and Freimer 1997, Shah et al. 1997, Telatar et al. 1998b, Escamilla et al. 1999, 2001, Auger et al. 1999, Frants 1999, Bech-Hansen et al. 2000, Escamilla 2001, Garner et al. 2001, Sobacchi et al. 2001, Ophoff et al. 2002, Carvajal-Carmona et al. 2003, Venegas et al. 2003, Mathews et al. 2004). The two hypotheses are contradictory, and therefore more research is needed on the structure of the Costa Rican population. Fortunately, extensive civil and religious documentation that includes biodemographic data exists almost since the establishment of the current population. This allows isonymy studies and wide genealogical reconstructions (see Meléndez Obando 2004), which are very useful in the analysis of both hypotheses. This review analyzes the geographical-spatial aspects of endogamy and consanguinity in the Central Valley of Costa Rica as a key to understanding population history. Such findings should be useful to answer the question: Is the Costa Rican Central Valley a genetic isolate?

CONSANGUINITY STUDY
Family and population consanguinity can be estimated by means of the inbreeding coefficient (F). This coefficient expresses the probability that an individual carries, by inheritance, two alleles at a given locus that are identical because they have a common origin (Crow and Mange 1965, Salzano and Bortolini 2002). Genealogy is the classical method to study consanguinity (inbreeding) in historic populations. This method allows the analysis of at least three generations (Cavalli-Sforza and Bodmer 1971). Data are obtained from civil or religious records that compulsorily report consanguineous marriages along with other biodemographic variables (Zumbado and Barrantes 1991). Anthropologists have also relied on the isonymy method, which is based on the frequency of marriages between spouses with the same surname in a population (Madrigal and Ware 1997). Furthermore, if a historic population is studied for a sufficiently long period, it is possible to establish whether the endogamy and consanguinity levels are stable during the study period or whether they change over time (Madrigal and Ware 1997).
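The isonymy idea can be sketched in a few lines of Python. The snippet below counts same-surname marriages and converts their proportion P into a rough inbreeding estimate with the relation F ≈ P/4 usually attributed to Crow and Mange (1965). This is only an illustration: the function name `isonymy_inbreeding` is hypothetical, the surname pairs are invented, and the separation into random and non-random components used in the cited studies is omitted.

```python
def isonymy_inbreeding(marriages):
    """Estimate F from marital isonymy as P / 4.

    `marriages` is a list of (groom_surname, bride_surname) pairs.
    P is the share of marriages in which both surnames coincide.
    """
    if not marriages:
        raise ValueError("at least one marriage is required")
    same = sum(
        1
        for groom, bride in marriages
        if groom.strip().lower() == bride.strip().lower()
    )
    p = same / len(marriages)
    return p / 4


# Invented records, for illustration only (not data from the cited studies).
sample = [
    ("Mora", "Mora"),
    ("Vargas", "Solano"),
    ("Rojas", "Castro"),
    ("Solano", "Mora"),
]
print(isonymy_inbreeding(sample))
```

Because surnames can coincide without recent common ancestry, an estimate of this kind is generally read as an upper bound relative to values derived from dispensations.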
Various consanguinity papers on the Costa Rican Central Region populations have been published. Freire Maia's (1968) paper was pioneering but general and preliminary; it determined α = 114 × 10⁻⁵ for a diocese. Two specific studies analyzed in detail the Dota parish (Barrantes 1978) and the Escazú parish (Madrigal and Ware 1997), both inside the Central Valley. In turn, Zumbado and Barrantes (1991) estimated consanguinity in the central provinces of Costa Rica (CRCV) and its evolution during the last 100 years, as will be detailed further on.

MATERIALS AND METHODS
For this review, we compiled published values of inbreeding coefficients (α) from the five zones into which the four central provinces of Costa Rica were operatively divided (CRCV: Alajuela, Cartago, Heredia and San José) and from the 44 parishes that compose them (distribution in Fig. 1). In general, the data came from the marriage files, dispensation books and marriage books in the Metropolitan Curia Archive, and from the Catholic Church parish archives, as described in detail in the original papers (Barrantes 1980, Zumbado and Barrantes 1991, Madrigal and Ware 1997). The three studies covered the period from 1800 to 1969 unevenly, and therefore only the period from 1860 to 1969 is methodologically comparable. For this period, the inbreeding coefficient for each population was calculated with the formula α = Σ Nr Fr / N, where Nr is the number of marriages with coefficient Fr, and N is the total number of marriages studied. A consanguinity distribution map of the Costa Rican Central Valley was drawn by interpolating the inbreeding coefficient (α) at the nodes of a grid of 1000 points, of which 44 are real. Following Calafell and Bertranpetit (1994), the value of each node was given by the average of the surrounding points, weighted by the inverse square of the distance between the node and each real sample. Interpolation was calculated with the PC SURFER program (Golden Software, version 4.15). The interpolated surface is a spherical rectangle limited by the parallels 9°30'N and 10°20'N and by the meridians 83°40'W and 84°40'W. Once the interpolated surface was created, the SURFER TOPO program was used to represent the three-dimensional surfaces topographically with flat level curves. The result of this procedure is a continuous three-dimensional surface (two dimensions are geographical and the third is the inbreeding coefficient). It can be understood as an undulating plot or as a landscape with valleys (minimum frequencies) and mountains (maximum frequencies). The objective of these maps is to visualize the inbreeding coefficients in geographical space. As there are no well-established clinal tendencies, the value interpolated for an area without direct study cannot be considered an expected value (Calafell 1995). We highlight two level curves that divide the surface into three operative areas: high inbreeding areas (α > 500), middle inbreeding areas (300 < α < 500) and low inbreeding areas (α < 300).
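The calculation described in this section can be illustrated with a short Python sketch. It computes α as the marriage-weighted mean of F, interpolates a node value by inverse-square-distance weighting, and assigns the three operative classes. It is a sketch under stated assumptions, not the procedure implemented in SURFER: the function names and sample numbers are hypothetical, distances are treated as planar, all samples (rather than only neighbouring ones) contribute to each node, and the treatment of values exactly equal to 300 or 500 is an arbitrary choice because the text leaves it open.

```python
import math


def mean_inbreeding(counts_by_f):
    """alpha = sum(N_r * F_r) / N, where N counts every marriage studied."""
    total = sum(counts_by_f.values())
    if total == 0:
        raise ValueError("no marriages supplied")
    return sum(f * n for f, n in counts_by_f.items()) / total


def idw(x, y, samples, power=2):
    """Inverse-distance-weighted value at (x, y).

    `samples` is a list of (xi, yi, value) tuples; weights are 1 / d**power.
    """
    num = den = 0.0
    for xi, yi, value in samples:
        d = math.hypot(x - xi, y - yi)  # planar distance (assumption)
        if d == 0:
            return value  # node coincides with a sampled parish
        w = 1.0 / d ** power
        num += w * value
        den += w
    return num / den


def inbreeding_class(alpha_e5):
    """Classify alpha (in units of 1e-5) using the 300 and 500 level curves.

    Boundary values fall into the lower class here (assumption).
    """
    if alpha_e5 > 500:
        return "high"
    if alpha_e5 > 300:
        return "middle"
    return "low"


# Invented parish: 900 unrelated couples (F = 0), 40 first-cousin
# marriages (F = 1/16) and 60 second-cousin marriages (F = 1/64).
counts = {0.0: 900, 1 / 16: 40, 1 / 64: 60}
alpha = mean_inbreeding(counts)
print(round(alpha * 1e5, 2), inbreeding_class(alpha * 1e5))

# Invented node halfway between two sampled parishes.
parishes = [(0.0, 0.0, 100.0), (2.0, 0.0, 300.0)]
print(idw(1.0, 0.0, parishes))
```

Repeating the `idw` call over every node of a regular grid would give the kind of surface from which level curves are drawn.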

CONSANGUINITY IN THE CRCV
The average inbreeding coefficients (α) between 1860 and 1969 in the four CRCV provinces show a general reduction over time. The tendency is clearer from the 1940-1949 period until the last period, when the consanguinity coefficient is relatively low (Fig. 2). The zones of San José 1 and Heredia show the highest consanguinity values. It has been found that in the CRCV most of the consanguineous unions occur between first-degree and second-degree cousins, which accounts for the total value of α (Zumbado and Barrantes 1991) and repeats the tendencies observed in other countries (Cavalli-Sforza and Bodmer 1971). In general, these results contradict the non-quantitative belief that the CRCV population has a high degree of consanguinity (Freimer et al. 1996b).
Although the general tendency shows consanguinity decreasing over time in the four provinces, a detailed analysis of the geographic space shows that the inbreeding coefficients found in each parish vary significantly among them (Fig. 3). Remarkably, the consanguinity of the CRCV population is not homogeneous, and the pattern it shows is to some extent related to orography.
Analyzed by parish, marriages tend to be random in the main cities of the Valley. Nevertheless, there are some localities in the central region of Costa Rica whose consanguinity values are higher, as is the case of San Pedro de Poás, Belén, San Isidro de Heredia and Acosta (Fig. 3). During the middle of the 19th century the first three towns had inbreeding values as high as those of many genetic isolates. In these specific regions consanguinity could be determined by groups of families who practiced consanguineous marriages motivated by the lack of mates, as a result of high emigration towards other parts of the country, including the Central Valley (Zumbado and Barrantes 1991). This was the case of the Dota region (Barrantes 1978), as will be considered further on.

THE GENEALOGICAL APPROACH
Several studies have made extensive genealogical analyses in the CRCV population in an effort to identify genes associated with simple and complex Mendelian diseases (León et al. 1981, León et al. 1992, Frants 1999, Solís 2000, Leal et al. 2001, Berghoff et al. 2004). The genealogy of a healthy CRCV farmer is shown in Fig. 4 (BM, unpublished data). It shows all the consanguineous unions found among his ancestors between the 16th and 20th centuries. He is indirectly tied to the CMT and BD mapping efforts made in Costa Rica. He came from a parish with an average inbreeding value (α = 367 × 10⁻⁵). It is possible to detect a relationship between the consanguinity cycles and the waves of agricultural expansion within the Central Valley. This type of pedigree, when read without a quantitative analysis (see León and Fournier 2000), may help explain the overestimated importance that has been attributed to consanguinity in this population. In any case, this pedigree illustrates the possible uses of Costa Rica's civil, hospital and ecclesiastic archives for genetic and anthropological investigation (see Meléndez Obando 2004). It also emphasizes one of the comparative advantages of this population for genetic mapping studies.

ENDOGAMY IN THE CRCV
Endogamy is the proportion of husbands and wives born in the same group of a defined type. The endogamy analysis between 1860 and 1969 in the four provinces of the CRCV shows that the endogamy percentage is high and shows little temporal variation, with a very slight tendency to decrease (Fig. 5). But when the data collected in the four provinces are considered together, there is no significant relationship between endogamy and the consanguinity value (r=0.21; p>0.05). This pattern suggests the presence of moderate emigration and a predominance of unions between couples born in the same locality, but with different types of consanguineous union. It also suggests that unions of close relatives (for example uncle-niece), which are indicators of high consanguinity in a population, were few. In conclusion, what has changed over time are the types of consanguineous unions, which have become less frequent, thereby decreasing the inbreeding coefficient (Zumbado and Barrantes 1991).
Many studies indicate that Costa Rica's populations, especially those of the central region, are highly endogamous (Meléndez 1982, Stone 1982, Saborío 1993, Frants 1999); apparently this is a genealogical deduction. As shown in this paper, the quantitative analysis of endogamy over the last century in the four provinces of the CRCV corroborates this belief. The most important genetic consequence of maintaining endogamy while simultaneously reducing consanguinity, considering the number of generations spent in the region (15-20), is the potential increase in carriers of some traits in the heterozygous condition. As is well known, the proportion of heterozygous individuals is 8 to 10 times higher than the proportion of recessive homozygotes. Continuous breeding between non-consanguineous carriers, or carriers with a distant consanguinity, can rise if endogamy is maintained, increasing the risk of recessive homozygotes with hereditary illnesses or with genes that predispose the individual to complex disorders. At the same time, this characteristic represents an advantage for genetic mapping studies, because the probability of finding homozygotes with identical-by-descent alleles increases.
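The relation between carriers and recessive homozygotes can be made concrete with a small worked example. The Python sketch below uses the standard single-locus expressions with an inbreeding coefficient F (heterozygotes 2pq(1 − F), recessive homozygotes q² + Fpq); random mating with F = 0 gives Hardy-Weinberg proportions. These assumptions are not stated in the text, the function names are hypothetical, and the allele frequencies are illustrative only. Under these assumptions the carrier-to-homozygote ratio equals 2p/q when F = 0, so an 8- to 10-fold ratio would correspond to q of roughly 0.17 to 0.2, and rarer alleles give larger ratios.

```python
def genotype_frequencies(q, f=0.0):
    """Return (heterozygous, homozygous recessive) frequencies.

    q is the recessive allele frequency and f the inbreeding coefficient;
    f = 0 reduces to Hardy-Weinberg proportions.
    """
    if not 0.0 < q < 1.0:
        raise ValueError("q must lie strictly between 0 and 1")
    p = 1.0 - q
    carriers = 2.0 * p * q * (1.0 - f)
    affected = q * q + f * p * q
    return carriers, affected


def carrier_ratio(q, f=0.0):
    """Ratio of heterozygous carriers to recessive homozygotes."""
    carriers, affected = genotype_frequencies(q, f)
    return carriers / affected


# Illustrative allele frequencies, not estimates for any real population.
for q in (0.2, 0.05, 0.01):
    print(q, round(carrier_ratio(q), 1))

# Offspring of first cousins (F = 1/16) versus random mating for a rare allele.
print(genotype_frequencies(0.01, 1 / 16)[1] / genotype_frequencies(0.01)[1])
```

The last line shows why consanguinity matters most for rare alleles: the term Fpq dominates q² when q is small.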
Two parishes from the Costa Rican Central Valley have been analyzed for prolonged and continuous periods of time. Both populations have particularities that deserve reviewing.

SPECIFIC STUDIES: DOTA
Barrantes (1978) carried out a consanguinity and population structure study in the Dota parish for a continuous period of 75 years (1888-1962) with a sample of 1068 marriages, of which 46.59% were consanguineous. The endogamy percentage is high, and greater in the consanguineous marriages (80%) than in the non-consanguineous marriages (61%). Endogamy tends to increase over time in both types of union. The inbreeding coefficient shows fluctuations (α = 476-194 × 10⁻⁵) (Fig. 1), with an increase from 1888 to 1917 followed by a decrease, and remains high in non-migratory marriages, with a greater frequency of unions between second cousins.
The marital and migration distances are short and diminish in the final periods of the study. There is a positive correlation (r= 0.71; p<0.05) between these distances in the consanguineous marriages but not in the non-consanguineous ones. The marital distances are smaller in consanguineous marriages. Immigration is minimal and limited to short distances, whereas post-marital emigration is intense (37%) and covers greater distances. The emigration behavior is identical in both types of marriages.
In this parish, the inbreeding coefficient substantially increased when analyzing only the emigration of the inhabitants. Nevertheless, this region is considered "a rejecting area", meaning that emigration is high and immigration is low. This characteristic is probably uncommon in other communities of Costa Rica's central region (Barrantes 1978).

SPECIFIC STUDIES: ESCAZÚ
Madrigal and Ware (1997) studied the inbreeding levels in the small, clearly established breeding population of Escazú, Costa Rica, during 1800-1840 and 1850-1899. Inbreeding was studied by analysis of ecclesiastical dispensations and by two isonymy methods (Crow and Mange 1965, Pinto-Cisternas et al. 1985). As expected, the dispensation inbreeding coefficients were lower than those obtained through isonymy. However, the three methods indicate that consanguinity increased in the community during the second part of the nineteenth century. Madrigal and Ware (1999) also studied the mating pattern in Escazú during the same period. They found that a large proportion of marriages involved individuals who were members of long-standing or core families. Indeed, 27 families provided 56% of all consorts throughout the period under study. When new surnames appeared in the records (presumably as a result of immigration), they were introduced more frequently by males, indicating that more males than females migrated into the community. The core families did not mate preferentially among themselves but appear to have readily accepted the migrants. Indeed, the greatest preponderance of repeated-surname marriages was that expected by chance. However, non-random surname repetition is evident when marriages between non-illegitimate consorts are analyzed. That is, the frequency of repeated-pair surname marriages is statistically significant in marriages involving brides and grooms who carried two surnames. Interestingly, significant departures from random repetition of surnames occurred during the decade in which the great cholera epidemic affected Costa Rica and during the decade following it. This departure from panmixia supports the notion that mating patterns were altered as a result of the epidemic, a suggestion the authors made previously when they reported that inbreeding increased in these same decades (Madrigal and Ware 1997).

CRCV: A GENETIC ISOLATE?
Few human areas or communities in the world can be considered genetic isolates: relatively small populations that present little or no genetic interchange with other populations. Generally these are isolated due to geographic, religious, ethnic or socio-psychological reasons. Isolation can be temporary, particularly given the 20th-century tendency towards the "breaking up of isolates" (Vogel and Motulsky 1986). The amount of existing consanguinity in human populations is very small; nevertheless, genetic isolates characteristically present extreme consanguinity values, with coefficients of 1 x 10 5 , but they are exceptional cases. The populations of Andhra Pradesh in the south of India (α = 1980-3200 × 10⁻⁵), the Samaritans from Israel and Jordan (α = 4340 × 10⁻⁵), the islanders of Tristan da Cunha (α = 3650 × 10⁻⁵), and the Pennsylvanian Dunkers (α = 2540 × 10⁻⁵) are remarkable cases. Even in small isolated populations the consanguinity values are not necessarily elevated (Cavalli-Sforza and Bodmer 1981).
It has been frequently stated that the CRCV population is genetically isolated, without quantitative data being provided to confirm that statement (Uhrhammer et al. 1995, Freimer et al. 1996a, Freimer et al. 1996b, McInnes et al. 1996, Frants 1999, Escamilla et al. 1999, Escamilla 2001, Garner et al. 2001, Ophoff et al. 2002, Carvajal-Carmona et al. 2003, Mathews et al. 2004). As shown by the quantitative studies reviewed in the present paper, the Costa Rican Central Valley population presents consanguinity values that definitively do not match those of true genetic isolates (Fig. 6). The average population is not an exception, and its consanguinity values are therefore similar to those of other countries with the same degree of development. Thus, in the analyzed populations of Latin America (n=15), the samples present two clear tendencies in the reduction of consanguinity levels: 1) over time and 2) from rural to urban communities (Salzano and Bortolini 2002). This follows the general Western World tendency towards decreasing consanguinity. This is the expected result of the fast disintegration of genetic isolates that characterizes the 20th century, a tendency that is accelerated by the development of travel routes and mass media, which increase migration and inter-ethnic admixture (Vogel and Motulsky 1986).
Additionally, it must be mentioned that DeLisi et al. (2002) were doubtful about the genetic isolate statement, after their fruitless attempt to locate some genomic linkage in the schizophrenic population. They also failed to detect consanguinity in the parental generation of their schizophrenia patients, although there was some evidence of consanguinity between the descendants.
However, we may consider that there has recently been an implicit rectification by some of the main adherents of the "genetic isolate" hypothesis (as applied to the CRCV population). They found a significant Amerindian admixture in their Costa Rican sample (Carvajal-Carmona et al. 2003), despite a selection bias towards "Hispanic" individuals with Spanish surnames, as they admitted several times (see León and Fournier 2000). It does not matter if the CRCV population continues to be indistinctly classified as "genetically isolated" (Mathews et al. 2004) or as a "relatively closed population" (Freimer and Sabatti 2004).
The terms "consanguinity" and "genetic isolate", as applied to the CRCV population, must be regarded as introductory ornaments in almost all the papers on the identification of genes associated with complex disorders. They do not represent conclusions from those studies. We can then ask: Why has the CRCV population been so lightly compared with other population models such as the Amish, Iceland, Finland, Quebec (French Canadians), and Newfoundland?
The only reasonable explanation that comes to mind is an "Iceland effect" in the gene mapping literature, where "Iceland-like populations" are quite popular. Nevertheless, a mistaken statement does not become true just because it is repeated many times in the scientific literature.
We agree with the articles cited above on one point: the Costa Rican population is an excellent study model for the search for genes associated with simple and complex diseases. The reason is that this population has a combination of practical characteristics and a singular genetic structure. In gene mapping investigations the only really important biological characteristics are: (a) the existence of linkage disequilibrium, and (b) a certain degree of genetic homogeneity at the level of specific alleles, associated with pathologies that are themselves specific. Additionally, these characteristics are useful: (c) a recent origin of the population and (d) an important demographic expansion. The Costa Rican general population fulfills all of these criteria (especially that of the CRCV).
Nevertheless, the verified admixture between Europeans, Africans and Native Americans that originated this population (Morera and Barrantes 1995, Carvajal-Carmona et al. 2003, Morera et al. 2003) is probably the main force that has caused linkage disequilibrium. Of course, it is also possible that the founder effect has participated in the genetic modeling of this population. The founder effect has been demonstrated in small localities within the Central Valley (León et al. 1981, León et al. 1992, Solís 2000, Leal et al. 2001); nevertheless, its participation must be verified allele by allele. For example, the only disease studied so far in Costa Rica that shows a clear founder effect is Wilson's disease: 61% of patients share an identical mutation and its surrounding haplotype at the WD gene (ATP7B), while in North America the most common mutation occurs in only 38% of patients (Shah et al. 1997, Escamilla 2001).

APPLICATION: EVOLUTIONARY MEDICINE
Leaving aside the controversy, knowledge of the Costa Rican population structure in terms of consanguinity and its possible implications for health (by favoring the appearance of recessive illnesses or the susceptibility to develop complex diseases) is a necessity if we want to advance towards the new evolutionary medicine. This knowledge will strengthen the health system in the coming years.
In addition, the geographic patterns of genetic diversity allow us to make inferences about population history and about the evolution of hereditary diseases, whose effects can be probed with other approaches (Menozzi et al. 1978, Barbujani 2000). Many questions concerning the evolution of normal and pathological characters demand answers, and a prudent and systematic application of geographic analysis could become a correct way to approach them (Barbujani 2000). Consequently, any genetic study in the Costa Rican Central Valley that uses specific markers to search for genes, to analyze the epidemiology of genetic diseases, or to study individual susceptibility to diseases and drugs should ideally take population structure into account.
Singular populations, like the CRCV with its particular characteristics, represent natural experimental models that provide excellent opportunities for the study of normal and pathogenic processes, risk factor clarification and the study of origin and dissemination of important human diseases.

ACKNOWLEDGMENT
The authors are deeply grateful to Zaidett Barrientos for her help to improve this paper.