Application of the BMWP-Costa Rica biotic index in aquatic biomonitoring: sensitivity to collection method and sampling intensity

The use of aquatic macroinvertebrates as bio-indicators in water quality studies has increased considerably over the last decade in Costa Rica, and standard biomonitoring methods have now been formulated at the national level. Nevertheless, questions remain about the effectiveness of different methods of sampling freshwater benthic assemblages, and how sampling intensity may influence biomonitoring results. In this study, we compared the results of qualitative sampling using commonly applied methods with a more intensive quantitative approach at 12 sites in small, lowland streams on the southern Caribbean slope of Costa Rica. Qualitative samples were collected following the official protocol using a strainer during a set time period and macroinvertebrates were field-picked. Quantitative sampling involved collecting ten replicate Surber samples and picking out macroinvertebrates in the laboratory with a stereomicroscope. The strainer sampling method consistently yielded fewer individuals and families than quantitative samples. As a result, site scores calculated using the Biological Monitoring Working Party-Costa Rica (BMWP-CR) biotic index often differed greatly depending on the sampling method. Site water quality classifications using the BMWP-CR index differed between the two sampling methods for 11 of the 12 sites in 2005, and for 9 of the 12 sites in 2006. Sampling intensity clearly had a strong influence on BMWP-CR index scores, as well as perceived differences between reference and impacted sites. Achieving reliable and consistent biomonitoring results for lowland Costa Rican streams may demand intensive sampling and requires careful consideration of sampling methods. Rev. Biol. Trop. 62 (Suppl. 2): 275-289. Epub 2014 April 01.

Aquatic biological monitoring (i.e., biomonitoring) has expanded considerably in the past decade, and aquatic macroinvertebrates have become important indicators of environmental quality. Their relatively sedentary behavior, abundance and ease of taxonomic identification are key features that make this group one of the most widely used, compared to other aquatic organisms such as fish, amphibians or diatoms (Rosenberg & Resh, 1993). Moreover, biomonitoring using aquatic macroinvertebrates has the advantage of giving a retrospective view of what happened at the site after a disturbance occurred, whereas chemical analyses usually reflect conditions only at the moment of sampling (Alba-Tercedor, 1996). However, despite the widespread use of macroinvertebrates in biomonitoring, criteria vary widely regarding what sampling equipment should be used, the number of samples to collect, the appropriate taxonomic level for analysis, and which metric or index should be used to quantify the magnitude and intensity of disturbance (Diamond, Barbour & Stribling, 1996; Hawkins, 2006; Herbst & Silldorff, 2006).
Comparisons between quantitative and qualitative methods have often been conducted to contrast the costs and effectiveness of techniques that are used to describe benthic macroinvertebrate assemblages. For example, Paaby, Ramírez & Pringle (1998) compared kitchen sieve samples with Surber samples in lowland tropical streams. Lenat (1988) and Kerans, Karr & Ahlstedt (1992) compared qualitative and quantitative macroinvertebrate sampling methods for biological monitoring in North American streams. Storey, Edward & Gazey (1991) assessed differences between Surber and kick methods for describing the benthic fauna in Australia. Buss & Borges (2008) tested the efficiency of kick screen nets and Surber samples with different mesh sizes with the goal of standardizing rapid bioassessment protocols in Brazil. In these studies, significant differences between methods were often observed. Qualitative methods can be more efficient for collecting taxa present at a site, but may also provide less statistical power for detecting differences in macroinvertebrate assemblages among sites (Kerans et al., 1992). As a result, there is an ongoing need to evaluate trade-offs between methods used to sample benthic macroinvertebrates and evaluate water quality.
In Central America, a variety of biotic indices have proven effective in determining water quality (e.g., Fenoglio, Badino & Bona, 2002; Sermeño-Chicas et al., 2010). In Costa Rica, the Biological Monitoring Working Party-Costa Rica (BMWP-CR) biotic index was adopted in Executive Decree No. 33903-S-MINAE (Ministerio de Ambiente y Energía, Propuesta de Ley del Recurso Hídrico, 2007) for the assessment of the environmental quality of waters. The decree, in addition to adopting the index for the fauna of the country and assigning scores to each taxon, also recommends several methodologies for the collection of macroinvertebrates depending on the type of water body being assessed. Despite the decree, some studies have shown the importance of continuing to evaluate factors which can influence the outcome of the index, such as sampling time (Maue & Springer, 2008) and the use of different methods for collecting aquatic macroinvertebrates (Stein, Springer & Kohlmann, 2008).
Consequently, we consider it appropriate to continue in this direction, evaluating the different options that can be used in biomonitoring, in order to obtain more accurate and appropriate assessments of the environmental quality of water. Thus, the aims of this study were to: 1) compare the sensitivity of the BMWP-CR biotic index using two sampling methodologies: qualitative sampling with a strainer and replicate Surber samples, 2) determine the main differences in abundance and taxonomic richness obtained with each method and their influence on the result of the index, and 3) examine the sensitivity of the BMWP-CR biotic index to sampling intensity using the replicate Surber samples. We compared the strainer with the Surber sampler because the two methods work in very different ways. The strainer is a qualitative method highly dependent on operator experience, and can be used to sample a wide variety of habitats. It is also one of the recommended and commonly used methods in biomonitoring studies in Costa Rica. In contrast, the Surber is a quantitative method that is less dependent on operator experience, but also offers less flexibility for sampling different stream habitat types.

Study site:
The study was conducted in the province of Limón, Costa Rica (9°35'N, 82°40'W). The area is classified as moist tropical forest (Holdridge, 1967) with an annual precipitation of about 2 500 mm (Coen, 1983). The streams studied are tributaries of the lower Sixaola River or Gandoca Lagoon. In total, 12 sampling sites were selected in nine small streams (Fig. 1). At each site, a sampling reach approximately 40 times the mean wetted channel width was established for habitat measurements and macroinvertebrate sampling. The 12 sites included four reference sites in watersheds with extensive forest cover, four sites adjacent to pastures with a forested buffer zone of at least 15 m on each side, and four sites adjacent to pastures without a forested buffer zone (Lorion & Kennedy, 2009). All sites were similar in size and channel slope, and had a high percentage of forest cover in the watershed above the site (Table 1). The substrate in the sampling sites was primarily composed of gravel, pebbles, and sand.
Macroinvertebrate sampling: At each site, aquatic macroinvertebrates were collected in two periods, the first from September to October 2005, and the second from February to April 2006. For the collection of macroinvertebrates in the 2005 period, 10 samples were collected using a Surber sampler with an area of 0.093 m² and 1 mm mesh. Surber samples were taken at randomly selected locations within the sampling reach, with five samples collected from riffles and five samples collected from pools (Lorion & Kennedy, 2009). All material collected was preserved in the field with 95% alcohol, and transported to the laboratory where aquatic macroinvertebrates were separated from the remaining material using a stereoscopic microscope.
On the same day that Surber samples were taken, macroinvertebrates were collected at the same site using a concave mesh kitchen strainer with an aperture diameter of approximately 19 cm and a mesh of ~1 mm. The collection of aquatic macroinvertebrates was conducted by two people during a period of one hour. Macroinvertebrates were collected by disturbing substrates directly upstream from the strainer in riffles and pools, as well as by scooping up substrates and placing the material in a tray so that invertebrates could be sorted out. The collection time was divided equally among the four habitats that dominated the study area: leaf litter and stones in pools, and leaf litter and stones in riffles. The sampling time included both the time to collect material with the strainer in the different aquatic habitats and the separation of macroinvertebrates. The organisms collected were preserved in the field with 70% ethanol for later identification.
In 2006, the same methodology for Surber samples described above was used. However, there was a change in collection methodology with the strainer. Instead of dividing the time equally between four habitats, macroinvertebrates were collected in all aquatic habitats at the site and the sampling time in each habitat was adjusted based on the relative abundance of this habitat. As in the 2005 sampling period, two people collected macroinvertebrates during a period of one hour. However, in 2006 a third person assisted in the separation of macroinvertebrates from other material during the time of sampling. In one sampling site in 2006, macroinvertebrate collection with the strainer occurred one week after sampling using the Surber. This was due to a heavy rainstorm that interrupted sampling and subsequent rainfall that maintained stream flows unfit for aquatic macroinvertebrate sampling for a period of four days. All macroinvertebrates collected were identified to the lowest possible level (genus or family for aquatic insects) using Merritt, Cummins & Berg (2008) and Springer, Ramírez & Hanson (2010). The collected material is deposited in the Aquatic Entomology Collection, Museum of Zoology, University of Costa Rica.

Analysis of the biotic index BMWP-CR:
At each site, the BMWP-CR index value was determined based on the presence of macroinvertebrate families and their scores listed in Executive Decree No. 33903-S-MINAE (Ministerio de Ambiente y Energía, Propuesta de Ley del Recurso Hídrico, 2007). The BMWP-CR assigns sites to one of six categories based on the index value: excellent water quality (>120), good water quality (101-120), regular water quality with some contamination (61-100), bad water quality (36-60), bad water quality with a high level of contamination (16-35), and very bad water quality (<15). The relationships between the two sampling methods in abundance, taxonomic richness, and BMWP-CR value were analyzed using Pearson's correlation coefficient. Two-way ANOVA was used to determine whether index scores varied by site type (reference, pasture with forest buffer, and pasture without forest buffer) and sampling methodology. The variables were tested for normality and log-transformed (log10[x + 1]) when necessary. Statistical analyses were run in R (R Development Core Team, 2011, Version 2.13.1).
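The scoring and classification step described above can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' procedure: the family names and scores in `PLACEHOLDER_SCORES` are hypothetical stand-ins for the values listed in the decree, and the function names are our own. Because the text lists "<15" and "16-35", treating a score of exactly 15 as "very bad" is an assumption.

```python
from typing import Dict, Iterable

# Lower bound of each category, taken from the ranges given in the text.
# Assumption: a score of exactly 15 falls in the lowest category.
CATEGORIES = [
    (121, "excellent"),
    (101, "good"),
    (61, "regular, some contamination"),
    (36, "bad"),
    (16, "bad, high contamination"),
    (0, "very bad"),
]


def bmwp_cr_score(families: Iterable[str], family_scores: Dict[str, int]) -> int:
    """Sum the scores of the distinct families present (presence only, not abundance)."""
    return sum(family_scores[f] for f in set(families) if f in family_scores)


def classify(score: int) -> str:
    """Return the water quality category for an index score."""
    for lower_bound, label in CATEGORIES:
        if score >= lower_bound:
            return label
    raise ValueError("score must be non-negative")


# Hypothetical scores for illustration only; real values come from the decree.
PLACEHOLDER_SCORES = {"FamilyA": 9, "FamilyB": 5, "FamilyC": 2}
collected = ["FamilyA", "FamilyA", "FamilyC"]  # repeated families count once
score = bmwp_cr_score(collected, PLACEHOLDER_SCORES)
print(score, classify(score))
```

Because the index depends only on which families are present, any family missed during sampling lowers the score directly, which is why collection method and effort matter for the classification.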
To determine the influence of sampling effort on the BMWP-CR index value, the Surber samples collected in 2006 at two of the sampling sites were used. For the analysis, we selected a reference site and a pasture site that had been impacted by the removal of riparian vegetation and by sedimentation. For each site, we determined the index value based on the macroinvertebrates collected in two, four, six, eight and ten Surber subsamples randomly selected from the overall group of 10 samples collected. Each subsample included an equal number of samples from pools and riffles. We repeated this process ten times to determine the mean and standard deviation for each subsample size.
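The subsampling procedure can be sketched as follows, assuming each Surber sample is represented as the set of families it contained. The function name, the `score_fn` parameter, and the fixed random seed are our own choices for illustration; the original analysis was run in R and its exact implementation is not given.

```python
import random
import statistics
from typing import Callable, List, Sequence, Set, Tuple

Sample = Set[str]  # families found in one Surber sample


def subsample_curve(
    pools: Sequence[Sample],
    riffles: Sequence[Sample],
    score_fn: Callable[[Set[str]], float],
    sizes: Sequence[int] = (2, 4, 6, 8, 10),
    repeats: int = 10,
    seed: int = 0,
) -> List[Tuple[int, float, float]]:
    """Return (n samples, mean score, standard deviation) for each subsample size."""
    rng = random.Random(seed)
    results = []
    for n in sizes:
        half = n // 2
        if n % 2 or half > len(pools) or half > len(riffles):
            raise ValueError(f"cannot draw {n} samples evenly from pools and riffles")
        values = []
        for _ in range(repeats):
            # Draw the same number of pool and riffle samples without replacement.
            chosen = rng.sample(list(pools), half) + rng.sample(list(riffles), half)
            pooled = set().union(*chosen)  # families present in the combined subsample
            values.append(score_fn(pooled))
        results.append((n, statistics.mean(values), statistics.stdev(values)))
    return results
```

Passing `len` as `score_fn` gives a family richness curve; passing a function that sums family scores gives the index curve. With ten samples in total, the largest subsample always contains every sample, so its standard deviation is zero by construction.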

RESULTS
Table 2 shows the differences between the taxa collected with each method. The influence of the collection method was also apparent in the relative abundance of different macroinvertebrate families (Table 3). Over 55% of individuals in Surber samples belonged to Leptophlebiidae, Chironomidae, and Leptohyphidae, while these groups accounted for only 40% of individuals in the samples collected with the strainer. Relatively large organisms, such as snails (Thiaridae) and shrimp (Atyidae), were more abundant in strainer samples.
Family richness in 2005 was higher with the Surber (average 28, range 24-33) than with the strainer (average 16, range 13-19). In 2006, the same trend was found between the Surber (average 26, range 17-35) and the strainer (average 16, range 13-20). Consistent with this difference in family richness, BMWP-CR biotic index scores were significantly higher when based on the Surber than on the strainer (Table 4). In 2005, significant differences were found in index [...] (Fig. 2A) and 2006 (Fig. 2B). Water quality classifications differed between sampling methods for 11 of the 12 sites in 2005, and 9 of the 12 sites in 2006. Most sites were classified as having excellent water quality using the Surber samples, while most sites were classified as regular or good using the strainer method (Fig. 2). Reference sites assessed with the Surber were classified in the same water quality category in both years and scored higher than with the strainer. A positive correlation between the BMWP-CR index scores obtained with the Surber and the strainer was observed in 2005 (Fig. 3A). However, this correlation was not significant in 2006 (Fig. 3B) and was relatively weak in both years.
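As a rough illustration of how the between-method correlation can be computed, the sketch below applies the log10(x + 1) transformation and Pearson's correlation coefficient to paired site scores. The function names are our own, and the score values are invented for illustration; they are not data from this study.

```python
import math
from typing import List, Sequence


def log10_plus_one(values: Sequence[float]) -> List[float]:
    """Apply the log10(x + 1) transformation used for non-normal variables."""
    return [math.log10(v + 1) for v in values]


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson's correlation coefficient for two paired sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must be paired and contain at least two values")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when a variable has zero variance")
    return sxy / math.sqrt(sxx * syy)


# Invented index scores for four hypothetical sites, one pair per site.
surber_scores = [130, 142, 118, 125]
strainer_scores = [85, 99, 70, 92]
r = pearson_r(log10_plus_one(surber_scores), log10_plus_one(strainer_scores))
print(round(r, 2))
```

A high coefficient would indicate that the two methods rank sites similarly even when their absolute scores differ, which is a separate question from whether they assign sites to the same water quality category.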
We did not observe a strong effect due to changes in methodology and the number of collectors in the 2006 period using the strainer. More macroinvertebrates, on average, were collected at each site with the strainer method in 2006, but the number of macroinvertebrate families and average BMWP-CR index scores were very similar to 2005. Across all sites, there were fewer taxa that were only collected in Surber samples and more that were collected with both methods in 2006 compared to 2005 (Table 2). BMWP-CR index classifications differed for 4 of the 12 sites between 2005 and 2006 with both collecting methods.
Finally, sampling effort was found to have a strong influence on BMWP-CR index scores at the reference and impacted sites. Abundance (Fig. 4A), family richness (Fig. 4B) and BMWP-CR index score (Fig. 4C) all increased with the number of Surber samples without reaching an asymptote. At the reference site, family richness and BMWP-CR index curves increased more rapidly than at the impacted site. Abundance of macroinvertebrates in both curves showed a steady increase, but more variability was observed at the reference site. Interestingly, when comparing two Surber samples with the strainer method (Fig. 4), abundance was similar between methods at the reference site, but higher with the Surber at the impacted site. The number of families and BMWP-CR index value were higher with two Surber samples at the reference site, but lower at the impacted site (Fig. 4).

DISCUSSION
Results show that sampling method selection has a large influence on the outcome of the BMWP-CR index. Family richness and BMWP-CR index scores were significantly higher using multiple Surber samples than with the qualitative strainer method, which is one of the most used and recommended methods in aquatic biomonitoring studies (e.g., Beatty, McDonald, Westcott & Perrin, 2006; Maue & Springer, 2008; Ramírez, 2010). Therefore, we recommend caution when using less intensive qualitative methods because, as indicated by our study, they can underestimate macroinvertebrate diversity and water quality. We consider it essential to understand the advantages and limitations of different collecting methods and to properly define the purpose of the study (e.g., environmental impact or ecological studies) when evaluating water quality.
In terms of abundance and taxonomic richness, the higher effectiveness of the Surber sampler is contrary to previous reports. For example, Paaby et al. (1998) found a greater abundance of aquatic macroinvertebrates using the strainer method than the Surber. Nevertheless, it is important to note that they took three replicate Surber samples (0.33 m² total area sampled) and compared them with ten replicates collected with the strainer (0.40 m² total area sampled). We did not attempt to quantify macroinvertebrate density on a per-area basis for the strainer method due to the qualitative nature of sampling that involved disturbing substrates upstream from the strainer, scooping up substrates, and passing the strainer through submerged roots and vegetation. Nevertheless, the abundance of macroinvertebrates in our Surber samples was several times higher than reported by Paaby et al. (1998), and resulted in a much higher abundance overall relative to the strainer method. Paaby et al. (1998) also found that the Surber and strainer were similar in the number of taxa collected, while we consistently found higher family richness with the Surber method. This would be expected given the much larger number of individuals collected with the Surber method in our study, and the sensitivity of taxa richness to sample size (Magurran, 1988). An important difference between our study and that of Paaby et al. (1998) was the way in which macroinvertebrates were separated from other matter in the samples collected with the strainer. In Paaby et al. (1998), macroinvertebrates were separated in the laboratory using a stereoscopic microscope, whereas in our study macroinvertebrates were separated in the field without using a microscope. We decided to separate the strainer samples in the field because this is a common technique in biomonitoring studies, but it is important to recognize that the method of separation can also have effects on the study results.
It is possible that differences between the Surber and strainer methods in our study, including total abundance, richness, taxonomic composition, and the presence of small taxa, were primarily a result of the difference in separation method. Interestingly, in a study of Costa Rican rivers using a similar strainer technique, Maue & Springer (2008) observed higher abundance and taxa richness in samples that were separated in the field during 120 min of sampling than in samples that were taken back to the laboratory for sorting. These contrasting results highlight the importance of site-specific characteristics and collector experience on the results of biomonitoring using qualitative techniques. Consistent with patterns in abundance and richness, the Surber method was also generally more effective for collecting uncommon and rare taxa compared to the strainer method (e.g., Ecnomidae). The presence or absence of rare taxa may have a significant effect on the index score, as many rare groups belong to families with high index values. Storey et al. (1991) found that the Surber method was more effective for collecting rare and uncommon taxa compared with samples collected with a kick net, probably because the Surber involves a greater effort per unit area sampled.
There were other differences between the two collection methods that appear to be associated with the type of habitat sampled with each method. For example, some odonates which live in submerged roots or under leaf litter were rare or absent in Surber samples. In addition, one of the constraints of the Surber method is that it operates in a passive manner and relies on water flow to capture macroinvertebrates (Brooks, 1994). This is why the strainer method may be more effective in habitats with little flow. The Surber sampler is also limited to relatively shallow sites with small to moderate sized substrates, while the strainer can be used to effectively sample a wider array of stream habitats.
Time and effort during collection and sample analysis are two important factors to take into account when selecting sampling methods. In this study, collecting ten Surber samples required two hours, on average, by three operators in the field, while collecting a sample with the strainer required about one hour in total with two to three operators. In the laboratory, the time required to analyze one Surber sample was up to four hours, on average, including the separation of macroinvertebrates from debris (e.g., sand, sticks, and leaves) and taxonomic identification. The laboratory work for the strainer sample, meanwhile, only required identification time, which averaged around three hours. Accordingly, in our work the complete analysis of the macroinvertebrates of a site evaluated with the Surber method required an average of 46 hours, whereas the strainer method required an average of five hours. Stein et al. (2008) present a comparison of the time involved in sample analysis for a D net, which was used in a similar way to a Surber sampler, versus strainer samples. The analysis of material collected with D nets took from 7.1 to 15.1 hours, while material from strainers required only five to seven hours. Importantly, the separation of aquatic organisms from the rest of the material was performed in the laboratory using a microscope, in the same way as in our study. Separating macroinvertebrates from debris in the laboratory using a microscope would have significantly increased the time investment for the strainer method in our study, but may have resulted in more reliable estimates of taxa richness and more accurate BMWP-CR index scores.
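The totals of 46 and five hours can be reproduced under one reading of the figures above: field time counted as person-hours (hours multiplied by the number of operators), four laboratory hours for each of the ten Surber samples, and two operators for the strainer. This interpretation is an assumption on our part, and the function name is hypothetical.

```python
def person_hours(field_hours: float, operators: int,
                 lab_hours_per_sample: float, n_samples: int) -> float:
    """Total effort: field person-hours plus laboratory hours for all samples."""
    return field_hours * operators + lab_hours_per_sample * n_samples


# Surber: 2 h in the field with 3 operators, 10 samples at up to 4 h each in the lab.
surber_total = person_hours(field_hours=2, operators=3,
                            lab_hours_per_sample=4, n_samples=10)

# Strainer: 1 h in the field with 2 operators (assumed), 1 sample at about 3 h in the lab.
strainer_total = person_hours(field_hours=1, operators=2,
                              lab_hours_per_sample=3, n_samples=1)

print(surber_total, strainer_total)  # 46 5
```

Under these assumptions the Surber protocol costs roughly nine times the effort of the strainer protocol per site, with almost all of the difference arising in the laboratory.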
We consider ten Surber samples to provide a good basis for evaluating water quality, mainly due to the homogeneity of the results in each category, and the number and consistency of the taxa. Taxa richness did not appear to reach an asymptote at this level of sampling, however (Fig. 4). Some authors, such as Buss & Borges (2008), suggested that more than six Surber samples should be collected for an adequate assessment of the biological quality of streams in the Atlantic Forest area in Brazil. Carter & Resh (2001) found that most agencies in the USA using a Surber sampler collected between three and eight samples for each site. It is important to continue with studies aimed at this issue in Costa Rica in order to provide a thorough grounding in the appropriate use of the Surber sampler and other quantitative methods for possible use in aquatic biomonitoring studies, since quantitative methods do not appear as an option in the official decree.
Although we observed low correspondence in BMWP-CR index values and water quality classifications between the two collecting methods used in this study, differences in BMWP-CR index scores among the three site types (forested reference sites, sites in pasture with a riparian forest buffer, and sites in pasture without a forest buffer) were evident using both methods in 2005. Index values based on Surber and strainer samples both showed the same gradient of disturbance among site types that was evident in diversity comparisons by Lorion & Kennedy (2009). The same Surber samples were used in both studies, and so the similarity in those results is not surprising. However, the fact that strainer samples in 2005 produced similar results indicates that BMWP-CR index values based on qualitative sampling can provide useful information about disturbance gradients even when most sites have relatively good water quality. More intensive, quantitative sampling may be required to consistently detect these differences, however, as shown by the lack of differences among site types with the strainer samples in 2006.
In conclusion, the BMWP-CR index is sensitive to, and its response depends on, the equipment used (Surber or strainer) and the intensity of the sampling effort. Our study shows that intensive sampling with a Surber sampler resulted in much higher BMWP-CR index scores and different water quality classifications compared to qualitative sampling with a strainer. Finally, we emphasize that there are advantages and limitations to each of the methodologies used in this study, and these should be taken into consideration before starting a biomonitoring study.