Cuba Publications in the Science Citation Index Expanded: publication characteristics, institutions and journals

Introduction: In contrast with other tropical countries, Cuba has been frequently studied from the point of view of scientometrics. It has been reported that Cuban researchers often failed to cite other Cuban researchers or to collaborate with them and that 78 % of the Cuban scientific output is published in Cuban journals and mostly missed by Scopus and the Web of Science. However, there are no recent comprehensive studies of science in Cuba. Objective: To quantify the Cuban scientific output, in all disciplines, until the beginning of 2021. Methods: We analyzed publications from Cuba, dated 1900 to 2021, that were indexed in the Science Citation Index Expanded. Results: We retrieved a total of 23 576 publications, mostly articles. In this particular database, English is the dominant language, and, over time, articles have become longer and gained more authors and references. Numerically, the leading institution is Universidad de La Habana. Research is strongly concentrated around medical subjects. Collaboration teams led by foreign authors have more citations recorded by the database, where the number of Cuban articles has decreased strongly since 2008. Conclusion: Our conclusions only apply to the fraction of Cuban science covered by the Science Citation Index Expanded (under 22 %). We recommend three main improvements: to increase collaboration among Cuban scientists; to expand research areas beyond medical subjects; and to improve the quality of Cuban journals.

https://doi.org/10.15517/rbt.v69i3.46976
In contrast with other tropical countries, Cuba, a Caribbean island of 11 million inhabitants, has been frequently studied from the point of view of scientometrics. A report that covered clinical trials, from 1991 to 2001, found that a mean of 17 articles were published every year (74 % in English), with a mean of seven authors each, and that they appeared in 96 journals from 17 countries (Ruiz et al., 2002). The values for dengue studies were consistent with that report, with a mean of six authors per article, in 45 journals and 12 countries. The second study additionally reported a mean of six citations per article, in total, and a historical increase in the number of articles in the field of biomedicine, locally led by the Instituto de Medicina Tropical "Pedro Kourí" (Arencibia-Jorge et al., 2008). All reports, however, have limited value because American databases do not reflect the reality of Cuban publications, which appear mostly in Spanish and in journals not properly covered by those databases or other indices (Araujo Ruiz et al., 2005).
A decade ago, other studies found that education, just like other areas of research, grew in output from 2003 to 2012 (Cruz Ramírez et al., 2014) and that Scopus and the Web of Science covered poorly the growing scientific output of Cuba, making them insufficient to study Cuban science (Arencibia-Jorge & de Moya-Anegón, 2010).
More recent authors have reported that Cuban researchers often fail to cite other Cuban researchers or to collaborate with them, a result also reported by Zacca-González et al. (2015). Along the same line, a study of the state-owned pharmaceutical company found a strong specialization of research on vaccines, little collaboration and low visibility of publications (Arencibia-Jorge et al., 2016).
For the period 2003-2012, it was reported that Cuban authors were research team leaders only in small studies, published in Spanish and with low citation values in foreign databases. Over time, Cuban researchers working in Europe (mostly students) decrease their collaboration with researchers working on the island, but that collaboration remains numerically significant, with 991 institutions and 58 countries participating (Palacios-Callender & Roberts, 2018).
More recently, a comparative study in several databases estimated that 78 % of the Cuban scientific output is published in Cuban journals, and that, of those published in foreign journals, Scopus missed 33 % and the Web of Science missed 38 % (Galbán-Rodríguez et al., 2019).
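The percentages reported by Galbán-Rodríguez et al. (2019) can be combined into a rough upper bound on database coverage. The sketch below is only illustrative: it assumes, as a simplification not stated in that study, that none of the output published in Cuban journals is indexed, and the function name `visible_share` is hypothetical.

```python
def visible_share(share_in_foreign_journals, share_missed_by_database):
    """Fraction of total output captured by a database, assuming it
    indexes no domestic journals (an illustrative simplification)."""
    return share_in_foreign_journals * (1.0 - share_missed_by_database)

# 78 % of output appears in Cuban journals, so 22 % appears abroad
foreign = 1.0 - 0.78

scopus = visible_share(foreign, 0.33)  # Scopus missed 33 % of foreign-journal output
wos = visible_share(foreign, 0.38)     # Web of Science missed 38 %

print(f"Scopus: {scopus:.1%}, Web of Science: {wos:.1%}")
```

Under that assumption, each database would see roughly one seventh of the total output; any Cuban journals that are in fact indexed would raise these figures.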
The most recent study considered only evaluations of medical syllabi in Cuban journals and basically found that they have a mean of 0.6 citations per year (Fernández et al., 2020).
Considering that there is a need for a study that (1) considers all fields of Cuban science and (2) is up to date, we analyzed article characteristics (subject, language, authorship), institutions and journals that appear in the Web of Science, as well as citations covered by that particular database, for all fields and updated to January 2021. We present the results as a solid basis for better decisions in the administration of science in Cuba.

MATERIALS AND METHODS
We used the Science Citation Index Expanded ("SCI-EXPANDED"), Web of Science Core Collection (updated January 21, 2021), and did an advanced search using the word "Cuba" in the country field (CU), limited to the period 1900 to 2019. The journal impact factors (IF 2019) were extracted from the 2019 Journal Citation Reports (JCR). This study is part of a series on the scientific output of tropical countries; all methodological details have been published repeatedly and can be consulted in Calahorrano et al. (2020). We chose journal articles for further analysis because they represented the majority of document types and report whole research ideas and results. We also searched for relationships between article subject and number of journals in Web of Science categories. The Web of Science can classify a document under more than one type; for example, 120 proceedings papers were also classified as articles, and thus the sum of percentages can be higher than 100 %.
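The effect of multiple document types on the percentages can be seen in a small sketch. The records below are invented for illustration and are not data from this study; the function name `type_percentages` and the representation of each record as a set of type labels are assumptions.

```python
from collections import Counter

def type_percentages(documents):
    """Percentage of records carrying each document-type label.

    documents: list of sets, one set of type labels per record.
    A record with two labels is counted once under each label.
    """
    counts = Counter(label for labels in documents for label in labels)
    total = len(documents)
    return {label: 100.0 * n / total for label, n in counts.items()}

# Toy records (hypothetical): one record is both an article and a proceedings paper
docs = [
    {"article"},
    {"article", "proceedings paper"},
    {"letter"},
    {"article"},
]

pct = type_percentages(docs)
print(pct, "sum =", sum(pct.values()))
```

Because the doubly classified record contributes to two labels, the percentages for these four records add up to more than 100 %.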
The second author, who is familiar with Cuban science, corrected database misspellings, errors and variability in institutional names. The "reprint author" field of the database identifies the corresponding author, so this study uses the term "corresponding author". If authorship was not defined as first or corresponding author, the first author was defined as both, as in single-institution articles. The countries, institutions, and collaboration were obtained from the authors' affiliations. "Country independent articles" and "single institute articles" were defined as "author's affiliation is from Cuba" and "only one institute", respectively. "Internationally collaborative articles" means that the coauthors are from different countries, and "inter-institutionally collaborative articles" that the coauthors are from different institutions inside Cuba.
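These definitions can be read as a simple classification rule over the affiliations of each article. The following is a minimal sketch of one such reading, not the procedure actually used in this series: the function `classify_article`, the representation of affiliations as (institution, country) pairs, and the example records are all hypothetical.

```python
def classify_article(affiliations, home="Cuba"):
    """Label one article from its (institution, country) address pairs.

    Hypothetical helper: the labels follow the definitions in the text,
    but the data layout and the function itself are assumptions.
    """
    countries = {country for _, country in affiliations}
    institutions = {institution for institution, _ in affiliations}
    labels = []
    if countries == {home}:
        # Every address is in the home country
        labels.append("country independent")
        if len(institutions) == 1:
            labels.append("single institute")
        else:
            labels.append("inter-institutionally collaborative")
    elif home in countries:
        # At least one home address plus at least one foreign address
        labels.append("internationally collaborative")
    return labels

# Invented example records, not data from this study
solo = classify_article([("Universidad de La Habana", "Cuba")])
national = classify_article([
    ("Universidad de La Habana", "Cuba"),
    ("Instituto de Medicina Tropical Pedro Kourí", "Cuba"),
])
international = classify_article([
    ("Universidad de La Habana", "Cuba"),
    ("University of Sao Paulo", "Brazil"),
])
print(solo, national, international)
```

Using sets for countries and institutions means that several authors sharing one address do not inflate the counts, which is why cleaning the variability in institutional names beforehand matters.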

RESULTS
Document type, language, year of publication and citation impact:
We retrieved 23 576 publications. In this database, most of the publications are articles, distantly followed by meeting abstracts and letters (Appendix 1). Articles have a mean of 21 authors and 15 citations; meeting abstracts 7 authors and 0.2 citations; and letters have on average 2 authors and 2 citations. The most cited document types are book chapters, with a mean of 48 citations; reviews with 32 citations; and articles with a mean of 15 citations (Appendix 1).
English is, by far, the dominant language. Spanish follows distantly, and the rest of the languages, led by Russian, are only marginal (Appendix 2). English articles have a mean of 23 authors, Spanish articles 4 authors and Russian articles also 4 authors. Citations do not fully correspond with the leading languages, because the highest mean, 25 citations per paper, is for publications in Chinese, followed by English with 16 citations and French with 4 citations per paper on average (Appendix 2).
The historical trend shows an increase in the number of authors, references and pages per paper over time, from 2 or 3 authors writing 7-page long papers with 6 to 30 references in the 1970s, to 20-60 authors writing 10-13 page articles with 30-50 references in the last decade (Appendix 3).

Collaboration pattern. Countries and institutions:
Articles from teams led by foreign authors received twice as many citations in journals covered by this particular database (Fig. 1).
Most foreign collaboration was done with Spain, Mexico and Brazil; but the most cited papers were those published with teams from Switzerland, Sweden and the Netherlands (Appendix 4).
The leading collaborating institutions were the National Autonomous University (Mexico), the University of Sao Paulo (Brazil) and the Spanish National Research Council; but the highest citation rates were for publications with the Universities of Tokyo (Japan), Oslo (Norway) and Lund (Sweden) (Appendix 5).
The citation lifespan of Cuban articles is long, reaching four decades; but the pattern is different from those from the nearby Central American countries, because Cuban articles reached peak citation in the second year, while Central American articles were most cited in the 2-6 years following publication (Fig. 2).
Inclusion of Cuban publications in the Science Citation Index Expanded was random before 1972, with only a minimal fraction of work entering the database; it only started to show a visible trend in the 1970s, when fewer than 200 documents were added every year, and increased to the current volume of around 800 documents and 20 citations per document per year (Fig. 3). The same applies to international collaborative publications, which were about 50 per year in the 1990s and reached 200 per year in recent times (Appendix 6).

Web of Science categories and other data:
The most productive fields were Dairy and animal science and agriculture, Biochemistry and molecular biology, and Multidisciplinary materials science, but the field appearing in the most journals was the latter (Appendix 9, Appendix 10).
The journals that published most of the articles were the Cuban Journal of Agricultural Science, the Revista de Neurología and the Medicc Review, but the journals with the highest citation rates were Vaccine; Physical Review B; and PLoS One (Appendix 11).
Historically, all the leading fields grew in output from 1975 to 2010, but decreased afterwards; however, Multidisciplinary materials science seems to be recovering (Fig. 4). The leading journals have evolved differently over time, with the Cuban Journal of Agricultural Science growing slowly but then disappearing from the database around 2010, and other journals mostly having a smaller presence in recent years (Appendix 12).
The most frequently used words in Cuban article titles were Cuba, effect, Cuban, analysis, and effects (Appendix 13); and the most frequently used author keywords were Cuba, taxonomy, QSAR (Quantitative structure-activity relationships), systematics and vaccine (Appendix 14).
The most cited articles were those resulting from Cuban participation in international medical megaprojects about human papillomavirus, body-mass index, and proteomics (Appendix 15). Historically, all top-cited papers followed a pattern of slow increase, plateau and slow decrease in citations, never reaching more than 200 citations per year, but three papers published after 2015 grew exponentially from 400 to 600 citations in recent years: they are all about medical subjects and resulted from international megaprojects in which Cuba was not the leader (Appendix 16).

DISCUSSION
The higher citation rates of book chapters and reviews that we found for Cuba have also been reported previously by other authors for other countries and subjects (e.g. Leydesdorff & Felt, 2012; Torres-Salinas et al., 2013) and simply reflect the fact that books and reviews summarize much work and, thus, are useful for a larger number of scientific writers, whereas standard articles are more specialized and addressed to a smaller number of researchers (Calver & Bradley, 2010; Leydesdorff & Felt, 2012).
When comparing our results with those of older studies done by Ruiz et al. (2002) and Arencibia-Jorge et al. (2008), it is clear that Cuban articles have become longer, now have more authors, and cite more literature than those from one or two decades ago. This fully matches a world trend for science, with research becoming more complex and requiring larger teams and longer reports that, in turn, cite more previous publications (Palacios-Callender et al., 2016; Palacios-Callender & Roberts, 2018). The largest studies are often part of international megaprojects in the field of human health, and thus are normally written in English and highly cited. This does not mean that they are better than studies with fewer citations; it only reflects the larger volume of research done in health-related subjects (Zacca-González et al., 2015; Calahorrano et al., 2020).
The increasing dominance of English in scientific reports, a dominance that is clear in the Cuban articles that we studied, can be misleading, because the Web of Science is known to be heavily biased against articles published in other languages (Falagas et al., 2008; Monge-Nájera & Ho, 2017a). A very significant part of the Cuban technical and scientific literature is published in Spanish, the official language spoken on the island. All that literature, locally important and influential, is missing from the Web of Science (Arencibia-Jorge & de Moya-Anegón, 2010; Zacca-González et al., 2015). This poor coverage of Cuban science by the Web of Science and Scopus has been repeatedly reported but not solved by those databases, to the detriment of Cuban science (Araujo Ruiz et al., 2005; Galbán-Rodríguez et al., 2019).
The publication, in Chinese, of articles with Cuban coauthors, is an anomaly that deserves further study, and their high citation rate could result from the large number of scientists active in China, the most populated country in the world.
The predominant collaboration of Cuban researchers with the ex-colonial power Spain, and with neighboring nations with powerful scientific institutions (Mexico and Brazil), also follows a general pattern that has been identified before in other countries throughout the tropics (Palacios-Callender & Roberts, 2018; Sáenz et al., 2010). The higher citation rates in articles produced in collaboration with rich nations like Switzerland, Sweden and the Netherlands can be explained by the access of researchers in those nations to larger budgets and thus to more influential journals (Palacios-Callender & Roberts, 2018).
The long citation lifespan of Cuban articles is typical of poorer countries, in which science advances slowly because of the low budgets assigned to research (Overbeck et al., 2018); a similar result has been reported for the rest of the Caribbean region, for example, in the case of Nicaragua (Monge-Nájera & Ho, 2017b). A curious result, though, is the peak citation of Cuban articles within two years of their publication: this is not normal in tropical countries (e.g. Arencibia-Jorge & de Moya-Anegón, 2010; Monge-Nájera & Ho, 2018) and may result from the Cuban concentration on health-related research, which is the most abundant and rapidly cited type of research (Arencibia-Jorge et al., 2016; Zacca-González et al., 2014). Nevertheless, the most common keywords in Cuban research reaching the Web of Science suggest a concentration on applied topics of local interest, which is normally a recipe for scientific stagnation (Toole, 2012).
The only other country in the region with a good number of scientometrics studies is El Salvador, and, like Cuba, El Salvador is a country that made an important investment in fields that are not limited to the natural sciences, particularly social sciences after the end of its civil war (Monge-Nájera & Ho, 2017c).
Overall, Cuban science has good potential for participation in the Open Science movement that is prevalent in Latin America, but it needs to make the following improvements: increase collaboration among Cuban scientists; expand its research areas beyond medical subjects; and improve the quality of its own journals. It also needs to identify and correct the cause of its decline in the last decade. Cuba is yet to reach its full scientific potential and it should follow the example of other small nations that are more successful (Allik et al., 2020).
Ethical statement: authors declare that they all agree with this publication and made significant contributions; that there is no conflict of interest of any kind; and that we followed all pertinent ethical and legal procedures and requirements. All financial sources are fully and clearly stated in the acknowledgements section. A signed document has been filed in the journal archives.

ACKNOWLEDGMENTS
We thank Carolina Seas for her assistance with the literature and manuscript preparation.