EU Science Hub
  • General publications
  • 7 November 2024
  • Joint Research Centre

Biomedical research: sex classification in human data improves, but more effort is needed

Consistent reporting of sex and gender in human data for biomedical research could improve diagnosis and treatment as disease and health issues may affect women and men differently.

Lab assistant preparing a female patient for blood sampling
© dusanpetkovic1, stock.adobe.com

An analysis of the world’s two largest biological databases, which store genetic and trait (phenotype) data, shows that a targeted policy implemented in 2018 for the European database has led to a higher rate of sex data classification. However, imbalances in the reporting of sex and gender persist in both databases.

These databases are key resources in biomedical research. Low-quality metadata, such as missing sex information, could skew research findings, as women and men can be affected differently by certain health conditions. Such bias would in turn impact health outcomes, affect decision-making and public policies, and worsen inequalities in healthcare.

The results are published in a study which also provides recommendations to improve the quality of data. The study, Analyzing sex imbalance in EGA and dbGaP biological databases: Recommendations for better practices, was led by the Barcelona Supercomputing Center with the collaboration of the JRC and other research partner organisations.

The authors analysed sex classification in human data from two large-scale databases, known as “the main public genome archives”: the European Genome-phenome Archive (EGA) and its US counterpart, the Database of genotypes and phenotypes (dbGaP).

Both are repositories for research data that scientists use to understand the relationship between genetic variations and health and disease.

The importance of reporting sex data

The study reveals imbalances in the reporting of participant sex in both the EGA and dbGaP databases. In many studies, alongside samples labelled male or female, a significant number of participants had an unspecified sex yet were still included in the analysis.
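As a rough illustration of what such an imbalance looks like in practice, the share of each sex label in a study can be computed from its sample metadata. The sketch below is not the method used in the study; the function name `sex_label_shares`, the label values and the rule of treating missing values as "unknown" are all assumptions made for this example.

```python
from collections import Counter


def sex_label_shares(labels):
    """Return the share of each normalised sex label in a list of sample labels."""
    # Assumption of this sketch: missing, empty or blank values count as "unknown".
    normalised = [(label or "unknown").strip().lower() or "unknown" for label in labels]
    counts = Counter(normalised)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {label: count / total for label, count in counts.items()}


# Hypothetical study with six samples (not real data).
shares = sex_label_shares(["female", "male", "unknown", None, "Female", ""])
for label, share in sorted(shares.items()):
    print(f"{label}: {share:.0%}")
```

A high "unknown" share in such a summary would flag a study whose sex metadata needs attention before the data is reused.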

This lack of reporting is critical: some diseases or conditions might affect women and men differently, and missing sex classification leaves the data incomplete. Without knowing a participant’s sex, it becomes difficult to understand these differences fully. By addressing this gap, scientists can develop more effective treatments.

This is particularly true for precision medicine and AI applications in clinical settings, which rely on accurate data to tailor treatments to individual patients. In oncology, for instance, doctors could use AI models trained on sex-balanced medical data to make decisions and offer personalised therapies that align with a patient's profile, potentially enhancing cancer treatment success rates and reducing therapy side effects.

In this regard, recognising the importance of reporting sex is essential to reflecting the diversity of human biology.

On the positive side, the study points out that effective action can be taken. For example, EGA implemented a policy in 2018 mandating the disclosure of donor sample sex, and the study shows that this policy has contributed to a decrease in the number of samples with an “unknown” sex classification. By comparison, dbGaP has no specific policy, and the researchers show that a high number of its samples are still classified as “unknown” sex.

Recommendations

To improve the quality of sex data reporting to enhance precision medicine research, scientists recommend the following actions:

  1. Databases should provide clear definitions of sex and gender. While sex refers to biological characteristics, gender refers to sociocultural attitudes, behaviours and identities. However, the two terms are used interchangeably when collecting patient metadata. Clear definitions help ensure that data is accurately reported and consistent.
  2. Improve data management: In recent years, the EGA has made it mandatory to fill in the sex field when creating datasets or adding new inputs, but the field is often marked as “unknown”. Improving submission field design and centralising data management would ensure better data quality and interoperability.
  3. Privacy preservation: When strict confidentiality is assured, patients may be more inclined to disclose their sex accurately, resulting in more reliable data.
  4. Transparency and accountability: It is important not only to report sex but also to clarify the basis on which this information was collected, such as self-report, chromosomes, hormones, genes, reproductive anatomy, etc. This would enhance the accuracy and consistency of the data. 
  5. Promote education and social impact assessment: There should be efforts to raise awareness about the importance of including sex and gender diversity in scientific research. This includes education and training for researchers and database administrators.

These recommendations could improve the quality of biomedical research metadata reporting, leading to better health outcomes for all patients, regardless of their sex. 
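Recommendations 1, 2 and 4 can be pictured as a submission record that keeps sex and gender in separate fields and requires a stated basis whenever a sex is reported. The following is a minimal sketch under stated assumptions, not the schema of EGA or dbGaP: the class names `Sex`, `SexBasis` and `SampleMetadata`, the field names and the controlled vocabularies are hypothetical, and a real archive would define its own categories.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Sex(Enum):
    # Illustrative vocabulary only; a real archive would define its own terms.
    FEMALE = "female"
    MALE = "male"
    UNKNOWN = "unknown"


class SexBasis(Enum):
    # Bases listed in recommendation 4.
    SELF_REPORT = "self-report"
    CHROMOSOMES = "chromosomes"
    HORMONES = "hormones"
    GENES = "genes"
    REPRODUCTIVE_ANATOMY = "reproductive anatomy"


@dataclass(frozen=True)
class SampleMetadata:
    sample_id: str                         # hypothetical identifier format
    sex: Sex
    sex_basis: Optional[SexBasis] = None   # how the sex value was determined
    gender: Optional[str] = None           # recorded separately from sex

    def __post_init__(self):
        # Design choice of this sketch: a reported sex must state its basis.
        if self.sex is not Sex.UNKNOWN and self.sex_basis is None:
            raise ValueError("a reported sex requires a stated basis")


record = SampleMetadata("SAMPLE-001", Sex.FEMALE, SexBasis.CHROMOSOMES)
print(record.sex.value, record.sex_basis.value)
```

Rejecting incomplete records at submission time is one possible way to improve field design; an archive might instead accept them and flag them for later curation.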

Related links

Analyzing sex imbalance in EGA and dbGaP biological databases: Recommendations for better practices

European Genome-phenome Archive (EGA)

Database of genotypes and phenotypes (dbGaP)

 
