EU Science Hub

Data and Methodology


Download the complete dataset:

19 APRIL 2023
Dataset - Academic offer 2019-2023 | PREDICT



This section summarises the main characteristics of education programmes analysed, the data source used, and the main methodological steps to produce the final results. The work follows the methodology developed in Academic offer and demand for advanced profiles in the EU (López-Cobo et al., 2019) and revised in Academic Offer of Advanced Digital Skills in 2019-20. International Comparison (Righi and López-Cobo et al., 2020).

Main characteristics of the programmes

  • Technological domain. The study covers four advanced digital domains: artificial intelligence, high performance computing, cybersecurity, and data science. An education programme may be considered in more than one technological domain due to the existing overlap between these domains (e.g., a programme on parallel computing may belong to high performance computing and data science simultaneously).
  • Geographical area. Refers to the country in which the programme is offered. The study covers the 27 EU Member States and six additional countries: the United Kingdom, Norway, Switzerland, Canada, the United States, and Australia.
  • Education level. The study collects data on three education levels: master’s degrees, bachelor’s degrees and short professional courses.
  • Programme’s scope. Education programmes are classified as "specialised" or "broad", according to how closely they focus on the technological domain considered. Specialised programmes are those with a strong focus on the domain (e.g. a master’s degree in supercomputing), while broad programmes address the domain in a more generic way (e.g. a bachelor’s degree in biomedicine that includes a course on artificial intelligence). A programme has only one scope in a specific technological domain, but it may be a broad programme in one domain and a specialised one in another.
  • Programme’s field of education. This variable of analysis refers to the field of education or discipline in which the programme is taught, according to the Fields of education and training 2013 classification (e.g., “Engineering, manufacturing and construction”, “Business, administration and law”). A programme may be taught in several fields of education. In those cases, the programme is weighted using fractional counting.
  • Programme’s content areas. These refer to the subdomains covered by the programmes’ syllabi. For each of the four technological domains, specific content areas are defined either from existing taxonomies or from taxonomies built by analysing the programmes’ descriptions.
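
The fractional counting mentioned under the field of education can be illustrated with a minimal sketch. The weighting scheme is not detailed in this section, so the sketch assumes that a programme taught in n fields contributes 1/n to each of them. The function name and the sample data are hypothetical.

```python
from collections import defaultdict
from fractions import Fraction


def fractional_counts(programme_fields):
    """Return a mapping field -> weighted number of programmes.

    programme_fields: one collection of field names per programme.
    Assumption: equal split, so each programme adds up to 1 in total.
    """
    totals = defaultdict(Fraction)
    for fields in programme_fields:
        unique_fields = set(fields)
        if not unique_fields:
            continue  # a programme without a field contributes nothing
        share = Fraction(1, len(unique_fields))
        for field in unique_fields:
            totals[field] += share
    return dict(totals)


# Hypothetical sample: the first programme is taught in two fields.
sample = [
    ["Engineering, manufacturing and construction",
     "Business, administration and law"],
    ["Engineering, manufacturing and construction"],
]
counts = fractional_counts(sample)
```

With this weighting, the counts over all fields add up to the number of programmes, so no programme is counted more than once within a domain.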

The results are provided for each technological domain separately. If a programme belongs to more than one technological domain, it is fully counted within each of them. The statistics calculated are the number of programmes (by scope, field of education and content areas) and the penetration rate, i.e., the share of these programmes in the total number of programmes (of any type and with any type of content) offered in the considered geographical area.
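
As an illustration only, the full counting across domains and the penetration rate could be computed as in the sketch below. The function names, the sample programmes and the total of 50 programmes are hypothetical and are not taken from the dataset.

```python
from collections import Counter


def count_by_domain(programme_domains):
    """Count programmes per domain; a programme that belongs to
    several domains is fully counted within each of them."""
    counts = Counter()
    for domains in programme_domains:
        counts.update(set(domains))
    return counts


def penetration_rate(domain_count, total_programmes):
    """Share of domain programmes in all programmes offered in the area."""
    if total_programmes <= 0:
        raise ValueError("total_programmes must be positive")
    return domain_count / total_programmes


# Hypothetical sample: three relevant programmes in an area
# that offers 50 programmes of any type in total.
sample = [
    {"artificial intelligence"},
    {"high performance computing", "data science"},
    {"data science"},
]
counts = count_by_domain(sample)
rates = {domain: penetration_rate(n, 50) for domain, n in counts.items()}
```

Because of the full counting, the per-domain counts can add up to more than the number of distinct programmes, which is why results are reported for each domain separately.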

Data source: strengths and caveats

The study uses data from the Studyportals’ platform as the starting point. It includes programmes from 3,700 universities in over 120 countries. Out of the seven dedicated Studyportals’ websites, this study analyses the ones focused on master’s and bachelor’s degrees and short professional courses. These three repositories overall account for more than 150,000 programmes, out of which nearly 50,000 (in 2022) correspond to programmes taught in European universities or study centres.

This source offers the widest coverage among all identified platforms. However, its coverage remains incomplete, as programmes taught in national languages are not tracked.

The main assumption of the study is that, even if the source does not cover all the education offer in each country, it shows a representative part of it, and the attributes of the programmes captured by our study can be extrapolated to the whole education offer. This assumption is considered valid on the basis of the previous study Academic offer and demand for advanced profiles in the EU (López-Cobo et al., 2019). In addition, the focus on the English language is considered pertinent given the highly technological and computer-related domains addressed by this study.

Another strong advantage of the data source is the amount of programme-related information available, which makes it possible to analyse the characteristics of the programmes covered. In particular, some of the most interesting attributes for our analysis relate to the programmes’ content (title of the programme, short and long description, and programme outline). We use them first to identify a programme as related to one of the four domains covered, and then to categorise the technological subdomains taught in the programme. The field of education in which the programme is taught is also a very valuable piece of information, which enables us to explore how diversified or concentrated the provision of advanced digital education is across disciplines.

Identification of domain boundaries and categories for the analysis

Since official classifications do not identify transversal technological domains such as the ones examined, we use lists of representative keywords (one list per domain, see following section) to query the data source. The selection of keywords follows a semi-automatic process aimed at identifying a representative list of terms present in specialised scientific publications. The first selection is performed as detailed in Academic offer and demand for advanced profiles in the EU (López-Cobo et al., 2019) for each domain separately. In a second step, the programmes identified as specialised during the 2019 study have been analysed to detect additional keywords.

After the identification of programmes relevant to the technological domains under study, they are classified into “broad” and “specialised”. A programme is considered “specialised” in a technological domain if either its title or its short description includes at least one keyword representative of the technological domain, or if at least three different keywords are present in any other text field of the programme description. If neither of these conditions is met (i.e., only one or two keywords are found in the long description), the programme is considered “broad”.
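
The classification rule above can be sketched as follows. This is a minimal illustration and not the study’s implementation: the keyword matching (case-insensitive, on word boundaries), the function names and the sample keywords are assumptions, and all text fields other than the title and short description are assumed to be passed as a single string.

```python
import re


def find_keywords(text, keywords):
    """Return the set of keywords found in text.

    Assumption: case-insensitive match on word boundaries, which only
    suits keywords that start and end with a letter or digit.
    """
    lowered = text.lower()
    return {
        kw for kw in keywords
        if re.search(r"\b" + re.escape(kw.lower()) + r"\b", lowered)
    }


def classify_scope(title, short_description, other_text, keywords):
    """Return "specialised", "broad", or None if no keyword is found."""
    # Condition 1: at least one keyword in the title or short description.
    if find_keywords(title, keywords) or find_keywords(short_description, keywords):
        return "specialised"
    # Condition 2: at least three different keywords in the other fields.
    n_other = len(find_keywords(other_text, keywords))
    if n_other >= 3:
        return "specialised"
    # One or two keywords elsewhere: relevant to the domain, but broad.
    if n_other >= 1:
        return "broad"
    return None  # not identified as relevant to this domain


# Hypothetical keyword list for one domain.
ai_keywords = {"machine learning", "neural network",
               "computer vision", "deep learning"}
scope = classify_scope(
    "BSc Biomedicine",
    "A general degree in the life sciences.",
    "The third year includes a course on deep learning.",
    ai_keywords,
)
```

The function is applied once per technological domain with that domain’s keyword list, so the same programme can be specialised in one domain and broad in another.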

The keywords are also used to classify the programmes according to the content areas taught. In general, the categorisation of content areas is derived following the methodology proposed in the 2019 study (López-Cobo et al., 2019) and refined with the analysis of the syllabi of the most specialised programmes in the data source. When available, existing taxonomies have also been used.
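
A keyword-based assignment of content areas could look like the sketch below. The keyword-to-area mapping shown is invented for illustration and does not reproduce the taxonomy used in the study; the function name and the simple substring matching are also assumptions.

```python
def content_areas(text, keyword_to_area):
    """Return the set of content areas whose keywords appear in text.

    Assumption: case-insensitive substring matching, which is only
    reasonable for multi-word or otherwise distinctive keywords.
    """
    lowered = text.lower()
    return {
        area for keyword, area in keyword_to_area.items()
        if keyword.lower() in lowered
    }


# Hypothetical mapping for one domain (not the study's taxonomy).
example_mapping = {
    "neural network": "Machine learning",
    "deep learning": "Machine learning",
    "robotics": "Robotics",
}
areas = content_areas(
    "The syllabus covers deep learning and neural networks.",
    example_mapping,
)
```

Several keywords may point to the same content area, so the result is a set of areas rather than a list of matched keywords.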

Keywords for programmes’ identification