This section summarises the main characteristics of the education programmes analysed, the data source used, and the main methodological steps taken to produce the final results. The work follows the methodology developed in Academic offer and demand for advanced profiles in the EU (López-Cobo et al., 2019) and revised in Academic Offer of Advanced Digital Skills in 2019-20. International Comparison (Righi and López-Cobo et al., 2020).
Main characteristics of the programmes
- Technological domain. The study covers four advanced digital domains: artificial intelligence, high performance computing, cybersecurity, and data science. An education programme may be considered in more than one technological domain due to the existing overlap between these domains (e.g., a programme on parallel computing may belong to high performance computing and data science simultaneously).
- Geographical area. Refers to the country in which the programme is offered. The study covers the 27 EU Member States and six additional countries: the United Kingdom, Norway, Switzerland, Canada, the United States, and Australia.
- Education level. The study collects data on three education levels: master's degrees, bachelor's degrees and short professional courses.
- Programme’s scope. Education programmes are classified as "specialised" or "broad", according to the focus with which they address the technological domain considered. Specialised programmes are those with a strong focus on the domain (e.g. a master's in supercomputing), while broad programmes address the domain in a more generic way (e.g. a bachelor's degree in biomedicine that includes a course on artificial intelligence). A programme has only one scope in a specific technological domain, but it may be a broad programme in one domain and a specialised one in another.
- Programme’s field of education. This variable of analysis refers to the field of education or discipline in which the programme is taught, according to the Fields of education and training 2013 classification (e.g., “Engineering, manufacturing and construction”, “Business, administration and law”). A programme may be taught in several fields of education. In those cases, the programme is weighted using fractional counting.
- Programme’s content areas. These refer to the subdomains covered by the programmes’ syllabi. For each of the four technological domains, specific content areas are defined either by following existing taxonomies or by building new ones from the analysis of programmes’ descriptions.
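The fractional counting mentioned for fields of education can be illustrated with a minimal sketch. It assumes that a programme taught in n fields contributes an equal weight of 1/n to each of them; the text does not state the exact weighting, and the programme names and the function `fractional_counts` are hypothetical.

```python
from collections import defaultdict


def fractional_counts(programmes):
    """Return the fractional number of programmes per field of education.

    Assumption: a programme taught in n distinct fields adds 1/n to each field,
    so every programme contributes exactly 1 to the overall total.
    """
    counts = defaultdict(float)
    for fields in programmes.values():
        unique_fields = set(fields)
        for field in unique_fields:
            counts[field] += 1 / len(unique_fields)
    return dict(counts)


# Hypothetical programmes and the fields of education in which they are taught.
programmes = {
    "MSc Supercomputing": ["Engineering, manufacturing and construction"],
    "MSc AI for Management": [
        "Engineering, manufacturing and construction",
        "Business, administration and law",
    ],
}

counts = fractional_counts(programmes)
for field, value in sorted(counts.items()):
    print(f"{field}: {value}")
```

With this weighting, the field totals always add up to the number of programmes, so a programme spanning several disciplines is not counted more than once overall.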
The results are provided for each technological domain separately. If a programme belongs to more than one technological domain, it is fully counted within each of them. The statistics calculated are the number of programmes (by scope, field of education and content area) and the penetration rate, i.e., the share of programmes in the domain over the total number of programmes (of any type and with any type of content) offered in the geographical area considered.
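As an illustration of these two statistics, the sketch below counts programmes per domain with full counting and derives the penetration rate. The catalogue, the programme identifiers and the function names are hypothetical and only serve the example.

```python
from collections import Counter


def programmes_per_domain(programme_domains):
    """Count programmes per domain; a multi-domain programme counts fully in each."""
    counts = Counter()
    for domains in programme_domains.values():
        counts.update(set(domains))
    return counts


def penetration_rate(domain_count, total_programmes):
    """Share of domain programmes over all programmes offered in the area."""
    if total_programmes <= 0:
        raise ValueError("total_programmes must be positive")
    return domain_count / total_programmes


# Hypothetical catalogue for one geographical area: programme -> domains covered.
# An empty set stands for a programme unrelated to any of the four domains.
catalogue = {
    "prog-1": {"AI", "DS"},
    "prog-2": {"HPC"},
    "prog-3": set(),
    "prog-4": {"AI"},
}

counts = programmes_per_domain(catalogue)
rates = {domain: penetration_rate(n, len(catalogue)) for domain, n in counts.items()}
print(rates)
```

Because of full counting, the per-domain counts can sum to more than the number of distinct programmes, which is why results are reported for each domain separately.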
Data source: strengths and caveats
The study uses data from the Studyportals’ platform as the starting point. It includes programmes from 3,700 universities in over 120 countries. Out of the seven dedicated Studyportals’ websites, this study analyses the ones focused on master’s and bachelor’s degrees and short professional courses. These three repositories overall account for more than 150,000 programmes, out of which nearly 50,000 (in 2022) correspond to programmes taught in European universities or study centres.
This source offers the widest coverage among all identified platforms. However, its coverage is still incomplete, as programmes taught in national languages other than English are not tracked.
The main assumption of the study is that, even if the source does not cover the whole education offer in each country, it shows a representative part of it, and the attributes of the programmes captured by our study can be extrapolated to the whole education offer. This assumption is considered valid on the basis of the results of the previous study Academic offer and demand for advanced profiles in the EU (López-Cobo et al., 2019). In addition, the focus on the English language is considered pertinent in view of the highly technological and computer-related domains addressed by this study.
Another strong advantage of the data source is the amount of programme-related information available, which makes it possible to analyse the characteristics of the programmes covered. In particular, some of the most interesting attributes for our analysis relate to the programmes’ content (title of the programme, short and long description, and programme outline). We use them first to identify a programme as related to any of the four domains covered, and then to categorise the technological subdomains taught in the programme. The field of education in which the programme is taught is also a very valuable piece of information, which enables us to explore how diversified or concentrated the advanced digital education offer is across disciplines.
Identification of domain boundaries and categories for the analysis
Since official classifications do not identify transversal technological domains such as the ones examined here, we use lists of representative keywords (one list per domain, see the following section) to query the data source. The selection of keywords follows a semi-automatic process aimed at identifying a representative list of terms present in specialised scientific publications. The first selection is performed as detailed in Academic offer and demand for advanced profiles in the EU (López-Cobo et al., 2019) for each domain separately. In a second step, the programmes identified as specialised during the 2019 study have been analysed to detect additional keywords.
After the programmes relevant to the technological domains under study have been identified, they are classified as “broad” or “specialised”. A programme is considered “specialised” in a technological domain if either its title or its short description includes at least one keyword representative of the technological domain, or if at least three different keywords are present in any other text field of the programme description. If neither of these conditions is met (i.e., only one or two keywords are found in the long description), the programme is considered “broad”.
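The classification rule above can be sketched as follows. This is a minimal illustration rather than the study's implementation: it assumes case-insensitive substring matching, pools all text fields other than the title and short description into a single string, and returns `None` for programmes with no keyword at all. The function names and the sample keyword list are hypothetical.

```python
def find_keywords(text, keywords):
    """Return the set of keywords that occur in the text (case-insensitive)."""
    lowered = text.lower()
    return {kw for kw in keywords if kw in lowered}


def classify_scope(title, short_description, other_text, keywords):
    """Classify a programme as 'specialised', 'broad' or None (not relevant)."""
    # Condition 1: at least one keyword in the title or the short description.
    if find_keywords(title, keywords) or find_keywords(short_description, keywords):
        return "specialised"
    # Condition 2: at least three different keywords in the other text fields.
    hits = find_keywords(other_text, keywords)
    if len(hits) >= 3:
        return "specialised"
    # One or two keywords outside the title and short description: broad.
    if hits:
        return "broad"
    return None


# Small sample taken from the AI keyword list, for illustration only.
AI_KEYWORDS = {"machine learning", "deep learning", "computer vision", "neural network"}

print(classify_scope("MSc in Machine Learning", "", "", AI_KEYWORDS))
print(classify_scope(
    "BSc Biomedicine",
    "A general life-science degree.",
    "Includes a course on machine learning.",
    AI_KEYWORDS,
))
```

Counting distinct keywords (a set rather than a list of occurrences) reflects the requirement of three different keywords, so repeated mentions of a single term do not make a programme specialised.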
The keywords are also used to classify the programmes according to the content areas taught. In general, the categorisation of content areas is derived following the methodology proposed in the 2019 study (López-Cobo et al., 2019) and refined with the analysis of the syllabi of the most specialised programmes in the data source. When available, existing taxonomies have also been used.
- For artificial intelligence (AI), we consider the AI taxonomy developed by the JRC in the framework of AI Watch: AI WATCH. Defining Artificial Intelligence. Towards an operational definition and taxonomy of artificial intelligence (Samoili and López-Cobo et al., 2020).
- For cybersecurity (CS), the categorisation of content areas is enriched with a JRC report that aligns cybersecurity terminologies, definitions and domains into a coherent and comprehensive taxonomy to facilitate the categorisation of cybersecurity capabilities in the EU: European Cybersecurity Centres of Expertise Map - Definitions and Taxonomy (Nai-Fovino et al., 2018).
- For high performance computing (HPC) and data science (DS), the taxonomy is developed by the authors of this work, based on a review of several specialised master's programmes in these fields.
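The keyword-based assignment of content areas can be sketched as a simple lookup. The mapping below is purely illustrative: the area labels and the keyword-to-area assignments are assumptions made for this example and do not reproduce the taxonomies cited above.

```python
# Hypothetical mapping from keywords to content areas (illustrative labels only).
KEYWORD_TO_AREA = {
    "machine learning": "learning",
    "reinforcement learning": "learning",
    "computer vision": "perception",
    "speech recognition": "perception",
    "natural language processing": "communication",
}


def content_areas(text, keyword_to_area=KEYWORD_TO_AREA):
    """Return the sorted content areas whose keywords appear in the text."""
    lowered = text.lower()
    return sorted({area for kw, area in keyword_to_area.items() if kw in lowered})


print(content_areas("Topics include machine learning and computer vision."))
```

A programme can be assigned to several content areas at once, one for each area with at least one matching keyword.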
Keywords for the artificial intelligence domain
accountability * | deep learning | machine translation | sound synthesis |
adaptive learning | deep neural network | multi-agent system | speaker identification |
ai application | ethics * | narrow artificial intelligence | speech processing * |
anomaly detection | expert system | natural language generation | speech recognition |
artificial general intelligence | explainability * | natural language processing | speech synthesis |
artificial intelligence | face recognition | natural language understanding | strong artificial intelligence |
audio processing * | fairness * | neural network | supervised learning |
automated vehicle | human computer interaction | pattern recognition | support vector machine |
automatic translation | human-ai interaction | predictive analytics | swarm intelligence |
autonomous system * | image processing | recommender system * | text mining |
autonomous vehicle | image recognition | reinforcement learning | transfer learning |
business intelligence * | inductive programming | robot system * | transparency * |
chatbot | intelligence software | robotics | trustworthy ai |
computational creativity * | intelligent agent * | safety * | uncertainty * |
computational linguistics | intelligent control | security * | unsupervised learning |
computational neuroscience * | intelligent software development | semantic web * | voice recognition |
computer vision | intelligent system | sentiment analysis * | weak artificial intelligence |
control theory | knowledge representation and reasoning | service robot * | |
cyber physical system | machine learning | social robot * | |
* Terms that are queried in combination with the domain’s core terms.
Keywords for the high performance computing domain
accelerators * | distributed computing | hpc applications * | parallel programming * |
cloud * | distributed systems * | hpcc | parallelisation * |
cloud computing | energy efficiency | infiniband | performance analysis |
cluster * | exascale * | manycore | performance evaluation |
cluster computing * | field-programmable gate array | mapreduce * | performance modeling |
compute unified device architecture * | fpga | massive parallelism * | performance optimisation |
computer architecture * | gpgpu | message passing interface | reconfigurable computing * |
computer modelling * | gpu | multi core | scalability |
concurrent * | graphics processing unit | opencl | single instruction multiple data |
cuda | grid computing | parallel algorithms * | supercomputer |
data center | hadoop | parallel architectures * | supercomputer technology |
data intensive computing | high performance computation | parallel computation * | |
* Terms that are queried in combination with the domain’s core terms.
Keywords for the cybersecurity domain
access control | cyber warfare | firewall * | phishing |
access management | cybercrime | hacker | pseudonymity |
activity monitoring | cybersecurity | hash function | public key |
anonymity * | cybersecurity incident | identity access management | random number generation |
anonymization | data anonymisation | identity management | security analysis |
computer security | data sanitisation | information assurance | security protocol * |
control system | data security | information protection | stuxnet |
counterintelligence | digital evidence | information security | supervisory control data acquisition |
cryptanalysis | digital forensics | intrusion detection | system security |
cryptography | digital rights management | key management | vulnerability assessment |
cryptology | digital signature | malware | web protocol |
cyber attack | distributed systems | network attack | web protocol security |
cyber risk | encryption | network security | |
cyber threat | fault tolerance | penetration testing | |
* Terms that are queried in combination with the domain’s core terms.
Keywords for the data science domain
ant colony optimisation | distributed computing | metaheuristic optimisation | reinforcement learning |
automated machine learning | distributed processing | multiagent system | scalability |
big data | ensemble method | natural language processing | semantic web |
business intelligence | evolutionary algorithm | natural language understanding | semi-supervised learning |
data analytics | genetic algorithm | neural network | sentiment analysis |
data mining | gradient descent | nosql | spark * |
data science | hadoop | parallel computing * | statistical learning |
data visualisation | information extraction | parallel processing * | supervised learning |
decision analytics | information retrieval | parallelisation * | support vector machine |
decision support | k-nearest-neighbour | pattern recognition | transfer learning |
decision tree | machine learning | predictive analytics | unstructured data |
deep learning | mapreduce | recommender system | unsupervised learning |
* Terms that are queried in combination with the domain’s core terms.
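The handling of starred terms can be sketched as follows, under one possible reading of the note: a starred term only counts as a match when at least one core term of the domain appears in the same text. The core terms are not listed in this section, so the ones used here are hypothetical, as are the function names.

```python
import re


def contains_term(text, term):
    """Whole-word, case-insensitive search for a (possibly multi-word) term."""
    pattern = r"\b" + re.escape(term) + r"\b"
    return re.search(pattern, text, flags=re.IGNORECASE) is not None


def matched_keywords(text, plain_terms, starred_terms, core_terms):
    """Return matching keywords; starred terms need a co-occurring core term."""
    hits = {term for term in plain_terms if contains_term(text, term)}
    if any(contains_term(text, core) for core in core_terms):
        hits |= {term for term in starred_terms if contains_term(text, term)}
    return hits


# Sample terms from the AI list; the core term is an assumption for illustration.
plain = {"deep learning", "chatbot"}
starred = {"ethics", "safety"}
core = {"artificial intelligence"}

print(matched_keywords("A course on ethics and safety in medicine.", plain, starred, core))
print(matched_keywords("Ethics of artificial intelligence and chatbot design.", plain, starred, core))
```

This guard keeps generic terms such as "ethics" or "cloud" from pulling in programmes that have no connection to the domain.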