Joint Research Centre

Data and Methodology

Methodology

This section summarises the main characteristics of the education programmes analysed, the data source used, and the main methodological steps taken to produce the final results. The work follows the methodology developed in Academic offer and demand for advanced profiles in the EU (López-Cobo et al., 2019) and revised in Academic Offer of Advanced Digital Skills in 2019-20. International Comparison (Righi and López-Cobo et al., 2020).

Main characteristics of the programmes

  • Technological domain. The study covers four advanced digital domains: artificial intelligence, high performance computing, cybersecurity, and data science. An education programme may be considered in more than one technological domain due to the existing overlap between these domains (e.g., a programme on parallel computing may belong to high performance computing and data science simultaneously).
  • Geographical area. Refers to the country in which the programme is offered. The study covers the 27 EU Member States and six additional countries: the United Kingdom, Norway, Switzerland, Canada, the United States, and Australia.
  • Education level. The study collects data on three education levels: master’s degrees, bachelor’s degrees and short professional courses.
  • Programme’s scope. Education programmes are classified as "specialised" or "broad", according to how closely they focus on the technological domain considered. Specialised programmes are those with a strong focus on the domain (e.g. a master’s degree in supercomputing), while broad programmes address the domain in a more generic way (e.g. a bachelor’s degree in biomedicine that includes a course on artificial intelligence). A programme has only one scope in a specific technological domain, but it may be a broad programme in one domain and a specialised one in another.
  • Programme’s field of education. This variable of analysis refers to the field of education or discipline in which the programme is taught, according to the Fields of education and training 2013 classification (e.g., “Engineering, manufacturing and construction”, “Business administration and Law”). A programme may be taught in several fields of education. In those cases, the programme is weighted using fractional counting.
  • Programme’s content areas. These refer to the subdomains covered by the programmes’ syllabus. For each of the four technological domains, specific content areas are defined either by following existing taxonomies or by building new ones from an analysis of the programmes’ descriptions.
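The fractional counting mentioned for fields of education can be illustrated with a minimal sketch. It assumes an equal split, where a programme taught in k fields contributes 1/k to each; the source only states that fractional counting is used, so the equal split, the function name `fractional_counts` and the sample programmes are illustrative assumptions.

```python
from collections import defaultdict


def fractional_counts(programmes):
    """Sum fractional weights per field of education.

    Assumption (not stated in the source): a programme taught in k fields
    contributes 1/k to each of them, so every programme adds up to 1 overall.
    """
    totals = defaultdict(float)
    for fields in programmes.values():
        unique_fields = set(fields)
        if not unique_fields:
            continue  # a programme without a field contributes nothing
        weight = 1.0 / len(unique_fields)
        for field in unique_fields:
            totals[field] += weight
    return dict(totals)


# Hypothetical programmes, for illustration only.
programmes = {
    "prog_a": ["Engineering, manufacturing and construction"],
    "prog_b": [
        "Engineering, manufacturing and construction",
        "Business administration and Law",
    ],
}
counts = fractional_counts(programmes)
```

Under this assumption the per-field totals always add up to the number of programmes, which keeps shares across fields comparable.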

The results are provided for each technological domain separately. If a programme belongs to more than one technological domain, it is fully counted within each of them. The statistics calculated are the number of programmes (by scope, field of education and content area) and the penetration rate, i.e. the share of these programmes in the total number of programmes (of any type and with any type of content) offered in the geographical area considered.
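As a rough illustration of these two statistics, the sketch below counts a programme once in every domain it belongs to (full counting) and divides each domain count by the total number of programmes in the area. The function names and the four-programme catalogue are hypothetical; the figures are not taken from the study.

```python
from collections import Counter


def count_by_domain(programme_domains):
    """Count each programme once in every domain it belongs to (full counting)."""
    counts = Counter()
    for domains in programme_domains.values():
        counts.update(set(domains))
    return counts


def penetration_rate(domain_count, total_programmes):
    """Share of domain programmes in all programmes offered in the area."""
    if total_programmes <= 0:
        raise ValueError("total_programmes must be positive")
    return domain_count / total_programmes


# Hypothetical catalogue for one geographical area; an empty set marks a
# programme that belongs to none of the four domains.
catalogue = {
    "prog_1": {"high performance computing", "data science"},
    "prog_2": {"artificial intelligence"},
    "prog_3": set(),
    "prog_4": set(),
}
domain_counts = count_by_domain(catalogue)
rates = {
    domain: penetration_rate(count, len(catalogue))
    for domain, count in domain_counts.items()
}
```

Because of full counting, the domain counts can add up to more than the number of distinct programmes, so penetration rates should be read per domain rather than summed.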

Data source: strengths and caveats

The study uses data from the Studyportals’ platform as the starting point. It includes programmes from 3,700 universities in over 120 countries. Out of the seven dedicated Studyportals’ websites, this study analyses the ones focused on master’s and bachelor’s degrees and short professional courses. These three repositories overall account for more than 150,000 programmes, out of which nearly 50,000 (in 2022) correspond to programmes taught in European universities or study centres.

This source offers the widest coverage among all identified platforms. However, its coverage is still incomplete, as programmes taught in national languages other than English are not tracked.

The main assumption of the study is that, even if the source does not cover the whole education offer in each country, it shows a representative part of it, and the attributes of the programmes captured by our study can be extrapolated to the whole education offer. This assumption is considered valid on the basis of the previous study Academic offer and demand for advanced profiles in the EU (López-Cobo et al., 2019). In addition, the focus on the English language is considered pertinent in view of the highly technological and computer-related domains addressed by this study.

Another strong advantage of the data source is the amount of programme-related information available, which makes it possible to analyse the characteristics of the programmes covered. In particular, some of the most interesting attributes for our analysis relate to the programmes’ content (title of the programme, short and long description, and programme outline). We use them first to identify a programme as related to the four domains covered, and then to categorise the technological subdomains taught in the programme. The field of education in which the programme is taught is also a very valuable piece of information, which enables us to explore how diversified or concentrated the provision of advanced digital education is across disciplines.

Identification of domain boundaries and categories for the analysis

Since official classifications do not identify transversal technological domains such as the ones examined, we use lists of representative keywords (one list per domain, see following section) to query the data source. The selection of keywords follows a semi-automatic process aimed at identifying a representative list of terms present in specialised scientific publications. The first selection is performed as detailed in Academic offer and demand for advanced profiles in the EU (López-Cobo et al., 2019) for each domain separately. In a second step, the programmes identified as specialised during the 2019 study were analysed to detect additional keywords.

Once the programmes relevant to the technological domains under study have been identified, they are classified as “broad” or “specialised”. A programme is considered “specialised” in a technological domain if either its title or its short description includes at least one keyword representative of the domain, or if at least three different keywords are present in any other text field of the programme description. If neither of these conditions is met (i.e., only one or two keywords are found in the long description), the programme is considered “broad”.
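The classification rule can be sketched as follows. This is a minimal illustration, not the study's implementation: it assumes case-insensitive substring matching, and it assumes that the three-keyword threshold is evaluated over all other text fields combined. Neither detail is specified in the text, and the function names and sample keywords are hypothetical.

```python
def find_keywords(text, keywords):
    """Return the keywords found in the text (case-insensitive substring match)."""
    lowered = text.lower()
    return {kw for kw in keywords if kw in lowered}


def classify_scope(title, short_description, other_fields, keywords):
    """Classify a programme as 'specialised' or 'broad' for one domain.

    Returns None when no keyword is found, i.e. the programme is not
    considered relevant to the domain.
    """
    # Condition 1: at least one keyword in the title or short description.
    if find_keywords(title, keywords) or find_keywords(short_description, keywords):
        return "specialised"
    # Condition 2: at least three different keywords in the other text fields
    # (assumption: counted across all of those fields together).
    found = set()
    for text in other_fields:
        found |= find_keywords(text, keywords)
    if len(found) >= 3:
        return "specialised"
    return "broad" if found else None


# Hypothetical keyword subset and programmes, for illustration only.
hpc_keywords = {"supercomputer", "gpu", "cuda", "opencl"}
scope_a = classify_scope("Master in Supercomputer Engineering", "", [], hpc_keywords)
scope_b = classify_scope(
    "Biomedicine", "A bachelor's degree",
    ["Includes a module on GPU programming with CUDA."], hpc_keywords,
)
scope_c = classify_scope(
    "Biomedicine", "A bachelor's degree",
    ["Covers GPU programming.", "Labs use CUDA and OpenCL."], hpc_keywords,
)
```

Plain substring matching is the simplest choice, but it can over-match short terms (for example "gpu" inside "gpgpu"), so a production version would probably need token-aware matching.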

The keywords are also used to classify the programmes according to the content areas taught. In general, the categorisation of content areas is derived following the methodology proposed in the 2019 study (López-Cobo et al., 2019) and refined by analysing the syllabi of the most specialised programmes in the data source. When available, existing taxonomies have also been used.

Keywords for programmes’ identification

Artificial intelligence

accountability * | deep learning | machine translation | sound synthesis
adaptive learning | deep neural network | multi-agent system | speaker identification
ai application | ethics * | narrow artificial intelligence | speech processing *
anomaly detection | expert system | natural language generation | speech recognition
artificial general intelligence | explainability * | natural language processing | speech synthesis
artificial intelligence | face recognition | natural language understanding | strong artificial intelligence
audio processing * | fairness * | neural network | supervised learning
automated vehicle | human computer interaction | pattern recognition | support vector machine
automatic translation | human-ai interaction | predictive analytics | swarm intelligence
autonomous system * | image processing | recommender system * | text mining
autonomous vehicle | image recognition | reinforcement learning | transfer learning
business intelligence * | inductive programming | robot system * | transparency *
chatbot | intelligence software | robotics | trustworthy ai
computational creativity * | intelligent agent * | safety * | uncertainty *
computational linguistics | intelligent control | security * | unsupervised learning
computational neuroscience * | intelligent software development | semantic web * | voice recognition
computer vision | intelligent system | sentiment analysis * | weak artificial intelligence
control theory | knowledge representation and reasoning | service robot * |
cyber physical system | machine learning | social robot * |

* Terms that are queried in combination with the domain’s core terms.
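One possible reading of the starred-term rule is sketched below: a starred term counts as a match only when at least one core term of the domain appears in the same text. The list of core terms is not given in this section, so the `core_terms` set, the function name and the matching logic are hypothetical assumptions made for illustration.

```python
def matched_keywords(text, plain_terms, starred_terms, core_terms):
    """Return the keywords matched in a text.

    Plain terms match on their own. Starred terms match only if at least one
    core term is also present (assumed interpretation of the footnote).
    """
    lowered = text.lower()
    matches = {term for term in plain_terms if term in lowered}
    if any(core in lowered for core in core_terms):
        matches |= {term for term in starred_terms if term in lowered}
    return matches


# Hypothetical subsets of the artificial intelligence list.
plain_terms = {"artificial intelligence", "chatbot"}
starred_terms = {"ethics", "fairness"}
core_terms = {"artificial intelligence"}  # assumption: not listed in the source

with_core = matched_keywords(
    "Ethics of artificial intelligence", plain_terms, starred_terms, core_terms
)
without_core = matched_keywords(
    "A course on ethics and fairness in law", plain_terms, starred_terms, core_terms
)
```

The point of such a rule is to keep generic terms such as "ethics" or "security" from pulling in programmes that have nothing to do with the domain.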

High performance computing

accelerators * | distributed computing | hpc applications * | parallel programming *
cloud * | distributed systems * | hpcc | parallelisation *
cloud computing | energy efficiency | infiniband | performance analysis
cluster * | exascale * | manycore | performance evaluation
cluster computing * | field-programmable gate array | mapreduce * | performance modeling
compute unified device architecture * | fpga | massive parallelism * | performance optimisation
computer architecture * | gpgpu | message passing interface | reconfigurable computing *
computer modelling * | gpu | multi core | scalability
concurrent * | graphics processing unit | opencl | single instruction multiple data
cuda | grid computing | parallel algorithms * | supercomputer
data center | hadoop | parallel architectures * | supercomputer technology
data intensive computing | high performance computation | parallel computation * |

* Terms that are queried in combination with the domain’s core terms.

Cybersecurity

access control | cyber warfare | firewall * | phishing
access management | cybercrime | hacker | pseudonymity
activity monitoring | cybersecurity | hash function | public key
anonymity * | cybersecurity incident | identity access management | random number generation
anonymization | data anonymisation | identity management | security analysis
computer security | data sanitisation | information assurance | security protocol *
control system | data security | information protection | stuxnet
counterintelligence | digital evidence | information security | supervisory control data acquisition
cryptanalysis | digital forensics | intrusion detection | system security
cryptography | digital rights management | key management | vulnerability assessment
cryptology | digital signature | malware | web protocol
cyber attack | distributed systems | network attack | web protocol security
cyber risk | encryption | network security |
cyber threat | fault tolerance | penetration testing |

* Terms that are queried in combination with the domain’s core terms.

Data science

ant colony optimisation | distributed computing | metaheuristic optimisation | reinforcement learning
automated machine learning | distributed processing | multiagent system | scalability
big data | ensemble method | natural language processing | semantic web
business intelligence | evolutionary algorithm | natural language understanding | semi-supervised learning
data analytics | genetic algorithm | neural network | sentiment analysis
data mining | gradient descent | nosql | spark *
data science | hadoop | parallel computing * | statistical learning
data visualisation | information extraction | parallel processing * | supervised learning
decision analytics | information retrieval | parallelisation * | support vector machine
decision support | k-nearest-neighbour | pattern recognition | transfer learning
decision tree | machine learning | predictive analytics | unstructured data
deep learning | mapreduce | recommender system | unsupervised learning

* Terms that are queried in combination with the domain’s core terms.