Methodology

This section summarises the main characteristics of the education programmes analysed, the data source used, and the main methodological steps followed to produce the final results. The work follows the methodology developed in Academic offer and demand for advanced profiles in the EU (López-Cobo et al., 2019) and revised in Academic Offer of Advanced Digital Skills in 2019-20. International Comparison (Righi and López-Cobo et al., 2020).

Main characteristics of the programmes

Technological domain. The study covers four advanced digital domains: artificial intelligence, high performance computing, cybersecurity, and data science. An education programme may be assigned to more than one technological domain because these domains overlap (e.g., a programme on parallel computing may belong to high performance computing and data science simultaneously).

Geographical area. Refers to the country in which the programme is offered. The study covers the 27 EU Member States and six additional countries: the United Kingdom, Norway, Switzerland, Canada, the United States, and Australia.

Education level. The study collects data on three education levels: master's degrees, bachelor's degrees, and short professional courses.

Programme’s scope. Education programmes are classified as "specialised" or "broad", according to how closely they focus on the technological domain considered. Specialised programmes have a strong focus on the domain (e.g., a master's degree in supercomputing), while broad programmes address the domain in a more generic way (e.g., a bachelor's degree in biomedicine that includes a course on artificial intelligence). A programme has only one scope in a specific technological domain, but it may be a broad programme in one domain and a specialised one in another.

Programme’s field of education.
This variable of analysis refers to the field of education or discipline in which the programme is taught, according to the Fields of education and training 2013 classification (e.g., “Engineering, manufacturing and construction”, “Business, administration and law”). A programme may be taught in several fields of education. In those cases, the programme is weighted using fractional counting.

Programme’s content areas. These refer to the subdomains covered by the programme’s syllabus. For each of the four technological domains, specific content areas are defined either by following existing taxonomies or by building new ones from an analysis of programme descriptions.

The results are provided for each technological domain separately. If a programme belongs to more than one technological domain, it is fully counted within each of them. The statistics calculated are the number of programmes (by scope, field of education and content areas) and the penetration rate, i.e., the share of these programmes in the total number of programmes (of any type and with any type of content) offered in the geographical area considered.

Data source: strengths and caveats

The study uses data from the Studyportals platform as the starting point. It includes programmes from 3,700 universities in over 120 countries. Out of the seven dedicated Studyportals websites, this study analyses the ones focused on master’s degrees, bachelor’s degrees and short professional courses. These three repositories together account for more than 150,000 programmes, of which nearly 50,000 (in 2022) are taught in European universities or study centres. This source offers the widest coverage among all the platforms identified. However, its coverage is still incomplete, as programmes taught in national languages are not tracked.
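The two counting rules described earlier in this section (fractional counting across fields of education, and the penetration rate) can be illustrated with a minimal Python sketch. It assumes that fractional counting splits each programme equally across its fields (a weight of 1/k for a programme taught in k fields); the text does not spell out the weighting scheme, so this is an assumption. The function names and the sample data are hypothetical.

```python
from collections import defaultdict


def fractional_counts(programme_fields):
    """Count programmes per field of education with fractional weights.

    Assumption: a programme taught in k fields contributes 1/k to each of
    them, so every programme adds exactly 1 to the overall total.
    """
    counts = defaultdict(float)
    for fields in programme_fields:
        unique_fields = set(fields)
        if not unique_fields:
            continue  # a programme without a field cannot be attributed
        weight = 1.0 / len(unique_fields)
        for field in unique_fields:
            counts[field] += weight
    return dict(counts)


def penetration_rate(domain_programmes, all_programmes):
    """Share of domain programmes in all programmes offered in an area."""
    if all_programmes <= 0:
        raise ValueError("all_programmes must be positive")
    return domain_programmes / all_programmes


# Hypothetical example: two programmes, one of them spanning two fields.
sample = [
    ["Engineering, manufacturing and construction",
     "Information and Communication Technologies"],
    ["Information and Communication Technologies"],
]
counts = fractional_counts(sample)
rate = penetration_rate(domain_programmes=50, all_programmes=2000)
```

With this weighting, the per-field counts always sum to the number of programmes, which avoids double counting programmes that span several disciplines.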
The main assumption of the study is that, even if the source does not cover the entire education offer in each country, it captures a representative part of it, so that the attributes of the programmes captured by our study can be extrapolated to the whole education offer. This assumption is considered valid, as shown by the previous study Academic offer and demand for advanced profiles in the EU (López-Cobo et al., 2019). In addition, the focus on programmes taught in English is considered pertinent given the highly technological and computer-related domains addressed by this study.

Another strong advantage of the data source is the amount of programme-related information available, which makes it possible to analyse the characteristics of the programmes covered. In particular, some of the most useful attributes for our analysis relate to the programmes’ content (title of the programme, short and long description, and programme outline). We use them first to identify a programme as related to one of the four domains covered, and then to categorise the technological subdomains taught in the programme. The field of education in which the programme is taught is also a very valuable piece of information, which enables us to explore how diversified or concentrated the offer of advanced digital education is across disciplines.

Identification of domain boundaries and categories for the analysis

Since official classifications fail to identify transversal technological domains such as the ones examined, we use lists of representative keywords (one list per domain, see the following section) to query the data source. The selection of keywords follows a semi-automatic process aimed at identifying a representative list of terms present in specialised scientific publications. The first selection is performed for each domain separately, as detailed in Academic offer and demand for advanced profiles in the EU (López-Cobo et al., 2019).
In a second step, the programmes identified as specialised during the 2019 study were analysed to detect additional keywords.

After the programmes relevant to the technological domains under study have been identified, they are classified as “broad” or “specialised”. A programme is considered “specialised” in a technological domain if either its title or its short description includes at least one keyword representative of the domain, or if at least three different keywords are present in any other text field of the programme description. If neither of these conditions is met (i.e., only one or two keywords are found in the long description), the programme is considered “broad”.

The keywords are also used to classify the programmes according to the content areas taught. In general, the categorisation of content areas follows the methodology proposed in the 2019 study (López-Cobo et al., 2019), refined by analysing the syllabi of the most specialised programmes in the data source. Where available, existing taxonomies have also been used. For artificial intelligence (AI), we use the AI taxonomy developed by the JRC in the framework of AI Watch: AI Watch. Defining Artificial Intelligence. Towards an operational definition and taxonomy of artificial intelligence (Samoili and López-Cobo et al., 2020). For cybersecurity (CS), to enrich the categorisation of content areas we use a JRC report that aligns cybersecurity terminologies, definitions and domains into a coherent and comprehensive taxonomy, intended to facilitate the categorisation of cybersecurity capabilities in the EU: European Cybersecurity Centres of Expertise Map - Definitions and Taxonomy (Nai-Fovino et al., 2018). For high performance computing (HPC) and data science (DS), the taxonomy was developed by the authors of this work, based on a review of several specialised master’s programmes in the field.
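The specialised/broad rule described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the study's actual implementation: keywords are matched as case-insensitive substrings (which can over-match, e.g. "gpu" inside "gpgpu"), the "other text fields" are pooled before counting distinct keywords, and the function and parameter names are hypothetical.

```python
def classify_scope(title, short_description, other_fields, keywords):
    """Return "specialised", "broad", or None for one technological domain.

    Assumptions (not stated in the source): case-insensitive substring
    matching, and distinct keywords counted across all other text fields
    pooled together.
    """
    kws = {k.lower() for k in keywords}

    # Condition 1: at least one keyword in the title or short description.
    head = f"{title} {short_description}".lower()
    if any(k in head for k in kws):
        return "specialised"

    # Condition 2: at least three different keywords in the other fields.
    body = " ".join(other_fields).lower()
    found = {k for k in kws if k in body}
    if len(found) >= 3:
        return "specialised"

    # One or two keywords elsewhere: relevant to the domain, but broad.
    if found:
        return "broad"

    # No keyword at all: the programme is not identified for this domain.
    return None


# Hypothetical keyword subset and programmes, for illustration only.
ai_keywords = {"machine learning", "neural network",
               "computer vision", "deep learning"}

by_title = classify_scope("MSc in Machine Learning", "", [], ai_keywords)
by_count = classify_scope(
    "MSc in Informatics", "",
    ["Covers machine learning, neural network design and computer vision."],
    ai_keywords)
generic = classify_scope(
    "BSc in Biomedicine", "",
    ["One elective introduces machine learning."], ai_keywords)
unrelated = classify_scope("BA in History", "", ["Medieval Europe."], ai_keywords)
```

Because the rule is applied per domain, the same programme can be passed through the function once per keyword list and receive a different scope in each domain.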
Keywords for programmes’ identification

* Terms that are queried in combination with the domain’s core terms.

Artificial intelligence

accountability *; adaptive learning; ai application; anomaly detection; artificial general intelligence; artificial intelligence; audio processing *; automated vehicle; automatic translation; autonomous system *; autonomous vehicle; business intelligence *; chatbot; computational creativity *; computational linguistics; computational neuroscience *; computer vision; control theory; cyber physical system; deep learning; deep neural network; ethics *; expert system; explainability *; face recognition; fairness *; human computer interaction; human-ai interaction; image processing; image recognition; inductive programming; intelligence software; intelligent agent *; intelligent control; intelligent software development; intelligent system; knowledge representation and reasoning; machine learning; machine translation; multi-agent system; narrow artificial intelligence; natural language generation; natural language processing; natural language understanding; neural network; pattern recognition; predictive analytics; recommender system *; reinforcement learning; robot system *; robotics; safety *; security *; semantic web *; sentiment analysis *; service robot *; social robot *; sound synthesis; speaker identification; speech processing *; speech recognition; speech synthesis; strong artificial intelligence; supervised learning; support vector machine; swarm intelligence; text mining; transfer learning; transparency *; trustworthy ai; uncertainty *; unsupervised learning; voice recognition; weak artificial intelligence

High performance computing

accelerators *; cloud *; cloud computing; cluster *; cluster computing *; compute unified device architecture *; computer architecture *; computer modelling *; concurrent *; cuda; data center; data intensive computing; distributed computing; distributed systems *; energy efficiency; exascale *; field-programmable gate array; fpga; gpgpu; gpu; graphics processing unit; grid computing; hadoop; high performance computation; hpc applications *; hpcc; infiniband; manycore; mapreduce *; massive parallelism *; message passing interface; multi core; opencl; parallel algorithms *; parallel architectures *; parallel computation *; parallel programming *; parallelisation *; performance analysis; performance evaluation; performance modeling; performance optimisation; reconfigurable computing *; scalability; single instruction multiple data; supercomputer; supercomputer technology

Cybersecurity

access control; access management; activity monitoring; anonymity *; anonymization; computer security; control system; counterintelligence; cryptanalysis; cryptography; cryptology; cyber attack; cyber risk; cyber threat; cyber warfare; cybercrime; cybersecurity; cybersecurity incident; data anonymisation; data sanitisation; data security; digital evidence; digital forensics; digital rights management; digital signature; distributed systems; encryption; fault tolerance; firewall *; hacker; hash function; identity access management; identity management; information assurance; information protection; information security; intrusion detection; key management; malware; network attack; network security; penetration testing; phishing; pseudonymity; public key; random number generation; security analysis; security protocol *; stuxnet; supervisory control data acquisition; system security; vulnerability assessment; web protocol; web protocol security

Data science

ant colony optimisation; automated machine learning; big data; business intelligence; data analytics; data mining; data science; data visualisation; decision analytics; decision support; decision tree; deep learning; distributed computing; distributed processing; ensemble method; evolutionary algorithm; genetic algorithm; gradient descent; hadoop; information extraction; information retrieval; k-nearest-neighbour; machine learning; mapreduce; metaheuristic optimisation; multiagent system; natural language processing; natural language understanding; neural network; nosql; parallel computing *; parallel processing *; parallelisation *; pattern recognition; predictive analytics; recommender system; reinforcement learning; scalability; semantic web; semi-supervised learning; sentiment analysis; spark *; statistical learning; supervised learning; support vector machine; transfer learning; unstructured data; unsupervised learning
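The handling of starred terms can be illustrated with a short Python sketch. The source only states that starred terms are "queried in combination with" the domain's core terms; the sketch assumes this means a starred term counts as a match only when at least one core term also appears in the same text. Which terms are core is not specified in this section, so the `core_terms` argument, like the function names, is hypothetical.

```python
def parse_keywords(raw_terms):
    """Split a keyword list into plain terms and starred (conditional) terms."""
    plain, conditional = set(), set()
    for term in raw_terms:
        cleaned = term.strip().lower()
        if cleaned.endswith("*"):
            conditional.add(cleaned.rstrip("*").strip())
        else:
            plain.add(cleaned)
    return plain, conditional


def matched_keywords(text, plain, conditional, core_terms):
    """Return the keywords found in `text`.

    Assumption: a starred term is accepted only if at least one core term
    is also present. Matching is a simple case-insensitive substring test.
    """
    lowered = text.lower()
    hits = {k for k in plain if k in lowered}
    if any(core.lower() in lowered for core in core_terms):
        hits |= {k for k in conditional if k in lowered}
    return hits


# Hypothetical subset of the AI list and a hypothetical core term.
plain, conditional = parse_keywords(["deep learning", "ethics *", "robotics"])
core = {"artificial intelligence"}

without_core = matched_keywords("A course on ethics and law",
                                plain, conditional, core)
with_core = matched_keywords("Ethics of artificial intelligence and robotics",
                             plain, conditional, core)
```

Under this reading, generic terms such as "ethics" or "cloud" do not pull in programmes that have no connection to the domain, while still contributing to the keyword count when the domain is clearly present.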