Data Science

Geographical features on a map

Collection of data science courses. This category includes courses that cover general data science techniques. Techniques that might be more specialized to risk management and (financial) risk modelling are covered in other course categories.

This course offers a brief introduction to the Copernicus Satellite Data Ecosystem.

Course Content:

This course is an introduction to Tensor calculations with Eigen, a popular C++ library for working with numerical arrays and linear algebra. It covers the following topics:

  • The concept and techniques of the Eigen Tensor class
  • Declaring and initializing Tensors of various ranks and types, and accessing Tensor elements
  • Elementary unary and binary operations involving Tensors
  • More complex operations (reductions, contractions)
  • Modifying the shape of Tensors

Who Is This Course For:

Developers in any domain who need to use higher-dimensional numerical data containers, for example in:

  • Statistical Calculations
  • Machine Learning

How Does The Course Help:

Eigen is a fairly large library. The course aims to:

  • Introduce the Tensor part of the library and its purpose
  • Sketch its overall structure and functionality
  • Familiarize you with the common usage patterns (APIs)

What Will You Get From The Course:

  • You will be able to confidently use Eigen::Tensor to solve common numerical processing tasks, in particular those requiring standard manipulation of tensors
  • You will be able to contribute to the specific use cases mentioned above

Course Level and Difficulty Level:

This course is part of the Data Science family.

  • This is a Core Level course in Data Science, which means that a good grounding at Introductory level in various Data Science topics is a prerequisite for making the most of this course.
  • This is a Technical course, which means that certain mathematical (linear algebra) and/or technology (C++) elements are assumed to be known before one can master the course material.

Advanced material not covered here:

  • Memory layouts (how numerical data are stored in memory) and the performance implications
  • Extending Eigen (in particular the C-APIs)
  • Numerical algebra / scientific computing concepts beyond what is needed to understand the core Eigen::Tensor functionality

If you have not taken an Open Risk Academy course before, the "CrashCourse Academy Demo" provides a quick overview of the Academy.

The following table places the course in the Open Risk skills diagram:

Course Level & Type
              | Introductory Level | Core Level | Advanced Level
Non-technical |                    |            |
Technical     |                    | DAT31071   |

Course Material:

The course material comprises the following:

  • 14 interactive readings
  • Embedded exercises based on the daily material

Time Requirements and Important Dates:

  • The course is self-paced and can be undertaken at any point. It requires a commitment of about one or two days total, depending on your familiarity with linear algebra and C++.

Where To Get Help:

If you get stuck on any issue with the course or the Academy:

  • If the issue is related to the course topics or material, check the Course Forum first.
  • If the issue is related to the operation of the Open Risk Academy, check the Academy FAQ first. If the issue persists, contact us at info@openrisk.eu

Different data validation levels as recommended by Eurostat

Summary:

This course is a CrashProgram (short course) introducing the concept of a structured review of risk data. The course is at an introductory technical level. It requires some familiarity with credit risk data (and an ability to open and inspect data files). Step by step we build the knowledge required to review the suitability of data for a given purpose and to report the findings.

Outcomes:

  • We learn the concept of Data Provenance
  • We get a first exposure to the different levels of Data Validation as recommended by Eurostat
  • We summarize our findings in a mock report written in Markdown format

Course Level and Type:

              | Introductory Level    | Core Level | Advanced Level
Non-Technical |                       |            |
Technical     | CrashProgram DAT31046 |            |
class inheritance tools

This course is a CrashProgram (short course) that explores how class inheritance of related data objects can be handled in a data science context. The course is at a core technical level. It requires some familiarity with database models, data specifications such as JSON and a basic knowledge of Python.
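
As a rough illustration of the problem this course addresses, the sketch below serializes a small class hierarchy to JSON and restores the correct subclass from it. The `Loan` and `Mortgage` classes, their fields, the `REGISTRY` mapping and the `"type"` tag are all hypothetical names chosen for this example. They are not taken from the course material, and this is only one of several possible ways to handle inheritance in stored data.

```python
import json
from dataclasses import dataclass, asdict


# Hypothetical data objects, chosen for illustration only.
@dataclass
class Loan:
    loan_id: str
    balance: float


@dataclass
class Mortgage(Loan):
    # A subclass adds fields on top of those inherited from Loan.
    property_value: float


# Registry mapping the "type" tag stored in JSON back to a class.
REGISTRY = {cls.__name__: cls for cls in (Loan, Mortgage)}


def to_json(obj: Loan) -> str:
    """Serialize an object, recording its concrete class in a "type" tag."""
    return json.dumps({"type": type(obj).__name__, **asdict(obj)})


def from_json(text: str) -> Loan:
    """Rebuild the object, using the "type" tag to pick the right class."""
    record = json.loads(text)
    cls = REGISTRY[record.pop("type")]
    return cls(**record)


mortgage = Mortgage(loan_id="L-001", balance=150000.0, property_value=200000.0)
restored = from_json(to_json(mortgage))
print(restored)
```

Without the `"type"` tag, a plain JSON record would not say whether it describes a `Loan` or a `Mortgage`, so the subclass information would be lost on a round trip.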


Course Level and Type:

              | Introductory Level | Core Level            | Advanced Level
Non-Technical |                    |                       |
Technical     |                    | CrashProgram DAT31063 |
An overview of core Python tools for working with semantic web data

Geographical features on a map

Summary:

This course is a CrashProgram (short course) introducing the GeoJSON specification for the encoding of geospatial features. The course is at an introductory technical level. It requires some familiarity with data specifications such as JSON and a very basic knowledge of Python.
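
For orientation, a minimal sketch of what a GeoJSON document looks like when built and parsed with Python's standard `json` module. The coordinates and the `name` property are illustrative values, not data from the course. GeoJSON lists positions as longitude first, then latitude.

```python
import json

# A GeoJSON Feature pairs a geometry with free-form properties.
# Coordinates are [longitude, latitude]; the values here are illustrative.
feature = {
    "type": "Feature",
    "geometry": {"type": "Point", "coordinates": [4.9, 52.37]},
    "properties": {"name": "Example location"},
}

# Several features are usually grouped in a FeatureCollection.
collection = {"type": "FeatureCollection", "features": [feature]}

# Round trip: serialize to text, then parse back into Python objects.
text = json.dumps(collection, indent=2)
parsed = json.loads(text)
lon, lat = parsed["features"][0]["geometry"]["coordinates"]
print(text)
```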

Course Level and Type:

              | Introductory Level    | Core Level | Advanced Level
Non-Technical |                       |            |
Technical     | CrashProgram DAT31053 |            |

Exploratory Data Analysis Visualizations

Summary:

This course is a CrashProgram (short course) introducing exploratory data analysis. The course is at an introductory technical level. It requires some familiarity with credit risk data (and an ability to open and inspect data files). Step by step we build the knowledge required to perform a comprehensive exploratory data analysis.

Prerequisites:

The course can be pursued on a standalone basis. It is advisable to pursue the course after DAT31046 (Risk Data Review), which reviews the data from a data quality validation perspective.

Outcomes:

  • We learn the concept and techniques of Exploratory Data Analysis
  • We touch upon the issue of bias and how to mitigate it
  • We learn about more advanced data formats such as HDF
  • We perform basic exploratory analysis using pandas
  • We perform easy visual analysis of association using seaborn
  • We compute contingency tables, Weight of Evidence (WoE) and Information Value using pandas, scipy and statsmodels
  • We summarize our findings in terms of numerical and graphical results in a mock report written in Markdown format
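
To make the WoE and Information Value outcome concrete, here is a minimal sketch in plain Python rather than the pandas, scipy and statsmodels tools the course names. The category labels and counts are made-up illustrative numbers. The sketch assumes the common convention WoE = ln(share of goods / share of bads) per category, with Information Value as the sum of (share of goods - share of bads) × WoE. Other texts flip the ratio, which changes the sign of WoE but not the Information Value.

```python
import math

# Illustrative counts per category: (good outcomes, bad outcomes).
# These numbers are made up for the example.
counts = {"low": (80, 20), "medium": (60, 40), "high": (20, 80)}

total_good = sum(good for good, _ in counts.values())
total_bad = sum(bad for _, bad in counts.values())

woe = {}
iv = 0.0
for category, (good, bad) in counts.items():
    # Share of all goods and of all bads that fall in this category.
    dist_good = good / total_good
    dist_bad = bad / total_bad
    # Assumes every category has at least one good and one bad;
    # a zero count would need smoothing before taking the logarithm.
    woe[category] = math.log(dist_good / dist_bad)
    iv += (dist_good - dist_bad) * woe[category]

print(woe)
print(iv)
```

Under this convention a positive WoE marks a category with proportionally more good than bad outcomes, and a larger Information Value indicates a stronger association between the variable and the outcome.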

Course Level and Type:

              | Introductory Level    | Core Level | Advanced Level
Non-Technical |                       |            |
Technical     | CrashProgram DAT31048 |            |