New Course: Working with Large Matrices using Command Line Tools

by Ad Min -

Dear Academy users,

We are happy to release a new course: Working with Large Matrices using Command Line Tools.

What is this course about

In this course we explore a number of Linux command line interface (CLI) tools:

  • Bash scripting
  • Several basic CLI commands (ls, cd, etc.)
  • File-manipulation CLI commands such as head, cut, and wc
  • The awk programming language and scripting

We apply these in a very concrete context: working with large matrix files that form part of various economic input-output models. Such files are cumbersome to work with in spreadsheets, yet the overhead of a full-blown statistical / data science environment may also be high. Command line tools offer a handy intermediate approach that may be useful in various contexts.

Prerequisites

Basic knowledge of, and a working setup of, a Linux or Linux-like development environment (including a shell and a text editor) is essential. Any standard Linux distribution should work, as should WSL on Windows machines and macOS (possibly after installing the GNU tools).

Some exposure to scripting and to any general purpose programming language (e.g. Python, JavaScript, C++, Java) is required for understanding the scripts and working through the awk exercises.

The course derives its motivation from the task of processing large matrix data. Some idea of what a matrix is, and of why it is useful to know how to work with matrices, is therefore assumed, but it is not required for completing the course, as we do not go into any mathematical aspects of matrices.

Table of Contents

  • Motivation for Command Line tools
  • Overview and Setup of CLI Tools
  • A hello world in Awk
  • Downloading Data: use command line tools to fetch published matrix data and store it on local disk
  • Extracting Data: verify that we have downloaded the correct datasets and, if necessary, bring them into a usable shape (e.g. by uncompressing them)
  • Scanning Data Files: get a first high-level view of what sort of files we have downloaded
  • Figuring out Structure and Dimensions: understand the structure of each file (separators, total number of rows and columns, and their nature)
  • Scrubbing / Cutting / Reshaping: create clean files in which matrix data with a known number of rows and columns are stored in tab-separated ASCII format
  • Transformations: perform simple mathematical transformations and statistical operations, and investigate the degree to which matrix values are non-trivial (non-zero)
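To give a flavour of the later steps, the following is a minimal sketch of checking the dimensions of a matrix file and counting its non-zero entries. It assumes a clean tab-separated file with no header row or row labels; the 3 x 4 matrix and the variable names (matrix_file, rows, cols, nonzero) are invented for this example and do not come from the course scripts.

```shell
# Create a tiny tab-separated 3 x 4 matrix in a temporary file (made-up values).
matrix_file="$(mktemp)"
printf '1.0\t0\t2.5\t0\n0\t0\t3.0\t4.0\n0\t1.5\t0\t0\n' > "$matrix_file"

# Number of rows: count the lines in the file.
rows=$(wc -l < "$matrix_file" | tr -d ' ')

# Number of columns: count the tab-separated fields on the first line.
cols=$(head -n 1 "$matrix_file" | awk -F'\t' '{ print NF }')

# Non-zero entries: visit every field and count those that are numerically non-zero.
nonzero=$(awk -F'\t' '{ for (i = 1; i <= NF; i++) if ($i + 0 != 0) n++ } END { print n + 0 }' "$matrix_file")

echo "rows=$rows cols=$cols nonzero=$nonzero"
rm -f "$matrix_file"
```

Real input-output files typically carry headers, labels, or other metadata, so a scrubbing step along the lines listed above would normally come before counts like these are meaningful.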

Resources

We will work with input-output matrices downloaded from well-known public distributions (EXIOBASE, FIGARO, OECD-ICIO). Scripts providing guidance and solutions to the suggested exercises are available in the Open Risk Academy GitHub repositories.

Enjoy!