What we will cover
In this course we explore a number of Linux command line interface (CLI) tools:
- Bash scripting
- Several basic CLI commands (ls, cd, etc.)
- File-manipulation commands such as head, cut and wc
- The awk programming language and awk scripting
We will apply these in a very concrete context: large matrix files that form part of various economic input-output models.
Prerequisites
Basic familiarity with, and a working setup of, a Linux or Linux-like development environment (including a shell and a text editor) is essential. Any standard Linux distribution should work, as should WSL on Windows machines and macOS (possibly after installing the GNU tools).
Some exposure to scripting and to a general-purpose programming language (e.g., Python, JavaScript, C++, Java) is needed to understand the scripts and work through the awk exercises.
The course is motivated by the task of processing large matrix data. Some idea of what a matrix is, and why it is useful to know how to work with matrices, is therefore helpful, but it is not required for completing the course, as we do not go into any mathematical aspects of matrices.
Table of Contents
Step 1
- Motivation for Command Line tools
- Overview and Setup of CLI Tools
- A hello world in Awk
Step 2
- Downloading Data: using command line tools to fetch published matrix data and store it on local disk
- Extracting Data: verify that we have downloaded the correct datasets and, if necessary, bring them into a usable shape (e.g., by uncompressing them)
Step 3
- Scanning Data Files: get a first high-level view of the kinds of files we have downloaded
- Figuring out Structure and Dimensions: understand the structure of each file (separators, total number of rows and columns, and what they contain)
Step 4
- Scrubbing / Cutting / Reshaping: create clean files in which matrix data with a known number of rows and columns are stored in tab-separated ASCII format
- Transformations: perform simple mathematical transformations and statistical operations, and investigate the extent to which matrix values are non-trivial (non-zero)