New Course: Processing (US) Agency Mortgage Data Using Awk, Pandas (Part 1)

by Ad Min -

A new crash course (PYT26065) is now available on the Academy for all users.


An illustration of various stages of data processing


Course Content:

This crash course illustrates how to process loan-level US Agency mortgage data using awk, pandas and django. The first part of the course focuses on static (acquisition) type attributes. This part covers the following topics:

  • Downloading and preprocessing historical loan-performance datasets using awk
  • Processing loan-performance datasets using pandas
  • Identifying and fixing missing data and data type issues
  • Classifying data attributes according to a Credit Data Taxonomy
  • Manipulating and exporting derived data models using pandas
  • The Split-Apply-Combine process
  • Importing data models into a django-based web platform (openNPL) that enables interactive work with the data
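
As a rough illustration of the pandas topics listed above (fixing data type issues around missing values, then Split-Apply-Combine), here is a minimal sketch. It is not taken from the course material. The sample records, the column names and the assumption of a headerless pipe-delimited layout are all hypothetical stand-ins for the real files, and the awk preprocessing step is not shown.

```python
import io

import pandas as pd

# Synthetic, made-up records standing in for a pipe-delimited acquisition file.
# Column names and values are hypothetical, not the real dataset layout.
RAW = """\
1001|R|720|250000|CA
1002|B||180000|TX
1003|R|655|320000|CA
1004|C|700||NY
1005|B|690|150000|TX
"""

COLUMNS = ["loan_id", "channel", "credit_score", "orig_upb", "state"]


def load_loans(source):
    """Read headerless pipe-delimited records with explicit dtypes."""
    return pd.read_csv(
        source,
        sep="|",
        header=None,
        names=COLUMNS,
        dtype={
            "loan_id": "string",       # identifiers are labels, not numbers
            "channel": "category",
            "credit_score": "Int64",   # nullable integer: missing scores become <NA>
            "orig_upb": "float64",     # missing balances become NaN
            "state": "category",
        },
    )


def summarize_by_state(loans):
    """Split by state, apply aggregations, combine into one table."""
    return (
        loans.groupby("state", observed=True)
        .agg(
            loan_count=("loan_id", "count"),
            mean_score=("credit_score", "mean"),
            total_upb=("orig_upb", "sum"),
        )
        .reset_index()
    )


loans = load_loans(io.StringIO(RAW))
summary = summarize_by_state(loans)
print(summary)
```

Declaring the dtypes at read time keeps a missing credit score from silently turning an integer column into floats or strings, which is one common form of the data type issues mentioned in the topic list.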

Nota Bene: Completing the course requires actual historical loan performance data, which are not provided. Students must source such data themselves from the Data Dynamics website and agree to comply with the applicable terms and conditions.

Who Is This Course For:

The course is useful to:

  • Data Engineers / Data Scientists across the financial industry and beyond that need to work with mortgage data
  • Credit Risk Management professionals and students
  • Credit Portfolio Management professionals

How Does The Course Help:

Mastering the course content provides background knowledge for the following activities:

  • Processing large loan-level historical performance datasets
  • Pre-processing, categorizing, segmenting and improving such datasets in preparation for further analysis

What Will You Get From The Course:

  • You will be able to work confidently with loan-level historical performance data
  • You will be able to contribute to the specific use cases mentioned above

Course Level and Difficulty Level:

This course is part of the Risk Modeling using Python family.

  • This is a Core Level course in Risk Modelling. A good grounding at Introductory level in various Data Engineering and Data Science topics is a prerequisite for making the most of this course.
  • This is a Technical course, which means certain technology elements (Python, CLI) are needed for mastering the material.

If you have not taken an Open Risk Academy course before, the "CrashCourse Academy Demo" provides a quick overview of the Academy.

The following table places the course in the Open Risk Academy skills diagram:

Course Level & Type | Introductory Level | Core Level            | Advanced Level
Non-technical       |                    |                       |
Technical           |                    | CrashProgram PYT26065 |

Course Material:

The course material comprises the following:

Time Requirements and Important Dates:

  • The course is self-paced and can be undertaken at any point. It requires a commitment of about five hours in total, depending on the student's familiarity with the tools and their existing development environment.

Where To Get Help:

If you get stuck on any issue with the course or the Academy:

  • If the issue is related to the course topics / material, check the Course Forum in the first instance
  • If the issue is related to the operation of the Open Risk Academy, check the Academy FAQ first. If the issue persists, contact us at info@openrisk.eu