Levenshtein-distance Stock Typo Analysis

Overview

Link: GitHub Repository

Designed a data analysis pipeline to measure how prevalent buying pressure caused by retail traders' ticker typos is.

Key Features

  • Automated Pipeline: Automated the retrieval of the latest trade data and the analysis of likely typo ticker pairs.
  • Smart Filtering: Combined company-name filtering with ticker edit distance (Levenshtein distance) to isolate alpha arising from genuine typos rather than spurious correlations.
  • Event Detection: Implemented analysis to detect typo-driven trading on high-volume, high-likelihood days such as IPOs.
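The candidate-pair step can be sketched roughly as follows. This is a minimal illustration, not the repository's code: it assumes a hypothetical `listings` dict mapping ticker to issuer name, uses the standard dynamic-programming Levenshtein distance, and treats "name filtering" as dropping pairs with the same issuer (for example, two share classes), which move together for fundamental reasons. The function names and `max_distance` threshold are illustrative assumptions.

```python
from itertools import combinations


def levenshtein(a: str, b: str) -> int:
    """Edit distance: minimum number of single-character insertions,
    deletions, and substitutions needed to turn one string into the other."""
    if len(a) < len(b):
        a, b = b, a  # distance is symmetric; keep the DP row short
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


def candidate_typo_pairs(listings: dict[str, str], max_distance: int = 1) -> list[tuple[str, str]]:
    """Return ticker pairs within `max_distance` edits of each other.

    `listings` maps ticker -> issuer name (hypothetical input format).
    Pairs with the same issuer are skipped (assumed name filter), since
    their correlation is fundamental rather than typo-driven.
    """
    pairs = []
    for (t1, n1), (t2, n2) in combinations(sorted(listings.items()), 2):
        if n1 == n2:
            continue  # same company, e.g. share classes
        if levenshtein(t1, t2) <= max_distance:
            pairs.append((t1, t2))
    return pairs
```

With made-up listings such as `{"ABC": "Acme Corp", "ABD": "Bolt Industries", "XYZ": "Xylo Holdings", "XYZA": "Xylo Holdings"}`, only `("ABC", "ABD")` survives: `XYZ`/`XYZA` are one edit apart but share an issuer. A keyboard-aware variant could weight substitutions by key adjacency, but that is a separate, unconfirmed design choice.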
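One way to sketch the event-detection idea is to measure abnormal volume in the look-alike ticker on days when the intended ticker is in the spotlight (an IPO date, for instance). This is an assumed approach, not necessarily what the project does: event days are supplied externally as indices into a daily volume series, and the hypothetical `window` and `threshold` parameters define a trailing-window z-score cutoff.

```python
import math
from statistics import mean, stdev


def abnormal_volume_z(volumes: list[float], event_index: int, window: int = 20) -> float:
    """z-score of volumes[event_index] against the `window` days before it."""
    if window < 2 or event_index < window:
        raise ValueError("need at least `window` (>= 2) days of history before the event")
    history = volumes[event_index - window:event_index]
    diff = volumes[event_index] - mean(history)
    sigma = stdev(history)  # sample standard deviation of the trailing window
    if sigma == 0:
        # Flat history: any deviation is unbounded in z-score terms.
        return 0.0 if diff == 0 else math.copysign(math.inf, diff)
    return diff / sigma


def flag_typo_events(typo_volumes: list[float], event_indices: list[int],
                     window: int = 20, threshold: float = 3.0) -> list[int]:
    """Event days on which the look-alike ticker's volume is abnormally high."""
    return [i for i in event_indices
            if abnormal_volume_z(typo_volumes, i, window) >= threshold]
```

Using only the days before each event avoids look-ahead bias; log volume or a median-based spread would be more robust to heavy-tailed volume, at the cost of a slightly more complex baseline.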