Posts

Projects

MegaMap

Giving scientists superpowers in the battle against aging and age-related diseases

math-input

Khan Academy's expression editor for the mobile web.

Script Playground

An in-browser playground for the Bitcoin Script programming language.

Jasper

An open source platform for developing always-on, voice-controlled applications, built on top of a Raspberry Pi. The GitHub repository has over 4500 stars and 1000 forks. (See our coverage in WIRED, Forbes, and Lifehacker.)

KhanQuest

Porting the Khan Academy learning mechanics to a fantasy video game. Built as part of Khan Academy's Third Annual Healthy Hackathon.

Introduction to Hacking

A weekly "hack" class taught at Princeton University with Shubhro Saha to survey useful programming tools and techniques, many of which are excluded from the traditional CS curriculum. Topics included browser automation, web scraping, and computer security (i.e., real hacking).

MAD Topic Model

A topic-model-based tool for authorship detection and stylistic analysis that extracts rich stylistic and lexical n-gram features from text. These features are then used as vocabularies over which topic models are generated. Implemented as part of a larger research project.

Online Boosting

A suite of online (in the machine-learning sense) boosting and weak learning algorithms, implemented in Python, including those of Oza & Russell and Chen et al. Implemented as part of a larger research project.

point-location

Fast point location in planar subdivisions using Kirkpatrick's Algorithm (see here for a demo). Implemented in Python with NumPy, Matplotlib, and more.

semantic

A Python library for extracting semantic information (e.g., dates, numbers, mathematical calculations) from text. Available via PyPI.

Quizzler

NLP-based, automatic quiz-question generation for iOS with a Python back-end. Rated the first-place entry in Facebook Seattle's Summer of Hack Hackathon.

wikipedia.py

A Wikipedia API for humans (implemented in Python). This is the same API used by Quizzler to extract useful information from Wikipedia without worrying about low-level scraping.

EveryCollegeCal

An iPhone app for tracking undergraduate college calendars. Included information from hundreds of undergraduate schools and saw hundreds of downloads through the App Store. Note that this project is no longer maintained.

OCaml Threads

A threading module for mimicking parallelism in OCaml by marshalling data across multiple Unix processes. Includes some simple examples and benchmarks.

Grapher

An interface for creating dynamic graphs in iOS, developed as part of a hackathon project. Note that this project is not under active development.

Papers

Towards a Better Understanding of Noun Compound Interpretability

In this paper, we explore the question of "What makes a noun compound interpretable?", taking a computational linguistic approach. Submitted to the Princeton Computer Science department as a senior thesis, advised by Prof. Christiane Fellbaum.

An Overview of Boosting Techniques in the Online Learning Setting

Boosting in the batch setting is a well-known machine learning technique with strong theoretical foundations and extensive use in practice. In this paper, we study the role of online boosting, in which we must ensemble online weak learners into a single online strong learner.
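
To make the idea of combining online weak learners concrete, here is a minimal sketch in the style of Oza & Russell's online boosting. It is not code from the paper or the Online Boosting repository. The class names, the perceptron weak learner, the cap on the Poisson rate, and the choice to skip learners with weighted error of 0.5 or more are all assumptions made for illustration.

```python
import math
import random


class OnlinePerceptron:
    """Hypothetical online weak learner: a perceptron with labels in {-1, +1}."""

    def __init__(self, n_features):
        self.w = [0.0] * n_features
        self.b = 0.0

    def predict(self, x):
        score = sum(wi * xi for wi, xi in zip(self.w, x)) + self.b
        return 1 if score >= 0 else -1

    def update(self, x, y):
        # Standard perceptron rule: only move the weights on a mistake.
        if self.predict(x) != y:
            self.w = [wi + y * xi for wi, xi in zip(self.w, x)]
            self.b += y


def poisson(lam, rng):
    """Draw from Poisson(lam) using Knuth's multiplication method."""
    threshold = math.exp(-lam)
    k, p = 0, rng.random()
    while p > threshold:
        k += 1
        p *= rng.random()
    return k


class OnlineBooster:
    """Sketch of online boosting: each example is shown to every weak learner
    a Poisson-distributed number of times, with a rate that grows when the
    earlier learners got the example wrong."""

    def __init__(self, n_learners, n_features, seed=0):
        self.learners = [OnlinePerceptron(n_features) for _ in range(n_learners)]
        self.correct = [0.0] * n_learners  # total weight classified correctly
        self.wrong = [0.0] * n_learners    # total weight misclassified
        self.rng = random.Random(seed)

    def update(self, x, y):
        lam = 1.0
        for m, learner in enumerate(self.learners):
            # Cap the sampling rate so the sampler stays numerically safe
            # (an assumption of this sketch, not part of the algorithm).
            for _ in range(poisson(min(lam, 20.0), self.rng)):
                learner.update(x, y)
            if learner.predict(x) == y:
                self.correct[m] += lam
                eps = self.wrong[m] / (self.correct[m] + self.wrong[m])
                lam *= 1.0 / (2.0 * (1.0 - eps))  # shrink weight of easy examples
            else:
                self.wrong[m] += lam
                eps = self.wrong[m] / (self.correct[m] + self.wrong[m])
                lam *= 1.0 / (2.0 * eps)          # grow weight of hard examples

    def predict(self, x):
        total = 0.0
        for m, learner in enumerate(self.learners):
            seen = self.correct[m] + self.wrong[m]
            if seen == 0:
                continue
            eps = max(self.wrong[m] / seen, 1e-9)
            if eps >= 0.5:
                continue  # skip learners no better than chance
            total += math.log((1.0 - eps) / eps) * learner.predict(x)
        return 1 if total >= 0 else -1
```

Each call to `update` consumes one labeled example and discards it, which is what distinguishes this from batch boosting: no learner ever needs the full training set in memory.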

An Introduction to Continuous (or Differential) Entropy

Classically, Shannon entropy was formalized over discrete probability distributions. In this paper, we explore the role of entropy in the continuous domain.
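
For reference, the standard textbook definitions are sketched below; the notation is an assumption and may differ from the paper's. Discrete Shannon entropy sums over a probability mass function $p$, while differential entropy integrates over a density $f$.

```latex
H(X) = -\sum_{x} p(x) \log p(x)
\qquad\text{versus}\qquad
h(X) = -\int f(x) \log f(x) \, dx

% Two standard examples:
%   Uniform on [0, a]:        h(X) = \log a
%   Gaussian, variance s^2:   h(X) = \tfrac{1}{2} \log\!\left(2 \pi e \sigma^2\right)
```

One difference is visible right away: for a uniform density on $[0, a]$ with $a < 1$, $h(X) = \log a$ is negative, which cannot happen for discrete entropy.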

A Generalized Algorithm for Flow Table Optimization

In this paper, we present a series of highly general algorithms for flow table optimization which are parameterized on user-provided hardware specifications. These algorithms allow programmers to implement and optimize policies on network switches regardless of the number of tables available and the type of pattern-matching performed.
© 2023 Charlie Marsh