Staff Software Engineer, Nuna — San Francisco, CA 2018 – Present
I joined Nuna as a data engineer, working as a contractor on the Medicare Quality Payments Program, which processes billions of medical claims to derive quality and cost-effectiveness metrics for care provided. I worked on the implementation of individual cost and quality measures, as well as improvements to our platform and deployment processes. During that time I took over as tech lead for the project, where my responsibilities expanded to include architecture for new data processing systems, long-range planning for delivery of new measures, and coordination with project stakeholders including subject matter experts, data scientists, IT, and quality assurance personnel.
In January of 2019, I transferred to the commercial side of the business as a tech lead for a major new commercial project the company was launching. My responsibilities included architecting the project's data-processing pipeline, leading development of data processing components, and coordinating with data ingestion and web development teams that were also working on the product.
Lead Data Engineer, O'Reilly Media — Boston, MA 2014 – 2018
I joined the Data Engineering team as its first dedicated engineer, tasked with architecting and building the ETL pipeline for data from the Safari Books Online platform and associated services. I built a scalable, stable, and correct data warehouse to inform business and product decision making, and I designed and implemented RESTful APIs for feeding analytic data back into the Safari product.
I created and deployed an improved recommender system for the product to provide more relevant recommendations to users, increasing user engagement with recommendations and improving content discovery. I led successive iterations of the product to improve recommendation quality and diversity using ensemble recommendation methods.
I designed, helped build, and deployed a platform on Google Cloud for performing large-scale NLP and ML experiments on our content. The full text of our corpus ran to over two billion words, with new works constantly being added, so the platform needed to be performant and scalable, and to give data scientists an easy way to bootstrap work on the full-text corpus, understand our content more deeply, and drive new features with that knowledge. The platform was used for classification tasks, topic modeling, and document semantic similarity calculations.
Lead Software Engineer, Cobrain — Bethesda, MD 2013 – May 2014
I worked on the Data Science team, where my responsibilities began with Cobrain's data ingestion and transformation process. This included creating and scaling a distributed web crawler, using natural language processing to categorize products, and performing entity resolution of products across a wide range of data sources. More recently I worked with Neo4j, a graph database, to model our product and user data and to develop algorithms that provide recommendations to users.
Tech Lead, Threespot — Washington, DC 2009 – 2013
Johns Hopkins University, Baltimore, MD — B.A. English Literature, 2005
Programming Languages, Etc.
Big Data & Machine Learning
Spark, Hadoop, Redshift, NumPy/SciPy Stack, Pandas, spaCy, Gensim
PostgreSQL, Neo4j, MongoDB, SQLite, MySQL
Linux (primarily Debian and Ubuntu for servers, but also some Red Hat Enterprise Linux experience), OS X
Nginx, Apache, uWSGI, Gunicorn, Solr
Django, Flask, SQLAlchemy (Python), Scalaz/Cats (Scala)
Docker, Kubernetes, AWS, Google Cloud, Puppet, Vagrant, VirtualBox, Confluence, Jira, Git, Subversion, Sphinx, Pentaho
Open Source Projects
CNC Pattern Library
A Haskell-based library for creating SVG patterns for CNC routers.
An automated watering system for lawns and gardens. Written in Python and running on a custom-built valve control mechanism using a Raspberry Pi Zero W. The Pi is joined to your home wireless network, which allows you to control your watering system from anywhere.
An NCSA logfile parser written in Haskell. There is a blog post series (part 1 and part 2) that uses this code to teach people about Haskell and parser combinators.
A static blogging engine written in Haskell. My blog post on the subject provides more information.
A reusable application that provides for a generic section/subsection hierarchy in Django.