James Stevenson

Baltimore, MD

Experience

Staff Software Engineer, Nuna — San Francisco, CA 2018 – Present

I joined Nuna as a data engineer, working as a contractor on the Medicare Quality Payments Program, which processes billions of medical claims to derive quality and cost-effectiveness metrics for care provided. I worked on the implementation of individual cost and quality measures, as well as improvements to our platform and deployment processes. During that time I took over as tech lead for the project, where my responsibilities expanded to include architecture for new data processing systems, long-range planning for delivery of new measures, and coordination with project stakeholders including subject matter experts, data scientists, IT, and quality assurance personnel.

In January of 2019, I transferred to the commercial side of the business as a tech lead for a major new commercial project the company was launching. My responsibilities included architecting the project's data-processing pipeline, leading development of data processing components, and coordinating with data ingestion and web development teams that were also working on the product.

Lead Data Engineer, O'Reilly Media — Boston, MA 2014 – 2018

I joined the Data Engineering team as its first dedicated engineer, tasked with architecting and building the ETL pipeline for data from the Safari Books Online platform and associated services. I built a scalable, stable, and correct data warehouse to inform business and product decisions, and I designed and implemented RESTful APIs for feeding analytic data back into the Safari product.

I created and deployed an improved recommender system for the product to provide more relevant recommendations to users, increasing user engagement with recommendations and improving content discovery. I led successive iterations of the product to improve recommendation quality and diversity using ensemble recommendation methods.

I designed, helped to build, and deployed a platform on Google Cloud for performing large-scale NLP and ML experiments on our content. The full text of our corpus ran to over two billion words, with new works constantly being added, so the platform needed to be performant and scalable, and it needed to make it easy for data scientists to bootstrap work on the full-text corpus, both to understand our content more deeply and to drive new features with that knowledge. The platform was used for classification tasks, topic modeling, and document semantic similarity calculations.

Lead Software Engineer, Cobrain — Bethesda, MD 2013 – May 2014

Working on the Data Science team, my responsibilities began with Cobrain's data ingestion and transformation process. This included creating and scaling a distributed web crawler, using natural language processing to categorize products, and performing entity resolution of products across a wide range of data sources. Later I worked with Neo4j, a graph database, to model our product and user data and develop algorithms to provide recommendations to users.

Tech Lead, Threespot — Washington, DC 2009 – 2013

I led development efforts on a number of large website projects for clients including Knight Foundation, The MacArthur Foundation, The Chronicle of Higher Education, the National Academy of Sciences, and more. I worked primarily as a backend developer on Python/Django projects, but also wrote code in JavaScript, Perl, PHP, and ColdFusion and did front-end development work. I took the lead in building, documenting, and maintaining a best-of-breed Django practice within our organization. I also worked to standardize DevOps and deployment work and to implement Agile and Scrum development methodologies in projects across the company.

Education

Johns Hopkins University, Baltimore, MD — B.A. English Literature, 2005

Skills

Programming Languages, Etc.

Python, Scala, Haskell, JavaScript, Bash, Groovy, Java, Perl, HTML, CSS/SASS

Big Data & Machine Learning

Spark, Hadoop, Redshift, NumPy/SciPy Stack, pandas, spaCy, Gensim

Databases

PostgreSQL, Neo4j, MongoDB, SQLite, MySQL

Operating Systems

Linux (Debian and Ubuntu for servers primarily, but also some Red Hat Enterprise experience), OS X

Servers, Etc.

Nginx, Apache, uWSGI, Gunicorn, Solr

Frameworks/Libraries

Django, Flask, SQLAlchemy (Python), Scalaz/Cats (Scala)

Other Software

Docker, Kubernetes, AWS, Google Cloud, Puppet, Vagrant, VirtualBox, Confluence, Jira, Git, Subversion, Sphinx, Pentaho

Open Source Projects

CNC Pattern Library

A Haskell-based library for creating SVG patterns for CNC routers.

Robosquirt

An automated watering system for lawns and gardens. Written in Python and running on a custom-built valve control mechanism using a Raspberry Pi Zero W. The Pi is joined to your home wireless network, which allows you to control your watering system from anywhere.

ncsa-logparse

An NCSA logfile parser written in Haskell. There is a blog post series (part 1 and part 2) that uses this code to teach people about Haskell and parser combinators.

agiluf

A static blogging engine written in Haskell. My blog post on the subject provides more information.

django-scaffold

A reusable application that provides for a generic section/subsection hierarchy in Django.

django-redactoreditor

A project that provides integration with the Redactor JavaScript WYSIWYG editor in Django.