After five months of working at Saltbox in early 2012, funds ran out. I seriously considered taking a gamble with a work-for-equity offer, but I didn’t see at the time how I could make ends meet while doing so. I left.

I liked the team I worked with in those few months, as you could probably tell from my past blog posts. Ali and I occasionally kept in touch over IM, and that’s how the Tin Khan project came about.

By October, Saltbox had completed its pivot to working on Wax, its learning record store. The old “SaLTBOX” app was shut down, and Wax LRS was now Saltbox’s primary product.

Ali and I had already talked about the Tin Can API (now the Experience API), and it had piqued my interest. A week before Halloween, since I had some spare time, I floated the idea of doing a little demo project: pulling student data from Khan Academy’s API and putting it into an LRS.

He said:

A vague demo that can connect to Wax would be enough to blow their minds. Better yet, if you could get it done by Wednesday… we can get it to DevLearn for you in Vegas! :)

This was Wednesday, the week before DevLearn. So I slapped together a Django (a web framework for Python) app that let you add a list of students and an LRS endpoint, then used Celery (a distributed task queue) to send each student an email asking them to click a link. That link took them back into Tin Khan, pushed them through the OAuth process for the Khan Academy API, and then pulled their activities (videos watched, badges earned, and exercises done) into a local database. Newly discovered activities were then sent to the configured LRS as Tin Can statements over HTTP.
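The last step, turning a locally stored activity into a statement and sending it to the LRS, can be sketched roughly as below. This is not the original Tin Khan code: the activity dict keys (`kind`, `url`, `title`, `timestamp`), the verb mapping, and the placeholder "earned" verb IRI are all assumptions for illustration, and the version header is the xAPI 1.0 one, which postdates the 2012 demo.

```python
import base64
import json
import urllib.request

# Hypothetical mapping from Khan Academy activity kinds to (verb IRI, display name).
# The ADL "experienced" and "completed" verbs exist; the "earned" IRI is a placeholder.
VERBS = {
    "video": ("http://adlnet.gov/expapi/verbs/experienced", "watched"),
    "exercise": ("http://adlnet.gov/expapi/verbs/completed", "completed"),
    "badge": ("https://example.com/verbs/earned", "earned"),
}


def build_statement(email, name, activity):
    """Turn one activity dict (assumed keys: kind, url, title, timestamp) into a statement."""
    verb_id, verb_display = VERBS[activity["kind"]]
    return {
        "actor": {"objectType": "Agent", "name": name, "mbox": "mailto:" + email},
        "verb": {"id": verb_id, "display": {"en-US": verb_display}},
        "object": {
            "objectType": "Activity",
            "id": activity["url"],
            "definition": {"name": {"en-US": activity["title"]}},
        },
        "timestamp": activity["timestamp"],
    }


def build_request(endpoint, statements, username, password):
    """Prepare, but do not send, a POST of a batch of statements to an LRS."""
    body = json.dumps(statements).encode("utf-8")
    credentials = f"{username}:{password}".encode("utf-8")
    token = base64.b64encode(credentials).decode("ascii")
    return urllib.request.Request(
        endpoint.rstrip("/") + "/statements",
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": "Basic " + token,
            "X-Experience-API-Version": "1.0.0",  # xAPI 1.0; Tin Can 0.95 predates this
        },
    )
```

Passing the returned request to `urllib.request.urlopen` would actually send it. Keeping request construction separate from sending makes the statement format easy to check without a live LRS.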

One week. The final result was something that worked as a demo, but would need some actual architecture for real world use. Pretty much what people mean when they talk about an “MVP” - a minimum viable product.

Unfortunately, the Khan Academy API doesn’t provide a way to request just the information about a user’s newest activities (i.e. what’s changed since the last time we looked). That means that Tin Khan had to fetch all of the activities every time and store them locally, so that it could compare with what had already been sent to the LRS previously and send the difference. That works for a demo, but not for real use.
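The “fetch everything, send the difference” approach boils down to a set-membership check against what was already sent. A minimal sketch, assuming each activity can be identified by a hypothetical `(kind, url, timestamp)` key; the real API’s identifiers may differ:

```python
def activity_key(activity):
    """Hypothetical identity for an activity: what it was, what it was about, and when."""
    return (activity["kind"], activity["url"], activity["timestamp"])


def unsent_activities(fetched, sent_keys):
    """Return fetched activities whose keys are not in sent_keys, preserving order."""
    seen = set(sent_keys)
    new = []
    for activity in fetched:
        key = activity_key(activity)
        if key not in seen:
            new.append(activity)
            seen.add(key)  # also drop duplicates within a single fetch
    return new
```

The cost is the problem: every sync still downloads the full history, and the set of sent keys grows without bound.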

It now (four years later) looks like the Khan Academy API supports constraining the date range when fetching a user’s watched videos. If I were to build Tin Khan today, that’s where I would start.

Four years ago, other projects took precedence, and this project was shelved. A few days ago, as I was going through a list of all the projects I’ve worked on in the past few years, I thought of this project. I still had the source sitting in an archive on Dropbox, so I decided I’d put it up on GitHub for future reference (or laughs ;).

Saltbox has occupied a large part of my life for the past five years. I’ve passed on other opportunities to focus on this team and the product we were building, and I’ve learned a lot in the process.

Why Saltbox? The people, many of whom I’ve previously praised in Not Thirsty for Cool Projects at Saltbox, are the core of why I keep finding myself back there. They are focused, driven, and dedicated; but also friendly, supportive, and professional.

Saltbox has had many growing pains, as should be expected in a bootstrapped “startup”, including pivots and cash-flow issues. The company went through its major pivot in product and focus back in mid-2012. And while I still think back on that initial application I worked on, back when I wrote “Not Thirsty …”, as something with a lot of potential, it’s nothing in comparison to Wax LRS.

Wax LRS is a “web scale”[1] platform for receiving data about what learners are doing. It supports an open specification that lets anyone define statements[2] describing some activity. These statements include a core tuple of who, what, how, and when, along with associated metadata that can include scoring, categorization, and anything else, thanks to support for extensions to the schema.
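To make the statement structure concrete, here is a hedged sketch of an xAPI-style statement as a Python dict, plus a few basic structural checks. It is not a full validator for the specification and not how Wax LRS validates input; the activity IRI and the department extension key are made-up examples.

```python
REQUIRED = ("actor", "verb", "object")


def statement_problems(statement):
    """Return problems found by a few basic structural checks (not full spec validation)."""
    problems = [f"missing {key}" for key in REQUIRED if key not in statement]
    if "verb" in statement and "id" not in statement["verb"]:
        problems.append("verb has no id")
    scaled = statement.get("result", {}).get("score", {}).get("scaled")
    if scaled is not None and not -1 <= scaled <= 1:
        problems.append("result.score.scaled must be between -1 and 1")
    # Extension keys are IRIs, which is what lets anyone extend the schema safely.
    for key in statement.get("context", {}).get("extensions", {}):
        if not key.startswith(("http://", "https://")):
            problems.append(f"extension key {key!r} is not an IRI")
    return problems


example = {
    "actor": {"mbox": "mailto:learner@example.com"},                 # who
    "verb": {"id": "http://adlnet.gov/expapi/verbs/completed"},       # did
    "object": {"id": "https://example.com/activities/product-quiz"},  # what
    "timestamp": "2016-05-01T14:30:00Z",                              # when
    "result": {"score": {"scaled": 0.85}},                            # scoring
    "context": {"extensions": {"https://example.com/ext/department": "sales"}},
}
```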

Built on top of a data platform that receives and stores all of that learning data, Wax LRS provides “RESTful” interfaces for deriving useful insights into the whys of learning: Are your company’s salespeople struggling to absorb knowledge about new products? Which learning activities are the best indicators of later success on the job, and how does that differ across departments or teams?

The scope of what is possible with Wax LRS really appeals to me: the range of potential data sources for driving reports and visualizations, and the chance to make a large impact on the professional lives of an unbounded number of learners, guided by L&D leaders using the knowledge they gain from our product. This is the kind of thing that excites me and makes me want to keep coming back.

  1. That’s why we use PostgreSQL :P

  2. For more information, see the Experience API specification of statements at https://github.com/adlnet/xAPI-Spec/blob/master/xAPI-Data.md#statements

I’m an avid fan fiction reader, and one of my ongoing projects is Fanonic.net, a fan fiction hosting site with features that help readers find the kinds of stories they are looking for more easily. If you haven’t heard the term or come across fan fiction before, here’s the definition from Wikipedia’s fan fiction page:

Fan fiction (alternatively referred to as fanfiction, fanfic, FF, or fic) is a broadly-defined term for fan labor regarding stories about characters or settings written by fans of the original work, rather than by the original creator. Works of fan fiction are rarely commissioned or authorized by the original work’s owner, creator, or publisher; also, they are almost never professionally published. Because of this, many fan fictions written often contain a disclaimer stating that the creator of the work owns none of the characters. Fan fiction, therefore, is defined by being both related to its subject’s canonical fictional universe and simultaneously existing outside the canon of that universe.

Fanonic is free for both readers and authors. Its main features, which I developed last year as part of its initial launch, are:

  • the ability to search within the entire text of stories
  • story tagging by users
  • an activity stream that lets you see when your favorite authors add new stories or publish new chapters and when friends earn new achievement badges.

What’s Been Added

  • Avatars: Your profile automatically uses your Gravatar image if one is set, and you can also upload an image as your Fanonic avatar.

  • Badges: A few new achievement badges have been added

  • Favorites: Stories can be favorited and show up on your list of favorites

  • Fanfiction.net import: Import your stories from fanfiction.net into your Fanonic story list

  • Lots of style improvements
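The Gravatar fallback works because a Gravatar URL can be derived from nothing but an email address. A sketch using Gravatar’s long-documented scheme (MD5 of the trimmed, lower-cased address); the precedence rule in `avatar_url` (an uploaded image wins) and the `uploaded_avatar`/`email` profile keys are assumptions, not necessarily how Fanonic does it.

```python
import hashlib
from urllib.parse import urlencode


def gravatar_url(email, size=80, default="identicon"):
    """Build a Gravatar image URL from the MD5 of the normalized email address."""
    digest = hashlib.md5(email.strip().lower().encode("utf-8")).hexdigest()
    query = urlencode({"s": size, "d": default})  # s = pixel size, d = fallback image
    return f"https://www.gravatar.com/avatar/{digest}?{query}"


def avatar_url(profile):
    """Hypothetical precedence: an uploaded avatar first, otherwise Gravatar."""
    return profile.get("uploaded_avatar") or gravatar_url(profile["email"])
```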

What’s Coming Next

I’m using Haystack to provide story search. Fanonic is hosted on a single server and uses Whoosh to index the site’s story data, but I intend to switch to another indexer/backend for Haystack to allow for future scalability. Elasticsearch looks like the best direction right now: it looks easier to set up and administer than Solr, and it should be easier to scale out to multiple search index servers than Whoosh or Xapian.
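Because the search code talks to Haystack rather than to the indexer directly, the switch is mostly a settings change. A sketch of the relevant Django settings fragment, assuming django-haystack 2.x (older 1.x releases configured backends differently); the index path, URL, index name, and the `USE_ELASTICSEARCH` toggle are placeholders.

```python
# settings.py fragment (assumes django-haystack 2.x; values are placeholders).
USE_ELASTICSEARCH = True

WHOOSH_CONNECTION = {
    "ENGINE": "haystack.backends.whoosh_backend.WhooshEngine",
    "PATH": "/srv/fanonic/whoosh_index",  # on-disk index, tied to a single server
}

ELASTICSEARCH_CONNECTION = {
    "ENGINE": "haystack.backends.elasticsearch_backend.ElasticsearchSearchEngine",
    "URL": "http://127.0.0.1:9200/",  # could later point at a separate search cluster
    "INDEX_NAME": "fanonic",
}

HAYSTACK_CONNECTIONS = {
    "default": ELASTICSEARCH_CONNECTION if USE_ELASTICSEARCH else WHOOSH_CONNECTION,
}
```

After changing backends, the new index has to be populated, for example with Haystack’s `rebuild_index` management command.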

The site will be getting immediate notifications for updates to each user’s activity stream and for the status of authors’ story imports. I’ve set up a new repository on GitHub for this subproject: django-tsuchi. I’m also looking into adding reader statistics for story authors, but I haven’t decided on the specifics yet. Support for importing stories from other fan fiction repositories will be added gradually as well.

I’m getting the word out about Fanonic, and getting in touch with some fanfic authors about bringing their stories to the site. I’m hoping to work with both fanfic creators and readers to build a better experience for everyone.

Technical Details

Django, nginx, uwsgi, redis, memcached, postgresql, celery

  • The new story importer uses Celery to import content in a separate worker process, so users can continue using the site while their content is loaded into Fanonic. I’m using redis to hold the queued tasks, and I’m planning to use redis for some future features involving reader statistics as well.

  • Django is a web framework written in Python that provides a lot of the basic functionality that Fanonic builds upon. Django has a large and growing ecosystem of third-party pluggable apps. I’ve published one such app, django-tsuchi, which is going to provide a new feature for the site.

  • nginx serves static content, such as Javascript and CSS files, and acts as a proxy to uwsgi.

  • uwsgi interfaces with the Fanonic Python code to handle incoming requests and return the app’s responses.

  • Fanonic’s database is PostgreSQL.

  • Fanonic uses memcached to cache content generated by the backend.

Puppet

Puppet is a tool that allows system administrators to define how the servers they administer are configured: which programs are installed, which services should be running, and what user accounts should be present on them. I’m using Puppet to manage both my local development system and the production server that Fanonic runs on. Because both systems have identical setups, I don’t have to worry about differences in the server setup breaking the site: what works locally is much more likely to work on the production server.

Fabric

  • I’m using Fabric to handle code deployment

  • Right now I’m using Fabric to push the Puppet manifests to the server and run puppet apply on them, rather than having a puppetmaster.

  • I’ve collected the common tasks for this kind of setup into a Python module of Fabric tasks.

I’ve been working on the Saltbox team for about four months now, and I’ve had the privilege both to work with some very talented people and work on interesting projects, both on the frontend and the backend. Here I’m going to summarize what Saltbox is, a little bit about our talented founders and development team, and the projects I’ve been working on over these last few months.

Saltbox, our sales and learning toolbox, is a web application that makes it easy to communicate in-house knowledge among peers; collect news and articles from the web via RSS or Atom feeds; and disseminate knowledge, news, and quizzes from managers. Central to the application are channels, which act as continuous streams of small, easily digested content for learners. Each company or team that sets up a Saltbox site can create groups to organize their employees or team members, and these groups can then be given access to channels suitable for that group. Learners can add RSS and Atom feeds as channels, and these feeds will automatically update as new content becomes available. Every user gets their own channel where they can post their own content, and anyone else in the same group will see this peer channel as available for subscription. Saltbox sites are mobile ready, so teams on the go can get the company news and product information they need when they need it.
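The access rules in that description can be sketched as a small data model. This is an illustration only, not Saltbox’s actual schema: the classes and fields are hypothetical, and it assumes a user’s own peer channel is not listed among the channels offered for subscription.

```python
from dataclasses import dataclass, field


@dataclass
class User:
    name: str
    groups: set = field(default_factory=set)


@dataclass
class Channel:
    name: str
    owner: "User | None" = None               # set only for a user's peer channel
    groups: set = field(default_factory=set)  # groups explicitly granted access


def available_channels(user, channels):
    """Channels a user may subscribe to: group-granted ones plus same-group peers' channels."""
    result = []
    for channel in channels:
        granted = bool(channel.groups & user.groups)
        peer = (
            channel.owner is not None
            and channel.owner is not user
            and bool(channel.owner.groups & user.groups)
        )
        if granted or peer:
            result.append(channel)
    return result
```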

John Delano and Ali Shahrazad have both been really great as non-technical founders. They have brought to the table a solid vision of what Saltbox should give to its users. It’s clear that they both have real insight into what sales teams out there are dealing with in terms of existing learning management tools and the lack of good tooling for continuous, mobile, and social learning in companies’ sales forces today.

John and Ali put together our development team (Russell Duhon, Brian Gershon, and myself), and it does them credit. Russell has brought some great ideas and a strong academic background to the team. He rewrote almost all of our database queries, ferreting out the original requirements and reimplementing them to reduce the number of queries and the amount of duplicated code while also improving our database performance. More recently, he has been leading our efforts to add caching to the backend to reduce database trips. Brian rebuilt the entire Saltbox mobile web app using jQuery Mobile (it’s quite nice) and has been a key part of writing unit tests for, and then refactoring, our Javascript code (which is the majority of our codebase).

Both Russell and Brian have made working on this team a great experience by having strong opinions, identifying and communicating opportunities and pitfalls in code reviews, and being willing to listen to and learn from criticism and incorporate it into their work.

I’ve had a chance to work on several things at Saltbox that I am quite proud of and have learned some interesting things from. One key piece of functionality in Saltbox is the ability to add RSS or Atom feeds from the web.

Initially we fetched the feeds on demand, processed them, and sent them on to the user’s browser, essentially proxying the RSS feed into our app. This had a few serious downsides given our architecture: entries in the feeds weren’t favoritable or searchable, we couldn’t easily keep track of read entries, and, as the number of users on the app increased, proxying the data would cause major performance issues and likely get our servers’ IP addresses banned from popular sites whose feeds we fetched frequently. A different approach was needed.

Our key requirements were that feed content would be stored locally for searchability and favoriting; feed processing would occur in the background, so our web servers wouldn’t be tied up with long-running feed processing tasks; feeds would be refreshed according to how often new content appears, keeping our fetching to a minimum while still getting the latest content to our users; and Javascript and other undesirable markup would be removed from feed entries before they were displayed to the user. The end result integrates three pieces: our Python web backend, which determines which feeds should be reloaded at a given time; a cronjob that polls our web backend for those feeds and inserts tasks to fetch them into Amazon’s Simple Queue Service (SQS); and a NodeJS/CoffeeScript service that polls SQS for feed tasks, processes their content, and posts the new content back to our web backend.
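The “refresh depending on how often new content appears” requirement can be met with a simple adaptive interval. The sketch below is one plausible heuristic, not the one Saltbox used: the bounds, the ten-entry window, the halving factor, and the `last_fetched`/`interval` keys are all assumptions.

```python
from datetime import datetime, timedelta

# Assumed bounds; real values would be tuned for the app.
MIN_INTERVAL = timedelta(minutes=15)
MAX_INTERVAL = timedelta(hours=24)


def next_fetch_interval(entry_times, window=10):
    """Estimate how long to wait before fetching a feed again.

    Takes the average gap between the `window` newest entry publish times,
    halves it so we usually check before the next entry is expected, and
    clamps the result to [MIN_INTERVAL, MAX_INTERVAL].
    """
    times = sorted(entry_times)[-window:]
    if len(times) < 2:
        return MAX_INTERVAL  # not enough history: check rarely
    average_gap = (times[-1] - times[0]) / (len(times) - 1)
    return max(MIN_INTERVAL, min(MAX_INTERVAL, average_gap / 2))


def feeds_due(feeds, now):
    """Feeds (dicts with hypothetical 'last_fetched' and 'interval' keys) to enqueue now."""
    return [f for f in feeds if f["last_fetched"] + f["interval"] <= now]


hourly = [datetime(2011, 10, 1, hour) for hour in range(5)]
print(next_fetch_interval(hourly))  # 0:30:00
```

In an architecture like the one described, something like `feeds_due` would run in the web backend when the cronjob asks which feeds to enqueue.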

Another project that spun off from the feed processing service was our thumbnailing system. We find and extract a suitable image from each entry in the feed content and generate a thumbnail to display in our app. We had difficulty performing thumbnailing reliably as part of the feed processing task on NodeJS: thumbnailing took significantly more time than processing the feed itself, and handling the two tasks separately while making sure that thumbnails still got connected to the correct feed entry proved very difficult to implement. So I built a scalable thumbnail generation service in Java using Amazon’s Simple Workflow Service and the AWS Flow Framework (which is part of the AWS SDK for Java).

This service receives one or more URLs from the feed processing system via an HTTP POST to a round-robin DNS entry, creates a loading placeholder for the thumbnail, fetches an image from each URL, filters these images based on a “thumbnail specification”, scales the selected image, and stores the final thumbnail in S3 (another Amazon service). Amazon’s workflow service allows the system to scale to an extent that I doubt we’ll reach. The service is currently fetching, analyzing, resizing, and storing about 12,000 thumbnail images every 24 hours, on two servers that could likely handle several times that amount. It’s been pretty fun to build this system and see it actually work. Very exciting!
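The filter-and-scale step can be illustrated independently of the Java/SWF plumbing. This Python sketch is hypothetical throughout: the fields of `ThumbnailSpec`, the “largest usable image wins” rule, and the no-upscaling choice are assumptions, not the actual specification the service used.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ThumbnailSpec:
    """Hypothetical stand-in for the 'thumbnail specification'."""
    min_width: int = 100
    min_height: int = 100
    max_aspect: float = 3.0  # rejects banners and spacers that are very wide or tall
    target_width: int = 200
    target_height: int = 200


def pick_image(candidates, spec):
    """Pick the largest-area candidate (url, width, height) that passes the spec."""
    usable = [
        (url, w, h)
        for url, w, h in candidates
        if w >= spec.min_width
        and h >= spec.min_height
        and max(w, h) / min(w, h) <= spec.max_aspect
    ]
    return max(usable, key=lambda c: c[1] * c[2], default=None)


def scaled_size(width, height, spec):
    """Fit inside the target box, preserving aspect ratio and never upscaling."""
    ratio = min(spec.target_width / width, spec.target_height / height, 1.0)
    return max(1, round(width * ratio)), max(1, round(height * ratio))
```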

This post has gotten quite long, so I will have to dig into some of these things in more depth in future posts. I’ve had a great time as part of the Saltbox team, and I am really thankful for the opportunity it’s been so far.

I have found myself writing about my most recent projects (as of October 2011), so I thought I’d put all of this in one place. Here goes:

I have about six years of experience doing web application development, and have been programming since 1996 (first as a hobbyist, then professionally), primarily in C++, Python, and Javascript.

I’ve been developing projects using Django since the 1.0 release (summer of 2008). My most recent Django projects:

  • A phone number verification system using Twilio, which generates and speaks a PIN to the end user over the phone. It was developed with Django 1.3 using the django_twilio app.
  • A data importing system, which matched existing records fetched via an external API, allowed the user to manually select the best match, edit the generated record, and review records to be imported. This project was also developed using Django 1.3 and jQuery for a simple auto-completed search function. I also used Celery (and the django-celery app) to deal with the task of processing the uploaded data in the background.
  • I am currently between iterations on http://mathisasport.com/, which is also developed with Django and jQuery. Most recently, I have developed several statistics gathering methods for the model managers in the app, using custom SQL (this site is using MySQL) as needed to allow the records returned in QuerySets to be sortable by the results.
  • A series of HTTP-based web services for an unreleased web service API. The API calls return results in JSON and were developed using Django’s class-based views.
  • My most recent personal Django project is at http://fanonic.net/. It’s still in development, but somewhat usable at this stage, although there isn’t much content there yet.
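For the phone verification project, the core idea (generate a PIN, have Twilio speak it, compare what the user types) can be sketched without the django_twilio helpers. This hand-builds TwiML with the standard library and does not reflect that project’s actual code; the function names, the four-digit length, and the comma-separated digits (meant to encourage pauses when spoken) are assumptions.

```python
import secrets
import xml.etree.ElementTree as ET


def generate_pin(length=4):
    """Random numeric PIN from a cryptographically secure source."""
    return "".join(secrets.choice("0123456789") for _ in range(length))


def pin_twiml(pin):
    """TwiML asking Twilio to read the PIN digit by digit, twice."""
    spoken = ", ".join(pin)  # separate digits so they are read individually
    response = ET.Element("Response")
    say = ET.SubElement(response, "Say")
    say.text = f"Your verification code is {spoken}. Again, {spoken}."
    return ET.tostring(response, encoding="unicode")


def pin_matches(expected, entered):
    """Constant-time comparison of the stored PIN and what the user typed."""
    return secrets.compare_digest(expected, entered.strip())
```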

I also have lots of examples of my work open sourced at http://github.com/saebyn/. I’ve recently pushed out some updates for the one Django project I have there, django-classifieds, in the django-1.3 branch.

The largest example of my HTML, CSS, and Javascript skills is at http://familysnap.saebyn.info (the original site was taken offline several years ago). I was responsible for about 60% of the front-end for that site, and a considerable amount of the backend as well (which is PHP, so I won’t go into more detail about that).

You can find more about me on my About page and in my online resume.

Update (July 27th, 2012): My mirror of FamilySnap is no longer online.