John Weaver's Blog

Software Engineer / Web App Developer

Fanonic Update

I’m an avid fan fiction reader, and one of my ongoing projects is Fanonic.net - a fan fiction hosting site that provides helpful features for readers to find the kind of stories they are looking for more easily. If you haven’t heard of the term or come across fan fiction before, here’s the Wikipedia definition from the fan fiction page:

Fan fiction (alternatively referred to as fanfiction, fanfic, FF, or fic) is a broadly-defined term for fan labor regarding stories about characters or settings written by fans of the original work, rather than by the original creator. Works of fan fiction are rarely commissioned or authorized by the original work’s owner, creator, or publisher; also, they are almost never professionally published. Because of this, many fan fictions written often contain a disclaimer stating that the creator of the work owns none of the characters. Fan fiction, therefore, is defined by being both related to its subject’s canonical fictional universe and simultaneously existing outside the canon of that universe.

Fanonic is free for both readers and authors. Its main features, which I developed last year as part of its initial launch, are:

  • the ability to search within the entire text of stories
  • story tagging by users
  • an activity stream that lets you see when your favorite authors add new stories or publish new chapters and when friends earn new achievement badges.

What’s Been Added

  • Avatars: Your profile automatically uses your Gravatar image if it’s set, and you can also upload an image as your Fanonic avatar.

  • Badges: A few new achievement badges have been added

  • Favorites: Stories can be favorited and show up on your list of favorites

  • Fanfiction.net import: Import your stories from fanfiction.net into your Fanonic story list

  • Lots of style improvements

What’s Coming Next

I’m using Haystack to provide story search. Fanonic is hosted on a single server and is using Whoosh to index the site’s story data, but my intention is to switch to another indexer/backend for Haystack to allow for future scalability. Elasticsearch looks like the best direction to go right now because it looks easier to set up and administer than Solr and should be easier to scale to multiple search index servers than Whoosh or Xapian.

The site will be getting immediate notifications for each user’s activity stream and for authors’ story import status updates. I’ve set up a new repository on GitHub for this subproject: django-tsuchi. I’m also looking into adding reader statistics for story authors, but I haven’t decided on specifics at this point. Support for importing stories from other fan fiction repositories will be added gradually as well.

I’m getting the word out about Fanonic, and getting in touch with some fanfic authors about bringing their stories to the site. I’m hoping to work with both fanfic creators and readers to build a better experience for everyone.

Technical Details

Django, nginx, uwsgi, redis, memcached, postgresql, celery

  • The new story importer uses Celery to import the content in a separate worker process so users can continue using the site while their content is loaded into Fanonic. I’m using redis to hold the queued tasks. Later on, I’m planning to use redis for some features involving gathering reader statistics as well.

  • Django is a web framework written in Python that provides a lot of the basic functionality that Fanonic builds upon. Django has a large and growing ecosystem of third-party pluggable apps. I’ve published one such app that is going to provide a new feature for the site, which I’m calling django-tsuchi.

  • nginx serves static content, such as Javascript and CSS files, and acts as a proxy to uwsgi.

  • uwsgi interfaces with the Fanonic Python code to handle incoming requests and return the app’s responses.

  • Fanonic’s database is PostgreSQL.

  • Fanonic uses memcached to cache content generated by the backend.

Puppet

Puppet is a tool that allows system administrators to define how the servers they administer are configured, which programs are installed, which services should be running, and what user accounts should be present on them. I’m using Puppet to manage both my local development system and the production server that Fanonic runs on. Because both systems have identical setups, I don’t have to worry about differences in the server setup breaking the site - what works locally is much more likely to work on the production server.

Fabric

  • I’m using Fabric to handle code deployment

  • Right now I’m using Fabric to push the Puppet manifests to the server and run puppet apply on them, rather than having a puppetmaster.

  • I’ve collected the common tasks for this kind of setup into a Python module of Fabric tasks.

Not Thirsty for Cool Projects at Saltbox

I’ve been working on the Saltbox team for about four months now, and I’ve had the privilege both to work with some very talented people and work on interesting projects, both on the frontend and the backend. Here I’m going to summarize what Saltbox is, a little bit about our talented founders and development team, and the projects I’ve been working on over these last few months.

Saltbox, our sales and learning toolbox, is a web application that makes it easy to communicate in-house knowledge among peers; collect news and articles from the web via RSS or Atom feeds; and disseminate knowledge, news, and quizzes from managers. Central to the application are channels, which act as continuous streams of small, easily digested content for learners. Each company or team that sets up a Saltbox site can create groups to organize their employees or team members, and these groups can then be given access to channels suitable for that group. Learners can add RSS and Atom feeds as channels, and these feeds will automatically update as new content becomes available. Every user gets their own channel where they can post their own content, and anyone else in the same group will see this peer channel as available for subscription. Saltbox sites are mobile ready so that teams on the go can get the company news and product information they need when they need it.

John Delano and Ali Shahrazad have both been really great as non-technical founders. They have brought to the table a solid vision of what Saltbox should give to its users. It’s clear that they both have real insight into what sales teams out there are dealing with in terms of existing learning management tools and the lack of good tooling for continuous, mobile, and social learning in companies’ sales forces today.

John and Ali put together our development team (Russell Duhon, Brian Gershon, and myself), and it does them credit. Russell has brought some great ideas and a strong academic background to the team. He rewrote almost all of our database queries, ferreting out the original requirements and reimplementing to reduce the number of queries and code duplication while also improving our database performance. More recently, he has been leading our efforts in adding caching to the backend to reduce database trips. Brian rebuilt the entire Saltbox mobile web app using jQuery Mobile (it’s quite nice) and has played a key part in creating unit tests for, and then refactoring, our Javascript code (which is the majority of our codebase).

Both Russell and Brian have made working on this team a great experience by having strong opinions, identifying and communicating opportunities and pitfalls in code reviews, and being willing to listen to and learn from criticism and incorporate it in their work.

I’ve had a chance to work on several things at Saltbox that I am quite proud of and have learned some interesting things from. One key piece of functionality in Saltbox is the ability to add RSS or Atom feeds from the web.

Initially we fetched the feeds as needed, processed them, and sent them on to the user’s browser as needed - essentially proxying the RSS feed into our app. This had a few serious downsides given our architecture: entries in the feeds weren’t favoritable or searchable, we couldn’t easily keep track of read entries, and proxying the data would, as the number of users on the app increased, cause major performance issues and likely get the IP addresses of our servers banned from major sites that were subscribed to frequently. A different approach was needed.

Our key requirements were that feed content would be stored locally for searchability and favoriting; feed processing would occur in the background to keep our web servers from being preoccupied with long-running feed processing tasks; feeds would be refreshed depending on how frequently new content is available, so that we can keep our fetching to a minimum while also getting the latest content for our users; and Javascript and other undesirable markup would be removed from the feed entries before display to the user. The end result was an integration of three pieces: our Python web backend, which determines which feeds should be reloaded at a given time; a cronjob that polls our web backend for the feeds and inserts tasks to fetch the feeds into Amazon’s Simple Queue Service (SQS); and a NodeJS/CoffeeScript service that polls SQS for feed tasks, processes their content, and posts the new content back to our web backend.
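To make the "refresh depending on how frequently new content is available" requirement a bit more concrete, here is a minimal Python sketch of one way a backend could decide which feeds are due. It is not the Saltbox code: the averaging rule, the `MIN_INTERVAL` and `MAX_INTERVAL` bounds, the function names, and the example URLs are all assumptions made for illustration.

```python
from datetime import datetime, timedelta

MIN_INTERVAL = timedelta(minutes=15)  # assumed lower bound between fetches
MAX_INTERVAL = timedelta(hours=24)    # assumed upper bound between fetches


def refresh_interval(entry_times):
    """Estimate how long to wait before refetching a feed.

    Uses the average gap between known entries, clamped to the bounds above.
    """
    if len(entry_times) < 2:
        return MAX_INTERVAL
    ordered = sorted(entry_times)
    average_gap = (ordered[-1] - ordered[0]) / (len(ordered) - 1)
    return max(MIN_INTERVAL, min(MAX_INTERVAL, average_gap))


def feeds_due(feeds, now):
    """Return the URLs of feeds whose refresh interval has elapsed."""
    return [
        url
        for url, (last_fetched, entry_times) in feeds.items()
        if now - last_fetched >= refresh_interval(entry_times)
    ]


# Hypothetical data: both feeds were last fetched an hour ago.
now = datetime(2012, 1, 1, 12, 0)
feeds = {
    "http://example.com/busy.xml": (
        now - timedelta(hours=1),
        [now - timedelta(minutes=m) for m in (90, 60, 30)],
    ),
    "http://example.com/quiet.xml": (
        now - timedelta(hours=1),
        [now - timedelta(days=d) for d in (3, 2, 1)],
    ),
}
due = feeds_due(feeds, now)
```

With this rule, a feed that posts every half hour is refetched far more often than one that posts daily, which keeps fetching to a minimum for quiet feeds.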

Another project that spun off from the feed processing service was our thumbnailing system. We find and extract a suitable image from each entry in the feed content and generate a thumbnail to display in our app. We ran into some difficulties performing thumbnailing reliably as part of the feed processing task on NodeJS. This was because thumbnailing took significantly more time than processing the feed itself, and handling both tasks separately while making sure that thumbnails still got connected to the correct feed entry proved very difficult to implement. I built a scalable thumbnail generation service in Java using Amazon’s Simple Workflow Service and Flow Framework (which is part of the Java SDK).

This service takes one or more URLs from the feed processing system via HTTP POST to a round-robin DNS entry, creates a loading placeholder for the thumbnail, fetches an image from each URL, filters these images based on a “thumbnail specification”, scales the selected image, and stores the final thumbnail in S3 (another Amazon service). Amazon’s workflow service allows the system to scale to an extent that I doubt we’ll reach. The service is currently fetching, analyzing, resizing, and storing about 12,000 thumbnail images every 24 hours, on two servers that could likely handle several times that amount. It’s been pretty fun to build this system and see it actually work. Very exciting!
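The filter and scale steps can be sketched in a few lines. The real service is written in Java; this is a Python illustration only, and the fields of `ThumbnailSpec`, the "largest eligible image wins" rule, and the function names are assumptions rather than the actual thumbnail specification.

```python
from dataclasses import dataclass


@dataclass
class ThumbnailSpec:
    # Hypothetical specification fields, all in pixels.
    min_width: int
    min_height: int
    target_width: int
    target_height: int


def select_image(sizes, spec):
    """Pick the largest candidate (by area) that meets the minimum size, or None."""
    eligible = [
        (w, h) for (w, h) in sizes
        if w >= spec.min_width and h >= spec.min_height
    ]
    if not eligible:
        return None
    return max(eligible, key=lambda size: size[0] * size[1])


def scaled_size(size, spec):
    """Scale to fit inside the target box while preserving the aspect ratio."""
    w, h = size
    ratio = min(spec.target_width / w, spec.target_height / h)
    return max(1, round(w * ratio)), max(1, round(h * ratio))


spec = ThumbnailSpec(min_width=100, min_height=100,
                     target_width=200, target_height=150)
candidates = [(50, 50), (800, 600), (400, 400)]  # (width, height) of fetched images
chosen = select_image(candidates, spec)
thumbnail_size = scaled_size(chosen, spec)
```

Keeping selection and scaling as separate pure functions means each step can be retried or tested on its own, which suits a workflow-style service.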

This post has gotten quite long, so I will have to dig into more depth about some of these things in future posts. I’ve had a great time as part of the Saltbox team and I am really thankful for the opportunity it’s been so far.

Recent Projects

I have found myself writing about my most recent projects (as of October 2011) and I thought I’d put all of this in one place. Here it goes:

I have about six years of experience doing web application development, and have been programming since 1996 (first as a hobbyist, then professionally), primarily in C++, Python, and Javascript.

I’ve been developing projects using Django since the 1.0 release (summer of 2008). My most recent Django projects:

  • A phone number verification system using Twilio, which generates and speaks a PIN to the end user over the phone. It was developed with Django 1.3 using the django_twilio app.
  • A data importing system, which matched existing records fetched via an external API, allowed the user to manually select the best match, edit the generated record, and review records to be imported. This project was also developed using Django 1.3 and jQuery for a simple auto-completed search function. I also used Celery (and the django-celery app) to deal with the task of processing the uploaded data in the background.
  • I am currently between iterations on http://mathisasport.com/, which is also developed with Django and jQuery. Most recently, I have developed several statistics gathering methods for the model managers in the app, using custom SQL (this site is using MySQL) as needed to allow the records returned in QuerySets to be sortable by the results.
  • A series of HTTP-based web services for an unreleased web service API. The API calls return results in JSON and were developed using Django’s class-based views.
  • My most recent personal Django project is at http://fanonic.net/. It’s still in development, but somewhat usable at this stage, although there isn’t much of any content there yet.

I also have lots of examples of my work open sourced at http://github.com/saebyn/. I’ve recently pushed out some updates for the one Django project I have there, django-classifieds, in the django-1.3 branch.

The largest example of my HTML, CSS, and Javascript skills is at http://familysnap.saebyn.info (the original site was taken offline several years ago). I was responsible for about 60% of the front-end for that site, and a considerable amount of the backend as well (which is PHP, so I won’t go into more detail about that).

You can find more about me on my About page and in my online resume.

Update (July 27th, 2012): My mirror of FamilySnap is no longer online.

So No More Flying Players in My New Game

I finally realized I could simply track when the player hits a horizontal surface and set an ‘onGround’ flag, and then turn that off when the player jumps. Then each jump attempt checks the flag; if it’s not set, it won’t let you jump. Obviously, I tried a much more complicated method first. So no more flying players in my new game! (At least unless I add a jet pack or something.)
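Here is a minimal Python sketch of that flag logic. The game itself isn’t shown here, so the `Player` class, the `JUMP_SPEED` value, and the method names are made up for illustration; `on_ground` plays the role of the ‘onGround’ flag.

```python
class Player:
    """Hypothetical player with a vertical velocity and an on_ground flag."""

    JUMP_SPEED = 10.0  # assumed value, for illustration only

    def __init__(self):
        self.on_ground = False
        self.vy = 0.0

    def land(self):
        # Call this when collision detection reports a horizontal surface.
        self.on_ground = True
        self.vy = 0.0

    def try_jump(self):
        # A jump is only allowed while the flag is set; jumping clears it.
        if not self.on_ground:
            return False
        self.on_ground = False
        self.vy = self.JUMP_SPEED
        return True


player = Player()
player.land()
first = player.try_jump()   # allowed: the player was on the ground
second = player.try_jump()  # refused: the player is already airborne
```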

View Parameters in Reusable Django Apps

I’ve read through Eric Holscher’s blog post about Reusable App Conventions in Django a few times, and it got me thinking about problems I have had in the past with other developers’ apps.

When you write your views for a Django application you intend to release, unless every view has exactly the same keyword arguments, please make sure to add a **kwargs parameter to your views. This lets your users pass arguments to your views with the include() in their URLconf, without getting errors from the views that don’t have that parameter listed. On my recent fan-fiction web site, I had to work around this problem, and it’s really annoying to have to make a local copy of an app and edit it when the fix would be trivial for the reusable app’s creator.
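The failure is easy to reproduce without Django at all. In the plain-Python sketch below, the view names and the `extra` dictionary are hypothetical; `extra` stands in for the dictionary of extra options a site owner attaches to an include(), which Django passes to every view in the included URLconf.

```python
# Plain-Python stand-in for URLconf dispatch; no Django import needed.

def story_list(request, **kwargs):
    # Accepts and ignores extra keyword arguments it does not use.
    return "story list"


def story_detail(request, story_id, template_name="detail.html", **kwargs):
    return "story %s rendered with %s" % (story_id, template_name)


def strict_view(request):
    # No **kwargs: any extra keyword argument raises TypeError.
    return "strict"


# Extra arguments a site owner might attach to a whole include().
extra = {"template_name": "custom.html"}

list_result = story_list("request", **extra)
detail_result = story_detail("request", story_id=7, **extra)

try:
    strict_view("request", **extra)
    strict_failed = False
except TypeError:
    strict_failed = True
```

Only the view without **kwargs breaks, and the app’s user can’t fix that without editing the app.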

Nebulous Projects

I was recently asked how I deal with nebulous or otherwise confused projects. Here’s what I told them:

Ill-defined projects have been par for the course during my freelance software development career. If the client had no experience as a programmer, and had not already hired someone to refine their project requirements, they frequently came to me with a proposal too vague to transform into a finished product. The key to refining and focusing a project’s goals is to focus on the client’s original thought process, and possibly the sequence of events which caused them to seek my expertise.

I start by determining how to shape the project so that it is computationally and financially practical, and addresses the client’s most important business needs. Then, I break down the problems into smaller parts, and prioritize individual items with respect to both business need and risk (how hard will this actually be to implement, will that even be possible, etc). This strategy eventually results in requirements that are clear enough for me to feel comfortable coding a solution.

Django-classified-ads

Announcing the release of django-classified-ads, which is being hosted by Google Code. For everyone who has e-mailed me about this project… here it is! All of the templates and static media (CSS, images, jQuery, and TinyMCE) from django-classifieds.zxdevelopment.com have been included in the source code repository. I’ve replaced some of the older custom code with existing Django applications like django-profiles, django-registration, and django-paypal.

Update: This code is now available on GitHub.

Django-Classifieds

Back in July, my company was contracted to work on a new classified ad site related to the golf industry. A few weeks prior to this we had decided to take the plunge and start using Django on projects whenever possible. After a few emails back-and-forth about this project, I knew that Django would make a good base for building it.

The key feature requested for the original site was to be able to support several categories of postings, each with their own distinct fields. I used a ‘Field’ model to represent the different field types available for each ad category. A ‘FieldValue’ model was used to store the value of a specific field, and associate it with the ‘Field’ and ‘Ad’ via foreign keys.
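The relationships between these models can be sketched with plain Python dataclasses. This is not the django-classifieds source: `Category`, the `field_type` label, and the `set_value` and `as_dict` helpers are assumptions added for illustration, and the object references stand in for the foreign keys.

```python
from dataclasses import dataclass, field as dc_field
from typing import List


@dataclass
class Category:
    name: str


@dataclass
class Field:
    category: Category  # each category defines its own fields
    name: str
    field_type: str     # assumed: a simple type label such as "text"


@dataclass
class Ad:
    category: Category
    title: str
    # Excluded from repr/compare because FieldValue points back at the Ad.
    values: List["FieldValue"] = dc_field(
        default_factory=list, repr=False, compare=False
    )


@dataclass
class FieldValue:
    field: Field  # stands in for the foreign key to Field
    ad: Ad        # stands in for the foreign key to Ad
    value: str


def set_value(ad, field, value):
    # Only fields defined for the ad's own category may be filled in.
    if field.category is not ad.category:
        raise ValueError("field does not belong to this ad's category")
    field_value = FieldValue(field=field, ad=ad, value=value)
    ad.values.append(field_value)
    return field_value


def as_dict(ad):
    """Collect an ad's field values into a name -> value mapping."""
    return {fv.field.name: fv.value for fv in ad.values}


clubs = Category("Golf clubs")
shaft = Field(clubs, "shaft_material", "text")
ad = Ad(clubs, "Used driver")
set_value(ad, shaft, "graphite")
summary = as_dict(ad)
```

Storing one row per field value lets each category have its own set of fields without a schema change, at the cost of extra joins when reading an ad back.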

Additional features include searching, attaching images to ads, PayPal checkout, configurable pricing options, and custom templates for listing, viewing, and editing each category of ad posting.

You can see a demo version of this software here.

I can release this as an open source application, so that others can use it to build their own classified ad sites, but before I do I need to clean up the code a bit more and get more documentation in place.

Edit: Please see the follow up post.