# Friday, March 25, 2011


In Part I we looked at how I first got into Agile and Scrum. Last week in Part II, we explored how Scrum failed to be flexible enough to fit into my unique process. Today we will take a look at how I got introduced to Kanban.

The start-up I described in Part II, where I worked a few years ago, successfully used Scrum for traditional software development. However, when we were faced with a pretty unique development requirement, Scrum failed us. To refresh your memory from Part II, we had to spider thousands of web sites. A business user entered each web site with a specification and prioritized it. The developers worked on the highest-priority item in the queue and published the RegEx patterns to a test server, where an overnight process picked them up. A tester verified the results the next day, and the patterns were then put into production. In Agile terms, we had a big backlog, each item in the backlog was about the same “size,” and the customer (the business users) prioritized the items in the backlog.

As I said in Part II, Scrum did not work. For starters, we did not need a cross-functional team; our team had about 10 developers who did one thing: produce the RegEx patterns. There was no need for a daily scrum either. In addition, we did not have to time-box our development effort into a two-week or month-long cycle, because we delivered daily. We also prioritized daily, as new requirements came in and old sites stopped working.

All work was the same so we would assign a state to the work:

  • Web site identified for spidering
  • Site’s business analysis and rules complete
  • Site sitting in the developer queue
  • In progress
  • Done-need verify
  • Testing
  • Done-verified
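As a rough illustration, the states above can be modeled as a small state machine. This is a minimal Python sketch, not the system we actually ran: the names `State`, `ALLOWED`, and `advance` are hypothetical, and the move from Testing back to the developer queue is my assumption about how a failed verification would be recorded.

```python
from enum import Enum


class State(Enum):
    """Work item states, in the order listed above."""
    IDENTIFIED = "Web site identified for spidering"
    ANALYSIS_COMPLETE = "Site's business analysis and rules complete"
    DEV_QUEUE = "Site sitting in the developer queue"
    IN_PROGRESS = "In progress"
    DONE_NEED_VERIFY = "Done-need verify"
    TESTING = "Testing"
    DONE_VERIFIED = "Done-verified"


# Allowed moves between states. The forward path follows the list above;
# TESTING -> DEV_QUEUE is an assumed rework path for a failed verification.
ALLOWED = {
    State.IDENTIFIED: {State.ANALYSIS_COMPLETE},
    State.ANALYSIS_COMPLETE: {State.DEV_QUEUE},
    State.DEV_QUEUE: {State.IN_PROGRESS},
    State.IN_PROGRESS: {State.DONE_NEED_VERIFY},
    State.DONE_NEED_VERIFY: {State.TESTING},
    State.TESTING: {State.DONE_VERIFIED, State.DEV_QUEUE},
    State.DONE_VERIFIED: set(),
}


def advance(current: State, target: State) -> State:
    """Return the new state, or raise if the move is not allowed."""
    if target not in ALLOWED[current]:
        raise ValueError(f"cannot move from {current.name} to {target.name}")
    return target
```

For example, `advance(State.DEV_QUEUE, State.IN_PROGRESS)` succeeds, while trying to jump straight from `IDENTIFIED` to `TESTING` raises a `ValueError`.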

We developed a classic “pull” system. The developers pulled a single work item (as opposed to a batch of work) out of a queue (the backlog) and, when they were done, put it into the test queue. Once testing was complete, the item was scheduled to go into production.
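The pull mechanics are simple enough to sketch in a few lines of Python. Treat this as an illustration under assumptions: the function names and the use of `collections.deque` are mine, and the backlog is assumed to be already sorted with the highest priority first.

```python
from collections import deque
from typing import Optional


def pull_next(backlog: deque) -> Optional[str]:
    """A developer pulls exactly one item, the one at the front of the backlog."""
    return backlog.popleft() if backlog else None


def finish(item: str, test_queue: deque) -> None:
    """A finished item is handed to the testers by appending it to the test queue."""
    test_queue.append(item)


# Hypothetical site names, highest priority first.
backlog = deque(["site-a", "site-b", "site-c"])
test_queue = deque()

item = pull_next(backlog)      # the developer takes "site-a" and nothing else
finish(item, test_queue)       # when done, it moves on to testing
```

The point of the design is that nobody pushes a batch of work at a developer. Work moves only when someone downstream has the capacity to take it.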

We also developed different classes of service for each work item. At first, 90% of the sites in our queue were listed as “High” priority, and then the business asked me if we could have a “Very High” priority status. I said no, since I knew it would eventually be abused the same way the high priority status had been. We then made the process simple: no more than 10 highs in the system at a time (we only had 10 developers, after all), and they were expected to be in testing within 24 hours. Mediums would be assumed to be done in 48 hours, and lows had no guarantee: we’ll get to them when we get to them. We let the business reprioritize (and change status) daily.
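These classes of service boil down to a small table plus one limit. The Python sketch below is only an illustration: the dictionary layout and the names `SLA_HOURS`, `MAX_HIGH`, `can_add_high`, and `pull_order` are hypothetical.

```python
from typing import Dict, List

# Hours until an item is expected to be in testing; None means no guarantee.
SLA_HOURS = {"high": 24, "medium": 48, "low": None}

# No more than 10 highs in the system at a time (one per developer).
MAX_HIGH = 10

# Lower rank is pulled first.
RANK = {"high": 0, "medium": 1, "low": 2}


def can_add_high(items: List[Dict[str, str]]) -> bool:
    """True if another item may be marked high without breaking the limit."""
    highs = sum(1 for item in items if item["priority"] == "high")
    return highs < MAX_HIGH


def pull_order(items: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Sort by class of service; sorted() is stable, so ties keep queue order."""
    return sorted(items, key=lambda item: RANK[item["priority"]])
```

With a hard cap on highs, promoting an eleventh site means demoting another one first, which is exactly the conversation we wanted the business to have.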

After a few months we made another change. Each developer could do on average two sites a day, so our throughput was 20 RegEx patterns a day, or 100 a week, for a team of 10. At first the business would have 200+ sites in the queue and each morning promote 10 mediums to high (or 8 mediums to high if 2 new highs came in as new items). After constant daily reprioritizing, we decided to limit the number of items in the queue/backlog to only 2 days’ worth of work, since our guarantee was 48 hours for mediums and 24 hours for highs. Now new high priority sites were added only as needed, and every two days the business would add 40 new sites into the queue. (In reality, the team never hit 100% on the dot, so sometimes we would add only 32 items since 8 were still left over, etc.)
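The arithmetic behind the queue limit is worth spelling out. In this small Python sketch the constant and function names are hypothetical, and the five-day work week is an assumption implied by the 100-a-week figure.

```python
DEVELOPERS = 10
SITES_PER_DEV_PER_DAY = 2
WORKDAYS_PER_WEEK = 5   # assumption: implied by 100 patterns a week
QUEUE_DAYS = 2          # the queue holds only two days' worth of work

daily_throughput = DEVELOPERS * SITES_PER_DEV_PER_DAY      # 20 sites a day
weekly_throughput = daily_throughput * WORKDAYS_PER_WEEK   # 100 sites a week
queue_limit = daily_throughput * QUEUE_DAYS                # 40 sites


def sites_to_add(leftover: int) -> int:
    """How many new sites the business may add at the two-day refill."""
    return max(0, queue_limit - leftover)
```

With 8 sites left over from the previous cycle, `sites_to_add(8)` gives 32, which matches the refill described above.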

This process worked great. The team in India pulled items out of the queue while the business team in New York was sleeping and completed the items by early morning New York time. The testers would verify early in the morning (when India was sleeping) and either put the site back into the development queue or put it into production. Every second day the business team would add approximately 40 more sites into the queue. In the past the business people would do a dump of about 200 or 300 sites at a time. With so many items in the queue, the team had to spend too much time just managing the process and reprioritizing. Sites fell off and got lost. Only by limiting the work in progress and by limiting the queue did we achieve success. In addition, by keeping the queue small we allowed the business to reprioritize and have the developers pull the items through the system on demand. Our daily meetings were more focused on bottlenecks, process, and throughput, not “what am I working on today.” Since we were a well-oiled machine, we could predict how long it would take something to flow through the system.

This was about 3 or 4 years ago and I did not know it at the time, but we were using a primitive form of Kanban. I would not even hear of Kanban as an Agile process until about a year or two later. (By then I had sold this business and was working at Telerik.)

Next, we’ll take a high-level look at Kanban.