I did some lo-fi testing of WordPress 3.3 pre-beta on Sunday/Monday with local users, but since in the next set I want to get more multisite users, finding good participants will be tougher. Savannah has an art school, a music scene, and a burgeoning tech culture, so I can’t spend an hour at The Sentient Bean without overhearing someone tell their co-caffeineators about some new WordPress site they’re working on — a flyer posted for half a day gets plenty of individual users.
Multisite superadmins, though, are a bit harder to come by without effort, so I’m going to test out using some fancy web technology called “the skype” + “quicktime recording” as an experiment.
A Bit of Testing History
Once upon a time, usability testing was done in a lab, and was very expensive. Costly software like Morae was used, whole research teams were needed to recruit participants, moderate test sessions, and analyze results, and sometimes things went really high-end and lasers were involved. We’ll call this “formal usability testing.” The testing done in Spring 2008 on WordPress 2.5 and the Crazyhorse prototype was formal, included laser eye-tracking, and led to/strongly influenced the 2.7 admin redesign. Read all about it.
If you couldn’t afford formal testing, lo-fi setups involving a camcorder and a tripod allowed capturing the screen and participant speech, but even with a decent camera, the screen images were never fantastic. Also, unless you set up two cameras, you would miss things like facial expressions (which are hugely informative during observational testing). Although you could set it up anwhere and didn’t have to rent a lab, it was a clunky solution.
It is hard to believe it’s been so long, but I have been doing usability testing –both formal and informal — for more than 12 years now.
In July 2008, Clearleft released their Silverback app, and popularized the concept of ’guerrilla’ usability testing. Armed only with your Mac and their app, you could get screen capture and user video/audio thanks to your Mac’s built-in iSight. This was a game-changer. Though it didn’t have the hardcore analysis features of Morae, simply getting a high fidelity screen cap on your own was huge… and it was only about $50! Suddenly, you didn’t need to rent a big lab, you could use your regular office or even set up shop at the local caffeine vendor. Just stick people in front of your laptop and you were good to go. Suddenly testing costs could be a tenth of what they had been, and guerilla testers cropped up everywhere. By the time 2.7 was ready for testing in Autumn 2008, that’s how I was doing it, too.
I had grand plans to introduce a distributed testing model to create an avenue for usability testing professionals to have a way of contributing to WordPress that would be analogous to writing patches or helping in the forums. I put up a post about it, corresponded with potential volunteers, and tried to work out the logistics. Having each volunteer running Silverback and then uploading their videos to a central repository for analysis was the plan, but infrastructure was a problem in two ways:
- How would we match volunteer test moderators/usability professionals with volunteer participants? Well, there were some more grand plans around building a volunteer database tied to the .org profiles, but when the person who was working on it left, that basically sputtered, so finding participants remained a normal logistical nightmare.
- It would mean only Mac users could conduct testing, and non-Mac users would be using an unfamiliar OS during the test.
- We didn’t have a good way to collect the session videos.
So that plan never really got off the ground.
Skip to Today
Guerilla testing is still going strong, and if the number of people proposing sessions about it at SXSW is anything to go on, the fact that it’s been more than 3 years since Silverback was introduced hasn’t made it any less exciting a concept. That said, it’s still not a perfect solution. Let’s look at pros and cons…
- Pro: Output video content and quality on par with tools like Morae
- Con: Lacking in the post-test analytic tools of Morae.
- Pro: Works on Macs! Morae is still PC only.
- Con: Works on Macs! 3 years later, Silverback is still Mac only, excluding all our PC users from participating in tests in a familiar environment, and preventing non-Mac usability pros from getting involved this way.
- Pro: Much cheaper than buying Morae.
- Con: If the plan is to have many people doing testing all over, each moderator needs a copy of Silverback. While markedly less expensive than copies of pro software like Morae, adding up all the copies of Silverback would wind up being more than one person or team running Morae.
- Pro: Can conduct testing in convenient locations such as an office, a coffee shop, etc. or travel to participant location. Great flexibility.
- Con: Limited to in-person-only testing unless test participants download and install Silverback on their machines. We don’t want to require people to install software (even a free 30-day trial version) just to be a volunteer test participant, so anywhere we want to test with real users, we would need volunteer moderators with the software to be co-located.
- Con: Non-realistic experience when using a moderator-provided laptop during in-person tests. Real-world things influence how a user interacts in the browser with any web app — saved passwords, form auto-fill, chat windows popping up, email notifications, torrent downloads of Doctor Who hogging your bandwidth, screaming kids in the background, a ringing phone, you name it. To get a more realistic picture of usage/behavior, the researcher always prefers the setup that is as close to normal as possible. Using someone else’s laptop just doesn’t cut it.* And just as Morae limited us to PCs, Silverback is limited to Macs, but we want to test with users of both platforms. (And Linux! Don’t forget Linux!**)
So while guerilla testing with Silverback is cheaper/and more flexible than hiring a lab, and provides you with deeper observational information than an online service like usertesting.com, you’re still limited to doing testing in a place where you can have both a moderator and participants, which makes the participant pool quite limited, and not representative of the WordPress user base as a whole.
With the advent of Skype screensharing and Quicktime screen recording, we may finally be getting to a point where we can bridge that gap and include participants who are not local to the moderators — where we can get good recordings and can let participants use their own machines without significant cost or requiring unfamiliar software downloads. The drawback to this is that if you’re using skype to screenshare, you can’t also continue the video chat talking head.*** Still, observing real-time usage in a far-away participant’s own environment while being able to ask questions is a big step forward. So, I’m trying it out. We’ll see how it goes… Skype screensharing can get jumpy or laggy if you’re not on a reliably blazing internet connection.
If you are someone who uses WordPress multisite as a superadmin and would like to be a guinea pig to help me work out the kinks of this method (and get a look at some new UI stuff we’re considering at the same time), leave a note in the comments and I’ll get in touch over the next couple of days. Thanks!
When we get into Beta, assuming the trial run of this goes okay, I’m hoping to try and revive the distributed testing idea, so if you are a professional usability tester and would like to get involved with that in a couple of weeks, a note in the comments will get you added to the email list for when that’s ready to try again.
All that said, why do we do testing? It’s time-consuming and expensive, even when the software is cheap or free. Most agencies do it to show their clients that the design decisions they made were good ones. Most companies do it to find out what problems customers/users have with their products. With WordPress, we don’t so much have a “client” and our users tell us straight up in the forums what things cause them problems. So why should we do testing at all?
- Define benchmarks. One of the things we’re always trying to improve with WordPress is making it faster. How long it takes to complete various tasks in wp-admin is one benchmark of performance and usability that can be measured. Testing can provide a sample data pool with these stats.
- Test assumptions. With so many people weighing in on every design decision for WordPress, sometimes we have to forge ahead in what we think is the best direction despite siren songs from contributors who would prefer a different UI approach to something. Design by committee, camel, etc. That said, when’s there’s more than one UI idea or suggestion that seems reasonable for a given task, we don’t want to cling dogmatically to the status quo, either. Testing a few different design approaches allows us to see which designs people seem to respond to better.
So, could we live without testing? Sure. But do we want to? No. Seeing our work being used by regular people — not code contributors, not designers, not WordPress insiders with a vested interest in the choices made in the UI — keeps us humble. I challenge you to watch someone who’s not especially web-savvy figure out how to embed a video the first time. Change the tagline. Create a custem menu. I think the perpetual praise of WordPress as the most user-friendly platform for blogging and content management is justified, and we’ve worked hard to earn that reputation, but everyone on the core team is painfully aware of how much further we still need to go. Good testing can help us get there faster.
P.S. I haven’t even touched on testing with people of different abilities, different languages, etc., but that all needs to get more attention as well.
*And also means the researcher will have to re-arrange all her keys to be QWERTY for the sake of testing, then swap back to Dvorak afterward. Hmph.
** We usually forget Linux.
*** If we wanted to do another Mac-only thing, people with iPhones could use FaceTime for the talking head part while Skype was in screen share mode.