How to make video user experience not suck
I've spent much of my career making products that publish content, or trying to solve problems related to publishing content. I was preparing a new talk for a UX conference, and as is normal for me this involved standing, staring into the mid-distance, wondering what I was going to talk about. I wanted to identify a problem and discuss a solution. Three or four drafts in - and a week later - I had a talk completed. As it turned out, I really wanted to discuss one issue that I felt had never been properly solved: editorialising video, and by association consuming editorial video. As part of the talk, I'd mocked up a really basic slide to explain the problem and what a solution might be. The problem, it turned out, was pretty simple: videos are too long and have no context. If you want to show me a clip of a single joke in a comedy show, you'll probably link to the entire comedy show via YouTube (unless someone has edited it for you) and tell me to 'check out the funny bit at 2:33'. So to boil that down to a user story: 'The user should be able to see a section of a video in an embed, understand its context, and before watching it, decide if they want to actually watch it.' Here's that (really basic) slide:
So for some context here: what I've mocked up is (from the top down) an HTML video component; underneath that - instead of a scrub bar - a SoundCloud music player with user comments; then below that, GIF modules from Buzzfeed which play scenes of the video, each with editorial space, comments and share options in the style of Facebook underneath. The demo ran a bit like this: watch a GIF of a scene from the video, with some copy around it telling you what the context is, then tap it and that scene of the video loads in the main HTML video module. It's a quick and dirty solution to a problem, but it was designed to show how problems can be solved with existing ideas and technology - that an element SoundCloud uses for content navigation, taken out of context and repurposed, can be used differently while still keeping the element's core UX or USP intact. The feedback I got was surprising (considering how basic the slide was) - people loved the idea, and many had opinions on how it could be implemented and other user problems it could solve. A successful talk.
The thing is - this idea didn't go away after the talk. I became slightly obsessed by it. The idea was kind-of solving the problem, but not in an elegant way, and this was itching something in me. It was too clunky. Too busy. It was, in too many ways, pulling focus away from the content I was originally trying to uncover - so while it solved some problems, it actually generated more. I ended up working on it for a month around other work. I'd walk away and then come back to it. I'd apply knowledge I'd picked up on other products I'd worked on. I'd refactor the whole thing only to start again. I used Design Sprints to try and break out alternatives. I Gamestormed it from concept to business to find other perspectives. I made people who didn't care about my random obsession give feedback (and I bought a lot of people drinks until they did care). I was using guerrilla UX tactics to prototype and iterate - asking people their thoughts and almost pitching it to them as if they were going to invest.
At the end of the month I felt I had something. Well - I'd found a solution anyway. Maybe not a final draft. Maybe not an MVP. But a solution. It desperately needed engineering minds to collaborate on it to move it forward, but with that cost insurmountable at the time, work piling up on my desk and the itch scratched...it stalled. It became clear this was a product with more complexity than the engineers I knew could budget for. As an element of closure, I called it RPPLE - named after the process of 'ripple editing' I used to use when I edited video and After Effects promos. It also felt nice to have an old Final Cut Pro reference in there somewhere. It was branded and sadly abandoned. But the learning, research and discipline I got from the whole process - I still use that constantly. Anyway, here is that solution.
The RPPLE concept
To solve the user story - 'The user should be able to see a section of a video in an embed, understand its context, and before watching it, decide if they want to actually watch it' - I had to look at the point of embed first. This player would need to be part of a distributed media strategy, so it would need to work within a Tumblr post and the Tumblr video embed, but also link back to a main core player experience. In the same way, that main player would have to export elements of itself into embeds as part of that distributed media strategy, while still allowing for revenue generation through standard metric-driven advertising formats such as pre-roll and post-roll. The embed is a single scene that has been shared, except rather than being a full video without context (or a full video with a defined start point in the timecode) it would be context driven: a headline and sell summarising why the video scene should be viewed, with a looped GIF of the scene playing - much like a GIPHY preview. This would then be used to draw the user into experiencing the main player and full video. The key to the embed, however, was not to generate a defined 'media card' style initially (which would involve a lot of support) but rather to have it 'cuckoo' itself - adopting the style of whatever network default it found itself in. The next stage with embeds would be to define a more refined card style once user testing across multiple networks had been completed and evaluated.
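To make that concrete, here's a minimal sketch of what an embed might carry and render. Every name here (RippleEmbed, renderEmbed, the field names) is my assumption for illustration; the point is that the markup stays deliberately bare so it 'cuckoos' the host network's default styles rather than shipping a card design of its own:

```typescript
// Hypothetical shape of a shared-scene embed; all field names are illustrative.
interface RippleEmbed {
  headline: string;       // why this scene is worth watching
  sell: string;           // short summary providing context
  gifUrl: string;         // looped GIF preview of the scene
  playerUrl: string;      // deep link back into the full RPPLE player
  sceneStart: number;     // scene offset in the full video, in seconds
}

// Bare, unstyled markup so the host network's defaults style the embed.
function renderEmbed(embed: RippleEmbed): string {
  return `
    <a href="${embed.playerUrl}#t=${embed.sceneStart}">
      <img src="${embed.gifUrl}" alt="${embed.headline}" />
      <h2>${embed.headline}</h2>
      <p>${embed.sell}</p>
    </a>`;
}
```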
The RPPLE player
So this is the basic solution at a 1280px breakpoint. The player is completely responsive, but I'm talking about breakpoints for ease of presentation (I find wireframing initial RWD solutions easier to plan and communicate using 4-5 breakpoints). The RPPLE player is made up of some consistent elements, regardless of its position or size. In hierarchical order (and sketched as a rough data model after the list) these are:
A user avatar and <h1> global title
The main HTML5 media player component
A scene based navigation, displaying edits and basic heat mapping
An editorial 'carousel' that editorialises the scene selected with <h2> and <p> tags
A share button that exports the scene as a GIF for distributed media embedding
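Here's that hierarchy as a rough data model. The names and types are my assumptions for illustration, not a spec:

```typescript
// Illustrative data model for the RPPLE player; all names are assumptions.
interface Scene {
  start: number;                   // offset into the full video, in seconds
  duration: 30 | 20 | 10 | 5 | 3;  // the metric subdivisions covered below
  headline: string;                // the <h2> carousel copy
  body: string;                    // the <p> carousel copy
  engagement: number;              // normalised 0..1, drives the heat map
}

interface RipplePlayer {
  author: { name: string; avatarUrl: string }; // user avatar
  title: string;                   // the <h1> global title
  videoUrl: string;                // source for the HTML5 media component
  scenes: Scene[];                 // scene navigation + carousel entries
}
```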
The user avatar displays the user who generated the video (more on that in a moment), and the title and media player component are self-explanatory. There are two elements here, however, which require some clarification.
The 'scrub bar' has been replaced with a selection of scenes shown in a ripple effect. Each scene shows its length in time relative to its width on screen: the wider the scene, the longer it is. These scenes are made of metric subdivisions of the total length of the video: 30 seconds, 20 seconds, 10 seconds, 5 seconds and 3 seconds. These metric subdivisions are there for a reason - 30 seconds is the maximum length of an Instagram video; 5 seconds is just under the length of a Vine (6 seconds), leaving space for an end frame (if required for advertisers); 3 seconds is the minimum for Instagram. In addition to width there is colour. The colours range from green through yellow and orange to red, with green being the scene with the lowest engagement and red the scene with the highest. This performs a few actions. The user can immediately see which part of a video everyone is talking about and sharing - it becomes a heat map of engagement. The producer of the video can see a breakdown of their edit and which part of the narrative gained the most engagement (almost as a direct feedback tool) to aid them in making better edits in the future. It also shows the narrative of the video as a bare, visible metric. If this was a video of the new Star Wars trailer, the red section is going to be the really cool bit - the bit everyone is watching - but the orange bit beyond it is also gaining traction. This means the user is more likely to browse through the video rather than just watch the 'bit everyone is talking about'. So it aids discovery through colour, and it aids discovery because the video is broken down into scenes which are easily digestible. But it also aids navigation - scrub bars are notoriously fiddly on mobile, and these 'scenes' let the user simply tap on them to view them, knowing approximately what the engagement on the scene is, but also its approximate length. It's hiding nothing from the user - it's transparent. Well - transparent except for context.
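Before getting to that context, here's a minimal sketch of how the scene widths and heat-map colours could be derived, assuming engagement is normalised to a 0..1 score. The function names and colour thresholds are my assumptions:

```typescript
// Width proportional to duration: the wider the scene, the longer it runs.
// barWidthPx is the total pixel width available for the scene bar.
function sceneWidthPx(sceneSeconds: number, totalSeconds: number, barWidthPx: number): number {
  return (sceneSeconds / totalSeconds) * barWidthPx;
}

// Map a normalised 0..1 engagement score onto the green -> yellow -> orange
// -> red ramp; the thresholds are illustrative, not from the original spec.
function sceneColour(engagement: number): "green" | "yellow" | "orange" | "red" {
  if (engagement < 0.25) return "green";
  if (engagement < 0.5) return "yellow";
  if (engagement < 0.75) return "orange";
  return "red";
}
```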
The carousel provides this context. As the user plays the video through, or jumps to a specific scene, so too does the carousel below it. The carousel contains an <h2> headline and a <p> body copy option, which are completed by the video editor. The <h2> and <p> have - as shown here - character maximums. These allow the scene descriptions to accompany the video scene onto any social network, regardless of that network's character limits. 200 characters? A good amount to boil down what the scene is. 70 characters? The perfect length to form the basis of a tweet or Reddit post - and still give the user room to add some emojis. The carousel is also a navigation, however - a user can scroll through the carousel looking for content in the same way they can use the scene selector, and the two move in sync through a simple flowing animation. Hitting share in the carousel opens up modal dialogues for a variety of video-enabled network APIs, and the resulting export is a GIF of the scene with the appropriate context copy, depending on the character limits enforced by the API.
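As a rough sketch of that last step - choosing which copy travels with the exported GIF - something like this would do. The function name and fallback order are my assumptions, not the original design:

```typescript
// Pick the copy variant that fits the target network's character limit.
// headline is the ~70-character <h2>; body is the ~200-character <p>.
function copyForShare(headline: string, body: string, apiCharLimit: number): string {
  const full = `${headline}: ${body}`;
  if (full.length <= apiCharLimit) return full;         // room for everything
  if (headline.length <= apiCharLimit) return headline; // fall back to the <h2>
  return headline.slice(0, apiCharLimit - 1) + "…";     // hard truncate as a last resort
}
```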
Editing a RPPLE
So they're easy to navigate and to share...but obviously they have to be created in the first place. For the end user, the experience is great - 30- and 10-second chunks of content, easy to view and share. But what about the user trying to create those 30- and 10-second chunks of content? And what if they have a two-minute video? How do they consistently edit and cut varied chunks of content into a rippled edit? It's a tough ask. So I had to build an editor.
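One way to answer the consistency question is to only ever cut on the metric subdivisions. Here's a greedy sketch of what a default edit suggestion could look like - entirely my illustration; in the actual concept every cut is chosen by the user:

```typescript
// The metric subdivisions, longest first.
const ALLOWED_DURATIONS = [30, 20, 10, 5, 3];

// Greedily suggest a default rippled edit for a video of a given length,
// always taking the longest allowed chunk that still fits. Any remainder
// under 3 seconds would need absorbing into the final scene.
function suggestEdit(totalSeconds: number): number[] {
  const cuts: number[] = [];
  let remaining = totalSeconds;
  while (remaining >= 3) {
    const chunk = ALLOWED_DURATIONS.find(d => d <= remaining)!;
    cuts.push(chunk);
    remaining -= chunk;
  }
  return cuts; // e.g. suggestEdit(120) => [30, 30, 30, 30]
}
```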
The trouble with online editors is that they're a bit 'meh'. They give you too many options, or not enough. And they're really expensive to run too - hosting video, streaming it to an editing portal and rendering the result. So what if you could do the upload somewhere else? What if you have video already hosted elsewhere?
The uploader allows you to upload video - of course - but also to pull it from somewhere else: YouTube, Vimeo, even Dropbox. All you need is a URL (and the rights to do so, of course). This allows a user to remix existing videos and distribute them in a different way. Once uploaded or linked, the user can then edit them.
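Recognising where a pasted URL points could start as simply as this sketch; the provider list and patterns are illustrative:

```typescript
type Source = "youtube" | "vimeo" | "dropbox" | "direct-upload";

// Work out which fetch path a pasted URL should take. The patterns are
// deliberately loose; a production build would validate much harder.
function detectSource(url: string): Source {
  const host = new URL(url).hostname.toLowerCase();
  if (host.endsWith("youtube.com") || host.endsWith("youtu.be")) return "youtube";
  if (host.endsWith("vimeo.com")) return "vimeo";
  if (host.endsWith("dropbox.com")) return "dropbox";
  return "direct-upload"; // anything else is treated as a direct file
}
```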
The editor is pretty slick. At the top you add your title and set your video to public or private - and we assign you a URL. Assigning the URL at this point is useful: it shows the video is uploaded (not sitting in some partial storage or cache) and lets you return to this page if you need to (say your browser crashed). The URL follows the video through the edit process to completion.
The user can edit the video by clicking on the scene selector, which starts out all green and single width. Clicking immediately creates an edit, with the option to choose the duration. A duration is chosen (in this instance it looks like 30 seconds is the right one) and the user fills in the header <h2> and body <p> copy. Then, going down the hierarchy, the user can add comments, voting elements and metrics. Then they can choose an optional end board - for non-looped video this is the bit that is freeze-framed at the end of the clip; for looped video it's the pause before it restarts. This would allow you to add a note, small advert or credit to the video.
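Pulled together, a single edit might capture something like the following shape - an illustrative model following the hierarchy above, not the original spec:

```typescript
// Everything a single scene edit captures; all names are assumptions.
interface SceneEdit {
  start: number;                   // where the user clicked in the timeline, seconds
  duration: 30 | 20 | 10 | 5 | 3;  // chosen from the metric subdivisions
  headline: string;                // <h2> copy, ~70-character maximum
  body: string;                    // <p> copy, ~200-character maximum
  allowComments: boolean;
  allowVoting: boolean;
  showMetrics: boolean;
  endBoard?: string;               // note, small advert or credit: the freeze-frame
                                   // for non-looped video, the pause card for loops
}
```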
The user can go back and change edits by clicking on the scene and changing the length, editorial copy, settings or end board, with the carousel below the scene selector scrolling in sync with the interaction above. It replicates the process of watching the video - the two processes are purposefully similar. Editing the video should be as easy as watching the video, which should be as easy as sharing the video.
In Summary
The player has many variants - voting modules, nested comment styles, proposed media cards, audio options...but these are all subject to user testing. There are mobile wireframes. Flinto prototypes. But these are all 'nice to have' additions to the core player - because the core player solves a problem. Hopefully one day I'll be able to return to it and make it better, refactor it again - and maybe finally release a working version of it.