“Look Ma, No Keys!” or, The Future of Picture Editing
As we in the post-production community look forward to the release of Final Cut Pro next week, many have speculated about how Apple will move past traditional editing paradigms and create an app that “skates where the puck will be”, to borrow Wayne Gretzky’s famous phrase as Mark Raudonis used it at the pre-NAB Editor’s Lounge. It’s true that the current iterations of Final Cut Pro, Avid Media Composer, and Adobe Premiere are largely based on old metaphors, and if technology today has the potential for much more, why not use it? I don’t know what Apple has come up with, and I don’t expect it to be anything as physically based as this, but regardless, here are some ideas I’ve had, originally written in early 2010:
In recent years, science has shown that the human brain effectively sees tools as an extension of the body. When a carpenter picks up a hammer, or a musician his guitar, the brain temporarily “absorbs” that object into its mental projection of one’s self, and allows one to use it as naturally as if it were a body part. Of course, such integration is dependent on the difficulty of the device and the skill of its user; just as a pen is a basic, universal tool that requires little to master, a complex surgical instrument undoubtedly carries a steeper learning curve. Slightly more difficult to place in context are computer-based tools, as there is no “real” correlation between the physical user input and its result. One presses keys and clicks a mouse, but the resulting virtual action can be anything from sending an email to drawing an illustration. Can the brain still “absorb” these objects when the results are not only variable, but virtual as well?
Film editing has undergone a few major paradigm shifts since its inception with glue and splicers (and in-camera editing before that), but the greatest of them came in the late 80s with the mainstream acceptance of non-linear editing. This transition provided editors with a plethora of new tools that made their work more efficient and their means of accomplishing it more capable (one need only look to the latest versions of modern NLEs to see the depth of their potential). But the most revolutionary of these advancements, inarguably, was the random access of footage. Most artists (and I think it fair to categorize editors as artists of their craft) would agree that the translation of the mind’s eye to reality is one of the most difficult parts of the job; when an editor is struck with the inspiration to use a specific take, it’s important to find that clip as soon as possible. Random access provides this: the reaction shot that may have taken hours to locate in the days of the K.E.M. can be found in seconds today, and (with the advent of widespread metadata use) in any number of ways. Truly, non-linearity gave new definition to the concept of editing, and opened the doors to new waves of experimentation and technique.
And yet, despite such modern wonders as Avid Media Access and the Mercury Playback Engine, modern NLEs remain fundamentally unchanged from their decades-old origins. You find your clip in a browser, trim it to the desired length, and edit it into a timeline, all with a combination of keys and mouse (or, if you prefer, a pen tablet). But is this process really as physically intuitive as it could be? Is it really an integrable body part in the mind’s eye, allowing the editor to work the way he thinks? Though I can only speak for myself, with my limited years of editing experience, I believe the answer is a resounding “no”. In his now famous lecture-turned-essay In the Blink of an Eye, Walter Murch postulates that in a far-flung future, filmmakers might have the ability to “think” their movies into existence: a “black box” that reads one’s brainwaves and generates the resulting photo-realistic film. I think the science community agrees that such a technology is a long way off. But what until then? What I intend to outline here are my thoughts on just that: a delineation of my own ideal picture-editing tools, based on technologies that either currently exist or are on the drawing board, and which could be implemented in the manner I’d like them to be. Of course, the industry didn’t get from the one-task, one-purpose Moviola to the 2,000-page user manual for Final Cut Pro for no reason. What I’m proposing is not a replacement for these applications as a whole, just for the basic cutting process: a chance for the editor to work with the simplicity and natural intuitiveness that film editors once knew, and with the efficiency and potential that modern technology offers.
1. Main Interface
Rear-projected; essentially combines browser, viewer/source monitor and timeline. The “four up” display (four used for illustrative purposes; any number is possible) shows all setups/shots for a given point in the script simultaneously, as if it were shot multi-camera. It can also reconfigure to display all takes for a given shot in the same manner. While one can still play, pause and jog, this allows the rough-cut process to be theoretically real-time; the clips are pre-assembled based on the script (the computer finding and matching text to audio), and once playing, the user is free to see all options available to him at all times and make immediate cuts on instinct. Of course, a loop mode is also an option, and the user can move to the timeline, directly below the clip display, for further edit point refinement or clip rearrangement.
Foundational tech: Avid Technology’s ScriptSync, multicam interfaces of modern NLEs
Displays final version of edit. Like modern NLEs, this monitor displays the current version of the edit. With the proliferation of 4K and even 8K image acquisition, one can also use the canvas to perform “pinch-and-zoom” resizing and reframing of the image, even adding subtle pans and zooms at the same time, with no loss of quality in the final output. Additionally, the monitor can be configured to display at the size of the destination format– if one is editing for TV, the monitor becomes the average size of a consumer television, and likewise for mobile devices or the web. Work destined for theatrical release is scaled as large as possible.
Foundational tech: Apple Inc’s touch-surface device gestures
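The reframing arithmetic behind the “no loss of quality” claim can be sketched in a few lines. This is only an illustration under my own assumptions: “no loss” is taken to mean that the cropped region still holds at least one source pixel per output pixel, “4K” is taken as 3840x2160, and the function names max_lossless_zoom and crop_window are hypothetical, not part of any existing NLE.

```python
def max_lossless_zoom(source_width, source_height, output_width, output_height):
    """Largest zoom factor at which the crop still has at least
    one source pixel for every output pixel (i.e. no upscaling)."""
    return min(source_width / output_width, source_height / output_height)


def crop_window(source_width, source_height, zoom, center_x=0.5, center_y=0.5):
    """Return (left, top, width, height) of the crop in source pixels.

    zoom is 1.0 for the full frame; center_x and center_y are the desired
    crop center as fractions of the frame (0.0 to 1.0). The window is
    clamped so it never leaves the source frame.
    """
    if zoom < 1.0:
        raise ValueError("zoom must be 1.0 or greater")
    crop_w = source_width / zoom
    crop_h = source_height / zoom
    left = center_x * source_width - crop_w / 2
    top = center_y * source_height - crop_h / 2
    # Clamp so a pan toward an edge stops at the edge of the source frame.
    left = min(max(left, 0.0), source_width - crop_w)
    top = min(max(top, 0.0), source_height - crop_h)
    return (left, top, crop_w, crop_h)


# A 3840x2160 source delivered at 1920x1080 leaves room for a 2x punch-in.
limit = max_lossless_zoom(3840, 2160, 1920, 1080)
window = crop_window(3840, 2160, limit)
```

A pinch gesture would then simply drive zoom between 1.0 and the computed limit, and a pan would move the crop center over time.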
Displays user-created and automatically generated information. A virtually unlimited storage area for the user’s notes, clip/lens metadata, and any other media such as selected stills, audio, or other clips that the user chooses to associate with the currently displaying clip. While things like metadata display only as long as the clip is active by default, these items can be assigned to any length of time, from a few frames to whole scenes. Notes can also be recorded in real time simply by speaking them. Also included is automatically generated data like face recognition, shot type detection, and others (see Philip Hodgetts’ discussion on Inferred and Derived metadata).
Foundational tech: Adobe Systems’ speech transcription
4. Gesture Identification Camera(s)
Interprets and translates hand movements. The core technology of the interface that allows all these actions to be performed in real time with only simple hand movements. Basic commands like play and pause are accordingly simple, requiring one only to raise or lower the hand, while shot/take selection, trimming, timeline scrolling, and other functions are controlled by the number and placement of fingers. For example, a two-finger swipe performs a ripple trim, three fingers perform a roll trim, four allow one to slip, and five to slide. These commands are of course user-customizable.
Foundational tech: John Underkoffler’s hand-based UI (Oblong)
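The finger-count bindings described above amount to a small, user-editable lookup table. Here is a minimal sketch of that idea; the names TrimMode, DEFAULT_SWIPE_BINDINGS and resolve_swipe are my own inventions for illustration and do not come from Oblong’s g-speak or any NLE.

```python
from enum import Enum


class TrimMode(Enum):
    RIPPLE = "ripple"
    ROLL = "roll"
    SLIP = "slip"
    SLIDE = "slide"


# Default bindings as described in the text: fingers in a swipe -> trim type.
DEFAULT_SWIPE_BINDINGS = {
    2: TrimMode.RIPPLE,
    3: TrimMode.ROLL,
    4: TrimMode.SLIP,
    5: TrimMode.SLIDE,
}


def resolve_swipe(finger_count, bindings=None):
    """Return the trim mode bound to a swipe with this many fingers,
    or None if nothing is bound. Pass a custom dict to override defaults."""
    table = DEFAULT_SWIPE_BINDINGS if bindings is None else bindings
    return table.get(finger_count)


# A user who prefers roll trims on two fingers just swaps the entries.
custom = {**DEFAULT_SWIPE_BINDINGS, 2: TrimMode.ROLL, 3: TrimMode.RIPPLE}
```

Keeping the bindings as plain data rather than hard-coded branches is what would make the “user-customizable” part cheap to offer.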
5. Other Features
Audio. The quality of a clip’s audio, and that of the clips around it, can be a subconscious and unwanted influence on editing. Since the computer is assembling all shots/takes simultaneously, it can analyze the signal-to-noise ratio of all clips and choose the best one to play while editing (or the one that best matches the others). Alternatively, this can be disabled so the user can manually select the audio take with the desired delivery or inflections. With time-stretching (the ability to change the speed of audio while retaining its pitch), any audio take can be applied to any video take without sync issues.
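As a rough sketch of the automatic selection, suppose each take already comes with an estimate of its signal power and its noise power (how those are measured is outside this example, and the names snr_db and cleanest_take are hypothetical). The signal-to-noise ratio in decibels is 10 times the base-10 logarithm of the power ratio, and the system would simply pick the take with the highest value.

```python
import math


def snr_db(signal_power, noise_power):
    """Signal-to-noise ratio in decibels from two power estimates."""
    if signal_power <= 0 or noise_power <= 0:
        raise ValueError("power estimates must be positive")
    return 10.0 * math.log10(signal_power / noise_power)


def cleanest_take(takes):
    """Pick the take name with the highest SNR.

    takes maps a take name to a (signal_power, noise_power) pair.
    """
    return max(takes, key=lambda name: snr_db(*takes[name]))


# Made-up measurements, for illustration only.
takes = {
    "take_1": (1.0, 0.1),
    "take_2": (1.0, 0.01),
    "take_3": (0.5, 0.05),
}
best = cleanest_take(takes)
```

Choosing “the one that best matches the others” would instead compare each take’s SNR against that of its neighbors in the cut, which this sketch does not attempt.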
Mismatching takes. Not all takes are created equal, and a slower take full of pauses might seem to present a problem to a system that relies on synchronicity. The solution depends on the variance between clips; if the differences are minor enough, an algorithm is applied that compensates by subtly adjusting the speed in between lines (using optical flow to mask the effect, which is of course removed when a take is chosen). For takes with greater differences, the user is presented with a choice before proceeding (including the option to “test drive” a take by playing it with what’s been chosen so far).
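One way to picture the decision between silent compensation and asking the user: compare the pauses between lines in a take against a reference take, and work out the playback speed each pause would need in order to fit. The sketch below assumes the line timings are already known (for instance from script-to-audio matching), and the 25% tolerance and the names gap_durations and plan_retiming are arbitrary choices of mine, not a tested threshold.

```python
def gap_durations(line_times):
    """Pauses between consecutive lines, given (start, end) times in seconds."""
    return [
        next_start - prev_end
        for (_, prev_end), (next_start, _) in zip(line_times, line_times[1:])
    ]


def plan_retiming(reference, take, max_change=0.25):
    """Return (speed_factors, needs_user_choice).

    Each speed factor is the playback speed for one pause in the take so
    that it lasts as long as the matching pause in the reference
    (above 1.0 means the pause is played faster). If any factor deviates
    from 1.0 by more than max_change, the take is flagged for the user.
    """
    ref_gaps = gap_durations(reference)
    take_gaps = gap_durations(take)
    if len(ref_gaps) != len(take_gaps):
        raise ValueError("takes must contain the same number of lines")
    factors = []
    for ref_gap, take_gap in zip(ref_gaps, take_gaps):
        if ref_gap <= 0 or take_gap <= 0:
            # No usable pause on one side; left untouched in this sketch.
            factors.append(1.0)
        else:
            factors.append(take_gap / ref_gap)
    needs_user_choice = any(abs(f - 1.0) > max_change for f in factors)
    return factors, needs_user_choice


reference = [(0.0, 2.0), (3.0, 5.0)]
close_take = [(0.0, 2.0), (3.2, 5.2)]   # pause is slightly longer
slow_take = [(0.0, 2.0), (4.0, 6.0)]    # pause is twice as long

close_factors, close_flag = plan_retiming(reference, close_take)
slow_factors, slow_flag = plan_retiming(reference, slow_take)
```

Here the first take would be retimed quietly and the second would trigger the choice described above; the optical-flow smoothing itself is a separate problem not shown here.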
You may at this point be thinking “All this can do is make cuts! Even iMovie can do more than that!”; if this is the case, I suggest you continue editing the way you are comfortable and forget the contents of this essay (you may also be thinking “this looks like Minority Report”, and in that respect, I cannot deny some inspiration). It is, after all, a personal exercise, and one person’s ideal editing interface may be useless to another; it doesn’t necessarily make either of them better or worse editors. And as stated, it is by no means intended as a full replacement for a modern NLE; it is simply a better (again, in my opinion) way to perform basic cutting. Nor is it a system that leaves nothing to be desired; returning to the treasure-trove of editing theory that is Walter Murch, one finds an alternative to the view that random access is a godsend. Linear systems, he submits, require the operator to see other footage on the current film roll before arriving at one’s destination, and this “forced interaction” with the footage can spawn new ideas and perspectives on where one thought one wanted to go. My interface, while perhaps better than other NLEs in that it displays multiple possibilities for a given moment at once, still does not completely restore this lost advantage of linear editing.
Finally, what are the odds of such a system being adopted in the mainstream? I believe there are three major obstacles that would have to be overcome for this to take place. The first, and surprisingly least problematic, is the technology itself. At the TED 2010 conference last year, John Underkoffler demoed the latest developments in his “g-speak” system, including an example of effortlessly playing, pausing, and jogging video, and he claims that the work his company Oblong is doing has the potential to be distributed as a consumer product in as little as five years. The other technologies that I’ve referenced, from companies like Apple, Avid and Adobe, all currently exist and would not be theoretically difficult to implement in the fashion required. The second barrier is, as one might expect, cost. The issue here is not so much per-system expense, but rather that post houses and studios, with multiple edit suites already installed, would have to buy into entirely new infrastructures; no matter how low the per-system cost, such expenses could quickly add up (expenses that, to executives, may seem unnecessary). But cynical as it may be, I think the biggest issue is the reluctance of editors themselves to learn a new platform. It’s not entirely unreasonable; you get comfortable with a certain way of doing things, as convoluted as that way may be, and it seems easier to stick with it than to start again a different way. It’s simply human nature, and the longer one has been working a certain way, the more difficult the habit is to break.
I believe the solution to these issues is simple integration; rather than attempting to create a standalone system that will inevitably fall short of what full-featured NLEs can accomplish, why not just create a front-end to these programs that can be turned on or off at will? One can edit with the hand-control interface until the basic picture edit is complete, or until a more advanced tool is needed, and then export an EDL to one’s Avid to continue the job. Assuming such a front-end were also platform-agnostic, this would also save the bulk of the cost for studios and post houses; it could even become a tool in educational settings, where budding editors could not only learn the fundamentals of their craft without the burden of a learning curve, but also come out of it with the basic skills for an editing job, regardless of the platform the client uses.
Of course, all of this is in some ways moot; as Murch wisely cautions, “technology is hardly ever the final determining factor in questions of speed vs. creativity.” If this interface or some derivation of it ever comes into existence, it will not make a good editor out of a bad one, nor even a great editor out of a good one. It will not make a better cut than Final Cut, or iMovie, or a Steenbeck. But I do believe it could offer a better experience to its user than can be achieved with those tools, and out of this experience comes the potential for a better film, closer to how the mind of the editor, and by extension the audience, thinks.