The Spatial Translation Framework
Spatial computing is going mainstream, with revolutionary mixed reality devices coming from Apple (Vision Pro) and Meta (Quest 3). Product teams will want to bring their experiences to these new environments, and as they do, they'll face the following challenges:
When should we adapt existing smartphone and laptop experiences to headsets?
When we’ve decided to adapt an experience, what specifically should we “spatially translate” onto a headset?
Should we consider building a system that uses BOTH smartphone/laptop and headset for inputs and outputs?
The Spatial Translation Framework can help us decide:
When to adapt existing experiences to headsets.
What to build (or keep) on more traditional 2D systems (e.g., laptops and smartphones), versus in a spatial computing system (e.g., mixed reality headset).
How to leverage interoperability between these systems for a more seamless user experience.
Case Study
Let’s apply the Spatial Translation Framework to an example: in Introducing Apple Vision Pro (AVP), the user can connect to their MacBook simply by looking at it, “turning a 13-inch screen into a giant display”. Using direct manipulation, they position the screen and further enlarge it to their liking.
In addition to serving as a shining example of interoperability between a laptop and a spatial computing system generating user value, this moment shows how the Spatial Translation Framework can lead us to this kind of positive user experience.
User goal: The user wants to see the image in front of them on a larger screen. In the video, it appears they are editing a photo.
Tasks: The key beats in this user journey are likely: open the app, load a photo, edit the photo, and close the app. I'll focus on the first three, as those are central to getting the user to their goal.
Map tasks to the strengths and weaknesses of the current display (laptop) versus the AVP:
To open the app and load a photo, you're likely clicking a series of relatively small buttons (aka "targets" - I'm looking at you, fellow Fitts's Law HCI nerds; see the note after this list) in a familiar interface - all things we do well on a laptop.
For editing the photo, you're likely performing precise inputs on your laptop, but perhaps you want to see how those edits look at scale - or perhaps even in the context of your physical space. In those conditions, editing in the headset becomes a more immersive act than editing on your laptop, assuming sufficient display resolution (which the AVP appears to have). This is a step in the user journey where the spatial computing system offers strengths over the laptop display.
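For the Fitts's Law nerds among us: the Shannon formulation estimates how hard a pointing task is from the distance to the target and the target's width, which is one way to quantify why small, dense buttons pair well with a precise pointing device like a mouse. Below is a minimal sketch in Python with made-up distances and widths, purely to illustrate the relationship.

```python
import math

def fitts_index_of_difficulty(distance: float, width: float) -> float:
    """Shannon formulation of Fitts's Law: ID = log2(distance / width + 1).

    Higher ID means a harder (slower) pointing task. Any consistent units work,
    e.g., millimeters of cursor travel and target width.
    """
    return math.log2(distance / width + 1)

# Hypothetical numbers: a small toolbar button versus a large spatial panel.
small_button = fitts_index_of_difficulty(distance=300, width=8)    # ~5.3 bits
large_panel = fitts_index_of_difficulty(distance=300, width=120)   # ~1.8 bits
print(f"Small button: {small_button:.1f} bits, large panel: {large_panel:.1f} bits")
```

The exact numbers don't matter; the point is that a mouse plus small targets is a well-matched pairing, while coarser gaze or hand input favors bigger targets.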
We can extend this example to the same user video calling a collaborator to work on the photo edits together.
User goal: Have a more realistic, immersive call while going through photo edits.
Tasks: The key beats in this end-user journey are likely: open the video calling app, find the person you want to call, call the person, share your screen, have a conversation, and end the call. I'll focus on the first four, as those are central to getting the user to their goal.
Map tasks to the strengths and weaknesses of the current display (laptop) versus the AVP:
To start a video call, find the person you want to call, and share your screen, you’re again likely clicking a series of relatively small buttons in a familiar interface - all things we do well on a laptop.
Having a 2D video conversation is our approximation of a real-world conversation. Except, the IRL conversation would usually involve rich embodied cognition, moving your hands as you speak, and seeing the person across from you in all their volumetric, 3D glory. Instead, you see them through a small 2D frame. Seems like an area for improvement.
What we learned
Going through the steps of goal → task analysis → mapping tasks to the current system's strengths and weaknesses, we gained a clearer signal about the opportunity areas along the user journey where the AVP offers something the laptop can't (i.e., a more immersive editing process, and the "having the conversation" part of the video call).
Do these opportunities mean we should redesign the editing and video call experience to only live in AVP? Probably not.
In addition to the reality that the user probably isn't throwing away all their non-AVP devices upon purchasing it, editing a photo or initiating a call on a laptop generally leverages the laptop interface's strengths, like precise input and familiar interaction patterns.
While we acknowledge that the user might also someday want to have a video call entirely in-headset (and that interface will need to evolve beyond what we currently see on laptops or phones, not to mention evolve alongside AI advancements), we can meet the user where they're at by:
Examining what parts of the user experience work well on the current interface (i.e., leverage its strengths) versus not, to inform when to spatially translate from laptop (or smartphone) to spatial computing system.
Considering how you can leverage interoperability between systems for a smoother user experience (i.e., glancing at the laptop display while in-headset, and then using your hand to adjust and enlarge the screen).
What’s all this about system strengths and weaknesses?
When comparing current systems (like smartphones or laptops) with spatial computing systems, we often find that one's strength is the other's weakness. Understanding these complementary strengths and weaknesses is key to the Spatial Translation Framework, because we can map tasks to them to determine whether they're better served by current systems or spatial computing systems (see the sketch after the lists below).
Three strengths of both smartphones and laptops - aka weaknesses of spatial computing headsets:
Precise input: These devices afford pixel-perfect control, be it through touch interfaces or accessory devices like a mouse, keyboard or stylus. Direct manipulation in spatial computing devices, while rapidly improving, isn't yet at this level. Similarly, the prospect of only using controllers to design a 2D logo for a traditional 2D screen is unappealing, given the relative lack of control compared to a mouse.
You could counterargue that we can connect accessory devices (like a traditional game controller) to a spatial computing headset and gain that precision. In those instances, I'd ask: what are the benefits of playing that game in a headset rather than on a big screen? Maybe you can't fit a big screen into your physical environment. Maybe you're doing something else in the headset before and after the game, so removing it in between would be more friction than it's worth. This gets into limitations of the framework - see task analysis complexity and cognitive load from task switching.
Familiar interaction patterns: People generally have existing mental models for interacting with these devices (not to mention, designers have established patterns for designing for them). Therefore, interactions on these devices tend to be more intuitive than in a spatial computing app, where interaction patterns are still crystallizing.
High-resolution display: Both devices offer high-res screens that are usable under most lighting conditions. Compare this to most spatial computing display systems, where the interaction between display brightness, the color and saturation of content, and the user's physical setting can make or break an experience.
Three strengths of spatial computing headsets - aka weaknesses of smartphones and laptops:
Spatial input: If we need to manipulate the depth of digital objects (e.g., moving a cube behind another cube), it is more intuitive to do this on a spatial computing device than on a phone or laptop. We've been designing in "z-space" our entire lives - everything we interact with IRL is a 3D object. Therefore, when I need to interact with a 3D thing on my flat laptop screen using various abstractions of what I'd usually just use my hands to do (see "pitch," "yaw," and "roll"), it's unintuitive and increases cognitive load.
Situational awareness: By virtue of having a spatial computing system on your head, you can now see digital objects overlaid on the physical world in context - all without looking down at a device. Of course, if you put a bunch of digital stuff in your user’s field of view or otherwise bombard them with media in a headset (see Keiichi Matsuda’s now-classic Hyperreality), your user is gonna have a bad time. However, barring that, spatial computing headsets generally allow people to keep better tabs on their physical environment while interacting with digital content, relative to being hunched over their phones or laptops.
Embodied interactions: Related to spatial input, spatial computing headsets allow us to move our physical bodies in the environment in which we’re immersed. This is a key component that gives rise to human cognition IRL (see embodied cognition), and is much easier to achieve in a headset than holding your phone or typing on your laptop.
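To make the task-to-strength mapping concrete, here's a minimal sketch of how a team might record it for the photo editing case study, using the six strengths above as scoring dimensions. The task names and scores are hypothetical workshop judgments, not measurements or an official part of the framework.

```python
# Hypothetical rubric: score how well each system serves each task (0-2).
# The dimensions come from the strengths listed above; the scores are
# illustrative judgments a team might make in a workshop, not data.
SYSTEMS = ["laptop", "headset"]

task_scores = {
    "open app / load photo":  {"laptop": 2, "headset": 1},  # precise input, familiar patterns
    "make precise edits":     {"laptop": 2, "headset": 1},  # pixel-perfect control
    "preview edit at scale":  {"laptop": 1, "headset": 2},  # spatial input, immersion
    "review edits on a call": {"laptop": 1, "headset": 2},  # embodied, volumetric presence
}

def better_system(task: str) -> str:
    """Return the system with the higher score for a task (ties go to 'either')."""
    scores = task_scores[task]
    if scores["laptop"] == scores["headset"]:
        return "either"
    return max(SYSTEMS, key=lambda s: scores[s])

for task in task_scores:
    print(f"{task}: {better_system(task)}")
```

The value is less in the numbers than in the conversation they force: tasks that score clearly higher for the headset are spatial translation candidates, while the rest are candidates to leave on the laptop or to bridge via interoperability.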
Easier said than done
Spatial translation challenges are non-trivial, and there are still several limitations to the framework, including:
The strengths/weaknesses rabbit hole: Smartphones and laptops themselves are different form factors, with their own unique strengths and weaknesses (e.g., the mobility of a smartphone versus the compute power of a laptop). Similarly, there are differences across spatial computing systems. For instance, a head-mounted display form factor versus a glasses form factor will have different strengths and weaknesses when it comes to field of view. In short, we can go several levels deeper with our strength/weakness mapping to precisely focus on the strengths and weaknesses of a specific system/device. This adds time and complexity.
But I still want to try the Spatial Translation Framework! How do I address this when I apply it? Take inventory of the different systems (aka devices, surfaces, etc.) you'll be designing for - smartphone, tablet, laptop and so on. Workshop with your team the strengths versus weaknesses of each system relative to the new spatial computing system you're designing for. Map those strengths and weaknesses to your user's tasks in step 3 of the Framework.
Task analysis complexity: What if your user’s goal requires a lot of tasks that aren’t executed in a neat, linear order?
How to address this on your team? Start by prioritizing the most important parts of the user journey, informed by data from design research and analytics.
Design for interoperability, but don't forget redundancy: Even if a user owns both a phone/laptop and a headset, sometimes they won't have both nearby. If you've, say, designed an interface to use a phone as an input for the headset, ensure you also have a way for the user to enter that input in the headset - ideally in a way that suits the form factor (e.g., voice instead of having to type something out). See the sketch below.
How to address this on your team? For interoperability touchpoints, ask: what does the user experience look like if the user only has one device - instead of both - handy?
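As one way to picture that question, here's a minimal sketch of the redundancy pattern, with hypothetical device and input names: prefer the companion device when it's connected, but always keep an in-headset fallback that suits the form factor.

```python
def choose_text_input(phone_connected: bool) -> str:
    """Pick an input method for a text field shown in the headset.

    Interoperability: use the phone keyboard when the companion phone is nearby.
    Redundancy: fall back to an in-headset method (voice dictation here) so the
    flow never dead-ends when the user only has one device handy.
    """
    if phone_connected:
        return "phone_keyboard"
    return "voice_dictation"

# The same flow works whether or not the companion device is around.
print(choose_text_input(phone_connected=True))   # -> phone_keyboard
print(choose_text_input(phone_connected=False))  # -> voice_dictation
```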
Summary
Going back to our original challenges…
When should we adapt existing smartphone and laptop experiences to headsets?
We can use the Spatial Translation Framework to help guide which experiences or features we should consider keeping on a smartphone or laptop, versus porting over to a spatial computing headset.
When we’ve decided to adapt an experience, what specifically should we “spatially translate” onto a headset?
Focus on the task analysis in the Spatial Translation Framework to consider when certain features or interactions maximize the strengths of a spatial computing headset. Use the list of strengths earlier in this post as a guide.
Should we consider building a system that uses BOTH smartphone/laptop and headset for inputs and outputs?
Yes. Our case study shows the potential power of interoperability between existing devices and new spatial computing devices, as seen in the Apple Vision Pro announcement video. While I'm all in on spatial computing, phones and laptops aren't going to disappear overnight, and these devices are best leveraged as an ecosystem, with the user choosing the best device for the task at hand.
Do you have an iPad or mobile app that you’re thinking of bringing to the Vision Pro? We’d love to talk to you! Reach out at hello@sendfull.com