
Deep Dive into Manipulation on visionOS

Taking the next step toward a native input system for Apple Vision Pro.

Background

When Apple unveiled Apple Vision Pro at WWDC 2023, they gave us a glimpse of how input would work. Eye tracking + simple hand gestures like pinch would stand as the foundation of input on visionOS. Developers could extend existing SwiftUI gestures to make 3D content interactive and playful. Tap Gesture was the first stop for most. Developers then learned how to adapt drag, magnify, and rotation gestures to work with RealityKit. We could even combine gestures to create complex interactions.

To learn more about using SwiftUI gestures in RealityKit, see our full series on the Learn visionOS page.

With these tools, developers could build interactions that covered many use cases, while supporting windows, volumes, and spaces. The downside soon became apparent. With so many ways to combine and tailor system gestures, developers quickly came up with a ton of different approaches. Users often had to learn how to use gestures on a per-app basis.

Eye tracking + pinch to tap was the first native input paradigm on Apple Vision Pro. Manipulation feels like the second. Available as a RealityKit Component and a SwiftUI modifier, Manipulation standardizes several complex interactions. Most of this article will focus on the RealityKit side of things, but we’ll have some SwiftUI examples as well.

Starting with visionOS 26, we can add Manipulation Component to any entity.

entity.components.set(ManipulationComponent())

We get a lot of functionality with this line of code.

  • Pinch and hold to start moving an entity.
  • Movement telescopes along the user’s z-axis, meaning they can easily pick up something far away and quickly draw it close.
  • Users can rotate an entity simply by rotating their hand.
  • Users can hand off the entity from one hand to the other.
  • Using two hands at the same time:
    • Users can scale the entity by changing the distance between their hands.
    • Users can rotate an entity around an axis by moving their hands relative to one another.
  • visionOS provides subtle audio cues to the user.
  • When the user releases their hand(s) the entity will return to the original position, orientation, and scale.

Does that sound a lot like drag, magnify, and rotation gestures? This component combines those and more. To get the same functionality with system gestures, we would need to do something like this (see the sketch after the list):

  • Cache the transform of the entity before interaction
  • Create drag, magnify, and rotation gestures, adding gesture data to the initial transform
  • Combine them together using sequenced or simultaneous gestures
  • Target the gesture to an entity, component type, or all entities
  • Clean up the cached transform state on gesture end
  • Call the gesture on the RealityView
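Here’s a rough sketch of that checklist. This is a simplified illustration, not project code: it assumes the entity’s parent coincides with the scene coordinate space and that the entity already has Input Target and Collision Components.

import SwiftUI
import RealityKit

struct ManualManipulationView: View {
    // Cache the transform of the entity before interaction
    @State private var startTransform: Transform?

    var body: some View {
        RealityView { content in
            // Scene setup elided
        }
        .gesture(
            DragGesture()
                .simultaneously(with: MagnifyGesture())
                .simultaneously(with: RotateGesture3D())
                .targetedToAnyEntity()
                .onChanged { value in
                    let entity = value.entity
                    let start = startTransform ?? entity.transform
                    startTransform = start
                    var transform = start

                    // Drag: convert the 3D translation into scene space
                    // (assumes the entity's parent matches scene space)
                    if let drag = value.gestureValue.first?.first {
                        transform.translation = start.translation
                            + value.convert(drag.translation3D, from: .local, to: .scene)
                    }
                    // Magnify: scale relative to the starting scale
                    if let magnify = value.gestureValue.first?.second {
                        transform.scale = start.scale * Float(magnify.magnification)
                    }
                    // Rotate: compose the gesture rotation with the start rotation
                    if let rotate = value.gestureValue.second {
                        let q = rotate.rotation.quaternion
                        transform.rotation = simd_quatf(
                            ix: Float(q.imag.x), iy: Float(q.imag.y),
                            iz: Float(q.imag.z), r: Float(q.real)
                        ) * start.rotation
                    }
                    entity.transform = transform
                }
                .onEnded { _ in
                    // Clean up the cached transform state on gesture end
                    startTransform = nil
                }
        )
    }
}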

All of this, and our code for interacting with entities still lives outside the scope of RealityKit. It works, but now we have something better.

Aside: the system gestures are still supported and may often be the best solution to a problem. For example, if all you need is dragging, then a Drag Gesture is still a good idea.

Manipulation also feels and behaves in a way that most users will find more intuitive. For example, consider a simple Drag Gesture. A user pinches and holds to start dragging an entity. Things look good until the user decides to turn and face another direction during the drag. Unless the developer has done their homework, most of the time the drag gesture movements will be relative to a direction the user is no longer facing. This is confusing at best. Manipulation solves this complex problem with no work at all from us.

Read more about complex dragging on the Apple Developer Forums.

Get Started with Manipulation Component

We saw how to use the default implementation of the component above. We also have a handful of behaviors we can customize.

  • releaseBehavior: reset or stay at the new transform
  • dynamics.translationBehavior lets us specify if the gesture should move the entity
  • dynamics.primaryRotationBehavior lets us specify if the entity should rotate with the dragging hand
  • dynamics.secondaryRotationBehavior lets us specify if the entity should rotate with two hands
  • dynamics.scalingBehavior lets us specify if the gesture should scale the entity
  • dynamics.inertia lets us customize the amount of inertia, related to the target’s mass

For example, if we want the entity to stay at its new location when the gesture is released:

let subject = Entity()
var mc = ManipulationComponent()
mc.releaseBehavior = .stay // instead of the default .reset
subject.components.set(mc)

Or say we want to disable the single-hand rotation feature:

let subject = Entity()
var mc = ManipulationComponent()
mc.dynamics.primaryRotationBehavior = .none
subject.components.set(mc)

Just like when using system gestures, we need to make sure our entities are interactive. We can do this in one of two ways.

  1. Assign Input Target and Collision Components to each interactive entity. A Hover Effect Component is not required, but recommended.
  2. Use ManipulationComponent.configureEntity to set up an entity.

It can often be easier to add Collision Components in Reality Composer Pro, where the shapes are calculated for us automatically. We can add the components in code too.

let subject = Entity()
subject.components.set(CollisionComponent(shapes: [.generateSphere(radius: 0.1)]))
subject.components.set(InputTargetComponent())
subject.components.set(HoverEffectComponent())

We can do the same thing with configureEntity. This will set up the subject with a small sphere collision. It will assign the Input Target Component and the default Hover Effect.

let subject = Entity()
ManipulationComponent.configureEntity(subject, collisionShapes: [.generateSphere(radius: 0.1)])

These are both valid ways to set up an entity for manipulation.

Tip: we can use the allowedInputTypes option on InputTargetComponent to specify whether our entity responds to direct (touch) input, indirect (remote) input, or both.
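For example, a small sketch restricting an entity to indirect input only:

let subject = Entity()
// Only indirect (look-and-pinch) input will target this entity
subject.components.set(InputTargetComponent(allowedInputTypes: .indirect))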

Read Getting started with Manipulation Component for a full breakdown. You can try it yourself by downloading the project and running Example 087.

Manipulation Events

There are several Manipulation Events we can use. We can execute code on begin, handoff, and end. We can even run code for each change to the entity transform.

  • WillBegin fires when manipulation starts. It’s a good place for changes like switching from dynamic to kinematic physics, hiding a hover effect, or deactivating children.
  • DidUpdateTransform can be useful for constraining the transform of the entity. For example, don’t allow an entity to move out of a fixed area or disallow rotation on an axis.
  • DidHandOff can be useful if we need to apply hand-specific offsets to the entity.
  • WillRelease is fired just before manipulation ends. This can be a good place to reset visual effects. We use this to re-add the hover effect.
  • WillEnd is fired at the end. The gesture is no longer updating the entity. This can be useful to reset physics or save transform data in app state.

Begin and End Example:

willBegin = content.subscribe(to: ManipulationEvents.WillBegin.self) { event in
  print("picked up \(event.entity.name)")
}

willEnd = content.subscribe(to: ManipulationEvents.WillEnd.self) { event in
  print("dropped \(event.entity.name)")
}

We’ll see some use cases for events in the Showcase section later in this article.

Read Using events with Manipulation Component to explore these events. You can try these events yourself by downloading the project and running Example 088.

Audio Feedback

Manipulation comes with some built-in audio feedback. Most of the time, we would do well to leave this alone. But if we want this feedback to blend in with the soundscape of our app or space, we have options. We can deactivate the system sounds.

let subject = Entity()
var mc = ManipulationComponent()
mc.audioConfiguration = .none // turn off the default sounds
subject.components.set(mc)

Then we can use Manipulation Events to play our own sounds, building on the Begin and End examples from above.

willBegin = content.subscribe(to: ManipulationEvents.WillBegin.self) { event in
  print("picked up \(event.entity.name)")
  event.entity.playAudio(pickUpSound)
}

willEnd = content.subscribe(to: ManipulationEvents.WillEnd.self) { event in
  print("dropped \(event.entity.name)")
  event.entity.playAudio(dropSound)
}

We can play audio using standard OS or RealityKit features. When we need more immersion, we can set up our entities to play sounds spatially.
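Here’s a minimal sketch of the spatial option. The "PickUp.wav" file name and the gain value are our own assumptions, not from the project.

if let pickUpSound = try? AudioFileResource.load(named: "PickUp.wav") {
  // SpatialAudioComponent makes playback emanate from the entity's position
  subject.components.set(SpatialAudioComponent(gain: -5))
  subject.playAudio(pickUpSound)
}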

Read Using custom sounds with Manipulation Component to see a full example. You can trigger these sounds yourself by downloading the project and running Example 089.

Redirecting Input

One of the more powerful features of Manipulation is the ability to capture input on one entity, but send the results to another. Let’s see an example.

We can set up the subject as normal. This is the entity that will move in our scene.

let subject = createStepDemoBox()
let mc = ManipulationComponent()
subject.components.set(mc)

Then we’ll create the entity that we will interact with. Any manipulation on this entity will only impact the subject.

let delegate = createStepDemoBox()
let hitTargetComponent = ManipulationComponent.HitTarget(redirectedEntity: subject)
delegate.components.set(hitTargetComponent)

This simple redirection system opens up some interesting possibilities. For example, consider moving a pin on a minimap. The pin would stay in the same place, but the larger content of the world could move around the user.

Read Redirect input with Manipulation Component to see a full example. You can try this yourself by downloading the project and running Example 090.

Reading Input Data

Manipulation Component can surface some additional data, such as chirality (left or right hand) and kind (input type). For example, during the Begin event, we can check the inputDeviceSet for an InputDevice.

willBegin = content.subscribe(to: ManipulationEvents.WillBegin.self) { event in
  guard let inputDevice = event.inputDeviceSet.first else { return }

  let chirality = inputDevice.chirality
  let kind = inputDevice.kind
  print("picked up \(event.entity.name) with the \(chirality) hand via \(kind)")
}

This can be useful when we need custom behavior. Take a chess board for example. When the input kind is direct, we could allow movement on the board. When input is indirect, we could allow the user to pick up and inspect the piece.
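Here’s a sketch of that chessboard idea, branching on the input kind inside the Begin event. The inspectPiece and movePieceOnBoard helpers are hypothetical stand-ins for app logic.

willBegin = content.subscribe(to: ManipulationEvents.WillBegin.self) { event in
  guard let inputDevice = event.inputDeviceSet.first else { return }
  if inputDevice.kind == .indirectPinch {
    // Remote pinch: let the user pick up and inspect the piece
    inspectPiece(event.entity)
  } else {
    // Direct touch: keep movement constrained to the board
    movePieceOnBoard(event.entity)
  }
}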

See Reading input data from Manipulation Component for more information. You can try this yourself by downloading the project and running Example 104.

Manipulation Modifiers

On the SwiftUI side, we can use many of the same features described above. We can add manipulable() to a Model3D view.

Model3D(named: "Earth")
  .manipulable()

Note: we can apply this modifier to any view, but it seems intended for the Model3D view. All other views are clipped by window/volume bounds, whereas we can pull 3D models out of their parent scene.

Learn more about Manipulation Modifiers in our Spatial SwiftUI section.

Showcase

Now that we’ve covered the basics, let’s see some real world use cases for Manipulation.

Showcase 1: Jimmy Balladares created this Magic 8-Ball demo using two of the events we described above. Let’s look at the video demo and let Jimmy explain how this works.

This demo uses Speech to Text and Apple’s Foundation Models framework to answer questions in the style of a Magic 8-Ball. You can either tap the microphone to start the session, or you can grab the 8-ball and speak to it, and this is where manipulation events kick in.

A user can tap and grab the 8-ball model, and the system listens for the ManipulationEvents.WillBegin event that fires at the beginning of the manipulation gesture. As soon as this event occurs, the system triggers the speech-to-text session to start recording the user’s question. When the 8-ball is released, ManipulationEvents.WillEnd fires, stopping the recording and processing the captured audio.

The WillBegin and WillEnd events provide precise lifecycle hooks that create a connection between gesture and voice input.

// Setup manipulation component and event listeners
@MainActor 
private func loadEightBallEntity(content: RealityViewContent) async {
    do {
        let entity = try await Entity(named: "EightBall", in: realityKitContentBundle)
        
        // Configure entity for manipulation
        ManipulationComponent.configureEntity(entity)
        
        content.add(entity)
        
        // Subscribe to manipulation lifecycle events
        willBeginSubscription = content.subscribe(
            to: ManipulationEvents.WillBegin.self,
            on: entity
        ) { _ in
            // Start speech recognition when user grabs the 8-ball
            speechManager.startRecording()
        }

        willEndSubscription = content.subscribe(
            to: ManipulationEvents.WillEnd.self,
            on: entity
        ) { _ in
            // Stop recording when user releases the 8-ball
            speechManager.stopRecording()
        }
        
    } catch {
        print("Failed to load entity: \(error)")
    }
}

Showcase 2: We created a simple Capsule Catch game in Lab 086. Users have to try to catch these capsules before they hit the ground. We use the Begin event again here, this time to update game state.

// We'll use this event to determine success
self.willBegin = content.subscribe(to: ManipulationEvents.WillBegin.self) { event in
    if gameModel.lastInteraction == event.entity.name {
        return
    }
    if gameModel.caughtCapsules.contains(where: { $0.name == event.entity.name }) {
        return
    }
    gameModel.addScore()
    gameModel.caughtCapsules.append(event.entity)
    gameModel.scheduleCapsuleActivation()
    gameModel.lastInteraction = event.entity.name
    print("Success!")
}

Watch the demo and explore the code in the lab.

Showcase 3: Project Graveyard interaction mode. Joseph replaced nearly all system gestures in this project with Manipulation Component. Not only did this component replace the gestures, it unlocked new features and made movements more precise and intuitive. See the video for a breakdown.

This app has three interaction modes.

Display Mode: users can pick up and inspect items. A Billboard Component is applied to make text easier to read. When the user releases, the item snaps back to where it started. We use some events to support this mode.

private func onManipulationWillBegin(_ event: ManipulationEvents.WillBegin) {
    guard appModel.interactionMode == .display else { return }
    let usesIndirectPinch = event.inputDeviceSet.contains { $0.kind == .indirectPinch }
    guard usesIndirectPinch else { return }

    if event.entity.components[StoneComponent.self]?.modelName == "Stone01" {
        event.entity.components.set(BillboardComponent())
    }
}
private func onManipulationWillRelease(_ event: ManipulationEvents.WillRelease) {
    guard appModel.interactionMode == .display else { return }
    event.entity.components.remove(BillboardComponent.self)
}

Arrange Mode: users can drag items along the ground, but only within the bounds of the graveyard. We do this with ManipulationEvents.DidUpdateTransform.

private func onManipulationDidUpdateTransform(_ event: ManipulationEvents.DidUpdateTransform) {
    guard appModel.interactionMode == .arrange else { return }
    let limit: Float = 4
    let newPos   = Helpers3D.constrainPosition(event.entity.position, limit: limit)
    let newRot   = Helpers3D.constrainRotationToYAxis(event.entity.transform.rotation)
    let newScale: SIMD3<Float>
    if event.entity.components[StoneComponent.self]?.modelName == "Lamp01" {
        newScale = Helpers3D.constrainScale(event.entity.scale.x, minScale: 0.75, maxScale: 1.75)
    } else {
        newScale = Helpers3D.constrainScale(event.entity.scale.x, minScale: 0.25, maxScale: 1.25)
    }
    event.entity.transform = Transform(scale: newScale, rotation: newRot, translation: newPos)
}

When the manipulation has ended, we save the new transform to the SwiftData store.

private func onManipulationWillEnd(_ event: ManipulationEvents.WillEnd) {
    guard appModel.interactionMode == .arrange else { return }
    if let id = UUID(uuidString: event.entity.name),
       let item = try? context.fetch(FetchDescriptor<Item>(predicate: #Predicate { $0.id == id })).first {
       
       // Ignore my sloppy SwiftData code
        item.posX = event.entity.position.x
        item.posY = 0
        item.posZ = event.entity.position.z

        let rot = event.entity.transform.rotation
        let saveY = rot.axis.y == 1.0 ? Float(rot.angle) : Float(-rot.angle)
        item.rotY = saveY
        item.scaler = event.entity.scale.x

        Task { @MainActor in
            try? context.save()
        }
    }
}

Edit Mode: this mode removes Manipulation and adds a Tap Gesture. Users can tap an item to open an Options menu where they can edit and customize it.

We switch between these three modes with a static function. This configures and assigns the components needed for each mode.

static func applyMode(_ mode: InteractionMode, to entity: Entity) {
    switch mode {
    case .display:
        var mc = ManipulationComponent()
        mc.dynamics.primaryRotationBehavior = .none
        mc.dynamics.secondaryRotationBehavior = .none
        entity.components.set(mc)
        entity.components.remove(GestureComponent.self)

    case .arrange:
        var mc = ManipulationComponent()
        mc.releaseBehavior = .stay
        mc.dynamics.primaryRotationBehavior = .none
        entity.components.set(mc)
        entity.components.remove(GestureComponent.self)

    case .edit:
        // Replace manipulation with a tap gesture to toggle popover
        let tapGesture = TapGesture()
            .onEnded({ [weak entity] _ in
                if let popoverAnchor = entity?.findEntity(named: "PopoverAnchor") {
                    popoverAnchor.components[PresentationComponent.self]?.isPresented.toggle()
                }
            })
        let gestureComponent = GestureComponent(tapGesture)
        entity.components.set(gestureComponent)
        entity.components.remove(ManipulationComponent.self)
    }
}

Known Issues

Sometimes, translation (that is, movement) may stop working. When this happens, it seems to impact all apps that use Manipulation Component. Users must restart the device to fix the issue.

That wraps up our article on Manipulation for visionOS 26. Let us know if you have any questions or if there is anything we didn’t cover. Comment below to tell us how you’re using Manipulation in your apps.
