Why Panning and Zooming in a Web App Can't Be Perfect

Apr 11, 2024

12 min read

By: Brian Houser

Developing software is tough. There are complexities and challenges at every twist and turn, especially when creating products that are at the forefront of our industry. That said, some folks on our development team want to start sharing stories about the challenges of developing indoor mapping applications. These stories will dive deep, sometimes into technical labyrinths, and sometimes into how we are solving mapping problems.


I wanted to write up a bit of an exposé about working with trackpad gestures and mouse wheel events in the browser, as well as panning and zooming. I did a bit of a deep dive on this while trying to improve the behavior in Mappedin this week.

I'm going to break this into a couple of parts: An overview of the state of play, and a more practical discussion of browser event handling and limitations (Spoiler: it's difficult to determine whether a trackpad or mouse scroll wheel is being used). I'm going to ignore touchscreen devices in the first part for reasons that will be explained in the second part.

Part 1: The State of Play

A survey of existing applications will reveal that there is a non-uniform set of panning and zooming control choices made by the respective creators of those applications. Some apps enable click-and-drag for panning, and some do not. Sometimes the scroll wheel is assigned to panning, and sometimes to zooming. Many of these differences can be attributed to application-specific needs, and some of the differences are compromises made based on the limitations of the platform.

At the risk of stating the obvious, the simplest case is probably a vertical-only scrolling (panning) document, such as most document-type pages viewed in a web browser. Zooming is possible but not a prominently placed interaction mode. Map viewing applications, on the other hand, are much more interesting because zooming and panning are the fundamental interactions. Map editors could be the most interesting application type because while panning and zooming are just as important as in map viewers, there's also a need to reserve mouse buttons for the editing functions. Infinite-canvas editors (e.g. Figma) fall into this category as well.

Let's take a tour of some specific use cases.

  1. Google Maps: Zoom is accomplished with a mouse wheel scroll or a trackpad pinching gesture. I can left-click and drag the map to pan. On a trackpad, I can also click-and-drag to pan, but if I attempt a two-finger panning gesture, vertical swiping zooms the map (an interesting choice), and if I swipe to the left I'll be surprised by a browser "back" navigation instead of the expected escapade to the Pacific Ocean (depending on your starting location).

  2. Mappedin Enterprise Editor: The enterprise editor (also known as Mappedin CMS) is probably the most advanced indoor map editor on the planet. But even as we consider how a good thing can be improved further, we see that the panning and zooming controls more closely model best practices for a map viewing application than a map editor. It implements almost the same behavior as Google Maps. The price paid is that since left-click is used for panning, most edit commands (e.g., place a node) use right-click. There’s also some inconsistency here: if you're unsuspectingly hovering a path node before you click-and-drag to pan, you're just going to move the node. Saving the day, unlike Google Maps, is that left-swipe on a trackpad will not leave the page.

  3. Figma: Both Figma and FigJam implement panning with middle-click instead of left-click. The initial objection here might be that middle-click for panning on a trackpad is not possible, but Figma implements two-finger panning in both dimensions. To zoom, you hold Ctrl or Meta while either scrolling the mouse wheel or dragging vertically with two fingers. If your device supports pinch gestures, you can use those as well. The black mark on Figma's record is that if one scrolls the mouse wheel without holding Ctrl/Meta, the page pans vertically. Spoiler for Part 2: This is a compromise they made based on browser limitations.

  4. Apple Maps: Apple Maps, like all Apple products, is perfect. For those that haven't experienced Apple Maps' perfection, it may be because Apple Maps is only available as a native application on an Apple device such as a MacBook or iPhone. Panning and zooming can be accomplished as follows: with a mouse, you can click-and-drag to pan, or zoom with the mouse scroll wheel (holding a modifier key is not necessary). With a trackpad, one can do omnidirectional panning by sliding with two fingers. Trackpad zooming is done with a pinch gesture, and lo and behold, you can even perform a two-finger rotation gesture at the same time as performing a pinch gesture to zoom! Unfortunately, Apple is "cheating" (sort of). I'll discuss why as we proceed into Part 2.

Panning and zooming the viewport are fundamental interactions in a map editing tool.

Part 2: Browser Event Handling and Limitations

Since we all agree that the mouse and trackpad panning and zooming behavior implemented in Apple Maps is perfect, let's discuss how we might try to implement that, and what problems we'll run into.

One small change we will make: since we're implementing an editor, we want mouse-panning to be done with middle-click instead of left-click. This doesn't represent a flaw in Apple Maps because Apple Maps is not an editor.

The Mappedin editor is a browser application, so we'll be using browser events. After browsing the documentation, we determined that wheel events are the way to go. Scroll the mouse wheel? Wheel event. Drag on the trackpad? Wheel event. Pinch the trackpad? Also...a wheel event? How do we determine when to pan and when to zoom?

After the panic has subsided and we've explored our options further, we determine that when the user performs a pinch gesture, the browser tells us that the ctrl key is pressed with the event property ctrlKey: true. This is, of course, a lie since the control key is not pressed. But it's a useful lie. We can now determine that a user is pinching the trackpad and wishes to zoom. If that property is not set, then this trackpad user is sliding their fingers in the same direction to pan.
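As a rough illustration of that "useful lie", here is a minimal sketch of branching on ctrlKey. The names WheelLike, WheelIntent, interpretWheel and ZOOM_SENSITIVITY are made up for this example, and the sign convention (negative deltaY zooms in) is an assumption that should be checked against the browsers you target.

```typescript
// Minimal shape of the WheelEvent fields used here, so the sketch also runs
// outside a browser. A real WheelEvent satisfies this interface.
interface WheelLike {
  deltaX: number;
  deltaY: number;
  ctrlKey: boolean;
}

type WheelIntent =
  | { kind: "zoom"; factor: number }
  | { kind: "pan"; dx: number; dy: number };

// Hypothetical tuning constant: how strongly deltaY maps to a zoom factor.
const ZOOM_SENSITIVITY = 0.01;

function interpretWheel(e: WheelLike): WheelIntent {
  if (e.ctrlKey) {
    // A trackpad pinch arrives as a wheel event with ctrlKey set
    // (so does a genuine Ctrl + wheel scroll; the two are indistinguishable here).
    // Assumed convention: negative deltaY zooms in, giving a factor above 1.
    return { kind: "zoom", factor: Math.exp(-e.deltaY * ZOOM_SENSITIVITY) };
  }
  // No ctrlKey: treat the deltas as a pan offset.
  return { kind: "pan", dx: e.deltaX, dy: e.deltaY };
}
```

In a browser this would be called from a wheel listener registered with { passive: false }, so that preventDefault() can stop the browser's own page zoom.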

But they might be a mouse user scrolling a wheel, not a trackpad user doing a swipe gesture! Remember that Apple Maps pans with a two-finger swipe, and zooms with a mouse wheel scroll. Our wheel events for these two cases look exactly the same though.

At this point, a reasonable developer might settle for less than perfection. Let's just ask the user to press control themselves if they wish to zoom! Aspiring to be as perfect as Apple Maps is nonsense anyway. Figma made this compromise, surely they explored all the options…

But wait! Are wheel events the only option? Can't we detect multiple trackpad touches and infer from their relative movement over time what sort of gesture is being performed? There's an MDN guide (as close as it gets to a source of truth for browser behavior) about implementing pinch zoom using pointerdown events and tracking the movement of multiple pointers. Can't we do that with a trackpad?

You would conclude that this will work if you ask ChatGPT how to detect trackpad touches, but ChatGPT is lying. There are no trackpad-unique events that fire when you touch a trackpad. Not for multiple touches, not even for single touches. pointermove will probably fire, but it looks identical to moving a mouse. Regarding browser events, a trackpad is a mouse, and mice produce clicks, not touches.

All of the guides about touch handling refer to touchscreens (I promised in Part 1 that I would talk about touchscreens!). If you think about it, the touch events produced for touchscreens wouldn't make sense if they were emitted for a trackpad. A touch event includes touch coordinates. You can receive coordinates for multiple touches simultaneously. For a trackpad, the first touch could, in theory, produce the coordinates where the mouse cursor was when the first finger touched. For the second or third touch, the coordinates would be ambiguous though, because the relative position of your two fingers cannot be mapped to the screen. You're not touching the screen...even if the illusion while using a touchpad that you are touching the screen is often convincing.

Not to say that a new event type couldn't be implemented for trackpads, providing the ability to implement custom gestures. But today, it doesn't exist. We're starting to come to terms with the situation. The browser events appear to intentionally obscure any difference between a trackpad and a mouse. The events emitted when using a trackpad do not differ from those emitted by a mouse, so our hands are tied.

A corollary here is that touchscreen events do not conflate with mouse events and can be implemented more or less independently, which is great for touchscreens!
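For contrast, here is a minimal sketch of the multi-pointer tracking approach on a touchscreen, where each finger really is a separate pointer. PointerLike and PinchTracker are names invented for this example; the fields mirror the pointerId, clientX, clientY and pointerType properties of a browser PointerEvent. It only tracks the distance between two touches and makes no attempt at rotation or panning.

```typescript
// Subset of PointerEvent used here, so the sketch also runs outside a browser.
interface PointerLike {
  pointerId: number;
  clientX: number;
  clientY: number;
  pointerType: string;
}

class PinchTracker {
  private points = new Map<number, { x: number; y: number }>();
  private lastDistance: number | null = null;

  // Call from a pointerdown handler.
  down(e: PointerLike): void {
    if (e.pointerType !== "touch") return; // mice (and trackpads) are ignored
    this.points.set(e.pointerId, { x: e.clientX, y: e.clientY });
    this.lastDistance = this.distance();
  }

  // Call from a pointermove handler. Returns the scale factor since the
  // previous move, or null when exactly two touches are not active.
  move(e: PointerLike): number | null {
    if (!this.points.has(e.pointerId)) return null;
    this.points.set(e.pointerId, { x: e.clientX, y: e.clientY });
    const current = this.distance();
    const previous = this.lastDistance;
    this.lastDistance = current;
    if (current === null || previous === null || previous === 0) return null;
    return current / previous;
  }

  // Call from pointerup and pointercancel handlers.
  up(e: PointerLike): void {
    this.points.delete(e.pointerId);
    this.lastDistance = this.distance();
  }

  private distance(): number | null {
    if (this.points.size !== 2) return null;
    const [a, b] = Array.from(this.points.values());
    return Math.hypot(a.x - b.x, a.y - b.y);
  }
}
```

A trackpad never reaches the two-touch state in this tracker, because its events carry pointerType "mouse" and only ever describe one pointer.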

But we still can't pan with a trackpad and zoom with a mouse wheel. So a compromise similar to the one Figma made may be our fate. Users will need to hold down a key like ctrl either while panning, or while zooming. But so as to rage against the dying of the light, let us first attempt to scale the cliffs of insanity. Can't we heuristically differentiate trackpad "pan" wheel events from mouse "scrolling" wheel events?


Beginning of the Cliffs of Insanity expedition. Proceed at your own risk!

Our first approach will exploit the polling rate. We notice that trackpads (at least on a MacBook) produce many events that are tightly packed temporally, typically with less than 20ms between events. Let's define a contiguous set of events like this as an "event burst." Event bursts are always trackpads, so we'll pan the map. If we have single events more than 20ms apart, it's a mouse wheel. And if the previous event was part of an event burst, it's also a trackpad.
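One possible reading of this rule, as a minimal sketch. BurstClassifier and BURST_GAP_MS are names invented for this example, and the input is assumed to be the wheel event's timeStamp in milliseconds. Note one gap in this reading: the very first event of a burst has no predecessor to compare against, so it is labeled as a mouse.

```typescript
type Device = "trackpad" | "mouse";

// Threshold from the text: events less than 20ms apart form a burst.
const BURST_GAP_MS = 20;

class BurstClassifier {
  private lastTime: number | null = null;
  private lastInBurst = false;

  // timeStamp: the wheel event's timeStamp, in milliseconds.
  classify(timeStamp: number): Device {
    const gap = this.lastTime === null ? Infinity : timeStamp - this.lastTime;
    const inBurst = gap < BURST_GAP_MS;
    // Trackpad if this event continues a burst, or directly follows a burst event.
    const device: Device = inBurst || this.lastInBurst ? "trackpad" : "mouse";
    this.lastTime = timeStamp;
    this.lastInBurst = inBurst;
    return device;
  }
}
```

Feeding it timestamps 0, 8, 16, 100, 200 yields mouse, trackpad, trackpad, trackpad, mouse. Anything that emits wheel events less than 20ms apart is labeled a trackpad, whatever the actual device.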

This almost works. It breaks down if a mouse user flicks the scroll wheel quickly. The assumption about the "slow" event polling rate on mouse scroll wheels was wrong. The resulting behavior is unacceptable: the mouse wheel normally zooms, but if you zoom too fast it pans vertically, usually by a lot since the user flicked the mouse wheel quickly.

Let's not despair though! The cliffs of insanity were never going to be scaled easily. For our next attempt we'll use polling rate, and we'll also use the wheel event's deltaX property. This indicates horizontal movement. Mouse wheels consistently produce deltaX values of 0; trackpads do not (right?).

We retain our "event burst" detection. Regardless of how our heuristic detection works, it must produce a result very quickly. We cannot perform the wrong choice of panning vs zooming for hundreds of milliseconds before changing course, since that would break user immersion dramatically.

Mouse wheel events are not always slow, but trackpad events are always in fast succession. So if we get an event burst, and one of the first 5 events has a nonzero deltaX value, it's a trackpad and we pan. If the previous event is part of an event burst that was identified as a trackpad, we do not wait for five events; we pan immediately. Otherwise, it is a mouse wheel and we zoom.
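A minimal sketch of this second heuristic, under a few assumptions of mine: the identification is kept per burst and reset when a gap of 20ms or more appears, and an isolated event never counts as a burst. WheelSample, BurstDeltaXClassifier, BURST_GAP_MS and DECISION_WINDOW are names invented for this example.

```typescript
// Subset of WheelEvent used here, so the sketch also runs outside a browser.
interface WheelSample {
  timeStamp: number; // milliseconds
  deltaX: number;
}

type Action = "pan" | "zoom";

const BURST_GAP_MS = 20; // events closer together than this form a burst
const DECISION_WINDOW = 5; // only the first 5 events of a burst are inspected

class BurstDeltaXClassifier {
  private lastTime: number | null = null;
  private eventsInBurst = 0;
  private sawHorizontal = false;

  classify(e: WheelSample): Action {
    const continuesBurst =
      this.lastTime !== null && e.timeStamp - this.lastTime < BURST_GAP_MS;
    if (!continuesBurst) {
      // A new burst starts: forget what was learned about the previous one.
      this.eventsInBurst = 0;
      this.sawHorizontal = false;
    }
    this.lastTime = e.timeStamp;
    this.eventsInBurst += 1;

    // Nonzero deltaX only counts inside the decision window.
    if (this.eventsInBurst <= DECISION_WINDOW && e.deltaX !== 0) {
      this.sawHorizontal = true;
    }
    // Once a burst is identified as a trackpad, every later event pans immediately.
    const isTrackpad = this.sawHorizontal && this.eventsInBurst >= 2;
    return isTrackpad ? "pan" : "zoom";
  }
}
```

A burst whose events all have deltaX of exactly 0 is treated as a mouse wheel by this sketch, no matter how many events it contains.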

This almost works! But unfortunately, it moves the immersion-breaking behavior from the mouse to the trackpad. Turns out it's possible to produce dozens of sequential wheel events on a trackpad with deltaX === 0. MacBook trackpads emulate scrolling momentum with a two-finger swipe or flick by emitting artificial wheel events with sequentially decreasing position deltas, after the swiped fingers are not touching the trackpad anymore. So if a user does a brief vertical flick and produces, for example, three events with no horizontal movement, 20 more events appear that also have no horizontal movement! Physical mouse confirmed! Except we are wrong, and the user zooms out to the maximum when they intend to pan a short distance.


Cliffs of Insanity expedition over

We're reaching the end of our journey. How is Apple "cheating" to make Apple Maps perfect? I basically said it at the beginning, although the relevance wasn't clear at that point: Apple Maps is a native application. They aren't beholden to the limitations of the browser events produced by trackpads vs mice. They can use any information the OS provides them. Beyond being a native application, Apple Maps only runs on Apple devices, so their applications probably have access to hardware identifiers telling them unambiguously that the gesture is from a trackpad. I don't know if this is a secret API or something available to anybody who creates a macOS application, since I didn't research that. My problem is in the browser.

What can we do in the Mappedin editor? We have to make some compromises. Mouse usage is probably the most common interface mode for drawing maps. So what? We'll implement mouse wheel zooming. Two-finger swiping on the trackpad also zooms. To pan with the trackpad, hold down the spacebar and swipe two fingers. We've had the hold-spacebar hotkey for a while to enable left-click drag-panning so this extends existing behavior.
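A minimal sketch of how that compromise could be wired up; this is not the Mappedin editor's actual code. WheelRouter and ViewportCommand are names invented for this example, the key is matched with the KeyboardEvent code value "Space", and letting a pinch (ctrlKey) zoom even while the spacebar is held is my own assumption.

```typescript
// Subset of WheelEvent used here, so the sketch also runs outside a browser.
interface WheelLike {
  deltaX: number;
  deltaY: number;
  ctrlKey: boolean;
}

type ViewportCommand =
  | { kind: "zoom"; deltaY: number }
  | { kind: "pan"; dx: number; dy: number };

class WheelRouter {
  private spaceHeld = false;

  // Call from keydown / keyup handlers with the event's `code`.
  keyDown(code: string): void {
    if (code === "Space") this.spaceHeld = true;
  }

  keyUp(code: string): void {
    if (code === "Space") this.spaceHeld = false;
  }

  // Call from the wheel handler.
  wheel(e: WheelLike): ViewportCommand {
    // Spacebar held: two-finger swipes (and mouse wheel scrolls) pan.
    // Assumption: a pinch, reported with ctrlKey set, still zooms.
    if (this.spaceHeld && !e.ctrlKey) {
      return { kind: "pan", dx: e.deltaX, dy: e.deltaY };
    }
    // Default: every wheel event zooms, whichever device produced it.
    return { kind: "zoom", deltaY: e.deltaY };
  }
}
```

The design choice is that no device detection is attempted at all: the user's modifier key carries the intent that the wheel event cannot.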

Thanks for joining me on this adventure. And please, tell me if I'm wrong. If there's a way to unambiguously differentiate mouse and trackpad wheel events, then we too can build perfect panning and zooming interactions!

Check out our Mappedin self-serve mapping tools and start mapmaking for free! Explore more advanced subscription tools with Mappedin Plus.