Editing While Invisible: An Accessibility Audit of the Wikipedia iOS App

The Problem

Wikipedia runs on volunteer contribution. Millions of people worldwide read, edit, and discuss its content, but according to Wikimedia’s own Product and Technology Advisory Council, only around 12% of successful edits are made on mobile devices, despite 60% of Wikipedia’s traffic coming from mobile. As new audiences arrive primarily through phones, that gap is a problem. And with iOS 26’s Liquid Glass design language introducing new transparency, reduced contrast, and smaller tap targets across the system, the question became urgent: is the Wikipedia iOS app actually accessible to the people it needs to reach?

Our team of four UX design graduate students was commissioned by the Wikimedia Foundation to conduct a comprehensive accessibility audit of the app’s onboarding and editing experience. My focus area was suggested edits, watchlists, and talk pages: the features that turn a passive reader into an active contributor.

Research Process

We structured the audit around WCAG 2.2 at Level AA conformance, testing 38 success criteria across all four WCAG principles: Perceivable, Operable, Understandable, and Robust. Each criterion was tested using the iOS assistive technologies most relevant to it, including VoiceOver, Switch Control, Voice Control, Dynamic Type, Zoom, Color Filters, and Touch Accommodations.

To ground the audit in real user needs, our team developed two personas informed by disability research, Wikimedia user data, and iOS accessibility documentation. I contributed to Persona 1: Rory Brookes, a 25-year-old graduate student with dyslexia and a spinal cord injury who relies on Voice Control, speech-to-text, and system-level text customization. Rory represents a user who is tech-savvy and expects apps to work seamlessly with iOS settings, not to require workarounds. His needs directly shaped what I prioritized in my testing: heading structure, focus behavior, label clarity, and error communication.

For my three assigned flows, I worked through each WCAG criterion methodically, enabling the relevant assistive technology, navigating through the feature, and documenting what I observed. I used the VoiceOver rotor to test heading navigation, Switch Control to test keyboard trapping, Dynamic Type at maximum accessibility size to test reflow and text scaling, Color Filters in greyscale to test color dependency, and Increased Contrast alongside a contrast checker to verify ratios on both text and UI components.

I was completely new to accessibility testing at the start of this project. Learning VoiceOver mid-audit, including a troubleshooting moment where I had to use Siri to reset it after gestures stopped responding, was part of the process. What moved me forward was continuing anyway: testing, observing, questioning whether what I was seeing was a genuine failure or a gap in my own technique, and working through that distinction until my findings made logical sense.

Findings and Recommendations

Across my three flows, I identified several meaningful accessibility failures, ranging in severity from critical to minor.

The most significant finding was in suggested edits (1.4.1, Use of Color, Tier 1). The app uses color alone to distinguish new edits from existing article text. For users with color blindness or low vision, the edit becomes invisible and indistinguishable from the surrounding content. A user cannot review or verify their own changes. The fix is straightforward: supplement color with a redundant visual indicator such as underline, strikethrough, or symbols like + and − to mark additions and deletions.
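To make the recommendation concrete, here is a minimal sketch of marking additions and deletions with symbols rather than color. It is not the app's implementation: the function name `mark_changes` and the `[+ +]` / `[- -]` bracket notation are assumptions chosen for illustration, and the comparison uses Python's standard `difflib` at word level.

```python
import difflib


def mark_changes(old: str, new: str) -> str:
    """Return the new text with word-level edits marked by symbols.

    Additions are wrapped as [+word+] and deletions as [-word-], so the
    change stays visible without relying on color (hypothetical notation).
    """
    old_words = old.split()
    new_words = new.split()
    matcher = difflib.SequenceMatcher(a=old_words, b=new_words, autojunk=False)

    parts = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            parts.extend(old_words[i1:i2])
        # A "replace" is shown as a deletion followed by an addition.
        if tag in ("delete", "replace"):
            parts.append("[-" + " ".join(old_words[i1:i2]) + "-]")
        if tag in ("insert", "replace"):
            parts.append("[+" + " ".join(new_words[j1:j2]) + "+]")
    return " ".join(parts)


print(mark_changes("the cat sat", "the black cat sat"))
```

In a real interface the same idea would more likely appear as underline for additions and strikethrough for deletions, paired with a VoiceOver label that states the change type.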

Focus behavior after submission was inconsistent (3.2.1, On Focus, Tier 2). After submitting an edit, VoiceOver focus did not land on a meaningful element. The screen appeared visually blank and nothing was announced, leaving a user relying on audio feedback with no confirmation that a context change had occurred. Focus should be programmatically set to the first meaningful element, such as a heading or status message, on every screen transition.

Touch targets were undersized in multiple locations (2.5.8, Target Size Minimum, Tier 2). The “Show more formatting options” button in the editing toolbar and the pre-fill suggestion chips on the edit summary screen both failed to meet the 24 by 24 CSS pixel minimum. These are high-frequency interaction points in the editing flow, and small targets here create a disproportionate barrier, particularly for users with motor impairments or tremors.
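A size check like this is easy to script once targets have been measured. The sketch below is a simplified illustration: the `Target` class and the sample dimensions are hypothetical, not measurements recorded in the audit, and it only compares width and height against the 24-unit minimum. It ignores the spacing and inline exceptions that 2.5.8 allows.

```python
from dataclasses import dataclass

MIN_TARGET = 24  # WCAG 2.5.8 minimum per side, in CSS pixels


@dataclass
class Target:
    name: str
    width: float
    height: float


def undersized(targets, minimum=MIN_TARGET):
    """Return the names of targets narrower or shorter than the minimum."""
    return [t.name for t in targets if t.width < minimum or t.height < minimum]


# Hypothetical measurements for illustration only.
sample = [
    Target("Show more formatting options", 20, 20),
    Target("Suggestion chip", 64, 18),
    Target("Publish", 80, 44),
]

print(undersized(sample))
```

A target flagged here is a candidate failure, not a confirmed one: it still needs a manual check against the exceptions in the success criterion.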

Secondary text contrast was insufficient in the article editor (1.4.3, Contrast Minimum, Tier 2). While primary body text met the 4.5:1 ratio, sub-headers, category labels, and metadata text rendered at an estimated 3.4:1, below that minimum. These text elements need to be darkened until they reach at least 4.5:1.
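The 4.5:1 check can be reproduced with the relative luminance and contrast ratio formulas defined in WCAG 2.x. The sketch below is a minimal version of what a contrast checker computes. The function names are my own, and the hex colors in the usage note are generic examples, not colors sampled from the app.

```python
def _linear(channel: int) -> float:
    """Convert an 8-bit sRGB channel to its linear-light value (WCAG 2.x)."""
    c = channel / 255
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(hex_color: str) -> float:
    """Relative luminance of a color given as '#RRGGBB'."""
    h = hex_color.lstrip("#")
    r, g, b = (int(h[i:i + 2], 16) for i in (0, 2, 4))
    return 0.2126 * _linear(r) + 0.7152 * _linear(g) + 0.0722 * _linear(b)


def contrast_ratio(foreground: str, background: str) -> float:
    """Contrast ratio between two colors, from 1.0 up to 21.0."""
    l1 = relative_luminance(foreground)
    l2 = relative_luminance(background)
    lighter, darker = max(l1, l2), min(l1, l2)
    return (lighter + 0.05) / (darker + 0.05)


def passes_aa_normal_text(foreground: str, background: str) -> bool:
    """True if the pair meets the 4.5:1 minimum for normal-size text."""
    return contrast_ratio(foreground, background) >= 4.5
```

For example, `passes_aa_normal_text("#767676", "#FFFFFF")` is true, while the slightly lighter `#777777` on white narrowly fails. That shows how small a shade change can separate a pass from a failure.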

The Reading Preferences (appearance) modal lacked a visible close button (2.1.2, No Keyboard Trap, Tier 2). Without an X or Done button, Switch Control users had no clear way to exit the modal, creating a partial keyboard trap.

On watchlists, most criteria passed. The reading order was logical, color information was supplemented by text, and status messages were announced correctly. One partial failure was noted: the segmented control switching between “User page” and “Talk” on the user profile had visually small tap targets relative to the importance of that navigation element.

For talk pages, my findings aligned with broader patterns identified by my teammates. Status messages after posting a reply were not announced automatically by VoiceOver, and error states like attempting to publish without text surfaced only as a dimmed button with no programmatic explanation of why the action was unavailable.

My prioritized recommendations, consistent with the team’s tiering framework:

Tier 1 (Critical): Add redundant non-color indicators to distinguish edits in the suggested edits flow.

Tier 2 (Important): Fix focus management on screen transitions; increase touch target sizes on the formatting toolbar and suggestion chips; darken secondary text to meet contrast minimums; add a labeled close button to the appearance modal; implement VoiceOver live region announcements for reply submission and watchlist state changes.

Tier 3 (Enhancement): Clarify filter control labels in the watchlist; audit all icon labels across the talk page toolbar for completeness.

Reflection

Coming into this project, I had no prior experience with accessibility testing. I had a theoretical understanding of WCAG from coursework, but I had never actually held a phone with VoiceOver on and tried to use an app. The gap between knowing the standard and knowing what failure actually feels like turned out to be significant.

The most useful thing I did was keep moving when I was not sure. There were moments where I genuinely could not tell whether something was a failure or whether I was using the assistive technology incorrectly. VoiceOver gestures are unintuitive at first, and the line between “this element has no label” and “I navigated past it wrong” is easy to confuse early on. Rather than stopping, I would try the same test a different way, check my findings against the criterion’s intent, and ask whether what I was experiencing would make sense to someone who had no other way to use the app. That question became my anchor throughout the audit.

What I take away professionally is that accessibility is not a checklist you complete after design is finished. Every failure I found was baked into a design decision made earlier in the process: a color choice that seemed fine visually, a button that looked big enough, a success state that was obvious on screen and therefore assumed to be obvious everywhere. Those decisions had been made without a second mode of perception in mind. That is the gap UX designers have a direct role in closing, not by auditing at the end, but by building the question in from the beginning.

This project also gave me a new way to think about the editing experience specifically. The users most likely to be excluded from contributing to Wikipedia are often the ones with the most to offer and the fewest alternative paths to participation. That framing will stay with me.