Design Critique: Copilot, Microsoft’s AI Chatbot (The Bing App for iOS)

Copilot is the newest iteration of Microsoft’s AI-powered chatbot technology, offering users a conversational interface to assist with productivity, discovery, and creation tasks. To evaluate the user experience, I downloaded the Bing app for iOS and interacted with the Copilot feature myself.

Users’ Goals and the World

To evaluate the user experience of Copilot on the Bing app, we can start by establishing the goals of a user when interacting with this product and the context in which this product exists. Historically, users would interact with a chatbot to connect with customer service or solve a distinct issue. This all changed with advancements in generative AI, in particular the launch of ChatGPT in November 2022. Since then, numerous AI-powered chatbots such as Copilot have emerged to give internet users access to this technology, which functions as a futuristic search engine: one that can not only efficiently locate the information users are seeking but also generate novel content through a conversational, iterative process.

So, what does this all mean? Ultimately, this emerging technology has a broad set of applications that users likely have minimal understanding of or experience with. The details of their goals may vary, but the core objective of users is likely to retrieve information effectively and efficiently, or to generate content, while minimizing cognitive load. A user’s conceptual model may frame this tool as an assistant, a collaborator, or an advanced search engine they can text questions to.

Accessing Copilot with the Bing App

A user flow demonstrating how a user accesses Copilot on their iOS device

Accessing Copilot through the Bing app is a simple process: one only needs a smartphone and an internet connection. Once the app is downloaded from the App Store, users must create an account using their email address. From there, users are afforded the option of selecting the Copilot icon, which is intuitively located at the bottom center of the app. Users are drawn to Copilot through signifiers such as a distinct logo and an eye-catching color scheme that stands in contrast to the other grayed-out options in the tabs menu. After selecting Copilot, appropriate feedback is provided while the feature loads, communicating that the action initiated progress toward the user’s desired state. Once loaded, the previously available tabs disappear and users are ready to interact with Copilot. The experience of accessing Copilot in the Bing app is effective, offering users the necessary affordances to achieve their goals, clear signifiers to direct their actions appropriately, and timely feedback to communicate progress.

Using Copilot to Search with Text

A user flow demonstrating the experience of requesting information with text, highlighting effective examples of signifiers, feedback, and affordances.

The opening page of Copilot makes excellent use of the available space to communicate effectively with users, combining “knowledge in the world” and “knowledge in the head.” The interface’s minimal design balances clean lines with playful colors and symbols, providing an aesthetically pleasing experience that attracts the user at a visceral level. At the behavioral level of processing, the average user would recognize that the interface closely resembles popular messenger apps or search engines, automatically connecting their expectations to the actions they should perform. Paired with clear communication of the tool’s capabilities through useful affordances and signifiers, the interface provides effective discoverability, helping users understand what actions are possible. The mapping of the interface is intuitive and follows cultural constraints, making it easy for users to locate actions and orient themselves on the vertical field, with their requests on the right and the system’s responses on the left. In this interface, it is difficult to make a mistake: discoverability makes determining how to proceed toward your goal simple, and feedback provides a clear understanding of your current state throughout the interaction.

After sending a request, users are given instant feedback in multiple locations indicating the status of the request and the actions the system is taking to respond. There are clear controls offered to the user, such as the option to stop a response, allowing users to change or refine their request. The semantic constraints of Copilot are outlined clearly: a backward arrow to exit, a camera to take a picture, a microphone to speak, and a keyboard to type. As highlighted on the home page, Copilot is subject to making mistakes due to the nature of AI. To help alleviate this problem, in addition to the clear communication that sets expectations with users, each response provides direct links to the sources the AI pulled information from to generate its response, building greater trust and understanding for the user. Users may also provide feedback, stating whether the generated response was satisfactory, which elicits further conversation and iteration until their questions are answered. At the reflective level of processing, this helps users judge the quality of a source and consciously evaluate the information it provides.

Using Copilot to Generate Images

A user flow demonstrating the experience of requesting an image be generated, highlighting examples of output and feedback

After requesting “an image of a cat swimming across the River Thames,” a box appears with flowing colors as the image generates. The feedback here is less effective: the flow of color in the boxes does not convey a clear message or meaning, and I was not sure whether the image had failed to generate or was still in process. Updating the feedback to explicitly state that the image is still generating, and providing an estimated completion time, would help bridge the gulf of evaluation presented here. In total, the image generation took roughly a minute before a picture emerged. Compared to the near-instantaneous written responses, this felt quite long, but not enough for me to consider abandoning my goal. Four variations of the image were generated, and while nearly correct, none of the images depicted a cat “swimming”; instead, the cat was “rafting” across the river. I provided a second request for modifications, which generated a correct image, although decidedly gloomier in tone than I had hoped for. After an additional request to lighten things up, the generated image met my desired goal.

After the final image was generated, I was able to share it via a link, which technically worked but not as seamlessly as I would expect. I wanted to download or share the image the way I share photos in iMessage, because that process is simple and would better integrate the share functionality with my normal communication channels. Although Copilot failed to accurately interpret my initial request for a swimming cat, the interactive process of refining the image was effective, and the incorrect image’s silliness brought delight to my experience. Generating an image with Copilot sparked a sense of flow, where each action taken progressed me toward my goal, which I was ultimately able to reach. If Copilot incorporates these suggestions to improve the discoverability, accuracy, and shareability of this feature, it will surely surprise and delight users.

A Suggestion for Improvement

A mockup of an additional button which opens a pop-up designed to act as an affordance and signifier, allowing users to better understand the GPT-4 feature

One affordance in Copilot that is not sufficiently explained is the ability to enable or disable the use of GPT-4 in your requests. As a user, I tried Copilot with this feature both enabled and disabled and was not able to detect a difference in response quality or experience. I also explored the linked FAQs and found no mention of the pros or cons of enabling GPT-4. The central position of this affordance on the home screen signals a perceived level of importance, but the lack of any noticeable consequence from enabling the feature is unsettling. While the gulf of execution was bridged by a visible affordance (the purple switch), I could not bridge the gulf of evaluation without understanding what this feature does. This left me, as a user, feeling uncertain of my ability to use Copilot properly.

To solve this, I suggest placing a button below the GPT-4 feature which, when pressed, opens a pop-up window within the interface that explains the feature, indicates when to enable or disable it, and offers affordances for seeking further information. This simple solution would bridge the gulf of evaluation and allow users to build a more effective conceptual model. The minimalist design of the button and pop-up window would maintain the integrity of Copilot’s interface, avoiding featuritis and overcomplication of the product. Providing this button would also reduce the likelihood of users experiencing learned helplessness by offering clear guidance on how to use the affordance appropriately.
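To make the proposal concrete, here is a minimal SwiftUI sketch of what such a button and pop-up could look like. It is purely illustrative and not based on the Bing app’s actual implementation: the view names, toggle label, explanatory copy, and FAQ link are my own placeholder assumptions.

import SwiftUI

// Illustrative sketch of the proposed "What does GPT-4 do?" affordance.
// The labels, copy, and FAQ URL below are placeholders, not Microsoft's
// actual implementation.
struct GPT4ToggleRow: View {
    @State private var useGPT4 = true
    @State private var showExplanation = false

    var body: some View {
        VStack(spacing: 8) {
            // Existing affordance: the purple switch described above.
            Toggle("Use GPT-4", isOn: $useGPT4)
                .tint(.purple)

            // Proposed addition: a low-emphasis button signaling that more
            // information is available, without cluttering the interface.
            Button("What does GPT-4 do?") {
                showExplanation = true
            }
            .font(.footnote)
        }
        .padding()
        .sheet(isPresented: $showExplanation) {
            VStack(alignment: .leading, spacing: 12) {
                Text("About GPT-4")
                    .font(.headline)
                Text("Placeholder copy: explain what changes when GPT-4 is enabled, and when a user might want it on or off.")
                Link("Learn more in the FAQ",
                     destination: URL(string: "https://www.bing.com/new")!) // placeholder link
                Spacer()
            }
            .padding()
            .presentationDetents([.medium]) // keeps the pop-up lightweight
        }
    }
}

struct GPT4ToggleRow_Previews: PreviewProvider {
    static var previews: some View {
        GPT4ToggleRow()
    }
}

Keeping the button at footnote size lets it act as a signifier without competing with the toggle itself, and presenting the explanation as a half-height sheet keeps the pop-up lightweight, in line with the minimalist design goal described above.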

Conclusions

Overall, Copilot is a well-designed product that effectively incorporates the principles of good design outlined in Don Norman’s The Design of Everyday Things. To improve, Copilot’s designers should consider addressing a few key issues, such as providing clearer feedback and discoverability in the image generation feature and adding a button that outlines GPT-4’s use cases. Even with its flaws, Copilot offers a best-in-class example of a tool that effortlessly executes extremely complex tasks in an understandable and usable manner.

Sources:

Norman, Donald A. 2013. The Design of Everyday Things. Cambridge, Mass.: MIT Press.

Microsoft. 2023. “Introducing Microsoft 365 Copilot – Your Copilot for Work.” Official Microsoft Blog, March 16, 2023.