Case Study 01 · 2024

Tone of Voice App

Usability research on an AI brand voice generator, from Notion prototype to standalone product launch.

Client · toneofvoice.app
Sector · AI / branding
Year · 2024
Method · Moderated usability
Sample · 6 participants
Role · Sole researcher

§ 01 · Introduction

Tone of Voice App is a tool for startups, small businesses and marketing teams to establish their brand tone of voice and in-house writing style. The app uses AI to create a tailored brand voice and writing guidelines in minutes, instead of days or weeks.

The tool was created by Tahi Gichigi, a UX writer and tone-of-voice expert with 15 years of industry experience. He had built the tool as a Notion template and needed answers to three questions before committing to a standalone product: could people navigate the tool independently, did the generated content feel useful and specific enough, and where was friction slowing people down or causing them to disengage?

I was brought in to plan and lead the usability research end to end, from the research plan to the final insights, so the founder had evidence-based direction for the product roadmap. I facilitated all six in-person usability sessions, analysed the qualitative data, identified key themes, and delivered findings with actionable recommendations. Because we were drawing on the founder's network, we agreed he would handle participant recruitment himself.

§ 02 · User needs

As a startup founder or marketing lead without a dedicated brand team
I want to quickly generate a brand voice and style guide tailored to my business
so that I can communicate consistently across all channels without hiring a specialist

§ 03 · Research approach

Planning and method selection

I chose 1:1 moderated usability testing after weighing the alternatives.

Moderated sessions let me observe, probe, and intervene in real time. That mattered because each participant interacted with the Notion interface differently, and confusion often surfaced in ways I couldn't have predicted from the tool's design alone.

I worked closely with the founder on the testing plan, covering objectives, methodology, timelines, and participant incentives. Each session was planned for approximately 45 minutes: a 5-minute intro, 10 minutes of context questions, 20 minutes of usability testing, and a 10-minute debrief. Participants were offered free access to the tool and a coffee as incentives.

Hypotheses

I had four working hypotheses going into the sessions:

  1. Users would struggle to know how much to input. The open text fields gave no guidance on expected detail. Without prompts or examples, participants would under-provide information, leading to generic output.
  2. The AI-generated output would feel too generic. If users entered minimal context, the AI wouldn't have enough to work with. Output quality was directly tied to input quality, and the interface did nothing to encourage depth.
  3. The Notion editing model would cause confusion. Everything on the page was editable, including headings, instructions, and AI prompts. Users unfamiliar with Notion would interpret the cursor as an invitation to type, leading to accidental edits or deletions.
  4. The brand voice concept would resonate more than the style guide. Brand voice is tangible and immediately useful. A style guide is a reference document, and we expected users to see less value in rules they might never revisit.

Recruitment

Participants were recruited from the founder's network through targeted outreach, screening for founders, entrepreneurs, and marketers who needed a brand voice but hadn't yet formalised one.

Six participants were recruited. The sample included a brand consultant running her own agency, an employee at a luxury travel company, a campaign manager at a social selling startup, a head of property at a relocation company, a branding agency founder, and an employee at a sustainable AI infrastructure startup. Company sizes ranged from solo operators to teams of 35. Most had no existing brand voice documentation.


Limitations

Six participants is a small sample. The founder's network was our only recruitment channel, which limited diversity and introduced potential selection bias. All participants were based in Europe. Findings should be read with these constraints in mind, though the consistency of the usability issues across all six sessions gave us confidence in the patterns.

Discussion guide

I wrote the discussion guide in three parts, each designed to build understanding progressively:

Part 1 · Scripted introduction. Session purpose; think-aloud instructions; reassurance that we were testing the tool, not the participant; consent to record; and anonymity of responses. The "we're testing the tool, not you" framing was especially important here because several participants had never done usability testing before and needed permission to be critical.

Part 2 · Context questions. I structured these to establish each participant's mental model before they saw the tool. I asked what "brand voice" meant to them, how important brand consistency was to their business, and whether they used any tools for brand management. Crucially, I also asked whether they would use an AI tool for brand voice before revealing that the tool was AI-powered. This avoided priming and gave me honest baseline attitudes.

I also asked about familiarity with Notion, which turned out to be a significant factor. Participants who weren't Notion users had the hardest time with the editable interface.

Part 3 · Debrief. Perceived value, desirability, and open questions. I included pricing questions ("What would you expect this to cost?" and "Would you pay for this?") to capture willingness-to-pay signals alongside the usability data. This layered product-market-fit research into the usability study without adding session time.

Facilitation

All six sessions were conducted in person, each running 30 to 45 minutes. I facilitated every session, recording and transcribing each one with Otter.

Every session opened with a verbal informed consent process covering the purpose of the research, confidentiality, consent to record, the right to withdraw at any time, and my independence as an external researcher. Without that framing, participants might have held back criticism of a tool built by someone in their network.

During the task-based walkthrough, I had to intervene in several sessions to prevent participants from corrupting the prototype. In one case, a participant started typing brand voice examples directly into the AI prompt field, which would have broken the generation. In another, a participant almost overwrote a section heading. These interventions were necessary to keep the session running, but they also became some of the strongest evidence for the editability findings.

Analysis

After completing all six sessions, I reviewed each recording and my session notes systematically, identifying patterns that recurred across multiple participants. I tracked each instance of confusion, hesitation, or misinterpretation, then grouped these into themes based on where they occurred in the flow and what they revealed about user expectations.

Ten findings emerged consistently. I cross-referenced these against my hypotheses to separate confirmed assumptions from new discoveries. Two findings I hadn't anticipated (the "new user type" insight and the export expectation) turned out to be among the most strategically valuable.

§ 04 · Findings

Users saw the value immediately, but the interface got in the way.

The concept landed, the execution didn't. Ten findings emerged across six sessions:

  1. Most participants said they would pay for the tool. Even those who found the current version too rough saw the potential, validating the core proposition and giving the founder confidence to invest in a standalone build.
  2. Agency users emerged as a new user type. Two participants, both running agencies, independently said they'd use the tool to generate brand voices for their clients, not just for themselves: feeding the output into ChatGPT for client content or generating Instagram captions in the brand voice. This was a use case the founder hadn't designed for, and it signalled a potential agency market.
  3. Most participants were not interested in the style guide. It was described as something they'd "only look at if they came to write something" and "too generic." Some found specific sections valuable, particularly capitalisation rules and active vs passive voice guidance, but the value was buried under less distinctive content.
  4. Users were unsure how much information to provide. Participants routinely under-filled input fields, skipping optional sections or entering just one line. When input was thin, output suffered, creating a cycle where users blamed the tool for being generic rather than recognising the connection to their own input.
  5. Generated content felt too broad to be actionable. Brand values felt duplicated, industry-specific language was missing, and output defaulted to British English despite US-focused input. The content was described as "common sense and too basic." The root cause was insufficient input, but the tool did nothing to signal that more detail would produce better results.
  6. The editable Notion interface caused confusion. Every element on the page was editable, including headings and instructions. Participants assumed the cursor meant they should type, leading to accidental edits and erasure of input fields. Some couldn't tell where to input information at all.
  7. Exposed AI prompts created friction. AI instructions were visible on the page. Participants tried to interact with them, and one started typing directly into the prompt field, requiring me to intervene before she corrupted the input. The prompt engineering was described as "very obvious." Users couldn't tell what was for them and what wasn't.
  8. Users couldn't tell where input ended and output began. The interface was described as "a wall of text." Without clear signposting between stages, participants were unsure whether they had finished or whether more action was needed.
  9. Users didn't know what to do with the generated content. Participants wanted to generate Instagram captions, run existing content through the tool, or copy the output into ChatGPT. The tool generated the brand voice but didn't help users apply it.
  10. Almost every participant wanted a branded PDF export. Users wanted something they could present to their team. The export feature actually existed within Notion, but the interface didn't surface it, so participants assumed it wasn't available.

§ 05 · Recommendations

I delivered recommendations in three tiers based on effort and impact.

Tier 1 · High impact, low effort

Clarify input expectations and reinforce usefulness.

I recommended adding structured prompts, helper text, and placeholder examples so users understand what to provide and how much detail is enough. The field label "What you do" should be reframed as "What does your company do?", since multiple participants interpreted it as asking about their personal role. Richer, better-guided input would give the AI more context and directly reduce the generic-output problem.

I also recommended adding confirmation messaging and concrete next steps after content is generated (e.g. "Use your brand voice to brief your copywriter" or "Share this style guide with your team so you have consistency across all communications"). Users needed to see not just what had been generated, but how to use it.

Tier 2 · Medium effort, high value

Surface the export and add custom designs.

This required more design work but would meaningfully improve the output experience. The branded PDF export needed to be prominent and unmissable, not hidden behind Notion's native export. Users wanted something they could present to a team, so I recommended adding custom design options to the PDF to make the output feel like a professional deliverable rather than raw text.

Tier 3 · Strategic enhancement

Move to a standalone product.

The Notion editing model was the root cause of three separate findings: editable-fields confusion, exposed AI prompts, and the unclear boundary between input and output. Because everything in Notion is editable, users accidentally deleted parts of the UI. Because AI prompts were visible on the page, users tried to interact with them. I recommended building the tool as a standalone app with form-style inputs, hidden AI prompts, and clear visual separation between stages. This single architectural change would resolve the majority of the usability issues.

§ 06 · Deliverables

§ 07 · Impact

Research outcomes

The research gave the founder clear, evidence-based direction for the product roadmap.

What was adopted

The founder built the standalone product at toneofvoice.app, adopting several key recommendations from the research.

Broader outcomes

The original Notion template has been downloaded over 100 times. The standalone app launched with a clear product direction shaped by the research, and its agency pricing tier validated the strategic finding that emerged from just two participants during testing.

§ 08 · Reflections

What worked well. The moderated in-person format was the right call. Several of the strongest findings came from moments where I had to intervene, such as when a participant started typing into the AI prompt field, or when another almost deleted a section heading. An unmoderated study would have captured task failure but missed the why. Conducting all sessions in person also meant I could observe body language and hesitation that wouldn't have come through on a remote call.

What I'd do differently. I'd push for a larger and more diverse sample. Six participants from the founder's network gave us consistent usability patterns, but a broader pool would have strengthened the findings and reduced selection bias. I'd also run a short post-session survey to capture quantitative satisfaction data alongside the qualitative observations.

What I learned. The agency use case was the most strategically valuable finding, and it came from just two participants. It reshaped the product's pricing model and market positioning. Sometimes the most useful insight is one you didn't go looking for. I also learned how much the underlying platform shapes the research: many of the usability issues were Notion-specific rather than concept-specific, and separating the two required careful analysis.


