§ 01 Introduction
Tone of Voice App is a tool for startups, small businesses, and marketing teams to establish their brand tone of voice and in-house writing style. The app uses AI to create a tailored brand voice and writing guidelines in minutes, instead of days or weeks.
The tool was created by Tahi Gichigi, a UX writer and tone-of-voice expert with 15 years of industry experience. He had built the tool as a Notion template and needed answers to three questions before committing to a standalone product: could people navigate the tool independently, did the generated content feel useful and specific enough, and where was friction slowing people down or causing them to disengage?
I was brought in to plan and lead the usability research end to end, from the research plan to the final insights, so the founder had evidence-based direction for the product roadmap. I facilitated all six in-person usability sessions, analysed the qualitative data, identified key themes, and delivered findings and actionable recommendations. Since recruitment drew on the founder's network, we agreed he would handle participant outreach.
§ 02 User needs
I want to quickly generate a brand voice and style guide tailored to my business
so that I can communicate consistently across all channels without hiring a specialist
§ 03 Research approach
Planning and method selection
I chose 1:1 moderated usability testing after ruling out two alternatives:
- Surveys would have given broader reach, but the tool was a new concept with nuanced interaction patterns. We needed to observe real behaviour, not self-reported impressions. A survey wouldn't reveal why users were getting confused.
- Unmoderated testing wasn't appropriate because the Notion prototype had quirks that required real-time management. The editable interface meant participants could accidentally delete parts of the UI, and the visible AI prompts caused confusion that needed in-the-moment intervention. Without a moderator, sessions would have broken down.
Moderated sessions let me observe, probe, and intervene in real time. That mattered because each participant interacted with the Notion interface differently, and confusion often surfaced in ways I couldn't have predicted from the tool's design alone.
I worked closely with the founder on the testing plan, covering testing objectives, research methodology, timelines, and participant incentives. Each session was planned for approximately 45 minutes: a 5-minute intro, 10 minutes of context questions, 20 minutes of usability testing, and a 10-minute debrief. Participants were offered free access to the tool and a free coffee as incentives.
Hypotheses
I had four working hypotheses going into the sessions:
- Users would struggle to know how much to input. The open text fields gave no guidance on expected detail. Without prompts or examples, participants would under-provide information, leading to generic output.
- The AI-generated output would feel too generic. If users entered minimal context, the AI wouldn't have enough to work with. Output quality was directly tied to input quality, and the interface did nothing to encourage depth.
- The Notion editing model would cause confusion. Everything on the page was editable, including headings, instructions, and AI prompts. Users unfamiliar with Notion would interpret the cursor as an invitation to type, leading to accidental edits or deletions.
- The brand voice concept would resonate more than the style guide. Brand voice is tangible and immediately useful. A style guide is a reference document, and we expected users to see less value in rules they might never revisit.
Recruitment
Participants were recruited through targeted outreach within the founder's network, screening for founders, entrepreneurs, and marketers who needed a brand voice but hadn't yet formalised one.
Six participants were recruited. The sample included a brand consultant running her own agency, an employee at a luxury travel company, a campaign manager at a social selling startup, a head of property at a relocation company, a branding agency founder, and an employee at a sustainable AI infrastructure startup. Company sizes ranged from solo operators to teams of 35. Most had no existing brand voice documentation.
All six sessions were conducted in person.
Limitations
Six participants is a small sample. The founder's network was our only recruitment channel, which limited diversity and introduced potential selection bias. All participants were based in Europe. Findings should be read with these constraints in mind, though the consistency of the usability issues across all six sessions gave us confidence in the patterns.
Discussion guide
I wrote the discussion guide in three parts, each designed to build understanding progressively:
Part 1 · Scripted introduction. Session purpose, think-aloud instructions, reassurance that we were testing the tool, not the participant, consent to record, and anonymity of responses. The "we're testing the tool, not you" framing was especially important here because several participants had never done usability testing before and needed permission to be critical.
Part 2 · Context questions. I structured these to establish each participant's mental model before they saw the tool. I asked what "brand voice" meant to them, how important brand consistency was to their business, and whether they used any tools for brand management. Crucially, I also asked whether they would use an AI tool for brand voice before revealing that the tool was AI-powered. This avoided priming and gave me honest baseline attitudes.
I also asked about familiarity with Notion, which turned out to be a significant factor. Participants who weren't Notion users had the hardest time with the editable interface.
Part 3 · Debrief. Perceived value, desirability, and open questions. I included pricing questions ("What would you expect this to cost?" and "Would you pay for this?") to capture willingness-to-pay signals alongside the usability data. This layered product-market-fit research into the usability study without adding session time.
Facilitation
In practice, each session ran 30 to 45 minutes. I facilitated every session, recording and transcribing with Otter.
Every session opened with a verbal informed consent process covering the purpose of the research, confidentiality, consent to record, the right to withdraw at any time, and my independence as an external researcher. Without that framing, participants might have held back criticism of a tool built by someone in their network.
During the task-based walkthrough, I had to intervene in several sessions to prevent participants from corrupting the prototype. In one case, a participant started typing brand voice examples directly into the AI prompt field, which would have broken the generation. In another, a participant almost overwrote a section heading. These interventions were necessary to keep the session running, but they also became some of the strongest evidence for the editability findings.
Analysis
After completing all six sessions, I reviewed each recording and my session notes systematically, identifying patterns that recurred across multiple participants. I tracked each instance of confusion, hesitation, or misinterpretation, then grouped these into themes based on where they occurred in the flow and what they revealed about user expectations.
Ten findings emerged consistently. I cross-referenced these against my hypotheses to separate confirmed assumptions from new discoveries. Two findings I hadn't anticipated, the "new user type" insight and the export expectation, turned out to be among the most strategically valuable.
§ 04 Findings
Users saw the value immediately, but the interface got in the way.
The concept landed; the execution didn't. Ten findings emerged across six sessions:
- Most participants said they would pay for the tool. Even those who found the current version too rough saw the potential, validating the core proposition and giving the founder confidence to invest in a standalone build.
- Agency users emerged as a new user type. Two participants, both running agencies, independently said they'd use the tool to generate brand voices for their clients, not just for themselves: feeding the output into ChatGPT to produce client content, or generating Instagram captions in a client's brand voice. This was a use case the founder hadn't designed for, and it signalled a potential agency market.
- Most participants were not interested in the style guide. It was described as something they'd "only look at if they came to write something" and "too generic." Some found specific sections valuable, particularly capitalisation rules and active vs passive voice guidance, but the value was buried under less distinctive content.
- Users were unsure how much information to provide. Participants routinely under-filled input fields, skipping optional sections or entering just one line. When input was thin, output suffered, creating a cycle where users blamed the tool for being generic rather than recognising the connection to their own input.
- Generated content felt too broad to be actionable. Brand values felt duplicated, industry-specific language was missing, and output defaulted to British English despite US-focused input. The content was described as "common sense and too basic." The root cause was insufficient input, but the tool did nothing to signal that more detail would produce better results.
- The editable Notion interface caused confusion. Every element on the page was editable, including headings and instructions. Participants assumed the cursor meant they should type, leading to accidental edits and erasure of input fields. Some couldn't tell where to input information at all.
- Exposed AI prompts created friction. AI instructions were visible on the page. Participants tried to interact with them, and one started typing directly into the prompt field, requiring me to intervene before she corrupted the input. The prompt engineering was described as "very obvious." Users couldn't tell what was for them and what wasn't.
- Users couldn't tell where input ended and output began. The interface was described as "a wall of text." Without clear signposting between stages, participants were unsure whether they had finished or whether more action was needed.
- Users didn't know what to do with the generated content. Participants wanted to generate Instagram captions, run existing content through the tool, or copy the output into ChatGPT. The tool generated the brand voice but didn't help users apply it.
- Almost every participant wanted a branded PDF export. Users wanted something they could present to their team. The export feature actually existed within Notion, but the interface didn't surface it, so participants assumed it wasn't available.
§ 05 Recommendations
I delivered recommendations in three tiers based on effort and impact.
Clarify input expectations and reinforce usefulness.
I recommended adding structured prompts, helper text, and placeholder examples so users understand what to provide and how much detail is enough. The field label "What you do" should be reframed as "What does your company do?", since multiple participants interpreted it as asking about their personal role. The richer input these prompts would elicit gives the AI more context and directly reduces the generic-output problem.
I also recommended adding confirmation messaging and concrete next steps after content is generated (e.g. "Use your brand voice to brief your copywriter" or "Share this style guide with your team so you have consistency across all communications"). Users needed to see not just what had been generated, but how to use it.
Surface the export and add custom designs.
This required more design work but would meaningfully improve the output experience. The branded PDF export needed to be prominent and unmissable, not hidden behind Notion's native export. Users wanted something they could present to a team, so I recommended adding custom design options to the PDF to make the output feel like a professional deliverable rather than raw text.
Move to a standalone product.
The Notion editing model was the root cause of three separate findings: editable-field confusion, exposed AI prompts, and unclear boundaries between input and output. Because everything in Notion is editable, users accidentally deleted parts of the UI. Because AI prompts were visible on the page, users tried to interact with them. I recommended building the tool as a standalone app with form-style inputs, hidden AI prompts, and clear visual separation between stages. This single architectural change would resolve the majority of the usability issues.
§ 06 Deliverables
- A testing plan (co-created with the founder), covering objectives, methodology, timelines, and participant incentives
- A discussion guide, including scripted introduction, context questions, task scenarios, and debrief questions
- A findings presentation for the founder
- A UX research report with findings, participant quotes, and session observations
§ 07 Impact
Research outcomes
The research gave the founder clear, evidence-based direction for the product roadmap:
- The core value proposition was validated. Most participants said they would pay for the tool, which gave the founder the confidence to invest in building a standalone product.
- The usability issues were almost entirely tied to the Notion prototype, not the concept itself. This meant the standalone app could fix the majority of problems through better architecture rather than rethinking the product.
- The discovery of agency users as a new user type reframed the founder's thinking about market positioning and pricing.
What was adopted
The founder built the standalone product at toneofvoice.app, adopting several key recommendations from the research:
- The tool moved from a Notion template to a standalone web app with form-style inputs, resolving the editability, AI prompt exposure, and threshold confusion issues
- AI prompts were hidden from the user-facing interface entirely
- Custom-designed PDF exports were added so users could share polished brand documents with their teams
- An agency pricing tier was introduced specifically for agencies managing multiple client brand voices, directly inspired by the "new user type" finding from the research
Broader outcomes
The original Notion template has been downloaded over 100 times. The standalone app launched with a clear product direction shaped by the research, and the agency tier validated the strategic finding that emerged from just two participants during testing.
§ 08 Reflections
What worked well. The moderated in-person format was the right call. Several of the strongest findings came from moments where I had to intervene, such as when a participant started typing into the AI prompt field, or when another almost deleted a section heading. An unmoderated study would have captured task failure but missed the why. Conducting all sessions in person also meant I could observe body language and hesitation that wouldn't have come through on a remote call.
What I'd do differently. I'd push for a larger and more diverse sample. Six participants from the founder's network gave us consistent usability patterns, but a broader pool would have strengthened the findings and reduced selection bias. I'd also run a short post-session survey to capture quantitative satisfaction data alongside the qualitative observations.
What I learned. The agency use case was the most strategically valuable finding, and it came from just two participants. It reshaped the product's pricing model and market positioning. Sometimes the most useful insight is one you didn't go looking for. I also learned how much the underlying platform shapes the research: many of the usability issues were Notion-specific rather than concept-specific, and separating the two required careful analysis.