
Rating Scales in UX Research: The Ultimate Guide

by Mads Soegaard | 72 min read

Picture this: you’re designing a new app or website and want to know how users feel about it, and you want a good way to get actionable insights from those feelings, fast. Feedback is vital, and the sooner you get it, the better. That’s why something as direct and visual as a rating scale comes in mighty handy: it lets you gather responses quickly and analyze them for fast, useful data, as you’ll find as you keep reading.


What is a Rating Scale?

 Visual representation of a 1-5 rating scale

© Interaction Design Foundation, CC BY-SA 4.0

In UX (user experience) design, and UX research in particular, a rating scale that runs from poor to excellent can help you understand user opinions and preferences in a way that’s faster to access than, for example, more detailed qualitative research feedback. It’s not that detailed qualitative user data isn’t useful; it certainly is a vital asset to get hold of! It’s just that rating scales, when a user researcher knows how to tap their potential for precious insights, offer one handy way to decipher user satisfaction, identify pain points, and uncover opportunities for improvement in product design, product development, service design, and more.

We’re all bound to have run into a rating scale in some way, shape, or form at some point, but let’s get a “dictionary definition” in place right away, not least because we want a UX angle on things. A rating scale is, quite simply, a tool that researchers use to measure and assess different qualities, characteristics, or performance levels. You assign a numerical value, usually in the range of 1 to 5, so the rating scale helps people express their opinions, judgments, or preferences about a particular subject or object in a structured way.

There are three main reasons why rating scales are useful. For one thing, they’re valuable tools for quantifying opinions: they play an important role in processes like design thinking because individual respondents can quantify their thoughts, feelings, or experiences about whatever researchers want to know. Some rating scales may have just two “settings” (the binary-response kind), while others offer numbers for more fine-tuned replies; that’s helpful because, instead of using vague terms like “good” or “bad,” people can mark a more precise evaluation with a number on a scale of 1 to whatever.

Watch as UX Strategist and Consultant William Hudson explains important points about quantitative research in data-driven design:

Video transcript:
  1. 00:00:00 --> 00:00:32

    This is a very typical project lifecycle in high-level terms. Generally start off with *requirements* – finding out what's needed, and we go off and talk to stakeholders. And one of the problems we have with *user requirements*, in particular, is that often analysts and requirements researchers in the IT world tend to go off and want to ask *users* what they want.

  2. 00:00:32 --> 00:01:02

    They don't really understand that users don't quite know what they want, that you actually need to do user research, and that is one of the biggest issues that we face in user experience: is the lack of understanding of user research and the whole field of user experience. From requirements, we might expect to be doing surveys to find out – particularly if we have an existing offering of some kind – we might find out what's good about it, what's not so good about it,

  3. 00:01:02 --> 00:01:31

    what people would like to do with it. And surveys might be helpful in those particular areas. Now, bear in mind that generally when we're talking about surveys, we already need to have some idea of the questions and the kinds of answers people are going to give us. It is really a very bad plan to launch a large survey without doing some early research on that, doing some qualitative research on how people think about these questions and these topics

  4. 00:01:31 --> 00:02:00

    and trying to understand it a little bit better before we launch a major  initiative in terms of survey research. We can also use surveys in *analysis and design* perhaps to ask people which kinds of things might work better for their particular needs and behaviors. We also can start to employ *early-design testing*, even in the analysis and design phase so that we've got perhaps some wireframes that we're thinking about on the *design* side,

  5. 00:02:00 --> 00:02:30

    and we can start to *test* them – start to try to find out: "Will people understand this? Will they be able to perform the most important tasks from their perspective?" I have been involved in user testing of new product ideas where users had *no idea* what the service being offered was about because it was just presented *so confusingly*; there was no clear message; there was no clear understanding of the concepts behind the message because it wasn't very clear to start with, and so on.

  6. 00:02:30 --> 00:03:00

    So, early-design testing really has an important role to play there. *Implementation* and *testing* – that's when we can start doing a lot more in terms of evaluating what's going on with our products. There we would employ *usability evaluations*. And the things that I've called "early-design testing", by the way, can be done later on too. It's just they don't really involve the finished product. So, they're perhaps not quite as relevant. But if we've got questions about how the navigation might be changed,

  7. 00:03:00 --> 00:03:32

    then we might fall back to the tree testing where we're just showing people the navigation hierarchy rather than the whole site and asking them to perform tasks and just tweak the navigation as required to improve that. And one of my big complaints with our whole industry – still, after all these decades! – is that we do tend only to be allowed to do usability evaluations, and we do tend to wait until implementation has taken place

  8. 00:03:32 --> 00:04:02

    and the product is being tested before we start to try to involve real users, which really is far too late in the whole process. If you want to be able to be confident in the concepts and the terminology that your interactive solution is providing to your users and customers, then that needs to start way back at the beginning of the project cycle. And then, finally, once we've got live solutions available,

  9. 00:04:02 --> 00:04:30

    we can use *analytics* for websites and apps and we can also use A/B and multivariate testing to make sure that our designs are optimal. If we find problems, we might set up an A/B experiment to see whether this particular alternative would be a better solution or we could go down the multivariate route where we provide permutations of a *number* of different design elements on a particular page and see which  of those elements proved to be the most effective.

  10. 00:04:30 --> 00:05:00

    The fact that if you're doing project development,  software development in an iterative environment – like agile, for example – then you might be doing  a little bit of this in every single iteration; so, there might be a little bit of work on the  requirements at the front and there might be a little bit of design and analysis. Having said that, there is usually some upfront requirements and analysis and design that has to go on so that you know what *shape* your project is

  11. 00:05:00 --> 00:05:30

    – what *shape and size* I think is perhaps a better or more complete description – because in order for you to be able to even guess at how long this is going to take you, you need to have *scoped* it. And to scope it means to set the boundaries,  and to set the boundaries means to understand the requirements and to understand what kind of solutions would be acceptable; so, there will be some of this done always up front. Anybody who sets on a major project *without* doing upfront requirements analysis and design of some sort

  12. 00:05:30 --> 00:05:34

    is – I'm afraid – probably asking for trouble.

For another thing, rating scales facilitate comparison: scales of the same type mean you can compare and analyze data. For example, as anyone who’s ever shopped on Amazon can attest, a higher rating suggests a better product in product reviews, and ratings help consumers make informed choices about their purchases.

Last, but not least, they’re tools for collecting data: various fields, such as market research, customer feedback, education, and healthcare, use rating scales to gather data, and data analysis techniques can then help draw conclusions, make improvements, or inform decisions. From the most individual-level rating of how painful a wound or disease is in hospital, to the collected results a manufacturer’s quality assurance department uses to monitor a new product’s performance, rating scales make sense for a variety of good reasons.

Watch as Professor of Human-Computer Interaction Ann Blandford explains important points about data collection:

Video transcript:
  1. 00:00:00 --> 00:00:32

    Ditte Hvas Mortensen: In relation to data gathering, there are obviously different ways of doing it. You can record video or sound, or you can take notes. Can you say something about the advantages or disadvantages of doing it in different ways? Ann Blandford: Yes. So, I think it depends on how the data-gathering method is going to affect what

  2. 00:00:32 --> 00:01:00

    data you can gather. So, sometimes people are not comfortable being recorded. And they don't *want* to be voice-recorded. And you'll get more out of the conversation if you just take notes. Of course, you don't get quite such high-quality data if you just take notes. On the other hand, it's easier to analyze because you haven't got so much data.

  3. 00:01:00 --> 00:01:31

    And you can't do as much in-depth analysis if you've only got notes, because you can only analyze what you recognized at the time as being important, and you can't pick up anything more from it later. So, I certainly like to audio-record where possible for the kinds of studies that we do. And different people may have different needs, and therefore that might be more or less important to them.

  4. 00:01:31 --> 00:02:02

    We also use quite a lot of still photos, particularly in healthcare. We have to have quite a lot of control over what actually features in an image so that it doesn't violate people's privacy. So, using still photos allows us to take photos of technology and make sure that it doesn't include any inappropriate information. Whereas video – well, firstly, video means that you've got a *lot* more data to analyze.

  5. 00:02:02 --> 00:02:33

    And it can be a lot harder to analyze it. And it depends on the question that you're asking in the study, as to whether or not that effort is merited. And for a lot of us, it's not merited, but also it's harder to control what data is recorded. So, it's more likely to compromise people's privacy in ways that we haven't got ethical clearance for. So, we don't use a lot of video ourselves.

  6. 00:02:33 --> 00:03:01

    But also, particularly if one is trying to understand the work situation, it's often also valuable to take *real notes*, whether those are diagrams of how things are laid out or other notes about, you know, important features of the context that wouldn't be recorded in an audio stream. And also, video can be quite *off-putting* for people.

  7. 00:03:01 --> 00:03:30

    You know, it's just that much more intrusive. And people may become much more self-conscious with a video than with audio only. So, it can affect the quality of the data that you get for that reason. So, I think when you're choosing your data-gathering *tools*, you need to think about what impact they will have in the environment.

  8. 00:03:30 --> 00:04:00

    It may or may not be *practical* to set up a video camera, quite apart from anything else. Audio tends not to be so intrusive. As I say, there are times when just written notes will actually serve the purpose better. But it also depends on what you're going to *do* with the data. You know – how much data do you need? What kinds of analysis are you going to do of that data? And hence, what *depth of data* do you actually need to have access to, anyway?

  9. 00:04:00 --> 00:04:35

    If you've got more data than you can deal with, then it can feel overwhelming, and that can actually be quite a deterrent to get on with analysis. And analysis can be really slowed down if, as a student or other researcher, you just feel so overwhelmed by what you've got that you don't know where to start! Actually, that's not a good place to be. So, having too much data can often be as difficult as not having enough.

  10. 00:04:35 --> 00:04:38

    But what matters most is that you've got an *appropriate* kind of data for the questions of the study.

Why Do UX Researchers Use Rating Scales?

In user experience (UX) research, rating scales are valuable tools for collecting data and gauging user sentiment, and they can be a handy way to get, for example, a Net Promoter Score. They’re valuable for several reasons:

1. Simplicity and Ease of Use

Rating scales—especially ones using a 1-5 range—offer simplicity in data collection, and participants can comprehend and respond to questions by selecting a number on the scale that best represents their experience.

2. Quantitative Data Collection

One significant advantage of using rating scales in UX research is that researchers can gather quantitative data with them: since each rating corresponds to a numeric value, researchers can quantify user experiences.
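
To make that numeric footing concrete, here’s a minimal Python sketch (with made-up responses) that turns 0–10 “how likely are you to recommend us?” ratings into a Net Promoter Score, using the standard promoter/detractor cut-offs:

```python
# Minimal sketch: computing a Net Promoter Score from 0-10
# "likelihood to recommend" ratings. The responses are hypothetical.

ratings = [10, 9, 8, 6, 10, 7, 9, 3, 10, 8]

promoters = sum(1 for r in ratings if r >= 9)   # 9-10 = promoter
detractors = sum(1 for r in ratings if r <= 6)  # 0-6 = detractor

nps = 100 * (promoters - detractors) / len(ratings)
print(f"NPS: {nps:.0f}")  # ranges from -100 to +100; here: 30
```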

Watch as Author and Expert in Human-Computer Interaction (HCI) Professor Alan Dix explains important points about quantitative research:

Video transcript:
  1. 00:00:00 --> 00:00:32

    Ah, well – it's a lovely day here in Tiree. I'm looking out the window again. But how do we know it's a lovely day? Well, I could – I won't turn the camera around to show you, because I'll probably never get it pointing back again. But I can tell you the Sun's shining. It's a blue sky. I could go and measure the temperature. It's probably not that warm, because it's not early in the year. But there's a number of metrics or measures I could use. Or perhaps I should go out and talk to people and see if there's people sitting out and saying how lovely it is

  2. 00:00:32 --> 00:01:01

    or if they're all huddled inside. Now, for me, this sunny day seems like a good day. But last week, it was the Tiree Wave Classic. And there were people windsurfing. The best day for them was not a sunny day. It was actually quite a dull day, quite a cold day. But it was the day with the best wind. They didn't care about the Sun; they cared about the wind. So, if I'd asked them, I might have gotten a very different answer than if I'd asked a different visitor to the island

  3. 00:01:01 --> 00:01:31

    or if you'd asked me about it. And it can be almost a conflict between people within HCI. It's between those who are more *quantitative*. So, when I was talking about the sunny day, I could go and measure the temperature. I could measure the wind speed if I was a surfer – a whole lot of *numbers* about it – as opposed to those who want to take a more *qualitative* approach. So, instead of measuring the temperature, those are the people who'd want to talk to people to find out more about what *it means* to be a good day.

  4. 00:01:31 --> 00:02:02

    And we could do the same for an interface. I can look at a phone and say, "Okay, how long did it take me to make a phone call?" Or I could ask somebody whether they're happy with it: What does the phone make them feel about? – different kinds of questions to ask. Also, you might ask those questions – and you can ask this in both a qualitative and quantitative way – in a sealed setting. You might take somebody into a room, give them perhaps a new interface to play with. You might – so, take the computer, give them a set of tasks to do and see how long they take to do it. Or what you might do is go out and watch

  5. 00:02:02 --> 00:02:30

    people in their real lives using some piece of – it might be existing software; it might be new software, or just actually observing how they do things. There's a bit of overlap here – I should have mentioned at the beginning – between *evaluation techniques* and *empirical studies*. And you might do empirical studies very, very early on. And they share a lot of features with evaluation. They're much more likely to be wild studies. And there are advantages to each. In a laboratory situation, when you've brought people in,

  6. 00:02:30 --> 00:03:00

    you can control what they're doing, you can guide them in particular ways. However, that tends to make it both more – shall we say – *robust* that you know what's going on but less about the real situation. In the real world, it's what people often call "ecologically valid" – it's about what they *really* are up to. But it is much less controlled, harder to measure – all sorts of things. Very often – I mean, it's rare or it's rarer to find more quantitative in-the-wild studies, but you can find both.

  7. 00:03:00 --> 00:03:34

    You can both go out and perhaps do a measure of people outside. You might – you know – well, go out on a sunny day and see how many people are smiling. Count the number of smiling people each day and use that as your measure – a very quantitative measure that's in the wild. More often, you might in the wild just go and ask people. It's a more qualitative thing. Similarly, in the lab, you might do a quantitative thing – some sort of measurement – or you might ask something more qualitative – more open-ended. Particularly quantitative and qualitative methods,

  8. 00:03:34 --> 00:04:01

    which are often seen as very, very different, and people will tend to focus on one *or* the other. *Personally*, I find that they fit together. *Quantitative* methods tend to tell me whether something happens and how common it is to happen, whether it's something I actually expect to see in practice commonly. *Qualitative* methods – the ones which are more about asking people open-ended questions – either to both tell me *new* things that I didn't think about before,

  9. 00:04:01 --> 00:04:32

    but also give me the *why* answers if I'm trying to understand *why* it is I'm seeing a phenomenon. So, the quantitative things – the measurements – say, "Yeah, there's something happening. People are finding this feature difficult." The qualitative thing helps me understand what it is about it that's difficult and helps me to solve it. So, I find they give you *complementary things* – they work together. The other thing you have to think about when choosing methods is about *what's appropriate for the particular situation*. And these things don't always work.

  10. 00:04:32 --> 00:05:00

    Sometimes, you can't do an in-the-wild experiment. If it's about, for instance, systems for people in outer space, you're going to have to do it in a laboratory. You're not going to go up there and experiment while people are flying around the planet. So, sometimes you can't do one thing or the other. It doesn't make sense. Similarly, with users – if you're designing something for chief executives of Fortune 100 companies, you're not going to get 20 of them in a room and do a user study with them.

  11. 00:05:00 --> 00:05:07

    That's not practical. So, you have to understand what's practical, what's reasonable and choose your methods accordingly.

3. Versatility in Question Types

Rating scales can adapt to various types of questions, and researchers can use them to assess satisfaction, usability, likelihood to recommend, and more—so they’re a versatile tool that’s useful for addressing a wide array of research questions.

4. Comparative Analysis

Rating scales make it easier to do comparative analysis, and researchers can examine average ratings to compare different user interface aspects, features, or products.

Types of Rating Scales: Which One to Use?

Whenever you’re embarking on user research, it’s important to pick the best rating scale; there is such a thing as the right tool for the specific task at hand. That’s why it’s good to understand the nuances of different rating scales and how factors such as your research objectives and the type of data you need will help you work out which one best fits your specific context and research goals.

5 types of rating scales

© Interaction Design Foundation, CC BY-SA 4.0

1. Binary Rating Scale

The binary rating scale is one of the most straightforward: it’s made up of a simple “yes” or “no” response. Binary rating scales are most appropriate whenever you need a concise yes/no answer to straightforward questions, such as to verify attendance (at a concert, say), obtain consent, or determine agreement or disagreement with a basic statement.

On the plus side, they’re straightforward for respondents to understand (“yes” and “no” leave no room for ambiguity) and use, and they’re suitable for rapid data collection and analysis (useful in time-sensitive situations when the pressure is on). On the downside, though, they can lack detailed information and may not capture subtle differences in opinions.
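
To show how quickly binary answers turn into numbers, here’s a minimal Python sketch, with hypothetical responses, that summarizes them as a proportion with a rough 95% margin of error (note how wide the margin is at small sample sizes):

```python
# Minimal sketch: summarizing binary ("yes"/"no") responses as a proportion
# with a normal-approximation 95% margin of error. Data is hypothetical,
# and for very small samples a Wilson interval would be more reliable.
import math

responses = ["yes", "no", "yes", "yes", "no", "yes", "yes", "yes"]

n = len(responses)
p = sum(r == "yes" for r in responses) / n
margin = 1.96 * math.sqrt(p * (1 - p) / n)

print(f"{p:.0%} yes (±{margin:.0%} at 95% confidence, n={n})")
```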

2. Likert Scale

The Likert scale is one you may be familiar with, given that it sees a lot of use. It’s a rating scale of 1–5 or 1–7 points that gives respondents various options, with responses typically ranging from “strongly disagree” to “strongly agree.”

On a Likert scale, people pick the option that best matches their level of agreement or disagreement with a specific statement or question. That makes Likert scales ideal when researchers need to measure the intensity of agreement or disagreement on a particular issue, and good for assessing respondents’ opinions or attitudes with a high level of detail.

Likert scales have many strong points: they enable respondents to provide nuanced feedback, so they can capture a range of opinions and attitudes, and they’re versatile tools that turn up in research and surveys across diverse fields. What’s more, their higher level of detail (compared with, say, binary scales) can provide more fine-tuned, valuable insights to help with decision-making in design and beyond. On the downside, however, they can be susceptible to response bias, and interpretation may vary among individuals.
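
Because Likert labels are ordered categories, analysis usually starts by coding them as numbers. Here’s a minimal Python sketch with hypothetical labels and responses:

```python
# Minimal sketch: coding 5-point Likert labels as numbers and checking
# the response distribution. Labels and responses are hypothetical.
from collections import Counter

SCORES = {
    "strongly disagree": 1,
    "disagree": 2,
    "neither agree nor disagree": 3,
    "agree": 4,
    "strongly agree": 5,
}

responses = ["agree", "strongly agree", "neither agree nor disagree",
             "agree", "disagree", "strongly agree"]

coded = [SCORES[r] for r in responses]
print(sorted(Counter(coded).items()))  # [(2, 1), (3, 1), (4, 2), (5, 2)]
```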

3. Semantic Differential Scale

The semantic differential scale, like the Likert scale, runs from 1–5 or 1–7, but it presents respondents with pairs of opposing adjectives or adjectival phrases (e.g., “good” vs. “bad” or “efficient” vs. “inefficient”). You ask respondents to rate an item, or a concept, by picking a point between two opposing descriptors on a continuum, so these scales are at their best when you’re after the emotional or qualitative aspect of a concept, product, brand, or service.

On the plus side, this scale offers a clear contrast for evaluation by using opposing adjectives, it makes it easier for respondents to mark their sentiments, and it helps in understanding the emotional or qualitative aspect of a concept (handy for brand perception or product evaluation). It doesn’t stop there, though: the scale provides a structured format for collecting qualitative data in a more standardized way. On the downside, these scales are limited to bipolar concepts and mightn’t suit all situations, and they may not capture subtleties in opinions; for instance, “(I find the onboarding of this product:) Somewhat exciting” might be of limited use.

4. Numerical Rating Scale

Just as it sounds, a numerical rating scale assigns a numerical value to rate an item or concept, and respondents rate items on a scale within a specified range, such as 1 to 10. It’s best to go for these when you need precise quantitative data for analysis and comparison, like when you’re evaluating attributes or features numerically and want to pinpoint responses with more granularity.

On that note, numerical rating scales provide a finer grain of detail without the limitations of specific labels; they offer flexibility in choosing the scale range and allow some customization, and you can perform calculations or compare data on numerical scales, a nifty plus that makes them practical for research and evaluation purposes.

As with the other scales, there are downsides: interpretation may differ among respondents, and numerical scales may not be as intuitive as labeled scales for some individuals.

5. Visual Analog Scale (VAS)

The visual analog scale (VAS) calls for respondents to mark a point along a continuous line to indicate their response to a specific question or statement, and researchers use it to measure a particular attribute’s intensity or preference. You might want a visual analog scale when you need a high level of precision in capturing the degree of a particular attribute, such as pain levels, satisfaction, or preferences.

On the plus side, the VAS offers a visual representation of intensity or preference, a neat point that makes it easier for respondents to express their feelings, and it provides continuous data that allows for more detailed analysis and interpretation. What’s more, it allows for fine-grained measurement of attributes, a nifty plus that makes it suitable for capturing subtle differences in responses. On the downside, though, a VAS calls for more effort to implement, and interpretation may vary due to the lack of fixed categories (we’ll see more about scale interpretation a little later, too).

What is Bias in Rating Scales, and How Can You Mitigate its Impact?

Like all research methods, rating scales aren’t immune to biases, not least since we’re dealing with humans in the real world, so it’s vital to recognize and understand these biases to ensure the validity and reliability of research findings.

1. Recency Bias

This bias occurs when respondents give more weight to recent events or experiences than earlier ones—like if they see the last item in a series and it “sticks” in their minds more.

Another example might be if a user faced a minor glitch in a software application just before filling out a survey, they might rate the overall experience as negative due to the recent frustration.

To mitigate recency bias, it’s a good idea to conduct feedback sessions at various points during the user experience, to capture a holistic view rather than waiting till the very end.

2. Primacy Bias

This one “jumps” to the other end of the timeline from recency bias: individuals tend to recall, and give more importance to, items that come at the beginning of a list or sequence. So, if users have a list of product features to rate, they might give high ratings to the first ones and pay less attention to the ones that come later.

To mitigate primacy bias, you can rotate the order of questions or features presented to users.
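
Here’s a minimal Python sketch of that rotation: shuffling the question order per participant with a seeded random generator (the question texts are hypothetical):

```python
# Minimal sketch: shuffling question order per participant to counter
# primacy (and recency) effects. Question texts are hypothetical.
import random

questions = [
    "Rate the onboarding flow (1-5)",
    "Rate the search feature (1-5)",
    "Rate the checkout process (1-5)",
    "Rate the help documentation (1-5)",
]

def questions_for(participant_id: int) -> list[str]:
    # Seed per participant so each sees a stable but different order.
    rng = random.Random(participant_id)
    order = questions[:]
    rng.shuffle(order)
    return order

print(questions_for(42))
```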

3. Halo/Horns Effect Bias

The Halo effect is where a positive impression in one area influences impressions in other areas, and the Horns effect is the opposite. This can crop up in ratings if a visually stunning website makes users overlook usability issues—the Halo side—while one poor feature could make users rate all the other features negatively (the Horns).

To mitigate Halo/Horns bias, ask for specific feedback on distinct features to prevent generalization.

4. Centrality/Central Tendency Bias

This one means respondents take the “middle of the road”—avoiding using extreme response options and sticking instead to the middle or neutral options. On a scale with an odd number of responses (like five), users tend to pick the middle the most, regardless of how they feel, so they mark “3” on a “5” scale.

To mitigate centrality/central tendency bias, give clear and distinct descriptions for each point on the scale, plus you might want to use an even-numbered scale to force a choice.

5. Leniency Bias

Some raters are overly generous in their ratings—they’re too nice! It happens, for instance, if a tester always gives a maximum score in a product test, even if there are evident flaws.

To mitigate leniency bias, combine quantitative scales with open-ended questions to gather context for the ratings provided; respondents then have to back up their praise with specifics.

6. Similar-to-me Bias

Just as it sounds, raters favor those who are similar to them or share similar views and experiences—like when a tester prefers a product designed by someone of the same age group, background, or outlook.

To mitigate similar-to-me bias, make sure you’ve got diversity in research panels and consider blinding evaluators to certain demographic information; you’re after a good swath of feedback from a wide representative group.

7. Confirmation Bias

This one you may have heard of as it crops up in many other areas, and raters seek out and prioritize information that confirms their pre-existing beliefs. It can go either way—positive or negative—because if a user enters into a rating system with a “preset mindset,” they can end up looking for qualities to back up what they already believe, good or bad, about a brand, for instance.

To mitigate confirmation bias, it’s best to frame questions in a neutral way and avoid leading questions, and to incorporate diverse methods of data collection to help balance things out, too.

8. Law of Small Numbers Bias

Evaluators can, for a variety of reasons, believe that small sample sizes are just as representative of the population as large ones, and that’s shaky ground to build on, since it brings a false sense of security (or insecurity). For instance, if you’ve heard nothing but high praise from a tiny group of respondents, there’s a risk you’ve just made a “lucky” hit on a bunch of positive reviewers (assuming they’re responding in earnest, that is!). Despite the temptation to go with it, you can’t draw a firm conclusion about, say, a product’s popularity from what one small group of respondents declared; far better to reach more respondents and raise the chances of a more representative response group.

To mitigate law-of-small-numbers bias, educate stakeholders about how important sample size is, including how results from small groups can be inaccurate whether the “news” is good or bad, and make sure there are adequate sample sizes in research for more accurate results.
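
One way to make the sample-size point vivid for stakeholders: a minimal Python sketch (assuming a standard deviation of 1.0 on a 1–5 scale) of how the 95% margin of error around a mean rating shrinks as the sample grows:

```python
# Minimal sketch: the 95% margin of error of a mean rating at different
# sample sizes. The standard deviation (1.0 on a 1-5 scale) is assumed.
import math

sd = 1.0
for n in (5, 30, 100, 500):
    margin = 1.96 * sd / math.sqrt(n)  # 95% CI half-width for the mean
    print(f"n={n:>4}: mean rating ± {margin:.2f}")
```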

What are the Limitations of Rating Scales?

1. Lack of Depth

Rating scales provide numeric values but don’t have the depth of qualitative insights. Because they don’t, or can’t, uncover that vital “why” behind a user’s rating, it’s often best to supplement them with other response methods that cover the “why”s.

2. Subjectivity

Satisfaction can be a hard thing to “corral” into a cut-and-dried stratification system of numbered responses, and user satisfaction ratings are, sure enough, subjective matters. People, being human, are different: two users may assign the same rating to an aspect of the UX, yet their underlying experiences and expectations may be quite different.

3. Limited Context

Rating scales offer a snapshot of a user’s experience at a specific moment, but they might miss the mark by not capturing the entirety of a user’s journey or the context in which they interact with a product; the holistic experience is valuable to gauge, and the magic of a seamless experience is more than the sum of its parts, anyway.

4. Scale Interpretation

As the pros and cons of the various scales indicated, interpretation is a “biggie” of a matter and can vary a great deal from person to person: your “4 out of 5” might indicate a positive experience (a “strong 4,” even though there’s no “strong” available to qualify it), while for others it might mean merely average (a “weak 4”).

How to Use Complementary Research Techniques in UX Research

1. Usability Testing

You do usability testing by observing users as they interact with a product—and it’s a method that provides real-time insights into how users navigate through an interface.

Watch as UX Strategist and Consultant William Hudson explains important points about usability testing:

Video transcript:
  1. 00:00:00 --> 00:00:32

    If you just focus on the evaluation activity typically with usability testing, you're actually doing *nothing* to improve the usability of your process. You are still creating bad designs. And just filtering them out is going to be fantastically wasteful in terms of the amount of effort. So, you know, if you think about it as a production line, we have that manufacturing analogy and talk about screws. If you decide that your products aren't really good enough

  2. 00:00:32 --> 00:01:02

    for whatever reason – they're not consistent or they break easily or any number of potential problems – and all you do to *improve* the quality of your product is to up the quality checking at the end of the assembly line, then guess what? You just end up with a lot of waste because you're still producing a large number of faulty screws. And if you do nothing to improve the actual process in the manufacturing of the screws, then just tightening the evaluation process

  3. 00:01:02 --> 00:01:17

    – raising the hurdle, effectively – is really not the way to go. Usability evaluations are a *very* important tool. Usability testing, in particular, is a very important tool in our toolbox. But really it cannot be the only one.

2. User Interviews

User interviews involve one-on-one conversations with participants, and with them you’re able to dig deeper into user experiences, motivations, and preferences. In fact, it’s great to combine this research technique with quantitative questionnaires for comprehensive results that speak, in earnest terms, to how users find your product, service, or what have you.

Watch as Ann Blandford explains important points about user interviews:

Video transcript:
  1. 00:00:00 --> 00:00:35

    So, semi-structured interviews – well, any  interview, semi-structured or not, gets at people's perceptions, their values, their experiences as they see it, their explanations about why they do the things that they do, why they hold the attitudes that they do. And so, they're really good at getting at  the *why* of what people do,

  2. 00:00:35 --> 00:01:02

    but not the *what* of what people do. That's much better addressed with *observations* or *combined methods* such as contextual inquiry  where you both observe people working and also interview them, perhaps in an interleaved way about why they're doing the things that they're doing or getting them to explain more about how things work and what they're trying to achieve.

  3. 00:01:02 --> 00:01:32

    So, what are they *not* good for? Well, they're not good for the kinds of questions where people have difficulty recalling or where people might have  some strong motivation for saying something that perhaps isn't accurate. I think of those two concerns, the first is probably the bigger in HCI

  4. 00:01:32 --> 00:02:00

    – that... where things are unremarkable, people are often *not aware* of what they do; they have a lot of *tacit knowledge*. If you ask somebody how long something took, what you'll get is their *subjective impression* of that, which probably bears very little relation to the actual time something took, for example. I certainly remember doing a set of interviews some years ago

  5. 00:02:00 --> 00:02:32

    where we were asking people about how they performed a task. And they told us that it was  like a three- or four-step task. And then, when we got them to show us how they did it, it actually had about 20, 25 steps to it. And the rest of the steps they just completely took for granted; you know – they were: 'Of course we do that! Of course we—' – you know – 'Of course that's the way it works! Of course we have to turn it on!' And they just took that so much for granted that *it would never have come out in an interview*.

  6. 00:02:32 --> 00:03:11

    I mean, I literally can't imagine the interview that would really have got that full task sequence. And there are lots of things that people do or things that they assume that the interviewer knows about, that they just won't say and won't  express at all. So, interviews are not good for those things; you really need to *observe* people to get that kind of data. So, it's good to be aware of what interviews are good for and also what they're less well-suited for. That's another good example of a kind of  question that people are really bad at answering,

  7. 00:03:11 --> 00:03:31

    not because they're intentionally deceiving usually, but because we're *not* very good at *anticipating what we might do in the future*, or indeed our *attitudes to future products*, unless you can give somebody a very faithful kind of mock-up

  8. 00:03:31 --> 00:03:56

    and help them to really  imagine the scenario in which they might use it. And then you might get slightly more reliable  information. But that's not information I would ever really rely on, which is why *anticipating future product design is such a challenge* and interviewing isn't the best way  of getting that information.

3. Heatmaps and Click Tracking

Heatmaps and click-tracking tools visualize user interactions with a website or application: they visually represent where users click, hover, or spend the most time, and you can run them alongside what the users themselves reveal in their ratings of, for example, your website.

4. A/B Testing

A/B testing is when you compare two versions of a product or interface to work out which one performs better in the “department” of user engagement, conversions, or other key metrics; one test group tries out design “A” while the other gets design “B.”
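
On the analysis side, here’s a minimal Python sketch of a two-proportion z-test on hypothetical conversion counts for designs “A” and “B,” one common way to judge whether the difference is more than noise:

```python
# Minimal sketch: two-proportion z-test comparing conversion rates of
# design A vs. design B. The counts are hypothetical.
import math

conv_a, n_a = 120, 1000  # conversions and visitors for design A
conv_b, n_b = 160, 1000  # conversions and visitors for design B

p_a, p_b = conv_a / n_a, conv_b / n_b
p_pool = (conv_a + conv_b) / (n_a + n_b)
se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
z = (p_b - p_a) / se

print(f"A: {p_a:.1%}, B: {p_b:.1%}, z = {z:.2f}")  # |z| > 1.96 ≈ significant at 5%
```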

How to Plan and Create Rating Scales

12 steps to plan and create rating scales.

© Interaction Design Foundation, CC BY-SA 4.0

  1. Define your research objectives—outline them at the start so you’ve got clear goals to work toward, ones that can guide your rating scale’s design and content.

  2. Choose the type of rating scale—make sure it’s a suitable type (for instance, a Likert, semantic differential, or numerical scale) based on your research goals.

  3. Determine the number of response options—decide on, for instance, 3, 5, 7, or 9 points, in line with your data needs.

  4. Define the anchors or labels—it’s vital to write unambiguous labels that represent the full range of possible responses; ambiguity is a major enemy to watch out for on scales.

  5. Consider the response order—keep a logical order of answer options, either negative to positive or vice versa, and always be consistent.

  6. Pilot test the scale—do it with a small group so you can spot and address issues nice and early.

  7. Ensure balanced response options—include an equal number of positive and negative answer options to help avoid bias: too many positives can “angle” things one way, and too many negatives can “spin” matters in the other direction.

  8. Consider using a neutral option—so respondents have a neutral choice (e.g., “Neither Agree nor Disagree”) when they need it; they may truly feel “in the middle” on something, which is fine and something to respect.

  9. Include an opt-out response—use “Not Applicable” for questions that may not apply to everyone; it’s a little touch that shows respondents you “know” them that bit better.

  10. Provide clear instructions—explain how respondents should use the scale and put things to them in no uncertain terms.

  11. Test for consistency and reliability—work out the scale’s internal consistency using methods like Cronbach’s alpha (see the sketch after this list).

  12. Analyze and interpret data—last, but not least: analyze the collected data using appropriate statistical techniques, and do it well enough that you can draw meaningful conclusions from data you know is valuable.
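
Step 11 mentions Cronbach’s alpha; here’s a minimal Python sketch of the classic formula, run on a small, made-up matrix of Likert responses (rows are respondents, columns are items):

```python
# Minimal sketch: Cronbach's alpha for a small set of Likert items.
# Rows are respondents, columns are items; the ratings are made up.
from statistics import pvariance

scores = [
    [4, 5, 4],
    [3, 4, 3],
    [5, 5, 4],
    [2, 3, 2],
    [4, 4, 5],
]

k = len(scores[0])                     # number of items
items = list(zip(*scores))             # transpose: one tuple per item
item_vars = sum(pvariance(col) for col in items)
total_var = pvariance([sum(row) for row in scores])

alpha = (k / (k - 1)) * (1 - item_vars / total_var)
print(f"Cronbach's alpha = {alpha:.2f}")  # >= 0.7 is a common rule of thumb
```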


What Are Some Examples of Quantitative Surveys?

  1. You present a statement, and respondents choose from five options: strongly agree, agree, neither agree nor disagree, disagree, and strongly disagree—and this Likert scale captures the direction and intensity of feelings for consistent and easy-to-analyze feedback.

1-5 Likert scale assessing agreement on organizational investment in employee tech updates.

© Questionpro, Fair Use

  2. This survey concerns a post-meeting experience on a digital platform, Hangouts Meet: respondents indicate how satisfied they are on a five-point scale, giving user experience insights that are crucial for platform improvements, and the options offer enough range for users to indicate a good deal of depth in their sentiments about the platform’s performance.

    Post-meeting screen with Hangouts Meet satisfaction survey.

    © Hotjar, Fair Use

  3. This satisfaction scale survey focuses on post-purchase satisfaction: respondents use a 1-5 scale to express their satisfaction with a recent purchase, a neat way to give businesses immediate feedback on the purchasing experience. A follow-up question asks for qualitative feedback so responders can add context as to why they gave the number they did, with a “3” perhaps meaning: “It was OK, about what I expected.”

Purchase satisfaction rating prompt with a feedback input box.

© Hotjar, Fair Use

  4. Here’s a matrix-style feedback scale survey where respondents rate different aspects of a product on a scale from 1 to 5: they evaluate “Product Analytics,” “User Engagement Experiences,” and “User Feedback Tools,” and the scale ranges from “very dissatisfied” (a “1”) to “very satisfied” (a “5”). This survey format enables businesses to get granular feedback on multiple components of their product or service in a structured way, although it’s good to note that matrices can be taxing for respondents and may lead to abandonment, on mobile devices in particular.

Survey rating products on a scale of 1 to 5.

© Userpilot, Fair Use

How to Analyze and Interpret Rating Scale Data

1. Understand the Basics of the Scale

Whenever you analyze data from a scale, the common things to look at are the mean (average), median (middle value), and mode (most frequently occurring value), metrics that offer a bird’s-eye view of the data distribution and general sentiment. For instance, a mean value of 4.5 might suggest that consumers have, in the main, a positive attitude towards a product or service.
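
Here’s a minimal Python sketch of those three summary statistics, computed with the standard library on hypothetical 1–5 ratings:

```python
# Minimal sketch: mean, median, and mode of 1-5 ratings using Python's
# standard library. The ratings are hypothetical.
from statistics import mean, median, mode

ratings = [5, 4, 4, 5, 3, 4, 5, 2, 4, 4]

print(f"mean   = {mean(ratings):.2f}")  # average sentiment: 4.00
print(f"median = {median(ratings)}")    # middle value, robust to outliers: 4.0
print(f"mode   = {mode(ratings)}")      # most common rating: 4
```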

2. Consider Directionality

Always consider the directionality of the scale. In most cases, 1 signifies disagreement or dissatisfaction, and 5 signifies agreement or satisfaction. It’s crucial to interpret data with this in mind, understanding the nuances of each number. Be sure to check that your survey tool is applying values in the order you expect.
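
One practical consequence: negatively worded items need reverse-coding so that a higher number always means “more positive.” A minimal sketch, with a hypothetical 1–5 item:

```python
# Minimal sketch: reverse-coding a negatively worded item on a 1-5 scale
# so that higher consistently means "more positive". Data is hypothetical.
def reverse_code(score: int, scale_max: int = 5) -> int:
    return (scale_max + 1) - score

# "The app was confusing": strongly agree (5) is a *negative* signal,
# so flip it to align with the rest of the dataset.
raw = [5, 4, 2, 1, 3]
aligned = [reverse_code(s) for s in raw]
print(aligned)  # [1, 2, 4, 5, 3]
```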

3. Examine Central Tendency and Variability

The central tendency gives a general idea of the dataset’s center, while variability shows how spread out the values are: high variability indicates diverse opinions, while low variability suggests a consensus.

4. Identify Response Patterns

Analyze patterns and it’ll help you understand trends and commonalities. For instance, if most participants rate a service a “4” or a “5,” that indicates high satisfaction levels (great!), but, on the other hand, a whole tangle of dispersed ratings suggests mixed feelings.

5. Make Comparative Analyses

Compare data over time or against different datasets and you can draw conclusions about changing attitudes or opinions. These ultra-handy analyses can help you spot what’s working and what’s not, and so what needs improvement, which can then lead on to more user-centered design decisions.
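
Here’s a minimal Python sketch of such a comparison: summary statistics for hypothetical ratings gathered before and after a redesign (a formal significance test, e.g., via scipy, would add rigor):

```python
# Minimal sketch: comparing rating distributions before and after a
# redesign. The ratings are hypothetical.
from statistics import mean, stdev

before = [3, 2, 4, 3, 3, 2, 4, 3]
after = [4, 5, 4, 3, 5, 4, 4, 5]

for label, data in (("before", before), ("after", after)):
    print(f"{label:>6}: mean={mean(data):.2f}, sd={stdev(data):.2f}, n={len(data)}")
```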

6. Do Filtering and Cross-tabulation

Filtering is helpful because it focuses on specific groups, like responses from a particular age group or gender, and that shines a light on specific segments of the population. Cross-tabulation, meanwhile, compares two or more variables so you get to understand the relationships between them, something that can reveal how different groups perceive an issue in relation to others.
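
Here’s a minimal sketch of both techniques using pandas (assumed installed); the columns and values are hypothetical:

```python
# Minimal sketch: filtering and cross-tabulating rating data with pandas.
# The data frame and its values are hypothetical.
import pandas as pd

df = pd.DataFrame({
    "age_group": ["18-24", "25-34", "18-24", "35-44", "25-34", "18-24"],
    "rating": [5, 3, 4, 2, 4, 5],
})

# Filtering: focus on one segment.
young = df[df["age_group"] == "18-24"]
print(f"18-24 mean rating: {young['rating'].mean():.2f}")  # 4.67

# Cross-tabulation: how ratings distribute across segments.
print(pd.crosstab(df["age_group"], df["rating"]))
```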

7. Have Visual Representations

Visuals, such as bar graphs, pie charts, or histograms, can make data more digestible, and they offer a quick way for you to grasp the essence of the findings.
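
For a quick visual, here’s a minimal sketch with matplotlib (assumed installed) that charts a hypothetical distribution of 1–5 ratings as a bar graph:

```python
# Minimal sketch: a bar chart of a rating distribution with matplotlib.
# The ratings are hypothetical.
from collections import Counter
import matplotlib.pyplot as plt

ratings = [5, 4, 4, 5, 3, 4, 5, 2, 4, 4, 1, 5, 3, 4]
counts = Counter(ratings)

plt.bar(list(counts.keys()), list(counts.values()))
plt.xlabel("Rating (1-5)")
plt.ylabel("Number of respondents")
plt.title("Distribution of satisfaction ratings")
plt.show()
```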

8. Get Beyond Numbers: Insights and Stories

It’s not enough to present numbers at face value; the real value lies in interpreting those numbers, identifying problems, offering solutions, and sharing the stories the data tells. For example, instead of merely stating that 70% of respondents rated a product a 5, it’s imperative to understand why they loved it and how it stood out to them as a “wow” item.

Rating scales can provide you with valuable quantitative insights, sure enough, but for a holistic understanding, you can complement them with qualitative methods like open-ended questions—and you’ll find that combining numbers, patterns, and stories makes for impactful conclusions that can drive improvements and strategic decisions.

The Take Away

Rating scales in UX research have emerged as a handy, even primary, method for gathering quantitative data from users, thanks to their simplicity and effectiveness, though depending on their design they can have qualitative aspects, too. Scales offer a clear distinction between user experiences, and they allow designers and developers to identify areas of strength and weakness in their product or service.

Rating scales are straightforward for participants to understand and use, and well-designed scales increase the accuracy of responses and minimize confusion. What’s more, while compact, scales offer enough gradation to capture varying degrees of satisfaction, and ranged scales provide more nuanced insights than a binary system. Speaking of satisfaction, rating scales establish clear benchmarks for user satisfaction, and teams can measure the impact of changes and improvements over time. Last, but not least, you’ll need to collate, visualize, and interpret the data you collect, so you’ve got solid components to guide more informed decision-making in the design process.

References and Where to Learn More

Learn about the best practices of qualitative user research

Dig into UX design

Grab a notepad and start learning about UX research

Have your thinking cap handy? Read more about Design thinking here

