Academic Writing Advice
ServiceScape Incorporated
2021

How to Write a Thematic Analysis

ScienceEditor

Thematic analysis is a method of analyzing qualitative data (such as survey responses, interview transcripts, or social media profiles) to identify common themes that come up repeatedly. The data are "coded," or labeled, so that similar responses can be grouped together to facilitate further analysis (sticky notes are a must). For example, survey responses about online learning during a pandemic might be coded for "slow internet connection," "frequent interruptions," and "lack of peer interactions."

Thematic analysis is commonly used in psychology research and is appreciated for its flexibility and ease of use. Despite its widespread use, the steps involved in thematic analysis were not formally described until 2006. The guidelines published that year by Braun and Clarke have been widely adopted, and had an estimated 90,000-plus citations as of January 2021.

The published guidelines for thematic analysis include six steps:

  1. Familiarizing yourself with the data
  2. Generating initial codes
  3. Searching for themes
  4. Reviewing themes
  5. Defining and naming themes
  6. Producing the report

Braun and Clarke emphasize that researchers must make active choices about their methods of analysis and must clearly describe those choices and methods in their manuscripts. Otherwise, it is difficult to evaluate the research or to compare it with other studies. Braun and Clarke come out staunchly against the idea that themes "emerge" from data. Rather, they argue that researchers must actively identify patterns and themes, select those of interest, and report them to readers. By doing so, researchers can analyze qualitative data in a deliberate and rigorous way. At a minimum, thematic analysis allows you to organize your data set and describe it in rich detail. It can also help you better interpret your research topic.

Things to know before getting started

Some definitions

  • Data corpus: All data collected for a particular research project
  • Data set: All data from the corpus that are being used for a particular analysis
  • Data item: An individual piece of data collected (e.g. an individual interview)
  • Data extract: An individual coded chunk of data (e.g. a quote from an interview)

Questions that should be considered before data is analyzed (or even collected)

  • Breadth or depth of coverage? If you aim to provide a description of your entire data set (breadth), some of the nuances within the data will necessarily be lost. However, this can be an effective approach when the topic is not well understood. Alternatively, you can aim to gain a detailed understanding of one or a few specific aspects of your data (depth).
  • Inductive or deductive approach to thematic analysis? Adopting an inductive approach means that you will allow your research question to evolve as you analyze and code the data. A deductive approach (also called a theoretical thematic analysis) involves coding the data with a specific research question in mind.
  • Semantic or latent approach? A semantic approach involves identifying themes based entirely on the words used by the participants (e.g. in a survey, interview, website, etc.). Once the data is coded, organized, and described, the researcher can provide further interpretation. Alternatively, a latent approach allows the researcher to code data items based on what the item reveals about a participant's underlying ideas, patterns, and assumptions.

Let's now consider the six steps of thematic analysis.

Step 1: Familiarizing yourself with your data

Before you can effectively code your data, you must become intimately familiar with it. Immerse yourself in the entire data set by actively reading through it at least once, searching for meanings and possible patterns. Importantly, you should read with an open mind, and not just search for data that support your hypothesis or assumptions. This process of actively reading and re-reading the data is time-consuming, but it is the foundation for any meaningful interpretation.

Verbal data (e.g., interviews, recorded presentations, etc.) need to be transcribed in order to be used for thematic analysis. Transcribing data is an excellent way to become familiar with it. The act of transcription should be considered an interpretive act, especially for spontaneous responses (e.g., an interview compared to a recorded presentation). The simple placement of punctuation marks can dramatically alter the meaning of words (e.g., "Let's eat Bob" is very different from "Let's eat, Bob"). If you are provided with data that has already been transcribed, you should check the transcription against the original recording to ensure its accuracy. Listening to the original recording will also improve your understanding of the data.

As you familiarize yourself with your data, you can start to better focus your research question, and jot down ideas for coding. The process of coding, and improving your codes to better address your research question, will continue through the entire analysis.

Step 2: Generating initial codes

Codes identify a portion of the data of potential interest to the researcher. The relevant text is highlighted and labeled with a "code" that is short and descriptive. For example, survey responses about online learning during a pandemic might be coded for "frequent interruptions" when the text includes a mention of being interrupted by family members, the doorbell, internet lapses, etc.

Coding can be done manually (with highlighters, colored pens, sticky notes, etc.) or on a computer. You need to work systematically through your entire data set, giving equal attention to each data item, whether or not it supports your hypothesis. Add new codes as necessary (e.g., "slow internet connection" and "lack of peer interactions"). It is better to add a code that might not prove helpful than to have to return to your data to add it later. An individual section of text might remain uncoded (e.g., if unrelated to your research question about online learning), be coded once (e.g., for "frequent interruptions" from family members), or be coded multiple times (e.g., for "frequent interruptions" due to "slow internet connection"). As you work through your data, you may find you need to combine or subdivide your codes (e.g., add a new code for "interruption by sibling").

You are summarizing your data with these codes, so be thorough in your work.
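
If you code on a computer, one way to keep track of your work is to store each coded passage together with its labels. The following is only a minimal sketch in Python, not a prescribed tool: the class name, the function name, the response IDs, and the example passages are all invented for illustration. It shows an extract that is coded once, one that is coded twice, and one that is left uncoded.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class DataExtract:
    """A chunk of text from one data item, plus any codes applied to it."""
    item_id: str                              # the data item the text came from
    text: str                                 # the highlighted passage itself
    codes: set = field(default_factory=set)   # zero, one, or several codes


# Hypothetical extracts, invented for illustration only.
extracts = [
    DataExtract("resp-01", "My little brother kept walking in during class.",
                {"frequent interruptions"}),
    DataExtract("resp-02", "The video froze every few minutes and I lost my place.",
                {"frequent interruptions", "slow internet connection"}),
    DataExtract("resp-03", "I miss studying with my friends.",
                {"lack of peer interactions"}),
    DataExtract("resp-04", "I had cereal for breakfast."),  # left uncoded
]


def group_by_code(extracts):
    """Map each code to its extracts, keeping the text alongside the code."""
    grouped = defaultdict(list)
    for extract in extracts:
        for code in extract.codes:
            grouped[code].append(extract)
    return dict(grouped)


by_code = group_by_code(extracts)
for code in sorted(by_code):
    print(f"{code}: {[e.item_id for e in by_code[code]]}")
```

Keeping the original text attached to every code, rather than storing the code alone, makes it easier to re-read the passages later when you combine or subdivide codes.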

Step 3: Searching for themes

Once you have completed the initial coding of your data, you can organize your data into groups. Include the relevant text, not just the code, to facilitate searching for themes. Consider how different codes may contribute to an overarching theme. For example, you may find that the codes "slow internet connection," "financial stress," and "no help with schoolwork" commonly occur together. You might then choose to place these codes within a larger theme of "low-income students face additional challenges with online learning." You might consider adding "financial stress" as a sub-theme. The code "lack of peer interactions" might fit within a larger theme of "mental health stressors." Some codes may occur too infrequently to be included, may be too vague, or may not be relevant to your evolving research question.
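
If your codes are stored electronically, a simple tally can point you toward codes that commonly occur together. The sketch below is one possible approach, assuming you have recorded the set of codes applied to each data item; the response IDs and code assignments are hypothetical. A count like this is only a prompt for your own judgment: deciding which patterns amount to a theme remains an active choice by the researcher.

```python
from collections import Counter
from itertools import combinations

# Hypothetical coding results: the codes applied to each data item.
codes_by_item = {
    "resp-01": {"slow internet connection", "financial stress", "no help with schoolwork"},
    "resp-02": {"slow internet connection", "financial stress"},
    "resp-03": {"lack of peer interactions"},
    "resp-04": {"financial stress", "no help with schoolwork", "lack of peer interactions"},
}


def count_cooccurrences(codes_by_item):
    """Count the data items in which each pair of codes appears together."""
    pair_counts = Counter()
    for codes in codes_by_item.values():
        # Sort first so each pair is always stored in the same order.
        for pair in combinations(sorted(codes), 2):
            pair_counts[pair] += 1
    return pair_counts


pair_counts = count_cooccurrences(codes_by_item)
for (first, second), count in pair_counts.most_common():
    if count >= 2:  # arbitrary cutoff chosen for this illustration
        print(f"{first} + {second}: {count} items")
```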

This step provides candidate themes that will need to be refined, so don't exclude any data or possibilities just yet. In the next step, reviewing all the data extracts will help you determine how the themes might need to be combined, separated, or discarded.

Step 4: Reviewing themes

Now that you have a set of candidate themes, you need to revise them to accurately reflect your data and your research question. This may involve eliminating themes that are not well supported, creating new themes, combining closely related themes, and dividing overly broad themes into multiple themes.

You will need to review your data at two levels. First, you should review the coded data extracts for each theme, and determine whether they are sufficiently coherent. A lack of coherence may mean that the theme itself is flawed, or that some of the data extracts do not fit within it. You may need to rework the theme, re-code some data to create finer divisions, or remove a code from a theme.

Next, you need to evaluate whether your themes accurately reflect your entire data set. You should re-read your entire data set to determine whether your themes fit the data set, and to code any relevant data that may have been missed in earlier rounds of coding.
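
Part of this review can be supported with a quick check of how much data stands behind each candidate theme, and which codes have not yet been placed in any theme. The following is a rough sketch under stated assumptions: the theme names, codes, and response IDs are hypothetical, and the two helper functions are invented for this example. It does not replace re-reading the extracts themselves.

```python
# Hypothetical candidate themes, each listing the codes assigned to it.
themes = {
    "low-income students face additional challenges": {
        "slow internet connection", "financial stress", "no help with schoolwork"},
    "mental health stressors": {"lack of peer interactions"},
    "positive experiences": {"flexible schedule"},
}

# Hypothetical coded extracts as (data item id, code) pairs.
coded_extracts = [
    ("resp-01", "slow internet connection"),
    ("resp-01", "financial stress"),
    ("resp-02", "financial stress"),
    ("resp-03", "lack of peer interactions"),
    ("resp-04", "frequent interruptions"),
]


def summarize_support(themes, coded_extracts):
    """Per theme, count its extracts and list the distinct data items behind it."""
    summary = {}
    for theme, codes in themes.items():
        matching = [(item, code) for item, code in coded_extracts if code in codes]
        summary[theme] = {
            "extracts": len(matching),
            "items": sorted({item for item, _ in matching}),
        }
    return summary


def unassigned_codes(themes, coded_extracts):
    """List codes used in the data that do not belong to any theme yet."""
    themed = set().union(*themes.values())
    return sorted({code for _, code in coded_extracts if code not in themed})


summary = summarize_support(themes, coded_extracts)
leftover = unassigned_codes(themes, coded_extracts)
for theme, support in summary.items():
    print(f"{theme}: {support['extracts']} extracts from {support['items']}")
print("Codes not yet in a theme:", leftover)
```

A theme with no supporting extracts is a candidate for elimination, while a code that sits outside every theme may call for a new theme or for a decision to set it aside.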

Step 5: Defining and naming themes

Formulate exactly what you mean by each theme, and examine how it helps one understand your data. Devise a concise and informative name for each theme. Take the data extracts for each theme, organize them so that they are easy to follow, and add an accompanying narrative to explain why the data are interesting and how they support the theme.

Step 6: Producing the report

At this point, you should have produced quite a bit of writing as you have moved through steps 1 through 5. You should have your coded data extracts, your themes, and some narrative explaining how your data support your themes. For your final report, you should organize this information according to the guidelines of your graduate program or target journal, and write a compelling story that convinces the reader of the validity of your analysis. Choose examples from your data that clearly and memorably illustrate your main points. Go beyond presenting your data, and make an argument that could influence policy, or serve as the basis for future studies.