Main

Humans have an innate tendency to compare themselves to others1,2. Social comparison—thinking about social information in relation to the self3—is ubiquitous in human cognition4. Comparing oneself to others serves self-motives and is involved in various social behaviours and coping processes5,6,7,8,9. Accordingly, social comparison exerts a fundamental influence on people’s private, public and collective behaviour7.

Social comparison is a process that involves selecting a social standard (for example, another individual), evaluating (dis-)similarities between the self and the standard on a particular dimension (for example, energy usage), and reacting to the comparison outcome (for example, behavioural adaptation or maintenance)5. Social comparison standards can be perceived as superior (upward), similar (lateral) or inferior (downward) to the self. Various intra-personal factors such as self-motives10, cognitive resources, perceived malleability of the dimension or self-efficacy beliefs11 can affect the cognitive, affective and behavioural reactions to social comparison. Similarly, contextual factors can influence social comparison processes. For instance, social comparison increases in the wake of uncertainty or threat6,12,13. This relates to both collective crises (for example, climate change14) and private challenges (for example, major health threats13). Crucially, social comparison has been associated with both adaptive and maladaptive coping behaviour. For example, during the COVID-19 pandemic, social comparison was related to increased risk-reduction behaviour15 and increased well-being16. Yet, frequent upward social comparison has been associated with poorer mental and physical health outcomes13,17,18.

The potential of social comparison as a behaviour change technique (SC-BCT) to increase desired behaviours (for example, recycling or sunscreen use)19,20,21 or to decrease undesired behaviours (for example, the use of finite resources or alcohol consumption)19,22,23 has been investigated in numerous randomized controlled trials (RCTs) and some domain-specific meta-analytic syntheses thereof. Attention to SC-BCTs has increased substantialy during the past few decades19,24,25,26,27,28,29,30,31. Yet, a systematic review and meta-analysis on the use and efficacy of SC-BCTs across the behavioural sciences is lacking. The present work attempts to fill this gap by means of a comprehensive systematic review and meta-analysis summarizing RCTs across the behavioural sciences. We were interested in how social comparison was used as a BCT in RCTs and to what effect. Accordingly, our work attempted to answer the following two main research questions: (1) How effective are SC-BCTs when applied as the primary intervention? (2) How effective are SC-BCTs when added to a BCT bundle? To this end, we aimed to include only literature explicitly referring to social comparison or related terms.

Results

Included trials

The PRISMA flowchart provides an overview of the study synthesis (Fig. 1). Of the identified 18,058 unique hits after duplicate deletion, 17,314 were excluded on the basis of screened title and abstracts for not meeting inclusion criteria. One potentially relevant full text was inaccessible. Of the 744 records entering the full-text screening, 669 were excluded for not meeting inclusion criteria. Twelve of these32,33,34,35,36,37,38,39,40,41,42,43 were principally eligible but did not report data in a usable format to calculate Hedges’ g, and the primary authors did not respond to at least two data request emails. Two other principally eligible publications44,45 targeted preventing undesired behaviour from occurring (that is, alcohol consumption on a specific future occasion) and included various participants who did not engage in the undesired behaviour at baseline (that is, abstainers) and were thus excluded. Lastly, four other publications46,47,48,49 reported on RCTs that assessed only long-term cognitive or affective change following SC-BCTs but not behaviour change, and were thus excluded. A short summary of each of these 18 aforementioned publications is provided in Supplementary Appendix C. The exclusion of all other full texts was straightforward. In total, 74 publications reporting on 79 independent RCTs were included in the present meta-analysis20,21,23,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100,101,102,103,104,105,106,107,108,109,110,111,112,113,114,115,116,117,118,119,120. Five publications50,51,52,86,113 each reported on two independent eligible RCTs. We received data for 26 trials21,23,50,52,53,54,58,59,61,68,80,81,93,99,102,105,110,111,113,115,116,117,118,119 via email communication.

Fig. 1: Study process.
figure 1

PRISMA flow chart depicting the study synthesis process.

Full size image

Meta-analytic synthesis

Basic trial characteristics

Supplementary Appendix D provides an overview of the characteristics of the trials included in the meta-analytic synthesis. The 79 RCTs involved a total of N = 1,356,521 participants. A total of 71 RCTs investigated the efficacy of SC-CBTs as the primary intervention (that is, the first research question), whereas 8 RCTs72,75,82,98,100,103,104,109 investigated the efficacy of SC-BCTs as an add-on intervention (that is, the second research question). One dismantling RCT investigated both research questions109. Of those assessing the efficacy of SC-BCTs as the primary intervention, 14 independent RCTs reported in 13 publications20,63,64,65,73,74,78,85,86,90,91,96,99 investigated SC-BCTs as a stand-alone BCT. The remaining 57 independent RCTs reported in 54 publications21,23,50,51,52,53,54,55,56,57,58,59,60,61,62,66,67,68,69,70,71,76,77,79,80,81,83,84,87,88,89,92,93,94,95,97,101,102,105,106,107,108,109,110,111,112,113,114,115,116,117,118,119,120 investigated SC-BCTs in conjunction with other BCTs. Provision of intra-individual feedback on the behavioural dimension, alongside social feedback on the same dimension, delivered to participants on two or more occasions (that is, enabling both temporal and social comparison) was the most common complementary BCT accompanying the SC-BCT among the 56 RCTs investigating SC-BCT as the primary (but not stand-alone) intervention. Most data were from the USA and other high-income countries. On average (that is, unweighted mean across the 79 RCTs), short-term assessments took place (or covered for continuous assessments) about 3.7 months post-intervention (mean, 110 days; s.d., 193 days). Long-term assessments took place (or covered for continuous assessments) about 6.2 months post-intervention (mean, 187 days; s.d., 179 days) on average (that is, unweighted mean) across the 23 RCTs assessing behaviour change more than once. While passive control conditions were homogenous (that is, assessments only), active control conditions varied considerably, with intra-individual feedback on the target dimension being the most commonly applied active control condition. Mean age ranged from 9.3 to 65.4 years with a weighted mean across trials of 38.92 years (s.d., 14.50 years). While some studies included a large age range (for example, 17 to 74 years23), others included a selective age range such as adolescents and young adults only (for example, 16 to 20 years69) or older adults (60+ years old)76. About half of the trials (48% or k = 38) involved student/pupil samples. While usually referred to as convenience samples, many of these trials involved very large samples (for example, up to 105 schools in one study103). Most other trials involved general population samples, with most of these investigating effects on climate change mitigation behaviour such as water or electricity usage on a population level54,62,64,67 and some of these investigating behaviour change on a population level such as behaviour change regarding traffic violations23. Lastly, a few studies targeted other selective populations (that is, non-student) such as primary care physicians97. Two studies involved female participants only88,96, and one study involved male participants only69; all other studies involved mixed-gender samples. Thirty-eight trials (48%) used online methods (for example, email or apps such as leaderboards) to apply SC-BCTs. In 12 trials (15%), the SC-BCT was applied personally in a lab setting. Another 16 trials (20%) sent SC-BCT letters home. Other studies sent SC-BCT letters to schools (k = 7; 9%) or work environments (k = 5; 6%). One trial (k = 1; 1%) applied the SC-BCT in a hospital (that is, to inpatients before discharge)79.

Risk of bias

Figure 2 provides all risk-of-bias assessments. In most trials (k = 63; 80%), some concern of bias emerged from the randomization process. In eight trials (10%), the risk of bias was judged to be low. Conversely, eight trials (10%) were judged to be at high risk of bias due to concern with randomization. Half of the trials (k = 41; 52%) had low risk of bias due to deviations from the intended interventions, 29% of trials (k = 23) had some concern and 19% of trials (k = 15) had high risk of bias. Most studies (k = 65; 82%) had low risk of bias due to missing outcome data, and the remaining trials had high risk of bias. Most studies (k = 74; 94%) had low risk of bias arising from the outcome assessment, and the remaining trials had high risk of bias. Lastly, most trials (k = 66; 84%) had some concern of bias regarding selection of the reported result(s) given that preregistrations and prespecified analysis protocols were rare. The remaining 13 trials (16%) had low risk of bias regarding selection of the reported result(s).

Fig. 2: Risk-of-bias assessments.
figure 2

Risk of bias of the included studies.

Full size image

Results for research question 1

Efficacy of SC-BCTs compared to passive control conditions

Table 1 presents the results of all overarching analyses. In the short term, SC-BCTs produced a significant (P < 0.001) small effect in terms of behaviour change in the intended direction relative to passive control conditions (g = 0.17; 95% confidence interval (CI), 0.11–0.23; k = 37; I2 = 94%; Fig. 3). This analysis involved 578,792 independent participants examined across 37 RCTs. High and significant heterogeneity in outcomes was found. The results remained similar after two outliers were removed (g = 0.15; 95% CI, 0.09–0.21; k = 35; I2 = 94%). Certainty of evidence was rated as low due to concern about risk of bias and considerable significant unexplained heterogeneity between outcomes. Risk of bias emerged from insufficient reporting of the randomization process, deviations from the intended intervention (for example, study participants or interventionists were likely to be aware of the assigned intervention), missing outcome data (for example, data were not available for all randomized participants) or lack of predefined analysis protocols in several studies. Significant unexplained heterogeneity was addressed by multiple sub-analyses. Non-important heterogeneity was found for four sub-analyses: intended upward SC-BCTs (across outcomes), desired outcomes only, health outcomes only and performance outcomes only. No concern regarding indirectness emerged from the selection of the population (that is, the majority of trials presented directness regarding population, whereas ten trials presented probable indirectness regarding population because they investigated a relatively specific sample without providing a rationale for its selection—for example, examining the efficacy of a SC-BCT for climate change mitigation behaviour in a student sample). No concern of indirectness emerged regarding intervention (that is, social comparison and SC-BCT definitions were always met; see inclusion criteria), comparator (that is, assessment only in passive control conditions) and outcome (that is, all behavioural outcomes matched the given social comparison dimension in SC-BCTs). All directness (versus indirectness) ratings of evidence for each included RCT using GRADE criteria are detailed in Supplementary Appendix E (that is, for this analysis as well as all other overarching analyses). The CI for the pooled effect excluded the null, signalling confidence in a significant behaviour change in the desired direction. Egger’s test did not indicate significant small study effects (that is, publication bias was deemed unlikely). Most studies targeted a decrease in undesired behaviour (k = 32), whereas only a few studies targeted an increase in desired behaviour (k = 5). Most studies investigated climate change mitigation behaviour (k = 20), followed by health behaviour (k = 13) and performance behaviour (k = 4).

Table 1 Efficacy of SC-BCT
Full size table
Fig. 3: Forest plot depicting the efficacy of SC-BCT relative to passive control conditions.
figure 3

BC, behavioural change; RE model, random-effects model; SC, social comparison. For each study, the black square represents the effect size (standardized mean difference, Hedges’ g), and the horizontal bars represent the 95% CI. The size of the black squares is proportional to the sample size of the given RCT. The diamond denotes the 95% CI of the pooled effect across the N = 37 independent RCTs, and the error bars of the diamond denote the corresponding 95% PI. Positive (negative) Hedges’ g values indicate a higher (lower) efficacy regarding behaviour change (that is, a larger increase in desired behaviour and a larger decrease in undesired behaviour) in the SC-BCT arms than in the passive control condition arms.

Full size image

In the long term, data availability was considerably more limited (k = 13). The results remained similar to those in the short term. In the overarching analysis across all data at follow-up, SC-BCTs produced a highly significant (P < 0.001) small effect relative to passive control conditions (g = 0.10; 95% CI, 0.06–0.13; k = 13; I2 = 0%). Heterogeneity in outcomes was low and non-significant. The results remained very similar after one outlier was excluded (g = 0.10; 95% CI, 0.06–0.13; k = 12; I2 = 0%). Certainty of evidence was rated to be moderate due to concern about risk of bias, such as insufficient reporting of the randomization process, missing outcome data (for example, data were not available for all randomized participants) or missing predefined analysis protocols in several studies. Statistical analyses revealed non-important and non-significant heterogeneity between the outcomes. No concern of indirectness emerged regarding population (that is, the majority of trials presented directness regarding population, whereas three trials presented probable indirectness regarding population due to investigating a rather specific sample without presenting a rationale for its selection). Similarly, no concern of indirectness emerged in relation to intervention (that is, social comparison and SC-BCT definitions were always met), comparator (that is, assessment only in passive control conditions) and outcome (that is, all behavioural outcomes matched the given social comparison dimension in SC-BCTs). The CI for the pooled effect excluded the null, signalling confidence in a significant change in behaviour in the desired direction. Egger’s test did not indicate significant small study effects (that is, publication bias was deemed unlikely). Most trials reporting follow-up data targeted health (k = 5) or climate change mitigation behaviour (k = 6).

Efficacy of SC-BCTs compared to active control conditions

Table 1 displays the results from the overarching analyses. The most commonly used control condition involved the provision of intra-individual feedback on the behavioural dimension. Other examples of active control conditions were shaping knowledge (for example, instructions on how to lower electricity usage) and goal setting (for example, setting a goal concerning lowing electricity usage). See Supplementary Appendix D (column 9) for all active control conditions. In the overarching analysis in the short term, SC-BCTs were also significantly (P < 0.001) more efficacious in changing behaviour in the intended direction relative to active control conditions in the short term, with a small pooled effect (g = 0.23; 95% CI, 0.15–0.31; k = 42; I2 = 96%; Fig. 4). This analysis involved 148,233 independent participants examined across 42 RCTs. Heterogeneity in outcomes was high and significant. The results remained similar when two statistical outliers were removed. Egger’s test indicated significant small study effects (that is, potentially due to publication bias), and the trim-and-fill method added ten studies to the left to establish symmetry. The results remained similar (g = 0.15; P < 0.01; 95% CI, 0.05–0.25; k = 52; I2 = 98%). Certainty of evidence was rated very low due to concern about risk of bias, considerable unexplained heterogeneity between outcomes and potential publication bias. Risk of bias emerged from insufficient reporting of the randomization process, deviations from the intended intervention (for example, participants or people delivering the intervention were likely to be aware of the assigned intervention), missing outcome data (for example, data were not available for all randomized participants) or lack of predefined analysis protocols in several studies. Non-important heterogeneity was found for two sub-analyses: clearly upward SC-BCTs (across outcomes) and service outcomes only. No concern of indirectness emerged regarding population (that is, the majority of trials presented directness regarding population, whereas 16 trials presented probable indirectness regarding population due to investigating a rather specific sample without presenting a rationale for its selection). Similarly, no concern of indirectness emerged regarding intervention (that is, social comparison and SC-BCT definitions were always met), comparator (that is, active control conditions as a comparator regarding behaviour change) and outcome (that is, all behavioural outcomes matched the given social comparison dimension in SC-BCTs). The CI for the pooled effect excluded the null, signalling confidence in a significant behaviour change in the desired direction. As mentioned above, concern regarding a potential publication bias emerged from significant small study effects detected by Egger’s test. Most studies targeted an increase in desired behaviour (k = 27), whereas 14 RCTs targeted a decrease in undesired behaviour. The highest number of studies investigated health behaviour (k = 14), followed by performance behaviour (k = 12), climate change mitigation behaviour (k = 9) and service participation behaviour (k = 7).

Fig. 4: Forest plot depicting the efficacy of SC-BCT relative to active control conditions.
figure 4

For each study, the black square represents the effect size (standardized mean difference, Hedges’ g), and the horizontal bars represent the 95% CI. The size of the black squares is proportional to the sample size of the given RCT. The diamond denotes the 95% CI of the pooled effect across the N = 42 independent RCTs, and the error bars of the diamond denote the corresponding 95% PI. Positive (negative) Hedges’ g values indicate a higher (lower) efficacy regarding behaviour change (that is, a larger increase in desired behaviour and a larger decrease in undesired behaviour) in the SC-BCT arms than in the active control condition arms.

Full size image

In the long term, data availability was considerably thinner (k = 6). The results remained similar to the short-term results. SC-BCTs produced a significant (P < 0.05) small effect relative to active control conditions (g = 0.24; 95% CI, 0.03–0.45; k = 6; I2 = 77%). Heterogeneity in outcomes was high and significant. No outliers were identified. Certainty of evidence was rated low due to concern about risk of bias and considerable significant unexplained heterogeneity between outcomes. Risk of bias emerged from insufficient reporting of the randomization process, deviations from the intended intervention (for example, participants or people delivering the intervention were likely to be aware of the assigned intervention), missing outcome data (for example, data were not available for all randomized participants) or lack of predefined analysis protocols in several studies. Data availability allowed for two sub-analyses (desired outcomes only and non-restricted SC-BCTs only), which revealed similar significant unexplained heterogeneity. No concern of indirectness emerged regarding population (that is, the majority of trials presented directness regarding population, whereas two trials presented probable indirectness regarding population due to investigating a rather specific sample without presenting a rationale for its selection). Similarly, no concern of indirectness emerged regarding intervention (that is, social comparison and SC-BCT definitions were always met), comparator (that is, active control conditions as a comparator regarding behaviour change) and outcome (that is, all behavioural outcomes matched the given social comparison dimension in SC-BCTs). The CI for the effect excluded the null, signalling confidence in a significant change in behaviour in the desired direction. Egger’s test was infeasible (k < 10). Most trials reporting follow-up data targeted desired behaviour (k = 4) and applied SC-BCTs not restricted in direction (k = 5).

Moderator analyses

Table 2 displays all moderator analysis results. In the overarching analysis comparing SC-BCTs to passive controls in the short term, no significant moderations were observed. In the analysis comparing SC-BCTs to active controls, a higher number of SC-BCT sessions (k = 33, β = 0.02, P = 0.007) and targeting a desired (versus undesired) behavioural outcome (k = 41, β = 0.19, P = 0.017) were both related to higher short-term efficacy of SC-BCTs. Unexplained heterogeneity in outcomes remained significant and high for all analyses, irrespective of the significance of moderation. No evidence was found for a significant difference in the efficacy of upward versus non-restricted SC-BCTs, nor for SC-BCTs targeting health versus climate change mitigation behaviours, nor for SC-BCTs targeting health versus performance behaviours.

Table 2 Moderators of the efficacy of SC-BCT
Full size table

Sub-analyses

Table 3 displays the results from the sub-analyses. In the short term and relative to passive control conditions, both SC-BCTs targeting undesired behaviour (k = 32) and SC-BCTs targeting desired behaviour (k = 5) yielded highly significant (all P < 0.001) small effects when analysed in isolation. When studies investigating climate change mitigation behaviour (k = 20), health behaviour (k = 13) and performance behaviour (k = 4) were analysed in isolation, all three produced highly significant (all P < 0.01) small effects. Most studies applied non-restricted SC-BCTs (k = 24) or upward SC-BCTs (k = 14), and when analysed in isolation both yielded highly significant (all P < 0.001) small effects. Too few trials investigated other directions (for example, downward SC-BCTs) for isolated review. About an even number applied clearly upward SC-BCTs (k = 7) versus intended upward SC-BCTs (k = 7; that is, upward for most but not all participants), with both yielding significant small effects when analysed in isolation. Studies investigating service participation behaviour were too few for isolated synthesis (k < 4). In the long term, data availability was scarce. When SC-BCTs targeting health (k = 5) or climate change mitigation behaviour (k = 6) were analysed in isolation, only SC-BCTs targeting climate change mitigation behaviour (but not health) produced a significant (P < 0.001) small-sized long-term effect. No other sub-analyses were feasible (k < 4).

Table 3 Efficacy of SC-BCT: sub-analyses
Full size table

In the short term and relative to active control conditions, both SC-BCTs targeting an increase in desired behaviour (k = 27) and SC-BCTs targeting a decrease in undesired behaviour (k = 14) yielded highly significant (all P < 0.01) small effects when analysed in isolation. One trial reported a primary outcome that mixed desired and undesired behaviour (that is, aggregated sun protection behaviours such as sunscreen use as well as sun exposure) and was consequently not included in the isolated syntheses114. When SC-BCTs targeting health behaviour (k = 14), performance behaviour (k = 12), climate change mitigation behaviour (k = 9) or service participation behaviour (k = 7) were analysed in isolation, all produced significant small effects. Most studies applied non-restricted SC-BCTs (k = 27), followed by upward SC-BCTs (k = 15) and then downward SC-BCTs (k = 6). Only the former two (but not downward SC-BCTs) produced significant (small) effects relative to active control conditions. Clearly upward (k = 8) and intended upward SC-BCTs (k = 7) both produced significant small effects. At follow-up, data availability was scarce (k = 6). There was insufficient evidence for isolated review of behavioural domains (k < 4). Only two isolated reviews were feasible, with a low number of included trials and limited power to detect effects. When SC-BCTs targeting desired behaviour were analysed in isolation, no evidence was found for a significant long-term effect relative to active control conditions (g = 0.34; 95% CI, −0.02 to 0.70; k = 4; I2 = 82%). Similarly, non-restricted SC-BCTs analysed in isolation also yielded no significant long-term effect when compared to active controls (g = 0.21; 95% CI, −0.01 to 0.43; k = 5; I2 = 79%).

Relative efficacy of different types of SC-BCTs

Table 4 shows the results on the relative efficacy of different types of SC-BCTs. The number of trials was low (k = 9), limiting statistical power to detect significant differences. Across this thin evidence base, no evidence was found for significant differences in the short-term efficacy of upward SC-BCTs versus downward SC-BCTs, upward SC-BCTs versus non-restricted SC-BCTs, or SC-BCTs presenting more versus less attainable social comparison standards. In the long term, however, SC-BCTs presenting more (versus less) attainable social comparison standards were superior (P < 0.01), with a small pooled effect (g = 0.18; 95% CI, 0.05–0.31; k = 4; I2 = 0%). See Supplementary Appendix E for a detailed description of certainty-of-evidence assessments for each of the abovementioned analyses. Certainty of evidence ranged from very low to moderate, depending on the analysis.

Table 4 Efficacy of varying SC-BCTs directly compared in trials
Full size table

Results for research question 2

Efficacy of social comparison as an add-on BCT

Table 5 shows the results for research question 2. The number of trials was low (k = 8), limiting statistical power to detect significant differences. In both the overarching analysis and the sub-analysis (that is, upward SC-BCTs only), differences in efficacy between BCT bundles with and without the SC-BCT were non-significant. Certainty of evidence of the overarching analysis was rated very low due to concern about risk of bias, considerable significant unexplained heterogeneity between outcomes and imprecision. See Supplementary Appendix E for a detailed description of ratings of indirectness using GRADE criteria.

Table 5 Efficacy of social comparison as an add-on BCT
Full size table

Discussion

We conducted a comprehensive meta-analysis covering data from RCTs on the efficacy of SC-BCTs across the behavioural sciences. In 79 RCTs, we found evidence supporting the efficacy of SC-BCTs in shaping behaviour in the desired direction, albeit with small magnitudes of pooled effects. We found evidence across types of control condition (that is, passive controls and active controls), behavioural domains (that is, health behaviour, climate change mitigation behaviour, performance behaviour and service participation behaviour), SC-BCTs focusing on increasing desired behaviours, SC-BCTs focusing on decreasing undesired behaviours and assessment timeline (that is, short- and long-term assessments). However, certainty of evidence was often limited, mainly due to concern about risk of bias and considerable unexplained heterogeneity. Further high-quality research is needed to thoroughly examine the robustness of the current findings. Notably, the vast majority of studies did not investigate SC-BCTs as a stand-alone intervention, but rather as the primary BCT accompanied by another BCT, such as the provision of repeated intra-individual feedback enabling temporal comparison in addition to social comparison.

Our results align with prior findings in related fields. Other meta-analyses on the efficacy of SC-BCTs also revealed significant results on behaviour change, including in climate change mitigation behaviour14,24,27,30 and work performance31. In one review, for instance, SC-BCTs were identified as one of the two most effective BCTs for changing climate change mitigation behaviour, alongside financial approaches24.

We extended prior work by performing various moderator analyses. In the overarching analysis comparing SC-BCTs to active controls, the number of SC-BCT sessions was positively associated with efficacy. This suggests a dose–response relationship, which might be attributable to cumulative reinforcement, habit formation, cognitive shifts or emotional shifts, fostering sustained engagement and internalization of change. However, in the overarching analysis comparing SC-BCTs to passive controls, the number of SC-BCT sessions did not moderate efficacy. More research is needed to determine when and under what circumstances a dose–response relationship can be assumed. Similarly, the significant negative moderation of study quality in the overarching analysis comparing SC-BCTs to active (but not passive) control conditions suggests that effects are overestimated. Pooled effects are biased by lower-quality evidence finding larger effects. Future research needs to improve methodological quality to ensure accurate estimation of effects. As more trials accumulate over time, future meta-analytic research should re-evaluate the present finding, especially as the present work found this association in only one analysis (but not the other). In studies comparing SC-BCTs to active control conditions (but not in those comparing them to passive control conditions), SC-BCTs targeting desired behaviours were associated with higher efficacy in behaviour change than SC-BCTs targeting undesired behaviours. This suggests that SC-BCTs might be more effective when they target a desired behaviour (for example, increasing healthy food intake) rather than an undesired behaviour (for example, reducing unhealthy food intake). However, given mixed findings across the moderator analyses relative to passive and active controls, this association remains preliminary and requires further testing. Lastly, trials directly comparing upward SC-BCTs portraying a more (versus less) attainable social comparison standard were associated with higher long-term behaviour change, whereas behaviour change in the short term did not differ significantly. This suggests that more (versus less) attainable upward SC-BCTs might yield longer-lasting behaviour change. Given that the number of trials was rather low, this association should also be interpreted with caution and re-examined in future research.

Overall, the small effect sizes found for SC-BCTs need to be interpreted in the context of low cost in developing and disseminating SC-BCTs, making them realistic options for large-scale implementation (for example, for preventive health interventions). For instance, in various studies only one or two letters or emails with social comparison information (for example, personal energy usage versus that of a social standard) were sent to thousands of participants, or scalable low-cost digital health interventions using peer comparison or leaderboards were applied. Small effects according to statistical benchmarks121 might therefore have large real-life impacts when costs are low and scalability is large.

There are limitations to our meta-analysis. First, we aimed at only including research that explicitly referred to social comparison or social-comparison-related terms (for example, upward/downward/lateral comparison). This choice was made in an effort to maximize internal validity. Yet, this may have resulted in missing related literature. For example, research on audit and feedback to influence health professional behaviour often involves interpersonal comparison (for example, comparison with median performance) as one part of a BCT bundle122. Likewise, any group-based intervention is likely to be influenced by social comparison processes. Yet, studies not explicitly referencing social comparison (recall that we performed all-fields searches) were deemed unlikely to feature a study design capable of isolating the individual impact of SC-BCTs on behaviour change. Despite the restriction on research explicitly referring to social comparison terms, the present work covered a broad range of literature (79 RCTs) and a very large number of included participants (N = 1,356,521). Second, data were scarce for some analyses (for example, RCTs directly comparing different SC-BCTs), limiting statistical power. As more research accumulates, more (fine-grained) meta-analytic analyses will become feasible. Third, the generalizability of our results is limited, as most of the included trials were conducted in the USA and other high-income countries, necessitating more research from other contexts. Fourth, while we adjusted for the potential impact of publication bias on meta-analytic synthesis (that is, Egger’s test and the trim-and-fill method), we excluded non-peer-reviewed data. While maximizing internal validity through synthesizing only quality-controlled data, this decision may have diminished the external validity of the results, and the results might be biased by publication bias (that is, underreporting of null results). Some included RCTs did report on null results. Yet, it remains unknown to what extent the present results are affected by publication bias. Fifth, the results concern only collective behaviour (that is, group mean differences). Future qualitative and quantitative research is necessary to investigate individual processes evoked by SC-BCT (for example, investigating how many participants do versus do not change behaviour and for which reasons), which may help in tailoring and optimizing SC-BCTs.

On the basis of the present literature, SC-BCTs appear to have the potential to shape adaptive behaviour change concerning climate change mitigation, health, performance, and service participation. Small effect sizes need to be interpreted in terms of low cost and large scalability. Generalization of the results is limited given that most of the available data are from the USA or other high-income countries. More research in diverse contexts is needed to further investigate the generalizability of the results. Certainty of evidence was constrained by various sources of bias in the current literature, highlighting the need for more high-quality research to more robustly examine the efficacy of SC-BCTs in future research and meta-analytic syntheses.

Methods

Preregistration and guidelines

This work was preregistered with the PROSPERO database (CRD42022343154) and followed PRISMA 2020 guidelines123. One deviation from our preregistration should be noted. In line with a peer reviewer’s recommendation, we included only behavioural outcomes (for example, electricity usage) in the meta-analysis and excluded studies exclusively investigating cognitive (for example, intentions) and/or affective outcomes (for example, feelings). The systematic literature search, data extractions and statistical analyses were conducted by at least two authors independently (T.H.H., R.M.C., J.N. and F.L.). In cases of disagreement, consensus was reached among T.H.H., R.M.C. and N.M. in personal discussions.

Definitions of social comparison and SC-BCTs

We primarily adhered to Wood’s3 definition of social comparison as thinking about social information in relation to the self. Furthermore, we followed the general comparative-processing model5 to conceptualize social comparison as a process encompassing the selection of the social comparison standard, the basic comparison process itself (that is, evaluation of similarity or discrepancy between the target and the social standard) and the resulting reactions. Lastly, we followed the definition and taxonomy of BCTs provided by Michie et al.124. Accordingly, BCTs are defined as observable, replicable and irreducible interventions or components of a more complex intervention designed to have a causal influence on behaviour change124. Michie et al. defined SC-BCTs as interventions in which attention is drawn to the performance of others to enable social comparison on a particular dimension. Nonetheless, our decision on whether a study used social comparison as an intervention was primarily based on Wood’s definition of social comparison. For this comparison to occur, sufficient information needs to be provided, particularly for covert behaviour. For instance, an intervention on reducing energy usage needs to provide quantitative feedback about one’s own energy usage and the usage of a given social standard. Accordingly, descriptive norm interventions that provide such quantitative feedback belong to the category of SC-BCTs, whereas injunctive norm interventions (that is, interventions providing information on the degree to which a particular behaviour is socially approved or disapproved31,125) do not belong to SC-BCTs. The efficacy of SC-BCTs may be accompanied by other potentially contributing factors, even when trialists aim to explicitly and solely focus on the efficacy of SC-BCTs. For instance, if an intervention provides quantitative feedback on an individual’s performance (alongside the social standard) multiple times, it enables not only social comparison but also temporal comparison (that is, intra-individual comparison with past performance). Similarly, an intervention may include an SC-BCT accompanied by a frowning face or a downward-pointing thumb, introducing additional potential influences on behaviour change processes. As a result, other factors could potentially interfere with the efficacy of SC-BCTs, preventing them from functioning as stand-alone interventions. Regarding our first research question on the efficacy of SC-BCTs, we focused on trials examining SC-BCTs either as the stand-alone BCT or as the primary BCT. Regarding our second research question (‘How effective are SC-BCTs when added to a BCT bundle?’), we focused on studies investigating SC as an add-on component. Such trials compare the efficacy of two intervention arms, both using BCT bundles, where the only difference between the bundles is the inclusion of the SC-BCT component (that is, the BCT bundle including SC-BCTs versus the BCT bundle excluding only the SC-BCT component). Add-on studies thus examine whether adding social comparison to a bundle of BCTs adds incremental value in behaviour change (that is, relative to the BCT bundle without SC-BCT). Categorizations of BCTs were based on Michie et al.’s taxonomy and conducted independently by two of the authors (T.H.H. and R.M.C.). Discrepancies were discussed among three authors (T.H.H., R.M.C. and N.M.) until consensus was reached.

Sub-categorization of SC-BCTs by comparison direction

In all three sub-categories, SC-BCTs were further sub-categorized according to the applied social comparison direction of the SC-BCTs. In some SC-BCTs, a group average (for example, the average energy usage across a given residential area) or multiple social standards with varying degrees of the desired outcome are provided. These examples represent a non-restricted comparison direction, as the perceived comparison direction may vary between participants5. In other SC-BCTs, a group average for a selected sub-population is provided. For instance, providing feedback on the energy usage of the top 20% energy-efficient neighbours represents an upward social standard for most, but not all, study participants. The present work refers to the direction of such SC-BCTs as intended upward (and vice versa for intended downward).

Categorization of control conditions

Control groups were categorized into either passive control conditions (defined as assessment-only conditions without any form of manipulation) or active control conditions (defined as conditions that received any kind of non-social-comparison-related BCT or unspecific control task).

Systematic literature search

We systematically searched MEDLINE, PsycINFO and Web of Science with the preregistered search strategy. In line with the systematic search by Gerber et al.12, who chose to conduct a broad search by using only the term ‘social comparison’, our equally broad search used the category ‘all-fields’ and included only terms related to social comparison (TX ‘social compar*’ OR TX upward comparison* OR TX downward comparison* OR TX ‘lateral comparison*’). The search was conducted on 2 January 2024 and was not time-restricted (that is, involving all electronic hits from inception to the search date). No restrictions were made concerning scientific disciplines. Yet, all data turned out to be from the behavioural sciences. Also, no restrictions were made concerning sample characteristics. Languages of publications were limited to English, Dutch and German, and searches were carried out with English search terms only. The full search strategy is shown in Supplementary Appendix A.

Inclusion criteria

Studies had to meet all of the following inclusion criteria to be included in the present work: (1) the study was an RCT, (2) at least one arm investigated the efficacy of SC (as defined by Wood3) as a BCT (as defined by Michie et al.124) and used either a stand-alone or primary SC-BCT (research question 1) or an add-on SC-BCT (research question 2), (3) data for at least one behavioural outcome (for example, electricity usage) were reported, (4) the outcome was assessed at least 24 hours after the induction of the (first) SC-BCT session to exclude experimental studies assessing only immediate reactions to social comparison, (5) the available outcome data included at least ten participants per arm126 to exclude potential chance findings, and (6) the data were peer-reviewed. Withdrawn (that is, retracted) publications were not included.

Assessment intervals: short-term and long-term

We divided data assessment points or intervals into two categories: short-term and long-term outcome assessment. As we included data from diverse behavioural sciences covering a broad range of studied behaviours, we had to rely on primary study author-defined assessment time points or intervals. While some behaviours were assessed over long and continuous intervals (for example, energy usage), other behaviours were assessed at a given time point (for example, basketball free throw ability). Predefined cut-offs for categorizing short-term versus long-term data were therefore not meaningful. For the analyses on short-term efficacy, we extracted the first (that is, shortest) outcome assessment that met inclusion criterion number four (≥24 h after the (first) SC-BCT session). If a trial reported data for only one assessment time point or interval, this assessment was automatically extracted for the short-term category. For trials reporting data for two or more assessments, the first assessment (≥24 h after the (first) SC-BCT session) was extracted for the short-term category, and the last assessment was extracted for the long-term category.

Data prioritization: primary outcome and comparison

For complex trials investigating multiple outcomes and/or multiple arms, we applied a data prioritization algorithm to avoid data dependencies in a given meta-analysis. When applicable, data were prioritized as follows: (1) when multiple outcomes were reported, but only one of them aligned with the social comparison dimension, this outcome was prioritized; (2) when both objective (for example, objective step count) and subjective outcomes (for example, subjective step count) were reported, objective outcomes were prioritized; and (3) when a trial involved arms with different kinds of SC-BCTs, the primary SC-BCT arm (reported as such by the primary authors of the given trial) was prioritized. Data excluded from one analysis could still be included in other (sub-)analyses as long as data independence was met.

Primary outcome

The primary outcome metric was the standardized mean difference (Hedges’ g) for the primary behaviour outcome (see above) between the SC-BCT arm and the comparator arm at a given assessment time point (see above). Following Cohen’s conventions121, effect sizes were interpreted as small (0.2), moderate (0.5) or large (0.8). In the present work, positive values of g indicate behaviour change in the intended direction (that is, higher increases in desired outcomes and higher decreases in undesired outcomes), which allowed for meaningful pooling of effects across desired and undesired outcomes.

Desired versus undesired behaviour

SC-BCT studies may target an increase in desired behaviour (for example, steps per day or course grades) or a decrease in undesired behaviour (for example, alcohol consumption or water usage). We sub-classified studies accordingly and analysed data both across all classes and in isolation.

Risk-of-bias assessment

Risk of bias for each (independent) RCT was assessed using the Cochrane risk of bias tool v.2.0 (ref. 127), which assesses the methodological rigour of RCTs on five domains. It assesses bias arising from the randomization process (D1), deviations from the intended intervention (D2), missing data (D3), outcome assessment (D4) and selection of the reported result(s) (D5). On the basis of the Cochrane algorithm, an overall ordinal rating was derived for each trial indicating low risk, some concern or high risk. Supplementary Appendix B provides a detailed overview of criteria per domain. We visualized the risk-of-bias assessments via the free online tool robvis (https://mcguinlu.shinyapps.io/robvis/).

Certainty of evidence

Certainty of evidence was assessed using GRADE criteria128 via the following five domains: (1) risk of bias (for example, the pooled effect is mainly based on studies with insufficient randomization), (2) inconsistency (that is, unexplained heterogeneity), (3) indirectness (for example, the pooled effect is mainly based on interventions that were examined in a particular sub-sample without providing a rationale for selective inclusion), (4) imprecision (that is, the CI does not allow a firm conclusion about the effect and its direction) and (5) publication bias. The assessment of risk of bias was derived from risk of bias v.2.0 assessments (see above). Indirectness was assessed across four domains proposed by GRADE: population, intervention, comparator and outcome. The evaluations of heterogeneity, imprecision and publication bias were derived from the results of the meta-analytic analyses (see below). The interpretation of I2 corresponds to the recommended classification by GRADE (for example, 75% to 100%, considerable heterogeneity). Certainty of evidence overall can range from high (4) to very low (1). As recommended, certainty of evidence was assessed only for the overarching analyses and not repeated for the sub-analyses.

Statistical analysis

Hedges’ g was calculated129 whenever (raw) means and standard deviations were reported in the publication (or sent via email). In rare cases, publications reported Hedges’ g instead of means and standard deviations, which we then extracted. Hedges’ g values were pooled in random-effects meta-analyses given that the present work summarized heterogenous data (that is, diverse behaviours and diverse samples studied in diverse contexts). Statistical analyses were performed with the metafor package (v.3.4.0) in R (v.4.1.1)130,131. Two-sided tests were run with α = 0.05. We performed a given meta-analysis only when the number of independent data points reached at least four (k ≥ 4)126. We first analysed data in an overarching analysis across all data. We then performed sub-analyses, separating data by three factors that might influence the efficacy of SC-BCTs5: (1) desired versus undesired behaviours, (2) the applied social comparison direction of the SC-BCT and (3) the behavioural outcome domain (that is, climate change mitigation, health, performance or service participation; see more details in the Results). To examine whether efficacy between levels of these three factors differed significantly, we entered a given factor in sub-group moderator analyses. Moreover, one potential continuous moderator of efficacy was analysed in meta-regressions: the number of SC-BCT sessions (for example, the number of SC-BCT emails or letters sent to participants or the total number of SC-BCT lab sessions). Moderator analyses were carried out only when sufficient power to detect effects was present (whenever k ≥ 10 for the continuous moderator and k ≥ 10 per level of the dichotomous moderators132) and only for the overarching analyses to avoid risks of an inflated type I error rate. To examine heterogeneity in outcomes, we calculated the Q statistic and I2. The latter provides an estimation of true heterogeneity in outcomes between studies rather than heterogeneity due to sampling error. To estimate in which margin the true population effect size falls with a given margin of certainty, we calculated 95% CIs of effect sizes. Furthermore, we calculated 95% PIs. PIs supply a margin in which the true population effect is to be expected when similar future trials accumulate133. When both the CI and the PI exclude the null, there is particular certainty in the respective effect. To account for the potential effect of extreme observations on the pooled effect size, outlier-adjusted analyses were run whenever one or more outliers were identified. Outliers were defined as extraordinarily low and high effect sizes (that is, at least 3.3 standard deviations below or above the pooled g)134. To account for small study effects (for example, due to publication bias in the literature), we tested for significant funnel plot asymmetry with Egger’s test135 whenever k ≥ 10 (ref. 136). Whenever Egger’s test was significant, we used the trim-and-fill method and reported asymmetry-adjusted results when the trim-and-fill method added one or more studies to establish symmetry137.

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.