When users want to relate a binary variable to a continuous or discrete variable, Statwing runs a two-tailed t-test to assess whether either of the two groups tends to have higher values than the other on the continuous/discrete variable. Statwing defaults to Welch’s t-test, also known as the t-test for *unequal* variances; if the assumptions of that test are not met, Statwing recommends a ranked version of the same test.

## Assumptions of Welch’s T-Test

Statwing recommends Welch’s t-test (hereafter “t-test”) if several assumptions about the data hold:

- The sample size of each group is above 15 (so that, by the Central Limit Theorem, each group’s mean is approximately normally distributed, which stands in for the requirement of normally distributed data).[1]
- There are few or no outliers in the continuous/discrete data.[2]
- The data are in fact continuous or discrete and not ordinal.[3]
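As a rough illustration, the three assumptions above can be read as a simple decision rule. The sketch below is not Statwing’s implementation: the function name `recommend_test`, its parameters, and the returned labels are all hypothetical. Whether a data set has too many outliers is left to the caller as a boolean, because the text does not give an exact cutoff for “few”.

```python
def recommend_test(n_a, n_b, has_problem_outliers, is_ordinal, min_group_size=15):
    """Return "welch" or "ranked" based on the three listed assumptions.

    All names here are illustrative assumptions, not Statwing's API.
    """
    # Assumption 1: each group's sample size is above 15.
    large_enough = n_a > min_group_size and n_b > min_group_size
    # Assumptions 2 and 3: few or no outliers, and data that are not ordinal.
    if large_enough and not has_problem_outliers and not is_ordinal:
        return "welch"
    # Any violated assumption falls back to the ranked t-test.
    return "ranked"
```

For example, `recommend_test(30, 40, False, False)` returns `"welch"`, while a group of only 12 observations, problematic outliers, or ordinal data each yield `"ranked"`.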

Unlike the slightly more common t-test for *equal* variances, Welch’s t-test does not assume that the variances of the two groups being compared are equal. Modern computing has made that assumption unnecessary. Furthermore, assuming equal variances leads to less accurate results when the variances are in fact unequal, and the equal-variance test is no more accurate than Welch’s t-test when the variances actually are equal (Ruxton, 2006).
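To make the difference concrete, here is a minimal sketch of the standard Welch computation: each group contributes its own variance to the standard error, and the degrees of freedom come from the Welch-Satterthwaite approximation. The function name `welch_t` is an illustrative choice, and the sketch stops at the test statistic and degrees of freedom. A two-tailed p-value would then come from the t distribution, for instance via a library routine such as `scipy.stats.ttest_ind(a, b, equal_var=False)`.

```python
import math
from statistics import mean, variance


def welch_t(a, b):
    """Return (t, df) for Welch's unequal-variances t-test.

    Assumes each group has at least two values and that the groups
    are not both constant (otherwise the standard error is zero).
    """
    n_a, n_b = len(a), len(b)
    # Squared standard error of each group mean, using the sample
    # variance (n - 1 denominator) of that group alone.
    se2_a = variance(a) / n_a
    se2_b = variance(b) / n_b
    t = (mean(a) - mean(b)) / math.sqrt(se2_a + se2_b)
    # Welch-Satterthwaite approximation to the degrees of freedom.
    df = (se2_a + se2_b) ** 2 / (
        se2_a ** 2 / (n_a - 1) + se2_b ** 2 / (n_b - 1)
    )
    return t, df


t, df = welch_t([1, 2, 3, 4, 5], [2, 4, 6, 8, 10])
print(round(t, 3), round(df, 3))
```

Because no pooled variance is computed, nothing in this calculation requires the two groups to be equally spread out.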

## Ranked T-Test

When assumptions are violated, the t-test may no longer be valid. In that case, Statwing recommends the *ranked* t-test; Statwing rank-transforms the data (replaces values with their rank ordering) and then runs the same Welch’s t-test on that transformed data. The ranked t-test is robust to outliers and non-normally distributed data. Rank transformation is a well-established method for protecting against assumption violations (a “nonparametric” method), and is most commonly seen in the difference between Pearson and Spearman correlation (Conover and Iman, 1981). Rank transformation followed by Welch’s t-test is similar in effect to the Mann-Whitney U Test, but somewhat more efficient (Ruxton, 2006; Zimmerman, 2012).
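The rank-then-test procedure can be sketched as follows. This is an illustration under stated assumptions rather than Statwing’s code: it assumes the two groups are pooled before ranking and that tied values receive the average of their ranks, neither of which is spelled out above. The helper names `average_ranks`, `welch_t`, and `ranked_welch_t` are hypothetical.

```python
import math
from statistics import mean, variance


def average_ranks(values):
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j to cover the run of values tied with position i.
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def welch_t(a, b):
    """Return (t, df) for Welch's unequal-variances t-test."""
    se2_a = variance(a) / len(a)
    se2_b = variance(b) / len(b)
    t = (mean(a) - mean(b)) / math.sqrt(se2_a + se2_b)
    df = (se2_a + se2_b) ** 2 / (
        se2_a ** 2 / (len(a) - 1) + se2_b ** 2 / (len(b) - 1)
    )
    return t, df


def ranked_welch_t(a, b):
    """Rank the pooled data, split the ranks by group, then run Welch's test."""
    ranks = average_ranks(list(a) + list(b))
    return welch_t(ranks[: len(a)], ranks[len(a):])


# One extreme value (100) dominates the raw means but not the ranks.
group_a = [1, 2, 3, 100]
group_b = [4, 5, 6, 7]
print(welch_t(group_a, group_b)[0] > 0)         # raw t is positive
print(ranked_welch_t(group_a, group_b)[0] < 0)  # ranked t is negative
```

In this toy data the single value of 100 pulls the mean of the first group above the second, so the unranked statistic is positive. After ranking, that value is simply the largest rank (8), and the statistic reflects that most of the first group’s values are smaller.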

Note that while the t-test tests for the equality of the *means* of the two groups, the ranked t-test does not explicitly test for differences between the groups’ means or medians. Rather, it tests for a general tendency of one group to have larger values than the other.

Please contact us if you have questions or feedback about t-tests in Statwing or the explanations above.

## Footnotes

1. With sample sizes below 15, data can still be visually inspected to determine if it is in fact normally distributed; if it is, unranked t-test results are still valid even for small samples. In practice this assessment can be difficult to make, so Statwing recommends ranked t-tests by default for small samples.

2. With larger sample sizes, outliers are less likely to negatively affect results. Statwing uses Tukey’s “outer fence” to define outliers as points more than 3 times the interquartile range above the 75th percentile or below the 25th percentile.
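   A minimal sketch of that fence rule is shown below. The function name `outer_fence_outliers` is hypothetical, and the quartiles are computed with Python’s `statistics.quantiles` using the `"inclusive"` method, which is an assumption: several quartile conventions exist, and the one Statwing uses is not stated here.

   ```python
   from statistics import quantiles


   def outer_fence_outliers(values, k=3.0):
       """Return the values lying more than k * IQR beyond the quartiles."""
       # 25th and 75th percentiles (quartile convention is an assumption).
       q1, _, q3 = quantiles(values, n=4, method="inclusive")
       iqr = q3 - q1
       lower, upper = q1 - k * iqr, q3 + k * iqr
       return [v for v in values if v < lower or v > upper]


   print(outer_fence_outliers([1, 2, 3, 4, 5, 6, 7, 8, 9, 100]))  # [100]
   ```

   Here the quartiles are 3.25 and 7.75, so the fences sit at -10.25 and 21.25 and only the value 100 falls outside them.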

3. Data like *Highest level of education completed* or *Finishing order in marathon* are unambiguously ordinal. Though Likert scales (like a 1 to 7 scale where 1 is *Very dissatisfied* and 7 is *Very satisfied*) are technically ordinal, it is common practice in social sciences to treat them as though they are continuous (i.e., with an unranked t-test).