
Statwing Correlations

 

Overview

When users select two continuous or discrete variables, Statwing runs a correlation to assess whether those two variables are statistically related. Statwing defaults to calculating Pearson’s r, the most common type of correlation; if the assumptions of that test are not met, Statwing recommends a rank-based version of the same test, Spearman’s rho.

Additionally, Statwing uses the Fisher Transformation to calculate confidence intervals for the correlation coefficient.
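
As a rough illustration of how a Fisher-transformation interval is usually computed, here is a minimal sketch using only the Python standard library. The function name fisher_ci and the 95% default are assumptions made for this example, not Statwing’s actual code.

```python
import math
from statistics import NormalDist


def fisher_ci(r, n, confidence=0.95):
    """Approximate confidence interval for a Pearson correlation.

    Hypothetical helper for illustration; not Statwing's implementation.
    r is the sample correlation (strictly between -1 and 1), n the sample size.
    """
    if n <= 3:
        raise ValueError("the Fisher interval needs more than 3 observations")
    # Fisher transformation: z = atanh(r) is approximately normal.
    # Note: math.atanh raises ValueError when r is exactly -1 or 1.
    z = math.atanh(r)
    # Standard error of z on the transformed scale.
    se = 1.0 / math.sqrt(n - 3)
    # Two-sided critical value of the standard normal distribution.
    z_crit = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    # Transform the interval endpoints back to the correlation scale.
    return math.tanh(z - z_crit * se), math.tanh(z + z_crit * se)


low, high = fisher_ci(0.5, 30)
print(round(low, 2), round(high, 2))
```

For r = 0.5 and n = 30 this gives an interval of roughly 0.17 to 0.73. The interval is not symmetric around r because the back-transformation compresses values near -1 and 1.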

 

Assumptions of Pearson’s r

Statwing recommends Pearson’s r as a valid measure of correlation if certain assumptions about the data are met:

  • There are no outliers in the continuous/discrete data.[1]
  • The relationship between the variables is linear (e.g., y = 2x, not y = x²).[2]
  • The data are in fact continuous or discrete and not ordinal.[3]
Statwing does not display a line of best fit when it detects a violation of these assumptions.
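
The outlier rule described in footnote 1 (points more than 3 times the interquartile range beyond the quartiles) can be sketched as follows. The function name far_outliers and the choice of quartile interpolation are assumptions made for this example; the article does not say how Statwing computes its quartiles.

```python
from statistics import quantiles


def far_outliers(values, k=3.0):
    """Return values lying beyond fences placed k * IQR outside the quartiles.

    Hypothetical helper for illustration; k=3.0 follows footnote 1.
    """
    # Assumption: "inclusive" quartile interpolation. Other quartile
    # definitions can shift the fences slightly on small samples.
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - k * iqr
    upper_fence = q3 + k * iqr
    return [v for v in values if v < lower_fence or v > upper_fence]


print(far_outliers([1, 2, 3, 4, 5, 6, 7, 8, 100]))
```

In this sample the quartiles are 3 and 7, so the fences sit at -9 and 19 and only the value 100 is flagged.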

 

Ranked Correlation (Spearman’s Rho)

When these assumptions are violated, Pearson’s r may no longer be a valid measure of correlation. In that case, Statwing recommends Spearman’s rho: Statwing rank-transforms the data (replaces each value with its rank order) and then runs the usual Pearson correlation on the ranks. Rank transformation is a well-established method for protecting against assumption violations (a “nonparametric” method), and the rank transformation from Pearson to Spearman is the most common (Conover and Iman, 1981).
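
To make the rank transformation concrete, the sketch below computes Spearman’s rho as Pearson’s r applied to ranks. The function names are hypothetical, and giving tied values the average of their ranks is an assumption (it is the common convention, but the article does not say how Statwing handles ties).

```python
import math


def pearson_r(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def average_ranks(values):
    """Replace each value with its 1-based rank; ties share the average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j to cover the run of values tied with position i.
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho: Pearson's r computed on the rank-transformed data."""
    return pearson_r(average_ranks(x), average_ranks(y))


x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]  # monotonic but not linear
print(round(pearson_r(x, y), 3), spearman_rho(x, y))
```

Here y always increases with x, so the ranks agree perfectly and Spearman’s rho is 1, while Pearson’s r is about 0.943 because the relationship is curved.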

Note that Spearman’s rho still assumes that the relationship between the variables is monotonic.

Please contact us if you have questions or feedback about correlations in Statwing or the explanations above.

 

Footnotes

1. With larger sample sizes, outliers are less likely to negatively affect results. Statwing uses Tukey’s “outer fence” to define outliers as points more than 3 times the interquartile range above the 75th percentile or below the 25th percentile.
2. Statwing identifies a relationship as nonlinear when Spearman’s rho > 1.1 * Pearson’s r and Spearman’s rho is statistically significant.
3. Data like Highest level of education completed or Finishing order in marathon are unambiguously ordinal. Though Likert scales (like a 1 to 7 scale where 1 is Very dissatisfied and 7 is Very satisfied) are technically ordinal, it is common practice in social sciences to treat them as though they are continuous (i.e., using Pearson’s r).
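
The nonlinearity rule in footnote 2 can be written as a short check. This is only a sketch: the function name, the 0.05 significance threshold, and the comparison of absolute values (so that negative correlations are treated symmetrically) are assumptions that the footnote does not state.

```python
def looks_nonlinear(pearson_r, spearman_rho, spearman_p, ratio=1.1, alpha=0.05):
    """Flag a relationship as nonlinear following the rule in footnote 2.

    Hypothetical helper for illustration. spearman_p is the p-value of
    Spearman's rho, computed elsewhere.
    """
    # Assumption: compare magnitudes so the rule also works for negative
    # correlations; the footnote states the rule for the raw values.
    rank_much_stronger = abs(spearman_rho) > ratio * abs(pearson_r)
    # Assumption: "statistically significant" means p below alpha = 0.05.
    significant = spearman_p < alpha
    return rank_much_stronger and significant


print(looks_nonlinear(0.50, 0.90, 0.001))  # rank correlation clearly stronger
print(looks_nonlinear(0.80, 0.82, 0.001))  # the two coefficients roughly agree
```

The first call returns True because 0.90 exceeds 1.1 * 0.50; the second returns False because 0.82 does not exceed 1.1 * 0.80.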