Also see the GitHub repo on statistical tests in data science.

# Statistical Tests: When to Use Them, Assumptions, and SciPy Links

## Quick Decision Rule

| Situation | Test |
| --- | --- |
| 2 independent groups + numeric outcome + normal data | Independent t-test |
| 3+ independent groups + numeric outcome + normal data | One-way ANOVA |
| 2 independent groups + numeric outcome + non-normal data | Mann-Whitney U test |
| 3+ independent groups + numeric outcome + non-normal data | Kruskal-Wallis test |
| 2 paired/repeated measurements + non-normal data | Wilcoxon signed-rank test |
| 3+ paired/repeated measurements + non-normal data | Friedman test |
| Category vs category | Chi-square test of independence |
| Two numeric variables + linear relationship + normal-ish data | Pearson correlation |
| Two numeric/ordinal variables + monotonic or non-normal relationship | Spearman correlation |
| Check whether numeric data is normally distributed | Shapiro-Wilk test |
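
The group-comparison rows of the table can be read as a lookup on study design, number of groups, and whether normality is plausible. The following is a minimal sketch of that reading; `suggest_test` and its argument names are hypothetical helpers for illustration and are not part of SciPy.

```python
def suggest_test(design: str, n_groups: int, normal: bool) -> str:
    """Return the test named in the decision table for a group comparison.

    design   -- "independent" or "paired" (hypothetical labels)
    n_groups -- number of groups or repeated measurements (2 or more)
    normal   -- whether the outcome looks approximately normal
    """
    if n_groups < 2:
        raise ValueError("need at least two groups or measurements")
    size = "2" if n_groups == 2 else "3+"
    table = {
        ("independent", "2", True): "Independent t-test",
        ("independent", "3+", True): "One-way ANOVA",
        ("independent", "2", False): "Mann-Whitney U test",
        ("independent", "3+", False): "Kruskal-Wallis test",
        ("paired", "2", False): "Wilcoxon signed-rank test",
        ("paired", "3+", False): "Friedman test",
    }
    key = (design, size, normal)
    if key not in table:
        # The table has no row for this combination (e.g. paired + normal).
        raise ValueError(f"the decision table has no row for {key}")
    return table[key]


print(suggest_test("independent", 3, normal=False))
```

Combinations the table does not list raise an error instead of guessing a test.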

## 1. Independent Samples t-test

**SciPy function:** `scipy.stats.ttest_ind`
**Official documentation:** [https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.ttest_ind.html](https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.ttest_ind.html)

### When to use

Use an independent samples t-test when you want to compare the **mean** of a numeric variable between **two independent groups**.

### Example question

Do male and female students differ in their mean exam scores?

### Variables

* Group variable: `gender` → Male / Female
* Outcome variable: `exam_score` → numeric

### Hypotheses

* H0: The two group means are equal.
* H1: The two group means are different.

### Assumptions

1. The dependent variable is numeric.
2. The two groups are independent.
3. The dependent variable is approximately normally distributed within each group.
4. The variances of the two groups are equal if using the classic Student t-test.
5. If variances are not equal, use Welch’s t-test by setting `equal_var=False`.

### Related assumption checks

* Normality: `scipy.stats.shapiro`
* Equal variance: `scipy.stats.levene`

### Example code

```python
from scipy.stats import ttest_ind, shapiro, levene

# Drop missing values up front so the assumption checks below
# are not affected by NaN.
male = df.loc[df["gender"] == "Male", "exam_score"].dropna()
female = df.loc[df["gender"] == "Female", "exam_score"].dropna()

# normality
print(shapiro(male))
print(shapiro(female))

# equal variance
lev_stat, lev_p = levene(male, female, center="median")

# t-test: Student's version if Levene finds no evidence of unequal
# variances, otherwise Welch's version
t_stat, p_value = ttest_ind(
    male,
    female,
    equal_var=lev_p > 0.05,
    nan_policy="omit",
    alternative="two-sided",
)

print(t_stat, p_value)
```

---

## 2. One-way ANOVA

**SciPy function:** `scipy.stats.f_oneway`
**Official documentation:** [https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.f_oneway.html](https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.f_oneway.html)

### When to use

Use one-way ANOVA when you want to compare the **mean** of a numeric variable across **three or more independent groups**.

### Example question

Do Online, Classroom, and Hybrid teaching methods differ in mean exam scores?

### Variables

* Group variable: `teaching_method` → Online / Classroom / Hybrid
* Outcome variable: `exam_score` → numeric

### Hypotheses

* H0: All group means are equal.
* H1: At least one group mean is different.

### Assumptions

1. The dependent variable is numeric.
2. Groups are independent.
3. The dependent variable is approximately normally distributed within each group.
4. The group variances are approximately equal, also called homoscedasticity.

### Related assumption checks

* Normality: `scipy.stats.shapiro`
* Equal variance: `scipy.stats.levene`

### Example code

```python
from scipy.stats import f_oneway, shapiro, levene

groups = [
    group["exam_score"].dropna()
    for name, group in df.groupby("teaching_method")
]

# normality for each group
for name, group in df.groupby("teaching_method"):
    print(name, shapiro(group["exam_score"].dropna()))

# equal variance
print(levene(*groups, center="median"))

# ANOVA
f_stat, p_value = f_oneway(*groups)

print(f_stat, p_value)
```

### Important note

ANOVA tells you whether **at least one group differs**, but it does not tell you exactly **which groups differ**. If ANOVA is significant, use a post-hoc test such as Tukey HSD.
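
A post-hoc step along these lines can be sketched with `scipy.stats.tukey_hsd`, which is available in recent SciPy releases (1.8 or later, as far as I know). The data below is simulated for illustration only, and the group means and sizes are assumptions, not values from this project.

```python
import numpy as np
from scipy.stats import f_oneway, tukey_hsd

# Simulated exam scores (illustrative values, not real data).
rng = np.random.default_rng(0)
scores = {
    "Online": rng.normal(70, 8, size=40),
    "Classroom": rng.normal(72, 8, size=40),
    "Hybrid": rng.normal(80, 8, size=40),
}
names = list(scores)

f_stat, p_value = f_oneway(*scores.values())
print(f"ANOVA: F={f_stat:.2f}, p={p_value:.4f}")

# tukey_hsd compares every pair of groups; pvalue[i, j] belongs to
# the comparison between group i and group j.
result = tukey_hsd(*scores.values())
for i in range(len(names)):
    for j in range(i + 1, len(names)):
        print(f"{names[i]} vs {names[j]}: p={result.pvalue[i, j]:.4f}")
```

Each pairwise p-value is already adjusted for the family of comparisons, so it can be compared directly with the chosen significance level.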

---

## 3. Mann-Whitney U Test

**SciPy function:** `scipy.stats.mannwhitneyu`
**Official documentation:** [https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.mannwhitneyu.html](https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.mannwhitneyu.html)

### When to use

Use Mann-Whitney U when you want to compare **two independent groups**, but the numeric outcome is **not normally distributed** or is ordinal.

### Example question

Do male and female students differ in screen time?

### Variables

* Group variable: `gender` → Male / Female
* Outcome variable: `screen_time` → numeric but skewed

### Hypotheses

* H0: The two groups come from the same distribution.
* H1: The two groups come from different distributions.

### Assumptions

1. The dependent variable is numeric or ordinal.
2. The two groups are independent.
3. Observations are independent.
4. The distributions should have a similar shape if you want to interpret the result as a median/location difference.

### Example code

```python
from scipy.stats import mannwhitneyu

male = df.loc[df["gender"] == "Male", "screen_time"]
female = df.loc[df["gender"] == "Female", "screen_time"]

u_stat, p_value = mannwhitneyu(
    male,
    female,
    alternative="two-sided",
    nan_policy="omit",
    method="auto",
)

print(u_stat, p_value)
```

---

## 4. Kruskal-Wallis Test

**SciPy function:** `scipy.stats.kruskal`
**Official documentation:** [https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.kruskal.html](https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.kruskal.html)

### When to use

Use Kruskal-Wallis when you want to compare **three or more independent groups**, but the numeric outcome is **not normally distributed** or is ordinal.

### Example question

Do Online, Classroom, and Hybrid teaching methods differ in stress levels?

### Variables

* Group variable: `teaching_method` → Online / Classroom / Hybrid
* Outcome variable: `stress_level` → numeric but skewed

### Hypotheses

* H0: The groups have the same distribution.
* H1: At least one group differs.

### Assumptions

1. The dependent variable is numeric or ordinal.
2. Groups are independent.
3. Observations are independent.
4. Each group should usually have at least 5 observations.
5. If distributions have similar shapes, the test can be interpreted as comparing medians.

### Example code

```python
from scipy.stats import kruskal

groups = [
    group["stress_level"].dropna()
    for name, group in df.groupby("teaching_method")
]

h_stat, p_value = kruskal(*groups, nan_policy="omit")

print(h_stat, p_value)
```

### Important note

Kruskal-Wallis tells you whether **at least one group differs**, but not exactly **which groups differ**. If significant, use post-hoc pairwise tests such as Dunn’s test.
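
To my knowledge SciPy does not ship Dunn’s test, so the sketch below swaps in a different but common pairwise follow-up: Mann-Whitney U tests on every pair of groups with a Bonferroni correction. This is not Dunn’s test. The data is simulated, and the group scales are assumptions made for illustration.

```python
from itertools import combinations

import numpy as np
from scipy.stats import kruskal, mannwhitneyu

# Simulated, right-skewed stress scores (illustrative values only).
rng = np.random.default_rng(1)
groups = {
    "Online": rng.exponential(4.0, size=30),
    "Classroom": rng.exponential(4.0, size=30),
    "Hybrid": rng.exponential(8.0, size=30),
}

h_stat, p_value = kruskal(*groups.values())
print(f"Kruskal-Wallis: H={h_stat:.2f}, p={p_value:.4f}")

pairs = list(combinations(groups, 2))
adjusted = {}
for a, b in pairs:
    _, p = mannwhitneyu(groups[a], groups[b], alternative="two-sided")
    # Bonferroni: multiply by the number of comparisons, cap at 1.
    adjusted[(a, b)] = min(p * len(pairs), 1.0)

for (a, b), p_adj in adjusted.items():
    print(f"{a} vs {b}: adjusted p={p_adj:.4f}")
```

Bonferroni is conservative; it is used here only because it is simple to verify by hand.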

---

## 5. Wilcoxon Signed-Rank Test

**SciPy function:** `scipy.stats.wilcoxon`
**Official documentation:** [https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.wilcoxon.html](https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.wilcoxon.html)

### When to use

Use Wilcoxon signed-rank test when you have **two paired/repeated measurements** from the same people, and the differences are **not normally distributed**.

### Example question

Did students’ stress levels change after an intervention?

### Variables

* Before variable: `stress_before`
* After variable: `stress_after`

### Hypotheses

* H0: The median difference between paired measurements is zero.
* H1: The median difference between paired measurements is not zero.

### Assumptions

1. The outcome is numeric or ordinal.
2. The two measurements are paired.
3. Pairs are independent of other pairs.
4. The distribution of differences is approximately symmetric.
5. The test is used when the paired differences are not normally distributed.
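
Assumptions 4 and 5 concern the within-pair differences, not the raw before and after columns. A minimal sketch of inspecting those differences before running the test is shown below; the data is simulated, and using Shapiro-Wilk plus sample skewness as the checks is an assumption of this example, not a prescribed procedure.

```python
import numpy as np
from scipy.stats import shapiro, skew, wilcoxon

# Simulated paired stress scores (illustrative values only).
rng = np.random.default_rng(2)
before = rng.normal(6.0, 1.5, size=25)
after = before - rng.laplace(0.5, 1.0, size=25)

# The assumptions apply to the within-pair differences.
diff = after - before

w_stat, normality_p = shapiro(diff)
print(f"Shapiro-Wilk on differences: W={w_stat:.3f}, p={normality_p:.4f}")

# Sample skewness near 0 is consistent with a roughly symmetric distribution.
print(f"Skewness of differences: {skew(diff):.3f}")

stat, p_value = wilcoxon(before, after, alternative="two-sided")
print(f"Wilcoxon: statistic={stat:.1f}, p={p_value:.4f}")

# Passing the differences directly gives the same two-sided result.
stat_d, p_d = wilcoxon(diff)
```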

### Example code

```python
from scipy.stats import wilcoxon

stat, p_value = wilcoxon(
    df["stress_before"],
    df["stress_after"],
    alternative="two-sided",
    nan_policy="omit",
    method="auto",
)

print(stat, p_value)
```

---

## 6. Friedman Test

**SciPy function:** `scipy.stats.friedmanchisquare`
**Official documentation:** [https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.friedmanchisquare.html](https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.friedmanchisquare.html)

### When to use

Use Friedman test when you have **three or more paired/repeated measurements** from the same people and the data is **not normally distributed**.

### Example question

Did students’ motivation scores change across Week 1, Week 2, and Week 3?

### Variables

* `motivation_week1`
* `motivation_week2`
* `motivation_week3`

### Hypotheses

* H0: The repeated measurements have the same distribution.
* H1: At least one repeated measurement differs.

### Assumptions

1. The outcome is numeric or ordinal.
2. The same participants are measured three or more times.
3. Observations are paired/repeated.
4. Participants are independent of each other.
5. SciPy notes that the p-value is only reliable with more than 10 participants and more than 6 repeated measurements, because the test statistic is assumed to follow a chi-squared distribution.

### Example code

```python
from scipy.stats import friedmanchisquare

stat, p_value = friedmanchisquare(
    df["motivation_week1"],
    df["motivation_week2"],
    df["motivation_week3"],
    nan_policy="omit",
)

print(stat, p_value)
```

---

## 7. Chi-square Test

There are two common Chi-square cases.

---

### 7.1 Chi-square Test of Independence

**SciPy function:** `scipy.stats.chi2_contingency`
**Official documentation:** [https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.chi2_contingency.html](https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.chi2_contingency.html)

### When to use

Use Chi-square test of independence when you want to test whether **two categorical variables are associated**.

### Example question

Is teaching method associated with pass/fail status?

### Variables

* Variable 1: `teaching_method` → Online / Classroom / Hybrid
* Variable 2: `passed` → Pass / Fail

### Hypotheses

* H0: The two categorical variables are independent.
* H1: The two categorical variables are associated.

### Assumptions

1. Both variables are categorical.
2. Observations are independent.
3. Expected frequencies should generally be at least 5 in each cell.
4. If expected counts are too small, consider Fisher’s exact test for 2x2 tables or combine rare categories.

### Example code

```python
import pandas as pd
from scipy.stats import chi2_contingency

ct = pd.crosstab(df["teaching_method"], df["passed"])

chi_stat, p_value, dof, expected = chi2_contingency(ct)

expected_df = pd.DataFrame(
    expected,
    index=ct.index,
    columns=ct.columns,
)

print("Observed counts:")
print(ct)

print("Expected counts:")
print(expected_df.round(2))

print("Minimum expected count:", expected.min())
print(chi_stat, p_value, dof)
```
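
When the minimum expected count falls below 5 in a 2x2 table, assumption 4 points to Fisher’s exact test. A minimal sketch with `scipy.stats.fisher_exact` follows; the counts are hypothetical and stand for two teaching methods crossed with Pass / Fail.

```python
import numpy as np
from scipy.stats import chi2_contingency, fisher_exact

# Hypothetical 2x2 table: rows = two teaching methods, columns = Pass / Fail.
table = np.array([[8, 2],
                  [1, 5]])

# Inspect expected counts first; chi2_contingency returns them last.
_, _, _, expected = chi2_contingency(table)
print("Minimum expected count:", expected.min())

if expected.min() < 5:
    # Small expected counts: fall back to Fisher's exact test.
    odds_ratio, p_value = fisher_exact(table, alternative="two-sided")
    print(f"Fisher's exact test: odds ratio={odds_ratio:.2f}, p={p_value:.4f}")
```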

---

### 7.2 Chi-square Goodness-of-Fit Test

**SciPy function:** `scipy.stats.chisquare`
**Official documentation:** [https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.chisquare.html](https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.chisquare.html)

### When to use

Use Chi-square goodness-of-fit when you have **one categorical variable** and want to check whether observed counts match expected counts.

### Example question

Are Pass and Fail outcomes equally distributed?

### Variables

* One categorical variable: `passed`

### Hypotheses

* H0: Observed category frequencies match expected frequencies.
* H1: Observed category frequencies do not match expected frequencies.

### Assumptions

1. The variable is categorical.
2. Observations are independent.
3. Observed and expected frequencies should generally be at least 5.
4. Total observed and expected counts should match.

### Example code

```python
from scipy.stats import chisquare

observed = df["passed"].value_counts()

chi_stat, p_value = chisquare(observed)

print(observed)
print(chi_stat, p_value)
```
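
Calling `chisquare` with only observed counts tests against equal expected frequencies, which is its default. For any other expectation, assumption 4 matters: the expected counts passed as `f_exp` should sum to the observed total. The sketch below uses hypothetical counts and an assumed 60/40 split purely for illustration.

```python
import numpy as np
from scipy.stats import chisquare

# Hypothetical observed counts for Pass and Fail.
observed = np.array([130, 70])

# Assumed expected proportions (illustrative): 60% Pass, 40% Fail.
expected_props = np.array([0.6, 0.4])

# Scale proportions by the observed total so both sums agree.
expected = expected_props * observed.sum()

chi_stat, p_value = chisquare(observed, f_exp=expected)
print(chi_stat, p_value)
```

Scaling proportions by `observed.sum()` keeps the two totals equal, which recent SciPy versions check before computing the statistic.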

---

## 8. Pearson Correlation

**SciPy function:** `scipy.stats.pearsonr`
**Official documentation:** [https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.pearsonr.html](https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.pearsonr.html)

### When to use

Use Pearson correlation when you want to measure the **linear relationship** between two numeric variables.

### Example question

Is study time linearly related to exam score?

### Variables

* Variable 1: `study_hours` → numeric
* Variable 2: `exam_score` → numeric

### Hypotheses

* H0: There is no linear correlation between the two variables.
* H1: There is a linear correlation between the two variables.

### Assumptions

1. Both variables are numeric.
2. The relationship is linear.
3. There should be no extreme outliers.
4. Variables should be approximately normally distributed for classical inference.
5. Observations are independent.

### Example code

```python
from scipy.stats import pearsonr

r, p_value = pearsonr(
    df["study_hours"],
    df["exam_score"],
    alternative="two-sided",
)

print(r, p_value)
```

### Interpretation

* `r` close to +1 → strong positive linear relationship
* `r` close to -1 → strong negative linear relationship
* `r` close to 0 → weak/no linear relationship

---

## 9. Spearman Correlation

**SciPy function:** `scipy.stats.spearmanr`
**Official documentation:** [https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.spearmanr.html](https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.spearmanr.html)

### When to use

Use Spearman correlation when you want to measure a **monotonic relationship** between two variables, especially when data is non-normal, skewed, ordinal, or contains outliers.

### Example question

Do exam scores tend to rise or fall consistently as screen time increases, even if the relationship is not linear?

### Variables

* Variable 1: `screen_time` → numeric but skewed
* Variable 2: `exam_score` → numeric

### Hypotheses

* H0: There is no monotonic correlation between the two variables.
* H1: There is a monotonic correlation between the two variables.

### Assumptions

1. Variables are numeric or ordinal.
2. Observations are independent.
3. The relationship should be monotonic.
4. Normality is not required.

### Example code

```python
from scipy.stats import spearmanr

rho, p_value = spearmanr(
    df["screen_time"],
    df["exam_score"],
    nan_policy="omit",
    alternative="two-sided",
)

print(rho, p_value)
```

### Interpretation

* `rho` close to +1 → strong positive monotonic relationship
* `rho` close to -1 → strong negative monotonic relationship
* `rho` close to 0 → weak/no monotonic relationship
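
The difference between a monotonic and a linear relationship is easiest to see on data that always increases but not along a straight line. The sketch below uses synthetic values chosen for illustration and compares both coefficients on the same data.

```python
import numpy as np
from scipy.stats import pearsonr, spearmanr

# Synthetic data: y grows monotonically with x, but not linearly.
x = np.arange(1, 21, dtype=float)
y = np.exp(x / 3.0)

r, _ = pearsonr(x, y)
rho, _ = spearmanr(x, y)

print(f"Pearson r = {r:.3f}")
print(f"Spearman rho = {rho:.3f}")
```

Because the ranks of `x` and `y` agree perfectly, Spearman's coefficient reaches 1, while Pearson's coefficient stays lower because the points do not lie on a line.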

---

## 10. Shapiro-Wilk Normality Test

**SciPy function:** `scipy.stats.shapiro`
**Official documentation:** [https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.shapiro.html](https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.shapiro.html)

### When to use

Use Shapiro-Wilk test to check whether a numeric variable is approximately normally distributed.

### Example question

Is exam score normally distributed?

### Hypotheses

* H0: The data comes from a normal distribution.
* H1: The data does not come from a normal distribution.

### Assumptions / notes

1. The variable should be numeric.
2. The sample should have at least 3 observations.
3. For very large samples, small deviations can become statistically significant.
4. Always combine Shapiro with visual checks such as histogram, KDE plot, or Q-Q plot.
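
One way to pair the test with a Q-Q check is `scipy.stats.probplot`, which returns the Q-Q points and the correlation of a straight-line fit through them. The sketch below uses simulated samples (illustrative values only); `qq_r` is a hypothetical name for collecting the results.

```python
import numpy as np
from scipy.stats import probplot, shapiro

# Simulated scores (illustrative): one roughly normal, one right-skewed.
rng = np.random.default_rng(3)
samples = {
    "normal-ish": rng.normal(70, 10, size=200),
    "skewed": rng.exponential(10, size=200),
}

qq_r = {}
for name, values in samples.items():
    w_stat, p_value = shapiro(values)
    # probplot returns the Q-Q points plus a least-squares line fit;
    # r is the correlation between theoretical and sample quantiles.
    (theoretical_q, ordered_values), (slope, intercept, r) = probplot(
        values, dist="norm"
    )
    qq_r[name] = r
    print(f"{name}: Shapiro p={p_value:.4f}, Q-Q correlation r={r:.3f}")
```

Passing `plot=matplotlib.pyplot` (or an axes object) to `probplot` draws the Q-Q plot itself; points that bend away from the fitted line indicate departures from normality.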

### Example code

```python
from scipy.stats import shapiro

stat, p_value = shapiro(df["exam_score"])

print(stat, p_value)
```

### Interpretation

```text
p > 0.05  → no evidence against normality (normal enough)
p <= 0.05 → evidence that the data is not normally distributed
```

---

## 11. Levene Test for Equal Variances

**SciPy function:** `scipy.stats.levene`
**Official documentation:** [https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.levene.html](https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.levene.html)

### When to use

Use Levene test before t-test or ANOVA when you need to check whether group variances are similar.

### Example question

Do Male and Female exam scores have similar variance?

### Hypotheses

* H0: Group variances are equal.
* H1: At least one group variance is different.

### Example code

```python
from scipy.stats import levene

male = df.loc[df["gender"] == "Male", "exam_score"]
female = df.loc[df["gender"] == "Female", "exam_score"]

stat, p_value = levene(
    male,
    female,
    center="median",
    nan_policy="omit",
)

print(stat, p_value)
```

### Interpretation

```text
p > 0.05  → equal variance assumption is okay
p <= 0.05 → variances are different
```

---

## Final Memory Map

```text
Normal + 2 independent groups      → t-test
Normal + 3+ independent groups     → ANOVA
Non-normal + 2 independent groups  → Mann-Whitney U
Non-normal + 3+ independent groups → Kruskal-Wallis
Non-normal + 2 paired measures     → Wilcoxon
Non-normal + 3+ paired measures    → Friedman
Category vs category               → Chi-square independence
One categorical variable counts    → Chi-square goodness-of-fit
Numeric vs numeric linear          → Pearson
Numeric/ordinal monotonic          → Spearman
Normality check                    → Shapiro
Equal variance check               → Levene
```

Main SciPy pages used: `ttest_ind`, `f_oneway`, `mannwhitneyu`, `kruskal`, `wilcoxon`, `friedmanchisquare`, `chi2_contingency`, `chisquare`, `pearsonr`, `spearmanr`, `shapiro`, and `levene`.
