We’re not worried about the overall normality of the dependent variable (that is, the normality of all values—oaks and birches—taken together). We care about normality around conditional means: Are values normally distributed at a given value of the predictor (“within oaks” or “within birches”)?
A between-subjects t-test is equivalent to a linear regression model with a continuous outcome and a binary predictor, and we can write it as:
\[height_i = B_0 + B_1 species_i + e_i \qquad e_i \sim N(0, \sigma^2)\] The normality assumption here is attached to the errors (residuals). This implies normality at each value of the predictor variable (i.e., around the conditional means). The model assumes that height values are normally distributed around \([B_0 + B_1(0)]\) and around \([B_0 + B_1(1)]\).
In this case, since you only have two levels of the predictor variable, you could examine normality among oaks and among birches separately. However, the more general approach to evaluating the normality assumption is to examine the errors (residuals) of the model or test.
Say that I have y values for two groups, one with mean = 0 and SD = 1; the other with mean = 4 and SD = 1.
set.seed(100)
# Simulate 500 observations per group: group a ~ N(0, 1), group b ~ N(4, 1)
d <- data.frame(y = c(rnorm(500, 0, 1), rnorm(500, 4, 1)),
                group = c(rep('a', 500), rep('b', 500)))
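As a quick sanity check (exact values depend on the seed), the simulated groups should have means near 0 and 4 and SDs near 1:

# Group-wise means and SDs
aggregate(y ~ group, data = d, FUN = function(x) c(mean = mean(x), sd = sd(x)))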
The y values, considered all together, are not normally distributed: the pooled distribution is clearly bimodal.
hist(d$y)
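For a more formal check than eyeballing the histogram, a Shapiro–Wilk test on the pooled values should reject normality here (the exact statistic depends on the seed):

shapiro.test(d$y)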
However, the y values at a given level of the predictor are normally distributed.
hist(d[d$group == 'a', 'y'])
hist(d[d$group == 'b', 'y'])
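Normal Q-Q plots tell the same story within each group (a sketch using base graphics):

qqnorm(d[d$group == 'a', 'y']); qqline(d[d$group == 'a', 'y'])
qqnorm(d[d$group == 'b', 'y']); qqline(d[d$group == 'b', 'y'])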
So, when we fit a linear model predicting y from group (equivalent to the between-subjects t-test), the residuals are very close to normally distributed.
mod <- lm(y ~ group, data = d)
summary(mod)
##
## Call:
## lm(formula = y ~ group, data = d)
##
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -3.2832 -0.6567  0.0106  0.6875  3.3418
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -0.03761    0.04605  -0.817    0.414    
## groupb       4.10882    0.06512  63.095   <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.03 on 998 degrees of freedom
## Multiple R-squared: 0.7996, Adjusted R-squared: 0.7994
## F-statistic: 3981 on 1 and 998 DF, p-value: < 2.2e-16
# Equivalent to:
t.test(d$y ~ d$group, var.equal = TRUE)
##
## Two Sample t-test
##
## data: d$y by d$group
## t = -63.095, df = 998, p-value < 2.2e-16
## alternative hypothesis: true difference in means between group a and group b is not equal to 0
## 95 percent confidence interval:
## -4.236616 -3.981034
## sample estimates:
## mean in group a mean in group b
##      -0.0376074       4.0712176
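Notice that the regression coefficients reproduce the group means reported by the t-test: with R's default treatment coding, the intercept is the mean of group a, and the intercept plus the groupb coefficient is the mean of group b.

# Intercept = mean of group a; intercept + slope = mean of group b
coef(mod)[1]
coef(mod)[1] + coef(mod)[2]
# Compare with the group means directly
tapply(d$y, d$group, mean)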
hist(resid(mod))
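A normal Q-Q plot of the residuals (and, if you want a formal test, a Shapiro–Wilk test on them) is the more general version of this check, since it works the same way no matter how many predictors the model has; a minimal sketch:

qqnorm(resid(mod)); qqline(resid(mod))
shapiro.test(resid(mod))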
I should note, too, that the normality assumption is fairly weak in practice: with reasonable sample sizes, parametric tests tend to be robust to violations of error normality.
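As a rough illustration of that robustness (a sketch with arbitrary settings, not a careful simulation study): even when the errors come from a clearly skewed distribution, the pooled-variance t-test's Type I error rate should stay close to the nominal 5% when the null is true.

set.seed(1)
p_vals <- replicate(5000, {
  # Two groups of 50 drawn from the same skewed distribution
  # (exponential, shifted to have mean 0), so the null is true
  g1 <- rexp(50) - 1
  g2 <- rexp(50) - 1
  t.test(g1, g2, var.equal = TRUE)$p.value
})
mean(p_vals < 0.05)  # proportion of false rejections; should be near 0.05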