Let's assume that Extroversion and Introversion sit on a single scale, distributed across the population in something like a bell curve. Sound fair?

Okay, so most people will fall within 1 standard deviation of what we could call "50/50," meaning no preference for either "E" or "I." Others will be farther out on the scale, with strong preferences for one or the other.

Prediction: given that any measurement comes with a margin of error, a great number of individuals will test with slight preferences for "E" or "I," and some of them will land on the other side of the midpoint on a retest. So when it comes to test-retest reliability, a "change" from "E" to "I" (or in any of the other categories) is not necessarily evidence of an unreliable test. Saying you have a slight preference for introversion, then later saying you have a slight preference for extroversion (which is what the test actually does), is different from saying "you are always an introvert" or "you are always an extrovert," which leads me to...
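If you want to see the prediction play out, here is a rough simulation sketch. Everything in it is my own assumption for illustration, not anything taken from the actual test: true preference is drawn from a standard bell curve with 0 as "50/50," each sitting adds random error with a made-up standard deviation of 0.3, and "slight" means a true score within half a standard deviation of the midpoint. The function name flip_rates is likewise just something I made up.

```python
import random

def flip_rates(n=20000, noise_sd=0.3, seed=0):
    """Simulate two test sittings; return the E/I flip rate by strength of preference.

    All numbers here are illustrative assumptions, not properties of any real test.
    """
    rng = random.Random(seed)
    counts = {"slight": [0, 0], "strong": [0, 0]}  # [flips, people]
    for _ in range(n):
        true_score = rng.gauss(0.0, 1.0)  # 0 = "50/50", >0 leans E, <0 leans I
        first = true_score + rng.gauss(0.0, noise_sd)   # first sitting, with error
        second = true_score + rng.gauss(0.0, noise_sd)  # retest, with fresh error
        group = "slight" if abs(true_score) < 0.5 else "strong"
        counts[group][1] += 1
        if (first > 0) != (second > 0):  # the reported letter changed
            counts[group][0] += 1
    return {g: flips / people for g, (flips, people) in counts.items()}

rates = flip_rates()
print(rates)
```

Under these assumptions, the people with slight preferences should change letters far more often than the people with strong ones, even though the underlying trait never moved and the same measurement was used both times.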

The flaw in your reasoning: you are treating the categories as concrete, and they are not. They are on a scale, not dissimilar to a bell curve. Reading this test as "I am xxxx" rather than "I tend to be xxxx" is the issue.

