How Many Personality Configurations Exist?

Clustering analysis produces a different answer than does factor analysis.

Posted Aug 03, 2019

Clustering analysis is a powerful tool (see Tan & Mueller, 2016), but it doesn’t get used as often as it should. It can provide different insights than factor analysis, even when applied to the same data.

Here’s an example: personality theory. Factor analysis has helped to generate a set of independent traits: the Big-5 personality theory (see John & Srivastava, 1999), with positive dimensions Agreeableness, Conscientiousness, Extroversion, Emotional stability (the opposite pole of Neuroticism), and Openness to experience. Scoring each of these five traits as simply high or low gives us 2 × 2 × 2 × 2 × 2 = 32 possible binary personality types. When we (Mueller & Tan) examined 1,282 responses (mostly from undergraduate students) to the 44-item Big Five personality inventory, split at the midline response, we indeed found individuals that fit into each of the 32 profiles.
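To make the counting concrete, here is a minimal Python sketch of the midline split. The 1-to-5 response scale, the midline of 3.0, the rule that a score exactly at the midline counts as low, and the four respondents are all assumptions made for illustration; they are not the study's data or scoring rules.

```python
from collections import Counter
from itertools import product

TRAITS = ["Agreeableness", "Conscientiousness", "Extroversion",
          "Emotional stability", "Openness"]
MIDLINE = 3.0  # assumed midpoint of a 1-to-5 response scale


def binary_profile(scores, midline=MIDLINE):
    """Label each trait score 'H' (above the midline) or 'L' (at or below it)."""
    return tuple("H" if s > midline else "L" for s in scores)


# Every high/low combination of five traits: 2**5 = 32 profiles.
all_profiles = list(product("HL", repeat=len(TRAITS)))

# Hypothetical respondents (mean item score per trait), not real data.
respondents = [
    (4.2, 3.8, 3.5, 3.9, 4.1),
    (4.0, 3.6, 2.1, 2.4, 3.7),
    (3.9, 4.1, 3.3, 3.6, 4.4),
    (2.8, 3.2, 2.5, 3.4, 3.1),
]

# Tally how many respondents land in each profile.
counts = Counter(binary_profile(r) for r in respondents)

print(len(all_profiles))
for profile, n in counts.most_common():
    print("".join(profile), n)
```

Tallying profiles this way is what reveals whether responses spread evenly over the 32 types or pile up in a few of them.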

However, 75 percent of the responses fell into just six patterns, and only four personalities described more than 5 percent of respondents each. Therefore, instead of the simplistic five factors, or the overwhelming 32 alternatives, we decided to use clustering analysis to get a better picture of the clusters of personalities. We used a method with the esoteric name “k-means clustering,” which divides people into a fixed number (k) of groups so that the people within each group are as similar to one another as possible.
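For readers who want to see the mechanics, the following is a minimal sketch of the standard k-means loop in plain Python: assign each person to the nearest group center, move each center to the mean of its members, and repeat until nothing changes. It is not the analysis code used in the study. The six respondents are invented, and the deterministic "farthest point" starting rule is an assumption chosen here so the result is reproducible; most statistical software uses random starting points instead.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def farthest_first_centers(points, k):
    """Assumed starting rule: take the first point, then repeatedly add
    the point farthest from every center chosen so far."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(
            max(points, key=lambda p: min(squared_distance(p, c) for c in centers))
        )
    return centers


def kmeans(points, k, max_iterations=100):
    centers = farthest_first_centers(points, k)
    labels = None
    for _ in range(max_iterations):
        # Assignment step: each person joins the nearest center.
        new_labels = [
            min(range(k), key=lambda j: squared_distance(p, centers[j]))
            for p in points
        ]
        if new_labels == labels:
            break  # nobody changed group, so the solution is stable
        labels = new_labels
        # Update step: each center moves to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old center if a group is empty
                centers[j] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels, centers


# Hypothetical trait scores (A, C, E, S, O) on an assumed 1-to-5 scale.
people = [
    (4.2, 4.0, 4.1, 4.3, 4.0),
    (4.0, 4.1, 3.9, 4.0, 4.2),
    (4.3, 3.9, 4.2, 4.1, 4.1),
    (3.8, 3.6, 2.0, 2.1, 3.7),
    (3.9, 3.5, 2.2, 1.9, 3.6),
    (3.7, 3.7, 1.9, 2.2, 3.8),
]

labels, centers = kmeans(people, k=2)
print(labels)
for center in centers:
    print([round(v, 2) for v in center])
```

With these made-up scores, the first three people form one group and the last three form another, and the second group's center is low on extroversion and emotional stability. Reading the centers as profiles in this way is how a cluster solution is interpreted.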

When we examined a 5-cluster solution using k-means clustering (shown in the figure below), we saw several things that are not exposed by factor analysis. Cluster 4 shows people who score highly on all dimensions, and Cluster 2 shows people close to the midline on all dimensions. But we see two groups that have low extroversion: Cluster 5, who also have low emotional stability (high neuroticism), and Cluster 3, who have more positive values overall. Finally, Cluster 1 has moderately high scores on everything but emotional stability.

This analysis shows that a small number of personality profiles (five in this case) account for most of the patterns. These results can help inform personality theory in new ways. They may also be useful for psychotherapists seeking to apply Big-5 personality theory to their clients.

Figure: Comparison of 5 clusters. Source: Mueller & Tan, 2019

Putting Clustering Analysis into Practice

If we want psychological researchers to use tools like clustering analysis, we have to help people who are experts in the domain—but unfamiliar with the statistical method—by teaching them how to use, interpret, and make inferences with the tool.

Typically, when instructors try to train others in how to use an analytic tool, they focus on the mechanics of executing an analysis. But they may ignore many other issues, usually the ones we have to learn after we understand the mechanics of setting up and running the tool.

To get at these other issues, we used a cognitive tutorial method developed by Mueller and Klein (2011). The cognitive tutorial seeks to develop the cognitive training that people need for mastering and applying intelligent software tools.

The cognitive tutorial goes beyond the simple mechanics covered in most user guides and tutorials. The focus of the cognitive tutorial is that in order to teach someone how to use a tool, you cannot just show them how it works; you must also show them how it does not work.

We developed a cognitive tutorial for clustering analysis as a demonstration project. We started by collecting cases from a number of sources that help us identify how the k-means clustering system does not work: errors, misconceptions, and other problems people have using the system. We examined course books, blogs, YouTube videos, and online help forums to identify potential learning objectives. We also held a round-table discussion with members of a psychology graduate-level statistics class that covered k-means clustering, and we conducted think-aloud interviews with four class members as they applied k-means clustering to two data sets.

Using these sources, we generated around 50 candidate learning objectives. The most valuable source was the think-aloud interviews, because they isolated specific misconceptions that experienced novices had when applying the clustering analysis algorithm.

On this basis, we have identified a handful of specific learning objectives suitable for a cognitive tutorial that illustrate where the k-means clustering algorithm breaks down, what its limitations are, and what some potential workarounds might be for when you run into trouble. We are currently putting the cognitive tutorial for the clustering analysis tool into practice in order to evaluate it.

To examine how the learning objectives that we derived for the cognitive tutorial differ from standard approaches, we examined 32 existing online training approaches to k-means clustering (mostly book chapters, blogs, and video tutorials), summarized below:

Figure: Learning objectives derived from the cognitive tutorial. Source: Mueller/Tan 2019

In identifying our own candidate cognitive tutorial lessons, we focused on several areas where people either didn’t correctly apply the k-means clustering algorithm, or else interpreted it incorrectly.

One of the common problems is that people don’t select the correct number of clusters. Although most training (74 percent) claims to cover this issue, we found that people are often willing to accept clustering partitions that aren’t justified. By showing users some improper solutions in our cognitive tutorial, we expect to help these users adopt a better set of clusters. 
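One reason unjustified partitions are easy to accept is that the usual fit measure, the within-cluster sum of squares, generally keeps shrinking as clusters are added, so "lower is better" never says when to stop. The sketch below illustrates this with a common heuristic, looking for the "elbow" where the improvement levels off. The heuristic, the one-dimensional scores, and the deterministic starting rule are assumptions for illustration, not the content of the cognitive tutorial.

```python
def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, max_iterations=100):
    """Plain k-means with an assumed deterministic farthest-point start."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(
            max(points, key=lambda p: min(squared_distance(p, c) for c in centers))
        )
    labels = None
    for _ in range(max_iterations):
        new_labels = [
            min(range(k), key=lambda j: squared_distance(p, centers[j]))
            for p in points
        ]
        if new_labels == labels:
            break
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centers[j] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels, centers


def within_cluster_ss(points, labels, centers):
    """Total squared distance from each point to its own cluster center."""
    return sum(squared_distance(p, centers[lab]) for p, lab in zip(points, labels))


# Hypothetical one-dimensional scores with three obvious groups.
scores = [(1.0,), (2.0,), (3.0,),
          (10.0,), (11.0,), (12.0,),
          (20.0,), (21.0,), (22.0,)]

wss = {}
for k in range(1, 6):
    labels, centers = kmeans(scores, k)
    wss[k] = within_cluster_ss(scores, labels, centers)
    print(k, round(wss[k], 1))
```

On these made-up scores the fit improves sharply up to three clusters and only marginally after that. A fourth or fifth cluster still "improves" the number, but only by splitting a group that was already coherent, which is the kind of improper solution a learner needs to see.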

Another common issue is how to determine which features are relevant, and how they should be combined. Only 32 percent and 25 percent of past trainings covered these two issues, respectively. And only one of the past training programs (6 percent) provided any examples of failures. In fact, relatively few of the past training programs (36 percent) covered limitations of the algorithms in any way.  
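One concrete way that combining features can go wrong: k-means measures similarity as straight-line distance across all features, so a feature recorded on a larger scale can dominate the result. The sketch below compares who counts as a person's nearest neighbor before and after standardizing each feature to z-scores. The two features (an extroversion score and age in years) and the four people are hypothetical, and standardizing is shown as one common option, not as the recommended answer.

```python
from statistics import mean, pstdev


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def standardize(rows):
    """Convert each column to z-scores (assumes no column is constant)."""
    columns = list(zip(*rows))
    centers = [mean(col) for col in columns]
    spreads = [pstdev(col) for col in columns]
    return [
        tuple((x - m) / s for x, m, s in zip(row, centers, spreads))
        for row in rows
    ]


def nearest_neighbor(rows, i):
    """Index of the row closest to row i, excluding row i itself."""
    others = [j for j in range(len(rows)) if j != i]
    return min(others, key=lambda j: squared_distance(rows[i], rows[j]))


# Hypothetical people: (extroversion on an assumed 1-to-5 scale, age in years).
raw = [
    (4.5, 20.0),
    (1.5, 22.0),
    (4.4, 26.0),
    (1.6, 28.0),
]
scaled = standardize(raw)

print(nearest_neighbor(raw, 0))     # neighbor of person 0 on the raw scales
print(nearest_neighbor(scaled, 0))  # neighbor of person 0 after z-scoring
```

On the raw scales, person 0 is matched with someone of similar age but opposite extroversion; after standardizing, the match is the person with nearly the same extroversion score. Neither answer is automatically correct. The point is that the choice of scale is a decision the analyst makes, whether knowingly or not.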

Finally, one important lesson for us was seeing the advantages of applying the clustering analysis tool to real data within a domain of expertise. Many teaching examples use arbitrary and even artificial data, which data science and statistics students are comfortable with. However, we found that our psychology domain experts could do a better job reasoning about solutions when the variables and groups were meaningful and interpretable, rather than artificial. The cognitive tutorial should make it easier to work with meaningful data in the future.

The biggest gap in existing training is exactly where our cognitive tutorial focuses: showing how the tool might fail. Even after a learner is shown a typical solution that works, they are likely to misapply the tool when facing other circumstances. In our past research on the cognitive tutorial, we have argued that users who are shown these failure modes and boundaries will better anticipate problems, be less impaired when they meet those issues, and be able to use workarounds that allow them to be successful at using the tool, despite its limitations.

This essay was written by Shane Mueller and Sarah (Yin-Yin) Tan of the Department of Cognitive and Learning Sciences at Michigan Technological University. For more information, please contact Shane at shanem@mtu.edu.

References

John, O. P., & Srivastava, S. (1999). The Big-Five trait taxonomy: History, measurement, and theoretical perspectives. In L. A. Pervin & O. P. John (Eds.), Handbook of personality: Theory and research (Vol. 2, pp. 102–138). New York: Guilford Press.

Mueller, S. T., & Klein, G. (2011, March-April). Improving users’ mental models of intelligent software tools. IEEE Intelligent Systems, 26(2), 77–83.

Tan, Y.-Y., & Mueller, S. T. (2016). Adapting cultural mixture modeling for continuous measures of knowledge and memory fluency. Behavior Research Methods, 48, 843–856. DOI 10.3758/s13428-015-0670-4