Measuring Style in the Book of Mormon
This blog post is long (3000 words, 28 images), and it is a quick, poor, and short summary of stylometry relevant to the Book of Mormon. I’ve left out some topics that were only covered with words. I’ve not included one Master’s thesis, shown to me more recently, that confirms 19th-century word and phrase choice for most of the language used in dictating the Book of Mormon, nor examples showing extensive, anachronistic 17th- and 18th-century syntax in the Book of Mormon; neither has a significant new impact on what stylometry shows us about the Book of Mormon. The Book of Mormon was written by several authors, and we don’t have any other writings from these authors. Enjoy the summary, or head over to http://www.exploringsainthood.org/measuring-style-in-the-book-of-mormon/ and follow my whole exploration, or download a PDF of all 22 posts (4.4 MB).
Data Selection
Breaking up the Book of Mormon for analysis is a hard problem. Here’s how one researcher did it:
Note that all the segments have close to 10,000 words. That means there are plenty of words to achieve statistical significance for many kinds of tests and comparisons.
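The researcher’s actual segment boundaries aren’t reproduced here, so this is only a hypothetical sketch of one simple alternative: splitting a plain-text file into consecutive segments of roughly equal length, near 10,000 words each. The function name, the word regex, and the default `target` are assumptions for illustration.

```python
import re


def segment_words(text: str, target: int = 10_000) -> list[list[str]]:
    """Split text into consecutive word segments of roughly `target` words.

    Hypothetical helper: the segment count is chosen so each segment is close
    to `target`, and leftover words are spread one at a time over the first
    segments instead of forming a short final piece.
    """
    words = re.findall(r"[A-Za-z']+", text)
    if not words:
        return []
    n_segments = max(1, round(len(words) / target))
    base, extra = divmod(len(words), n_segments)
    segments, start = [], 0
    for i in range(n_segments):
        size = base + (1 if i < extra else 0)
        segments.append(words[start:start + size])
        start += size
    return segments


# Example: 30,500 words -> three segments of 10,167, 10,167 and 10,166 words.
parts = segment_words("word " * 30_500)
print([len(p) for p in parts])
```

Real segmentations often follow narrator or book boundaries instead of raw word counts, since a segment that mixes two voices blurs the very differences being measured.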
Here’s what that author found:
Principal Component Analysis of Five Partially Independent Vocabulary Richness Variables
Holmes claimed Joseph had a personal voice, Isaiah had a voice, and all Mormon scripture had one single, additional voice—the prophetic voice of Joseph.
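Holmes’s analysis rests on vocabulary richness measures computed for each text segment. As a minimal sketch, assuming plain-text input and a simple word regex, here are several standard measures from the stylometry literature (type-token ratio, Yule’s K, Honoré’s R, Sichel’s S, Brunet’s W). Whether these are exactly the five variables Holmes used is an assumption, not something established here.

```python
import math
import re
from collections import Counter


def richness(text: str) -> dict[str, float]:
    """A few standard vocabulary richness measures for one text segment."""
    words = re.findall(r"[a-z']+", text.lower())
    n = len(words)                          # tokens
    if n == 0:
        raise ValueError("empty text")
    freqs = Counter(words)                  # word -> number of uses
    v = len(freqs)                          # types (distinct words)
    spectrum = Counter(freqs.values())      # i -> number of words used exactly i times
    v1, v2 = spectrum.get(1, 0), spectrum.get(2, 0)   # hapax and dis legomena
    return {
        "TTR": v / n,
        # Yule's K: 10^4 * (sum_i i^2 V_i - N) / N^2
        "K": 1e4 * (sum(i * i * vi for i, vi in spectrum.items()) - n) / n**2,
        # Honore's R: 100 log N / (1 - V1/V); infinite if every word is a hapax
        "R": 100 * math.log(n) / (1 - v1 / v) if v1 < v else math.inf,
        # Sichel's S: share of types used exactly twice
        "S": v2 / v,
        # Brunet's W with the commonly used constant 0.165
        "W": n ** (v ** -0.165),
    }


print(richness("and it came to pass that it came to pass"))
```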
Reexamination of the Principal Component Analysis
One “prophetic voice”, or multiple authors? Looks like many to me.
Put normalized numbers on the variability, and it looks like 4 to 10 authors for Mormon scripture.
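“Normalized variability” can be made concrete with a small sketch. Assuming each segment is described by a row of richness measures and labeled with its proposed author, this standardizes the columns, finds the first principal component by power iteration (pure Python, to stay dependency-free), projects every segment onto it, and reports each group’s spread relative to a reference group, the role Isaiah plays in this post. The function names and the choice to standardize first are assumptions, not a reconstruction of Holmes’s code.

```python
import math
import statistics


def first_pc_scores(rows: list[list[float]], iters: int = 200) -> list[float]:
    """Scores of each row on the first principal component of standardized data."""
    cols = list(zip(*rows))
    means = [statistics.fmean(c) for c in cols]
    sds = [statistics.pstdev(c) or 1.0 for c in cols]   # guard constant columns
    z = [[(x - m) / s for x, m, s in zip(r, means, sds)] for r in rows]
    p, n = len(cols), len(z)
    cov = [[sum(r[i] * r[j] for r in z) / (n - 1) for j in range(p)] for i in range(p)]
    vec = [1.0 / (k + 1) for k in range(p)]             # arbitrary non-symmetric start
    for _ in range(iters):                              # power iteration
        nxt = [sum(cov[i][j] * vec[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in nxt)) or 1.0
        vec = [x / norm for x in nxt]
    return [sum(a * b for a, b in zip(r, vec)) for r in z]


def normalized_spread(scores, labels, reference):
    """Standard deviation of scores per group, divided by the reference group's."""
    groups: dict[str, list[float]] = {}
    for s, g in zip(scores, labels):
        groups.setdefault(g, []).append(s)
    ref = statistics.stdev(groups[reference])           # needs >= 2 segments per group
    return {g: statistics.stdev(v) / ref for g, v in groups.items()}


rows = [[0.0, 0.0], [0.1, 0.1], [-0.1, -0.1], [-3.0, -3.0], [0.0, 0.0], [3.0, 3.0]]
labels = ["A", "A", "A", "B", "B", "B"]
print(normalized_spread(first_pc_scores(rows), labels, reference="A"))
```

In practice `rows` would hold the richness measures for each segment and `labels` the proposed author of each.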
What Holmes Didn’t Do
Holmes could easily have predicted an expected number of authors from the methods he used, as this author did before him:
That worked well.
Here’s what the table would have looked like if Holmes had done all his work:
Number of Selections | Number of “Observed” Authors | Holmes Conclusion | Expected (Sichel Method) | Names of Observed Authors | Names of Authors Proposed by Holmes |
1 | 3 | 0 | ? | Lehi, Jacob, Abraham | |
2 | 2 | 0 | ? | Alma, Moroni | |
3 | 4 | 2 | ? | Isaiah, Joseph Smith, Nephi, Doctrine and Covenants | Isaiah, Joseph Smith |
4 | 0 | 0 | ? | | |
5 | 1 | 0 | ? | Mormon | |
18 | 0 | 1 | ? | | Prophetic Voice |
Next time, finish the work. And that’s poor peer reviewing.
Joseph Smith and the Prophetic Voice
The Prophetic Voice
Joanna Southcott generated different styles that Holmes called her “prophetic voice”. Notice the three different genres:
Even with three genres, Southcott couldn’t match the variability Joseph produced with just his personal writings and the Doctrine and Covenants. Here’s a look at the numbers, with Isaiah used to normalize the measures of variability between the second study (excluding the Book of Mormon) and the first study (including the Book of Mormon):
Sources | 1st Principal Component Variability | 2nd Principal Component Variability |
Isaiah | 1.0 | 0.4 |
Joanna Southcott | 4.8 | 3.8 |
Joseph Smith (personal) | 2.1 | 1.2 |
Joseph Smith and Doctrine and Covenants | 8.7 | 2.4 |
Joseph Smith and Mormon Scripture | 9 | 6 |
Joseph produced roughly double Southcott’s variability in the first two components, and that’s without the third and fourth components, which Holmes showed were significant for the Book of Mormon (but ignored for Southcott).
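The comparison can be checked directly against the table’s normalized values. This short sketch divides each source’s spread by Southcott’s on each component; the dictionary simply restates the table.

```python
# Normalized spreads from the table (1st and 2nd principal components).
spread = {
    "Isaiah": (1.0, 0.4),
    "Joanna Southcott": (4.8, 3.8),
    "Joseph Smith (personal)": (2.1, 1.2),
    "Joseph Smith and Doctrine and Covenants": (8.7, 2.4),
    "Joseph Smith and Mormon Scripture": (9.0, 6.0),
}


def ratio_to(name: str, baseline: str = "Joanna Southcott") -> tuple[float, float]:
    """Spread of `name` divided by the baseline's spread, component by component."""
    a, b = spread[name], spread[baseline]
    return (a[0] / b[0], a[1] / b[1])


for name in spread:
    first, second = ratio_to(name)
    print(f"{name}: {first:.2f}x, {second:.2f}x")
```

With Mormon scripture included, Joseph’s ratios come out near 1.9 and 1.6 on the first two components.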
Vocabulary Richness and Non-contextual Words
Holmes later switched to non-contextual words, which are more discriminating. It’s kind of funny that non-contextual words were the measure used by Larsen et al. and by Hilton and coworkers in the very Book of Mormon stylometry studies that Holmes claimed his results disproved. We’ll take his apology as implicit.
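Non-contextual (function) words are measured by how often they are used rather than by what they mean. A minimal sketch, assuming a short, hypothetical word list (real studies choose their own, usually much longer, lists), counts each function word per 1,000 words of text.

```python
import re
from collections import Counter

# Hypothetical list for illustration only.
FUNCTION_WORDS = ("the", "and", "of", "to", "that", "in", "it", "for", "unto", "which")


def function_word_rates(text: str, words=FUNCTION_WORDS, per: int = 1000) -> dict[str, float]:
    """Occurrences of each function word per `per` words of text."""
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return {w: 0.0 for w in words}
    counts = Counter(tokens)
    return {w: per * counts[w] / len(tokens) for w in words}


print(function_word_rates("And it came to pass that the people did murmur"))
```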
Stylometry and Forgeries
Sometimes fraud or imitation fools stylometry; sometimes it doesn’t. Here’s a summary of the factors involved in fooling stylometry and how they apply to Book of Mormon studies:
Comparison of Adversarial Authorship Studies with Mormon Scripture Historical and Stylometric Studies
Fooling Stylometric Measures | Mormon Scripture | Favors/Disfavors Fraud |
6,500-word reference samples | Multiple 10,000-word reference samples | Disfavors. Longer reference texts give more information regarding authors’ styles. |
500-word samples for classification | 10,000-word samples for classification | Disfavors. It is presumably harder to hide your style over longer texts. |
Simple, familiar topics | Complex, unfamiliar topics | Disfavors. Greenstadt and her coauthors assume it is harder to concentrate on obfuscation or imitation while inventing or remembering complex, new material. |
Written in short times | Written or dictated in short times | Neutral. Long times and multiple revisions are not necessary to fool some stylometric measurements. |
“Dumbed down” to obscure personal style | Less rich vocabulary in Mormon scripture than in Joseph Smith’s personal papers. The same is seen for Joanna Southcott’s prophetic voice, although to a lesser degree. | Favors. This was the most common technique to obscure a personal style. |
Distinctive authorial style for imitation | No historically verified texts being imitated, other than the Bible, which is quoted extensively and not disguised | Disfavors. Joseph Smith apparently created distinct and consistent authorial styles for Nephi, the Doctrine and Covenants, Moroni, and Alma, and a nearly consistent one for Mormon, all without having any known reference authors to copy. This is the most readily testable question, however, with hundreds of thousands of books from Joseph Smith’s time now available in electronic format. |
Closed set of authors | Open set of authors | Disfavors. Using a closed set of authors forces the results to select the closest style without allowing for the possibility that none of the styles match. The Pauline Epistles, Sherlock Holmes, and Jane Austen studies used open set methods and were all able to identify authors as different from the imitated author. |
Adversarial authorship attack known | No direct evidence of fraud | Disfavors. Stylometric methods are demonstrably highly effective at identifying authors when sample sizes are as large as those from Mormon scripture. The only time this is known to be untrue is when authors are deliberately disguising their style or copying another, and even then they often fail to disguise their style. |
Machine translation doesn’t disguise style | Claimed to be translations | Disfavors. Authors’ styles are preserved through multiple machine translations, consistent with Joseph Smith having “translated” texts by multiple authors. |
Automated selection of machine-identifiable stylometric features | Stylometric features including the very sensitive, non-contextual word pairings | Disfavors. To forecast a little to papers not yet presented, two Book of Mormon stylometry papers use non-contextual word pairings to test authorship. This method was employed in the Sherlock Holmes and Jane Austen studies, but not in the adversarial authorship studies. |
Nearly every consideration mentioned in studies on stylometric fraud disfavors the presence of fraud in Book of Mormon styles.
Faulkner’s “Fraudulent” Wordprints
Some LDS researchers went looking for authors who created multiple styles when examined by strong stylometric measures (non-contextual word pairs). They did find one—Faulkner. They also found two more who were explicitly trying, but failed—Mark Twain and Robert Heinlein.
And Faulkner did it by turning word pairs that are typically subconscious and non-contextual into conscious, contextual choices. He was imitating dialects, and he apparently had a really good ear for it. Add this to the studies on stylometry and fraud, and it’s getting really hard to argue that Joseph made all the styles himself.
Joseph’s Personal Style
Joseph did a lot of dictations. How did that affect his personal style?
“Most Likely” Authors of Joseph’s Dictations using closed-set stylometry
The following is an excerpt from the first table in a 2013 study by Jockers. For the 96 texts dictated by Joseph, it shows how many were assigned to each candidate author as the most similar style (1st choice) or the second most similar style (2nd choice):
Table 1
Identified Author | 1st choice | 2nd choice |
Barlow | 1 | 0 |
Cowdery | 32 | 21 |
Isaiah/Malachi | 3 | 1 |
Longfellow | 2 | 4 |
Pratt | 24 | 12 |
Rigdon | 12 | 10 |
Smith | 15 | 25 |
Spalding | 7 | 23 |
Notice that:
- 13/96 (Barlow, Longfellow, Isaiah/Malachi, and Spalding) were attributed to authors with no connection to the texts.
- 32/96 were assigned to Cowdery
- 24/96 were assigned to Pratt
- 12/96 were assigned to Rigdon
- 15/96 were assigned to Smith
That’s 13.5% ‘wrong’ (assigned to controls) and 15.6% ‘right’ (assigned to Smith). How many of the texts assigned to Cowdery, Pratt, and Rigdon were actually penned by those scribes?
Scribe | # texts assigned to scribe | # of those texts for which scribe acted as scribe |
Cowdery | 32 | 2 |
Pratt | 24 | 1 |
Rigdon | 12 | 0 |
So Rigdon and Spalding together yielded 19.8% false positives (19 of 96 texts). Take away the 15 texts assigned to Smith and the 3 assigned to a scribe who actually penned them, and the remaining 78 of 96 texts (81.2%) were objectively misattributed. Whatever this method is, however good it is elsewhere, it’s hopeless for answering questions about Joseph’s dictations, which include all of Mormon scripture.
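The percentages in this section follow from the two tables. This worked check restates the table counts and redoes the arithmetic; the only inputs are the numbers already shown.

```python
# First-choice attributions of the 96 dictated texts (Table 1).
first_choice = {
    "Barlow": 1, "Cowdery": 32, "Isaiah/Malachi": 3, "Longfellow": 2,
    "Pratt": 24, "Rigdon": 12, "Smith": 15, "Spalding": 7,
}
# Texts assigned to a scribe for which that scribe actually acted as scribe.
penned_by_assigned_scribe = {"Cowdery": 2, "Pratt": 1, "Rigdon": 0}

total = sum(first_choice.values())
controls = sum(first_choice[a] for a in ("Barlow", "Isaiah/Malachi", "Longfellow", "Spalding"))
rigdon_spalding = first_choice["Rigdon"] + first_choice["Spalding"]
# Misattributed: assigned to neither Smith nor the scribe who actually penned the text.
misattributed = total - first_choice["Smith"] - sum(penned_by_assigned_scribe.values())


def pct(k: int) -> float:
    """Share of all dictated texts, in percent."""
    return 100 * k / total


print(f"controls: {controls}/{total} = {pct(controls):.1f}%")
print(f"Smith: {first_choice['Smith']}/{total} = {pct(first_choice['Smith']):.1f}%")
print(f"Rigdon + Spalding: {rigdon_spalding}/{total} = {pct(rigdon_spalding):.1f}%")
print(f"misattributed: {misattributed}/{total} = {pct(misattributed):.1f}%")
```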
Stylistic Overlap and Joseph’s Dictations
An Alternate Interpretation
If it’s even worth doing, here’s one way to explain how closed-set stylometry could so badly misattribute Joseph’s writings to others:
All it would take is for Joseph’s style to be more diffuse than the scribes’ styles, with some overlap between them. Then Joseph’s texts would be attributed first to a scribe with a more focused style, and only as a second choice to Joseph. And if the scribes were imperfect, not catching every word exactly as it was spoken? That could diffuse Joseph’s style even further without implying that Joseph had no personal style.
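The diffuse-style explanation can be illustrated with a small simulation. This is a hypothetical sketch, not a model of the actual data: Joseph’s texts scatter around his own centroid in a made-up two-dimensional feature space, a scribe’s centroid sits nearby, and each text goes to whichever centroid is closer (plain nearest-centroid classification, without the shrinkage step of the NSC method). The centroid positions and spreads are assumptions.

```python
import random


def misattribution_rate(joseph_spread: float, n: int = 5000, seed: int = 1) -> float:
    """Fraction of simulated Joseph texts that a nearest-centroid rule gives to the scribe.

    Hypothetical setup: Joseph's centroid at (0, 0), the scribe's at (1, 0);
    Joseph's texts scatter around (0, 0) with standard deviation `joseph_spread`.
    """
    rng = random.Random(seed)
    joseph, scribe = (0.0, 0.0), (1.0, 0.0)
    wrong = 0
    for _ in range(n):
        x, y = rng.gauss(0.0, joseph_spread), rng.gauss(0.0, joseph_spread)
        d_joseph = (x - joseph[0]) ** 2 + (y - joseph[1]) ** 2
        d_scribe = (x - scribe[0]) ** 2 + (y - scribe[1]) ** 2
        if d_scribe < d_joseph:
            wrong += 1
    return wrong / n


for spread in (0.2, 0.5, 1.0):
    print(spread, misattribution_rate(spread))
```

The decision boundary here is the line x = 0.5, so the expected rate is the chance that a normal draw exceeds 0.5: about 0.6% at a spread of 0.2 and about 31% at a spread of 1.0. The scribe never moves; only Joseph’s diffuseness changes.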
Dual Authored Book of Mormon
Closed-set Stylometry and the Book of Mormon
Here’s one of those charts again, from the kind of method that had roughly 80% errors on the last problem we saw it applied to. This one is from a 2008 paper by Jockers et al.:
Number of Book of Mormon Chapters Assigned to Each Author
Proposed Author | 1st Choice | 2nd Choice |
Rigdon | 93 | 104 |
Isaiah & Malachi | 63 | 38 |
Spalding | 52 | 58 |
Cowdery | 20 | 17 |
Pratt | 9 | 15 |
Barlow | 0 | 1 |
Longfellow | 2 | 6 |
Of the chapters that aren’t quotations from Isaiah or Malachi:
45.8% assigned to Rigdon
25.6% to Spalding
13.3% to Isaiah/Malachi
5.4% to Pratt/Longfellow
So 18.7% of those chapters (13.3% + 5.4%) are false positives by any objective measure.
From the 2013 study, we saw that Rigdon and Spalding showed up as false positives 19.8% of the time. In addition, Cowdery was the top choice for 30 texts he didn’t even scribe, 31.3% of the total. We also saw that the nearest shrunken centroid (NSC) method got it “right” only 15.6% of the time. Something broke between the control tests and the application to Joseph Smith’s dictations.
How Closed-sets Generate Misattribution
A closed-set method will always give a positive answer: every text is assigned to one of the candidate authors. Here’s a simplified visual representation of what this closed-set study has definitively shown us:
Chapter 1 would be assigned to Cowdery, chapter 5 to Pratt, 6 to Spalding, etc., despite none of the chapters matching the styles of the candidate authors.
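A closed-set rule amounts to “pick the nearest candidate,” so it has no way to say “none of these.” In this minimal sketch the centroids are made up and two-dimensional (the author names are only labels), and a text far from every candidate still gets assigned to one of them.

```python
import math

# Hypothetical centroids in a made-up 2-D feature space.
CENTROIDS = {"Cowdery": (0.0, 0.0), "Pratt": (1.0, 0.0), "Spalding": (0.0, 1.0)}


def closed_set_assign(text_vec, centroids=CENTROIDS):
    """Return the nearest candidate author; 'none of these' is not an option."""
    return min(centroids, key=lambda name: math.dist(text_vec, centroids[name]))


# A text roughly 12 units from every candidate is still attributed to someone.
far_text = (10.0, 8.0)
print(closed_set_assign(far_text), round(math.dist(far_text, CENTROIDS["Pratt"]), 2))
```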
The Book of Mormon Still Has Many Authors
The closed-set method may be poorly applied, but it did assign chapters of the Book of Mormon to 5 different authors.
Opening Authorship Possibilities
Stylometry is a multidimensional statistical problem. You measure all the features in multiple dimensions and look for clustering. Arrows represent the stylometric feature vectors of four hypothetical authors. If the arrows for different authors form different clusters, then you can tell those authors apart with your set of stylometric measurements.
The same thing can be done in classifying tumor cells.
Test a fourth tumor type against three known tumor types in a closed-set problem, and the results tell you it is one of the first three types. Modify the method to be open-set, and the fourth type (+’s) clusters in one corner, while types 1-3 cluster in the middle. You can tell there is a new type.
Sidney Rigdon Wrote the Federalist Papers?
The closed-set methods applied to the Federalist Papers (instead of the Book of Mormon) showed that Sidney Rigdon wrote most of Alexander Hamilton’s Federalist Papers.
The open-set method does much better, telling us there is an unknown author (Hamilton):
Include Hamilton in the closed-set of authors, and the closed-set method works fine:
Closed-set methods are a bad idea if there might be an unknown author.
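The simplest open-set fix refuses to attribute a text that is too far from every candidate. This is a hedged sketch, not the specific open-set method used in the studies discussed here: it uses hypothetical 2-D centroids and an assumed acceptance radius `max_distance`, which in practice would need calibrating on texts of known authorship. It mirrors the Federalist result: with no limit the rule behaves like a closed set and picks Rigdon, with a limit it flags an unknown author, and once Hamilton joins the candidate set it finds him.

```python
import math

# Hypothetical centroids in a made-up 2-D feature space.
CANDIDATES = {"Rigdon": (1.0, 0.0), "Pratt": (0.0, 0.0), "Spalding": (0.0, 1.0)}


def open_set_assign(text_vec, centroids, max_distance=2.0):
    """Nearest-centroid attribution that answers 'unknown' when nothing is close."""
    best = min(centroids, key=lambda name: math.dist(text_vec, centroids[name]))
    if math.dist(text_vec, centroids[best]) > max_distance:
        return "unknown"
    return best


hamilton_text = (10.0, 8.0)
print(open_set_assign(hamilton_text, CANDIDATES, max_distance=math.inf))  # Rigdon
print(open_set_assign(hamilton_text, CANDIDATES))                         # unknown
with_hamilton = {**CANDIDATES, "Hamilton": (9.5, 8.2)}
print(open_set_assign(hamilton_text, with_hamilton))                      # Hamilton
```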
Book of Mormon Authors Unknown
Use open-set methods on the Book of Mormon and what do you get? The Book of Mormon was written by an unknown author or authors. None of the 19th century authors fit the bill.
The closed-set study had plenty of evidence present in its own results to show that it was misapplied. Only the chapters with values above the line at 1.9 (figure below) could be confidently attributed by the closed-set method, and almost all of those were the Isaiah and Malachi quotes.
Again, the open-set method does better. And you will notice that Book of Mormon styles are all over the map compared with 19th century styles.
One “prophetic voice”? Rigdon and Spalding? Try again.
Nephi is not Alma is not Joseph
With frequencies of non-contextual word pairs, the same author uses word pairs at almost exactly the same frequency across different texts (0-6 “rejections,” i.e., statistically significant differences in frequency). Different authors often show 7 or more. When changes in genre, which can confuse stylometry, are avoided, comparisons can be made among Joseph, Nephi, and Alma. Nephi matches his own style, as does Alma.
When you compare Nephi and Alma with each other they are different:
They are also different from Joseph, Oliver, and Solomon Spalding:
Once again, multiple authors in the Book of Mormon.
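Counting “rejections” can be sketched as follows. This is a hedged illustration, not Hilton’s exact procedure: it counts adjacent word pairs in two texts and applies a two-proportion z-test to each target pair, calling |z| > 1.96 (about the 5% level) a rejection. The pair list and threshold are assumptions. Because each pair is tested separately, a few rejections can occur by chance even between two texts by one author, which is one reason a same-author baseline is needed.

```python
import math
import re


def pair_counts(text: str, pairs) -> tuple[dict, int]:
    """Count each target word pair among adjacent words, plus the total number of pairs."""
    words = re.findall(r"[a-z']+", text.lower())
    bigrams = list(zip(words, words[1:]))
    counts = {p: 0 for p in pairs}
    for bg in bigrams:
        if bg in counts:
            counts[bg] += 1
    return counts, len(bigrams)


def rejections(text_a: str, text_b: str, pairs, z_crit: float = 1.96) -> int:
    """Number of word pairs whose rates differ significantly between two texts."""
    ca, na = pair_counts(text_a, pairs)
    cb, nb = pair_counts(text_b, pairs)
    if na == 0 or nb == 0:
        raise ValueError("each text needs at least two words")
    count = 0
    for p in pairs:
        pooled = (ca[p] + cb[p]) / (na + nb)
        se = math.sqrt(pooled * (1 - pooled) * (1 / na + 1 / nb))
        if se == 0:
            continue                    # pair absent from both texts: nothing to test
        z = (ca[p] / na - cb[p] / nb) / se
        if abs(z) > z_crit:
            count += 1
    return count


PAIRS = [("of", "the"), ("and", "it"), ("it", "came"), ("in", "the")]
a = "of the " * 100
b = "and it " * 100
print(rejections(a, a, PAIRS), rejections(a, b, PAIRS))
```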
The Highly Criticized 1st Book of Mormon Study
While the results of the oldest Book of Mormon stylometry study overstate the evidence for multiple authorship (not intentionally—the field has progressed a lot since 1980), it too found multiple authorship and a lack of 19th century authorship:
The Late War, The Book of Mormon, and Rare n-grams
Stylometric comparisons with The Late War uncovered many, many short phrases in the Book of Mormon that are nearly unique to early 19th-century pseudo-biblical writings. The Book of Mormon is written in pseudo-biblical, 19th-century language. That’s terribly unsurprising, but truly interesting. As for the rest of the comparisons, keep these observations in mind if you choose to sift through them:
- The Book of Nullification, published in 1830, has twice as many similarities with the Book of Mormon as the Book of Mormon has with The Late War.
- The Johnsons’ second study, including more books and an improved method of comparison, identified three other books—which had never previously been proposed as source material for the Book of Mormon—as being more similar (and thus more closely related) to the Book of Mormon than The Late War.
- Their study does not include any texts of fewer than 15,000 words. “Unique” matches with the Book of Mormon may be much less unique than is indicated if the body of smaller texts were included.
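That last caveat is easy to see in code. Here is a hedged sketch of a rare-shared-phrase search, not the Johnsons’ actual method: collect the word n-grams two texts share, then keep only those appearing in at most `max_ref_docs` documents of a reference corpus. Leaving short texts out of `reference_texts` can only lower those counts, so it can only make shared phrases look rarer than they are. The function names and defaults are assumptions.

```python
import re
from collections import Counter


def ngrams(text: str, n: int) -> set[tuple[str, ...]]:
    """Distinct word n-grams in a text."""
    words = re.findall(r"[a-z']+", text.lower())
    return set(zip(*(words[i:] for i in range(n))))


def rare_shared_ngrams(text_a, text_b, reference_texts, n=4, max_ref_docs=0):
    """n-grams found in both texts but in at most `max_ref_docs` reference texts."""
    shared = ngrams(text_a, n) & ngrams(text_b, n)
    doc_freq = Counter()
    for ref in reference_texts:
        doc_freq.update(ngrams(ref, n) & shared)
    return {g for g in shared if doc_freq[g] <= max_ref_docs}


a = "and it came to pass that he did go"
b = "and it came to pass that they did flee"
refs = ["and it came to pass in those days"]
print(rare_shared_ngrams(a, b, refs))   # only ('came', 'to', 'pass', 'that')
print(rare_shared_ngrams(a, b, []))     # all three shared 4-grams
```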
The Urantia Book and the Book of Mormon
The Urantia Book is another purportedly multi-authored, revealed text. Some stylometric measures seemingly confirm multiple authorship; however, the differences are plausibly explained by shifts in genre and the passage of time, two factors that cannot explain the shifts in Book of Mormon styles.
Could a Single Author Produce the Variety of Stylometric Features?
Fooling Stylometric Measures | Mormon Scripture | Urantia Book |
6,500-word reference samples | Multiple reference samples with as many as 10,000–30,000 words per proposed author | Multiple papers of at least 1,000 words |
500-word samples for classification | 200–10,000-word samples for classification (method dependent) | At least 1,000 words for classification. 1,000 is better than 500, but weaker than 2,000–10,000. |
Simple, familiar topics | Complex, unfamiliar topics. Joseph Smith is reported to have told stories about some topics treated in the Book of Mormon, but had not previously written anything of significant length. | Complex topics of unknown familiarity. Sadler (the Urantia Book recorder) was demonstrably well-read and previously or simultaneously wrote on several topics related to the Urantia Book. |
Written in short times | Written or dictated in short times. No time available for significant, subconscious shifts in authorial style, and no rewriting. | Written in unknown amounts of time (possibly short), but over many years, thus allowing for observed linear shifts in authorial style. |
“Dumbed down” to obscure personal style | Less rich vocabulary in Mormon scripture than in Joseph Smith’s personal papers. | Styles changed to match genre, a conscious stylistic choice that even untrained authors can make. |
Distinctive authorial style for imitation | No historically verified texts being imitated (except the Bible) | Sadler is known to have read and possessed numerous texts on topics treated in the Urantia Book; however, the only clear imitation is the Bible. |
Closed set of authors | Open set of authors | Open set of authors |
Adversarial authorship attack known | No direct evidence of fraud | No direct evidence of fraud |
Machine translation doesn’t disguise style | Claimed to be translations, suggesting authorial styles should be preserved. | Claimed to be revelations. We don’t know what to expect stylometrically from revelations from different sources. |
Genre controlled | Genre controlled for in some studies, revealing multiple authorship. | Genre controlled for in one study, consistent with single authorship. |
One author didn’t write the Book of Mormon. Two didn’t, either. And we don’t have anything else written by the people who did. For me, this is fact. Explain it how you will.
Despite your self-deprecating disclaimer, this article is well thought-out. Your willingness to address stylometry arguments with the intellectual precision required for such a task is much appreciated. I much enjoy your posts.
Thank you.
You want an eye opener, you should watch or read Hugh Nibley info about the BoM
I love Nibley on the Book of Mormon. I especially love how he interprets it as a commentary for our day.
Thank you for this. I actually thought I was a pretty smart person until I read this post and got dizzy. 😉 But I get the point and love the work you put into this. The final paragraph was a great reward. I'm grinning.
There are so many holes in these too brief explanations. If you got the main point, that’s what I was hoping for with this. I’m sure parts of it simply don’t make sense without going back and reading the original posts and/or papers.
This is really fascinating stuff, Jonathan.
I had looked at the early stylometry studies and did find them wanting.
While I know I will need to look at what you’ve presented here more carefully, there does seem to be something worth that more careful reading.
I am interested in available data sets if I want to replicate your work. Where can I find the specific Book of Mormon divisions used for different parts of the analysis? For instance, when you did the “Joseph Smith and the Prophetic Voice” section, I want to know what counts under “Mormon Scripture” for “Joseph Smith and Mormon Scripture”. Did scripture that includes large quotations of other scripture get included? What happens if I include Joseph Smith (personal) and Isaiah?
Hi Jacob,
I’m glad to see your interest. None of this work is mine. If you follow the links through my series at exploring sainthood, you can find all of the references I used, including links to many of the articles. I can provide access to other articles I used that aren’t free if you message me privately for specific articles.
I looked at the wherefore/therefore shift in a little more detail and shared my thoughts here: http://jonathan.metacannon.net/2015/08/mosiah-priority-or-changing-authors.html
Your examples of phrases indicating unity of authorship are all contextual words. As such they are not strong indicators of unity of authorship. It is as or more likely that they indicate unity of genre, or unity of translator in my understanding of stylometry. There is certainly more work to be done, though, for those who are interested. While I think it’s a fool’s errand to look for single authorship, I’d like to see if there is a natural explanation of syntactical oddities observed by Stanford Carmack. There are also many interesting unasked questions about how translation plays into stylometry that could have bearing on understanding the Book of Mormon.
Jacob H.,
Also, I’m not sure I trust Hilton’s “didactic writing” genre. I don’t know of any work that controls for the well-evidenced shifting in author word choices over the course of the narration (the wherefore/therefore shift is only one of many shifts), which seem to be important to factor into the studies, or to prove not important. Hilton also didn’t account for a major difference between Alma and Nephi — audience, one being contemporary and the other the expected future readers of the text. Alma’s sermons seem better compared to Jacob’s, Mormon’s, and Benjamin’s, despite their relative paucity. Nephi’s didactic writings better fit the same genre as some of Mormon’s, and Moroni’s. So rather than buy into the idea that somehow Nephi and Alma represented the same “genre” in their didactic writings, it seems to me that the workable data space in which to test hypotheses was unfortunately heavily reduced by Hilton’s choice.
Also, with regard to ngrams. It seems to me that one stylistic feature of the entire BoM text and much of the D&C is the frequent formulaic coupling of words — horses and chariots, flocks and herds, wickedness and abominations, wars and contentions, great and marvelous, great and terrible, gold and silver, priests and teachers, power and authority, revelation and prophecy, wars and contentions, sacrifice and burnt offerings, etc. Some kind of bag of words (rather than ngrams, because of variations on the couplings) analysis at the sentence level, after controlling for the many longer biblical quotes in the text, really ought to be done. My own non-computerized search along these lines strongly suggests a unity of authorship, but I am not situated to do such an analysis myself for quite some time.
Ok Rob
Thank you Jonathan
Wow! This might contend for the laziest scholarly work I’ve found in Stylometry. You’ve made random assertions without any demonstration of your methods and pasted a lot of graphics to confound your readers. Any Mormon looking for an excuse to believe will love this. Any scholar looking for real data just wasted their time.
Could you please quote the 1992 study by Holmes and demonstrate exactly how you’d do it differently.