Statistical definitions

A word: A sequence of letters, digits, hyphens, apostrophes, or other markers.
A difficult word: A word that does not belong to the list of 3,200 familiar words.
A unique word: A word that appears only once in the whole text.
A character: Any printable or non-printable character, including a space, counts as a character.
A sentence: A sequence of words that ends with a full stop or another end marker.
A paragraph: A sequence of sentences that ends with a new-line character.
The average sentence length: The number of words divided by the number of sentences.
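
The definitions above can be sketched in code. This is only a minimal illustration, not the tool's actual implementation: the function name text_stats, the regular expressions, the choice of '.', '!' and '?' as sentence markers, and the case-insensitive comparison of words are all assumptions made for the example.

```python
import re

def text_stats(text):
    # Assumption: a word is a run of letters/digits, optionally joined
    # by hyphens or apostrophes (e.g. "well-known", "don't").
    words = re.findall(r"[A-Za-z0-9]+(?:[-'][A-Za-z0-9]+)*", text)
    # Assumption: a sentence ends at '.', '!' or '?'.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    # A paragraph ends at a new-line character; empty lines are skipped.
    paragraphs = [p for p in text.split("\n") if p.strip()]
    # Count occurrences (case-insensitive, an assumption) to find unique words.
    counts = {}
    for w in words:
        key = w.lower()
        counts[key] = counts.get(key, 0) + 1
    unique = [w for w, n in counts.items() if n == 1]
    return {
        "words": len(words),
        "characters": len(text),  # every character counts, spaces included
        "sentences": len(sentences),
        "paragraphs": len(paragraphs),
        "unique_words": len(unique),
        "avg_sentence_length": len(words) / len(sentences) if sentences else 0.0,
    }

sample = "The cat sat. The dog ran fast!\nA well-known cat."
stats = text_stats(sample)
print(stats)
```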

Mathematical background

Flesch Reading Ease: 206.835 - 1.015 x (words/sentences) - 84.6 x (syllables/words)
Flesch-Kincaid Grade Level: 0.39 x (words/sentences) + 11.8 x (syllables/words) - 15.59
Gunning Fog Index: 0.4 x ( (words/sentences) + 100 x (complexWords/words) )
SMOG Index: 1.0430 x sqrt( 30 x complexWords/sentences ) + 3.1291
Coleman-Liau Index: 5.88 x (characters/words) - 29.6 x (sentences/words) - 15.8
Automated Readability Index (ARI): 4.71 x (characters/words) + 0.5 x (words/sentences) - 21.43
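
As a rough sketch, the formulas can be evaluated from five raw counts. The function name readability_scores and its parameter names are hypothetical. Counting syllables and complex words is assumed to happen elsewhere. The Coleman-Liau line uses the published coefficients, 0.0588 x L - 0.296 x S - 15.8, with L (characters) and S (sentences) counted per 100 words.

```python
import math

def readability_scores(words, sentences, syllables, complex_words, characters):
    # Shared ratios used by several formulas.
    wps = words / sentences      # words per sentence
    spw = syllables / words      # syllables per word
    cpw = characters / words     # characters per word
    # Coleman-Liau inputs, expressed per 100 words.
    chars_per_100 = 100 * cpw
    sents_per_100 = 100 * sentences / words
    return {
        "flesch_reading_ease": 206.835 - 1.015 * wps - 84.6 * spw,
        "flesch_kincaid_grade": 0.39 * wps + 11.8 * spw - 15.59,
        "gunning_fog": 0.4 * (wps + 100 * complex_words / words),
        "smog": 1.0430 * math.sqrt(30 * complex_words / sentences) + 3.1291,
        "coleman_liau": 0.0588 * chars_per_100 - 0.296 * sents_per_100 - 15.8,
        "ari": 4.71 * cpw + 0.5 * wps - 21.43,
    }

# Example counts (made up for illustration): 100 words in 5 sentences.
scores = readability_scores(words=100, sentences=5, syllables=150,
                            complex_words=10, characters=450)
for name, value in scores.items():
    print(f"{name}: {value:.2f}")
```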

Readability is measured by two main methods: Reading Ease and Grade Levels.
Comprehensibility, measured by Lexical Density, forms the third pillar of text analytics.

Reading Ease

Reading ease measures textual difficulty, that is, how easy a text is to read. The de facto standard is the Flesch Reading Ease score. Higher scores indicate text that is easier to read; lower scores indicate text that is more difficult. To reach the largest possible audience, write text that scores high on this scale.

90–100 Understood by a student aged 12 or older.
60–70 Understood by a high school graduate.
0–30 Comprehensible to someone with a university degree.

Summary: The higher the reading ease, the more people will understand your text.

Grade Levels

Grade level refers to the years of education (counted from 1st grade) required to comprehend a text. Several accurate and sophisticated mathematical models exist. For the sake of simplicity, the Average Grade Level is recommended: despite its name, it is the median of the top five grade-level scores (Flesch-Kincaid Grade Level, Gunning Fog Index, Coleman-Liau Index, SMOG, and Automated Readability Index).

6 Kids and teenagers
10 Average adult
14+ Academic audience

Summary: The lower the grade level, the more people will understand your text.
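
The Average Grade Level described in this section, the median of the five grade-level scores, could be computed as in the sketch below. The function name average_grade_level is hypothetical, and the five input values are made-up numbers for illustration only.

```python
from statistics import median

def average_grade_level(fk_grade, gunning_fog, coleman_liau, smog, ari):
    # The median is the middle value of the five sorted scores, so a single
    # outlier among the formulas does not shift the result.
    return median([fk_grade, gunning_fog, coleman_liau, smog, ari])

# Illustrative scores for one text (not measured data).
grade = average_grade_level(9.91, 12.0, 9.18, 11.21, 9.77)
print(grade)
```

Using the median rather than the arithmetic mean keeps one unusually high or low formula from dominating the combined grade.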

Lexical Density

In addition to readability, there is comprehensibility: the likelihood that educated people will be able to understand a text about a topic previously unknown to them. It is estimated by Lexical Density, which measures the share of content words in a text, as opposed to grammatical (function) words. Content words carry most of the information. A high share of content words most likely indicates a specialized academic text that will only be understood by people in that field. A low share of content words indicates a very simple, easy-to-understand piece.

60% Bachelor's / Master's thesis
55% Wikipedia article
50% Newspaper article

Summary: The lower the lexical density, the more people will comprehend your text.
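
A minimal sketch of a lexical density calculation follows. It assumes that density is the percentage of content words among all words, and it treats every word not found in a small stop list as a content word. The FUNCTION_WORDS list is a tiny hypothetical sample, far shorter than what a real analysis would need, and lexical_density is an illustrative name.

```python
import re

# Assumption: a tiny illustrative list of grammatical (function) words.
FUNCTION_WORDS = {
    "a", "an", "the", "and", "or", "but", "of", "in", "on", "at", "to",
    "for", "with", "by", "is", "are", "was", "were", "be", "it", "this",
    "that", "he", "she", "they", "we", "you", "i",
}

def lexical_density(text):
    # Lowercase the text and extract simple word tokens.
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        return 0.0
    # Everything that is not a listed function word counts as a content word.
    content = [w for w in words if w not in FUNCTION_WORDS]
    return 100.0 * len(content) / len(words)

density = lexical_density("The cat sat on the mat and it purred.")
print(f"{density:.1f}%")
```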

Search Engine (SEO)

For an article to rank well in a search engine, it has to fulfill a couple of technical minimum criteria. These are mainly defined by a mix of reading ease, keyword density, and text length. For good search engine visibility, your article should have at least 300 words, a reading ease above 80, and a keyword density below 3%.

Top 10 Professional and relevant article
Top 50 Average article
Top 100 Badly written and/or spun content

Summary: The lower your "Top", the more likely you will rank high with your text in search engines.
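
The three minimum criteria named in this section (at least 300 words, reading ease above 80, keyword density below 3%) can be checked with a short sketch. It assumes keyword density means occurrences of a single-word keyword divided by the total word count, and that the reading ease score has already been computed. The function name meets_seo_minimums is hypothetical.

```python
import re

def meets_seo_minimums(text, keyword, reading_ease):
    # Simple lowercase tokenization; assumes a single-word keyword.
    words = re.findall(r"[a-z0-9']+", text.lower())
    if not words:
        return False
    # Assumption: density = keyword occurrences / total words, in percent.
    keyword_density = 100.0 * words.count(keyword.lower()) / len(words)
    return len(words) >= 300 and reading_ease > 80 and keyword_density < 3.0

# Synthetic 300-word text with the keyword used 5 times (about 1.7%).
article = " ".join(["word"] * 295 + ["python"] * 5)
print(meets_seo_minimums(article, "python", reading_ease=85))
```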


Plagiarism

The unified definition of plagiarism has three criteria, all of which must be met:

(1) The use of ideas, concepts, words, or structures
(2) without appropriately acknowledging the source
(3) to benefit [in a setting where originality is expected].

The most common forms of plagiarism are:

Inaccurately citing the source. 51%
Interweaving various sources together without citing. 43%
Using quotations, but not citing the source. 26%
Proper citations, but failing to change structure/wording of the ideas. 21%
Melding cited and uncited sections of the piece. 17%
Citing some, but not all passages that should be cited. 12%
Re-writing someone’s work without properly citing sources. 11%
Taking passages from own previous work without adding citations. 8%
Submitting someone’s work as own. 2%

Percentage values reflect the number of detected occurrences per 100 US undergraduate students.

Summary: Most cases of plagiarism can be prevented by employing proper citation.

