Search published articles


Showing 3 results for Stylometry

Arash Poorakbar,
Volume 9, Issue 35 (10-2016)
Abstract

Rasā’el are six prose pieces by Sa’dî that are usually published in his collected works. Since the first attempts to publish a scholarly edition of his oeuvre, the question of their authorship has been central. Several researchers have studied these six treatises and offered answers to the problem of their attribution. This dissertation analyzes the Rasā’el quantitatively, using stylometry and authorship attribution techniques, and compares them with Golestān, an undisputed work of Sa’dî. Two techniques are used. The first is the characteristic curve proposed by Mendenhall. The second is a quantitative model of the repetition and distribution of vocabulary in each piece of writing. Ultimately, our results agree with Forughî’s conjecture about these writings. Three of them (Nasîhat ul muluk, Aql o eshq, and ankîyāno) were surely written by Sa’dî. Taqrîrāt-e salāse and dar taqrîr-e dîbāche were not written by Sa’dî. The author of Majāles-e panjgāne is not Sa’dî, but its content may derive from Sa’dî’s public sermons.
Morteza Heidari,
Volume 13, Issue 51 (8-2020)
Abstract

Stylistics deals with the study of repetitions, so statistical analysis has proved useful in stylistic evaluation for studying frequencies. Because descriptive statistics cannot always explain stylistic fluctuations, applying inferential statistics to stylistic phenomena brings scientific accuracy. “Stylometry” is the term coined for the study of stylistics with the aid of computers. In the present article, the author evaluates imagination in the construction of simile in the Khorasani style. To this end, the works of twelve eminent poets of the Khorasani style were studied diachronically, and the relation between literal similes and imagination was examined. To analyze the data, the author adopted an inferential approach and calculated the Pearson correlation coefficient. The results showed a correlation coefficient of 0.374 and a significance level of 0.626, and therefore it could be said that there is a positive and direct relationship between the variables. In stylistic terms, this means that the more compressed the angle of simile in the Khorasani style, the more frequent literal similes are, which is in line with the dominant norms of the Khorasani style.

Volume 15, Issue 6 (3-2024)
Abstract

Advances in science and technology have made it no longer acceptable to have works of doubtful authorship. Stylometry is a method that uses statistical analysis to determine the author of a literary work. Authorship attribution methods rely heavily on writing style, on the assumption that each person has a unique style. Author identification is used in areas such as plagiarism detection, criminology, and the identification of unknown authors. Because many factors are involved in identifying the author of a text, no method with 100% accuracy has been presented so far, and researchers are still trying to minimize computational errors. One of the methods claimed to have good accuracy is Yule’s theory. In this article, Yule's theory and four other theories are combined to compare the vocabulary richness of the Munajat Khams 'Ashar and the prayers of Al-Sahifa al-Sajjadiyya. Then, using a descriptive-analytical method and an explanation of the statistical data, the correctness of the attribution of the Munajat Khams 'Ashar to Imam Sajjad (PBUH) is investigated. The results show the high accuracy of the calculations and the independence of the output of the theories from the length of the text. Also, given the slight difference between the vocabulary richness of the Munajat Khams 'Ashar and that of the prayers of Al-Sahifa al-Sajjadiyya, its attribution to Imam Sajjad (PBUH) is confirmed.

1. Introduction
The attribution of a text to someone who did not actually write it has long been a focus of researchers. With the advancement of science in the twentieth century, the need to verify the attribution of a text to a particular author intensified, and with the advancement of information technology, intelligent methods of author recognition have grown in popularity. Today, various methods are used to identify the author of a text, one of the most important of which is the study of writing style.
The study of writing style is a subset of the new rhetoric. The new rhetoric aims to add a field of reasoning to formal logic, and applies whenever action is linked to rationality (Perelman, 1971). In stylistics, reasoning about and analysis of the text are used to identify the characteristics of an author's style.
A variety of methods for attribution have been proposed. There are three main approaches: lexical methods, syntactic or grammatical methods, and language-model methods, including methods based on compression (Zhao & Zobel, 2005). In this article, the lexical method is used. One of the most practical lexical methods for capturing an author's style is the "vocabulary richness" method. Unfortunately, the output of many methods depends on the length of the text, so a method should be used that depends on text length as little as possible. In this paper, we combine five theories for calculating vocabulary richness in order to achieve the most accurate results.

Research Question(s)
1. How accurate and reliable are the results of the five equations used in this research?
2. How much does the output of the theories depend on the length of the text?
3. What is the difference between the vocabulary richness of the Munajat Khams 'Ashar and that of the prayers of Al-Sahifa al-Sajjadiyya?

2. Literature Review
Authorship attribution (AA) is the process of attempting to identify the likely authorship of a given document, given a collection of documents whose authorship is known (Bozkurt et al., 2007). The accepted assumption behind AA is that every author writes in a distinct way; some writing characteristics cannot be manipulated by the writer’s will, and therefore can be identified by an automated process (Howedi & Mohd, 2014).
One of the fundamental sub-problems of AA is the extraction of the most suitable features to represent the writing style of each author. This problem is known as “stylometry” (Howedi et al., 2020, p. 1334). Stylometry is defined as the set of techniques that measure an author's style by identifying its stylistic features (stylemas). These stylemas, also called style markers, are obtained from textual measurements normally calculated by statistical methods (Escobedo et al., 2013; Stamatatos, 2009).
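Some researchers have used a combination of some lexical richness functions to achieve better results, namely: K proposed by Yule (1944), R proposed by Honore (1979), W proposed by Brunet (1978), S proposed by Sichel (1975), and D proposed by Simpson (1949) which are defined as follows (Stamatatos et al., 2000):

$$K = \frac{10^{4}\left(\sum_{i} i^{2} V_i - N\right)}{N^{2}}$$
$$R = \frac{100 \log N}{1 - V_1 / V}$$
$$W = N^{V^{-\alpha}}$$
$$S = \frac{V_2}{V}$$
$$D = \sum_{i} V_i \, \frac{i(i-1)}{N(N-1)}$$
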
where:
V_i: the number of words used exactly i times
N: the total number of words
V: the number of distinct words (vocabulary size)
α: a parameter, usually fixed at 0.17
The final output for calculating vocabulary richness is obtained by combining these five equations.
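As a rough illustration of how the five functions can be computed from a tokenized text, the following Python sketch may help. It assumes the standard forms of the functions as given by Stamatatos et al. (2000), a natural logarithm in R, and simple whitespace tokenization; the function name `richness` is hypothetical. Because the rule for combining the five values is not specified here, the sketch returns them separately rather than guessing at a combined score.

```python
from collections import Counter
import math


def richness(tokens, alpha=0.17):
    """Return Yule's K, Honore's R, Brunet's W, Sichel's S and Simpson's D.

    `tokens` is a list of word strings. The formulas are the usual
    textbook forms (an assumption, not taken from the article's code).
    """
    n = len(tokens)                      # N: total number of words
    if n < 2:
        raise ValueError("need at least two tokens")
    freqs = Counter(tokens)              # word -> number of occurrences
    v = len(freqs)                       # V: number of distinct words
    spectrum = Counter(freqs.values())   # i -> V_i (words used exactly i times)
    v1 = spectrum.get(1, 0)              # words used once
    v2 = spectrum.get(2, 0)              # words used twice

    k = 1e4 * (sum(i * i * vi for i, vi in spectrum.items()) - n) / (n * n)
    # R is undefined when every word occurs exactly once (V_1 == V).
    r = 100 * math.log(n) / (1 - v1 / v) if v1 < v else math.inf
    w = n ** (v ** -alpha)
    s = v2 / v
    d = sum(vi * i * (i - 1) for i, vi in spectrum.items()) / (n * (n - 1))
    return {"K": k, "R": r, "W": w, "S": s, "D": d}


if __name__ == "__main__":
    sample = "a a a b b c".split()       # toy input, not real data
    for name, value in richness(sample).items():
        print(name, round(value, 4))
```

For real Arabic text, tokenization and orthographic normalization would need more care than `str.split`, and those choices can change the resulting values.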
Since the chain of narrators and the documentation of the Munajat Khams 'Ashar are not fully recorded in the available sources, its attribution to Imam Sajjad (PBUH) needs to be established; this research examines it using stylometric techniques.

3. Methodology
In the present article, the correctness of attributing the Munajat Khams 'Ashar to Imam Sajjad (PBUH) is examined by sampling the prayers of Al-Sahifa al-Sajjadiyya and comparing their vocabulary richness with that of the Munajat Khams 'Ashar. Since the output of the theories is claimed not to depend on the length of the text, two statistical populations are selected: the first consists of prayers from which 80 words were selected, and the second consists of prayers with differing numbers of words. Therefore, in addition to comparing the vocabulary richness of the samples, the dependence of the equations on the length of the text is also examined. From the Munajat Khams 'Ashar, the first, fifth, tenth and fifteenth prayers were chosen as samples.
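The two sampling schemes described above can be sketched as follows. This is only an illustration under stated assumptions: the text does not say which 80 words were taken from each prayer, so the sketch takes the first 80; only Yule's K is computed, in its usual form; and the helper names `fixed_length_sample` and `length_profile` are hypothetical.

```python
from collections import Counter


def yule_k(tokens):
    """Yule's K in its usual form: 10^4 * (sum_i i^2 * V_i - N) / N^2."""
    n = len(tokens)
    spectrum = Counter(Counter(tokens).values())   # i -> V_i
    return 1e4 * (sum(i * i * vi for i, vi in spectrum.items()) - n) / (n * n)


def fixed_length_sample(tokens, size=80):
    """Return the first `size` tokens, or None if the text is too short.

    Taking the *first* tokens is an assumption made for this sketch.
    """
    return tokens[:size] if len(tokens) >= size else None


def length_profile(tokens, step=80):
    """Compute K on growing prefixes to see how the value moves with length."""
    return [(end, yule_k(tokens[:end]))
            for end in range(step, len(tokens) + 1, step)]


if __name__ == "__main__":
    toy = ("a b c d " * 40).split()                # 160 placeholder tokens
    print(len(fixed_length_sample(toy)))
    for end, k in length_profile(toy):
        print(end, round(k, 2))
```

If a measure were truly independent of length, the values in such a profile would stay roughly flat as the prefix grows, which is the property the second statistical population is meant to probe.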

4. Results
The results show that: 
1. The accuracy of the calculations is very high, so the output of the theories is reliable.
2. The output of the theories did not depend on the length of the text and did not increase in proportion to the number of words.
3. There is little difference between the vocabulary richness of the Munajat Khams 'Ashar and that of the prayers of Al-Sahifa al-Sajjadiyya in either statistical population; therefore, from the perspective of stylometric techniques, the attribution of the Munajat Khams 'Ashar to Imam Sajjad (PBUH) is confirmed.
 


 
