Discuss the affordances of the internet medium that allows for “fake news” to alter audience perception of the truth and their information-seeking behaviors?
ScienceDirect
Available online at www.sciencedirect.com
Procedia Computer Science 141 (2018) 215–222
1877-0509 © 2018 The Authors. Published by Elsevier Ltd. This is an open access article under the CC BY-NC-ND license (https://creativecommons.org/licenses/by-nc-nd/4.0/) Selection and peer-review under responsibility of the scientific committee of EUSPN 2018. 10.1016/j.procs.2018.10.171
The 9th International Conference on Emerging Ubiquitous Systems and Pervasive Networks (EUSPN 2018)
Detecting Fake News in Social Media Networks
Monther Aldwairi, Ali Alwahedi
College of Technological Innovation, Zayed University, Abu Dhabi 144534, UAE
Abstract
Fake news and hoaxes have existed since before the advent of the Internet. The widely accepted definition of Internet fake news is: “fictitious articles deliberately fabricated to deceive readers”. Social media and news outlets publish fake news to increase readership or as part of psychological warfare. In general, the goal is to profit through clickbaits. Clickbaits lure users and entice curiosity with flashy headlines or designs so that they click links, which increases advertisement revenue. This exposition analyzes the prevalence of fake news in light of the advances in communication made possible by the emergence of social networking sites. The purpose of the work is to come up with a solution that users can employ to detect and filter out sites containing false and misleading information. We use simple and carefully selected features of the title and post to accurately identify fake posts. The experimental results show 99.4% accuracy using a logistic classifier.
© 2018 The Authors. Published by Elsevier Ltd. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/3.0/).
Keywords: Fake news; clickbaits; social media; classification
1. INTRODUCTION
Fake news is not a novel concept. It existed even before the emergence of the Internet, as publishers used false and misleading information to further their interests. Following the advent of the web, more and more consumers began forsaking the traditional media channels used to disseminate information for online platforms [11]. Not only does the latter alternative allow users to access a variety of publications in one sitting, but it is also more convenient and faster. The development, however, came with a redefined concept of fake news as content publishers began using what has come to be commonly referred to as clickbait. Clickbaits are phrases designed to attract the attention of a user who, upon clicking on the link, is directed to a web page whose content falls considerably below their expectations [24]. Many users find clickbaits irritating, and as a result most of them spend only a very short time on such sites.
∗ Corresponding author. Tel.: +971-2-599-3238 ; fax: +971-2-599-3685. E-mail address: [email protected]
For content publishers, however, more clicks translate into more revenue, as the commercial aspect of using online advertisements is highly contingent on web traffic [12]. As such, despite the concerns that have been raised by readers about the use of clickbaits and the whole idea of publishing misleading information, there has been little effort on the part of content publishers to refrain from doing so. At best, tech companies such as Google, Facebook, and Twitter have attempted to address this particular concern. However, these efforts have hardly contributed towards solving the problem, as the organizations have merely resorted to denying the individuals associated with such sites the revenue that they would have realized from the increased traffic. Users, on the other hand, continue to deal with sites containing false information, whose involvement tends to affect the reader’s ability to engage with actual news [4]. Firms such as Facebook are involved in the issue of fake news because the emergence and subsequent development of social media platforms have served to exacerbate the problem [27]. In particular, most of the sites that contain such information also include a sharing option that implores users to disseminate the contents of the web page further. Social networking sites allow for efficient and fast sharing of material; thus, users can share the misleading information within a short time. In the wake of the data breach of millions of accounts by Cambridge Analytica, Facebook and other giants vowed to do more to stop the spread of fake news [23].
1.1. Research Problem
The project is concerned with identifying a solution that could be used to detect and filter out sites containing fake news for purposes of helping users to avoid being lured by clickbaits. It is imperative that such solutions are identified as they will prove to be useful to both readers and tech companies involved in the issue.
1.2. Proposed Solution
The proposed solution to the issue concerned with fake news includes the use of a tool that can identify and remove fake sites from the results provided to a user by a search engine or a social media news feed. The tool can be downloaded by the user and, subsequently, be appended to the browser or application used to receive news feeds. Once operational, the tool will use various techniques including those related to the syntactic features of a link to determine whether the same should be included as part of the search results.
2. LITERATURE REVIEW
A look at contemporary scholarly work shows that the issue of fake news has been a major concern amongst scholars from various backgrounds. For instance, some authors have observed that fake news is no longer a preserve of the marketing and public relations departments [21]. Instead, the problem is increasingly being regarded as part of the responsibilities of the information technology (IT) department. Traditionally, it was believed that the two departments mentioned above were the ones to deal with any implications arising from the dissemination of misleading news related to an organization. However, current research indicates that fake news is considered to be a threat to information security. The involvement of the IT department, therefore, is premised on the idea that it would help avert the various risks associated with the problem. Similarly, other authors have noted that the participation of IT professionals in resolving matters concerning fake news is paramount considering the demands of the contemporary corporate environment [7]. Whereas a few years ago perpetrators of such gimmicks were motivated merely by attracting web traffic, the practice has evolved into a matter that involves hackers. Specifically, some content publishers have resorted to including material that contains malicious code as part of the content provided on their web pages, leading those who visit such sites to click the links and download the malware without their knowledge. Such developments, according to the scholars, have exposed modern companies to further risk of cyber intrusion, as the perpetrators of the fake news tend to target employees of certain organizations with the aim of exploiting the latter’s curiosity [2].
It is also apparent that aside from the risk of having malware introduced into their information management systems, modern firms also have to deal with the challenge of having their employees manipulated into giving out their credentials. Some scholars have posited that there is a group of content publishers that is increasingly using clickbaits as a technique to facilitate their phishing objectives [17]. Once an individual, who also happens to be an employee of the target firm, clicks on the link and accesses the web page’s contents, he or she is led into providing sensitive information, albeit in an indirect manner. The user may, for instance, be tricked into believing that they are helping to disseminate the news further when, in fact, they are providing the perpetrators with access to their emails [19]. Data integrity has also been singled out as one of the information security implications associated with fake news [18]. In the current business world, data is increasingly considered a valuable asset and, as such, it is imperative that companies put in place all the necessary measures to secure sensitive information from being accessed by unauthorized persons. However, the prevalence of content publishers keen on using fake news serves to negate such efforts. It is against this background that organizations are investing more resources in the invention and formulation of more effective solutions for countering the ramifications that arise from using clickbaits to lure users into providing their information. Nonetheless, employees continue to visit such sites even after being discouraged from doing so, thereby placing their firms at risk of cyber-attacks [6]. On the other hand, some scholars have argued that fake news can sometimes result in positive implications.
For instance, there have been cases whereby companies listed in the stock market have experienced an increase in the price of their shares as a result of fake news [13]. As more and more users share the link to the site containing information that is seemingly related to an organization, prospective investors gain interest in the firm’s operations and, consequently, its share price increases considerably. Such changes, however, are bound to result in worse consequences, as a majority of the individuals who buy the shares based on the misinformation end up being disappointed. In the same vein, other authors have noted that fake news can help further the marketing objectives of an enterprise. For example, when the information provided in the web pages associated with such news favors the products furnished by a company, more consumers develop an interest in them despite the fact that the contents of the web page are far from the truth [15]. Regardless, such an organization ends up reaching out to a wider pool of prospective clients in spite of the fact that the fake news was not part of its marketing campaigns. The scholars posit that the concept of fake news is not bad in its entirety, as it can contribute positively toward the growth of an enterprise. However, this tendency has its limits and cannot be relied upon by businesses, as its opposite would have extensive and adverse ramifications [8]. When the contents of the web page contain misleading information that portrays a company in a negative light, such a firm is bound to experience a drop in its performance irrespective of the fact that the news disseminated to its prospective customers was false. It is also apparent that the idea of using clickbaits to lure unsuspecting users to visit web pages has played a significant role in shaping opinions within contexts other than the business environment.
For instance, the events leading to the 2016 presidential elections of the United States were characterized by the widespread dissemination of fake news through social media platforms [9]. Claims of celebrated personalities endorsing certain candidates were, for example, part of the information that was being shared by the users after visiting sites that informed them of the same. Later on, the users would realize that the assertions had been false. By then, the intended impact would have already occurred, and it is argued that such occurrences might have played a contributive role in determining the course of the elections [1]. Finally, the contemporary literature indicates that there have been ethical concerns about the whole concept of fake news especially regarding the involvement of individuals who have a background in journalism. For instance, some scholars have argued that using clickbaits is a demonstration of a disregard for the ethics associated with the media profession [16]. Journalists are expected to furnish readers with information whose veracity and accuracy have been determined to the last detail. However, the idea of fake news is completely at variance with these requirements. When professionals engage in activities that are intended to misguide their readers for the sake of increasing web traffic and online ad revenues, it raises a concern as to whether such people are keen on complying with the code of conduct associated with their career.
Although fake news detection in social media has gained attention only fairly recently, there has been an influx of research and publications on the issue. Before talking about machine learning for fake news detection, we must address the dataset issue. William Yang Wang [26], in his paper “Liar, Liar Pants on Fire”, provided a publicly available dataset, and so did many of the previous researchers. Additionally, the first Fake News Challenge Stage-1 (FNC-1) was held in June of 2017 and featured many novel solutions using various artificial intelligence technologies [? ]. Natural Language Processing (NLP) techniques have been used for news outlet stance detection to facilitate fake news detection on certain issues [20]. Riedel et al. and other FNC-1 winning teams achieved close to 82% accuracy in the stance detection stage. Once this competition and all stages of fake news detection are concluded, we believe strong commercial solutions will emerge. FNC-1 has made its datasets publicly available, and we are getting closer to having standard benchmarks for comparing all the newly proposed techniques. For a more comprehensive survey of work on fake news detection, the reader is referred to Kai Shu et al. [22]. In this effort we focus on a lightweight detection system for clickbaits based on high-level title features.
3. PROPOSED SOLUTION
The proposed solution involves the use of a tool that is designed with the specific aim of detecting and eliminating web pages that contain misinformation intended to mislead readers. To attain this goal, the approach will use several factors as a guide in deciding whether to categorize a web page as fake news. The user will, however, need to have the tool downloaded and installed on a personal computer before making use of its services. It is expected that the proposed method will be compatible with the browsers that are commonly used all over the world. The syntactical structure of the links used to lead users to such sites will be considered a starting point. For instance, when a user keys in a group of search terms with the aim of finding web pages that contain information related to those terms, the tool will come into operation and run through the sites that have been retrieved by the search engine before they are delivered to the user. In doing so, the extension will identify sites whose links contain words that may have a misleading effect on the reader, including those that are characterized by a lot of hyperbole and slang phrases. Such web pages will be flagged as potential sources of fake news, and the user will be notified before electing to click on any of them. A visualization of the links and their syntactical structure will help the user understand the decision [5]. Additionally, the tool will use the number of words in the titles of the sites to determine which of them contain false information. A threshold of, say, eight words will be used as a baseline for categorizing a web page as having correct information, with those whose links contain more than the threshold number of words being classified as potential sources of fake news.
The rationale behind this approach is premised on the idea that, from a general perspective, clickbaits tend to contain considerably more words than non-clickbaits [14]. It is, therefore, expected that the tool would use the wording as a metric to decide whether a headline can be considered a potential clickbait. Aside from the syntactic characteristics of the headlines associated with apparent clickbaits, the tool will also monitor how punctuation marks have been used in web pages. In particular, the model will flag sites whose headlines contain extensive usage of exclamation marks and question marks. The links to such web pages will be categorized as potential clickbaits. For instance, a credible site would have a title such as “Donald Trump Wins the US Presidential Race!” On the other hand, a clickbait would be structured in a manner such as “Guess what???? Donald Trump is the Next US President!!!!!!!!!”. In such a case, the tool would categorize the former as a non-clickbait and the latter as a potential lead to misleading information. In addition, the proposed approach will examine factors associated with individual sites, including bounce rates, as a way of determining the veracity (or lack thereof) of the information provided therein. One key characteristic of clickbaits is that they tend to lead readers to web pages containing information that is very different from, or hardly related to, the information highlighted by the link. The result is that a majority of the users end up disappointed, leaving the sites as soon as they have visited them, which results in high bounce rates for such web pages [10]. The proposed tool will assess whether a site has a high bounce rate and, if so, designate it as a potential source of fake news. Once the algorithm executes, the search engine will release the entire list of results to the user. However, those links whose sites have been noted as being potential sources of misleading information will be highlighted in a manner that
allows the reader to take notice. Thereupon, the user will be provided with an option of blocking such web pages and having them excluded from the search results in future [3]. It is expected that after using the proposed method for a while, the user will have eliminated a considerable number of clickbaits from the search results retrieved by his or her preferred search engine.
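The title-length and punctuation heuristics described in this section could be sketched as follows. This is only an illustration under stated assumptions: the function name `flag_headline` and the punctuation cutoff `PUNCT_THRESHOLD` are hypothetical and do not come from the paper, while the eight-word baseline is the example value given in the text.

```python
import re

WORD_THRESHOLD = 8   # baseline from the text ("say eight words")
PUNCT_THRESHOLD = 2  # assumed cutoff for "extensive" use of ? and !


def flag_headline(title: str) -> list:
    """Return the heuristic reasons a headline looks like a potential clickbait."""
    reasons = []
    # Titles with more words than the threshold are treated as suspicious.
    if len(title.split()) > WORD_THRESHOLD:
        reasons.append("long_title")
    # Count every question mark and exclamation mark in the title.
    if len(re.findall(r"[?!]", title)) >= PUNCT_THRESHOLD:
        reasons.append("heavy_punctuation")
    return reasons


credible = "Donald Trump Wins the US Presidential Race!"
clickbait = "Guess what???? Donald Trump is the Next US President!!!!!!!!!"
print(flag_headline(credible))
print(flag_headline(clickbait))
```

With these assumed thresholds, the credible example from the text raises no flags, whereas the clickbait example trips both checks. A real tool would tune both cutoffs on labelled data rather than fix them by hand.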
4. METHODOLOGY
The first step was to locate a credible clickbaits database, then compute the attributes and produce the data files for WEKA. Because such a database was not easy to find, we crawled the web to collect URLs for the clickbaits. We focused on social media web sites that are likely to have more fake news or clickbait ads or articles, such as Facebook, Forex and Reddit. In the second step, after gathering the URLs in a file, a Python script computed the attributes from the title and the content of the web pages. Finally, we extracted the features from the web pages. The features are: keywords in Arabic and English, titles that start with numbers, titles in which all words are capitalized, titles that contain question and exclamation marks, whether the user left the page immediately, and whether the content is related to the title.
4.1. SCRIPT (PSEUDO CODE)
We used the WEKA machine learning toolkit to validate the solution [25]. As WEKA requires specially formatted input, we used the script below to extract the parameters needed as input to WEKA. Ten-fold cross-validation was used in all experiments.
Algorithm 1 Compute fake news websites attributes
Open URL file
for each title
    title starts with number? 1 → output file
    title contains ? and/or ! marks? 1 → output file
    all words are capital in title? 1 → output file
    users left the website after visiting? 1 → output file
    contents have no words from title? 1 → output file
    title contains keywords? NoKeywords → output file
end for
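Algorithm 1 could be sketched in Python roughly as below. This is a minimal sketch and not the authors' script: the function names, the attribute names, the `fake`/`real` class labels and the reading of `NoKeywords` as a keyword count are all assumptions made for illustration. The bounce signal is passed in as a flag because the text does not say how it was measured. The output uses WEKA's ARFF text layout (`@relation`, `@attribute`, `@data`).

```python
import re

LETTERS = r"[^\W\d_]+"  # runs of Unicode letters, so Arabic titles tokenize too


def title_features(title, content, left_immediately, keywords):
    """Compute the six attributes of Algorithm 1 for one page (names are hypothetical)."""
    words = re.findall(LETTERS, title)
    title_terms = {w.lower() for w in words}
    content_terms = {w.lower() for w in re.findall(LETTERS, content)}
    return {
        "starts_with_number": int(bool(re.match(r"\s*\d", title))),
        "has_question_or_exclamation": int("?" in title or "!" in title),
        # Scripts without letter case (e.g. Arabic) never count as all caps.
        "all_caps": int(bool(words) and all(w.isupper() for w in words)),
        "user_left_immediately": int(left_immediately),
        "content_has_no_title_words": int(title_terms.isdisjoint(content_terms)),
        # Assumption: NoKeywords means the number of listed keywords in the title.
        "keyword_count": sum(1 for w in title_terms if w in keywords),
    }


def to_arff(rows, labels, relation="clickbait"):
    """Serialize feature dicts and class labels as ARFF text for WEKA."""
    names = list(rows[0])
    lines = ["@relation " + relation, ""]
    lines += ["@attribute " + n + " numeric" for n in names]
    lines += ["@attribute class {fake,real}", "", "@data"]
    for row, label in zip(rows, labels):
        lines.append(",".join(str(row[n]) for n in names) + "," + label)
    return "\n".join(lines)


bait = title_features("10 THINGS YOU WON'T BELIEVE!!!",
                      "Some unrelated text about cooking.", True, {"believe"})
news = title_features("Senate passes budget bill",
                      "The Senate passed the budget bill on Tuesday.", False, {"believe"})
print(to_arff([bait, news], ["fake", "real"]))
```

The keyword set would have to be supplied by the analyst; the paper mentions Arabic and English keywords but does not list them.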
4.2. ATTRIBUTES SELECTION
After reading the websites attributes file into WEKA, we rank the attributes based on several algorithms, to choose the most relevant to increase the accuracy and decrease the training time.
• InfoGainAttributeEval evaluates the worth of an attribute by measuring the information gain with respect to the class: $\mathrm{InfoGain}(\mathrm{Class}, \mathrm{Attribute}) = H(\mathrm{Class}) - H(\mathrm{Class} \mid \mathrm{Attribute})$. In essence, it measures how much each feature contributes to decreasing the overall entropy. The entropy $H(X)$ is defined as $H(X) = -\sum_i P_i \log_2 P_i$, with $P_i$ being the probability of class $i$ in the dataset and $\log_2$ the base-2 logarithm (in WEKA natural logarithm of base e is used, but we take log2). Entropy measures the degree of “impurity”: the closer it is to 0, the less impurity there is in the dataset. Hence, a good attribute is one that contains the most information, i.e., reduces the entropy the most.
• CorrelationAttributeEval evaluates the worth of an attribute by measuring the (Pearson’s) correlation between it and the class. Nominal attributes are considered on a value-by-value basis by treating each value as an indicator. An overall correlation for a nominal attribute is arrived at via a weighted average. An indicator for the value of a nominal attribute is a numeric binary attribute that takes on the value of 1 when the value occurs in an instance and 0 otherwise.
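The two ranking criteria above can be illustrated with a small, self-contained sketch. It follows the formulas as stated in the text (entropy with a base-2 logarithm, Pearson correlation on 0/1 indicators) and is not WEKA's implementation. The function names and the toy data are assumptions for illustration only.

```python
import math
from collections import Counter


def entropy(labels):
    """H(X) = -sum_i P_i * log2(P_i) over the class frequencies."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def info_gain(attribute, labels):
    """InfoGain(Class, Attribute) = H(Class) - H(Class | Attribute)."""
    n = len(labels)
    conditional = 0.0
    for value in set(attribute):
        subset = [y for x, y in zip(attribute, labels) if x == value]
        conditional += len(subset) / n * entropy(subset)
    return entropy(labels) - conditional


def pearson(xs, ys):
    """Pearson correlation between two numeric sequences (0.0 if either is constant)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy) if sx and sy else 0.0


labels = [1, 1, 0, 0]       # toy class column: 1 = fake, 0 = real
informative = [1, 1, 0, 0]  # attribute that mirrors the class
useless = [1, 0, 1, 0]      # attribute independent of the class
print(info_gain(informative, labels), info_gain(useless, labels))
print(pearson(informative, labels), pearson(useless, labels))
```

In this toy data the attribute that mirrors the class removes all of the one bit of class entropy, while the independent attribute removes none, which is the ordering a ranker would use to keep the first and drop the second.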
Table 1 reports the attribute selection results, based on InfoGainAttributeEval and CorrelationAttributeEval, for the top attributes we use in our tests.
Table 1: Attributes Selection
Attribute                                   Correlation Attribute Eval   Info Gain Attribute Eval
Start with number                           0.0768                       0.00433
Content have title words                    0.775                        0.00434
Contain question and exclamation mark       0.0862                       0.00545
All words capital                           0.1195                       0.104
User left the webpage immediately           0.3672                       0.12883
Keywords                                    0.4455                       0.27042
4.3. WEKA CLASSIFIERS
The classifier can be described as the algorithm that evaluates the given data and provides the end result. WEKA ships with numerous classifiers; we experimented and chose the best-performing ones for our dataset.
• BayesNet: Bayes network learning using various search algorithms and quality measures. The Bayes network classifier provides data structures (network structure, conditional probability distributions, etc.) and facilities common to Bayes network learning algorithms such as K2 and B.
• Logistic: Class for building and using a multinomial logistic regression model with a ridge estimator.
• RandomTree: Class for constructing a tree that considers K randomly chosen attributes at each node. It performs no pruning and has an option to allow estimation of class probabilities (or target mean in the regression case) based on a hold-out set (backfitting).
• NaiveBayes: Class for a Naive Bayes classifier using estimator classes. Numeric estimator precision values are chosen based on analysis of the training data. For this reason, the classifier is not an UpdateableClassifier (which in typical usage is initialized with zero training instances).
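To make the evaluation protocol concrete, the sketch below runs ten-fold cross-validation over a deliberately simplified Bernoulli Naive Bayes classifier on 0/1 attributes. It is a stand-in written for illustration, not WEKA's NaiveBayes class. The function names, the Laplace smoothing, the interleaved fold assignment (WEKA shuffles and stratifies) and the synthetic data are all assumptions.

```python
import math


def train_nb(X, y):
    """Fit a Bernoulli Naive Bayes model with Laplace smoothing on 0/1 features."""
    model = {}
    for c in set(y):
        rows = [x for x, label in zip(X, y) if label == c]
        prior = len(rows) / len(y)
        # P(feature_j = 1 | class c), smoothed so it is never exactly 0 or 1.
        probs = [(sum(r[j] for r in rows) + 1) / (len(rows) + 2)
                 for j in range(len(X[0]))]
        model[c] = (prior, probs)
    return model


def predict_nb(model, x):
    """Return the class with the highest log posterior for feature vector x."""
    def log_score(prior, probs):
        return math.log(prior) + sum(
            math.log(p if v else 1 - p) for v, p in zip(x, probs))
    return max(model, key=lambda c: log_score(*model[c]))


def cross_validate(X, y, k=10):
    """Mean accuracy over k folds; fold i holds out rows i, i+k, i+2k, ..."""
    accuracies = []
    for i in range(k):
        test_idx = set(range(i, len(X), k))
        train_X = [x for j, x in enumerate(X) if j not in test_idx]
        train_y = [t for j, t in enumerate(y) if j not in test_idx]
        model = train_nb(train_X, train_y)
        hits = sum(predict_nb(model, X[j]) == y[j] for j in test_idx)
        accuracies.append(hits / len(test_idx))
    return sum(accuracies) / k


# Synthetic data: the first attribute decides the class, the second is noise.
X = [[j % 2, (j // 2) % 2] for j in range(40)]
y = ["fake" if row[0] else "real" for row in X]
print(cross_validate(X, y))
```

The sketch assumes at least `k` rows so that no fold is empty. Each row is used for testing exactly once, and the reported figure is the average over the ten held-out folds.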
5. RESULTS
This section presents the performance metrics and discusses the classification results.
5.1. METRICS
Precision is the number of true positives divided by the number of predicted positives (the true positives plus the false positives). Recall, also called sensitivity or the true positive rate, is the number of true positives divided by the true positives plus the false negatives. The F-measure combines precision and recall: we multiply precision by recall, divide the product by the sum of precision and recall, and then multiply by two.
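The three definitions above translate directly into code. The following minimal sketch uses a hypothetical helper name and made-up counts purely to show the arithmetic; the counts are not results from the paper.

```python
def precision_recall_f1(tp, fp, fn):
    """Compute precision, recall and F-measure from confusion-matrix counts."""
    precision = tp / (tp + fp)  # true positives over predicted positives
    recall = tp / (tp + fn)     # true positives over actual positives
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


# Illustrative counts only: 90 true positives, 10 false positives, 30 false negatives.
p, r, f = precision_recall_f1(tp=90, fp=10, fn=30)
print(round(p, 3), round(r, 3), round(f, 3))
```

The sketch assumes at least one true positive, otherwise the denominators can be zero.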
5.2. CLASSIFIERS RESULT
The classifiers are compared based on precision, recall, F-measure and ROC. The Logistic classifier has the highest precision, 99.4%, and therefore the best classification quality, as shown in Table 2. The Logistic and RandomTree classifiers had the best recall (that is, the best sensitivity) at 99.3%. On the F-measure, which combines precision and recall, the Logistic and RandomTree classifiers outperformed the others at 99.3%. Finally, BayesNet and NaiveBayes had the best area under the ROC curve.
Table 2: Classification Results
Classifier     Precision   Recall   F-Measure   ROC
Bayes Net      94.4%       97.3%    97.2%       100%
Logistic       99.4%       99.3%    99.3%       99.5%
RandomTree     99.3%       99.3%    99.3%       97.3%
Naive Bayes    98.7%       98.7%    98.6%       100%
6. CONCLUSIONS
Fake news and clickbaits interfere with the ability of a user to discern useful information from Internet services, especially when news becomes critical for decision making. Considering the changing landscape of the modern business world, the issue of fake news has become more than just a marketing problem, as it warrants serious efforts from security researchers. It is imperative that any attempts to manipulate or troll the Internet through fake news or clickbaits are countered with absolute effectiveness. We proposed a simple but effective approach that allows users to install a simple tool into their personal browser and use it to detect and filter out potential clickbaits. The preliminary experimental results, obtained to assess the method’s ability to attain its intended objective, showed outstanding performance in identifying possible sources of fake news. Since we started this work, few fake news databases have been made available and we