This pub is part of the summer research work of Emmanuel F. Oquendo as part of the 2016 MIT Summer Research Program (MSRP). It describes the first version of a novel ranking algorithm that will be implemented on PubPub Discussions. The PubPub team decided to publish this work in order to receive feedback from the PubPub community. Please leave your ideas, questions, and views in the discussions section.
PubPub and Meaningful Scientific Discussion
PubPub is an open platform for collaborative editing, instant publishing, grassroots journals, and continuous review. It was founded in 2015 with the mission of providing a novel way of publishing; one that adapts better to how research actually works: through iteration and collaboration. For this reason, it is optimized for collaborative work and open discussion.
The Dual Role of Discussions
Discussions have different roles for authors and readers.
For authors, discussions are a way to improve their work through feedback and suggestions. Hence, having the newest and most active discussions on the top should improve their experience and usage of the discussions section.
For readers, discussions extend the pub by introducing new ideas, questions, and views. With them, reading science is no longer a one-dimensional activity between authors and readers. Now, readers can also learn from and engage in meaningful conversations about scientific work, increasing their engagement with science by making it easier to digest and providing exposure to other field experts. For this reason, readers are often more engaged (take more actions) with higher-quality discussions.
The Importance of an Effective Ranking Algorithm
Almost every website with comments offers different sorting options and includes a default ranking algorithm with some intelligent way of deciding which comments the user should see first. For example, Quora’s default sorting algorithm tries to measure “how useful is this answer?” and Facebook’s, “what are the most active and engaging conversations?”
In the case of PubPub, our algorithm should answer: “what are the most valuable discussions?”. This is a responsibility that PubPub does not take lightly. We believe that every new feature that gets implemented on the platform should be aligned with the core values of openness and transparency of PubPub.
Hence, in this paper, we describe a novel ranking algorithm that introduces a fairer process for discussions to rise to the top of the list and a more robust analysis of quality.
General Observations on Discussions
Three types of discussions have been identified on the platform:
Not focused on topic: suggestions on grammar, syntax, and essay improvements [has a call for action for the author]
Feedback on factual errors [may include a call for action for the author]
Discussions that elicit further conversation by asking questions, telling ideas, and explaining views on pub topic
The discussions that elicit the most meaningful conversations usually fall within the third category. Furthermore, some general observations on PubPub discussions are:
The majority of discussions (outside the Journal of Design and Science) are direct suggestions for the author.
Discussions equivalent to “Likes” or “Reactions”, without particular insights, might get ranked higher than higher-quality discussions
However, journal communities are usually very effective curators of high-quality discussions
Framework for the Novel Ranking Algorithm
Ranking Discussions is our way to reward discussions with more visibility based on certain factors that indicate quality.
Quality is often subjective; however, on PubPub, we define quality as the extent to which a discussion adds value to the topics introduced by the pub.
The most valuable discussions are:
eliciting of objective discussion on the topic (by introducing ideas, questions, and views)
focused on the topic (and not suggestions focused on the essay)
based on facts (well-referenced)
relevant to the current state of the pub (particularly important on PubPub due to the evolving nature of pubs)
On the other hand, factors are quality indicators that we can measure on the platform and that are correlated to higher-quality discussions. Some of the factors used in other websites, such as Quora, are:
Commenter’s “Karma” points (quality of previous comments)
Actions of Page Administrator
Amount of Replies
Type of content (images, links, videos, etc)
Finally, visibility is affected by the ranking of discussions and is proportional to the number of actions (replies and upvotes) that a discussion receives.
The main goals for the new ranking algorithm are:
Display the best, most engaging, and valuable discussions on the top of the list
Implement a “fair” process for discussions to rise to the top of the list
Elicit (or maintain) high-quality discussions
Increase the engagement of readers
Provide exposure to field experts and top commenters
The latter will play an important role in the pub discovery process by driving traffic toward the profiles (and work) of top commenters, who are often field experts.
Reddit Ranking Algorithms
Reddit uses two main intelligent ranking algorithms on their website: RedditHot for their stories feed and RedditBest for their comments section.
RedditHot is highly biased toward new stories, since freshness is a very important component of the stories feed. The following equation is used to calculate the “hot score”:

score = log10(z) + (y · t) / 45000

where t is the time the story was posted relative to the epoch time, x is the difference between upvotes and downvotes, y is the sign of x (+1, −1, or 0), and z is the maximum of the absolute value of x and 1.
One can interpret this equation as assigning a lifetime to the stories. A story with no upvotes will “live” for 45,000 seconds or 12.5 hours on the feed and each upvote will add more lifetime. After 12.5 hours, a newer story will be ranked higher than the older story with no upvotes.
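The hot score described above can be sketched in Python. The function name and the value of the Reddit epoch (December 8, 2005) are assumptions for illustration, not part of this pub:

```python
import math

# Reddit's epoch (2005-12-08 07:46:43 UTC), in Unix seconds -- an assumed constant.
REDDIT_EPOCH = 1134028003

def reddit_hot(upvotes, downvotes, posted_unix_seconds):
    """Sketch of the RedditHot score: log10(z) + (y * t) / 45000."""
    x = upvotes - downvotes                 # net vote difference
    y = 1 if x > 0 else (-1 if x < 0 else 0)  # sign of the difference
    z = max(abs(x), 1)                      # magnitude, floored at 1
    t = posted_unix_seconds - REDDIT_EPOCH  # seconds since Reddit's epoch
    return math.log10(z) + (y * t) / 45000
```

Note how each factor of ten in net upvotes buys the same score increase as 45,000 seconds of freshness, which is what gives newer stories their edge.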
RedditBest is based on the Wilson Score (WS). This interval uses the fraction of upvotes, which is recognized as a better indicator of quality than the difference between upvotes and downvotes. Furthermore, it considers the statistical significance of the number of votes to assign an estimated interval that contains, with a certain probability, the actual proportion of upvotes.
For example, for a discussion with 2 upvotes and no downvotes, the Wilson Score interval is interpreted as:
“There is a 95% probability that the actual fraction of upvotes will be within .342 and 1.”
RedditBest uses the lower bound of the WS, or the worst-case scenario, to assign definitive rankings to the comments. The lower bound of the Wilson Score can be interpreted as:
“There is a 95% probability that the actual fraction of upvotes is at least .342.”
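A minimal Python sketch of the lower bound of the Wilson score interval, using z = 1.96 for a 95% confidence level (the function name is our own):

```python
import math

def wilson_lower_bound(upvotes, total_votes, z=1.96):
    """Lower bound of the Wilson score interval for the fraction of upvotes.

    z = 1.96 corresponds to a 95% confidence level.
    """
    if total_votes == 0:
        return 0.0
    n = total_votes
    p = upvotes / n                         # observed fraction of upvotes
    centre = p + z**2 / (2 * n)
    margin = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2))
    return (centre - margin) / (1 + z**2 / n)
```

For 2 upvotes out of 2 votes this yields roughly .342, matching the interpretation above; as the vote count grows, the bound tightens toward the observed fraction.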
The problem with this implementation is that it does not take into account the high level of uncertainty of the Wilson Score when the sample size is very small. In the following graph, it can be seen that the levels of uncertainty are between 20% and 80% when there are fewer than 15 votes (which is within the range of the most upvoted discussions on PubPub).
Taking into account these error intervals and incorporating them to the ranking algorithm would increase fairness and provide the opportunity for more high-quality discussions to rise to the top of the list.
A Novel Ranking Algorithm for Discussions
We are introducing a novel ranking algorithm that implements a fairer process for discussions to rise to the top of the list and a more robust analysis of quality.
Adding Fairness to Discussions Ranking
In order to increase the fairness of rankings, we are introducing a “fairness” value which gets added to the lower boundary of the WS interval.
The random values will change every time a new user accesses the pub, which will give the opportunity for more discussions to get ranked higher and, therefore, receive additional views and actions.
Ranking variations are proportional to the amount of overlap between the randomness ranges. As overlap increases, the distributions of rankings become more similar. For example, three discussions with only one upvote each have 100% overlap in their randomness ranges; hence, as views increase, each discussion’s share of the top ranking should approach 33.33%.
To add the fairness value to the score, we need to calculate the lower and upper boundary of the WS for the discussion. We then add a random value that can increase the score to up to a third of the error interval.
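The steps above can be sketched as follows. The function names are hypothetical; the uniform draw over a third of the error interval follows the description in the text:

```python
import math
import random

def wilson_interval(upvotes, total_votes, z=1.96):
    """(lower, upper) bounds of the Wilson score interval at 95% confidence."""
    if total_votes == 0:
        return 0.0, 1.0
    n = total_votes
    p = upvotes / n
    centre = p + z**2 / (2 * n)
    margin = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2))
    denom = 1 + z**2 / n
    return (centre - margin) / denom, (centre + margin) / denom

def fair_score(upvotes, total_votes):
    """Lower Wilson bound plus a random 'fairness' value.

    The fairness value is drawn uniformly from [0, (upper - lower) / 3],
    so a score can rise by up to a third of its error interval. It is
    recomputed every time a new user accesses the pub, reshuffling
    closely matched discussions.
    """
    lower, upper = wilson_interval(upvotes, total_votes)
    return lower + random.uniform(0, (upper - lower) / 3)
```

Because discussions with identical vote counts share the same randomness range, repeated draws give each of them an equal chance at the top slot, which is the fairness property described above.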
Adding Quality Factors to Discussions Ranking
In addition to accounting for the error intervals, we are introducing new quality factors that serve as indicators of high-quality discussions. In the first version of the algorithm, we are including:
Amount of Replies
The rationale we used for the weightings is:
How highly upvoted can a discussion be and still be outranked by a discussion with only 1 upvote that possesses the three quality indicators?
What is the relative weight of these factors?
We decided to assign more weight to the author replies since they often serve not only as indicators but also aggregators of quality on discussions.
In addition, the replies factor is passed through a logarithm to make sure that its value does not grow out of proportion as the number of replies increases.
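The exact weighting formula is not reproduced here, so the following is only an illustrative sketch under assumed weights: the replies factor is log-dampened as described, and author replies receive a larger (hypothetical) weight:

```python
import math

# Hypothetical weights -- the real values would be tuned on the platform.
REPLY_WEIGHT = 0.05
AUTHOR_REPLY_WEIGHT = 0.10  # author replies carry more weight

def quality_bonus(num_replies, num_author_replies):
    """Illustrative combination of the quality factors.

    log1p keeps the replies contribution from growing out of
    proportion as the number of replies increases.
    """
    return (REPLY_WEIGHT * math.log1p(num_replies)
            + AUTHOR_REPLY_WEIGHT * num_author_replies)
```

Under this sketch, a single author reply outweighs a single ordinary reply, and going from 10 to 100 replies adds less than the first 10 did.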
A fairer and more robust algorithm was introduced. With this implementation, we increased the alignment of the discussions section with the PubPub core principle that everyone should have an impartial chance to share their knowledge and be recognized for it. In addition, the expected impacts of this algorithm are:
Elicit more engagement of readers (by taking actions on discussions)
Provide more visibility to field experts
Increase (or maintain) the quality of discussions
As next steps, we expect to measure the effectiveness of this algorithm and introduce new quality indicators. Furthermore, we will study how this implementation can be adjusted by journals with different levels of activity and voting on their discussions.