21 August 2012

Fixing Hacker News: A mathematical approach

There is a certain phenomenon that seems to happen in almost every online community of user-generated content. A community is created: the initial users define the values of this new community. After a while the community experiences growth in numbers. As a result of that growth, users who joined before it feel like it's no longer the same community with the same values. The latest widely discussed example seems to be Hacker News.

Paul Graham responds that the reason is mostly a shift in values and increase of anonymity:
> It's a genuine problem and has been growing gradually worse for a while. I think the cause is simply growth. When a good community grows, it becomes worse in two ways: (a) more recent arrivals don't have as much of whatever quality distinguished the original members, and (b) the large size of the group makes people behave worse, because there is more anonymity in a larger group.
>
> I've spent many hours over the past several years trying to understand and mitigate such problems. I've come up with a bunch of tweaks that worked, and I have hopes I'll be able to come up with more.
>
> The idea I'm currently investigating, in case anyone is curious, is that votes rather than comments may be the easiest place to attack this problem. Although snarky comments themselves are the most obvious symptom, I suspect that voting is on average dumber than commenting, because it requires so much less work. So I'm going to try to see if it's possible to identify people who consistently upvote nasty comments and if so count their votes less.

As online communities grow, the values of the group shift. The majority now may or may not hold the same values as the majority before. The question is, how to preserve the old values of the group with minimum side-effects?

As it happens, my master's thesis was an attempt to fix exactly this problem mathematically and implement an improved voting system tailored specifically for communities with user-submitted content. I won't provide a link to the thesis as it's not written in English, but I'll try to summarize the gist of it.

The voting system used in most communities today (democratic voting) is the one most susceptible to value shift when significant growth occurs. It's no surprise: democratic systems are designed to measure what the majority values. When significant growth occurs, the majority changes and therefore what they value also changes.

In contrast, previous moderator/editor based systems offer a strict filter on content based on the more static values of the current set of editors. However, they have the downside of being limited to what the editors are able to review and publish.

I propose a hybrid feedback-loop based system. In this system people have variable voting influence, and editor-like individuals serve as a "reference point": exemplary users with maximum voting influence. The system attempts to find out what they value and recognize it in others.

The system is based on the mathematics described in the beta reputation system, which is a system for measuring trust in online e-commerce communities.

Here is a short description of the system:
  • Voting influence is not the same for all users: it's not 1 (+1 or -1) for everyone but lies in the range 0-1.
  • When a user votes for a content item, they also vote for the creator (or submitter) of the content.
  • The voting influence of a user is calculated using the positive and negative votes that he has received for his submissions.
  • Exemplary users always have a static maximum influence.

Suppose we have a content item \(C\) submitted by the user \(U\). Now a voter \(V\) comes to vote for it and clicks on the +1 button.

The voter has his own submissions for which he has received a total amount of positive vote \(p_V\) and a total amount of negative vote \(n_V\). As a result, his voting influence \(i_V\) is modified: it's not +1 but calculated according to the formula:
$$ i_V = f_W(p_V, n_V) $$
where \(f_W\) is the lower bound of the Wilson score confidence interval. While a simple average such as:
$$ i_V = \frac{p_V}{p_V + n_V} $$
might work when the number of positive and negative votes is large enough, it's not good enough when the number of votes is low. The Wilson score confidence interval gives us a better, flexible balance between desired certainty in the result and the result itself.
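
To make \(f_W\) concrete, here is a minimal sketch of the Wilson score lower bound. The function name `wilson_lower_bound` and the default `z = 1.96` (roughly 95% confidence) are assumptions made for illustration; the post does not say which confidence level was used. Vote totals are treated as real numbers, since votes are weighted by influence.

```python
import math


def wilson_lower_bound(pos: float, neg: float, z: float = 1.96) -> float:
    """Lower bound of the Wilson score interval for the positive fraction.

    pos and neg may be fractional because each vote is weighted by the
    voter's influence. z = 1.96 is an assumed default, not from the post.
    """
    total = pos + neg
    if total <= 0:
        return 0.0  # no votes received yet, so no evidence at all
    phat = pos / total
    z2 = z * z
    centre = phat + z2 / (2 * total)
    margin = z * math.sqrt((phat * (1 - phat) + z2 / (4 * total)) / total)
    # Clamp at 0 to guard against tiny negative values from rounding.
    return max(0.0, (centre - margin) / (1 + z2 / total))


# With few votes the bound stays low even if every vote is positive;
# it approaches the simple average p / (p + n) as votes accumulate.
print(wilson_lower_bound(1, 0))
print(wilson_lower_bound(100, 0))
print(wilson_lower_bound(50, 50))
```

Note that the function returns 0 for a user with no votes, which matches the statement that new members start with no voting influence.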

This vote in turn is received by the content item \(C\). Because it's a positive vote, the amount of positive vote \(p_C\) is changed for this content item
$$ p_C \leftarrow p_C + i_V $$
and as a result, it has a new rating
$$ r_C = f_W(p_C, n_C) $$
but the positive points \(p_U\) of the creator of the content item are also changed:
$$ p_U \leftarrow p_U + i_V $$
and as a result the voting influence \(i_U\) of the submitter is also changed:
$$ i_U = f_W(p_U, n_U) $$
or in other words, he has "earned" a bigger influence in the voting system by submitting a well-rated content item.

This means that new members have no voting influence. As they submit content and receive votes their influence may rise if the existing users with high influence in the system consider their content to be good.

This is where the reference users \(R\) come in. Their influence is fixed to always be 1
$$ i_R = 1 $$
Because of this, influence propagates through the system from them to other users who submit content which is deemed high-quality by the reference users. Those users in turn also change influence by voting for others and so forth.
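
The update rules and the seeding by reference users can be sketched together as follows. This is only an illustration under stated assumptions: the class and method names (`VotingSystem`, `Tally`, `submit`, `vote`) and the user names are hypothetical, `z = 1.96` is an assumed confidence level, and details such as preventing self-votes or duplicate votes are left out.

```python
import math
from dataclasses import dataclass


def wilson_lower_bound(pos, neg, z=1.96):
    """Wilson score lower bound; z = 1.96 is an assumed default."""
    total = pos + neg
    if total <= 0:
        return 0.0
    phat = pos / total
    z2 = z * z
    margin = z * math.sqrt((phat * (1 - phat) + z2 / (4 * total)) / total)
    return max(0.0, (phat + z2 / (2 * total) - margin) / (1 + z2 / total))


@dataclass
class Tally:
    pos: float = 0.0  # accumulated positive vote (p_X)
    neg: float = 0.0  # accumulated negative vote (n_X)

    def score(self):
        return wilson_lower_bound(self.pos, self.neg)


class VotingSystem:
    def __init__(self, reference_users):
        self.reference_users = set(reference_users)
        self.users = {}  # user -> Tally of votes received on submissions
        self.items = {}  # item -> (submitter, Tally)

    def influence(self, user):
        # Reference users always have i_R = 1; everyone else earns influence.
        if user in self.reference_users:
            return 1.0
        return self.users.setdefault(user, Tally()).score()

    def submit(self, item, submitter):
        self.items[item] = (submitter, Tally())
        self.users.setdefault(submitter, Tally())

    def vote(self, voter, item, positive=True):
        submitter, item_tally = self.items[item]
        weight = self.influence(voter)  # i_V at the time of voting
        submitter_tally = self.users.setdefault(submitter, Tally())
        # The same weighted vote goes to the item and to its submitter.
        for tally in (item_tally, submitter_tally):
            if positive:
                tally.pos += weight
            else:
                tally.neg += weight

    def rating(self, item):
        return self.items[item][1].score()


system = VotingSystem(reference_users={"ref"})
system.submit("post-1", "alice")
system.vote("bob", "post-1")    # bob is new: his vote carries weight 0
system.vote("ref", "post-1")    # a reference user votes with weight 1
system.submit("post-2", "carol")
system.vote("alice", "post-2")  # alice passes on part of her earned influence
print(system.influence("alice"), system.influence("carol"))
```

In this toy run, influence flows from the reference user to alice and then, in diminished form, from alice to carol, while bob's vote has no effect until his own submissions are rated.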

It's also possible to scale down votes as they age. The two possible strategies are to scale all \(p_X\) and \(n_X\) values daily, for all content items and all users by multiplying them with a certain aging factor \(k_a\)
$$ p_X \leftarrow k_a p_X $$
$$ n_X \leftarrow k_a n_X $$
or to simply keep all positive and negative votes \(V_p\) and \(V_n\) in the database and recalculate \(p_X\) and \(n_X\) according to the age of the votes \(a_V\), for example:
$$ p_X = \sum_{\forall V_p} { i_V k_a^{a_V} }$$
$$ n_X = \sum_{\forall V_n} { i_V k_a^{a_V} }$$
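
Both aging strategies can be sketched in a few lines. The names `Vote`, `decay_daily` and `aged_totals` are hypothetical, ages are assumed to be measured in days, and each stored vote is assumed to keep the influence its voter had when it was cast (the post does not specify this).

```python
from dataclasses import dataclass


@dataclass
class Vote:
    influence: float  # i_V, assumed stored at the time the vote was cast
    age_days: float   # a_V
    positive: bool


def decay_daily(pos, neg, k_a):
    """Strategy 1: scale the running totals once per day."""
    return pos * k_a, neg * k_a


def aged_totals(votes, k_a):
    """Strategy 2: recompute p_X and n_X from the stored votes."""
    pos = sum(v.influence * k_a ** v.age_days for v in votes if v.positive)
    neg = sum(v.influence * k_a ** v.age_days for v in votes if not v.positive)
    return pos, neg


# A single positive vote of weight 1, three days old, with k_a = 0.5.
p, n = 1.0, 0.0
for _ in range(3):
    p, n = decay_daily(p, n, 0.5)
print(p, aged_totals([Vote(1.0, 3, True)], 0.5)[0])
```

For whole-day ages the two strategies give the same totals. The first needs no vote history but touches every record daily; the second keeps every vote and allows the aging factor to be changed later.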
One of the convenient aspects of this system is that it's easy to test-drive. It doesn't require more user action than simple democratic voting. It only requires an administrator to specify some reference users at the start which seed and then propagate influence throughout the system.

I tested this system on a forum dataset (details available on request). Compared to the scores of a democratic system, the hybrid system achieves around a 50% reduction in the difference from a moderator-only system, even when the direct voting of reference users is turned off for content items and only their indirect influence (through other users) is counted. \((p < 0.05)\)

What does a 50% reduction in the difference mean? Let the score of a content item \(C\) be measured in 3 systems: democratic \(D\), reference-users-only \(R\) and hybrid \(H\) with the direct influence of reference users on content items turned off. By sorting the items according to those scores we can calculate their ranks in the 3 systems: \(r_D\), \(r_R\) and \(r_H\) respectively. The value of the rank is in the range \(1\) to \(n\), where \(n\) is the total number of content items. The absolute difference between the democratic ranking and the reference ranking \(d_{DR}\) is:
$$ d_{DR} = \left| r_D - r_R \right| $$
while the absolute difference between the hybrid ranking and the reference ranking \(d_{HR}\) is:
$$ d_{HR} = \left| r_H - r_R \right| $$
and it turns out that on average,
$$ d_{HR} \approx 0.5 \, d_{DR} $$
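
The rank-difference measure can be sketched as below. The helper names `ranks` and `mean_rank_difference` are hypothetical, and the three score tables are invented toy values chosen only to show the calculation; they are not taken from the forum dataset.

```python
def ranks(scores):
    """Map each item to its rank, where 1 is the highest score.

    Ties are broken by insertion order, which is an arbitrary choice.
    """
    ordered = sorted(scores, key=scores.get, reverse=True)
    return {item: position for position, item in enumerate(ordered, start=1)}


def mean_rank_difference(scores_a, scores_b):
    """Average absolute rank difference over all items."""
    ranks_a, ranks_b = ranks(scores_a), ranks(scores_b)
    return sum(abs(ranks_a[i] - ranks_b[i]) for i in ranks_a) / len(ranks_a)


# Invented scores for four items under the three systems.
reference = {"a": 0.9, "b": 0.7, "c": 0.5, "d": 0.3}
democratic = {"a": 0.2, "b": 0.4, "c": 0.6, "d": 0.8}
hybrid = {"a": 0.6, "b": 0.8, "c": 0.3, "d": 0.5}

d_dr = mean_rank_difference(democratic, reference)
d_hr = mean_rank_difference(hybrid, reference)
print(d_dr, d_hr, d_hr / d_dr)
```

In this toy case the hybrid ranking is on average half as far from the reference ranking as the democratic one, which is the kind of relationship the result above describes.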
The important downside of these results is that the people using the system were not aware that points are calculated in a different way. The original votes were given by people who knew that the system is democratic and acted accordingly. It remains to be seen what the results would be if people are aware that their voting influence depends on the way others vote for their submitted content.

I pondered starting a website similar to Hacker News based on this voting and scoring scheme. However, starting a whole new news website is about much more than just scoring algorithms: it requires reputation in the online community, popularity and, most importantly, time, none of which I presently have in sufficient amounts or know how to achieve. But hopefully, pg and the rest of the Hacker News team might find this scheme useful enough to somehow incorporate it into the existing scoring system.



11 comments:

Unknown said...

Hi,

You describe the system quite well. But could you provide a theorem about the properties of such a voting system?

It feels more robust to "bad" new people, and I am pretty sure you can prove it.

Do you have other "quality indicators"?

spion said...

Hello Yann,

Unfortunately I don't have a formal proof about the properties of the system - all I have are statistical measurements on a single existing dataset.

Mathematically defining certain properties of the system, stating assumptions if any then attempting to prove or disprove the defined properties is on my to-do list. (I can only offer one excuse: my background is in engineering not math so the work is moving slowly).

James said...

Gkosev, is the data set you used available? I did attempt to create a HN style site with a weighted voting system and would be interested to run my system on your data set.

BrittonRT said...

This is just what I need. Do you mind if I implement your system in my content api?

spion said...

@BrittonRT I don't mind it at all, go right ahead.

@J M Southern: I believe I have permission to make the dataset available as long as it's anonymized (user and post IDs replaced with randomized numbers and no content shown). I'll see what I can do.

jmcentire said...

I don't know that your solution is optimal. While the reliance on reference individuals could conceivably help promote a community like the earlier days of HN, I don't think anyone should presume that one thing is better than another. Consequently, the choice of reference individuals is undesirable.

Consider an alternative: take a very similar solution which correlates a user's votes with the votes of others. Those who vote more similarly to the user have a greater influence whereas those with whom the user tends to disagree have a lesser influence. Thus, the content is tailored to a particular user.

For simplicity of implementation, you could use a clustering technique to cluster users into arbitrary categories and calculate votes based upon the category as a whole. It'll lose some individualization for outliers based upon your choice for the number of clusters and whether or not you allow overlap. Of course, choosing the number of clusters to be equal to the number of users is the original case while choosing that number to be 1 is the current case (everyone's vote is weighed equally).

One argument against this, though, is that sometimes we specifically want to challenge people and their ideas. WBC's protests aren't held in their auditorium for the benefit of their membership. So, employing this technique would necessarily support niching individuals (or clusters/communities) wherein everyone is simply preaching to the choir, as it were. Of course, it still allows those who value dissenting opinions to form a community who will upvote something despite disagreeing with its content.

spion said...

@Unknown: I've considered the collaborative filtering approach. But I suspect that solution has bigger flaws and I'll speculate on them:

1) It won't make people think about the quality of what they're submitting.

Their submission has no effect on how good of a predictor they are, so why bother?

Contrast that to submitting any kind of content in this hybrid system: the quality of the content will have a direct effect on the submitter's voting influence. People might stop and think for a moment before posting.

2) It's not a good approach to building a community. The sense of belonging to a group with its own values and taste is lost. There is no single front page and instead the website is whatever everyone wants it to be, lacking in character and focus and as such less attractive to people (A news site about what? Whatever I want it to be about? I'll pass)

Sure, a community may feel more narrow-minded as a whole and presumptuous in deciding what is "good" and what is "bad". But isn't that already the case? Users always have the option to be members of as many communities as they like. I believe that nothing important is lost, except perhaps a bit of open-mindedness. On the other hand, focus on doing a particular thing very well is gained.

(note: my original system actually has multiple ratings, one for each sub-forum each corresponding to an area of expertise and the reference users are actually reference experts in that area)

jmcentire said...

I assumed the post-comment sign-in would update with a pointer to my account; so, when I signed in, it wouldn't remain anonymous. Be that as it may be...

To the first point, if Reddit is any guide, the _vast_ majority of people are lurkers. A much smaller number are voters and a diminishing fraction are submitters. If you hinge weight on submissions you're conflating the two -- your conflation is: a good submitter knows a good post. I contend that this is invalid reasoning; that, in fact, a voter can recognize quality content without searching for it.

Further, bad submissions don't really concern me as they'd be very quickly buried by a voting system. At least, what's bad to me would be reflected in the votes of similar voters. In this sense, the collaborative approach is focused more on the voters than the submitters. I think we both agree that lurkers are difficult to leverage -- but both systems encourage more participation.

2) Your complaint here, I feel, is a little short-sighted. The news site for you wouldn't be about whatever you want it to be about. As I noted, it allows for niche communities. But, for those who engage with the world in the same manner as liberals who watch Fox News, the site would maintain relevance. Further, the increase in submissions and emphasis on the vote allows for a much broader base for submissions while quickly filtering diamonds-in-the-rough to the top.

Finally, the same argument about group-think plagues your solution. If the principals of your system do not believe, say, automated parallelization is an interesting topic, it's buried and the submitter effectively penalized. It seems much more authoritarian. I suppose that's great as long as you're in charge of choosing the principals.

* I do like your additional complexity... in fact, I've got another alternative as well: see my next.

jmcentire said...

If you want to build a social community like HN or Reddit whereby submissions are judged and advanced based upon their merits, but you don't like the collaborative approach, how about a jury?

One of the issues Reddit faces (or most communities like this, for that matter) is that their primary utility -- filtering content -- is at odds with their mechanism -- user submission + voting. To be voted on, you must be seen; to be seen, you must be voted on. It's a well-known problem (Cold Start) with a host of approaches.

I suggest a jury system. Jurors are selected at random from the user base. That selection process can (and likely should) be weighted based upon the particular demographic (clustering) the user represents. The resulting jury is a representative cross-sample of the community. Since only a few jurors are selected from a large user base, the work-load of evaluating new posts is minimal. Since the jurors are statistically representative of the community, the posts they promote will be well-received by the community.

Of course, not every juror will be selected and there will be a growth function depending on your community for new submission rating. It is, clearly, bounded above by the theoretical approval of the community at large. You should be able to determine some number of jurors n who need to vote to reach a high confidence in that rating and a number of jurors m who should be selected to participate based upon the likelihood that a given juror will participate.

You can leverage this system to further skew the selection of jurors based upon additional criteria (your reference users would be disproportionately selected, for instance).

In fact, if jurors are limited to just that pool, I'm sure the correlation between this and your system would be very high. As that number grows, you get closer to the existing solution employed at Reddit/HN.

To me, I would use a combination of this and the previous -- along with another concept I've been kicking around that introduces a form of anonymous accountability (as I believe accountability is the crux of the problem). All told, each user may be creating a lily-white world of yes-men; but that's not necessary. In fact, your desire for an old-school HN community is equally viable. After all, even with your own solution, you must be willing to alienate some people and their opinions -- I'm doing that while simultaneously giving them their own playground.

It might lack a single, cohesive feel. And that's a psychological aspect I've not considered -- our desire to be a part of something greater than ourselves. With only a moment's thought on the matter, I believe that feel will exist. I do not believe it necessitates conflict; rather, I think it'll be much more intense (even if technically superficial and contrived) as the community you see will be perfectly welcoming.

spion said...

> If you hinge weight on submissions you're conflating the two -- your conflation is: a good submitter knows a good post. I contend that this is invalid reasoning; that, in fact, a voter can recognize quality content without searching for it.

Both your statement and mine are true and aren't contradictory. I merely claim that those that *do* search for and submit quality content consistently also know good content when they see it. A good voter that is not a submitter also knows good content when they see it.

Whether submitters will vote for content might be an issue though. In my test dataset they do vote (enough to make a difference). However, some statistics of high profile submitters from a site such as reddit would be really helpful.

If submitters do tend to vote, I don't think it's wrong to give most of the voting power to them, even if they're the minority of voters. On the contrary, I would really like to find out how that would turn out. I hope that it might even stimulate passive voters to submit high quality content. Even "meaningless" karma seems to do that - and this time, karma means voting power :)

Perhaps my system can also recognize that an early voter is essentially half a submitter and reward them with some fraction of the points that the submitters get. The reward could depend on the uncertainty about the final outcome at the time of voting.

But I really disagree with your view on the "groupthink problem". I think it's a non-issue as long as the reference users are well chosen in that they have a wide enough coverage. You can always add an automated parallelization guy/girl to the working set and let votes propagate from them to the submitters of such content. The system only helps the original organizers of the community to implement their vision of what they want the community to be about.


About the jury idea: I think that something similar is implemented on Slashdot and it seems to work well. But I don't think they have elaborate selection criteria, though (last time I checked, I think it was site activity + positive karma).

Sha said...

Hi!

I really enjoyed your post since I'm actually building a HN-like site myself. I'll definitely use some of your ideas.

If you're interested, I'd love to discuss this more in detail with you. Who knows, my project might end up being a way to test your algorithm on a real audience :)

Maybe you can contact me through http://sachagreif.com/contact ?