## Thursday, March 10, 2011

### Beware of the Friendly People

How friendly are you? Well, in our age it all depends on how many friends you have on Facebook and how many Twitter accounts are pretending to listen to what you are sharing. The MediaMetrix company has hired Anne-Marie (one of our winning puzzle-solvers) to help sort out the friendship connections. They told her that statistical studies have shown that the average number of friends a person has on Facebook is 120. MediaMetrix asked Anne-Marie to figure out what percentage of Facebook users have 300 or more friends and therefore fall into a Very Friendly category.

Their idea was to create a special "Beware of Friendly Me" badge that could be sent to all those Very Friendly people for easy identification. Why? Because they do have a lot of power. Their voices and opinions can be heard on Facebook by over 300 readers and then re-shared further and further. Imagine you or your company mistreating one of those people and later seeing yourself being trashed all around the web. With this badge, you will make sure to give every Very Friendly person VIP attention, and hopefully they will share your praise with the world.

Can you help Anne-Marie figure out what percentage of the Facebook users are Very Friendly?

Answers accepted all day long on Friday & Saturday, on our Family Puzzle Marathon. They will be hidden until Sunday morning (EST) and everyone who contributed something reasonable will get a puzzle point. Please, explain your answer.

Tom said...

We can estimate the percentage and it should be quite close IF standard deviation is applicable. Sadly, I'm not experienced with SD but let's see....

If the "average" is 120, and assuming the minimum is one or zero, and given that a good number of users have 300+, then the curve is going to be a bit skewed toward the "high end," and I'm not surprised. However, with a skewed curve I will abandon my attempt to use SD. Sorry, I'm out.

If I had to know, I'd find out directly from Facebook. Certainly they have a very good idea.

Bean said...

Um... is the answer "No, I can't, because there isn't enough information"? If the average user has 120 friends, it is possible that every user has 120 friends and that's that. Or that one has 4 bazillion friends and everyone else has two. It seems like without some sort of information about the distribution, there is no way to know what percentage have over 300 friends.

I will look forward to reading tomorrow's responses to see what the more statistically savvy posters come up with!

anne-marie said...

The range and the standard deviation are unknown.
I consider that the distribution is symmetric.
The mean is 120.
95% of the population will be within two standard deviations of the mean.
If 300 is about two standard deviations above the mean, 2 to 2.5% of the population would have 300 or more friends.

-lex- said...

If the average is 120 connections, then there are 120n connection ends in Facebook, where n is the total number of users. That is certainly at least as many as all the connection ends held by the Friendly users, and that number in turn is at least 300 times the number of Friendly users. Hence the number of Friendly users is at most 120/300 of all the users, that is 40%.
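The counting argument above is an instance of what is usually called Markov's inequality: for a quantity that cannot be negative, the fraction of values at or above a threshold is at most the mean divided by the threshold. A minimal Python sketch follows; the function name `markov_upper_bound` is made up for illustration, and the result is only an upper bound, not an estimate of the true share.

```python
def markov_upper_bound(mean, threshold):
    """Upper bound on the fraction of nonnegative values that are >= threshold."""
    if mean < 0 or threshold <= 0:
        raise ValueError("mean must be >= 0 and threshold must be > 0")
    # A fraction can never exceed 1, so cap the bound there.
    return min(1.0, mean / threshold)


bound = markov_upper_bound(120, 300)
print(f"At most {bound:.0%} of users can have 300 or more friends")

# The bound is reached only in an extreme (hypothetical) population:
# 40% of users with exactly 300 friends and 60% with none.
extreme_mean = 0.4 * 300 + 0.6 * 0
```

Any realistic population, where most users have at least a few friends, would have to sit well below that 40% ceiling.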

Sophia said...

None.

Carrie said...

One thing that seems to be missing to be able to answer this question is the variance or standard deviation. I found online that the standard deviation for this distribution of the number of Facebook friends is 73. Using that assumption, I was able to calculate the z-score and then find the percentage of Facebook users that have more than 300 friends. There is only a 0.694% probability that a user has 300 or more friends... or 1 in 144 users is considered a Very Friendly person.
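The z-score calculation described above can be reproduced in a few lines. This is a sketch that takes the normal model and the standard deviation of 73 quoted in the comment as given; the helper name `normal_upper_tail` is invented here.

```python
import math


def normal_upper_tail(x, mean, sd):
    """Return (z, P(X >= x)) for a normal distribution with the given mean and sd."""
    z = (x - mean) / sd
    # Upper-tail probability of a standard normal, via the complementary error function.
    tail = 0.5 * math.erfc(z / math.sqrt(2))
    return z, tail


z, tail = normal_upper_tail(300, 120, 73)
print(f"z = {z:.3f}, share with 300+ friends = {tail:.3%}")
```

The result comes out at roughly 0.7%, in line with the figure above; reading a printed z-table with z rounded to two decimals can shift the last digit slightly.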

Annie said...

Not confident but it's all I've got!!

x = total # of friend matches
y = total number of members
z = % of y

120 = x/y and 300 = x/(yz)

Solve for x:

120y = x and 300yz = x, therefore

120y = 300yz
120 = 300z
z = 120/300 = 40%

40% of membership has 300 or more friend matches

Maria said...

First of all, a puzzle point for everyone who dares to write her/his thoughts: Tom, Bean, anne-marie, -lex-, Carrie, Annie.

I think the closest answer is Carrie's, assuming a normal distribution of Facebook friend counts. Carrie visualized a bell curve centered at 120 with a std of 73 and calculated the area of the tail beyond 300.

We all assumed it is a Normal Distribution because it is the one we are most familiar with. But a Normal Distribution means that very few Facebook users have 0-10 friends and that the curve is symmetrical around 120. Is it true?
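One way to probe that question is to ask what the same bell curve says about the low end. The sketch below assumes the mean of 120 and the standard deviation of 73 used above, and relies on `statistics.NormalDist` from the Python standard library (Python 3.8 or later).

```python
from statistics import NormalDist

# Normal model with the mean and standard deviation discussed above.
friends = NormalDist(mu=120, sigma=73)

share_below_zero = friends.cdf(0)   # impossible in reality: negative friend counts
share_below_ten = friends.cdf(10)   # everyone the model places under 10 friends

print(f"Model share with fewer than 0 friends:  {share_below_zero:.1%}")
print(f"Model share with fewer than 10 friends: {share_below_ten:.1%}")
```

Under these assumptions the model puts about 5% of users below zero friends, which cannot happen, so a symmetric bell curve looks like a rough fit at best.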

anne-marie said...

Thanks for the puzzle and thanks for helping solve it!
I really enjoyed the three puzzles and the idea to have a choice between different problems.

anne-marie said...

P.S.
The central limit theorem gives an explanation of what causes the shape of this curve.
As the size of a sample grows (>30), a normal curve will result where most people's scores fall in the middle and fewer scores fall towards the outside or tails of the curve. (I'll let you find a good source for the theorem's definition.)
With the standard deviation, it is possible to calculate the z score. If the standard deviation is unknown, we can always approximate.

Pat said...

I think the best answer is ‘go ask Facebook’, more specifically, this is an SQL problem, not a math problem.

In my own attempt, I tried to set this up as a binomial distribution, which can then be approximated by a normal distribution with a mean and variance of 120 (meaning a std deviation of about 11). However, this means that 300 would be a little over 16 standard deviations away from the mean, which effectively means no one has 300 connections. Also, since the mean is only 11 std devs above 0, and it’s not possible to have fewer than 0 friendships (even for the Unabomber), it doesn’t seem realistic to use a symmetric distribution. To me the distribution of the number of friends smells more like a gamma(2) distribution, but I really can’t find any justification for that.
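The numbers in this paragraph are easy to check, and the gamma idea can be tried out too. In the sketch below, "gamma(2)" is read as a gamma distribution with shape 2 whose scale is chosen to give a mean of 120; that reading is an assumption, not something stated in the comment. For shape 2 the upper tail has the closed form exp(-t)(1 + t) with t = x/scale.

```python
import math

mean = 120
sd = math.sqrt(120)          # variance equal to the mean, so sd is about 11

z_300 = (300 - mean) / sd    # how many sds 300 lies above the mean
z_0 = mean / sd              # how many sds the mean lies above 0

# Assumed reading of "gamma(2)": shape 2, scale picked so that the mean is 120.
shape = 2
scale = mean / shape
t = 300 / scale
gamma_tail = math.exp(-t) * (1 + t)   # P(X >= 300) for a shape-2 gamma

print(f"sd = {sd:.2f}, z(300) = {z_300:.1f}, z(0) = {z_0:.1f}")
print(f"Shape-2 gamma share with 300+ friends: {gamma_tail:.1%}")
```

With these assumptions the skewed model gives a share of about 4%, several times larger than the symmetric normal estimates earlier in the thread, which shows how strongly the answer depends on the assumed shape.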

FWIW, I think the number of possible matches for a population of size n is n(n-1)/2. E.g., a randomly selected group of 5 people will have 10 (= 5·4/2) possible friendship pairings. Therefore within my own 120 friends, there are 7,140 (= 120·119/2) potential pairings.

So to me, this leads to a database solution rather than a math solution. Facebook must maintain a table with one row per person, and another table with links between the people. Query the links associated to each person, which would provide not only the number of ‘Watch for’ users, but who they are. I think you would also want to look beyond that, to count not only the number of friends, but who is frequently ‘Liked’ by their friends (indicating influence), and what is the size of the friends of friends. E.g., if you post something that my friend likes, I’ll see not only what you posted, but that it is endorsed by my friend. I think that may be a stronger measure of influence than simply counting the number of friends links.
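A toy version of that query can be sketched with Python's built-in `sqlite3` module. The table names `person` and `link`, the one-row-per-friendship layout, and the sample data are all hypothetical; nothing here reflects Facebook's actual schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE person (id INTEGER PRIMARY KEY);
    CREATE TABLE link (a INTEGER, b INTEGER);  -- one row per friendship
""")

# Toy data: person 1 is friends with everyone; persons 2 and 3 are also friends.
conn.executemany("INSERT INTO person (id) VALUES (?)", [(i,) for i in range(1, 7)])
conn.executemany(
    "INSERT INTO link (a, b) VALUES (?, ?)",
    [(1, j) for j in range(2, 7)] + [(2, 3)],
)

THRESHOLD = 3  # would be 300 on real data

# Each friendship counts once for each of its two endpoints.
query = """
    SELECT p.id, COUNT(e.other) AS n_friends
    FROM person AS p
    LEFT JOIN (
        SELECT a AS me, b AS other FROM link
        UNION ALL
        SELECT b AS me, a AS other FROM link
    ) AS e ON e.me = p.id
    GROUP BY p.id
    HAVING COUNT(e.other) >= ?
    ORDER BY p.id
"""
very_friendly = conn.execute(query, (THRESHOLD,)).fetchall()
total = conn.execute("SELECT COUNT(*) FROM person").fetchone()[0]
share = len(very_friendly) / total

print(very_friendly, f"{share:.1%} of users")
```

The query returns both who the Very Friendly users are and what share of the membership they make up, which is exactly the percentage the puzzle asks for, with no distributional assumption needed.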

Last, a number of these distributions were developed because math was the only technology available at the time. Erlang developed queuing theory (and the Erlang distribution) for the telephone companies who were trying to decide how much copper to buy to wire, e.g., Boston and NYC. This was in the early 1900’s, long before the idea of a relational database. These distributions often make assumptions that are only valid in limited circumstances.

We may be approaching a new realm of math, based on social connections. Who knows, in 80 years we may see the Mark Zuckerberg distribution.

(Coming full circle, I stumbled onto the Math Mom site from some Steven Strogatz articles on NYT.com. According to Wikipedia, Strogatz was involved in the ‘Six Degrees of Separation’ work done in the 90’s, very similar to this problem.)