Tag Archives: statistics

Sociologists vs statisticians tweet

A brief comparison of the first tweets from two conferences happening at the same time: the WES (Work, Employment and Society) sociology conference in Leeds and the RSS (Royal Statistical Society) conference in Manchester.

How do sociologists vs statisticians tweet about a talk they like? Adjectives vs nouns+verbs!


“Genuinely one of the best opening plenary talks I’ve ever listened to. Succinct and sophisticated #Wesconf2016”

At the same time, the typical response to a good talk at the RSS conference is more matter-of-fact and informative:

Xia-Li Meng giving brilliant talk on big data at #RSS2016Conf ‘the bigger the data the more likely you will miss the target’ #RSS2016Conf

Or humorous: 

Coffee in hand. Let’s bring on the stats. #RSS2016Conf


Birthdays, binomial distributions, and romantic mathematicians

Suppose that three out of your 402 Facebook friends have birthdays today. What’s the probability of that happening? It’s not enough to just take (1/365)^3, because that doesn’t take into account how big your sample of Facebook friends is. To find out, we need the binomial distribution: http://en.wikipedia.org/wiki/Binomial_distribution (the probability we need is the formula just after the table of contents, with n = 402, k = 3, and p = 1/365). Plug the numbers in and you get 402!/(6 × 399!) × (1/48627125) × (364/365)^399, where 6 = 3!, 48627125 = 365^3 and 364/365 ≈ 0.997. How on earth does one calculate this by hand? I got stuck. But then my other half (who is a mathematician and who told me I needed the binomial distribution in the first place) wrote and emailed me a little Python program to calculate this. A mathematical/programming gift. Better than chocolates! And the probability is 0.0739604817154 – or just a bit more than 7%. That’s quite rare indeed, given that with 402 friends we have more people in need of birthdays than there are spare days in the year.
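For anyone who wants to try the calculation themselves, here is a minimal sketch of how it could be done in Python. It is not the program I was sent, just one way of writing the binomial formula down; the names binomial_pmf, n_friends, k_birthdays and p_today are made up for this example, and I'm assuming Python 3.8 or newer (for math.comb). Exact fractions are used so that nothing gets rounded until the very end.

```python
from fractions import Fraction
from math import comb


def binomial_pmf(n: int, k: int, p: Fraction) -> Fraction:
    """Probability of exactly k successes in n independent trials,
    each succeeding with probability p (the binomial formula)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


# Numbers from the post: 402 friends, 3 birthdays today,
# and every day of a 365-day year assumed equally likely.
n_friends = 402
k_birthdays = 3
p_today = Fraction(1, 365)

prob = binomial_pmf(n_friends, k_birthdays, p_today)

# Convert the exact fraction to a decimal only for display.
print(float(prob))
```

Running it should print a value of about 0.074, in line with the figure quoted above. Swapping in your own number of friends and birthdays only means changing n_friends and k_birthdays.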

P.S. Actually, this formula is for sampling with replacement, where the draws are independent – but in this case we sample one friend at a time and then discard the name out of the pile, so there is one name less each time – which means that the draws are not independent. So we actually get a hypergeometric distribution instead of a binomial one. However, Wikipedia claims that “for N much larger than n, the binomial distribution is a good approximation, and widely used”, so I’m happy with that.

P.P.S. In the first draft I used “likelihood” as a synonym for probability. As you see in this one, that was WRONG. Dammit, I’m such an imprecise social scientist.


The year of code


Did you know it was the Year of Code?

I can’t really code. But coding makes life so much easier (and cheaper). If your job requires using computers for anything, then learning a bit about coding will help you do more stuff, not rely on others for help, be faster and more efficient on the computer, and, eventually, spend less time on it. And it’s fun because you get the computer to do things. It’s like training a dog – only in fact you are training yourself, and not the dog. Strangely, I haven’t been able to find much research about the addictive potential of coding, apart from this now-old book from 1989 by Margaret A. Shotton and this book about hackers by Paul Taylor – although several friends who have done programming swear that it can be a highly addictive activity.

Well, it’s not that much fun if, like me, you have health problems with your hands, arms, joints or back, so it is a bit of a Catch-22. This – and also the fact that I ended up working as a social scientist specialising in qualitative research – is why I don’t know much coding. Thankfully, my friends do, and so does Google. By pestering friends and Google I’ve been able to do some small bits of HTML coding, and write hundreds of pages in LaTeX (without losing any work or ending up with hideous formatting – MS Office, it’s your turn to blush). I tried to learn R last month and although it didn’t go very well, I’ll go back to it soon, because there are some awesome extensions for R that don’t exist in “button-based” data analysis programmes, made “especially” for us social scientists… One in particular, TraMineR, is so awesome and relevant for my work that I’m dreaming of being able to use it. Not to mention how often SPSS and NVivo crash and how expensive they are for anyone who isn’t attached to a rich institution which can buy the packages for its employees. And, meh, they don’t work on Linux, while R and LaTeX have no problem with different platforms.

I really think that social science students and researchers in the UK, in general, could do with more knowledge about how to use computers to their own benefit. One reason why the existing packages are so, well, bad is that the market is not educated enough. I’m told that the quality of coffee in the UK has soared in the last two decades. Why? Because consumers have become more demanding. I’m sure that one day, when more social scientists and other people who need computers for their daily lives start being a bit more discerning about the software they use, someone out there, or even one of us, will gather their wits and design better software.

It might be a better idea to get a keen pupil to teach a class on coding, and not a teacher who is new to it, but hey. If “2014 – the year of code” succeeds in getting more students and teachers to learn code, then, with all its flaws, it is a fantastic initiative (watch the video… but try not to headdesk when you realise that its director can’t code yet). Knowing some code is like knowing a bit of swimming – it won’t hurt you (unless you have an underlying health condition, and even then it can be beneficial under supervision), it makes life more fun, and heck, it can even save your arse. So if, like me, you know little or no coding, do check out Codecademy. And if your Word document has ever crashed on you, have a peek at the marvellous thing called LaTeX [pronounced “leitek”].
