Showing posts with label twitter. Show all posts

Thursday, November 19, 2009

Twitter's new Retweet Feature

Twitter has been rolling out the new retweet feature to some users. If you're one of them, you'll have a new button saying "Retweet" below each tweet, and also have notifications about who retweeted you (using this new feature, I'd guess).


A lot of people are already complaining that the use of the feature is too limited, mainly because it doesn't allow you to add a comment to a retweet by putting some text before the actual retweet like this:


Actually, I think the critics are right: comments are a useful feature, and hopefully twitter will add them rather quickly.

One of the nice things about the retweet feature is that you can actually see who has been retweeting you. But even if you're not one of the beta testers, you can get that information pretty easily from twimpact. Just type your name into the search bar, or go directly to http://twimpact.com/user/yourname, and you will get a list of all your retweeted messages, each with a list of who retweeted you, at least back to June 2009.
For example, here is my (pretty pathetic) list:


And before you complain, we know that not all retweets make it in there, but this is mostly due to the fact that twitter's search infrastructure is eventually consistent, at best.

Tuesday, October 13, 2009

Machine Learning Feeds and Twitterers

I’ve collected some information about machine learning people on twitter and machine learning blogs.

Twitter

Blogs

A few days ago, I thought that it would be very nice to have some kind of blog aggregator like Planet Debian for machine learning. After some poking around I found that the easiest way to aggregate and publish RSS feeds is actually through Google Reader. Here is the resulting machine learning feed. Feel free to subscribe to it!

Here is what is currently in there:

People/Websites/Companies:

Journals/Paper feeds:

If you have a suggestion of feeds to add, or want your feed removed for some reason, please let me know!

Update: I've set up two different feeds. You can either have the original Google reader feed, or the feedburner feed. The latter has more compact summaries, while the former might have a nicer web view.

More updates: I've put up separate feeds for only blogs and only papers.

Monday, August 17, 2009

Twimpact Work In Progress

Twimpact has been running smoothly in its small niche of the internet, and we're currently trying to improve the way retweets are crawled and analyzed. The problem is that people often add a comment to the end of the original message, and also edit the original message itself, so it's not straightforward to tell whether you are looking at a new tweet or a retweet of an existing one.

There are also some more bugs, which will be fixed soon. For example, apparently, we weren't handling underscores in user names correctly such that "RT @nfl_games" became a retweet of the user "nfl" with the message "games", which has been retweeted more than 1800 times.
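The underscore problem can be illustrated with a small sketch. This is not Twimpact's actual parser; both patterns and the `parse_retweet` helper are hypothetical. The first pattern is only a guess at the kind of expression that would split "nfl_games" into a user and a message, and the second uses `\w`, which includes the underscore.

```ruby
# Hypothetical buggy pattern: the user name stops at the first character
# that is not a letter or digit, so "_games" ends up in the message.
BUGGY_RETWEET = /\bRT\s+@([A-Za-z0-9]+)[\s:_]*(.*)/m

# Hypothetical fixed pattern: \w matches letters, digits, and underscores.
FIXED_RETWEET = /\bRT\s+@(\w+):?\s*(.*)/m

# Returns [user, message] if the text looks like a retweet, otherwise nil.
def parse_retweet(text, pattern = FIXED_RETWEET)
  match = pattern.match(text)
  match && [match[1], match[2]]
end

buggy = parse_retweet('RT @nfl_games', BUGGY_RETWEET)
fixed = parse_retweet('RT @nfl_games Bears win')
```

With the buggy pattern the user comes out as "nfl" and the message as "games"; with the fixed one the user is "nfl_games".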

We currently also don't filter out users who retweet a tweet repeatedly or who retweet their own tweets, leading to all kinds of retweet bots and retweet-spam networks being high up in our retweet trends. While that may not be so informative, it is still interesting to see what kind of business ideas people come up with around the twitter platform.
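A filter of this kind could look roughly like the following. It is a minimal sketch under assumptions: the `Retweet` record and its fields (`retweeter`, `author`, `tweet_id`) are made up for illustration and are not Twimpact's data model.

```ruby
require 'set'

# Hypothetical record: who retweeted, who wrote the original, which tweet.
Retweet = Struct.new(:retweeter, :author, :tweet_id)

# Counts retweets per tweet, skipping self-retweets and counting each
# retweeting user at most once per tweet.
def count_distinct_retweets(retweets)
  seen = Set.new
  counts = Hash.new(0)
  retweets.each do |rt|
    retweeter = rt.retweeter.downcase
    next if retweeter == rt.author.downcase          # self-retweet
    next unless seen.add?([retweeter, rt.tweet_id])  # repeated retweet
    counts[rt.tweet_id] += 1
  end
  counts
end

sample = [
  Retweet.new('alice', 'bob', 1),
  Retweet.new('alice', 'bob', 1),  # repeat by the same user
  Retweet.new('bob',   'bob', 1),  # self-retweet
  Retweet.new('carol', 'bob', 1)
]
counts = count_distinct_retweets(sample)
```

Here only alice and carol count, so tweet 1 ends up with two retweets instead of four.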

For example, dannywhitehouse apparently has a service called twitter-bomb, which I guess does all kinds of nasty things that are certainly not covered by twitter's Terms of Service, and yet he has managed to amass more than thirteen thousand followers.

In any case, we'll be rolling out the improvements soon, maybe this week, so stay tuned! The only problem we'll run into is that we have to reprocess all the tweets already in our database 8-O

Monday, August 03, 2009

Twimpact!


For the last one and a half months, Matthias Jugel and I have been working on a site which computes impact scores for twitter users based on how often their tweets get retweeted.

The project was really lots of fun so far. The first time we got the thing up and running was around the time of the Iranian elections and suddenly seeing all those tweets in real-time gave a feeling of directly tapping into the twitterverse.

The winner twimpact-wise is clearly mashable, with a twimpact score of 89 right now and over ten thousand retweeted messages. Other top users include news sites like breakingnews, cnnbrk, and smashingmag, and celebrities with many, many followers like aplusk (Ashton Kutcher) or iamdiddy (P. Diddy).

On the entry page you can see a live view of what has been retweeted most in the last hour. It's quite interesting to see what is popping up there. For example, surprisingly, there are many competitions of the form "retweet this and win a laptop" like this one which has been retweeted over 1300 times. Another kind of retweet is the inspirational message from users like deepak_chopra which people like to pass on. But apart from that you have of course current news, interesting links and so on. These are mostly technology and web related, which reflects the user base of twitter quite well, I think.

So go ahead and compute your twimpact score, or just sit back and look at what people are currently retweeting.

Wednesday, July 08, 2009

Threads, Exceptions, Net::HTTP, Timeouts and JRuby

I recently had some fun debugging a little application of mine written in JRuby which crawls the twitter search API looking for specific tweets. The problem was, every now and again, the crawler would hang even if I set appropriate timeouts. The crawler consisted of two threads, one periodically issuing a search to twitter, and another one writing the results into the database.

The first surprise to me was that in Ruby, by default, if there is an exception in a thread, the thread silently dies, not even issuing an error message. Only when you join the thread do you get the error message. You can set Thread.abort_on_exception = true, but that kills your whole application. This meant that what appeared to be missed timeouts could just as well be uncaught exceptions.

So when working with multithreaded applications, enclosing the whole thread body in a begin ... rescue Exception => e ... end is important if you want to be notified of errors at all.
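As a minimal sketch of that pattern (the `guarded_thread` helper and the error queue are my own illustration, not the crawler's code), the thread body is wrapped in begin/rescue and any exception is handed back to the main thread through a queue:

```ruby
require 'thread'

# Starts a thread whose whole body is wrapped in begin/rescue, so that an
# exception is recorded in `errors` instead of vanishing with the thread.
def guarded_thread(errors, &work)
  Thread.new do
    begin
      work.call
    rescue Exception => e  # deliberately broad, as discussed above
      errors << e
    end
  end
end

errors = Queue.new
worker = guarded_thread(errors) { raise ArgumentError, 'simulated crawler failure' }
worker.join

# The main thread can now inspect what went wrong.
failure = errors.pop unless errors.empty?
```

Rescuing Exception is very broad, so in a real application you would probably want to log the error and decide whether the thread should restart rather than just collect it.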

But still, the thread would mysteriously die without properly handling the timeout. Some deeper digging revealed an old bug report and an interesting article about Ruby and timeouts in general, which seemed to imply that timeouts might not always work, in particular if system calls are involved. It was unclear, though, whether the situation is the same for JRuby, which uses native threads instead of green threads (threads simulated by explicit time-slicing in the interpreter).

So I started to read the JRuby sources to understand where and how timeouts are implemented in Net::HTTP, which led to my next surprise: Net::HTTP handles timeouts entirely through the Timeout module, not on the socket level. It does not use the option of setting timeouts on reads or writes, but instead wraps all the significant portions of code in Timeout::timeout { ... } calls. I also found a nice old post by Charles Nutter (a.k.a. headius) explaining the implementation in depth.

I guess that, based on that post, JRuby has since reimplemented Timeout in Java (source). Some first tests revealed that this timeout plays well with Net::HTTP, but my crawler was still hanging every now and again.

Finally, I found the last missing piece of information: from the sources, it seems that JRuby's implementation of the Timeout module raises Timeout::ExitException when the timeout expires. However, that is a Ruby 1.9 feature; in Ruby 1.8, the exception was named Timeout::Error. So basically, I was catching the wrong exception, in fact not catching it at all, thereby killing the thread (silently). Interestingly, some more testing showed that JRuby raises a Timeout::Error if you use the Timeout module yourself, but raises a Timeout::ExitException when you're using Net::HTTP.

In the end, I just enclosed the whole HTTP request section in a Timeout::timeout of my own, catching both Timeout::Error and Timeout::ExitException, and finally everything was running robustly.
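The workaround can be sketched roughly like this. It is not the original crawler code: the `with_outer_timeout` helper and the `TIMEOUT_ERRORS` list are hypothetical names, and since Timeout::ExitException is not defined in every Ruby or JRuby version, the sketch only adds it to the rescue list when it exists.

```ruby
require 'timeout'

# Timeout::Error is always available; Timeout::ExitException only exists
# in some Ruby/JRuby versions, so include it only when it is defined.
TIMEOUT_ERRORS = [Timeout::Error]
TIMEOUT_ERRORS << Timeout::ExitException if defined?(Timeout::ExitException)

# Runs the block under an outer timeout of `seconds`.
# Returns the block's value, or :timed_out if a timeout exception is raised.
def with_outer_timeout(seconds)
  Timeout.timeout(seconds) { yield }
rescue *TIMEOUT_ERRORS
  :timed_out
end

# In the crawler, the block would contain the Net::HTTP request;
# here sleep stands in for a request that hangs.
slow = with_outer_timeout(0.1) { sleep 2 }
fast = with_outer_timeout(1) { :response }
```

The first call gives up after a tenth of a second, while the second simply returns the block's value.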

So in summary

  • Uncaught exceptions silently kill threads in ruby.
  • Net::HTTP does its own timeouts through the Timeout module.
  • Sometimes, JRuby raises Timeout::ExitException, not Timeout::Error. Will be fixed in JRuby 1.4 (see below)


I guess I should post a bug report on the latter...

Update: I submitted a bug report, and it's already fixed! Those jruby people are really incredibly fast!

Thursday, April 30, 2009

Machine Learning Twibe

Twibes is some new twitter-related website which manages topic-related groups of people and collects tweets based on up to three tags.

Since apparently nobody had done so yet, I set up a twibe on machine learning. Follow the link and click on "Join" on the right-hand side to join.

Currently, the group is picking up tweets with either "machine learning", "#machlearn", or "#machine-learning" in them. Anyone got an idea how to improve the tags?

Tuesday, September 09, 2008

Why are people following me on twitter?


A few months ago I created an account on twitter. Part of me just wanted to try out the newest Web 2.0 thing everybody's crazy about, but I also convinced myself that I had an actual need for it: I was going to a wedding without my family and thought that this way I could keep them up to date on my whereabouts.

So basically, I posted a few tweets for about one weekend in German, and that was more or less it.

The funny thing is that after that weekend I had 3 followers, most of whom didn't even speak German. By now, I have 10 followers, and I really don't think I deserve them. So why are people subscribing to my feed?

Well, one person I know personally, and a few seem to follow me because I stated in my profile that I'm working on machine learning, but that still leaves about 5 people, and frankly, I don't know how they even found my feed.

Anyway, I also haven't really yet understood what twitter could do for me. I'm not saying that it doesn't make sense at all. For example, I'm following Charles Nutter, one of the main guys working on jruby, and I found his tweets to be a nice way to track what he is doing and what he is working on.

In my case, however, it doesn't really work. I'm involved in so many things that people would get seriously confused if I wrote down every little bit (writing proposal/discussing with students/thinking about world-domination (muahahah)/reviewing a paper/fixing cron jobs). I could tweet about my research, but I'm not even sure it would be wise to tell everybody what I'm working on: either it doesn't work out, and then it could be kinda embarrassing, or it actually works, and then I'm just giving other people ideas about what to look into.

Lately, I've had kind of an insight: the penalty for subscribing to a low-volume twitter feed is quite small (apart from losing track of what the heck you're subscribed to). Some people have like ten thousand subscriptions. But if most of them don't post much, everything's fine. And if you subscribe to somebody who posts a lot but you lose interest, you can get rid of him easily. So maybe it all makes sense after all.

Well, I'll be attending this year's NIPS conference in December. An excellent opportunity to try twitter again ;)