Sunday, December 30, 2007

Babbleknot indexes 90,000 forums

The site reached 90,000 forums around 6:30 AM Central Time, December 30, 2007. It took much longer than expected. As soon as I mentioned the rate per day in the blog, we started having trouble. All seems to be well now & we're back on the 2,000 per day rate for a while. Until the Cowboy strikes again...

Some other updates.

We started the full-text indexing last week. We aren't going back for history & only plan to keep a "window" of indexed posts for the time being. Looking at the statistics, we've indexed about 880,000 posts in about 4 days' time, just running a single thread in the background. We're doing about 1 forum per minute right now, so about 1,440 forums per day. Obviously that rate will need to increase. Our Amazon EC2 & S3 based technical architecture will help here.
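To put those numbers in perspective, here is a rough back-of-the-envelope sketch of the arithmetic. The function names are made up for illustration, and the idea that throughput scales linearly with extra threads is an assumption, not something we have measured.

```python
MINUTES_PER_DAY = 24 * 60


def forums_per_day(forums_per_minute: float, threads: int = 1) -> float:
    """Forums indexed per day, ASSUMING each thread adds the same throughput."""
    return forums_per_minute * MINUTES_PER_DAY * threads


def days_to_cover(total_forums: int, forums_per_minute: float, threads: int = 1) -> float:
    """Days needed for one full pass over every forum at the given rate."""
    return total_forums / forums_per_day(forums_per_minute, threads)


# Figures from the post: 1 forum per minute, 880,000 posts in about 4 days.
single_thread_rate = forums_per_day(1)
posts_per_day = 880_000 / 4

print(f"forums per day, one thread: {single_thread_rate:.0f}")
print(f"posts per day: {posts_per_day:.0f}")
print(f"days for one pass over 90,000 forums: {days_to_cover(90_000, 1):.1f}")
```

At one forum per minute a single pass over 90,000 forums takes a couple of months, which is why the rate has to go up.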

A new search page that queries the indexes was also released. It links to the forums & threads right now. The content, images, and past graphs will be there soon.

Work on identity claiming is progressing. I will blog about this soon.

I've said before that I am surprised to see what is out there. The adult filter is working pretty well, and I spend most of my time on the "Safe" side. Occasionally something adult-oriented makes the front page, but someone is always quick to flag it. This will need automated moderation very soon. Even on the "Safe" side, there is some odd stuff. What do you think a site named "Badger and Blade" would be about?

"Our goal is to provide a fun and informative site covering all aspects of wet shaving, catering to all groups from beginners to seasoned pros."

A site & forums dedicated to shaving. And they look pretty busy.

Yep, anything is there. Just search at


Sunday, December 2, 2007

Babbleknot New Features

The team has been gulping Diet Rockstar & busy adding new forums and features. Currently the site is loading about 4,000 new forums per day, with a goal of 150,000 boards during the beta. We're just over halfway there with 78,184 forums loaded.

I continue to be amazed at the variety of boards out there, and the activity that still occurs, even with the popular web 2.0 destinations. I was a fan of King of Queens, but below is a thumbgraph of an entire board dedicated to Leah Remini. Anything is there. You just have to look. In the graph below, the "green" lines are indications of direct replies in threads, which is a clue to relationships between members.

We'll crank up the search & daily indexing rates in the future. Right now, many graphs are done on demand or at the rate of 2,000 per day in a background job. Babbleknot is running on Amazon's Elastic Compute Cloud, EC2, with plans to add servers for spiders, large graph layout, and other types of large scale analysis as demand increases.

The new features are key milestones that I am happy to finally check off. Here they are in my perceived order of significance.

The biggest new feature in my opinion is background generation of graphs. Before, if you clicked a forum link & the data was unavailable, the board was parsed while you waited, hourglass clocking, driving you crazy. Now if the data isn't on hand, the generation is pushed to the background. Most complete in 30 seconds or so, but a large board with lots of content & posts will obviously take longer. The forum names appear on the right sidebar immediately & will change to a hyperlink once complete. Sending the generation to the background is good for both usability & the overall performance of the site. Right now the requests are persisted for the current web session, but the thought is we will keep that history for a short time so you can go back a bit.
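For the curious, the general pattern looks something like the sketch below. This is not the actual Babbleknot code; it is a minimal illustration using a queue and one worker thread, and every name in it (`request_graph`, `generate_graph`, `status`) is hypothetical.

```python
import queue
import threading

pending = queue.Queue()        # forums waiting for graph generation
status = {}                    # forum name -> "pending" or "done"
results = {}                   # forum name -> generated graph
status_lock = threading.Lock()


def generate_graph(forum: str) -> str:
    """Hypothetical placeholder for the slow parse-and-layout step."""
    return f"graph-for-{forum}"


def worker() -> None:
    """Pull forums off the queue and generate their graphs until told to stop."""
    while True:
        forum = pending.get()
        if forum is None:          # sentinel value: shut the worker down
            pending.task_done()
            break
        graph = generate_graph(forum)
        with status_lock:
            results[forum] = graph
            status[forum] = "done"
        pending.task_done()


def request_graph(forum: str) -> str:
    """Return right away; queue the forum if nobody has asked for it yet."""
    with status_lock:
        if forum in status:
            return status[forum]
        status[forum] = "pending"
    pending.put(forum)
    return "pending"


def start_worker() -> threading.Thread:
    """Start the background worker as a daemon thread."""
    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    return thread
```

The page can show the forum name as plain text while `status` says "pending" and turn it into a hyperlink once it flips to "done".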

Second new feature is an RSS feed of the recent graphs page.

Finally, image flagging is now in with very basic functionality. Clicking the red flag next to a content image will remove the image from view on subsequent retrievals of that graph. It's possible entire forums should be flagged. To me, flagging isn't necessarily bad. I believe there is a large segment of the population that would probably like to surf only the flagged content & boards. This will eventually become part of the Safe Search feature, which I think I'll brand as an opposite, like Wreckless Search.

More updates coming soon. Please post questions and comments if you love or hate

Saturday, December 1, 2007

Social Graph Visualization with is now open for a public beta.

Babbleknot scans the index pages of over 70,000 message boards to generate thread / topic metrics: velocity, mass, and acceleration, to name a few. These metrics are used to generate input to a spider that indexes the content and generates graphs of the hot threads.
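To give a feel for what metrics like these could mean, here is a rough sketch. The definitions are assumptions made up for illustration (mass as total post count, velocity as posts per hour between two scans, acceleration as the change in velocity per hour); they are not necessarily the formulas Babbleknot uses.

```python
from dataclasses import dataclass


@dataclass
class Snapshot:
    """One scan of a thread's row on a board index page (hypothetical shape)."""
    hours: float   # scan time, in hours since an arbitrary starting point
    posts: int     # post count shown on the index page at that time


def mass(s: Snapshot) -> int:
    """ASSUMED definition: the total number of posts in the thread."""
    return s.posts


def velocity(a: Snapshot, b: Snapshot) -> float:
    """ASSUMED definition: new posts per hour between two scans."""
    return (b.posts - a.posts) / (b.hours - a.hours)


def acceleration(a: Snapshot, b: Snapshot, c: Snapshot) -> float:
    """ASSUMED definition: change in velocity per hour across three scans."""
    return (velocity(b, c) - velocity(a, b)) / (c.hours - b.hours)


# Three hourly scans of a thread that is heating up.
s0, s1, s2 = Snapshot(0, 10), Snapshot(1, 20), Snapshot(2, 40)
print(mass(s2), velocity(s1, s2), acceleration(s0, s1, s2))
```

A thread with high velocity and positive acceleration is a natural candidate for the spider to visit first.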

Below is a sample graph which is part of this flickr set.


On the graph above, you see people, threads, and images, with lines representing relationships between objects with a "circular" layout.
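If you are wondering what a "circular" layout amounts to, the basic idea is to space the nodes evenly around a circle and then draw the relationship lines across it. The sketch below is a minimal illustration of that idea with hypothetical node names, not the layout code behind the graphs.

```python
import math


def circular_layout(nodes, radius=1.0):
    """Place nodes evenly around a circle; returns {node: (x, y)}."""
    n = len(nodes)
    return {
        node: (
            radius * math.cos(2 * math.pi * i / n),
            radius * math.sin(2 * math.pi * i / n),
        )
        for i, node in enumerate(nodes)
    }


# Hypothetical nodes: two people, a thread, and an image.
positions = circular_layout(["alice", "bob", "thread-1", "image-1"])
for name, (x, y) in positions.items():
    print(f"{name}: ({x:.2f}, {y:.2f})")
```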

Graphs? Why? The graphs provide a top-down view of the content, people and their relationships. You can see 10s, 100s of threads & topics at once. More coming hopefully as we learn to scale. In the future you'll also be able to generate graphs that span boards & forums. The best part of these graphs is that they aren't just static images. You can zoom & pan (screencam), and all the embedded content images are wrapped in a lightbox for easy display and perusal. You have quick access to the thread & user profiles with links representing post responsibility and direct quoting. More coming.

Privacy? Yep, there will be complaints. I expect some push back. I'll be happy to pull any board that doesn't want to be indexed. We'll be respecting robots.txt soon to make that easy.
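As a rough idea of what respecting robots.txt involves, the sketch below uses Python's standard `urllib.robotparser` module. The user agent name, the sample rules, and the URLs are all made up for illustration, and this is not necessarily how our spider will do it.

```python
from urllib.robotparser import RobotFileParser


def may_index(robots_txt: str, url: str, user_agent: str = "ExampleSpider") -> bool:
    """Return True if the given robots.txt text allows user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Hypothetical robots.txt a board owner might publish to opt out one section.
sample_rules = """User-agent: *
Disallow: /forum/private/
"""

print(may_index(sample_rules, "http://example.com/forum/private/topic1"))
print(may_index(sample_rules, "http://example.com/forum/public/topic1"))
```

A board owner who wants out entirely would only need `Disallow: /` in that file.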

What else is coming? Identity claiming so you can link your identity to individual board identities, and delve deeper into your own social graph. Tagging of all kinds is 99% complete & will be on the home page soon. Full content indexing, thread tracking, alerts, content recognition beyond images, more board types, and maybe even posting capability.

Watch this blog for more information and give a try today.