RailsConfEurope slides

written by benjamin on September 20th, 2007 @ 06:20 PM

I did a short session at this year’s RailsConfEurope, talking about how page caching works on omdb. We decided to use page caching wherever possible, and with REST in mind, page caching will become more important. I enjoyed RailsConf and am looking forward to next year’s conference, again in Berlin. Here are my slides:

Caching in a multilanguage environment

Bratwurst sold out

written by benjamin on September 14th, 2007 @ 09:57 PM

Just like RailsConfEurope, BratwurstOnRails is sold out. Well, actually the event is free, but after more than 400 registrations we had to close the sign-up because we hit the capacity limit of the venue. We’re blown away by how many people are interested in a pre-conference socializing event, and we hope to start a tradition here :)

Thanks to Mathias, Florian and Andrea for their help and support during the organisation. See you all on Sunday.

Tracking down a memory leak in Ferret 0.11.4

written by benjamin on July 29th, 2007 @ 12:44 PM

We recently discovered unnatural memory growth in our mongrel servers. Starting at about 70 MB of memory, each mongrel process grows to more than 200 MB just a couple of hours later. We first blamed rmagick for the leak, as there are some reports circulating about possible leaks in rmagick.

Gathering data about the memory usage of a Rails application is quite simple with bleak house, which uses a specially patched version of ruby. It can create nifty images of the object and memory usage of any ruby application. After some requests we could not find any memory leak in rmagick, but requesting several search pages produced this scary result.

the green line is the memory usage

omdb uses ferret in a lot of places. All edit-movie dialogs depend on ferret, and even the ‘similar movies’ feature is just a complex ferret search statement. First of all we tried to isolate a search query that leaks. Some of the more basic queries (like searching for people by name) were not affected. We focused on the movie search, trying to find out what causes the leak. Our first guess was that some IndexReader or Writer was not closed properly, so old indexes remained in memory. The memory growth was quite large, almost 10 MB every 10 requests. After some checking, and even rewriting our searcher class to rely purely on Ferret::Searcher instead of Ferret::Index::Index, we couldn’t find any abandoned index.

So we took another look at the bleak house results. The number of objects in the search controller is consistent; there is no growth in the number of open objects. The memory jumps every 10 requests, but we fired our curl requests at just one action, so there is no obvious reason why the memory should grow every 10th request. Looking at the special memory report of bleak house, we found that the memory usage is growing linearly.


We decided to remove all custom omdb ferret code and try to build the search using just the Ferret API. We added feature after feature, and for a long time there was no leak. Only after we had added our custom analyzers did the leak appear again. omdb uses a lot of different analyzers, not only per language, but per field. To use the right analyzer for each field and language, we’re using the PerFieldAnalyzer, which allows us to specify how we want each field to be analyzed. So the leak was not inside the IndexReader or Writer, but part of the analysis process. We managed to reduce the problem to this simple script, which will consume lots and lots of memory if you run it.

At first we thought the MappingFilter was the problem, but it’s actually the PerFieldAnalyzer that is leaking memory. If you just use StandardAnalyzers, the leak is marginal, but adding big character-mapping tables, possibly to a lot of different fields, results in the big memory consumption we’re experiencing.

The fix is trivial: the problem with the PerFieldAnalyzer is located in the C code of Ferret, so we just need to implement our own PerFieldAnalyzer, written in ruby. We’ve created a small Analyzer that does the same as the built-in PerFieldAnalyzer.
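The idea of a Ruby-level per-field analyzer can be sketched roughly as follows. This is not the analyzer we actually use on omdb; it is a minimal illustration that only assumes every analyzer responds to token_stream(field, text), the way Ferret analyzers do. The class name RubyPerFieldAnalyzer and the two toy analyzers are made up so that the sketch runs without Ferret installed.

```ruby
# Pure-Ruby stand-in for a per-field analyzer. It only assumes that every
# analyzer responds to token_stream(field, text), as Ferret analyzers do.
class RubyPerFieldAnalyzer
  def initialize(default_analyzer)
    @default_analyzer = default_analyzer
    @analyzers = {}
  end

  # Register the analyzer to use for one field.
  def add_field(field, analyzer)
    @analyzers[field.to_sym] = analyzer
  end
  alias_method :[]=, :add_field

  # Delegate to the field's analyzer, falling back to the default.
  def token_stream(field, text)
    analyzer = @analyzers.fetch(field.to_sym, @default_analyzer)
    analyzer.token_stream(field, text)
  end
end

# Hypothetical toy analyzers so the sketch runs without Ferret installed.
class WhitespaceToy
  def token_stream(_field, text)
    text.split
  end
end

class LowercaseToy
  def token_stream(_field, text)
    text.downcase.split
  end
end

analyzer = RubyPerFieldAnalyzer.new(WhitespaceToy.new)
analyzer[:title] = LowercaseToy.new

p analyzer.token_stream(:title, "The Big Lebowski")  # handled by LowercaseToy
p analyzer.token_stream(:plot, "The Dude abides")    # falls back to the default
```

Because the dispatch happens in Ruby, the per-field lookup itself holds no C-level state; with real Ferret analyzers plugged in, only the delegated token_stream calls would reach the C code.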

set up a local copy of omdb

written by benjamin on July 15th, 2007 @ 03:28 PM

We’ve added a page on how to set up a copy of omdb.org on your local machine. The source code of omdb was released under the terms of the MIT License at this year’s Rails Konferenz.

http://bugs.omdb.org/wiki/InstallFromSubversion

rails 1.1.6 and ruby 1.8.6

written by benjamin on July 6th, 2007 @ 04:43 PM

While trying to install omdb on a fresh MacBook Pro, we got strange error messages from ActiveRecord. We just tried to access the database and got errors like this:

ArgumentError: wrong number of arguments (1 for 0)
  from .../activerecord-1.14.4/lib/active_record/vendor/mysql.rb:551:in `initialize'
  from .../activerecord-1.14.4/lib/active_record/vendor/mysql.rb:551:in `new'
  from .../activerecord-1.14.4/lib/active_record/vendor/mysql.rb:551:in `scramble41'
  from .../activerecord-1.14.4/lib/active_record/vendor/mysql.rb:141:in `real_connect'

It seems that the constructor for SHA1 has changed in ruby 1.8.6 and therefore the rails 1.1.6 mysql adapter is broken, giving this error message. So what's the conclusion? Simply make sure you're using ruby 1.8.5. If you're on a Mac using DarwinPorts, see this post on how to do this. The subversion revision you need is 21127.
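For illustration, here are two ways of computing a SHA1 digest that do not pass any data to the constructor. This is only a sketch under the assumption that the error comes from code handing a string straight to SHA1's new, as the backtrace suggests; it is not a patch for the vendored mysql.rb, and the password value is made up.

```ruby
require 'digest/sha1'

password = 'secret'  # hypothetical example value

# Class-level digest: no data is passed to the constructor.
one_shot = Digest::SHA1.digest(password)

# Equivalent incremental form: create an empty digest, then feed it data.
incremental = Digest::SHA1.new.update(password).digest

puts one_shot == incremental
puts Digest::SHA1.hexdigest(password)
```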

ferret talk @ berlin ruby ug

written by benjamin on June 19th, 2007 @ 01:15 PM

A few weeks ago, I gave a talk at the Berlin Ruby user group about ferret and the ferret implementation at omdb. We’re happy with the way ferret fits into omdb, and a lot of our experiences (especially the background indexing) are now part of Jens’ acts_as_ferret plugin for Rails. If you’re planning to work with ferret, grab a copy of Lucene in Action, over at Manning. It’s a great book, and all the details about Analyzers and Tokenizers are very useful for understanding the way Ferret works.

Here are the slides of the talk. If you’re going to Rails Konferenz or Rails Conf Europe, be sure to attend Jens’ talk about acts_as_ferret.

Talking at the german RailsConf

written by benjamin on June 19th, 2007 @ 01:06 PM

The unofficial German Rails Konferenz is this Friday. They asked me to talk about omdb and the conclusions we have come to after developing with Rails since the days of Rails 0.14. We’re looking forward to seeing some of the German Rails pioneers over in Frankfurt. I’ll add the slides here as soon as the conference is over.

embed omdb movie information

written by benjamin on April 27th, 2007 @ 05:44 PM

We've added a new feature that lets you embed some movie information in your personal blog or website. Here is a preview.

One year in the making

written by benjamin on March 22nd, 2007 @ 12:04 PM

It’s now one year since the first subversion check-in was transferred to the omdb-svn. You can see all the details of that historic event right here:

http://bugs.omdb.org/changeset/1

I actually started coding some three months earlier, building a prototype with less functionality and without a real layout. The interface was ugly, and most of our current usability wasn’t even thought of. This is a screenshot of what omdb looked like back in April of 2006, the first layout we wanted to implement.


Thanks to Thomas for doing most of the interface-related work back then. After that, several people worked on the layout, including Sheila and Namics.

Encyclopedia Page

written by benjamin on March 14th, 2007 @ 03:44 PM

I’m not a designer. It’s been a long and hard way to get to the current omdb design and interface. And I’ve always avoided implementing the main pages for our four main sections: Movies, People, Companies and Encyclopedia. Last Sunday, on my way to Princeton, I finally came up with a design for the Encyclopedia Page that I’m quite satisfied with.

http://www.omdb.org/encyclopedia

Now we need to finalize the Movie Page, the People Page and the Company Page. I’ll go back to Europe next Sunday; maybe there’s enough time to finalize these pages.

Safari, AJAX and UTF8

written by benjamin on March 8th, 2007 @ 10:01 PM

The Lightbox popups we’re using did not work with Safari, at least until now. All UTF8 characters were broken, even though Firefox and IE7 were showing them correctly. Searching the web turns up some standard solutions; however, we implemented our own few lines of code, as some of the solutions did not work the way we wanted. We’ve implemented an after_filter like this:

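# Registered in the controller with: after_filter :add_utf8_header
def add_utf8_header
  # Fall back to a default type when the action has not set one
  content_type = @headers["Content-Type"] || (request.xhr? ? 'text/javascript' : 'text/html')
  # Only text responses get an explicit charset
  if /^text\//.match(content_type)
    @headers["Content-Type"] = "#{content_type}; charset=utf-8"
  end
end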

Right now it’s working fine with all major browsers. However, our first attempt, adding an AddDefaultCharset directive to the Apache configuration, did not work.
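To make the decision logic of the filter above easier to try out, here is the same rule as a plain Ruby method without any Rails objects. The method name content_type_with_charset is made up for this sketch, and the check for an already present charset is an addition that the filter above does not have.

```ruby
# Hypothetical helper mirroring the filter's decision logic without Rails.
# current: the Content-Type already set on the response, or nil
# xhr:     true when the request was made via XMLHttpRequest
def content_type_with_charset(current, xhr)
  content_type = current || (xhr ? 'text/javascript' : 'text/html')
  # Only text/* responses get a charset; an existing charset is left alone
  # (this second check is an addition, not part of the filter above).
  if content_type =~ /\Atext\// && content_type !~ /charset=/i
    "#{content_type}; charset=utf-8"
  else
    content_type
  end
end

p content_type_with_charset(nil, true)
p content_type_with_charset(nil, false)
p content_type_with_charset('image/png', false)
p content_type_with_charset('text/html; charset=utf-8', false)
```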

2007-02-28 created a blog entry

written by benjamin on February 28th, 2007 @ 09:04 PM

Today we finished our logging system, partially based on Brandon Keepers’ acts_as_audited. During the closed beta phase our editors created 16501 People, 5028 Categories (Plot Keywords, Genres, etc.), 1352 Movies, 937 Companies and 510 Jobs, all of these objects without any log information. To keep our history consistent with all soon-to-be-created objects, each of these old objects needed at least one log entry. So today we created more than 20,000 log entries, one ‘created something’ entry per object. I guess we can officially call 2007-02-28 the day omdb was created.

So what’s left to do? We still have a bunch of errors in our test cases, nothing serious, but a few of them need special attention. And we’re waiting for the 1.0 release of ferret, due by the end of this week. But generally speaking, we’re ready to launch.

Another Update..

written by benjamin on February 14th, 2007 @ 06:14 PM

Time for the last few updates before going live. Most importantly, we improved the search stability by using backgroundrb to serialize all indexing requests. While Dave announced that ferret 1.0 will be released soon, we encountered a number of problems with the ferret index when several threads write to it. Backgroundrb eliminated that problem, and we’re now quite happy with our 0.10.14 ferret index.
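The underlying idea, a single writer that processes indexing requests one after another, can be illustrated in plain Ruby. This sketch does not use backgroundrb or Ferret; ToyIndex is a made-up stand-in for an index that must only be written by one thread, and the queue with a :stop marker is just one possible way to do it.

```ruby
require 'thread'

# Hypothetical stand-in for an index that must only be written by one thread.
class ToyIndex
  attr_reader :docs

  def initialize
    @docs = []
  end

  def add(doc)
    @docs << doc
  end
end

index = ToyIndex.new
queue = Queue.new

# The single writer: the only thread that ever touches the index.
writer = Thread.new do
  while (doc = queue.pop) != :stop
    index.add(doc)
  end
end

# Several request threads enqueue work instead of writing directly.
producers = (1..4).map do |n|
  Thread.new do
    5.times { |i| queue << { :id => "#{n}-#{i}" } }
  end
end

producers.each(&:join)
queue << :stop   # tell the writer to finish once the queue is drained
writer.join

puts index.docs.size
```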

We also included the first version of logging on ‘who changed what information for which movie’. So far, only a few selected people are able to enter and edit movie data on omdb.org. This is part of our closed beta test, which will soon be finished. We now need a way of keeping track of all the changes to a movie, and we want to be able to provide RSS feeds.

We first checked out acts_as_versioned and acts_as_audited, but neither is what we were looking for, mostly because neither can keep track of changes in relations. Most of the movie data is based on relations (cast/crew information, categories, plot keywords, production companies and so on). We decided to implement our own logging mechanism that will log all these changes. It is partially based on acts_as_audited, thanks to Brandon for the plugin. Here’s a preview of what it might look like:

http://www.omdb-beta.org/movie/70/history
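To give an idea of what logging a change in a relation involves, here is a small sketch that compares the associated ids before and after an edit. It is not our logging code; the method name relation_changes and the ids are made up for the example.

```ruby
# Hypothetical sketch: describe a change to a relation (e.g. a movie's cast)
# by comparing the associated ids before and after an edit.
def relation_changes(before_ids, after_ids)
  {
    :added   => after_ids - before_ids,
    :removed => before_ids - after_ids
  }
end

cast_before = [1, 2, 3]   # made-up person ids
cast_after  = [2, 3, 4]

p relation_changes(cast_before, cast_after)
```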

Some things are left to do though, e.g. we need to include all of the wiki/abstract changes in the log. Wiki/abstracts are still using acts_as_versioned. Furthermore, the log is not that nifty yet; we need to work on the design. I’m sure a few bugs are still in the code, but if anyone is interested in such a plugin, we’ll see if we can extract it.

livetabs plugin

written by benjamin on January 28th, 2007 @ 09:17 PM

A lot of dialogs on omdb have tabbed content that you can switch via Javascript. We’ve created a small plugin to extract that functionality.

The ‘livetabs’ plugin for Rails makes it easy to add any number of tabs and tabbed content that the user can switch via JavaScript. To install the plugin, use the Rails plugin script:

script/plugin install http://svn.omdb-beta.org/plugins/livetabs

Next you need to copy the javascript, stylesheets and images to your rails public folder. Right now the plugin has just one skin, that is the current omdb-tab-layout. To copy these assets, type

rake livetabs:install:default

To use the tabs, simply add a line like this to your view:

<%= livetabs "One", "Two", "Three" %>

This will give you three tabs. You can use any number of tabs, of course. The plugin will look for partials named after the tabs, so in this case for _one.rhtml, _two.rhtml and _three.rhtml. If you use spaces in your tab names, they will be transformed to underscores, so a tab named “Recent News” will look for a partial called _recent_news.rhtml.
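The naming rule can be written down in a few lines of Ruby. This is a hypothetical helper for illustration, not the plugin’s actual code; it lowercases the label and turns whitespace into underscores, as described above.

```ruby
# Hypothetical helper, not the plugin's actual code: derive the partial
# file name from a tab label as described above.
def partial_for_tab(label)
  name = label.strip.downcase.gsub(/\s+/, '_')
  "_#{name}.rhtml"
end

["One", "Two", "Recent News"].each do |label|
  puts "#{label} -> #{partial_for_tab(label)}"
end
```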

You need to include the default javascripts (javascript_include_tag :defaults) and the livetabs stylesheets (stylesheet_link_tag 'livetabs') in your layout.

Here is a small preview of what the tabs look like.

Mr. Pitt and Mr. Brosnan

written by benjamin on January 16th, 2007 @ 11:53 AM

One of the great features of TextMate that I got used to so quickly is the way you load files. If you want to edit the movie_controller, just press cmd-t, enter ‘mo co’ and the movie_controller.rb file will be right there for you to select. This is something I wanted to add to omdb, for example when you add actors to a movie.

It’s already possible to add someone like Brad Pitt as an actor by simply searching for ‘br pi’ in the appropriate add-actor dialog. omdb will find Brad; however, you will get results for Pierce Brosnan or Pieter Jan Brugge as well. Entering ‘br’ followed by ‘pi’ should make it clear that you don’t want to find Pierce Brosnan, otherwise you would have entered ‘pi br’, not ‘br pi’. :-)

With the wonderful magic of ferret, you can do exactly that (you need to upgrade to at least 0.10.14). Take a look at the following example:

require 'rubygems'
require 'ferret'
i = Ferret::I.new
i << {:name => 'Janice Joplin'}
i << {:name => 'James Joyce'}
i << {:name => 'John Jarrett'}

Now here’s the query to make sure you find only two of the people.

include Ferret::Search::Spans
puts i.search(SpanNearQuery.new(:clauses => [
  SpanPrefixQuery.new(:name, 'ja'),
  SpanPrefixQuery.new(:name, 'jo')
], :in_order => true)).to_s(:name)

You just get 2 hits (3 hits with ferret versions prior to 0.10.14). omdb will make use of it in the near future, improving the usability even more.
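If you don’t have ferret at hand, the matching rule itself can be illustrated in plain Ruby. This is only a sketch of what ‘prefixes in order’ means, assuming the prefixes have to match consecutive words; it says nothing about how Ferret evaluates span queries internally, and the method name ordered_prefix_match? is made up.

```ruby
# Plain-Ruby illustration only; Ferret evaluates span queries against its index.
# A name matches when consecutive words start with the given prefixes,
# in the given order.
def ordered_prefix_match?(name, prefixes)
  words = name.downcase.split
  words.each_cons(prefixes.size).any? do |window|
    window.zip(prefixes).all? { |word, prefix| word.start_with?(prefix) }
  end
end

names = ['Janice Joplin', 'James Joyce', 'John Jarrett']
hits = names.select { |name| ordered_prefix_match?(name, %w[ja jo]) }
puts hits
```

With the same rule, ‘br pi’ matches Brad Pitt but not Pierce Brosnan, because the order of the prefixes matters.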
