Whiskey Media serves top-shelf Internet.

The Internet is slowly becoming a trap. We’re building a sanctuary for passionate audiences to come together and build comprehensive, fun websites with less bullshit.

Paginating Multiple Content Types (Using Redis) Part 1

A tutorial on paginating objects of different ContentTypes using redis.

A Brief Background

Paginating iterables and querysets in Django is made quite sane through the use of the Paginator class. However, if you need to paginate objects that span multiple content types, the problem becomes slightly more complicated.

We first dealt with this problem when the Tested guys wanted to add a "How-To" section to their site which would incorporate Articles, Reviews, and Videos (all of which are distinctly different models). This tutorial will set up a few sample models and walk you through paginating them as a single group.

A Quick Example

We have a sample repository on GitHub showing how everything works together in more detail, if you're interested. For this example we'll define an Article and a Review model, both of which we'll need to paginate:

>>> from django.db import models
>>>
>>> class Article(models.Model):
>>>     pass
>>>
>>> class Review(models.Model):
>>>     pass

Before we begin, let's run through how we would paginate Articles.

>>> from django.core.paginator import Paginator
>>> articles = Article.objects.all()
>>> paginator = Paginator(articles, 10) # Paginate articles w/ 10 per page
>>> page = paginator.page(1)
>>> page.object_list

This is fairly simple and Django's Paginator class handles Querysets very efficiently (by limiting and offsetting the query behind the scenes) and can handle any iterable as well. However, when paginating objects that span multiple content types, things get a little messy. While we could write code that effectively merges Articles and Reviews into a single organized list and pass that to the Paginator, that solution does NOT scale and we don't have a defined way to sort that list. You can't sort by their primary keys because you have no idea which pk from which model should come first.
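To make the problem concrete, here is a minimal sketch of the naive merge approach. The lists are hypothetical stand-ins for `Article.objects.all()` and `Review.objects.all()`, and `naive_page` is an illustrative name, not code from our sites.

```python
from itertools import chain

# Hypothetical stand-ins for the two querysets, each in creation order.
articles = ['article_1', 'article_2']
reviews = ['review_1', 'review_2', 'review_3']


def naive_page(page, per_page):
    """Merge everything, then slice out one page."""
    # Every row of both models is loaded just to return a handful of objects.
    merged = list(chain(articles, reviews))
    start = (page - 1) * per_page
    return merged[start:start + per_page]


print(naive_page(1, 3))  # ['article_1', 'article_2', 'review_1']
```

The whole data set is materialized for every page, and the result simply lists all Articles before any Review because there is no shared ordering to interleave them by.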

In order to paginate instances of these two models, we'll set up and manage a list of the objects in redis. The data in the list will consist of "content_type_id:object_id" strings similar to what's found below. For those unfamiliar with Django, the ContentTypes Framework provides you with a generic interface for working with your models. Every one of your models is assigned a unique ContentType id, which allows us to create a content type id/object id combination to uniquely identify each object we wish to paginate. Below is an abstraction of what our object list will look like in redis.

>>> ['article_ct:article.id','review_ct:review.id'....]
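As a rough sketch, building and parsing these strings could look like the following. The helper names `make_key` and `parse_key` are hypothetical and are not part of the code in this post.

```python
def make_key(content_type_id, object_id):
    """Build the "content_type_id:object_id" string stored in the redis list."""
    return "%d:%d" % (content_type_id, object_id)


def parse_key(key):
    """Split a stored string back into integer (content_type_id, object_id)."""
    # Some redis clients return bytes rather than str, so decode if needed.
    if isinstance(key, bytes):
        key = key.decode("ascii")
    ct_id, obj_id = key.split(":")
    return int(ct_id), int(obj_id)


print(make_key(2, 3))    # 2:3
print(parse_key("2:3"))  # (2, 3)
```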

We can get object data into redis a number of ways: using signals, overriding the models' save() method, etc. In our production sites, we use a slightly more complex system that registers and manages a handful of post_save()/post_delete() hooks on the models we're interested in -- but for the sake of this example let's just override the save() method on the Article and Review models defined above.

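>>> from django.contrib.contenttypes.models import ContentType
>>> # `redis` is assumed to be a connected client, e.g. redis.Redis(decode_responses=True)
>>> KEY = 'your_storage_key'
>>> class Article(models.Model):
>>>     def save(self, *args, **kwargs):
>>>         super(Article, self).save(*args, **kwargs)
>>>         redis.lpush(KEY, "%d:%d" % (ContentType.objects.get_for_model(self).id, self.id))
>>>
>>> class Review(models.Model):
>>>     def save(self, *args, **kwargs):
>>>         super(Review, self).save(*args, **kwargs)
>>>         redis.lpush(KEY, "%d:%d" % (ContentType.objects.get_for_model(self).id, self.id))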

All we're doing here is pushing new objects onto the same redis list whenever an Article or Review is saved. This will work great for new objects, but it will also add duplicate objects to the list if we update an object that already exists.

So let's tweak the save() methods above so that they only add 'new' objects to the list.

>>> def save(self, *args, **kwargs):
>>>     # An object with no id yet has never been saved
>>>     created = not self.id
>>>     super(Article, self).save(*args, **kwargs)
>>>     if created:
>>>         redis.lpush(KEY, "%d:%d" % (ContentType.objects.get_for_model(self).id, self.id))
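The effect of the `created` check can be tried without Django or a redis server. The sketch below uses two hypothetical stand-ins: `FakeRedis` (an in-memory object exposing only `lpush` and `lrange`) and `FakeArticle` (a plain class whose content type id is assumed to be 1).

```python
class FakeRedis(object):
    """Hypothetical in-memory stand-in exposing only lpush and lrange."""

    def __init__(self):
        self.lists = {}

    def lpush(self, key, value):
        # LPUSH prepends, so the newest item ends up at index 0.
        self.lists.setdefault(key, []).insert(0, value)

    def lrange(self, key, start, stop):
        # LRANGE bounds are inclusive; only -1 ("through the end") is
        # supported as a negative index in this simplified version.
        items = self.lists.get(key, [])
        stop = len(items) if stop == -1 else stop + 1
        return items[start:stop]


KEY = 'your_storage_key'
redis = FakeRedis()


class FakeArticle(object):
    """Hypothetical stand-in for the Article model (content type id 1)."""
    _next_id = 1

    def __init__(self):
        self.id = None

    def save(self):
        created = not self.id
        if created:
            # Stands in for the database assigning a primary key.
            self.id = FakeArticle._next_id
            FakeArticle._next_id += 1
            redis.lpush(KEY, "%d:%d" % (1, self.id))


article = FakeArticle()
article.save()  # first save: pushed onto the list
article.save()  # update: the list is left untouched
print(redis.lrange(KEY, 0, -1))  # ['1:1']
```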

Now we should be set up to insert data into redis like so:

>>> # Let's create a handful of Articles and Reviews
>>> article_1 = Article.objects.create()
>>> review_1 = Review.objects.create()
>>> article_2 = Article.objects.create()
>>> review_2 = Review.objects.create()
>>> review_3 = Review.objects.create()
>>>
>>> # Check their content type ids
>>> ContentType.objects.get_for_model(Article).id
1
>>> ContentType.objects.get_for_model(Review).id
2
>>>
>>> # Fetch the list of objects from redis
>>> redis.lrange(KEY, 0, -1)
['2:3', '2:2', '1:2', '2:1', '1:1']  # [review_3, review_2, article_2, review_1, article_1]

At this point we now have data being properly stored in redis. We'll now use this redis list to get, paginate and factory objects.

Pagination

In our production environment, we have a RedisPaginator that subclasses Django's Paginator class and overrides _get_count() and page(), but we can simplify things in this example:

>>> # Let's get the first page containing three items
>>> page = 1
>>> results_per_page = 3
>>>
>>> # Use the page to calc the lower limit of the slice
>>> bottom = (page - 1) * results_per_page
>>>
>>> # Use the results_per_page to calc the upper limit of the slice
>>> top = bottom + results_per_page - 1
>>> count = redis.llen(KEY)
>>> if top >= count:
>>>     top = count - 1
>>>
>>> # Get a slice of the object list in redis using the bottom and top bounds
>>> redis.lrange(KEY, bottom, top)
['2:3', '2:2', '1:2']
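The bounds arithmetic above can be wrapped in a small function. This is only a sketch: the name `page_bounds` and the choice to raise `ValueError` for an out-of-range page are assumptions for illustration, not how our RedisPaginator behaves.

```python
def page_bounds(page, results_per_page, count):
    """Return inclusive (bottom, top) list indexes for a 1-based page number."""
    # Ceiling division; an empty list still counts as a single (empty) page.
    num_pages = max(1, (count + results_per_page - 1) // results_per_page)
    if page < 1 or page > num_pages:
        raise ValueError("page %d is out of range (1-%d)" % (page, num_pages))
    bottom = (page - 1) * results_per_page
    # LRANGE treats both ends as inclusive, hence the "- 1".
    # For an empty list this yields (0, -1), which LRANGE resolves to no items.
    top = min(bottom + results_per_page, count) - 1
    return bottom, top


print(page_bounds(1, 3, 5))  # (0, 2)
print(page_bounds(2, 3, 5))  # (3, 4)
```

The returned pair can be passed straight to `redis.lrange(KEY, bottom, top)`, with `count` taken from `redis.llen(KEY)`.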

Factory Objects Returned in the Page

So this logic will fetch a paginated 'page' of the "content_type_id:object_id" strings out of redis, given a page number and the number of results per page you want. Now let's finish this off by factorying each object returned in the page:

>>> def factory_object(ct_obj_string):
>>>     # Split the string to get the ct_id/obj_id
>>>     ct_id, obj_id = ct_obj_string.split(':')
>>>     # model_class() returns the model class for this content type
>>>     model = ContentType.objects.get_for_id(int(ct_id)).model_class()
>>>     return model.objects.get(id=int(obj_id))
>>>
>>> [factory_object(ct_obj_id) for ct_obj_id in redis.lrange(KEY, bottom, top)]
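Since `factory_object` runs one query per string, a page of N items costs N object lookups. One possible refinement, sketched here under assumptions, is to group the ids by content type and fetch each group in a single call. `factory_page` and `fetch_many` are hypothetical names; in Django, `fetch_many` could plausibly be built on `model_class().objects.in_bulk(ids)`, but the in-memory `TABLES` below is used so the sketch runs on its own.

```python
from collections import defaultdict


def factory_page(ct_obj_strings, fetch_many):
    """Turn "ct_id:obj_id" strings into objects, one fetch per content type.

    fetch_many is a hypothetical callable: fetch_many(ct_id, ids) must return
    a dict mapping each id it found to the corresponding object.
    """
    pairs = [tuple(int(part) for part in s.split(':')) for s in ct_obj_strings]

    # Group the object ids by content type id.
    ids_by_ct = defaultdict(list)
    for ct_id, obj_id in pairs:
        ids_by_ct[ct_id].append(obj_id)

    found = {}
    for ct_id, ids in ids_by_ct.items():
        found[ct_id] = fetch_many(ct_id, ids)

    # Rebuild the page in its original order, skipping ids that were not found.
    return [found[ct_id][obj_id] for ct_id, obj_id in pairs
            if obj_id in found[ct_id]]


# Hypothetical in-memory "tables" keyed by content type id (1=Article, 2=Review).
TABLES = {
    1: {1: 'article_1', 2: 'article_2'},
    2: {1: 'review_1', 2: 'review_2', 3: 'review_3'},
}


def fetch_from_tables(ct_id, ids):
    return dict((i, TABLES[ct_id][i]) for i in ids if i in TABLES[ct_id])


print(factory_page(['2:3', '2:2', '1:2'], fetch_from_tables))
# ['review_3', 'review_2', 'article_2']
```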

Clean Up

So we now have post-save logic that pushes new objects into a redis list. We also have logic to get paginated data out of that list, as well as a small function to factory each object in the object list. But we don't have any logic to handle the case where a Review or Article is deleted, so let's add it.

Again, there are a number of ways we could implement this using post_delete() signals, etc., but let's just override the delete() method on the Article and Review models to allow them to flush items from the redis list after they are deleted.

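>>> def delete(self, *args, **kwargs):
>>>     # count=0 removes every occurrence (redis-py 3.0+ argument order)
>>>     redis.lrem(KEY, 0, "%d:%d" % (ContentType.objects.get_for_model(self).id, self.id))
>>>     super(Article, self).delete(*args, **kwargs)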
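The redis LREM command takes a count that controls how many matching entries are removed: a positive count removes matches starting from the head, a negative count from the tail, and 0 removes all of them. The function below is a hypothetical pure-Python emulation of those semantics on a plain list, meant only to show what the delete() override asks redis to do.

```python
def lrem(items, count, value):
    """Emulate Redis LREM on a Python list in place; return how many were removed."""
    indexes = [i for i, item in enumerate(items) if item == value]
    if count > 0:
        indexes = indexes[:count]   # first `count` matches from the head
    elif count < 0:
        indexes = indexes[count:]   # last `abs(count)` matches from the tail
    # count == 0 keeps every match.
    for i in reversed(indexes):
        del items[i]
    return len(indexes)


object_list = ['2:3', '2:2', '1:2', '2:1', '1:1']
lrem(object_list, 0, '1:2')  # article_2 was deleted
print(object_list)  # ['2:3', '2:2', '2:1', '1:1']
```

Using a count of 0 also clears out any duplicates that may have slipped into the list.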

And there we are -- super simple pagination of objects spanning multiple content types using redis. If you enjoyed that, but need a little more power -- for instance, if you want to sort objects based on a common field like a 'publish_date' -- then stay tuned because we'll be adding a complementary post that goes into more depth with a slightly more complex example using sorted sets.

Jeff staff on June 14, 2011 at 9:36 p.m.

Awesome! I'm totally going to go steal all this hot code and build my own site now!

drew on June 14, 2011 at 10:09 p.m.

Really cool article. I'd love to hear more about your specific uses of NoSQL databases.

TheBeast on June 15, 2011 at 12:58 a.m.

This is great, very much enjoy reading stuff like this!

Out of interest, the way I usually deal with this is have all my content types derive from a base model and use something like django_polymorphic to paginate across all content. I can see why this doesn't scale, but do you know if the performance issues are worth implementing an alternative solution like yours on smaller scale sites?

MattyFTM on June 15, 2011 at 3:25 a.m.

Never have I read an article written in English and understood so little.

psoplayer on June 15, 2011 at 7:48 a.m.

Yay, Redis is awesome! At my previous job we used it for this kind of stuff all the time.

coonce staff on June 16, 2011 at 9:47 a.m.
@thebeast said: 

Out of interest, the way I usually deal with this is have all my content types derive from a base model and use something like django_polymorphic to paginate across all content. I can see why this doesn't scale, but do you know if the performance issues are worth implementing an alternative solution like yours on smaller scale sites?

Subclassing a base model is definitely another approach. However, we ran into this problem after established models were already in place, so it wasn't something we could have easily implemented.

I believe this suggestion would scale just fine.

coonce staff on June 16, 2011 at 9:53 a.m.
@drew said:

Really cool article. I'd love to hear more about your specific uses of NoSQL databases.

Thanks drew. We do use redis and mongodb fairly heavily in production. We use redis as a backend for our queuing system as well as for a number of our other site functions such as user activity. We use mongo heavily as the data store for our stat collecting service that Honza wrote.

We will be writing about some of these specific uses in more detail in the coming months.

PatVB on June 18, 2011 at 3:38 p.m.

My goal in life is to one day be able to understand this article.

drew on June 20, 2011 at 5:31 a.m.

@coonce: To be honest, I'd never really understood the point of Redis until I read that article. I've just seen a few ORMs that essentially turn it into a poorly tuned relational database. Can't wait to see how you use MongoDB so I can have a similar lightbulb moment!

As an aside, I'm sure I'm not the only one that geeks out when you guys go on the Whiskey podcasts and talk about new updates to the CMS/Deployment System/CDNs etc. I'm sure you probably have better things to do than write these blog posts but you guys behind the scenes are just as much the reason I'm a member as people running the individual sites. Thanks for the hard work!

Sparklykiss on June 24, 2011 at 2:04 p.m.

Yes, coonce, talking nerdy to us.

I love when you guys start speaking in tongues.

ThatWasBrilliant on June 30, 2011 at 11:26 a.m.

Cool, thanks! Been wanting to get more heavily into Django.

mockenoff on July 22, 2011 at 12:54 p.m.

When do we get part 2?

addabadda on Jan. 14, 2012 at 12:38 a.m.

I'm not sure I see the benefit of using redis over a relational database in this scenario (other than keeping your redundant data out of your main db - which appeals to me, I admit!). In fact, if you were to use django's ORM to store your content type and object id references you could use select_related and avoid querying the content types table once for each item.
