Google App Engine: Pagination by date

PAGESIZE = 10 
 
class Story(db.Model): 
  content = db.StringProperty() 
  when = db.DateTimeProperty(auto_now_add=True) 

I. Efficient Paging (with a next button)

Source: Paging through large datasets

A. Example Url

/Newest?bookmark=2009-11-06T17:17:34 

B. Code

def get(self): 
  stories = Story.all().order('-when').fetch(PAGESIZE+1) 
  next = None 
  bookmark = self.request.get('bookmark')
#bookmark = next in template


  if bookmark: 
    query = Story.all().order('-when') 
    query.filter('when <=', bookmark) 
    stories = query.fetch(PAGESIZE+1) 
  else: 
    #first page or last page 
    query = Story.all().order('-when') 
    stories = query.fetch(PAGESIZE+1) 
  if len(stories) == PAGESIZE + 1: 
    next = stories[-1].when 
    stories = stories[:PAGESIZE] 
#'next' and 'stories' are template variables 

Template:

{% if next %} 
Next Page 
{% endif %} 

C. Does it work? Not yet!

This is what you'll end up with if you read the first part of Google's tutorial on paging. And it's supposed to just work -- but it doesn't.

The 'when' attribute is stored as a datetime.datetime object, which look like this:

datetime.datetime(2009, 11, 6, 12, 20, 5) 

Passing it to a template converts it into a string. By default, it'll be converted into ISO format:

'2009-11-06 12:20:05' 

But when you retrieve this string from your page, it won't compare well with datetime.datetime objects. Converting it is relatively easy though. Convert it before you send it to the template like this:

import datetime 
 
next = stories[-1].when.isoformat() 

And then convert it back after retrieving it:

if bookmark: 
  dt = datetime.datetime.strptime(bookmark, "%Y-%m-%dT%H:%M:%S") 
  dt.replace(microsecond=0) 

I needed to set microsecond to 0 on my local machine in order to get this to work. YMMV

Tips:

1. Instead of passing a date as the 'next page' link, pass the ID of the next story. Then on the next page fetch this objects date to figure out where to start. 

In order to avoid a datastore lookup, use memcache.

2. On every page store the first element's date in memcache so you can look it up on the next page to use for a 'previous page' link.

Source: Discussion

D. Downsides

If two stories are submitted at the same second and one of your pages is set to start filtering at that datetime value, you'll be missing a story. This is really bad.

For a solution to this problem, see the original paging article from Google. And source code .

II. Simple Paging (less efficient, perhaps more useful)

A. Example Url

/Newest?page=2 

B. Code

PAGESIZE = 10 
 
def get(self): 
  max_results = 1000 max_pages = (max_results - PAGESIZE) / PAGESIZE page = self.request.get_range('page', min_value=0, max_value=max_pages, default=0) 
  start = page*PAGESIZE 
  stories = Story.all().order('-when').fetch(PAGESIZE+1, start) 
  more_stories = len(stories) > PAGESIZE 
 
  prev_page = None 
  if page: #if page == 0 == False, we're on the first page 
    prev_page = str(page - 1) 
 
  next_page = None 
  if more_stories: 
    next_page = str(page + 1) 
 
  stories = stories[:PAGESIZE] 
 # 'stories' and 'next_page' and 'prev_page' 
# are template variables 

Template:

{% if next_page %} 
Next Page 
{% endif %} 
 
{% if prev_page %} 
Previous Page 
{% endif %} 

C. Explanation

On the first page, the value of 'page' will be its default value: 0

So 'start' will also equal 0. We fetch our PAGESIZE + 1 (10 + 1), and specify an offset of start (0).

If there are 20 stories in the database, fetch will return the first 11. If there are 8 stories in our database, fetch will return 8.

On the current page we'll need the first 10 stories, but we need to figure out if there are more than 10 stories so we can display a "Next Page" link.

If we fetched 11 stories, len(stories) will be greater than PAGESIZE. And if that's the case, then there's a next page! We'll set 'next_page' equal to the current page's value (0) plus 1.

The reason this is inefficient is because once we get to the 5th page we'll be fetching 11 stories while using an offset of 50. The way fetch works, it will get 61 stories, discard the first 50, and then give us the remainder. All we want is 11 stories, but we'll end up incurring the time and processing power required to get 61 of them.

D. Some Tips

1. Convert the values of next_page and prev_page into strings because in the template {% if prev_page %} would evaluate to False if prev_page was equal to 0.

2. If you know you're going to be returning a lot of results, link to the first few pages directly.

[1] 
[2] 
[3] 

 

Comments [1]

Google App Engine: Using the List Property (db.ListProperty)

I. Many-to-One Relationship with Reference Property

A. Models

class UserInfo(db.Model):
  """key name = user id"""

class Story(db.Model):
  content = db.TextProperty()

class Favorite(db.Model):
  user_info = db.ReferenceProperty(UserInfo)
  story = db.ReferenceProperty(Story)

B. Request Handler

Note: The following code is a little inefficient.

user_info = UserInfo.get_by_key_name(users.get_current_user().user_id())
# get all favorites
favorites = user_info.favorite_set.fetch(1000) # equivalent to Favorite.all().filter('user_info =' user_info.key())
# get all favorite stories
story_keys = [f.story.key() for f in favorites]
stories = db.get(story_keys)

f.story.key() retrieves the story object from the datastore before returning its key -- and it's done one by one, due to the for loop.

Use this instead:

To get the key without retrieving the object use:

user_info = UserInfo.get_by_key_name(users.get_current_user().user_id())
# get all favorites
favorites = user_info.favorite_set # equivalent to Favorite.all().filter('user_info =' user_info.key())
# get all favorite stories
story_keys = [Favorite.story.get_value_for_datastore(f) for f in favorites.fetch(1000)] 
stories = db.get(story_keys)

Using get_value_for_datastore() here retrieves the story key without a datastore lookup of the story object. See this discussion.

C. Why Use This?

Calling user_info.favorite_set (a back-reference) returns a Query instance that you can order and filter.

If your Favorite model had a date property you could do this:

favorites = user_info.favorite_set.order('date')


II. Many-to-One Relationship with List Property

A. Models

class UserInfo(db.Model):
  """key name = user id"""
  favorites = db.ListProperty(db.Key) #list of keys of Story entities

class Story(db.Model):
  content = db.TextProperty()

B. Request Handler

user_info = UserInfo.get_by_key_name(users.get_current_user().user_id())
# get all favorites
favorites_keys = user_info.favorites
# get all favorite stories
stories = db.get(favorites_keys)

C. Why Use This?

The code is definitely cleaner, but you lose some flexibility.

1. You can't order or filter the favorites before retrieving them (although they'll probably be sorted in chronological order because that's how they were added.

2. The list of story keys is read when you fetch user_info. If a user has favorited a thousand stories, you have a list of a thousand story keys on your hands. You can fetch a limited number of these stories (e.g. db.get(favorites_keys[:5]), but with the first approach you never run into this problem, you just query only the favorites you want (e.g. user_info.favorite_set.fetch(5))

3. With the code as it is you can't access a list of users who have favorited a story. In the first approach you can (e.g. story.favorite_set)

You can filter all users that favorite a story like this:

favusers = UserInfo.all().filter('favorites =', somestory.key()).fetch(1000)


III. Many-to-One Relationship with List Property (avoid reading the list)

There were some problems with the second approach, even though it's a much simpler design.

Here's another way to approach the problem (from Google I/O: Building Scalable, Complex Apps on App Engine):

A. Models

class FavoriteIndex(db.Model):
  """parent is a Story entity"""
  favorited_by = db.StringListProperty() # list of user ids

class Story(db.Model):
  content = db.TextProperty()

B. Request Handler

index_keys = FavoriteIndex.all(keys_only=True).filter('favorited_by =', users.get_current_user().user_id())
# get keys of favorite stories
story_keys = [k.parent() for k in index_keys] # key.parent() documentation
# get all favorite stories
stories = db.get(story_keys)

C. How Does This Work?

Internally, the datastore represents a list property value as multiple values for the property. (source: List Property documentation)

So all of the items in the 'favorited_by' list property don't have to be traversed because it's as if each item in the list property was indexed separately. And since you're only returning the keys of the FavoriteIndex entities (i.e. FavoriteIndex.all(keys_only=True)) you never have to deserialize the list.

After you get the keys of the FavoriteIndex entities that you want it's easy to extract their parents' keys. And then we get the Story instances for the corresponding keys.

D. Why Use This?

1. Clean, simple code. We got rid of the UserInfo model because we're using the user id's of the built-in User objects.

2. The list property is never serialized/deserialized like with the second approach, making it much faster.

3. You can request a limited number of stories without serializing/deserializing the list (e.g. FavoriteIndex.all(keys_only=True).filter('favorited_by =', users.get_current_user().user_id()).fetch(5))

4. You can easily get access to all of the user id's of people who favorited a particular story. Here's how:

index = FavoriteIndex.all().ancestor(story).get()
user_ids = index.favorited_by

Note: One of the downsides to this approach is that you still can't order or filter favorite stories before you retrieve them.

E. Some Tips

1. a. If you want to be able to quickly figure out if someone favorited a story: for each favorite, create a 'UserFavorite' entity as a child entity of the relevant Story entry, with the key name set to the user's user id. This way, you can determine if a user has favorited a story with a simple get:

UserFavorite.get_by_key_name(user_id, parent=a_story)

(source: StackOverflow answer)

1. b. An alternative to the above (untested by me):

Give all the 'FavoriteIndex' objects the key name 'favs'

#retrieve a story
story = # ...
#get current user's id
uid  = users.get_current_user().user_id()
#build a new key object
fav_index_key = db.Key.from_path('Story', a_story.key.id_or_name(), 'FavoriteIndex', 'favs')
#perform a key only query (faster than normal)
#filter by the favorite index key that was just built and the user's id
is_fav_story = FavoriteIndex.all(keys_only=True).filter('__key__ =', fav_index_ey).filter('favorited_by =', uid).get()

This option is really nice because it doesn't require as many writes to the datastore (or as much datastore space), but it will probably be a little slower than the first option outlined in 1a.

2. If you want to be able to quickly access a user's information, make an entity to store information in (e.g. UserInfo) and set its key name equal to a user's id. This will make it easy to, for example, look up all of the username's of the users who favorited a story.

index = FavoriteIndex.all().ancestor(story).get()
users = UserInfo.get_by_key_name(index.favorited_by)
for user in users:
  print user.name

Comments [0]

Python: How is the built-in method str.strip() implemented?

str.strip()

Return a copy of the string with the leading and trailing whitespace removed.  # "   hello   " => "hello"


Most of Python's built-in types are written in C. They're in the Objects directory [1]

The stringobject.c file contains the implementations of string methods, and the string_strip function provides some details of the underlying functions used to provide the strip method's functionality. [2]

The string_strip function [3] pointed to the do_strip function, which did all the real work. Here's the do_strip function:

Py_LOCAL_INLINE(PyObject *)
do_strip(PyStringObject *self, int striptype)
 {
    /* returns a NUL-terminated representation of the contents of string: */
    char *s = PyString_AS_STRING(self); 
    /* returns the length of the string: */
    Py_ssize_t len = PyString_GET_SIZE(self), i, j;
 
    i = 0;
    if (striptype != RIGHTSTRIP) {
        while (i < len && isspace(Py_CHARMASK(s[i]))) {
            i++;
        }
    }

    j = len;
    if (striptype != LEFTSTRIP) {
         do {
            j--;
        } while (j >= i && isspace(Py_CHARMASK(s[j])));
        j++;
    }

    if (i == 0 && j == len && PyString_CheckExact(self)) {
        Py_INCREF(self);
         return (PyObject*)self;
    }
    else
        /* returns a new string object with the value arg1 and length arg2: */
        return PyString_FromStringAndSize(s+i, j-i);
}

The C api for the string object helped me figure out a lot of the special functions in this code. I translated it into a working Python example to understand it better.

The above code, in Python:

import string  # This is only used on the fourth line, to get whitespace chars.

def isspace(ch):
    return ch in string.whitespace

def do_strip(string, striptype="both"):
     length = len(string)

    i = 0
    if striptype != "right":
        while (i < length and isspace(string[i])):
            i += 1

    j = length
    if striptype != "left":
         j -= 1
        while (j >= i and isspace(string[j])):
            j -= 1
        j += 1

    if (i == 0 and j == length and isstring(string)):
        return string

    else:
        return string[i:j]
 
print do_strip("   hi   ")  # returns "hi"

And then there's the much more elegant string.strip() method from Python 1.5.2:

def strip(s):
    i, j = 0, len(s)
    while i < j and s[i] in whitespace: i = i+1
    while i < j and s[j-1] in whitespace: j = j-1
    return s[i:j]

[1] You can download Python's source files here: http://www.python.org/download/

[2] Paul Boddie told me this on the Asking For Help section of the Python Wiki

[3]

 static PyObject *
 string_strip(PyStringObject *self, PyObject *args)
 {
     if (PyTuple_GET_SIZE(args) == 0)
         return do_strip(self, BOTHSTRIP); /* Common case */
     else
         return do_argstrip(self, BOTHSTRIP, args);
 }
 

Comments [0]

About

web developer/designer; python, html, css/sass, jquery.

i like exploring:
digging deeper, deeper, deeper, then BAM pieces start to fit together.

currently: http://www.storylog.com/