Earlier this month, we announced that we would be changing our API’s pagination method from offset-based to cursor-based. These two terms don’t necessarily come up in everyone’s day-to-day, so we wrote this post to explain both and why one is preferable to the other. It assumes no prior knowledge of either concept, so that anyone interested in learning more, developers and non-developers alike, can get on the same (excuse the pun) page.
What is Pagination?
A database can hold far more information than anyone wants to receive at once, so you need a way to access it in digestible pieces. Pagination is the practice of splitting that mass of data into smaller subsets, or “pages.” The benefit is that users can retrieve a small subset of their entries at a time instead of all of them at once.
There are a few ways you can go about paginating your data.
In offset pagination (also called page-based pagination), you choose how many entries appear on each page, and the database uses that number to work out which entries belong on any given page. For instance, if you have 40 entries and specify 10 items per page, pages 1 through 4 return results and any page after that returns nothing. When you ask the server for a certain page, the database steps through your entries in order, skipping the entries that belong to earlier pages (the “offset”; for page 3 at 10 per page, that is the first 20 entries), until it arrives at the requested batch, or “page,” which it returns to you. This type of pagination is behind the familiar experience of paging through results on the web: think of clicking the “next” button or jumping between numbered pages of search results.
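To make this concrete, here is a minimal sketch of offset pagination in Python, using the standard library’s `sqlite3` module and an in-memory table. The `entries` table and the `make_db` and `fetch_page` helpers are hypothetical, chosen to mirror the 40-entries, 10-per-page example above; a real API would sit in front of a real database. The page number becomes an `OFFSET` of `(page - 1) * per_page` rows, and `LIMIT` caps the result at one page.

```python
import sqlite3

def make_db(n_entries):
    """Build a hypothetical in-memory table holding ids 1..n_entries."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE entries (id INTEGER PRIMARY KEY)")
    conn.executemany(
        "INSERT INTO entries (id) VALUES (?)",
        [(i,) for i in range(1, n_entries + 1)],
    )
    return conn

def fetch_page(conn, page, per_page):
    """Offset pagination: skip (page - 1) * per_page rows, return the next per_page."""
    offset = (page - 1) * per_page
    rows = conn.execute(
        "SELECT id FROM entries ORDER BY id LIMIT ? OFFSET ?",
        (per_page, offset),
    ).fetchall()
    return [row[0] for row in rows]

conn = make_db(40)
print(fetch_page(conn, 1, 10))  # ids 1 through 10
print(fetch_page(conn, 4, 10))  # ids 31 through 40
print(fetch_page(conn, 5, 10))  # [] since there is no fifth page
```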
Issues with Offset Pagination
A pitfall of offset pagination is that it can return duplicate entries, or skip entries entirely, if data is added to or deleted from the database while you are paging through it. Because each page is computed at request time from the current set of data and the number of items per page, the specific set of entries on a given page is not fixed. This means the same entry can be returned to you on two separate pages, or never be returned at all.
For example, let’s take a database of alphabetically ordered letters where each page includes 3 items: page 1 includes A, B, and C; page 2 includes D, E, and F.
While you are looking at page 1, someone deletes C. The database still returns pages of 3 letters each, so the next time it divvies up the data, page 1 holds A, B, and D, and every entry after C has moved up one position. So when you click to page 2, where you expected to see D, E, and F, you will actually see E, F, and G, because D now occupies the third spot on page 1.
Unless you go back and look at page 1 again, you will never be shown D. And why would you think to revisit page 1 when you just looked at it?
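The skipped-letter scenario can be reproduced in a few lines. This sketch models the database as a plain Python list and models offset pagination as list slicing; the `offset_page` helper is hypothetical, but its arithmetic matches what an `OFFSET` query would do.

```python
def offset_page(entries, page, per_page):
    """Offset pagination over an ordered list: slice out the requested page."""
    start = (page - 1) * per_page
    return entries[start:start + per_page]

letters = list("ABCDEFGHIJ")

first = offset_page(letters, 1, 3)   # the page you are looking at
letters.remove("C")                  # meanwhile, someone deletes C
second = offset_page(letters, 2, 3)  # page 2 is recomputed from the current data

print(first)   # ['A', 'B', 'C']
print(second)  # ['E', 'F', 'G']
seen = set(first) | set(second)
print([letter for letter in "ABCDEFG" if letter not in seen])  # ['D'] was never shown
```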
To put the example in the context of Modern Treasury, think about viewing historical payment orders while new ones are being created. Because the set of payment orders on each page shifts in real time, you run the risk of missing relevant entries or seeing the same ones twice as you go through page by page.
Offset pagination also runs into performance issues as your database grows. To return your desired page, the database has to step past every entry that comes before it, which can mean reading hundreds of thousands or millions of records. So even though it returns only a single page, it still does all the work of reading and counting through the earlier entries to determine which ones belong on the page you asked for. The deeper into the data the requested page is, the more work this takes, so offset pagination becomes less performant as your entries grow in number.
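A rough model of where that cost comes from: the hypothetical `offset_scan` function below walks the ordered rows from the beginning and counts every row it has to read before it can return the requested page. It is a simplification (real databases have their own optimizations), but for a plain `OFFSET` query the skipped rows still have to be stepped over, so the number of rows read grows with the page number.

```python
def offset_scan(rows, page, per_page):
    """Model an offset query: walk the ordered rows from the start,
    reading and discarding everything before the offset."""
    offset = (page - 1) * per_page
    rows_read = 0
    result = []
    for row in rows:
        rows_read += 1
        if rows_read <= offset:
            continue  # read, counted, and thrown away
        result.append(row)
        if len(result) == per_page:
            break
    return result, rows_read

rows = range(1_000_000)
for page in (1, 100, 10_000):
    page_rows, rows_read = offset_scan(rows, page, per_page=10)
    print(f"page {page}: returned {len(page_rows)} rows, read {rows_read}")
```

With `per_page=10`, page 1 reads 10 rows, page 100 reads 1,000, and page 10,000 reads 100,000, all to return the same 10 rows.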
Cursor-Based Pagination
Cursor-based pagination works a bit differently. A cursor is a value that references an entry, and can be thought of as a divider in a filing cabinet: it separates the content that comes before it from the content that comes after it. The cursor the server gives you references the entry that comes right after the last entry in its response. To use our earlier alphabet example, after you receive the first set of results (A, B, and C), the cursor is set at D, the next entry after the last one you received.
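Here is a minimal sketch of cursor-based pagination against an in-memory SQLite table of letters. The table and the `fetch_after_cursor` helper are hypothetical. Because this post describes the cursor as pointing at the next entry (D, not C), the sketch fetches one extra row per request just to learn where the next page begins, then filters with `letter >= cursor` on the following request.

```python
import sqlite3

# Hypothetical table of letters; a TEXT PRIMARY KEY gives SQLite an index on `letter`.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE entries (letter TEXT PRIMARY KEY)")
conn.executemany("INSERT INTO entries (letter) VALUES (?)", [(c,) for c in "ABCDEFGHIJ"])

def fetch_after_cursor(conn, cursor, per_page):
    """Return (page, next_cursor). The cursor is the first entry of the next page;
    cursor=None starts from the beginning, next_cursor=None means no more entries."""
    if cursor is None:
        rows = conn.execute(
            "SELECT letter FROM entries ORDER BY letter LIMIT ?", (per_page + 1,)
        ).fetchall()
    else:
        rows = conn.execute(
            "SELECT letter FROM entries WHERE letter >= ? ORDER BY letter LIMIT ?",
            (cursor, per_page + 1),
        ).fetchall()
    letters = [row[0] for row in rows]
    # The extra row (if any) is not returned; it only tells us where the next page starts.
    next_cursor = letters[per_page] if len(letters) > per_page else None
    return letters[:per_page], next_cursor

page, cursor = fetch_after_cursor(conn, None, 3)
print(page, cursor)  # ['A', 'B', 'C'] D
page, cursor = fetch_after_cursor(conn, cursor, 3)
print(page, cursor)  # ['D', 'E', 'F'] G
```

A common alternative is to make the cursor the last entry returned and filter with `>` instead, and many APIs encode the cursor as an opaque string so clients don’t depend on its contents. Either way, each query starts from a fixed point in the data rather than from a row count.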
Once you’ve received a cursor, you can include it in your next request as a starting point further along in your data. The server can then jump straight to the cursor and skip every entry that comes before it, rather than running through each unwanted entry the way it does in offset pagination.
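Why can the server skip ahead so cheaply? Databases typically keep the field a cursor refers to in a sorted index (often a B-tree), so finding the cursor is a search rather than a scan. The sketch below stands in for that index with a sorted Python list and the standard `bisect` module; `page_from_cursor` is a hypothetical helper, not a library function.

```python
import bisect

def page_from_cursor(sorted_keys, cursor, per_page):
    """Seek to the cursor with a binary search (standing in for a database index),
    then read only the per_page entries from there. Earlier entries are never walked."""
    start = bisect.bisect_left(sorted_keys, cursor)  # index of first key >= cursor
    end = start + per_page
    page = sorted_keys[start:end]
    next_cursor = sorted_keys[end] if end < len(sorted_keys) else None
    return page, next_cursor

keys = list(range(1_000_000))
page, next_cursor = page_from_cursor(keys, 99_990, 10)
print(page[0], page[-1], next_cursor)  # 99990 99999 100000
```

Finding key 99,990 among a million sorted keys takes about 20 comparisons with a binary search, instead of stepping through the 99,990 keys that come before it.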
Importantly, cursor-based pagination solves both of the offset pagination pitfalls discussed above. Since a cursor is a fixed point in the data rather than a position counted from the start, the next page begins in the same place even when entries before it are added or deleted, so you won’t see duplicates or skip entries. And since the server doesn’t need to count and run through all the earlier entries the way offset pagination does (assuming the field the cursor refers to is indexed, as IDs and timestamps typically are), fetching a page stays efficient no matter how deep into your data you go.
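Rerunning the earlier deletion scenario with a cursor shows the difference. This sketch again models the data as a sorted Python list and uses a hypothetical `cursor_page` helper that starts each page at the first entry at or after the cursor.

```python
import bisect

def cursor_page(entries, cursor, per_page):
    """Return (page, next_cursor), starting at the first entry >= cursor.
    cursor=None starts from the beginning; next_cursor is the entry after the page."""
    start = 0 if cursor is None else bisect.bisect_left(entries, cursor)
    end = start + per_page
    next_cursor = entries[end] if end < len(entries) else None
    return entries[start:end], next_cursor

letters = list("ABCDEFGHIJ")
first, cursor = cursor_page(letters, None, 3)     # A, B, C; cursor set at D
letters.remove("C")                               # someone deletes C while you read
second, cursor = cursor_page(letters, cursor, 3)  # resumes at the D divider

print(first, second)  # ['A', 'B', 'C'] ['D', 'E', 'F']
```

D is no longer skipped: the second request resumes at the D divider regardless of what happened to the entries before it. Because the lookup finds the first entry at or after the cursor, the next page would still start correctly (at E) even if D itself had been deleted.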
Ultimately, in learning more about these pagination methods, I gained a stronger understanding of how this switch will contribute to a better experience for Modern Treasury API users. If you have any questions about pagination or how to migrate, reach out; our team is here to work with you to ensure this change is successful.