Pagination spanning across databases
Normally, enabling a paginated query on a RDBMS is as simple as adding “limit” and “offset” parameters to any “select” query. However, recently I came across a situation where a service was to get data from multiple databases (hosted on separate servers) and expose the same to the client in a paginated manner i.e. the client can make a request saying that it needs the first page containing 10 records, then the second page containing 10 records and so on.
I will not go into the details of what pagination is and why is it important. There are several articles out there that explain Pagination as a concept.
So. let’s go through the considerations first:
- There are 2 databases (Say DB1 and DB2) which are abstracted by a service layer to any client that wants to interact.
- If a record exists both on DB1 and DB2, DB2 takes precedence i.e. DB1 record shouldn’t be returned.
- There is no temporary/intermediate data store which can store aggregated data from both DB1 and DB2.
- There is no direct access to DB1 i.e. we have to build another service (say S1) that exposes data from DB1 so that the service running with DB2 (say S2) can access it.
- Client can only make a request to S2 and S2 has to internally connect with DB2 and DB1 (via S1) to prepare the data and respond to the client’s request.
Let’s consider for example that client makes a request to S2 asking for the first 10 records i.e.
P = 0, S = 10
where P stands for Page and S stands for Size of the page, which in this case is 10 records. Let’s assume that the table in DB2 has 6 records that match the conditions specified by the client, which means that, to be able to return 10 records, I need to query just the first 4 records from the table in DB1 via S1.
Hence the (pseudo) query on the table in DB2 would be:
SELECT * FROM <table _name> where <client_conditions> limit S offset (P*S)
In addition to the above, we also need the total count of eligible records which means, we need to make the following query:
SELECT COUNT(*) from <table_name> where <client_conditions>
Once we get the result out of the above queries, we can check the size of the result set (RS2) and the result of count query (C2)which in this case would be 6 and 6. Hence, to get the remaining 4 records from the table in DB1, the (pseudo) query would look like:
SELECT * FROM <table_name> where <client_conditions> limit (S-length(RS2)) offset X
where
X = (P * S)-C2, if P > 0 and floor((P*S)/C2) > 0
X =0, for any other condition
In the above example,
length(RS2) = 6 and P = 0,
=>limit = 0 and offset = 4
To check the other case when the table in DB2 has more than S records, let’s consider that the table in DB2 has 12 records, which means to get the first page of 10 records, we wouldn’t make a query to S1/DB1 since DB2 already has 10 or more records.
To return the Second page, we get 2 records from DB2 and the pagination parameters for the query to be made on DB1 are calculated as follows:
length(RS2) = 2
C2 = 12
floor((P*S)/C2) = floor(10/12) = 0
hence, limit = 8 and offset = 0
Let’s pull in another page of records i.e.
P = 2, S = 10
length(RS2) = 0
C2 = 12
floor(20/12) = 1
Hence, limit and offset for query on DB1 will be calculated as follows:
limit = 10, offset = 20–12 = 8
For the next page,
limit = 10, offset = 30–12 = 18
and so on.
The above logic was implemented and tested to check for the coverage of the known scenarios, and it has worked well so far.
Thanks for the read. I will look forward to any feedback that you might have. Please glance over my other articles in case you are interested in software engineering concepts and case studies.