Performance of key-value databases with PHP

by Martin Monperrus
I recently discovered the old key-value database paradigm, and I am quite convinced by it. But how do the different implementations perform? This document presents a comparison in the context of PHP.

First, we explore the performance of two backends of the native PHP database abstraction layer (DBA). Then we explore how this layer competes with two naive implementations of the key-value paradigm.

Which one is the fastest: GDBM or DB3?


We propose 4 scenarios:
* massive write: open an empty database, write N key-value pairs, close the database
* massive read: open the database, read N key-value pairs from their keys, close the database
* traverse: open the database, retrieve all pairs, close the database
* update: do 50 times { open the database, update one key, close the database }
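
The four scenarios above can be sketched with PHP's DBA functions as follows. This is only an illustration, not the original dbaperf source: the helper names, the "key$i" / "value$i" naming scheme and the temporary file are assumptions, and a handler is skipped when it is not compiled into the local PHP build.

```php
<?php
// Sketch of the four scenarios on top of PHP's DBA functions.
// Hypothetical: helper names, key/value naming scheme, temporary file.
// Error handling (dba_open() returning false) is omitted for brevity.

function massive_write(string $path, string $handler, int $n): void
{
    $db = dba_open($path, 'n', $handler); // 'n': create or truncate, read-write
    for ($i = 0; $i < $n; $i++) {
        dba_insert("key$i", "value$i", $db);
    }
    dba_close($db);
}

function massive_read(string $path, string $handler, int $n): int
{
    $db = dba_open($path, 'r', $handler);
    $found = 0;
    for ($i = 0; $i < $n; $i++) {
        if (dba_fetch("key$i", $db) !== false) {
            $found++;
        }
    }
    dba_close($db);
    return $found;
}

function traverse(string $path, string $handler): int
{
    $db = dba_open($path, 'r', $handler);
    $count = 0;
    for ($key = dba_firstkey($db); $key !== false; $key = dba_nextkey($db)) {
        dba_fetch($key, $db); // retrieve the value of every pair
        $count++;
    }
    dba_close($db);
    return $count;
}

function update(string $path, string $handler, int $times = 50): void
{
    for ($i = 0; $i < $times; $i++) {
        $db = dba_open($path, 'w', $handler); // reopen for every update
        dba_replace('key0', "updated$i", $db);
        dba_close($db);
    }
}

// Wall-clock duration of one scenario, in seconds.
function timed(callable $scenario): float
{
    $start = microtime(true);
    $scenario();
    return microtime(true) - $start;
}

$available = function_exists('dba_handlers') ? dba_handlers() : [];
foreach (['gdbm', 'db3'] as $handler) {
    if (!in_array($handler, $available, true)) {
        echo "$handler: handler not available in this PHP build, skipped\n";
        continue;
    }
    foreach ([100, 1000, 10000] as $n) {
        $path = tempnam(sys_get_temp_dir(), 'dbaperf');
        $times = [
            'massive write' => timed(function () use ($path, $handler, $n) { massive_write($path, $handler, $n); }),
            'massive read'  => timed(function () use ($path, $handler, $n) { massive_read($path, $handler, $n); }),
            'traverse'      => timed(function () use ($path, $handler) { traverse($path, $handler); }),
            'update'        => timed(function () use ($path, $handler) { update($path, $handler); }),
        ];
        foreach ($times as $scenario => $seconds) {
            printf("%-5s N=%-6d %-14s %.4f s\n", $handler, $n, $scenario, $seconds);
        }
        @unlink($path);
    }
}
```

Each scenario is timed separately, so the numbers include the cost of opening and closing the database, as in the scenario definitions.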

We tested these scenarios with three values of N: 100, 1000, 10000.

For each scenario, we report the running time of gdbm and db3 (the lower, the better). Times are shown graphically and relatively: the shorter the bar, the better. The absolute bar length has no meaning; only the ratio between the bars of the same experiment matters.
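
As an illustration of this relative representation, here is a minimal sketch that turns a set of timings into bars. It assumes that the slowest entry gets the full width and that the others are scaled proportionally; the function name, the width of 30 characters and the sample timings are hypothetical, not taken from the experiment.

```php
<?php
// Render relative bars: the slowest timing fills $width characters,
// the others are scaled by their ratio to the slowest one.
// Hypothetical helper; the sample timings below are made up.
function render_bars(array $timings, int $width = 30): array
{
    $slowest = max($timings);
    $lines = [];
    foreach ($timings as $label => $seconds) {
        $length = $slowest > 0
            ? max(1, (int) round($width * $seconds / $slowest))
            : 1;
        // Pad every bar to the same width so that the labels line up.
        $lines[] = str_pad(str_repeat('#', $length), $width) . ' ' . $label;
    }
    return $lines;
}

echo implode("\n", render_bars(['gdbm' => 0.30, 'db3' => 0.16])), "\n";
```

With this scaling, two bars can only be compared within the same experiment, since each experiment is normalized by its own slowest entry.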

exp type / N key-value pairs: 100, 1000, 10000

massive write
N = 100
############################## gdbm
################               db3
N = 1000
############################## gdbm
######################         db3
N = 10000
############################## gdbm
#############                  db3

massive read
N = 100
############################## gdbm
####################           db3
N = 1000
############################   gdbm
############################## db3
N = 10000
##########################     gdbm
############################## db3

traverse
N = 100
############################## gdbm
#############                  db3
N = 1000
############################## gdbm
###############                db3
N = 10000
############################## gdbm
################               db3

update
N = 100
############################## gdbm
################               db3
N = 1000
############################## gdbm
###########                    db3
N = 10000
############################## gdbm
#############                  db3

Conclusion A1: db3 is faster than gdbm in most cases; the exception is massive read with N = 1000 and N = 10000, where gdbm is slightly faster.


What is the performance of naive implementations of the key-value model?


We now test the following two implementations of the key-value model:
* phpdict: keeping all pairs in a PHP dictionary (i.e. an array key => value), which is serialized and unserialized as a whole
* fs: using the EXT3 file system (the key is the file name, the value is the file content). Note that the EXT3 filesystem indexes the file names.
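
A minimal sketch of these two naive stores is given below. The function names, the use of rawurlencode() to turn a key into a file name and the demo directory are assumptions made for illustration, not the code used in the experiment.

```php
<?php
// phpdict: the whole store is one PHP array, serialized into a single file.
function dict_load(string $file): array
{
    return is_file($file) ? unserialize(file_get_contents($file)) : [];
}

function dict_save(string $file, array $dict): void
{
    file_put_contents($file, serialize($dict));
}

// fs: one file per key; the file name is the key, the content is the value.
// Assumption: keys are encoded with rawurlencode() to get a safe file name.
function fs_put(string $dir, string $key, string $value): void
{
    file_put_contents($dir . '/' . rawurlencode($key), $value);
}

// Returns the value as a string, or false when the key does not exist.
function fs_get(string $dir, string $key)
{
    $file = $dir . '/' . rawurlencode($key);
    return is_file($file) ? file_get_contents($file) : false;
}

// Demo in a fresh temporary directory (hypothetical location).
$base = sys_get_temp_dir() . '/kvdemo_' . uniqid();
mkdir($base);
mkdir($base . '/fs');
$dictFile = $base . '/dict.ser';

// Updating one key with phpdict reads and rewrites the whole dictionary...
$dict = dict_load($dictFile);
$dict['key0'] = 'updated';
dict_save($dictFile, $dict);

// ...whereas the fs store only touches the file of that key.
fs_put($base . '/fs', 'key0', 'updated');

var_dump(dict_load($dictFile)['key0'] === fs_get($base . '/fs', 'key0'));
```

The demo makes the structural difference visible: a single update costs a full load and save for phpdict, but only one small file write for fs.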


exp type / N key-value pairs: 1000, 5000, 10000

massive write
N = 1000
#                              phpdict
#############                  db3
############################## fs
N = 5000
#                              phpdict
##                             db3
############################## fs
N = 10000
#                              phpdict
#                              db3
############################## fs

massive read
N = 1000
#                              phpdict
###                            db3
############################## fs
N = 5000
#                              phpdict
###                            db3
############################## fs
N = 10000
#                              phpdict
####                           db3
############################## fs

traverse
N = 1000
#################              phpdict
############################   db3
############################## fs
N = 5000
##################             phpdict
###########################    db3
############################## fs
N = 10000
#######################        phpdict
#########################      db3
############################## fs

update
N = 1000
############################## phpdict
#############################  db3
#                              fs
N = 5000
############################## phpdict
############                   db3
#                              fs
N = 10000
############################## phpdict
######                         db3
#                              fs

Conclusion B1: if the whole data set fits into memory (RAM), serialized dictionaries are much faster for massive write and massive read.
Conclusion B2: if only small updates are required, the filesystem-based implementation of a key-value store is much faster, probably because it minimizes the number of syscalls.
Conclusion B3: db3 is a very good trade-off, especially for large databases (see the last column, N = 10000).

Conclusion


According to these results, the PHP database abstraction layer (DBA) with its db3 backend is the best solution for basic key-value storage.
Open questions:
* Does this experiment contain a bug (see the source code of the dbaperf experiment)?
* What about the performance of the db4 backend?
* Is there a mature PHP key-value datastore with queries on values (à la Google Bigtable)?
Thanks to Lucas Satabin for the premise of conclusion B1.