First, we compare the performance of two backends of PHP's native database abstraction layer (DBA). Then we explore how this layer competes with two naive implementations of the key-value paradigm.
Which one is the fastest: GDBM or DB3?
We propose 4 scenarios:
* massive write: open an empty database, write N key-value pairs, close the database
* massive read: open the database, read N key-value pairs from their keys, close the database
* traverse: open the database, retrieve all pairs, close the database
* update: do 50 times { open the database, update one key, close the database }
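The four scenarios above can be sketched with PHP's DBA functions as follows. This is a minimal illustration, not the actual dbaperf source: the function names (`pick_handler`, `massive_write`, `massive_read`, `traverse`, `update`) and the `keyN`/`valueN` naming scheme are assumptions made for the example, and which handlers (gdbm, db3, ...) are available depends on how PHP was built.

```php
<?php
// Hypothetical helper: return the first preferred handler that this PHP
// build supports, or null if the DBA extension or the handlers are missing.
function pick_handler(array $preferred): ?string {
    if (!function_exists('dba_handlers')) {
        return null;
    }
    $available = dba_handlers();
    foreach ($preferred as $handler) {
        if (in_array($handler, $available, true)) {
            return $handler;
        }
    }
    return null;
}

// Massive write: create an empty database, insert N pairs, close it.
function massive_write(string $path, string $handler, int $n): void {
    $db = dba_open($path, 'n', $handler); // 'n' = create or truncate
    for ($i = 0; $i < $n; $i++) {
        dba_insert("key$i", "value$i", $db);
    }
    dba_close($db);
}

// Massive read: fetch N pairs by key; returns how many were found.
function massive_read(string $path, string $handler, int $n): int {
    $db = dba_open($path, 'r', $handler);
    $found = 0;
    for ($i = 0; $i < $n; $i++) {
        if (dba_fetch("key$i", $db) !== false) {
            $found++;
        }
    }
    dba_close($db);
    return $found;
}

// Traverse: walk every key and fetch its value; returns the pair count.
function traverse(string $path, string $handler): int {
    $db = dba_open($path, 'r', $handler);
    $count = 0;
    for ($key = dba_firstkey($db); $key !== false; $key = dba_nextkey($db)) {
        dba_fetch($key, $db);
        $count++;
    }
    dba_close($db);
    return $count;
}

// Update: reopen the database for each single-key update.
function update(string $path, string $handler, int $times = 50): void {
    for ($i = 0; $i < $times; $i++) {
        $db = dba_open($path, 'w', $handler); // 'w' = existing db, read-write
        dba_replace('key0', "updated$i", $db);
        dba_close($db);
    }
}
```

A run would first call `pick_handler(['gdbm', 'db3'])`, then execute the four functions in order against a temporary file, since the read, traverse and update scenarios expect the database written by `massive_write`.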
We tested these scenarios with three values of N: 100, 1000, 10000.
For each value of N we report the running time (the lower, the better) of gdbm and db3. The results are shown graphically (the shorter the bar, the better) and in relative terms: the absolute bar length has no meaning, only the ratio between bars matters.
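As an illustration of how such relative timings can be obtained, here is a small sketch. It is an assumption about the measurement method, not the original harness: `time_it` and `compare` are hypothetical names, and `hrtime()` (PHP 7.3+) is used where an older script would probably rely on `microtime(true)`.

```php
<?php
// Run a scenario once and return its wall-clock duration in seconds.
function time_it(callable $scenario): float {
    $start = hrtime(true);               // monotonic clock, nanoseconds
    $scenario();
    return (hrtime(true) - $start) / 1e9;
}

// Time every scenario for every N and express each result relative to the
// fastest one for that N (so the fastest always gets a ratio of 1.0).
// $scenarios maps a label (e.g. a backend name) to a callable taking N.
function compare(array $scenarios, array $sizes): array {
    $rows = [];
    foreach ($sizes as $n) {
        $times = [];
        foreach ($scenarios as $name => $fn) {
            $times[$name] = time_it(fn() => $fn($n));
        }
        $fastest = min($times);
        foreach ($times as $name => $t) {
            $rows[$n][$name] = [
                'seconds'  => $t,
                'relative' => $fastest > 0.0 ? $t / $fastest : 1.0,
            ];
        }
    }
    return $rows;
}
```

Calling `compare(['gdbm' => $writeGdbm, 'db3' => $writeDb3], [100, 1000, 10000])`, with two hypothetical closures that run the same scenario on each backend, yields one row per N with the ratio that a bar chart would display.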
[Bar charts: gdbm vs. db3 running time for each scenario (massive write, massive read, traverse, update) at N = 100, 1000 and 10000 key-value pairs]
What is the performance of naive implementations of the key-value model?
We now test the two following implementations of the key-value model:
* using a PHP dictionary (i.e. an associative array key => value) that is serialized and unserialized as a whole
* using the EXT3 file system (the key is the file name, the value is the file content). Note that the EXT3 filesystem indexes the file names.
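To make the two naive approaches concrete, here is a minimal sketch of each. All function names are hypothetical, and the use of `rawurlencode()` to turn keys into safe file names is our own assumption, not something the experiment is known to do.

```php
<?php
// --- Naive store 1: one PHP array, serialized into a single file ---

function array_store_load(string $file): array {
    if (!is_file($file)) {
        return [];
    }
    $data = unserialize(file_get_contents($file));
    return is_array($data) ? $data : [];
}

function array_store_save(string $file, array $data): void {
    file_put_contents($file, serialize($data), LOCK_EX);
}

// Updating a single key still reads and rewrites the whole array.
function array_store_update(string $file, string $key, string $value): void {
    $data = array_store_load($file);
    $data[$key] = $value;
    array_store_save($file, $data);
}

// --- Naive store 2: one file per key inside a directory ---

// Assumption: keys are percent-encoded so that characters such as '/'
// cannot escape the directory. The keys "." and ".." are not handled.
function fs_path(string $dir, string $key): string {
    return $dir . '/' . rawurlencode($key);
}

function fs_put(string $dir, string $key, string $value): void {
    file_put_contents(fs_path($dir, $key), $value);
}

function fs_get(string $dir, string $key): ?string {
    $path = fs_path($dir, $key);
    return is_file($path) ? file_get_contents($path) : null;
}

// Traverse: list the directory and read every file.
function fs_traverse(string $dir): array {
    $pairs = [];
    foreach (scandir($dir) as $name) {
        if ($name === '.' || $name === '..') {
            continue;
        }
        $pairs[rawurldecode($name)] = file_get_contents($dir . '/' . $name);
    }
    return $pairs;
}
```

The sketch shows the structural difference between the two: a single-key update touches one small file in the filesystem store, whereas the serialized array has to be loaded and written back in full.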
[Bar charts: running time for each scenario (massive write, massive read, traverse, update) at N = 1000, 5000 and 10000 key-value pairs]
Conclusion B2: if only small updates are required, the filesystem-based implementation of a key-value store is much faster, probably because it minimizes the number of syscalls.
Conclusion B3: db3 is a very good trade-off, especially for large databases (see last column).
Conclusion
According to these results, the PHP database abstraction layer (DBA) with its db3 backend is the best solution for basic key-value storage.
Open questions:
* does this experiment contain a bug (see the source code of the dbaperf experiment)?
* what about the performance of the db4 backend?
* is there a mature PHP key-value datastore with queries on values (à la Google Bigtable)?
Thanks to Lucas Satabin for the premise of conclusion B1.