Motivation

Extracting Linked Open Data is an expensive process. The knowledge bases are usually accessed remotely, and even if they are mirrored locally or most of the requests hit the cache, the processing can take anywhere from several seconds to several hours. Therefore, it is crucial that results, once computed, are not lost involuntarily, so that there is no need to run the computation again. Apart from that, the requirements that Odalic Semantic Table Interpretation places on permanent storage are relatively modest. There is only a handful of classes whose instances have to be serialized:

  • Users attempting registration.
    • Otherwise, the sign-up requests would get lost and the confirmation links sent in e-mails would not work.
  • Registered users.
    • This requirement is natural.
  • Logged in users.
    • Persisting these could eventually be omitted, but since the tokens issued by the application may be relatively long-lived (as is the case with the UnifiedViews plugin), it is safer to keep them.
  • File data and metadata.
    • Even in the case of remote files, there is a strong need to keep the format used by the parser intact.
  • File utilization by tasks.
    • Files can be shared among several tasks. If this information were lost, a file might accidentally get deleted despite being a dependency of some configured task.
  • Task configurations.
    • In theory, this requirement could be partially met by exporting the definitions and re-importing them when needed, but this is impractical for large numbers of tasks. Moreover, the configurations do not keep references to the owning users (to allow easy exchange and archiving), and the files are linked only by their IDs.
  • Results.
    • As mentioned in the introduction, all the tasks could potentially be re-run, but there are good reasons not to do so.

MapDB

The embedded database MapDB was selected as the solution for keeping the state for several reasons:

  • There is no need to convert or map the stored objects in any way. They only need to be serializable and immutable, which is good practice anyway.
  • It exposes the stored data through the standard Java collection and map interfaces, so its usage is almost seamless in simple cases.
    • This came in handy because all of the aforementioned services were first written as memory-only (and can still be configured that way through the Spring configuration), keeping the objects in plain Java collections and maps.
  • It comes with a write-ahead log and transactions.
  • Its performance is sufficient.
  • It has decent support for nested maps, called prefix tables, similar to Guava Tables.
    • This proved useful when the time came to extend the existing code to provide separate user spaces. In practice, it only meant introducing a top-level map from user IDs to the previously used maps.
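The first two points above can be illustrated with a minimal sketch. It assumes a hypothetical TaskConfig value class and ConfigService (neither is Odalic code): the value is immutable and Serializable, and the service depends only on java.util.Map, so a memory-only HashMap could be swapped for a map created by MapDB without touching the service logic.

```java
import java.io.Serializable;
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;
import java.util.concurrent.ConcurrentHashMap;

public class MapBackedServiceDemo {

  // Hypothetical immutable value: final fields, no setters, Serializable.
  static final class TaskConfig implements Serializable {
    private static final long serialVersionUID = 1L;
    private final String taskId;
    private final String fileId;

    TaskConfig(String taskId, String fileId) {
      this.taskId = Objects.requireNonNull(taskId);
      this.fileId = Objects.requireNonNull(fileId);
    }

    String getTaskId() { return taskId; }

    String getFileId() { return fileId; }

    // Returns a modified copy instead of mutating this instance.
    TaskConfig withFileId(String newFileId) { return new TaskConfig(taskId, newFileId); }
  }

  // Hypothetical service written only against the Map interface.
  static final class ConfigService {
    private final Map<String, TaskConfig> taskIdsToConfigs;

    ConfigService(Map<String, TaskConfig> backingMap) {
      this.taskIdsToConfigs = backingMap;
    }

    void save(TaskConfig config) { taskIdsToConfigs.put(config.getTaskId(), config); }

    TaskConfig load(String taskId) { return taskIdsToConfigs.get(taskId); }

    // Changes are written back with put, never by mutating a stored instance.
    void relink(String taskId, String newFileId) {
      TaskConfig current = taskIdsToConfigs.get(taskId);
      if (current == null) {
        throw new IllegalStateException("Nonexisting task!");
      }
      taskIdsToConfigs.put(taskId, current.withFileId(newFileId));
    }
  }

  public static void main(String[] args) {
    // Two different Map implementations; a persistent one could be passed the same way.
    ConfigService inMemory = new ConfigService(new HashMap<>());
    ConfigService concurrent = new ConfigService(new ConcurrentHashMap<>());
    for (ConfigService service : new ConfigService[] {inMemory, concurrent}) {
      service.save(new TaskConfig("task1", "fileA"));
      service.relink("task1", "fileB");
      System.out.println(service.load("task1").getFileId()); // prints "fileB"
    }
  }
}
```

Writing updates back with put() rather than mutating a retrieved object matters for a persistent map, where the stored state only changes on an explicit write.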

Despite having reached version 3, it still has some minor flaws, none of which prevented its deployment in Odalic:

  • Transactions are limited to collection instances spawned from the same DB object. This makes them hard to execute across method boundaries without sharing the DB object. This is a major design nuisance, which one does not usually encounter when relying on the mainstream Java Transaction API.
  • Also, all commits and rollbacks have to be explicit.
  • In comparison to the previous version, version 3 forces its users to cast map keys to Objects in order to use the prefix tables. This makes the code inherently less type-safe (and indeed, when testing the Odalic server, we encountered a bug that had been hidden behind such a cast).
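One way to contain the casts mentioned in the last point is to hide the untyped keys behind a small typed facade. The sketch below does not use MapDB: assuming, as in MapDB 3's tuple-key support, that prefix-table keys are Object[] arrays, it emulates a prefix table with a JDK TreeMap over Object[] keys. The class UserTaskTable is hypothetical; callers only see (userId, taskId) parameters, and the single cast lives in one place.

```java
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

public class PrefixTableDemo {

  // Orders Object[] keys element by element; a proper prefix sorts before its extensions.
  @SuppressWarnings({"unchecked", "rawtypes"})
  static final Comparator<Object[]> TUPLE_ORDER = (a, b) -> {
    int n = Math.min(a.length, b.length);
    for (int i = 0; i < n; i++) {
      int c = ((Comparable) a[i]).compareTo(b[i]);
      if (c != 0) {
        return c;
      }
    }
    return Integer.compare(a.length, b.length);
  };

  // Hypothetical typed facade: the Object[] keys and the cast never leave this class.
  static final class UserTaskTable<V> {
    private final NavigableMap<Object[], V> backing = new TreeMap<>(TUPLE_ORDER);

    void put(String userId, String taskId, V value) {
      backing.put(new Object[] {userId, taskId}, value);
    }

    V get(String userId, String taskId) {
      return backing.get(new Object[] {userId, taskId});
    }

    // Emulates a prefix sub-map: all entries whose key starts with userId.
    Map<String, V> tasksOf(String userId) {
      Map<String, V> result = new LinkedHashMap<>();
      for (Map.Entry<Object[], V> e : backing.tailMap(new Object[] {userId}, true).entrySet()) {
        Object[] key = e.getKey();
        if (!userId.equals(key[0])) {
          break; // Keys are sorted, so the first foreign user ends the prefix range.
        }
        result.put((String) key[1], e.getValue()); // The only cast, kept in one place.
      }
      return result;
    }
  }

  public static void main(String[] args) {
    UserTaskTable<String> table = new UserTaskTable<>();
    table.put("alice@example.com", "t1", "config A1");
    table.put("bob@example.com", "t1", "config B1");
    table.put("alice@example.com", "t2", "config A2");
    System.out.println(table.tasksOf("alice@example.com")); // prints "{t1=config A1, t2=config A2}"
    System.out.println(table.get("bob@example.com", "t1")); // prints "config B1"
  }
}
```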

Usage within the project

All stored objects are kept in collections and maps initialized from a single MapDB DB object, which can be obtained by calling the method getDb() of cz.cuni.mff.xrg.odalic.util.storage.FileDbService, an implementation of the interface cz.cuni.mff.xrg.odalic.util.storage.DbService. The user can specify the location of the writable database file under the key cz.cuni.mff.xrg.odalic.db.file in config/sti.properties.
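For illustration, a minimal way of reading that key with java.util.Properties. The class DbFileLocation and the path value are hypothetical; how FileDbService itself reads the setting is not shown here and may differ.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Properties;

public class DbFileLocation {

  static final String DB_FILE_KEY = "cz.cuni.mff.xrg.odalic.db.file";

  // Reads the database file location from content in the sti.properties format.
  static Path readDbFile(Reader propertiesSource) {
    Properties properties = new Properties();
    try {
      properties.load(propertiesSource);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    String value = properties.getProperty(DB_FILE_KEY);
    if (value == null || value.trim().isEmpty()) {
      throw new IllegalStateException("Missing property " + DB_FILE_KEY);
    }
    return Paths.get(value.trim());
  }

  public static void main(String[] args) {
    // Hypothetical content; in the application it would come from config/sti.properties.
    String content = "cz.cuni.mff.xrg.odalic.db.file=/var/lib/odalic/odalic.db\n";
    System.out.println(readDbFile(new StringReader(content)));
  }
}
```

Note that a backslash is an escape character in .properties files, so Windows paths need doubled backslashes or forward slashes.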

For example, the initialization of a map from user IDs to user instances looks like this:

this.db = dbService.getDb(); // Kept for further use, mainly committing of transactions.
this.userIdsToUsers = this.db.hashMap("userIdsToUsers", Serializer.STRING, Serializer.JAVA).createOrOpen(); // Uses a more performant MapDB serializer for the keys.
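Serializer.JAVA uses standard Java serialization for the values, which is why the stored classes must be Serializable. For a file-backed DB, the map keeps serialized bytes rather than the object itself, so mutating an object after put() does not change the stored copy; this is one reason the stored objects should be immutable. The sketch below imitates that round trip with ObjectOutputStream and ObjectInputStream only; it is not MapDB code, and StoredUser is a hypothetical stand-in for the User class.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

public class SerializationRoundTrip {

  // Hypothetical stored value; deliberately mutable to show the pitfall.
  static final class StoredUser implements Serializable {
    private static final long serialVersionUID = 1L;
    String email;
    String passwordHash;

    StoredUser(String email, String passwordHash) {
      this.email = email;
      this.passwordHash = passwordHash;
    }
  }

  // Java serialization to bytes, roughly what happens to a value on write.
  static byte[] serialize(Serializable value) {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
      out.writeObject(value);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return bytes.toByteArray();
  }

  // Deserialization, roughly what happens on read: a fresh copy every time.
  static Object deserialize(byte[] data) {
    try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
      return in.readObject();
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    } catch (ClassNotFoundException e) {
      throw new IllegalStateException(e);
    }
  }

  public static void main(String[] args) {
    StoredUser user = new StoredUser("alice@example.com", "hash1");
    byte[] stored = serialize(user); // Imitates put().
    user.passwordHash = "hash2"; // A mutation after the write is not persisted.
    StoredUser read = (StoredUser) deserialize(stored); // Imitates get().
    System.out.println(read.passwordHash); // prints "hash1"
    System.out.println(read != user); // prints "true": a distinct instance
  }
}
```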

A typical write to the DB can be found in the method cz.cuni.mff.xrg.odalic.users.DbUserService.confirmPasswordChange(Token), which demonstrates how the transactions work:

@Override
public void confirmPasswordChange(final Token token) {
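  final DecodedToken decodedToken = validateAndDecode(token); // Validates the token and decodes its content.

  final User newUser = matchPasswordChangingUser(decodedToken); // New version of the user, with the changed password.

  final User replaced;
  try {
    replaced = replace(newUser);
  } catch (final Exception e) {
    this.db.rollback(); // Rollback in case the user version replacement fails.
    throw e;
  }

  invalidateTokens(replaced); // Invalidates the tokens issued to the replaced version of the user.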

  this.db.commit(); // Commit in case of success.
}

private User replace(final User user) {
  final User replaced = this.userIdsToUsers.replace(user.getEmail(), user);
  Preconditions.checkState(replaced != null, "Nonexisting user!");
  
  return replaced;
}
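Since every commit and rollback has to be explicit, the try/catch/commit pattern above has to be repeated in each writing method. A sketch of a small helper that factors it out, assuming a hypothetical Transactional interface whose two methods mirror commit() and rollback() of the MapDB DB object (so db::commit and db::rollback could back it). Unlike confirmPasswordChange, the helper also rolls back when a later step of the unit of work fails; whether that suits Odalic is an assumption.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Supplier;

public class TransactionTemplate {

  // Hypothetical abstraction mirroring the commit()/rollback() pair of a MapDB DB.
  interface Transactional {
    void commit();

    void rollback();
  }

  // Runs the unit of work, commits on success and rolls back on any unchecked exception.
  static <T> T inTransaction(Transactional tx, Supplier<T> work) {
    final T result;
    try {
      result = work.get();
    } catch (RuntimeException e) {
      tx.rollback();
      throw e;
    }
    tx.commit();
    return result;
  }

  // Fake that only records calls (it undoes nothing), so the sketch runs without MapDB.
  static final class RecordingTx implements Transactional {
    final List<String> log = new ArrayList<>();

    public void commit() { log.add("commit"); }

    public void rollback() { log.add("rollback"); }
  }

  public static void main(String[] args) {
    Map<String, String> userIdsToUsers = new HashMap<>();
    userIdsToUsers.put("alice@example.com", "old");
    RecordingTx tx = new RecordingTx();

    // Success: an existing user is replaced, then the work is committed.
    String replaced = inTransaction(tx, () -> {
      String previous = userIdsToUsers.replace("alice@example.com", "new");
      if (previous == null) throw new IllegalStateException("Nonexisting user!");
      return previous;
    });
    System.out.println(replaced + " " + tx.log); // prints "old [commit]"

    // Failure: the user does not exist, so the work throws and is rolled back.
    try {
      inTransaction(tx, () -> {
        String previous = userIdsToUsers.replace("bob@example.com", "new");
        if (previous == null) throw new IllegalStateException("Nonexisting user!");
        return previous;
      });
    } catch (IllegalStateException e) {
      System.out.println(tx.log); // prints "[commit, rollback]"
    }
  }
}
```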