The user can provide several kinds of feedback to the core annotation algorithm. All of its parts are grouped in the cz.cuni.mff.xrg.odalic.feedbacks.Feedback class, which is a part of the Task configuration. Initially (after the Task creation) the feedback is empty. When an algorithm run finishes, the user can inspect the results and adjust the feedback, which the algorithm takes into account in the next run.

Passing the feedback to the algorithm

Before the core algorithm executes, the Feedback is adapted by cz.cuni.mff.xrg.odalic.feedbacks.DefaultFeedbackToConstraintsAdapter (which implements the cz.cuni.mff.xrg.odalic.feedbacks.FeedbackToConstraintsAdapter interface) to an object of the uk.ac.shef.dcs.sti.core.extension.constraints.Constraints class, which is an extension of the sti-main module and encapsulates the user's suggestions for a concrete Semantic Table Interpreter run. The difference is that Feedback comprises suggestions for all used knowledge bases, whereas Constraints contains suggestions only for the one knowledge base actually used in the particular interpreter run.

Each run of the core algorithm is driven by the TAnnotation uk.ac.shef.dcs.sti.core.algorithm.tmp.TMPOdalicInterpreter.start(Table table, boolean statistical, Constraints constraints) method. The method originally did not accept any feedback, so we added (and implemented) the new constraints argument. We also added the statistical argument, which does not come from the Feedback but from the Task configuration; the statistical data annotations part of the Feedback is considered only when this boolean argument is set to true.
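
A minimal sketch of this hand-over, assuming an adapter method named toConstraints and an already configured interpreter, table and knowledge base list (everything except the start signature above is an assumption):

    // Sketch only: toConstraints and the surrounding variables are assumed.
    FeedbackToConstraintsAdapter adapter = new DefaultFeedbackToConstraintsAdapter();
    for (KnowledgeBase base : usedKnowledgeBases) {
        // narrow the task-wide Feedback down to one knowledge base
        Constraints constraints = adapter.toConstraints(feedback, base); // method name assumed
        TAnnotation annotation = interpreter.start(table, statistical, constraints);
        // ... merge the annotation into the task Result ...
    }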

Subject column positions

The user can suggest the positions of the subject columns, which then serve as the subjects for the relation discovery process. The original TableMiner+ algorithm detected the single main subject column in the List<Pair<Integer, Pair<Double, Boolean>>> uk.ac.shef.dcs.sti.core.subjectcol.SubjectColumnDetector.compute(Table table, int... skipColumns) method, based on the column data types and other features, and did not support manual setting of the subject columns by the user. Relations were originally discovered only for the subject column detected by that method. Now, when the subject columns are suggested by the user, relations are discovered for all of them.
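
The changed behaviour can be sketched as follows; the accessor and wrapper names are assumptions, only the intent (one relation discovery pass per suggested subject column, with the original detection as fallback) comes from the text above:

    // Sketch only: getSubjectColumnPositions and detectSubjectColumn are assumed names.
    Set<Integer> subjectColumns = constraints.getSubjectColumnPositions();
    if (subjectColumns.isEmpty()) {
        // no suggestion: fall back to the original automatic detection
        subjectColumns = detectSubjectColumn(table); // wraps SubjectColumnDetector.compute
    }
    for (int subjectColumn : subjectColumns) {
        // relations are discovered with this column as the subject
    }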

Ignoring a column

The user may want to ignore a column completely, so that it is not annotated (disambiguated/classified) at all and is not considered in relations either. This functionality was already supported in the original algorithm and used in all its phases, but we changed the way the indices of the columns the user wants to ignore are passed to the algorithm. Originally the indices were set in the configuration file and passed as an argument to the constructor of the uk.ac.shef.dcs.sti.core.algorithm.SemanticTableInterpreter object (or objects of inherited classes). In our implementation we store the column positions in the Feedback (resp. Constraints) class and pass them as an argument to the start method of TMPOdalicInterpreter (introduced above). We chose this approach because the information about ignored columns is then part of the same object as the other kinds of feedback and can be set individually for each algorithm run.
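
The effect on each phase can be pictured roughly like this (the accessor name is an assumption; the Constraints class stores the positions, as described above):

    // Sketch only: getIgnoredColumnIndices is an assumed accessor.
    Set<Integer> ignored = constraints.getIgnoredColumnIndices();
    for (int col = 0; col < table.getNumCols(); col++) {
        if (ignored.contains(col)) {
            continue; // not classified, not disambiguated, not used in relations
        }
        // ... run the current phase on the column ...
    }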

Compulsory columns

The algorithm performs the classification and disambiguation phases only for columns whose cell content was recognized as named entities. The user may want to force these phases even when the content does not look like named entities (e.g. when the cells contain numerical identifiers). This functionality was already supported in the original algorithm, but we changed the way the indices of compulsory columns are passed. Originally the indices were set in the configuration file and passed to the constructor of the uk.ac.shef.dcs.sti.core.algorithm.SemanticTableInterpreter object (or objects of inherited classes). We store the positions in the Feedback (resp. Constraints) and pass them as an argument to the start method of TMPOdalicInterpreter (the same way as for ignored columns).
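
Roughly, the per-column decision then looks like this (all names are assumptions; only the described behaviour comes from the text):

    // Sketch only: both sets would be read from the Constraints object.
    boolean process = isNamedEntityColumn(table, col)   // original data-type check
        || compulsoryColumns.contains(col);             // user-forced processing
    if (process && !ignoredColumns.contains(col)) {
        // classification and disambiguation run for this column
    }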

Ambiguity

The user can leave a certain cell of the input file ambiguous, meaning that the cell is not disambiguated. The functionality of skipping cells was implemented in the original algorithm and used in the learning phase in the void uk.ac.shef.dcs.sti.core.algorithm.tmp.LEARNING.learn(Table table, TAnnotation tableAnnotation, int column, Constraints constraints) method, but it was not utilized in the experimental batches. We therefore added the new constraints argument to this method; the positions of the cells the user wants to skip are read from it via the Set<Integer> uk.ac.shef.dcs.sti.core.extension.constraints.Constraints.getSkipRowsForColumn(int columnIndex, int rowsCount) method.
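
A sketch of how the learning phase consumes these positions (the loop body is simplified; getSkipRowsForColumn and getNumRows come from the classes named in the text):

    Set<Integer> skipRows = constraints.getSkipRowsForColumn(column, table.getNumRows());
    for (int row = 0; row < table.getNumRows(); row++) {
        if (skipRows.contains(row)) {
            continue; // the cell stays ambiguous, as requested by the user
        }
        // ... disambiguate the cell as in the original algorithm ...
    }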

ColumnAmbiguity

This is just a shortcut for leaving all cells in the column ambiguous. It is manifested in the Set<Integer> uk.ac.shef.dcs.sti.core.extension.constraints.Constraints.getSkipRowsForColumn(int columnIndex, int rowsCount) method, which returns all the rows to skip when the ColumnAmbiguity is set for the given column.
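
For illustration (assuming 0-based row indices): with ColumnAmbiguity set for column 2 of a table with 5 rows, the method returns every row of that column:

    Set<Integer> skipped = constraints.getSkipRowsForColumn(2, 5);
    // with ColumnAmbiguity set for column 2: skipped == {0, 1, 2, 3, 4}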

Classification

The user can set a custom classification resource for a column. In the original algorithm the classification resource is chosen by voting over the classes of the disambiguated cell values in the Pair<Integer, List<List<Integer>>> uk.ac.shef.dcs.sti.core.algorithm.tmp.LEARNINGPreliminaryColumnClassifier.runPreliminaryColumnClassifier(Table table, TAnnotation tableAnnotation, int column, Constraints constraints, Integer... skipRows) method. We added the new constraints argument to this method; this extension allows us to skip the voting and directly set the annotation chosen by the user. When the user explicitly sets an empty classification, the column is left unclassified. Suggested classifications also affect disambiguation, because the candidates for disambiguation are then searched in the knowledge base with a type restriction given by the respective classification. When the list of candidates found under this type restriction is empty, the candidates are searched again without the restriction.
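
The fallback can be sketched as follows; findEntityCandidatesOfTypes and findEntityCandidates refer to the search interface of the original code base, while the surrounding variables are assumptions:

    // Sketch only: kbSearch, cellText and suggestedClassUri are assumed to be in scope.
    List<Entity> candidates =
        kbSearch.findEntityCandidatesOfTypes(cellText, suggestedClassUri);
    if (candidates.isEmpty()) {
        // the type restriction ruled out everything; retry without it
        candidates = kbSearch.findEntityCandidates(cellText);
    }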

Disambiguation

The user can set a custom disambiguation resource for a cell. In the original algorithm the candidate entities are searched in the knowledge base according to the cell value in the void uk.ac.shef.dcs.sti.core.algorithm.tmp.LEARNINGPreliminaryDisamb.runPreliminaryDisamb(int stopPointByPreColumnClassifier, List<List<Integer>> ranking, Table table, TAnnotation tableAnnotation, int column, Constraints constraints, Integer... skipRows) method. We added the new constraints argument to this method. In this extension we also query the knowledge base, because we need to fetch other attributes of the entity, but we search for just the one entity identified by its URI, so the search is faster and simpler. This also means that when the user sets an entity which does not exist in the knowledge base, we can warn them (in the Warnings part of the Result). The entity suggested by the user is then handed to the disambiguation algorithm as a candidate in the same way as the originally searched candidates (found according to the cell value) to compute scores. This means that when the suggested entity is not suitable for disambiguating the cell value, it can receive a score of zero. When the user explicitly sets an empty disambiguation, the cell is left ambiguous.
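
A sketch of the lookup, with hypothetical names for the URI lookup and the warning mechanism:

    // Sketch only: loadEntity and addWarning are assumed helpers.
    Entity suggested = kbSearch.loadEntity(suggestedUri);
    if (suggested == null) {
        result.addWarning("Entity " + suggestedUri + " was not found in the knowledge base.");
    } else {
        // the suggestion becomes the sole candidate; scoring runs as before,
        // so an unsuitable suggestion may still end up with a zero score
        candidates = Collections.singletonList(suggested);
    }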

Relations

The user can set custom predicates for relations between columns. In the original algorithm the relation enumeration phase is provided by the void uk.ac.shef.dcs.sti.core.algorithm.tmp.RELATIONENUMERATION.enumerate(List<Pair<Integer, Pair<Double, Boolean>>> subjectColCandidadteScores, Set<Integer> ignoreCols, TColumnColumnRelationEnumerator relationEnumerator, TAnnotation tableAnnotations, Table table, List<Integer> annotatedColumns, UPDATE update, Constraints constraints) method. We added the new constraints argument to this method. In this extension we simply add the column relation annotations suggested by the user. This does not restrict the rest of the relation discovery process, which can also discover relations between columns other than those the user suggested (with the subject columns taken into account).
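
The addition can be pictured as follows (the accessor and helper names are assumptions):

    // Sketch only: user-suggested relations are recorded up front; the
    // automatic enumeration still runs for the remaining column pairs.
    for (ColumnRelation suggested : constraints.getColumnRelations()) { // assumed accessor
        addRelationAnnotation(tableAnnotations,                         // assumed helper
            suggested.getSubjectColumn(), suggested.getObjectColumn(),
            suggested.getPredicate());
    }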

Statistical data annotations

This is a completely new extension of ours. In our implementation the user can indicate (in the Task configuration) that the input contains statistical data. In this case the relation enumeration phase is skipped (the columns describe dimensions and measures of a statistical data cube, so there are no relations between them in the strict sense). Instead of relations, the statistical annotations are set in the void uk.ac.shef.dcs.sti.core.algorithm.tmp.TMPOdalicInterpreter.setStatisticalAnnotations(List<Integer> annotatedColumns, Table table, TAnnotation tableAnnotations, Constraints constraints) method. Initially the algorithm considers named entity columns to be dimensions and the remaining columns to be measures; the user can change this later. The user also (manually) sets the predicates which correspond to the columns (dimensions and measures). These annotations are used for the RDF export, which is generated according to the RDF Data Cube standard: each row of the input file represents one observation and the predicates describe the dimensions and measures of that observation.
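
The initial assignment can be sketched as follows (the component enum and the helpers are assumptions; only the named entity rule comes from the text):

    // Sketch only: named entity columns start as dimensions, others as measures.
    for (int col : annotatedColumns) {
        ComponentType component = isNamedEntityColumn(table, col)
            ? ComponentType.DIMENSION
            : ComponentType.MEASURE;
        setComponent(tableAnnotations, col, component); // assumed setter
        // the user may later override the component and attach a predicate,
        // which the RDF Data Cube export uses for each observation (row)
    }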