hosted by SourceForge.net

Currently, this section is used to gather ideas about the implementation. These ideas will be completed and structured before release 1.0.

API Documentation

You can browse the API documentation online. It can also be generated from the distribution.

Matching algorithm

The current matching algorithm uses the concept of a query and a list of objects that the query operates on. A typical query is something like:

"If method <m> in object <o> returns a value less than <v>, there is a match."

When a query is executed, it returns a value from 0 to 1, both inclusive. A 1 means a 100% match, while a 0 return value means a total mismatch.
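
As a rough illustration of the typical query above, the following sketch calls a named method reflectively and maps the comparison to a match value. The names SketchQuery, LessThanSketch and CampSite are hypothetical and not part of the JavaMatch API, and the all-or-nothing scoring (1 or 0) is an assumption made for this example.

```java
import java.lang.reflect.Method;

/** Hypothetical stand-in for a query; not the JavaMatch class itself. */
interface SketchQuery {
    /** Returns a match value from 0 (total mismatch) to 1 (100% match). */
    float getMatchValue(Object candidate);
}

/** Matches when the named no-argument method returns a number below the limit. */
class LessThanSketch implements SketchQuery {
    private final String methodName;
    private final double limit;

    LessThanSketch(String methodName, double limit) {
        this.methodName = methodName;
        this.limit = limit;
    }

    @Override
    public float getMatchValue(Object candidate) {
        try {
            // Look up and call method <m> on object <o>
            Method method = candidate.getClass().getMethod(methodName);
            Number value = (Number) method.invoke(candidate);
            // Assumption: a plain yes/no comparison, scored as 1 or 0
            return value.doubleValue() < limit ? 1f : 0f;
        } catch (ReflectiveOperationException e) {
            throw new IllegalArgumentException("Cannot call " + methodName, e);
        }
    }
}

/** Hypothetical domain object used only for this example. */
class CampSite {
    private final int size;

    CampSite(int size) {
        this.size = size;
    }

    public int getSize() {
        return size;
    }
}
```

With these classes, new LessThanSketch("getSize", 100).getMatchValue(new CampSite(80)) evaluates to 1, and the same query on a camp site of size 150 evaluates to 0.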

QuerySets

Queries may be combined by using QuerySet-queries.
A QuerySet maintains two lists: a list of required subqueries and a list of preferred subqueries. During matching, all required subqueries are evaluated first. If the required subqueries don't match completely, the result is a mismatch. If they do match completely, the preferred subqueries are evaluated as well. The match value of the QuerySet is the (weighted) average of all subqueries. This algorithm gives the highest match value (the highest relative ranking) to the objects with the most and best-matching subqueries.

A QuerySet is also a MatchQuery. This way, QuerySets may be nested. Especially when nesting QuerySets, you might want to provide different weights for the nested subqueries:

query.addPreferred(nestedQuery, 4f);
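
The evaluation order described above could look roughly like the sketch below. It is not the JavaMatch implementation: WeightedQuery and QuerySetSketch are hypothetical names, and two details are assumptions made for the example, namely that each required subquery counts with weight 1 in the average and that an empty set is treated as a mismatch.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical minimal query contract: a match value from 0 to 1. */
interface WeightedQuery {
    float getMatchValue(Object candidate);
}

/** Sketch of a query set. It is itself a query, so sets can be nested. */
class QuerySetSketch implements WeightedQuery {
    private final List<WeightedQuery> required = new ArrayList<>();
    private final List<WeightedQuery> preferred = new ArrayList<>();
    private final List<Float> preferredWeights = new ArrayList<>();

    void addRequired(WeightedQuery query) {
        required.add(query);
    }

    void addPreferred(WeightedQuery query) {
        addPreferred(query, 1f);
    }

    void addPreferred(WeightedQuery query, float weight) {
        preferred.add(query);
        preferredWeights.add(weight);
    }

    @Override
    public float getMatchValue(Object candidate) {
        float weightedSum = 0f;
        float totalWeight = 0f;
        // Required subqueries first: anything short of a full match is a mismatch
        for (WeightedQuery query : required) {
            if (query.getMatchValue(candidate) < 1f) {
                return 0f;
            }
            weightedSum += 1f; // assumption: required subqueries have weight 1
            totalWeight += 1f;
        }
        // Preferred subqueries contribute according to their weight
        for (int i = 0; i < preferred.size(); i++) {
            float weight = preferredWeights.get(i);
            weightedSum += weight * preferred.get(i).getMatchValue(candidate);
            totalWeight += weight;
        }
        // Assumption: an empty set counts as a mismatch
        return totalWeight == 0f ? 0f : weightedSum / totalWeight;
    }
}
```

For example, a set with one fully matching required subquery, one preferred subquery that scores 0, and a nested set that scores 0.5 with weight 4 yields (1 + 0 + 4 × 0.5) / 6 = 0.5.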

Match query hierarchy

Figure: Match hierarchy. A class diagram of the query classes: MatchQuery, QuerySet, NotQuery, BooleanEquals, NumericQuery, LessThan, GreaterThan, NumberEquals, Minimum, Maximum, Range, StringQuery, StringEquals, Contains, RegexMatches.

Relative matching

Some queries, such as Minimum and Maximum, compare an object against all other objects in the object list. An example of a two-pass query:

Let's assume a user wants to look for a small, cheap camp site with a swimming pool. The following code defines and executes this match query:

QuerySet query = new QuerySet();
query.addPreferred(new LessThan("size", 100));
query.addPreferred(new NumberEquals("distanceToSwimmingPool", 0));
query.addPreferred(new Minimum("priceIndication"));

List campSites = guide.getCampSites();
MatchResult matchResult = matchEngine.executeQuery(query, campSites);

The full source of this example can be found in the directory examples/camping in the distribution.

This kind of query is called a two-pass query. In the first pass, the relevant data (the global minimum and maximum price) is calculated. In the second pass, this data is used to calculate the match value. As the name implies, the data is traversed twice, which means more communication overhead when the data is retrieved from a large database.
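
The two passes can be sketched as follows. This is an illustration only: TwoPassMinimumSketch is a hypothetical name, and the linear scaling between the global minimum and maximum is an assumption, since the exact formula used by Minimum is not given here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.ToDoubleFunction;

/** Hypothetical sketch of a "minimum"-style query evaluated in two passes. */
class TwoPassMinimumSketch {

    /** Returns one match value per object, in the same order as the input. */
    static <T> List<Float> matchValues(List<T> objects, ToDoubleFunction<T> member) {
        // Pass 1: find the global minimum and maximum of the member value
        double min = Double.POSITIVE_INFINITY;
        double max = Double.NEGATIVE_INFINITY;
        for (T object : objects) {
            double value = member.applyAsDouble(object);
            min = Math.min(min, value);
            max = Math.max(max, value);
        }

        // Pass 2: score each object relative to that range
        List<Float> result = new ArrayList<>();
        for (T object : objects) {
            double value = member.applyAsDouble(object);
            if (max == min) {
                // All values are equal, so every object is the minimum
                result.add(1f);
            } else {
                // Assumption: linear scale, lowest value scores 1, highest scores 0
                result.add((float) ((max - value) / (max - min)));
            }
        }
        return result;
    }
}
```

For price indications 10, 20 and 30, this sketch yields the match values 1, 0.5 and 0. Note that the member value is read twice per object, which is where the extra retrieval cost comes from.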

Extensions

You may want to match on more complicated, domain-specific criteria, for example whether a car has had regular service intervals. In this case, you can create your own queries by extending class MatchQuery or one of its subclasses. All you have to do is provide an implementation of getMatchValue, and your query can take part in the matching process.

/**
 * Class RegularServiceIntervals is a query to be used by JavaMatch
 * that checks whether a Car has been serviced frequently
 */
public class RegularServiceIntervals extends MatchQuery {
    /**
     * Calculates the regularity of the service intervals
     * @param matchedObject the car from which the service intervals 
     *                      are checked
     * @return an indicator for how regularly the car was serviced
     */
    public float getMatchValue(Object matchedObject) 
        throws MatchException {
        Car carToMatch = (Car)matchedObject;
        List services = carToMatch.getServices();
        ... 
        // calculate and return a value (from 0 to 1, both inclusive)
        // for how regular the services took place
    }
}

Output customization

There are several ways to customize JavaMatch's output. These customizations are performed on the match engine (class net.sourceforge.javamatch.engine.MatchEngine).

Number of returned results

To change the number of results that are returned by JavaMatch, you can call
matchEngine.setMaxNumResultItems(<number of items>);

Match threshold

You might want to specify a threshold value: a minimum value before an object is added to the result. This can be done by calling
matchEngine.setThreshold(<threshold value>);
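
Taken together, the two settings above amount to a filter followed by a cut-off. The sketch below shows one way this could work. ScoredItem and ResultFilterSketch are hypothetical names, and the details (the threshold being inclusive, results sorted best-first before truncation) are assumptions rather than documented MatchEngine behaviour.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical pairing of an object with its match value. */
class ScoredItem {
    final Object item;
    final float matchValue;

    ScoredItem(Object item, float matchValue) {
        this.item = item;
        this.matchValue = matchValue;
    }
}

/** Sketch of applying a threshold and a maximum number of result items. */
class ResultFilterSketch {

    static List<ScoredItem> select(List<ScoredItem> scored, float threshold, int maxNumResultItems) {
        // Keep only items that reach the threshold (assumed inclusive)
        List<ScoredItem> kept = new ArrayList<>();
        for (ScoredItem candidate : scored) {
            if (candidate.matchValue >= threshold) {
                kept.add(candidate);
            }
        }
        // Best match first
        kept.sort(Comparator.comparingDouble((ScoredItem s) -> s.matchValue).reversed());
        // Cut off after the requested number of items
        int count = Math.min(maxNumResultItems, kept.size());
        return new ArrayList<>(kept.subList(0, count));
    }
}
```

With match values 0.9, 0.2, 0.6 and 0.75, a threshold of 0.5 and a maximum of 2 items, the sketch returns the items scored 0.9 and 0.75, in that order.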

Related technology

Matching engines

  • WCC has a commercial matching engine, called Elise. This engine operates on top of a proprietary database. Matching seems to be done inside the database, which leads to a lot less data retrieval.

Persistent storage

Object-oriented data access

Object data retrieval

Search engine technology

Other technologies in depth

SodaQuery

SODAQuery project summary:
S.O.D.A. is an object API to communicate with databases. The current specification is focussed on queries. Goals are: type safety, object reuse, minimum string use, programming language independence.

The main differences and similarities between JavaMatch and SODAQuery are:

Returned results
- SODA returns all objects that match the specified constraints.
- JavaMatch provides a QuerySet that includes both a required and a preferred part. The required part is similar to SODAQuery. Among the objects that match all required subqueries, the engine looks for the objects that best match the preferred subqueries. It returns the top 10 best-matching objects, with the best match first.

Comparison operators
- SODA's Constraint interface has a fixed set of evaluation modes. These can be mapped to / executed by the underlying database system.
- JavaMatch has a hierarchy of queries, that is executed in the Java Virtual Machine

Customizable filtering
- SODA provides the Evaluation interface. When a query is executed, the Evaluation acts as a final filter on the objects that are returned
- JavaMatch's queries can be customized by subclassing class MatchQuery

Programming language
- SODA is programming language independent. Java (JDK version 1.0 and later) and C# are supported
- JavaMatch is Java-only (JDK version 1.4 and later)

Limit usage of strings
- Applies to both systems, while providing maximum flexibility with respect to runtime dynamics

Runtime member retrieval
- SODA has a Query interface, that describes how to descend from an object to its members, and to put restrictions on those members
- JavaMatch's queries that retrieve object members have an argument "memberName" in their constructor. This is the path that is followed to retrieve a member value

Performance
- SODA's constraints can be mapped to operators in the underlying database. Therefore, they can be executed by the database itself. The Evaluation leads to data transfer from the server to the client
- JavaMatch operates on the full set of objects. All data is sent from the underlying data storage mechanism to the Java Virtual Machine

Collections
- SODA uses an ObjectContainer as a bridge to the data structure
- JavaMatch operates on standard Java collection classes (java.util.List, java.util.Iterator)

Data types
- SODA uses an ObjectSet to return the results
- JavaMatch returns its results as a java.util.List

Memory
- SODA returns the objects one-at-a-time, so the entire database won't be kept in memory
- JavaMatch returns a limited amount of results, so only the best matching objects are kept in memory