Semantic Search Help

This page continues the series of examples demonstrating the use of the semantic search engine.

These examples are in order of sophistication, so please follow them one after another.

Third, final and most sophisticated example: “What are the most recent Committee of Ministers’ replies in the field of education or social policy?”

1. It is assumed here that you have followed an earlier example, and thus understand the basic use of the semantic search page. Here is the prompt for the search:

[Screenshot: Empty semantic search prompt]

2. The subtleties of the question are found in the phrases “most recent” and “or” (as in education OR social policy).

Starting with the basics, the specifics of the question are the Committee of Ministers, the type of document “reply” and the fields of education and social policy. A field, key word or subject domain is generally referred to as a “subject” or “topic” in the context of the semantic search engine.

In depth:

Behind the search engine is a hierarchy of key words (“topics” or “subjects”), known as a thesaurus or vocabulary. This hierarchy represents concepts, sub-concepts and so on, to an arbitrary depth. Taking “education” as an example, “secondary education” and “primary education” may be sub-concepts, and education itself may be a sub-concept of “social responsibilities of governments”.
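As a rough illustration of such a hierarchy, the Python sketch below stores each concept together with its broader concept and walks downwards to collect sub-concepts. The entries follow the education example above; the names BROADER and narrower_terms are hypothetical and are not taken from the search engine, whose internal storage is not described here.

```python
# Hypothetical thesaurus fragment: each concept maps to its broader concept.
BROADER = {
    "education": "social responsibilities of governments",
    "primary education": "education",
    "secondary education": "education",
}

def narrower_terms(concept):
    """Return every sub-concept of `concept`, at any depth, sorted by name."""
    found = set()
    frontier = [concept]
    while frontier:
        current = frontier.pop()
        for term, parent in BROADER.items():
            if parent == current and term not in found:
                found.add(term)
                frontier.append(term)
    return sorted(found)

print(narrower_terms("education"))
```

Asking for the sub-concepts of “social responsibilities of governments” would return “education” together with both of its own sub-concepts, which is the sense in which the hierarchy goes to an arbitrary depth.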

Within the Assembly, we use the European-level Eurovoc thesaurus (eurovoc.europa.eu), plus our own internal thesaurus. Eurovoc gives extensive and common coverage of terms at European government level in multiple languages.

The semantic search engine uses this hierarchy to widen its search and indeed to help you narrow it down until you find the response required.

Entering the specifics of the question, we can use the “Inst.Author” criterion (institutional author) to specify the Committee of Ministers, followed by two “Topic” criteria to specify education and social policy. Note that, as you type the topics, the search engine will help to complete the words using its thesaurus.

[Screenshot: Completing the criteria]

 

3. Finally, typing in “reply” as the “Type” of document gives the following result:

 

[Screenshot: Search results ordered by relevance then by date]

4. AND and OR

Now we start to see some of the subtleties of the search. Note that no document has been found that matches all the query criteria (see (a) in the above screenshot). The semantic search engine has scored each result according to how closely it matches the query. The highest scoring document is at the top of the list (75%, (b)). Search results are ordered first by score (b), then by date (c), so the most relevant document is presented first, not necessarily the most recent.
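The ordering described here (score first, then date) can be sketched in Python as a sort on a two-part key. The result records below are invented for illustration; only the 75% and 60% scores and the years 2010 and 2013 echo this example, and order_results is a hypothetical name rather than part of the search engine.

```python
from datetime import date

# Hypothetical result records: (title, score in percent, document date).
results = [
    ("Reply A", 60, date(2013, 3, 1)),
    ("Reply B", 75, date(2010, 6, 15)),
    ("Reply C", 60, date(2013, 5, 20)),
]

def order_results(items):
    """Sort by score (highest first), then by date (most recent first)."""
    return sorted(items, key=lambda r: (r[1], r[2]), reverse=True)

for title, score, when in order_results(results):
    print(f"{score}%  {when.isoformat()}  {title}")
```

With this key, the 2010 document scoring 75% is listed ahead of both 2013 documents scoring 60%; the date only decides the order between documents with the same score.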

The first document found is a reply from the Committee of Ministers addressing the subject of education, but not social policy (not directly, anyway). On this basis the search engine rates it as 75% relevant (b), the highest score found, and presents it at the top of the list. Note that “education” appears in the thesaurus list of this first document (d), a direct match for one of the subject criteria.

This example also illustrates how the search engine deals with questions of “and” and “or”. Although no document matches all the criteria (“education” AND “social policy”), results are still returned that match some of the criteria (“education” OR “social policy”). The score (b) indicates how closely the search engine considers each document to have matched the query.
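One simple way to picture this kind of partial matching is to score a document by the fraction of criteria it satisfies. The Python sketch below is an assumption made for illustration only: the function match_score, the field names and the document record are hypothetical, and the actual scoring formula used by the search engine is not documented here. It does happen to give 75% for a document matching three of the four criteria in this example.

```python
def match_score(criteria, document):
    """Percentage of query criteria satisfied by the document (0 to 100)."""
    if not criteria:
        return 0.0
    matched = sum(
        1 for field, value in criteria if value in document.get(field, set())
    )
    return 100.0 * matched / len(criteria)

# The four criteria entered in this example (field names are hypothetical).
criteria = [
    ("author", "Committee of Ministers"),
    ("topic", "education"),
    ("topic", "social policy"),
    ("type", "reply"),
]

# Hypothetical record for a reply on education that does not mention social policy.
document = {
    "author": {"Committee of Ministers"},
    "topic": {"education"},
    "type": {"reply"},
}

print(match_score(criteria, document))  # 75.0
```

A strict AND search would reject this document outright; scoring it instead keeps it in the results while showing that it is only a partial match.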

5. Most recent

Note in the example above that the ‘most recent’ document is from 2010 (c). That is quite old. Are there no more recent documents addressing these subjects? Perhaps there are, but they would not necessarily be more ‘relevant’. Scrolling through the search results may reveal more recent documents, but with lower relevance (a lower “score”).

To illustrate this, an extra criterion can be added to return results just from this year (currently 2013):

[Screenshot: Search result introducing a year element to narrow the search]

6. Again, no result is returned that matches all the criteria. Look first, however, at the score for the two example documents returned (e): 60%. This is lower than the score of the document returned when the year was not specified (75%, (b)), which is why these documents did not appear at the top of the list before.

Why is the score lower? Look at the thesaurus entries for the two documents (f). The subjects given in the query do not appear directly therein. They will, however, be semantically linked to the subjects of “education” and “social policy” in the semantic hierarchies used by the Assembly. This is perhaps most obvious in the first document, where one of the thesaurus entries is “higher education”, clearly a sub-concept of education as a whole.
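The idea of a thesaurus entry being semantically linked to a query subject can be sketched as a walk up the hierarchy from the entry towards broader concepts. The fragment below is hypothetical (BROADER and is_linked are illustrative names, and the data assumes a hierarchy without cycles); it does not show how the search engine itself resolves links or how much weight it gives them.

```python
# Hypothetical thesaurus fragment: each concept maps to its broader concept.
BROADER = {
    "higher education": "education",
    "education": "social responsibilities of governments",
}

def is_linked(entry, topic):
    """True if `entry` equals `topic` or sits below it in the hierarchy."""
    current = entry
    while current is not None:
        if current == topic:
            return True
        current = BROADER.get(current)  # None once the top is reached
    return False

print(is_linked("higher education", "education"))
print(is_linked("education", "higher education"))
```

The link only runs one way: “higher education” falls under “education”, but a document tagged simply “education” is not thereby a match for “higher education”.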

Clicking on key words in the thesaurus cloud (g) now allows the search to be narrowed. Each key word clicked on will be added to the search criteria. In this way it can be seen that the hierarchy of concepts found in the vocabularies is first used to broaden the search, and can then be used to narrow the search until the result required is found.
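In terms of the query, a click in the thesaurus cloud can be pictured as appending one more “Topic” criterion to the list already entered. The sketch below is illustrative only: add_topic is a hypothetical helper, and the criteria are simply those used in this example written as Python pairs.

```python
def add_topic(criteria, keyword):
    """Return a new criteria list with a Topic criterion for `keyword` appended."""
    criterion = ("Topic", keyword)
    if criterion in criteria:
        return list(criteria)  # already present, nothing to add
    return criteria + [criterion]

criteria = [
    ("Inst.Author", "Committee of Ministers"),
    ("Topic", "education"),
    ("Topic", "social policy"),
    ("Type", "reply"),
]

# Simulate clicking "higher education" in the thesaurus cloud.
narrowed = add_topic(criteria, "higher education")
for name, value in narrowed:
    print(f"{name}: {value}")
```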

This example also shows the importance of sort order, and the linking of concepts when results are produced for searches.

The original question is therefore difficult to answer directly, and the answer depends on the interpretation of the person asking it: is the researcher looking for the most relevant or the most recent documents?

END OF EXAMPLE.