Experience Store

Last updated: 07 Aug 18 15:28 CEST

The adlantics Experience Store (xs) is an xAPI-aligned learning records store (LRS). It provides a reliable, high-performance storage engine with strong data quality guarantees and fast and flexible query facilities. Data stored in xs can be referenced from other adlantics services. For example, by using the adlantics Analytics Engine (ae) on data stored in xs, users can easily compute complex learning analytics and let the system take automated action under certain conditions. However, xs is also a powerful service in its own right and many interesting analytical applications can be built using only xs' retrieval and query facilities.

We refer the reader to the platform overview for a high-level introduction to the adlantics platform, and we recommend completing the quick start tutorial, which covers basic considerations such as connecting and authenticating. For the purpose of this article, we will assume basic familiarity with web services and with accessing them from the command line. We will also assume that you have successfully authenticated to the platform as described in the quick start tutorial.

Experience Storage

At the most basic level, xs allows you to securely and permanently store statements about learning experiences. These statements can later be retrieved from xs using a unique identifier supplied by you, or assigned by xs upon storing the statement. In this section we will discuss how storing and retrieving individual statements or batches of statements can be accomplished using the xs service interface.

In many practical applications, aggregates computed over such statements will be of greater interest than individual statements. The following sections will discuss the higher-level query facilities that xs provides to this end. Moreover, we recommend considering the adlantics Data Xcelerator (dx) tool when building interfaces between your learning systems and the adlantics platform. adlantics dx provides connectors to many popular e-learning systems, and it frees you from the low-level details of moving individual learning statements.

Suppose that one of your learners has successfully reviewed the adlantics website. In xAPI terminology, this is an experience, and we can assert that it occurred with a statement of the following form:

{
  "actor":{ "mbox":"hello@adlantics.com" },
  "verb":{ "id":"http://adxx.io/viewed" },
  "object":{ "id":"https://www.adlantics.com" }
}

The statement expresses that an actor (an agent in this case), identified by her mailbox address hello@adlantics.com, performed the action identified by the verb http://adxx.io/viewed on the object (an activity) identified as https://www.adlantics.com. To add this statement to our ground truth about learning experiences, we POST it to the /experience endpoint of the xs service:

curl -XPOST -H"Realm: [your realm]" \
            -H"Authorization: Bearer [your access token]" \
            -d'{
                 "actor":{ "mbox":"hello@adlantics.com" },
                 "verb":{ "id":"http://adxx.io/viewed" },
                 "object":{ "id":"https://www.adlantics.com" }
               }' \
            https://xs.adlantics.com/experience

where the realm and authorization headers are chosen as described in the quick start tutorial and the section on authentication. If we provided a valid token that encompasses the write permission on the target realm, xs will respond with a status code of 201 (created) and the unique identifier assigned to the statement we just committed to the permanent record:

{ "statementIds": [ "510371e2-5850-4a52-a4f8-7e90496675e6" ] }
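The same request can also be assembled programmatically. The following is a minimal JavaScript sketch, assuming a runtime with the standard fetch API (Node.js 18 or later, or a browser); the helpers buildStoreRequest and parseStatementIds are hypothetical and not part of any adlantics SDK. Like the curl example, it sets only the Realm and Authorization headers.

```javascript
// Hypothetical helper: assembles the POST /experience request from the curl example.
const XS_BASE = "https://xs.adlantics.com";

function buildStoreRequest(realm, token, statement) {
  return {
    url: `${XS_BASE}/experience`,
    init: {
      method: "POST",
      // Same headers as in the curl example.
      headers: {
        "Realm": realm,
        "Authorization": `Bearer ${token}`,
      },
      body: JSON.stringify(statement),
    },
  };
}

// Hypothetical helper: extracts the identifiers from a 201 response body such as
// { "statementIds": [ "510371e2-..." ] }.
function parseStatementIds(responseBody) {
  const parsed = typeof responseBody === "string" ? JSON.parse(responseBody) : responseBody;
  if (!Array.isArray(parsed.statementIds)) {
    throw new Error("response does not contain a statementIds array");
  }
  return parsed.statementIds;
}

const viewed = {
  actor: { mbox: "hello@adlantics.com" },
  verb: { id: "http://adxx.io/viewed" },
  object: { id: "https://www.adlantics.com" },
};
const req = buildStoreRequest("[your realm]", "[your access token]", viewed);
// With network access:
//   const res = await fetch(req.url, req.init);
//   if (res.status === 201) console.log(parseStatementIds(await res.text()));
```

The network call is left as a comment so that the request can be inspected, or unit-tested, without contacting xs.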

That the statement identifier is wrapped in a JSON array hints at the possibility of posting multiple statements at a time. In fact, xs accepts multiple newline-delimited JSON objects in a single POST body and it will treat each object as a separate statement:

curl -XPOST ... \
            -d'{ <statement_1> }

               { <statement_2> }
               ...
               { <statement_n> }' \
            https://xs.adlantics.com/experience
{
  "statementIds": [
    "2befd086-34b4-4b4a-8c58-6e386c87f620",
    "e8de181c-f4cd-4279-a7a8-b2a14a645b45",
    ...
  ]
}
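Because each statement must occupy a single line of the POST body, it helps to let a JSON serializer produce the batch. A minimal sketch; toNdjson and fromNdjson are hypothetical helper names:

```javascript
// Hypothetical helper: serializes statements into a newline-delimited body.
function toNdjson(statements) {
  // JSON.stringify without indentation never emits a raw newline
  // (newlines inside strings are escaped as \n), so each statement
  // occupies exactly one line.
  return statements.map((s) => JSON.stringify(s)).join("\n");
}

// Inverse operation, e.g. for checking a batch before sending it.
// Blank lines are skipped (the curl example also separates statements with one).
function fromNdjson(body) {
  return body
    .split("\n")
    .filter((line) => line.trim() !== "")
    .map((line) => JSON.parse(line));
}

const batch = [
  { actor: { mbox: "hello@adlantics.com" }, verb: { id: "http://adxx.io/viewed" },
    object: { id: "https://www.adlantics.com" } },
  { actor: { mbox: "hello@adlantics.com" }, verb: { id: "http://adxx.io/attempted" },
    object: { id: "http://adxx.io/interaction" }, result: { response: "line 1\nline 2" } },
];
const body = toNdjson(batch);
```

The second statement deliberately contains a newline inside a string value; the serializer escapes it, so the body still has exactly one line per statement.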

Experience Retrieval

Once committed to storage, experience statements can be retrieved at any time using their unique statement identifier. This is accomplished through a GET operation on the /experience endpoint, passing the identifier in the id query parameter:

curl -XGET -H"Realm: [your realm]" \
           -H"Authorization: Bearer [your access token]" \
           "https://xs.adlantics.com/experience?id=510371e2-5850-4a52-a4f8-7e90496675e6"
{
  "experiences": [
    {
      "actor": { "mbox": "hello@adlantics.com" },
      "object": { "objectType": "Activity", "id": "https://www.adlantics.com" },
      "id": "510371e2-5850-4a52-a4f8-7e90496675e6",
      "stored": "2018-05-14T09:24:18.209Z",
      "verb": { "id": "http://adxx.io/viewed" }
    }
  ]
}
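A retrieval helper only needs to place the identifier into the id query parameter, as in the curl example, and unwrap the experiences array. This is a sketch with hypothetical helper names; it returns null when the array is empty, because how xs responds to an unknown identifier is not described here.

```javascript
// Hypothetical helper: builds the retrieval URL used in the curl example.
function buildRetrieveUrl(statementId) {
  const url = new URL("https://xs.adlantics.com/experience");
  url.searchParams.set("id", statementId); // URLSearchParams handles encoding
  return url.toString();
}

// Returns the first statement of a { "experiences": [ ... ] } response, or null.
function firstExperience(responseBody) {
  const experiences = responseBody.experiences ?? [];
  return experiences.length > 0 ? experiences[0] : null;
}

// Response from the example above.
const sample = {
  experiences: [{
    actor: { mbox: "hello@adlantics.com" },
    object: { objectType: "Activity", id: "https://www.adlantics.com" },
    id: "510371e2-5850-4a52-a4f8-7e90496675e6",
    stored: "2018-05-14T09:24:18.209Z",
    verb: { id: "http://adxx.io/viewed" },
  }],
};
const statement = firstExperience(sample);
```

Note that the returned statement contains fields added by xs (objectType and stored), so it is not identical to what was posted.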

Fields and Schemas

Up to this point we have looked at learning records as monolithic documents, without considering the data they are made up of. This is a natural viewpoint for a basic LRS, which cares mostly about whether a statement complies with the xAPI specification and performs only a limited number of operations on it. For example, an LRS should know how to add a unique id when a statement is submitted without one. Limiting what the LRS needs to know about the data it stores is a key benefit of the xAPI philosophy, as it lets experience vocabularies evolve with user needs.

When moving from individual statements to aggregates and other high-level constructs involving computation, more information about the data's structure is needed. For example, if we are interested in the number of experiences recorded after January 1st 2018, the LRS needs to know which field holds timing information, as well as the format in which timestamps are stored (e.g., ISO 8601). This need for a priori knowledge limits flexibility. Aggregates are not part of the LRS definition per se, but they are of great importance to analytical applications. Fortunately, the flexibility of the xAPI document-based approach and the additional a priori knowledge required by analytical applications do not have to be mutually exclusive. xs fully supports dynamically growing vocabularies, and statements can be stored in and retrieved from xs without informing the system about their interpretation beyond what the xAPI standard itself defines. Only when a field is used in computations that hinge on the data's interpretation do we have to tell xs how to interpret the field. Importantly, this can happen at any time and for any subset of fields, while still permitting new fields to be added.

The collective set of assertions we make about the fields of statements in a given realm is called a schema, and each realm can have its own schema. A realm's schema can be accessed from the /schema resource as follows:

curl -XGET -H"Realm: [your realm]" \
           -H"Authorization: Bearer [your access token]" \
           "https://xs.adlantics.com/schema?locale=en"
{
    "fields": [
        ... excerpt only, some descriptive titles omitted for brevity ...
        { "key": "completion", "title": "Indicates completion of an activity", "type": "boolean" },        
        { "key": "e_duration_msec", "title": "Duration of the activity in milliseconds", "type": "integer" },       
        { "key": "e_lo_hry", "type": "hierarchy" },                               
        { "key": "e_org_hry", "type": "hierarchy" },
        { "key": "language", "type": "keyword" },
        { "key": "mbox", "title": "Actor mailbox", "type": "text" },
        { "key": "object_type", "title": "Object type", "type": "keyword" },
        { "key": "success", "title": "Indicates successful completion of an activity", "type": "boolean" },
        { "key": "timestamp", "title": "Timestamp of the activity", "type": "time" },
        { "key": "verb_id", "title": "Activity verb", "type": "keyword" }
    ]
}            

A schema consists of field declarations that tell the system what to expect from a certain field. Each field may appear at most once, but not every field needs to be part of the schema, which allows new fields to be used in statements ad hoc. Each field has a key (colloquially, its field name) that uniquely identifies the field within the realm. A field may carry a descriptive title for informative purposes. By passing the locale query parameter, you can switch between title languages if such information is stored with the schema. The system falls back to English titles if no locale is specified in the request, or if no title has been defined for the requested locale.
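Client code that builds queries usually needs to look up a field's declared type. The sketch below indexes a /schema response by key; indexSchema, fieldType, and buildSchemaUrl are hypothetical helper names, and the only assumption about the response is the fields array shape shown above.

```javascript
// Hypothetical helper: maps field key -> declaration, rejecting duplicate keys
// (each field may appear at most once in a schema).
function indexSchema(schemaResponse) {
  const byKey = new Map();
  for (const field of schemaResponse.fields) {
    if (byKey.has(field.key)) {
      throw new Error(`duplicate field key: ${field.key}`);
    }
    byKey.set(field.key, field);
  }
  return byKey;
}

// Fields missing from the schema are still allowed in statements, so a lookup
// may legitimately return undefined: the field simply has no declared type.
function fieldType(schemaIndex, key) {
  return schemaIndex.get(key)?.type;
}

// Builds the /schema URL, optionally selecting the title locale.
function buildSchemaUrl(locale) {
  const url = new URL("https://xs.adlantics.com/schema");
  if (locale) url.searchParams.set("locale", locale);
  return url.toString();
}
```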

The most important part of a field declaration is its type, which influences how a field can be used in computations:

  • Numeric Types
    • integer: integer number
    • double: real number with double precision
  • Categorical Types (i.e., values come from a universe of keywords, but breaking down the keyword is not meaningful)
    • boolean: true or false values
    • keyword: e.g., statuses (open, in progress, closed, etc.), colors (red, green, blue, etc.), and languages ("en", "de", etc.)
    • hierarchy: Special type of keyword field where the keywords are keys in an externally given hierarchy. For example, consider the hierarchy of all professional aviation learning objectives where "airspaces" is the unique key of one of the learning objectives under "airlaw". After attaching a definition of the hierarchy to this field, the system can infer that the "airspaces" learning objective is part of the overall "airlaw" learning objective and can use this information in hierarchical computations and breakdowns.
  • Other Types
    • text: strings with full-text character, i.e., where breaking the field into substrings is meaningful. Examples of text fields are note fields, essay responses, but also email addresses (where the ability to search for all actors from the same domain is important)
    • time: timestamps with time and date
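The grouping above can be captured as a lookup table, which makes it easy to check later whether a field may be used in a given filter, fact, or axis. The hierarchy part is a hypothetical illustration: the actual format of hierarchy definitions in xs is not described here, so we assume a simple child-to-parent map to show the kind of inference described for "airspaces" and "airlaw".

```javascript
// Schema types grouped into the categories listed above.
const TYPE_CATEGORY = {
  integer: "numeric",
  double: "numeric",
  boolean: "categorical",
  keyword: "categorical",
  hierarchy: "categorical",
  text: "other",
  time: "other",
};

function typeCategory(type) {
  const category = TYPE_CATEGORY[type];
  if (category === undefined) throw new Error(`unknown field type: ${type}`);
  return category;
}

// Hypothetical hierarchy representation: child key -> parent key.
const parentOf = new Map([
  ["airspaces", "airlaw"],
]);

// Walks up the parent chain; assumes the hierarchy contains no cycles.
function isPartOf(key, ancestor, parents) {
  for (let k = parents.get(key); k !== undefined; k = parents.get(k)) {
    if (k === ancestor) return true;
  }
  return false;
}
```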

Knowing the fields of a realm and their types is an important prerequisite for writing queries, which we will cover next.

Queries

Queries let you calculate various types of aggregates on (parts of) a realm. A typical example of a query is asking the system for the cumulative time (in hours) the learners spent reading the course material that started in 2018, broken down by class and learning objective. Note that the result of a query is computed from the statements stored in a realm, but the statements themselves are never returned. As such, queries are different from retrieval operations, even if some of the concepts (e.g., filters) are shared between the two.

The simplest possible query is the empty query, without filters, facts, axes, or options. Posting the following query definition to the /query resource will return a count of all statements stored in the realm:

curl -XPOST -H"Realm: [your realm]" \
            -H"Authorization: Bearer [your access token]" https://xs.adlantics.com/query \
            -d '{ "filters": [], "facts": [], "axes": [], "options": {} }'
{
  "hits": 24684278,
  "result": [
    { "key": "unit", "value": [] }
  ]
}
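Programmatically, the empty query is just a JSON object sent to /query. A minimal JavaScript sketch with hypothetical helper names; it sends exactly the definition used in the curl example and reads the single unit cell of the zero-dimensional result.

```javascript
// Hypothetical helper: wraps a query definition into a POST /query request
// with the same headers as the curl example.
function buildQueryRequest(realm, token, query) {
  return {
    url: "https://xs.adlantics.com/query",
    init: {
      method: "POST",
      headers: { "Realm": realm, "Authorization": `Bearer ${token}` },
      body: JSON.stringify(query),
    },
  };
}

// The empty query: no filters, facts, axes, or options.
const emptyQuery = { filters: [], facts: [], axes: [], options: {} };

// Reads a zero-dimensional result: one cell with the special key "unit".
function readUnitResult(response) {
  const cell = response.result.find((c) => c.key === "unit");
  return { hits: response.hits, values: cell ? cell.value : [] };
}
```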

In the example, the system reports that it considered a total of about 24.7 million statements, but it does not return any additional information derived from them. In the following sections, we will learn how this result came about and how to extract more useful aggregates from the system. We will start with an introduction to the overall structure of a query, followed by the supported filters, facts, axes, and options. Finally, we will describe how to influence the format of the response.

Structure of a Query

A query definition consists of four parts, some of which are optional. All four appear in the empty query above and are repeated here for convenience:

{ "filters": [], "facts": [], "axes": [], "options": {} }

If you have used some form of database system before, these concepts will feel immediately familiar:

  • A filter restricts the overall set of statements considered as basis for the query, e.g., by only selecting statements having to do with interaction attempts. xs queries always have zero or one filter, where the one filter can be complex and composed of multiple sub-filters.
  • A fact describes a quantity of interest to be computed from the underlying statements, such as the sum of all interaction durations. xs queries can have an arbitrary number of facts (including zero, as shown above).
  • An axis captures one dimension along which we want to break down our response. xs supports queries of arbitrary dimensionality (resulting in so-called tensors, i.e., response matrices of arbitrary dimensionality), but some response formats limit the number of dimensions they can carry when returning the result over the network.
  • Options, finally, are global influencers of how a query is executed.

We will now consider each of these parts of a query definition in turn, followed by a discussion of how to influence the format of the returned data.

Filters

Filters restrict the overall set of statements considered as basis for the query. In the empty query above, we considered all statements present in the realm, but we could have easily restricted the query to only statements containing the verb http://adxx.io/attempted by using the filter { "field": "verb_id", "value": "http://adxx.io/attempted" }. This match filter restricts the query to only statements where the field verb_id, as defined in the schema, matches the fixed value http://adxx.io/attempted.

The xs query language presently supports the following basic filters:

Filter   Field types                  Details
match    categorical, text            { "field": "KEY", "value": "VALUE" }
range    categorical, numeric, time   { "type": "range", "field": "KEY", "lower": "LOWER", "upper": "UPPER" }
                                      At least one of the two bounds must be specified, for example:
                                      { ..., "lower": "2017-09-28T20:12:27.48Z" }
prefix   categorical                  { "type": "prefix", "field": "KEY", "value": "VALUE" }
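Small constructor functions make it harder to produce malformed filters, such as a range filter without bounds. A sketch with hypothetical helper names that emit exactly the shapes listed in the table:

```javascript
// Hypothetical constructors for the three basic filters.
function matchFilter(field, value) {
  // The match filter is written without a "type" property.
  return { field, value };
}

function rangeFilter(field, { lower, upper } = {}) {
  if (lower === undefined && upper === undefined) {
    throw new Error("a range filter needs at least one bound");
  }
  const filter = { type: "range", field };
  if (lower !== undefined) filter.lower = lower;
  if (upper !== undefined) filter.upper = upper;
  return filter;
}

function prefixFilter(field, value) {
  return { type: "prefix", field, value };
}
```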

Basic filters can be composed into more complex filters by using one of the following Boolean filters:

Filter              Details
Conjunction (AND)   matches statements for which all sub-filters match: { "type": "and", "filters": [ ... ] }
Disjunction (OR)    matches statements for which at least one sub-filter matches: { "type": "or", "filters": [ ... ] }

Within a query definition, a complex filter might look as follows:

"filter": {
  "type": "and",
  "filters": [
    { "field": "verb_id", "value": "http://adxx.io/attempted" },
    { "field": "object_id", "value": "http://adxx.io/interaction" }
  ]
}

Filter composition is supported to arbitrary depths.
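To check complex filter definitions against a handful of sample records before running them on a realm, a local evaluator can be handy. The following is a deliberately simplified sketch, not xs's implementation: it assumes exact equality for match, inclusive bounds for range, and String.prototype.startsWith for prefix; xs itself may behave differently, e.g., for full-text matching on text fields. The recursion mirrors the fact that filters compose to arbitrary depth. The helper names are hypothetical.

```javascript
// Hypothetical constructors for the Boolean filters.
const andFilter = (...filters) => ({ type: "and", filters });
const orFilter = (...filters) => ({ type: "or", filters });

// Simplified local evaluation of a (possibly nested) filter against a flat record.
function evaluate(filter, record) {
  switch (filter.type) {
    case "and":
      return filter.filters.every((f) => evaluate(f, record));
    case "or":
      return filter.filters.some((f) => evaluate(f, record));
    case "range": {
      // Comparing timestamps as strings is only meaningful for ISO 8601
      // values written in the same format and time zone.
      const v = record[filter.field];
      if (v === undefined) return false;
      if (filter.lower !== undefined && v < filter.lower) return false;
      if (filter.upper !== undefined && v > filter.upper) return false;
      return true;
    }
    case "prefix":
      return typeof record[filter.field] === "string" &&
        record[filter.field].startsWith(filter.value);
    case undefined: // match filters carry no "type"
      return record[filter.field] === filter.value;
    default:
      throw new Error(`unsupported filter type: ${filter.type}`);
  }
}

// The complex filter from the example above.
const attemptedInteraction = andFilter(
  { field: "verb_id", value: "http://adxx.io/attempted" },
  { field: "object_id", value: "http://adxx.io/interaction" },
);
```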

Facts

Facts are the quantities of interest to be computed from the underlying statements. In the empty query above, the facts were empty (the "hits" number is a part of every query result, not a fact we requested) but we could have easily asked for the number of completed attempts among our filtered statements by adding the fact definition:

{ "type": "sum", "name": "c_completion", "field": "completion" }

By doing so, we find that about 5.2 million interaction attempts were completed. The remaining approximately 17 million attempts were started (i.e., the question was loaded) but never completed by the user.

{
  "hits": 22111995,
  "result": [{
    "key": "unit",
    "value": [ 5243731 ]
  }]
}

If we were to check the associated schema, we would find that completion is a boolean field. For this type of field, xs counts each true value as one and each false value as zero when computing a sum. The fact name (c_completion in the example) is a user-defined identifier that xs uses to identify the fact in some output formats. In JSON output such as the above, xs presents the facts in the order in which they were requested, which matters when a query requests more than one fact.
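The boolean-sum rule and the numbers above can be reproduced locally. A minimal sketch: sumFact and labelFacts are hypothetical helpers, and skipping statements that lack the field is our assumption, since the text does not say how xs treats missing values.

```javascript
// Sums a field over local records the way a sum fact is described:
// booleans count as 1 (true) or 0 (false), numbers are added as-is,
// and records without a boolean or numeric value are skipped (assumption).
function sumFact(records, field) {
  let total = 0;
  for (const record of records) {
    const v = record[field];
    if (typeof v === "boolean") total += v ? 1 : 0;
    else if (typeof v === "number") total += v;
  }
  return total;
}

// JSON results list fact values in request order, so names can be
// re-attached by position.
function labelFacts(facts, cell) {
  return Object.fromEntries(facts.map((fact, i) => [fact.name, cell.value[i]]));
}

const facts = [{ type: "sum", name: "c_completion", field: "completion" }];
const labelled = labelFacts(facts, { key: "unit", value: [5243731] });
const notCompleted = 22111995 - labelled.c_completion; // 16868264
```

The last line checks the rounding in the text: 22,111,995 hits minus 5,243,731 completed attempts leaves 16,868,264, roughly 17 million.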

The xs query language presently supports the following fact types:

Fact        Field types                          Details
sum         numeric, boolean                     { "type": "sum", "name": "NAME", "field": "KEY" }
hits        not field-related                    computes the number of statements each cell in a response tensor is based on:
                                                 { "type": "hits", "name": "NAME" }
topvalue    any (value field), any (sort field)  finds the statement with the top value in the sort field and yields the value it contains in the value field:
                                                 { "type": "topvalue", "name": "NAME", "valueField": "KEY1", "sortField": "KEY2" }
                                                 The default sort order on the sort field is descending, i.e., the statement with the highest value in the sort field is used. To reverse the sort order, add "sortDescending": false.
valuecount  categorical                          approximates the number of distinct values the target field assumes for each cell in a response tensor

Fact names must be non-empty, unique among all fact names of a query, and they must not use the reserved words key or doc_count.
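The fact table and naming rules translate into a few small constructors and one check. These are hypothetical helpers; valuecount is omitted because its JSON syntax is not given above.

```javascript
// Hypothetical constructors following the fact table.
const sumFactDef = (name, field) => ({ type: "sum", name, field });
const hitsFactDef = (name) => ({ type: "hits", name });
const topValueFactDef = (name, valueField, sortField, sortDescending = true) => {
  // e.g. valueField "verb_id", sortField "timestamp": the verb of the latest statement.
  const def = { type: "topvalue", name, valueField, sortField };
  // Descending is the default, so the flag is only sent when reversing it.
  if (!sortDescending) def.sortDescending = false;
  return def;
};

// Checks the fact naming rules: non-empty, unique, and not reserved.
const RESERVED_NAMES = new Set(["key", "doc_count"]);

function checkNames(defs) {
  const seen = new Set();
  for (const { name } of defs) {
    if (typeof name !== "string" || name === "") throw new Error("names must be non-empty");
    if (RESERVED_NAMES.has(name)) throw new Error(`reserved name: ${name}`);
    if (seen.has(name)) throw new Error(`duplicate name: ${name}`);
    seen.add(name);
  }
  return true;
}
```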

Axes

An axis defines one dimension along which we want to break down our response. xs supports queries of arbitrary dimensionality (resulting in so-called tensors, i.e., response matrices of arbitrary dimensionality), but some response formats limit the number of dimensions they can carry when returning the result over the network. For example, CSV responses are limited to at most two dimensions due to their tabular nature. In our empty query above, the result was zero-dimensional (i.e., reduced to a point), which is expressed by the special key unit in the response. If we wanted to break down the resulting 5.2 million completed attempts into those that were successful and those that were not, we would add the axis:

{ "name": "success", "field": "success" }

which yields a one-dimensional result with one fact value per axis value:

{
  "hits": 22111995, 
  "result": [ 
    { "key": "1", "value": [ 4254664 ] }, 
    { "key": "0", "value": [ 989067 ] } 
  ]
}

From the result, we can see that most completed attempts (about 81%) are successful. This might lead us to further inquiries about the difficulty level of our question database.

Axis definitions can presently only be based on categorical fields. Axis names must be unique among all axis names of a query, and they must not use the reserved words key or doc_count.
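What an axis does can be illustrated by grouping sample records locally and computing one fact per group, yielding cells shaped like the JSON result above. This is a sketch, not how xs computes results: the helper names are hypothetical, boolean keys are rendered as "1"/"0" to match the success example, records lacking the axis field are skipped, and cells appear in first-seen order, which need not match the order xs uses.

```javascript
// Groups records by the axis field and evaluates each fact function per group,
// returning cells of the form { key, value: [fact1, fact2, ...] }.
function breakdown(records, axisField, factFns) {
  const groups = new Map();
  for (const r of records) {
    if (!(axisField in r)) continue;
    const v = r[axisField];
    const key = typeof v === "boolean" ? (v ? "1" : "0") : String(v);
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(r);
  }
  return [...groups.entries()].map(([key, rows]) => ({
    key,
    value: factFns.map((fn) => fn(rows)),
  }));
}

// Fact function equivalent to summing the boolean completion field.
const completed = (rows) => rows.filter((r) => r.completion === true).length;
```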

Options

Options are global influencers of how the query is executed. For example, we can ask xs to return only the first ten cells of the result tensor if we are developing a client application (and are only interested in the general shape of the response), or if we are querying for a time-sensitive part of the user interface.

The following options are supported:

Option   Effect
size     Determines the maximum number of cells returned from a query. For one-dimensional queries, xs by default aims to select the most influential cells, as determined by the number of statements they contain. For multi-dimensional queries, cells are enumerated and returned according to their position in the response tensor, with the first dimension changing the slowest.

An example of query options might look like this:

"options": {
  "size": 1000
}
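The cell order described for multi-dimensional queries (first dimension changing the slowest) is ordinary row-major order. The sketch below enumerates cells in that order and stops after size cells; the function name is hypothetical, and xs's own truncation logic may differ in details.

```javascript
// axes[d] lists the keys along dimension d; returns cells as key tuples in
// row-major order, truncated to at most `size` cells.
function enumerateCells(axes, size = Infinity) {
  const cells = [];
  const walk = (depth, prefix) => {
    if (cells.length >= size) return;
    if (depth === axes.length) {
      cells.push(prefix);
      return;
    }
    for (const key of axes[depth]) {
      walk(depth + 1, [...prefix, key]);
      if (cells.length >= size) return;
    }
  };
  walk(0, []);
  return cells;
}
```

With no axes at all, the function returns a single empty tuple, which corresponds to the single unit cell of the empty query.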

Response Types

Up to this point, we have received our query results as flat JSON documents. xs supports various alternative response formats, which are selected either through query parameters or through request headers on the POST request.

The following query parameters are supported:

Parameter       Details
hierarchy       only supported for one-dimensional queries with a hierarchy-typed axis. When this parameter is enabled (hierarchy=true), xs returns a deeply nested JSON response that recreates the hierarchy, including descriptive titles if they are defined in the hierarchy.
locale          only supported for hierarchical responses. When this is set to a locale (e.g., locale=en or locale=de), xs aims to return titles in the specified language if they are available from the hierarchy. The system falls back to English if the desired locale is unavailable or if the locale parameter is not set.
fullHierarchy   only supported for hierarchical responses. By default, xs only returns the part of the hierarchy for which values were returned during the query. By setting fullHierarchy to true, the full hierarchy can be retrieved, where some of the nodes may have empty values. In database terminology, xs then performs an outer join instead of the default inner join.

The following headers are supported:

Header             Details
Accept: text/csv   return a tabular CSV representation instead of JSON
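Selecting a response format thus amounts to adding query parameters or an Accept header to the POST /query request. A minimal sketch with hypothetical helper names; it only adds locale and fullHierarchy when hierarchy is enabled, since both are described as supported only for hierarchical responses.

```javascript
// Hypothetical helper: builds the /query URL with optional response parameters.
function buildQueryUrl({ hierarchy = false, locale, fullHierarchy = false } = {}) {
  const url = new URL("https://xs.adlantics.com/query");
  if (hierarchy) {
    url.searchParams.set("hierarchy", "true");
    // locale and fullHierarchy only apply to hierarchical responses.
    if (locale) url.searchParams.set("locale", locale);
    if (fullHierarchy) url.searchParams.set("fullHierarchy", "true");
  }
  return url.toString();
}

// Hypothetical helper: request headers, optionally asking for CSV output
// (tabular, so at most two dimensions).
function queryHeaders(realm, token, { csv = false } = {}) {
  const headers = { "Realm": realm, "Authorization": `Bearer ${token}` };
  if (csv) headers["Accept"] = "text/csv";
  return headers;
}
```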

Notifications

xs provides a WebSocket-based interface that allows client applications to subscribe to notifications about changes in their realm. In order to receive notifications, open a WebSocket connection as shown in the following example:

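    const token = '...';
    const realm = '...';
    // Credentials travel as query parameters, so encode them to survive special characters.
    const wsUri = 'wss://xs.adlantics.com/notification?idToken=' + encodeURIComponent(token) +
                  '&realm=' + encodeURIComponent(realm);
    const websocket = new WebSocket(wsUri);

    websocket.onmessage = function (evt) {
      // the notification payload is available in evt.data
    };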

In the example, token and realm contain the same authentication details you would submit through request headers in an ordinary xs request. However, because current browser WebSocket implementations cannot send authentication headers (beyond basic authentication), these details must be passed as query parameters.

Validation

Coming soon

The adlantics library is currently under development - more coming soon!

Conclusion and Further Reading

Coming soon
