10. Indicators

An Indicator, in a nutshell, is a mathematically simple operation on a specific field of each document that produces a numeric result. The operation could be a count, summation, subtraction, division and so on. It could also be a check: whether a field has a value, whether that value is above a threshold, whether it is unique, or whether dividing it by another number gives a desired result.

Formally, an Indicator is a function that takes in a dataset as an argument and returns a single number.

\[f(\mathrm{Dataset}) = \mathrm{Number}\]

In other words, an Indicator is always applied to a dataset. The dataset here can be practically anything, from the whole dataset (the set of documents that the search query returned) to any subdataset.
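The definition above can be illustrated with a minimal Python sketch. The type names (Document, Dataset, Indicator) and the helper apply_to_subdatasets are hypothetical and are not part of TIM; they only show the idea of one function being applied to every subdataset.

```python
from typing import Any, Callable, Dict, List

# Hypothetical data model: a document maps field names to values.
Document = Dict[str, Any]
Dataset = List[Document]
# An Indicator takes a dataset and returns a single number.
Indicator = Callable[[Dataset], float]


def count_documents(dataset: Dataset) -> float:
    """Simplest possible indicator: the number of documents."""
    return float(len(dataset))


def apply_to_subdatasets(
    indicator: Indicator, subdatasets: Dict[str, Dataset]
) -> Dict[str, float]:
    """Apply the same indicator to each subdataset (e.g. each node)."""
    return {name: indicator(docs) for name, docs in subdatasets.items()}


subdatasets = {
    "Germany": [{"id": 1}, {"id": 2}, {"id": 3}],
    "Italy": [{"id": 4}],
}
sizes = apply_to_subdatasets(count_documents, subdatasets)
print(sizes)  # {'Germany': 3.0, 'Italy': 1.0}
```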

What can a subdataset be?

Indicators are used everywhere throughout TIM, in one form or another.

The simplest example is the standard Organisations page that is available by default: the size of each subdataset/node (=organisation) signifies the number of documents affiliated with the organisation. This number is calculated by simply applying a count indicator to each node. By applying a different indicator to the node, the size of the node could signify a different quantity, for example how active an organisation is, or the total cost of EU projects of a country per year.

Important

Each Indicator needs a dedicated Custom Page utilizing it in order to display it. Inside the Custom Page settings, there is a choice to apply the indicator on the node (nIndicator) or on the edge (eIndicator), or on both.

The Indicators available are split into the following categories.

10.1. Common

10.1.1. Count

The Count indicator returns the number of documents with any value in the specified field. If no field is specified, it just counts the number of documents available. With a field specified, a max/min cardinality (size) limit can be imposed on the field, i.e. if a field in the document has more than MAX or less than MIN values, the document will be ignored in the counting.

So, creating a Count Indicator with emm_affiliation__ename as the field and specifying min_cardinality=3 will count the number of documents for each subdataset that contain at least three affiliations, e.g. a publication with at least three authors.

Example

For the above case, let’s say we apply the Indicator to nodes that represent countries, e.g. in a page based on the Location>Countries page. Each country/node here is a subdataset. Let’s say one of the nodes, Germany, has 80 publications in total, and only five of them have at least three authors; all the others have a single author. When the Indicator is applied to the Germany node, it goes through each publication and counts only these five publications. In the end, the size of the Germany node will be 5.

Let’s say now that we wanted to see how much of a “team player” a country is when it comes to international collaborations. We could apply the same Indicator to a Location>Countries-type page again, but this time with emm_affiliation__country as the field in the indicator and min_cardinality=2. This would then show countries with collaborations on an international level, not just between different authors. We could go one step further and create a duplicate Countries page with another indicator, this time with max_cardinality=1. This would show solo players, and we would be able to compare the two pages with each other.
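The counting rule with cardinality limits can be sketched as follows. This is only an illustration under assumptions: documents are modelled as Python dicts whose fields hold lists of values, and count_indicator is a hypothetical function, not TIM's implementation.

```python
from typing import Any, Dict, List, Optional


def count_indicator(
    docs: List[Dict[str, Any]],
    field: Optional[str] = None,
    min_cardinality: Optional[int] = None,
    max_cardinality: Optional[int] = None,
) -> int:
    """Count documents that have a value in `field`, within the limits."""
    if field is None:
        # No field specified: just count the documents.
        return len(docs)
    total = 0
    for doc in docs:
        n = len(doc.get(field) or [])
        if n == 0:
            continue  # no value in the field
        if min_cardinality is not None and n < min_cardinality:
            continue  # fewer than MIN values: ignored
        if max_cardinality is not None and n > max_cardinality:
            continue  # more than MAX values: ignored
        total += 1
    return total


FIELD = "emm_affiliation__ename"
# The Germany node from the example: 80 publications, 5 with three entries.
germany = [{FIELD: ["Org X", "Org Y", "Org Z"]} for _ in range(5)]
germany += [{FIELD: ["Org X"]} for _ in range(75)]

print(count_indicator(germany))                            # 80
print(count_indicator(germany, FIELD, min_cardinality=3))  # 5
print(count_indicator(germany, FIELD, max_cardinality=1))  # 75
```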

10.1.2. Distinct

The Distinct indicator, when applied to a subdataset, returns the number of distinct (non-identical) values of a specified field inside the documents of the (sub)dataset.

Example

Let’s say we are viewing a Location>Countries page, so each node is a country. Each country, thus, is a subdataset. We create a Distinct indicator with emm_author as the specified field.

Node1: Germany

doc1 authors: A, B, C

doc2 authors: A, C

doc3 authors: D

The distinct indicator will return 4.

Node2: Italy

doc1 authors: E, F, G

doc2 authors: E, G

doc3 authors: E, F

doc4 authors: G

The distinct indicator will return 3.
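The two nodes above can be reproduced with a short Python sketch. The dict-based document model and the function name distinct_indicator are assumptions made for illustration only.

```python
from typing import Any, Dict, List


def distinct_indicator(docs: List[Dict[str, Any]], field: str) -> int:
    """Number of distinct values of `field` across all documents."""
    seen = set()
    for doc in docs:
        seen.update(doc.get(field) or [])
    return len(seen)


germany = [
    {"emm_author": ["A", "B", "C"]},
    {"emm_author": ["A", "C"]},
    {"emm_author": ["D"]},
]
italy = [
    {"emm_author": ["E", "F", "G"]},
    {"emm_author": ["E", "G"]},
    {"emm_author": ["E", "F"]},
    {"emm_author": ["G"]},
]

print(distinct_indicator(germany, "emm_author"))  # 4
print(distinct_indicator(italy, "emm_author"))    # 3
```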

10.2. Time

10.2.1. Activeness

The Activeness indicator returns the number of documents of a subdataset in a specific time range, divided by the total number of documents of the subdataset. Thus, highly active nodes (organisations, countries etc) in that time range will get a score closer to 1 and highly inactive nodes will get a score closer to 0.

[Figure: Activeness indicator settings]

10.2.1.1. startyear

The beginning of the time range in which documents are going to be counted.

10.2.1.2. stopyear

The ending of the time range in which documents are going to be counted.

10.2.1.3. triggerMore

The minimum total number of documents in the subdataset. If the subdataset has fewer documents than this threshold, the Activeness indicator returns zero.

Example

We compare three organisations’ Activeness indicator for the years 2015-2017, with a triggerMore threshold of 20 documents.

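| Nr. of documents | 2013 | 2014 | 2015 | 2016 | 2017 | Last 3 years | Total |
|------------------|------|------|------|------|------|--------------|-------|
| Organisation A   | 2    | 1    | 1    | 5    | 4    | 10           | 13    |
| Organisation B   | 2    | 1    | 4    | 7    | 12   | 23           | 26    |
| Organisation C   | 6    | 6    | 6    | 3    | 2    | 11           | 23    |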

Organisation A node:

The total number of documents is 13, which is below the selected triggerMore value. The Activeness indicator will then return 0 for this node.

Organisation B node:

The total number of documents is 26, which is above the given triggerMore value. The Activeness indicator will then return (4 + 7 + 12)/26 = 0.88: the node is more active lately.

Organisation C node:

The total number of documents is 23, which is also above the given triggerMore value. The Activeness indicator will then return (6 + 3 + 2)/23 = 0.48: the node is not as active as it used to be.
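The three cases can be checked with a small Python sketch. It assumes each node is summarised as a mapping from year to number of documents; the function name and the snake_case parameter names are hypothetical stand-ins for the startyear, stopyear and triggerMore settings.

```python
from typing import Dict


def activeness_indicator(
    docs_per_year: Dict[int, int],
    startyear: int,
    stopyear: int,
    trigger_more: int = 0,
) -> float:
    """Share of a node's documents that fall inside [startyear, stopyear]."""
    total = sum(docs_per_year.values())
    if total == 0 or total < trigger_more:
        return 0.0  # too few documents overall: zero result
    in_range = sum(
        n for year, n in docs_per_year.items() if startyear <= year <= stopyear
    )
    return in_range / total


organisations = {
    "A": {2013: 2, 2014: 1, 2015: 1, 2016: 5, 2017: 4},
    "B": {2013: 2, 2014: 1, 2015: 4, 2016: 7, 2017: 12},
    "C": {2013: 6, 2014: 6, 2015: 6, 2016: 3, 2017: 2},
}
scores = {
    name: round(activeness_indicator(counts, 2015, 2017, trigger_more=20), 2)
    for name, counts in organisations.items()
}
print(scores)  # {'A': 0.0, 'B': 0.88, 'C': 0.48}
```

Organisation C has 11 documents in 2015-2017 out of 23 in total, so its score is 11/23, roughly 0.48.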

10.2.2. peak

The peak indicator returns the largest change (delta) in the number of documents of a subdataset from one year to the next, i.e. its maximum peak, within the selected time range.

[Figure: peak indicator settings]

10.2.2.1. startyear

The beginning of the time range in which documents are going to be counted.

10.2.2.2. stopyear

The ending of the time range in which documents are going to be counted.

10.2.2.3. triggerMore

The minimum total number of documents in the subdataset. If the subdataset has fewer documents than this threshold, the peak indicator returns zero.

Example

If an entity has a large peak, say 50 documents from one year to the next, but the total number of its documents is 55 and the triggerMore option is set to 60, the indicator will return a zero value.

10.2.2.4. threshold

The lowest number of documents for a delta to be considered.

Example

For the previous example, if threshold is additionally set to 20, the indicator will return zero for every entity whose biggest peak is lower than 20 documents.

10.2.3. peakyear

The peakyear indicator is the same as the peak indicator, but instead returns the year in which the maximum peak of the subdataset occurred.
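The peak and peakyear indicators can be sketched together in Python. This sketch rests on several assumptions that the text does not confirm: a "delta" is taken to be the increase in documents from one year to the next, triggerMore is compared against the total over all years, and the function and parameter names are hypothetical.

```python
from typing import Dict, Optional, Tuple


def peak_and_peakyear(
    docs_per_year: Dict[int, int],
    startyear: int,
    stopyear: int,
    trigger_more: int = 0,
    threshold: int = 0,
) -> Tuple[int, Optional[int]]:
    """Return (largest year-to-year increase, year in which it occurred)."""
    total = sum(docs_per_year.values())
    if total < trigger_more:
        return 0, None  # too few documents overall: zero result
    best_delta, best_year = 0, None
    for year in range(startyear + 1, stopyear + 1):
        # Assumed definition of delta: growth compared to the previous year.
        delta = docs_per_year.get(year, 0) - docs_per_year.get(year - 1, 0)
        if delta > best_delta:
            best_delta, best_year = delta, year
    if best_delta < threshold:
        return 0, None  # biggest peak is too small to be considered
    return best_delta, best_year


# 55 documents in total, with a jump of 50 documents in 2017.
entity = {2015: 1, 2016: 2, 2017: 52}

print(peak_and_peakyear(entity, 2015, 2017, trigger_more=60))
# (0, None)
print(peak_and_peakyear(entity, 2015, 2017, trigger_more=50, threshold=20))
# (50, 2017)
```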

10.2.4. newness

10.3. Number

10.3.1. sum

The sum indicator takes a field as its argument, and sums its numeric values for the whole (sub)dataset.

[Figure: sum indicator settings]

Example

We want to sum the total project expenditure of the EU on countries. We select emm_affiliation__eugrant as field in the indicator settings above. We then go on to create a Custom Page, select emm_affiliation__country in the field to group option and then select this new Indicator in the nIndicator field.

Important

The quality of this kind of Indicator depends on the availability of data provided by the respective sources. Please be aware that some values, like the cost in the above example, might be available for only a fraction of the projects.
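As a rough sketch of the example above, the following Python code groups documents by country and sums the grant field per group, skipping documents where no cost is available. The dict-based document model, the sample figures and the helper names group_by and sum_indicator are all hypothetical.

```python
from collections import defaultdict
from typing import Any, Dict, List

Doc = Dict[str, Any]


def group_by(docs: List[Doc], field: str) -> Dict[str, List[Doc]]:
    """Split documents into subdatasets, one per value of `field`."""
    groups: Dict[str, List[Doc]] = defaultdict(list)
    for doc in docs:
        for key in set(doc.get(field) or []):
            groups[key].append(doc)
    return groups


def sum_indicator(docs: List[Doc], field: str) -> float:
    """Sum the numeric values of `field`; missing values contribute nothing."""
    total = 0.0
    for doc in docs:
        for value in doc.get(field) or []:
            if isinstance(value, (int, float)):
                total += value
    return total


# Made-up sample projects; the last one has no cost information.
projects = [
    {"emm_affiliation__country": ["DE"], "emm_affiliation__eugrant": [100000]},
    {"emm_affiliation__country": ["DE"], "emm_affiliation__eugrant": [50000]},
    {"emm_affiliation__country": ["IT"], "emm_affiliation__eugrant": [25000]},
    {"emm_affiliation__country": ["IT"], "emm_affiliation__eugrant": []},
]

nodes = group_by(projects, "emm_affiliation__country")
totals = {
    country: sum_indicator(docs, "emm_affiliation__eugrant")
    for country, docs in nodes.items()
}
print(totals)  # {'DE': 150000.0, 'IT': 25000.0}
```

Note that the Italian total covers only one of the two projects, which is exactly the data availability caveat described above.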

10.3.2. min

The min indicator returns the minimum of all the values in the selected field in the (sub)dataset.

10.3.3. max

The max indicator returns the maximum of all the values in the selected field in the (sub)dataset.

10.3.4. mean

The mean indicator returns the average of all the values in the selected field in the (sub)dataset.

[Figure: mean indicator settings]

Example

We want to see the average number of normalized citations for each author. We select emm_citedbyn as field in the indicator settings above. We then go on to create a Custom Page, select emm_author__name in the field to group option and then select this new Indicator in the nIndicator field. The same indicator could also be used e.g. for an emm_affiliation__ename or an emm_affiliation__ecountry Custom Page.
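The min, max and mean indicators differ only in the aggregation they apply, which the following Python sketch illustrates for a single author node. The document model, the sample values and the function names are assumptions; in particular, returning None when no document has a value is a choice made here, not documented behaviour.

```python
from statistics import mean
from typing import Any, Callable, Dict, List, Optional


def numeric_values(docs: List[Dict[str, Any]], field: str) -> List[float]:
    """Collect the numeric values of `field`, skipping documents without one."""
    return [doc[field] for doc in docs if isinstance(doc.get(field), (int, float))]


def aggregate(
    docs: List[Dict[str, Any]],
    field: str,
    func: Callable[[List[float]], float],
) -> Optional[float]:
    """Apply min, max or mean to the field values; None if there are none."""
    values = numeric_values(docs, field)
    return func(values) if values else None


# Made-up publications of one author node.
author_docs = [
    {"emm_author__name": "Author A", "emm_citedbyn": 1.5},
    {"emm_author__name": "Author A", "emm_citedbyn": 0.5},
    {"emm_author__name": "Author A", "emm_citedbyn": 1.0},
    {"emm_author__name": "Author A"},  # no citation value available
]

print(aggregate(author_docs, "emm_citedbyn", min))   # 0.5
print(aggregate(author_docs, "emm_citedbyn", max))   # 1.5
print(aggregate(author_docs, "emm_citedbyn", mean))  # 1.0
```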

10.3.5. hindex

This is an author-level metric that attempts to measure both the productivity and citation impact of the publications of a scientist or scholar. For more information on what the h-index is, see e.g. the relevant Wikipedia article.

This Indicator should be displayed in a Custom Page that uses the field emm_author__name.

[Figure: hindex indicator settings]

10.3.5.1. field

Here, the field to calculate the h-index should be selected. This field can only be emm_citedby.

10.3.5.2. Number

This is the number of citations in an article to consider.

10.3.5.3. autoref

This excludes the author's references to their own works (self-citations).
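For orientation, the standard h-index (the largest h such that h publications have at least h citations each) can be computed as in the sketch below. How TIM identifies self-citations is not described here, so the self_citations key is purely hypothetical, and the Number setting is not modelled at all.

```python
from typing import Any, Dict, List


def h_index(citations: List[int]) -> int:
    """Largest h such that h publications have at least h citations each."""
    h = 0
    for rank, cited in enumerate(sorted(citations, reverse=True), start=1):
        if cited >= rank:
            h = rank
        else:
            break
    return h


def hindex_indicator(docs: List[Dict[str, Any]], autoref: bool = False) -> int:
    """h-index over emm_citedby; optionally drop (hypothetical) self-citations."""
    counts = []
    for doc in docs:
        cited = doc.get("emm_citedby", 0)
        if autoref:
            # "self_citations" is an assumed key, used only for this sketch.
            cited -= doc.get("self_citations", 0)
        counts.append(max(cited, 0))
    return h_index(counts)


# Made-up publications of one author.
author_docs = [
    {"emm_citedby": 10, "self_citations": 0},
    {"emm_citedby": 8, "self_citations": 1},
    {"emm_citedby": 5, "self_citations": 2},
    {"emm_citedby": 4, "self_citations": 1},
    {"emm_citedby": 3, "self_citations": 0},
]

print(hindex_indicator(author_docs))                # 4
print(hindex_indicator(author_docs, autoref=True))  # 3
```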

10.4. Field

10.4.1. Coverage

The coverage indicator takes a field and a value as its arguments, counts all the documents that contain that value in the field at least once, and returns the ratio of the number of these documents to the total number of documents.

Example

Let’s say we have 4 documents and we want to find what is the percentage of documents that contain at least one EU organisation. We apply a coverage indicator with field: emm_affiliation__eeucountry and value: “EU”.

doc1 emm_affiliation__eeucountry: EU, US, Australia

doc2 emm_affiliation__eeucountry: EU, EU, Japan

doc3 emm_affiliation__eeucountry: China

doc4 emm_affiliation__eeucountry: S. Korea, S. Korea

The coverage indicator is going to return (1 + 1 + 0 + 0)/4 = 0.5.

This essentially tells us what percentage of research is (co-)developed by the EU member states.
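The four-document example can be reproduced with a minimal Python sketch. The dict-based document model and the function name coverage_indicator are assumptions for illustration.

```python
from typing import Any, Dict, List


def coverage_indicator(docs: List[Dict[str, Any]], field: str, value: Any) -> float:
    """Share of documents whose `field` contains `value` at least once."""
    if not docs:
        return 0.0
    hits = sum(1 for doc in docs if value in (doc.get(field) or []))
    return hits / len(docs)


FIELD = "emm_affiliation__eeucountry"
docs = [
    {FIELD: ["EU", "US", "Australia"]},
    {FIELD: ["EU", "EU", "Japan"]},  # counted once, although "EU" appears twice
    {FIELD: ["China"]},
    {FIELD: ["S. Korea", "S. Korea"]},
]

print(coverage_indicator(docs, FIELD, "EU"))  # 0.5
```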