14. Fields in TIM

In TIM, documents are stored as a combination of fields and values. A field can be, for example, the title of the document or the year. Some fields come directly from the original (raw) data, and others are the result of further data processing. The field-value pairs are stored in a search engine, and it is this search engine that a query is run against.

In addition, there is a special group of fields that are generated after the data has been stored in the search engine.
These fields are derived from other fields by transforming the data. They cannot be used in a search when creating a dataset, but they are present in some default pages and can be used to create custom pages and indicators.

14.1. Field Index

In the tables below, each field is marked either with a checkmark (checkmark), meaning it is a normal, searchable field, or with an x-mark (xmark), meaning it cannot be used in the search to create a dataset.
The field types shown in the tables are explained in section Field Types.

14.1.1. Full list of common fields

The fields in this section are common to all types of documents and not specific to any source. For the meaning of column Type, please see section Terms.

14.1.1.1. Fields relating to the document

Field

Description

Searchable

Type

guid

Doc ID
Unique ID of the document

checkmark

string

class

Class
Type of document

checkmark

string

source

Source
Source of the document (database)

checkmark

string

title

Title
Title of the document

checkmark

text

link

Link
External link to the document

checkmark

string

description

Abstract
Abstract of the document

checkmark

text

emm_year

Year
Publication year of the Document

checkmark

string

Examples

guid:S_2-s2.0-0000198629

class:article

emm_year:2012. For year range searches use: emm_year:[2010 TO 2012]

source:scopus
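The examples above can also be assembled programmatically. The following is a minimal sketch, assuming queries are plain strings and that clauses can be joined with AND as in the other examples of this section; the helper names `term` and `year_range` are hypothetical and not part of TIM.

```python
def term(field: str, value: str) -> str:
    """Return a `field:value` clause for a single-token value."""
    return f"{field}:{value}"


def year_range(first: int, last: int) -> str:
    """Return an emm_year range clause in the [first TO last] form shown above."""
    if first > last:
        raise ValueError("first year must not be after last year")
    return f"emm_year:[{first} TO {last}]"


# Combine three clauses into one query string (hypothetical usage).
query = " AND ".join([
    term("class", "article"),
    term("source", "scopus"),
    year_range(2010, 2012),
])
print(query)  # class:article AND source:scopus AND emm_year:[2010 TO 2012]
```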

14.1.1.3. Fields relating to affiliations

For an explanation of the processing applied to affiliations, as well as the meaning of some field names, see Affiliation processing.

Also, please keep in mind that these fields are available as long as the respective data is available in the specific group of documents you are searching in. Affiliation information, for example, doesn’t exist for Semantic Scholar data, whereas it does exist for Patstat and Cordis (which are in the same group of documents).

Field

Description

Searchable

Type

emm_affiliation__address

Organisation address
Address of the organisation from raw data

checkmark

string

emm_affiliation__afref

Organisation reference (raw)
Organisation reference used to create the author/affiliation correspondence in the document.

checkmark

string

emm_affiliation__city

City (raw)
City of the Organisation from raw data

checkmark

string

emm_affiliation__country

Country (raw)
Country of the Organisation from raw data

checkmark

string

emm_affiliation__countryCode

Country Code
Country code of the Organisation
For a list of codes, see here.

checkmark

string

emm_affiliation__eafid

Organisation ID from the Entity Matcher
ID of the organisation in TIM’s Entity Matcher

checkmark

string

emm_affiliation__ecity

City (processed)
City of the Organisation (processed)

checkmark

string

emm_affiliation__ecountry

Country (processed)
Country of the Organisation

checkmark

string

emm_affiliation__ecountryCode

Country Code (processed)
Country code of the Organisation
For a list of codes, see here.

xmark

string

emm_affiliation__eeocountry

EU countries only (processed)
Values are available only for EU countries, otherwise blank.

xmark

string

emm_affiliation__eeucountry

EU vs World countries (processed)
Indicates if country belongs to EU or name of the non-EU country.

checkmark

string

emm_affiliation__eeucountryCode

EU vs World countries Code (processed)
Indicates if country belongs to EU or code of the non-EU country.

xmark

string

emm_affiliation__ename

Organisation (processed)
Organisation name

checkmark

text

emm_affiliation__enuts

NUTS3 region
Administrative divisions of countries for statistical purposes (NUTS3)

checkmark

string

emm_affiliation__enuts2

NUTS2 region
Level 2 of administrative divisions of countries for statistical purposes (NUTS2)

checkmark

string

emm_affiliation__esnuts2

NUTS2 region (code with description)
Level 2 of administrative divisions of countries for statistical purposes (NUTS2)

xmark

string

emm_affiliation__esnuts3

NUTS3 region (code with description)
Level 3 of administrative divisions of countries for statistical purposes (NUTS3)

xmark

string

emm_affiliation__eucountry

EU vs World countries (raw)
Indicates if country belongs to EU or name of the non-EU country.

checkmark

string

emm_affiliation__eucountryCode

EU vs World countries code (raw)
Indicates if country belongs to EU or code of the non-EU country.

xmark

string

emm_affiliation__mrgecity

City (merged)
Organisation city (merged)

xmark

string

emm_affiliation__mrgecountry

Country (merged)
Organisation country (merged)

xmark

string

emm_affiliation__mrgecountryCode

Country Code (merged)
Organisation country code (merged)

xmark

string

emm_affiliation__mrgeeocountry

EU countries only (merged)
Values are available only for EU countries, otherwise blank.

xmark

string

emm_affiliation__mrgeeucountry

EU vs World countries (merged)
Indicates if country belongs to EU or name of the non-EU country.

xmark

string

emm_affiliation__mrgeeucountryCode

EU vs World countries code (merged)
Indicates if country belongs to EU or code of the non-EU country.

xmark

string

emm_affiliation__mrgename

Organisation (merged)
Organisation name (merged)

xmark

text

emm_affiliation__mrgenuts

NUTS3 code (merged)

xmark

string

emm_affiliation__mrgesnuts2

NUTS2 with description (merged)
NUTS (Nomenclature of Territorial Units for Statistics) level 2

xmark

string

emm_affiliation__mrgesnuts3

NUTS3 with description (merged)
NUTS (Nomenclature of Territorial Units for Statistics) level 3

xmark

string

emm_affiliation__name

Organisation (raw)
Organisation name from raw data

checkmark

text

emm_affiliation__nameVariant

Organisation name variant
One or more variants of the organisation name

checkmark

text

emm_affiliation__orgtype

Organisation type (processed)
Organisation type based on information extracted from the affiliation full name. Possible Values:
Company,
Foundation,
Hospital,
Research Centre,
University.

xmark

string

Examples

Some of the affiliation-related fields for two universities, one in Germany and one in the US.

Hamburg University (Germany)

Portland State University (US)

emm_affiliation__name:"University of Hamburg"

emm_affiliation__name:"Portland State University"

emm_affiliation__country:"Germany"

emm_affiliation__country:"United States"

emm_affiliation__enuts2:"DE60"

emm_affiliation__enuts2:"_"

emm_affiliation__city:"Hamburg"

emm_affiliation__city:"Portland"

emm_affiliation__orgtype:"University"

emm_affiliation__orgtype:"University"

emm_affiliation__eucountry:"EU"

emm_affiliation__eucountry:"United States"

emm_affiliation__eeocountry:"Germany"

emm_affiliation__eeocountry:"_"
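Since only the fields marked with a checkmark can be used in a dataset search, it can help to check a list of fields before building a query. The sketch below copies the Searchable column for a few rows of the affiliation table into a dictionary; the dictionary and the function name `non_searchable` are illustrative assumptions, not part of TIM.

```python
# Searchable flags copied from a few rows of the affiliation table:
# True = checkmark (usable in a dataset search), False = xmark.
SEARCHABLE = {
    "emm_affiliation__name": True,
    "emm_affiliation__country": True,
    "emm_affiliation__city": True,
    "emm_affiliation__enuts2": True,
    "emm_affiliation__eucountry": True,
    "emm_affiliation__eeocountry": False,
    "emm_affiliation__orgtype": False,
    "emm_affiliation__mrgename": False,
}


def non_searchable(fields):
    """Return the listed fields that are marked xmark, in input order."""
    return [f for f in fields if SEARCHABLE.get(f) is False]


wanted = ["emm_affiliation__country", "emm_affiliation__orgtype"]
print(non_searchable(wanted))  # ['emm_affiliation__orgtype']
```

Fields missing from the dictionary are simply not reported, so the dictionary would need to be extended with the remaining rows before relying on it.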

14.1.1.4. Fields relating to authors

Field

Description

Searchable

Type

emm_author__name

Author/Inventor name
Name of the author or inventor

checkmark

text

emm_author__afref

Author/Inventor afid of the affiliation
Id of the affiliation of the author or inventor

checkmark

string

emm_author__auref

Author/Inventor reference (raw)
Id of the author or inventor

checkmark

string

Examples

emm_author__name:"Steinfeld A.", "Kuhn P."

14.1.3. Fields Specific to Semantic Scholar

Semantic Scholar is a free repository of scientific literature by the Allen Institute for AI.

When working with Semantic Scholar data, some fields can be used that are otherwise not available. These are mainly the fields relating to journals and citations, plus some categories that apply only to certain publications.

As a reminder, the documents being referred to here can be identified in TIM by including source:SemanticScholar in your search.

Field Name

Description

Searchable

Type

emm_author_ids

Semantic Scholar author ID
IDs of the authors

checkmark

string

emm_doi

DOI
Digital Object Identifier

checkmark

string

emm_doiUrl

DOI URL
URL of the Digital Object Identifier

checkmark

string

emm_inCitations

Document ID(s) of citing documents
ID (guid) of the documents that are citing this document

checkmark

string

emm_journalName

Journal Title
Title of the journal where the article is published

checkmark

emm_journalPages

Journal Pages
Interval of pages of Volume of the Journal where the article is published

checkmark

emm_journalVolume

Journal Volume
Number of the Volume of the Journal where the article is published

checkmark

emm_outCitations

Document ID(s) of cited documents
ID (guid) of the documents that are cited by this document

checkmark

string

emm_pdfRelatedLink

PDF Link
Link of the article in PDF format

checkmark

string

emm_pmid

Pubmed ID
Unique ID in PubMed repository

checkmark

string

emm_source

Source
Original source of the article

checkmark

string

emm_sourcefile

Source path
Path to the original source

checkmark

string

emm_venue

Conference Venue
Conference venue in the case of a conference proceeding, otherwise, copy of
emm_journalName

checkmark

string

Examples

emm_journalName:"The Journal of infectious diseases"

emm_doiUrl:"https://doi.org/10.1093/infdis%2Fjiv078"

emm_journalPages:"694-701"

emm_journalVolume:"212 5"

emm_source:Medline

14.1.4. Fields Specific to CORD-19

When working with CORD-19 data, some specific fields can be used that are otherwise not available. These fields are mainly the ones relating to some IDs only applicable to some specific publications and to the original source of the documents.

Field Name

Description

Searchable

Type

emm_affiliation__laboratory

More detailed information on the affiliation, if it is referring to a laboratory.

checkmark

string

emm_affiliation__region

Generic regional information for the affiliation, could be province, city, state etc.

checkmark

string

emm_affiliation__zip

ZIP code of the affiliation.

checkmark

string

emm_cord_uid

Unique ID for CORD-19 dataset

checkmark

string

emm_license

Fulltext licensing

checkmark

string

emm_MicrosoftAcademicPaperID

Microsoft Academic (MAG) entity ID supplied by CORD-19

checkmark

string

emm_pmcid

PubMed Central reference number

checkmark

string

emm_pubmed_id

PubMed reference number

checkmark

string

emm_source_x

Specific source of the document, i.e. whether it’s coming from PMC, bioRxiv, medRxiv, WHO, CZI, Elsevier

checkmark

string

emm_WHOCovidence

Unique document ID, associated with the dataset provided by WHO

checkmark

string

Examples

emm_source_x:who (papers provided/curated by WHO)

emm_affiliation__laboratory:UMR AND emm_year:2020 (this year’s papers where CNRS is involved)

14.1.5. Fields Specific to Cordis

Included here are fields for searching in Cordis (see below). In order to make a search specifically for Cordis projects only, class:euproject should be included in the query.

Field Name

Description

Searchable

Type

emm_acronym

Project acronym
This is the short name of the project.

checkmark

string

emm_totalCost

Total cost of the project (in EUR).
For some of the actions, the full cost of the project is not funded by the EU.
This field represents the sum of the EU contribution plus the one by third parties (if any).

checkmark

string

emm_affiliation__role

Role of the organisations involved in the project.
Each affiliation can only have one role.

checkmark

string

emm_programme

Funding programme name
The acronym of the Programme under which the Project has been supported.

checkmark

string

emm_projectid

Project ID
ID of the project

checkmark

string

emm_euroSciVoc

European Science Vocabulary
For more info see here: https://op.europa.eu/en/web/eu-vocabularies/euroscivoc

checkmark

string

14.1.5.1. Fields Specific to Cordis

Cordis is the European Commission’s primary public repository and portal to disseminate information on all EU-funded research projects.

When working with Cordis data, some specific fields can be used that are otherwise not available. These fields are mainly the ones relating to the identification of the specific EU research project.

As a reminder, the documents being referred to here can be identified in TIM by including source:cordis in the search.

Field Name

Description

Searchable

Type

emm_programme

Funding programme name
The acronym of the Programme under which the Project has been supported.

checkmark

string

emm_acronym

Project acronym
This is the short name of the project.

checkmark

string

emm_projectid

Project ID
ID of the project in Cordis

checkmark

string

emm_call

Call for proposal of the EU programme.
These are the different “Funding opportunities” of the programme.
They are targeted towards specific scientific fields.

checkmark

string

emm_eutopic

Topic of the call.
Some of the Calls of the EU programmes are subdivided into different topics.

They are targeted towards specific topics in a broader scientific field.

checkmark

string

emm_euscheme

Funding Scheme of the Call.
These give information on the type of action that is funded.

The scheme will determine: the scope of what is funded,
the reimbursement rate and specific evaluation criteria to qualify for funding.

checkmark

string

emm_eusubject

Subject Index Classification code of the Project (seems to have been
discontinued in H2020?).

See the full list of subjects in FP7.

checkmark

string

emm_year

Starting year
Year of the starting date of the project.

checkmark

string

emm_countrycoordinator

Country of the coordinator of the project.
Projects of EU research programmes have one coordinator organisation.
This field is the country of the coordinator organisation.

checkmark

string

emm_eugrant

EU contribution to the project (in EUR).
Amount of money that was granted to the project by the EU.
This field is searchable, but only exact amounts can be searched for.
See emm_eugrantn.

checkmark

string

emm_eugrantn

EU contribution to the project (in EUR). In numerical format
This field can be searched using ranges.

checkmark

string

emm_totalCost

Total cost of the project (in EUR).
For some of the actions, the full cost of the project is not funded by the EU.
This field represents the sum of the EU contribution plus the one by third parties (if any).

checkmark

string

emm_report__summary
NOT in use YET!

Summary of the Project report
Summary of the context and overall objectives of the project

checkmark

string

emm_report__workPerformed
NOT in use YET!

Work performed during the project
Work performed from the beginning of the project to the end of the period covered by the
report and main results achieved so far

checkmark

string

emm_report__finalResults
NOT in use YET!

Final Results of the Project
Progress beyond the state of the art and expected potential impact (including the
socio-economic impact and the wider societal implications of the project so far)

checkmark

string

emm_refs__doi
NOT in use YET!

DOI of related publication
Publications resulting from funded projects are reported by providing their DOI

checkmark

string

Examples

emm_programme:h2020 Retrieves all EU research projects under Horizon 2020 (H2020).

emm_acronym:NANOPAD

emm_projectid:33017

emm_call:H2020-MSCA-IF-2014

emm_eutopic:MSCA-IF-2014-EF

emm_euscheme:MSCA-IF-EF-ST

emm_eusubject:(LIF OR MED OR SCI)

This searches for all the projects that have been classified as Life Science (LIF), Medicine, Health (MED) or Scientific Research (SCI).

emm_countrycoordinator:UK

emm_eugrant:344050

emm_eugrantn:[300000 TO 400000]

emm_totalCost:344050

Note

Concerning the field emm_call, H2020 includes the following main types of action:

Research & Innovation actions
Innovation actions
Coordination & support actions
Grants of the European Research Council (ERC) to support frontier research
Marie Skłodowska-Curie actions (MSCA)
COFUND actions
Procurement actions
SME instrument.
For more information make sure to check the

14.1.5.2. Fields relating to Cordis affiliations

The affiliation information related to Cordis is very similar to the information available for other types of documents in TIM (publications, patents etc). For the organisation name, country, and so on, please refer to the same fields already detailed in the section Fields relating to affiliations.

However, some extra fields are used to give more information on the organisations participating in the EU research programmes.

Field Name

Description

Searchable field

Possible Values

emm_affiliation__role

Role of the organisations involved in the project.
Each affiliation can only have one role.

checkmark

coordinator
beneficiary
participant
hostInstitution

emm_affiliation__pic

Participant Identification Code (PIC)
All participants in the EU research programme need a unique identifier, the PIC, which remains the same across programmes and calls. You can search for an organisation using its PIC here.

checkmark

emm_affiliation__entityType

Participant organisation type.
Participants in the EU research programme need to declare what type of organisation they are.

checkmark

HES (Higher or Secondary Education)
PRC [Private for profit (excluding education)]
REC (Research Organisation)
PUB [Public Body (excluding research and education)]
OTH (Other).

emm_affiliation__eugrant

EU contribution to the specific participant.
The amount of money that was granted to the participant by the EU for the completion of the project (the sum of all participants’ eugrant should match the total eugrant).

checkmark

Examples

emm_affiliation__role:coordinator

emm_affiliation__pic:997153502

emm_affiliation__entityType:PRC

emm_affiliation__eugrant:170121,6
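The description of emm_affiliation__eugrant states that the participants' grants should add up to the project's total EU grant. A minimal sketch of that consistency check follows. It assumes that amounts are exported as strings and that a comma, as in 170121,6, is a decimal separator; the second participant amount and both function names are hypothetical.

```python
from decimal import Decimal


def parse_amount(raw: str) -> Decimal:
    """Parse an amount string, treating a comma as the decimal separator (assumption)."""
    return Decimal(raw.replace(",", "."))


def grants_match(participant_grants, total, tolerance=Decimal("0.01")):
    """Return True if the participant grants sum to the total within the tolerance."""
    summed = sum((parse_amount(g) for g in participant_grants), Decimal("0"))
    return abs(summed - parse_amount(total)) <= tolerance


# Hypothetical project with two participants and a total EU grant of 344050 EUR.
participants = ["170121,6", "173928,4"]
print(grants_match(participants, "344050"))  # True
```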

14.1.6. Fields Specific to Patstat

Patstat contains bibliographical data relating to more than 100 million patent documents from leading industrialised and developing countries.

Each document in TIM’s search engine is in reality a patent family. When working with Patstat data, some specific fields can be used that are otherwise not available. These fields are mainly the ones relating to the identification of patents.

Both the application and publication numbers are available, for all members of the patent family.

As a reminder, the documents being referred to here can be identified in TIM by including source:patstat in the search (they all belong to class:patent, so you can use this in your search instead).

Field Name

Description

Searchable

Type

emm_patstatPatentId

ID of the patent record in the Patstat database.

checkmark

string

emm_applicationNumber

Patent application numbers of all patent family members.

checkmark

string

emm_applicationLatest

Application Number of the most recent patent
This is the application that is the “representative” of the family. The information in this patent application is used for the other fields.

checkmark

string

emm_applicationDate

Date in YYYY-MM-DD format of the latest patent application

checkmark

string

emm_publicationNumber

Patent publication numbers of all patent family members

checkmark

string

emm_publicationLatest

Publication Number of the most recent patent

checkmark

string

emm_publicationLatestDate

Date in YYYY-MM-DD format of the latest patent publication

checkmark

string

emm_priorityDate

Priority date of the patent (YYYY-MM-DD format)
This date is the date of the first patent application in the family.

checkmark

string

emm_year

Year of the priority date of the patent
This field is the same for all document types in TIM.

checkmark

string

emm_patentKind

Patent Kind code
Patent documents are classified into three main categories. See the EPO’s list of codes.

checkmark

string

emm_earliestPatentOffice

Earliest patent office
Country code of the earliest patent office

checkmark

string

emm_familySize

Number of patent family members.
This is the number of patent applications that are considered as being part of the same family.

checkmark

float

emm_docdbFamily NOT in use YET!

DOCDB family ID
DOCDB simple patent family. This is the family that is used by TIM to determine the patent family. Read more.

checkmark

N/A

emm_inpadocFamily

INPADOC family ID
INPADOC extended patent family. Read more.

checkmark

string

emm_granted

Indicates if the patent has been granted.
Possible values are true or false.

checkmark

string

emm_nace2__code

NACE Rev.2 code assigned to the patent application
NACE2 is a code for statistical classification of economic activities. It is computed based on a reference table between IPC and NACE.

checkmark

string

emm_nace2__weight

NACE Rev.2 weight
Weight (0 or 1) indicating whether there is a mapping between a particular IPC and a NACE2 code.

checkmark

string

emm_classificationCPC

CPC patent classification
Cooperative Patent Classification of the patent. Each patent belongs to one or more CPC classes. For the nomenclature of patent classes, see Espacenet.

checkmark

string

emm_classificationIPC

IPC patent classification
International Patent Classification of the patent. Each patent belongs to one or more IPC classes. For the nomenclature of patent classes see Espacenet.

checkmark

string

emm_author__address

Address of the author (inventor)

checkmark

string

emm_author__authref

Author ID (from Patstat)

checkmark

string

emm_author__country

Country of the author

checkmark

string

emm_author__countryCode

Country Code of the author

checkmark

string

emm_affiliation__name

Harmonized Applicant Name (HAN) from OECD
For many applicants, this field contains the name as harmonized by the OECD HAN project.

checkmark

text

emm_affiliation__nameVariant

A few variants available in the patstat database, joined
DOC_STD_NAME, PSN_NAME, _PERSON_NAME Visit patstat Data Catalog.

checkmark

text

emm_affiliation__hanHarmonised

Indicates the degree of harmonization and standardization which could be achieved
0: the HAN_NAME has been replenished with the original name, because the name could not be harmonized.
1: the HAN_NAME has been harmonized but could not be matched with the ORBIS© database.
2: the HAN_NAME has been harmonized and could be matched with the ORBIS© database.
For more info, visit the patstat Data Catalog.

checkmark

string

emm_affiliation__sector

Type of organisation
ECOOM assignee sector allocation, read more here.

checkmark

string

emm_numAuthors

Number of Applicants
Number of Applicants of the patent.

checkmark

float

14.1.6.1. Fields relating to Patstat authors

todo

There are some specific fields relating to inventors in Patstat. Those concern the inventor address, country and so on. For the fields that have a limited set of values, we should make sure we are informing the user of the possible values: class, source, …. Where should that be done? In the examples or in the table? There is a mix of both right now.

Outstanding Issues

emm_author__authref: from patstat ??
add lists of the other classification systems used (CABCLASS, GEOCLASS, etc), not available online.

14.2. Field Types

Every field has a specific type. The type determines both how the value is stored in the TIM database and how it is queried.

The types that are of interest are:

14.2.1. Text

If the field is of type text, the value of the field is treated as normal text: it is split into words, and each word is stemmed and then stored. When a field of type text is queried with several terms, the terms need to be combined explicitly, e.g.

title:(rapid AND prototyping)

or else they have to be queried as an exact phrase:

title:"rapid prototyping"
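The two query forms for text fields can be generated from a list of words. This is a minimal sketch under the assumption that queries are built as plain strings; the helper names `all_terms` and `exact_phrase` are hypothetical.

```python
def all_terms(field: str, words) -> str:
    """Require every word to appear in the field, in any order or position."""
    return f"{field}:({' AND '.join(words)})"


def exact_phrase(field: str, words) -> str:
    """Require the words to appear together as one quoted phrase."""
    return f'{field}:"{" ".join(words)}"'


words = ["rapid", "prototyping"]
print(all_terms("title", words))     # title:(rapid AND prototyping)
print(exact_phrase("title", words))  # title:"rapid prototyping"
```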

14.2.2. String

If the field is of type string, the value of the field is stored verbatim: no splitting into words is performed, and no stemming. This type is better suited to fields that hold exact values, such as the global identifier of the document, the DOI (Digital Object Identifier), a date, or a CPC classification.

Because there is no word-splitting, a value like G09G  3/2092 is stored with its spaces, as a continuous string.

This has some implications on how these fields should be queried.

emm_classificationCPC:G09G  3/2092 will not work.

emm_classificationCPC:"G09G  3/2092" will work, but what if you need to search for all classification codes that end in /20XX ? The asterisk modifier does not work with exact phrases (i.e. terms in quotes).

In this case, the spaces must be escaped, that is, they need to be preceded by a backslash character \, like so:

emm_classificationCPC:G09G\ \ 3/20*

Note

Do not use escaped spaces for fields of type text; this might have unintended consequences. Besides, spaces are not stored anywhere; separate words are.
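The escaping rule for string fields described above can be sketched as follows. The helper names are hypothetical, and the sketch only handles spaces, since that is the only character this section says must be escaped.

```python
def escape_spaces(value: str) -> str:
    """Precede every space with a backslash, as required for string fields."""
    return value.replace(" ", "\\ ")


def string_prefix_query(field: str, prefix: str) -> str:
    """Build an unquoted wildcard query matching values that start with `prefix`."""
    return f"{field}:{escape_spaces(prefix)}*"


# The CPC code contains two consecutive spaces, so two escapes are produced.
print(string_prefix_query("emm_classificationCPC", "G09G  3/20"))
# emm_classificationCPC:G09G\ \ 3/20*
```

As the note above says, this is intended for fields of type string only.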

14.2.3. Float (decimal value)

If the field is of type float, the value of the field is a decimal number. This type is used for storing numbers that may need to be queried in a range. For example, you might need to find documents that have between 1 and 10 authors. The field emm_numAuthors is of type float, hence the query should be:

emm_numAuthors:[1 TO 10]
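A range clause for a float field can be generated with a small helper. This is a minimal sketch assuming string-built queries; `numeric_range` is a hypothetical name, and whole numbers are written without a decimal point to match the example above.

```python
def _fmt(number) -> str:
    """Write whole numbers without a decimal point, other values as given."""
    return str(int(number)) if float(number).is_integer() else str(number)


def numeric_range(field: str, low, high) -> str:
    """Return a `field:[low TO high]` range clause."""
    if low > high:
        raise ValueError("low must not exceed high")
    return f"{field}:[{_fmt(low)} TO {_fmt(high)}]"


print(numeric_range("emm_numAuthors", 1, 10))  # emm_numAuthors:[1 TO 10]
print(numeric_range("emm_familySize", 2, 5))   # emm_familySize:[2 TO 5]
```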