About

The WDC-API provides a REST-Interface for the WebDataCollector-Framework and provides the means for external applications to work with the data collected and prepared by the WebDataCollector.

General definitions

This API shares some common conventions and definitions which are not stated explicitly at each endpoint.

Base-URL, Authentication and Authorization

The API can be accessed at the URL https://dss-wdc.wiso.uni-hamburg.de/api.

The API can only be accessed with Access-Tokens. An Access-Token is sent in the HTTP header as the parameter "Token".

curl 'https://dss-wdc.wiso.uni-hamburg.de/api/snapshot/list?page=0&size=5' -i -X GET -H "Token:MyToken"
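The same request can be prepared from Python with the standard library alone. This is a minimal sketch, not part of the official client: `build_request` is a hypothetical helper and "MyToken" is a placeholder for your own token. The request is only constructed here, not sent.

```python
from urllib.parse import urlencode
from urllib.request import Request

BASE_URL = "https://dss-wdc.wiso.uni-hamburg.de/api"


def build_request(path, token, params=None):
    """Prepare a GET request that carries the access token in the "Token" header."""
    url = BASE_URL + path
    if params:
        url += "?" + urlencode(params)
    return Request(url, headers={"Token": token}, method="GET")


# Hypothetical helper; "MyToken" is a placeholder for your own access token.
request = build_request("/snapshot/list", "MyToken", {"page": 0, "size": 5})
print(request.full_url)
# To actually send it: urllib.request.urlopen(request)
```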

Snapshots and Panels are secured objects. A user only gets the snapshots for which access has been configured. If you think a snapshot is missing, you can check your permissions via an API-Call.

You need an access-token? Please get in touch with us by email.
To keep the examples brief, the documentation omits the authentication token. In your own code you have to be authenticated.
Please note that we log your access to the API. We use that information to identify bottlenecks and problems within the API.

Responses, Status-Codes and Paging

The WDC-API aims to produce structurally stable Response-Objects in JSON. The basic form of such a Response-Object is as follows:

{
  "responseHeader" : { (1)
    "query" : "...",
    "state" : "OK",
    "msg" : "",
    "httpStatus" : "OK"
  },
  "content" : [ { (2)
    "domainName" : "www.dfg.de"
  }, {
    "domainName" : "www.oaq.ch"
  }, {
    "domainName" : "www.europace.org"
  } ],
  "page" : { (3)
    "size" : 3,
    "number" : 0,
    "totalElements" : 110,
    "totalPages" : 37
  },
  "links" : { (4)
    "next" : "https://dss-wdc.wiso.uni-hamburg.de/api/snapshot/20121227_intermediaries/domains?page=1&size=3"
  }
}
1 The responseHeader represents information about the query, the state of the response and potential messages and warnings.
2 The content consists of an array of objects. The type of these objects depends on the query.
3 The page-object gives information about the overall size of the data and the details needed for paging through large data-sets. To actually consume paged resources you should use the links-object.
4 The links give the link for the next or the previous page. This information should be used to implement paging. If there is no next- or prev-page the property does not exist.
The maximum number of elements in one page is set to 2000. Thus, if you specify a page size larger than 2000, it will be overridden. Please be aware that your client might restrict the size of the body.
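Paging via the links-object can be sketched as a small generator. This is an illustration under assumptions: `iter_content` and `fetch_json` are hypothetical names, and `fetch_json` stands for any function that performs an authenticated request and returns the parsed Response-Object. The fake pages below replace real HTTP calls.

```python
def iter_content(first_url, fetch_json):
    """Yield every item of a paged resource by following links.next."""
    url = first_url
    while url is not None:
        response = fetch_json(url)
        yield from response["content"]
        # The "next" property does not exist on the last page.
        url = response.get("links", {}).get("next")


# Stand-in for real HTTP calls (assumption): maps a URL to a parsed response.
fake_pages = {
    "/domains?page=0&size=2": {
        "content": [{"domainName": "www.dfg.de"}, {"domainName": "www.oaq.ch"}],
        "links": {"next": "/domains?page=1&size=2"},
    },
    "/domains?page=1&size=2": {
        "content": [{"domainName": "www.europace.org"}],
        "links": {},
    },
}

domains = [
    item["domainName"]
    for item in iter_content("/domains?page=0&size=2", fake_pages.__getitem__)
]
print(domains)
```

Because the loop only relies on the presence or absence of "next", it never has to compute page numbers itself.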

Rate-Limiting

The API applies rate limiting for each token. If you send requests too quickly, the server responds with the status code TOO_MANY_REQUESTS (429) and the header "Retry-After", which specifies how many seconds you should wait before sending the next request.

If you use the dsslab-wdc-client you don’t have to do anything as the client already waits according to the rate limits.
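If you call the API directly, the waiting has to be done in your own code. The following sketch shows one way to honour "Retry-After"; it is not how dsslab-wdc-client is implemented. `send_with_retry` and the `send` callable (returning status code, headers and body) are assumptions made for the example, and the server is simulated.

```python
import time


def send_with_retry(send, max_attempts=5, sleep=time.sleep):
    """Call send() until it returns a status other than 429 or attempts run out.

    send() is a caller-supplied function (assumption) that returns
    (status_code, headers_dict, body).
    """
    for attempt in range(max_attempts):
        status, headers, body = send()
        if status != 429:
            return status, body
        if attempt < max_attempts - 1:
            # Retry-After holds the number of seconds to wait; fall back to 1.
            sleep(float(headers.get("Retry-After", 1)))
    raise RuntimeError("still rate limited after %d attempts" % max_attempts)


# Simulated server: rate limited once, then successful.
responses = iter([(429, {"Retry-After": "2"}, None), (200, {}, "ok")])
waits = []
result = send_with_retry(lambda: next(responses), sleep=waits.append)
print(result, waits)
```

Passing `sleep` as a parameter keeps the function testable without real delays.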

Complex Datatypes for the API-Requests

The API defines arguments on different endpoints. Some arguments, such as the arguments for paging or SnapshotSelections, have to or can be used jointly and refer to a special datatype.

Datatype Description Arguments

Paging

Used to provide a means to "page" through larger results. See above. You should not use the paging directly. Instead use the prev and next-links.

  • page: the number of the page

  • size: the number of items on each page

SnapshotSelection

A SnapshotSelection is used to express a subset of domains in a snapshot. Various endpoints offer the possibility to work on such subsets.

  • snapshot: the machine-name of the snapshot

  • selection: Optional. The machine-name of the selection. If not provided all domains in the snapshot are used.
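On the client side, these datatypes simply become groups of query parameters. A minimal sketch, assuming a hypothetical `SnapshotSelection` dataclass and `query_string` helper that are not part of the API or the Python client:

```python
from dataclasses import dataclass
from typing import Optional
from urllib.parse import urlencode


@dataclass
class SnapshotSelection:
    """Hypothetical client-side mirror of the SnapshotSelection arguments."""
    snapshot: str
    selection: Optional[str] = None

    def to_params(self):
        params = {"snapshot": self.snapshot}
        # The selection is optional; without it all domains of the snapshot are used.
        if self.selection is not None:
            params["selection"] = self.selection
        return params


def query_string(snapshot_selection, size=None):
    """Combine a SnapshotSelection with an optional page size for the first request."""
    params = snapshot_selection.to_params()
    if size is not None:
        params["size"] = size
    return urlencode(params)


whole_snapshot = query_string(SnapshotSelection("20121227_intermediaries"))
subset = query_string(
    SnapshotSelection("20121227_intermediaries", "python-test-selection"), size=100
)
print(whole_snapshot)
print(subset)
```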

Integration: Access from Python

For tighter integration we publish the Python package dsslab-wdc-client. This package supports automatic handling of paging of large results and transforming these into JSON-Arrays or directly into DataFrames.

We highly recommend this approach, as we develop and test this package in sync with the rest of the WDC-API.

  • The package is published on PyPI as dsslab-wdc_client and can be installed in the usual way with pip, poetry or any other package manager you prefer.

  • The documentation with examples and the API-Reference is located at dsslab-wdc-client-API

Snapshots

A set of Endpoints to discover information about available snapshots.

/api/snapshot/list

Returns a list of Snapshots, which are accessible for the given user.

Parameter Description

filter

A simple filter which checks if the name contains the given String

Example
$ curl 'http://localhost:8080/api/snapshot/list?filter=intermediaries' -i -X GET
HTTP/1.1 200 OK
X-Rate-Limit-Remaining: 25
Content-Type: application/json
Content-Length: 552

{
  "responseHeader" : {
    "query" : "http://localhost:8080/api/snapshot/list?filter=intermediaries",
    "state" : "OK",
    "msg" : "",
    "httpStatus" : null
  },
  "content" : [ {
    "name" : "20121227_intermediaries",
    "description" : null,
    "indexed" : true,
    "textExtracted" : false
  }, {
    "name" : "20240701_intermediaries",
    "description" : null,
    "indexed" : false,
    "textExtracted" : false
  } ],
  "page" : {
    "size" : 2000,
    "number" : 0,
    "totalElements" : 2,
    "totalPages" : 1
  },
  "links" : { }
}

/api/snapshot/{snapshot}/domains

Returns the set of Domains included in the specified Snapshot. The information about Domains reflects the imported status of a crawl.

Information can only be obtained about crawled domains (Seeds).

Parameter Description

page

The number of the requested page.

size

The number of objects of the requested page.

Example
$ curl 'http://localhost:8080/api/snapshot/20121227_intermediaries/domains?page=0&size=2' -i -X GET
HTTP/1.1 200 OK
X-Rate-Limit-Remaining: 25
Content-Type: application/json
Content-Length: 587

{
  "responseHeader" : {
    "query" : "http://localhost:8080/api/snapshot/20121227_intermediaries/domains?page=0&size=2",
    "state" : "OK",
    "msg" : "",
    "httpStatus" : null
  },
  "content" : [ {
    "domainName" : "www.cpu.fr",
    "type" : "SEED",
    "pages" : 10008
  }, {
    "domainName" : "www.srhe.ac.uk",
    "type" : "SEED",
    "pages" : 9995
  } ],
  "page" : {
    "size" : 2,
    "number" : 0,
    "totalElements" : 114,
    "totalPages" : 57
  },
  "links" : {
    "next" : "http://localhost:8080/api/snapshot/20121227_intermediaries/domains?page=1&size=2"
  }
}

/api/snapshot/{snapshot}/seeds

Returns concise information about seeds, including their crawled status and possible redirects.

The information of seeds is generated from the Heritrix seed-reports. Status codes can be found here: https://heritrix.readthedocs.io/en/latest/glossary.html#status-codes

Fields of one seed-item:

Field Description

httpStatusCode

Extended httpStatusCode for the current uri

status

A more human-readable status code

uri

The actual URI.

redirectsTo

A possible redirect. Returns "null" if there was no redirect. Please note that such a redirect creates a follow-up seed-item, which in turn can create another redirect.

Example
$ curl 'http://localhost:8080/api/snapshot/20121227_intermediaries/seeds?size=3' -i -X GET
HTTP/1.1 200 OK
X-Rate-Limit-Remaining: 25
Content-Type: application/json
Content-Length: 791

{
  "responseHeader" : {
    "query" : "http://localhost:8080/api/snapshot/20121227_intermediaries/seeds?size=3",
    "state" : "OK",
    "msg" : "",
    "httpStatus" : null
  },
  "content" : [ {
    "httpStatusCode" : -6,
    "status" : "NOTCRAWLED",
    "uri" : "http://www.esib.org/",
    "redirectsTo" : null
  }, {
    "httpStatusCode" : -6,
    "status" : "NOTCRAWLED",
    "uri" : "http://www.forum.eua.be/",
    "redirectsTo" : null
  }, {
    "httpStatusCode" : -6,
    "status" : "NOTCRAWLED",
    "uri" : "http://www.www2.esf.org/",
    "redirectsTo" : null
  } ],
  "page" : {
    "size" : 3,
    "number" : 0,
    "totalElements" : 157,
    "totalPages" : 53
  },
  "links" : {
    "next" : "http://localhost:8080/api/snapshot/20121227_intermediaries/seeds?page=1&size=3"
  }
}
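Because a redirect creates a follow-up seed-item, a chain of redirects can be resolved on the client once all seed-items are loaded. The sketch below assumes that the follow-up item's uri equals the redirectsTo value of the previous item; this is an assumption, not something the API guarantees. `final_seed` is a hypothetical helper and the sample data is invented for illustration.

```python
def final_seed(seeds, uri):
    """Follow redirectsTo from uri until a seed-item without a redirect is reached."""
    by_uri = {seed["uri"]: seed for seed in seeds}
    seen = set()
    current = by_uri.get(uri)
    while current is not None and current["redirectsTo"] is not None:
        if current["uri"] in seen:
            raise ValueError("redirect loop at " + current["uri"])
        seen.add(current["uri"])
        follow_up = by_uri.get(current["redirectsTo"])
        if follow_up is None:
            break  # Redirect target is not in the list; keep the last known item.
        current = follow_up
    return current


# Invented sample data, shaped like the seed-items described above.
seeds = [
    {"uri": "http://example.org/", "httpStatusCode": 301,
     "redirectsTo": "http://www.example.org/"},
    {"uri": "http://www.example.org/", "httpStatusCode": 302,
     "redirectsTo": "https://www.example.org/"},
    {"uri": "https://www.example.org/", "httpStatusCode": 200,
     "redirectsTo": None},
]

target = final_seed(seeds, "http://example.org/")
print(target["uri"])
```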

/api/snapshot/{snapshot}/searchDomains

Queries the SearchIndex of the crawled documents with a given Query and returns a list of hits in each domain. Only domains which actually have at least one hit are returned.

The number of hits of a domain is calculated as the sum of hits in each document. Internally a faceted SolrQuery of the index is created which uses the facet.method=fc (see https://solr.apache.org/guide/solr/latest/query-guide/faceting.html).
Parameter Description

query

A query to search for. Can be an arbitrary Solr-Query.

selection

Optional. A machineName of a Selection. If specified only results of Domains in the Selection will be returned.

Example
$ curl 'http://localhost:8080/api/snapshot/20121227_intermediaries/searchDomains?query=uni&size=2' -i -X GET
HTTP/1.1 200 OK
X-Rate-Limit-Remaining: 25
Content-Type: application/json
Content-Length: 567

{
  "responseHeader" : {
    "query" : "http://localhost:8080/api/snapshot/20121227_intermediaries/searchDomains?query=uni&size=2",
    "state" : "OK",
    "msg" : "",
    "httpStatus" : null
  },
  "content" : [ {
    "domainName" : "www.acquin.org",
    "hits" : 2386
  }, {
    "domainName" : "www.che.de",
    "hits" : 1476
  } ],
  "page" : {
    "size" : 2,
    "number" : 0,
    "totalElements" : 110,
    "totalPages" : 55
  },
  "links" : {
    "next" : "http://localhost:8080/api/snapshot/20121227_intermediaries/searchDomains?query=uni&page=1&size=2"
  }
}

Selections

A Selection represents a subset of Domains of a Snapshot. They can be used as a filter for various endpoints.

Filtering on Selections is done on a best-effort basis. Take a search request as an example: the filter includes all search results whose domain ends with a domain in the selection. This is necessary to include search results of redirected crawled data. Yet, this simple approach might lead to undesired results:
Domain in Selection    Domain in Seed-List    Crawled, indexed Domain    Matches
bimid.de               bimid.de               www.bimid.de               true
www.tageszeitung.de    www.tageszeitung.de    www.taz.de                 false

The filtering of Selections will be reworked to use a more sophisticated strategy based on the redirect-data, which will also match the second case.
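The suffix rule described above can be reproduced in a few lines. This only illustrates the documented behaviour and is not the server-side implementation; `matches_selection` is a hypothetical name.

```python
def matches_selection(indexed_domain, selection_domains):
    """True if the indexed domain ends with any domain of the selection."""
    return any(indexed_domain.endswith(domain) for domain in selection_domains)


selection = ["bimid.de", "www.tageszeitung.de"]

print(matches_selection("www.bimid.de", selection))  # True: ends with "bimid.de"
print(matches_selection("www.taz.de", selection))    # False: the redirect target has a different name
```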

/api/selection/list

Returns a list of available selections.

Parameter Description

page

The number of the requested page.

size

The number of objects of the requested page.

Example
$ curl 'http://localhost:8080/api/selection/list' -i -X GET
HTTP/1.1 200 OK
X-Rate-Limit-Remaining: 24
Content-Type: application/json
Content-Length: 593

{
  "responseHeader" : {
    "query" : "http://localhost:8080/api/selection/list",
    "state" : "OK",
    "msg" : "",
    "httpStatus" : null
  },
  "content" : [ {
    "machineName" : "createWithSelection",
    "title" : "the title"
  }, {
    "machineName" : "SelectionControllerTest.SET",
    "title" : ""
  }, {
    "machineName" : "python-test-selection",
    "title" : ""
  }, {
    "machineName" : "wdc.crawler.test.SelectionServiceTest#set",
    "title" : ""
  } ],
  "page" : {
    "size" : 2000,
    "number" : 0,
    "totalElements" : 4,
    "totalPages" : 1
  },
  "links" : { }
}

/api/selection/{selection}/domains

Returns the set of Domains included in the specified Selection.

Parameter Description

page

The number of the requested page.

size

The number of objects of the requested page.

Example
$ curl 'http://localhost:8080/api/selection/createWithSelection/domains' -i -X GET
HTTP/1.1 200 OK
X-Rate-Limit-Remaining: 25
Content-Type: application/json
Content-Length: 963

{
  "responseHeader" : {
    "query" : "http://localhost:8080/api/selection/createWithSelection/domains",
    "state" : "OK",
    "msg" : "",
    "httpStatus" : null
  },
  "content" : [ {
    "name" : "www.esf.org"
  }, {
    "name" : "www.oecd.org"
  }, {
    "name" : "www.nuffic.nl"
  }, {
    "name" : "www.eua.be"
  }, {
    "name" : "www.enqa.eu"
  }, {
    "name" : "www.eqar.eu"
  }, {
    "name" : "www.inqaahe.org"
  }, {
    "name" : "www.esmu.be"
  }, {
    "name" : "www.eaie.org"
  }, {
    "name" : "www.britishcouncil.org"
  }, {
    "name" : "eacea.ec.europa.eu"
  }, {
    "name" : "www.chea.org"
  }, {
    "name" : "www.aca-secretariat.be"
  }, {
    "name" : "www.iau-aiu.net"
  }, {
    "name" : "www.iie.org"
  }, {
    "name" : "www.aucc.ca"
  }, {
    "name" : "www.aau.org"
  }, {
    "name" : "www.nafsa.org"
  } ],
  "page" : {
    "size" : 2000,
    "number" : 0,
    "totalElements" : 18,
    "totalPages" : 1
  },
  "links" : { }
}

/api/selection/{selection}/set (beta)

Sets the Domains of the Selection. If the Selection does not exist, it will be created.

This Endpoint is still in evaluation and will not be useful for "normal" users. As a normal user you won’t be able to edit a newly created Selection.
Selections are SecuredObjects, so changes to them are access-controlled. To edit an existing selection you have to make sure you have the corresponding access rights.
Example
$ curl 'http://localhost:8080/api/selection/SelectionControllerTest.SET/set' -i -X PUT \
    -H 'Content-Type: text/plain' \
    -d '	www.eua.be
	www.oecd.org
	www.enqa.eu
'
HTTP/1.1 201 Created
X-Rate-Limit-Remaining: 22
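The PUT request from the example can also be prepared in Python. This is a sketch under assumptions: `build_set_request` is a hypothetical helper, the body format (one domain per line, as plain text) is inferred from the curl example, and "MyToken" is a placeholder. The request is only constructed, not sent.

```python
from urllib.parse import quote
from urllib.request import Request


def build_set_request(base_url, selection, domains, token):
    """Prepare a PUT request whose plain-text body lists one domain per line."""
    body = "\n".join(domains) + "\n"
    url = f"{base_url}/selection/{quote(selection, safe='')}/set"
    return Request(
        url,
        data=body.encode("utf-8"),
        headers={"Content-Type": "text/plain", "Token": token},
        method="PUT",
    )


set_request = build_set_request(
    "http://localhost:8080/api",
    "SelectionControllerTest.SET",
    ["www.eua.be", "www.oecd.org", "www.enqa.eu"],
    "MyToken",  # placeholder token
)
print(set_request.get_method(), set_request.full_url)
```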

Panels

A set of Endpoints to discover information about available Panels.

/api/panel/list

Returns a list of Panels, which are accessible for the given user.

Example
$ curl 'http://localhost:8080/api/panel/list' -i -X GET
HTTP/1.1 200 OK
X-Rate-Limit-Remaining: 20
Content-Type: application/json
Content-Length: 368

{
  "responseHeader" : {
    "query" : "http://localhost:8080/api/panel/list",
    "state" : "OK",
    "msg" : "",
    "httpStatus" : null
  },
  "content" : [ {
    "name" : "intermediaries",
    "description" : null,
    "snapshotCount" : 1
  } ],
  "page" : {
    "size" : 2000,
    "number" : 0,
    "totalElements" : 1,
    "totalPages" : 1
  },
  "links" : { }
}

/api/panel/{name}/list

Returns the set of Snapshots included in the specified Panel.

The list of returned Snapshots is secured and filtered with your access rules.
Parameter Description

page

Optional. The number of the requested page.

size

Optional. The number of objects of the requested page.

Example
$ curl 'http://localhost:8080/api/panel/intermediaries/list?page=0&size=1' -i -X GET
HTTP/1.1 200 OK
X-Rate-Limit-Remaining: 19
Content-Type: application/json
Content-Length: 429

{
  "responseHeader" : {
    "query" : "http://localhost:8080/api/panel/intermediaries/list?page=0&size=1",
    "state" : "OK",
    "msg" : "",
    "httpStatus" : null
  },
  "content" : [ {
    "name" : "20121227_intermediaries",
    "description" : null,
    "indexed" : true,
    "textExtracted" : false
  } ],
  "page" : {
    "size" : 1,
    "number" : 0,
    "totalElements" : 1,
    "totalPages" : 1
  },
  "links" : { }
}

Index

Internally the documents (pages) of a Snapshot are indexed using a full-text search-engine.

For search requests based on documents you can use the fields in the table below:

Fieldname Description

domain_s

The domain of the web site hosting this page

path_s

The complete path of the page, including query parameters

title_t

The title of the web page.

description_t

A description extracted from the page.

language_s

The language of the identified text

_text_

The hidden field for the full-text. It can be queried but the values are not stored in the index. If you need full-texts please use the appropriate API-Call. Normally, you do not have to state this field in queries.

Examples for querying the full-text index
	CSR (1)
	"Corporate Social Responsibility" (2)
	language_s:de && "Corporate Social Responsibility" (3)
	language_s:en && path_s:"/about" (4)
1 Searches for the term CSR in the field _text_.
2 Searches for the phrase "Corporate Social Responsibility". Use " to combine single words to a longer phrase.
3 Same as above, but searches only in German documents.
4 Searches for pages with the specified path in English documents.
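Query strings like these can be assembled programmatically before they are passed as the query parameter. A small sketch; `phrase`, `field` and `all_of` are hypothetical helpers, not part of the API or the Python client.

```python
from urllib.parse import urlencode


def phrase(text):
    """Wrap text in double quotes so that it is searched as one phrase."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return '"' + escaped + '"'


def field(name, value):
    """Restrict a value to a named index field, e.g. language_s:de."""
    return f"{name}:{value}"


def all_of(*clauses):
    """Combine clauses so that all of them have to match."""
    return " && ".join(clauses)


q3 = all_of(field("language_s", "de"), phrase("Corporate Social Responsibility"))
q4 = all_of(field("language_s", "en"), field("path_s", phrase("/about")))
print(q3)
print(q4)

# URL-encode the query before putting it into a request URL.
encoded = urlencode({"query": q3})
print(encoded)
```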
Notes and complete Solr-Query-Syntax

As you have probably noticed, the examples do not contain information about a snapshot or panel. This information is added to your query automatically to make sure that only reasonable queries can be submitted to the API.

Currently Solr is used as the search engine. Thus you can use the Solr query syntax to query the full-text index. For further reference, please consult the original Solr documentation.

/api/index/status

Provides an overview of indexed documents per Snapshot and, optionally, per Selection.

Parameter Description

snapshot

Optional. The name of a snapshot. Can be specified multiple times.

panel

Optional. The name of a panel. One of 'snapshot' or 'panel' has to be specified.

selection

Optional. The name of a selection to filter the results.

Example
$ curl 'http://localhost:8080/api/index/status?snapshot=20121227_intermediaries' -i -X GET
HTTP/1.1 200 OK
X-Rate-Limit-Remaining: 49
Content-Type: application/json
Content-Length: 412

{
  "responseHeader" : {
    "query" : "http://localhost:8080/api/index/status?snapshot=20121227_intermediaries",
    "state" : "OK",
    "msg" : "",
    "httpStatus" : null
  },
  "content" : [ {
    "snapshot" : "20121227_intermediaries",
    "selection" : "",
    "indexedDocs" : 223686
  } ],
  "page" : {
    "size" : 1,
    "number" : 0,
    "totalElements" : 1,
    "totalPages" : 1
  },
  "links" : { }
}

/api/index/searchAggrBySnapshot

Queries a set of snapshots with a set of concepts and returns the overall number of occurrences of each concept, aggregated by snapshot.

Parameter Description

size

Optional. Page size.

This search method makes use of the more complex type "AggregatedSearchQuery". This type allows you to specify queries for a set of snapshots and a set of concepts. It is defined as follows:

{
	"panel": "intermediaries", (1)
	"snapshots": [ ... ], (1)
	"searchQueries": [ (2)
		{
			"name": "University",
			"q": "\"University\" OR Uni OR College"
		}
	]
}
1 The fields "panel" or "snapshots" define which snapshots will be searched. Exactly one of these fields has to be specified.
2 The field "searchQueries" is an array of SearchQuery-objects. Each SearchQuery has a name as a label and a query 'q' which is used to search for the concept.
When using such an AggregatedSearchQuery in a request, you have to specify the AggregatedSearchQuery in the BODY of the request.
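Building the request body in code avoids manual quoting mistakes. The sketch below reads callout 1 as "specify either panel or snapshots, but not both", which is an interpretation; `aggregated_search_query` is a hypothetical helper.

```python
import json


def aggregated_search_query(search_queries, panel=None, snapshots=None):
    """Serialize an AggregatedSearchQuery; expects either panel or snapshots."""
    if (panel is None) == (snapshots is None):
        raise ValueError("specify either panel or snapshots, not both or neither")
    body = {
        "searchQueries": [
            {"name": name, "q": q} for name, q in search_queries.items()
        ]
    }
    if panel is not None:
        body["panel"] = panel
    else:
        body["snapshots"] = list(snapshots)
    # json.dumps takes care of escaping the double quotes inside the queries.
    return json.dumps(body)


payload = aggregated_search_query(
    {"University": '"University" OR Uni OR College', "Klima": "Klima"},
    panel="intermediaries",
)
print(payload)
```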

Result: The endpoint returns a Task which then can be queried to obtain the actual results.

Using this endpoint is simple with our Python-Client.

Example (Python)
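# "client" is assumed to be an already configured dsslab-wdc-client instance.
query = {
    "panel": "intermediaries",
    "searchQueries": [
        {
            "name": "University",
            "q": "\"University\" OR Uni OR College"
        }
    ]
}

# The query is sent in the body of the request.
df2 = client.loadAsDF(
    "/api/index/searchAggrBySnapshot", body=query)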
Example (direct use, use GET or POST)
$ curl 'http://localhost:8080/api/index/searchAggrBySnapshot' -i -X POST \
    -H 'Content-Type: application/json' \
    -d '{
	"panel": "intermediaries",
	"searchQueries": [
		{
			"name": "University",
			"q": "\"University\" OR Uni OR College"
		},
		{
			"name": "Klima",
			"q": "Klima"
		}
	]
}
'
HTTP/1.1 303 See Other
X-Rate-Limit-Remaining: 49
Location: /api/index/doSearchAggrBySnapshot?taskId=24fa2c23452d0801158b6b2eb149ad7f7a899154&size=1000
Content-Type: application/json
Content-Length: 184

{
  "id" : "24fa2c23452d0801158b6b2eb149ad7f7a899154",
  "uri" : "/api/index/doSearchAggrBySnapshot?taskId=24fa2c23452d0801158b6b2eb149ad7f7a899154&size=1000",
  "description" : null
}
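When the endpoint is used directly, the returned Task has to be turned into a follow-up request. A minimal sketch: `task_result_url` is a hypothetical helper that resolves the relative "uri" of the Task against the URL of the original request. Whether the result is available immediately or has to be polled is not stated here, so the sketch only builds the URL.

```python
import json
from urllib.parse import urljoin


def task_result_url(request_url, task_json):
    """Resolve the relative uri of a returned Task against the request URL."""
    task = json.loads(task_json)
    return urljoin(request_url, task["uri"])


task_body = """{
  "id" : "24fa2c23452d0801158b6b2eb149ad7f7a899154",
  "uri" : "/api/index/doSearchAggrBySnapshot?taskId=24fa2c23452d0801158b6b2eb149ad7f7a899154&size=1000",
  "description" : null
}"""

result_url = task_result_url(
    "http://localhost:8080/api/index/searchAggrBySnapshot", task_body
)
print(result_url)
```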

Texts

Texts of web pages are prepared in various ways. This chapter describes what actually happens to those texts and how you can access this information.

/api/texts/search

Returns a subset of pages with extracted text.

Parameter Description

snapshot

The name of the snapshot

selection

Optional. The machine-name of the selection.

query

The Solr-Query which is used to search for the pages

textsInclude

Optional. If 'true', includes the extracted text. Please note that texts are not extracted for all Snapshots.

textsAbbreviate

Optional. Used for debugging. Abbreviates exported texts.

page

Optional. The number of the requested page.

size

Optional. The number of objects of the requested page.

Example
$ curl 'http://localhost:8080/api/texts/search?snapshot=20121227_intermediaries&query=news&textsInclude=true&textsAbbreviate=true&size=2' -i -X GET
HTTP/1.1 200 OK
X-Rate-Limit-Remaining: 49
Content-Type: application/json
Content-Length: 1650

{
  "responseHeader" : {
    "query" : "http://localhost:8080/api/texts/search?snapshot=20121227_intermediaries&query=news&textsInclude=true&textsAbbreviate=true&size=2",
    "state" : "OK",
    "msg" : "solrQuery: q=news&q.op=OR&fq=snapshot_id_i:+1&sort=domain_id_i+asc,reference_id_i+asc&start=0&rows=2",
    "httpStatus" : null
  },
  "content" : [ {
    "snapshot" : "20121227_intermediaries",
    "language" : "en",
    "textId" : 46,
    "domain" : "www.dfg.de",
    "path" : "/en/index.jsp",
    "textInfo" : {
      "title" : "DFG, German Research Foundation",
      "description" : null,
      "text" : "\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\nDFG, German Research Foundation\n\n\r\n\r\n\r\n    \r\n\r\n      ...",
      "contentType" : "text/plain",
      "language" : "en"
    }
  }, {
    "snapshot" : "20121227_intermediaries",
    "language" : "de",
    "textId" : 98,
    "domain" : "www.dfg.de",
    "path" : "/dfg_profil/geschaeftsstelle/dfg_praesenz_ausland/beijing/index.jsp",
    "textInfo" : {
      "title" : "DFG - Deutsche Forschungsgemeinschaft - Chinesisch-Deutsches Zentrum für Wissenschaftsförderung Beijing",
      "description" : null,
      "text" : "\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\nDFG - Deutsche Forschungsgemeinschaft - Chinesisch-...",
      "contentType" : "text/plain",
      "language" : "de"
    }
  } ],
  "page" : {
    "size" : 2,
    "number" : 0,
    "totalElements" : 75781,
    "totalPages" : 37891
  },
  "links" : {
    "next" : "http://localhost:8080/api/texts/search?snapshot=20121227_intermediaries&query=news&textsInclude=true&textsAbbreviate=true&page=1&size=2"
  }
}

/api/texts/get

Returns extracted text from a set of given text-ids.

Parameter Description

snapshot

The name of the snapshot

id

Id of the text. Can be specified multiple times.

Example
$ curl 'http://localhost:8080/api/texts/get?snapshot=20121227_intermediaries&id=9094&id=9095' -i -X GET
HTTP/1.1 200 OK
X-Rate-Limit-Remaining: 49
Content-Type: application/json
Content-Length: 7624

{
  "responseHeader" : {
    "query" : "http://localhost:8080/api/texts/get?snapshot=20121227_intermediaries&id=9094&id=9095",
    "state" : "OK",
    "msg" : "",
    "httpStatus" : null
  },
  "content" : [ {
    "snapshot" : "20121227_intermediaries",
    "language" : "en",
    "textId" : 9094,
    "textInfo" : {
      "title" : "Partnerships : European Science Foundation",
      "description" : null,
      "text" : "\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\nPartnerships : European Science Foundation\n\n\n\n\n\t\t\n\t\t\t\n\t\t\t\t\n\t\t\t\t\t\n\t\t\t\t\t\t\n\t\t\t\t\t\t\tBookmark this pageFAQMember pagesRSSSitemapSubscribe\n\n\t\t\t\t\t\t\t\t\t\n\t\t\t     \n                            \n                            \n                            \n\t\t\n\n\t\t\n\t\t\t\n\t\t\n\n\n\n\t\t\t\t\t\t\n\n\t\t\t\t\t\n\n\t\t\t\t\t\n\t\t\t\t\t\t\n\t\t\t\t\t\t\t\n\n\t\t\t\t\t\t\n\n\t\t\t\t\t\n\n\t\t\t\t\n\n\t\t\t\t\tHome\n\tAbout ESF\n\tActivities\n\tResearch Areas\n\tPublications\n\tMedia Centre\n\tJobs\n\tContact\n\n\n\n\n\t\t\t\tHome  > Activities  > ESF Research Conferences  > Partnerships\n\n\n\t\t\t\t\n\t\t\t\t\t\n\t\t\t\t\t\tESF Research Conferences\n\n\n\t\t\t\t\t\tPartnerships\n\t\n\n\nThe ESF Research Conferences Scheme brings together researchers and researchers from different nationalities, backgrounds and disciplines, and at different career stages to jointly discuss the latest developments in new and emerging fields of research.\n\nESF Research Conferences promote free discussion and exchange of information, and aim to create long-term networks between participants. Participation is open to researchers from academia, industry, society and politics worldwide.\n\nResearch Conferences are grouped in series of 2-5 annual conferences, with each series focussing on a specific scientific area. ESF Research Conferences Series are conceived, branded and financed in partnership with high-level European research organisations and universities. The scientific area and location of each series is suggested by the partner, aiming to showcase their excellence in their particular field of expertise.\n\n\n\n\n\n\nHow to join the Scheme?\n\t\n\n\nThe Call for Partnerships is now closed. For more detailed information, please refer to the 'Call for Partnerships' webpage. 
\n\r\n\nESF welcomes expressions of interest from its own Member Organisations and from national and international institutions, universities or organisations that would be willing to co-sponsor, on an equal-share basis, a number of conferences each year over a longer-term duration, for example 5 conferences a year for 5 years.\r\n\nPartners in the ESF Research Conferences Scheme PDF (1.1 MB) Last Updated 4-March-2010.\n\n\n\n\n\nFor further information, please contact:\n\n\n\n\tMs.BenitaLippsE-Mail\n\tHead of Unit\n\n\nPhone +32 (0) 25332020 \nFax +32 (0) 25388486 \n\nWorld & Research Conferences\n\n\tESF-LFUI Conferences\n\n\n\tESF-EMBO Symposia\n\n\n\n\n\tESF-JSPS Frontier Science Conference Series for Young Researchers\n\n\n\n\tESF-LiU Conferences with support from Riksbankens Jubileumsfond & Vetenskapsrådet\n\n\n\tESF-UB Conferences in Biomedicine with support from Generalitat de Catalunya\n\n\n\tESF-VR-FORMAS Conferences on Global Change Research\n \n\n\n\tESF-COST High-Level Research Conferences\n\n\n\tEurope-Africa Frontier Research Conference Series\n\n\n\tESF-FMSH Entre-Sciences Conferences in Interdisciplinary Environmental Sciences\n\n\n\tESF-ZiF-Bielefeld Conferences\n\n\n\tESF-EMS-ERCOM Mathematics Conferences\n\n\nSummer/Winter Schools\n\tESF-IAS Winter Schools in Physics\n\n\n\tESF-EPSRC-STFC Summer Schools in Physics & Astronomy (SUSSP)\n\n\n\tESF-CERN Cargese Summer Schools in High Energy Physics & Astrophysics\n\n\nBack - ESF Research Conferences - Home Page\n\n\n\n\n\n\t\t\t\t\t\t\n\n\n\n\t\t\t\t\t\t\tEuroBioFund\n\tEUROCORES\n\tExploratory Workshops\n\tForward Looks\n\tCalls and Funding\n\tMO Fora\n\tResearch Networking Programmes\n\tESF Research Conferences\tUpcoming Events\n\tNews\n\tCall for Proposals\n\tPartnerships\tCall for Partnerships\n\n\n\tMaking Conferences Greener\n\tSponsor Resource Center\n\tVenues\n\tPublications\n\tRestricted Pages\n\tContacts\n\tFAQ\n\tPast Events\n\tSearch\n\tOther Meetings \n\tConferences Email 
Alerts\n\n\n\tScience Policy\n\tESF Meetings\n\tEuropean Latsis Prize 2012\n\tPeer Review\n\tESF Symposia\n\tESF at ESOF 2012 Dublin\n\n\n\n\n\n\t\t\t\t\t\n\n\t\t\t\t\n\n\t\t\t\tData protection | Disclaimer\n\n© 2012 European Science Foundation - page last updated: 27.12.2012\n\nESF provides the scientific, administrative and technical secretariat for COST (European Cooperation in Science and Technology).\n\n\n\n\n\t\t\t\n\n\t\t\n\n\t\t\n\t\t\n\t\n\n\n\n\n\n\n",
      "contentType" : "text/plain",
      "language" : "en"
    }
  }, {
    "snapshot" : "20121227_intermediaries",
    "language" : "en",
    "textId" : 9095,
    "textInfo" : {
      "title" : "ESF at ESOF : European Science Foundation",
      "description" : null,
      "text" : "\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\nESF at ESOF : European Science Foundation\n\n\n\n\n\t\t\n\t\t\t\n\t\t\t\t\n\t\t\t\t\t\n\t\t\t\t\t\t\n\t\t\t\t\t\t\tBookmark this pageFAQMember pagesRSSSitemapSubscribe\n\n\t\t\t\t\t\t\t\t\t\n\t\t\t     \n                            \n                            \n                            \n\t\t\n\n\t\t\n\t\t\t\n\t\t\n\n\n\n\t\t\t\t\t\t\n\n\t\t\t\t\t\n\n\t\t\t\t\t\n\t\t\t\t\t\t\n\t\t\t\t\t\t\t\n\n\t\t\t\t\t\t\n\n\t\t\t\t\t\n\n\t\t\t\t\n\n\t\t\t\t\tHome\n\tAbout ESF\n\tActivities\n\tResearch Areas\n\tPublications\n\tMedia Centre\n\tJobs\n\tContact\n\n\n\n\n\t\t\t\tHome  > Activities  > ESF at ESOF 2012 Dublin\n\n\n\t\t\t\t\n\t\t\t\t\t\n\t\t\t\t\t\tESF at ESOF 2012 Dublin\n\n\n\t\t\t\t\t\t\t\n\n\nCome and visit us on our stand in Dublin  from 11-15 July  where  the most influential people  from the world of science, society and policy will assemble for the largest open forum of its kind.\r\n\nThis important gathering will provide a platform for debate, for influencing policy and strengthening the links between science and society.\r\n\nVisit us on Stand 17 where we will train you to give a successful elevator pitch and share information about ESF and how we can support you.\n\n\n\n\n\n\nEuropean Science TV & New Media Festival 2012\nWe are sponsoring this year's European Science TV & New Media Festival.\n\r\n\nThe Festival will take place at Trinity College Dublin between July 13-15 2012.\r\n\nMore info\n\n\n\n\n\n\n\n\t\t\t\t\t\t\t\nESF @ ESOF 2012 Programme\n\n\n\n\n\n\t\nElevator Pitch for Researchers\n\n\n\n\n\n\t\n\n\n\n\n\n\n\r\n\r\n\r\n\r\n\r\n\r\n\r\n\r\n\r\n\n\n\n\n\n\t\t\t\t\t\t\tEuroBioFund\n\tEUROCORES\n\tExploratory Workshops\n\tForward Looks\n\tCalls and Funding\n\tMO Fora\n\tResearch Networking Programmes\n\tESF Research Conferences\n\tScience Policy\n\tESF Meetings\n\tEuropean Latsis Prize 2012\n\tPeer Review\n\tESF Symposia\n\tESF at ESOF 2012 Dublin\tA 
look back on ESOF 2012 in Dublin\n\tESOF 2012 DUBLIN - Photos\n\n\n\n\n\n\n\n\t\t\t\t\t\n\n\t\t\t\t\n\n\t\t\t\tData protection | Disclaimer\n\n© 2012 European Science Foundation\n\nESF provides the scientific, administrative and technical secretariat for COST (European Cooperation in Science and Technology).\n\n\n\n\n\t\t\t\n\n\t\t\n\n\t\t\n\t\t\n\t\n\n\n\n\n\n\n",
      "contentType" : "text/plain",
      "language" : "en"
    }
  } ],
  "page" : {
    "size" : 2,
    "number" : 0,
    "totalElements" : 2,
    "totalPages" : 1
  },
  "links" : { }
}
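Since the id parameter can be repeated, a list of text-ids (for example the textId values returned by /api/texts/search) maps naturally onto one request. A small sketch with a hypothetical helper `texts_get_url`:

```python
from urllib.parse import urlencode


def texts_get_url(base_url, snapshot, text_ids):
    """Build a /texts/get URL that repeats the id parameter for every text-id."""
    # doseq=True turns the list into id=...&id=... instead of one encoded list.
    query = urlencode({"snapshot": snapshot, "id": list(text_ids)}, doseq=True)
    return f"{base_url}/texts/get?{query}"


# Items shaped like the "content" entries of a /api/texts/search response.
search_hits = [{"textId": 9094}, {"textId": 9095}]
ids = [hit["textId"] for hit in search_hits]

url = texts_get_url("http://localhost:8080/api", "20121227_intermediaries", ids)
print(url)
```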

DomainGraph

A set of Endpoints to export already prepared DomainGraphs. DomainGraphs represent the graph of Domains (Nodes) and their linkages (Edges). Edges additionally have a weight to count how often one Domain links to another.

The API for DomainGraphs uses a Node-Edge representation. Thus you have to make two calls to the API to get the data for a DomainGraph: one for the nodes and one for the edges.

There may be multiple different DomainGraphs for one Snapshot. DomainGraphs may be created from Variants and Selections.

As for the Variants, currently only the Variant ONLY_SEEDS exists.

  • ONLY_SEEDS: Contains all nodes and edges from the crawled Snapshot.

Currently new DomainGraphs can only be created from the backend.

/api/domaingraph/list

A list with all existing DomainGraphs.

Parameter Description

snapshot

Optional. The name of a snapshot. Can be specified multiple times.

panel

Optional. The name of a panel.

selection

Optional. The machine-name of the selection. If not specified, the result is not filtered.

variant

Optional. The variant of the DomainGraph. Currently only 'ONLY_SEEDS' is supported. If not specified, does not filter the result.

page

The number of the requested page.

size

The number of objects of the requested page.

Example
$ curl 'http://localhost:8080/api/domaingraph/list?page=0&size=3' -i -X GET
HTTP/1.1 200 OK
X-Rate-Limit-Remaining: 49
Content-Type: application/json
Content-Length: 432

{
  "responseHeader" : {
    "query" : "http://localhost:8080/api/domaingraph/list?page=0&size=3",
    "state" : "OK",
    "msg" : "",
    "httpStatus" : null
  },
  "content" : [ {
    "id" : 162,
    "snapshotName" : "20121227_intermediaries",
    "variant" : "ONLY_SEEDS",
    "selectionMachineName" : null
  } ],
  "page" : {
    "size" : 3,
    "number" : 0,
    "totalElements" : 1,
    "totalPages" : 1
  },
  "links" : { }
}
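
The filter parameters above can be combined in a single query string. The following Python sketch builds such a URL. The helper name `domaingraph_list_url` and its default page size are hypothetical, and the sketch assumes that "can be specified multiple times" means the `snapshot` key is simply repeated in the query string.

```python
from urllib.parse import urlencode

# Base URL as given in the general definitions of this documentation.
BASE_URL = "https://dss-wdc.wiso.uni-hamburg.de/api"


def domaingraph_list_url(snapshots=(), panel=None, selection=None,
                         variant=None, page=0, size=20):
    """Build a URL for /api/domaingraph/list (hypothetical helper).

    size=20 is an arbitrary default chosen for this sketch.
    """
    params = {
        "snapshot": list(snapshots),  # emitted once per snapshot name
        "panel": panel,
        "selection": selection,
        "variant": variant,
        "page": page,
        "size": size,
    }
    # Filters that are not set are left out, so they do not restrict the result.
    params = {key: value for key, value in params.items() if value is not None}
    return f"{BASE_URL}/domaingraph/list?{urlencode(params, doseq=True)}"


url = domaingraph_list_url(snapshots=["20121227_intermediaries"],
                           variant="ONLY_SEEDS", page=0, size=3)
print(url)
```

Remember that a real request also needs the "Token" header described in the general definitions.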

/api/domaingraph/{id}/nodes

A list with all nodes in the referenced DomainGraph.

Parameter Description

page

The number of the requested page.

size

The number of objects of the requested page.

Example
$ curl 'http://localhost:8080/api/domaingraph/162/nodes?page=0&size=3' -i -X GET
HTTP/1.1 200 OK
X-Rate-Limit-Remaining: 49
Content-Type: application/json
Content-Length: 894

{
  "responseHeader" : {
    "query" : "http://localhost:8080/api/domaingraph/162/nodes?page=0&size=3",
    "state" : "OK",
    "msg" : "",
    "httpStatus" : null
  },
  "content" : [ {
    "id" : "www.dfg.de",
    "url" : "www.dfg.de",
    "type" : "SEED",
    "indegree" : 8,
    "outdegree" : 4,
    "degree" : 12,
    "outdegree_seeds" : 4
  }, {
    "id" : "www.oaq.ch",
    "url" : "www.oaq.ch",
    "type" : "SEED",
    "indegree" : 10,
    "outdegree" : 20,
    "degree" : 30,
    "outdegree_seeds" : 20
  }, {
    "id" : "www.europace.org",
    "url" : "www.europace.org",
    "type" : "SEED",
    "indegree" : 5,
    "outdegree" : 6,
    "degree" : 11,
    "outdegree_seeds" : 6
  } ],
  "page" : {
    "size" : 3,
    "number" : 0,
    "totalElements" : 113,
    "totalPages" : 38
  },
  "links" : {
    "next" : "http://localhost:8080/api/domaingraph/162/nodes?page=1&size=3"
  }
}
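
Because a DomainGraph usually spans many pages, a client will typically follow the "next" link until it is no longer present. The following Python sketch shows one way to do that using only the standard library. The function names `fetch_json` and `iter_content` are hypothetical, and the sketch assumes that the last page has no "next" entry in "links", as in the single-page responses shown in this documentation.

```python
import json
from urllib.request import Request, urlopen


def fetch_json(url, token):
    """GET a URL with the access token in the "Token" header and decode the JSON body."""
    request = Request(url, headers={"Token": token})
    with urlopen(request) as response:
        return json.load(response)


def iter_content(first_url, fetch):
    """Yield every object in "content", following "links.next" page by page.

    `fetch` is any callable that maps a URL to a decoded Response-Object,
    for example: lambda url: fetch_json(url, "MyToken")
    """
    url = first_url
    while url:
        body = fetch(url)
        yield from body["content"]
        # Stop when the response carries no "next" link (assumed to mark the last page).
        url = body.get("links", {}).get("next")
```

Passing the fetch function as an argument keeps the paging logic independent of the HTTP client, which also makes it easy to try out with canned responses.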

/api/domaingraph/{id}/edges

A list with all edges in the referenced DomainGraph.

Parameter Description

page

The number of the requested page.

size

The number of objects of the requested page.

Example
$ curl 'http://localhost:8080/api/domaingraph/162/edges?page=0&size=3' -i -X GET
HTTP/1.1 200 OK
X-Rate-Limit-Remaining: 49
Content-Type: application/json
Content-Length: 639

{
  "responseHeader" : {
    "query" : "http://localhost:8080/api/domaingraph/162/edges?page=0&size=3",
    "state" : "OK",
    "msg" : "",
    "httpStatus" : null
  },
  "content" : [ {
    "source" : "www.dfg.de",
    "target" : "www.esf.org",
    "weight" : 20
  }, {
    "source" : "www.dfg.de",
    "target" : "erc.europa.eu",
    "weight" : 6
  }, {
    "source" : "www.dfg.de",
    "target" : "www.ciee.org",
    "weight" : 2
  } ],
  "page" : {
    "size" : 3,
    "number" : 0,
    "totalElements" : 1126,
    "totalPages" : 376
  },
  "links" : {
    "next" : "http://localhost:8080/api/domaingraph/162/edges?page=1&size=3"
  }
}
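
Once the nodes and edges of a DomainGraph have been retrieved, they can be combined into a single structure on the client side. The following Python sketch builds a weighted adjacency map from the "content" lists of the two endpoints. The function name `build_adjacency` is hypothetical, and the sample data is taken from the first pages of the example responses above, so it covers only a small part of the graph.

```python
def build_adjacency(nodes, edges):
    """Map each node id to {target: weight} for its outgoing edges."""
    graph = {node["id"]: {} for node in nodes}
    for edge in edges:
        # Endpoints that are missing from the node list are added on the fly.
        graph.setdefault(edge["source"], {})[edge["target"]] = edge["weight"]
        graph.setdefault(edge["target"], {})
    return graph


# First page of nodes and edges from the examples (only the fields used here).
nodes = [{"id": "www.dfg.de"}, {"id": "www.oaq.ch"}, {"id": "www.europace.org"}]
edges = [
    {"source": "www.dfg.de", "target": "www.esf.org", "weight": 20},
    {"source": "www.dfg.de", "target": "erc.europa.eu", "weight": 6},
    {"source": "www.dfg.de", "target": "www.ciee.org", "weight": 2},
]

graph = build_adjacency(nodes, edges)
print(graph["www.dfg.de"])
```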

Statistics

These endpoints provide access to basic statistical information about snapshots (and panels).

/api/stats

Returns a list of Stats-Objects describing various descriptive indicators of snapshots.

Parameter Description

snapshot

Optional. The name of a snapshot. Can be specified multiple times.

panel

Optional. The name of a panel. One of 'snapshot' or 'panel' has to be specified.

Example
$ curl 'http://localhost:8080/api/stats?snapshot=20121227_intermediaries' -i -X GET
HTTP/1.1 200 OK
X-Rate-Limit-Remaining: 49
Content-Type: application/json
Content-Length: 632

{
  "responseHeader" : {
    "query" : "http://localhost:8080/api/stats?snapshot=20121227_intermediaries",
    "state" : "OK",
    "msg" : "",
    "httpStatus" : null
  },
  "content" : [ {
    "snapshot" : "20121227_intermediaries",
    "selection" : null,
    "seedsInitial" : 122,
    "seedsActual" : 157,
    "seedsCrawled" : 141,
    "seedsNotCrawled" : 16,
    "importedDomains" : 114,
    "importedHtmlDocs" : 274126,
    "importedKBytes" : -1,
    "indexedDomains" : 110,
    "indexedDocs" : 223686
  } ],
  "page" : {
    "size" : 2000,
    "number" : 0,
    "totalElements" : 1,
    "totalPages" : 1
  },
  "links" : { }
}
Table 1. Stats-Object
Name Description

snapshot

The snapshot.

selection

The selection. May be null when no Selection is present.

seedsInitial

The number of seeds which have been used as input for the crawl.

seedsActual

The number of seeds which have been used for the crawl. Includes possible redirects and seeds which could not be crawled.

seedsCrawled and seedsNotCrawled

The number of seeds that were crawled and that could not be crawled, respectively.

importedDomains

The number of seeds which have actually been imported.

importedHtmlDocs

The number of imported documents with the mime-type "text/html"

importedKBytes (currently not computed)

The complete size of the imported documents. Includes possible duplicated documents.

indexedDomains

The number of sites which have been indexed.

indexedDocs

The number of indexed documents.
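
The indicators of a Stats-Object can be combined into simple ratios. The following Python sketch derives two of them from the example response above. The helper name `crawl_summary` and the ratio names are hypothetical, and treating a negative `importedKBytes` as "not computed" is an assumption based on the value -1 in the example.

```python
def crawl_summary(stats):
    """Derive simple ratios from one Stats-Object (hypothetical helper)."""
    kbytes = stats["importedKBytes"]
    return {
        # Share of the actual seeds that could be crawled.
        "seedCoverage": stats["seedsCrawled"] / stats["seedsActual"],
        # Indexed documents relative to imported HTML documents.
        "indexedShare": stats["indexedDocs"] / stats["importedHtmlDocs"],
        # Assumption: negative values mean the size has not been computed.
        "importedKBytes": kbytes if kbytes >= 0 else None,
    }


# Values copied from the example response of /api/stats.
stats = {
    "snapshot": "20121227_intermediaries",
    "seedsInitial": 122,
    "seedsActual": 157,
    "seedsCrawled": 141,
    "seedsNotCrawled": 16,
    "importedDomains": 114,
    "importedHtmlDocs": 274126,
    "importedKBytes": -1,
    "indexedDomains": 110,
    "indexedDocs": 223686,
}

summary = crawl_summary(stats)
print(summary)
```

In the example data, `seedsCrawled` and `seedsNotCrawled` add up to `seedsActual`.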

/api/stats/domains

Returns a list of DomainStats-Objects describing descriptive indicators for all seed-domains of a given snapshot.

Parameter Description

snapshot

Optional. The name of a snapshot. Can be specified multiple times.

panel

Optional. The name of a panel. One of 'snapshot' or 'panel' has to be specified.

Example
$ curl 'http://localhost:8080/api/stats/domains?snapshot=20121227_intermediaries&page=0&size=2' -i -X GET
HTTP/1.1 200 OK
X-Rate-Limit-Remaining: 49
Content-Type: application/json
Content-Length: 806

{
  "responseHeader" : {
    "query" : "http://localhost:8080/api/stats/domains?snapshot=20121227_intermediaries&page=0&size=2",
    "state" : "OK",
    "msg" : "",
    "httpStatus" : null
  },
  "content" : [ {
    "snapshot" : "20121227_intermediaries",
    "selection" : null,
    "domain" : "www.dfg.de",
    "importedHtmlDocs" : 9877,
    "importedKBytes" : 0,
    "indexedDocs" : 7623
  }, {
    "snapshot" : "20121227_intermediaries",
    "selection" : null,
    "domain" : "www.oaq.ch",
    "importedHtmlDocs" : 2613,
    "importedKBytes" : 0,
    "indexedDocs" : 1221
  } ],
  "page" : {
    "size" : 2,
    "number" : 0,
    "totalElements" : 114,
    "totalPages" : 57
  },
  "links" : {
    "next" : "http://localhost:8080/api/stats/domains?snapshot=20121227_intermediaries&page=1&size=2"
  }
}

Embeddings

TODO:

/api/embeddings/definitions

Returns a list of EmbeddingDefinitions. An EmbeddingDefinition combines a specific embedding model and a reference to a Chunker.

Parameter Description

page

Optional. The number of the requested page.

size

Optional. The number of objects of the requested page.

Example
$ curl 'http://localhost:8080/api/embeddings/definitions?size=1' -i -X GET
HTTP/1.1 200 OK
X-Rate-Limit-Remaining: 49
Content-Type: application/json
Content-Length: 628

{
  "responseHeader" : {
    "query" : "http://localhost:8080/api/embeddings/definitions?size=1",
    "state" : "OK",
    "msg" : "",
    "httpStatus" : null
  },
  "content" : [ {
    "machineName" : "default",
    "modelName" : "sentence-transformers/paraphrase-MiniLM-L12-v2",
    "dimensions" : 384,
    "chunkerMachineName" : "default"
  }, {
    "machineName" : "tests",
    "modelName" : "sentence-transformers/paraphrase-MiniLM-L12-v2",
    "dimensions" : 384,
    "chunkerMachineName" : "default"
  } ],
  "page" : {
    "size" : 2,
    "number" : 0,
    "totalElements" : 2,
    "totalPages" : 1
  },
  "links" : { }
}

/api/embeddings/status

TODO:

/api/embeddings/search

Returns a list of "nearby" document chunks for the given embedding definition, sorted by distance in ascending order. The distance is based on cosine similarity.
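
For orientation, a cosine-based distance between two embedding vectors can be computed as in the following Python sketch. It assumes that the reported `dist` equals 1 minus the cosine similarity. The documentation does not state the exact formula, so this is an illustration and not necessarily what the server computes.

```python
import math


def cosine_distance(a, b):
    """Return 1 - cosine similarity of two equally long, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


# Identical directions give a distance near 0, orthogonal vectors a distance of 1.
print(cosine_distance([1.0, 0.0], [1.0, 0.0]))
print(cosine_distance([1.0, 0.0], [0.0, 1.0]))
```

Under this assumption, smaller values mean more similar chunks, which matches the ascending sort order of the result list.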

Parameter Description

snapshot

The snapshot

selection

The selection. (currently not used)

domain

A domain to which the search should be restricted. Can be specified multiple times.

embeddingsDef

The machineName of the EmbeddingsDef

query

The text for comparing with embeddings

maxDistance

The maximum distance of text chunks. Defaults to 0.5

limit

The maximum number of matching chunks to return. If set, paging is disabled. Defaults to 1000.

minTokenCount

CURRENTLY IGNORED and fixed to 15. Only consider chunks with more than minTokenCount tokens. Defaults to 15.

expectedElements

Mitigates a bug in MariaDB that leads to inconsistent results.

Example
$ curl 'http://localhost:8080/api/embeddings/search?snapshot=20121227_intermediaries&embeddingsDef=default&query=Nachhaltigkeit&limit=2' -i -X GET
HTTP/1.1 200 OK
X-Rate-Limit-Remaining: 49
Content-Type: application/json
Content-Length: 1375

{
  "responseHeader" : {
    "query" : "http://localhost:8080/api/embeddings/search?snapshot=20121227_intermediaries&embeddingsDef=default&query=Nachhaltigkeit&limit=2",
    "state" : "OK",
    "msg" : "",
    "httpStatus" : null
  },
  "content" : [ {
    "domain" : "www.dfg.de",
    "reference" : "/dfg_magazin/wissenschaft_oeffentlichkeit/dfg_wissenschaftsjahre/2012_nachhaltigkeit/index.jsp",
    "textChunk" : "Das \"Wissenschaftsjahr 2012 – Zukunftsprojekt Erde\" beschäftigt sich mit Forschung für nachhaltige Entwicklung. Alle Aspekte der Nachhaltigkeit werden angesprochen: Im Fokus stehen Möglichkeiten und Realisierbarkeit wirtschaftlichen, ökologischen und sozial nachhaltigen Handelns.",
    "embedding" : [ ],
    "dist" : 0.34900558
  }, {
    "domain" : "www.dfg.de",
    "reference" : "/service/presse/das_neueste/index.html",
    "textChunk" : "(30.05.12) Am 30. Mai startet die MS Wissenschaft ihre Tour 2012. An Bord präsentieren auch von der DFG unterstützte Projekte ihre Forschung für nachhaltige Entwicklungen. Zum Start des Schiffes erscheint auch „Das blaue ABC. Forschung – Wissen – Nachhaltigkeit“, das DFG-geförderte Forschung zur Nachhaltigkeit vorstellt.",
    "embedding" : [ ],
    "dist" : 0.4760739
  } ],
  "page" : {
    "size" : 2,
    "number" : 0,
    "totalElements" : 2,
    "totalPages" : 1
  },
  "links" : { }
}
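
Free-text queries often contain spaces or non-ASCII characters, so the query string should be percent-encoded before the request is sent. The following Python sketch builds a search URL with the standard library. The helper name `embeddings_search_url` is hypothetical, and the sketch assumes that repeated `domain` keys are how several domains are passed.

```python
from urllib.parse import quote, urlencode

# Base URL as given in the general definitions of this documentation.
BASE_URL = "https://dss-wdc.wiso.uni-hamburg.de/api"


def embeddings_search_url(snapshot, embeddings_def, query, domains=(),
                          max_distance=None, limit=None):
    """Build a URL for /api/embeddings/search (hypothetical helper)."""
    params = [("snapshot", snapshot),
              ("embeddingsDef", embeddings_def),
              ("query", query)]
    params += [("domain", domain) for domain in domains]  # one key per domain
    if max_distance is not None:
        params.append(("maxDistance", max_distance))
    if limit is not None:
        params.append(("limit", limit))
    # quote_via=quote encodes spaces as %20 instead of "+".
    return f"{BASE_URL}/embeddings/search?{urlencode(params, quote_via=quote)}"


url = embeddings_search_url("20121227_intermediaries", "default",
                            "nachhaltige Entwicklung",
                            domains=["www.dfg.de"], limit=2)
print(url)
```

Parameters that are left out fall back to the server-side defaults listed in the parameter table.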
