About
The WDC-API provides a REST interface for the WebDataCollector framework. It gives external applications the means to work with the data collected and prepared by the WebDataCollector.
General definitions
This API shares some common conventions and definitions which are not stated explicitly at each endpoint.
Base-URL, Authentication and Authorization
The API can be accessed at the URL https://dss-wdc.wiso.uni-hamburg.de/api.
The API can only be accessed with access tokens. An access token is passed in the HTTP header as the parameter "Token".
curl 'https://dss-wdc.wiso.uni-hamburg.de/api/snapshot/list?page=0&size=5' -i -X GET -H "Token:MyToken"
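The same request can be prepared from Python. The following is a minimal sketch using only the standard library; the helper name `build_request` and the placeholder token `MyToken` are illustrative assumptions, not part of the API. It only builds the request object and does not send anything.

```python
from urllib.parse import urlencode
from urllib.request import Request

BASE_URL = "https://dss-wdc.wiso.uni-hamburg.de/api"  # base URL as documented above


def build_request(path, params=None, token="MyToken"):
    """Build (but do not send) a GET request that carries the access token."""
    url = BASE_URL + path
    if params:
        url += "?" + urlencode(params)
    # The token travels in the HTTP header named "Token".
    return Request(url, headers={"Token": token}, method="GET")


request = build_request("/snapshot/list", {"page": 0, "size": 5})
# Sending it would be: urllib.request.urlopen(request)
```

Keeping the construction separate from the sending makes it easy to inspect the URL and headers before any call is made.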
Snapshots and Panels are secured objects. A user only gets the snapshots for which access has been configured. If you think a snapshot is missing, you can check your permissions via an API call.
| You need an access-token? Please get in touch with us by email. |
| To increase the brevity of the examples, the documentation omits the authentication token. In your own code you have to be authenticated. |
| Please note that we log your access to the API. We use that information to identify bottlenecks and problems within the API. |
Responses, Status-Codes and Paging
The WDC-API aims to produce structurally stable response objects in JSON. The basic form of such a response object is as follows:
{
"responseHeader" : { (1)
"query" : "...",
"state" : "OK",
"msg" : "",
"httpStatus" : "OK"
},
"content" : [ { (2)
"domainName" : "www.dfg.de"
}, {
"domainName" : "www.oaq.ch"
}, {
"domainName" : "www.europace.org"
} ],
"page" : { (3)
"size" : 3,
"number" : 0,
"totalElements" : 110,
"totalPages" : 37
},
"links" : { (4)
"next" : "https://dss-wdc.wiso.uni-hamburg.de/api/snapshot/20121227_intermediaries/domains?page=1&size=3"
}
}
| 1 | The responseHeader represents information about the query, the state of the response and potential messages and warnings. |
| 2 | The content consists of an array of objects. The type of these objects depends on the query. |
| 3 | The page-object gives information about the overall size of the data and details which are important for paging through large data-sets. To actually consume paged resources you should use the links-object. |
| 4 | The links give the link for the next or the previous page. This information should be used to implement paging. If there is no next- or prev-page the property does not exist. |
| The maximum number of elements in one page is set to 2000. Thus, if you specify a larger page size, it will be overridden. Please be aware that your client might restrict the size of the body. |
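Paging via the links-object can be sketched as a small generator. This is an illustration under assumptions: `fetch` is a hypothetical caller-supplied function that takes a URL and returns the parsed JSON response as a dict, and any state other than "OK" is treated as an error.

```python
def iter_content(first_url, fetch):
    """Yield every content object, following links.next until it is absent.

    `fetch` is a hypothetical function: URL in, parsed JSON response (dict) out.
    """
    url = first_url
    while url is not None:
        response = fetch(url)
        header = response["responseHeader"]
        if header["state"] != "OK":
            raise RuntimeError(f"API reported state {header['state']}: {header['msg']}")
        yield from response["content"]
        # The "next" property does not exist on the last page.
        url = response.get("links", {}).get("next")
```

Because the generator only relies on `links.next`, it keeps working if the server overrides the requested page size.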
Rate-Limiting
The API applies rate limiting for each token. If you send requests too quickly, the server responds with the status code TOO_MANY_REQUESTS (429) and a "Retry-After" header which specifies how many seconds you should wait before sending the next request.
If you use the dsslab-wdc-client you don’t have to do anything as the client already waits according to the rate limits.
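If you call the API without the client, a retry loop along the following lines may be sufficient. It is a sketch, not the client's actual implementation: `send` is a hypothetical function that performs one request and returns a `(status_code, headers, body)` tuple, and the fallback of one second for a missing header is an assumption.

```python
import time


def call_with_retry(send, max_attempts=5, sleep=time.sleep):
    """Call `send()` until it returns a status other than 429.

    `send` is a hypothetical caller-supplied function returning
    (status_code, headers, body) for a single request.
    """
    for _ in range(max_attempts):
        status, headers, body = send()
        if status != 429:
            return status, body
        # Retry-After holds the number of seconds to wait (1 second assumed if absent).
        sleep(float(headers.get("Retry-After", 1)))
    raise RuntimeError("still rate limited after %d attempts" % max_attempts)
```

Passing `sleep` as a parameter keeps the function testable without real waiting.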
Complex Datatypes for the API-Requests
The API defines arguments on different endpoints. Some arguments, such as the arguments for paging or SnapshotSelections, have to or can be used jointly and refer to a special datatype.
| Datatype | Description | Arguments |
|---|---|---|
| Paging | Used to provide a means to "page" through larger results. See above. You should not use the paging directly. Instead use the prev- and next-links. | |
| SnapshotSelection | A SnapshotSelection is used to express a subset of domains in a snapshot. Various endpoints offer the possibility to work on such subsets. | |
Integration: Access from Python
For a tighter integration we publish the Python package dsslab-wdc-client. This package supports automatic handling of paging for large results and transforms them into JSON arrays or directly into DataFrames.
We highly recommend using this approach, as we develop and test this package in sync with the rest of the WDC-API.
Snapshots
A set of Endpoints to discover information about available snapshots.
/api/snapshot/list
Returns a list of Snapshots, which are accessible for the given user.
| Parameter | Description |
|---|---|
| filter | A simple filter which checks if the name contains the given String |
$ curl 'http://localhost:8080/api/snapshot/list?filter=intermediaries' -i -X GET
HTTP/1.1 200 OK
X-Rate-Limit-Remaining: 25
Content-Type: application/json
Content-Length: 552
{
"responseHeader" : {
"query" : "http://localhost:8080/api/snapshot/list?filter=intermediaries",
"state" : "OK",
"msg" : "",
"httpStatus" : null
},
"content" : [ {
"name" : "20121227_intermediaries",
"description" : null,
"indexed" : true,
"textExtracted" : false
}, {
"name" : "20240701_intermediaries",
"description" : null,
"indexed" : false,
"textExtracted" : false
} ],
"page" : {
"size" : 2000,
"number" : 0,
"totalElements" : 2,
"totalPages" : 1
},
"links" : { }
}
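A response like the one above can be reduced to the snapshot names you want to work with. The sketch below assumes that the flag `indexed` indicates whether a snapshot can be used with the search endpoints; the function name is illustrative.

```python
def indexed_snapshots(response):
    """Return the names of all snapshots in the response flagged as indexed."""
    return [snapshot["name"] for snapshot in response["content"] if snapshot["indexed"]]
```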
/api/snapshot/{snapshot}/domains
Returns the set of Domains included in the specified Snapshot. The information about Domains reflects the imported status of a crawl.
Information can only be obtained about crawled domains (Seeds).
| Parameter | Description |
|---|---|
| page | The number of the requested page. |
| size | The number of objects of the requested page. |
$ curl 'http://localhost:8080/api/snapshot/20121227_intermediaries/domains?page=0&size=2' -i -X GET
HTTP/1.1 200 OK
X-Rate-Limit-Remaining: 25
Content-Type: application/json
Content-Length: 587
{
"responseHeader" : {
"query" : "http://localhost:8080/api/snapshot/20121227_intermediaries/domains?page=0&size=2",
"state" : "OK",
"msg" : "",
"httpStatus" : null
},
"content" : [ {
"domainName" : "www.cpu.fr",
"type" : "SEED",
"pages" : 10008
}, {
"domainName" : "www.srhe.ac.uk",
"type" : "SEED",
"pages" : 9995
} ],
"page" : {
"size" : 2,
"number" : 0,
"totalElements" : 114,
"totalPages" : 57
},
"links" : {
"next" : "http://localhost:8080/api/snapshot/20121227_intermediaries/domains?page=1&size=2"
}
}
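In the examples of this documentation, `totalPages` equals the number of elements divided by the page size, rounded up (here 114 / 2 = 57). The following sketch reproduces that arithmetic; it is an observation from the examples, not a documented guarantee, and the function name is illustrative.

```python
import math


def expected_total_pages(total_elements, size):
    """Number of pages needed to hold `total_elements` items at `size` items per page."""
    return math.ceil(total_elements / size)
```

This can help to estimate how many requests a full export will need before starting it.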
/api/snapshot/{snapshot}/seeds
Returns concise information about seeds, including their crawl status and possible redirects.
| The information of seeds is generated from the Heritrix seed-reports. Status codes can be found here: https://heritrix.readthedocs.io/en/latest/glossary.html#status-codes |
Fields of one seed-item:
| Field | Description |
|---|---|
| httpStatusCode | Extended httpStatusCode for the current uri |
| status | A more human-readable status code |
| uri | The actual URI. |
| redirectsTo | A possible redirect. Returns "null" if there was no redirect. Please note that such a redirect creates a subsequent seed-item, which in turn could again create a redirect. |
$ curl 'http://localhost:8080/api/snapshot/20121227_intermediaries/seeds?size=3' -i -X GET
HTTP/1.1 200 OK
X-Rate-Limit-Remaining: 25
Content-Type: application/json
Content-Length: 791
{
"responseHeader" : {
"query" : "http://localhost:8080/api/snapshot/20121227_intermediaries/seeds?size=3",
"state" : "OK",
"msg" : "",
"httpStatus" : null
},
"content" : [ {
"httpStatusCode" : -6,
"status" : "NOTCRAWLED",
"uri" : "http://www.esib.org/",
"redirectsTo" : null
}, {
"httpStatusCode" : -6,
"status" : "NOTCRAWLED",
"uri" : "http://www.forum.eua.be/",
"redirectsTo" : null
}, {
"httpStatusCode" : -6,
"status" : "NOTCRAWLED",
"uri" : "http://www.www2.esf.org/",
"redirectsTo" : null
} ],
"page" : {
"size" : 3,
"number" : 0,
"totalElements" : 157,
"totalPages" : 53
},
"links" : {
"next" : "http://localhost:8080/api/snapshot/20121227_intermediaries/seeds?page=1&size=3"
}
}
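Since a redirect creates a further seed-item which may redirect again, it can be useful to follow such a chain to its end. The sketch below works on an already downloaded list of seed-items; the function name, the hop limit and the handling of cycles are assumptions made for illustration.

```python
def resolve_redirect(seed_items, uri, max_hops=10):
    """Follow redirectsTo entries starting at `uri`.

    Returns the last seed-item reached, or None if the chain leaves the
    list of known seed-items. Stops on cycles and after `max_hops` hops.
    """
    by_uri = {item["uri"]: item for item in seed_items}
    seen = set()
    current = uri
    while current in by_uri and current not in seen and len(seen) < max_hops:
        seen.add(current)
        target = by_uri[current]["redirectsTo"]
        if target is None:
            break  # end of the chain: this seed-item does not redirect
        current = target
    return by_uri.get(current)
```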
/api/snapshot/{snapshot}/searchDomains
Queries the SearchIndex of the crawled documents with a given Query and returns a list of hits in each domain. Only domains which actually have at least one hit are returned.
| The number of hits of a domain is calculated as the sum of hits in each document. Internally a faceted Solr query of the index is created which uses facet.method=fc (see https://solr.apache.org/guide/solr/latest/query-guide/faceting.html). |
| Parameter | Description |
|---|---|
| query | A query to search for. Can be an arbitrary Solr-Query. |
| | Optional. A machineName of a Selection. If specified only results of Domains in the Selection will be returned. |
$ curl 'http://localhost:8080/api/snapshot/20121227_intermediaries/searchDomains?query=uni&size=2' -i -X GET
HTTP/1.1 200 OK
X-Rate-Limit-Remaining: 25
Content-Type: application/json
Content-Length: 567
{
"responseHeader" : {
"query" : "http://localhost:8080/api/snapshot/20121227_intermediaries/searchDomains?query=uni&size=2",
"state" : "OK",
"msg" : "",
"httpStatus" : null
},
"content" : [ {
"domainName" : "www.acquin.org",
"hits" : 2386
}, {
"domainName" : "www.che.de",
"hits" : 1476
} ],
"page" : {
"size" : 2,
"number" : 0,
"totalElements" : 110,
"totalPages" : 55
},
"links" : {
"next" : "http://localhost:8080/api/snapshot/20121227_intermediaries/searchDomains?query=uni&page=1&size=2"
}
}
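Solr queries often contain quotes and spaces, so the query should be URL-encoded before it is sent. The following sketch builds such a URL with the standard library; the function name is illustrative and only the parameters `query` and `size` shown in the example above are used.

```python
from urllib.parse import quote, urlencode


def search_domains_url(base_url, snapshot, query, size=None):
    """Build the URL for searchDomains with a properly encoded query string."""
    params = {"query": query}
    if size is not None:
        params["size"] = size
    # quote() protects the snapshot name, urlencode() the query parameters.
    return f"{base_url}/snapshot/{quote(snapshot, safe='')}/searchDomains?{urlencode(params)}"
```

For a phrase query such as `"Corporate Social Responsibility"`, the quotes are encoded as `%22` and the spaces as `+`.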
Selections
A Selection represents a subset of Domains of a Snapshot. They can be used as a filter for various endpoints.
| Filtering on Selections is done on a best-effort basis. Take a search request as an example: the filter includes all search results whose domain ends with a domain in the selection. This is necessary to include search results of redirected crawled data. Yet this simple approach might lead to undesired results: |
| Domain in Selection | Domain in Seed-List | Crawled, indexed Domain | Matches |
|---|---|---|---|
| bimid.de | bimid.de | www.bimid.de | true |
| www.tageszeitung.de | www.tageszeitung.de | www.taz.de | false |
The filtering of Selections will be reworked to use a more sophisticated strategy based on the redirect data, which will also match the second case.
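The current matching rule can be illustrated with a plain suffix comparison. This is a sketch of the behaviour as described here, not the server's actual code; the function name is an assumption.

```python
def matches_selection(indexed_domain, selection_domains):
    """True if the indexed domain ends with any domain of the selection."""
    return any(indexed_domain.endswith(domain) for domain in selection_domains)
```

With the two rows of the table, `matches_selection("www.bimid.de", ["bimid.de"])` is true, while `matches_selection("www.taz.de", ["www.tageszeitung.de"])` is false because the redirect target shares no suffix with the selected domain.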
/api/selection/list
Returns a list of available selections.
| Parameter | Description |
|---|---|
| page | The number of the requested page. |
| size | The number of objects of the requested page. |
$ curl 'http://localhost:8080/api/selection/list' -i -X GET
HTTP/1.1 200 OK
X-Rate-Limit-Remaining: 24
Content-Type: application/json
Content-Length: 593
{
"responseHeader" : {
"query" : "http://localhost:8080/api/selection/list",
"state" : "OK",
"msg" : "",
"httpStatus" : null
},
"content" : [ {
"machineName" : "createWithSelection",
"title" : "the title"
}, {
"machineName" : "SelectionControllerTest.SET",
"title" : ""
}, {
"machineName" : "python-test-selection",
"title" : ""
}, {
"machineName" : "wdc.crawler.test.SelectionServiceTest#set",
"title" : ""
} ],
"page" : {
"size" : 2000,
"number" : 0,
"totalElements" : 4,
"totalPages" : 1
},
"links" : { }
}
/api/selection/{selection}/domains
Returns the set of Domains included in the specified Selection.
| Parameter | Description |
|---|---|
| page | The number of the requested page. |
| size | The number of objects of the requested page. |
$ curl 'http://localhost:8080/api/selection/createWithSelection/domains' -i -X GET
HTTP/1.1 200 OK
X-Rate-Limit-Remaining: 25
Content-Type: application/json
Content-Length: 963
{
"responseHeader" : {
"query" : "http://localhost:8080/api/selection/createWithSelection/domains",
"state" : "OK",
"msg" : "",
"httpStatus" : null
},
"content" : [ {
"name" : "www.esf.org"
}, {
"name" : "www.oecd.org"
}, {
"name" : "www.nuffic.nl"
}, {
"name" : "www.eua.be"
}, {
"name" : "www.enqa.eu"
}, {
"name" : "www.eqar.eu"
}, {
"name" : "www.inqaahe.org"
}, {
"name" : "www.esmu.be"
}, {
"name" : "www.eaie.org"
}, {
"name" : "www.britishcouncil.org"
}, {
"name" : "eacea.ec.europa.eu"
}, {
"name" : "www.chea.org"
}, {
"name" : "www.aca-secretariat.be"
}, {
"name" : "www.iau-aiu.net"
}, {
"name" : "www.iie.org"
}, {
"name" : "www.aucc.ca"
}, {
"name" : "www.aau.org"
}, {
"name" : "www.nafsa.org"
} ],
"page" : {
"size" : 2000,
"number" : 0,
"totalElements" : 18,
"totalPages" : 1
},
"links" : { }
}
/api/selection/{selection}/set (beta)
Sets the Domains of the Selection. If the Selection does not exist, it will be created.
| This Endpoint is still in evaluation and will not be useful for "normal" users. As a normal user you won’t be able to edit a newly created Selection. |
| Selections are SecuredObjects, so changes to them are access-controlled. To edit an existing selection you have to make sure you have the corresponding access rights. |
$ curl 'http://localhost:8080/api/selection/SelectionControllerTest.SET/set' -i -X PUT \
-H 'Content-Type: text/plain' \
-d ' www.eua.be
www.oecd.org
www.enqa.eu
'
HTTP/1.1 201 Created
X-Rate-Limit-Remaining: 22
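The PUT request above sends the domains as plain text, one domain per line. A sketch of how such a request could be prepared in Python follows; the function name is illustrative, and the request is only built, not sent.

```python
from urllib.parse import quote
from urllib.request import Request


def build_set_request(base_url, selection, domains, token):
    """Build a PUT request whose text/plain body lists one domain per line."""
    body = "\n".join(domains) + "\n"
    url = f"{base_url}/selection/{quote(selection, safe='')}/set"
    return Request(
        url,
        data=body.encode("utf-8"),
        headers={"Token": token, "Content-Type": "text/plain"},
        method="PUT",
    )
```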
Panels
A set of Endpoints to discover information about available Panels.
/api/panel/list
Returns a list of Panels, which are accessible for the given user.
$ curl 'http://localhost:8080/api/panel/list' -i -X GET
HTTP/1.1 200 OK
X-Rate-Limit-Remaining: 20
Content-Type: application/json
Content-Length: 368
{
"responseHeader" : {
"query" : "http://localhost:8080/api/panel/list",
"state" : "OK",
"msg" : "",
"httpStatus" : null
},
"content" : [ {
"name" : "intermediaries",
"description" : null,
"snapshotCount" : 1
} ],
"page" : {
"size" : 2000,
"number" : 0,
"totalElements" : 1,
"totalPages" : 1
},
"links" : { }
}
/api/panel/{name}/list
Returns the set of Snapshots included in the specified Panel.
| The list of returned Snapshots is secured and filtered with your access rules. |
| Parameter | Description |
|---|---|
| page | Optional. The number of the requested page. |
| size | Optional. The number of objects of the requested page. |
$ curl 'http://localhost:8080/api/panel/intermediaries/list?page=0&size=1' -i -X GET
HTTP/1.1 200 OK
X-Rate-Limit-Remaining: 19
Content-Type: application/json
Content-Length: 429
{
"responseHeader" : {
"query" : "http://localhost:8080/api/panel/intermediaries/list?page=0&size=1",
"state" : "OK",
"msg" : "",
"httpStatus" : null
},
"content" : [ {
"name" : "20121227_intermediaries",
"description" : null,
"indexed" : true,
"textExtracted" : false
} ],
"page" : {
"size" : 1,
"number" : 0,
"totalElements" : 1,
"totalPages" : 1
},
"links" : { }
}
Index
Internally the documents (pages) of a Snapshot are indexed using a full-text search-engine.
For search requests based on documents you can use the fields in the table below:
| Fieldname | Description |
|---|---|
| domain_s | The domain of the web site hosting this page |
| path_s | The complete path of the page, including query parameters |
| title_t | The title of the web page. |
| description_t | A description extracted from the page. |
| language_s | The language of the identified text |
| _text_ | The hidden field for the full-text. It can be queried but the values are not stored in the index. If you need full-texts please use the appropriate API-Call. Normally, you do not have to state this field in queries. |
CSR (1)
"Corporate Social Responsibility" (2)
language_s:de && "Corporate Social Responsibility" (3)
language_s:en && path_s:"/about" (4)
| 1 | Searches for the term CSR in the field _text_. |
| 2 | Searches for the phrase "Corporate Social Responsibility". Use " to combine single words to a longer phrase. |
| 3 | Same as above, but searches only in German documents. |
| 4 | Searches for pages with the specified path in English documents. |
| Notes and complete Solr-Query-Syntax: As you probably noted, the examples do not contain information about a snapshot or panel. This information is added to your query automatically to make sure that only reasonable queries can be submitted to the API. The full-text index currently runs on Solr, so you can use the Solr query syntax. For further reference, please consult the original Solr documentation. |
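Query strings like the examples above can be assembled with two small helpers. This is a sketch with illustrative function names; escaping an embedded double quote with a backslash follows the standard Solr query syntax.

```python
def phrase(words):
    """Wrap words in double quotes so that they are searched as one phrase."""
    return '"' + words.replace('"', '\\"') + '"'


def restrict(field, value, query):
    """Combine a field restriction with a query, e.g. a language filter."""
    return f"{field}:{value} && {query}"


german_csr = restrict("language_s", "de", phrase("Corporate Social Responsibility"))
```

Here `german_csr` holds the same string as the third example query.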
/api/index/status
Provides an overview of indexed documents per Snapshot and, if specified, per Selection.
| Parameter | Description |
|---|---|
| snapshot | Optional. The name of a snapshot. Can be specified multiple times. |
| panel | Optional. The name of a panel. One of 'snapshot' or 'panel' has to be specified. |
| | Optional. The name of a selection to filter the results. |
$ curl 'http://localhost:8080/api/index/status?snapshot=20121227_intermediaries' -i -X GET
HTTP/1.1 200 OK
X-Rate-Limit-Remaining: 49
Content-Type: application/json
Content-Length: 412
{
"responseHeader" : {
"query" : "http://localhost:8080/api/index/status?snapshot=20121227_intermediaries",
"state" : "OK",
"msg" : "",
"httpStatus" : null
},
"content" : [ {
"snapshot" : "20121227_intermediaries",
"selection" : "",
"indexedDocs" : 223686
} ],
"page" : {
"size" : 1,
"number" : 0,
"totalElements" : 1,
"totalPages" : 1
},
"links" : { }
}
/api/index/searchAggrBySnapshot
Queries a set of snapshots with a set of concepts and returns the overall number of occurrences of each concept, aggregated by snapshot.
| Parameter | Description |
|---|---|
| size | Optional. Page size. |
This search method makes use of the more complex type "AggregatedSearchQuery". This type allows you to specify queries for a set of snapshots and a set of concepts. It is defined as follows:
{
"panel": "intermediaries", (1)
"snapshots": [ ... ], (1)
"searchQueries": [ (2)
{
"name": "University",
"q": "\"University\" OR Uni OR College"
}
]
}
| 1 | The fields "panel" and "snapshots" define which snapshots will be searched. Exactly one of these fields must be specified. |
| 2 | The field "searchQueries" is an array of SearchQuery-objects. Each SearchQuery has a name as a label and a query 'q' which is used to search for the concept. |
| When using such an AggregatedSearchQuery in a request, you have to specify the AggregatedSearchQuery in the BODY of the request. |
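The request body can be generated instead of written by hand, which avoids mistakes when escaping quotes. The sketch below assumes that exactly one of "panel" and "snapshots" is expected; the function name and the mapping from concept name to query are illustrative choices.

```python
import json


def build_aggregated_query(search_queries, panel=None, snapshots=None):
    """Serialize an AggregatedSearchQuery from a {name: query} mapping."""
    if (panel is None) == (snapshots is None):
        raise ValueError("specify exactly one of 'panel' or 'snapshots'")
    body = {
        "searchQueries": [{"name": name, "q": q} for name, q in search_queries.items()]
    }
    if panel is not None:
        body["panel"] = panel
    else:
        body["snapshots"] = list(snapshots)
    # json.dumps takes care of escaping the quotes inside the Solr queries.
    return json.dumps(body)
```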
Result: The endpoint returns a Task which then can be queried to obtain the actual results.
Using this endpoint is simple when using our Python client.
# `client` is an already configured client object from dsslab-wdc-client
query = {
    "panel": "intermediaries",
    "searchQueries": [
        {
            "name": "University",
            "q": "\"University\" OR Uni OR College"
        }
    ]
}
df2 = client.loadAsDF(
    "/api/index/searchAggrBySnapshot", body=query)
$ curl 'http://localhost:8080/api/index/searchAggrBySnapshot' -i -X POST \
-H 'Content-Type: application/json' \
-d '{
"panel": "intermediaries",
"searchQueries": [
{
"name": "University",
"q": "\"University\" OR Uni OR College"
},
{
"name": "Klima",
"q": "Klima"
}
]
}
'
HTTP/1.1 303 See Other
X-Rate-Limit-Remaining: 49
Location: /api/index/doSearchAggrBySnapshot?taskId=24fa2c23452d0801158b6b2eb149ad7f7a899154&size=1000
Content-Type: application/json
Content-Length: 184
{
"id" : "24fa2c23452d0801158b6b2eb149ad7f7a899154",
"uri" : "/api/index/doSearchAggrBySnapshot?taskId=24fa2c23452d0801158b6b2eb149ad7f7a899154&size=1000",
"description" : null
}
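The returned Task contains a relative URI. Without the Python client, the absolute address for the follow-up request can be derived from the original request URL, as in this sketch. The function name is illustrative, and what the task URL returns while the task is still running is not covered here.

```python
from urllib.parse import urljoin


def task_result_url(request_url, task):
    """Resolve the relative 'uri' of a Task against the URL that created it."""
    return urljoin(request_url, task["uri"])
```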
Texts
Texts of web pages are prepared in various ways. This chapter describes what actually happens to those texts and how you can access this information.
/api/texts/search
Returns a subset of pages with extracted text.
| Parameter | Description |
|---|---|
| snapshot | The name of the snapshot |
| | Optional. The machine-name of the selection. |
| query | The Solr-Query which is used to search for the pages |
| textsInclude | Optional. If 'true', includes the extracted text. Please note that texts are not extracted on all Snapshots. |
| textsAbbreviate | Optional. Used for debugging. Abbreviates exported texts. |
| page | Optional. The number of the requested page. |
| size | Optional. The number of objects of the requested page. |
$ curl 'http://localhost:8080/api/texts/search?snapshot=20121227_intermediaries&query=news&textsInclude=true&textsAbbreviate=true&size=2' -i -X GET
HTTP/1.1 200 OK
X-Rate-Limit-Remaining: 49
Content-Type: application/json
Content-Length: 1650
{
"responseHeader" : {
"query" : "http://localhost:8080/api/texts/search?snapshot=20121227_intermediaries&query=news&textsInclude=true&textsAbbreviate=true&size=2",
"state" : "OK",
"msg" : "solrQuery: q=news&q.op=OR&fq=snapshot_id_i:+1&sort=domain_id_i+asc,reference_id_i+asc&start=0&rows=2",
"httpStatus" : null
},
"content" : [ {
"snapshot" : "20121227_intermediaries",
"language" : "en",
"textId" : 46,
"domain" : "www.dfg.de",
"path" : "/en/index.jsp",
"textInfo" : {
"title" : "DFG, German Research Foundation",
"description" : null,
"text" : "\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\nDFG, German Research Foundation\n\n\r\n\r\n\r\n \r\n\r\n ...",
"contentType" : "text/plain",
"language" : "en"
}
}, {
"snapshot" : "20121227_intermediaries",
"language" : "de",
"textId" : 98,
"domain" : "www.dfg.de",
"path" : "/dfg_profil/geschaeftsstelle/dfg_praesenz_ausland/beijing/index.jsp",
"textInfo" : {
"title" : "DFG - Deutsche Forschungsgemeinschaft - Chinesisch-Deutsches Zentrum für Wissenschaftsförderung Beijing",
"description" : null,
"text" : "\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\nDFG - Deutsche Forschungsgemeinschaft - Chinesisch-...",
"contentType" : "text/plain",
"language" : "de"
}
} ],
"page" : {
"size" : 2,
"number" : 0,
"totalElements" : 75781,
"totalPages" : 37891
},
"links" : {
"next" : "http://localhost:8080/api/texts/search?snapshot=20121227_intermediaries&query=news&textsInclude=true&textsAbbreviate=true&page=1&size=2"
}
}
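As the example shows, extracted texts can contain long runs of line breaks and other whitespace. If that gets in the way of your analysis, a client-side clean-up such as the following sketch may help. It is a suggestion for post-processing, not something the API does; the function name is illustrative.

```python
import re


def compact_text(text):
    """Collapse whitespace runs within lines and drop empty lines."""
    lines = (re.sub(r"[ \t\r]+", " ", line).strip() for line in text.splitlines())
    return "\n".join(line for line in lines if line)
```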
/api/texts/get
Returns extracted text from a set of given text-ids.
| Parameter | Description |
|---|---|
| snapshot | The name of the snapshot |
| id | Id of the text. Can be specified multiple times. |
$ curl 'http://localhost:8080/api/texts/get?snapshot=20121227_intermediaries&id=9094&id=9095' -i -X GET
HTTP/1.1 200 OK
X-Rate-Limit-Remaining: 49
Content-Type: application/json
Content-Length: 7624
{
"responseHeader" : {
"query" : "http://localhost:8080/api/texts/get?snapshot=20121227_intermediaries&id=9094&id=9095",
"state" : "OK",
"msg" : "",
"httpStatus" : null
},
"content" : [ {
"snapshot" : "20121227_intermediaries",
"language" : "en",
"textId" : 9094,
"textInfo" : {
"title" : "Partnerships : European Science Foundation",
"description" : null,
"text" : "\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\nPartnerships : European Science Foundation\n\n\n\n\n\t\t\n\t\t\t\n\t\t\t\t\n\t\t\t\t\t\n\t\t\t\t\t\t\n\t\t\t\t\t\t\tBookmark this pageFAQMember pagesRSSSitemapSubscribe\n\n\t\t\t\t\t\t\t\t\t\n\t\t\t \n \n \n \n\t\t\n\n\t\t\n\t\t\t\n\t\t\n\n\n\n\t\t\t\t\t\t\n\n\t\t\t\t\t\n\n\t\t\t\t\t\n\t\t\t\t\t\t\n\t\t\t\t\t\t\t\n\n\t\t\t\t\t\t\n\n\t\t\t\t\t\n\n\t\t\t\t\n\n\t\t\t\t\tHome\n\tAbout ESF\n\tActivities\n\tResearch Areas\n\tPublications\n\tMedia Centre\n\tJobs\n\tContact\n\n\n\n\n\t\t\t\tHome > Activities > ESF Research Conferences > Partnerships\n\n\n\t\t\t\t\n\t\t\t\t\t\n\t\t\t\t\t\tESF Research Conferences\n\n\n\t\t\t\t\t\tPartnerships\n\t\n\n\nThe ESF Research Conferences Scheme brings together researchers and researchers from different nationalities, backgrounds and disciplines, and at different career stages to jointly discuss the latest developments in new and emerging fields of research.\n\nESF Research Conferences promote free discussion and exchange of information, and aim to create long-term networks between participants. Participation is open to researchers from academia, industry, society and politics worldwide.\n\nResearch Conferences are grouped in series of 2-5 annual conferences, with each series focussing on a specific scientific area. ESF Research Conferences Series are conceived, branded and financed in partnership with high-level European research organisations and universities. The scientific area and location of each series is suggested by the partner, aiming to showcase their excellence in their particular field of expertise.\n\n\n\n\n\n\nHow to join the Scheme?\n\t\n\n\nThe Call for Partnerships is now closed. For more detailed information, please refer to the 'Call for Partnerships' webpage. 
\n\r\n\nESF welcomes expressions of interest from its own Member Organisations and from national and international institutions, universities or organisations that would be willing to co-sponsor, on an equal-share basis, a number of conferences each year over a longer-term duration, for example 5 conferences a year for 5 years.\r\n\nPartners in the ESF Research Conferences Scheme PDF (1.1 MB) Last Updated 4-March-2010.\n\n\n\n\n\nFor further information, please contact:\n\n\n\n\tMs.BenitaLippsE-Mail\n\tHead of Unit\n\n\nPhone +32 (0) 25332020 \nFax +32 (0) 25388486 \n\nWorld & Research Conferences\n\n\tESF-LFUI Conferences\n\n\n\tESF-EMBO Symposia\n\n\n\n\n\tESF-JSPS Frontier Science Conference Series for Young Researchers\n\n\n\n\tESF-LiU Conferences with support from Riksbankens Jubileumsfond & Vetenskapsrådet\n\n\n\tESF-UB Conferences in Biomedicine with support from Generalitat de Catalunya\n\n\n\tESF-VR-FORMAS Conferences on Global Change Research\n \n\n\n\tESF-COST High-Level Research Conferences\n\n\n\tEurope-Africa Frontier Research Conference Series\n\n\n\tESF-FMSH Entre-Sciences Conferences in Interdisciplinary Environmental Sciences\n\n\n\tESF-ZiF-Bielefeld Conferences\n\n\n\tESF-EMS-ERCOM Mathematics Conferences\n\n\nSummer/Winter Schools\n\tESF-IAS Winter Schools in Physics\n\n\n\tESF-EPSRC-STFC Summer Schools in Physics & Astronomy (SUSSP)\n\n\n\tESF-CERN Cargese Summer Schools in High Energy Physics & Astrophysics\n\n\nBack - ESF Research Conferences - Home Page\n\n\n\n\n\n\t\t\t\t\t\t\n\n\n\n\t\t\t\t\t\t\tEuroBioFund\n\tEUROCORES\n\tExploratory Workshops\n\tForward Looks\n\tCalls and Funding\n\tMO Fora\n\tResearch Networking Programmes\n\tESF Research Conferences\tUpcoming Events\n\tNews\n\tCall for Proposals\n\tPartnerships\tCall for Partnerships\n\n\n\tMaking Conferences Greener\n\tSponsor Resource Center\n\tVenues\n\tPublications\n\tRestricted Pages\n\tContacts\n\tFAQ\n\tPast Events\n\tSearch\n\tOther Meetings \n\tConferences Email 
Alerts\n\n\n\tScience Policy\n\tESF Meetings\n\tEuropean Latsis Prize 2012\n\tPeer Review\n\tESF Symposia\n\tESF at ESOF 2012 Dublin\n\n\n\n\n\n\t\t\t\t\t\n\n\t\t\t\t\n\n\t\t\t\tData protection | Disclaimer\n\n© 2012 European Science Foundation - page last updated: 27.12.2012\n\nESF provides the scientific, administrative and technical secretariat for COST (European Cooperation in Science and Technology).\n\n\n\n\n\t\t\t\n\n\t\t\n\n\t\t\n\t\t\n\t\n\n\n\n\n\n\n",
"contentType" : "text/plain",
"language" : "en"
}
}, {
"snapshot" : "20121227_intermediaries",
"language" : "en",
"textId" : 9095,
"textInfo" : {
"title" : "ESF at ESOF : European Science Foundation",
"description" : null,
"text" : "\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\nESF at ESOF : European Science Foundation\n\n\n\n\n\t\t\n\t\t\t\n\t\t\t\t\n\t\t\t\t\t\n\t\t\t\t\t\t\n\t\t\t\t\t\t\tBookmark this pageFAQMember pagesRSSSitemapSubscribe\n\n\t\t\t\t\t\t\t\t\t\n\t\t\t \n \n \n \n\t\t\n\n\t\t\n\t\t\t\n\t\t\n\n\n\n\t\t\t\t\t\t\n\n\t\t\t\t\t\n\n\t\t\t\t\t\n\t\t\t\t\t\t\n\t\t\t\t\t\t\t\n\n\t\t\t\t\t\t\n\n\t\t\t\t\t\n\n\t\t\t\t\n\n\t\t\t\t\tHome\n\tAbout ESF\n\tActivities\n\tResearch Areas\n\tPublications\n\tMedia Centre\n\tJobs\n\tContact\n\n\n\n\n\t\t\t\tHome > Activities > ESF at ESOF 2012 Dublin\n\n\n\t\t\t\t\n\t\t\t\t\t\n\t\t\t\t\t\tESF at ESOF 2012 Dublin\n\n\n\t\t\t\t\t\t\t\n\n\nCome and visit us on our stand in Dublin from 11-15 July where the most influential people from the world of science, society and policy will assemble for the largest open forum of its kind.\r\n\nThis important gathering will provide a platform for debate, for influencing policy and strengthening the links between science and society.\r\n\nVisit us on Stand 17 where we will train you to give a successful elevator pitch and share information about ESF and how we can support you.\n\n\n\n\n\n\nEuropean Science TV & New Media Festival 2012\nWe are sponsoring this year's European Science TV & New Media Festival.\n\r\n\nThe Festival will take place at Trinity College Dublin between July 13-15 2012.\r\n\nMore info\n\n\n\n\n\n\n\n\t\t\t\t\t\t\t\nESF @ ESOF 2012 Programme\n\n\n\n\n\n\t\nElevator Pitch for Researchers\n\n\n\n\n\n\t\n\n\n\n\n\n\n\r\n\r\n\r\n\r\n\r\n\r\n\r\n\r\n\r\n\n\n\n\n\n\t\t\t\t\t\t\tEuroBioFund\n\tEUROCORES\n\tExploratory Workshops\n\tForward Looks\n\tCalls and Funding\n\tMO Fora\n\tResearch Networking Programmes\n\tESF Research Conferences\n\tScience Policy\n\tESF Meetings\n\tEuropean Latsis Prize 2012\n\tPeer Review\n\tESF Symposia\n\tESF at ESOF 2012 Dublin\tA look back on ESOF 2012 in Dublin\n\tESOF 2012 DUBLIN - 
Photos\n\n\n\n\n\n\n\n\t\t\t\t\t\n\n\t\t\t\t\n\n\t\t\t\tData protection | Disclaimer\n\n© 2012 European Science Foundation\n\nESF provides the scientific, administrative and technical secretariat for COST (European Cooperation in Science and Technology).\n\n\n\n\n\t\t\t\n\n\t\t\n\n\t\t\n\t\t\n\t\n\n\n\n\n\n\n",
"contentType" : "text/plain",
"language" : "en"
}
} ],
"page" : {
"size" : 2,
"number" : 0,
"totalElements" : 2,
"totalPages" : 1
},
"links" : { }
}
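Because the parameter `id` can be repeated, the query string is best built from a list of pairs rather than a dict. A minimal sketch with an illustrative function name:

```python
from urllib.parse import urlencode


def texts_get_url(base_url, snapshot, text_ids):
    """Build the URL for /texts/get with one 'id' parameter per text id."""
    params = [("snapshot", snapshot)] + [("id", text_id) for text_id in text_ids]
    return f"{base_url}/texts/get?{urlencode(params)}"
```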
DomainGraph
A set of Endpoints to export already prepared DomainGraphs. DomainGraphs represent the graph of Domains (Nodes) and their linkages (Edges). Edges additionally have a weight to count how often one Domain links to another.
The API for DomainGraphs uses a Node-Edge representation. Thus you have to use 2 calls to the API to get the data for a DomainGraph.
There may be multiple different DomainGraphs for one Snapshot. DomainGraphs may be created from Variants and Selections.
As for the Variants, currently only one Variant exists:
- ONLY_SEEDS: Contains all nodes and edges from the crawled Snapshot.
| Currently new DomainGraphs can only be created from the backend. |
/api/domaingraph/list
A list with all existing DomainGraphs.
| Parameter | Description |
|---|---|
| snapshot | Optional. The name of a snapshot. Can be specified multiple times. |
| panel | Optional. The name of a panel. |
| | Optional. The machine-name of the selection. If not specified does not filter the result. |
| | Optional. The variant of the DomainGraph. Currently only 'ONLY_SEEDS' is supported. If not specified, does not filter the result. |
| page | The number of the requested page. |
| size | The number of objects of the requested page. |
$ curl 'http://localhost:8080/api/domaingraph/list?page=0&size=3' -i -X GET
HTTP/1.1 200 OK
X-Rate-Limit-Remaining: 49
Content-Type: application/json
Content-Length: 432
{
"responseHeader" : {
"query" : "http://localhost:8080/api/domaingraph/list?page=0&size=3",
"state" : "OK",
"msg" : "",
"httpStatus" : null
},
"content" : [ {
"id" : 162,
"snapshotName" : "20121227_intermediaries",
"variant" : "ONLY_SEEDS",
"selectionMachineName" : null
} ],
"page" : {
"size" : 3,
"number" : 0,
"totalElements" : 1,
"totalPages" : 1
},
"links" : { }
}
/api/domaingraph/{id}/nodes
A list with all nodes in the referenced DomainGraph.
| Parameter | Description |
|---|---|
| page | The number of the requested page. |
| size | The number of objects of the requested page. |
$ curl 'http://localhost:8080/api/domaingraph/162/nodes?page=0&size=3' -i -X GET
HTTP/1.1 200 OK
X-Rate-Limit-Remaining: 49
Content-Type: application/json
Content-Length: 894
{
"responseHeader" : {
"query" : "http://localhost:8080/api/domaingraph/162/nodes?page=0&size=3",
"state" : "OK",
"msg" : "",
"httpStatus" : null
},
"content" : [ {
"id" : "www.dfg.de",
"url" : "www.dfg.de",
"type" : "SEED",
"indegree" : 8,
"outdegree" : 4,
"degree" : 12,
"outdegree_seeds" : 4
}, {
"id" : "www.oaq.ch",
"url" : "www.oaq.ch",
"type" : "SEED",
"indegree" : 10,
"outdegree" : 20,
"degree" : 30,
"outdegree_seeds" : 20
}, {
"id" : "www.europace.org",
"url" : "www.europace.org",
"type" : "SEED",
"indegree" : 5,
"outdegree" : 6,
"degree" : 11,
"outdegree_seeds" : 6
} ],
"page" : {
"size" : 3,
"number" : 0,
"totalElements" : 113,
"totalPages" : 38
},
"links" : {
"next" : "http://localhost:8080/api/domaingraph/162/nodes?page=1&size=3"
}
}
/api/domaingraph/{id}/edges
A list with all edges in the referenced DomainGraph.
| Parameter | Description |
|---|---|
| page | The number of the requested page. |
| size | The number of objects of the requested page. |
$ curl 'http://localhost:8080/api/domaingraph/162/edges?page=0&size=3' -i -X GET
HTTP/1.1 200 OK
X-Rate-Limit-Remaining: 49
Content-Type: application/json
Content-Length: 639
{
"responseHeader" : {
"query" : "http://localhost:8080/api/domaingraph/162/edges?page=0&size=3",
"state" : "OK",
"msg" : "",
"httpStatus" : null
},
"content" : [ {
"source" : "www.dfg.de",
"target" : "www.esf.org",
"weight" : 20
}, {
"source" : "www.dfg.de",
"target" : "erc.europa.eu",
"weight" : 6
}, {
"source" : "www.dfg.de",
"target" : "www.ciee.org",
"weight" : 2
} ],
"page" : {
"size" : 3,
"number" : 0,
"totalElements" : 1126,
"totalPages" : 376
},
"links" : {
"next" : "http://localhost:8080/api/domaingraph/162/edges?page=1&size=3"
}
}
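The edge objects can be turned into a simple adjacency structure on the client side. The sketch below is an illustration only: the helper names `build_adjacency` and `weighted_outdegree` are hypothetical, and it uses the three edges from the example response above as input.

```python
from collections import defaultdict


def build_adjacency(edges: list[dict]) -> dict[str, dict[str, int]]:
    """Map each source domain to its target domains and the edge weight."""
    adjacency: dict[str, dict[str, int]] = defaultdict(dict)
    for edge in edges:
        adjacency[edge["source"]][edge["target"]] = edge["weight"]
    return dict(adjacency)


def weighted_outdegree(adjacency: dict[str, dict[str, int]]) -> dict[str, int]:
    """Sum the weights of all outgoing edges per source domain."""
    return {source: sum(targets.values()) for source, targets in adjacency.items()}


# The "content" array of the example response.
edges = [
    {"source": "www.dfg.de", "target": "www.esf.org", "weight": 20},
    {"source": "www.dfg.de", "target": "erc.europa.eu", "weight": 6},
    {"source": "www.dfg.de", "target": "www.ciee.org", "weight": 2},
]
print(weighted_outdegree(build_adjacency(edges)))  # {'www.dfg.de': 28}
```

Since the example covers only the first of 376 pages, the result for a real graph depends on collecting all pages first.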
Statistics
These endpoints provide access to basic statistical information about snapshots (and panels).
/api/stats
Returns a list of Stats-Objects with descriptive indicators of snapshots.
| Parameter | Description |
|---|---|
| snapshot | Optional. The name of a snapshot. Can be specified multiple times. |
| panel | Optional. The name of a panel. One of 'snapshot' or 'panel' has to be specified. |
$ curl 'http://localhost:8080/api/stats?snapshot=20121227_intermediaries' -i -X GET
HTTP/1.1 200 OK
X-Rate-Limit-Remaining: 49
Content-Type: application/json
Content-Length: 632
{
"responseHeader" : {
"query" : "http://localhost:8080/api/stats?snapshot=20121227_intermediaries",
"state" : "OK",
"msg" : "",
"httpStatus" : null
},
"content" : [ {
"snapshot" : "20121227_intermediaries",
"selection" : null,
"seedsInitial" : 122,
"seedsActual" : 157,
"seedsCrawled" : 141,
"seedsNotCrawled" : 16,
"importedDomains" : 114,
"importedHtmlDocs" : 274126,
"importedKBytes" : -1,
"indexedDomains" : 110,
"indexedDocs" : 223686
} ],
"page" : {
"size" : 2000,
"number" : 0,
"totalElements" : 1,
"totalPages" : 1
},
"links" : { }
}
| Name | Description |
|---|---|
| snapshot | The snapshot. |
| selection | The selection. Might be null when no selection is present. |
| seedsInitial | The number of seeds which have been used as input for the crawl. |
| seedsActual | The number of seeds which have been used for the crawl. Includes possible redirects and seeds which could not be crawled. |
| seedsCrawled and seedsNotCrawled | The number of seeds which have been crawled and which could not be crawled, respectively. |
| importedDomains | The number of seeds which have been actually imported. |
| importedHtmlDocs | The number of imported documents with the mime-type "text/html". |
| importedKBytes (currently not computed) | The complete size of the imported documents. Includes possible duplicated documents. |
| indexedDomains | The number of domains (sites) which have been indexed. |
| indexedDocs | The number of indexed documents. |
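A Stats-Object can be reduced to a few derived indicators. The sketch below is an illustration under assumptions: the function name `summarize_stats` and the keys of its result are hypothetical, and the check that seedsCrawled and seedsNotCrawled add up to seedsActual is only observed in the example response (141 + 16 = 157), not guaranteed by the API.

```python
from typing import Optional


def share(part: int, total: int) -> Optional[float]:
    """Return part / total, or None when the total is zero."""
    return part / total if total else None


def summarize_stats(stats: dict) -> dict:
    """Derive simple ratios from one Stats-Object."""
    crawled = stats["seedsCrawled"]
    not_crawled = stats["seedsNotCrawled"]
    actual = stats["seedsActual"]
    return {
        # Assumption: crawled and not crawled seeds partition the actual seeds.
        "seedsConsistent": crawled + not_crawled == actual,
        "crawledShare": share(crawled, actual),
        "indexedShare": share(stats["indexedDocs"], stats["importedHtmlDocs"]),
    }


# Values taken from the example response for 20121227_intermediaries.
example = {
    "seedsActual": 157,
    "seedsCrawled": 141,
    "seedsNotCrawled": 16,
    "importedHtmlDocs": 274126,
    "indexedDocs": 223686,
}
summary = summarize_stats(example)
print(summary["seedsConsistent"])  # True
```

importedKBytes is left out on purpose, because it is currently not computed.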
/api/stats/domains
Returns a list of DomainStats-Objects with descriptive indicators for all seed-domains of a given snapshot.
| Parameter | Description |
|---|---|
| snapshot | Optional. The name of a snapshot. Can be specified multiple times. |
| panel | Optional. The name of a panel. One of 'snapshot' or 'panel' has to be specified. |
$ curl 'http://localhost:8080/api/stats/domains?snapshot=20121227_intermediaries&page=0&size=2' -i -X GET
HTTP/1.1 200 OK
X-Rate-Limit-Remaining: 49
Content-Type: application/json
Content-Length: 806
{
"responseHeader" : {
"query" : "http://localhost:8080/api/stats/domains?snapshot=20121227_intermediaries&page=0&size=2",
"state" : "OK",
"msg" : "",
"httpStatus" : null
},
"content" : [ {
"snapshot" : "20121227_intermediaries",
"selection" : null,
"domain" : "www.dfg.de",
"importedHtmlDocs" : 9877,
"importedKBytes" : 0,
"indexedDocs" : 7623
}, {
"snapshot" : "20121227_intermediaries",
"selection" : null,
"domain" : "www.oaq.ch",
"importedHtmlDocs" : 2613,
"importedKBytes" : 0,
"indexedDocs" : 1221
} ],
"page" : {
"size" : 2,
"number" : 0,
"totalElements" : 114,
"totalPages" : 57
},
"links" : {
"next" : "http://localhost:8080/api/stats/domains?snapshot=20121227_intermediaries&page=1&size=2"
}
}
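Since the snapshot parameter can be specified multiple times, the query string is easiest to build from a list of key/value pairs. The sketch below assumes that "multiple times" means repeating the `snapshot` query parameter. The function name `stats_domains_url` is hypothetical, and `BASE_URL` is the local host used in the examples of this documentation.

```python
from urllib.parse import urlencode

BASE_URL = "http://localhost:8080/api"  # host used in the examples


def stats_domains_url(snapshots: list[str], page: int = 0, size: int = 2) -> str:
    """Build the /stats/domains URL with one snapshot parameter per name."""
    # Assumption: several snapshots are passed by repeating "snapshot".
    params = [("snapshot", name) for name in snapshots]
    params += [("page", page), ("size", size)]
    return f"{BASE_URL}/stats/domains?{urlencode(params)}"


print(stats_domains_url(["20121227_intermediaries"]))
# http://localhost:8080/api/stats/domains?snapshot=20121227_intermediaries&page=0&size=2
```

With one snapshot name, the function reproduces the URL of the example request above.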
Embeddings
TODO:
/api/embeddings/definitions
Returns a list of EmbeddingDefinitions. An EmbeddingDefinition combines a specific embedding model and a reference to a Chunker.
| Parameter | Description |
|---|---|
| page | Optional. The number of the requested page. |
| size | Optional. The number of objects of the requested page. |
$ curl 'http://localhost:8080/api/embeddings/definitions?size=1' -i -X GET
HTTP/1.1 200 OK
X-Rate-Limit-Remaining: 49
Content-Type: application/json
Content-Length: 628
{
"responseHeader" : {
"query" : "http://localhost:8080/api/embeddings/definitions?size=1",
"state" : "OK",
"msg" : "",
"httpStatus" : null
},
"content" : [ {
"machineName" : "default",
"modelName" : "sentence-transformers/paraphrase-MiniLM-L12-v2",
"dimensions" : 384,
"chunkerMachineName" : "default"
}, {
"machineName" : "tests",
"modelName" : "sentence-transformers/paraphrase-MiniLM-L12-v2",
"dimensions" : 384,
"chunkerMachineName" : "default"
} ],
"page" : {
"size" : 2,
"number" : 0,
"totalElements" : 2,
"totalPages" : 1
},
"links" : { }
}
/api/embeddings/status
TODO:
/api/embeddings/search
Returns a list of "nearby" chunks of documents based on the used embedding, sorted by ascending distance. The distance is based on the cosine similarity.
| Parameter | Description |
|---|---|
| snapshot | The snapshot. |
| | The selection. (currently not used) |
| | A domain to which the search should be restricted. Can be specified multiple times. |
| embeddingsDef | The machineName of the EmbeddingsDef. |
| query | The text to compare with the embeddings. |
| | The maximum distance of text chunks. Defaults to 0.5. |
| limit | The maximum number of matching chunks to return. If set, paging is disabled. Defaults to 1000. |
| minTokensCount | CURRENTLY IGNORED and fixed to 15. Only considers chunks with more than minTokensCount tokens. Defaults to 15. |
| | Mitigates a bug in MariaDB which leads to inconsistent results. |
$ curl 'http://localhost:8080/api/embeddings/search?snapshot=20121227_intermediaries&embeddingsDef=default&query=Nachhaltigkeit&limit=2' -i -X GET
HTTP/1.1 200 OK
X-Rate-Limit-Remaining: 49
Content-Type: application/json
Content-Length: 1375
{
"responseHeader" : {
"query" : "http://localhost:8080/api/embeddings/search?snapshot=20121227_intermediaries&embeddingsDef=default&query=Nachhaltigkeit&limit=2",
"state" : "OK",
"msg" : "",
"httpStatus" : null
},
"content" : [ {
"domain" : "www.dfg.de",
"reference" : "/dfg_magazin/wissenschaft_oeffentlichkeit/dfg_wissenschaftsjahre/2012_nachhaltigkeit/index.jsp",
"textChunk" : "Das \"Wissenschaftsjahr 2012 – Zukunftsprojekt Erde\" beschäftigt sich mit Forschung für nachhaltige Entwicklung. Alle Aspekte der Nachhaltigkeit werden angesprochen: Im Fokus stehen Möglichkeiten und Realisierbarkeit wirtschaftlichen, ökologischen und sozial nachhaltigen Handelns.",
"embedding" : [ ],
"dist" : 0.34900558
}, {
"domain" : "www.dfg.de",
"reference" : "/service/presse/das_neueste/index.html",
"textChunk" : "(30.05.12) Am 30. Mai startet die MS Wissenschaft ihre Tour 2012. An Bord präsentieren auch von der DFG unterstützte Projekte ihre Forschung für nachhaltige Entwicklungen. Zum Start des Schiffes erscheint auch „Das blaue ABC. Forschung – Wissen – Nachhaltigkeit“, das DFG-geförderte Forschung zur Nachhaltigkeit vorstellt.",
"embedding" : [ ],
"dist" : 0.4760739
} ],
"page" : {
"size" : 2,
"number" : 0,
"totalElements" : 2,
"totalPages" : 1
},
"links" : { }
}
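The search result can be filtered further on the client side, for example with a stricter distance threshold than the one used in the request. The sketch below is an illustration only: the helper names `closest_chunks` and `references_by_domain` are hypothetical, and the input is the example response above with the textChunk and embedding fields left out for brevity.

```python
from collections import defaultdict


def closest_chunks(content: list[dict], max_dist: float = 0.5) -> list[dict]:
    """Keep chunks with dist <= max_dist, nearest chunk first."""
    hits = [chunk for chunk in content if chunk["dist"] <= max_dist]
    return sorted(hits, key=lambda chunk: chunk["dist"])


def references_by_domain(chunks: list[dict]) -> dict[str, list[str]]:
    """Group the references of the given chunks by their domain."""
    grouped: dict[str, list[str]] = defaultdict(list)
    for chunk in chunks:
        grouped[chunk["domain"]].append(chunk["reference"])
    return dict(grouped)


# The "content" array of the example response, shortened.
content = [
    {
        "domain": "www.dfg.de",
        "reference": "/dfg_magazin/wissenschaft_oeffentlichkeit/dfg_wissenschaftsjahre/2012_nachhaltigkeit/index.jsp",
        "dist": 0.34900558,
    },
    {
        "domain": "www.dfg.de",
        "reference": "/service/presse/das_neueste/index.html",
        "dist": 0.4760739,
    },
]
near = closest_chunks(content, max_dist=0.4)
print(len(near))  # 1
```

The default of 0.5 mirrors the documented default of the maximum distance, so calling `closest_chunks(content)` keeps both example chunks.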