---
layout: default
nav_active: data
title: Webis Data
description: Overview of corpora that are used by the Webis research group
---
<nav class="uk-container">
<ul class="uk-breadcrumb">
<li><a href="index.html">Webis.de</a></li>
<li class="uk-disabled"><a href="#">Data</a></li>
</ul>
</nav>
<script type="application/ld+json">
{
"@context": "http://schema.org/",
"@type": "DataCatalog",
"name": "Webis Data",
"description": "Overview of corpora that are used by the Webis research group.",
"url": "https://webis.de/data.html",
"keywords": [
"webis",
"data",
"corpora",
"corpus"
],
"author": [
{
"@type": "Organization",
"url": "https://webis.de/",
"name": "The Web Technology & Information Systems Network",
"alternateName": "Webis"
}
]
}
</script>
<main class="uk-section uk-section-default">
<div class="uk-container">
<h1>Data</h1>
<ul class="uk-list">
<li><span data-uk-icon="chevron-down"></span> <a href="#released-webis-corpora">Released Webis Corpora</a>
</li>
<li><span data-uk-icon="chevron-down"></span> <a href="#pan-corpora">PAN Corpora</a></li>
<li><span data-uk-icon="chevron-down"></span> <a href="#touche-corpora">Touché Corpora</a></li>
<li><span data-uk-icon="chevron-down"></span> <a href="#internal-webis-corpora">Internal Webis Corpora</a>
</li>
<li><span data-uk-icon="chevron-down"></span> <a href="#other-corpora">Other Corpora</a></li>
</ul>
</div>
<div class="uk-container uk-margin-medium">
<p>
This page organizes all corpora that have resulted from or have been used in our research. Their availability to people outside Webis is as follows:
(1) corpora that have been officially released by Webis as well as
(2) corpora of the PAN series can be downloaded here,
(3) internal Webis corpora (which will be officially released in the future) are supplied upon request,
(4) other corpora can be downloaded from their original publisher/creator.
Most of our released corpora are hosted at <a title="Download: Zenodo" href="https://zenodo.org/communities/webis">Zenodo <img src="data/img/zenodo-icon.png" alt="(Zenodo)"></a> and are indexed in the <a title="Indexed: Google" href="https://toolbox.google.com/datasetsearch/search?query=webis-data-catalog">Google Dataset Search <img src="data/img/google-icon.png" alt="(Google Dataset Search)"></a>; a few larger corpora are available in the <a title="Internet Archive" href="https://archive.org/details/webis">Internet Archive <img src="data/img/ia-icon.png" alt="(Internet Archive)"></a>; some corpora are accessible via the <a title="Hugging Face" href="https://huggingface.co/webis">Hugging Face <img src="data/img/huggingface-icon.png" alt="(Huggingface)"></a> and <a title="IR datasets" href="https://ir-datasets.com/index.html">IR datasets <img src="data/img/ir-icon.png" alt="(ir_datasets)"></a> libraries; the <img src="data/img/browser-icon-magnifier.png" alt="Browser"> symbol indicates a browsing facility for the respective corpus.
</p>
<div id="search-control">
<input type="text" class="uk-input" id="filter-field" placeholder="Type here to filter…"/>
</div>
</div>
<div class="uk-container uk-margin-medium webis-data-table">
{% include bib-data.html %}
<div id="filtered-all-message" class="uk-hidden uk-text-muted" aria-hidden="true">
None of our corpora match your filter.
</div>
</div>
</main>
<script src="https://assets.webis.de/js/thirdparty/jquery/jquery.slim.min.js"></script>
<script src="https://assets.webis.de/js/thirdparty/fontawesome/fontawesome.min.js"></script>
<script src="https://assets.webis.de/js/thirdparty/fontawesome/solid.min.js"></script>
<script src="https://assets.webis.de/js/filter.js"></script>
<script src="https://assets.webis.de/js/selection.js"></script>
<script src="https://assets.webis.de/js/tables.js"></script>
<script>initWebisDataFiltering();</script>