V0.18.0 (#100)
* link to GitHub profile rather than email

* version bump: v0.18.0

* docs

* docs (closes #94)

* improve readme

closes #91

* CONTRIBUTING.md

closes #90

* rename test.js > schemas.test.js

* Orthography: version bump (v1.2.0)

* Orthography.test.js

* add direction to Orthography

closes #89

* bump schema versions

* Address.type

* Bundle.type

* Language.type

* Lexeme.type

* LexemeReference.type

* Lexicon.type

* Location.type

* Media.type

* Morpheme.type

* Note.type

* Orthography.type

* Person.type

* Phoneme.type

* Add note for Reference.type

* Sentence.type

* Text.type

* Word.type

* add types

closes #88

* version bump: Language v3.0.0

* GeoJSON / Mendeley

- fix person reference in Reference.json
- add GeoJSON schema (for testing purposes)

* Language.names

closes #87

* Url.json > URL.json

reset schema numbering to v1.0.0

* Url > URL

closes #99

* Abbreviation.test.js

* Rename Url.json to URL.json
dwhieb authored Jul 24, 2017
1 parent 67c78eb commit 4a00da7
Showing 65 changed files with 7,005 additions and 1,051 deletions.
83 changes: 83 additions & 0 deletions CONTRIBUTING.md
@@ -0,0 +1,83 @@
# Contributing Guidelines

:star2: Thank you for contributing to DLx! :star2:

Below is some information on how you can contribute to the DLx data format, and how to get started.

* [General Guidelines for DLx Projects][1]

* [Code of Conduct][2]

## Questions?

Just have a question you need answered? Consider [joining the DLx Slack channel][3], where you can chat with other users and developers about various DLx projects.

You can also ask a question by [opening an issue in this repository][4].

## Suggesting Features

Have a feature you'd like to suggest? The project would especially benefit from the following types of suggestions (other suggestions are fine too):

- Is there some part of the specification that could be structured more simply, without losing any information?

- Is there a use case, property, or other piece of information that linguists commonly use that hasn't been included in the specification?

- Are there any parts of the specification that are unclear or could be made clearer?

- Is there a type of linguistic object or data that isn't included in the specification?

- Can the documentation be improved?

To suggest a feature, simply [open an issue in the GitHub repository for this project][4], and explain your suggestion. You should provide an example in JSON of how your suggestion would work, and if possible a valid [JSON Schema][5] describing the new format.

If you're suggesting an improvement to the documentation, please include the original wording and how you would change it, or at least a short description of what needs to be changed.

When suggesting a feature, think about whether this would require a major, minor, or patch update to the specification (following [semantic versioning principles][6]), and include that in the comments for your issue.
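To make the major/minor/patch distinction concrete, here is a minimal sketch of how a version string changes under each kind of update. `bumpVersion` is a hypothetical helper, not part of this project, and it assumes plain `MAJOR.MINOR.PATCH` strings without pre-release or build suffixes.

```javascript
// Hypothetical helper: bump a semantic version string.
// Assumes plain MAJOR.MINOR.PATCH (no pre-release or build suffixes).
function bumpVersion(version, level) {
  const parts = version.split('.').map(Number);
  if (parts.length !== 3 || !parts.every(n => Number.isInteger(n) && n >= 0)) {
    throw new Error(`Not a MAJOR.MINOR.PATCH version: ${version}`);
  }
  const [major, minor, patch] = parts;

  switch (level) {
    case 'major': // incompatible change: reset minor and patch
      return `${major + 1}.0.0`;
    case 'minor': // backwards-compatible addition: reset patch
      return `${major}.${minor + 1}.0`;
    case 'patch': // backwards-compatible fix
      return `${major}.${minor}.${patch + 1}`;
    default:
      throw new Error(`Unknown level: ${level}`);
  }
}

console.log(bumpVersion('1.2.0', 'minor')); // 1.3.0
console.log(bumpVersion('1.2.0', 'major')); // 2.0.0
```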

## Reporting Bugs & Other Issues

Found a problem in the specification? [Open an issue on GitHub][4] describing the problem and its severity. Does the issue affect just a single schema, or several schemas? Does the issue have the potential to cause errors in applications using data in this format? Include as much detail as possible.

## Making Pull Requests

If you'd like to make a pull request and contribute to the code for the DLx specification, first open an issue with information about the feature or fix you'd like to make, following the guidelines in the [suggesting features](#suggesting-features) section above. It is a good idea to have your feature request or bug fix request approved by a maintainer before writing any code.

Once an issue has been opened and assigned to you, you can follow the steps in the [maintainer's guidelines][1] to prepare your pull request (not all of the steps may apply).

For this project in particular, you will want to include the following steps:

1. Fork the repository and clone it to your computer.

1. Install project dependencies by running `npm install` in the project folder from the command line.

1. Increment the version number in the `version` field of `package.json`.

1. Increment the version number(s) of the particular schemas you'll be updating, if any.

1. Update any existing documentation (e.g. the readme, or the `description` property of a schema) so that it reflects the changes you'll be making.

1. In the `/test` folder, find (or create) the test file for the schema(s) you're making changes to, and update the tests with valid (and invalid) sample data to test against your schema. The tests use the [`ajv` library][7] to validate data against schemas, and are run in Node.

1. Make any necessary changes to the schemas. If adding a new schema, place it in the `/schemas` folder and name the file after the object it describes, in PascalCase (e.g. `LexemeReference.json`). Make sure that object schemas have a `type` property.

1. Run `npm test` from the command line to check that the schemas are valid and that your new / updated tests pass.

1. Update any documentation again, if needed.

1. Regenerate the documentation by running `npm run docs` from the command line.

1. Add a commit message that closes the related issue (e.g. `closes #167`).

1. Commit and push your changes.

1. Open a pull request into the `master` branch, and include the release notes in the comments.

1. Address any changes requested by the code reviewer in your pull request.
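The file-naming and `type` conventions in the steps above could be checked automatically before running `npm test`. Judging from existing file names such as `LexemeReference.json` and `URL.json`, the naming convention is capitalized camel case (PascalCase). The `checkSchema` function below is a hypothetical helper, not part of the repository's test suite, and it assumes the DLx `type` property is declared under the schema's `properties` keyword.

```javascript
// Hypothetical check for two conventions from the contributing steps:
// (a) schema file names are PascalCase (e.g. LexemeReference.json)
// (b) object schemas declare a DLx `type` property under `properties`
function checkSchema(fileName, schema) {
  const problems = [];

  // Capital first letter, then letters/digits only; accepts acronyms like URL.json.
  if (!/^[A-Z][A-Za-z0-9]*\.json$/.test(fileName)) {
    problems.push(`${fileName}: file name should be PascalCase`);
  }

  // Only schemas that describe objects are expected to have a `type` property.
  const describesObject = schema.type === 'object';
  const hasTypeProperty = Boolean(schema.properties && schema.properties.type);
  if (describesObject && !hasTypeProperty) {
    problems.push(`${fileName}: object schema is missing a "type" property`);
  }

  return problems;
}

// Usage with made-up schema fragments:
const lexemeSchema = {
  type: 'object',
  properties: { type: { type: 'string' } },
};
console.log(checkSchema('Lexeme.json', lexemeSchema)); // []
console.log(checkSchema('lexeme_reference.json', { type: 'object', properties: {} }));
// reports both the file name and the missing "type" property
```

In practice a check like this would read each file in `/schemas` with Node's `fs` module; that wiring is left out of the sketch.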

[1]: https://github.com/digitallinguistics/digitallinguistics.github.io/blob/master/CONTRIBUTING.md
[2]: https://github.com/digitallinguistics/digitallinguistics.github.io/blob/master/CODE_OF_CONDUCT.md
[3]: https://slack.digitallinguistics.io/
[4]: https://github.com/digitallinguistics/spec/issues/
[5]: http://json-schema.org/
[6]: http://semver.org/
[7]: https://www.npmjs.com/package/ajv
32 changes: 26 additions & 6 deletions README.md
@@ -1,5 +1,15 @@
# The DLx Data Formats
A collection of JSON Schemas for representing scientific linguistic data.
# The DLx Data Format
This project aims to create a standardized, human-readable, web-compatible data format for representing linguistic data, and is intended for anyone who manages a linguistic database. This repository contains a number of schemas which recommend ways of representing linguistic data in [JSON][23]. Tools which follow this recommended format will be interoperable, allowing users to migrate their data easily from one tool to another. In addition, this format is compatible with the modern web platform, making it easy to manage linguistic data online or in a browser. All Digital Linguistics projects use this data format.

* Read the [Introduction](#introduction) below to understand how the format works.

* Read the [Schemas](#schemas) section to get started with using the DLx format in your own projects.

* Want additional help or to talk with other members of the DLx community? [Join the DLx Slack channel][20].

* Need to report a bug or suggest a feature? [Open an issue on GitHub][21].

* Want to contribute to this project? :star2: Awesome! :star2: [Check out the contributing guidelines to get started][22].

[![npm version](https://badge.fury.io/js/%40digitallinguistics%2Fspec.svg)](https://badge.fury.io/js/%40digitallinguistics%2Fspec)
[![Build Status](https://travis-ci.org/digitallinguistics/spec.svg?branch=master)](https://travis-ci.org/digitallinguistics/spec)
@@ -19,7 +29,7 @@ While humans look at a representation like this and can see which glosses are as

There are many ways a linguist could choose to represent their data in digital form. Not only are many formats available (a relational database, XML, a tabular spreadsheet, JSON, etc.), but there is significant flexibility in deciding what properties to include in your data and what to call them. For example, does the data about a text have a property specifying the language it was spoken in, and should that property be represented as `"lang"` or `"language"`?

The Digital Linguistics (DLx) project recommends a data format called [**JSON**](http://json.org/) (JavaScript Object Notation) for digitally representing your linguistic data. Moreover, the DLx project has drafted recommendations for how to structure linguistic data using JSON. This recommended format was designed to capture hierarchical linguistic data in a way that aligns with the descriptive categories that linguists actually use, relying on fundamental linguistic notions such as *text*, *morpheme*, *orthography*, etc. For instance, this format is capable of capturing the fact that a text contains sentences, sentences contain words, words contain morphemes, and morphemes contain phonemes. This functionality turns out to be a crucial factor in inputting, editing, searching, and analyzing linguistic data. At the same time, the DLx format is computer-readable, easily searchable, and is natively supported by all modern web-based tools.
The Digital Linguistics (DLx) project recommends a data format called [**JSON**][23] (JavaScript Object Notation) for digitally representing your linguistic data. Moreover, the DLx project has drafted recommendations for how to structure linguistic data using JSON. This recommended format was designed to capture hierarchical linguistic data in a way that aligns with the descriptive categories that linguists actually use, relying on fundamental linguistic notions such as *text*, *morpheme*, *orthography*, etc. For instance, this format is capable of capturing the fact that a text contains sentences, sentences contain words, words contain morphemes, and morphemes contain phonemes. This functionality turns out to be a crucial factor in inputting, editing, searching, and analyzing linguistic data. At the same time, the DLx format is computer-readable, easily searchable, and is natively supported by all modern web-based tools.
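To illustrate why this nesting helps with searching, the sketch below walks a text down to its morphemes and reports where a given gloss occurs. Only `words` is a property named elsewhere in this readme; `sentences`, `morphemes`, and `gloss`, and the use of `transcription` on words and morphemes, are assumed names chosen for illustration, so check the actual schemas before relying on them. `findMorphemes` is a hypothetical function.

```javascript
// Illustrative data only. The nesting Text -> sentences -> words -> morphemes
// and the property names other than `words` are assumptions, not the spec.
const text = {
  sentences: [
    {
      transcription: 'ninakupenda',
      translation: 'I love you',
      words: [
        {
          transcription: 'ninakupenda',
          morphemes: [
            { transcription: 'ni', gloss: '1SG' },
            { transcription: 'na', gloss: 'PRS' },
            { transcription: 'ku', gloss: '2SG.OBJ' },
            { transcription: 'pend', gloss: 'love' },
            { transcription: 'a', gloss: 'FV' },
          ],
        },
      ],
    },
  ],
};

// Because each level is nested inside its parent, a search can report
// exactly where a morpheme occurs (sentence, word, and morpheme indices).
function findMorphemes(textObject, gloss) {
  const matches = [];
  textObject.sentences.forEach((sentence, s) => {
    sentence.words.forEach((word, w) => {
      word.morphemes.forEach((morpheme, m) => {
        if (morpheme.gloss === gloss) {
          matches.push({ sentence: s, word: w, morpheme: m, form: morpheme.transcription });
        }
      });
    });
  });
  return matches;
}

console.log(findMorphemes(text, '2SG.OBJ')); // one match: the third morpheme of the first word
```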

The DLx project recommends JSON because it has become the data interchange format for the modern web, and is natively supported by every major programming language. This makes it significantly easier for programmers to develop tools that use the DLx format, meaning that linguists will have a wider variety of options and helpful tools for managing their linguistic data. Moreover, JSON is extremely easy for humans to read. Below is a short phrase represented in JSON. Notice that, even if you don't understand how the format works, you can see the hierarchical relationship between the sentence, its words, and their morphemes, and you know which piece of data belongs to what kind of linguistic object.

@@ -174,14 +184,19 @@ Schema | Description
[`Person`][7] | Information about a person, e.g. speaker, linguist, editor, translator, etc.
[`Reference`][15] | A bibliographic reference.
[`Tags`][9] | A collection of tags on the given resource. Particularly useful for tagging instances of a phenomenon in your corpora.
[`Url`][16] | A URL.
[`URL`][16] | A URL.

### Using the Schemas
Following the recommended data format in your own project is as easy as making sure you include the required properties in your data, and format them in the recommended ways. For example, if you wish to create a JSON object representing a phrase, you should follow the Sentence schema by making sure you include the `transcription`, `translation`, and `words` properties on the JSON object. And if you want to include additional data, check to see whether there is already a recommended property you can use. For example, if you wish to indicate the time within the audio file that the phrase begins and ends, you would use the `startTime` and `endTime` properties, each of which is a number formatted in seconds and milliseconds (SS.MMM).

Note that most schemas have a strongly recommended (but optional) `type` property indicating the schema that the object adheres to.
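As a rough illustration of this guidance, the hypothetical `checkSentence` function below hand-codes the Sentence rules mentioned in this section: the required `transcription`, `translation`, and `words` properties, numeric `startTime`/`endTime` values, and the optional `type` property. It is a sketch, not an official validator; the `'Sentence'` value for `type`, the plain-string values, and the start/end ordering check are assumptions.

```javascript
// Hypothetical checker based on the guidance above. Value shapes are
// simplified: real DLx schemas may define `transcription` and `translation`
// differently (e.g. as multi-language strings).
const REQUIRED_SENTENCE_PROPS = ['transcription', 'translation', 'words'];

function checkSentence(sentence) {
  // Required properties named in the Sentence guidance.
  const problems = REQUIRED_SENTENCE_PROPS
    .filter(prop => !(prop in sentence))
    .map(prop => `missing required property "${prop}"`);

  // Optional timing properties: numbers of seconds, e.g. 12.345 (SS.MMM).
  for (const prop of ['startTime', 'endTime']) {
    if (prop in sentence && typeof sentence[prop] !== 'number') {
      problems.push(`"${prop}" should be a number of seconds`);
    }
  }

  // Extra sanity check (an assumption, not stated in the readme).
  if (typeof sentence.startTime === 'number' && typeof sentence.endTime === 'number'
      && sentence.endTime < sentence.startTime) {
    problems.push('"endTime" is earlier than "startTime"');
  }

  // The `type` property is optional but strongly recommended.
  if (!('type' in sentence)) {
    problems.push('warning: no "type" property');
  }

  return problems;
}

const phrase = {
  type: 'Sentence', // assumed value: the name of the schema
  transcription: 'ninakupenda',
  translation: 'I love you',
  words: [],
  startTime: 1.25,
  endTime: 2.8,
};

console.log(checkSentence(phrase)); // []
console.log(checkSentence({ transcription: 'ninakupenda', startTime: '00:01' }));
// reports missing translation and words, a non-numeric startTime, and no type
```

A full implementation would validate data against the published JSON Schemas (the project's own tests use `ajv` for that) rather than hand-code the rules.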

## Want to Contribute?
Check DLx's [general contributing guidelines][18].

## Maintainers
This repo is maintained by:
- Daniel W. Hieber ([dhieber@umail.ucsb.edu](mailto:dhieber@umail.ucsb.edu))
- [Daniel W. Hieber][19]

[1]: http://developer.digitallinguistics.io/spec/schemas/Abbreviation.html
[2]: http://developer.digitallinguistics.io/spec/schemas/Access.html
@@ -198,6 +213,11 @@ This repo is maintained by:
[13]: http://developer.digitallinguistics.io/spec/schemas/LexemeReference.html
[14]: http://developer.digitallinguistics.io/spec/schemas/MultiLangString.html
[15]: http://developer.digitallinguistics.io/spec/schemas/Reference.html
[16]: http://developer.digitallinguistics.io/spec/schemas/Url.html
[16]: http://developer.digitallinguistics.io/spec/schemas/URL.html
[17]: http://developer.digitallinguistics.io/spec/schemas/Location.html
[18]: https://github.com/digitallinguistics/digitallinguistics.github.io/blob/master/CONTRIBUTING.md
[19]: https://github.com/dwhieb/
[20]: https://slack.digitallinguistics.io/
[21]: https://github.com/digitallinguistics/spec/issues
[22]: https://github.com/digitallinguistics/spec/blob/master/CONTRIBUTING.md
[23]: http://json.org/
35 changes: 31 additions & 4 deletions docs/index.html
@@ -12,8 +12,25 @@

<body>

<main class=markdown-body><h1>The DLx Data Formats</h1>
<p>A collection of JSON Schemas for representing scientific linguistic data.</p>
<main class=markdown-body><h1>The DLx Data Format</h1>
<p>This project aims to create a standardized, human-readable, web-compatible data format for representing linguistic data, and is intended for anyone who manages a linguistic database. This repository contains a number of schemas which recommend ways of representing linguistic data in <a href="http://json.org/">JSON</a>. Tools which follow this recommended format will be interoperable, allowing users to migrate their data easily from one tool to another. In addition, this format is compatible with the modern web platform, making it easy to manage linguistic data online or in a browser. All Digital Linguistics projects use this data format.</p>
<ul>
<li>
<p>Read the <a href="#introduction">Introduction</a> below to understand how the format works.</p>
</li>
<li>
<p>Read the <a href="#schemas">Schemas</a> section to get started with using the DLx format in your own projects.</p>
</li>
<li>
<p>Want additional help or to talk with other members of the DLx community? <a href="https://slack.digitallinguistics.io/">Join the DLx Slack channel</a>.</p>
</li>
<li>
<p>Need to report a bug or suggest a feature? <a href="https://github.com/digitallinguistics/spec/issues">Open an issue on GitHub</a>.</p>
</li>
<li>
<p>Want to contribute to this project? :star2: Awesome! :star2: <a href="https://github.com/digitallinguistics/spec/blob/master/CONTRIBUTING.md">Check out the contributing guidelines to get started</a>.</p>
</li>
</ul>
<p><a href="https://badge.fury.io/js/%40digitallinguistics%2Fspec"><img src="https://badge.fury.io/js/%40digitallinguistics%2Fspec.svg" alt="npm version"></a>
<a href="https://travis-ci.org/digitallinguistics/spec"><img src="https://travis-ci.org/digitallinguistics/spec.svg?branch=master" alt="Build Status"></a>
<a href="https://zenodo.org/badge/latestdoi/50221632"><img src="https://zenodo.org/badge/50221632.svg" alt="DOI"></a></p>
@@ -231,17 +248,27 @@ <h3>Non-Linguistic Schemas</h3>
<td>A collection of tags on the given resource. Particularly useful for tagging instances of a phenomenon in your corpora.</td>
</tr>
<tr>
<td><a href="http://developer.digitallinguistics.io/spec/schemas/Url.html"><code>Url</code></a></td>
<td><a href="http://developer.digitallinguistics.io/spec/schemas/URL.html"><code>URL</code></a></td>
<td>A URL.</td>
</tr>
</tbody>
</table>
<h3>Using the Schemas</h3>
<p>Following the recommended data format in your own project is as easy as making sure you include the required properties in your data, and format them in the recommended ways. For example, if you wish to create a JSON object representing a phrase, you should follow the Sentence schema by making sure you include the <code>transcription</code>, <code>translation</code>, and <code>words</code> properties on the JSON object. And if you want to include additional data, check to see whether there is already a recommended property you can use. For example, if you wish to indicate the time within the audio file that the phrase begins and ends, you would use the <code>startTime</code> and <code>endTime</code> properties, each of which is a number formatted in seconds and milliseconds (SS.MMM).</p>
<p>Note that most schemas have a strongly recommended (but optional) <code>type</code> property indicating the schema that the object adheres to.</p>
<h2>Want to Contribute?</h2>
<p>Check DLx's <a href="https://github.com/digitallinguistics/digitallinguistics.github.io/blob/master/CONTRIBUTING.md">general contributing guidelines</a>.</p>
<h2>Maintainers</h2>
<p>This repo is maintained by:</p>
<ul>
<li><a href="https://github.com/dwhieb/">Daniel W. Hieber</a></li>
</ul>
</main>

<nav>
<h1>Schemas</h1>
<ul>
<li><a href="schemas/Abbreviation.html">Abbreviation</a></li><li><a href="schemas/Address.html">Address</a></li><li><a href="schemas/Access.html">Access Rights</a></li><li><a href="schemas/DateCreated.html">Date Created</a></li><li><a href="schemas/DateModified.html">Date Modified</a></li><li><a href="schemas/Bundle.html">Bundle</a></li><li><a href="schemas/DateRecorded.html">Date Recorded</a></li><li><a href="schemas/Language.html">Language</a></li><li><a href="schemas/LexemeReference.html">Lexeme Reference</a></li><li><a href="schemas/Lexeme.html">Lexeme</a></li><li><a href="schemas/Lexicon.html">Lexicon</a></li><li><a href="schemas/Media.html">Media File</a></li><li><a href="schemas/Location.html">Location</a></li><li><a href="schemas/Morpheme.html">Morpheme</a></li><li><a href="schemas/MultiLangString.html">Multi-Language Text / String</a></li><li><a href="schemas/Note.html">Note</a></li><li><a href="schemas/Orthography.html">Orthography</a></li><li><a href="schemas/Person.html">Person</a></li><li><a href="schemas/Phoneme.html">Phoneme</a></li><li><a href="schemas/Reference.html">Bibliographic Reference</a></li><li><a href="schemas/Sentence.html">Sentence</a></li><li><a href="schemas/Tags.html">Tags</a></li><li><a href="schemas/Text.html">Text</a></li><li><a href="schemas/Transcription.html">Transcription</a></li><li><a href="schemas/Url.html">URL</a></li><li><a href="schemas/Word.html">Word</a></li>
<li><a href="schemas/Abbreviation.html">Abbreviation</a></li><li><a href="schemas/Address.html">Address</a></li><li><a href="schemas/DateCreated.html">Date Created</a></li><li><a href="schemas/Access.html">Access Rights</a></li><li><a href="schemas/Bundle.html">Bundle</a></li><li><a href="schemas/GeoJSON.html">GeoJSON Object</a></li><li><a href="schemas/DateRecorded.html">Date Recorded</a></li><li><a href="schemas/Language.html">Language</a></li><li><a href="schemas/Lexeme.html">Lexeme</a></li><li><a href="schemas/LexemeReference.html">Lexeme Reference</a></li><li><a href="schemas/DateModified.html">Date Modified</a></li><li><a href="schemas/Lexicon.html">Lexicon</a></li><li><a href="schemas/Location.html">Location</a></li><li><a href="schemas/Media.html">Media File</a></li><li><a href="schemas/Morpheme.html">Morpheme</a></li><li><a href="schemas/MultiLangString.html">Multi-Language Text / String</a></li><li><a href="schemas/Note.html">Note</a></li><li><a href="schemas/Orthography.html">Orthography</a></li><li><a href="schemas/Person.html">Person</a></li><li><a href="schemas/Reference.html">Bibliographic Reference</a></li><li><a href="schemas/Phoneme.html">Phoneme</a></li><li><a href="schemas/Tags.html">Tags</a></li><li><a href="schemas/Sentence.html">Sentence</a></li><li><a href="schemas/Text.html">Text</a></li><li><a href="schemas/Transcription.html">Transcription</a></li><li><a href="schemas/URL.html">URL</a></li><li><a href="schemas/Word.html">Word</a></li>
</ul>
</nav>
