Fix broken headings in Markdown files #5

Open · wants to merge 1 commit into base: master
16 changes: 8 additions & 8 deletions README.md
@@ -18,15 +18,15 @@ David Bamman, Ted Underwood and Noah Smith, "A Bayesian Mixed Effects Model of L
How To Run
=======

-####Preliminaries
+#### Preliminaries

Download external jars (which are sadly too big for GitHub's 100MB file size limit)

* Download and unzip http://nlp.stanford.edu/software/stanford-corenlp-full-2014-01-04.zip
* Copy stanford-corenlp-full-2014-01-04/stanford-corenlp-3.3.1-models.jar to the lib/ folder in the current working directory
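The two steps above can be sketched as a short shell script. The URL and jar name are taken from the README; the download and unzip lines are left commented out here because the archive is large:

```shell
# Set up lib/ with the CoreNLP models jar (URL and filenames from the README).
# The zip is several hundred MB, so the fetch steps are commented out here.
mkdir -p lib
# wget http://nlp.stanford.edu/software/stanford-corenlp-full-2014-01-04.zip
# unzip stanford-corenlp-full-2014-01-04.zip
# cp stanford-corenlp-full-2014-01-04/stanford-corenlp-3.3.1-models.jar lib/
```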


-####Example
+#### Example

From the command line, run the following:

@@ -40,9 +40,9 @@ This runs the bookNLP pipeline on "Oliver Twist" in the data/originalTexts direc
* data/tokens/dickens.oliver.tokens -> the path to the file where you want the processed text to be stored.
* data/output/dickens -> the path to the output directory you want to write any other diagnostics to.

-####Flags
+#### Flags

-######Required
+###### Required

-doc <text> : original text to process

@@ -51,7 +51,7 @@ This runs the bookNLP pipeline on "Oliver Twist" in the data/originalTexts direc
-p : the directory to write all diagnostic files to. Creates the directory if it does not already exist.


-######Optional
+###### Optional

-id : a unique book ID for this book (output files include this in the filename)

@@ -60,7 +60,7 @@ This runs the bookNLP pipeline on "Oliver Twist" in the data/originalTexts direc
-f : force the (slower) syntactic processing of the original text, even if the <file> given by the -tok flag already exists (when that file exists, the parsing step that would recreate it is normally skipped)


-####Output
+#### Output

The main output here is data/tokens/dickens.oliver.tokens, which contains the original book, one token per line, with part of speech, syntax, NER, coreference and other annotations. The (tab-separated) format is:
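The full column list is elided from this excerpt, but independent of the exact columns, a tokens file can be read as plain tab-separated values. A minimal sketch, with illustrative rows that are assumptions rather than actual bookNLP output:

```python
import csv
import io

# Minimal sketch of reading a .tokens file: one token per line with
# tab-separated annotation columns. These sample rows are illustrative
# assumptions, not actual bookNLP output.
sample = "0\t0\tOliver\tNNP\n0\t1\tTwist\tNNP\n"

def read_tokens(f):
    """Return each line of a tab-separated tokens file as a list of fields."""
    return list(csv.reader(f, delimiter="\t"))

rows = read_tokens(io.StringIO(sample))
print(rows[0])  # → ['0', '0', 'Oliver', 'NNP']
```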

@@ -99,7 +99,7 @@ Training coreference

Coreference only needs to be trained when there's new training data (or new feature ideas: current features are based on syntactic tree distance, linear distance, POS identity, gender matching, quotation scope and salience).

-####Data
+#### Data

Coreference annotated data is located in the coref/ directory.

@@ -111,7 +111,7 @@ annotatedData.txt contains coreference annotations, in the (tab-separated) forma

bookIDs are mapped to their respective token files in docPaths.txt. All of these token files are located in finalTokenData/. These token files are all read-only -- since the annotations are keyed to specific token IDs in those files, we want to make sure they stay permanent.
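Resolving a bookID to its token file could look like the following hypothetical sketch. The exact layout of docPaths.txt is not shown in this excerpt, so one `bookID<TAB>path` pair per line is assumed, and the sample entry is invented for illustration:

```python
import io

# Hypothetical sketch of resolving bookIDs to token files via docPaths.txt.
# The file's exact layout isn't shown in this excerpt; one tab-separated
# "bookID<TAB>path" pair per line is assumed, and this entry is invented.
sample = "oliver\tfinalTokenData/dickens.oliver.tokens\n"

def load_doc_paths(f):
    """Map each bookID to its token-file path, under the assumed format."""
    paths = {}
    for line in f:
        book_id, path = line.rstrip("\n").split("\t", 1)
        paths[book_id] = path
    return paths

mapping = load_doc_paths(io.StringIO(sample))
print(mapping["oliver"])  # → finalTokenData/dickens.oliver.tokens
```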

-####Training a model
+#### Training a model

Given the coref/ folder above, train new coreference weights with:
