Tutorial

This tutorial explains the basic concepts of the NLP editor. The flow created in this tutorial can be imported from sample-flows/tutorial-flow.json and executed by uploading the text file [4Q2006.txt](./sample-data/revenue by division/financial statements/4Q2006.txt) into the Input Documents node.

Set up the input document

Under Extractors, drag and drop Input Documents onto the canvas. Configure it with the document 4Q2006.txt. Click Upload, then Close.

Setting up an input document for testing during development

Create a dictionary of division names

Under Extractors, drag Dictionary onto the canvas. Connect its input to the output of Input Documents. Rename the node to Division and enter the terms Software, Global Business Services, and Global Technology Services. Click Save.

Creating a dictionary of division names

Run the dictionary and see results highlighted

Select the Division node, and click Run.

Running the dictionary and seeing results highlighted in the input text
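
To make the idea of a dictionary extractor concrete, here is a minimal Python sketch of what matching a fixed list of terms against text could look like. It is an illustration only, not the editor's implementation: the function name `dictionary_matches` and the sample sentence are assumptions made up for this example, and the sentence is not taken from 4Q2006.txt.

```python
import re

def dictionary_matches(text, terms, ignore_case=False):
    """Return (start, end, matched_text) for every dictionary term found in text."""
    flags = re.IGNORECASE if ignore_case else 0
    # Try longer terms first so multi-word entries win over shorter overlapping ones.
    ordered = sorted(terms, key=len, reverse=True)
    alternation = "|".join(re.escape(term) for term in ordered)
    pattern = re.compile(r"\b(?:" + alternation + r")\b", flags)
    return [(m.start(), m.end(), m.group(0)) for m in pattern.finditer(text)]

# The three terms entered in the Division dictionary above.
DIVISION_TERMS = ["Software", "Global Business Services", "Global Technology Services"]

# Hypothetical sample sentence, invented for illustration.
sample = "Revenues from Software were up; Global Business Services grew as well."
division_spans = dictionary_matches(sample, DIVISION_TERMS)

for start, end, matched in division_spans:
    print(start, end, matched)
```

Each result carries character offsets, which is the kind of information a tool needs in order to highlight matches in the input text.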

Create a second dictionary of metric names

As in the previous step, create a dictionary called Metric with the single term revenue. Select Ignore case and Lemma Match, then click Save.

Creating a dictionary of metrics
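
The sketch below illustrates why the two options matter for the Metric dictionary: with case ignored and words reduced to a base form, the single term revenue can also match "Revenue" and "revenues". The `naive_lemma` helper is a deliberately crude stand-in for a real lemmatizer and is an assumption of this example; how the editor itself computes lemmas is not described in this tutorial.

```python
import re

def naive_lemma(token):
    """Crude stand-in for a lemmatizer: lowercase, then strip a plural 's'."""
    token = token.lower()
    if len(token) > 3 and token.endswith("s") and not token.endswith("ss"):
        return token[:-1]
    return token

def lemma_matches(text, terms):
    """Return (start, end, word) for words whose rough lemma is a dictionary term."""
    lemmas = {naive_lemma(term) for term in terms}
    return [
        (m.start(), m.end(), m.group(0))
        for m in re.finditer(r"[A-Za-z]+", text)
        if naive_lemma(m.group(0)) in lemmas
    ]

# Hypothetical sample text, invented for illustration.
metric_spans = lemma_matches("Revenue rose. Software revenues grew.", ["revenue"])
print(metric_spans)
```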

Create a third dictionary of prepositions

Create a dictionary called Preposition with the terms for and from. Select Ignore case. Click Save.

Creating a dictionary of prepositions

Create a sequence for "division revenue"

Create a sequence that identifies text such as "Software revenues". Under Generation, drag and drop Sequence onto the canvas. Connect its input to the outputs of the Division and Metric nodes. Open the sequence, rename it to RevenueOfDivision1, and enter (<Division.Division>)<Token>{0,2}(<Metric.Metric>) under Sequence Pattern. Click Save. Run the sequence to see the results.

Creating a sequence

Running a sequence
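
One way to read the pattern (<Division.Division>)<Token>{0,2}(<Metric.Metric>) is "a Division match, then between zero and two tokens, then a Metric match". The Python sketch below implements that reading under stated assumptions: the tokenizer (words and single punctuation marks), the helper names, and the sample text are all invented for this example, and the editor's actual token and matching rules may differ.

```python
import re

def tokenize(text):
    """Split text into word and punctuation tokens, keeping character offsets."""
    return [m.span() for m in re.finditer(r"\w+|[^\w\s]", text)]

def tokens_between(tokens, left_end, right_start):
    """Count tokens that lie entirely between two character offsets."""
    return sum(1 for start, end in tokens if start >= left_end and end <= right_start)

def sequence(text, left_spans, right_spans, min_gap, max_gap):
    """Join each left span with any right span that follows within the token gap."""
    tokens = tokenize(text)
    results = []
    for l_start, l_end in left_spans:
        for r_start, r_end in right_spans:
            if r_start < l_end:
                continue  # the right-hand match must come after the left-hand one
            if min_gap <= tokens_between(tokens, l_end, r_start) <= max_gap:
                results.append(text[l_start:r_end])
    return results

# Hypothetical sample text, invented for illustration.
text = "Software revenues were strong. Revenue from Software grew."
division = [m.span() for m in re.finditer(r"Software", text)]
metric = [m.span() for m in re.finditer(r"revenues?", text, re.IGNORECASE)]

revenue_of_division_1 = sequence(text, division, metric, min_gap=0, max_gap=2)
print(revenue_of_division_1)
```

Only the first "Software" pairs with a metric here: the second metric mention is four tokens away from it, and the second "Software" has no metric after it.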

Create a sequence for "revenue from a division"

Create another sequence called RevenueOfDivision2 to identify text such as "revenues from Software". Connect its input to the outputs of the Metric, Preposition, and Division nodes. Set the Sequence Pattern to (<Metric.Metric>)<Token>{0,1}(<Preposition.Preposition>)<Token>{0,2}(<Division.Division>). Note: the order in which you connect the inputs of the sequence determines the initial sequence pattern filled in by default.

Click Save and Run.

Creating a second sequence

Running the sequence

Create a union

Under Generation, drag Union onto the canvas. Connect its inputs to the outputs of RevenueOfDivision1 and RevenueOfDivision2. Rename the union to RevenueOfDivision. Click Close and Run. You will see 6 results: one result from RevenueOfDivision1 and five results from RevenueOfDivision2.

Creating a union

Running a union
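
Conceptually, a union pools the results of its inputs into one output, which is why the result count is the sum of the two sequences' counts. A minimal Python sketch, assuming results are represented as (start, end) character spans; the spans below are made-up placeholders, not offsets from 4Q2006.txt, and whether the editor removes duplicate spans is not stated in this tutorial (this sketch keeps them).

```python
def union(*span_lists):
    """Pool spans from every input and return them in document order."""
    merged = [span for spans in span_lists for span in spans]
    return sorted(merged)

# Placeholder spans, invented for illustration.
revenue_of_division_1 = [(120, 137)]
revenue_of_division_2 = [(40, 62), (300, 331)]

revenue_of_division = union(revenue_of_division_1, revenue_of_division_2)
print(len(revenue_of_division), revenue_of_division)
```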

Create a regular expression to capture currency amounts

Under Extractors, drag RegEx onto the canvas. Name it Amount and specify the regular expression as \d+(\.\d+)?\s+billion. Click Save, then Run. The regular expression captures mentions of currency amounts expressed in billions, such as "4.8 billion".

Creating a regular expression

Running a regular expression
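
The expression can be tried outside the editor as well. The sketch below runs it with Python's `re` module, on the assumption that the editor's regex dialect agrees with Python's for this simple pattern; the sample sentence and its figures are invented for illustration.

```python
import re

# Digits, an optional decimal part, whitespace, then the word "billion".
AMOUNT = re.compile(r"\d+(\.\d+)?\s+billion")

# Hypothetical sample sentence, invented for illustration.
sample = "Revenues were $4.8 billion, up from 4 billion a year ago."

# finditer is used instead of findall because the pattern contains a capturing
# group, and findall would return only that group's text.
amounts = [m.group(0) for m in AMOUNT.finditer(sample)]
print(amounts)
```

Note that the currency symbol is not part of the match, and amounts given in other units (for example "million") are not captured by this expression.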

Create a sequence to combine the division, metric and amount

Create a sequence called RevenueByDivision and specify the pattern as (<RevenueOfDivision.RevenueOfDivision>)<Token>{0,35}(<Amount.Amount>). Click Save.

Combining division, metric and amount
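
The final pattern pairs each RevenueOfDivision match with an Amount that follows within 35 tokens, yielding the division-and-metric mention together with its amount. A rough Python sketch of that pairing follows; the helper `pair_within`, the tokenization rule, and the sample sentence with its figure are assumptions made for this example rather than the editor's behavior or data from 4Q2006.txt.

```python
import re

TOKEN = re.compile(r"\w+|[^\w\s]")  # words and single punctuation marks

def pair_within(text, left_spans, right_spans, max_tokens):
    """Pair each left span with right spans that follow within max_tokens tokens."""
    pairs = []
    for l_start, l_end in left_spans:
        for r_start, r_end in right_spans:
            if r_start < l_end:
                continue  # the amount must come after the mention
            gap = len(TOKEN.findall(text[l_end:r_start]))
            if gap <= max_tokens:
                pairs.append((text[l_start:l_end], text[r_start:r_end]))
    return pairs

# Hypothetical sample sentence, invented for illustration.
text = "Revenues from Software were 5.6 billion dollars."
mentions = [m.span() for m in re.finditer(r"Revenues from Software", text)]
amounts = [m.span() for m in re.finditer(r"\d+(\.\d+)?\s+billion", text)]

revenue_by_division = pair_within(text, mentions, amounts, max_tokens=35)
print(revenue_by_division)
```

A generous gap such as 35 tokens tolerates long intervening phrases, at the cost of occasionally pairing a mention with an amount that belongs to a different clause.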