From 5cd28fb0eaa137314f11401da382b13e99273928 Mon Sep 17 00:00:00 2001
From: Gabe Fierro
Date: Tue, 27 Aug 2024 14:41:48 -0600
Subject: [PATCH 1/9] add explanation of point label parsing

---
 docs/_toc.yml                            |   1 +
 docs/explanations/point-label-parsing.md | 143 +++++++++++++++++++++++
 2 files changed, 144 insertions(+)
 create mode 100644 docs/explanations/point-label-parsing.md

diff --git a/docs/_toc.yml b/docs/_toc.yml
index 7410c2569..6b81cfa82 100644
--- a/docs/_toc.yml
+++ b/docs/_toc.yml
@@ -27,6 +27,7 @@ parts:
       - file: explanations/templates.md
       - file: explanations/shapes-and-templates.md
       - file: explanations/shacl_to_sparql.md
+      - file: explanations/point-label-parsing.md
   - caption: Appendix
     chapters:
       - file: bibliography.md
diff --git a/docs/explanations/point-label-parsing.md b/docs/explanations/point-label-parsing.md
new file mode 100644
index 000000000..442f8e4de
--- /dev/null
+++ b/docs/explanations/point-label-parsing.md
@@ -0,0 +1,143 @@
+# Point Label Parsing
+
+One common source of building metadata are the "point labels" used in building management systems to label or tag the input/output points with some human-readable description.
+It is often useful to extract structured information from these labels to help with constructing a semantic model of the building.
+
+BuildingMOTIF provides a framework for defining point label naming conventions and parsing them into structured data.
+The output of this process is a set of typed `Token`s which can be input into a "Semantic Graph Synthesis" process to generate a semantic model of the building.
+
+This article describes the framework for defining point label parsing rules and provides examples of how to use it.
+
+```{admonition} Semantic Graph Synthesis
+This feature is coming soon! This label parsing framework is just part of the larger BuildingMOTIF toolkit for generating semantic models of buildings.
+```
+
+## Background on Parser Combinators
+
+The point label parsing framework in BuildingMOTIF is based on the concept of "parser combinators".
+Parser combinators are a way of defining parsers by combining smaller parsers together.
+In BuildingMOTIF, the "combinators" are defined as Python functions which take a string as input and return a list of `Token`s.
+These combinators can be combined together to create more complex parsers.
+
+Here is a short example:
+
+```python
+def parse_ahu_label(label: str) -> List[TokenResult]:
+    return sequence(
+        string("AHU", Constant(BRICK.Air_Handling_Unit)),
+        string("-", Delimiter),
+        regex(r"\d+", Identifier)
+    )(label)
+```
+
+This defines a parser which matches strings like "AHU-1" or "AHU-237" and returns a list of `Token`s.
+The `sequence` combinator combines the three parsers together, and the `string` and `regex` combinators match specific strings or regular expressions.
+
+Using parser combinators in this way allows you to define complex parsing rules in a concise and readable way.
+
+The example output of the `parse_ahu_label` function might look like this:
+
+```python
+parse_ahu_label("AHU-1")
+# [TokenResult(value='AHU', token=Constant(value=rdflib.term.URIRef('https://brickschema.org/schema/Brick#Air_Handling_Unit')), length=3, error=None, id=None),
+# TokenResult(value='-', token=Delimiter(value='-'), length=1, error=None, id=None),
+# TokenResult(value='1', token=Identifier(value='1'), length=1, error=None, id=None)]
+
+parse_ahu_label("AH-1")
+# [TokenResult(value=None, token=Null(value=None), length=0, error='Expected AHU, got AH-', id=None)]
+```
+
+## BuildingMOTIF Parser Combinators
+
+The `buildingmotif.label_parsing.combinators` module provides a set of parser combinators for defining point label parsing rules.
+Here are some of the most commonly used combinators:
+
+- `string`: Matches a specific string and returns a `Token` with a constant value.
+- `regex`: Matches a regular expression and returns a `Token` with the matched value.
+- `choice`: Matches one of a list of parsers. Uses the first one that matches.
+- `sequence`: Matches a sequence of parsers and returns a list of `Token`s.
+- `constant`: Returns a `Token` with a constant value. Does not consume any input.
+- `many`: Matches zero or more occurrences of a parser.
+- `maybe`: Matches zero or one occurrence of a parser.
+- `until`: Matches a parser until another parser is matched.
+
+
+### Defining New Combinators
+
+These are all just Python functions, so you can define your own combinators as needed.
+
+```python
+delimiters = regex(r"[._:/\- ]", Delimiter)
+identifier = regex(r"[a-zA-Z0-9]+", Identifier)
+named_equip = sequence(equip_abbreviations, maybe(delimiters), identifier)
+named_point = sequence(point_abbreviations, maybe(delimiters), identifier)
+```
+
+More generally, a combinator is any function which takes a string as input and returns a list of `TokenResult`s.
+The methods above (`regex`, `sequence`, `delimiters`) are functions which *return* a combinator as an argument.
+
+### Abbreviations
+
+Abbreviations are a common feature of point labels.
+Strings like "AHU" for "Air Handling Unit" or "VAV" for "Variable Air Volume" are often used to save space on labels.
+You can use the `abbreviations` combinator to define a set of abbreviations and automatically map each one that appears in the input string to the Brick class it stands for.
+
+We can define a dictionary of abbreviations like this:
+
+```python
+my_abbreviations = {
+    "AHU": BRICK.Air_Handling_Unit,
+    "FCU": BRICK.Fan_Coil_Unit,
+    "VAV": BRICK.Variable_Air_Volume_Box,
+    "CRAC": BRICK.Computer_Room_Air_Conditioner,
+    "HX": BRICK.Heat_Exchanger,
+    "PMP": BRICK.Pump,
+    "RVAV": BRICK.Variable_Air_Volume_Box_With_Reheat,
+    "HP": BRICK.Heat_Pump,
+    "RTU": BRICK.Rooftop_Unit,
+    "DMP": BRICK.Damper,
+    "STS": BRICK.Status,
+    "VLV": BRICK.Valve,
+    "CHVLV": BRICK.Chilled_Water_Valve,
+    "HWVLV": BRICK.Hot_Water_Valve,
+    "VFD": BRICK.Variable_Frequency_Drive,
+    "CT": BRICK.Cooling_Tower,
+    "MAU": BRICK.Makeup_Air_Unit,
+    "R": BRICK.Room,
+}
+
+my_abbreviations_parser = abbreviations(my_abbreviations)
+```
+
+Then we can use `my_abbreviations_parser` in our label parsing rules to automatically expand abbreviations.
+Note how the key of the `my_abbreviations` dictionary is the abbreviation and the value is the RDF Brick class that the abbreviation expands to.
+
+To expand our earlier example to work for other abbreviations, we can rewrite the parser like this:
+
+```python
+def parse_label(label: str) -> List[TokenResult]:
+    return sequence(
+        my_abbreviations_parser,
+        string("-", Delimiter),
+        regex(r"\d+", Identifier)
+    )(label)
+
+parse_label("AHU-1")
+# [TokenResult(value='AHU', token=Constant(value=rdflib.term.URIRef('https://brickschema.org/schema/Brick#Air_Handling_Unit')), length=3, error=None, id=None),
+# TokenResult(value='-', token=Delimiter(value='-'), length=1, error=None, id=None),
+# TokenResult(value='1', token=Identifier(value='1'), length=1, error=None, id=None)]
+
+parse_label("FCU-123")
+# [TokenResult(value='FCU', token=Constant(value=rdflib.term.URIRef('https://brickschema.org/schema/Brick#Fan_Coil_Unit')), length=3, error=None, id=None),
+# TokenResult(value='-', token=Delimiter(value='-'), length=1, error=None, id=None),
+# TokenResult(value='123', token=Identifier(value='123'), length=3, error=None, id=None)]
+
+parse_label("AH-3")
+# [TokenResult(value=None, token=Null(value=None), length=0, error='Expected
+# AHU, got AH- | Expected FCU, got AH- | Expected VAV, got AH- | Expected CRAC,
+# got AH-3 | Expected HX, got AH | Expected PMP, got AH- | Expected RVAV, got
+# AH-3 | Expected HP, got AH | Expected RTU, got AH- | Expected DMP, got AH- |
+# Expected STS, got AH- | Expected VLV, got AH- | Expected CHVLV, got AH-3 |
+# Expected HWVLV, got AH-3 | Expected VFD, got AH- | Expected CT, got AH |
+# Expected MAU, got AH- | Expected R, got A', id=None)]
+```

From 8b122eedb844ea5ac7bf47e38feb0ab3d45ea79f Mon Sep 17 00:00:00 2001
From: Gabe Fierro
Date: Tue, 27 Aug 2024 14:44:57 -0600
Subject: [PATCH 2/9] add example from notebook

---
 docs/explanations/point-label-parsing.md | 59 ++++++++++++++++++++++++
 1 file changed, 59 insertions(+)

diff --git a/docs/explanations/point-label-parsing.md b/docs/explanations/point-label-parsing.md
index 442f8e4de..20900fd64 100644
--- a/docs/explanations/point-label-parsing.md
+++ b/docs/explanations/point-label-parsing.md
@@ -141,3 +141,62 @@ parse_label("AH-3")
 # Expected HWVLV, got AH-3 | Expected VFD, got AH- | Expected CT, got AH |
 # Expected MAU, got AH- | Expected R, got A', id=None)]
 ```
+
+## Example
+
+Consider these point labels:
+
+```
+:BuildingName_02:FCU503_ChwVlvPos
+:BuildingName_02:FCU510_EffOcc
+:BuildingName_02:FCU507_UnoccHtgSpt
+:BuildingName_02:FCU415_UnoccHtgSpt
+:BuildingName_01:FCU203_OccClgSpt
+:BuildingName_02:FCU529_UnoccHtgSpt
+:BuildingName_01:FCU243_EffOcc
+:BuildingName_01:FCU362_ChwVlvPos
+```
+
+We can define a set of parsing rules to extract structured data from these labels.
+This is essentially just an expression of the building point naming convention.
+
+```python
+equip_abbreviations = abbreviations(COMMON_EQUIP_ABBREVIATIONS_BRICK)
+# define our own for Points (specific to this building)
+point_abbreviations = abbreviations({
+    "ChwVlvPos": BRICK.Position_Sensor,
+    "HwVlvPos": BRICK.Position_Sensor,
+    "RoomTmp": BRICK.Air_Temperature_Sensor,
+    "Room_RH": BRICK.Relative_Humidity_Sensor,
+    "UnoccHtgSpt": BRICK.Unoccupied_Air_Temperature_Heating_Setpoint,
+    "OccHtgSpt": BRICK.Occupied_Air_Temperature_Heating_Setpoint,
+    "UnoccClgSpt": BRICK.Unoccupied_Air_Temperature_Cooling_Setpoint,
+    "OccClgSpt": BRICK.Occupied_Air_Temperature_Cooling_Setpoint,
+    "SaTmp": BRICK.Supply_Air_Temperature_Sensor,
+    "OccCmd": BRICK.Occupancy_Command,
+    "EffOcc": BRICK.Occupancy_Status,
+})
+
+def custom_parser(target):
+    return sequence(
+        string(":", Delimiter),
+        # regex until the underscore
+        constant(Constant(BRICK.Building)),
+        regex(r"[^_]+", Identifier),
+        string("_", Delimiter),
+        # number for AHU name
+        constant(Constant(BRICK.Air_Handling_Unit)),
+        regex(r"[0-9a-zA-Z]+", Identifier),
+        string(":", Delimiter),
+        # equipment types
+        equip_abbreviations,
+        # equipment ident
+        regex(r"[0-9a-zA-Z]+", Identifier),
+        string("_", Delimiter),
+        maybe(
+            sequence(regex(r"[A-Z]+[0-9]+", Identifier), string("_", Delimiter)),
+        ),
+        # point types
+        point_abbreviations,
+    )(target)
+```

From bc9ca4d6c13bdd05da01aca767823fd3a693126f Mon Sep 17 00:00:00 2001
From: Gabe Fierro
Date: Tue, 27 Aug 2024 14:46:19 -0600
Subject: [PATCH 3/9] add note on error handling

---
 docs/explanations/point-label-parsing.md | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/docs/explanations/point-label-parsing.md b/docs/explanations/point-label-parsing.md
index 20900fd64..67e6ea0fe 100644
--- a/docs/explanations/point-label-parsing.md
+++ b/docs/explanations/point-label-parsing.md
@@ -142,6 +142,14 @@ parse_label("AH-3")
 # Expected MAU, got AH- | Expected R, got A', id=None)]
 ```
 
+### Error Handling
+
+The parser combinators in BuildingMOTIF provide detailed error messages when a parsing rule fails.
+This can be useful for debugging and understanding why a particular label did not match the expected format.
+The error messages include information about what was expected and what was found in the input string.
+
+If any `TokenResult` in the list has an `error` field that is not `None`, it means that the parsing rule failed at that point.
+
 ## Example
 
 Consider these point labels:

From 88367c8672a3230b1ce064e66aa2d0599c0a5597 Mon Sep 17 00:00:00 2001
From: Gabe Fierro
Date: Tue, 27 Aug 2024 15:50:05 -0600
Subject: [PATCH 4/9] add links to a few classes

---
 docs/explanations/point-label-parsing.md | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/docs/explanations/point-label-parsing.md b/docs/explanations/point-label-parsing.md
index 67e6ea0fe..776095e14 100644
--- a/docs/explanations/point-label-parsing.md
+++ b/docs/explanations/point-label-parsing.md
@@ -4,7 +4,7 @@ One common source of building metadata are the "point labels" used in building m
 It is often useful to extract structured information from these labels to help with constructing a semantic model of the building.
 
 BuildingMOTIF provides a framework for defining point label naming conventions and parsing them into structured data.
-The output of this process is a set of typed `Token`s which can be input into a "Semantic Graph Synthesis" process to generate a semantic model of the building.
+The output of this process is a set of typed Token objects which can be input into a "Semantic Graph Synthesis" process to generate a semantic model of the building.
 
 This article describes the framework for defining point label parsing rules and provides examples of how to use it.
 
@@ -16,9 +16,10 @@ This feature is coming soon! This label parsing framework is just part of the la
 
 The point label parsing framework in BuildingMOTIF is based on the concept of "parser combinators".
 Parser combinators are a way of defining parsers by combining smaller parsers together.
-In BuildingMOTIF, the "combinators" are defined as Python functions which take a string as input and return a list of `Token`s.
+In BuildingMOTIF, the "combinators" are defined as Python functions which take a string as input and return a list of TokenResults.
 These combinators can be combined together to create more complex parsers.
+
 
 Here is a short example:
 
 ```python

From 6facf6045c9a149517728566af902bae4f436bb6 Mon Sep 17 00:00:00 2001
From: Gabe Fierro
Date: Wed, 11 Sep 2024 13:38:08 -0600
Subject: [PATCH 5/9] Update docs/explanations/point-label-parsing.md

Co-authored-by: Matt Steen
---
 docs/explanations/point-label-parsing.md | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/docs/explanations/point-label-parsing.md b/docs/explanations/point-label-parsing.md
index 776095e14..3eebc2a06 100644
--- a/docs/explanations/point-label-parsing.md
+++ b/docs/explanations/point-label-parsing.md
@@ -1,12 +1,12 @@
 # Point Label Parsing
 
-One common source of building metadata are the "point labels" used in building management systems to label or tag the input/output points with some human-readable description.
+The purpose of this explanation is to describe the framework for defining point label parsing rules and provide examples of how to use it.
+
+One common source of building metadata is the "point labels" used in building management systems to label or tag the input and output data points with some human-readable description.
 It is often useful to extract structured information from these labels to help with constructing a semantic model of the building.
 
 BuildingMOTIF provides a framework for defining point label naming conventions and parsing them into structured data.
-The output of this process is a set of typed Token objects which can be input into a "Semantic Graph Synthesis" process to generate a semantic model of the building.
-
-This article describes the framework for defining point label parsing rules and provides examples of how to use it.
+The output of this process is a set of typed Token objects that can be input into a "Semantic Graph Synthesis" process to generate a semantic model of the building.
 
 ```{admonition} Semantic Graph Synthesis
 This feature is coming soon! This label parsing framework is just part of the larger BuildingMOTIF toolkit for generating semantic models of buildings.

From 12b0d7076ba544708901b88f1a13e11495ef544e Mon Sep 17 00:00:00 2001
From: Gabe Fierro
Date: Wed, 11 Sep 2024 13:38:26 -0600
Subject: [PATCH 6/9] Update docs/explanations/point-label-parsing.md

Co-authored-by: Matt Steen
---
 docs/explanations/point-label-parsing.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/docs/explanations/point-label-parsing.md b/docs/explanations/point-label-parsing.md
index 3eebc2a06..494cd55ad 100644
--- a/docs/explanations/point-label-parsing.md
+++ b/docs/explanations/point-label-parsing.md
@@ -12,11 +12,11 @@ The output of this process is a set of typed TokenResults.
+In BuildingMOTIF, the "combinators" are defined as Python functions that take a string as input and return a list of TokenResults.
 These combinators can be combined together to create more complex parsers.

From 5e04360e58e8791b9acf83c85c817f087a423291 Mon Sep 17 00:00:00 2001
From: Gabe Fierro
Date: Wed, 11 Sep 2024 13:38:52 -0600
Subject: [PATCH 7/9] Update docs/explanations/point-label-parsing.md

Co-authored-by: Matt Steen
---
 docs/explanations/point-label-parsing.md | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/docs/explanations/point-label-parsing.md b/docs/explanations/point-label-parsing.md
index 494cd55ad..ac62e2980 100644
--- a/docs/explanations/point-label-parsing.md
+++ b/docs/explanations/point-label-parsing.md
@@ -31,9 +31,8 @@ def parse_ahu_label(label: str) -> List[TokenResult]:
     )(label)
 ```
 
-This defines a parser which matches strings like "AHU-1" or "AHU-237" and returns a list of `Token`s.
+This defines a parser that matches strings like "AHU-1" or "AHU-237" and returns a list of `Token`s.
 The `sequence` combinator combines the three parsers together, and the `string` and `regex` combinators match specific strings or regular expressions.
-
 Using parser combinators in this way allows you to define complex parsing rules in a concise and readable way.
 
 The example output of the `parse_ahu_label` function might look like this:

From 847f168751ecabf1ed6aa238c2a9149d0fa79807 Mon Sep 17 00:00:00 2001
From: Gabe Fierro
Date: Wed, 11 Sep 2024 13:38:57 -0600
Subject: [PATCH 8/9] Update docs/explanations/point-label-parsing.md

Co-authored-by: Matt Steen
---
 docs/explanations/point-label-parsing.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/explanations/point-label-parsing.md b/docs/explanations/point-label-parsing.md
index ac62e2980..7ba991ca2 100644
--- a/docs/explanations/point-label-parsing.md
+++ b/docs/explanations/point-label-parsing.md
@@ -47,7 +47,7 @@ parse_ahu_label("AH-1")
 # [TokenResult(value=None, token=Null(value=None), length=0, error='Expected AHU, got AH-', id=None)]
 ```
 
-## BuildingMOTIF Parser Combinators
+## Parser Combinators
 
 The `buildingmotif.label_parsing.combinators` module provides a set of parser combinators for defining point label parsing rules.
 Here are some of the most commonly used combinators:

From 94aafb09d8447d859a4f3810fccec93d2f73e0ba Mon Sep 17 00:00:00 2001
From: Gabe Fierro
Date: Wed, 11 Sep 2024 13:39:17 -0600
Subject: [PATCH 9/9] Update docs/explanations/point-label-parsing.md

Co-authored-by: Matt Steen
---
 docs/explanations/point-label-parsing.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/docs/explanations/point-label-parsing.md b/docs/explanations/point-label-parsing.md
index 7ba991ca2..5e969f9b0 100644
--- a/docs/explanations/point-label-parsing.md
+++ b/docs/explanations/point-label-parsing.md
@@ -73,8 +73,8 @@ named_equip = sequence(equip_abbreviations, maybe(delimiters), identifier)
 named_point = sequence(point_abbreviations, maybe(delimiters), identifier)
 ```
 
-More generally, a combinator is any function which takes a string as input and returns a list of `TokenResult`s.
-The methods above (`regex`, `sequence`, `delimiters`) are functions which *return* a combinator as an argument.
+More generally, a combinator is any function that takes a string as input and returns a list of `TokenResult`s.
+The functions used above (`regex`, `sequence`, `maybe`) are functions that *return* a combinator; `delimiters` and `identifier` are combinators built this way.
 
 ### Abbreviations
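The combinator model described in this series can be illustrated outside BuildingMOTIF. What follows is a minimal, self-contained sketch in plain Python, not BuildingMOTIF's implementation. The names `string`, `regex`, `constant`, `sequence`, `choice`, `maybe`, and `abbreviations` only mirror the documented combinators for readability, and `TokenResult` is simplified (no `id`). Brick classes are stood in for by plain strings rather than `rdflib` URIs. The helper `failed`, the tables `EQUIP` and `POINTS`, the error-message format, and the longest-first ordering in `abbreviations` are all assumptions of this sketch. It shows small parsers composing into a `parse_label`-style parser and a `custom_parser`-style parser, and a non-`None` `error` signalling failure.

```python
# Illustrative sketch only; this is not BuildingMOTIF's implementation.
import re
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

@dataclass
class TokenResult:
    value: Optional[str]         # matched text; None for failures and constants
    token: str                   # token type, or a Brick class name as a plain string
    length: int                  # number of input characters consumed
    error: Optional[str] = None  # set only when a parser failed

Parser = Callable[[str], List[TokenResult]]

def failed(results: List[TokenResult]) -> bool:
    return any(r.error is not None for r in results)

def string(s: str, token: str) -> Parser:
    def parse(text: str) -> List[TokenResult]:
        if text.startswith(s):
            return [TokenResult(s, token, len(s))]
        return [TokenResult(None, "Null", 0, f"Expected {s}, got {text[:len(s)]}")]
    return parse

def regex(pattern: str, token: str) -> Parser:
    compiled = re.compile(pattern)
    def parse(text: str) -> List[TokenResult]:
        m = compiled.match(text)  # anchored at the start of the remaining input
        if m:
            return [TokenResult(m.group(), token, len(m.group()))]
        return [TokenResult(None, "Null", 0, f"Expected /{pattern}/, got {text!r}")]
    return parse

def constant(token: str) -> Parser:
    return lambda text: [TokenResult(None, token, 0)]  # consumes no input

def sequence(*parsers: Parser) -> Parser:
    def parse(text: str) -> List[TokenResult]:
        results: List[TokenResult] = []
        pos = 0
        for p in parsers:
            out = p(text[pos:])
            results.extend(out)
            if failed(out):
                return results  # stop at the first failing step
            pos += sum(r.length for r in out)
        return results
    return parse

def choice(*parsers: Parser) -> Parser:
    def parse(text: str) -> List[TokenResult]:
        errors = []
        for p in parsers:
            out = p(text)
            if not failed(out):
                return out  # the first alternative that succeeds wins
            errors += [r.error for r in out if r.error]
        return [TokenResult(None, "Null", 0, " | ".join(errors))]
    return parse

def maybe(parser: Parser) -> Parser:
    def parse(text: str) -> List[TokenResult]:
        out = parser(text)
        return [] if failed(out) else out  # a failure becomes "zero occurrences"
    return parse

def abbreviations(table: Dict[str, str]) -> Parser:
    # Longest keys first, so that e.g. "R" cannot shadow "RVAV" (no backtracking here).
    keys = sorted(table, key=len, reverse=True)
    return choice(*(string(k, table[k]) for k in keys))

# Hypothetical abbreviation tables (a small subset of the ones in the docs page).
EQUIP = {"AHU": "Air_Handling_Unit", "FCU": "Fan_Coil_Unit", "VAV": "Variable_Air_Volume_Box"}
POINTS = {"UnoccHtgSpt": "Unoccupied_Air_Temperature_Heating_Setpoint",
          "OccClgSpt": "Occupied_Air_Temperature_Cooling_Setpoint",
          "ChwVlvPos": "Position_Sensor", "EffOcc": "Occupancy_Status"}

parse_label = sequence(abbreviations(EQUIP), string("-", "Delimiter"), regex(r"\d+", "Identifier"))

# Same shape as custom_parser, for labels like ":BuildingName_02:FCU503_ChwVlvPos".
parse_bms_label = sequence(
    string(":", "Delimiter"), constant("Building"), regex(r"[^_]+", "Identifier"),
    string("_", "Delimiter"), constant("Air_Handling_Unit"), regex(r"[0-9a-zA-Z]+", "Identifier"),
    string(":", "Delimiter"), abbreviations(EQUIP), regex(r"[0-9a-zA-Z]+", "Identifier"),
    string("_", "Delimiter"),
    maybe(sequence(regex(r"[A-Z]+[0-9]+", "Identifier"), string("_", "Delimiter"))),
    abbreviations(POINTS),
)

print([(r.value, r.token) for r in parse_label("AHU-1")])
# [('AHU', 'Air_Handling_Unit'), ('-', 'Delimiter'), ('1', 'Identifier')]
print(parse_label("AH-3")[0].error)
# Expected AHU, got AH- | Expected FCU, got AH- | Expected VAV, got AH-
```

Because this sketch's `sequence` never backtracks, trying longer abbreviations first keeps a short key such as `R` from consuming the start of `RVAV`. The sketch also does not check that the whole label was consumed; a production parser may want to enforce that.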