string-value creating sub-pipelines (#1)
⚠️ CONTAINS BREAKING CHANGE ⚠️

All commands that create textual content (`SET-TEXT-CONTENT`,
`ADD-TEXT-CONTENT`, `ADD-COMMENT`, `SET-ATTR`) now accept a sub-pipeline
in place of a pre-defined string value. The sub-pipeline reads the
string value from somewhere else, e.g. from a (different) attribute of a
(different) element, or from an element's text content.

- The first step of such a sub-pipeline always selects the element to
  read the value from:
  - `USE-ELEMENT`: take the target element and run value extraction on
    that
  - `USE-PARENT`: take the parent of the target element and run value
    extraction on that
  - `QUERY-ELEMENT`: run a CSS selector query on the target element and
    run value extraction on all matches
  - `QUERY-PARENT`: run a CSS selector query on the parent of the target
    element and run value extraction on all matches
  - `QUERY-ROOT`: run a CSS selector query on the root of the target
    element and run value extraction on all matches

- The second step of such a sub-pipeline always defines which value to
  read from the selected element (attribute or text content):
  - `GET-ATTR`: read an attribute of the selected element
  - `GET-TEXT-CONTENT`: read the
    [textContent](https://developer.mozilla.org/en-US/docs/Web/API/Node/textContent)
    of the selected element

- Future additions:
  - value-manipulating steps will be possible (up-casing, down-casing,
    regex-based replace)
  - _maybe_: a "quote"-like command to read text content from a file
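
The two steps described above (select an element, then read a value
from it) can be sketched roughly as follows. This is only an
illustrative model and not the crate's implementation: the `Element`
arena, `SelectStep`, `ValueStep` and `read_value` are hypothetical
names, and only `USE-ELEMENT` and `USE-PARENT` are modelled, since the
`QUERY-*` variants would need the CSS selector engine.

```rust
use std::collections::BTreeMap;

/// Toy element stored in a flat arena (hypothetical, not the crate's node type).
/// `parent` is an index into the same arena.
struct Element {
    parent: Option<usize>,
    attributes: BTreeMap<String, String>,
    text_content: String,
}

/// First step: which element to read from.
/// Only two of the five selecting commands are modelled here.
enum SelectStep {
    UseElement,
    UseParent,
}

/// Second step: which value to read from the selected element.
enum ValueStep<'a> {
    GetAttr(&'a str),
    GetTextContent,
}

/// Runs both steps for the element at index `target`.
/// Returns `None` if the index is invalid, the element has no parent,
/// or the requested attribute does not exist.
fn read_value(
    arena: &[Element],
    target: usize,
    select: &SelectStep,
    value: &ValueStep<'_>,
) -> Option<String> {
    // Step 1: pick the element to read from.
    let selected = match select {
        SelectStep::UseElement => target,
        SelectStep::UseParent => arena.get(target)?.parent?,
    };
    let element = arena.get(selected)?;

    // Step 2: extract the string value from that element.
    match value {
        ValueStep::GetAttr(name) => element.attributes.get(*name).cloned(),
        ValueStep::GetTextContent => Some(element.text_content.clone()),
    }
}
```

In this model, a pipeline such as `USE-PARENT | GET-ATTR{lang}` would
correspond to calling `read_value` with `SelectStep::UseParent` and
`ValueStep::GetAttr("lang")`; the exact pipeline syntax for these
commands is not shown in this commit message, so treat that mapping as
an assumption.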

BREAKING CHANGES:
- Renamed `ONLY` to `EXTRACT-ELEMENT` (`ONLY` is kept as an alias; the
  previous alias `SELECT` is removed)
- Renamed `WITHOUT` to `REMOVE-ELEMENT` (`WITHOUT` is kept as an alias;
  the previous alias `FILTER` is removed)
- Renamed the `FOR-EACH` alias `FOR` to `WITH`
- Renamed `READ-FROM` to `LOAD-FILE`
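
For anyone migrating existing pipelines, the renames listed above
amount to a small lookup table. A minimal sketch, assuming a
hypothetical helper `migrate_command_name` (it is not part of the
crate); the removed aliases `SELECT` and `FILTER` are taken from the
README diff in this commit:

```rust
/// Maps a command name that stopped working with this commit to a name
/// that is valid afterwards. Names that are still accepted (including
/// `ONLY` and `WITHOUT`, which survive as aliases) are returned unchanged.
/// Hypothetical helper for illustration only.
fn migrate_command_name(old: &str) -> &str {
    match old {
        // previous alias of ONLY, removed
        "SELECT" => "EXTRACT-ELEMENT",
        // previous alias of WITHOUT, removed
        "FILTER" => "REMOVE-ELEMENT",
        // alias of FOR-EACH, renamed
        "FOR" => "WITH",
        // command renamed
        "READ-FROM" => "LOAD-FILE",
        // everything else keeps its name
        other => other,
    }
}
```

The helper only rewrites single command names; rewriting a whole
pipeline string would additionally need the pipeline grammar.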
kelko authored Oct 12, 2022
1 parent d5c22ba commit 911cbe9
Showing 31 changed files with 2,422 additions and 927 deletions.
6 changes: 3 additions & 3 deletions Cargo.toml
@@ -1,6 +1,6 @@
[package]
name = "html-streaming-editor"
version = "0.4.2"
version = "0.5.0"
edition = "2021"
authors = [":kelko: <kelko@me.com>"]
repository = "https://github.com/kelko/html-streaming-editor"
@@ -15,11 +15,11 @@ keywords = ["html"]

[dependencies]
peg = "0.8.0"
tl = "0.7.6"
tl = "0.7.7"
snafu = { version = "0.7", features = ["backtraces"] }
clap = { version = "4.0.9", features = ["derive"] }
exitcode = "1.1.2"
log = "0.4"
pretty_env_logger = "0.4.0"
rctree = "0.4.0"
rctree = "0.5.0"
html-escape = "0.2.11"
39 changes: 32 additions & 7 deletions README.md
@@ -32,14 +32,24 @@ Some `COMMAND` use sub-pipelines. There are two kind of `COMMANDS` with this:
The `SELECTOR` is a [CSS selector](https://developer.mozilla.org/en-US/docs/Web/CSS/CSS_Selectors).

Pipeline Types
-----------------

There are three types of pipelines:

- element processing pipeline: The default. You have some input HTML which you run through the pipeline
- element creating sub-pipeline: special sub-pipeline wherever a command adds one or more elements into the HTML tree (or into a different place of said tree)
- string value creating sub-pipeline: special sub-pipeline wherever a command sets a string value (text content, comment, attribute value)


Commands
-------------

Currently supported:

- `ONLY`: remove everything not matching the CSS selector (alias: `SELECT`)
- `WITHOUT`: remove everything matching the CSS selector (alias: `FILTER`)
- `FOR-EACH`: run a sub-pipeline on all sub-elements matching a CSS selector but return the previously selected elements (alias: `FOR`)
- `EXTRACT-ELEMENT`: remove everything not matching the CSS selector (alias: `ONLY`)
- `REMOVE-ELEMENT`: remove everything matching the CSS selector (alias: `WITHOUT`)
- `FOR-EACH`: run a sub-pipeline on all sub-elements matching a CSS selector but return the previously selected elements (alias: `WITH`)
- `CLEAR-ATTR`: removes a given attribute from the previously selected elements
- `CLEAR-CONTENT`: clears all children from the previously selected elements
- `SET-ATTR`: Sets a given attribute to a specified value
@@ -49,7 +59,22 @@ Currently supported:
- `ADD-ELEMENT`: appends a new tag/element child
- `REPLACE`: replace all elements matching a CSS selector with new elements (alias: `MAP`)
- `CREATE-ELEMENT`: creates a new, empty element, mainly in combination with `ADD-ELEMENT` or `REPLACE` (alias: `NEW`)
- `READ-FROM`: reads a DOM from a different file, mainly in combination with `ADD-ELEMENT` or `REPLACE` (alias: `SOURCE`)
- `LOAD-FILE`: reads a DOM from a different file, mainly in combination with `ADD-ELEMENT` or `REPLACE` (alias: `SOURCE`)
- `QUERY-REPLACED`: returns children matching the CSS selector of those elements meant to be replaced, only in combination with `REPLACE` (alias: `KEEP`)
- `USE-ELEMENT`: returns the currently selected element for a sub-pipeline, mainly in combination with "string value producing pipelines" (alias: `THIS`)
- `USE-PARENT`: returns the parent of the currently selected element for a sub-pipeline, mainly in combination with "string value producing pipelines" (alias: `PARENT`)
- `QUERY-ELEMENT`: runs a query on the currently selected element for a sub-pipeline, without detaching target element from HTML tree unlike `EXTRACT-ELEMENT`
- `QUERY-PARENT`: runs a query on the parent of the currently selected element for a sub-pipeline, without detaching target element from HTML tree unlike `EXTRACT-ELEMENT`
- `QUERY-ROOT`: runs a query on the root of the currently selected element for a sub-pipeline, without detaching target element from HTML tree unlike `EXTRACT-ELEMENT`
- `GET-ATTR`: returns the value of an attribute of the currently selected element for string-value producing pipelines
- `GET-TEXT-CONTENT`: returns the text content of the currently selected element for string-value producing pipelines

Not Yet implemented:

- `TO-LOWER`: all-lower the current string value of the pipeline
- `TO-UPPER`: all-caps the current string value of the pipeline
- `REGEX-REPLACE`: runs a RegEx-based value replacement on the current string value of the pipeline


Binary
-------
@@ -81,11 +106,11 @@ hse -i index.html 'ONLY{main .content}'
hse -i index.html 'ONLY{main, .main} | WITHOUT{script}'

# replaces all elements with `placeholder` class with the <div class="content"> from a second HTML file
hse -i index.html 'REPLACE{.placeholder ↤ READ-FROM{"other.html"} | ONLY{div.content} }'
hse -i index.html 'REPLACE{.placeholder ↤ SOURCE{"other.html"} | ONLY{div.content} }'

# add a new <meta name="version" value=""> element to <head> with git version info
hse -i index.html "FOR{head ↦ ADD-ELEMENT{ CREATE-ELEMENT{meta} | SET-ATTR{name ↤ 'version'} | SET-ATTR{content ↤ '`git describe --tags`'} } }"
hse -i index.html "WITH{head ↦ ADD-ELEMENT{ NEW{meta} | SET-ATTR{name ↤ 'version'} | SET-ATTR{content ↤ '`git describe --tags`'} } }"

# add a new comment to <body> with git version info
hse -i index.html "FOR{body ↦ ADD-COMMENT{'`git describe --tags`'}}"
hse -i index.html "WITH{body ↦ ADD-COMMENT{'`git describe --tags`'}}"
```
9 changes: 1 addition & 8 deletions src/css/mod.rs
@@ -258,6 +258,7 @@ impl<'a> CssSelectorStep<'a> {
pub struct CssSelectorPath<'a>(Vec<CssSelectorStep<'a>>);

impl<'a> CssSelectorPath<'a> {
    #[cfg(test)]
    pub fn single(step: CssSelector<'a>) -> Self {
        CssSelectorPath(vec![CssSelectorStep::start(step)])
    }
@@ -268,10 +269,6 @@ impl<'a> CssSelectorPath<'a> {
        CssSelectorPath(list)
    }

    pub fn as_vec(&self) -> Vec<CssSelectorStep<'a>> {
        return self.0.clone();
    }

    pub(crate) fn query(
        &self,
        start: &Vec<rctree::Node<HtmlContent>>,
@@ -351,10 +348,6 @@ impl<'a> CssSelectorList<'a> {
        CssSelectorList(content)
    }

    pub fn as_vec(&self) -> Vec<CssSelectorPath<'a>> {
        return self.0.clone();
    }

    pub(crate) fn query(
        &self,
        start: &Vec<rctree::Node<HtmlContent>>,
215 changes: 215 additions & 0 deletions src/element_creating/command.rs
@@ -0,0 +1,215 @@
use crate::html::HtmlTag;
use crate::{load_html_file, CommandError, CssSelectorList, HtmlContent};
use log::trace;

#[derive(Debug, PartialEq, Clone)]
pub enum ElementCreatingCommand<'a> {
    /// creates an HTML element of given type
    /// Returns the created element as result.
    CreateElement(&'a str),
    /// reads a different file into memory
    /// Returns the content of that file as result.
    FromFile(&'a str),
    /// Starting at the element being replaced run a sub-query
    /// Returns all sub-elements that match the given CSS selector.
    FromReplaced(CssSelectorList<'a>),
}

impl<'a> ElementCreatingCommand<'a> {
    /// perform the action defined by the command on the set of nodes
    /// and return the calculated results.
    /// For some command the output can be equal to the input,
    /// others change the result-set
    pub(crate) fn execute(
        &self,
        input: &Vec<rctree::Node<HtmlContent>>,
    ) -> Result<Vec<rctree::Node<HtmlContent>>, CommandError> {
        match self {
            ElementCreatingCommand::CreateElement(element_name) => {
                Self::create_element(element_name)
            }
            ElementCreatingCommand::FromFile(file_path) => Self::load_file(file_path),
            ElementCreatingCommand::FromReplaced(selector) => Self::query_replaced(input, selector),
        }
    }

    fn create_element(name: &str) -> Result<Vec<rctree::Node<HtmlContent>>, CommandError> {
        trace!("Running CREATE-ELEMENT command using name: {:#?}", name);

        Ok(vec![rctree::Node::new(HtmlContent::Tag(HtmlTag::of_name(
            name.clone(),
        )))])
    }

    fn load_file(file_path: &str) -> Result<Vec<rctree::Node<HtmlContent>>, CommandError> {
        trace!("Running LOAD-FILE command using file: {:#?}", file_path);

        let root_element = load_html_file(file_path)?;
        Ok(vec![root_element.make_deep_copy()])
    }

    fn query_replaced(
        input: &Vec<rctree::Node<HtmlContent>>,
        selector: &CssSelectorList<'a>,
    ) -> Result<Vec<rctree::Node<HtmlContent>>, CommandError> {
        trace!("Running QUERY-REPLACED command");
        Ok(selector
            .query(input)
            .iter()
            .map(|e| rctree::Node::clone(e).make_deep_copy())
            .collect::<Vec<_>>())
    }
}

#[cfg(test)]
mod tests {
    use crate::element_creating::ElementCreatingCommand;
    use crate::html::HtmlTag;
    use crate::{
        load_inline_html, CssSelector, CssSelectorList, CssSelectorPath, HtmlContent,
        HtmlRenderable,
    };
    use std::collections::BTreeMap;

    #[test]
    fn create_element_builds_new_element_on_empty_input() {
        let command = ElementCreatingCommand::CreateElement("div");

        let mut result = command.execute(&vec![]).unwrap();

        assert_eq!(result.len(), 1);

        let first_result = result.pop().unwrap();
        let first_result = first_result.borrow();
        assert_eq!(*first_result, HtmlContent::Tag(HtmlTag::of_name("div")));
    }

    #[test]
    fn create_element_builds_new_element_ignoring_input() {
        let command = ElementCreatingCommand::CreateElement("div");

        let root = rctree::Node::new(HtmlContent::Tag(HtmlTag::of_name("html")));

        let mut result = command.execute(&vec![root]).unwrap();

        assert_eq!(result.len(), 1);

        let first_result = result.pop().unwrap();
        let first_result = first_result.borrow();
        assert_eq!(*first_result, HtmlContent::Tag(HtmlTag::of_name("div")));
    }

    #[test]
    fn load_file_read_file_content() {
        let command = ElementCreatingCommand::FromFile("tests/source.html");
        let mut result = command.execute(&vec![]).unwrap();

        assert_eq!(result.len(), 1);

        let first_result = result.pop().unwrap();
        assert_eq!(
            first_result.outer_html(),
            r#"<html lang="en">
<head>
<meta charset="UTF-8">
<title>LOAD-FILE Source</title>
</head>
<body>
<div>Some other stuff</div>
<ul id="first">
<li>1</li>
<li>2</li>
<li>3</li>
</ul>
<ul id="second">
<li>a</li>
<li><!-- Some Comment -->b</li>
<li><em class="intense">c</em></li>
</ul>
<!-- not taken into account -->
</body>
</html>"#
        );
    }

    #[test]
    fn query_replaced_returns_matching_descendent_of_input() {
        let command = ElementCreatingCommand::FromReplaced(CssSelectorList::new(vec![
            CssSelectorPath::single(CssSelector::for_class("test-source")),
        ]));
        let root = load_inline_html(
            r#"<div id="replaced"><p class="first"></p><aside class="test-source"></aside></div>"#,
        );

        let mut result = command.execute(&vec![root]).unwrap();

        assert_eq!(result.len(), 1);

        let first_result = result.pop().unwrap();
        let first_result = first_result.borrow();
        assert_eq!(
            *first_result,
            HtmlContent::Tag(HtmlTag {
                name: String::from("aside"),
                attributes: BTreeMap::<String, String>::from([(
                    String::from("class"),
                    String::from("test-source")
                )])
            })
        );
    }

    #[test]
    fn query_replaced_returns_all_matching_descendents_of_input() {
        let command = ElementCreatingCommand::FromReplaced(CssSelectorList::new(vec![
            CssSelectorPath::single(CssSelector::for_class("test-source")),
        ]));
        let root = load_inline_html(
            r#"<div id="replaced">
<p class="first">
<em class="test-source">Content 1</em>
</p>
<aside class="test-source">Content 2</aside>
<div>
<div></div>
<div><img src="" class="test-source"></div>
<div></div>
</div>
</div>"#,
        );

        let result = command.execute(&vec![root]).unwrap();
        let result = result.iter().map(|n| n.outer_html()).collect::<Vec<_>>();

        assert_eq!(result.len(), 3);
        assert!(result.contains(&String::from(r#"<em class="test-source">Content 1</em>"#)));
        assert!(result.contains(&String::from(
            r#"<aside class="test-source">Content 2</aside>"#
        )));
        assert!(result.contains(&String::from(r#"<img class="test-source" src="">"#)));
    }

    #[test]
    fn query_replaced_returns_empty_on_no_match() {
        let command = ElementCreatingCommand::FromReplaced(CssSelectorList::new(vec![
            CssSelectorPath::single(CssSelector::for_class("test-source")),
        ]));
        let root =
            load_inline_html(r#"<div id="replaced"><p class="first"></p><aside></aside></div>"#);

        let result = command.execute(&vec![root]).unwrap();

        assert_eq!(result.len(), 0);
    }

    #[test]
    fn query_replaced_returns_empty_on_empty_input() {
        let command = ElementCreatingCommand::FromReplaced(CssSelectorList::new(vec![
            CssSelectorPath::single(CssSelector::for_class("test-source")),
        ]));

        let result = command.execute(&vec![]).unwrap();

        assert_eq!(result.len(), 0);
    }
}
5 changes: 5 additions & 0 deletions src/element_creating/mod.rs
@@ -0,0 +1,5 @@
mod command;
mod pipeline;

pub(crate) use command::ElementCreatingCommand;
pub(crate) use pipeline::ElementCreatingPipeline;