Is it possible to use this for indentation-sensitive languages? #8
-
If yes, can you share an example of what that would look like?
Replies: 3 comments
-
Lady Deirdre currently does not support indentation-based grammars out of the box. One possible workaround is to preprocess the file manually, replacing each indentation change with some "indent"/"dedent" symbols. This will impact performance, but probably not too critically for files of average size. I have considered implementing this feature, but haven't had enough time yet. It would help if you could share concrete examples of grammars that you want to parse. Various questions arise, for instance: is the grammar purely indentation-based, or do curly-brace blocks also exist?
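To illustrate the idea, here is a purely schematic before/after sketch of such a preprocessing pass; the ⟨INDENT⟩/⟨DEDENT⟩ marker words and the toy snippet are arbitrary placeholders, not anything prescribed by Lady Deirdre:

```rust
fn main() {
    // Schematic before/after of the preprocessing idea: every increase in
    // leading whitespace becomes an explicit "indent" marker, and every
    // decrease becomes a "dedent" marker, so the parser no longer has to
    // track indentation levels itself.
    let before = "def greet\n\tprint 'hi'\nprint 'done'";
    let after = "def greet\n⟨INDENT⟩print 'hi'\n⟨DEDENT⟩print 'done'";

    println!("{before}\n--- becomes ---\n{after}");
}
```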
-
The language in question is Imba, so it's purely indentation-based and has no curly braces.
-
@haikyuu I see. The workaround I suggested above would look as follows.

First, in the lexical grammar of your language, you can introduce special tokens: "LineStart", "Indent", and "Dedent".

```rust
#[derive(Token)]
enum MyLangToken {
    #[rule('\n' & ' '*)] // The beginning whitespaces of the line.
    LineStart,

    #[rule("SECRET_INDENT")]
    Indent,

    #[rule("SECRET_DEDENT")]
    Dedent,

    // The rest of the lexis.
}
```

The "SECRET_INDENT"/"SECRET_DEDENT" words should be special words (maybe some unique Unicode characters) that typically never appear in the end-user's source code.
Next, you introduce an intermediary function that preprocesses raw edits by replacing line-beginning whitespace with the "SECRET_INDENT"/"SECRET_DEDENT" words, depending on the current indentation.

```rust
fn process_and_write(
    final_doc: &mut Document<MyLangNode>, // MyLangNode depends on MyLangToken.
    span: impl ToSpan, // The span of the source code edited by the end user.
    edit: &str, // The text that the user edited.
) {
    // Stores the raw content of the documents before preprocessing.
    static RAW_TEXT: Lazy<Table<Document<VoidSyntax<MyLangToken>>>> = Lazy::new(|| Table::new());

    let mut raw_text = RAW_TEXT.entry(final_doc.id()).or_default();
    let mut post_processed = String::new();

    raw_text.write(span, edit);

    // Start indent processing.
    let mut current = 0;

    for chunk in raw_text.chunks(..) {
        if chunk.token != MyLangToken::LineStart {
            // The token is not a line indentation: copy its content as is.
            post_processed.push_str(chunk.string);
            continue;
        };

        // Otherwise, replace the whitespace at the beginning of the line
        // with the Indent or Dedent tokens.
        if chunk.length > current {
            post_processed.push_str("SECRET_INDENT");
        }

        if chunk.length < current {
            post_processed.push_str("SECRET_DEDENT");
        }

        current = chunk.length;
    }

    // Copy the post-processed content into the final document.
    final_doc.write(.., &post_processed);
}
```
```rust
// Instead of `my_doc.write(10..20, "foo")` you would call the preprocessor:
process_and_write(&mut my_doc, 10..20, "foo");
```

This workaround is not ideal in many ways, but as a temporary solution, it should work fine, at least for files of typical size. Let me know if you are interested in having built-in functionality for this. I will consider introducing a more proper API to process indentations.
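One rough edge worth spelling out: the loop above pushes a single "SECRET_DEDENT" even when the indentation drops by several levels at once. A possible refinement, sketched under the assumption of a fixed indentation width and given the number of leading spaces of the previous and current lines; the `INDENT_WIDTH` constant and the `emit_markers` helper are hypothetical, not part of the code above:

```rust
// Hypothetical helper: emit one marker per level of indentation change,
// assuming a fixed indentation width (e.g. four spaces per level).
const INDENT_WIDTH: usize = 4;

fn emit_markers(previous_spaces: usize, current_spaces: usize, out: &mut String) {
    let old_level = previous_spaces / INDENT_WIDTH;
    let new_level = current_spaces / INDENT_WIDTH;

    // The indentation grew: open one block per additional level.
    for _ in old_level..new_level {
        out.push_str("SECRET_INDENT");
    }

    // The indentation shrank: close one block per removed level.
    for _ in new_level..old_level {
        out.push_str("SECRET_DEDENT");
    }
}
```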