Auto-highlighting specific words in paginated text #238
-
Attempting to build functionality off of the testApp where the content of the preloaded pages is analyzed against a dictionary of known words, and unknown words are automatically highlighted (ideally in the context of language learning). First of all, how might I access that text and then parse through it? How could I then add highlights at the locations of unknown words? If I can accomplish these things, I would also like to create a custom overlay that is displayed when a word is tapped. Any ideas for how to do that? Apologies, I'm a CS student, so I'm pretty new to iOS dev, but I have spent a lot of time trying to make sense of the codebase and documentation.
-
Pretty interesting use case!

You're in luck, we just merged something that will help with that. It's not yet released but available in develop. Take a look at the user guide for the Content iteration.

This will extract full paragraphs, but you can split them by words using a tokenizer. The default one handles splitting by words:

```swift
// Bail out if the publication provides no content to iterate.
guard let content = publication.content() else {
    return
}

// Build a tokenizer that splits text content elements into words,
// using the default text tokenizer for each language.
let wordTokenizer = makeTextContentTokenizer(
    defaultLanguage: publication.metadata.language,
    textTokenizerFactory: { language in
        makeDefaultTextTokenizer(unit: .word, language: language)
    }
)

// Collect every word segment (text + locator) from all content elements.
let words: [TextContentElement.Segment] = try content
    .elements()
    .flatMap { try wordTokenizer($0) }
    .compactMap { $0 as? TextContentElement }
    .flatMap { $0.segments }

for word in words {
    print("Word '\(word.text)' found at \(word.locator)")
}
```
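The step the question asks about, checking each extracted word against a dictionary of known words, can be sketched in plain Swift without Readium. This is a minimal sketch under a few assumptions: `knownWords` is a hypothetical, lowercased learner dictionary, and splitting on non-letter characters is a simple stand-in for Readium's default word tokenizer, which handles languages far better. The returned ranges show where each unknown word sits in the original string.

```swift
/// A word found in a text, with its range so it can be highlighted later.
struct FoundWord {
    let text: String
    let range: Range<String.Index>
}

/// Splits `text` on non-letter characters and returns the words whose
/// lowercased form is missing from `knownWords` (assumed lowercased).
func unknownWords(in text: String, knownWords: Set<String>) -> [FoundWord] {
    text
        .split(whereSeparator: { !$0.isLetter })
        .filter { !knownWords.contains($0.lowercased()) }
        // Substring indices are shared with the base string, so the range
        // can be used directly on `text`.
        .map { FoundWord(text: String($0), range: $0.startIndex..<$0.endIndex) }
}

let sample = "The cat sat on the enormous mat."
let known: Set<String> = ["the", "cat", "sat", "on", "mat"]
for word in unknownWords(in: sample, knownWords: known) {
    print("Unknown word: \(word.text)")
}
// prints "Unknown word: enormous"
```

In the Readium flow above, the same check would apply to each segment's `word.text` before building decorations, so only unknown words get highlighted.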
I'm not sure how well this would perform if you need to highlight every word in each resource, though; probably slowly.

You can use the Decorator API to highlight the words and to be notified when the user taps on a highlighted word. The associated event has a …

```swift
// Highlight every word found above in a "words" decoration group.
// Each ID encodes the word's index, so a tapped decoration can be mapped back to its word.
(navigator as? DecorableNavigator)?.apply(
    decorations: words.enumerated().map { (index, word) in
        Decoration(
            id: "word-\(index)",
            locator: word.locator,
            style: .highlight(tint: .red, isActive: false)
        )
    },
    in: "words"
)
```
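For the custom overlay, the snippet above gives each decoration an ID of the form `word-<index>`, so a tap handler mostly needs to turn that ID back into a word. A minimal sketch in plain Swift, assuming the tap event exposes the tapped decoration's `id` and that you keep the word strings in the same order used to create the decorations. `WordPopup`, `definitions`, and both helper functions are hypothetical, and presenting the actual overlay view is not shown.

```swift
/// Builds the decoration ID used in the snippet above: "word-<index>".
func decorationID(forWordAt index: Int) -> String {
    "word-\(index)"
}

/// Recovers the word index from a decoration ID, or returns nil if the ID
/// doesn't follow the "word-<index>" scheme.
func wordIndex(fromDecorationID id: String) -> Int? {
    let prefix = "word-"
    guard id.hasPrefix(prefix),
          let index = Int(id.dropFirst(prefix.count)),
          index >= 0 else {
        return nil
    }
    return index
}

/// Hypothetical data the custom overlay would display for a tapped word.
struct WordPopup {
    let word: String
    let definition: String?
}

/// Maps a tapped decoration ID back to its word and looks up a definition.
/// `words` must be in the same order as when the decorations were created.
func popup(forDecorationID id: String, words: [String], definitions: [String: String]) -> WordPopup? {
    guard let index = wordIndex(fromDecorationID: id), words.indices.contains(index) else {
        return nil
    }
    let word = words[index]
    return WordPopup(word: word, definition: definitions[word.lowercased()])
}

if let info = popup(forDecorationID: "word-1", words: ["Le", "chat"], definitions: ["chat": "cat"]) {
    print("\(info.word): \(info.definition ?? "no definition")")
}
// prints "chat: cat"
```

Encoding the index in the ID keeps the decorations themselves lightweight; the word list (or a dictionary keyed by ID) stays in your own code.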
-
Sorry to bring this up again. I took a break, as I felt I lacked some background knowledge about databases, operating systems, etc., which I now have. I was thinking recently: is it possible that auto-highlighting is slow because of database transactions? IIRC, highlights are added to the SQLite database somewhere. Since there could be thousands of words on a page or group of pages, I would imagine it might be hard to commit all of those to disk at once? Just a guess. Maybe I could implement a different type of highlighting that lives only in memory, and then save a word to the database only when the user marks it?
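One thing worth checking first: the decoration snippet in the first reply passes decorations straight to the navigator with `apply(decorations:in:)` and does not write them to a database, so it may pay to measure where the time actually goes before attributing the slowdown to SQLite. If user highlights do go to SQLite in the app, the split described above could look like this minimal sketch: auto-detected highlights live only in a `Set`, and one write happens when the user marks a word. `InMemoryWordHighlights` and its `persist` hook are hypothetical; `persist` stands in for whatever storage layer the app uses.

```swift
/// Word highlights kept only in memory; nothing is written to storage
/// until the reader explicitly marks a word.
final class InMemoryWordHighlights {
    /// Decoration IDs currently shown for auto-detected unknown words.
    private(set) var autoHighlights: Set<String> = []

    /// Called once per word the user marks; a stand-in for a real database write.
    private let persist: (String) -> Void

    init(persist: @escaping (String) -> Void) {
        self.persist = persist
    }

    /// Replaces the auto highlights, e.g. after a page turn. No storage I/O here.
    func show(_ ids: [String]) {
        autoHighlights = Set(ids)
    }

    /// Persists a single highlighted word the user chose to keep.
    func markByUser(_ id: String) {
        guard autoHighlights.contains(id) else { return }
        persist(id)
    }
}

let store = InMemoryWordHighlights { id in print("Saving \(id)") }
store.show(["word-0", "word-1", "word-2"])
store.markByUser("word-1")
// prints "Saving word-1"
```

Thousands of auto highlights then cost only memory, while the database sees one write per word the user actually keeps.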