
Search is bad #40

Closed
unknown321 opened this issue Apr 19, 2025 · 4 comments

@unknown321

unknown321 commented Apr 19, 2025

output.mp4

As you can see, there is a significant delay between key presses and actual input. This happens because search works by parsing https://mgsvmoddingwiki.github.io/assets/js/searchindex.js, which is a 3.8M js file, containing... html.

Search picks up html keywords like inline or href.

Any chance for a better search experience?


Just to give some context about the implementation:

  • Firefox is known to handle the search library sluggishly for full-text search, which is noted on the wiki tips page. Chromium-based browsers handle it reasonably fast, as seen in the capture below (also tested in memory-starved Linux VMs and on a 10-year-old Windows system):
Chromium.capture.webm
  • The site uses Liquid templating because the pages are generated by Jekyll. To obtain the page text we're using page.content, which is read after the Markdown conversion stage, so it's already HTML. When I looked, there was no way around this in Liquid without custom plugins, which GitHub Pages doesn't support (it only allows a very limited number of 'approved' plugins). As a result the index includes superfluous HTML tags. I agree it's non-ideal, and it also bloats the index size.

    • For comparison, the 'virtual pages' from the Entity Reference sub-sections don't go through Jekyll's parsing, so I instead generated their full-text search index from the raw Markdown files, which is cleaner. A custom script checks for modified files within the virtual pages directory on every git commit and rebuilds the index as needed.
  • The reason for using full-text search is to match unique strings in user queries that may appear anywhere in an article. Tests were done with the search limited to just the first n characters, but that obviously missed strings that occur later in the article/documentation.

  • One or two other JS search libraries were tested at the time, but the one chosen had the best balance of features (and is widely used for static sites). The wiki's original default search library (before it was replaced) was unusably bad.
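
To illustrate the full-text point above, here is a minimal sketch of why indexing only the first n characters misses later strings. The helper names (`truncateForIndex`, `matchesQuery`), the sample article text and the token `exampleUniqueToken` are all made up for illustration; this is not the wiki's actual search code.

```typescript
// Hypothetical helper: keep only the first n characters of an article for indexing.
function truncateForIndex(text: string, n: number): string {
  return text.slice(0, n);
}

// Hypothetical helper: naive case-insensitive substring match against indexed text.
function matchesQuery(indexedText: string, query: string): boolean {
  return indexedText.toLowerCase().includes(query.toLowerCase());
}

// Made-up article whose unique string only appears near the end.
const sampleArticle: string =
  "Introduction and general notes about the file format. " +
  "Further down the page: exampleUniqueToken is documented here.";

// The full-text index finds the token; the truncated one does not.
const foundInFullText: boolean = matchesQuery(sampleArticle, "exampleUniqueToken");
const foundInTruncated: boolean = matchesQuery(
  truncateForIndex(sampleArticle, 40),
  "exampleUniqueToken",
);
```

The trade-off is index size: the truncated index is smaller, but any query for a string past the cut-off returns nothing.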


Btw if you'd like an issues/discussions tab it'd be best to ask Joey on Discord as they're in control of the repo.

Originally posted by @chocmake in #39 (comment)

@chocmake
Collaborator

Added a search input debounce, so a new autosuggestion query only runs once keystrokes have paused for longer than a time threshold.

In my tests this has improved responsiveness in Firefox.
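
For anyone unfamiliar with the technique, a debounce can be sketched roughly as below. This is a generic illustration and not the wiki's actual code: the `debounce` helper, the 200 ms threshold, the `#search` selector and `runSearch` in the usage comment are all assumptions.

```typescript
// Generic debounce: returns a wrapper that delays calling `fn` until
// `waitMs` milliseconds have passed without another call to the wrapper.
function debounce<Args extends unknown[]>(
  fn: (...args: Args) => void,
  waitMs: number,
): (...args: Args) => void {
  let timer: ReturnType<typeof setTimeout> | undefined;
  return (...args: Args): void => {
    // A new keystroke cancels the pending call, so only the latest input is used.
    if (timer !== undefined) {
      clearTimeout(timer);
    }
    timer = setTimeout(() => {
      timer = undefined;
      fn(...args);
    }, waitMs);
  };
}

// Hypothetical browser usage (the selector and runSearch are placeholders):
//
//   const input = document.querySelector<HTMLInputElement>("#search");
//   const debouncedSearch = debounce((q: string) => runSearch(q), 200);
//   input?.addEventListener("input", () => debouncedSearch(input.value));
```

With this in place, typing quickly triggers one query for the final text instead of one query per keystroke, which matters most when each query is expensive.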

@unknown321
Author

unknown321 commented Apr 19, 2025

Feels much better now, thanks.

@chocmake
Collaborator

chocmake commented Apr 26, 2025

With the new build process the main search index is reduced to 990KB raw / 300KB of actual bandwidth when gzipped by GitHub, as the build now parses the files directly and stores the Markdown, like the virtual pages index build was already doing.
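
As a rough idea of what storing the Markdown rather than the rendered HTML can look like, here is a minimal sketch. It is not the actual build script: the `IndexEntry` shape, the function names, and the assumption that each source file starts with a YAML front matter block are all hypothetical.

```typescript
// Hypothetical shape of one search index entry.
interface IndexEntry {
  url: string;
  title: string;
  text: string;
}

// Remove a leading YAML front matter block ("---" ... "---") if one is present.
function stripFrontMatter(raw: string): string {
  const match = raw.match(/^---\r?\n[\s\S]*?\r?\n---\r?\n?/);
  return match ? raw.slice(match[0].length) : raw;
}

// Build an entry from the raw Markdown source, collapsing runs of whitespace.
// No HTML is involved, so tag names and attributes never enter the index.
function buildIndexEntry(url: string, title: string, rawMarkdown: string): IndexEntry {
  const text = stripFrontMatter(rawMarkdown).replace(/\s+/g, " ").trim();
  return { url, title, text };
}

// Example with a made-up page:
const sampleEntry: IndexEntry = buildIndexEntry(
  "/example/",
  "Example",
  "---\ntitle: Example\n---\n# Heading\n\nSome *text*.\n",
);
// sampleEntry.text is "# Heading Some *text*."
```

Since the source text never passes through the HTML conversion, words like inline or href only show up in results when an article actually contains them.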

@unknown321
Author

very nice
