Exclude specific-version docs from search engines #8016

Merged: 1 commit, Aug 5, 2024


itaigilo
Contributor

@itaigilo itaigilo commented Jul 31, 2024

Closes #8008.

Change Description

To exclude our version-specific docs from search engines, this PR does two things:

  1. Disallows /v0.* and /v1.* in robots.txt.
  2. Adds <meta name="robots" content="noindex"> to the HTML files in all version folders at build time (that is, as part of the release actions).
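
The second step can be sketched roughly as follows. This is only an illustration of the idea, not the script used by the release actions in this PR: the helper names `add_noindex` and `add_noindex_to_tree` and the example path are assumptions made for the sketch.

```python
import re
from pathlib import Path

# The tag named in this PR's description.
NOINDEX_TAG = '<meta name="robots" content="noindex">'

# Matches an opening <head> tag (with or without attributes), but not <header>.
HEAD_OPEN = re.compile(r"<head(\s[^>]*)?>", re.IGNORECASE)


def add_noindex(html: str) -> str:
    """Insert the noindex tag right after the opening <head> tag.

    Pages that already contain the tag, or that have no <head>, are
    returned unchanged, so the function is safe to run repeatedly.
    """
    if NOINDEX_TAG in html:
        return html
    return HEAD_OPEN.sub(lambda m: m.group(0) + "\n" + NOINDEX_TAG, html, count=1)


def add_noindex_to_tree(version_dir: Path) -> int:
    """Rewrite every .html file under version_dir; return how many changed."""
    changed = 0
    for page in sorted(version_dir.rglob("*.html")):
        original = page.read_text(encoding="utf-8")
        updated = add_noindex(original)
        if updated != original:
            page.write_text(updated, encoding="utf-8")
            changed += 1
    return changed
```

A release step could then call something like `add_noindex_to_tree(Path("v1.31"))` on the freshly built version folder (hypothetical path) while leaving the "latest" docs untouched.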

Another PR adds <meta name="robots" content="noindex"> to all relevant existing html files in our docs-lakeFS repo.

Testing

This change affects only the next released lakeFS version, so once that version is released I'll validate that the <meta> tag is added to the HTML files under /v1.31/* (but not to the "latest" docs).
Once this PR is merged, I'll be able to validate that the <meta> tag is not added to the latest docs.
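
A check along these lines could be used for that validation. It is a minimal sketch using only the Python standard library; the names `RobotsMetaFinder` and `has_noindex` are hypothetical and not part of this PR.

```python
from html.parser import HTMLParser


class RobotsMetaFinder(HTMLParser):
    """Records whether a page carries <meta name="robots" content="...noindex...">."""

    def __init__(self) -> None:
        super().__init__()
        self.noindex = False

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}
        is_robots = attr.get("name", "").lower() == "robots"
        if is_robots and "noindex" in attr.get("content", "").lower():
            self.noindex = True


def has_noindex(html: str) -> bool:
    """Return True if the HTML text contains a robots noindex meta tag."""
    finder = RobotsMetaFinder()
    finder.feed(html)
    return finder.noindex
```

The expectation would be that `has_noindex` returns `True` for pages fetched from a version folder such as /v1.31/ and `False` for the corresponding "latest" pages.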

We'll also monitor the results with Yuval, our SEO expert.

@itaigilo itaigilo added the exclude-changelog, minor-change, docs-platform, and area/ci labels Jul 31, 2024
@itaigilo itaigilo self-assigned this Jul 31, 2024

github-actions bot commented Jul 31, 2024

♻️ PR Preview 228ae32 has been successfully destroyed since this PR has been closed.

🤖 By surge-preview


E2E Test Results - Quickstart

10 passed


E2E Test Results - DynamoDB Local - Local Block Adapter

13 passed

@itaigilo itaigilo removed their assignment Jul 31, 2024
@itaigilo itaigilo requested a review from a team July 31, 2024 09:47
@arielshaqed
Contributor

Not sure this is the way to do it. IIUC it will remove all old versions from search engines. But what if I want to search docs for "lakectl fs 1.11"?

IIUC we want to create canonical URLs for each piece of content, not remove them entirely.

@itaigilo
Contributor Author

itaigilo commented Aug 1, 2024

> Not sure this is the way to do it. IIUC it will remove all old versions from search engines. But what if I want to search docs for "lakectl fs 1.11"?
>
> IIUC we want to create canonical URLs for each piece of content, not remove them entirely.

This PR will indeed cause the removal of these previous versions from search results.

The current problem we're facing is that Google already considers the v0.52 pages the "canonical version" of our pages. According to the link you've posted:

> You can indicate your preference to Google using these techniques, but Google may choose a different page as canonical than you do, for various reasons. That is, indicating a canonical preference is a hint, not a rule.

And anyway, even if we implement such hints, we can't be sure Google will indeed treat the latest version as the canonical one.

It will indeed be trickier to google for specific lakeFS versions.
But the search results in such cases will lead to the relevant "latest" pages in our docs, and users can switch the page to the version they want. Plus, I think it's much more important to fix the wrong default results than the search-for-a-specific-version case.

[Looping @keren-lakeFS here.]

Contributor

@arielshaqed arielshaqed left a comment


SEO expert says deleting all previous versions from search engines is the way to go, so that is what we'll do.

THANKS for your patience with me here!

@itaigilo itaigilo merged commit d40b90b into master Aug 5, 2024
47 checks passed
@itaigilo itaigilo deleted the feature/exclude-version-docs-from-search-enigines branch August 5, 2024 08:17
Labels
area/ci
docs-platform: Issues to do with the docs platform itself
exclude-changelog: PR description should not be included in next release changelog
minor-change: Used for PRs that don't require issue attached
Development

Successfully merging this pull request may close these issues.

Docs: remove non-latest versions from search-engines
2 participants