Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Strict [skip-revcheck] #132

Closed
wants to merge 0 commits into from
Closed

Strict [skip-revcheck] #132

wants to merge 0 commits into from

Conversation

alfsb
Copy link
Member

@alfsb alfsb commented Jul 10, 2024

This is a tentative PR to fix the issues below. This PR changes doc-base's revcheck.php to:

  1. Strictly checks for [skip-revcheck] as the first text on the first line of a commit message;
  2. Strictly ignore a sequence of [skip-revcheck] commits.

Issue one

First observe the contents of commit php/doc-en@8f6fd5c and after his commit message, below:

    Add missing libxml related constants (#3388)
    * [skip-revcheck] Fix whitespace
    * Add missing libxml related constants

This commit should mark the translations as outdated or be skiped?

As now, the code searches for [skip-revcheck] in any part of the commit message. See:

if ( substr( $line , 0 , 4 ) == ' ' )
{
if ( stristr( $line, '[skip-revcheck]' ) !== false )
$skip = true;

Issue two

Observe these commit history:

That is, a sequence of commits marked with [skip-revcheck]. Should these files be marked as outdated each time there are two or more commits marked with [skip-revcheck] in sequence?

The current code always gets the commit -2 of history when the last commit is marked [skip-revcheck]. See:

$hashes = explode ( "\n" , `git log -2 --format=%H -- {$filename}` );

Possible another issue

In the case of https://github.com/php/doc-en/commits/765b2d6eec7dfbaeed900b32aa91a1360d73df42/reference/ds/ds.deque.xml , with the hashes and messages as:

4d17b7 [skip-revcheck] Convert class markup to be compatible with DocBook 5.2
6cecca [skip-revcheck] Normalize &Constants; and &Methods; usage (#2703)
b2a26b These should include the namesapce

What hash should appear in revcheck as the "target" of translations and diffs? The most recent (4d17b7) or the last non-skipped (b2a26b)?


The first issue caused about 40 files to not be marked outdated in all translations. The second issue is harder to calculate, as it involves analysing the commit history of each file, against each revtag.

If this PR gets approval, it will generate about a hundred of files to be marked as outdated in all translations, and will also desynchronize the revcheck script of doc-base and web-doc.

I plan to, after merging this, to change the lib version of this code in doc-base/tree/master/scripts/translation to reflect the new strict rules on lib code, and to open issues against code duplication on doc-base and web-doc. I would also resolve the issue against doc-base, but I probably will not tackle the web-doc issue as I do not know to test or deploy this infrastructure.

@alfsb alfsb mentioned this pull request Jul 10, 2024
@alfsb alfsb marked this pull request as ready for review July 24, 2024 22:10
@alfsb
Copy link
Member Author

alfsb commented Jul 24, 2024

PR ready for review and comments. As this changes a tool that is used by (all?) translations, this is a thing I would like to hear comments before merging.

I would especially like to hear from people with access to web-php revcheck.php to comment, and try to synchronize the change here and there.

@Girgias @nilgun

Copy link
Member

@Girgias Girgias left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for tackling this!

I got slightly frustrated with having multiple commits that skip the rev-checked actually marking the translation as outdated.

I wasn't aware of the bigger issue with it skipping squashed commits.

I haven't looked at the revcheck script in years, and I am busy so if my review is required it might take a bit longer, but I am happy to merge a PR which is made to the relevant doc repo to have them in sync.

@alfsb
Copy link
Member Author

alfsb commented Jul 27, 2024

I don't see a problem with waiting a little, and I'm glad that you can make the change to the web-doc/revcheck.php, and thus not have the problem of desynchronization.

As these changes will trigger revisions of dozens of files in each translation, I think it's important to ping the lists, right before or right after the merge.

With this PR merged, I will try to start the work of removing code duplication from revchecks, so that future changes do not depend on so much repo coordination/code duplication.

@alfsb
Copy link
Member Author

alfsb commented Aug 7, 2024

I'm now confident that this PR will only mark files incorrectly skipped (squashed commits that contain an [skip-revcheck]) and will flag files mismarked with a hash of a skipped commit.

This version also runs 10x faster on pt_BR translation, which is somewhat expected as this new code executes way less shell git commands. Before and after timing:

real 0m25,360s
user 0m21,151s
sys 0m4,974s

real 0m1,969s
user 0m1,918s
sys 0m0,329s

@nilgun
Copy link
Member

nilgun commented Aug 8, 2024

This is a tentative PR to fix the issues below. This PR changes doc-base's revcheck.php to:

1. Strictly checks for `[skip-revcheck]` as the first text on the first line of a commit message;

2. Strictly ignore a sequence of `[skip-revcheck]` commits.

Issue one

First observe the contents of commit php/doc-en@8f6fd5c and after his commit message, below:

    Add missing libxml related constants (#3388)
    * [skip-revcheck] Fix whitespace
    * Add missing libxml related constants

This commit should mark the translations as outdated or be skiped?

As now, the code searches for [skip-revcheck] in any part of the commit message. See:

if ( substr( $line , 0 , 4 ) == ' ' )
{
if ( stristr( $line, '[skip-revcheck]' ) !== false )
$skip = true;

This is a committer error. [skip-revcheck] is used incorrectly. This should be two separate commits.

Issue two

Observe these commit history:

* https://github.com/php/doc-en/commits/765b2d6eec7dfbaeed900b32aa91a1360d73df42/language/context/parameters.xml

* https://github.com/php/doc-en/commits/765b2d6eec7dfbaeed900b32aa91a1360d73df42/reference/ds/ds.deque.xml

That is, a sequence of commits marked with [skip-revcheck]. Should these files be marked as outdated each time there are two or more commits marked with [skip-revcheck] in sequence?

The current code always gets the commit -2 of history when the last commit is marked [skip-revcheck]. See:

$hashes = explode ( "\n" , `git log -2 --format=%H -- {$filename}` );

These have already been done in all languages' files. Otherwise configure.php would fail.

Possible another issue

In the case of https://github.com/php/doc-en/commits/765b2d6eec7dfbaeed900b32aa91a1360d73df42/reference/ds/ds.deque.xml , with the hashes and messages as:

4d17b7 [skip-revcheck] Convert class markup to be compatible with DocBook 5.2
6cecca [skip-revcheck] Normalize &Constants; and &Methods; usage (#2703)
b2a26b These should include the namesapce

What hash should appear in revcheck as the "target" of translations and diffs? The most recent (4d17b7) or the last non-skipped (b2a26b)?

The first issue caused about 40 files to not be marked outdated in all translations. The second issue is harder to calculate, as it involves analysing the commit history of each file, against each revtag.

If this PR gets approval, it will generate about a hundred of files to be marked as outdated in all translations, and will also desynchronize the revcheck script of doc-base and web-doc.

I plan to, after merging this, to change the lib version of this code in doc-base/tree/master/scripts/translation to reflect the new strict rules on lib code, and to open issues against code duplication on doc-base and web-doc. I would also resolve the issue against doc-base, but I probably will not tackle the web-doc issue as I do not know to test or deploy this infrastructure.

The first issue was already done in all languages' files, as far as I remember. That's why they are skip-revchecked. Otherwise configure.php would fail.

The second issue does not affect translations.

I don't understand why the third issue is objected.

--

I don't think any changes are needed. Committers should be trained better.

@alfsb
Copy link
Member Author

alfsb commented Aug 12, 2024

This is a committer error. [skip-revcheck] is used incorrectly. This should be two separate commits.

The original committer used separate commits. The issue above occurs when commits are squashed. This is a fix for squashed commits.

That is, a sequence of commits marked with [skip-revcheck]. Should these files be marked as outdated each time there are two or more commits marked with [skip-revcheck] in sequence?

These have already been done in all languages' files. Otherwise configure.php would fail.

configure.php does not fail when there are consecutive [skip-revcheck] commits. The issue is files being marked outdated when there are two or more [skip-revcheck] commits.

~/phpdoc$ ls -d */
doc-base/  en/  pt_BR/
~/phpdoc$ php doc-base/scripts/revcheck.php pt_BR > rev1.html
~/phpdoc$ cd en/
~/phpdoc/en$ echo " " >> language/context/parameters.xml 
~/phpdoc/en$ git commit -a -m "[skip-revcheck] One."
[master 39d8257a8f] [skip-revcheck] One.
 1 file changed, 1 insertion(+)
~/phpdoc/en$ echo "  " >> language/context/parameters.xml
~/phpdoc/en$ git commit -a -m "[skip-revcheck] Two."
[master ff12fe660d] [skip-revcheck] Two.
 1 file changed, 1 insertion(+)
~/phpdoc/en$ git log -3 --format=%H
ff12fe660d93eda71c146cf84508f6c2a3c613af
39d8257a8f69acdecdfd08d7c3fff4457564b895
be8baf89c0016c1c39ab0b85cafca28c8364e95b
~/phpdoc/en$ cd ..
~/phpdoc$ php doc-base/scripts/revcheck.php pt_BR > rev2.html
~/phpdoc$ cat rev1.html | grep "parameters.xml" | wc
      0       0       0
~/phpdoc$ cat rev2.html | grep "parameters.xml" | wc
      3      13     820

The first issue was already done in all languages' files, as far as I remember. That's why they are skip-revchecked. Otherwise configure.php would fail.

configure.php does not fail. Running php doc-base/configure.php --with-lang=pt_BR before or after the commands above, configure.php will complete sucefully.

@nilgun
Copy link
Member

nilgun commented Aug 19, 2024

These are DTD changes. It is not possible to apply them to doc-en and not to others.

When I look at it, I see that it is also applied to pt_BR:

php/doc-en@8bc832a

php/doc-pt_br@f59baa1

@alfsb
Copy link
Member Author

alfsb commented Aug 20, 2024

These are DTD changes. It is not possible to apply them to doc-en and not to others.
When I look at it, I see that it is also applied to pt_BR:

The link of first issue already shows a case not related with DTD changes: php/doc-en@8f6fd5c

Also, there are dozen of other cases in git history, showing that sequential commits occurs, outside coordinated efforts.

git log --format=%h  %s -- reference/radius/constants.xml
f27cfeeefc  Update examples using md5(...) to use hash('md5', ...) (#3619)
3dbacb703e  Add missing Radius constant IDs
6c946c61e9  [skip-revcheck] Fix whitespace
39729c23a8  [skip-revcheck] Update Radius constant ID (#3215)
86e6094e86  Use canonical type names

git log --format=%h  %s -- reference/curl/constants.xml
b8ea7cb9cf  Change wording from "Available since" to "Available as of"
5743eea6e4  Copy constants from curl_getinfo to dedicated constants page
72b9959ea6  [skip-revcheck] Fix whitespace
47d4ae82e6  [skip-revcheck] Update cURL constant IDs (#3224)
4a0b755422  [PHP 8.3] New cURL constants (#3089)

git log --format=%h  %s -- reference/sockets/constants.xml
cbf8c63630  Add socket constant descriptions (#3499)
10d68da45d  Add missing socket constants
c56da98841  [skip-revcheck] Fix whitespace
b9142963a5  [skip-revcheck] Update socket constant IDs (#3226)
cf5cf50fbc  [PHP 8.3] socket constants update. (#2948)

git log --format=%h  %s -- reference/filesystem/constants.xml

850ac483c8  XInclude GLOB constants in glob.xml
3675d8d7e4  Add GLOB_ERR and description of GLOB constants
1ecf4d6872  [skip-revcheck] Fix whitespace
5e9500ddad  [skip-revcheck] Syncronize predefined constants with stubs - part 1 (#2739)
aab33d6443  Removed references to PHP 5, PHP 7.0 (#544)
86e6094e86  Use canonical type names

git log --format=%h  %s -- reference/image/constants.xml
7f5d646f3e  Add IMG_CROP constants (#3504)
fe955bf734  Add missing image constants
8e875f1349  [skip-revcheck] Fix whitespace
5e9500ddad  [skip-revcheck] Syncronize predefined constants with stubs - part 1 (#2739)
9bef3400f8  Fix GH-1380: Add `IMAGETYPE_AVIF` constant (#2628)
928f051323  [PHP 8.1] Add new global GD constants (#1545)

git log --format=%h  %s -- reference/session/sessionhandler.xml
601f6f4ce5  Update ext/session role attributes (#1987)
2ab45105b4  Stop referring to `openssl_random_pseudo_bytes()` outside of ext/openssl (#1967)
6d1ca753fb  [skip-revcheck] Fix typos
4eb60e8132  [skip-revcheck] Generate ext/session class synopses from stubs
5fabd07880  Removed Changelog entries for PHP 5, PHP 7.0 (#543)
fcfd6ea1d2  Document SessionIdInterface

git log --format=%h  %s -- reference/sodium/sodiumexception.xml
09c49da6f0  Update Error, Exception, and Throwable role attributes (#2069)
8f28fbb43a  Revert "Revert "Generate ext/sodium class synopses from stubs""
6700a28f74  [skip-revcheck] Revert "Generate ext/sodium class synopses from stubs"
1b3769ae7d  [skip-revcheck] Generate ext/sodium class synopses from stubs
8469c26f01  SodiumException has no members

git log --format=%h  %s -- reference/fileinfo/constants.xml
4968cab7de  Add FILEINFO_APPLE constant
790b69f2e0  [skip-revcheck] Fix whitespace
5e9500ddad  [skip-revcheck] Syncronize predefined constants with stubs - part 1 (#2739)
aab33d6443  Removed references to PHP 5, PHP 7.0 (#544)

git log --format=%h  %s -- reference/gmp/constants.xml
18ad21afec  Add GMP_MPIR_VERSION constant
2ff877cdec  [skip-revcheck] Fix whitespace
5e9500ddad  [skip-revcheck] Syncronize predefined constants with stubs - part 1 (#2739)
86e6094e86  Use canonical type names

git log --format=%h  %s -- install/ini.xml
d228d0aa06  [skip-revcheck] Convert info constant tables to variablelists (#3342)
15841d1884  Link INI mode table
327111cc5d  [skip-revcheck] Fix whitespace
d4d5216e7a  [skip-revcheck] Replace PHP_INI_* with INI_* constants
4eeb07225f  Remove old install docs (#662)

git log --format=%h  %s -- language/types/type-juggling.xml
f94d903985  Fix markup issues in language section
183439d468  [skip-revcheck] Correct sentence grammar in type juggling page (#2942)
8870f9d1ce  [skip-revcheck] Remove double "are" (#2689)
4498e3ed86  Fix GH-2077, not just numeric strings can be implicitly converted to int in functions

git log --format=%h  %s -- language/predefined/variables/superglobals.xml
a6d209f4ff  [skip-revcheck] Use <refentry> tag with role attribute instead of custom XML tag
ec6e871a47  [skip-revcheck] Replace role="noversion" attribute usage with the new annotations one
fd6629b1ac  Remove duplicate "Superglobals"

git log --format=%h  %s -- reference/datetime/dateperiod/construct.xml
7d81260767  Document new DatePeriod::createFromISO8601String method in PHP 8.3
71692b6f4c  Add Date/Time exception documentation
02ff7fef5b  Update ext/date role attributes (#1972)
655c1b1b94  [skip-revcheck] Fix typo (offets → offsets)
27f6943df8  Fixed #1291: DatePeriod::__construct(string ) - ISO string does not accept time zone offsets
b7ac6fa547  [skip-revcheck] Fix typos
aab7fca69f  [skip-revcheck] Remove trailing whitespace
c249f3bc56  Incorporate useful information from notes (take #2)

git log --format=%h  %s -- reference/datetime/functions/date-parse.xml
50dcf55ce5  date_parse does not return false (#2395)
19e8122137  Remove references to "double" and use float instead (#1848)
20415705a3  Update date-parse.xml (#1832)
5c951013ca  Incorporate useful information from notes (take #3)
6f845b90b4  [skip-revcheck] Undo false typo
e7b5172917  [skip-revcheck] Typos
bbfdc364de  Swap DateTime and DateTimeImmutable documention to push people towards using DateTimeImmutable
```.

@nilgun
Copy link
Member

nilgun commented Aug 20, 2024

If your claim is true, the PHP documentation seems to be a mess in terms of translations. If previous commits have made things so confusing, I will not translate the PHP documentation from now on. Do what you want. It does not concern me anymore.

@alfsb
Copy link
Member Author

alfsb commented Aug 27, 2024

I haven't looked at the revcheck script in years, and I am busy so if my review is required it might take a bit longer, but I am happy to merge a PR which is made to the relevant doc repo to have them in sync.

I will try, in the following weeks, to make a PR to the other revcheck.php. There is a bunch of [skip-revcheck] commits I want to send, about entities and <literal> usage, that will bump a lot a files with consecutive [skip-revcheck], so it's better to resolve this issue first.

@jimwins
Copy link
Member

jimwins commented Oct 22, 2024

Now that the infrastructure for doc.php.net has been moved (nearly), I'd be happy to revisit merging or getting rid of the separate implementations of this between web-doc and doc-base. My preference would be to get rid of the version in web-doc entirely, and add a way for revcheck.php in doc-base to generate the SQLite data.

(On the web-doc side, we could also get rid of the image generation scripts and just generate them on the fly.)

@alfsb
Copy link
Member Author

alfsb commented Oct 22, 2024 via email

@jimwins
Copy link
Member

jimwins commented Oct 22, 2024

No, relying on code in doc-base from web-doc using a git submodule is a hard pass for me. We need to either generate static files (like we do for the manual) or use an intermediary like SQLite (like we do for the manual search) between the doc and web sides. It doesn't need to be the same SQLite schema as being used now (which is kind of awful?).

@alfsb
Copy link
Member Author

alfsb commented Oct 22, 2024

No, relying on code in doc-base from web-doc using a git submodule is a hard pass for me.

The inverse. Having code from doc-base into web-doc as a submodule.

 We need to either generate static files (like we do for the manual) or use an intermediary like SQLite (like we do for the manual search) between the doc and web sides.

It's possible do have doc-base/configure.php to run doc-base/scripts/translation/configure.php and generate some custom data on a know location.

So anytime the manual configure, the revcheck data can be made available as static data, near by the generated manual pages.

@jimwins
Copy link
Member

jimwins commented Oct 23, 2024

No, relying on code in doc-base from web-doc using a git submodule is a hard pass for me.

The inverse. Having code from doc-base into web-doc as a submodule.

We're saying the same thing. It would be a hard pass in the other direction, too.

We need to either generate static files (like we do for the manual) or use an intermediary like SQLite (like we do for the manual search) between the doc and web sides.

It's possible do have doc-base/configure.php to run doc-base/scripts/translation/configure.php and generate some custom data on a know location.

So anytime the manual configure, the revcheck data can be made available as static data, near by the generated manual pages.

Sure, that would also be fine, and probably makes the most sense in the long run instead of generating them independently. That way the translation status on doc.php.net would be basically in sync with the translations live on www.php.net.

(The scripts that drive this on the systems side are build-docs-rsync and build-docs-lang-rsync.)

@alfsb
Copy link
Member Author

alfsb commented Oct 26, 2024

Accidentally closed when cleaning up my fork, to first start working on code duplication of revcheck.php, in a bug compatible manner. I will wait the code de-duplication before tackling the original issue.

Re-post the opening as an separate issue?

@Girgias
Copy link
Member

Girgias commented Oct 27, 2024

Accidentally closed when cleaning up my fork, to first start working on code duplication of revcheck.php, in a bug compatible manner. I will wait the code de-duplication before tackling the original issue.

Re-post the opening as an separate issue?

I think so yes

@alfsb alfsb mentioned this pull request Oct 31, 2024
3 tasks
@alfsb
Copy link
Member Author

alfsb commented Nov 12, 2024

An fix for issue two (sequential [skip-revcheck] marks files outdated in translations) was proposed in PR 178. A solution for issue one (squashed commits doesn't mark files outdated) are in the works. It's simple in principle, but will take time to make it robust.

As the PRs opened until now cause little or no changes, I will prefer to merge each one separately, wait for if there is any problem, before opening an PR for the squashed commit issue, that will change a lot of file status, in all translations. This is a battle that I prefer to not mix with other modifications.

There already an issue opened on the squashed issue (133), so no further issues need to be opened.

TL:DR; The only missing PR is for squashed commits, all other issues have open PRs or are already merged.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants