Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Single source revcheck code/data #174

Merged
merged 19 commits into from
Nov 11, 2024
Merged

Single source revcheck code/data #174

merged 19 commits into from
Nov 11, 2024

Conversation

alfsb
Copy link
Member

@alfsb alfsb commented Oct 31, 2024

This PR is a work in progress to concentrate revcheck code and generation of revcheck data into a single place.

In doc-base (this PR):

  • Change revcheck lib code to generate similar data of actual revcheck scripts;
  • Change doc-base/scripts/revcheck.php to use revcheck lib;
  • Create doc-base/scripts/translations/genrevdb.php. (proposed)

In web-doc (other PRs)

  • New script or code to call genrevdb.php; (systens 54)
  • Rewrite HTML generating pages to use new db (web-doc 58);
  • Remove most/all code of web-doc/scripts/rev.php. (ditto)

This PR does not try to fix the issues reported on #133 or #132, for now. The priority is removing code duplication, and after that to possibly change revcheck semantics of [skip-revcheck] (and other possible issues)

Reviews are welcome. Cosmetic changes delayed in favor of resolving issues found.

Copy link
Member

@Girgias Girgias left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jimwins as you have been working on doc-web, mind keeping an eye on this?

scripts/translation/lib/all.php Outdated Show resolved Hide resolved
scripts/translation/lib/RevcheckRun.php Outdated Show resolved Hide resolved
@jimwins
Copy link
Member

jimwins commented Nov 4, 2024

@jimwins as you have been working on doc-web, mind keeping an eye on this?

Yes, I'm ready to take a look when it is past the draft stage, and move over the systems side of doc.php.net to stop running web-doc/scripts/rev.php and switch it over to doc-base/scripts/revcheck.php (or configure.php) with the necessary configuration to generate the SQLite database or whatever that web-docs/www/rev.php will read.

@alfsb
Copy link
Member Author

alfsb commented Nov 6, 2024

Yes, I'm ready to take a look when it is past the draft stage, and move over the systems side of doc.php.net to stop running web-doc/scripts/rev.php and switch it over to doc-base/scripts/revcheck.php

If switching over to doc-base/scripts/revcheck.php is possible, it could avoid the last step, of serializing data in a secure format, and directly use more code from this PR, that I rewrote using the popos instead arrays (to facilitate debug).

@alfsb alfsb marked this pull request as ready for review November 6, 2024 18:00
@alfsb
Copy link
Member Author

alfsb commented Nov 6, 2024

I finished the rewriting of doc-base/scripts/revcheck.php, with back ported counting and listing behaviors to match actual code, and so facilitate migration. After all revcheck code (or data) becoming single sourced, I will then try open separated GH issues for some cases I found while rewriting, including PR 132.

There are two cosmetic modifications that i would like to make, namely:

  • Placing changed lines in line in old/wip list, to better use the horizontal space
  • Add a copy button before the actual Change column, that copies to clipboard the full path/name of file.

This won't change any data or structure, but will make the generated HTML differ, and so I may wait for web-doc code sync or the merging of this PR, before making these changes.

@jimwins
Copy link
Member

jimwins commented Nov 6, 2024

Seems fine to me. What I'd like to see is a way to pass a list of languages to check to doc-base/scripts/revcheck.php, not just one, and an argument to output an SQLite database with the information as currently done by web-doc/scripts/rev.php (and not output any HTML, of course).

@alfsb
Copy link
Member Author

alfsb commented Nov 7, 2024

That would be a partial rewrite of web-doc/scripts/rev.php, to both eliminate the code duplication and to generate the same database output as currently implemented. The base data generating code can be as minimal as:

require_once "path/to/doc-base/scripts/translation/lib/all.php";
foreach( $langs as $lang ) {
    $revcheck = new RevcheckRun( $lang );
    generateSql( $revcheck->revData );
}

By the way, the database output appears to me to be overly complicated, using 8 tables (and some indexes) where is only really necessary 3 tables: languages, translations and files. I propose to write and submit a PR for an alternative web-doc/scripts/revdb.php that does the rewrite and simplification.

After this alternative is installed, running and generating a simplified db, it would only be necessary to modify the SELECTs on HTML generating pages to use the new db/tables, with no joining necessary. For this last part I ask someone else to volunteer, before starting this alternative revdb.php.

@jimwins
Copy link
Member

jimwins commented Nov 7, 2024

The script for generating the database should live in the doc-base repo because we will not do cross-repo code dependencies. I have no problem with simplifying the schema, and will handle updating the display code in web-doc.

@alfsb
Copy link
Member Author

alfsb commented Nov 7, 2024

So, instead of cross-repo code including, I can write an doc-base/scripts/translations/genrevdb.php to generate an Sqlite file from a language list.  That would be acceptable?

The proposed syntax would be: genrevdb.php [relative/path/file.db] [langdir1,...,langdirN]

Invoked from a directory above $LANG checkouts, that is afaik the same convention of doc-base/configure.php.

@jimwins
Copy link
Member

jimwins commented Nov 7, 2024

That sounds good. But the path to the SQLite database may be an absolute path, too.

@alfsb
Copy link
Member Author

alfsb commented Nov 7, 2024

Pushed with a new script named doc-base/scripts/translation/genrevdb.php. Tested in one repository, and in the following days I will test the new revcheck.php and genrevdb.php in all translations listed CI for automatic testing.

Reviews and tests are welcome. Runs in about 28s on 14y old CPU and SSD for pt_BR repo.

@jimwins
Copy link
Member

jimwins commented Nov 7, 2024

Haven't played around with the resulting database and might not have time to do today, but it looks like it works for me, at least.

It would be preferable to specify languages as individual arguments instead of one argument that you explode on ,.

One thing I did notice is that the intros in the languages table are wrapped in the <intro> tag.

$ php doc-base/scripts/translation/genrevdb.php out.sqlite de,es,fr,it,ja,pl,pt_br,ro,ru,tr,uk,zh
[2024-11-07 18:54] Creating revdata database out.sqlite for languages de,es,fr,it,ja,pl,pt_br,ro,ru,tr,uk,zh.
[2024-11-07 18:54] Language de run
[2024-11-07 18:55] Language de done (elapsed 19.00s)
[2024-11-07 18:55] Language es run
[2024-11-07 18:56] Language es done (elapsed 68.00s)
[2024-11-07 18:56] Language fr run
[2024-11-07 18:57] Language fr done (elapsed 58.00s)
[2024-11-07 18:57] Language it run
[2024-11-07 18:57] Language it done (elapsed 5.00s)
[2024-11-07 18:57] Language ja run
[2024-11-07 18:58] Language ja done (elapsed 45.00s)
[2024-11-07 18:58] Language pl run
[2024-11-07 18:58] Language pl done (elapsed 4.00s)
[2024-11-07 18:58] Language pt_br run
[2024-11-07 18:58] Language pt_br done (elapsed 30.00s)
[2024-11-07 18:58] Language ro run
[2024-11-07 18:58] Language ro done (elapsed 4.00s)
[2024-11-07 18:58] Language ru run
[2024-11-07 19:00] Language ru done (elapsed 80.00s)
[2024-11-07 19:00] Language tr run
[2024-11-07 19:00] Language tr done (elapsed 17.00s)
[2024-11-07 19:00] Language uk run
[2024-11-07 19:00] Language uk done (elapsed 3.00s)
[2024-11-07 19:00] Language zh run
[2024-11-07 19:00] Language zh done (elapsed 24.00s)
[2024-11-07 19:00] Revdata database out.sqlite complete.

@alfsb
Copy link
Member Author

alfsb commented Nov 7, 2024

It would be preferable to specify languages as individual arguments instead of one argument that you explode on ,.

Changed.

One thing I did notice is that the intros in the languages table are wrapped in the <intro> tag.

It does not affect rendering, but is certainly invalid HTML. Working with DOMDocumentFragment or detaching root-less XML is always awkward. Piecewise XML node saving is no prettier, but generates way less code. Fixed.

@alfsb
Copy link
Member Author

alfsb commented Nov 8, 2024

I am now confident this PR produces identical results in most cases, and near identical in a few cases, comparing counts and listings of old and lib versions of revcheck.

For example, ro revcheck counts 20 notinen files, but lists 21. The code that extracts hashes, determines files final status and count cases are spread in so many places that it is impossible to make further progress. But as doc-base and web-doc revchecks are run with different times (and checkout states), some differences are expected.

And with other PRs in place (listed above), I think it's time to push for this. I plan to:

  1. Merge this PR in the following week, if there are no objections;
  2. In the meantime, to open various separated issues some minor, some major, that changes revcheck results;
  3. The minor issues to be pushed separately;
  4. Two major issues, dealing with [skip-revcheck], to be pushed around early December.

This new "lib revcheck" code is somewhat future proofed, in the sense that the modifications above only generate exchanged data, and no further changes in database or HTML generating would be necessary.

Copy link
Member

@jimwins jimwins left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

@alfsb alfsb mentioned this pull request Nov 10, 2024
@alfsb alfsb merged commit bee795d into php:master Nov 11, 2024
9 of 12 checks passed
@alfsb
Copy link
Member Author

alfsb commented Nov 11, 2024

Merged. Further issues/PRs will consider this as baseline. Is necessary to comment/notify the PRs of system and web-doc, above?

@jimwins
Copy link
Member

jimwins commented Nov 11, 2024

All handled now, new data is live on https://doc.php.net/.

@alfsb
Copy link
Member Author

alfsb commented Nov 11, 2024 via email

@alfsb
Copy link
Member Author

alfsb commented Nov 12, 2024

I spoke to soon. The pt_BR translation is not shown, other are ok. See: https://doc.php.net/revcheck.php?lang=pt_br

@Girgias Girgias mentioned this pull request Dec 5, 2024
32 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants