@@ -1110,6 +1110,25 @@ optional.
- 'fetchTimeMs': time in milliseconds that it took to collect the
archived URI, starting from the initiation of network traffic.

+ > **Community recommendation:** #59
+ > The `hopsFromSeed` field comes from the [discovery path](https://heritrix.readthedocs.io/en/latest/glossary.html#discovery-path)
+ > concept in the Heritrix web crawler. The value is a string containing
+ > one character for each link or embed followed from the seed; for
+ > example, "LLLE" might be an image on a page that is three links away
+ > from a seed. The value of `hopsFromSeed` for a seed URI should be the
+ > empty string.
+ >
+ > | Symbol | Meaning           | Examples                                                  |
+ > |--------|-------------------|-----------------------------------------------------------|
+ > | `L`    | Link              | `<a href=...>`                                            |
+ > | `E`    | Embedded          | `<img src=...>`<br>`<script src=...>`                     |
+ > | `X`    | Speculative embed | `<script>var url = 'http://example.org/foo.js';</script>` |
+ > | `R`    | Redirect          | `HTTP/1.0 302 Found`<br>`Location: ...`                   |
+ > | `P`    | Prerequisite      | robots.txt, DNS lookup                                    |
+ > | `I`    | Implicit/Implied  | favicon.ico                                               |
+ > | `M`    | Manifest          | URLs listed in sitemap files                              |
+ > | `S`    | Form submission   | `<form action=...>`                                       |
+
A 'metadata' record may be associated with other records derived from
the same capture event using the WARC-Concurrent-To header. A 'metadata'
record may be associated to another record which it describes, using the
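As a rough illustration of the symbol table added in the patch above, the following Python sketch expands a `hopsFromSeed` value into the meaning of each hop, in order from the seed. The names `HOP_MEANINGS`, `describe_hops`, and `is_seed` are hypothetical and belong to neither the specification nor Heritrix. Rejecting symbols outside the table is a choice made for this sketch only; the recommendation does not say how unknown characters should be handled.

```python
# Hypothetical helper for illustration; not part of the WARC text or Heritrix.
# The mapping mirrors the symbol table in the community recommendation.
HOP_MEANINGS = {
    "L": "Link",
    "E": "Embedded",
    "X": "Speculative embed",
    "R": "Redirect",
    "P": "Prerequisite",
    "I": "Implicit/Implied",
    "M": "Manifest",
    "S": "Form submission",
}


def describe_hops(hops_from_seed: str) -> list[str]:
    """Return the meaning of each hop symbol, ordered from the seed outward."""
    # Assumption of this sketch: symbols not in the table are treated as errors.
    unknown = sorted(set(hops_from_seed) - set(HOP_MEANINGS))
    if unknown:
        raise ValueError(f"unrecognised hop symbol(s): {unknown}")
    return [HOP_MEANINGS[symbol] for symbol in hops_from_seed]


def is_seed(hops_from_seed: str) -> bool:
    """A seed URI is recorded with the empty string as its hop path."""
    return hops_from_seed == ""


if __name__ == "__main__":
    # "LLLE": three links followed from the seed, then one embedded resource.
    print(describe_hops("LLLE"))
    print(is_seed(""))
```

Keeping the table as a plain dictionary makes it straightforward to extend if a crawler emits additional symbols.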