Add accessibility checking to deposit workflow #1580

Closed
binkylush opened this issue Aug 30, 2024 · 8 comments

Comments

@binkylush

No description provided.

@binkylush
Author

Sensus Access API info
https://www.sensusaccess.com/integrating-sensusaccess/

@ajkiessl
Collaborator

ajkiessl commented Sep 5, 2024

We have access to Siteimprove, so let's test out their API. We still haven't heard back from Sensus.

@EricDurante
Collaborator

Here's my initial assessment of the tools just based on the documentation that I can find:

Siteimprove

Siteimprove provides a specification for their API and a few notes about each of the endpoints, but no real overall guidance on how to use it (that I can find). Based on the specification, this tool looks like it's far from ideal for our use case. The API appears to support two different methods for checking documents for accessibility:

  1. You can feed it a URL for a website. It will crawl the public portions of the entire site (presumably by following links on pages within the domain, though it doesn't specifically say) and report on the accessibility of each page, including both HTML and PDF documents. This feature is probably useless for what we're trying to accomplish, since it doesn't give synchronous, per-document feedback in real time and it can only access resources that are publicly available on the web.
  2. You can upload "content" directly for accessibility analysis. According to the API specification, the accepted content types for this method are text/plain, text/html and application/zip. I have no idea what that actually means in terms of what types of documents can be uploaded for analysis, but the list notably doesn't include application/pdf or any other kind of binary or marked-up text document. To understand what this feature actually does, I think that we're going to have to experiment with it. This would be the method that we'd want to use if it works with the right kinds of documents, but based on what I've seen so far, I don't have a lot of hope that it does.
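
To make the content-type gap concrete, here is a minimal Python sketch that checks whether a file's guessed MIME type is among the three types listed in the specification. The accepted set is copied from the list above; the helper name `siteimprove_accepts` is hypothetical, it only looks at the file extension, and it says nothing about what Siteimprove actually does with the contents of an uploaded ZIP.

```python
import mimetypes

# Content types that the Siteimprove API specification lists for direct
# upload, as noted above. What may go inside a ZIP is not documented.
ACCEPTED_CONTENT_TYPES = {"text/plain", "text/html", "application/zip"}


def siteimprove_accepts(filename: str) -> bool:
    """Return True if the file's guessed MIME type is in the accepted set.

    Hypothetical helper: it guesses the type from the file extension and
    never contacts the Siteimprove API.
    """
    mime_type, _encoding = mimetypes.guess_type(filename)
    return mime_type in ACCEPTED_CONTENT_TYPES


if __name__ == "__main__":
    for name in ("thesis.pdf", "abstract.txt", "landing-page.html"):
        print(name, siteimprove_accepts(name))
```

A PDF deposit fails this check, which is why the direct-upload method needs hands-on experimentation before it can be ruled in or out.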

The bottom line is that we should do a little experimentation to be sure, but it doesn't look like Siteimprove is going to be a useful tool for our proposed workflow.

Sensus Access

I can't find any publicly available documentation whatsoever about the Sensus Access API. I tried to sign up for an account to see if that would allow me to view any API docs, but the account sign-up process is apparently manual, and I haven't yet received a response to my request for an account. At this point, I have no idea if Sensus Access provides what we need.

Other tools

I've been searching for other tools that might be a better fit. In particular, I think that it would be ideal if we could find a tool that we can install and run locally alongside the ScholarSphere app rather than requiring us to upload documents to a third-party web service. So far I haven't found anything that looks promising, but I'll continue looking.

@EricDurante
Collaborator

This may be useful for analyzing PDFs.

@EricDurante
Collaborator

It looks like Adobe's PDF Services API is probably going to be our best bet for this. I signed up for a free-tier account and was able to try out the accessibility checker by obtaining an API key and following the general documentation and the specific documentation for the PDF Accessibility Checker.

Based on this documentation, it appears that it may be possible to process documents directly from their location in ScholarSphere's AWS S3 storage.

I looked around for a Ruby wrapper for this REST API, but didn't find anything that looked useful. Since we're only interested in performing one type of operation, I think it will be pretty trivial to implement our own client. For the test that I did, I simply used curl.
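
As a rough illustration of how small such a client could be, here is a Python sketch (standard library only) that builds the request to start an accessibility check and pulls the job-status URL out of the response headers. Everything specific in it is an assumption that is not stated in this thread and should be verified against Adobe's documentation: the base URL, the `/operation/accessibilitychecker` path, the `x-api-key` header, the `assetID` body field, and the use of a `Location` response header for polling. The function names are hypothetical, and a real ScholarSphere client would presumably be written in Ruby.

```python
import json
import urllib.request
from typing import Mapping

# ASSUMPTIONS: base URL, operation path, header names, and body shape are
# not given in this thread. Check Adobe's current documentation before use.
BASE_URL = "https://pdf-services.adobe.io"
CHECKER_PATH = "/operation/accessibilitychecker"


def build_check_request(
    asset_id: str, client_id: str, access_token: str
) -> urllib.request.Request:
    """Build, but do not send, a POST that starts an accessibility check."""
    body = json.dumps({"assetID": asset_id}).encode("utf-8")
    return urllib.request.Request(
        BASE_URL + CHECKER_PATH,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {access_token}",
            "x-api-key": client_id,
            "Content-Type": "application/json",
        },
    )


def job_status_url(response_headers: Mapping[str, str]) -> str:
    """Return the polling URL from the response's Location header.

    Header names are matched case-insensitively. Raises KeyError if the
    header is missing, so a lost header is noticed immediately.
    """
    for name, value in response_headers.items():
        if name.lower() == "location":
            return value
    raise KeyError("response has no Location header")


if __name__ == "__main__":
    request = build_check_request("example-asset-id", "example-client-id", "example-token")
    print(request.get_method(), request.full_url)
    # Sending is left out on purpose: urllib.request.urlopen(request) would
    # make a real call against the account's quota.
```

Keeping the request builder separate from the code that sends it makes the client easy to test without network access or API credentials.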

I'm not sure if there's an existing Adobe subscription that we can use, but a free-tier account gives us "500 free Document Transactions per month". I'm not yet sure what exactly counts as a "document transaction" since I don't currently see any activity when I check my API key usage after running the test.

@EricDurante
Collaborator

EricDurante commented Sep 19, 2024

The test of the Adobe API that I did yesterday counted as 2 document transactions against my free limit of 500 per month. My first guess was that the document upload counted as 1 and the accessibility check counted as 1. If that were the case, checking documents that are already stored in S3 might let us avoid the upload transaction for each document, and we could check 500 documents per month on the free tier.

Scratch that. The document upload didn't count for anything. I forgot that I actually made two requests to the accessibility checking API endpoint for the same document because I didn't capture the response headers that I needed on the first request. So it's definitely 500 full-document accessibility checks per month for free.

I don't know the maximum number of PDF uploads per month that we should expect in ScholarSphere, so I don't know whether 500 checks per month is enough, especially if depositors need multiple rounds of checking and re-uploading to make their documents compliant.
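
The capacity question comes down to simple arithmetic, sketched below in Python. The 500-per-month limit and the one-transaction-per-check cost come from the test described above; the upload counts and re-check rates in the example are made-up placeholders, not ScholarSphere figures, and the function names are hypothetical.

```python
import math

# Free-tier allowance, with each accessibility check costing one
# document transaction (as observed in the test above).
FREE_TIER_CHECKS_PER_MONTH = 500


def checks_needed(pdf_uploads: int, avg_checks_per_pdf: float) -> int:
    """Estimate monthly checks, rounding up to whole transactions."""
    return math.ceil(pdf_uploads * avg_checks_per_pdf)


def fits_free_tier(pdf_uploads: int, avg_checks_per_pdf: float) -> bool:
    """Return True if the estimated load stays within the free allowance."""
    return checks_needed(pdf_uploads, avg_checks_per_pdf) <= FREE_TIER_CHECKS_PER_MONTH


if __name__ == "__main__":
    # Placeholder scenarios, not real deposit statistics.
    for uploads, rounds in ((200, 2.5), (300, 2)):
        print(uploads, rounds, checks_needed(uploads, rounds), fits_free_tier(uploads, rounds))
```

Once real monthly deposit counts are available, plugging them in shows how many re-check rounds per document the free tier can absorb.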

@ajkiessl
Collaborator

Closing since initial planning for this is done. Refer to #1589 for more detailed specs.
