Package, localize to EN & improve #1

Open
Helveg opened this issue May 22, 2020 · 5 comments
@Helveg

Helveg commented May 22, 2020

Hi, I love the concept and want to contribute!

Could you tell me what it currently does? It seems to load German word lists and seems to use pypandoc to convert an input file to a plain text file. The plain text file is then analysed, but what exactly is the analysis? Spell check?

I'd like to help you package this into a python package that we can distribute on PyPI! I assume this will work both as a CLI tool & library (for automation tools etc, can you imagine, PaperCI, continuous integration for your academic papers hehe 😛)

Apart from packaging I'd also like to use this to generate repositories with structures suited for making papers (with, for example, a provided plots & figures folder where you can automatically build your Python plot scripts into high-res figures) and of course make sure that at least the English locale is supported.

How does that sound? If you're no longer interested in the project would you mind if I continued this myself?

@pwab
Owner

pwab commented May 22, 2020

Hi @Helveg,
thanks for your interest in this (quite simple) script. I haven't touched it for 3 years now but well this could change I guess 😄.

> Could you tell me what it currently does? It seems to load German word lists and seems to use pypandoc to convert an input file to a plain text file. The plain text file is then analyzed, but what exactly is the analysis? Spell check?

Well, you almost hit the point of it. Let me explain the main problem. When writing abstracts, papers and other texts that get changed a lot along the way, you often compress sentences to get to the point. Two problems arose when I did that:

  • I always used some phrases and words too frequently and I wanted to simply check for quantity (and maybe compare the 'word count per overall words' to my average of a proofread text)
  • Some words and phrases shouldn't be used in a scientific paper - or at least you have to be very careful when doing so (e.g. 'believe', 'think', 'often', 'rarely' etc.)

So I wrote this little script that should be able to:

  • Convert different file types to plain text that can be analyzed (at least docx, md and tex)
  • Count words and show me a table
  • Search for so called 'attention words' and point to the position where they appear
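The last two tasks in that list (counting words and locating attention words) can be sketched roughly as follows. This is not the script's actual code, only a minimal illustration: the function names and the word list are hypothetical, with the words taken from the examples mentioned above, and the input is assumed to be plain text already.

```python
import re
from collections import Counter

# Hypothetical attention words, taken from the examples in this comment.
ATTENTION_WORDS = {"believe", "think", "often", "rarely"}

# Runs of letters (Unicode-aware, so German umlauts count as letters too).
WORD_RE = re.compile(r"[^\W\d_]+")


def count_words(text):
    """Return a Counter mapping each lowercased word to its frequency."""
    return Counter(m.group().lower() for m in WORD_RE.finditer(text))


def relative_frequencies(counts):
    """Return each word's share of the total word count."""
    total = sum(counts.values())
    return {word: n / total for word, n in counts.items()} if total else {}


def find_attention_words(text, attention_words=ATTENTION_WORDS):
    """Return a (line, column, word) tuple per attention word, 1-based."""
    hits = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for m in WORD_RE.finditer(line):
            if m.group().lower() in attention_words:
                hits.append((line_no, m.start() + 1, m.group()))
    return hits


sample = "We believe the effect is often small.\nWe think the effect is real."

# A small table of the most frequent words.
for word, n in count_words(sample).most_common(3):
    print(f"{word:<10}{n:>3}")

# One line per attention word, in a linter-like "line:column" style.
for line_no, col, word in find_attention_words(sample):
    print(f"{line_no}:{col}: attention word '{word}'")
```

The relative frequencies are what would be compared against the average of a proofread text; where that reference average comes from is left open here.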

> I assume this will work both as a CLI tool & library (for automation tools etc, can you imagine, PaperCI, continuous integration for your academic papers hehe 😛)

That would be a very good idea. I always thought about how to make it more user-friendly:

  • Just export a report file?
  • Add a gui?
  • Integrate it into an editor like VS-Code or Atom?

But I always ran into the same problems: either you have to leave the writing program you are using (e.g. Word), or it would be a lot of work to create plugins for every editor I use (and I use a lot 😅).
I'm used to linters and other CI tools for programming stuff. I already read that there might be some for prose text but never used one. Would be a cool thing to work with I guess.

> How does that sound? If you're no longer interested in the project would you mind if I continued this myself?

I would really like some help here. Maybe starting with some research if there are already some tools for this kind of stuff that can be adapted to scientific needs. And then we could work out a roadmap how this could be achieved.

Please don't hesitate to use everything of this project as you wish. I would be glad if my few lines could help somebody in any way.

@Helveg
Author

Helveg commented May 22, 2020

> Just export a report file?

I think a report file would be great, as other tools (including our first-party tools like maybe a GitHub Action) can read that report and use it however they want. For example, as part of a CI pipeline you could fail your build or send a Slack notification if the report file contains any reports of overused attention words, things like that.

A well-described interface would be the key to success for a report file.
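To make the idea concrete, here is a rough sketch of what such a report interface could look like. Everything in it is an assumption for the sake of discussion: JSON as the format, the field names (`version`, `source`, `findings`), the `attention-word` rule name and the helper functions are all hypothetical.

```python
import json

REPORT_VERSION = 1  # hypothetical schema version, bumped on breaking changes


def build_report(source, findings):
    """Build a JSON-serializable report from (line, column, word) findings."""
    return {
        "version": REPORT_VERSION,
        "source": source,
        "findings": [
            {"rule": "attention-word", "line": line, "column": column, "word": word}
            for line, column, word in findings
        ],
    }


def ci_exit_code(report, max_findings=0):
    """Return 1 (fail the build) when the report exceeds the allowed findings."""
    return 1 if len(report["findings"]) > max_findings else 0


report = build_report("paper.md", [(1, 4, "believe"), (2, 4, "think")])
serialized = json.dumps(report, indent=2)  # what would be written to disk
loaded = json.loads(serialized)            # what a CI step would read back
print(serialized)
print("exit code:", ci_exit_code(loaded))
```

A CI step would then only need to read the file and pass the result of `ci_exit_code` to `sys.exit`; a Slack notifier or editor plugin could consume the same `findings` list without knowing how it was produced.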

> Add a gui?
> Integrate it into an editor like VS-Code or Atom?

I think those could all be built upon the CLI tool as separate programs.

Maybe to move from a script to a CLI tool we can think up a quick structure. I was thinking about pluggable actions:

  • We create a top level paperpy command and provide a set of Actions ourselves like:
    • lint: Run configurable linting/formatting
    • report: Run configurable sets of analyses and produce a report
    • plot/figures: Run the plots scripts, use --build to build high resolution static image files
    • build: Generate your paper in some output format
  • Use pkg_resources to discover entry points like paperpy.action or paperpy.analysis so that others can plug in and distribute their own tools.
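A rough sketch of that structure, to show how the pieces could fit together. Note that it uses the standard library's `importlib.metadata` for entry point discovery instead of the setuptools module named above, and that the built-in actions are placeholders: `lint_action`, `report_action`, `build_parser` and the single `path` argument are hypothetical, only the `paperpy` command and the `paperpy.action` group name come from the proposal.

```python
import argparse
from importlib.metadata import entry_points

ACTION_GROUP = "paperpy.action"  # entry point group name from the proposal


def discover_actions(group=ACTION_GROUP):
    """Load every callable that installed packages registered under `group`."""
    eps = entry_points()
    if hasattr(eps, "select"):   # Python 3.10 and newer
        selected = eps.select(group=group)
    else:                        # Python 3.8 / 3.9 return a plain dict
        selected = eps.get(group, [])
    return {ep.name: ep.load() for ep in selected}


def lint_action(args):
    """Placeholder built-in action (hypothetical)."""
    print(f"linting {args.path}")
    return 0


def report_action(args):
    """Placeholder built-in action (hypothetical)."""
    print(f"writing report for {args.path}")
    return 0


BUILTIN_ACTIONS = {"lint": lint_action, "report": report_action}


def build_parser(actions):
    """Create one subcommand per action under the top-level paperpy command."""
    parser = argparse.ArgumentParser(prog="paperpy")
    subparsers = parser.add_subparsers(dest="action", required=True)
    for name, func in sorted(actions.items()):
        sub = subparsers.add_parser(name)
        sub.add_argument("path", help="document to process")
        sub.set_defaults(func=func)
    return parser


def main(argv=None):
    # Plugins are merged last, so in this sketch they can override built-ins.
    actions = {**BUILTIN_ACTIONS, **discover_actions()}
    args = build_parser(actions).parse_args(argv)
    return args.func(args)
```

With this in place, `main(["lint", "paper.md"])` dispatches to the lint action. A third-party package would add its own subcommand by declaring an entry point in the `paperpy.action` group in its packaging metadata, pointing at a callable that takes the parsed arguments.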

If we can provide just that basic architecture and these actions, maybe support for the most common document formats (latex, docx) and output formats (pdf, html?) then we got a nice tool on our hands.

> Maybe starting with some research if there are already some tools for this kind of stuff that can be adapted to scientific needs

I'll have a look

@Helveg
Author

Helveg commented May 22, 2020

I've created an organization and repository for this project. I've invited you as owner.

Check it out over at https://github.com/paperpy/paperpy

@pwab
Owner

pwab commented May 24, 2020

Great job. I can see a good start over there (ci, coverage, docs, tests).

I'm no python expert or something and I really need to recap some basics of repository management. But good things need time and I'm pretty sure I'll catch up later.

So how about moving #2 to the new repo and then closing this one here?

@Helveg
Author

Helveg commented May 24, 2020

Good idea! If you accept the invitation to the org I can involve you a bit. It would be good for example if you could convert your analysis into an action (subcommand) at your own pace :) It would also give me some initial pointers on how clear I made the documentation ^_^

2 participants