Optimize PutFS in text.z80 #53
base: master
This is what I want to do to the rest of the document.
Optimized the text routine and added it to the pull request. You can test it if you'd like :)
Five bytes smaller. I think...
I don't know how fast this one is, but it is a little more optimized.
I would like to remove the …
I added some improvements to this version. If I calculated correctly, the clock cycle counts are 839 for PutRight and 834 for PutLeft.
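The PutLeft/PutRight split suggests the usual way of drawing a glyph row at an arbitrary x: shift the row right by `x & 7`, then merge the left and right halves into the two buffer bytes it straddles, using a mask to clear the old pixels first. The Python below is only a minimal model of that idea, not the Z80 routine from this PR; `put_row`, the left-aligned row format (bit 7 = leftmost pixel), and the `width` parameter are hypothetical choices made for illustration.

```python
def put_row(buf: bytearray, offset: int, row: int, width: int, shift: int) -> None:
    """Merge one glyph row into a 1-bit-per-pixel buffer (hypothetical model).

    row:   glyph pixels, left-aligned in a byte (bit 7 = leftmost pixel)
    width: glyph width in pixels, 1..8
    shift: x & 7, the pixel offset inside the left byte
    """
    mask = (0xFF << (8 - width)) & 0xFF      # 1s where the glyph's pixels live
    glyph16 = ((row & mask) << 8) >> shift   # glyph spread across two bytes
    mask16 = (mask << 8) >> shift            # mask spread the same way

    # "Left" half: clear the covered pixels in the left byte, then OR in the glyph
    left_mask = mask16 >> 8
    buf[offset] = (buf[offset] & ~left_mask & 0xFF) | (glyph16 >> 8)

    # "Right" half: only needed when the shifted glyph spills into the next byte
    right_mask = mask16 & 0xFF
    if right_mask:
        buf[offset + 1] = (buf[offset + 1] & ~right_mask & 0xFF) | (glyph16 & 0xFF)


buf = bytearray([0xFF, 0xFF])
put_row(buf, 0, 0b1010_0000, 4, 6)
print(buf)  # bytearray(b'\xfe\xbf')
```

The mask has to survive until both halves are merged, which is why it needs a temporary home such as a spare register. This model also ignores the case where `offset + 1` runs past the end of a buffer row.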
I used the IXL half-register as temporary storage for the mask. That saved 1 byte and shaved off more clock cycles.
For code like that, I do something like this:
That goes from 8 bytes and 59cc to 8 bytes and 33cc/41cc (the average is 33.375cc, since the 41cc path is only taken 12 out of 256 times, about 4.7%). Your newest text code didn't quite work for me when I compiled it, but here is what I came up with for the PutFS routine:
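The 33.375cc average quoted above is a weighted mean of the two timing paths. A minimal sketch of that arithmetic, assuming (as the "on average" wording implies) that all 256 input byte values are equally likely; `average_cycles` is a hypothetical helper name:

```python
from fractions import Fraction


def average_cycles(fast_cc: int, slow_cc: int, slow_cases: int, total_cases: int = 256) -> Fraction:
    """Expected cycle count of a routine with a fast and a slow path,
    assuming all `total_cases` input values are equally likely."""
    p_slow = Fraction(slow_cases, total_cases)
    return fast_cc * (1 - p_slow) + slow_cc * p_slow


avg = average_cycles(33, 41, 12)
print(float(avg))                # 33.375
print(float(Fraction(12, 256)))  # 0.046875, i.e. ~4.7%
print(59 - float(avg))           # 25.625 cycles saved on average vs. 59cc
```

Using `Fraction` keeps the result exact, so it can be compared directly against hand-counted figures.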
I reorganized some of the beginning code in PutFS. With the somewhat recent text updates, Grammer (finally) gained support for archived fonts, but I basically patched the PutFS code instead of reorganizing it to be more optimal. Now it reads the char data from flash to a fixed location, so it doesn't have to keep track of the char # or pointer. It then updates the text coordinates and proceeds directly to convert them into an offset into the graphics buffer. I tweaked that calculation to save a few more bytes and clock cycles by taking advantage of the Y-coordinate being less than 64. Then we get to the actual drawing of the char, where I use your idea of calling a common put/shiftput subroutine, but instead of using B as a counter and looping 3 times, I just … Overall, the code that actually draws the char is about 141.25cc faster than your latest routine (and ~263.25cc faster than the original), and I didn't calculate the clock cycles saved by my changes to the load/coordinate/calculate stages. Your version is 6 bytes smaller than mine and a full 19 bytes smaller than the original, but for now I prefer the version above. (Side note: while I was typing all of this up, I saw your trick of using (de) to restore the byte; using it in my code saved 3 bytes and 6cc, nice! Since that also frees up a variable, I'm hoping to find even more optimizations, so I'll edit this comment.) EDIT: That …
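One way the Y < 64 bound can help, sketched in Python rather than Z80: assuming the 96×64 monochrome buffer of the TI-83+/84+ calculators Grammer targets (12 bytes per row), the offset is `y*12 + x/8`. Since y < 64, `y*4` is at most 252 and still fits in an 8-bit register, so two `ADD A,A` doublings need no carry handling and only the final ×3 has to happen in 16 bits. This is an illustration of the idea, not the tweak actually made in this PR; `gbuf_offset` is a hypothetical name.

```python
BYTES_PER_ROW = 12  # assumed 96x64 monochrome buffer: 96 pixels / 8 = 12 bytes per row


def gbuf_offset(x: int, y: int) -> int:
    """Byte offset of pixel (x, y) in the graphics buffer (hypothetical model).

    Because y < 64, y*4 <= 252 still fits in one byte, so the first two
    doublings can stay in 8-bit arithmetic; only the final *3 needs 16 bits.
    """
    if not (0 <= x < 96 and 0 <= y < 64):
        raise ValueError("coordinates out of range")
    a = y * 4                   # two 8-bit doublings (ADD A,A twice)
    assert a <= 0xFF            # fits in an 8-bit register: no carry to handle
    hl = a                      # widen to 16 bits
    hl = hl + hl + hl           # *3 -> y * 12 == y * BYTES_PER_ROW
    return hl + (x >> 3)        # add the column byte, x / 8


print(gbuf_offset(10, 5))   # 61
print(gbuf_offset(95, 63))  # 767, the last byte of a 768-byte buffer
```

In Z80 terms the 16-bit ×3 would typically be a copy of HL into DE followed by `ADD HL,HL` and `ADD HL,DE`.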
I think these modifications save bytes and clock cycles. I can't test it at the moment, but I'm almost positive, unless there is something I'm missing. I moved some of the operations into the routine instead of the overhead loop, since I think they do the same thing with fewer bytes.
Edit: I tested it and it didn't work, so it's time to try again until I get it right. I know the C optimization works because the C register isn't used by anything else, so it should stay the same no matter what. Edit2: I removed the extra bytes and added my trick to the routine. I also removed a large amount of unneeded code from the loop area that was used twice. I took those and put them into the … Edit3: …
Your optimization is almost too superior, but that's okay. It's fast! That's my final push. I think.