This repository has been archived by the owner on Dec 10, 2022. It is now read-only.

Data store/cache service improvements needed #21

Open
tdmalone opened this issue Jun 5, 2017 · 10 comments

Comments

@tdmalone
Owner

tdmalone commented Jun 5, 2017

v0.0.38 added support for using AWS as the data store/data cache. This means that Slackémon can now finally be used on an ephemeral-filesystem service like Heroku.

However, there are issues inherent in this:

  • AWS S3 is an eventually consistent store, which theoretically could cause some player and battle data to be lost, overwritten, or to temporarily step back in time.
  • Slackémon was originally written around using the filesystem as a data store, so it is quite liberal in opening and writing files. While this doesn't seem to cause too much latency when the calls come from nearby (e.g. Heroku is also on AWS), it can quickly add to AWS costs.
  • There is no concept of file locking built in. This issue already exists even when Slackémon runs with its own local data store: a player performing certain actions simultaneously, across different Slack messages or button actions, could find they lose some data or, even worse, corrupt their player JSON file if it is read while it is being written to.

Slackémon currently has separate settings for an image cache and a data cache; each can independently be set to either 'local' or 'aws'. However, we're utilising the data cache as both a data cache and a data store.

We likely could solve the above issues by:

  1. When 'aws' is set as the data cache, also using a secondary local cache to reduce reads from AWS for the life of the current filesystem.
  2. Implementing an additional option for the data store, allowing either 'local' or an alternative store such as Redis or Postgres.
    • Slackémon currently uses JSON files to store player and battle data (one per player and one per battle), so to avoid having to rewrite a lot it would be ideal to stick somewhat to this method if possible. However, perhaps each individual player's Pokémon could be split into a separate file to prevent single JSON strings from ever getting too big.
    • The method chosen would need to be easy to set up and support. Both Redis and Postgres are simple on Heroku; we would just need to look into Docker options too.
  3. Implementing some sort of file lock for the data store, whether 'local' or the above alternative option is in use.
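Point 1 could look something like the following minimal sketch: a read-through local cache in front of the remote AWS fetch. The function name, cache path, and the `$remote_fetch` callable are all illustrative assumptions, not the actual Slackémon API.

```php
<?php
// Sketch only: a local read-through cache for the life of the current
// filesystem (e.g. a Heroku dyno). Names here are assumptions.

function slackemon_cached_get( $key, callable $remote_fetch ) {
    $local_path = sys_get_temp_dir() . '/slackemon-cache/' . md5( $key );

    // Serve from the local filesystem if we've fetched this key before.
    if ( file_exists( $local_path ) ) {
        return file_get_contents( $local_path );
    }

    // Otherwise hit the remote (AWS) store, and cache the result locally.
    $data = $remote_fetch( $key );

    if ( false !== $data ) {
        if ( ! is_dir( dirname( $local_path ) ) ) {
            mkdir( dirname( $local_path ), 0777, true );
        }
        file_put_contents( $local_path, $data );
    }

    return $data;
}
```

Repeated reads of the same key within one filesystem lifetime would then cost only one AWS call.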

In addition, while in-memory caching is already in use for most data cache reads (eg. Pokémon & species data, move data, item data) and data store reads (player data and battle data), we could also probably reduce writes by ensuring that there's one save to a player file per request (at the moment there are sometimes two especially due to the slackemon_add_xp() function being called while something else requiring a save is also happening, such as catching a Pokémon).
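One way to guarantee a single save per player per request would be to queue writes and flush them once at the end of the request. This is only a sketch; the function names are illustrative, and `$save` stands in for whatever the real save routine is.

```php
<?php
// Sketch: collapse multiple player-data saves into one write per request.
// All names here are assumptions, not the actual Slackémon API.

$GLOBALS['slackemon_dirty_players'] = [];

function slackemon_queue_player_save( $player_id, $player_data ) {
    // Later queued data for the same player replaces earlier data, so each
    // player is written at most once when the queue is flushed.
    $GLOBALS['slackemon_dirty_players'][ $player_id ] = $player_data;
}

function slackemon_flush_player_saves( callable $save ) {
    foreach ( $GLOBALS['slackemon_dirty_players'] as $player_id => $data ) {
        $save( $player_id, $data );
    }
    $GLOBALS['slackemon_dirty_players'] = [];
}

// At bootstrap, flush once when the request finishes, e.g.:
// register_shutdown_function( 'slackemon_flush_player_saves', $real_save );
```

With this in place, `slackemon_add_xp()` and a simultaneous catch would both queue against the same player and result in a single write.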

@tdmalone tdmalone added this to the Set up of a robust architecture milestone Jun 5, 2017
@tdmalone tdmalone self-assigned this Jun 10, 2017
@tdmalone
Owner Author

I'm currently working on implementing Postgres as an option for the data store, using the postgres Docker image for dev and intending to use Heroku's free hobby Postgres plan for prod (allows up to 10k rows).

Redis is another potential option but is only free on Heroku without data persistence, so that's on the backburner for now.

The 10k row limit for free Heroku prod should be fine provided we can handle large JSON strings - in the current structure, that's one per player, plus one each per spawn instance and battle instance, both of which can be pruned.
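Under the one-JSON-blob-per-row approach, the schema could stay as simple as something like this (a sketch only; the table and column names are assumptions, not the actual schema):

```sql
-- Sketch: one row per former 'file', keeping the existing JSON structure.
CREATE TABLE IF NOT EXISTS slackemon (
    filename TEXT PRIMARY KEY, -- e.g. a players/... or battles/... key
    contents TEXT NOT NULL     -- the JSON blob previously stored as a file
);
```

Pruning old spawn and battle rows then keeps the row count well under the 10k limit.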

@Naramsim
Collaborator

Hey, @tdmalone, I see the project is expanding 😄

Maybe it's time to create a docker-compose.yml. Basically, it is a file processed by Docker Compose (which on Linux is not shipped with Docker) that creates several containers at once and orchestrates them. So we could split the project into the following images/containers:

  • Apache2 / Nginx (you choose)
  • PHP 7
  • PostgreSQL
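A docker-compose.yml along these lines could do it (a sketch; the service names and image tags are illustrative, and the web/PHP split could be refined further):

```yaml
# Sketch of a Compose file for local development. Image tags, service
# names, and the volume mount are assumptions, not a final setup.
version: '2'
services:
  web:
    image: php:7-apache      # Apache + PHP in one official image to start
    ports:
      - '80:80'
    volumes:
      - .:/var/www/html      # mount the Slackémon source into the webroot
    depends_on:
      - database
  database:
    image: postgres:9.6-alpine
    environment:
      POSTGRES_DB: slackemon
```

Then `docker-compose up -d` brings the whole stack up in one command.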

@tdmalone
Owner Author

@Naramsim I'm glad you've suggested that! I had a bit of a read into Compose today but it looked too complicated for me to understand at first, so to start development on this I just installed PG and manually linked the containers together. It was a challenge to get the PHP PG extension installed but I got there in the end!

If you could help with the docker-compose.yml that would be fantastic. I would prefer Apache for now but maybe we can move to nginx later - I know it's faster but just trying to manage how much I take on that I haven't used before ;)

@tdmalone
Owner Author

(We could even look at Alpine Linux maybe to try to reduce the size later, but I'm not sure if that would just increase complexity with getting things working)

@Naramsim
Collaborator

I would not stick with Alpine Linux itself and install PHP/Apache/Postgres directly on it.
I would prefer to use three separate containers, each based on Alpine.

So, in the end, we would have a system composed of three small (and secure) containers, which are started automatically with the command docker-compose up -d

What do you think?

@tdmalone
Owner Author

I think that sounds perfect!

@tdmalone
Owner Author

Postgres should now be working in https://github.com/tdmalone/slackemon/tree/data-store as an option for SLACKEMON_DATA_STORE_METHOD. I just need to test this on Heroku and then will probably push it to master and bump to v0.0.39.

This partially implements point 2 (Postgres data store ability) mentioned above at #21 (comment)

The rest of point 2 (using Postgres on Docker) will be covered by using Docker Compose.

Point 3 (file locking) is probably the next priority, then point 1 (augment AWS cache with local cache).

Of course, Postgres isn't implemented properly at this stage - we're basically treating it like a filesystem with each row being a 'file' and containing a big JSON string. In future we should completely rewrite this, drop support for a local data store, and use the database properly.
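Treating each row as a 'file' might look roughly like this sketch (the function names and table layout are assumptions, not the actual implementation; the upsert needs Postgres 9.5+). No test is included here since it needs a live database.

```php
<?php
// Sketch of the row-as-a-'file' approach described above.
// Assumes a table: slackemon ( filename TEXT PRIMARY KEY, contents TEXT ).

function slackemon_pg_get( PDO $pg, $filename ) {
    $stmt = $pg->prepare( 'SELECT contents FROM slackemon WHERE filename = ?' );
    $stmt->execute( [ $filename ] );
    $contents = $stmt->fetchColumn();
    return false === $contents ? false : $contents; // false, like a missing file
}

function slackemon_pg_put( PDO $pg, $filename, $contents ) {
    // Upsert keeps the 'write a file' semantics in a single statement.
    $stmt = $pg->prepare( '
        INSERT INTO slackemon ( filename, contents ) VALUES ( ?, ? )
        ON CONFLICT ( filename ) DO UPDATE SET contents = EXCLUDED.contents
    ' );
    return $stmt->execute( [ $filename, json_encode( json_decode( $contents ) ) ? $contents : $contents ] );
}
```

A proper rewrite would instead map player, spawn, and battle data onto real columns and drop the big JSON strings.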

@tdmalone
Owner Author

tdmalone commented Jun 15, 2017

  • Automatically create database if it doesn't exist yet, for example when run in a new Docker setup
  • Data caching appears to be broken at the moment; it may be a regression caused by the work in this ticket so far, and needs looking into
  • Fix regression with $action not sending through full data anymore
  • Verify that slackemon_get_files_by_prefix() is returning in alphanumeric order when the database storage is being used

  • Docker Compose (see Adds Compose #35)
  • Initial file locking work
    • Basic file locking
    • Make locking an optional variable for now, until it is improved
    • User feedback if waiting for a lock to become available
    • Make locking more intelligent to only lock when writing is just about to happen
  • Augment AWS cache with local cache

tdmalone added a commit that referenced this issue Jun 16, 2017
* database-improvements-#21:
  Minor image cache debug tweaks
  Minor: attempt to fix time discrepencies with sleeping Docker containers
  Add SLACKEMON_CACHE_DEBUG constant and update cache debugging to work like the others
  Complete database creation if database does not exist
  Make start/stop/reset shell scripts a bit smarter
  Add basic shell scripts to quickly start/stop/reset docker containers
  Debug reporting tweaks
  WIP: Creating database if not exists
tdmalone added a commit that referenced this issue Jun 25, 2017
* database-improvements-#21:
  Minor: additional error log debugging for PokeAPI retrieval failures
  Minor: fix version compare logic for player data migrations
  Minor: fix file locking config
  WIP: Simplify dealing with player XP for file locking
  Hotfix: rename send2slack in newly merged in function call
  WIP: More efficient file locking for some player data operations
  WIP: Locking/unlocking just as required
  Minor tweaks
  Send waiting message to user during pending file locks
  Minor: rename send2slack to slackemon_send2slack
  Adjust file lock debugging + make file locking optional for now
  Hotfix: ensure when renaming that a new folder is created
  Minor cleanups & battle invite filename fixes
  Implement basic file locking for player & battle data
  Minor: remove commit hash from image URL base to avoid changing/re-caching
  Extend S3 local augmenting to exists & mtime calls; + ensure it only happens on cache not data calls
  Minor: publish postgres port for development Docker container
  Augment S3 cache with temporary local cache
  Minor database debug tweaks + ensure correct order when 'globbing'
tdmalone added a commit that referenced this issue Jun 26, 2017
…or all player operations; battle file locking still to come)

* database-improvements-#21:
  Ensure all remaining uses of player data respect file locking
  Minor: avoid locking errors when a player file is first created
  Minor: file locking support for transferring Pokemon
  Further file locking fixes to battle and happiness updates
  Minor: cleanup leftover code left in evolutions.php
  Implement file locking for further battle, evolution and organising functions
  Deprecate slackemon_add_xp as of next version
  Deprecated function & debug backtrace tools
  WIP: Further file locking for battle ops
  WIP: Implement file locking for some battle operations
  Implement file locking for some organising & player operations
  Implement file locking for item operations
tdmalone added a commit that referenced this issue Jun 26, 2017
* database-improvements-#21:
  Minor tweaks to lock debugging
  Minor: add TODO for later
  Implement file locking for battle data
tdmalone added a commit that referenced this issue Jun 27, 2017
* database-improvements-#21:
  Revert "Minor: speed up battle move & catching waits slightly" - no longer needed now that AWS local cache augmentation is shipping
  Minor improvements to file lock debugging
  Update spawn and file locking default behaviour now that file locking is ready
  Resolve a couple of extra bugs that file locking introduced
tdmalone added a commit that referenced this issue Jun 27, 2017
* database-improvements-#21:
  Minor tweaks to lock debug reporting
tdmalone added a commit that referenced this issue Jun 27, 2017
* database-improvements-#21:
  Further tweaks to file lock debug functions
tdmalone added a commit that referenced this issue Jun 27, 2017
* database-improvements-#21:
  Hotfix: woops, no longer an array - need to remove the join!
tdmalone added a commit that referenced this issue Jun 27, 2017
* database-improvements-#21:
  Hotfix: only get player data for writing if they haven't seen a Pokemon before
tdmalone added a commit that referenced this issue Jun 27, 2017
* database-improvements-#21:
  Fix pokemon held items & evolution after file locking
@tdmalone
Owner Author

Most of the work this ticket asks for is now in master!

A few more things to look into to make file locking more robust:

  • Timeout if lock takes too long to acquire
  • Is it worth implementing a queueing system rather than just the first try after lock is available?
  • Is it worth trying to make file_exists() then file_put_contents() atomic? (Probably possible with DB, but what about local filesystem?) Potentially look into locks with flock() and fopen()'s modes.
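On the local filesystem, `flock()` together with `fopen()`'s `'c+'` mode (create if missing, don't truncate) can make the check-then-write sequence atomic against other `flock()`-respecting processes. A minimal sketch, not the current implementation:

```php
<?php
// Sketch: atomic write on the local filesystem using flock().
// 'c+' opens for read/write and creates the file if needed, WITHOUT
// truncating it, so existing contents can be inspected under the lock.

function slackemon_locked_write( $filename, $new_contents ) {
    $handle = fopen( $filename, 'c+' );
    if ( ! $handle ) {
        return false;
    }

    if ( ! flock( $handle, LOCK_EX ) ) { // Blocks until the lock is ours.
        fclose( $handle );
        return false;
    }

    // Safe to read-modify-write here: no other flock()-respecting
    // process can interleave between our check and our write.
    ftruncate( $handle, 0 );
    fwrite( $handle, $new_contents );
    fflush( $handle );

    flock( $handle, LOCK_UN );
    fclose( $handle );
    return true;
}
```

For the database side, a `SELECT ... FOR UPDATE` inside a transaction would play the equivalent role.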

@tdmalone
Owner Author

In addition to the above, there probably needs to be a regular cronned check to expire locks that did not get removed properly - probably after 2 minutes? Or even 1 minute.
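If locks live as files (or rows) with a timestamp, the cron check could be as simple as this sketch (the lock directory layout and the 60-second threshold are assumptions for illustration):

```php
<?php
// Sketch of a cron-run sweep that expires stale locks. Assumes one
// *.lock file per lock, with the file's mtime as the acquisition time.

function slackemon_expire_stale_locks( $lock_dir, $max_age = 60 ) {
    $removed = 0;
    foreach ( glob( $lock_dir . '/*.lock' ) as $lock_file ) {
        if ( time() - filemtime( $lock_file ) > $max_age ) {
            unlink( $lock_file ); // Holder presumably crashed; release it.
            $removed++;
        }
    }
    return $removed;
}
```

Running this every minute would cap how long a crashed request can wedge a player or battle file.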
