Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DFA that tracks phrases of power #2

Open
Vlad-Shcherbina opened this issue Aug 7, 2015 · 5 comments
Open

DFA that tracks phrases of power #2

Vlad-Shcherbina opened this issue Aug 7, 2015 · 5 comments

Comments

@Vlad-Shcherbina
Copy link
Owner

It consumes characters one by one, and after each character increments some of 18 counters, each corresponding to number of occurrences of individual power phrases.

I expect this automaton to have not too many states (less than twice the total length of all power words).

I think it'll be very useful for dynamic programming approximations to the solution.

@mdragus
Copy link
Collaborator

mdragus commented Aug 7, 2015

I will be working on this.

@mdragus
Copy link
Collaborator

mdragus commented Aug 8, 2015

This is submitted. There is a stress test and a few normal tests. The only possible issue is the fact that currently if the word is "aa" and the text is "aaaaaa" we get a total of 3 matches, not 5 (which we don't know if it is correct.)

@mdragus
Copy link
Collaborator

mdragus commented Aug 8, 2015

FullDfa is the only

Usage:

from production.dfa import FullDfa
x = FullDfa(["aa","aaaa"], ['a','b'])
x.get_final_state_lookup()
{frozenset({'aaaa_3_a', 'aa_2_a'}): ['aa'], frozenset({'aa_2_a', 'aaaa_2_a'}): ['aa'], frozenset({'aaaa_4_a', 'aa_2_a'}): ['aa', 'aaaa']}
x.get_dfa()
{frozenset({'aaaa_3_a', 'aa_2_a'}): {'b': frozenset({'start'}), 'a': frozenset({'aaaa_4_a', 'aa_2_a'})}, frozenset({'aa_1_a', 'aaaa_1_a'}): {'b': frozenset({'start'}), 'a': frozenset({'aa_2_a', 'aaaa_2_a'})}, frozenset({'aa_2_a', 'aaaa_2_a'}): {'b': frozenset({'start'}), 'a': frozenset({'aaaa_3_a', 'aa_2_a'})}, frozenset({'start'}): {'b': frozenset({'start'}), 'a': frozenset({'aa_1_a', 'aaaa_1_a'})}, frozenset({'aaaa_4_a', 'aa_2_a'}): {'b': frozenset({'start'}), 'a': frozenset({'aaaa_4_a', 'aa_2_a'})}}
x.add_letter('a')
x.add_letter('a')
x.add_letter('a')
x.add_letter('a')
x.add_letter('a')
x.get_scores()
['aa', 'aa', 'aa', 'aaaa', 'aa', 'aaaa', 'aa', 'aaaa']

@peluche
Copy link
Collaborator

peluche commented Aug 8, 2015

working on adding hypothesis tests

@peluche
Copy link
Collaborator

peluche commented Aug 8, 2015

Tests added and passing

I changed the API a bit,

  • previously .get_scores() returned a list of all the matches in order of matching.
  • now it only return a dict of {match: number_of_matches} to prevent wasting memory

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants