This repository has been archived by the owner on Jan 10, 2025. It is now read-only.

Batch request embeddings from the server for performance improvement #75

Open
stuartlynn opened this issue Sep 24, 2020 · 0 comments
Labels
good first issue Good for newcomers help wanted Extra attention is needed

Comments

@stuartlynn
Contributor

Currently we send one request per unique word to the embedding server to get that word's embedding vector.

The server already accepts multiple words in a single request and returns all of their results. We should chunk the requests to make fewer API calls, which should make fetching the embeddings quicker.

const get_embedings_from_server = entries => {
  let unique_words = new Set();
  entries.forEach(entry => {
    entry.name.split(' ').forEach(word => {
      unique_words.add(word);
    });
  });
  return Promise.all(
    Array.from(unique_words).map(entry =>
      fetch(
        `${
          process.env.REACT_APP_API_URL
        }/embedding/${entry.toLowerCase().replace(/[\W_]+/g, '')}`,
      )
        .then(r => r.json())
        .then(r => r[0]),
    ),
  );
};

This is the function that will need to be modified to run the queries in batches and then correctly assign each result once its batch has returned.
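One possible shape for the batched version is sketched below. It is not a tested patch: the names `chunk`, `normalize`, `get_embeddings_in_batches`, the `batchSize` default of 50, and the injectable `fetchFn` and `baseUrl` options are all assumptions made for illustration. It relies on the route accepting comma-separated words, and it looks results up by their `key` rather than by position, since nothing in the route guarantees the response order matches the request order.

```javascript
// Sketch only: every name below is hypothetical and not part of the repo.

// Split an array into consecutive slices of at most `size` items.
const chunk = (items, size) => {
  const out = [];
  for (let i = 0; i < items.length; i += size) {
    out.push(items.slice(i, i + size));
  }
  return out;
};

// Same normalisation the current code applies to each word before fetching.
const normalize = word => word.toLowerCase().replace(/[\W_]+/g, '');

const get_embeddings_in_batches = async (entries, options = {}) => {
  const {
    batchSize = 50, // assumed value; tune against URL length limits
    baseUrl = process.env.REACT_APP_API_URL,
    fetchFn = fetch, // injectable so the function can be exercised without a server
  } = options;

  // Normalise before de-duplicating so "Red" and "red" share one lookup.
  const words = new Set();
  entries.forEach(entry => {
    entry.name.split(' ').forEach(word => {
      const key = normalize(word);
      if (key) words.add(key);
    });
  });

  // One request per batch; the route splits the path segment on commas.
  const responses = await Promise.all(
    chunk(Array.from(words), batchSize).map(batch =>
      fetchFn(`${baseUrl}/embedding/${batch.join(',')}`).then(r => r.json()),
    ),
  );

  // Assign by key, not by position in the response.
  const byKey = new Map();
  responses.flat().forEach(({ key, embedding }) => byKey.set(key, embedding));
  return byKey;
};
```

Callers would then read `byKey.get(word)` for each normalised word; a word the server did not return is simply absent from the map. Note that this sketch de-duplicates after normalising, which differs slightly from the current function.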

Things to consider:

  1. The server might fail if one or more of the words has no representation in the corpus. We would need to fix that here:

    smooshr/server/server.py

    Lines 66 to 80 in 8b11ccb

    @app.route('/embedding/<words>')
    def embeding(words):
        conn = get_db()
        try:
            words = words.split(',')
            sql = "select * from embeddings where key in ({seq})".format( seq=','.join(['?']*len(words)))
            result = conn.execute(sql, words)
            result = [ [r[0], r[1].tolist()] for r in result ]
            result = [ {"key": key, "embedding": embed} for key,embed in dict(result).items() ]
            return jsonify(result)
        except:
            return jsonify([])
    if __name__=='__main__':
        print('starting up server')
        app.run(host='0.0.0.0', port=5000, debug=True)

  2. It would also be good to show progress on this process in the classification interface, so the user knows how much of the embedding has been loaded.
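Both considerations can be handled on the client side as well. The sketch below is one hedged way to do it, not a proposed final design: `load_with_progress`, `fetchBatch`, `onProgress`, and the `batchSize` default are hypothetical names and values. It assumes `fetchBatch(words)` resolves to an array of `{ key, embedding }` objects, matching what the route above returns. A batch that fails or omits a word does not abort the load; the affected words are collected in `missing` instead.

```javascript
// Sketch only: `load_with_progress`, `fetchBatch`, and `onProgress` are hypothetical.
// `words` is assumed to be an array of unique, already-normalised words.
const load_with_progress = async (words, fetchBatch, options = {}) => {
  const { batchSize = 50, onProgress = () => {} } = options;
  const found = new Map(); // key -> embedding vector
  const missing = []; // words with no embedding returned
  let done = 0;

  // Batches run one after another here so progress is reported in order.
  for (let i = 0; i < words.length; i += batchSize) {
    const batch = words.slice(i, i + batchSize);
    let rows = [];
    try {
      rows = await fetchBatch(batch);
    } catch (err) {
      // A failed batch is treated as "nothing found" rather than fatal.
      rows = [];
    }
    rows.forEach(({ key, embedding }) => found.set(key, embedding));
    batch.forEach(word => {
      if (!found.has(word)) missing.push(word);
    });
    done += batch.length;
    onProgress(done, words.length); // e.g. drive a progress bar
  }
  return { found, missing };
};
```

In the classification interface, `onProgress` could update component state, for example `(done, total) => setProgress(done / total)` where `setProgress` is whatever state setter the component uses. Running batches sequentially keeps the progress reporting simple; the batches could also be issued concurrently and counted as each one resolves.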

@stuartlynn stuartlynn added good first issue Good for newcomers help wanted Extra attention is needed labels Sep 24, 2020