Go package to implement the whosonfirst/go-dedupe/embeddings.Embedder
interface using the mlx_clip Python package and Apple's MLX
libraries.
Documentation (in particular godoc) is incomplete at this time.
Error handling removed for the sake of brevity.
import (
	"context"

	_ "github.com/sfomuseum/go-embeddings-mlxclip"
	"github.com/whosonfirst/go-dedupe/embeddings"
)

func main() {

	ctx := context.Background()

	// See the setup notes below for details (they are important)
	emb_uri := "mlxclip:///path/to/your/embeddings.py"

	emb, _ := embeddings.NewEmbedder(ctx, emb_uri)

	// Named vec so that it does not shadow the embeddings package
	vec, _ := emb.Embeddings(ctx, "Hello world")

	// Do something with vec here...
}
The MLXClipEmbedder package requires that you build your application with the mlxclip build tag. For example:
$> go build -tags mlxclip -o yourapp cmd/yourapp/main.go
This package assumes that you have already installed and configured the mlx_clip Python library and all its dependencies, and that the code is run on Apple Silicon hardware.
It is still the case that "installing [insert machine-learning thing here] and all its dependencies" can be a challenge, so there is no attempt to automate it yet. If you can run the embeddings.py script, described below, from the command-line then the rest of this package should work as documented.
What follows is the "simplest and dumbest" embeddings.py script possible. You can write your own version, and call it whatever you want. The only requirement is that the script accepts three ordered input parameters. They are:
- The "target" for the embedding types. Valid options are: image, text.
- The "input" data to process. If target is "text" then this value is a string. If target is "image" then this value is the path to an image on the local disk.
- The "output" file where JSON-encoded embeddings should be written to the local disk.
For example:
$> python3 ./embeddings.py text "hello world" /tmp/mlx-tmp-1234.json
For example, a minimal embeddings.py script:
from mlx_clip import mlx_clip
import sys
import json

if __name__ == "__main__":

    # Path to the local MLX CLIP model directory; adjust for your system.
    model_dir = "/usr/local/src/mlx-examples/clip/mlx_model"
    clip = mlx_clip(model_dir)

    # The three ordered input parameters: target, input and output.
    target = sys.argv[1]
    input = sys.argv[2]
    output = sys.argv[3]

    with open(output, "w") as wr:

        if target == "image":
            image_embedding = clip.image_encoder(input)
            json.dump(image_embedding, wr)
        else:
            text_embedding = clip.text_encoder(input)
            json.dump(text_embedding, wr)