pre-evaluate let bindings and cache evaluations of free vars #1801


Merged
merged 1 commit into from
Apr 17, 2025

Conversation

imalsogreg
Contributor

@imalsogreg imalsogreg commented Apr 17, 2025

Important

Pre-evaluate let bindings and cache evaluations of free variables in eval_expr.rs to improve performance.

  • Behavior:
    • Cache evaluations of free variables in EvalEnv using evaluated_cache.
    • Modify beta_reduce() in eval_expr.rs to pre-evaluate let bindings and substitute evaluated values into the body before further evaluation.
  • Concurrency:
    • Use Arc<Mutex<HashMap<Name, Expr<ExprMetadata>>>> for evaluated_cache in EvalEnv to handle concurrent access.
  • Misc:
    • Initialize evaluated_cache in BamlRuntime in lib.rs.

This description was created by Ellipsis for 113bb91. It will automatically update as commits are pushed.
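As a rough illustration of the caching behavior described above, here is a minimal sketch with simplified stand-in types; the real `EvalEnv` and `Expr<ExprMetadata>` in `eval_expr.rs` carry more structure, and the "evaluation" here is reduced to a plain lookup:

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};

// Simplified stand-ins for the PR's types.
type Name = String;

#[derive(Clone, Debug, PartialEq)]
enum Expr {
    Int(i64),
}

struct EvalEnv {
    bindings: HashMap<Name, Expr>,
    // Shared cache so repeated evaluations of the same free variable
    // are computed only once, as described in the PR summary.
    evaluated_cache: Arc<Mutex<HashMap<Name, Expr>>>,
}

impl EvalEnv {
    fn eval_free_var(&self, name: &Name) -> Option<Expr> {
        // Fast path: return a previously cached result.
        if let Some(hit) = self.evaluated_cache.lock().unwrap().get(name) {
            return Some(hit.clone());
        }
        // Slow path: "evaluate" (here just a lookup) and memoize.
        let value = self.bindings.get(name)?.clone();
        self.evaluated_cache
            .lock()
            .unwrap()
            .insert(name.clone(), value.clone());
        Some(value)
    }
}
```

Note the two lock acquisitions are scoped so the guard is never held while evaluating, which keeps the critical sections short.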


vercel bot commented Apr 17, 2025

The latest updates on your projects:

baml — ✅ Ready — Updated (UTC): Apr 17, 2025 5:23am

Contributor

@ellipsis-dev ellipsis-dev bot left a comment


👍 Looks good to me! Reviewed everything up to ca4cf33 in 1 minute and 30 seconds

More details
  • Looked at 94 lines of code in 2 files
  • Skipped 0 files when reviewing.
  • Skipped posting 6 drafted comments based on config settings.
1. engine/baml-runtime/src/eval_expr.rs:258
  • Draft comment:
    Caching free var evaluations using RefCell is not thread-safe and may cause issues if beta_reduce is awaited concurrently. Consider using a Mutex or other async-safe cell.
  • Reason this comment was not posted:
    Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 20% vs. threshold = 50%
    This is a tricky case. The code is async but that doesn't necessarily mean it runs in parallel threads. We need to consider:
  1. Is beta_reduce actually called concurrently?
  2. Is RefCell actually unsafe here?
  3. Do we have enough context to be sure either way?
    Looking at the full file, I don't see clear evidence that beta_reduce is called concurrently from multiple threads. The async seems to be for handling LLM API calls sequentially.
    I may be missing evidence of concurrent usage elsewhere in the codebase. The async nature means it's possible for concurrent calls even if not immediately visible here.
    While concurrent usage is theoretically possible, without clear evidence that this code needs to be thread-safe, suggesting a thread-safety fix could be premature optimization. RefCell is a perfectly valid choice for single-threaded async code.
    The comment raises a theoretical concern but lacks strong evidence that thread-safety is actually needed here. We should not keep speculative comments without clear evidence of the issue.
2. engine/baml-runtime/src/eval_expr.rs:136
  • Draft comment:
    Let binding handling now evaluates and substitutes the bound expression. Ensure that the use of body.close and subst2 correctly handles variable capture and edge cases.
  • Reason this comment was not posted:
    Confidence changes required: 50% <= threshold 50%
    None
3. engine/baml-runtime/src/eval_expr.rs:136
  • Draft comment:
    Pre-evaluation of let bindings: you now evaluate the bound expression and substitute it into the closed body. Verify that using a fixed target (VarIndex { de_bruijn: 0, tuple: 0 }) works correctly in all nested scopes and preserves semantics (especially if side effects or lazy behavior were expected).
  • Reason this comment was not posted:
    Comment did not seem useful. Confidence is useful = 0% <= threshold 50%
    The comment is asking the PR author to verify that a specific change works correctly in all nested scopes and preserves semantics. This falls under the rule of not asking the PR author to ensure behavior is intended or to double-check things. Therefore, this comment should be removed.
4. engine/baml-runtime/src/eval_expr.rs:258
  • Draft comment:
    Caching free variable evaluations: the free var branch now checks and stores results in evaluated_cache. Ensure that potential cycles or recursive dependencies are handled and that using RefCell for caching is safe in your intended async/threading context.
  • Reason this comment was not posted:
    Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 20% vs. threshold = 50%
    The comment raises valid concerns about thread safety and recursive dependencies, but it's more of a speculative "what if" comment asking for verification rather than pointing out a clear issue. The comment starts with "Ensure that..." which is a red flag. The code appears to be single-threaded async code where RefCell is commonly used, and the caching implementation looks standard.
    The comment identifies real potential issues that could cause problems if not handled correctly. Cycles in expression evaluation could cause infinite loops.
    While the concerns are valid, the comment is asking for verification rather than pointing out a clear issue. It's speculative and not actionable without specific evidence of problems.
    Delete this comment as it violates the rule against asking authors to verify or ensure things. If there are specific issues with RefCell or cycles, they should be pointed out directly with evidence.
5. engine/baml-runtime/src/lib.rs:560
  • Draft comment:
    Initialization of EvalEnv: the evaluated_cache field is initialized with a new HashMap. Confirm that each EvalEnv instance gets its own cache to prevent leakage across evaluations.
  • Reason this comment was not posted:
    Comment was not on a location in the diff, so it can't be submitted as a review comment.
6. engine/baml-runtime/src/lib.rs:711
  • Draft comment:
    Typographical error: In the comment discussing the request body, 'requeset' is misspelled. It should be 'request'.
  • Reason this comment was not posted:
    Comment was not on a location in the diff, so it can't be submitted as a review comment.
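For context on the `RefCell` discussion in drafts 1 and 4 above: a `RefCell`-backed cache is sound in single-threaded async code provided no borrow is held across an `.await` point. A minimal sketch of that pattern (the names here are illustrative, not taken from the PR):

```rust
use std::cell::RefCell;
use std::collections::HashMap;

// Single-threaded cache: RefCell gives interior mutability without
// locking, but a borrow must never be held across an .await point,
// or a concurrent re-entry would panic at runtime.
struct Env {
    cache: RefCell<HashMap<String, i64>>,
}

impl Env {
    fn get_or_insert(&self, key: &str, compute: impl FnOnce() -> i64) -> i64 {
        // The immutable borrow is scoped to this statement and dropped
        // before we take the mutable borrow below.
        if let Some(v) = self.cache.borrow().get(key) {
            return *v;
        }
        let v = compute();
        self.cache.borrow_mut().insert(key.to_string(), v);
        v
    }
}
```

If the cache ever needs to cross threads, `RefCell` would be swapped for `Mutex` (as the merged version of this PR ultimately does).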

Workflow ID: wflow_Zcie2kXQf9Hif94K


You can customize Ellipsis with 👍 / 👎 feedback, review rules, user-specific overrides, quiet mode, and more.

Contributor

@ellipsis-dev ellipsis-dev bot left a comment


❌ Changes requested. Incremental review on 834561f in 1 minute and 50 seconds

More details
  • Looked at 95 lines of code in 2 files
  • Skipped 0 files when reviewing.
  • Skipped posting 6 drafted comments based on config settings.
1. engine/baml-runtime/src/eval_expr.rs:259
  • Draft comment:
    The caching logic for free variables looks correct, but be cautious about potential recursive bindings. Ensure that cyclic dependencies are handled appropriately so that caching doesn’t freeze evaluation.
  • Reason this comment was not posted:
    Comment did not seem useful. Confidence is useful = 30% <= threshold 50%
    The comment is suggesting caution about potential recursive bindings in caching logic, which could lead to issues if not handled properly. However, it is phrased in a way that asks the author to ensure something, which violates the rule against asking the author to ensure behavior is intended or tested. The comment could be rephrased to suggest a specific action or ask a specific question about the implementation.
2. engine/baml-runtime/src/eval_expr.rs:137
  • Draft comment:
    The new let-binding handling via direct evaluation and substitution is clearer. Confirm that the 'close' and subst2 steps correctly handle variable capture and scoping.
  • Reason this comment was not posted:
    Comment did not seem useful. Confidence is useful = 0% <= threshold 50%
    This comment is asking the PR author to confirm their intention and ensure the behavior is intended, which violates the rules. It does not provide a specific code suggestion or ask for a specific test to be written.
3. engine/baml-runtime/src/eval_expr.rs:137
  • Draft comment:
    Let binding now pre-evaluates the bound expression and substitutes it in the body. Confirm that this evaluation order and substitution behavior match the intended semantics.
  • Reason this comment was not posted:
    Comment did not seem useful. Confidence is useful = 0% <= threshold 50%
    The comment is asking the PR author to confirm the evaluation order and substitution behavior, which violates the rule against asking for confirmation of intention or behavior. It does not provide a specific code suggestion or ask for a specific test to be written.
4. engine/baml-runtime/src/eval_expr.rs:259
  • Draft comment:
    Caching free variables using a RefCell works in a single-threaded context. If EvalEnv may be used concurrently, consider switching to a Mutex for thread safety.
  • Reason this comment was not posted:
    Marked as duplicate.
5. engine/baml-runtime/src/lib.rs:564
  • Draft comment:
    The evaluated_cache is instantiated with RefCell here. Ensure that if multiple expressions run in parallel, you replace RefCell with a thread-safe primitive such as Mutex.
  • Reason this comment was not posted:
    Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 20% vs. threshold = 50%
    The comment makes a speculative suggestion about potential thread safety issues. However, there's no clear evidence that this code runs in parallel - it appears to be part of a single expression evaluation context. RefCell is appropriate for single-threaded runtime borrowing. Without seeing how EvalEnv is actually used across threads (if at all), we can't definitively say Mutex is needed.
    I could be wrong about the threading model - there could be parallel expression evaluation happening that I'm not seeing in this code. The comment author may have more context about the runtime architecture.
    Even with that possibility, the comment is speculative ("if multiple expressions run in parallel") rather than pointing to a concrete issue. We should only suggest thread-safety changes when we have clear evidence of parallel execution.
    The comment should be deleted as it makes a speculative suggestion about thread safety without concrete evidence that parallel execution is actually happening or planned.
6. engine/baml-runtime/src/lib.rs:711
  • Draft comment:
    Typographical error: "requeset" should be corrected to "request".
  • Reason this comment was not posted:
    Comment was not on a location in the diff, so it can't be submitted as a review comment.

Workflow ID: wflow_t6pCOE00MAUoaol5


Want Ellipsis to fix these issues? Tag @ellipsis-dev in a comment. You can customize Ellipsis with 👍 / 👎 feedback, review rules, user-specific overrides, quiet mode, and more.

// e.g. (let x = y in f) => (\x y => f)
// TODO: Should rewriting be done here in beta_reduce? Or elsewhere?
// First evaluate the bound expression
let evaluated_value = Box::pin(beta_reduce(env, value)).await?;


Review the use of Box::pin on beta_reduce calls. If beta_reduce already returns an async future, you might simplify by awaiting it directly.

Suggested change
let evaluated_value = Box::pin(beta_reduce(env, value)).await?;
let evaluated_value = beta_reduce(env, value).await?;
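One caveat on this suggestion: when an async fn recursively awaits itself, the recursive call must go through some form of boxing such as `Box::pin`, because otherwise the compiler-generated future type would have to contain itself and would be infinitely sized. A small self-contained sketch (with a hand-rolled `block_on` purely for illustration; in practice a runtime like tokio provides this):

```rust
use std::future::Future;
use std::pin::Pin;
use std::task::{Context, Poll, RawWaker, RawWakerVTable, Waker};

// Recursive "async fn": the recursion is broken by returning a boxed,
// pinned future, which is what Box::pin(beta_reduce(..)).await does.
fn countdown(n: u64) -> Pin<Box<dyn Future<Output = u64>>> {
    Box::pin(async move {
        if n == 0 {
            0
        } else {
            countdown(n - 1).await + 1
        }
    })
}

// Minimal no-op waker and executor for futures that never suspend.
fn noop_waker() -> Waker {
    fn clone(_: *const ()) -> RawWaker {
        RawWaker::new(std::ptr::null(), &VTABLE)
    }
    fn noop(_: *const ()) {}
    static VTABLE: RawWakerVTable = RawWakerVTable::new(clone, noop, noop, noop);
    unsafe { Waker::from_raw(RawWaker::new(std::ptr::null(), &VTABLE)) }
}

fn block_on<F: Future>(mut fut: F) -> F::Output {
    let waker = noop_waker();
    let mut cx = Context::from_waker(&waker);
    // SAFETY: `fut` is a local variable that is shadowed and never moved again.
    let mut fut = unsafe { Pin::new_unchecked(&mut fut) };
    loop {
        if let Poll::Ready(v) = fut.as_mut().poll(&mut cx) {
            return v;
        }
    }
}
```

So the simplification suggested above only compiles if `beta_reduce` is not awaiting itself at that call site; for a self-recursive call, the `Box::pin` is doing real work.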

Contributor

@ellipsis-dev ellipsis-dev bot left a comment


❌ Changes requested. Incremental review on 113bb91 in 1 minute and 22 seconds

More details
  • Looked at 88 lines of code in 2 files
  • Skipped 0 files when reviewing.
  • Skipped posting 7 drafted comments based on config settings.
1. engine/baml-runtime/src/eval_expr.rs:258
  • Draft comment:
    Using std::sync::Mutex in async code may block executor threads. Consider using an async-aware mutex (e.g. tokio::sync::Mutex) for evaluated_cache.
  • Reason this comment was not posted:
    Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 30% vs. threshold = 50%
    This is a legitimate concern - using std::sync::Mutex in async code can block threads if the lock is held during an await. However, in this case the lock is only held very briefly to access a cache, with no awaits while holding the lock. The performance impact would be minimal. The suggestion would make the code more idiomatically correct but isn't fixing a serious issue.
    I could be underestimating the performance impact. Even brief blocking could add up with many concurrent requests. The code could be modified in the future to hold the lock longer.
    The current usage is safe and performant enough for the simple caching use case. The suggestion is more about code style than fixing a real problem.
    While technically correct, this comment suggests a change that would have minimal real-world impact given the current usage pattern. The comment should be removed as it doesn't highlight a significant issue requiring action.
2. engine/baml-runtime/src/lib.rs:560
  • Draft comment:
    The evaluated_cache is initialized as a new Mutex in each EvalEnv instance. Verify that this cache scope is intended.
  • Reason this comment was not posted:
    Confidence changes required: 50% <= threshold 50%
    None
3. engine/baml-runtime/src/eval_expr.rs:21
  • Draft comment:
    New evaluated_cache field added. Consider mentioning concurrency trade-offs (using Mutex) in the docs if heavy concurrent evaluations are expected.
  • Reason this comment was not posted:
    Confidence changes required: 33% <= threshold 50%
    None
4. engine/baml-runtime/src/eval_expr.rs:135
  • Draft comment:
    Let binding evaluation now pre-evaluates the bound expression and substitutes it. Verify that the closing and substitution (using body.close and subst2) correctly preserve variable scoping and avoid capture issues.
  • Reason this comment was not posted:
    Comment did not seem useful. Confidence is useful = 0% <= threshold 50%
    The comment is asking the PR author to verify that the changes correctly preserve variable scoping and avoid capture issues. This falls under the category of asking the author to double-check things, which is against the rules.
5. engine/baml-runtime/src/eval_expr.rs:257
  • Draft comment:
    Caching free variable evaluations by using a Mutex lock with unwrap(). Consider handling lock poisoning gracefully or documenting that unwrap() is acceptable in this context.
  • Reason this comment was not posted:
    Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 20% vs. threshold = 50%
    This is a code quality comment about error handling. While unwrap() usage can be dangerous, in this context:
  1. Lock poisoning is extremely unlikely since the Mutex only guards a simple cache
  2. If lock poisoning did occur, it would just mean cache misses, not critical failures
  3. The code is already handling many other error cases properly with ? operator
  4. Adding lock poison handling would add complexity for little benefit
    I could be underestimating the importance of proper Mutex error handling. Lock poisoning, while rare, could indicate other serious issues in the program.
    While lock poisoning can indicate issues, in this case it's just a cache - the worst case is slightly degraded performance. The code can continue functioning without the cache.
    The comment should be deleted. The unwrap() usage is acceptable for a cache implementation where lock poisoning would only impact performance, not correctness.
6. engine/baml-runtime/src/lib.rs:559
  • Draft comment:
    Initializes evaluated_cache per call in EvalEnv. Confirm if caching should be instance-scoped per evaluation or shared across calls.
  • Reason this comment was not posted:
    Confidence changes required: 33% <= threshold 50%
    None
7. engine/baml-runtime/src/lib.rs:710
  • Draft comment:
    Typo detected: In the comment at line 710, 'requeset' should be corrected to 'request' for clarity.
  • Reason this comment was not posted:
    Comment was not on a location in the diff, so it can't be submitted as a review comment.
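Regarding drafts 1 and 5 above (`std::sync::Mutex` in async code, and `unwrap()` on the lock): a `std::sync::Mutex` is generally acceptable when the guard is never held across an `.await`, and lock poisoning can be recovered from rather than unwrapped. A sketch with hypothetical helper names, not functions from the PR:

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};

// Hypothetical cache type matching the shape in the PR description.
type Cache = Arc<Mutex<HashMap<String, String>>>;

// Recover from a poisoned lock instead of panicking: for a pure cache,
// the data behind a poisoned guard is still usable, so the worst case
// is a stale or partial entry, not a correctness failure.
fn cache_get(cache: &Cache, key: &str) -> Option<String> {
    let guard = cache
        .lock()
        .unwrap_or_else(|poisoned| poisoned.into_inner());
    guard.get(key).cloned()
}

fn cache_put(cache: &Cache, key: &str, value: &str) {
    let mut guard = cache
        .lock()
        .unwrap_or_else(|poisoned| poisoned.into_inner());
    guard.insert(key.to_string(), value.to_string());
}
```

Because each guard is dropped before any `.await` could run, the blocking window is just the `HashMap` access, which is the scenario the reviewer above judged acceptable for `std::sync::Mutex`.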

Workflow ID: wflow_KJwKoFt9jeYtZpru


Want Ellipsis to fix these issues? Tag @ellipsis-dev in a comment. You can customize Ellipsis with 👍 / 👎 feedback, review rules, user-specific overrides, quiet mode, and more.

// e.g. (let x = y in f) => (\x y => f)
// TODO: Should rewriting be done here in beta_reduce? Or elsewhere?
// First evaluate the bound expression
let evaluated_value = Box::pin(beta_reduce(env, value)).await?;


Box::pin(...).await is unnecessary here; beta_reduce is already async so you can simply await its result.

Suggested change
let evaluated_value = Box::pin(beta_reduce(env, value)).await?;
let evaluated_value = beta_reduce(env, value).await?;

@imalsogreg imalsogreg added this pull request to the merge queue Apr 17, 2025
Merged via the queue into canary with commit c2b3c43 Apr 17, 2025
11 of 12 checks passed
@imalsogreg imalsogreg deleted the greg/thunk-2 branch April 17, 2025 05:21