Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Bug]: Transaction panics in endTransaction with "transaction not found" #41281

Closed
buehlefs opened this issue Aug 23, 2023 · 5 comments · Fixed by #41506
Closed

[Bug]: Transaction panics in endTransaction with "transaction not found" #41281

buehlefs opened this issue Aug 23, 2023 · 5 comments · Fixed by #41506
Assignees
Labels
Lang/Transactions Ballerina Transaction and its implementation related issued Team/CompilerFE All issues related to Language implementation and Compiler, this exclude run times. Type/Bug
Milestone

Comments

@buehlefs
Copy link

Description

Transactions can panic in endTransaction when many transactions are invoked from multiple threads concurrently.

Steps to Reproduce

Perform many transactions rapidly from multiple threads (e.g. using scheduled tasks)

Minimal code example (click to expand)

The sqlite DB contains a single table "hello" with a string column "world".

import ballerina/io;
import ballerina/sql;
import ballerinax/java.jdbc;
import ballerina/task;
import ballerina/http;

sql:ConnectionPool pool = { 
    maxOpenConnections: 1, 
    maxConnectionLifeTime: 30,
    minIdleConnections: 0
};

final jdbc:Client testDB = check new jdbc:Client("jdbc:sqlite:test.db", connectionPool = pool);

type WorldRecord record {|
    string|() world;
|};

// set simulationError to true to get different stacktrace
public isolated function dbTest(boolean simulateError = false) returns error? {
    transaction {
        stream<WorldRecord, error?> result = testDB->query(`SELECT world FROM hello;`);
        check result.close();
        var _ = check testDB->execute(`INSERT INTO hello VALUES ('test')`);
        result = testDB->query(`SELECT world FROM hello;`);
        check result.close();
        if (simulateError) {
            rollback;
        } else {
            check commit;
        }
    }
    transaction {
        var _ = check testDB->execute(`DELETE FROM hello WHERE world='test'`);
        check commit;
    }
}

public isolated class TestTask {

    *task:Job;
    private int counter = 0;
    private final string name;

    function init(string name) {
        self.name = name;
    }

    public isolated function execute() {
        int count = 0;
        lock {
            count = self.counter.cloneReadOnly();
            self.counter += 1;
        }
        io:println(string`Task: ${self.name},  Invocation:  ${count}`);

        var err = trap dbTest();

        if ! (err is ()) {
            io:println("Error in task " + self.name);
            io:println(err);
            io:println(err.stackTrace());
            panic err; // remove task after first error
        }
    }

    public isolated function schedule(decimal interval) {
        do {
            var _ = check task:scheduleJobRecurByFrequency(self, interval);
        } on fail error err {
            io:println(err);
        }
    }
}


service / on new http:Listener(9097) {
    resource function get .() returns json {
        return {
            hello: "world"
        };
    }
}


public function main() {
    var task1 = new TestTask("a    ");
    var task2 = new TestTask(" b   ");
    var task3 = new TestTask("  c  ");
    var task4 = new TestTask("   d ");
    var task5 = new TestTask("    e");

    decimal interval = 0.1; // adjust to change error occurrence speed
    task1.schedule(interval);
    task2.schedule(interval);
    task3.schedule(interval);
    task4.schedule(interval);
    task5.schedule(interval);
}

Stacktrace:

// error("Transaction: 0afa6c8f-84cd-4b32-91c3-1e0384befbab:3 not found")
[
    {"callableName": "endTransaction", "moduleName": "ballerinai.transaction.0", "fileName": "transaction_block.bal", "lineNumber": 115},
    {"callableName": "dbTest", "moduleName": "test.playground.0", "fileName": "main.bal", "lineNumber": 59},
    {"callableName": "execute", "moduleName": "test.playground.0.TestTask", "fileName": "main.bal", "lineNumber": 101}
]

If simulateError=true the stacktrace is often different:

// error("{ballerina}TypeCastError",message="incompatible types: '()' cannot be cast to '(readonly & lang.transaction:InfoInternal)'")
[
    {"callableName": "info", "moduleName": "ballerina.lang.transaction.0", "fileName": "transaction.bal", "lineNumber": 69},
    {"callableName": "dbTest", "moduleName": "test.playground.0", "fileName": "main.bal", "lineNumber": 76},
    {"callableName": "execute", "moduleName": "test.playground.0.TestTask", "fileName": "main.bal", "lineNumber": 101}
]

Another possibly related stacktrace (from a different program):

time = 2023-08-22 14:42:02.708 message = Unable to execute the job[817534]. java.lang.NegativeArraySizeException
error: Transaction: 5a3a0ff3-e1c1-4dd1-9998-52b4a911f385:15 not found
        at ballerinai.transaction.0:endTransaction(transaction_block.bal:115)
           qhana.qhana_backend.0.$anonType$_0:$get$experiments$^$data$^$download(main.bal:606)

The errors seem to suggest that initiatedTransactions is not handled threadsafe everywhere.

Affected Version(s)

Ballerina 2201.7.2 (Swan Lake Update 7)

OS, DB, other environment details and versions

OS: Ubuntu 20.04 running inside WSL2

JDBC Connector used:

[[platform.java11.dependency]]
artifactId="sqlite-jdbc"
version="3.36.0.1"
groupId="org.xerial"

Related area

-> Runtime

Related issue(s) (optional)

Originally reported as https://github.com/ballerina-platform/module-ballerinai-transaction/issues/524, resubmitting here as the issue got no attention.

Suggested label(s) (optional)

No response

Suggested assignee(s) (optional)

No response

@ballerina-bot ballerina-bot added the Team/jBallerina All the issues related to BIR, JVM backend code generation and runtime label Aug 23, 2023
@warunalakshitha warunalakshitha added Team/CompilerFE All issues related to Language implementation and Compiler, this exclude run times. Lang/Transactions Ballerina Transaction and its implementation related issued and removed Team/jBallerina All the issues related to BIR, JVM backend code generation and runtime labels Aug 24, 2023
@buehlefs
Copy link
Author

Any updates on this issue? Because the transaction just ends with a panic this bug can result in unexpected crashes. The panics are also hard to trap as you essentially need to trap the call to the function that creates the transaction, which is not convenient if the transaction is directly inside a service function.

@gimantha
Copy link
Contributor

gimantha commented Sep 25, 2023

Any updates on this issue? Because the transaction just ends with a panic this bug can result in unexpected crashes. The panics are also hard to trap as you essentially need to trap the call to the function that creates the transaction, which is not convenient if the transaction is directly inside a service function.

Hi @buehlefs,
We have been working on few prioritised issues last few weeks, hence the delay in replying. We will soon start working on this issue. Thank you for providing the minimal code for reproducing the issue.

Copy link

github-actions bot commented Jan 3, 2024

This issue is NOT closed with a proper Reason/ label. Make sure to add proper reason label before closing. Please add or leave a comment with the proper reason label now.

      - Reason/EngineeringMistake - The issue occurred due to a mistake made in the past.
      - Reason/Regression - The issue has introduced a regression.
      - Reason/MultipleComponentInteraction - Issue occured due to interactions in multiple components.
      - Reason/Complex - Issue occurred due to complex scenario.
      - Reason/Invalid - Issue is invalid.
      - Reason/Other - None of the above cases.

@buehlefs
Copy link
Author

I am still able to reproduce a transaction not found error in Ballerina 2201.8.5 (Swan Lake Update 8):

error("Transaction: 32d30053-f3e9-4015-aeb7-e825ecf75538:1 not found")

    callableName: endTransaction moduleName: ballerinai.transaction.0 fileName: transaction_block.bal lineNumber: 115
    callableName: dbTest moduleName: test.playground.0 fileName: main.bal lineNumber: 30        # check (trap commit);
    callableName: checkForPanic moduleName: test.playground.0.TestTask fileName: main.bal lineNumber: 78        # var err = dbTest(false);
    callableName: execute moduleName: test.playground.0.TestTask fileName: main.bal lineNumber: 66        # var err = trap self.checkForPanic();

The error seems to be much less likely to occur but is apparently not fixed entirely. I am able to trigger it relatively reliable with a single scheduled task with a ridiculously low interval of 0.001 (when all transactions commit normally). If more tasks scheduled the error can still occur, but the connection pool is more likely to exhaust (see #42242) before that happens.

@dsplayerX
Copy link
Contributor

Hi @buehlefs,

The bug has already been addressed. The decreased occurrence might be due to this partial fix rolled out with 2201.8.5. Unfortunately, this fix wasn't rolled out with version 2201.8.5. It will be included in the upcoming releases: 2201.8.6 and 2201.9.0.

As per the issues #42241 and #42242, we will investigate them soon. Thank you for bringing them to our attention!

@dsplayerX dsplayerX added this to the 2201.8.6 milestone Apr 3, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Lang/Transactions Ballerina Transaction and its implementation related issued Team/CompilerFE All issues related to Language implementation and Compiler, this exclude run times. Type/Bug
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants