Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PYTHON-5168 Use logging for client background task errors #2166

Merged
merged 7 commits into from
Mar 5, 2025

Conversation

ShaneHarvey
Copy link
Member

PYTHON-5168 Use logging for client background task errors

Old behavior we printed to sys.stderr:

Traceback (most recent call last):
  File "/Users/shane/git/mongo-python-driver/pymongo/asynchronous/mongo_client.py", line 2073, in _process_periodic_tasks
    await self._topology.update_pool()
  File "/Users/shane/git/mongo-python-driver/pymongo/asynchronous/topology.py", line 692, in update_pool
    await server.pool.remove_stale_sockets(generation)
  File "/Users/shane/git/mongo-python-driver/pymongo/asynchronous/pool.py", line 1219, in remove_stale_sockets
    conn = await self.connect()
           ^^^^^^^^^^^^^^^^^^^^
  File "/Users/shane/git/mongo-python-driver/test/asynchronous/test_client.py", line 1803, in fail_connect
    raise Exception("failed to connect")

New behavior we log via logging.exception():

ERROR:pymongo.client:MongoClient background task encountered an error:
Traceback (most recent call last):
  File "/Users/shane/git/mongo-python-driver/pymongo/asynchronous/mongo_client.py", line 2073, in _process_periodic_tasks
    await self._topology.update_pool()
  File "/Users/shane/git/mongo-python-driver/pymongo/asynchronous/topology.py", line 692, in update_pool
    await server.pool.remove_stale_sockets(generation)
  File "/Users/shane/git/mongo-python-driver/pymongo/asynchronous/pool.py", line 1219, in remove_stale_sockets
    conn = await self.connect()
           ^^^^^^^^^^^^^^^^^^^^
  File "/Users/shane/git/mongo-python-driver/test/asynchronous/test_client.py", line 1803, in fail_connect
    raise Exception("failed to connect")
Exception: failed to connect

@ShaneHarvey ShaneHarvey requested a review from NoahStapp March 3, 2025 18:46
@ShaneHarvey
Copy link
Member Author

Hmm it looks like this change somehow makes test.test_client.TestClient.test_continuous_network_errors more likely to fail:

 [2025/02/28 13:34:44.280] FAILURE: AssertionError: 46 != 24 within 20 delta (22 difference) ()
 [2025/02/28 13:34:44.280] self = <test.test_client.TestClient testMethod=test_continuous_network_errors>
 [2025/02/28 13:34:44.280]     @unittest.skipIf("PyPy" in sys.version, "PYTHON-2927 fails often on PyPy")
 [2025/02/28 13:34:44.280]     def test_continuous_network_errors(self):
 [2025/02/28 13:34:44.280]         def server_description_count():
 [2025/02/28 13:34:44.280]             i = 0
 [2025/02/28 13:34:44.280]             for obj in gc.get_objects():
 [2025/02/28 13:34:44.280]                 try:
 [2025/02/28 13:34:44.280]                     if isinstance(obj, ServerDescription):
 [2025/02/28 13:34:44.280]                         i += 1
 [2025/02/28 13:34:44.280]                 except ReferenceError:
 [2025/02/28 13:34:44.280]                     pass
 [2025/02/28 13:34:44.280]             return i
 [2025/02/28 13:34:44.280]     
 [2025/02/28 13:34:44.280]         gc.collect()
 [2025/02/28 13:34:44.280]         with client_knobs(min_heartbeat_interval=0.003):
 [2025/02/28 13:34:44.280]             client = self.simple_client(
 [2025/02/28 13:34:44.280]                 "invalid:27017", heartbeatFrequencyMS=3, serverSelectionTimeoutMS=150
 [2025/02/28 13:34:44.280]             )
 [2025/02/28 13:34:44.280]             initial_count = server_description_count()
 [2025/02/28 13:34:44.280]             with self.assertRaises(ServerSelectionTimeoutError):
 [2025/02/28 13:34:44.280]                 client.test.test.find_one()
 [2025/02/28 13:34:44.280]             gc.collect()
 [2025/02/28 13:34:44.280]             final_count = server_description_count()
 [2025/02/28 13:34:44.280]             # If a bug like PYTHON-2433 is reintroduced then too many
 [2025/02/28 13:34:44.280]             # ServerDescriptions will be kept alive and this test will fail:
 [2025/02/28 13:34:44.280]             # AssertionError: 19 != 46 within 15 delta (27 difference)
 [2025/02/28 13:34:44.280]             # On Python 3.11 we seem to get more of a delta.
 [2025/02/28 13:34:44.280] >           self.assertAlmostEqual(initial_count, final_count, delta=20)
 [2025/02/28 13:34:44.280] E           AssertionError: 46 != 24 within 20 delta (22 difference)
 [2025/02/28 13:34:44.280] test/test_client.py:1820: AssertionError

https://spruce.mongodb.com/task/mongo_python_driver_test_rhel8_python3.12_test_8.0_replica_set_noauth_ssl_sync_async_patch_080c1c61212594e4de4792cf055b904a570c4359_67c227d5d6931d00075a1ae1_25_02_28_21_17_11?execution=0&sortBy=STATUS&sortDir=ASC

It could be a coincidence but it happened 3 times in the last PR patch.

@ShaneHarvey
Copy link
Member Author

One more failure:

 [2025/03/04 22:25:45.106] _____________ TestClient.test_background_connections_log_on_error ______________
 [2025/03/04 22:25:45.106] self = <test.test_client.TestClient testMethod=test_background_connections_log_on_error>
 [2025/03/04 22:25:45.106]     def test_background_connections_log_on_error(self):
 [2025/03/04 22:25:45.106]         with self.assertLogs("pymongo.client", level="ERROR") as cm:
 [2025/03/04 22:25:45.106]             client = self.rs_or_single_client(minPoolSize=1)
 [2025/03/04 22:25:45.106]             # Create a single connection in the pool.
 [2025/03/04 22:25:45.106]             client.admin.command("ping")
 [2025/03/04 22:25:45.106]             # Cause new connections to fail.
 [2025/03/04 22:25:45.106]             pool = get_pool(client)
 [2025/03/04 22:25:45.106]             def fail_connect(*args, **kwargs):
 [2025/03/04 22:25:45.106]                 raise Exception("failed to connect")
 [2025/03/04 22:25:45.106]             pool.connect = fail_connect
 [2025/03/04 22:25:45.106]             # Un-patch Pool.connect to break the cyclic reference.
 [2025/03/04 22:25:45.106]             self.addCleanup(delattr, pool, "connect")
 [2025/03/04 22:25:45.106]             pool.reset_without_pause()
 [2025/03/04 22:25:45.106]             wait_until(lambda: len(cm.records) > 0, "start creating connections")
 [2025/03/04 22:25:45.106]             log_output = "".join(cm.output)
 [2025/03/04 22:25:45.106] >           self.assertIn("failed to connect", log_output)
 [2025/03/04 22:25:45.106] E           AssertionError: 'failed to connect' not found in ''
 [2025/03/04 22:25:45.106] test/test_client.py:1770: AssertionError
 [2025/03/04 22:25:45.106] ------------------------------ Captured log call -------------------------------
 [2025/03/04 22:25:45.106] ERROR    pymongo.client:logger.py:102 MongoClient background task encountered an error:
 [2025/03/04 22:25:45.106] Traceback (most recent call last):
 [2025/03/04 22:25:45.106]   File "/Users/ec2-user/data/mci/059a7dc71a0205aac5d7674a44a4bfa3/src/pymongo/synchronous/mongo_client.py", line 2067, in _process_periodic_tasks
 [2025/03/04 22:25:45.106]     self._topology.update_pool()
 [2025/03/04 22:25:45.106]   File "/Users/ec2-user/data/mci/059a7dc71a0205aac5d7674a44a4bfa3/src/pymongo/synchronous/topology.py", line 690, in update_pool
 [2025/03/04 22:25:45.106]     server.pool.remove_stale_sockets(generation)
 [2025/03/04 22:25:45.106]   File "/Users/ec2-user/data/mci/059a7dc71a0205aac5d7674a44a4bfa3/src/pymongo/synchronous/pool.py", line 1213, in remove_stale_sockets
 [2025/03/04 22:25:45.106]     conn = self.connect()
 [2025/03/04 22:25:45.107]   File "/Users/ec2-user/data/mci/059a7dc71a0205aac5d7674a44a4bfa3/src/test/test_client.py", line 1760, in fail_connect
 [2025/03/04 22:25:45.107]     raise Exception("failed to connect")
 [2025/03/04 22:25:45.107] Exception: failed to connect

https://spruce.mongodb.com/task/mongo_python_driver_test_macos_python3.9_test_8.0_sharded_cluster_auth_ssl_sync_patch_74b85d51d5e91b2c39fe912b9218101ecd0ed156_67c79572b81e47000768d0d4_25_03_05_00_06_15?execution=0&sortBy=STATUS&sortDir=ASC

Looks like a race in the new test.

@ShaneHarvey
Copy link
Member Author

@NoahStapp this is ready for review now.

def _log_client_error() -> None:
logger = _CLIENT_LOGGER
if logger:
logger.exception("MongoClient background task encountered an error:")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this method supposed to also print the actual exception?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

logger.exception already includes the full traceback (see the PR description for an example).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add a short comment saying that here for clarity?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@ShaneHarvey ShaneHarvey requested a review from NoahStapp March 5, 2025 21:48
@ShaneHarvey ShaneHarvey merged commit 4ffebb0 into mongodb:master Mar 5, 2025
32 of 33 checks passed
@ShaneHarvey ShaneHarvey deleted the PYTHON-5168 branch March 6, 2025 00:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants