Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Frequent I2C timeouts when running I2C alongside WiFi stack #3167

Open
petekubiak opened this issue Feb 22, 2025 · 11 comments · May be fixed by #3199
Open

Frequent I2C timeouts when running I2C alongside WiFi stack #3167

petekubiak opened this issue Feb 22, 2025 · 11 comments · May be fixed by #3199
Assignees
Labels
bug Something isn't working timing Issue related to timing
Milestone

Comments

@petekubiak
Copy link

petekubiak commented Feb 22, 2025

Bug description

Bug was seen when driving a display with a flush mechanism (full screen redraw) at the same time as running the WiFi stack on ProS3. If the WiFi is not running, I2C timeouts are not seen.

To Reproduce

Bug was observed on a ProS3 connected via I2C to an Adafruit FeatherWing display. The display driver used was https://github.com/aaron-hardin/sh1107

  1. Connect display to 4-pin connector for I2C on the ProS3
  2. Flash code which spins up the WiFi stack and also updates the display at regular intervals (e.g. 10 Hz)
  3. Print I2C errors to trace
  4. Observe regular timeouts, also visual glitches on the display

Minimal examples of the timeouts being caused by WiFi and by defmt are in this comment

Expected behavior

No I2C timeouts caused by running alongside WiFi.

Environment

  • Target device: ProS3
  • Crate name and version: esp-hal 0.23.1
@petekubiak petekubiak added bug Something isn't working status:needs-attention This should be prioritized labels Feb 22, 2025
@katyo
Copy link
Contributor

katyo commented Feb 23, 2025

Link this issue: #3034

@bjoernQ
Copy link
Contributor

bjoernQ commented Feb 24, 2025

Problem seems to be that the scheduler will switch to another task while the I2C driver tries to catch-up filling the FIFO (which it needs to do in-time)

Here is some simplified code showing this effect w/o esp-wifi

//! Read calibration data from BMP180 sensor
//!
//! This example dumps the calibration data from a BMP180 sensor
//!
//! The following wiring is assumed:
//! - SDA => GPIO4
//! - SCL => GPIO5

//% FEATURES: esp-hal/unstable embassy
//% CHIPS: esp32 esp32c2 esp32c3 esp32c6 esp32h2 esp32s2 esp32s3
//% TAG: bmp180

#![no_std]
#![no_main]

use embassy_executor::Spawner;
use esp_backtrace as _;
use esp_hal::{
    delay::Delay,
    handler,
    i2c::master::{Config, I2c},
    time::Duration,
    timer::{systimer::SystemTimer, timg::TimerGroup, PeriodicTimer},
};
use esp_println::println;

#[esp_hal_embassy::main]
async fn main(_spawner: Spawner) -> ! {
    let peripherals = esp_hal::init(esp_hal::Config::default());

    esp_alloc::heap_allocator!(size: 72 * 1024);

    let syst = SystemTimer::new(peripherals.SYSTIMER);
    esp_hal_embassy::init(syst.alarm0);

    let timg0 = TimerGroup::new(peripherals.TIMG0);

    let mut periodic = PeriodicTimer::new(timg0.timer0);

    periodic.set_interrupt_handler(handler);
    periodic.start(Duration::from_millis(10)).unwrap();
    periodic.enable_interrupt(true);

    // Create a new peripheral object with the described wiring and standard
    // I2C clock speed:
    let mut i2c = I2c::new(peripherals.I2C0, Config::default())
        .unwrap()
        .with_sda(peripherals.GPIO4)
        .with_scl(peripherals.GPIO5)
        .into_async();

    loop {
        let mut data = [0u8; 220];
        println!(
            "{:?}",
            i2c.write_read_async(0x77, &[0xaa; 64], &mut data).await
        );
    }
}

#[handler]
fn handler() {
    unsafe {
        let timg0 = TimerGroup::new(esp_hal::peripherals::TIMG0::steal());
        let mut periodic = PeriodicTimer::new(timg0.timer0);

        let mut _d = Delay::new();
        _d.delay_millis(1);

        periodic.clear_interrupt();
    }
}

In that code lowering the 64 bytes written to something that fits the FIFO should make the errors disappear. Removing the delay from the interrupt handler, too.

Looking at the driver here: https://github.com/aaron-hardin/sh1107/blob/26bb3020fa825ef706b303ae017911e0aeb71931/src/interface/i2c.rs#L46 it's using 64 byte chunks while the FIFO is 32 bytes - probably just lowering that might do the trick in your case

@petekubiak
Copy link
Author

I tried changing the chunk length in the driver to 32 but the errors still seem to occur at the same rate as before.

@MabezDev
Copy link
Member

Could you try setting ESP_WIFI_CONFIG_TICK_RATE_HZ=1000 in the [env] section of .cargo/config.toml and see if that makes a difference?

@petekubiak
Copy link
Author

As far as I can tell It hasn't made a difference. Maybe with the environment variable set it causes bursts of errors to occur more often? I wouldn't count on that observation though, as it's a very random occurrence so it would be very difficult to quantify.

Something I've noticed (unrelated to the env variable) is that it seems like the errors occur more frequently when more is happening with the WiFi. There tend to be a couple of bursts of errors initially when the stack is being spun up, then it calms down a bit, but when connection to an AP is established (the board is operating in STA mode), the errors seem to become more common. Probably not surprising, but thought it was worth mentioning anyway.

@bjoernQ
Copy link
Contributor

bjoernQ commented Feb 24, 2025

it seems like the errors occur more frequently when more is happening with the WiFi

Would make sense - if there is nothing to do for the WiFi driver we quickly return to the main task

I tried changing the chunk length in the driver to 32 but the errors still seem to occur at the same rate as before.

Had another look at the code - they take that 32 and add 1 to it the next line for the actual data buffer to be written - so 31 here might be worth a try

@petekubiak
Copy link
Author

Setting CHUNKLEN to 31 causes a panic due to mismatched slice lengths. It seems that the driver is very much not designed for the chunk to be anything other than 64 - when I set it to 32 it completely messed up the positioning of what was being drawn on the screen. I think it's sheer luck that that didn't also cause a panic tbh.

I'm sure the driver could be modified to properly handle different buffer lengths, but this would be an entirely separate endeavour.

@petekubiak
Copy link
Author

petekubiak commented Feb 24, 2025

I've been working on a minimal example to exhibit what I've seen and it's led to some interesting observations. The key takeaway from what I've seen is that I can recreate the issue with a very small example which doesn't even require the wifi:

Timeouts with defmt

#![no_std]
#![no_main]

use alloc::format;
use core::cell::RefCell;
use embassy_time::Timer;

use defmt::{info, warn};
use embassy_embedded_hal::shared_bus::blocking::i2c::I2cDevice;
use embassy_executor::Spawner;
use embassy_sync::{blocking_mutex::raw::NoopRawMutex, blocking_mutex::Mutex};
use embedded_graphics::{
    fonts::{Font6x8, Text},
    pixelcolor::BinaryColor,
    prelude::*,
    style::TextStyle,
};
use esp_hal::{
    clock::CpuClock,
    i2c::{
        self,
        master::{BusTimeout, I2c},
    },
    time::RateExtU32,
    Async,
};
use sh1107::mode::GraphicsMode;
use sh1107::prelude::*;
use static_cell::StaticCell;
use {defmt_rtt as _, esp_backtrace as _};

extern crate alloc;

static I2C: StaticCell<Mutex<NoopRawMutex, RefCell<I2c<'_, Async>>>> = StaticCell::new();

#[esp_hal_embassy::main]
async fn main(_spawner: Spawner) {
    // generator version: 0.2.2

    let config = esp_hal::Config::default().with_cpu_clock(CpuClock::max());
    let peripherals = esp_hal::init(config);

    esp_alloc::heap_allocator!(72 * 1024);

    let timer0 = esp_hal::timer::systimer::SystemTimer::new(peripherals.SYSTIMER);
    esp_hal_embassy::init(timer0.alarm0);

    info!("Embassy initialized!");

    let i2c_bus = I2C.init(Mutex::new(RefCell::new(
        I2c::new(
            peripherals.I2C0,
            i2c::master::Config::default()
                .with_frequency(400.kHz())
                .with_timeout(BusTimeout::BusCycles(1)),
        )
        .unwrap()
        .with_sda(peripherals.GPIO8)
        .with_scl(peripherals.GPIO9)
        .into_async(),
    )));
    let i2c = I2cDevice::new(i2c_bus);
    let mut display: GraphicsMode<_> = sh1107::Builder::new()
        .with_rotation(DisplayRotation::Rotate90)
        .with_size(DisplaySize::Display64x128)
        .connect_i2c(i2c)
        .into();
    display.init().unwrap();

    let mut counter = 0;
    let mut errors = 0;
    loop {
        display.clear();
        Text::new(
            format!("Counter: {counter}\nError count: {errors}").as_str(),
            Point { x: 0, y: 28 },
        )
        .into_styled(TextStyle::new(Font6x8, BinaryColor::On))
        .draw(&mut display)
        .unwrap();
        let _ = display.flush().inspect_err(|err| {
            errors += 1;
            warn!("Error: {}", format!("{err:?}").as_str())
        });

        counter += 1;
        Timer::after_millis(100).await;
    }
}

Please note:

  • To run this, you will need to clone the sh1107 driver and bump the embedded-hal dependency version to 1.0 to avoid version mismatch issues
  • Please see the original report for the link to the correct driver - sh1107 on crates.io is a different driver.
  • This example uses v0.6.2 of embedded-graphics to align with the driver.

Here it's defmt that seems to be interfering and causing the I2C to timeout. Furthermore, the timeouts only occur if the trace is connected: as soon as I disconnect probe-rs from the port the errors stop (the error count stops rising entirely).

Timeouts with WiFi

In my original project I was using esp-println rather than defmt. When I discovered that defmt on its own without the WiFi was still causing the I2C timeouts I created another minimal example with the same code but using esp_println instead of defmt. On its own, esp_println does not cause any I2C errors. However, if I run the WiFi at the same time the timeouts reappear. Here is the code for the WiFi minimal example:

#![no_std]
#![no_main]

use embassy_executor::Spawner;
use embassy_time::{Duration, Timer};
use esp_backtrace as _;
use heapless::String;
use log::{info, warn};

use alloc::format;
use core::{cell::RefCell, str::FromStr};

use embassy_embedded_hal::shared_bus::blocking::i2c::I2cDevice;
use embassy_net::{Runner, StackResources};
use embassy_sync::{blocking_mutex::raw::NoopRawMutex, blocking_mutex::Mutex};
use embedded_graphics::{
    fonts::{Font6x8, Text},
    pixelcolor::BinaryColor,
    prelude::*,
    style::TextStyle,
};
use esp_hal::{
    clock::CpuClock,
    i2c::{
        self,
        master::{BusTimeout, I2c},
    },
    time::RateExtU32,
    Async,
};
use esp_wifi::{
    wifi::{
        self, ClientConfiguration, Configuration, WifiController, WifiDevice, WifiEvent,
        WifiStaDevice, WifiState,
    },
    EspWifiController,
};
use sh1107::mode::GraphicsMode;
use sh1107::prelude::*;
use static_cell::StaticCell;

extern crate alloc;

static CONTROLLER: StaticCell<EspWifiController<'static>> = StaticCell::new();
static STACK_RESOURCES: StaticCell<StackResources<3>> = StaticCell::new();

static I2C: StaticCell<Mutex<NoopRawMutex, RefCell<I2c<'_, Async>>>> = StaticCell::new();

#[embassy_executor::task]
async fn net_task(mut runner: Runner<'static, WifiDevice<'static, WifiStaDevice>>) {
    runner.run().await
}

#[embassy_executor::task]
async fn wifi_connection(mut controller: WifiController<'static>) {
    info!("start connection task");
    info!("Device capabilities: {:?}", controller.capabilities());
    loop {
        match esp_wifi::wifi::wifi_state() {
            WifiState::StaConnected => {
                // wait until we're no longer connected
                controller.wait_for_event(WifiEvent::StaDisconnected).await;
                Timer::after(Duration::from_millis(5000)).await;
            }
            _ => {}
        }
        if !matches!(controller.is_started(), Ok(true)) {
            let client_config = Configuration::Client(ClientConfiguration {
                ssid: String::from_str("DUMMY_WIFI").unwrap(),
                password: String::from_str("PASSWORD").unwrap(),
                ..Default::default()
            });

            controller
                .set_configuration(&client_config)
                .expect("Invalid WiFi Controller Configuration");
            info!("Starting wifi");
            controller.start().expect("Unable to start WiFi Controller");
            info!("Wifi started!");
        }
        info!("About to connect...");

        match controller.connect() {
            Ok(_) => info!("Wifi connected!"),
            Err(e) => {
                info!("Failed to connect to wifi: {e:?}");
                Timer::after(Duration::from_millis(5000)).await;
            }
        }
    }
}

#[esp_hal_embassy::main]
async fn main(spawner: Spawner) {
    // generator version: 0.2.2

    let config = esp_hal::Config::default().with_cpu_clock(CpuClock::max());
    let peripherals = esp_hal::init(config);

    esp_alloc::heap_allocator!(72 * 1024);

    let timer0 = esp_hal::timer::systimer::SystemTimer::new(peripherals.SYSTIMER);
    esp_hal_embassy::init(timer0.alarm0);

    info!("Embassy initialized!");

    let timer1 = esp_hal::timer::timg::TimerGroup::new(peripherals.TIMG0);
    let init = CONTROLLER.init(
        esp_wifi::init(
            timer1.timer0,
            esp_hal::rng::Rng::new(peripherals.RNG),
            peripherals.RADIO_CLK,
        )
        .unwrap(),
    );

    let (wifi_interface, wifi_controller) =
        wifi::new_with_mode(init, peripherals.WIFI, WifiStaDevice).unwrap();
    let config = embassy_net::Config::dhcpv4(Default::default());
    let stack_resources = STACK_RESOURCES.init(embassy_net::StackResources::<3>::new());
    let (_stack, runner) = embassy_net::new(wifi_interface, config, stack_resources, 0);

    spawner.spawn(net_task(runner)).unwrap();
    spawner.spawn(wifi_connection(wifi_controller)).unwrap();

    let i2c_bus = I2C.init(Mutex::new(RefCell::new(
        I2c::new(
            peripherals.I2C0,
            i2c::master::Config::default()
                .with_frequency(400.kHz())
                .with_timeout(BusTimeout::BusCycles(1)),
        )
        .unwrap()
        .with_sda(peripherals.GPIO8)
        .with_scl(peripherals.GPIO9)
        .into_async(),
    )));
    let i2c = I2cDevice::new(i2c_bus);
    let mut display: GraphicsMode<_> = sh1107::Builder::new()
        .with_rotation(DisplayRotation::Rotate90)
        .with_size(DisplaySize::Display64x128)
        .connect_i2c(i2c)
        .into();
    display.init().unwrap();

    let mut counter = 0;
    let mut errors = 0;
    loop {
        display.clear();
        Text::new(
            format!("Counter: {counter}\nError count: {errors}").as_str(),
            Point { x: 0, y: 28 },
        )
        .into_styled(TextStyle::new(Font6x8, BinaryColor::On))
        .draw(&mut display)
        .unwrap();
        let _ = display.flush().inspect_err(|err| {
            errors += 1;
            warn!("Error: {}", format!("{err:?}").as_str())
        });

        counter += 1;
        Timer::after_millis(100).await;
    }
}

I did note that with just the net_task running I didn't see any timeouts. It was only when I also introduced the wifi_connection task that the errors started to manifest.

defmt seems to cause significantly more errors than using the WiFi: ~560 errors per 1000 frames, vs ~30 errors per 1000 frames when running with WiFi and esp_println.

@bjoernQ
Copy link
Contributor

bjoernQ commented Feb 25, 2025

For the WiFi case it might be a workaround to wrap the flushing in a critical section like this

        critical_section::with(|_| {
            let _ = display.flush().inspect_err(|err| {
                errors += 1;
                warn!("Error: {}", format!("{err:?}").as_str())
            });    
        });

That's not a great solution - I wouldn't even call it a solution t.b.h. - but maybe WiFi will tolerate the latency resulting from blocking interrupts for a short time

@petekubiak
Copy link
Author

For the WiFi case it might be a workaround to wrap the flushing in a critical section like this

    critical_section::with(|_| {
        let _ = display.flush().inspect_err(|err| {
            errors += 1;
            warn!("Error: {}", format!("{err:?}").as_str())
        });    
    });

That's not a great solution - I wouldn't even call it a solution t.b.h. - but maybe WiFi will tolerate the latency resulting from blocking interrupts for a short time

I thought I'd give it a try with my original project which actually does something with the wifi. This does stop the I2C errors, but it also seems to stop the application from being able to accept connections and serve web pages, which I guess isn't that surprising.

@MabezDev MabezDev added timing Issue related to timing and removed status:needs-attention This should be prioritized labels Feb 26, 2025
@MabezDev MabezDev added this to the 1.0.0-beta.1 milestone Feb 26, 2025
@bugadani bugadani self-assigned this Feb 27, 2025
@bjoernQ bjoernQ assigned bjoernQ and unassigned bugadani Mar 4, 2025
@bjoernQ bjoernQ linked a pull request Mar 4, 2025 that will close this issue
6 tasks
@bjoernQ
Copy link
Contributor

bjoernQ commented Mar 4, 2025

Mind trying #3199 to see if it actually fixes this issue?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working timing Issue related to timing
Projects
Status: Todo
Development

Successfully merging a pull request may close this issue.

5 participants