Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Froze at "Sending data via Ethernet" #39

Open
jmicrobe opened this issue May 21, 2019 · 13 comments
Open

Froze at "Sending data via Ethernet" #39

jmicrobe opened this issue May 21, 2019 · 13 comments
Assignees

Comments

@jmicrobe
Copy link
Owner

On 5/20 I came in an saw that the ODIn was frozen (platform was up to take a reading, was not shaking). The last reading recorded by the server was on 5/19/19 at 15:28. Until that point readings were looking as expected.

Here are the most recent lines from run.log, located in /var/www/ODIn_dataServer/

/var/www/ODIn_dataServer/dataServer: at 05/06/19 21:22:26 -0700 - 13000 total packets received
httpd: parsing config form
setting log output file to 'OD1_100331.txt'
httpd: sending config form
httpd_send_response() = 1
httpd: parsing config form
setting log output file to 'OD1_100332.txt'
httpd: sending config form
httpd_send_response() = 1
httpd: parsing config form
setting log output file to 'OD1_100331.txt.bak'
httpd: sending config form
httpd_send_response() = 1
httpd: parsing config form
setting log output file to 'OD1_100332.txt.bak'
httpd: sending config form
httpd_send_response() =

Here is some of the data from the system log, located at /var/log/syslog

May 20 06:25:05 raspberrypi rsyslogd: [origin software="rsyslogd" swVersion="8.4.2" x-pid="381" x-info="http://www.rsyslog.com"] rsyslogd was HUPed
May 20 06:25:06 raspberrypi CRON[20499]: (CRON) info (No MTA installed, discarding output)
May 20 06:25:42 raspberrypi avahi-daemon[379]: Invalid response packet from host 172.28.236.46.
May 20 06:25:42 raspberrypi avahi-daemon[379]: Invalid response packet from host 172.28.236.46.
May 20 06:25:50 raspberrypi avahi-daemon[379]: Invalid response packet from host 172.28.236.46.
May 20 06:25:50 raspberrypi avahi-daemon[379]: Invalid response packet from host 172.28.236.46.
May 20 06:27:28 raspberrypi avahi-daemon[379]: Invalid response packet from host 172.28.236.46.
May 20 06:27:28 raspberrypi rsyslogd0: action 'action 17' resumed (module 'builtin:ompipe') [try http://www.rsyslog.com/e/0 ]
May 20 06:27:28 raspberrypi rsyslogd-2359: action 'action 17' resumed (module 'builtin:ompipe') [try http://www.rsyslog.com/e/2359 ]
...
May 20 11:24:49 raspberrypi avahi-daemon[379]: Invalid response packet from host 172.28.236.46.
May 20 11:26:03 raspberrypi avahi-daemon[379]: Invalid response packet from host 172.28.236.46.
May 20 11:26:03 raspberrypi avahi-daemon[379]: Invalid response packet from host 172.28.236.46.
May 20 11:26:55 raspberrypi avahi-daemon[379]: Invalid response packet from host 172.28.236.46.
May 20 11:26:55 raspberrypi avahi-daemon[379]: Invalid response packet from host 172.28.236.46.
May 20 11:27:07 raspberrypi systemd[1]: Starting Session c12 of user muninn.

And I noted that there is a syslog.1 file. Here is some of the data from that:

May 19 15:14:00 raspberrypi avahi-daemon[379]: Invalid response packet from host 172.28.236.46.
May 19 15:14:00 raspberrypi rsyslogd-2007: action 'action 17' suspended, next retry is Sun May 19 15:15:30 2019 [try http://www.rsyslog.com/e/2007 ]
May 19 15:14:43 raspberrypi avahi-daemon[379]: Invalid response packet from host 172.28.236.46.
May 19 15:14:43 raspberrypi avahi-daemon[379]: Invalid response packet from host 172.28.236.46.
May 19 15:16:18 raspberrypi avahi-daemon[379]: Invalid response packet from host 172.28.236.46.
May 19 15:16:18 raspberrypi rsyslogd-2007: action 'action 17' suspended, next retry is Sun May 19 15:17:48 2019 [try http://www.rsyslog.com/e/2007 ]
May 19 15:17:01 raspberrypi CRON[20147]: (root) CMD ( cd / && run-parts --report /etc/cron.hourly)
May 19 15:19:36 raspberrypi avahi-daemon[379]: Invalid response packet from host 172.28.236.46.
May 19 15:19:36 raspberrypi rsyslogd-2007: action 'action 17' suspended, next retry is Sun May 19 15:21:06 2019 [try http://www.rsyslog.com/e/2007 ]
May 19 15:20:44 raspberrypi avahi-daemon[379]: Invalid response packet from host 172.28.236.46.
May 19 15:20:44 raspberrypi avahi-daemon[379]: Invalid response packet from host 172.28.236.46.
May 19 15:22:22 raspberrypi avahi-daemon[379]: Invalid response packet from host 172.28.236.46.
May 19 15:22:22 raspberrypi rsyslogd-2007: action 'action 17' suspended, next retry is Sun May 19 15:23:52 2019 [try http://www.rsyslog.com/e/2007 ]
May 19 15:24:00 raspberrypi avahi-daemon[379]: Invalid response packet from host 172.28.236.46.
May 19 15:24:00 raspberrypi rsyslogd-2007: action 'action 17' suspended, next retry is Sun May 19 15:25:30 2019 [try http://www.rsyslog.com/e/2007 ]
May 19 15:25:08 raspberrypi avahi-daemon[379]: Invalid response packet from host 172.28.236.46.
May 19 15:25:08 raspberrypi avahi-daemon[379]: Invalid response packet from host 172.28.236.46.
May 19 15:26:11 raspberrypi avahi-daemon[379]: Invalid response packet from host 172.28.236.46.
May 19 15:26:11 raspberrypi rsyslogd-2007: action 'action 17' suspended, next retry is Sun May 19 15:27:41 2019 [try http://www.rsyslog.com/e/2007 ]
May 19 15:26:11 raspberrypi avahi-daemon[379]: Invalid response packet from host 172.28.236.46.
May 19 15:27:04 raspberrypi avahi-daemon[379]: Invalid response packet from host 172.28.236.46.
May 19 15:27:04 raspberrypi avahi-daemon[379]: Invalid response packet from host 172.28.236.46.
May 19 15:30:59 raspberrypi avahi-daemon[379]: Invalid response packet from host 172.28.236.46.
May 19 15:30:59 raspberrypi rsyslogd-2007: action 'action 17' suspended, next retry is Sun May 19 15:32:29 2019 [try http://www.rsyslog.com/e/2007 ]
May 19 15:30:59 raspberrypi avahi-daemon[379]: Invalid response packet from host 172.28.236.46.
May 19 15:31:51 raspberrypi avahi-daemon[379]: Invalid response packet from host 172.28.236.46.
May 19 15:31:51 raspberrypi avahi-daemon[379]: Invalid response packet from host 172.28.236.46.
May 19 15:32:47 raspberrypi avahi-daemon[379]: Invalid response packet from host 172.28.236.46.
May 19 15:32:47 raspberrypi rsyslogd-2007: action 'action 17' suspended, next retry is Sun May 19 15:34:17 2019 [try http://www.rsyslog.com/e/2007 ]
May 19 15:32:47 raspberrypi avahi-daemon[379]: Invalid response packet from host 172.28.236.46.
May 19 15:33:40 raspberrypi avahi-daemon[379]: Invalid response packet from host 172.28.236.46.
May 19 15:33:40 raspberrypi avahi-daemon[379]: Invalid response packet from host 172.28.236.46.
May 19 15:35:18 raspberrypi avahi-daemon[379]: Invalid response packet from host 172.28.236.46.
May 19 15:35:18 raspberrypi rsyslogd-2007: action 'action 17' suspended, next retry is Sun May 19 15:36:48 2019 [try http://www.rsyslog.com/e/2007 ]
May 19 15:35:33 raspberrypi avahi-daemon[379]: Invalid response packet from host 172.28.236.46.
May 19 15:35:33 raspberrypi avahi-daemon[379]: Invalid response packet from host 172.28.236.46.
May 19 15:37:12 raspberrypi avahi-daemon[379]: Invalid response packet from host 172.28.236.46.
May 19 15:37:12 raspberrypi rsyslogd-2007: action 'action 17' suspended, next retry is Sun May 19 15:38:42 2019 [try http://www.rsyslog.com/e/2007 ]
May 19 15:38:41 raspberrypi avahi-daemon[379]: Invalid response packet from host 172.28.236.46.
May 19 15:39:03 raspberrypi avahi-daemon[379]: Invalid response packet from host 172.28.236.46.
May 19 15:39:03 raspberrypi rsyslogd-2007: action 'action 17' suspended, next retry is Sun May 19 15:40:33 2019 [try http://www.rsyslog.com/e/2007 ]
...
...

May 20 06:24:50 raspberrypi avahi-daemon[379]: Invalid response packet from host 172.28.236.46.
May 20 06:24:50 raspberrypi avahi-daemon[379]: Invalid response packet from host 172.28.236.46.
May 20 06:25:01 raspberrypi rsyslogd-2007: action 'action 17' suspended, next retry is Mon May 20 06:26:31 2019 [try http://www.rsyslog.com/e/2007 ]
May 20 06:25:01 raspberrypi CRON[20506]: (root) CMD (test -x /usr/sbin/anacron || ( cd / && run-parts --report /etc/cron.daily ))
May 20 06:25:04 raspberrypi systemd[1]: Reloading LSB: Apache2 web server.
May 20 06:25:05 raspberrypi apache2[20627]: Reloading web server: apache2.
May 20 06:25:05 raspberrypi systemd[1]: Reloaded LSB: Apache2 web server.
May 20 06:25:05 raspberrypi rsyslogd: [origin software="rsyslogd" swVersion="8.4.2" x-pid="381" x-info="http://www.rsyslog.com"] rsyslogd was HUPed

On 5/20 I shut the ODIn down and started a completely new run. So far I have not encountered this error. It seems to me that the main issue is that the ODIn isn't skipping sending data to the server when it can't make the connection, and instead freezes.

@jmicrobe
Copy link
Owner Author

Here's the code from the arduino. It seems like there's no "else: write to SD and continue".

Design-wise, it seems like it would be more efficient to allow the ODIn to queue data to send to the server, in cases where the connection is down.

odin/System_2.ino

Lines 612 to 624 in 85af6f1

// Sends the data packet to the server via the Ethernet Shield
void sendpacket(boolean on)
{
if(on){
Serial.println("Sending data via Ethernet.");
lcdPrint("Sending data","via Ethernet","","");
// the data structure was filled during the scan, now create a udp packet and send it
udp.beginPacket(data_server_ip, data_server_port);
udp.write((byte *)&data, PACKET_SIZE);
udp.endPacket();
}
}

@jsebrof
Copy link
Collaborator

jsebrof commented May 28, 2019

The reason there is no else branch to the if statement here is because the boolean variable that is being tested has been hard-coded to true at the top of this source code file.

I'll take a look through the Ethernet library that was used to see if we might have missed anything when we wrote the software.

@dacb
Copy link
Collaborator

dacb commented May 28, 2019

Those syslog entries, e.g. the avahi don't seem to be related. I don't think that is the IP of minun, right?

One other thing I might recommend is to look to see if the Arduino libraries for the ethernet hardware have been updated since the Arduino software was last built. I wonder if the ethernet send failed and the device froze. It may also be possible to begin/endPacket in a way that is non-blocking.

@jmicrobe
Copy link
Owner Author

jmicrobe commented May 28, 2019

Those syslog entries, e.g. the avahi don't seem to be related. I don't think that is the IP of minun, right?

@dacb No, the ip is different, but these files came off of the muninn. Any idea why that is? It's on a static IP that the department assigned.

Also as for the Ethernet & libraries - the ODIn is still using the old ethernet sheild (as in, hasn't been replaced since I've started using it) but I did purchase a replacement shield. I just haven't installed it yet. It's an Arduino Ethernet Sheild 2

@dacb
Copy link
Collaborator

dacb commented May 28, 2019

I think those avahi-daemon[379]: Invalid response packet from host errors are related to network discovery services running on the pi for connecting to things like printers. They can safely be ignored.

@jsebrof
Copy link
Collaborator

jsebrof commented Jun 3, 2019

The Ethernet library that we use I believe comes with the Arduino IDE package of standard libraries, which they publish reference documentation for. I looked through the release notes for newer versions and nothing relevant to our situation popped out at me since the 1.0.6 release that we have been using.

What I suspect happened is that the ODIn software failed on line #621, where the data is being written into the packet to send to the server. The Ethernet library reference documentation for the write() function states that it must "be wrapped between beginPacket() and endPacket()." In the source code it is indeed placed between calls to those two functions, however I suspect that an uncaught error when calling beginPacket() sets up some undefined behavior when the write() function is called, causing the ODIn system to freeze.

A follow-up question then would be, what caused the call to the beginPacket() function to fail?
I believe that the DHCP lease that the ODIn system negotiated with the network router expired, and while the ODIn system can use an expired DHCP lease successfully, it can only do so until the router leases the IP Address to another user, at which time I suspect the calls to the beginPacket() began to fail.

In the Ethernet library reference documentation I noticed the existence of a maintain() function, which we do not currently use in our source code. Its documentation warns that this function needs to be called on a sub-second interval in order to successfully renegotiate the DHCP lease before the network router leases the IP Address to someone else. Internally the software keeps track of when the lease expires so calls to the maintain() function do nothing unless it is time for DHCP renegotiation.

Unfortunately, the data sampling section of the loop() function contains multiple delays which can potentially provide enough of a gap for potentially bad network behavior to occur.
I'm still devising the best path forward with regard to solving this issue, but I'll make some pull requests soon.

@jsebrof
Copy link
Collaborator

jsebrof commented Jun 30, 2019

Pull request #41 contains changes that should help make the Ethernet connection more resilient.
Note that these changes have not been tested, as I lack the hardware to do so.
More specifically, these changes check for error returns on all Ethernet function calls where errors are returned, adds a call to the Ethernet.Maintain() function, and if that call is unsuccessful, attempts to reconfigure the Ethernet.
This reconfiguration might cause the Arduino sketch to exceed the size limitation for the Arduino Mega, so watch for whether that becomes an issue.
I also fixed some code formatting and inconsistency that seems to have crept in over time.

@jmicrobe
Copy link
Owner Author

jmicrobe commented Jul 2, 2019

Thanks @jsebrof. I'm running your new code right now. Is there a certain length of time I should run it in order to confirm it works, or any other thing I could do to test the changes?

@jsebrof
Copy link
Collaborator

jsebrof commented Jul 2, 2019 via email

@jmicrobe
Copy link
Owner Author

jmicrobe commented Jul 2, 2019

@jsebrof the server is on a static IP address and the arduino is not (I believe) although it uses an ethernet connection. Would you happen to know how this affects the dhcp lease? On the raspberrypi server I entered the following:

sudo dhclient -v
[sudo] password for muninn: 
irs_resconf_load failed: 59.
Unable to set resolver from resolv.conf; startup continuing but DDNS support may be affected
Internet Systems Consortium DHCP Client 4.3.1
Copyright 2004-2014 Internet Systems Consortium.
All rights reserved.
For info, please visit https://www.isc.org/software/dhcp/

Listening on LPF/eth0/b8:27:eb:65:cd:4b
Sending on   LPF/eth0/b8:27:eb:65:cd:4b
Sending on   Socket/fallback
DHCPDISCOVER on eth0 to 255.255.255.255 port 67 interval 8
DHCPREQUEST on eth0 to 255.255.255.255 port 67
DHCPOFFER from 128.208.236.102
DHCPACK from 128.208.236.102
bound to 128.208.236.222 -- renewal in 434 seconds.

I'm not sure if that accomplished what you suggested, but the data is still coming in from the device without issue. I'm also curious - if I assigned a static IP address to the ODIn as well would this circumvent the issue?

@jsebrof
Copy link
Collaborator

jsebrof commented Jul 2, 2019 via email

@jmicrobe
Copy link
Owner Author

jmicrobe commented Jul 2, 2019

I asked the IT department if they could give me a new static IP for the ODIn. Even if the code fixes it, I think it would be good to have a failsafe. If it's easy you can tell me where this info would go and I'll edit the code when I hear back from IT. Otherwise I'll forward the email to you. Thanks!

@jsebrof
Copy link
Collaborator

jsebrof commented Jul 11, 2019

As requested, pull request #42 contains modifications to have ODIn use a Static IP Address instead of DHCP.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants