- This topic has 6 replies, 3 voices, and was last updated 2 years, 6 months ago by john.saxon.
June 23, 2021 at 9:07 pm #1258 mattkratz (Participant)
Hey all,
One of our BB/control boards died, and we’ve recently run into problems with a couple of reactors that were previously working fine. Specifically, we have three reactors which, when connected to the control board, cause the cb.sh script to print a string of “multiplexer comm failure” errors and then shut down. There is some additional nuance to this issue: on occasion I will plug one of these faulty reactors into control board port M2 (or any port other than M0) and the “multiplexer comm failure” error will appear but reference port M0 (despite nothing being connected to that port). Some of these reactors may have been connected to the control board that died (unfortunately we didn’t keep track). We’ve tried disconnecting the moisture sensor on a couple of these non-functioning reactors, but saw no change. We have three other reactors in the lab that can successfully launch the cb.sh script with this board. Do you think this is a reactor problem, and is there anything we ought to try on our side before sending them back to Labmaker?
Beyond this, we also have a specific reactor with a previously encountered issue where one of the LEDs stays permanently on. This persists when we power cycle the reactor.
All of this has occurred when running on the most recent kernel and GitHub release. Here is an example of the “multiplexer comm failure” error:
[2021-04-06 12:03:27 +0000] [1817] [INFO] Starting gunicorn 20.1.0
[2021-04-06 12:03:27 +0000] [1817] [INFO] Listening at: http://192.168.7.2:5000 (1817)
[2021-04-06 12:03:27 +0000] [1817] [INFO] Using worker: sync
[2021-04-06 12:03:27 +0000] [1821] [INFO] Booting worker with pid: 1821
2021-04-06 12:03:35.392833 Starting watchdog
2021-04-06 12:03:37.880497 Initialising devices
2021-04-06 12:03:37.956809 Failed Multiplexer Comms 1 times
2021-04-06 12:03:38.012835 Failed Multiplexer Comms 2 times
2021-04-06 12:03:38.068862 Failed Multiplexer Comms 3 times
2021-04-06 12:03:38.104588Failed to recover multiplexer on device M0
2021-04-06 12:03:38.160787 Failed Multiplexer Comms 4 times
2021-04-06 12:03:38.196651Failed to recover multiplexer on device M0
2021-04-06 12:03:38.252665 Failed Multiplexer Comms 5 times
2021-04-06 12:03:38.288682Failed to recover multiplexer on device M0
2021-04-06 12:03:38.541497Did multiplexer hard-reset on M0
2021-04-06 12:03:38.612764 Failed Multiplexer Comms 6 times
2021-04-06 12:03:38.648525Failed to recover multiplexer on device M0
2021-04-06 12:03:38.704924 Failed Multiplexer Comms 7 times
2021-04-06 12:03:38.740533Failed to recover multiplexer on device M0
2021-04-06 12:03:38.796786 Failed Multiplexer Comms 8 times
2021-04-06 12:03:38.837950Failed to recover multiplexer on device M0
2021-04-06 12:03:38.892835 Failed Multiplexer Comms 9 times
2021-04-06 12:03:38.928660Failed to recover multiplexer on device M0
2021-04-06 12:03:38.984681 Failed Multiplexer Comms 10 times
2021-04-06 12:03:39.020647Failed to recover multiplexer on device M0
2021-04-06 12:03:39.273374Did multiplexer hard-reset on M0
2021-04-06 12:03:39.344780 Failed Multiplexer Comms 11 times
2021-04-06 12:03:39.380546Failed to recover multiplexer on device M0
2021-04-06 12:03:39.436807 Failed Multiplexer Comms 12 times
2021-04-06 12:03:39.472603Failed to recover multiplexer on device M0
2021-04-06 12:03:39.528799 Failed Multiplexer Comms 13 times
2021-04-06 12:03:39.564527Failed to recover multiplexer on device M0
2021-04-06 12:03:39.620786 Failed Multiplexer Comms 14 times
2021-04-06 12:03:39.656700Failed to recover multiplexer on device M0
2021-04-06 12:03:39.712705 Failed Multiplexer Comms 15 times
2021-04-06 12:03:39.748663Failed to recover multiplexer on device M0
2021-04-06 12:03:40.001601Did multiplexer hard-reset on M0
2021-04-06 12:03:40.072789 Failed Multiplexer Comms 16 times
2021-04-06 12:03:40.108553Failed to recover multiplexer on device M0
2021-04-06 12:03:40.164865 Failed Multiplexer Comms 17 times
2021-04-06 12:03:40.200522Failed to recover multiplexer on device M0
2021-04-06 12:03:40.256789 Failed Multiplexer Comms 18 times
2021-04-06 12:03:40.292555Failed to recover multiplexer on device M0
2021-04-06 12:03:40.348800 Failed Multiplexer Comms 19 times
2021-04-06 12:03:40.384540Failed to recover multiplexer on device M0
2021-04-06 12:03:40.440777 Failed Multiplexer Comms 20 times
2021-04-06 12:03:40.476656Failed to recover multiplexer on device M0
2021-04-06 12:03:40.533018 Failed Multiplexer Comms 21 times
2021-04-06 12:03:40.568655Failed to recover multiplexer on device M0
2021-04-06 12:03:40.570113Failed to communicate to Multiplexer 20 times. Disabling hardware and software!
[2021-04-06 12:03:40 +0000] [1817] [INFO] Shutting down: Master
[2021-04-06 12:03:40 +0000] [1817] [INFO] Reason: App failed to load.
Additionally, I’ve had the error below when testing out reactors. When this error occurs, I need to restart the BB to get any reactor working.
2021-04-05 14:50:39.152565 Starting watchdog
2021-04-05 14:50:41.629889 Initialising devices
2021-04-05 14:50:41.689046 Failed Multiplexer Comms 1 times
2021-04-05 14:50:41.745073 Failed Multiplexer Comms 2 times
2021-04-05 14:50:41.801101 Failed Multiplexer Comms 3 times
2021-04-05 14:50:41.836878Failed to recover multiplexer on device M0
2021-04-05 14:50:41.893038 Failed Multiplexer Comms 4 times
2021-04-05 14:50:41.928875Failed to recover multiplexer on device M0
2021-04-05 14:50:41.985064 Failed Multiplexer Comms 5 times
2021-04-05 14:50:42.020740Failed to recover multiplexer on device M0
2021-04-05 14:50:42.293041 Failed Multiplexer Comms 6 times
2021-04-05 14:50:42.328756Failed to recover multiplexer on device M0
2021-04-05 14:50:42.385014 Failed Multiplexer Comms 7 times
2021-04-05 14:50:42.420759Failed to recover multiplexer on device M0
2021-04-05 14:50:42.477024 Failed Multiplexer Comms 8 times
2021-04-05 14:50:42.512867Failed to recover multiplexer on device M0
2021-04-05 14:50:42.568903 Failed Multiplexer Comms 9 times
2021-04-05 14:50:42.604915Failed to recover multiplexer on device M0
2021-04-05 14:50:42.660973 Failed Multiplexer Comms 10 times
2021-04-05 14:50:42.696872Failed to recover multiplexer on device M0
2021-04-05 14:50:42.752892 Failed Multiplexer Comms 11 times
2021-04-05 14:50:42.788947Failed to recover multiplexer on device M0
2021-04-05 14:50:42.845056 Failed Multiplexer Comms 12 times
2021-04-05 14:50:42.880885Failed to recover multiplexer on device M0
2021-04-05 14:50:42.937025 Failed Multiplexer Comms 13 times
2021-04-05 14:50:42.972878Failed to recover multiplexer on device M0
2021-04-05 14:50:43.029018 Failed Multiplexer Comms 14 times
2021-04-05 14:50:43.064798Failed to recover multiplexer on device M0
2021-04-05 14:50:44.195096 Failed Multiplexer Comms 15 times
2021-04-05 14:50:45.173631Failed to recover multiplexer on device M0
2021-04-05 14:50:46.173886 Failed Multiplexer Comms 16 times
2021-04-05 14:50:46.208794Failed to recover multiplexer on device M0
2021-04-05 14:50:47.193478 Failed Multiplexer Comms 17 times
2021-04-05 14:50:48.173302Failed to recover multiplexer on device M0
2021-04-05 14:50:49.173872 Failed Multiplexer Comms 18 times
2021-04-05 14:50:49.208792Failed to recover multiplexer on device M0
2021-04-05 14:50:50.193459 Failed Multiplexer Comms 19 times
2021-04-05 14:50:51.173267Failed to recover multiplexer on device M0
2021-04-05 14:50:52.173013 Failed Multiplexer Comms 20 times
2021-04-05 14:50:52.208795Failed to recover multiplexer on device M0
2021-04-05 14:50:53.193762 Failed Multiplexer Comms 21 times
2021-04-05 14:50:54.173172Failed to recover multiplexer on device M0
2021-04-05 14:50:54.173558Failed to communicate to Multiplexer 10 times. Disabling hardware and software!
[2021-04-05 14:50:54 +0000] [2237] [INFO] Shutting down: Master
[2021-04-05 14:50:54 +0000] [2237] [INFO] Reason: App failed to load.
Message from syslogd@beaglebone at Apr 5 14:50:54 …
kernel:[ 393.060365] Internal error: : 1028 [#1] PREEMPT SMP ARM
Message from syslogd@beaglebone at Apr 5 14:50:54 …
kernel:[ 393.153440] Process irq/38-4819c000 (pid: 30, stack limit = 0x530f697a)
Message from syslogd@beaglebone at Apr 5 14:50:54 …
kernel:[ 393.160083] Stack: (0xdc401eb8 to 0xdc402000)
Message from syslogd@beaglebone at Apr 5 14:50:54 …
kernel:[ 393.164461] 1ea0: c11a7054 c1607a40
Message from syslogd@beaglebone at Apr 5 14:50:54 …
kernel:[ 393.172678] 1ec0: dc330d00 dc3fb840 dc330d00 ffffe000 00000000 dc3e7f00 c01afda0 00000001
Message from syslogd@beaglebone at Apr 5 14:50:54 …
kernel:[ 393.180896] 1ee0: dc401f04 dc401ef0 c0a82c38 c0a828a0 dc3e7f00 dc330d00 dc401f24 dc401f08
Message from syslogd@beaglebone at Apr 5 14:50:54 …
kernel:[ 393.189113] 1f00: c01afdcc c0a82c24 dc330d00 dc3e7f24 ffffe000 00000000 dc401f74 dc401f28
Message from syslogd@beaglebone at Apr 5 14:50:54 …
kernel:[ 393.197330] 1f20: c01b0144 c01afdac c10ea2d0 c1506e08 c15de2e1 dc401f40 c0d44b48 00000000
Message from syslogd@beaglebone at Apr 5 14:50:54 …
kernel:[ 393.205548] 1f40: c01afefc 754b2d89 c015ffe4 dc3e7000 dc3e7bc0 00000000 dc400000 dc3e7f00
Message from syslogd@beaglebone at Apr 5 14:50:54 …
kernel:[ 393.213766] 1f60: c01affe0 dc149be4 dc401fac dc401f78 c01604f4 c01affec dc3e701c dc3e701c
Message from syslogd@beaglebone at Apr 5 14:50:54 …
kernel:[ 393.221983] 1f80: 00000000 dc3e7bc0 c0160388 00000000 00000000 00000000 00000000 00000000
Message from syslogd@beaglebone at Apr 5 14:50:54 …
kernel:[ 393.230200] 1fa0: 00000000 dc401fb0 c01010e8 c0160394 00000000 00000000 00000000 00000000
Message from syslogd@beaglebone at Apr 5 14:50:54 …
kernel:[ 393.238417] 1fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
Message from syslogd@beaglebone at Apr 5 14:50:54 …
kernel:[ 393.246633] 1fe0: 00000000 00000000 00000000 00000000 00000013 00000000 00000000 00000000
Message from syslogd@beaglebone at Apr 5 14:50:54 …
kernel:[ 393.322208] Code: e34cc11a e50bc034 e5d00001 e0813310 (e1d340b0)
Message from syslogd@beaglebone at Apr 5 14:50:54 …
kernel:[ 393.356592] Internal error: : 1028 [#2] PREEMPT SMP ARM
Message from syslogd@beaglebone at Apr 5 14:50:54 …
kernel:[ 393.451144] Process irq/36-44e0b000 (pid: 110, stack limit = 0x220098e4)
Message from syslogd@beaglebone at Apr 5 14:50:54 …
kernel:[ 393.457875] Stack: (0xdac13eb8 to 0xdac14000)
Message from syslogd@beaglebone at Apr 5 14:50:54 …
kernel:[ 393.462253] 3ea0: c11a7054 c1607a40
Message from syslogd@beaglebone at Apr 5 14:50:54 …
kernel:[ 393.470471] 3ec0: 00000000 dab9b440 dc330800 ffffe000 00000000 dabfe100 c01afda0 00000001
Message from syslogd@beaglebone at Apr 5 14:50:54 …
kernel:[ 393.478688] 3ee0: dac13f04 dac13ef0 c0a82c38 c0a828a0 dabfe100 dc330800 dac13f24 dac13f08
Message from syslogd@beaglebone at Apr 5 14:50:54 …
kernel:[ 393.486905] 3f00: c01afdcc c0a82c24 dc330800 dabfe124 ffffe000 00000000 dac13f74 dac13f28
Message from syslogd@beaglebone at Apr 5 14:50:54 …
kernel:[ 393.495123] 3f20: c01b0144 c01afdac c10ea2d0 c1506e08 c15de2e1 dac13f40 c0d44b48 00000000
Message from syslogd@beaglebone at Apr 5 14:50:54 …
kernel:[ 393.503340] 3f40: c01afefc 754b2d89 c015ffe4 dabfe740 dabfee80 00000000 dac12000 dabfe100
Message from syslogd@beaglebone at Apr 5 14:50:54 …
kernel:[ 393.511558] 3f60: c01affe0 dc305bec dac13fac dac13f78 c01604f4 c01affec dabfe75c dabfe75c
Message from syslogd@beaglebone at Apr 5 14:50:54 …
kernel:[ 393.519776] 3f80: 00000000 dabfee80 c0160388 00000000 00000000 00000000 00000000 00000000
Message from syslogd@beaglebone at Apr 5 14:50:54 …
kernel:[ 393.527993] 3fa0: 00000000 dac13fb0 c01010e8 c0160394 00000000 00000000 00000000 00000000
Message from syslogd@beaglebone at Apr 5 14:50:54 …
kernel:[ 393.536210] 3fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
Message from syslogd@beaglebone at Apr 5 14:50:54 …
kernel:[ 393.544427] 3fe0: 00000000 00000000 00000000 00000000 00000013 00000000 00000000 00000000
Message from syslogd@beaglebone at Apr 5 14:50:54 …
kernel:[ 393.619996] Code: e34cc11a e50bc034 e5d00001 e0813310 (e1d340b0)
root@beaglebone:~/chibio#
Message from syslogd@beaglebone at Apr 5 14:51:05 …
kernel:[ 404.305919] Disabling IRQ #36
June 24, 2021 at 8:59 am #1261 harrison (Keymaster)
Hello,
For the first challenge – that some of them won’t start when connected – I would say the most likely issue is indeed the moisture sensor circuitry being tripped (internally or externally), perhaps by a defect in hardware manufacturing. If this is the case, the error can be reported against M0 at startup (even if nothing is connected to M0), since that is the first reactor probed by the initialisation script. The second log you posted looks like it has the same root cause (i.e. the multiplexer not connecting because something is shorted on the I2C lines), but it is strange that it then leads to a kernel panic. This is not something we saw before updating to the new kernel, so I would be very interested to know if it also happens mid-experiment (rather than as a side effect of the software crashing for a separate reason).
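To check which port the errors are actually reported against (for example, M0 while the reactor is plugged into M2), a pasted console log can be summarised with a few lines of Python. This is only a rough sketch: the regular expressions are taken from the log lines quoted in this thread, and `summarise_multiplexer_log` is a hypothetical helper name, not part of the Chi.Bio software.

```python
import re
from collections import Counter

# Patterns copied from the log lines quoted in this thread (assumption:
# other software versions may word these messages differently).
FAIL_RE = re.compile(r"Failed Multiplexer Comms (\d+) times")
RECOVER_RE = re.compile(r"Failed to recover multiplexer on device (M\d)")
RESET_RE = re.compile(r"Did multiplexer hard-reset on (M\d)")


def summarise_multiplexer_log(text):
    """Summarise multiplexer failures in a pasted cb.sh console log.

    Hypothetical helper for this thread, not part of Chi.Bio.
    """
    max_failures = 0
    recover_failures = Counter()  # failed recoveries per port
    hard_resets = Counter()       # hard resets per port
    for line in text.splitlines():
        m = FAIL_RE.search(line)
        if m:
            max_failures = max(max_failures, int(m.group(1)))
            continue
        m = RECOVER_RE.search(line)
        if m:
            recover_failures[m.group(1)] += 1
            continue
        m = RESET_RE.search(line)
        if m:
            hard_resets[m.group(1)] += 1
    return {
        "max_failures": max_failures,
        "recover_failures": dict(recover_failures),
        "hard_resets": dict(hard_resets),
    }


# A few lines copied from the first log in this thread.
sample = """\
2021-04-06 12:03:38.068862 Failed Multiplexer Comms 3 times
2021-04-06 12:03:38.104588Failed to recover multiplexer on device M0
2021-04-06 12:03:38.160787 Failed Multiplexer Comms 4 times
2021-04-06 12:03:38.196651Failed to recover multiplexer on device M0
2021-04-06 12:03:38.541497Did multiplexer hard-reset on M0
"""
print(summarise_multiplexer_log(sample))
```

If every failed recovery is attributed to M0 regardless of which port the reactor is plugged into, that would be consistent with M0 simply being the first port probed at startup.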
Out of interest, what control board (V1.1? V1.2?) do you have?
You mentioned a light is always on in one of the reactors – is it on at very high power, or just low/medium power? If the former, then it is likely an issue in the op-amp upstream of it; if the latter, it is probably that one of the resistors that provides bias to the LED current regulation circuit is not soldered perfectly.
It sounds like in general you have had several issues which may stem from manufacturing faults – for which the best solution (unless you want to pull everything apart and find the defect yourself) is probably to ask for a replacement from Labmaker! Sorry that this has been so difficult to fix/debug…
June 24, 2021 at 8:15 pm #1262 mattkratz (Participant)
Hey Harrison,
With regards to the first problem, the error with the kernel panic occurred when running the initialization script. We are using V1.1 of the control board. Out of curiosity, what are the major changes between the two versions? With regards to the reactor LED that is permanently on, it does seem to be on at very high power, although I haven’t probed it in the API, as I’m afraid it may end up melting the source piece if I leave it on too long (this occurred with a previous reactor). I’ll get in contact with Labmaker about replacements.
June 25, 2021 at 7:22 am #1265 harrison (Keymaster)
The main difference is that V1.2 has an extra ability to cycle power to the multiplexer in error cases. However, this should have no impact if your system is failing to start due to some hardware fault…
May 10, 2022 at 11:49 am #1494 john.saxon (Participant)
Hi Harrison,
I’ve been troubleshooting a set of ChiBio reactors at the Letton Lab. During an experiment we came across an error like the one you were describing: a kernel panic mid-experiment. This is the error message we logged:
Message from syslogd@beaglebone at May 5 02:36:29 …
kernel:[52117.512683] Internal error: : 1028 [#1] PREEMPT SMP ARM
… [another 12 similar messages]
Message from syslogd@beaglebone at May 5 02:36:29 …
kernel:[52117.698950] 1fe0: 00000000 00000000 00000000 00000000 00000013 00000000 00000000 00000000
Message from syslogd@beaglebone at May 5 02:36:29 …
kernel:[52117.774528] Code: e34cc11a e50bc034 e5d00001 e0813310 (e1d340b0)
After which the reactor on M2 failed to disconnect:
…
02:36:34 Failed Multiplexer Comms 20 times
02:36:34Failed to recover multiplexer on device M2
02:36:34 Failed Multiplexer Comms 21 times
02:36:34Failed to recover multiplexer on device M2
We have gotten several similar errors after trying to run the reactors for a few days with the thermostats active. Do you think this could be the moisture sensors? And if so, would you know a way to fix the issue?
May 10, 2022 at 12:17 pm #1495 harrison (Keymaster)
Hello John,
It is possible that the moisture sensor is causing such problems. Have you ever seen condensation forming on the vial? It might happen depending on what the climate/humidity is like in your lab…
However, I’d say it is more likely an issue with the underlying operating system on the BeagleBone. I have had one device in the past that gave a similar PREEMPT SMP ARM error, and it was fixed by flashing the BeagleBone with a clean operating system, i.e. following the software setup instructions on this site. I don’t know if some BeagleBones (the microcontroller) are themselves less reliable than others, but my experience indicates that might be the case (some cause issues for no apparent reason, potentially pointing to manufacturing defects). So, if none of the above works, you could also try replacing the BeagleBone; a spare can be found fairly cheaply at many places online…
Harrison
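When comparing oops reports like the ones in this thread, it can help to pull out which kernel thread crashed. In the first post's second log, the `Process irq/38-4819c000` and `Process irq/36-44e0b000` lines name IRQ threads whose suffixes match the base addresses of the AM335x I2C2 and I2C0 controllers. The sketch below extracts that information from pasted console text. It assumes the thread-name suffix is the device base address, and `summarise_oops` is a hypothetical helper name, so treat the output as a hint rather than a diagnosis.

```python
import re

# AM335x I2C controller base addresses (assumption: the IRQ thread
# name suffix is the device's base address, e.g. "4819c000" for I2C2).
AM335X_I2C_BASES = {
    "44e0b000": "I2C0",
    "4802a000": "I2C1",
    "4819c000": "I2C2",
}

OOPS_RE = re.compile(r"Internal error: : (\w+) \[#(\d+)\]")
PROC_RE = re.compile(r"Process irq/(\d+)-([0-9a-f]+) \(pid: (\d+)")


def summarise_oops(text):
    """Count kernel oopses and list the IRQ threads they occurred in.

    Hypothetical helper for this thread, not part of Chi.Bio.
    """
    threads = []
    for irq, base, pid in PROC_RE.findall(text):
        threads.append({
            "irq": int(irq),
            "base": base,
            "controller": AM335X_I2C_BASES.get(base, "unknown"),
            "pid": int(pid),
        })
    return {"oops_count": len(OOPS_RE.findall(text)), "threads": threads}


# Lines copied from the second log in the first post.
sample = """\
kernel:[ 393.060365] Internal error: : 1028 [#1] PREEMPT SMP ARM
kernel:[ 393.153440] Process irq/38-4819c000 (pid: 30, stack limit = 0x530f697a)
kernel:[ 393.356592] Internal error: : 1028 [#2] PREEMPT SMP ARM
kernel:[ 393.451144] Process irq/36-44e0b000 (pid: 110, stack limit = 0x220098e4)
"""
for thread in summarise_oops(sample)["threads"]:
    print(thread["irq"], thread["base"], thread["controller"])
```

If the crashing threads keep mapping to I2C controllers, a fault on the I2C lines and an operating system or driver problem would both remain plausible, so this does not replace trying a clean flash.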
May 12, 2022 at 10:09 am #1496 john.saxon (Participant)
We’ve seen condensation on the inside of the vial, but none on the outside. I will try flashing the BeagleBone with a clean operating system. Thanks heaps for your help.
John