View Issue Details
|ID||Project||Category||View Status||Date Submitted||Last Update|
|0002320||SpeedFan||Hardware support||public||2014-11-08 19:32||2019-03-12 05:17|
|Target Version||Fixed in Version|
|Summary||0002320: SMBUS scanning causes intermittent system instability|
|Description||It took me months to track this issue down, but since I have disabled SpeedFan from using the SMBus, the problem has stopped.|
I have a Supermicro 6016TT-TF server, which SpeedFan's SMBUS scan identifies as "Intel 82801JIB ICH10 SMBUS at $0400".
I believe the problem is that SpeedFan polls this bus very frequently (perhaps once a second or even more often?), and the SMBus (at least in these machines) cannot handle it. In fact, even the IPMIView utility offered by Supermicro only scans the bus once every 60 seconds by default.
The problem I identified: the BIOS also scans the SMBus to make sure temperatures are within tolerance. When the problem started, SpeedFan would show many more SMBus COLLISIONs, to the point where the SMBus seemed to lock up completely and stop responding to requests, at least for a while. Many IPMI-related System Events were also triggered during this time (a few examples are shown below). If the bus stayed unresponsive long enough (at least 15 minutes), the BIOS would assume there was something very wrong with the computer (i.e. no temperature status, fans at 0 rpm, etc.) and would force an immediate system shutdown.
I had a very hard time tracing this issue. I hope this report can help at least a few people avoid some sleepless nights. Since SpeedFan has been running in ISA BUS mode only (SMBus sensors DISABLED by user), the server has been rock solid and I haven't had a single issue.
I think the simple solution is to assume that not all SMBuses can handle such excessive polling, and to make that possibility clear in SpeedFan's configuration screens. In addition, there should be an option to set the minimum time allowed between SMBus polls. This should dramatically reduce the chances of a conflict.
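To illustrate the kind of option I mean, here is a minimal sketch of a minimum-interval throttle. It is not SpeedFan's code: the class name `MinIntervalPoller`, the `read_bus` callback and the `min_interval` setting are all hypothetical, and the real SMBus access is left out entirely. The idea is only that a poll request arriving too early returns the last cached reading instead of touching the bus.

```python
import time
from typing import Callable, Optional


class MinIntervalPoller:
    """Hypothetical wrapper: never read the bus more often than once per min_interval seconds."""

    def __init__(self, read_bus: Callable[[], dict], min_interval: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._read_bus = read_bus          # assumed callback that performs the real bus read
        self._min_interval = min_interval  # user-configurable minimum gap, in seconds
        self._clock = clock                # injectable so the logic can be tested without sleeping
        self._last_poll: Optional[float] = None
        self._cached: Optional[dict] = None

    def poll(self) -> Optional[dict]:
        """Return a fresh reading if enough time has passed, else the cached one."""
        now = self._clock()
        due = self._last_poll is None or now - self._last_poll >= self._min_interval
        if due:
            self._cached = self._read_bus()
            self._last_poll = now
        return self._cached
```

With `min_interval=60.0` this would match the 60-second default I mentioned for IPMIView; the user interface could still refresh every second, but it would only display cached values between real bus reads.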
62,System Event,08/16/2014 09:40:53 Sat,Voltage,+5VSB,Assertion: Lower Non-critical - going low
149,System Event,08/16/2014 09:54:49 Sat,Voltage,CPU2DIMM,De-assertion: Lower Non-critical - going low
150,System Event,08/16/2014 09:54:49 Sat,Voltage,+1.5V,De-assertion: Lower Non-recoverable - going low
151,System Event,08/16/2014 09:54:50 Sat,Voltage,+1.5V,De-assertion: Lower Critical - going low
152,System Event,08/16/2014 09:54:50 Sat,Voltage,+1.5V,De-assertion: Lower Non-critical - going low
153,System Event,08/16/2014 09:54:50 Sat,Fan,Fan4,Assertion: Lower Non-critical - going low
154,System Event,08/16/2014 09:54:50 Sat,Fan,Fan4,Assertion: Lower Critical - going low
155,System Event,08/16/2014 09:54:51 Sat,Fan,Fan4,Assertion: Lower Non-recoverable - going low
156,System Event,08/16/2014 09:54:51 Sat,Voltage,+5VSB,De-assertion: Lower Non-recoverable - going low
157,System Event,08/16/2014 09:54:51 Sat,Voltage,+5VSB,De-assertion: Lower Critical - going low
158,System Event,08/16/2014 09:54:52 Sat,Voltage,+5VSB,De-assertion: Lower Non-critical - going low
(The system shuts down after 15 minutes of this. These are all side effects of the failure to communicate with the system's SMBus.)
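For anyone who wants to check their own IPMI export for the same pattern, here is a small sketch that parses lines in the format shown above. The field layout (ID, event class, timestamp, sensor type, sensor name, description) is inferred from these sample lines only, and the names `IpmiEvent` and `parse_events` are made up for this example; other firmware versions may export a different layout.

```python
import csv
from datetime import datetime
from typing import Iterable, List, NamedTuple


class IpmiEvent(NamedTuple):
    event_id: int
    when: datetime
    sensor_type: str   # e.g. "Voltage" or "Fan"
    sensor: str        # e.g. "+5VSB" or "Fan4"
    asserted: bool     # True for "Assertion", False for "De-assertion"
    detail: str        # e.g. "Lower Critical - going low"


def parse_events(lines: Iterable[str]) -> List[IpmiEvent]:
    """Parse comma-separated event lines; skip anything that does not fit the assumed layout."""
    events = []
    for row in csv.reader(lines):
        if len(row) != 6 or not row[0].strip().isdigit():
            continue  # free-text remarks and blank lines are ignored
        event_id, _kind, stamp, sensor_type, sensor, text = row
        # Drop the trailing weekday ("Sat") so parsing does not depend on the locale.
        when = datetime.strptime(stamp.rsplit(" ", 1)[0], "%m/%d/%Y %H:%M:%S")
        state, _, detail = text.partition(": ")
        events.append(IpmiEvent(int(event_id), when, sensor_type, sensor,
                                state == "Assertion", detail))
    return events
```

Grouping the parsed events by `sensor` and `when` makes it easy to see several unrelated sensors (voltages and fans) changing state within the same few seconds, which is what pointed me at the bus rather than at any single sensor.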
|Tags||No tags attached.|
|Motherboard Model||Supermicro X8DTT-F|
|Video Card Model|
Just a comment:
This may be related to 0002305 in a sense.
At least on Windows 8.1, and I think on earlier versions too, launching SpeedFan with Task Scheduler gives different results.
Wherever SpeedFan is writing to, it is not the "desktop", and some errors do not occur even if Windows records a fault (which it does). I have purposely turned on the SMBus scan and it is rock solid, but it would not be if I were running SpeedFan where its icon or window can be seen.
If you can see what SpeedFan is doing when it is launched by Task Scheduler, then the difference between that and a normal launch may give a hint as to what is happening.
I have a Supermicro motherboard too and noticed that, even if I use SpeedFan to set PWM MODE to MANUAL, something keeps resetting that flag. My best guess is that something else is accessing the SMBus and collides with SpeedFan. There are multiple options here:
1) if the H/M is accessible using both ISA and SMBus, I can selectively disable the SMBus address
2) SMBus access can be completely disabled using the /NOSMBSCAN command line parameter.
Can you send me a SEND REPORT from SpeedFan's INFO tab?
speedfan-report.txt (411,038 bytes)
I attached a report to this ticket. The problem does not appear related to scanning the bus periodically (i.e. when loading SpeedFan). When I load SpeedFan, it still says "Scanning Intel SMBus at $0400", along with a bunch of other SMBus events. The difference is that I disabled it, which it acknowledges with "INFO: SMBus sensors DISABLED by user". Alfredo's comment basically agrees with what I said: "My best guess is that something else is accessing the SMBus and collides with SpeedFan." I'm pretty sure that "something else" is the system's BIOS and/or IPMI system monitoring, which is why the system shuts down after losing contact with the SMBus. Just to clarify again, the events I posted are from the system's IPMI, not any type of Windows-related event. I still think the likely solution is to allow adjustment of how often SpeedFan polls the SMBus.
For me, I'm just fine leaving it completely off, given that the system has been rock solid since I disabled SMBus use in SpeedFan. I mostly wanted to share this with others in case they are dealing with a similar situation on their Supermicro (or other) motherboards.
I also don't think it's related to how SpeedFan starts (i.e. via Task Scheduler), because when testing, I would periodically restart it manually and the problem always eventually showed up.
|2014-11-08 19:32||qd||New Issue|
|2014-11-08 19:32||qd||Status||new => assigned|
|2014-11-08 19:32||qd||Assigned To||=> alfredo|
|2014-11-12 14:14||johnlodge||Note Added: 0007635|
|2014-11-12 17:45||alfredo||Note Added: 0007637|
|2014-11-12 17:45||alfredo||Status||assigned => acknowledged|
|2014-11-12 19:23||qd||File Added: speedfan-report.txt|
|2014-11-12 19:35||qd||Note Added: 0007639|
|2014-11-12 19:36||qd||Note Edited: 0007639||View Revisions|
|2014-11-12 19:36||qd||Note Edited: 0007639||View Revisions|