Thursday, November 17, 2016

A Distraction of Bees During A Furnace Control Board Repair

I woke up half an hour before my alarm the other morning thinking it was rather colder in the house than it should be. The furnace blower was running, but it was just blowing cold air. That’s rather frustrating. Two years ago I had to have the control board in another unit replaced, so I knew this was likely going to be an expensive repair: they weren’t going to have the proper parts, they would need to order them, and I’d have to wait with no heat. After getting the kids to school I went up to have a look.

When I started it back up the inducer blower started up properly and something was rattling around inside the flue pipe. The gas ignitor glowed red for a moment and then the whole thing gave up. It retried 2 or 3 times before quitting for good, starting the blower and flashing me an error code on the control board.

I should have realized at this point that since the ignitor was lighting up, all the interlocks and sensors were working properly. It is unlikely that it would heat the ignitor and then just not turn on the gas if a sensor was not seeing that the inducer fan was running or something. Unfortunately I was still listening to that noise in the flue pipe and was distracted by it. I thought perhaps something had made a nest in the pipe so I disconnected it to have a look. I found dead bees...

Not many, and they were all dead thank goodness, cooked or blackened by the heat. Continuing to ignore the fact that the ignitor was working, I decided that the flue had to be blocked by a hive even though I could hear no buzzing anywhere and no live bees came out of the furnace or the flue. I still wasted the next half an hour taking apart the inducer fan and heat exchanger to see if the sensor pipe was clogged with hundreds of cooked bees or something. This involved messing with some rather friable high temperature fiberglass gaskets that would be nasty if you got the fibers on you or inhaled them. You probably don’t want to remove those parts because of that.

It turns out that the bees were not the problem at all and had just distracted me from what should have been the obvious conclusion that the gas valve wasn’t turning on. Now that I had eliminated the impossible bees I could move on to actual debugging. Keep in mind that your furnace has many ways to kill you: exposed line voltage, high pressure gas lines, quickly spinning sharp things that may start up when your finger is in a place to get sucked into them, a gas ignitor that heats to red hot, and fire, to name just a few. Then if you don’t get it back together right, or if you don’t wire the interlocks back on properly, it won’t protect you from itself later and may burn your house down or kill you and your family from CO poisoning some night months or years from now. Even if it doesn’t kill you it might eat itself, and replacing a cracked heat exchanger because the temp cutout wasn’t working is very expensive and not something you can do yourself. I make it a point to actually look at things before I call a repair man because many times it turns out to be something I can fix and save myself the call. But that’s my choice and you should probably make a different choice for safety’s sake.

Putting the meter on the gas valve showed no power to the solenoid, so it wasn’t just that the gas valve was bad; the board wasn’t sending it any power. At this point, if you’re screwing around inside your stuff, you probably want to turn off the circuit breaker so that you don’t hit the cutoff switch while you’re holding the board and electrocute yourself.

I traced the wires from the gas valve to the large plug at the bottom of the picture above and then to the relay right next to it. In my case there were 2 sets of red and blue wires going into this plug, so don’t rely on color to tell you which wire is which. I gave that relay a short sharp rap with my screwdriver and reconnected it. For 2 cycles it sent power to the gas valve! Success, we found the problem! I can replace a relay on a single sided PC board no problem. Unfortunately it was an 18v coil and I didn’t have any of those lying around to just drop in. I considered replacing it with a solid state relay for a moment, but the ones I have on hand don’t have an input voltage range that goes that high, and they weren’t the same form factor so I would have had to make a mess of it by running wires to an external relay. Not an ideal solution.

Then it occurred to me that when the other furnace failed for a similar reason a couple of years ago I had actually saved the control board that they swapped out at great expense. One of the relays on this board was bad too, but there are 4 identical relays on the board so I should be able to find one of them that was still good!

You’ll see 4 empty positions on this board where I desoldered the relays. The boards are conformal coated on the back and have lots of surface mount components on the back side. You can’t desolder through the coating so I used a dremel with a small wire brush to scrape off the coating and get to the pads.

I was being very careful not to damage the surface mount parts or scrape off the tiny traces that were very close to the relay pins! That was less important on the old board than it would be when working on the new board, but this was an opportunity to practice. It turns out that the brush also removed most of the solder, turning it to very fine dust that covered the board and went into the air. This board is probably older than lead free solder so I decided to do the rest of the pads outside so that I wouldn’t breathe in any lead dust or get it all over my work bench. With most of the solder scraped off the pins they desoldered very easily. It’s interesting in the picture to see that the surface mount transistor and snubber diode that control the relay and protect the CPU from induced back current are right under the relay itself.

I was a little concerned about making sure which of the old relays was bad, but that turned out to be easy. It wasn’t just an intermittent or faulty connection: one of them had a coil that was open circuit, so filtering out the bad one was simple. That left me with 3 good ones, one to repair this board and 2 to save as future spares.

Now it was time to remove the existing board from the furnace. You did turn off the circuit breaker before you did this, right? I would also consider it vital to label all the wires you’re taking off. There are many connections and nothing is obviously labeled on the board other than the thermostat wires. Even the thermostat wires on my furnace didn’t match up color wise: the power lead wasn’t red, it was yellow. They had run the power to the thermostat through the float switch in the pan below the furnace and returned that with a different colored wire to those connections. So don’t just rely on your perfect eidetic memory to remember where all these connections go. Make labels and take pictures.

I was concerned about the yellow and red wire there going to the blower. They were both on a pad that was called “park” rather than going to a specifically named pin that I could label them. So I took pictures of that to get it right again later. It occurred to me later that “park” probably meant that they were not connected to anything and were just “parked” there ;) Looking on the back of the board did reveal that those wires are not actually connected. It’s a place to park extra leads from the blower motors so they aren’t flopping around in there shorting against other things or getting sucked into the blower. I believe they are for running the blower at different speeds and were not needed in my install. In the case of “parked” leads the order you connect them won’t matter. Everything else is important to get right or you’ll cook the board or burn your house down.

This is the existing board. You can see there is quite a bit of discoloration of the conformal coating around some of the surface mount components where they get quite hot. None of them have failed yet, so perhaps they are still within their spec. There is a lot of airflow in that part of the furnace since this board mounts inside the blower cabinet. These still get hot enough to discolor the coating, which doesn’t look good.

Since I had already dremeled off the coating over 4 other relays on the old board, doing it one more time without damaging the surface mount components or board traces was easy. And since I knew the board was bad anyway, if I had damaged them and couldn’t repair it I wouldn’t be any worse off than I already was, except for wasting all this time.

The replacement relay has been soldered on! The conformal coating around the pads bubbled up and burned a bit during the soldering process, but I was less worried about that than about melting the solder on the adjacent surface mount components and having them migrate away. That didn’t happen when I was desoldering on the other board though, and that takes much more time and heat than putting a new one back in. I think the coating would keep them in place even if you did overheat the board, at least to a point, but you’ll want to be careful not to do that.

I put the board back in, used my pictures and labels to reconnect all the terminals on the board, and we have heat again! Nobody does board level repairs anymore but it can certainly be done depending on what the problem actually is. I have 2 more spares for when more relays fail on these now aging furnaces in the house. So far so good, and however many more seasons of heat I can get out of these before they need to be replaced will save money.

Friday, July 22, 2016

OSX’s “powerd” daemon has some bugs...

There are many threads on the internet about problems with the “powerd” daemon in OSX using up huge amounts of CPU time until everything else grinds to a halt. For me this problem started in Yosemite and has continued even through El Capitan 10.11.5. There is a .6 update for that now that I will install on my server shortly and see if that makes any difference.

On one site someone offered a bash shell script to regularly monitor the CPU usage of powerd and to kill it if it goes above 30%. I’ve wrapped that into an executable file that actually starts up in bash and doesn’t throw errors trying to run in tcsh which is the default shell on my system. I also added some code to send the CPU usage to XTension as well as keep track of the restart count. This way I can graph the CPU usage which looks like this over the last 2 days:

It takes just short of 11 hours for the spikes in activity to exceed the 30% threshold that causes the script to kill it and the system to restart it. I have no idea what the problem is, but something is definitely going on there. This graph is 24 hours of samples taken every 10 seconds, so what you can’t see is that most of the time in between those peaks it’s still at 0% usage. Then it goes to do something periodically, and each time it does, it takes a little bit more time. Some table not getting cleared? I don’t know what the cause is but it’s definitely real looking at that. Eventually, if not killed, it doesn’t finish servicing its queue before it has to service it again, and the CPU usage becomes constant and debilitating to the machine.

If you’re also suffering from problems while having to continually kill powerd you can use this script. Cut and paste the code into any text editor. I like TextWrangler rather than the TextEdit built into the system, but any of them will work. Save it with the filename “powerd_monitor.command”. The name is important! It must end in “.command” (and not “.command.txt”, which TextEdit might try to add to the end of it) and it must be named powerd_monitor. This is because the script searches through the list of all processes for those containing the term “powerd” and it has to know how to exclude itself from the list that is returned. If you name it something else that also includes the term “powerd” then it will find multiple matches when searching and everything will error out and fail to do anything for you. So name the file “powerd_monitor.command”.
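To make the naming rule concrete, here is a small sketch of the same grep chain the script uses, run against made-up sample lines instead of real "ps aux" output. The helper name filter_powerd, the sample process lines, and the alternative filename watch_powerd.command are all hypothetical and only there for illustration.

```shell
#!/bin/bash
# Hypothetical helper: the same three greps the monitor script uses.
# Reads "ps aux"-style lines on stdin and keeps lines that mention powerd
# but are neither the grep itself nor the monitor script.
filter_powerd() {
  grep powerd | grep -v grep | grep -v powerd_monitor
}

# Made-up sample lines standing in for real "ps aux" output.
good_name='root 52 0.3 0.1 2500000 3000 ?? Ss 9:00AM 0:01.00 /System/Library/CoreServices/powerd.bundle/powerd
me 901 0.0 0.0 2400000 1000 s000 S+ 9:05AM 0:00.01 /bin/bash /Users/me/powerd_monitor.command
me 902 0.0 0.0 2400000 800 s000 S+ 9:05AM 0:00.00 grep powerd'

# The same listing, but with the script saved under a different name.
bad_name=$(printf '%s\n' "$good_name" | sed 's/powerd_monitor/watch_powerd/')

good_count=$(printf '%s\n' "$good_name" | filter_powerd | wc -l | tr -d ' ')
bad_count=$(printf '%s\n' "$bad_name" | filter_powerd | wc -l | tr -d ' ')

echo "saved as powerd_monitor.command: $good_count matching line(s)"
echo "saved as watch_powerd.command:   $bad_count matching line(s)"
```

With the expected filename only the daemon's own line survives the filter. With the other name the script's own line slips through as well, which is the multiple-match failure described above.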

You have to add the execute bit to the file. Without that it won’t run when you double click it. If you’re terminal aware then that’s easy. For anyone who isn’t, you can do so by opening a terminal window and typing “chmod a+x “ (note the space after the x!) and then dragging the text file you’ve just created into the terminal window and pressing return.
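If you would rather see what that chmod does before touching your real file, this sketch tries it on a throwaway copy in a temporary folder. The folder and the one-line script body are just placeholders; on your own machine you would run the chmod line against wherever you actually saved powerd_monitor.command.

```shell
#!/bin/bash
# Work on a throwaway copy so nothing real is touched.
demo_dir=$(mktemp -d)
script="$demo_dir/powerd_monitor.command"
printf '#!/bin/bash\necho hello\n' > "$script"

# A freshly saved text file has no execute bit set.
[ -x "$script" ] || echo "before chmod: not executable"

# a+x adds execute permission for the owner, the group and everyone else.
chmod a+x "$script"

[ -x "$script" ] && echo "after chmod: executable"
ls -l "$script"
```

The ls line at the end should show x characters in the permission column once the bit has been added.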

If you aren’t an XTension user then you’ll want to remove the 2 lines that echo an applescript into osascript. Otherwise you’ll get an error printed for that with each check. Remove the 2 lines that begin “echo “tell app \”XTension\””

Lastly you must fill in your admin password on the line near the top of the code where it says “sudopass=“. Since powerd is owned by the system and not the user, you can’t kill it without executing the kill command with sudo to get the proper permissions. Otherwise it will fail to actually stop the process.

The output in the terminal is multiplied by 100, so 0.1% comes out as 10 and so forth. I don’t know why the original script author had to do that, but I suspect it’s because bash only does integer math and can’t compare floating point numbers properly.
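Here is a small sketch of that scaling trick on its own. The bash test operator -ge only accepts whole numbers, so a value like 0.5 makes it complain about an integer expression; letting awk multiply by 100 first turns the one-decimal percentages that ps prints into integers that bash can compare. The function names to_hundredths and over_threshold are made up for this example and are not part of the original script.

```shell
#!/bin/bash
# Hypothetical helper: scale a percentage such as "0.1" or "30.5" by 100
# using awk, the same way the monitor script does with column 3 of ps.
to_hundredths() {
  echo "$1" | awk '{print $1*100}'
}

# Hypothetical helper: succeed when the percentage in $1 is at or above
# the threshold in $2, where the threshold is already scaled by 100.
over_threshold() {
  [ "$(to_hundredths "$1")" -ge "$2" ]
}

# 3000 here means 30%, matching the threshold in the script.
for pct in 0.1 29.9 30.0 45.2; do
  if over_threshold "$pct" 3000; then
    echo "$pct% scales to $(to_hundredths "$pct"): would kill powerd"
  else
    echo "$pct% scales to $(to_hundredths "$pct"): leave it alone"
  fi
done
```

The same idea works for any threshold: multiply the percentage you want by 100 and use that in the comparison.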

The interval is currently set to check once a minute, but you can extend that or reduce it as necessary by changing the sleep value at the end. Of course, running the script too often will cause it to use far more CPU time than is necessary and make your problem worse not better. I generated the graph above with a check happening every 10 seconds.


#!/bin/bash
# fill in your sudo password below or it won't be able to kill the process
# in XTension create a dimmable pseudo unit named "powerd cpu usage"
# in XTension create another dimmable pseudo unit named "powerd restart count"
# checks the cpu usage of powerd and if it's more than 30% it will kill it for you
# the output of the echo value is *100 so will say 40 for 0.4%!

sudopass='your sudo password here'
while true ; do
  # column 3 of ps aux is %CPU; skip the grep itself and this script
  cpu_usage=$(ps aux | grep powerd | grep -v grep | grep -v powerd_monitor | awk 'BEGIN {ORS=""} {print $3*100}')
  echo "powerd is using $cpu_usage"
  echo "tell app \"XTension\" to set value of \"powerd cpu usage\" to ($cpu_usage / 100)" | osascript -
  # 3000 is 30% after the *100 scaling; treat an empty result as 0
  if [ "${cpu_usage:-0}" -ge 3000 ] ; then
    echo 'killing powerd!!!'
    echo "$sudopass" | sudo -S /usr/bin/killall powerd
    echo "tell app \"XTension\" to set value of \"powerd restart count\" to value of \"powerd restart count\" + 1" | osascript -
  fi
  sleep 60
done

There is some talk in some of those forum posts that just not running Activity Monitor, or not running some other app that is listening to events from the powerd daemon, solves the problem. I am testing that now on my server but it hasn’t been off long enough to tell yet. In the first hour since I quit it, it seems like it might be making a difference, but it’s too early to tell. I will update this afternoon when I have some more info. I have personally spoken to people who never ran Activity Monitor on their machines during the issue though, so that is not the only cause even if it does help mine. Perhaps something being subscribed to its events is enough to make it start to leak CPU. Perhaps the faulty part is the energy impact calculation, which some of those apps cause it to do constantly. I will play with some of those other issues after I’m done with a cycle without Activity Monitor running.

UPDATE: it sure looks like just not running Activity Monitor is keeping it from climbing away in my case.

Since about 10:30am its CPU usage hasn’t climbed above about 0.5% of the CPU that it’s running on. It shows no signs of continuing to leak CPU time.

This is definitely not the only thing that causes the problem, as my day job work neighbor Michael has it and has never left Activity Monitor running on his machine. I expect that just quitting the program won’t solve the problem, but it might keep it from continuing to escalate until you restart or kill powerd. I will continue to watch and post any further observations as it’s a very frustrating problem!

Wednesday, June 22, 2016

A return to projects and a Vera upgrade...

It appears it’s been more than a year since I last blogged about a project. It’s been a fascinating year and all good things, but so busy. I’ve concentrated almost entirely on bringing the core and UI of XTension up to snuff with the more recent OSX updates. The UI especially has been languishing a little bit and lots of cleanup and just making things work better was necessary. I’ve been doing a weekly build and getting so much stuff done it’s felt really great. The software is starting to shine again now and I believe it is on par with or better than anything else that’s out there. We’ve always had better internals than anyone else ;) but the UI was sometimes hard to use or understand. That is much better now.

Then there was the construction project around here. About 7 or 8 months ago we finally got a contractor who was able to start working on some small additions we wanted done. When we moved into this house it was a little smaller than the last house, which was good, but it didn’t have a couple of things that we really needed. As soon as we sold the old house we were going to add on to the garage for me to have a little workshop and put on a screened in porch for my wife to enjoy. Neither of which you’d think was that huge a project. It took 3 years to get the old house sold, which was WAY longer than we expected, and then it took a further 2 years of mucking about to get a contractor to actually do anything other than a few vague drawings and then disappear for months at a time. Most of them just never returned calls after that initial meeting. One of them who finally did some design work disappeared for 4 months and then came back, returned our check to him and said he was going to retire, that he couldn’t do it anymore! The next one would only return my emails after I followed up suggesting that he might recommend someone with more time on their plate. Finally we landed a company who was excited about the project and whose estimate only came in about 20% over what I wanted to pay for it, so we ran with them! The first set of framing subcontractors had to be fired because they never showed up. The second set of framing subcontractors, who came in to fix everything and rebuild from scratch some of the stuff the first group did, were terrific. They showed up every day they said they were going to, they were skilled and friendly, and they had a foreman on site with the carpenters all the time to make the important decisions and make sure everything was going to plan.

The addition to the garage also gave me an opportunity to extend my office and lab which is above the garage over the new space. Four months and change ago I took apart my professional life, packed it in boxes and stacked it up in the center of the room while they came through the wall and very slowly finished the new section of my space! It’s finally done up here! I am very slowly starting to make sense of where I put everything and am going to rebuild all my workbenches. In another few days I’ll actually have a place to solder things that isn’t the kitchen table! It’s a huge job just given the volume of stuff that I need to organize. I’ll be putting kits together and getting ready to release more XTension kits for more things in the very near future and I’m very excited. Right now it’s a horrific mess, but that will change.

Lastly I finally got around to something I’ve been meaning to do for a long time. XTension and I are huge fans of the Vera device as a Z-Wave interface. We’re not huge fans of the user interface on its web pages nor its flexibility as far as scripting and logic are concerned, which is why we recommend it as the Z-Wave interface for XTension and not as your only home automation device. I had pre-ordered the new Vera Plus which has Zigbee functions, Z-Wave Plus support, Bluetooth LE and all sorts of new low level things that will be very nice. I knew that the backend connections were compatible with XTension as we already have users using them without any difficulty, but some things are subtly different and I really needed it to test with. XTension connects to it via its excellent and well documented REST API and JSON interfaces. Having a documented protocol also makes it possible for us to support things without having to continually reverse engineer the changes that they make. Going forward I’m really only going to support devices with a documented protocol. Their support is pretty good too. I was previously running a Vera Lite interface with a much earlier version of the UI firmware. For any other XTension users that are upgrading, I’ve not written a wiki article yet on the process, but here’s the gist.

Vera has on their site several support articles for upgrading from different boxes to the latest one. Since I had the Lite I followed this one. The only trouble I had was that the new Vera would not accept the transfer until I rebooted it one more time. After that it did and all was well. Both devices will show in their status lines that something is happening if it’s working. It will take the new box some time before the devices start to show up in the devices tab, so don’t panic. It will just say that it’s configuring the Z-Wave devices or some such. When all the devices are visible, even if they aren’t all reachable yet, you can upload the backup file you saved off in an earlier step of the process on the web page above. Initially things aren’t going to work very well. There were several devices that simply were not responding that evening, but this morning it’s finished its re-routing magic and as far as I can tell everything is working again.

The only thing you need to know about upgrading XTension is NOT to create a new interface and point it to the new box. When the units are automatically created you’ll end up with 2 units for each real device, one assigned to the old interface and one on the new interface.

The proper method is to keep the connection to the old box open during the upgrade and transfer process. When the process is complete simply disable the old interface, change the IP to the new box and re-enable the interface. That’s all.