Thursday, November 30, 2006

My first blogger Work Rant...

I work for one of those high tech companies, doing chip design. Same unnamed place as CherkyB. Which probably explains us knowing each other, and his frequent digs about me (CherkyB - FYI - today's was Color Blind LumberJack day).

Unfortunately, I work for a team that took over a project from another team, which not only had their own way of doing things (one of the banes of chip design is having multiple teams doing similar things with different tool suites), but then decided to "play nice" with the rest of us, and start to converge on a standard tool suite. Except they knew how to do things better, so they bolted on some of their tools with some of the lead project's tools. But then they didn't get very far along before a reorg got them to work on something else, and for us to take over that project.

I've been working on integrating one block from the lead project for the last 4 weeks. Why 4 weeks? About 1.5 of those weeks were waiting for this "better" tool suite to compile my code. Between the tools bogging down, trying to handle the load of a large team, and the tools being so complex and confusing, no one really understands what's going on and the documentation is either (a) non-existent, (b) explains only what the switches are, or (c) goes into gory detail on why they chose to do this or that, but absolutely zero detail on what the frack you need to do when you get this type of failure. So when something doesn't work, you have to find a co-worker who's either bumped into it before, or has seen something similar enough they can guess what's going on.

2.5 of those weeks were in ripping out large chunks of the original design, as one of the basic interface systems it uses doesn't exist in our design. (Guess who chose this? And guess what else? Our "better" version has 2x the signals, but is used for test only. Nope, can't have system A, that costs too much, but you can have system B, that's twice as expensive, and not nearly as useful. Great.) And then trying to replace it with what we can use. And then re-doing parts of it, as I learn more about how the original design was intended to work, and have to re-work our version. And then doing it a 3rd time, when I find out that one assumption I was told was wrong, and I need to cram more data crap down this new system in the same amount of time. And then a fourth time to clean it all back up into something that'll be easy enough to modify the original design to, and still fit onto system B interface.

Finally get all that done, and go to turn it in for the next full chip integration on Tuesday. 1st time I've done this on this project. Stupid tool #1 chokes just before finishing, and sits there. I was told that the process would take half a day minimum, so I didn't worry about it. After 4 hours, and no signs of progress, ask a coworker, and find out "Oh - that should have finished in 1 hour, and auto-submitted itself to the next step, so it's not working." Kill it, and do it again. 1 hour later, it finishes happy, and auto-submits to the next step. 4 hours after that, I find out that my turn-in was rejected, because there was a "merge conflict" with a co-worker's change.

Which goes back to the beauty of the "better" tool suite. One of its major features is to enable multiple people to work on the same code simultaneously, and auto-merge as part of the turn-in / build process. Except that the auto-merge only handles half of the simple merging, and pukes on anything simple, like changing the number of spaces on a line of code to make it read easier, even if you're not changing that piece of code. Seems that everyone else on the project has learned to live with ugly looking code. I thought they were just "less visually annoyed" than I was. So - first turnin was rejected, and I have to merge 1 file (out of 16) whose conflicts the tool couldn't figure out. Do this, save it. Go to sanity check it before it gets turned in, and it wasn't right. Fix it, and turn it in.
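For what it's worth, spotting a spaces-only change isn't rocket science. Here's a rough Python sketch of the idea, purely for illustration: it is not how the tool suite works (I have no idea what it does inside), and the function names are made up. It assumes whitespace never matters in the code being compared, which isn't true inside string literals.

```python
import re


def normalize(line: str) -> str:
    """Collapse runs of whitespace to one space and trim both ends."""
    return re.sub(r"\s+", " ", line).strip()


def whitespace_only_change(old: str, new: str) -> bool:
    """Return True when two versions differ, but only in spacing.

    Hypothetical helper: assumes whitespace is never significant,
    so it would misjudge edits inside string literals.
    """
    old_lines = [normalize(line) for line in old.splitlines()]
    new_lines = [normalize(line) for line in new.splitlines()]
    return old != new and old_lines == new_lines
```

A merge step that ran a check like this first could take either side of a spaces-only conflict instead of rejecting the whole turnin.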

Somehow my fixes don't make the turnin, so it fails again overnight. Fix it again, and double check that it made it in. Compile. Failure. Why? This whole "simplified" process has wasted a day of my time, so I tried to do some work in another area to not fall behind. Except this better tool suite tries to understand so much of what you're doing, you can't just make another work area, you have to jump through hoops to tell it that it's really not related at all to your other work area, and yes, trust me, don't try to keep track of both of them together. And no thank you, please don't try to share the files between. And then frack, missed something - need to blast this work area, and try to set it up again.

But back to the turnin. Finally get everything cleaned up - merged code is done. 2 files that got updated with newer code from the other work area cleaned up. Compile again, and another failure. Code that hasn't been touched in days, but now won't compile. Some new checker must have been turned on. Compile again - phew, it finally finishes. Now, all the code in this area has either been regression tested in my original area, or in the area of the co-worker who did the orthogonal change. So, I could, with a clear conscience, turn it in.

But, I decide to do the right thing and just regress it anyway, should just take an hour or so. After 4 hours, 'cause someone else put some long running tests into the regression, I have about 20% of the tests failing. All for the same reason, all for code that is in the changes I'm turning in, but that I haven't touched. Aka same code has been working for months on the lead project, and for the last 4 weeks in my local areas, until today. Now it complains about some runtime checker. Something else must have been turned on in the last day. Error message is cryptic, but as the logic is only used in 2 places, and 1 of them is to just feed back the existing state into the next state calculation, it's probably the other place. Take a quick look - yep, a multiple case statement without a default. Not the end of the world - easy to add the default, but annoying that it worked fine for months until the day after I tried to turn it in.
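If you've never been bitten by a missing default, here's a loose Python analogy. The real code is hardware description language, not Python, and the state names and the request input are invented for the example. The point is only that an unlisted combination falls through and yields no defined next state, which is the kind of thing a runtime checker can start flagging the day it gets turned on.

```python
def next_state_no_default(state: str, req: bool):
    """Next-state logic with a missing default (hypothetical states)."""
    if state == "IDLE" and req:
        return "BUSY"
    elif state == "BUSY" and not req:
        return "IDLE"
    # No final else: any other combination falls through and returns None.


def next_state(state: str, req: bool) -> str:
    """Same logic with the default added: hold the current state."""
    if state == "IDLE" and req:
        return "BUSY"
    elif state == "BUSY" and not req:
        return "IDLE"
    else:
        return state  # default branch covers every unlisted case


# The unlisted case (BUSY with req still high) shows the difference.
undefined = next_state_no_default("BUSY", True)  # no defined next state
held = next_state("BUSY", True)                  # stays in BUSY
```

Holding the current state is just one reasonable choice of default for the sketch; what the real fix should assign depends on the design.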

Compile it again - it passed. Regress it - twice this time. Once in a new area, so all the tests will pass, but it'll take 4 hours, and a second time, with only the failures from the first time, but they'll give me pass / fail in less than 1 hour.

Double check that (a) all the files I want to turn in are in the right area, (b) that area was used for compile, (c) that compile was used for regression, and (d) nothing else is changed in that area. Should be smooth sailing. Of course not. Remember those "other work area problems"? Well, that was happening as I was getting this code cleaned up, so there's some little flag in some file that says I still have one file edited, and I have to either check it in, or break the lock. Go to my area, diff it - no diffs. Go to check it in, to clear the flag - no diffs, so I can't check it in. Try to turnin again, no - it's out for editing, so you can't turn in. Cross my fingers and just break the lock.

Yes, turnin works, but will the auto-merge and compile? Which brings us back to the beauty of the system. I won't know until tomorrow. The automated system that makes all this parallel work and auto-merge possible, and allows the design to have full-chip models built and released multiple times throughout the day (vs. 1 or 2 times a week when we did it all manually), only lets you get 1 or 2 changes in per week. Now, instead of all the changes going in on the same 1 or 2 days a week, they get spread out over the week, but for any one person, the throughput is the same. And it sucks, because the overhead of this system is much higher than the old system.

So - output is same, overhead is higher. That's called progress by some teams.

Think I'll go have some beer.


Nava said...

The JohnnyB must be really pissed!
His rant ends with "Think I'll go have some beer", and yet - 15 minutes have passed before his wife finally decided to remind him to actually do so, as all he did when he was done blogging was sit there and stare with disgusted hatred at his work laptop.

CherkyB said...

I don't know where to start, so I'll just use a list.

1) Sounds more like a whiskey day than a beer day.
2) Your wife's blog has been mentioning the unnamed high tech company at which we both work by name for weeks. That probably means we both have to now end all our posts with a disclosure that we are not speaking for The Company. Or, it means someone has to get a handle on his wife.
3) This all just underscores my feeling that they should have fired you years ago.
4) Fort Collins is hiring. You can work for CJ.

Nava said...

2. Like, Oops?

4. So, the fact that CJ is not big enough to help you carry heavy stuff, means The JohnnyB needs to relocate to the frozen prairies?

CherkyB said...

See, the joke here is that in number 3, I said they should have fired JohnnyB, and then in #4 I tell him we're hiring. See, that's funny. You took something beautiful and ruined it.

CherkyB said...

You heartless cad. Letting your wife suffer cheap gin.

Nava said...

It's the cheap gin blocking my sense of humor.
Or lack of.
Of cheap gin, that is.
Not of sense of humor.