Problem: A large lab environment. Several MSCs, each with a different software load. Several BSSs, again running different software loads, any of which can be connected at different times to different MSCs. All these BSSs have multiple cells, whose coverage can be connected to different offices. (There is more equipment on top of this that needs to be tracked, but I'll use this basic setup to show how wikis can be used to decentralize the management of this lab environment.)
In the past, a single person was in charge of all this. Any changes to the configuration went through him or her. That person tracked the configuration, triggered and tracked the changes, and was also the point of contact if there were any problems. He or she updated the config books, kept track of data changes to all the nodes (a big task in a large environment used by numerous projects), did the short-term booking for the nodes and the long-term planning for the lab, and made sure the lab was correctly configured for each project using each node, i.e. that patches were up to date, that the requested configuration was correct, and that the lab was working when it was needed.
Then came the budget cuts. The position wasn't eliminated as such, but the funding for a dedicated person disappeared. So we were left with a task that remained unchanged but no budget to address it.
The first attempt to remedy the situation was to split the role among several individuals: one person for each MSC, one person for the BSSs, and one person in charge overall. They took on these duties in addition to their usual work. As things got hectic, however, this was the first task to be ignored.
For one thing, this kind of bookkeeping is abhorred by engineers. It is also the most thankless job. No one notices when things are going well; that's what's expected. But when things aren't going well, the amount of complaining is frustrating. It's a very visible failure, and people really like to stick it to you that you're not doing your job. Whatever frustrations people are feeling seem to get taken out on the lab managers when things go wrong.
It was a bit of a Catch-22. On the one hand, you cannot pass this task on to someone without the requisite experience; that's an immediate recipe for disaster. On the other hand, that experience and knowledge are needed for other projects as well, and allocating them to lab upkeep seems a bit of a waste.
We quickly realized that centralizing the task even further was not an option given the available manpower, and that the whole team needed to step up and address it in a decentralized way. The wiki turned out to be a good solution to the problems we faced, though it is not a full solution in itself.
Short-term booking
Booking is done via the wiki on a first-come, first-served basis, per MSC and per cell. Any conflicts are ironed out first between the people involved and then via the project managers. One complaint is that an existing booking can be overwritten without the original booker's knowledge. That's true, but the wiki keeps track of who makes each change, and you can get emails when pages are changed, so there are ways to track this. In the end, though, if there's an a-hole in your department, a wiki gives that person more ways of being an a-hole.
Long-term planning
This has been taken over by a manager. It cannot be handled via a wiki; it really requires one person looking at the inputs and deciding what's required in the future.
Updating the config books
Config books are updated by each person based on the changes they make. If you make a change and it is not reflected in the config book, then anyone may, at any time, change the configuration back to what the config book says. This gives people an incentive to update the config book whenever they make a change. The method works best when the configurations are checked regularly to make sure they match the config book. Someone (perhaps a manager) is still required to check that the config books are up to date, but that person avoids being the target of any bitching. Everyone knows the policy, and if it is not followed, well then... tough.
Hardware changes
This can be decentralized based on experience. Caveat: a change that is not done correctly can hose the system, and finding out what that change was can be time-consuming. We went through stages where all changes were decentralized and, based on our experiences, have since centralized some tasks and left others to the herd. The problem with centralizing a task is that the people responsible for it need to have time to do what is requested. In our lab, given the number of projects running concurrently and the changes they require, that may be beyond the capacity of the person put in charge. It really should be decentralized completely, but complaints have been too numerous.
Once a hardware change is made, the wiki is updated with the change. That way everyone can check the current configuration and see what needs to be requested or done.
Making sure a lab is correctly configured
This again is decentralized to a degree. There are designated people to take care of the activities that should happen at regular intervals. For anything specific to a project, the person running the project is responsible. Any change that has a system-wide effect is noted on specific wiki pages, so if another project is seeing some weird issues, there are specific wiki pages to check. Again, there are complaints, but no one has been able to come up with a better system, and the current system works pretty well, akin to PGP.
The main problem with the current system is people not updating the wiki. The whole system depends on it, but especially when things are extra busy, updating the wiki is not foremost in people's minds. We have threatened punitive measures to keep transgressors in line, but none have actually been imposed. It just seems like they would create more pissed-off people than anything else, and less adherence rather than more.
At the moment, the benefits of using the wiki outweigh the costs. It largely frees us from depending on a single point of contact for lab management, and it frees a single person from mind-numbing work. On the other hand, the amount of work that goes into lab management has decreased significantly, and in general the lab isn't in as good shape as before. Still, the added risk to specific projects has been minimal, and the quality of our products has not suffered as a result of this change.