Preemptive Multitasking

The impetus for this reflection was a theoretical yet practical discussion of the virtues and pitfalls of moving from our existing 9672-R52 mainframe to a Multiprise 3000 H50 processor. The old system is a 5-way MP system (it has 5 CPUs) and the new system is a uniprocessor (one CPU). The new CPU is more than 5 times faster than the effective speed of each of the 5 old CPUs. The discussion centered on the total throughput of the system and the potential impact of CPU wait time, since all processes would now queue for the one CPU, whereas today there are 5 engines to service the requests. Aside from the fact that we run at least 2 online systems that could have multiple dispatchable tasks, we also run the system in LPAR mode, so two operating systems could be concurrently vying for use of the lone processor.
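For the curious, the one-fast-engine versus five-slow-engines question can be roughed out with textbook queueing formulas. What follows is only a back-of-the-envelope sketch, not a measurement of our systems: it assumes random (Poisson) arrivals, exponentially distributed service times, and a new CPU that is exactly 5 times as fast as one old engine. The function names and the load figures are made up for illustration.

```python
from math import factorial

def mm1_response(lam, mu):
    """Mean time in system for a single server (M/M/1) with service rate mu."""
    if lam >= mu:
        raise ValueError("arrival rate must be below the service rate")
    return 1.0 / (mu - lam)

def mmc_response(lam, mu, c):
    """Mean time in system for c servers (M/M/c), each with service rate mu."""
    a = lam / mu      # offered load
    rho = a / c       # utilization of each server
    if rho >= 1:
        raise ValueError("arrival rate must be below total service capacity")
    # Erlang C: probability that an arriving request has to wait for a server
    tail = a**c / factorial(c) / (1 - rho)
    p_wait = tail / (sum(a**k / factorial(k) for k in range(c)) + tail)
    # mean wait in queue plus mean service time
    return p_wait / (c * mu - lam) + 1.0 / mu

# Hypothetical load: 4 requests per unit of time, which is 80% of capacity.
old = mmc_response(lam=4.0, mu=1.0, c=5)   # five engines, 1 request/unit each
new = mm1_response(lam=4.0, mu=5.0)        # one engine, 5 requests/unit
print(f"5 slow engines: {old:.3f}   1 fast engine: {new:.3f}")
```

Under these assumptions the single fast engine gives the better average response time, because every request is served at full speed. The model says nothing about one long-running or looping task holding the only CPU, and that was exactly the scenario we were worried about.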

Pretty theological so far, eh?

So we're postulating scenarios like, "What if the system goes into a tight loop... how will we be able to break in and try to fix it?" "What if one of the online systems grabs hold of the CPU... will other things back up behind it and time out?" "What if MVS puts a stranglehold on the system... will we get missing interrupts in VM?" Our brows were furrowed with concern.

But then we realized that while our concerns were genuine and sounded legitimate, they were pretty much illusory. The system was designed to handle the kinds of things we were concerned about. Since I'm a VM expert, I said to myself, okay, how would VM handle this? (Which is a valid question since with LPAR mode we are essentially running VM.)

Well, in the first place, the system is interrupt driven. So if an I/O interrupt, an external interrupt, or a timer pop goes off, the system will handle the interruption and save the info regardless of what or who is running at the time. We have that whole preemptive multitasking thing going for us. And however much in control of the system and possessive of it I think I am, there is a higher power that is actually running the system. I can do anything I want in my address space or virtual machine, but I do not have control of the system. When I do I/O or execute an SVC, I am requesting service from the system. My requests are almost always honored, but I do not have ultimate control. Even when the operating system allows me to run in supervisor mode, I am still vulnerable to outside influences.
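To make the "timer pop" idea concrete, here is a toy simulation. It is a minimal sketch of plain round-robin time slicing, not how VM or MVS actually dispatches work; the task names, CPU demands, and the one-unit timeslice are all invented for the example.

```python
from collections import deque

def run_to_completion(tasks):
    """No preemption: each task keeps the CPU until it is finished."""
    clock = 0
    finish = {}
    for name, need in tasks:
        clock += need
        finish[name] = clock
    return finish

def round_robin(tasks, timeslice):
    """Preemption: a timer pop takes the CPU away after `timeslice` units."""
    queue = deque(tasks)
    clock = 0
    finish = {}
    while queue:
        name, need = queue.popleft()
        used = min(need, timeslice)
        clock += used
        if need > used:
            # Timeslice expired: back of the line with the remaining work.
            queue.append((name, need - used))
        else:
            finish[name] = clock
    return finish

# Hypothetical workload: one CPU hog ahead of two short online transactions.
tasks = [("hog", 100), ("online1", 2), ("online2", 3)]
print("no preemption:", run_to_completion(tasks))
print("timeslice = 1:", round_robin(tasks, timeslice=1))
```

Without preemption the two short transactions sit behind the hog for 100 time units. With the timer pop they finish within the first few units, and the hog still completes at the same time it would have anyway.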

When running VM, the dispatchable unit of work is a Virtual Machine. The Control Program does not usually get involved in the details of what you do in your little world. But when you interact with the rest of the universe, like doing I/O, the operating system makes sure that you can only blow up your own stuff. And if what you're doing too seriously impacts the system as a whole, the dispatcher makes adjustments to your priority and timeslice. I am supreme in my machine. I can run CMS, VSE, MVS, Linux, AIX, GCS, whatever I want. I can write my own operating system and run that. I can do nothing. When I'm running whatever I'm running, I need have no knowledge that I'm running in a virtual machine. I think I'm doing my own I/O to my own real devices. I think I'm in control of who does what to whom. I think the world ends at the boundaries of my virtual machine. I am mistaken.

This seems like a valid model for life. God dispatches my life and lets me run whatever I want in it. For the most part God does not interfere. The main interference involves honoring my requests for system services. There are frequent opportunities for God to intervene in my life whether I acknowledge it or not, and the requests are almost always honored. It is only when the system as a whole is impacted that restrictive interference is required. And if I get myself into trouble, there is opportunity for the system to provide extra aid to help me through my time of trouble. Priority is raised, timeslices are increased. I can receive the benefits of a benevolent system without even knowing that it's there. If you ask for it, your life can even be granted extra privileges, authority, and CPU share.

Technical addendum: Our concerns turned out to be valid. As we now strain our mainframe, we do see instances where one rogue task impacts the response time of all users of the system. Part of the issue is that we have upgraded our I/O subsystem, so the CPU is now the limiting factor. Where previously tasks were able to "cut in" during I/O waits for DASD, that situation is now rare. When a CPU-intensive task gets dispatched, response time can suffer. While we can do many things to minimize this impact within a given LPAR (running OS/390), the problem manifests itself across LPARs. There is less tuning possible in the LPAR dispatching, so a low-priority task in our production OS/390 LPAR can impact our production VM LPAR. We need more CPU.
October, 2002