Bell Laboratories

Subject: A UNIX™ Operating System for the DEC VAX-11/780 Computer
Case-39394-21
date: July 7, 1978
from: Thomas B. London, John F. Reiser
78-1353-4

MEMORANDUM FOR FILE

1. Introduction

The VAX-11/780 [1] is a new, general-purpose, stored-program electronic digital computer manufactured by Digital Equipment Corporation. At minicomputer prices it provides addresses and data which are 32 bits wide; the traditional minicomputer address space bound of 64K is gone. This memorandum describes the VAX-11/780 and the implementation of a UNIX operating system and complete user environment for it. Section 2 contains an overview suitable for general consumption; details normally of interest only to devotees of computer system architecture appear in Section 3. The authors comment on software portability in Section 4.

2. Overview

Environment. A user of UNIX and C software on the PDP-11 will find that the VAX-11/780 provides a very similar environment. There are no apparent differences in the command language or the vast majority of programs which are customarily invoked directly from the shell. A casual user probably will not be able to distinguish the hardware, except by issuing the command "who am i" (which identifies the hardware and the current user) or by noting that one of the columns printed by the process status command ps is in hexadecimal rather than octal. The C language programmer will find that int, long, and pointer data types all occupy 4 bytes (a short still occupies 2 bytes), and that a long has its two halves stored in a different order on the PDP-11 than on the VAX-11. Characters still suffer sign extension when converted to longer integer types, but one may use the declaration unsigned char.

Hardware. The VAX-11 is a follow-on computer to the PDP-11. The architecture seen by the user-mode assembly-language programmer of a VAX-11 is "culturally compatible" with the PDP-11. Specific details differ, but a programmer familiar with the PDP-11 can quickly understand the differences. The VAX-11 provides UNIBUS and MASSBUS interfaces and uses the same input/output peripheral devices as a PDP-11.

Significant new features of the VAX-11 include an extended virtual address space, intelligent console, and dramatically improved physical packaging. The address space of a process is divided into a few gigantic segments. Each segment is further divided into a large number of small pages. Sufficient hardware exists to make demand paging a viable memory management strategy. All console functions are handled by an LSI-11 microcomputer through a standard ASCII terminal. The terminal may be remotely located from the processor and can still halt, boot, or diagnose the VAX-11. The mechanical and physical design of the VAX-11/780 is well done. The processor contains no sliding drawers or moving cables. All parts are easily accessible for servicing. Adequate airflow is maintained even under maintenance conditions.

Configuration. The actual configuration purchased by Department 1353 is:

    VAX-11/780 cpu
    0.5 megabytes memory with battery backup
    floating-point accelerator
    12Kbyte user-writeable control store
    UNIBUS adaptor with DZ11 (8 RS-232C lines)
    MASSBUS adaptor with TE16 tape drive (800/1600 bpi)
    MASSBUS adaptor with two RP06 disk spindles (176M bytes per spindle)
    additional BA11KE UNIBUS box

The list price of the above configuration in February 1978 was $241,255; the price including a DEC discount to a Bell Labs purchaser was $200,242.

Software. We have implemented a UNIX operating system [2] and complete user software environment on the VAX-11/780. The operating system is Research version 7 as of April 15, 1978.
The environment includes the Bourne shell, C compiler, code improver c2, assembler, loader, debugger, standard I/O subroutine library libS, C subroutine library libc, source code control system SCCS, nroff/troff, and more than 130 commands. Maintenance programs for file system checking, bootstrapping, and physical disk pack handling have also been implemented.

We began with the C language code of Research version 7 of the UNIX operating system, and a PDP-11/45 running UNIX as a bootstrap machine. Creating a C compiler which produced VAX-11 native-mode assembly code was the first task. The code generator portion of the portable C compiler was rewritten to do this. An assembler and loader, based on similar code for the Interdata 8/32, completed the basic support software. Existing PDP-11/70 device drivers for disk, tape, and terminal communication lines were adapted to the VAX-11/780. Assembly language interfaces (trap handlers, hardware initialization, etc.) were completely rewritten. We then created magnetic tapes in the proper format for an initial file system and for deadstart load, and physically carried these tapes from the PDP-11/45 to the VAX-11/780. Work on the C compiler began in mid-December 1977. The hardware arrived on March 3. We held a party on May 19 to celebrate successful multiuser operation of the system.

Performance. Identical documents were formatted by nroff on our VAX-11/780 and on a PDP-11/70 running Research version 7 UNIX; both systems used RP06 disks. Identical C programs were compiled and assembled on the VAX-11/780 and on the PDP-11/70. As reported by the time command, the results (converted to seconds) were:

    nroff -ms -e -T450-12 ios.c >/dev/null

                VAX-11/780    PDP-11/70
        real       47.0          54.0
        user       28.6          36.9
        sys         8.7           7.9

    cc -c -O pftn.c

                PDP-11/70     VAX-11/780    PDP-11/70
                (Ritchie      (portable     (portable compiler
                compiler)     compiler)     for Interdata 8/32)
        real       86.0          82.0          153.0
        user       43.5          64.0          114.6
        sys        11.8          10.5           16.6

From the statistics on nroff one should conclude that, based on user-mode CPU time, the VAX-11/780 can execute the code produced by the VAX-11 C compiler approximately 22% faster than the PDP-11/70 can execute the code produced by the PDP-11 C compiler. This is a measure of the combined power of the hardware and efficiency of the code generated by the compiler. Except as an upper limit, the figures give no indication as to the throughput, system response time, or efficiency of the operating system. The differences in real time and system time between the VAX-11/780 and the PDP-11/70 are not significant.

The times given for compilation of the file pftn.c are an attempt at a "black box" comparison of apples and oranges. The black box is any program (compiler) which takes C language input and produces executable instructions. The black-box comparison is that the current VAX-11 C compiler running on the VAX-11/780 and compiling code for the VAX-11 requires 49% more user-mode CPU time than the current PDP-11 C compiler running on the PDP-11/70 and compiling code for the PDP-11. The apples and oranges aspect arises because the two compilers, while equivalent from the black box viewpoint, are (on the inside) totally different pieces of software. The PDP-11 compiler is a production compiler written by D. M. Ritchie; the VAX-11 compiler is a portable compiler based on work by S. C. Johnson. The figures for the portable compiler running on the PDP-11/70 and compiling for the Interdata 8/32 are included for those who wish to compare two portable compilers. We have no VAX-11 equivalent to the Ritchie compiler, and thus cannot run the tests which would enable the comparison of two production compilers.

The loaded size in bytes of the operating system and seven other programs appears in Table 1. One should note the general similarity between the text (instructions) sizes on the PDP-11 and on the VAX-11, and between the bss (uninitialized data) sizes on the VAX-11 and on the Interdata 8/32. The particular PDP-11 UNIX system chosen has several input/output device drivers and experimental multiplexing software not in the VAX-11 system, which accounts for its larger text size. If many global integer variables (or large arrays) are used, there is a tendency for the data and bss portions to double in size when going from a PDP-11 to a VAX-11 or an Interdata 8/32 because an int occupies two bytes on the PDP-11 and four bytes on the other machines. However, character arrays occupy the same amount of space on all machines. An unusually large number of references to global variables in the nroff program accounts for its increase in text size on the VAX-11 compared with the PDP-11. A program can be written to automatically change the addressing modes used in the VAX-11 code so that most references to global data become shorter than at present, but this has not been done.

Evaluation. We believe that the VAX-11/780 provides an excellent hardware environment for running UNIX and C software. With the software in its current state, we view the system as operationally equivalent to a PDP-11/70 running UNIX software, except that the 64K limit on process address space is gone and programs run faster. We believe that the advanced memory management and user/system communication capabilities of the VAX-11/780 offer an opportunity to construct future UNIX-like systems with substantially higher throughput than provided by today's UNIX on a PDP-11/70.
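The two percentages quoted under Performance can be re-derived from the user-mode rows of the timing tables. The sketch below assumes (the memorandum does not say) that "22% faster" means time saved relative to the slower machine and that "49% more" means extra time relative to the faster compiler; the function names are hypothetical. With the table values as they appear here, the nroff figure comes out near 22.5 and the compilation figure near 47, slightly below the 49 quoted in the text.

```c
#include <assert.h>

/* Percentage of time saved by the faster run, relative to the slower run.
 * Assumed reading of "approximately 22% faster". */
double pct_less_time(double slow, double fast)
{
    assert(slow > 0.0);
    return 100.0 * (slow - fast) / slow;
}

/* Percentage of extra time used, relative to the baseline run.
 * Assumed reading of "49% more user-mode CPU time". */
double pct_more_time(double base, double more)
{
    assert(base > 0.0);
    return 100.0 * (more - base) / base;
}
```

Calling pct_less_time(36.9, 28.6) applies the first formula to the nroff user times, and pct_more_time(43.5, 64.0) applies the second to the pftn.c user times.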
3. Details

Hardware

Four main subsystems (the central processor, console, main memory, and input/output) constitute the VAX-11/780 computer system. The central processor, memory, and input/output subsystems are connected by the Synchronous Backplane Interconnect (SBI), an internal synchronous bus with a maximum data throughput of 13.3 megabytes per second. The SBI deals in physical addresses which are 30 bits wide. Half of the SBI address space is reserved for memory addresses, and half for input/output device registers. Arbitration for bus cycles on the SBI is distributed; each subsystem decides if it will use the next bus cycle.

The central processor is a microprogrammed 32-bit general-register computer. The architecture seen by the user-mode assembly-language programmer is "culturally compatible" with the PDP-11; an expert programmer familiar with the PDP-11 can learn and understand the differences in one day or less. The processor handles binary integers of 8, 16, and 32 bits, single precision (32 bit) and double precision (64 bit) floating-point numbers, character strings up to 65535 bytes long, bit fields up to 32 bits wide, and IBM-style packed decimal strings up to 31 digits long. Bit fields have no alignment restrictions whatsoever; all other data types require alignment only to a byte (8 bit) boundary. The central processor provides sixteen 32-bit general registers. Register 15 is the program counter pc. Software operating in one of the privileged access modes (see below) must use register 14 as a stack pointer sp. The instructions which implement high-level procedure call and return (pushl, calls, callg, ret) assume a convention about the use of sp, register 13 (fp, the frame pointer) and register 12 (ap, the argument pointer). The instructions which handle character and packed decimal strings use registers 0 through 5 to hold pointers and counters, so as to be interruptible. Floating-point operations may use the general registers; there are no separate floating-point registers. Instructions take from zero to six operands. The operation code occupies one byte and is followed by the operands, which require from one to nine bytes each. Nine addressing modes (including all the PDP-11 modes except *-(r)) are allowed, and the addressing modes are independent of the operation code. When the central processor is executing in the context of a process, there are four access privilege modes (user, supervisor, executive, kernel), each with its own stack pointer; software which desires a per-process kernel stack is easy to implement. A fifth stack pointer is used when executing in a special system-wide interrupt context. The VAX-11/780 processor includes an eight kilobyte, two-way set associative, write-through, memory data cache; an eight-byte instruction stream buffer; and a 128-address virtual address translation buffer. Most of the processor is implemented in Schottky TTL MSI logic. A programmable realtime clock and a time-of-year clock (battery operated during loss of line voltage) are standard equipment. Options include a hardwired floating-point accelerator and user-writeable control store.

The console subsystem consists of an LSI-11 computer, local memory, floppy disk, DECwriter terminal, and remote-access communications port. The console is connected directly to the central processor and performs all the functions of a conventional "lights and switches" front panel. The floppy disk serves as the initial bootstrap device for normal operation and holds special microcode for diagnostic operation. When activated by a key switch on the central processor, the remote-access port becomes the console. A terminal connected through the remote-access port can halt the central processor, boot it, diagnose it, etc.

The virtual address space of a process running on the VAX-11/780 consists of 2**32 8-bit bytes. The two high-order bits of a 32-bit address determine one of four segments. Two of these segments are system segments common to the address space of all processes. One of the system segments is reserved for future use. The other two segments are separately defined for each process and are automatically managed by the context switching instructions. One of the per-process segments is designed for a stack which grows towards lower-numbered memory addresses. Segments are divided into pages of 512 bytes. Memory mapping hardware translates virtual addresses into physical addresses using page tables. A page table contains one four-byte entry for each page mapped; the entry contains a valid bit, a four-bit field which encodes access privileges, a modify bit, and the physical page-frame number where the page is mapped. (There is no reference bit which is maintained by hardware!) A base register and a limit register describe the page table of each segment. The base register of a per-process segment contains a virtual address within the system segment; the base register for the system segment contains a physical memory address. The VAX-11/780 central processor contains a translation buffer holding 128 virtual address-page frame number pairs which eliminates the need for extra memory references during address translation for (typically) 98% of all memory references.

The memory is implemented using MOS semiconductor RAMs with an error correcting code which corrects all single-bit errors and detects all double-bit errors and 70% of all greater-than-double bit errors. A memory controller can handle 8 memory boards; using 4K chips each board can hold 128K bytes. There can be two memory controllers, thus the maximum amount of physical memory is currently 2 megabytes. When 16K chips are used (forecasted for late 1978), each board will hold 512K, and physical memory can be 8 megabytes. There is a battery backup option for maintaining data in the event of a power failure. Each optional battery will maintain 1 megabyte for 10 minutes.
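The virtual address layout described in this section (two high-order bits selecting one of four segments, 512-byte pages within a segment) can be illustrated with a few helper functions. This is a minimal sketch under those stated facts only; the function and macro names are hypothetical, and nothing here models the page table entry format or the base and limit registers.

```c
#include <assert.h>
#include <stdint.h>

#define VAX_PAGE_BYTES 512u   /* page size stated in the text */

/* Segment number 0..3, taken from the two high-order address bits. */
unsigned vax_segment(uint32_t va)
{
    return (unsigned)(va >> 30);
}

/* Index of the page within its segment (low 30 bits divided by page size). */
uint32_t vax_page_in_segment(uint32_t va)
{
    return (va & 0x3fffffffu) / VAX_PAGE_BYTES;
}

/* Byte offset within the page. */
uint32_t vax_page_offset(uint32_t va)
{
    return va % VAX_PAGE_BYTES;
}

/* Rebuild an address from its three parts; inverse of the functions above. */
uint32_t vax_compose(unsigned seg, uint32_t page, uint32_t offset)
{
    assert(seg < 4u);
    assert(page < (1u << 21));          /* 2**30 / 512 pages per segment */
    assert(offset < VAX_PAGE_BYTES);
    return ((uint32_t)seg << 30) | (page * VAX_PAGE_BYTES) | offset;
}

/* Number of page table entries needed to map `bytes` of address space. */
uint32_t vax_ptes_for(uint32_t bytes)
{
    return bytes / VAX_PAGE_BYTES + (bytes % VAX_PAGE_BYTES != 0);
}
```

For instance, vax_ptes_for(32 * 1024) yields 64, which agrees with the figure of 64 page table entries for a 32K process given later in the memorandum.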
The input/output subsystem consists of UNIBUS adaptors and MASSBUS adaptors. A UNIBUS adaptor (UBA) is an interface between a standard UNIBUS and the SBI. The UBA does the bus arbitration and everything else necessary to administer the UNIBUS. It also contains a set of registers for mapping UNIBUS addresses to and from SBI addresses. The maximum throughput on a UBA is 1.5 megabytes per second. A MASSBUS adaptor (MBA) is an interface between the SBI and MASSBUS devices (RP06 disk, TE16 tape, etc.). An MBA would be more properly called an RH-780 controller, analogous to the RH-11 controller on a PDP-11/70 MASSBUS; only one unit may transfer data at a time, although several similar units connected to the same MBA can execute control functions simultaneously. The MBA contains the device control registers normally found in an RH controller. The registers lie in the I/O section of SBI addresses. An MBA also contains a set of mapping registers which translate device byte addresses to and from SBI addresses. The maximum throughput on a MBA is 2.0 megabytes per second. The published limits are 1 UBA and 4 MBAs per system. Theoretically one could have any number of either kind as long as the sum of the number of central processors, memory controllers, MBAs, and twice the number of UBAs were 15 or less, since the SBI has 15 "ports".

The physical packaging of the system has been dramatically improved compared with the PDP-11. The VAX-11/780 processor cabinet contains no drawers or moving cables. The SBI is fixed and rigid. Three one-third horsepower squirrel-cage blowers provide sufficient air flow even while servicing the CPU. Any logic card, power supply, or blower can be replaced within twenty minutes by one person using only a screwdriver. The CPU stands 1.53m x 1.17m x 0.77m (HWD); cabinets housing the CPU, UNIBUS devices, and tape drive are usually bolted together to form a single unit 1.53m x 2.51m x 0.77m. Our configuration (see section 2) weighs 3452 pounds and requires 42050 BTU/hr cooling.

C Compiler

A VAX-11 "native mode" C compiler was constructed using S. C. Johnson's portable compiler as a base. After one month, a reasonable version began to evolve: it produced code which was good enough to exercise the assembler, loader, and debugger (on the bootstrap PDP-11/45). This initial version did not make use of VAX-11 indexed addressing (which does single-level array subscripting including appropriate index shifts), bit field instructions, or autoincrement/decrement addressing. It contained its share of bugs, particularly since the hardware had not arrived and could not be used to actually run the generated code.

Substantial effort has been subsequently directed towards improving all aspects of the compiler: bugs have been corrected, routines have been made to execute more efficiently, and the quality of the generated code has been improved. All addressing modes are supported, bit-field instructions are used for programmer-defined bit fields, and autoincrement and autodecrement addressing as well as three-address instructions are used.

Overall, our experience with the compiler has been very favorable. When the VAX-11/780 was delivered, the compiler worked well enough to compile itself, the UNIX kernel, and many user-level commands. In fact, since the delivery of the machine, only about a half-dozen serious bugs have been detected. Additionally, the framework of the compiler has proven itself to be flexible: a compiler for the Interdata 8/32 was transformed into a compiler for the VAX-11/780, some improvements and extensions were easily added, and, in general, a quickly evolving compiler has remained stable and productive. The authors feel that, with a few extensions to the model of the compiler and a certain amount of tuning, the current VAX-11 compiler could easily remain as the production VAX-11 compiler.
There are still some deficiencies in the current version of the compiler, as well as in the basic "product" itself. The compiler is slow and quite large; see the statistics in section 2 and Table 1. Some of the blame for the size and lethargy of the first pass can be attributed to the use of lex for the scanner and yacc for the parser, and to the use of ASCII to communicate information between passes. Both lex and yacc produce large routines: the scanner is 17K bytes in length (over 4.5K bytes of instructions), and the parser is 16K bytes long (over 5.5K bytes of instructions). On the average, the first pass spends 20% of its time in the lexical scanner yylook, and 9% of its time in the parser yyparse.

Using ASCII to communicate between the two passes causes an additional speed penalty for character conversion. On typical programs, the first pass (parser) spends roughly 30% of its time performing output services (i.e., calls to _doprnt (18%), _strout (8%), and printf (4%)), while the second pass (code generator) spends roughly 21% of its time reading it back in (i.e., calls to read (18%) and rdin (3%)). (Additionally, the routine used to convert from ASCII to binary contained a bug which caused "-2147483648" (which is -(2**31)) to be read as zero on our PDP-11/45.)

The above problems are not inherent to the compiler model. To speed up compilation, the scanner can be hand-coded (as in the standard PDP-11 compiler), and the interpass data can be formatted in binary (or the two passes can be combined). With these simple modifications (some are already in progress), it should be possible to produce a compiler almost twice as fast as the current one.

Two features of the VAX-11 architecture (three-address instructions and indexed addressing mode) were difficult to model within the basic structure of the compiler. The full implementation of three-address instructions proved to be so difficult that it was not really attempted. Instead, c2, the assembly language code improver, tries to merge several instructions into an appropriate three-address instruction. For example, the statement a = b+c compiles into

    addl3   b,c,r0
    movl    r0,a

which the improver can change to:

    addl3   b,c,a

for a savings of three bytes and over 400 nanoseconds. However, c2 will not always succeed in this shortening. It cannot tell the difference between

    a = b + c; return;

and

    return(a = b + c);

since register r0 must be considered "live" (i.e., contains a value which may be required later) across the return statement.

The VAX-11 has six indexed addressing modes which yield the address of an element of a one-dimensional array of a base type (char, short, int, long, pointer, float, or double). The statement

    a[i] = b[j] * c[k];

where i, j, and k are declared register int and a, b, and c are double arrays (either external or local), can be compiled into the single instruction:

    muld3   b[j],c[k],a[i]

Although the index specifier (e.g. i in the above example) must be a register, the base address specifier can be any addressing mode except register, literal, or another indexed mode. For example, the C-language constructs a[i], (*p)[i], (--p)[i], (p++)[i], and (*p++)[i] (or their equivalents *(a+i), *(*p+i), *(--p+i), *(p++ +i), and *(*p++ +i), respectively) all can be done with a single VAX-11 address (where a is an array of base type, p is a pointer to the same type, and i is of type register int). It is usually difficult to recognize or conveniently represent such constructs (e.g., (*p++)[i] is fun), or generate the possible cases (e.g., a[i] where a is not readily addressable).

The fact that the code generator can easily recognize only expression trees of height one (two if OREG and UNARY MUL nodes are taken into account) causes substantial difficulty in making use of indexed mode, three address instructions, and indirect addressing. Expression trees of non-trivial height occur not infrequently (e.g. as a worst case, the statement

    a = b + (*p++)[i];

has an expression tree of height six, but can be compiled into the single instruction

    addl3   b,*(p)+[i],a

if p and i are register variables). The complexity of the code generator is raised by forcing the compression of subtrees into single nodes which are then treated with special checks, special code, etc.

The size and alignment attributes of data objects are logically independent, even though previous hardware architectures (IBM 360, PDP-11, Interdata 8/32, ...) have imposed alignment restrictions based on size. The VAX-11/780 has no such restrictions, although programs run faster with data aligned on natural boundaries. The C language has little notion of alignment; because of run-time penalties, the VAX-11 C compiler aligns all the basic data types on address boundaries which are a multiple of sizeof the basic type. Due to questions about alignment, both the language and the compiler have difficulty with the declaration char c:10;.

The decision to naturally align most data items has undesirable side effects which cannot be ignored. Consider the structure declaration

    struct foo { char c; float f; } bar;

On the PDP-11, sizeof (foo) is 6 bytes while on the VAX-11, sizeof (foo) is currently 8 bytes (the offset of f within bar is 2 and 4 respectively). sizeof (foo) could be 5 bytes in each case. Although both machines use the same data formats for chars and floats, the differing alignment imposed by the VAX-11 C compiler means that the two machines cannot speak directly to one another using media which record structures containing binary information.
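The sizes and offsets quoted for struct foo follow from a simple padding rule, sketched below. The sketch assumes that the only variable is the alignment required for the 4-byte float (2 on the PDP-11 and 4 under the VAX-11 compiler, as the quoted offsets imply) and that the structure's total size is rounded up to that same alignment; the names foo_layout and round_up are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

struct layout {
    size_t offset_f;   /* offset of member f within the structure */
    size_t size;       /* total size of the structure, padding included */
};

/* Round n up to the next multiple of align. */
static size_t round_up(size_t n, size_t align)
{
    return (n + align - 1) / align * align;
}

/* Layout of `struct foo { char c; float f; }` when a 4-byte float must
 * start on a multiple of float_align bytes (assumed padding rule). */
struct layout foo_layout(size_t float_align)
{
    struct layout r;

    assert(float_align > 0);
    r.offset_f = round_up(1, float_align);        /* c occupies byte 0 */
    r.size = round_up(r.offset_f + 4, float_align);
    return r;
}
```

With float_align set to 2, 4, and 1 the rule gives sizes of 6, 8, and 5 bytes, matching the PDP-11 figure, the VAX-11 figure, and the 5-byte packing the text says would be possible.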
Since alignment is important, we feel that it ought to be specifiable in the C language.

Operating system conversion

A UNIX system running on a PDP-11/45 was used as the base for transporting software to the VAX-11/780. The software itself originated with the code produced by members of Center 127, Computing Science Research, for the Interdata 8/32. Programs were cross-compiled, assembled, loaded, and put on magnetic tape in tp format; absolute bit-string files were put on tape in dd format. Tapes were then carried across the room to the VAX-11/780. An absolute tape boot (in machine language), tp boot and primary disk boot (in assembly language), secondary disk boot (in C), and stand-alone utilities (disk formatter, disk verifier, tape-to-disk, disk-to-tape, disk-to-disk, and disk-to-console, all in C) were then used to bring up the system.

Establishing an initial file system on the disk took longer than expected. The PDP-11/45 was running USG issue 3 of the UNIX operating system with a "16-bit" file system and the VAX-11/780 was to have a Research version 7 "32-bit" file system. Also, C-language code on the VAX-11 expects the bytes of a 32-bit integer to be stored in a different order than C-language code on the PDP-11. We swallowed these two red herrings hard, and suffered. We now know that the proper way to create an initial file system is to modify the program mkfs so that its output (on the bootstrap machine) is a file containing the proper bits, put that file on tape, and use the tape-to-disk utility on the target machine.

Mapping the software architecture of the UNIX operating system onto the hardware architecture of the VAX-11 required a number of decisions. Commentary on these decisions follows.

The SCB (system context base) processor register contains a page-aligned physical memory address which is the base of the hardware interrupt vector. The UNIX system puts this vector at physical memory address zero.

Operating system code, data, kernel stacks, and interrupt stack occupy the VAX-11/780 system segment (virtual addresses 80000000 to bfffffff). User code and data are loaded into segment zero (0 to 3fffffff) and the user stack is initialized in segment one (7fffffff to 40000000).

User processes pass arguments to system service code using the ordinary subroutine calling sequence. The chmk instruction is then used to gain kernel privileges. The chmk instruction switches the stack pointer sp from the user stack to the kernel stack, but does not change the argument pointer ap or the frame pointer fp. The kernel uses the value in ap to copy the arguments into u.u_arg. The VAX-11 hardware allows the values to be directly addressed, but the kernel software requires the copy.

The u area is a per-process data structure in which the operating system keeps swappable information about a process. The kernel virtual address of the u area must be a constant across all processes. The PDP-11 implementation puts the u area at kernel data space address 0160000; when process switching occurs the u area is switched by changing a kernel segmentation register. Since the operating system can address user memory on a VAX-11, the u area could be placed in (protected) user memory, say at address 0 or at 7fffe000. However, it was desirable for the first implementation to make the page tables for user segments part of the u area, which creates timing problems unless the u area lies in system space. The base of the u area was assigned kernel virtual address 80020000. When process switching occurs, the u area is changed by changing the system-space page table and invalidating the page-table translation cache for the appropriate pages.
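The argument copy mentioned in the discussion of chmk can be sketched as follows. The sketch relies on the VAX-11 calling convention that ap points at an argument list whose first longword holds the argument count in its low-order byte, with the arguments following. Everything else is an assumption for illustration: the name copy_syscall_args, the bound NARGS_MAX, and the use of an ordinary C array in place of user memory (a real kernel would also have to validate the user addresses it reads).

```c
#include <assert.h>
#include <stdint.h>

#define NARGS_MAX 8   /* hypothetical capacity of the u.u_arg array */

/* Copy the arguments of a calls-style argument list into u_arg.
 * ap[0] holds the argument count in its low byte; ap[1..n] are the
 * arguments.  Returns the count, or -1 if the list claims more
 * arguments than u_arg can hold. */
int copy_syscall_args(const uint32_t *ap, uint32_t u_arg[NARGS_MAX])
{
    unsigned n, i;

    assert(ap != 0 && u_arg != 0);
    n = ap[0] & 0xffu;
    if (n > NARGS_MAX)
        return -1;
    for (i = 0; i < n; i++)
        u_arg[i] = ap[i + 1];
    return (int)n;
}
```

Copying rather than addressing the list in place means later kernel code sees a stable snapshot of the arguments, which is one plausible reading of why "the kernel software requires the copy".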
Since the operating system can directly address the memory of the current user process, the procedures fubyte, subyte, fuword, etc., are unnecessary and could be made into macros which would merely do the appropriate load or store. However, these procedures (along with copyin and copyout) were kept to ensure that each access to user space is valid.

A VAX-11/780 internal processor register called the PCB (process context base) points to an area in which the VAX-11/780 saves the hardware state of the machine (96 bytes) when switching context. This save area was put in the u area as u_rsav.

The implementation of context switching required major effort. The VAX-11 has two very nice instructions (svpctx, save process context, and ldpctx, load process context) which facilitate context switching. Unfortunately, they do not implement the mechanism which the UNIX system expects. (The mechanism used by UNIX is so dispersed and intricately detailed that it is hard to imagine any hardware which implements it directly.) The temptation to drastically change the UNIX code has been resisted so far. The savu/retu/aretu tar pit was eliminated, but it took more than a week. The newer save/restore primitive does make the C-language code prettier, but the assembly-language side (at least for the VAX-11) is just as dirty as ever. The UNIX context switching mechanism requires three state save areas, u.u_rsav, u.u_ssav, and u.u_qsav because the same mechanism is also used for abnormal returns. The VAX-11 context switching instructions use only a single state save area. To make use of the VAX-11 instructions, the software simulates a great deal of microcode and bastardizes call frames in a most ugly manner. Context switching is certainly high on the list of things to rewrite in the second implementation (even for the PDP-11!).
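The reason given earlier in this section for keeping fubyte and subyte as procedures, namely that each access to user space must be checked, can be illustrated with a small user-level simulation. This is only a sketch: struct uspace, fubyte_sketch, and subyte_sketch are hypothetical names, the "user space" is an ordinary buffer with a base address and length, and the check is a plain bounds test rather than the page-table validity test a kernel would perform. Returning -1 for an invalid address follows the customary UNIX convention for these routines.

```c
#include <assert.h>
#include <stdint.h>

/* Simulated user address space: `len` bytes starting at address `base`. */
struct uspace {
    uint32_t base;
    uint32_t len;
    unsigned char *mem;
};

/* Fetch one byte from user space; returns 0..255, or -1 if addr is invalid. */
int fubyte_sketch(const struct uspace *u, uint32_t addr)
{
    assert(u != 0 && u->mem != 0);
    if (addr < u->base || addr - u->base >= u->len)
        return -1;
    return u->mem[addr - u->base];
}

/* Store one byte into user space; returns 0, or -1 if addr is invalid. */
int subyte_sketch(struct uspace *u, uint32_t addr, unsigned char value)
{
    assert(u != 0 && u->mem != 0);
    if (addr < u->base || addr - u->base >= u->len)
        return -1;
    u->mem[addr - u->base] = value;
    return 0;
}
```

A macro that merely performed the load or store would be faster, but it would drop exactly the test that makes a bad user pointer a reported error instead of a kernel-mode fault.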
The procedures sureg and estabur were also tricky They were designed with fewer) of registers would be needed to map the the assumption that only a small number (16 or process requires 64 page table of a user process, while on the VAX-11 a 32K address space entries. Furthermore, the memory expand and getxfile. Handling DMA map in of a process is diddled in tricky ways, particularly eneck. I/O hardware was the other major implementation bottl The UBA ry page numbers, and physical addresses are and MBA mapping registers contain physical memo hardware which implements the mapping hard to handle. It is not pleasant to deal with the ing registers may be neither read nor registers. If an I/O transfer is in progress then the mapp by the transfer. As a result, the written; this applies even to registers which would not be used ng the current 1/O operation. Furthermap for the next I/O operation cannot be setup duri the byte counter is only 16 bits wide. more, a single transfer is limited to 64K bytes because I/O operations. The solution to these ple Thus swapping a process to the disk can require multi registers in each map to service both problems involved permanently reserving the last 129 ters are available to map the system swap and physical I/O operations. The remaining map regis ECC error correction is currently buffers, and are loaded at system initialization time. Disk s on raw I/O cause process terminadone only for /O involving the system buffers. Disk error tion, the swap area on disk had better be error-free. entation for the VAX- 11/780 Like the UNIX system for the PDP-11, the current implem when there y and swaps processes to disk maintains each process in contiguous physical memor fragmentation is not enough physical memory to contain them all. Reducing external memory a a g hardware for scatter loading is high on to zero by utilizing the VAX-11/780 memory mappin pass. 
To simplify kernel memory allocation, the size of the user-segment memory map is an assembly parameter which currently allows three pages of page table or 192K bytes total for text, data, and stack. This also deserves to be rewritten, both to allow varying process size, and to allow processes larger than physical memory through demand paging. Dynamic page table size would mean dynamic u area size if the page table remained part of the u area.

The code in sendsig for sending a signal to a process involves a tedious simulation of the calls instruction due to the problem of "inward return" across privilege modes upon termination of the routine which handles the signal. Making a portion of the kernel code readable by a user-mode process would simplify sendsig. Motivated by a problem with the Bourne shell, the signal number is passed as a parameter to the signalled routine.

Interprocess communication via signals (signal and kill) uses the low-order bit of a machine address for something other than addressing. This implies that a procedure which handles signals must start on an even byte boundary, which means that every procedure must start on an even byte boundary. The C compiler thus issues a pseudo-op to the assembler to align the beginning of each procedure. This can waste memory on a VAX-11. It also imposes a nontrivial requirement on the assembler, since if the resolution of conditional jump instructions can change the parity of the length of a procedure then the alignment directive must also be handled like a conditional jump. In hindsight, it would have been better if a distinct value (say +1 or -1) were used for ignore, rather than multiplexing the bottom bit.

The VAX-11/780 provides a (non-maskable) trap for integer division by zero. The system would like to turn this into a signal to the process. A similar situation exists for subscript range trap.
Integer overflow, floating overflow, floating underflow, and reserved operand also need signal numbers. Perhaps only one "error" signal is needed with some other means for determining the true fault. The whole business of interrupts, signals, asynchronous I/O, and the use of the hardware AST mechanism deserves more attention.

A bug was discovered in the UNIX code for process termination involving the proc and xproc structures. (The problem also existed on the PDP-11, but it would only be noticed if a process had accumulated more than 65535 ticks of system time, which is highly unlikely.) When a process dies its resource utilization statistics (currently only exit status, system, and process CPU time) are temporarily saved so that they can be added to the totals for the descendants of the parent process. The actual accumulation is done by the kernel when the parent process issues a wait system call; the child process is then completely erased. The kernel was overlaying the statistics in a part of the proc structure normally used by the scheduler to contain the pointer p_textp. Ordinarily the exit was processed immediately, causing no harm. But if the system was loaded so that swapping was necessary, then the scheduler could sneak in after the child exited and before the parent read the statistics, and would interpret the timing data in the zombie xproc structure as a pointer. This invariably caused an illegal memory reference from kernel mode on the VAX-11/780.

One of the greatest disappointments with the current system stems from a design quirk in the FP-11 floating-point processor for the PDP-11. When converting between floating-point and 32-bit integer, the FP-11 expects the high-order 16 bits of the integer to be stored at the lower memory address; this is not in line with the general "right to left" design of the PDP-11, which would place the low-order 16 bits in the lower memory address. C code for the PDP-11 uses the FP-11 convention for storing long integers.
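The layout just described can be contrasted with a purely "low-order first" layout in a few lines of C. This is a sketch under one assumption not stated in the text, namely that each 16-bit word on the PDP-11 is itself stored low byte first; the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/*
 * PDP-11 C layout as described in the text: high-order 16-bit word at the
 * lower address. Assumption: each word is stored low byte first.
 */
void store_long_pdp11(uint32_t v, unsigned char out[4])
{
    out[0] = (unsigned char)((v >> 16) & 0xFF);  /* high word, low byte  */
    out[1] = (unsigned char)((v >> 24) & 0xFF);  /* high word, high byte */
    out[2] = (unsigned char)(v & 0xFF);          /* low word, low byte   */
    out[3] = (unsigned char)((v >> 8) & 0xFF);   /* low word, high byte  */
}

/* Least significant byte at the lowest address, most significant last. */
void store_long_vax(uint32_t v, unsigned char out[4])
{
    out[0] = (unsigned char)(v & 0xFF);
    out[1] = (unsigned char)((v >> 8) & 0xFF);
    out[2] = (unsigned char)((v >> 16) & 0xFF);
    out[3] = (unsigned char)((v >> 24) & 0xFF);
}

/* Convert one stored long in place by exchanging its two 16-bit halves. */
void pdp11_long_to_vax(unsigned char b[4])
{
    unsigned char t;
    t = b[0]; b[0] = b[2]; b[2] = t;
    t = b[1]; b[1] = b[3]; b[3] = t;
}
```

Under these assumptions, converting a file of long integers between the two conventions amounts to swapping the two halves of every long, but only if the program knows where the longs are; that is why such files are not binary compatible.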
The VAX-11 hardware stores the least significant bit of any integer data type in the lowest addressed byte. C code for the VAX-11 uses the hardware convention. This means that files containing long integers represented in the local convention are not binary compatible between a UNIX system on the VAX-11 and a UNIX system on the PDP-11. This is the only exception for data types common to both machines: char, short, float, and double all have a common representation. Except for this (and the structure alignment problem noted earlier), disk packs containing file systems, tapes, etc., would have been interchangeable. The fact that DEC's Fortran-IV Plus for the PDP-11 avoided the FP-11 convention, and that RSX-11 files are binary compatible between the VAX-11 and the PDP-11, is only salt on an open wound!

Subroutine libraries

libc. Conversion of the system-call interface routines was straightforward but tedious. Most routines are merely

        .word   0x0000
        chmk    $nn
        bcc     L1
        jmp     cerror
L1:     ret

The routines printf, ecvt, and fcvt were left to libS and were not implemented in libc.

libS. Conversion of the standard input/output library libS posed no problems except for __doprnt, the routine which constructs character representations of other datatypes for the printing routines printf, fprintf, and sprintf. Since many programs spend 15% to 20% of their execution time within __doprnt, it pays to code the routine for speed in assembly language. Packed-decimal instructions handle decimal, unsigned, and floating-point conversions. The algorithm chosen for converting from floating-point to character string revealed a microcode bug in the VAX-11/780's ashp (arithmetic shift and round packed) instruction. Under certain conditions a carry from the rounded digit propagated both to the adjacent digit and to the digit eight places further left. This usually caused an overflow, since the destination packed-decimal string was typically not long enough to represent the spurious carry.
DEC claims to have a fix for the bug, but the FCO has not arrived. In the meantime a five-instruction patch detects and corrects the spurious overflow.

Commands

as, ld. Code developed by Center 127 for the Interdata 8/32 was the model for an interpretation by a VAX-11/780 artist. The assembler uses an algorithm described in [3] with heuristic improvement of [4] to resolve conditional jump pseudoinstructions. Variable-length, unaligned instructions and address constants forced the relocation information in object files to include the explicit segment-relative address for each relocatable datum, rather than deducing the address from a one-to-one correspondence between the position in the segment and the corresponding position in the relocation table. This caused a slight change in the header information within object files.

c2. The code improver for the assembly language generated by the VAX-11 C compiler is based on a similar program for the PDP-11. A "backwards" register usage pass, performed once and before anything else, was a major addition. Knowing that no temporary register is live across a backwards jump, the register usage pass introduces three-address instructions wherever possible. It also recognizes situations where jump on bit (jbc, jbs, jlbc, jlbs), extract field (extzv, movzbl), and move address (moval, movab, pushal, pushab) instructions can be used. The code for insertion of fancy loop control instructions sob, aob, acb was also extended.

adb. The most significant change to the symbolic debugging routine was the writing of a disassembler for VAX-11 native-mode instructions. Additionally, the character input and output routines were modified to use a default radix for all numeric values. The radix is initialized to sixteen.
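The resolution of conditional jump pseudoinstructions mentioned under as, ld above is an instance of the span-dependent instruction problem: the size of a jump depends on how far away its target is, and that distance depends on the sizes of the jumps in between. The sketch below shows one generic approach (start every jump in its short form and lengthen jumps until nothing changes). It is not the algorithm of [3] nor the heuristic of [4]; the instruction sizes, the reach of the short form, and all names are invented for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Invented encoding: short jumps take 2 bytes and reach -128..127. */
enum { SHORT_SIZE = 2, LONG_SIZE = 6, SHORT_MIN = -128, SHORT_MAX = 127 };

struct item {
    int is_jump;     /* nonzero if this item is a jump pseudoinstruction */
    long size;       /* bytes: fixed for non-jumps, chosen here for jumps */
    size_t target;   /* index of the item jumped to (jumps only) */
};

/* Address of item `index` = sum of the sizes of all earlier items. */
static long address_of(const struct item *items, size_t index)
{
    long addr = 0;
    size_t i;
    for (i = 0; i < index; i++)
        addr += items[i].size;
    return addr;
}

/* Choose a size for every jump; returns the total length in bytes. */
long resolve_jumps(struct item *items, size_t n)
{
    size_t i;
    int changed;

    for (i = 0; i < n; i++)
        if (items[i].is_jump) {
            assert(items[i].target <= n);
            items[i].size = SHORT_SIZE;      /* optimistic start */
        }
    do {
        changed = 0;
        for (i = 0; i < n; i++) {
            long disp;
            if (!items[i].is_jump || items[i].size == LONG_SIZE)
                continue;
            /* displacement measured from the end of the short jump */
            disp = address_of(items, items[i].target)
                 - (address_of(items, i) + SHORT_SIZE);
            if (disp < SHORT_MIN || disp > SHORT_MAX) {
                items[i].size = LONG_SIZE;   /* jumps only ever grow */
                changed = 1;
            }
        }
    } while (changed);
    return address_of(items, n);
}
```

Because a jump is never shortened once lengthened, the loop terminates after at most one pass per jump. The example in the test shows the cascade that makes the problem interesting: lengthening one jump pushes the target of another out of short range.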
sh. The (Bourne) shell is the standard user command interpreter. It required by far the largest conversion effort of any supposedly portable program, for the simple reason that it is not portable. Critical portions are coded in assembly language and had to be painstakingly rewritten. The shell uses its own sbrk which is functionally different from the standard routine in libc. The shell wants the routine which fields a signal to be passed a parameter giving the number of the signal being caught; signal also was a private routine. This was handled by having the operating system provide the parameter in the first place, doing away with the private code for signal. The code in fixargs (for constructing the argument list to an exec system call) had to be diddled.

ps, iostat. The process status and input/output status commands consistently referenced /dev/mem (physical memory) when they should have referred to /dev/kmem (kernel virtual memory). iostat also assumed that certain variables maintained by the kernel were allocated contiguously, even though they were not declared as part of a structure.

pr. The command which formats and prints files had a bug that caused a division by zero when it was asked to print several files and the first file in the list did not exist. On a PDP-11 division by zero returns the dividend, but on a VAX-11 it gives an unmaskable trap.

cat, du. These two commands did not count their arguments using the first parameter argc, but rather assumed that an additional argument (argv[argc], initialized as -1) could be used as a pointer. On the PDP-11 the resulting address references the fixed end of the stack; on the VAX-11, -1 is an illegal address.

nroff, troff. The source code for the document preparation and phototypesetter commands is not portable; several weeks were required to produce a properly running version of these commands.
Use of the explicit (or worse, implicit) constant "2" instead of sizeof(int) was quite common. The code assumes that variables which are adjacent in external declarations occupy contiguous memory at execution time. Several tables are initialized by assembly-language programs. Converting the tables was merely tedious; changing the code which thought it knew the format of an a.out file required some effort. This memorandum was created using the converted nroff/troff programs on the VAX-11/780.

SCCS. Version 4 of the Source Code Control System [5] is used to provide version backup for software in case disastrous bugs are introduced. The source for SCCS itself had not quite been converted to version 7 UNIX, and the header files required some massaging. The PWB routines logname and pexec had to be simulated. The utility procedures for dynamic storage allocation required some work to integrate them with libS and to remove PDP-11 dialect. The exit status of the diff command changed in version 7, causing delta to bomb. The code implicitly assumed that all checksums were computed modulo 65536. The documentation is incorrect: everywhere "99999" appears it should really say "65535". The procedure satoi returns two values, storing one of them indirectly through a pointer parameter. Naturally, satoi and its callers did not agree on sizeof the stored value; this took a day to track down.

4. Software portability

We thank the members of Center 127, Computing Science Research, for their efforts in producing the basic software and for their recent efforts towards making the software portable. The fact that people other than the original developers can quickly create a running system for a new machine is a tribute to how well the original work was done.

Yet in our effort to transport a complete UNIX system to the VAX-11/780 we stumbled across a large number of nonportable constructions and were dismayed by the seeming lack of appropriate facilities to detect and prevent them.
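One of the nonportable constructions noted above under SCCS, the implicit assumption that checksums are computed modulo 65536, is easy to illustrate. On a machine with 16-bit integers a running sum wraps at 65536 without any help; with 32-bit integers it does not, so the reduction has to be written down. The sketch below uses a plain sum of bytes purely as an illustration; it is not claimed to be the SCCS checksum, and the name checksum16 is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Sum of bytes reduced modulo 65536. The mask makes the modulus explicit,
 * so the result does not depend on the width of int or long.
 */
unsigned long checksum16(const unsigned char *data, size_t len)
{
    unsigned long sum = 0;
    size_t i;

    assert(data != NULL || len == 0);
    for (i = 0; i < len; i++)
        sum = (sum + data[i]) & 0xFFFFUL;   /* explicit "modulo 65536" */
    return sum;
}
```

For 300 bytes each equal to 255 the unreduced sum is 76500, and the function returns 76500 - 65536 = 10964 on any machine. Code that relied on 16-bit wraparound would instead report 76500 when recompiled with 32-bit integers.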
Based on our experience, we strongly recommend that the C language and compiler be enhanced so that:

1. The actual arguments in a procedure call are type checked against the procedure declaration, and a "dummy" declaration which specifies types is permitted even if the called procedure is not actually declared in the same compilation.

2. The '->' operator is checked to insure that the structure element on the right is a member of a structure to which the pointer on the left may point.

3. A structure element may be declared with any name as long as the name is unique within the immediately surrounding structure. (The current requirement that a structure element name must uniquely correspond to an offset from the beginning of the structure, across all structures in a compilation, creates naming problems and frequently leads to errors of the type noted in item 2 above.)

4. The issue of alignment to an even-byte (or other) boundary is brought into the open, so that arbitrary data structures can be accurately described.

There is a program called lint [6] which, if conscientiously used throughout the life of a piece of software, provides type checking which partially addresses the first two points in the above list. The problem is that lint is big, noisy, relatively recent and unknown, and (partially as a result) infrequently used. There is little incentive for the average programmer to use lint as a matter of course. The authors believe that type checking belongs in the everyday compiler as the default, where it is very inexpensive to implement. Those who wish to do "dirty" work may request that type checking be disabled; those who wish to bless their dirty work may use type casts.

We believe that these four enhancements would go a long way towards making C language software portable as a rule rather than as an exception, thus preserving Bell Laboratories' investment in present and future C software.
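The first recommendation can be illustrated with the kind of routine that caused the satoi trouble: a procedure that returns one value directly and stores a second through a pointer. The sketch below is written in later C, where a declaration may state parameter types (function prototypes were standardized after this memorandum was written). The routine parse_leading_int and its signature are hypothetical; this is not the SCCS satoi.

```c
#include <assert.h>
#include <stddef.h>

/*
 * A declaration that specifies parameter types. With it in scope, a caller
 * that passes the address of an int or a short where a long is expected
 * draws a compiler diagnostic instead of silently storing the wrong size.
 */
size_t parse_leading_int(const char *text, long *value_out);

/*
 * Parse the decimal digits at the start of text.
 * Returns the number of characters consumed and stores the value
 * through value_out. No overflow check is made in this sketch.
 */
size_t parse_leading_int(const char *text, long *value_out)
{
    size_t used = 0;
    long value = 0;

    assert(text != NULL && value_out != NULL);
    while (text[used] >= '0' && text[used] <= '9') {
        value = value * 10 + (text[used] - '0');
        used++;
    }
    *value_out = value;
    return used;
}
```

A caller would write `long v; size_t n = parse_leading_int("123abc", &v);`, after which n is 3 and v is 123. The point of the declaration is that caller and callee can no longer disagree about the size of the stored value without the compiler noticing.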
Acknowledgments. Thank you, D. M. Ritchie and S. C. Johnson, for answering questions at key moments; G. K. Swanson, for assistance with boot procedures and stand-alone utilities; J. F. Jarvis, for the mathematical function library; and D. K. Sharma, for help in bringing up user-level commands. Additional thanks go to many other members of Centers 127 and 135, and Department 8234, for helpful comments and suggestions.

Thomas B. London
John F. Reiser

HO-1353-tbl/jfr
Att: References, Table 1

References

1. VAX-11/780 Architecture Handbook. Digital Equipment Corporation, Maynard, Massachusetts, 1977.
2. D. M. Ritchie and K. Thompson, The UNIX Time-Sharing System, CACM 17, 7 (July 1974), 365-375. See also BSTJ 57, 6 (July-August 1978), 1905-1929.
3. W. Wulf, R. K. Johnsson, C. B. Weinstock, S. O. Hobbs, and C. M. Geschke, The Design of an Optimizing Compiler. American Elsevier, New York, 1975.
4. J. F. Reiser, Common Instances of Pathological Span-dependent Instructions, TM 78-1353-3.
5. SCCS/PWB User's Manual, The Source Code Control System.
6. S. C. Johnson, lint, a C Program Checker. Computing Science Technical Report #65, Bell Laboratories, December 1977.

Table 1. Loaded Program Sizes (in bytes)

Program   System   Text   Data   Bss   Total

2470 44040 79976
PDP-11 VAX-11 Interdata 8/32
PDP-11 48064 Interdata 8/32 94574 39208 78216 11904 39448 19826 29492 32192 17656 23512 24920 74218 90524 117718
PDP-11 VAX-11 Interdata 8/32 21248 23408 35652 6254 9092 9032 5246 7552 7560 32748 40052 52244
PDP-11 VAX-11 Interdata 8/32
VAX-11 34476 4292 131088
C, pass1   C, pass2   grep
PDP-11 1936 Interdata 8/32 11950 1160 1936 15046
PDP-11 VAX-11 Interdata 8/32 768 1140 1920 3856 5764 5768 11728 13788 23348
PDP-11 29312 6684 7842 43838 9408 10636 58836 6656 1578 2104 10338
VAX-11
ls   nroff
VAX-11 Interdata 8/32
sort
PDP-11 VAX-11 Interdata 8/32 36360 6580 13886 1764 2208 2788 2792
7276 476 4864 11132 18886
/unix