Open Network Emulator

Students

The continuing exponential growth of the Internet has resulted in the rapid proliferation of new network protocols. Protocol interactions have become vastly more complex, and it is no longer possible to analyze their runtime behavior in small experimental testbeds. The success of the socket abstraction in hiding the complexity of lower protocol layers has, in its own way, exacerbated the problem of protocol interaction. As the oft-quoted example of HTTP 1.0’s use of multiple TCP connections and the associated degradation of TCP performance shows, it is not always possible to foresee the impact of design decisions on the operation of other protocols. Furthermore, critical network protocols, such as routing, are highly distributed in nature. The complexity of router software, the backbone of Internet communication, requires large-scale testbeds to verify its scalability and validate the correct operation of its various component modules.

The last several years have seen the deployment of protocol development environments that allow users to create complex, controlled experimental testbeds to verify and validate network protocols. Protocol development environments can be broadly classified into (a) network simulators and (b) direct code execution environments (DCEEs). While simulators such as NS, OPNET, REAL, x-kernel, PARSEC, dummynet, and SSFNet offer an efficient event-driven execution model, they require that the protocol under test be written in their event-driven model. The simulated protocol can be refined and then converted to a real-world code implementation. The basic problem is that there is no easy way to ensure the equivalence of the simulated protocol and its real-world code version. A secondary issue with network simulators stems from their clean-room implementation. Since the TCP/IP protocol stack in a simulator is written from scratch, it does not exactly emulate real-world protocol stacks, in particular the idiosyncrasies of real-world TCP/IP, which can significantly affect performance.

DCEEs such as ENTRAPID, the NIST network emulator, and MARS solve the verification and validation problems by directly executing unmodified real-world code in a network testbed environment. However, they have their own set of problems, largely due to the lack of a development framework. First, direct code execution environments use an OS process for each network protocol or application under test. The large context-switch time of processes and OS limits on the maximum number of processes inherently restrict the scalability of this solution. Second, in order to represent multiple network nodes and the digraph nature of each network node, DCEEs use specialized OS kernels, which precludes their parallelization using optimistic parallel simulation strategies: it is hard to roll back kernel state as opposed to user state. Finally, since DCEEs work in real time as opposed to virtual simulation time, they suffer from an inherent lack of temporal determinism, which impacts the controllability of experimental testbeds.

Goals

The research goal of this proposal is to enable the systematic study of network applications and protocols through the development of a scalable network emulation testbed. To achieve this goal, we present a new network emulation system called the Open Network Emulator (ONE), based on a novel compiler-directed framework that supports both simulation and direct code execution paradigms. This integration of paradigms, which often have opposing requirements, enables a new dimension in verification and validation of network protocols. Event-driven simulation models can now be validated through their interaction with their real-world code counterparts. The enabling technology of this proposal, our compiler-directed strategy, provides a modular framework that includes support for automatic checkpointing and recovery without necessitating application support. The impact of our research focus, a modular approach to scalable simulation, is not restricted to our problem domain of network emulation. Our compiler-directed strategy for generating composable code objects enables the vast repository of existing non-OO codes to effectively use the power of grid computing environments such as Globus and Legion. Support for transparent checkpointing and recovery enables large scientific simulation codes to use optimistic parallel discrete event simulation (PDES) algorithms without modification. Support for runtime code migration enables new dynamic load balancing algorithms that are of use to large-scale parallel computation.

Research Challenges

To represent the scale and heterogeneity of the Internet, the ONE is architected to support high levels of scalability, on the order of tens of thousands of network nodes. This presents several fundamental research challenges.

Components of the Open Network Emulator

This research uses a novel vertical integration of

This synergy creates a new modular framework for the development of large-scale simulations using code composition, without restricting the application programmer to any language or programming paradigm.

Figure 1: Architecture of the Open Network Emulator system. ONE can support wireless link-layer modeling through continuous-time S4W models and hardware emulation by physical wireless interfaces.

Architecture

Figure 1 shows the architecture of the Open Network Emulator. At the top level, the ONE consists of multiple virtual hosts, each of which has a protocol stack and multiple applications linked to the protocol stack. To support the development of large integrated network testbeds, the ONE architecture combines multiple virtual hosts, each with its network applications and protocol stack, into a single user-level process using the selectively shared namespace abstraction provided by the Weaves framework. The applications and the protocol stack may be simulated using an event-driven model or emulated through direct code execution. To emulate the protocol stack, we need to port the stack from kernel space to user level. Since protocol stacks depend on kernel functions, we need to expose the same set of functions at the user level, which is the role of the OS personality layer. The OS personality layer exposes a function interface similar to that of an operating system kernel, in effect assuming the personality of a particular operating system. Emulated protocol stacks expose the standard socket API. Simulated IP stacks follow the clean-room implementation model used in NS. Simulated stacks can be written in an event-driven model and can directly interact with the underlying parallel discrete-event simulation layer within ONE. Since the architecture of ONE supports both simulation and direct code execution, it provides the ability to conduct additional verification and validation tests through the interaction of event-driven and actual code versions of the same application or network protocol.
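The idea of several virtual hosts sharing one user-level process, each with a protocol stack written against a kernel-like interface, can be illustrated with a small Python sketch. This is only a toy under stated assumptions: the names OSPersonality, ToyStack, VirtualHost, add_timer, and advance are hypothetical and do not come from ONE or Weaves, the only "kernel service" modeled is a tick-based timer, and ordinary per-object state stands in for the selectively shared namespaces that Weaves provides.

```python
from typing import Callable, List, Tuple


class OSPersonality:
    """Hypothetical user-level stand-in for kernel services used by a stack."""

    def __init__(self) -> None:
        self.tick = 0
        self._timers: List[Tuple[int, Callable[[], None]]] = []

    def add_timer(self, delay: int, callback: Callable[[], None]) -> None:
        # Mimics a kernel timer API: fire `callback` after `delay` ticks.
        self._timers.append((self.tick + delay, callback))

    def advance(self, ticks: int) -> None:
        # Move this host's clock forward and fire every timer that is now due.
        self.tick += ticks
        due = [t for t in self._timers if t[0] <= self.tick]
        self._timers = [t for t in self._timers if t[0] > self.tick]
        for _, callback in sorted(due, key=lambda t: t[0]):
            callback()


class ToyStack:
    """Stack code that only calls the personality layer, never the real OS."""

    def __init__(self, os_layer: OSPersonality) -> None:
        self.os = os_layer
        self.retransmits = 0

    def send(self, data: bytes) -> None:
        # Arm a retransmission timer, as kernel-resident stack code might.
        self.os.add_timer(3, self._on_timeout)

    def _on_timeout(self) -> None:
        self.retransmits += 1


class VirtualHost:
    """One virtual host: its own personality layer and its own stack state."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.os = OSPersonality()
        self.stack = ToyStack(self.os)


# Two virtual hosts living in the same Python process.
hosts = {name: VirtualHost(name) for name in ("h1", "h2")}
hosts["h1"].stack.send(b"hello")
for host in hosts.values():
    host.os.advance(3)
```

After the loop, only h1 has recorded a timeout, since each host keeps its own timer list and counters even though both run in one process.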

The ONE architecture includes a fault generation module that provides hooks for the protocol designer to generate controlled experimental testbeds. User-defined hook functions are called when control passes from one layer of the protocol stack to another. Since hook functions are given both the data and the socket information, they can be used to inject controlled faults into the system. Hook functions also have access to the underlying parallel discrete-event simulation layer, allowing them to use current information to inject faults at a future time. The secondary purpose of the hook functions is to perform user-specified statistics collection to monitor the performance of the protocol or application under test.
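A minimal Python sketch of how such hooks might behave follows. It is an illustration under assumptions, not ONE's interface: Packet, Scheduler, LayerBoundary, and make_delayed_link_fault are hypothetical names, a simple time-ordered queue stands in for the simulation layer, and a hook drops a packet by returning None.

```python
import heapq
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Packet:
    payload: bytes
    socket_id: int  # stands in for the socket information passed to hooks


# A hook sees the packet and the current virtual time; None means "drop".
Hook = Callable[[Packet, float], Optional[Packet]]


class Scheduler:
    """Toy stand-in for the simulation layer: a time-ordered event queue."""

    def __init__(self) -> None:
        self.now = 0.0
        self._queue: list = []
        self._seq = 0  # tie-breaker so callables are never compared

    def schedule(self, at: float, action: Callable[[], None]) -> None:
        heapq.heappush(self._queue, (at, self._seq, action))
        self._seq += 1

    def run_until(self, t: float) -> None:
        while self._queue and self._queue[0][0] <= t:
            at, _, action = heapq.heappop(self._queue)
            self.now = at
            action()
        self.now = t


class LayerBoundary:
    """Point where control passes between two layers; runs hooks in order."""

    def __init__(self, name: str, scheduler: Scheduler) -> None:
        self.name = name
        self.scheduler = scheduler
        self.hooks: List[Hook] = []
        self.stats = {"seen": 0, "dropped": 0}  # simple statistics collection

    def cross(self, packet: Packet) -> Optional[Packet]:
        self.stats["seen"] += 1
        current: Optional[Packet] = packet
        for hook in self.hooks:
            current = hook(current, self.scheduler.now)
            if current is None:
                self.stats["dropped"] += 1
                return None
        return current


def make_delayed_link_fault(scheduler: Scheduler, trigger: bytes, delay: float) -> Hook:
    """On seeing `trigger`, schedule a link failure `delay` time units later."""
    state = {"down": False}

    def hook(packet: Packet, now: float) -> Optional[Packet]:
        if state["down"]:
            return None
        if packet.payload == trigger:
            scheduler.schedule(now + delay, lambda: state.update(down=True))
        return packet

    return hook


sched = Scheduler()
boundary = LayerBoundary("transport->network", sched)
boundary.hooks.append(make_delayed_link_fault(sched, b"SYN", 5.0))

first = boundary.cross(Packet(b"SYN", 1))    # passes and arms the fault
sched.run_until(4.0)
second = boundary.cross(Packet(b"data", 1))  # fault has not fired yet
sched.run_until(6.0)
third = boundary.cross(Packet(b"data", 1))   # link is now down
```

The hook uses information available now (the trigger packet) to schedule a fault in the future, and the boundary's counters double as the statistics-collection path.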

At the bottom layer, the ONE relies on an optimistic parallel discrete-event simulator that operates over a cluster of interconnected workstations (processors). The parallel discrete-event simulator manages a distributed event queue and a global timeline based on the relativistic time model. The temporal model enables mixed frameworks with both real-time components and virtual-time models, while maintaining a consistent, monotonically increasing view of time.
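The core of optimistic execution, processing events as they arrive and rolling back when an event with an earlier timestamp shows up, can be sketched for a single logical process as follows. This is a simplified Time Warp-style illustration and not ONE's simulator: the class and method names are hypothetical, state is checkpointed after every event, and the sketch omits the cancellation of messages already sent to other processes, the reclamation of old checkpoints, and the relativistic time model.

```python
import copy
from typing import Any, Callable, List, Tuple


class LogicalProcess:
    """One optimistic simulation process with checkpoint-based rollback."""

    def __init__(self, state: Any, apply: Callable[[Any, Any], None]) -> None:
        self.state = state
        self.apply = apply               # mutates state for one event
        self.lvt = 0.0                   # local virtual time
        self.rollbacks = 0
        self.processed: List[Tuple[float, Any]] = []
        # checkpoints[i] holds the state after the first i processed events.
        self.checkpoints: List[Tuple[float, Any]] = [(0.0, copy.deepcopy(state))]

    def _execute(self, ts: float, event: Any) -> None:
        self.apply(self.state, event)
        self.lvt = ts
        self.processed.append((ts, event))
        self.checkpoints.append((ts, copy.deepcopy(self.state)))

    def handle(self, ts: float, event: Any) -> None:
        redo: List[Tuple[float, Any]] = []
        if ts < self.lvt:
            # Straggler: undo every event that was executed "too early".
            self.rollbacks += 1
            while self.processed and self.processed[-1][0] > ts:
                redo.append(self.processed.pop())
                self.checkpoints.pop()
            self.lvt, saved = self.checkpoints[-1]
            self.state = copy.deepcopy(saved)
            redo.reverse()  # restore timestamp order for re-execution
        # Execute the new event, then replay anything that was undone.
        for t, e in [(ts, event)] + redo:
            self._execute(t, e)


lp = LogicalProcess([], apply=lambda state, event: state.append(event))
lp.handle(1.0, "a")
lp.handle(3.0, "c")
lp.handle(2.0, "b")  # arrives late and forces a rollback past "c"
```

After the straggler is handled, the state reflects the events in timestamp order, as if they had arrived in order. Because the rollback restores user-level state only, the sketch also hints at why rolling back kernel state is the harder problem.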

Current Status: 1/15/2005

Acknowledgements

This work is supported by NSF CAREER-0133840 and NSF NGS-0305644. We thank the National Science Foundation for their support.