https://groups.google.com/d/topic/comp.lang.scheme/7k8iPOpaCLc/discussion

John Rose
10/5/94

There is a new possibility of making a release of the Sun "esh" software. This is timely, because "esh" was designed from the ground up as a glue language for system services. If you are interested in such things, please read the technical summary below, and let me know what you think.

I think "esh" closes the gap with C-level system services, much more thoroughly than the other glue technologies. Is there some way now to draw in hesitant non-expert users and thus complete the circuit from user to system platform? A freeware "esh" with a clean, portable implementation and reasonable documentation would go a long way toward that goal.

I can't devote much time to carry on this effort (although I'd like to). I am therefore looking for practical ways to transform "esh" from an initial freeware release to something reliable and portable, and without doing too much of the work myself. ("Too much" == takes away from unrelated, important tasks at Sun.)

If you might be interested in participating in such an effort, let's talk. Or, if you think a merger of the "esh" 64-bit techniques with an existing freeware 32-bit Scheme implementation could make sense, let's talk.

What I don't want to do is sit on this stuff any longer, and my management may soon agree with me, depending on how much interest people have in it.

-- John

"esh" Technical Summary (10/94)

The relevant published material on esh is:

Rose & Muller, "Integrating the Scheme and C Languages", __Proc. SIGPLAN Conf. on Lisp & Functional Programming__ (1992).

Rose, "A Minimal Metaobject Protocol for Dynamic Dispatch", __Proc. OOPSLA 1991 Workshop on Reflection and Metalevel Architectures in Object-Oriented Programming__, Phoenix, Arizona (October 1991).

Rose, "Fast Dispatch Mechanisms for Stock Hardware", __Proc. OOPSLA__, San Diego, CA (1988).

See also the comp.lang.scheme posting of 7/21/94.

The basic idea of a "glue language" is now familiar. Modern Unix sports an increasingly rich set of library services. While the shells are very good at driving the string-oriented "exec" interfaces, there has not been a correspondingly flexible way to drive tightly coupled procedural interfaces.

"esh" stands for "Embeddable Shell", and is an implementation of Scheme specifically as a glue for procedural interfaces. (I have found, however, that to most people, "shell" means "something for reading and processing the strings I type".)

Scheme makes an almost ideal glue language for procedural interfaces:

+ arbitrary types can be adjoined to its data model
+ much glue is control flow, which Scheme is good at
+ Scheme has a reasonable set of built-in utilities, notably container types
+ Scheme semantics are simple and therefore unobtrusive
+ the language works at all scales, from one-liners to large systems

In order to solidify those advantages, it was necessary to implement Scheme in an unorthodox way. The goals included these:

+ all C(++) types must be fully integrated (see the first reference above)
+ Scheme procedures and C procedures should be interchangeable
+ there should be no "noticeable" overhead for crossing language boundaries
+ auto. storage management must follow references in both Scheme and C types

The implementation of "esh" is unique in its tagging scheme and its integration of statically-typed procedures. The GC is unusual (but not unique) in its conservative and uniform treatment of heap data.

Here are the technical features of "esh":

Object references are 64 bits, including one word for a vtable-like type descriptor. This makes tagging and untagging of C values trivial and cons-free, up to 32 bits.

All type checking is fully polymorphic: There are no distinguished types. Polymorphism overhead is similar to C++, 5 RISC instructions plus a procedure call.

An "esh" procedure is any value satisfying a certain calling protocol.
There are several implementations of this protocol, including native and interpreted closure types, native C functions, overloaded C++ functions, and C++ templates, to name a few. Since everything is done with procedure calling in Scheme, there are many classes of procedure.

The dispatch mechanism is neutral to languages and object models. Unlike C++, both types and operations are allocated dynamically. Method lookup, inheritance, type parameterization, etc., are all defined, usually in C, by a metaobject protocol. (See second ref.)

C and C++ types are imported directly and automatically, by means of a header file compiler "ix", whose output is an *.o file containing appropriate tag and procedure definitions.

In order to support more direct gluing to stateful interfaces, such as X resources, the Scheme procedure type is extended with a new sub-type called "accessor", which is settable, via the usual extension to "set!".

In order to support direct gluing to callback interfaces, there is implicit conversion to the C procedure types from any procedural class, by means of a trampoline procedure generated on the heap.

In order to support gluing to C++ "virtual function" interfaces, there is a facility for dynamically generating subclasses of a given C++ class. This is provided by "ix" for any class with a virtual function.

The Scheme arithmetic operations are polymorphic and extensible. (At present, no one has tied bignums to it, but other extensions have been made for particular applications, such as time-varying values.)

The garbage collector is standalone and replaces malloc (as in PCR). It supports multiple heaps (of independent formats) and parallel (multiprocessing) collection. It does not yet support relocation or generations, but could.

Interpretation is provided by a bytecode engine. The bytecode engine runs in a pair of C stack frames, without depending on any global data structure for control flow management.
The bytecode design is simple and language independent.

There is a small debugger for the bytecode engine. The byte compiler passes through source line and variable name information, which the debugger presents (imperfectly at present, but usefully). There are tracing and timing packages.

There is a modest set of exception handling utilities. Exceptions are plists with certain well-defined fields. There is dynamic-wind.

The call/cc primitive is limited to non-local exits. There is no reason this restriction could not be lifted (as in Elk or Scheme->C) except that multi-thread integration would then be somewhat difficult.

For scalability, there is a native compiler "eshc", featuring some typical Scheme optimizations like inlining, stack allocation, and direct arithmetic on fixnums. It consumes bytecodes. It produces low-level C code (like Scheme->C; "ix" does also). It can create C callable routines. It is reasonably good at laying out complex block-compiled closure groups.

The code is largely portable, with certain well-defined exceptions, such as the means for returning 64 bits from a C function, for compiling a trampoline procedure onto the heap, for enumerating GC root sets, for making a tail call from a natively compiled procedure, packing and unpacking C argument lists, etc. However, "esh" presently works on SunOS 4.x and Solaris 2 (SVR4), and only on SPARC.

The usual Solaris packaging mechanisms, such as object files, shared libraries and dynamic loading, are usable with esh. The ".init" section feature supplies the basis for load-time execution of Scheme code.

There are independently developed packages at various stages of completion for providing Motif and other graphics, Tooltalk, and other libraries. A CLOS-like object system has been developed (but not integrated with the machine-level dispatch kernel).

Also included in "esh" are a grab-bag of independently engineered C-language packages which provide parts of the Scheme run-time, but are also usable separately. These include the garbage collector, the dynamic dispatch kernel, and various one-file "modulettes" providing page tables, ropes, C closures, C tail calls, exception handling, lists and arrays, pathname parsing, etc., etc.

Documentation is a set of plain files describing the differences from the Scheme standard. Most language changes are carefully described. There are examples showing how to use the more complex features, such as the Motif bindings.

History, status, futures:

Most of "esh" was developed in 1991. Today, the technology is stable and in daily use internally to Sun. It has proven itself useful for a number (> 5) of projects within Sun. It is not a Sun product.

The typical glue user starts by "ix"-ing a favorite library, and then capturing higher-level usage patterns as Scheme utilities.

The Scheme implementation appears to support applications with several tens of thousands of lines of Scheme code. The typical application user starts by gluing together a prototype, and then growing it, rewriting selected portions in C or Scheme as the architecture evolves.

It has been easy in practice to develop mixed language applications. In particular, recoding "hot spots" in C or C++ has been straightforward, without needing to rearchitect unrelated parts of the application.

Here's how a user in the Sun CAD group put it about a year ago:

The real, killer technology here is the Scheme/C interface. The program "ix" (interface extractor) reads the C header files to learn about the data structures and then builds the routines to convert C-space into Esh-space. It's this automatic process that makes C/Esh integration easy and cost effective.

Another key advantage of this low cost integration is the ability to move the interface between the two languages.
Often, it's better to develop an algorithm in a higher level language like Scheme or Awk and then re-implement in C. With Esh, the cost of going to C, if necessary, is greatly reduced.

Although no exhaustive survey has been made, there are about 100k lines of application code developed on top of esh.

Users of "esh" often report the productivity gains roughly associated with using a high-level language, usually 2x to 5x.

Performance is usually acceptable, especially for GUIs. The bytecode engine performs in the usual range, between two and three orders of magnitude slower than optimized C code (depending on the amount of optimization). The native compiler gains up to a 50x speedup, depending on the amount of inlinable primitive use.

The performance problems that arise have been tractable. The worst problems have been due to unconscious or misguided use of the bytecode compiler in inner loops, or missed opportunities to use lexical scoping. ("Eval considered harmful.") Often an expensive Scheme inner loop is recoded in C, leading to the production of small, reusable C utilities as a side benefit.

At present, "esh" is link-compatible with Sun's thread library, but lacks appropriate locking of global data structures to be MT safe. It can be MT hot on any machine that supports atomic aligned 64 bit memory accesses, assuming an MT hot adaptation of malloc().

There is more C++ work to be done, such as integrating C++ operators with Scheme generic arithmetic, and improving the subclassing API.

There is little being done with modules (or static type checking). Scheme dynamic linking does support multiple separate namespaces internally. There are macros which use Scheme's lexical scoping to support block compilation, with convenient selective exporting. R4RS macros are still missing.

Despite its name, "esh" is not a superset of the standard Unix shells. At present it lacks pattern matching and facilities.
However, it contains nearly all of the "libc" facilities, plus a few Scheme wrappers which simplify C interfaces like . More of these are easy to add, and it is also trivial to import new C libraries, such as your favorite regexp package, by means of "ix". It would also be very interesting to blend the Scheme Underground shell work with the esh ability to dynamically attach to C services.

Although "esh" is unlikely to be supported by Sun any time soon, it might become a Sun freeware release, if there is demand for it.

--
--------------------------------------------------------------
John R. Rose            Sun Microsystems, Inc.
john...@eng.sun.com     2550 Garcia Avenue, M/S 12-40
(415) 336-1071          Mountain View, CA 94043-1100
(The opinions expressed here are mine, but I'll gladly share.)