Previous Up Next

Chapter 19  Building Portable Polyglot Software

Babel generates very portable source code for multilingual programing. There is also an art and science to transforming the source code to binary assets without breaking the language encapsulation Babel is trying to create. This chapter discusses the details: from the mundane issues of file layout, to the arcana of linker and loader flags.


19.1  Layout of Generated Files

Babel generates a lot of files. Many of these files you never have to look at in an editor, but they must all be compiled and properly linked into an application (see Section 19.2). In this section we discuss several flags that can affect where files are generated.

Arbitrary combinations of the above flags are allowed. Regardless of the order they appear in the commandline, they are applied to the resulting path in the order they are presented above. For example if a SIDL file pkg.sidl defines a Cls class in the pkg package, and the user runs Babel as follows:

% babel -lugo there -sc

Then the majority of the sources will be generated in the there/pkg/c/glue/ directory (except the Impl files which will occur one directory up in there/pkg/c/). Note the use of equivalent short-form commands in this example. If readers wish to review long and short forms of command line arguments, see Table 4.1 on page ??.

Note that many of these options were contributed by users and are not employed in Babel’s own build. Instead, we tend to put a SIDL file in a directory and then generate client-side or server-side bindings in either runXXX/ or libXXX/ subdirectories, respectively (where XXX is a language name). We don’t use the --generate-subdirs or --hide-glue flags because they place source files that belong in the same library in different directories. Automake, which Babel uses as part of its build system, works much more reliably when all the sources that go into a library appear in the same directory as the library to be. The --language-subdir has a similar effect to what we do manually, but doesn’t capture if it was client-side or server-side. In our tests and demos, we tend to build these separately because we want to exercise different drivers with different implementations.

The GNUmakefile generated by the --makefile command line option does not attempt to address all the possible combination of flags affecting the layout of generated files. It assumes that you generate files in the default locations.

19.2  Grouping compiled assets into Libraries

Babel enables one to completely encapsulate language dependencies inside a static or dynamically loaded library. This means that one can take a SIDL file and a compiled library, generate the bindings they want in their language of choice from the SIDL file, link against the library, and use it never knowing what the original implementation language was.

19.2.1  Basics of Compilation and Linkage

What we generally think of as a compiler is really an ensemble of related tools. Generally there is a preprocessing step where very simple transformations occur (e. g. #define, #include directives and others). Next, the compiler proper executes and typically transforms your sourcecode into assembler or some other intermediate form. Optimizers work on this intermediate form and do perform additional transformations. Most big vendors of C, C++, and Fortran compilers have a common optimizer for all languages. Next, assemblers transform the optimized codes into platform-specific binaries. But this is not the end. The binaries may be linked together into libraries or programs. Libraries can be linked against other libraries, and eventually multiple programs. The main difference is that a program has additional instructions to bootstrap itself, do some interaction with the operating system, receive an argument list, and call main(). To see all this in action, try building a “hello world” type program in your favorite language, and run the “compiler” with an additional flag such as -v, --verbose, or whatever.

For example, this is what I get from a g77 compiler.

% g77 hello_world.f
% ./a.out
Hello World! % g77 -v hello_world.f
Driving: g77 -v hello_world.f -lfrtbegin -lg2c -lm -shared-libgcc
Reading specs from /usr/local/gcc/3.2/lib/gcc-lib/i686-pc-linux-gnu/3.2/specs
Configured with: ../gcc-3.2/configure --prefix=/usr/local/gcc/3.2
Thread model: posix
gcc version 3.2
/usr/local/gcc/3.2/lib/gcc-lib/i686-pc-linux-gnu/3.2/f771 hello_world.f -quiet -dumpbase hello_world.f -version -o /tmp/ccp2GBGE.s
GNU F77 version 3.2 (i686-pc-linux-gnu)
compiled by GNU C version 3.2.
as --traditional-format -V -Qy -o /tmp/ccEiIsHc.o /tmp/ccp2GBGE.s
GNU assembler version 2.11.90.0.8 (i386-redhat-linux) using BFD version 2.11.90.0.8
/usr/local/gcc/3.2/lib/gcc-lib/i686-pc-linux-gnu/3.2/collect2 -m elf_i386 -dynamic-linker /lib/ld-linux.so.2 /usr/lib/crt1.o /usr/lib/crti.o /usr/local/gcc/3.2/lib/gcc-lib/i686-pc-linux-gnu/3.2/crtbegin.o -L/usr/local/gcc/3.2/lib/gcc-lib/i686-pc-linux-gnu/3.2 -L/usr/local/gcc/3.2/lib/gcc-lib/i686-pc-linux-gnu/3.2/../../.. /tmp/ccEiIsHc.o -lfrtbegin -lg2c -lm -lgcc_s -lgcc -lc -lgcc_s -lgcc /usr/local/gcc/3.2/lib/gcc-lib/i686-pc-linux-gnu/3.2/crtend.o /usr/lib/crtn.o

For the purposes of this discussion, we will make a distinction between linking to build a library and linking to build an executable. Even though these processes have similar names, they perform very different kinds of code transformations.

19.2.2  Circular Dependencies and Single-Pass Linkers

Almost all linkers are single pass. This means that when linking an executable, linkers will run through the list of libraries exactly once trying to resolve symbols Ever get libraries listed in the wrong order and an executable wouldn’t get built? Ever have to list the same libraries over and over again to build an executable? These are both side-effects of single pass linkers. The symbols in question are essentially jumps in the instruction code corresponding to subroutines that are defined elsewhere. When linking a final executable, all these symbols need to be resolved. When linking libraries, multiple undefined symbols are commonplace.

Having to list libraries over and over again in the link line when compiling the final executable typically indicates a circular dependency between libraries. Circular dependencies are much better kept within a single library. Even though linkers are single-pass between libraries, they exhaustively search within them.

This is important because all the files generated by Babel have a circular dependency in each Babel type. The stub makes calls on the IOR, the IOR calls the Skel, the Skel calls the Impl, but the Impl also may make calls on a Stub. Just like C++ has a this object, and Python has a self, Babel objects have a stub for them to call methods on themselves and dispatch properly through Babel’s IOR layer.

19.2.3  IOR as single point of access

When building a Babelized library, it’s important to note if your code has dependencies to other Babel types not in your library. These types often appear as base classes, argument types, or even exception types. Your library will need stubs corresponding to all these types, so it is best to put these in your library as well. We call these external stubs.

Many have tried to minimize replication of Babel stubs by removing the external stubs and letting the library link directly against the stubs in an external library. This is a mistake because the external library may be implemented in a different language, and the stubs may be for a different language binding. By bundling the external stubs specific to your implementation with the implementation’s library, you are ensuring that the only access your library has with any other Babelized library is through the IOR. This is a good thing. The Babel IOR is the same for all language bindings and essentially forms the binary interface by which all Babel objects interact.

19.3  Dynamic vs. Static Linking

Most UNIX users are very comfortable with statically linked libraries (e. g. libXXX.a). Most are aware of “shared object files” in UNIX (with the form libXXX.so) though few actually build them. Even fewer still are familiar with dynamically linked libraries, called DLL’s in Microsoft (after the common .dll suffix), which involve actually selecting and loading dynamic libraries at run time based on their string name. MacOSX uses the novel suffix libXXX.dynlib. (In most UNIX systems, including Linux and Solaris, .so “shared object files” are actually dynamically linked libraries.) This section serves as a quick overview of how Babel handles both static and dynamic libraries, including runtime loading.

19.3.1  Linkers and Position Independent Code (PIC)

In a static library, the linker simply copies needed compilation units from the library to the executable. The static library can subsequently be deleted with no adverse affects to the executable. This also causes common libraries to be duplicated in every executable that links against it, and for the resulting executables to be quite large.

In a shared library, the linker simply inserts in the executable enough information to find the library and load it when the executable is invoked. This typically happens before the program ever gets to main(). This keeps executables small and allows commonly used libraries to be reused without copying, but it also means that the executable can fail if the library is renamed, moved, deleted, or even if the user’s environment changes sufficiently.

A necessary (but not sufficient) condition for shared libraries to work is that all the compilation units (*.o) contained must be explicitly compiled as position independent code (PIC). Position independent code has an added level of indirection in critical areas since details (such as addresses to jump to in subroutine calls) are not known until runtime. Even though shared libraries are very useful, PIC causes a small but measurable degradation in performance, making static linked libraries with non-PIC code a viable option for performance-critical situations.

A dynamic-linked library is a shared library with one added feature, it can be loaded explicitly by the user at runtime by passing the string name into dlopen(). Dynamic-linked libraries (DLL’s) also require compilation as PIC, though many compilers (including GCC) have special commands for each1.

19.3.2  Tracking Down Problems

When tracking down problems with Babel libraries, to UNIX tools nm and ldd are your friends. nm will print the list of linker symbols in a file, including details such as whether the symbol is defined or not. ldd lists dynamic dependencies of a shared libraries or executables, indicating where it will look for these symbols when loaded.

Recall the Fortran hello world example in section 19.2.1. Even though we may think this is all done with static linking, using these tools we find out the truth.

% ldd a.out
        libg2c.so.0 => /usr/local/gcc/3.2/lib/libg2c.so.0 (0x400180000)
        libm.so.6 => /lib/i686/libm.so.6 (0x4004a000)
        libgcc_s.so.1 => //usr/local/gcc/3.2/lib/libgcc_s.so.1 (0x4006d000)
        libc.so.6 => /lib/i686/libc.so.6 (0x40076000)
        /lib/ld-linux.so.2 => /lib/ld-linux.so.2 (0x40000000)

Here, we clearly see that five libraries are shared libraries that will be loaded after the executable is invoked, but before we get to the main program. Some of these libraries make sense: libg2c is a Fortran to C support library, libc is the C standard library, but why is libm listed... its a library of transcendental functions (e. g. sin(), cos()) why would it be included? The answer becomes obvious when using ldd on libg2c. The Fortran support library has dependencies on the math library, so our Fortran executable inherits that dependency too.

% nm a.out | grep ’ U ’
        U __cxa_atexit@@GLIBC_2.1.3
        U __libc_start_main@@GLIBC_2.0
        U do_lio
        U e_wsle
        U exit@@GLIBC_2.0
        U f_exit
        U f_init
        U f_setarg
        U f_setsig
        U s_stop
        U s_wsle

nm (and grep) shows us 11 symbols that are were left undefined in our final hello world application. A little more nm|greping about will help us find that symbols starting with f_ are defined in libg2c.

19.4  SIDL Library Issues

As mentioned in Section 6.6, the Babel toolkit includes the SIDL runtime library. The library provides a base interface, class, and exception as the foundation. This is how Babel provides object-oriented features to non-object-oriented languages. In order to support the runtime system and build the SIDL library, it also provides DLL and Loader classes.

Babel generated code depends critically on babel_config.h to correctly define a lot of platform specific details. One detail that changes too frequently to encode in babel_config.h is whether or not the software is being compiled is position independent code (PIC). This detail is commonly added to the compilation instruction using the flags (e. g. -fPIC -DPIC 2). The first flag tells the compiler to generate position independent code. The second defines the preprocessor macro PIC. Looking now at babel_config.h, we see that either SIDL_DYNAMIC_LIBRARY or SIDL_STATIC_LIBRARY are defined depending on whether or not PIC is defined.

As described in Section 19.3.1, Babel tends to focus on static libraries and dynamic linked libraries; not worrying much about shared libraries. The main reason is that for every last drop of performance, people would want static libraries. To support Java and Python (and the CCA model) dynamic loading is required. There’s no real benefit to doing shared libraries that can’t be dynamically loaded, so in developing Babel, we focus on the other two linkage situations.

19.5  Language Bindings for the sidl Package

The implementation and C stubs for the sidl package are stored in libsidl.so and libsidl.a, shared and static libraries that are installed when you install babel. You can determine the directory where these libraries are stored by running babel-config --libdir. Normally, running babel-config --libdir will yield something like /usr/lib or /usr/local/lib; however, your system administrator may have chosen a different directory by specifying a --prefix when they configured Babel (see Section 3.1.1). The IOR header files and C stub header files are installed in the directory shown by babel-config --includedir.

Babel also provides precompiled stubs for the sidl package for the C++, F77, F90, Java and UC++ language bindings. These libraries are also installed in babel-config --libdir, and they are named libsidlstubs_cxx.so, libsidlstubs_ucxx.so, Codelibsidlstubs_f77.so, and libsidlstubs_f90.so. Similarly named static archives and libtool .la files are also inalled in babel-config --libdirst. The header files for these languages are installed in subdirectories of babel-config --includedir named Cxx, F77, F90, and UCxx.

19.6  SCL Files for Dynamic Loading

If you generate a dynamic-linked library containing implementations of SIDL classes, you must also generate a SIDL Class List file (SCL file). An SCL file contains metadata about zero or more dynamic-linked libraries; for each dynamic-linked library, the SCL file has the list of SIDL classes implemented in that library. The sidl.Loader.findLibrary method searches SCL files when trying to find the implementation (or some other aspect) of a SIDL class.

The SCL file is an XML file with three kinds of elements. The top level element is scl which contains zero or more library elements. The library element has several attributes, and it contains zero or more class elements. The library element has three required attributes, uri, scope and resolution, and two optional attributes, md5 and sha1. The uri is a local filename including path or a network url indicating where the library is stored. The scope attribute allows developers to suggest whether the library should be loaded in a local or the global namespace. The developer can suggest lazy or now symbol resolution using the scope attribute. The md5 and sha1 are optional message digests to confirm that the library has not been modified or replaced. The class element has two required elements, name and desc. The name field is the name of the class, and desc indicates what kind of information is held in the library. Each class contained in the dynamic-linked library should be listed in the SCL file. For now, the only desc values with standardized meanings of ior/impl, java and python/impl. ior/impl indicates the dynamic-linked library contains the IOR object and implementation for the class, and java indicates that the library has the Java JNI wrapper object code. python/impl has the Python skeletons and implementation libraries.

Here is an the SCL file for the SIDL runtime library installed in /usr/local.

XML
<?xml version="1.0" ?> <scl> <library uri="/usr/local/lib/libsidl.la" scope="global" resolution="now" > <class name="SIDL.BaseClass" desc="ior/impl" /> <class name="SIDL.ClassInfoI" desc="ior/impl" /> <class name="SIDL.DLL" desc="ior/impl" /> <class name="SIDL.Loader" desc="ior/impl" /> <class name="SIDL.Boolean" desc="java" /> <class name="SIDL.Character" desc="java" /> <class name="SIDL.DoubleComplex" desc="java" /> <class name="SIDL.Double" desc="java" /> <class name="SIDL.FloatComplex" desc="java" /> <class name="SIDL.Float" desc="java" /> <class name="SIDL.Integer" desc="java" /> <class name="SIDL.Long" desc="java" /> <class name="SIDL.Opaque" desc="java" /> <class name="SIDL.SIDLException" desc="ior/impl" /> <class name="SIDL.String" desc="java" /> </library> </scl>

It’s worth noting that the uri can be a libtool metadata file (.la) when the library is located on the local file system or a dynamic-linked library file (.so or another machine dependent suffix). You cannot have a libtool .la when the library is remote (e.g., ftp: or http:) because libtool expects the files references in the .la file to be local and in particular directories.

The GNUmakefile generated with the --makefile Babel option contains rules to automatically generate .scl files for each of the supported Babel languages.

19.7  Deployment of Babel-Enabled Libraries

At this point, there is no standard — or even recommended — model for deploying Babel enabled libraries. Below are a few examples of how our developers are currently packaging their code.

Server Source Only
With this option your users are expected to have Babel installed on their system. In this mode, developers simply include a SIDL file and their corresponding implementation files. The user in this case must build the software, call Babel to generate the client bindings in the language of choice, and link it all together into a final application.
Client and Server Source
This option tries to hide Babel as much as possible. In this mode, the developer pre-generates many different client language bindings and distributes them along with their code and the sources for the Babel runtime library. Then the user has a “batteries included” package that’s ready to run out of the box. The user may not even be aware that Babel has been used unless they pay careful attention to how the package was built.
Server Libraries Only
Finally, in this mode only the SIDL file and the precompiled shared library files are distributed. This is not an open-source solution, though users still need to build the language bindings to access the shared library.

1
-fpic for SO’s, -fPIC for DLL’s
2
The actual command to the compiler varies, -fPIC is understood by GCC

Previous Up Next