From: Charles Wilson
Subject: Re: gettext cvs, woe32 dlls
Date: Fri, 12 May 2006 16:35:18 -0400
User-agent: Thunderbird 1.5.0.2 (Windows/20060308)
Bruno Haible wrote:
Thank you for your long explanations. I believe that I have committed to
the gettext CVS a solution that, like yours, supports building DLLs on
Cygwin, but also satisfies the following additional goals:
Overall response: very nice solution. I look forward to seeing it.
I've got a typically long-winded detailed response below, and at times
it may seem overly critical. However, the takeaway message remains:
very nice, and I believe it will work for gettext on cygwin and mingw.
A similar methodology could be adopted by other C libraries on those
platforms. I don't think a similar solution will work, in general, for
C++ libraries -- but gnu::autosprintf is coded in such a way that on
cygwin, with g++-3.4.4, it "squeaks by".
A) No source code in .h and .c files change. Only some infrastructure is
added in separate files and in Makefile.ams and configure.acs.
Nice.
B) For public libraries, the same .h file is valid regardless whether
the user will link with the shared or with the static library. No
STATIC_LIBRARY_FOO flags.
That's a neat trick. <g>
C) The boundaries between private libraries (here: between libgettextlib
and libgettextsrc) do not require source code changes. I.e. a
module can be moved from libgettextsrc to libgettextlib or vice
versa without source code changes. (This is important because most of
libgettextlib is shared code from gnulib. It must not carry the name
of the library into which it gets compiled.) So no
LIBGETTEXTSRC_DLL_VARIABLE etc. macros.
Yeah, I was kinda worried about that -- I saw the ChangeLog comments
concerning some of the files my patch modified: "Reimported from
gnulib". I could just imagine a new re-import clobbering my changes...
Let me explain, because some things would be simpler if libtool had the
adequate support for it.
(This lack of meshing well with libtool is why most other packages --
and libtool -- signed on to the auto-import bandwagon)
GNU ld's --enable-auto-import has three fatal drawbacks:
- It produces executables and shared libraries with relocations in the
.text segment, defeating the principles of virtual memory.
I agree this is a serious drawback, but only because it means every
process has its own true copy of the *client's* .text segment. I'm not
sure how this "defeats the principles of virtual memory" unless .bss
segments ALSO "defeat" them (.rdata is still "shared", now that it is
actually used [circa gcc-3.4 on pei386] ). It just means that the
*client's* .text segment in each process is backed by separate real
memory in each case, instead of the same block of memory for all cases.
Given that the .text segment is usually the largest segment by far,
that's bad enough, isn't it?
It means that there is (virtually) NO memory savings in having multiple
processes use the same DLL (if that DLL is a client of another and
auto-imports data from it -- recall that for DLLs which do NOT
auto-import from somewhere else, their .text is still read-only; see
code snippet below) Now, eventually this is a distinction without a
difference: in a large framework like gnome or kde, almost all DLLs are
clients of at least one low-level DLL that exports DATA items -- and in
an auto-import regime like cygwin, it means that almost all DLLs will
have a writable .text so you lose the *physical* memory advantages of
DLLs for *most* of them.
Thus, the only remaining benefit to shared libraries is the ability to
slipstream in an updated version without relinking client apps.
But "virtual memory" defeated? There's a lot more to "virtual memory"
than simply the mechanics of loading certain code objects from disk!
Say rather that auto-import defeats (most of) the _physical_ memory
savings expected by users of DSOs on virtual memory systems.
From pe-dll.c:
void pe_create_import_fixup (rel)
{
...
if (!name_thunk_sym || name_thunk_sym->type != bfd_link_hash_defined)
{
...
/* If we ever use autoimport, we have to cast
text section writable */
config.text_read_only=false;
}
}
And, of course, in the case of your packages, in an auto-import regime
both libgettextsrc and libgettextlib force their clients to auto-import
data. Thus, the .text segment *of the client app* for all the utilities
msg* etc will be writable and subject to the physical memory
disadvantages above -- if you were going to simultaneously run a bunch
of msg* applications on a windows box!
However, you *might* simultaneously run a bunch of applications that all
rely on libintl (like bash shells, for instance). Since libintl (in an
auto-import regime) forces ITS clients to auto-import -- then the .text
segment of bash.exe will be marked writable, and all those open shells
will each incur a physical memory penalty. (Ditto for the
libgettextsrc/libgettextlib .text segments, since they are clients of
libintl: so the per-process physical memory penalty for all those
simultaneously-running msg* applications is bigger than just the msg*
app's .text; it also includes the libgettextsrc/libgettextlib .text)
- For some constructs such as
extern int var;
int * const b = &var;
it creates an executable that will give an error at runtime, rather
than either a compile-time or link-time error or a working executable.
(This is with both gcc and g++.) Whereas this code, not relying on
auto-import:
extern __declspec (dllimport) int var;
int * const b = &var;
gives a compile-time error with gcc and works with g++.
I'm wondering if the magic code in g++ that allows this (necessary for
proper initialization of C++ objects) should be re-implemented in gcc --
or moved from the C++ frontend to the pe[i]386 backend -- specifically
to enable this to work in both cases. But with PRESENT capabilities of
gcc/g++, you're right.
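To make the failure mode concrete, here is a minimal C sketch of why `int * const b = &var;` is awkward when `var` lives in another DLL: the address only becomes known once a pointer cell has been filled in at load time, so it cannot serve as a link-time constant. The names `var_storage`, `imp_var`, `b_cache` and `get_b` are made up for illustration; `imp_var` merely mimics the role of the import slot and is not what gcc or ld actually emit.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for a variable defined in some other DLL (hypothetical). */
int var_storage = 42;

/* Mimics the import slot: a pointer cell holding the variable's address.
   In a real DLL client the loader fills this in at program start. */
int *imp_var = &var_storage;

/* Every use of "var" goes through the pointer cell. */
#define var (*imp_var)

/* At file scope, "int *const b = &var;" would need the *contents* of
   imp_var, which is not a constant expression in C.  So the pointer is
   set up lazily at run time instead. */
static int *b_cache = NULL;

int *get_b(void)
{
    assert(imp_var != NULL);   /* the slot must have been filled in */
    if (b_cache == NULL)
        b_cache = &var;        /* evaluated at run time: reads imp_var */
    return b_cache;
}
```

Callers would use `get_b()` wherever the original code used `b`. The price is one extra function call and a source change, which is exactly the kind of modification under discussion here.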
- It doesn't work in some cases (references to a member field of an
exported struct variable, or to a particular element of an exported
array variable), requiring code modifications. One platform dictates
code modifications on all platforms.
This happens all the time: #if HAS_PROPERTY_FOO ... #else ... #endif is
a source code modification that appears in common code seen by all
platforms, even if the preprocessor removes it on most of them. I'm
thinking here of AC_C_CONST or my Nov2005 attempt with
CONST_PROBLEMATIC_WIN32.
Further, in 0.14.5's po-lex.h:
#if !(__STDC__ && \
      ((defined __STDC_VERSION__ && __STDC_VERSION__ >= 199901L && !defined __DECC) \
       || (defined __GNUC__ && __GNUC__ >= 2 && !defined __APPLE_CC__)))
(Now, granted, the above ugliness has disappeared from 0.15pre2 --
because now ALL platforms use the po_gram_error* functions rather than
the optimized macros. But even there, isn't that a case of less-capable
platforms finally "forcing" more capable ones to do it the same way as
their weaker cousins?)
Plus, the newer --enable-runtime-pseudo-relocs option (enabled by
default, but has no effect if --auto-import is disabled) usually takes
care of this drawback. It relies on pre-main startup code linked-in
from the platform runtime; cygwin1.dll on cygwin, but crt0.o on mingw.
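For readers who have not hit drawback #3: the problematic references are those whose address is "imported symbol plus a constant offset". Below is a sketch of the usual kind of source-level rewrite, assuming a struct and an array exported from a DLL. All names (`settings`, `exported_settings`, `exported_table`, `read_depth`, `read_entry`) are hypothetical, and the data is defined locally so the example is self-contained.

```c
#include <assert.h>
#include <stddef.h>

struct settings { int verbose; int depth; };

/* Stand-ins for data that a DLL would export (hypothetical names). */
struct settings exported_settings = { 0, 3 };
int exported_table[4] = { 10, 20, 30, 40 };

/* Direct accesses such as exported_settings.depth or exported_table[2]
   compile to "symbol + constant offset".  The forms below first load the
   base address into a pointer, then apply the offset at run time. */

int read_depth(void)
{
    volatile struct settings *base = &exported_settings;
    return base->depth;
}

int read_entry(size_t i)
{
    volatile int *base = exported_table;
    assert(i < sizeof exported_table / sizeof exported_table[0]);
    return base[i];
}
```

With runtime pseudo-relocs enabled this rewrite is normally unnecessary, which is the point made above; the sketch only shows what the "code modifications" amount to when that option is off.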
This is unacceptable. Therefore I disable this option, through the
woe32-dll.m4 autoconf macro.
That's fine, and it's your decision as maintainer. We've actually
already had, and settled, this argument last November, when you
announced this decision. However, I would like to point out the
benefits to my platform that auto-import has generated, notwithstanding
the drawbacks above -- not in an attempt to change your mind about your
package(s), but simply to explain why those who continue to rely on
auto-import are not all benighted heathens (which is the impression I
get from your comments about that feature, both this round and last
November):
(1) How many hours of my time and yours has it taken to get to this
point? Including my patch last November, your re-implementation of
(parts of) it last December, and now this round? At that level of
effort, do you think ANY libraries would have been built as DLLs on
cygwin -- by anyone other than people as stubborn as I?
(2) On the other hand, the drawbacks #2 and #3 that you mention
rarely seem to occur in practice, and #3 is handled by
runtime-pseudo-relocs. Further, #2 was non-existent until gcc-3.4.x --
because until that time readonly vars were NOT stored in .rdata on
pei386. Even now, the most common occurrence of #2 is exactly what it
is in gettext: popt or getopt_long const structs containing addresses of
local flag variables.
So, until very recently, the only oft-encountered true drawback to
dll-import on cygwin/mingw was extra memory usage per-process. The
benefits of DLLs (smaller on-disk executable size, modularization, and
ability to update (e.g.) zlib DLL after security flaw was discovered
WITHOUT having to relink every application known to mankind [*]) far
outweighed this ONE memory consumption drawback.
[*] not to mention the inevitable mailing list traffic a static-lib-only
cygwin distribution would generate: "Why don't you make libz.a a DLL?
libpng? libX11? libXpm? ...." -- just look at the complaints we DO get
about the C++ runtime library! And, DLLs are absolutely necessary for
decent and manageable operation/distribution of plugin-based packages
like apache.
====
So, given the old physical memory drawback and the newly-revealed const
struct issues, it is an admirable goal to try to reduce reliance on
auto-import -- as long as no additional burden is imposed upon the
users. However, those who set other priorities for their package
development and continue to rely on auto-import can be forgiven, IMO.
----------------------------------------------------------------
gettext has 3 kinds of libraries:
1) Public libraries which export only functions.
2) Public libraries which export also variables.
3) Private libraries which export functions and variables.
Namely
1) libasprintf
2) libintl, libgettextpo
3) libgettextlib, libgettextsrc
1) This case is well handled by libtool and ld already. The .h file doesn't
need modifications; __declspec(dllimport) on functions is not needed.
The function names are exported because GNU ld does an implicit
--export-all-symbols if no symbols are explicitly exported.
Not exactly. This is true for C functions and C++ functions, but not
for C++ classes: classes *are* data. That's why there is no such beast
as a C++ library that doesn't export some data (even if just a vtable or
type_info object, in .rdata).
Sadly, I don't think your solution below will work for these C++ objects
*in general*, because "manually" (e.g. using a script) generating the
_imp__* pointer variables for C++ mangled names of vtables and such
is...err, non-trivial.
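The "classes *are* data" point can be shown without any C++ at all. The following is a hand-rolled analogue in C, not what g++ generates: the table of function pointers plays the role of a vtable, and it is a data object that a library would have to export alongside its functions. All names here (`shape`, `shape_vtable`, `rect_vtable`, `shape_area`) are invented for the example.

```c
#include <assert.h>
#include <stddef.h>

/* A hand-written "class" in C: the table of function pointers is the
   analogue of a C++ vtable, and it is a data object, not a function. */
struct shape;
struct shape_vtable {
    int (*area)(const struct shape *self);
};
struct shape {
    const struct shape_vtable *vptr;   /* every object points at the table */
    int width, height;
};

static int rect_area(const struct shape *self)
{
    return self->width * self->height;
}

/* This is the DATA symbol a library would have to export. */
const struct shape_vtable rect_vtable = { rect_area };

int shape_area(const struct shape *s)
{
    assert(s != NULL && s->vptr != NULL);
    return s->vptr->area(s);           /* call dispatched through data */
}
```

A client that constructs a `struct shape` itself needs the address of `rect_vtable`, so even this "function-only" interface ends up importing data.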
---
I could envision a scenario where you must always build shared if you
want static -- and that somehow you use the shared libraries' import lib
as a hint for which _imp__* pointers you need to create for the static
lib. (e.g. --disable-shared is not allowed on cygwin).
---
In this particular case, autosprintf has no virtual functions -- so
there is no vtable. It has no public or protected data members which
would need to be exposed to (imported by) clients or derived classes.
It has no private data members *that are non-POD types*, whose
class-type would need to be exposed to clients.
Therefore, with ONE exception, the libasprintf DLL can be treated as
having a functional-only interface, even tho it is C++.
The exception, missing type_info for gnu::autosprintf, doesn't appear to
be a problem. You'll see by doing an
objdump -x -t ./cygasprintf-0.dll |\
sed -e 's/.rdata\$_/.rdata$ _/' \
-e 's/.text\$_/.text$ _/' \
-e 's/ _Z/ __Z/' |\
c++filt
on the un-stripped cygasprintf-0.dll that it DOES, actually, expose
quite a few data items -- vtables, type_info, and guard objects -- for
C++ stdlib stuff. But these are mostly in .rdata ["(sec 4)" == section
with Idx = 3 thanks to 0-based/1-based idiocy] and .text ["(sec 1)" ==
Idx 0].
There's one thing missing: the type_info object for the gnu::autosprintf
class. (If autosprintf had virtual functions, then unless the class
were explicitly marked declspec(dllexport), the vtable would also be
"missing". However, you can't miss what doesn't exist, so that's not a
problem here).
Now, the missing type_info OUGHT to be a problem. And, the fact that
g++ has some issues with exporting vtables and type_info is a known
problem in g++-4.x up to current 4.2.CVS. (I think, but do not KNOW,
that if I tried to compile foo.cc below using g++-4.x on cygwin, it
would fail due to the issues mentioned above. I don't have a 4.x
compiler on _this_ computer.)
However, for whatever reason, and even though I do NOT see a type_info
representation for autosprintf in the library, the following code
actually works when compiled with g++-3.4.4 on cygwin:
#include <cstdlib>       // free
#include <iostream>
#include <autosprintf.h>
#include <typeinfo>
#include <cxxabi.h>      // abi::__cxa_demangle
using namespace gnu;
using namespace std;

int main(int argc, char* argv[])
{
  const char* directory = "/c/bob";
  const char* filename = "alice.txt";
  const int line = 27;
  const char* errstring = "a message";

  // Build a path, then use it in a second formatted message.
  autosprintf as("%s/%s", directory, filename);
  char *pathname = as;
  cerr << autosprintf("syntax error in %s:%d: %s",
                      pathname, line, errstring)
       << endl;

  // typeid(as) needs the type_info object for gnu::autosprintf.
  int status = 0;
  char* demang = abi::__cxa_demangle(typeid(as).name(), 0, 0, &status);
  cerr << (demang != NULL ? demang : typeid(as).name()) << endl;
  free(demang);
}
So, you're OK with this treatment of libasprintf -- but I wouldn't draw
any conclusions about this method with regards to OTHER C++ libraries or
classes. And no promises when (if) the cygwin/mingw guys ever release a
4.x compiler.
2) For this case, when --enable-shared is specified, I preprocess the .h file
so that exported variables are marked with __declspec(dllimport). A simple
sed statement:
sed -e 's/export \([^()]*\);/export __declspec(dllimport) \1;/'
After this header file is installed, it must be valid for both the
shared and the static library. (You cannot expect that users of the
library really think of setting a STATIC_XYZ flag when using the static
library.) When a user compiles code that accesses a variable, the compiler
will generate a reference to _imp__variable. These _imp__* pointer
variables are normally generated for the DLL by the compiler or linker when
__declspec(dllexport) is used. But we need them also in the static
library! So I create a C file that generates these _imp__* pointer
variables:
#include "cygwin/export.h"
VARIABLE(variable1) // defines _imp__variable1
VARIABLE(variable2) // defines _imp__variable2
...
and compile this into both the static and the shared library. So I don't
need to provide the __declspec(dllexport) alternative in the header file;
it's ok to use __declspec(dllimport) always.
The linker needs to be given the --export-all-symbols flag explicitly in
this case.
Nice. I like it. (I assume that on other platforms, the 'export'
prefix is simply removed?)
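For anyone following along, here is a guess at the general shape of such a VARIABLE() helper, written as a portable sketch. It only shows the "define a pointer named _imp__x that holds the address of x" part; the real macro in the gettext tree may look different and may do more (for instance, arrange for x itself to be exported). `read_variable1` is a hypothetical stand-in for what client code compiled against the dllimport header effectively does.

```c
#include <assert.h>

/* Hypothetical sketch of a VARIABLE() helper: for each exported variable x
   it defines a pointer named _imp__x holding x's address. */
#define VARIABLE(x) void *_imp__##x = &x;

int variable1 = 1;
struct { int a; int b; } variable2 = { 2, 3 };

VARIABLE(variable1)   /* defines _imp__variable1 */
VARIABLE(variable2)   /* defines _imp__variable2 */

/* What a client built against the dllimport header effectively does:
   read the variable through the pointer rather than directly. */
int read_variable1(void)
{
    assert(_imp__variable1 != 0);
    return *(int *)_imp__variable1;
}
```

Because the pointer is an ordinary initialized variable, it can be compiled into a static archive just as well as into a DLL, which is what lets one header serve both kinds of link.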
Q: what about the installed files in /usr/share/gettext/ ? If
<libintl.h>/"gettext.h" is "munged" on cygwin, then can cygwin's
gettextize make an un-munged client source package -- so that the
"munging" occurs when the client package gets built?
3) For this case, where no .h file needs to be installed, the same approach
can be used. However, a small modification is possible: Since the
library is a private one, no .h file is installed, and the static
library doesn't need to be installed. If --enable-shared was specified, the
static library is not even used. (The programs and the testsuite do not
link statically.) The .h file contains
export PRIVATE_DLL_VARIABLE int variable1;
export PRIVATE_DLL_VARIABLE struct { ... } variable2;
...
and PRIVATE_DLL_VARIABLE is defined in config.h through
#if defined __CYGWIN__ && (--enable-shared was specified)
#define PRIVATE_DLL_VARIABLE __declspec(dllimport)
#else
#define PRIVATE_DLL_VARIABLE
#endif
I believe you should use (__CYGWIN__ || __MINGW32__) && ... not just
__CYGWIN__.
Notes:
- You see that DLL_EXPORT (set by libtool) is never used: in case 2 because
we don't need/want a LIBFOO_DLL_VARIABLE macro for every library, in case 3
because the code compiled without DLL_EXPORT is not used at all.
Right -- by adding the redirection pointers to the static lib, you don't
care about DLL_EXPORT any more.
- The process of adding the _imp__* pointer variables to the .a and .dll.a
file, and the --export-all-symbols flag, could be done by libtool.
Hmm. Maybe -- I'll have to think about that, especially as relating to
projects which, even then, will continue to rely on auto-import. Would
this break that? Also, would the ease-of-use advantage now shift the
other way: --disable-auto-import now becomes preferred although not
required -- that might be a good thing. Also, would the behavior of
libtool need to be different for C, vs C++/objC/etc?
Like I said, I'll have to give it some thought.
- The only drawback I can see of this technique is that a static library
built alone (with --disable-shared) will be slightly more efficient
than the static library built together with the shared one - due to the
_imp__* indirections. But hey, if it has taken 11 years to port gettext
with shared libraries to Cygwin, it is because the Woe32 DLLs are
optimized excessively for performance at the expense of standards compliance
and ease of use. (Like the floating-point hardware that was in use before
IEEE 754: it produced wrong results but did so very efficiently.)
Optimized? Hah! The Windows386 developers said "hey, let's just reuse
the PE386 exe format for dlls. We only need to change one thing here,
call it PEI386, and we're done!" Never mind that whole no-unresolved
symbols thing...They were LAZY, not clever.
- Unlike --enable-auto-import, which operates on the code that _uses_
a shared library, this technique operates on the library itself; the
code that uses the library sees the dllimports in the header file and
does not need further fixup.
Yep, that part I like.
Yes, in general you are right: we have four states (per library)
  building the library as shared:
      declspec(dllexport)
  building the library as static:
      <no decorator> (*)
  building a client of the library, intending to link shared:
      extern declspec(dllimport)
  building a client of the library, intending to link static:
      extern (*)
With the technique above, the last two states are collapsed into one.
Right -- so then there is no burden on the client. The first two states
remain, but that's the library builder's (i.e. yours and mine) problem
-- which we can handle.
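For contrast, this is roughly what the conventional four-state header looks like for a hypothetical library "foo". `DLL_EXPORT` is the macro libtool sets, as noted above; `BUILDING_LIBFOO`, `LIBFOO_STATIC`, `LIBFOO_DLL_VARIABLE`, `foo_counter` and `foo_bump` are invented for the sketch. The single-file demo defines `LIBFOO_STATIC` itself so that it compiles the same way everywhere.

```c
#include <assert.h>

/* For this single-file demo we pretend to be a client linking statically. */
#define LIBFOO_STATIC 1

/* Conventional scheme: the decorator depends on who is compiling and on
   how they intend to link. */
#if defined _WIN32 || defined __CYGWIN__
# if defined BUILDING_LIBFOO && defined DLL_EXPORT
#  define LIBFOO_DLL_VARIABLE __declspec(dllexport)  /* building shared */
# elif defined BUILDING_LIBFOO || defined LIBFOO_STATIC
#  define LIBFOO_DLL_VARIABLE                        /* static, either side */
# else
#  define LIBFOO_DLL_VARIABLE __declspec(dllimport)  /* client, shared */
# endif
#else
# define LIBFOO_DLL_VARIABLE                         /* other platforms */
#endif

extern LIBFOO_DLL_VARIABLE int foo_counter;

/* Normally this definition would live inside the library. */
int foo_counter = 0;

int foo_bump(void)
{
    assert(foo_counter >= 0);
    return ++foo_counter;
}
```

The `LIBFOO_STATIC` branch is the part that burdens clients: forget to define it when linking statically and the link fails on missing import symbols. Collapsing the two client states removes that branch from the installed header entirely.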
Overall, I like it. Not sure if it is completely generalizable to other
(non-gettext) libraries and languages on cygwin/mingw -- and we may run
into issues with C++ when/if there is ever an official g++-4.x release
for cygwin/mingw. But that's a g++ bug, not a gettext bug. For right
now and the medium-term future, your solution looks good to me.
--
Chuck