FWIW, according to its documentation, Perforce handles EOL translation
like this:
- When you add a file, you can explicitly specify what "type" it
is. Valid types in the current version are "binary", "text",
"symlink", "unicode", and two types for those funny Macintosh resource
files ("apple" and "resource"). The "type" is part of the per-file
metadata stored in Perforce.
- The "text" file type translates line endings; they don't say
how. "unicode" means "store the file in UTF-8 in the repository, and
translate to the local code page upon sync/checkout". (I bet "unicode"
does EOL conversion too, but they don't say.)
- If you don't explicitly specify a type, Perforce looks in the
"typemap". This is a per-depot text file, mapping wildcard filename
specifications to types. It is empty by default.
- If the file matches nothing in the "typemap", Perforce guesses at the
type by examining the first 8192 bytes. If it discovers "nontext"
bytes, it uses "binary", otherwise it uses "text". A "nontext" byte is
any byte > 127 (in other words, any byte with its high bit set).
- You can change a file's type at any time.
- There are also type modifiers, like "+k" (perform keyword
expansion, like $Date:$), "+x" (set execute bit), and more.
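To make the lookup order above concrete, here is a rough Python sketch
of how I read it. This is not Perforce's code. The function names are
made up, the fnmatch-style globs stand in for Perforce's own wildcard
syntax, and "first matching typemap entry wins" is my assumption (the
docs I read don't say how multiple matches are resolved).

```python
import fnmatch

SAMPLE_SIZE = 8192  # number of leading bytes examined, per the docs


def guess_type(data: bytes) -> str:
    """Return "binary" if any sampled byte has its high bit set, else "text"."""
    sample = data[:SAMPLE_SIZE]
    return "binary" if any(b > 127 for b in sample) else "text"


def resolve_type(filename, data, explicit=None, typemap=()):
    """Pick a type: explicit type, then typemap, then the byte heuristic.

    typemap is a sequence of (glob_pattern, file_type) pairs. Taking the
    first match is an assumption made for this sketch.
    """
    if explicit is not None:
        return explicit
    for pattern, file_type in typemap:
        if fnmatch.fnmatchcase(filename, pattern):
            return file_type
    return guess_type(data)
```

For example, resolve_type("manual.pdf", data, typemap=[("*.pdf",
"binary")]) never reaches the guesser, which is the point of the
typemap.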
All this information was gleaned from publicly available
documentation from Perforce's website. The main page of interest is
the documentation on "file types", here:
http://www.perforce.com/perforce/doc.052/manuals/cmdref/o.ftypes.html
My notes on this:
- I'm surprised at their "nontext" heuristic; before I saw the
documentation, I was guessing they'd look for characters < 32 that
weren't valid whitespace characters. A random sampling of binary files
on my hard disk shows plenty of zeros in the first 256 bytes.
- Their documentation mentions that some PDFs fail the file type
guesser. PDFs store comments first, and some wordy PDFs have > 8k
of ASCII comments. Though they ship an empty typemap, they do have a
list of "recommended" entries which includes "any file ending with pdf
-> binary".
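To illustrate both notes, here is a small Python sketch comparing the
documented high-bit test with the control-character test I had been
expecting. The function names, the set of allowed whitespace bytes,
and the sample inputs are all my own inventions for illustration, not
anything from Perforce.

```python
SAMPLE_SIZE = 8192  # leading bytes examined, per the Perforce docs

# Control bytes accepted as ordinary text whitespace (my choice).
ALLOWED_CONTROL = frozenset(b"\t\n\r\f")


def high_bit_guess(data: bytes) -> str:
    """Documented rule: any sampled byte > 127 means "binary"."""
    sample = data[:SAMPLE_SIZE]
    return "binary" if any(b > 127 for b in sample) else "text"


def control_char_guess(data: bytes) -> str:
    """Alternative rule: any non-whitespace byte < 32 means "binary"."""
    sample = data[:SAMPLE_SIZE]
    if any(b < 32 and b not in ALLOWED_CONTROL for b in sample):
        return "binary"
    return "text"


# Made-up header with zero bytes but nothing above 127.
zero_header = b"MZ\x00\x00\x03\x00"

# Made-up PDF-like file: more than 8k of ASCII before any binary data.
pdf_like = b"%" + b"x" * 9000 + b"\x00\xff"
```

The high-bit test calls zero_header "text" while the control-character
test calls it "binary". Both call pdf_like "text", since neither looks
past the first 8192 bytes, so a typemap entry is still needed for that
case.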
I assert that no solution will do the right thing by default for
everyone all the time. But a conservative default, combined with the
ability to adjust the transformation on a file-by-file basis at any
time, should be Good Enough.
Cheers,
larry