Re: how to calculate the size of string in bytes?
From: tomas
Subject: Re: how to calculate the size of string in bytes?
Date: Tue, 18 Aug 2015 13:47:03 +0200
User-agent: Mutt/1.5.21 (2010-09-15)
On Tue, Aug 18, 2015 at 03:43:44AM -0700, Sam Halliday wrote:
> On Tuesday, 18 August 2015 11:14:04 UTC+1, to...@tuxteam.de wrote:
> > On Tue, Aug 18, 2015 at 02:11:54AM -0700, Sam Halliday wrote:
> > > We used to have a 6 character hex number at the start of each message
> > > that counted the number of multibyte characters, but we'd like to change
> > > it to be the number of bytes in the message.
> > >
> > > We're sending the string to `process-send-string' and `read'ing from the
> > > associated network buffer. But when calculating the outgoing length of
> > > the string that we want to send, we use `length' --- but we need this to
> > > be `length-in-bytes' not the number of multibyte chars. Is there a built
> > > in function to do this or am I going to have to iterate the string and
> > > count the byte size of each character?
> > >
> > > A quick test shows that
> > >
> > > (length (encode-coding-string "EURO" 'raw-text))
> > >
> > > seems to give the correct result (1 for ASCII, 2 for Pound Sterling, 3
> > > for Euro), but I am not 100% sure if this is correct.
> >
> > Raw is, afaik, Emacs's internal coding system. You don't want traces of it
> > in the network :-)
>
>
> We're not sending the message using raw, we're using UTF-8. But I need to
> calculate the length of the UTF-8 string IN BYTES as part of the payload
> (each message begins with a 6-character hex encoding of the following
> string's raw length).

Yes, I get that. The way I understand encode-coding-string is that you give
it the target encoding:

  (length (encode-coding-string foo 'raw-text))

would mean "transform this string to whatever Emacs uses as internal
encoding and measure its length in bytes", whereas what you want is,
AFAIU, "transform this string to UTF-8 and measure its length in bytes",
which would read as:

  (length (encode-coding-string foo 'utf-8))

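
A minimal sketch of how that could be wrapped up, including the 6-character
hex prefix described in the quoted message. The function names are
hypothetical (they are not from the original code), and the sketch assumes
the process's coding system really is utf-8, so that the bytes counted here
are the bytes that end up on the wire:

```elisp
;; Hypothetical helpers; the names are illustrative only.
(defun my-utf8-byte-length (string)
  "Return the number of bytes STRING occupies once encoded as UTF-8."
  (length (encode-coding-string string 'utf-8)))

(defun my-length-header (string)
  "Return the UTF-8 byte length of STRING as a 6-digit hex string."
  (format "%06x" (my-utf8-byte-length string)))

(my-utf8-byte-length "a")        ; => 1
(my-utf8-byte-length "\u00a3")   ; => 2 (pound sign)
(my-utf8-byte-length "\u20ac")   ; => 3 (euro sign)
(my-length-header "\u20ac")      ; => "000003"
```

The header would then be prepended to the message before handing it to
`process-send-string'.
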
> I'm using "raw" to calculate an approximation of the UTF-8 string's byte
> length, but I am aware that it might not actually be true in the general case
> :-/
Use utf-8 then?
> I don't think what you've suggested would actually change the semantics, but
> it would allow us to use a different encoding on the wire than the encoding
> of the string. We don't really need to worry about that at this stage,
> because all our users are using UTF-8. We'll keep it in mind though.
But, but... isn't that a bug lurking? And it would be so easy to fix...
(that is unrelated to the above issue -- that I think you want utf-8
instead of raw)
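
One possible reading of "easy to fix", as a sketch only: keep the wire
encoding in a single place and compute the length from it, so the two
cannot drift apart. The variable and function names below are hypothetical
and not part of the original code:

```elisp
;; Hypothetical variable: the one place where the wire encoding is named.
(defvar my-wire-coding-system 'utf-8
  "Coding system assumed for text sent over the connection.")

(defun my-wire-byte-length (string)
  "Return the byte length of STRING encoded with `my-wire-coding-system'."
  (length (encode-coding-string string my-wire-coding-system)))

(my-wire-byte-length "\u00a3")              ; => 2 under utf-8
(let ((my-wire-coding-system 'latin-1))
  (my-wire-byte-length "\u00a3"))           ; => 1 under latin-1
```

The same pound sign takes a different number of bytes depending on the
coding system, which is why the length has to be computed with whatever
encoding is actually used on the wire.
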
Regards
-- tomás