help-gnu-emacs
Re: how to calculate the size of string in bytes?


From: Sam Halliday
Subject: Re: how to calculate the size of string in bytes?
Date: Tue, 18 Aug 2015 03:43:44 -0700 (PDT)
User-agent: G2/1.0

On Tuesday, 18 August 2015 11:14:04 UTC+1, to...@tuxteam.de  wrote:
> On Tue, Aug 18, 2015 at 02:11:54AM -0700, Sam Halliday wrote:
> > We used to have a 6 character hex number at the start of each message that 
> > counted the number of multibyte characters, but we'd like to change it to 
> > be the number of bytes in the message.
> > 
> > We're sending the string to `process-send-string' and `read'ing from the 
> > associated network buffer. But when calculating the outgoing length of the 
> > string that we want to send, we use `length' --- but we need this to be 
> > `length-in-bytes' not the number of multibyte chars. Is there a built in 
> > function to do this or am I going to have to iterate the string and count 
> > the byte size of each character?
> > 
> > A quick test shows that
> > 
> >   (length (encode-coding-string "EURO" 'raw-text))
> > 
> > seems to give the correct result (1 for ASCII, 2 for Pound Sterling, 3 for 
> > Euro), but I am not 100% sure if this is correct.
> 
> Raw is, afaik, Emacs's internal coding system. You don't want traces of it
> in the network :-)


We're not sending the message using raw; we're using UTF-8. But I need to 
calculate the length of the UTF-8 string IN BYTES as part of the payload (each 
message begins with a 6-character hex encoding of the following string's 
length in bytes).
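
As a minimal sketch of the header described above: the prefix could be computed by encoding the string as UTF-8 and formatting the byte count. The name `my-length-header' is hypothetical, and lowercase hex with zero padding is an assumption; the thread does not say which the protocol expects.

```elisp
;; Hypothetical helper, not taken from the thread.
;; Assumption: the header is lowercase hex, zero-padded to 6 digits.
(defun my-length-header (string)
  "Return a 6-digit hex count of the bytes STRING occupies in UTF-8."
  ;; `encode-coding-string' returns a unibyte string, so `length'
  ;; on its result counts bytes rather than characters.
  (format "%06x" (length (encode-coding-string string 'utf-8))))

(my-length-header "hello")   ; => "000005"
(my-length-header "\u20ac")  ; => "000003" (Euro sign, 3 bytes in UTF-8)
```

Six hex digits can only represent lengths up to #xffffff bytes, so a longer message would overflow the header under this scheme.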

I'm using "raw" to calculate an approximation of the UTF-8 string's byte 
length, but I am aware that the result might not match the UTF-8 byte count in 
the general case :-/
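
For comparison, here is a sketch that avoids the approximation by encoding explicitly with the `utf-8' coding system instead of `raw-text'. `my-utf8-byte-length' is a hypothetical name, and the sample characters are the ones from the quick test quoted above.

```elisp
;; Hypothetical helper, not taken from the thread.
(defun my-utf8-byte-length (string)
  "Return the number of bytes STRING occupies when encoded as UTF-8."
  ;; The encoded result is a unibyte string: one element per byte.
  (length (encode-coding-string string 'utf-8)))

(length "\u20ac")                ; => 1 (characters, not bytes)
(my-utf8-byte-length "a")        ; => 1 (ASCII)
(my-utf8-byte-length "\u00a3")   ; => 2 (Pound Sterling)
(my-utf8-byte-length "\u20ac")   ; => 3 (Euro)
```

Because the count is taken from the same encoding that goes on the wire, it does not depend on how Emacs represents the string internally.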

I don't think what you've suggested would actually change the semantics, but it 
would allow us to use a different encoding on the wire than the encoding of the 
string. We don't really need to worry about that at this stage, because all our 
users are using UTF-8. We'll keep it in mind though.

