Author: Ernesto De Spirito
How can I get the length in characters of a multibyte-character string? Function
Length returns the length in bytes, but in Eastern languages some characters may
take more than one byte...
Answer:
Solve 1:
Introduction
The Length function returns the length of a string, but it behaves differently
according to the type of the string. For the old short strings (ShortString) and
for long strings (AnsiString), Length returns the number of bytes they take, while
for wide (Unicode) strings (WideString) it returns the number of wide characters
(WideChar), that is, the number of bytes divided by two. In short and long
strings, a character in Western languages takes one byte, while in Asian
languages, for example, some characters take one byte and others take two. For
this reason, there are two versions of almost all string functions: a fast one
that only works with single-byte character strings (SBCS), and a slower one that
also works with strings where a character can take one or two bytes (DBCS), as
used in applications distributed internationally. This way we have functions like
Pos, LowerCase and UpperCase on one side and AnsiPos, AnsiLowerCase and
AnsiUpperCase on the other. Curiously, there is no AnsiLength function that
returns the number of characters in a DBCS string.
AnsiLength (Draft)
Here is a function that returns the number of characters in a double-byte
character string. The parameter is declared as AnsiString so the code also
behaves as intended in Delphi versions where string is a Unicode type:
  function AnsiLength(const s: AnsiString): Integer;
  var
    i, n: Integer;
  begin
    Result := 0;
    n := Length(s);  // length in bytes
    i := 1;
    while i <= n do
    begin
      Inc(Result);   // one more character
      if s[i] in LeadBytes then
        Inc(i);      // lead byte: skip the trail byte too
      Inc(i);
    end;
  end;
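LeadBytes is filled in from the system locale, so the result of the loop above depends on the machine it runs on. To try the counting logic independently of the locale, here is a minimal sketch that passes the lead-byte set explicitly. The names CountChars, TByteSet and ShiftJisLeadBytes are hypothetical, and the set assumes the Shift-JIS (code page 932) lead-byte ranges $81..$9F and $E0..$FC:

```pascal
program CountDbcsChars;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

type
  TByteSet = set of Byte;

const
  // Assumption: Shift-JIS (code page 932) lead-byte ranges, hard-coded so the
  // sketch does not depend on the locale-driven LeadBytes variable.
  ShiftJisLeadBytes: TByteSet = [$81..$9F, $E0..$FC];

// Hypothetical helper: same loop as the draft AnsiLength, but the bytes and
// the lead-byte set are passed in explicitly.
function CountChars(const Bytes: array of Byte; const Lead: TByteSet): Integer;
var
  i: Integer;
begin
  Result := 0;
  i := Low(Bytes);
  while i <= High(Bytes) do
  begin
    Inc(Result);
    if Bytes[i] in Lead then
      Inc(i);  // lead byte: skip its trail byte
    Inc(i);
  end;
end;

const
  // "A", one two-byte Shift-JIS character ($82 $A0), "B"
  Sample: array[0..3] of Byte = ($41, $82, $A0, $42);

begin
  WriteLn('Bytes: ', Length(Sample));
  WriteLn('Characters: ', CountChars(Sample, ShiftJisLeadBytes));
end.
```

The sample holds four bytes but only three characters, which is exactly the difference between Length and the character count that the question asks about.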
AnsiLength (Final)
Naturally, this function is not optimized. We are not going to mess with assembler,
but at least we can use pointers:
  function AnsiLength(const s: AnsiString): Integer;
  var
    p, q: PAnsiChar;
  begin
    Result := 0;
    p := PAnsiChar(s);     // first byte
    q := p + Length(s);    // one past the last byte
    while p < q do
    begin
      Inc(Result);
      if p^ in LeadBytes then
        Inc(p, 2)          // double-byte character
      else
        Inc(p);            // single-byte character
    end;
  end;
Solve 2:
  function AnsiLength(const s: AnsiString): Integer;
  begin
    // With -1 the terminating null is converted and counted as well,
    // so one is subtracted from the result.
    Result := MultiByteToWideChar(CP_ACP, 0, PAnsiChar(s), -1, nil, 0) - 1;
  end;
The documentation on MultiByteToWideChar says:
"If the function succeeds, and cchWideChar is zero, the return value is the
required size, in wide characters, for a buffer that can receive the translated
string." Apart from the terminating null, the number of wide characters is the
number of characters in the MBCS string.
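When -1 is passed as the byte count, MultiByteToWideChar also processes the terminating null and includes it in the returned size. The following minimal sketch compares both ways of calling it. It assumes a Windows system where code page 932 (Shift-JIS) is available, and the constant name CP_SHIFT_JIS is hypothetical:

```pascal
program WideCountDemo;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

uses
  Windows;

const
  CP_SHIFT_JIS = 932;  // hypothetical name for the Shift-JIS code page number
  // "A", one two-byte Shift-JIS character ($82 $A0), "B", terminating null
  Sample: array[0..4] of Byte = ($41, $82, $A0, $42, $00);

var
  WithNull, WithoutNull: Integer;
begin
  // Byte count -1: the string is read up to and including the null.
  WithNull := MultiByteToWideChar(CP_SHIFT_JIS, 0, PAnsiChar(@Sample[0]),
    -1, nil, 0);
  // Explicit byte count of 4: the null is left out.
  WithoutNull := MultiByteToWideChar(CP_SHIFT_JIS, 0, PAnsiChar(@Sample[0]),
    4, nil, 0);
  WriteLn(WithNull, ' ', WithoutNull);
end.
```

Under those assumptions the first value should be one greater than the second, which is why a character count based on -1 needs the terminator subtracted.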
Solve 3:
  function AnsiLength(const s: AnsiString): Integer;
  var
    n: Integer;
  begin
    n := lstrlenA(PAnsiChar(s));  // length in bytes, without the null
    // For an empty string (n = 0) the call fails and returns 0,
    // which is still the correct character count.
    Result := MultiByteToWideChar(CP_ACP, 0, PAnsiChar(s), n, nil, 0);
  end;
The documentation on MultiByteToWideChar says:
"If the function succeeds, and cchWideChar is zero, the return value is the
required size, in wide characters, for a buffer that can receive the translated
string." Because the byte count passed in excludes the terminating null, the
number of wide characters for the buffer is the number of characters in the
string, that is, its length.
Copyright (c) 2001 Ernesto De Spirito (edspirito@latiumsoftware.com)
Visit: http://www.latiumsoftware.com/delphi-newsletter.php