How to determine the actual length of a DBCS string (multibyte-character ANSI str
Date: 22-May-03
Category: Object Pascal-Strings
Language: Delphi 3.x
Publisher: DSP, Administrator
Reference URL: DKB
Author: Ernesto De Spirito

How can I get the length in characters of a multibyte-character string? Function 
Length returns the length in bytes, but in Eastern languages some characters may 
take more than one byte...

Answer:

Solve 1:

Introduction 

The Length function returns the length of a string, but what it counts depends on
the type of the string. For the old short strings (ShortString) and for long
strings (AnsiString), Length returns the number of bytes they take, while for wide
(Unicode) strings (WideString) it returns the number of wide characters (WideChar),
that is, the number of bytes divided by two.

In short and long strings, a character takes one byte in Western languages, while
in Asian languages, for example, some characters take one byte and others take
two. For this reason, there are two versions of almost all string functions: a
fast one that only works correctly with single-byte character strings (SBCS), and
a slower one that also works with strings where a character can take one or two
bytes (DBCS), as used in applications distributed internationally. This way we
have functions like Pos, LowerCase and UpperCase on one side, and AnsiPos,
AnsiLowerCase and AnsiUpperCase on the other. Curiously, there is no AnsiLength
function that returns the number of characters in a DBCS string.
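
As a small illustration of the byte-counting behaviour described above, the following console sketch builds a two-byte string and prints what Length reports. The byte values $82 and $A0 are an assumption chosen for the example: in code page 932 (Shift-JIS) they form a single character, but on a system with a different ANSI code page they are two separate characters, so the lead-byte test will give a different answer there.

```pascal
program LengthBytesDemo;
{$APPTYPE CONSOLE}
uses
  SysUtils;   // LeadBytes is declared here
var
  s: AnsiString;
begin
  // Assumption: these two bytes are ONE character in code page 932.
  SetLength(s, 2);
  s[1] := AnsiChar($82);
  s[2] := AnsiChar($A0);

  // Length counts bytes, so this prints 2 whatever the code page is.
  WriteLn('Length in bytes: ', Length(s));

  // Whether the first byte is a lead byte depends on the system's
  // ANSI code page, which is what LeadBytes reflects.
  WriteLn('First byte is a lead byte: ', s[1] in LeadBytes);
end.
```

The point of the sketch is only that Length never looks at LeadBytes: the character count has to be derived separately.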

AnsiLength (Draft) 

Here is a function that returns the number of characters in a double-byte
character string:

function AnsiLength(const s: string): integer;
var
  i, n: integer;
begin
  Result := 0;
  n := Length(s);               // length in bytes
  i := 1;
  while i <= n do
  begin
    inc(Result);                // one more character
    if s[i] in LeadBytes then   // lead byte: skip its trail byte as well
      inc(i);
    inc(i);
  end;
end;


AnsiLength (Final) 

Naturally, this function is not optimized. We are not going to mess with assembler, 
but at least we can use pointers: 

function AnsiLength(const s: string): integer;
var
  p, q: PChar;
begin
  Result := 0;
  p := PChar(s);                // first byte of the string
  q := p + Length(s);           // one past the last byte
  while p < q do
  begin
    inc(Result);
    if p^ in LeadBytes then
      inc(p, 2)                 // two-byte character
    else
      inc(p);                   // single-byte character
  end;
end;



Solve 2:

function AnsiLength(const s: string): integer;
begin
  // Pass the byte length explicitly: with -1 the returned count would
  // also include the terminating null character.
  Result := MultiByteToWideChar(CP_ACP, 0, PAnsiChar(s), Length(s), nil, 0);
end;


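The documentation on MultiByteToWideChar says: 
"If the function succeeds, and cchWideChar is zero, the return value is the 
required size, in wide characters, for a buffer that can receive the translated 
string." Since each multibyte character is translated to one wide character, the
number of wide characters is, in effect, the number of characters in the MBCS
string. Note that when the length argument (cbMultiByte) is -1, the function also
processes the terminating null character and the returned size includes it, so the
byte length of the string should be passed to count only its characters.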
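
To make the effect of the length argument concrete, here is a minimal console sketch that asks MultiByteToWideChar for the required size of the same string twice, once with -1 and once with the byte length. The program name and the sample string 'abc' are made up for the illustration; the string is plain ASCII so the result does not depend on the system code page.

```pascal
program MbToWcCountDemo;
{$APPTYPE CONSOLE}
uses
  Windows;   // MultiByteToWideChar, CP_ACP
var
  s: AnsiString;
begin
  s := 'abc';

  // Length argument -1: the terminating null is converted and counted too.
  WriteLn(MultiByteToWideChar(CP_ACP, 0, PAnsiChar(s), -1, nil, 0));        // prints 4

  // Explicit byte length: only the three bytes of the string are counted.
  WriteLn(MultiByteToWideChar(CP_ACP, 0, PAnsiChar(s), Length(s), nil, 0)); // prints 3
end.
```

With an empty string the byte length is zero, in which case the API call fails and returns 0, which happens to be the right character count.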


Solve 3:

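function AnsiLength(const s: string): integer;
begin
  // lstrlenA gives the byte count up to (not including) the terminating
  // null, so the null is not counted in the converted size.
  Result := MultiByteToWideChar(CP_ACP, 0, PAnsiChar(s),
    lstrlenA(PAnsiChar(s)), nil, 0);
end;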


As in Solve 2, when cchWideChar is zero MultiByteToWideChar returns the required
size, in wide characters, for a buffer that can receive the translated string.
That number of wide characters is, in effect, the number of characters in the
string: its length.

Copyright (c) 2001 Ernesto De Spirito, edspirito@latiumsoftware.com
Visit: http://www.latiumsoftware.com/delphi-newsletter.php

			
Copyright © Mendozi Enterprises LLC