BeBot - An Anarchy Online and Age Of Conan chat automaton

Development => Coding and development discussion => Topic started by: Beatbozz on February 15, 2010, 12:20:07 pm

Title: utf8_encode utf8_decode
Post by: Beatbozz on February 15, 2010, 12:20:07 pm
I run a bot for an international guild. I wanted to make a module that checks guild chat for non-standard/national characters and notifies (annoys) the member who uses them, just to keep guild chat clear of posts that some might not understand (Russian, Greek and other posts).

I planned to check messages for bytes with a value > 127, as national characters are usually encoded in UTF-8 as 2 bytes, each with a value > 127.
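A minimal sketch of such a check, assuming the message reaches the module as raw UTF-8 bytes. The function name has_non_ascii is made up for illustration and is not part of BeBot.

```php
<?php
// Hypothetical helper (not a BeBot function): returns true if the
// message contains at least one byte with a value above 127.
function has_non_ascii($message)
{
    // Without the /u modifier PCRE works on single bytes, so this
    // matches any byte outside the 7-bit ASCII range 0x00-0x7F.
    return preg_match('/[^\x00-\x7F]/', $message) === 1;
}

// The Cyrillic letter "П" written as its two UTF-8 bytes, so the
// example does not depend on how this file itself is saved.
var_dump(has_non_ascii("hello"));        // plain ASCII
var_dump(has_non_ascii("\xD0\x9F ok"));  // contains bytes > 127
```

This only works if the bytes have not already been converted to "?" before the module sees them.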

I made that module for my bot, but I had to comment out the utf8_ functions in the Sources\BOT.php and main\15_ChatQueue.php files. With these functions uncommented I always got "?" for any such character.
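The "?" is consistent with how utf8_decode() behaves: it converts UTF-8 to ISO-8859-1 (Latin-1) and replaces anything Latin-1 cannot represent with a question mark. A small sketch, independent of BeBot's actual code. Note that utf8_decode() is deprecated as of PHP 8.2, hence the @ in front of the calls.

```php
<?php
// "Привет" written as explicit UTF-8 bytes (6 Cyrillic letters).
$russian = "\xD0\x9F\xD1\x80\xD0\xB8\xD0\xB2\xD0\xB5\xD1\x82";

// Cyrillic does not exist in Latin-1, so every letter becomes "?".
// The @ only hides the deprecation notice on PHP 8.2 and later.
$decoded_russian = @utf8_decode($russian);

// "ü" does exist in Latin-1, so it survives as the single byte 0xFC.
$decoded_german = @utf8_decode("\xC3\xBC");

var_dump($decoded_russian === "??????");
var_dump($decoded_german === "\xFC");
```

Once a character has been turned into "?" the original bytes are gone, so a check for bytes > 127 after this step cannot see Russian or Greek text.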

So my question is: why does BeBot decode UTF-8 strings into Latin-1 on input and encode Latin-1 to UTF-8 on output?

Can I get into trouble now that I have commented out the utf8_ functions?

I won't post this module as an unsupported module, as it would be easy for the BeBot devs to do it in a better way, and mainly because it requires changes to core files.
Title: Re: utf8_encode utf8_decode
Post by: Khalem on February 15, 2010, 01:29:42 pm
Blondengy was the one who originally wrote this, but I assume it is because PHP will not support UTF8 natively until version 6.

This means that a number of functions can and will break UTF8 strings.
String functions like substr(), ucfirst(), strtoupper(), strlen() and so forth can break the actual string or will return wrong values. For example strlen will return the number of bytes, not the number of characters.
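A short illustration of the byte-versus-character problem, as a sketch only. The helper utf8_char_count is a made-up name; it counts characters with a PCRE pattern in UTF-8 mode so that no extra extension is needed.

```php
<?php
// Hypothetical helper: counts UTF-8 characters by matching one
// character at a time with the /u (UTF-8) modifier.
function utf8_char_count($text)
{
    return preg_match_all('/./su', $text);
}

// "über" with "ü" written as its two UTF-8 bytes.
$word = "\xC3\xBCber";

$byte_length = strlen($word);          // counts bytes
$char_length = utf8_char_count($word); // counts characters

// substr() also works on bytes, so taking "the first character"
// cuts "ü" in half and leaves a lone 0xC3 byte, which is not valid UTF-8.
$first_byte = substr($word, 0, 1);

var_dump($byte_length, $char_length, $first_byte === "\xC3");
```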

Ideally we would be UTF8 all the way, but that currently gives us a few challenges.
Title: Re: utf8_encode utf8_decode
Post by: Beatbozz on February 15, 2010, 02:30:19 pm
OK, I didn't know PHP does not use UTF-8 as its default encoding; I'm completely new to PHP. I'll revert the changes to the core files and stop using that module before something goes wrong. I'll patiently wait for a future UTF-8 PHP and UTF-8 BeBot :). Thanks for the response.
Title: Re: utf8_encode utf8_decode
Post by: Khalem on February 15, 2010, 03:30:14 pm
The mbstring extension would likely be the current solution. However, it is not a default PHP extension, and as such I'm not sure how viable it would be to require it for BeBot.

The following might give some insight:
http://developer.loftdigital.com/blog/php-utf-8-cheatsheet

Using the mbstring.func_overload directive you should probably be able to make UTF8 work without having to do any code changes aside from removing encode/decode calls currently used.

You would also have to change the database over to UTF8 and likely convert all source files to UTF8.
But in theory, it would work.
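As an alternative to overloading, the mbstring functions can also be called explicitly. The sketch below assumes the mbstring extension is loaded and returns null otherwise; the function name utf8_summary is made up for illustration. Note that the mbstring.func_overload directive was deprecated in PHP 7.2 and removed in PHP 8.0, so explicit calls are the safer route on newer PHP versions.

```php
<?php
// Hypothetical helper: returns character-aware results for a UTF-8
// string, or null when the mbstring extension is not available.
function utf8_summary($text)
{
    if (!extension_loaded('mbstring')) {
        return null;
    }
    return array(
        'length' => mb_strlen($text, 'UTF-8'),        // characters, not bytes
        'first'  => mb_substr($text, 0, 1, 'UTF-8'),  // first whole character
        'upper'  => mb_strtoupper($text, 'UTF-8'),    // case mapping beyond ASCII
    );
}

// "über" with "ü" written as explicit UTF-8 bytes.
var_dump(utf8_summary("\xC3\xBCber"));
```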