Topic: utf8_encode utf8_decode

Beatbozz (BeBot Rookie)
« on: February 15, 2010, 12:20:07 pm »
I run a bot for an international guild. I wanted to make a module that checks guild chat for nonstandard/national characters and notifies (annoys) any member who uses them, just to keep guild chat clear of posts that some members might not understand (Russian, Greek and so on).

I planned to check messages for characters with a value > 127, since national characters are usually encoded in UTF-8 as two bytes, each with a value > 127.

I made that module for my bot, but I had to comment out the utf8_ functions in the Sources\BOT.php and main\15_ChatQueue.php files. With these functions left in, I always got "?" for any such character.

So my question is: why does BeBot decode UTF-8 strings into Latin-1 on input and encode Latin-1 back to UTF-8 on output?

Can I get into trouble by commenting out the utf8_ functions?

I won't post this as an unsupported module, since it would be easy for the BeBot devs to do it in a better way, and mainly because it needs changes to core files.
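
A minimal sketch of the byte check described above, assuming the message arrives as raw UTF-8 bytes that have not been decoded yet. The function name contains_non_ascii is hypothetical and not part of BeBot.

```php
<?php
// Hypothetical helper (not a BeBot function): true if $message has any byte > 127.
function contains_non_ascii(string $message): bool
{
    // In UTF-8 every byte of a non-ASCII character lies in the range 0x80-0xFF,
    // so a plain byte-level match is enough and no multibyte extension is needed.
    return preg_match('/[\x80-\xFF]/', $message) === 1;
}

var_dump(contains_non_ascii("hello guild"));           // bool(false)
var_dump(contains_non_ascii("\xD0\x9F\xD1\x80ivet"));  // bool(true), starts with two Cyrillic letters
```

Note that this only works on the undecoded input: it flags accented Latin letters as well as Cyrillic or Greek, because all of them use bytes above 127.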
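
The "?" behaviour can be reproduced outside the bot. The sketch below assumes only PHP's built-in utf8_decode(), which converts UTF-8 to ISO-8859-1 (Latin-1) and substitutes "?" for anything Latin-1 cannot represent; the variable names are illustrative.

```php
<?php
// "Привет" written as raw UTF-8 bytes so the example does not depend on the file's encoding.
$cyrillic = "\xD0\x9F\xD1\x80\xD0\xB8\xD0\xB2\xD0\xB5\xD1\x82";
// "é" (U+00E9) exists in Latin-1, so it survives the conversion as the single byte 0xE9.
$latin = "\xC3\xA9";

// utf8_decode() is deprecated as of PHP 8.2; @ only silences that notice in this sketch.
$decodedCyrillic = @utf8_decode($cyrillic);
$decodedLatin    = @utf8_decode($latin);

echo $decodedCyrillic, "\n";        // ??????
echo bin2hex($decodedLatin), "\n";  // e9
```

Once the Cyrillic text has been turned into "?" characters, a check for bytes > 127 has nothing left to find, which matches what the module saw.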

Khalem (BeBot Founder, Administrator)
« Reply #1 on: February 15, 2010, 01:29:42 pm »
Blondengy was the one who originally wrote this, but I assume it is because PHP will not support UTF8 natively until version 6.

This means that a number of functions can and will break UTF8 strings.
String functions like substr(), ucfirst(), strtoupper(), strlen() and so forth operate on bytes, so they can break the actual string or return wrong values. For example, strlen() will return the number of bytes, not the number of characters.
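
A small illustration of the byte-versus-character problem, using only core PHP functions. The sample word is an assumption chosen for the example.

```php
<?php
// "héllo" as raw UTF-8 bytes: 5 characters, 6 bytes ("é" is 0xC3 0xA9).
$word = "h\xC3\xA9llo";

echo strlen($word), "\n";            // 6 (bytes, not characters)

// Cutting after 2 bytes splits "é" in half and leaves a dangling lead byte.
$cut = substr($word, 0, 2);
echo bin2hex($cut), "\n";            // 68c3

// With the /u modifier PCRE rejects invalid UTF-8, so preg_match() returns false.
var_dump(preg_match('//u', $cut));   // bool(false)
var_dump(preg_match('//u', $word));  // int(1)
```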

Ideally we would be UTF8 all the way, but that currently gives us a few challenges.
BeBot Founder and Fixer Kingpin

Beatbozz (BeBot Rookie)
« Reply #2 on: February 15, 2010, 02:30:19 pm »
OK, I didn't know PHP does not use UTF-8 as its default encoding; I'm completely new to PHP. I'll revert the changes to the core files and stop using that module before something goes wrong. I'll patiently wait for a future UTF-8 PHP and UTF-8 BeBot :). Thanks for the response.

Khalem (BeBot Founder, Administrator)
« Reply #3 on: February 15, 2010, 03:30:14 pm »
The mbstring extension would likely be the current solution. However, it is not a default PHP extension, and as such I'm not sure how viable it would be to require it for BeBot.
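
As a rough sketch of what the mbstring route looks like when the functions are called explicitly, assuming the extension is installed. The sample word is illustrative and none of this is BeBot code.

```php
<?php
// mbstring is optional, so bail out cleanly if this PHP build lacks it.
if (!extension_loaded('mbstring')) {
    exit("mbstring is not available in this PHP build\n");
}

// "héllo" as raw UTF-8 bytes: 5 characters, 6 bytes.
$word = "h\xC3\xA9llo";

// Passing 'UTF-8' explicitly avoids depending on the configured internal encoding.
echo mb_strlen($word, 'UTF-8'), "\n";                 // 5
echo bin2hex(mb_substr($word, 0, 2, 'UTF-8')), "\n";  // 68c3a9 ("hé", not split)
echo bin2hex(mb_strtoupper($word, 'UTF-8')), "\n";    // 48c3894c4c4f ("HÉLLO")
```

Calling the mb_* functions directly like this would mean touching every string operation in the code base, which is the cost that the overload approach mentioned below tries to avoid.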

The following might give some insight
http://developer.loftdigital.com/blog/php-utf-8-cheatsheet

Using the mbstring.func_overload directive you should probably be able to make UTF8 work without having to do any code changes aside from removing encode/decode calls currently used.

You would also have to change the database over to UTF8 and likely convert all source files to UTF8.
But in theory, it would work.

 
