non-ASCII characters in UTF-8
Variables from phpMyAdmin:
show variables like 'character%';
character_set_client utf8
character_set_connection utf8
character_set_database utf8
character_set_filesystem binary
character_set_results utf8
character_set_server latin1
character_set_system utf8
character_sets_dir /usr/share/mysql/charsets/
SHOW VARIABLES LIKE 'collation%'
collation_connection utf8_general_ci
collation_database utf8_general_ci
collation_server latin1_general_ci
On page:
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
When importing with phpMyAdmin, Polish non-ASCII characters are imported correctly.
When displaying non-ASCII chars in concrete5, they are replaced with question marks (�). Why?
When writing to the database through Concrete5 (editing a document, etc.), non-ASCII chars are replaced in the database by "Ä???óÄ etc." Why?
Why do non-ASCII chars display correctly on the site when the tables in the database are set to latin2_general_ci?
Also, why are non-ASCII chars converted to Unicode sequences in the database (ó etc.)?
I can't get UTF-8 encoded non-ASCII characters. All I get are question marks.
Any ideas?
Concrete5 version 5.1.0
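The "ó" pattern described above is what typically appears when UTF-8 bytes are read as if they were latin1 and then re-encoded. The following is only a minimal sketch of that effect, assuming the mbstring extension is available; it does not touch MySQL or concrete5, and whether this is exactly what happens on the poster's server is an assumption.

```php
<?php
// Illustration only: simulate a UTF-8 string being misread as latin1
// (ISO-8859-1) and re-encoded, which is what a mismatched connection
// charset effectively does.

$original = "ó"; // stored in this source file as the UTF-8 bytes C3 B3

// Treat each UTF-8 byte as if it were one latin1 character.
$garbled = mb_convert_encoding($original, 'UTF-8', 'ISO-8859-1');

// Undo the damage by converting back the other way.
$repaired = mb_convert_encoding($garbled, 'ISO-8859-1', 'UTF-8');

echo bin2hex($original), "\n"; // c3b3
echo $garbled, "\n";           // Ã³
echo $repaired, "\n";          // ó
```

If the data in the tables looks like "Ã³" while the page is served as UTF-8, the text was probably double-encoded somewhere between PHP and MySQL, which points at the connection charset rather than at the page's meta tag.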
This works and it is a good idea ;) Only emails didn't display non-ASCII chars, but I changed the sendMail() function in /concrete/helpers/mail.php on line 118 to:
public function sendMail() {
    $from = $this->generateEmailStrings($this->from);
    $to = $this->generateEmailStrings($this->to);
    if (ENABLE_EMAILS) {
        // $naglowki holds the mail headers
        $naglowki = "MIME-Version: 1.0\r\n";
        $naglowki = "Content-type: text; charset=utf-8\r\n";
        $naglowki .= "From: Formularz_www<".$from.">";
        mail($to, $this->subject, $this->body, $naglowki);
    }
}
and all works.
Cool. I'll give it a try.
Now there are two problems left...
1. Questions for Forms.
I think c5 uses AJAX to send the INSERT query and so on... Since it only handles ASCII, the multi-byte characters become corrupted.
2. Search
Search cannot handle UTF-8 characters yet.
But solving the email issue was really good.
Thanks a lot! You made my day.
I fixed the Lucene problem; there's a discussion somewhere where I attached a very simple patch.
Andrew already merged it, so I guess it will be part of the next c5 update.
C5 does use a few AJAX calls, but not exclusively, and even where it does, that's not the problem. Even with AJAX it's possible to insert UTF-8 characters; this works fine.
I only have a few characters that can't be saved (öäü), so I don't see the problem as often as you do. Have you created a list of the blocks and functions that don't handle UTF-8 properly?
Oh, ok.
I kinda remember your lucene post.
I'll find it.
And about AJAX (specifically the form block), it doesn't work on our side. It does work if I insert the Japanese directly into MySQL... And I could insert Japanese letters during the installation. So I'm guessing it's AJAX that is messing up the multi-byte characters.
I posted the list of the problems at
http://www.concrete5.org/community/forums/internationalization/mult...
I didn't bother to submit this as a bug since it's not a big deal for most people... only for East Asian languages, which use multi-byte characters. (Or should I?)
Anyway, when I'm done with everything, I'll submit the code to Andrew.
Sure, there's a problem, but it's not related to AJAX; it's a problem in the c5 code. This is the only thing I wanted to say.
It actually is a problem, since the German language also uses a few multi-byte letters. Not as many as you need though ;-)
Same with most languages in Europe; they mostly have a few "strange letters" too.
I'm probably going to work on that issue later this week.
You probably wanted to write .= and not just =?
Otherwise MIME-Version doesn't find its way into the header.
public function sendMail() {
    $from = $this->generateEmailStrings($this->from);
    $to = $this->generateEmailStrings($this->to);
    if (ENABLE_EMAILS) {
        $naglowki = "MIME-Version: 1.0\r\n";
        $naglowki .= "Content-type: text; charset=utf-8\r\n";
        $naglowki .= "From: Formularz_www<".$from.">";
        mail($to, $this->subject, $this->body, $naglowki);
    }
}
The attached patch works for me.
I can now use German umlauts for the labels in the form and I can submit data by mail too. Only tested with Outlook and the Gmail web client.
I'm not sure about htmlentities. This might cause trouble too...
It didn't work for Japanese.... I think I need to use
It does work for the email body, but not for the subject line....
But you gave me an idea... I'll follow up with you guys.
The body of the email displays Polish non-ASCII chars, but the title (subject) does not :/
I didn't check your zip file carefully.
I'll give this a try as well.
Thanks!
It shows you where the problem occurs, but it's not the real source of it.
What you probably have to check is this file:
/concrete/blocks/form/auto.js
The method addQuestion contains a few calls to "escape". This is a bit dangerous since it also escapes characters like the Japanese full-width characters (thanks Katz for the lesson :-)
I don't have a completely tested patch yet, but try working on the "escape" calls (removing them, for example) and it should look better.
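To see why "escape" is a problem on the PHP side, here is a small sketch of what arrives at the server. The escaped strings are written out by hand rather than produced by a browser, and the UTF-8 percent-encoding is shown only for comparison (in JavaScript, encodeURIComponent produces it); whether the eventual c5 patch uses that function is an assumption.

```php
<?php
// Strings as JavaScript's escape() would send them (written by hand here).
$fromEscapeJa = '%u65E5'; // escape("日"): non-standard %uXXXX form
$fromEscapeDe = '%F6';    // escape("ö"): a single latin1 byte

// The same Japanese character percent-encoded as UTF-8 bytes.
$fromUtf8Ja = '%E6%97%A5';

// PHP's urldecode() does not understand %uXXXX and leaves it untouched.
var_dump(urldecode($fromEscapeJa) === '%u65E5');                 // bool(true)

// "%F6" decodes to one byte that is not valid UTF-8 on its own.
var_dump(mb_check_encoding(urldecode($fromEscapeDe), 'UTF-8'));  // bool(false)

// UTF-8 percent-encoding survives the round trip.
var_dump(urldecode($fromUtf8Ja) === '日');                       // bool(true)
var_dump(rawurlencode('日') === $fromUtf8Ja);                    // bool(true)
```

This would explain both reports in the thread: Japanese text arrives as literal "%u..." sequences, and umlauts arrive as latin1 bytes that a UTF-8 table or page cannot display.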
I've talked to Katz for a while and tried to fix a few of his problems.
There are a few things I've learnt which I'd like to share:
1. Using the JavaScript function "escape" causes trouble since it escapes all the full-width characters too!
2. The "standard" string functions also cause a few problems. For example, shortText in concrete/helpers/text.php contains a call to "substr". That function counts bytes, so it might cut a full-width character in the middle! Using mb_substr, however, is safe:
function shortText($textStr, $numChars = 255, $tail = '...') {
    if (intval($numChars) == 0) $numChars = 150;
    $textStr = strip_tags($textStr);
    // Count characters rather than bytes so the check matches mb_substr
    if (mb_strlen($textStr, 'utf-8') > intval($numChars)) {
        $textStr = mb_substr($textStr, 0, $numChars, 'utf-8') . $tail;
    }
    return $textStr;
}
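For anyone wondering what "truncating in the middle" looks like, here is a minimal standalone sketch (assuming the mbstring extension is loaded; the sample string is made up):

```php
<?php
$text = "日本語"; // 3 characters, 9 bytes in UTF-8

$byteCut = substr($text, 0, 4);             // stops one byte into the second character
$charCut = mb_substr($text, 0, 2, 'UTF-8'); // keeps two whole characters

var_dump(strlen($text));                         // int(9)
var_dump(mb_strlen($text, 'UTF-8'));             // int(3)
var_dump(mb_check_encoding($byteCut, 'UTF-8'));  // bool(false)
var_dump($charCut === "日本");                   // bool(true)
```

The broken trailing byte from substr is what browsers render as a question mark or a replacement character at the end of a shortened text.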
Just wanted to let everyone know that we're taking this very seriously and have made a lot of fixes for these issues in svn, in the development/5.3.0 branch (although they may make it out before that).
A patch for the current svn repository (rev.693):
Content-Type: text -> text/plain
base64 encoding for Subject
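The attached patch itself is not reproduced in this thread, so the following is only a sketch of what those two changes usually amount to. The function names and the example address are made up for illustration and are not part of concrete5.

```php
<?php
// Hypothetical helpers, not taken from the patch.

function encodeSubjectUtf8(string $subject): string
{
    // RFC 2047 "encoded-word": =?charset?B?base64-data?=
    return '=?UTF-8?B?' . base64_encode($subject) . '?=';
}

function buildUtf8Headers(string $from): string
{
    $headers  = "MIME-Version: 1.0\r\n";
    $headers .= "Content-Type: text/plain; charset=utf-8\r\n";
    $headers .= "From: " . $from;
    return $headers;
}

$subject = encodeSubjectUtf8("Zażółć gęślą jaźń");
$headers = buildUtf8Headers("webform@example.com");

// Left commented out: actually sending needs a configured mail server.
// mail($to, $subject, $body, $headers);

echo $subject, "\n";
echo $headers, "\n";
```

RFC 2047 limits each encoded-word to 75 characters, so a long subject has to be split into several of them; mb_encode_mimeheader() from the mbstring extension handles that folding and may be the simpler choice in practice.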
Added this patch to trunk.
By the way, is the svn URI of C5 open to the public?
I found it by chance.
If you'd like to be on the beta team, you should let me know.
-frz
How do I do that?
A Japanese <title> is garbled in IE.
'<meta http-equiv="content-type"' should come before '<title>'.
Attached a patch for the current svn trunk (rev.712).
Should I have posted to Beta Bugs?
If so, sorry.
This should be fixed in subversion.
I had the same problem and fixed it.
Run this script against your database and you will see it fixed. The problem is most likely that your default collation was set to a swedish_ci one. Changing it to UTF-8 later does not help, because C5 has already created tens of tables which are not utf8, and you keep adding to them. Sometimes hosting providers do not watch the default charsets during an upgrade.
After backing up your database, copy the PHP code below into a file. I also attached the file; rename its extension to .php.
Edit the database settings,
run it in your browser.
Done!
=== PHP CODE BEGINS ===
<?php
$host='localhost'; // database hostname; usually no need to change this
$user=' '; // please set your MySQL user name
$pass=' '; // please set your MySQL user password
$dbname=' '; // please set your database name
$charset='utf8'; // specify the character set
$collation='utf8_general_ci'; // specify what collation you wish to use
$db = mysql_connect($host, $user, $pass) or die("MySQL could not CONNECT to the database, incorrect user or password " . mysql_error());
mysql_select_db($dbname) or die("MySQL could not SELECT the database, please check your database name " . mysql_error());
$result = mysql_query('SHOW TABLES') or die("MySQL could not execute the command 'show tables' " . mysql_error());
// mysql_fetch_row returns one numerically indexed row per table, so each table is converted only once
while ($row = mysql_fetch_row($result)) {
    mysql_query("ALTER TABLE `$row[0]` CONVERT TO CHARACTER SET $charset COLLATE $collation") or die("Could not convert the table " . mysql_error());
}
mysql_query("ALTER DATABASE `$dbname` DEFAULT CHARACTER SET $charset COLLATE $collation") or die("Could not alter the collation of the database " . mysql_error());
echo "The collation of your database has been successfully changed!";
?>
=== PHP CODE END ===
You may contact us if you have any problems.
http://www.kordil.com
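The script above uses the old mysql_* functions, which were removed in PHP 7. Below is a sketch of the same idea with the statement building separated from the database access, so the queries can be reviewed first or run one at a time. The function name, the table names and the database name "c5db" are made up for illustration, and the mysqli lines are left as comments because they need a real server.

```php
<?php
// Hypothetical helper: builds the statements without touching the database.
function buildConversionQueries(array $tables, string $dbname, string $charset, string $collation): array
{
    // Only allow plain identifiers, since they are interpolated into SQL.
    foreach (array_merge($tables, [$dbname, $charset, $collation]) as $name) {
        if (!preg_match('/^[A-Za-z0-9_]+$/', $name)) {
            throw new InvalidArgumentException("Unexpected identifier: $name");
        }
    }

    $queries = [];
    foreach ($tables as $table) {
        $queries[] = "ALTER TABLE `$table` CONVERT TO CHARACTER SET $charset COLLATE $collation";
    }
    $queries[] = "ALTER DATABASE `$dbname` DEFAULT CHARACTER SET $charset COLLATE $collation";
    return $queries;
}

// With a live connection the table list would come from SHOW TABLES, e.g.:
//   $db = new mysqli('localhost', $user, $pass, $dbname);
//   $tables = array_column($db->query('SHOW TABLES')->fetch_all(), 0);
//   foreach (buildConversionQueries($tables, $dbname, 'utf8', 'utf8_general_ci') as $sql) {
//       $db->query($sql);
//   }

// Here we only print the statements for two made-up tables.
foreach (buildConversionQueries(['Pages', 'Users'], 'c5db', 'utf8', 'utf8_general_ci') as $sql) {
    echo $sql, ";\n";
}
```

The identifier check is deliberately strict and would reject a database name containing a hyphen; loosen it only if you also quote the names properly.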
nice one :)
Or just print out all the queries and run them yourself, because the script can time out :)
<?php
$host='localhost'; // database hostname; usually no need to change this
$user=''; // please set your MySQL user name
$pass=''; // please set your MySQL user password
$dbname=''; // please set your database name
$charset='utf8'; // specify the character set
$collation='utf8_general_ci'; // specify what collation you wish to use
$db = mysql_connect($host, $user, $pass) or die("MySQL could not CONNECT to the database, incorrect user or password " . mysql_error());
mysql_select_db($dbname) or die("MySQL could not SELECT the database, please check your database name " . mysql_error());
$result = mysql_query('SHOW TABLES') or die("MySQL could not execute the command 'show tables' " . mysql_error());
// mysql_fetch_row returns one numerically indexed row per table, so each statement is printed only once
while ($row = mysql_fetch_row($result)) {
    print "ALTER TABLE `$row[0]` CONVERT TO CHARACTER SET $charset COLLATE $collation;" . "<br />";
}
print "ALTER DATABASE `$dbname` DEFAULT CHARACTER SET $charset COLLATE $collation;" . "<br />";
?>
My c5 is 5.2.0.
First you have to edit the following file
/concrete/libraries/3rdparty/adodb/drivers/adodb-mysql.inc.php
On line 371, add the following code
Add the line above before the following code
But I don't know if this is a good idea.
Also, once you do this, you MUST set your database collation to a UTF-8 one.