Help me reproduce this database issue on localhost

Discussion of testing theory and practice, including methodologies (such as TDD, BDD, DDD, Agile, XP) and software - anything to do with testing goes here. (Formerly "The Testing Side of Development")

Moderator: General Moderators

Post Reply
josh
DevNet Master
Posts: 4872
Joined: Wed Feb 11, 2004 3:23 pm
Location: Palm beach, Florida

Help me reproduce this database issue on localhost

Post by josh »

On 1% of my customer's machines the character set of my tables is setup wrong because I never specified them. I guess I was relying on the default value. To reproduce this I could probably set the default character set for the connection to the problematic one (utf8_unicode_ci) and then try to collate utf8 text?

The collation error is SQLSTATE[HY000]: General error: 1267 Illegal mix of collations (utf8_unicode_ci,IMPLICIT) and (utf8_general_ci,IMPLICIT) for operation '='

On my client's site it says the table is utf8.
On my site it says the table is latin1

How can I reproduce this locally, what would by the mysql command? I want to write a unit test so I put it under testing as opposed to databases, as I don't want the answer to be "change some value in my.cnf", I only want to reproduce it for purposes of a unit test, that is I want to switch the default character set at run time only, so I can replicate the issue.
User avatar
Zyxist
Forum Contributor
Posts: 104
Joined: Sun Jan 14, 2007 10:44 am
Location: Cracow, Poland

Re: Help me reproduce this database issue on localhost

Post by Zyxist »

Heh... typical problems of people who do not have non-latin characters in the alphabet :). Collations define, how the strings are compared, when it comes to the non-latin characters. The algorithm has a knowledge whether ą should be before or after a, and so on. However, in order to make them work properly, two strings must have the same collation set. Your error suggests that you have two columns with two different collation algorithms and MySQL does not know, which one to use. So the problem is very simple to reproduce - just set two different collations (i.e. latin1_general_ci and latin1_swedish_ci) on two columns and try to compare them.

To solve the problem, you should define a single collation in the entire database. If you are using UTF8, choose either utf8_general_ci or utf8_unicode_ci, never both of them. The same applies to other encodings. If you want some more extra reading about the charset and collation support in MySQL, I wrote an article once about them: http://www.zyxist.com/en/archives/74#more-74
josh
DevNet Master
Posts: 4872
Joined: Wed Feb 11, 2004 3:23 pm
Location: Palm beach, Florida

Re: Help me reproduce this database issue on localhost

Post by josh »

I'm comparing to a literal though. I'm invoking a function in my code, pointing it to a stream (file) that has some text in it. The code in this function reads my file, and uses text from it in a query (as a literal string in the where clause)

I've tried saving the file as utf8, utf8-unicode, etc..

I guess what I need to figure out is, how do I make mysql so that it creates it with the "wrong" character set (so I can reproduce it) when I don't specify one, as right now I specify none.
User avatar
Weirdan
Moderator
Posts: 5978
Joined: Mon Nov 03, 2003 6:13 pm
Location: Odessa, Ukraine

Re: Help me reproduce this database issue on localhost

Post by Weirdan »

You might not be able to reproduce the problem if your mysql server has different version to what you client's server has.
MySQL manual wrote: As of MySQL 5.1.21, string expressions have a repertoire attribute[...]
The use of repertoire enables character set conversion in expressions for many cases where MySQL would otherwise return an “illegal mix of collations” error.
It was also introduced in 5.0 branch in a minor release.
josh
DevNet Master
Posts: 4872
Joined: Wed Feb 11, 2004 3:23 pm
Location: Palm beach, Florida

Re: Help me reproduce this database issue on localhost

Post by josh »

I reproduced it on a customers server who is running a version released after that, so that's not the issue. Also I'm explicitly setting the character sets as following and it still happens:

Code: Select all

self::$dbAdapter->getConnection()->query('SET character set utf8;');
        self::$dbAdapter->getConnection()->query('SET character_set_client = utf8;');
        self::$dbAdapter->getConnection()->query('SET character_set_results = utf8;');
        self::$dbAdapter->getConnection()->query('SET character_set_connection = utf8;');
        self::$dbAdapter->getConnection()->query('SET character_set_database = utf8;');
        self::$dbAdapter->getConnection()->query('SET character_set_server = utf8;');
 
When I create my tables I use the following options:

Code: Select all

 ENGINE = InnoDB CHARSET=utf8;
Then under 'show table status' I see it is using 'utf8_general_ci' which is what I want. The problem however comes when I forget to specify the character set, it defaults to 'utf8_unicode_ci'. The fix is obvious, specify it. However I cannot replicate it on localhost. I've explicitly set all the settings that affect what the default character set would be, yet the default character set seems to differ on certain hosts. Again, its easy to fix but I want a unit test. I want to figure out why I can't replicate it on my localhost, so I can write an automated test and prevent against people forgetting to specify the character set in the future, if someone forgets to set the character set I want the test to fail everywhere, but right now it passes on some systems and fails on others.
josh
DevNet Master
Posts: 4872
Joined: Wed Feb 11, 2004 3:23 pm
Location: Palm beach, Florida

Re: Help me reproduce this database issue on localhost

Post by josh »

Also sometimes when I specify it it shows up in the table status as 'latin1_swedish_ci'..., even though I had specified UTF8.
Post Reply