Friday, March 23, 2012

Retrieving Unicode using ODBC

I have a table that contains Arabic, Japanese, UTF-8 and ASCII strings. When I try to retrieve those strings using the Windows ODBC driver, the ASCII and single-byte UTF-8 characters come back fine, but the Arabic and Japanese characters are returned as single-byte characters. I read on Microsoft's page that ODBC versions before 3.7 are considered non-Unicode. When I look at the version of the ODBC Administrator, it says 3.5. Is there a newer version of the SQL Server ODBC driver or the Windows ODBC Administrator that I would need?

Is there some way to specify what the client charset is? I'm using SQLGetData with a target type of SQL_C_WCHAR.
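A note on the SQL_C_WCHAR question: with that target type there is no client character set to pick. The driver returns 16-bit code units (UTF-16 on Windows), and the length/indicator value that SQLGetData writes is a byte count, not a character count. The Python sketch below does not call ODBC. It only simulates such a buffer so the decoding step is visible. `fake_wchar_buffer` and `decode_wchar_buffer` are hypothetical helpers, and the little-endian layout is an assumption that holds on Windows/x86.

```python
from typing import Tuple

# Simulate the buffer SQLGetData fills for a SQL_C_WCHAR target on Windows:
# little-endian UTF-16 code units plus a 16-bit null terminator.
# fake_wchar_buffer and decode_wchar_buffer are hypothetical helpers for
# illustration; real code gets the bytes and indicator from the driver.


def fake_wchar_buffer(text: str) -> Tuple[bytes, int]:
    """Return (buffer, indicator) roughly as SQLGetData would fill them."""
    data = text.encode("utf-16-le")
    indicator = len(data)  # length in bytes, terminator excluded
    return data + b"\x00\x00", indicator


def decode_wchar_buffer(buf: bytes, indicator: int) -> str:
    """Use the byte count from the indicator and decode as UTF-16LE."""
    return buf[:indicator].decode("utf-16-le")


arabic = "\u0627\u0644\u0641\u0627\u0626\u0632"  # "الفائز", the Arabic sample in this thread
buf, ind = fake_wchar_buffer(arabic)
print(ind)                                      # 12: six characters, two bytes each
print(decode_wchar_buffer(buf, ind) == arabic)  # True
```

Two common ways to corrupt otherwise correct data at this step are treating the indicator as a character count and walking the buffer one byte at a time.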

Right now I have Windows XP SP2, which has MDAC 2.8 SP1.

Thanks for your help.|||You need to check the driver's file product version, not the administrator version.

For example, my C:\WINDOWS\SYSTEM32\sqlsrv32.dll is version 3.85.1117.

The version the administrator reports is the Driver Manager version, which is basically the ODBC API compliance level of the Driver Manager. Anything on your system should be at least 3.5. The latest version is 3.52.

Jay Grubb
Technical Consultant
OpenLink Software
Web: http://www.openlinksw.com:
Product Weblogs:
Virtuoso: http://www.openlinksw.com/weblogs/virtuoso
UDA: http://www.openlinksw.com/weblogs/uda
Universal Data Access & Virtual Database Technology Providers|||Oh yes, you also need to fiddle with the Perform Translation settings, and potentially the server code pages. From the Driver help:

Perform translation for character data check box

When selected, the SQL Server ODBC driver converts ANSI strings sent between the client computer and SQL Server by using Unicode. The SQL Server ODBC driver sometimes converts between the SQL Server code page and Unicode on the client computer. This requires that the code page used by SQL Server be one of the code pages available on the client computer.

When cleared, no translation of extended characters in ANSI character strings is done when they are sent between the client application and the server. If the client computer is using an ANSI code page (ACP) different from the SQL Server code page, extended characters in ANSI character strings may be misinterpreted. If the client computer is using the same code page for its ACP that SQL Server is using, the extended characters are interpreted correctly.
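The last paragraph of the help text can be illustrated without ODBC. In the Python sketch below, Windows code page 1256 (Arabic) stands in for the SQL Server code page and code page 1252 (Western European) stands in for the client's ANSI code page. Both choices are assumptions for illustration. When the client's code page matches, the same bytes decode correctly. When it differs, they decode to different characters. Plain ASCII is unaffected either way.

```python
# Bytes written under one ANSI code page and read under another.
# cp1256 (Arabic) plays the "server" code page and cp1252 the client's ACP;
# these are illustrative choices, not settings taken from the thread.

original = "\u0627\u0644"  # Arabic alef + lam

raw = original.encode("cp1256")                     # extended single-byte characters
same_cp = raw.decode("cp1256")                      # client ACP matches: round-trips
other_cp = raw.decode("cp1252", errors="replace")   # client ACP differs: misread

print(len(raw), all(b > 0x7F for b in raw))  # 2 True: both bytes are above ASCII
print(same_cp == original)                   # True
print(other_cp == original)                  # False

# ASCII occupies the same byte values in both code pages.
ascii_text = "SELECT 1"
print(ascii_text.encode("cp1256").decode("cp1252") == ascii_text)  # True
```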

Jay Grubb|||Thanks for the reply, John,

The driver's file product version (sqlsrv32.dll) is 3.85.1117.

Clearing Perform Translation has no effect on what is returned. The data actually appears to be returned in Latin-1 encoding. So for a table containing Arabic characters ...

الفائز بكأس

returns ...

'DA'&2
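The garbled output is not random. Each Arabic letter in "الفائز" lies in the U+06xx block. Keeping only the low byte of each UTF-16 code unit reproduces 'DA'&2 exactly: U+0627 becomes 0x27 ', U+0644 becomes 0x44 D, and so on. This suggests the correct code points are reaching the client and the high byte (0x06) is lost on the client side. One way that could happen is narrowing each wide character to a single byte. Another is displaying the raw buffer as single-byte text, in which case the 0x06 control bytes may not be visible. The Python sketch below only reproduces the arithmetic. Where the narrowing actually happens in the poster's code is an open assumption, and `keep_low_bytes` is a hypothetical helper.

```python
# Reproduce the reported output by dropping the high byte of each
# UTF-16 code unit. This demonstrates the arithmetic only; it is not
# the poster's code.

word = "\u0627\u0644\u0641\u0627\u0626\u0632"  # "الفائز", first word of the sample


def keep_low_bytes(text: str) -> str:
    """Narrow each 16-bit code unit to its low byte, as a C (char) cast typically does."""
    units = text.encode("utf-16-le")
    low = units[0::2]              # little-endian: the low byte comes first
    return low.decode("latin-1")   # latin-1 maps every byte value to one character


for ch in word:
    low = ord(ch) & 0xFF
    print(f"U+{ord(ch):04X} -> 0x{low:02X} {chr(low)!r}")

print(keep_low_bytes(word))  # 'DA'&2
```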

When I inserted the Arabic text into MS SQL Server, I made sure it was encoded properly by using the statement ...

insert into test values (N'{CHARACTERS}')

Is there something else I could be missing?

Thanks for your help.|||What is the field's collation, or the default collation?

The default can be found with sp_helpsort.

Jay Grubb|||For this table, the collation is set to <database default>. When I click the ... button to change it, it shows SQL_Latin1_General_CP1_CI_AS.

I thought the collation was for sorting results only?
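On the collation question: in SQL Server the collation also determines the code page used to store non-Unicode (char, varchar, text) data. By contrast, nchar/nvarchar/ntext data is stored as Unicode regardless of collation. SQL_Latin1_General_CP1_CI_AS uses code page 1252, which has no Arabic or Japanese letters. The Python sketch below uses Python's cp1252 codec as a stand-in for that conversion, and `store_in_cp1252_column` is a hypothetical helper. Characters the code page lacks become '?', which is typically what SQL Server stores in that situation. Because the output reported in this thread is 'DA'&2 and not a run of question marks, this conversion may not be what is happening here.

```python
# Model storing text in a non-Unicode column whose collation uses
# code page 1252 (as SQL_Latin1_General_CP1_CI_AS does). Characters the
# code page lacks are replaced with '?', modelled here with errors="replace".


def store_in_cp1252_column(text: str) -> str:
    stored = text.encode("cp1252", errors="replace")  # what a varchar would hold
    return stored.decode("cp1252")                     # what a client reads back


print(store_in_cp1252_column("caf\u00e9"))           # café
print(store_in_cp1252_column("\u0627\u0644\u0641"))  # ???
print(store_in_cp1252_column("\u65e5\u672c"))        # ?? (Japanese 日本)
```

An nvarchar column, or a collation whose code page covers the script, avoids this loss. The thread does not say whether the test table uses nvarchar.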

Thanks again.|||Another thing I've noticed: when I export the data and write it to a file, the characters are wrong in exactly the same way as when I use the ODBC driver.

I feel like there is something basic I'm missing here. Any ideas?

Thanks a lot for your help.
