NVARCHAR versus VARCHAR
SQL Server provides both datatypes to store character information. For the most part the two datatypes are identical in how you would work with them within SQL Server or from an application. The difference is that nvarchar is used to store unicode data, which
is used to store multilingual data in your database tables. Other languages have an extended set of character codes that need to be saved and this datatype allows for this extension. If your database will not be storing multilingual data you should use the
varchar datatype instead. The reason for this is that nvarchar takes twice as much space as varchar, this is because of the need to store the extended character codes for other languages from Ask The Database Expert: Questions & Answers
Differences between varchar and nvarchar in SQL Server
The broad range of data types in SQL Server can sometimes throw people through a loop, especially when the data types seem to be highly interchangeable. Two in particular that constantly spark questions are VARCHAR and NVARCHAR: what's the difference between
the two, and how important is the difference?
VARCHAR is an abbreviation for variable-length character string. It's a string of text characters that can be as large as the page size for the database table holding the column in question. The size for a table page is 8,196 bytes, and no one row in a table
can be more than 8,060 characters. This in turn limits the maximum size of a VARCHAR to 8,000 bytes.
Why do some SQL strings have an 'N' prefix?
//z 2012-6-7 15:00:12 PM IS2120@CSDN
You may have seen Transact-SQL code that passes strings around using an N prefix. This denotes that the subsequent string is in Unicode (the N actually stands for National language character set). Which means that you are passing an NCHAR, NVARCHAR or NTEXT
value, as opposed to CHAR, VARCHAR or TEXT. See Article #2354 for a comparison of these data types.
Unicode is typically used in database applications which are designed to facilitate code pages which extend beyond the English and Western Europe code pages (Erland Sommarskog, a native of Sweden, refers to this set as "Germanic and Romance languages"), for
example Chinese. Unicode is designed so that extended character sets can still "fit" into database columns. What this means is that Unicode character data types are limited to half the space, because each byte actually takes two bytes (Unicode is sometimes
referred to as "double-wide"). For more information on Unicode, see Unicode.org. Note that there are many encoding schemes in the Unicode standard, but SQL Server only supports one: UTF-16.
While using Unicode is a design choice you can make in building your own applications, some facilities in SQL server require it. One example is sp_executeSQL. If you try the following:
EXEC sp_ExecuteSQL 'SELECT 1'
You will get this error:
Server: Msg 214, Level 16, State 2, Procedure sp_executesql, Line 1
Procedure expects parameter '@statement' of type 'ntext/nchar/nvarchar'.
You can get around this in two ways:
-- (a) using the N prefix
EXEC sp_ExecuteSQL N'SELECT 1'
-- (b) using a variable
DECLARE @sql NVARCHAR(100)
SET @sql = N'SELECT 1'
EXEC sp_ExecuteSQL @sql
Note that implicit conversion makes the N prefix optional in case (b); however, for legibility and consistency, you should always use the prefix when defining Unicode strings. One reason is that leaving it off can actually change your data if it contains Unicode
characters (losing the additional information), as in the following example:
DECLARE @n NVARCHAR(10)
SET @n = 'a'
PRINT @n
SET @n = N'a'
PRINT @n
The first assignment, which didn't use the N prefix, gets printed as a regular a. Only the second maintains the character that was actually supposed to be represented. As you can imagine, if you are intending to support data entry in foreign languages and code
pages, you will likely need to test for Unicode support (making sure that such columns support Unicode, and that data won't be lost when passed into stored procedures, functions, etc.). Note that your application will need to handle Unicode as well; for example,
when you try to print this character from ASP...
<%
Response.Write("a")
%>
...it actually prints out the string aa. (This result might depend on your codepage and regional settings.) So you might consider translating your data into its ASCII equivalent, e.g. a = ā.
Another reason you want to avoid implicit conversion is that there are some potentially serious performance issues. Consider the following quite simple repro:
USE tempdb
GO
CREATE TABLE a
(
b VARCHAR(3),
c NVARCHAR(3)
)
CREATE INDEX b ON a(b)
CREATE INDEX c ON a(c)
GO
SET NOCOUNT ON
INSERT a SELECT 'foo', N'foo'
INSERT a SELECT 'bar', N'bar'
DECLARE
@b VARCHAR(3),
@c NVARCHAR(3)
SELECT
@b = 'foo',
@c = N'foo'
SELECT * FROM a WHERE b = @b
SELECT * FROM a WHERE b = @c
SELECT * FROM a WHERE c = @b
SELECT * FROM a WHERE c = @c
SELECT * FROM a WHERE b LIKE @b
SELECT * FROM a WHERE b LIKE @c
SELECT * FROM a WHERE c LIKE @b
SELECT * FROM a WHERE c LIKE @c
DROP TABLE a
//z 2012-6-7 15:00:12 PM IS2120@CSDN
Paste the code into Query Analyzer, turn execution plan on, and let her rip. You'll observe the following breakdown of percentage of work (roughly, depending on your hardware):
VARCHAR = VARCHAR 4.48%
VARCHAR = NVARCHAR 13.31%
NVARCHAR = VARCHAR 4.48%
NVARCHAR = NVARCHAR 4.48%
VARCHAR LIKE VARCHAR 4.48%
VARCHAR LIKE NVARCHAR 13.31%
NVARCHAR LIKE VARCHAR 4.48%
NVARCHAR LIKE NVARCHAR 4.48%
Now, that's not the whole story; we all know that there are many other factors, such as I/O, that will impact the actual time each portion of the query takes. The key is that implicit conversion *can* cause a table scan instead of an index seek, and on larger
tables this can really hurt. While it's important to understand why this happens and in which scenarios, my recommendation is to match your character-based datatypes as explicitly as possible.
One other thing to watch out for: your database may be using Unicode without your knowledge. If you upsize from Access to SQL Server, for example, character-based text columns might be translated to Unicode (I believe this is a catch-all technique; in case
Access was storing Unicode strings, or if you might be storing Unicode strings later, you won't lose data or require changes). I think the Access upsizing tools should be updated to force a conscious choice, so that you aren't wasting space for nothing, and
so that you know that you made a decision at all.
For a more thorough discussion of Unicode and the N prefix, please see KB #239530, this MSDN article, and this Google thread.
In other RDBMS platforms, or in the ANSI and/or ISO specifications, you might see prefixes other than N being used against values. (Current versions of SQL Server only support Unicode.) Here are the additional monikers I am aware of:
B This is used to denote a BINARY string expressed in bits (0 and 1 only)
X This is used to denote a BINARY string expressed in hexadecimal (0 -> F)
//z 2012-6-7 15:00:12 PM IS2120@CSDN