Inserting a Unicode string without the N'...' prefix

The background of this question is that I have a column with the following definition:
FirstName VARCHAR(100).
I can insert a Thai/Chinese/European value if I change the column datatype to NVARCHAR, but then I need to prefix each inserted value with N, as in:
INSERT INTO [table] ([FirstName]) VALUES (N'THAI/Chinese/European value')
Question:
A lot of applications update this particular column, and to support this change I would need to modify many procedures and make various other application-level changes. Is there a way I can make a change at the database level that accommodates this?

Is there a way I can make a change at the database level where I can accommodate this change.
I don't believe there is any way to force SQL Server to handle all varchars as Unicode nvarchars. They are simply different datatypes.
If you are using literals in your SQL code, you will have to use N''. Any columns, parameters, or variables that hold the data will have to be nchar/nvarchar. Your apps will all have to send Unicode values to the DB.
I would search for "sql server migrate to unicode" for additional reading before you take this on.

While I agree with @TimLehner that I do not know of a way to force SQL Server to handle all varchar columns as nvarchar columns, there are a few things that could make your transition to Unicode strings in the column easier:
To support Unicode values in the column one-off or in an upgrade script, use ALTER TABLE [table] ALTER COLUMN FirstName nvarchar(100). (Of course, be sure to update your create script for [table] if applicable too - i.e. CREATE TABLE [table] (FirstName nvarchar(100)...).)
Use Unicode literals (e.g. N'SomeFirstName') where you expect to insert or set strings containing Unicode characters; during the transition, you can continue to use non-Unicode literals (e.g. 'SomeFirstName') where you do not.
Work your way up to altering procedures' parameters (i.e. from varchar to nvarchar) as needed.
Basically, ideally you would change the column and everything related to it to support Unicode at once; but you may be able to limit initial changes to application(s), procedure(s) etcetera that initially need to leverage the column's underlying Unicode support.

You could make use of a stored procedure for inserts and updates. If the entire application then goes through it, you only need to update the stored procedure...
But I guess that would still require an update in every location that writes to the column, so this may not be much help...

SQL injection if brackets and semicolons are filtered

I have a statement like this:
SELECT * FROM TABLE WHERE COLUMN = 123456
123456 is provided by the user, so the query is vulnerable to SQL injection. But if I strip all semicolons and brackets, can an attacker still run any statement other than SELECT (such as DROP, UPDATE, or INSERT)?
I am already using prepared statements but I am curious that if the input is stripped of the line-terminator and brackets, can the hacker modify the DB in any way?
Use SQL parameters. Attempting to "sanitize" input is an extremely bad idea. Search for some complex SQL injection snippets; you won't believe how creative black hat hackers are.
In general it's very difficult to be 100% certain that you are safe from this type of attack by trying to strip out specific characters - there are just too many ways to get around your code (by using character encodings etc.)
A better option is to pass parameters to a stored procedure, like this:
CREATE PROCEDURE usp_MyStoredProcedure
    @MyParam int
AS
BEGIN
    -- @MyParam is bound as an int value; it is never parsed as SQL text
    SELECT * FROM TABLE WHERE COLUMN = @MyParam
END
GO
That way SQL will treat the value passed in as a parameter, and nothing else, no matter what it contains. And in this case it would only accept a value of type int anyway.
If you don't want, or can't, use a stored procedure, then I'd suggest changing your code so that the input parameter can only contain a pre-defined list of characters - in this case numeric characters. That way you can be certain that the value is safe to use.
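Both suggestions above (bind the value as a parameter, or accept only a whitelist of characters) can be sketched in a few lines. This is a minimal illustration only, assuming Python's built-in sqlite3 module as a stand-in for the real database; the items table and the helper names make_demo_db, find_by_id and is_plain_integer are hypothetical and not from the question.

```python
import re
import sqlite3


def make_demo_db():
    """Build a throwaway in-memory database (hypothetical schema for the demo)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE items (id INTEGER, name TEXT)")
    conn.executemany(
        "INSERT INTO items (id, name) VALUES (?, ?)",
        [(123456, "widget"), (7, "gadget")],
    )
    return conn


def find_by_id(conn, user_input):
    """Look up rows by id; the ? placeholder binds user_input as a value only."""
    cur = conn.execute("SELECT id, name FROM items WHERE id = ?", (user_input,))
    return cur.fetchall()


def is_plain_integer(text):
    """Whitelist check: accept only the ASCII digits 0-9, nothing else."""
    return re.fullmatch(r"[0-9]+", text) is not None


conn = make_demo_db()
print(find_by_id(conn, 123456))                 # normal lookup
print(find_by_id(conn, "1 OR 1=1"))             # compared as a value, not run as SQL
print(find_by_id(conn, "1; DROP TABLE items"))  # likewise; the table is untouched
print(is_plain_integer("123456"), is_plain_integer("123456 OR 1=1"))
```

Because the input is never concatenated into the statement text, there is nothing for an attacker to terminate or comment out, whichever characters were or were not stripped.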

Error Inserting Entry With Text Column That Contains New Line And Quotes

I have an Informix 11.70 database. I am unable to successfully execute this INSERT statement on a table.
INSERT INTO some_table(
col1,
col2,
text_col,
col3)
VALUES(
5,
50,
CAST('"id","title1","title2"
"row1","some data","some other data"
"row2","some data","some other"' AS TEXT),
3);
The error I receive is:
[Error Code: -9634, SQL State: IX000] No cast from char to text.
I found that I should add this statement in order to allow using new lines in text literals, so I added this above the same query I have already written:
EXECUTE PROCEDURE IFX_ALLOW_NEWLINE('t');
Still, I receive the same error.
I have also read the IBM documentation that says: to alternatively allow new lines, I could set the ALLOW_NEWLINE parameter in the ONCONFIG file. I suppose the last one requires administrative access to the server to alter that config file, which I do not have, and I prefer not to take advantage of this setting.
Informix's TEXT (and BYTE) columns pre-date any standard, and are in many ways very peculiar types. TEXT in Informix is very different from TEXT found in other DBMS. One of the long-standing (over 20 years) problems with them is that there isn't a string literal notation that can be used to insert data into them. The 'No cast from char to text' is saying there is no explicit conversion from string literal to TEXT, either.
You have a variety of options:
Use LVARCHAR in the table (good if your values won't be longer than a few KiB, because the total row length is approximately 32 KiB). Maximum size of an LVARCHAR column is just under 32 KiB.
Use a programming language which can handle Informix 'locator' structures — in ESQL/C, the type used to hold a TEXT is loc_t.
Consider using CLOB instead. However, this has the same limitation (no string to CLOB conversion), but you'd be able to use the FILETOCLOB() function to get the information from a file on the client to the database (and LOTOFILE transfers information from the DB to a file on the client).
If you can use LVARCHAR, that is by far the simplest alternative.
I forgot to mention an important detail in the question - I use Java and the Hibernate ORM to access my Informix database, thus some of the suggested approaches (the loc_t handling in particular) in Jonathan Leffler's answer are unfortunately not applicable. Also, I need to store large data of dynamic length and I fear the LVARCHAR column would not be sufficient to hold it.
The way I got it working was to follow Michał Niklas's suggestion from his comment and use a PreparedStatement. This could potentially be explained by Informix handling the TEXT data type in its own manner.
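The general idea behind that fix is that a bound parameter carries the value outside the SQL text, so embedded newlines and quotes never have to survive as a string literal. A minimal sketch of the idea follows, assuming Python's sqlite3 module purely as a stand-in; it does not reproduce Informix's TEXT handling or the Hibernate setup, and the helper name insert_and_read_back is hypothetical.

```python
import sqlite3

# The multi-line, quote-laden value from the question.
CSV_PAYLOAD = (
    '"id","title1","title2"\n'
    '"row1","some data","some other data"\n'
    '"row2","some data","some other"'
)


def insert_and_read_back(payload):
    """Insert payload through a bound parameter and return what was stored."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE some_table "
        "(col1 INTEGER, col2 INTEGER, text_col TEXT, col3 INTEGER)"
    )
    # The value travels as a parameter, so no literal quoting or escaping is needed.
    conn.execute(
        "INSERT INTO some_table (col1, col2, text_col, col3) VALUES (?, ?, ?, ?)",
        (5, 50, payload, 3),
    )
    (stored,) = conn.execute("SELECT text_col FROM some_table").fetchone()
    conn.close()
    return stored


print(insert_and_read_back(CSV_PAYLOAD) == CSV_PAYLOAD)
```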

Problem with SQL Collation

I'm making an Arabic website. After I created the database and started writing Arabic text into it, it just shows ????, so I changed the collation of my database from SQL_Latin to Arabic_CI_AI,
but I'm still getting ???? in my fields, and when I check the properties of a field I find it is still SQL_Latin; it hasn't changed.
So what should I do to fix this problem without rebuilding the database?
please reply as soon as you can
Thanks in Advance
Database collation is just the default setting for new columns.
To change the collation of an existing column, you'd have to alter table. For example:
alter table YourTable alter column col1 varchar(10) collate Arabic_CI_AI
The collation sequence is the order in which characters appear when you sort (i.e. use the ORDER BY clause). Different collations will result in different sort orders.
This is obviously NOT what you are looking for. Your problem is storing and retrieving Unicode characters outside the ASCII range (i.e. Arabic characters). To do that, the columns storing this data must use Unicode data types rather than single-byte character types. Simply put, when defining a column, use the data types nchar, nvarchar, and ntext instead of char, varchar, and text.
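Where the ???? comes from can be illustrated outside the database. The sketch below assumes Python's cp1252 and cp1256 codecs as rough stand-ins for a Latin and an Arabic single-byte code page; the helper name store_in_code_page is hypothetical, and this is a simulation of the character loss, not SQL Server's actual conversion code.

```python
ARABIC_TEXT = "\u0645\u0631\u062d\u0628\u0627"  # the five letters of the Arabic word "marhaba"


def store_in_code_page(text, code_page):
    """Simulate a round trip through a single-byte column.

    Characters the code page cannot represent are replaced with '?'.
    """
    return text.encode(code_page, errors="replace").decode(code_page)


# A Latin code page has no slot for Arabic letters, so each one is lost.
print(store_in_code_page(ARABIC_TEXT, "cp1252"))

# An Arabic code page can hold these letters, but not arbitrary Unicode text.
print(store_in_code_page(ARABIC_TEXT, "cp1256") == ARABIC_TEXT)

# A two-byte Unicode encoding keeps every character, at two bytes per letter here.
print(len(ARABIC_TEXT.encode("utf-16-le")))
```

Once a character has been replaced by ?, the original text cannot be recovered, so rows that already contain ???? have to be re-entered after the column type is fixed.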

is there a downside to putting N in front of strings in scripts? Is it considered a "best practice"?

Let's say I have a table that has a varchar field. If I do an insert like this:
INSERT MyTable
SELECT N'the string goes here'
Is there any fundamental difference between that and:
INSERT MyTable
SELECT 'the string goes here'
My understanding was that you'd only have a problem if the string contained a Unicode character and the target column wasn't unicode. Other than that, SQL deals with it just fine and converts the string with the N'' into a varchar field (basically ignores the N).
I was under the impression that N in front of strings was a good practice, but I'm unable to find any discussion of it that I'd consider definitive.
You should prefix strings with N when they are destined for an nvarchar(...) column or parameter. If they are destined for a varchar(...) column or parameter, then omit it, otherwise you end up with an unnecessary conversion.
It's definitely not a "best practice" to stick N in front of every string regardless of what it's for.
Short answer: fine for scripts, bad for production code.
It is not considered a best practice. There is a downside: it incurs a minuscule performance hit, as 2-byte characters are converted to 1-byte characters.
If one doesn't know where the insert is going, or doesn't know where the source text is coming from (say this is a general purpose data insertion utility that generates insert statements for an unknown target, say when exporting data), N'foo' might be the more defensive coding style.
So the downside is small and the upside is that your script code is much more adaptable to changes in database structure. Which is probably why you see it in bulk data-insert scripts.
However, if the code in question is something meant for re-use in an environment where you care about the quality of the code, you should not use N'the string' because you are adding a conversion where none is necessary.
From INSERT (Transact-SQL):
"When referencing the Unicode character data types nchar, nvarchar, and ntext, 'expression' should be prefixed with the capital letter 'N'."
Also have a read at Why do some SQL strings have an 'N' prefix?
And Server-Side Programming with Unicode:
"Unicode string constants that appear in code executed on the server, such as in stored procedures and triggers, must be preceded by the capital letter N. This is true even if the column being referenced is already defined as Unicode. Without the N prefix, the string is converted to the default code page of the database. This may not recognize certain characters."

SQL Server Stored Proc Argument Type Conversion

Suppose I have a bunch of varchar(6000) fields in a table and want to change those to text fields. What are the ramifications for the stored procedures whose arguments are of type varchar(6000)? Does each stored procedure also need those argument data types changed?
Text fields are deprecated in SQL Server 2005 and above. You should use varchar(MAX), if possible. If you expect to have more than 6000 characters passed in the arguments to your stored procedures, you will need to change them as well.
Text fields are rough to work with in SQL Server. You can't actually declare local variables of type text (except as parameters to a stored procedure) and most of the string manipulation functions no longer work on text fields.
Also if you have triggers the text fields will not appear on the INSERTED or DELETED tables.
Basically if the field is just holding data from a program and you aren't manipulating it then no big deal. But if you have stored procedures to manipulate the string then your task will be way more difficult.
As tvanfosson mentioned if you have SQL Server 2005 use VARCHAR(MAX) then you get the length of a text field with the ability to manipulate it like it is a VARCHAR.
The other answers are right, but they don't answer your question. Varchar(max) is the way to go. If you made the fields varchar(max)/text but kept the stored proc arguments the same, any value that came in through the stored proc would be truncated to 6000 characters. Since you say that it will never exceed that, you will be fine, until, of course, that isn't the case. It doesn't throw an error. It just truncates.
I'm not sure of the exact behavior of varchar(max) versus text, but I'm pretty sure that once you start putting a lot of them in one table, you can get some crazy performance hits. Why so many big fields in one table?
The reason for using text fields is that all of the varchar(6000) fields in one row together exceed the max row length. Text fields just store a pointer in the row, thus not exceeding the SQL Server max row length of 8000-something. ATM the database cannot be normalized. The data is not manipulated by the stored procedures; it's just inserted, updated and deleted.
Does VARCHAR(MAX) behave like a text field and only store a pointer to the data in the row?