Inserting strings which contain single quotes ' in SQL - sql

I've seen so many solutions to this that I don't know which one to follow.
I thought that what I have would work but upon further testing I have discovered that this is not true.
I'm using VB to pick up MS Excel worksheets from a given file directory, extract the data, and insert into SQL data tables.
Here's the part I need some help with:
If saRet(linex, 11) <> "" Then
IntDesc = (saRet(linex, 11).ToString.Replace("'", "''"))
Echo("Internal description: " & IntDesc)
Else
Echo("No internal description given")
IntDesc = ""
End If
After experimenting with some test INSERT statements in SQL Server Management Studio, I thought that replacing ' with '' worked. Sadly, it does not.
Here's an example of a string which makes the insert fail:
Set-up and Config of New Button on BP and UDF's for Despatch Process
and after my string manipulation, here's the INSERT statement (I've blanked out some data which my company probably doesn't want to share; it's insignificant anyway):
INSERT INTO <tablename> VALUES ('2013-12-10', '12', '2013', 'AAAA', 'AAAA', '10668', 'JBT', 'Project - Config & System Build', 'CSB', '2', 'Y', 'N', '0', 'Set-up and Config of New Button on BP and UDF's for Despatch Process', 'Set-up and Config of New Button on BP and UDF''s for Despatch Process', '0', 'NULL')
Very grateful for any help!
Thanks.

The best way is always using parameters. Those handle everything for you and you don't need to do any escaping.
If you can't use parameters, you have to do the encoding yourself, and that's very tricky. One way would be to use a format you can encode safely. For example, instead of inserting a string literal, you might use binary encoding (e.g. cast(0xAABBCCDD as varchar(max))). This is perfectly safe, since you can be sure that there's no invalid character that would break it. Of course, it also has its problems.
As for your example, replacing ' with '' works fine (although of course you'd have to watch out for other problematic characters, such as line breaks). Your problem is that you didn't do the encoding on all the strings. In your sample, the last description string is properly encoded, but the one before it is not. This also beautifully illustrates the pain of making sure you're encoding properly: everything you miss means an error, or even a way to exploit the code and cause harm. For example, what if the description was ', ''0'', ''NULL''); delete from users --?
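To illustrate the parameter approach, here is a minimal sketch in Python with the built-in sqlite3 module, standing in for VB and SQL Server. The table name work_items and its single column are made up for the example. In VB.NET the same idea would normally go through a command object with parameters rather than string concatenation.

```python
import sqlite3

def insert_description(conn, description):
    # The ? placeholder sends the value separately from the SQL text,
    # so quotes inside the value never need escaping.
    conn.execute(
        "INSERT INTO work_items (internal_desc) VALUES (?)",
        (description,),
    )

conn = sqlite3.connect(":memory:")
# Hypothetical table, reduced to one column for the example.
conn.execute("CREATE TABLE work_items (internal_desc TEXT)")

tricky = "Set-up and Config of New Button on BP and UDF's for Despatch Process"
hostile = "', ''0'', ''NULL''); delete from users --"
for value in (tricky, hostile):
    insert_description(conn, value)

# Both values come back exactly as they went in; the hostile one is just data.
stored = [row[0] for row in
          conn.execute("SELECT internal_desc FROM work_items ORDER BY rowid")]
print(stored)
```

Neither string is altered or interpreted as SQL, which is the whole point: there is no encoding step to forget.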

Related

Is there a more efficient way to parse a fixed txt file in Access than using queries?

I have a few large fixed-width text files that have multiple specification formats in them. I need to parse out the txt files based on a record identifier at a set location in each line. That identifier can sit at a different position depending on the specification.
I have written queries for each of the different specifications (95 of them) with the start position and length hard coded into the query using the mid() function with a WHERE() function to filter the [Record Identifier] from the specification. As you can see below the 2 specifications in the WHERE() function have different placements in the txt file.
```
SELECT Mid([AllData],1,5) AS PlanNumber,
       Mid([AllData],6,4) AS Spaces1,
       Mid([AllData],10,3) AS Filler1,
       Mid([AllData],13,11) AS SSN,
       Mid([AllData],24,1) AS AccountIdentifier,
       Mid([AllData],25,5) AS Filler2,
       Mid([AllData],30,2) AS RecordIdentifier,
       Mid([AllData],32,1) AS FieldType,
       Mid([AllData],33,4) AS Filler3,
       Mid([AllData],37,8) AS HireDate,
       Mid([AllData],45,8) AS ParticipationDate,
       Mid([AllData],53,8) AS VestinDate,
       Mid([AllData],61,8) AS DateOfBirth,
       Mid([AllData],77,1) AS Spaces2,
       Mid([AllData],78,1) AS Reserved1,
       Mid([AllData],79,1) AS Reserved2,
       Mid([AllData],80,1) AS Spaces3
FROM TBL_Company1
WHERE (((Mid([AllData],30,2))="02") AND ((Mid([AllData],32,1))="D"));
```
Or
```
SELECT Mid([AllData],1,5) AS PlanNumber,
       Mid([AllData],6,4) AS Spaces1,
       Mid([AllData],10,3) AS Filler1,
       Mid([AllData],13,11) AS SSN,
       Mid([AllData],24,1) AS AccountIdentifier,
       Mid([AllData],25,7) AS RecordIdentifier,
       Mid([AllData],32,22) AS StreetAddressForBank,
       Mid([AllData],54,20) AS CityForBank,
       Mid([AllData],74,2) AS StateForBank,
       Mid([AllData],76,5) AS ZipCodeForBank
FROM TBL_Company1
WHERE (((Mid([AllData],25,7))="49EFTAD"));
```
Is there a way to parse this out without having to hard-code every position and length into the code?
I was thinking of having a table with all of the specifications in it and have an import function look to the specification table and parse out the data accordingly to a new table or maybe something else.
What I have done is not very scalable and if the format changes a little I would have to go back to each query to change it.
Any help is greatly appreciated.
I think in your situation, I'd want to be able to generate the SQL statement dynamically, as you suggest.
I'd have a table something like:
Format#,Position,OutColName,FromPos,Length,WhereValue
1,1,"PlanNumber",1,5,
1,2,"Spaces1",6,4,
...
1,n,,30,2,"02"
1,n+1,,32,1,"D"
and then some VBA to process it and build and execute the SQL string(s). The SELECT clause entries would be recognized by having a value in the OutColName field and WHERE clause entries by values in the WhereValue column.
Of course this is only more "efficient" in the sense that it's a bit easier to code up new formats or fix/modify existing ones.
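As a rough sketch of that idea (in Python rather than VBA, purely for illustration), the function below turns specification rows into one query string. The SpecRow type, the build_query name and the sample rows are assumptions made up for this example; only the Mid([AllData],...) expressions and TBL_Company1 come from the question.

```python
from collections import namedtuple

# One row of the hypothetical specification table described above.
SpecRow = namedtuple("SpecRow", "fmt position out_col from_pos length where_value")

def build_query(rows, table, source_col="AllData"):
    """Build one SELECT statement from the spec rows of a single format."""
    selects, wheres = [], []
    for row in sorted(rows, key=lambda item: item.position):
        expr = f"Mid([{source_col}],{row.from_pos},{row.length})"
        if row.out_col:
            # SELECT entry: the row names an output column.
            selects.append(f"{expr} AS {row.out_col}")
        if row.where_value is not None:
            # WHERE entry: the row carries a value to match.
            value = row.where_value.replace('"', '""')
            wheres.append(f'{expr}="{value}"')
    sql = f"SELECT {', '.join(selects)} FROM {table}"
    if wheres:
        sql += " WHERE " + " AND ".join(wheres)
    return sql + ";"

spec = [
    SpecRow(1, 1, "PlanNumber", 1, 5, None),
    SpecRow(1, 2, "Spaces1", 6, 4, None),
    SpecRow(1, 3, None, 30, 2, "02"),
    SpecRow(1, 4, None, 32, 1, "D"),
]
query = build_query(spec, "TBL_Company1")
print(query)
```

With this layout, a change to a file format means editing rows in the specification table instead of editing each of the 95 queries.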

Using the " ' " mark as part of the data

I am trying to create an insert statement that will allow the " ' " mark to be used as data, and I am having problems finding a reference for it. My current statement:
$sql = "INSERT INTO turnover(tail, discription, date) values ('$tail', '$discription', '$date')";
However, when I put this sample data through, I get:
INSERT INTO turnover(tail, discription, date) values ('173AB', 'No Dying MEL's tonight.', '05-14-2018')
If I remove the ' then it works fine. I think I saw something about this a long time ago, but I cannot find it.
Thanks in advance for any information that anyone can offer.
Best practice: Use prepared statements. Assuming you are using PHP: http://php.net/manual/en/pdo.prepared-statements.php
Alternatively you would apply some sort of escaping, like mysqli_real_escape_string().
Take care not to mix PDO and MySQLi functions, and do not in any case use mysql_*() functions.
INSERT INTO turnover(tail, discription, date) values ('173AB', 'No Dying MEL''s tonight.', '05-14-2018')

How to fix error when value in SQL contains an apostrophe

I am updating a string in a table using a Firebird SQL statement, with information typed by a user. However, if the string entered by a user has an apostrophe then it creates an error because the SQL syntax no longer reads correctly.
I guess I could read the string and remove all instances of apostrophes, but I wonder if there is an easier way.
{Edit 17 May 2017}
I am using Firebird 2.5 (as part of a software program called Ostendo)
Here is an extract of the code:
UpdatedMsg := frmPOLineNotesMemo.Lines.text;
SQLUpdateStr := 'update ASSEMBLYLINES set LINENOTES = LINENOTES || '''+ UpdatedMsg +''' Where SYSUNIQUEID = ' + AssyPropertyLine + '';
ExecuteSQL(SQLUpdateStr);
frmPOLineNotesMemo.Lines.Text is information entered by the user via a form.
You need to double the apostrophes in the string.
Found here: https://firebirdsql.org/manual/qsg10-firebird-sql.html
However keep in mind the comments. User input should be passed as parameters to avoid security problems.
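For reference, the doubling rule itself can be sketched in a few lines. This is Python, not the Pascal-style script from the question, and quote_literal is a made-up helper name; SQLite stands in for Firebird only to check that the doubled literal parses back to the original text. Parameters remain the safer option wherever the host application supports them.

```python
import sqlite3

def quote_literal(text):
    """Wrap text in single quotes, doubling any apostrophes inside it."""
    return "'" + text.replace("'", "''") + "'"

note = "Customer's order can't ship"
literal = quote_literal(note)
print(literal)

# SQLite is used here only as a stand-in to confirm the literal round-trips.
conn = sqlite3.connect(":memory:")
round_trip = conn.execute("SELECT " + literal).fetchone()[0]
print(round_trip == note)
```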

SQL Server : remove duplicated text within a string

I have a SQL Server 2008 table with a column containing lengthy HTML text. Near the top there is a link provided for an associated MP3 file which is unique to each record. The links are all formatted as follows:
<div class="MediaSaveAs">Download Audio </div>
Unfortunately many records contain two or three sequential and identical instances of this link where there should be only one. Is there a relatively simple script I can run to find and eliminate the redundant links?
I'm not entirely sure - because your explanation wasn't very clear - but this appears to do what you want, although whether or not you consider this to be a "simple script", I don't know.
declare @Link nvarchar(200) = N'<div class="MediaSaveAs">Download Audio </div>'
declare @BadData nvarchar(max) = N'cbjahcgfhjasgfzhjaucv' + replicate(@Link, 3) + N'cabhjcsghagj',
    @StartPattern nvarchar(34) = N'<div class="MediaSaveAs"><a href="',
    @EndPattern nvarchar(27) = N'">Download Audio </a></div>'
select @BadData
select replace (
    @BadData,
    substring(@BadData, charindex(@StartPattern, @BadData), len(@BadData)-charindex(reverse(@EndPattern), reverse(@BadData))-charindex(@StartPattern, @BadData) + 2),
    substring(@BadData, charindex(@StartPattern, @BadData), charindex(@EndPattern, @BadData) + len(@EndPattern) - charindex(@StartPattern, @BadData))
)
Personally I would not like to have to maintain this code; I would far rather use a script in another language that can actually parse HTML. You said this is "just a repeated text issue", but that doesn't mean it's an easy problem and especially not in a language like TSQL that has such limited support for string operations.
For future reference, please put all relevant information into the question - you can edit it if you need to - instead of leaving them in the comments where they are difficult to read and may be overlooked. And please post sample data and results instead of describing things in words.
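As a rough illustration of doing this outside T-SQL, here is a short Python sketch that collapses runs of identical consecutive links. It uses a regular expression rather than a real HTML parser, and the markup shape is assumed from the start and end patterns in the T-SQL above; the function name and the sample href are made up.

```python
import re

# Matches one download link followed by one or more immediate repeats of the
# very same link (optionally separated by whitespace).
DUPLICATE_LINK = re.compile(
    r'(<div class="MediaSaveAs"><a href="[^"]*">Download Audio </a></div>)'
    r'(?:\s*\1)+'
)

def collapse_repeated_links(html):
    """Replace each run of identical consecutive links with a single copy."""
    return DUPLICATE_LINK.sub(r"\1", html)

# Hypothetical sample record with the link repeated three times.
link = '<div class="MediaSaveAs"><a href="audio/ep1.mp3">Download Audio </a></div>'
bad = "intro" + link * 3 + "body"
cleaned = collapse_repeated_links(bad)
print(cleaned == "intro" + link + "body")
```

Text with a single link is left unchanged, so such a script could be run over every record and the result written back with a parameterized UPDATE.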
First we need to identify the file names, which we can do with PATINDEX:
select
substring(html, PATINDEX('%filename%.mp3%', html), PATINDEX('%.mp3%', html)-PATINDEX('%filename%.mp3%', html)+4)
from files
Second, identify and delete the duplicates:
delete
from files
where id not in (
select max(id)
from files
group by substring(html, PATINDEX('%filename%.mp3%', html), PATINDEX('%.mp3%', html)-PATINDEX('%filename%.mp3%', html)+4)
)
http://www.sqlfiddle.com/#!3/887a3/5

Should SQL format the output or just retrieve the raw data?

Generally speaking, the SQL queries that I write return unformatted data and I leave it to the presentation layer, a web page or a windows app, to format the data as required. Other people that I work with, including my boss, will insist that it is more efficient to have the database do it. I'm not sure that I buy that and believe that even if there was a measurable performance gain by having the database do it, that there are more compelling reasons to generally avoid this.
For example, I will place my queries in a Data Access layer with the intent of reusing the queries whenever possible. Given this, I contend that the queries are more likely to be reusable if the data remains in its native type rather than being converted to a string with formatting functions applied, for example, formatting a date column to a DD-MMM-YYYY format for display. Sure, if the SQL was returning the dates as formatted strings, you could reverse the process to convert the value back to a date data type, but this seems awkward, for lack of a better word. Furthermore, when it comes to formatting other data, for example, a machine serial number made up of a prefix, base and suffix with separating dashes and leading zeros removed in each sub-field, you risk not being able to correctly recover the original serial number when going in the other direction. Maybe this is a bad example, but I hope you see the direction I am going with this...
To take things a step further, I see people write VERY complex SQLs because they are essentially writing what I would call presentation logic into a SQL instead of returning simple data and then applying this presentation logic in the presentation layer. In my mind, this results in very complex, difficult to maintain and more brittle SQL that is less adaptable to change.
Take the following real-life example of what I found in our system and tell me what you think. The rationale I was given for this approach was that it made the page very simple for the web app to render, as it used the following 1-line snippet of classic ADO logic in a Classic ASP web app to process the rows returned:
oRS.GetString ( , , "</td>" & vbCrLf & "<td style=""font-size:x-small"" nowrap>" ,"</td>" & vbCrLf & "</tr>" & vbCrLf & "<tr>" & vbCrLf & _
"<td style=""font-size:x-small"" nowrap>" ," " ) & "</td>" & vbCrLf & "</tr>" & vbCrLf & _
Here's the SQL itself. While I appreciate the author's ability to write complex SQL, I feel like this is a maintenance nightmare. Am I nuts? The SQL returns a list of programs that are currently running against our database and the status of each:
Because the SQL did not display with CR/LFs when I pasted here, I decided to put the SQL on an otherwise empty personal Google site. Please feel free to comment. Thanks.
By the way, this SQL was actually constructed using VBScript nested WITHIN a classic ASP page, not by calling a stored procedure, so you have the additional complexity of embedded concatenations and quoted markup, if you know what I mean, not to mention the lack of formatting. The first thing I did when I was asked to help debug the SQL was to add a debug.print of the SQL output and run it through a SQL formatter that I just found. Some of the formatting was lost in pasting at the following link:
Edit(Andomar): copied inline: (external link removed, thanks-Chad)
SELECT
Substring(Datename("dw",start_datetime),1,3)
+ ', '
+ Cast(start_datetime AS VARCHAR) "Start Time (UTC/GMT)"
,program_name "Program Name"
,run_sequence "Run Sequence"
,CASE
WHEN batchno = 0
THEN Char(160)
WHEN batchno = NULL
THEN Char(160)
ELSE Cast(batchno AS VARCHAR)
END "Batch #" /* ,Replace(Replace(detail_log ,'K:\' ,'file://servernamehere/DiskVolK/') ,'\' ,'/') "log"*/ /* */
,Cast('<a href="GOIS_ViewLog.asp?Program_Name=' AS VARCHAR(99))
+ Cast(program_name AS VARCHAR)
+ Cast('&Run_Sequence=' AS VARCHAR)
+ Cast(run_sequence AS VARCHAR)
+ Cast('&Page=1' AS VARCHAR)
+ ''
+ Cast('">'
+ CASE
WHEN end_datetime >= start_datetime
THEN CASE
WHEN end_datetime <> 'Jan 1 1900 2:00 PM'
THEN CASE
WHEN (success_code = 10
OR success_code = 0)
AND exit_code = 10
THEN CASE
WHEN errorcount = 0
THEN 'Completed Successfully'
ELSE 'Completed with Errors'
END
WHEN success_code = 100
AND exit_code = 10
THEN 'Completed with Errors'
ELSE CASE
WHEN program_name <> 'FileDepCheck'
THEN 'Failed'
ELSE 'File not found'
END
END
ELSE CASE
WHEN success_code = 10
AND exit_code = 0
THEN 'Failed; Entries for Input File Missing'
ELSE 'Aborted'
END
END
ELSE CASE
WHEN ((Cast(Datediff(mi,start_datetime,Getdate()) AS INT) <= 240)
OR ((SELECT
Count(* )
FROM
MASTER.dbo.sysprocesses a(nolock)
INNER JOIN gcsdwdb.dbo.update_log b(nolock)
ON a.program_name = b.program_name
WHERE a.program_name = update_log.program_name
AND (Abs(Datediff(n,b.start_datetime,a.login_time))) < 1) > 0))
THEN 'Processing...'
ELSE 'Aborted without end date'
END
END
+ '</a>' AS VARCHAR) "Status / Log"
,Cast('<a href="' AS VARCHAR)
+ Replace(Replace(detail_log,'K:\','file://servernamehere/DiskVolK/'),
'\','/')
+ Cast('" title="Click to view Detail log text file"' AS VARCHAR(99))
+ Cast('style="font-family:comic sans ms; font-size:12; color:blue"><img src="images\DetailLog.bmp" border="0"></a>' AS VARCHAR(999))
+ Char(160)
+ Cast('<a href="' AS VARCHAR)
+ Replace(Replace(summary_log,'K:\','file://servernamehere/DiskVolK/'),
'\','/')
+ Cast('" title="Click to view Summary log text file"' AS VARCHAR(99))
+ Cast('style="font-family:comic sans ms; font-size:12; color:blue"><img src="images\SummaryLog.bmp" border="0"></a>' AS VARCHAR(999)) "Text Logs"
,errorcount "Error Count"
,warningcount "Warning Count"
,(totmsgcount
- errorcount
- warningcount) "Information Message Count"
,CASE
WHEN end_datetime > start_datetime
THEN CASE
WHEN Cast(Datepart("hh",(end_datetime
- start_datetime)) AS INT) > 0
THEN Cast(Datepart("hh",(end_datetime
- start_datetime)) AS VARCHAR)
+ ' hr '
ELSE ' '
END
+ CASE
WHEN Cast(Datepart("mi",(end_datetime
- start_datetime)) AS INT) > 0
THEN Cast(Datepart("mi",(end_datetime
- start_datetime)) AS VARCHAR)
+ ' min '
ELSE ' '
END
+ CASE
WHEN Cast(Datepart("ss",(end_datetime
- start_datetime)) AS INT) > 0
THEN Cast(Datepart("ss",(end_datetime
- start_datetime)) AS VARCHAR)
+ ' sec '
ELSE ' '
END
ELSE CASE
WHEN end_datetime = start_datetime
THEN '< 1 sec'
ELSE CASE
WHEN ((Cast(Datediff(mi,start_datetime,Getdate()) AS INT) <= 240)
OR ((SELECT
Count(* )
FROM
MASTER.dbo.sysprocesses a(nolock)
INNER JOIN gcsdwdb.dbo.update_log b(nolock)
ON a.program_name = b.program_name
WHERE a.program_name = update_log.program_name
AND (Abs(Datediff(n,b.start_datetime,a.login_time))) < 1) > 0))
THEN 'Running '
+ Cast(Datediff(mi,start_datetime,Getdate()) AS VARCHAR)
+ ' min'
ELSE ' '
END
END
END "Elapsed Time" /* ,end_datetime "End Time (UTC/GMT)" ,datepart("hh" ,
(end_datetime - start_datetime)) "Hr" ,datepart("mi" ,(end_datetime - start_datetime)) "Mins" ,datepart("ss" ,(end_datetime - start_datetime)) "Sec" ,datepart("ms" ,(end_datetime - start_datetime)) "mSecs" ,datepart("dw" ,start_datetime) "dp" ,case when datepart("dw" ,start_datetime) = 6 then ' Fri' when datepart("dw" ,start_datetime) = 5 then ' Thu' else '1' end */
,totalrows "Total Rows"
,inserted "Rows Inserted"
,updated "Rows Updated" /* ,success_code "succ" ,exit_code "exit" */
FROM
update_log
WHERE start_datetime >= '5/29/2009 16:15'
ORDER BY start_datetime DESC
The answer is obviously "just retrieve output". Formatting on the SQL server has the following problems:
it increases the network traffic from the SQL server
SQL has very poor string handling functionality
SQL servers are not optimised to perform string manipulation
you are using server CPU cycles which could better be used for query processing
it may make life difficult (or impossible) for the query optimiser
you have to write many more queries to support different formatting
you may have to write different queries to support formatting on different browsers
you can't re-use queries for different purposes
I'm sure there are many more.
SQL should not be formatting, period. It's a relational algebra for extracting (when using SELECT) data from the database.
Getting the DBMS to format the data for you is the wrong thing to do, and that should be left to your own code (outside the DBMS). The DBMS is generally under enough load as it is without having to do your presentation work for you. It's also optimized for data retrieval, not presentation.
I know DBAs that would call for my immediate execution if I tried to do something like that :-)
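As one illustration of leaving presentation to application code, the "Elapsed Time" column in the question's query could be produced from the raw start and end datetimes after the rows are fetched. The sketch below is in Python and is deliberately simplified: format_elapsed is a made-up name, and the "still running" branch of the original CASE expression is left out.

```python
from datetime import datetime

def format_elapsed(start, end):
    """Render a duration such as '1 hr 5 min 30 sec' from two raw datetimes."""
    if end < start:
        # No usable end time; the query's running/aborted logic is omitted here.
        return ""
    total = int((end - start).total_seconds())
    if total < 1:
        return "< 1 sec"
    hours, rest = divmod(total, 3600)
    minutes, seconds = divmod(rest, 60)
    parts = []
    if hours:
        parts.append(f"{hours} hr")
    if minutes:
        parts.append(f"{minutes} min")
    if seconds:
        parts.append(f"{seconds} sec")
    return " ".join(parts)

started = datetime(2009, 5, 29, 16, 15, 0)
finished = datetime(2009, 5, 29, 17, 20, 30)
print(format_elapsed(started, finished))
```

The query then only needs to return start_datetime and end_datetime, and any other consumer of the same query is free to present the duration differently.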
Formatting output in SQL does sort of break the whole concept of separation of presentation and data. Not only that, but there are a number of conditions that might arise:
What if you need to localise your date formats? The UK uses a different date format to the US, for example. Are you going to internationalize all the way back up to your data layer?
What if the rules of formatting change? I.e. Some text needs to be formatted in a different way to comply with some new corporate policy? Again, you would need to go all the way back to the data layer.
If we take a web context, how do you decide on escaping values? Different forms of escaping might be desired if you are outputting to a web page, or to JSON, or elsewhere...
Not only that, but SQL string manipulation functions are not typically very zippy.
I'm the developer responsible for the reporting engine of my company's product. In simple terms the engine works by building an XML document of the data to go into a report from the database, and then transforming the XML any which way to build a web-page, or a PDF or a Word document based on user requirement.
When I started five years ago I had the database formatting the output, although I'm pleased to say nothing I wrote was as horrific as the question's example. Over time I've moved the other way and now the XML holds only the raw data, and this is tidied up during the presentation.
Our software uses Traffic Lights as a quick at-a-glance status indicator, so we have a lot of char fields in the database storing 'R', 'A', 'G', 'U' to represent red, amber, green and unknown. I had several tricks such as SELECTs with embedded CASE statements to transform single character codes into their English counterparts:
SELECT CASE status WHEN 'R' THEN 'Red' WHEN 'G' THEN 'Green' ...etc...
Sorting can't be done on the native codes; users expect things to be in one of two orders: Red, Amber, Green or Green, Amber, Red. So I had corresponding SORT columns as well:
SELECT
CASE status WHEN 'R' THEN 'Red' WHEN 'G' THEN 'Green' WHEN 'A' THEN 'Amber' END as status,
CASE status WHEN 'R' THEN 0 WHEN 'A' THEN 1 WHEN 'G' THEN 2 END as sort
FROM
table
ORDER BY
sort
That's just a brief example. I had other tricks for doing date formatting, assembly of names, etc.
This of course led to problems making the application multi-language, since English is baked into the database. I'd need to look up a customer locale and write lots of multi-language CASEs to support other languages. Not good. Dates were also a problem: Americans like their dates mm/dd and Europeans use dd/mm.
It also led to other duplication problems. If someone added a fourth or fifth traffic light option I would have to modify all my SQL, even though the new status is already represented in code as a Java enum or something similar that I could look up once I'd read the single character from the database.
It became far, far easier in my case to just have the database return the raw data and for me to write a suite of Comparators and formatters to present the data in a document in the user's native language and encoding. If I was starting over again today that'd be what I'd do.
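A minimal sketch of that arrangement, in Python rather than the Java mentioned above: the database returns only the single-character code, and the application holds the sort order and the labels. The Status enum, the present function and the German labels are assumptions made up for this example.

```python
from enum import Enum

class Status(Enum):
    # Each value is (database code, sort rank).
    RED = ("R", 0)
    AMBER = ("A", 1)
    GREEN = ("G", 2)
    UNKNOWN = ("U", 3)

    @property
    def code(self):
        return self.value[0]

    @property
    def rank(self):
        return self.value[1]

# Lookup from the raw database code to the enum member.
BY_CODE = {status.code: status for status in Status}

# Labels live in the presentation layer, one table per language.
LABELS = {
    "en": {Status.RED: "Red", Status.AMBER: "Amber",
           Status.GREEN: "Green", Status.UNKNOWN: "Unknown"},
    "de": {Status.RED: "Rot", Status.AMBER: "Gelb",
           Status.GREEN: "Grün", Status.UNKNOWN: "Unbekannt"},
}

def present(codes, lang="en", reverse=False):
    """Sort raw status codes by rank and translate them for display."""
    statuses = sorted((BY_CODE[code] for code in codes),
                      key=lambda status: status.rank, reverse=reverse)
    return [LABELS[lang][status] for status in statuses]

english = present(["G", "R", "A"])
german_reversed = present(["G", "R", "A"], lang="de", reverse=True)
print(english, german_reversed)
```

Adding a new status or a new language then touches only this code, and the SQL stays a plain SELECT of the status column.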
I think there's a place for some kinds of transforms on the way out of SQL, and it depends on the calling program's expectations.
For instance, if a datetime is appropriate, it should be returned natively. On the other hand, if you are only returning a year in a datetime field (or a quarter, like 1/1, 4/1, 7/1, 10/1), and the client is expected to parse out the information, put it in a separate column (like year = 2008 or quarter = '2008Q1'). Some translations from code to description are also reasonable (dropping the code column and only emitting the description). There are reasonable cases where concatenation and string building are appropriate.
Your particular example is a place where it's inappropriate and while on the surface it looks like looser coupling (only change the SP in the database) it can actually create stronger coupling by forcing additional SPs to be written for different usages instead of multiple UIs being able to use the same SP. And then multiple SPs might need to be changed in sync as the system evolves.
When considering whether to format your data on behalf of your presentation layer, consider that your "presentation layer" may be a web service or other program. You may start by doing the formatting on behalf of a piece of UI code, only to later need the same query to be used by a web service, which will have different requirements.
A favorite of mine was a set of stored procedures which all formatted date/times. In the local timezone. It didn't work quite so well when called by a web service from a different timezone. It worked even less well when the regional settings of the database server changed, changing the date/time format. Oh, and it didn't work at midnight, since it truncated the "00:00" at the end.
OTOH, it was very convenient for the UI.
Most people I know disagree with me here, but I kinda like this approach. So I'll list some advantages:
SQL is very powerful: how many lines of C# would this query take?
SQL is very easy to update. I imagine this code is in a stored procedure, which you can change with a simple ALTER PROC. This can greatly reduce the time to roll in fixes.
SQL is fast; I've seen cases where introducing an ORM layer slowed down the application to a crawl.
SQL is easy to debug, and errors are easy to reproduce. Just run the query. Testing your fix is a question of running the new query.
SQL like this is not that hard to maintain when it's properly formatted. There is not much SQL I can't understand in 5-10 minutes, but a multi-layered C# solution can take a very long time, especially if you have to figure out which layer's abstraction is breaking.
I'm sure other people will list the disadvantages of the SQL approach.