Query dynamic data from SQL table - sql

I'm using SQL Server 2012.
I want to query data from a specific SQL column that meets certain criteria. This column contains free form text entered by a user. The user can enter whatever he/she wants, but always includes a URL which may be entered anywhere within the free form text.
Each URL is similar and contains consistent elements, such as the domain, but also references a unique "article ID" number within the URL. Think of these numbers as referencing knowledge base articles.
The article ID is a different number depending on the article used and new articles are regularly created.
I need a query identifying all of these article ID numbers within the URLs. The only means I've developed so far is to use SUBSTRING to count characters until reaching the article ID number. This is unreliable since users don't always include the URL at the beginning. It would be better if I could tell SUBSTRING to count from the beginning of the URL regardless of where it resides within the text.
For example, it could begin counting wherever it finds 'HTTP://' or a common keyword that every URL contains. Another option would be to extract the URL into its own table. I have yet to figure out how to do either of these in SQL.
The following is what I have so far.
select
scl.number,
ol.accountnum,
scl.opendate as CallOpenDate,
sce.opendate as NoteEntryDate,
sce.notes,
substring(sce.notes, 102, 4) as ArticleID,
sclcc.pmsoft,
ol.territorydesc
from (select * from supportcallevent as sce
where sce.opendate > '2014-04-01 00:00:00.000') as sce
inner join supportcalllist as scl on scl.SupportCallID=sce.supportcallid
inner join organizationlist as ol on ol.partyid=scl.partyid
inner join supportcalllist_custcare as sclcc on sclcc.supportcallid=scl.supportcallid
where sce.notes like '%http://askus.how%'
order by ol.territorydesc, scl.number;

You can use the CHARINDEX function to find the URL in the string and start the substring from there.
This example will get the next 4 characters after the URL prefix:
DECLARE @str VARCHAR(100)
DECLARE @find VARCHAR(100)
SET @str = 'waawhbu aoffawh http://askus.how/1111 auwhauowd'
SET @find = 'http://askus.how/'
SELECT SUBSTRING(@str, CHARINDEX(@find, @str) + 17, 4)
SQLFiddle
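If the article ID is not always four digits long, the fixed length can be replaced by a search for the first non-digit character after the prefix. The following is only a sketch: the sample note text is made up, and it assumes the ID immediately follows 'http://askus.how/' as in the example above, which may not match the real URLs.

```sql
-- Sample note; the URL prefix comes from the question, the rest is invented.
DECLARE @notes VARCHAR(200) = 'called customer, see http://askus.how/12345 for the fix';
DECLARE @find  VARCHAR(100) = 'http://askus.how/';

SELECT CASE WHEN CHARINDEX(@find, @notes) > 0
            -- Keep everything up to (not including) the first non-digit.
            -- The appended space guarantees PATINDEX finds a non-digit.
            THEN LEFT(tail.txt, PATINDEX('%[^0-9]%', tail.txt + ' ') - 1)
       END AS ArticleID   -- NULL when the note contains no such URL
FROM (SELECT SUBSTRING(@notes,
                       CHARINDEX(@find, @notes) + LEN(@find),
                       LEN(@notes)) AS txt) AS tail;
```

Using LEN(@find) instead of the literal 17 keeps the offset correct if the prefix changes.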

Related

Query to ignore rows which have non hex values within field

Initial situation
I have a relatively large table (about 700,000 records) where an nvarchar field "MediaID" mostly contains media IDs in proper hexadecimal notation (as it should).
Within my "sequential" query (each query depends on the output of the query before, this is all in pure T-SQL) I have to convert these hexadecimal values into decimal bigint values in order to do further calculations and filtering on these calculated values for the subsequent queries.
--> So far, no problem. The "sequential" query works fine.
Problem
Unfortunately, some of these media IDs contain non-hex characters, most probably because of typing errors by the people who added them or import errors from the previous business system.
Because of these non-hex characters, the whole query fails, since the conversion raises an error.
For my current purpose, such rows must be skipped, as they are clearly wrong and cannot be used (no media or data carriers in the current business system can have IDs with non-hex characters).
Manual editing of the data is not an option, as there are too many errors and it is not clear what the data should be replaced with.
Challenge
To create a query which only returns records which have valid hex values within the media ID field.
(Unfortunately, my SQL skills are not enough to create the above query. Your help is highly appreciated.)
The relevant section of the larger query looks like this (xxxx is where your help comes in :-))
select
pureMediaID
, mediaID
, CUSTOMERID
,CONTRACT_CUSTOMERID
from
(
select concat('0x', Replace(Ltrim(Replace(mediaID, '0', ' ')), ' ', '0')) AS pureMediaID
--, CUSTOMERID
, *
from M_T_CONTRACT_CUSTOMERS
where mediaID is not null
and mediaID like '0%'
and xxxxxxxxxxxxxxxxxxxxxxxxxxxxx
) as inner1
EDIT: As per request I have added here some good and some bad data:
Good:
4335463357
4335459809
1426427996
4335463509
4335515039
4335465134
4427370396
4335415661
4427369036
4335419089
004BB03433
004e7cf9c6
00BD23133
00EE13D8C1
00CCB5522C
00C46522C
00dbbe3433
Bad:
4564589+
AB6B8BFC.8
7B498DFCnm
DB218DFChb
d<tgfh8CFC
CB9E8AFCzj
B458DFCjhl
rytzju8DFC
BFCtdsjshj
DB9888FCgf
9BC08CFCyx
EB198DFCzj
4B628CFChj
7B2B8DFCgg
After I upgraded the compatibility level of the SQL instance to SQL 2016 (it was below 2012 before), I could use try_convert with the same syntax as the original convert function, as donPablo pointed out. With that, the query runs to completion and every MediaID that is not a valid hex value is converted to NULL. Really, really nice.
Exactly what I needed.
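For readers who want to see the idea in isolation, here is a minimal sketch of a TRY_CONVERT-based conversion. It is not the exact expression from my query: the table variable, the padding to 16 hex digits, and the column alias are assumptions made for illustration. The padding is there because binary style 1 expects an even number of hex digits after the 0x prefix.

```sql
-- Hypothetical sample values taken from the good/bad lists above.
DECLARE @ids TABLE (mediaID NVARCHAR(20));
INSERT @ids VALUES (N'004BB03433'), (N'00BD23133'), (N'4564589+'), (N'AB6B8BFC.8');

SELECT mediaID,
       -- Left-pad to 16 hex digits (8 bytes) so the digit count is even,
       -- then parse as hex (style 1 expects a leading 0x).
       -- TRY_CONVERT returns NULL instead of raising an error on bad input.
       CONVERT(BIGINT,
           TRY_CONVERT(VARBINARY(8),
               '0x' + RIGHT(REPLICATE('0', 16) + CAST(mediaID AS VARCHAR(16)), 16),
               1)) AS mediaIdDecimal
FROM @ids;
```

Rows whose mediaIdDecimal is NULL can then be filtered out in the next query of the sequence.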
Unfortunately, the solution of ALICE... didn't work for me, as it (strangely) also returned records containing the "+" character.
Edit: The added comment of Alice..., where you create a calculated field like this:
CASE WHEN "KEY" LIKE '%[^0-9A-F]%' THEN 0 ELSE 1 end as xyz
and then filter in the next query like this:
where xyz = 1
also works on SQL instances with a compatibility level below SQL 2012.
A great addition for people who still have to work with older SQL instances.
An option (although not ideal in terms of performance) is to check the characters in MediaID with a CASE expression and a LIKE pattern.
Hexadecimal values cannot contain characters other than A-F and the digits 0-9, so flag any value that contains a character outside those ranges:
CASE WHEN MediaID LIKE '%[^0-9A-F]%' THEN 0 ELSE 1 END
I would recommend writing a function that first checks whether MediaID is hexadecimal, and then running the conversion query.

Why do I get different results depending on the function I use? (SQL Server)

I've been tasked with creating a report for my company. The report is generated from the results returned by the Stored Procedure spGenerateReport, which has multiple filters.
Inside the SP, this is how the filter is expected to work:
SELECT * FROM MyTable WHERE column1 IN (
'filters', 'for', 'this', 'report'
)
Entering the code above yields ~30000 rows in 9s. However, I want to be able to change my SP's filter by passing it a single argument (since I may use 1 or 2 or n filters), like so:
spGenerateReport 'Filters,for,this,report'
For this I have the User-Created Function fnSplitString (yes, I do know that there is a STRING_SPLIT function but I can't use it due to a lower compatibility level of my database) which splits a single string into a table, like so:
SELECT splitData FROM fnSplitString('Filters,for,this,report')
Returns:
splitData
------
Filters
for
this
report
Thus the final code in my SP is:
SELECT * FROM MyTable WHERE column1 IN (
SELECT * FROM fnSplitString('Filters,for,this,report')
)
However, this instead yields ~10000 rows in 60s. The time taken is odd but not much of a problem; roughly two thirds of my rows disappearing into the void certainly is. The results only contain rows matching the first couple of filters (for example, 'Filters' and 'for'). If I change the order of the arguments (e.g. fnSplitString('report,for,Filters,this')), I get a different number of rows, and only from the filters 'report', 'for', and 'Filters'! I don't understand why using the function returns different results from the literal strings. Is there some hidden gimmick that I'm not aware of?
PS - I'm sorry in advance for being bad at explaining myself, and for any grammar mistakes
You should definitely be getting the same results with both techniques. Something is wrong.
You haven't posted the fnSplitString code, but I suspect that either fnSplitString is not outputting the last string in the list, or the list is being truncated before it reaches fnSplitString so that the last string finds no matches.
e.g. if the parameter going into your spGenerateReport stored procedure is varchar(20) then what will reach the function is 'Filters,for,this,rep' with the last bit truncated.
SSRS, for example, will truncate strings that are being passed into an SP instead of warning you with an error message
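The truncation is easy to reproduce with a plain variable, which behaves like a parameter declared with the same length. This is only an illustration: the name @filters and the varchar(20) length are assumptions, since the actual parameter declaration of spGenerateReport was not posted.

```sql
-- Hypothetical parameter declared too short for the 23-character filter list.
DECLARE @filters VARCHAR(20) = 'Filters,for,this,report';

-- The assignment succeeds without any error or warning.
SELECT @filters      AS ReceivedValue,   -- Filters,for,this,rep
       LEN(@filters) AS ReceivedLength;  -- 20
```

Checking the declared length of the parameter (and of any variables inside fnSplitString) against the longest list you expect to pass is a quick first test.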

How to find rows by filtering a specific text using Full text search in MS SQL 2012

I have a requirement to find rows by filtering on specific text using Full-Text Search in MS SQL. The first requirement is to find rows by searching text within the XML column; the second is to find rows by searching text within the JSON column (nvarchar data type). Each of the following conditions should return a result.
XML Column
Criteria 1. Where Contains(XMLData,"1")
Criteria 2. Where Contains(XMLData,"/1/")
Criteria 3. Where Contains(XMLData,"<field>1</field>")
JSONDATA Column :
Criteria 1. Where Contains(JSONData,"1")
Criteria 2. Where Contains(JSONData,"/1/")
Criteria 3. Where Contains(JSONData,"PortalId:1")
My current implementation uses the queries below, which have performance issues when run against thousands of records. Is there any approach other than the code below?
XML QUERY
SELECT *
FROM SettingsData
WHERE CAST(XMLData AS nvarchar(max)) LIKE '%/' + CONVERT(VARCHAR, '1') + '/%'
JSON QUERY
SELECT *
FROM SettingsData
WHERE JSONData LIKE '%/' + CONVERT(VARCHAR, '1') + '/%'
Here is a sample table for this question.
http://sqlfiddle.com/#!18/f65ef/1
I do not think that a full-text search would help you. It seems you are looking for arbitrary fragments, including technical tokens such as /1/.
Try this for XML:
DECLARE @SearchFor VARCHAR(100)='1';
SELECT *
FROM SettingsData
WHERE xmldata.exist(N'//*[contains(text()[1],sql:variable("@SearchFor"))]')=1;
This checks whether any node's internal text() contains the search phrase. But any value with a 1 inside is found (e.g. any unrelated number that has a 1 somewhere). You might instead search for text()="1" and perform the contains only if the string length exceeds a certain minimum.
Something like
WHERE xmldata.exist(N'//*[text()[1]=sql:variable("@SearchFor") or (string-length(text()[1])>=3 and contains(text()[1],concat("/",sql:variable("@SearchFor"),"/")))]')=1;
On your version, JSON is nothing more than a string and must be parsed. Microsoft introduced JSON support with v2016, but you are on v2012. The problem with a LIKE search on a JSON string is that you would find the 1 even when it is part of an element's name. The rest is as above:
SELECT *
FROM SettingsData
WHERE jsondata LIKE '%' + @SearchFor + '%';

Extract a substring from a text field

New to TSQL and SQL generally, please pardon if this is really basic:
I am working with a new-to-me-database that has ignored some best practices. Relevant to this discussion, some data is stored in a generalized note field, including loyalty numbers. The good news is that the loyalty numbers are at least stored consistently within the note.
So, a simplified example from the note table might be:
I have verified that every Loyalty Number is stored consistently ("Loyalty Number ####"), but obviously this is not ideal. I want to extract the Loyalty Number for every primary key that has them, then create a new field that stores the Loyalty Number.
What I'm having trouble with is the following: How do I run a query that will give me each primary key then, if there is a loyalty number return it, if not leave it null or say something like no result found. E.g., turn the above into something like.
It's trivially easy to construct something like "select primary_key, note from note_table where note like '%Loyalty Number%'", but that doesn't do the job of clipping down to just the loyalty number (and leaving out extraneous text). The uniformity of the data means I could probably do this in Excel, but I'm wondering if it's possible in TSQL. Thanks in advance for your help.
Give something like this a try using case with substring and charindex:
select id,
case when note like '%Loyalty Number [0-9][0-9][0-9][0-9]%'
then 'Loyalty Number ' +
substring(note,
charindex('Loyalty Number', note) + Len('Loyalty Number ') + 1, 4)
end as Note
from note
SQL Fiddle Demo
The case statement checks to see if Loyalty Number exists in the data. Substring splits the note field using charindex to find the starting position. This is hard coding a length of 4 characters for the loyalty number. Given your comments, this should work. If you have a dynamic number of characters, you'll need to modify this slightly.
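For a loyalty number of varying length, one possible modification is to take everything after "Loyalty Number " up to the first non-digit. This is a sketch under assumptions: the table variable and sample rows are invented, and it assumes the number always follows the label after exactly one space.

```sql
-- Invented sample data with loyalty numbers of different lengths.
DECLARE @note TABLE (id INT, note VARCHAR(500));
INSERT @note VALUES
    (1, 'Customer Since 2012. Loyalty Number 474712'),
    (2, 'Loyalty Number 22.Customer Since 2009'),
    (3, 'Pending Order');

SELECT n.id,
       CASE WHEN n.note LIKE '%Loyalty Number [0-9]%'
            -- Cut at the first non-digit; the appended space guarantees a match.
            THEN LEFT(t.tail, PATINDEX('%[^0-9]%', t.tail + ' ') - 1)
       END AS LoyaltyNumber   -- NULL when the note has no loyalty number
FROM @note AS n
CROSS APPLY (SELECT SUBSTRING(n.note,
                              -- 15 = length of 'Loyalty Number ' including the space
                              CHARINDEX('Loyalty Number ', n.note) + 15,
                              LEN(n.note)) AS tail) AS t;
```

The CROSS APPLY only exists to name the "text after the label" once, so it does not have to be repeated inside LEFT and PATINDEX.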
Building on @segeddes' answer, here's the rest of the code, which will update your new LoyaltyNumber column.
Working SQL Fiddle: http://sqlfiddle.com/#!3/36e46/8
UPDATE note_table
SET LoyaltyNumber =
CASE
WHEN note LIKE '%Loyalty Number [0-9][0-9][0-9][0-9]%'
THEN SUBSTRING(note, CHARINDEX('Loyalty Number', note)
+ LEN('Loyalty Number ') + 1, 4)
ELSE 'Regular Customer'
END
FROM note_table
Table Definition and CRUD
CREATE TABLE note_table (
id int identity(1,1),
Note VarChar(500),
LoyaltyNumber varchar(20)
)
Insert Into note_table(Note) Values
('Customer Since 2012. Loyalty Number 4747'),
('Loyalty Number 2209'),
('Loyalty Number 2234.Customer Since 2009'),
('Pending Order');

How to write SQL query with many % wildcard characters

I have a column in a SQL Server table as:
companystring = {"CompanyId":0,"CompanyType":1,"CompanyName":"Test 215","TradingName":"Test 215","RegistrationNumber":"Test 215","Email":"test215@tradeslot.com","Website":"Test 215","DateStarted":"2012","CompanyValidationErrors":[],"CompanyCode":null}
I want to query the column to search for
companystring like '%CompanyName":"%test 2%","%'
I want to know if I'm querying correctly, because for some search strings it does not yield the proper result. Could anyone please help me with this?
Edit: I have removed the format bold
% is a special character that means a wildcard. If you want to find the actual character inside a string, you need to escape it.
DECLARE @d TABLE(id INT, s VARCHAR(32));
INSERT @d VALUES(1,'foo%bar'),(2,'fooblat');
SELECT id, s FROM @d WHERE s LIKE 'foo[%]%'; -- returns only 1
SELECT id, s FROM @d WHERE s LIKE 'foo%'; -- returns both 1 and 2
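Besides the bracket form, LIKE also accepts an ESCAPE clause, which some people find easier to read. A small sketch using the same sample data; the choice of '!' as the escape character is arbitrary.

```sql
DECLARE @d TABLE (id INT, s VARCHAR(32));
INSERT @d VALUES (1, 'foo%bar'), (2, 'fooblat');

-- '!' is declared as the escape character, so '!%' matches a literal percent
-- sign and the final '%' is still a wildcard.
SELECT id, s FROM @d WHERE s LIKE 'foo!%%' ESCAPE '!'; -- returns only 1
```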
Depending on your platform, you might be able to use some combination of regular expressions and/or lambda expressions that are built into its main libraries. For example, .NET has LINQ, a powerful tool that abstracts querying and can be used for searches.
It looks like you have JSON data stored in a column called "companystring". If you want to search within the JSON data from SQL things get very tricky.
I would suggest you look at doing some extra processing at insert/update to expose the properties of the JSON you want to search on.
If you search in the way you describe, you would actually need to use Regular Expressions or something else to make it reliable.
In your example you say you want to search for:
companystring like '%CompanyName":"%test 2%","%'
I understand this as searching inside the JSON for the string "test 2" somewhere inside the "CompanyName" property. Unfortunately this would also return results where "test 2" was found in any other property after "CompanyName", such as the following:
-- formatted for readability
companystring = '{
"CompanyId":0,
"CompanyType":1,
"CompanyName":"Test Something 215",
"TradingName":"Test 215",
"RegistrationNumber":"Test 215",
"Email":"test215@tradeslot.com",
"Website":"Test 215",
"DateStarted":"2012",
"CompanyValidationErrors":[],
"CompanyCode":null}'
Even though "test 2" isn't in the CompanyName, it is in the text following it (TradingName), which is also followed by the string "," so it would meet your search criteria.
Another option would be to create a view that exposes the value of CompanyName using a column defined as follows:
LEFT(
SUBSTRING(companystring, CHARINDEX('"CompanyName":"', companystring) + LEN('"CompanyName":"'), LEN(companystring)),
CHARINDEX('"', SUBSTRING(companystring, CHARINDEX('"CompanyName":"', companystring) + LEN('"CompanyName":"'), LEN(companystring))) - 1
) AS CompanyName
Then you could query that view using WHERE CompanyName LIKE '%test 2%' and it would work, although performance could be an issue.
The logic of the above is to get everything after "CompanyName":":
SUBSTRING(companystring, CHARINDEX('"CompanyName":"', companystring) + LEN('"CompanyName":"'), LEN(companystring))
Up to but not including the first " in the sub-string (which is why it is used twice).
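To see the two steps separately, the expression can be tried against a variable before putting it in a view. This is a sketch only: the shortened JSON string and the names @companystring, @key, and rest are illustrative, and it assumes the "CompanyName" key is always present (if it is missing, the LEFT call can receive a negative length and fail).

```sql
-- Shortened sample value, for illustration only.
DECLARE @companystring VARCHAR(MAX) =
    '{"CompanyId":0,"CompanyName":"Test Something 215","TradingName":"Test 215"}';
DECLARE @key VARCHAR(50) = '"CompanyName":"';

-- Step 1 (derived table): everything after the key.
-- Step 2 (LEFT): cut at the first double quote, which ends the value.
SELECT LEFT(rest.txt, CHARINDEX('"', rest.txt) - 1) AS CompanyName  -- Test Something 215
FROM (SELECT SUBSTRING(@companystring,
                       CHARINDEX(@key, @companystring) + LEN(@key),
                       LEN(@companystring)) AS txt) AS rest;
```

Naming the intermediate substring once avoids repeating the SUBSTRING call, which is the part the original expression has to write out twice.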