Using Microsoft Query and ODBC to SQL Server, complicated query - sql

I have a view in SQL Server that is somewhat similar to the following example.
SELECT *
FROM PEOPLE
LEFT OUTER JOIN (SELECT ID
FROM OTHER_TABLE
WHERE SOME_FIELD = 'x'
OR SOME_FIELD = 'y'
OR SOME_FIELD = 'z') AS PEOPLE_TO_EXCLUDE ON PEOPLE.ID = PEOPLE_TO_EXCLUDE.ID
WHERE PEOPLE_TO_EXCLUDE.ID IS null
The hassle:
I am perfectly capable of adding and modifying "OR SOME_FIELD = 'w'" any number of times. However, I am making this view for a user to pull up in Excel via ODBC. The user needs to be able to modify the inner select to her liking, to match whatever she happens to be filtering on at that time of the day/week/month/year/etc. I need to build this in a way that lets her easily filter on SOME_FIELD.
Does anyone have suggestions on how to accomplish this? Ideally I could give her a view to which she could supply a comma-separated list of values that SOME_FIELD must not have. Since people may have multiple rows in OTHER_TABLE, I can't just have her filter on that table directly. For example, someone may have SOME_FIELD = 'x' but also have a row in the table where SOME_FIELD = 's'. This person should be excluded because they have 'x', even though they also have 's'. That is why the inner select is necessary.
Thanks for your help.

Don't create queries for EXCEL users, they always break them and then you have to debug them. Instead, create a stored procedure, pass in a CSV. In the stored procedure split the CSV using a split function and join to it. The user will only have an EXCEL query like:
EXEC YourProcedure 'x,y,z'
As a result, they will not break the query.
To help with the split function, see "Arrays and Lists in SQL Server 2008 Using Table-Valued Parameters" by Erland Sommarskog. There are many ways to split a string in SQL Server; his articles on the topic cover the pros and cons of just about every method.
You need to create a split function. This is how a split function can be used:
SELECT
*
FROM YourTable y
INNER JOIN dbo.yourSplitFunction(@Parameter) s ON y.ID=s.Value
I prefer the number table approach to split a string in TSQL but there are numerous ways to split strings in SQL Server, see the previous link, which explains the PROs and CONs of each.
For the Numbers Table method to work, you need to do this one time table setup, which will create a table Numbers that contains rows from 1 to 10,000:
SELECT TOP 10000 IDENTITY(int,1,1) AS Number
INTO Numbers
FROM sys.objects s1
CROSS JOIN sys.objects s2
ALTER TABLE Numbers ADD CONSTRAINT PK_Numbers PRIMARY KEY CLUSTERED (Number)
Once the Numbers table is set up, create this split function:
CREATE FUNCTION [dbo].[FN_ListToTable]
(
@SplitOn char(1) --REQUIRED, the character to split the @List string on
,@List varchar(8000)--REQUIRED, the list to split apart
)
RETURNS TABLE
AS
RETURN
(
SELECT
ListValue
FROM (SELECT
LTRIM(RTRIM(SUBSTRING(List2, number+1, CHARINDEX(@SplitOn, List2, number+1)-number - 1))) AS ListValue
FROM (
SELECT @SplitOn + @List + @SplitOn AS List2
) AS dt
INNER JOIN Numbers n ON n.Number < LEN(dt.List2)
WHERE SUBSTRING(List2, number, 1) = @SplitOn
) dt2
WHERE ListValue IS NOT NULL AND ListValue!=''
);
GO
You can now easily split a CSV string into a table and join on it:
Create Procedure YourProcedure
@Filter VARCHAR(1000)
AS
SELECT
p.*
FROM PEOPLE p
LEFT OUTER JOIN (SELECT
o.ID
FROM OTHER_TABLE o
INNER JOIN (SELECT
ListValue
FROM dbo.FN_ListToTable(',',@Filter )
) f ON o.SOME_FIELD=f.ListValue
) x ON p.ID=x.ID
WHERE x.ID IS null
GO
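On SQL Server 2016 or later (database compatibility level 130 or higher), the built-in STRING_SPLIT function can stand in for a hand-written split function. The following is only a sketch under that assumption: it reuses the PEOPLE and OTHER_TABLE names from the question, and the procedure name YourProcedure_StringSplit is made up for illustration.

```sql
-- Hypothetical variant of YourProcedure; assumes SQL Server 2016+ (compatibility level 130).
CREATE PROCEDURE dbo.YourProcedure_StringSplit
    @Filter VARCHAR(1000)
AS
BEGIN
    SET NOCOUNT ON;

    -- Return every person who has no row in OTHER_TABLE matching any listed value.
    SELECT p.*
    FROM PEOPLE AS p
    WHERE NOT EXISTS (
        SELECT 1
        FROM OTHER_TABLE AS o
        INNER JOIN STRING_SPLIT(@Filter, ',') AS f
            ON o.SOME_FIELD = LTRIM(RTRIM(f.value))  -- trim stray spaces around list items
        WHERE o.ID = p.ID
    );
END
GO

-- The Excel query stays a one-liner:
EXEC dbo.YourProcedure_StringSplit 'x,y,z';
```

NOT EXISTS is used here in place of the LEFT OUTER JOIN ... IS NULL pattern; both express the same exclusion.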

How to use Join with like operator and then casting columns

I have 2 tables with these columns:
CREATE TABLE #temp
(
Phone_number varchar(100) -- example data: "2022033456"
)
CREATE TABLE orders
(
Addons ntext -- example data: "Enter phone:2022033456<br>Thephoneisvalid"
)
I have to join these two tables using LIKE, as the phone numbers are not in the same format. A little background: I am joining the #temp table on its phone number with the orders table on its Addons value. Then, in the WHERE condition, I am trying to match them again and get some results. Here is my code. My results are not accurate, as the query is not returning any data. I don't know what I am doing wrong. I am using SQL Server.
select
*
from
order_no as n
join
orders as o on n.order_no = o.order_no
join
#temp as t on t.phone_number like '%'+ cast(o.Addons as varchar(max))+'%'
where
t.phone_number = '%' + cast(o.Addons as varchar(max)) + '%'
LIKE is allowed in a JOIN condition, but it is slow and easy to get wrong. Please provide more information on your tables. A more robust approach is to convert one of the phone fields so that its format complies with the other, and then join on equality.
I think your join condition is in the wrong order. Because your question explicitly mentions two tables, let's stick with those:
select *
from orders o JOIN
#temp t
on cast(o.Addons as varchar(max)) like '%' + t.phone_number + '%';
It has been so long since I dealt with the text data type (in SQL Server), that I don't remember if the cast() is necessary or not.
Instead of trying to do everything in a single top-level query, you should apply a transformation projection to your orders table and use that as a subquery, which will make the query easier to understand.
Using the CHARINDEX function will make this a lot easier. However, it does not support ntext, so you will need to change your schema to use nvarchar(max) instead, which you should be doing anyway as ntext is deprecated. Alternatively, you can use CONVERT( nvarchar(max), someNTextValue ), though this will reduce performance as you won't be able to use any indexes on your ntext values (but this query will run slowly anyway).
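SELECT
orders2.*,
CASE WHEN orders2.PhoneStart > 0 AND orders2.PhoneEnd > 0 THEN
-- skip past the 'Enter phone:' label so only the digits are returned
SUBSTRING( orders2.AddonsText, orders2.PhoneStart + LEN('Enter phone:'), orders2.PhoneEnd - orders2.PhoneStart - LEN('Enter phone:') )
ELSE
NULL
END AS ExtractedPhoneNumber
FROM
(
SELECT
orders.*, -- never use `*` in production, so replace this with the actual columns in your orders table
CONVERT( nvarchar(max), Addons ) AS AddonsText, -- CHARINDEX does not accept ntext
CHARINDEX('Enter phone:', CONVERT( nvarchar(max), Addons )) AS PhoneStart,
CHARINDEX('<br>Thephoneisvalid', CONVERT( nvarchar(max), Addons ), CHARINDEX('Enter phone:', CONVERT( nvarchar(max), Addons )) ) AS PhoneEnd
FROM
orders
) AS orders2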
I suggest converting the above into a VIEW or CTE so you can directly query it in your JOIN expression:
CREATE VIEW ordersWithPhoneNumbers AS
-- copy and paste the above query here, then execute the batch to create the view, you only need to do this once.
Then you can use it like so:
SELECT
* -- again, avoid the use of the star selector in production use
FROM
ordersWithPhoneNumbers AS o2 -- this is the above query as a VIEW
INNER JOIN order_no ON o2.order_no = order_no.order_no
INNER JOIN #temp AS t ON o2.ExtractedPhoneNumber = t.phone_number
Actually, I take back my previous remark about performance - if you add an index to the ExtractedPhoneNumber column of the ordersWithPhoneNumbers view then you'll get good performance.

Check if a list of items already exists in a SQL database

I want to create a group of users only if the same group does not exist already in the database.
I have a GroupUser table with three columns: a primary key, a GroupId, and a UserId. A group of users is described as several lines in this table sharing a same GroupId.
Given a list of UserId, I would like to find a matching GroupId, if it exists.
What is the most efficient way to do that in SQL?
Let's say your UserId list is stored in a table called 'MyUserIDList'. The following query efficiently returns the GroupIds whose members all appear in your user list; to require an exact match, also compare GroupMemberCount with the number of rows in MyUserIDList. (SQL Server syntax)
Select GroupId
From (
Select GroupId
, count(*) as GroupMemberCount
, Sum(case when MyUserIDList.UserID is null then 0 else 1 End) as GroupMemberCountInMyList
from GroupUser
left outer join MyUserIDList on GroupUser.UserID=MyUserIDList.UserID
group by GroupId
) As MySubQuery
Where GroupMemberCount=GroupMemberCountInMyList
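If the group must contain exactly the listed users (no more, no fewer), one way to tighten the query above is to compare the group size with the size of the list as well. This is a sketch, assuming MyUserIDList holds no duplicate UserID values and each (GroupId, UserId) pair appears only once in GroupUser:

```sql
-- Assumes MyUserIDList(UserID) has no duplicates and
-- (GroupId, UserId) pairs are unique in GroupUser.
SELECT gu.GroupId
FROM GroupUser AS gu
LEFT OUTER JOIN MyUserIDList AS l
    ON gu.UserId = l.UserID
GROUP BY gu.GroupId
HAVING COUNT(*) = COUNT(l.UserID)                        -- every group member is in the list
   AND COUNT(*) = (SELECT COUNT(*) FROM MyUserIDList);   -- and the group is the same size as the list
```

COUNT(l.UserID) counts only the rows where the outer join found a match, so the first condition rejects groups that contain a user outside the list.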
There are a couple of ways of doing this. This answer is for SQL Server only (as you have not mentioned a database in your tags).
Pass the list of user IDs as a comma-separated string to a stored procedure, build a dynamic query from it inside the SP, and run it with the EXEC command. This link will guide you in this regard
Use a table-valued parameter in an SP. This is applicable to SQL Server 2008 and higher only.
The following link will help you get started.
http://www.codeproject.com/Articles/113458/TSQL-Passing-array-list-set-to-stored-procedure-MS
Hope this helps.
Another solution is to convert the input list into a table. This can be done with various approaches: unions, temporary tables, and others. A neat solution builds on the answer of user1461607 to another question here on SO, using a comma-separated string.
WITH split(word, csv) AS (
-- 'initial query' (see SQLite docs linked above)
SELECT
'', -- place holder for each word we are looking for
'Auto,A,1234444,' -- items you are looking for
-- make sure the list ends with a comma !!
UNION ALL SELECT
substr(csv, 0, instr(csv, ',')), -- each word contains text up to next ','
substr(csv, instr(csv, ',') + 1) -- next recursion parses csv after this ','
FROM split -- recurse
WHERE csv != '' -- break recursion once no more csv words exist
) SELECT word, exisiting_data
FROM split s
-- now join the key you want to check for existence!
-- for demonstration purpose, I use an outer join
LEFT OUTER JOIN (select 'A' as exisiting_data) as t on t.exisiting_data = s.word
WHERE s.word != '' -- make sure we clamp the empty strings from the split function
;
Results in:
Auto,null
A,A
1234444,null

Solution to avoid non-sargable argument in where clause

In the code_list CTE in this query I have a row constructor that will eventually take any number of arguments. The column icd in the patient_codes CTE is a five-digit identifier that is more descriptive than the three-digit codes in the row constructor. The table icd_patient has 100 million rows, so for performance's sake I would like to filter the rows of this table before I do any further work. I have
;with code_list(code_list)
as
(
select x.code_list
from (values ('70700'),('25002')) as x(code_list)
),patient_codes
as
(
select distinct icd,pat_id,id
from icd_patient
where icd in (select icd from code_list)
)
select distinct pat_id from patient_codes
The problem, however, is that in the icd_patient table all of the icd values are five digits and more descriptive. If I look at the execution plan of this query, it's pretty streamlined. If I do
;with code_list(code_list)
as
(
select x.code_list
from (values ('70700'),('25002')) as x(code_list)
),patient_codes
as
(
select substring(icd,1,3) as icd,pat_id
from icd_patient2
where substring(icd,1,3) in (select * from code_list)
)
select * from patient_codes
this of course has a large performance impact because of the substring expression in the where clause. Does something akin to "LIKE IN" exist so I can take advantage of my indexes?
Index on icd_patient
CREATE NONCLUSTERED INDEX [ix_icd_patient] ON [dbo].[icd_patient2]
(
[pat_id] ASC
)
INCLUDE ( [id],
This much simpler query should be better than (or, at worst, the same as) your existing query.
select pat_id
FROM dbo.icd_patient
where icd LIKE '707%'
OR icd LIKE '250%'
GROUP BY pat_id;
Note that sargability only matters if there is actually an index on this column.
An alternative (since OR can sometimes give the optimizer fits):
SELECT pat_id FROM
(
SELECT pat_id
FROM dbo.icd_patient
WHERE icd LIKE '707%'
UNION ALL
SELECT pat_id
FROM dbo.icd_patient
WHERE icd LIKE '250%'
) AS x
GROUP BY pat_id;
To make this extensible beyond a handful of OR conditions, I would use a table-valued parameter (TVP).
CREATE TYPE dbo.StringPatterns AS TABLE(s VARCHAR(3) PRIMARY KEY);
Then your stored procedure could say:
CREATE PROCEDURE dbo.whatever
@sp dbo.StringPatterns READONLY
AS
BEGIN
SET NOCOUNT ON;
SELECT p.pat_id
FROM dbo.icd_patient AS p
INNER JOIN @sp AS sp
ON p.icd LIKE sp.s + '%'
GROUP BY p.pat_id;
END
Then you can pass in your set of three-character substrings from a DataTable or other collection in C#. From T-SQL just as an example:
DECLARE @p dbo.StringPatterns;
INSERT @p VALUES('707'),('250');
EXEC dbo.whatever @sp = @p;
Something like "LIKE IN" does not exist. The following is sargable:
select *
from icd_patient
where icd like '70700%' or
icd like '25002%'
This works because LIKE with a constant initial substring is a special case for SQL Server. It does not work when the strings on the right are variables.
One solution is to create an indexed view on the icd_patient table with an index on the first five characters of the icd code.
Using "IN" makes that part of a command non-sargable on both sides. End of discussion.
Saying the problem is fixed by using substring completely changes what the query returns, and it remains non-sargable.
Any "fix" should return exactly the same results. The actual fix is one of the following: join to the CTE so that all five characters match; put three characters in the CTE and match on those in a join; or put four characters in the CTE, where the fourth is "%", and join using LIKE.
Using a "like" that starts with "%" increases the complexity of the search, but it would still use the index to find the value because parsing the index should use less reading by only getting the full table row when a search is successful.
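The "three characters in the CTE, joined with LIKE" option mentioned above could be sketched roughly as follows. It assumes the icd_patient table and icd column from the question; the column name prefix is invented for the example, and whether the optimizer turns the join into index seeks depends on an index actually existing on icd.

```sql
-- Three-character prefixes; the trailing % wildcard is appended in the join.
WITH code_list(prefix) AS
(
    SELECT x.prefix
    FROM (VALUES ('707'), ('250')) AS x(prefix)
)
SELECT DISTINCT p.pat_id
FROM code_list AS c
INNER JOIN dbo.icd_patient AS p
    ON p.icd LIKE c.prefix + '%';   -- the pattern has no leading wildcard
```

Because the pattern never starts with "%", each prefix can in principle be answered with a range lookup on icd instead of evaluating SUBSTRING on every row.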

SP to find keywords like a list or strings

In my mssql database I have a table containing articles(id, name, content) a table containing keywords(id, name) and a link table between articles and keywords ArticleKeywords(articleId, keywordID, count). Count is the number of occurrences of that keyword in the article.
How can I write an SP that takes a list of comma-separated strings and gives me the articles that have these keywords, ordered by the number of occurrences of the keywords in the article?
If an article contains more keywords I want to sum the occurrences of each keyword.
Thanks, Radu
Although it isn't completely clear to me what the source of your comma-separated string is, I think what you want is an SP that takes a string as input and produces the desired result:
CREATE PROC KeywordArticleSearch(@KeywordString NVARCHAR(MAX)) AS BEGIN...
The first step is to verticalize the comma-separated string into a table with the values in rows. This is a problem that has been extensively treated in this question and another question, so just look there and choose one of the options. Whichever way you choose, store the results in a table variable or temp table.
DECLARE @KeywordTable TABLE (Keyword NVARCHAR(128))
-- or alternatively...
CREATE TABLE #KeywordTable (Keyword NVARCHAR(128))
For lookup speed, it is even better to store the KeywordID instead so your query only has to find matching ID's:
DECLARE @KeywordIDTable TABLE (KeywordID INT)
INSERT INTO @KeywordIDTable
SELECT K.KeywordID FROM SplitFunctionResult S
-- INNER JOIN: keywords that are nonexistent are omitted
INNER JOIN Keywords K ON S.Keyword = K.Keyword
Next, you can go about writing your query. This would be something like:
SELECT articleId, SUM(count)
FROM ArticleKeywords AK
WHERE AK.KeywordID IN (SELECT KeywordID FROM @KeywordIDTable)
GROUP BY articleID
Or instead of the WHERE you could use an INNER JOIN. I don't think the query plan would be much different.
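For comparison, the INNER JOIN form might look like the sketch below. The table variable is declared and filled with made-up IDs only so the example stands alone; the ORDER BY is added to match the ordering the question asks for.

```sql
-- Table variable holding the IDs of the requested keywords (sample values are made up).
DECLARE @KeywordIDTable TABLE (KeywordID INT PRIMARY KEY);
INSERT INTO @KeywordIDTable (KeywordID) VALUES (1), (2), (3);

SELECT AK.articleId,
       SUM(AK.[count]) AS TotalOccurrences   -- summed across all matching keywords
FROM ArticleKeywords AS AK
INNER JOIN @KeywordIDTable AS KT
    ON AK.keywordID = KT.KeywordID
GROUP BY AK.articleId
ORDER BY TotalOccurrences DESC;
```

The PRIMARY KEY on the table variable keeps duplicate keyword IDs out, so no occurrence is counted twice.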
For the sake of argument, let's say you want to look up all articles containing the keywords Foo, Bar and Shazam.
ALTER PROCEDURE spArticlesFromKeywordList
@KeyWords varchar(1000) = 'Foo,Bar,Shazam'
AS
SET NOCOUNT ON
DECLARE @KeyWordInClause varchar(1000)
SET @KeyWordInClause = REPLACE (@KeyWords ,',',''',''')
EXEC(
'
SELECT
t1.Name as ArticleName,
t2.Name as KeyWordName,
t3.Count as [COUNT]
FROM ArticleKeywords t3
INNER JOIN Articles t1 on t3.ArticleId = t1.Id
INNER JOIN Keywords t2 on t3.KeywordId = t2.Id
WHERE t2.Name in ( ''' + @KeyWordInClause + ''')
ORDER BY
3 DESC, 1
'
)
SET NOCOUNT OFF
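Because the procedure above concatenates the caller's string into dynamic SQL, a keyword containing a single quote can break or hijack the statement. A sketch of the same lookup without dynamic SQL is shown below; it assumes SQL Server 2016+ for STRING_SPLIT, the Articles(Id, Name), Keywords(Id, Name) and ArticleKeywords tables from the question, and a hypothetical procedure name.

```sql
-- Hypothetical name; assumes SQL Server 2016+ (compatibility level 130) for STRING_SPLIT.
CREATE PROCEDURE dbo.spArticlesFromKeywordList_NoDynamicSql
    @KeyWords VARCHAR(1000) = 'Foo,Bar,Shazam'
AS
BEGIN
    SET NOCOUNT ON;

    SELECT a.Name AS ArticleName,
           SUM(ak.[Count]) AS TotalCount          -- occurrences summed over all matched keywords
    FROM ArticleKeywords AS ak
    INNER JOIN Articles AS a ON ak.ArticleId = a.Id
    INNER JOIN Keywords AS k ON ak.KeywordId = k.Id
    INNER JOIN STRING_SPLIT(@KeyWords, ',') AS s
        ON k.Name = LTRIM(RTRIM(s.value))         -- the list is treated as data, never as SQL text
    GROUP BY a.Id, a.Name
    ORDER BY TotalCount DESC, a.Name;
END
```

The keyword list only ever appears as a parameter value, so its contents cannot change the shape of the query.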
I think I understand what you are after, so here goes. I'm not sure what language you are using, but in PHP (going by your description) I would query ArticleKeywords with an ORDER BY count DESC clause (i.e. the highest comes first). Obviously you can select by keywordID or articleId. In very simple terms (there may be much better approaches than mine), you can return the array but build a string from it, a bit like this:
$arraytostring .= $row->keywordID.',';
If you left join the tables you could create something like this:
$arraytostring .= $row->keywordID.'-'.$row->name.' '.$row->content.',';
Or you could catch the array as
$array[] = $row->keywordID;
and create your string outside the loop.
Note: you have 2 fields called "name" one in articles and one in keywords it would be easier to rename one of them to avoid any conflicts (that is assuming they are not the same content) i.e. articles name = title and keywords name= keyword

Classic ASP - Split and Contains in a simple SQL query

I've the following sql query:
SQL = "SELECT * FROM Product WHERE ProductCategoryId = " & Request.QueryString("CategoryId")
This query gets all the products from ONE category. I want products to be able to belong to several categories, not just one.
So, I changed the [Product].ProductCategoryId field to varchar(50) and wrote several category IDs into it, separated by commas, for example:
1,5,7,4
Now, how can I get this working in the SQL query? In ASP there are the Split and Contains (http://www.devx.com/vb2themax/Tip/18364) functions, but how can I do this in SQL?
Thanks, and sorry for English...
Obligatory note: NEVER concatenate user-supplied data into a SQL query; you are opening yourself up to SQL injection attacks.
My previous answer (using "IN") was based on a mistaken idea that you were collecting multiple IDs from the user, and searching against a column with one value. But you're doing the opposite -- the column has multiple values, and you're trying to match against one value coming from the user.
This is a bad way to do it. Don't use comma-separated values; add a new table with a one-to-many relationship. Then the query will look like
SQL = "SELECT Product.* FROM Product, ProductCategory WHERE ProductCategory.ProductId=Product.ID AND ProductCategory.CategoryId = " & Request.QueryString("CategoryId")
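A rough T-SQL sketch of that one-to-many table and the matching query follows. The column types are assumptions (the question does not show the Product table definition), and @CategoryId stands in for the value that would arrive from the query string as a parameter.

```sql
-- Hypothetical junction table: one row per (product, category) pair.
CREATE TABLE ProductCategory
(
    ProductId  INT NOT NULL,
    CategoryId INT NOT NULL,
    CONSTRAINT PK_ProductCategory PRIMARY KEY (ProductId, CategoryId)
);
GO

-- @CategoryId represents the user-supplied value; pass it as a parameter
-- rather than concatenating it into the SQL text.
DECLARE @CategoryId INT = 5;

SELECT p.*
FROM Product AS p
INNER JOIN ProductCategory AS pc
    ON pc.ProductId = p.ID
WHERE pc.CategoryId = @CategoryId;
```

The composite primary key prevents the same product from being assigned to the same category twice.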
If you really really want to do the comma-separated thing, you could do a search using LIKE and % wildcards, but it will be fragile (e.g. you'll have to make sure that an ID of "2" doesn't match "12").
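If the comma-separated column has to stay, one common way to keep "2" from matching "12" is to wrap both the column and the search value in delimiters before comparing. This sketch assumes the stored lists contain no spaces and no leading or trailing commas; @CategoryId is a stand-in for the user-supplied value.

```sql
-- @CategoryId stands in for the value taken from the query string.
DECLARE @CategoryId VARCHAR(10) = '2';

SELECT *
FROM Product
-- ',1,5,7,4,' LIKE '%,2,%' is false, and ',12,5,' LIKE '%,2,%' is also false,
-- while ',1,2,7,' LIKE '%,2,%' is true.
WHERE ',' + ProductCategoryId + ',' LIKE '%,' + @CategoryId + ',%';
```

The leading wildcard means this predicate cannot use an index seek, which is one more reason to prefer a separate one-to-many table.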
Not for nothing, but WHERE ProductCategoryId = " & Request.QueryString("CategoryId") is a GOD AWFUL way of doing your query.
You've just opened yourself up to crazy SQL injection attacks.
http://www.sommarskog.se/arrays-in-sql.html
Looking past all injection issues, this is the best source for TSQL list manipulation. Without reading the entire article, here is what is needed to create a very fast loop free TSQL method to split strings.
You will end up with a TSQL split function, and can use it like:
SELECT
y.*
FROM YourTable y
INNER JOIN dbo.FN_ListToTable(',','1,2,3,444,5,,,6') s ON y.ID=s.ListValue
From the previous article, I prefer the number table approach, and this is what you need to do to implement it.
For this method to work, you need to do this one time table setup:
SELECT TOP 10000 IDENTITY(int,1,1) AS Number
INTO Numbers
FROM sys.objects s1
CROSS JOIN sys.objects s2
ALTER TABLE Numbers ADD CONSTRAINT PK_Numbers PRIMARY KEY CLUSTERED (Number)
Once the Numbers table is set up, create this function:
CREATE FUNCTION [dbo].[FN_ListToTable]
(
@SplitOn char(1) --REQUIRED, the character to split the @List string on
,@List varchar(8000)--REQUIRED, the list to split apart
)
RETURNS TABLE
AS
RETURN
( ----------------
--SINGLE QUERY-- --this will not return empty rows
----------------
SELECT
ListValue
FROM (SELECT
LTRIM(RTRIM(SUBSTRING(List2, number+1, CHARINDEX(@SplitOn, List2, number+1)-number - 1))) AS ListValue
FROM (
SELECT @SplitOn + @List + @SplitOn AS List2
) AS dt
INNER JOIN Numbers n ON n.Number < LEN(dt.List2)
WHERE SUBSTRING(List2, number, 1) = @SplitOn
) dt2
WHERE ListValue IS NOT NULL AND ListValue!=''
);
GO
You can now easily split a CSV string into a table and join on it:
select * from dbo.FN_ListToTable(',','1,2,3,,,4,5,6777,,,')
OUTPUT:
ListValue
-----------------------
1
2
3
4
5
6777
(6 row(s) affected)
You can pass a CSV string into a procedure and process only the rows for the given IDs, or just use it in a query like:
SELECT
y.*
FROM YourTable y
INNER JOIN dbo.FN_ListToTable(',',@GivenCSV) s ON y.ID=s.ListValue