Check if a list of items already exists in a SQL database - sql

I want to create a group of users only if the same group does not exist already in the database.
I have a GroupUser table with three columns: a primary key, a GroupId, and a UserId. A group of users is described as several rows in this table sharing the same GroupId.
Given a list of UserId, I would like to find a matching GroupId, if it exists.
What is the most efficient way to do that in SQL?

Let's say your UserId list is stored in a table called 'MyUserIDList'. The following query returns the GroupId values whose members are exactly your user list (SQL Server syntax). It assumes each user appears at most once per group and at most once in the list.
Select GroupId
From (
    Select GroupId
    , count(*) as GroupMemberCount
    , Sum(case when MyUserIDList.UserID is null then 0 else 1 End) as GroupMemberCountInMyList
    from GroupUser
    left outer join MyUserIDList on GroupUser.UserID=MyUserIDList.UserID
    group by GroupId
) As MySubQuery
-- every member of the group is in the list
Where GroupMemberCount=GroupMemberCountInMyList
-- and the group is as large as the list, so no listed user is missing
And GroupMemberCount=(Select count(*) From MyUserIDList)
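The exact-match check can be tried without a server. The following is a minimal sketch that runs an equivalent query against an in-memory SQLite database from Python. The helper name `find_matching_groups` and the sample rows are assumptions made for illustration; the table and column names follow the question.

```python
import sqlite3

def find_matching_groups(conn, user_ids):
    """Return the GroupIds whose member set equals user_ids exactly."""
    wanted = sorted(set(user_ids))
    # Load the candidate list into a temp table so it can be joined.
    conn.execute("CREATE TEMP TABLE IF NOT EXISTS MyUserIDList (UserId INTEGER PRIMARY KEY)")
    conn.execute("DELETE FROM MyUserIDList")
    conn.executemany("INSERT INTO MyUserIDList (UserId) VALUES (?)", [(u,) for u in wanted])
    rows = conn.execute("""
        SELECT g.GroupId
        FROM GroupUser AS g
        LEFT JOIN MyUserIDList AS l ON l.UserId = g.UserId
        GROUP BY g.GroupId
        -- COUNT(l.UserId) ignores NULLs, so it counts only members found in the list
        HAVING COUNT(*) = COUNT(l.UserId)
           AND COUNT(*) = (SELECT COUNT(*) FROM MyUserIDList)
        ORDER BY g.GroupId
    """).fetchall()
    return [r[0] for r in rows]

# Hypothetical sample data: group 1 = {10, 20}, group 2 = {10, 20, 30}, group 3 = {10}
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE GroupUser (Id INTEGER PRIMARY KEY, GroupId INTEGER, UserId INTEGER)")
conn.executemany(
    "INSERT INTO GroupUser (GroupId, UserId) VALUES (?, ?)",
    [(1, 10), (1, 20), (2, 10), (2, 20), (2, 30), (3, 10)],
)
```

With this data, the list [10, 20] matches only group 1: group 2 has an extra member and group 3 is missing user 20. The second HAVING condition is what rules out group 3.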

There are a couple of ways of doing this. This answer is for SQL Server only (you have not mentioned a DBMS in your tags).
Pass the list of user IDs as a comma-separated string to a stored procedure, build a dynamic query from it inside the SP, and run that query with the EXEC command. This link will guide you in this regard
Use a table-valued parameter in an SP. This is applicable to SQL Server 2008 and higher only.
The following link will help you get started.
http://www.codeproject.com/Articles/113458/TSQL-Passing-array-list-set-to-stored-procedure-MS
Hope this helps.

One other solution is to convert the input list into a table. This can be done with various approaches: unions, temporary tables and others. A neat solution builds on the answer of user1461607 for another question here on SO, using a comma-separated string.
WITH split(word, csv) AS (
-- 'initial query' (see SQLite docs linked above)
SELECT
'', -- place holder for each word we are looking for
'Auto,A,1234444,' -- items you are looking for
-- make sure the list ends with a comma !!
UNION ALL SELECT
substr(csv, 0, instr(csv, ',')), -- each word contains text up to next ','
substr(csv, instr(csv, ',') + 1) -- next recursion parses csv after this ','
FROM split -- recurse
WHERE csv != '' -- break recursion once no more csv words exist
) SELECT word, exisiting_data
FROM split s
-- now join the key you want to check for existence!
-- for demonstration purpose, I use an outer join
LEFT OUTER JOIN (select 'A' as exisiting_data) as t on t.exisiting_data = s.word
WHERE s.word != '' -- make sure we clamp the empty strings from the split function
;
Results in:
Auto,null
A,A
1234444,null

Related

Compare comma separated list with individual row in table

I have to compare comma separated values with a column in the table and find out which values are not in database. [kind of master data validation]. Please have a look at the sample data below:
table data in database:
id name
1 abc
2 def
3 ghi
SQL part :
Here I am getting a comma-separated list like ('abc','def','ghi','xyz').
Now xyz is an invalid value, so I want to take that value and return it as output saying "invalid value".
It is possible if I split those values, put them in a temp table, loop through each value and compare one by one.
But is there a more optimal way to do this?
I'm not sure if I got the question right; however, I would personally try to get to something like this:
SELECT
    D.id,
    CASE
        WHEN B.Name IS NULL THEN D.name
        ELSE 'invalid value'
    END
FROM
    data AS D
    LEFT JOIN badNames B ON B.Name = D.Name
-- LEFT JOIN keeps every row of data; B.Name is NULL when the name is not in badNames
-- with a case-insensitive collation, the equal sign should work
There is one table with bad names, or invalid values if you prefer. This can be a temporary table as well, depending on usage (black-listed words should be a permanent table; ad hoc invalid values provided by a service should be a temp table, etc.).
NOTE: The select above can be nested in a view, so the data remain as they were, yet you gain the correctness information. Otherwise I would create a cursor inside a function that would go through the select like the one above and alter the original data, if that is the goal...
It sounds like you just need a NOT EXISTS / LEFT JOIN, as in:
SELECT tmp.InvalidValue
FROM dbo.HopeThisIsNotAWhileBasedSplit(@CSVlist) tmp
WHERE NOT EXISTS (
SELECT *
FROM dbo.Table tbl
WHERE tbl.Field = tmp.InvalidValue
);
Of course, depending on the size of the CSV list coming in, the number of rows in the table you are checking, and the style of splitter you are using, it might be better to dump the CSV to a temp table first (as you mentioned doing in the question).
Try the following query:
SELECT SplitedValues.name,
CASE WHEN YourTable.Id IS NULL THEN 'invalid value' ELSE NULL END AS Result
FROM SplitedValues
LEFT JOIN yourTable ON SplitedValues.name = YourTable.name
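The NOT EXISTS approach from the answers above can be sketched end to end as follows, using Python with an in-memory SQLite database. The table name `MasterData`, the temp table `Candidate` and the helper `find_invalid_values` are hypothetical names chosen for this example; the split is done in Python rather than by a SQL split function.

```python
import sqlite3

def find_invalid_values(conn, csv_list):
    """Return the values from csv_list that have no matching name in MasterData."""
    values = [v.strip() for v in csv_list.split(",") if v.strip()]
    # Stage the split values in a temp table so a set-based query can check them.
    conn.execute("CREATE TEMP TABLE IF NOT EXISTS Candidate (name TEXT)")
    conn.execute("DELETE FROM Candidate")
    conn.executemany("INSERT INTO Candidate (name) VALUES (?)", [(v,) for v in values])
    rows = conn.execute("""
        SELECT c.name
        FROM Candidate AS c
        WHERE NOT EXISTS (SELECT 1 FROM MasterData AS m WHERE m.name = c.name)
        ORDER BY c.name
    """).fetchall()
    return [r[0] for r in rows]

# Sample rows taken from the question
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE MasterData (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO MasterData (id, name) VALUES (?, ?)",
                 [(1, "abc"), (2, "def"), (3, "ghi")])
```

No loop over individual values is needed on the database side: a single query compares the whole staged list against the table.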

Combine Unique Column Values Into One to Avoid Duplicates

For simplicity, assume I have two tables joined by account#. The second table has two columns, id and comment. Each account could have one or more comments and each unique comment has a unique id.
I need to write a T-SQL query to generate one row for each account, which I assume means I need to combine as many comments as might exist for each account. This assumes the result set will only show the account# once. Simple?
SQL Server is an RDBMS best tuned for storing and retrieving data. You can retrieve the desired data with one very simple query, but the desired format should be handled by one of the available reporting tools, such as SSRS or Crystal Reports.
Your query will be a simple inner join, something like this:
SELECT A.Account , B.Comment
FROM TableA AS A INNER JOIN TableB AS B
ON A.Account = B.Account
Now you can use your reporting tool to Group all the Comments by Account when Displaying data.
I do agree with M. Ali, but if you don't have that option, the following will work.
SELECT [accountID]
, [name]
, (SELECT CAST(Comment + ', ' AS VARCHAR(MAX))
FROM [comments]
WHERE (accountID = accounts.accountID)
FOR XML PATH ('')
) AS Comments
FROM accounts
SQL Fiddle
In my actual project I have this exact situation.
What you need is a solution to aggregate the comments in order to show only one line per account#.
I solve it by creating a function to concatenate the comments, like this:
create function dbo.aggregateComments( @accountId integer, @separator varchar( 5 ) )
returns varchar( max )
as
begin
    declare @comments varchar( max ); set @comments = '';
    -- append each comment of the account, prefixed by the separator
    select @comments = @comments + @separator + YourCommentsTableName.CommentColumn
    from dbo.YourCommentsTableName
    where YourCommentsTableName.AccountId = @accountId;
    return @comments;
end;
You can use it in your query this way:
select account#, dbo.aggregateComments( account#, ',' )
from dbo.YourAccountTableName
Creating a function gives you a common place to retrieve your comments. It's a good programming practice.
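For comparison, the "one row per account" result can be sketched with an aggregate function instead of FOR XML PATH or a scalar function. The example below uses SQLite's built-in `group_concat`, run from Python; this is a different engine and a swapped-in technique, not the T-SQL shown above. The sample rows and the helper name `comments_per_account` are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE accounts (accountID INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE comments (id INTEGER PRIMARY KEY, accountID INTEGER, Comment TEXT);
    INSERT INTO accounts VALUES (1, 'Acme'), (2, 'Globex');
    INSERT INTO comments (accountID, Comment) VALUES
        (1, 'late payment'), (1, 'called back'), (2, 'new contact');
""")

def comments_per_account(conn):
    """Return {accountID: 'comment, comment, ...'} with one entry per account."""
    rows = conn.execute("""
        SELECT a.accountID, group_concat(c.Comment, ', ') AS Comments
        FROM accounts AS a
        LEFT JOIN comments AS c ON c.accountID = a.accountID
        GROUP BY a.accountID
        ORDER BY a.accountID
    """).fetchall()
    return dict(rows)

result = comments_per_account(conn)
```

SQLite does not guarantee the order of the concatenated items, so code that depends on a particular comment order should sort separately.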

SQL mass string manipulation

I'm working with an oracle DB and need to manipulate a string column within it. The column contains multiple email addresses in this format:
jgooooll@gmail.com;dhookep@gmail.com;amoore@outlook.com
What I want to do is take out anything that does not have '@gmail.com' at the end (in this example amoore@outlook.com would be removed). However, amoore@outlook.com may be the first email in the next row of the column, so there is no real fixed format; the only rule is that each address is separated by a semicolon.
Is there any way of implementing this through one command that runs through every row in the column and removes anything that's not @gmail.com? I'm not really sure if this kind of processing is possible in SQL. Just looking for your thoughts!
Thanks a lot you guys. Look forward to hearing from you!
Applicable to Oracle 11g (11.2) onward only, because the listagg function is supported only from 11.2 onward. If you are using 10.1 up to 11.1, you can write your own string aggregate function or take this one.
with T1 as (
    select 1 id, 'jhd@jk.com;jgooooll@gmail.com;dhookep@gmail.com;amoore@outlook.com' emails from dual union all
    select 2 id, 'jhd@jk.com;jgooooll@gmail.com;dhookep@gmail.com;amoore@outlook.com' emails from dual
)
select id
     , listagg(email, ';') within group(order by email) emails
from (select id
           -- rn-th address of the row, or NULL when the row has fewer addresses
           , regexp_substr(emails,'[^;]+', 1, rn) email
      from t1
      -- one row number per possible position, up to the longest list
      cross join (select rownum rn
                  from (select max(regexp_count(emails, '[^;]+')) ml
                        from t1
                       )
                  connect by level <= ml
                 )
     )
where email like '%@gmail.com'
group by id
Id Emails
--------------------------------------
1 dhookep@gmail.com;jgooooll@gmail.com
2 dhookep@gmail.com;jgooooll@gmail.com
Here is a Demo
This answer is actually for SQL Server, as that is what I know. That being said, perhaps having an example of how to do it in one system will give you an idea of how to do it in yours. Or maybe there is a way to convert the code into the same type of thing in Oracle.
First, the thought process: In SQL Server combining the FOR XML PATH and STUFF functionality allows you to make a comma separated list. I'm adding a WHERE Split.SplitValue LIKE ... clause into this to filter it to only gmail addresses. I'm cross applying this whole thing to the main table, and that turns it into a filtered email list. You could then further filter the main table to run this on a more targeted set of rows.
Second, the SQL Server implementation:
SELECT
*
FROM #Table Base
CROSS APPLY
(
SELECT
STUFF(
(SELECT
';' + Split.SplitValue AS [text()]
FROM dbo.fUtility_Split(Base.Emails, ';') Split
WHERE Split.SplitValue LIKE '%@gmail.com'
FOR XML PATH (''))
, 1, 1, '') Emails
) FilteredEmails
EDIT: I forgot to mention that this answer requires you have some sort of function to split a string column based on a separator value. If you don't have that already, then google for it. There are tons of examples.
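Both answers follow the same three steps: split each row on the separator, keep only the matching addresses, and join what is left. As a language-neutral illustration of that logic, here is a small Python sketch. The function name `keep_domain` is hypothetical, and it assumes addresses are written in the usual `name@domain` form.

```python
def keep_domain(emails, domain="@gmail.com", sep=";"):
    """Keep only the addresses in a separator-delimited string that end with domain."""
    parts = [p.strip() for p in emails.split(sep)]
    # Compare case-insensitively, since domain names are not case sensitive.
    kept = [p for p in parts if p.lower().endswith(domain)]
    return sep.join(kept)

row = "jgooooll@gmail.com;dhookep@gmail.com;amoore@outlook.com"
filtered = keep_domain(row)
```

Applied to every row, this gives the same result as the SQL versions, apart from ordering: the sketch keeps the original order of the addresses.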

check whether an array values a subset of a query?

I've a set of rows
SELECT id from Users WHERE...
1
2
6
8
9
and I have an array with the values 2,3,6.
How can I check in SQL that the array is a subset of the result of the query?
SQL doesn't as such support arrays so I'm not entirely sure how you're storing your array of numbers, and that will affect the best way to answer this question.
That said, I'd do this:
SELECT u.id
FROM Users U
RIGHT JOIN Numbers N
ON U.id=N.Number
WHERE N.Number IN (2,3,6)
That's the basic query; exact details from there depend on what you'd be doing to detect the failure. Any records where u.ID IS NULL indicate it isn't a subset. If you don't actually immediately want the set of IDs you could modify it to
SELECT COUNT(*) AS Missing
FROM Users U
RIGHT JOIN Numbers N
ON U.id=N.Number
WHERE N.Number IN (2,3,6)
AND u.id IS NULL
and, whenever Missing was > 0 you'd know you didn't have a subset. (In SQL Server at least you can then cast the int to a bit to get 0=false, !0=true if that's easier for your app to work with.)
Other details we can add with more info about what you're actually trying to do, but hopefully that makes sense as a basic technique.
(N.B. this all assumes that you've got a numbers / tally table in your database. They're incredibly useful so, if you haven't already, I'd get one set up.)
You have to check each record/item individually, then count them.
If the JOIN is the same size as the array, the array is a sub-set of the table.
Here is an example that assumes your array is in a table...
SELECT
COUNT(*)
FROM
Users
INNER JOIN
search
ON search.id = Users.id
HAVING
COUNT(*) = (SELECT COUNT(*) FROM search)
Use Dynamic SQL:
declare @cmd varchar(200)
select @cmd = 'select id from Users WHERE id in (' + @array + ')'
exec(@cmd)
If you can populate a one column table with the values that you need to test against then you could do this.
Select count(*)
From
(
Select id
From users
Intersect
Select id
From testValues
) test
If the count is equal to the number of values you're testing against then the array forms a subset.
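The "find the missing values" idea from the answers above can be sketched as follows with Python and an in-memory SQLite database. The Users rows come from the question; the helper names `missing_ids` and `is_subset` are assumptions for illustration, and the array is staged in the `testValues` table mentioned above.

```python
import sqlite3

def missing_ids(conn, candidate_ids):
    """Return the candidate ids that do not appear in Users."""
    conn.execute("CREATE TEMP TABLE IF NOT EXISTS testValues (id INTEGER PRIMARY KEY)")
    conn.execute("DELETE FROM testValues")
    conn.executemany("INSERT INTO testValues (id) VALUES (?)",
                     [(i,) for i in set(candidate_ids)])
    rows = conn.execute("""
        SELECT t.id
        FROM testValues AS t
        LEFT JOIN Users AS u ON u.id = t.id
        WHERE u.id IS NULL          -- no matching user: the value is missing
        ORDER BY t.id
    """).fetchall()
    return [r[0] for r in rows]

def is_subset(conn, candidate_ids):
    """The array is a subset exactly when nothing is missing."""
    return not missing_ids(conn, candidate_ids)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Users (id INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO Users (id) VALUES (?)", [(1,), (2,), (6,), (8,), (9,)])
```

For the values in the question (2, 3, 6) the check fails because 3 is not among the user ids. Returning the missing values rather than only a count also tells the caller why the check failed.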

SP to find keywords like a list or strings

In my mssql database I have a table containing articles(id, name, content) a table containing keywords(id, name) and a link table between articles and keywords ArticleKeywords(articleId, keywordID, count). Count is the number of occurrences of that keyword in the article.
How can I write an SP that gets a list of comma-separated strings and gives me the articles that have these keywords, ordered by the number of occurrences of the keywords in the article?
If an article contains more keywords I want to sum the occurrences of each keyword.
Thanks, Radu
Although it isn't completely clear to me what the source of your comma-separated string is, I think what you want is an SP that takes a string as input and produces the desired result:
CREATE PROC KeywordArticleSearch(@KeywordString NVARCHAR(MAX)) AS BEGIN...
The first step is to verticalize the comma-separated string into a table with the values in rows. This is a problem that has been extensively treated in this question and another question, so just look there and choose one of the options. Whichever way you choose, store the results in a table variable or temp table.
DECLARE @KeywordTable TABLE (Keyword NVARCHAR(128))
-- or alternatively...
CREATE TABLE #KeywordTable (Keyword NVARCHAR(128))
For lookup speed, it is even better to store the KeywordID instead, so your query only has to find matching IDs:
DECLARE @KeywordIDTable TABLE (KeywordID INT)
INSERT INTO @KeywordIDTable
SELECT K.KeywordID FROM SplitFunctionResult S
-- INNER JOIN: keywords that are nonexistent are omitted
INNER JOIN Keywords K ON S.Keyword = K.Keyword
Next, you can go about writing your query. This would be something like:
SELECT AK.articleId, SUM(AK.[count]) AS TotalCount
FROM ArticleKeywords AK
WHERE AK.keywordID IN (SELECT KeywordID FROM @KeywordIDTable)
GROUP BY AK.articleId
ORDER BY TotalCount DESC
Or instead of the WHERE you could use an INNER JOIN. I don't think the query plan would be much different.
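The whole flow (split the string, match keywords, sum the counts per article, order by the total) can be sketched as follows with Python and an in-memory SQLite database. The table layout follows the question; the sample rows and the helper name `search_articles` are assumptions for illustration. The keywords are passed as query parameters rather than concatenated into the SQL text.

```python
import sqlite3

def search_articles(conn, keyword_csv):
    """Return (article name, total occurrences) for the given keywords, highest first."""
    keywords = [k.strip() for k in keyword_csv.split(",") if k.strip()]
    if not keywords:
        return []
    # One "?" placeholder per keyword, e.g. "?,?,?"
    placeholders = ",".join("?" * len(keywords))
    sql = f"""
        SELECT a.name, SUM(ak."count") AS total
        FROM ArticleKeywords AS ak
        JOIN Keywords AS k ON k.id = ak.keywordID
        JOIN Articles AS a ON a.id = ak.articleId
        WHERE k.name IN ({placeholders})
        GROUP BY a.id, a.name
        ORDER BY total DESC, a.name
    """
    return conn.execute(sql, keywords).fetchall()

# Hypothetical sample data
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Articles (id INTEGER PRIMARY KEY, name TEXT, content TEXT);
    CREATE TABLE Keywords (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE ArticleKeywords (articleId INTEGER, keywordID INTEGER, "count" INTEGER);
    INSERT INTO Articles (id, name) VALUES (1, 'Intro to SQL'), (2, 'Joins'), (3, 'Indexes');
    INSERT INTO Keywords (id, name) VALUES (1, 'Foo'), (2, 'Bar'), (3, 'Shazam');
    INSERT INTO ArticleKeywords VALUES (1, 1, 3), (1, 2, 2), (2, 1, 4), (3, 3, 1);
""")
```

Searching for "Foo,Bar" sums 3 + 2 for the first article and 4 for the second, so the first article is listed first.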
For the sake of argument, let's say you want to look up all articles containing the keywords Foo, Bar and Shazam.
ALTER PROCEDURE spArticlesFromKeywordList
@KeyWords varchar(1000) = 'Foo,Bar,Shazam'
AS
SET NOCOUNT ON
DECLARE @KeyWordInClause varchar(1000)
-- turn Foo,Bar,Shazam into Foo','Bar','Shazam; the outer quotes are added below
SET @KeyWordInClause = REPLACE (@KeyWords ,',',''',''')
EXEC(
'
SELECT
t1.Name as ArticleName,
t2.Name as KeyWordName,
t3.Count as [COUNT]
FROM ArticleKeywords t3
INNER JOIN Articles t1 on t3.ArticleId = t1.Id
INNER JOIN Keywords t2 on t3.KeywordId = t2.Id
WHERE t2.Name in ( ''' + @KeyWordInClause + ''')
ORDER BY
3 DESC, 1
'
)
SET NOCOUNT OFF
I think I understand what you are after, so here goes (not sure what language you are using). In PHP, from your description, I would query ArticleKeywords with an ORDER BY count DESC clause (i.e. the highest comes first); obviously you can select by keywordID or articleId. In very simple terms (there may be better approaches), you can return the array but create a string from it, a bit like this:
$arraytostring .= $row->keywordID.',';
If you left join the tables you could create something like this:
$arraytostring .= $row->keywordID.'-'.$row->name.' '.$row->content.',';
Or you could catch the array as
$array[] = $row->keywordID;
and create your string outside the loop.
Note: you have two fields called "name", one in articles and one in keywords. It would be easier to rename one of them to avoid conflicts (assuming they do not hold the same content), e.g. articles name = title and keywords name = keyword.