SQL query that gives distinct results that match multiple columns - sql

Sorry, I couldn't provide a better title for my problem as I am quite new to SQL.
I am looking for a SQL query string that solves the below problem.
Let's assume the following table:
DOCUMENT_ID | TAG
----------------------------
1 | tag1
1 | tag2
1 | tag3
2 | tag2
3 | tag1
3 | tag2
4 | tag1
5 | tag3
Now I want to select all distinct document IDs that carry every one of a specified set of tags (one or more tags may be specified, and a document must have all of them).
For example:
Selecting all document IDs with tag1 and tag2 would return 1 and 3 (but not 4, for example, as it doesn't have tag2).
What would be the best way to do that?
Regards,
Kai

SELECT document_id
FROM table
WHERE tag = 'tag1' OR tag = 'tag2'
GROUP BY document_id
HAVING COUNT(DISTINCT tag) = 2
Edit:
Updated for lack of constraints...

This assumes DocumentID and Tag are the Primary Key.
Edit: Changed HAVING clause to count DISTINCT tags. That way it doesn't matter what the primary key is.
Test Data
-- Populate Test Data
CREATE TABLE #table (
DocumentID varchar(8) NOT NULL,
Tag varchar(8) NOT NULL
)
INSERT INTO #table VALUES ('1','tag1')
INSERT INTO #table VALUES ('1','tag2')
INSERT INTO #table VALUES ('1','tag3')
INSERT INTO #table VALUES ('2','tag2')
INSERT INTO #table VALUES ('3','tag1')
INSERT INTO #table VALUES ('3','tag2')
INSERT INTO #table VALUES ('4','tag1')
INSERT INTO #table VALUES ('5','tag3')
INSERT INTO #table VALUES ('3','tag2') -- Edit: test duplicate tags
Query
-- Return Results
SELECT DocumentID FROM #table
WHERE Tag IN ('tag1','tag2')
GROUP BY DocumentID
HAVING COUNT(DISTINCT Tag) = 2
Results
DocumentID
----------
1
3
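To see why the HAVING clause counts DISTINCT tags, here is a minimal sketch that replays the test data above through Python's sqlite3 module. SQLite stands in for SQL Server here, and the table name doc_tags is an assumption made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE doc_tags (document_id INTEGER NOT NULL, tag TEXT NOT NULL)")
conn.executemany(
    "INSERT INTO doc_tags VALUES (?, ?)",
    [(1, "tag1"), (1, "tag2"), (1, "tag3"), (2, "tag2"),
     (3, "tag1"), (3, "tag2"), (4, "tag1"), (5, "tag3"),
     (3, "tag2")],  # last row repeats (3, 'tag2'), as in the test data above
)

QUERY = """
SELECT document_id
FROM doc_tags
WHERE tag IN ('tag1', 'tag2')
GROUP BY document_id
HAVING {count_expr} = 2
ORDER BY document_id
"""

def matching_ids(count_expr):
    # count_expr is one of two fixed strings below, never user input
    return [row[0] for row in conn.execute(QUERY.format(count_expr=count_expr))]

with_distinct = matching_ids("COUNT(DISTINCT tag)")
with_star = matching_ids("COUNT(*)")
print(with_distinct)  # [1, 3]
print(with_star)      # [1]  (document 3 has three matching rows, so it is dropped)
```

With COUNT(*), the duplicate row pushes document 3 to a count of 3 and it no longer equals 2, which is the case the edit to the HAVING clause guards against.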

select DOCUMENT_ID
from table
where TAG in ('tag1', 'tag2', ... 'tagN')
group by DOCUMENT_ID
having count(distinct TAG) = N
Adjust N and the tag list as needed.

Select document_id
from {TABLE}
where tag in ('tag1','tag2')
group by document_id
having count(distinct tag) >= 2
How you generate the list of tags in the where clause depends on your application structure. If you are dynamically generating the query as part of your code then you might simply construct the query as a big dynamically generated string.
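If the query is built in application code, one option is to generate only the placeholders dynamically and pass the tags as bound parameters. The sketch below assumes Python with sqlite3 and a table named doc_tags; the function name documents_with_all_tags is hypothetical.

```python
import sqlite3

def documents_with_all_tags(conn, tags):
    """Return the ids of documents that carry every tag in `tags`."""
    wanted = sorted(set(tags))  # duplicates in the input must not inflate the count
    if not wanted:
        return []
    placeholders = ", ".join("?" for _ in wanted)
    sql = (
        "SELECT document_id FROM doc_tags "
        f"WHERE tag IN ({placeholders}) "
        "GROUP BY document_id "
        "HAVING COUNT(DISTINCT tag) = ? "
        "ORDER BY document_id"
    )
    # the tag values and the expected count are bound, not concatenated
    return [row[0] for row in conn.execute(sql, (*wanted, len(wanted)))]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE doc_tags (document_id INTEGER NOT NULL, tag TEXT NOT NULL)")
conn.executemany(
    "INSERT INTO doc_tags VALUES (?, ?)",
    [(1, "tag1"), (1, "tag2"), (1, "tag3"), (2, "tag2"),
     (3, "tag1"), (3, "tag2"), (4, "tag1"), (5, "tag3")],
)

print(documents_with_all_tags(conn, ["tag1", "tag2"]))  # [1, 3]
```

Only the number of question marks depends on the input, so the tag values themselves never become part of the SQL text.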
We always used stored procedures to query the data, and in that case we passed the list of tags in as an XML document. A procedure like that might look something like one of the two below, where the input argument would be:
<tags>
<tag>tag1</tag>
<tag>tag2</tag>
</tags>
CREATE PROCEDURE [dbo].[GetDocumentIdsByTag]
@tagList xml
AS
BEGIN
-- number of distinct tags requested
declare @tagCount int
select @tagCount = count(distinct t.tag)
from (select tags.value('.','varchar(20)') as tag from @tagList.nodes('tags/tag') R(tags)) t
-- documents that match every requested tag
SELECT documentid
FROM {TABLE}
JOIN @tagList.nodes('tags/tag') R(tags) ON {TABLE}.tag = tags.value('.','varchar(20)')
group by documentid
having count(distinct tag) >= @tagCount
END
OR
CREATE PROCEDURE [dbo].[GetDocumentIdsByTag]
@tagList xml
AS
BEGIN
declare @tagCount int
-- assumes the XML contains no duplicate tags
select @tagCount = count(*) from @tagList.nodes('tags/tag') R(tags)
SELECT documentid
FROM {TABLE}
WHERE tag in
(
SELECT tags.value('.','varchar(20)')
FROM @tagList.nodes('tags/tag') R(tags)
)
group by documentid
having count(distinct tag) >= @tagCount
END

Related

Select rows base on Subset

I have a scenario where I need to write a SQL query based on the result of another query.
Consider the table data:
id attribute
1 a
1 b
2 a
3 a
3 b
3 c
I want to write a query that selects ids based on an attribute set.
I mean, first I need to get the attributes of id 1 using this query:
select attribute from table where id = 1
Then, based on this result, I need to select the ids whose attributes contain that set. In our case 1(a,b) is a subset of 3(a,b,c), so my query should return 3.
And if I check based on 2(a), which is a subset of 1(a,b) and 3(a,b,c), it should return 1 and 3.
I hope it's understandable. :)
You could use this query.
The logic is simple: if there is no item that is in A but not in B, then A is a subset of B.
DECLARE @SampleData AS TABLE
(
Id int, attribute varchar(5)
)
INSERT INTO @SampleData
VALUES (1,'a'), (1,'b'),
(2,'a'),
(3,'a'),(3,'b'),(3,'c')
DECLARE @FilterId int = 1
;WITH temp AS
(
SELECT DISTINCT sd.Id FROM @SampleData sd
)
SELECT * FROM temp t
WHERE t.Id <> @FilterId
AND NOT EXISTS (
SELECT sd2.attribute FROM @SampleData sd2
WHERE sd2.Id = @FilterId
AND NOT EXISTS (SELECT * FROM @SampleData sd WHERE sd.Id = t.Id AND sd.attribute = sd2.attribute)
)
Demo link: Rextester
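The double NOT EXISTS reads as "there is no attribute of the filter id that this id lacks". A minimal sketch of the same logic, run through Python's sqlite3 as a stand-in for SQL Server; the table name sample_data and the function name supersets_of are assumptions made for the example.

```python
import sqlite3

SUPERSETS_SQL = """
SELECT DISTINCT t.id
FROM sample_data AS t
WHERE t.id <> :filter_id
  AND NOT EXISTS (
        -- an attribute of the filter id ...
        SELECT 1 FROM sample_data AS wanted
        WHERE wanted.id = :filter_id
          AND NOT EXISTS (
                -- ... that the candidate id does not have
                SELECT 1 FROM sample_data AS have
                WHERE have.id = t.id
                  AND have.attribute = wanted.attribute))
ORDER BY t.id
"""

def supersets_of(conn, filter_id):
    """Ids whose attribute set contains every attribute of filter_id."""
    return [row[0] for row in conn.execute(SUPERSETS_SQL, {"filter_id": filter_id})]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sample_data (id INTEGER, attribute TEXT)")
conn.executemany(
    "INSERT INTO sample_data VALUES (?, ?)",
    [(1, "a"), (1, "b"), (2, "a"), (3, "a"), (3, "b"), (3, "c")],
)

print(supersets_of(conn, 1))  # [3]
print(supersets_of(conn, 2))  # [1, 3]
```

Note that a filter id with no rows at all has an empty attribute set, which is a subset of everything, so every id in the table would be returned in that case.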
I would compose a query for that in three steps: first I'd get the attributes of the desired id, and this is the query you wrote
select attribute from table where id = 1
Then I would get the number of attributes for the required id
select count(distinct attribute) from table where id = 1
Finally I would use the above results as filters
select id
from table
where id <> 1 and
attribute in (
select attribute from table where id = 1 /* Step 1 */
)
group by id
having count(distinct attribute) = (
select count(distinct attribute) from table where id = 1 /* Step 2 */
)
This returns every other id for which the number of its attributes that also belong to the initially provided id equals the number of attributes the initial id has, i.e. every id whose attribute set contains that of the initial id.

Dealing with a poorly designed "variable column" table in PostgreSQL

I am dealing with a poorly designed table, somewhat like this
create table (
entity_key integer,
tag1 varchar(10),
tag2 varchar(10),
tag3 varchar(10),
...
tag25 varchar(10)
);
An entity can have 0 or more tags, indicated by the number of non-null columns. Tags are all the same type, and there should be a separate "tags" table to which we can join the primary entities.
However, I'm stuck with this (quite large) table.
I want to run a query that gives me the distinct tags and a count of each.
If we had the normalized "tags" table we could simply write
select tag, count(tag) from tags group by tag;
However, I haven't yet come up with a good approach for this query given the current table structure.
You can do this by using an array and unnest:
select x.tag, count(*)
from tags
cross join lateral unnest(array[tag1, tag2, tag3, tag4, tag5, tag6, tag7, ...]) as x(tag)
where x.tag is not null --<< get rid of any empty tags
group by x.tag;
This will group by the contents of the tag columns unlike Prdp's answer which groups by the "position" in the column list.
For this sample data:
insert into tags (entity_key, tag1, tag2, tag3, tag4, tag5)
values
(1, 'sql', 'dbms', null, null, null),
(2, 'sql', 'dbms', null, null, 'dml'),
(3, 'sql', null, null, 'ddl', null);
This will return:
tag | count
-----+------
dml | 1
ddl | 1
sql | 3
dbms | 2
You can unpivot the data and do the count
select tag,count(data)
from
(
select tag1 as data,'tag1' as tag
from yourtable
Union All
select tag2,'tag2' as tag
from yourtable
Union All
..
select tag25,'tag25' as tag
from yourtable
) A
Group by tag
If postgresql supports Unpivot operator then you can use that
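For comparison, the UNION ALL unpivot can also group by the tag value itself rather than by the column position. The sketch below generates the UNION ALL branches from a column list and runs them through Python's sqlite3 (a stand-in for PostgreSQL); the five-column table mirrors the sample data from the answer above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tags (entity_key INTEGER, tag1 TEXT, tag2 TEXT, "
    "tag3 TEXT, tag4 TEXT, tag5 TEXT)"
)
conn.executemany(
    "INSERT INTO tags VALUES (?, ?, ?, ?, ?, ?)",
    [(1, "sql", "dbms", None, None, None),
     (2, "sql", "dbms", None, None, "dml"),
     (3, "sql", None, None, "ddl", None)],
)

# Column names come from this fixed list, never from user input.
columns = [f"tag{i}" for i in range(1, 6)]
unpivot = " UNION ALL ".join(f"SELECT {col} AS tag FROM tags" for col in columns)
sql = (
    f"SELECT tag, COUNT(*) FROM ({unpivot}) AS u "
    "WHERE tag IS NOT NULL GROUP BY tag ORDER BY tag"
)

counts = dict(conn.execute(sql))
print(counts)  # {'dbms': 2, 'ddl': 1, 'dml': 1, 'sql': 3}
```

Generating the branches from a list avoids writing out 25 near-identical SELECTs by hand; each branch still scans the table once, so on a large table the array/unnest form may be the cheaper choice.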

Inserting data to multiple tables Postgres

I currently have a MongoDB database with the following schema:
Image: { name: String, src: String, category: String, tags: [String] }
I'd like to migrate this to Postgres and for that I'd have 4 tables
image (id, src, name, category_id)
tag (id, name)
image_tag (image_id, tag_id)
category (id, name)
There might be new tags on every image insert, so when using a CTE I need to select all the tags (and only insert new tags if they don't exist). I was thinking about using a cache (Redis) to store the already inserted tags (so I don't need to select them from the db).
So my question is: should I go with a CTE with insert into tags.. where not exists statements, or with CTE + Redis, only inserting a tag when it cannot be found in the cache?
Here is a small statement that inserts an image with a category and multiple tags into multiple tables of a Postgres database. The following statement assumes that the name column in the tables category and tag has a unique constraint defined. For completeness I also created a statement without that constraint (see the Example section).
Postgres statement
WITH image_values(image_name, src, category) AS (
VALUES
('Goldkraut', 'goldkraut.jpg', 'logo')
),
tag_values(tag_name) AS (
VALUES
('music'), ('band')
),
category_select AS (
SELECT id, name FROM category
WHERE name IN (SELECT category FROM image_values)
),
category_insert AS (
INSERT INTO category(name)
SELECT category FROM image_values
ON CONFLICT (name) DO NOTHING
RETURNING id, name
),
category_created AS (
SELECT id, name FROM category_select
UNION ALL
SELECT id, name FROM category_insert
),
tag_select AS (
SELECT id, name FROM tag
WHERE name IN (SELECT tag_name FROM tag_values)
),
tag_insert AS (
INSERT INTO tag(name)
SELECT tag_name FROM tag_values
ON CONFLICT (name) DO NOTHING
RETURNING id, name
),
tag_created AS (
SELECT id, name FROM tag_select
UNION ALL
SELECT id, name FROM tag_insert
),
image_insert AS (
INSERT INTO image(src, name, category_id)
SELECT src, image_name, category_created.id
FROM image_values
LEFT JOIN category_created ON(image_values.category=category_created.name)
RETURNING id, src, name, category_id
),
image_tag_insert AS (
INSERT INTO image_tag(image_id, tag_id)
SELECT image_insert.id, tag_created.id FROM image_insert
CROSS JOIN tag_created
RETURNING image_id, tag_id
)
SELECT image_insert.*, category_created.name as category_name, image_tag_insert.*, tag_created.name as "tag.name"
FROM image_tag_insert
LEFT JOIN image_insert ON (image_id = image_insert.id)
LEFT JOIN category_created ON (category_created.id = image_insert.category_id)
LEFT JOIN tag_created ON (tag_created.id = tag_id)
Explanation to the statement
In the first common table expression (CTE), image_values, you define all values of the image that stand in a 1:1 relation to it. In the next expression, tag_values, all tag names for that image are defined.
Now let's start with the categories. To find out whether a category with this name already exists, category_select queries the category table for it. The expression category_insert creates a new entry for the category if it does not already exist (ON CONFLICT (name) DO NOTHING skips the insert when the name is already present, so RETURNING only yields a row for a newly created category). To store the category id in the image table we need the category entry, whether it is the existing one (from category_select) or the newly inserted one (from category_insert), so we union these two expressions in category_created.
We use the same pattern for the tags: query for existing tags in tag_select, insert the tags that do not exist yet in tag_insert, and union these entries in tag_created.
Next we insert the image in image_insert. To do so we select the values from the expression image_values and join the expression category_created to get the id of the category. To insert the relation between image and tag we need the id of the inserted image, so we return this value. The other returned values are not strictly necessary, but we use them to get a nicer result set in the final query.
Now that we have the primary key of the inserted image, we can store the associations between the image and the tags. In the expression image_tag_insert we select the id of the inserted image and cross join it with every tag id we selected or inserted.
For the final statement it would be enough to just do SELECT * FROM image_tag_insert to execute all the expressions. But to give an overview of what was stored in the database, I joined all the relations. The result looks like this:
Joined result
| id | src | name | category_id | category_name | image_id | tag_id | tag.name |
|----|---------------|-----------|-------------|---------------|----------|--------|----------|
| 1 | goldkraut.jpg | Goldkraut | 2 | logo | 1 | 3 | band |
| 1 | goldkraut.jpg | Goldkraut | 2 | logo | 1 | 1 | music |
Example
On this sqlfiddle you can see the given query in action. In another sqlfiddle I have added some extras to the last statement to format all inserted tags as a list. If you have not added a unique constraint to the name column in the tables tag and category, you can use this example.
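The same "insert if missing, then look up the id" idea can also be written as separate statements inside one transaction. This is a different technique from the single CTE statement above, sketched with Python's sqlite3 as a stand-in for Postgres (INSERT OR IGNORE plays the role of ON CONFLICT DO NOTHING). The helper names get_or_create and insert_image, and the second sample image, are hypothetical.

```python
import sqlite3

SCHEMA = """
CREATE TABLE category (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
CREATE TABLE tag (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
CREATE TABLE image (id INTEGER PRIMARY KEY, src TEXT, name TEXT,
                    category_id INTEGER REFERENCES category(id));
CREATE TABLE image_tag (image_id INTEGER REFERENCES image(id),
                        tag_id INTEGER REFERENCES tag(id));
"""

def get_or_create(conn, table, name):
    """Insert name if it is new, then return its id either way."""
    if table not in ("category", "tag"):  # fixed identifiers only
        raise ValueError(f"unexpected table: {table}")
    conn.execute(f"INSERT OR IGNORE INTO {table}(name) VALUES (?)", (name,))
    return conn.execute(f"SELECT id FROM {table} WHERE name = ?", (name,)).fetchone()[0]

def insert_image(conn, name, src, category, tags):
    with conn:  # commit on success, roll back on error
        category_id = get_or_create(conn, "category", category)
        image_id = conn.execute(
            "INSERT INTO image(src, name, category_id) VALUES (?, ?, ?)",
            (src, name, category_id),
        ).lastrowid
        for tag_name in tags:
            conn.execute(
                "INSERT INTO image_tag(image_id, tag_id) VALUES (?, ?)",
                (image_id, get_or_create(conn, "tag", tag_name)),
            )
    return image_id

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
first = insert_image(conn, "Goldkraut", "goldkraut.jpg", "logo", ["music", "band"])
second = insert_image(conn, "Poster", "poster.jpg", "logo", ["music", "live"])  # made-up second image
tag_count = conn.execute("SELECT COUNT(*) FROM tag").fetchone()[0]
print(first, second, tag_count)  # 1 2 3
```

The unique constraint is what keeps the shared tag "music" and the category "logo" from being stored twice, which is the same job it does in the CTE version.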

Add a rownumber based on the sequence of values provided

SELECT Code, Value FROM dbo.Sample
Output:
Code Value
Alpha Pig
Beta Horse
Charlie Dog
Delta Cat
Echo Fish
I want to add a sequence column by specifying a list of Codes and sort the list based on the order specified in the IN clause.
SELECT Code, Value FROM dbo.Sample
WHERE Code in ('Beta', 'Echo', 'Alpha')
I could declare a variable at the top to specify the Codes if that is easier.
The key is that I want to add the row number based on the order that I specify them in.
Output:
Row Code Value
1 Beta Horse
2 Echo Fish
3 Alpha Pig
Edit: I realized after that my Codes are all a fixed length which makes a big difference in how it could be done. I marked the answer below as correct, but my solution is to use a comma-separated string of values:
DECLARE @CodeList TABLE (Seq int, Code nchar(3))
DECLARE @CodeSequence varchar(255)
DECLARE @ThisCode char(3)
DECLARE @Codes int
SET @Codes = 0
-- string of comma-separated codes
SET @CodeSequence = 'ZZZ,ABC,FGH,YYY,BBB,CCC'
----loop through and create index and populate @CodeList
WHILE @Codes*4 < LEN(@CodeSequence)
BEGIN
SET @ThisCode = SUBSTRING(@CodeSequence,@Codes*4+1,3)
SET @Codes = @Codes + 1
INSERT @CodeList (Seq, Code) VALUES (@Codes, @ThisCode)
END
SELECT Seq, Code from @CodeList
Here are the only 2 ways I've seen work accurately:
The first uses CHARINDEX (similar to Gordon's, but I think the WHERE statement is more accurate using IN):
SELECT *
FROM Sample
WHERE Code IN ('Beta','Echo','Alpha')
ORDER BY CHARINDEX(Code+',','Beta,Echo,Alpha,')
Concatenating the comma with code should ensure sub-matches don't affect the results.
Alternatively, you could use a CASE statement:
SELECT *
FROM Sample
WHERE Code in ('Beta','Echo','Alpha')
ORDER BY CASE
WHEN Code = 'Beta' THEN 1
WHEN Code = 'Echo' THEN 2
WHEN Code = 'Alpha' THEN 3
END
SQL Fiddle Demo
Updated Demo with sub-matches.
Also you can use Values as Table Source
SELECT Row, Code, Value
FROM [Sample] s JOIN (
SELECT ROW_NUMBER() OVER(ORDER BY(SELECT 1)) AS Row, Match
FROM (VALUES ('Beta'),
('Echo'),
('Alpha'))
x (Match)
) o ON s.Code = o.Match
ORDER BY Row
Demo on SQLFiddle
Here is a solution for a code list of any length.
Create a table with an auto-incrementing field and the code. Insert the codes in the given order. Join the tables and order by ...
Some details: please read this. There you will find a stored procedure that creates a table with an auto-increment field from a comma-delimited string, i.e.
mysql> call insertEngineer('dinusha,nuwan,nirosh');
Query OK, 1 row affected (0.12 sec)
mysql> select * from engineer;
+----+----------+
| ID | NAME |
+----+----------+
| 1 | dinusha |
| 2 | nuwan |
| 3 | nirosh |
+----+----------+
Next join your Sample table with result of above. GL
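A minimal sketch of that join, using Python's sqlite3 rather than MySQL. Here the sequence number is assigned explicitly with enumerate instead of an auto-increment column, and the names code_order and rows_in_given_order are assumptions made for the example.

```python
import sqlite3

def rows_in_given_order(conn, codes):
    """Return (seq, Code, Value) rows ordered as the codes were listed."""
    conn.execute(
        "CREATE TEMP TABLE IF NOT EXISTS code_order "
        "(seq INTEGER PRIMARY KEY, code TEXT NOT NULL)"
    )
    conn.execute("DELETE FROM code_order")
    # seq records each code's position in the caller's list, starting at 1
    conn.executemany(
        "INSERT INTO code_order (seq, code) VALUES (?, ?)",
        list(enumerate(codes, start=1)),
    )
    return conn.execute(
        "SELECT o.seq, s.Code, s.Value "
        "FROM code_order AS o JOIN Sample AS s ON s.Code = o.code "
        "ORDER BY o.seq"
    ).fetchall()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Sample (Code TEXT, Value TEXT)")
conn.executemany(
    "INSERT INTO Sample VALUES (?, ?)",
    [("Alpha", "Pig"), ("Beta", "Horse"), ("Charlie", "Dog"),
     ("Delta", "Cat"), ("Echo", "Fish")],
)

ordered = rows_in_given_order(conn, ["Beta", "Echo", "Alpha"])
print(ordered)  # [(1, 'Beta', 'Horse'), (2, 'Echo', 'Fish'), (3, 'Alpha', 'Pig')]
```

Because the join is on equality, codes of any length work and one code cannot match inside another.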
Just a little change to what's been done above, to include the row numbers as well.
SELECT CASE
WHEN Code = 'BetaBeta' THEN 1
WHEN Code = 'Beta' THEN 2
WHEN Code = 'Alpha' THEN 3
END CodeOrder,
*
FROM Sample
WHERE Code in ('BetaBeta','Beta','Alpha')
ORDER BY CodeOrder
SQL Fiddle Demo
I might be tempted to do this using string functions:
declare @list varchar(8000) = 'Beta,Echo,Alpha';
with Sample as (
select 'Alpha' as Code, 'Pig' as Value union all
select 'Beta', 'Horse' union all
select 'Charlie', 'Dog' union all
select 'Delta', 'Cat' union all
select 'Echo', 'Fish'
)
select * from Sample
where charindex(Code, @list) > 0
order by charindex(Code, @list)
If you are worried about submatches, just do the "delimiter" trick:
where ',' + @list + ',' like '%,'+Code+',%'
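To illustrate the submatch problem and the delimiter trick, here is a small sketch using Python's sqlite3, where instr() plays the role of CHARINDEX and || is string concatenation. The code BetaBeta and the list 'BetaBeta,Alpha' are sample values chosen so that Beta is a submatch but is not itself in the list.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Sample (Code TEXT)")
conn.executemany(
    "INSERT INTO Sample VALUES (?)",
    [("Alpha",), ("Beta",), ("BetaBeta",), ("Charlie",)],
)

code_list = "BetaBeta,Alpha"  # note: plain 'Beta' is NOT requested

# Naive search: 'Beta' is found inside 'BetaBeta' and slips through.
naive = [row[0] for row in conn.execute(
    "SELECT Code FROM Sample WHERE instr(:list, Code) > 0",
    {"list": code_list},
)]

# Delimiter trick: wrap both the list and the code in commas so that
# only whole entries can match.
delimited = [row[0] for row in conn.execute(
    "SELECT Code FROM Sample "
    "WHERE instr(',' || :list || ',', ',' || Code || ',') > 0 "
    "ORDER BY instr(',' || :list || ',', ',' || Code || ',')",
    {"list": code_list},
)]

print(sorted(naive))  # ['Alpha', 'Beta', 'BetaBeta']
print(delimited)      # ['BetaBeta', 'Alpha']
```

The delimiters need to go on both sides of both strings; otherwise the first or last entry of the list, or a code that is a suffix of another, can still be matched incorrectly.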

Select query to get all data from junction table to one field

I have 2 tables and 1 junction table:
table 1 (Log): | Id | Title | Date | ...
table 2 (Category): | Id | Title | ...
junction table between table 1 and 2:
LogCategory: | Id | LogId | CategoryId
Now I want a SQL query that gets all logs with all of their category titles in one field,
something like this:
LogId, LogTitle, ..., Categories (containing the titles of all categories assigned to this log id)
Can anyone help me solve this? Thanks
Try this code:
DECLARE @results TABLE
(
idLog int,
LogTitle varchar(20),
idCategory int,
CategoryTitle varchar(20)
)
INSERT INTO @results
SELECT l.idLog, l.LogTitle, c.idCategory, c.CategoryTitle
FROM
LogCategory lc
INNER JOIN Log l
ON lc.IdLog = l.IdLog
INNER JOIN Category c
ON lc.IdCategory = c.IdCategory
SELECT DISTINCT
idLog,
LogTitle,
STUFF (
(SELECT ', ' + r1.CategoryTitle
FROM @results r1
WHERE r1.idLog = r2.idLog
ORDER BY r1.idLog
FOR XML PATH ('')
), 1, 2, '')
FROM
@results r2
Here you have a simple SQL Fiddle example
I'm sure this query can be written using only one select, but this way it is readable and I can explain what the code does.
The first select takes all Log - Category matches into a table variable.
The second part uses FOR XML to select the category names and return the result as XML instead of as a table. By using FOR XML PATH ('') and prepending ', ' to each value in the select, all the XML tags are removed from the result.
Finally, the STUFF function replaces the leading ', ' characters of every row with an empty string, so that the string is formatted correctly.
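Where a string aggregate is available, the same result needs neither the table variable nor STUFF (SQL Server 2017 and later offer STRING_AGG for this). The sketch below shows the idea with SQLite's group_concat through Python's sqlite3. It uses the column names from the question (Id, Title, LogId, CategoryId), and the sample rows are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Log (Id INTEGER PRIMARY KEY, Title TEXT);
CREATE TABLE Category (Id INTEGER PRIMARY KEY, Title TEXT);
CREATE TABLE LogCategory (Id INTEGER PRIMARY KEY, LogId INTEGER, CategoryId INTEGER);
INSERT INTO Log VALUES (1, 'Login fails'), (2, 'Slow page'), (3, 'Typo');
INSERT INTO Category VALUES (1, 'bug'), (2, 'ui'), (3, 'perf');
INSERT INTO LogCategory VALUES (1, 1, 1), (2, 1, 2), (3, 2, 3);
""")

rows = conn.execute("""
SELECT l.Id, l.Title, group_concat(c.Title, ', ') AS Categories
FROM Log AS l
LEFT JOIN LogCategory AS lc ON lc.LogId = l.Id
LEFT JOIN Category AS c ON c.Id = lc.CategoryId
GROUP BY l.Id, l.Title
ORDER BY l.Id
""").fetchall()

for log_id, title, categories in rows:
    # categories is None for a log without any category (LEFT JOIN)
    print(log_id, title, categories)
```

The LEFT JOINs keep logs that have no category at all, which the INNER JOIN version above would drop; the order of titles inside the concatenated string is not guaranteed here.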