Counting rows of a subgroup while ignoring duplicates


I can't find a way to describe my problem in an abstract and general manner, so I'll just provide a minimal example:
Let's say I have these 3 simple tables:
CREATE TABLE Document(
[Id] int IDENTITY(1, 1) NOT NULL PRIMARY KEY,
[Title] nvarchar(MAX),
[Patient] nvarchar(MAX)
);
CREATE TABLE Link(
DocumentId INT FOREIGN KEY REFERENCES Document(Id),
Text nvarchar(max)
);
CREATE TABLE ReadStatus(
DocumentId INT FOREIGN KEY REFERENCES Document(Id),
IsRead Bit NOT NULL,
UserId Int NOT NULL
);
We have a set of documents.
A document can have zero or more links.
Documents can be read by users. This is tracked by the ReadStatus table, which associates a user with a document: IsRead=1 means the document has been read by that user, and IsRead=0 means it hasn't been read by that user yet.
If, for document X and user A, no row exists in the ReadStatus table, we assume user A hasn't read document X yet.
Now, I need to run a query to select all patients. For each patient, I need the total number of documents available AND the number of documents that have already been read (i.e. IsRead=1). This is what I have so far:
SELECT d.Patient,
COUNT(DISTINCT d.Id) AS DocumentCount,
COUNT(NULLIF(rs.IsRead,0)) AS ReadDocumentCount,
COUNT(*) OVER () AS TotalPatientCount
FROM Document d
LEFT OUTER JOIN ReadStatus AS rs ON d.Id = rs.DocumentId AND rs.UserId = 123
INNER JOIN Link AS l ON d.Id = l.DocumentId AND l.Text IN ('Link W', 'Link X', 'Link T', 'Link Z')
GROUP BY d.Patient
The problem happens when a document that has already been read has more than one matching link. If that document has 3 links, the INNER JOIN with the Link table repeats its ReadStatus row once per link, so ReadDocumentCount becomes 3 instead of 1.
In other words, given this data:
INSERT INTO Document(Title, Patient) VALUES('Doc A', 'Mike')
INSERT INTO Document(Title, Patient) VALUES('Doc B', 'Mike')
INSERT INTO Link(DocumentId, Text) VALUES(1, N'Link W')
INSERT INTO Link(DocumentId, Text) VALUES(1, N'Link X')
INSERT INTO Link(DocumentId, Text) VALUES(1, N'Link Y')
INSERT INTO Link(DocumentId, Text) VALUES(2, N'Link Z')
INSERT INTO ReadStatus(DocumentID, IsRead, UserId) VALUES(1, 1, 123)
INSERT INTO ReadStatus(DocumentID, IsRead, UserId) VALUES(2, 0, 123)
I'm getting this as a result:
Patient DocumentCount ReadDocumentCount TotalPatientCount
Mike 2 3 1
Whereas this is what I want:
Patient DocumentCount ReadDocumentCount TotalPatientCount
Mike 2 1 1
SQL fiddle: http://sqlfiddle.com/#!6/e06bf/3

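You can use COUNT(DISTINCT) conditionally as well. The CASE expression yields d.Id only for documents the user has read, and DISTINCT collapses the duplicate rows created by the Link join: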
SELECT d.Patient,
COUNT(DISTINCT d.Id) AS DocumentCount,
COUNT(DISTINCT (CASE WHEN rs.IsRead <> 0 THEN d.id END)) AS ReadDocumentCount,
COUNT(*) OVER () AS TotalPatientCount
FROM Document d LEFT OUTER JOIN
ReadStatus rs
ON d.Id = rs.DocumentId AND rs.UserId = 123 INNER JOIN
Link l
ON d.Id = l.DocumentId AND l.Text IN ('Link W', 'Link X', 'Link T', 'Link Z')
GROUP BY d.Patient;
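To compare the two counts outside SQL Server, here is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in engine. This is an assumption: it runs on SQLite, not SQL Server, and uses TEXT/INTEGER columns instead of nvarchar(MAX)/IDENTITY/bit. It runs the question's count and the answer's COUNT(DISTINCT CASE ...) on the posted sample data. The TotalPatientCount window column is left out because it doesn't affect the counts.

With the data exactly as posted, the question's query reports 2 read documents rather than 3. That is because 'Link Y' is not in the IN list, so only two of Doc A's links survive the join. The count is inflated either way.

```python
import sqlite3

# In-memory SQLite database standing in for SQL Server (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Document (Id INTEGER PRIMARY KEY, Title TEXT, Patient TEXT);
CREATE TABLE Link (DocumentId INTEGER REFERENCES Document(Id), Text TEXT);
CREATE TABLE ReadStatus (DocumentId INTEGER REFERENCES Document(Id),
                         IsRead INTEGER NOT NULL, UserId INTEGER NOT NULL);
INSERT INTO Document (Title, Patient) VALUES ('Doc A', 'Mike'), ('Doc B', 'Mike');
INSERT INTO Link (DocumentId, Text) VALUES
    (1, 'Link W'), (1, 'Link X'), (1, 'Link Y'), (2, 'Link Z');
INSERT INTO ReadStatus (DocumentId, IsRead, UserId) VALUES (1, 1, 123), (2, 0, 123);
""")

FROM_AND_GROUP = """
FROM Document d
LEFT OUTER JOIN ReadStatus rs ON d.Id = rs.DocumentId AND rs.UserId = 123
INNER JOIN Link l ON d.Id = l.DocumentId
    AND l.Text IN ('Link W', 'Link X', 'Link T', 'Link Z')
GROUP BY d.Patient
"""

# Question's version: each matching Link row repeats the document's
# ReadStatus row, so a read document is counted once per matching link.
naive = conn.execute(
    "SELECT d.Patient, COUNT(DISTINCT d.Id), COUNT(NULLIF(rs.IsRead, 0))"
    + FROM_AND_GROUP
).fetchall()

# Answer's version: CASE yields d.Id only for read documents, and DISTINCT
# collapses the duplicates introduced by the Link join.
fixed = conn.execute(
    "SELECT d.Patient, COUNT(DISTINCT d.Id),"
    " COUNT(DISTINCT CASE WHEN rs.IsRead <> 0 THEN d.Id END)"
    + FROM_AND_GROUP
).fetchall()

print(naive)  # [('Mike', 2, 2)]
print(fixed)  # [('Mike', 2, 1)]
```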

Related

Insert records from two tables that match

I have the following tables:
CREATE TABLE forms
(
ID INT NOT NULL,
NAME TEXT NOT NULL,
TITLE TEXT NOT NULL
);
CREATE TABLE new_forms
(
ID INT NOT NULL,
NAME TEXT NULL,
TITLE TEXT NULL
);
INSERT INTO forms VALUES (0, 'test', 'test');
INSERT INTO new_forms VALUES (0, 'new_test', NULL);
And I'm using the following query:
INSERT INTO forms(id, name, title)
SELECT
1, COALESCE(nf.name, f.name), COALESCE(nf.title, f.title)
FROM
forms f
LEFT OUTER JOIN
new_forms nf ON nf.id = f.id;
SELECT * FROM forms;
The idea is to add both rows that match to the table.
In this example, these two new records should be added:
1 test test
1 new_test test
But it's only adding the last one.
I have tried all the join types, and none of them worked.
Thanks

Your query joins forms to new_forms once, so it produces only one row per matching pair. If you need two rows, you have to combine two SELECTs with UNION ALL:
INSERT INTO forms(id, name, title)
SELECT
1, COALESCE(nf.name, f.name), COALESCE(nf.title, f.title)
FROM
forms f
LEFT OUTER JOIN
new_forms nf ON nf.id = f.id
UNION ALL
SELECT
1, COALESCE(f.name, nf.name), COALESCE(nf.title, f.title)
FROM
forms f
LEFT OUTER JOIN
new_forms nf ON nf.id = f.id;
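Here is a minimal sketch of the UNION ALL version. It runs on Python's sqlite3 module as a stand-in for the original database; that is an assumption, since the question doesn't say which engine it uses. Each SELECT branch produces one of the two desired rows:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE forms (id INTEGER NOT NULL, name TEXT NOT NULL, title TEXT NOT NULL);
CREATE TABLE new_forms (id INTEGER NOT NULL, name TEXT NULL, title TEXT NULL);
INSERT INTO forms VALUES (0, 'test', 'test');
INSERT INTO new_forms VALUES (0, 'new_test', NULL);
""")

# One SELECT per desired row: the first prefers new_forms.name, the second
# keeps forms.name; both fall back to forms.title because new_forms.title is NULL.
conn.execute("""
INSERT INTO forms (id, name, title)
SELECT 1, COALESCE(nf.name, f.name), COALESCE(nf.title, f.title)
FROM forms f LEFT OUTER JOIN new_forms nf ON nf.id = f.id
UNION ALL
SELECT 1, COALESCE(f.name, nf.name), COALESCE(nf.title, f.title)
FROM forms f LEFT OUTER JOIN new_forms nf ON nf.id = f.id
""")

rows = conn.execute("SELECT id, name, title FROM forms ORDER BY id, name").fetchall()
print(rows)  # [(0, 'test', 'test'), (1, 'new_test', 'test'), (1, 'test', 'test')]
```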

Can't write sql query with RANK() function

In my application I need to retrieve the best file from the database with the following rules:
the file that has the most upvotes: priority 1
the file that has the most comments: priority 2
If there are only files with no upvotes and no comments, then just pick one at random.
My tables:
CREATE TABLE "FILES"
( "ID" NUMBER,
"OBJ_ID" NUMBER,
"NAME" VARCHAR2(30 BYTE)
) ;
CREATE TABLE "UPVOTES"
( "ID" NUMBER,
"TO_ID" NUMBER,
"TO_TYPE" NUMBER
) ;
COMMENT ON COLUMN "UPVOTES"."TO_TYPE" IS '0 obj, 1 file, 2 comment';
CREATE TABLE "COMMENTS"
( "ID" NUMBER,
"OBJ_ID" NUMBER,
"CONTENT" VARCHAR2(20 BYTE),
"TO_TYPE" NUMBER,
"TO_ID" NUMBER
) ;
COMMENT ON COLUMN "COMMENTS"."TO_TYPE" IS '0 object, 1 file';
Insert into FILES (ID,OBJ_ID,NAME) values ('1','1','best file for obj id = 1');
Insert into FILES (ID,OBJ_ID,NAME) values ('2','1','file obj1');
Insert into FILES (ID,OBJ_ID,NAME) values ('3','1','file obj1');
Insert into FILES (ID,OBJ_ID,NAME) values ('4','2','best file for obj id = 2');
Insert into FILES (ID,OBJ_ID,NAME) values ('5','2','file obj2');
Insert into FILES (ID,OBJ_ID,NAME) values ('6','3','only one file obj 3');
Insert into FILES (ID,OBJ_ID,NAME) values ('7','4','probilem file obj 4');
Insert into UPVOTES (ID,TO_ID,TO_TYPE) values ('1','1','1');
Insert into UPVOTES (ID,TO_ID,TO_TYPE) values ('2','1','1');
Insert into UPVOTES (ID,TO_ID,TO_TYPE) values ('3','7','0');
Insert into UPVOTES (ID,TO_ID,TO_TYPE) values ('4','2','0');
Insert into UPVOTES (ID,TO_ID,TO_TYPE) values ('5','2','0');
Insert into UPVOTES (ID,TO_ID,TO_TYPE) values ('6','2','0');
Insert into COMMENTS (ID,OBJ_ID,CONTENT,TO_TYPE,TO_ID) values ('1','1','comment 1','1','2');
Insert into COMMENTS (ID,OBJ_ID,CONTENT,TO_TYPE,TO_ID) values ('2','1','comment 2','1','2');
Insert into COMMENTS (ID,OBJ_ID,CONTENT,TO_TYPE,TO_ID) values ('3','2','comment 3','1','4');
My sql query:
SELECT obj_id, name FROM (
SELECT obj_id, name, rank, ROW_NUMBER() OVER (PARTITION BY obj_id ORDER BY rank) rownumb FROM (
SELECT f.obj_id, f.name, RANK() OVER (PARTITION BY f.obj_id ORDER BY COUNT(v.id) DESC, COUNT(DISTINCT com.id) DESC) rank
FROM files f
LEFT OUTER JOIN upvotes v
ON f.id = v.to_id
LEFT OUTER JOIN comments com
ON f.id = com.to_id
WHERE (v.to_type = 1 OR v.to_type IS NULL)
AND (com.to_type = 1 OR com.to_type IS NULL)
GROUP BY f.obj_id, f.name
)
)
WHERE rownumb = 1;
Expected result:
obj_id file name
1 best file for obj id = 1
2 best file for obj id = 2
3 only one file obj 3
4 probilem file obj 4
The problem is with the line:
(v.to_type = 1 OR v.to_type IS NULL)
It fails because some upvotes for objects (TO_TYPE = 0) have the same TO_ID as a file's ID. The WHERE clause then discards those joined rows, and the file goes with them, but I still need to count upvotes for files (TO_TYPE = 1).
Can somebody help me figure it out?
I use Oracle Database 11g XE R2.

Replace
FROM files f
LEFT OUTER JOIN upvotes v
ON f.id = v.to_id
LEFT OUTER JOIN comments com
ON f.id = com.to_id
WHERE (v.to_type = 1 OR v.to_type IS NULL)
AND (com.to_type = 1 OR com.to_type IS NULL)
by
FROM files f
LEFT OUTER JOIN upvotes v ON f.id = v.to_id AND v.to_type = 1
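LEFT OUTER JOIN comments com ON f.id = com.to_id AND com.to_type = 1
With the to_type conditions in the ON clauses, an upvote or comment of the wrong type simply fails to match, and the LEFT OUTER JOIN still returns the file row with NULLs. In the WHERE clause, the same condition discards the whole joined row, file included, whenever a non-file upvote shares the file's ID.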
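Here is a sketch of the corrected ranking. It assumes SQLite 3.25+ (for ROW_NUMBER) via Python's sqlite3 rather than Oracle, and it deviates from the posted query in a few deliberate ways:

- The counts are computed in a derived table before ranking.
- It uses COUNT(DISTINCT v.id), because joining both upvotes and comments multiplies rows for a file that has both.
- The grouping includes f.id, because files 2 and 3 share the name 'file obj1' and would otherwise be merged.
- The final `id` tie-breaker is an arbitrary deterministic stand-in for "pick a random one".

```python
import sqlite3

# Python's sqlite3 (SQLite 3.25+ for ROW_NUMBER) stands in for Oracle 11g here.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE files (id INTEGER, obj_id INTEGER, name TEXT);
CREATE TABLE upvotes (id INTEGER, to_id INTEGER, to_type INTEGER);
CREATE TABLE comments (id INTEGER, obj_id INTEGER, content TEXT,
                       to_type INTEGER, to_id INTEGER);
INSERT INTO files VALUES
  (1, 1, 'best file for obj id = 1'), (2, 1, 'file obj1'), (3, 1, 'file obj1'),
  (4, 2, 'best file for obj id = 2'), (5, 2, 'file obj2'),
  (6, 3, 'only one file obj 3'), (7, 4, 'probilem file obj 4');
INSERT INTO upvotes VALUES
  (1, 1, 1), (2, 1, 1), (3, 7, 0), (4, 2, 0), (5, 2, 0), (6, 2, 0);
INSERT INTO comments VALUES
  (1, 1, 'comment 1', 1, 2), (2, 1, 'comment 2', 1, 2), (3, 2, 'comment 3', 1, 4);
""")

best = conn.execute("""
SELECT obj_id, name FROM (
  SELECT id, obj_id, name,
         ROW_NUMBER() OVER (PARTITION BY obj_id
                            ORDER BY upvote_cnt DESC, comment_cnt DESC, id) AS rn
  FROM (
    -- to_type filters sit in ON: a vote or comment of the wrong type just
    -- fails to match, and the outer join keeps the file with NULLs.
    SELECT f.id, f.obj_id, f.name,
           COUNT(DISTINCT v.id) AS upvote_cnt,
           COUNT(DISTINCT com.id) AS comment_cnt
    FROM files f
    LEFT OUTER JOIN upvotes v ON f.id = v.to_id AND v.to_type = 1
    LEFT OUTER JOIN comments com ON f.id = com.to_id AND com.to_type = 1
    GROUP BY f.id, f.obj_id, f.name
  ) AS counts
) AS ranked
WHERE rn = 1
ORDER BY obj_id
""").fetchall()

for obj_id, name in best:
    print(obj_id, name)
```

With the question's data, this yields one row per obj_id (1 through 4), matching the expected result.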

How to make an OUTER JOIN return ZERO instead of NULL

I am trying to accomplish this on SQL Server. The simplest table structure with data is shown below.
Table:Blog
BlogID, Title
----------------
1, FirstBlog
23, Pizza
Table:User
UserID, Name
-------------------
123, james
444, John
Table:UserBlogMapping
UserBlogMappingID, BlogID,UserID
----------------------------------
1, 1, 123
I want to get BlogID and UserBlogMappingID in one SQL query. If the provided UserID is not in the mapping table, return ZERO; otherwise return the valid UserBlogMappingID. I am trying to run the query below, but it's not correct.
SELECT
B.BlogID,
BUM.BlogUserMappingID
FROM
Blog AS B
LEFT JOIN BlogUserMapping AS BUM ON B.BlogID = BUM.BlogID
WHERE
(B.BlogID = 23) -- it exists in the table
AND BUM.userID = 444 -- it is NOT in the mapping table, but I want a ZERO returned in that case
Assumption:
We can assume that the UserID provided in the WHERE clause is always a valid UserID that is present in the User table.

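You could put the criterion userID = 444 in the ON clause of the LEFT JOIN, and use ISNULL or COALESCE to change the NULL to a 0.
In the WHERE clause, BUM.userID = 444 rejects the NULL-extended row that the LEFT JOIN produces when there is no match, so nothing comes back at all.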
Example using table variables:
declare @Blog table (BlogID int, Title varchar(30));
insert into @Blog (BlogId, Title) values
(1, 'FirstBlog'),
(23, 'Pizza');
declare @User table (UserID int, Name varchar(30));
insert into @User (UserID, Name) values
(123,'james'),
(444,'John');
declare @BlogUserMapping table (BlogUserMappingID int, BlogID int, UserID int);
insert into @BlogUserMapping (BlogUserMappingID, BlogID, UserID) values
(1, 1, 123),
(2, 23, 123),
(3, 1, 444);
-- Using the criteria in the ON clause of the LEFT JOIN
SELECT
B.BlogID,
ISNULL(BUM.BlogUserMappingID,0) as BlogUserMappingID
FROM @Blog B
LEFT JOIN @BlogUserMapping BUM ON (B.BlogID = BUM.BlogID AND BUM.userID = 444)
WHERE B.BlogID = 23;
-- If there can be several rows with BlogID = 23 and userID = 444,
-- but only 1 row needs to be returned, you could also GROUP BY and take the maximum BlogUserMappingID
SELECT
B.BlogID,
MAX(ISNULL(BUM.BlogUserMappingID,0)) as BlogUserMappingID
FROM @Blog B
LEFT JOIN @BlogUserMapping BUM ON (B.BlogID = BUM.BlogID AND BUM.userID = 444)
WHERE B.BlogID = 23
GROUP BY B.BlogID;
-- Using an OR in the WHERE clause would also return a 0.
-- But it would return nothing if the mapping table has BlogID = 23 rows only for users other than 444.
-- So it is not useful in this case.
SELECT
B.BlogID,
ISNULL(BUM.BlogUserMappingID,0) as BlogUserMappingID
FROM @Blog B
LEFT JOIN @BlogUserMapping BUM ON B.BlogID = BUM.BlogID
WHERE B.BlogID = 23
AND (BUM.userID IS NULL OR BUM.userID = 444);
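Here is a minimal sketch of the difference between filtering in ON and filtering in WHERE. It uses Python's sqlite3 as a stand-in for SQL Server, which is an assumption. SQLite has no ISNULL, so the sketch uses COALESCE, which the answer mentions as the alternative. The helper name `mapping_id` is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Blog (BlogID INTEGER, Title TEXT);
CREATE TABLE BlogUserMapping (BlogUserMappingID INTEGER, BlogID INTEGER, UserID INTEGER);
INSERT INTO Blog VALUES (1, 'FirstBlog'), (23, 'Pizza');
INSERT INTO BlogUserMapping VALUES (1, 1, 123), (2, 23, 123), (3, 1, 444);
""")

def mapping_id(blog_id, user_id):
    """User filter in the ON clause; COALESCE turns the unmatched NULL into 0."""
    return conn.execute(
        """
        SELECT B.BlogID, COALESCE(BUM.BlogUserMappingID, 0)
        FROM Blog B
        LEFT JOIN BlogUserMapping BUM
          ON B.BlogID = BUM.BlogID AND BUM.UserID = ?
        WHERE B.BlogID = ?
        """,
        (user_id, blog_id),
    ).fetchall()

# Filtering on BUM.UserID in WHERE instead discards the NULL-extended row.
where_version = conn.execute(
    """
    SELECT B.BlogID, BUM.BlogUserMappingID
    FROM Blog B
    LEFT JOIN BlogUserMapping BUM ON B.BlogID = BUM.BlogID
    WHERE B.BlogID = 23 AND BUM.UserID = 444
    """
).fetchall()

print(mapping_id(23, 444))  # [(23, 0)]
print(mapping_id(23, 123))  # [(23, 2)]
print(where_version)        # []
```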

SQL Delete Where Not In with Composite Key

I have a system integration project which needs to CRUD from one DB to another. Not especially complicated. However, when it comes to deleting rows which exist in the target but not in the source, I ran into a little trouble. The standard patterns include: LEFT JOIN, NOT EXISTS or NOT IN. I chose the LEFT JOIN. My 'Phone' table uses a composite key, Employee 'Id' and the PhoneType: Work, Home, Mobile, etc. The standard left join will delete ANY target Phone number NOT in the source. This clears out the whole table. NOTE: I am updating only records which have changed since the last update, NOT the whole target & source. So, I wrote a fix which I suspect is really poor SQL:
-- SOURCE
DECLARE @tmpPhones TABLE(Id varchar(8), PhoneType int, PhoneNumber varchar(30), PRIMARY KEY (Id, PhoneType))
INSERT into @tmpPhones values
('TEST123', 1, '12345678'),
('TEST123', 2, '12345678'),
('TEST123', 3, '12345678')
-- TARGET
DECLARE @Phone TABLE(Id varchar(8), PhoneType int, PhoneNumber varchar(30), PRIMARY KEY (Id, PhoneType))
INSERT into @Phone values
('TEST123', 1, '12345678'), -- Exists in both, leave
('TEST123', 2, '12345678'), -- Exists in both, leave
('TEST123', 3, '12345678'), -- Exists in both, leave
('TEST123', 4, '12345678'), -- ONLY delete this one!
('TEST456', 2, '12345678'), -- Ignore this employee Id
('TEST456', 3, '12345678'), -- Ignore this employee Id
('TEST456', 4, '12345678')  -- Ignore this employee Id
DELETE p
FROM @Phone p
LEFT JOIN @tmpPhones t
ON t.Id = p.Id AND t.PhoneType = p.PhoneType
WHERE t.Id IS NULL AND t.PhoneType IS NULL
AND p.Id IN (SELECT Id FROM @tmpPhones) -- a sad hack?
This works, but I feel like there is a better way to make sure we are only deleting records for THIS employee, not all the others.
Any suggestions?

Use exists.
DELETE p
FROM @Phone p
where exists (select 1 from @tmpPhones where Id = p.Id)
AND not exists (select 1 from @tmpPhones where Id = p.Id AND PhoneType = p.PhoneType)
Edit: Deleting using a CTE.
with todelete as (
select id,phonetype from phone p
where exists (select 1 from tmpphones where id = p.id)
except
select id,phonetype from tmpphones
)
delete from phone
where exists (select 1 from todelete where phone.id = todelete.id and phone.phonetype = todelete.phonetype)

I think two EXISTS clauses pretty much capture the logic as you describe it:
DELETE p
FROM @Phone p
WHERE EXISTS (SELECT 1 FROM @tmpPhones t WHERE t.id = p.id) AND
NOT EXISTS (SELECT 1 FROM @tmpPhones t WHERE t.id = p.id AND t.PhoneType = p.PhoneType) ;
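The two-EXISTS delete can be checked on the question's sample data with a minimal sketch in Python's sqlite3. This is an assumption: ordinary SQLite tables named Phone and tmpPhones stand in for the T-SQL table variables. The DELETE correlates on the table name because it doesn't use an alias.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tmpPhones (Id TEXT, PhoneType INTEGER, PhoneNumber TEXT,
                        PRIMARY KEY (Id, PhoneType));
CREATE TABLE Phone (Id TEXT, PhoneType INTEGER, PhoneNumber TEXT,
                    PRIMARY KEY (Id, PhoneType));
INSERT INTO tmpPhones VALUES
  ('TEST123', 1, '12345678'), ('TEST123', 2, '12345678'), ('TEST123', 3, '12345678');
INSERT INTO Phone VALUES
  ('TEST123', 1, '12345678'), ('TEST123', 2, '12345678'), ('TEST123', 3, '12345678'),
  ('TEST123', 4, '12345678'),
  ('TEST456', 2, '12345678'), ('TEST456', 3, '12345678'), ('TEST456', 4, '12345678');
""")

# Delete a target row only when its employee is in the source batch (EXISTS)
# but this exact (Id, PhoneType) pair is not (NOT EXISTS).
cur = conn.execute("""
DELETE FROM Phone
WHERE EXISTS (SELECT 1 FROM tmpPhones t WHERE t.Id = Phone.Id)
  AND NOT EXISTS (SELECT 1 FROM tmpPhones t
                  WHERE t.Id = Phone.Id AND t.PhoneType = Phone.PhoneType)
""")
print(cur.rowcount)  # 1

remaining = conn.execute(
    "SELECT Id, PhoneType FROM Phone ORDER BY Id, PhoneType"
).fetchall()
print(remaining)
```

Only ('TEST123', 4) is removed; the TEST456 rows are untouched because that employee is not in the source batch.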

MERGE seems to work OK, but you still need to check whether the Id is in your reference set; I don't see a clean way around that:
MERGE @Phone AS TGT
USING (
SELECT * FROM @tmpPhones
) AS SRC
ON TGT.ID=SRC.ID AND TGT.PHONETYPE=SRC.PHONETYPE
WHEN NOT MATCHED BY SOURCE AND tgt.id IN (SELECT id FROM @tmpPhones) THEN DELETE;

SQL Query to select the COUNT from another SQL query

I'm having a little trouble writing a SQL Server 2000 query. Here is my scenario:
I have a table called Folders with 3 columns: pk_folderID, folderName and fk_userID.
Also, I have another table called FolderMedia which stores what media (whatever) belong to a certain folder. There are 2 columns: fk_folderID, fk_media.
And the last, I have a table called Media which stores some media details. It has a primary key pk_media and among other columns, it has a MediaType column which tells the type of that media: image or video.
Now, I would like a query that does the following:
Select all folders that belong to a certain fk_userID, and also get the number of media items in each folder. I've seen a query like this here on StackOverflow, but I didn't manage to extend it to get 2 counts of media (based on their type).
Basically, get the folder details (name, etc) for all folders that belong to a user(fk_userID) and also, for each folder get the number of images and videos in it (as separate values).
The select would basically return:
folderName, count(images in folder), count(videos in folder), other folder details.
One obvious solution would be to just get all folders and then manually calculate the number of images/videos in them... but I would first like to try with a query.
Thank you,

Basically something like this:
SELECT
f.pk_folderID,
f.folderName,
VideoCount = COUNT(CASE m.MediaType WHEN 'Video' THEN 1 END),
ImageCount = COUNT(CASE m.MediaType WHEN 'Image' THEN 1 END)
FROM Folders f
LEFT JOIN FolderMedia fm ON f.pk_folderID = fm.fk_folderID
LEFT JOIN Media m ON fm.fk_media = m.pk_media
WHERE f.fk_userID = @UserID
GROUP BY
f.pk_folderID,
f.folderName
UPDATE (based on the additional request):
To include a sort of TOP 1 Media.Name into the result set, the above query could be changed like this:
SELECT
f.pk_folderID,
f.folderName,
VideoCount = COUNT(CASE m.MediaType WHEN 'Video' THEN 1 END),
ImageCount = COUNT(CASE m.MediaType WHEN 'Image' THEN 1 END),
MediaName = MAX(CASE fm.timestamp WHEN t.timestamp THEN m.Name END)
FROM Folders f
LEFT JOIN FolderMedia fm ON f.pk_folderID = fm.fk_folderID
LEFT JOIN Media m ON fm.fk_media = m.pk_media
LEFT JOIN (
SELECT
fk_folderID,
timestamp = MIN(timestamp)
FROM FolderMedia
GROUP BY fk_folderID
) t ON fm.fk_folderID = t.fk_folderID AND fm.timestamp = t.timestamp
WHERE f.fk_userID = @UserID
GROUP BY
f.pk_folderID,
f.folderName
If the minimal FolderMedia.timestamp value is not unique within a folder, the resulting Media.Name is decided by alphabetical order: because of MAX(), the query above picks the alphabetically last name among the tied rows.

Get all the data you need from the Folders table, LEFT JOIN it with FolderMedia and Media, and use SUM with a CASE inside to count the videos and images, for example:
SUM(CASE WHEN m.MediaType = 'video' THEN 1 ELSE 0 END) AS videoCount

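Count each media type with a correlated subquery in the select list:
SELECT f.folderName,
(SELECT COUNT(*) FROM FolderMedia fm JOIN Media m ON fm.fk_media = m.pk_media WHERE fm.fk_folderID = f.pk_folderID AND m.MediaType = 'image') AS nrImg,
(SELECT COUNT(*) FROM FolderMedia fm JOIN Media m ON fm.fk_media = m.pk_media WHERE fm.fk_folderID = f.pk_folderID AND m.MediaType = 'video') AS nrVideo
FROM Folders f
WHERE f.fk_userID = @UserID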

Here is a query with some example data. Hope it helps.
declare @Folders table (pk_folderID int, folderName varchar(32), fk_userID int)
declare @Media table (pk_media int, name varchar(50), type varchar(32))
declare @FolderMedia table (fk_folderID int, fk_media int)
insert into @Folders values (1, 'Folder1', 1000)
insert into @Folders values (2, 'Folder2', 1000)
insert into @Folders values (3, 'Folder1', 2000)
insert into @Folders values (4, 'Folder1', 2000)
insert into @Media values (1, 'graph.jpg', 'image')
insert into @Media values (2, 'timer.jpg', 'image')
insert into @Media values (3, 'timer1.jpg', 'image')
insert into @Media values (4, 'harry_potter.mpeg', 'video')
insert into @Media values (5, 'harry_potter1.mpeg', 'video')
insert into @Media values (6, 'harry_potter2.mpeg', 'video')
insert into @FolderMedia values (1, 1)
insert into @FolderMedia values (1, 3)
insert into @FolderMedia values (1, 6)
insert into @FolderMedia values (2, 2)
insert into @FolderMedia values (2, 4)
select folderName, fk_userID, imageData.imgCount, videoData.videoCount from
@Folders
left outer join
(
select fk_folderID, COUNT(*) as imgCount
from @FolderMedia
inner join @Media
on fk_media = pk_media
and type = 'image'
group by fk_folderID
) as imageData
on imageData.fk_folderID = pk_folderID
left outer join
(
select fk_folderID, COUNT(*) as videoCount
from @FolderMedia
inner join @Media
on fk_media = pk_media
and type = 'video'
group by fk_folderID
) as videoData
on videoData.fk_folderID = pk_folderID
where fk_userID = 1000
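Here is a minimal sketch of the single-pass conditional COUNT from the first answer, run on the sample data from this answer through Python's sqlite3. This is an assumption: it uses SQLite instead of SQL Server 2000, with the media type stored in a column named `type` as in this answer's table variables. The helper `media_counts` is hypothetical. The derived-table joins above return NULL for a folder with no images or videos; COUNT over the CASE returns 0 instead:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Folders (pk_folderID INTEGER, folderName TEXT, fk_userID INTEGER);
CREATE TABLE Media (pk_media INTEGER, name TEXT, type TEXT);
CREATE TABLE FolderMedia (fk_folderID INTEGER, fk_media INTEGER);
INSERT INTO Folders VALUES
  (1, 'Folder1', 1000), (2, 'Folder2', 1000), (3, 'Folder1', 2000), (4, 'Folder1', 2000);
INSERT INTO Media VALUES
  (1, 'graph.jpg', 'image'), (2, 'timer.jpg', 'image'), (3, 'timer1.jpg', 'image'),
  (4, 'harry_potter.mpeg', 'video'), (5, 'harry_potter1.mpeg', 'video'),
  (6, 'harry_potter2.mpeg', 'video');
INSERT INTO FolderMedia VALUES (1, 1), (1, 3), (1, 6), (2, 2), (2, 4);
""")

# Single pass: LEFT JOIN everything, then count each media type with a CASE
# that yields NULL (ignored by COUNT) for any other type or for no media at all.
QUERY = """
SELECT f.pk_folderID, f.folderName,
       COUNT(CASE m.type WHEN 'image' THEN 1 END) AS ImageCount,
       COUNT(CASE m.type WHEN 'video' THEN 1 END) AS VideoCount
FROM Folders f
LEFT JOIN FolderMedia fm ON f.pk_folderID = fm.fk_folderID
LEFT JOIN Media m ON fm.fk_media = m.pk_media
WHERE f.fk_userID = ?
GROUP BY f.pk_folderID, f.folderName
ORDER BY f.pk_folderID
"""

def media_counts(user_id):
    return conn.execute(QUERY, (user_id,)).fetchall()

print(media_counts(1000))  # [(1, 'Folder1', 2, 1), (2, 'Folder2', 1, 1)]
print(media_counts(2000))  # [(3, 'Folder1', 0, 0), (4, 'Folder1', 0, 0)]
```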
