Can't write an SQL query with the RANK() function

In my application I need to retrieve the best file for each object from the database, using the following rules:
the file that has the most upvotes - priority 1
the file that has the most comments - priority 2
If none of the files has any upvotes or comments, just pick a random one.
My tables:
CREATE TABLE "FILES"
( "ID" NUMBER,
"OBJ_ID" NUMBER,
"NAME" VARCHAR2(30 BYTE)
) ;
CREATE TABLE "UPVOTES"
( "ID" NUMBER,
"TO_ID" NUMBER,
"TO_TYPE" NUMBER
) ;
COMMENT ON COLUMN "UPVOTES"."TO_TYPE" IS '0 obj, 1 file, 2 comment';
CREATE TABLE "COMMENTS"
( "ID" NUMBER,
"OBJ_ID" NUMBER,
"CONTENT" VARCHAR2(20 BYTE),
"TO_TYPE" NUMBER,
"TO_ID" NUMBER
) ;
COMMENT ON COLUMN "COMMENTS"."TO_TYPE" IS '0 object, 1 file';
Insert into FILES (ID,OBJ_ID,NAME) values ('1','1','best file for obj id = 1');
Insert into FILES (ID,OBJ_ID,NAME) values ('2','1','file obj1');
Insert into FILES (ID,OBJ_ID,NAME) values ('3','1','file obj1');
Insert into FILES (ID,OBJ_ID,NAME) values ('4','2','best file for obj id = 2');
Insert into FILES (ID,OBJ_ID,NAME) values ('5','2','file obj2');
Insert into FILES (ID,OBJ_ID,NAME) values ('6','3','only one file obj 3');
Insert into FILES (ID,OBJ_ID,NAME) values ('7','4','probilem file obj 4');
Insert into UPVOTES (ID,TO_ID,TO_TYPE) values ('1','1','1');
Insert into UPVOTES (ID,TO_ID,TO_TYPE) values ('2','1','1');
Insert into UPVOTES (ID,TO_ID,TO_TYPE) values ('3','7','0');
Insert into UPVOTES (ID,TO_ID,TO_TYPE) values ('4','2','0');
Insert into UPVOTES (ID,TO_ID,TO_TYPE) values ('5','2','0');
Insert into UPVOTES (ID,TO_ID,TO_TYPE) values ('6','2','0');
Insert into COMMENTS (ID,OBJ_ID,CONTENT,TO_TYPE,TO_ID) values ('1','1','comment 1','1','2');
Insert into COMMENTS (ID,OBJ_ID,CONTENT,TO_TYPE,TO_ID) values ('2','1','comment 2','1','2');
Insert into COMMENTS (ID,OBJ_ID,CONTENT,TO_TYPE,TO_ID) values ('3','2','comment 3','1','4');
My SQL query:
SELECT obj_id, name FROM (
SELECT obj_id, name, rank, ROW_NUMBER() OVER (PARTITION BY obj_id ORDER BY rank) rownumb FROM (
SELECT f.obj_id, f.name, RANK() OVER (PARTITION BY f.obj_id ORDER BY COUNT(v.id) DESC, COUNT(DISTINCT com.id) DESC) rank
FROM files f
LEFT OUTER JOIN upvotes v
ON f.id = v.to_id
LEFT OUTER JOIN comments com
ON f.id = com.to_id
WHERE (v.to_type = 1 OR v.to_type IS NULL)
AND (com.to_type = 1 OR com.to_type IS NULL)
GROUP BY f.obj_id, f.name
)
)
WHERE rownumb = 1;
Expected result:
obj_id file name
1 best file for obj id = 1
2 best file for obj id = 2
3 only one file obj 3
4 probilem file obj 4
The problem is with the line:
(v.to_type = 1 OR v.to_type IS NULL)
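It fails because some upvotes for objects (TO_TYPE = 0) have the same TO_ID as a file ID. The joined rows for those files do not pass the filter, so the files drop out of the result entirely (file 7, the only file for obj_id 4, disappears), but I still need to count the upvotes for files (TO_TYPE = 1).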
Can somebody help me to figure it out?
I use Oracle Database 11g XE R2.

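Move the to_type conditions into the ON clauses, so that they filter only the joined upvote and comment rows instead of discarding whole files. Replace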
FROM files f
LEFT OUTER JOIN upvotes v
ON f.id = v.to_id
LEFT OUTER JOIN comments com
ON f.id = com.to_id
WHERE (v.to_type = 1 OR v.to_type IS NULL)
AND (com.to_type = 1 OR com.to_type IS NULL)
with
FROM files f
LEFT OUTER JOIN upvotes v ON f.id = v.to_id AND v.to_type = 1
LEFT OUTER JOIN comments com ON f.id = com.to_id AND com.to_type = 1
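To check the fix, here is a minimal sketch that loads the question's sample data into SQLite through Python's built-in sqlite3 module (an assumption: SQLite 3.25 or later for window functions, not Oracle 11g). It first runs the original WHERE-based filter, which loses obj_id 4, and then the ON-based version. Two small adjustments are my own, not part of the answer: it groups by f.id instead of by name (files 2 and 3 share the name 'file obj1'), and it uses COUNT(DISTINCT v.id), because two LEFT JOINs multiply upvote rows by comment rows. A single ROW_NUMBER() replaces the RANK()/ROW_NUMBER() pair; ties are broken arbitrarily rather than randomly.

```python
import sqlite3

# In-memory copy of the question's tables and sample rows (SQLite types).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE files (id INTEGER, obj_id INTEGER, name TEXT);
CREATE TABLE upvotes (id INTEGER, to_id INTEGER, to_type INTEGER);
CREATE TABLE comments (id INTEGER, obj_id INTEGER, content TEXT,
                       to_type INTEGER, to_id INTEGER);
INSERT INTO files VALUES
  (1, 1, 'best file for obj id = 1'), (2, 1, 'file obj1'), (3, 1, 'file obj1'),
  (4, 2, 'best file for obj id = 2'), (5, 2, 'file obj2'),
  (6, 3, 'only one file obj 3'), (7, 4, 'probilem file obj 4');
INSERT INTO upvotes VALUES (1, 1, 1), (2, 1, 1), (3, 7, 0),
                           (4, 2, 0), (5, 2, 0), (6, 2, 0);
INSERT INTO comments VALUES (1, 1, 'comment 1', 1, 2), (2, 1, 'comment 2', 1, 2),
                            (3, 2, 'comment 3', 1, 4);
""")

# Filter in WHERE: rows joined to object upvotes (to_type = 0) are discarded,
# so file 7, and with it obj_id 4, vanishes from the result.
where_filtered = [r[0] for r in conn.execute("""
SELECT DISTINCT f.obj_id
FROM files f
LEFT OUTER JOIN upvotes v ON f.id = v.to_id
LEFT OUTER JOIN comments com ON f.id = com.to_id
WHERE (v.to_type = 1 OR v.to_type IS NULL)
  AND (com.to_type = 1 OR com.to_type IS NULL)
ORDER BY f.obj_id
""")]

# Filter in ON: a file without matching rows keeps one NULL-extended row.
best = conn.execute("""
SELECT obj_id, name FROM (
  SELECT obj_id, name,
         ROW_NUMBER() OVER (PARTITION BY obj_id
                            ORDER BY upvote_cnt DESC, comment_cnt DESC) AS rn
  FROM (
    SELECT f.id, f.obj_id, f.name,
           COUNT(DISTINCT v.id) AS upvote_cnt,
           COUNT(DISTINCT com.id) AS comment_cnt
    FROM files f
    LEFT OUTER JOIN upvotes v ON f.id = v.to_id AND v.to_type = 1
    LEFT OUTER JOIN comments com ON f.id = com.to_id AND com.to_type = 1
    GROUP BY f.id, f.obj_id, f.name
  ) AS counts
) AS ranked
WHERE rn = 1
ORDER BY obj_id
""").fetchall()

print(where_filtered)  # [1, 2, 3]
for obj_id, name in best:
    print(obj_id, name)
```

The four printed names match the expected result in the question.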

Related

Insert records from two tables that match

I have the following tables:
CREATE TABLE forms
(
ID INT NOT NULL,
NAME TEXT NOT NULL,
TITLE TEXT NOT NULL
);
CREATE TABLE new_forms
(
ID INT NOT NULL,
NAME TEXT NULL,
TITLE TEXT NULL
);
INSERT INTO forms VALUES (0, 'test', 'test');
INSERT INTO new_forms VALUES (0, 'new_test', NULL);
And I'm using the following query:
INSERT INTO forms(id, name, title)
SELECT
1, COALESCE(nf.name, f.name), COALESCE(nf.title, f.title)
FROM
forms f
LEFT OUTER JOIN
new_forms nf ON nf.id = f.id;
SELECT * FROM forms;
The idea is to add both matching rows to the table.
In this example, these two new records should be added:
1 test test
1 new_test test
But it's only adding the last one.
I have tried all the join types, and none of them worked.
Fiddle
Thanks
Your join produces one row per matching pair of rows, so here it gives you only one row. If you need two rows, combine two SELECTs with UNION ALL:
INSERT INTO forms(id, name, title)
SELECT
1, COALESCE(nf.name, f.name), COALESCE(nf.title, f.title)
FROM
forms f
LEFT OUTER JOIN
new_forms nf ON nf.id = f.id
UNION ALL
SELECT
1, COALESCE(f.name, nf.name), COALESCE(nf.title, f.title)
FROM
forms f
LEFT OUTER JOIN
new_forms nf ON nf.id = f.id;
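A quick way to confirm that the UNION ALL version inserts both rows is to run it against SQLite from Python's sqlite3 module. This is a sketch under that assumption (the question's fiddle engine may differ); the only schema change is dropping the explicit NULL column constraint, since nullable is the default anyway.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE forms (ID INT NOT NULL, NAME TEXT NOT NULL, TITLE TEXT NOT NULL);
CREATE TABLE new_forms (ID INT NOT NULL, NAME TEXT, TITLE TEXT);
INSERT INTO forms VALUES (0, 'test', 'test');
INSERT INTO new_forms VALUES (0, 'new_test', NULL);

-- Branch 1 prefers the new name, branch 2 keeps the old one; both fall back
-- to the old title because new_forms.title is NULL.
INSERT INTO forms (id, name, title)
SELECT 1, COALESCE(nf.name, f.name), COALESCE(nf.title, f.title)
FROM forms f LEFT OUTER JOIN new_forms nf ON nf.id = f.id
UNION ALL
SELECT 1, COALESCE(f.name, nf.name), COALESCE(nf.title, f.title)
FROM forms f LEFT OUTER JOIN new_forms nf ON nf.id = f.id;
""")

rows = conn.execute("SELECT id, name, title FROM forms ORDER BY id, name").fetchall()
for row in rows:
    print(row)
```

The table ends up with the original row plus the two new rows with id 1.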

How do I pull all results from a table based on another table's many-to-one relationship?

Setup
Sorry for the poorly phrased question. I'm not sure how to phrase it better. Feel free to try your hand at it if you are more versed in sql phrasing.
I have 3 related tables.
Person => person_id, name, etc
Cases => case_id, person_id, incedent_date, etc
Files => file_id, case_id, file_path, etc
Problem
For a given case_id I want to pull all file_ids for the same person.
Requirements:
1 query.
without duplicates.
without using UNIQUE/DISTINCT flag.
without changing table structure.
e.g. Bob has 2 cases, auto and house.
He has 10 files on each case.
I have the case_id for auto.
I want the files for both auto and house (20 files).
My attempt
This returns all files for all cases.
SELECT
f.file_id AS id
FROM files f
LEFT JOIN Cases c1 ON f.case_id = c1.case_id
LEFT JOIN Cases c2 ON f.case_id = c2.case_id
WHERE (f.case_id = 3566 OR c1.person_id = c2.person_id)
AND f.active = 1
ORDER BY f.upload_date ASC
This returns files for the given case only:
SELECT
f.file_id AS id
FROM files f
LEFT JOIN Cases c1 ON f.case_id = c1.case_id
LEFT JOIN Cases c2 ON f.case_id = c2.case_id
WHERE (f.case_id = 3566 OR (c1.case_id = 3566 AND c1.person_id = c2.person_id))
AND f.active = 1
ORDER BY f.upload_date ASC
This returns duplicate values and seems to pull only the given case:
SELECT
f.file_id AS id
FROM files f
LEFT JOIN Cases c1 ON f.case_id = c1.case_id
LEFT JOIN Cases c2 ON c1.person_id = c2.person_id
WHERE f.case_id = 3566
AND f.active = 1
ORDER BY f.upload_date ASC
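I hope this is what you want. Join each file to its own case (c1), then join c1 to the given case (c2, here case_id = 2) on person_id, so that only files belonging to the same person remain: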
Create table #Person (person_id int, name varchar(10))
Insert into #Person values (1,'Ajay')
Insert into #Person values (2,'Vijay')
Create table #Cases (case_id int, person_id int)
Insert into #Cases values (1,1)
Insert into #Cases values (2,1)
Insert into #Cases values (3,1)
Insert into #Cases values (4,2)
Create table #Files (file_id int, case_id int)
Insert into #Files values (1,1)
Insert into #Files values (2,1)
Insert into #Files values (3,1)
Insert into #Files values (4,2)
Insert into #Files values (5,4)
SELECT
f.file_id AS id
FROM #files f
LEFT JOIN #Cases c1 ON f.case_id = c1.case_id
LEFT JOIN #Cases c2 ON c1.person_id = c2.person_id and c2.case_id = 2
where c2.case_id is not null
--OR
SELECT *,
f.file_id AS id
FROM #files f
LEFT JOIN #Cases c1 ON f.case_id = c1.case_id
INNER JOIN #Cases c2 ON c1.person_id = c2.person_id and c2.case_id = 2
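The same join can be checked in SQLite via Python's sqlite3 module. In this sketch the T-SQL temp tables lose their # prefix, the hard-coded case_id 2 becomes a query parameter, and the question's active/upload_date filters are left out because the sample tables have no such columns. The comment's claim that no DISTINCT is needed rests on the assumption that case_id is unique in cases.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE person (person_id INTEGER, name TEXT);
CREATE TABLE cases (case_id INTEGER, person_id INTEGER);
CREATE TABLE files (file_id INTEGER, case_id INTEGER);
INSERT INTO person VALUES (1, 'Ajay'), (2, 'Vijay');
INSERT INTO cases VALUES (1, 1), (2, 1), (3, 1), (4, 2);
INSERT INTO files VALUES (1, 1), (2, 1), (3, 1), (4, 2), (5, 4);
""")

def files_for_same_person(case_id):
    # c1 = the case each file belongs to; c2 = the given case. Joining them on
    # person_id keeps only files of that case's owner. c2 matches at most one
    # row per file (assuming unique case_id), so no DISTINCT is needed.
    return [r[0] for r in conn.execute("""
        SELECT f.file_id
        FROM files f
        INNER JOIN cases c1 ON f.case_id = c1.case_id
        INNER JOIN cases c2 ON c1.person_id = c2.person_id AND c2.case_id = ?
        ORDER BY f.file_id
    """, (case_id,))]

print(files_for_same_person(2))  # [1, 2, 3, 4]
print(files_for_same_person(4))  # [5]
```

Using INNER JOIN for c2 is equivalent to the answer's LEFT JOIN plus "where c2.case_id is not null", just stated directly.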

Dynamically update table with column from another table

I have a table customer like this:
CREATE TABLE tbl_customer (
id INTEGER,
name VARCHAR(16),
voucher VARCHAR(16)
);
and a voucher table like this:
CREATE TABLE tbl_voucher (
id INTEGER,
code VARCHAR(16)
);
Now imagine that the customer table always has rows with id and name filled in, but the voucher column has to be filled in periodically from the tbl_voucher table.
Important: every voucher may only be assigned to one specific customer (i.e. must be unique)
I wrote a query like this:
UPDATE tbl_customer
SET voucher = (
SELECT code
FROM tbl_voucher
WHERE code NOT IN (
SELECT voucher
FROM tbl_customer
WHERE voucher IS NOT NULL
)
LIMIT 1
)
WHERE voucher IS NULL;
However this is not working as expected, since the part that looks for an unused voucher is executed once and said voucher is then applied to every customer.
Any ideas on how I can solve this without using programming structures such as loops?
Also, some example data so you can imagine what I would like to happen:
INSERT INTO tbl_customer VALUES (1, 'Sara', 'ABC');
INSERT INTO tbl_customer VALUES (2, 'Simon', 'DEF');
INSERT INTO tbl_customer VALUES (3, 'Andy', NULL);
INSERT INTO tbl_customer VALUES (4, 'Alice', NULL);
INSERT INTO tbl_voucher VALUES (1, 'ABC');
INSERT INTO tbl_voucher VALUES (2, 'LOL');
INSERT INTO tbl_voucher VALUES (3, 'ZZZ');
INSERT INTO tbl_voucher VALUES (4, 'BBB');
INSERT INTO tbl_voucher VALUES (5, 'CCC');
After the query is executed, I'd expect Andy to get the voucher LOL and Alice to get ZZZ.
I am going to guess this is MySQL. The answer is that this is a pain. The following pairs the customers that need a voucher with the unused vouchers in a select, numbering each side with user variables:
select c.*, v.code
from (select c.*, (@rnc := @rnc + 1) as rn -- number customers without a voucher
from tbl_customer c cross join
(select @rnc := 0) params
where c.voucher is null
) c join
(select v.*, (@rnv := @rnv + 1) as rn -- number vouchers nobody holds yet
from tbl_voucher v cross join
(select @rnv := 0) params
where not exists (select 1 from tbl_customer c where c.voucher = v.code)
) v
on c.rn = v.rn;
You can now use this for the update:
update tbl_customer c join
(select c.*, v.code
from (select c.*, (@rnc := @rnc + 1) as rn
from tbl_customer c cross join
(select @rnc := 0) params
where c.voucher is null
) c join
(select v.*, (@rnv := @rnv + 1) as rn
from tbl_voucher v cross join
(select @rnv := 0) params
where not exists (select 1 from tbl_customer c where c.voucher = v.code)
) v
on c.rn = v.rn
) cv
on c.id = cv.id -- assumes customer ids are unique
set c.voucher = cv.code;
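On engines with window functions (MySQL 8.0 and later, PostgreSQL, SQLite 3.25+), ROW_NUMBER() can do the numbering instead of user variables, and its ORDER BY pins down which customer gets which voucher; without an ORDER BY, the variable version does not guarantee that pairing. The sketch below shows that variant in SQLite via Python's sqlite3 module. It assumes distinct customer ids, and it materializes the pairs into a temporary table (a name I made up, assignment) before updating, so the update never reads rows it has just changed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tbl_customer (id INTEGER, name VARCHAR(16), voucher VARCHAR(16));
CREATE TABLE tbl_voucher (id INTEGER, code VARCHAR(16));
-- Assumption: customer ids are distinct.
INSERT INTO tbl_customer VALUES (1, 'Sara', 'ABC'), (2, 'Simon', 'DEF'),
                                (3, 'Andy', NULL), (4, 'Alice', NULL);
INSERT INTO tbl_voucher VALUES (1, 'ABC'), (2, 'LOL'), (3, 'ZZZ'),
                               (4, 'BBB'), (5, 'CCC');

-- Step 1: number customers without a voucher and unused vouchers separately,
-- then pair them up by that number.
CREATE TEMP TABLE assignment AS
SELECT cu.id, vo.code
FROM (SELECT id, ROW_NUMBER() OVER (ORDER BY id) AS rn
      FROM tbl_customer WHERE voucher IS NULL) AS cu
JOIN (SELECT code, ROW_NUMBER() OVER (ORDER BY id) AS rn
      FROM tbl_voucher v
      WHERE NOT EXISTS (SELECT 1 FROM tbl_customer c WHERE c.voucher = v.code)) AS vo
  ON cu.rn = vo.rn;

-- Step 2: apply the precomputed pairs; each customer gets a different code.
UPDATE tbl_customer
SET voucher = (SELECT a.code FROM assignment a WHERE a.id = tbl_customer.id)
WHERE id IN (SELECT id FROM assignment);
""")

result = conn.execute("SELECT name, voucher FROM tbl_customer ORDER BY id").fetchall()
print(result)
```

With the sample data, Andy receives LOL and Alice receives ZZZ, as the question expects.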

Counting rows of a subgroup while ignoring duplicates

I can't find a way to describe my problem in an abstract and general manner, so I'll just provide a minimal example:
Let's say I have these 3 simple tables:
CREATE TABLE Document(
[Id] int IDENTITY(1, 1) NOT NULL PRIMARY KEY,
[Title] nvarchar(MAX),
[Patient] nvarchar(MAX)
);
CREATE TABLE Link(
DocumentId INT FOREIGN KEY REFERENCES Document(Id),
Text nvarchar(max)
);
CREATE TABLE ReadStatus(
DocumentId INT FOREIGN KEY REFERENCES Document(Id),
IsRead Bit NOT NULL,
UserId Int NOT NULL
);
We have a set of documents
A document can have 0 or more links
Documents can be read by users - this is tracked by the ReadStatus table, which associates a user with a document, and where IsRead=1 means the document has been read by that user and IsRead=0 means it hasn't been read by that user yet.
If, for document X and user A, a row does not exist in the ReadStatus table, we assume User A hasn't read document X yet.
Now, I need to run a query to select all patients. For each patient, I need the total number of documents available AND the number of documents that have already been read (i.e. IsRead=1). This is what I have so far:
SELECT d.Patient,
COUNT(DISTINCT d.Id) AS DocumentCount,
COUNT(NULLIF(rs.IsRead,0)) AS ReadDocumentCount,
COUNT(*) OVER () AS TotalPatientCount
FROM Document d
LEFT OUTER JOIN ReadStatus AS rs ON d.Id = rs.DocumentId AND rs.UserId = 123
INNER JOIN Link AS l ON d.Id = l.DocumentId AND l.Text IN ('Link W', 'Link X', 'Link T', 'Link Z')
GROUP BY d.Patient
The problem happens when a document (that has already been read) has more than one link. If that document has 3 links, the cartesian product produced by the INNER JOIN with the Link table will cause the ReadDocumentCount selection to be 3 instead of 1.
In other words, given this data:
INSERT INTO Document(Title, Patient) VALUES('Doc A', 'Mike')
INSERT INTO Document(Title, Patient) VALUES('Doc B', 'Mike')
INSERT INTO Link(DocumentId, Text) VALUES(1, N'Link W')
INSERT INTO Link(DocumentId, Text) VALUES(1, N'Link X')
INSERT INTO Link(DocumentId, Text) VALUES(1, N'Link Y')
INSERT INTO Link(DocumentId, Text) VALUES(2, N'Link Z')
INSERT INTO ReadStatus(DocumentID, IsRead, UserId) VALUES(1, 1, 123)
INSERT INTO ReadStatus(DocumentID, IsRead, UserId) VALUES(2, 0, 123)
I'm getting this as a result:
Patient DocumentCount ReadDocumentCount TotalPatientCount
Mike 2 3 1
Whereas this is what I want:
Patient DocumentCount ReadDocumentCount TotalPatientCount
Mike 2 1 1
SQL fiddle: http://sqlfiddle.com/#!6/e06bf/3
You can use COUNT(DISTINCT) conditionally as well:
SELECT d.Patient,
COUNT(DISTINCT d.Id) AS DocumentCount,
COUNT(DISTINCT (CASE WHEN rs.IsRead <> 0 THEN d.id END)) AS ReadDocumentCount,
COUNT(*) OVER () AS TotalPatientCount
FROM Document d LEFT OUTER JOIN
ReadStatus rs
ON d.Id = rs.DocumentId AND rs.UserId = 123 INNER JOIN
Link l
ON d.Id = l.DocumentId AND l.Text IN ('Link W', 'Link X', 'Link T', 'Link Z')
GROUP BY d.Patient;
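Here is a minimal check of the conditional COUNT(DISTINCT ...) against the question's sample rows, run in SQLite through Python's sqlite3 module rather than SQL Server (so IDENTITY becomes INTEGER PRIMARY KEY and Bit becomes INTEGER). The TotalPatientCount window column is left out of this sketch to keep it focused on the read count.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Document (Id INTEGER PRIMARY KEY, Title TEXT, Patient TEXT);
CREATE TABLE Link (DocumentId INTEGER REFERENCES Document(Id), Text TEXT);
CREATE TABLE ReadStatus (DocumentId INTEGER REFERENCES Document(Id),
                         IsRead INTEGER NOT NULL, UserId INTEGER NOT NULL);
INSERT INTO Document (Title, Patient) VALUES ('Doc A', 'Mike'), ('Doc B', 'Mike');
INSERT INTO Link (DocumentId, Text) VALUES
  (1, 'Link W'), (1, 'Link X'), (1, 'Link Y'), (2, 'Link Z');
INSERT INTO ReadStatus (DocumentId, IsRead, UserId) VALUES (1, 1, 123), (2, 0, 123);
""")

rows = conn.execute("""
SELECT d.Patient,
       COUNT(DISTINCT d.Id) AS DocumentCount,
       -- Unread rows map to NULL, which COUNT ignores; DISTINCT collapses the
       -- duplicate rows created when one document matches several links.
       COUNT(DISTINCT CASE WHEN rs.IsRead <> 0 THEN d.Id END) AS ReadDocumentCount
FROM Document d
LEFT OUTER JOIN ReadStatus rs ON d.Id = rs.DocumentId AND rs.UserId = 123
INNER JOIN Link l ON d.Id = l.DocumentId
                 AND l.Text IN ('Link W', 'Link X', 'Link T', 'Link Z')
GROUP BY d.Patient
""").fetchall()
print(rows)  # [('Mike', 2, 1)]
```

The key design choice is counting document ids rather than rows, so the fan-out from the Link join no longer matters.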

Need assistance with SQL query

I have 4 tables that I'm trying to create a query from:
Table 1 (iuieEmployee) -> position number
Table 2 (jbEmployeeH1BInfo) -> position number, LCA number, start date
Table 3 (jbEmployeeLCA) -> LCA number
Table 4 (jbInternational) -> Main demographic table
I have a query that works fine when there's only one record in each table, but tables 2 and 3 can have multiple records. I want to find the record with the most recent start date, verify that there is a matching LCA number in the third table and a matching position number in the first table, and show any records where this isn't the case. How can I accomplish this? I currently have:
SELECT DISTINCT jbInternational.idnumber, jbInternational.lastname, jbInternational.firstname, jbInternational.midname,
jbInternational.campus, jbInternational.universityid, jbInternational.sevisid, jbInternational.citizenship,
jbInternational.immigrationstatus, jbEmployeeH1BInfo.lcaNumber AS lcaNumber1, jbEmployeeLCA.lcaNumber AS lcaNumber2
FROM (select jbEmployeeH1BInfo.idnumber, MAX(jbEmployeeH1BInfo.approvalStartDate) AS MaxDateStamp FROM [internationalservices].[dbo].jbEmployeeH1BInfo GROUP BY idnumber ) my
INNER JOIN [internationalservices].[dbo].jbEmployeeH1BInfo WITH (nolock) ON my.idnumber=jbEmployeeH1BInfo.idnumber AND my.MaxDateStamp=jbEmployeeH1BInfo.approvalStartDate
INNER JOIN [internationalservices].[dbo].jbInternational WITH (nolock) ON jbInternational.idnumber=jbEmployeeH1BInfo.idnumber
inner join [internationalservices].[dbo].jbEmployeeLCA ON jbInternational.idnumber = jbEmployeeLCA.idnumber
WHERE jbInternational.idnumber not in(
SELECT DISTINCT jbInternational.idnumber
FROM (select distinct jbEmployeeH1BInfo.idnumber, MAX(jbEmployeeH1BInfo.approvalStartDate) AS MaxDateStamp
FROM [internationalservices].[dbo].jbEmployeeH1BInfo GROUP BY idnumber ) my
INNER JOIN [internationalservices].[dbo].jbEmployeeH1BInfo WITH (nolock) ON my.idnumber=jbEmployeeH1BInfo.idnumber AND my.MaxDateStamp=jbEmployeeH1BInfo.approvalStartDate
INNER JOIN [internationalservices].[dbo].jbInternational WITH (nolock) ON jbInternational.idnumber=jbEmployeeH1BInfo.idnumber
inner join [internationalservices].[dbo].jbEmployeeLCA ON jbInternational.idnumber = jbEmployeeLCA.idnumber
AND jbEmployeeH1BInfo.lcaNumber = jbEmployeeLCA.lcaNumber)
Table Schema:
create table iuieEmployee(idnumber int, POS_NBR varchar(8));
insert into iuieEmployee values(123456, '470V13');
insert into iuieEmployee values(123457, '98X000');
insert into iuieEmployee values(123458, '98X000');
insert into iuieEmployee values(123455, '98X000');
create table jbEmployeeH1BInfo (idnumber int, approvalStartDate smalldatetime, lcaNumber varchar(20), positionNumber varchar(200));
insert into jbEmployeeH1BInfo values (123456, '07/01/2012', '1-200-3000', '98X000');
insert into jbEmployeeH1BInfo values (123456, '07/30/2013', '1-200-4000', '470V13');
insert into jbEmployeeH1BInfo values (123457, '07/01/2012', '1-200-5000', '98X000');
insert into jbEmployeeH1BInfo values (123458, '07/01/2012', '1-200-6000', '98X000');
insert into jbEmployeeH1BInfo values (123455, '07/30/2014', '1-200-7000', '98X000');
insert into jbEmployeeH1BInfo values (123455, '07/01/2012', '1-200-8000', '470V13');
create table jbEmployeeLCA (idnumber int, lcaNumber varchar(20));
insert into jbEmployeeLCA values (123456, '1-200-3000');
insert into jbEmployeeLCA values (123456, '1-200-4111');
insert into jbEmployeeLCA values (123457, '1-200-5000');
insert into jbEmployeeLCA values (123458, '1-200-6000');
insert into jbEmployeeLCA values (123455, '1-200-7000');
insert into jbEmployeeLCA values (123455, '1-200-8000');
create table jbInternational(idnumber int);
insert into jbInternational values(123456);
insert into jbInternational values(123457);
insert into jbInternational values(123458);
insert into jbInternational values(123455);
It should return only 1 line:
123456, 07/30/2013, '1-200-4000'
but it is returning two lines instead:
123456, 07/30/2013, '1-200-4000' (not matching '1-200-4111')
123456, 07/30/2013, '1-200-4000' (not matching '1-200-3000')
It shouldn't return the second row, because the position number with the -3000 LCA number doesn't have the most recent date.
Your explanation is hard to understand. I guess if you could explain it well, then you could probably write the query yourself. Here's what I think you meant:
Employee contains the main records.
You want to find all idnumbers such that
idnumber is in International
the H1BInfo record with the most recent approvalStartDate does not have an LCA number matching the LCA record
The first thing to do is to simplify that H1BInfo table. We are only looking for the rows with the most recent approvalStartDate. We can do that by partitioning by idnumber and ordering by approvalStartDate:
with rankedH1BInfo as (
SELECT *,
ROW_NUMBER() OVER (PARTITION BY jbEmployeeH1BInfo.idnumber
ORDER BY jbEmployeeH1BInfo.approvalStartDate desc) as r
FROM [internationalservices].[dbo].jbEmployeeH1BInfo
)
Let's only get the first row of each partition:
, MostRecentH1BInfo as (
SELECT * FROM rankedH1BInfo
WHERE r = 1
)
Now we can do the join to find all the good ones:
, goodIDs as (
SELECT i.idnumber
FROM [internationalservices].[dbo].jbInternational i WITH (NOLOCK)
JOIN [internationalservices].[dbo].jbEmployeeLCA l WITH (NOLOCK) on l.idnumber = i.idnumber
JOIN MostRecentH1BInfo h WITH (NOLOCK) on h.idnumber = i.idnumber
JOIN iuieEmployee e WITH (NOLOCK) on e.POS_NBR = h.positionNumber
WHERE h.lcaNumber = l.lcaNumber
)
To put it all together and get the ones where this is false:
with rankedH1BInfo as (
SELECT *,
ROW_NUMBER() OVER (PARTITION BY jbEmployeeH1BInfo.idnumber
ORDER BY jbEmployeeH1BInfo.approvalStartDate desc) as r
FROM [internationalservices].[dbo].jbEmployeeH1BInfo
), MostRecentH1BInfo as (
SELECT * FROM rankedH1BInfo
WHERE r = 1
), goodIDs as (
SELECT i.idnumber
FROM [internationalservices].[dbo].jbInternational i WITH (NOLOCK)
JOIN [internationalservices].[dbo].jbEmployeeLCA l WITH (NOLOCK) on l.idnumber = i.idnumber
JOIN MostRecentH1BInfo h WITH (NOLOCK) on h.idnumber = i.idnumber
JOIN iuieEmployee e WITH (NOLOCK) on e.POS_NBR = h.positionNumber
WHERE h.lcaNumber = l.lcaNumber
)
SELECT DISTINCT jbInternational.idnumber, jbInternational.lastname, jbInternational.firstname, jbInternational.midname,
jbInternational.campus, jbInternational.universityid, jbInternational.sevisid, jbInternational.citizenship,
jbInternational.immigrationstatus, jbEmployeeH1BInfo.lcaNumber AS lcaNumber1, jbEmployeeLCA.lcaNumber AS lcaNumber2
FROM (select jbEmployeeH1BInfo.idnumber, MAX(jbEmployeeH1BInfo.approvalStartDate) AS MaxDateStamp FROM [internationalservices].[dbo].jbEmployeeH1BInfo GROUP BY idnumber ) my
INNER JOIN [internationalservices].[dbo].jbEmployeeH1BInfo WITH (nolock) ON my.idnumber=jbEmployeeH1BInfo.idnumber AND my.MaxDateStamp=jbEmployeeH1BInfo.approvalStartDate
INNER JOIN [internationalservices].[dbo].jbInternational WITH (nolock) ON jbInternational.idnumber=jbEmployeeH1BInfo.idnumber
inner join [internationalservices].[dbo].jbEmployeeLCA ON jbInternational.idnumber = jbEmployeeLCA.idnumber
WHERE jbInternational.idnumber not in (select idnumber from goodIDs)
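The CTE chain (rank the H1B rows, keep the newest per idnumber, collect the good ids, then exclude them) can be exercised on the question's sample data with SQLite via Python's sqlite3 module. This sketch is a simplification under stated assumptions: dates are stored as ISO text so they sort correctly, the schema prefixes and NOLOCK hints are dropped, the position check is assumed to be against the same employee's POS_NBR, and it selects only from the newest H1B row, so each failing idnumber appears once.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE iuieEmployee (idnumber INTEGER, POS_NBR TEXT);
CREATE TABLE jbEmployeeH1BInfo (idnumber INTEGER, approvalStartDate TEXT,
                                lcaNumber TEXT, positionNumber TEXT);
CREATE TABLE jbEmployeeLCA (idnumber INTEGER, lcaNumber TEXT);
CREATE TABLE jbInternational (idnumber INTEGER);
INSERT INTO iuieEmployee VALUES (123456, '470V13'), (123457, '98X000'),
                                (123458, '98X000'), (123455, '98X000');
-- ISO dates, so that text comparison orders them chronologically.
INSERT INTO jbEmployeeH1BInfo VALUES
  (123456, '2012-07-01', '1-200-3000', '98X000'),
  (123456, '2013-07-30', '1-200-4000', '470V13'),
  (123457, '2012-07-01', '1-200-5000', '98X000'),
  (123458, '2012-07-01', '1-200-6000', '98X000'),
  (123455, '2014-07-30', '1-200-7000', '98X000'),
  (123455, '2012-07-01', '1-200-8000', '470V13');
INSERT INTO jbEmployeeLCA VALUES (123456, '1-200-3000'), (123456, '1-200-4111'),
  (123457, '1-200-5000'), (123458, '1-200-6000'),
  (123455, '1-200-7000'), (123455, '1-200-8000');
INSERT INTO jbInternational VALUES (123456), (123457), (123458), (123455);
""")

rows = conn.execute("""
WITH rankedH1BInfo AS (
  SELECT *, ROW_NUMBER() OVER (PARTITION BY idnumber
                               ORDER BY approvalStartDate DESC) AS r
  FROM jbEmployeeH1BInfo
), MostRecentH1BInfo AS (
  SELECT * FROM rankedH1BInfo WHERE r = 1
), goodIDs AS (
  SELECT i.idnumber
  FROM jbInternational i
  JOIN jbEmployeeLCA l ON l.idnumber = i.idnumber
  JOIN MostRecentH1BInfo h ON h.idnumber = i.idnumber
  -- Assumption: the position must match the same employee's POS_NBR.
  JOIN iuieEmployee e ON e.idnumber = i.idnumber AND e.POS_NBR = h.positionNumber
  WHERE h.lcaNumber = l.lcaNumber
)
SELECT h.idnumber, h.approvalStartDate, h.lcaNumber
FROM MostRecentH1BInfo h
JOIN jbInternational i ON i.idnumber = h.idnumber
WHERE h.idnumber NOT IN (SELECT idnumber FROM goodIDs)
ORDER BY h.idnumber
""").fetchall()
print(rows)  # [(123456, '2013-07-30', '1-200-4000')]
```

Because the final SELECT does not join jbEmployeeLCA again, the non-matching LCA rows cannot multiply the output, which gives the single line the question asks for.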