MySQL: multiple grouping - sql

So I have an example table called items with the following columns:
item_id (int)
person_id (int)
item_name (varchar)
item_type (varchar) - examples: "news", "event", "document"
item_date (datetime)
...and a table person with the following columns: "person_id", "person_name".
I was hoping to display a list of the top 2 submitters (+ the COUNT() of items submitted) in a given time period for each item_type. Here's basically what I was hoping the MySQL output would look like:
person_name | item_type | item_count
Steve Jobs | document | 11
Bill Gates | document | 6
John Doe | event | 4
John Smith | event | 2
Bill Jones | news | 24
Bill Nye | news | 21
How is this possible without making a separate query for each item_type? Thanks in advance!

SELECT item_type, person_name, item_count
FROM (
    SELECT item_type, person_name, item_count,
           -- restart the rank at 1 whenever item_type changes; this relies on
           -- MySQL evaluating the select list left to right, which is not guaranteed
           @r := IF(@_item_type = item_type, @r + 1, 1) AS rc,
           @_item_type := item_type AS prev_type
    FROM (
        SELECT @r := 0,
               @_item_type := NULL
    ) vars,
    (
        SELECT i.item_type, p.person_name, COUNT(*) AS item_count
        FROM items i
        JOIN person p ON p.person_id = i.person_id
        GROUP BY
            i.item_type, p.person_id, p.person_name
        ORDER BY
            i.item_type, item_count DESC
    ) vo
) voi
WHERE rc < 3

Something like this should work:
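SELECT
p.person_name, i.item_type, COUNT(1) AS item_count
FROM
person p
JOIN items i
ON p.person_id = i.person_id
GROUP BY
p.person_id,
p.person_name,
i.item_type
HAVING
-- compare against the second-highest count for this item_type
-- (0 if the type has fewer than two submitters)
COUNT(1) >= COALESCE((
SELECT
COUNT(1)
FROM
items i2
WHERE
i2.item_type = i.item_type
GROUP BY
i2.person_id
ORDER BY
COUNT(1) DESC
LIMIT 1,1
), 0)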

I think this should do it:
SELECT person_name,item_type,count(item_id) AS item_count
FROM person
LEFT JOIN items USING (person_id)
GROUP BY person_id
The "item_type" column is going to be dodgy though, each row represents multiple items, and you're only showing the item_type from one of them. You can list all of them with "GROUP_CONCAT", that's a lot of fun.
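For comparison, on an engine with window functions (MySQL 8.0 and later, for instance) the top-N-per-group problem can be written with ROW_NUMBER() instead of user variables. The sketch below is not from any of the answers above; it runs the query against an in-memory SQLite database from Python so it can be tried as-is. The sample people and counts are made up, and the item_date filter from the question is left out (it would go in a WHERE clause inside the counts CTE).

```python
import sqlite3

# In-memory SQLite database standing in for the MySQL schema from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE person (person_id INTEGER PRIMARY KEY, person_name TEXT);
CREATE TABLE items (item_id INTEGER PRIMARY KEY, person_id INTEGER,
                    item_type TEXT);
INSERT INTO person VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cid');
""")

# Hypothetical sample data: (person_id, item_type, number of items submitted).
for person_id, item_type, n in [(1, "news", 3), (2, "news", 2), (3, "news", 1),
                                (1, "event", 1), (3, "event", 2)]:
    conn.executemany("INSERT INTO items (person_id, item_type) VALUES (?, ?)",
                     [(person_id, item_type)] * n)

TOP_N_SQL = """
WITH counts AS (
    -- one row per (item_type, person) with the number of items submitted
    SELECT p.person_name, i.item_type, COUNT(*) AS item_count
    FROM items i
    JOIN person p ON p.person_id = i.person_id
    GROUP BY i.item_type, p.person_id, p.person_name
),
ranked AS (
    -- rank submitters inside each item_type, highest count first
    SELECT person_name, item_type, item_count,
           ROW_NUMBER() OVER (PARTITION BY item_type
                              ORDER BY item_count DESC, person_name) AS rn
    FROM counts
)
SELECT person_name, item_type, item_count
FROM ranked
WHERE rn <= 2
ORDER BY item_type, item_count DESC
"""

top_submitters = conn.execute(TOP_N_SQL).fetchall()
for row in top_submitters:
    print(row)
```

ROW_NUMBER() breaks ties arbitrarily unless the ORDER BY is made deterministic (here by person_name); RANK() would keep tied submitters instead.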

Related

SELECT only rows when count=1 - without additional SELECT or/ and having

I wonder if there is a way to build a query without joins and/or a HAVING clause that returns the same result as the query below. I already found a similar question (select and count rows) but didn't find the answer.
SELECT s.ID, s.CATEGORY, s.PRODUCT, s.[DESC]
FROM SALES s
JOIN (SELECT ID, COUNT(CATEGORY) AS cnt
FROM SALES
GROUP by ID
HAVING count(CATEGORY)=1) S2 ON S.ID=S2.ID;
So the table looks like
ID | Country | Product | DESC
1 | USA | Cream | Super cream
1 | Canada | Toothpaste| Great Toothpaste
2 | Germany | Beer | Tasty Beer
and the result I would like to get is
ID | Country | Product | DESC
2 | Germany | Beer | Tasty Beer
because id=1 has 2 different countries assigned
I'm using SQL Server
In general I'm interested in the 'fastest' solution. The table is huge and I just wonder if there is a way to do it smarter.
You may want to consider this query:
select t2.id, t2.category, t2.product, t2.[desc] from (
select id, category, product,
(select count(1) from sales where id=t1.id) as ct
,[desc]
from sales t1) t2 where t2.ct = 1
You can try this query:
SELECT ID, CATEGORY, PRODUCT, [DESC]
FROM SALES s
WHERE 1 = (
SELECT COUNT(*)
FROM SALES x
WHERE x.ID = s.ID
);
One method uses window functions:
SELECT ID, CATEGORY, PRODUCT, [DESC]
FROM (SELECT s.*, COUNT(*) OVER (PARTITION BY ID) as cnt
FROM SALES s
) s
WHERE cnt = 1;
However, the fastest solution would require a unique id and an index. That would be:
select s.*
from sales s
where not exists (select 1
from sales s2
where s2.id = s.id and
s2.<unique key> <> s.<unique key>
);
This can take advantage of an index on (id, <unique key>).
Note: These formulations count rows rather than non-null category values, so they match the original query only if category is never null.
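To check that the window-function and NOT EXISTS formulations agree on the sample rows, here is a small sketch run against an in-memory SQLite database from Python rather than SQL Server. Two things are assumptions of the sketch: the DESC column is renamed descr to sidestep the reserved word, and SQLite's built-in rowid plays the role of the <unique key>.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sales (id INTEGER, country TEXT, product TEXT, descr TEXT);
INSERT INTO sales VALUES
    (1, 'USA', 'Cream', 'Super cream'),
    (1, 'Canada', 'Toothpaste', 'Great Toothpaste'),
    (2, 'Germany', 'Beer', 'Tasty Beer');
""")

# Window-function version: count the rows per id, keep ids that occur once.
window_rows = conn.execute("""
    SELECT id, country, product, descr
    FROM (SELECT s.*, COUNT(*) OVER (PARTITION BY id) AS cnt
          FROM sales s) t
    WHERE cnt = 1
""").fetchall()

# NOT EXISTS version: keep a row only if no *other* row shares its id.
# rowid is SQLite's implicit unique row key, standing in for <unique key>.
not_exists_rows = conn.execute("""
    SELECT s.id, s.country, s.product, s.descr
    FROM sales s
    WHERE NOT EXISTS (SELECT 1
                      FROM sales s2
                      WHERE s2.id = s.id
                        AND s2.rowid <> s.rowid)
""").fetchall()

print(window_rows)
print(not_exists_rows)
```

Which form is faster on a huge SQL Server table depends on the available indexes and the plan chosen, so it is worth comparing actual execution plans.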

Postgres: select all row with count of a field greater than 1

I have a table storing product price information. It looks similar to this (no is the primary key):
no name price date
1 paper 1.99 3-23
2 paper 2.99 5-25
3 paper 1.99 5-29
4 orange 4.56 4-23
5 apple 3.43 3-11
Right now I want to select all the rows whose "name" value appears more than once in the table. Basically, I want my query to return the first three rows.
I tried:
SELECT * FROM product_price_info GROUP BY name HAVING COUNT(*) > 1
but i get an error saying:
column "product_price_info.no" must appear in the GROUP BY clause or be used in an aggregate function
SELECT *
FROM product_price_info
WHERE name IN (SELECT name
FROM product_price_info
GROUP BY name HAVING COUNT(*) > 1)
Try this:
SELECT no, name, price, "date"
FROM (
SELECT no, name, price, "date",
COUNT(*) OVER (PARTITION BY name) AS cnt
FROM product_price_info ) AS t
WHERE t.cnt > 1
You can use the window version of COUNT to get the population of each name partition. Then, in an outer query, filter out name partitions having a population that is less than 2.
Window Functions are really nice for this.
SELECT p.*, count(*) OVER (PARTITION BY name) FROM product p;
For a full example:
CREATE TABLE product (no SERIAL, name text, price NUMERIC(8,2), date DATE);
INSERT INTO product(name, price, date) values
('paper', 1.99, '2017-03-23'),
('paper', 2.99, '2017-05-25'),
('paper', 1.99, '2017-05-29'),
('orange', 4.56, '2017-04-23'),
('apple', 3.43, '2017-03-11')
;
WITH report AS (
SELECT p.*, count(*) OVER (PARTITION BY name) as count FROM product p
)
SELECT * FROM report WHERE count > 1;
Gives:
no | name | price | date | count
----+--------+-------+------------+-------
1 | paper | 1.99 | 2017-03-23 | 3
2 | paper | 2.99 | 2017-05-25 | 3
3 | paper | 1.99 | 2017-05-29 | 3
(3 rows)
Self-join version: use a sub-query that returns the names that appear more than once.
select t1.*
from tablename t1
join (select name from tablename group by name having count(*) > 1) t2
on t1.name = t2.name
Basically the same as the IN/EXISTS versions; it may be a bit faster, depending on the plan the optimizer picks.
SELECT name, count(name)
FROM product_price_info
GROUP BY name
HAVING COUNT(name) > 1
LIMIT 3
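As a quick sanity check, the IN, window-function, and join variants above can be run side by side on the sample rows. The sketch below does that from Python against an in-memory SQLite database rather than Postgres (the column names "no" and "date" are quoted because they collide with keywords), and compares only the primary keys returned.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE product_price_info ("no" INTEGER PRIMARY KEY, name TEXT,
                                 price REAL, "date" TEXT);
INSERT INTO product_price_info VALUES
    (1, 'paper', 1.99, '3-23'), (2, 'paper', 2.99, '5-25'),
    (3, 'paper', 1.99, '5-29'), (4, 'orange', 4.56, '4-23'),
    (5, 'apple', 3.43, '3-11');
""")

QUERIES = {
    # filter on the set of names that occur more than once
    "in_subquery": '''
        SELECT "no" FROM product_price_info
        WHERE name IN (SELECT name FROM product_price_info
                       GROUP BY name HAVING COUNT(*) > 1)
        ORDER BY "no"
    ''',
    # attach the per-name row count to every row, then filter on it
    "window_count": '''
        SELECT "no"
        FROM (SELECT "no", COUNT(*) OVER (PARTITION BY name) AS cnt
              FROM product_price_info) t
        WHERE cnt > 1
        ORDER BY "no"
    ''',
    # join back to the duplicated names
    "join_subquery": '''
        SELECT t1."no"
        FROM product_price_info t1
        JOIN (SELECT name FROM product_price_info
              GROUP BY name HAVING COUNT(*) > 1) t2 ON t1.name = t2.name
        ORDER BY t1."no"
    ''',
}

results = {label: [row[0] for row in conn.execute(sql)]
           for label, sql in QUERIES.items()}
for label, ids in results.items():
    print(label, ids)
```

All three return the three paper rows; the GROUP BY ... LIMIT 3 query just above returns one row per duplicated name instead, so it is not equivalent.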

Using table 'as' across a join

I am trying to find all of the skills that each user doesn't have for a position.
I know this is incorrect, but I can't think of a way to make this work.
This is what I'm trying to do:
select id, count(skillcode)
from person p, (
select skillcode from requires_skill where poscode='1'
minus
select skillcode from hasskill where id=p.id)
group by p.id;
The part that isn't working is id=p.id.
I am using Oracle SQL.
Edit:
These are the sample tables
requires_skill
------------------
poscode | skillcode
-------------------
1 | 2
1 | 3
1 | 4
hasskill
--------------------
id | skillcode
--------------------
1 | 2
2 | 2
2 | 3
Expected output:
id | count(skillcode)
--------------------------
1 | 2
2 | 1
You can use a scalar subquery like this:
select id, (select count(1) from requires_skill where poscode='1') - cnt from (select id, count(1) cnt from hasskill group by id);
It works as long as there is a foreign key relationship between requires_skill and hasskill, so that every skill a user has is also one the position requires.
I think you need to pair every person with every required skill and use a correlated NOT EXISTS against hasskill instead of MINUS.
Something like,
SELECT p.ID,
COUNT(r.skillcode)
FROM person p
CROSS JOIN requires_skill r
WHERE r.poscode='1'
AND NOT EXISTS
( SELECT 1 FROM hasskill h WHERE h.ID = p.ID AND h.skillcode = r.skillcode
)
GROUP BY p.ID;
UPDATE
Regarding your question about the JOIN, you need to use an ALIAS for the sub-query.
For example,
WITH DATA AS
( SELECT p.ID, r.skillcode FROM person p CROSS JOIN requires_skill r WHERE r.poscode='1'
MINUS
SELECT h.ID, h.skillcode FROM hasskill h
)
SELECT a.ID,
COUNT(a.skillcode)
FROM data a
GROUP BY a.ID;
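The sample tables from the question are small enough to check the expected output directly. The sketch below runs the cross join plus correlated NOT EXISTS idea from Python against an in-memory SQLite database rather than Oracle. The person table is not shown in the question, so its single id column here is an assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- person is assumed to have just an id column for this sketch
CREATE TABLE person (id INTEGER PRIMARY KEY);
CREATE TABLE requires_skill (poscode TEXT, skillcode INTEGER);
CREATE TABLE hasskill (id INTEGER, skillcode INTEGER);
INSERT INTO person VALUES (1), (2);
INSERT INTO requires_skill VALUES ('1', 2), ('1', 3), ('1', 4);
INSERT INTO hasskill VALUES (1, 2), (2, 2), (2, 3);
""")

# Pair every person with every skill the position requires, then keep only
# the pairs for which the person has no matching hasskill row.
missing = conn.execute("""
    SELECT p.id, COUNT(*) AS missing_skills
    FROM person p
    CROSS JOIN requires_skill r
    WHERE r.poscode = '1'
      AND NOT EXISTS (SELECT 1
                      FROM hasskill h
                      WHERE h.id = p.id
                        AND h.skillcode = r.skillcode)
    GROUP BY p.id
    ORDER BY p.id
""").fetchall()

print(missing)
```

A person who already has every required skill produces no pairs and therefore does not appear at all; a LEFT JOIN from person would be needed to report them with a count of 0.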

Compare Multiple rows In SQL Server

I have a SQL Server database full of the following (fictional) data in the following structure:
ID | PatientID | Exam | (NON DB COLUMN FOR REFERENCE)
------------------------------------
1 | 12345 | CT | OK
2 | 11234 | CT | OK(Same PID but Different Exam)
3 | 11234 | MRI | OK(Same PID but Different Exam)
4 | 11123 | CT | BAD(Same PID, Same Exam)
5 | 11123 | CT | BAD(Same PID, Same Exam)
6 | 11112 | CT | BAD(Conflicts With ID 8)
7 | 11112 | MRI | OK(SAME PID but different Exam)
8 | 11112 | CT | BAD(Conflicts With ID 6)
9 | 11123 | CT | BAD(Same PID, Same Exam)
10 | 11123 | CT | BAD(Same PID, Same Exam)
I am trying to write a query which will go through and identify everything that isn't bad as per my example above.
Overall, a patient (identified by PatientId) can have many rows, but may not have 2 or more rows with the same exam!
I have attempted various modifications of examples I found on here but still with no luck.
Thanks.
You seem to want to identify duplicates, ranking them as good or bad. Here is a method using window functions:
select t.id, t.patientid, t.exam,
(case when cnt > 1 then 'BAD' else 'OK' end) as status
from (select t.*, count(*) over (partition by patientid, exam) as cnt
from yourtable t
) t;
Use COUNT() OVER():
select *, case when COUNT(*) over(partition by PatientID, Exam) > 1 then 'bad' else 'ok' end as status
from yourtable
You can also use:
;WITH CTE_Patients
(ID, PatientID, Exam, RowNumber)
AS
(
SELECT ID, PatientID, Exam,
ROW_NUMBER() OVER (PARTITION BY PatientID, Exam ORDER BY ID)
FROM YourTableName
)
SELECT TableB.ID, TableB.PatientID, TableB.Exam, [DuplicateOf] = TableA.ID
FROM CTE_Patients TableB
INNER JOIN CTE_Patients TableA
ON TableB.PatientID = TableA.PatientID
AND TableB.Exam = TableA.Exam
WHERE TableB.RowNumber > 1 -- Duplicate rows
AND TableA.RowNumber = 1 -- Unique rows
I have a sample here: SQL Server – Identifying unique and duplicate rows in a table, you can identify unique rows as well as duplicate rows
If you don't want to use a CTE or COUNT OVER, you can also group the source table and select from there (but I'd be surprised if @Gordon was too far off the mark with the original answer :) )
SELECT a.PatientID, a.Exam, CASE WHEN a.cnt > 1 THEN 'BAD' ELSE 'OK' END
FROM ( SELECT PatientID
,Exam
,COUNT(*) AS cnt
FROM tableName
GROUP BY Exam
,PatientID
) a
Select those patients that never have 2 or more exams of same type.
select * from patients t1
where not exists (select 1 from patients t2
where t1.PatientID = t2.PatientID
group by exam
having count(*) > 1)
Or, if you want all rows, like in your example:
select ID,
PatientID,
Exam,
case when exists (select 1 from patients t2
where t1.PatientID = t2.PatientID
group by exam
having count(*) > 1) then 'BAD' else 'OK' end
from patients t1
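To see how the COUNT(*) OVER approach labels the rows from the question, here is a sketch that loads the ten sample rows into an in-memory SQLite database from Python (not SQL Server) and flags each row. The table name patients follows the last answer; the status alias is just a name chosen for this sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE patients (ID INTEGER PRIMARY KEY, PatientID INTEGER, Exam TEXT);
INSERT INTO patients VALUES
    (1, 12345, 'CT'), (2, 11234, 'CT'), (3, 11234, 'MRI'),
    (4, 11123, 'CT'), (5, 11123, 'CT'), (6, 11112, 'CT'),
    (7, 11112, 'MRI'), (8, 11112, 'CT'), (9, 11123, 'CT'),
    (10, 11123, 'CT');
""")

# A row is BAD when its (PatientID, Exam) pair occurs more than once.
rows = conn.execute("""
    SELECT ID,
           CASE WHEN COUNT(*) OVER (PARTITION BY PatientID, Exam) > 1
                THEN 'BAD' ELSE 'OK' END AS status
    FROM patients
""").fetchall()

flags = dict(rows)
ok_ids = sorted(row_id for row_id, status in flags.items() if status == "OK")
print(ok_ids)
```

Because the partition includes Exam, row 7 (patient 11112, MRI) stays OK even though the same patient has two conflicting CT rows, which matches the reference column in the question.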

Tsql looping father-son relationship between tables

I have a table like this:
table item
(
id int,
quantity float,
father int, -- refer to item itself in case of subitem
)
I need to sum each item's quantity plus its sons' quantities, like this:
select i.id, max(i.quantity)+sum(ft.quantity) as quantity
from item i
left join item ft on ft.father=i.id
group by i.id
My trouble is that the father-son relationship is recursive, so I would also like to sum the grandfather's quantity and so on, and I don't know the maximum depth, so I cannot simply join a fixed number of times.
What can i do?
Thank you.
You have to use a recursive CTE. Something like this:
;WITH FathersSonsTree
AS
(
SELECT Id, quantity, 0 AS Level
FROM Items WHERE fatherid IS NULL
UNION ALL
SELECT c.id, c.quantity, p.level+1
FROM FathersSonsTree p
INNER JOIN items c ON c.fatherid = p.id
), ItemsWithMaxQuantities
AS
(
SELECT *,
ROW_NUMBER() OVER(PARTITION BY level
ORDER BY quantity DESC) rownum
FROM FathersSonsTree
)
SELECT
ID,
(SELECT MAX(Quantity)
FROM FathersSonsTree t3
WHERE t3.level = t1.level
) +
ISNULL((SELECT SUM(t2.Quantity)
FROM FathersSonsTree t2
WHERE t1.level - t2.level = 1), 0)
FROM FathersSonsTree t1
ORDER BY ID;
This will give you something like:
| ID | QUANTITY |
-----------------
| 1 | 10 |
| 2 | 20 |
| 3 | 20 |
| 4 | 20 |
| 5 | 32 |
| 6 | 32 |
| 7 | 32 |
| 8 | 32 |
You might try building a recursive CTE (common table expression) as described in this article on SQLAuthority:
http://blog.sqlauthority.com/2012/04/24/sql-server-introduction-to-hierarchical-query-using-a-recursive-cte-a-primer/
The author, Pinal Dave, discusses using a recursive CTE on an employees table that has a self referencing foreign key for ManagerID to return a list of employees with a count of how many levels are between them and the top of the hierarchy where the employee has no manager (ManagerID = NULL). That's not exactly what you're wanting but it might get you started.
I did a little experimentation and ended up with something very similar to Mahmoud Gamal's solution, but with a slight difference: it includes not just the quantity of the parent, grandparents, great-grandparents, etc. but also the child's own quantity.
Here's the test table I used:
CREATE TABLE Items(ID int IDENTITY
CONSTRAINT PK_Items PRIMARY KEY,
Quantity int NOT NULL,
ParentID int NULL
CONSTRAINT FK_Item_Parents REFERENCES Items(ID));
And the data:
ID Quantity ParentID
------------------------------------------------------------
1 10 {NULL}
2 10 1
3 10 2
4 10 3
5 10 2
Here's my recursive query:
WITH cteRecursiveItems
AS (SELECT Id,
quantity,
0
AS Level
FROM Items
WHERE ParentID IS NULL
UNION ALL
SELECT i.id,
i.quantity,
cri.level + 1
FROM
cteRecursiveItems cri
INNER JOIN items i ON i.ParentID = cri.id)
SELECT ID,
Quantity + (
SELECT MAX(Quantity)
FROM cteRecursiveItems cri3
WHERE cri3.level = cri1.level) + (
SELECT SUM(cri2.Quantity)
FROM cteRecursiveItems cri2
WHERE cri1.level - cri2.level = 1) as Total
FROM cteRecursiveItems cri1
ORDER BY ID;
And here's the results I get from running it against the test table:
ID Total
----------------------------------------
1 {NULL}
2 30
3 30
4 40
5 30
It still needs a little tweaking because the first and 2nd row are off by 10. Row 1 should have a total of 10 and row 2 should have a total of 20. I'm making a note to try and fix that when I get home. Can't spend too much of my employer's time on this right now. :) The other rows have the value I was expecting.
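One way to get the totals described above (10 for row 1, 20 for row 2, and so on) is to carry a running total down the tree inside the recursive CTE itself, instead of matching on level afterwards. This is only a sketch of that idea, run from Python against an in-memory SQLite database rather than SQL Server; SQLite spells the CTE WITH RECURSIVE, whereas T-SQL uses plain WITH. It assumes the goal is each item's own quantity plus the quantities of all its ancestors.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Items (ID INTEGER PRIMARY KEY,
                    Quantity INTEGER NOT NULL,
                    ParentID INTEGER REFERENCES Items(ID));
INSERT INTO Items VALUES
    (1, 10, NULL), (2, 10, 1), (3, 10, 2), (4, 10, 3), (5, 10, 2);
""")

totals = conn.execute("""
    WITH RECURSIVE tree(ID, Total) AS (
        -- anchor: root items start with their own quantity
        SELECT ID, Quantity
        FROM Items
        WHERE ParentID IS NULL
        UNION ALL
        -- step: a child's total is its quantity plus its parent's total
        SELECT i.ID, i.Quantity + t.Total
        FROM tree t
        JOIN Items i ON i.ParentID = t.ID
    )
    SELECT ID, Total
    FROM tree
    ORDER BY ID
""").fetchall()

for item_id, total in totals:
    print(item_id, total)
```

If the goal is instead an item's quantity plus all of its descendants', the recursion would have to pair each ancestor with every descendant and then sum per ancestor.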