postgresql count uniques - sql

I have a table pageviews like this:
| main_group | subgroup | page | uid | viewcount |
|------------|----------|------|-----|-----------|
| foo | targeted | A | 111 | 3 |
| foo | targeted | B | 111 | 2 |
| foo | targeted | A | 222 | 1 |
| foo | targeted | A | 333 | 4 |
| foo | targeted | B | 333 | 3 |
| foo | external | A | 444 | 1 |
| foo | external | A | 555 | 1 |
| foo | external | B | 555 | 1 |
Each row means that user uid viewed a page viewcount times. I only want the number of unique users per page, while keeping the main_group, subgroup, and page information. I want this result:
| main_group | subgroup | page | unique_viewcount |
|------------|----------|------|------------------|
| foo | targeted | A | 3 |
| foo | targeted | B | 2 |
| foo | external | A | 2 |
| foo | external | B | 1 |
I can't figure out how to write the select statement. I've tried:
select count (distinct (page, uid)) as unique_viewcount, main_group, subgroup, page
from pageviews
group by (main_group, subgroup, page, uid);
but each unique_viewcount is 1.

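Your attempt includes uid in the GROUP BY, so every group contains exactly one user and the count is always 1. I think you just want to group by main_group, subgroup, and page and use count(distinct uid):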
select main_group, subgroup, page, count(distinct uid) as unique_viewcount
from pageviews
group by main_group, subgroup, page;
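A minimal way to check this, assuming SQLite (through Python's built-in sqlite3 module) as a stand-in for PostgreSQL; COUNT(DISTINCT uid) behaves the same way in both. The sketch loads the sample rows from the question, runs the answer's query, and also runs a variant that keeps uid in the GROUP BY to show why every count came out as 1.

```python
import sqlite3

# Assumption: SQLite stands in for PostgreSQL; the data is the question's sample.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE pageviews (main_group TEXT, subgroup TEXT, page TEXT, uid INTEGER, viewcount INTEGER)"
)
conn.executemany("INSERT INTO pageviews VALUES (?, ?, ?, ?, ?)", [
    ("foo", "targeted", "A", 111, 3),
    ("foo", "targeted", "B", 111, 2),
    ("foo", "targeted", "A", 222, 1),
    ("foo", "targeted", "A", 333, 4),
    ("foo", "targeted", "B", 333, 3),
    ("foo", "external", "A", 444, 1),
    ("foo", "external", "A", 555, 1),
    ("foo", "external", "B", 555, 1),
])

# The answer's query: one group per (main_group, subgroup, page), counting distinct users.
rows = conn.execute("""
    SELECT main_group, subgroup, page, COUNT(DISTINCT uid) AS unique_viewcount
    FROM pageviews
    GROUP BY main_group, subgroup, page
    ORDER BY subgroup DESC, page
""").fetchall()

# Keeping uid in the GROUP BY makes every group hold a single user, so each count is 1.
bad = conn.execute("""
    SELECT COUNT(DISTINCT uid)
    FROM pageviews
    GROUP BY main_group, subgroup, page, uid
""").fetchall()

for row in rows:
    print(row)
```

The ORDER BY is only there so the rows line up with the expected table; the answer's query by itself does not guarantee any row order.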

Related

TSQL - Number groups based on distinct values in certain columns

Let's say I have a table like this:
| ID | ColA | ColB | ColC | ... |
|-----|------|------|------|-----|
| 1 | 111 | XXX | foo | |
| 1 | 111 | XXX | bar | |
| ... | ... | ... | ... | |
| 1 | 111 | YYY | foo | |
| 1 | 111 | YYY | bar | |
| ... | ... | ... | ... | |
| 1 | 999 | XXX | foo | |
| 1 | 999 | XXX | bar | |
| ... | ... | ... | ... | |
| 1 | 999 | YYY | foo | |
| 1 | 999 | YYY | bar | |
| ... | ... | ... | ... | |
| 2 | 111 | XXX | foo | |
| 2 | 111 | XXX | bar | |
| ... | ... | ... | ... | |
There are further columns to the right with all sorts of other values.
I want to partition this table in T-SQL into distinct groups only by columns "ID", "ColA" and "ColB", without regard to all other columns. Then I want to sequentially number those groups. My final result should look like this:
| ID | ColA | ColB | ColC | ... | GroupNumber |
|-----|------|------|------|-----|-------------|
| 1 | 111 | XXX | foo | | 1 |
| 1 | 111 | XXX | bar | | 1 |
| ... | ... | ... | ... | | ... |
| 1 | 111 | YYY | foo | | 2 |
| 1 | 111 | YYY | bar | | 2 |
| ... | ... | ... | ... | | ... |
| 1 | 999 | XXX | foo | | 3 |
| 1 | 999 | XXX | bar | | 3 |
| ... | ... | ... | ... | | ... |
| 1 | 999 | YYY | foo | | 4 |
| 1 | 999 | YYY | bar | | 4 |
| ... | ... | ... | ... | | ... |
| 2 | 111 | XXX | foo | | 5 |
| 2 | 111 | XXX | bar | | 5 |
| ... | ... | ... | ... | | ... |
It seems like this should be an easy problem, but I'm struggling to get a handle on it. I suspect this should work somehow with DENSE_RANK and its PARTITION BY clause. My approach is:
SELECT
*,
DENSE_RANK() OVER(
PARTITION BY ID, ColA, ColB
ORDER BY ColC
) AS GroupNumber
FROM my_table
but this keeps increasing the GroupNumber within each one of these blocks as well.
If I understand what you're looking for, you have the right idea; however, you don't need to partition the data within the ranking function. You're looking for the rank of each combination of ID, ColA, and ColB across the entire dataset, not the rank of records within each combination.
If that's the case, simply remove the PARTITION BY clause from your DENSE_RANK(), like this:
SELECT
*,
DENSE_RANK() OVER(ORDER BY ID, ColA, ColB) AS GroupNumber
FROM my_table
That assumes you aren't trying to assign group numbers in any order other than ID, ColA, ColB, which I think is what you want. You also used ORDER BY ColC in your original example; I'm guessing you did that only because a ranking function requires an ORDER BY clause.
If, however, you want to order the groups differently, I would need to know how, and it would require something a little different.
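A small sketch to compare the two versions, assuming SQLite 3.25 or later (which supports DENSE_RANK) as a stand-in for SQL Server, and using only the ID, ColA, ColB, and ColC columns from the question's sample.

```python
import sqlite3

# Assumption: SQLite 3.25+ (window functions) stands in for SQL Server.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (ID INTEGER, ColA INTEGER, ColB TEXT, ColC TEXT)")
conn.executemany("INSERT INTO my_table VALUES (?, ?, ?, ?)", [
    (1, 111, "XXX", "foo"), (1, 111, "XXX", "bar"),
    (1, 111, "YYY", "foo"), (1, 111, "YYY", "bar"),
    (1, 999, "XXX", "foo"), (1, 999, "XXX", "bar"),
    (1, 999, "YYY", "foo"), (1, 999, "YYY", "bar"),
    (2, 111, "XXX", "foo"), (2, 111, "XXX", "bar"),
])

# Answer's version: no PARTITION BY, so rows sharing (ID, ColA, ColB) tie
# in the ordering and receive the same dense rank.
rows = conn.execute("""
    SELECT ID, ColA, ColB, ColC,
           DENSE_RANK() OVER (ORDER BY ID, ColA, ColB) AS GroupNumber
    FROM my_table
    ORDER BY GroupNumber, ColC
""").fetchall()

# Question's version: ranking restarts inside every (ID, ColA, ColB) block.
partitioned = conn.execute("""
    SELECT DENSE_RANK() OVER (PARTITION BY ID, ColA, ColB ORDER BY ColC)
    FROM my_table
""").fetchall()

for row in rows:
    print(row)
```

With PARTITION BY, each block is ranked on its own by ColC, which is why the numbers only ever reach 2 here instead of counting up across blocks.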

SQL Query - Add column data from another table adding nulls

I have 2 tables, tableStock and tableParts:
tableStock
+----+----------+-------------+
| ID | Num_Part | Description |
+----+----------+-------------+
| 1 | sr37 | plate |
+----+----------+-------------+
| 2 | sr56 | punch |
+----+----------+-------------+
| 3 | sl30 | crimper |
+----+----------+-------------+
| 4 | mp11 | holder |
+----+----------+-------------+
tableParts
+----+----------+-------+
| ID | Location | Stock |
+----+----------+-------+
| 1 | A | 2 |
+----+----------+-------+
| 3 | B | 5 |
+----+----------+-------+
| 5 | C | 2 |
+----+----------+-------+
| 7 | A | 1 |
+----+----------+-------+
And I want this result:
+----+----------+-------------+----------+-------+
| ID | Num_Part | Description | Location | Stock |
+----+----------+-------------+----------+-------+
| 1 | sr37 | plate | A | 2 |
+----+----------+-------------+----------+-------+
| 2 | sr56 | punch | NULL | NULL |
+----+----------+-------------+----------+-------+
| 3 | sl30 | crimper | B | 5 |
+----+----------+-------------+----------+-------+
| 4 | mp11 | holder | NULL | NULL |
+----+----------+-------------+----------+-------+
List ALL rows of the first table; if the second table has the matching info (in this case Location and Stock), add it to those columns, otherwise just NULL.
I have been using inner and left joins, but some rows of the first table disappear because of the missing data in the second one:
select tableStock.ID, tableStock.Num_Part, tableStock.Description, tableParts.Location, tableParts.Stock from tableStock inner join tableParts on tableStock.ID = tableParts.ID;
What can I do?
You can use a LEFT JOIN:
select
s.ID,
s.Num_Part,
s.Description,
p.Location,
p.Stock
from tableStock s
left join tableParts p
on s.ID = p.ID
order by
s.ID
output:
| id | num_part | description | location | stock |
| --- | -------- | ----------- | -------- | ----- |
| 1 | sr37 | plate | A | 2 |
| 2 | sr56 | punch | NULL | NULL |
| 3 | sl30 | crimper | B | 5 |
| 4 | mp11 | holder | NULL | NULL |
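To see the difference between the two joins, here is a sketch using SQLite (Python's sqlite3 module) as an assumed stand-in for the asker's database, with the table names and data from the question. NULL comes back as Python's None.

```python
import sqlite3

# Assumption: SQLite stands in for the asker's database.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tableStock (ID INTEGER PRIMARY KEY, Num_Part TEXT, Description TEXT);
    CREATE TABLE tableParts (ID INTEGER PRIMARY KEY, Location TEXT, Stock INTEGER);
    INSERT INTO tableStock VALUES (1, 'sr37', 'plate'), (2, 'sr56', 'punch'),
                                  (3, 'sl30', 'crimper'), (4, 'mp11', 'holder');
    INSERT INTO tableParts VALUES (1, 'A', 2), (3, 'B', 5), (5, 'C', 2), (7, 'A', 1);
""")

query = """
    SELECT s.ID, s.Num_Part, s.Description, p.Location, p.Stock
    FROM tableStock s
    {join} tableParts p ON s.ID = p.ID
    ORDER BY s.ID
"""
# INNER JOIN keeps only IDs present in both tables; LEFT JOIN keeps every
# tableStock row and fills the tableParts columns with NULL when there is no match.
inner_rows = conn.execute(query.format(join="INNER JOIN")).fetchall()
left_rows = conn.execute(query.format(join="LEFT JOIN")).fetchall()

print(inner_rows)
print(left_rows)
```

One possible reason a LEFT JOIN can still appear to drop rows is a WHERE condition on a tableParts column (for example WHERE p.Stock > 0): the NULL-extended rows fail that test. Putting such a condition in the ON clause instead keeps every tableStock row.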

Query to group 5 records

I have a table, e.g. "employee", with just one column, "id". Say it has records from 1 through 1000.
Employee
------------
ID
------------
1
2
3
..
..
999
1000
Now I would like to write a query that sorts the ids in ascending order and concatenates the first 5 into one record, the next 5 into a second record, and so on. Any ideas how I can do this?
Here is the output I am looking to have.
1,2,3,4,5
6,7,8,9,10
11,12,13,14,15
...........
...........
996,997,998,999,1000
Use the row_number and listagg functions, like this:
SELECT listagg( id, ',' ) within group( order by group_no, id )
FROM (
select id,
trunc((row_number() over( order by id ) -1) / 5) as group_no
from employee
)
GROUP BY group_no
ORDER BY group_no
Working demo: http://sqlfiddle.com/#!4/ef526/10
| LISTAGG(ID,',')WITHINGROUP(ORDERBYGROUP_NO,ID) |
|------------------------------------------------|
| 1,2,3,4,5 |
| 6,7,8,9,10 |
| 11,12,13,14,15 |
| 16,17,18,19,20 |
| 21,22,23,24,25 |
| 26,27,28,29,30 |
| 31,32,33,34,35 |
| 36,37,38,39,40 |
| 41,42,43,44,45 |
| 46,47,48,49,50 |
| 51,52,53,54,55 |
| 56,57,58,59,60 |
| 61,62,63,64,65 |
| 66,67,68,69,70 |
| 71,72,73,74,75 |
| 76,77,78,79,80 |
| 81,82,83,84,85 |
| 86,87,88,89,90 |
| 91,92,93,94,95 |
| 96,97,98,99,100 |
| 101,102,103,104,105 |
| 106,107,108,109,110 |
| 111,112,113,114,115 |
| 116,117,118,119,120 |
| 121,122,123,124,125 |
| 126,127,128,129,130 |
| 131,132,133,134,135 |
| 136,137,138,139,140 |
| 141,142,143,144,145 |
| 146,147,148,149,150 |
| 151,152,153,154,155 |
| 156,157,158,159,160 |
| 161,162,163,164,165 |
| 166,167,168,169,170 |
| 171,172,173,174,175 |
| 176,177,178,179,180 |
| 181,182,183,184,185 |
| 186,187,188,189,190 |
| 191,192,193,194,195 |
| 196,197,198,199,200 |
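The key step is the bucket number (row_number - 1) / 5, truncated to an integer, so rows 1 to 5 get bucket 0, rows 6 to 10 get bucket 1, and so on. The sketch below assumes SQLite 3.25+ as a stand-in for Oracle: SQLite's integer division already truncates, so no trunc() is needed, and because older SQLite versions do not guarantee the order inside group_concat, the concatenation is done in Python instead of with LISTAGG.

```python
import sqlite3
from itertools import groupby

# Assumption: SQLite 3.25+ stands in for Oracle; LISTAGG is replaced by
# joining each bucket in Python.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (id INTEGER)")
conn.executemany("INSERT INTO employee VALUES (?)", [(i,) for i in range(1, 1001)])

rows = conn.execute("""
    SELECT id,
           (ROW_NUMBER() OVER (ORDER BY id) - 1) / 5 AS group_no  -- integer division
    FROM employee
    ORDER BY group_no, id
""").fetchall()

# Rows arrive sorted by group_no, so groupby sees each bucket as one run.
lines = [
    ",".join(str(emp_id) for emp_id, _ in members)
    for _, members in groupby(rows, key=lambda r: r[1])
]

print(lines[0])
print(lines[-1])
```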

SQL compare multiple rows or partitions to find matches

The database I'm working on is DB2 and I have a problem similar to the following scenario:
Table Structure
-------------------------------
| Teacher Seating Arrangement |
-------------------------------
| PK | seat_argmt_id |
| | teacher_id |
-------------------------------
-----------------------------
| Seating Arrangement |
-----------------------------
|PK FK | seat_argmt_id |
|PK | Row_num |
|PK | seat_num |
|PK | child_name |
-----------------------------
Table Data
------------------------------
| Teacher Seating Arrangement|
------------------------------
| seat_argmt_id | teacher_id |
| 1 | 1 |
| 2 | 1 |
| 3 | 1 |
| 4 | 1 |
| 5 | 2 |
------------------------------
---------------------------------------------------
| Seating Arrangement |
---------------------------------------------------
| seat_argmt_id | row_num | seat_num | child_name |
| 1 | 1 | 1 | Abe |
| 1 | 1 | 2 | Bob |
| 1 | 1 | 3 | Cat |
| | | | |
| 2 | 1 | 1 | Abe |
| 2 | 1 | 2 | Bob |
| 2 | 1 | 3 | Cat |
| | | | |
| 3 | 1 | 1 | Abe |
| 3 | 1 | 2 | Cat |
| 3 | 1 | 3 | Bob |
| | | | |
| 4 | 1 | 1 | Abe |
| 4 | 1 | 2 | Bob |
| 4 | 1 | 3 | Cat |
| 4 | 2 | 2 | Dan |
---------------------------------------------------
I want to see where there are duplicate seating arrangements for a teacher. By duplicates I mean that row_num, seat_num, and child_name are identical across different seat_argmt_id values for the same teacher_id. So with the data above, only seat_argmt_id 1 and 2 should be returned, since they match on everything except seat_argmt_id. If all the children in the second table match exactly (apart from seat_argmt_id, which is both the primary and foreign key here), I want to see that.
My initial thought was to do a count(*) grouped by row_num, seat_num, and child_name: a count > 1 would mean a duplicate and a count of 1 would mean it's unique. That logic only works when comparing single rows, though, and I need to compare sets of rows. I can't figure out a way to do it in SQL. My current solution goes outside of SQL and works (probably); I'm just wondering if there is a way to do it in DB2.
Does this do what you want?
select d.teacher_id, sa.row_num, sa.seat_num, sa.child_name
from seatingarrangement sa join
teacherseatingarrangement d
on sa.seat_argmt_id = d.seat_argmt_id
group by d.teacher_id, sa.row_num, sa.seat_num, sa.child_name
having count(*) > 1;
EDIT:
If you want to find two arrangements that are the same:
select sa1.seat_argmt_id, sa2.seat_argmt_id
from seatingarrangement sa1 join
seatingarrangement sa2
on sa1.seat_argmt_id < sa2.seat_argmt_id and
sa1.row_num = sa2.row_num and
sa1.seat_num = sa2.seat_num and
sa1.child_name = sa2.child_name
group by sa1.seat_argmt_id, sa2.seat_argmt_id
having count(*) = (select count(*) from seatingarrangement sa where sa.seat_argmt_id = sa1.seat_argmt_id) and
count(*) = (select count(*) from seatingarrangement sa where sa.seat_argmt_id = sa2.seat_argmt_id);
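This finds the matching rows between each pair of arrangements and then verifies that the number of matches equals the row count of both arrangements, i.e. that the two arrangements contain exactly the same rows.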
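The question asks for duplicates within the same teacher_id, which the pairwise query above does not check. A hedged sketch of one way to add that, assuming SQLite as a stand-in for DB2 and the hypothetical table names seatingarrangement and teacherseatingarrangement: both sides of the self-join are joined to the teacher table, and only pairs with the same teacher are kept. Arrangement 6 (teacher 2, copying arrangement 1) is a made-up row added only to show the effect of the filter.

```python
import sqlite3

# Assumption: SQLite stands in for DB2; table names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE teacherseatingarrangement (seat_argmt_id INTEGER PRIMARY KEY, teacher_id INTEGER);
    CREATE TABLE seatingarrangement (
        seat_argmt_id INTEGER REFERENCES teacherseatingarrangement(seat_argmt_id),
        row_num INTEGER, seat_num INTEGER, child_name TEXT,
        PRIMARY KEY (seat_argmt_id, row_num, seat_num, child_name)
    );
    INSERT INTO teacherseatingarrangement VALUES (1, 1), (2, 1), (3, 1), (4, 1), (5, 2), (6, 2);
    INSERT INTO seatingarrangement VALUES
        (1, 1, 1, 'Abe'), (1, 1, 2, 'Bob'), (1, 1, 3, 'Cat'),
        (2, 1, 1, 'Abe'), (2, 1, 2, 'Bob'), (2, 1, 3, 'Cat'),
        (3, 1, 1, 'Abe'), (3, 1, 2, 'Cat'), (3, 1, 3, 'Bob'),
        (4, 1, 1, 'Abe'), (4, 1, 2, 'Bob'), (4, 1, 3, 'Cat'), (4, 2, 2, 'Dan'),
        -- hypothetical arrangement 6 (teacher 2) copies arrangement 1
        (6, 1, 1, 'Abe'), (6, 1, 2, 'Bob'), (6, 1, 3, 'Cat');
""")

QUERY = """
    SELECT sa1.seat_argmt_id, sa2.seat_argmt_id
    FROM seatingarrangement sa1
    JOIN seatingarrangement sa2
      ON sa1.seat_argmt_id < sa2.seat_argmt_id
     AND sa1.row_num = sa2.row_num
     AND sa1.seat_num = sa2.seat_num
     AND sa1.child_name = sa2.child_name
    JOIN teacherseatingarrangement t1 ON t1.seat_argmt_id = sa1.seat_argmt_id
    JOIN teacherseatingarrangement t2 ON t2.seat_argmt_id = sa2.seat_argmt_id
    WHERE {teacher_filter}
    GROUP BY sa1.seat_argmt_id, sa2.seat_argmt_id
    HAVING COUNT(*) = (SELECT COUNT(*) FROM seatingarrangement sa WHERE sa.seat_argmt_id = sa1.seat_argmt_id)
       AND COUNT(*) = (SELECT COUNT(*) FROM seatingarrangement sa WHERE sa.seat_argmt_id = sa2.seat_argmt_id)
    ORDER BY 1, 2
"""
# Without the teacher check, identical arrangements of different teachers also pair up.
any_teacher = conn.execute(QUERY.format(teacher_filter="1 = 1")).fetchall()
same_teacher = conn.execute(QUERY.format(teacher_filter="t1.teacher_id = t2.teacher_id")).fetchall()

print(any_teacher)
print(same_teacher)
```

Because seat_argmt_id is the primary key of the teacher table, each matched pair of seat rows joins exactly one teacher row on each side, so the COUNT(*) comparisons in the HAVING clause are unaffected by the extra joins.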

sort a table while keeping the hierarchy of rows

I have a table which represents the hierarchy of departments:
+-----------+--------------+--------------+--------------+-----------+-------+
| Top Dept. | 2-tier Dept. | 3-tier Dept. | 4-tier Dept. | name | tier |
+-----------+--------------+--------------+--------------+-----------+-------+
| 00 | | | | abc | 0 |
| | 00-01 | | | bcd | 1 |
| | | 00-01-01 | | cde | 2 |
| | | 00-01-02 | | abc | 2 |
| | 00-02 | | | aef | 1 |
| | | 00-02-01 | | qwe | 2 |
| | | 00-02-03 | | abc | 2 |
| | | | 00-02-03-01 | abc | 3 |
+-----------+--------------+--------------+--------------+-----------+-------+
Now I want to sort the rows within the same tier by name while keeping the overall hierarchy. This is what I expect:
+-----------+--------------+--------------+--------------+-----------+-------+
| Top Dept. | 2-tier Dept. | 3-tier Dept. | 4-tier Dept. | name | tier |
+-----------+--------------+--------------+--------------+-----------+-------+
| 00 | | | | abc | 0 |
| | 00-02 | | | aef | 1 |
| | | 00-02-03 | | abc | 2 |
| | | 00-02-01 | | qwe | 2 |
| | 00-01 | | | def | 1 |
| | | 00-01-02 | | abc | 2 |
| | | 00-01-01 | | cde | 2 |
| | | | 00-02-03-01 | abc | 3 |
+-----------+--------------+--------------+--------------+-----------+-------+
Missing data means NULL. I'm using Oracle DB; can anyone help me?
EDIT: Actually, this is a simplified version of my SQL. I've tried adding a new column that concatenates the values of the first four columns and then ordering by it and by name, but it didn't work.
Update: This appears to be working on SQL Fiddle.
All that was really needed beyond my original comment was to build the sort key from the name followed by the department, in that order, in both SELECTs. Because each row's key starts with its parent's key, the engine sorts siblings by name first while keeping every child under its parent.
WITH cte(Dept, superiorDept, name, depth, sort)AS (
SELECT
Dept,
superiorDept,
name,
0,
name|| dept
FROM hierarchy h
WHERE superiorDept IS NULL
UNION ALL
SELECT
h2.Dept,
h2.superiorDept,
h2.name,
cte.depth + 1,
cte.sort || h2.name ||h2.dept
FROM hierarchy h2
INNER JOIN cte ON h2.superiorDept = cte.Dept
)
SELECT
CASE WHEN depth = 0 THEN Dept END AS "Top Dept.",
CASE WHEN depth = 1 THEN Dept END AS "2-tier Dept.",
CASE WHEN depth = 2 THEN Dept END AS "3-tier Dept.",
CASE WHEN depth = 3 THEN Dept END AS "4-tier Dept.",
name,
depth,
sort
FROM cte
ORDER BY sort, name
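To see the sort key in action, here is a sketch that runs the same recursive CTE in SQLite (WITH RECURSIVE) instead of Oracle, assuming a hypothetical hierarchy(Dept, superiorDept, name) table filled from the question's sample data. The final pivot into tier columns is left out, and the key column is renamed sort_key. With this key, the 4-tier row 00-02-03-01 lands directly under its parent 00-02-03 rather than at the end as in the question's expected table.

```python
import sqlite3

# Assumption: SQLite stands in for Oracle; hierarchy is a hypothetical parent/child table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE hierarchy (Dept TEXT, superiorDept TEXT, name TEXT)")
conn.executemany("INSERT INTO hierarchy VALUES (?, ?, ?)", [
    ("00", None, "abc"),
    ("00-01", "00", "bcd"),
    ("00-01-01", "00-01", "cde"),
    ("00-01-02", "00-01", "abc"),
    ("00-02", "00", "aef"),
    ("00-02-01", "00-02", "qwe"),
    ("00-02-03", "00-02", "abc"),
    ("00-02-03-01", "00-02-03", "abc"),
])

rows = conn.execute("""
    WITH RECURSIVE cte(Dept, superiorDept, name, depth, sort_key) AS (
        SELECT Dept, superiorDept, name, 0, name || Dept
        FROM hierarchy
        WHERE superiorDept IS NULL
        UNION ALL
        SELECT h.Dept, h.superiorDept, h.name, cte.depth + 1,
               cte.sort_key || h.name || h.Dept  -- parent key, then own name, then own dept
        FROM hierarchy h
        JOIN cte ON h.superiorDept = cte.Dept
    )
    SELECT Dept, name, depth FROM cte ORDER BY sort_key
""").fetchall()

# Indent by depth to make the preserved hierarchy visible.
for dept, name, depth in rows:
    print("    " * depth + f"{dept} {name}")
```

One caveat: the key joins names and department codes without a separator, so when one sibling's name is a prefix of another's, the characters that follow can occasionally put them out of name order. Appending a separator that sorts below every name character (such as CHR(1)) after each name is one way to avoid that.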