TSQL - Number groups based on distinct values in certain columns - sql

Let's say I have a table like this:
| ID | ColA | ColB | ColC | ... |
|-----|------|------|------|-----|
| 1 | 111 | XXX | foo | |
| 1 | 111 | XXX | bar | |
| ... | ... | ... | ... | |
| 1 | 111 | YYY | foo | |
| 1 | 111 | YYY | bar | |
| ... | ... | ... | ... | |
| 1 | 999 | XXX | foo | |
| 1 | 999 | XXX | bar | |
| ... | ... | ... | ... | |
| 1 | 999 | YYY | foo | |
| 1 | 999 | YYY | bar | |
| ... | ... | ... | ... | |
| 2 | 111 | XXX | foo | |
| 2 | 111 | XXX | bar | |
| ... | ... | ... | ... | |
There are further columns to the right with all sorts of other values.
I want to partition this table in T-SQL into distinct groups only by columns "ID", "ColA" and "ColB", without regard to all other columns. Then I want to sequentially number those groups. My final result should look like this:
| ID | ColA | ColB | ColC | ... | GroupNumber |
|-----|------|------|------|-----|-------------|
| 1 | 111 | XXX | foo | | 1 |
| 1 | 111 | XXX | bar | | 1 |
| ... | ... | ... | ... | | ... |
| 1 | 111 | YYY | foo | | 2 |
| 1 | 111 | YYY | bar | | 2 |
| ... | ... | ... | ... | | ... |
| 1 | 999 | XXX | foo | | 3 |
| 1 | 999 | XXX | bar | | 3 |
| ... | ... | ... | ... | | ... |
| 1 | 999 | YYY | foo | | 4 |
| 1 | 999 | YYY | bar | | 4 |
| ... | ... | ... | ... | | ... |
| 2 | 111 | XXX | foo | | 5 |
| 2 | 111 | XXX | bar | | 5 |
| ... | ... | ... | ... | | ... |
It seems like this should be an easy problem but I struggle to get a handle on it. I have a certain suspicion that this should work somehow with DENSE_RANK and the partitioning clause in that function. My approach is:
SELECT
    *,
    DENSE_RANK() OVER(
        PARTITION BY ID, ColA, ColB
        ORDER BY ColC
    ) AS GroupNumber
FROM my_table
but this keeps increasing the GroupNumber within each one of these blocks as well.

If I understand what you're looking for, you have the right idea, but you don't need to partition the data inside the ranking function. You want the rank of each combination of ID, ColA, and ColB within the entire dataset, not the rank of the records within each combination.
If that's the case, simply remove the PARTITION BY clause from your DENSE_RANK() and order by those three columns instead, like this:
SELECT
    *,
    DENSE_RANK() OVER(ORDER BY ID, ColA, ColB) AS GroupNumber
FROM my_table
That assumes you aren't trying to assign group numbers in any specific order other than the order of ID, ColA, and ColB, which I think is what you want. You did use "ORDER BY ColC" in your original example, but I'm guessing that was only because a ranking function requires an ORDER BY clause.
If, however, you are trying to order the groups a different way, I would need to know how, and it would require something a little different.
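To see the difference between the two window definitions, here is a minimal sketch that runs both against the sample rows. It uses Python's built-in sqlite3 module as a stand-in for SQL Server (an assumption made only so the example is runnable; it needs an SQLite build with window functions, 3.25 or later). The table name my_table and the columns come from the question; the sample rows are just the visible rows of the example table.

```python
import sqlite3

# Sample rows mirroring the visible part of the question's table.
rows = [
    (1, 111, "XXX", "foo"), (1, 111, "XXX", "bar"),
    (1, 111, "YYY", "foo"), (1, 111, "YYY", "bar"),
    (1, 999, "XXX", "foo"), (1, 999, "XXX", "bar"),
    (1, 999, "YYY", "foo"), (1, 999, "YYY", "bar"),
    (2, 111, "XXX", "foo"), (2, 111, "XXX", "bar"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (ID INTEGER, ColA INTEGER, ColB TEXT, ColC TEXT)")
conn.executemany("INSERT INTO my_table VALUES (?, ?, ?, ?)", rows)

# Ranking over the whole table: one number per (ID, ColA, ColB) combination.
fixed = """
    SELECT DENSE_RANK() OVER (ORDER BY ID, ColA, ColB) AS GroupNumber
    FROM my_table
    ORDER BY ID, ColA, ColB, ColC
"""
# The question's attempt: the rank restarts inside every partition.
attempt = """
    SELECT DENSE_RANK() OVER (PARTITION BY ID, ColA, ColB ORDER BY ColC) AS GroupNumber
    FROM my_table
    ORDER BY ID, ColA, ColB, ColC
"""
group_numbers = [r[0] for r in conn.execute(fixed)]
attempt_numbers = [r[0] for r in conn.execute(attempt)]

print(group_numbers)    # [1, 1, 2, 2, 3, 3, 4, 4, 5, 5]
print(attempt_numbers)  # [1, 2, 1, 2, 1, 2, 1, 2, 1, 2]
```

The second list shows why the original query "keeps increasing the GroupNumber within each block": with PARTITION BY, the rank is computed inside each group rather than across groups.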

Related

Script for TSQL

Please help me write a Transact-SQL script.
There is a table with the illustrative columns PK_ID (Number), FK_ID (Number) and SEQ (Number):
PK_ID is always unique.
Several PK_IDs can be assigned to one FK_ID.
SEQ is the sequence number for the PK_ID, in order of priority assignment to the FK_ID.
I can't work out a script that would change each SEQ entry from 1 to the PK_ID's sequence number within its FK_ID, given that initially every SEQ = 1.
Current state (example):
| PK_ID | FK_ID | SEQ |
| ----- | ----- | --- |
| 121 | 18821 | 1 |
| 358 | 18821 | 1 |
| 611 | 18821 | 1 |
| 7251 | 23357 | 1 |
| 95 | 23357 | 1 |
| 922 | 15511 | 1 |
| 1 | 53810 | 1 |
| 777 | 53810 | 1 |
| 953 | 53810 | 1 |
| 120 | 53810 | 1 |
Result:
| PK_ID | FK_ID | SEQ |
| ----- | ----- | --- |
| 121 | 18821 | 1 |
| 358 | 18821 | 2 |
| 611 | 18821 | 3 |
| 7251 | 23357 | 1 |
| 95 | 23357 | 2 |
| 922 | 15511 | 1 |
| 1 | 53810 | 1 |
| 777 | 53810 | 2 |
| 953 | 53810 | 3 |
| 120 | 53810 | 4 |
Please help me and thanks in advance.
Assuming you want to update your table, you can number the rows in a CTE and update through it.
Note: Window functions are well worth the time it takes to get comfortable with them.
Example:
;WITH cte AS (
    SELECT *,
           NewSEQ = ROW_NUMBER() OVER (PARTITION BY FK_ID ORDER BY PK_ID)
    FROM YourTable
)
UPDATE cte
SET SEQ = NewSEQ
WHERE SEQ <> NewSEQ
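As a rough check of what that numbering produces, here is a sketch using Python's sqlite3 module in place of SQL Server (an assumption for runnability). SQLite cannot update through a CTE the way T-SQL can, so the sketch splits the work into two steps: compute ROW_NUMBER() in a SELECT, then write the values back. The table name YourTable is the placeholder from the answer.

```python
import sqlite3

# Rows from the question: (PK_ID, FK_ID, SEQ), every SEQ starting at 1.
rows = [
    (121, 18821, 1), (358, 18821, 1), (611, 18821, 1),
    (7251, 23357, 1), (95, 23357, 1),
    (922, 15511, 1),
    (1, 53810, 1), (777, 53810, 1), (953, 53810, 1), (120, 53810, 1),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE YourTable (PK_ID INTEGER PRIMARY KEY, FK_ID INTEGER, SEQ INTEGER)")
conn.executemany("INSERT INTO YourTable VALUES (?, ?, ?)", rows)

# Step 1: compute the new sequence number per FK_ID, ordered by PK_ID.
numbered = conn.execute("""
    SELECT PK_ID,
           ROW_NUMBER() OVER (PARTITION BY FK_ID ORDER BY PK_ID) AS NewSEQ
    FROM YourTable
""").fetchall()

# Step 2: write the numbers back (T-SQL does both steps in one UPDATE on the CTE).
conn.executemany(
    "UPDATE YourTable SET SEQ = ? WHERE PK_ID = ?",
    [(new_seq, pk_id) for pk_id, new_seq in numbered],
)

seq_by_pk = dict(conn.execute("SELECT PK_ID, SEQ FROM YourTable"))
print(seq_by_pk)
```

Note that ordering by PK_ID does not reproduce the question's expected table exactly: there, 7251 comes before 95 and 120 comes last within FK_ID 53810, which looks like insertion order. A table has no inherent row order, so matching that would need some column that records the priority.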

postgresql count uniques

I have a structure pageviews like:
| main_group | subgroup | page | uid | viewcount |
-------------------------------------------------------------
| foo | targeted | A | 111 | 3 |
------------------------------------------------------------
| foo | targeted | B | 111 | 2 |
------------------------------------------------------------
| foo | targeted | A | 222 | 1 |
------------------------------------------------------------
| foo | targeted | A | 333 | 4 |
------------------------------------------------------------
| foo | targeted | B | 333 | 3 |
------------------------------------------------------------
| foo | external | A | 444 | 1 |
------------------------------------------------------------
| foo | external | A | 555 | 1 |
------------------------------------------------------------
| foo | external | B | 555 | 1 |
------------------------------------------------------------
So uids represent users who viewed a certain page viewcount number of times. But I only want the unique user counts, while keeping the group, subgroup, page information. I want this result:
| main_group | subgroup | page | unique_viewcount |
------------------------------------------------------------
| foo | targeted | A | 3 |
------------------------------------------------------------
| foo | targeted | B | 2 |
------------------------------------------------------------
| foo | external | A | 2 |
------------------------------------------------------------
| foo | external | B | 1 |
------------------------------------------------------------
I can't figure out how to write the select statement. I've tried:
select count (distinct (page, uid)) as unique_viewcount, main_group, subgroup, page
from pageviews
group by (main_group, subgroup, page, uid);
but each unique_viewcount is 1.
I think you just want count(distinct uid):
select main_group, subgroup, page, count(distinct uid) as unique_viewcount
from pageviews
group by main_group, subgroup, page;
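The original attempt returned 1 everywhere because uid was part of the GROUP BY, so every group contained exactly one user. A small sketch of the corrected query against the sample data, using Python's sqlite3 module as a stand-in for PostgreSQL (an assumption for runnability; COUNT(DISTINCT ...) behaves the same way here). The ORDER BY is added only to make the output order predictable.

```python
import sqlite3

# Sample rows from the question: (main_group, subgroup, page, uid, viewcount).
rows = [
    ("foo", "targeted", "A", 111, 3),
    ("foo", "targeted", "B", 111, 2),
    ("foo", "targeted", "A", 222, 1),
    ("foo", "targeted", "A", 333, 4),
    ("foo", "targeted", "B", 333, 3),
    ("foo", "external", "A", 444, 1),
    ("foo", "external", "A", 555, 1),
    ("foo", "external", "B", 555, 1),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE pageviews "
    "(main_group TEXT, subgroup TEXT, page TEXT, uid INTEGER, viewcount INTEGER)"
)
conn.executemany("INSERT INTO pageviews VALUES (?, ?, ?, ?, ?)", rows)

# Group by the columns to keep; count distinct users inside each group.
query = """
    SELECT main_group, subgroup, page, COUNT(DISTINCT uid) AS unique_viewcount
    FROM pageviews
    GROUP BY main_group, subgroup, page
    ORDER BY subgroup DESC, page
"""
result = conn.execute(query).fetchall()
for row in result:
    print(row)
```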

Query to group 5 records

I have a table, e.g. "employee", with just one column, "id". Say it has records from 1 through 1000.
Employee
------------
ID
------------
1
2
3
..
..
999
1000
Now I would like to write a query that sorts the ids in ascending order and concatenates the first 5 into one record, the next 5 into a second record, and so on. Any ideas how I can do this?
Here is the output I am looking to have.
1,2,3,4,5
6,7,8,9,10
11,12,13,14,15
...........
...........
996,997,998,999,1000
Use row_number and listagg functions, in this way:
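SELECT listagg(id, ',') WITHIN GROUP (ORDER BY group_no, id)
FROM (
    -- Rows 1-5 get group_no 0, rows 6-10 get group_no 1, and so on
    SELECT id,
           trunc((row_number() OVER (ORDER BY id) - 1) / 5) AS group_no
    FROM employee
)
GROUP BY group_no
-- Without this, the order of the output rows is not guaranteed
ORDER BY group_no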
Working demo: http://sqlfiddle.com/#!4/ef526/10
| LISTAGG(ID,',')WITHINGROUP(ORDERBYGROUP_NO,ID) |
|------------------------------------------------|
| 1,2,3,4,5 |
| 6,7,8,9,10 |
| 11,12,13,14,15 |
| 16,17,18,19,20 |
| 21,22,23,24,25 |
| 26,27,28,29,30 |
| 31,32,33,34,35 |
| 36,37,38,39,40 |
| 41,42,43,44,45 |
| 46,47,48,49,50 |
| 51,52,53,54,55 |
| 56,57,58,59,60 |
| 61,62,63,64,65 |
| 66,67,68,69,70 |
| 71,72,73,74,75 |
| 76,77,78,79,80 |
| 81,82,83,84,85 |
| 86,87,88,89,90 |
| 91,92,93,94,95 |
| 96,97,98,99,100 |
| 101,102,103,104,105 |
| 106,107,108,109,110 |
| 111,112,113,114,115 |
| 116,117,118,119,120 |
| 121,122,123,124,125 |
| 126,127,128,129,130 |
| 131,132,133,134,135 |
| 136,137,138,139,140 |
| 141,142,143,144,145 |
| 146,147,148,149,150 |
| 151,152,153,154,155 |
| 156,157,158,159,160 |
| 161,162,163,164,165 |
| 166,167,168,169,170 |
| 171,172,173,174,175 |
| 176,177,178,179,180 |
| 181,182,183,184,185 |
| 186,187,188,189,190 |
| 191,192,193,194,195 |
| 196,197,198,199,200 |
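The key step in that query is the bucket arithmetic, trunc((row_number - 1) / 5). Here is a minimal pure-Python sketch of the same idea, not a translation of the Oracle query: enumerate plays the role of row_number (0-based, so the "- 1" disappears) and integer division plays the role of trunc. The id range 1..1000 is the one assumed in the question.

```python
from itertools import groupby

ids = list(range(1, 1001))  # ids 1..1000, as assumed in the question

# Pair every id with its bucket number: positions 0-4 -> 0, 5-9 -> 1, ...
numbered = [(pos // 5, emp_id) for pos, emp_id in enumerate(sorted(ids))]

# Concatenate the ids of each bucket, like LISTAGG ... GROUP BY group_no.
lines = [
    ",".join(str(emp_id) for _, emp_id in bucket)
    for _, bucket in groupby(numbered, key=lambda pair: pair[0])
]

print(lines[0])    # 1,2,3,4,5
print(lines[-1])   # 996,997,998,999,1000
print(len(lines))  # 200
```

Numbering by position rather than by the id value itself means the grouping still yields five ids per line if the ids have gaps.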

Hive Find Start and End of Group or Changing point

Here is the table:
+------+------+
| Name | Time |
+------+------+
| A | 1 |
| A | 2 |
| A | 3 |
| A | 4 |
| B | 5 |
| B | 6 |
| A | 7 |
| B | 8 |
| B | 9 |
| B | 10 |
+------+------+
I want to write a query to get:
+-------+--------+-----+
| Name | Start | End |
+-------+--------+-----+
| A | 1 | 4 |
| B | 5 | 6 |
| A | 7 | 7 |
| B | 8 | 10 |
+-------+--------+-----+
Does anyone know how to do it?
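This is not the most efficient way, but it works.
-- Backticks guard against start/end being parsed as keywords
SELECT name, min(time) AS `start`, max(time) AS `end`
FROM (
    -- Within a run of consecutive times for one name, time minus rank stays constant
    SELECT name, time,
           time - DENSE_RANK() OVER (PARTITION BY name ORDER BY time) AS diff
    FROM foo
) t
GROUP BY name, diff;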
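One caveat with the query above: it subtracts the rank from Time itself, so it relies on Time being consecutive integers. A common variant of the same "gaps and islands" idea uses the difference of two row numbers instead, which only depends on the ordering. The sketch below is an illustration of that variant, not the answer's exact query; it runs on Python's sqlite3 module as a stand-in for Hive, and the sample times are deliberately multiplied by 10 (my assumption) to show that gaps in Time do not matter.

```python
import sqlite3

# Same name sequence as the question, but with non-consecutive times.
rows = [
    ("A", 10), ("A", 20), ("A", 30), ("A", 40),
    ("B", 50), ("B", 60),
    ("A", 70),
    ("B", 80), ("B", 90), ("B", 100),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE foo (name TEXT, time INTEGER)")
conn.executemany("INSERT INTO foo VALUES (?, ?)", rows)

# Overall row number minus per-name row number is constant within each run
# of consecutive rows that share a name, so it identifies the "island".
query = """
    SELECT name, MIN(time) AS start_time, MAX(time) AS end_time
    FROM (
        SELECT name, time,
               ROW_NUMBER() OVER (ORDER BY time)
             - ROW_NUMBER() OVER (PARTITION BY name ORDER BY time) AS island
        FROM foo
    ) AS t
    GROUP BY name, island
    ORDER BY start_time
"""
islands = conn.execute(query).fetchall()
print(islands)  # [('A', 10, 40), ('B', 50, 60), ('A', 70, 70), ('B', 80, 100)]
```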
I would suggest trying the following query and building a GenericUDF to identify the gaps, which is much easier :)
SELECT name, sort_array(collect_list(time)) FROM foo GROUP BY name;

sort a table while keeping the hierarchy of rows

I have a table which represents the hierarchy of departments:
+-----------+--------------+--------------+--------------+-----------+-------+
| Top Dept. | 2-tier Dept. | 3-tier Dept. | 4-tier Dept. | name | tier |
+-----------+--------------+--------------+--------------+-----------+-------+
| 00 | | | | abc | 0 |
| | 00-01 | | | bcd | 1 |
| | | 00-01-01 | | cde | 2 |
| | | 00-01-02 | | abc | 2 |
| | 00-02 | | | aef | 1 |
| | | 00-02-01 | | qwe | 2 |
| | | 00-02-03 | | abc | 2 |
| | | | 00-02-03-01 | abc | 3 |
+-----------+--------------+--------------+--------------+-----------+-------+
Now I want to sort the rows that are in the same tier by their names while keeping the overall hierarchy. This is what I expect:
+-----------+--------------+--------------+--------------+-----------+-------+
| Top Dept. | 2-tier Dept. | 3-tier Dept. | 4-tier Dept. | name | tier |
+-----------+--------------+--------------+--------------+-----------+-------+
| 00 | | | | abc | 0 |
| | 00-02 | | | aef | 1 |
| | | 00-02-03 | | abc | 2 |
| | | 00-02-01 | | qwe | 2 |
| | 00-01 | | | bcd | 1 |
| | | 00-01-02 | | abc | 2 |
| | | 00-01-01 | | cde | 2 |
| | | | 00-02-03-01 | abc | 3 |
+-----------+--------------+--------------+--------------+-----------+-------+
Missing data means NULL. I'm using Oracle DB. Can anyone help me?
EDIT: Actually, this is a simplified version of my SQL. I tried adding a new column that concatenates the values of the first four columns and then ordering by it and by name, but it didn't work.
Update: This appears to be working.
All that was really needed from my original comment was to put name before dept, in that order, in the sort key of both selects. This lets the engine sort by name first while maintaining the hierarchy.
WITH cte (Dept, superiorDept, name, depth, sort) AS (
    -- Anchor member: top-level departments start the sort key
    SELECT
        Dept,
        superiorDept,
        name,
        0,
        name || Dept
    FROM hierarchy h
    WHERE superiorDept IS NULL
    UNION ALL
    -- Recursive member: append the child's name and Dept to the parent's key
    SELECT
        h2.Dept,
        h2.superiorDept,
        h2.name,
        cte.depth + 1,
        cte.sort || h2.name || h2.Dept
    FROM hierarchy h2
    INNER JOIN cte ON h2.superiorDept = cte.Dept
)
SELECT
    CASE WHEN depth = 0 THEN Dept END AS 一级部门,  -- top-level department
    CASE WHEN depth = 1 THEN Dept END AS 二级部门,  -- 2nd-tier department
    CASE WHEN depth = 2 THEN Dept END AS 三级部门,  -- 3rd-tier department
    CASE WHEN depth = 3 THEN Dept END AS 四级部门,  -- 4th-tier department
    name,
    depth,
    sort
FROM cte
ORDER BY sort, name
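To make the sort-key idea concrete, here is a small sketch that builds the same kind of key on the question's data, using Python's sqlite3 module as a stand-in for Oracle (an assumption for runnability; SQLite spells it WITH RECURSIVE). The hierarchy table with columns Dept, superiorDept and name follows the answer's query; the parent links are my reading of the question's table, and sort_key is a hypothetical name for the answer's sort column.

```python
import sqlite3

# (Dept, superiorDept, name); parent links inferred from the Dept codes.
rows = [
    ("00", None, "abc"),
    ("00-01", "00", "bcd"),
    ("00-01-01", "00-01", "cde"),
    ("00-01-02", "00-01", "abc"),
    ("00-02", "00", "aef"),
    ("00-02-01", "00-02", "qwe"),
    ("00-02-03", "00-02", "abc"),
    ("00-02-03-01", "00-02-03", "abc"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE hierarchy (Dept TEXT, superiorDept TEXT, name TEXT)")
conn.executemany("INSERT INTO hierarchy VALUES (?, ?, ?)", rows)

# Each row's key is its parent's key plus its own name and Dept, so children
# sort directly after their parent and siblings sort by name.
query = """
    WITH RECURSIVE cte (Dept, superiorDept, name, depth, sort_key) AS (
        SELECT Dept, superiorDept, name, 0, name || Dept
        FROM hierarchy
        WHERE superiorDept IS NULL
        UNION ALL
        SELECT h2.Dept, h2.superiorDept, h2.name,
               cte.depth + 1,
               cte.sort_key || h2.name || h2.Dept
        FROM hierarchy h2
        INNER JOIN cte ON h2.superiorDept = cte.Dept
    )
    SELECT Dept, name, depth FROM cte ORDER BY sort_key
"""
ordered = conn.execute(query).fetchall()
order = [dept for dept, _, _ in ordered]
for dept, name, depth in ordered:
    print("    " * depth + dept, name)
```

On this data, 00-02 (aef) comes before 00-01 (bcd), and 00-02-03-01 stays directly under its parent 00-02-03 rather than at the end of the list. Concatenating without a delimiter, as here and in the answer, can produce ambiguous keys for some names; adding a separator character that cannot appear in the data would be a safer variant.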