Rank of partitions in T-SQL

Given the following table:
CREATE TABLE #values (ID int, TYPE nchar(2), NUMBER int)
INSERT INTO #values values (1, 'A', 0)
INSERT INTO #values values (2, 'A', 0)
INSERT INTO #values values (3, 'B', 1)
INSERT INTO #values values (4, 'A', 1)
INSERT INTO #values values (5, 'B', 2)
SELECT * FROM #values
I would like to generate this table:
Id | T | N | COUNT
------------------
1 | A | 0 | 1000
2 | A | 0 | 1000
3 | B | 1 | 1001
4 | A | 1 | 1002
5 | B | 2 | 1003
How can I do this in T-SQL?
I've been fiddling with ROW_NUMBER() OVER(PARTITION BY) but this does not solve the problem, as it resets the count at each partition, which is not what I would like to do.

I think you're looking for dense_rank:
SELECT
ID,
TYPE,
NUMBER,
DENSE_RANK() over (order by TYPE, Number)
FROM #values
This produces
1 A 0 1
2 A 0 1
4 A 1 2
3 B 1 3
5 B 2 4
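The ranking above starts at 1 and numbers the groups in (TYPE, NUMBER) order, whereas the table in the question starts at 1000 and numbers groups in order of first appearance. Here is a minimal sketch of one way to bridge that gap, assuming each (TYPE, NUMBER) group should be ordered by its smallest ID. It runs on SQLite through Python's sqlite3 module as a stand-in for SQL Server (window functions need SQLite 3.25 or later). The names vals, first_id, cnt and ranked_rows are made up for illustration.

```python
import sqlite3

def ranked_rows():
    """Return (ID, TYPE, NUMBER, cnt) rows with cnt starting at 1000."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE vals (ID int, TYPE text, NUMBER int)")
    conn.executemany(
        "INSERT INTO vals VALUES (?, ?, ?)",
        [(1, "A", 0), (2, "A", 0), (3, "B", 1), (4, "A", 1), (5, "B", 2)],
    )
    query = """
        SELECT ID, TYPE, NUMBER,
               -- offset the dense rank so the first group gets 1000
               999 + DENSE_RANK() OVER (ORDER BY first_id) AS cnt
        FROM (
            -- smallest ID in each (TYPE, NUMBER) group (assumed ordering key)
            SELECT ID, TYPE, NUMBER,
                   MIN(ID) OVER (PARTITION BY TYPE, NUMBER) AS first_id
            FROM vals
        ) AS v
        ORDER BY ID
    """
    rows = conn.execute(query).fetchall()
    conn.close()
    return rows

for row in ranked_rows():
    print(row)
```

The inner query and the DENSE_RANK() call use only standard window syntax, so the SELECT should carry over to T-SQL largely unchanged, though that has not been verified here against a SQL Server instance.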

SQL group by and sum based on distinct value in other column (sum once if value in other column is duplicated)

I need help with a group-by query. My table looks like this:
CREATE MULTISET TABLE MY_TABLE (PERSON CHAR(1), ITEM CHAR(1), COST INT);
INSERT INTO MY_TABLE VALUES ('A', '1', 5);
INSERT INTO MY_TABLE VALUES ('A', '1', 5);
INSERT INTO MY_TABLE VALUES ('A', '2', 1);
INSERT INTO MY_TABLE VALUES ('B', '3', 0);
INSERT INTO MY_TABLE VALUES ('B', '4', 10);
INSERT INTO MY_TABLE VALUES ('B', '4', 10);
INSERT INTO MY_TABLE VALUES ('C', '5', 1);
INSERT INTO MY_TABLE VALUES ('C', '5', 1);
INSERT INTO MY_TABLE VALUES ('C', '5', 1);
+--------+------+------+
| PERSON | ITEM | COST |
+--------+------+------+
| A | 1 | 5 |
| A | 1 | 5 |
| A | 2 | 1 |
| B | 3 | 0 |
| B | 4 | 10 |
| B | 4 | 10 |
| C | 5 | 1 |
| C | 5 | 1 |
| C | 5 | 1 |
+--------+------+------+
I need to group items and costs by person, but in different ways. For each person, I need the number of unique items they have. Ex: Person A has two distinct items, item 1 and item 2. I can get this with COUNT(DISTINCT ITEM).
Then for each person, I need to sum the cost but only once per distinct item (for duplicate items, the cost is always the same). Ex: Person A has item 1 for $5, item 1 for $5, and item 2 for $1. Since this person has item 1 twice, I count the $5 once, and then add the $1 from item 2 for a total of $6. The output should look like this:
+--------+---------------------+------------------------+
| PERSON | ITEM_DISTINCT_COUNT | COST_DISTINCT_ITEM_SUM |
+--------+---------------------+------------------------+
| A | 2 | 6 |
| B | 2 | 10 |
| C | 1 | 1 |
+--------+---------------------+------------------------+
Is there an easy way to do this that performs well on a lot of rows?
SELECT PERSON
,COUNT(DISTINCT ITEM) ITEM_DISTINCT_COUNT
-- help with COST_DISTINCT_ITEM_SUM
FROM MY_TABLE
GROUP BY PERSON
You can make a subquery which gets the distinct values of item and cost for each person, and then aggregate over that:
SELECT PERSON,
COUNT(ITEM) AS ITEM_DISTINCT_COUNT,
SUM(COST) AS COST_DISTINCT_ITEM_SUM
FROM (
SELECT DISTINCT PERSON, ITEM, COST
FROM MY_TABLE
) M
GROUP BY PERSON
Output:
PERSON ITEM_DISTINCT_COUNT COST_DISTINCT_ITEM_SUM
A 2 6
B 2 10
C 1 1
I recommend two levels of aggregation:
select person, count(*) as num_items, sum(cost)
from (select person, item, avg(cost) as cost
from my_table t
group by person, item
) t
group by person;
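To compare the two answers on the sample data, here is a small sketch that runs both queries on SQLite through Python's sqlite3 module, used only as a convenient stand-in for the original database. The helper name run and the constants DISTINCT_SQL and TWO_LEVEL_SQL are made up for illustration.

```python
import sqlite3

ROWS = [("A", "1", 5), ("A", "1", 5), ("A", "2", 1),
        ("B", "3", 0), ("B", "4", 10), ("B", "4", 10),
        ("C", "5", 1), ("C", "5", 1), ("C", "5", 1)]

# Approach 1: de-duplicate (person, item, cost) first, then aggregate.
DISTINCT_SQL = """
    SELECT person, COUNT(item), SUM(cost)
    FROM (SELECT DISTINCT person, item, cost FROM my_table) m
    GROUP BY person ORDER BY person
"""

# Approach 2: collapse to one row per (person, item), then aggregate.
TWO_LEVEL_SQL = """
    SELECT person, COUNT(*), SUM(cost)
    FROM (SELECT person, item, AVG(cost) AS cost
          FROM my_table GROUP BY person, item) t
    GROUP BY person ORDER BY person
"""

def run(sql):
    """Load the sample rows into a fresh in-memory table and run sql."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE my_table (person text, item text, cost int)")
    conn.executemany("INSERT INTO my_table VALUES (?, ?, ?)", ROWS)
    rows = conn.execute(sql).fetchall()
    conn.close()
    return rows

print(run(DISTINCT_SQL))
print(run(TWO_LEVEL_SQL))
```

Both give the same counts and sums here. They would differ only if the same item appeared with different costs for one person: the DISTINCT version would then count that item once per distinct cost, while the two-level version still yields one row per item.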

Classify records based on matching table

I have two tables: ITEMS and MATCHING_ITEMS, as below:
ITEMS:
|---------------------|------------------|
| ID | Name |
|---------------------|------------------|
| 1 | A |
| 2 | B |
| 3 | C |
| 4 | D |
| 5 | E |
| 6 | F |
| 7 | G |
|---------------------|------------------|
MATCHING_ITEMS:
|---------------------|------------------|
| ID_1 | ID_2 |
|---------------------|------------------|
| 1 | 2 |
| 1 | 3 |
| 2 | 3 |
| 4 | 5 |
| 4 | 6 |
| 5 | 6 |
|---------------------|------------------|
The MATCHING_ITEMS table defines items that match each other and thus belong to the same group, i.e. items 1, 2, and 3 match each other and thus belong in one group, and the same goes for items 4, 5, and 6. Item 7 does not have a match, so it does not belong to any group.
I now need to add a 'Group' column on the ITEMS table which contains a unique integer for each group, so it would look as follows:
ITEMS:
|---------------------|------------------|------------------|
| ID | Name | Group |
|---------------------|------------------|------------------|
| 1 | A | 1 |
| 2 | B | 1 |
| 3 | C | 1 |
| 4 | D | 2 |
| 5 | E | 2 |
| 6 | F | 2 |
| 7 | G | NULL |
|---------------------|------------------|------------------|
So far I have been using a stored procedure to do this, looping over each line in the MATCHING_ITEMS table and updating the ITEMS table with a group value. The problem is that I eventually need to do this for a table containing millions of records, and the looping method is far too slow.
Is there a way that I can achieve this without using a loop?
If you have all pairs of matches in the matching table, then you can just use the minimum id to assign the group:
select i.*,
(case when grp_id is not null
then dense_rank() over (order by grp_id)
end) as grouping
from items i left join
(select mi.id_1, least(mi.id_1, min(mi.id_2)) as grp_id
from matching_items mi
group by mi.id_1
) mi
on i.id = mi.id_1;
Note: This works only if all pairs are in the matching items table. Otherwise, you will need a recursive/hierarchical query to get all the pairs.
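For the case where not all pairs are listed, here is a rough sketch of the recursive approach the note mentions. It assumes matches are symmetric and transitive, and it runs on SQLite through Python's sqlite3 module purely for illustration. The CTE names edges, reach, comp and ranked, and the function group_items, are made up. Only the chain pairs (1,2), (2,3), (4,5), (5,6) are loaded, so the pairs (1,3) and (4,6) have to be derived.

```python
import sqlite3

GROUP_SQL = """
    WITH RECURSIVE
    edges(a, b) AS (              -- treat every match as two-directional
        SELECT id_1, id_2 FROM matching_items
        UNION
        SELECT id_2, id_1 FROM matching_items
    ),
    reach(id, other) AS (         -- every item reachable from each item
        SELECT a, b FROM edges
        UNION                     -- UNION (not UNION ALL) stops on cycles
        SELECT r.id, e.b FROM reach r JOIN edges e ON e.a = r.other
    ),
    comp AS (                     -- smallest reachable id labels the group
        SELECT id, MIN(other) AS grp_id FROM reach GROUP BY id
    ),
    ranked AS (
        SELECT id, DENSE_RANK() OVER (ORDER BY grp_id) AS grp FROM comp
    )
    SELECT i.id, i.name, r.grp
    FROM items i LEFT JOIN ranked r ON r.id = i.id
    ORDER BY i.id
"""

def group_items(pairs):
    """Assign a group number to each item; unmatched items get None."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE items (id int, name text)")
    conn.execute("CREATE TABLE matching_items (id_1 int, id_2 int)")
    conn.executemany("INSERT INTO items VALUES (?, ?)",
                     list(enumerate("ABCDEFG", start=1)))
    conn.executemany("INSERT INTO matching_items VALUES (?, ?)", pairs)
    rows = conn.execute(GROUP_SQL).fetchall()
    conn.close()
    return rows

for row in group_items([(1, 2), (2, 3), (4, 5), (5, 6)]):
    print(row)
```

The reach step can produce a number of rows that grows with the square of a group's size, so on millions of records this sketch would need testing before being relied on.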
You could use min and max at first, then dense_rank to assign group numbers:
select id, name, dense_rank() over (order by mn, mx) grp
from (
select distinct id, name,
min(id_1) over (partition by name) mn,
max(id_2) over (partition by name) mx
from items left join matching_items on id in (id_1, id_2))
order by id
The pairs 2,3 and 5,6 in the MATCHING_ITEMS table seem redundant, as they could be derived (if I am reading your question right).
Here is how I did it. I just reused id_1 from your example as the group no:
create table
items (
ID number,
name varchar2 (2)
);
insert into items values (1, 'A');
insert into items values (2, 'B');
insert into items values (3, 'C');
insert into items values (4, 'D');
insert into items values (5, 'E');
insert into items values (6, 'F');
insert into items values (7, 'G');
create table
matching_items (
ID number,
ID_2 number
);
insert into matching_items values (1, 2);
insert into matching_items values (1, 3);
insert into matching_items values (2, 3);
insert into matching_items values (4, 5);
insert into matching_items values (4, 6);
insert into matching_items values (5, 6);
with new_grp as
(
select id, id_2, id as group_no
from matching_items
where id in (select id from items)
and id not in (select id_2 from matching_items)),
assign_grp as
(
select id, group_no
from new_grp
union
select id_2, group_no
from new_grp)
select items.id, name, group_no
from items left outer join assign_grp
on items.id = assign_grp.id;

Running sum for subgroups of records in SQL Server

I need to calculate a running total for groups of data within a table (in SQL Server 2014). Please see the example below. I need to calculate the RunningTotalByID column.
Any thoughts/ideas would be much appreciated.
Thanks!
SQL Server 2014 supports SUM() OVER (PARTITION BY ... ORDER BY ...), which calculates a running sum:
DECLARE @T TABLE (ID char(1), Value int);
INSERT INTO @T (ID, Value) VALUES
('A', 1),
('A', 1),
('B', 1),
('B', 1),
('B', 1),
('C', 1),
('C', 1);
SELECT
ID
,Value
,SUM(Value) OVER (PARTITION BY ID ORDER BY Value
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS RunningTotal
FROM @T
ORDER BY ID, RunningTotal;
Result
+----+-------+--------------+
| ID | Value | RunningTotal |
+----+-------+--------------+
| A | 1 | 1 |
| A | 1 | 2 |
| B | 1 | 1 |
| B | 1 | 2 |
| B | 1 | 3 |
| C | 1 | 1 |
| C | 1 | 2 |
+----+-------+--------------+
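The explicit ROWS frame matters for this data because every Value is a tie. Without it, the default frame for a window with ORDER BY is RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW, which includes all rows that tie with the current one. Here is a small sketch of the difference, run on SQLite through Python's sqlite3 module as a stand-in for SQL Server. The function name running_totals and the constant ROWS_FRAME are made up for illustration.

```python
import sqlite3

ROWS_FRAME = "ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW"

def running_totals(frame):
    """Return (ID, RunningTotal) rows using the given window frame clause.

    An empty frame string leaves the database's default frame in effect.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (ID text, Value int)")
    conn.executemany(
        "INSERT INTO t VALUES (?, ?)",
        [("A", 1), ("A", 1), ("B", 1), ("B", 1), ("B", 1), ("C", 1), ("C", 1)],
    )
    sql = f"""
        SELECT ID,
               SUM(Value) OVER (PARTITION BY ID ORDER BY Value {frame})
                   AS RunningTotal
        FROM t
        ORDER BY ID, RunningTotal
    """
    rows = conn.execute(sql).fetchall()
    conn.close()
    return rows

print(running_totals(ROWS_FRAME))  # totals grow row by row within each ID
print(running_totals(""))          # tied rows all share the group total
```

In real data it is usually safer to order the window by a column that is unique within each ID (a date or a key, if one exists), so the order in which tied rows accumulate is deterministic.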

How to select shipments that have one activity but don't have another one

Simplified version of a table
Table ActivityHistory:
ActivityHistoryid(PK) | ShipmentID | ActivityCode | Datetime
1 | 1 | CodeA |
2 | 1 | CodeB |
3 | 1 | CodeC |
4 | 2 | CodeA |
5 | 3 | CodeA |
6 | 3 | CodeB |
7 | 4 | CodeC |
This table contains the list of activities that occurred to given shipments.
Task: I need to select shipments (shipment IDs) that have a "CodeA" activity and don't have a "CodeC" activity.
In this example, shipment IDs 2 and 3 match the criteria.
Table Shipment: (ShipmentID(PK), other shipment related columns)
Thank you.
Try this one -
Query:
-- Table variable holding the sample data from the question
DECLARE @temp TABLE
(
ActivityHistoryid INT
, ShipmentID INT
, ActivityCode VARCHAR(20)
)
INSERT INTO @temp (ActivityHistoryid, ShipmentID, ActivityCode)
VALUES
(1, 1, 'CodeA'),
(2, 1, 'CodeB'),
(3, 1, 'CodeC'),
(4, 2, 'CodeA'),
(5, 3, 'CodeA'),
(6, 3, 'CodeB'),
(7, 4, 'CodeC')
-- Keep CodeA rows whose shipment has no CodeC row
SELECT *
FROM @temp t
WHERE ActivityCode = 'CodeA'
AND NOT EXISTS(
SELECT 1
FROM @temp t2
WHERE t2.ActivityCode = 'CodeC'
AND t2.ShipmentID = t.ShipmentID
)
Output:
ActivityHistoryid ShipmentID ActivityCode
----------------- ----------- --------------------
4 2 CodeA
5 3 CodeA
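If only the distinct shipment IDs are needed, conditional aggregation is another option. The following is a sketch run on SQLite through Python's sqlite3 module as a stand-in for SQL Server. The function name shipments_with_a_without_c is made up for illustration.

```python
import sqlite3

def shipments_with_a_without_c():
    """Return shipment IDs that have a CodeA row and no CodeC row."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE ActivityHistory "
        "(ActivityHistoryid int, ShipmentID int, ActivityCode text)"
    )
    conn.executemany(
        "INSERT INTO ActivityHistory VALUES (?, ?, ?)",
        [(1, 1, "CodeA"), (2, 1, "CodeB"), (3, 1, "CodeC"), (4, 2, "CodeA"),
         (5, 3, "CodeA"), (6, 3, "CodeB"), (7, 4, "CodeC")],
    )
    sql = """
        SELECT ShipmentID
        FROM ActivityHistory
        GROUP BY ShipmentID
        -- at least one CodeA row and zero CodeC rows per shipment
        HAVING SUM(CASE WHEN ActivityCode = 'CodeA' THEN 1 ELSE 0 END) > 0
           AND SUM(CASE WHEN ActivityCode = 'CodeC' THEN 1 ELSE 0 END) = 0
        ORDER BY ShipmentID
    """
    ids = [row[0] for row in conn.execute(sql)]
    conn.close()
    return ids

print(shipments_with_a_without_c())
```

Unlike the NOT EXISTS query, this returns one row per shipment even when a shipment has several CodeA activities.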

SELECT inherit values from parent in a hierarchy

I'm trying to achieve through T-SQL (in a stored procedure) a way to copy a value from a parent into the child when retrieving rows. Here is some example data:
DROP TABLE TEST_LEVELS
CREATE TABLE TEST_LEVELS(
ID INT NOT NULL
,VALUE INT NULL
,PARENT_ID INT NULL
,LEVEL_NO INT NOT NULL
)
INSERT INTO TEST_LEVELS (ID, VALUE, PARENT_ID, LEVEL_NO) VALUES (1, 10000, NULL, 1)
INSERT INTO TEST_LEVELS (ID, VALUE, PARENT_ID, LEVEL_NO) VALUES (2, NULL, 1, 2)
INSERT INTO TEST_LEVELS (ID, VALUE, PARENT_ID, LEVEL_NO) VALUES (3, NULL, 2, 3)
INSERT INTO TEST_LEVELS (ID, VALUE, PARENT_ID, LEVEL_NO) VALUES (4, 20000, NULL, 1)
INSERT INTO TEST_LEVELS (ID, VALUE, PARENT_ID, LEVEL_NO) VALUES (5, NULL, 4, 2)
INSERT INTO TEST_LEVELS (ID, VALUE, PARENT_ID, LEVEL_NO) VALUES (6, 25000, 5, 3)
INSERT INTO TEST_LEVELS (ID, VALUE, PARENT_ID, LEVEL_NO) VALUES (7, NULL, 6, 4)
Selecting the data as follows:
SELECT ID, VALUE, LEVEL_NO
FROM TEST_LEVELS
results in:
+----+-------+----------+
| ID | VALUE | LEVEL_NO |
+----+-------+----------+
| 1 | 10000 | 1 |
| 2 | NULL | 2 |
| 3 | NULL | 3 |
| 4 | 20000 | 1 |
| 5 | NULL | 2 |
| 6 | 25000 | 3 |
| 7 | NULL | 4 |
+----+-------+----------+
But I need something like this (values are inherited from the parent):
+----+-------+----------+
| ID | VALUE | LEVEL_NO |
+----+-------+----------+
| 1 | 10000 | 1 |
| 2 | 10000 | 2 |
| 3 | 10000 | 3 |
| 4 | 20000 | 1 |
| 5 | 20000 | 2 |
| 6 | 25000 | 3 |
| 7 | 25000 | 4 |
+----+-------+----------+
Can this be achieved without using cursors (it must also run on SQL Server 2005)?
Use:
;with cte
as
(
select t.ID, t.VALUE, t.PARENT_ID, t.LEVEL_NO
from TEST_LEVELS t
where t.Value is not null
union all
select t.ID, c.Value, t.PARENT_ID, t.LEVEL_NO
from cte c
join TEST_LEVELS t on t.PARENT_ID = c.ID
where t.Value is null
)
select c.ID, c.Value, c.LEVEL_NO
from cte c
order by c.ID
Output:
ID Value LEVEL_NO
----------- ----------- -----------
1 10000 1
2 10000 2
3 10000 3
4 20000 1
5 20000 2
6 25000 3
7 25000 4
Maybe something like this:
;WITH cte_name(ID,VALUE,PARENT_ID,LEVEL_NO)
AS
(
SELECT
tbl.ID,
tbl.VALUE,
tbl.PARENT_ID,
tbl.LEVEL_NO
FROM
TEST_LEVELS AS tbl
WHERE
tbl.PARENT_ID IS NULL
UNION ALL
SELECT
tbl.ID,
ISNULL(tbl.VALUE,cte_name.VALUE),
tbl.PARENT_ID,
tbl.LEVEL_NO
FROM
cte_name
JOIN TEST_LEVELS AS tbl
ON cte_name.ID=tbl.PARENT_ID
)
SELECT
*
FROM
cte_name
ORDER BY
ID
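One way to do it, although this looks only one level up, so a row whose parent also has a NULL value (such as ID 3) still comes back as NULL: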
SELECT T.ID,
case when T.VALUE IS NULL
THEN (SELECT A.VALUE FROM TEST_LEVELS A WHERE A.ID = T.PARENT_ID)
ELSE T.VALUE
END,
T.LEVEL_NO
FROM TEST_LEVELS T
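To see how the single-level lookup and the recursive CTE behave on the sample data, here is a small comparison sketch. It runs on SQLite through Python's sqlite3 module as a stand-in for SQL Server, so it uses WITH RECURSIVE and COALESCE in place of the T-SQL WITH and ISNULL. The names ONE_LEVEL_SQL, RECURSIVE_SQL and run are made up for illustration.

```python
import sqlite3

DATA = [(1, 10000, None, 1), (2, None, 1, 2), (3, None, 2, 3),
        (4, 20000, None, 1), (5, None, 4, 2), (6, 25000, 5, 3),
        (7, None, 6, 4)]

# Looks at the direct parent only.
ONE_LEVEL_SQL = """
    SELECT T.ID,
           CASE WHEN T.VALUE IS NULL
                THEN (SELECT A.VALUE FROM TEST_LEVELS A WHERE A.ID = T.PARENT_ID)
                ELSE T.VALUE END
    FROM TEST_LEVELS T ORDER BY T.ID
"""

# Walks down from the roots, carrying the nearest non-NULL value along.
RECURSIVE_SQL = """
    WITH RECURSIVE cte(ID, VALUE) AS (
        SELECT ID, VALUE FROM TEST_LEVELS WHERE PARENT_ID IS NULL
        UNION ALL
        SELECT t.ID, COALESCE(t.VALUE, cte.VALUE)
        FROM cte JOIN TEST_LEVELS t ON t.PARENT_ID = cte.ID
    )
    SELECT ID, VALUE FROM cte ORDER BY ID
"""

def run(sql):
    """Load the sample hierarchy into a fresh in-memory table and run sql."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE TEST_LEVELS (ID int, VALUE int, PARENT_ID int, LEVEL_NO int)"
    )
    conn.executemany("INSERT INTO TEST_LEVELS VALUES (?, ?, ?, ?)", DATA)
    rows = conn.execute(sql).fetchall()
    conn.close()
    return rows

print(run(ONE_LEVEL_SQL))
print(run(RECURSIVE_SQL))
```

On this data the two queries differ only for ID 3, whose parent (ID 2) has no value of its own.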