Running total over duplicate column values and no other columns - sql

I want to compute a running total, but there is no unique column or id column to use in the OVER clause.
CREATE TABLE piv2([name] varchar(5), [no] int);
INSERT INTO piv2
([name], [no])
VALUES
('a', 1),
('a', 2),
('a', 3),
('a', 4),
('b', 1),
('b', 2),
('b', 3);
There are only 2 columns: name, which has duplicate values, and no, over which I want the running total, in SQL Server 2017.
expected result:
a 1
a 3
a 6
a 10
b 11
b 13
b 16
Any help?

The following query generates the output you expect, at least for the exact sample data you showed us:
SELECT
name,
SUM(no) OVER (ORDER BY name, no) AS no_sum
FROM piv2;
If the order you intend to use for the rolling sum is something other than the order given by the name and no columns, then you should reveal that logic along with sample data.
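If there is truly no column that defines the order, one hedged option (a sketch; the seq column is hypothetical and assumes you are allowed to alter the table) is to add an identity column and order the window by it. Note that SQL Server keeps no record of the original insertion order, so the values assigned to existing rows are arbitrary; this only gives you a stable, explicit order going forward:
-- Sketch: add an explicit ordering column so the running total is deterministic.
ALTER TABLE piv2 ADD seq int IDENTITY(1,1);

SELECT
    name,
    SUM(no) OVER (ORDER BY seq ROWS UNBOUNDED PRECEDING) AS no_sum
FROM piv2;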

Related

Count Distinct not working as expected, output is equal to count

I have a table where I'm trying to count the distinct number of members per group. I know there are duplicates, based on comparing count(*) with count(distinct memberid). But when I group by groupid and count distinct, the numbers aren't what I'd expect.
select count(distinct memberid), count(*)
from dbo.condition c
output:
count     count
303,781   348,722
select groupid, count(*), count(distinct memberid)
from dbo.condition c
group by groupid
output:
groupid   count     count
2         19,984    19,984
3         25,689    25,689
5         14,400    14,400
24        56,058    56,058
25        200,106   200,106
29        27,847    27,847
30        1,370     1,370
31        3,268     3,268
The counts in the second query are equal when I don't think they should be. Does anyone know what I'm doing wrong? I need the 3rd column to total 303,781, not 348,722.
Thanks!
There's nothing wrong with your second query. Since you're aggregating by the "groupid" column, the output tells you that there are no duplicate "memberid" values within the same groupid (so counting all values gives the same result as counting distinct values).
In the first query, on the other hand, the aggregation happens without any grouping, and its output shows that the same "memberid" values appear under different "groupid" values.
I took the liberty of adding an example that corroborates your answer:
create table aa (groupid int not null, memberid int not null);
insert into aa (groupid, memberid)
values
(1, 1), (1, 2), (1, 3), (2, 1), (3, 1), (3, 2), (3, 3), (4, 1), (4, 2), (4, 3), (4, 5), (5, 3);

select groupid, count(*), count(distinct memberid)
from aa
group by groupid;

select count(*), count(distinct memberid)
from aa;
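The gap between the two totals can also be shown directly. A follow-up sketch against the same dbo.condition table lists the memberid values that appear in more than one group, which is exactly what makes 348,722 larger than 303,781:
-- Sketch: memberids counted once per group but only once in the overall distinct count.
select memberid, count(distinct groupid) as groups_present
from dbo.condition
group by memberid
having count(distinct groupid) > 1;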

SELECT and COUNT data in a specific range

I would like to count the records for each port value in a certain range (1-10) and output the quantity. If there is no record with a given value in the database, 0 should be output for that value.
Example database:
CREATE TABLE exampledata (
ID int,
port int,
name varchar(255));
Example data:
INSERT INTO exampledata
VALUES (1, 1, 'a'), (2, 1, 'b'), (3, 2, 'c'), (4, 2, 'd'), (5, 3, 'e'), (6, 4, 'f'), (7, 8, 'f');
My example query would be:
SELECT
port,
count(port) as amount
FROM exampledata
GROUP BY port
Which would result in:
port   amount
1      2
2      2
3      1
4      1
8      1
But I need it to look like that:
port   amount
1      2
2      2
3      1
4      1
5      0
6      0
7      0
8      1
9      0
10     0
I have thought about a join with a table that holds the values 1-10, but this does not seem efficient. Several attempts with different CASE and IF constructs were all unsuccessful...
I have prepared the data in a db<>fiddle.
This "simple" answer here would be to use an inline tally. As you just want the values 1-10, this can be achieved with a simple VALUES table construct:
SELECT V.I AS Port,
COUNT(ed.ID) AS Amount
FROM (VALUES(1),(2),(3),(4),(5),(6),(7),(8),(9),(10))V(I)
LEFT JOIN dbo.exampledata ed ON V.I = ed.port
GROUP BY V.I;
Presumably, however, you actually have a table of ports, and so what you should be doing is LEFT JOINing from that:
SELECT P.PortID AS Port,
COUNT(ed.ID) AS Amount
FROM dbo.Port P
LEFT JOIN dbo.exampledata ed ON P.PortID = ed.port
WHERE P.PortID BETWEEN 1 AND 10
GROUP BY P.PortID;
If you don't have a table of ports (why don't you?) and you need to parametrise the values, I suggest using an actual tally table or tally function; a search for these will give you a wealth of resources on how to create them.
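For completeness, a minimal parameterised sketch (the @MaxPort variable is an assumption, and sys.all_columns is only used as a convenient row source; a persisted tally table is usually the better choice for large ranges):
-- Sketch: generate the numbers 1..@MaxPort on the fly and LEFT JOIN the data onto them.
DECLARE @MaxPort int = 10;

WITH Tally AS (
    SELECT TOP (@MaxPort)
           ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS I
    FROM sys.all_columns
)
SELECT T.I AS Port,
       COUNT(ed.ID) AS Amount
FROM Tally T
LEFT JOIN dbo.exampledata ed ON T.I = ed.port
GROUP BY T.I;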

Running sums: find blocks of rows that sum to a given list of values

Here is the test data:
create table #trial (id int, val int);
insert into #trial (id, val)
values (1, 1), (2, 3),(3, 2), (4, 4), (5, 5),(6, 6), (7, 7), (8, 2),(9, 3), (10, 4), (11, 6),(12, 10), (13, 5), (14, 3),(15, 2) ;
select * from #trial order by id asc
Description of the data:
I have a list of n values that represent sums; assume they are (10, 53) for this example. The values in #trial can be both negative and positive. Note that the values in #trial will always sum to the given sums.
Description of the pattern:
In this example, 10 is the 1st sum I want to match and 53 is the 2nd. The dataset has been set up in such a way that blocks of consecutive rows always sum to the given sums: here the first 4 rows sum to 10, and the next 11 rows sum to 53. The dataset will always have this feature. In other words, the 1st given sum can be found by summing rows 1 to i, the 2nd sum by summing rows i + 1 to j, and so on.
Finally, I want an id that identifies the groups of rows that sum to the given sums. So in this example, rows 1 to 4 take id 1, and rows 5 to 15 take id 2.
This answers the original question.
From what you describe you can do something like this:
select v.grp, t.*
from (select t.*, sum(val) over (order by id) as running_val
      from #trial t
     ) t left join
     (-- turn the per-block sums into cumulative (lower, upper] boundaries: (0, 10] and (10, 63]
      select grp,
             sum(block_sum) over (order by grp) - block_sum as lower_bound,
             sum(block_sum) over (order by grp) as upper_bound
      from (values (1, 10), (2, 53)) v(grp, block_sum)
     ) v
     on t.running_val > v.lower_bound and
        t.running_val <= v.upper_bound;
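As a quick sanity check (a sketch that simply wraps the query above), grouping the matched rows by grp should return the original block sums, 10 and 53, with id ranges 1-4 and 5-15:
-- Sanity check sketch: each grp should sum back to its given block sum.
select q.grp, sum(q.val) as block_total, min(q.id) as first_id, max(q.id) as last_id
from (select v.grp, t.id, t.val
      from (select t.*, sum(val) over (order by id) as running_val
            from #trial t) t
      left join (select grp,
                        sum(block_sum) over (order by grp) - block_sum as lower_bound,
                        sum(block_sum) over (order by grp) as upper_bound
                 from (values (1, 10), (2, 53)) v(grp, block_sum)) v
        on t.running_val > v.lower_bound and t.running_val <= v.upper_bound
     ) q
group by q.grp;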

How to specify a linear programming-like constraint (i.e. max number of rows for a dimension's attributes) in SQL Server?

I'm looking to assign unique person IDs to a marketing program, but I need to optimize based on each person's Probability Score (some people can be sent to multiple programs, some only one), and I have constraints such as a budgeted mail quantity for each program.
I'm using SQL Server and am able to put IDs into their highest-scoring program using row_number() over(partition by person_ID order by Prob_Score), but I'm not sure how to add the maximum mail quantity constraint specific to each individual program when returning a table that assigns each ID to a program. I've looked into the CHECK() constraint functionality, but I'm not sure if that's applicable.
create table test_marketing_table(
PersonID int,
MarketingProgram varchar(255),
ProbabilityScore real
);
insert into test_marketing_table (PersonID, MarketingProgram, ProbabilityScore)
values (1, 'A', 0.07)
,(1, 'B', 0.06)
,(1, 'C', 0.02)
,(2, 'A', 0.02)
,(3, 'B', 0.08)
,(3, 'C', 0.13)
,(4, 'C', 0.02)
,(5, 'A', 0.04)
,(6, 'B', 0.045)
,(6, 'C', 0.09);
--this section assigns everyone to their highest scoring program,
--but this isn't necessarily what I need
with x
as
(
select *, row_number()over(partition by PersonID order by ProbabilityScore desc) as PersonScoreRank
from test_marketing_table
)
select *
from x
where PersonScoreRank='1';
I also need to enforce some constraints: at most two C packages, one A package, and one B package can be sent. How can I reassign the IDs to programs while always using the highest probability score still available?
The final result should look like:
PersonID  MarketingProgram  ProbabilityScore  PersonScoreRank
3         C                 0.13              1
6         C                 0.09              1
1         A                 0.07              1
6         B                 0.045             2
You need to rethink your ROW_NUMBER() formula based on your actual need, and you should also have a table of marketing programs to make this work efficiently. The following covers the basic ideas you need to incorporate to perform the filtering efficiently.
MarketingPrograms Table
CREATE TABLE MarketingPrograms (
ProgramID varchar(10),
PeopleDesired int
)
Populate the MarketingPrograms Table
INSERT INTO MarketingPrograms (ProgramID, PeopleDesired) Values
('A', 1),
('B', 1),
('C', 2)
Use the MarketingPrograms Table
with x as (
select *,
row_number() over(partition by MarketingProgram order by ProbabilityScore desc) as ProgramScoreRank
from test_marketing_table
)
select *
from x
INNER JOIN MarketingPrograms m
ON x.MarketingProgram = m.ProgramID
WHERE x.ProgramScoreRank <= m.PeopleDesired
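A minor, optional refinement (just a sketch, not part of the original answer): project only the columns shown in the expected result instead of *, so the lookup table's columns don't leak into the output, and order by score to match the expected listing:
with x as (
    select *,
           row_number() over(partition by MarketingProgram order by ProbabilityScore desc) as ProgramScoreRank
    from test_marketing_table
)
select x.PersonID,
       x.MarketingProgram,
       x.ProbabilityScore,
       x.ProgramScoreRank
from x
inner join MarketingPrograms m
    on x.MarketingProgram = m.ProgramID
where x.ProgramScoreRank <= m.PeopleDesired
order by x.ProbabilityScore desc;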

SQL Select: Do rows matching id all have the same column value

I have a table like this
sub_id reference
1 A
1 A
1 A
1 A
1 A
1 A
1 C
2 B
2 B
3 D
3 D
I want to make sure all the rows in each sub_id group have the same reference.
Meaning, for example, all references in:
group 1 should be A
group 2 should be B
group 3 should be D
If they do not, then I would like to get back a list of the sub_ids where this is violated.
So for the table above my result would be: 1
Ideally, under these conditions, reference would live in a separate table with sub_id as the PK, but I first need to fix this for a massive dataset before I can move on to restructuring the database.
You could use the following method:
select t.sub_id
from YourTable t
group by t.sub_id
having max(t.reference) <> min(t.reference)
Change YourTable to suit.
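If you also want to see which references conflict for each offending sub_id, a hedged extension of this (assuming SQL Server 2017+ for STRING_AGG):
-- Sketch: list each offending sub_id together with its distinct references.
select d.sub_id,
       string_agg(d.reference, ', ') within group (order by d.reference) as references_found
from (select distinct sub_id, reference from YourTable) d
group by d.sub_id
having count(*) > 1;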
Are you looking for simple aggregation?
select sub_id
from table t
group by sub_id
having count(distinct reference) > 1;
The query you want:
SELECT sub_id
FROM test_sub
GROUP BY sub_id HAVING count(DISTINCT reference) > 1
;
Here is what I used to test it:
CREATE TABLE `test_sub` (
sub_id int(11) NOT NULL,
reference varchar(45) DEFAULT NULL
);
INSERT INTO test_sub (sub_id, reference) VALUES
(1, 'A'),
(1, 'A'),
(1, 'A'),
(1, 'A'),
(1, 'C'),
(2, 'B'),
(2, 'B'),
(3, 'D'),
(3, 'D'),
(3, 'D'),
(4, 'E'),
(4, 'E'),
(4, 'E'),
(5, 'F'),
(5, 'G')
;