How do you create a conditional count across multiple fields? - sql

I have a huge table of over 100 million rows which is joined to a reference table, and I want to create a conditional count on it.
The first table is the large one: an audit log which lists data on countries and contains an audit date.
The second table is a smaller one which contains relational data for the audit log.
The first part is the easy bit, which is to identify which audit data I want to see. I have the following code to identify this:
select aud.*
from audit_log aud
join database db on db.id=aud.release_id
where aud.event_description like '% opted in'
and db.creation_source = 'system_a'
This gives me the data in the following format:
Country | Event Description | Audit Date
Czech Republic | Czech Republic has been automatically opted in | 11-AUG-14 07.01.52.606000000
Denmark | Denmark has been automatically opted in | 12-AUG-15 07.01.53.239000000
Denmark | Denmark has been automatically opted in | 11-SEP-15 07.01.53.902000000
Dominican Republic | Dominican Republic has been automatically opted in | 11-SEP-15 07.01.54.187000000
Ecuador | Ecuador has been automatically opted in | 11-DEC-14 07.01.54.427000000
Ecuador | Ecuador has been automatically opted in | 11-NOV-14 07.01.54.679000000
This query still returns over 5 million rows, so I cannot export the data to Excel to create a count.
My two main issues are the number of rows and the date format of the 'Audit Date' field.
Ideally I want to create a count which shows the data as:
Country |Aug-14|Nov-14|Dec-14|Aug-15|Sep-15
Czech Republic | 1 | | | |
Denmark | | | | 1 | 1
Dominican Republic | | | | | 1
Ecuador | | 1 | 1 | |
Any ideas on how I can extract the month and year and drop the figures into columns by country?
Thanks
Edit - Thank you xQbert for your solution, it worked perfectly!
I have now run into a new problem.
I need to constrain the count by another query, but there is no unique identifier between the tables involved.
For example, I amended your query to fit my db:
select cty.country_name,
SUM(CASE WHEN to_char(Audit_Date,'MON-YYYY') ='AUG-2014' then 1 else 0 end) as "AUG-14",
SUM(CASE WHEN to_char(Audit_Date,'MON-YYYY') ='SEP-2014' then 1 else 0 end) as "SEP-14",
SUM(CASE WHEN to_char(Audit_Date,'MON-YYYY') ='OCT-2014' then 1 else 0 end) as "OCT-14",
SUM(CASE WHEN to_char(Audit_Date,'MON-YYYY') ='NOV-2014' then 1 else 0 end) as "NOV-14",
SUM(CASE WHEN to_char(Audit_Date,'MON-YYYY') ='DEC-2014' then 1 else 0 end) as "DEC-14",
SUM(CASE WHEN to_char(Audit_Date,'MON-YYYY') ='JAN-2015' then 1 else 0 end) as "JAN-15",
SUM(CASE WHEN to_char(Audit_Date,'MON-YYYY') ='FEB-2015' then 1 else 0 end) as "FEB-15",
SUM(CASE WHEN to_char(Audit_Date,'MON-YYYY') ='MAR-2015' then 1 else 0 end) as "MAR-15",
SUM(CASE WHEN to_char(Audit_Date,'MON-YYYY') ='APR-2015' then 1 else 0 end) as "APR-15",
SUM(CASE WHEN to_char(Audit_Date,'MON-YYYY') ='MAY-2015' then 1 else 0 end) as "MAY-15",
SUM(CASE WHEN to_char(Audit_Date,'MON-YYYY') ='JUN-2015' then 1 else 0 end) as "JUN-15",
SUM(CASE WHEN to_char(Audit_Date,'MON-YYYY') ='JUL-2015' then 1 else 0 end) as "JUL-15",
SUM(CASE WHEN to_char(Audit_Date,'MON-YYYY') ='AUG-2015' then 1 else 0 end) as "AUG-15",
SUM(CASE WHEN to_char(Audit_Date,'MON-YYYY') ='SEP-2015' then 1 else 0 end) as "SEP-15"
from dschd.audit_trail aud
join dschd.release r on r.id=aud.release_id
join dschd.country cty on aud.EVENT_COUNTRY_ID=cty.id
where aud.event_description like '% opted in'
and r.creation_source = 'DSCHED'
GROUP BY cty.COUNTRY_name
My second query is:
select *
from DSCHD.RELEASE_COUNTRY_RIGHT rcr
join dschd.release r on rcr.RELEASE_ID=r.ID
join dschd.country cty on rcr.COUNTRY_ID=cty.id
where r.release_status in ('DRAFT', 'SCHEDULED', 'FINAL', 'DELIVERED')
and r.is_active = 'Y'
and rcr.MARKETING_RIGHT = 'Y'
and rcr.OPT_OUT = 'N'
and r.creation_source = 'DSCHED'
The problem is that I have many countries which can relate to one ID (Release_ID) but there is no unique identifier between the tables on a country level. Each country has an ID though.
So for query 1, to identify each unique row I would need the 'aud.Release_ID' and the 'aud.Event_country_id' and for query 2 to achieve the same I would need to use the 'rcr.Release_ID' and 'rcr.country_id'.
select cty.country_name,
SUM(CASE WHEN to_char(Audit_Date,'MON-YYYY') ='AUG-2014' then 1 else 0 end) as "AUG-14",
SUM(CASE WHEN to_char(Audit_Date,'MON-YYYY') ='SEP-2014' then 1 else 0 end) as "SEP-14",
SUM(CASE WHEN to_char(Audit_Date,'MON-YYYY') ='OCT-2014' then 1 else 0 end) as "OCT-14",
SUM(CASE WHEN to_char(Audit_Date,'MON-YYYY') ='NOV-2014' then 1 else 0 end) as "NOV-14",
SUM(CASE WHEN to_char(Audit_Date,'MON-YYYY') ='DEC-2014' then 1 else 0 end) as "DEC-14",
SUM(CASE WHEN to_char(Audit_Date,'MON-YYYY') ='JAN-2015' then 1 else 0 end) as "JAN-15",
SUM(CASE WHEN to_char(Audit_Date,'MON-YYYY') ='FEB-2015' then 1 else 0 end) as "FEB-15",
SUM(CASE WHEN to_char(Audit_Date,'MON-YYYY') ='MAR-2015' then 1 else 0 end) as "MAR-15",
SUM(CASE WHEN to_char(Audit_Date,'MON-YYYY') ='APR-2015' then 1 else 0 end) as "APR-15",
SUM(CASE WHEN to_char(Audit_Date,'MON-YYYY') ='MAY-2015' then 1 else 0 end) as "MAY-15",
SUM(CASE WHEN to_char(Audit_Date,'MON-YYYY') ='JUN-2015' then 1 else 0 end) as "JUN-15",
SUM(CASE WHEN to_char(Audit_Date,'MON-YYYY') ='JUL-2015' then 1 else 0 end) as "JUL-15",
SUM(CASE WHEN to_char(Audit_Date,'MON-YYYY') ='AUG-2015' then 1 else 0 end) as "AUG-15",
SUM(CASE WHEN to_char(Audit_Date,'MON-YYYY') ='SEP-2015' then 1 else 0 end) as "SEP-15"
from dschd.audit_trail aud
join dschd.release r on r.id=aud.release_id
join dschd.country cty on aud.EVENT_COUNTRY_ID=cty.id
where aud.event_description like '% opted in'
and ***** in (select ******
from DSCHD.RELEASE_COUNTRY_RIGHT rcr
join dschd.release r on rcr.RELEASE_ID=r.ID
join dschd.country cty on rcr.COUNTRY_ID=cty.id
where r.release_status in ('DRAFT', 'SCHEDULED', 'FINAL', 'DELIVERED')
and r.is_active = 'Y'
and rcr.MARKETING_RIGHT = 'Y'
and rcr.OPT_OUT = 'N'
and r.creation_source = 'DSCHED')
GROUP BY cty.COUNTRY_name
The bits I am stuck on are the two parts indicated by '*****', as the join criteria consist of two fields.
Any ideas?

Quick and dirty, not dynamic or floating based on a 12 month cycle or anything...
select country,
SUM(CASE WHEN to_char(Audit_Date,'MON-YYYY') ='AUG-2014' then 1 else 0 end) as "AUG-14",
SUM(CASE WHEN to_char(Audit_Date,'MON-YYYY') ='SEP-2014' then 1 else 0 end) as "SEP-14",
SUM(CASE WHEN to_char(Audit_Date,'MON-YYYY') ='OCT-2014' then 1 else 0 end) as "OCT-14",
SUM(CASE WHEN to_char(Audit_Date,'MON-YYYY') ='NOV-2014' then 1 else 0 end) as "NOV-14",
SUM(CASE WHEN to_char(Audit_Date,'MON-YYYY') ='DEC-2014' then 1 else 0 end) as "DEC-14",
SUM(CASE WHEN to_char(Audit_Date,'MON-YYYY') ='JAN-2015' then 1 else 0 end) as "JAN-15",
SUM(CASE WHEN to_char(Audit_Date,'MON-YYYY') ='FEB-2015' then 1 else 0 end) as "FEB-15",
SUM(CASE WHEN to_char(Audit_Date,'MON-YYYY') ='MAR-2015' then 1 else 0 end) as "MAR-15",
SUM(CASE WHEN to_char(Audit_Date,'MON-YYYY') ='APR-2015' then 1 else 0 end) as "APR-15",
SUM(CASE WHEN to_char(Audit_Date,'MON-YYYY') ='MAY-2015' then 1 else 0 end) as "MAY-15",
SUM(CASE WHEN to_char(Audit_Date,'MON-YYYY') ='JUN-2015' then 1 else 0 end) as "JUN-15",
SUM(CASE WHEN to_char(Audit_Date,'MON-YYYY') ='JUL-2015' then 1 else 0 end) as "JUL-15",
SUM(CASE WHEN to_char(Audit_Date,'MON-YYYY') ='AUG-2015' then 1 else 0 end) as "AUG-15",
SUM(CASE WHEN to_char(Audit_Date,'MON-YYYY') ='SEP-2015' then 1 else 0 end) as "SEP-15"
from audit_log aud
join database db on db.id=aud.release_id
where aud.event_description like '% opted in'
and db.creation_source = 'system_a'
GROUP BY COUNTRY
Ideally we'd simply use a PIVOT statement, or base it on the earliest date in the range and go on from there, as in this prior Stack Overflow question: Dynamic pivot in oracle sql
Update based on changing requirements: you do know you can join on multiple criteria, right? :P
Note that we create an inline view from your second query, alias it as Z, and add the two columns you want to match on to its select list. Then we join to it as if it were a table!
select cty.country_name,
SUM(CASE WHEN to_char(Audit_Date,'MON-YYYY') ='AUG-2014' then 1 else 0 end) as "AUG-14",
SUM(CASE WHEN to_char(Audit_Date,'MON-YYYY') ='SEP-2014' then 1 else 0 end) as "SEP-14",
SUM(CASE WHEN to_char(Audit_Date,'MON-YYYY') ='OCT-2014' then 1 else 0 end) as "OCT-14",
SUM(CASE WHEN to_char(Audit_Date,'MON-YYYY') ='NOV-2014' then 1 else 0 end) as "NOV-14",
SUM(CASE WHEN to_char(Audit_Date,'MON-YYYY') ='DEC-2014' then 1 else 0 end) as "DEC-14",
SUM(CASE WHEN to_char(Audit_Date,'MON-YYYY') ='JAN-2015' then 1 else 0 end) as "JAN-15",
SUM(CASE WHEN to_char(Audit_Date,'MON-YYYY') ='FEB-2015' then 1 else 0 end) as "FEB-15",
SUM(CASE WHEN to_char(Audit_Date,'MON-YYYY') ='MAR-2015' then 1 else 0 end) as "MAR-15",
SUM(CASE WHEN to_char(Audit_Date,'MON-YYYY') ='APR-2015' then 1 else 0 end) as "APR-15",
SUM(CASE WHEN to_char(Audit_Date,'MON-YYYY') ='MAY-2015' then 1 else 0 end) as "MAY-15",
SUM(CASE WHEN to_char(Audit_Date,'MON-YYYY') ='JUN-2015' then 1 else 0 end) as "JUN-15",
SUM(CASE WHEN to_char(Audit_Date,'MON-YYYY') ='JUL-2015' then 1 else 0 end) as "JUL-15",
SUM(CASE WHEN to_char(Audit_Date,'MON-YYYY') ='AUG-2015' then 1 else 0 end) as "AUG-15",
SUM(CASE WHEN to_char(Audit_Date,'MON-YYYY') ='SEP-2015' then 1 else 0 end) as "SEP-15"
from dschd.audit_trail aud
join dschd.release r on r.id=aud.release_id
join dschd.country cty on aud.EVENT_COUNTRY_ID=cty.id
join (select Release_ID, country_id
from DSCHD.RELEASE_COUNTRY_RIGHT rcr
join dschd.release r on rcr.RELEASE_ID=r.ID
join dschd.country cty on rcr.COUNTRY_ID=cty.id
where r.release_status in ('DRAFT', 'SCHEDULED', 'FINAL', 'DELIVERED')
and r.is_active = 'Y'
and rcr.MARKETING_RIGHT = 'Y'
and rcr.OPT_OUT = 'N'
and r.creation_source = 'DSCHED') Z
ON aud.Release_ID = z.Release_ID and
aud.Event_country_id = z.country_id
where aud.event_description like '% opted in'
GROUP BY cty.COUNTRY_name
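For anyone who wants to try the two-column join on a small scale, here is a minimal sketch using Python's built-in sqlite3 module. The table layouts and sample rows are made up for illustration (they are not the asker's real schema), and SQLite's strftime stands in for Oracle's to_char. The DISTINCT in the inline view is an extra precaution, assuming the rights table could hold repeated (release_id, country_id) pairs, which would otherwise inflate the counts.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE audit_trail (release_id INTEGER, event_country_id INTEGER,
                          event_description TEXT, audit_date TEXT);
CREATE TABLE release_country_right (release_id INTEGER, country_id INTEGER,
                                    opt_out TEXT);
CREATE TABLE country (id INTEGER PRIMARY KEY, country_name TEXT);

INSERT INTO country VALUES (1, 'Denmark'), (2, 'Ecuador');
INSERT INTO audit_trail VALUES
  (10, 1, 'Denmark has been automatically opted in', '2015-08-12'),
  (10, 1, 'Denmark has been automatically opted in', '2015-09-11'),
  (10, 2, 'Ecuador has been automatically opted in', '2014-11-11'),
  (11, 2, 'Ecuador has been automatically opted in', '2014-12-11');
-- Only (release 10, Denmark) and (release 11, Ecuador) pass the rights filter.
INSERT INTO release_country_right VALUES
  (10, 1, 'N'), (10, 2, 'Y'), (11, 2, 'N');
""")

# Conditional count per month, restricted by a join on BOTH key columns.
rows = conn.execute("""
SELECT cty.country_name,
       SUM(CASE WHEN strftime('%Y-%m', aud.audit_date) = '2014-12' THEN 1 ELSE 0 END) AS dec_14,
       SUM(CASE WHEN strftime('%Y-%m', aud.audit_date) = '2015-08' THEN 1 ELSE 0 END) AS aug_15,
       SUM(CASE WHEN strftime('%Y-%m', aud.audit_date) = '2015-09' THEN 1 ELSE 0 END) AS sep_15
FROM audit_trail aud
JOIN country cty ON cty.id = aud.event_country_id
JOIN (SELECT DISTINCT release_id, country_id
      FROM release_country_right
      WHERE opt_out = 'N') z
  ON  z.release_id = aud.release_id
  AND z.country_id = aud.event_country_id
WHERE aud.event_description LIKE '% opted in'
GROUP BY cty.country_name
ORDER BY cty.country_name
""").fetchall()

for row in rows:
    print(row)
```

The Ecuador audit row for release 10 drops out because that (release, country) pair is opted out, which is the constraint the second query was meant to apply.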

Related

What's the best way to split my results by day?

Not sure where to go with this one. I know I need to split the date and time in 'createdon', but then I'm stumped.
I can bring back values with the query but I have to manually enter each day.
SELECT
sum(CASE WHEN title LIKE '%Environmental%' THEN 1 ELSE 0 END)As Environmental
,sum(CASE WHEN title LIKE '%Let%' THEN 1 ELSE 0 END)As Let
,sum(CASE WHEN title LIKE '%Lease%' THEN 1 ELSE 0 END)As Lease
,sum(CASE WHEN title LIKE '%Pay%' THEN 1 ELSE 0 END)As Paym
,sum(CASE WHEN title LIKE '%Manage%' THEN 1 ELSE 0 END)As Manage
,sum(CASE WHEN title LIKE '%Rent%' THEN 1 ELSE 0 END)As Rent
,sum(CASE WHEN title LIKE '%Works%' THEN 1 ELSE 0 END)As Works
FROM
incident
WHERE
(createdon > ('01/09/2021 00:01')
AND
createdon < ('01/09/2021 23:59'))
Ideally, this is what I'm trying to bring back
SELECT
sum(CASE WHEN title LIKE '%Environmental%' THEN 1 ELSE 0 END)As Environmental
,sum(CASE WHEN title LIKE '%Let%' THEN 1 ELSE 0 END)As Let
,sum(CASE WHEN title LIKE '%Lease%' THEN 1 ELSE 0 END)As Lease
,sum(CASE WHEN title LIKE '%Pay%' THEN 1 ELSE 0 END)As Paym
,sum(CASE WHEN title LIKE '%Manage%' THEN 1 ELSE 0 END)As Manage
,sum(CASE WHEN title LIKE '%Rent%' THEN 1 ELSE 0 END)As Rent
,sum(CASE WHEN title LIKE '%Works%' THEN 1 ELSE 0 END)As Works
,createdon
FROM incident WHERE createdon between <your_start_date> and <your_end_date>
GROUP BY createdon
You need to aggregate by the date. One method looks like this:
SELECT CAST(createdon as DATE),
SUM(CASE WHEN title LIKE '%Environmental%' THEN 1 ELSE 0 END)As Environmental,
SUM(CASE WHEN title LIKE '%Let%' THEN 1 ELSE 0 END) As Let,
SUM(CASE WHEN title LIKE '%Lease%' THEN 1 ELSE 0 END) As Lease,
SUM(CASE WHEN title LIKE '%Pay%' THEN 1 ELSE 0 END) As Paym,
SUM(CASE WHEN title LIKE '%Manage%' THEN 1 ELSE 0 END) As Manage,
SUM(CASE WHEN title LIKE '%Rent%' THEN 1 ELSE 0 END) As Rent,
SUM(CASE WHEN title LIKE '%Works%' THEN 1 ELSE 0 END) As Works
FROM incident
GROUP BY CAST(createdon as DATE)
ORDER BY CAST(createdon as DATE);
The problem is that date/time functions are notoriously database dependent. Here are some syntaxes:
CAST(createdon as DATE) should work in SQL Server, MySQL, and Postgres.
TRUNC(createdon) works in Oracle.
DATE(createdon) works in SQLite.
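As a quick way to see the per-day grouping in action, here is a small sketch run against an in-memory SQLite database from Python. The incident rows are invented for illustration, and it assumes createdon is stored as ISO-8601 text, which is what SQLite's DATE() function expects.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE incident (title TEXT, createdon TEXT);
INSERT INTO incident VALUES
  ('Rent arrears',        '2021-09-01 09:15:00'),
  ('Lease renewal',       '2021-09-01 14:30:00'),
  ('Rent query',          '2021-09-02 08:00:00'),
  ('Environmental issue', '2021-09-02 16:45:00'),
  ('Rent review',         '2021-09-02 17:10:00');
""")

# DATE(createdon) strips the time part, so each day becomes one group.
daily = conn.execute("""
SELECT DATE(createdon) AS created_day,
       SUM(CASE WHEN title LIKE '%Environmental%' THEN 1 ELSE 0 END) AS environmental,
       SUM(CASE WHEN title LIKE '%Lease%' THEN 1 ELSE 0 END) AS lease,
       SUM(CASE WHEN title LIKE '%Rent%' THEN 1 ELSE 0 END) AS rent
FROM incident
GROUP BY DATE(createdon)
ORDER BY created_day
""").fetchall()

for row in daily:
    print(row)
```

Grouping on the raw createdon value instead would give one row per distinct timestamp, which is why the cast to a date is needed.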

How to do mathematical comparison with SQL CASE statement

I have a student table of around 100k records with two types of data in it: the student name and a level type, whose values are primary, secondary, intermediate and university.
I want to filter out the students who have a count > 0 in all levels: primary, secondary, intermediate and university.
I was able to find the count for each student in each level using the following query:
SELECT
student_id,
SUM(CASE WHEN lev_type = 'primary' THEN 1 ELSE 0 END) AS primary,
SUM(CASE WHEN lev_type = 'secondary' THEN 1 ELSE 0 END) AS secondary,
SUM(CASE WHEN lev_type = 'intermediate' THEN 1 ELSE 0 END) AS intermediate,
SUM(CASE WHEN lev_type = 'university' THEN 1 ELSE 0 END) AS university
FROM
student_details
GROUP BY
student_id
and I am getting a result like (note that my result is 92242 row(s))
attendee_id primary secondary intermediate uni
student1 0 1 1 2
student2 0 1 1 0
student3 88 209 92 32
student4 0 1 1 0
student5 0 1 1 0
How to filter out student3 from this result?
A WHERE clause cannot come after GROUP BY or see the aggregate aliases directly, but you can wrap the query in a derived table and filter on its columns as follows:
SELECT *
FROM (
SELECT student_id,
SUM(case when lev_type = 'primary' then 1 else 0 end) as primary,
SUM(case when lev_type = 'secondary' then 1 else 0 end) as secondary ,
SUM(case when lev_type = 'intermediate' then 1 else 0 end) as intermediate ,
SUM(case when lev_type = 'university' then 1 else 0 end) as university
FROM student_details
GROUP BY student_id
) t
WHERE primary = 0 OR secondary = 0 OR intermediate = 0 OR university = 0
HAVING might get you what you want. For example:
SELECT student_id,
sum(case when lev_type = 'primary' then 1 else 0 end) as primary,
sum(case when lev_type = 'secondary' then 1 else 0 end) as secondary ,
sum(case when lev_type = 'intermediate' then 1 else 0 end) as intermediate ,
sum(case when lev_type = 'university' then 1 else 0 end) as university
from student_details
group by student_id
-- Put the criteria here by which you want to filter
having sum(case when lev_type = 'primary' then 1 else 0 end) = 0
and sum(case when lev_type = 'secondary' then 1 else 0 end) = 0
and sum(case when lev_type = 'intermediate' then 1 else 0 end) = 0
and sum(case when lev_type = 'university' then 1 else 0 end) = 0
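If the goal is to pick out the students who have a count > 0 in every level (student3 in the sample output), the HAVING conditions can be flipped to > 0. Here is a minimal sketch of that reading, run in SQLite from Python. The rows are invented, and the column name lev_type is an assumption taken from most of the queries above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE student_details (student_id TEXT, lev_type TEXT);
INSERT INTO student_details VALUES
  ('student1', 'secondary'), ('student1', 'intermediate'), ('student1', 'university'),
  ('student2', 'secondary'), ('student2', 'intermediate'),
  ('student3', 'primary'), ('student3', 'secondary'),
  ('student3', 'intermediate'), ('student3', 'university');
""")

# Keep only the students whose conditional count is non-zero for every level.
all_levels = conn.execute("""
SELECT student_id
FROM student_details
GROUP BY student_id
HAVING SUM(CASE WHEN lev_type = 'primary' THEN 1 ELSE 0 END) > 0
   AND SUM(CASE WHEN lev_type = 'secondary' THEN 1 ELSE 0 END) > 0
   AND SUM(CASE WHEN lev_type = 'intermediate' THEN 1 ELSE 0 END) > 0
   AND SUM(CASE WHEN lev_type = 'university' THEN 1 ELSE 0 END) > 0
""").fetchall()

print(all_levels)
```

Wrapping the whole HAVING condition in NOT (...) would return the opposite set, i.e. every student missing at least one level.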

SAS/SQL sum amounts distinctly for each group by object?

I have the following code:
proc sql;
CREATE TABLE temp AS
(SELECT asofdt,
SUM(CASE WHEN trans_state ='cur_cur' THEN 1 ELSE 0 END) AS _cur_cur,
SUM(CASE WHEN trans_state ='cur_worse' THEN 1 ELSE 0 END) AS _cur_worse,
SUM(CASE WHEN trans_state ='cur_pre' THEN 1 ELSE 0 END) AS _cur_pre,
SUM(CASE WHEN trans_state ='30_better' THEN 1 ELSE 0 END) AS _30_better,
SUM(CASE WHEN trans_state ='30_30' THEN 1 ELSE 0 END) AS _30_30,
SUM(CASE WHEN trans_state ='60_90' THEN 1 ELSE 0 END) AS _60_90
FROM PERFORMANCE_TRANS_STATES_CLEAN
GROUP BY asofdt);
run;
The problem is it is adding the value from the previous group by asofdt onto the next one. So it is a cumulative sum as I go down the group bys. I would like the sum to be specific to each group by object. Any ideas on how?
Here's a picture of my output.
Your program seems fine to me. I reproduced it below with fewer observations and did not find that the total was cumulative.
data df;
input asofdt MMDDYY8. trans_state $;
datalines;
01/01/16 cur_cur
01/02/16 cur_pre
01/02/16 cur_pre
01/02/16 cur_cur
01/03/16 cur_pre
;
run;
proc sql;
CREATE TABLE temp AS
(SELECT asofdt,
SUM(CASE WHEN trans_state ='cur_cur' THEN 1 ELSE 0 END) AS _cur_cur,
SUM(CASE WHEN trans_state ='cur_worse' THEN 1 ELSE 0 END) AS _cur_worse,
SUM(CASE WHEN trans_state ='cur_pre' THEN 1 ELSE 0 END) AS _cur_pre,
SUM(CASE WHEN trans_state ='30_better' THEN 1 ELSE 0 END) AS _30_better,
SUM(CASE WHEN trans_state ='30_30' THEN 1 ELSE 0 END) AS _30_30,
SUM(CASE WHEN trans_state ='60_90' THEN 1 ELSE 0 END) AS _60_90
FROM df
GROUP BY asofdt);
quit;
You might want to check your data, as this query is fine. It is indeed running separately on each ASOFDT. You can check that trivially by comparing a single line with a WHERE (WHERE ASOFDT='01OCT2016'd or WHERE ASOFDT='10/01/2016' depending on the type of that variable).
proc sql;
CREATE TABLE temp AS
(SELECT stock,
SUM(CASE WHEN month(date)=01 THEN 1 ELSE 0 END) AS _jan,
SUM(CASE WHEN month(date)=02 THEN 1 ELSE 0 END) AS _feb,
SUM(CASE WHEN month(date)=03 THEN 1 ELSE 0 END) AS _mar,
SUM(CASE WHEN month(date)=04 THEN 1 ELSE 0 END) AS _apr
FROM sashelp.stocks
GROUP BY stock);
quit;
Nothing about that should be cumulative. Unless your data is cumulative, which it sort of makes sense it would be with "ASOFDT"?

Combinations of Products as single count

I need to count combinations of products within transactions differently to other products and I'm struggling with how to do this within a single select statement in SQL Server 2008. This would then become a data set to manipulate in Reporting Services.
The raw data looks like this:
txn, prod, units
1, a, 2
1, c, 1
2, a, 1
2, b, 1
2, c, 1
3, a, 2
3, b, 1
4, a, 3
4, c, 2
So a+b should count as one if they are in the same transaction, whereas a or b on its own should count as one if not paired. So a=1 and b=1 but a+b=1, a+b+a=2, a+b+a+b=2. Given the example data, here is my desired result with an explanation of why:
txn 1 is 3 units -- 2a + c
txn 2 is 2 units -- (a+b) + c
txn 3 is 2 units -- (a+b) + a
txn 4 is 5 units -- 3a + 2c
My query is more complex than this and includes other aggregates so I would like to group by transaction which I can't do as I need to manipulate at a lower grain
Update Progress :
Possible solution, I've generated columns based on the products I'm measuring. This allows me to group on Txn as I am now aggregating that field. Unsure if there's a better way to do it as it does take a little while
CASE WHEN SUM(CASE WHEN Prod='a' then 1 else 0 end)-
SUM(CASE WHEN Prod='b' then 1 else 0 end)=0
THEN SUM(CASE WHEN Prod='a' then 1 else 0 end)
ELSE 0 END AS MixProd
, CASE WHEN SUM(CASE WHEN Prod='a' then 1 else 0 end)-
SUM(CASE WHEN Prod='b' then 1 else 0 end)!=0
THEN ABS(SUM(CASE WHEN Prod='a' then 1 else 0 end)-
SUM(CASE WHEN Prod='b' then 1 else 0 end))
ELSE 0 END AS NotMixProd
I will then need to sort out the current unit aggregate to remove the extras but this certainly gives me a start
Update Progress 2 :
This failed to handle 0 correctly: where a or b was 0 it would still give a value for MixProd, because a-b was not zero. I reverted to an earlier draft that I had lost and expanded it as below:
, CASE WHEN SUM(CASE WHEN Prod='a' then 1 else 0 end) = 0 THEN 0
WHEN SUM(CASE WHEN Prod='b' then 1 else 0 end) = 0 THEN 0
WHEN SUM(CASE WHEN Prod='a' then 1 else 0 end)-
SUM(CASE WHEN Prod='b' then 1 else 0 end)=0
THEN SUM(CASE WHEN Prod='a' then 1 else 0 end)
ELSE ABS(SUM(CASE WHEN Prod='a' then 1 else 0 end)-
SUM(CASE WHEN Prod='b' then 1 else 0 end))
END AS MixProd
, CASE WHEN SUM(CASE WHEN Prod='a' then 1 else 0 end)-
SUM(CASE WHEN Prod='b' then 1 else 0 end)!=0
THEN ABS(SUM(CASE WHEN Prod='a' then 1 else 0 end)-
SUM(CASE WHEN Prod='b' then 1 else 0 end))
ELSE 0 END AS NotMixProd
UPDATE: This should work in SQL Server 2008 (based on LAG solution from here).
Here is the demo: http://rextester.com/GNI23706
WITH CTE AS
(
select txn, prod, units,
row_number() over (partition by txn order by prod) rn,
(row_number() over (partition by txn order by prod))/2 rndiv2,
(row_number() over (partition by txn order by prod)+1)/2 rnplus1div2,
count(*) over (partition by txn) partitioncount
from test_data
)
select
txn,
sum(case when prev_prod = 'a' and prod = 'b' and prev_units >= units then 0
when prev_prod = 'a' and prod = 'b' and prev_units < units then units - prev_units
else units
end) units
from
(
select
txn,
prod,
units,
CASE WHEN rn%2=1
THEN MAX(CASE WHEN rn%2=0 THEN prod END) OVER (PARTITION BY txn,rndiv2)
ELSE MAX(CASE WHEN rn%2=1 THEN prod END) OVER (PARTITION BY txn,rnplus1div2)
END AS prev_prod,
CASE WHEN rn%2=1
THEN MAX(CASE WHEN rn%2=0 THEN units END) OVER (PARTITION BY txn,rndiv2)
ELSE MAX(CASE WHEN rn%2=1 THEN units END) OVER (PARTITION BY txn,rnplus1div2)
END AS prev_units
from cte
) temp
group by txn
For SQL Server 2012+, use LAG:
select
txn,
sum(
case when prev_prod = 'a' and prod = 'b' and prev_units >= units then 0
when prev_prod = 'a' and prod = 'b' and prev_units < units then units - prev_units
else units
end) units
from
(
select
txn,
prod,
units,
lag(prod) over (partition by txn order by prod) prev_prod,
lag(units) over (partition by txn order by prod) prev_units
from test_data
) temp
group by txn
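Both queries above boil down to the same arithmetic: each unit of b that can be paired with a unit of a is not counted, so the result per transaction is the total units minus the smaller of the a units and the b units. Here is a minimal sketch of that shortcut, run in SQLite from Python on the question's sample data. Note that the two-argument MIN used here is SQLite's scalar function; SQL Server 2008 has no equivalent, so a CASE expression would be needed there.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE test_data (txn INTEGER, prod TEXT, units INTEGER);
INSERT INTO test_data VALUES
  (1,'a',2),(1,'c',1),
  (2,'a',1),(2,'b',1),(2,'c',1),
  (3,'a',2),(3,'b',1),
  (4,'a',3),(4,'c',2);
""")

# Total units minus the number of a+b pairs (the smaller of the two sums).
totals = conn.execute("""
SELECT txn,
       SUM(units)
       - MIN(SUM(CASE WHEN prod = 'a' THEN units ELSE 0 END),
             SUM(CASE WHEN prod = 'b' THEN units ELSE 0 END)) AS units
FROM test_data
GROUP BY txn
ORDER BY txn
""").fetchall()

for row in totals:
    print(row)
```

The four totals match the desired result listed in the question (3, 2, 2 and 5 units).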
I decided in the end that a temp table was the best way to go, because I couldn't group on a collation. So I eventually tweaked the code above as it was failing to pick up the spare items correctly
SUM(Units) AS OldUnits,
SUM(Units) -
(CASE WHEN
SUM(CASE WHEN Prod='a' THEN 1 ELSE 0 END) = 0 THEN 0 WHEN
SUM(CASE WHEN Prod='b' THEN 1 ELSE 0 END) = 0 THEN 0 WHEN
SUM(CASE WHEN Prod='a' THEN 1 ELSE 0 END) -
SUM(CASE WHEN Prod='b' THEN 1 ELSE 0 END) = 0 THEN
SUM(CASE WHEN Prod='a' THEN 1 ELSE 0 END) WHEN
(SUM(CASE WHEN Prod='a' THEN 1 ELSE 0 END) -
SUM(CASE WHEN Prod='b' THEN 1 ELSE 0 END)) < 0 THEN
SUM(CASE WHEN Prod='a' THEN 1 ELSE 0 END) ELSE
SUM(CASE WHEN Prod='b' THEN 1 ELSE 0 END) END) AS NewUnits
This was stored in a temptable that I could then collate on Trans as the next step. Works fine for my purposes and helped me overcome a mild irrational fear I have of temptables

How to select on table and count occurrences some values

I am asking for help because I do not know SQL very well.
I need to count the occurrences of some values in a table column to produce a statistics table like the one in the picture below:
Needed result:
Comment:
My result table needs its first two columns (country and site) to come from the first table, "Violations", followed by 5 columns containing counts of how often each possible id from the Status table occurs as status_id in "Violations".
Explanation:
So, I have two existing tables: Violations and Status. Please look at my sqlfiddle.
Violations:
id long,
country varchar(20),
site varchar(20),
status_id long, <-- this is the id of status in Status table.
... other columns not important in this case
Status:
id long,
status long
Column "status" has values (1-4) which are mapped to string values: Suspected Violation (1), Confirmed Violation (2), Confirmed No Violation (3), Not Determined (4)
The result of my join (or of a query on the Violations table alone) should be a table containing these columns:
from Violations table: "Country" and "Site"
from Status table: "Suspected Violation", "Confirmed Violation", "Confirmed No Violation", "Not Determined", "Total" (where these columns count the occurrences in the Violations table).
Current Status and new Requirements:
First try is done (thanks to bluefeet) below and is almost perfect...
select v.country,
v.site,
SUM(case when s.id = 1 then 1 else 0 end) Total_SuspectedViolations,
SUM(case when s.id = 2 then 1 else 0 end) Total_ConfirmedViolations,
SUM(case when s.id = 3 then 1 else 0 end) Total_ConfirmedNoViolations,
SUM(case when s.id = 4 then 1 else 0 end) Total_NotDetermined,
COUNT(*) Total
from violations v
inner join status s
on v.status_id = s.id
group by v.country, v.site
or without JOIN:
select v.country,
v.site,
SUM(case when v.status_id = 1 then 1 else 0 end) Total_SuspectedViolations,
SUM(case when v.status_id = 2 then 1 else 0 end) Total_ConfirmedViolations,
SUM(case when v.status_id = 3 then 1 else 0 end) Total_ConfirmedNoViolations,
SUM(case when v.status_id = 4 then 1 else 0 end) Total_NotDetermined,
COUNT(*) Total
from violations v
group by v.country, v.site
...but it does not include 3 things which, as you can see in the picture, should be there. I mean:
"- All -" which should count occurrences for all countries
"- Unknown -" which should count occurrences for unrecognized countries
"- All -" (for each country) which should count occurrences within one country
Additional Explanation:
-Unknown- meaning:
Unknown should count occurrences for countries which, for example, do not exist in the DB Country table or have a wrong name/id, and are therefore treated here as Unknown (I forgot to mention that there is a Country table in the DB).
The same goes for sites: Unknown for sites means that someone put a wrong value in Violations.status_id, outside the range (1-4), because those are the only acceptable values existing in the Status table.
We can assume that table Country looks like:
Country:
id long,
name varchar(30)
Please help me write a correct SQL query which covers these 3 conditions, because I am having real trouble doing that.
The All case can be easily done using UNION statement (see sqlFiddle for results):
(SELECT v.country,
v.site,
SUM(CASE WHEN v.status_id = 1 THEN 1 ELSE 0 END) Total_SuspectedViolations,
SUM(CASE WHEN v.status_id = 2 THEN 1 ELSE 0 END) Total_ConfirmedViolations,
SUM(CASE WHEN v.status_id = 3 THEN 1 ELSE 0 END) Total_ConfirmedNoViolations,
SUM(CASE WHEN v.status_id = 4 THEN 1 ELSE 0 END) Total_NotDetermined,
COUNT(*) Total,
0 'isAll'
FROM violations v
GROUP BY v.country, v.site)
union(
SELECT v.country,
'- All -',
SUM(CASE WHEN v.status_id = 1 THEN 1 ELSE 0 END) Total_SuspectedViolations,
SUM(CASE WHEN v.status_id = 2 THEN 1 ELSE 0 END) Total_ConfirmedViolations,
SUM(CASE WHEN v.status_id = 3 THEN 1 ELSE 0 END) Total_ConfirmedNoViolations,
SUM(CASE WHEN v.status_id = 4 THEN 1 ELSE 0 END) Total_NotDetermined,
COUNT(*) Total,
1 'isAll'
FROM violations v
GROUP BY v.country)
UNION (
SELECT '- All -',
'- All -',
SUM(CASE WHEN v.status_id = 1 THEN 1 ELSE 0 END) Total_SuspectedViolations,
SUM(CASE WHEN v.status_id = 2 THEN 1 ELSE 0 END) Total_ConfirmedViolations,
SUM(CASE WHEN v.status_id = 3 THEN 1 ELSE 0 END) Total_ConfirmedNoViolations,
SUM(CASE WHEN v.status_id = 4 THEN 1 ELSE 0 END) Total_NotDetermined,
COUNT(*) Total,
1 'isAll'
FROM violations v)
ORDER BY country, isAll DESC, site
However, the performance may not be really great with that kind of query, so I'm not saying it's the best possible solution - but it works.
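To experiment with the subtotal idea on a small data set, here is a sketch in SQLite driven from Python. The violations rows are invented, only one status column is kept for brevity, and the unions are wrapped in a subquery because SQLite does not accept the parenthesised UNION branches used in the MySQL version. UNION ALL is used since the three branches cannot produce identical rows here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE violations (country TEXT, site TEXT, status_id INTEGER);
INSERT INTO violations VALUES
  ('Poland', 'Site A', 1),
  ('Poland', 'Site A', 2),
  ('Poland', 'Site B', 1),
  ('Spain',  'Site C', 3);
""")

# Detail rows, per-country subtotals and a grand total, stacked together.
rows = conn.execute("""
SELECT country, site, suspected, total
FROM (
  SELECT country, site,
         SUM(CASE WHEN status_id = 1 THEN 1 ELSE 0 END) AS suspected,
         COUNT(*) AS total,
         0 AS is_all
  FROM violations
  GROUP BY country, site
  UNION ALL
  SELECT country, '- All -',
         SUM(CASE WHEN status_id = 1 THEN 1 ELSE 0 END),
         COUNT(*),
         1
  FROM violations
  GROUP BY country
  UNION ALL
  SELECT '- All -', '- All -',
         SUM(CASE WHEN status_id = 1 THEN 1 ELSE 0 END),
         COUNT(*),
         1
  FROM violations
)
ORDER BY country, is_all DESC, site
""").fetchall()

for row in rows:
    print(row)
```

The is_all flag only exists to push each "- All -" row above the site rows of its country; it is left out of the final select list.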
Version with 'Unknown'
http://www.sqlfiddle.com/#!2/abfb7/21
(SELECT IF(c.name IS NULL, '- Unknown -', c.name) as name,
v.site,
SUM(CASE WHEN v.status_id = 1 THEN 1 ELSE 0 END) Total_SuspectedViolations,
SUM(CASE WHEN v.status_id = 2 THEN 1 ELSE 0 END) Total_ConfirmedViolations,
SUM(CASE WHEN v.status_id = 3 THEN 1 ELSE 0 END) Total_ConfirmedNoViolations,
SUM(CASE WHEN v.status_id = 4 THEN 1 ELSE 0 END) Total_NotDetermined,
COUNT(*) Total,
0 'isAll'
FROM violations v LEFT JOIN country c ON c.name = v.country
GROUP BY c.name, v.site)
union(
SELECT IF(c.name IS NULL, '- Unknown -', c.name) as name,
'- All -',
SUM(CASE WHEN v.status_id = 1 THEN 1 ELSE 0 END) Total_SuspectedViolations,
SUM(CASE WHEN v.status_id = 2 THEN 1 ELSE 0 END) Total_ConfirmedViolations,
SUM(CASE WHEN v.status_id = 3 THEN 1 ELSE 0 END) Total_ConfirmedNoViolations,
SUM(CASE WHEN v.status_id = 4 THEN 1 ELSE 0 END) Total_NotDetermined,
COUNT(*) Total,
1 'isAll'
FROM violations v LEFT JOIN country c ON c.name = v.country
GROUP BY c.name)
UNION (
SELECT '- All -',
'- All -',
SUM(CASE WHEN v.status_id = 1 THEN 1 ELSE 0 END) Total_SuspectedViolations,
SUM(CASE WHEN v.status_id = 2 THEN 1 ELSE 0 END) Total_ConfirmedViolations,
SUM(CASE WHEN v.status_id = 3 THEN 1 ELSE 0 END) Total_ConfirmedNoViolations,
SUM(CASE WHEN v.status_id = 4 THEN 1 ELSE 0 END) Total_NotDetermined,
COUNT(*) Total,
1 'isAll'
FROM violations v LEFT JOIN country c ON c.name = v.country)
ORDER BY name, isAll DESC, site
Using WITH ROLLUP you will get the subtotal rows you want:
select v.country,
v.site,
SUM(case when s.id = 1 then 1 else 0 end) Total_SuspectedViolations,
SUM(case when s.id = 2 then 1 else 0 end) Total_ConfirmedViolations,
SUM(case when s.id = 3 then 1 else 0 end) Total_ConfirmedNoViolations,
SUM(case when s.id = 4 then 1 else 0 end) Total_NotDetermined,
COUNT(*) Total
from violations v
inner join status s
on v.status_id = s.id
group by v.country, v.site WITH ROLLUP
Hope this helps
REFER for with rollup documentation
FIDDLE