SQL: grouping by number of entries and entry date

I have the following table log:
event_time       | name
-----------------+------
2014-07-16 11:40 | Bob
2014-07-16 10:00 | John
2014-07-16 09:20 | Bob
2014-07-16 08:20 | Bob
2014-07-15 11:20 | Bob
2014-07-15 10:20 | John
2014-07-15 09:00 | Bob
I would like to generate a report where I can group data by the number of entries per day and by entry date. The resulting report for the table above would look something like this:
event_date | 0-2 | 3 | 4-99
-----------+-----+---+-----
2014-07-16 |   1 | 1 |    0
2014-07-15 |   2 | 0 |    0
I used the following approaches to solve it:
Select with grouping in range
How to select the count of values grouped by ranges
If I find an answer before anybody posts it here, I will share it.
Added
I would like to count the number of daily entries for each name. Then I check which column that value belongs to, and add 1 to that column.

I took it in two steps. The inner query gets the base counts; the outer query uses CASE statements to sum them into the range columns.
SQL Fiddle Example
select event_date,
       sum(case when cnt between 0 and 2 then 1 else 0 end) as "0-2",
       sum(case when cnt = 3 then 1 else 0 end) as "3",
       sum(case when cnt between 4 and 99 then 1 else 0 end) as "4-99"
from (select cast(event_time as date) as event_date,
             name,
             count(1) as cnt
      from log
      group by cast(event_time as date), name) baseCnt
group by event_date
order by event_date
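For reference, here is a minimal setup to try the query (PostgreSQL syntax; table and column names taken from the question). Running the query above against it reproduces the expected report:
create table log (event_time timestamp, name text);
insert into log (event_time, name) values
  ('2014-07-16 11:40', 'Bob'),
  ('2014-07-16 10:00', 'John'),
  ('2014-07-16 09:20', 'Bob'),
  ('2014-07-16 08:20', 'Bob'),
  ('2014-07-15 11:20', 'Bob'),
  ('2014-07-15 10:20', 'John'),
  ('2014-07-15 09:00', 'Bob');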

Try it like this:
select da,
       sum(case when c < 3 then 1 else 0 end) as "0-2",
       sum(case when c = 3 then 1 else 0 end) as "3",
       sum(case when c > 3 then 1 else 0 end) as "4-99"
from (select cast(event_time as date) as da, count(*) as c
      from table1
      group by cast(event_time as date), name) as aa
group by da

First aggregate in two steps:
SELECT day,
       CASE
          WHEN ct < 3 THEN '0-2'
          WHEN ct > 3 THEN '4_or_more'
          ELSE '3'
       END AS cat,
       count(*)::int AS val
FROM  (
   SELECT event_time::date AS day, count(*) AS ct
   FROM   tbl
   GROUP  BY 1
   ) sub
GROUP  BY 1, 2
ORDER  BY 1, 2;
Names should be completely irrelevant according to your description.
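Against the sample data above, this first step yields one row per day and category:
day        | cat       | val
-----------+-----------+----
2014-07-15 | 3         | 1
2014-07-16 | 4_or_more | 1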
Then take the query and run it through crosstab():
SELECT *
FROM crosstab(
   $$SELECT day,
            CASE
               WHEN ct < 3 THEN '0-2'
               WHEN ct > 3 THEN '4_or_more'
               ELSE '3'
            END AS cat,
            count(*)::int AS val
     FROM  (
        SELECT event_time::date AS day, count(*) AS ct
        FROM   tbl
        GROUP  BY 1
        ) sub
     GROUP  BY 1, 2
     ORDER  BY 1, 2$$
 , $$VALUES ('0-2'::text), ('3'), ('4_or_more')$$
   ) AS f (day date, "0-2" int, "3" int, "4_or_more" int);
crosstab() is supplied by the additional module tablefunc. Details and instructions in this related answer:
PostgreSQL Crosstab Query
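If crosstab() is not installed yet, enabling the extension is a one-time step per database (assuming sufficient privileges):
CREATE EXTENSION IF NOT EXISTS tablefunc;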

This is a variation on a PIVOT query (although PostgreSQL supports this via the crosstab(...) table functions). The existing answers cover the basic technique; I just prefer to construct queries without the use of CASE, where possible.
To get started, we need a couple of things. The first is essentially a Calendar Table, or entries from one (they're among the most useful dimension tables). If you don't already have one, the entries for the specified dates can easily be generated:
WITH Calendar_Range AS (SELECT startOfDay, startOfDay + INTERVAL '1 DAY' AS nextDay
                        FROM GENERATE_SERIES(CAST('2014-07-01' AS DATE),
                                             CAST('2014-08-01' AS DATE),
                                             INTERVAL '1 DAY') AS dr(startOfDay))
SQL Fiddle Demo
This is primarily used to create the first step in the double aggregate, like so:
SELECT Calendar_Range.startOfDay, COUNT(Log.name)
FROM Calendar_Range
LEFT JOIN Log
  ON Log.event_time >= Calendar_Range.startOfDay
 AND Log.event_time < Calendar_Range.nextDay
GROUP BY Calendar_Range.startOfDay, Log.name
SQL Fiddle Demo
Remember that most aggregate columns with a nullable expression (here, COUNT(Log.name)) will ignore null values (not count them). This is also one of the few times it's acceptable to not include a grouped-by column in the SELECT list (normally it makes the results ambiguous). For the actual queries I'll put this into a subquery, but it would also work as a CTE.
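A quick way to see that NULL-ignoring behaviour in isolation:
-- COUNT(x) skips NULLs; COUNT(*) counts every row.
SELECT COUNT(x) AS cnt_x, COUNT(*) AS cnt_all
FROM (VALUES (1), (NULL::int), (3)) t(x);
-- cnt_x = 2, cnt_all = 3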
We also need a way to construct our COUNT ranges. That's pretty easy too:
Count_Range AS (SELECT text, start, LEAD(start) OVER(ORDER BY start) as next
                FROM (VALUES('0 - 2', 0),
                            ('3', 3),
                            ('4+', 4)) e(text, start))
SQL Fiddle Demo
We'll be querying these as "exclusive upper-bound" as well.
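Materialized, Count_Range holds these rows; the NULL next on '4+' is what leaves its upper bound open-ended:
text  | start | next
------+-------+-----
0 - 2 |     0 |    3
3     |     3 |    4
4+    |     4 | NULL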
We now have all the pieces we need to do the query. We can actually use these virtual tables to make queries in both veins of the current answers.
First, the SUM(CASE...) style.
For this query, we'll take advantage of the null-ignoring qualities of aggregate functions again:
WITH Calendar_Range AS (SELECT startOfDay, startOfDay + INTERVAL '1 DAY' AS nextDay
                        FROM GENERATE_SERIES(CAST('2014-07-14' AS DATE),
                                             CAST('2014-07-17' AS DATE),
                                             INTERVAL '1 DAY') AS dr(startOfDay)),
     Count_Range AS (SELECT text, start, LEAD(start) OVER(ORDER BY start) as next
                     FROM (VALUES('0 - 2', 0),
                                 ('3', 3),
                                 ('4+', 4)) e(text, start))
SELECT startOfDay,
       COUNT(Zero_To_Two.text) AS Zero_To_Two,
       COUNT(Three.text) AS Three,
       COUNT(Four_And_Up.text) AS Four_And_Up
FROM (SELECT Calendar_Range.startOfDay, COUNT(Log.name) AS count
      FROM Calendar_Range
      LEFT JOIN Log
        ON Log.event_time >= Calendar_Range.startOfDay
       AND Log.event_time < Calendar_Range.nextDay
      GROUP BY Calendar_Range.startOfDay, Log.name) Entry_Count
LEFT JOIN Count_Range Zero_To_Two
       ON Zero_To_Two.text = '0 - 2'
      AND Entry_Count.count >= Zero_To_Two.start
      AND Entry_Count.count < Zero_To_Two.next
LEFT JOIN Count_Range Three
       ON Three.text = '3'
      AND Entry_Count.count >= Three.start
      AND Entry_Count.count < Three.next
LEFT JOIN Count_Range Four_And_Up
       ON Four_And_Up.text = '4+'
      AND Entry_Count.count >= Four_And_Up.start
GROUP BY startOfDay
ORDER BY startOfDay
SQL Fiddle Example
The other option is of course the crosstab query, where the CASE was being used to segment the results. We'll use the Count_Range table to decode the values for us:
SELECT startOfDay, "0 - 2", "3", "4+"
FROM CROSSTAB($$WITH Calendar_Range AS (SELECT startOfDay, startOfDay + INTERVAL '1 DAY' AS nextDay
                                        FROM GENERATE_SERIES(CAST('2014-07-14' AS DATE),
                                                             CAST('2014-07-17' AS DATE),
                                                             INTERVAL '1 DAY') AS dr(startOfDay)),
                    Count_Range AS (SELECT text, start, LEAD(start) OVER(ORDER BY start) as next
                                    FROM (VALUES('0 - 2', 0),
                                                ('3', 3),
                                                ('4+', 4)) e(text, start))
               SELECT Entry_Count.startOfDay, Count_Range.text, COUNT(*) AS count
               FROM (SELECT Calendar_Range.startOfDay, COUNT(Log.name) AS count
                     FROM Calendar_Range
                     LEFT JOIN Log
                       ON Log.event_time >= Calendar_Range.startOfDay
                      AND Log.event_time < Calendar_Range.nextDay
                     GROUP BY Calendar_Range.startOfDay, Log.name) Entry_Count
               JOIN Count_Range
                 ON Entry_Count.count >= Count_Range.start
                AND (Entry_Count.count < Count_Range.next OR Count_Range.next IS NULL)
               GROUP BY Entry_Count.startOfDay, Count_Range.text
               ORDER BY Entry_Count.startOfDay, Count_Range.text$$,
              $$VALUES('0 - 2'), ('3'), ('4+')$$) Data(startOfDay DATE, "0 - 2" INT, "3" INT, "4+" INT)
(I believe this is correct, but I don't have a way to test it - SQL Fiddle doesn't seem to have the crosstab functionality loaded. In particular, the CTEs probably must go inside the function body itself, as shown, but I'm not sure....)

Related

SQL - SUMIF substitute?

This question is best asked using an example - if I have daily data (in this case, daily Domestic Box Office for the movie Elvis), how can I sum only the weekend values?
If the data looks like this:
Date      | DBO
----------+---------
6/24/2022 | 12755467
6/25/2022 | 9929779
6/26/2022 | 8526333
6/27/2022 | 4253038
6/28/2022 | 5267391
6/29/2022 | 4010762
6/30/2022 | 3577241
7/1/2022  | 5320812
7/2/2022  | 6841224
7/3/2022  | 6290576
7/4/2022  | 4248679
7/5/2022  | 3639110
7/6/2022  | 3002182
7/7/2022  | 2460108
7/8/2022  | 3326066
7/9/2022  | 4324040
7/10/2022 | 3530965
I'd like to be able to get results that look like this:
Weekend | DBO Sum
--------+---------
1       | 31211579
2       | 18452612
3       | 11181071
Also - not sure how tricky this would be, but I would love to include the percent change vs. the previous weekend.
Weekend | DBO Sum  | % Change
--------+----------+---------
1       | 31211579 |
2       | 18452612 | -41%
3       | 11181071 | -39%
I tried this with CASE WHEN but I got the results in different columns, which was not what I was looking for.
SELECT
    SUM(CASE
          WHEN DATE BETWEEN '2022-06-24' AND '2022-06-26' THEN index
          ELSE 0
        END) AS Weekend1,
    SUM(CASE
          WHEN DATE BETWEEN '2022-07-01' AND '2022-07-03' THEN index
          ELSE 0
        END) AS Weekend2,
    SUM(CASE
          WHEN DATE BETWEEN '2022-07-08' AND '2022-07-10' THEN index
          ELSE 0
        END) AS Weekend3
FROM Elvis
I would start by filtering the data on weekend days only. Then we can group by week to get the index sum; the last step is to use window functions to compare each weekend with the previous one:
select iso_week,
       row_number() over(order by iso_week) as weekend_number,
       sum(index) as dbo_sum,
       (sum(index) - lag(sum(index)) over(order by iso_week))
         / nullif(lag(sum(index)) over(order by iso_week), 0) as ratio_change
from (
    select e.*, extract(isoweek from date) as iso_week
    from elvis e
    where extract(dayofweek from date) in (1, 7)
) e
group by iso_week
order by iso_week
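One thing to double-check: in BigQuery, EXTRACT(DAYOFWEEK ...) returns 1 for Sunday through 7 for Saturday, so IN (1, 7) keeps Sunday and Saturday; if Fridays should count toward the weekend, as in the sample data, add 6 to the list. A quick inline check (column names assumed from the question):
with elvis as (
  select date '2022-06-24' as date, 12755467 as index union all
  select date '2022-06-25' as date,  9929779 as index union all
  select date '2022-06-26' as date,  8526333 as index
)
select date,
       extract(dayofweek from date) as dow,   -- 6 = Friday, 7 = Saturday, 1 = Sunday
       extract(isoweek from date) as iso_week
from elvis;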
Consider below
select *,
round(100 * safe_divide(dbo_sum - lag(dbo_sum) over(order by week), lag(dbo_sum) over(order by week)), 2) change_percent
from (
select extract(week from date + 2) week, sum(dbo) dbo_sum
from your_table
where extract(dayofweek from date + 2) in (1, 2, 3)
group by week
)
If applied to the sample data in your question, it returns the three weekend sums together with the percent-change column.
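The date + 2 shift is the key trick here: moving every date forward two days puts a Friday-Sunday weekend onto the Sunday-Tuesday of a single calendar week, so EXTRACT(WEEK ...) assigns each weekend to exactly one bucket and DAYOFWEEK of the shifted date is 1, 2 or 3. For example:
-- 2022-06-24 is a Friday; shifted, it becomes Sunday 2022-06-26 (dayofweek = 1).
select date '2022-06-24' + 2 as shifted,
       extract(dayofweek from date '2022-06-24' + 2) as dow;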

How do I display records in the same row although I am using group by on 2 columns that appear in different rows right now?

This is the output I am getting now, but I want all the records for one gateway in one row. I am trying to find the damage count and total count of packages processed by an airport in a week. Currently I am grouping by airport and week, so I am getting the records in different rows for each airport and week. I want the records for a particular airport in a single row, with the weeks in that same row.
I tried putting a conditional group by but that did not work.
select tmp.gateway,tmp.weekbucket, sum(tmp.damaged_count) as DamageCount, sum(tmp.total_count) as TotalCount, round(sum(tmp.DPMO),0) as DPMO from
(
select a.gateway,
date_trunc('week', (a.processing_date + interval '1 day')) - interval '1 day' as weekbucket,
count(distinct(b.fulfillment_shipment_id||b.package_id)) as damaged_count,
count(distinct(a.fulfillment_shipment_id||a.package_id)) as total_count,
count(distinct(b.fulfillment_shipment_id||b.package_id))*1.00/count(distinct(a.Fulfillment_Shipment_id || a.package_id))*1000000 as DPMO
from booker.d_air_shipments_na a
left join trex.d_ps_packages b
on (a.fulfillment_shipment_id||a.package_id =b.Fulfillment_Shipment_id||b.package_id)
where a.processing_date >= current_date-7
and (exception_summary in ('Reprint-Damaged Label') or exception_summary IS NULL)
and substring(route, position(a.gateway IN route) +6, 1) <> 'K'
group by a.gateway, weekbucket) as tmp
group by tmp.gateway, tmp.weekbucket
order by tmp.gateway, tmp.weekbucket desc;
Since your seven-day window spans parts of two weeks, you will likely get two rows per gateway. You can try removing weekbucket from the outer GROUP BY and taking MAX(weekbucket) instead, so that the counts from both week buckets are summed into a single row per gateway:
select tmp.gateway,
       max(tmp.weekbucket),
       sum(tmp.damaged_count) as DamageCount,
       sum(tmp.total_count) as TotalCount,
       round(sum(tmp.DPMO),0) as DPMO
from
(
  select a.gateway,
         date_trunc('week', (a.processing_date + interval '1 day')) - interval '1 day' as weekbucket,
         count(distinct(b.fulfillment_shipment_id||b.package_id)) as damaged_count,
         count(distinct(a.fulfillment_shipment_id||a.package_id)) as total_count,
         count(distinct(b.fulfillment_shipment_id||b.package_id))*1.00/count(distinct(a.fulfillment_shipment_id||a.package_id))*1000000 as DPMO
  from booker.d_air_shipments_na a
  left join trex.d_ps_packages b
    on (a.fulfillment_shipment_id||a.package_id = b.fulfillment_shipment_id||b.package_id)
  where a.processing_date >= current_date-7
    and (exception_summary in ('Reprint-Damaged Label') or exception_summary IS NULL)
    and substring(route, position(a.gateway IN route) +6, 1) <> 'K'
  group by a.gateway, weekbucket
) as tmp
group by tmp.gateway
order by tmp.gateway, max(tmp.weekbucket) desc;
So you want to pivot the two weeks into a single row with two sets of aggregates?:
select
    tmp.gateway,
    min(case when rn = 1 then tmp.damaged_count end) as DamageCountWeek1,
    min(case when rn = 2 then tmp.damaged_count end) as DamageCountWeek2,
    min(case when rn = 1 then tmp.total_count end) as TotalCountWeek1,
    min(case when rn = 2 then tmp.total_count end) as TotalCountWeek2,
    min(case when rn = 1 then round(tmp.DPMO, 0) end) as DPMOWeek1,
    min(case when rn = 2 then round(tmp.DPMO, 0) end) as DPMOWeek2
from (
    select row_number() over (partition by gateway order by weekbucket) as rn,
           ...
) as tmp
group by tmp.gateway
order by tmp.gateway;
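A self-contained sketch of the pattern (PostgreSQL syntax, hypothetical gateway and counts standing in for the elided subquery):
select gateway,
       min(case when rn = 1 then damaged_count end) as DamageCountWeek1,
       min(case when rn = 2 then damaged_count end) as DamageCountWeek2
from (
    select gateway, damaged_count,
           row_number() over (partition by gateway order by weekbucket) as rn
    from (values ('JFK', date '2023-01-01', 5),
                 ('JFK', date '2023-01-08', 7)) t(gateway, weekbucket, damaged_count)
) tmp
group by gateway
order by gateway;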

Group by in columns and rows, counts and percentages per day

I have a table that has data like following.
attr |time
----------------|--------------------------
abc |2018-08-06 10:17:25.282546
def |2018-08-06 10:17:25.325676
pqr |2018-08-05 10:17:25.366823
abc |2018-08-06 10:17:25.407941
def |2018-08-05 10:17:25.449249
I want to group and count by the attr column row-wise, and also create additional columns to show the counts per day and the percentages, as shown below.
attr |day1_count| day1_%| day2_count| day2_%
----------------|----------|-------|-----------|-------
abc |2 |66.6% | 0 | 0.0%
def |1 |33.3% | 1 | 50.0%
pqr |0 |0.0% | 1 | 50.0%
I'm able to display one count by using group by, but unable to find out how to even separate them into multiple columns. I tried to generate the day-1 percentage with:
SELECT attr, count(attr), count(attr) / sum(sub.day1_count) * 100 as percentage from (
SELECT attr, count(*) as day1_count FROM my_table WHERE DATEPART(week, time) = DATEPART(day, GETDate()) GROUP BY attr) as sub
GROUP BY attr;
But this also is not giving me the correct answer: I'm getting all zeroes for the percentage and 1 for the count. Any help is appreciated. I'm trying to do this in Redshift, which follows PostgreSQL syntax.
Let's nail the logic before presenting:
with CTE1 as
(
    select attr, DATEPART(day, time) as theday, count(*) as thecount
    from MyTable
    group by attr, DATEPART(day, time)
)
, CTE2 as
(
    select theday, sum(thecount) as daytotal
    from CTE1
    group by theday
)
select t1.attr, t1.theday, t1.thecount, 1.0 * t1.thecount / t2.daytotal as percentofday
from CTE1 t1
inner join CTE2 t2
  on t1.theday = t2.theday
From here you can pivot to create a day-by-day layout if you feel the need.
I am trying to enhance @johnHC's query; by the way, if you need 7 days, you have to add those days in the CASE WHEN expressions.
with CTE1 as
(
select attr, time::date as theday, count(*) as thecount
from t group by attr,time::date
)
, CTE2 as
(
select theday, sum(thecount) as daytotal
from CTE1
group by theday
)
,
CTE3 as
(
select t1.attr, EXTRACT(DOW FROM t1.theday) as day_nmbr,t1.theday, t1.thecount, t1.thecount/t2.daytotal as percentofday
from CTE1 t1
inner join CTE2 t2
on t1.theday = t2.theday
)
select CTE3.attr,
max(case when day_nmbr=0 then CTE3.thecount end) as day1Cnt,
max(case when day_nmbr=0 then percentofday end) as day1,
max(case when day_nmbr=1 then CTE3.thecount end) as day2Cnt,
max( case when day_nmbr=1 then percentofday end) day2
from CTE3 group by CTE3.attr
http://sqlfiddle.com/#!17/54ace/20
In case that you have only 2 days:
http://sqlfiddle.com/#!17/3bdad/3 (days descending as in your example from left to right)
http://sqlfiddle.com/#!17/3bdad/5 (days ascending)
The main idea is already mentioned in the other answers. Instead of joining the CTEs to calculate the values, I am using window functions, which is a bit shorter and, I think, more readable. The pivot is done the same way.
SELECT
attr,
COALESCE(max(count) FILTER (WHERE day_number = 0), 0) as day1_count, -- D
COALESCE(max(percent) FILTER (WHERE day_number = 0), 0) as day1_percent,
COALESCE(max(count) FILTER (WHERE day_number = 1), 0) as day2_count,
COALESCE(max(percent) FILTER (WHERE day_number = 1), 0) as day2_percent
/*
Add more days here
*/
FROM(
SELECT *, (count::float/count_per_day)::decimal(5, 2) as percent -- C
FROM (
SELECT DISTINCT
attr,
MAX(time::date) OVER () - time::date as day_number, -- B
count(*) OVER (partition by time::date, attr) as count, -- A
count(*) OVER (partition by time::date) as count_per_day
FROM test_table
)s
)s
GROUP BY attr
ORDER BY attr
A: counting the rows per day, and the rows per day AND attr
B: for more readability I convert the date into a number. Here I take the difference between the current row's date and the maximum date available in the table, so I get a counter from 0 (first day in the output) up to n - 1 (last day).
C: calculating the percentage and rounding
D: pivot by filtering on the day numbers. The COALESCE avoids NULL values, switching them to 0. To add more days, replicate these columns.
Edit: Made the day counter more flexible for more days; new SQL Fiddle
Basically, I see this as conditional aggregation. But you need to get an enumerator for the date for the pivoting. So:
SELECT attr,
COUNT(*) FILTER (WHERE day_number = 1) as day1_count,
COUNT(*) FILTER (WHERE day_number = 1) / cnt as day1_percent,
COUNT(*) FILTER (WHERE day_number = 2) as day2_count,
COUNT(*) FILTER (WHERE day_number = 2) / cnt as day2_percent
FROM (SELECT attr,
DENSE_RANK() OVER (ORDER BY time::date DESC) as day_number,
1.0 * COUNT(*) OVER (PARTITION BY attr) as cnt
FROM test_table
) s
GROUP BY attr, cnt
ORDER BY attr;
Here is a SQL Fiddle.

List all months with a total regardless of null

I have a very small SQL table that lists courses attended and the date of attendance. I can use the code below to count the attendees for each month
select to_char(DATE_ATTENDED,'YYYY/MM'),
COUNT (*)
FROM TRAINING_COURSE_ATTENDED
WHERE COURSE_ATTENDED = 'Fire Safety'
GROUP BY to_char(DATE_ATTENDED,'YYYY/MM')
ORDER BY to_char(DATE_ATTENDED,'YYYY/MM')
This returns a list as expected for each month that has attendees. However I would like to list it as
January 2
February 0
March 5
How do I show the count results along with the nulls? My table is very basic
1234 01-JAN-15 Fire Safety
108 01-JAN-15 Fire Safety
1443 02-DEC-15 Healthcare
1388 03-FEB-15 Emergency
1355 06-MAR-15 Fire Safety
1322 09-SEP-15 Fire Safety
1234 11-DEC-15 Fire Safety
I just need to display each month and the total attendees for Fire Safety only. I have not used SQL Developer for a while, so any help is appreciated.
You would need a calendar table to select the period you want to display. Note that the course filter belongs in the join condition rather than the WHERE clause; otherwise the left join degenerates into an inner join and the zero-attendance months disappear. Counting the joined column (rather than *) makes those months show 0. Simplified code would look like this:
select to_char(c.Date_dt,'YYYY/MM')
     , COUNT (tca.COURSE_ATTENDED)
FROM calendar as c
left join TRAINING_COURSE_ATTENDED as tca
  on tca.DATE_ATTENDED = c.Date_dt
 and tca.COURSE_ATTENDED = 'Fire Safety'
WHERE c.Date_dt between [period_start_dt] and [period_end_dt]
GROUP BY to_char(c.Date_dt,'YYYY/MM')
ORDER BY to_char(c.Date_dt,'YYYY/MM')
You can generate the required year-months on the fly with a zero count and use a query like this:
Select yrmth, sum(counter) from
(
  select to_char(date_attended,'YYYYMM') yrmth,
         COUNT (1) counter
  From TRAINING_COURSE_ATTENDED
  Where COURSE_ATTENDED = 'Fire Safety'
  Group By to_char(date_attended,'YYYYMM')
  Union All
  Select To_Char(2015||Lpad(Rownum,2,0)), 0 from Dual Connect By Rownum <= 12
)
group by yrmth
order by 1
If you want to show multiple years, just change the second query to:
Select To_Char(Year||Lpad(Month,2,0)) , 0
From
(select Rownum Month from Dual Connect By Rownum <= 12),
(select 2015+Rownum-1 Year from Dual Connect By Rownum <= 3)
Try this:
SELECT Trunc(date_attended, 'MM') Month,
Sum(CASE
WHEN course_attended = 'Fire Safety' THEN 1
ELSE 0
END) Fire_Safety
FROM training_course_attended
GROUP BY Trunc(date_attended, 'MM')
ORDER BY Trunc(date_attended, 'MM')
Another way to generate a calendar table inline:
with calendar (month_start, month_end) as
( select add_months(date '2014-12-01', rownum)
, add_months(date '2014-12-01', rownum +1) - interval '1' second
from dual
connect by rownum <= 12 )
select to_char(c.month_start,'YYYY/MM') as course_month
, count(tca.course_attended) as attended
from calendar c
left join training_course_attended tca
on tca.date_attended between c.month_start and c.month_end
and tca.course_attended = 'Fire Safety'
group by to_char(c.month_start,'YYYY/MM')
order by 1;
(You could also have only the month start in the calendar table, and join on trunc(tca.date_attended,'MONTH') = c.month_start, though if you had indexes or partitioning on tca.date_attended that might be less efficient.)
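A sketch of that variant for comparison (same table and column names as above):
with calendar (month_start) as
 ( select add_months(date '2014-12-01', rownum)
   from dual
   connect by rownum <= 12 )
select to_char(c.month_start,'YYYY/MM') as course_month
     , count(tca.course_attended) as attended
from calendar c
left join training_course_attended tca
  on trunc(tca.date_attended, 'MONTH') = c.month_start
  and tca.course_attended = 'Fire Safety'
group by to_char(c.month_start,'YYYY/MM')
order by 1;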

PIVOT SQL Server Assistance

Given the following table structure:
CrimeID | No_Of_Crimes | CrimeDate | Violence | Robbery | ASB
1 1 22/02/2011 Y Y N
2 3 18/02/2011 Y N N
3 3 23/02/2011 N N Y
4 2 16/02/2011 N N Y
5 1 17/02/2011 N N Y
Is there a chance of producing a result set that looks like this with T-SQL?
Category | This Week | Last Week
---------+-----------+----------
Violence | 1         | 3
Robbery  | 1         | 0
ASB      | 3         | 1
where last week should be dates less than '20/02/2011' and this week should be dates greater than or equal to '20/02/2011'.
I'm not looking for someone to code this out for me (though a code snippet would be handy :)), just some advice on whether this is possible and how I should go about it with SQL Server.
For info, I'm currently performing all this aggregation using LINQ on the web server, but it requires 19 MB to be sent over the network every time this request is made (the table has lots of categories and > 150,000 rows). I want the DB to do all the work and send only a small amount of data over the network.
Many thanks
EDIT: removed incorrect SQL for clarity.
EDIT: Forget the above; try the below:
select *
from (
    select wk, crime, SUM(number) number
    from (
        select case when datepart(week, crimedate) = datepart(week, GETDATE()) then 'This Week'
                    when datepart(week, crimedate) = datepart(week, GETDATE())-1 then 'Last Week'
                    else 'OLDER' end as wk,
               crimedate,
               case when violence = 'Y' then no_of_crimes else 0 end as violence,
               case when robbery = 'Y' then no_of_crimes else 0 end as robbery,
               case when asb = 'Y' then no_of_crimes else 0 end as asb
        from crimetable) as src
    UNPIVOT
        (number for crime in (violence, robbery, asb)) as pivtab
    group by wk, crime
) z
PIVOT
    ( sum(number)
      for wk in ([This Week], [Last Week])
    ) as pivtab
Late to the party, but a solution with an optimal query plan:
Sample data
create table crimes(
CrimeID int, No_Of_Crimes int, CrimeDate datetime,
Violence char(1), Robbery char(1), ASB char(1));
insert crimes
select 1,1,'20110221','Y','Y','N' union all
select 2,3,'20110218','Y','N','N' union all
select 3,3,'20110223','N','N','Y' union all
select 4,2,'20110216','N','N','Y' union all
select 5,1,'20110217','N','N','Y';
Make more data - about 10240 rows in total in addition to the 5 above, each 5 being 2 weeks prior to the previous 5. Also create an index that will help on crimedate.
insert crimes
select crimeId+number*5, no_of_Crimes, DATEADD(wk,-number*2,crimedate),
violence, robbery, asb
from crimes, master..spt_values
where type='P'
create index ix_crimedate on crimes(crimedate)
From here on, check output of each to see where this is going. Check also the execution plan.
Standard Unpivot to break the categories.
select CrimeID, No_Of_Crimes, CrimeDate, Category, YesNo
from crimes
unpivot (YesNo for Category in (Violence,Robbery,ASB)) upv
where YesNo='Y'
Notes:
The filter on YesNo is actually applied AFTER unpivoting. You can comment it out to see.
Unpivot again, but this time select data only for last week and this week.
select CrimeID, No_Of_Crimes, Category,
Week = sign(datediff(d,w.firstDayThisWeek,CrimeDate)+0.1)
from crimes
unpivot (YesNo for Category in (Violence,Robbery,ASB)) upv
cross join (select DATEADD(wk, DateDiff(wk, 0, getdate()), 0)) w(firstDayThisWeek)
where YesNo='Y'
and CrimeDate >= w.firstDayThisWeek -7
and CrimeDate < w.firstDayThisWeek +7
Notes:
(select DATEADD(wk, DateDiff(wk, 0, getdate()), 0)) w(firstDayThisWeek) makes a single-column table where the column contains the pivotal date for this query, being the first day of the current week (using DATEFIRST setting)
The filter on CrimeDate is actually applied on the BASE TABLE prior to unpivoting. Check plan
Sign() just breaks the data into 3 buckets (-1/0/+1). Adding +0.1 ensures that there are only two buckets -1 and +1.
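A quick sanity check of the first-day-of-week idiom (date 0 is 1900-01-01, a Monday; 2011-02-22 was a Tuesday):
-- DATEDIFF(wk, 0, '20110222') counts the week boundaries since 1900-01-01;
-- adding that many weeks back onto date 0 lands on Monday 2011-02-21.
select DATEADD(wk, DATEDIFF(wk, 0, '20110222'), 0) as firstDayThisWeek;
-- 2011-02-21 00:00:00.000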
The final query, pivoting by this/last week
select Category, isnull([1],0) ThisWeek, isnull([-1],0) LastWeek
from
(
select Category, No_Of_Crimes,
Week = sign(datediff(d,w.firstDayThisWeek,CrimeDate)+0.1)
from crimes
unpivot (YesNo for Category in (Violence,Robbery,ASB)) upv
cross join (select DATEADD(wk, DateDiff(wk, 0, getdate()), -1)) w(firstDayThisWeek)
where YesNo='Y'
and CrimeDate >= w.firstDayThisWeek -7
and CrimeDate < w.firstDayThisWeek +7
) p
pivot (sum(No_Of_Crimes) for Week in ([-1],[1])) pv
order by Category Desc
Output
Category ThisWeek LastWeek
--------- ----------- -----------
Violence 1 3
Robbery 1 0
ASB 3 3
I would try this:
declare @FirstDayOfThisWeek date = '20110220';

select cat.category,
       ThisWeek = sum(case when crt.CrimeDate >= @FirstDayOfThisWeek
                           then crt.No_of_crimes else 0 end),
       LastWeek = sum(case when crt.CrimeDate >= @FirstDayOfThisWeek
                           then 0 else crt.No_of_crimes end)
from crimetable crt
cross apply (values
    ('Violence', crt.Violence),
    ('Robbery',  crt.Robbery),
    ('ASB',      crt.ASB))
    cat (category, incategory)
where cat.incategory = 'Y'
  and crt.CrimeDate >= dateadd(day, -7, @FirstDayOfThisWeek)
group by cat.category;
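Run against the five sample rows with @FirstDayOfThisWeek = '20110220', this should agree with the pivot/unpivot approach above (note that for this data ASB's last-week total is 3: two crimes on 16/02 plus one on 17/02):
category ThisWeek LastWeek
-------- -------- --------
ASB      3        3
Robbery  1        0
Violence 1        3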