Calculate difference of data in respective weeks using week number - sql

My data is stored in a Google BigQuery database. This is what my table looks like. Epid_ID is unique for each row, and the count is calculated from this value.
Admin_Level_2_district WeekNumber Epid_ID
Jhapa 18 COV-NEP-PR1-SUN-20-00072
Jhapa 19 COV-NEP-PR1-SUN-20-00073
Morang 18 COV-NEP-PR1-SUN-20-00074
Morang 19 COV-NEP-PR1-SUN-20-00075
I want to find the difference in counts between two weeks. This is my expected output.
Admin_Level_2_district count_Week_18 count_Week_19 Difference
Jhapa 50 60 10
Morang 60 50 -10
Following is the query I have tried.
SELECT
Admin_Level_2_district,
Week_number,
count(Epid_ID)
FROM `interim-data.casedata.Interim EpiData`
GROUP BY
Admin_Level_2_district,
Week_number
HAVING Week_number = '18'
or Week_number = '19'
Please help!

I think you want conditional aggregation here:
SELECT
Admin_Level_2_district,
COUNT(CASE WHEN WeekNumber = 18 THEN 1 END) AS count_Week_18,
COUNT(CASE WHEN WeekNumber = 19 THEN 1 END) AS count_Week_19,
COUNT(CASE WHEN WeekNumber = 19 THEN 1 END) -
COUNT(CASE WHEN WeekNumber = 18 THEN 1 END) AS Difference
FROM `interim-data.casedata.Interim EpiData`
GROUP BY
Admin_Level_2_district;

You want to use conditional aggregation. In BigQuery, I would recommend countif():
SELECT Admin_Level_2_district,
COUNTIF(week_number = '18') as count_week_18,
COUNTIF(week_number = '19') as count_week_19,
COUNTIF(week_number = '19') - COUNTIF(week_number = '18') as diff
FROM `interim-data.casedata.Interim EpiData`
WHERE Week_number IN ('18', '19')
GROUP BY Admin_Level_2_district;
Note: I would expect week_number to be a number, in which case you would not use single quotes. However, your code treats that as a string, so I left that in.
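A minimal sketch of the same conditional aggregation, run against an in-memory SQLite database from Python so it can be tried without a BigQuery project. The table name epi, its integer week_number column, and the sample rows are made up for illustration. SQLite has no COUNTIF(), so the sketch uses SUM(CASE ...) instead.

```python
import sqlite3

# Hypothetical sample rows: (district, week_number, epid_id).
ROWS = [
    ("Jhapa", 18, "E1"),
    ("Jhapa", 19, "E2"),
    ("Jhapa", 19, "E3"),
    ("Morang", 18, "E4"),
    ("Morang", 18, "E5"),
    ("Morang", 19, "E6"),
]

# One output row per district; each SUM only counts rows of one week.
QUERY = """
SELECT district,
       SUM(CASE WHEN week_number = 18 THEN 1 ELSE 0 END) AS count_week_18,
       SUM(CASE WHEN week_number = 19 THEN 1 ELSE 0 END) AS count_week_19,
       SUM(CASE WHEN week_number = 19 THEN 1 ELSE 0 END)
     - SUM(CASE WHEN week_number = 18 THEN 1 ELSE 0 END) AS difference
FROM epi
WHERE week_number IN (18, 19)
GROUP BY district
ORDER BY district
"""


def weekly_difference(rows):
    """Load the rows into a throwaway table and return the per-district counts."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE epi (district TEXT, week_number INTEGER, epid_id TEXT)")
    conn.executemany("INSERT INTO epi VALUES (?, ?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


result = weekly_difference(ROWS)
```

With these sample rows, Jhapa gets a difference of 1 and Morang a difference of -1.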

Related

How to use two where conditions in SQL?

Following is the query I have written. I need two where conditions:
Admin_Level_3_palika is not null
Year = '2021'
However, the following query is still giving me null values for Admin_Level_3_palika
SELECT
Admin_Level_3_palika,
COUNT(CASE WHEN Week_number = '21' THEN 1 END) AS count_Week_21,
COUNT(CASE WHEN Week_number = '22' THEN 1 END) AS count_Week_22,
(COUNT(CASE WHEN Week_number = '22' THEN 1 END) -
COUNT(CASE WHEN Week_number = '21' THEN 1 END)) AS Difference
FROM `interim-data.casedata.Interim Latest`
where Admin_Level_3_palika is not null or YEAR = '2021'
GROUP BY
Admin_Level_3_palika
ORDER BY
count_Week_22 desc limit 20
Please help me with how to work with this. Following is an example of my dataset, Epid_ID being unique for each row.
Admin_Level_3_palika Week_number YEAR Epid_ID
Lamkichuha MC 21 2020 COV-NEP-PR5-RUP-20-00022
Lamkichuha MC 21 2021 COV-NEP-PR5-RUP-20-00023
If these are your conditions:
1. Admin_Level_3_palika is not null
2. Year = '2021'
Then you need and:
where Admin_Level_3_palika is not null and Year = '2021'
If year is an integer (as I would expect it to be), drop the single quotes. Don't mix data types in comparisons.
For performance, you might also want to limit the week number:
where Admin_Level_3_palika is not null and
Year = '2021' and
week_number in ('21', '22')
And finally, BigQuery offers countif() which I recommend:
SELECT Admin_Level_3_palika,
COUNTIF(Week_number = '21') AS count_Week_21,
COUNTIF(Week_number = '22') AS count_Week_22,
(COUNTIF(Week_number = '22') - COUNTIF(Week_number = '21')) AS Difference
FROM `interim-data.casedata.Interim Latest`
WHERE Admin_Level_3_palika is not null AND
YEAR = '2021' AND
week_number IN ('21', '22')
GROUP BY Admin_Level_3_palika
ORDER BY count_Week_22 desc
LIMIT 20
Change the OR to AND.
The line:
where Admin_Level_3_palika is not null or YEAR = '2021'
should be:
where Admin_Level_3_palika is not null AND YEAR = '2021'
If YEAR is not a string type, you can write:
where Admin_Level_3_palika is not null AND YEAR = 2021
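To see why OR lets the null rows through, here is a small sketch using SQLite from Python. The table name cases, the shortened column names, and the rows are hypothetical; YEAR is kept as a string to match the question.

```python
import sqlite3

# Hypothetical rows: (palika, year); None stands for SQL NULL.
ROWS = [
    ("Lamkichuha MC", "2020"),
    ("Lamkichuha MC", "2021"),
    (None, "2021"),
    (None, "2020"),
]


def select_where(condition):
    """Return the rows that survive the given WHERE condition."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE cases (palika TEXT, year TEXT)")
    conn.executemany("INSERT INTO cases VALUES (?, ?)", ROWS)
    rows = conn.execute(
        f"SELECT palika, year FROM cases WHERE {condition} ORDER BY rowid"
    ).fetchall()
    conn.close()
    return rows


# OR keeps a row if either test passes, so the NULL palika from 2021 stays.
with_or = select_where("palika IS NOT NULL OR year = '2021'")
# AND keeps a row only if both tests pass.
with_and = select_where("palika IS NOT NULL AND year = '2021'")
```

Here with_or keeps three of the four rows, including the one with a null palika, while with_and keeps only the 2021 row for Lamkichuha MC.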

SQL case operation

I'm fairly new to SQL and have been trying to solve a problem involving a table with information about orders. I'm trying to use CASE to build a monthly report on orders: one column for the year, another for the month, and then columns for days 1-20, 21-22, 23-24 and 25 and above. Each of those columns should hold the number of orders placed on those days.
I tried the following query :
SELECT
DATEPART(YEAR,date) AS year,DATEPART(MONTH,date) AS month,
COUNT(CASE WHEN DATEPART(DAY,date) BETWEEN 1 AND 20 THEN order ELSE 0 END) AS D1_D20,
COUNT(CASE WHEN DATEPART(DAY,date) BETWEEN 21 AND 22 THEN order ELSE 0 END) AS D21_D22,
COUNT(CASE WHEN DATEPART(DAY,date) BETWEEN 23 AND 24 THEN order ELSE 0 END) AS D23_D24,
COUNT(CASE WHEN DATEPART(DAY,date) > 25 THEN order ELSE 0 END) AS D25_END
FROM ORDERS
GROUP BY DATEPART(YEAR,date),DATEPART(MONTH,date)
The problem with that query is that I just get the total number of orders in every column. I know I should count the orders, but I don't know the syntax. Help would be greatly appreciated!
Use SUM():
SELECT
DATEPART(YEAR, date) AS year, DATEPART(MONTH, date) AS month,
SUM(CASE WHEN DATEPART(DAY,date) BETWEEN 1 AND 20 THEN 1 ELSE 0 END) AS D1_D20,
SUM(CASE WHEN DATEPART(DAY,date) BETWEEN 21 AND 22 THEN 1 ELSE 0 END) AS D21_D22,
SUM(CASE WHEN DATEPART(DAY,date) BETWEEN 23 AND 24 THEN 1 ELSE 0 END) AS D23_D24,
SUM(CASE WHEN DATEPART(DAY,date) >= 25 THEN 1 ELSE 0 END) AS D25_END
FROM ORDERS
GROUP BY DATEPART(YEAR, date), DATEPART(MONTH, date);
I would recommend using the functions DAY(), YEAR(), and MONTH() because they are simpler to type.
By the way, you can use COUNT() if you remove the ELSE clause. Your particular problem is that COUNT(0) = COUNT(1) because COUNT() counts non-NULL values. I prefer SUM() because it is more intuitive in this respect.
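A quick sketch of that difference, using SQLite from Python. The orders table here has a single made-up day column instead of a full date, just to keep the example short.

```python
import sqlite3

# Hypothetical day-of-month values, one per order.
DAYS = [3, 15, 21, 22, 25, 30]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (day INTEGER)")
conn.executemany("INSERT INTO orders VALUES (?)", [(d,) for d in DAYS])

# COUNT counts every non-NULL value, and 0 is not NULL.
# Without ELSE, the CASE yields NULL for non-matching rows, which COUNT skips.
count_with_else, count_no_else, sum_with_else = conn.execute("""
SELECT COUNT(CASE WHEN day BETWEEN 1 AND 20 THEN 1 ELSE 0 END),
       COUNT(CASE WHEN day BETWEEN 1 AND 20 THEN 1 END),
       SUM(CASE WHEN day BETWEEN 1 AND 20 THEN 1 ELSE 0 END)
FROM orders
""").fetchone()
conn.close()
```

Only two of the six days fall in 1-20, so count_no_else and sum_with_else are both 2, while count_with_else is 6 (every row).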

Trying to get data for every 6 months

I am running SQL Server and trying to get data for every 6 months.
Here is my query but it is for year, I want it to be every 6 months:
SELECT
DISTINCT YEAR(datein) AS 'Year',
COUNT(*) AS 'Total'
FROM
users
GROUP BY
YEAR(datein)
I want the column value to appear as MONTH/YEAR.
Use MONTH() (or DATEPART(quarter, ...)) and some arithmetic. Here is one way:
select YEAR(datein) as Year, FLOOR((MONTH(datein) - 1) / 6) as Year_Part,
COUNT(*) as Total
from users
group by YEAR(datein), FLOOR((MONTH(datein) - 1) / 6);
Or, you can put this into two columns:
select year(datein) as year,
sum(case when month(datein) <= 6 then 1 else 0 end) as total_half_1,
sum(case when month(datein) > 6 then 1 else 0 end) as total_half_2
from users
group by year(datein)
order by year(datein);
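A rough sketch of the (month - 1) / 6 bucketing, run in SQLite from Python rather than SQL Server, so strftime() stands in for YEAR() and MONTH(). The sample dates are made up, and the "MM/YYYY" label built from the first month of each half is only one guess at the MONTH/YEAR format the question asks for.

```python
import sqlite3

# Hypothetical sign-up dates standing in for users.datein.
DATES = ["2015-01-10", "2015-06-30", "2015-07-01", "2015-12-25", "2016-03-05"]

# Integer division maps months 1-6 to part 0 and months 7-12 to part 1.
QUERY = """
SELECT CAST(strftime('%Y', datein) AS INTEGER) AS yr,
       (CAST(strftime('%m', datein) AS INTEGER) - 1) / 6 AS year_part,
       COUNT(*) AS total
FROM users
GROUP BY yr, year_part
ORDER BY yr, year_part
"""


def half_year_totals(dates):
    """Return (label, total) pairs, one per half-year that has data."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (datein TEXT)")
    conn.executemany("INSERT INTO users VALUES (?)", [(d,) for d in dates])
    rows = conn.execute(QUERY).fetchall()
    conn.close()
    # Label each half by its first month (label format is an assumption).
    return [(f"{part * 6 + 1:02d}/{yr}", total) for yr, part, total in rows]


totals = half_year_totals(DATES)
```

For the sample dates this gives two users in each half of 2015 and one in the first half of 2016.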

select query output not as expected

I need a single query that gives a result like the one below.
createddate recordcount acceptdate submitdate createddate
27-MAR-16 24 36 11
28-MAR-16 79 207 58
For reference, here are the queries that I want to merge into one single query.
select trim(date_created) createddate,count(*) recordcount
from man
where status IN ('CREATED')and date_created>sysdate-15
group by trim(date_created) ORDER BY TO_DATE(createddate,'DD/MM/YYYY');
this query will result like the following.
createddate recordcount
27-MAR-16 11
28-MAR-16 58
the second query
select trim(DATE_SUB) submitdate,count(*) recordcount
from man
where status IN ('SUBMITTED')and DATE_SUB>sysdate-15
group by trim(date_sub) ORDER BY TO_DATE(submitdate,'DD/MM/YYYY');
result of this query is like
submitdate recordcount
27-MAR-16 36
28-MAR-16 207
and the third query is like -
select trim(DATE_PUB) acceptdate,count(*) recordcount
from man
where status IN ('ACCEPTED')and DATE_PUB>sysdate-15
group by trim(DATE_PUB) ORDER BY TO_DATE(acceptdate,'DD/MM/YYYY');
acceptdate recordcount
27-MAR-16 24
28-MAR-16 79
How can I merge these three queries so that I get all the counts in a single query? It should give me a result like
createddate recordcount acceptdate submitdate createddate
27-MAR-16 24 36 11
28-MAR-16 79 207 58
Your first query where clause has date but second query where clause has DATE_P.
Try like this
SELECT Trim(date) createddate,
COUNT(*) recordcount,
SUM(case when status = 'A' then 1 else 0 end) as a,
SUM(case when status = 'S' then 1 else 0 end) as s,
SUM(case when status = 'C' then 1 else 0 end) as c,
SUM(case when status = 'R' then 1 else 0 end) as r
FROM man
WHERE status IN ('A','S','C','R')and date >sysdate-15
GROUP BY trim(date) ORDER BY createddate;
You seem to want to get counts for each status type, for each day. The first step is generate all the dates you're interested in, which you can do with:
select trunc(sysdate) + 1 - level as dt
from dual
connect by level <= 15;
You can then (outer) join to your actual table where any of the three date columns match a generated date, and expand your case conditions to check which one you're looking at:
with t as (
select trunc(sysdate) + 1 - level as dt
from dual
connect by level <= 15
)
select t.dt,
count(*) as recordcount,
count(case when status = 'ACCEPTED' and trunc(m.date_pub) = t.dt
then 1 end) as acceptdate,
count(case when status = 'SUBMITTED' and trunc(m.date_sub) = t.dt
then 1 end) as submitdate,
count(case when status = 'CREATED' and trunc(m.date_created) = t.dt
then 1 end) as createddate
from t
left join man m
on (m.date_pub >= t.dt and m.date_pub < t.dt + 1)
or (m.date_sub >= t.dt and m.date_sub < t.dt + 1)
or (m.date_created >= t.dt and m.date_created < t.dt + 1)
group by t.dt
order by t.dt;
I've used range checks for the join conditions. It isn't clear if all your date columns are set at midnight, but it's safer to assume they might have other times and that you want everything from the matching day.
Each of the three count results is now only of those rows which match the status and where the specific date column matches, which I think is what you want. I've used trunc() here instead of a range comparison, as it doesn't have the potential performance penalty you can see in the where clause (from it potentially stopping an index being used).
This may throw out your recordcount though, depending on your actual data, as that will include rows that now might not match any of the case conditions. You can repeat the case conditions, or use an inline view to calculate the total of the three individual counts, depending on what you want it to include and what will be the easiest for you to maintain. If those are the only three statuses in your table then it may be OK with count(*) but check it gets the value you expect.
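The recordcount caveat can be seen in a small sketch, run in SQLite from Python instead of Oracle. The rows, the fixed two-day days table (standing in for the connect by generator), and the plain ISO date strings compared with = are all assumptions made to keep it short.

```python
import sqlite3

# Hypothetical rows: (status, date_created, date_sub, date_pub).
MAN = [
    ("CREATED",   "2016-03-27", None,         None),
    ("SUBMITTED", "2016-03-26", "2016-03-27", None),
    ("ACCEPTED",  "2016-03-25", "2016-03-26", "2016-03-27"),
    # Joins to the 27th through date_sub, but its status matches no CASE branch.
    ("ACCEPTED",  "2016-03-20", "2016-03-27", "2016-03-29"),
]
DAYS = ["2016-03-27", "2016-03-28"]

QUERY = """
SELECT t.dt,
       COUNT(*) AS recordcount,
       COUNT(CASE WHEN m.status = 'ACCEPTED'  AND m.date_pub = t.dt THEN 1 END) AS acceptdate,
       COUNT(CASE WHEN m.status = 'SUBMITTED' AND m.date_sub = t.dt THEN 1 END) AS submitdate,
       COUNT(CASE WHEN m.status = 'CREATED'   AND m.date_created = t.dt THEN 1 END) AS createddate
FROM days t
LEFT JOIN man m
  ON m.date_pub = t.dt OR m.date_sub = t.dt OR m.date_created = t.dt
GROUP BY t.dt
ORDER BY t.dt
"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE days (dt TEXT)")
conn.execute("CREATE TABLE man (status TEXT, date_created TEXT, date_sub TEXT, date_pub TEXT)")
conn.executemany("INSERT INTO days VALUES (?)", [(d,) for d in DAYS])
conn.executemany("INSERT INTO man VALUES (?, ?, ?, ?)", MAN)
counts = conn.execute(QUERY).fetchall()
conn.close()
```

On the 27th, recordcount is 4 although the three status counts only add up to 3. On the 28th, where nothing matches, COUNT(*) still reports 1 because the outer join produces one all-null row.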

SQL statement to get record datetime field value as column of result

I have the following two tables
activity(activity_id, title, description, group_id)
statistic(statistic_id, activity_id, date, user_id, result)
group_id and user_id come from active directory. Result is an integer.
Given a user_id and a date range of 6 days (Mon - Sat) which I've calculated on the business logic side, and the fact that some of the dates in the date range may not have a statistic result for the particular date (ie. day1 and day 4 may have entered statistic rows for a particular activity, but there may not be any entries for days 2, 3, 5 and 6) how can I get a SQL result with the following format? Keep in mind that if a particular activity doesn't have a record for the particular date in the statistics table, then that day should return 0 in the SQL result.
activity_id group_id day1result day2result day3result day4result day5result day6result
----------- -------- ---------- ---------- ---------- ---------- ---------- -----------
sample1 Secured 0 5 1 0 2 1
sample2 Unsecured 1 0 0 4 3 2
Note: Currently I am planning on handling this in the business logic, but that would require multiple queries (one to create a list of distinct activities for that user for the date range, and one for each activity looping through each date for a result or lack of result, to populate the 2nd dimension of the array with date-related results). That could end up with 50+ queries for each user per date range, which seems like overkill to me.
I got this working for 4 days and I can get it working for all 6 days, but it seems like overkill. Is there a way to simplify this?:
SELECT d1d2.activity_id, ISNULL(d1d2.result1,0) AS day1, ISNULL(d1d2.result2,0) AS day2, ISNULL(d3d4.result3,0) AS day3, ISNULL(d3d4.result4,0) AS day4
FROM
(SELECT ISNULL(d1.activity_id,0) AS activity_id, ISNULL(result1,0) AS result1, ISNULL(result2,0) AS result2
FROM
(SELECT ISNULL(statistic_result,0) AS result1, ISNULL(activity_id,0) AS activity_id
FROM statistic
WHERE user_id='jeremiah' AND statistic_date='11/22/2011'
) d1
FULL JOIN
(SELECT ISNULL(statistic_result,0) AS result2, ISNULL(activity_id,0) AS activity_id
FROM statistic WHERE user_id='jeremiah' AND statistic_date='11/23/2011'
) d2
ON d1.activity_id=d2.activity_id
) d1d2
FULL JOIN
(SELECT d3.activity_id AS activity_id, ISNULL(d3.result3,0) AS result3, ISNULL(d4.result4,0) AS result4
FROM
(SELECT ISNULL(statistic_result,0) AS result3, ISNULL(activity_id,0) AS activity_id
FROM statistic WHERE user_id='jeremiah' AND statistic_date='11/24/2011'
) d3
FULL JOIN
(SELECT ISNULL(statistic_result,0) AS result4, ISNULL(activity_id,0) AS activity_id
FROM statistic WHERE user_id='jeremiah' AND statistic_date='11/25/2011'
) d4
ON d3.activity_id=d4.activity_id
) d3d4
ON d1d2.activity_id=d3d4.activity_id
ORDER BY d1d2.activity_id
Here is a typical approach for this kind of thing:
DECLARE @minDate DATETIME,
@maxDate DATETIME,
@userID VARCHAR(200)
SELECT @minDate = '2011-11-15 00:00:00',
@maxDate = '2011-11-22 23:59:59',
@userID = 'jeremiah'
SELECT A.activity_id, A.group_id,
SUM(CASE WHEN DATEDIFF(day, @minDate, S.date) = 0 THEN S.Result ELSE 0 END) AS Day1Result,
SUM(CASE WHEN DATEDIFF(day, @minDate, S.date) = 1 THEN S.Result ELSE 0 END) AS Day2Result,
SUM(CASE WHEN DATEDIFF(day, @minDate, S.date) = 2 THEN S.Result ELSE 0 END) AS Day3Result,
SUM(CASE WHEN DATEDIFF(day, @minDate, S.date) = 3 THEN S.Result ELSE 0 END) AS Day4Result,
SUM(CASE WHEN DATEDIFF(day, @minDate, S.date) = 4 THEN S.Result ELSE 0 END) AS Day5Result,
SUM(CASE WHEN DATEDIFF(day, @minDate, S.date) = 5 THEN S.Result ELSE 0 END) AS Day6Result
FROM activity A
LEFT OUTER JOIN statistic S
ON A.activity_id = S.activity_ID
AND S.user_id = @userID
-- The date filter sits in the ON clause so activities without statistics still return a row of zeros
AND S.date BETWEEN @minDate AND @maxDate
GROUP BY A.activity_id, A.group_id
First, I'm using group by to reduce the resultset to one row per activity_id/group_id, then I'm using CASE to separate values for each individual column. In this case I'm looking at which day in the last seven, but you can use whatever logic there to determine what date. The case statements will return the value of S.result if the row is for that particular day, or 0 if it's not. SUM will add up the individual values (or just the one, if there is only one) and consolidate that into a single row.
You'll also note my date range is based on midnight on the first day in the range and 11:59PM on the last day of the range to ensure all times are included in the range.
Finally, I'm performing a left join so you will always have a 0 in your columns, even if there are no statistics.
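One thing worth checking with this pattern: the left join only keeps activities without statistics if the date filter is part of the join condition. A filter on S.date in the WHERE clause discards the null-extended rows again. A small sketch of the difference, using SQLite from Python with made-up rows and a single day column:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE activity (activity_id TEXT, group_id TEXT);
CREATE TABLE statistic (activity_id TEXT, user_id TEXT, date TEXT, result INTEGER);
INSERT INTO activity VALUES ('sample1', 'Secured'), ('sample2', 'Unsecured');
-- Only sample1 has a statistic row in the range (hypothetical data).
INSERT INTO statistic VALUES ('sample1', 'jeremiah', '2011-11-15', 5);
""")

SELECT_PART = """
SELECT a.activity_id,
       SUM(CASE WHEN s.date = '2011-11-15' THEN s.result ELSE 0 END) AS day1
FROM activity a
LEFT JOIN statistic s
  ON a.activity_id = s.activity_id AND s.user_id = 'jeremiah'
"""
DATE_RANGE = "s.date BETWEEN '2011-11-15' AND '2011-11-20'"
TAIL = " GROUP BY a.activity_id ORDER BY a.activity_id"

# Date filter in WHERE: rows where s.date is NULL are filtered out after the join.
filter_in_where = conn.execute(SELECT_PART + " WHERE " + DATE_RANGE + TAIL).fetchall()
# Date filter in ON: unmatched activities survive with NULLs, which the CASE turns into 0.
filter_in_on = conn.execute(SELECT_PART + " AND " + DATE_RANGE + TAIL).fetchall()
conn.close()
```

With the filter in WHERE, sample2 disappears from the result. With the filter in ON, it comes back with a 0.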
I'm not entirely sure how your results are segregated by group in addition to activity (unless group is a higher level construct), but here is the approach I would take:
SELECT activity_id,
day1result = SUM(CASE DATEPART(weekday, date) WHEN 1 THEN result ELSE 0 END)
FROM statistic
GROUP BY activity_id
I will leave the rest of the days and addition of group_id to you, but you should see the general approach.