I want to compare avg_clicks across different time periods and filter on its value.
The query below returns avg_clicks for each shop in January 2020, but I only want the shops whose avg_clicks is greater than 0 in January 2020.
Question 1: When I add WHERE avg_clicks > 0 to the query, I get the following error: Column 'avg_clicks' cannot be resolved. Where should the filter go?
SELECT AVG(a.clicks) AS avg_clicks,
a.shop_id,
b.shop_name
FROM
(SELECT SUM(clicks_on) AS clicks,
shop_id,
date
FROM X
WHERE site = 'com'
AND date >= CAST('2020-01-01' AS date)
AND date <= CAST('2020-01-31' AS date)
GROUP BY shop_id, date) as a
JOIN Y as b
ON a.shop_id = b.shop_id
GROUP BY a.shop_id, b.shop_name
Question 2: I also want to compare two different periods. Next, I want to find the shops whose avg_clicks is 0 in February 2020.
The desired output is the list of shops that had more than 0 clicks in January but 0 clicks in February.
I hope my question is clear. Thanks in advance.
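For Question 1, use a HAVING clause. WHERE is evaluated before GROUP BY and before the SELECT list, so the alias avg_clicks does not exist yet when WHERE runs, which is why the column cannot be resolved. HAVING is evaluated after grouping and can filter on the aggregate. Reading about the logical execution order of a SQL statement will make this clearer.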
SELECT AVG(a.clicks) AS avg_clicks,
a.shop_id,
b.shop_name
FROM
(SELECT SUM(clicks_on) AS clicks,
shop_id,
date
FROM X
WHERE site = 'com'
AND date >= '2020-01-01'
AND date <= '2020-01-31'
GROUP BY shop_id, date) as a
JOIN Y as b
ON a.shop_id = b.shop_id
GROUP BY a.shop_id, b.shop_name
HAVING AVG(a.clicks) > 0
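As a quick check of the HAVING approach, here is a minimal sketch that runs the same query shape against an in-memory SQLite database. SQLite stands in for the asker's engine (an assumption), and the rows in X and Y are made up for illustration.

```python
import sqlite3

# In-memory database with hypothetical sample rows (not from the question).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE X (shop_id INTEGER, date TEXT, site TEXT, clicks_on INTEGER);
    CREATE TABLE Y (shop_id INTEGER, shop_name TEXT);
    INSERT INTO Y VALUES (1, 'alpha'), (2, 'beta');
    INSERT INTO X VALUES
        (1, '2020-01-05', 'com', 3),
        (1, '2020-01-05', 'com', 2),
        (1, '2020-01-06', 'com', 1),
        (2, '2020-01-05', 'com', 0);
""")

# Daily totals per shop, then the average of those totals,
# filtered on the aggregate with HAVING instead of WHERE.
rows = conn.execute("""
    SELECT a.shop_id, b.shop_name, AVG(a.clicks) AS avg_clicks
    FROM (SELECT shop_id, date, SUM(clicks_on) AS clicks
          FROM X
          WHERE site = 'com'
            AND date >= '2020-01-01' AND date <= '2020-01-31'
          GROUP BY shop_id, date) AS a
    JOIN Y AS b ON a.shop_id = b.shop_id
    GROUP BY a.shop_id, b.shop_name
    HAVING AVG(a.clicks) > 0
""").fetchall()

print(rows)  # shop 2 averages 0 clicks and is filtered out
```

Shop 1 has daily totals of 5 and 1, so its average is 3.0; shop 2 is dropped by the HAVING filter.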
For Question 2, you can do something like this:
SELECT
jan.shop_id,
b.shop_name,
jan_avg_clicks,
feb_avg_clicks
FROM
(
SELECT
AVG(clicks) AS jan_avg_clicks,
shop_id
FROM
(
SELECT
SUM(clicks_on) AS clicks,
shop_id,
date
FROM X
WHERE site = 'com'
AND date >= '2020-01-01'
AND date <= '2020-01-31'
GROUP BY
shop_id,
date
) as a
GROUP BY
shop_id
HAVING AVG(clicks) > 0
) jan
join
(
SELECT
AVG(clicks) AS feb_avg_clicks,
shop_id
FROM
(
SELECT
SUM(clicks_on) AS clicks,
shop_id,
date
FROM X
WHERE site = 'com'
AND date >= '2020-02-01'
AND date < '2020-03-01'
GROUP BY
shop_id,
date
) as a
GROUP BY
shop_id
HAVING AVG(clicks) = 0
) feb
on jan.shop_id = feb.shop_id
join Y as b
on jan.shop_id = b.shop_id
Start with conditional aggregation:
SELECT shop_id,
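SUM(CASE WHEN DATE_TRUNC('month', date) = '2020-01-01' THEN clicks_on END) / NULLIF(COUNT(DISTINCT CASE WHEN DATE_TRUNC('month', date) = '2020-01-01' THEN date END), 0) as avg_clicks_jan,
SUM(CASE WHEN DATE_TRUNC('month', date) = '2020-02-01' THEN clicks_on END) / NULLIF(COUNT(DISTINCT CASE WHEN DATE_TRUNC('month', date) = '2020-02-01' THEN date END), 0) as avg_clicks_feb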
FROM X
WHERE site = 'com' AND
date >= '2020-01-01' AND
date < '2020-03-01'
GROUP BY shop_id;
I'm not sure what comparison you want to make. But if you want to filter based on the aggregated values, use a HAVING clause.
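If the comparison is "more than 0 clicks in January but 0 clicks in February", the conditional aggregates can be filtered directly in HAVING. The sketch below assumes SQLite as a stand-in engine, made-up rows, and non-negative click counts, so that a total of 0 implies an average of 0. It compares monthly totals rather than averages for that reason.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE X (shop_id INTEGER, date TEXT, site TEXT, clicks_on INTEGER);
    -- Hypothetical sample rows.
    INSERT INTO X VALUES
        (1, '2020-01-10', 'com', 4),
        (1, '2020-02-10', 'com', 0),
        (2, '2020-01-10', 'com', 7),
        (2, '2020-02-10', 'com', 3),
        (3, '2020-01-10', 'com', 5);
""")

# One pass over the table: a total per month, then a filter on both totals.
rows = conn.execute("""
    SELECT shop_id,
           SUM(CASE WHEN date <  '2020-02-01' THEN clicks_on ELSE 0 END) AS clicks_jan,
           SUM(CASE WHEN date >= '2020-02-01' THEN clicks_on ELSE 0 END) AS clicks_feb
    FROM X
    WHERE site = 'com' AND date >= '2020-01-01' AND date < '2020-03-01'
    GROUP BY shop_id
    HAVING SUM(CASE WHEN date <  '2020-02-01' THEN clicks_on ELSE 0 END) > 0
       AND SUM(CASE WHEN date >= '2020-02-01' THEN clicks_on ELSE 0 END) = 0
    ORDER BY shop_id
""").fetchall()

print(rows)
```

Note that shop 3, which has no February rows at all, is reported with 0 February clicks. A version built on an inner join of two monthly subqueries would drop such a shop, so which behaviour is wanted depends on how missing days are recorded in X.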
I have this table:
server ocurrences date
A 122 20200101
B 1 20200101
C 15 20200101
............
I'm trying to get this result:
A;B;C
20200101 122;1;15
I wrote this query:
select server, ocurrences, date FROM NET_REPORT
where to_char(date,'YYYYMMDD') >= '20200101'
AND server IN ('A','B','C') GROUP BY date, server,ocurrences ORDER BY date,server;
But it doesn't give me the result I want.
Could you help me, please?
Thanks
I think you want string_agg() or array_agg(). I strongly recommend the latter:
select date, array_agg(server) as servers,
array_agg(ocurrences) as occurrences
from net_report
where date >= '2020-01-01' and
date < '2020-01-02' and
server in ('A', 'B', 'C')
group by date
order by date;
I often face the situation where I need to compare aggregated data of different periods from the same source.
I usually deal with it this way:
SELECT
COALESCE(SalesThisYear.StoreId, SalesLastYear.StoreId) StoreId
, SalesThisYear.Sum_Revenue RevenueThisYear
, SalesLastYear.Sum_Revenue RevenueLastYear
FROM
(
SELECT StoreId, SUM(Revenue) Sum_Revenue
FROM Sales
WHERE Date BETWEEN '2017-09-01' AND '2017-09-30'
GROUP BY StoreId
) SalesThisYear
FULL JOIN (
SELECT StoreId, SUM(Revenue) Sum_Revenue
FROM Sales
WHERE Date BETWEEN '2016-09-01' AND '2016-09-30'
GROUP BY StoreId
) SalesLastYear
ON (SalesLastYear.StoreId = SalesThisYear.StoreId)
-- execution time 337 ms
It is not very elegant in my opinion, because it visits the table twice, but it works.
Another similar way to achieve the same is:
SELECT
Sales.StoreId
, SUM(CASE YEAR(Date) WHEN 2017 THEN Revenue ELSE 0 END) RevenueThisYear
, SUM(CASE YEAR(Date) WHEN 2016 THEN Revenue ELSE 0 END) RevenueLastYear
FROM
Sales
WHERE
Date BETWEEN '2017-09-01' AND '2017-09-30'
or Date BETWEEN '2016-09-01' AND '2016-09-30'
GROUP BY
StoreId
-- execution time 548 ms
Both solutions perform similarly on my data set (1,929,419 rows in the selected period, with all indexes in place), with the first one somewhat faster. Including more periods doesn't change that: the first one is always faster on my data set.
This is only a simple example; sometimes it involves more than two intervals and even some logic (e.g. comparing ISO week/weekday instead of month/day, comparing different stores, etc.).
Although I have already figured out several ways to do this, I was wondering whether there is a cleverer way: maybe a cleaner solution, or one more suitable for big data sets (over a TB).
For example, I suppose the second one is less resource-intensive for a big data set, since it does a single Index Scan over the table, whereas the first one requires two Index Scans and a Merge. What happens if the table is too big to fit in memory? Or is the first one always better?
There is very rarely a case where one way of doing things is always better, especially when the alternatives do very similar things.
What I would suggest, however, is that you follow best practice wherever you can, such as minimising scalar functions applied to columns in your queries, since they can inhibit index usage.
For example, I would expect changing your second query to the following to give at least some performance improvement:
SELECT
Sales.StoreId
, SUM(CASE WHEN Date BETWEEN '2017-09-01' AND '2017-09-30' THEN Revenue ELSE 0 END) RevenueThisYear
, SUM(CASE WHEN Date BETWEEN '2016-09-01' AND '2016-09-30' THEN Revenue ELSE 0 END) RevenueLastYear
FROM
Sales
WHERE
Date BETWEEN '2017-09-01' AND '2017-09-30'
or Date BETWEEN '2016-09-01' AND '2016-09-30'
GROUP BY
StoreId
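To see what the range-based CASE expressions return, here is a small sketch of the query above against an in-memory SQLite database. SQLite and the sample rows are assumptions for illustration only; it says nothing about performance on the real table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Sales (StoreId INTEGER, Date TEXT, Revenue REAL);
    -- Hypothetical rows: store 2 sells only in 2017, store 3 only in 2016,
    -- and the 2017-08-31 row lies outside both periods.
    INSERT INTO Sales VALUES
        (1, '2017-09-03', 100.0),
        (1, '2016-09-10',  80.0),
        (2, '2017-09-15',  50.0),
        (3, '2016-09-20',  30.0),
        (1, '2017-08-31', 999.0);
""")

# Single pass: the WHERE clause keeps both periods, the CASE expressions
# route each row's revenue to the matching column.
rows = conn.execute("""
    SELECT StoreId,
           SUM(CASE WHEN Date BETWEEN '2017-09-01' AND '2017-09-30'
                    THEN Revenue ELSE 0 END) AS RevenueThisYear,
           SUM(CASE WHEN Date BETWEEN '2016-09-01' AND '2016-09-30'
                    THEN Revenue ELSE 0 END) AS RevenueLastYear
    FROM Sales
    WHERE Date BETWEEN '2017-09-01' AND '2017-09-30'
       OR Date BETWEEN '2016-09-01' AND '2016-09-30'
    GROUP BY StoreId
    ORDER BY StoreId
""").fetchall()

print(rows)
```

A store that appears in only one period still gets a row, with 0 for the other period. The FULL JOIN version returns NULL in that position instead, so the two are not byte-for-byte interchangeable.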
The second looks better, but I guess the YEAR() call is slowing the query. Let's take it out and compare against a fixed date instead: '2017-01-01' falls before this year's range ('2017-09-01' to '2017-09-30') and after last year's range ('2016-09-01' to '2016-09-30'), so a single comparison is enough to tell the two apart.
SELECT
Sales.StoreId
, SUM(CASE WHEN date > '2017-01-01' THEN Revenue ELSE 0 END) RevenueThisYear
, SUM(CASE WHEN date < '2017-01-01' THEN Revenue ELSE 0 END) RevenueLastYear
FROM
Sales
WHERE
Date BETWEEN '2017-09-01' AND '2017-09-30'
or Date BETWEEN '2016-09-01' AND '2016-09-30'
GROUP BY
StoreId
If the FULL JOIN is working well, let's try this:
SELECT
COALESCE(SalesThisYear.StoreId, SalesLastYear.StoreId) StoreId
, sum(SalesThisYear.Revenue) RevenueThisYear
, sum(SalesLastYear.Revenue) RevenueLastYear
FROM Sales SalesThisYear full join
Sales SalesLastYear
ON SalesLastYear.StoreId = SalesThisYear.StoreId
WHERE SalesThisYear.Date BETWEEN '2017-09-01' AND '2017-09-30'
AND SalesLastYear.Date BETWEEN '2016-09-01' AND '2016-09-30'
GROUP BY COALESCE(SalesThisYear.StoreId, SalesLastYear.StoreId)
Edit:
SELECT q.StoreId
, SUM(CASE WHEN date > '2017-01-01' THEN Revenue ELSE 0 END) RevenueThisYear
, SUM(CASE WHEN date < '2017-01-01' THEN Revenue ELSE 0 END) RevenueLastYear
FROM
(SELECT StoreId, Date, Revenue
FROM Sales
WHERE Date BETWEEN '2017-09-01' AND '2017-09-30'
or Date BETWEEN '2016-09-01' AND '2016-09-30') q
GROUP BY q.StoreId
I have 2 tables:
Ticket_Report
Ticket_Report_Snapshot
The Ticket_Report_Snapshot table is an exact copy of the Ticket_Report table, but has 1 extra column:
Snapshot_Date
A snapshot of the Ticket_Report table is taken every day, with Snapshot_Date being the date the snapshot was taken.
The columns that both Tables have that I am working with are:
Project_group, Ticket_Status
I need to create a stored procedure that takes 2 Date parameters. From these 2 dates, I need to print the count of all open tickets for each project on the last day of each month in between the 2 dates passed (the last day of each month is to be searched for in the Snapshot_Date column of the Ticket_report_snapshot table).
This is what I have so far:
--This query gives me the last day of any particular month
DECLARE @dtDate DATETIME
SET @dtDate = '1/6/2016'
SELECT DATEADD(s,-1,DATEADD(mm, DATEDIFF(m,0,@dtDate)+1,0))
-- output: 2016-01-31 23:59:59.000
SELECT Project_Group as Project_Name, count(ticket_status) as Open_Tickets
FROM Ticket_Report_SnapShot
WHERE ticket_status != 'closed' AND ticket_status != 'cancelled'
AND snapshot_date = '2016-01-06'
GROUP BY Project_Group
--Right now, the output is perfect for this 1 date, hard coded in
--OutPut:
Project_Name Open_Tickets
Project 1 77
Project 2 5
Project 3 118
Project 4 22 --I need this kind of output, but for the last
Project 5 1 --day of each month between the 2 parameters
Project 6 2 --instead of just 1 date
Project 7 1
So I have 2 queries so far, 1 to give me the last day of any particular month, and 1 to show me the open tickets for 1 particular hard coded date.
How can I edit/combine these queries to use 2 date parameters, and give me the open tickets for each project for the last day of every month in between 2 date ranges?
Ex. 1/1/2016 and 3/3/2016 (1/31, 2/29, 3/31, these 3 dates would be searched for in the ticket_report_snapshot table, in the snapshot_date column)
You can use a recursive CTE to generate the month-end dates and join your snapshot table to it:
DECLARE @StartDate DATETIME = '2016-01-01',
@EndDate DATETIME = '2016-03-03';
WITH DateCTE AS
(
SELECT EOMONTH(@StartDate) snapshot_date
UNION ALL
SELECT EOMONTH(DATEADD(MONTH,1,snapshot_date))
FROM DateCTE
WHERE EOMONTH(DATEADD(MONTH,1,snapshot_date)) <= EOMONTH(@EndDate)
)
SELECT Project_Group AS Project_Name,
trs.snapshot_date,
COUNT(ticket_status) AS Open_Tickets
FROM Ticket_Report_SnapShot trs
INNER JOIN DateCTE cte ON trs.snapshot_date = cte.snapshot_date
WHERE ticket_status != 'closed'
AND ticket_status != 'cancelled'
GROUP BY Project_Group,
trs.snapshot_date
If you are still on SQL Server 2008, where EOMONTH is not available, you can use this instead:
DECLARE @StartDate DATETIME = '2016-01-01',
@EndDate DATETIME = '2016-03-03';
WITH DateCTE AS
(
SELECT DATEADD(dd,-1,DATEADD(mm,DATEDIFF(m,0,@StartDate) + 1,0)) snapshot_date
UNION ALL
SELECT DATEADD(dd,-1,DATEADD(mm,DATEDIFF(m,0,DATEADD(MONTH,1,snapshot_date)) + 1,0))
FROM DateCTE
WHERE DATEADD(dd,-1,DATEADD(mm,DATEDIFF(m,0,DATEADD(MONTH,1,snapshot_date)) + 1,0)) <= DATEADD(dd,-1,DATEADD(mm,DATEDIFF(m,0,@EndDate) + 1,0))
)
SELECT Project_Group AS Project_Name,
trs.snapshot_date,
COUNT(ticket_status) AS Open_Tickets
FROM Ticket_Report_SnapShot trs
INNER JOIN DateCTE cte ON trs.snapshot_date = cte.snapshot_date
WHERE ticket_status != 'closed'
AND ticket_status != 'cancelled'
GROUP BY Project_Group,
trs.snapshot_date
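The same idea (generate the month-end dates recursively, then join the snapshot table to them) can be tried out without a SQL Server instance. The sketch below uses SQLite, so the date arithmetic relies on SQLite's date() modifiers rather than EOMONTH or DATEADD, and the snapshot rows are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Ticket_Report_Snapshot (
        Project_Group TEXT, Ticket_Status TEXT, Snapshot_Date TEXT);
    -- Hypothetical snapshot rows.
    INSERT INTO Ticket_Report_Snapshot VALUES
        ('Project 1', 'open',      '2016-01-31'),
        ('Project 1', 'closed',    '2016-01-31'),
        ('Project 1', 'open',      '2016-02-15'),
        ('Project 1', 'open',      '2016-02-29'),
        ('Project 2', 'open',      '2016-02-29'),
        ('Project 2', 'cancelled', '2016-03-31');
""")

# Recursive CTE: start at the month end of :start_date, then step to the
# next month end until the month end of :end_date is reached.
MONTH_ENDS = """
    WITH RECURSIVE month_ends(d) AS (
        SELECT date(:start_date, 'start of month', '+1 month', '-1 day')
        UNION ALL
        SELECT date(d, 'start of month', '+2 months', '-1 day')
        FROM month_ends
        WHERE date(d, 'start of month', '+2 months', '-1 day')
              <= date(:end_date, 'start of month', '+1 month', '-1 day')
    )
"""
params = {"start_date": "2016-01-01", "end_date": "2016-03-03"}

month_ends = [r[0] for r in
              conn.execute(MONTH_ENDS + "SELECT d FROM month_ends", params)]

open_tickets = conn.execute(MONTH_ENDS + """
    SELECT s.Project_Group, s.Snapshot_Date, COUNT(*) AS Open_Tickets
    FROM Ticket_Report_Snapshot AS s
    JOIN month_ends AS m ON s.Snapshot_Date = m.d
    WHERE s.Ticket_Status NOT IN ('closed', 'cancelled')
    GROUP BY s.Project_Group, s.Snapshot_Date
    ORDER BY s.Snapshot_Date, s.Project_Group
""", params).fetchall()

print(month_ends)
print(open_tickets)
```

For 1/1/2016 to 3/3/2016 the generated dates are 1/31, 2/29 and 3/31, matching the example in the question. A project with no open tickets on a given month end simply has no row for that date.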
I have a search set up that gives the total count of new patient visits and the total count of patient visits, comparing the totals for the requested year with the previous year's totals.
The SQL queries the date fields firstexam and lastexam from the table patient_info.
I have since found out that some users do not update the lastexam with every patient visit, and therefore the lastexam would not give the total number of patient visits.
Total number of patient visits can be obtained by searching the transactions table. Invoices in the transaction table are marked with the column transtype as 'Inv'. So, the total number of patient visits would be the total number of invoices in the date range (taking into account that two invoices entered for a patient in a single day count as one visit).
Below is the code for the SQL query set up based on firstexam and lastexam.
I have been struggling with this for some time now and am stuck. Any help would be greatly appreciated.
select
to_char(('2012-' || m || '-01')::date, 'Month'),
thisyear, lastyear, totalthisyear, totallastyear
from (
select
extract(month from m) as m,
sum(case
when firstexam between '2013-01-01' and '2013-12-31' then firstexam_count
else 0 end
) as thisyear,
sum(case
when firstexam between '2012-01-01' and '2012-12-31' then firstexam_count
else 0 end
) as lastyear,
sum(case
when lastexam between '2013-01-01' and '2013-12-31' then lastexam_count
else 0 end
) as totalthisyear,
sum(case
when lastexam between '2012-01-01' and '2012-12-31' then lastexam_count
else 0 end
) as totallastyear
from
generate_series (
'2012-01-01'::date, '2013-12-31', '1 month'
) g(m)
left join (
select count(*) as firstexam_count, date_trunc('month', firstexam) as firstexam
from patient_info
where firstexam between '2012-01-01' and '2013-12-31'
group by 2
) pif on firstexam = m
left join (
select count(*) as lastexam_count, date_trunc('month', lastexam) as lastexam
from patient_info
where lastexam between '2012-01-01' and '2013-12-31'
group by 2
) pil on lastexam = m
group by 1
) s
order by m
If you want to report information about exams, you ought to store information about exams.
More specifically, if you want to count exams, you ought to store information about each exam.
Don't use column names like "thisyear" and "lastyear". This year isn't 2013, although that's how you present it.
Usually, visits and exams are different things. Be careful with terminology. (Here it's not such a big deal, because we don't have information about either visits or exams. Only about invoices. Still, it's a good habit.)
If you're concerned about a particular output format, ask yourself whether you're building a query or a report. Build queries in SQL. Build reports with a report writer or application code.
For simplicity, I'm going to
ignore the "patient_info" table,
ignore the outer join you need in order to generate zeroes for months in which there were no exams, and
use common table expressions. (In production I'd rather use views than common table expressions).
Let's start with just a table of transactions.
create table transactions (
ptnumber INT,
dateofservice date,
transtype varchar(3)
);
-- Not quite the same data you started with.
insert into transactions (ptnumber, dateofservice, transtype)
values
(1, '2012-01-01', 'Inv'),
(1, '2012-02-11', 'Inv'),
(2, '2012-01-02', 'Inv'),
(3, '2013-01-01', 'Inv'),
(4, '2013-02-12', 'Inv'),
(5, '2012-12-31', 'Inv'),
(5, '2013-12-31', 'Inv'),
(5, '2013-12-31', 'Inv'),
(6, '2013-06-21', 'Inv');
You said "two invoices entered for a patient in a single day count as one [exam]". I guess that means two or more. So we can extract the set of patient exams like this. I expect two rows for patient 5--one in 2012 and one in 2013.
select distinct ptnumber, dateofservice
from transactions
where transtype = 'Inv'
and dateofservice between '2012-01-01' and '2013-12-31'
order by ptnumber;
ptnumber dateofservice
--
1 2012-01-01
1 2012-02-11
2 2012-01-02
3 2013-01-01
4 2013-02-12
5 2012-12-31
5 2013-12-31
6 2013-06-21
This is the key to your whole problem--a set of distinct patient exams over a defined range of dates. Based on this set, counting patient visits by month is straightforward. (Counting them every which way is straightforward.)
with patient_exams as (
select distinct ptnumber, dateofservice
from transactions
where transtype = 'Inv'
and dateofservice between '2012-01-01' and '2013-12-31'
)
select to_char(dateofservice, 'YYYY-MM') as month_of_service, count(*) as num_patient_exams
from patient_exams
group by 1
order by 1;
month_of_service num_patient_exams
--
2012-01 2
2012-02 1
2012-12 1
2013-01 1
2013-02 1
2013-06 1
2013-12 1
First exams
Again, start by deriving a set that will give you reliable counts. You want one row per patient, and you want the earliest invoice date. The date of a patient's first exam has nothing to do with the date range you want to report; including the date range in this query's WHERE clause will give you the wrong data.
select ptnumber, min(dateofservice) as first_exam_date
from transactions
where transtype = 'Inv'
group by ptnumber
order by ptnumber;
ptnumber first_exam_date
--
1 2012-01-01
2 2012-01-02
3 2013-01-01
4 2013-02-12
5 2012-12-31
6 2013-06-21
Now counting how many new patients you gained each month is straightforward.
with first_exams as (
select ptnumber, min(dateofservice) as first_exam_date
from transactions
where transtype = 'Inv'
group by ptnumber
)
select to_char(first_exam_date, 'YYYY-MM') exam_month, count(*) num_first_exams
from first_exams
where first_exam_date between '2012-01-01' and '2013-12-31'
group by 1
order by 1;
exam_month num_first_exams
--
2012-01 2
2012-12 1
2013-01 1
2013-02 1
2013-06 1
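The two derived sets above can also be arranged into the month-by-month, year-against-year layout the question asked for. This is only a sketch: it runs on SQLite instead of PostgreSQL, so strftime replaces to_char, the column names exams_2013, exams_2012, new_2013 and new_2012 are my own, and months with no exams are still left out.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE transactions (ptnumber INTEGER, dateofservice TEXT, transtype TEXT);
    -- Same sample rows as in the answer above.
    INSERT INTO transactions VALUES
        (1, '2012-01-01', 'Inv'), (1, '2012-02-11', 'Inv'),
        (2, '2012-01-02', 'Inv'), (3, '2013-01-01', 'Inv'),
        (4, '2013-02-12', 'Inv'), (5, '2012-12-31', 'Inv'),
        (5, '2013-12-31', 'Inv'), (5, '2013-12-31', 'Inv'),
        (6, '2013-06-21', 'Inv');
""")

# Distinct patient exams, counted per calendar month and split by year.
exams_by_month = conn.execute("""
    WITH patient_exams AS (
        SELECT DISTINCT ptnumber, dateofservice
        FROM transactions
        WHERE transtype = 'Inv'
          AND dateofservice BETWEEN '2012-01-01' AND '2013-12-31'
    )
    SELECT strftime('%m', dateofservice) AS month,
           SUM(CASE WHEN dateofservice >= '2013-01-01' THEN 1 ELSE 0 END) AS exams_2013,
           SUM(CASE WHEN dateofservice <  '2013-01-01' THEN 1 ELSE 0 END) AS exams_2012
    FROM patient_exams
    GROUP BY 1
    ORDER BY 1
""").fetchall()

# First exam per patient (no date filter inside the CTE), then the same split.
new_by_month = conn.execute("""
    WITH first_exams AS (
        SELECT ptnumber, MIN(dateofservice) AS first_exam_date
        FROM transactions
        WHERE transtype = 'Inv'
        GROUP BY ptnumber
    )
    SELECT strftime('%m', first_exam_date) AS month,
           SUM(CASE WHEN first_exam_date >= '2013-01-01' THEN 1 ELSE 0 END) AS new_2013,
           SUM(CASE WHEN first_exam_date <  '2013-01-01' THEN 1 ELSE 0 END) AS new_2012
    FROM first_exams
    WHERE first_exam_date BETWEEN '2012-01-01' AND '2013-12-31'
    GROUP BY 1
    ORDER BY 1
""").fetchall()

print(exams_by_month)
print(new_by_month)
```

Patient 5's two invoices on 2013-12-31 count as one exam, and the date filter for first exams is applied only after the per-patient minimum has been computed, as described above.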