SQL: sum two different columns by different conditions, then subtract and add

What I am trying to do is fairly complex, so I will do my best to explain.
I have achieved the first part, which is to sum the column by hour.
Example:
ID TIMESTAMP CUSTAFFECTED
1 10-01-2013 01:00:23 23
2 10-01-2013 03:00:23 55
3 10-01-2013 05:00:23 2369
4 10-01-2013 04:00:23 12
5 10-01-2013 01:00:23 1
6 10-01-2013 12:00:23 99
7 10-01-2013 01:00:23 22
8 10-01-2013 02:00:23 3
The output would be:
Hour TotalCALLS CUSTAFFECTED
10/1/2013 01:00 3 46
10/1/2013 02:00 1 3
10/1/2013 03:00 1 55
10/1/2013 04:00 1 12
10/1/2013 05:00 1 2369
10/1/2013 12:00 1 99
Query
SELECT TRUNC(STARTDATETIME, 'HH24') AS hour,
COUNT(*) AS TotalCalls,
sum(CUSTAFFECTED) AS CUSTAFFECTED
FROM some_table
where STARTDATETIME >= To_Date('09-12-2013 00:00:00','MM-DD-YYYY HH24:MI:SS') and
STARTDATETIME <= To_Date('09-13-2013 00:00:00','MM-DD-YYYY HH24:MI:SS')
GROUP BY TRUNC(STARTDATETIME, 'HH24')
What I need
I need to combine two queries and group the result by timestamp/hour. The second query is the same as the first, except that it filters and groups on RESTOREDDATETIME instead of STARTDATETIME.
Second query
SELECT TRUNC(RESTOREDDATETIME , 'HH24') AS hour,
COUNT(*) AS TotalCalls,
SUM(CUSTAFFECTED) AS CUSTRESTORED
FROM some_table
where RESTOREDDATETIME >= To_Date('09-12-2013 00:00:00','MM-DD-YYYY HH24:MI:SS') and
RESTOREDDATETIME <= To_Date('09-13-2013 00:00:00','MM-DD-YYYY HH24:MI:SS')
GROUP BY TRUNC(RESTOREDDATETIME , 'HH24')
So I need to subtract CUSTRESTORED from CUSTAFFECTED and display that total.
I have added a link to an Excel file: http://goo.gl/ioo9hg
Thanks

OK, now that the correct SQL is in the question text, try this:
SELECT TRUNC(STARTDATETIME, 'HH24') AS hour,
COUNT(*) AS TotalCalls,
Sum(case when RESTOREDDATETIME is null Then 0 else 1 end) RestoredCount,
Sum(CUSTAFFECTED) as CUSTAFFECTED,
Sum(case when RESTOREDDATETIME is null Then 0 else CUSTAFFECTED end) CustRestored,
SUM(CUSTAFFECTED) -
Sum(case when RESTOREDDATETIME is null Then 0 else CUSTAFFECTED end) AS CUSTNotRestored
FROM some_table
where STARTDATETIME >= To_Date('09-12-2013 00:00:00','MM-DD-YYYY HH24:MI:SS')
and STARTDATETIME <= To_Date('09-13-2013 00:00:00','MM-DD-YYYY HH24:MI:SS')
GROUP BY TRUNC(STARTDATETIME, 'HH24')
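The conditional-sum idea above can be tried on a few invented rows. This is only a sketch: it uses Python's standard sqlite3 module as a stand-in for Oracle, so strftime replaces TRUNC(..., 'HH24'), and the sample rows and lower-case aliases are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE some_table ("
    "startdatetime TEXT, restoreddatetime TEXT, custaffected INTEGER)"
)
# Invented sample rows; a NULL restoreddatetime means "not restored yet".
conn.executemany(
    "INSERT INTO some_table VALUES (?, ?, ?)",
    [
        ("2013-09-12 01:00:23", "2013-09-12 02:10:00", 23),
        ("2013-09-12 01:15:00", None, 1),
        ("2013-09-12 01:40:00", "2013-09-12 03:05:00", 22),
        ("2013-09-12 02:00:23", None, 3),
    ],
)

rows = conn.execute(
    """
    SELECT strftime('%Y-%m-%d %H:00', startdatetime) AS hour_start,
           COUNT(*) AS total_calls,
           SUM(custaffected) AS cust_affected,
           -- only rows that have a restore time contribute to this sum
           SUM(CASE WHEN restoreddatetime IS NULL THEN 0
                    ELSE custaffected END) AS cust_restored,
           SUM(custaffected)
             - SUM(CASE WHEN restoreddatetime IS NULL THEN 0
                        ELSE custaffected END) AS cust_not_restored
    FROM some_table
    GROUP BY hour_start
    ORDER BY hour_start
    """
).fetchall()

for row in rows:
    print(row)
```

Note that everything here is grouped by the hour in which the outage started, so restored customers are counted against the start hour, not against the hour of restoration.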

I recently needed to do this and had to experiment a bit to get it working.
The challenge is to link the results of one query to those of another inside a single statement, and then manipulate the returned values so that a field in one query's result set (call it FieldA) is subtracted from a field in the other result set (call it FieldB). It doesn't matter whether the values come from an aggregate function such as COUNT(...); they can be any numeric field in a result set, grouped or not. Using aggregate functions just means the query needs a GROUP BY on the proper fields. The approach is to create in-line views in the query and use them as the data source for the subtraction.
A red herring when dealing with this kind of thing is the MINUS operator (assuming you are using an Oracle database). It will not work here, because MINUS does not subtract field values from one another; it removes from one query's result set the rows that also appear in another query's result set. In addition, MINUS is Oracle's name for this operator (the SQL standard calls it EXCEPT), so a database other than Oracle may not accept the MINUS keyword. Still, it's awfully nice to have around when you need it.
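To make the distinction concrete, here is a small sketch contrasting a set difference with an arithmetic subtraction of counts. It runs on SQLite through Python's sqlite3 module, where the operator is spelled EXCEPT rather than MINUS; the tables sold and returned and their rows are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE sold (title TEXT);
    CREATE TABLE returned (title TEXT);
    INSERT INTO sold VALUES ('A'), ('A'), ('B'), ('C');
    INSERT INTO returned VALUES ('B');
    """
)

# Set difference: distinct rows of the first query that are absent from the second.
set_difference = conn.execute(
    "SELECT title FROM sold EXCEPT SELECT title FROM returned ORDER BY title"
).fetchall()

# Arithmetic subtraction: one number minus another number.
net_count = conn.execute(
    "SELECT (SELECT COUNT(*) FROM sold) - (SELECT COUNT(*) FROM returned)"
).fetchone()[0]

print(set_difference)
print(net_count)
```

The set difference yields rows (titles that were never returned), while the subtraction yields a single number, which is what the question is after.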
OK, enough prelude. Here's the query form you will want to use, taking for example a date range we want grouped by YYYY-MM:
select inlineview1.year_mon, (inlineview1.CNT - inlineview2.CNT) as finalcnt from
(SELECT TO_CHAR(*date_field*, 'YYYY-MM') AS year_mon, count(*any_field_name*) as CNT
FROM *schemaname.tablename*
WHERE *date_field* > TO_DATE('*{a year}-{a month}-{a day}*', 'YYYY-MM-DD') and
*date_field* < TO_DATE('*{a year}-{a month}-{a day}*', 'YYYY-MM-DD') and
*another_field* = *{value_of_some_kind}* -- ... etc. ...
GROUP BY TO_CHAR(*date_field*, 'YYYY-MM')) inlineview1,
(SELECT TO_CHAR(*date_field*, 'YYYY-MM') AS year_mon, count(*any_field_name*) as CNT
FROM *schemaname.tablename*
WHERE *date_field* > TO_DATE('*{a year}-{a month}-{a day}*', 'YYYY-MM-DD') and
*date_field* < TO_DATE('*{a year}-{a month}-{a day}*', 'YYYY-MM-DD') and
*another_field* = *{value_of_some_kind}* -- ... etc. ...
GROUP BY TO_CHAR(*date_field*, 'YYYY-MM')) inlineview2
WHERE
inlineview1.year_mon = inlineview2.year_mon
order by *either or any of the final resultset's fields* -- optional
A bit less abstractly, here is an example in which a bookseller wants to see the net number of books sold in each month of 2013. To do this, the seller must subtract the number of books returned for refund from the number sold. He does not care when a returned book was originally sold: he feels a returned book represents a lost sale and lost income, statistically, no matter when the return occurs relative to the sale. Example:
select bookssold.year_mon, (bookssold.CNT - booksreturned.CNT) as netsalescount from
(SELECT TO_CHAR(SALE_DATE, 'YYYY-MM') AS year_mon, count(TITLE) as CNT
FROM RETAILOPS.ACTIVITY
WHERE SALE_DATE > TO_DATE('2012-12-31', 'YYYY-MM-DD') and
SALE_DATE < TO_DATE('2014-01-01', 'YYYY-MM-DD') and
OPERATION = 'sale'
GROUP BY TO_CHAR(SALE_DATE, 'YYYY-MM')) bookssold,
(SELECT TO_CHAR(SALE_DATE, 'YYYY-MM') AS year_mon, count(TITLE) as CNT
FROM RETAILOPS.ACTIVITY
WHERE SALE_DATE > TO_DATE('2012-12-31', 'YYYY-MM-DD') and
SALE_DATE < TO_DATE('2014-01-01', 'YYYY-MM-DD') and
OPERATION = 'return'
GROUP BY TO_CHAR(SALE_DATE, 'YYYY-MM')) booksreturned
WHERE
bookssold.year_mon = booksreturned.year_mon
order by bookssold.year_mon desc
Note that for the query to return what you expect, the two in-line views must be equijoined on some criterion, as shown above:
bookssold.year_mon = booksreturned.year_mon
Otherwise the subtraction of the counts can't be done on a 1:1 basis, because the query has no way of knowing which grouped count is to be subtracted from which. Failing to specify an equijoin condition will yield a Cartesian join, which is probably not what you want (though you may indeed want that). For example, adding 'booksreturned.year_mon' right after 'bookssold.year_mon' in the select list of the top-level select statement in the above example and eliminating the
bookssold.year_mon = booksreturned.year_mon
criterion (along with the then-empty WHERE keyword) will produce a working query that performs the subtraction on the CNT values for every combination of the YYYY-MM values in the first two columns of the result set and shows the result in the third column. This is handy to know, as it has solid application in business trend analysis when you want to compare sales and returns not just within a given timeframe but across timeframes in a 1:N fashion.
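The difference between the equijoined and the Cartesian form can be seen on a handful of rows. The following is a sketch only, run on SQLite through Python's sqlite3 module (strftime stands in for TO_CHAR); the activity table mirrors the bookseller example, and its rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE activity (sale_date TEXT, title TEXT, operation TEXT)")
conn.executemany(
    "INSERT INTO activity VALUES (?, ?, ?)",
    [
        ("2013-01-05", "Book A", "sale"),
        ("2013-01-09", "Book B", "sale"),
        ("2013-01-20", "Book A", "return"),
        ("2013-02-02", "Book C", "sale"),
        ("2013-02-11", "Book D", "sale"),
        ("2013-02-15", "Book E", "sale"),
        ("2013-02-27", "Book C", "return"),
        ("2013-02-28", "Book D", "return"),
    ],
)

# One in-line view, reused for sales and for returns via a bound parameter.
monthly = """
    SELECT strftime('%Y-%m', sale_date) AS year_mon, COUNT(title) AS cnt
    FROM activity
    WHERE operation = ?
    GROUP BY year_mon
"""

# Equijoin: each month is matched only with itself (1:1).
equijoined = conn.execute(
    f"""
    SELECT s.year_mon, s.cnt - r.cnt AS net
    FROM ({monthly}) s
    JOIN ({monthly}) r ON s.year_mon = r.year_mon
    ORDER BY s.year_mon
    """,
    ("sale", "return"),
).fetchall()

# No join condition: every sales month is paired with every returns month (1:N).
cartesian = conn.execute(
    f"""
    SELECT s.year_mon, r.year_mon, s.cnt - r.cnt AS net
    FROM ({monthly}) s, ({monthly}) r
    ORDER BY s.year_mon, r.year_mon
    """,
    ("sale", "return"),
).fetchall()

print(equijoined)
print(cartesian)
```

One thing to keep in mind with the equijoined form: because it is an inner join, a month that has sales but no returns would not appear in the result at all unless an outer join is used instead.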

Related

Calculating Datediff of two days based on when the sum of a column hits a number cap

I tried to see if this had been asked anywhere else, but it doesn't seem so. I am trying to write a SQL query that gives me the difference in days between '2022-10-01' and the date when our impression sum hits our cap of 5.
For context, we may see duplicate dates because someone revisits our website on the same day, so we get a different session number paired with that count. Here's an example table for one individual showing how many impressions were logged.
My goal is to get the number of days it takes to hit an impression cap of 5. So this individual would hit the cap on '2022-10-07', and the number of days between '2022-10-01' and '2022-10-07' is 6. I am also calculating the difference before/after '2023-01-01', since I need this count for Q4 of '22 and Q1 of '23, but I have not included that in the example table. I have other individuals to include, but for the purpose of asking here I kept it to one.
Current Query:
select
click_date,
case
when date(click_date) < date('2023-01-01') and sum(impression_cnt = 5) then datediff('day', '2022-10-01', click_date)
when date(click_date) >= date('2023-01-01') and sum(impression_cnt = 5) then datediff('day', '2023-01-01', click_date)
else 0
end days_to_capped
from table
group by customer, click_date, impression_cnt
customer click_date impression_cnt
123456 2022-10-05 2
123456 2022-10-05 1
123456 2022-10-06 1
123456 2022-10-07 1
123456 2022-10-11 1
123456 2022-10-11 3
Result Table
customer days_to_cap
123456 6
I'm currently only getting 0 days, and then 81 days once it hits 2022-12-21 (the last date) for this individual, so I know I need to fix my query. Any help would be appreciated!
Edited: This is in Snowflake!
So, the issue with your query is that the sum is calculated at the level you are grouping by, which is every field, so it will always just be the value of impression_cnt for each row.
What you need is a running sum, which is a SUM() OVER (PARTITION BY ... ORDER BY ...) expression, and then to qualify the results of that.
First, just to reproduce the data that you have:
with x as (
select *
from values
(123456,'2022-10-05'::date,2),
(123456,'2022-10-05'::date,1),
(123456,'2022-10-06'::date,1),
(123456,'2022-10-07'::date,1),
(123456,'2022-10-11'::date,1),
(123456,'2022-10-11'::date,3) x (customer,click_date,impression_cnt)
)
Then I query the CTE to compute the running sum, with a QUALIFY clause to choose the record that actually has the value I'm looking for:
select
customer,
case
when click_date < '2023-01-01'::date and sum(impression_cnt) OVER (partition by customer order by click_date) = 5 then datediff('day', '2022-10-01', click_date)
when click_date >= '2023-01-01'::date and sum(impression_cnt) OVER (partition by customer order by click_date) = 5 then datediff('day', '2023-01-01', click_date)
else 0
end days_to_capped
from x
qualify days_to_capped > 0;
The QUALIFY clause filters your results down to just the record that you care about.
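The running-sum idea can also be checked outside Snowflake. Below is a minimal sketch on SQLite through Python's sqlite3 module. It assumes an SQLite build with window-function support (version 3.25 or later); since SQLite has no QUALIFY or DATEDIFF, a CTE plus WHERE and julianday() are swapped in. The table name clicks is invented; the rows are the ones from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE clicks (customer INTEGER, click_date TEXT, impression_cnt INTEGER)"
)
conn.executemany(
    "INSERT INTO clicks VALUES (?, ?, ?)",
    [
        (123456, "2022-10-05", 2),
        (123456, "2022-10-05", 1),
        (123456, "2022-10-06", 1),
        (123456, "2022-10-07", 1),
        (123456, "2022-10-11", 1),
        (123456, "2022-10-11", 3),
    ],
)

rows = conn.execute(
    """
    WITH running AS (
        -- Rows sharing a click_date are peers, so they get the same running total.
        SELECT customer,
               click_date,
               SUM(impression_cnt) OVER (
                   PARTITION BY customer ORDER BY click_date
               ) AS running_total
        FROM clicks
    )
    SELECT customer,
           CAST(julianday(MIN(click_date)) - julianday('2022-10-01') AS INTEGER)
               AS days_to_cap
    FROM running
    WHERE running_total >= 5   -- first date on which the cap is reached or passed
    GROUP BY customer
    """
).fetchall()

print(rows)
```

Using >= 5 together with MIN(click_date), rather than = 5, is a deliberate variation in this sketch: a customer whose running total jumps from 4 straight to 6 would never match an equality test.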

Declaration of the date parameter with the automatic addition of the month in the sql query

I work in Oracle SQL Developer. I have a longer query with multiple dates in some conditions, but every date is based on a start_date and they differ only in months and days.
I want to declare only one date, e.g. start_date='2021-06-01', and then, wherever the query has a condition like COLUMN_DATE BETWEEN DATE '2021-08-01' AND DATE '2021-08-31', only add months (in that example, add 2 months to get the results for the whole of August, i.e. 2021-08-01 = start_date + 2 months). Is it possible to get results like that without entering each value separately? Below is my sample code.
Def start_date='2021-06-01'
Select
1column,
2column,
(case when exist(select 1
from table2
where between date '2021-08-01' and date '2021-08-31')
then 1 else 0 end) as 3column
from table1;
Use ADD_MONTHS and pass in your substitution variable:
Select column1,
column2,
case
when EXISTS(select 1
from table2 t2
where t2.date_column >= ADD_MONTHS(TO_DATE('&start_date', 'YYYY-MM-DD'), 2)
and t2.date_column < ADD_MONTHS(TO_DATE('&start_date', 'YYYY-MM-DD'), 3)
)
then 1
else 0
end as column3
from table1;
Note: In Oracle, a DATE always has a time component (the user interface you are using may choose not to show it, but it is still there), so if you want a month's worth of data and you compare to DATE '2021-08-31', you will miss any values between 2021-08-31 00:00:01 and 2021-08-31 23:59:59.
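The effect described in the note can be reproduced with a few invented timestamps. This is a sketch on SQLite through Python's sqlite3 module, not Oracle: date(x, '+N months') stands in for ADD_MONTHS, and ISO-formatted text comparison stands in for DATE comparison. The rows in table2 are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table2 (date_column TEXT)")
conn.executemany(
    "INSERT INTO table2 VALUES (?)",
    [
        ("2021-07-31 23:59:59",),  # just before August
        ("2021-08-01 00:00:00",),  # first instant of August
        ("2021-08-31 15:30:00",),  # afternoon of the last day of August
        ("2021-09-01 00:00:00",),  # first instant of September
    ],
)

start_date = "2021-06-01"

# Half-open range: start + 2 months <= value < start + 3 months.
half_open = conn.execute(
    """
    SELECT COUNT(*) FROM table2
    WHERE date_column >= date(?, '+2 months')
      AND date_column <  date(?, '+3 months')
    """,
    (start_date, start_date),
).fetchone()[0]

# Closed range ending at midnight of the last day: misses the 15:30 row.
closed = conn.execute(
    """
    SELECT COUNT(*) FROM table2
    WHERE date_column BETWEEN '2021-08-01' AND '2021-08-31'
    """
).fetchone()[0]

print(half_open, closed)
```

The half-open form catches both August rows, while the BETWEEN form drops the one that falls after midnight on the last day.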

Basic Teradata SQL add column and summing columns

Not great with SQL, so sorry if my questions seem dumb. I have this working code that pulls the entry date and the number of people that entered Store 1 on that date.
select entry_date as Enter_Date
,count(entry_date) as Entries
from db_entry
where entry_date between '2017-03-05' and '2017-03-11'
and entry_code like 'STR1%'
group by entry_date
This is what it shows up as
Enter_Date Entries
3/5/2017 35
3/9/2017 30
3/10/2017 27
3/8/2017 23
3/7/2017 29
3/6/2017 32
3/11/2017 39
I was wondering if there was a way to add another column for store 2, where the entry_code is 'STR2%'. The reason I'm not sure what to do is because I'm not pulling a different column from the db_entry, so I'm not sure how to differentiate the two columns in the WHERE clause.
In addition, I was wondering if there was a quick way to sum each column and have the latest date as the Enter Date. Ideally this is what I'd like my table to look like:
Enter_Date Store 1 Store 2
3/11/2017 215 301
Use case expressions to do conditional counting.
select entry_date as Enter_Date,
count(case when entry_code like 'STR1%' then entry_date end) as Entries1,
count(case when entry_code like 'STR2%' then entry_date end) as Entries2
from db_entry
where entry_date between '2017-03-05' and '2017-03-11'
and entry_code like any ('STR1%', 'STR2%')
group by entry_date
Note: The WHERE clause's LIKE filter on STR1/STR2 isn't strictly needed now, but it may speed the query up.
Edit: Now using LIKE ANY, as suggested by @Dudu Markovitz!
To answer your 2nd question, simply remove the GROUP BY and switch to:
select MAX(entry_date) as Enter_Date,
count(case when entry_code like 'STR1%' then entry_date end) as "Store 1",
count(case when entry_code like 'STR2%' then entry_date end) as "Store 2"
from db_entry
where entry_date between date '2017-03-05' and date '2017-03-11'
and entry_code like any ('STR1%', 'STR2%')
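For readers who want to see why COUNT with a CASE expression works as a per-store counter, here is a small sketch. It runs on SQLite through Python's sqlite3 module rather than Teradata, so LIKE ANY is replaced by two LIKE conditions joined with OR; the rows in db_entry are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE db_entry (entry_date TEXT, entry_code TEXT)")
conn.executemany(
    "INSERT INTO db_entry VALUES (?, ?)",
    [
        ("2017-03-05", "STR1-A"),
        ("2017-03-06", "STR1-B"),
        ("2017-03-06", "STR2-A"),
        ("2017-03-11", "STR2-B"),
        ("2017-03-11", "STR1-C"),
        ("2017-03-12", "STR1-D"),  # outside the date range, must not be counted
    ],
)

totals = conn.execute(
    """
    SELECT MAX(entry_date) AS enter_date,
           -- CASE without ELSE yields NULL for other stores, and COUNT skips NULLs
           COUNT(CASE WHEN entry_code LIKE 'STR1%' THEN entry_date END) AS store_1,
           COUNT(CASE WHEN entry_code LIKE 'STR2%' THEN entry_date END) AS store_2
    FROM db_entry
    WHERE entry_date BETWEEN '2017-03-05' AND '2017-03-11'
      AND (entry_code LIKE 'STR1%' OR entry_code LIKE 'STR2%')
    """
).fetchone()

print(totals)
```

Keeping the store filter in the WHERE clause also matters for MAX(entry_date) here: without it, the latest date could come from a row belonging to some other store.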

Multiple aggregate sums from different conditions in one sql query

Whereas I believe this is a fairly general SQL question, I am working in PostgreSQL 9.4 without an option to use other database software, and thus request that any answer be compatible with its capabilities.
I need to be able to return multiple aggregate totals from one query, such that each sum is in a new row and each grouping is determined by a unique span of time, e.g. WHERE time_stamp BETWEEN '2016-02-07' AND '2016-02-14'. The number of records that satisfy the WHERE clause is unknown and may be zero, in which case the result should ideally be "0". This is what I have worked out so far:
(
SELECT SUM(minutes) AS min
FROM downtime
WHERE time_stamp BETWEEN '2016-02-07' AND '2016-02-14'
)
UNION ALL
(
SELECT SUM(minutes)
FROM downtime
WHERE time_stamp BETWEEN '2016-02-14' AND '2016-02-21'
)
UNION ALL
(
SELECT SUM(minutes)
FROM downtime
WHERE time_stamp BETWEEN '2016-02-28' AND '2016-03-06'
)
UNION ALL
(
SELECT SUM(minutes)
FROM downtime
WHERE time_stamp BETWEEN '2016-03-06' AND '2016-03-13'
)
UNION ALL
(
SELECT SUM(minutes)
FROM downtime
WHERE time_stamp BETWEEN '2016-03-13' AND '2016-03-20'
)
UNION ALL
(
SELECT SUM(minutes)
FROM downtime
WHERE time_stamp BETWEEN '2016-03-20' AND '2016-03-27'
)
Result:
   | min
---+-----
 1 | 119
 2 | 4
 3 | 30
 4 |
 5 | 62
 6 | 350
That query gets me almost the exact result that I want; certainly good enough in that I can do exactly what I need with the results. Time spans with no records are blank but that was predictable, and whereas I would prefer "0" I can account for the blank rows in software.
But, while it isn't terrible for the 6 weeks that it represents, I want to be flexible and to be able to do the same thing for different time spans, and for a different number of data points, such as each day in a week, each week in 3 months, 6 months, each month in 1 year, 2 years, etc... As written above, it feels as if it is going to get tedious fast... for instance 1 week spans over a 2 year period is 104 sub-queries.
What I'm after is a more elegant way to get the same (or similar) result.
I also don't know if doing 104 iterations of a similar query to the above (vs. the 6 that it does now) is a particularly efficient usage.
Ultimately I am going to write some code that will help me build (and thus abstract away) the long, ugly query, but it would still be great to have a more concise and scalable query.
In Postgres, you can generate a series of times and then use these for the aggregation:
select g.dte, coalesce(sum(dt.minutes), 0) as minutes
from generate_series('2016-02-07'::timestamp, '2016-03-20'::timestamp, interval '7 day') g(dte) left join
downtime dt
on dt.time_stamp >= g.dte and dt.time_stamp < g.dte + interval '7 day'
group by g.dte
order by g.dte;
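The same bucket-and-left-join idea can be tried without a PostgreSQL server. The sketch below uses SQLite through Python's sqlite3 module and swaps in a recursive CTE for generate_series, which is a different technique for producing the series; the downtime rows are invented, and the column name time_stamp is taken from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE downtime (time_stamp TEXT, minutes INTEGER)")
conn.executemany(
    "INSERT INTO downtime VALUES (?, ?)",
    [
        ("2016-02-08 10:00:00", 100),
        ("2016-02-13 23:30:00", 19),
        ("2016-02-14 00:00:00", 4),   # exactly on a week boundary
        ("2016-02-29 08:00:00", 30),
    ],
)

rows = conn.execute(
    """
    WITH RECURSIVE weeks(dte) AS (
        -- Recursive CTE standing in for generate_series: one row per week start.
        SELECT date('2016-02-07')
        UNION ALL
        SELECT date(dte, '+7 days') FROM weeks WHERE dte < '2016-02-28'
    )
    SELECT w.dte, COALESCE(SUM(dt.minutes), 0) AS minutes
    FROM weeks w
    LEFT JOIN downtime dt
           ON dt.time_stamp >= w.dte
          AND dt.time_stamp <  date(w.dte, '+7 days')
    GROUP BY w.dte
    ORDER BY w.dte
    """
).fetchall()

for row in rows:
    print(row)
```

Two details are worth noticing: the LEFT JOIN plus COALESCE turns an empty week into 0 instead of a blank, and the half-open range (>= start, < start + 7 days) counts the row on the week boundary exactly once, whereas back-to-back BETWEEN ranges would count it in both weeks.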

Oracle SQL WHERE within a time range using sysdate

I'm trying to select data from the previous day within a certain time frame, but I may be writing my WHERE clause incorrectly. I've tried switching the times around, etc. Basically I want to see all data from 6am-6pm, and then 7pm-3am, but my results aren't reflecting that. I've tried between trunc(sysdate)-1 '00:00:00' (i.e. specifying the time), but I feel I'm not familiar enough with the function.
Note: DB is in UTC hence the 8/24.
Query:
--TOTAL PROBLEM STOW EVENTS
SELECT to_char(entry_date -8/24, 'DD-MON-YYYY HH12:MI:SSam'), OLD_BIN_ID old_bin, NEW_BIN_ID NEW_BIN, ISBN ASIN, QUANTITY
FROM BINEDIT_ENTRIES
WHERE ENTRY_DATE BETWEEN trunc(SYSDATE) -1 +4/24 AND trunc(SYSDATE) -1 +16/24
--where entry_date BETWEEN trunc(sysdate)-1 '00:00:00' AND trunc(sysdate)-1 '00:00:00.000'
AND substr(old_bin_id,1,2) = 'SC'
AND substr(new_bin_id,1,2) = 'vt'
GROUP BY ENTRY_DATE, OLD_BIN_ID, NEW_BIN_ID, ISBN, Quantity
ORDER BY QUANTITY DESC;
Result:
This appears to look correct, but when I change it to look at the other time range, it shows me this:
Second Query(Night Time):
--TOTAL PROBLEM STOW EVENTS
SELECT to_char(entry_date -8/24, 'DD-MON-YYYY HH12:MI:SSam'), OLD_BIN_ID old_bin, NEW_BIN_ID NEW_BIN, ISBN ASIN, QUANTITY
FROM BINEDIT_ENTRIES
WHERE ENTRY_DATE BETWEEN trunc(SYSDATE) -1 +16/24 AND trunc(SYSDATE) -1 +24/24
--where entry_date BETWEEN trunc(sysdate)-1 '00:00:00' AND trunc(sysdate)-1 '00:00:00.000'
AND substr(old_bin_id,1,2) = 'SC'
AND substr(new_bin_id,1,2) = 'vt'
GROUP BY ENTRY_DATE, OLD_BIN_ID, NEW_BIN_ID, ISBN, Quantity
ORDER BY QUANTITY DESC;
Result:
As you can see, it doesn't appear to respect the WHERE clause. I believe I have it formatted incorrectly; I typically just look at yesterday as a whole, not at a time range, so this is my first time attempting this. Thank you.
Effectively you're asking for everything between 8 AM and 4 PM local time. I say 8 AM since you're adding 16 hours in the WHERE clause and subtracting 8 in the SELECT clause.
If you meant to query between 7 PM and 3 AM local time, you would just add 8 hours in the WHERE clause:
WHERE ENTRY_DATE BETWEEN
trunc(SYSDATE) -1 +19/24 + 8/24
AND trunc(SYSDATE) -1 +27/24 + 8/24
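The offset arithmetic is easy to get wrong, so it can help to compute the bounds explicitly. The following Python sketch mirrors the expression trunc(SYSDATE) - 1 + hour/24 + 8/24; the function name utc_bounds and the sample date are hypothetical, and the 8-hour offset is the one stated in the question.

```python
from datetime import datetime, timedelta


def utc_bounds(today_midnight, local_start_hour, local_end_hour, offset_hours=8):
    """Return (lower, upper) UTC bounds for a local-time window on the previous day.

    Mirrors trunc(SYSDATE) - 1 + hour/24 + offset/24 for each bound.
    An end hour above 24 (e.g. 27 for 3 AM) rolls over into the next day.
    """
    yesterday = today_midnight - timedelta(days=1)
    lower = yesterday + timedelta(hours=local_start_hour + offset_hours)
    upper = yesterday + timedelta(hours=local_end_hour + offset_hours)
    return lower, upper


# Stands in for trunc(SYSDATE); the date itself is arbitrary.
today = datetime(2024, 1, 10)

day_shift = utc_bounds(today, 6, 18)     # 6 AM to 6 PM local
night_shift = utc_bounds(today, 19, 27)  # 7 PM to 3 AM local

print(day_shift)
print(night_shift)
```

With these inputs, the night window in UTC starts at 03:00 and ends at 11:00 on the following calendar day, which is why the upper bound in the WHERE clause goes past 24/24.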