SQL - Different sum levels in one select with where clause

I have 2 tables. One has the original amount, which remains static. The second table has a list of partial amounts applied over time against the original amount in the first table.
DB Tables:
***memotable***
ID [primary, unique]
Amount (original amount)
***transtable***
ID [many IDs in transtable to single ID in memotable]
AmountUsed (amount applied)
ApplyDate (date applied)
I would like to find, in a single select, the ID, the amount used since last week (ApplyDate > 2011-04-21), and the amount used to date.
The only rows that should appear in the result are IDs for which an amount has been used since last week (ApplyDate > 2011-04-21).
I'm stuck on getting the sum of the amount used to date, since it also needs to include AmountUsed values from rows where ApplyDate is not after 2011-04-21.

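It is possible to avoid subselects in this case with conditional aggregation; the HAVING clause keeps only IDs with activity since last week:
SELECT
ID,
AmountUsedSinceLastWeek = SUM(CASE WHEN ApplyDate > '4/21/2011' THEN AmountUsed END),
AmountUsedToDate = SUM(AmountUsed)
FROM TransTable
GROUP BY ID
-- keep only IDs with at least one amount applied since last week
HAVING MAX(ApplyDate) > '4/21/2011'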
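Here is a minimal sketch of the conditional-aggregation query that you can run. It uses an in-memory SQLite database through Python's standard `sqlite3` module. SQLite stands in for SQL Server, and the sample rows are invented for illustration. Dates are stored as ISO `YYYY-MM-DD` strings so that string comparison follows date order.

```python
import sqlite3

# In-memory stand-in for TransTable; the rows are invented sample data.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE TransTable (ID INTEGER, AmountUsed INTEGER, ApplyDate TEXT);
    INSERT INTO TransTable VALUES
        (1, 10, '2011-04-01'),
        (1, 20, '2011-04-25'),
        (2,  5, '2011-03-15'),  -- ID 2: nothing applied since last week
        (3,  7, '2011-04-22'),
        (3,  3, '2011-04-23');
    """
)

CUTOFF = "2011-04-21"

rows = conn.execute(
    """
    SELECT ID,
           -- only rows after the cutoff feed this sum (others yield NULL, which SUM ignores)
           SUM(CASE WHEN ApplyDate > ? THEN AmountUsed END) AS AmountUsedSinceLastWeek,
           -- every row of the ID feeds the total to date
           SUM(AmountUsed) AS AmountUsedToDate
    FROM TransTable
    GROUP BY ID
    HAVING MAX(ApplyDate) > ?  -- keep IDs with at least one row after the cutoff
    ORDER BY ID
    """,
    (CUTOFF, CUTOFF),
).fetchall()

print(rows)  # [(1, 20, 30), (3, 10, 10)]
```

ID 2 is dropped by the HAVING clause, but its rows would still count toward a to-date total if it had any recent activity.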

Since you want to limit the result to IDs with activity since last week, but also want to include the total to date, you can use two sub-selects, one per period, joined on ID. The inner join drops IDs with no activity since last week:
SELECT
lastWeek.ID,
lastWeek.AmountUsedSinceLastWeek,
toDate.AmountUsedToDate
FROM
(
SELECT
ID,
SUM(AmountUsed) AS AmountUsedSinceLastWeek
FROM TransTable
WHERE ApplyDate > '4/21/2011'
GROUP BY ID
) lastWeek JOIN
(
SELECT
ID,
SUM(AmountUsed) AS AmountUsedToDate
FROM TransTable
GROUP BY ID
) toDate ON lastWeek.ID = toDate.ID
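You can check the sub-select join the same way. This sketch assumes the same invented sample rows in SQLite. It should return the same two IDs as the conditional-aggregation version, because the inner join drops ID 2, which has no rows after the cutoff.

```python
import sqlite3

# Same invented TransTable rows, as a self-contained in-memory database.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE TransTable (ID INTEGER, AmountUsed INTEGER, ApplyDate TEXT);
    INSERT INTO TransTable VALUES
        (1, 10, '2011-04-01'),
        (1, 20, '2011-04-25'),
        (2,  5, '2011-03-15'),
        (3,  7, '2011-04-22'),
        (3,  3, '2011-04-23');
    """
)

rows = conn.execute(
    """
    SELECT lastWeek.ID, lastWeek.AmountUsedSinceLastWeek, toDate.AmountUsedToDate
    FROM (SELECT ID, SUM(AmountUsed) AS AmountUsedSinceLastWeek
          FROM TransTable
          WHERE ApplyDate > ?            -- recent rows only
          GROUP BY ID) AS lastWeek
    JOIN (SELECT ID, SUM(AmountUsed) AS AmountUsedToDate
          FROM TransTable                -- all rows
          GROUP BY ID) AS toDate
      ON lastWeek.ID = toDate.ID         -- inner join keeps recently active IDs only
    ORDER BY lastWeek.ID
    """,
    ("2011-04-21",),
).fetchall()

print(rows)  # [(1, 20, 30), (3, 10, 10)]
```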


SQL query to add column that counts number of encounters for the past year from each encounter

I am trying to identify High-Usage status for customers: at the time of each order, how many orders did the customer place in the previous year? Each customer has a unique ID and each order has a unique ID, with a date/time stamp at the time of order. This is not just adding a count column, but a conditional count. I can recreate this in Excel using SUMPRODUCT, but I wanted to see if I can automate the process in SSMS before my pull.
I tried a subquery column and then doing a join on a subquery result:
SELECT data.*
,HU_CUSTOMER_YOY
FROM data
LEFT JOIN (SELECT MAX(ORDER_ID) AS ORDER_ID
,COUNT(CUSTOMER_ID) AS HU_CUSTOMER_YOY
FROM data AS CUS_HU
WHERE ORDER_DTTM > DATEADD(YEAR, -1, ORDER_DTTM)
GROUP BY CUS_HU.CUSTOMER_ID)
CUSHU on CUSHU.ORDER_ID = data.ORDER_ID
This pulls in a value ONLY on the most recent order and counts ALL previous orders. To reiterate, I need a value on EACH unique order that counts every order for that customer in the year before that order. My issue is using the DTTM column. If I use a static date like GETDATE(), the count works. However, on EACH order I need the count over the window from ORDER_DTTM minus one year, so that I can view historical data, i.e., when a customer began and fell off High-Usage status, what contributed to the change, etc.
This is for a rather large dataset that is refreshed daily. I would prefer not to aggregate the main query, if possible, which is why I thought creating and joining a reference table would be better.
Is this possible?
Adding expected query results:
customer_id | order_id | HU_count | order_dttm
============|==========|==========|===========
c1          | c1-1     | 0        | 1/1/2020
c1          | c1-2     | 1        | 7/1/2020
c1          | c1-3     | 0        | 1/1/2022
c1          | c1-4     | 1        | 1/10/2022
c2          | c2-1     | 0        | 1/11/2022
c1          | c1-5     | 2        | 1/14/2022
c2          | c2-2     | 1        | 1/15/2022
I assume you are using SQL Server from your usage of the DATEADD function.
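Based on my understanding of the requirements, this shows, for each order, how many orders the same customer placed in the year before it. A filter like ORDER_DTTM > DATEADD(YEAR, -1, ORDER_DTTM) compares a row with itself and is always true. The count therefore has to look at the customer's other orders. A correlated subquery does that without aggregating the main query:
SELECT
d.customer_id,
d.order_id,
(SELECT COUNT(*)
FROM data AS p
WHERE p.customer_id = d.customer_id
AND p.order_dttm < d.order_dttm -- earlier orders only
AND p.order_dttm >= DATEADD(YEAR, -1, d.order_dttm)) AS HU_count,
d.order_dttm
FROM data AS d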
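Here is a sketch of the per-order count: for each order, it counts the customer's earlier orders that fall within the preceding year. It runs in SQLite through Python's `sqlite3` with the sample orders from the expected-results table, rewritten as ISO dates. SQLite's `date(x, '-1 year')` stands in for SQL Server's `DATEADD(YEAR, -1, x)`.

```python
import sqlite3

# Sample orders from the expected-results table, with ISO dates.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE data (customer_id TEXT, order_id TEXT, order_dttm TEXT);
    INSERT INTO data VALUES
        ('c1', 'c1-1', '2020-01-01'),
        ('c1', 'c1-2', '2020-07-01'),
        ('c1', 'c1-3', '2022-01-01'),
        ('c1', 'c1-4', '2022-01-10'),
        ('c2', 'c2-1', '2022-01-11'),
        ('c1', 'c1-5', '2022-01-14'),
        ('c2', 'c2-2', '2022-01-15');
    """
)

rows = conn.execute(
    """
    SELECT d.customer_id,
           d.order_id,
           -- earlier orders of the same customer within one year before this order
           (SELECT COUNT(*)
            FROM data AS p
            WHERE p.customer_id = d.customer_id
              AND p.order_dttm < d.order_dttm
              AND p.order_dttm >= date(d.order_dttm, '-1 year')) AS HU_count,
           d.order_dttm
    FROM data AS d
    ORDER BY d.order_dttm
    """
).fetchall()

for row in rows:
    print(row)
```

The HU_count values come out as 0, 1, 0, 1, 0, 2, 1, matching the expected results. On a large table, an index on (customer_id, order_dttm) would likely help the correlated lookup.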

Multiple sum subqueries for percentage

I need help with the following problem: I want to write a query that computes multiple sums and then uses them to get a percentage: percentage = s1 / (s1 + s2).
I have as input the following data:
Order shipping date, number of orders that arrived late, number of orders that arrived on time
What I want as output: the percentage of orders that arrived late and the percentage of orders that arrived on time.
I want another column in the table that will have the percentage using SQL.
Concrete example:
* On 2022/01/04 at 10:00 AM I have 3 orders late and 4 orders on time, so 7 orders in total. Percentage = 3/7 late, 4/7 on time.
* At 2022/01/04 11:00 AM I have 5 orders late and 6 orders on time, 11 orders in total. This entry is added to the previous one: 3+5 = 8 orders late, 4+6 = 10 orders on time, 18 orders in total, so percentage = 8/18 late, 10/18 on time.
To sum, for each event date, the orders with status "LATE" up to and including that date, I wrote the following SQL:
(this running sum is s1)
SELECT s1.EventDate, (
SELECT SUM(s2.NbOfOrders)
FROM OrderShipmentStats s2
WHERE s2.EventDate <= s1.EventDate AND s2.Status='LATE'
) AS cnt
FROM OrderShipmentStats s1
GROUP BY s1.EventDate, s1.Status
I wrote the same kind of query for "On Time", and it works. What I need now is to combine the values of the two queries. Then, depending on whether the status is late or on time, I need to compute s1/(s1+s2) or s2/(s1+s2).
My problem is that I do not know how to compute this formula in a single query using those two subqueries. Any help would be great.
Picture with Table
Above is the link to a picture showing how the table looks (I am new, so I am not allowed to embed images).
The percentage column is the one I want to add; lines in the picture point to how it is calculated.
I created the table based on your image and added a few rows to it.
The query shows the order count per hour (TotalPerHour), the running total per status (StatusTotal), and the running grand total (GrandStatusTotal), as you described in the image.
The query looks like:
create table OrderShipmentsStats
(
EventDate datetime not null,
Status varchar(10) not null,
OrdersCount int not null
)
insert into OrderShipmentsStats
values
('2022-01-04T10:00:00','Late',3),
('2022-01-04T10:00:00','On Time',4),
('2022-01-04T11:00:00','Late',5),
('2022-01-04T11:00:00','On Time',6),
('2022-01-04T12:00:00','Late',1),
('2022-01-04T12:00:00','On Time',2)
SELECT
EventDate,
Status,
OrdersCount,
TotalPerHour,
StatusTotal,
GrandStatusTotal,
-- multiplying by 1.0 forces decimal instead of integer division, so we get a fraction such as 0.45 or 0.123
-- but we want the actual percentage, such as 15% or 50%; to obtain it, multiply by 100
cast(1.0 * o.StatusTotal / o.GrandStatusTotal as decimal(5,3)) * 100 as Percentage
from
(
select
EventDate,
Status,
OrdersCount,
TotalPerHour,
StatusTotal,
SUM(TotalPerHour) over (partition by Status order by EventDate asc) as GrandStatusTotal
from
(
select
EventDate,
Status,
OrdersCount,
Sum(OrdersCount) over (partition by EventDate order by EventDate asc) as TotalPerHour,
SUM(OrdersCount) over (partition by Status order by EventDate asc) as StatusTotal
from OrderShipmentsStats
) as t
) as o
order by EventDate, Status
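Here is a sketch of the same running percentages in SQLite (version 3.25 or later, for window functions) via Python's `sqlite3`, using the rows inserted above. It takes a slight shortcut compared with the answer. `SUM(OrdersCount) OVER (ORDER BY EventDate)` already gives the running grand total, because the default window frame ends at the current row's last peer. That frame therefore includes both statuses of the same hour.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE OrderShipmentsStats (
        EventDate   TEXT    NOT NULL,
        Status      TEXT    NOT NULL,
        OrdersCount INTEGER NOT NULL
    );
    INSERT INTO OrderShipmentsStats VALUES
        ('2022-01-04T10:00:00', 'Late', 3),
        ('2022-01-04T10:00:00', 'On Time', 4),
        ('2022-01-04T11:00:00', 'Late', 5),
        ('2022-01-04T11:00:00', 'On Time', 6),
        ('2022-01-04T12:00:00', 'Late', 1),
        ('2022-01-04T12:00:00', 'On Time', 2);
    """
)

rows = conn.execute(
    """
    WITH running AS (
        SELECT EventDate, Status,
               -- running total per status up to this hour
               SUM(OrdersCount) OVER (PARTITION BY Status ORDER BY EventDate) AS StatusTotal,
               -- running total of all orders; rows of the same hour are peers and included
               SUM(OrdersCount) OVER (ORDER BY EventDate) AS GrandTotal
        FROM OrderShipmentsStats
    )
    SELECT EventDate, Status, StatusTotal, GrandTotal,
           100.0 * StatusTotal / GrandTotal AS Percentage
    FROM running
    ORDER BY EventDate, Status
    """
).fetchall()

for event_date, status, status_total, grand_total, pct in rows:
    print(event_date, status, f"{status_total}/{grand_total}", f"{pct:.1f}%")
```

At 11:00 this gives 8/18 late and 10/18 on time, as in the question's example.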

SQL SUM and GROUP BY based on WHERE clause

I'm running PostgreSQL 9.4 and have the following table structure for invoicing:
id BIGINT, time UNIX_TIMESTAMP, customer TEXT, amount BIGINT, status TEXT, billing_id TEXT
I hope I can explain my challenge correctly.
An invoice record can have one of three statuses: begin, ongoing and done.
Several invoice records can be part of the same invoice line over time.
When an invoice period begins, a record with status begin is created.
Then, every 6 hours, a new record with status ongoing is generated, with the amount spent so far in amount.
When an invoice is closed, a record with status done is generated with the total amount spent in the amount column. All records of the same invoice share the same billing_id.
To calculate a customer's current spending I can run the following:
SELECT sum(amount) FROM invoice_records where id = $1 and time between '2017-06-01' and '2017-07-01' and status = 'done'
But that does not take into account an ongoing invoice that is not closed yet.
How can I also count the largest billing_id that has no record with status done?
Hope it makes sense.
Per invoice (i.e. billing_id) you want the amount of the record with status = 'done' if such exists or of the last record with status = 'ongoing'. You can use PostgreSQL's DISTINCT ON for this (or use standard SQL's ROW_NUMBER to rank the records per invoice).
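SELECT DISTINCT ON (billing_id) billing_id, amount
FROM invoice_records
WHERE status IN ('done', 'ongoing', 'begin')
ORDER BY
billing_id,
CASE status WHEN 'done' THEN 1 WHEN 'ongoing' THEN 2 ELSE 3 END,
time DESC; -- the table's timestamp column is named time
The ORDER BY clause represents the ranking. DISTINCT ON keeps the first row per billing_id: the done record if there is one, otherwise the latest ongoing record, otherwise the begin record. To get the customer's total, add the customer and time filters to the WHERE clause and wrap the query in SELECT sum(amount) FROM (...) AS per_invoice.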
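The answer mentions ROW_NUMBER as the standard-SQL alternative to DISTINCT ON. Here is a minimal sketch of that variant, run in SQLite through Python's `sqlite3` (SQLite has no DISTINCT ON). The invoice rows, the customer name, and the integer time values are invented for illustration.

```python
import sqlite3

# Invented invoice records: b1 is closed, b2 is still ongoing, b3 has only begun.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE invoice_records (
        id INTEGER, time INTEGER, customer TEXT,
        amount INTEGER, status TEXT, billing_id TEXT
    );
    INSERT INTO invoice_records VALUES
        (1, 100, 'acme',  0, 'begin',   'b1'),
        (2, 200, 'acme', 10, 'ongoing', 'b1'),
        (3, 300, 'acme', 25, 'ongoing', 'b1'),
        (4, 400, 'acme', 30, 'done',    'b1'),
        (5, 500, 'acme',  0, 'begin',   'b2'),
        (6, 600, 'acme',  5, 'ongoing', 'b2'),
        (7, 700, 'acme', 12, 'ongoing', 'b2'),
        (8, 800, 'acme',  0, 'begin',   'b3');
    """
)

# rn = 1 marks the done record, else the latest ongoing record, else the begin record.
RANKED = """
    SELECT billing_id, amount,
           ROW_NUMBER() OVER (
               PARTITION BY billing_id
               ORDER BY CASE status WHEN 'done' THEN 1 WHEN 'ongoing' THEN 2 ELSE 3 END,
                        time DESC
           ) AS rn
    FROM invoice_records
    WHERE status IN ('done', 'ongoing', 'begin')
"""

per_invoice = conn.execute(
    f"SELECT billing_id, amount FROM ({RANKED}) WHERE rn = 1 ORDER BY billing_id"
).fetchall()
(total,) = conn.execute(f"SELECT SUM(amount) FROM ({RANKED}) WHERE rn = 1").fetchone()

print(per_invoice)  # [('b1', 30), ('b2', 12), ('b3', 0)]
print(total)        # 42
```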
select sum(amount), id
from (
select distinct on (billing_id) *
from (
select distinct on (status, billing_id) *
from invoice_records
where
id = $1
and time between '2017-06-01' and '2017-07-01'
and status in ('done', 'ongoing')
order by status, billing_id desc, time desc -- latest record per status and invoice
) s
order by billing_id desc, status -- 'done' sorts before 'ongoing', so it wins when present
) s
group by id

SQLite ROLLUP query

I am trying to get a summary of the balance per month within my database. The table has the following fields
tran_date
type (Income or Expense)
amount
I can get as far as retrieving the sum for each type for every month, but I want the sum for the whole month. This is my current query:
SELECT DISTINCT strftime('%m%Y', tran_date), type, SUM(amount) FROM tran WHERE exclude = 0 GROUP BY tran_date, type
This returns
032013 Income 100
032013 Expense 200
I would like the summary on one row, in this example 032013 -100.
Just use the right GROUP BY: group by the month expression instead of tran_date and type. This uses conditional aggregation, assuming that you want income minus expense:
SELECT strftime('%m%Y', tran_date) AS month,
SUM(CASE WHEN type = 'Income' THEN amount WHEN type = 'Expense' THEN -amount END) AS balance
FROM tran WHERE exclude = 0
GROUP BY strftime('%m%Y', tran_date);
If you want just the full sum, then this is easier:
SELECT strftime('%m%Y', tran_date) AS month,
SUM(amount) AS total
FROM tran WHERE exclude = 0
GROUP BY strftime('%m%Y', tran_date);
Your original query returned one row per type because type was in the GROUP BY clause, and grouping by tran_date also splits each month into days.
Also, DISTINCT is (almost) never needed with GROUP BY.
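Here is a quick check of the monthly balance query in SQLite itself, through Python's `sqlite3`. The sample transactions, including one excluded row, are invented. The result is ordered by `MIN(tran_date)` because `'%m%Y'` strings such as `'032013'` do not sort chronologically across years.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE tran (tran_date TEXT, type TEXT, amount INTEGER, exclude INTEGER);
    INSERT INTO tran VALUES
        ('2013-03-05', 'Income',  100, 0),
        ('2013-03-10', 'Expense', 200, 0),
        ('2013-04-01', 'Income',   50, 0),
        ('2013-04-15', 'Expense',  20, 0),
        ('2013-04-20', 'Expense', 999, 1);  -- excluded row
    """
)

rows = conn.execute(
    """
    SELECT strftime('%m%Y', tran_date) AS month,
           -- income counts positive, expense negative
           SUM(CASE WHEN type = 'Income' THEN amount
                    WHEN type = 'Expense' THEN -amount END) AS balance
    FROM tran
    WHERE exclude = 0
    GROUP BY strftime('%m%Y', tran_date)
    ORDER BY MIN(tran_date)  -- '%m%Y' strings do not sort chronologically
    """
).fetchall()

print(rows)  # [('032013', -100), ('042013', 30)]
```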

SQL query group by nearby timestamp

I have a table with a timestamp column. I would like to be able to group by an identifier column (e.g. cusip), sum over another column (e.g. quantity), but only for rows that are within 30 seconds of each other, i.e. not in fixed 30 second bucket intervals. Given the data:
cusip| quantity| timestamp
============|=========|=============
BE0000310194| 100| 16:20:49.000
BE0000314238| 50| 16:38:38.110
BE0000314238| 50| 16:46:21.323
BE0000314238| 50| 16:46:35.323
I would like to write a query that returns:
cusip| quantity
============|=========
BE0000310194| 100
BE0000314238| 50
BE0000314238| 100
Edit:
In addition, it would greatly simplify things if I could also get the MIN(timestamp) out of the query.
Starting from Sean G's solution, I removed the GROUP BY over the complete table and readjusted a few parts for Oracle SQL.
First, after finding the previous timestamp, assign a self parent ID. A row gets its own id if its previous timestamp is null (the first row of a cusip) or more than 30 seconds earlier; every other row gets null.
Then, for each row with a null self parent ID, take the nearest preceding non-null self parent ID (LAG ... IGNORE NULLS). This puts all rows of a cusip that are chained within 30 seconds of each other into one group.
As there is a CUSIP column, I assumed the dataset is large market transaction data. Instead of a GROUP BY over the complete table, use PARTITION BY cusip and the final parent ID for better performance.
SELECT
id,
sub.parent_id,
sub.cusip,
timestamp,
quantity,
sum(sub.quantity) OVER(
PARTITION BY cusip, parent_id
) sum_quantity,
MIN(sub.timestamp) OVER(
PARTITION BY cusip, parent_id
) min_timestamp
FROM
(
SELECT
base_sub.*,
CASE
WHEN base_sub.self_parent_id IS NOT NULL THEN
base_sub.self_parent_id
ELSE
LAG(base_sub.self_parent_id) IGNORE NULLS OVER(
PARTITION BY cusip
ORDER BY
timestamp, id
)
END parent_id
FROM
(
SELECT
c.*,
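CASE
-- assumes timestamp is a TIMESTAMP column: the difference is an INTERVAL compared directly with 30 seconds
-- (EXTRACT(SECOND FROM ...) would return only the seconds field and ignore minutes and hours)
WHEN previous_timestamp IS NULL
OR timestamp - previous_timestamp > INTERVAL '30' SECOND THEN
id
ELSE
NULL
END self_parent_id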
FROM
(
SELECT
my_table.id,
my_table.cusip,
my_table.timestamp,
my_table.quantity,
LAG(my_table.timestamp) OVER(
PARTITION BY my_table.cusip
ORDER BY
my_table.timestamp, my_table.id
) previous_timestamp
FROM
my_table
) c
) base_sub
) sub
Input data: [screenshot]
Result: [screenshot]
The following may be helpful to you.
This groups rows into fixed 30-second periods starting from a given time, here '2012-01-01 00:00:00'. DATEDIFF counts the number of seconds between the starting time and the timestamp value. That count is then divided by 30 to get the grouping column.
SELECT MIN(TimeColumn) AS TimeGroup, SUM(Quantity) AS TotalQuantity FROM YourTable
GROUP BY (DATEDIFF(ss, '2012-01-01', TimeColumn) / 30)
Here the minimum timestamp of each group is output as TimeGroup. You can use the maximum instead, or convert the grouping value back to a time for display.
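Fixed buckets have one caveat, shown here with the question's sample rows in SQLite via Python's `sqlite3`. In this sketch, `strftime('%s', ...)` plays the role of DATEDIFF in seconds, and the rows are assumed to fall on 2012-01-01. Rows only 14 seconds apart can still land in different 30-second buckets, which is why the question asks for grouping by proximity instead.

```python
import sqlite3

# The question's sample rows, assumed to fall on 2012-01-01 so they are full timestamps.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE trades (cusip TEXT, quantity INTEGER, ts TEXT);
    INSERT INTO trades VALUES
        ('BE0000310194', 100, '2012-01-01 16:20:49.000'),
        ('BE0000314238',  50, '2012-01-01 16:38:38.110'),
        ('BE0000314238',  50, '2012-01-01 16:46:21.323'),
        ('BE0000314238',  50, '2012-01-01 16:46:35.323');
    """
)

rows = conn.execute(
    """
    SELECT cusip, MIN(ts) AS TimeGroup, SUM(quantity) AS TotalQuantity
    FROM trades
    -- whole seconds since the reference time, integer-divided into 30-second buckets
    GROUP BY cusip,
             (CAST(strftime('%s', ts) AS INTEGER)
              - CAST(strftime('%s', '2012-01-01 00:00:00') AS INTEGER)) / 30
    ORDER BY cusip, TimeGroup
    """
).fetchall()

for row in rows:
    print(row)
# 16:46:21 and 16:46:35 fall on either side of the 16:46:30 bucket boundary,
# so BE0000314238 yields three rows of 50 instead of 50 and 100.
```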
Looking at the above comments, I'm assuming Chris's first scenario is the one you want: all 3 get grouped even though values 1 and 3 are not within 30 seconds of each other, because each is within 30 seconds of value 2. I'm also going to assume that each row in your table has some unique ID called 'id'. You can do the following:
Create a new grouping by determining whether the preceding row in your partition is more than 30 seconds behind the current row (i.e. whether you need a new 30-second group or can continue the previous one). We'll call that parent_id.
Sum quantity over parent_id (plus any other aggregations).
The code could look like this:
select
sub.parent_id,
sub.cusip,
min(sub.timestamp) min_timestamp,
sum(sub.quantity) quantity
from
(
select
base_sub.*,
case
when base_sub.self_parent_id is not null
then base_sub.self_parent_id
else lag(base_sub.self_parent_id) ignore nulls over (
partition by
base_sub.cusip
order by
base_sub.timestamp,
base_sub.id
)
end parent_id
from
(
select
prev.*,
-- a row starts a new group when it is more than 30 seconds after the previous row;
-- the first row of each cusip has no previous row, so the 1900 default always starts a group
case
when datediff(
second,
nvl(prev.previous_timestamp, to_date('1900/01/01', 'yyyy/mm/dd')),
prev.timestamp) > 30
then prev.id
else null
end self_parent_id
from
(
-- previous_timestamp must be computed one level down before it can be referenced
select
my_table.id,
my_table.cusip,
my_table.timestamp,
my_table.quantity,
lag(my_table.timestamp) over (
partition by
my_table.cusip
order by
my_table.timestamp,
my_table.id
) previous_timestamp
from
my_table
) prev
) base_sub
) sub
group by
sub.parent_id,
sub.cusip
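You can try the parent-ID approach in SQLite via Python's `sqlite3` with the question's sample rows, but SQLite's LAG has no IGNORE NULLS option. This sketch therefore swaps in a different, common gaps-and-islands technique. It flags each row that starts a new group (the first row of the cusip, or one more than 30 seconds after the previous row), then numbers the groups with a running SUM of the flags. The 2012-01-01 date and the id values are assumptions added to make the rows runnable.

```python
import sqlite3

# The question's sample rows; the ids and the 2012-01-01 date are assumptions.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE my_table (id INTEGER, cusip TEXT, quantity INTEGER, ts TEXT);
    INSERT INTO my_table VALUES
        (1, 'BE0000310194', 100, '2012-01-01 16:20:49.000'),
        (2, 'BE0000314238',  50, '2012-01-01 16:38:38.110'),
        (3, 'BE0000314238',  50, '2012-01-01 16:46:21.323'),
        (4, 'BE0000314238',  50, '2012-01-01 16:46:35.323');
    """
)

rows = conn.execute(
    """
    WITH ordered AS (
        SELECT id, cusip, quantity, ts,
               LAG(ts) OVER (PARTITION BY cusip ORDER BY ts, id) AS previous_ts
        FROM my_table
    ),
    flagged AS (
        SELECT *,
               -- 1 if this row starts a new group: first row of the cusip,
               -- or more than 30 seconds after the previous row
               CASE WHEN previous_ts IS NULL
                      OR (julianday(ts) - julianday(previous_ts)) * 86400 > 30
                    THEN 1 ELSE 0 END AS new_group
        FROM ordered
    ),
    grouped AS (
        SELECT *,
               -- running count of group starts = group number within the cusip
               SUM(new_group) OVER (PARTITION BY cusip ORDER BY ts, id) AS group_no
        FROM flagged
    )
    SELECT cusip, MIN(ts) AS min_timestamp, SUM(quantity) AS quantity
    FROM grouped
    GROUP BY cusip, group_no
    ORDER BY cusip, min_timestamp
    """
).fetchall()

for row in rows:
    print(row)
```

Because each row is compared only with its predecessor, a run of rows each within 30 seconds of the previous one forms a single group. This matches the "Chris's first scenario" interpretation above, and MIN(ts) gives the requested starting timestamp per group.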