SQL SUM and GROUP BY based on WHERE clause

I'm running PostgreSQL 9.4 and have the following table structure for invoicing:
id BIGINT, time UNIX_TIMESTAMP, customer TEXT, amount BIGINT, status TEXT, billing_id TEXT
I hope I can explain my challenge correctly.
An invoice record can have one of three statuses: begin, ongoing, or done.
Several invoice records can be part of the same invoice over time.
When an invoice period begins, a record is created with status begin.
Then, every 6 hours, a new record with status ongoing is generated, containing the amount spent so far in amount.
When an invoice is closed, a record with status done is generated with the total amount spent in column amount. All the invoice records within the same invoice contain the same billing_id.
To calculate a customer's current spending I can run the following:
SELECT sum(amount) FROM invoice_records where id = $1 and time between '2017-06-01' and '2017-07-01' and status = 'done'
But that does not take into account an ongoing invoice that is not closed yet.
How can I also count the largest billing_id with no status done?
Hope it makes sense.

Per invoice (i.e. billing_id) you want the amount of the record with status = 'done' if such exists or of the last record with status = 'ongoing'. You can use PostgreSQL's DISTINCT ON for this (or use standard SQL's ROW_NUMBER to rank the records per invoice).
SELECT DISTINCT ON (billing_id) billing_id, amount
FROM invoice_records
WHERE status IN ('done', 'ongoing', 'begin')
ORDER BY
    billing_id,
    CASE status WHEN 'done' THEN 1 WHEN 'ongoing' THEN 2 ELSE 3 END,
    time DESC;
The ORDER BY clause represents the ranking.
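The ROW_NUMBER alternative mentioned above can be sketched as a small runnable example. This is only an illustration under assumptions: it uses SQLite (3.25 or later, for window functions) through Python's sqlite3 module instead of PostgreSQL, and the sample rows are made up. Per billing_id it keeps the 'done' record if one exists, otherwise the latest 'ongoing' record, and then sums the amounts.

```python
import sqlite3

# Hypothetical sample rows; column names follow the table in the question.
rows = [
    # (id, time, customer, amount, status, billing_id)
    (1, 100, "c1", 0, "begin", "b1"),
    (2, 200, "c1", 40, "ongoing", "b1"),
    (3, 300, "c1", 90, "done", "b1"),     # closed invoice: 90 counts
    (4, 400, "c1", 0, "begin", "b2"),
    (5, 500, "c1", 25, "ongoing", "b2"),
    (6, 600, "c1", 35, "ongoing", "b2"),  # open invoice: latest ongoing counts
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE invoice_records "
    "(id INTEGER, time INTEGER, customer TEXT, amount INTEGER, "
    "status TEXT, billing_id TEXT)"
)
conn.executemany("INSERT INTO invoice_records VALUES (?, ?, ?, ?, ?, ?)", rows)

# Rank the records of each invoice: 'done' first, then 'ongoing', newest first.
query = """
SELECT billing_id, amount
FROM (
    SELECT billing_id, amount,
           ROW_NUMBER() OVER (
               PARTITION BY billing_id
               ORDER BY CASE status WHEN 'done' THEN 1
                                    WHEN 'ongoing' THEN 2
                                    ELSE 3 END,
                        time DESC
           ) AS rn
    FROM invoice_records
) AS ranked
WHERE rn = 1
ORDER BY billing_id
"""
per_invoice = conn.execute(query).fetchall()
total = sum(amount for _, amount in per_invoice)
print(per_invoice, total)
```

In a real query the customer and time-range filters from the question would go into the inner SELECT.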

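select sum(amount), id
from (
    -- per invoice, prefer 'done' over 'ongoing' ('done' sorts first)
    select distinct on (billing_id) *
    from (
        -- per status and invoice, keep the most recent record
        select distinct on (status, billing_id) *
        from invoice_records
        where
            id = $1
            and time between '2017-06-01' and '2017-07-01'
            and status in ('done', 'ongoing')
        order by status, billing_id desc, time desc
    ) s
    order by billing_id desc, status
) s
group by id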

Related

SQL query to add column that counts number of encounters for the past year from each encounter

I am trying to identify High-Usage status for customers, that is, at the time of each order, how many orders the customer placed in the last year. Each customer has a unique ID and each order has a unique ID, with a date/time stamp at the time of the order. This is not just adding a count column, but a conditional count. I can recreate this in Excel using SUMPRODUCT, but wanted to see if I can automate the process in SSMS before my pull.
I tried a subquery column and then doing a join on a subquery result:
SELECT (*)
,HU_CUSTOMER_YOY
FROM data
LEFT JOIN (SELECT MAX(ORDER_ID) AS ORDER_ID
,COUNT(CUSTOMER_ID) AS HU_CUSTOMER_YOY
FROM data AS CUS_HU
WHERE ORDER_DTTM > DATEADD(YEAR, -1, ORDER_DTTM)
GROUP BY CUS_HU.CUSTOMER_ID)
CUSHU on CUSHU.ORDER_ID = data.ORDER_ID
This pulls in a value ONLY on the most recent order and counts ALL previous orders. To reiterate, I need a value on EACH unique order to count every order for that customer for the previous year from that unique order. My issue is using the DTTM column. If I use a static date like getdate(), it will count but I need the count for the DTTM-1year on EACH order to view historical data, i.e., when a customer began and fell-off High-Usage status, what contributed to the change, etc.
This is for a rather large dataset that is refreshed daily. I would prefer to not have the main query be aggregated, if possible, which is why I thought creating and joining a reference table would be preferred.
Is this possible?
Adding expected query results:
customer_id | order_id | HU_count | order_dttm
c1 | c1-1 | 0 | 1/1/2020
c1 | c1-2 | 1 | 7/1/2020
c1 | c1-3 | 0 | 1/1/2022
c1 | c1-4 | 1 | 1/10/2022
c2 | c2-1 | 0 | 1/11/2022
c1 | c1-5 | 2 | 1/14/2022
c2 | c2-2 | 1 | 1/15/2022

I assume you are using SQL Server from your usage of the DATEADD function.
Based on my understanding of the requirements, this shows, for each order, the number of orders the same customer placed in the preceding year.
SELECT
    d.customer_id
    ,d.order_id
    ,d.order_dttm
    -- count this customer's earlier orders within one year before this order
    ,(SELECT COUNT(*)
      FROM data AS p
      WHERE p.customer_id = d.customer_id
        AND p.order_dttm < d.order_dttm
        AND p.order_dttm >= DATEADD(YEAR, -1, d.order_dttm)) AS HU_count
FROM data AS d
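To check a per-order count against the expected results listed in the question, here is a minimal runnable sketch. It uses SQLite through Python's sqlite3 module instead of SQL Server, and ISO-formatted dates, both assumptions made only to keep the example self-contained. For each order, a correlated subquery counts the same customer's earlier orders that fall within the preceding year.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE data (customer_id TEXT, order_id TEXT, order_dttm TEXT)")
# Rows from the question's expected-results table, with dates rewritten as ISO strings.
conn.executemany(
    "INSERT INTO data VALUES (?, ?, ?)",
    [
        ("c1", "c1-1", "2020-01-01"),
        ("c1", "c1-2", "2020-07-01"),
        ("c1", "c1-3", "2022-01-01"),
        ("c1", "c1-4", "2022-01-10"),
        ("c2", "c2-1", "2022-01-11"),
        ("c1", "c1-5", "2022-01-14"),
        ("c2", "c2-2", "2022-01-15"),
    ],
)

# date(x, '-1 year') is SQLite's counterpart of DATEADD(YEAR, -1, x).
query = """
SELECT d.order_id,
       (SELECT COUNT(*)
        FROM data AS p
        WHERE p.customer_id = d.customer_id
          AND p.order_dttm < d.order_dttm
          AND p.order_dttm >= date(d.order_dttm, '-1 year')) AS hu_count
FROM data AS d
ORDER BY d.order_dttm
"""
hu_counts = dict(conn.execute(query).fetchall())
print(hu_counts)
```

On a large table, an index on (customer_id, order_dttm) would likely matter for this kind of per-row lookup.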

Multiple sum subqueries for percentage

I need help with the following problem: I want to write a query that computes multiple sums and then uses those sums to get a percentage: percentage = s1/(s1+s2).
I have as input the following data:
Orders shipping date, Nb of orders that have arrived late, Nb of orders that have arrived on time
What I want as output: The percentage of orders that have arrived late and orders that have arrived on time.
I want another column in the table that will have the percentage using SQL.
Concrete example:
* On 2022/01/04 10:00 AM I have 3 orders late and 4 orders on time => 7 orders in total. Percentage = 3/7 (late), 4/7 (on time).
* At 2022/01/04 11:00 AM I have 5 orders late and 6 orders on time => 11 orders in total, but this entry is summed with the previous entry, so: 5+3 orders late, 4+6 orders on time, 18 orders in total => percentage = 8/18 late, 10/18 on time.
To add the "LATE" order counts of previous entries to the current entry's count, I wrote the following SQL:
(sum1=s1)
SELECT s1.EventDate, (
SELECT SUM(s2.NbOfOrders)
FROM OrderShipmentStats s2
WHERE s2.EventDate <= s1.EventDate AND s2.Status='LATE'
) AS cnt
FROM OrderShipmentStats s1
GROUP BY s1.EventDate, s1.Status
The same kind of SQL was written for "On Time" and it works. What I need now is to combine the values from the two queries and, depending on whether the status is late or on time, compute s1/(s1+s2) or s2/(s1+s2).
My problem is that I do not know how to write this formula in a single query using those 2 subqueries. Any help would be great.
Picture with Table
Above there is the link with the picture containing how the table looks(I am new so I am not allowed to embed a photo).
The percentage column is the one I will add and there are lines pointing towards how that is calculated.

I created the table based on your image and added a few rows to it.
The query shows the total order count per hour, the running total per status, and the running grand total, as in your image.
The query looks like:
create table OrderShipmentsStats
(
    EventDate datetime not null,
    Status varchar(10) not null,
    OrdersCount int not null
)
insert into OrderShipmentsStats
values
    ('2022-01-04T10:00:00','Late',3),
    ('2022-01-04T10:00:00','On Time',4),
    ('2022-01-04T11:00:00','Late',5),
    ('2022-01-04T11:00:00','On Time',6),
    ('2022-01-04T12:00:00','Late',1),
    ('2022-01-04T12:00:00','On Time',2)
SELECT
    EventDate,
    Status,
    OrdersCount,
    TotalPerHour,
    StatusTotal,
    GrandStatusTotal,
    -- multiplying by 1.0 forces decimal division, so the result is a fraction such as 0.45 or 0.123
    -- to get the actual percent, such as 15% or 50%, multiply by 100
    cast(1.0 * o.StatusTotal / o.GrandStatusTotal as decimal(5,3)) * 100 as Percentage
from
(
    select
        EventDate,
        Status,
        OrdersCount,
        TotalPerHour,
        StatusTotal,
        SUM(TotalPerHour) over (partition by Status order by EventDate asc) as GrandStatusTotal
    from
    (
        select
            EventDate,
            Status,
            OrdersCount,
            Sum(OrdersCount) over (partition by EventDate order by EventDate asc) as TotalPerHour,
            SUM(OrdersCount) over (partition by Status order by EventDate asc) as StatusTotal
        from OrderShipmentsStats
    ) as t
) as o
order by EventDate, Status
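As a quick check of the running-percentage idea against the numbers in the question (3/7 late at 10:00, 8/18 late at 11:00), here is a minimal sketch. It assumes SQLite 3.25 or later through Python's sqlite3 module rather than SQL Server, and it uses two window sums in a single SELECT instead of nested subqueries, which is a simplification and not the query above. With ORDER BY EventDate and no PARTITION BY, the default window frame includes all rows sharing the same EventDate, so both statuses of an hour see the same running grand total.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE OrderShipmentsStats "
    "(EventDate TEXT, Status TEXT, OrdersCount INTEGER)"
)
conn.executemany(
    "INSERT INTO OrderShipmentsStats VALUES (?, ?, ?)",
    [
        ("2022-01-04T10:00:00", "Late", 3),
        ("2022-01-04T10:00:00", "On Time", 4),
        ("2022-01-04T11:00:00", "Late", 5),
        ("2022-01-04T11:00:00", "On Time", 6),
    ],
)

query = """
SELECT EventDate,
       Status,
       -- running total for this status up to and including this hour
       SUM(OrdersCount) OVER (PARTITION BY Status ORDER BY EventDate) AS StatusTotal,
       -- running total over all statuses up to and including this hour
       SUM(OrdersCount) OVER (ORDER BY EventDate) AS GrandTotal
FROM OrderShipmentsStats
ORDER BY EventDate, Status
"""
shares = {
    (event_date, status): 100.0 * status_total / grand_total
    for event_date, status, status_total, grand_total in conn.execute(query)
}
for key, percent in shares.items():
    print(key, round(percent, 1))
```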

How show the last status of a mobile number and old data in the same row ? using SQL

I work at a telecom, and part of my work is to check the latest status of a specific mobile number along with its last de-active status. It's easy to get the active number by using the condition ACTIVE in the statement, but it's not easy to pick the last de-active status, because each number might have more than one de-active status, or only one ACTIVE status. I use EXP_DATE as an indicator for the last de-active status. I want to show both the new data and the old data in one row, but I'm struggling with that. Below are my table and my expected result:
my expected result
The queries that I use on a daily basis:
select * from test where exp_date > sysdate; -- to get the active numbers
select * from test where exp_date < sysdate; -- to get the de-active numbers

You just need an outer join between one subquery containing the ACTIVE records and one containing the latest DE-ACTIVE record, as follows:
SELECT A.MSISDN,
A.NAME,
A.SUB_STATUS,
A.CREATED_DATE,
A.EXP_DATE,
D.MSISDN AS MSISDN_,
D.NAME AS OLD_NAME,
D.SUB_STATUS OLD_STATUS,
D.CREATED_DATE AS OLD_CREATED_DATE,
D.EXP_DATE AS OLD_EXP_DATE
FROM
(SELECT * FROM TEST
WHERE EXP_DATE > SYSDATE
AND SUB_STATUS = 'ACTIVE') A -- ACTIVE RECORD
-- USE CONDITION TO FETCH ACTIVE RECORD AS PER YOUR REQUIREMENT
FULL OUTER JOIN
(SELECT * FROM
(SELECT T.*,
ROW_NUMBER() OVER (PARTITION BY T.MSISDN ORDER BY EXP_DATE DESC NULLS LAST) AS RN
FROM TEST T
WHERE T.EXP_DATE < SYSDATE
AND T.SUB_STATUS='DE-ACTIVE')
-- USE CONDITION TO FETCH DEACTIVE RECORD AS PER YOUR REQUIREMENT
WHERE RN = 1
) D
ON (A.MSISDN = D.MSISDN)
Cheers!!

Here is an overview of how to do this: one query to get a distinct list of all the phone numbers, left joined to a list of the most recent active record for each phone number, left joined to a list of the most recent de-active record for each phone number.

How about conditional aggregation?
select msisdn,
       max(case when sub_status = 'DE-ACTIVE' then created_date end) as deactive_date,
       max(case when sub_status = 'ACTIVE' then exp_date end) as active_date
from test
group by msisdn

sql count/sum the number of calls until a specific date in another column

I have data that shows customer calls. I have columns for customer number, phone number (one customer can have many), the date of each voice call, and the duration of the call. The table looks like the example below.
CusID | PhoneNum | Date | Duration
20111 | 43576233 | 20.01.2016-14:00 | 00:10:12
20111 | 44498228 | 14.01.2016-15:30 | 00:05:12
20112 | 43898983 | 14.01.2016-15:30 |
What I want is to count the number of call attempts for each number before it is answered (Duration is > 0), so that I can estimate how many times on average I should call to reach a customer or phone number. It should basically count the rows per phone number before min(Date) where duration is > 0.
SELECT Phone, Min(Date) FROM XX WHERE Duration IS NOT NULL GROUP BY Phone
I think this should give me the time limit up to which I should count the number of calls. I could not figure out how to finish the rest of the job.
EDIT- I will add an example
And the result should only count row number 5 since it is the call before the customer is reached for the first time. So resulted table should be like :

Your first step is valid:
SELECT
CusID
,PhoneNum
,MIN(Date) AS MinDate
FROM XX
WHERE Duration IS NOT NULL
GROUP BY CusID, PhoneNum
This gives you one row per PhoneNum with the date of the first successful call.
Now join this to original table and leave only those rows that have a prior date (per PhoneNum). Group it by PhoneNum again and count. The join should be LEFT JOIN to have a row with zero count for numbers that were answered on the first attempt.
WITH
CTE
AS
(
    SELECT
        CusID
        ,PhoneNum
        ,MIN(Date) AS MinDate
    FROM XX
    WHERE Duration IS NOT NULL
    GROUP BY CusID, PhoneNum
)
SELECT
    CTE.CusID
    ,CTE.PhoneNum
    ,COUNT(XX.PhoneNum) AS Count
FROM
    CTE
    LEFT JOIN XX
        ON XX.PhoneNum = CTE.PhoneNum
        AND XX.Date < CTE.MinDate
GROUP BY CTE.CusID, CTE.PhoneNum
;
If a number was never answered, it will not be included in the result set at all.
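The join-and-count approach described above can be tried out with a small runnable sketch. The assumptions are labeled: it runs on SQLite through Python's sqlite3 module, the dates are rewritten as sortable ISO-style strings, the unanswered calls are invented sample rows, and the CTE name FirstAnswered is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE XX (CusID INTEGER, PhoneNum TEXT, Date TEXT, Duration TEXT)")
# Hypothetical sample calls; Duration is NULL for unanswered attempts.
conn.executemany(
    "INSERT INTO XX VALUES (?, ?, ?, ?)",
    [
        (20111, "43576233", "2016-01-14 09:00", None),
        (20111, "43576233", "2016-01-15 10:00", None),
        (20111, "43576233", "2016-01-20 14:00", "00:10:12"),
        (20111, "44498228", "2016-01-14 15:30", "00:05:12"),  # answered first time
        (20112, "43898983", "2016-01-14 15:30", None),        # never answered
    ],
)

query = """
WITH FirstAnswered AS (
    -- first answered call per phone number
    SELECT CusID, PhoneNum, MIN(Date) AS MinDate
    FROM XX
    WHERE Duration IS NOT NULL
    GROUP BY CusID, PhoneNum
)
SELECT f.CusID, f.PhoneNum, COUNT(XX.PhoneNum) AS FailedAttempts
FROM FirstAnswered AS f
LEFT JOIN XX
       ON XX.PhoneNum = f.PhoneNum
      AND XX.Date < f.MinDate
GROUP BY f.CusID, f.PhoneNum
ORDER BY f.PhoneNum
"""
failed_attempts = conn.execute(query).fetchall()
print(failed_attempts)
```

COUNT(XX.PhoneNum) counts only matched rows, which is why a number answered on the first attempt shows 0 rather than 1.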

Please try this query:
SELECT phonecalls.CusID, COUNT(0) AS failedcalls, phonecalls.phonenumber, success.firstsuccess FROM phonecalls,
(SELECT min(Date) AS firstsuccess, CusID, phonenumber FROM phonecalls WHERE Duration IS NOT NULL GROUP BY CusID, phonenumber) success
WHERE phonecalls.CusID = success.CusID AND phonecalls.phonenumber = success.phonenumber AND phonecalls.Date < success.firstsuccess
GROUP BY phonecalls.CusID, phonecalls.phonenumber, success.firstsuccess;
I've not tested it...
Note: users who have not had a successful call are not listed. Is this OK, or do you need them listed as well? If so, you need a left join:
SELECT phonecalls.CusID, COUNT(0) AS failedcalls, phonecalls.phonenumber, success.firstsuccess FROM phonecalls LEFT JOIN
(SELECT min(Date) AS firstsuccess, CusID, phonenumber FROM phonecalls WHERE Duration IS NOT NULL GROUP BY CusID, phonenumber) success ON
phonecalls.CusID = success.CusID AND phonecalls.phonenumber = success.phonenumber AND phonecalls.Date < success.firstsuccess
GROUP BY phonecalls.CusID, phonecalls.phonenumber, success.firstsuccess;

In SQL Server 2012+, you can use the following logic:
Assign the number of "unanswered" calls to each row in the data. This uses conditional aggregation with a window function.
Then, take the maximum of the count for answered calls for each user.
Count the number of answered calls.
The ratio is the average.
This ignores strings of unanswered calls not followed by an answered call.
The resulting query:
select phone, max(cume_unanswered), count(*) as num_answered,
max(cume_unanswered) * 1.0 / count(*) as ratio
from (select t.*,
sum(case when duration is null then 1 else 0 end) over (partition by phone order by date) as cume_unanswered
from t
) t
where duration is not null
group by phone;

SQL - Different sum levels in one select with where clause

I have 2 tables. One has the original amount, which remains static. The second table has a list of partial amounts applied over time against the original amount in the first table.
DB Tables:
***memotable***
ID [primary, unique]
Amount (Original Amount)
***transtable***
ID [many IDs in transtable to single ID in memotable]
AmountUsed (amount applied)
ApplyDate (date applied)
I would like to find, in a single select, the ID, amount used since last week (ApplyDate > 2011-04-21), amount used to date.
The only rows that should appear in the result are those where an amount has been used since last week (ApplyDate > 2011-04-21).
I'm stuck on trying to get the sum for the amount used to date, since that needs to include AmountUsed values that are outside of when ApplyDate > 2011-04-21.

It is possible to avoid subselects in this case:
SELECT
    ID,
    AmountUsedSinceLastWeek = SUM(CASE WHEN ApplyDate > '4/21/2011' THEN AmountUsed END),
    AmountUsedToDate = SUM(AmountUsed)
FROM TransTable
GROUP BY ID
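The question also asks that only IDs with usage since last week appear in the result. One way to add that to the conditional-aggregation form is a HAVING clause on the conditional sum, sketched below. This is an illustration under assumptions: it runs on SQLite through Python's sqlite3 module rather than SQL Server, uses the standard AS alias syntax and ISO date strings, and the sample rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE TransTable (ID INTEGER, AmountUsed INTEGER, ApplyDate TEXT)")
# Hypothetical sample rows.
conn.executemany(
    "INSERT INTO TransTable VALUES (?, ?, ?)",
    [
        (1, 10, "2011-04-01"),
        (1, 5, "2011-04-25"),
        (2, 7, "2011-03-15"),   # no usage since last week: should be filtered out
        (3, 4, "2011-04-22"),
        (3, 6, "2011-04-26"),
    ],
)

query = """
SELECT ID,
       SUM(CASE WHEN ApplyDate > '2011-04-21' THEN AmountUsed END) AS AmountUsedSinceLastWeek,
       SUM(AmountUsed) AS AmountUsedToDate
FROM TransTable
GROUP BY ID
-- the conditional sum is NULL when no row falls in the recent period
HAVING SUM(CASE WHEN ApplyDate > '2011-04-21' THEN AmountUsed END) IS NOT NULL
ORDER BY ID
"""
usage = conn.execute(query).fetchall()
print(usage)
```

The WHERE clause cannot be used for this filter, because it would also drop the older rows that the to-date total needs.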

Since you want to limit it to rows that happened since last week, but also want to include the total to date, I think the most efficient method would be to use sub-selects...
SELECT
lastWeek.ID,
lastWeek.AmountUsedSinceLastWeek,
toDate.AmountUsedToDate
FROM
(
SELECT
ID,
SUM(AmountUsed) AS AmountUsedSinceLastWeek
FROM TransTable
WHERE ApplyDate > '4/21/2011'
GROUP BY ID
) lastWeek JOIN
(
SELECT
ID,
SUM(AmountUsed) AS AmountUsedToDate
FROM TransTable
GROUP BY ID
) toDate ON lastWeek.ID = toDate.ID
