Counting current items by month - sql

I'm trying to build a monthly tally of active equipment from a database log table, grouped by service area. I think I'm 90% of the way there: I have a list of months along with the total number of items that existed, grouped by region.
However, I also need to know the state of each item as it was on the first of each month, and this is the part I'm stuck on. For instance, Item 1 is in Region A in January but moves to Region B in February. Item 2 is marked as 'inactive' in February, so it shouldn't be counted. My existing query will always count Item 1 in Region A, and Item 2 as 'active'.
I can correctly show that Item 3 is deleted in March, and that Item 4 doesn't show up until the April count. I realize I'm getting the first values because my query specifies the min date; I'm just not sure how to change it to get what I want.
I think I'm looking for a way to group by Max(OperationDate) for each Month.
The Table looks like this:
| EQUIPID | EQUIPNAME | EQUIPACTIVE | DISTRICT | REGION | OPERATIONDATE | OPERATION |
|---------|-----------|-------------|----------|--------|----------------------|-----------|
| 1 | Item 1 | 1 | 1 | A | 2015-01-01T00:00:00Z | INS |
| 2 | Item 2 | 1 | 1 | A | 2015-01-01T00:00:00Z | INS |
| 3 | Item 3 | 1 | 1 | A | 2015-01-01T00:00:00Z | INS |
| 2 | Item 2 | 0 | 1 | A | 2015-02-10T00:00:00Z | UPD |
| 1 | Item 1 | 1 | 1 | B | 2015-02-15T00:00:00Z | UPD |
| 3 | (null) | (null) | (null) | (null) | 2015-02-21T00:00:00Z | DEL |
| 1 | Item 1 | 1 | 1 | A | 2015-03-01T00:00:00Z | UPD |
| 4 | Item 4 | 1 | 1 | B | 2015-03-10T00:00:00Z | INS |
There is also a subtable that holds attributes I care about. Its structure is similar. Unfortunately, due to previous design decisions, there is no correlation between operations in the two tables. Any joins will need to be done using the EquipmentID, with the overlapping states matched up for each date.
Current query:
--cte to build date list
WITH calendar (dt) AS
  (SELECT &fromdate FROM dual
   UNION ALL
   SELECT Add_Months(dt, 1)
   FROM calendar
   WHERE dt < &todate)
SELECT dt, a.district, a.region, COUNT(*)
FROM
  (SELECT Equipment_Log.EQUIPID, DISTRICT, REGION, OPERATION,
          MIN(OPERATIONDATE) AS FirstOp, Deleted.DelDate
   FROM Equipment_Log
   LEFT JOIN
     (SELECT EQUIPID, MAX(OPERATIONDATE) AS DelDate
      FROM Equipment_Log
      WHERE OPERATION = 'DEL'
      GROUP BY EQUIPID
     ) Deleted
   ON Equipment_Log.EQUIPID = Deleted.EQUIPID
   WHERE OPERATION <> 'DEL' --AND additional unimportant filters
   GROUP BY Equipment_Log.EQUIPID, DISTRICT, REGION, OPERATION, DelDate
  ) a
INNER JOIN calendar
  ON (calendar.dt >= FirstOp AND calendar.dt < DelDate)
  OR (calendar.dt >= FirstOp AND DelDate IS NULL)
LEFT JOIN
  (SELECT EQUIPID, MAX(OPERATIONDATE) AS LatestOp
   FROM SpecialEquip_Table_Log
   --where SpecialEquip filters
   GROUP BY EQUIPID
  ) SpecialEquip
  ON a.EQUIPID = SpecialEquip.EQUIPID AND calendar.dt >= SpecialEquip.LatestOp
GROUP BY dt, district, region
ORDER BY dt, district, region

Take only the last operation for each id in each month; that is what row_number() and where rn = 1 do.
Then we have the calendar and the data, so we make a partitioned outer join.
I assumed that you need to fill in values for months where entries for an id are missing, so nvl(lag() ignore nulls) is needed: if something appeared in January it still exists in February and March, and we need the district, region values from the last non-empty row.
Now you have everything needed to make the count. The part where you mentioned SpecialEquip_Table_Log is up to you; you left-joined that table and never used it afterwards, so what is it for? Join it in if you need it - you have the id.
db<>fiddle
with
  calendar(mth) as (
    select date '2015-01-01' from dual union all
    select add_months(mth, 1) from calendar where mth < date '2015-05-01'),
  data as (
    select id, dis, reg, dt, op, act
    from (
      select equipid id, district dis, region reg,
             to_char(operationdate, 'yyyy-mm') dt,
             row_number()
               over (partition by equipid, trunc(operationdate, 'month')
                     order by operationdate desc) rn,
             operation op, nvl(equipactive, 0) act
      from t)
    where rn = 1 )
select mth, dis, reg, sum(act) cnt
from (
  select id, mth,
         nvl(dis, lag(dis) ignore nulls over (partition by id order by mth)) dis,
         nvl(reg, lag(reg) ignore nulls over (partition by id order by mth)) reg,
         nvl(act, lag(act) ignore nulls over (partition by id order by mth)) act
  from calendar
  left join data partition by (id) on dt = to_char(mth, 'yyyy-mm') )
group by mth, dis, reg
having sum(act) > 0
order by mth, dis, reg
It may seem complicated, so please run the subqueries separately at first to see what is going on. And test :) Hope this helps.
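For instance, to see what the deduplicated monthly snapshot looks like on its own, the data subquery can be run by itself (this reuses the CTE above verbatim; t is the log table from the query):

with data as (
  select id, dis, reg, dt, op, act
  from (
    select equipid id, district dis, region reg,
           to_char(operationdate, 'yyyy-mm') dt,
           row_number()
             over (partition by equipid, trunc(operationdate, 'month')
                   order by operationdate desc) rn,
           operation op, nvl(equipactive, 0) act
    from t)
  where rn = 1)
select * from data order by id, dt;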

Related

Running Total OVER clause, but Select Distinct instead of sum?

I have the following data set:
| EMAIL | SIGNUP_DATE |
|-----------|-------------|
| A#ABC.COM | 1/1/2021 |
| B#ABC.COM | 1/2/2021 |
| C#ABC.COM | 1/3/2021 |
In order to find the running total of email signups as of a certain day, I ran the following sql query:
select
    signup_date,
    count(email) OVER (order by signup_date ASC) as running_total_signups
from signups
I got the following results:
| SIGNUP_DATE | RUNNING_TOTAL_SIGNUPS |
|-------------|-----------------------|
| 1/1/21 | 1 |
| 1/2/21 | 2 |
| 1/3/21 | 3 |
However, for my next step I want to see not just the running total of signups, but the actual signup emails themselves. I want to run the same window function (count(email) OVER (order by signup_date ASC)), but instead of count(email), a select distinct email. This would hopefully result in the following output:
| SIGNUP_DATE | RUNNING_TOTAL_SIGNUPS |
|-------------|-----------------------|
| 1/1/21 | a#abc.com |
| 1/2/21 | a#abc.com |
| 1/2/21 | b#abc.com |
| 1/3/21 | a#abc.com |
| 1/3/21 | b#abc.com |
| 1/3/21 | c#abc.com |
How would I do this? I'm getting an error on this code:
select
signup_date,
distinct email OVER (order by signup_date ASC) as running_total_signups
One way would be to cross-join the results and keep the joined rows whose total is less than or equal to the running total:
with counts as (
select *,
Count(*) over (order by SIGNUP_DATE asc) as tot
from t
)
select c1.EMAIL, c1.SIGNUP_DATE
from counts c1
cross join counts c2
where c2.tot <= c1.tot
I want to run the same window function (count(email) OVER (order by signup_date ASC)) but instead of a count(email) just a select distinct email
Why do you want a COUNT() window function?
It has nothing to do with your requirement.
All you need is a simple self join:
SELECT t1.SIGNUP_DATE, t2.EMAIL
FROM tablename t1 INNER JOIN tablename t2
ON t2.SIGNUP_DATE <= t1.SIGNUP_DATE
ORDER BY t1.SIGNUP_DATE, t2.EMAIL;
which will work for your sample data, but just in case there are more than 1 rows for each day in your table you should use:
SELECT t1.SIGNUP_DATE, t2.EMAIL
FROM (SELECT DISTINCT SIGNUP_DATE FROM tablename) t1 INNER JOIN tablename t2
ON t2.SIGNUP_DATE <= t1.SIGNUP_DATE
ORDER BY t1.SIGNUP_DATE, t2.EMAIL;
See the demo.
It's actually slightly simpler than Stu proposed:
select
x2.signup_date,
x1.email
from
signups x1
INNER JOIN signups x2 ON x1.signup_date <= x2.signup_date
order by signup_date
If you join the table to itself on any date less than or equal to the current one, you get a half-cartesian explosion. The lowest-dated row matches only itself; the next matches itself and the earlier row, so one of the table aliases has its data repeated. This continues, adding more rows to the explosion as the dates increase. In the resultset for the sample data we can see that we want the emails from x1 and the dates from x2:

| X1.EMAIL | X1.SIGNUP_DATE | X2.EMAIL | X2.SIGNUP_DATE |
|-----------|----------------|-----------|----------------|
| A#ABC.COM | 1/1/2021 | A#ABC.COM | 1/1/2021 |
| A#ABC.COM | 1/1/2021 | B#ABC.COM | 1/2/2021 |
| B#ABC.COM | 1/2/2021 | B#ABC.COM | 1/2/2021 |
| A#ABC.COM | 1/1/2021 | C#ABC.COM | 1/3/2021 |
| B#ABC.COM | 1/2/2021 | C#ABC.COM | 1/3/2021 |
| C#ABC.COM | 1/3/2021 | C#ABC.COM | 1/3/2021 |

How to calculate occurrence depending on months/years

My table looks like this:
| ID | Start | End |
|----|------------|------------|
| 1 | 2010-01-02 | 2010-01-04 |
| 1 | 2010-01-22 | 2010-01-24 |
| 1 | 2011-01-31 | 2011-02-02 |
| 2 | 2012-05-02 | 2012-05-08 |
| 3 | 2013-01-02 | 2013-01-03 |
| 4 | 2010-09-15 | 2010-09-20 |
| 4 | 2010-09-30 | 2010-10-05 |
I'm looking for a way to count the number of occurrences for each ID per year and month.
Importantly, if a record's End date falls in the month after its Start date (within the same year), the occurrence should be counted for both months [e.g. ID 1 in the 3rd row has a situation like that, so the occurrence for this ID should be +1 for January and +1 for February].
So I'd like to have it this way:
| Year | Month | Id | Occurrence |
|------|-------|----|------------|
| 2010 | 01 | 1 | 2 |
| 2010 | 09 | 4 | 2 |
| 2010 | 10 | 4 | 1 |
| 2011 | 01 | 1 | 1 |
| 2011 | 02 | 1 | 1 |
| 2012 | 05 | 2 | 1 |
| 2013 | 01 | 3 | 1 |
So far I have only created this:
CREATE TABLE IF NOT EXISTS counts AS
(SELECT
id,
YEAR (CAST(Start AS DATE)) AS Year_St,
MONTH (CAST(Start AS DATE)) AS Month_St,
YEAR (CAST(End AS DATE)) AS Year_End,
MONTH (CAST(End AS DATE)) AS Month_End
FROM source)
And I don't know how to proceed from there. I'd appreciate your help.
I'm using Spark SQL.
Try the following strategy to achieve this:
Note:
I have created a few intermediate tables. If you wish, you can use sub-queries or CTEs instead, depending on your permissions.
I have taken care of the 2 scenarios you mentioned (whether to count an occurrence once or twice), as you explained.
Query:
Firstly, create a table with a flag that records whether the start and end dates fall in the same year and month (1 means yes, 2 means no):
/* Creating a table with flags whether to count the occurrences once or twice */
CREATE TABLE flagged as
(
SELECT *,
CASE
WHEN Year_st = Year_end and Month_st = Month_end then 1
WHEN Year_st = Year_end and Month_st <> Month_end then 2
Else 0
end as flag
FROM
(
SELECT
id,
YEAR (CAST(Start AS DATE)) AS Year_St,
MONTH (CAST(Start AS DATE)) AS Month_St,
YEAR (CAST(End AS DATE)) AS Year_End,
MONTH (CAST(End AS DATE)) AS Month_End
FROM source
) as calc
)
The flag in the above table will now be 1 if the year and month are the same for start and end, and 2 if the month differs. You can add more flag categories if you have more scenarios.
Secondly, count the occurrences for flag 1. Since year and month are the same for flag 1 rows, we can take either; I have taken the start:
/* Counting occurrences only for flag 1 */
CREATE TABLE flg1 as (
SELECT distinct id, year_st, month_st, count(*) as occurrence
FROM flagged
where flag=1
GROUP BY id, year_st, month_st
)
Similarly, count the occurrences for flag 2. Since the month differs between the two dates, we UNION them before counting to get both dates into the same columns:
/* Counting occurrences only for flag 2 */
CREATE TABLE flg2 as
(
SELECT distinct id, year_dt, month_dt, count(*) as occurrence
FROM
(
select ID, year_st as year_dt, month_st as month_dt FROM flagged where flag=2
UNION
SELECT ID, year_end as year_dt, month_end as month_dt FROM flagged where flag=2
) as unioned
GROUP BY id, year_dt, month_dt
)
Finally, we just have to SUM the occurrences from both flags. Note that we use UNION ALL here to combine the two tables; this is very important because we need to keep duplicates:
/* UNIONING both the final tables and summing the occurrences */
SELECT distinct year, month, id, SUM(occurrence) as occurrence
FROM
(
SELECT distinct id, year_st as year, month_st as month, occurrence
FROM flg1
UNION ALL
SELECT distinct id, year_dt as year, month_dt as month, occurrence
FROM flg2
) as fin_unioned
GROUP BY id, year, month
ORDER BY year, month, id, occurrence desc
The output of the above query will be your expected output. I know this is not optimized, yet it works perfectly. I will update if I come across a more optimized strategy. Comment if you have questions.
db<>fiddle link here
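If intermediate tables are not an option, here is a hedged sketch folding the same steps into CTEs in Spark SQL, combining the flag-1 and flag-2 paths into one UNION ALL: the start month always counts, and the end month is added only when it differs. (Unlike the flag approach, this also counts cross-year ranges; adjust the WHERE if you want to exclude them.)

WITH flagged AS (
  SELECT
    id,
    YEAR(CAST(Start AS DATE))  AS year_st,
    MONTH(CAST(Start AS DATE)) AS month_st,
    YEAR(CAST(End AS DATE))    AS year_end,
    MONTH(CAST(End AS DATE))   AS month_end
  FROM source
),
months AS (
  -- the start month always counts
  SELECT id, year_st AS year, month_st AS month FROM flagged
  UNION ALL
  -- the end month counts only when it differs from the start month
  SELECT id, year_end, month_end FROM flagged
  WHERE NOT (year_st = year_end AND month_st = month_end)
)
SELECT year, month, id, COUNT(*) AS occurrence
FROM months
GROUP BY year, month, id
ORDER BY year, month, id;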
Not sure if this works in Spark SQL.
But if the ranges aren't bigger than 1 month, then just add the extras to the count via a UNION ALL, where the extras are the rows whose end falls in a later month than their start.
SELECT YearOcc, MonthOcc, Id
, COUNT(*) as Occurrence
FROM
(
SELECT Id
, YEAR(CAST(Start AS DATE)) as YearOcc
, MONTH(CAST(Start AS DATE)) as MonthOcc
FROM source
UNION ALL
SELECT Id
, YEAR(CAST(End AS DATE)) as YearOcc
, MONTH(CAST(End AS DATE)) as MonthOcc
FROM source
WHERE MONTH(CAST(Start AS DATE)) < MONTH(CAST(End AS DATE))
) q
GROUP BY YearOcc, MonthOcc, Id
ORDER BY YearOcc, MonthOcc, Id
YearOcc | MonthOcc | Id | Occurrence
------: | -------: | -: | ---------:
2010 | 1 | 1 | 2
2010 | 9 | 4 | 2
2010 | 10 | 4 | 1
2011 | 1 | 1 | 1
2011 | 2 | 1 | 1
2012 | 5 | 2 | 1
2013 | 1 | 3 | 1
db<>fiddle here

How to create this query in BigQuery (retail dataset)

I have a table of user retail transactions, including sales and cancellations. If Qty is positive it is a sale; if negative, a cancellation. I want to attach each cancellation to the most appropriate sale. So I have a table like this:
| CustomerId | StockId | Qty | Date |
|--------------+-----------+-------+------------|
| 1 | 100 | 50 | 2020-01-01 |
| 1 | 100 | -10 | 2020-01-10 |
| 1 | 100 | 60 | 2020-02-10 |
| 1 | 100 | -20 | 2020-02-10 |
| 1 | 100 | 200 | 2020-03-01 |
| 1 | 100 | 10 | 2020-03-05 |
| 1 | 100 | -90 | 2020-03-10 |
User 1 has the following actions: buy 50 -> return 10 -> buy 60 -> return 20 -> buy 200 -> buy 10 -> return 90. For each cancel row (negative Qty) I want to find the previous row (by Date) with a positive Qty greater than the cancelled Qty.
So I need to create a BigQuery query that builds a table like this:
| CustomerId | StockId | Qty | Date | CancelQty |
|--------------+-----------+-------+------------+-------------|
| 1 | 100 | 50 | 2020-01-01 | -10 |
| 1 | 100 | 60 | 2020-02-10 | -20 |
| 1 | 100 | 200 | 2020-03-01 | -90 |
| 1 | 100 | 10 | 2020-03-05 | 0 |
Can anybody help me with this query? I have created one candidate query (split cancels and sales, join them, and do some cleanup to remove rows), but it works incorrectly in the above case.
I use BigQuery, so any BQ SQL features could be applied.
Any ideas will be helpful.
You can use the following query.
;WITH result AS (
select t1.*,t2.Qty as cQty,t2.Date as Date_t2 from
(select *,ROW_NUMBER() OVER (ORDER BY qty DESC) AS [ROW NUMBER] from Test) t1
join
(select *,ROW_NUMBER() OVER (ORDER BY qty) AS [ROW NUMBER] from Test) t2
on t1.[ROW NUMBER] = t2.[ROW NUMBER]
)
select CustomerId,StockId,Qty,Date,ISNULL(cQty, 0) As CancelQty,Date_t2
from (select CustomerId,StockId,Qty,Date,case
when cQty < 0 then cQty
else NULL
end AS cQty,
case
when cQty < 0 then Date_t2
else NULL
end AS Date_t2 from result) t
where qty > 0
order by cQty desc
result: https://dbfiddle.uk
You can do this as a gaps-and-islands problem. Basically, add a grouping column to the rows based on a cumulative reverse count of negative values. Then within each group, choose the first row where the sum is positive. So:
select t.* except (cancelqty, grp),
       (case when min(case when cancelqty + qty >= 0 then date end) over (partition by customerid, grp) = date
             then cancelqty
             else 0
        end) as cancelqty
from (select t.*,
             min(qty) over (partition by customerid, grp) as cancelqty
      from (select t.*,
                   countif(qty < 0) over (partition by customerid order by date desc) as grp
            from transactions t
           ) t
     ) t;
Note: This works for the data you have provided. However, there may be complicated scenarios where this does not work. In fact, I don't think there is a simple optimal solution assuming that the returns are not connected to the original sales. I would suggest that you fix the data model so you record where the returns come from.
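For illustration only (hypothetical table and column names, not part of the question's schema), a data model in which each return references the sale it cancels makes the attribution a plain join instead of a heuristic:

-- Hypothetical schema: every row gets a key, and a return points at its sale.
CREATE TABLE transactions_v2 (
  tx_id          INT64,  -- unique per row
  CustomerId     INT64,
  StockId        INT64,
  Qty            INT64,  -- negative for returns
  Date           DATE,
  cancels_tx_id  INT64   -- NULL for sales; for a return, the tx_id of the sale
);

-- Attribution is then exact:
SELECT s.CustomerId, s.StockId, s.Qty, s.Date,
       IFNULL(SUM(r.Qty), 0) AS CancelQty
FROM transactions_v2 s
LEFT JOIN transactions_v2 r
  ON r.cancels_tx_id = s.tx_id
WHERE s.Qty > 0
GROUP BY s.CustomerId, s.StockId, s.Qty, s.Date;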
The query below seems to satisfy the conditions and the output mentioned. The solution is based on mapping the base table (t) to the corresponding cancelled-qty rows from the same table (t1).
First, a self join on CustomerId and StockId is done, since rows need to correspond to the same customer and product.
Additionally, we bring in only the cancelled transactions t1 that happened on or after the base row in table t (t.Dt <= t1.Dt); the t1.Qty < 0 clause ensures these have a negative qty.
Further, we cannot attribute a cancelled qty larger than the original qty, so we check that the positive qty is greater than or equal to the cancelled qty; negating the cancel qty makes the comparison easy: -(t1.Qty) <= t.Qty.
After the join, we are interested only in positive quantities, so a where clause filters the other rows from the base table: t.Qty > 0.
Now every sale row is joined to every cancelled row on or after its date. For example, the Qty 50 row could have all the cancelled rows mapped to it, but we only want the one that came immediately after, so we group the base rows and pick the earliest cancel date in the HAVING clause: HAVING IFNULL(t1.dt, '0') = MIN(IFNULL(t1.dt, '0')).
Finally, we get the rows we need; the last column can be excluded with an outer select query if required.
SELECT t.CustomerId,t.StockId,t.Qty,t.Dt,IFNULL(t1.Qty, 0) CancelQty
,t1.dt dt_t1
FROM tbl t
LEFT JOIN tbl t1 ON t.CustomerId=t1.CustomerId AND
t.StockId=t1.StockId
AND t.Dt<=t1.Dt AND t1.Qty<0 AND -(t1.Qty)<=t.Qty
WHERE t.Qty>0
GROUP BY 1,2,3,4
HAVING IFNULL(t1.dt, '0')=MIN(IFNULL(t1.dt, '0'))
ORDER BY 1,2,4,3
fiddle
Consider below approach
with sales as (
select * from `project.dataset.table` where Qty > 0
), cancels as (
select * from `project.dataset.table` where Qty < 0
)
select any_value(s).*,
       ifnull(array_agg(c.Qty ignore nulls order by c.Date limit 1)[safe_offset(0)], 0) as CancelQty
from sales s
left join cancels c
on s.CustomerId = c.CustomerId
and s.StockId = c.StockId
and s.Date <= c.Date
and s.Qty > abs(c.Qty)
group by format('%t', s)
If applied to the sample data in your question, the output matches the expected result shown above.

Calculating elapsed time in non-contiguous rows using SQL

I need to deduce uptime for servers using SQL with a table that looks as follows:
| Row | ID | Status | Timestamp |
|-----|----|--------|------------|
| 1 | A1 | UP | 1598451078 |
| 2 | A2 | UP | 1598457488 |
| 3 | A3 | UP | 1598457489 |
| 4 | A1 | DOWN | 1598458076 |
| 5 | A3 | DOWN | 1598461096 |
| 6 | A1 | UP | 1598466510 |
In this example, A1 went down on Wed, 26 Aug 2020 16:07:56 and came back up at Wed, 26 Aug 2020 18:28:30. This means I need to find the difference between rows 6 and 4 using the ID field and display it as an additional column named "Uptime".
I have found several answers that explain how to use aliases and inner joins to calculate the difference between contiguous rows (e.g. How to get difference between two rows for a column field?), but none that explains how to do so for non-contiguous rows.
For example, this piece of code from https://www.mysqltutorial.org/mysql-tips/mysql-compare-calculate-difference-successive-rows/ gives a possible solution, but I don't know how to adapt it to compare the rows based on the ID field:
SELECT
g1.item_no,
g1.counted_date from_date,
g2.counted_date to_date,
(g2.qty - g1.qty) AS receipt_qty
FROM
inventory g1
INNER JOIN
inventory g2 ON g2.id = g1.id + 1
WHERE
g1.item_no = 'A';
Any help would be much appreciated.
Basically, you need the total time minus the downtime.
If you want the different periods, you can use:
select id, status, max(timestamp), min(timestamp),
       max(timestamp) - min(timestamp)
from (select t.*,
             row_number() over (partition by id order by timestamp) as seqnum,
             row_number() over (partition by id, status order by timestamp) as seqnum2
      from t
     ) t
group by id, status, (seqnum - seqnum2);
However, for your purposes, for the total uptime:
select sum(coalesce(next_timestamp, max_uptimestamp) - timestamp)
from (select t.*,
             lag(status) over (partition by id order by timestamp) as prev_status,
             lead(timestamp) over (partition by id order by timestamp) as next_timestamp,
             max(case when status = 'UP' then timestamp end) over (partition by id) as max_uptimestamp
      from t
     ) t
where status = 'UP' and
      (prev_status = 'DOWN' or prev_status is null);
Basically, this counts all the time from the first UP to the next DOWN or to the last UP. It then sums that up.
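For instance, with the sample data (and the windows partitioned by id): A1's first UP at 1598451078 runs until the DOWN at 1598458076, contributing 1598458076 - 1598451078 = 6998 seconds of uptime; the second UP at 1598466510 is also A1's latest UP timestamp, so it contributes 0, giving A1 a total of 6998 seconds (roughly 1 hour 57 minutes).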

Postgres: Deleting rows that are duplicated in one column based on the conditions of another column

I have a PostgreSQL table that stores user details called users as shown below
| ID | user name | item | dos | Charge |
|----|-----------|------|------------|--------|
| 1 | Ed | 32 | 01-02-1987 | 1 |
| 2 | Taya | 01 | 05-07-1981 | -1 |
| 3 | Damian | 32 | 22-19-1990 | 1 |
| 2 | Taya | 01 | 05-07-1981 | 1 |
| 2 | Taya | 01 | 05-07-1981 | 1 |
| 1 | Ed | 32 | 01-02-1987 | -1 |
I want to delete rows that are the same across ID, user name, item, and dos, where the sum of their charges is 0. This means both row 1 and row 6 for Ed get deleted.
With more than 2 occurrences, if the sum of charge is 1, I want one pair of rows with charges -1 and 1 deleted, which means one row with charge 1 will be retained. E.g. row 2 and one of Taya's charge-1 rows will be deleted.
The output table that I am after is:
| ID | user name | item | dos | Charge |
|----|-----------|------|------------|--------|
| 3 | Damian | 32 | 22-19-1990 | 1 |
| 2 | Taya | 01 | 05-07-1981 | 1 |
Any ideas?
You want the having clause. This will get you the output you want:
select
id, user_name, item, dos, sum (charge)
from table
group by
id, user_name, item, dos
having
sum (charge) != 0
If you're really trying to delete the records that make it zero:
delete from table
where (id, user_name, item, dos) in (
select id, user_name, item, dos
from table
group by id, user_name, item, dos
having sum (charge) = 0
)
This does the same thing, and is quite a bit more code, but because it's using a semi-join it might be better for really large datasets:
with delete_me as (
select id, user_name, item, dos
from table
group by id, user_name, item, dos
having sum (charge) = 0
)
delete from table t
where exists (
select null
from delete_me d
where
t.id = d.id and
t.user_name = d.user_name and
t.item = d.item and
t.dos = d.dos
)
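The grouped deletes above handle whole groups that sum to 0, but not Taya's case, where the group sums to 1 and only the offsetting -1/+1 pair should go. A hedged sketch for that case, Postgres-specific because it uses ctid to target physical rows (column names follow the queries above; adjust to your actual schema):

with ranked as (
  select ctid, id, user_name, item, dos, charge,
         row_number() over (partition by id, user_name, item, dos, charge
                            order by ctid) as rn
  from users
)
delete from users u
using ranked pos, ranked neg
where pos.charge = 1
  and neg.charge = -1
  and pos.id = neg.id
  and pos.user_name = neg.user_name
  and pos.item = neg.item
  and pos.dos = neg.dos
  and pos.rn = neg.rn  -- pair the n-th +1 with the n-th -1 in each group
  and u.ctid in (pos.ctid, neg.ctid);

For Ed this removes the single +1/-1 pair (both rows); for Taya it removes the -1 and one +1, retaining the surplus +1 row, which matches the desired output.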