RowNumber() and SUM() in one query - sql

Is there some way to get the last record using ROW_NUMBER() and the SUM of one field (money in this case)?
I've tried to come up with a query like:
SELECT
[date]
,...
FROM
(
SELECT
CAST(t.timestamp AS DATE) AS [date]
,.../some fields/
,row_number() over (partition by ca.logical_number order by t.timestamp DESC) as rownumber --last update(record) transaction
--,amount_transferred =
--(
-- SELECT
-- ,SUM(t.money_value) AS amount_transferred
-- FROM
-- TO_Transaction t
-- GROUP BY
-- CAST(t.timestamp AS Date)
--)
) AS t
WHERE rownumber=1
The query is supposed to find the current purse balance and all the money transferred during a day.
Any help would be appreciated.
Thanks.

You can also use sum(field) over (...) in the same select as row_number():
select
row_number() over (partition by ca.logical_number order by t.timestamp DESC) as rownumber,
sum(amount_transfered) over (partition by ca.logical_number ) as total_amount_transfered
from ...
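A minimal sketch of combining both window functions in one query, run against SQLite through Python's built-in sqlite3 module (window functions need SQLite 3.25 or later). The table transactions, its columns and its rows are hypothetical; date(ts) stands in for CAST(t.timestamp AS DATE). Because the question asks for the money transferred during a day, both windows here are partitioned by card and date, which is an assumption beyond the answer's partition by ca.logical_number alone.

```python
import sqlite3

# Hypothetical schema: one row per transaction on a card (logical_number),
# storing the purse balance after that transaction.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE transactions (
    logical_number INTEGER,
    ts             TEXT,      -- 'YYYY-MM-DD HH:MM' timestamp
    money_value    INTEGER,
    balance        INTEGER
);
INSERT INTO transactions VALUES
    (1, '2024-01-01 09:00', 10, 10),
    (1, '2024-01-01 17:00',  5, 15),
    (1, '2024-01-02 08:00', 20, 35),
    (2, '2024-01-01 12:00',  7,  7);
""")

rows = conn.execute("""
SELECT day, logical_number, balance, amount_transferred
FROM (
    SELECT date(ts) AS day,
           logical_number,
           balance,
           -- rownumber = 1 marks the card's last transaction of that day
           ROW_NUMBER() OVER (PARTITION BY logical_number, date(ts)
                              ORDER BY ts DESC) AS rownumber,
           -- total moved by the card on that day (computed over all its rows)
           SUM(money_value) OVER (PARTITION BY logical_number, date(ts)) AS amount_transferred
    FROM transactions
) AS t
WHERE rownumber = 1
ORDER BY logical_number, day
""").fetchall()

for row in rows:
    print(row)
```

The window functions are evaluated inside the derived table, before the outer WHERE keeps only rownumber = 1, so the SUM still sees every transaction of the day.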

Related

Update bigquery value based on partition by row number

I have a table in which I have records on the wrong date. I want to update them to be the day before for "snapshot_date". I have written the query to select the values I want to update the date for, but I don't know how to write the update query to change it to the previous day.
Query to select problematic records
Select * FROM(
SELECT
*,
ROW_NUMBER() OVER(PARTITION BY Period, User_Struct) rn
FROM `XXX.YYY.TABLE`
where Snapshot_Date = '2021-10-04'
order by Period, User_Struct, Num_Active_Users asc
) where rn = 1
Using DATE_SUB you can get the previous day, e.g.
SELECT DATE_SUB(CAST('2021-10-04' AS DATE), INTERVAL 1 DAY)
returns 2021-10-03.
You may try the following, using BigQuery's UPDATE statement syntax:
UPDATE
`XXX.YYY.TABLE` t0
SET
t0.Snapshot_Date = DATE_SUB(t2.Snapshot_Date, INTERVAL 1 DAY)
FROM (
SELECT * FROM(
SELECT
*,
ROW_NUMBER() OVER(PARTITION BY Period, User_Struct) rn
FROM
`XXX.YYY.TABLE`
WHERE
Snapshot_Date = '2021-10-04'
ORDER BY -- recommend removing order by here and use recommendation below for row_number
Period, User_Struct, Num_Active_Users asc
) t1
WHERE rn = 1
) t2
WHERE
t0.Snapshot_Date = t2.Snapshot_Date AND -- complete with the columns that uniquely identify a row, so only the rn = 1 rows of the main table are matched
You should also specify how rows are ordered within each partition when using ROW_NUMBER; without an ORDER BY in the OVER clause, which row gets rn = 1 is nondeterministic. For example, if it gives the desired results, use:
ROW_NUMBER() OVER (PARTITION BY Period, User_Struct ORDER BY Num_Active_Users asc)
Let me know if this works for you.
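To illustrate the "join back to only the rn = 1 rows" step, here is a small sketch in SQLite (via Python's sqlite3, SQLite 3.25+ assumed) rather than BigQuery. The table name snapshots and its rows are hypothetical. SQLite's implicit rowid serves as the unique row identifier; BigQuery has no rowid, so there you would join on whatever columns uniquely identify a row. date(x, '-1 day') is SQLite's counterpart to DATE_SUB.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE snapshots (
    Period TEXT, User_Struct TEXT, Num_Active_Users INTEGER, Snapshot_Date TEXT
);
INSERT INTO snapshots VALUES
    ('P1', 'A', 5, '2021-10-04'),
    ('P1', 'A', 9, '2021-10-04'),
    ('P1', 'B', 3, '2021-10-04'),
    ('P1', 'B', 4, '2021-10-04');
""")

# Move the row with the fewest active users in each (Period, User_Struct)
# group back by one day. rowid pins down exactly which rows to update.
conn.execute("""
UPDATE snapshots
SET Snapshot_Date = date(Snapshot_Date, '-1 day')
WHERE rowid IN (
    SELECT rid FROM (
        SELECT rowid AS rid,
               ROW_NUMBER() OVER (PARTITION BY Period, User_Struct
                                  ORDER BY Num_Active_Users ASC) AS rn
        FROM snapshots
        WHERE Snapshot_Date = '2021-10-04'
    )
    WHERE rn = 1
)
""")

rows = conn.execute(
    "SELECT Period, User_Struct, Num_Active_Users, Snapshot_Date "
    "FROM snapshots ORDER BY User_Struct, Num_Active_Users"
).fetchall()
for row in rows:
    print(row)
```

Joining only on Snapshot_Date would match every row of that date; the unique identifier is what restricts the update to one row per partition.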

How to select the last record from a table considering Year and WorkingPeriod (Month)

I have a table like this:
I want the last [Status] for each [Guid], based on the latest [Year] and [WorkingPeriodTitle].
By the way, I know that [WorkingPeriodTitle] should be replaced by [WorkingPeriodId].
With the ROW_NUMBER() window function:
select
t.[PaymentAllocationGuid], t.[Status]
from (
select *,
row_number() over (partition by [PaymentAllocationGuid] order by [Year] desc, [WorkingPeriodTitle] desc) rn
from tablename
) t
where t.rn = 1
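Alternatively, with LAST_VALUE() (this returns every row, each annotated with the last status of its group):
SELECT *,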
LAST_VALUE(Status) OVER (PARTITION BY PaymentAllocationGuid ORDER BY Year,
WorkingPeriodTitle RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)
AS LastStatus
FROM tablexyz
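A sketch comparing the two answers in SQLite (Python's sqlite3, SQLite 3.25+ assumed). The rows are invented, and an integer WorkingPeriodId is used for ordering, as the asker suggests, on the assumption that sorting a text title may not be chronological. Since LAST_VALUE() keeps every row, DISTINCT collapses it to one row per guid; the explicit frame matters because the default frame stops at the current row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tablename (
    PaymentAllocationGuid TEXT, Year INTEGER, WorkingPeriodId INTEGER, Status TEXT
);
INSERT INTO tablename VALUES
    ('g1', 2019, 12, 'Paid'),
    ('g1', 2020,  1, 'Pending'),
    ('g1', 2020,  2, 'Approved'),
    ('g2', 2020,  5, 'Pending');
""")

# ROW_NUMBER(): newest (Year, period) gets rn = 1, one row per guid.
latest = conn.execute("""
SELECT PaymentAllocationGuid, Status
FROM (
    SELECT *,
           ROW_NUMBER() OVER (PARTITION BY PaymentAllocationGuid
                              ORDER BY Year DESC, WorkingPeriodId DESC) AS rn
    FROM tablename
) t
WHERE rn = 1
ORDER BY PaymentAllocationGuid
""").fetchall()

# LAST_VALUE(): the whole-partition frame lets every row see the last status;
# DISTINCT then reduces the result to one row per guid.
last_value = conn.execute("""
SELECT DISTINCT PaymentAllocationGuid,
       LAST_VALUE(Status) OVER (PARTITION BY PaymentAllocationGuid
                                ORDER BY Year, WorkingPeriodId
                                RANGE BETWEEN UNBOUNDED PRECEDING
                                          AND UNBOUNDED FOLLOWING) AS LastStatus
FROM tablename
ORDER BY PaymentAllocationGuid
""").fetchall()

print(latest)
print(last_value)
```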

Running count distinct

I am trying to see how the cumulative number of subscribers changed over time, based on unique email addresses and the date they were created. Below is an example of a table I am working with.
I am trying to turn it into the table below. Email 1#gmail.com was created twice and I would like to count it once. I cannot figure out how to generate the Running count distinct column.
Thanks for the help.
I would usually do this using row_number():
select date, count(*),
sum(count(*)) over (order by date),
sum(sum(case when seqnum = 1 then 1 else 0 end)) over (order by date)
from (select t.*,
row_number() over (partition by email order by date) as seqnum
from t
) t
group by date
order by date;
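The row_number() approach can be checked on a small made-up data set in SQLite (Python's sqlite3, SQLite 3.25+ assumed; the emails are placeholders). seqnum = 1 flags each email's first appearance, the inner SUM counts first appearances per date, and the outer SUM ... OVER turns that into a running total.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE t (date TEXT, email TEXT);
INSERT INTO t VALUES
    ('2021-01-01', 'a@example.com'),
    ('2021-01-01', 'b@example.com'),
    ('2021-01-02', 'a@example.com'),
    ('2021-01-03', 'c@example.com'),
    ('2021-01-03', 'b@example.com');
""")

rows = conn.execute("""
SELECT date,
       COUNT(*) AS day_total,
       SUM(COUNT(*)) OVER (ORDER BY date) AS cumulative,
       -- count only first appearances, then accumulate across dates
       SUM(SUM(CASE WHEN seqnum = 1 THEN 1 ELSE 0 END)) OVER (ORDER BY date) AS running_distinct
FROM (SELECT t.*,
             ROW_NUMBER() OVER (PARTITION BY email ORDER BY date) AS seqnum
      FROM t) t
GROUP BY date
ORDER BY date
""").fetchall()

for row in rows:
    print(row)
```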
This is similar to the version using lag(). However, I get nervous using lag if the same email appears multiple times on the same date.
Getting the total count and cumulative count is straightforward. To get the cumulative distinct count, use lag to check whether the email already had an earlier row; if it did, set the flag to 0 so that row is ignored by the running sum.
select distinct dt
,count(*) over(partition by dt) as day_total
,count(*) over(order by dt) as cumsum
,sum(flag) over(order by dt) as cumdist
from (select t.*
,case when lag(dt) over(partition by email order by dt) is not null then 0 else 1 end as flag
from tbl t
) t
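A sketch of the lag() version in SQLite (Python's sqlite3, SQLite 3.25+ assumed), with invented rows that include the same email twice on one date. That duplicate still gets flag 0, because LAG returns the other row's (equal) date, which is not NULL. The default RANGE frame of an ORDER BY window gives tied dates the same running totals, which is what lets DISTINCT collapse them to one row per date.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tbl (dt TEXT, email TEXT);
INSERT INTO tbl VALUES
    ('2021-01-01', 'a@example.com'),
    ('2021-01-01', 'a@example.com'),   -- same email twice on one date
    ('2021-01-02', 'b@example.com'),
    ('2021-01-03', 'a@example.com');
""")

rows = conn.execute("""
SELECT DISTINCT dt,
       COUNT(*)  OVER (PARTITION BY dt) AS day_total,
       COUNT(*)  OVER (ORDER BY dt)     AS cumsum,
       SUM(flag) OVER (ORDER BY dt)     AS cumdist
FROM (SELECT t.*,
             -- 1 only for the email's first row, 0 for any later row
             CASE WHEN LAG(dt) OVER (PARTITION BY email ORDER BY dt) IS NOT NULL
                  THEN 0 ELSE 1 END AS flag
      FROM tbl t) t
ORDER BY dt
""").fetchall()

for row in rows:
    print(row)
```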
Here is a solution that uses neither SUM() OVER nor LAG() and still produces the correct results, so it may be simpler to read and maintain.
select
t1.date_created,
(select count(*) from my_table where date_created = t1.date_created) emails_created,
(select count(*) from my_table where date_created <= t1.date_created) cumulative_sum,
(select count( distinct email) from my_table where date_created <= t1.date_created) running_count_distinct
from
(select distinct date_created from my_table) t1
order by 1
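The correlated-subquery version needs no window functions at all, so it also runs on older SQLite builds. A quick check with made-up rows through Python's sqlite3 follows. The trade-off is that, without suitable indexes, each subquery may rescan the table once per distinct date, whereas the window-function versions can get by with a single pass.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE my_table (date_created TEXT, email TEXT);
INSERT INTO my_table VALUES
    ('2021-01-01', 'a@example.com'),
    ('2021-01-01', 'b@example.com'),
    ('2021-01-02', 'a@example.com'),
    ('2021-01-03', 'c@example.com');
""")

rows = conn.execute("""
SELECT t1.date_created,
       (SELECT COUNT(*) FROM my_table
         WHERE date_created = t1.date_created)  AS emails_created,
       (SELECT COUNT(*) FROM my_table
         WHERE date_created <= t1.date_created) AS cumulative_sum,
       -- distinct emails seen on or before this date
       (SELECT COUNT(DISTINCT email) FROM my_table
         WHERE date_created <= t1.date_created) AS running_count_distinct
FROM (SELECT DISTINCT date_created FROM my_table) t1
ORDER BY 1
""").fetchall()

for row in rows:
    print(row)
```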

Group data by latest date per month

I store data on a daily basis in the following table
CREATE TABLE dbo.DemoTable
(
ReportDate DATE NOT NULL,
IdOne INT NOT NULL,
IdTwo INT NOT NULL,
NumberOfThings INT NOT NULL DEFAULT 0,
CONSTRAINT PK_DemoTable PRIMARY KEY NONCLUSTERED (ReportDate, IdOne, IdTwo)
)
I'd like to report on this but only pull out data (sum of NumberOfThings) for the latest date we have for each month.
Example data
INSERT INTO DemoTable
(ReportDate, IdOne, IdTwo, NumberOfThings)
VALUES
('2016-11-02',1,2,2), ('2016-11-02',1,3,2), ('2016-11-01',1,2,20), ('2016-11-01',1,3,20),
('2016-10-31',1,2,2), ('2016-10-31',1,3,2), ('2016-10-30',1,2,20), ('2016-10-30',1,3,20), ('2016-10-29',1,2,200), ('2016-10-29',1,3,200),
('2016-09-30',1,2,5), ('2016-09-30',1,3,5), ('2016-09-29',1,2,55), ('2016-09-29',1,3,55)
So for this data I want to see:
2016-11-02 | 4
2016-10-31 | 4
2016-09-30 | 10
Thanks
You can use RANK() to find the rows with the latest date in each month, and then sum them.
SELECT s.ReportDate,SUM(s.NumberOfThings)
FROM (
SELECT t.*,
RANK() OVER(PARTITION BY YEAR(t.ReportDate), MONTH(t.ReportDate) ORDER BY t.ReportDate DESC) as rnk
FROM DemoTable t) s
WHERE s.rnk = 1
GROUP BY s.ReportDate
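The RANK() answer can be checked against the question's example data in SQLite (Python's sqlite3, SQLite 3.25+ assumed). SQLite has no YEAR()/MONTH(), so strftime('%Y-%m', ReportDate) stands in as the month key. RANK() is the right choice here because all rows on the latest date tie at rank 1, whereas ROW_NUMBER() would keep only one of them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE DemoTable (
    ReportDate     TEXT    NOT NULL,
    IdOne          INTEGER NOT NULL,
    IdTwo          INTEGER NOT NULL,
    NumberOfThings INTEGER NOT NULL DEFAULT 0,
    PRIMARY KEY (ReportDate, IdOne, IdTwo)
);
INSERT INTO DemoTable (ReportDate, IdOne, IdTwo, NumberOfThings) VALUES
    ('2016-11-02',1,2,2), ('2016-11-02',1,3,2), ('2016-11-01',1,2,20), ('2016-11-01',1,3,20),
    ('2016-10-31',1,2,2), ('2016-10-31',1,3,2), ('2016-10-30',1,2,20), ('2016-10-30',1,3,20),
    ('2016-10-29',1,2,200), ('2016-10-29',1,3,200),
    ('2016-09-30',1,2,5), ('2016-09-30',1,3,5), ('2016-09-29',1,2,55), ('2016-09-29',1,3,55);
""")

rows = conn.execute("""
SELECT s.ReportDate, SUM(s.NumberOfThings) AS Total
FROM (SELECT t.*,
             -- every row on the month's latest date gets rank 1
             RANK() OVER (PARTITION BY strftime('%Y-%m', t.ReportDate)
                          ORDER BY t.ReportDate DESC) AS rnk
      FROM DemoTable t) s
WHERE s.rnk = 1
GROUP BY s.ReportDate
ORDER BY s.ReportDate DESC
""").fetchall()

for row in rows:
    print(row)
```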
You can use a query like this:
select ReportDate, sum(NumberofThings) as SumNumberofThings from DemoTable where ReportDate in
(
select max(ReportDate) MaxReportDate from DemoTable
group by datepart(yy,reportdate), datepart(m,reportdate)
)
group by ReportDate
A typical method involves row_number(). The only trick is using date functions to get the year and the month:
select dt.*
from (select dt.*,
row_number() over (partition by year(ReportDate), month(ReportDate)
order by ReportDate desc
) as seqnum
from DemoTable dt
) dt
where seqnum = 1;
If there are duplicates per date, you would just do the same thing with aggregation:
select dt.ReportDate, dt.NumberOfThings
from (select dt.ReportDate, sum(NumberOfThings) as NumberOfThings,
row_number() over (partition by year(ReportDate), month(ReportDate)
order by ReportDate desc
) as seqnum
from DemoTable dt
group by dt.ReportDate
) dt
where seqnum = 1;
Aggregate your data so as to get the sum per date. Then rank your records by date within month. Then pick the best ranked records.
SELECT
ReportDate,
SumNumberOfThings
FROM
(
SELECT
ReportDate,
ROW_NUMBER() OVER (PARTITION BY YEAR(ReportDate), MONTH(ReportDate)
ORDER BY ReportDate DESC) AS rn,
SUM(NumberOfThings) AS SumNumberOfThings
FROM DemoTable
GROUP BY ReportDate
) ranked
WHERE rn = 1
ORDER BY ReportDate;
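A sketch of the "aggregate first, then rank" approach on the question's data, in SQLite through Python's sqlite3 (SQLite 3.25+ assumed; strftime('%Y-%m', ...) replaces YEAR()/MONTH()). After GROUP BY ReportDate there is exactly one row per date, so ROW_NUMBER() can safely pick a single row per month, and that row already carries the day's sum.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE DemoTable (
    ReportDate TEXT, IdOne INTEGER, IdTwo INTEGER, NumberOfThings INTEGER)""")
conn.executemany(
    "INSERT INTO DemoTable VALUES (?, ?, ?, ?)",
    [('2016-11-02', 1, 2, 2), ('2016-11-02', 1, 3, 2),
     ('2016-11-01', 1, 2, 20), ('2016-11-01', 1, 3, 20),
     ('2016-10-31', 1, 2, 2), ('2016-10-31', 1, 3, 2),
     ('2016-10-30', 1, 2, 20), ('2016-10-30', 1, 3, 20),
     ('2016-10-29', 1, 2, 200), ('2016-10-29', 1, 3, 200),
     ('2016-09-30', 1, 2, 5), ('2016-09-30', 1, 3, 5),
     ('2016-09-29', 1, 2, 55), ('2016-09-29', 1, 3, 55)],
)

rows = conn.execute("""
SELECT ReportDate, SumNumberOfThings
FROM (
    SELECT ReportDate,
           -- one row per date after GROUP BY, so rn = 1 is the month's last date
           ROW_NUMBER() OVER (PARTITION BY strftime('%Y-%m', ReportDate)
                              ORDER BY ReportDate DESC) AS rn,
           SUM(NumberOfThings) AS SumNumberOfThings
    FROM DemoTable
    GROUP BY ReportDate
) ranked
WHERE rn = 1
ORDER BY ReportDate
""").fetchall()

for row in rows:
    print(row)
```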

How to add numbers to grouped columns that might repeat, in chronological order

Query:
CREATE TABLE #EmploymentLength
(
EmployeeID INT,
Date DATE,
DateFlag CHAR(1),
RowNumber INT
);
INSERT INTO #EmploymentLength
(
EmployeeID,
Date,
DateFlag
)
SELECT z.EmployeeID,
z.Date,
z.DateFlag
FROM (SELECT EmployeeId,
HireDate AS Date,
'H' AS DateFlag
FROM dbo.Employment
WHERE EmployeeId = 328195
AND HireDate IS NOT NULL
UNION
SELECT EmployeeId,
TerminationDate AS Date,
'T' AS DateFlag
FROM dbo.Employment
WHERE EmployeeId = 328195
AND TerminationDate IS NOT NULL) z;
SELECT *
FROM #EmploymentLength
ORDER BY Date;
Result:
I need this to end up like this:
After this is done, I can group by the RowNumber to get the MAX() and MIN() of each row number group (1, 2, 3...).
If the last two records were "T", both would get 4, and so on.
EDIT
To clarify, I need to group consecutive rows with the same DateFlag and give each group a number, in date order.
So in this example, you have 2 records that fall into the first group (group 1).
Then one record for group 2 (T)
Then one record for group 3 (H)
Then one record for group 4 (T)
You can do this with a difference of row_number() values to describe the group and then an additional dense_rank() to enumerate them. I think the following works:
select el.*, dense_rank() over (partition by EmployeeId order by grp)
from (select el.*,
(row_number() over (partition by EmployeeId order by date) -
row_number() over (partition by EmployeeId, DateFlag order by date)
) as grp
from #EmploymentLength el
) el;
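The row_number() difference can be inspected in SQLite through Python's sqlite3 (SQLite 3.25+ assumed). The question's result tables are missing, so the dates are invented and the flag sequence H, H, T, H, T is an assumption based on the EDIT's description; the table is named EmploymentLength because #-prefixed temp names are T-SQL only. grp does isolate each run, but its values (0, 0, 2, 1, 3 here) are not in chronological order, so dense_rank() over grp alone numbers the third row after the fourth.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE EmploymentLength (EmployeeID INTEGER, Date TEXT, DateFlag TEXT);
INSERT INTO EmploymentLength VALUES
    (328195, '2010-01-01', 'H'),
    (328195, '2012-06-01', 'H'),
    (328195, '2014-03-01', 'T'),
    (328195, '2015-05-01', 'H'),
    (328195, '2018-09-01', 'T');
""")

rows = conn.execute("""
SELECT Date, DateFlag, grp,
       DENSE_RANK() OVER (PARTITION BY EmployeeID ORDER BY grp) AS grp_number
FROM (SELECT el.*,
             -- constant within a run of consecutive rows with the same flag
             ROW_NUMBER() OVER (PARTITION BY EmployeeID ORDER BY Date)
           - ROW_NUMBER() OVER (PARTITION BY EmployeeID, DateFlag ORDER BY Date) AS grp
      FROM EmploymentLength el) el
ORDER BY Date
""").fetchall()

for row in rows:
    print(row)
```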
There are situations where the grp value might actually repeat for different groups within an employee. In that case, it is better to use the minimum date for each group for the enumeration:
select el.*, dense_rank() over (partition by EmployeeId order by grpdate)
from (select el.*, min(date) over (partition by EmployeeId, DateFlag, grp) as grpdate
from (select el.*,
(row_number() over (partition by EmployeeId order by date) -
row_number() over (partition by EmployeeId, DateFlag order by date)
) as grp
from #EmploymentLength el
) el
) el
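A sketch of the min-date variant in SQLite through Python's sqlite3 (SQLite 3.25+ assumed), using invented dates and an assumed H, H, T, H, T sequence; the table name EmploymentLength replaces the T-SQL temp table. Ranking each run by its first date yields 1, 1, 2, 3, 4 in date order. The final query then does what the question plans next: it groups by that number to get MIN() and MAX() per run.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE EmploymentLength (EmployeeID INTEGER, Date TEXT, DateFlag TEXT);
INSERT INTO EmploymentLength VALUES
    (328195, '2010-01-01', 'H'),
    (328195, '2012-06-01', 'H'),
    (328195, '2014-03-01', 'T'),
    (328195, '2015-05-01', 'H'),
    (328195, '2018-09-01', 'T');
""")

# Number each run of consecutive same-flag rows by the run's first date.
numbered_sql = """
SELECT el.*,
       DENSE_RANK() OVER (PARTITION BY EmployeeID ORDER BY grpdate) AS grp_number
FROM (SELECT el.*,
             MIN(Date) OVER (PARTITION BY EmployeeID, DateFlag, grp) AS grpdate
      FROM (SELECT el.*,
                   ROW_NUMBER() OVER (PARTITION BY EmployeeID ORDER BY Date)
                 - ROW_NUMBER() OVER (PARTITION BY EmployeeID, DateFlag ORDER BY Date) AS grp
            FROM EmploymentLength el) el) el
"""

rows = conn.execute(
    f"SELECT Date, DateFlag, grp_number FROM ({numbered_sql}) n ORDER BY Date"
).fetchall()

# One row per run with its first and last date.
periods = conn.execute(
    f"SELECT grp_number, DateFlag, MIN(Date), MAX(Date) FROM ({numbered_sql}) n "
    "GROUP BY grp_number, DateFlag ORDER BY grp_number"
).fetchall()

print(rows)
print(periods)
```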