Query Construction: Count, Case, Datediff, and Group By - sql

[Reframing prior question, which had been posed as a question about Cursors.]
I am looking for a way to select counts under certain date conditions.
Say there is a table, T1, with 2 fields (ID, Date). The ID is not a unique key. The table records events by id, and some ids occur frequently, some infrequently.
For example:
ID | Date
1 | 2010-01-01
2 | 2010-02-01
3 | 2010-02-15
2 | 2010-02-15
4 | 2010-03-01
I would like to create a new table with the following fields: ID, Date, Count of times ID appears in 6 months previous to Date, Count of times ID appears in 6 months after Date.
In essence, for every row in the existing table, I want to add a column that looks back for times the same ID has appeared in previous six months, and look ahead for times the same ID has appeared in following six months.
So the output for the example would hopefully look something like:
ID | Date | Lookback | Lookahead
1 | 2010-01-01 | 0 | 0
2 | 2010-02-01 | 0 | 1
3 | 2010-02-15 | 0 | 0
2 | 2010-02-15 | 1 | 0
4 | 2010-03-01 | 0 | 0
Is there a best way to formulate the appropriate query?

You can do this with a self join (Assuming you have a primary key of KeyID):
SELECT T.ID,
T.Date,
Lookback = COUNT(CASE WHEN t2.Date < T.Date THEN t2.ID END),
Lookahead = COUNT(CASE WHEN t2.Date > T.Date THEN t2.ID END)
FROM T
INNER JOIN T t2
ON t2.ID = t.ID
AND t2.Date >= DATEADD(MONTH, -6, T.Date)
AND T2.Date < DATEADD(MONTH, 6, T.Date)
GROUP BY T.ID, T.Date, T.KeyID;
The key is that it just joins all rows for the previous 6 months and the next 6 months, and counts the result. The COUNT(CASE WHEN... ensures that for the before column you are only counting the records before, and the after only the records after.
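To try the idea without a SQL Server instance, here is a minimal sketch that runs the same self join in SQLite through Python's sqlite3 module. The table name t1, the surrogate key key_id (standing in for KeyID), and the use of SQLite's date() modifier in place of DATEADD are assumptions made for this illustration, not part of the original answer.

```python
import sqlite3

# Hypothetical in-memory copy of the sample table; key_id stands in for KeyID.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t1 (key_id INTEGER PRIMARY KEY, id INTEGER, date TEXT)")
conn.executemany(
    "INSERT INTO t1 (id, date) VALUES (?, ?)",
    [(1, "2010-01-01"), (2, "2010-02-01"), (3, "2010-02-15"),
     (2, "2010-02-15"), (4, "2010-03-01")],
)

# Join every row to the rows of the same id inside the +/- 6 month window,
# then count the earlier and later matches separately.
query = """
SELECT t.id, t.date,
       COUNT(CASE WHEN t2.date < t.date THEN 1 END) AS lookback,
       COUNT(CASE WHEN t2.date > t.date THEN 1 END) AS lookahead
FROM t1 AS t
JOIN t1 AS t2
  ON t2.id = t.id
 AND t2.date >= date(t.date, '-6 months')
 AND t2.date <  date(t.date, '+6 months')
GROUP BY t.key_id, t.id, t.date
ORDER BY t.key_id
"""
lookaround_rows = conn.execute(query).fetchall()
for row in lookaround_rows:
    print(row)
```

Each row always joins to itself, so the inner join never drops a row; the CASE expressions are what keep the row itself out of both counts.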

Related

How to calculate occurrence depending on months/years

My table looks like that:
ID | Start | End
1 | 2010-01-02 | 2010-01-04
1 | 2010-01-22 | 2010-01-24
1 | 2011-01-31 | 2011-02-02
2 | 2012-05-02 | 2012-05-08
3 | 2013-01-02 | 2013-01-03
4 | 2010-09-15 | 2010-09-20
4 | 2010-09-30 | 2010-10-05
I'm looking for a way to count the number of occurrences for each ID per month of each year.
Importantly, if a record's End date falls in the month after its Start date (in the same year, of course), the occurrence should be counted for both months [e.g. ID 1 in the 3rd row is such a case, so its occurrence should be +1 for January and +1 for February].
So I'd like to have it in this way:
Year | Month | Id | Occurrence
2010 | 01 | 1 | 2
2010 | 09 | 4 | 2
2010 | 10 | 4 | 1
2011 | 01 | 1 | 1
2011 | 02 | 1 | 1
2012 | 05 | 2 | 1
2013 | 01 | 3 | 1
I created only this for now...
CREATE TABLE IF NOT EXISTS counts AS
(SELECT
id,
YEAR (CAST(Start AS DATE)) AS Year_St,
MONTH (CAST(Start AS DATE)) AS Month_St,
YEAR (CAST(End AS DATE)) AS Year_End,
MONTH (CAST(End AS DATE)) AS Month_End
FROM source)
And I don't know how to move with that further. I'd appreciate your help.
I'm using Spark SQL.
Try the following strategy to achieve this:
Note:
I have created a few intermediate tables. If you wish, you can use subqueries or CTEs instead, depending on your permissions.
I have handled the 2 scenarios you described (whether to count a record as 1 occurrence or 2 occurrences).
Query:
Firstly, creating a table with flags to decide whether start and end date are falling on same year and month (1 means YES, 2 means NO):
/* Creating a table with flags whether to count the occurrences once or twice */
CREATE TABLE flagged as
(
SELECT *,
CASE
WHEN Year_st = Year_end and Month_st = Month_end then 1
WHEN Year_st = Year_end and Month_st <> Month_end then 2
Else 0
end as flag
FROM
(
SELECT
id,
YEAR (CAST(Start AS DATE)) AS Year_St,
MONTH (CAST(Start AS DATE)) AS Month_St,
YEAR (CAST(End AS DATE)) AS Year_End,
MONTH (CAST(End AS DATE)) AS Month_End
FROM source
) as calc
)
The flag in the above table is 1 if the year and month are the same for start and end, 2 if the year is the same but the month differs, and 0 otherwise. You can add more flag categories if you have more scenarios.
Secondly, counting the occurrences for flag 1. As we know year and month are same for flag 1, we can take either of it. I have taken start:
/* Counting occurrences only for flag 1 */
CREATE TABLE flg1 as (
SELECT distinct id, year_st, month_st, count(*) as occurrence
FROM flagged
where flag=1
GROUP BY id, year_st, month_st
)
Similarly, counting the occurrences for flag 2. Since the month differs between the two dates, we stack the start and end months into one column before counting. UNION ALL is needed here rather than UNION, because UNION would collapse two records of the same ID that span the same pair of months into a single row:
/* Counting occurrences only for flag 2 */
CREATE TABLE flg2 as
(
SELECT distinct id, year_dt, month_dt, count(*) as occurrence
FROM
(
select ID, year_st as year_dt, month_st as month_dt FROM flagged where flag=2
UNION ALL
SELECT ID, year_end as year_dt, month_end as month_dt FROM flagged where flag=2
) as unioned
GROUP BY id, year_dt, month_dt
)
Finally, we just have to SUM the occurrences from both the flags. Note that we use UNION ALL here to combine both the tables. This is very important because we need to count duplicates as well:
/* UNIONING both the final tables and summing the occurrences */
SELECT distinct year, month, id, SUM(occurrence) as occurrence
FROM
(
SELECT distinct id, year_st as year, month_st as month, occurrence
FROM flg1
UNION ALL
SELECT distinct id, year_dt as year, month_dt as month, occurrence
FROM flg2
) as fin_unioned
GROUP BY id, year, month
ORDER BY year, month, id, occurrence desc
The output of the above query matches your expected output. I know this is not optimized, but it works. I will update if I come across a better strategy. Comment if you have questions.
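The remark about UNION ALL and duplicates can be checked with a small experiment. The sketch below is an assumption-laden illustration, not part of the answer: it builds a tiny flagged table in SQLite with two hypothetical records for ID 4 that both run from September into October 2010, then counts the stacked months once with UNION and once with UNION ALL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE flagged (id INTEGER, year_st INTEGER, month_st INTEGER, "
    "year_end INTEGER, month_end INTEGER, flag INTEGER)"
)
# Two hypothetical records of the same ID spanning the same pair of months.
conn.executemany(
    "INSERT INTO flagged VALUES (?, ?, ?, ?, ?, ?)",
    [(4, 2010, 9, 2010, 10, 2), (4, 2010, 9, 2010, 10, 2)],
)

def count_stacked_months(set_operator):
    """Stack start and end months with the given set operator, then count."""
    sql = f"""
    SELECT id, year_dt, month_dt, COUNT(*) AS occurrence
    FROM (
        SELECT id, year_st AS year_dt, month_st AS month_dt
        FROM flagged WHERE flag = 2
        {set_operator}
        SELECT id, year_end AS year_dt, month_end AS month_dt
        FROM flagged WHERE flag = 2
    ) AS unioned
    GROUP BY id, year_dt, month_dt
    ORDER BY year_dt, month_dt
    """
    return conn.execute(sql).fetchall()

with_union = count_stacked_months("UNION")
with_union_all = count_stacked_months("UNION ALL")
print(with_union)      # duplicates removed before counting
print(with_union_all)  # every record counted
```

With UNION each month ends up with a count of 1, because the duplicate rows are removed before COUNT(*) sees them; with UNION ALL each month is counted twice, once per record.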
Not sure if this works in Spark SQL.
But if no range crosses more than one month boundary, you can just add the extra occurrences to the count via a UNION ALL.
The extras are the rows whose end falls in a later month than the start (within the same year, as the question requires).
SELECT YearOcc, MonthOcc, Id
, COUNT(*) as Occurrence
FROM
(
SELECT Id
, YEAR(CAST(Start AS DATE)) as YearOcc
, MONTH(CAST(Start AS DATE)) as MonthOcc
FROM source
UNION ALL
SELECT Id
, YEAR(CAST(End AS DATE)) as YearOcc
, MONTH(CAST(End AS DATE)) as MonthOcc
FROM source
WHERE MONTH(CAST(Start AS DATE)) < MONTH(CAST(End AS DATE))
) q
GROUP BY YearOcc, MonthOcc, Id
ORDER BY YearOcc, MonthOcc, Id
YearOcc | MonthOcc | Id | Occurrence
------: | -------: | -: | ---------:
2010 | 1 | 1 | 2
2010 | 9 | 4 | 2
2010 | 10 | 4 | 1
2011 | 1 | 1 | 1
2011 | 2 | 1 | 1
2012 | 5 | 2 | 1
2013 | 1 | 3 | 1
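For anyone who wants to run this locally, here is a rough SQLite port driven from Python. It is a sketch under assumptions: the columns are renamed start_date and end_date (END is a keyword in SQLite), and strftime() replaces YEAR() and MONTH(), which SQLite does not have. The Spark SQL query above is otherwise unchanged in spirit.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Column names start_date / end_date are renamed for this illustration.
conn.execute("CREATE TABLE source (id INTEGER, start_date TEXT, end_date TEXT)")
conn.executemany(
    "INSERT INTO source VALUES (?, ?, ?)",
    [
        (1, "2010-01-02", "2010-01-04"),
        (1, "2010-01-22", "2010-01-24"),
        (1, "2011-01-31", "2011-02-02"),
        (2, "2012-05-02", "2012-05-08"),
        (3, "2013-01-02", "2013-01-03"),
        (4, "2010-09-15", "2010-09-20"),
        (4, "2010-09-30", "2010-10-05"),
    ],
)

# One row per start month, plus one extra row per record whose end
# falls in a later month than its start.
query = """
SELECT year_occ, month_occ, id, COUNT(*) AS occurrence
FROM (
    SELECT id,
           CAST(strftime('%Y', start_date) AS INTEGER) AS year_occ,
           CAST(strftime('%m', start_date) AS INTEGER) AS month_occ
    FROM source
    UNION ALL
    SELECT id,
           CAST(strftime('%Y', end_date) AS INTEGER),
           CAST(strftime('%m', end_date) AS INTEGER)
    FROM source
    WHERE strftime('%m', start_date) < strftime('%m', end_date)
) AS q
GROUP BY year_occ, month_occ, id
ORDER BY year_occ, month_occ, id
"""
monthly_counts = conn.execute(query).fetchall()
for row in monthly_counts:
    print(row)
```

The result has the same seven rows as the table above.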

Get the previous record based on conditions in Postgresql

I have two tables, the 1st contains transaction details and the 2nd contains user's orders :
id | transaction_date
1 | 2019-01-01
2 | 2019-02-01
3 | 2019-01-01
id | transaction_id | amount | user_id
15 | 1 | 7 | 1
20 | 2 | 15 | 1
25 | 3 | 25 | 1
For every order, I would also like to see the previous amount the same user paid, based on the transaction date:
user_id | amount | previous amount
1 | 7 | NULL
1 | 15 | 7
1 | 25 | 15
I tried multiple things, including the LAG function, but it doesn't seem to be possible with it because I have to join another table to get the transaction_date. I think I should use a subquery with a left join, but I can't figure out how to get only the previous order.
Thanks
This is a join and lag():
select t2.user_id, t2.amount,
lag(t2.amount) over (partition by t2.user_id order by t1.transaction_date) as prev_amount
from table1 t1 join
table2 t2
on t2.transaction_id = t1.id;
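A quick way to see the join and lag() working together is to run it in SQLite from Python (window functions need SQLite 3.25 or later). This is only a sketch: the table names table1 and table2 follow the answer, and the third transaction date is assumed to be 2019-03-01 here, because in the question's data transactions 1 and 3 share a date and their order would be a tie.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (id INTEGER PRIMARY KEY, transaction_date TEXT)")
conn.execute(
    "CREATE TABLE table2 (id INTEGER PRIMARY KEY, transaction_id INTEGER, "
    "amount INTEGER, user_id INTEGER)"
)
# Assumption: transaction 3 is dated 2019-03-01 to avoid a tie with transaction 1.
conn.executemany(
    "INSERT INTO table1 VALUES (?, ?)",
    [(1, "2019-01-01"), (2, "2019-02-01"), (3, "2019-03-01")],
)
conn.executemany(
    "INSERT INTO table2 VALUES (?, ?, ?, ?)",
    [(15, 1, 7, 1), (20, 2, 15, 1), (25, 3, 25, 1)],
)

# The join supplies the date; lag() then looks one row back per user.
query = """
SELECT t2.user_id, t2.amount,
       LAG(t2.amount) OVER (
           PARTITION BY t2.user_id ORDER BY t1.transaction_date
       ) AS prev_amount
FROM table1 AS t1
JOIN table2 AS t2 ON t2.transaction_id = t1.id
ORDER BY t2.user_id, t1.transaction_date
"""
orders_with_previous = conn.execute(query).fetchall()
for row in orders_with_previous:
    print(row)
```

If two transactions of one user can share a date, adding a tiebreaker such as t1.id to the window's ORDER BY would make the result deterministic.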

Aggregate rows between two rows with certain value

I'm trying to formulate a query to aggregate rows that are between rows with a specific value: in this example I want to collapse and sum time of all rows that have an ID other than 1, but still show rows with ID 1.
This is my table:
ID | Time
----+-----------
1 | 60
2 | 10
3 | 15
1 | 30
4 | 100
1 | 20
This is the result I'm looking for:
ID | Time
--------+-----------
1 | 60
Other | 25
1 | 30
Other | 100
1 | 20
I have attempted to SUM and add a condition with CASE, but so far my solutions only sum ALL rows and I lose the intervals, so I get this:
ID | Time
------------+-----------
Other | 125
1 | 110
Any help or suggestions in the right direction would be greatly appreciated, thanks!
You need to define the groupings. SQLite is not great for this sort of manipulation, but you can do it by summing the "1" values up to each value.
In SQLite, we can use the rowid column for the ordering:
select (case when id = 1 then '1' else 'other' end) as which,
sum(time)
from (select t.*,
(select count(*) from t t2 where t2.rowid <= t.rowid and t2.id = 1) as grp
from t
) t
group by (case when id = 1 then '1' else 'other' end), grp
order by grp, which;
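Since this is SQLite, the grouping trick is easy to try from Python's sqlite3 module. The sketch below assumes a table named events (a hypothetical name) whose rows were inserted in the order shown in the question, so that rowid reflects that order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER, time INTEGER)")
# Insertion order matters: rowid is what defines "between" here.
conn.executemany(
    "INSERT INTO events VALUES (?, ?)",
    [(1, 60), (2, 10), (3, 15), (1, 30), (4, 100), (1, 20)],
)

# grp = number of id-1 rows seen up to and including the current row,
# so every run of non-1 rows shares the grp of the id-1 row before it.
query = """
SELECT CASE WHEN id = 1 THEN '1' ELSE 'other' END AS which,
       SUM(time) AS total
FROM (
    SELECT e.id, e.time,
           (SELECT COUNT(*) FROM events AS e2
            WHERE e2.rowid <= e.rowid AND e2.id = 1) AS grp
    FROM events AS e
) AS g
GROUP BY which, grp
ORDER BY grp, which
"""
collapsed = conn.execute(query).fetchall()
for row in collapsed:
    print(row)
```

The running count of "1" rows acts as a block number, which is why grouping by it keeps the intervals apart instead of summing all the "other" rows together.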

How to fill missing dates by groups in a table in sql

I want to know how to use loops to fill in missing dates with the value zero, based on the start/end dates of each group, so that I have a consecutive time series in each group. I have two questions:
How do I loop over each group?
How do I use the start/end dates of each group to dynamically fill in the missing dates?
My input and expected output are listed as below.
Input: I have a table A like
date value grp_no
8/06/12 1 1
8/08/12 1 1
8/09/12 0 1
8/07/12 2 2
8/08/12 1 2
8/12/12 3 2
Also I have a table B which can be used to left join with A to fill in missing dates.
date
...
8/05/12
8/06/12
8/07/12
8/08/12
8/09/12
8/10/12
8/11/12
8/12/12
8/13/12
...
How can I use A and B to generate the following output in sql?
Output:
date value grp_no
8/06/12 1 1
8/07/12 0 1
8/08/12 1 1
8/09/12 0 1
8/07/12 2 2
8/08/12 1 2
8/09/12 0 2
8/10/12 0 2
8/11/12 0 2
8/12/12 3 2
Please send me your code and suggestion. Thank you so much in advance!!!
You can do it like this without loops
SELECT p.date, COALESCE(a.value, 0) value, p.grp_no
FROM
(
SELECT grp_no, date
FROM
(
SELECT grp_no, MIN(date) min_date, MAX(date) max_date
FROM tableA
GROUP BY grp_no
) q CROSS JOIN tableb b
WHERE b.date BETWEEN q.min_date AND q.max_date
) p LEFT JOIN TableA a
ON p.grp_no = a.grp_no
AND p.date = a.date
The innermost subquery grabs the min and max dates per group. The cross join with TableB then produces every date within each group's min-max range. Finally, the outer select left joins to TableA and fills the value column with 0 for dates that are missing from TableA.
Output:
| DATE | VALUE | GRP_NO |
|------------|-------|--------|
| 2012-08-06 | 1 | 1 |
| 2012-08-07 | 0 | 1 |
| 2012-08-08 | 1 | 1 |
| 2012-08-09 | 0 | 1 |
| 2012-08-07 | 2 | 2 |
| 2012-08-08 | 1 | 2 |
| 2012-08-09 | 0 | 2 |
| 2012-08-10 | 0 | 2 |
| 2012-08-11 | 0 | 2 |
| 2012-08-12 | 3 | 2 |
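The query can be reproduced in SQLite from Python as a rough check. In this sketch the dates are stored as ISO strings (an assumption, so that MIN, MAX and BETWEEN compare correctly as text), tableB is filled with the nine calendar days shown in the question, and an ORDER BY is added to make the row order explicit.

```python
import sqlite3
from datetime import date, timedelta

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tableA (date TEXT, value INTEGER, grp_no INTEGER)")
conn.execute("CREATE TABLE tableB (date TEXT)")
conn.executemany(
    "INSERT INTO tableA VALUES (?, ?, ?)",
    [
        ("2012-08-06", 1, 1), ("2012-08-08", 1, 1), ("2012-08-09", 0, 1),
        ("2012-08-07", 2, 2), ("2012-08-08", 1, 2), ("2012-08-12", 3, 2),
    ],
)
# Calendar table covering 2012-08-05 through 2012-08-13.
calendar = [(date(2012, 8, 5) + timedelta(days=i)).isoformat() for i in range(9)]
conn.executemany("INSERT INTO tableB VALUES (?)", [(d,) for d in calendar])

query = """
SELECT p.date, COALESCE(a.value, 0) AS value, p.grp_no
FROM (
    SELECT q.grp_no, b.date
    FROM (
        SELECT grp_no, MIN(date) AS min_date, MAX(date) AS max_date
        FROM tableA
        GROUP BY grp_no
    ) AS q
    CROSS JOIN tableB AS b
    WHERE b.date BETWEEN q.min_date AND q.max_date
) AS p
LEFT JOIN tableA AS a
  ON p.grp_no = a.grp_no AND p.date = a.date
ORDER BY p.grp_no, p.date
"""
filled = conn.execute(query).fetchall()
for row in filled:
    print(row)
```

Dates outside a group's own range (for example 2012-08-05 or 2012-08-13) never appear, because the BETWEEN filter is applied per group.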
I just needed the query to return all the dates in the period I wanted. Without the joins. Thought I'd share for those wanting to put them in your query. Just change the 365 to whatever timeframe you are wanting.
DECLARE @s DATE = GETDATE()-365, @e DATE = GETDATE();
SELECT TOP (DATEDIFF(DAY, @s, @e)+1)
DATEADD(DAY, ROW_NUMBER() OVER (ORDER BY number)-1, @s)
FROM [master].dbo.spt_values
WHERE [type] = N'P' ORDER BY number
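Outside SQL Server there is no spt_values table to lean on. One alternative, sketched here for SQLite via Python, is a recursive CTE that walks from the first day to the last. This swaps in a different technique from the one above; the function name date_series and its parameters are made up for this example, and it assumes the first day is not after the last day.

```python
import sqlite3

def date_series(first_day, last_day):
    """Return every ISO date from first_day to last_day inclusive.

    Assumes first_day <= last_day; the dates are generated inside SQLite.
    """
    conn = sqlite3.connect(":memory:")
    sql = """
    WITH RECURSIVE days(d) AS (
        SELECT date(:first_day)
        UNION ALL
        SELECT date(d, '+1 day') FROM days WHERE d < date(:last_day)
    )
    SELECT d FROM days
    """
    rows = conn.execute(sql, {"first_day": first_day, "last_day": last_day}).fetchall()
    conn.close()
    return [r[0] for r in rows]

series = date_series("2012-08-06", "2012-08-09")
print(series)
```

The resulting list can be loaded into a calendar table or joined directly, in the same role tableB plays in the question.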
The following query does a union with tableA and tableB. It then uses group by to merge the rows from tableA and tableB so that all of the dates from tableB are in the result. If a date is not in tableA, then the row has 0 for value and grp_no. Otherwise, the row has the actual values for value and grp_no.
select
dat,
sum(val),
sum(grp)
from
(
select
date as dat,
value as val,
grp_no as grp
from
tableA
union
select
date,
0,
0
from
tableB
where
date >= date '2012-08-06' and
date <= date '2012-08-13'
)
group by
dat
order by
dat
I find this query to be easier for me to understand. It also runs faster. It takes 16 seconds whereas a similar right join query takes 32 seconds.
This solution only works with numerical data.
This solution assumes a fixed date range. With some extra work this query can be adapted to limit the date range to what is found in tableA.

SQL Calculate Days between two dates in one table

I have a table dbo.Trans which contains an ID called bd_id (varchar), a transfer_date (datetime), and an identifier member_id. The primary key is trns_id, which is sequential.
Duplicates of bd_id and member_id exist in the table.
transfer_date |bd_id| member_id | trns_id
2008-01-01 00:00:00 | 432 | 111 | 1
2008-01-03 00:00:00 | 123 | 111 | 2
2008-01-08 00:00:00 | 128 | 111 | 3
2008-02-04 00:00:00 | 123 | 432 | 4
.......
For each member_id, I want to get the number of days between consecutive dates, and for each bd_id.
E.g., member 111 used 432 from 2008-01-01 until 2008-01-03, so the result should be 2.
The next would be 5.
I know the DATEDIFF() function exists but I am not sure how to get the difference when dates are in the same table.
Any help appreciated.
You could try something like this.
select T1.member_id,
datediff(day, T1.transfer_date, T3.transfer_date) as DD
from YourTable as T1
cross apply (select top 1 T2.transfer_date
from YourTable as T2
where T2.transfer_date > T1.transfer_date and
T2.member_id = T1.member_id
order by T2.transfer_date) as T3
You must select the 1st and 2nd records that you want, then take the DATEDIFF of their two dates. In SQL Server the function takes the date part as its first argument:
DATEDIFF(day, date1, date2);
Your problem is getting the next member date.
Here is an example using a correlated subquery to get the next date:
select t.*, datediff(day, t.transfer_date, nextdate) as Days_Between
from (select t.*,
(select min(transfer_date)
from trans t2
where t.bd_id = t2.bd_id and
t.member_id = t2.member_id and
t.transfer_date < t2.transfer_date
) as NextDate
from trans t
) t
SQL Server 2012 added the lead() and lag() functions, which make this a bit easier to express: lead() returns the next row's value directly.
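As a rough illustration of the window-function route, here is a sketch in SQLite via Python (SQLite 3.25 or later). It uses lead() to fetch each member's next transfer date and julianday() arithmetic in place of DATEDIFF, which SQLite lacks. Partitioning by member_id only is an assumption chosen to reproduce the 2 and 5 from the question; add bd_id to the PARTITION BY if the gap should be per bd_id as well.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE trans (trns_id INTEGER PRIMARY KEY, transfer_date TEXT, "
    "bd_id TEXT, member_id INTEGER)"
)
conn.executemany(
    "INSERT INTO trans VALUES (?, ?, ?, ?)",
    [
        (1, "2008-01-01 00:00:00", "432", 111),
        (2, "2008-01-03 00:00:00", "123", 111),
        (3, "2008-01-08 00:00:00", "128", 111),
        (4, "2008-02-04 00:00:00", "123", 432),
    ],
)

# lead() looks one row ahead within each member; the last row per member
# has no successor, so its gap is NULL (None in Python).
query = """
SELECT member_id, bd_id, transfer_date,
       CAST(
           julianday(LEAD(transfer_date) OVER (
               PARTITION BY member_id ORDER BY transfer_date
           )) - julianday(transfer_date)
       AS INTEGER) AS days_between
FROM trans
ORDER BY member_id, transfer_date
"""
gaps = conn.execute(query).fetchall()
for row in gaps:
    print(row)
```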