Selecting first element in Group by object Postgres - sql

I have the following table and I want to get the specific Amount per Loan_ID that corresponds to the earliest observation with dpd greater than or equal to 10 in each month.
Loan_ID date dpd Amount
1 1/1/2017 1 55
1 1/2/2017 2 100
1 1/3/2017 3 5000
1 1/4/2017 5 6000
1 1/5/2017 10 50000
1 1/6/2017 15 50001
1 1/9/2017 31 50004
1 1/10/2017 55 50005
1 1/11/2017 59 50006
1 1/12/2017 65 50007
1 1/13/2017 70 80000
1 1/20/2017 85 900000
1 1/29/2017 92 100000
1 1/30/2017 93 10000
2 1/1/2017 0 522
2 1/2/2017 8 5444
2 1/3/2017 12 8784
2 1/6/2017 15 6221
2 1/12/2017 18 2220
2 1/13/2017 20 177
2 1/29/2017 35 5151
2 1/30/2017 60 40000
2 1/31/2017 61 5500
The expected output:
Loan_ID Month Amount
1 1 50000
2 1 8784

SELECT DISTINCT ON ("Loan_ID", date_trunc('month', "date"))
"Loan_ID",
date_trunc('month', "date")::date as month,
"Amount"
FROM
loans
WHERE
dpd >= 10
ORDER BY
"Loan_ID",
date_trunc('month', "date"),
"date"
;
Returns:
Loan_ID month Amount
1 2017-01-01 50000
2 2017-01-01 8784
You can find test case in db<>fiddle
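To make the DISTINCT ON behaviour concrete, here is a minimal Python sketch of the same idea: filter on dpd, sort by loan and date, and keep the first row seen per (loan, month). The function name first_per_loan_month and the tuple layout are assumptions made for illustration, and the rows are a subset of the question's table.

```python
from datetime import date

# (loan_id, date, dpd, amount): a subset of the rows from the question.
rows = [
    (1, date(2017, 1, 4), 5, 6000),
    (1, date(2017, 1, 5), 10, 50000),
    (1, date(2017, 1, 6), 15, 50001),
    (2, date(2017, 1, 2), 8, 5444),
    (2, date(2017, 1, 3), 12, 8784),
    (2, date(2017, 1, 6), 15, 6221),
]

def first_per_loan_month(rows, min_dpd=10):
    """Emulate WHERE dpd >= min_dpd with DISTINCT ON (loan, month) ORDER BY date."""
    first = {}
    # Sorting by (loan_id, date) mirrors the ORDER BY clause; the first row
    # seen for each (loan_id, month) key is the one DISTINCT ON would keep.
    for loan_id, d, dpd, amount in sorted(rows, key=lambda r: (r[0], r[1])):
        if dpd < min_dpd:
            continue
        key = (loan_id, d.replace(day=1))  # first day of month, like date_trunc
        first.setdefault(key, amount)
    return [(loan, month, amount) for (loan, month), amount in sorted(first.items())]

result = first_per_loan_month(rows)
print(result)
```

The filter has to run before the "first row" selection, just as the WHERE clause is applied before DISTINCT ON in the query.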

Hmmm . . . if you want the amount per month and the first date that matches the condition, then you want conditional aggregation:
select loan_id, date_trunc('month', date) as mon,
sum(dpd),
min(case when dpd >= 10 then dpd end) as first_dpd_10
from t
group by loan_id, mon;
Edit: Based on your comment, you can use distinct on:
select distinct on (loan_id, date_trunc('month', date)) t.*
from t
where dpd >= 10
order by loan_id, date_trunc('month', date), date;

Related

Finding most recent startdate, and endDate from consecutive dates

I have a table like below:
user_id store_id stock date
116 2 0 2021-10-18
116 2 0 2021-10-19
116 2 0 2021-10-20
116 2 0 2021-08-16
116 2 0 2021-08-15
116 2 0 2021-07-04
116 2 0 2021-07-03
389 2 0 2021-07-02
389 2 0 2021-07-01
389 2 0 2021-10-27
52 6 0 2021-10-28
52 6 0 2021-10-29
52 6 0 2021-10-30
116 38 0 2021-05-02
116 38 0 2021-05-03
116 38 0 2021-05-04
116 38 0 2021-04-06
The table can have multiple consecutive days where a product ran out of stock, so I'd like to create a query with the last startDate and endDate where the product ran out of stock. For the table above, the results have to be:
user_Id store_id startDate endDate
116 2 2021-10-18 2021-10-20
116 38 2021-05-02 2021-05-04
389 2 2021-07-01 2021-07-02
52 6 2021-10-28 2021-10-30
I have tried the solution with row_number(), but it didn't work. Does someone have a tip or idea to solve this problem with SQL (PostgreSQL)?
Here is how you can do it:
select user_id, store_id, min(date) as startdate, max(date) as enddate
from (
  select *,
         -- rank the runs per user/store, most recent first
         rank() over (partition by user_id, store_id order by grp desc) as rn
  from (
    select *,
           -- consecutive dates share the same grp value
           date - row_number() over (partition by user_id, store_id order by date) * interval '1 day' as grp
    from tablename
  ) t
) t
where rn = 1
group by user_id, store_id, grp;
db<>fiddle here
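The key step in the query above is that a date minus its row number stays constant within a run of consecutive days. A minimal Python sketch of that idea follows, for a single user/store pair; the function name latest_streak is hypothetical, and the input is assumed to be non-empty.

```python
from datetime import date, timedelta
from itertools import groupby

def latest_streak(dates):
    """Return (start, end) of the most recent run of consecutive days."""
    ordered = sorted(set(dates))  # duplicates would break the position trick
    # date minus its position is constant within a run of consecutive days,
    # the same key the SQL builds with date - row_number().
    keyed = [(d - timedelta(days=i), d) for i, d in enumerate(ordered)]
    runs = [[d for _, d in grp] for _, grp in groupby(keyed, key=lambda p: p[0])]
    last = runs[-1]  # runs come out in ascending date order
    return last[0], last[-1]

# Dates for user_id 116, store_id 2 from the question.
dates_116_2 = [
    date(2021, 10, 18), date(2021, 10, 19), date(2021, 10, 20),
    date(2021, 8, 16), date(2021, 8, 15),
    date(2021, 7, 4), date(2021, 7, 3),
]
streak = latest_streak(dates_116_2)
print(streak)
```

Taking the last run here plays the role of rank() ... order by grp desc with rn = 1 in the query.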

How To Check If Value Is Decreasing Over Months SQLite

I have monthly revenue per account, and I want to view the earnings for each account in descending order, starting from the last decrease.
Here is the query:
SELECT account_id,
monthly_date,
earnings
FROM accounts_revenue
GROUP BY account_id,
monthly_date
The data looks something like this:
account_id monthly_date earnings
55 2017-01-01 2000
55 2017-02-01 1950
55 2017-10-01 2000
55 2018-02-01 1500
55 2018-05-01 1200
55 2018-12-01 3000
55 2019-01-01 900
55 2019-02-01 810
55 2019-04-01 1000
55 2019-05-01 600
55 2020-01-01 800
55 2020-02-01 100
122 2020-01-01 800
122 2020-02-01 100
The result should look like this:
account_id monthly_date earnings
55 2017-01-01 2000
55 2017-02-01 1950
55 2018-02-01 1500
55 2018-05-01 1200
55 2019-01-01 900
55 2019-02-01 810
55 2019-05-01 600
55 2020-02-01 100
122 2020-01-01 800
122 2020-02-01 100
Any idea how to achieve this?
Use NOT EXISTS:
SELECT ar1.*
FROM accounts_revenue ar1
WHERE NOT EXISTS (
SELECT 1
FROM accounts_revenue ar2
WHERE ar2.account_id = ar1.account_id
AND ar2.monthly_date < ar1.monthly_date
AND ar2.earnings <= ar1.earnings
)
ORDER BY ar1.account_id, ar1.monthly_date;
See the demo.
You can use the lag() window function and a CTE (or a subquery if you prefer) to filter out the rows you don't want:
WITH revenue AS
(SELECT account_id, monthly_date, earnings,
lag(earnings) OVER (PARTITION BY account_id ORDER BY monthly_date) AS prev_earnings
FROM accounts_revenue)
SELECT account_id, monthly_date, earnings
FROM revenue
WHERE earnings < prev_earnings OR prev_earnings IS NULL
ORDER BY account_id, monthly_date;
For efficiency, you'll want an index on accounts_revenue(account_id, monthly_date).
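The two answers in this thread apply slightly different rules: NOT EXISTS keeps a row only if it is lower than every earlier row, while lag() compares only with the immediately preceding row. The Python sketch below (function names are hypothetical) shows that both rules give the same result on the question's data for account 55 but can differ on other inputs.

```python
def below_previous(earnings):
    """lag() rule: keep values lower than the immediately preceding one."""
    kept, prev = [], None
    for e in earnings:
        if prev is None or e < prev:
            kept.append(e)
        prev = e
    return kept

def below_all_earlier(earnings):
    """NOT EXISTS rule: keep values lower than every earlier value."""
    kept, lowest = [], None
    for e in earnings:
        if lowest is None or e < lowest:
            kept.append(e)
            lowest = e
    return kept

# Earnings for account 55 in date order, taken from the question.
account_55 = [2000, 1950, 2000, 1500, 1200, 3000, 900, 810, 1000, 600, 800, 100]
print(below_previous(account_55))
print(below_all_earlier(account_55))

# A made-up series where the two rules disagree: 200 is lower than 300
# (the previous value) but not lower than 100 (an earlier value).
print(below_previous([100, 300, 200]), below_all_earlier([100, 300, 200]))
```

Which rule is right depends on what "decrease" should mean; the expected output in the question is consistent with both.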

How to group data weekly in column and hourly in row

I have data like the following:
ID SalesTime Qty Unit Price Item
1 01/01/2021 08:10:00 10 10 A
2 01/01/2021 11:30:00 2 9 B
3 01/01/2021 11:59:50 1 8 C
4 01/02/2021 13:00:00 5 15 D
5 01/03/2021 10:00:00 4 10 A
6 01/03/2021 12:00:00 5 9 B
7 01/03/2021 12:50:00 6 15 D
8 01/04/2021 10:50:00 5 8 C
9 01/04/2021 11:10:00 2 10 A
10 ............
I want to summarize the totals in a form like this, for example:
Mon Tue Wed Thu Fri Sat Sun
08:00~09:59 20 21 50 100 60 70 210
10:00~11:59 60 25 60 90 75 80 200
12:00~13:59 100 10 50 60 70 50 150
How can I do that in MS SQL? Thanks a lot.
You can extract the hour and divide by two for the rows. And then use conditional aggregation for the columns. Assuming you want the total of the price times quantity:
select convert(time, dateadd(hour, 2 * (datepart(hour, salestime) / 2), 0)) as hh,
sum(case when datename(weekday, salestime) = 'Monday' then qty * unit_price end) as mon,
sum(case when datename(weekday, salestime) = 'Tuesday' then qty * unit_price end) as tue,
. . .
from t
group by datepart(hour, salestime) / 2
order by datepart(hour, salestime) / 2;
Note: This just returns the beginning of the time period, rather than the full range.
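As a rough illustration of the bucketing and pivoting logic, here is a Python sketch that groups sales into two-hour rows and weekday columns and labels each row with its full range. It assumes the question's dates are in MM/DD/YYYY format and that the total is qty times unit price; pivot_sales and the tuple layout are names made up for this example.

```python
from collections import defaultdict
from datetime import datetime

DAYS = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]

# (sales_time, qty, unit_price): the first five rows of the question's data,
# read as MM/DD/YYYY (an assumption).
sales = [
    (datetime(2021, 1, 1, 8, 10, 0), 10, 10),
    (datetime(2021, 1, 1, 11, 30, 0), 2, 9),
    (datetime(2021, 1, 1, 11, 59, 50), 1, 8),
    (datetime(2021, 1, 2, 13, 0, 0), 5, 15),
    (datetime(2021, 1, 3, 10, 0, 0), 4, 10),
]

def pivot_sales(sales):
    """Return [(time_range_label, [totals for Mon..Sun])] sorted by time of day."""
    totals = defaultdict(lambda: [0] * 7)  # bucket start hour -> totals Mon..Sun
    for sold_at, qty, unit_price in sales:
        bucket = 2 * (sold_at.hour // 2)   # same integer division as the SQL
        totals[bucket][sold_at.weekday()] += qty * unit_price
    return [(f"{b:02d}:00~{b + 1:02d}:59", totals[b]) for b in sorted(totals)]

table = pivot_sales(sales)
for label, row in table:
    print(label, dict(zip(DAYS, row)))
```

Unlike the SQL sums, which return NULL for a weekday with no sales, this sketch reports 0 for empty cells.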

How to fill missing values in certain time interval

I have a table in the format below:
user timestamp count total_count
xyz 01-01-2020 00:12:00 45 45
xyz 01-01-2020 00:27:00 12 57
xyz 01-01-2020 00:29:00 11 68
xyz 01-01-2020 00:53:00 32 100
I want the data in 5-minute intervals, like below (expected output):
user timestamp count total_count
xyz 01-01-2020 00:05:00 0 0
xyz 01-01-2020 00:10:00 0 0
xyz 01-01-2020 00:15:00 45 45
xyz 01-01-2020 00:20:00 0 45
xyz 01-01-2020 00:25:00 0 45
xyz 01-01-2020 00:30:00 23 68
xyz 01-01-2020 00:35:00 0 68
xyz 01-01-2020 00:40:00 0 68
xyz 01-01-2020 00:45:00 0 68
xyz 01-01-2020 00:50:00 0 68
xyz 01-01-2020 00:55:00 32 100
I tried
SELECT
TIMESTAMP_SECONDS(5*60 * DIV(UNIX_SECONDS(timestamp), 5*60)) timekey,
SUM(count) AS count,
MAX(total_count) as total_count
FROM db.table
WHERE
timestamp BETWEEN {{ start_date }}
AND {{ end_date }}
AND user = {{ user_id }}
GROUP BY
timekey
ORDER BY
timekey
Result of above query:
user timestamp count total_count
xyz 01-01-2020 00:15:00 45 45
xyz 01-01-2020 00:30:00 23 68
xyz 01-01-2020 00:55:00 32 100
How can I fill in the missing timestamps in the above query, with count set to zero and total_count set to the previous non-null value?
Use generate_timestamp_array() to fill in the missing values:
SELECT ts,
SUM(t.count) AS count,
MAX(t.total_count) as total_count
FROM UNNEST(GENERATE_TIMESTAMP_ARRAY( {{start_date}}, {{end_date}}, INTERVAL 5 minute)) ts LEFT JOIN
db.table t
ON t.timestamp >= ts AND
t.timestamp < TIMESTAMP_ADD(ts, INTERVAL 5 minute) AND
t.user = {{ user_id }}
GROUP BY ts
ORDER BY ts;
If you need to restrict the rows read from the table (for example, when it is partitioned by timestamp), you can slightly modify the query:
SELECT ts,
SUM(t.count) AS count,
MAX(t.total_count) as total_count
FROM UNNEST(GENERATE_TIMESTAMP_ARRAY( {{start_date}}, {{end_date}}, INTERVAL 5 minute)) ts LEFT JOIN
(SELECT t.*
FROM db.table t
WHERE timestamp BETWEEN {{ start_date }} AND {{ end_date }}
) t
ON t.timestamp >= ts AND
t.timestamp < TIMESTAMP_ADD(ts, INTERVAL 5 minute) AND
t.user = {{ user_id }}
GROUP BY ts
ORDER BY ts;
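The fill rules the question asks for (zero for count, previous value for total_count) can be sketched in Python as follows. This is only an illustration of the logic, not BigQuery code: fill_buckets is a hypothetical helper, buckets are labelled by their start time as in the join condition above, and total_count is assumed to start at 0 before the first observation.

```python
from datetime import datetime, timedelta

def fill_buckets(rows, start, end, step=timedelta(minutes=5)):
    """rows: (timestamp, count, total_count). Returns one tuple per bucket."""
    out, running_total = [], 0
    ts = start
    while ts <= end:
        in_bucket = [r for r in rows if ts <= r[0] < ts + step]
        count = sum(r[1] for r in in_bucket)  # 0 when the bucket is empty
        if in_bucket:
            running_total = max(r[2] for r in in_bucket)
        # Empty buckets carry the last known total forward.
        out.append((ts, count, running_total))
        ts += step
    return out

# Rows for user xyz from the question.
rows = [
    (datetime(2020, 1, 1, 0, 12), 45, 45),
    (datetime(2020, 1, 1, 0, 27), 12, 57),
    (datetime(2020, 1, 1, 0, 29), 11, 68),
    (datetime(2020, 1, 1, 0, 53), 32, 100),
]
filled = fill_buckets(rows, datetime(2020, 1, 1, 0, 0), datetime(2020, 1, 1, 0, 55))
for ts, count, total in filled:
    print(ts.strftime("%H:%M"), count, total)
```

Because buckets are labelled by their start, the 00:12 row lands in the 00:10 bucket here, whereas the expected output in the question appears to label buckets by their end.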

Count median days per ID between one zero and the first transaction after the last zero in a running balance

I have a running balance sheet showing customer balances after inflows and (outflows) by date. It looks something like this:
ID DATE AMOUNT RUNNING AMOUNT
-- ---------------- ------- --------------
10 27/06/2019 14:30 100 100
10 29/06/2019 15:26 -100 0
10 03/07/2019 01:56 83 83
10 04/07/2019 17:53 15 98
10 05/07/2019 15:09 -98 0
10 05/07/2019 15:53 98.98 98.98
10 05/07/2019 19:54 -98.98 0
10 07/07/2019 01:36 90.97 90.97
10 07/07/2019 13:02 -90.97 0
10 07/07/2019 16:32 39.88 39.88
10 08/07/2019 13:41 50 89.88
20 08/01/2019 09:03 890.97 890.97
20 09/01/2019 14:47 -91.09 799.88
20 09/01/2019 14:53 100 899.88
20 09/01/2019 14:59 -399 500.88
20 09/01/2019 18:24 311 811.88
20 09/01/2019 23:25 50 861.88
20 10/01/2019 16:18 -861.88 0
20 12/01/2019 16:46 894.49 894.49
20 25/01/2019 05:40 -871.05 23.44
I have attempted using lag() but I seem not to understand how to use it yet.
SELECT ID, MEDIAN(DIFF) MEDIAN_AGE
FROM
(
SELECT *, DATEDIFF(day, Lag(DATE, 1) OVER(ORDER BY ID), DATE
)AS DIFF
FROM TABLE 1
WHERE RUNNING AMOUNT = 0
)
GROUP BY ID;
The expected result would be:
ID MEDIAN_AGE
-- ----------
10 1
20 2
Please help in writing out the query that gives the expected result.
As already pointed out, you are using syntax that isn't valid for Oracle, including functions that don't exist and column names that aren't allowed.
You seem to want to calculate the number of days between a zero running-amount and the following non-zero running-amount; lead() is probably easier than lag() here, and you can use a case expression to only calculate it when needed:
select id, date_, amount, running_amount,
case when running_amount = 0 then
lead(date_) over (partition by id order by date_) - date_
end as diff
from your_table;
ID DATE_ AMOUNT RUNNING_AMOUNT DIFF
---------- -------------------- ---------- -------------- ----------
10 2019-06-27 14:30:00 100 100
10 2019-06-29 15:26:00 -100 0 3.4375
10 2019-07-03 01:56:00 83 83
10 2019-07-04 17:53:00 15 98
10 2019-07-05 15:09:00 -98 0 .0305555556
10 2019-07-05 15:53:00 98.98 98.98
10 2019-07-05 19:54:00 -98.98 0 1.2375
10 2019-07-07 01:36:00 90.97 90.97
10 2019-07-07 13:02:00 -90.97 0 .145833333
10 2019-07-07 16:32:00 39.88 39.88
10 2019-07-08 13:41:00 50 89.88
20 2019-01-08 09:03:00 890.97 890.97
20 2019-01-09 14:47:00 -91.09 799.88
20 2019-01-09 14:53:00 100 899.88
20 2019-01-09 14:59:00 -399 500.88
20 2019-01-09 18:24:00 311 811.88
20 2019-01-09 23:25:00 50 861.88
20 2019-01-10 16:18:00 -861.88 0 2.01944444
20 2019-01-12 16:46:00 894.49 894.49
20 2019-01-25 05:40:00 -871.05 23.44
Then use the median() function, rounding if desired to get your expected result:
select id, median(diff) as median_age, round(median(diff)) as median_age_rounded
from (
select id, date_, amount, running_amount,
case when running_amount = 0 then
lead(date_) over (partition by id order by date_) - date_
end as diff
from your_table
)
group by id;
ID MEDIAN_AGE MEDIAN_AGE_ROUNDED
---------- ---------- ------------------
10 .691666667 1
20 2.01944444 2
db<>fiddle
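For readers who want to check the arithmetic, the lead()-then-median logic can be reproduced with a short Python sketch. The helper name median_days_after_zero is made up for this example, and the rows are the ID 10 timestamps and running amounts from the question.

```python
from datetime import datetime
from statistics import median

def median_days_after_zero(rows):
    """rows: (timestamp, running_amount) for one ID.

    For every row with a zero running amount, measure the days until the
    next row (the equivalent of lead(date_) - date_), then take the median.
    """
    rows = sorted(rows)
    diffs = [
        (nxt[0] - cur[0]).total_seconds() / 86400
        for cur, nxt in zip(rows, rows[1:])
        if cur[1] == 0
    ]
    return median(diffs)

id_10 = [
    (datetime(2019, 6, 27, 14, 30), 100),
    (datetime(2019, 6, 29, 15, 26), 0),
    (datetime(2019, 7, 3, 1, 56), 83),
    (datetime(2019, 7, 4, 17, 53), 98),
    (datetime(2019, 7, 5, 15, 9), 0),
    (datetime(2019, 7, 5, 15, 53), 98.98),
    (datetime(2019, 7, 5, 19, 54), 0),
    (datetime(2019, 7, 7, 1, 36), 90.97),
    (datetime(2019, 7, 7, 13, 2), 0),
    (datetime(2019, 7, 7, 16, 32), 39.88),
    (datetime(2019, 7, 8, 13, 41), 89.88),
]
median_age = median_days_after_zero(id_10)
print(median_age, round(median_age))
```

With an even number of gaps, as here, the median is the mean of the two middle values, which is why the unrounded figure is not one of the individual gaps.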