Find increase in history records in specific range - sql

I want to find the IDs whose amount increased within the date range 1/1/19-1/7/19,
using table HISTORY:
DATE AMOUNT ID
(Date, number, varchar2(30))
I can find the IDs inside the range correctly,
assuming an increase/decrease can happen only when there are two records with the same ID:
with suspect as
(select id
from history
where history.date < to_date('2019-07-01', 'yyyy-mm-dd')
group by id
having count(1) > 1),
ids as
(select id
from history
join suspect
on history.id = suspect.id
where history.date > to_date('2019-01-01', 'yyyy-mm-dd')
and history.date < to_date('2019-07-01', 'yyyy-mm-dd'))
select count(distinct id)
from history a, history b
where a.id = b.id
and a.date < b.date
and a.amount < b.amount
The problem: to find an increase I need the previous record, which can be before the time range.
I can find the time of the last record before the range, but I failed to use it:
ids_prevtime as (
select history.*, max(history.date) over (partition by history.id) max_date
from history
join ids on history.id = ids.id
where history.date < to_date('2019-01-01','yyyy-mm-dd' )
), ids_prev as (
select * from ids_prevtime where date = max_date
)

I see that you found a solution, but maybe you could do it more simply using lag():
select count(distinct id)
from (select id, date_, amount,
lag(amount) over (partition by id order by date_) prev_amt
from history)
where date_ between date '2019-01-01' and date '2019-07-01'
and amount > prev_amt;
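To see why lag() picks up a previous record that lies before the range, here is a minimal sketch that runs the same query shape against an in-memory SQLite database from Python. The sample rows, the text date format, and the use of SQLite (3.25 or later for window functions) instead of Oracle are assumptions made only for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE history (id TEXT, date_ TEXT, amount INTEGER);
    INSERT INTO history VALUES
        ('A', '2018-12-15', 100),  -- before the range
        ('A', '2019-02-01', 150),  -- increase inside the range
        ('B', '2019-01-10', 200),
        ('B', '2019-03-01', 180),  -- decrease inside the range
        ('C', '2019-02-01', 50);   -- single record, no previous amount
""")

# lag() is computed over the whole table first; the date filter is applied
# afterwards, so prev_amt can come from a row outside the range.
increased_ids = [row[0] for row in conn.execute("""
    SELECT DISTINCT id
    FROM (SELECT id, date_, amount,
                 LAG(amount) OVER (PARTITION BY id ORDER BY date_) AS prev_amt
          FROM history)
    WHERE date_ >= '2019-01-01' AND date_ < '2019-07-01'
      AND amount > prev_amt
    ORDER BY id
""")]
print(increased_ids)
```

Only ID A qualifies here: its in-range record is compared with the December record, while B decreases and C has no previous amount (the comparison with NULL is not true).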

Add a union of the last history records before the range with the records inside the range:
ids_prev as
(select ID, DATE, AMOUNT
from id_before_rangetime
where createddate = max_date),
ids_in_range as
(select history.*
from history
join ids
on history.ID = ids.ID
where history.date > to_date('2019-01-01', 'yyyy-mm-dd')
and history.date < to_date('2019-07-01', 'yyyy-mm-dd')),
all_relevant as
(select * from ids_in_range union all select * from ids_prev)
and then count the increases:
select count(distinct id)
from all_relevant a, all_relevant b
where a.id = b.id
and a.date < b.date
and a.amount < b.amount

oracle sql get transactions between the period

I have 3 tables in oracle sql namely investor, share and transaction.
I am trying to get new investors invested in any shares for a certain period. As they are the new investor, there should not be a transaction in the transaction table for that investor against that share prior to the search period.
For the transaction table with the following records:
Id TranDt InvCode ShareCode
1 2020-01-01 00:00:00.000 inv1 S1
2 2019-04-01 00:00:00.000 inv1 S1
3 2020-04-01 00:00:00.000 inv1 S1
4 2021-03-06 11:50:20.560 inv2 S2
5 2020-04-01 00:00:00.000 inv3 S1
For the search period between 2020-01-01 and 2020-05-01, I should get the output as
5 2020-04-01 00:00:00.000 inv3 S1
Though there are transactions for inv1 in the table for that period, there is also a transaction prior to the search period, so that shouldn't be included as it's not considered as new investor within the search period.
The query below works, but it takes ages to return results when called from C# code, which leads to timeouts. Is there anything we can do to make it return results faster?
WITH
INVESTORS AS
(
SELECT I.INVCODE FROM INVESTOR I WHERE I.CLOSED IS NULL
),
SHARES AS
(
SELECT S.SHARECODE FROM SHARE S WHERE S.DORMANT IS NULL
),
SHARES_IN_PERIOD AS
(
SELECT DISTINCT
T.INVCODE,
T.SHARECODE,
T.TYPE
FROM TRANSACTION T
JOIN INVESTORS I ON T.INVCODE = I.INVCODE
JOIN SHARES S ON T.SHARECODE = S.SHARECODE
WHERE T.TRANDT >= :startDate AND T.TRANDT <= :endDate
),
PREVIOUS_SHARES AS
(
SELECT DISTINCT
T.INVCODE,
T.SHARECODE,
T.TYPE
FROM TRANSACTION T
JOIN INVESTORS I ON T.INVCODE = I.INVCODE
JOIN SHARES S ON T.SHARECODE = S.SHARECODE
WHERE T.TRANDT < :startDate
)
SELECT
DISTINCT
SP.INVCODE AS InvestorCode,
SP.SHARECODE AS ShareCode,
SP.TYPE AS ShareType
FROM SHARES_IN_PERIOD SP
WHERE (SP.INVCODE, SP.SHARECODE, SP.TYPE) NOT IN
(
SELECT
PS.INVCODE,
PS.SHARECODE,
PS.TYPE
FROM PREVIOUS_SHARES PS
)
With the suggestion given by @Gordon Linoff, I tried the following options (for all the shares I need), but they take a long time too. The transaction table has over 32 million rows.
1.
WITH
SHARES AS
(
SELECT S.SHARECODE FROM SHARE S WHERE S.DORMANT IS NULL
)
select t.invcode, t.sharecode, t.type
from (select t.*,
row_number() over (partition by invcode, sharecode, type order by trandt)
as seqnum
from transactions t
) t
join shares s on s.sharecode = t.sharecode
where seqnum = 1 and
t.trandt >= date '2020-01-01' and
t.trandt < date '2020-05-01';
2.
WITH
INVESTORS AS
(
SELECT I.INVCODE FROM INVESTOR I WHERE I.CLOSED IS NULL
),
SHARES AS
(
SELECT S.SHARECODE FROM SHARE S WHERE S.DORMANT IS NULL
)
select t.invcode, t.sharecode, t.type
from (select t.*,
row_number() over (partition by invcode, sharecode, type order by trandt)
as seqnum
from transactions t
) t
join investors i on i.invcode = t.invcode
join shares s on s.sharecode = t.sharecode
where seqnum = 1 and
t.trandt >= date '2020-01-01' and
t.trandt < date '2020-05-01';
3.
select t.invcode, t.sharecode, t.type
from (select t.*,
row_number() over (partition by invcode, sharecode, type order by trandt)
as seqnum
from transactions t
) t
where seqnum = 1 and
t.sharecode IN (SELECT S.SHARECODE FROM SHARE S WHERE S.DORMANT IS NULL)
and
t.trandt >= date '2020-01-01' and
t.trandt < date '2020-05-01';
If you want to know if the first record in transactions for a share is during a period, you can use window functions:
select t.*
from (select t.*,
row_number() over (partition by invcode, sharecode order by trandt) as seqnum
from transactions t
) t
where seqnum = 1 and
t.sharecode = :sharecode and
t.trandt >= date '2020-01-01' and
t.trandt < date '2020-05-01';
For this code to perform well, you want an index on transactions(invcode, sharecode, trandt).
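As a quick check of the row_number() idea, the sketch below loads the five sample transactions from the question into an in-memory SQLite database from Python and keeps only the pairs whose first transaction falls in the search period. SQLite, the text timestamps, and the index name idx_tran are stand-ins chosen for illustration; they are not the original Oracle setup, and the sketch says nothing about performance on 32 million rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE transactions (id INTEGER, trandt TEXT, invcode TEXT, sharecode TEXT);
    INSERT INTO transactions VALUES
        (1, '2020-01-01 00:00:00', 'inv1', 'S1'),
        (2, '2019-04-01 00:00:00', 'inv1', 'S1'),
        (3, '2020-04-01 00:00:00', 'inv1', 'S1'),
        (4, '2021-03-06 11:50:20', 'inv2', 'S2'),
        (5, '2020-04-01 00:00:00', 'inv3', 'S1');
    -- hypothetical index name; columns follow the suggestion in the answer
    CREATE INDEX idx_tran ON transactions (invcode, sharecode, trandt);
""")

# seqnum = 1 marks the earliest transaction per (investor, share);
# the pair counts as new only if that earliest row is inside the period.
new_investors = conn.execute("""
    SELECT id, invcode, sharecode
    FROM (SELECT t.*,
                 ROW_NUMBER() OVER (PARTITION BY invcode, sharecode
                                    ORDER BY trandt) AS seqnum
          FROM transactions t)
    WHERE seqnum = 1
      AND trandt >= :startDate AND trandt < :endDate
""", {"startDate": "2020-01-01", "endDate": "2020-05-01"}).fetchall()
print(new_investors)
```

With the sample data only transaction 5 (inv3, S1) remains, which matches the expected output in the question: inv1 is excluded because its earliest transaction is from 2019.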

How to sum from multiple columns and segregate into separate column if result is positive and negative

I am using PostgreSQL and need to write a query that sums values from separate columns of two different tables and then splits the result into separate columns depending on whether it is positive or negative.
For example,
Below is the source table
Below is the resulting table that needs to be created; it is also used while populating it
I have written the query below to aggregate the sums and am able to populate the TOT_CREDIT and TOT_DEBIT columns. Is there a more optimized query to achieve that?
select t.account_id,
t.transaction_date,
SUM(t.transaction_amt) filter (where t.transaction_amt >= 0) as tot_debit,
SUM(t.transaction_amt) filter (where t.transaction_amt < 0) as tot_credit,
case
when
(
SUM(t.transaction_amt) +
SUM(COALESCE(b.credit_balance,0)) +
SUM(COALESCE(b.debit_balance,0))
) < 0
then
(
SUM(t.transaction_amt) +
SUM(COALESCE(b.credit_balance,0)) +
SUM(COALESCE(b.debit_balance,0))
)
end as credit_balance,
case
when
(
SUM(t.transaction_amt) +
SUM(COALESCE(b.credit_balance,0)) +
SUM(COALESCE(b.debit_balance,0))
) > 0
then
(
SUM(t.transaction_amt) +
SUM(COALESCE(b.credit_balance,0)) +
SUM(COALESCE(b.debit_balance,0))
)
end as debit_balance
from
transaction t
LEFT OUTER JOIN balance b ON (t.account_id = b.account_id
and t.transaction_date = b.transaction_date
and b.transaction_date=t.transaction_date- INTERVAL '1 DAYS')
group by
t.account_id,
t.transaction_date
Please provide some pointers.
EDIT 1: This query is not working in the expected manner.
One way is to break your logic into small queries and join them in the end!
select tw.account_id, tw.t_date,tw.t_c,th.T_D,fo.C_B,fi.d_B from
(select account_id, Transaction_date as t_date, sum(Transaction_AMT) as t_C from TransactionTABLE
where Transaction_AMT<0 group by account_id, Transaction_date ) as tw inner join
(select account_id, Transaction_date as t_date, sum(Transaction_AMT) as t_d from TransactionTABLE
where Transaction_AMT>0 group by account_id, Transaction_date ) as th on tw.account_id=th.account_id and tw.t_date=th.t_date inner join
(select account_id, Transaction_date as t_date, sum(Transaction_AMT) as C_B from TransactionTABLE
group by account_id, Transaction_date having sum(Transaction_AMT)<0 ) as fo on th.account_id=fo.account_id and th.t_date=fo.t_date inner join
(select account_id, Transaction_date as t_date, sum(Transaction_AMT) as d_B from TransactionTABLE
group by account_id, Transaction_date having sum(Transaction_AMT)>0 ) as fi on fi.account_id=fo.account_id and fi.t_date=fo.t_date;
Or else
you could try something like the following, which calculates the running total of the amounts per account_id over Transaction_date:
select account_id,
transaction_date,
SUM(transaction_amt) filter (where transaction_amt >= 0) as tot_debit,
SUM(transaction_amt) filter (where transaction_amt < 0) as tot_credit,
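case when sum(sum(transaction_amt)) over (partition by account_id order by transaction_date) < 0
then sum(sum(transaction_amt)) over (partition by account_id order by transaction_date) end as credit_balance,
case when sum(sum(transaction_amt)) over (partition by account_id order by transaction_date) >= 0
then sum(sum(transaction_amt)) over (partition by account_id order by transaction_date) end as debit_balance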
from TransactionTABLE group by account_id, Transaction_date order by 1,2;
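The positive/negative split itself can be sketched with conditional aggregation. The example below runs on an in-memory SQLite database from Python with made-up rows, and uses CASE expressions instead of FILTER so that it also works on older SQLite versions. It is a simplification: it splits only the daily net amount and does not carry over the previous day's balance from the balance table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE transactions (account_id INTEGER, transaction_date TEXT,
                               transaction_amt INTEGER);
    INSERT INTO transactions VALUES
        (1, '2021-01-01',  100),
        (1, '2021-01-01',  -30),
        (1, '2021-01-02', -200),
        (2, '2021-01-01',  -50);
""")

# Each CASE yields NULL when its condition is false, so every total or
# net balance lands in exactly one of the two paired columns.
rows = conn.execute("""
    SELECT account_id, transaction_date,
           SUM(CASE WHEN transaction_amt >= 0 THEN transaction_amt END) AS tot_debit,
           SUM(CASE WHEN transaction_amt <  0 THEN transaction_amt END) AS tot_credit,
           CASE WHEN SUM(transaction_amt) <  0 THEN SUM(transaction_amt) END AS credit_balance,
           CASE WHEN SUM(transaction_amt) >= 0 THEN SUM(transaction_amt) END AS debit_balance
    FROM transactions
    GROUP BY account_id, transaction_date
    ORDER BY account_id, transaction_date
""").fetchall()
for row in rows:
    print(row)
```

For account 1 on 2021-01-01 the net of 70 appears as debit_balance and credit_balance is NULL; on 2021-01-02 the net of -200 appears as credit_balance instead.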

How to calculate running sums with append-only rows

I have a table where rows are never mutated but only inserted; they are immutable records. It has the following fields:
id: int
user_id: int
created: datetime
is_cool: boolean
likes_fruits: boolean
An object is tied to a user, and the "current" object for a given user is the one that has the latest created date. E.g. if I want to update is_cool for a user, I'd append a record with a new created timestamp and is_cool=true.
I want to calculate how many users are is_cool at the end of each day. I.e. I'd like the output table to have the columns:
day: some kind of date_trunc('day', created)
cool_users_count: number of users that have is_cool at the end of this day.
What SQL query can I write that does this? FWIW I'm using Presto (or Redshift if need be).
Note that there are other columns, e.g. likes_fruits, which means a record where is_cool is false does not mean is_cool was just changed to false - it could have been false for a while.
This is what procedural pseudo-code would look like to represent what I'd want to do in SQL:
// rows = ...
min_date = min([row.created for row in rows])
max_date = max([row.created for row in rows])
counts_by_day = {}
for date in range(min_date, max_date):
rows_up_until_date = [row for row in rows if row.created <= date]
latest_row_by_user = rows_up_until_date.reduce(
{},
(acc, row) => acc[row.user_id] = row,
)
counts_by_day[date] = latest_row_by_user.filter(row => row.is_cool).length
You can do this using just a query. Try using a sum on the boolean and a group by:
select date(created), sum(is_cool)
from my_table
group by date(created)
or if you need the number of users
select t.date_created, count(*) num_user
from (
select distinct date(created) date_created, user_id
from my_table
where is_cool = TRUE
) t
group by t.date_created
or if you need the last value of is_cool
select date(max_date), sum(is_cool)
from (
select t.user_id, t.max_date, m.is_cool
from my_table m
inner join (
select max(created) max_date, user_id
from my_table
group by user_id, date(created)
) t on t.max_date = m.created
and t.user_id = m.user_id
where m.is_cool = TRUE
) t2
group by date(max_date)
A correlated subquery might be the simplest solution. The following gets the value of is_cool for each user on each date:
select u.user_id, d.date,
(select t.is_cool
from t
where t.user_id = u.user_id and
t.created < dateadd(day, 1, d.date)
order by t.created desc
limit 1
) as is_cool
from (select distinct date(created) as date
from t
) d cross join
(select distinct user_id
from t
) u ;
Then aggregate:
select date, sum(is_cool)
from (select u.user_id, d.date,
(select t.is_cool
from t
where t.user_id = u.user_id and
t.created < dateadd(day, 1, d.date)
order by t.created desc
limit 1
) as is_cool
from (select distinct date(created) as date
from t
) d cross join
(select distinct user_id
from t
) u
) ud
group by date;
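The correlated-subquery approach can be tried on a small data set. The sketch below uses an in-memory SQLite database from Python as a stand-in for Presto/Redshift, so dateadd(day, 1, ...) becomes SQLite's date(..., '+1 day') and is_cool is stored as 0/1. The rows are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t (id INTEGER, user_id INTEGER, created TEXT, is_cool INTEGER);
    INSERT INTO t VALUES
        (1, 1, '2021-01-01 09:00:00', 1),
        (2, 2, '2021-01-01 10:00:00', 0),
        (3, 2, '2021-01-02 08:00:00', 1),
        (4, 1, '2021-01-03 12:00:00', 0);
""")

# For every (day, user) pair, take the user's latest record created before
# the end of that day, then sum the is_cool flags per day.
counts = conn.execute("""
    SELECT day, COALESCE(SUM(is_cool), 0) AS cool_users_count
    FROM (SELECT d.day,
                 (SELECT t.is_cool
                  FROM t
                  WHERE t.user_id = u.user_id
                    AND t.created < date(d.day, '+1 day')
                  ORDER BY t.created DESC
                  LIMIT 1) AS is_cool
          FROM (SELECT DISTINCT date(created) AS day FROM t) d
          CROSS JOIN (SELECT DISTINCT user_id FROM t) u)
    GROUP BY day
    ORDER BY day
""").fetchall()
print(counts)
```

User 1 stays counted on 2021-01-02 even though no record was inserted for that user that day, because the latest earlier record is carried forward. Days on which nobody inserted a record do not appear at all; covering them would need a separate calendar of dates.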

sql to select first n unique lines on sorted result

I have a query that returns one column of strings, for example:
NAME:
-----
SOF
OTP
OTP
OTP
SOF
VIL
OTP
SOF
GGG
I want to get SOF, OTP, VIL, that is, the first 3 unique values from the top.
I tried using DISTINCT and GROUP BY, but that does not work because the sort order is lost.
The query that builds this result is:
SELECT DISTINCT d.adst
FROM (SELECT a.date adate,
b.date bdate,
a.price + b.price total,
( b.date - a.date ) days,
a.dst adst
FROM flights a
JOIN flights b
ON a.dst = b.dst
ORDER BY total) d
I have a "flights" table with details, and I need to get the 3 (=n) cheapest destinations.
Thanks
This can easily be done using window functions:
select *
from (
SELECT a.date as adate,
b.date as bdate,
a.price + b.price as total,
dense_rank() over (order by a.price + b.price) as rnk,
b.date - a.date as days,
a.dst as adst
FROM flights a
JOIN flights b ON a.dst = b.dst
) t
where rnk <= 3
order by rnk;
More details on window functions can be found in the manual:
http://www.postgresql.org/docs/current/static/tutorial-window.html
I found a way to do it.
I am selecting the DST and the PRICE, grouping by DST with the MIN function on the price, and limiting to 3.
Is there a better way to do it?
SELECT d.adst , min(d.total) mttl
FROM (SELECT a.date adate,
b.date bdate,
a.price + b.price total,
( b.date - a.date ) days,
a.dst adst
FROM flights a
JOIN flights b
ON a.dst = b.dst
ORDER BY total) d
group by adst order by mttl limit 3;
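The grouping approach can be checked against the sample column from the question. In the sketch below the offers table and its total values are made up so that the rows sort into the order shown in the question; it runs on an in-memory SQLite database from Python rather than PostgreSQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE offers (dst TEXT, total INTEGER);
    INSERT INTO offers VALUES
        ('SOF', 100), ('OTP', 110), ('OTP', 120), ('OTP', 130), ('SOF', 140),
        ('VIL', 150), ('OTP', 160), ('SOF', 170), ('GGG', 180);
""")

# One row per destination, ordered by that destination's cheapest total,
# so the first three rows are the three cheapest distinct destinations.
cheapest = [row[0] for row in conn.execute("""
    SELECT dst
    FROM offers
    GROUP BY dst
    ORDER BY MIN(total)
    LIMIT 3
""")]
print(cheapest)
```

Ordering by MIN(total) keeps the sort meaningful after grouping, which is what a plain DISTINCT over the sorted subquery cannot guarantee.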
select
name
from
testname
where
name in (
select distinct(name) from testname)
group by name order by min(ctid) limit 3
You can tweak your query to return the correct result, by adding where days > 0 and limit 3 in the outer query like this:
select *
from
(
select
a.date adate,
b.date bdate,
(a.price + b.price) total,
(b.date - a.date) days ,
a.dst adst
from flights a
join flights b on a.dst = b.dst
order by total
) d
where days > 0
limit 3;
This assumes that the second entry is the return flight, with a date later than the first entry, so that you get a positive difference in days.
Note that your query without days > 0 gives you a cross join of the table with itself: for each flight you get 4 rows, two joined with itself with days = 0 and another row with negative days, so I used days > 0 to get the correct row.
I recommend that you add a new column, Flight_Id, as a primary key, and a foreign key, something like From_Flight_Id. The primary flight would then have a null From_Flight_Id, and the returning flight would have a From_Flight_Id equal to the Flight_Id of the primary flight; this way you can join them properly instead.
SELECT DISTINCT(`EnteredOn`) FROM `rm_pr_patients` Group By `EnteredOn`
SELECT DISTINCT ON (column_name) column_name FROM table_name ORDER BY column_name LIMIT 3;

What's the proper SQL query to find a 'status change' before given date?

I have a table of logged 'status changes'. I need to find the latest status change for a user, and if it was a) a certain 'type' of status change (s.new_status_id), and b) greater than 7 days old (s.change_date), then include it in the results. My current query sometimes returns the second-to-latest status change for a given user, which I don't want -- I only want to evaluate the last one.
How can I modify this query so that it will only include a record if it is the most recent status change for that user?
Query
SELECT DISTINCT ON (s.applicant_id) s.applicant_id, a.full_name, a.email_address, u.first_name, s.new_status_id, s.change_date, a.applied_class
FROM automated_responses_statuschangelogs s
INNER JOIN application_app a on (a.id = s.applicant_id)
INNER JOIN accounts_siuser u on (s.person_who_modified_id = u.id)
WHERE now() - s.change_date > interval '7' day
AND s.new_status_id IN
(SELECT current_status
FROM application_status
WHERE status_phase_id = 'In The Flow'
)
ORDER BY s.applicant_id, s.change_date DESC, s.new_status_id, s.person_who_modified_id;
You can use row_number() to filter one entry per applicant:
select *
from (
select row_number() over (partition by applicant_id
order by change_date desc) rn
, *
from automated_responses_statuschangelogs
) as lc
join application_app a
on a.id = lc.applicant_id
join accounts_siuser u
on lc.person_who_modified_id = u.id
join application_status stat
on lc.new_status_id = stat.current_status
where lc.rn = 1
and stat.status_phase_id = 'In The Flow'
and lc.change_date < now() - interval '7' day
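The key point is that row_number() picks the latest change per applicant first, and the status and age filters are applied only to that row. Here is a minimal sketch on an in-memory SQLite database from Python. The simplified status_changes table, the 'in_flow' status value, and the fixed "today" date are assumptions for illustration, and the joins to the other tables are left out.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE status_changes (applicant_id INTEGER, new_status_id TEXT,
                                 change_date TEXT);
    INSERT INTO status_changes VALUES
        (1, 'in_flow',  '2021-01-01'),  -- latest for applicant 1, old enough
        (2, 'in_flow',  '2021-01-01'),
        (2, 'rejected', '2021-01-20'),  -- latest for applicant 2, other status
        (3, 'in_flow',  '2021-01-28');  -- latest for applicant 3, too recent
""")

# rn = 1 is the most recent change per applicant; filtering on status and
# age afterwards means an older matching row can never slip through.
stale = conn.execute("""
    SELECT applicant_id, new_status_id, change_date
    FROM (SELECT s.*,
                 ROW_NUMBER() OVER (PARTITION BY applicant_id
                                    ORDER BY change_date DESC) AS rn
          FROM status_changes s)
    WHERE rn = 1
      AND new_status_id = 'in_flow'
      AND change_date < date(:today, '-7 days')
""", {"today": "2021-02-01"}).fetchall()
print(stale)
```

Applicant 2 is the case from the question: the second-to-latest change would match, but since only the latest row is evaluated, that applicant is left out.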