Return last amount for each element with same ref_id - sql

I have two tables: credit and creditdetails. A new row is created in creditdetails every day for each credit.
ID  Amount  ref_id  date
1   2       1       16.03
2   3       1       17.03
3   4       1       18.03
4   1       2       16.03
5   2       2       17.03
6   0       2       18.03
I want to sum up the amount of the latest row (by date) for each unique ref_id. So the output should be 4 + 0.

You can use ROW_NUMBER to filter on the latest amount per ref_id, then SUM it:
SELECT SUM(q.Amount) AS TotalLatestAmount
FROM (
    SELECT
        cd.ref_id,
        cd.Amount,
        ROW_NUMBER() OVER (PARTITION BY cd.ref_id ORDER BY cd.date DESC) AS rn
    FROM Creditdetails cd
) q
WHERE q.rn = 1;
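The technique can be sanity-checked end to end. Here is a minimal sketch using SQLite through Python; the table layout follows the question, and SQLite is just a stand-in for whichever engine you use:

```python
import sqlite3

# In-memory database with the sample data from the question
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Creditdetails (ID INT, Amount INT, ref_id INT, date TEXT)")
conn.executemany(
    "INSERT INTO Creditdetails VALUES (?, ?, ?, ?)",
    [(1, 2, 1, "16.03"), (2, 3, 1, "17.03"), (3, 4, 1, "18.03"),
     (4, 1, 2, "16.03"), (5, 2, 2, "17.03"), (6, 0, 2, "18.03")],
)

# ROW_NUMBER picks the latest row per ref_id; the outer SUM adds them up
total = conn.execute("""
    SELECT SUM(q.Amount)
    FROM (
        SELECT Amount,
               ROW_NUMBER() OVER (PARTITION BY ref_id ORDER BY date DESC) AS rn
        FROM Creditdetails
    ) q
    WHERE q.rn = 1
""").fetchone()[0]
print(total)  # 4 + 0 = 4
```

Note that the text dates happen to sort correctly here because they share one format; in a real schema the column should be a proper date type.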

With this query:
select ref_id, max(date) maxdate
from creditdetails
group by ref_id
you get all the last dates for each ref_id, so you can join it to the table creditdetails and sum over amount:
select sum(amount) total
from creditdetails c inner join (
select ref_id, max(date) maxdate
from creditdetails
group by ref_id
) g
on g.ref_id = c.ref_id and g.maxdate = c.date

I think you want something like this:
select sum(amount)
from table
where date = ( select max(date) from table);
Note that this assumes every ref_id has a row on the overall latest date. Also, your date column doesn't appear to be in a standard format, so I can't tell whether it needs to be converted in the query to work properly.

Related

Number of users buying the exact same product from the same shop more than 2 times in 1 year

I have data like this:
date        user  prod  shop  cat1  cat2
2022-02-01  1     a     a     ah    g
2022-02-02  1     a1    b     ah    g
2022-04-03  1     a     a     ah    g
2022-04-19  1     a     a     ah    g
2022-05-01  2     b     c     bg    g
I want to know how many users bought the same product in the same shop more than 2 times within a 1-year period. The result I want looks like:
table 1
cat1  number_of_user
ah    1
table 2
cat2  number_of_user
g     1
For the total user count, my query is:
WITH data_product AS(
SELECT DATE(payment_time) date,
user,
CONCAT(prod, "_", shop) product_shop,
cat1,
cat2
FROM
a
WHERE
DATE(payment_time) BETWEEN "2022-01-01" AND DATE_SUB(current_date, INTERVAL 1 day)
ORDER BY 1,2,3),
purchased AS (
SELECT user, product_shop, count(product_shop) tot_purchased
FROM data_product
GROUP BY 1,2
HAVING COUNT(product_shop) > 2
)
SELECT COUNT(user) number_of_user FROM purchased
Please help me get the number of users who bought the same product in the same shop more than 2 times in the period, broken down by cat1 and cat2.
Try this:
create temporary table table1 as(
select *,extract(YEAR from date) as year from `projectid.dataset.table`
);
create temporary table table2 as(
select * except(date,cat2) ,count(user) over(partition by cat1,year,user,prod,shop) tcount from table1
);
create temporary table table4 as(
select * except(date,cat1) ,count(user) over(partition by cat2,year,user,prod,shop) tcount from table1
);
select distinct year,cat1 ,count(distinct user) number_of_user from table2 where tcount>2 group by YEAR,cat1;
select distinct year,cat2 ,count(distinct user) number_of_user from table4 where tcount>2 group by YEAR,cat2;
If you want a single result set you can union both the select statements.
I think this query might work. The first part counts customers who purchased the same product in category1 from the same shop during one year; the second part does the same for category2; then we concatenate the two sets with a union:
with cte as
(select distinct
PDate,userID as userID,prod as prod,shop,cat1 as cat1,cat2,
count(userID) over (partition by UserID,prod,shop,year(Pdate),cat1) as cat1_count,
count(PDate) over (partition by UserID,prod,shop,year(Pdate),cat2) as cat2_count
from tbl1)
select
cte.cat1 as c1,'0' as c2,count(distinct cte.cat1) as Num
from cte
where cte.cat1_count>1
group by cte.prod,cte.userID,cte.cat1
union
select
'0',cte.cat2,count(distinct cte.cat2)
from cte
where cte.cat2_count>1
group by cte.prod,cte.userID,cte.cat2
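The underlying pattern in both answers, counting purchases per (user, product, shop, year), keeping those above the threshold, and then counting distinct users per category, can be sketched for the cat1 half with SQLite through Python. A plain CTE replaces the BigQuery temporary tables, and the table name `purchases` is an assumption:

```python
import sqlite3

# Sample data from the question; `purchases` is a hypothetical table name
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE purchases (date TEXT, user INT, prod TEXT, shop TEXT, cat1 TEXT, cat2 TEXT)")
conn.executemany("INSERT INTO purchases VALUES (?,?,?,?,?,?)", [
    ("2022-02-01", 1, "a",  "a", "ah", "g"),
    ("2022-02-02", 1, "a1", "b", "ah", "g"),
    ("2022-04-03", 1, "a",  "a", "ah", "g"),
    ("2022-04-19", 1, "a",  "a", "ah", "g"),
    ("2022-05-01", 2, "b",  "c", "bg", "g"),
])

# Count purchases per (user, prod, shop, year); keep those with more than 2,
# then count distinct qualifying users per category
rows = conn.execute("""
    WITH repeat_buyers AS (
        SELECT user, cat1
        FROM purchases
        GROUP BY user, prod, shop, cat1, strftime('%Y', date)
        HAVING COUNT(*) > 2
    )
    SELECT cat1, COUNT(DISTINCT user) AS number_of_user
    FROM repeat_buyers
    GROUP BY cat1
""").fetchall()
print(rows)  # [('ah', 1)]
```

The cat2 half is the same query with cat2 in place of cat1.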

Find gaps of a sequence in PostgreSQL tables

I have a table invoices with a field invoice_number. This is what I get when I execute select invoice_number from invoice:
invoice_number
1
2
3
5
6
10
11
I want a SQL query that gives me the following result:
gap_start  gap_end
1          3
5          6
10         11
You can use the row_number() window function to create a row count and use the difference from your actual values as the grouping criterion:
SELECT
    MIN(invoice) AS start,
    MAX(invoice) AS "end"
FROM (
    SELECT
        *,
        invoice - row_number() OVER (ORDER BY invoice) AS group_id
    FROM t
) s
GROUP BY group_id
ORDER BY start
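The row_number() difference trick can be verified with SQLite through Python (a stand-in for PostgreSQL; the sample numbers are from the question). Within each consecutive run, invoice_number minus its row number is constant, so grouping by that difference yields one row per island:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE invoice (invoice_number INT)")
conn.executemany("INSERT INTO invoice VALUES (?)",
                 [(1,), (2,), (3,), (5,), (6,), (10,), (11,)])

# invoice_number - row_number() is constant inside each consecutive run,
# so MIN/MAX per group give the start and end of each island
rows = conn.execute("""
    SELECT MIN(invoice_number) AS start, MAX(invoice_number) AS "end"
    FROM (
        SELECT invoice_number,
               invoice_number - ROW_NUMBER() OVER (ORDER BY invoice_number) AS group_id
        FROM invoice
    ) s
    GROUP BY group_id
    ORDER BY start
""").fetchall()
print(rows)  # [(1, 3), (5, 6), (10, 11)]
```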

PostgreSQL Pivot by Last Date

I need to make a PIVOT table from Source like this table
FactID  UserID  Date        Product  QTY
1       11      01/01/2020  A        600
2       11      02/01/2020  A        400
3       11      03/01/2020  B        500
4       11      04/01/2020  B        200
6       22      06/01/2020  A        1000
7       22      07/01/2020  A        200
8       22      08/01/2020  B        300
9       22      09/01/2020  B        100
I need a pivot like this, where each product's QTY is the QTY on the last date:
UserID  A    B
11      400  200
22      200  100
My attempt in PostgreSQL:
SELECT
    UserID,
    MAX(CASE WHEN Product='A' THEN QTY END) AS "A",
    MAX(CASE WHEN Product='B' THEN QTY END) AS "B"
FROM table
GROUP BY UserID
And the result:
UserID  A     B
11      600   500
22      1000  300
I mean, I get the result by the maximum QTY and not by the maximum date!
What do I need to add to get results by the maximum (last) date?
Postgres doesn't have "first" and "last" aggregation functions. One method for doing this (without a subquery) uses arrays:
select userid,
(array_agg(qty order by date desc) filter (where product = 'A'))[1] as a,
(array_agg(qty order by date desc) filter (where product = 'B'))[1] as b
from tab
group by userid;
Another method uses select distinct with first_value():
select distinct userid,
first_value(qty) over (partition by userid order by product = 'A' desc, date desc) as a,
first_value(qty) over (partition by userid order by product = 'B' desc, date desc) as b
from tab;
With the appropriate indexes, though, distinct on might be the fastest approach:
select userid,
max(qty) filter (where product = 'A') as a,
max(qty) filter (where product = 'B') as b
from (select distinct on (userid, product) t.*
from tab t
order by userid, product, date desc
) t
group by userid;
In particular, this can use an index on (userid, product, date desc). The improvement in performance will be most notable if there are many dates for a given user.
You can use the DENSE_RANK() window function to filter by the last date for each Product and UserID before applying conditional aggregation, such as:
SELECT UserID,
MAX(CASE WHEN Product='A' THEN QTY END) AS "A",
MAX(CASE WHEN Product='B' THEN QTY END) AS "B"
FROM
(
SELECT t.*, DENSE_RANK() OVER (PARTITION BY Product,UserID ORDER BY Date DESC) AS rn
FROM tab t
) q
WHERE rn = 1
GROUP BY UserID
This presumes all date values are distinct (no ties occur for dates).
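The DENSE_RANK-then-aggregate approach can be checked with SQLite through Python (a stand-in for PostgreSQL; dates are rewritten in ISO format so they sort correctly as text):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tab (FactID INT, UserID INT, Date TEXT, Product TEXT, QTY INT)")
conn.executemany("INSERT INTO tab VALUES (?,?,?,?,?)", [
    (1, 11, "2020-01-01", "A", 600), (2, 11, "2020-01-02", "A", 400),
    (3, 11, "2020-01-03", "B", 500), (4, 11, "2020-01-04", "B", 200),
    (6, 22, "2020-01-06", "A", 1000), (7, 22, "2020-01-07", "A", 200),
    (8, 22, "2020-01-08", "B", 300), (9, 22, "2020-01-09", "B", 100),
])

# Rank rows per (Product, UserID) by date descending, keep the latest,
# then pivot with conditional aggregation
rows = conn.execute("""
    SELECT UserID,
           MAX(CASE WHEN Product = 'A' THEN QTY END) AS A,
           MAX(CASE WHEN Product = 'B' THEN QTY END) AS B
    FROM (
        SELECT t.*,
               DENSE_RANK() OVER (PARTITION BY Product, UserID ORDER BY Date DESC) AS rn
        FROM tab t
    ) q
    WHERE rn = 1
    GROUP BY UserID
    ORDER BY UserID
""").fetchall()
print(rows)  # [(11, 400, 200), (22, 200, 100)]
```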

Finding the most repeated value for each associated attribute

I'm practicing some SQL and I thought about the following problem:
For each pub, find the time when the most people go.
I have the following tables:
GOESTO
id_person  id_pub  time
1          1       Daytime
2          2       Night time
3          3       All Day
4          1       Daytime
5          2       Night time
6          1       All Day
7          3       Daytime
8          3       Night time
9          3       Night time
10         1       Night time
PUB
id_pub  pub_name  cost
1       pub1      123
2       pub2      324
3       pub3      345
What I want to get is something like the following:
pub_name time
I think I should use the MAX and COUNT functions, but I'm not quite sure how I should do it. It should work in an Oracle database.
Thank you!
Try this one:
WITH mydata AS (
select id_pub, "TIME", count(*) as cnt
from GOESTO
group by id_pub, "TIME"
)
SELECT m.id_pub, m."TIME", m.cnt
FROM mydata m
JOIN (
SELECT id_pub, max( cnt ) as cnt
FROM mydata
GROUP BY id_pub
) x
ON (m.id_pub = x.id_pub AND m.cnt = x.cnt);
Or this one:
SELECT id_pub, "TIME"
FROM (
SELECT t.*,
dense_rank() over (partition by id_pub order by cnt desc ) rnk
FROM (
select id_pub, "TIME", count(*) as cnt
from GOESTO
group by id_pub, "TIME"
) t
)
WHERE rnk = 1
To get names instead of id_pub values, join either of the above queries with the PUB table:
SELECT p.pub_name, q."TIME"
FROM ( one_of_the_above_query )q
JOIN PUB p
ON p.id_pub = q.id_pub
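The second query's pattern, aggregating counts, ranking them per pub, keeping rank 1, and joining to PUB for the name, can be sketched with SQLite through Python (a stand-in for Oracle; the data is the question's sample):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE GOESTO (id_person INT, id_pub INT, time TEXT)")
conn.executemany("INSERT INTO GOESTO VALUES (?,?,?)", [
    (1, 1, "Daytime"), (2, 2, "Night time"), (3, 3, "All Day"),
    (4, 1, "Daytime"), (5, 2, "Night time"), (6, 1, "All Day"),
    (7, 3, "Daytime"), (8, 3, "Night time"), (9, 3, "Night time"),
    (10, 1, "Night time"),
])
conn.execute("CREATE TABLE PUB (id_pub INT, pub_name TEXT, cost INT)")
conn.executemany("INSERT INTO PUB VALUES (?,?,?)",
                 [(1, "pub1", 123), (2, "pub2", 324), (3, "pub3", 345)])

# Count visits per (pub, time), rank the counts within each pub,
# keep the top-ranked time(s), then join to PUB for the name
rows = conn.execute("""
    SELECT p.pub_name, q.time
    FROM (
        SELECT id_pub, time,
               DENSE_RANK() OVER (PARTITION BY id_pub ORDER BY cnt DESC) AS rnk
        FROM (
            SELECT id_pub, time, COUNT(*) AS cnt
            FROM GOESTO
            GROUP BY id_pub, time
        ) t
    ) q
    JOIN PUB p ON p.id_pub = q.id_pub
    WHERE q.rnk = 1
    ORDER BY p.pub_name
""").fetchall()
print(rows)  # [('pub1', 'Daytime'), ('pub2', 'Night time'), ('pub3', 'Night time')]
```

Note that dense_rank keeps all tied times for a pub, which is usually what you want here.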

Detect and delete gaps in time series

I have daily time series for different companies in my dataset and work with PostgreSQL. My goal is to exclude companies with too incomplete time series. Therefore I want to exclude all companies which have 3 or more consecutive missing values. Furthermore I want to exclude all companies which have more than 50% missing values between their first and final date in the dataset.
We can work with the following example data:
date        company  value
2012-01-01  A        5
2012-01-01  B        2
2012-01-02  A        NULL
2012-01-02  B        2
2012-01-02  C        4
2012-01-03  A        NULL
2012-01-03  B        NULL
2012-01-03  C        NULL
2012-01-04  A        NULL
2012-01-04  B        NULL
2012-01-04  C        NULL
2012-01-05  A        8
2012-01-05  B        9
2012-01-05  C        3
2012-01-06  A        8
2012-01-06  B        9
2012-01-06  C        NULL
So A has to be excluded because it has a gap of three consecutive missing values, and C because it has more than 50% missing values between its first and final date.
Combining other answers in this forum I made up the following code:
Add an autoincrement primary key to identify each row
CREATE TABLE test AS SELECT * FROM mytable ORDER BY company, date;
CREATE SEQUENCE id_seq; ALTER TABLE test ADD id INT UNIQUE;
ALTER TABLE test ALTER COLUMN id SET DEFAULT NEXTVAL('id_seq');
UPDATE test SET id = NEXTVAL('id_seq');
ALTER TABLE test ADD PRIMARY KEY (id);
Detect the gaps in the time series
CREATE TABLE to_del AS WITH count3 AS
( SELECT *,
COUNT(CASE WHEN value IS NULL THEN 1 END)
OVER (PARTITION BY company ORDER BY id
ROWS BETWEEN CURRENT ROW AND 2 FOLLOWING)
AS cnt FROM test)
SELECT company, id FROM count3 WHERE cnt >= 3;
Delete the gaps from mytable
DELETE FROM mytable WHERE company in (SELECT DISTINCT company FROM to_del);
It seems to detect and delete gaps of 3 or more consecutive missing values from the time series. But this approach is very cumbersome, and I can't figure out how to additionally exclude all companies with more than 50% missing values.
Can you think of a more effective solution than mine (I just learn to work with PostgreSQL), that also manages to exclude companies with more than 50% missing values?
I would create only one query:
DELETE FROM mytable
WHERE company in (
SELECT Company
FROM (
SELECT Company,
COUNT(CASE WHEN value IS NULL THEN 1 END)
OVER (PARTITION BY company ORDER BY id
ROWS BETWEEN CURRENT ROW AND 2 FOLLOWING) As cnt,
COUNT(CASE WHEN value IS NULL THEN 1 END)
OVER (PARTITION BY company)
/
COUNT(*)
OVER (PARTITION BY company) As p50
) alias
WHERE cnt >= 3 OR p50 > 0.5
)
A composite index on the (company, value) columns can help this query run at maximum speed.
EDIT
The above query doesn't work
I've corrected it slightly, here is a demo: http://sqlfiddle.com/#!15/c9bfe/7
Two things have been changed:
- PARTITION BY company ORDER BY date instead of ORDER BY id
- explicit cast to numeric (because the integer division was truncated to 0):
OVER (PARTITION BY company)::numeric
SELECT company, cnt, p50
FROM (
SELECT company,
COUNT(CASE WHEN value IS NULL THEN 1 END)
OVER (PARTITION BY company ORDER BY date
ROWS BETWEEN CURRENT ROW AND 2 FOLLOWING) As cnt,
SUM(CASE WHEN value IS NULL THEN 1 ELSE 0 END)
OVER (PARTITION BY company)::numeric
/
COUNT(*)
OVER (PARTITION BY company) As p50
FROM mytable
) alias
-- WHERE cnt >= 3 OR p50 > 0.5
and now the delete query should work:
DELETE FROM mytable
WHERE company in (
SELECT company
FROM (
SELECT company,
COUNT(CASE WHEN value IS NULL THEN 1 END)
OVER (PARTITION BY company ORDER BY date
ROWS BETWEEN CURRENT ROW AND 2 FOLLOWING) As cnt,
SUM(CASE WHEN value IS NULL THEN 1 ELSE 0 END)
OVER (PARTITION BY company)::numeric
/
COUNT(*)
OVER (PARTITION BY company) As p50
FROM mytable
) alias
WHERE cnt >= 3 OR p50 > 0.5
)
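The corrected delete can be exercised end to end with SQLite through Python (a stand-in for PostgreSQL; the ::numeric cast becomes a * 1.0 multiplication to avoid integer division):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (date TEXT, company TEXT, value INT)")
conn.executemany("INSERT INTO mytable VALUES (?,?,?)", [
    ("2012-01-01", "A", 5), ("2012-01-01", "B", 2),
    ("2012-01-02", "A", None), ("2012-01-02", "B", 2), ("2012-01-02", "C", 4),
    ("2012-01-03", "A", None), ("2012-01-03", "B", None), ("2012-01-03", "C", None),
    ("2012-01-04", "A", None), ("2012-01-04", "B", None), ("2012-01-04", "C", None),
    ("2012-01-05", "A", 8), ("2012-01-05", "B", 9), ("2012-01-05", "C", 3),
    ("2012-01-06", "A", 8), ("2012-01-06", "B", 9), ("2012-01-06", "C", None),
])

# cnt: NULLs in a 3-row sliding window per company (>= 3 means a 3-day gap);
# p50: fraction of missing values per company (> 0.5 means over half missing)
conn.execute("""
    DELETE FROM mytable
    WHERE company IN (
        SELECT company
        FROM (
            SELECT company,
                   COUNT(CASE WHEN value IS NULL THEN 1 END)
                       OVER (PARTITION BY company ORDER BY date
                             ROWS BETWEEN CURRENT ROW AND 2 FOLLOWING) AS cnt,
                   SUM(CASE WHEN value IS NULL THEN 1 ELSE 0 END)
                       OVER (PARTITION BY company) * 1.0
                       / COUNT(*) OVER (PARTITION BY company) AS p50
            FROM mytable
        )
        WHERE cnt >= 3 OR p50 > 0.5
    )
""")
conn.commit()
survivors = sorted({r[0] for r in conn.execute("SELECT company FROM mytable")})
print(survivors)  # ['B']  (A has a 3-day gap, C is over 50% missing)
```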
For the 50% criterion, you could select all the companies for which the number of distinct dates is lower than half the number of days between the min and max dates.
I have not tested this but it should give you an idea. I used a CTE to make it easier to read.
WITH MinMax AS
(
SELECT Company,
       DATE_PART('day', AGE(MAX(Date), MIN(Date))) AS calendar_days,
       COUNT(DISTINCT date) AS days
FROM table
GROUP BY Company
)
SELECT Company FROM MinMax
WHERE (calendar_days / 2) > days