Take data within a month, if not, then in the previous one - sql

In my application, I use this select to get data in a month:
select e.id, e.date, e.local_currency_id, e.rate, e.downloaddate
from exchange_rates e
join (select local_currency_id, max(date) as max_date
from exchange_rates group by local_currency_id, date_trunc('month', date)) as m_date on e.date = m_date.max_date
where e.date >=:currentDate and e.local_currency_id = :localCurrencyId
order by date_trunc('month', e.downloaddate) desc limit 1
What I want is the data for the given month (for the local_currency), sorted from the end of the month in reverse order, and if there is no data in that month, the data from the previous month.
Right now my select returns rows whose date is greater than or equal to my date, but I need the data as described above.

I'm not totally following your logic, but how about using row_number() to do a ranking?
Partition by the currency id and order by the date descending, then just pick the row with a ranking of 1 (the most current).
with t as (select e.*,
row_number() over (partition by e.local_currency_id order by date_trunc('month', e.downloaddate) desc) as rn
from exchange_rates e
where e.local_currency_id = :localCurrencyId)
select id, date, local_currency_id, rate, downloaddate
from t
where rn = 1
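If the goal is "the latest row in the requested month, otherwise the latest row from an earlier month", one possible reading is to keep every row dated before the first day of the following month and take the newest one. The sketch below tries that idea with Python's built-in sqlite3 module standing in for the real database. The sample rows, the helper name latest_rate_for_month and the SQLite date() modifiers are assumptions made for illustration, not part of the original schema or answer.

```python
import sqlite3

def latest_rate_for_month(conn, currency_id, current_date):
    """Newest row dated before the first day of the month after current_date, or None."""
    return conn.execute(
        """
        SELECT e.id, e.date, e.local_currency_id, e.rate
        FROM exchange_rates e
        WHERE e.local_currency_id = :cur
          -- upper bound: first day of the month following :day
          AND e.date < date(:day, 'start of month', '+1 month')
        ORDER BY e.date DESC
        LIMIT 1
        """,
        {"cur": currency_id, "day": current_date},
    ).fetchone()

# Hypothetical sample data: nothing recorded in February.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE exchange_rates ("
    "id INTEGER PRIMARY KEY, date TEXT, local_currency_id INTEGER, rate REAL)"
)
conn.executemany(
    "INSERT INTO exchange_rates (date, local_currency_id, rate) VALUES (?, ?, ?)",
    [("2021-01-10", 1, 1.10), ("2021-01-25", 1, 1.12), ("2021-03-05", 1, 1.20)],
)

in_march = latest_rate_for_month(conn, 1, "2021-03-15")     # March has data
in_february = latest_rate_for_month(conn, 1, "2021-02-15")  # falls back to January
```

In PostgreSQL the upper bound would presumably be written as date_trunc('month', :currentDate) + interval '1 month' instead of the SQLite date() call.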

Related

How can I group rows in SQL based on a condition

I am using Redshift SQL and would like to group users who have overlapping voucher periods into a single row (showing the minimum start date and the maximum end date).
For example, if I have these records,
I would like to achieve this result using Redshift.
The explanation is that since row 1 and row 2 have overlapping dates, I would like to combine them and get the min(Start_date) and max(End_Date).
I do not really know where to start. I tried using row_number to partition them, but it does not seem to work well. This is what I tried:
select
id,
start_date,
end_date,
lag(end_date, 1) over (partition by id order by start_date) as prev_end_date,
row_number() over (partition by id, (case when prev_end_date >= start_date then 1 else 0) order by start_date) as rn
from users
Are there any suggestions out there? Thank you kind sirs.
This is a type of gaps-and-islands problem. Because the dates are arbitrary, let me suggest the following approach:
Use a cumulative max to get the maximum end_date over the rows before the current one.
Use logic to determine when there is no overlap (i.e. a new period starts).
A cumulative sum of the starts provides an identifier for the group.
Then aggregate.
As SQL:
select id, min(start_date), max(end_date)
from (select u.*,
             sum(case when prev_end_date >= start_date then 0 else 1
                 end) over (partition by id
                            order by start_date, voucher_code
                            rows between unbounded preceding and current row
                           ) as grp
      from (select u.*,
                   max(end_date) over (partition by id
                                       order by start_date, voucher_code
                                       rows between unbounded preceding and 1 preceding
                                      ) as prev_end_date
            from users u
           ) u
     ) u
group by id, grp;
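To see the cumulative-max approach work on concrete rows, here is a small sketch that runs essentially the same query through Python's sqlite3 module. SQLite (3.25 or later, for window functions) is only a stand-in for Redshift here, and the sample rows and the voucher_code values are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users (id INTEGER, voucher_code TEXT, start_date TEXT, end_date TEXT)"
)
# Hypothetical rows: A and B overlap, C starts after both have ended.
conn.executemany("INSERT INTO users VALUES (?, ?, ?, ?)", [
    (1, "A", "2020-01-01", "2020-01-10"),
    (1, "B", "2020-01-05", "2020-01-20"),
    (1, "C", "2020-02-01", "2020-02-05"),
    (2, "D", "2020-03-01", "2020-03-02"),
])

merged = conn.execute("""
    SELECT id, MIN(start_date), MAX(end_date)
    FROM (SELECT u.*,
                 -- running count of "new period" flags = group identifier
                 SUM(CASE WHEN prev_end_date >= start_date THEN 0 ELSE 1 END)
                     OVER (PARTITION BY id ORDER BY start_date, voucher_code
                           ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS grp
          FROM (SELECT u.*,
                       -- largest end_date among all earlier rows of the same id
                       MAX(end_date)
                           OVER (PARTITION BY id ORDER BY start_date, voucher_code
                                 ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING)
                           AS prev_end_date
                FROM users u) u) u
    GROUP BY id, grp
    ORDER BY id, MIN(start_date)
""").fetchall()

for row in merged:
    print(row)
```

For the first row of each id the frame is empty, so prev_end_date is NULL, the comparison is not true, and the CASE falls through to 1, which starts a new group.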
Another approach would be using recursive CTE:
Divide all rows into numbered partitions grouped by id and ordered by start_date and end_date
Iterate over them calculating group_start_date for each row (rows which have to be merged in final result would have the same group_start_date)
Finally you need to group the CTE by id and group_start_date taking max end_date from each group.
Here is corresponding sqlfiddle: http://sqlfiddle.com/#!18/7059b/2
And the SQL, just in case:
WITH cteSequencing AS (
-- Get Values Order
SELECT *, start_date AS group_start_date,
ROW_NUMBER() OVER (PARTITION BY id ORDER BY start_date, end_date) AS iSequence
FROM users),
Recursion AS (
-- Anchor - the first value in groups
SELECT *
FROM cteSequencing
WHERE iSequence = 1
UNION ALL
-- Remaining items
SELECT b.id, b.start_date, b.end_date,
CASE WHEN a.end_date > b.start_date THEN a.group_start_date
ELSE b.start_date
END
AS groupStartDate,
b.iSequence
FROM Recursion AS a
INNER JOIN cteSequencing AS b ON a.iSequence + 1 = b.iSequence AND a.id = b.id)
SELECT id, group_start_date as start_date, MAX(end_date) as end_date
FROM Recursion
group by id, group_start_date
ORDER BY id, group_start_date

How to split column based on the min and max value of another column in postgresql

I am very new to Postgres. I am trying to create a query but got stuck halfway.
Here is the structure of my table:
I need to return a list of rows from the events table that has the following columns:
The customer id
The time difference (in seconds) between their first and last events
The “types” of the first and last events
The location that the events originated from
I was able to create a query, but it does not solve point 3 and I am stuck.
select customer_id, location, EXTRACT(EPOCH FROM (max(tstamp) - min(tstamp))) AS difference
from events
GROUP BY customer_id ,location;
Here is my partial solution output:
Any help would be much appreciated.
The location seems to be tied to the customer. For the rest, I would suggest conditional aggregation with row_number():
select customer_id, location, min(tstamp), max(tstamp),
       extract(epoch from max(tstamp) - min(tstamp)),
       min(type) filter (where seqnum_asc = 1) as first_event,
       min(type) filter (where seqnum_desc = 1) as last_event
from (select e.*,
             row_number() over (partition by customer_id order by tstamp) as seqnum_asc,
             row_number() over (partition by customer_id order by tstamp desc) as seqnum_desc
      from events e
     ) e
group by customer_id, location;
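A quick way to check the conditional-aggregation idea is to run it on a few rows. The sketch below uses Python's sqlite3 module as a stand-in for PostgreSQL, so two things differ from the query above: the seconds are computed with strftime('%s', ...) instead of extract(epoch ...), and the filter clause is spelled as a CASE expression, which behaves the same way here. The column names follow the question's query (customer_id, location, tstamp) and the sample rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE events (customer_id INTEGER, location TEXT, type TEXT, tstamp TEXT)"
)
# Hypothetical events: customer 1 has three, customer 2 has a single one.
conn.executemany("INSERT INTO events VALUES (?, ?, ?, ?)", [
    (1, "Berlin", "login",  "2021-05-01 10:00:00"),
    (1, "Berlin", "click",  "2021-05-01 10:00:30"),
    (1, "Berlin", "logout", "2021-05-01 10:02:00"),
    (2, "Oslo",   "login",  "2021-05-02 09:00:00"),
])

summary = conn.execute("""
    SELECT customer_id, location,
           CAST(strftime('%s', MAX(tstamp)) AS INTEGER)
             - CAST(strftime('%s', MIN(tstamp)) AS INTEGER) AS seconds,
           -- only the row numbered 1 contributes a non-NULL value
           MAX(CASE WHEN seqnum_asc = 1 THEN type END) AS first_event,
           MAX(CASE WHEN seqnum_desc = 1 THEN type END) AS last_event
    FROM (SELECT e.*,
                 ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY tstamp) AS seqnum_asc,
                 ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY tstamp DESC) AS seqnum_desc
          FROM events e) e
    GROUP BY customer_id, location
    ORDER BY customer_id
""").fetchall()

for row in summary:
    print(row)
```

A customer with a single event gets the same type for first_event and last_event and a difference of 0 seconds.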

Rank the closest date, but not if date has already been used in previous row

The issue is hard to summarize in the title or in text, so check the image for a visual explanation.
I've got an issue joining two tables. Each date from the first table (startdatetable) should get the next/closest date in the other table (enddatetable). In 99% of cases this is easily done with rank, because most rows have a startdate that can find the next enddate, and there is a new startdate before another enddate becomes available.
However, if there are two startdates and only one enddate available, both startdates will join to the same enddate.
What I'm trying to do is this: if a date has been used in the previous row, it should not be used in the next row.
The rows I want are highlighted.
The SQL I started out with looked like
select *
from (
select rank() over (partition by tid order by enddate asc) as rnk
, id, startdate, enddate
from startdateTable
inner join enddateTable on startdateTable.ID = enddateTable.id
and enddateTable.enddate > startdateTable.startdate
) q
where q.rnk = 1
This gets me the following result. The last row should instead get the 2100 date, since the 2020-06 date has been used in the previous row.
If you have two tables and you want to align them, then you can use row_number():
select s.id, s.startdate, e.enddate
from (select s.*, row_number() over (partition by id order by startdate) as seqnum
      from startdateTable s
     ) s join
     (select e.*, row_number() over (partition by id order by enddate) as seqnum
      from enddateTable e
     ) e
     on e.id = s.id and e.seqnum = s.seqnum
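The alignment can be tried out on a case like the one described in the question, where two start dates compete for one end date. This is only a sketch: it runs through Python's sqlite3 module rather than the original database, the rows are invented, and it assumes each id has as many end dates as start dates and that they pair up in order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE startdateTable (id INTEGER, startdate TEXT)")
conn.execute("CREATE TABLE enddateTable (id INTEGER, enddate TEXT)")
# Hypothetical data: the last two start dates both precede 2020-06-01.
conn.executemany("INSERT INTO startdateTable VALUES (?, ?)", [
    (1, "2020-01-01"), (1, "2020-05-01"), (1, "2020-05-15"),
])
conn.executemany("INSERT INTO enddateTable VALUES (?, ?)", [
    (1, "2020-03-01"), (1, "2020-06-01"), (1, "2100-01-01"),
])

aligned = conn.execute("""
    SELECT s.id, s.startdate, e.enddate
    FROM (SELECT s.*, ROW_NUMBER() OVER (PARTITION BY id ORDER BY startdate) AS seqnum
          FROM startdateTable s) s
    JOIN (SELECT e.*, ROW_NUMBER() OVER (PARTITION BY id ORDER BY enddate) AS seqnum
          FROM enddateTable e) e
      ON e.id = s.id AND e.seqnum = s.seqnum   -- n-th start pairs with n-th end
    ORDER BY s.id, s.startdate
""").fetchall()

for row in aligned:
    print(row)
```

Because the pairing is purely positional, each end date is used at most once, so the third start date receives the 2100 date instead of reusing 2020-06-01.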

SQL order with equal group size

I have a table with columns month, name and transaction_id. I would like to count the number of transactions per month and name. However, for each month I want to have the top N names with the highest transaction counts.
The following query groups by month and name. However the LIMIT is applied to the complete result and not per month:
SELECT
month,
name,
COUNT(*) AS transaction_count
FROM my_table
GROUP BY month, name
ORDER BY month, transaction_count DESC
LIMIT N
Does anyone have an idea how I can get the top N results per month?
Use row_number():
SELECT month, name, transaction_count
FROM (SELECT month, name, COUNT(*) AS transaction_count,
ROW_NUMBER() OVER (PARTITION BY month ORDER BY COUNT(*) DESC) as seqnum
FROM my_table
GROUP BY month, name
) mn
WHERE seqnum <= N
ORDER BY month, transaction_count DESC
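As a sanity check, the same query can be run against a handful of rows. The sketch below uses Python's sqlite3 module (SQLite 3.25 or later) with N passed in as a bound parameter. The sample transactions and the helper name top_n_per_month are assumptions for illustration.

```python
import sqlite3

def top_n_per_month(conn, n):
    """Return (month, name, transaction_count) for the n busiest names in each month."""
    return conn.execute("""
        SELECT month, name, transaction_count
        FROM (SELECT month, name, COUNT(*) AS transaction_count,
                     -- ranking restarts at 1 for every month
                     ROW_NUMBER() OVER (PARTITION BY month ORDER BY COUNT(*) DESC) AS seqnum
              FROM my_table
              GROUP BY month, name) mn
        WHERE seqnum <= :n
        ORDER BY month, transaction_count DESC
    """, {"n": n}).fetchall()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (month TEXT, name TEXT, transaction_id INTEGER)")
# Hypothetical transactions: January has three names, February has two.
rows = (
    [("2021-01", "alice")] * 3 + [("2021-01", "bob")] * 2 + [("2021-01", "carol")]
    + [("2021-02", "bob")] * 2 + [("2021-02", "carol")]
)
conn.executemany(
    "INSERT INTO my_table VALUES (?, ?, ?)",
    [(month, name, i) for i, (month, name) in enumerate(rows, start=1)],
)

top_two = top_n_per_month(conn, 2)
for row in top_two:
    print(row)
```

With ROW_NUMBER(), names that tie on the count are cut off arbitrarily at N; RANK() or DENSE_RANK() could be swapped in if ties should all be kept.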

To receive one record per id and the one with the latest date

I have the following table stamps with the columns:
Worker
Date
Transferred
Balance
I want one row per worker:
the record with the latest date that also has the value 1 in Transferred.
I have tried a lot of possibilities, but none works the way I want it to.
SELECT DISTINCT OUT.WORKER,OUT.DATE,OUT.TRANSFERRED,OUT.BALANCE
FROM (
SELECT WORKER,DATE,TRANSFERRED,BALANCE
FROM STAMPS
ORDER BY DATE DESC
) AS OUT
GROUP BY WORKER
You say you want the latest day (presumably the latest for a given worker), so you need the max function.
select s.Worker,
       s.Date,
       s.Transferred,
       s.Balance
from
     (select worker,
             max(date) as date
      from stamps
      where transferred = 1
      group by Worker) as max_dates
join stamps s
     on s.worker = max_dates.worker
    and s.date = max_dates.date
The typical way is to use window functions:
SELECT S.WORKER, S.DATE, S.TRANSFERRED, S.BALANCE
FROM (SELECT S.*,
ROW_NUMBER() OVER (PARTITION BY WORKER ORDER BY DATE DESC) AS SEQNUM
FROM STAMPS S
) S
WHERE SEQNUM = 1;
With the right indexes, a correlated subquery often has the best performance:
select s.*
from stamps s
where s.date = (select max(s2.date)
from stamps s2
where s2.worker = s.worker
);
The appropriate index is on stamps(worker, date).
SELECT * FROM dbo.STAMPS
SELECT subResult.WORKER,
subResult.Date,
subResult.Transferred,
subResult.Balance
FROM
(
SELECT WORKER,
DATE,
TRANSFERRED,
BALANCE,
ROW_NUMBER() OVER(PARTITION BY Worker ORDER BY date DESC) AS rowNum
FROM STAMPS
WHERE Transferred=1
) AS subResult
WHERE subResult.rowNum=1
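The placement of the Transferred = 1 filter matters: it has to be applied before the rows are numbered, otherwise a worker whose newest row has Transferred = 0 would drop out of the result. The sketch below illustrates this with Python's sqlite3 module as a stand-in for the original database. The sample rows are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE stamps (worker INTEGER, date TEXT, transferred INTEGER, balance REAL)"
)
# Hypothetical rows: worker 1's newest row is NOT transferred.
conn.executemany("INSERT INTO stamps VALUES (?, ?, ?, ?)", [
    (1, "2021-01-01", 1, 10.0),
    (1, "2021-01-03", 1, 11.0),
    (1, "2021-01-05", 0, 12.0),
    (2, "2021-01-02", 1, 5.0),
])

latest_transferred = conn.execute("""
    SELECT worker, date, transferred, balance
    FROM (SELECT s.*,
                 ROW_NUMBER() OVER (PARTITION BY worker ORDER BY date DESC) AS rownum
          FROM stamps s
          WHERE transferred = 1      -- filter first, then number the rows
         ) sub
    WHERE rownum = 1
    ORDER BY worker
""").fetchall()

for row in latest_transferred:
    print(row)
```

Worker 1 is reported with the 2021-01-03 row, since the later 2021-01-05 row is excluded before ranking.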