I've got a simple temporal table that looks like this:
Table: item_approval
item user status modified
2 fred approved 2010-12-01 00:00:00
3 fred approved 2010-12-02 00:00:00
4 fred disapproved 2010-12-03 00:00:00
7 jack unapproved 2010-12-05 00:00:00
4 fred approved 2010-12-06 00:00:00
4 jack unapproved 2010-12-07 00:00:00
4 fred disapproved 2010-12-04 00:00:00
I'm using DBIx::Class. My "Item" result is defined with:
__PACKAGE__->has_many(
"item_approvals",
"Schema::Result::ItemApproval",
{ "foreign.item" => "self.id" },
{ cascade_copy => 0, cascade_delete => 0 },
);
Which means I can do:
my $item = $schema->resultset('Item')->find({id=>4});
Which is fine. Then, I can do:
my @approvals = $item->item_approvals;
to get a resultset like this:
item user status modified
4 fred disapproved 2010-12-03 00:00:00
4 fred approved 2010-12-06 00:00:00
4 jack unapproved 2010-12-07 00:00:00
4 fred disapproved 2010-12-04 00:00:00
My question: How do I get the set of Fred and Jack's single most recent approval status? That is, I want to get this resultset:
item user status modified
4 fred approved 2010-12-06 00:00:00
4 jack unapproved 2010-12-07 00:00:00
I tried things like this:
my @approvals = $item->item_approvals->search({}, {
    group_by => 'user',
    order_by => { -desc => 'modified' },
});
but the "ORDER BY" is executed after the "GROUP BY", so I get things like this instead:
item user status modified
4 fred disapproved 2010-12-03 00:00:00
4 jack unapproved 2010-12-07 00:00:00
Help?
From the behavior described in your comments I'm guessing your database is MySQL.
I'm also assuming your item_approval table has a primary key which I will call PK.
One option is to use a sub select to pick the row that has the largest (most recent) modified value:
select item, user, status, modified
from item_approval me
where PK = (select s.PK
            from item_approval s
            where me.item = s.item
              and me.user = s.user
            order by s.modified desc, s.PK desc
            limit 1)
  and me.item = 4
This is a fairly slow option because it will re-run the sub select for each row and then reject all but one row for each item/user combination.
Other databases have slightly different ways to get similar results.
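For instance, on a database that supports window functions (PostgreSQL, SQL Server, or MySQL 8+), a ROW_NUMBER() version of the same idea avoids re-running a correlated subselect per row. This is only a sketch using the same table, the PK placeholder from above, and the question's column names (user may need quoting where it is a reserved word):
select ranked.item, ranked.user, ranked.status, ranked.modified
from (
    select ia.item, ia.user, ia.status, ia.modified,
           row_number() over (partition by ia.item, ia.user
                              order by ia.modified desc, ia.PK desc) as rn
    from item_approval ia
    where ia.item = 4
) ranked
where ranked.rn = 1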
Related
I've got a table where we have registries of employees and where they have worked. In each row, we have the employee's starting date on that place. It's something like this:
Employee ID  Name      Branch  Start Date
1            John Doe  234     2018-01-20
1            John Doe  300     2019-03-20
1            John Doe  250     2022-01-19
2            Jane Doe  200     2019-02-15
2            Jane Doe  234     2020-05-20
I need a query where the data returned looks for the next value, making the starting date on the next branch as the end of the current. Eg:
Employee ID  Name      Branch  Start Date  End Date
1            John Doe  234     2018-01-20  2019-03-20
1            John Doe  300     2019-03-20  2022-01-19
1            John Doe  250     2022-01-19  ---
2            Jane Doe  200     2019-02-15  2020-05-20
2            Jane Doe  234     2020-05-20  ---
When there is not another register, we assume that the employee is still working on that branch, so we can leave it blank or put a default "9999-01-01" value.
Is there any way we can achieve a result like this using only SQL?
Another approach to my problem would be a query that returns only the row that is in a range. For example, if I look for what branch John Doe worked in 2020-12-01, the query should return the row that shows the branch 300.
You can use LEAD() to peek at the next row, according to a subgroup and ordering within it.
For example:
select
t.*,
lead(start_date) over(partition by employee_id order by start_date) as end_date
from t
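As a concrete sketch against the question's data (keeping the placeholder table name t from above), COALESCE() supplies the '9999-01-01' default for the last row per employee, and the same derived end_date answers the point-in-time lookup:
select
    employee_id,
    name,
    branch,
    start_date,
    coalesce(
        lead(start_date) over (partition by employee_id order by start_date),
        '9999-01-01'
    ) as end_date
from t;

-- point-in-time lookup: which branch was John Doe in on 2020-12-01?
select *
from (
    select t.*,
           coalesce(
               lead(start_date) over (partition by employee_id order by start_date),
               '9999-01-01'
           ) as end_date
    from t
) x
where name = 'John Doe'
  and start_date <= '2020-12-01'
  and '2020-12-01' < end_date;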
I have the following dataset:
A  B     C
1  John  2018-08-14
1  John  2018-08-20
1  John  2018-09-03
2  John  2018-11-13
2  John  2018-12-11
2  John  2018-12-12
1  John  2020-01-20
1  John  2020-01-21
3  John  2021-03-02
3  John  2021-03-03
1  John  2020-05-10
1  John  2020-05-12
And I would like to have the following result:
A  B     C
1  John  2018-08-14
2  John  2018-11-13
1  John  2020-01-20
3  John  2021-03-02
1  John  2020-05-10
If I group by A, B, the 1st block and the 3rd just get merged into one group, which is coherent. How could I create another column so that I can still use a group by and get the result I want?
If you have other ideas than mine, please explain them!
I tried using first, last, rank, and dense_rank without success.
Use lag(). Looks like B is a function of A in your data. So checking lag(A) will suffice.
select A,B,C
from (
select *, case when lag(A) over(order by C) = A then 0 else 1 end startFlag
from mytable
) t
where startFlag = 1
order by C
I have a database with the following data:
Group ID Time
1 1 16:00:00
1 2 16:02:00
1 3 16:03:00
2 4 16:09:00
2 5 16:10:00
2 6 16:14:00
I am trying to find the difference in times between the consecutive rows within each group. Using LAG() and DATEDIFF() (ie. https://stackoverflow.com/a/43055820), right now I have the following result set:
Group ID Difference
1 1 NULL
1 2 00:02:00
1 3 00:01:00
2 4 00:06:00
2 5 00:01:00
2 6 00:04:00
However I need the difference to reset when a new group is reached, as in below. Can anyone advise?
Group ID Difference
1 1 NULL
1 2 00:02:00
1 3 00:01:00
2 4 NULL
2 5 00:01:00
2 6 00:04:00
The code would look something like:
select t.*,
datediff(second, lag(time) over (partition by group order by id), time)
from t;
This returns the difference as a number of seconds, but you seem to know how to convert that to a time representation. You also seem to know that group is not acceptable as a column name, because it is a SQL keyword.
Based on the question, you have put group in the order by clause of the lag(), not the partition by.
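Since DATEDIFF() and this syntax suggest SQL Server, here is a sketch that puts it together, bracket-quoting the reserved column name and turning the seconds back into an hh:mm:ss display (assuming the gaps stay under 24 hours; your existing conversion may differ):
select t.*,
       convert(varchar(8),
               dateadd(second,
                       datediff(second,
                                lag([Time]) over (partition by [Group] order by ID),
                                [Time]),
                       0),
               108) as Difference
from t;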
I have a table that contains a list of expiration dates for various companies. The table looks like the following:
ID CompanyID Expiration
--- ---------- ----------
1 1 2016-01-01
2 1 2015-01-01
3 2 2016-04-02
4 2 2015-04-02
5 3 2014-01-03
6 4 2015-04-09
7 5 2015-07-20
8 5 2016-05-01
I am trying to build a TSQL query that will return just the most recent record for every company (i.e. CompanyID). Such as:
ID CompanyID Expiration
--- ---------- ----------
1 1 2016-01-01
3 2 2016-04-02
5 3 2014-01-03
6 4 2015-04-09
8 5 2016-05-01
It looks like there is an exact correlation between ID and Expiration. If that is true, i.e. the later the Expiration, the higher the ID, then you could simply pull Max(ID) and Max(Expiration), which are 1:1, and group by CompanyID:
Select max(ID), CompanyID, max(Expiration) from Table group by CompanyID
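If that correlation cannot be relied on, a ROW_NUMBER() version picks the latest row per company based on Expiration alone. This is a sketch; the table name Expirations is assumed:
select ID, CompanyID, Expiration
from (
    select ID, CompanyID, Expiration,
           row_number() over (partition by CompanyID
                              order by Expiration desc, ID desc) as rn
    from Expirations
) t
where rn = 1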
I am trying to get the count of items given an interval, with no start or stop times specified. I would imagine you could do it with window functions, but I am not too sure how to go about it.
The problem is as follows: I would like to get the number of times people log in to a website within an arbitrary interval, say 20 minutes.
Example A
1. 2015-06-24 23:00:00
2. 2015-06-24 23:45:00
3. 2015-06-25 00:00:00
4. 2015-06-25 00:15:00
5. 2015-06-25 00:17:00
6. 2015-06-25 00:21:00
In the above example I would highlight items (2,3), (3,4,5), (4,5,6), (5,6). The output I would like is:
start,end,count
2015-06-24 23:45:00,2015-06-25 00:00:00,2
2015-06-25 00:00:00,2015-06-25 00:17:00,3
2015-06-25 00:15:00,2015-06-25 00:21:00,3
Also, only keep the data where count >= 2; otherwise everything would be a valid grouping.
Now, is a window function the way I should go, a CTE, or is there another practice to adopt?
Try this query with a self join:
select a.id, a.log_at, max(b.log_at), count(1)
from logs a
join logs b on b.log_at >= a.log_at and b.log_at <= a.log_at + '20 minutes'::interval
group by 1, 2
having count(1) > 1
order by 1
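The interval literal suggests PostgreSQL; on PostgreSQL 11 or later (which added RANGE frames with an interval offset) the same per-row counts can also be computed with a window frame instead of a self join. A rough sketch, assuming the same logs(id, log_at) table:
select *
from (
    select id,
           log_at              as start_at,
           max(log_at) over w  as end_at,
           count(*)    over w  as cnt
    from logs
    window w as (order by log_at
                 range between current row
                           and interval '20 minutes' following)
) grouped
where cnt >= 2
order by start_at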
You can get per-day groups with counts using a query like:
SELECT MIN(last_seen_at), MAX(last_seen_at), COUNT(*)
FROM user_kinds
GROUP BY DATE(last_seen_at)
ORDER BY DATE(last_seen_at) DESC LIMIT 5;
Which on my sample data set yields a result like:
2015-06-26 00:12:30.476548 | 2015-06-26 22:06:25.134322 | 69
2015-06-25 00:46:03.392651 | 2015-06-25 23:49:46.616964 | 14
2015-06-24 14:22:33.578176 | 2015-06-24 23:39:01.32241 | 10
2015-06-23 01:42:53.438663 | 2015-06-23 20:12:21.864601 | 2
(5 rows)