SQL ORDER BY with 2 timestamp columns - sql

I'm trying to build a blog-type website for a few of my programmer friends where we can share tips and articles about code, Linux, software, etc.
I'm building the post system in PostgreSQL and so far it's been going quite well. The thing that stumps me, however, is sorting on two timestamptz columns with ORDER BY. I want to have a created timestamp, but also a modified timestamp. The result should be sorted by newest post (created OR modified most recently). I came up with the query below. Post 135 should be on top, but the modified posts are taking precedence.
I would preferably like to have both modified and created fields available so I can display: "created on xx-xx-xx, last updated yy-yy-yy".
SELECT posts.postid, users.id, posts.modified, posts.created
FROM posts
JOIN users ON posts.userid=users.id
WHERE posts.isdraft=false
ORDER BY posts.modified DESC NULLS LAST, posts.created DESC;
postid | id | modified | created
--------+-----+-------------------------------+-------------------------------
100 | 999 | 2022-11-28 01:57:07.495482-06 | 2022-11-27 21:43:34.132985-06
115 | 111 | 2022-11-28 01:55:05.9358-06 | 2022-11-27 21:43:34.137873-06
135 | 999 | | 2022-11-28 02:28:20.64009-06
130 | 444 | | 2022-11-28 01:42:49.301489-06
110 | 42 | | 2022-11-27 21:43:34.137254-06
(the reason for the JOIN is that I'll need the username attached to the user id but I omitted it here for space)
All help is appreciated, thanks!

Sort by greatest of the two timestamps. Here is your query with this modification.
SELECT posts.postid, users.id, posts.modified, posts.created
FROM posts
JOIN users ON posts.userid=users.id
WHERE not posts.isdraft
ORDER BY greatest(posts.modified, posts.created) DESC;
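A minimal sketch for checking this ordering locally, using Python's built-in sqlite3 module instead of PostgreSQL. SQLite has no greatest(), and its multi-argument max() returns NULL if any argument is NULL, so the sketch substitutes coalesce(modified, created) for a missing modified value. In PostgreSQL, greatest() ignores NULL arguments, which is why the query above needs no such step. The trimmed-down table and the shortened timestamp strings are assumptions for illustration.

```python
import sqlite3

# In-memory stand-in for the posts table (only the columns needed here).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE posts (postid INTEGER, modified TEXT, created TEXT)")
conn.executemany(
    "INSERT INTO posts VALUES (?, ?, ?)",
    [
        (100, "2022-11-28 01:57:07", "2022-11-27 21:43:34"),
        (115, "2022-11-28 01:55:05", "2022-11-27 21:43:34"),
        (135, None, "2022-11-28 02:28:20"),
        (130, None, "2022-11-28 01:42:49"),
        (110, None, "2022-11-27 21:43:34"),
    ],
)

# SQLite substitute for greatest(modified, created): fall back to
# created when modified is NULL, then take the later of the two.
rows = conn.execute(
    """
    SELECT postid
    FROM posts
    ORDER BY max(coalesce(modified, created), created) DESC
    """
).fetchall()

order = [r[0] for r in rows]
print(order)
```

Post 135 comes first here because its created timestamp is later than any modified timestamp in the sample.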

Related

Performance on querying only the most recent entries

I made an app that records when a worker arrives at and departs from the premises.
Over 24 hours multiple checks are made, so the database can quickly fill with hundreds to thousands of records depending on the activity.
| user_id | device_id | station_id | arrived_at | departed_at |
|-----------|-----------|------------|---------------------|---------------------|
| 67 | 46 | 4 | 2020-01-03 11:32:45 | 2020-01-03 11:59:49 |
| 254 | 256 | 8 | 2020-01-02 16:29:12 | 2020-01-02 16:44:65 |
| 97 | 87 | 7 | 2020-01-01 09:55:01 | 2020-01-01 11:59:18 |
...
This becomes a problem since the daily report software, which later reports who was absent or who worked extra hours, filters by arrival date.
The query becomes a full table scan:
(I just used SQLite for this example, but you get the idea)
EXPLAIN QUERY PLAN
SELECT * FROM activities
WHERE user_id = 67
AND arrived_at > '2020-01-01 00:00:00'
AND departed_at < '2020-01-01 23:59:59'
ORDER BY arrived_at DESC
LIMIT 10
What I want is to make the query snappier for records created (arrived) on the most recent day only, since queries for older days are rarely executed. Otherwise, I'll have to deal with timeouts.
I would use the following index, so that rows whose departed_at doesn't match can be eliminated before probing the table:
CREATE INDEX ON activities (arrived_at, departed_at);
On Postgres, you may use DISTINCT ON:
SELECT DISTINCT ON (user_id) *
FROM activities
ORDER BY user_id, arrived_at::date DESC;
This assumes that you only want to report the latest record, as determined by the arrival date, for each user. If instead you just want to show all records with the latest arrival date across the entire table, then use:
SELECT *
FROM activities
WHERE arrived_at::date = (SELECT MAX(arrived_at::date) FROM activities);
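The second query can be tried out in SQLite, which the question used for its example. This is a rough sketch with Python's sqlite3 module: the `::date` cast is PostgreSQL syntax, so SQLite's date() function stands in for it. The row for user 12 is hypothetical and only added so that the latest day has more than one record.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE activities (user_id INTEGER, arrived_at TEXT)")
conn.executemany(
    "INSERT INTO activities VALUES (?, ?)",
    [
        (67, "2020-01-03 11:32:45"),
        (254, "2020-01-02 16:29:12"),
        (97, "2020-01-01 09:55:01"),
        (12, "2020-01-03 08:15:00"),  # hypothetical extra row
    ],
)

# date(arrived_at) plays the role of arrived_at::date in PostgreSQL.
latest = conn.execute(
    """
    SELECT user_id, arrived_at
    FROM activities
    WHERE date(arrived_at) = (SELECT max(date(arrived_at)) FROM activities)
    ORDER BY arrived_at
    """
).fetchall()

print(latest)
```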

SQL query to get latest user to update record

I have a postgres database that contains an audit log table which holds a historical log of updates to documents. It contains which document was updated, which field was updated, which user made the change, and when the change was made. Some sample data looks like this:
doc_id | user_id | created_date | field | old_value | new_value
--------+---------+------------------------+-------------+---------------+------------
A | 1 | 2018-07-30 15:43:44-05 | Title | | War and Piece
A | 2 | 2018-07-30 15:45:13-05 | Title | War and Piece | War and Peas
A | 1 | 2018-07-30 16:05:59-05 | Title | War and Peas | War and Peace
B | 1 | 2018-07-30 15:43:44-05 | Description | test 1 | test 2
B | 2 | 2018-07-30 17:45:44-05 | Description | test 2 | test 3
You can see that the Title of document A was changed three times, first by user 1 then by user 2, then again by user 1.
Basically I need to know which user was the last one to update a field on a particular document. So for example, I need to know that User 1 was the last user to update the Title field on document A. I don't really care what time it happened, just the document, field, and user.
So sample output would be something like this:
doc_id | field | user_id
--------+-------------+---------
A | Title | 1
B | Description | 2
Seems like it should be a fairly straightforward query to write but I'm having some trouble with it. I would think that group by would be in order but the problem is that if I group by doc_id I lose the user data:
select doc_id, max(created_date)
from document_history
group by doc_id;
doc_id | max
--------+------------------------
B | 2018-07-30 15:00:00-05
A | 2018-07-30 16:00:00-05
I could join this result back to the document_history table, but I would need to do so based on the doc_id and timestamp, which doesn't seem quite right. If two people edit a document at the exact same time, I would get multiple rows back for that document and field. Maybe that's so unlikely I shouldn't worry about it, but still...
Any thoughts on a way to do this in a single query?
You want to filter the records, so think where, not group by:
select dh.*
from document_history dh
where dh.created_date = (select max(dh2.created_date) from document_history dh2 where dh2.doc_id = dh.doc_id);
In most databases, this will have better performance than a group by, if you have an index on document_history(doc_id, created_date).
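A small sketch of this filter against the sample data, run through Python's sqlite3 module. It makes one assumption beyond the query above: the subquery is also correlated on field, since the question asks for the last editor per document and field. Timestamps are stored as text without the UTC offset, which is the same for every sample row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE document_history "
    "(doc_id TEXT, user_id INTEGER, created_date TEXT, field TEXT)"
)
conn.executemany(
    "INSERT INTO document_history VALUES (?, ?, ?, ?)",
    [
        ("A", 1, "2018-07-30 15:43:44", "Title"),
        ("A", 2, "2018-07-30 15:45:13", "Title"),
        ("A", 1, "2018-07-30 16:05:59", "Title"),
        ("B", 1, "2018-07-30 15:43:44", "Description"),
        ("B", 2, "2018-07-30 17:45:44", "Description"),
    ],
)

# Keep only the rows whose timestamp is the latest for their
# (doc_id, field) pair; the field condition is an added assumption.
last_editors = conn.execute(
    """
    SELECT dh.doc_id, dh.field, dh.user_id
    FROM document_history dh
    WHERE dh.created_date = (
        SELECT max(dh2.created_date)
        FROM document_history dh2
        WHERE dh2.doc_id = dh.doc_id
          AND dh2.field = dh.field
    )
    ORDER BY dh.doc_id, dh.field
    """
).fetchall()

print(last_editors)
```

As the question notes, two edits with the exact same timestamp would still produce two rows for that document and field.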
If your DBMS supports window functions (e.g. PostgreSQL, SQL Server; aka analytic function in Oracle) you could do something like this (SQLFiddle with Postgres, other systems might differ slightly in the syntax):
http://sqlfiddle.com/#!17/981af/4
SELECT DISTINCT
doc_id, field,
first_value(user_id) OVER (PARTITION BY doc_id, field ORDER BY created_date DESC) as last_user
FROM get_last_updated
first_value(user_id) OVER (... ORDER BY created_date DESC) sorts the rows within each partition by created_date in descending order and returns the user_id of the first row, which is the user behind the latest timestamp.
I added the DISTINCT to get your expected result. The window function only adds a new column to your SELECT result; every row within the same partition gets the same value. If you do not need DISTINCT, remove it, and you can then work with the original data plus the newly derived column.

Find spectators that have seen the same shows (match multiple rows for each)

For an assignment I have to write several SQL queries for a database stored in a PostgreSQL server running PostgreSQL 9.3.0. However, I find myself stuck on the last query. The database models a reservation system for an opera house. The query is about associating a spectator with the other spectators who attend the same events every time.
The model looks like this:
Reservations table
id_res | create_date | tickets_presented | id_show | id_spectator | price | category
-------+---------------------+---------------------+---------+--------------+-------+----------
1 | 2015-08-05 17:45:03 | | 1 | 1 | 195 | 1
2 | 2014-03-15 14:51:08 | 2014-11-30 14:17:00 | 11 | 1 | 150 | 2
Spectators table
id_spectator | last_name | first_name | email | create_time | age
---------------+------------+------------+----------------------------------------+---------------------+-----
1 | gonzalez | colin | colin.gonzalez@gmail.com | 2014-03-15 14:21:30 | 22
2 | bequet | camille | bequet.camille@gmail.com | 2014-12-10 15:22:31 | 22
Shows table
id_show | name | kind | presentation_date | start_time | end_time | id_season | capacity_cat1 | capacity_cat2 | capacity_cat3 | price_cat1 | price_cat2 | price_cat3
---------+------------------------+--------+-------------------+------------+----------+-----------+---------------+---------------+---------------+------------+------------+------------
1 | madama butterfly | opera | 2015-09-05 | 19:30:00 | 21:30:00 | 2 | 315 | 630 | 945 | 195 | 150 | 100
2 | don giovanni | opera | 2015-09-12 | 19:30:00 | 21:45:00 | 2 | 315 | 630 | 945 | 195 | 150 | 100
So far I've started by writing a query to get the id of the spectator and the date of each show they attend. The query looks like this:
SELECT Reservations.id_spectator, Shows.presentation_date
FROM Reservations
LEFT JOIN Shows ON Reservations.id_show = Shows.id_show;
Could someone help me understand the problem better and point me towards a solution? Thanks in advance.
So the result I'm expecting should be something like this
id_spectator | other_id_spectators
-------------+--------------------
1| 2,3
Meaning that every time spectator with id 1 went to a show, spectators 2 and 3 did too.
Note based on comments: Wanted to make clear that this answer may be of limited use as it was answered in the context of SQL-Server (tag was present at the time)
There is probably a better way to do it, but you could do it with the STUFF function. The only drawback is that, since your ids are ints, placing a comma between values requires a workaround (each id has to be cast to a string). Below is the method I can think of using that workaround.
SELECT [id_spectator], [id_show]
, STUFF((SELECT ',' + CAST(A.[id_spectator] as NVARCHAR(10))
FROM reservations A
Where A.[id_show]=B.[id_show] AND a.[id_spectator] != b.[id_spectator] FOR XML PATH('')),1,1,'') As [other_id_spectators]
From reservations B
Group By [id_spectator], [id_show]
This will show you all other spectators that attended the same shows.
Meaning that every time spectator with id 1 went to a show, spectators 2 and 3 did too.
In other words, you want a list of ...
all spectators that have seen all the shows that a given spectator has seen (and possibly more than the given one)
This is a special case of relational division. We have assembled an arsenal of basic techniques here:
How to filter SQL results in a has-many-through relation
It is special because the list of shows each spectator has to have attended is dynamically determined by the given prime spectator.
Assuming that (id_spectator, id_show) is unique in reservations, which has not been clarified.
A UNIQUE constraint on those two columns (in that order) also provides the most important index.
For best performance in queries 2 and 3 below, also create an index with leading id_show.
1. Brute force
The primitive approach would be to form a sorted array of shows the given user has seen and compare the same array of others:
SELECT 1 AS id_spectator, array_agg(sub.id_spectator) AS id_other_spectators
FROM (
SELECT id_spectator
FROM reservations r
WHERE id_spectator <> 1
GROUP BY 1
HAVING array_agg(id_show ORDER BY id_show)
@> (SELECT array_agg(id_show ORDER BY id_show)
FROM reservations
WHERE id_spectator = 1)
) sub;
But this is potentially very expensive for big tables. The whole table has to be processed, and in a rather expensive way, too.
2. Smarter
Use a CTE to determine relevant shows, then only consider those
WITH shows AS ( -- all shows of id 1; 1 row per show
SELECT id_spectator, id_show
FROM reservations
WHERE id_spectator = 1 -- your prime spectator here
)
SELECT sub.id_spectator, array_agg(sub.other) AS id_other_spectators
FROM (
SELECT s.id_spectator, r.id_spectator AS other
FROM shows s
JOIN reservations r USING (id_show)
WHERE r.id_spectator <> s.id_spectator
GROUP BY 1,2
HAVING count(*) = (SELECT count(*) FROM shows)
) sub
GROUP BY 1;
@> (used in query 1) is the "contains" operator for arrays, so we get all spectators that have seen at least the same shows.
Faster than 1. because only relevant shows are considered.
3. Real smart
To also exclude spectators that are not going to qualify early from the query, use a recursive CTE:
WITH RECURSIVE shows AS ( -- produces exactly 1 row
SELECT id_spectator, array_agg(id_show) AS shows, count(*) AS ct
FROM reservations
WHERE id_spectator = 1 -- your prime spectator here
GROUP BY 1
)
, cte AS (
SELECT r.id_spectator, 1 AS idx
FROM shows s
JOIN reservations r ON r.id_show = s.shows[1]
WHERE r.id_spectator <> s.id_spectator
UNION ALL
SELECT r.id_spectator, idx + 1
FROM cte c
JOIN reservations r USING (id_spectator)
JOIN shows s ON s.shows[c.idx + 1] = r.id_show
)
SELECT s.id_spectator, array_agg(c.id_spectator) AS id_other_spectators
FROM shows s
JOIN cte c ON c.idx = s.ct -- has an entry for every show
GROUP BY 1;
Note that the first CTE is non-recursive. Only the second part is recursive (iterative really).
This should be fastest for small selections from big tables. Rows that don't qualify are excluded early. The two indices I mentioned are essential.
SQL Fiddle demonstrating all three.
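For experimenting without a PostgreSQL server, the counting idea behind query 2 can be sketched with Python's sqlite3 module. The reservation rows below are hypothetical, and the sketch returns one row per matching spectator instead of an aggregated array, because array_agg is PostgreSQL-specific. Like the queries above, it assumes (id_spectator, id_show) is unique.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE reservations ("
    " id_spectator INTEGER, id_show INTEGER,"
    " UNIQUE (id_spectator, id_show))"
)
# Hypothetical data: spectator 1 saw shows 1 and 11.
conn.executemany(
    "INSERT INTO reservations VALUES (?, ?)",
    [
        (1, 1), (1, 11),
        (2, 1), (2, 11), (2, 2),  # saw both, plus one more
        (3, 1), (3, 11),          # saw exactly the same shows
        (4, 1),                   # missed show 11
    ],
)

# A spectator qualifies when the number of shared shows equals
# the number of shows the prime spectator has seen.
rows = conn.execute(
    """
    WITH shows AS (
        SELECT id_show FROM reservations WHERE id_spectator = :prime
    )
    SELECT r.id_spectator
    FROM shows s
    JOIN reservations r ON r.id_show = s.id_show
    WHERE r.id_spectator <> :prime
    GROUP BY r.id_spectator
    HAVING count(*) = (SELECT count(*) FROM shows)
    ORDER BY r.id_spectator
    """,
    {"prime": 1},
).fetchall()

others = [r[0] for r in rows]
print(others)
```

Spectator 4 drops out because only one of the two relevant shows matches.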
It sounds like you have one half of the total question--determining which id_shows a particular id_spectator attended.
What you want to ask yourself is how you can determine which id_spectators attended an id_show, given an id_show. Once you have that, combine the two answers to get the full result.
So the final answer I got, looks like this :
SELECT id_spectator, id_show,(
SELECT string_agg(to_char(A.id_spectator, '999'), ',')
FROM Reservations A
WHERE A.id_show=B.id_show
) AS other_id_spectators
FROM Reservations B
GROUP By id_spectator, id_show
ORDER BY id_spectator ASC;
Which prints something like this:
id_spectator | id_show | other_id_spectators
-------------+---------+---------------------
1 | 1 | 1, 2, 9
1 | 14 | 1, 2
Which suits my needs, however if you have any improvements to offer, please share :) Thanks again everybody!

How do I query messages stored in a table such that I get messages grouped by sender and the groups sorted by time?

Overall Scenario: I am storing conversations in a table, I need to retrieve the messages for a particular location, such that they're grouped into conversations, and the groups of conversations are sorted by the most recent message received in that group. This is analogous to how text messages are organized on a phone or facebook's newsfeed ordering. I'm storing the messages in the following schema:
Location_id | SentByUser | Customer | Messsage | Time
1 | Yes | 555-123-1234 | Hello world | 2013-12-01 10:00:00
1 | No | 555-123-1234 | Thank you | 2013-12-01 12:00:00
1 | Yes | 999-999-9999 | Winter is coming | 2013-12-03 11:00:20
1 | Yes | 555-123-1234 | Foo Bar | 2013-12-02 11:00:00
1 | No | 999-999-9999 | Thank you | 2013-12-04 13:00:00
1 | Yes | 111-111-1111 | Foo Foo Bar | 2013-12-05 01:00:00
In this case, if I was building the conversation tree for location id, I'd want the following output:
Location_id | SentByUser | Customer | Messsage | Time
1 | Yes | 111-111-1111 | Foo Foo Bar | 2013-12-05 01:00:00
1 | Yes | 999-999-9999 | Winter is coming | 2013-12-03 11:00:20
1 | No | 999-999-9999 | Thank you | 2013-12-04 13:00:00
1 | Yes | 555-123-1234 | Hello world | 2013-12-01 10:00:00
1 | No | 555-123-1234 | Thank you | 2013-12-01 12:00:00
1 | Yes | 555-123-1234 | Foo Bar | 2013-12-02 11:00:00
So what I'd like to do is group all the conversations by the Customer field, then order the groups by Time, and lastly order the messages within each group as well. This is because I'm building out an interface that's similar to text messages. For each location there may be hundreds of conversations, and I'm only going to show a handful at a time. If I ensure that my query output is ordered, I don't have to worry about the server maintaining any state. The client can simply say "give me the next 100 messages", etc.
My question is twofold:
1. Is there a simple way to sub order results? Is there an easy way without doing a complex join back on the table itself or creating a new table to maintain some order.
2. Is the way I'm approaching this a good practice? As in, is there a better way to store and retrieve messages such that the server doesn't have to maintain state? As in, is there a better pattern that I should consider?
I looked at various questions and answers, and the best one I could find was "What is the most efficient/elegant way to parse a flat table into a tree?", but it doesn't seem fully applicable to my case because the author is talking about multi-branch trees.
It seems like you want two different queries. This is written in T-SQL for SQL Server, but could easily be adapted for SQLite or MySQL or whatever you're working with.
1) Show me the Customer groups ordered by most recent
select Location_id, Customer, Max(Time) as LatestMessageTime from #Table
group by Location_id, Customer order by LatestMessageTime desc
This would be similar to the first view of your text message application.
2) Show me the Messages in order given a Location_id and Customer
declare @Location int, @Customer varchar(900)
set @Location = 1
set @Customer = '999-999-9999'
select * from #Table where Location_id = @Location and Customer = @Customer
order by Time desc
If you just wanted the sample output, you don't need anything too complex:
select t.*, g.MostRecentTime from #Table t LEFT OUTER JOIN
(select Location_id, Customer, Max(Time) as MostRecentTime from #Table
group by Location_id, Customer) g on g.Location_id = t.Location_id and g.Customer = t.Customer
order by MostRecentTime desc, Location_id, Customer, Time
Here's a SQLFiddle of it: http://sqlfiddle.com/#!6/ae3f8/1/0
I think this is an acceptable way to store the information. As far as retrieving it I'd have two different stored procedures: Give me the 'summary' (1 above), and then give me the 'messages' given a certain location and customer (2 above). I'd also order by .... Customer, Time desc so that the most recent messages are the first returned, and then it goes 'back' into the past rather than loading the oldest first.
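The join-back query for the sample output can be checked with a short sketch in Python's sqlite3 module. The table name messages and the column name msg_time (standing in for Time) are assumptions for illustration, and an inner join replaces the LEFT OUTER JOIN, since every row has a matching group.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE messages "
    "(location_id INTEGER, customer TEXT, message TEXT, msg_time TEXT)"
)
conn.executemany(
    "INSERT INTO messages VALUES (?, ?, ?, ?)",
    [
        (1, "555-123-1234", "Hello world", "2013-12-01 10:00:00"),
        (1, "555-123-1234", "Thank you", "2013-12-01 12:00:00"),
        (1, "999-999-9999", "Winter is coming", "2013-12-03 11:00:20"),
        (1, "555-123-1234", "Foo Bar", "2013-12-02 11:00:00"),
        (1, "999-999-9999", "Thank you", "2013-12-04 13:00:00"),
        (1, "111-111-1111", "Foo Foo Bar", "2013-12-05 01:00:00"),
    ],
)

# Attach each conversation's latest timestamp to its rows, then sort
# conversations newest first and messages oldest first inside each one.
ordered = conn.execute(
    """
    SELECT t.customer, t.message
    FROM messages t
    JOIN (
        SELECT location_id, customer, max(msg_time) AS most_recent
        FROM messages
        GROUP BY location_id, customer
    ) g ON g.location_id = t.location_id AND g.customer = t.customer
    ORDER BY g.most_recent DESC, t.location_id, t.customer, t.msg_time
    """
).fetchall()

for row in ordered:
    print(row)
```

The rows come back in the order shown in the question's desired output.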

SQL Getting last date and values associated with description of value in another table

I have two different tables, my first table contains the authorizations granted to other requests. It has the following columns:
Authorizations table
| authorization_date | role_id | request_id |
|--------------------|---------|------------|
| 2011-08-02 | 1 | 168 |
| 2011-08-10 | 2 | 168 |
| 2011-08-20 | 6 | 168 |
| 2011-08-03 | 2 | 169 |
| 2011-08-24 | 6 | 169 |
| 2011-08-05 | 3 | 170 |
| 2011-08-09 | 5 | 170 |
As you can see, different people have different roles and also can grant a certain level of authorization. The higher the role, the higher the request has been processed.
Now, what I want to do is show the description associated with the role_id (which is in another table) for ONLY the last authorization. Since I have the date, I already know which authorization is the latest. However, I don't know how to write this: when I group my query to get the maximum date and then link it to my second table, which contains only the description of each role_id, I get duplicates. I can't think of a way around this. I'm fairly new to queries and, as I've been learning by myself, there is a lot I don't know. Any ideas?
Thanks!
I am guessing a little here, but I think you want:
SELECT Authorizations.*, anotherTable.description
FROM
(
SELECT MAX(authorization_date) AS max_date, request_id
FROM Authorizations
GROUP BY request_id
) last_auths
INNER JOIN Authorizations ON last_auths.max_date = Authorizations.authorization_date
AND last_auths.request_id = Authorizations.request_id
INNER JOIN anotherTable ON anotherTable.role_id = Authorizations.role_id
The derived table will get the maximum authorization date for each request. Then you can join to get the role_id for that request and join again to get the description.
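Here is a rough sketch of that derived-table join run against the sample data with Python's sqlite3 module. The question does not show the second table, so the table name roles, its description column, and the description values are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE authorizations "
    "(authorization_date TEXT, role_id INTEGER, request_id INTEGER)"
)
# Hypothetical lookup table; the real one is not shown in the question.
conn.execute("CREATE TABLE roles (role_id INTEGER, description TEXT)")
conn.executemany(
    "INSERT INTO authorizations VALUES (?, ?, ?)",
    [
        ("2011-08-02", 1, 168), ("2011-08-10", 2, 168), ("2011-08-20", 6, 168),
        ("2011-08-03", 2, 169), ("2011-08-24", 6, 169),
        ("2011-08-05", 3, 170), ("2011-08-09", 5, 170),
    ],
)
conn.executemany(
    "INSERT INTO roles VALUES (?, ?)",
    [(n, f"role {n}") for n in range(1, 7)],  # placeholder descriptions
)

# Latest date per request, joined back to find the role on that date,
# then joined to the lookup table for the description.
latest_auths = conn.execute(
    """
    SELECT a.request_id, a.authorization_date, r.description
    FROM (
        SELECT request_id, max(authorization_date) AS max_date
        FROM authorizations
        GROUP BY request_id
    ) last_auths
    JOIN authorizations a
      ON a.request_id = last_auths.request_id
     AND a.authorization_date = last_auths.max_date
    JOIN roles r ON r.role_id = a.role_id
    ORDER BY a.request_id
    """
).fetchall()

print(latest_auths)
```

Each request appears once, provided no request has two authorizations on the same latest date.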
SELECT d.role_desc, a.role_id, MAX(a.authorization_date)
FROM Authorizations a
INNER JOIN Description d ON a.role_id = d.role_id
GROUP BY a.role_id, d.role_desc, d.role_id
Try this query and ensure you are getting the expected result.