SELECT on INNER JOIN taking 9 hours (and counting) to complete

I'm using a SQLite database that I have from the output of another script. I have a query that is taking a huge amount of time to complete. The samples table and the multiclass table both have the same ~4,000,000 names. The multiclass table has one row for each name (4 million rows), and the samples table can have one or many rows for each name (>100 million rows). I am joining on the names and summing the count, grouped by the tax_id, day, and sample that the names belong to. This query should return ~25,000 rows.
Here is a toy version of the schema and query I'm using:
SQLite (SQL.js) Schema Setup:
CREATE TABLE samples
(
name varchar(20),
day integer,
sample integer,
count integer
);
CREATE TABLE multiclass
(
name varchar(20),
tax_id varchar(20),
details varchar(30)
);
INSERT INTO samples
(name, day, sample, count)
VALUES
('seq1', 204, 37, 50),
('seq2', 205, 37, 50),
('seq2', 206, 37, 50),
('seq3', 204, 37, 50),
('seq4', 205, 37, 50),
('seq4', 206, 37, 50);
INSERT INTO multiclass
(name, tax_id, details)
VALUES
('seq1', 'Vibrio', 'unimportant'),
('seq2', 'Shewenella', 'still_unimportant'),
('seq3', 'Vibrio', 'also_unimportant'),
('seq4', 'Shewenella', 'doesntmatter');
Query 1:
SELECT tax_id, day, sample, SUM(count)
FROM samples INNER JOIN multiclass USING(name)
GROUP BY tax_id, day, sample
ORDER BY day, sample;
Results:
| tax_id | day | sample | SUM(count) |
|------------|-----|--------|------------|
| Vibrio | 204 | 37 | 100 |
| Shewenella | 205 | 37 | 100 |
| Shewenella | 206 | 37 | 100 |
I am very new to SQL and am not sure how to proceed. This is a query I would only need to execute once, so I'm not sure adding indexes to the table is appropriate.
Is there a different way to construct the query to make it run faster? Would adding indexes make sense or take too long? If it is taking 9 hours, is it likely to still be hung up on the SQL, or is something else going wrong?
Edit: updated the question to include the database schema and intended results. I am currently building an index on the samples.name column; it has been running for over 4 hours (using a node in a cluster environment with 60 GB of RAM and many CPUs).

This query:
SELECT tax_id, day, sample, SUM(count)
FROM samples INNER JOIN
multiclass
ON samples.name = multiclass.name
GROUP BY tax_id, day, sample
ORDER BY day, sample;
is pretty simple. An index on either samples(name) or multiclass(name) would normally be recommended.
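As a minimal sketch of what adding such an index looks like, the following uses Python's built-in sqlite3 module on the toy data from the question. The index name idx_multiclass_name is hypothetical, and indexing the smaller multiclass table first is only an assumption about which side is cheaper to build; SQLite's planner decides how to use it, and the wording of EXPLAIN QUERY PLAN output varies between versions, so it is only printed, not checked.

```python
import sqlite3

# Rebuild the toy schema and data from the question in memory.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE samples (name TEXT, day INTEGER, sample INTEGER, count INTEGER);
CREATE TABLE multiclass (name TEXT, tax_id TEXT, details TEXT);
INSERT INTO samples VALUES
  ('seq1', 204, 37, 50), ('seq2', 205, 37, 50), ('seq2', 206, 37, 50),
  ('seq3', 204, 37, 50), ('seq4', 205, 37, 50), ('seq4', 206, 37, 50);
INSERT INTO multiclass VALUES
  ('seq1', 'Vibrio', 'unimportant'), ('seq2', 'Shewenella', 'still_unimportant'),
  ('seq3', 'Vibrio', 'also_unimportant'), ('seq4', 'Shewenella', 'doesntmatter');
""")

# Index the join key on the smaller table (hypothetical index name).
conn.execute("CREATE INDEX idx_multiclass_name ON multiclass(name)")

query = """
SELECT tax_id, day, sample, SUM(count)
FROM samples INNER JOIN multiclass ON samples.name = multiclass.name
GROUP BY tax_id, day, sample
ORDER BY day, sample
"""

# Show how SQLite plans to execute the join; the text differs across versions.
for plan_row in conn.execute("EXPLAIN QUERY PLAN " + query):
    print(plan_row)

rows = conn.execute(query).fetchall()
print(rows)
```

The index only changes how the join is executed, not what it returns, so the rows match the Results table in the question.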
However, there is a hint in your question that both tables contain 4 million rows, but you are only expecting 25,000. I suspect that you have duplicate names in each table. To determine the number of intermediate rows generated by the join, run this query:
select sum(s.cnt * m.cnt), max(s.cnt * m.cnt)
from (select name, count(*) as cnt from samples group by name
) s join
(select name, count(*) as cnt from multiclass group by name
) m
on s.name = m.name;
I am guessing that you will get a really large number, explaining why the query is taking so long.
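To make the diagnostic concrete, here is a sketch that runs it with Python's sqlite3 module against the toy data from the question (the in-memory database is just for illustration). Each name contributes (rows in samples) × (rows in multiclass) joined rows, so the first number is the size of the intermediate result and the second is the worst single name.

```python
import sqlite3

# Toy schema and data from the question, in memory.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE samples (name TEXT, day INTEGER, sample INTEGER, count INTEGER);
CREATE TABLE multiclass (name TEXT, tax_id TEXT, details TEXT);
INSERT INTO samples VALUES
  ('seq1', 204, 37, 50), ('seq2', 205, 37, 50), ('seq2', 206, 37, 50),
  ('seq3', 204, 37, 50), ('seq4', 205, 37, 50), ('seq4', 206, 37, 50);
INSERT INTO multiclass VALUES
  ('seq1', 'Vibrio', 'unimportant'), ('seq2', 'Shewenella', 'still_unimportant'),
  ('seq3', 'Vibrio', 'also_unimportant'), ('seq4', 'Shewenella', 'doesntmatter');
""")

# Per-name row counts on each side; their product is the join fan-out per name.
total_rows, max_per_name = conn.execute("""
SELECT SUM(s.cnt * m.cnt), MAX(s.cnt * m.cnt)
FROM (SELECT name, COUNT(*) AS cnt FROM samples GROUP BY name) AS s
JOIN (SELECT name, COUNT(*) AS cnt FROM multiclass GROUP BY name) AS m
  ON s.name = m.name
""").fetchone()
print(total_rows, max_per_name)  # 6 2
```

On the toy data there is no blow-up (6 joined rows, at most 2 per name); a huge first number on the real tables would point to duplicate names on both sides.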
Unfortunately, at this point I don't have a real answer on how to solve the problem, because your question doesn't specify what you actually want the query to produce. However, aggregating the tables before joining them is likely to be one possible solution.
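One way to read "aggregate before joining" is sketched below with Python's sqlite3 module: samples is first collapsed to one row per (name, day, sample) and only then joined to multiclass. This assumes, as the question states, that multiclass has exactly one row per name; the pre-aggregation only saves work when samples contains repeated (name, day, sample) combinations, and the day_total alias is illustrative.

```python
import sqlite3

# Toy schema and data from the question, in memory.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE samples (name TEXT, day INTEGER, sample INTEGER, count INTEGER);
CREATE TABLE multiclass (name TEXT, tax_id TEXT, details TEXT);
INSERT INTO samples VALUES
  ('seq1', 204, 37, 50), ('seq2', 205, 37, 50), ('seq2', 206, 37, 50),
  ('seq3', 204, 37, 50), ('seq4', 205, 37, 50), ('seq4', 206, 37, 50);
INSERT INTO multiclass VALUES
  ('seq1', 'Vibrio', 'unimportant'), ('seq2', 'Shewenella', 'still_unimportant'),
  ('seq3', 'Vibrio', 'also_unimportant'), ('seq4', 'Shewenella', 'doesntmatter');
""")

rows = conn.execute("""
SELECT m.tax_id, s.day, s.sample, SUM(s.day_total)
FROM (
    -- collapse samples first, so the join sees at most one row per key
    SELECT name, day, sample, SUM(count) AS day_total
    FROM samples
    GROUP BY name, day, sample
) AS s
JOIN multiclass AS m ON s.name = m.name
GROUP BY m.tax_id, s.day, s.sample
ORDER BY s.day, s.sample
""").fetchall()
print(rows)
```

Because each name maps to a single tax_id, summing partial sums gives the same totals as the original query.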

The issue was the version of sqlite3 that was installed on the cluster I was using. The version on the cluster was 3.6.20. It seems incredible, but downloading the binary for 3.9.2 from the sqlite website and running the exact same query completed in less than 10 minutes.
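Since the fix here was simply a newer SQLite library, it is worth checking which version a given environment actually uses. A small sketch from Python: sqlite3.sqlite_version reports the library the interpreter is linked against, and SQLite's own sqlite_version() SQL function reports the same thing from inside a query (the command-line shell also accepts `sqlite3 --version`).

```python
import sqlite3

# Version of the SQLite library this Python process is linked against.
print(sqlite3.sqlite_version)

# The same information obtained through SQL.
conn = sqlite3.connect(":memory:")
(sql_version,) = conn.execute("SELECT sqlite_version()").fetchone()
print(sql_version)
```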

Related

How to use a group by but still access every number values cleverly

I need to "group by" my data to distinguish between tests (each test has a specific id, name and temperature) and to calculate their count, standard deviation, etc. But I also need to access every raw data value from each group, for further index calculations that I do in a Python script.
I have found two solutions to this problem, but both seem non-optimal/flawed:
1) Using listagg to store every raw value that was grouped into a single string row. It does the job, but it is not optimized: I concatenate multiple float values into a giant string that I immediately split apart and convert back to float. That seems unnecessary and costly.
2) Removing the group by entirely and computing the count and standard deviation through partitioning (analytic functions). But that seems even worse to me. I don't know whether PL/SQL/Oracle optimizes this; it could be calculating the same count and standard deviation for every row (I don't know how to check this). The query result also becomes messy: since there is no 'group by' anymore, I have to add multiple checks in my Python file in order to separate the data for each test (specific id, name and temperature).
I think that my first solution can be improved, but I don't know how. How can I use a group by but still access every numeric value cleverly?
A function similar to listagg but with a collection/array output type instead of a string output type could maybe do the trick (a sort of 'array_agg' compatible with Oracle), but I don't know of one.
EDIT:
The sample data is complex and probably restricted to the company viewing, but I can show you my simplified query for my 1) :
SELECT
rav.rav_testid as test_id,
tte.tte_testname as test_name,
tsc.tsc_temperature as temperature,
listagg(rav.rav_value, ' ') WITHIN GROUP (ORDER BY rav.rav_value) as all_specific_test_values,
COUNT(rav.rav_value) as n,
STDDEV(rav.rav_value) as sigma
FROM
...
(8 inner joins)
GROUP BY
rav.rav_testid, tte.tte_testname, tsc.tsc_temperature
ORDER BY
rav.RAV_testid, tte.tte_testname, spd.SPD_SPLITNAMEINTERNAL,tsc.tsc_temperature
The result looks like :
test_id | test_name | temperature | all_specific_test_values | n | sigma
-------------------------------------------------------------------------
6001 |VADC_A(...) | -40 | ,8094034194946289 ,8(...)| 58 | 0,54
6001 |VADC_A(...) | 25 | ,5054857852946545 ,6(...)| 56 | 0,24
6001 |VADC_A(...) | 150 | ,8625754277452524 ,4(...)| 56 | 0,26
6002 |VADC_B(...) | -40 | ,9874514651548454 ,5(...)| 57 | 0,44
I think you want analytic functions:
select t.*,
count(*) over (partition by test) as cnt,
avg(value) over (partition by test) as avg_value,
stddev(value) over (partition by test) as stddev_value
from t;
This adds additional columns on each row.
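Here is a small sketch of the analytic-function approach using Python's sqlite3 module, with the table t and the test/value columns from the answer and made-up data. It assumes SQLite 3.25 or later (the first release with window functions); SQLite has no built-in STDDEV, so only the count and average columns are shown. Every raw row is kept, with the per-test aggregates repeated alongside it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE t (test TEXT, value REAL);
INSERT INTO t VALUES ('A', 1.0), ('A', 3.0), ('B', 2.0), ('B', 4.0), ('B', 6.0);
""")

# Each raw row keeps its value and gains per-test aggregates as extra columns.
rows = conn.execute("""
SELECT test, value,
       COUNT(*)   OVER (PARTITION BY test) AS cnt,
       AVG(value) OVER (PARTITION BY test) AS avg_value
FROM t
ORDER BY test, value
""").fetchall()
for row in rows:
    print(row)
```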
I would suggest going with @Gordon_Linoff's solution. That is likely the most standard solution.
If you want to go with a less standard solution, you can have a group by that returns a collection as one of the columns. Presumably, your script could iterate through that collection though it might take a bit of work in the script to do that.
create type num_tbl as table of number;
/
create table foo (
grp integer,
val number
);
insert into foo values( 1, 1.1 );
insert into foo values( 2, 1.2 );
insert into foo values( 1, 1.3 );
insert into foo values( 2, 1.4 );
select grp, avg(val), cast( collect( val ) as num_tbl )
from foo
group by grp;
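Since the values end up in a Python script anyway, another option (not one proposed in the answers above) is to fetch the ungrouped rows sorted by the grouping key and do the grouping client-side. A minimal sketch using Python's sqlite3 module as a stand-in for the Oracle connection, with the foo table from the answer: itertools.groupby needs input sorted by the key, and statistics.stdev is the sample standard deviation, like Oracle's STDDEV.

```python
import sqlite3
from itertools import groupby
from statistics import stdev

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE foo (grp INTEGER, val REAL);
INSERT INTO foo VALUES (1, 1.1), (2, 1.2), (1, 1.3), (2, 1.4);
""")

# Raw rows ordered by the grouping key, as groupby() requires.
rows = conn.execute("SELECT grp, val FROM foo ORDER BY grp, val").fetchall()

summary = {}
for grp, items in groupby(rows, key=lambda r: r[0]):
    values = [val for _, val in items]
    # Keep every raw value plus the per-group statistics.
    summary[grp] = {"n": len(values), "sigma": stdev(values), "values": values}

print(summary)
```

This avoids both the string round-trip of listagg and the repeated aggregates of the analytic approach, at the cost of moving the statistics into the script.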

Execution time select * vs select count(*)

I'm trying to measure execution time of a query, but I have a feeling that my results are wrong.
Before every query I execute: sync; echo 3 > /proc/sys/vm/drop_caches
My server log file results are:
2014-02-08 14:28:30 EET LOG: duration: 32466.103 ms statement: select * from partsupp
2014-02-08 14:32:48 EET LOG: duration: 9785.503 ms statement: select count(*) from partsupp
Shouldn't select count(*) take more time to execute since it makes more operations?
To output all the results from select * I need 4 minutes (not 32 seconds, as indicated by server log). I understand that the client has to output a lot of data and it will be slow, but what about the server's log? Does it count output operations too?
I also used explain analyze and the results are (as expected):
select *: Total runtime: 13254.733 ms
select count(*): Total runtime: 13463.294 ms
I have run it many times and the results are similar.
What exactly does the log measure?
Why is there such a big difference for the select * query between explain analyze and the server's log, even though it doesn't count I/O operations?
What is the difference between log measurement and explain analyze?
I have a dedicated server with Ubuntu 12.04 and PostgreSQL 9.1
Thank you!
Any aggregate function has some small overhead, but on the other hand SELECT * sends a lot of data to the client, depending on the number and size of the columns.
The log measures the total query time. It can be similar to EXPLAIN ANALYZE, but very often it is significantly faster, because EXPLAIN ANALYZE collects the execution time (and execution statistics) for every node of the execution plan, which usually adds significant overhead. On the other hand, EXPLAIN ANALYZE includes no overhead for transporting data from the server to the client.
The first query asks for all rows in a table. Therefore, the entire table must be read.
The second query only asks for how many rows there are. The database can answer this by reading the entire table, but can also answer this by reading any index it has for that table. Since the index is smaller than the table, doing that would be faster. In practice, nearly all tables have indexes (because a primary key constraint creates an index, too).
select * returns all the data, with every column included.
select count(*) counts how many rows there are.
For example, given this table:
name | id | address
-----+----+--------
s    | 12 | abc
x    | 14 | cc
y    | 15 | vv
z    | 16 | ll
select * will display the whole table.
select count(*) will display the total number of rows, 4.
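The difference in what reaches the client can be seen in a small sketch with Python's sqlite3 module, using the example table above under a hypothetical name, people: select * ships every row and column to the client, while count(*) ships a single number.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE people (name TEXT, id INTEGER, address TEXT);
INSERT INTO people VALUES ('s', 12, 'abc'), ('x', 14, 'cc'), ('y', 15, 'vv'), ('z', 16, 'll');
""")

# Every row, every column is transferred to the client.
all_rows = conn.execute("SELECT * FROM people ORDER BY id").fetchall()

# Only one value is transferred.
(row_count,) = conn.execute("SELECT COUNT(*) FROM people").fetchone()
print(len(all_rows), row_count)  # 4 4
```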

Select distinct values for a particular column choosing arbitrarily from duplicates

I have health data relating to deaths. Each individual should die at most once. In the database they sometimes don't, probably because causes of death were changed but the original entry was not deleted. I don't really understand how this was allowed to happen, but it has. So, as a made-up example, I have:
Row_number | Individual_ID | Cause_of_death | Date_of_death
------------+---------------+-----------------------+---------------
1 | 1 | Stroke | 3 march 2008
2 | 2 | Myocardial infarction | 1 jan 2009
3 | 2 | Pulmonary Embolus | 1 jan 2009
I want each individual to have only one cause of death.
In the example, I want a query that returns row 1 and either row 2 or row 3 (not both). I have to make an arbitrary choice between rows 2 and 3 because there is no timestamp in any of the fields that can be used to determine which is the revision; it's not ideal but is unavoidable.
I can't make the SQL work to do this. I've tried inner joining distinct Individual_ID to the other fields, but this still gives all the rows. I've tried adding a 'having count(Individual_ID) = 1' clause with it. This leaves out people with more than one cause of death completely. Suggestions on the internet seem to be based on using a timestamped field to choose the most recent, but I don't have that.
IBM DB2. Windows XP. Any thoughts gratefully received.
Have you tried using MIN (or MAX) on the cause of death (and on the date of death, if they died on two different dates)?
SELECT Individual_ID, MIN(Cause_of_death), MIN(Date_of_death)
FROM deaths
GROUP BY Individual_ID
I don't know DB2 so I'll answer in general. There are two main approaches:
select *
from T
join (
select keys, min(ID) as MinID
from T
group by keys
) on T.ID = MinID
And
select *, row_number() over (partition by keys) as r
from T
where r = 1
Both return every row that has no duplicates, but only one row from each set of duplicates per "key".
Note that both statements are pseudo-SQL.
The row_number() approach is probably preferable from a performance standpoint. Here is usr's example, in DB2 syntax:
select * from (
select T.*, row_number() over (partition by Individual_ID) as r
from T
)
where r=1;
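The same ROW_NUMBER() pattern can be tried out with Python's sqlite3 module (window functions need SQLite 3.25 or later). This is a sketch on the made-up data from the question: the deaths table name is hypothetical, the Row_number column is renamed row_id so it does not read like the function, and the dates are stored as ISO text. Ordering by row_id inside OVER only makes the arbitrary choice reproducible; the DB2 version above leaves the pick unspecified.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE deaths (row_id INTEGER, Individual_ID INTEGER,
                     Cause_of_death TEXT, Date_of_death TEXT);
INSERT INTO deaths VALUES
  (1, 1, 'Stroke', '2008-03-03'),
  (2, 2, 'Myocardial infarction', '2009-01-01'),
  (3, 2, 'Pulmonary Embolus', '2009-01-01');
""")

# Number the rows within each individual and keep the first one.
rows = conn.execute("""
SELECT row_id, Individual_ID, Cause_of_death, Date_of_death
FROM (
    SELECT d.*,
           ROW_NUMBER() OVER (PARTITION BY Individual_ID ORDER BY row_id) AS r
    FROM deaths AS d
) AS ranked
WHERE r = 1
ORDER BY Individual_ID
""").fetchall()
print(rows)
```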

How to retrieve data that is not in the same order as the query in SQL?

I am trying to retrieve a record from a table in SQL.
Here is what I want. For example:
I have a table named studentScore with two columns:
studentName ----- Scores
John Smith ----- 75,83, 96
I want to do this: when I type scores into a search box, it should show me the name of the student. For example, I could type "83, 96, 75" (the scores can be in any order) and it should show me the student name "John Smith". But how do I write the WHERE clause so that it picks up the correct record when what I type in the box is not in the same order as the original data in the column?
Your issue is that your data is not properly normalized: you are putting a one-to-many relationship into a single table. If you reorganized your tables like this:
Table Students
id name
1 John Smith
Table Scores
studentId score
1 75
1 83
1 96
You could do a query like:
select st.name from Students st, Scores sc where st.id = sc.studentId and sc.score in (83, 75, 96)
This also helps if you want to run other queries, like finding out which students have a score of at least X, which would otherwise be impossible with your existing table layout.
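The IN query above returns a student who has any one of the typed scores. If every typed score must be present, a common variation is to group by student and count the distinct matches. Here is a sketch with Python's sqlite3 module on the normalized Students/Scores layout (the second student's data comes from another answer and is illustrative); it finds students who have at least the typed scores, in any order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Students (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE Scores (studentId INTEGER REFERENCES Students(id), score INTEGER);
INSERT INTO Students VALUES (1, 'John Smith'), (2, 'Foo bar');
INSERT INTO Scores VALUES (1, 75), (1, 83), (1, 96), (2, 73), (2, 34);
""")

wanted = [83, 96, 75]  # order typed by the user does not matter
placeholders = ", ".join("?" for _ in wanted)
rows = conn.execute(
    f"""
    SELECT st.name
    FROM Students AS st
    JOIN Scores AS sc ON st.id = sc.studentId
    WHERE sc.score IN ({placeholders})
    GROUP BY st.id, st.name
    -- every distinct typed score must have matched
    HAVING COUNT(DISTINCT sc.score) = ?
    """,
    [*wanted, len(set(wanted))],
).fetchall()
print(rows)  # [('John Smith',)]
```

Passing the scores as bound parameters also avoids building the SQL text from user input.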
If you must stick with your existing layout (which I don't recommend), you could split up the user input and then run a query like
select * from studentScore where Scores like '%75%' or Scores like '%83%' or Scores like '%96%'
But I would really refrain from doing so.
I suppose it is solvable, but it would be simpler if the scores for each student were stored as separate rows, for example in a scores table. Otherwise, the code would have to permute the entry into every conceivable order. Or the scores entry would have to be in a standard order somehow.
If you do not want to create a new Scores table (e.g. with StudentId and Score columns), you can sort the numbers before storing them.
This way, when someone types a query, you sort those numbers as well and just compare it to the stored strings.
If you need the original position of the scores, you can store those in a separate field.
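A minimal sketch of the sort-before-storing idea, using Python's sqlite3 module: canonical_scores is a hypothetical helper that parses the numbers, sorts them numerically (so that 100 sorts after 75, which a plain string sort would not do) and joins them with a fixed separator. The same helper is applied when storing and when searching, so a plain equality test is enough.

```python
import sqlite3


def canonical_scores(text: str) -> str:
    """Hypothetical helper: turn '83, 96, 75' into '75,83,96'."""
    numbers = sorted(int(part) for part in text.split(",") if part.strip())
    return ",".join(str(n) for n in numbers)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE studentScore (studentName TEXT, Scores TEXT)")

# Store the scores already in canonical (sorted) form.
conn.execute(
    "INSERT INTO studentScore VALUES (?, ?)",
    ("John Smith", canonical_scores("75,83, 96")),
)

# Canonicalize the user's input the same way, then compare with equality.
rows = conn.execute(
    "SELECT studentName FROM studentScore WHERE Scores = ?",
    (canonical_scores("83, 96, 75"),),
).fetchall()
print(rows)  # [('John Smith',)]
```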
Improve your database schema...this does not satisfy even the first normal form (http://en.wikipedia.org/wiki/Database_normalization#Normal_forms).
Improving the schema will save you plenty of headaches in the future (stemming from update anomalies).
No SQL table should have multiple values for an attribute (in the same column). Are the scores stored as a string? If so, your query will be more complicated and you're defeating the purpose of the DB.
However, to answer your question:
SELECT col4, col3, col2 FROM students WHERE col1 = 57;
this will return columns 4, 3, and 2 in that order (4,3,2) even if they are saved in the order 1, 2, 3, 4. SQL returns the things you ask for in the order you ask for them.
So yeah, I agree with everyone else that this design is crap. If you were to normalize this table properly, you would be able to very easily get the data you need.
However, this is how you could do it with the current structure. Split the user input into discrete scores. Then, pass each value into the procedure.
CREATE PROCEDURE FindStudentByScores
(
@score1 AS VARCHAR(3) = NULL
,@score2 AS VARCHAR(3) = NULL
,@score3 AS VARCHAR(3) = NULL
)
AS
BEGIN
SELECT *
FROM [Students]
WHERE ( @score1 IS NULL
OR [Scores] LIKE '%' + @score1 + '%' )
AND ( @score2 IS NULL
OR [Scores] LIKE '%' + @score2 + '%' )
AND ( @score3 IS NULL
OR [Scores] LIKE '%' + @score3 + '%' )
END
You could use regular expressions or the LIKE operator.
A pattern-matching solution with SIMILAR TO could look like
SIMILAR TO '%(SCORE1|SCORE2|SCORE3)%'
That's the easiest way to go, but I recommend changing your entire table structure, as has been mentioned a couple of times now: with the current layout you cannot take advantage of an index or key, which will exhaust the server with just a few dozen visitors.
This is an example of where database normalization should help you a lot.
You could store your data like this
(Edit: if you want to keep the order you can add an order column)
studentName Scores Order
John Smith 75 1
John Smith 83 2
John Smith 96 3
Foo bar 73 1
Foo bar 34 2
........
But if you are stuck with the current model your next best option is to have the Scores column sorted, then you just need to take the search string from the textbox, sort and format it correctly, then you can search.
Lastly, if the scores are not sorted in the table, you can generate all possible orderings
75, 83, 96
75, 96, 83
83, 75, 96
83, 96, 75
96, 75, 83
96, 83, 75
and search for them all with OR.
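Generating the orderings by hand gets impractical quickly (n scores give n! orderings), so here is a sketch that builds them with itertools.permutations and searches with IN, which is equivalent to the chain of ORs. It uses Python's sqlite3 module and assumes the stored strings use one consistent ", " separator; the sample row is illustrative.

```python
import sqlite3
from itertools import permutations

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE studentScore (studentName TEXT, Scores TEXT)")
conn.execute("INSERT INTO studentScore VALUES ('John Smith', '83, 75, 96')")

typed = ["75", "83", "96"]
# Every ordering of the typed scores, formatted like the stored strings.
candidates = [", ".join(p) for p in permutations(typed)]
placeholders = ", ".join("?" for _ in candidates)
rows = conn.execute(
    f"SELECT studentName FROM studentScore WHERE Scores IN ({placeholders})",
    candidates,
).fetchall()
print(len(candidates), rows)  # 6 [('John Smith',)]
```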

PostgreSQL - fetch the rows which have the Max value for a column in each GROUP BY group

I'm dealing with a Postgres table (called "lives") that contains records with columns for time_stamp, usr_id, trans_id, and lives_remaining. I need a query that will give me the most recent lives_remaining total for each usr_id.
There are multiple users (distinct usr_ids)
time_stamp is not a unique identifier: sometimes user events (one per row in the table) will occur with the same time_stamp.
trans_id is unique only for very small time ranges: over time it repeats
lives_remaining (for a given user) can both increase and decrease over time
example:
time_stamp|lives_remaining|usr_id|trans_id
-----------------------------------------
07:00 | 1 | 1 | 1
09:00 | 4 | 2 | 2
10:00 | 2 | 3 | 3
10:00 | 1 | 2 | 4
11:00 | 4 | 1 | 5
11:00 | 3 | 1 | 6
13:00 | 3 | 3 | 1
As I will need to access other columns of the row with the latest data for each given usr_id, I need a query that gives a result like this:
time_stamp|lives_remaining|usr_id|trans_id
-----------------------------------------
11:00 | 3 | 1 | 6
10:00 | 1 | 2 | 4
13:00 | 3 | 3 | 1
As mentioned, each usr_id can gain or lose lives, and sometimes these timestamped events occur so close together that they have the same timestamp! Therefore this query won't work:
SELECT b.time_stamp,b.lives_remaining,b.usr_id,b.trans_id FROM
(SELECT usr_id, max(time_stamp) AS max_timestamp
FROM lives GROUP BY usr_id ORDER BY usr_id) a
JOIN lives b ON a.max_timestamp = b.time_stamp
Instead, I need to use both time_stamp (first) and trans_id (second) to identify the correct row. I also then need to pass that information from the subquery to the main query that will provide the data for the other columns of the appropriate rows. This is the hacked up query that I've gotten to work:
SELECT b.time_stamp,b.lives_remaining,b.usr_id,b.trans_id FROM
(SELECT usr_id, max(time_stamp || '*' || trans_id)
AS max_timestamp_transid
FROM lives GROUP BY usr_id ORDER BY usr_id) a
JOIN lives b ON a.max_timestamp_transid = b.time_stamp || '*' || b.trans_id
ORDER BY b.usr_id
Okay, so this works, but I don't like it. It requires a query within a query and a self join, and it seems to me that it could be much simpler by grabbing the row that MAX found to have the largest timestamp and trans_id. The table "lives" has tens of millions of rows to parse, so I'd like this query to be as fast and efficient as possible. I'm new to RDBMSs and Postgres in particular, so I know that I need to make effective use of the proper indexes. I'm a bit lost on how to optimize.
I found a similar discussion here. Can I perform some type of Postgres equivalent to an Oracle analytic function?
Any advice on accessing related column information used by an aggregate function (like MAX), creating indexes, and creating better queries would be much appreciated!
P.S. You can use the following to create my example case:
create TABLE lives (time_stamp timestamp, lives_remaining integer,
usr_id integer, trans_id integer);
insert into lives values ('2000-01-01 07:00', 1, 1, 1);
insert into lives values ('2000-01-01 09:00', 4, 2, 2);
insert into lives values ('2000-01-01 10:00', 2, 3, 3);
insert into lives values ('2000-01-01 10:00', 1, 2, 4);
insert into lives values ('2000-01-01 11:00', 4, 1, 5);
insert into lives values ('2000-01-01 11:00', 3, 1, 6);
insert into lives values ('2000-01-01 13:00', 3, 3, 1);
I would propose a clean version based on DISTINCT ON (see docs):
SELECT DISTINCT ON (usr_id)
time_stamp,
lives_remaining,
usr_id,
trans_id
FROM lives
ORDER BY usr_id, time_stamp DESC, trans_id DESC;
The figures below were measured on a table with 158k pseudo-random rows (usr_id uniformly distributed between 0 and 10k, trans_id uniformly distributed between 0 and 30).
By query cost below, I mean the estimate from Postgres' cost-based optimizer (with Postgres' default xxx_cost values), which is a weighted estimate of the required I/O and CPU resources; you can obtain it by firing up PgAdminIII and running "Query/Explain (F7)" on the query with "Query/Explain options" set to "Analyze".
Quassnoi's query has a cost estimate of 745k (!), and completes in 1.3 seconds (given a compound index on (usr_id, trans_id, time_stamp))
Bill's query has a cost estimate of 93k, and completes in 2.9 seconds (given a compound index on (usr_id, trans_id))
Query #1 below has a cost estimate of 16k, and completes in 800ms (given a compound index on (usr_id, trans_id, time_stamp))
Query #2 below has a cost estimate of 14k, and completes in 800ms (given a compound function index on (usr_id, EXTRACT(EPOCH FROM time_stamp), trans_id))
this is Postgres-specific
Query #3 below (Postgres 8.4+) has a cost estimate and completion time comparable to (or better than) query #2 (given a compound index on (usr_id, time_stamp, trans_id)); it has the advantage of scanning the lives table only once and, should you temporarily increase (if needed) work_mem to accommodate the sort in memory, it will be by far the fastest of all queries.
All times above include retrieval of the full 10k rows result-set.
Your goal is a minimal cost estimate and minimal query execution time, with an emphasis on estimated cost. Query execution time can depend significantly on runtime conditions (e.g. whether the relevant rows are already fully cached in memory or not), whereas the cost estimate does not. On the other hand, keep in mind that the cost estimate is exactly that, an estimate.
The best query execution time is obtained when running on a dedicated database without load (e.g. playing with pgAdminIII on a development PC.) Query time will vary in production based on actual machine load/data access spread. When one query appears slightly faster (<20%) than the other but has a much higher cost, it will generally be wiser to choose the one with higher execution time but lower cost.
When you expect that there will be no competition for memory on your production machine at the time the query is run (e.g. the RDBMS cache and filesystem cache won't be thrashed by concurrent queries and/or filesystem activity) then the query time you obtained in standalone (e.g. pgAdminIII on a development PC) mode will be representative. If there is contention on the production system, query time will degrade proportionally to the estimated cost ratio, as the query with the lower cost does not rely as much on cache whereas the query with higher cost will revisit the same data over and over (triggering additional I/O in the absence of a stable cache), e.g.:
cost | time (dedicated machine) | time (under load) |
-------------------+--------------------------+-----------------------+
some query A: 5k | (all data cached) 900ms | (less i/o) 1000ms |
some query B: 50k | (all data cached) 900ms | (lots of i/o) 10000ms |
Do not forget to run ANALYZE lives once after creating the necessary indices.
Query #1
-- incrementally narrow down the result set via inner joins
-- the CBO may elect to perform one full index scan combined
-- with cascading index lookups, or as hash aggregates terminated
-- by one nested index lookup into lives - on my machine
-- the latter query plan was selected given my memory settings and
-- histogram
SELECT
l1.*
FROM
lives AS l1
INNER JOIN (
SELECT
usr_id,
MAX(time_stamp) AS time_stamp_max
FROM
lives
GROUP BY
usr_id
) AS l2
ON
l1.usr_id = l2.usr_id AND
l1.time_stamp = l2.time_stamp_max
INNER JOIN (
SELECT
usr_id,
time_stamp,
MAX(trans_id) AS trans_max
FROM
lives
GROUP BY
usr_id, time_stamp
) AS l3
ON
l1.usr_id = l3.usr_id AND
l1.time_stamp = l3.time_stamp AND
l1.trans_id = l3.trans_max
Query #2
-- cheat to obtain a max of the (time_stamp, trans_id) tuple in one pass
-- this results in a single table scan and one nested index lookup into lives,
-- by far the least I/O intensive operation even in case of great scarcity
-- of memory (least reliant on cache for the best performance)
SELECT
l1.*
FROM
lives AS l1
INNER JOIN (
SELECT
usr_id,
MAX(ARRAY[EXTRACT(EPOCH FROM time_stamp),trans_id])
AS compound_time_stamp
FROM
lives
GROUP BY
usr_id
) AS l2
ON
l1.usr_id = l2.usr_id AND
EXTRACT(EPOCH FROM l1.time_stamp) = l2.compound_time_stamp[1] AND
l1.trans_id = l2.compound_time_stamp[2]
2013/01/29 update
Finally, as of version 8.4, Postgres supports Window Function meaning you can write something as simple and efficient as:
Query #3
-- use Window Functions
-- performs a SINGLE scan of the table
SELECT DISTINCT ON (usr_id)
last_value(time_stamp) OVER wnd,
last_value(lives_remaining) OVER wnd,
usr_id,
last_value(trans_id) OVER wnd
FROM lives
WINDOW wnd AS (
PARTITION BY usr_id ORDER BY time_stamp, trans_id
ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
);
Here's another method, which happens to use no correlated subqueries or GROUP BY. I'm not expert in PostgreSQL performance tuning, so I suggest you try both this and the solutions given by other folks to see which works better for you.
SELECT l1.*
FROM lives l1 LEFT OUTER JOIN lives l2
ON (l1.usr_id = l2.usr_id AND (l1.time_stamp < l2.time_stamp
OR (l1.time_stamp = l2.time_stamp AND l1.trans_id < l2.trans_id)))
WHERE l2.usr_id IS NULL
ORDER BY l1.usr_id;
I am assuming that trans_id is unique at least over any given value of time_stamp.
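Because this exclusion join uses only standard SQL, it can be checked against the example data from the question without Postgres. A sketch with Python's sqlite3 module: SQLite has no timestamp type, so time_stamp is stored as ISO-8601 text, which compares in chronological order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE lives (time_stamp TEXT, lives_remaining INTEGER,
                    usr_id INTEGER, trans_id INTEGER);
INSERT INTO lives VALUES
  ('2000-01-01 07:00', 1, 1, 1), ('2000-01-01 09:00', 4, 2, 2),
  ('2000-01-01 10:00', 2, 3, 3), ('2000-01-01 10:00', 1, 2, 4),
  ('2000-01-01 11:00', 4, 1, 5), ('2000-01-01 11:00', 3, 1, 6),
  ('2000-01-01 13:00', 3, 3, 1);
""")

# Keep l1 only when no later row (by time_stamp, then trans_id) exists
# for the same user.
rows = conn.execute("""
SELECT l1.*
FROM lives l1 LEFT OUTER JOIN lives l2
  ON (l1.usr_id = l2.usr_id AND (l1.time_stamp < l2.time_stamp
      OR (l1.time_stamp = l2.time_stamp AND l1.trans_id < l2.trans_id)))
WHERE l2.usr_id IS NULL
ORDER BY l1.usr_id
""").fetchall()
for row in rows:
    print(row)
```

The result matches the expected output in the question.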
PostgreSQL has an option called DISTINCT ON (a long-standing PostgreSQL extension, available well before 9.5):
SELECT DISTINCT ON (location) location, time, report
FROM weather_reports
ORDER BY location, time DESC;
It eliminates duplicate rows and keeps only the first row of each set of duplicates, as defined by the ORDER BY clause.
See the official documentation.
I like the style of Mike Woodhouse's answer on the other page you mentioned. It's especially concise when the thing being maximised over is just a single column, in which case the subquery can just use MAX(some_col) and GROUP BY the other columns, but in your case you have a 2-part quantity to be maximised, you can still do so by using ORDER BY plus LIMIT 1 instead (as done by Quassnoi):
SELECT *
FROM lives o
WHERE (usr_id, time_stamp, trans_id) IN (
SELECT usr_id, time_stamp, trans_id
FROM lives sq
WHERE sq.usr_id = o.usr_id
ORDER BY time_stamp DESC, trans_id DESC
LIMIT 1
)
I find using the row-constructor syntax WHERE (a, b, c) IN (subquery) nice because it cuts down on the amount of verbiage needed.
Actually, there's a hacky solution for this problem. Let's say you want to select the biggest tree of each forest in a region.
SELECT (array_agg(tree.id ORDER BY tree.size DESC))[1]
FROM tree JOIN forest ON (tree.forest = forest.id)
GROUP BY forest.id
When you group trees by forest, each group is an unsorted list of trees and you need to find the biggest one. The first thing to do is to sort the rows by size and select the first one in the list. It may seem inefficient, but if you have millions of rows it will be considerably faster than the solutions that involve JOINs and WHERE conditions.
BTW, note that ORDER BY inside array_agg was introduced in PostgreSQL 9.0.
You can do it with window functions
SELECT t.*
FROM
(SELECT
*,
ROW_NUMBER() OVER(PARTITION BY usr_id ORDER BY time_stamp DESC, trans_id DESC) as r
FROM lives) as t
WHERE t.r = 1
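Because the question stresses that two events for the same user can share a time_stamp, the ranking needs trans_id as a tie-breaker. Here is a sketch with Python's sqlite3 module on the example data (SQLite 3.25 or later for window functions; time_stamp stored as ISO text) that orders by time_stamp DESC, trans_id DESC inside OVER.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE lives (time_stamp TEXT, lives_remaining INTEGER,
                    usr_id INTEGER, trans_id INTEGER);
INSERT INTO lives VALUES
  ('2000-01-01 07:00', 1, 1, 1), ('2000-01-01 09:00', 4, 2, 2),
  ('2000-01-01 10:00', 2, 3, 3), ('2000-01-01 10:00', 1, 2, 4),
  ('2000-01-01 11:00', 4, 1, 5), ('2000-01-01 11:00', 3, 1, 6),
  ('2000-01-01 13:00', 3, 3, 1);
""")

# Rank each user's rows newest first; trans_id breaks time_stamp ties.
rows = conn.execute("""
SELECT time_stamp, lives_remaining, usr_id, trans_id
FROM (
    SELECT *,
           ROW_NUMBER() OVER (PARTITION BY usr_id
                              ORDER BY time_stamp DESC, trans_id DESC) AS r
    FROM lives
) AS t
WHERE t.r = 1
ORDER BY usr_id
""").fetchall()
for row in rows:
    print(row)
```

Without the trans_id tie-breaker, user 1's two 11:00 rows would tie and either could be returned.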
SELECT l.*
FROM (
SELECT DISTINCT usr_id
FROM lives
) lo, lives l
WHERE l.ctid = (
SELECT ctid
FROM lives li
WHERE li.usr_id = lo.usr_id
ORDER BY
time_stamp DESC, trans_id DESC
LIMIT 1
)
Creating an index on (usr_id, time_stamp, trans_id) will greatly improve this query.
You should always, always have some kind of PRIMARY KEY in your tables.
I think you've got one major problem here: there's no monotonically increasing "counter" to guarantee that a given row has happened later in time than another. Take this example:
timestamp lives_remaining user_id trans_id
10:00 4 3 5
10:00 5 3 6
10:00 3 3 1
10:00 2 3 2
You cannot determine from this data which is the most recent entry. Is it the second one or the last one? There is no sort or max() function you can apply to any of this data to give you the correct answer.
Increasing the resolution of the timestamp would be a huge help. Since the database engine serializes requests, with sufficient resolution you can guarantee that no two timestamps will be the same.
Alternatively, use a trans_id that won't roll over for a very, very long time. Having a trans_id that rolls over means you can't tell (for the same timestamp) whether trans_id 6 is more recent than trans_id 1 unless you do some complicated math.