COUNT and GROUP BY on text fields seem slow - sql

I'm building a MySQL database which contains entries about special substrings of DNA in species of yeast. My table looks like this:
+--------------+---------+------+-----+---------+-------+
| Field        | Type    | Null | Key | Default | Extra |
+--------------+---------+------+-----+---------+-------+
| species      | text    | YES  | MUL | NULL    |       |
| region       | text    | YES  | MUL | NULL    |       |
| gene         | text    | YES  | MUL | NULL    |       |
| startPos     | int(11) | YES  |     | NULL    |       |
| repeatLength | int(11) | YES  |     | NULL    |       |
| coreLength   | int(11) | YES  |     | NULL    |       |
| sequence     | text    | YES  | MUL | NULL    |       |
+--------------+---------+------+-----+---------+-------+
There are approximately 1.8 million records. In one type of query I want to see how many DNA substrings are associated with each type of species and region, so I issue this query:
select species, region, count(*) from mytablename group by species, region;
The species and region columns have only two possible entries (conserved/scer for species, and promoter/coding for region), yet this query takes about 30 seconds.
Is this a normal amount of time to expect for this type of query given the size of the table? Is it slow because I'm using text fields instead of simple integer or boolean values? (I prefer text fields as several non-CS researchers will be using the DB.) Any other ideas and suggestions would be welcome.
Please excuse if this is a boneheaded question, I am an SQL neophyte.
P.S. I've also seen this question but the proposed solution doesn't seem relevant for what I'm doing.
EDIT: Converting those fields to VARCHARs reduced the runtime to ~2.5 seconds. Note I also timed it against ENUMs which had a similar timing.

Why are all your string-based columns defined as TEXT? If you read the performance comparison, you'll see that TEXT was ~3x slower than a VARCHAR column using identical indexing: http://forums.mysql.com/read.php?24,105964,105964

If your fields are only ever going to have 2 values, you're much better off making them booleans. You should also make everything NOT NULL unless there's a real reason you'll need it to be NULL.
Also take a look at the ENUM type for a better way to use a finite number of human-readable values for a column.
As for slowness, the first thing to try is to create indices on your columns. For the particular query you're showing here, an index on species, region should make a huge difference:
create index species_region on mytablename (species, region);
should do it.
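For instance, the type conversion mentioned in the question's EDIT might look like this (just a sketch; the table name and VARCHAR lengths are placeholders, so adjust to your data):
ALTER TABLE mytablename
  MODIFY species VARCHAR(20) NOT NULL,
  MODIFY region  VARCHAR(20) NOT NULL;
-- or, as ENUMs over the two known values:
-- ALTER TABLE mytablename
--   MODIFY species ENUM('conserved','scer') NOT NULL,
--   MODIFY region  ENUM('promoter','coding') NOT NULL;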


Access text count in query design

I am new to Access and am trying to develop a query that will allow me to count the number of occurrences of one word in each field from a table with 15 fields.
The table simply stores test results for employees. There is one table that stores the employee identification - id, name, etc.
The second table has 15 fields - A1 through A15 with the words correct or incorrect in each field. I need the total number of incorrect occurrences for each field, not for the entire table.
Is there an answer through Query Design, or is code required?
The solution, whether Query Design, or code, would be greatly appreciated!
Firstly, one of the reasons that you are struggling to obtain the desired result for what should be a relatively straightforward request is that your data does not follow database normalisation rules, and consequently, you are working against the natural operation of an RDBMS when querying your data.
From your description, I assume that the fields A1 through A15 are answers to questions on a test.
By representing these as separate fields within your database, aside from the inherent difficulty in querying the resulting data (as you have discovered), if ever you wanted to add or remove a question to/from the test, you would be forced to restructure your entire database!
Instead, I would suggest structuring your table in the following way:
Results
+------------+------------+-----------+
| EmployeeID | QuestionID | Result    |
+------------+------------+-----------+
| 1          | 1          | correct   |
| 1          | 2          | incorrect |
| ...        | ...        | ...       |
| 1          | 15         | correct   |
| 2          | 1          | correct   |
| 2          | 2          | correct   |
| ...        | ...        | ...       |
+------------+------------+-----------+
This table would be a junction table (a.k.a. linking / cross-reference table) in your database, supporting a many-to-many relationship between the tables Employees & Questions, which might look like the following:
Employees
+--------+-----------+-----------+------------+------------+-----+
| Emp_ID | Emp_FName | Emp_LName | Emp_DOB    | Emp_Gender | ... |
+--------+-----------+-----------+------------+------------+-----+
| 1      | Joe       | Bloggs    | 01/01/1969 | M          | ... |
| ...    | ...       | ...       | ...        | ...        | ... |
+--------+-----------+-----------+------------+------------+-----+
Questions
+-------+------------------------------------------------------------+--------+
| Qu_ID | Qu_Desc                                                    | Qu_Ans |
+-------+------------------------------------------------------------+--------+
| 1     | What is the meaning of life, the universe, and everything? | 42     |
| ...   | ...                                                        | ...    |
+-------+------------------------------------------------------------+--------+
With this structure, if ever you wish to add or remove a question from the test, you can simply add or remove a record from the table without needing to restructure your database or rewrite any of the queries, forms, or reports which depend upon the existing structure.
Furthermore, since the result of an answer is likely to be a binary correct or incorrect, this would be better (and far more efficiently) represented using a Boolean True/False data type, e.g.:
Results
+------------+------------+--------+
| EmployeeID | QuestionID | Result |
+------------+------------+--------+
| 1          | 1          | True   |
| 1          | 2          | False  |
| ...        | ...        | ...    |
| 1          | 15         | True   |
| 2          | 1          | True   |
| 2          | 2          | True   |
| ...        | ...        | ...    |
+------------+------------+--------+
Not only does this consume less memory in your database, but it may be indexed far more efficiently (yielding faster queries), and it removes all ambiguity and potential for error surrounding typos & case sensitivity.
With this new structure, if you wanted to see the number of correct answers for each employee, the query can be something as simple as:
select results.employeeid, count(*)
from results
where results.result = true
group by results.employeeid
Alternatively, if you wanted to view the number of employees answering each question correctly (for example, to understand which questions most employees got wrong), you might use something like:
select results.questionid, count(*)
from results
where results.result = true
group by results.questionid
The above are obviously very basic example queries, and you would likely want to join the Results table to an Employees table and a Questions table to obtain richer information about the results.
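For instance, a sketch of such a join in Access SQL, listing each employee's incorrect answers alongside the question text (field names follow the suggested structure above, so adjust to your actual names):
SELECT e.Emp_FName, e.Emp_LName, q.Qu_Desc, r.Result
FROM (Results AS r
INNER JOIN Employees AS e ON r.EmployeeID = e.Emp_ID)
INNER JOIN Questions AS q ON r.QuestionID = q.Qu_ID
WHERE r.Result = False;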
Contrast the above with your current database structure -
Per your original question:
The second table has 15 fields - A1 through A15 with the words correct or incorrect in each field. I need the total number of incorrect occurrences for each field, not for the entire table.
Assuming that you want to view the number of incorrect answers by employee, you are forced to use an incredibly messy query such as the following:
select
employeeid,
iif(A1='incorrect',1,0)+
iif(A2='incorrect',1,0)+
iif(A3='incorrect',1,0)+
iif(A4='incorrect',1,0)+
iif(A5='incorrect',1,0)+
iif(A6='incorrect',1,0)+
iif(A7='incorrect',1,0)+
iif(A8='incorrect',1,0)+
iif(A9='incorrect',1,0)+
iif(A10='incorrect',1,0)+
iif(A11='incorrect',1,0)+
iif(A12='incorrect',1,0)+
iif(A13='incorrect',1,0)+
iif(A14='incorrect',1,0)+
iif(A15='incorrect',1,0) as IncorrectAnswers
from
YourTable
Here, notice that the answer numbers are also hard-coded into the query, meaning that if you decide to add a new question or remove an existing question, not only would you need to restructure your entire database, but queries such as the above would also need to be rewritten.

How to make postgres search fast

I have a Postgresql-database with more than 100 billion rows in one table.
The table schema is as follow:
id_1       | integer                     |  | not null |
id_2       | bigint                      |  | not null |
created_at | timestamp without time zone |  | not null |
id_3       | bigint                      |  |          |
char1      | character varying(20)       |  | not null |
lang       | character(6)                |  | not null |
gps        | point                       |  |          |
some_dat   | character varying(140)[]    |  |          |
JSON       | jsonb                       |  | not null |
I'm trying to search inside the JSON object and sort the data by values in the JSON object, but the problem is that sorting and returning the data takes too much time.
Sorting the data by created_at, for example, also takes time.
I'm trying to make my application as real-time as I can.
I have two indexes, on id_1 and id_2.
I also tried to use a materialized view for each id, but the problem is that updating the materialized view also takes much time.
Any suggestions please?
I'm running PostgreSQL 10.3, on a Linux server with SSD and 128 GB of ram.
Thanks,
If you want to sort a query result with an expression like this:
ORDER BY expr1, expr2, ...
You need the following index to speed up the sorting:
CREATE INDEX ON atable ((expr1), (expr2), ...);
If that does not work because the expressions contain functions that are not IMMUTABLE, you cannot speed up the sort with an index. In that case, consider rewriting your query with IMMUTABLE expressions.
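For example, if the sort key lives inside the jsonb column, an expression index over that key can serve the sort, since the ->> operator is IMMUTABLE. A sketch (jsoncol and 'some_key' are placeholders for your actual column and key):
CREATE INDEX ON atable ((jsoncol ->> 'some_key'));
-- a query of this shape can then read rows in index order instead of sorting:
SELECT * FROM atable ORDER BY jsoncol ->> 'some_key' LIMIT 100;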

Select * sql query vs Select specific columns sql query [duplicate]

Possible Duplicate:
Why is SELECT * considered harmful?
Probably a database nOOb question.
Our application has a table like the following
TABLE WF
+------------+-------------+------+-----+---------+----------------+
| Field      | Type        | Null | Key | Default | Extra          |
+------------+-------------+------+-----+---------+----------------+
| id         | int(11)     | NO   | PRI | NULL    | auto_increment |
| children   | text        | YES  |     | NULL    |                |
| w_id       | int(11)     | YES  |     | NULL    |                |
| f_id       | int(11)     | YES  |     | NULL    |                |
| filterable | tinyint(1)  | YES  |     | 1       |                |
| created_at | datetime    | YES  |     | NULL    |                |
| updated_at | datetime    | YES  |     | NULL    |                |
| status     | smallint(6) | YES  |     | 1       |                |
| visible    | tinyint(1)  | YES  |     | 1       |                |
| weight     | int(11)     | YES  |     | NULL    |                |
| root       | tinyint(1)  | YES  |     | 0       |                |
| mfr        | tinyint(1)  | YES  |     | 0       |                |
+------------+-------------+------+-----+---------+----------------+
This table is expected to be upwards of ten million records. The schema is not expected to change much. I need to retrieve the columns f_id, children, status, visible, weight, root, mfr.
Which approach is faster for data retrieval?
1) Select * from WF where w_id = 1 AND status = 1;
I will strip the unnecessary columns in the application layer.
2) Select children,f_id,status,visible,weight,root,mfr from WF where w_id = 1 AND status = 1;
There is no need to strip the unnecessary columns as its pre-selected in the query.
Does anyone have a real-life benchmark as to which is faster? I know some say SELECT * is evil, but will MySQL respond faster while trying to get the whole chunk as opposed to retrieving selective columns?
I am using MySQL version: 5.1.37-1ubuntu5 (Ubuntu) and the application is Rails3 app.
As an example of how a select statement that includes a subset of columns can be significantly faster: it can use a covering index that includes just those columns, potentially resulting in much better query performance.
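For instance, children is a TEXT column, so MySQL can't include it in an ordinary index, but a query over the remaining columns could be covered entirely (a sketch; the index name is made up):
CREATE INDEX idx_wf_cover ON WF (w_id, status, f_id, visible, weight, root, mfr);
-- this can then be answered from the index alone, never touching the table rows:
SELECT f_id, visible, weight, root, mfr FROM WF WHERE w_id = 1 AND status = 1;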
If you return fewer columns there is less data to go across the network and less data for the database to process, and it will almost always return faster. Databases also tend to be slower using select * because the database then has to go figure out what the columns are, and thus does more work than when you specify them. Further, select * will often return bad results if the structure changes significantly. It may end up showing the user fields you don't want them to see, or if someone is silly enough to rearrange the columns, the application may actually appear to show things in the wrong order, or, if doing an insert from the data, put them in the wrong column. It is almost always a poor practice to use select * in production code.

Optimize GROUP BY after ranged index query

I have a content application that needs to count responses in a time slice, then order them by number of responses. It currently works great with a small data set, but needs to scale to millions rows. My current query won't work.
mysql> describe Responses;
+-------------+---------------------+------+-----+---------+-------+
| Field       | Type                | Null | Key | Default | Extra |
+-------------+---------------------+------+-----+---------+-------+
| site_id     | int(10) unsigned    | NO   | MUL | NULL    |       |
| content_id  | bigint(20) unsigned | NO   | PRI | NULL    |       |
| response_id | bigint(20) unsigned | NO   | PRI | NULL    |       |
| date        | int(10) unsigned    | NO   |     | NULL    |       |
+-------------+---------------------+------+-----+---------+-------+
The table type is InnoDB, the primary key is on (content_id, response_id). There is an additional index on (content_id, date) used to find responses to a piece of content, and another additional index on (site_id, date) used in the query I am have trouble with:
mysql> explain SELECT content_id id, COUNT(response_id) num_responses
FROM Responses
WHERE site_id = 1
AND date > 1234567890
AND date < 1293579867
GROUP BY content_id
ORDER BY num_responses DESC
LIMIT 0, 10;
+----+-------------+-----------+-------+---------------+------+---------+------+------+------------------------------------------------------------+
| id | select_type | table     | type  | possible_keys | key  | key_len | ref  | rows | Extra                                                      |
+----+-------------+-----------+-------+---------------+------+---------+------+------+------------------------------------------------------------+
|  1 | SIMPLE      | Responses | range | date          | date | 8       | NULL |  102 | Using where; Using index; Using temporary; Using filesort  |
+----+-------------+-----------+-------+---------------+------+---------+------+------+------------------------------------------------------------+
That's the best I've been able to come up with, but it will end up being millions of rows that need to be counted, resulting in tens of thousands of rows to sort, just to pull in a handful of results.
I can't think of a way to precalculate the count either, as the date range is arbitrary. I have some liberty with changing the primary key: it can be composed of content_id, response_id, and site_id in any order, but cannot contain date.
The application is developed mostly in PHP, so if there is a quicker way to accomplish the same results by splitting the query into subqueries, using temporary tables, or doing things on the application side, I'm open to suggestions.
(Reposted from comments by request)
Set up a table that has three columns: id, date, and num_responses. The column num_responses consists of the number of responses for the given id on the given date. Backfill the table appropriately, and then at around midnight (or later) each night, run a script that adds a new row for the previous day.
Then, to get the rows you want, you can merely query the table mentioned above.
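A sketch of that summary table and the nightly backfill (all names are hypothetical; site_id is included so the per-site filter from the original query still works):
CREATE TABLE DailyResponseCounts (
  site_id       INT UNSIGNED NOT NULL,
  content_id    BIGINT UNSIGNED NOT NULL,
  day           DATE NOT NULL,
  num_responses INT UNSIGNED NOT NULL,
  PRIMARY KEY (site_id, content_id, day)
) ENGINE=InnoDB;

-- run nightly: roll up the previous day's responses
INSERT INTO DailyResponseCounts (site_id, content_id, day, num_responses)
SELECT site_id, content_id, DATE(FROM_UNIXTIME(date)), COUNT(*)
FROM Responses
WHERE date >= UNIX_TIMESTAMP(CURDATE() - INTERVAL 1 DAY)
  AND date < UNIX_TIMESTAMP(CURDATE())
GROUP BY site_id, content_id;

-- the top-10 query then scans the much smaller summary table
-- (the day range here just approximates the unix timestamps in the question):
SELECT content_id id, SUM(num_responses) num_responses
FROM DailyResponseCounts
WHERE site_id = 1 AND day >= '2009-02-14' AND day < '2010-12-29'
GROUP BY content_id
ORDER BY num_responses DESC
LIMIT 0, 10;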
Rather than calculating each time, how about caching the calculated count from the last query, then updating the cache by adding only the new rows (using a date condition in the WHERE clause)?
Have you considered partitioning the table by date? Are there any indices on the table?

What is a structured way to build a MySQL query?

I consider myself fairly competent in understanding and manipulating C-ish languages; it's not a problem for me to come up with an algorithm and implement it in any C-ish language.
I have tremendous difficulty writing SQL (in my specific case, MySQL) queries. For very simple queries, it isn't a problem, but for complex queries, I become frustrated not knowing where to start. Reading the MySQL documentation is difficult, mainly because the syntax description and explanation isn't organized very well.
For example, the SELECT documentation is all over the map: it starts out with what looks like pseudo-BNF, but then (since the text for aggregate descriptions, like select_expr, isn't clickable) it quickly devolves into this frustrating exercise of trying to piece the syntax together yourself by having a number of browser windows open.
Enough whining.
I'd like to know how people, step by step, begin constructing a complex MySQL query. Here is a specific example. I have three tables below. I want to SELECT a set of rows with the following characteristics:
From the userInfo and userProgram tables, I want to select the userName, isApproved, and modifiedTimestamp fields and UNION them into one set. From this set I want to ORDER BY modifiedTimestamp, taking the MAX(modifiedTimestamp) for every user (i.e. there should be only one row per unique userName, and the timestamp associated with that userName should be as high as possible).
From the user table, I want to match the firstName and lastName that is associated with the userName so that it looks something like this:
+-----------+----------+----------+-------------------+
| firstName | lastName | userName | modifiedTimestamp |
+-----------+----------+----------+-------------------+
| JJ        | Prof     | jjprofUs | 1289914725        |
| User      | 2        | user2    | 1289914722        |
| User      | 1        | user1    | 1289914716        |
| User      | 3        | user3    | 1289914713        |
| User      | 4        | user4    | 1289914712        |
| User      | 5        | user5    | 1289914711        |
+-----------+----------+----------+-------------------+
The closest I've got is a query that looks like this:
(SELECT firstName, lastName, user.userName, modifiedTimestamp
FROM user, userInfo
WHERE user.userName=userInfo.userName)
UNION
(SELECT firstName, lastName, user.userName, modifiedTimestamp
FROM user, userProgram
WHERE user.userName=userProgram.userName)
ORDER BY modifiedTimestamp DESC;
I feel like I'm pretty close but I don't know where to go from here or even if I'm thinking about this in the right way.
> user
+--------------------+--------------+------+-----+---------+-------+
| Field              | Type         | Null | Key | Default | Extra |
+--------------------+--------------+------+-----+---------+-------+
| userName           | char(8)      | NO   | PRI | NULL    |       |
| firstName          | varchar(255) | NO   |     | NULL    |       |
| lastName           | varchar(255) | NO   |     | NULL    |       |
| email              | varchar(255) | NO   | UNI | NULL    |       |
| avatar             | varchar(255) | YES  |     | ''      |       |
| password           | varchar(255) | NO   |     | NULL    |       |
| passwordHint       | text         | YES  |     | NULL    |       |
| access             | int(11)      | NO   |     | 1       |       |
| lastLoginTimestamp | int(11)      | NO   |     | -1      |       |
| isActive           | tinyint(4)   | NO   |     | 1       |       |
+--------------------+--------------+------+-----+---------+-------+
> userInfo
+-------------------+------------+------+-----+---------+-------+
| Field             | Type       | Null | Key | Default | Extra |
+-------------------+------------+------+-----+---------+-------+
| userName          | char(8)    | NO   | MUL | NULL    |       |
| isApproved        | tinyint(4) | NO   |     | 0       |       |
| modifiedTimestamp | int(11)    | NO   |     | NULL    |       |
| field             | char(255)  | YES  |     | NULL    |       |
| value             | text       | YES  |     | NULL    |       |
+-------------------+------------+------+-----+---------+-------+
> userProgram
+-------------------+--------------+------+-----+---------+-------+
| Field             | Type         | Null | Key | Default | Extra |
+-------------------+--------------+------+-----+---------+-------+
| userName          | char(8)      | NO   | PRI | NULL    |       |
| isApproved        | tinyint(4)   | NO   | PRI | 0       |       |
| modifiedTimestamp | int(11)      | NO   |     | NULL    |       |
| name              | varchar(255) | YES  |     | NULL    |       |
| address1          | varchar(255) | YES  |     | NULL    |       |
| address2          | varchar(255) | YES  |     | NULL    |       |
| city              | varchar(50)  | YES  |     | NULL    |       |
| state             | char(2)      | YES  | MUL | NULL    |       |
| zip               | char(10)     | YES  |     | NULL    |       |
| phone             | varchar(25)  | YES  |     | NULL    |       |
| fax               | varchar(25)  | YES  |     | NULL    |       |
| ehsChildren       | int(11)      | YES  |     | NULL    |       |
| hsChildren        | int(11)      | YES  |     | NULL    |       |
| siteCount         | int(11)      | YES  |     | NULL    |       |
| staffCount        | int(11)      | YES  |     | NULL    |       |
| grantee           | varchar(255) | YES  |     | NULL    |       |
| programType       | varchar(255) | YES  |     | NULL    |       |
| additional        | text         | YES  |     | NULL    |       |
+-------------------+--------------+------+-----+---------+-------+
From what I understand of your question, you seem to need a correlated query, which would look like this:
(SELECT firstName, lastName, user.userName, modifiedTimestamp
FROM user, userInfo ui1
WHERE user.userName=ui1.userName
AND modifiedtimestamp=(select max(modifiedtimestamp) from userInfo ui2 where ui1.userName=ui2.userName))
UNION
(SELECT firstName, lastName, user.userName, modifiedTimestamp
FROM user, userProgram up1
WHERE user.userName=up1.userName
AND modifiedtimestamp=(select max(modifiedtimestamp) from userProgram up2 where up1.userName=up2.userName))
ORDER BY modifiedTimestamp DESC;
So, how do I proceed to get to this result? The key is: express clearly the information you want to retrieve, without taking mental shortcuts.
Step 1: Choose the fields you need from the different tables of your database. That's what goes between SELECT and FROM. Seems obvious, but it becomes less obvious when it comes to aggregation functions like sums or counts. In that case, you have to say, for example, "I need the count of lines in userInfo for each userName". See below in GROUP BY.
Step 2: Knowing the fields you need, write the joins between the different corresponding tables. That's an easy one...
Step 3: Express your conditions. It can be easy, like if you want data from user for userName="RZEZDFGBH", or more complicated, like in your case. If you want only the most recent modifiedtimestamp, the way to formulate it so you can get the thing done is "so that the modifiedtimestamp is equal to the most recent modifiedtimestamp" (that's where you can easily take a mental shortcut and miss the point).
Step 4: If you have aggregates, it's time to set the GROUP BY statement. For example, if you count all lines in userInfo for each userName, you would write "GROUP BY userName":
SELECT userName,count(*) FROM userInfo GROUP BY userName
This gives you the number of entries in the table for each different userName.
Step 5: HAVING conditions. These are conditions on the aggregates. In the previous example, if you wanted only the data for userNames having more than 5 lines in the table, you could write SELECT userName,count(*) FROM userInfo GROUP BY userName HAVING count(*)>5
Step 6: Sort with ORDER BY. Pretty easy...
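Putting the six steps together on the question's own userInfo table, a quick sketch (the isApproved filter and the >5 threshold are purely illustrative):
SELECT userName, count(*)
FROM userInfo
WHERE isApproved = 1
GROUP BY userName
HAVING count(*) > 5
ORDER BY count(*) DESC;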
That's only a short summary. There is much, much more to discover, but it would be too long to write an entire SQL course here... Hope it helps, though!
As f00 says, it's simple(r) if you think of the data in terms of sets.
One of the issues with the question as it stands is that the expected output doesn't match the stated requirements - the description mentions the isApproved column, but this doesn't appear anywhere in either the query or the expected output.
What this illustrates is that the first step in writing a query is to have a clear idea of what you want to achieve. The bigger issue with the question as it stands is that this is not clearly described - instead, it moves from a sample table of expected output (which would be more helpful if we had corresponding samples of expected input data) straight into a description of how you intend to achieve it.
As I understand it, what you want to see is a list of users (by username, with their associated first and last names), together with the last time any associated record was modified on either the userInfo or userProgram tables.
(It isn't clear whether you want to see users who have no associated activity on either of these other tables - your supplied query implies not, otherwise the joins would be outer joins.)
So, you want a list of users (by username, with their associated first and last names):
SELECT firstName, lastName, userName
FROM user
together with a list of times that records were last modified:
SELECT userName, MAX(modifiedTimestamp)
...
on either the userInfo or userProgram tables:
...
FROM
(SELECT userName, modifiedTimestamp FROM userInfo
UNION ALL
SELECT userName, modifiedTimestamp FROM userProgram
) subquery -- <- this is an alias
...
by userName:
...
group by userName
These two sets of data need to be linked by their userName - so the final query becomes:
SELECT user.firstName, user.lastName, user.userName,
MAX(subquery.modifiedTimestamp) last_modifiedTimestamp
FROM user
JOIN
(SELECT userName, modifiedTimestamp FROM userInfo
UNION ALL
SELECT userName, modifiedTimestamp FROM userProgram
) subquery
ON user.userName = subquery.userName
GROUP BY user.userName
In most versions of SQL, this query would return an error as user.firstName and user.lastName are not included in the GROUP BY clause, nor are they summarised.
MySQL allows this syntax - in other SQLs, since those fields are functionally dependent on userName, adding a MAX in front of each field or adding them to the grouping would achieve the same result.
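For instance, a version of the same query that also runs under stricter settings (such as MySQL's ONLY_FULL_GROUP_BY mode) simply extends the grouping:
SELECT user.firstName, user.lastName, user.userName,
       MAX(subquery.modifiedTimestamp) last_modifiedTimestamp
FROM user
JOIN
  (SELECT userName, modifiedTimestamp FROM userInfo
   UNION ALL
   SELECT userName, modifiedTimestamp FROM userProgram
  ) subquery
ON user.userName = subquery.userName
GROUP BY user.userName, user.firstName, user.lastName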
A couple of additional points:
UNION and UNION ALL are not identical - the former removes duplicates while the latter does not; this makes the former more processor-intensive.
Since duplicates will be removed by the grouping, it is better to use UNION ALL.
Many people will write this query as user joined to userInfo UNIONed ALL with user joined to userProgram - this is because many SQL engines can optimise this type of query more effectively.
At this point, this represents premature optimisation.
There's a lot of good stuff here. Thanks to everyone who contributed. This is a quick summary of the things I found helpful as well as some additional thoughts in connecting building functions to building queries. I wish I could give everyone SO merit badges/points but I think that there can only be one (answer) so I'm picking Traroth based upon point total and personal helpfulness.
A function can be understood as three parts: input, process, output. A query can be understood similarly. Most queries look something like this:
SELECT stuff FROM data WHERE data is like something
The SELECT portion is the output. There are some capabilities for formatting the output here (i.e. using AS)
The FROM portion is the input. The input should be seen as a pool of data; you will want to make this as specific as possible, using a variety of joins and subqueries that are appropriate.
The WHERE portion is like the process, but there's a lot of overlap with the FROM portion. Both the FROM and WHERE portions can reduce the pool of data appropriately using a variety of conditions to filter out unwanted data (or to only included desired data). The WHERE portion can also help format the output.
Here's how I broke down the steps:
Start with thinking about what your output looks like. This stuff goes into the SELECT portion.
Next, you want to define the set of data that you wish to work on. Traroth notes: "Knowing the field you need, write the joins between the different corresponding tables. That's an easy one..." It depends on what you mean by 'easy'. If you are new to writing queries, you will probably just default to writing inner joins (like I did). This is not always the best way to go. http://en.wikipedia.org/wiki/Join_(SQL) is a great resource to understanding the different kinds of joins possible.
As a part of the previous step, think about smaller parts of that data set and build up to the complete data set you are interested in. In writing a function, you can write subfunctions to help express your process more clearly. Similarly, you can write subqueries. A huge tip from Mark Bannister is creating a subquery AND USING AN ALIAS. You will have to reconfigure your output to use this alias, but this is pretty key.
Last, you can use various methods to pare down your data set, removing data you're not interested in.
One way to think about the data you are operating on is as a giant 2-D matrix: JOINs grow the matrix horizontally, while UNIONs grow it vertically. All the other filters are designed to shrink this matrix down to what is appropriate for your output. I don't know if there is a "functional" analogy to JOIN, but UNION is just adding the output of two functions together.
I realize, though, there are lots of ways that building query IS NOT like writing a function. For example, you can build and pare down your data set in both the FROM and WHERE areas. What was key for me was understanding joins and finding out how to create subqueries using aliases.
just learn to think in terms of sets - then it's simple :P
http://www.codinghorror.com/blog/2007/10/a-visual-explanation-of-sql-joins.html
You can't construct sql without understanding the data in the tables and the logical result required. There's no background given for what data the tables might look like and mean and the description of the results you're trying to gather doesn't make sense to me so I'm not going to venture a guess.
On the latter point... it's rare that you'd want a union of timestamp values from multiple sources. Generally speaking, when results like that are gathered, it's for some sort of auditing/tracing. However, when you're discarding all information about the source of the timestamp and just computing a maximum, you have... well, what exactly?
Anyways, one or more examples of data and desired output and maybe something about the application and the whys is a must to make yourself clear.
To the extent I'll make any prediction about the shape of your eventual statement, (assuming your task will still be to get a single maximum timestamp per user) it's that it will look something like this:
select u.firstname, u.lastname, user_max_time.userName, user_max_time.max_time
from users u,
( select (sometable).userName, max((sometable).(timestamp column))
from (data of interest)
group by (sometable).userName) user_max_time
where u.userName = user_max_time.userName
order by max_time desc;
Your task here would then be to replace the ()s inside the user_max_time subselect with something that makes sense and maps to your requirements. In terms of a general approach to complex sql, the major suggestion is to build the query from the innermost subselects back out (testing along the way to make sure performance is ok and you don't need intermediate tables).
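For instance, one way the ()s might be filled in, using the tables from the question (purely a sketch, since the exact requirement isn't settled):
select u.firstName, u.lastName, user_max_time.userName, user_max_time.max_time
from user u,
     ( select t.userName, max(t.modifiedTimestamp) max_time
       from ( select userName, modifiedTimestamp from userInfo
              union all
              select userName, modifiedTimestamp from userProgram ) t
       group by t.userName ) user_max_time
where u.userName = user_max_time.userName
order by max_time desc;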
Anyways, if you're having trouble and can come back with examples, I would be happy to help.
Cheers,
Ben