How can I select all entries with the highest version?

I have a table called documents that has the fields id, title, version and content.
Now I want to get the id (or the whole row) of the highest version of every title.
Suppose I have the following data:
+----+-------+---------+---------+
| id | title | version | content |
+----+-------+---------+---------+
| 2 | foo | 1 | abc1 |
| 3 | foo | 2 | abc2 |
| 4 | foo | 3 | abc3 |
| 5 | bar | 1 | abcd1 |
| 6 | bar | 2 | abcd2 |
| 7 | baz | 1 | abcde1 |
+----+-------+---------+---------+
I want to receive either the ids 4,6,7 or the whole rows for these entries.
Performance is not an issue as there will be only a few hundred entries.

To retrieve the entire rows, you need to GROUP BY title with a MAX(version) aggregate, and join that as a subquery against the whole table to pull in the remaining columns. The JOIN condition needs to be on the combination of title and version, since together they uniquely identify a title's greatest version.
The following should work independently of the RDBMS you are using:
SELECT
    documents.id,
    documents.title,
    documents.version,
    documents.content
FROM
    documents
    JOIN (
        /* subquery pulls greatest version per title */
        SELECT
            title,
            MAX(version) AS version
        FROM documents
        GROUP BY title
        /* title, version pair is joined back against the main table */
    ) maxversions ON (documents.title = maxversions.title AND documents.version = maxversions.version)
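To check the join-back query against the sample data, here is a minimal sketch that runs it through Python's built-in sqlite3 module on an in-memory database. SQLite is only a convenient stand-in here (an assumption; the answer itself targets any RDBMS), and the table has just the question's four columns.

```python
import sqlite3

# In-memory database holding the sample rows from the question.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE documents (id INTEGER PRIMARY KEY, title TEXT, version INTEGER, content TEXT)"
)
conn.executemany(
    "INSERT INTO documents VALUES (?, ?, ?, ?)",
    [
        (2, "foo", 1, "abc1"),
        (3, "foo", 2, "abc2"),
        (4, "foo", 3, "abc3"),
        (5, "bar", 1, "abcd1"),
        (6, "bar", 2, "abcd2"),
        (7, "baz", 1, "abcde1"),
    ],
)

# Greatest version per title, joined back on (title, version) to recover full rows.
LATEST_SQL = """
SELECT d.id, d.title, d.version, d.content
FROM documents AS d
JOIN (
    SELECT title, MAX(version) AS version
    FROM documents
    GROUP BY title
) AS maxversions
  ON d.title = maxversions.title AND d.version = maxversions.version
ORDER BY d.id
"""

latest = conn.execute(LATEST_SQL).fetchall()
for row in latest:
    print(row)  # ids 4, 6 and 7
```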

This is a simple group by:
select title, max(version) as version
from documents
group by title
You can join this back to documents on (title, version), if you like, to get the full document information.

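How about (TOP 1 is SQL Server/Access syntax; other databases use LIMIT 1 or FETCH FIRST 1 ROW ONLY):
SELECT * FROM documents d
WHERE d.version IN (
    SELECT TOP 1 version
    FROM documents
    WHERE title = d.title
    ORDER BY version DESC
)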

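This will get the IDs you want, provided each title's versions are numbered consecutively from 1 and a newer version always gets a higher id: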
Select max(id) as id from documents
group by title
having count(1) = max(version)
order by id
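That query relies on two assumptions worth making explicit: each title's versions run 1, 2, 3, ... without gaps, and a newer version always has a higher id. The sketch below uses Python's sqlite3 as a stand-in database and runs the query on the sample data, then on the sample data plus a hypothetical title "qux" whose version 2 is missing; the HAVING clause silently filters that title out. The helper name `ids_by_count_trick` is made up for illustration.

```python
import sqlite3


def ids_by_count_trick(rows):
    """Run the max(id) / HAVING count(1) = max(version) query over (id, title, version) rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE documents (id INTEGER PRIMARY KEY, title TEXT, version INTEGER)")
    conn.executemany("INSERT INTO documents VALUES (?, ?, ?)", rows)
    result = conn.execute(
        """
        SELECT MAX(id) AS id FROM documents
        GROUP BY title
        HAVING COUNT(1) = MAX(version)
        ORDER BY id
        """
    ).fetchall()
    conn.close()
    return [r[0] for r in result]


sample = [(2, "foo", 1), (3, "foo", 2), (4, "foo", 3), (5, "bar", 1), (6, "bar", 2), (7, "baz", 1)]
print(ids_by_count_trick(sample))  # [4, 6, 7]

# Hypothetical title "qux" with versions 1 and 3 only: COUNT(1) = 2 but MAX(version) = 3,
# so "qux" (latest id 9) is dropped from the result.
with_gap = sample + [(8, "qux", 1), (9, "qux", 3)]
print(ids_by_count_trick(with_gap))  # still [4, 6, 7]
```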

Related

SQL Join to the latest record in MS ACCESS

I want to join tables in MS Access in such a way that it fetches only the latest record from one of the tables. I've looked at the other solutions available on the site, but discovered that they only work for other versions of SQL. Here is a simplified version of my data:
PatientInfo Table:
+-----+------+
| ID | Name |
+-----+------+
| 1 | John |
| 2 | Tom |
| 3 | Anna |
+-----+------+
Appointments Table
+----+-----------+
| ID | Date |
+----+-----------+
| 1 | 5/5/2001 |
| 1 | 10/5/2012 |
| 1 | 4/20/2018 |
| 2 | 4/5/1999 |
| 2 | 8/8/2010 |
| 2 | 4/9/1982 |
| 3 | 7/3/1997 |
| 3 | 6/4/2015 |
| 3 | 3/4/2017 |
+----+-----------+
And here is a simplified version of the results that I need after the join:
+----+------+------------+
| ID | Name | Date |
+----+------+------------+
| 1 | John | 4/20/2018 |
| 2 | Tom | 8/8/2010 |
| 3 | Anna | 3/4/2017 |
+----+------+------------+
Thanks in advance for reading and for your help.
You can use aggregation and JOIN:
select pi.id, pi.name, max(a.date)
from appointments as a inner join
patientinfo as pi
on a.id = pi.id
group by pi.id, pi.name;
Something like this:
select P.ID, P.name, max(A.Date) as Dt
from PatientInfo P inner join Appointments A
on P.ID=A.ID
group by P.ID, P.name
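As a rough check of the aggregation approach, here is a sketch that uses Python's sqlite3 as a stand-in for Access. One assumption to flag: SQLite has no native date type, so the dates are stored as ISO-8601 text (YYYY-MM-DD), which sorts chronologically; the question's M/D/YYYY strings would sort alphabetically and MAX() would pick the wrong date. In Access itself, a Date/Time field avoids this.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE PatientInfo (ID INTEGER PRIMARY KEY, Name TEXT)")
# "Date" is quoted and stored as ISO-8601 text so MAX() compares chronologically.
conn.execute('CREATE TABLE Appointments (ID INTEGER, "Date" TEXT)')
conn.executemany("INSERT INTO PatientInfo VALUES (?, ?)", [(1, "John"), (2, "Tom"), (3, "Anna")])
conn.executemany(
    "INSERT INTO Appointments VALUES (?, ?)",
    [
        (1, "2001-05-05"), (1, "2012-10-05"), (1, "2018-04-20"),
        (2, "1999-04-05"), (2, "2010-08-08"), (2, "1982-04-09"),
        (3, "1997-07-03"), (3, "2015-06-04"), (3, "2017-03-04"),
    ],
)

# Join, then keep one row per patient with the latest appointment date.
rows = conn.execute(
    """
    SELECT P.ID, P.Name, MAX(A."Date") AS Dt
    FROM PatientInfo AS P
    INNER JOIN Appointments AS A ON P.ID = A.ID
    GROUP BY P.ID, P.Name
    ORDER BY P.ID
    """
).fetchall()
print(rows)
```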
Both Bing's and Gordon's answers work if your summary only needs one field from the joined table (the Max(Date)), but it gets trickier if you also want to report other fields from that table, since you would need to include each of them either as an aggregated field or in the GROUP BY as well.
E.g., if you want your summary to also include the assessment the patient was given at their last appointment, GROUP BY is not the way to go.
A more versatile structure may be something like
SELECT Patient.ID, Patient.Name, Appointment.Date, Appointment.Assessment
FROM Patient INNER JOIN Appointment ON Patient.ID=Appointment.ID
WHERE Appointment.Date = (SELECT Max(Appointment.Date) FROM Appointment WHERE Appointment.ID = Patient.ID)
;
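The correlated-subquery pattern can be tried the same way. This sketch uses SQLite via Python's sqlite3 as a stand-in for Access, stores dates as ISO-8601 text, and adds a hypothetical Assessment column with made-up values, purely to show that a non-aggregated column comes along from the latest appointment row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Patient (ID INTEGER PRIMARY KEY, Name TEXT)")
# Assessment is a hypothetical extra column; its values are illustrative only.
conn.execute('CREATE TABLE Appointment (ID INTEGER, "Date" TEXT, Assessment TEXT)')
conn.executemany("INSERT INTO Patient VALUES (?, ?)", [(1, "John"), (2, "Tom")])
conn.executemany(
    "INSERT INTO Appointment VALUES (?, ?, ?)",
    [
        (1, "2012-10-05", "stable"),
        (1, "2018-04-20", "improving"),
        (2, "2010-08-08", "referred"),
        (2, "1999-04-05", "new"),
    ],
)

# The correlated subquery finds each patient's latest date; the outer WHERE keeps only
# that appointment row, so any of its columns can be selected without GROUP BY.
rows = conn.execute(
    """
    SELECT Patient.ID, Patient.Name, Appointment."Date", Appointment.Assessment
    FROM Patient INNER JOIN Appointment ON Patient.ID = Appointment.ID
    WHERE Appointment."Date" = (
        SELECT MAX(A2."Date") FROM Appointment AS A2 WHERE A2.ID = Patient.ID
    )
    ORDER BY Patient.ID
    """
).fetchall()
print(rows)
```

If a patient has two appointments on the same latest date, both rows are returned; that follows from comparing with `=` against MAX().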
As an aside, you may want to consider whether you should use a field named 'ID' to refer to the ID of another table (in this case, the Appointment.ID field refers to Patient.ID). Your db may be more readable if you keep the 'ID' field as an identifier specific to its own table and refer to that field in other tables as OtherTableID or similar, i.e. PatientID in this case. Or go all the way and include the name of the actual table in its own ID field.
Edited after comment:
Not quite sure why it would crash. I just ran an equivalent query on 2 tables I have, which are about 10,000 records each, and it was pretty much instantaneous. Are your ID fields (i) unique numbers and (ii) indexed?
Another structure which should do the same thing (adapted for your field names and assuming that there is an ID field in Appointments which is unique) would be something like:
SELECT PatientInfo.UID, PatientInfo.Name, Appointments.StartDateTime, Appointments.Assessment
FROM PatientInfo INNER JOIN Appointments ON PatientInfo.UID = Appointments.PatientFID
WHERE Appointments.ID = (SELECT TOP 1 ID FROM Appointments WHERE Appointments.PatientFID = PatientInfo.UID ORDER BY StartDateTime DESC, ID DESC)
;
But that is starting to look a bit contrived. On my data they both produce the same result (as they should!) and are both almost instantaneous.
Always difficult to troubleshoot Access when it crashes - I guess you see no error codes or similar? Is this against a native .accdb database or another server?

How to select data from two columns in SQL?

I have a table in PostgreSQL as follows:
id | name | parent_id |
1 | morteza | null |
2 | ali | null |
3 | morteza2 | 1 |
4 | morteza3 | 1 |
My unique records are those with id 1 and 2, and record 1 has been modified twice. Now I want to select the last modified version of each record. The query result for the data above should be:
id | name |
1 | morteza3 |
2 | ali |
What's the suitable query?
If I am following correctly, you can use distinct on and coalesce():
select distinct on (coalesce(parent_id, id)) coalesce(parent_id, id) as new_id, name
from mytable
order by coalesce(parent_id, id), id desc
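For readers without PostgreSQL at hand: DISTINCT ON (expr) ... ORDER BY expr, id DESC sorts the rows and then keeps only the first row of each expr group. The following pure-Python model (not PostgreSQL itself; the function names are hypothetical) reproduces that on the question's rows.

```python
from typing import List, Optional, Tuple

Row = Tuple[int, str, Optional[int]]  # (id, name, parent_id)

rows: List[Row] = [
    (1, "morteza", None),
    (2, "ali", None),
    (3, "morteza2", 1),
    (4, "morteza3", 1),
]


def group_key(row: Row) -> int:
    """coalesce(parent_id, id): a revision maps to the id of its original record."""
    return row[2] if row[2] is not None else row[0]


def distinct_on_latest(rows: List[Row]) -> List[Tuple[int, str]]:
    """Model of DISTINCT ON (coalesce(parent_id, id)) ... ORDER BY coalesce(parent_id, id), id DESC."""
    ordered = sorted(rows, key=lambda r: (group_key(r), -r[0]))
    result: List[Tuple[int, str]] = []
    seen = set()
    for row in ordered:
        k = group_key(row)
        if k not in seen:  # DISTINCT ON keeps only the first row of each group
            seen.add(k)
            result.append((k, row[1]))
    return result


print(distinct_on_latest(rows))  # [(1, 'morteza3'), (2, 'ali')]
```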
Demo on DB Fiddle:
new_id | name
-----: | :-------
1 | morteza3
2 | ali
From your description it would seem that the latest version of each row has parent_id IS NULL. (And obsoleted row versions have parent_id IS NOT NULL.)
The query is simple then:
SELECT id, name
FROM tbl
WHERE parent_id IS NULL;
If you have many updates (hence, many obsoleted row versions), a partial index will help performance a lot:
CREATE INDEX ON tbl(id) WHERE parent_id IS NULL;
The actual index column is mostly irrelevant (unless there are additional requirements). The WHERE clause is the point here, to exclude the many obsoleted rows from the index. See:
Postgres partial index on IS NULL not working
Slow PostgreSQL query in production - help me understand this explain analyze output

SQL / Oracle to Tableau - How to combine to sort based on two fields?

I have the following tables:
tbl_tasks
+---------+-------------+
| Task_ID | Assigned_ID |
+---------+-------------+
| 1 | 8 |
| 2 | 12 |
| 3 | 31 |
+---------+-------------+
tbl_resources
+---------+-----------+
| Task_ID | Source_ID |
+---------+-----------+
| 1 | 4 |
| 1 | 10 |
| 2 | 42 |
| 4 | 8 |
+---------+-----------+
A task is assigned to at least one person (denoted by "Assigned_ID"), and then any number of people can be assigned as a source (denoted by "Source_ID"). The ID numbers are all linked to names in another table: though the ID columns are named differently, they all refer to the same table.
Would there be any way for me to combine the two tables based on ID so that I could search by someone's ID number? For example, if I filter with WHERE User_ID = 8 to see which tasks person 8 is involved in, I would get back Task 1 and Task 4.
Right now, by joining all the tables together, I can easily filter on "Assigned" but not "Source" due to all the multiple entries in the table.
Use union all:
select distinct task_id
from ((select task_id, assigned_id as id
from tbl_tasks
) union all
(select task_id, source_id
from tbl_resources
)
) ti
where id = ?;
Note that this uses select distinct in case someone is assigned to the same task in both tables. If not, remove the distinct.
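A runnable sketch of the UNION ALL approach, using Python's sqlite3 module as a stand-in database with the sample rows from the question. One adaptation: SQLite does not accept parenthesized SELECTs as UNION ALL operands, so the parentheses around each branch are dropped; otherwise the query is the same, with the ? placeholder bound to 8.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl_tasks (Task_ID INTEGER, Assigned_ID INTEGER)")
conn.execute("CREATE TABLE tbl_resources (Task_ID INTEGER, Source_ID INTEGER)")
conn.executemany("INSERT INTO tbl_tasks VALUES (?, ?)", [(1, 8), (2, 12), (3, 31)])
conn.executemany(
    "INSERT INTO tbl_resources VALUES (?, ?)", [(1, 4), (1, 10), (2, 42), (4, 8)]
)

# Stack both ID columns into one (task_id, id) list, then filter on the person's ID.
TASKS_FOR_PERSON = """
SELECT DISTINCT task_id
FROM (
    SELECT Task_ID AS task_id, Assigned_ID AS id FROM tbl_tasks
    UNION ALL
    SELECT Task_ID, Source_ID FROM tbl_resources
) AS ti
WHERE id = ?
ORDER BY task_id
"""

tasks = [r[0] for r in conn.execute(TASKS_FOR_PERSON, (8,))]
print(tasks)  # [1, 4]
```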

Flattening edit diffs onto a master record — can I do this more simply or efficiently?

I'm working on a system that needs to track user edits over time. Due to various constraints, we need to keep the original records pristine and merge the edits down into a single row representing the current state.
I'm essentially aiming for the result of "replaying" the edits, so that edited columns show their most recent edited value and unedited columns show the original.
By simplified example, given a table of original records:
books:
book_id | title | color | year
-------------------------------
1 | First | blue | null
2 | Second | green | null
3 | Third | red | 1992
And a table of edits that have been made to the records where all unchanged values are null:
edits:
edit_id | book_id | title | color | year
----------------------------------------
101 | 1 | Uno | null | 2003
102 | 1 | Ett | teal | null
103 | 2 | null | null | 1999
I'm producing output like:
book_id | title | color | year
-------------------------------
1 | Ett | teal | 2003
2 | Second | green | 1999
3 | Third | red | 1992
My current implementation works as expected (on PostgreSQL 9.6), but I have the sneaking feeling that I may be missing a simpler or more efficient way to go about it:
SELECT
  books.book_id,
  COALESCE(
    (
      array_agg(edits.title ORDER BY edits.edit_id DESC)
        FILTER (WHERE edits.title IS NOT NULL)
    )[1],
    books.title
  ) AS title
  -- [... repeat for other fields ...]
FROM books
LEFT JOIN edits
  ON books.book_id = edits.book_id
GROUP BY books.book_id;
Any thoughts?
If you create an aggregate that returns the last non-null value, you could do it like this:
select b.book_id,
coalesce(last(e.title order by e.edit_id), b.title) as title,
coalesce(last(e.color order by e.edit_id), b.color) as color,
coalesce(last(e.year order by e.edit_id), b.year) as year
from books b
left join edits e on b.book_id = e.book_id
group by b.book_id
order by b.book_id;
See the Postgres Wiki for an implementation of the last() (and first()) function.
This might be faster as it does not have to keep all values in memory just to pick the last one. It only keeps one value in memory during aggregation.
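To make the "only one value in memory" point concrete, here is a pure-Python model of such a last-non-null aggregate. It is not the Postgres Wiki implementation, and the class and function names are hypothetical. Each column gets an accumulator holding a single value, edits are fed in edit_id order (the `order by e.edit_id`), and the fallback to the original book value plays the role of COALESCE.

```python
from typing import Any, Dict, List, Optional


class LastNonNull:
    """Aggregate state that remembers only the most recent non-null value."""

    def __init__(self) -> None:
        self.value: Optional[Any] = None

    def step(self, value: Optional[Any]) -> None:
        if value is not None:  # null means "unchanged", so it is skipped
            self.value = value

    def finalize(self) -> Optional[Any]:
        return self.value


books = [
    {"book_id": 1, "title": "First", "color": "blue", "year": None},
    {"book_id": 2, "title": "Second", "color": "green", "year": None},
    {"book_id": 3, "title": "Third", "color": "red", "year": 1992},
]
edits = [
    {"edit_id": 101, "book_id": 1, "title": "Uno", "color": None, "year": 2003},
    {"edit_id": 102, "book_id": 1, "title": "Ett", "color": "teal", "year": None},
    {"edit_id": 103, "book_id": 2, "title": None, "color": None, "year": 1999},
]

FIELDS = ("title", "color", "year")


def flatten(books: List[Dict[str, Any]], edits: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    result = []
    for book in books:
        aggs = {f: LastNonNull() for f in FIELDS}  # one accumulator per column
        for edit in sorted(edits, key=lambda e: e["edit_id"]):
            if edit["book_id"] == book["book_id"]:
                for f in FIELDS:
                    aggs[f].step(edit[f])
        row: Dict[str, Any] = {"book_id": book["book_id"]}
        for f in FIELDS:
            last = aggs[f].finalize()
            row[f] = last if last is not None else book[f]  # coalesce(last(...), b.<field>)
        result.append(row)
    return result


for row in flatten(books, edits):
    print(row)
```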

MySQL Advanced SELECT help

Alright well I recently got into normalizing my database for this little side project that I have been creating for a while now, but I've just hit a brick wall. I'll try to give an understandable example of what I have and what I need to accomplish ― and hopefully it won't be too painful. OK.
I have 3 tables. The first one, which we will call Shows, is structured something like this:
+----+--------------------------+
| id | title |
+----+--------------------------+
| 1 | Example #1 |
| 2 | Example #2 |
| 3 | Example #3 |
+----+--------------------------+
Plain and simple.
My next table is called Categories, and looks like this:
+----+--------------------------+
| id | category |
+----+--------------------------+
| 1 | Comedy |
| 2 | Drama |
| 3 | Action |
+----+--------------------------+
And a final table called Show_categories:
+---------+---------+
| show_id | cat_id |
+---------+---------+
| 1 | 1 |
| 1 | 3 |
| 2 | 2 |
| 2 | 3 |
| 3 | 1 |
| 3 | 2 |
+---------+---------+
As you may have noticed, the problem is that in my database a single show can have multiple categories. Everything is structured fine, except that I can't find a way to search for shows with multiple categories.
If I were to search for action and comedy type shows I would be given Example #1, but it is not possible (at least with my queries), because the cat_id's inside the Show_categories are in different rows.
Example of a working single category search (Selecting all comedy shows):
SELECT s.id,s.title
FROM Shows s JOIN Show_categories sc ON sc.show_id=s.id
WHERE sc.cat_id=1 GROUP BY s.id
And a query that is impossible (because cat_id can't equal 2 different things):
SELECT s.id,s.title
FROM Shows s JOIN Show_categories sc ON sc.show_id=s.id
WHERE sc.cat_id=1 AND sc.cat_id=2 GROUP BY s.id
So, to sum things up, what I am asking is how to handle a query where I am looking for shows that match multiple categories.
Use:
SELECT s.id,
s.title
FROM SHOWS s
JOIN SHOW_CATEGORIES sc ON sc.show_id = s.id
WHERE sc.cat_id IN (1, 2)
GROUP BY s.id, s.title
HAVING COUNT(DISTINCT sc.cat_id) = 2
The COUNT(DISTINCT sc.cat_id) value must equal the number of cat_id values listed in the IN clause. If show_id and cat_id together form the primary key of SHOW_CATEGORIES, or there is a unique constraint on both columns, duplicates are impossible and you can use COUNT(sc.cat_id) instead.
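A sketch of this HAVING COUNT(DISTINCT ...) pattern (often called relational division), run through Python's sqlite3 as a stand-in for MySQL with the question's data and show_id as the join column from the Show_categories table shown. The helper name `shows_with_all` is hypothetical; it deduplicates the requested IDs so the HAVING target always equals the number of distinct values in the IN list.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Shows (id INTEGER PRIMARY KEY, title TEXT)")
conn.execute(
    "CREATE TABLE Show_categories (show_id INTEGER, cat_id INTEGER, PRIMARY KEY (show_id, cat_id))"
)
conn.executemany(
    "INSERT INTO Shows VALUES (?, ?)", [(1, "Example #1"), (2, "Example #2"), (3, "Example #3")]
)
conn.executemany(
    "INSERT INTO Show_categories VALUES (?, ?)",
    [(1, 1), (1, 3), (2, 2), (2, 3), (3, 1), (3, 2)],
)


def shows_with_all(cat_ids):
    """Return shows tagged with every category in cat_ids."""
    ids = sorted(set(cat_ids))  # deduplicate so the HAVING target is correct
    placeholders = ", ".join("?" for _ in ids)
    sql = f"""
        SELECT s.id, s.title
        FROM Shows s
        JOIN Show_categories sc ON sc.show_id = s.id
        WHERE sc.cat_id IN ({placeholders})
        GROUP BY s.id, s.title
        HAVING COUNT(DISTINCT sc.cat_id) = ?
        ORDER BY s.id
    """
    return conn.execute(sql, (*ids, len(ids))).fetchall()


print(shows_with_all([1, 2]))  # comedy and drama: [(3, 'Example #3')]
print(shows_with_all([1, 3]))  # comedy and action: [(1, 'Example #1')]
```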
You need an OR condition.
SELECT s.id,s.title
FROM Shows s JOIN Show_categories sc ON sc.show_id=s.id
WHERE sc.cat_id=1 OR sc.cat_id=2 GROUP BY s.id
That is, you want all shows with either cat_id 1 OR cat_id 2, so this query will return 1, 2 and 3.
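A quick sketch of the OR version on the question's sample data, using Python's sqlite3 as a stand-in for MySQL and assuming show_id as the join column, as in the Show_categories table shown. It returns every show that has category 1 or category 2, which here is all three.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Shows (id INTEGER PRIMARY KEY, title TEXT)")
conn.execute("CREATE TABLE Show_categories (show_id INTEGER, cat_id INTEGER)")
conn.executemany(
    "INSERT INTO Shows VALUES (?, ?)", [(1, "Example #1"), (2, "Example #2"), (3, "Example #3")]
)
conn.executemany(
    "INSERT INTO Show_categories VALUES (?, ?)",
    [(1, 1), (1, 3), (2, 2), (2, 3), (3, 1), (3, 2)],
)

# OR matches a show as soon as any one of its category rows qualifies.
any_match = [
    r[0]
    for r in conn.execute(
        """
        SELECT s.id
        FROM Shows s JOIN Show_categories sc ON sc.show_id = s.id
        WHERE sc.cat_id = 1 OR sc.cat_id = 2
        GROUP BY s.id
        ORDER BY s.id
        """
    )
]
print(any_match)  # [1, 2, 3]
```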