SQL Inner Join based on MAX of timestamp - sql

Amended Once
Amended Twice: The headers of the remaining 9 tables except for reports are always called "what".
I have about 10 tables with the following structure:
reports (165k rows)
+-----------+-----------+
| identifier| category |
+-----------+-----------+
| 1 | fixed |
| 2 | wontfix |
| 3 | fixed |
| 4 | invalid |
| 5 | later |
| 6 | wontfix |
| 7 | duplicate |
| 8 | later |
| 9 | wontfix |
+-----------+-----------+
status (300k rows, all identifiers from reports come up at least once)
+-----------+-----------+----------+
| identifier| time | what |
+-----------+-----------+----------+
| 1 | 12 | RESOLVED |
| 1 | 9 | NEW |
| 2 | 7 | ASSIGNED |
| 3 | 10 | RESOLVED |
| 5 | 4 | REOPEN |
| 7 | 9 | ASSIGNED |
| 4 | 9 | ASSIGNED |
| 7 | 11 | RESOLVED |
| 8 | 3 | NEW |
| 4 | 3 | NEW |
| 7 | 6 | NEW |
+-----------+-----------+----------+
priority (300k rows, all identifiers from reports come up at least once)
+-----------+-----------+----------+
| identifier| time | what |
+-----------+-----------+----------+
| 3 | 12 | LOW |
| 1 | 9 | LOW |
| 9 | 2 | HIGH |
| 8 | 7 | HIGH |
| 3 | 10 | HIGH |
| 5 | 4 | MEDIUM |
| 4 | 9 | MEDIUM |
| 4 | 3 | LOW |
| 7 | 9 | LOW |
| 7 | 11 | HIGH |
| 8 | 3 | LOW |
| 6 | 12 | MEDIUM |
| 7 | 6 | LOW |
| 6 | 9 | HIGH |
| 2 | 6 | HIGH |
| 2 | 1 | LOW |
+-----------+-----------+----------+
What I need is:
reportsfinal (165k rows)
+-----------+-----------+--------------+------------+
| identifier| category | what11 | what22 |
+-----------+-----------+--------------+------------+
| 1 | fixed | RESOLVED | LOW |
| 2 | wontfix | ASSIGNED | HIGH |
| 3 | fixed | RESOLVED | LOW |
| 4 | invalid | ASSIGNED | MEDIUM |
| 5 | later | REOPEN | MEDIUM |
| 6 | wontfix | | MEDIUM |
| 7 | duplicate | RESOLVED | HIGH |
| 8 | later | NEW | HIGH |
| 9 | wontfix | | HIGH |
+-----------+-----------+--------------+------------+
That is, reports (after query = reportsfinal) serves as the basis table and I have to add one or two columns from 9 other tables. The identifier is the key, but in some tables, the identifier comes up multiple times. In these cases I want to use the entry with the highest time only.
I tried several queries, but none of them worked. If possible, I want to run one query to get different columns from the 9 other tables with this approach.
What I tried based on the answer below:
select T.identifier,
T.category,
t.what AS what11,
t.what AS what22 from (
select R.identifier,
R.category,
COALESCE(S.what,'NA')what,
COALESCE(P.what,'NA')what,
ROW_NUMBER()OVER(partition by R.identifier,R.category ORDER by (select null))RN
from reports R
LEFT JOIN bugstatus S
ON S.identifier = R.identifier
LEFT JOIN priority P
ON P.identifier = s.identifier
GROUP BY R.identifier,R.category,S.what,P.what)T
Where T.RN = 1
ORDER BY T.identifier;
This gives the error:
Error: near "(": syntax error.

Basically, you need correlated subqueries in the select list.
From the hip, something like:
Select a.Identifier
,a.Category
,(select what
from status where status.identifier = a.Identifier order by time desc limit 1) Process
,(select what
from priority where priority.identifier = a.Identifier order by time desc limit 1) prio
From Reports a

For each associated table just use a predicate based on a subquery to identify the specific timestamp...
Single letter tokens r, s, and p are defined aliases for tables reports, status and priority respectively
Select r.Identifier, r.category,
coalesce(s.what, 'NA') status,
coalesce(p.what, 'NA') priority
From reports r
left join status s
on s.identifier = r.identifier
and s.time =
(Select max(time) from status
where identifier = r.identifier)
left join priority p
on p.identifier = r.identifier
and p.time =
(Select max(time) from priority
where identifier = r.identifier);
QUESTION: Why did you rename the columns from status and priority to what? You might as well name them something, or data, or information. At least the original names (status and prio) communicated something. The word what is meaningless.
NOTE: I reversed (undid) the edit for the aliases what11 and what22, as these names are meaningless.
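To check the join-on-MAX(time) approach against the sample data from the question, here is a minimal sketch using Python's built-in sqlite3 module. The in-memory database and the output aliases status and priority are assumptions for illustration; the table and column names are the ones shown in the question.

```python
import sqlite3

# In-memory database holding the sample rows from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE reports  (identifier INTEGER PRIMARY KEY, category TEXT);
CREATE TABLE status   (identifier INTEGER, time INTEGER, what TEXT);
CREATE TABLE priority (identifier INTEGER, time INTEGER, what TEXT);
""")
conn.executemany("INSERT INTO reports VALUES (?, ?)", [
    (1, "fixed"), (2, "wontfix"), (3, "fixed"), (4, "invalid"), (5, "later"),
    (6, "wontfix"), (7, "duplicate"), (8, "later"), (9, "wontfix"),
])
conn.executemany("INSERT INTO status VALUES (?, ?, ?)", [
    (1, 12, "RESOLVED"), (1, 9, "NEW"), (2, 7, "ASSIGNED"), (3, 10, "RESOLVED"),
    (5, 4, "REOPEN"), (7, 9, "ASSIGNED"), (4, 9, "ASSIGNED"), (7, 11, "RESOLVED"),
    (8, 3, "NEW"), (4, 3, "NEW"), (7, 6, "NEW"),
])
conn.executemany("INSERT INTO priority VALUES (?, ?, ?)", [
    (3, 12, "LOW"), (1, 9, "LOW"), (9, 2, "HIGH"), (8, 7, "HIGH"), (3, 10, "HIGH"),
    (5, 4, "MEDIUM"), (4, 9, "MEDIUM"), (4, 3, "LOW"), (7, 9, "LOW"),
    (7, 11, "HIGH"), (8, 3, "LOW"), (6, 12, "MEDIUM"), (7, 6, "LOW"),
    (6, 9, "HIGH"), (2, 6, "HIGH"), (2, 1, "LOW"),
])

# Each LEFT JOIN keeps only the row whose time equals the per-identifier maximum.
query = """
SELECT r.identifier, r.category,
       COALESCE(s.what, 'NA') AS status,
       COALESCE(p.what, 'NA') AS priority
FROM reports r
LEFT JOIN status s
       ON s.identifier = r.identifier
      AND s.time = (SELECT MAX(time) FROM status WHERE identifier = r.identifier)
LEFT JOIN priority p
       ON p.identifier = r.identifier
      AND p.time = (SELECT MAX(time) FROM priority WHERE identifier = r.identifier)
ORDER BY r.identifier
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Reports 6 and 9 have no status rows in the sample, so they come back with 'NA'. Note that if two rows for the same identifier share the maximum time, this join returns both of them, so a tie-breaker would be needed in that case.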

Using ROW_NUMBER works based on your assumed data, as long as each report's joined rows are ordered by time so that row 1 carries the latest status and priority:
select T.identifier,
T.category,
T.status_what AS what11,
T.priority_what AS what22 from (
select R.identifier,
R.category,
COALESCE(S.what,'NA') AS status_what,
COALESCE(P.what,'NA') AS priority_what,
ROW_NUMBER() OVER (partition by R.identifier ORDER BY S.time DESC, P.time DESC) AS RN
from reports R left join status S
ON S.identifier = R.identifier
LEFT JOIN priority P
ON P.identifier = R.identifier) T
Where T.RN = 1
ORDER BY T.identifier

Related

How to add conditional count based on multiple columns

I'm trying to summarise a T-SQL output that looks a little like this:
+---------+---------+-----+-------+
| perf_no | section | row | seat |
+---------+---------+-----+-------+
| 7128 | 6 | A | 4 |
| 7128 | 6 | A | 5 |
| 7128 | 6 | A | 7 |
| 7128 | 6 | A | 9 |
| 7128 | 6 | A | 28 |
| 7129 | 6 | A | 29 |
| 7129 | 6 | A | 8 |
| 7129 | 6 | A | 9 |
| 7129 | 8 | A | 6 |
| 7129 | 8 | B | 3 |
| 7129 | 8 | B | 4 |
+---------+---------+-----+-------+
Comparing one row to the row(s) below, if the perf_no, section, and row values are the same, and the difference between the seat values is 1, then I want to consider them a group, and count the number of rows in that group.
To give you a real world example, these are seats in a theatre! I'm trying to summarise what seats are available.
Using the table above to illustrate:
rows 1 & 2 show that seats 4 & 5 in section 6, row A for performance 7128 are available. So that's 2 seats together
row 3 shows that seat 7 in section 6, row A for performance 7128 is available on its own. So that's a single seat (1)
rows 5 & 6 have the same section and row, and the seats are consecutive, but you can see the performance is different. So that's a single seat too.
So the output for the table above would look a little like...
(I've left in the spaces just so visually you can see the groupings more easily - obviously the final version will have none)
+---------+---------+----------+-------+
| perf_no | section | seat_row | total |
+---------+---------+----------+-------+
| 7128 | 6 | A | 2 |
| | | | |
| 7128 | 6 | A | 1 |
| 7128 | 6 | A | 1 |
| 7128 | 6 | A | 1 |
| 7129 | 6 | A | 1 |
| 7129 | 6 | A | 2 |
| | | | |
| 7129 | 6 | A | 1 |
| 7129 | 8 | B | 2 |
+---------+---------+----------+-------+
I've been trying to use some conditional case statements to not much avail. Any assistance very gratefully received!
This is a type of gaps-and-islands problem. You can generate a grouping by subtracting a sequence (generated by row_number()) from the seat:
select perf_no, section, row, count(*) as num_seats,
min(seat) as first_seat, max(seat) as last_seat
from (select t.*,
row_number() over (partition by perf_no, section, row order by seat) as seqnum
from t
) t
group by perf_no, section, row, (seat - seqnum);
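To see why subtracting the sequence number works, here is a minimal sketch that runs the same idea on the sample seats with Python's built-in sqlite3 module. It assumes SQLite 3.25 or later (for window functions), and the column is named seat_row here (an assumption, borrowed from the desired output table) to avoid any clash with the ROW keyword. Consecutive seats share the same seat - seqnum value, so that difference acts as the group key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE t (perf_no INTEGER, section INTEGER, seat_row TEXT, seat INTEGER)"
)
conn.executemany("INSERT INTO t VALUES (?, ?, ?, ?)", [
    (7128, 6, "A", 4), (7128, 6, "A", 5), (7128, 6, "A", 7), (7128, 6, "A", 9),
    (7128, 6, "A", 28), (7129, 6, "A", 29), (7129, 6, "A", 8), (7129, 6, "A", 9),
    (7129, 8, "A", 6), (7129, 8, "B", 3), (7129, 8, "B", 4),
])

# Within each (perf_no, section, seat_row), seat - seqnum is constant
# for a run of consecutive seats, so grouping on it isolates each run.
query = """
SELECT perf_no, section, seat_row, COUNT(*) AS num_seats,
       MIN(seat) AS first_seat, MAX(seat) AS last_seat
FROM (SELECT t.*,
             ROW_NUMBER() OVER (PARTITION BY perf_no, section, seat_row
                                ORDER BY seat) AS seqnum
      FROM t) AS numbered
GROUP BY perf_no, section, seat_row, (seat - seqnum)
ORDER BY perf_no, section, seat_row, first_seat
"""
blocks = conn.execute(query).fetchall()
for block in blocks:
    print(block)
```

For performance 7128, seats 4 and 5 get sequence numbers 1 and 2, so both yield a difference of 3 and land in one group of two; seat 7 yields 4 and stands alone.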

Update where value pair matches in SQL

I need to update this table:
Centers:
+-----+------------+---------+--------+
| id | country | process | center |
+-----+------------+---------+--------+
| 1 | 1 | 1 | 1 |
| 2 | 1 | 2 | 1 |
| 3 | 1 | 3 | 1 |
| 4 | 2 | 1 | 1 |
| 5 | 2 | 2 | 1 |
| 6 | 2 | 3 | 1 |
| 7 | 3 | 1 | 1 |
| 8 | 3 | 2 | 1 |
| 9 | 3 | 3 | 1 |
+-----+------------+---------+--------+
During a selection process I retrieve two tempTables:
TempCountries:
+-----+------------+
| id | country |
+-----+------------+
| 1 | 1 |
| 2 | 3 |
+-----+------------+
And TempProcesses:
+-----+------------+
| id | process |
+-----+------------+
| 1 | 2 |
| 2 | 3 |
+-----+------------+
In a subquery I get all possible combinations of the values:
SELECT TempCountries.countryId, TempProcesses.processesId FROM TempCenterCountries,TempCenterProcesses
This returns:
+-----+------------+---------+
| id | country | process |
+-----+------------+---------+
| 1 | 1 | 2 |
| 2 | 1 | 3 |
| 3 | 3 | 2 |
| 4 | 3 | 3 |
+-----+------------+---------+
During the selection process the user chooses a center for these combinations. Let’s say center = 7.
Now I need to update the center value in the Centers table where the combinations of the subquery are present.
So,
UPDATE Centers SET center = 7 WHERE ?
So I get:
+-----+------------+---------+--------+
| id | country | process | center |
+-----+------------+---------+--------+
| 1 | 1 | 1 | 1 |
| 2 | 1 | 2 | 7 |
| 3 | 1 | 3 | 7 |
| 4 | 2 | 1 | 1 |
| 5 | 2 | 2 | 1 |
| 6 | 2 | 3 | 1 |
| 7 | 3 | 1 | 1 |
| 8 | 3 | 2 | 7 |
| 9 | 3 | 3 | 7 |
+-----+------------+---------+--------+
Not all SQL implementations let you have a FROM clause when using UPDATE. Fortunately, in your case, since you're doing a Cartesian product to get all the combinations, there are no constraints between the two values, so each one can be checked independently:
UPDATE Centers
SET center = 7
WHERE country IN (SELECT countryId FROM TempCountries)
AND process IN (SELECT processId FROM TempCenterProcesses)
Try this standard SQL:
Update Centers
set center = 7
where country in (select country from TempCenterCountries)
and process in (select process from TempCenterProcesses)
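As a quick check that two independent IN predicates cover exactly the Cartesian product, here is a minimal sketch on the sample data using Python's built-in sqlite3 module. The table names TempCountries and TempProcesses and the columns country and process are taken from the tables as displayed in the question (the queries quoted there use other spellings), so treat them as assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Centers (id INTEGER PRIMARY KEY, country INTEGER,
                      process INTEGER, center INTEGER);
CREATE TABLE TempCountries (id INTEGER PRIMARY KEY, country INTEGER);
CREATE TABLE TempProcesses (id INTEGER PRIMARY KEY, process INTEGER);
""")
# Nine Centers rows: countries 1..3 crossed with processes 1..3, all center = 1.
conn.executemany(
    "INSERT INTO Centers VALUES (?, ?, ?, 1)",
    [(i, (i - 1) // 3 + 1, (i - 1) % 3 + 1) for i in range(1, 10)],
)
conn.executemany("INSERT INTO TempCountries VALUES (?, ?)", [(1, 1), (2, 3)])
conn.executemany("INSERT INTO TempProcesses VALUES (?, ?)", [(1, 2), (2, 3)])

new_center = 7  # the center the user picked in the question's example
cur = conn.execute(
    """
    UPDATE Centers
    SET center = ?
    WHERE country IN (SELECT country FROM TempCountries)
      AND process IN (SELECT process FROM TempProcesses)
    """,
    (new_center,),
)
updated_count = cur.rowcount
updated_ids = [
    row[0]
    for row in conn.execute(
        "SELECT id FROM Centers WHERE center = ? ORDER BY id", (new_center,)
    )
]
print(updated_count, updated_ids)
```

This shortcut only holds because every country is paired with every process. If the allowed pairs were an arbitrary subset, a pairwise match (for example with EXISTS) would be needed instead.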
You need an exact match on both country and process before you run the update. Something like the query below would achieve that: it updates a row only if a matching record exists.
WITH TempTables AS (SELECT c.countryId, p.processesId
FROM TempCenterCountries c,
TempCenterProcesses p)
UPDATE Centers
SET center = 7
WHERE EXISTS (SELECT 1
FROM TempTables tmp
WHERE country = tmp.countryId and process = tmp.processesId
);
The idea is to update the record if both country and process matches with the one you have already fetched in temporary table.
Use an update join.
For SQL Server:
update c set center = 7 from Centers c
join
(SELECT tc.countryId, tp.processesId FROM TempCenterCountries tc cross join TempCenterProcesses tp
)A on c.country=A.countryId and c.process=A.processesId
For MySQL:
update Centers c
join
(SELECT tc.countryId, tp.processesId FROM TempCenterCountries tc cross join TempCenterProcesses tp
)A on c.country=A.countryId and c.process=A.processesId
set c.center = 7

Is there an easier way to find the row with a max value?

I have a schema where these two tables exist (among others)
participation
+------+--------+------------------+
| movie| person | role |
+------+--------+------------------+
| 1 | 1 | "Regisseur" |
| 1 | 1 | "Schauspieler" |
| 1 | 2 | "Schauspielerin" |
| 2 | 3 | "Regisseur" |
| 3 | 4 | "Regisseur" |
| 3 | 5 | "Schauspieler" |
| 3 | 6 | "Schauspieler" |
| 4 | 7 | "Schauspielerin" |
| 4 | 8 | "Schauspieler" |
| 5 | 1 | "Schauspieler" |
| 5 | 8 | "Schauspieler" |
| 5 | 14 | "Schauspieler" |
+------+--------+------------------+
movie
+----+------------------------------+------+-----+
| id | title | year | fsk |
+----+------------------------------+------+-----+
| 1 | "Die Bruecke am Fluss" | 1995 | 12 |
| 2 | "101 Dalmatiner" | 1961 | 0 |
| 3 | "Vernetzt - Johnny Mnemonic" | 1995 | 16 |
| 4 | "Waehrend Du schliefst..." | 1995 | 6 |
| 5 | "Casper" | 1995 | 6 |
| 6 | "French Kiss" | 1995 | 6 |
| 7 | "Stadtgespraech" | 1995 | 12 |
| 8 | "Apollo 13" | 1995 | 6 |
| 9 | "Schlafes Bruder" | 1995 | 12 |
| 10 | "Assassins - Die Killer" | 1995 | 16 |
| 11 | "Braveheart" | 1995 | 16 |
| 12 | "Das Netz" | 1995 | 12 |
| 13 | "Free Willy 2" | 1995 | 6 |
+----+------------------------------+------+-----+
I want to get the movie with the highest number of people that participated. I figured out an SQL statement that actually does this, but looks super complicated. It looks like this:
SELECT title
FROM movie.movie
JOIN (SELECT *
FROM (SELECT Max(count_person) AS max_count_person
FROM (SELECT movie,
Count(person) AS count_person
FROM movie.participation
GROUP BY movie) AS countPersons) AS
maxCountPersons
JOIN (SELECT movie,
Count(person) AS count_person
FROM movie.participation
GROUP BY movie) AS countPersons
ON maxCountPersons.max_count_person =
countPersons.count_person)
AS maxPersonsmovie
ON maxPersonsmovie.movie = movie.id
The main problem is, that I can't find an easier way to select the row with the highest value. If I simply could make a selection on the inner table and pick the row with the highest value on count_person without losing the information about the movie itself, this would look so much simpler. Is there a way to simplify this, or is this really the easiest way to do this?
Here is a way without subqueries:
SELECT m.title
FROM movie.movie m JOIN
movie.participation p
ON m.id = p.movie
GROUP BY m.title
ORDER BY COUNT(*) DESC
FETCH FIRST 1 ROW ONLY;
You can use LIMIT 1 instead of FETCH, if you prefer.
Note: In the event of ties, this only returns one value. That seems consistent with your question.
You can use the rank window function to do this.
SELECT title
FROM (SELECT m.title,rank() over(order by count(p.person) desc) as rnk
FROM movie.movie m
LEFT JOIN movie.participation p ON m.id=p.movie
GROUP BY m.title
) t
WHERE rnk=1
SELECT title
FROM movie.movie
WHERE id = (SELECT movie
FROM movie.participation
GROUP BY movie
ORDER BY count(*) DESC
LIMIT 1);
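The answers above differ in how they treat ties, and the sample data happens to contain one: movies 1, 3 and 5 each have three participation rows. Here is a minimal sketch with Python's built-in sqlite3 module that compares the LIMIT 1 form with the rank form. It assumes SQLite 3.25 or later for RANK(), and it drops the movie. schema prefix because the in-memory database has no such schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE movie (id INTEGER PRIMARY KEY, title TEXT);
CREATE TABLE participation (movie INTEGER, person INTEGER, role TEXT);
""")
conn.executemany("INSERT INTO movie VALUES (?, ?)", [
    (1, "Die Bruecke am Fluss"), (2, "101 Dalmatiner"),
    (3, "Vernetzt - Johnny Mnemonic"), (4, "Waehrend Du schliefst..."),
    (5, "Casper"), (6, "French Kiss"),
])
conn.executemany("INSERT INTO participation VALUES (?, ?, ?)", [
    (1, 1, "Regisseur"), (1, 1, "Schauspieler"), (1, 2, "Schauspielerin"),
    (2, 3, "Regisseur"), (3, 4, "Regisseur"), (3, 5, "Schauspieler"),
    (3, 6, "Schauspieler"), (4, 7, "Schauspielerin"), (4, 8, "Schauspieler"),
    (5, 1, "Schauspieler"), (5, 8, "Schauspieler"), (5, 14, "Schauspieler"),
])

# Variant 1: sort by participation count and keep a single row.
top_one = conn.execute("""
    SELECT m.title
    FROM movie m JOIN participation p ON m.id = p.movie
    GROUP BY m.id, m.title
    ORDER BY COUNT(*) DESC
    LIMIT 1
""").fetchone()[0]

# Variant 2: rank the counts and keep every movie tied for first place.
top_tied = [row[0] for row in conn.execute("""
    SELECT title
    FROM (SELECT title, RANK() OVER (ORDER BY n DESC) AS rnk
          FROM (SELECT m.title, COUNT(*) AS n
                FROM movie m JOIN participation p ON m.id = p.movie
                GROUP BY m.id, m.title) AS counts) AS ranked
    WHERE rnk = 1
    ORDER BY title
""")]
print(top_one, top_tied)
```

With LIMIT 1, which of the tied movies comes back is not specified, so the rank form is the safer choice if all top movies should be reported.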

SQL Exclude Records with Leveling

For example, I have a table like this:
+---------+-------+----------+
| sort_id | level | security |
+---------+-------+----------+
| 1 | 1 | A |
| 2 | 2 | A |
| 3 | 3 | U |
| 4 | 4 | A |
| 5 | 5 | A |
| 6 | 3 | A |
| 7 | 4 | U |
| 8 | 5 | A |
| 9 | 6 | A |
| 10 | 7 | A |
| 11 | 3 | A |
| 12 | 3 | A |
+---------+-------+----------+
Security column is A for Authorized and U for Unauthorized. I need to exclude those records under the Unauthorized records based on their level.
For a better picture of the SQL records, it looks like this:
Those pointed with arrow are the Unauthorized records and we should exclude those under it.
So the SQL result should be the following table:
+---------+-------+----------+
| sort_id | level | security |
+---------+-------+----------+
| 1 | 1 | A |
| 2 | 2 | A |
| 3 | 3 | U |
| 6 | 3 | A |
| 7 | 4 | U |
| 11 | 3 | A |
| 12 | 3 | A |
+---------+-------+----------+
How can we produce it using a simple Select statement? Thanks in advance! Just comment if something is unclear.
If I understand "under the unauthorized records" as meaning a sequence of records with increasing ids following the unauthorized records (based on the id), then here is an approach. It keeps every row up to and including the first unauthorized row of each sequence, and every row of a sequence that has no unauthorized row:
select sort_id, level, security
from (select t.*, min(case when security = 'U' then sort_id end) over (partition by grp) as minuid
from (select t.*,
(row_number() over (order by sort_id) - level) as grp
from table t
) t
) t
where minuid is null or sort_id <= minuid;
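Here is a minimal sketch of that grouping idea on the sample rows, using Python's built-in sqlite3 module (SQLite 3.25 or later assumed for window functions). The table name records is hypothetical. A run of rows whose level rises by one per row shares the same row_number - level value, and within each run only the rows at or before the first 'U' row are kept.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE records (sort_id INTEGER PRIMARY KEY, level INTEGER, security TEXT)"
)
conn.executemany("INSERT INTO records VALUES (?, ?, ?)", [
    (1, 1, "A"), (2, 2, "A"), (3, 3, "U"), (4, 4, "A"), (5, 5, "A"),
    (6, 3, "A"), (7, 4, "U"), (8, 5, "A"), (9, 6, "A"), (10, 7, "A"),
    (11, 3, "A"), (12, 3, "A"),
])

# grp is constant along a run of steadily deepening levels;
# minuid is the first unauthorized sort_id in that run (NULL if none).
query = """
SELECT sort_id, level, security
FROM (SELECT g.*,
             MIN(CASE WHEN security = 'U' THEN sort_id END)
                 OVER (PARTITION BY grp) AS minuid
      FROM (SELECT r.*,
                   ROW_NUMBER() OVER (ORDER BY sort_id) - level AS grp
            FROM records r) AS g) AS flagged
WHERE minuid IS NULL OR sort_id <= minuid
ORDER BY sort_id
"""
kept = conn.execute(query).fetchall()
for row in kept:
    print(row)
```

This relies on the assumption stated in the answer: descendants directly follow their parent and each one is exactly one level deeper than the previous row. A tree where several siblings sit under an unauthorized row would need a different approach, such as a recursive query.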

SQL SELECT rows with MAX value on a column and returns all columns

Ok so I have this table :
+----+--------------+------------------+----------+
| id | business_key | other columns... | creation |
+----+--------------+------------------+----------+
| 1 | 1 | ... | 01/01/14 |
| 2 | 1 | ... | 12/02/14 |
| 3 | 1 | ... | 13/03/14 | <--
| 4 | 2 | ... | 01/01/14 |
| 5 | 2 | ... | 12/02/14 | <--
| 6 | 8 | ... | 01/01/14 | <--
| 7 | 10 | ... | 01/01/14 |
| 8 | 10 | ... | 12/02/14 |
| 9 | 10 | ... | 13/03/14 |
| 10 | 10 | ... | 13/03/14 | <--
+----+--------------+------------------+----------+
For each business key, I want to return the most recent row and for that I have the "creation" column (see the arrows above). The simple answer would be :
SELECT business_key, MAX(creation) FROM mytable GROUP BY business_key;
The thing is, I need to return ALL the columns. Then I learned of the greatest-n-per-group tag on StackOverflow and found this topic: SQL Select only rows with Max Value on a Column. The best answer is great and provides this query:
SELECT mt1.*
FROM mytable mt1
LEFT OUTER JOIN mytable mt2
ON (mt1.business_key = mt2.business_key AND mt1.creation < mt2.creation)
WHERE mt2.business_key IS NULL;
Sadly it doesn't work because my situation is a little trickier: if you look at lines 9 and 10 of my table, you will see that they have the same business key and the same creation date. While this should be avoided in my application, I still have to handle it if it happens.
With the last query above, this is what I will get:
+----+--------------+------------------+----------+
| id | business_key | other columns... | creation |
+----+--------------+------------------+----------+
| 3 | 1 | ... | 13/03/14 |
| 5 | 2 | ... | 12/02/14 |
| 6 | 8 | ... | 01/01/14 |
| 9 | 10 | ... | 13/03/14 | <--
| 10 | 10 | ... | 13/03/14 | <--
+----+--------------+------------------+----------+
While I wanted this :
+----+--------------+------------------+----------+
| id | business_key | other columns... | creation |
+----+--------------+------------------+----------+
| 3 | 1 | ... | 13/03/14 |
| 5 | 2 | ... | 12/02/14 |
| 6 | 8 | ... | 01/01/14 |
| 10 | 10 | ... | 13/03/14 | <--
+----+--------------+------------------+----------+
I know it's a poor choice to want a MAX() on a technical column like "id", but right now it's the only way for me to prevent duplicates when the business key AND the creation date are the same. The problem is, I have no idea how to do it. Any ideas? Keep in mind it must return all the columns (and we have a lot of columns, so a SELECT * will be necessary).
Thanks a lot.
The first thought is that your id seems to increment along with the date, so just use that:
SELECT mt1.*
FROM mytable mt1 LEFT OUTER JOIN
mytable mt2
ON mt1.business_key = mt2.business_key AND mt2.id > mt1.id
WHERE mt2.business_key IS NULL;
You can still do the same idea with two columns:
SELECT mt1.*
FROM mytable mt1 LEFT OUTER JOIN
mytable mt2
ON mt1.business_key = mt2.business_key AND
(mt2.creation > mt1.creation OR
mt2.creation = mt1.creation AND
mt2.id > mt1.id
)
WHERE mt2.business_key IS NULL;
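To confirm that the two-column version removes the duplicate for business key 10, here is a minimal sketch on the sample rows with Python's built-in sqlite3 module. The "other columns" are omitted, and the creation dates are stored as ISO yyyy-mm-dd text (an assumption, so that plain string comparison orders them correctly).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE mytable (id INTEGER PRIMARY KEY, business_key INTEGER, creation TEXT)"
)
conn.executemany("INSERT INTO mytable VALUES (?, ?, ?)", [
    (1, 1, "2014-01-01"), (2, 1, "2014-02-12"), (3, 1, "2014-03-13"),
    (4, 2, "2014-01-01"), (5, 2, "2014-02-12"),
    (6, 8, "2014-01-01"),
    (7, 10, "2014-01-01"), (8, 10, "2014-02-12"),
    (9, 10, "2014-03-13"), (10, 10, "2014-03-13"),
])

# Anti-join: a row survives only if no other row with the same business_key
# is newer, or equally new with a higher id (the tie-breaker).
query = """
SELECT mt1.*
FROM mytable mt1
LEFT OUTER JOIN mytable mt2
  ON mt1.business_key = mt2.business_key
 AND (mt2.creation > mt1.creation
      OR (mt2.creation = mt1.creation AND mt2.id > mt1.id))
WHERE mt2.business_key IS NULL
ORDER BY mt1.id
"""
latest = conn.execute(query).fetchall()
latest_ids = [row[0] for row in latest]
print(latest_ids)
```

Row 9 is dropped because row 10 has the same creation date and a higher id, which is exactly the tie-breaking behaviour the question asks for.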