SQL query filtering

SQL query filtering - sql

Using SQL Server 2005, I have a table where certain events are being logged, and I need to create a query that returns only very specific results. There's an example below:
Log:
Log_ID | FB_ID | Date | Log_Name | Log_Type
7 | 4 | 2007/11/8 | Nina | Critical
6 | 4 | 2007/11/6 | John | Critical
5 | 4 | 2007/11/6 | Mike | Critical
4 | 4 | 2007/11/6 | Mike | Critical
3 | 3 | 2007/11/3 | Ben | Critical
2 | 3 | 2007/11/1 | Ben | Critical
The query should do the following: return ONLY one row per each FB_ID, but this needs to be the one where Log_Name has changed for the first time, or if the name never changes, then the first dated row.
In layman's terms I need this to browse through a DB to check for each instance where the responsibility of a case (FB_ID) has been moved to another person, and in case it never has, then just get the original logger's name.
In the example above, I should get rows (Log_ID) 2 and 6.
Is this even possible? Right now there's a discussion going on whether the DB was just made the wrong way. :)
I imagine I need to somehow be able to store the first resulting Log_Name into a variable and then compare it with an IF condition etc. I have no idea how to do such a thing with SQL though.
Edit: Updated the date. And to clarify on this, the correct result would look like this:
Log_ID | FB_ID | Date | Log_Name | Log_Type
6 | 4 | 2007/11/6 | John | Critical
2 | 3 | 2007/11/1 | Ben | Critical
It's not the first date per FB_ID I'm after, but the row where the Log_Name is changed from the original.
Originally FB_ID 4 belongs to Mike, but the query should return the row where it moves on to John. However, it should NOT return the row where it moves further on to Nina, because the first responsibility change already happened when John got it.
In the case of Ben with FB_ID 3, the logger is never changed, so the first row for Ben should be returned.

I guess that there is a better and more performant way, but this one seems to work:
SELECT *
FROM log
WHERE log_id IN
( SELECT MIN(log_id)
FROM log
WHERE
( SELECT COUNT(DISTINCT log_name)
FROM log log2
WHERE log2.fb_id = log.fb_id ) = 1
OR log.log_name <> ( SELECT log_name
FROM log log_3
WHERE log_3.log_id =
( SELECT MIN(log_id)
FROM log log4
WHERE log4.fb_id = log.fb_id ) )
GROUP BY fb_id )

This will efficiently use an index on (fb_id, cdate, id):
SELECT lo4.*
FROM
(
SELECT CASE WHEN ln.log_id IS NULL THEN lo2.log_id ELSE ln.log_id END AS log_id,
ROW_NUMBER() OVER (PARTITION BY lo2.fb_id ORDER BY lo2.cdate) AS rn
FROM (
SELECT
lo.*,
(
SELECT TOP 1 log_id
FROM t_log li
WHERE li.fb_id = lo.fb_id
AND li.cdate >= lo.cdate
AND li.log_id <> lo.log_id
AND li.log_name <> lo.log_name
ORDER BY
cdate, log_id
) AS next_id
FROM t_log lo
) lo2
LEFT OUTER JOIN
t_log ln
ON ln.log_id = lo2.next_id
) lo3, t_log lo4
WHERE lo3.rn = 1
AND lo4.log_id = lo3.log_id

If I've understood the problem correctly, the following SQL should do the trick:
SELECT Log_ID, FB_ID, min(Date), Log_Name, Log_Type
FROM Log
GROUP BY Date
The SQL will select the row with the earliest date for each FP_ID.

Related

Find Duplicate Values in a column based on specific criteria

I have a table that holds actions against specific accounts, the actions are given a numbered SET of actions and within that SET they get a unique, sequential number. We ran into an issue where somehow one of the unique numbers had been duplicated and would like to check for more examples where this might have happened. The table looks a little like this:
Account | Action Set | Action No | Action Code
--------|------------|-----------|------------
001 | 1 | 1 | GEN
001 | 1 | 2 | PHO
001 | 1 | 3 | RAN
001 | 1 | 3 | GEN
002 | 1 | 1 | GEN
002 | 1 | 2 | PHO
002 | 1 | 3 | RAN
I have tried various things I've found through searches on here but can't find anything that looks like it fits my specific circumstances.
For any given account number, I would like to find where within one Action SET the same Action Number is used more than once. I also need to return the full row, not just a count of how many there are.
From the example above, I would expect to see these results, same account, same action set, same action number
Account | Action Set | Action No | Action Code
--------|------------|-----------|------------
001 | 1 | 3 | RAN
001 | 1 | 3 | GEN
I would post what I have tried so far but honestly the extent of the code I have written so far is:
SELECT
TIA
Mark

Based on your description, you can use exists:
select t.*
from t
where exists (select 1
from t t2
where t2.account = t.account and
t2.actionset = t.actionset and
t2.actionno <> t.actionno
);
EDIT:
The above assumes that action numbers are different. Otherwise you can use:
select t.*
from t
where (select count(*)
from t t2
where t2.account = t.account and
t2.actionset = t.actionset
) >= 2;

try this one
Select account,actionset,actioncode,actionno
from table
where (account,actionset)
IN
(
Select account,actionset from table
group by account,actionset
having count(distinct actionno)>1
)
group by account,actionset,actioncode,actionno

Please find my solution for Getting duplicate records from table.
SELECT [ActionSet],ActionCode,[ActionNo]
FROM
(
SELECT *,ROW_NUMBER()OVER(PARTITION by [ActionSet],[ActionNo] ORDER BY
[ActionNo]) as rnk FROM [dbo].[ActionAccount]
) t where t.rnk>1
Thanks .

Find spectators that have seen the same shows (match multiple rows for each)

For an assignment I have to write several SQL queries for a database stored in a PostgreSQL server running PostgreSQL 9.3.0. However, I find myself blocked with last query. The database models a reservation system for an opera house. The query is about associating the a spectator the other spectators that assist to the same events every time.
The model looks like this:
Reservations table
id_res | create_date | tickets_presented | id_show | id_spectator | price | category
-------+---------------------+---------------------+---------+--------------+-------+----------
1 | 2015-08-05 17:45:03 | | 1 | 1 | 195 | 1
2 | 2014-03-15 14:51:08 | 2014-11-30 14:17:00 | 11 | 1 | 150 | 2
Spectators table
id_spectator | last_name | first_name | email | create_time | age
---------------+------------+------------+----------------------------------------+---------------------+-----
1 | gonzalez | colin | colin.gonzalez#gmail.com | 2014-03-15 14:21:30 | 22
2 | bequet | camille | bequet.camille#gmail.com | 2014-12-10 15:22:31 | 22
Shows table
id_show | name | kind | presentation_date | start_time | end_time | id_season | capacity_cat1 | capacity_cat2 | capacity_cat3 | price_cat1 | price_cat2 | price_cat3
---------+------------------------+--------+-------------------+------------+----------+-----------+---------------+---------------+---------------+------------+------------+------------
1 | madama butterfly | opera | 2015-09-05 | 19:30:00 | 21:30:00 | 2 | 315 | 630 | 945 | 195 | 150 | 100
2 | don giovanni | opera | 2015-09-12 | 19:30:00 | 21:45:00 | 2 | 315 | 630 | 945 | 195 | 150 | 100
So far I've started by writing a query to get the id of the spectator and the date of the show he's attending to, the query looks like this.
SELECT Reservations.id_spectator, Shows.presentation_date
FROM Reservations
LEFT JOIN Shows ON Reservations.id_show = Shows.id_show;
Could someone help me understand better the problem and hint me towards finding a solution. Thanks in advance.
So the result I'm expecting should be something like this
id_spectator | other_id_spectators
-------------+--------------------
1| 2,3
Meaning that every time spectator with id 1 went to a show, spectators 2 and 3 did too.

Note based on comments: Wanted to make clear that this answer may be of limited use as it was answered in the context of SQL-Server (tag was present at the time)
There is probably a better way to do it, but you could do it with the 'stuff 'function. The only drawback here is that, since your ids are ints, placing a comma between values will involve a work around (would need to be a string). Below is the method I can think of using a work around.
SELECT [id_spectator], [id_show]
, STUFF((SELECT ',' + CAST(A.[id_spectator] as NVARCHAR(10))
FROM reservations A
Where A.[id_show]=B.[id_show] AND a.[id_spectator] != b.[id_spectator] FOR XML PATH('')),1,1,'') As [other_id_spectators]
From reservations B
Group By [id_spectator], [id_show]
This will show you all other spectators that attended the same shows.

Meaning that every time spectator with id 1 went to a show, spectators 2 and 3 did too.
In other words, you want a list of ...
all spectators that have seen all the shows that a given spectator has seen (and possibly more than the given one)
This is a special case of relational division. We have assembled an arsenal of basic techniques here:
How to filter SQL results in a has-many-through relation
It is special because the list of shows each spectator has to have attended is dynamically determined by the given prime spectator.
Assuming that (d_spectator, id_show) is unique in reservations, which has not been clarified.
A UNIQUE constraint on those two columns (in that order) also provides the most important index.
For best performance in query 2 and 3 below also create an index with leading id_show.
1. Brute force
The primitive approach would be to form a sorted array of shows the given user has seen and compare the same array of others:
SELECT 1 AS id_spectator, array_agg(sub.id_spectator) AS id_other_spectators
FROM (
SELECT id_spectator
FROM reservations r
WHERE id_spectator <> 1
GROUP BY 1
HAVING array_agg(id_show ORDER BY id_show)
#> (SELECT array_agg(id_show ORDER BY id_show)
FROM reservations
WHERE id_spectator = 1)
) sub;
But this is potentially very expensive for big tables. The whole table hast to be processes, and in a rather expensive way, too.
2. Smarter
Use a CTE to determine relevant shows, then only consider those
WITH shows AS ( -- all shows of id 1; 1 row per show
SELECT id_spectator, id_show
FROM reservations
WHERE id_spectator = 1 -- your prime spectator here
)
SELECT sub.id_spectator, array_agg(sub.other) AS id_other_spectators
FROM (
SELECT s.id_spectator, r.id_spectator AS other
FROM shows s
JOIN reservations r USING (id_show)
WHERE r.id_spectator <> s.id_spectator
GROUP BY 1,2
HAVING count(*) = (SELECT count(*) FROM shows)
) sub
GROUP BY 1;
#> is the "contains2 operator for arrays - so we get all spectators that have at least seen the same shows.
Faster than 1. because only relevant shows are considered.
3. Real smart
To also exclude spectators that are not going to qualify early from the query, use a recursive CTE:
WITH RECURSIVE shows AS ( -- produces exactly 1 row
SELECT id_spectator, array_agg(id_show) AS shows, count(*) AS ct
FROM reservations
WHERE id_spectator = 1 -- your prime spectator here
GROUP BY 1
)
, cte AS (
SELECT r.id_spectator, 1 AS idx
FROM shows s
JOIN reservations r ON r.id_show = s.shows[1]
WHERE r.id_spectator <> s.id_spectator
UNION ALL
SELECT r.id_spectator, idx + 1
FROM cte c
JOIN reservations r USING (id_spectator)
JOIN shows s ON s.shows[c.idx + 1] = r.id_show
)
SELECT s.id_spectator, array_agg(c.id_spectator) AS id_other_spectators
FROM shows s
JOIN cte c ON c.idx = s.ct -- has an entry for every show
GROUP BY 1;
Note that the first CTE is non-recursive. Only the second part is recursive (iterative really).
This should be fastest for small selections from big tables. Row that don't qualify are excluded early. the two indices I mentioned are essential.
SQL Fiddle demonstrating all three.

It sounds like you have one half of the total question--determining which id_shows a particular id_spectator attended.
What you want to ask yourself is how you can determine which id_spectators attended an id_show, given an id_show. Once you have that, combine the two answers to get the full result.

So the final answer I got, looks like this :
SELECT id_spectator, id_show,(
SELECT string_agg(to_char(A.id_spectator, '999'), ',')
FROM Reservations A
WHERE A.id_show=B.id_show
) AS other_id_spectators
FROM Reservations B
GROUP By id_spectator, id_show
ORDER BY id_spectator ASC;
Which prints something like this:
id_spectator | id_show | other_id_spectators
-------------+---------+---------------------
1 | 1 | 1, 2, 9
1 | 14 | 1, 2
Which suits my needs, however if you have any improvements to offer, please share :) Thanks again everybody!

CTE to represent a logical table for the rows in a table which have the max value in one column

I have an "insert only" database, wherein records aren't physically updated, but rather logically updated by adding a new record, with a CRUD value, carrying a larger sequence. In this case, the "seq" (sequence) column is more in line with what you may consider a primary key, but the "id" is the logical identifier for the record. In the example below,
This is the physical representation of the table:
seq id name | CRUD |
----|-----|--------|------|
1 | 10 | john | C |
2 | 10 | joe | U |
3 | 11 | kent | C |
4 | 12 | katie | C |
5 | 12 | sue | U |
6 | 13 | jill | C |
7 | 14 | bill | C |
This is the logical representation of the table, considering the "most recent" records:
seq id name | CRUD |
----|-----|--------|------|
2 | 10 | joe | U |
3 | 11 | kent | C |
5 | 12 | sue | U |
6 | 13 | jill | C |
7 | 14 | bill | C |
In order to, for instance, retrieve the most recent record for the person with id=12, I would currently do something like this:
SELECT
*
FROM
PEOPLE P
WHERE
P.ID = 12
AND
P.SEQ = (
SELECT
MAX(P1.SEQ)
FROM
PEOPLE P1
WHERE P.ID = 12
)
...and I would receive this row:
seq id name | CRUD |
----|-----|--------|------|
5 | 12 | sue | U |
What I'd rather do is something like this:
WITH
NEW_P
AS
(
--CTE representing all of the most recent records
--i.e. for any given id, the most recent sequence
)
SELECT
*
FROM
NEW_P P2
WHERE
P2.ID = 12
The first SQL example using the the subquery already works for us.
Question: How can I leverage a CTE to simplify our predicates when needing to leverage the "most recent" logical view of the table. In essence, I don't want to inline a subquery every single time I want to get at the most recent record. I'd rather define a CTE and leverage that in any subsequent predicate.
P.S. While I'm currently using DB2, I'm looking for a solution that is database agnostic.

This is a clear case for window (or OLAP) functions, which are supported by all modern SQL databases. For example:
WITH
ORD_P
AS
(
SELECT p.*, ROW_NUMBER() OVER ( PARTITION BY id ORDER BY seq DESC) rn
FROM people p
)
,
NEW_P
AS
(
SELECT * from ORD_P
WHERE rn = 1
)
SELECT
*
FROM
NEW_P P2
WHERE
P2.ID = 12
PS. Not tested. You may need to explicitly list all columns in the CTE clauses.

I guess you already put it together. First find the max seq associated with each id, then use that to join back to the main table:
WITH newp AS (
SELECT id, MAX(seq) AS latestseq
FROM people
GROUP BY id
)
SELECT p.*
FROM people p
JOIN newp n ON (n.latestseq = p.seq)
ORDER BY p.id
What you originally had would work, or moving the CTE into the "from" clause. Maybe you want to use a timestamp field rather than a sequence number for the ordering?

Following up from #Glenn's answer, here is an updated query which meets my original goal and is on par with #mustaccio's answer, but I'm still not sure what the performance (and other) implications of this approach vs the other are.
WITH
LATEST_PERSON_SEQS AS
(
SELECT
ID,
MAX(SEQ) AS LATEST_SEQ
FROM
PERSON
GROUP BY
ID
)
,
LATEST_PERSON AS
(
SELECT
P.*
FROM
PERSON P
JOIN
LATEST_PERSON_SEQS L
ON
(
L.LATEST_SEQ = P.SEQ)
)
SELECT
*
FROM
LATEST_PERSON L2
WHERE
L2.ID = 12

SQL SELECT only rows where a max value is present, and the corresponding ID from another linked table

I have a simple Parts database which I'd like to use for calculating costs of assemblies, and I need to keep a cost history, so that I can update the costs for parts without the update affecting historic data.
So far I have the info stored in 2 tables:
tblPart:
PartID | PartName
1 | Foo
2 | Bar
3 | Foobar
tblPartCostHistory
PartCostHistoryID | PartID | Revision | Cost
1 | 1 | 1 | £1.00
2 | 1 | 2 | £1.20
3 | 2 | 1 | £3.00
4 | 3 | 1 | £2.20
5 | 3 | 2 | £2.05
What I want to end up with is just the PartID for each part, and the PartCostHistoryID where the revision number is highest, so this:
PartID | PartCostHistoryID
1 | 2
2 | 3
3 | 5
I've had a look at some of the other threads on here and I can't quite get it. I can manage to get the PartID along with the highest Revision number, but if I try to then do anything with the PartCostHistoryID I end up with multiple PartCostHistoryIDs per part.
I'm using MS Access 2007.
Many thanks.

Mihai's (very concise) answer will work assuming that the order of both
[PartCostHistoryID] and
[Revision] for each [PartID]
are always ascending.
A solution that does not rely on that assumption would be
SELECT
tblPartCostHistory.PartID,
tblPartCostHistory.PartCostHistoryID
FROM
tblPartCostHistory
INNER JOIN
(
SELECT
PartID,
MAX(Revision) AS MaxOfRevision
FROM tblPartCostHistory
GROUP BY PartID
) AS max
ON max.PartID = tblPartCostHistory.PartID
AND max.MaxOfRevision = tblPartCostHistory.Revision

SELECT PartID,MAX(PartCostHistoryID) FROM table GROUP BY PartID

Here is query
select PartCostHistoryId, PartId from tblCost
where PartCostHistoryId in
(select PartCostHistoryId from
(select * from tblCost as tbl order by Revision desc) as tbl1
group by PartId
)
Here is SQL Fiddle http://sqlfiddle.com/#!2/19c2d/12

complex django or SQL query

A record can have status 'renewal_required'. If it enters this status, and the applicant indeed renews, a copy is generated, which enters status 'in_process' (But an application can have status 'in_process' for other reasons too).
Now I need to get all records that have renewal_required status, BUT, if a copy exists in status 'in_process' for a given applicant, I shall only show that one...the key is the applicant_id, being the same for copied records.
| id | status | applicant_id |
| 1 | renewal_required | 2 |
| 2 | in_process | 3 |
| 3 | renewal_required | 4 |
| 4 | in_process | 4 |
in the above example, records with id 1 and 4 would be returned...
Can this be done? Thanks for any suggestion (DB-redesign excluded, even if the design looks ridiculous - can't do anything about it right now)
Solution needs to be for django but if a SQL solution is being proposed I will happily accept it and adapt/execute directly

select a.applicant_id,COALESCE(b.status,a.status) status from
(select applicant_id,status from yourtable where status='renewal_required') a
left join
(select applicant_id,status from yourtable where status='in_process') b
on a.applicant_id = b.applicant_id;
check the DEMO

Here, a possible solution
SELECT MAX(t1.id) as max_id, t1.status, t1.applicant_id
FROM t1
JOIN (
SELECT MIN(status) as status, applicant_id
FROM t1
WHERE status in ('renewal_required', 'in_process')
GROUP by applicant_id ) tmp
ON t1.status = tmp.status
AND t1.applicant_id = tmp.applicant_id
GROUP BY t1.status, t1.applicant_id
SQL Fiddle
EDIT: Rethought it, now this one won't work if there are more statuses than just these two, because of SELECT MIN(status). Could you comment on that?
EDIT2: Might be like this it will. added WHERE status in ('renewal_required', 'in_process')

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas

SQL query filtering - sql

If I've understood the problem correctly, the following SQL should do the trick: SELECT Log_ID, FB_ID, min(Date), Log_Name, Log_Type FROM Log GROUP BY Date The SQL will select the row with the earliest date for each FP_ID.

Related

Find Duplicate Values in a column based on specific criteria

Find spectators that have seen the same shows (match multiple rows for each)

CTE to represent a logical table for the rows in a table which have the max value in one column

SQL SELECT only rows where a max value is present, and the corresponding ID from another linked table

complex django or SQL query

Categories

Resources