How to get all pks after grouping them (finding duplicates) - sql

I have three tables and I am searching for the duplicates in the main table that have the same foreign keys. But I need the primary keys in return of this query:
SELECT ta.fk1, ta.fk2, count(ta.fk2)
FROM ta, tb, tc
WHERE ta.fk2 = tb.pk
AND ta.fk1 = tc.pk
GROUP BY ta.fk1, ta.fk2
HAVING count(ta.fk2) > 1
How can I get the primary keys? Another join or nested query? I tried all I know and found.
Thanks for help.

Something like this maybe?
select distinct ta_pk, tc_pk
from (
select ta.pk as ta_pk,
tc.pk as tc_pk,
count(*) over (partition by ta.fk2) as cnt
from ta
join tb on ta.fk2 = tb.pk
join tc on ta.fk1 = tc.pk
) t
where cnt > 1;

Related

Select distinct slows query

So when I add DISTINCT to my query it goes from 0.062s to 0.152s?
How can this be?
Project is a index in the table target, so I don't know how I can speed this process up?
SELECT DISTINCT
`Project` AS `Value`
FROM
`Testreportingdebug`.`Target`
LEFT JOIN
`TestJob` ON `Target`.`TestJobId` = `TestJob`.`Id`
WHERE
`TestJob`.`Engine` = 'SeqZap';
Here is my two tables
Why are you surprised? The database has to do work to remove duplicates.
Assuming that project is not duplicated in target, you can just use exists:
select t.project
from Testreportingdebug.Target t
where exists (select 1
from TestJob tj
where t.TestJobId = tj.id and tj.Engine = 'SeqZap'
);
However, if project is not unique in Target, then you still need distinct:
select distinct t.project
from Testreportingdebug.Target t
where exists (select 1
from TestJob tj
where t.TestJobId = tj.id and tj.Engine = 'SeqZap'
);
For this query, you can try an index on Target(project, TestJobId).

How to join 100 random rows from table 1 multiple other tables in oracle

I have scrapped my previous question as I did not do a good job explaining. Maybe this will be simpler.
I have the following query.
Select * from comp_eval_hdr, comp_eval_pi_xref, core_pi, comp_eval_dtl
where comp_eval_hdr.START_DATE between TO_DATE('01-JAN-16' , 'DD-MON-YY')
and TO_DATE('12-DEC-17' , 'DD-MON-YY')
and comp_eval_hdr.COMP_EVAL_ID = comp_eval_dtl.COMP_EVAL_ID
and comp_eval_hdr.COMP_EVAL_ID = comp_eval_pi_xref.COMP_EVAL_ID
and core_pi.PI_ID = comp_eval_pi_xref.PI_ID
and core_pi.PROGRAM_CODE = 'PS'
Now if I only want a random 100 rows from the comp_eval_hdr table to join with the other tables how would I go about it? If it makes it easier you can disregard the comp_eval_dtl table.
I think you are pretty much there. You just need subqueries, table aliases, and JOIN conditions:
SELECT . . .
FROM (SELECT a.*
FROM (SELECT a.*
FROM a
WHERE a.START_DATE BEWTWEEN DATE '2016-01-01' AND DATE '2017-12-12'
ORDER BY DBMS_RANDOM.VALUE
) a
WHERE ROWNUM <= 100
) a JOIN
mapping m
ON a.? = m.? JOIN
b
ON m.? = b.?;
The ? is just a placeholder for the join columns.
It's a bit of a stretch to know what you want with the question as written but here's my attempt.
WITH rand_list AS
(SELECT * FROM comp_eval_hdr
WHERE comp_eval_hdr.START_DATE BEWTWEEN TO_DATE('01-JAN-16' , 'DD-MON-YY') AND TO_DATE('12-DEC-17' , 'DD-MON-YY')
ORDER BY DBMS_RANDOM.VALUE)
first_100 AS
(SELECT *
FROM rand_list
WHERE ROWNUM <=100)
SELECT md.col_1, t3.col_a
FROM first_100 md
INNER JOIN
table2 t2 ON md.id_column = t2.fk_comp_eval_hdr_id
INNER JOIN
table3 t3 ON t3.id_column = t2.fk_table3_id
You haven't given any indication how they join or the table names and obviously I haven't run this against any mock tables.
You've got a list of randomised records with RAND_LIST which you could, if you wanted, combine with the FIRST_100 query (your choice).
The main query then just joins that through your mapping table (T2) to your 'multiples' table (T3).
how does table 2 look like?...Let me put one example as person table and order table?
select * from (
select * from person ps , order order where ps.city = 'mumbai' and ps.id = order.purchasedby ) porder where porder.rownum <= 100
I did not tested it but it will look something like this.

PostgreSQL - how to query "result IN ALL OF"?

I am new to PostgreSQL and I have a problem with the following query:
WITH relevant_einsatz AS (
SELECT einsatz.fahrzeug,einsatz.mannschaft
FROM einsatz
INNER JOIN bergefahrzeug ON einsatz.fahrzeug = bergefahrzeug.id
),
relevant_mannschaften AS (
SELECT DISTINCT relevant_einsatz.mannschaft
FROM relevant_einsatz
WHERE relevant_einsatz.fahrzeug IN (SELECT id FROM bergefahrzeug)
)
SELECT mannschaft.id,mannschaft.rufname,person.id,person.nachname
FROM mannschaft,person,relevant_mannschaften WHERE mannschaft.leiter = person.id AND relevant_mannschaften.mannschaft=mannschaft.id;
This query is working basically - but in "relevant_mannschaften" I am currently selecting each mannschaft, which has been to an relevant_einsatz with at least 1 bergefahrzeug.
Instead of this, I want to select into "relevant_mannschaften" each mannschaft, which has been to an relevant_einsatz WITH EACH from bergefahrzeug.
Does anybody know how to formulate this change?
The information you provide is rather rudimentary. But tuning into my mentalist skills, going out on a limb, I would guess this untangled version of the query does the job much faster:
SELECT m.id, m.rufname, p.id, p.nachname
FROM person p
JOIN mannschaft m ON m.leiter = p.id
JOIN (
SELECT e.mannschaft
FROM einsatz e
JOIN bergefahrzeug b ON b.id = e.fahrzeug -- may be redundant
GROUP BY e.mannschaft
HAVING count(DISTINCT e.fahrzeug)
= (SELECT count(*) FROM bergefahrzeug)
) e ON e.mannschaft = m.id
Explain:
In the subquery e I count how many DISTINCT mountain-vehicles (bergfahrzeug) have been used by a team (mannschaft) in all their deployments (einsatz): count(DISTINCT e.fahrzeug)
If that number matches the count in table bergfahrzeug: (SELECT count(*) FROM bergefahrzeug) - the team qualifies according to your description.
The rest of the query just fetches details from matching rows in mannschaft and person.
You don't need this line at all, if there are no other vehicles in play than bergfahrzeuge:
JOIN bergefahrzeug b ON b.id = e.fahrzeug
Basically, this is a special application of relational division. A lot more on the topic under this related question:
How to filter SQL results in a has-many-through relation
Do not know how to explain it, but here is an example how I solved this problem, just in case somebody has the some question one day.
WITH dfz AS (
SELECT DISTINCT fahrzeug,mannschaft FROM einsatz WHERE einsatz.fahrzeug IN (SELECT id FROM bergefahrzeug)
), abc AS (
SELECT DISTINCT mannschaft FROM dfz
), einsatzmannschaften AS (
SELECT abc.mannschaft FROM abc WHERE (SELECT sum(dfz.fahrzeug) FROM dfz WHERE dfz.mannschaft = abc.mannschaft) = (SELECT sum(bergefahrzeug.id) FROM bergefahrzeug)
)
SELECT mannschaft.id,mannschaft.rufname,person.id,person.nachname
FROM mannschaft,person,einsatzmannschaften WHERE mannschaft.leiter = person.id AND einsatzmannschaften.mannschaft=mannschaft.id;

How to rewrite long query?

I have the following 2 tables:
items:
id int primary key
bla text
events:
id_items int
num int
when timestamp without time zone
ble text
composite primary key: id_items, num
and want to select to each item the most recent event (the newest 'when').
I wrote an request, but I don't know if it could be written more efficiently.
Also on PostgreSQL there is a issue with comparing Timestamp objects:
2010-05-08T10:00:00.123 == 2010-05-08T10:00:00.321
so I select with 'MAX(num)'
Any thoughts how to make it better? Thanks.
SELECT i.*, ea.* FROM items AS i JOIN
( SELECT t.s AS t_s, t.c AS t_c, max(e.num) AS o FROM events AS e JOIN
( SELECT DISTINCT id_item AS s, MAX(when) AS c FROM events GROUP BY s ORDER BY c ) AS t
ON t.s = e.id_item AND e.when = t.c GROUP BY t.s, t.c ) AS tt
ON tt.t_s = i.id JOIN events AS ea ON ea.id_item = tt.t_s AND ea.cas = tt.t_c AND ea.num = tt.o;
EDIT: had bad data, sorry, my bad, however thanks for finding better SQL query
SELECT (i).*, (e).*
FROM (
SELECT i,
(
SELECT e
FROM events e
WHERE e.id_items = i.id
ORDER BY
when DESC
LIMIT 1
) e
FROM items i
) q
If you're using 8.4:
select * from (
select item.*, event.*,
row_number() over(partition by item.id order by event."when" desc) as row_number
from items item
join events event on event.id_items = item.id
) x where row_number = 1
For this kind of joins, I prefer the DISTINCT ON syntax (example).
It's a Postgresql extension (not SQL standard syntax), but it comes very handy:
SELECT DISTINCT ON (it.id)
it.*, ev.*
FROM items it, events ev
WHERE ev.id_items = it.id
ORDER by it.id, ev.when DESC;
You can't beat that, on terms of simplicity and readability.
That query assumes that every item has at least one event. If not, and if you want all
events, you'll need an outer join:
SELECT DISTINCT ON (it.id)
it.*, ev.*
FROM items it LEFT JOIN events ev
ON ev.id_items = it.id
ORDER BY it.id, ev.when DESC;
BTW: There is no "timestamp issue" in Postgresql, perhaps you should change the title.

selecting latest rows per distinct foreign key value

excuse the title, i couldn't come up with something short and to the point...
I've got a table 'updates' with the three columns, text, typeid, created - text is a text field, typeid is a foreign key from a 'type' table and created is a timestamp. A user is entering an update and select the 'type' it corresponds too.
There's a corresponding 'type' table with columns 'id' and 'name'.
I'm trying to end up with a result set with as many rows as is in the 'type' table and the latest value from updates.text for the particular row in types. So if i've got 3 types, 3 rows would be returned, one row for each type and the most recent updates.text value for the type in question.
Any ideas?
thanks,
John.
select u.text, u.typeid, u.created, t.name
from (
select typeid, max(created) as MaxCreated
from updates
group by typeid
) mu
inner join updates u on mu.typeid = u.typeid and mu.MaxCreated = u.Created
left outer join type t on u.typeid = t.typeid
What are the actual columns you want returned?
SELECT t.*,
y.*
FROM TYPE t
JOIN (SELECT u.typeid,
MAX(u.created) 'max_created'
FROM UPDATES u
GROUP BY u.typeid) x ON x.typeid = t.id
JOIN UPDATES y ON y.typeid = x.typeid
AND y.created = x.max_created
SELECT
TYP.id,
TYP.name,
TXT.comment
FROM
dbo.Types TYP
INNER JOIN dbo.Type_Comments TXT ON
TXT.type_id = TYP.id
WHERE
NOT EXISTS
(
SELECT
*
FROM
dbo.Type_Comments TXT2
WHERE
TXT2.type_id = TYP.id AND
TXT2.created > TXT.created
)
Or:
SELECT
TYP.id,
TYP.name,
TXT.comment
FROM
dbo.Types TYP
INNER JOIN dbo.Type_Comments TXT ON
TXT.type_id = TYP.id
LEFT OUTER JOIN dbo.Type_Comments TXT2 ON
TXT2.type_id = TYP.id AND
TXT2.created > TXT.created
WHERE
TXT2.type_id IS NULL
In either case, if the created date can be identical between two rows with the same type_id then you would need to account for that.
I've also assumed at least one comment per type exists. If that's not the case then you would need to make a minor adjustment for that as well.