unable to use LIMIT when using correlated query - sql

I have two tables in Postgres. I want to get the data for the latest 3 records from the table.
Below is the query:
select two.sid as sid,
two.sidname as sidname,
two.myPercent as mypercent,
two.saccur as saccur,
one.totalSid as totalSid
from table1 one,table2 two
where one.sid = two.sid;
The above query displays all records that satisfy the condition one.sid = two.sid; I want to get only the data for the 3 most recent records (sid 4, 5, 6) from table2.
I know that in Postgres we can use LIMIT to limit the number of rows retrieved, but table2 has multiple rows for each ID, so I guess I cannot use LIMIT on table2 and should use it on table1 instead. Any suggestions?
table1:
sid totalSid
1 10
2 20
3 30
4 40
5 50
6 60
table2:
sid sidname myPercent saccur
1 aaaa 11 11t
1 bbb 13 13g
1 ccc 11 11g
1 qw 88 88k
//more data for 2,3,4,5....
6 xyz 89 895W
6 xyz1 90 90k
6 xyz2 91 91p
6 xyz3 92 92q

Given a changed understanding of the question, a simple subquery and join should suffice.
We select everything from table1, limited to 3 records in descending sid order. This gives us the 3 most recent SIDs, which we then join to table2 to get the other SID-related data. The assumption here is that sid is unique in table1 and that the "most recent" records are those with the highest sid.
SELECT two.sid as sid
, two.sidname as sidname
, two.myPercent as mypercent
, two.saccur as saccur
, one.totalSid as totalSid
FROM (SELECT * FROM table1 ORDER BY SID DESC LIMIT 3) one
INNER JOIN table2 two
ON one.sid = two.sid;
Note: I removed a comma after the alias one above.
Below, I have reinstated the ANSI-89 join syntax, which uses comma notation.
SELECT two.sid as sid
, two.sidname as sidname
, two.myPercent as mypercent
, two.saccur as saccur
, one.totalSid as totalSid
FROM (SELECT * FROM table1 ORDER BY SID DESC LIMIT 3) one
, table2 two
WHERE one.sid = two.sid;
Logically, this syntax says: take the 3 most recent SIDs from table1, cross join them with table2 (match each record in one to every record in two), and then return only the records that have the same SID on both sides. Modern query optimizers use cost-based optimization and do not actually build the entire cross join; PostgreSQL, for example, plans this comma join the same way as the explicit INNER JOIN above. Taken literally, however, the order of operations describes a cross join, and if one and two were both large tables, that cross join could produce a very large temporary dataset.
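A quick way to check both the LIMIT-in-subquery idea and the claim that the two join styles return the same rows is a throwaway in-memory database. The sketch below uses Python's built-in sqlite3 module as a stand-in for PostgreSQL. This is an assumption: the query uses only syntax that both engines accept. The table2 rows are made up, because the question elides most of its data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (sid INTEGER PRIMARY KEY, totalSid INTEGER)")
conn.execute(
    "CREATE TABLE table2 (sid INTEGER, sidname TEXT, myPercent INTEGER, saccur TEXT)"
)
conn.executemany(
    "INSERT INTO table1 VALUES (?, ?)", [(i, i * 10) for i in range(1, 7)]
)
# Hypothetical table2 data: two rows per sid (the question elides most rows).
conn.executemany(
    "INSERT INTO table2 VALUES (?, ?, ?, ?)",
    [(s, f"name{s}_{k}", s * 10 + k, f"{s}{k}x") for s in range(1, 7) for k in (1, 2)],
)

# Same subquery in both: the 3 highest sids of table1, joined to all their table2 rows.
LATEST = "(SELECT * FROM table1 ORDER BY sid DESC LIMIT 3) one"
COLS = "two.sid, two.sidname, two.myPercent, two.saccur, one.totalSid"

explicit = conn.execute(
    f"SELECT {COLS} FROM {LATEST} INNER JOIN table2 two ON one.sid = two.sid "
    "ORDER BY two.sid, two.sidname"
).fetchall()
comma = conn.execute(
    f"SELECT {COLS} FROM {LATEST}, table2 two WHERE one.sid = two.sid "
    "ORDER BY two.sid, two.sidname"
).fetchall()

print(sorted({row[0] for row in explicit}))  # [4, 5, 6]
print(explicit == comma)                     # True
```

The LIMIT is applied to table1 before the join. Every table2 row for sids 4, 5 and 6 therefore comes through, however many rows each sid has.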

Related

Best practice of implementing sql "JOIN" returning policy in snowflake

Say we have two tables performing a left join:
Table 1
Joint Key || Attribute 1 || Attribute 2 || Attribute 3
A 1 11 21
B 2 12 22
C 3 13 23
Table 2
Joint Key || Attribute 4 || Attribute 5
A 31 41
A 32 42
C 33 43
Performing Table 1 LEFT JOIN Table 2 on "Joint Key" returns two records with Joint Key = 'A':
Joint Key || Attribute 1 || Attribute 2 || Attribute 3 || Attribute 4 || Attribute 5
A 1 11 21 31 41
A 1 11 21 32 42
What's the best practice for defining the return policy, specifically in Snowflake, so that the result has the same row count as Table 1?
Taking the above example, I want the record that has the MAX(Attribute 4). Two initial ideas come to mind:
Option 1: use a GROUP BY clause. I need to list the columns explicitly, which is cumbersome when dealing with a table that has many columns.
Option 2: something like
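select * from (
select
Table1.*,
Table2.Attribute_4,
Table2.Attribute_5,
-- per-key maximum as a window function, so no GROUP BY is needed
max(Table2.Attribute_4) over (partition by Table1.Joint_Key) as mx_Attribute_4
from Table1
left join Table2
on Table1.Joint_Key = Table2.Joint_Key
) as temp
where temp.Attribute_4 = temp.mx_Attribute_4
-- keep Table1 rows that have no match in Table2
or temp.mx_Attribute_4 is null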
It's quite complicated and time-consuming too.
Any other suggestions?
You could use QUALIFY.
Something like:
select
t1.Joint_Key, t1.Attribute_1, t1.Attribute_2, t1.Attribute_3, t2.Attribute_4, t2.Attribute_5
from Table1 t1
left join Table2 t2
on t1.Joint_Key = t2.Joint_Key
qualify row_number() over(partition by t1.Joint_Key order by t2.Attribute_4 desc) = 1
This is certainly cleaner, and it should be more efficient than a GROUP BY. It still requires the query to sort records by Attribute_4, but I don't see a way to avoid that unless you are OK with getting any one of the sets of values instead of the one with MAX(Attribute_4). In that case you could be more efficient by using order by 1 in the row_number() window function.
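Not every engine supports QUALIFY; SQLite and SQL Server, for instance, do not. The equivalent pattern computes ROW_NUMBER() in a subquery and filters on it in an outer query. The sketch below uses Python's sqlite3 as a stand-in for Snowflake. The tables and data come from the question, with spaces in the column names replaced by underscores as in the answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Table1 (Joint_Key TEXT, Attribute_1 INT, Attribute_2 INT, Attribute_3 INT);
    CREATE TABLE Table2 (Joint_Key TEXT, Attribute_4 INT, Attribute_5 INT);
    INSERT INTO Table1 VALUES ('A', 1, 11, 21), ('B', 2, 12, 22), ('C', 3, 13, 23);
    INSERT INTO Table2 VALUES ('A', 31, 41), ('A', 32, 42), ('C', 33, 43);
    """
)

# No QUALIFY here, so the ROW_NUMBER() filter moves to an outer query.
rows = conn.execute(
    """
    SELECT Joint_Key, Attribute_1, Attribute_2, Attribute_3, Attribute_4, Attribute_5
    FROM (
        SELECT t1.Joint_Key, t1.Attribute_1, t1.Attribute_2, t1.Attribute_3,
               t2.Attribute_4, t2.Attribute_5,
               ROW_NUMBER() OVER (PARTITION BY t1.Joint_Key
                                  ORDER BY t2.Attribute_4 DESC) AS rn
        FROM Table1 t1
        LEFT JOIN Table2 t2 ON t1.Joint_Key = t2.Joint_Key
    ) AS ranked
    WHERE rn = 1
    ORDER BY Joint_Key
    """
).fetchall()

for row in rows:
    print(row)
```

Key B has no match in Table2, so its row survives with NULLs in the Table2 columns. The result therefore has the same row count as Table1, and for A it keeps the row with Attribute_4 = 32.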
You seem to have some confused ideas about how joins work. Table1 LEFT JOIN Table2 returns all the records from Table1, with any data from matching records in Table2, so you would normally get the 3 records from Table1.
However, in your case 2 records in Table2 match 1 record in Table1. This duplicates your results, and you get 4 records: 2 with key A, 1 with B and 1 with C.
Anyway, given the example data you've provided, please update your question with the result you want to achieve so that someone can help you.
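The row multiplication described above is easy to reproduce. Here is a minimal sketch that runs the plain LEFT JOIN on the question's data and counts the rows. It uses Python's sqlite3 as a stand-in for Snowflake.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Table1 (Joint_Key TEXT, Attribute_1 INT, Attribute_2 INT, Attribute_3 INT);
    CREATE TABLE Table2 (Joint_Key TEXT, Attribute_4 INT, Attribute_5 INT);
    INSERT INTO Table1 VALUES ('A', 1, 11, 21), ('B', 2, 12, 22), ('C', 3, 13, 23);
    INSERT INTO Table2 VALUES ('A', 31, 41), ('A', 32, 42), ('C', 33, 43);
    """
)

# Each Table1 row appears once per matching Table2 row (or once with NULLs).
rows = conn.execute(
    "SELECT t1.Joint_Key, t2.Attribute_4 "
    "FROM Table1 t1 LEFT JOIN Table2 t2 ON t1.Joint_Key = t2.Joint_Key "
    "ORDER BY t1.Joint_Key, t2.Attribute_4"
).fetchall()

print(len(rows))  # 4
print(rows)       # [('A', 31), ('A', 32), ('B', None), ('C', 33)]
```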

Pick the closest-matching record within a group of records in a single table, based on input criteria

We have a table in which one record can have multiple copies: the same record can appear several times, with different criteria in each entry. The criteria are determined by three main parameters, income, score and no_months, all of which are integer columns. We group the copies by giving each record profile a unique code (group_code).
If one input is eligible for multiple profiles, we need to pick the one that matches the criteria most closely.
Sample data:
id name income score no_months group_code
22 abc 1000 500 6 abccode
23 abc 900 600 12 abccode
24 bca 1000 600 12 bcacode
Desired results:
id name income score no_months group_code
23 abc 900 600 12 abccode
24 bca 1000 600 12 bcacode
Note: row 23 has two columns (score and no_months) with greater values than row 22. That is why id 23 was picked, even though it has a lower income.
Only those records should be displayed whose columns have greater values more often than those of the other rows with the same group_code.
I have tried using multiple ORDER BY clauses with a CTE, because more columns (image, city, etc.) need to be displayed, but it's not working.
Select a single row per name: the winner among its rows. The winner is the row with the most wins when each row is compared with every other row of the same name in a triangle join. If two rows tie in a comparison, the row with the lower id wins.
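select *
from tbl t
where id in (
-- winners: per name, the row with the most pairwise wins (lower id breaks ties)
select winid
from (
select t1.name, w.winid,
row_number() over (partition by t1.name order by count(*) desc, w.winid) as rn
from tbl t1
join tbl t2 on t1.name = t2.name and t1.id < t2.id
join lateral (
select case when sign(t1.income - t2.income) + sign(t1.score - t2.score) + sign(t1.no_months - t2.no_months) >= 0
then t1.id else t2.id end winid
) w on 1=1
group by t1.name, w.winid
) ranked
where rn = 1)
-- a name with only one row is its own winner
or not exists(select 1 from tbl t3 where t3.name = t.name and t3.id <> t.id)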
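The same triangle-join idea can be sketched without LATERAL in SQLite, using two CTEs. To avoid depending on a sign() function, it uses (a > b) - (a < b) as a substitute, because SQLite evaluates comparisons to 1 or 0. The sketch groups by group_code, which is how the question defines a group. The answer above uses name, which gives the same grouping for this sample data. It runs in an in-memory database through Python's sqlite3, with the sample data from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE tbl (id INT, name TEXT, income INT, score INT, no_months INT, group_code TEXT);
    INSERT INTO tbl VALUES
        (22, 'abc', 1000, 500, 6, 'abccode'),
        (23, 'abc', 900, 600, 12, 'abccode'),
        (24, 'bca', 1000, 600, 12, 'bcacode');
    """
)

query = """
WITH duels AS (          -- each pair of rows in the same group, compared once
    SELECT t1.group_code,
           CASE WHEN (t1.income > t2.income) - (t1.income < t2.income)
                   + (t1.score > t2.score) - (t1.score < t2.score)
                   + (t1.no_months > t2.no_months) - (t1.no_months < t2.no_months) >= 0
                THEN t1.id ELSE t2.id END AS winid   -- a tie goes to the lower id
    FROM tbl t1
    JOIN tbl t2 ON t1.group_code = t2.group_code AND t1.id < t2.id
),
wins AS (                -- number of duels each row won
    SELECT group_code, winid, COUNT(*) AS n
    FROM duels
    GROUP BY group_code, winid
),
ranked AS (              -- most wins first, lower id breaks ties
    SELECT winid,
           ROW_NUMBER() OVER (PARTITION BY group_code ORDER BY n DESC, winid) AS rn
    FROM wins
)
SELECT * FROM tbl t
WHERE t.id IN (SELECT winid FROM ranked WHERE rn = 1)
   OR NOT EXISTS (SELECT 1 FROM tbl t3             -- single-row groups win by default
                  WHERE t3.group_code = t.group_code AND t3.id <> t.id)
ORDER BY t.id
"""
winners = conn.execute(query).fetchall()
for row in winners:
    print(row)
```

For the sample data, row 23 beats row 22 on score and no_months (1 - 1 - 1 = -1 from row 22's side). Row 24 is alone in its group, so the result matches the desired output.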

Count number of repeats in SQL

I tried to solve a problem, without success.
I have two lists of numbers:
{1,2,3,4}
{5,6,7,8,9}
and a table:
ID Number
1 1
1 2
1 7
1 2
1 6
2 8
2 7
2 3
2 9
Now I need to count how many times a number from the second list comes after a number from the first list, but I should count at most one match per ID.
For the example table above, the result should be 2:
there are three matched pairs, but because there are only two different IDs, the result is 2 instead of 3.
Pairs:
1 2
1 7
1 2
1 6
2 3
2 9
Note: I work with MSSQL.
Edit: there is one more column, Date, which determines the order.
Edit 2 - Solution
I wrote this query:
SELECT * FROM table t
left JOIN table tt ON tt.ID = t.ID
AND tt.Date > t.Date
AND t.Number IN (1,2,3,4)
AND tt.Number IN (6,7,8,9)
After this, my plan was to group by id and use only one match for each id, but the query takes a long time to execute.
Here is a query that would do it:
select a.id, min(a.number) as a, min(b.number) as b
from mytable a
inner join mytable b
on a.id = b.id
and a.date < b.date
and b.number in (5,6,7,8,9)
where a.number in (1,2,3,4)
group by a.id
Output is:
id a b
1 1 6
2 3 9
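So each id with at least one matching pair is output on one line. Column a holds a value from the first group of numbers, and column b holds a value from the second group. Because the two MIN values are computed independently, a and b are not guaranteed to come from the same pair; the row only shows that at least one pair exists for that id.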
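If the goal is just the number the question asks for (2 in the example), the ids can be counted directly. The sketch below swaps the join plus GROUP BY for an EXISTS test and counts distinct ids. It uses Python's sqlite3 rather than SQL Server. The date values are hypothetical: the question names a Date column without showing its contents, so the listed row order is assumed to be date order. The sketch also creates the (id, number, date) index suggested in the comments on the attempt.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (id INT, number INT, date TEXT)")
conn.execute("CREATE INDEX ix_mytable ON mytable (id, number, date)")

# Hypothetical dates: one day per row, in the order the question lists them.
data = [(1, 1), (1, 2), (1, 7), (1, 2), (1, 6), (2, 8), (2, 7), (2, 3), (2, 9)]
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?, ?)",
    [(i, n, f"2024-01-{day:02d}") for day, (i, n) in enumerate(data, start=1)],
)

# An id counts once if any first-list number is followed (later date)
# by a second-list number for the same id.
count = conn.execute(
    """
    SELECT COUNT(*)
    FROM (SELECT DISTINCT a.id
          FROM mytable a
          WHERE a.number IN (1, 2, 3, 4)
            AND EXISTS (SELECT 1 FROM mytable b
                        WHERE b.id = a.id
                          AND b.date > a.date
                          AND b.number IN (5, 6, 7, 8, 9)))
    """
).fetchone()[0]

print(count)  # 2
```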
Comments on attempt (edit 2 to question)
Later you added a query attempt to your question. Some comments about that attempt:
You don't need a left join, because you really want a match for both values. An inner join generally performs better, so use that.
The condition t.Number IN (1,2,3,4) does not belong in the on clause. In combination with a left join the result will include t records that violate this condition. It should be put in the where clause.
Your concern about performance may be warranted, but it can be resolved by adding a useful index on your table, e.g. on (id, number, date) or (id, date, number).

Checking for (and Deleting) Complex Object Duplicates in SQL Server

I need to check a complex object for duplicates and then cascade-delete the duplicates from all associated tables. I'm wondering whether I can do this efficiently in SQL Server, or whether I should do it in my code. Structurally, I have the following tables:
Claim
ClaimCaseSubTypes (mapping table for many to many relationship)
ClaimDiagnosticCodes (ditto)
ClaimTreatmentCodes (ditto)
Basically, a Claim is only a duplicate if it matches on 8 of its own fields AND has the same relationships in all the mapping tables.
For example, the following records would be identified as duplicates:
Claim
Id CreateDate Other Fields
1 1/1/2015 matched
2 6/1/2015 matched
ClaimCaseSubTypes
ClaimId SubTypeId
1 34
1 64
2 34
2 64
ClaimDiagnosticCodes
ClaimId DiagnosticCodeId
1 1
2 1
ClaimTreatmentCodes
ClaimId TreatmentCodeId
1 5
1 6
2 6
2 5
In this case I would want to keep 1 and delete 2 from the Claim table, as well as any rows in the mapping tables with a ClaimId of 2.
This is the kind of problem that window functions are for:
;WITH cte AS (
SELECT c.ID,
ROW_NUMBER() OVER (PARTITION BY field1, field2, field3, ... ORDER BY c.CreateDate) As ClaimOrder
FROM Claim c
INNER JOIN other tables...
)
UPDATE Claim
SET IsDuplicate = IIF(cte.ClaimOrder = 1, 0, 1)
FROM Claim c
INNER JOIN cte ON c.ID = cte.ID
The fields that you include in the PARTITION BY indicate which fields need to be identical for two claims to be considered matched. The ORDER BY tells SQL Server to assign the earliest claim an order of 1. Everything that doesn't have an order of 1 is a duplicate of something else.
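A row-level PARTITION BY over joined mapping tables compares individual joined rows, not whole sets of relationships. A claim whose mappings are only a subset of another claim's could therefore still be flagged. One way to make the "same relationships in all mapping tables" test explicit is a swapped-in technique: a NOT EXISTS set-equality check in both directions, followed by the cascade delete. The sketch below runs it in Python's sqlite3 rather than SQL Server. OtherField is a hypothetical stand-in for the 8 matching columns, and claim 3 is hypothetical data added to show that a partial match is kept.

```python
import sqlite3

# Mapping tables and their related-id column. These are fixed constants,
# so formatting them into SQL strings is safe here.
MAPPINGS = [
    ("ClaimCaseSubTypes", "SubTypeId"),
    ("ClaimDiagnosticCodes", "DiagnosticCodeId"),
    ("ClaimTreatmentCodes", "TreatmentCodeId"),
]


def same_set(table, col):
    """SQL fragment: claims a and b map to exactly the same ids in `table`."""
    return (
        f"NOT EXISTS (SELECT 1 FROM {table} x WHERE x.ClaimId = a.Id"
        f" AND NOT EXISTS (SELECT 1 FROM {table} y"
        f" WHERE y.ClaimId = b.Id AND y.{col} = x.{col}))"
        f" AND NOT EXISTS (SELECT 1 FROM {table} y WHERE y.ClaimId = b.Id"
        f" AND NOT EXISTS (SELECT 1 FROM {table} x"
        f" WHERE x.ClaimId = a.Id AND x.{col} = y.{col}))"
    )


def delete_duplicates(conn):
    """Delete every claim that duplicates an older claim; return the deleted ids."""
    dup_sql = (
        "SELECT DISTINCT b.Id FROM Claim a JOIN Claim b"
        " ON a.OtherField = b.OtherField"  # stand-in for the 8 matching fields
        " AND (a.CreateDate < b.CreateDate"
        " OR (a.CreateDate = b.CreateDate AND a.Id < b.Id))"
        " WHERE " + " AND ".join(same_set(t, c) for t, c in MAPPINGS)
    )
    dup_ids = sorted(row[0] for row in conn.execute(dup_sql))
    if not dup_ids:
        return dup_ids
    marks = ",".join("?" * len(dup_ids))
    for table, _ in MAPPINGS:  # cascade: mapping rows first, then the claims
        conn.execute(f"DELETE FROM {table} WHERE ClaimId IN ({marks})", dup_ids)
    conn.execute(f"DELETE FROM Claim WHERE Id IN ({marks})", dup_ids)
    return dup_ids


conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Claim (Id INT PRIMARY KEY, CreateDate TEXT, OtherField TEXT);
    CREATE TABLE ClaimCaseSubTypes (ClaimId INT, SubTypeId INT);
    CREATE TABLE ClaimDiagnosticCodes (ClaimId INT, DiagnosticCodeId INT);
    CREATE TABLE ClaimTreatmentCodes (ClaimId INT, TreatmentCodeId INT);
    INSERT INTO Claim VALUES (1, '2015-01-01', 'matched'),
                             (2, '2015-06-01', 'matched'),
                             (3, '2015-07-01', 'matched');
    INSERT INTO ClaimCaseSubTypes VALUES (1, 34), (1, 64), (2, 34), (2, 64), (3, 34);
    INSERT INTO ClaimDiagnosticCodes VALUES (1, 1), (2, 1), (3, 1);
    INSERT INTO ClaimTreatmentCodes VALUES (1, 5), (1, 6), (2, 6), (2, 5), (3, 5), (3, 6);
    """
)
removed = delete_duplicates(conn)
remaining = [r[0] for r in conn.execute("SELECT Id FROM Claim ORDER BY Id")]
print(removed, remaining)  # [2] [1, 3]
```

Claim 3 shares claim 1's fields, diagnostic code and treatment codes, but maps only to subtype 34, so the set check keeps it. Whether this beats doing the comparison in application code depends on table sizes and indexes on ClaimId in the mapping tables.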

SQL return a default value if a row is not found [PostgreSQL]

I'm wondering whether it is doable (in one query if possible) to make a query return a default value if a row is missing. For example, take these 2 tables, and suppose my query takes 2 parameters (place_id and user_id):
T1
place_id / tag_id
1 2
1 3
1 4
2 4
3 2
4 5
T2
user_id / tag_id / count
100 2 1
100 3 20
200 4 30
200 2 2
300 5 22
As you can see, the user/tag pair (100, 4) is missing. What I would like to achieve is a query that returns these 3 results (for place_id = 1 and user_id = 100):
tag_id / count
2 1
3 20
4 0
I know that I can do something like the following, but it doesn't really match the final result. It only works if I know the tag_id in advance, and obviously it only returns 1 row:
SELECT T1.tag_id, T2.count
from T1 t1
left join T2 t2 on t1.tag_id=t2.tag_id
where t1.place_id=1
UNION ALL
select tag_id,0
from T1
where not exists (select 1 from T2 where user_id=100 and tag_id=4)
and tag_id=4;
EDIT: My question was not complete and had missing cases.
Here is an example (courtesy of @a_horse_with_no_name): http://sqlfiddle.com/#!12/67042/4
Thank you!
The outer join will already take care of what you want.
As t1 is the "left table" of the join, all rows from t1 will be returned. For rows without a match, the columns from the "right table" (t2 in your example) will have a null value. So you only need to convert that null to a 0:
select t1.tag_id, coalesce(t2.cnt, 0) as cnt
from T1 t1
left join T2 t2 on t1.tag_id = t2.tag_id
and t2.user_id = 100
where t1.place_id = 1;
SQLFiddle example: http://sqlfiddle.com/#!12/ed7bf/1
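Here is a runnable version of the same idea with both of the question's parameters applied. The user filter goes into the ON clause, so tags without a row for that user survive as NULL and become 0. The place filter goes into WHERE. This is a sketch in Python's sqlite3 rather than PostgreSQL, with the column renamed cnt as in the answer. tag_counts is a hypothetical helper name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE T1 (place_id INT, tag_id INT);
    CREATE TABLE T2 (user_id INT, tag_id INT, cnt INT);
    INSERT INTO T1 VALUES (1, 2), (1, 3), (1, 4), (2, 4), (3, 2), (4, 5);
    INSERT INTO T2 VALUES (100, 2, 1), (100, 3, 20), (200, 4, 30), (200, 2, 2), (300, 5, 22);
    """
)


def tag_counts(conn, place_id, user_id):
    # User filter in ON: a missing user/tag pair stays as NULL, then COALESCE -> 0.
    # Place filter in WHERE: only that place's tags are returned.
    return conn.execute(
        """
        SELECT t1.tag_id, COALESCE(t2.cnt, 0) AS cnt
        FROM T1 t1
        LEFT JOIN T2 t2 ON t1.tag_id = t2.tag_id AND t2.user_id = ?
        WHERE t1.place_id = ?
        ORDER BY t1.tag_id
        """,
        (user_id, place_id),
    ).fetchall()


print(tag_counts(conn, place_id=1, user_id=100))  # [(2, 1), (3, 20), (4, 0)]
```

Putting t2.user_id in WHERE instead would discard the NULL row for tag 4, which turns the outer join back into an inner join.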
Unrelated, but:
Using count as a column name is a bad idea. It is a reserved word in the SQL standard, so depending on the database you may have to enclose it in double quotes every time (t2."count"). Plus, it doesn't really document the purpose of the column. You should find a better name for it.