Duplicate alternate columns - sql

I have a table created by joining a table to itself.
SELECT a.id id1, b.id id2, ST_Distance(a.geom, b.geom) dist
FROM yy_cluster_1 a, yy_cluster_1 b
WHERE ST_DWithin(a.geom, b.geom, 12) AND a.id <> b.id
ORDER BY a.id, b.id
This results in a table like the one below on the left. I am trying to number rows to produce the result on the right
For the life of me I cannot think how to do this, let alone efficiently.

Use dense_rank() along with least() and greatest():
SELECT
a.id id1,
b.id id2,
ST_Distance(a.geom, b.geom) dist,
dense_rank() over(order by least(a.id, b.id), greatest(a.id, b.id)) new_id
FROM
yy_cluster_1 a
INNER JOIN yy_cluster_1 b ON ST_DWithin(a.geom, b.geom, 12) AND a.id <> b.id
ORDER BY a.id, b.id
Note that I changed your query so it performs an explicit join instead of an old-school implicit join.

Related

Getting the last entry within a join

I know this has been asked a lot but I can't seem to get my query working.
I'm trying to get only one row per id in a query looking like this :
SELECT a.id, b.name
FROM table1 a
LEFT JOIN table2 b ON a.key = b.key
WHERE a.Date =
(SELECT MAX(a1.date) from table1 WHERE a1.primarykey = a.primarykey)
GROUP BY a.id, b.name
I do not need to group by b.name but have to since I need to group by id.
Right now, I have multiple occurences for b.name which duplicates a.id where I just want the corresponding b.name for the last date for a.id.
Can anyone point me to the right way to do this ?
Thank you
I guess this condition:
WHERE a1.primarykey = a.primarykey
should be:
WHERE a1.key = a.key
and key is not the primary key of table1, because if you really mean the primary key then there is no point to search for the MAX(date) for the primary key since there is only 1 date for each primary key.
If I'm not wrong then try with row_number():
SELECT t.id, t.name
FROM (
SELECT a.id, b.name,
row_number() over (partition by a.key order by a.date desc) rn
FROM table1 a LEFT JOIN table2 b
ON a.key = b.key
) t
WHERE t.rn = 1
It looks like you would be getting 1 row per id if you would be removing b.name from your group statement.
Not sure why you would need to group on b.name if you group on a.id?
try this:
SELECT a.id, b.name from (
SELECT a1.id,a1.key,
rank() over(partition by a1.key order by a1.date desc) md FROM table1 a1 )a
LEFT JOIN table2 b ON a.key = b.key and a.md=1;
but I don't get -you need group by Id or key, double check it

Removing duplicate values from a column in SQL

I have two tables A (group_id, id, subject) and B (id, date). Below is the joint table of tables A and B on id. I have tried using distinct and partition to remove the duplicates in group_id(field) only, but no luck:
My code:
select
a.group_id, a.id, a.subject, b.date
from
A a
inner join
(select
b.*,
row_number() over (partition by group_id order by date asc) as seqnum
from
B b) b on a.id = b.id and seqnum = 1
order by
date desc;
I got this error when I ran the code:
Partitioning can not be used stand-alone in query near 'partition by group_id order by date asc) as seqnum from B' at line 1
This is my expected result:
Thank you in advance!
It looks like you want the earliest date for each row in the table you show. Your question mentions two tables, but you only show one.
I recommend a correlated subquery in most databases:
select b.*
from b
where b.date = (select min(b2.date)
from b b2
where b2.group_id = b.group_id
);
I see. You need to join first and then use row_number():
select ab.*
from (select a.group_id, a.id, a.subject, b.date,
row_number() over (partition by a.group_id order by b.date) as seqnum
from A a join
B b
on a.id = b.id
) ab
where seqnum = 1
order by date desc;
You are almost there. But the column that you try to use to partition (ie group_id) comes from table a, which is not available in the subquery.
You would need to JOIN and assign the row number in a subquery, and then filter in the outer query.
select *
from (
select
a.group_id,
a.id,
a.subject,
b.date,
row_number() over (partition by a.group_id order by b.date asc) as seqnum
from a
inner join b on ON a.id = b.id
)
where seqnum = 1
ORDER BY date desc;
Another way to achieve your goal though it may not be the efficient one
SELECT
A.group_id, A.id, B.Date, A.subject
FROM A
INNER JOIN B
ON A.Id = B.Id
INNER JOIN
(
SELECT
A.Group_id, MIN(B.Date) AS Date
FROM A
INNER JOIN B
ON A.Id = B.Id
GROUP BY A.group_id
) AS supportTable
ON A.group_id = supportTable.group_id
AND B.Date = supportTable.Date

SQL summations with multiple outer joins

I have tables a, b, c, and d whereby:
There are 0 or more b rows for each a row
There are 0 or more c rows for each a row
There are 0 or more d rows for each a row
If I try a query like the following:
SELECT a.id, SUM(b.debit), SUM(c.credit), SUM(d.other)
FROM a
LEFT JOIN b on a.id = b.a_id
LEFT JOIN c on a.id = c.a_id
LEFT JOIN d on a.id = d.a_id
GROUP BY a.id
I notice that I have created a cartesian product and therefore my sums are incorrect (much too large).
I see that there are other SO questions and answers, however I'm still not grasping how I can accomplish what I want to do in a single query. Is it possible in SQL to write a query which aggregates all of the following data:
SELECT a.id, SUM(b.debit)
FROM a
LEFT JOIN b on a.id = b.a_id
GROUP BY a.id
SELECT a.id, SUM(c.credit)
FROM a
LEFT JOIN c on a.id = c.a_id
GROUP BY a.id
SELECT a.id, SUM(d.other)
FROM a
LEFT JOIN d on a.id = d.a_id
GROUP BY a.id
in a single query?
Your analysis is correct. Unrelated JOIN create cartesian products.
You have to do the sums separately and then do a final addition. This is doable in one query and you have several options for that:
Sub-requests in your SELECT: SELECT a.id, (SELECT SUM(b.debit) FROM b WHERE b.a_id = a.id) + ...
CROSS APPLY with a similar query as the first bullet then SELECT a.id, b_sum + c_sum + d_sum
UNION ALL as you suggested with an outer SUM and GROUP BY on top of that.
LEFT JOIN to similar subqueries as above.
And probably more... The performance of the various solutions might be slightly different depending on how many rows in A you want to select.
SELECT a.ID, debit, credit, other
FROM a
LEFT JOIN (SELECT a_id, SUM(b.debit) as debit
FROM b
GROUP BY a_id) b ON a.ID = b.a_id
LEFT JOIN (SELECT a_id, SUM(b.credit) as credit
FROM c
GROUP BY a_id) c ON a.ID = c.a_id
LEFT JOIN (SELECT a_id, SUM(b.other) as other
FROM d
GROUP BY a_id) d ON a.ID = d.a_id
Can also be done with correlated subqueries:
SELECT a.id
, (SELECT SUM(debit) FROM b WHERE a.id = b.a_id)
, (SELECT SUM(credit) FROM c WHERE a.id = c.a_id)
, (SELECT SUM(other) FROM d WHERE a.id = d.a_id)
FROM a

sql - multiple layers of correlated subqueries

I have table A, B and C
I want to return all entries in table A that do not exist in table B and of that list do not exist in table C.
select * from table_A as a
where not exists (select 1 from table_B as b
where a.id = b.id)
this gives me the first result of entries in A that are not in B. But now I want only those entries of this result that are also not in C.
I tried flavours of:
select * from table_A as a
where not exists (select 1 from table_B as b
where a.id = b.id)
AND
where not exists (select 1 from table_C as c
where a.id = c.id)
But that isnt the correct logic. If there is a way to store the results from the first query and then select * from that result that are not existent in table C. But I'm not sure how to do that. I appreciate the help.
Try this:
select * from (
select a.*, b.id as b_id, c.id as c_id
from table_A as a
left outer join table_B as b on a.id = b.id
left outer join table_C as c on c.id = a.id
) T
where b_id is null
and c_id is null
Another implementation is this:
select a1.*
from table_A as a1
inner join (
select a.id from table_A
except
select b.id from table_B
except
select c.id from table_c
) as a2 on a1.id = a2.id
Note the restrictions on the form of the sub-query as described here. The second implementation, by most succinctly and clearly describing the desired operation to SQL Server, is likely to be the most efficient.
You have two WHERE clauses in (the external part of) your second query. That is not valid SQL. If you remove it, it should work as expected:
select * from table_A as a
where not exists (select 1 from table_B as b
where a.id = b.id)
AND
not exists (select 1 from table_C as c -- WHERE removed
where a.id = c.id) ;
Tested in SQL-Fiddle (thnx #Alexander)
how about using LEFT JOIN
SELECT a.*
FROM TableA a
LEFT JOIN TableB b
ON a.ID = b.ID
LEFT JOIN TableC c
ON a.ID = c.ID
WHERE b.ID IS NULL AND
c.ID IS NULL
SQLFiddle Demo
One more option with NOT EXISTS operator
SELECT *
FROM dbo.test71 a
WHERE NOT EXISTS(
SELECT 1
FROM (SELECT b.ID
FROM dbo.test72 b
UNION ALL
SELECT c.ID
FROM dbo.test73 c) x
WHERE a.ID = x.ID
)
Demo on SQLFiddle
Option from #ypercube.Thank for the present;)
SELECT *
FROM dbo.test71 a
WHERE NOT EXISTS(
SELECT 1
FROM dbo.test72 b
WHERE a.ID = b.ID
UNION ALL
SELECT 1
FROM dbo.test73 c
WHERE a.ID = c.ID
);
Demo on SQLFiddle
I do not like "not exists" but if for some reason it seems to be more logical to you; then you can use a alias for your first query. Subsequently, you can re apply another "not exists" clause. Something like:
SELECT * FROM
( select * from tableA as a
where not exists (select 1 from tableB as b
where a.id = b.id) )
AS A_NOT_IN_B
WHERE NOT EXISTS (
SELECT 1 FROM tableC as c
WHERE c.id = A_NOT_IN_B.id
)

Aliasing derived table which is a union of two selects

I can't get the syntax right for aliasing the derived table correctly:
SELECT * FROM
(SELECT a.*, b.*
FROM a INNER JOIN b ON a.B_id = b.B_id
WHERE a.flag IS NULL AND b.date < NOW()
UNION
SELECT a.*, b.*
FROM a INNER JOIN b ON a.B_id = b.B_id
INNER JOIN c ON a.C_id = c.C_id
WHERE a.flag IS NOT NULL AND c.date < NOW())
AS t1
ORDER BY RAND() LIMIT 1
I'm getting a Duplicate column name of B_id. Any suggestions?
The problem isn't the union, it's the select a.*, b.* in each of the inner select statements - since a and b both have B_id columns, that means you have two B_id cols in the result.
You can fix that by changing the selects to something like:
select a.*, b.col_1, b.col_2 -- repeat for columns of b you need
In general, I'd avoid using select table1.* in queries you're using from code (rather than just interactive queries). If someone adds a column to the table, various queries can suddenly stop working.
In your derived table, you are retrieving the column id that exists in table a and table b, so you need to choose one of them or give an alias to them:
SELECT * FROM
(SELECT a.*, b.[all columns except id]
FROM a INNER JOIN b ON a.B_id = b.B_id
WHERE a.flag IS NULL AND b.date < NOW()
UNION
SELECT a.*, b.[all columns except id]
FROM a INNER JOIN b ON a.B_id = b.B_id
INNER JOIN c ON a.C_id = c.C_id
WHERE a.flag IS NOT NULL AND c.date < NOW())
AS t1
ORDER BY RAND() LIMIT 1
First, you could use UNION ALL instead of UNION. The two subqueries will have no common rows because of the excluding condtion on a.flag.
Another way you could write it, is:
SELECT a.*, b.*
FROM a
INNER JOIN b
ON a.B_id = b.B_id
WHERE ( a.flag IS NULL
AND b.date < NOW()
)
OR
( a.flag IS NOT NULL
AND EXISTS
( SELECT *
FROM c
WHERE a.C_id = c.C_id
AND c.date < NOW()
)
)
ORDER BY RAND()
LIMIT 1