Join unrelated table with unequal rows - sql

I would like to join Table A, Table B, and Table C as the expected result in the attached image.

You can enumerate the rows in each table and join on that row number, which might be what you want:
select ab.*, c.*
from (select a.*, b.*, -- really list out the columns you want
row_number() over (order by a.accountid) as seqnum
from a join
b
on a.accountid = b.accountid
) ab join
(select c.*, row_number() over (order by code) as seqnum
from c
) c
on ab.seqnum = c.seqnum
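As a rough illustration of the row-numbering idea, here is a minimal sketch that runs the same pattern against an in-memory SQLite database through Python's built-in sqlite3 module. The sample rows and the `name`, `balance`, and `code` columns are hypothetical, and the sketch uses LEFT JOIN rather than JOIN on the assumption that rows without a partner in the shorter table should still appear.

```python
import sqlite3

# Hypothetical sample data: table c shares no key with a and b.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (accountid INTEGER, name TEXT);
    CREATE TABLE b (accountid INTEGER, balance INTEGER);
    CREATE TABLE c (code TEXT);
    INSERT INTO a VALUES (1, 'ann'), (2, 'bob'), (3, 'cy');
    INSERT INTO b VALUES (1, 10), (2, 20), (3, 30);
    INSERT INTO c VALUES ('X'), ('Y');
""")

# Number the rows on each side, then pair them up by that number.
rows = conn.execute("""
    SELECT ab.accountid, ab.name, ab.balance, cn.code
    FROM (SELECT a.accountid, a.name, b.balance,
                 ROW_NUMBER() OVER (ORDER BY a.accountid) AS seqnum
          FROM a JOIN b ON a.accountid = b.accountid
         ) ab
    LEFT JOIN
         (SELECT c.code, ROW_NUMBER() OVER (ORDER BY c.code) AS seqnum
          FROM c
         ) cn
      ON ab.seqnum = cn.seqnum
    ORDER BY ab.seqnum
""").fetchall()

for row in rows:
    print(row)
```

With an inner join, the third account would be dropped because c has only two rows; which behavior is right depends on the expected result.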

Related

Find the first N nearest points in BigQuery

To find the nearest point and its distance in BigQuery, I am using this query:
WITH table_a AS (
SELECT id, geom
FROM bqtable
), table_b AS (
SELECT id, geom
FROM bqtable
)
SELECT AS VALUE ARRAY_AGG(STRUCT<id_a STRING,id_b STRING, dist FLOAT64>(a.id,b.id,ST_DISTANCE(a.geom, b.geom)) ORDER BY ST_DISTANCE(a.geom, b.geom) LIMIT 1)[OFFSET(0)]
FROM (SELECT id, geom FROM table_a) a
CROSS JOIN (SELECT id, geom FROM table_b) b
WHERE a.id <> b.id
GROUP BY a.id
How can I modify this query to find the nearest 10 points and their distances?
Thanks!
One method uses ORDER BY, LIMIT, and UNNEST(). Using your approach:
SELECT AS VALUE s
FROM (SELECT ARRAY_AGG(STRUCT<id_a STRING,id_b STRING, dist FLOAT64>(a.id, b.id, ST_DISTANCE(a.geom, b.geom))
ORDER BY ST_DISTANCE(a.geom, b.geom)
LIMIT 10
) as ar
FROM (SELECT id, geom FROM table_a) a CROSS JOIN
(SELECT id, geom FROM table_b) b
WHERE a.id <> b.id
GROUP BY a.id
) ab CROSS JOIN
UNNEST(ab.ar) s;
A simpler method would be:
select a.id as id_a, b.id as id_b, ST_DISTANCE(a.geom, b.geom) as dist
from table_a a cross join
table_b b
where a.id <> b.id
qualify row_number() over (partition by a.id order by ST_DISTANCE(a.geom, b.geom)) <= 10;
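The "top N per group" pattern behind this can be tried outside BigQuery. The sketch below is only an approximation: SQLite has neither QUALIFY nor ST_DISTANCE, so it filters in an outer query and uses squared Euclidean distance on hypothetical `x`/`y` columns in a made-up `pts` table, with N set to 2.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE pts (id TEXT, x REAL, y REAL);
    INSERT INTO pts VALUES ('p1', 0, 0), ('p2', 1, 0), ('p3', 3, 0), ('p4', 7, 0);
""")

N = 2  # number of nearest neighbours to keep per point

# Rank every other point by distance within each a.id, then keep the first N.
rows = conn.execute("""
    SELECT id_a, id_b, dist2
    FROM (SELECT a.id AS id_a, b.id AS id_b,
                 (a.x - b.x) * (a.x - b.x) + (a.y - b.y) * (a.y - b.y) AS dist2,
                 ROW_NUMBER() OVER (
                     PARTITION BY a.id
                     ORDER BY (a.x - b.x) * (a.x - b.x) + (a.y - b.y) * (a.y - b.y), b.id
                 ) AS seqnum
          FROM pts a CROSS JOIN pts b
          WHERE a.id <> b.id
         ) ranked
    WHERE seqnum <= ?
    ORDER BY id_a, seqnum
""", (N,)).fetchall()

for row in rows:
    print(row)
```

The extra `b.id` in the window ORDER BY is a tie-breaker added here so that the output is deterministic when two points are equally distant.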

Removing duplicate values from a column in SQL

I have two tables, A (group_id, id, subject) and B (id, date). Below is the result of joining A and B on id. I have tried using DISTINCT and PARTITION BY to remove the duplicates in the group_id field only, but with no luck:
My code:
select
a.group_id, a.id, a.subject, b.date
from
A a
inner join
(select
b.*,
row_number() over (partition by group_id order by date asc) as seqnum
from
B b) b on a.id = b.id and seqnum = 1
order by
date desc;
I got this error when I ran the code:
Partitioning can not be used stand-alone in query near 'partition by group_id order by date asc) as seqnum from B' at line 1
This is my expected result:
Thank you in advance!
It looks like you want the earliest date for each row in the table you show. Your question mentions two tables, but you only show one.
I recommend a correlated subquery in most databases:
select b.*
from b
where b.date = (select min(b2.date)
from b b2
where b2.group_id = b.group_id
);
I see. You need to join first and then use row_number():
select ab.*
from (select a.group_id, a.id, a.subject, b.date,
row_number() over (partition by a.group_id order by b.date) as seqnum
from A a join
B b
on a.id = b.id
) ab
where seqnum = 1
order by date desc;
You are almost there, but the column that you are trying to partition by (i.e., group_id) comes from table A, which is not available in the subquery.
You would need to JOIN and assign the row number in a subquery, and then filter in the outer query.
select *
from (
select
a.group_id,
a.id,
a.subject,
b.date,
row_number() over (partition by a.group_id order by b.date asc) as seqnum
from a
inner join b on a.id = b.id
) ab
where seqnum = 1
ORDER BY date desc;
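To see the "join first, then number the rows" approach end to end, here is a small sketch using Python's built-in sqlite3 module. The rows are invented for illustration; only the table and column names come from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE A (group_id INTEGER, id INTEGER, subject TEXT);
    CREATE TABLE B (id INTEGER, date TEXT);
    INSERT INTO A VALUES (1, 10, 'math'), (1, 11, 'art'), (2, 12, 'music');
    INSERT INTO B VALUES (10, '2020-03-01'), (11, '2020-01-15'), (12, '2020-02-10');
""")

# group_id lives in A and date lives in B, so the window function
# can only see both after the join has been done.
rows = conn.execute("""
    SELECT group_id, id, subject, date
    FROM (SELECT a.group_id, a.id, a.subject, b.date,
                 ROW_NUMBER() OVER (PARTITION BY a.group_id ORDER BY b.date) AS seqnum
          FROM A a JOIN B b ON a.id = b.id
         ) ab
    WHERE seqnum = 1
    ORDER BY date DESC
""").fetchall()

for row in rows:
    print(row)
```

Each group_id appears once, with the row that has the earliest date in that group.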
Another way to achieve your goal, though it may not be the most efficient one:
SELECT
A.group_id, A.id, B.Date, A.subject
FROM A
INNER JOIN B
ON A.Id = B.Id
INNER JOIN
(
SELECT
A.Group_id, MIN(B.Date) AS Date
FROM A
INNER JOIN B
ON A.Id = B.Id
GROUP BY A.group_id
) AS supportTable
ON A.group_id = supportTable.group_id
AND B.Date = supportTable.Date

Max(date) in inner query

I was given sample SQL which does not seem to do what I need.
Big table has 4 million rows and small table has 600 thousand rows.
/* Sample code: (I was given this sample by a senior analyst) */
SELECT SUM(BigTable.VALUE)
FROM BigTable INNER JOIN SmallTable
WHERE BigTable.ID = SmallTable.ID
AND BigTable.VALUATION_DATE IN
(SELECT MAX(VALUATION_DATE)
FROM BigTable)
GROUP BY BigTable.ID
/* My code: (I placed a WHERE in the inner query) */
SELECT BigTable.ID, SUM(BigTable.VALUE)
FROM BigTable INNER JOIN SmallTable
WHERE BigTable.ID = SmallTable.ID
AND BigTable.VALUATION_DATE IN
(SELECT MAX(VALUATION_DATE)
FROM BigTable INNER JOIN SmallTable
WHERE BigTable.ID = SmallTable.ID)
GROUP BY BigTable.ID
If ID xyz has three accounts with values $1, $2, $3 respectively on the most recent date, I want to return the sum of all accounts on that date: xyz, $6
I believe the INNER JOIN syntax you are using is incorrect. After the table named in the INNER JOIN, you need an ON clause stating which columns to join the tables on.
The following query uses the correct syntax (although it may not be correct for your implementation):
SELECT BigTable.ID, SUM(BigTable.VALUE)
FROM BigTable INNER JOIN SmallTable
ON BigTable.ID = SmallTable.ID
WHERE BigTable.VALUATION_DATE IN
(SELECT MAX(VALUATION_DATE)
FROM BigTable INNER JOIN SmallTable
ON BigTable.ID = SmallTable.ID)
GROUP BY BigTable.ID
Only cross joins and natural joins omit the ON clause; with those, any filtering goes in the WHERE clause.
You should avoid the WHERE clause and use the ON clause:
SELECT SUM(BigTable.VALUE)
FROM BigTable
INNER JOIN SmallTable ON BigTable.ID = SmallTable.ID
AND BigTable.VALUATION_DATE = (
SELECT MAX(VALUATION_DATE)
FROM BigTable)
And you should not use GROUP BY ID here.
Use window functions:
SELECT b.ID, b.VALUE
FROM (SELECT b.*,
ROW_NUMBER() OVER (PARTITION BY b.id ORDER BY b.VALUATION_DATE DESC) as seqnum
FROM BigTable b
) b JOIN
SmallTable s
ON b.ID = s.ID
WHERE b.seqnum = 1;
I don't think aggregation is necessary. But, if you have multiple values on the same date for the same id, then:
SELECT b.ID, SUM(b.VALUE)
FROM (SELECT b.*,
RANK() OVER (PARTITION BY b.id ORDER BY b.VALUATION_DATE DESC) as seqnum
FROM BigTable b
) b JOIN
SmallTable s
ON b.ID = s.ID
WHERE b.seqnum = 1
GROUP BY b.id;
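The RANK() variant can be checked against the example from the question (ID xyz with $1, $2, and $3 on its most recent date). This is a minimal sketch on SQLite via Python's sqlite3 module; the dates and the extra `abc` and `zzz` rows are made up to show the per-ID behavior.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE BigTable (ID TEXT, VALUE INTEGER, VALUATION_DATE TEXT);
    CREATE TABLE SmallTable (ID TEXT);
    INSERT INTO BigTable VALUES
        ('xyz', 1, '2021-06-30'), ('xyz', 2, '2021-06-30'), ('xyz', 3, '2021-06-30'),
        ('xyz', 100, '2021-05-31'),
        ('abc', 5, '2021-04-30'),
        ('zzz', 9, '2021-06-30');
    INSERT INTO SmallTable VALUES ('xyz'), ('abc');
""")

# RANK() gives every row on an ID's latest date the value 1,
# so SUM() adds up all accounts valued on that date.
rows = conn.execute("""
    SELECT b.ID, SUM(b.VALUE) AS total
    FROM (SELECT bt.*,
                 RANK() OVER (PARTITION BY bt.ID ORDER BY bt.VALUATION_DATE DESC) AS rnk
          FROM BigTable bt
         ) b
    JOIN SmallTable s ON b.ID = s.ID
    WHERE b.rnk = 1
    GROUP BY b.ID
    ORDER BY b.ID
""").fetchall()

for row in rows:
    print(row)
```

Because the ranking is partitioned by ID, `abc` is still reported even though its latest date is older than the overall maximum date; a filter on the global MAX(VALUATION_DATE) would drop it.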

Oracle-Join with same table multiple times with different where condition

Here is my case,
SELECT
A.TAB1_COL1,B.TAB2_COL4,C.TAB2_COL4
FROM TABLE1 A
LEFT OUTER JOIN
(SELECT * FROM
(SELECT TAB2_COL1, TAB2_COL2, TAB2_COL4, ROW_NUMBER() OVER (PARTITION BY TAB2_COL1,TAB2_COL2 ORDER BY TAB2_COL3 DESC ) AS ROW_NUM
FROM TABLE2
WHERE TAB2_COL2=2
) WHERE ROW_NUM=1
) B ON A.TAB1_COL1=B.TAB2_COL1
LEFT OUTER JOIN
(SELECT * FROM
(SELECT TAB2_COL1, TAB2_COL2, TAB2_COL4, ROW_NUMBER() OVER (PARTITION BY TAB2_COL1,TAB2_COL2 ORDER BY TAB2_COL3 DESC ) AS ROW_NUM
FROM TABLE2 WHERE TAB2_COL2=5
) WHERE ROW_NUM=1
) C ON A.TAB1_COL1=C.TAB2_COL1 AND A.TAB1_COL2=C.TAB2.COL5
LEFT OUTER JOIN
(SELECT * FROM
(SELECT TAB2_COL1, TAB2_COL2, TAB2_COL4, ROW_NUMBER() OVER (PARTITION BY TAB2_COL1,TAB2_COL2 ORDER BY TAB2_COL3 DESC ) AS ROW_NUM
FROM TABLE2 WHERE TAB2_COL2=8
) WHERE ROW_NUM=1
) D ON A.TAB1_COL1=D.TAB2_COL1
This code works, but I'm left joining with the same table multiple times; in my case, around 25 times. The reference table has around 200 million records, and the partitioning to remove duplicates takes a long time.
Is there a more efficient way to write this so that it runs faster? Kindly help.
Thanks
If I understand correctly, you can use conditional aggregation:
select t1.tab1_col1,
max(case when tab2_col2 = 2 then tab2_col4 end),
max(case when tab2_col2 = 5 then tab2_col4 end),
max(case when tab2_col2 = 8 then tab2_col4 end)
from table1 t1 left join
(select t2.*,
row_number() over (partition by tab2_col1, tab2_col2 order by tab2_col3 desc) as seqnum
from table2 t2
) t2
on t1.tab1_col1 = t2.tab2_col1 and t2.seqnum = 1
group by t1.tab1_col1;
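A small sketch of the conditional-aggregation idea, run on SQLite through Python's sqlite3 module. The data values are invented, and the `val_2`, `val_5`, and `val_8` aliases are hypothetical. The join keeps only rows with seqnum = 1 so that each MAX() sees just the latest row per code, and the `IN (2, 5, 8)` filter is an added assumption to avoid numbering codes that are never used.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (tab1_col1 INTEGER);
    CREATE TABLE table2 (tab2_col1 INTEGER, tab2_col2 INTEGER,
                         tab2_col3 INTEGER, tab2_col4 TEXT);
    INSERT INTO table1 VALUES (1), (2);
    INSERT INTO table2 VALUES
        (1, 2, 1, 'old-2'), (1, 2, 2, 'new-2'),
        (1, 5, 1, 'only-5'),
        (1, 8, 1, 'z-old-8'), (1, 8, 3, 'a-new-8');
""")

# One pass over table2: number rows per (key, code), keep the latest,
# then pivot the codes into columns with MAX(CASE ...).
rows = conn.execute("""
    SELECT t1.tab1_col1,
           MAX(CASE WHEN t2.tab2_col2 = 2 THEN t2.tab2_col4 END) AS val_2,
           MAX(CASE WHEN t2.tab2_col2 = 5 THEN t2.tab2_col4 END) AS val_5,
           MAX(CASE WHEN t2.tab2_col2 = 8 THEN t2.tab2_col4 END) AS val_8
    FROM table1 t1
    LEFT JOIN
         (SELECT t.*,
                 ROW_NUMBER() OVER (PARTITION BY t.tab2_col1, t.tab2_col2
                                    ORDER BY t.tab2_col3 DESC) AS seqnum
          FROM table2 t
          WHERE t.tab2_col2 IN (2, 5, 8)
         ) t2
      ON t1.tab1_col1 = t2.tab2_col1 AND t2.seqnum = 1
    GROUP BY t1.tab1_col1
    ORDER BY t1.tab1_col1
""").fetchall()

for row in rows:
    print(row)
```

Without the seqnum = 1 condition, MAX() would return the alphabetically largest value ('old-2', 'z-old-8') rather than the most recent one.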

SQL: Modifying Inner Join to Select One Row

I have two tables, A and B that I want to inner join on location. However, for each row in A, there are many rows in B whose location matches. I want to end up with at most the same number of rows as in A. Specifically, I want to take the row in B where date is earliest. Here's what I have so far:
SELECT *
FROM A
INNER JOIN B ON A.location = B.location
How would I modify this so that each row in A only gets joined with a single row in B (using the earliest date)?
Attempt:
SELECT *
FROM A
INNER JOIN B ON A.location = B.location
AND B.date = (SELECT MIN(date) FROM B)
Is that the right approach?
You can use the ANSI/ISO standard row_number() function:
SELECT *
FROM A INNER JOIN
(SELECT B.*, ROW_NUMBER() OVER (PARTITION BY B.location ORDER BY B.date) as seqnum
FROM B
) B
ON A.location = B.location AND seqnum = 1;
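To compare the attempt from the question with the row_number() approach, here is a minimal sketch on SQLite via Python's sqlite3 module. The `label` column and all sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE A (location TEXT, label TEXT);
    CREATE TABLE B (location TEXT, date TEXT);
    INSERT INTO A VALUES ('north', 'a1'), ('south', 'a2');
    INSERT INTO B VALUES
        ('north', '2020-01-01'), ('north', '2020-02-01'),
        ('south', '2020-03-01'), ('south', '2020-04-01');
""")

# Attempt from the question: MIN(date) is taken over all of B,
# so only the location holding the overall earliest date matches.
global_min = conn.execute("""
    SELECT ta.label, tb.date
    FROM A AS ta
    INNER JOIN B AS tb
      ON ta.location = tb.location
     AND tb.date = (SELECT MIN(date) FROM B)
    ORDER BY ta.label
""").fetchall()

# Numbering rows per location keeps the earliest B row for each location.
per_location = conn.execute("""
    SELECT ta.label, tb.date
    FROM A AS ta
    INNER JOIN
         (SELECT B.*, ROW_NUMBER() OVER (PARTITION BY B.location ORDER BY B.date) AS seqnum
          FROM B
         ) AS tb
      ON ta.location = tb.location AND tb.seqnum = 1
    ORDER BY ta.label
""").fetchall()

print(global_min)
print(per_location)
```

On this data the attempt returns a row only for `a1`, while the row_number() version returns one row for each row of A.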
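In SQL Server, you can also use TOP (1) inside CROSS APPLY, so that the earliest row in B is picked for each row in A:
SELECT *
FROM A
CROSS APPLY
(SELECT TOP (1) B.* FROM B
WHERE B.location = A.location
ORDER BY B.date) b1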