I have 2 tables that contain IDs. There will be duplicate IDs in one of the tables and I only want to return one row for each matching ID in table B. For example:
Table A
+-----------+-----------+
| objectIdA | objectIdB |
+-----------+-----------+
| 1 | A |
| 1 | B |
| 1 | D |
| 5 | F |
+-----------+-----------+
Table B
+-----------+
| objectIdA |
+-----------+
| 1 |
| 5 |
+-----------+
Would return:
+-----------+-----------+
| objectIdA | objectIdB |
+-----------+-----------+
| 1 | D |
| 5 | F |
+-----------+-----------+
I only need one entry from Table A that matches Table B. It doesn't matter which row of table A is returned.
I'm using SQL Server.
Thanks.
;WITH CTE
AS (
SELECT B.objectIdA
,A.objectIdB
,ROW_NUMBER() OVER (PARTITION BY B.objectIdA ORDER BY A.objectIdB DESC) rn
FROM TableA A
INNER JOIN TableB B ON A.objectIdA = B.objectIdA
)
SELECT C.objectIdA
,C.objectIdB
FROM CTE C
WHERE rn = 1
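If you want to try the ROW_NUMBER() idea without a SQL Server instance, here is a minimal sketch using Python's built-in sqlite3 module. That is my assumption for convenience only: the question is about SQL Server, but this window-function syntax is also accepted by SQLite 3.25 and later. The table names follow the answer above and the data is the sample from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE TableA (objectIdA INTEGER, objectIdB TEXT);
    CREATE TABLE TableB (objectIdA INTEGER);
    INSERT INTO TableA VALUES (1, 'A'), (1, 'B'), (1, 'D'), (5, 'F');
    INSERT INTO TableB VALUES (1), (5);
""")

# Number the TableA rows inside each objectIdA group, then keep row 1 only.
query = """
    WITH CTE AS (
        SELECT B.objectIdA,
               A.objectIdB,
               ROW_NUMBER() OVER (PARTITION BY B.objectIdA
                                  ORDER BY A.objectIdB DESC) AS rn
        FROM TableA A
        INNER JOIN TableB B ON A.objectIdA = B.objectIdA
    )
    SELECT objectIdA, objectIdB
    FROM CTE
    WHERE rn = 1
    ORDER BY objectIdA
"""
rows_one_per_id = conn.execute(query).fetchall()
print(rows_one_per_id)  # [(1, 'D'), (5, 'F')]
```

Since it doesn't matter which row is returned, the ORDER BY inside OVER() can be any column; DESC on objectIdB just happens to reproduce the D row shown in the question.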
You can do this by using a subselect on table a to get one entry per objectIdA group:
select b.*,a.[objectIdB]
from b
join
(select [objectIdA], max([objectIdB]) [objectIdB]
from a group by [objectIdA]
) a
on(b.[objectIdA] = a.[objectIdA])
Fiddle Demo
Edit from comments: to get a whole row from table a, you can use a self join on table a:
select b.*,a.*
from b
join a
on(b.[objectIdA] = a.[objectIdA])
join (select [objectIdA], max([objectIdB]) [objectIdB]
from a group by [objectIdA]) a1
on(a.[objectIdA] = a1.[objectIdA]
and
a.[objectIdB] = a1.[objectIdB])
Fiddle Demo 2
SELECT MAX(b.ID) AS ID
,MAX(Value) AS Value
,MAX(OtherCol1) AS OtherCol1
,MAX(OtherCol2) AS OtherCol2
,MAX(OtherCol3) AS OtherCol3
FROM TblA AS a
INNER JOIN TblB AS b ON a.TblBID = b.ID
GROUP BY TblBID
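One thing to be aware of with the query above: each MAX() is evaluated on its own, so the single output row can combine values that come from different source rows. A small sketch in Python's sqlite3 shows the effect. The data is made up for illustration and the column types are my assumption; only the TblA, TblB, TblBID, Value and OtherCol1 names come from the query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE TblB (ID INTEGER);
    CREATE TABLE TblA (TblBID INTEGER, Value TEXT, OtherCol1 INTEGER);
    INSERT INTO TblB VALUES (1);
    -- Hypothetical data: two TblA rows pointing at the same TblB row.
    INSERT INTO TblA VALUES (1, 'x', 20), (1, 'y', 10);
""")

mixed_rows = conn.execute("""
    SELECT MAX(b.ID) AS ID, MAX(Value) AS Value, MAX(OtherCol1) AS OtherCol1
    FROM TblA AS a
    INNER JOIN TblB AS b ON a.TblBID = b.ID
    GROUP BY TblBID
""").fetchall()
print(mixed_rows)  # [(1, 'y', 20)]: 'y' is from one row, 20 from the other
```

If the columns need to come from one and the same row, a ROW_NUMBER() based query keeps them together.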
[Screenshots omitted: Table A, Table B, Table A data, Table B data, query result]
You can use ROW_NUMBER() OVER (PARTITION BY ...) to achieve this result.
SELECT
t.objectIdA,
t.objectIdB
FROM (
SELECT
a.objectIdA,
a.objectIdB,
rowid = ROW_NUMBER() OVER (PARTITION BY a.objectIdA ORDER BY a.objectIdB DESC)
FROM TableA a
INNER JOIN TableB b ON (a.objectIdA = b.objectIdA)
) t
WHERE rowid <= 1
Fiddle Code: http://sqlfiddle.com/#!3/a2ccd/1
I've got myself into a bit of a mess trying to link two tables together based on several pieces of information.
I want to link one table to the other based on these rules (in this order of priority):
the main link is where orderid matches between the two tables;
only records from table 2 where valid = y;
from those valid records, the ones with the highest seqn1, and from those, the one with the highest seqn2.
table1
orderid | date | otherinfo
223344 | 22/10/2020 | okokkokokooeodijjf
table2
orderid | seqn1 | seqn2 | valid | additonaldata
223344 | 1 | 3 | y | sdfsfsf
223344 | 2 | 1 | y | sffferfr
223344 | 2 | 2 | y | sfrfrefr -- This row
223344 | 2 | 3 | n | rfrg66rr
223344 | 2 | 4 | n | adwere
223344 | 3 | 4 | n | adwere
so would want the final record to be
orderid | date | otherinfo | seqn1 | seqn2 | valid | additonaldata
223344 | 22/10/2020 | okokkokokooeodijjf | 2 | 2 | y | sfrfrefr
I started off with the code below, but I'm not sure I'm doing it right, and I can't get it to take the valid flag into account when I try to add it.
SELECT * FROM table1
left JOIN table2
ON table1.orderid = table2.orderid
AND table2.seqn1 = (SELECT MAX(table2.seqn1) FROM table2 WHERE table1.orderid = table2.orderid)
AND table2.seqn2 = (SELECT MAX(table2.seqn2) FROM table2 WHERE table1.orderid = table2.orderid
AND table2.seqn1 = (SELECT MAX(table2.seqn1) FROM table2 WHERE table1.orderid = table2.orderid))
Could someone help me amend the code please.
Use the row_number analytic function with partition by orderid and order by the sequence columns in the order you need. There is no need for multiple subselects. To add more criteria for picking the single row, use CASE to map your values to numbers and order by those as well.
Fiddle here.
with l as (
select *,
row_number() over(partition by orderid order by seqn1 desc, seqn2 desc) as rn
from line
where valid = 'y'
)
select *
from header as h
join l
on h.orderid = l.orderid
and l.rn = 1
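To check the idea against the sample data from the question, here is a small sketch in Python's sqlite3 (my choice for a runnable example; the question does not say which database is used). It uses the question's table1 and table2 names instead of header and line, and filters on valid before numbering the rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (orderid INTEGER, date TEXT, otherinfo TEXT);
    CREATE TABLE table2 (orderid INTEGER, seqn1 INTEGER, seqn2 INTEGER,
                         valid TEXT, additonaldata TEXT);
    INSERT INTO table1 VALUES (223344, '22/10/2020', 'okokkokokooeodijjf');
    INSERT INTO table2 VALUES
        (223344, 1, 3, 'y', 'sdfsfsf'),
        (223344, 2, 1, 'y', 'sffferfr'),
        (223344, 2, 2, 'y', 'sfrfrefr'),
        (223344, 2, 3, 'n', 'rfrg66rr'),
        (223344, 2, 4, 'n', 'adwere'),
        (223344, 3, 4, 'n', 'adwere');
""")

best_rows = conn.execute("""
    WITH l AS (
        SELECT *,
               ROW_NUMBER() OVER (PARTITION BY orderid
                                  ORDER BY seqn1 DESC, seqn2 DESC) AS rn
        FROM table2
        WHERE valid = 'y'   -- drop invalid rows before numbering
    )
    SELECT h.orderid, h.date, h.otherinfo,
           l.seqn1, l.seqn2, l.valid, l.additonaldata
    FROM table1 AS h
    JOIN l ON h.orderid = l.orderid AND l.rn = 1
""").fetchall()
print(best_rows)
```

This returns the single row with seqn1 = 2 and seqn2 = 2, which is the one marked "This row" in the question.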
How about something like this:
;
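with cte_seqn1 as
(
-- highest seqn1 among the valid rows of each order
SELECT orderid
,MAX(seqn1) as seqn1
FROM table2
where valid = 'y'
group by orderid
),
cte_table2 as
(
-- highest seqn2 among the valid rows that have that seqn1
SELECT t.orderid
,t.seqn1
,MAX(t.seqn2) as seqn2
FROM table2 t
inner join cte_seqn1 s on s.orderid = t.orderid and s.seqn1 = t.seqn1
where t.valid = 'y'
group by t.orderid, t.seqn1
)
SELECT
t1.*
,t3.seqn1, t3.seqn2, t3.valid, t3.additonaldata
--,t3.[OtherFields]
from table1 t1
inner join cte_table2 t2 on t1.orderid = t2.orderid -- first match on id
left join table2 t3 on t3.orderid = t2.orderid and t3.seqn1 = t2.seqn1 and t3.seqn2 = t2.seqn2 and t3.valid = 'y'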
So I have a reference table (table A) like this:
| cust_id | prod |
| 1 | A, B |
| 2 | C, D, E|
This reference table will be joined to a transaction-history table (table B):
| trx_id | cust_id | prod | amount
| 1 | 1 | A | 10
| 2 | 1 | B | 5
| 3 | 1 | C | 1
| 4 | 1 | D | 6
I want to get the sum of the amount column in table B, but only for the products listed for that customer in table A.
I tried something like this, but it doesn't work:
SELECT A.cust_id
, SUM(B.amount) AS amount
FROM A
INNER JOIN B ON A.cust_id = B.cust_id
AND B.prod IN(A.prod)
GROUP BY 1
Hmmm . . . Try splitting the prod and join on that:
SELECT A.cust_id, SUM(B.amount) AS amount
FROM A CROSS JOIN
UNNEST(SPLIT(a.prod, ', ')) p JOIN
B
ON A.cust_id = B.cust_id AND B.prod = p
GROUP BY 1;
Note: Storing multiple values in a string is a really bad idea. You can use a separate junction table (one row per customer and product) or use an array.
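To illustrate the junction-table suggestion, here is a minimal sketch using Python's sqlite3 (not BigQuery, purely so it can be run anywhere). The cust_prod table name is hypothetical; it holds the same information as table A from the question, with one row per customer and product.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical junction table: one row per (customer, product).
    CREATE TABLE cust_prod (cust_id INTEGER, prod TEXT);
    CREATE TABLE B (trx_id INTEGER, cust_id INTEGER, prod TEXT, amount INTEGER);
    INSERT INTO cust_prod VALUES (1, 'A'), (1, 'B'), (2, 'C'), (2, 'D'), (2, 'E');
    INSERT INTO B VALUES (1, 1, 'A', 10), (2, 1, 'B', 5),
                         (3, 1, 'C', 1), (4, 1, 'D', 6);
""")

# With one product per row, the match becomes a plain equality join.
totals = conn.execute("""
    SELECT cp.cust_id, SUM(B.amount) AS amount
    FROM cust_prod AS cp
    INNER JOIN B ON B.cust_id = cp.cust_id AND B.prod = cp.prod
    GROUP BY cp.cust_id
""").fetchall()
print(totals)  # [(1, 15)]
```

Customer 1 gets 10 + 5 = 15, because only products A and B are on that customer's list. Customer 2 has no transactions in the sample data, so the inner join returns no row for it.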
Below is for BigQuery Standard SQL
#standardSQL
select cust_id,
sum(if(b.prod in unnest(split(a.prod, ', ')), amount, 0)) as amount
from `project.dataset.tableB` b
join `project.dataset.tableA` a
using(cust_id)
group by cust_id
Also, note: for BigQuery - in general - storing multiple values in a string is a really good idea. :o)
See Denormalize data whenever possible for more on this
Let me ask you something I've been thinking about for a while. Imagine that you have two tables with data:
MAIN TABLE (A)
| ID | Date |
|:-----------|------------:|
| 1 | 01-01-1990|
| 2 | 01-01-1991|
| 3 | 01-01-1992|
| 4 | 01-01-2000|
| 5 | 01-01-2001|
| 6 | 01-01-2003|
SECONDARY TABLE (B)
| ID | Date | TOTAL |
|:-----------|------------:|--------:|
| 1 | 01-01-1990| 1 |
| 2 | 01-01-1991| 2 |
| 3 | 01-01-1992| 1 |
| 4 | 01-01-2000| 5 |
| 5 | 01-01-2001| 8 |
| 6 | 01-01-2003| 7 |
and you want to select only ID with date greater than 31-12-1999 and get the following columns: ID, Date and Total. For that we have many options but my question would be which of the following would be better in terms of performance:
OPTION 1
With main as(
select id,
date
from A
where date > '31-12-1999'
)
select main.id,
main.date,
B.total
from main inner join B on main.id = b.id
OPTION 2
With main as(
select id,
date
from A
where date > '31-12-1999'
),
secondary as (
select id,
total
from B
where date > '31-12-1999'
)
select main.id,
main.date,
secondary.total
from main inner join secondary on main.id = secondary.id
Which of the two queries would be better in terms of performance? Thanks in advance!
Note: the date means the same thing in both tables.
You don't need a CTE; you can join the two tables directly:
select A.id,
A.date,
B.total
from A inner join B on A.id = b.id
where A.date > '31-12-1999'
You would need to test on your data. But there is really no need for CTEs:
select a.id, a.date, b.total
from a inner join
b
on a.id = b.id
where a.date > '1999-12-31' and b.date > '1999-12-31';
As for your specific question, the two queries are not the same, because the first is filtering on only one date and the second is filtering on two dates. You should run the query that implements the logic that you intend.
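A quick way to convince yourself that the CTE form and the direct join express the same logic is to run both on the sample data and compare the results. The sketch below does that with Python's sqlite3 (an assumption; the question doesn't name a database). The CTE is renamed main_rows here, and the dates are stored in ISO format (YYYY-MM-DD) so that the string comparison orders them correctly. It says nothing about speed, which you would still have to measure on your own engine and data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE A (id INTEGER, date TEXT);
    CREATE TABLE B (id INTEGER, date TEXT, total INTEGER);
    INSERT INTO A VALUES (1, '1990-01-01'), (2, '1991-01-01'), (3, '1992-01-01'),
                         (4, '2000-01-01'), (5, '2001-01-01'), (6, '2003-01-01');
    INSERT INTO B VALUES (1, '1990-01-01', 1), (2, '1991-01-01', 2),
                         (3, '1992-01-01', 1), (4, '2000-01-01', 5),
                         (5, '2001-01-01', 8), (6, '2003-01-01', 7);
""")

# Option 1 from the question: filter A inside a CTE, then join.
with_cte = conn.execute("""
    WITH main_rows AS (SELECT id, date FROM A WHERE date > '1999-12-31')
    SELECT main_rows.id, main_rows.date, B.total
    FROM main_rows INNER JOIN B ON main_rows.id = B.id
    ORDER BY main_rows.id
""").fetchall()

# The direct join with the filter in the WHERE clause.
direct = conn.execute("""
    SELECT A.id, A.date, B.total
    FROM A INNER JOIN B ON A.id = B.id
    WHERE A.date > '1999-12-31'
    ORDER BY A.id
""").fetchall()

print(with_cte == direct)  # True
print(direct)
```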
I have two tables:
Table A
ID | name
---+----------
1 | example
2 | example2
Table B (created field is timestamptz)
ID | id_table_a | dek | created
---+------------+------+---------------------
1 | 1 | deka | 2019-10-21 10:00:00
2 | 2 | dekb | 2019-10-21 11:00:00
3 | 1 | dekc | 2019-10-21 09:00:00
4 | 2 | dekd | 2019-10-21 09:40:00
5 | 1 | deke | 2019-10-21 09:21:00
I need to get the records from Table A, and each record should have the latest dek from Table B based on created.
How can I do that?
I would use a lateral join; very often this is faster than using a select max():
select a.*, tb.dek
from table_a a
join lateral (
select id, id_table_a, dek, created
from table_b b
where b.id_table_a = a.id
order by created desc
limit 1
) tb on true;
Another alternative is to use distinct on:
select a.*, tb.dek
from table_a a
join (
select distinct on (id_table_a) id, id_table_a, dek, created
from table_b
order by id_table_a, created desc
) tb on tb.id_table_a = a.id;
It depends on your data distribution which one is faster.
With a CTE returning the joined tables and NOT EXISTS:
with cte as (
select a.id, a.name, b.dek, b.created
from tablea a inner join tableb b
on b.id_table_a = a.id
)
select t.* from cte t
where not exists (
select 1 from cte
where id = t.id and created > t.created
)
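LATERAL and DISTINCT ON are PostgreSQL features, but the NOT EXISTS idea is portable, so it is easy to try on the sample data. Below is a minimal sketch in Python's sqlite3 (my choice for a self-contained example, not what the question uses). It inlines the join instead of using a CTE, and the newer alias is just an illustrative name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table_a (id INTEGER, name TEXT);
    CREATE TABLE table_b (id INTEGER, id_table_a INTEGER, dek TEXT, created TEXT);
    INSERT INTO table_a VALUES (1, 'example'), (2, 'example2');
    INSERT INTO table_b VALUES
        (1, 1, 'deka', '2019-10-21 10:00:00'),
        (2, 2, 'dekb', '2019-10-21 11:00:00'),
        (3, 1, 'dekc', '2019-10-21 09:00:00'),
        (4, 2, 'dekd', '2019-10-21 09:40:00'),
        (5, 1, 'deke', '2019-10-21 09:21:00');
""")

# Keep a table_b row only if no newer row exists for the same table_a id.
latest = conn.execute("""
    SELECT a.id, a.name, b.dek
    FROM table_a AS a
    INNER JOIN table_b AS b ON b.id_table_a = a.id
    WHERE NOT EXISTS (
        SELECT 1
        FROM table_b AS newer
        WHERE newer.id_table_a = b.id_table_a
          AND newer.created > b.created
    )
    ORDER BY a.id
""").fetchall()
print(latest)  # [(1, 'example', 'deka'), (2, 'example2', 'dekb')]
```

Note that if two rows for the same id share the same latest created value, this form returns both of them.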
I have a table with descriptions of something. For example:
My_Table
id description
================
1 ABC
2 ABB
3 OPAC
4 APEЧ
I need to get all unique symbols from the "description" column across all rows.
The result should look like this:
symbol
================
A
B
C
O
P
E
Ч
It should work for all languages, so, as far as I can see, regular expressions can't help.
Please help me. Thanks.
with cte (c,description_suffix) as
(
select substr(description,1,1)
,substr(description,2)
from mytable
where description is not null
union all
select substr(description_suffix,1,1)
,substr(description_suffix,2)
from cte
where description_suffix is not null
)
select c
,count(*) as cnt
from cte
group by c
order by c
or
with cte(n) as
(
select level
from dual
connect by level <= (select max(length(description)) from mytable)
)
select substr(t.description,c.n,1) as c
,count(*) as cnt
from mytable t
join cte c
on c.n <= length(description)
group by substr(t.description,c.n,1)
order by c
+---+-----+
| C | CNT |
+---+-----+
| A | 4 |
| B | 3 |
| C | 2 |
| E | 1 |
| O | 1 |
| P | 2 |
| Ч | 1 |
+---+-----+
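The first (recursive) variant can be tried outside Oracle as well. Here is a rough sketch in Python's sqlite3, under two assumptions of mine: SQLite needs the RECURSIVE keyword, and its substr() returns an empty string rather than NULL once the suffix runs out, so the stop condition compares against '' instead of using IS NOT NULL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE mytable (id INTEGER, description TEXT);
    INSERT INTO mytable VALUES (1, 'ABC'), (2, 'ABB'), (3, 'OPAC'), (4, 'APEЧ');
""")

# Peel off the first character on each step and recurse on the rest.
char_counts = conn.execute("""
    WITH RECURSIVE cte (c, description_suffix) AS (
        SELECT substr(description, 1, 1), substr(description, 2)
        FROM mytable
        WHERE description IS NOT NULL AND description <> ''
        UNION ALL
        SELECT substr(description_suffix, 1, 1), substr(description_suffix, 2)
        FROM cte
        WHERE description_suffix <> ''   -- SQLite yields '' here, not NULL
    )
    SELECT c, COUNT(*) AS cnt
    FROM cte
    GROUP BY c
    ORDER BY c
""").fetchall()
print(char_counts)
```

substr() works on characters rather than bytes for text values, so the Cyrillic Ч is counted as one symbol, which matches the result table above.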
Create a numbers table and populate it with all the ids you'd need (in this case 1..max length of the string), then take one character per position:
SELECT DISTINCT
SUBSTRING(your_table.description, numbers.id, 1) AS symbol
FROM
your_table
INNER JOIN
numbers
ON numbers.id >= 1
AND numbers.id <= CHAR_LENGTH(your_table.description)
SELECT DISTINCT(SUBSTR(ll,LEVEL,1)) OP --Here DISTINCT(SUBSTR(ll,LEVEL,1)) returns each distinct character on its own row, as required
FROM
(
SELECT LISTAGG(DES,'')
WITHIN GROUP (ORDER BY ID) ll
FROM My_Table --Here listagg is used to convert all values under description(des) column into a single value and there is no space in between
)
CONNECT BY LEVEL <= LENGTH(ll);