SQL select all rows that are not equal to an id, and replace the id column with the value - without cross join - sql

Say I have a table like this:
+----+-------+
| id | value |
+----+-------+
| 1 | a |
| 1 | b |
| 2 | c |
| 2 | d |
| 3 | e |
| 3 | f |
+----+-------+
And I want to select all rows whose id is not 1 and change their id to 1; select all rows whose id is not 2 and change their id to 2; and select all rows whose id is not 3 and change their id to 3.
Here is the output I want:
+----+-------+
| id | value |
+----+-------+
| 1 | c |
| 1 | d |
| 1 | e |
| 1 | f |
| 2 | a |
| 2 | b |
| 2 | e |
| 2 | f |
| 3 | a |
| 3 | b |
| 3 | c |
| 3 | d |
+----+-------+
The only solution I can think of is through cross join and distinct:
select distinct a.id, b.value
from table a
cross join table b
where a.id != b.id
Is there any other way that avoids such an expensive operation?

I think the typical way to write this is to generate all pairs of id and value and then remove the ones that exist:
select i.id, v.value
from (select distinct id from t) i cross join
(select distinct value from t) v left join
t
on t.id = i.id and t.value = v.value
where t.id is null;
First, I don't think this is what your query does. But this is what you seem to be describing.
From a performance perspective, you might have other sources for i and v that don't require subqueries. If so, use those for performance.
Finally, I don't think you can do much to improve the performance of this, apart from using explicit tables -- and perhaps having appropriate indexes on all the tables.
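As a quick check, here is a minimal sketch that runs the generate-all-pairs-then-remove-existing query against the question's sample data. It uses Python's built-in sqlite3 module with an in-memory database; the table name t follows the answer, and the choice of SQLite is only an assumption for illustration, not the asker's actual engine.

```python
import sqlite3

# In-memory SQLite database holding the sample table from the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER, value TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?)",
    [(1, "a"), (1, "b"), (2, "c"), (2, "d"), (3, "e"), (3, "f")],
)

# Build every (id, value) pair, then keep only the pairs that do not
# already exist in t (the left join finds no match, so t.id is NULL).
anti_join_rows = conn.execute(
    """
    SELECT i.id, v.value
    FROM (SELECT DISTINCT id FROM t) i
    CROSS JOIN (SELECT DISTINCT value FROM t) v
    LEFT JOIN t ON t.id = i.id AND t.value = v.value
    WHERE t.id IS NULL
    ORDER BY i.id, v.value
    """
).fetchall()

for row in anti_join_rows:
    print(row)
```

On this data the result has the same 12 rows as the desired output in the question.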

Related

How to query all values which share the same value in another field as the value on that row?

Sorry for the awkwardly-phrased title; I couldn't think of another way to describe this problem!
I have a table (let's call it table1) which looks roughly like this:
OFFER | DEAL
------------
A | 1
B | 1
C | 1
D | 2
E | 2
F | 3
I want to write a query which lists all offers and their deal number, along with an additional field showing any offers which share that deal number.
In other words, the results should look like this:
OFFER | DEAL | SHARED
---------------------
A | 1 | B
A | 1 | C
B | 1 | A
B | 1 | C
C | 1 | A
C | 1 | B
D | 2 | E
E | 2 | D
Does anyone know how to do this please?
You can use a self-join:
select t1.*, t2.offer
from t t1 join
t t2
on t1.deal = t2.deal and t1.offer <> t2.offer;
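A minimal sketch of the self-join on the question's sample data, again using Python's sqlite3 module. The table name t follows the answer; the column alias shared and the ORDER BY are added here only to make the output easy to compare with the expected result.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (offer TEXT, deal INTEGER)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?)",
    [("A", 1), ("B", 1), ("C", 1), ("D", 2), ("E", 2), ("F", 3)],
)

# Pair each offer with every other offer that has the same deal number.
shared_rows = conn.execute(
    """
    SELECT t1.offer, t1.deal, t2.offer AS shared
    FROM t t1
    JOIN t t2
      ON t1.deal = t2.deal AND t1.offer <> t2.offer
    ORDER BY t1.offer, t2.offer
    """
).fetchall()

for row in shared_rows:
    print(row)
```

Offer F has no partner in deal 3, so the inner join drops it, which matches the expected output in the question.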

Iterate over the rows of a second table to return resultset with cumulative sum

Yesterday, with the help of an SO user (see "Iterate over the rows of a second table to return resultset"), I was able to build a combination of rows with a self-join.
After some modifications to adapt it to my implementation, I faced a new challenge that I'm stuck on: how do I compute a cumulative sum over a third column?
My issue is better explained in the image below:
Based on the code
SELECT
b1.table_a_id,
b1.label_x,
b2.label_y
FROM table_a a
INNER JOIN table_b b1
ON b1.table_a_id = a.table_a_id
INNER JOIN table_b b2
ON b2.table_a_id = b1.table_a_id AND
b2.label_y > b1.label_x
ORDER BY
b1.table_a_id,
b1.label_x,
b2.label_y;
I was able to acquire the combinations.
What should be the next step to get the cumulative sum based on a third column?
I couldn't think of a solution without using a second tool, such as Python with pandas and its cumsum function.
To generate the expected resultset, you would need to join the table with itself with an inequality condition on the order column. Then, you can do a window sum:
select
t1.table_a_id,
t1.label_x,
t2.label_y,
sum(t2.value) over(
partition by t1.table_a_id, t1.label_x
order by t1."order", t2."order"
) agg_value
from
table_b t1
inner join table_b t2
on t1.table_a_id = t2.table_a_id
and t2."order" >= t1."order"
order by t1."order", t2."order"
Note: order is a reserved word, so it needs to be quoted; if your actual database column has a different name, you can remove the double quotes.
Demo on DB Fiddle:
TABLE_A_ID | LABEL_X | LABEL_Y | AGG_VALUE
---------: | :------ | :------ | --------:
1 | A | B | 1
1 | A | C | 3
1 | A | D | 6
1 | A | E | 10
1 | A | F | 15
1 | B | C | 2
1 | B | D | 5
1 | B | E | 9
1 | B | F | 14
1 | C | D | 3
1 | C | E | 7
1 | C | F | 12
1 | D | E | 4
1 | D | F | 9
1 | E | F | 5
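The self-join plus window sum can be tried out with the sketch below. The rows inserted into table_b are an assumption: they are hypothetical data chosen so that the query reproduces the demo output, not the asker's real table. It runs on Python's sqlite3 module and needs an SQLite build with window-function support (3.25 or later).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table_b; "order" is quoted because it is a reserved word.
conn.execute(
    """
    CREATE TABLE table_b (
        table_a_id INTEGER, label_x TEXT, label_y TEXT,
        value INTEGER, "order" INTEGER
    )
    """
)
conn.executemany(
    "INSERT INTO table_b VALUES (?, ?, ?, ?, ?)",
    [(1, "A", "B", 1, 1), (1, "B", "C", 2, 2), (1, "C", "D", 3, 3),
     (1, "D", "E", 4, 4), (1, "E", "F", 5, 5)],
)

# Join each row to itself and to every later row, then accumulate the
# later rows' values within each starting row (partition).
cumulative_rows = conn.execute(
    """
    SELECT t1.table_a_id, t1.label_x, t2.label_y,
           SUM(t2.value) OVER (
               PARTITION BY t1.table_a_id, t1.label_x
               ORDER BY t2."order"
           ) AS agg_value
    FROM table_b t1
    JOIN table_b t2
      ON t1.table_a_id = t2.table_a_id
     AND t2."order" >= t1."order"
    ORDER BY t1."order", t2."order"
    """
).fetchall()

for row in cumulative_rows:
    print(row)
```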
You seem to want a cumulative sum:
SELECT b1.table_a_id, b1.label_x, b2.label_y,
SUM(b2.value) OVER (PARTITION BY b1.table_a_id, b1.label_x
ORDER BY b2."order"
) as AGG_VALUE

SQL JOIN two table & show all rows for table A

I have a question about JOIN.
TABLE A
PK | div |
------------
A | a |
B | b |
C | c |
TABLE B
PK | div | val |
-------------------------
1 | a | 10 |
2 | a | 100 |
3 | c | 9 |
4 | c | 99 |
There are two tables something like above, and I have been trying to join two tables but I want to see all rows from TABLE A.
Something like
SELECT T1.PK, T1.div, T2.val
FROM A T1
LEFT OUTER JOIN B T2
ON T1.div = T2.div
and I want the result to look like this:
PK | div | val |
-------------------------
A | a | 10 |
A | a | 100 |
B | null | null |
C | c | 9 |
C | c | 99 |
I have tried all the JOINs I know, but B doesn't appear because it doesn't exist. Is it possible to show all rows of TABLE A and just show null where there is no match in TABLE B?
Thanks in advance!
If you change your query to
SELECT T1.PK, T2.div, T2.val
FROM A T1
LEFT OUTER JOIN B T2
ON T1.div = T2.div
(note that div comes from T2 here), you'll get exactly the result posted, though possibly in a different order; add an ORDER BY clause if you want a specific order.
Your query as it stands will get you:
PK | div | val |
-------------------------
A | a | 10 |
A | a | 100 |
B | b | null |
C | c | 9 |
C | c | 99 |
(Note that div is b for the row with the PK of B, not null.)
To get your result set, all you need to do is select T2.div, since that is the value that does not exist in the second table:
SELECT T1.PK, T2.div, T2.val
FROM A T1
LEFT OUTER JOIN B T2
ON T1.div = T2.div
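To see the difference between T1.div and T2.div side by side, here is a minimal sketch on the question's sample data using Python's sqlite3 module. The column aliases t1_div and t2_div and the ORDER BY are assumptions added for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE A (PK TEXT, div TEXT)")
conn.execute("CREATE TABLE B (PK INTEGER, div TEXT, val INTEGER)")
conn.executemany("INSERT INTO A VALUES (?, ?)",
                 [("A", "a"), ("B", "b"), ("C", "c")])
conn.executemany("INSERT INTO B VALUES (?, ?, ?)",
                 [(1, "a", 10), (2, "a", 100), (3, "c", 9), (4, "c", 99)])

# The left join keeps every row of A; columns taken from B are NULL
# (None in Python) when no row of B matches.
left_join_rows = conn.execute(
    """
    SELECT T1.PK, T1.div AS t1_div, T2.div AS t2_div, T2.val
    FROM A T1
    LEFT OUTER JOIN B T2
      ON T1.div = T2.div
    ORDER BY T1.PK, T2.val
    """
).fetchall()

for row in left_join_rows:
    print(row)
```

For the row with PK B, t1_div is 'b' while t2_div and val are None, which is why selecting T2.div gives the null shown in the desired result.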

Filter array depending on other table

I'm trying to filter values from an array. The information about which values should be kept is in another table.
table_a table_b
___________________ ___________
| id | values | | keyword |
------------------- -----------
| 1 | [a, b, c] | | b |
| 2 | [d, e, f] | | e |
| 3 | [a, g] | | f |
------------------- -----------
I expect the following output:
output
________________________
| id | filtered_values |
------------------------
| 1 | [b] |
| 2 | [e, f] |
| 3 | [] |
------------------------
At the moment, I am using following query:
SELECT
id,
array_intersect(ta.values, tb.filter_keywords) AS filtered_values -- brickhouse UDF
FROM
table_a ta
CROSS JOIN (
SELECT
collect_set(keyword) as filter_keywords
FROM (
SELECT
"dummy" as grouping_dummy,
keyword
FROM
table_b
) tmp
GROUP BY
grouping_dummy
) tb
table_a has a couple million rows, table_b contains less than 1000 rows.
I guess the cross join is the bottleneck, because it uses only one reducer.
Is there any way to optimize this query?
Thanks!
I have a different assumption.
The reducer is needed in order to generate filter_keywords, not for the CROSS JOIN which is a map side operation.
So no problem here.
My guess is that the performance penalty comes from the use of array_intersect with an array of 1000 elements; therefore, the solution would be to avoid it.
P.S.
There is no need for grouping_dummy.
You don't need to use GROUP BY in order to use aggregate functions.
select a.id
,collect_list (case when b.keyword is not null then a.val end) as vals
from (select a.id
,e.val
from table_a a
lateral view outer
explode (a.vals) e as val
) a
left join table_b b
on b.keyword =
a.val
group by a.id
+----+-----------+
| id | vals |
+----+-----------+
| 1 | ["b"] |
| 2 | ["e","f"] |
| 3 | [] |
+----+-----------+
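The query above explodes each array into one row per element, left-joins those elements against the keywords, and collects the matches back per id. The following is a plain Python sketch of that logic only, not Hive code; the dictionary and set names are assumptions for illustration.

```python
# Sample data from the question: id -> array of values, plus the keywords.
table_a = {1: ["a", "b", "c"], 2: ["d", "e", "f"], 3: ["a", "g"]}
keywords = {"b", "e", "f"}

# For every id, keep only the elements that appear among the keywords.
# An id with no matching element keeps an empty list, just as the
# left join plus collect_list does in the query.
filtered_values = {
    row_id: [value for value in values if value in keywords]
    for row_id, values in table_a.items()
}

print(filtered_values)
```

Each element is tested against the keyword set individually, which mirrors the per-element equality join rather than an array-to-array intersection.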

Access Queries comparing two tables

I have two tables in Access, Table A and Table B:
Table A (MasterLockInsNew):
+----+-------+----------+
| ID | Value | Date |
+----+-------+----------+
| 1 | 123 | 12/02/13 |
| 2 | 1231 | 11/02/13 |
| 4 | 1265 | 16/02/13 |
+----+-------+----------+
Table B (InitialPolData):
+----+-------+----------+------+
| ID | Value | Date | Type |
+----+-------+----------+------+
| 1 | 123 | 12/02/13 | x |
| 2 | 1231 | 11/02/13 | x |
| 3 | 1238 | 10/02/13 | y |
| 4 | 1265 | 16/02/13 | a |
| 7 | 7649 | 18/02/13 | z |
+----+-------+----------+------+
All I want are the rows from table B for IDs not contained in A. My current code looks like this:
SELECT Distinct InitialPolData.*
FROM InitialPolData
WHERE InitialPolData.ID NOT IN (SELECT Distinct InitialPolData.ID
from InitialPolData INNER JOIN
MasterLockInsNew
ON InitialPolData.ID=MasterLockInsNew.ID);
But whenever I run this in Access it crashes!! The tables are fairly large but I don't think this is the reason.
Can anyone help?
Thanks
Or try a left outer join:
SELECT b.*
FROM InitialPolData b left outer join
MasterLockInsNew a on
b.id = a.id
where
a.id is null
A simple subquery will do.
select * from InitialPolData
where id not in (
select id from MasterLockInsNew
);
Try using NOT EXISTS:
SELECT Distinct i.*
FROM InitialPolData AS i
WHERE NOT EXISTS (SELECT 1
FROM MasterLockInsNew AS m
WHERE m.ID = i.ID)
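The three suggestions (left outer join, NOT IN, NOT EXISTS) can be compared on the question's sample data with the sketch below. It runs on SQLite through Python's sqlite3 module, which is an assumption for illustration; behaviour and performance in Access may differ. Only the ID column is selected to keep the comparison short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE MasterLockInsNew (ID INTEGER, Value INTEGER, Date TEXT)")
conn.execute("CREATE TABLE InitialPolData (ID INTEGER, Value INTEGER, Date TEXT, Type TEXT)")
conn.executemany(
    "INSERT INTO MasterLockInsNew VALUES (?, ?, ?)",
    [(1, 123, "12/02/13"), (2, 1231, "11/02/13"), (4, 1265, "16/02/13")],
)
conn.executemany(
    "INSERT INTO InitialPolData VALUES (?, ?, ?, ?)",
    [(1, 123, "12/02/13", "x"), (2, 1231, "11/02/13", "x"),
     (3, 1238, "10/02/13", "y"), (4, 1265, "16/02/13", "a"),
     (7, 7649, "18/02/13", "z")],
)

# One query per suggested approach; each returns the IDs of
# InitialPolData rows that have no match in MasterLockInsNew.
queries = {
    "left_join": """
        SELECT b.ID FROM InitialPolData b
        LEFT OUTER JOIN MasterLockInsNew a ON b.ID = a.ID
        WHERE a.ID IS NULL ORDER BY b.ID""",
    "not_in": """
        SELECT ID FROM InitialPolData
        WHERE ID NOT IN (SELECT ID FROM MasterLockInsNew) ORDER BY ID""",
    "not_exists": """
        SELECT i.ID FROM InitialPolData AS i
        WHERE NOT EXISTS (SELECT 1 FROM MasterLockInsNew AS m
                          WHERE m.ID = i.ID) ORDER BY i.ID""",
}
results = {name: [row[0] for row in conn.execute(sql)]
           for name, sql in queries.items()}

for name, ids in results.items():
    print(name, ids)
```

On this sample all three approaches return IDs 3 and 7.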