For this table:
mysql> select * from work;
+------+---------+-------+
| code | surname | name |
+------+---------+-------+
| 1 | John | Smith |
| 2 | John | Smith |
+------+---------+-------+
I'd like to get the pair of code where the names are equal, so I do this:
select distinct A.code, B.code from work A, work B where A.name = B.name group by A.code, B.code;
However, I get the follow result back:
+------+------+
| code | code |
+------+------+
| 1 | 1 |
| 1 | 2 |
| 2 | 1 |
| 2 | 2 |
+------+------+
As you can see, This result has 2 duplicates, obviously from a cartesian product. I'd like to find out how I can do this such that it outputs only:
+------+------+
| code | code |
+------+------+
| 1 | 2 |
+------+------+
Any clue? Thanks!
This should work (assuming code is the primary key):
SELECT A.code, B.code
FROM work A, work B
WHERE A.name = B.name AND A.code < B.code
try this
Select A.Code, B.Code
From work a
Join work b
On A.surname = b.surname
And A.Name = B.Name
And A.Code > B.Code
You need to use A.Code > B.Code rather than != to eliminate dupes of the type
{1, 2} and {2, 1}
(If you only care about when the name is the same and not the surname, eliminate that predicate from the join condition)
Related
Say I have a table like this:
+----+-------+
| id | value |
+----+-------+
| 1 | a |
| 1 | b |
| 2 | c |
| 2 | d |
| 3 | e |
| 3 | f |
+----+-------+
And I want to select all rows with id that are not a, and change their id to a; select all rows with id that are not b, and change the id to b; and select all rows with id that are not c, and change their id to c.
Here is the output I want:
+----+-------+
| id | value |
+----+-------+
| 1 | c |
| 1 | d |
| 1 | e |
| 1 | f |
| 2 | a |
| 2 | b |
| 2 | e |
| 2 | f |
| 3 | a |
| 3 | b |
| 3 | c |
| 3 | d |
+----+-------+
The only solution I can think of is through cross join and distinct:
select distinct a.id, b.value
from table a
cross join table b
where a.id != b.id
Is there any other way to avoid such expensive operation?
I think the typical way to write this is to generate all pairs of id and value and then remove the ones that exist:
select i.id, v.value
from (select distinct id from t) i cross join
(select distinct value from t) v left join
t
on t.id = i.id and t.value = i.value
where t.id is null;
First, I don't think this is what your query does. But this is what you seem to be describing.
From a performance perspective, you might have other sources for i and v that don't require subqueries. If so, use those for performance.
Finally, I don't think you can do much to improve the performance of this, apart from using explicit tables -- and perhaps having appropriate indexes on all the tables.
I have query problem. I have 3 table.
table A
----------------------------
NAME | CODE
----------------------------
bob | PL
david | AA
susan | PL
joe | AB
table B
----------------------------
CODE | DESCRIPTION
----------------------------
PL | code 1
PB | code 2
PC | code 3
table C
----------------------------
CODE | DESCRIPTION
----------------------------
AA | code 4
AB | code 5
AC | code 6
Table B and C have unique row.
the result I need :
----------------------------
NAME | CODE | DESCRIPTION
----------------------------
bob | PL | code 1
david | AA | code 4
susan | PL | code 1
joe | AB | code 5
What I have tried so far
http://sqlfiddle.com/#!9/ffb2eb/9
You are close. I think you just need COALESCE():
select A.*, coalesce(B.DESCRIPTION, C.DESCRIPTION) as description
from A left join
B
on A.CODE = B.CODE left join
C
on A.CODE = C.CODE
order by A.NAME;
I think UNION will make this. And additionally It will also remove duplicates if some will exists.
SELECT A.NAME , UN.CODE ,UN.DESCRIPTION
FROM A,
(SELECT CODE,DESCRIPTION FROM B
UNION
SELECT CODE,DESCRIPTION FROM C ) UN
WHERE A.CODE = UN.CODE;
I am trying to make a query that returns multiple fields, keeping the first 3 as distinct columns and returns values for the last modified date. Some of the variables in the query fields should come from more than one table and one of them has a True/False criterion too. The three 3 distinct fields are needed because the combination of these is associated with the other returning parameters.
The tables look roughly as follows...
Table a:
ID | Sc | Country | TechID | VarA | ... | VarX(T/F) | LastModified
1 | 1 | AA | 1 | x | ... | T | 1-1-2017
2 | 1 | AA | 1 | z | ... | T | 1-1-2017
3 | 1 | AA | 2 | y | ... | T | 1-1-2018
4 | 1 | AB | 1 | u | ... | T | 1-1-2017
5 | 2 | AB | 2 | v | ... | T | 1-1-2018
6 | 3 | AB | 1 | w | ... | F | 1-1-2018
Table b:
TechID | TechName | Categ | Units
1 | Tech1 | Cat1 | M
2 | Tech2 | Cat2 | N
3 | Tech3 | Cat3 | P
The idea is that the query returns something like this (when the T/F criterion is met). Where the combination of Sc-Country-Tech shows up only once, with the last modified having presedence:
Sc' | Country' |TechName'| Units | Cat | VarA... | LastModified |
1 | AA | 1 | ... | ... | ... | 1-1-2018
1 | AB | 2 | ... | ... | ... | 1-1-2017
2 | AB | 1 | ... | ... | ... | 1-1-2018
So far I've tried a few SQL lines to no avail. First, with Select DISTINCT but the option was too "all inclusive".
SELECT DISTINCT a.Sc, a.Country, b.TechName, b.Units, b.Cat, a.VarA,..,a.VarX, Max(a.LastModified) AS MaxOfLastModified
FROM a INNER JOIN (b INNER JOIN a ON b.TechName =
a.TechID) ON b.Cat = a.TechID
GROUP BY a.Sc, a.Country, b.TechName, b.Units, b.Cat, a.VarA,..,a.VarX
HAVING (((a.VarX)=True));
Also tried this but it prompts errors related to aggregate functions:
SELECT a.Sc, a.Country, b.TechID, b.Units, b.Cat, a.VarA,..,a.VarX, Max(a.LastModified) AS MaxOfLastModified
FROM a INNER JOIN (b INNER JOIN a ON b.TechName =
a.TechID) ON b.Cat = a.TechID
GROUP BY a.Sc, a.Country, a.TechID
HAVING (((a.VarX)=True));
Any thoughts/suggestions on how to go about this?? Any pointers to previous related answers are also much appreciated.
Thanks in advance! :)
EDIT (2017.09.29):
This certainly cleared things up a bit!
I managed to get the query going with some of the fields, only when calling fields from a single table with the following:
SELECT a.Sc, a.Country, a.Tech, a.LastModified, a.VarA
FROM a INNER JOIN (SELECT Sc, Country, Tech, max(LastModified) AS lm FROM a GROUP BY Sc, Country, Tech) AS dt ON (dt.lm=a.LastModified) AND (dt.Tech=a.Tech) AND (dt.Country=a.Country) AND (dt.Sc=a.Sc)
GROUP BY a.Sc, a.Country, a.Tech, a.LastModified, a.VarA, a.VarX
HAVING (((a.VarX)=Yes));
I'm still running into a syntax error on JOIN when trying to add fields from a lookup table using the INNER JOIN command as suggested. The code I tried looked something like:
SELECT a.Sc, a.Country, a.Tech, a.LastModified, a.VarA b.TechCategory
FROM a INNER JOIN (SELECT Sc, Country, Tech, max(LastModified) AS lm FROM a GROUP BY Sc, Country, Tech) AS dt ON (dt.lm=a.LastModified) AND (dt.Tech=a.Tech) AND (dt.Country=a.Country) AND (dt.Sc=a.Sc)
INNER JOIN b ON Tech.Category=a.Tech
GROUP BY a.Sc, a.Country, a.Tech, b.TechCategory, a.LastModified, a.VarA, a.VarX
HAVING (((a.VarX)=Yes));
Any additional pointers are much appreciated!
Use an aggregate query to get the maximum date for each combination of Sc, Country, and TechID, then use this as a subquery and join it back to tables a and b to get the data in your final query. Something like this:
select
a.Sc, a.Country, b.TechName,
b.Units, b.Category, b.Units, a.VarA, a.LastModified
from
(Table_a as a
inner join (
select Sc, Country, TechID, max(LastModified) as lm
from Table_a
group by Sc, Country, TechID
) as dt on dt.Sc=a.Sc and dt.Country=a.Country and dt.TechID=a.TechID and dt.lm=a.LastModified)
inner join Table_b as b on b.TechID=a.TechID
I'm trying to filter values from an array. The information, which values should be kept, are in another table.
table_a table_b
___________________ ___________
| id | values | | keyword |
------------------- -----------
| 1 | [a, b, c] | | b |
| 2 | [d, e, f] | | e |
| 3 | [a, g] | | f |
------------------- -----------
I expect the following output:
output
________________________
| id | filtered_values |
------------------------
| 1 | [b] |
| 2 | [e, f] |
| 3 | [] |
------------------------
At the moment, I am using following query:
SELECT
id,
array_intersect(ta.values, tb.filter_keywords) AS filtered_values -- brickhouse UDF
FROM
table_a ta
CROSS JOIN (
SELECT
collect_set(keyword) as filter_keywords
FROM (
SELECT
"dummy" as grouping_dummy,
keyword
FROM
table_b
) tmp
GROUP BY
grouping_dummy
)
table_a has a couple million rows, table_b contains less than 1000 rows.
I guess the cross join is the bottleneck, because it uses only one reducer.
Is there any way to optimize this query?
Thanks!
I have a different assumption.
The reducer is needed in order to generate filter_keywords, not for the CROSS JOIN which is a map side operation.
So no problem here.
My guess is that the performance penalty comes from the use of array_intersect with an array of 1000 elements, therefor the solution would be avoiding it.
P.s.
There is no need for grouping_dummy.
You don't need to use GROUP BY in order to use aggregate functions.
select a.id
,collect_list (case when b.keyword is not null then a.val end) as vals
from (select a.id
,e.val
from table_a a
lateral view outer
explode (a.vals) e as val
) a
left join table_b b
on b.keyword =
a.val
group by a.id
+----+-----------+
| id | vals |
+----+-----------+
| 1 | ["b"] |
| 2 | ["e","f"] |
| 3 | [] |
+----+-----------+
This is probably a standard problem, and I've keyed off some other greatest-n-per-group answers, but so far been unable to resolve my current problem.
A B C
+----+-------+ +----+------+ +----+------+-------+
| id | start | | id | a_id | | id | b_id | name |
+----+-------+ +----+------+ +----+------+-------+
| 1 | 1 | | 1 | 1 | | 1 | 1 | aname |
| 2 | 2 | | 2 | 1 | | 2 | 2 | aname |
+----+-------+ | 3 | 2 | | 3 | 3 | aname |
+----+------+ | 4 | 3 | bname |
+----+------+-------+
In English what I'd like to accomplish is:
For each c.name, select its newest entry based on the start time in a.start
The SQL I've tried is the following:
SELECT a.id, a.start, c.id, c.name
FROM a
INNER JOIN (
SELECT id, MAX(start) as start
FROM a
GROUP BY id
) a2 ON a.id = a2.id AND a.start = a2.start
JOIN b
ON a.id = b.a_id
JOIN c
on b.id = c.b_id
GROUP BY c.name;
It fails with errors such as:
ERROR: column "a.id" must appear in the GROUP BY clause or be used in an aggregate function Position: 8
To be useful I really need the ids from the query, but cannot group on them since they are unique. Here is an example of output I'd love for the first case above:
+------+---------+------+--------+
| a.id | a.start | c.id | c.name |
+------+---------+------+--------+
| 2 | 2 | 3 | aname |
| 2 | 2 | 4 | bname |
+------+---------+------+--------+
Here is a Sqlfiddle
Edit - removed second case
Case 1
select distinct on (c.name)
a.id, a.start, c.id, c.name
from
a
inner join
b on a.id = b.a_id
inner join
c on b.id = c.b_id
order by c.name, a.start desc
;
id | start | id | name
----+-------+----+-------
2 | 2 | 3 | aname
2 | 2 | 4 | bname
Case 2
select distinct on (c.name)
a.id, a.start, c.id, c.name
from
a
inner join
b on a.id = b.a_id
inner join
c on b.id = c.b_id
where
b.a_id in (
select a_id
from b
group by a_id
having count(*) > 1
)
order by c.name, a.start desc
;
id | start | id | name
----+-------+----+-------
1 | 1 | 1 | aname