How to combine two columns on the same table using Hive

Right now I have:
Scorecard  team1        team2      Winner       Margin     Ground        Match Date  Year
ODI # 1    Australia    England    Australia    5 wickets  Melbourne     5-Jan-71    1971
ODI # 2    England      Australia  England      6 wickets  Manchester    24-Aug-72   1972
ODI # 3    England      Australia  Australia    5 wickets  Lord's        26-Aug-72   1972
ODI # 4    England      Australia  England      2 wickets  Birmingham    28-Aug-72   1972
ODI # 5    New Zealand  Pakistan   New Zealand  22 runs    Christchurch  11-Feb-73   1973
What I want is to combine team1 and team2 and then get a distinct list.
Example based on the data above:
teams
Australia
England
New Zealand
Pakistan
I am using Cloudera Hive. I was trying to get a union to work.
I also tried:
SELECT concat_ws('^',(SPLIT('${team1,team2}',',')));
However, the output is just:
${team1^team2}

The easiest way is to use a union:
select team1 as teams from tablename
union distinct
select team2 from tablename
Here is another way, using a subquery:
select distinct teams from (
  select team1 as teams from tablename
  union
  select team2 from tablename
) t
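The union approach can be tried out locally with a minimal sketch. It assumes SQLite (through Python's built-in sqlite3 module) as a stand-in for Hive, so it uses plain UNION rather than UNION DISTINCT, and it loads only the team1 and team2 columns from the sample rows in the question.

```python
import sqlite3

# Assumption: SQLite stands in for Hive so the sketch runs anywhere.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tablename (team1 TEXT, team2 TEXT)")
conn.executemany(
    "INSERT INTO tablename VALUES (?, ?)",
    [
        ("Australia", "England"),
        ("England", "Australia"),
        ("England", "Australia"),
        ("England", "Australia"),
        ("New Zealand", "Pakistan"),
    ],
)

# UNION (without ALL) removes duplicates across both columns.
teams = [
    row[0]
    for row in conn.execute(
        """
        SELECT team1 AS teams FROM tablename
        UNION
        SELECT team2 FROM tablename
        ORDER BY teams
        """
    )
]
print(teams)  # ['Australia', 'England', 'New Zealand', 'Pakistan']
```

The ORDER BY is only there to make the output order predictable; the union alone is what produces the distinct list.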

Related

Constructing the SQL query below

GOALS (~1700 rows)
YEAR  COUNTRY    NAME               NUM_GOALS
-------------------------------------------
2018  England    Harry Kane         6
2018  France     Antoine Griezmann  4
2014  Argentina  Lionel Messi       4
2014  Brazil     Fred               1
2010  Germany    Thomas Muller      5
2010  Japan      Shinji Okazaki     1
1992  England    Gary Linekar       6
CHAMPIONS (~500 rows)
YEAR  COUNTRY  NAME              ROLE
-------------------------------------------------
2018  France   Didier Deschamps  Manager
2018  France   Hugo Lloris       Goalkeeper
2018  France   Paul Pogba        Midfielder
2014  Germany  Joachim Loew      Manager
2014  Germany  Mesut Ozil        Midfielder
2014  Germany  Miroslav Klose    Forward
2002  Brazil   Da Silva          Midfielder
1994  Brazil   Da Silva          Midfielder
1998  France   Didier Deschamps  Midfielder
Write a query showing all world cup winning players who have never scored a goal.
What I am unsure about is whether to use a join for this, and whether I need to specify any IDs if a join is used.
I'd be grateful for extra clarification and help with this, or if my query needs any tweaking.
What I have tried:
This is what I came up with:
SELECT GOALS.NAME
FROM GOALS
INNER JOIN CHAMPIONS ON CHAMPIONS.COUNTRY = GOALS.NAME
WHERE GOALS.NUM_GOALS = 0;

Problems with your query:
The join condition does not look right: it compares CHAMPIONS.COUNTRY with GOALS.NAME.
Even if it were right, the query searches for players who had at least one World Cup without scoring a goal, which is different from players who never scored a goal.
You could use not exists:
select c.*
from champions c
where not exists (
select 1
from goals g
where g.country = c.country and g.name = c.name and g.num_goals > 0
)
This assumes that (country, name) tuples uniquely identify a player.
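A minimal sketch of the NOT EXISTS query, assuming SQLite (via Python's sqlite3 module) and a handful of rows. Rows marked as hypothetical are not in the question's sample data and were added only so that every case is covered: no goals row at all, a row with zero goals, and a row with goals.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE goals (year INT, country TEXT, name TEXT, num_goals INT);
    CREATE TABLE champions (year INT, country TEXT, name TEXT, role TEXT);
    INSERT INTO goals VALUES
        (2018, 'France', 'Antoine Griezmann', 4),
        (2014, 'Germany', 'Miroslav Klose', 2),            -- hypothetical row
        (2014, 'Germany', 'Mesut Ozil', 0);                -- hypothetical row
    INSERT INTO champions VALUES
        (2018, 'France', 'Hugo Lloris', 'Goalkeeper'),
        (2018, 'France', 'Antoine Griezmann', 'Forward'),  -- hypothetical row
        (2014, 'Germany', 'Mesut Ozil', 'Midfielder'),
        (2014, 'Germany', 'Miroslav Klose', 'Forward');
    """
)

# Keep a champion only if no goals row with num_goals > 0 matches them.
never_scored = [
    row[0]
    for row in conn.execute(
        """
        SELECT c.name
        FROM champions c
        WHERE NOT EXISTS (
            SELECT 1
            FROM goals g
            WHERE g.country = c.country
              AND g.name = c.name
              AND g.num_goals > 0
        )
        ORDER BY c.name
        """
    )
]
print(never_scored)  # ['Hugo Lloris', 'Mesut Ozil']
```

Hugo Lloris is kept even though he has no row in goals at all, which is the case an inner join would miss.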
On the other hand, if you want players who won a World Cup without scoring a goal in that particular event, you can either add a correlation condition on year to the subquery, or use a straight join (the join only finds players for whom GOALS holds a row with NUM_GOALS = 0):
select c.*
from champions c
inner join goals g
on g.country = c.country
and g.name = c.name
and g.year = c.year
where g.num_goals = 0

Your ON condition is comparing CHAMPIONS.COUNTRY = GOALS.NAME, which is not a good comparison for joining these two tables. I would suggest doing this:
SELECT
GOALS.NAME
FROM
GOALS
INNER JOIN
CHAMPIONS
ON
CHAMPIONS.COUNTRY = GOALS.COUNTRY
WHERE
GOALS.NUM_GOALS = 0;

How to select data in SQL based on a filter which changes if there is no data in a specific table column?

I have tables similar to the three below. I need to join the first two tables on id, and then join the third table on second name. However, the last table needs a filter where the city should be LONDON unless age is empty, in which case the city should be MANCHESTER.
I tried the code below using a CASE expression, but it is not working. I am new to SQL, so I am not sure how to combine a WHERE clause with a condition whose filter changes depending on whether there is data in a different column from the one being filtered. I am using Oracle, through Toad for Oracle.
FIRST.NAME.TABLE
ID  FIRST_NAME  ENTRY_DATE
1   JOHN        09/09/2019
2   NICOLA      09/09/2019
3   PATRICK     05/09/2019
4   JOAN        01/09/2019
5   JAKE        09/09/2019
6   AMELIA      01/09/2019
7   CAMERON     09/09/2019
SECOND.NAME.TABLE
ID  SECOND_NAME  ENTRY_DATE
1   BROWN        09/09/2019
2   SMITH        09/09/2019
3   COLE         05/09/2019
4   HOUSTON      01/09/2019
5   FARRIS       09/09/2019
6   HATHAWAY     01/09/2019
7   JONES        09/09/2019
CITY.AGE.TABLE
CITY        SECOND_NAME  AGE
LONDON      BROWN        24.00
LONDON      SMITH
MANCHESTER  COLE         30.00
MANCHESTER  HOUSTON      66.00
LONDON      FARRIS
LONDON      HATHAWAY     32.00
GLASGOW     JONES        28.00
MANCHESTER  SMITH        32.00
LONDON      FARRIS       62.00
SELECT FN.ID,
FN.FIRST_NAME,
SN.SECOND_NAME,
AC.CITY,
AC.AGE
FROM FIRST.NAME.TABLE AS FN
INNER JOIN SECOND.NAME.TABLE SN
ON FN.ID=SN.ID
INNER JOIN CITY.AGE.TABLE AS CA
ON SN.SECOND NAME=AC.SECOND_NAME
WHERE FN.ENTRY_DATE='09-SEP-19'
AND SN.ENTRY_DATE='09-SEP-19'
AND (CASE WHEN AC.CITY='LONDON' AND AC.AGE IS NOT NULL
THEN AC.CITY='LONDON'
ELSE AS.CITY='MANCHESTER' END)

You can express this as boolean logic:
WHERE FN.ENTRY_DATE = DATE '2019-09-09' AND
SN.ENTRY_DATE = DATE '2019-09-09' AND
(AC.AGE IS NOT NULL AND AC.CITY = 'LONDON' OR
AC.AGE IS NULL AND AC.CITY = 'MANCHESTER'
)
This answers your question about how to implement the logic using SQL. However, I'm not sure that is the logic that you really want. I speculate that you really want a LEFT JOIN to the age table.
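To see what the boolean filter actually selects, here is a minimal sketch that applies it to the city/age rows from the question alone. It assumes SQLite (via Python's sqlite3 module) rather than Oracle, and the table name city_age is a stand-in for CITY.AGE.TABLE.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# city_age is a hypothetical name standing in for CITY.AGE.TABLE.
conn.execute("CREATE TABLE city_age (city TEXT, second_name TEXT, age REAL)")
conn.executemany(
    "INSERT INTO city_age VALUES (?, ?, ?)",
    [
        ("LONDON", "BROWN", 24.0),
        ("LONDON", "SMITH", None),
        ("MANCHESTER", "COLE", 30.0),
        ("MANCHESTER", "HOUSTON", 66.0),
        ("LONDON", "FARRIS", None),
        ("LONDON", "HATHAWAY", 32.0),
        ("GLASGOW", "JONES", 28.0),
        ("MANCHESTER", "SMITH", 32.0),
        ("LONDON", "FARRIS", 62.0),
    ],
)

# Same predicate as in the answer, with explicit parentheses around each branch.
rows = conn.execute(
    """
    SELECT city, second_name, age
    FROM city_age
    WHERE (age IS NOT NULL AND city = 'LONDON')
       OR (age IS NULL AND city = 'MANCHESTER')
    ORDER BY second_name
    """
).fetchall()
print(rows)
```

With this sample data only the three LONDON rows that have an age survive; no MANCHESTER row has an empty age, so the second branch matches nothing. That is consistent with the answer's doubt about whether this is the logic the asker really wants.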

Using Oracle: joining two subqueries together changed the contents of the first query

When I tried to join two subqueries together, the values from the first query changed. Any idea why? Thanks!
Here is the code:
SELECT *
FROM ((
SELECT HOME, VISITOR, COUNT("result") AS HGOALS
FROM(
SELECT HOME, VISITOR, "result"
FROM ENGLAND
WHERE TIER = 1 AND "SEASON" >= 1980 AND "result" = 'H'
)
GROUP BY HOME, VISITOR
ORDER BY HGOALS DESC)
JOIN
(SELECT HOME, VISITOR, COUNT("result") AS AGOALS
FROM(
SELECT HOME, VISITOR, "result"
FROM ENGLAND
WHERE TIER = 1 AND "SEASON" > 1980 AND "result" = 'A'
)
GROUP BY HOME, VISITOR) USING (VISITOR, HOME))
ORDER BY AGOALS DESC;
Part of the output is:
Manchester United  Aston Villa      5   18
Arsenal            West Ham United  5   17
Arsenal            Aston Villa      6   17
Manchester United  Everton          12  16
Liverpool          Aston Villa      8   16
But when I execute only the first part of the JOIN, which is:
SELECT HOME, VISITOR, COUNT("result") AS HGOALS
FROM(
SELECT HOME, VISITOR, "result"
FROM ENGLAND
WHERE TIER = 1 AND "SEASON" >= 1980 AND "result" = 'H'
)
GROUP BY HOME, VISITOR
ORDER BY HGOALS DESC
Part of the result is:
Manchester United  Tottenham Hotspur  27
Arsenal            Everton            26
Manchester United  Aston Villa        26
Liverpool          Tottenham Hotspur  25
Manchester United  West Ham United    24
Note that for
Manchester United  Aston Villa  5  18
Manchester United  Aston Villa  26
the result I should get is 26, but it changed to 5 when I joined the two subqueries together.
Why?

My, your query is so complicated. Doesn't this do what you want?
SELECT HOME, VISITOR,
SUM(CASE WHEN result = 'H' THEN 1 ELSE 0 END) as HGOALS,
SUM(CASE WHEN result = 'A' THEN 1 ELSE 0 END) as AGOALS
FROM ENGLAND
WHERE TIER = 1 AND SEASON >= 1980
GROUP BY HOME, VISITOR
ORDER BY AGOALS DESC;
Note: You may need the quotes around result, depending on how the column is defined.
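The conditional aggregation above can be checked with a minimal sketch. It assumes SQLite (via Python's sqlite3 module) instead of Oracle, and every row in it is hypothetical test data, not taken from the real ENGLAND table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE england "
    "(home TEXT, visitor TEXT, result TEXT, tier INT, season INT)"
)
# Hypothetical rows: 'H' = home win, 'A' = away win, 'D' = draw.
conn.executemany(
    "INSERT INTO england VALUES (?, ?, ?, ?, ?)",
    [
        ("Manchester United", "Aston Villa", "H", 1, 1990),
        ("Manchester United", "Aston Villa", "H", 1, 1991),
        ("Manchester United", "Aston Villa", "A", 1, 1992),
        ("Aston Villa", "Manchester United", "A", 1, 1990),
        ("Aston Villa", "Manchester United", "D", 1, 1991),
        ("Manchester United", "Aston Villa", "H", 1, 1975),  # excluded by season filter
    ],
)

# One pass over the table: count home wins and away wins per fixture.
rows = conn.execute(
    """
    SELECT home, visitor,
           SUM(CASE WHEN result = 'H' THEN 1 ELSE 0 END) AS hgoals,
           SUM(CASE WHEN result = 'A' THEN 1 ELSE 0 END) AS agoals
    FROM england
    WHERE tier = 1 AND season >= 1980
    GROUP BY home, visitor
    ORDER BY agoals DESC, home
    """
).fetchall()
print(rows)
```

Because both counts come from the same grouped rows, a (home, visitor) pair can never be matched against the wrong pair's count, and fixtures with zero home wins or zero away wins still appear. The extra `home` sort key is only there to make the order of tied rows predictable.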

Same-table Tree Table Query in SQL Server

I've searched but found nothing that could help.
I have the following table in a SQL Server 2005 database:
Parent   Child            Value
----     --------         ---------
America  Mexico           8
America  Canada           1
Asia     Japan            5
Asia     Korea            7
Europe   Spain            0
Europe   Italy            2
Africa   Zimbabwe         1
Mexico   Baja California  0
America  USA              3
USA      California       1
USA      Texas            2
Parent and Child together form the primary key; Value is not important (IMO). I would like to create a view that results in something like this:
Parent   Child       Value
----     --------    ---------
America  USA         3
USA      California  1
USA      Texas       2
I would search for America, and the result will give back every nested child there is, recursively, no matter how many it has, since I could include cities, localities, etc.
What I need is similar to what some call a BOM explosion.

Here is how you can do it:
with cte as (
select parent, child
from t
union all
select cte.parent, t.child
from cte join
t
on cte.child = t.parent
)
select cte.*
from cte
where parent = 'America';
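The recursive CTE can be tried with a minimal sketch that assumes SQLite (via Python's sqlite3 module) rather than SQL Server. SQLite spells the clause WITH RECURSIVE, whereas SQL Server uses plain WITH; the table name t follows the answer, and the rows are the ones from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (parent TEXT, child TEXT, value INT)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [
        ("America", "Mexico", 8),
        ("America", "Canada", 1),
        ("Asia", "Japan", 5),
        ("Asia", "Korea", 7),
        ("Europe", "Spain", 0),
        ("Europe", "Italy", 2),
        ("Africa", "Zimbabwe", 1),
        ("Mexico", "Baja California", 0),
        ("America", "USA", 3),
        ("USA", "California", 1),
        ("USA", "Texas", 2),
    ],
)

# Anchor: every direct (parent, child) pair.
# Recursive step: extend each pair by one more level, keeping the top ancestor.
descendants = [
    row[0]
    for row in conn.execute(
        """
        WITH RECURSIVE cte(parent, child) AS (
            SELECT parent, child FROM t
            UNION ALL
            SELECT cte.parent, t.child
            FROM cte
            JOIN t ON cte.child = t.parent
        )
        SELECT child FROM cte
        WHERE parent = 'America'
        ORDER BY child
        """
    )
]
print(descendants)
# ['Baja California', 'California', 'Canada', 'Mexico', 'Texas', 'USA']
```

Note that the CTE carries the top-level ancestor through every level, so filtering on parent = 'America' returns all nested children, not only the direct ones.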

Sams Teach Yourself SQL in 10 minutes - Question about GROUP BY

I am reading the book "Sams Teach Yourself SQL in 10 Minutes, Third Edition", and in Lesson 10, "Grouping Data", section "Creating Groups", I can't understand the following:
"Aside from the aggregate calculations statements, every column in your SELECT statement must be present in the GROUP BY clause."
Why? I tried this and I think it is not true.
For example, consider a table 'World' with the columns 'continent', 'country', 'population'.
SELECT continent, country
FROM World
GROUP BY continent;
According to the book, this should lead to an error, right? But it doesn't. I can group my data by continent (so the result has 7 continents), and next to each continent I get a random country name.
Like this:
continent      country
North America  Canada
South America  Brazil
Europe         France
Africa         Cameroon
Asia           Japan
Australia      New Zealand
Antarctica     TuxLand

You are most probably using MySQL, which allows ungrouped and unaggregated expressions in the SELECT clause.
This is a violation of the standard, of course.
It is intended to simplify GROUP BY with joins on a PRIMARY KEY:
SELECT a.*, SUM(b.value)
FROM a
JOIN b
ON b.a_id = a.id
GROUP BY
a.id
Normally, you would have to either add all columns from a to the GROUP BY clause or use a subquery.
MySQL allows you not to do it since all values from a are guaranteed to be the same for a given value of the PRIMARY KEY (which is grouped on).

This is correct and should produce no error in some forms of SQL such as MySQL. You may optionally group by more than one column, but it's not required.

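When a selected column is neither grouped nor aggregated, the value returned for it is picked arbitrarily from the rows in the group (often the first one encountered), so in your case you get one country per continent.
MySQL allows this when ONLY_FULL_GROUP_BY is disabled. PostgreSQL rejects it unless the extra columns are functionally dependent on the grouped columns, for example when grouping by the primary key.
The tutorial probably assumes you group by every selected field so that you don't lose any data: the query would then show every country/continent pair in the above example, but only once.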
Here's an example table:
Continent      | Country      | Random_Field
---------------------------------------------
North America    Canada         Cake
North America    Canada         Dog
South America    Brazil         Cat
Europe           France         Frog
Africa           Cameroon       House
Asia             Japan          Gadget
Asia             India          Dance
Australia        New Zealand    Frodo
Antarctica       TuxLand        Linux
In your first statement:
SELECT continent, country
FROM World
GROUP BY continent;
The output would be:
Continent      | Country
--------------------------
North America    Canada
South America    Brazil
Europe           France
Africa           Cameroon
Asia             Japan
Australia        New Zealand
Antarctica       TuxLand
Notice one of the Asia rows was lost, despite being different.
Using a GROUP BY on both:
SELECT continent, country
FROM World
GROUP BY continent, country;
Would yield:
Continent      | Country
-----------------------------
North America    Canada
South America    Brazil
Europe           France
Africa           Cameroon
Asia             Japan
Asia             India
Australia        New Zealand
Antarctica       TuxLand
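The difference between the two GROUP BY forms can be reproduced with a minimal sketch. It assumes SQLite (via Python's sqlite3 module) as a stand-in, because SQLite, like MySQL in its permissive mode, accepts a selected column that is neither grouped nor aggregated. Which country it returns for a continent is arbitrary, so the sketch only counts rows for that query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE World (continent TEXT, country TEXT, random_field TEXT)"
)
conn.executemany(
    "INSERT INTO World VALUES (?, ?, ?)",
    [
        ("North America", "Canada", "Cake"),
        ("North America", "Canada", "Dog"),
        ("South America", "Brazil", "Cat"),
        ("Europe", "France", "Frog"),
        ("Africa", "Cameroon", "House"),
        ("Asia", "Japan", "Gadget"),
        ("Asia", "India", "Dance"),
        ("Australia", "New Zealand", "Frodo"),
        ("Antarctica", "TuxLand", "Linux"),
    ],
)

# Grouping by continent only: one row per continent, country chosen arbitrarily.
one_per_continent = conn.execute(
    "SELECT continent, country FROM World GROUP BY continent"
).fetchall()

# Grouping by both columns: one row per distinct (continent, country) pair.
all_pairs = conn.execute(
    "SELECT continent, country FROM World GROUP BY continent, country"
).fetchall()

print(len(one_per_continent))  # 7
print(len(all_pairs))          # 8
```

The second query keeps both Asia rows, while the first one silently drops one of them.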
