I have the following data as a pandas.core.series.Series (after processing the original DataFrame), and I want to visualise it. I need one bar chart per Fruit (Apple, Pear, Oranges, etc.) in which the annual production of producers X1, X2 and X3 appears side by side (a grouped bar chart). The x axis of each figure should be the Year.
Could anyone help, please?
Thanks
The data:
Fruit Producer Year Production (tons)
Apple X1 1981 125
1982 146
1983 251
1984 278
1985 161
X2 1981 510
1982 456
1983 531
1984 563
1985 508
X3 1981 68
1982 121
1983 126
1984 189
1985 134
Pear X1 1981 126
1982 148
1983 255
1984 272
1985 166
X2 1981 515
1982 454
1983 539
1984 565
1985 558
X3 1981 516
1982 485
1983 567
1984 519
1985 588
Oranges X1 1981 68
1982 100
1983 109
1984 190
1985 136
X2 1981 50
1982 155
1983 126
1984 155
1985 139
X3 1981 12
1982 163
1983 198
1984 174
1985 136
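One possible approach, sketched under a few assumptions: the Series (called `s` here, a hypothetical name) has a three-level MultiIndex whose levels are named Fruit, Producer and Year. Selecting one fruit and unstacking the Producer level gives a Year-by-Producer table, which pandas draws as grouped bars. Only the Apple rows are rebuilt to keep the sketch short, and the helper names `production_table` and `plot_all_fruits` are made up for illustration. The plotting helper assumes matplotlib is installed.

```python
import pandas as pd

# Rebuild part of the Series from the question (Apple rows only, for brevity).
# The variable names and the index level names are assumptions.
years = [1981, 1982, 1983, 1984, 1985]
apple = {
    "X1": [125, 146, 251, 278, 161],
    "X2": [510, 456, 531, 563, 508],
    "X3": [68, 121, 126, 189, 134],
}
index = pd.MultiIndex.from_tuples(
    [("Apple", producer, year) for producer in apple for year in years],
    names=["Fruit", "Producer", "Year"],
)
s = pd.Series(
    [value for values in apple.values() for value in values],
    index=index,
    name="Production (tons)",
)


def production_table(series, fruit):
    """Return a Year x Producer table for one fruit."""
    # .xs selects one fruit and drops that level; unstack moves Producer to columns.
    return series.xs(fruit, level="Fruit").unstack("Producer")


def plot_all_fruits(series):
    """Draw one grouped bar chart per fruit (needs matplotlib)."""
    import matplotlib.pyplot as plt

    for fruit in series.index.get_level_values("Fruit").unique():
        ax = production_table(series, fruit).plot(kind="bar", title=fruit)
        ax.set_ylabel(series.name)
    plt.show()


table = production_table(s, "Apple")
print(table)
```

Calling `plot_all_fruits(s)` on the full Series should then give one figure per fruit, with Year on the x axis and one bar per producer in each year.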
I have the following dataset (the real one is much larger; this is just a small sample):
City Year Votes
Detroit 1964 23
Detroit 1977 61
Detroit 1978 89
Detroit 1986 116
Detroit 1993 144
Baltimore 1964 42
Baltimore 1965 91
Baltimore 1966 161
Baltimore 1967 219
Baltimore 1968 263
Baltimore 1969 312
Baltimore 1970 346
Baltimore 1978 375
Baltimore 1980 415
Baltimore 1981 449
Baltimore 1995 484
Baltimore 1996 529
Baltimore 1997 578
Baltimore 1998 619
Baltimore 1999 660
Baltimore 2000 713
Baltimore 2001 757
Baltimore 2002 807
Baltimore 2003 852
Baltimore 2004 884
Boston 1968 47
Boston 1969 101
Boston 1970 123
Boston 2007 157
Phoenix 1971 41
Phoenix 1972 41
Phoenix 1979 76
Phoenix 1981 112
Phoenix 1982 154
Phoenix 1983 197
Phoenix 1984 242
Phoenix 1985 279
Phoenix 1997 319
Phoenix 1998 351
Phoenix 2000 381
Phoenix 2003 417
Phoenix 2005 457
Phoenix 2006 494
Phoenix 2007 536
Phoenix 2008 570
Phoenix 2009 598
Phoenix 2021 633
Phoenix 2022 661
Years should range from 1950 to 2023, and I would like to fill in the years that are missing for each city:
if a city has votes in the starting year (1950), use that value
if a city has no votes in the starting year (1950), use 0 as the starting value
for every other missing year, use the votes value of the previous year
The result should look like this (only Detroit is shown, as I built it manually, but it should cover all cities):
City Year Votes
Detroit 1950 0
Detroit 1951 0
Detroit 1952 0
Detroit 1953 0
Detroit 1954 0
Detroit 1955 0
Detroit 1956 0
Detroit 1957 0
Detroit 1958 0
Detroit 1959 0
Detroit 1960 0
Detroit 1961 0
Detroit 1962 0
Detroit 1963 0
Detroit 1964 23
Detroit 1965 23
Detroit 1966 23
Detroit 1967 23
Detroit 1968 23
Detroit 1969 23
Detroit 1970 23
Detroit 1971 23
Detroit 1972 23
Detroit 1973 23
Detroit 1974 23
Detroit 1975 23
Detroit 1976 23
Detroit 1977 61
Detroit 1978 89
Detroit 1979 89
Detroit 1980 89
Detroit 1981 89
Detroit 1982 89
Detroit 1983 89
Detroit 1984 89
Detroit 1985 89
Detroit 1986 116
Detroit 1987 116
Detroit 1988 116
Detroit 1989 116
Detroit 1990 116
Detroit 1991 116
Detroit 1992 116
Detroit 1993 144
Detroit 1994 144
Detroit 1995 144
Detroit 1996 144
Detroit 1997 144
Detroit 1998 144
Detroit 1999 144
Detroit 2000 144
Detroit 2001 144
Detroit 2002 144
Detroit 2003 144
Detroit 2004 144
Detroit 2005 144
Detroit 2006 144
Detroit 2007 144
Detroit 2008 144
Detroit 2009 144
Detroit 2010 144
Detroit 2011 144
Detroit 2012 144
Detroit 2013 144
Detroit 2014 144
Detroit 2015 144
Detroit 2016 144
Detroit 2017 144
Detroit 2018 144
Detroit 2019 144
Detroit 2020 144
Detroit 2021 144
Detroit 2022 144
Detroit 2023 144
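import pandas as pd
df = pd.read_clipboard()  # Your df here
cities = df["City"].unique()
years = range(1950, 2024)
# Every (City, Year) pair; reindexing exposes the missing years as NaN
index = pd.MultiIndex.from_product([cities, years], names=["City", "Year"])
out = (
    df.set_index(["City", "Year"]).reindex(index)
    .groupby(level=0).ffill()  # carry votes forward within each city
    .fillna(0).astype(int)     # years before a city's first record become 0
    .reset_index()
)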
One option is with complete from pyjanitor to expose the missing rows:
# pip install pyjanitor
import pandas as pd
import janitor
# create a dictionary of the range of all the years
# the key of the dictionary must exist in the dataframe
years = {'Year': range(1950, 2024)}
(df
.complete(years, 'City')
# forward-fill within each city, so votes never carry over from one city to the next
.assign(Votes = lambda df: df.groupby('City')['Votes'].ffill().fillna(0).astype(int))
)
City Year Votes
0 Detroit 1950 0
1 Baltimore 1950 0
2 Boston 1950 0
3 Phoenix 1950 0
4 Detroit 1951 0
.. ... ... ...
291 Phoenix 2022 661
292 Detroit 2023 144
293 Baltimore 2023 884
294 Boston 2023 157
295 Phoenix 2023 661
[296 rows x 3 columns]
ITEMS:
ITEM_ID NAME_ID ITEM_NAME
1001 2001 Office chair
1002 2002 Writing Desk
1003 2003 Filing cabinet
1004 2004 Bookshelf bookcase
1005 2005 Table lamp
1006 2001 Office chair
1007 2002 Writing Desk
1008 2003 Filing cabinet
1009 2004 Bookshelf bookcase
1010 2005 Table lamp
1011 2001 Office chair
1012 2002 Writing Desk
1013 2003 Filing cabinet
1014 2004 Bookshelf bookcase
1015 2005 Table lamp
1016 2016 Triangle window
1017 2017 Screen
1018 2018 Cradle
1019 2017 Screen
1020 2018 Cradle
1021 2017 Screen
1022 2018 Cradle
1023 2023 Futon
1024 2024 Single bed
1025 2025 Bunk beds
1026 2026 Sofa bed
1027 2027 Camp bed cot sleeping bag
1028 2028 Airbed air mattress
1029 2029 Hammock
1030 2030 Loveseat
1031 2031 Sleeper sofa
1032 2032 Settee
1032 2032 Settee
1033 2001 Office chair
1034 2002 Writing Desk
1035 2003 Filing cabinet
1036 2004 Bookshelf/bookcase
1037 2005 Table lamp
1038 2001 Office chair
1039 2002 Writing Desk
1040 2003 Filing cabinet
1041 2004 Bookshelf/bookcase
1042 2005 Table lamp
1043 2017 Screen
1044 2018 Cradle
1045 2017 Screen
1046 2018 Cradle
1047 2017 Screen
1048 2018 Cradle
1049 2017 Screen
1050 2018 Cradle
ITEMS_DETAILS:
CITY ITEM_ID SHOP_ID
NEW YORK 1001 4001
NEW YORK 1002 4002
NEW YORK 1003 4003
NEW YORK 1004 4004
NEW YORK 1005 4005
DALLAS 1006 4006
DALLAS 1007 4007
DALLAS 1008 4008
DALLAS 1009 4001
DALLAS 1010 4002
DALLAS 1011 4003
DALLAS 1012 4004
WASHINGTON 1013 4005
WASHINGTON 1014 4006
WASHINGTON 1015 4007
WASHINGTON 1016 4008
WASHINGTON 1017 4009
WASHINGTON 1018 4010
WASHINGTON 1019 4011
SANFRANSISCO 1020 4012
SANFRANSISCO 1021 4013
CHICAGO 1022 4014
CHICAGO 1023 4015
CHICAGO 1024 4016
CHICAGO 1025 4017
BOSTON 1026 4018
BOSTON 1027 4019
BOSTON 1028 4020
BOSTON 1029 4021
BOSTON 1030 4022
SANFRANSISCO 1031 4023
SANFRANSISCO 1032 4024
SANFRANSISCO 1032 4025
SANFRANSISCO 1033 4026
Las Vegas 1034 4027
Austin 1035 4028
Houston 1036 4029
Los Angeles 1037 4030
Seattle 1038 4031
Atlanta 1039 4032
McKinney 1040 4033
Vancouver 1041 4034
Las Vegas 1042 4035
Austin 1043 4036
Houston 1044 4037
Los Angeles 1045 4038
Seattle 1046 4034
Atlanta 1047 4035
McKinney 1048 4036
Vancouver 1049 4037
Las Vegas 1050 4043
Austin 1051 4044
Houston 1052 4045
Los Angeles 1053 4046
Seattle 1054 4047
Atlanta 1055 4048
McKinney 1056 4049
Vancouver 1057 4050
Las Vegas 1058 4051
Austin 1059 4052
Houston 1060 4053
Hi all,
I am trying to find duplicate values in the result of joining ITEMS and ITEMS_DETAILS.
I know the SQL for finding duplicate values of a column in a single table, but I am a bit confused about doing it with a join.
Logic: if ITEM_NAME is the same but SHOP_ID is different, the row should be reported as a duplicate. If SHOP_ID is the same, it should be reported as unique.
Please help me.
I tried the following:
select * from (
select a.NAME_ID from ITEMS a inner join ITEMS_DETAILS b on b.ITEM_ID = a.ITEM_ID) x
inner join ITEMS y on y.NAME_ID=x.NAME_ID
inner join ITEMS_DETAILS z on z.ITEM_ID=y.ITEM_ID
If you want to group and count the duplicates, try the query below:
SELECT
COUNT(*) As DupCount,
y.ITEM_ID
FROM
ITEMS y
INNER JOIN ITEMS_DETAILS z ON z.ITEM_ID=y.ITEM_ID
GROUP BY
y.ITEM_ID
HAVING
COUNT(*) > 1
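The query above counts joined rows per ITEM_ID. If the goal is the rule stated in the question (the same ITEM_NAME appearing under different SHOP_IDs counts as a duplicate), one possible reading is to group the joined rows by ITEM_NAME and count the distinct SHOP_IDs. Here is a minimal sketch of that idea, run from Python against an in-memory SQLite database with a handful of rows taken from the question. The use of SQLite and the variable names are assumptions, not part of the original setup.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE ITEMS (ITEM_ID INTEGER, NAME_ID INTEGER, ITEM_NAME TEXT);
    CREATE TABLE ITEMS_DETAILS (CITY TEXT, ITEM_ID INTEGER, SHOP_ID INTEGER);
""")
# A small subset of the sample data from the question.
conn.executemany("INSERT INTO ITEMS VALUES (?, ?, ?)", [
    (1001, 2001, "Office chair"),
    (1006, 2001, "Office chair"),
    (1016, 2016, "Triangle window"),
    (1023, 2023, "Futon"),
    (1032, 2032, "Settee"),
])
conn.executemany("INSERT INTO ITEMS_DETAILS VALUES (?, ?, ?)", [
    ("NEW YORK", 1001, 4001),
    ("DALLAS", 1006, 4006),
    ("WASHINGTON", 1016, 4008),
    ("CHICAGO", 1023, 4015),
    ("SANFRANSISCO", 1032, 4024),
    ("SANFRANSISCO", 1032, 4025),
])

# An item name is a duplicate when it is linked to more than one distinct shop.
duplicates = conn.execute("""
    SELECT i.ITEM_NAME, COUNT(DISTINCT d.SHOP_ID) AS shop_count
    FROM ITEMS i
    INNER JOIN ITEMS_DETAILS d ON d.ITEM_ID = i.ITEM_ID
    GROUP BY i.ITEM_NAME
    HAVING COUNT(DISTINCT d.SHOP_ID) > 1
    ORDER BY i.ITEM_NAME
""").fetchall()

print(duplicates)  # [('Office chair', 2), ('Settee', 2)]
```

Names linked to a single shop (Triangle window and Futon in this subset) are filtered out by the HAVING clause, which matches the "same SHOP_ID means unique" half of the rule.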
I'm trying to do a merge and am having trouble.
These are my 2 dataframes:
DF1
Team_Id Team_Name Season Daynum Wteam Wscore Lteam
0 1104 Alabama 1985 137 1104 50 1112
1 1104 Alabama 1985 139 1104 63 1433
2 1104 Alabama 1986 137 1104 97 1462
3 1104 Alabama 1986 139 1104 58 1228
4 1104 Alabama 1987 136 1104 88 1299
DF2
Season Seed Team
0 1985 X07 1104
1 1986 Y05 1104
2 1987 X02 1104
I want the seeds from DF2 to appear in the rows of DF1. DF2 contains more information than DF1.
The expected results are:
Team_Id Team_Name Season Daynum Wteam Wscore Lteam Seed
0 1104 Alabama 1985 137 1104 50 1112 X07
1 1104 Alabama 1985 139 1104 63 1433 X07
2 1104 Alabama 1986 137 1104 97 1462 Y05
3 1104 Alabama 1986 139 1104 58 1228 Y05
4 1104 Alabama 1987 136 1104 88 1299 X02
You need merge with left_on and right_on:
DF1.merge(DF2, left_on=['Season','Team_Id'], right_on=['Season','Team'])
Output:
Team_Id Team_Name Season Daynum Wteam Wscore Lteam Seed Team
0 1104 Alabama 1985 137 1104 50 1112 X07 1104
1 1104 Alabama 1985 139 1104 63 1433 X07 1104
2 1104 Alabama 1986 137 1104 97 1462 Y05 1104
3 1104 Alabama 1986 139 1104 58 1228 Y05 1104
4 1104 Alabama 1987 136 1104 88 1299 X02 1104
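The output above carries an extra Team column that duplicates Team_Id. If the goal is exactly the expected table from the question, a small variation might help. This is a sketch that rebuilds the two sample frames by hand; `how="left"` is an added assumption, so that DF1 rows without a seed in DF2 would be kept rather than dropped.

```python
import pandas as pd

DF1 = pd.DataFrame({
    "Team_Id": [1104] * 5,
    "Team_Name": ["Alabama"] * 5,
    "Season": [1985, 1985, 1986, 1986, 1987],
    "Daynum": [137, 139, 137, 139, 136],
    "Wteam": [1104] * 5,
    "Wscore": [50, 63, 97, 58, 88],
    "Lteam": [1112, 1433, 1462, 1228, 1299],
})
DF2 = pd.DataFrame({
    "Season": [1985, 1986, 1987],
    "Seed": ["X07", "Y05", "X02"],
    "Team": [1104, 1104, 1104],
})

# how="left" keeps every DF1 row even when DF2 has no seed for it;
# dropping "Team" removes the column that duplicates "Team_Id".
result = (
    DF1.merge(DF2, how="left",
              left_on=["Season", "Team_Id"], right_on=["Season", "Team"])
       .drop(columns="Team")
)
print(result)
```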
The database data is from http://www.w3resource.com/mysql-exercises/join-exercises/
sqlite> select * from employees;
EMPLOYEE_ID FIRST_NAME LAST_NAME EMAIL PHONE_NUMBER HIRE_DATE JOB_ID SALARY COMMISSION_PCT MANAGER_ID DEPARTMENT_ID
----------- ------------- ------------- ---------- -------------------- ------------ ------------ ---------- -------------- ---------- -------------
100 Steven King SKING 515.123.4567 1987-06-17 AD_PRES 24000 0.0 0 90
101 Neena Kochhar NKOCHHAR 515.123.4568 1987-06-18 AD_VP 17000 0.0 100 90
102 Lex De Haan LDEHAAN 515.123.4569 1987-06-19 AD_VP 17000 0.0 100 90
...
202 Pat Fay PFAY 603.123.6666 1987-09-27 MK_REP 6000 0.0 201 20
203 Susan Mavris SMAVRIS 515.123.7777 1987-09-28 HR_REP 6500 0.0 101 40
204 Hermann Baer HBAER 515.123.8888 1987-09-29 PR_REP 10000 0.0 101 70
205 Shelley Higgins SHIGGINS 515.123.8080 1987-09-30 AC_MGR 12000 0.0 101 110
206 William Gietz WGIETZ 515.123.8181 1987-10-01 AC_ACCOUNT 8300 0.0 205 110
sqlite> select * from departments;
DEPARTMENT_ID DEPARTMENT_NAME MANAGER_ID LOCATION_ID
------------- ---------------------- ---------- -----------
10 Administration 200 1700
20 Marketing 201 1800
30 Purchasing 114 1700
40 Human Resources 203 2400
50 Shipping 121 1500
60 IT 103 1400
70 Public Relations 204 2700
80 Sales 145 2500
90 Executive 100 1700
100 Finance 108 1700
110 Accounting 205 1700
120 Treasury 0 1700
130 Corporate Tax 0 1700
140 Control And Credit 0 1700
150 Shareholder Services 0 1700
160 Benefits 0 1700
170 Manufacturing 0 1700
180 Construction 0 1700
190 Contracting 0 1700
200 Operations 0 1700
210 IT Support 0 1700
220 NOC 0 1700
230 IT Helpdesk 0 1700
240 Government Sales 0 1700
250 Retail Sales 0 1700
260 Recruiting 0 1700
270 Payroll 0 1700
The natural join result:
sqlite> select * from employees e natural join departments d;
EMPLOYEE_ID FIRST_NAME LAST_NAME EMAIL PHONE_NUMBER HIRE_DATE JOB_ID SALARY COMMISSION_PCT MANAGER_ID DEPARTMENT_ID DEPARTMENT_NAME LOCATION_ID
----------- ------------- ------------- ---------- -------------------- ------------ ------------ ---------- -------------- ---------- ------------- ---------------------- -----------
101 Neena Kochhar NKOCHHAR 515.123.4568 1987-06-18 AD_VP 17000 0.0 100 90 Executive 1700
102 Lex De Haan LDEHAAN 515.123.4569 1987-06-19 AD_VP 17000 0.0 100 90 Executive 1700
104 Bruce Ernst BERNST 590.423.4568 1987-06-21 IT_PROG 6000 0.0 103 60 IT 1400
105 David Austin DAUSTIN 590.423.4569 1987-06-22 IT_PROG 4800 0.0 103 60 IT 1400
106 Valli Pataballa VPATABAL 590.423.4560 1987-06-23 IT_PROG 4800 0.0 103 60 IT 1400
107 Diana Lorentz DLORENTZ 590.423.5567 1987-06-24 IT_PROG 4200 0.0 103 60 IT 1400
109 Daniel Faviet DFAVIET 515.124.4169 1987-06-26 FI_ACCOUNT 9000 0.0 108 100 Finance 1700
110 John Chen JCHEN 515.124.4269 1987-06-27 FI_ACCOUNT 8200 0.0 108 100 Finance 1700
111 Ismael Sciarra ISCIARRA 515.124.4369 1987-06-28 FI_ACCOUNT 7700 0.0 108 100 Finance 1700
112 Jose Manuel Urman JMURMAN 515.124.4469 1987-06-29 FI_ACCOUNT 7800 0.0 108 100 Finance 1700
113 Luis Popp LPOPP 515.124.4567 1987-06-30 FI_ACCOUNT 6900 0.0 108 100 Finance 1700
115 Alexander Khoo AKHOO 515.127.4562 1987-07-02 PU_CLERK 3100 0.0 114 30 Purchasing 1700
116 Shelli Baida SBAIDA 515.127.4563 1987-07-03 PU_CLERK 2900 0.0 114 30 Purchasing 1700
117 Sigal Tobias STOBIAS 515.127.4564 1987-07-04 PU_CLERK 2800 0.0 114 30 Purchasing 1700
118 Guy Himuro GHIMURO 515.127.4565 1987-07-05 PU_CLERK 2600 0.0 114 30 Purchasing 1700
119 Karen Colmenares KCOLMENA 515.127.4566 1987-07-06 PU_CLERK 2500 0.0 114 30 Purchasing 1700
129 Laura Bissot LBISSOT 650.124.5234 1987-07-16 ST_CLERK 3300 0.0 121 50 Shipping 1500
130 Mozhe Atkinson MATKINSO 650.124.6234 1987-07-17 ST_CLERK 2800 0.0 121 50 Shipping 1500
131 James Marlow JAMRLOW 650.124.7234 1987-07-18 ST_CLERK 2500 0.0 121 50 Shipping 1500
132 TJ Olson TJOLSON 650.124.8234 1987-07-19 ST_CLERK 2100 0.0 121 50 Shipping 1500
150 Peter Tucker PTUCKER 011.44.1344.129268 1987-08-06 SA_REP 10000 0.3 145 80 Sales 2500
151 David Bernstein DBERNSTE 011.44.1344.345268 1987-08-07 SA_REP 9500 0.25 145 80 Sales 2500
152 Peter Hall PHALL 011.44.1344.478968 1987-08-08 SA_REP 9000 0.25 145 80 Sales 2500
153 Christopher Olsen COLSEN 011.44.1344.498718 1987-08-09 SA_REP 8000 0.2 145 80 Sales 2500
154 Nanette Cambrault NCAMBRAU 011.44.1344.987668 1987-08-10 SA_REP 7500 0.2 145 80 Sales 2500
155 Oliver Tuvault OTUVAULT 011.44.1344.486508 1987-08-11 SA_REP 7000 0.15 145 80 Sales 2500
184 Nandita Sarchand NSARCHAN 650.509.1876 1987-09-09 SH_CLERK 4200 0.0 121 50 Shipping 1500
185 Alexis Bull ABULL 650.509.2876 1987-09-10 SH_CLERK 4100 0.0 121 50 Shipping 1500
186 Julia Dellinger JDELLING 650.509.3876 1987-09-11 SH_CLERK 3400 0.0 121 50 Shipping 1500
187 Anthony Cabrio ACABRIO 650.509.4876 1987-09-12 SH_CLERK 3000 0.0 121 50 Shipping 1500
202 Pat Fay PFAY 603.123.6666 1987-09-27 MK_REP 6000 0.0 201 20 Marketing 1800
206 William Gietz WGIETZ 515.123.8181 1987-10-01 AC_ACCOUNT 8300 0.0 205 110 Accounting 1700
sqlite> select count(*) from employees e natural join departments d;
count(*)
----------
32
The join result:
sqlite> select * from employees e join departments d using (department_id);
EMPLOYEE_ID FIRST_NAME LAST_NAME EMAIL PHONE_NUMBER HIRE_DATE JOB_ID SALARY COMMISSION_PCT MANAGER_ID DEPARTMENT_ID DEPARTMENT_NAME MANAGER_ID LOCATION_ID
----------- ------------- ------------- ---------- -------------------- ------------ ------------ ---------- -------------- ---------- ------------- ---------------------- ---------- -----------
100 Steven King SKING 515.123.4567 1987-06-17 AD_PRES 24000 0.0 0 90 Executive 100 1700
101 Neena Kochhar NKOCHHAR 515.123.4568 1987-06-18 AD_VP 17000 0.0 100 90 Executive 100 1700
102 Lex De Haan LDEHAAN 515.123.4569 1987-06-19 AD_VP 17000 0.0 100 90 Executive 100 1700
103 Alexander Hunold AHUNOLD 590.423.4567 1987-06-20 IT_PROG 9000 0.0 102 60 IT 103 1400
104 Bruce Ernst BERNST 590.423.4568 1987-06-21 IT_PROG 6000 0.0 103 60 IT 103 1400
105 David Austin DAUSTIN 590.423.4569 1987-06-22 IT_PROG 4800 0.0 103 60 IT 103 1400
106 Valli Pataballa VPATABAL 590.423.4560 1987-06-23 IT_PROG 4800 0.0 103 60 IT 103 1400
107 Diana Lorentz DLORENTZ 590.423.5567 1987-06-24 IT_PROG 4200 0.0 103 60 IT 103 1400
108 Nancy Greenberg NGREENBE 515.124.4569 1987-06-25 FI_MGR 12000 0.0 101 100 Finance 108 1700
109 Daniel Faviet DFAVIET 515.124.4169 1987-06-26 FI_ACCOUNT 9000 0.0 108 100 Finance 108 1700
110 John Chen JCHEN 515.124.4269 1987-06-27 FI_ACCOUNT 8200 0.0 108 100 Finance 108 1700
111 Ismael Sciarra ISCIARRA 515.124.4369 1987-06-28 FI_ACCOUNT 7700 0.0 108 100 Finance 108 1700
112 Jose Manuel Urman JMURMAN 515.124.4469 1987-06-29 FI_ACCOUNT 7800 0.0 108 100 Finance 108 1700
...
155 Oliver Tuvault OTUVAULT 011.44.1344.486508 1987-08-11 SA_REP 7000 0.15 145 80 Sales 145 2500
156 Janette King JKING 011.44.1345.429268 1987-08-12 SA_REP 10000 0.35 146 80 Sales 145 2500
157 Patrick Sully PSULLY 011.44.1345.929268 1987-08-13 SA_REP 9500 0.35 146 80 Sales 145 2500
158 Allan McEwen AMCEWEN 011.44.1345.829268 1987-08-14 SA_REP 9000 0.35 146 80 Sales 145 2500
159 Lindsey Smith LSMITH 011.44.1345.729268 1987-08-15 SA_REP 8000 0.3 146 80 Sales 145 2500
160 Louise Doran LDORAN 011.44.1345.629268 1987-08-16 SA_REP 7500 0.3 146 80 Sales 145 2500
161 Sarath Sewall SSEWALL 011.44.1345.529268 1987-08-17 SA_REP 7000 0.25 146 80 Sales 145 2500
162 Clara Vishney CVISHNEY 011.44.1346.129268 1987-08-18 SA_REP 10500 0.25 147 80 Sales 145 2500
163 Danielle Greene DGREENE 011.44.1346.229268 1987-08-19 SA_REP 9500 0.15 147 80 Sales 145 2500
164 Mattea Marvins MMARVINS 011.44.1346.329268 1987-08-20 SA_REP 7200 0.1 147 80 Sales 145 2500
165 David Lee DLEE 011.44.1346.529268 1987-08-21 SA_REP 6800 0.1 147 80 Sales 145 2500
166 Sundar Ande SANDE 011.44.1346.629268 1987-08-22 SA_REP 6400 0.1 147 80 Sales 145 2500
167 Amit Banda ABANDA 011.44.1346.729268 1987-08-23 SA_REP 6200 0.1 147 80 Sales 145 2500
168 Lisa Ozer LOZER 011.44.1343.929268 1987-08-24 SA_REP 11500 0.25 148 80 Sales 145 2500
169 Harrison Bloom HBLOOM 011.44.1343.829268 1987-08-25 SA_REP 10000 0.2 148 80 Sales 145 2500
170 Tayler Fox TFOX 011.44.1343.729268 1987-08-26 SA_REP 9600 0.2 148 80 Sales 145 2500
171 William Smith WSMITH 011.44.1343.629268 1987-08-27 SA_REP 7400 0.15 148 80 Sales 145 2500
172 Elizabeth Bates EBATES 011.44.1343.529268 1987-08-28 SA_REP 7300 0.15 148 80 Sales 145 2500
173 Sundita Kumar SKUMAR 011.44.1343.329268 1987-08-29 SA_REP 6100 0.1 148 80 Sales 145 2500
174 Ellen Abel EABEL 011.44.1644.429267 1987-08-30 SA_REP 11000 0.3 149 80 Sales 145 2500
175 Alyssa Hutton AHUTTON 011.44.1644.429266 1987-08-31 SA_REP 8800 0.25 149 80 Sales 145 2500
176 Jonathon Taylor JTAYLOR 011.44.1644.429265 1987-09-01 SA_REP 8600 0.2 149 80 Sales 145 2500
177 Jack Livingston JLIVINGS 011.44.1644.429264 1987-09-02 SA_REP 8400 0.2 149 80 Sales 145 2500
179 Charles Johnson CJOHNSON 011.44.1644.429262 1987-09-04 SA_REP 6200 0.1 149 80 Sales 145 2500
180 Winston Taylor WTAYLOR 650.507.9876 1987-09-05 SH_CLERK 3200 0.0 120 50 Shipping 121 1500
181 Jean Fleaur JFLEAUR 650.507.9877 1987-09-06 SH_CLERK 3100 0.0 120 50 Shipping 121 1500
182 Martha Sullivan MSULLIVA 650.507.9878 1987-09-07 SH_CLERK 2500 0.0 120 50 Shipping 121 1500
183 Girard Geoni GGEONI 650.507.9879 1987-09-08 SH_CLERK 2800 0.0 120 50 Shipping 121 1500
184 Nandita Sarchand NSARCHAN 650.509.1876 1987-09-09 SH_CLERK 4200 0.0 121 50 Shipping 121 1500
185 Alexis Bull ABULL 650.509.2876 1987-09-10 SH_CLERK 4100 0.0 121 50 Shipping 121 1500
186 Julia Dellinger JDELLING 650.509.3876 1987-09-11 SH_CLERK 3400 0.0 121 50 Shipping 121 1500
187 Anthony Cabrio ACABRIO 650.509.4876 1987-09-12 SH_CLERK 3000 0.0 121 50 Shipping 121 1500
188 Kelly Chung KCHUNG 650.505.1876 1987-09-13 SH_CLERK 3800 0.0 122 50 Shipping 121 1500
189 Jennifer Dilly JDILLY 650.505.2876 1987-09-14 SH_CLERK 3600 0.0 122 50 Shipping 121 1500
190 Timothy Gates TGATES 650.505.3876 1987-09-15 SH_CLERK 2900 0.0 122 50 Shipping 121 1500
191 Randall Perkins RPERKINS 650.505.4876 1987-09-16 SH_CLERK 2500 0.0 122 50 Shipping 121 1500
192 Sarah Bell SBELL 650.501.1876 1987-09-17 SH_CLERK 4000 0.0 123 50 Shipping 121 1500
193 Britney Everett BEVERETT 650.501.2876 1987-09-18 SH_CLERK 3900 0.0 123 50 Shipping 121 1500
194 Samuel McCain SMCCAIN 650.501.3876 1987-09-19 SH_CLERK 3200 0.0 123 50 Shipping 121 1500
195 Vance Jones VJONES 650.501.4876 1987-09-20 SH_CLERK 2800 0.0 123 50 Shipping 121 1500
196 Alana Walsh AWALSH 650.507.9811 1987-09-21 SH_CLERK 3100 0.0 124 50 Shipping 121 1500
197 Kevin Feeney KFEENEY 650.507.9822 1987-09-22 SH_CLERK 3000 0.0 124 50 Shipping 121 1500
198 Donald OConnell DOCONNEL 650.507.9833 1987-09-23 SH_CLERK 2600 0.0 124 50 Shipping 121 1500
199 Douglas Grant DGRANT 650.507.9844 1987-09-24 SH_CLERK 2600 0.0 124 50 Shipping 121 1500
200 Jennifer Whalen JWHALEN 515.123.4444 1987-09-25 AD_ASST 4400 0.0 101 10 Administration 200 1700
201 Michael Hartstein MHARTSTE 515.123.5555 1987-09-26 MK_MAN 13000 0.0 100 20 Marketing 201 1800
202 Pat Fay PFAY 603.123.6666 1987-09-27 MK_REP 6000 0.0 201 20 Marketing 201 1800
203 Susan Mavris SMAVRIS 515.123.7777 1987-09-28 HR_REP 6500 0.0 101 40 Human Resources 203 2400
204 Hermann Baer HBAER 515.123.8888 1987-09-29 PR_REP 10000 0.0 101 70 Public Relations 204 2700
205 Shelley Higgins SHIGGINS 515.123.8080 1987-09-30 AC_MGR 12000 0.0 101 110 Accounting 205 1700
206 William Gietz WGIETZ 515.123.8181 1987-10-01 AC_ACCOUNT 8300 0.0 205 110 Accounting 205 1700
sqlite> select count(*) from employees e join departments d using (department_id);
count(*)
----------
106
The natural join should return the same number of rows as the join, but it does not. Why?
The difference between a natural join and a 'normal' join is that the former uses all columns that happen to have the same name in both tables.
In this case, both DEPARTMENT_ID and MANAGER_ID match, so the natural join is actually the same as this query:
select * from employees e join departments d using (department_id, manager_id);
This is why you should never, ever use a natural join.
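A quick way to check this is sketched below in Python, using an in-memory SQLite database with four employees and two departments taken from the question. The trimmed column set is an assumption made for brevity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employees (employee_id INTEGER, last_name TEXT,
                            manager_id INTEGER, department_id INTEGER);
    CREATE TABLE departments (department_id INTEGER, department_name TEXT,
                              manager_id INTEGER, location_id INTEGER);
    INSERT INTO employees VALUES
        (100, 'King', 0, 90),
        (101, 'Kochhar', 100, 90),
        (103, 'Hunold', 102, 60),
        (104, 'Ernst', 103, 60);
    INSERT INTO departments VALUES
        (60, 'IT', 103, 1400),
        (90, 'Executive', 100, 1700);
""")


def employee_ids(join_clause):
    """Run the join and return the matching employee ids in order."""
    sql = f"SELECT employee_id FROM employees {join_clause} ORDER BY employee_id"
    return [row[0] for row in conn.execute(sql)]


natural = employee_ids("NATURAL JOIN departments")
using_one = employee_ids("JOIN departments USING (department_id)")
using_both = employee_ids("JOIN departments USING (department_id, manager_id)")

print(natural)     # [101, 104]
print(using_one)   # [100, 101, 103, 104]
print(using_both)  # [101, 104]
```

Employees 100 and 103 drop out of the natural join because their own manager_id differs from the manager_id of their department, which is the same pattern as the missing rows in the question.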
I have the following right self-join query, run against Oracle's HR schema, but I can't really understand what it returns. When I ran exactly the same query with LEFT JOIN, I understood that it returned all employees regardless of whether they have a supervisor.
The manager IDs are a bit confusing, for example 156, King, although King has an ID of 100.
SELECT emps.employee_id as "Employee", emps.last_name, mgr.employee_id as "Manager", mgr.last_name
FROM employees emps
RIGHT JOIN employees mgr
ON emps.manager_id = mgr.employee_id
The result:
Employee LAST_NAME Manager LAST_NAME
---------- ------------------------- ---------- -------------------------
101 Kochhar 100 King
102 De Haan 100 King
103 Hunold 102 De Haan
104 Ernst 103 Hunold
105 Austin 103 Hunold
106 Pataballa 103 Hunold
107 Lorentz 103 Hunold
108 Greenberg 101 Kochhar
109 Faviet 108 Greenberg
110 Chen 108 Greenberg
111 Sciarra 108 Greenberg
112 Urman 108 Greenberg
113 Popp 108 Greenberg
114 Raphaely 100 King
115 Khoo 114 Raphaely
116 Baida 114 Raphaely
117 Tobias 114 Raphaely
118 Himuro 114 Raphaely
119 Colmenares 114 Raphaely
120 Weiss 100 King
121 Fripp 100 King
122 Kaufling 100 King
123 Vollman 100 King
124 Mourgos 100 King
125 Nayer 120 Weiss
126 Mikkilineni 120 Weiss
127 Landry 120 Weiss
128 Markle 120 Weiss
129 Bissot 121 Fripp
130 Atkinson 121 Fripp
131 Marlow 121 Fripp
132 Olson 121 Fripp
133 Mallin 122 Kaufling
134 Rogers 122 Kaufling
135 Gee 122 Kaufling
136 Philtanker 122 Kaufling
137 Ladwig 123 Vollman
138 Stiles 123 Vollman
139 Seo 123 Vollman
140 Patel 123 Vollman
141 Rajs 124 Mourgos
142 Davies 124 Mourgos
143 Matos 124 Mourgos
144 Vargas 124 Mourgos
145 Russell 100 King
146 Partners 100 King
147 Errazuriz 100 King
148 Cambrault 100 King
149 Zlotkey 100 King
150 Tucker 145 Russell
151 Bernstein 145 Russell
152 Hall 145 Russell
153 Olsen 145 Russell
154 Cambrault 145 Russell
155 Tuvault 145 Russell
156 King 146 Partners
157 Sully 146 Partners
158 McEwen 146 Partners
159 Smith 146 Partners
160 Doran 146 Partners
161 Sewall 146 Partners
162 Vishney 147 Errazuriz
163 Greene 147 Errazuriz
164 Marvins 147 Errazuriz
165 Lee 147 Errazuriz
166 Ande 147 Errazuriz
167 Banda 147 Errazuriz
168 Ozer 148 Cambrault
169 Bloom 148 Cambrault
170 Fox 148 Cambrault
171 Smith 148 Cambrault
172 Bates 148 Cambrault
173 Kumar 148 Cambrault
174 Abel 149 Zlotkey
175 Hutton 149 Zlotkey
176 Taylor 149 Zlotkey
177 Livingston 149 Zlotkey
178 Grant 149 Zlotkey
179 Johnson 149 Zlotkey
180 Taylor 120 Weiss
181 Fleaur 120 Weiss
182 Sullivan 120 Weiss
183 Geoni 120 Weiss
184 Sarchand 121 Fripp
185 Bull 121 Fripp
186 Dellinger 121 Fripp
187 Cabrio 121 Fripp
188 Chung 122 Kaufling
189 Dilly 122 Kaufling
190 Gates 122 Kaufling
191 Perkins 122 Kaufling
192 Bell 123 Vollman
193 Everett 123 Vollman
194 McCain 123 Vollman
195 Jones 123 Vollman
196 Walsh 124 Mourgos
197 Feeney 124 Mourgos
198 OConnell 124 Mourgos
199 Grant 124 Mourgos
200 Whalen 101 Kochhar
201 Hartstein 100 King
202 Fay 201 Hartstein
203 Mavris 101 Kochhar
204 Baer 101 Kochhar
205 Higgins 101 Kochhar
206 Gietz 205 Higgins
162 Vishney
133 Mallin
136 Philtanker
154 Cambrault
196 Walsh
104 Ernst
184 Sarchand
172 Bates
197 Feeney
150 Tucker
142 Davies
143 Matos
191 Perkins
119 Colmenares
200 Whalen
183 Geoni
180 Taylor
152 Hall
137 Ladwig
139 Seo
126 Mikkilineni
125 Nayer
170 Fox
175 Hutton
129 Bissot
163 Greene
105 Austin
176 Taylor
188 Chung
116 Baida
115 Khoo
144 Vargas
195 Jones
174 Abel
157 Sully
182 Sullivan
156 King
194 McCain
193 Everett
187 Cabrio
117 Tobias
179 Johnson
135 Gee
159 Smith
131 Marlow
190 Gates
169 Bloom
166 Ande
151 Bernstein
204 Baer
203 Mavris
160 Doran
155 Tuvault
107 Lorentz
185 Bull
128 Markle
134 Rogers
140 Patel
168 Ozer
178 Grant
141 Rajs
181 Fleaur
165 Lee
138 Stiles
173 Kumar
206 Gietz
164 Marvins
202 Fay
112 Urman
189 Dilly
110 Chen
153 Olsen
161 Sewall
186 Dellinger
109 Faviet
177 Livingston
198 OConnell
106 Pataballa
111 Sciarra
118 Himuro
132 Olson
192 Bell
113 Popp
171 Smith
127 Landry
167 Banda
130 Atkinson
158 McEwen
199 Grant
195 rows selected
In my opinion, the query with the right join confuses managers and employees, which is why it does not seem to return a clear answer. With the left join, you ask for "all employees and, if there is a related manager, also the manager". With the right join, you still get "all employees", whether or not they are managers, so the meaning of the "right side" is wrong.
Of course the table is intended to contain both types, but you will probably get a clearer picture by separating them better.
Say a manager is everybody who has no manager_id (which is the case if your table is not a deep tree). Then look at this modification:
SELECT emps.employee_id as "Employee", emps.last_name, mgr.employee_id as "Manager", mgr.last_name
FROM employees emps
RIGHT JOIN (SELECT * FROM employees WHERE manager_id IS NULL) mgr
ON emps.manager_id = mgr.employee_id
This way, the data basis for the right join is a proper selection of all managers. Each manager is shown even if no employee reports to them. That last case, however, does not occur with the kind of relation you have chosen.
Also, I fully agree with #Ameya Desphande that there is a second King, with ID 156, which is even more puzzling ;-)
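To see what the original RIGHT JOIN keeps, here is a small sketch in Python with an in-memory SQLite database and five rows from the result above. Two things are assumptions: King (100) is given a NULL manager_id, and the join is written as a LEFT JOIN with the tables swapped. That form returns the same rows as `employees emps RIGHT JOIN employees mgr`, and SQLite versions before 3.39 do not support RIGHT JOIN.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employees (employee_id INTEGER, last_name TEXT, manager_id INTEGER);
    INSERT INTO employees VALUES
        (100, 'King', NULL),
        (101, 'Kochhar', 100),
        (146, 'Partners', 100),
        (156, 'King', 146),
        (157, 'Sully', 146);
""")

# Equivalent to: FROM employees emps RIGHT JOIN employees mgr
#                ON emps.manager_id = mgr.employee_id
rows = conn.execute("""
    SELECT emps.employee_id, emps.last_name, mgr.employee_id, mgr.last_name
    FROM employees mgr
    LEFT JOIN employees emps ON emps.manager_id = mgr.employee_id
    ORDER BY emps.employee_id IS NULL, emps.employee_id, mgr.employee_id
""").fetchall()

for row in rows:
    print(row)
# (101, 'Kochhar', 100, 'King')
# (146, 'Partners', 100, 'King')
# (156, 'King', 146, 'Partners')
# (157, 'Sully', 146, 'Partners')
# (None, None, 101, 'Kochhar')
# (None, None, 156, 'King')
# (None, None, 157, 'Sully')
```

Every row on the mgr side is kept, so people who manage nobody come back with empty Employee columns. That would explain the trailing rows in the question's output that show only an ID and a name. The row (156, 'King', 146, 'Partners') is the second King, who reports to Partners.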