SQL command to find out how many players score how much

I have a table like this:
country  gender  player   score  year  ID
Germany  male    Michael  14     1990  1
Austria  male    Simon    13     1990  2
Germany  female  Mila     16     1990  3
Austria  female  Simona   15     1990  4
This is a table in the database. It covers 70 countries around the world, with player names and genders, and shows how many goals each player scored in each year. The years run from 1990 to 2015, so the table is large. I would like to know how many goals all female players and all male players from Germany scored from 2010 to 2015. In other words, I want the total score of German male players and the total score of German female players for every year from 2010 to 2015, using SQLite.
I am expecting this output:
country  gender  score  year
Germany  male    114    2010
Germany  female  113    2010
Germany  male    110    2011
Germany  female  111    2011
Germany  male    119    2012
Germany  female  114    2012
Germany  male    119    2013
Germany  female  114    2013
Germany  male    129    2014
Germany  female  103    2014
Germany  male    109    2015
Germany  female  104    2015

SELECT
country,
gender,
year,
SUM(score) AS score
FROM
<table_name>
WHERE
country ='Germany'
AND year between 2010 and 2015
GROUP BY
1, 2, 3
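This filters on the country and the years you are interested in, then sums the total score per group using GROUP BY. The positional references 1, 2, 3 stand for the first three columns of the SELECT list (country, gender, year).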
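To try the query without the real database, here is a minimal sketch using Python's built-in sqlite3 module. The table name scores, the column types and the sample rows are assumptions made up for illustration; the original post does not name its table.

```python
import sqlite3

# Hypothetical table name "scores"; rows below are invented sample data.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE scores (country TEXT, gender TEXT, player TEXT, "
    "score INTEGER, year INTEGER, id INTEGER)"
)
rows = [
    ("Germany", "male", "Michael", 14, 2010, 1),
    ("Germany", "male", "Jonas", 10, 2010, 2),
    ("Germany", "female", "Mila", 16, 2010, 3),
    ("Germany", "female", "Lena", 5, 2011, 4),
    ("Austria", "male", "Simon", 13, 2010, 5),   # filtered out: wrong country
    ("Germany", "male", "Michael", 9, 1990, 6),  # filtered out: wrong year
]
conn.executemany("INSERT INTO scores VALUES (?, ?, ?, ?, ?, ?)", rows)

# Same filter-then-aggregate shape as the answer, with the grouping
# columns spelled out and an ORDER BY added for a stable result order.
query = """
    SELECT country, gender, year, SUM(score) AS score
    FROM scores
    WHERE country = 'Germany'
      AND year BETWEEN 2010 AND 2015
    GROUP BY country, gender, year
    ORDER BY year, gender
"""
result = conn.execute(query).fetchall()
for row in result:
    print(row)
```

Each output row is one (country, gender, year) group with its summed score. Naming the grouping columns instead of using 1, 2, 3 is only a readability choice; both forms work in SQLite.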

Related

Pandas removing rows with incomplete time series in panel data

I have a dataframe along the lines of the below:
Country1 Country2 Year
1 Italy Greece 2000
2 Italy Greece 2001
3 Italy Greece 2002
4 Germany Italy 2000
5 Germany Italy 2002
6 Mexico Canada 2000
7 Mexico Canada 2001
8 Mexico Canada 2002
9 US France 2000
10 US France 2001
11 Greece Italy 2000
12 Greece Italy 2001
I want to keep only the rows in which there are observations for the entire time series (2000-2002). So, the end result would be:
Country1 Country2 Year
1 Italy Greece 2000
2 Italy Greece 2001
3 Italy Greece 2002
4 Mexico Canada 2000
5 Mexico Canada 2001
6 Mexico Canada 2002
One idea is to reshape with crosstab, test whether each row has no 0 values using DataFrame.ne with DataFrame.all, convert the resulting index to a DataFrame with MultiIndex.to_frame, and finally get the filtered rows with DataFrame.merge:
df1 = pd.crosstab([df['Country1'], df['Country2']], df['Year'])
df = df.merge(df1.index[df1.ne(0).all(axis=1)].to_frame(index=False))
print (df)
Country1 Country2 Year
0 Italy Greece 2000
1 Italy Greece 2001
2 Italy Greece 2002
3 Mexico Canada 2000
4 Mexico Canada 2001
5 Mexico Canada 2002
Or, if you need to test a specific range, you can compare sets in GroupBy.transform:
r = set(range(2000, 2003))
df = df[df.groupby(['Country1', 'Country2'])['Year'].transform(lambda x: set(x) == r)]
print (df)
Country1 Country2 Year
1 Italy Greece 2000
2 Italy Greece 2001
3 Italy Greece 2002
6 Mexico Canada 2000
7 Mexico Canada 2001
8 Mexico Canada 2002
Another option is to pivot the data, drop rows with nulls and reshape back; this only works if the combination of Country* and Year is unique (in the sample data it is):
# keyword arguments are used because recent pandas versions no longer accept positional ones in pivot
(df.assign(dummy = 1)
.pivot(index=['Country1', 'Country2'], columns='Year')
.dropna()
.stack()
.drop(columns='dummy')
.reset_index()
)
Country1 Country2 Year
0 Italy Greece 2000
1 Italy Greece 2001
2 Italy Greece 2002
3 Mexico Canada 2000
4 Mexico Canada 2001
5 Mexico Canada 2002

Adding rows in a table from data that is not in a column

I'm trying to create a table to add all Medals won by the participant countries in the Olympics.
I scraped the data from Wikipedia and have something similar to this:
Year  Country_Name  Host_city    Host_Country   Gold  Silver  Bronze
1986  146           Los Angeles  United States  41    32      30
1986  67            Los Angeles  United States  12    12      12
And so on
I double-checked the data for some years, and it seems very accurate. The Country_Name has an ID because I have a Country_ID table that I created and updated the names with the ID:
Country_ID  Country_Name
1986        1
1986        2
So far so good. Now I want to create a new table where I'll have all countries in a specific year and the total medals for that country. I managed to easily do that for countries that participated in an edition, here's an example for the 1896 edition:
INSERT INTO Cumultative_Medals_by_Year(Country_ID, Year, Culmutative_Gold, Culmutative_Silver, Culmutative_Bronze, Total_Medals)
SELECT a.Country_Name, a.Year, SUM(a.Gold) As Cumultative_Gold, SUM(a.Silver) As Cumultative_Silver, SUM(a.Bronze) As Cumultative_Bronze, SUM(a.Gold) + SUM(a.Silver) + SUM(a.Bronze) AS Total_Medals
FROM Country_Medals a
Where a.Year >= 1896 AND Year < 1900
Group By a.Country_Name, a.Year
And I'll have this table:
Country_ID  Year  Cumultative_Gold  Cumultative_Silver  Cumultative_Bronze  Total_Medals
6           1986  2                 0                   0                   5
7           1986  2                 1                   2                   5
35          1986  1                 2                   3                   6
46          1986  5                 4                   2                   11
49          1986  6                 5                   2                   13
51          1986  2                 3                   2                   7
52          1986  10                18                  19                  47
58          1986  2                 1                   3                   6
85          1986  1                 0                   1                   2
131         1986  1                 2                   0                   3
146         1986  11                7                   2                   20
To add the other editions I just have to edit the dates, "Where a.Year >= 1900 AND Year < 1904", for example.
INSERT INTO Cumultative_Medals_by_Year(Country_ID, Year, Culmutative_Gold, Culmutative_Silver, Culmutative_Bronze, Total_Medals)
SELECT a.Country_Name, a.Year, SUM(a.Gold) As Cumultative_Gold, SUM(a.Silver) As Cumultative_Silver, SUM(a.Bronze) As Cumultative_Bronze, SUM(a.Gold) + SUM(a.Silver) + SUM(a.Bronze) AS Total_Medals
FROM Country_Medals a
Where a.Year >= 1900 AND Year < 1904
Group By a.Country_Name, a.Year
And the table will grow.
But I'd like to also add all the other countries for the year 1896. This way I'll have a full record of all countries. So for example, you see that Country 1 has no medals in the 1896 Olympic edition, but I'd like to also add it there, even if the sum becomes NULL (where I'll update with a 0).
Why do I want that? I'd like to do an animated bar chart race, and with the data I have, some countries drop out of the race. For example, the US didn't participate in the 1980 Olympics, so for a brief moment the bar for the US disappears from the chart, only to return in 1984 (when it participated again). Another example is the Soviet Union: even though it no longer participates, it is the participant with the second most medals won (behind only the US), but since the country has no participation after 1988, its bar disappears after that year. Keeping a record of medals for all countries in all editions would prevent that from happening.
I'm pretty sure there are lots of countries that have won medals that were not around in 1896. But if you want a row for every country and every year, then generate the rows you want using a cross join. Then join in the available information with a left join:
select c.Country_Name, y.Year,
SUM(cm.Gold) As Cumulative_Gold,
SUM(cm.Silver) As Cumulative_Silver,
SUM(cm.Bronze) As Cumulative_Bronze,
COALESCE(SUM(cm.Gold), 0) + COALESCE(SUM(cm.Silver), 0) + COALESCE(SUM(cm.Bronze), 0) AS Total_Medals
from (select distinct year from Country_Medals) y cross join
(select distinct country_name from country_medals) c left join
country_medals cm
on cm.year = y.year and
cm.country_name = c.country_name
group By c.Country_Name, y.Year
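A minimal sketch of the cross join idea, run against an in-memory SQLite database from Python. The two sample rows and the column types are assumptions made up for illustration, and wrapping each SUM in COALESCE is a small variation so that missing combinations show 0 instead of NULL. Like the query above, it sums per edition rather than building a running total.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Country_Medals (Year INTEGER, Country_Name INTEGER, "
    "Gold INTEGER, Silver INTEGER, Bronze INTEGER)"
)
# Invented rows: country 1 wins medals only in 1896, country 2 only in 1900.
conn.executemany(
    "INSERT INTO Country_Medals VALUES (?, ?, ?, ?, ?)",
    [(1896, 1, 2, 1, 0), (1900, 2, 1, 0, 3)],
)

# CROSS JOIN builds every (country, year) pair; LEFT JOIN attaches medal
# rows where they exist and leaves NULLs (turned into 0) where they do not.
query = """
    SELECT c.Country_Name, y.Year,
           COALESCE(SUM(cm.Gold), 0)   AS Gold,
           COALESCE(SUM(cm.Silver), 0) AS Silver,
           COALESCE(SUM(cm.Bronze), 0) AS Bronze
    FROM (SELECT DISTINCT Year FROM Country_Medals) y
    CROSS JOIN (SELECT DISTINCT Country_Name FROM Country_Medals) c
    LEFT JOIN Country_Medals cm
           ON cm.Year = y.Year AND cm.Country_Name = c.Country_Name
    GROUP BY c.Country_Name, y.Year
    ORDER BY c.Country_Name, y.Year
"""
grid = conn.execute(query).fetchall()
for row in grid:
    print(row)
```

With two countries and two editions the result has four rows, including the two combinations that have no medal record at all.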

Update a table column with some values in SQL Server

I need some help or guidance on this.
I have this situation where I don't have a primary key in the first table:
County,
Gender,
EconomyName,
HighestEducation,
HighestEducationCount,
EconomyCount
In a second table, I have
County,
Gender,
HighestEducation,
HighestEducationCount
I want to update the first table (HighestEducation, HighestEducationCount) from the values in the second table.
How can I do this without a key? Here is the sample data; EconStat and EducStatu are blank in table 1.
County Year Gender AgeDetails EconStat EducStatu AgeCnt EconCnt EduCnt
Carlow 2006 Male Total persons 20193 0 0
Carlow 2006 Male Total whose 17215 0 0
Carlow 2006 Male Under 15 years 2179 0 0
Carlow 2006 Male 15 years 1366 0 0
Carlow 2006 Male 16 years 2369 0 0
Carlow 2006 Male 17 years 1767 0 0
Carlow 2006 Male 18 years 2485 0 0
In the second table
County Year Gender EducStatu EduCnt
Carlow 2006 Male Total education ceased and not ceased 20193
Carlow 2006 Male Total whose full-time education has ceased 17215
Carlow 2006 Male Primary (incl. no formal education) 3536
Carlow 2006 Male Lower secondary 4408
Note: there is always less data in the second table.
The result should look like this:
County Year Gender AgeDetails EconStat EducStatu AgeCnt EconCnt EduCnt
Carlow 2006 Male Total persons Total education ceased and not ceased 20193 0 20193
Carlow 2006 Male Total whose Total whose full-time education has ceased 17215 0 17215
Carlow 2006 Male Under 15 years Primary (incl. no formal education) 2179 0 3536
Carlow 2006 Male 15 years Lower secondary 1366 0 4408
Carlow 2006 Male 16 years 2369 0 0
Carlow 2006 Male 17 years 1767 0 0
Carlow 2006 Male 18 years 2485 0 0
The combination of the County, Year and Gender columns is not unique. In your sample data, they have the exact same set of values for all rows in both tables. So you cannot do any operations based on them.
In your second table, you are left with two columns: EducStatu and EduCnt. From the data, EducStatu is the column that differentiates rows in the second table (in addition to the combination of the first three columns in that table). But you have mentioned that EducStatu is blank in the first table. So you don't have any link (through columns) between the first and second tables. From the data you have given here, there is no way to programmatically map the data in the second table onto meaningful data in the first table. Unless you have more columns in both tables, you are out of luck.
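To illustrate why the non-unique columns cannot serve as a key, here is a small sketch using Python's sqlite3 module rather than SQL Server. The table names t1 and t2 are hypothetical, and only a few of the sample rows are included. Joining on County, Year and Gender pairs every row of one table with every row of the other, so there is no single matching row to update from.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table names; only the columns needed for the join are kept.
conn.execute("CREATE TABLE t1 (County TEXT, Year INTEGER, Gender TEXT, AgeDetails TEXT)")
conn.execute("CREATE TABLE t2 (County TEXT, Year INTEGER, Gender TEXT, EducStatu TEXT)")
conn.executemany("INSERT INTO t1 VALUES (?, ?, ?, ?)", [
    ("Carlow", 2006, "Male", "Total persons"),
    ("Carlow", 2006, "Male", "Under 15 years"),
    ("Carlow", 2006, "Male", "15 years"),
])
conn.executemany("INSERT INTO t2 VALUES (?, ?, ?, ?)", [
    ("Carlow", 2006, "Male", "Lower secondary"),
    ("Carlow", 2006, "Male", "Primary (incl. no formal education)"),
])

# Every t1 row matches every t2 row because the join columns are identical.
pairs = conn.execute("""
    SELECT t1.AgeDetails, t2.EducStatu
    FROM t1
    JOIN t2 ON t1.County = t2.County
           AND t1.Year = t2.Year
           AND t1.Gender = t2.Gender
""").fetchall()
print(len(pairs))  # 3 rows x 2 rows = 6 combinations
```

Three rows joined to two rows yield six pairs, which is the ambiguity described above.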

List the Id who appeared once only in Relational Algebra

Let's say there's a table called Winner, with 3 attributes: Name, Gender and Id.
Name Gender Id
Kevin Male 8
Kevin Male 8
Benny Male 31
Jenny Female 7
Louie Male 4
Peter Male 11
Kevin Male 2
Jenny Female 7
Jenny Female 7
Chris Male 23
Louie Female 14
Some people are actually two different persons with the same name, and some share a name but have a different gender, so the Id is the unique value that identifies each person. If I want to list all the Ids that appear only once in the list, I am thinking of doing something like this:
Am I expressing it correctly?
I don't know what your formula is trying to say, but in SQL you can achieve the result you want with a GROUP BY query:
SELECT Id, COUNT(Id) AS idCount
FROM Winner
GROUP BY Id
HAVING COUNT(Id) = 1
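As a quick check, the GROUP BY query can be run on the sample Winner rows from the question with Python's sqlite3 module. This is only a sketch: the column types are assumptions, and an ORDER BY is added to make the output order predictable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Winner (Name TEXT, Gender TEXT, Id INTEGER)")
# Sample rows copied from the question.
conn.executemany("INSERT INTO Winner VALUES (?, ?, ?)", [
    ("Kevin", "Male", 8), ("Kevin", "Male", 8), ("Benny", "Male", 31),
    ("Jenny", "Female", 7), ("Louie", "Male", 4), ("Peter", "Male", 11),
    ("Kevin", "Male", 2), ("Jenny", "Female", 7), ("Jenny", "Female", 7),
    ("Chris", "Male", 23), ("Louie", "Female", 14),
])

# Keep only the Ids whose group contains exactly one row.
once_ids = [row[0] for row in conn.execute("""
    SELECT Id
    FROM Winner
    GROUP BY Id
    HAVING COUNT(Id) = 1
    ORDER BY Id
""")]
print(once_ids)  # [2, 4, 11, 14, 23, 31]
```

Ids 8 and 7 are dropped because they appear two and three times respectively.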

Joining a Table with Itself with multiple WHERE statements

Long time reader, first time poster.
I'm trying to consolidate a table I have to show the rate of sold goods getting lost in transit. In this table, we have four kinds of products, three countries of origin, three transit countries (where the goods are first shipped to before being passed to customers) and three destination countries. The table is as follows.
Status Product Count Origin Transit Destination
--------------------------------------------------------------------
Delivered Shoes 100 Germany France USA
Delivered Books 50 Germany France USA
Delivered Jackets 75 Germany France USA
Delivered DVDS 30 Germany France USA
Not Delivered Shoes 7 Germany France USA
Not Delivered Books 3 Germany France USA
Not Delivered Jackets 5 Germany France USA
Not Delivered DVDS 1 Germany France USA
Delivered Shoes 300 Poland Netherlands Canada
Delivered Books 80 Poland Netherlands Canada
Delivered Jackets 25 Poland Netherlands Canada
Delivered DVDS 90 Poland Netherlands Canada
Not Delivered Shoes 17 Poland Netherlands Canada
Not Delivered Books 13 Poland Netherlands Canada
Not Delivered Jackets 1 Poland Netherlands Canada
Delivered Shoes 250 Spain Ireland UK
Delivered Books 20 Spain Ireland UK
Delivered Jackets 150 Spain Ireland UK
Delivered DVDS 60 Spain Ireland UK
Not Delivered Shoes 19 Spain Ireland UK
Not Delivered Books 8 Spain Ireland UK
Not Delivered Jackets 8 Spain Ireland UK
Not Delivered DVDS 10 Spain Ireland UK
I would like to create a new table that shows the count of goods delivered and not delivered in one row, like this.
Product Delivered Not_Delivered Origin Transit Destination
Shoes 100 7 Germany France USA
Books 50 3 Germany France USA
Jackets 75 5 Germany France USA
DVDS 30 1 Germany France USA
Shoes 300 17 Poland Netherlands Canada
Books 80 13 Poland Netherlands Canada
Jackets 25 1 Poland Netherlands Canada
DVDS 90 0 Poland Netherlands Canada
Shoes 250 19 Spain Ireland UK
Books 20 8 Spain Ireland UK
Jackets 150 8 Spain Ireland UK
DVDS 60 10 Spain Ireland UK
I've had a look at some other posts and so far I haven't found exactly what I'm looking for. Perhaps the issue here is that there will be multiple WHERE statements in the code to ensure that I don't group all shoes together, or all country groups.
Is this possible with SQL?
Something like this?
select
product
,sum(case when status = 'Delivered' then count else 0 end) as delivered
,sum(case when status = 'Not Delivered' then count else 0 end) as not_delivered
,origin
,transit
,destination
from table
group by
product
,origin
,transit
,destination
This is rather easy; instead of one line per Product, Origin, Transit, Destination and Status, you want one result line per Product, Origin, Transit and Destination only. So group by these four columns and aggregate conditionally:
select
product, origin, transit, destination,
sum(case when status = 'Delivered' then "count" else 0 end) as delivered,
sum(case when status = 'Not Delivered' then "count" else 0 end) as not_delivered
from mytable
group by product, origin, transit, destination;
BTW: It is not a good idea to use a keyword for a column name. I used double quotes to refer to your column count, which is standard SQL, but I don't know if it works in Google BigQuery. Maybe it must be "Count" rather than "count", or something else entirely.
SELECT
product, origin, transit, destination,
SUM([count] * (status = 'Delivered')) AS delivered,
SUM([count] * (status = 'Not Delivered')) AS not_delivered
FROM mytable
GROUP BY 1, 2, 3, 4
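The conditional aggregation used in these answers can be tried on a few of the sample rows with Python's sqlite3 module. This is a sketch under assumptions: the table name shipments and the column types are made up, and SQLite stands in for whichever database is actually used. The Poland DVDS row deliberately has no 'Not Delivered' counterpart, as in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table name; "count" is quoted because it clashes with a function name.
conn.execute(
    'CREATE TABLE shipments (status TEXT, product TEXT, "count" INTEGER, '
    "origin TEXT, transit TEXT, destination TEXT)"
)
conn.executemany("INSERT INTO shipments VALUES (?, ?, ?, ?, ?, ?)", [
    ("Delivered", "Shoes", 100, "Germany", "France", "USA"),
    ("Not Delivered", "Shoes", 7, "Germany", "France", "USA"),
    ("Delivered", "DVDS", 90, "Poland", "Netherlands", "Canada"),
])

# One output row per product/route; each SUM only counts rows of one status.
query = """
    SELECT product, origin, transit, destination,
           SUM(CASE WHEN status = 'Delivered' THEN "count" ELSE 0 END) AS delivered,
           SUM(CASE WHEN status = 'Not Delivered' THEN "count" ELSE 0 END) AS not_delivered
    FROM shipments
    GROUP BY product, origin, transit, destination
    ORDER BY origin, product
"""
summary = conn.execute(query).fetchall()
for row in summary:
    print(row)
```

The ELSE 0 branch is what makes the missing 'Not Delivered' row for Poland DVDS come out as 0 rather than NULL, matching the desired output in the question.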