Joining a Table with Itself with multiple WHERE statements - sql

Long time reader, first time poster.
I'm trying to consolidate a table I have on the rate of sold goods getting lost in transit. In this table, we have four kinds of products, three countries of origin, three transit countries (where the goods are first shipped before being passed on to customers) and three destination countries. The table is as follows:
Status         Product   Count  Origin   Transit      Destination
--------------------------------------------------------------------
Delivered      Shoes     100    Germany  France       USA
Delivered      Books     50     Germany  France       USA
Delivered      Jackets   75     Germany  France       USA
Delivered      DVDS      30     Germany  France       USA
Not Delivered  Shoes     7      Germany  France       USA
Not Delivered  Books     3      Germany  France       USA
Not Delivered  Jackets   5      Germany  France       USA
Not Delivered  DVDS      1      Germany  France       USA
Delivered      Shoes     300    Poland   Netherlands  Canada
Delivered      Books     80     Poland   Netherlands  Canada
Delivered      Jackets   25     Poland   Netherlands  Canada
Delivered      DVDS      90     Poland   Netherlands  Canada
Not Delivered  Shoes     17     Poland   Netherlands  Canada
Not Delivered  Books     13     Poland   Netherlands  Canada
Not Delivered  Jackets   1      Poland   Netherlands  Canada
Delivered      Shoes     250    Spain    Ireland      UK
Delivered      Books     20     Spain    Ireland      UK
Delivered      Jackets   150    Spain    Ireland      UK
Delivered      DVDS      60     Spain    Ireland      UK
Not Delivered  Shoes     19     Spain    Ireland      UK
Not Delivered  Books     8      Spain    Ireland      UK
Not Delivered  Jackets   8      Spain    Ireland      UK
Not Delivered  DVDS      10     Spain    Ireland      UK
I would like to create a new table that shows the count of goods delivered and not delivered in one row, like this.
Product   Delivered  Not_Delivered  Origin   Transit      Destination
--------------------------------------------------------------------
Shoes     100        7              Germany  France       USA
Books     50         3              Germany  France       USA
Jackets   75         5              Germany  France       USA
DVDS      30         1              Germany  France       USA
Shoes     300        17             Poland   Netherlands  Canada
Books     80         13             Poland   Netherlands  Canada
Jackets   25         1              Poland   Netherlands  Canada
DVDS      90         0              Poland   Netherlands  Canada
Shoes     250        19             Spain    Ireland      UK
Books     20         8              Spain    Ireland      UK
Jackets   150        8              Spain    Ireland      UK
DVDS      60         10             Spain    Ireland      UK
I've had a look at some other posts, and so far I haven't found exactly what I'm looking for. Perhaps the issue here is that there will be multiple WHERE statements in the code, to ensure that I don't group all shoes together, or all country groups.
Is this possible with SQL?

Something like this?
select
product
,sum(case when status = 'Delivered' then count else 0 end) as delivered
,sum(case when status = 'Not Delivered' then count else 0 end) as not_delivered
,origin
,transit
,destination
from mytable -- "table" is a reserved word; use your actual table name here
group by
product
,origin
,transit
,destination

This is rather easy; instead of one line per Product, Origin, Transit, Destination and Status, you want one result line per Product, Origin, Transit and Destination only. So group by these four columns and aggregate conditionally:
select
product, origin, transit, destination,
sum(case when status = 'Delivered' then "count" else 0 end) as delivered,
sum(case when status = 'Not Delivered' then "count" else 0 end) as not_delivered
from mytable
group by product, origin, transit, destination;
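To check the conditional-aggregation query end to end, here is a minimal sketch that runs it against an in-memory SQLite database through Python's built-in `sqlite3` module. The table name `mytable` and the column types are assumptions, and only the Poland rows from the question are loaded: they show that a product with no "Not Delivered" row (DVDS) comes out as 0 thanks to the `else 0` branch.

```python
import sqlite3

# Subset of the question's data: the Poland -> Netherlands -> Canada rows.
# There is deliberately no "Not Delivered" row for DVDS.
ROWS = [
    ("Delivered", "Shoes", 300, "Poland", "Netherlands", "Canada"),
    ("Delivered", "Books", 80, "Poland", "Netherlands", "Canada"),
    ("Delivered", "Jackets", 25, "Poland", "Netherlands", "Canada"),
    ("Delivered", "DVDS", 90, "Poland", "Netherlands", "Canada"),
    ("Not Delivered", "Shoes", 17, "Poland", "Netherlands", "Canada"),
    ("Not Delivered", "Books", 13, "Poland", "Netherlands", "Canada"),
    ("Not Delivered", "Jackets", 1, "Poland", "Netherlands", "Canada"),
]

PIVOT_SQL = """
select
  product, origin, transit, destination,
  sum(case when status = 'Delivered' then "count" else 0 end) as delivered,
  sum(case when status = 'Not Delivered' then "count" else 0 end) as not_delivered
from mytable
group by product, origin, transit, destination
order by product
"""


def pivot_delivery_counts(rows):
    """Load rows into a throwaway SQLite table and run the conditional aggregation."""
    con = sqlite3.connect(":memory:")
    try:
        # "count" is quoted because COUNT is also the name of an aggregate function.
        con.execute(
            'create table mytable (status text, product text, "count" integer, '
            "origin text, transit text, destination text)"
        )
        con.executemany("insert into mytable values (?, ?, ?, ?, ?, ?)", rows)
        return con.execute(PIVOT_SQL).fetchall()
    finally:
        con.close()


for row in pivot_delivery_counts(ROWS):
    print(row)
```

Each output row holds one product/route with its delivered and not-delivered totals side by side; the `order by` is only there to make the output stable.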
BTW: It is not a good idea to use a keyword for a column name. I used double quotes around your column count, which is standard SQL, but I don't know whether that works in Google BigQuery. In BigQuery, double quotes delimit string literals and identifiers are quoted with backticks, so there it would probably have to be `count` in backticks rather than "count".

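-- Assumes a dialect where a comparison evaluates to 1 or 0 and [count] quotes an identifier (e.g. SQLite)
SELECT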
product, origin, transit, destination,
SUM([count] * (status = 'Delivered')) AS delivered,
SUM([count] * (status = 'Not Delivered')) AS not_delivered
FROM mytable
GROUP BY 1, 2, 3, 4

Related

sql command to find out how many players score how much

I have a table like this:
country  gender  player   score  year  ID
-------------------------------------------
Germany  male    Michael  14     1990  1
Austria  male    Simon    13     1990  2
Germany  female  Mila     16     1990  3
Austria  female  Simona   15     1990  4
This is a table in the database. It covers 70 countries around the world, with player names and genders, and shows how many goals each player scored in which year. The years go from 1990 to 2015, so the table is large. Now I would like to know how many goals all female players and all male players from Germany scored from 2010 to 2015. That is, I would like the total score of German male players and the total score of German female players for every year from 2010 to 2015, using SQLite.
I expect this output:
country  gender  score  year
------------------------------
Germany  male    114    2010
Germany  female  113    2010
Germany  male    110    2011
Germany  female  111    2011
Germany  male    119    2012
Germany  female  114    2012
Germany  male    119    2013
Germany  female  114    2013
Germany  male    129    2014
Germany  female  103    2014
Germany  male    109    2015
Germany  female  104    2015
SELECT
country,
gender,
year,
SUM(score) AS score
FROM
<table_name>
WHERE
country ='Germany'
AND year between 2010 and 2015
GROUP BY
1, 2, 3
This filters on the country and the years you are interested in, then sums up the total score per country, gender and year with GROUP BY.
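Here is a minimal sketch that runs the same query in SQLite through Python's `sqlite3` module. The table name `players` stands in for `<table_name>`, the ID column is omitted, and the rows are made-up sample data (not the question's real table), chosen so that one row is dropped by the country filter and one by the year filter.

```python
import sqlite3

# Made-up sample rows: (country, gender, player, score, year).
ROWS = [
    ("Germany", "male", "Michael", 14, 2010),
    ("Germany", "male", "Jonas", 9, 2010),
    ("Germany", "female", "Mila", 16, 2010),
    ("Germany", "female", "Lena", 5, 2011),
    ("Austria", "male", "Simon", 13, 2010),    # excluded: wrong country
    ("Germany", "male", "Michael", 20, 2009),  # excluded: before 2010
]

QUERY = """
SELECT country, gender, year, SUM(score) AS score
FROM players
WHERE country = 'Germany' AND year BETWEEN 2010 AND 2015
GROUP BY 1, 2, 3            -- positional: country, gender, year
ORDER BY year, gender DESC  -- 'male' sorts after 'female', so DESC lists male first
"""


def yearly_totals(rows):
    """Create a throwaway SQLite table from rows and return the per-year totals."""
    con = sqlite3.connect(":memory:")
    try:
        con.execute(
            "CREATE TABLE players "
            "(country TEXT, gender TEXT, player TEXT, score INTEGER, year INTEGER)"
        )
        con.executemany("INSERT INTO players VALUES (?, ?, ?, ?, ?)", rows)
        return con.execute(QUERY).fetchall()
    finally:
        con.close()


for row in yearly_totals(ROWS):
    print(row)
```

The ORDER BY is an addition for readable output; the original query returns the same groups without it, just in no guaranteed order.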

Pandas removing rows with incomplete time series in panel data

I have a dataframe along the lines of the below:
Country1 Country2 Year
1 Italy Greece 2000
2 Italy Greece 2001
3 Italy Greece 2002
4 Germany Italy 2000
5 Germany Italy 2002
6 Mexico Canada 2000
7 Mexico Canada 2001
8 Mexico Canada 2002
9 US France 2000
10 US France 2001
11 Greece Italy 2000
12 Greece Italy 2001
I want to keep only the rows in which there are observations for the entire time series (2000-2002). So, the end result would be:
Country1 Country2 Year
1 Italy Greece 2000
2 Italy Greece 2001
3 Italy Greece 2002
4 Mexico Canada 2000
5 Mexico Canada 2001
6 Mexico Canada 2002
One idea is to reshape with crosstab, test which rows contain no 0 values with DataFrame.ne and DataFrame.all, convert the matching index to a DataFrame with MultiIndex.to_frame, and finally keep the matching rows with DataFrame.merge:
import pandas as pd

df1 = pd.crosstab([df['Country1'], df['Country2']], df['Year'])
df = df.merge(df1.index[df1.ne(0).all(axis=1)].to_frame(index=False))
print(df)
Country1 Country2 Year
0 Italy Greece 2000
1 Italy Greece 2001
2 Italy Greece 2002
3 Mexico Canada 2000
4 Mexico Canada 2001
5 Mexico Canada 2002
Or, if you need to test a specific range, you can compare sets in GroupBy.transform:
r = set(range(2000, 2003))
df = df[df.groupby(['Country1', 'Country2'])['Year'].transform(lambda x: set(x) == r)]
print(df)
Country1 Country2 Year
1 Italy Greece 2000
2 Italy Greece 2001
3 Italy Greece 2002
6 Mexico Canada 2000
7 Mexico Canada 2001
8 Mexico Canada 2002
One option is to pivot the data, drop rows containing nulls and reshape back; this only works if each combination of Country1, Country2 and Year is unique (in the sample data it is):
(df.assign(dummy = 1)
   # pandas >= 2.0 requires keyword arguments for pivot
   .pivot(index=['Country1', 'Country2'], columns='Year')
   .dropna()
   .stack()
   .drop(columns='dummy')
   .reset_index()
)
Country1 Country2 Year
0 Italy Greece 2000
1 Italy Greece 2001
2 Italy Greece 2002
3 Mexico Canada 2000
4 Mexico Canada 2001
5 Mexico Canada 2002
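A further option, not taken from the answers above, is GroupBy.filter, which keeps or drops whole groups based on a function that returns a single boolean per group. This minimal sketch rebuilds the sample frame by hand (so it has a default 0-based index) and assumes the required range is 2000-2002.

```python
import pandas as pd

df = pd.DataFrame({
    "Country1": ["Italy", "Italy", "Italy", "Germany", "Germany", "Mexico",
                 "Mexico", "Mexico", "US", "US", "Greece", "Greece"],
    "Country2": ["Greece", "Greece", "Greece", "Italy", "Italy", "Canada",
                 "Canada", "Canada", "France", "France", "Italy", "Italy"],
    "Year": [2000, 2001, 2002, 2000, 2002, 2000,
             2001, 2002, 2000, 2001, 2000, 2001],
})

required_years = set(range(2000, 2003))  # {2000, 2001, 2002}

# Keep a (Country1, Country2) group only if its years cover every required year.
complete = df.groupby(["Country1", "Country2"]).filter(
    lambda g: required_years.issubset(g["Year"])
)
print(complete.reset_index(drop=True))
```

Using `issubset` rather than set equality (as in the transform answer) means a pair that also has years outside 2000-2002 is still kept; switch to `==` if extra years should disqualify a pair.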

sql select 1 item from list

I want to select a column from a table whose values can be referenced many times by another table.
select t1.name
from ccp.ENTITIES t1
Non
Albania
Australia
China
Czech Republic
Egypt
Germany
Greece
Group
Hungary
India
Ireland
Italy
Luxembourg
Malaysia
Malta
Netherlands
Portugal
Romania
Spain
Turkey
UK
US
This gives me a list of names, and for each of them I want one row from another table.
v_networks_by_lm holds records with the column name (matching t1.name) and network. I want the network column only once for each item in the list; v_networks_by_lm can hold many rows per t1.name.
entity name
a Spain
b Spain
c Spain
d Spain
e Spain
f Spain
g Spain
h Germany
i Germany
j Germany
k Germany
l Germany
m Germany
n Germany
o Germany
p UK
q Germany
r Spain
s Spain
t Portugal
u Portugal
v Portugal
q Portugal
From the above data, which is in v_networks_by_lm, I only want each name returned once, with any value of entity. But I want to pick the names from ENTITIES, as that list can be dynamic.
I think aggregation does what you want:
SELECT MAX(n.network) as network, e.name
FROM ccp.ENTITIES e JOIN
ccp.v_networks_by_lm n
ON n.name = e.name
GROUP BY e.name;
Sounds like you want a subquery to get the single instance of name from the table, and then you do the join against entities.
Select sub.one_of_entity_values, sub.name
from ccp.entities e
inner join (
select max(entity) as one_of_entity_values, name
from v_networks_by_lm
group by name) sub on e.name = sub.name
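To see that the subquery returns exactly one row per name, here is a minimal sketch in SQLite via Python's `sqlite3`. The tables are created without the `ccp` schema, the column names `entity` and `name` follow the question's sample, and the short ENTITIES list is an assumption (a subset of the names from the question). Italy, which appears in ENTITIES but has no rows in the sample network data, is dropped by the inner join.

```python
import sqlite3

# Assumed subset of ccp.ENTITIES.
ENTITY_NAMES = ["Germany", "Portugal", "Spain", "UK", "Italy"]

# The question's v_networks_by_lm sample as (entity, name) pairs.
NETWORK_ROWS = (
    [(c, "Spain") for c in "abcdefg"]
    + [(c, "Germany") for c in "hijklmno"]
    + [("p", "UK"), ("q", "Germany"), ("r", "Spain"), ("s", "Spain")]
    + [(c, "Portugal") for c in "tuvq"]
)

QUERY = """
SELECT sub.one_of_entity_values, sub.name
FROM entities e
INNER JOIN (
    -- Collapse v_networks_by_lm to one row per name; MAX picks one entity value.
    SELECT MAX(entity) AS one_of_entity_values, name
    FROM v_networks_by_lm
    GROUP BY name
) sub ON e.name = sub.name
ORDER BY sub.name
"""


def one_entity_per_name():
    """Build both tables in memory and return one (entity, name) row per name."""
    con = sqlite3.connect(":memory:")
    try:
        con.execute("CREATE TABLE entities (name TEXT)")
        con.execute("CREATE TABLE v_networks_by_lm (entity TEXT, name TEXT)")
        con.executemany("INSERT INTO entities VALUES (?)", [(n,) for n in ENTITY_NAMES])
        con.executemany("INSERT INTO v_networks_by_lm VALUES (?, ?)", NETWORK_ROWS)
        return con.execute(QUERY).fetchall()
    finally:
        con.close()


for row in one_entity_per_name():
    print(row)
```

MAX is just one deterministic way to pick "any value"; MIN would work equally well.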

Same-table Tree Table Query in SQL Server

I've searched but found nothing that could help.
I have the following table in a SQL Server 2005 database:
Parent Child Value
---- -------- ---------
America Mexico 8
America Canada 1
Asia Japan 5
Asia Korea 7
Europe Spain 0
Europe Italy 2
Africa Zimbabwe 1
Mexico Baja California 0
America USA 3
USA California 1
USA Texas 2
Parent and Child together form the primary key; Value is not important (IMO). I would like to create a view that results in something like this:
Parent Child Value
---- -------- ---------
America USA 3
USA California 1
USA Texas 2
I would search for America, and the result will give back every nested child there is, recursively, no matter how many it has, since I could include cities, localities, etc.
What I need is similar to what some call a BOM explosion.
Here is how you can do it:
with cte as (
select parent, child
from t
union all
select cte.parent, t.child
from cte join
t
on cte.child = t.parent
)
select cte.*
from cte
where parent = 'America';
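The CTE above returns (starting parent, descendant) pairs. If you instead want the original Parent/Child/Value rows of a subtree, as in the desired output, one variation is to filter in the anchor member and walk down from there. This is a minimal sketch in SQLite via Python's `sqlite3`, using the question's rows and the table name `t` from the answer; note that SQLite (like PostgreSQL) spells it `WITH RECURSIVE`, while SQL Server uses plain `WITH` as above.

```python
import sqlite3

ROWS = [
    ("America", "Mexico", 8), ("America", "Canada", 1),
    ("Asia", "Japan", 5), ("Asia", "Korea", 7),
    ("Europe", "Spain", 0), ("Europe", "Italy", 2),
    ("Africa", "Zimbabwe", 1), ("Mexico", "Baja California", 0),
    ("America", "USA", 3), ("USA", "California", 1), ("USA", "Texas", 2),
]

SUBTREE_SQL = """
WITH RECURSIVE subtree(parent, child, value) AS (
    -- Anchor member: the direct children of the requested node.
    SELECT parent, child, value FROM t WHERE parent = ?
    UNION ALL
    -- Recursive member: rows whose parent was a child found in the previous step.
    SELECT t.parent, t.child, t.value
    FROM t JOIN subtree ON t.parent = subtree.child
)
SELECT parent, child, value FROM subtree
"""


def subtree(rows, root):
    """Return every (parent, child, value) row below root, at any depth."""
    con = sqlite3.connect(":memory:")
    try:
        con.execute(
            "CREATE TABLE t (parent TEXT, child TEXT, value INTEGER, "
            "PRIMARY KEY (parent, child))"
        )
        con.executemany("INSERT INTO t VALUES (?, ?, ?)", rows)
        return con.execute(SUBTREE_SQL, (root,)).fetchall()
    finally:
        con.close()


for row in subtree(ROWS, "America"):
    print(row)
```

Because the recursion uses UNION ALL, this assumes the hierarchy has no cycles; a cycle would make the query keep producing rows.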

Sams Teach Yourself SQL in 10 minutes - Question about GROUP BY

I read the book "Sams Teach Yourself SQL in 10 Minutes, Third Edition", and in Lesson 10 "Grouping Data", section "Creating Groups", I can't understand the following:
"Aside from the aggregate calculations statements, every column in your SELECT statement must be present in the GROUP BY clause."
Why? I tried this and I think that it is not true.
For example, consider a table 'World' with the columns 'continent', 'country', 'population'.
SELECT continent, country
FROM World
GROUP BY continent;
According to the book, this should lead to an error, right? But it doesn't. I can group my data by continent (so the results have 7 continents), with a random country name next to each continent.
Like this:
continent country
North America Canada
South America Brazil
Europe France
Africa Cameroon
Asia Japan
Australia New Zealand
Antarctica TuxLand
You are most probably using MySQL, which allows ungrouped and unaggregated expressions in the SELECT clause (before MySQL 5.7.5, or whenever the ONLY_FULL_GROUP_BY SQL mode is disabled).
This is a violation of the standard, of course.
This is intended to simplify GROUP BY with joins on a PRIMARY KEY:
SELECT a.*, SUM(b.value)
FROM a
JOIN b
ON b.a_id = a.id
GROUP BY
a.id
Normally, you would have either to add all columns from a into the GROUP BY clause or use a subquery.
MySQL allows you not to do it since all values from a are guaranteed to be the same for a given value of the PRIMARY KEY (which is grouped on).
This is valid and produces no error in some SQL dialects, such as MySQL. You may optionally use the GROUP BY clause on more than one column, but it's not required.
For a column that is neither grouped nor aggregated, GROUP BY returns a value from some row of each group. MySQL does not guarantee which row, although in practice it is often the first one encountered, so in your case you get one country per continent, but not necessarily the first one.
MySQL allows this when grouping on a single field; PostgreSQL rejects it unless the grouped column is the table's primary key.
The tutorial probably expects you to put every selected column in the GROUP BY so that you don't lose any of the selected data: the query would then show every distinct continent/country pair, each only once.
Here's an example table:
Continent | Country | Random_Field
---------------------------------------------
North America Canada Cake
North America Canada Dog
South America Brazil Cat
Europe France Frog
Africa Cameroon House
Asia Japan Gadget
Asia India Dance
Australia New Zealand Frodo
Antarctica TuxLand Linux
In your first statement:
SELECT continent, country
FROM World
GROUP BY continent;
The output would be:
Continent | Country
--------------------------
North America Canada
South America Brazil
Europe France
Africa Cameroon
Asia Japan
Australia New Zealand
Antarctica TuxLand
Notice one of the Asia rows was lost, despite being different.
Using a GROUP BY on both:
SELECT continent, country
FROM World
GROUP BY continent, country;
Would yield:
Continent | Country
-----------------------------
North America Canada
South America Brazil
Europe France
Africa Cameroon
Asia Japan
Asia India
Australia New Zealand
Antarctica TuxLand
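To compare both queries directly, here is a minimal sketch using SQLite through Python's `sqlite3` module, on the example table above (the table name `World` comes from the question; the column types are assumptions). SQLite, like MySQL in its permissive mode, accepts a bare column that is neither grouped nor aggregated, so the first query runs; which Asian country it reports is not guaranteed.

```python
import sqlite3

ROWS = [
    ("North America", "Canada", "Cake"), ("North America", "Canada", "Dog"),
    ("South America", "Brazil", "Cat"), ("Europe", "France", "Frog"),
    ("Africa", "Cameroon", "House"), ("Asia", "Japan", "Gadget"),
    ("Asia", "India", "Dance"), ("Australia", "New Zealand", "Frodo"),
    ("Antarctica", "TuxLand", "Linux"),
]


def run(query):
    """Load the example table into memory and return the query's rows."""
    con = sqlite3.connect(":memory:")
    try:
        con.execute("CREATE TABLE World (continent TEXT, country TEXT, random_field TEXT)")
        con.executemany("INSERT INTO World VALUES (?, ?, ?)", ROWS)
        return con.execute(query).fetchall()
    finally:
        con.close()


# One row per continent: country is a "bare" column, so one Asian country is lost.
one_per_continent = run("SELECT continent, country FROM World GROUP BY continent")
# One row per distinct pair: both Asian countries survive, Canada appears once.
one_per_pair = run("SELECT continent, country FROM World GROUP BY continent, country")

print(len(one_per_continent), len(one_per_pair))
```

The first query yields 7 rows and the second 8, matching the two outputs shown above.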