SQL - How to solve this challenging problem? - sql

I have two tables.
First table - ticket history:
customer_id  ticket_price  transportation  company_id
-----------------------------------------------------
1            $342.21       Plane           D7573
1            $79.00        Car             G2943
1            $91.30        Car             M3223
2            $64.00        Car             K2329
3            $351.00       Plane           H2312
3            $354.27       Plane           P3857
4            $80.00        Car             N2938
4            $229.67       Plane           J2938
5            $77.00        Car             L2938
Second table - companies and corresponding vehicles:
company_id  vehicle
-------------------
D7573       Boeing
G2943       Coach
M3223       Shuttle
K2329       Shuttle
H2312       Airbus
P3857       Boeing
N2938       Minibus
J2938       Airbus
L2938       Minibus
Z3849       Airbus
A3848       Minibus
If a customer took both plane and car, then they are "mixed". Otherwise they are "plane" or "car" customers. How can I get the result below?
        # shuttle took  Avg ticket price per customer  # of customers
---------------------------------------------------------------------
mixed   ??              ??                             ??
plane   ??              ??                             ??
car     ??              ??                             ??

Your title is misleading; you should specify which part you are having a problem with.
This may not be the best answer. Tested in a MySQL environment (SQL Fiddle).
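select transportation,
       sum(no_of_shuttle) as no_of_shuttle_took,
       round(avg(ticket_price), 2) as avg_price_per_customer,
       count(customer_id) as no_of_customer
from (
    -- customers who used more than one transportation type
    select t.customer_id,
           'mixed' as transportation,
           sum(case when c.vehicle = 'Shuttle' then 1 else 0 end) as no_of_shuttle,
           sum(t.ticket_price) as ticket_price
    from tickets t
    join companies c on c.company_id = t.company_id  -- table name "companies" is assumed
    group by t.customer_id
    having count(distinct t.transportation) > 1
    union all
    -- customers who used exactly one transportation type
    select t.customer_id,
           max(t.transportation) as transportation,  -- only one distinct value per customer here
           sum(case when c.vehicle = 'Shuttle' then 1 else 0 end) as no_of_shuttle,
           sum(t.ticket_price) as ticket_price
    from tickets t
    join companies c on c.company_id = t.company_id
    group by t.customer_id
    having count(distinct t.transportation) = 1
) per_customer
group by transportation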
I am using subqueries to aggregate two sets of customers:
customers with multiple distinct transportation types
customers with a single transportation type
Then I union these two results into one result set to calculate the number of customers, the number of shuttles taken and the average ticket price per customer. Note that I am rounding the price to 2 decimal places.

SQL Server using a common table expression:
;WITH cte1 AS (
    SELECT customer_id,
           CASE WHEN COUNT(DISTINCT transportation) > 1 THEN 'Mixed'
                ELSE MAX(transportation) END AS transportation,
           AVG(ticket_price) AS avg_ticket_price,
           SUM(CASE WHEN vehicle = 'Shuttle' THEN 1 ELSE 0 END) AS shuttle
    FROM history AS a
    JOIN vehicle AS b ON a.company_id = b.company_id
    GROUP BY customer_id
)
SELECT transportation,
       COUNT(DISTINCT customer_id) AS num_cust,
       AVG(avg_ticket_price) AS avg_ticket_price,
       SUM(shuttle) AS shuttle
FROM cte1
GROUP BY transportation
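
To check the approach against the sample data from the question, here is a minimal sketch that runs the same two-step aggregation in an in-memory SQLite database from Python. It is not the SQL Server setup used above: the table names history and vehicle follow the CTE answer, ticket_price is assumed to be stored as a plain number without the $ sign, and the averages are averages of each customer's own average ticket price.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE history (customer_id INTEGER, ticket_price REAL,
                      transportation TEXT, company_id TEXT);
CREATE TABLE vehicle (company_id TEXT, vehicle TEXT);
INSERT INTO history VALUES
  (1, 342.21, 'Plane', 'D7573'), (1, 79.00, 'Car', 'G2943'),
  (1, 91.30, 'Car', 'M3223'),    (2, 64.00, 'Car', 'K2329'),
  (3, 351.00, 'Plane', 'H2312'), (3, 354.27, 'Plane', 'P3857'),
  (4, 80.00, 'Car', 'N2938'),    (4, 229.67, 'Plane', 'J2938'),
  (5, 77.00, 'Car', 'L2938');
INSERT INTO vehicle VALUES
  ('D7573', 'Boeing'), ('G2943', 'Coach'), ('M3223', 'Shuttle'),
  ('K2329', 'Shuttle'), ('H2312', 'Airbus'), ('P3857', 'Boeing'),
  ('N2938', 'Minibus'), ('J2938', 'Airbus'), ('L2938', 'Minibus'),
  ('Z3849', 'Airbus'), ('A3848', 'Minibus');
""")

query = """
WITH per_customer AS (
    -- step 1: one row per customer with its category and shuttle count
    SELECT a.customer_id,
           CASE WHEN COUNT(DISTINCT a.transportation) > 1 THEN 'mixed'
                ELSE LOWER(MAX(a.transportation)) END AS category,
           AVG(a.ticket_price) AS avg_ticket_price,
           SUM(CASE WHEN b.vehicle = 'Shuttle' THEN 1 ELSE 0 END) AS shuttles
    FROM history AS a
    JOIN vehicle AS b ON a.company_id = b.company_id
    GROUP BY a.customer_id
)
-- step 2: aggregate the per-customer rows by category
SELECT category, SUM(shuttles), ROUND(AVG(avg_ticket_price), 2), COUNT(*)
FROM per_customer
GROUP BY category
ORDER BY category
"""
result = conn.execute(query).fetchall()
for row in result:
    print(row)
```

With the sample rows, customers 1 and 4 end up in 'mixed', customer 3 in 'plane', and customers 2 and 5 in 'car'.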

Related

Why did the 'NOT IN' work but not the 'NOT EXISTS'?

I've been trying to improve my SQL and was playing around with a 'NOT EXISTS' function. I needed to find the names of salespeople who did not have any sales to company 'RED'.
I tried this and it did not work:
SELECT DISTINCT
sp.name
FROM salesperson sp
WHERE NOT EXISTS (
SELECT
ord.sales_id
FROM
company cmp
LEFT JOIN orders ord
on cmp.com_id=ord.com_id
WHERE cmp.name = 'RED')
This query ran but returned a NULL. Then I changed it to this and it worked fine:
SELECT DISTINCT
sp.name
FROM salesperson sp
WHERE sp.sales_id NOT IN (
SELECT
ord.sales_id as sales_id
FROM
company cmp
left join orders ord
on cmp.com_id=ord.com_id
WHERE cmp.name = 'RED')
Can someone explain why 'NOT EXISTS' did not work in this instance?
Just in case, here is the exercise in full:
Given three tables: salesperson, company, orders
Output all the names in the table salesperson, who didn’t have sales to company 'RED'.
Table: salesperson
sales_id  name  salary  commission_rate  hire_date
--------------------------------------------------
1         John  100000  6                4/1/2006
2         Amy   120000  5                5/1/2010
3         Mark  65000   12               12/25/2008
4         Pam   25000   25               1/1/2005
5         Alex  50000   10               2/3/2007
The table salesperson holds the salesperson information. Every salesperson has a sales_id and a name.
Table: company
com_id  name    city
------------------------
1       RED     Boston
2       ORANGE  New York
3       YELLOW  Boston
4       GREEN   Austin
The table company holds the company information. Every company has a com_id and a name.
Table: orders
order_id  order_date  com_id  sales_id  amount
----------------------------------------------
1         1/1/2014    3       4         100000
2         2/1/2014    4       5         5000
3         3/1/2014    1       1         50000
4         4/1/2014    1       4         25000
The table orders holds the sales record information, salesperson and customer company are represented by sales_id and com_id.
Expected output:
name
----
Amy
Mark
Alex
Explanation:
According to order '3' and '4' in table orders, it is easy to tell only salesperson 'John' and 'Pam' have sales to company 'RED', so we need to output all the other names in the table salesperson.
I think your two queries are totally different.
NOT EXISTS - this returns rows only when the subquery returns no rows. Your subquery is not tied to the outer query and always returns some rows, so the condition is always false and you get an empty result. You need to join this subquery with the main query using WHERE sp.sales_id = ord.sales_id AND cmp.name = 'RED'
NOT IN - this is what you need for your purpose, because it compares each sp.sales_id against the sales_id values returned by the subquery.
The equivalent NOT EXISTS requires a correlation clause:
SELECT sp.name
FROM salesperson sp
WHERE NOT EXISTS (SELECT ord.sales_id
                  FROM company cmp JOIN
                       orders ord
                       ON cmp.com_id = ord.com_id
                  WHERE sp.sales_id = ord.sales_id AND
                        cmp.name = 'RED'
                 );
Neither the NOT IN nor NOT EXISTS versions requires a LEFT JOIN in the subquery. In fact, the LEFT JOIN somewhat defeats the purpose of the logic.
Without the correlation clause, the subquery runs and it will return rows if any cmp.name is 'RED'. That appears to be the case and so NOT EXISTS always returns false.
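
The difference can be reproduced with the exercise data. The following is a small sketch using Python's sqlite3 module with an in-memory database; the tables are trimmed to the columns the queries touch (an assumption made for brevity), and SQLite stands in for whatever engine the exercise was run on.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE salesperson (sales_id INTEGER, name TEXT);
CREATE TABLE company (com_id INTEGER, name TEXT);
CREATE TABLE orders (order_id INTEGER, com_id INTEGER, sales_id INTEGER);
INSERT INTO salesperson VALUES (1,'John'),(2,'Amy'),(3,'Mark'),(4,'Pam'),(5,'Alex');
INSERT INTO company VALUES (1,'RED'),(2,'ORANGE'),(3,'YELLOW'),(4,'GREEN');
INSERT INTO orders VALUES (1,3,4),(2,4,5),(3,1,1),(4,1,4);
""")

# Uncorrelated: the subquery never looks at sp, so it returns the same rows
# for every salesperson and NOT EXISTS is false for all of them.
uncorrelated = """
SELECT sp.name FROM salesperson sp
WHERE NOT EXISTS (SELECT ord.sales_id
                  FROM company cmp
                  LEFT JOIN orders ord ON cmp.com_id = ord.com_id
                  WHERE cmp.name = 'RED')
ORDER BY sp.sales_id
"""

# Correlated: the subquery is evaluated per salesperson via sp.sales_id.
correlated = """
SELECT sp.name FROM salesperson sp
WHERE NOT EXISTS (SELECT 1
                  FROM company cmp
                  JOIN orders ord ON cmp.com_id = ord.com_id
                  WHERE sp.sales_id = ord.sales_id AND cmp.name = 'RED')
ORDER BY sp.sales_id
"""

uncorrelated_names = [r[0] for r in conn.execute(uncorrelated)]
correlated_names = [r[0] for r in conn.execute(correlated)]
print(uncorrelated_names)  # []
print(correlated_names)    # ['Amy', 'Mark', 'Alex']
```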

Calculating average value per month and returning a single average

I am calculating an average per month that should return a single value, so I took an average of averages. There is a one-to-many relationship between the trucks table and the orders table, and I am using a row mapper with Spring JDBC.
Select truckId, truckCode, purchasedDate,
descriptions, orderId, petrolQty, orderDate, avg(avgPetrolQty) as avgPerMonth, truckId from (
SELECT t.truckId, t.truckCode, t.purchasedDate, t.descriptions, o.orderId, o.petrolQty, o.orderDate,
COALESCE(monthname(o.orderDate),'Not Announced') as month,
IFNULL (avg(o.petrolQty),0) as avgPetrolQty
from truck t left join orderz o
on t.truckId = o.truckId
where t.truckId = 3
group by t.truckCode, o.orderId, o.orderDate
) group by truckCode, orderId
These are the orders that belong to truckId 3. Here is the full result:
TRUCKID  TRUCKCODE  PURCHASEDDATE  DESCRIPTION      ORDERID  PETROLQTY  ORDERDATE   AVGPERMONTH  TRUCKID
3        BY2354     2005-05-01     BLACK TOYOTA 15  1        13.0       2006-01-21  13.0         3
3        BY2354     2005-05-01     BLACK TOYOTA 15  2        53.0       2002-01-21  53.0         3
This gives two avgPerMonth values. How can I write better SQL?
My SQL above gives the average per month as 100, which is not right. Even when I add more orders it doesn't change until after 4 or more new orders, and then it gives an illogical average. What am I doing wrong? It is important that my SQL returns the full detail of both tables; orderId can be omitted if it affects the solution.

SQL: Combine result columns

SELECT Category, SUM (Volume) as Volume
FROM Product
GROUP BY Category;
The above query returns this result:
Category Volume
-------------------
Oth 2
Tv Kids 4
{null} 1
Humour 3
Tv 5
Theatrical 13
Doc 6
I want to combine some of the categories into one, as follows:
Oth,{null}, Humour, Doc as Others
Tv Kids, Tv as TV
Theatrical as Film
So my result would look like:
Category Volume
-------------------
Others 12
Tv 9
Film 13
How would I go about this?
You need a CASE here, like this:
SELECT
CASE
WHEN Category IN ('Oth','Humour','Doc')
OR Category IS NULL THEN 'Others'
WHEN Category IN ('Tv Kids','Tv') THEN 'TV'
WHEN Category = 'Theatrical' THEN 'Film'
END as category ,
SUM (Volume) as Volume
from Product
GROUP BY
CASE
WHEN Category IN ('Oth','Humour','Doc')
OR Category IS NULL THEN 'Others'
WHEN Category IN ('Tv Kids','Tv') THEN 'TV'
WHEN Category = 'Theatrical' THEN 'Film'
END;
Null must be dealt with outside the IN list as it is a special value.
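
To illustrate both the grouping and the NULL remark, here is a small sketch that runs the CASE query with Python's sqlite3 module. The Product rows are taken from the result shown in the question, with one row per category (an assumption, since the underlying table is not shown).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Product (Category TEXT, Volume INTEGER)")
conn.executemany("INSERT INTO Product VALUES (?, ?)", [
    ("Oth", 2), ("Tv Kids", 4), (None, 1), ("Humour", 3),
    ("Tv", 5), ("Theatrical", 13), ("Doc", 6),
])

# A NULL inside an IN list never matches: only the 'Oth' row is counted.
in_list_matches = conn.execute(
    "SELECT COUNT(*) FROM Product WHERE Category IN ('Oth', NULL)"
).fetchone()[0]

# The CASE expression is written once and reused in SELECT and GROUP BY.
group_expr = """
CASE
    WHEN Category IN ('Oth', 'Humour', 'Doc') OR Category IS NULL THEN 'Others'
    WHEN Category IN ('Tv Kids', 'Tv') THEN 'TV'
    WHEN Category = 'Theatrical' THEN 'Film'
END
"""
grouped = conn.execute(f"""
SELECT {group_expr} AS category, SUM(Volume) AS Volume
FROM Product
GROUP BY {group_expr}
ORDER BY category
""").fetchall()

print(in_list_matches)  # 1
print(grouped)          # [('Film', 13), ('Others', 12), ('TV', 9)]
```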
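I think you need to use a case statement to group categories together.
select case when Category in ('Tv Kids', 'Tv') then 'Tv'
            when Category = 'Theatrical' then 'Film'
            else 'Others'  -- also catches NULL
       end as Category,
       sum(Volume) as Volume
from (
    SELECT Category, SUM (Volume) as Volume
    FROM Product
    GROUP BY Category
) subcategoryTotals
group by case when Category in ('Tv Kids', 'Tv') then 'Tv'
              when Category = 'Theatrical' then 'Film'
              else 'Others'
         end
(Some databases let you group by the alias instead, but because the alias Category has the same name as the subquery column, repeating the case expression is the unambiguous choice.)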
Edit: Just a final thought (or two):
You should consider normalizing your database - for example, the Category column should really be a foreign key to a Categories table.
Also, this sql is reasonably ok because the case statement isn't too long or complex. If you wanted to split things up further it could quickly get to be unmanageable. I'd be inclined to use the idea of categories and subcategories in my database.
The best solution might be to implement those groups in the database. For instance:
category_group
id_category_group  name    sortkey
----------------------------------
1                  Others  3
2                  TV      2
3                  Film    1
category
id_category  name        id_category_group
------------------------------------------
1            Oth         1
2            Tv Kids     2
3            Humour      1
4            Tv          2
5            Theatrical  3
6            Doc         1
query
SELECT g.Name, SUM (p.Volume) as Volume
FROM Product p
LEFT JOIN Category c ON c.Id_Category = p.Id_Category
LEFT JOIN Category_Group g ON g.Id_Category_Group = c.Id_Category_Group
GROUP BY g.Id_Category_Group, g.Name, g.sortkey  -- sortkey must be grouped to be usable in ORDER BY
ORDER BY g.sortkey;
This makes NULL a group of its own, though. But well, it is a group of its own, as NULL means not known (yet), so you don't actually know whether it's TV, Film or Other. If you still want to count NULL as Others, change the ON clause accordingly:
LEFT JOIN Category_Group g
ON g.Id_Category_Group = COALESCE(c.Id_Category_Group, 1) -- default to group 'Others' (id 1 in the table above)
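
As a rough check of the lookup-table design, the sketch below builds the two tables above in an in-memory SQLite database from Python and runs the query with the COALESCE default. The product table with an id_category column and one row per category is an assumption for illustration; the default group id 1 is the id of 'Others' in the sample category_group table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE category_group (id_category_group INTEGER PRIMARY KEY,
                             name TEXT, sortkey INTEGER);
CREATE TABLE category (id_category INTEGER PRIMARY KEY, name TEXT,
                       id_category_group INTEGER);
-- hypothetical product table: id_category is NULL for the unknown category
CREATE TABLE product (id_category INTEGER, volume INTEGER);
INSERT INTO category_group VALUES (1,'Others',3), (2,'TV',2), (3,'Film',1);
INSERT INTO category VALUES (1,'Oth',1), (2,'Tv Kids',2), (3,'Humour',1),
                            (4,'Tv',2), (5,'Theatrical',3), (6,'Doc',1);
INSERT INTO product VALUES (1,2), (2,4), (NULL,1), (3,3), (4,5), (5,13), (6,6);
""")

rows = conn.execute("""
SELECT g.name, SUM(p.volume) AS volume
FROM product p
LEFT JOIN category c ON c.id_category = p.id_category
LEFT JOIN category_group g
       ON g.id_category_group = COALESCE(c.id_category_group, 1)  -- 1 = 'Others'
GROUP BY g.id_category_group, g.name, g.sortkey
ORDER BY g.sortkey
""").fetchall()

print(rows)  # [('Film', 13), ('TV', 9), ('Others', 12)]
```

Adding a new category then only needs a row in category, not a change to the query.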
Try the following:
select category_group, sum(volume) as Volume from
(
    SELECT
        Category,
        Volume,
        case
            WHEN Category IN ('Oth','Humour','Doc') OR Category IS NULL THEN 'Others'  -- NULL cannot be matched inside IN
            WHEN Category IN ('Tv Kids','Tv') THEN 'TV'
            WHEN Category = 'Theatrical' THEN 'Film'
        end as category_group
    FROM Product
) T
group by category_group

Invalid count and sum in cross tab query using PostgreSQL

I am using a PostgreSQL 9.3 database.
I want to count the number of sales per product, sum the sale amounts per product, and also show each city where the product was sold as its own column.
Example
Setup
create table products (
name varchar(20),
price integer,
city varchar(20)
);
insert into products values
('P1',1200,'London'),
('P1',100,'Melborun'),
('P1',1400,'Moscow'),
('P2',1560,'Munich'),
('P2',2300,'Shunghai'),
('P2',3000,'Dubai');
Crosstab query:
select * from crosstab (
'select name,count(*),sum(price),city,count(city)
from products
group by name,city
order by name,city
'
,
'select distinct city from products order by 1'
)
as tb (
name varchar(20),TotalSales bigint,TotalAmount bigint,London bigint,Melborun bigint,Moscow bigint,Munich bigint,Shunghai bigint,Dubai bigint
);
Output
name totalsales totalamount london melborun moscow munich shunghai dubai
---------------------------------------------------------------------------------------------------------
P1 1 1200 1 1 1
P2 1 3000 1 1 1
Expected Output:
name totalsales totalamount london melborun moscow munich shunghai dubai
---------------------------------------------------------------------------------------------------------
P1 3 2700 1 1 1
P2 3 6860 1 1 1
Your first mistake seems to be simple. According to the 2nd parameter of the crosstab() function, 'Dubai' must come as first city (sorted by city). Details:
PostgreSQL Crosstab Query
The unexpected values for totalsales and totalamount represent values from the first row for each name group. "Extra" columns are treated like that. Details:
Pivot on Multiple Columns using Tablefunc
To get sums per name, run window functions over your aggregate functions. Details:
Get the distinct sum of a joined table column
select * from crosstab (
'select name
,sum(count(*)) OVER (PARTITION BY name)
,sum(sum(price)) OVER (PARTITION BY name)
,city
,count(city)
from products
group by name,city
order by name,city
'
-- ,'select distinct city from products order by 1' -- replaced
,$$SELECT unnest('{Dubai,London,Melborun
,Moscow,Munich,Shunghai}'::varchar[])$$
) AS tb (
name varchar(20), TotalSales bigint, TotalAmount bigint
,Dubai bigint
,London bigint
,Melborun bigint
,Moscow bigint
,Munich bigint
,Shunghai bigint
);
Better yet, provide a static set as the 2nd parameter, as done above. Output columns are hard-coded, so it may be unreliable to generate data columns dynamically: if you add another row with a new city, the query would break.
This way you can also order your columns as you like. Just keep the output columns and the 2nd parameter in sync.
Honestly, I think your database needs some drastic normalization, and returning results in several columns (one for each city name) is not something I would do myself.
Nevertheless, if you want to stick to it, you can do it this way.
As a first step you need to get the correct amounts. This does the trick quite fast:
select name, count(1) totalsales, sum(price) totalAmount
from products
group by name;
This will be your result:
NAME TOTALSALES TOTALAMOUNT
P2 3 6860
P1 3 2700
You would get the Products/City this way:
select name, city, count(1) totalCityName
from products
group by name, city
order by name, city;
This result:
NAME CITY TOTALCITYNAME
P1 London 1
P1 Melborun 1
P1 Moscow 1
P2 Dubai 1
P2 Munich 1
P2 Shunghai 1
If you really would like a column per city you could do something like:
select name,
count(1) totalsales,
sum(price) totalAmount,
(select count(1)
from Products a
where a.City = 'London' and a.name = p.name) London,
...
from products p
group by name;
But I would not recommend it!!!
This would be the result:
NAME TOTALSALES TOTALAMOUNT LONDON ...
P1 3 2700 1
P2 3 6860 0
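
A possible variant of the column-per-city query uses conditional aggregation instead of one correlated subquery per city. This is a swapped-in technique, not the query from the answer, and the sketch runs it in SQLite from Python rather than in PostgreSQL; the city list is hard-coded from the sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
create table products (name varchar(20), price integer, city varchar(20));
insert into products values
  ('P1',1200,'London'), ('P1',100,'Melborun'), ('P1',1400,'Moscow'),
  ('P2',1560,'Munich'), ('P2',2300,'Shunghai'), ('P2',3000,'Dubai');
""")

# Fixed list of output columns. Interpolating names into SQL like this is only
# acceptable because the list is hard-coded here, never user input.
cities = ["Dubai", "London", "Melborun", "Moscow", "Munich", "Shunghai"]
city_columns = ",\n       ".join(
    f"sum(case when city = '{c}' then 1 else 0 end) as {c}" for c in cities
)

query = f"""
select name,
       count(*) as totalsales,
       sum(price) as totalamount,
       {city_columns}
from products
group by name
order by name
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
# ('P1', 3, 2700, 0, 1, 1, 1, 0, 0)
# ('P2', 3, 6860, 1, 0, 0, 0, 1, 1)
```

Each city still needs its own output column, so the column list has to be maintained by hand whenever a new city appears.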

Fill Users table with data using percentages from another table

I have a Table Users (it has millions of rows)
Id Name Country Product
+----+---------------+---------------+--------------+
1 John Canada
2 Kate Argentina
3 Mark China
4 Max Canada
5 Sam Argentina
6 Stacy China
...
1000 Ken Canada
I want to fill the Product column with A, B or C based on percentages.
I have another table called CountriesStats like the following
Id Country A B C
+-----+---------------+--------------+-------------+----------+
1 Canada 60 20 20
2 Argentina 35 45 20
3 China 40 10 50
This table holds the percentage of people with each product. For example in Canada 60% of people have product A, 20% have product B and 20% have product C.
I would like to fill the Users table based on the percentages in the second table. For example, if there are 1 million users in Canada, I would like to fill 600,000 rows of the Product column in the Users table with A, 200,000 with B and 200,000 with C.
Thanks for any help on how to do that. I do not mind doing it in multiple steps; I just need hints on how I can achieve that in SQL.
The logic behind this is not too difficult. Assign a sequential counter to each person in each country. Then, using this value, assign the correct product based on this value. For instance, in your example, when the number is less than or equal to 600,000 then 'A' gets assigned. For 600,001 to 800,000 then 'B', and finally 'C' to the rest.
The following SQL accomplishes this:
with toupdate as (
select u.*,
row_number() over (partition by country order by newid()) as seqnum,
count(*) over (partition by country) as tot
from users u
)
update u
set product = (case when seqnum <= tot * A / 100 then 'A'
when seqnum <= tot * (A + B) / 100 then 'B'
else 'C'
end)
from toupdate u join
CountriesStats cs
on u.country = cs.country;
The WITH clause defines an updatable subquery that carries the sequence number and the total for each country on each row. This is a nice feature of SQL Server, but it is not supported in all databases.
The FROM clause joins back to the CountriesStats table to get the needed values for each country, and the CASE expression does the necessary logic.
Note that the sequential number is assigned randomly, using newid(), so the products should be assigned randomly through the initial table.
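
The cutoff logic can also be sketched outside the database. The following Python function is only an illustration of the same idea (random sequence number per country, then thresholds at A% and A+B%); the function name, the input shapes and the seed parameter are all hypothetical. The comparison seqnum * 100 <= tot * pct is the integer form of seqnum <= tot * pct / 100.

```python
import random
from collections import Counter

def assign_products(users, stats, seed=0):
    """users: list of (user_id, country); stats: {country: (a_pct, b_pct, c_pct)}."""
    rng = random.Random(seed)
    by_country = {}
    for user_id, country in users:
        by_country.setdefault(country, []).append(user_id)

    assignment = {}
    for country, ids in by_country.items():
        a_pct, b_pct, _ = stats[country]
        rng.shuffle(ids)  # plays the role of ORDER BY newid()
        tot = len(ids)
        for seqnum, user_id in enumerate(ids, start=1):
            if seqnum * 100 <= tot * a_pct:
                assignment[user_id] = "A"
            elif seqnum * 100 <= tot * (a_pct + b_pct):
                assignment[user_id] = "B"
            else:
                assignment[user_id] = "C"
    return assignment

stats = {"Canada": (60, 20, 20), "Argentina": (35, 45, 20), "China": (40, 10, 50)}
users = [(i, "Canada") for i in range(1, 11)] + [(i, "China") for i in range(11, 31)]
assignment = assign_products(users, stats)

canada_counts = Counter(assignment[i] for i in range(1, 11))
china_counts = Counter(assignment[i] for i in range(11, 31))
print(canada_counts)  # 6 x A, 2 x B, 2 x C
print(china_counts)   # 8 x A, 2 x B, 10 x C
```

Which users receive which product depends on the shuffle, but the counts per country are fixed by the percentages.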