Finding top 10 products sold in a year - sql

I have the tables defined below. I want to find the top 10 products sold in each year, based on sales counts, in an optimized way. I would like to know whether aggregation is still needed or whether I can accomplish this without it. My query is below. Can anyone suggest a better approach?
CREATE TABLE Customer (
id int not null,
first_name VARCHAR(30),
last_name VARCHAR(30),
Address VARCHAR(60),
State VARCHAR(30),
Phone text,
PRIMARY KEY(id)
);
CREATE TABLE Product (
ProductId int not null,
name VARCHAR(30),
unitprice int,
BrandID int,
Brandname varchar(30),
color VARCHAR(30),
PRIMARY KEY(ProductId)
);
Create Table Sales (
SalesId int not null,
Date date,
Customerid int,
Productid int,
Purchaseamount int,
PRIMARY KEY(SalesId),
FOREIGN KEY (Productid) REFERENCES Product(ProductId),
FOREIGN KEY (Customerid) REFERENCES Customer(id)
);
Sample Data:
insert into
Customer(id, first_name, last_name, address, state, phone)
values
(1111, 'andy', 'johnson', '123 Maryland Heights', 'MO', 3211451234),
(1112, 'john', 'smith', '237 Jackson Heights', 'TX', 3671456534),
(1113, 'sandy', 'fleming', '878 Jersey Heights', 'NJ', 2121456534),
(1114, 'tony', 'anderson', '789 Harrison Heights', 'CA', 6101456534);
insert into
Product(ProductId, name, unitprice, BrandId, Brandname)
values
(1, 'watch',200, 100, 'apple'),
(2, 'ipad', 429, 100, 'apple'),
(3, 'iphone', 799, 100, 'apple'),
(4, 'gear', 300, 110, 'samsung'),
(5, 'phone',1000, 110, 'samsung'),
(6, 'tab', 250, 110, 'samsung'),
(7, 'laptop', 1300, 120, 'hp'),
(8, 'mouse', 10, 120, 'hp'),
(9, 'monitor', 400, 130, 'dell'),
(10, 'keyboard', 40, 130, 'dell'),
(11, 'dvddrive', 100, 130, 'dell'),
(12, 'dvddrive', 90, 150, 'lg');
insert into
Sales(SalesId, Date, CustomerID, ProductID, Purchaseamount)
values (30, '01-10-2019', 1111, 1, 200),
(31, '02-10-2019', 1111, 3, 799),
(32, '03-10-2019', 1111, 2, 429),
(33, '04-10-2019', 1111, 4, 300),
(34, '05-10-2019', 1111, 5, 1000),
(35, '06-10-2019', 1112, 7, 1300),
(36, '07-10-2019', 1112, 9, 400),
(37, '08-10-2019', 1113, 5, 2000),
(38, '09-10-2019', 1113, 4, 300),
(39, '10-10-2019', 1113, 3, 799),
(40, '11-10-2019', 1113, 2, 858),
(41, '01-10-2020', 1111, 1, 400),
(42, '02-10-2020', 1111, 2, 429),
(43, '03-10-2020', 1112, 7, 1300),
(44, '04-10-2020', 1113, 7, 2600),
(45, '05-10-2020', 1114, 7, 1300),
(46, '06-10-2020', 1114, 7, 1300),
(47, '07-10-2020', 1114, 9, 800);
Tried this:
SELECT PCY.Name, PCY.Year, PCY.SEQNUM
FROM (SELECT P.Name AS Name, Extract('Year' from S.Date) AS YEAR, COUNT(P.Productid) AS CNT,
RANK() OVER (PARTITION BY Extract('Year' from S.Date) ORDER BY COUNT(P.Productid) DESC) AS RANK
FROM Sales S inner JOIN
Product P
ON S.Productid = P.Productid
) PCY
WHERE PCY.RANK <= 10;
I am seeing this error:
ERROR: column "p.name" must appear in the GROUP BY clause or be used in an aggregate function
LINE 2: FROM (SELECT P.Name AS Name, Extract('Year' from S.Date) AS ...
^
SQL state: 42803
Character: 52

I don't understand why you don't want to use an aggregate function when you have to aggregate over your data. This query works fine, without any issues on the GROUP BY:
WITH stats AS (
SELECT EXTRACT(YEAR FROM S.Date) AS y,
P.productid,
P.name,
COUNT(*) AS numbers_sold,
RANK() OVER (PARTITION BY EXTRACT(YEAR FROM S.Date) ORDER BY COUNT(*) DESC) AS r
FROM product P
JOIN sales S ON S.Productid = P.Productid
GROUP BY 1, 2
)
SELECT y
, name
, numbers_sold
FROM stats
WHERE r <= 10;
This works because productid is the primary key of Product, so the product name is functionally dependent on it and does not have to be listed in the GROUP BY.
By the way, I tested this on PostgreSQL 12, but it should work on older and newer versions as well.
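For anyone who wants to try the count-then-rank idea without a PostgreSQL server, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in. It assumes a SQLite build with window functions (3.25 or newer). The cut-down tables, the integer Year column and the TOP_N value are simplifications for illustration, not part of the original schema. The counting and the ranking are split into two steps here, which is a variation on the query above rather than the same statement.

```python
import sqlite3

# Hypothetical, cut-down stand-in for the Product and Sales tables.
# The year is stored directly as an integer to keep the sketch short.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Product (ProductId INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE Sales (SalesId INTEGER PRIMARY KEY, Year INTEGER, ProductId INTEGER);
INSERT INTO Product VALUES (1, 'watch'), (2, 'ipad'), (7, 'laptop');
INSERT INTO Sales VALUES
  (30, 2019, 1), (32, 2019, 2), (40, 2019, 2),
  (41, 2020, 1), (43, 2020, 7), (44, 2020, 7), (45, 2020, 7);
""")

TOP_N = 1  # illustrative; the question asks for 10

rows = conn.execute("""
WITH counts AS (
  -- Step 1: aggregate. P.name is listed in GROUP BY for portability;
  -- PostgreSQL lets you omit it because ProductId is the primary key.
  SELECT S.Year AS y, P.ProductId, P.name, COUNT(*) AS numbers_sold
  FROM Product P
  JOIN Sales S ON S.ProductId = P.ProductId
  GROUP BY S.Year, P.ProductId, P.name
),
ranked AS (
  -- Step 2: rank the per-year counts.
  SELECT y, name, numbers_sold,
         RANK() OVER (PARTITION BY y ORDER BY numbers_sold DESC) AS r
  FROM counts
)
SELECT y, name, numbers_sold FROM ranked WHERE r <= ? ORDER BY y
""", (TOP_N,)).fetchall()

print(rows)
```

Either way the counting step is an aggregation, so the GROUP BY cannot really be avoided; the window function only ranks the counts afterwards.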

Related

How to fetch the rows with their predefine order in SQL Server?

I have a SQL Server table like this:
MenuID MenuName MenuColor
---------------------------------------
10 Daily Tickets Gray
15 Kids Ticket Dark Pink
20 Group Discount Dark Ash
11 Discount ticket Brown
17 Referral Ticket Beige
22 Frequent visitor Musturd
27 Annual Pass sky blue
25 Kids Pass Pink
24 free Ticket Yellow
This table has a lot of records and more columns too.
Desired result: the first four menus should appear in a pre-defined order (the one in my trial query below), and the remaining menus should be ordered ascending by the MenuName column.
Desired result set:
MenuID MenuName MenuColor
---------------------------------------
10 Daily Tickets Gray
27 Annual Pass sky blue
22 Frequent visitor Musturd
20 Group Discount Dark Ash
11 Discount ticket Brown
24 free Ticket Yellow
25 Kids Pass Pink
15 Kids Ticket Dark Pink
17 Referral Ticket Beige
This is the query I tried for this:
SELECT *
FROM tMenus m
ORDER BY
(CASE m.MenuName
WHEN 'Daily Tickets' THEN 1
WHEN 'Annual Pass' THEN 2
WHEN 'Frequent visitor' THEN 3
WHEN 'Group Discount' THEN 4
END), m.MenuName ASC;
However, this is not returning the result that I want. Please correct me where I am wrong.
Thanks
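Perhaps you just need an ELSE. Without it, the CASE expression returns NULL for every other menu, and NULLs sort first in ascending order in SQL Server, so those rows come before your four fixed ones: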
ORDER BY (CASE m.MenuName
WHEN 'Daily Tickets' THEN 1
WHEN 'Annual Pass' THEN 2
WHEN 'Frequent visitor' THEN 3
WHEN 'Group Discount' THEN 4
ELSE 5
END) , m.MenuName ASC;
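To see the effect of the ELSE branch, here is a small sketch using Python's sqlite3 module as a stand-in for SQL Server. SQLite also places NULLs first in ascending order, so the behaviour is comparable. The cut-down tMenus table and the ordered_names helper are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tMenus (MenuID INTEGER PRIMARY KEY, MenuName TEXT);
INSERT INTO tMenus VALUES
  (10, 'Daily Tickets'), (15, 'Kids Ticket'), (20, 'Group Discount'),
  (11, 'Discount ticket'), (22, 'Frequent visitor'), (27, 'Annual Pass');
""")

ORDER_CASE = """
CASE MenuName
  WHEN 'Daily Tickets' THEN 1
  WHEN 'Annual Pass' THEN 2
  WHEN 'Frequent visitor' THEN 3
  WHEN 'Group Discount' THEN 4
  {else_branch}
END
"""

def ordered_names(else_branch):
    """Hypothetical helper: run the ORDER BY with or without an ELSE branch."""
    sql = ("SELECT MenuName FROM tMenus ORDER BY "
           + ORDER_CASE.format(else_branch=else_branch)
           + ", MenuName")
    return [row[0] for row in conn.execute(sql)]

without_else = ordered_names("")        # unmatched rows get NULL and sort first
with_else = ordered_names("ELSE 5")     # unmatched rows get 5 and sort last

print(without_else)
print(with_else)
```

With the ELSE in place the four fixed menus come first and the rest follow alphabetically, which is the desired result.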
Adding "DisplayOrder" to the actual table...
IF OBJECT_ID('tempdb..#Menue', 'U') IS NOT NULL
DROP TABLE #Menue;
CREATE TABLE #Menue (
MenuID INT NOT NULL PRIMARY KEY,
MenuName VARCHAR(30) NOT NULL,
MenuColor VARCHAR(10) NOT NULL,
DisplayOrder INT NOT NULL
);
INSERT #Menue(MenuID, MenuName, MenuColor, DisplayOrder) VALUES
(10,'Daily Tickets', 'Gray', 100),
(15,'Kids Ticket', 'Dark Pink', 800),
(20,'Group Discount', 'Dark Ash', 400),
(11,'Discount ticket', 'Brown', 500),
(17,'Referral Ticket', 'Beige', 900),
(22,'Frequent visitor', 'Musturd', 300),
(27,'Annual Pass', 'sky blue', 200),
(25,'Kids Pass', 'Pink', 700),
(24,'free Ticket', 'Yellow', 600);
-- Note: I'm leaving gaps in the DisplayOrder values.
-- This makes it easy to add new values and set their
-- values w/o having to adjust existing values.
SELECT
m.MenuID,
m.MenuName,
m.MenuColor
FROM
#Menue m
ORDER BY
m.DisplayOrder;
Edited answer...
IF OBJECT_ID('tempdb..#MenueDisplayOrder', 'U') IS NOT NULL
DROP TABLE #MenueDisplayOrder;
CREATE TABLE #MenueDisplayOrder (
MenueID INT NOT NULL, --add FK to Menues table
DisplayTypeID INT NOT NULL, --add FK to available Types table
DisplayOrder INT NOT NULL,
PRIMARY KEY CLUSTERED (DisplayTypeID, MenueID)
);
INSERT #MenueDisplayOrder (MenueID, DisplayTypeID, DisplayOrder) VALUES
(10, 1, 100), (11, 1, 500), (15, 1, 800), (17, 1, 900), (20, 1, 400),
(22, 1, 300), (24, 1, 600), (25, 1, 700), (27, 1, 200),
(27, 2, 100), (25, 2, 500), (24, 2, 800), (20, 2, 900), (17, 2, 400),
(22, 2, 300), (15, 2, 600), (11, 2, 700), (10, 2, 200),
(15, 3, 100), (11, 3, 500), (10, 3, 800), (22, 3, 900), (24, 3, 400),
(17, 3, 300), (20, 3, 600), (27, 3, 700), (25, 3, 200);
SELECT
m.MenuID,
m.MenuName,
m.MenuColor
FROM
#Menue m
JOIN #MenueDisplayOrder mdo
ON m.MenuID = mdo.MenueID
WHERE
mdo.DisplayTypeID = 2 -- alter this value to change the display order.
ORDER BY
mdo.DisplayOrder;

Performing a subquery using values from a column in Oracle

I'm trying to create a calculated column in SQL. Basically I need to get a set of distinct dates and determine how many customers there are in the population on that particular date. The result should be something like:
Date       | Customers
2016-01-01 | 1
2016-01-01 | 2
2016-01-05 | 3
2016-02-09 | 4
etc.
I created a sample database & data (using MySQL as I don't have permission to create tables in our Oracle dbs) with the following script:
create database customer_example;
use customer_example;
create table customers (
customer_id int not null primary key,
customer_name varchar(255) not null,
term_date DATE);
create table employee (
employee_id int not null primary key,
employee_name varchar(255) not null);
create table cust_emp (
ce_id int not null AUTO_INCREMENT,
emp_id int not null,
cust_id int not null,
start_date date,
end_date date,
deleted_yn boolean,
primary key (emp_id, cust_id, ce_id),
foreign key (cust_id) references customers(customer_id),
foreign key (emp_id) references employee(employee_id));
insert into customers (customer_id, customer_name)
values (1, 'Bobby Tables'), (2, 'Grover Cleveland'), (3, 'Chester Arthur'), (4, 'Jan Bush'), (5, 'Emanuel Porter'), (6, 'Darren King'), (7, 'Casey Mcguire'), (8, 'Robin Simpson'), (9, 'Robin Tables'), (10, 'Mitchell Arnold');
insert into customers (customer_id, customer_name, term_date)
values (11, 'Terrell Graves', '2017-01-01'), (12, 'Richard Wagner', '2016-10-31'), (13, 'Glenn Saunders', '2016-11-19'), (14, 'Bruce Irvin', '2016-03-11'), (15, 'Glenn Perry','2016-06-06'), (16, 'Hazel Freeman', '2016-07-10'),
(17, 'Martin Freeman', '2016-02-11'), (18, 'Morgan Freeman', '2017-02-01'), (19, 'Dirk Drake', '2017-01-12'), (20, 'Fraud Fraud', '2016-12-31');
insert into employee (employee_id, employee_name)
values (1000, 'Cedrick French'), (1001, 'Jane Phillips'), (1002, 'Brian Green'), (1003, 'Shawn Brooks'), (1004, 'Clarence Thomas');
insert into cust_emp (emp_id, cust_id, start_date, end_date)
values (1000, 1, '2016-01-01', '2016-02-01'), (1000, 1, '2016-02-01', '2016-02-01'), (1000, 2,'2016-01-05', '2016-01-16'),(1000, 3,'2016-02-09', '2016-03-14'),(1000, 4,'2016-03-20', '2016-04-23'),
(1000, 5,'2016-01-01', '2016-01-16'),(1000, 6,'2016-01-01', '2016-01-16'),(1004, 7, '2016-01-14', '2016-01-16'),
(1004, 8, '2016-01-13', '2016-01-16'),(1004, 9, '2016-01-05', '2016-01-16'), (1003, 12, '2016-04-21', '2016-11-30');
insert into cust_emp (emp_id, cust_id, start_date, deleted_yn)
values (1002, 11, '2016-04-10', TRUE),(1003, 10, '2016-01-16', FALSE), (1004, 12, '2016-04-20', TRUE), (1004, 12, '2016-04-19', FALSE), (1003, 13, '2016-06-06', TRUE), (1002, 14, '2016-06-10', TRUE),
(1004, 15, '2016-03-25', TRUE), (1004, 17, '2016-01-02', TRUE), (1004, 18, '2017-01-01', TRUE), (1004, 19, '2016-11-13', TRUE), (1004, 20, '2016-03-10', TRUE), (1004, 16, '2016-05-13', TRUE);
insert into cust_emp (emp_id, cust_id, start_date)
values (1002, 1, '2016-02-01'), (1004, 2, '2016-01-16'),(1003, 3, '2016-03-14'),(1002, 4, '2016-04-23'),(1004, 5, '2016-01-16'),(1002, 6, '2016-01-16'),(1004, 7, '2016-01-16'),
(1004, 8, '2016-01-16'),(1002, 9, '2016-01-16'), (1004, 10, '2016-01-16');
The following SQL works fine in MySQL but when I try it in Oracle, I get an 'invalid identifier' on 'dates':
select distinct(ce.start_date) as dates,
(select count(distinct(c.customer_id))
from customers c
inner join cust_emp ce on c.customer_id = ce.cust_id
where ce.start_date < dates
and (ce.end_date > dates or (ce.deleted_yn = false or ce.deleted_yn is null))
and (c.term_date > dates or c.term_date is null)
)
from cust_emp as ce;
It seems this is because the alias dates is referenced too deep inside a subquery. I've tried a CTE as well, but it gave the same error. How can I rewrite this so that I can count how many customers there were on each date in Oracle?
Huh?
Isn't this what you want?
select ce.dates as dates, count(distinct c.customer_id)
from cust_emp ce join
customers c
on c.customer_id = ce.cust_id
where ce.start_date < ce.dates and
(ce.end_date > ce.dates or ce.deleted_yn = false or ce.deleted_yn is null) and
(c.term_date > ce.dates or c.term_date is null)
group by ce.dates
order by ce.dates;
I don't really understand the use of the subquery with select distinct. The logic you describe is more easily understood as a simple aggregation.
I'm not sure where dates comes from. It is not in your data model, but it is in your sample query.
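One way to sidestep the alias problem is to build the distinct dates in an inline view first, so that the outer query refers to a real column instead of a select-list alias. Below is a rough sketch of that idea, run through Python's sqlite3 module since I can't test on Oracle. The three-row cust_emp table is made up, the term_date and deleted_yn checks are left out for brevity, and I use <= on start_date so a customer counts on the day they start (that is my assumption, the question uses <).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical, simplified cust_emp; dates are ISO strings, which
-- compare correctly as text.
CREATE TABLE cust_emp (cust_id INTEGER, start_date TEXT, end_date TEXT);
INSERT INTO cust_emp VALUES
  (1, '2016-01-01', NULL),
  (2, '2016-01-05', '2016-02-09'),
  (3, '2016-02-09', NULL);
""")

rows = conn.execute("""
SELECT d.dates, COUNT(DISTINCT ce.cust_id) AS customers
FROM (SELECT DISTINCT start_date AS dates FROM cust_emp) d   -- inline view of dates
JOIN cust_emp ce
  ON ce.start_date <= d.dates                                -- already started
 AND (ce.end_date > d.dates OR ce.end_date IS NULL)          -- not yet ended
GROUP BY d.dates
ORDER BY d.dates
""").fetchall()

print(rows)
```

The same shape (inline view joined back to the detail table, then GROUP BY) should carry over to Oracle, but treat that as untested.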

Inserting Multiple Rows into a Table

I've written the code below and keep getting an incorrect syntax error.
It is reported at line 10, near the comma, so this line:
values (1, 'Stolz', 'Ted', 25000, NULL),
If I insert only the first row of data it works fine; the error appears only when I try to insert multiple rows. Am I missing something really simple?
Drop Table #TPerson
CREATE TABLE #TPerson
(
personid int PRIMARY KEY NOT NULL,
lastname varchar(50) NULL,
firstname varchar(50) NULL,
salary money NULL,
managerid int NULL
);
Insert Into #TPerson(Personid, lastname, firstname, salary, managerid)
values (1, 'Stolz', 'Ted', 25000, NULL),
(2, 'Boswell', 'Nancy', 23000, 1),
(3, 'Hargett', 'Vincent', 22000, 1),
(4, 'Weekley', 'Kevin', 22000, 3),
(5, 'Metts', 'Geraldine', 22000, 2),
(6, 'McBride', 'Jeffrey', 21000, 2),
(7, 'Xiong', 'Jay', 20000, 3)
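Multi-row VALUES lists are only supported from SQL Server 2008 onward. If you are on an older version, you can write something like this: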
Insert Into #TPerson(Personid,lastname,firstname,salary,managerid)
select 1,'Stolz','Ted',25000,NULL
union all select 2,'Boswell','Nancy',23000,1
union all select 3,'Hargett','Vincent',22000,1
union all select 4,'Weekley','Kevin',22000,3
union all select 5,'Metts','Geraldine',22000,2
union all select 6,'McBride','Jeffrey',21000,2
union all select 7,'Xiong','Jay',20000,3
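As a quick sanity check that the two forms are equivalent, here is a small sketch using Python's sqlite3 module as a stand-in (SQLite accepts both syntaxes). The via_values and via_union table names are made up, and only three columns and three rows are used to keep it short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE via_values (personid INTEGER PRIMARY KEY, lastname TEXT, managerid INTEGER);
CREATE TABLE via_union  (personid INTEGER PRIMARY KEY, lastname TEXT, managerid INTEGER);

-- Multi-row VALUES list
INSERT INTO via_values (personid, lastname, managerid)
VALUES (1, 'Stolz', NULL), (2, 'Boswell', 1), (3, 'Hargett', 1);

-- Same rows written as SELECT ... UNION ALL
INSERT INTO via_union (personid, lastname, managerid)
SELECT 1, 'Stolz', NULL
UNION ALL SELECT 2, 'Boswell', 1
UNION ALL SELECT 3, 'Hargett', 1;
""")

values_rows = conn.execute("SELECT * FROM via_values ORDER BY personid").fetchall()
union_rows = conn.execute("SELECT * FROM via_union ORDER BY personid").fetchall()

print(values_rows == union_rows)
```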

Constraints w/ Recursive Postgres Query

I'm looking to skip a certain city as I traverse my data. Currently, this query works to find all available flights from SLC to LA, including trips with layovers. You'll see this in the picture below.
However, I want to be able to exclude certain cities in a flight plan. For example, if Montreal is a stop between SLC and LA, that trip wouldn't be considered.
I've tried putting various things in the WHERE clauses, but to no avail. Any other suggestions? Sample data and queries are given below.
WITH RECURSIVE segs AS (
SELECT f0.flight_num::text as flight
, src_city, dest_city
, dep_time AS departure
, arr_time AS arrival
, airfare, mileage
, 1 as hops
, (arr_time - dep_time)::interval AS total_time
, '00:00'::interval as waiting_time
FROM flight f0
WHERE src_city = 'SLC' -- <SRC_CITY>
UNION ALL
SELECT s.flight || '-->' || f1.flight_num::text as flight
, s.src_city, f1.dest_city
, s.departure AS departure
, f1.arr_time AS arrival
, s.airfare + f1.airfare as airfare
, s.mileage + f1.mileage as mileage
, s.hops + 1 AS hops
, s.total_time + (f1.arr_time - f1.dep_time)::interval AS total_time
, s.waiting_time + (f1.dep_time - s.arrival)::interval AS waiting_time
FROM segs s
JOIN flight f1
ON f1.src_city = s.dest_city
AND f1.dep_time > s.arrival -- you can't leave until you are there
)
SELECT *
FROM segs
WHERE dest_city = 'LA' -- <DEST_CITY>
ORDER BY airfare desc
;
create table flight
( flight_num BIGSERIAL PRIMARY KEY
, src_city varchar
, dest_city varchar
, dep_time TIME
, arr_time TIME
, airfare INTEGER
, mileage INTEGER
);
insert into flight VALUES
(101, 'Montreal', 'NY', '05:30', '06:45', 180, 170),
(102, 'Montreal', 'Washington', '01:00', '02:35', 100, 180),
(103, 'NY', 'Chicago', '08:00', '10:00', 150, 300),
(105, 'Washington', 'KansasCity', '06:00', '08:45', 200, 600),
(106, 'Washington', 'NY', '12:00', '13:30', 50, 80),
(107, 'Chicago', 'SLC', '11:00', '14:30', 220, 750),
(110, 'KansasCity', 'Denver', '14:00', '15:25', 180, 300),
(111, 'KansasCity', 'SLC', '13:00', '15:30', 200, 500),
(112, 'SLC', 'SanFran', '18:00', '19:30', 85, 210),
(113, 'SLC', 'LA', '17:30', '19:00', 185, 230),
(115, 'Denver', 'SLC', '15:00', '16:00', 75, 300),
(116, 'SanFran', 'LA', '22:00', '22:30', 50, 75),
(118, 'LA', 'Seattle', '20:00', '21:00', 150, 450);
To exclude certain cities from the flight plan, add a condition in two places in your query:
Right after src_city condition
...
WHERE src_city = 'SLC' -- <SRC_CITY>
AND dest_city <> 'Montreal'
...
In the recursive join condition
...
AND f1.dep_time > s.arrival -- you can't leave until you are there
AND f1.dest_city <> 'Montreal'
...
I don't have Postgres, but I tried it with SQL Server and it seems to work.
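Here is a runnable sketch of the two-place filter, using Python's sqlite3 module as a stand-in for Postgres. It uses a subset of the sample flights, stores times as 'HH:MM' text (which compares correctly as strings), and the routes helper and the :skip parameter are names I made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE flight (flight_num INTEGER PRIMARY KEY, src_city TEXT, dest_city TEXT,
                     dep_time TEXT, arr_time TEXT, airfare INTEGER);
INSERT INTO flight VALUES
  (112, 'SLC', 'SanFran', '18:00', '19:30', 85),
  (113, 'SLC', 'LA', '17:30', '19:00', 185),
  (116, 'SanFran', 'LA', '22:00', '22:30', 50),
  (118, 'LA', 'Seattle', '20:00', '21:00', 150);
""")

ROUTES_SQL = """
WITH RECURSIVE segs(flight, dest_city, arrival, airfare) AS (
  SELECT CAST(flight_num AS TEXT), dest_city, arr_time, airfare
  FROM flight
  WHERE src_city = :src
    AND dest_city <> :skip            -- filter 1: first leg must not land there
  UNION ALL
  SELECT s.flight || '-->' || f1.flight_num, f1.dest_city, f1.arr_time,
         s.airfare + f1.airfare
  FROM segs s
  JOIN flight f1
    ON f1.src_city = s.dest_city
   AND f1.dep_time > s.arrival        -- you can't leave until you are there
   AND f1.dest_city <> :skip          -- filter 2: later legs must not land there
)
SELECT flight, airfare FROM segs WHERE dest_city = :dest ORDER BY airfare
"""

def routes(src, dest, skip):
    """Hypothetical helper: all routes from src to dest that never land in skip."""
    return conn.execute(ROUTES_SQL, {"src": src, "dest": dest, "skip": skip}).fetchall()

print(routes("SLC", "LA", "Montreal"))  # nothing is filtered out on this data
print(routes("SLC", "LA", "SanFran"))   # the SanFran layover route disappears
```

Filtering inside the recursion (rather than only in the final SELECT) matters because it stops the excluded city from being used as a layover at all.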

Recursive/Hierarchical Query Using Postgres

The table: Flight (flight_num, src_city, dest_city, dep_time, arr_time, airfare, mileage)
I need to find the cheapest fare for unlimited stops from any given source city to any given destination city. The catch is that this can involve multiple flights, so for example if I'm flying from Montreal->KansasCity I can go from Montreal->Washington and then from Washington->KansasCity and so on. How would I go about generating this using a Postgres query?
Sample Data:
create table flight(
flight_num BIGSERIAL PRIMARY KEY,
source_city varchar,
dest_city varchar,
dep_time int,
arr_time int,
airfare int,
mileage int
);
insert into flight VALUES
(101, 'Montreal', 'NY', 0530, 0645, 180, 170),
(102, 'Montreal', 'Washington', 0100, 0235, 100, 180),
(103, 'NY', 'Chicago', 0800, 1000, 150, 300),
(105, 'Washington', 'KansasCity', 0600, 0845, 200, 600),
(106, 'Washington', 'NY', 1200, 1330, 50, 80),
(107, 'Chicago', 'SLC', 1100, 1430, 220, 750),
(110, 'KansasCity', 'Denver', 1400, 1525, 180, 300),
(111, 'KansasCity', 'SLC', 1300, 1530, 200, 500),
(112, 'SLC', 'SanFran', 1800, 1930, 85, 210),
(113, 'SLC', 'LA', 1730, 1900, 185, 230),
(115, 'Denver', 'SLC', 1500, 1600, 75, 300),
(116, 'SanFran', 'LA', 2200, 2230, 50, 75),
(118, 'LA', 'Seattle', 2000, 2100, 150, 450);
[this answer is based on Gordon's]
I changed arr_time and dep_time to TIME datatypes, which makes calculations easier.
Also added result columns for total_time and waiting_time. Note: if there are any loops possible in the graph, you will need to avoid them (possibly using an array to store the path)
WITH RECURSIVE segs AS (
SELECT f0.flight_num::text as flight
, src_city, dest_city
, dep_time AS departure
, arr_time AS arrival
, airfare, mileage
, 1 as hops
, (arr_time - dep_time)::interval AS total_time
, '00:00'::interval as waiting_time
FROM flight f0
WHERE src_city = 'SLC' -- <SRC_CITY>
UNION ALL
SELECT s.flight || '-->' || f1.flight_num::text as flight
, s.src_city, f1.dest_city
, s.departure AS departure
, f1.arr_time AS arrival
, s.airfare + f1.airfare as airfare
, s.mileage + f1.mileage as mileage
, s.hops + 1 AS hops
, s.total_time + (f1.arr_time - f1.dep_time)::interval AS total_time
, s.waiting_time + (f1.dep_time - s.arrival)::interval AS waiting_time
FROM segs s
JOIN flight f1
ON f1.src_city = s.dest_city
AND f1.dep_time > s.arrival -- you can't leave until you are there
)
SELECT *
FROM segs
WHERE dest_city = 'LA' -- <DEST_CITY>
ORDER BY airfare desc
;
FYI: the changes to the table structure:
create table flight
( flight_num BIGSERIAL PRIMARY KEY
, src_city varchar
, dest_city varchar
, dep_time TIME
, arr_time TIME
, airfare INTEGER
, mileage INTEGER
);
And to the data:
insert into flight VALUES
(101, 'Montreal', 'NY', '05:30', '06:45', 180, 170),
(102, 'Montreal', 'Washington', '01:00', '02:35', 100, 180),
(103, 'NY', 'Chicago', '08:00', '10:00', 150, 300),
(105, 'Washington', 'KansasCity', '06:00', '08:45', 200, 600),
(106, 'Washington', 'NY', '12:00', '13:30', 50, 80),
(107, 'Chicago', 'SLC', '11:00', '14:30', 220, 750),
(110, 'KansasCity', 'Denver', '14:00', '15:25', 180, 300),
(111, 'KansasCity', 'SLC', '13:00', '15:30', 200, 500),
(112, 'SLC', 'SanFran', '18:00', '19:30', 85, 210),
(113, 'SLC', 'LA', '17:30', '19:00', 185, 230),
(115, 'Denver', 'SLC', '15:00', '16:00', 75, 300),
(116, 'SanFran', 'LA', '22:00', '22:30', 50, 75),
(118, 'LA', 'Seattle', '20:00', '21:00', 150, 450);
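Regarding the note about loops: here is a minimal sketch of keeping track of the visited cities so a route never returns to a city it has already been through. It runs on SQLite via Python's sqlite3 module, which has no array type, so the path is a '/'-delimited string checked with instr. In Postgres the analogous test on an array column would be something like f.dest_city <> ALL(path). The four-row flight table is made up and drops the time columns so that a loop is actually possible.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical data with a loop: A -> B -> A
CREATE TABLE flight (flight_num INTEGER PRIMARY KEY, src_city TEXT,
                     dest_city TEXT, airfare INTEGER);
INSERT INTO flight VALUES
  (1, 'A', 'B', 100),
  (2, 'B', 'A', 100),
  (3, 'B', 'C', 50),
  (4, 'A', 'C', 200);
""")

CYCLE_SAFE_SQL = """
WITH RECURSIVE segs(path, dest_city, airfare) AS (
  SELECT '/' || src_city || '/' || dest_city || '/', dest_city, airfare
  FROM flight
  WHERE src_city = :src
  UNION ALL
  SELECT s.path || f.dest_city || '/', f.dest_city, s.airfare + f.airfare
  FROM segs s
  JOIN flight f ON f.src_city = s.dest_city
  -- skip any leg that lands in a city already on the path
  WHERE instr(s.path, '/' || f.dest_city || '/') = 0
)
SELECT path, airfare FROM segs WHERE dest_city = :dest ORDER BY airfare
"""

paths = conn.execute(CYCLE_SAFE_SQL, {"src": "A", "dest": "C"}).fetchall()
print(paths)
```

Without the instr check this recursion would keep bouncing between A and B and never terminate.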
You want to use a recursive CTE for this. However, you will have to make a decision about how many flights to include. The following (untested) query shows how to do this, limiting the number of flight segments to 5:
with recursive segs as (
select cast(f.flight_num as varchar(255)) as flight, src_city, dest_city, dep_time,
arr_time, airfare, mileage, 1 as numsegs
from flight f
where src_city = <SRC_CITY>
union all
select cast(s.flight||'-->'||cast(f.flight_num as varchar(255)) as varchar(255)) as flight, s.src_city, f.dest_city,
s.dep_time, f.arr_time, s.airfare + f.airfare as airfare,
s.mileage + f.mileage as mileage, s.numsegs + 1
from segs s join
flight f
on f.src_city = s.dest_city -- the next flight leaves from where the previous one arrived
where s.numsegs < 5
)
select *
from segs
where dest_city = <DEST_CITY>
order by airfare asc -- cheapest route first
limit 1;
Something like this:
select * from
(select flight_num, airfare from flight where src_city = ? and dest_city = ?
union
select f1.flight_num || f2.flight_num, f1.airfare+f2.airfare
from flight f1, flight f2 where f1.src_city = ? and f2.dest_city = ? and f1.dest_city = f2.src_city
union
...
) s order by airfare desc
I didn't test this, so there might be subtle problems; I'm leaving the testing to you. This is clearly homework, since no airline plans things this way, so I don't mind leaving you the extra work.