How can I simplify this GROUPING SETS query into a ROLLUP? - sql

I'm trying to create a SELECT statement to calculate multiple levels of subtotals across two hierarchies. I have a Geo hierarchy which consists of region and department, and a Time hierarchy which consists of month and quarter.
At the moment, I have three tables which have the following structure:
CREATE TABLE Sales(
Time_id Date NOT NULL,
Geo_id CHAR(5) NOT NULL,
Prod_id CHAR(6) NOT NULL,
sales INT NOT NULL,
PRIMARY KEY(Time_id, Geo_id, Prod_id)
);
CREATE TABLE Time(
Time_id Date NOT NULL,
month CHAR(2) NOT NULL,
quarter CHAR(1) NOT NULL,
year CHAR(4) NOT NULL,
PRIMARY KEY(Time_id)
);
CREATE TABLE Geo(
Geo_id CHAR(5) NOT NULL,
city VARCHAR(20) NOT NULL,
dept CHAR(2) NOT NULL,
region CHAR(2) NOT NULL,
PRIMARY KEY(Geo_id)
);
I wrote the following query, which returns the right values (63 rows):
SELECT
Geo.region, Geo.dept, Time.month, Time.quarter, SUM(Sales.sales) AS sales
FROM
Geo, Sales, Time
WHERE
Sales.Time_id = Time.Time_id AND Sales.Geo_id = Geo.Geo_id
GROUP BY
GROUPING SETS(
(Geo.dept, Time.month),
(Geo.dept, Time.quarter),
(Geo.region, Time.month),
(Geo.region, Time.quarter),
Geo.dept,
Geo.region,
Time.month,
Time.quarter,
() -- thanks to @Larnu
)
I find my query to be really repetitive and I think there is a better way of doing it using ROLLUP. Can anyone please help me to simplify it, as I'm relatively new to SQL analytics?
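One possible simplification, offered as a sketch rather than a definitive answer: if every dept belongs to exactly one region and every month to exactly one quarter (an assumption about the data, not something the schema enforces), the nine grouping sets can be written as two concatenated rollups, GROUP BY ROLLUP(Geo.region, Geo.dept), ROLLUP(Time.quarter, Time.month). The Python snippet below only enumerates the grouping sets that such a clause expands to, so they can be compared with the list in the query; the helper names rollup and concatenated are made up for illustration.

```python
from itertools import product


def rollup(*cols):
    """Grouping sets of ROLLUP(cols): every prefix, longest first, ending with ()."""
    return [tuple(cols[:n]) for n in range(len(cols), -1, -1)]


def concatenated(*groupings):
    """Cross product of grouping-set lists, as for comma-separated ROLLUPs in GROUP BY."""
    return [sum(combo, ()) for combo in product(*groupings)]


sets = concatenated(rollup("region", "dept"), rollup("quarter", "month"))
for grouping_set in sets:
    print(grouping_set)
```

Each of the nine sets corresponds to one of the original ones once the parent level is added, for example (dept, month) becomes (region, dept, quarter, month). Under the stated assumption the row count should stay the same, with the parent columns filled in instead of NULL.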

Related

Creating a calculated field from other tables

I want to create a calculated amount field in the sales table that holds the total of the items related to the order, but I ran into a problem:
"subqueries cannot be used in the generated column expression."
Please tell me how you can do this, taking into account the fact that the architecture cannot be changed.
CREATE TABLE products (
id bigserial PRIMARY KEY,
name varchar(255) NOT NULL,
description varchar(255) NOT NULL,
price decimal NOT NULL
);
CREATE TABLE sales (
id bigserial PRIMARY KEY,
employee_id int NOT NULL,
created_at timestamp NOT NULL,
amount decimal GENERATED ALWAYS AS ( /* subquery */ ) STORED
);
CREATE TABLE products_sales (
sales_id int NOT NULL,
product_id int NOT NULL,
PRIMARY KEY (sales_id, product_id)
);
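Since a generated column cannot contain a subquery, one common workaround is to leave the stored column out and compute the total in a view instead. The sketch below is only an illustration under assumptions: it runs on SQLite through Python's sqlite3 module rather than PostgreSQL, the view name sales_with_amount and the sample rows are hypothetical, and whether adding a view counts as changing the architecture is for the asker to judge.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT NOT NULL, price NUMERIC NOT NULL);
CREATE TABLE sales (id INTEGER PRIMARY KEY, employee_id INT NOT NULL, created_at TEXT NOT NULL);
CREATE TABLE products_sales (
    sales_id INT NOT NULL,
    product_id INT NOT NULL,
    PRIMARY KEY (sales_id, product_id)
);

-- hypothetical view: amount is the sum of the prices of the products linked to the sale
CREATE VIEW sales_with_amount AS
SELECT s.id, s.employee_id, s.created_at, COALESCE(SUM(p.price), 0) AS amount
FROM sales s
LEFT JOIN products_sales ps ON ps.sales_id = s.id
LEFT JOIN products p ON p.id = ps.product_id
GROUP BY s.id, s.employee_id, s.created_at;

-- made-up sample rows
INSERT INTO products VALUES (1, 'pen', 2), (2, 'book', 10);
INSERT INTO sales VALUES (1, 7, '2020-05-20 10:00:00'), (2, 7, '2020-05-21 09:00:00');
INSERT INTO products_sales VALUES (1, 1), (1, 2);
""")

amounts = con.execute("SELECT id, amount FROM sales_with_amount ORDER BY id").fetchall()
print(amounts)
```

The LEFT JOINs keep sales that have no products yet, and COALESCE turns their missing total into 0.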

PostgreSQL table join

I have a lecture attendance database as a uni project.
As one of the views, I came up with the idea of a missed_lectures view with all records of the Not attended or Sick attendance types. When I run this query, it returns 2.5k rows, which is not correct.
Query looks like this
CREATE OR REPLACE VIEW missed_lectures AS
SELECT CONCAT(s.student_name, ' ', s.student_surname) AS "Student", sc.course_title AS "Study course", atp.attendance_type AS "Attendance type", a.record_date AS "Date"
FROM students AS s, study_courses AS sc, attendance_type AS atp, attendance AS a
WHERE
s.student_id=a.student_id AND
sc.course_id=a.course_id AND
a.attendance_type_id=atp.attendance_type_id AND
a.attendance_type_id=(SELECT attendance_type_id FROM attendance_type WHERE attendance_type='Sick') OR
a.attendance_type_id=(SELECT attendance_type_id FROM attendance_type WHERE attendance_type='Not attended')
GROUP BY s.student_name, s.student_surname, sc.course_title, atp.attendance_type, a.record_date;
This is the last query I came up with, but as I mentioned earlier, it returns 2.5k rows with incorrect data.
Can anybody spot the issue here?
EDIT:
table students is
CREATE TABLE students(
student_id serial PRIMARY KEY NOT NULL,
student_name VARCHAR(30) NOT NULL,
student_surname VARCHAR(35) NOT NULL,
matriculation_number VARCHAR(7) NOT NULL CHECK(matriculation_number ~ '[A-Z]{2}[0-9]{5}'),
faculty_id INT NOT NULL,
course INT NOT NULL,
phone_number CHAR(8) CHECK(phone_number ~ '^2{1}[0-9]{7}'),
email VARCHAR(35),
gender VARCHAR(10)
);
sample data:
INSERT INTO students (student_name, student_surname, matriculation_number, faculty_id, course, phone_number, email, gender)
VALUES
('Sandis','Bērziņš','IT19047',7,1,'25404213','sandis.berzins@gmail.com','man'),
('Einārs','Kļaviņš','IT19045',7,1,'24354654','einars.klavins@gmail.com','man'),
('Jana','Lapa','EF18034',8,2,'26224941','lapajana@inbox.lv','woman'),
('Sanija','Bērza','EF18034',8,2,'24543433','berzasanija@inbox.lv','woman'),
('Valdis','Sijāts','TF19034',4,1,'25456545','valdis.sijats@gmail.com','man'),
('Jānis','Bānis','IT17034',7,3,'24658595','banis.janis@inbox.lv','man');
table study_courses is
CREATE TABLE study_courses(
course_id serial PRIMARY KEY NOT NULL,
course_title VARCHAR(55) NOT NULL,
course_code VARCHAR(8) NOT NULL CHECK(course_code ~ '[a-zA-Z]{4}[0-9]{4}'),
credit_points INT
);
sample data:
INSERT INTO study_courses (course_title, course_code, credit_points)
VALUES
('Fundamentals of Law','JurZ2005',2),
('Database technologies II','DatZ2005',2),
('Product processing','PārZ3049',4),
('Arhitecture','Arhi3063',3),
('Forest soils','LauZ1015',4);
Table attendance_type is:
CREATE TABLE attendance_type(
attendance_type_id serial PRIMARY KEY NOT NULL,
attendance_type VARCHAR(15) NOT NULL
);
sample data:
INSERT INTO attendance_type (attendance_type)
VALUES
('Attended'),
('Not attended'),
('Late'),
('Sick');
table attendance is:
CREATE TABLE attendance(
record_id serial PRIMARY KEY NOT NULL,
student_id INT NOT NULL,
course_id INT NOT NULL,
attendance_type_id INT NOT NULL,
lecturer_id INT,
lecture_type_id INT NOT NULL,
audience_id INT NOT NULL,
record_date DATE NOT NULL
);
sample data:
INSERT INTO attendance (student_id, course_id, attendance_type_id, lecturer_id, lecture_type_id, audience_id, record_date)
VALUES
(1,2,1,1,1,14,'20-05-2020'),
(2,2,1,1,1,14,'20-05-2020'),
(6,9,1,13,2,2,'20-05-2020'),
(22,9,2,13,2,2,'20-05-2020'),
(24,9,3,13,2,2,'20-05-2020');
Hoping this will help.
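The problem is the OR condition in your WHERE clause. AND binds more tightly than OR, so without parentheses the 'Not attended' test is evaluated on its own, without the join conditions, and matches every combination of rows from the other tables.
If you keep the subqueries, you have to wrap the two OR-ed tests in parentheses.
However, you're already querying the attendance_type table, so you can just use it to filter for those two attendance types.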
SELECT
CONCAT(s.student_name, ' ', s.student_surname) AS "Student",
sc.course_title AS "Study course",
atp.attendance_type AS "Attendance type",
a.record_date AS "Date"
FROM
students AS s, study_courses AS sc,
attendance_type AS atp, attendance AS a
WHERE
s.student_id=a.student_id AND
sc.course_id=a.course_id AND
a.attendance_type_id=atp.attendance_type_id AND
-- filter for these two conditions without subqueries
atp.attendance_type in ('Sick','Not attended')
GROUP BY
CONCAT(s.student_name, ' ', s.student_surname), sc.course_title,
atp.attendance_type, a.record_date;
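The precedence effect can be reproduced on a tiny data set. This is a minimal sketch, assuming SQLite through Python's sqlite3 module instead of PostgreSQL, with the tables cut down to the columns that matter and three made-up attendance rows:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE attendance_type (
    attendance_type_id INTEGER PRIMARY KEY,
    attendance_type TEXT NOT NULL
);
CREATE TABLE attendance (
    record_id INTEGER PRIMARY KEY,
    student_id INT NOT NULL,
    attendance_type_id INT NOT NULL
);
INSERT INTO attendance_type (attendance_type)
VALUES ('Attended'), ('Not attended'), ('Late'), ('Sick');
-- hypothetical rows: one attended, one not attended, one sick
INSERT INTO attendance (student_id, attendance_type_id) VALUES (1, 1), (2, 2), (3, 4);
""")

sick = "(SELECT attendance_type_id FROM attendance_type WHERE attendance_type = 'Sick')"
missed = "(SELECT attendance_type_id FROM attendance_type WHERE attendance_type = 'Not attended')"
base = "SELECT COUNT(*) FROM attendance a, attendance_type atp WHERE "

# No parentheses: parsed as (join AND sick) OR not_attended
loose = con.execute(
    base + f"a.attendance_type_id = atp.attendance_type_id "
           f"AND a.attendance_type_id = {sick} OR a.attendance_type_id = {missed}"
).fetchone()[0]

# Parentheses around the OR keep the join condition in force for both tests
strict = con.execute(
    base + f"a.attendance_type_id = atp.attendance_type_id "
           f"AND (a.attendance_type_id = {sick} OR a.attendance_type_id = {missed})"
).fetchone()[0]

# Filtering on the joined table with IN, without subqueries
with_in = con.execute(
    base + "a.attendance_type_id = atp.attendance_type_id "
           "AND atp.attendance_type IN ('Sick', 'Not attended')"
).fetchone()[0]

print(loose, strict, with_in)
```

With these rows the unparenthesized version counts 5 rows (the 'Not attended' record is paired with all four attendance types), while the other two count the expected 2.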

How to get the highest appearance value in one table, from another table?

I have 3 tables covering cars, insurance and accidents.
I want to know which brand of vehicle had the most accidents, compared to the others.
I have tried a lot of ways to do that, but I mostly get either all the brands returned or the brand of the car that was registered the most, not the one that had the most accidents.
These are my tables
create table car(
n_veic bigint not null,
matric varchar(15) not null,
pais_matric text not null,
n_pess bigint not null,
tipo text not null,
cor text not null,
brand text not null,
modelo varchar(15),
primary key(n_veic),
unique(matric),
foreign key (n_pess) references pessoa(n_pess)
);
create table ensurance(
apolice bigint not null,
segurado bigint not null,
car bigint not null,
datai date not null,
dataf date not null,
cobertura numeric(10,2) not null,
primary key(apolice),
unique(segurado, car),
foreign key (segurado) references pessoa(n_pess),
foreign key (car) references car(n_veic)
);
create table accident(
n_acid bigint not null,
pess_segura bigint not null,
veic_seguro bigint not null,
data date not null,
local varchar(255) not null,
descr text not null,
primary key(n_acid),
unique(n_acid, veic_seguro),
foreign key (pess_segura,veic_seguro) references ensurance(segurado, car)
);
This is what i tried
SELECT brand
FROM car NATURAL JOIN accident
GROUP BY brand
HAVING count (distinct n_veic)>=ALL
(SELECT count (distinct n_veic)
FROM car NATURAL JOIN accident
GROUP BY brand);
I think the logic is:
select c.brand, count(*) as num_accidents
from accident a join
car c
on a.veic_seguro = c.n_veic
group by c.brand
order by num_accidents desc;
You can use fetch first 1 row only -- or whatever is appropriate for your database -- to get only one row.
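A small runnable check of this logic, as a sketch under assumptions: it uses SQLite through Python's sqlite3 module, where LIMIT 1 plays the role of fetch first 1 row only, the table and column names follow the CREATE TABLE statements in the question (reduced to the relevant columns), and the rows are made up so that the most registered brand is not the one with the most accidents.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE car (n_veic INTEGER PRIMARY KEY, brand TEXT NOT NULL);
CREATE TABLE accident (
    n_acid INTEGER PRIMARY KEY,
    veic_seguro INTEGER NOT NULL REFERENCES car(n_veic)
);
-- hypothetical data: Fiat has the most cars, Opel the most accidents
INSERT INTO car VALUES (1, 'Fiat'), (2, 'Fiat'), (3, 'Opel'), (4, 'Seat');
INSERT INTO accident VALUES (10, 3), (11, 3), (12, 3), (13, 1), (14, 4);
""")

top = con.execute("""
SELECT c.brand, COUNT(*) AS num_accidents
FROM accident a
JOIN car c ON a.veic_seguro = c.n_veic
GROUP BY c.brand
ORDER BY num_accidents DESC
LIMIT 1
""").fetchone()
print(top)
```

Joining on the explicit key pair (veic_seguro, n_veic) is what ties each accident to its car; the two tables share no column name, so a NATURAL JOIN between them would pair every car with every accident.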
Try this.
Notes:
1. Avoid NATURAL JOIN and join on specific columns instead.
2. Reconsider whether DISTINCT in the count is really necessary.
SELECT TOP 1 c.brand, COUNT(DISTINCT c.n_veic)
FROM car c
JOIN accident a ON a.veic_seguro = c.n_veic
GROUP BY c.brand
ORDER BY COUNT(DISTINCT c.n_veic) DESC

How to store date and compare it in sql server?

Basically, I need to make a database for plants and their harvest dates.
For example, a table with all the plants:
CREATE TABLE plant
(
plant_name NVARCHAR(20) NOT NULL
PRIMARY KEY ,
best_to_harvest_day INT NOT NULL ,
best_to_harvest_month NVARCHAR(15)
)
Example for plant entry: Rose 16 December
And another table called harvests,
which holds the harvested plants and the dates when they were harvested.
CREATE TABLE harvests
(
plant_name nvarchar(20) NOT NULL FOREIGN KEY REFERENCES plant(plant_name),
amount int NOT NULL,
harvested_day int NOT NULL,
harvested_month nvarchar(15),
harvested_year int NOT NULL
)
This method does work, because I can write a SQL query to compare which plants were harvested at their best time, etc.
But isn't there a tidier way?
Something like this (using the date type):
CREATE TABLE plant
(
plant_name NVARCHAR(20) NOT NULL
PRIMARY KEY ,
best_to_harvest DATE --But here should only be day and month, not year.
)
CREATE TABLE harvests
(
plant_name NVARCHAR(20) NOT NULL
FOREIGN KEY REFERENCES plant ( plant_name ) ,
amount INT NOT NULL ,
harvested DATE --But here i need full date year,day,month
)
Bottom line is that I need to compare them.
Okay, I think I can use EXTRACT(unit FROM date)
and then compare them, but the question still stands: how do I make the plant table's date not include the year?
First, store the date parts as numbers and check their values. This isn't perfect, but probably good enough:
CREATE TABLE plant (
plant_id int not null PRIMARY KEY,
plant_name nvarchar(20) NOT NULL UNIQUE,
best_to_harvest_day int NOT NULL,
best_to_harvest_month int not NULL,
check (best_to_harvest_day between 1 and 31),
check (best_to_harvest_month between 1 and 12)
);
Note the inclusion of an integer primary key (in SQL Server it could be declared as an identity column so the values are generated automatically). This is recommended, because integers are more efficient for foreign key references.
Then use date for the harvest:
CREATE TABLE harvests (
harvest_id int not null primary key,
plant_id int NOT NULL FOREIGN KEY REFERENCES plant(plant_id),
amount int NOT NULL,
harvested date -- full date: year, month, day
);
And you can do:
select h.*,
(case when p.best_to_harvest_day = day(h.harvested) and
p.best_to_harvest_month = month(h.harvested)
then 'Y' else 'N'
end) as harvested_at_best_time
from harvests h join
plant p
on h.plant_id = p.plant_id;
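The same comparison can be tried end to end. This is a sketch under assumptions, not SQL Server code: it runs on SQLite through Python's sqlite3 module, where strftime('%d', ...) and strftime('%m', ...) stand in for day() and month(), dates are stored as ISO text, and the harvest rows are made up (the Rose entry for 16 December comes from the question).

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE plant (
    plant_id INTEGER PRIMARY KEY,
    plant_name TEXT NOT NULL UNIQUE,
    best_to_harvest_day INT NOT NULL CHECK (best_to_harvest_day BETWEEN 1 AND 31),
    best_to_harvest_month INT NOT NULL CHECK (best_to_harvest_month BETWEEN 1 AND 12)
);
CREATE TABLE harvests (
    harvest_id INTEGER PRIMARY KEY,
    plant_id INT NOT NULL REFERENCES plant(plant_id),
    amount INT NOT NULL,
    harvested TEXT NOT NULL  -- ISO date string, e.g. '2019-12-16'
);
INSERT INTO plant VALUES (1, 'Rose', 16, 12);
-- hypothetical harvests: one on the best day, one not
INSERT INTO harvests VALUES (1, 1, 5, '2019-12-16'), (2, 1, 3, '2020-11-02');
""")

rows = con.execute("""
SELECT h.harvest_id,
       CASE WHEN p.best_to_harvest_day = CAST(strftime('%d', h.harvested) AS INTEGER)
             AND p.best_to_harvest_month = CAST(strftime('%m', h.harvested) AS INTEGER)
            THEN 'Y' ELSE 'N' END AS harvested_at_best_time
FROM harvests h
JOIN plant p ON h.plant_id = p.plant_id
ORDER BY h.harvest_id
""").fetchall()
print(rows)
```

Because only the day and month parts of the harvest date are compared, the year of the harvest does not matter.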

SQL Logic and Aggregate Issue

I have the following problem to solve in SQL :
d) A query that provides management information on take up of the various types of activities on offer. For each type of activity, the query should show the total number of individuals who took that type of activity and the average number of individuals taking each type of activity.
Here are my tables :
CREATE TABLE accommodations
(
chalet_number int PRIMARY KEY,
chalet_name varchar(40) NOT NULL,
no_it_sleeps number(2) NOT NULL,
indivppw number(4) NOT NULL
)
CREATE TABLE supervisors
(
supervisor_number int PRIMARY KEY,
supervisor_forename varchar(30) NOT NULL,
supervisor_surname varchar(30) NOT NULL,
mobile_number varchar(11) NOT NULL
)
CREATE TABLE visitors
(
visitor_ID int PRIMARY KEY,
group_ID int NOT NULL,
forename varchar(20) NOT NULL,
surname varchar(20) NOT NULL,
dob date NOT NULL,
gender varchar(1) NOT NULL
)
CREATE TABLE activities
(
activity_code varchar(10) PRIMARY KEY,
activity_title varchar(20) NOT NULL,
"type" varchar(20) NOT NULL
)
CREATE TABLE "groups"
(
group_ID int PRIMARY KEY,
group_leader varchar(20) NOT NULL,
group_name varchar(30),
number_in_group number(2) NOT NULL
)
CREATE TABLE bookings
(
group_ID int NOT NULL,
start_date date NOT NULL,
chalet_number int NOT NULL,
no_in_chalet number(2) NOT NULL,
end_date date NOT NULL,
CONSTRAINT bookings_pk PRIMARY KEY(group_ID, chalet_number));
CREATE TABLE schedule
(
schedule_ID int PRIMARY KEY,
activity_code varchar(10) NOT NULL,
time_of_activity number(4,2) NOT NULL,
am_pm varchar(2) NOT NULL,
"date" date NOT NULL
)
CREATE TABLE activity_bookings
(
visitor_ID int NOT NULL,
schedule_ID int NOT NULL,
supervisor_number int NOT NULL,
comments varchar(200),
CONSTRAINT event_booking_pk PRIMARY KEY(visitor_ID, schedule_ID));
ALTER TABLE visitors
ADD FOREIGN KEY (group_ID)
REFERENCES "groups"(group_ID)
ALTER TABLE Schedule
ADD FOREIGN KEY (activity_code)
REFERENCES activities(activity_code)
ALTER TABLE bookings
ADD FOREIGN KEY (group_ID)
REFERENCES "groups"(group_ID)
ALTER TABLE bookings
ADD FOREIGN KEY (chalet_number)
REFERENCES accommodations(chalet_number)
ALTER TABLE activity_bookings
ADD FOREIGN KEY (visitor_ID)
REFERENCES visitors(visitor_ID)
ALTER TABLE activity_bookings
ADD FOREIGN KEY (schedule_ID)
REFERENCES schedule(schedule_ID)
ALTER TABLE activity_bookings
ADD FOREIGN KEY (supervisor_number)
REFERENCES supervisors(supervisor_number)
I have the following solution:
SELECT activities."type", 'overalltotal' AS OT, ('overalltotal' / 'activities') AS AVG
FROM activities, schedule
WHERE 'overalltotal' = (SELECT SUM(COUNT(schedule_ID))
FROM activities, schedule
WHERE schedule.activity_code = activities.activity_code
GROUP BY activities."type"
)
AND 'activities' = (SELECT COUNT(DISTINCT activities."type")
FROM activities
)
AND schedule.activity_code = activities.activity_code
GROUP BY activities."type";
I have implemented sample data and code to check the variables above:
SELECT SUM(COUNT(schedule_ID))
FROM activities, schedule
WHERE schedule.activity_code = activities.activity_code
GROUP BY activities."type";
Result : 20
SELECT COUNT(DISTINCT activities."type")
FROM activities;
Result : 5
However when running the code :
ORA-01722: invalid number
01722. 00000 - "invalid number"
*Cause:
*Action:
EDIT:
Using Dave's code I have the following output:
Snowboarding 15
sledding 19
Snowmobiling 6
Ice Skating 5
Skiing 24
How would I do the final part of the question,
"and the average number of individuals taking each type of activity"?
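In Oracle, single quotes delimit text strings; identifiers such as column names and aliases are written bare or in double quotes, for example "overalltotal". Because 'overalltotal' and 'activities' are strings, comparing them with numbers and dividing one by the other forces Oracle to convert them to numbers, which is why you're getting an invalid number error.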
EDIT: This is probably the type of query you want to use:
SELECT a."type", COUNT(*) AS total, COUNT(*)/(COUNT(*) OVER ()) AS "avg"
FROM activities a
JOIN schedule s ON a.activity_code=s.activity_code
JOIN activity_bookings ab ON s.schedule_ID=ab.schedule_ID
GROUP BY a."type";
Basically, because each activity booking has one visitor id, we want to get all the activity bookings for each activity. We have to go through schedule to do that. Then we group the rows by the activity type and count how many activity bookings we have for each type.
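The join path described above can be checked on a few rows. This is a minimal sketch, assuming SQLite through Python's sqlite3 module instead of Oracle, tables reduced to the columns used, and made-up sample rows. It also shows one possible reading of the "average" part of the question, namely the mean of the per-type totals; that interpretation is an assumption, not something the assignment text settles.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE activities (
    activity_code TEXT PRIMARY KEY,
    activity_title TEXT NOT NULL,
    "type" TEXT NOT NULL
);
CREATE TABLE schedule (schedule_ID INTEGER PRIMARY KEY, activity_code TEXT NOT NULL);
CREATE TABLE activity_bookings (
    visitor_ID INT NOT NULL,
    schedule_ID INT NOT NULL,
    PRIMARY KEY (visitor_ID, schedule_ID)
);
-- hypothetical sample data
INSERT INTO activities VALUES
    ('SKI1', 'Beginner skiing', 'Skiing'),
    ('SKI2', 'Advanced skiing', 'Skiing'),
    ('ICE1', 'Ice rink', 'Ice Skating');
INSERT INTO schedule VALUES (1, 'SKI1'), (2, 'SKI2'), (3, 'ICE1');
INSERT INTO activity_bookings VALUES (1, 1), (2, 1), (3, 2), (1, 3);
""")

per_type_sql = """
SELECT a."type" AS activity_type, COUNT(*) AS total
FROM activities a
JOIN schedule s ON a.activity_code = s.activity_code
JOIN activity_bookings ab ON s.schedule_ID = ab.schedule_ID
GROUP BY a."type"
"""

# Bookings per activity type
per_type = con.execute(per_type_sql + " ORDER BY activity_type").fetchall()

# Mean of the per-type totals (one interpretation of the "average")
average = con.execute(f"SELECT AVG(total) FROM ({per_type_sql})").fetchone()[0]

print(per_type, average)
```

COUNT(*) here counts bookings, so a visitor who books two sessions of the same type is counted twice; COUNT(DISTINCT ab.visitor_ID) would count individuals instead.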