Finding all instances where a foreign key appears multiple times grouped by month - sql

I am not too familiar with SQL, and I have been tasked with something that I quite frankly have no clue how to approach.
I am just going to simplify the tables to the point where only the necessary fields are taken into consideration.
The tables look as follows.
Submission(course(string), student(foreign_key), date-submitted)
Student(id)
What I need to do is produce a table of active students per month, per course, with a total. An active student is anyone who has more than 4 submissions in the month. I am only looking at specific courses, so I will need to hard-code the values that I need; for the sake of the example, "CourseA" and "CourseB".
The result should be as follows
month   | CourseA | CourseB | Total
------------------------------------------
03/2020 | 50      | 27      | 77
02/2020 | 25      | 12      | 37
01/2020 | 43      | 20      | 63
Any help would be greatly appreciated.

You can do this with two levels of aggregation: first by month, course and student (while filtering on students having more than 4 submissions), then by month (while pivoting the dataset):
select
month_submitted,
count(*) filter(where course = 'courseA') active_students_in_courseA,
count(*) filter(where course = 'courseB') active_students_in_courseB,
count(*) total
from (
select
date_trunc('month', date_submitted) month_submitted,
course,
student_id,
count(*) no_submissions
from submission
where course in ('courseA', 'courseB')
group by 1, 2, 3
having count(*) > 4
) t
group by 1
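To see the two levels of aggregation at work, here is a minimal sketch that runs the same idea against an in-memory SQLite database through Python's sqlite3 module. SQLite only stands in for your real database here, so date_trunc is swapped for strftime and the filter clause for SUM(CASE ...); the column names and the sample rows are assumptions made up for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE submission (course TEXT, student_id INTEGER, date_submitted TEXT)"
)

# Hypothetical sample data, invented for this sketch.
rows = []
# Student 1: 5 submissions to courseA in March 2020 (active).
rows += [("courseA", 1, f"2020-03-{d:02d}") for d in range(1, 6)]
# Student 2: 4 submissions to courseA in March 2020 (not active, needs more than 4).
rows += [("courseA", 2, f"2020-03-{d:02d}") for d in range(1, 5)]
# Student 3: 6 submissions to courseB in March 2020 (active).
rows += [("courseB", 3, f"2020-03-{d:02d}") for d in range(1, 7)]
# Student 1: 5 submissions to courseB in February 2020 (active).
rows += [("courseB", 1, f"2020-02-{d:02d}") for d in range(1, 6)]
conn.executemany("INSERT INTO submission VALUES (?, ?, ?)", rows)

query = """
SELECT month_submitted,
       SUM(CASE WHEN course = 'courseA' THEN 1 ELSE 0 END) AS course_a,
       SUM(CASE WHEN course = 'courseB' THEN 1 ELSE 0 END) AS course_b,
       COUNT(*) AS total
FROM (
    -- First level: one row per month, course and active student.
    SELECT strftime('%Y-%m', date_submitted) AS month_submitted,
           course,
           student_id
    FROM submission
    WHERE course IN ('courseA', 'courseB')
    GROUP BY month_submitted, course, student_id
    HAVING COUNT(*) > 4
) AS t
-- Second level: pivot the active students into one row per month.
GROUP BY month_submitted
ORDER BY month_submitted DESC
"""
result = conn.execute(query).fetchall()
for row in result:
    print(row)
```

Note that a student who is active in both courses in the same month is counted once per course, so the total is the sum of the two course columns, as in the expected output of the question.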

You could do subqueries using the WITH keyword like this:
WITH monthsA AS (
-- note: COUNT(*) here counts submissions per month; to count active students,
-- first group by student and keep only groups HAVING COUNT(*) > 4
SELECT to_char(date_submitted, 'MM/YYYY') AS month, course, COUNT(*) AS students
FROM Submission
WHERE course = 'courseA'
GROUP BY 1, 2
), monthsB AS (
SELECT to_char(date_submitted, 'MM/YYYY') AS month, course, COUNT(*) AS students
FROM Submission
WHERE course = 'courseB'
GROUP BY 1, 2
)
SELECT ma.month,
COALESCE(ma.students, 0) AS courseA,
COALESCE(mb.students, 0) AS courseB,
COALESCE(ma.students, 0) + COALESCE(mb.students, 0) AS Total
FROM monthsA ma
-- months that only have courseB submissions are dropped by this LEFT JOIN
LEFT JOIN monthsB mb ON ma.month = mb.month
ORDER BY 1 DESC

Related

How to consecutively count everything greater than or equal to itself in SQL?

Let's say if I have a table that contains Equipment IDs of equipments for each Equipment Type and Equipment Age, how can I do a Count Distinct of Equipment IDs that have at least that Equipment Age.
For example, let's say this is all the data we have:
equipment_type | equipment_id | equipment_age
---------------------------------------------
Screwdriver    | A123         | 1
Screwdriver    | A234         | 2
Screwdriver    | A345         | 2
Screwdriver    | A456         | 2
Screwdriver    | A567         | 3
I would like the output to be:
equipment_type | equipment_age | count_of_equipment_at_least_this_age
---------------------------------------------------------------------
Screwdriver    | 1             | 5
Screwdriver    | 2             | 4
Screwdriver    | 3             | 1
Reason is there are 5 screwdrivers that are at least 1 day old, 4 screwdrivers at least 2 days old and only 1 screwdriver at least 3 days old.
So far I have only been able to count the equipment that falls within each equipment_age (with the query shown below), but not "at least that equipment_age".
SELECT
equipment_type,
equipment_age,
COUNT(DISTINCT equipment_id) as count_of_equipments
FROM equipment_table
GROUP BY 1, 2
Consider below join-less solution
select distinct
equipment_type,
equipment_age,
count(*) over equipment_at_least_this_age as count_of_equipment_at_least_this_age
from equipment_table
window equipment_at_least_this_age as (
partition by equipment_type
order by equipment_age
range between current row and unbounded following
)
Applied to the sample data in your question, this produces the desired output shown there.
Use a self join approach:
SELECT
e1.equipment_type,
e1.equipment_age,
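-- DISTINCT is needed: each e2 row is matched once per e1 row of the same age
COUNT(DISTINCT e2.equipment_id) AS count_of_equipments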
FROM equipment_table e1
INNER JOIN equipment_table e2
ON e2.equipment_type = e1.equipment_type AND
e2.equipment_age >= e1.equipment_age
GROUP BY 1, 2
ORDER BY 1, 2;
GROUP BY restricts the scope of COUNT to the rows in the group, i.e. it will not let you reach other rows (rows with equipment_age greater than that of the current group). So you need a subquery or windowing functions to get those. One way:
SELECT DISTINCT
equipment_type,
equipment_age,
(SELECT COUNT(DISTINCT cnt.equipment_id)
FROM equipment_table cnt
WHERE cnt.equipment_type = a.equipment_type
AND cnt.equipment_age >= a.equipment_age
) AS count_of_equipments
FROM equipment_table a
I am not sure if your environment supports this syntax, though. If not, let us know and we will find another way.
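As a quick check of the self-join idea on the sample data from the question, here is a small sketch using Python's sqlite3 module with an in-memory database (SQLite is only a stand-in for your actual engine). It puts a plain COUNT(*) next to COUNT(DISTINCT ...) to show why the distinct count matters when several rows share the same age.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE equipment_table "
    "(equipment_type TEXT, equipment_id TEXT, equipment_age INTEGER)"
)
# Sample rows copied from the question.
conn.executemany(
    "INSERT INTO equipment_table VALUES (?, ?, ?)",
    [
        ("Screwdriver", "A123", 1),
        ("Screwdriver", "A234", 2),
        ("Screwdriver", "A345", 2),
        ("Screwdriver", "A456", 2),
        ("Screwdriver", "A567", 3),
    ],
)

query = """
SELECT e1.equipment_type,
       e1.equipment_age,
       -- Each e2 row is counted once, however many e1 rows share this age.
       COUNT(DISTINCT e2.equipment_id) AS count_at_least_this_age,
       -- Plain COUNT(*) multiplies by the number of e1 rows in the group.
       COUNT(*) AS naive_count
FROM equipment_table e1
JOIN equipment_table e2
  ON e2.equipment_type = e1.equipment_type
 AND e2.equipment_age >= e1.equipment_age
GROUP BY e1.equipment_type, e1.equipment_age
ORDER BY e1.equipment_type, e1.equipment_age
"""
result = conn.execute(query).fetchall()
for row in result:
    print(row)
```

For age 2 the naive count comes out as 12 (three e1 rows times four matching e2 rows), while the distinct count gives the expected 4.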

GROUP BY one column, then GROUP BY another column

I have a sales table t in my database:
ID | TYPE | AGE
---------------
1  | B    | 20
1  | BP   | 20
1  | BP   | 20
1  | P    | 20
2  | B    | 30
2  | BP   | 30
2  | BP   | 30
3  | P    | 40
If a person buys a bundle, the table contains one row for the bundle sale (TYPE B) and one row for each product in the bundle (TYPE BP), all with the same ID. So a bundle with 2 products appears 3 times (1x TYPE B and 2x TYPE BP) under the same ID.
A person can also buy any other product in that same sale (TYPE P), which also has the same ID.
I need to calculate the average/min/max age of the customers but the multiple entries per sale tamper with the correct calculation.
The real average age is
(20 + 30 + 40) / 3 = 30
and not
(20+20+20+20 + 30+30+30 + 40) / 8 = 26,25
But I don't know how I can reduce the sales to a single row entry AND get the 4 needed values?
Do I need to GROUP BY twice (first by ID, then by AGE?) and if yes, how can I do it?
My code so far:
SELECT
AVERAGE(AGE)
, MIN(AGE)
, MAX(AGE)
, MEDIAN(AGE)
FROM t
but that counts every row.
Assuming the age is the same for all rows with the same ID (which in itself indicates a normalisation problem), you can use nested aggregation:
select avg(min(age)) from sales
group by id
AVG(MIN(AGE))
-------------
30
The example in the documentation is very similar; and is explained as:
This calculation evaluates the inner aggregate (MAX(salary)) for each group defined by the GROUP BY clause (department_id), and aggregates the results again.
So for your version:
This calculation evaluates the inner aggregate (MIN(age)) for each group defined by the GROUP BY clause (id), and aggregates the results again.
It doesn't really matter whether the inner aggregate is min or max - again, assuming they are all the same - it's just to get a single value per ID, which can then be averaged.
You can do the same for the other values in your original query:
select
avg(min(age)) as avg_age,
min(min(age)) as min_age,
max(min(age)) as max_age,
median(min(age)) as med_age
from sales
group by id;
AVG_AGE MIN_AGE MAX_AGE MED_AGE
------- ------- ------- -------
30 20 40 30
Or, if you prefer, you could get the one-age-per-ID values once in a CTE or subquery and apply the second layer of aggregation to that:
select
avg(age) as avg_age,
min(age) as min_age,
max(age) as max_age,
median(age) as med_age
from (
select min(age) as age
from sales
group by id
);
which gets the same result.
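If you want to try the subquery form without an Oracle instance, here is a rough sketch using Python's sqlite3 module with the sample rows from the question. SQLite is an assumption here, not your environment: it has neither nested aggregates nor a MEDIAN function, so only the subquery variant is shown and the median is computed in Python from the per-ID ages.

```python
import sqlite3
import statistics

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (id INTEGER, type TEXT, age INTEGER)")
# Sample rows copied from the question.
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [
        (1, "B", 20), (1, "BP", 20), (1, "BP", 20), (1, "P", 20),
        (2, "B", 30), (2, "BP", 30), (2, "BP", 30),
        (3, "P", 40),
    ],
)

# Inner level: collapse every sale (ID) to a single age.
# Outer level: aggregate those per-sale ages.
summary = conn.execute(
    """
    SELECT AVG(age), MIN(age), MAX(age)
    FROM (SELECT MIN(age) AS age FROM sales GROUP BY id) AS per_sale
    """
).fetchone()

# Median computed client-side, since SQLite has no MEDIAN aggregate.
per_sale_ages = [
    age for (age,) in conn.execute("SELECT MIN(age) FROM sales GROUP BY id")
]
median_age = statistics.median(per_sale_ages)

# For comparison: the skewed average over every row.
naive_avg = conn.execute("SELECT AVG(age) FROM sales").fetchone()[0]

print(summary, median_age, naive_avg)
```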

Compare row with every other row in the table except the previous ones, how can I do this?

How can I write a query that returns the sum of ages for each pair of people? I want to compare the current row with every other row in the table except the previous ones. For example, I have a Person table with 3 records (my real table has a lot more). Person 1's age should be compared with Person 2 and Person 3, and Person 2 should be compared with Person 3. How can I accomplish this in query form?
You can generate a row with all the values using an inequality join:
select p1.*, p2.*
from person p1 join
person p2
on p1.id < p2.id;
What @gordon-linoff wrote as an answer for the join was correct. It's just missing the total calculation. Here is how the total can be accomplished (I had to open a new answer since I couldn't edit Gordon's answer, sorry):
SELECT
p1.id
, p1.name
, p2.id
, p2.name
, (p1.age + p2.age) TotalAge
FROM
person p1
JOIN person p2 ON p1.id < p2.id;
This compares [person.id =1 with person.id =2, 3, 4, ...], [person.id = 2 with person.id =3, 4, ...] and so on. The last p1.id will not be compared this way as there is no further p2.id greater (but that person will be included in all other comparisons as p2.id because it is larger than all others). This should satisfy the matching/joining requirement asked.
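Here is a small sketch of the inequality join on three rows, run through Python's sqlite3 module. The table layout (id, name, age) and the sample ages are assumptions for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (id INTEGER, name TEXT, age INTEGER)")
# Hypothetical sample rows.
conn.executemany(
    "INSERT INTO person VALUES (?, ?, ?)",
    [(1, "Person 1", 23), (2, "Person 2", 34), (3, "Person 3", 30)],
)

# p1.id < p2.id keeps each unordered pair exactly once and
# never pairs a row with itself.
pairs = conn.execute(
    """
    SELECT p1.id, p2.id, p1.age + p2.age AS total_age
    FROM person p1
    JOIN person p2 ON p1.id < p2.id
    ORDER BY p1.id, p2.id
    """
).fetchall()
for pair in pairs:
    print(pair)
```

With n rows this yields n * (n - 1) / 2 pairs, so the result grows quadratically on a large table.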
Whenever you want to do this kind of comparison, it's assumed that you have an order defined. In this case the order appears to be the person ID.
You can use an analytic function with a windowing clause to get the desired result.
Use SUM as the analytic function, with a window that runs from the next row to the last row in your order.
Thus the query becomes
WITH
persons
AS
(SELECT 1 AS personid, 'Person 1' AS name, 23 AS age FROM DUAL
UNION
SELECT 2 AS personid, 'Person 2' AS name, 34 AS age FROM DUAL
UNION
SELECT 3 AS personid, 'Person 3' AS name, 30 AS age FROM DUAL
UNION
SELECT 4 AS personid, 'Person 4' AS name, 28 AS age FROM DUAL)
SELECT personid,
name,
age,
SUM (age)
OVER (ORDER BY personid ROWS BETWEEN 1 FOLLOWING AND UNBOUNDED FOLLOWING)
AS subsequent_sum_age
FROM persons;
The ROWS BETWEEN 1 FOLLOWING AND UNBOUNDED FOLLOWING is your windowing clause which tells Oracle which rows to consider for summation.
The result of this is
PERSONID  NAME      AGE  SUBSEQUENT_SUM_AGE
1         Person 1  23   92
2         Person 2  34   58
3         Person 3  30   28
4         Person 4  28
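If window functions are not available to you, the same "sum of all later rows" can be sketched with a correlated subquery. The example below uses Python's sqlite3 module and the four sample rows from the query above; it is an alternative formulation under that assumption, not the analytic-function query itself.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE persons (personid INTEGER, name TEXT, age INTEGER)")
# Same sample rows as in the WITH clause above.
conn.executemany(
    "INSERT INTO persons VALUES (?, ?, ?)",
    [
        (1, "Person 1", 23),
        (2, "Person 2", 34),
        (3, "Person 3", 30),
        (4, "Person 4", 28),
    ],
)

# For each person, sum the ages of everyone with a larger personid.
# The last person has no later rows, so the subquery returns NULL (None).
result = conn.execute(
    """
    SELECT p.personid,
           (SELECT SUM(q.age)
            FROM persons q
            WHERE q.personid > p.personid) AS subsequent_sum_age
    FROM persons p
    ORDER BY p.personid
    """
).fetchall()
for row in result:
    print(row)
```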

Case Statement for multiple criteria

I would like to ignore some of the results of my query because, for all intents and purposes, they are duplicates. Because of the way the request was made, we need to use this hierarchy, and although we are seeing different Company_Name values, we need to ignore one of the results.
Query:
SELECT
COUNT(DISTINCT A12.Company_name) AS Customer_Name_Count,
Company_Name,
SUM(Total_Sales) AS Total_Sales
FROM
some_table AS A12
GROUP BY
2
ORDER BY
3 ASC, 2 ASC
This code omits half a dozen joins and WHERE clauses that are not germane to this question.
Results:
   Customer_Name_Count  Company_Name        Total_Sales
-------------------------------------------------------------
1  3                    Blockbuster         1,000
2  6                    Jimmy's Bar         1,500
3  6                    Jimmy's Restaurant  1,500
4  9                    Impala Hotel        2,000
5  12                   Sports Drink        2,500
In the above set, we can see that numbers 2 & 3 have the same count and the same total_sales number and similar company names. Is there a way to create a case statement that takes these 3 factors into consideration and then drops one or the other for Jimmy's enterprises? The other issue is that this has to be variable as there are other instances where this happens. And I would only want this to happen if the count and sales number match each other with a similar name in the company name.
Desired result:
   Customer_Name_Count  Company_Name        Total_Sales
--------------------------------------------------------------
1  3                    Blockbuster         1,000
2  6                    Jimmy's Bar         1,500
3  9                    Impala Hotel        2,000
4  12                   Sports Drink        2,500
It looks like the other answers are accurate, on the assumption that the Company_IDs are the same for both.
If the Company_IDs are different for Jimmy's Bar and Jimmy's Restaurant, then you can use something like this. I suggest you get functional users involved and do some data clean-up, or else you'll be maintaining this every time the issue arises:
SELECT
COUNT(DISTINCT CASE
WHEN A12.Company_Name = 'Name2' THEN 'Name1'
ELSE A12.Company_Name
END) AS Customer_Name_Count
,CASE
WHEN A12.Company_Name = 'Name2' THEN 'Name1'
ELSE A12.Company_Name
END AS Company_Name
,SUM(A12.Total_Sales) AS Total_Sales
FROM some_table AS A12
GROUP BY CASE
WHEN A12.Company_Name = 'Name2' THEN 'Name1'
ELSE A12.Company_Name
END
Your problem is that the joins you are using are multiplying the number of rows. Somewhere along the way, multiple names are associated with exactly the same entity (which is why the numbers are the same). You can fix this by aggregating by the right id:
SELECT COUNT(DISTINCT A12.Company_name) AS Customer_Name_Count,
MAX(Company_Name) as Company_Name,
SUM(Total_Sales) AS Total_Sales
FROM some_table AS A12
GROUP BY Company_id -- I'm guessing the column is something like this
ORDER BY 3 ASC, 2 ASC;
This might actually overstate the sales (I don't know). Better would be fixing the join so it only returned one name. One possibility is that it is a type-2 dimension, meaning that there is a time component for values that change over time. You may need to restrict the join to a single time period.
You need to have a function that returns a common name for the companies, and then use DISTINCT:
SELECT DISTINCT
Customer_Name_Count,
dbo.GetCommonName(Company_Name) as Company_Name,
Total_Sales
FROM dbo.theTable
You can try using ROW_NUMBER as a window function to number the rows within each Customer_Name_Count and Total_Sales pair, then keep only rn = 1:
SELECT * FROM (
SELECT *,ROW_NUMBER() OVER(PARTITION BY Customer_Name_Count,Total_Sales ORDER BY Company_Name) rn
FROM (
SELECT
COUNT(DISTINCT A12.Company_name) AS Customer_Name_Count,
Company_Name,
SUM(Total_Sales) AS Total_Sales
FROM
some_table AS A12
GROUP BY
Company_Name
)t1
)t2
WHERE rn = 1
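To make the effect of keeping rn = 1 concrete, here is a sketch on the result rows from the question, using Python's sqlite3 module. Instead of ROW_NUMBER it groups on the (count, sales) pair and keeps the alphabetically first name, which should pick the same row as ORDER BY Company_Name with rn = 1; the summary table name is hypothetical. Like the ROW_NUMBER version, it does not check that the names are similar, only that count and sales match.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table holding the already aggregated rows from the question.
conn.execute(
    "CREATE TABLE summary "
    "(customer_name_count INTEGER, company_name TEXT, total_sales INTEGER)"
)
conn.executemany(
    "INSERT INTO summary VALUES (?, ?, ?)",
    [
        (3, "Blockbuster", 1000),
        (6, "Jimmy's Bar", 1500),
        (6, "Jimmy's Restaurant", 1500),
        (9, "Impala Hotel", 2000),
        (12, "Sports Drink", 2500),
    ],
)

# One row per (count, sales) pair; MIN picks the first name alphabetically.
result = conn.execute(
    """
    SELECT customer_name_count,
           MIN(company_name) AS company_name,
           total_sales
    FROM summary
    GROUP BY customer_name_count, total_sales
    ORDER BY total_sales, company_name
    """
).fetchall()
for row in result:
    print(row)
```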

how to classify employers into three columns based on a condition?

I wish to classify employers who took the track into three different columns, as below, based on the number of days they took to complete the courses. The DB column lrn_complt gives the number of days taken:
no. of emp who completed the track in
0-30days   | 30-60days  | 60-90days
1st column | 2nd column | 3rd column
I need the SQL for this, or even just the logic would help.
You'll need to post create table and insert statements for anyone to understand your problem correctly. Your input table, data and expected output and your Target RDBMS at the very least.
http://tkyte.blogspot.com/2005/06/how-to-ask-questions.html
Assuming you have two columns like this...
You can try inline queries like below...
Select id,
(select count(*) from courses where days between 0 and 30) days_0_to_30,
(select count(*) from courses where days between 31 and 60) days_31_to_60,
(select count(*) from courses where days between 61 and 90) days_61_to_90
from courses;
Basically, you need to make 3 subqueries inside one master query:
SELECT
(SELECT COUNT(*) FROM EMPLOYER WHERE LRN_COMPLT BETWEEN 0 AND 30) AS COLUMN1,
(SELECT COUNT(*) FROM EMPLOYER WHERE LRN_COMPLT BETWEEN 31 AND 60) AS COLUMN2,
(SELECT COUNT(*) FROM EMPLOYER WHERE LRN_COMPLT BETWEEN 61 AND 90) AS COLUMN3
FROM DUAL
Looks like you need a PIVOT.
Select id,
COUNT(CASE WHEN lrn_complt between 0 and 30 THEN 1 END) Group1,
COUNT(CASE WHEN lrn_complt between 31 and 60 THEN 1 END) Group2,
COUNT(CASE WHEN lrn_complt between 61 and 90 THEN 1 END) Group3
from courses
group by id;
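As an illustration of the COUNT(CASE ...) approach, here is a sketch that produces the three buckets in a single pass over the table, using Python's sqlite3 module. The table layout and the sample day counts are assumptions, and the id column is left out of the select list so that one summary row comes back.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE courses (id INTEGER, lrn_complt INTEGER)")
# Hypothetical completion times in days, one row per employee.
conn.executemany(
    "INSERT INTO courses VALUES (?, ?)",
    [(1, 10), (2, 25), (3, 30), (4, 45), (5, 61), (6, 90), (7, 120)],
)

# COUNT ignores NULLs, so each CASE counts only the rows in its bucket.
buckets = conn.execute(
    """
    SELECT COUNT(CASE WHEN lrn_complt BETWEEN 0 AND 30 THEN 1 END)  AS days_0_to_30,
           COUNT(CASE WHEN lrn_complt BETWEEN 31 AND 60 THEN 1 END) AS days_31_to_60,
           COUNT(CASE WHEN lrn_complt BETWEEN 61 AND 90 THEN 1 END) AS days_61_to_90
    FROM courses
    """
).fetchone()
print(buckets)
```

The row with 120 days falls outside all three ranges and is simply not counted.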