Generating "resource allocation" report using SQL - sql

I am trying to generate a report based on the below two tables:
Name        Start Year  End Year  No. Of Students  Fill Order
School-ABC  2000        2004      1                1
School-DEF  2000        2004      2                3
School-GHI  2000        2004      1                2
Name       Start Year  End Year  Joined On
Student-1  2000        2004      01-Jan
Student-2  2000        2004      03-Jan
Student-3  2000        2004      02-Jan
Student-4  2000        2004      15-Jan
The expected output is below:
Name       Start Year  End Year  Joined On  School
Student-1  2000        2004      01-Jan     School-ABC
Student-2  2000        2004      03-Jan     School-DEF
Student-3  2000        2004      02-Jan     School-GHI
Student-4  2000        2004      15-Jan     School-DEF
Logic behind generating the data:
First table contains the list of schools and the seats available (along with the priority in which seats will be allocated to students on FCFS basis)
The second table contains data on the list of students enrolled to schools, with their admission date and the start/end year of course.
I am required to populate based on the "Fill Order", the school that is allocated to each student.
After analyzing the problem for a while, I have concluded that this might not be achievable using SELECT queries alone. Currently, I am planning to use two cursors, one for each table, and process the records row by row. Is there a better way of doing it, or is it possible through SELECT statements? TIA
Note:
The database I use is Oracle 10g
I cannot create any temporary tables or alter the data in any of the tables. I strictly have read-only access to the database.

You could use Oracle analytic functions. row_number() over () can assign a number to each student based on their join date. sum() over () can calculate the first and last student for each school. Combining the two you get:
select stud.name
, stud.startyear
, stud.endyear
, stud.joinedon
, schl.name as SchoolName
from (
select name
, coalesce(sum(NoOfStudents) over (order by FillOrder
range between unbounded preceding and 1 preceding),0)+1 FirstStudent
, sum(NoOfStudents) over (order by FillOrder) as LastStudent
from Schools
) schl
join (
select row_number() over (order by JoinedOn) as StudentRank
, Students.*
from Students
) stud
on stud.StudentRank between schl.FirstStudent and schl.LastStudent
order by
stud.name
Live example at SQL Fiddle.
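To make the logic of the query above easier to follow, here is a minimal Python sketch of the same idea (not the Oracle query itself): a running total of seats in fill order gives each school a first and last student rank, and each student's rank by join date is matched against those ranges. The ISO dates with year 2000 and the function name allocate are my own assumptions for illustration.

```python
from itertools import accumulate

# (name, seats, fill_order), copied from the question's first table
schools = [
    ("School-ABC", 1, 1),
    ("School-DEF", 2, 3),
    ("School-GHI", 1, 2),
]
# (name, joined_on); the year 2000 is an assumption, the question only gives day and month
students = [
    ("Student-1", "2000-01-01"),
    ("Student-2", "2000-01-03"),
    ("Student-3", "2000-01-02"),
    ("Student-4", "2000-01-15"),
]


def allocate(schools, students):
    """Hypothetical helper: map each student to a school by rank ranges."""
    by_fill = sorted(schools, key=lambda s: s[2])
    # Running total of seats = rank of the last student each school takes
    last_ranks = list(accumulate(seats for _, seats, _ in by_fill))
    ranges = [
        (name, last - seats + 1, last)
        for (name, seats, _), last in zip(by_fill, last_ranks)
    ]
    # Rank students by join date, like row_number() over (order by JoinedOn)
    ranked = sorted(students, key=lambda s: s[1])
    result = {}
    for rank, (student, _) in enumerate(ranked, start=1):
        for school, first, last in ranges:
            if first <= rank <= last:
                result[student] = school
                break
    return result


allocation = allocate(schools, students)
```

With the sample data this reproduces the expected output from the question. Students beyond the total number of seats simply get no entry, which mirrors the inner join in the SQL.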

Related

SQL - Monthly cumulative count of new customer based on a created date field

Thanks in advance.
I have Customer records that look like this:
Customer_Number  Create_Date
34343            01/22/2001
54554            03/03/2020
85296            01/01/2001
...
I have about a thousand of these records (customer number is unique) and the bossman wants to see how the number of customers has grown over time.
The output I need:
Customer_Count  Monthly_Bucket
7               01/01/2021
9               02/01/2021
13              03/01/2021
20              04/01/2021
The customer count is cumulative and the Monthly Bucket will just feed the graphing package to make a nice bar chart answering the question "how many customers do we have in total in a particular month, and how is it growing over time".
Try the following SELECT SQL with a sub-query:
SELECT Customer_Count=
(
SELECT COUNT(s.[Create_Date])
FROM [Customer_Sales] s
WHERE MONTH(s.[Create_Date]) <= MONTH(t.[Create_Date])
), Monthly_Bucket=MONTH([Create_Date])
FROM Customer_Sales t
WHERE YEAR(t.[Create_Date]) = ????
GROUP BY MONTH(t.[Create_Date])
Where [Customer_Sales] is the sales table and ???? is your year.
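For comparison, the cumulative count can be sketched outside SQL as: count new customers per month, then take a running total over the months in order. This is only an illustration in Python using the three sample rows from the question; the function name cumulative_by_month and the YYYY-MM-01 bucket format are my own choices.

```python
from collections import Counter
from datetime import datetime
from itertools import accumulate

# (Customer_Number, Create_Date) rows from the question, dates as MM/DD/YYYY
customers = [
    (34343, "01/22/2001"),
    (54554, "03/03/2020"),
    (85296, "01/01/2001"),
]


def cumulative_by_month(rows):
    """Hypothetical helper: return (monthly_bucket, cumulative_count) pairs."""
    # Truncate each date to the first day of its month and count new customers
    per_month = Counter(
        datetime.strptime(created, "%m/%d/%Y").strftime("%Y-%m-01")
        for _, created in rows
    )
    months = sorted(per_month)
    # Running total across months in chronological order
    totals = accumulate(per_month[m] for m in months)
    return list(zip(months, totals))


growth = cumulative_by_month(customers)
```

Note that this keys on year and month together, so customers from earlier years are carried forward. Months with no new customers do not appear; a chart that needs every month would have to fill those gaps from a calendar of months.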

Can I query an aggregated query and a specific row's query when using subqueries?

I am new to SQL and I wanted to return the results of a specific value and the average of similar values. I have gotten the average part working but I'm not sure how to do the specific value part.
For more context, I have a list of carbon emissions by company. I wanted the average for an industry based on a company's industry (working perfectly below), but I am not sure how to add the specific company's info.
Here's my query:
SELECT
year, AVG(carbon) AS AVG_carbon,
-- carbon as CompanyCarbon, <--my not working attempt
FROM
"company"."carbon" c
WHERE
LOWER(c.ticker) IN (SELECT LOWER(g4.ticker)
FROM "company"."General" g4
WHERE industry = (SELECT industry
FROM "company"."General" g3
WHERE LOWER(g3.ticker) = 'ibm.us'))
GROUP BY
c.year
ORDER BY
year ASC;
The current result is:
year avg_carbon
--------------------------------
1998 7909.0000000000000000
1999 19465.500000000000
2000 19478.000000000000
2001 182679.274509803922
2002 179821.156862745098
My desired output is:
year avg_carbon carbon
---------------------------------------
1998 7909.0000000000000000 343
1999 19465.500000000000 544
2000 19478.000000000000 653
2001 182679.274509803922 654
2002 179821.156862745098 644
(adding the carbon column based on "IBM" carbon)
Here's my Carbon table:
ticker year carbon
-----------------------
hurn.us 2016 6282
hurn.us 2015 6549
hurn.us 2014 5897
hurn.us 2013 5300
hurn.us 2012 5340
ibm.us 2019 1496520
ibm.us 2018 1438365
Based on my limited knowledge, I think my WHERE clause is causing the problem. Right now I look at a company, get a list of tickers/identifiers in the same industry, then compute an average for each year.
I tried to just call the carbon column but I think because it's processing the list of tickers, it's not outputting the result I want.
What can I do? Also if I'm making any other mistakes you see above please let me know.
The sample data and output do not match, so I can't say for sure, but this might be the answer you are looking for.
select year, AVG(carbon) AS AVG_carbon,
max(case when lower(ticker) = 'ibm.us' then carbon else 0 end) as CompanyCarbon
from "company"."carbon" c
GROUP BY c.year
order by year ASC;
This will select max(carbon) for any year as CompanyCarbon if lower(ticker) = 'ibm.us'. Average will be calculated as you did.
To select only rows having positive value in CompanyCarbon column:
select year, AVG_carbon, CompanyCarbon
from
(
select year, AVG(carbon) AS AVG_carbon,
max(case when lower(ticker) = 'ibm.us' then carbon else 0 end) as CompanyCarbon
from "company"."carbon" c
GROUP BY c.year
) t
where CompanyCarbon > 0
order by year ASC;
Similar to the answer that Kazi provided you can use the FILTER syntax on an aggregate which makes it a bit more readable than the case/when IMO.
SELECT
year,
AVG(carbon) as avg_carbon,
MAX(carbon) FILTER (WHERE ticker = 'ibm.us') as company_carbon
FROM company_carbon
GROUP BY year
ORDER by year;
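A small runnable check of the conditional-aggregation idea from the answers above, using Python's built-in sqlite3 module. The rows are made up for illustration (the question's sample data has no year where both tickers appear), and the table name carbon and alias company_carbon are assumptions. The CASE has no ELSE here, so years without an ibm.us row come back as NULL rather than 0. Like the answers, this sketch averages over every ticker in the table and leaves out the industry filter from the question.

```python
import sqlite3

# Made-up rows (ticker, year, carbon); illustrative values, not the asker's data
rows = [
    ("ibm.us", 2018, 100),
    ("hurn.us", 2018, 50),
    ("ibm.us", 2019, 200),
    ("hurn.us", 2019, 100),
    ("hurn.us", 2017, 10),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE carbon (ticker TEXT, year INTEGER, carbon INTEGER)")
conn.executemany("INSERT INTO carbon VALUES (?, ?, ?)", rows)

# One pass per year: average over all rows, plus the single company's value
query = """
SELECT year,
       AVG(carbon) AS avg_carbon,
       MAX(CASE WHEN LOWER(ticker) = 'ibm.us' THEN carbon END) AS company_carbon
FROM carbon
GROUP BY year
ORDER BY year
"""
result = conn.execute(query).fetchall()
conn.close()
```

The 2017 row shows why dropping the ELSE 0 can be useful: a missing company value stays distinguishable from a real zero.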

How to index for a self join

I'm using SAS University Edition to analyze the following table (actually has 2.5M rows in it)
p_id c_id startyear endyear
0001 3201 2008 2013
0001 2131 2013 2015
0013 3201 2006 2010
where p_id is person_id and c_id is companyid.
I want to get number of colleagues (number of persons that worked during an overlapping span at the same companies) in a certain year, so I created a table with the distinct p_ids and do the following query:
PROC SQL;
UPDATE no_colleagues AS t1
SET c2007 = (
SELECT COUNT(DISTINCT t2.p_id) - 1
FROM table AS t2
INNER JOIN table AS t3
ON t3.p_id = t1.p_id
AND t3.c_id = t2.c_id
AND t3.startyear <= t2.endyear /* checks overlapping criteria */
AND t3.endyear >= t2.startyear /* checks overlapping criteria */
AND t3.startyear <= 2007 /* limits number of returns */
AND t2.startyear <= 2007 /* limits number of returns */
);
A single lookup on an indexed query (p_id, c_id, startyear, endyear) takes 0.04 seconds. The query above takes about 1.8 seconds for a single update, and does not use any indexes.
So my question is:
How to improve the query, and/or how to use indices to make sure the self join can use the indices?
Thanks in advance.
Based on your data, I'd do something like this, but maybe you need to tweak the code to fit your needs.
First, create a table with p_id, c_id, year.
So your first guy working at the company 3201 will have 6 observations in this table, one for each worked year.
data have_count;
set have;
do i=startyear to endyear;
worked_in = i;
output;
end;
drop i startyear endyear;
run;
Now you just count and aggregate:
proc sql;
select
worked_in as year
,c_id
,count(distinct p_id) as no_colleagues
from have_count
group by 1,2;
quit;
Result:
year c_id no_colleagues
2006 3201 1
2007 3201 1
2008 3201 2
2009 3201 2
2010 3201 2
2011 3201 1
2012 3201 1
2013 2131 1
2013 3201 1
2014 2131 1
2015 2131 1
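The expand-then-count approach above can be sketched in Python as well, which may help when checking the numbers by hand. It uses the three sample rows from the question; the names spells and persons_per_company_year are my own. As in the PROC SQL step, the count includes the person themselves, so subtract 1 to get colleagues in the sense of the question.

```python
from collections import defaultdict

# (p_id, c_id, startyear, endyear) rows from the question
spells = [
    ("0001", 3201, 2008, 2013),
    ("0001", 2131, 2013, 2015),
    ("0013", 3201, 2006, 2010),
]


def persons_per_company_year(spells):
    """Hypothetical helper: count distinct persons per (year, c_id)."""
    people = defaultdict(set)
    for p_id, c_id, start, end in spells:
        # One entry per worked year, like the DO loop in the data step
        for year in range(start, end + 1):
            people[(year, c_id)].add(p_id)
    return {key: len(ids) for key, ids in people.items()}


counts = persons_per_company_year(spells)
```

On 2.5M rows the expanded table can get large, so this is meant as a way to reason about the logic rather than a performance recommendation.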
A more efficient method:
1) Create a long format table for the results rather than wide format. This will be both easier to populate and easier to work with later.
create table colleagues_by_year (
p_id int,
year int,
colleagues int
);
Now this can be populated with a single insert statement. The only trick is getting the full list of years you want in the final table. There are a few options, but since I'm not too familiar with SAS SQL I'm going to go with a very simple one: a lookup table of years, to which you can join.
create table years (
year int
);
insert into years
values (2007),(2008),...
(A more sophisticated approach would be a recursive query that found the range of all years in the input data).
Now the final insert:
insert into colleagues_by_year
select p_id,
year,
count(*)
from colleagues
join years on
years.year between colleagues.startyear and colleagues.endyear
group by p_id,year
This won't have any rows where the number of colleagues for the year would be 0. If you wanted that you could make years be a left join and only count the rows where years.year is not null.

SQL statement to match dates that are the closest?

I have the following table, let's call it Names:
Name Id Date
Dirk 1 27-01-2015
Jan 2 31-01-2015
Thomas 3 21-02-2015
Next I have the another table called Consumption:
Id Date Consumption
1 26-01-2015 30
1 01-01-2015 20
2 01-01-2015 10
2 05-05-2015 20
I think doing this in SQL is the fastest option, since the table contains about 1.5 million rows.
The problem is as follows: I would like to match each Id from the Names table with the row in the Consumption table whose date is closest, i.e. where the difference between the dates is the lowest. So we have: Dirk consumes on 27-01-2015 about 30. In case there are two dates that have the same "difference", I would like to calculate the average consumption on those two dates.
While I know how to join, I do not know how to code the difference part.
Thanks.
DBMS is Microsoft SQL Server 2012.
I believe that my question differs from the one mentioned in the comments, because it is much more complicated since it involves comparison of dates between two tables rather than having one date and comparing it with the rest of the dates in the table.
This is how you could do it in SQL Server:
SELECT Id, Name, AVG(Consumption)
FROM (
SELECT n.Id, Name, Consumption,
RANK() OVER (PARTITION BY n.Id
ORDER BY ABS(DATEDIFF(d, n.[Date], c.[Date]))) AS rnk
FROM Names AS n
INNER JOIN Consumption AS c ON n.Id = c.Id ) t
WHERE t.rnk = 1
GROUP BY Id, Name
Using RANK with PARTITION BY n.Id and ORDER BY ABS(DATEDIFF(d, n.[Date], c.[Date])) you can locate all matching records per Id: all records with the smallest difference in days are going to have rnk = 1.
Then, using AVG in the outer query, you are calculating the average value of Consumption between all matching records.
SQL Fiddle Demo
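As a cross-check of the RANK approach, the same rule can be written out in Python: per Id, find the smallest absolute day difference, keep every reading at that distance, and average them. This uses the sample rows from the question; the helper names parse and closest_consumption are hypothetical.

```python
from collections import defaultdict
from datetime import datetime

# Sample rows from the question, dates as DD-MM-YYYY
names = [("Dirk", 1, "27-01-2015"), ("Jan", 2, "31-01-2015"), ("Thomas", 3, "21-02-2015")]
consumption = [
    (1, "26-01-2015", 30),
    (1, "01-01-2015", 20),
    (2, "01-01-2015", 10),
    (2, "05-05-2015", 20),
]


def parse(text):
    return datetime.strptime(text, "%d-%m-%Y").date()


def closest_consumption(names, consumption):
    """Hypothetical helper: average consumption on the closest date(s) per name."""
    by_id = defaultdict(list)
    for id_, date, value in consumption:
        by_id[id_].append((parse(date), value))
    result = {}
    for name, id_, date in names:
        readings = by_id.get(id_)
        if not readings:
            continue  # no consumption rows: dropped, like the INNER JOIN
        target = parse(date)
        best = min(abs((d - target).days) for d, _ in readings)
        # Ties at the smallest distance are all kept, like rows with rnk = 1
        tied = [v for d, v in readings if abs((d - target).days) == best]
        result[name] = sum(tied) / len(tied)
    return result


closest = closest_consumption(names, consumption)
```

Thomas has no rows in Consumption, so he is absent from the result, just as he would be with the inner join in the SQL.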

IDs that have data for particular years

I have a question regarding Oracle SQL.
My data looks like this:
id year
-- ----
1 2000
1 2001
1 2002
1 2003
1 2006
1 2000
2 2001
2 2002
2 2003
3 2003
3 2005
4 2012
4 2013
I want the id's which have the years 2001, 2002, 2003.
My result set:
id
--
1
2
Please help me with this. I actually tried searching this, but couldn't figure a way to search about my particular problem.
SQL
SELECT t.id
FROM TABLE t
WHERE t.year in(2001,2002,2003)
GROUP BY t.id
Sample SqlFiddle
http://sqlfiddle.com/#!2/4ec9f/2/0
Explanation
You want to filter your data set to only show rows with certain years, so that is what you put in the where clause WHERE t.year in(2001,2002,2003).
Since a single id can be in multiple years, your result set would contain duplicates. To remove the duplicates you could GROUP BY the id or use the DISTINCT keyword to only show unique elements.
UPDATE
Based on comments, here's a version that will only display id's that have all three years. We use DISTINCT t.YEAR to avoid counting id's that perhaps would have a single year repeated multiple times. The HAVING COUNT(DISTINCT t.YEAR) = 3 part ensures that we only include id's that have all three years.
SELECT t.id
FROM years t
WHERE t.year in(2001,2002,2003)
GROUP BY t.id
HAVING COUNT(DISTINCT t.YEAR) = 3
Updated sqlFiddle, which includes a data set where id of 3 has two rows for 2003 to show off the logic that only counts unique years for an ID.
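If you want to try both versions without a fiddle, here is a small sketch using Python's built-in sqlite3 module and the sample data from the question. The table name years follows the updated query; the variable names are my own. SQLite is only a stand-in here, but the GROUP BY / HAVING COUNT(DISTINCT ...) form is plain SQL and should behave the same way on Oracle.

```python
import sqlite3

# (id, year) rows from the question, including the repeated (1, 2000)
rows = [
    (1, 2000), (1, 2001), (1, 2002), (1, 2003), (1, 2006), (1, 2000),
    (2, 2001), (2, 2002), (2, 2003),
    (3, 2003), (3, 2005),
    (4, 2012), (4, 2013),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE years (id INTEGER, year INTEGER)")
conn.executemany("INSERT INTO years VALUES (?, ?)", rows)

# First version: ids with ANY of the three years
any_year = [r[0] for r in conn.execute(
    "SELECT id FROM years WHERE year IN (2001, 2002, 2003) "
    "GROUP BY id ORDER BY id"
)]

# Updated version: ids with ALL three years
all_years = [r[0] for r in conn.execute(
    "SELECT id FROM years WHERE year IN (2001, 2002, 2003) "
    "GROUP BY id HAVING COUNT(DISTINCT year) = 3 ORDER BY id"
)]
conn.close()
```

The first query also returns id 3 (it has 2003 only), which is why the HAVING clause is needed to get the result set asked for in the question.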
select distinct id
from table
where year in(2001,2002,2003)