SQL Join with SUM and Group By - sql

I have an application in MVC5 C#. I want to add a feature where the admin can see how many parking spaces each parking lot has, along with how many spots are currently taken vs available.
Let's say, I have two tables (Lots and Staff).
the 'Lots' Table is the name of the existing parking lots and a count of how many spaces each lot has:
Name | Count
A | 200
B | 450
C | 375
The 'Staff' Table contains each person's ID, and an int column for each lot. These columns are either 1 (employee is assigned to this lot) or 0 (employee not assigned to this lot
StaffID | LotA | Lot B | Lot C
7264 | 0 | 1 | 0
2266 | 0 | 0 | 1
3344 | 1 | 0 | 0
4444 | 0 | 1 | 0
In the above scenario, the desired output would be . . . .
Lot | Total | Used | Vacant
A | 200 | 1 | 199
B | 450 | 2 | 448
C | 375 | 1 | 374
I have only done simple joins, in the past, and I am struggling to figure out whether it is best to use COUNT with a Where column = 1, or a Sum. But I also choke on the group by and get lost.
I have tried similar variations of the following (with no success)
SELECT Lots.Name,
Lots.Count,
SUM(Staff.LotA) As A_Occupied,
SUM(Staff.LotB) as B_Occupied,
SUM(Staff.LotC) as C_Occupied
FROM Lots
CROSS JOIN Staff
GROUP BY Lots.Name
In theory, I expected this to yield a listing of the lots, how many spaces in each lot, and how many occupied in each lot, grouped by the lot name.
This generates an error in the Group By because I am selecting columns that are not included in the Group by, but that just confuses me even more as to how to Group By for my needs.
In general, I have a bad habit of trying to over complicate things and so I apologize ahead of time. I am not a 'classically trained coder'. I did my best to organize the question and make it understandable

Firstly, your design would work better if it was properly normalised - your Staff table should have a single column representing the Lot that's used by each staff member (if you had 100 lots, would you have 100 columns?). Then there would be no need to pivot the data from columns into the rows you need.
With your current schema, you could use a cross apply to pivot the columns into rows which can then be correlated to each Lot name:
select l.[name] as Lot, q.Qty as Used, l.count - q.Qty as Vacant
from lots l
cross apply (
select Sum(LotA) A, Sum(LotB) B, Sum(LotC) C
from Staff s
)s
cross apply (
select Qty from (
select * from (values ('A', s.A),('B', s.B),('C', s.C))v(Lot, Qty)
)x
where x.Lot = l.[name]
)q;
See Demo Fiddle

is a layman solution. Hope it helps.
with parking as (
select 'A' as lot ,sum(case when lota >0 then 1 else 0 end ) as lots from staff
union all
select 'B' as lotb , sum(case when lotb >0 then 1 else 0 end ) as lots from staff
union all
select 'C' as lotc, sum(case when lotc >0 then 1 else 0 end ) as lots from staff
)
select a.name, a.count, b.lots, a.count-b.lots as places
from lots a join parking b on a.name=b.lot

Related

How to query sum total of transitively linked child transactions from database?

I got this one assignment which has a lot of weird stuff to do. I need to create an API for storing transaction details and do some operations. One such operation involves retrieving a sum of all transactions that are transitively linked by their parent_id to $transaction_id.
If A is the parent of B and C, and C is the parent of D and E, then
sum(A) = A + B + C + D + E
note: not just immediate child transactions.
I have this sample data in the SQL database as given below.
MariaDB [test_db]> SELECT * FROM transactions;
+------+-------+----------+---------+
| t_id | t_pid | t_amount | t_type |
+------+-------+----------+---------+
| 1 | NULL | 10000.00 | default |
| 2 | NULL | 25000.00 | cars |
| 3 | 1 | 30000.00 | bikes |
| 4 | NULL | 10000.00 | bikes |
| 5 | 3 | 15000.00 | bikes |
+------+-------+----------+---------+
5 rows in set (0.000 sec)
MariaDB [test_db]>
where t_id is a unique transaction_id and t_pid is a parent_id which is either null or an existing t_id.
so, when I say sum(t_amount) where t_id=1, I want the result to be
sum(1+3+5) -> sum(10000 + 30000 + 15000) = 55000.
I know I can achieve this in a programmatic way with some recursion which will do repeated query operations and add the sum. But, that will give me poor performance if the data is very large say, millions of records.
I want to know if there is any possibility of achieving this with a complex query. And if yes, then how to do it?
I have very little knowledge and experience with databases. I tried with what I know and I couldn't do it. I tried searching for any similar queries available here and I didn't find any.
With what I have researched, I guess I can achieve this with stored procedures and using the HAVING clause. Let me know if I am right there and help me do this.
So, any sort of help will be appreciated.
Thanks in advance.
You need a recursive CTE:
with recursive cte as (
select t_id as ultimate_id, t_id, t_amount
from tranctions t
where t_id = 1
union all
select cte.ultimate_id, t.t_id, t.amount
from cte join
transactions tc
on tc.p_id = cte.t_id
)
select ultimate_id, sum(t_amount)
from cte
group by ultimate_id;

SQL - Representing SUM's after using CASE to transform STR -> INT

I am sorry for what may be a long post in advance.
Background:
I am using Rational Team Concert (RTC) which stores work item data in conjunction with Jazz Reporting Service to create reports. Using the Report Builder tool, it allows you to write your own queries to pull data as a table, and has its own interface to represent the table as a graph.
There is not much options for of graphing; the chart type defaults as a count, unless you specify it to show a sum. In order to graph by sum, the data must be a number rather than a string. By default, the Report Builder assumes all variables in the SELECT statement are strings.
The data which I will be using are a bunch of work items. Each work item is associated to a team (A, B) and has a work estimation number (count1, count2).
Item # | Team | Work |
------------------------
123 | A | count1 |
------------------------
124 | A | count2 |
------------------------
125 | B | count2 |
------------------------
....
Problem:
Since the work estimation is entered as a Tag, the first step was to use a CATCH WHEN block when using SELECT to transform count1 -> 1, and count2 -> 2 (the string tag to an actual number which can be summed). This resulted in a table with numbers 1 and 2 in place of the typed tag (good so far).
Item # | Team | Work |
------------------------
123 | A | 1 |
------------------------
124 | A | 2 |
------------------------
125 | B | 2 |
------------------------
....
The problem is that I am trying to graph by sum, which means getting the tool to identify the variables in the SELECT statement as numbers, except for some reason any variable I declare in a SELECT statement is always viewed as a string (The tool has a table of the current columns i.e. variables in the SELECT, along with that the tool identifies as its variable type).
Attempted Solutions:
The first query I did was to return a table of each work item with its team name and work estimate
SELECT T1.NAME,
(CASE WHEN T1.TAGs='count1' THEN 1 ELSE 2 END) AS WORK
FROM RIDW.VW_REQUEST T1
WHERE T1.PROJECT_ID = 73
Which resulted in
Team | Work |
----------------
A | 1 |
----------------
A | 2 |
----------------
B | 2 |
----------------
....
but the tool still sees the numbers as strings. I then tried explicitly casting the CASE to an integer, but resulted in the same issue
...
CAST(CASE WHEN T1.TAGs='count1' THEN 1 ELSE 2 END AS Integer) AS WORK
...
Which again the tool still represents as a string.
Current Goal:
As I cannot confirm if the tool has an underlying problem, compatibility issues with queries, etc. What I believe will work now would be to return a table with 2 rows: The sum of the work for each team
|Sum of 1's and 2's |
-----------------------------
Team A | SUM(1) + SUM(2) |
-----------------------------
Team B | SUM(1) + SUM(2) |
-----------------------------
What I am having trouble with is using sub queries to use SUM to sum the data. When I try
SUM(CASE WHEN ... END) AS TIME2 I get an error that "Column modifiers AVG and SUM apply only to number attributes". This has me thinking that I need to have a sub query which returns the column after the CASE, and then SUM that, but I am sailing into uncharted waters and can't seem to get the syntax to work.
I understand that a post like this would be better off on the product help forum. I have tried asking around but cannot get any help. The solution I am proposing of returning the 2 row/column table should bypass any issues the software may have, but I need help sub-querying the SUM when using a case.
I appreciate your time and help!
EDIT 1:
Below is the full query code which preforms the CASE correctly, but still causes with the interpreted type by the tool:
SELECT
T1.Name,
CAST(CASE WHEN T1.TAGS='|release_points_1|' THEN 1 ELSE (CASE WHEN T1.TAGS='|release_points_2|' THEN 2 ELSE 0 END) END AS Integer) AS TAG,
FROM RIDW.VW_REQUEST T1
WHERE T1.PROJECT_ID = 73
AND
(T1.ISSOFTDELETED = 0) AND
(T1.REQUEST_ID <> -1 AND T1.REQUEST_ID IS NOT NULL
This small adjustment to your current query should work:
SELECT
T1.Name,
SUM(CAST(CASE WHEN T1.TAGS='|release_points_1|' THEN 1 ELSE (CASE WHEN T1.TAGS='|release_points_2|' THEN 2 ELSE 0 END) END AS Integer)) AS TAG,
FROM RIDW.VW_REQUEST T1
WHERE T1.PROJECT_ID = 73
AND
(T1.ISSOFTDELETED = 0) AND
(T1.REQUEST_ID <> -1 AND T1.REQUEST_ID IS NOT NULL
GROUP BY T1.Name

Find spectators that have seen the same shows (match multiple rows for each)

For an assignment I have to write several SQL queries for a database stored in a PostgreSQL server running PostgreSQL 9.3.0. However, I find myself blocked with last query. The database models a reservation system for an opera house. The query is about associating the a spectator the other spectators that assist to the same events every time.
The model looks like this:
Reservations table
id_res | create_date | tickets_presented | id_show | id_spectator | price | category
-------+---------------------+---------------------+---------+--------------+-------+----------
1 | 2015-08-05 17:45:03 | | 1 | 1 | 195 | 1
2 | 2014-03-15 14:51:08 | 2014-11-30 14:17:00 | 11 | 1 | 150 | 2
Spectators table
id_spectator | last_name | first_name | email | create_time | age
---------------+------------+------------+----------------------------------------+---------------------+-----
1 | gonzalez | colin | colin.gonzalez#gmail.com | 2014-03-15 14:21:30 | 22
2 | bequet | camille | bequet.camille#gmail.com | 2014-12-10 15:22:31 | 22
Shows table
id_show | name | kind | presentation_date | start_time | end_time | id_season | capacity_cat1 | capacity_cat2 | capacity_cat3 | price_cat1 | price_cat2 | price_cat3
---------+------------------------+--------+-------------------+------------+----------+-----------+---------------+---------------+---------------+------------+------------+------------
1 | madama butterfly | opera | 2015-09-05 | 19:30:00 | 21:30:00 | 2 | 315 | 630 | 945 | 195 | 150 | 100
2 | don giovanni | opera | 2015-09-12 | 19:30:00 | 21:45:00 | 2 | 315 | 630 | 945 | 195 | 150 | 100
So far I've started by writing a query to get the id of the spectator and the date of the show he's attending to, the query looks like this.
SELECT Reservations.id_spectator, Shows.presentation_date
FROM Reservations
LEFT JOIN Shows ON Reservations.id_show = Shows.id_show;
Could someone help me understand better the problem and hint me towards finding a solution. Thanks in advance.
So the result I'm expecting should be something like this
id_spectator | other_id_spectators
-------------+--------------------
1| 2,3
Meaning that every time spectator with id 1 went to a show, spectators 2 and 3 did too.
Note based on comments: Wanted to make clear that this answer may be of limited use as it was answered in the context of SQL-Server (tag was present at the time)
There is probably a better way to do it, but you could do it with the 'stuff 'function. The only drawback here is that, since your ids are ints, placing a comma between values will involve a work around (would need to be a string). Below is the method I can think of using a work around.
SELECT [id_spectator], [id_show]
, STUFF((SELECT ',' + CAST(A.[id_spectator] as NVARCHAR(10))
FROM reservations A
Where A.[id_show]=B.[id_show] AND a.[id_spectator] != b.[id_spectator] FOR XML PATH('')),1,1,'') As [other_id_spectators]
From reservations B
Group By [id_spectator], [id_show]
This will show you all other spectators that attended the same shows.
Meaning that every time spectator with id 1 went to a show, spectators 2 and 3 did too.
In other words, you want a list of ...
all spectators that have seen all the shows that a given spectator has seen (and possibly more than the given one)
This is a special case of relational division. We have assembled an arsenal of basic techniques here:
How to filter SQL results in a has-many-through relation
It is special because the list of shows each spectator has to have attended is dynamically determined by the given prime spectator.
Assuming that (d_spectator, id_show) is unique in reservations, which has not been clarified.
A UNIQUE constraint on those two columns (in that order) also provides the most important index.
For best performance in query 2 and 3 below also create an index with leading id_show.
1. Brute force
The primitive approach would be to form a sorted array of shows the given user has seen and compare the same array of others:
SELECT 1 AS id_spectator, array_agg(sub.id_spectator) AS id_other_spectators
FROM (
SELECT id_spectator
FROM reservations r
WHERE id_spectator <> 1
GROUP BY 1
HAVING array_agg(id_show ORDER BY id_show)
#> (SELECT array_agg(id_show ORDER BY id_show)
FROM reservations
WHERE id_spectator = 1)
) sub;
But this is potentially very expensive for big tables. The whole table hast to be processes, and in a rather expensive way, too.
2. Smarter
Use a CTE to determine relevant shows, then only consider those
WITH shows AS ( -- all shows of id 1; 1 row per show
SELECT id_spectator, id_show
FROM reservations
WHERE id_spectator = 1 -- your prime spectator here
)
SELECT sub.id_spectator, array_agg(sub.other) AS id_other_spectators
FROM (
SELECT s.id_spectator, r.id_spectator AS other
FROM shows s
JOIN reservations r USING (id_show)
WHERE r.id_spectator <> s.id_spectator
GROUP BY 1,2
HAVING count(*) = (SELECT count(*) FROM shows)
) sub
GROUP BY 1;
#> is the "contains2 operator for arrays - so we get all spectators that have at least seen the same shows.
Faster than 1. because only relevant shows are considered.
3. Real smart
To also exclude spectators that are not going to qualify early from the query, use a recursive CTE:
WITH RECURSIVE shows AS ( -- produces exactly 1 row
SELECT id_spectator, array_agg(id_show) AS shows, count(*) AS ct
FROM reservations
WHERE id_spectator = 1 -- your prime spectator here
GROUP BY 1
)
, cte AS (
SELECT r.id_spectator, 1 AS idx
FROM shows s
JOIN reservations r ON r.id_show = s.shows[1]
WHERE r.id_spectator <> s.id_spectator
UNION ALL
SELECT r.id_spectator, idx + 1
FROM cte c
JOIN reservations r USING (id_spectator)
JOIN shows s ON s.shows[c.idx + 1] = r.id_show
)
SELECT s.id_spectator, array_agg(c.id_spectator) AS id_other_spectators
FROM shows s
JOIN cte c ON c.idx = s.ct -- has an entry for every show
GROUP BY 1;
Note that the first CTE is non-recursive. Only the second part is recursive (iterative really).
This should be fastest for small selections from big tables. Row that don't qualify are excluded early. the two indices I mentioned are essential.
SQL Fiddle demonstrating all three.
It sounds like you have one half of the total question--determining which id_shows a particular id_spectator attended.
What you want to ask yourself is how you can determine which id_spectators attended an id_show, given an id_show. Once you have that, combine the two answers to get the full result.
So the final answer I got, looks like this :
SELECT id_spectator, id_show,(
SELECT string_agg(to_char(A.id_spectator, '999'), ',')
FROM Reservations A
WHERE A.id_show=B.id_show
) AS other_id_spectators
FROM Reservations B
GROUP By id_spectator, id_show
ORDER BY id_spectator ASC;
Which prints something like this:
id_spectator | id_show | other_id_spectators
-------------+---------+---------------------
1 | 1 | 1, 2, 9
1 | 14 | 1, 2
Which suits my needs, however if you have any improvements to offer, please share :) Thanks again everybody!

How to retrieve data from different rows of the same table based on different criteria

I'm trying to write a plain SQL statement for building an Oracle report but I'm stuck at some point. x_request table stores the requests made and different tasks related to specific requests that have been done are stored in x_request_work_log. To summarize the structure of these tables:
X_request
-id
-name
-requester
-request_date
x_request_work_log
-id
-request_id (foreign key)
-taskId
-start_date
-end_date
Now let's assume that these tables are filled with sample data as follows:
x_request
id name requester request_date
1 firstReq John 01/01/2012
2 secondReq Steve 21/01/2012
x_request_work_log
id requestId taskId startDate endDate
1 1 0 01/01/2012 03/01/2012
2 1 1 04/01/2012 04/01/2012
3 1 2 05/01/2012 15/01/2012
4 2 0 24/01/2012 02/02/2012
The template of my report is as follows:
requestName timeSpent(task(0)) timeSpent(task(1)) timeSpent(task(2))
| | | | | | | |
So, that's where I'm stuck. I need a Sql Select statement that will return each row in the formatted way as described above. How can i retrieve and display the start and end dates of different tasks. Btw timeSpent = endDate(task(x)) - startDate(task(x))
Note: Using different select subqueries for each spent time calculation is not an option due to performance constraints. There must be another way.
It sounds like you just want something like
SELECT r.name request_name,
SUM( (CASE WHEN l.taskId = 0
THEN l.endDate - l.StartDate
ELSE 0
END) ) task0_time_spent,
SUM( (CASE WHEN l.taskId = 1
THEN l.endDate - l.StartDate
ELSE 0
END) ) task1_time_spent,
SUM( (CASE WHEN l.taskId = 2
THEN l.endDate - l.StartDate
ELSE 0
END) ) task2_time_spent
FROM x_request_work_log l
JOIN x_request r ON (l.requestId = r.Id)
GROUP BY r.name
If you happen to be using 11g, you could also use the PIVOT operator.
If you need to display all members of a group in one row, you can accomplish this in MySQL with the GROUP_CONCAT operator (I don't know what the equivalent is in Oracle):
> SELECT requestID,
GROUP_CONCAT(DATEDIFF(endDate,startDate)) AS length
FROM request_work_log
GROUP BY requestId;
+-----------+--------+
| requestID | length |
+-----------+--------+
| 1 | 2,0,10 |
| 2 | 9 |
+-----------+--------+
(and then add in the inner join to your other table to replace requestID with the request name)

SQL: Find rows where field value differs

I have a database table structured like this (irrelevant fields omitted for brevity):
rankings
------------------
(PK) indicator_id
(PK) alternative_id
(PK) analysis_id
rank
All fields are integers; the first three (labeled "(PK)") are a composite primary key. A given "analysis" has multiple "alternatives", each of which will have a "rank" for each of many "indicators".
I'm looking for an efficient way to compare an arbitrary number of analyses whose ranks for any alternative/indicator combination differ. So, for example, if we have this data:
analysis_id | alternative_id | indicator_id | rank
----------------------------------------------------
1 | 1 | 1 | 4
1 | 1 | 2 | 6
1 | 2 | 1 | 3
1 | 2 | 2 | 9
2 | 1 | 1 | 4
2 | 1 | 2 | 7
2 | 2 | 1 | 4
2 | 2 | 2 | 9
...then the ideal method would identify the following differences:
analysis_id | alternative_id | indicator_id | rank
----------------------------------------------------
1 | 1 | 2 | 6
2 | 1 | 2 | 7
1 | 2 | 1 | 3
2 | 2 | 1 | 4
I came up with a query that does what I want for 2 analysis IDs, but I'm having trouble generalizing it to find differences between an arbitrary number of analysis IDs (i.e. the user might want to compare 2, or 5, or 9, or whatever, and find any rows where at least one analysis differs from any of the others). My query is:
declare #analysisId1 int, #analysisId2 int;
select #analysisId1 = 1, #analysisId2 = 2;
select
r1.indicator_id,
r1.alternative_id,
r1.[rank] as Analysis1Rank,
r2.[rank] as Analysis2Rank
from rankings r1
inner join rankings r2
on r1.indicator_id = r2.indicator_id
and r1.alternative_id = r2.alternative_id
and r2.analysis_id = #analysisId2
where
r1.analysis_id = #analysisId1
and r1.[rank] != r2.[rank]
(It puts the analysis values into additional fields instead of rows. I think either way would work.)
How can I generalize this query to handle many analysis ids? (Or, alternatively, come up with a different, better query to do the job?) I'm using SQL Server 2005, in case it matters.
If necessary, I can always pull all the data out of the table and look for differences in code, but a SQL solution would be preferable since often I'll only care about a few rows out of thousands and there's no point in transferring them all if I can avoid it. (However, if you have a compelling reason not to do this in SQL, say so--I'd consider that a good answer too!)
This will return your desired data set - Now you just need a way to pass the required analysis ids to the query. Or potentially just filter this data inside your application.
select r.* from rankings r
inner join
(
select alternative_id, indicator_id
from rankings
group by alternative_id, indicator_id
having count(distinct rank) > 1
) differ on r.alternative_id = differ.alternative_id
and r.indicator_id = differ.indicator_id
order by r.alternative_id, r.indicator_id, r.analysis_id, r.rank
I don't know wich database you are using, in SQL Server I would go like this:
-- STEP 1, create temporary table with all the alternative_id , indicator_id combinations with more than one rank:
select alternative_id , indicator_id
into #results
from rankings
group by alternative_id , indicator_id
having count (distinct rank)>1
-- STEP 2, retreive the data
select a.* from rankings a, #results b
where a.alternative_id = b.alternative_id
and a.indicator_id = b. indicator_id
order by alternative_id , indicator_id, analysis_id
BTW, THe other answers given here need the count(distinct rank) !!!!!
I think this is what you're trying to do:
select
r.analysis_id,
r.alternative_id,
rm.indicator_id_max,
rm.rank_max
from rankings rm
join (
select
analysis_id,
alternative_id,
max(indicator_id) as indicator_id_max,
max(rank) as rank_max
from rankings
group by analysis_id,
alternative_id
having count(*) > 1
) as rm
on r.analysis_id = rm.analysis_id
and r.alternative_id = rm.alternative_id
You example differences seems wrong. You say you want analyses whose ranks for any alternative/indicator combination differ but the example rows 3 and 4 don't satisfy this criteria. A correct result according to your requirement is:
analysis_id | alternative_id | indicator_id | rank
----------------------------------------------------
1 | 1 | 2 | 6
2 | 1 | 2 | 7
1 | 2 | 1 | 3
2 | 2 | 1 | 4
On query you could try is this:
with distinct_ranks as (
select alternative_id
, indicator_id
, rank
, count (*) as count
from rankings
group by alternative_id
, indicator_id
, rank
having count(*) = 1)
select r.analysis_id
, r.alternative_id
, r.indicator_id
, r.rank
from rankings r
join distinct_ranks d on r.alternative_id = d.alternative_id
and r.indicator_id = d.indicator_id
and r.rank = d.rank
You have to realize that on multiple analysis the criteria you have is ambiguous. What if analysis 1,2 and 3 have rank 1 and 4,5 and 6 have rank 2 for alternative/indicator 1/1? The set (1,2,3) is 'different' from the set (4,5,6) but inside each set there is no difference. what is the behavior you desire in that case, should they show up or not? My query finds all records that have a different rank for the same alternative/indicator *from all other analysis' but is not clear if this is correct in your requirement.