Difficult query in either Linq To Sql or SQL - sql

I've been working on this for 2 days and I can't figure it out. I'm hoping someone smarter than me will give this a go.
Let's say I have the following tables:
Rating:
Id | Name
1 | A
2 | B
3 | C
4 | D
5 | E
Inspection:
Id | Date (dd/mm/yyyy)
1 | 01/04/2012
2 | 04/04/2012
3 | 28/03/2012
4 | 04/04/2012
Observation:
Id | InspectionId | RatingId
1 | 2 | 3
2 | 1 | 2
3 | 4 | 3
4 | 2 | 1
5 | 3 | 3
6 | 1 | 2
I want the query to return:
RatingName | Date (dd/mm/yyyy) | ObservationCount
A | 01/04/2012 | 0
B | 01/04/2012 | 1
C | 01/04/2012 | 1
D | 01/04/2012 | 0
E | 01/04/2012 | 0
A | 04/04/2012 | 1
B | 04/04/2012 | 0
C | 04/04/2012 | 2
D | 04/04/2012 | 0
E | 04/04/2012 | 0
A | 28/03/2012 | 0
B | 28/03/2012 | 0
C | 28/03/2012 | 1
D | 28/03/2012 | 0
E | 28/03/2012 | 0
So I need the number of Observations for each rating for each date. And yes, I need the rows with 0 Observations, because I'm using this data in a stacked chart and it throws an error without them. With the following Linq To Sql query I've managed to get the above table minus the rows with 0 Observations, but from this point I'm stuck.
MyDataContext DB = new MyDataContext();
var data =
(from r in DB.Ratings
join o in DB.Observations on r.Id equals o.RatingId into ro
from observation in ro.DefaultIfEmpty()
join i in DB.Inspections on observation.InspectionId equals i.Id into roi
from q in roi.DefaultIfEmpty()
group q by new { Name = r.Name, Date = q.Date } into grouped
select
new
{
RatingName = grouped.Key.Name,
Date = grouped.Key.Date,
ObservationCount = grouped.Count(x => x != null)
}).OrderBy(x => x.Date);
I would appreciate an answer in either Linq To Sql or just plain old SQL, thanks!

You should try a CROSS JOIN on the two reference tables, and then OUTER JOIN back to your 'data' table - and then check whether the Observation is null... then group and sum!
SELECT
[Name],
[Date],
SUM([Exists])
FROM
(
SELECT
name,
[Date],
CASE WHEN o.Id IS NULL THEN 0
ELSE 1
END as [Exists]
FROM
Rating r CROSS JOIN
Inspection i
LEFT OUTER JOIN Observation o
ON o.RatingId = r.Id AND o.InspectionId = i.Id
) as [source]
GROUP BY [Name], [Date]
ORDER BY
[Date],
name
Translating to LINQ would be a similar two-step process - get the inner result set (checking whether observation is NULL), before grouping and summing.
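To check the cross-join-then-left-join shape against the sample rows, here is a minimal sketch. It is an illustration under assumptions, not the asker's setup: it uses Python's built-in sqlite3 module instead of SQL Server or LINQ to SQL, stores the dates as ISO strings (reading the question's dates as dd/mm/yyyy), uses COUNT(o.Id) in place of the SUM over a CASE (both count only matched observations), and the function name observation_counts is made up for the example.

```python
import sqlite3

def observation_counts():
    """Return (rating, date, count) rows, including the zero-count pairs."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE Rating (Id INTEGER PRIMARY KEY, Name TEXT);
        CREATE TABLE Inspection (Id INTEGER PRIMARY KEY, Date TEXT);
        CREATE TABLE Observation (Id INTEGER PRIMARY KEY,
                                  InspectionId INTEGER, RatingId INTEGER);
        INSERT INTO Rating VALUES (1,'A'),(2,'B'),(3,'C'),(4,'D'),(5,'E');
        -- Dates from the question as ISO strings (assumed dd/mm/yyyy input).
        INSERT INTO Inspection VALUES
            (1,'2012-04-01'),(2,'2012-04-04'),(3,'2012-03-28'),(4,'2012-04-04');
        INSERT INTO Observation VALUES
            (1,2,3),(2,1,2),(3,4,3),(4,2,1),(5,3,3),(6,1,2);
    """)
    rows = con.execute("""
        SELECT r.Name, i.Date, COUNT(o.Id) AS ObservationCount
        FROM Rating r
        CROSS JOIN Inspection i              -- every rating/inspection pair
        LEFT JOIN Observation o              -- NULL where nothing was observed
               ON o.RatingId = r.Id AND o.InspectionId = i.Id
        GROUP BY r.Name, i.Date
        ORDER BY i.Date, r.Name
    """).fetchall()
    con.close()
    return rows

for row in observation_counts():
    print(row)
```

This returns 15 rows, one per rating and date, zeros included. Note that on the sample rows as posted, both observations for inspection 1 carry rating B, so the query gives B = 2 and C = 0 for 01/04/2012 rather than the 1 and 1 listed in the question.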

Related

Sql join multiple tables, get count of certain rows, and also check some rows satisfy condition

I have a Zoo, each Zoo has many Cages, each Cage has many Animals.
Zoo:
+----+
| Id |
+----+
| 1 |
| 2 |
+----+
Cage:
+----+-------+
| Id | ZooId |
+----+-------+
| 1 | 1 |
| 2 | 1 |
| 3 | 2 |
| 4 | 2 |
| 5 | 2 |
+----+-------+
Animal:
+----+--------+----------+
| Id | CageId | IsHungry |
+----+--------+----------+
| 1 | 1 | 0 |
| 2 | 1 | 0 |
| 3 | 1 | 0 |
| 4 | 2 | 1 |
| 5 | 3 | 0 |
| 6 | 4 | 0 |
| 7 | 5 | 0 |
+----+--------+----------+
I'm trying to design a query to show each Zoo, the number of cages in that Zoo, and whether or not the Zoo has hungry Animals.
Here are the results I expect:
+-------+-----------+--------------+
| ZooID | CageCount | AnyoneHungry |
+-------+-----------+--------------+
| 1 | 2 | 1 |
| 2 | 3 | 0 |
+-------+-----------+--------------+
I can get the number of Cages in a Zoo:
SELECT
[c].[ZooId],
COUNT(*) AS [NumCages]
FROM [Cage] [c]
GROUP BY [c].[ZooId]
ORDER BY [NumCages] DESC
I can determine if a Cage has a hungry animal or not:
SELECT CASE WHEN EXISTS (
SELECT NULL
FROM [Animal] [a]
WHERE [a].[CageId] = @CageId AND [a].[IsHungry] = 1
) THEN 1 ELSE 0 END
But I'm having trouble combining these two into a single query that runs efficiently (in this universe zoos are very popular and have millions of cages and animals).
SELECT
[c].[ZooId],
COUNT(*) AS [CageCount],
MAX(CONVERT(INT, [x].[AnyoneHungry])) AS [AnyoneHungry]
FROM [Cage] [c]
INNER JOIN (
SELECT [a].[CageId], MAX(CONVERT(INT, [a].[IsHungry])) AS [AnyoneHungry]
FROM [Animal] [a]
GROUP BY [a].[CageId]
) [x] on [x].[CageId] = [c].[Id]
GROUP BY [c].[ZooId]
I feel like I'm missing something and it should be possible to run this query using a simpler statement.
This should do it:
SELECT
Z.Id,
COUNT(DISTINCT C.Id) AS CageCount,
COALESCE(MAX(CAST(A.IsHungry AS INT)), 0) AS AnyHungry /*The cast is only required if A.IsHungry is BIT and not INT*/
FROM Zoo Z
LEFT JOIN Cage C ON Z.Id = C.ZooId
LEFT JOIN Animal A ON C.Id = A.CageId
GROUP BY Z.Id
If you only need the zoo id and hungry animals:
SELECT c.zooid,
COUNT(DISTINCT C.Id) as CageCount,
COALESCE(MAX(CONVERT(int, a.IsHungry)), 0) AS AnyHungry
FROM Cage C LEFT JOIN
Animal A
ON c.Id = a.CageId AND a.IsHungry = 1
GROUP BY c.zooid;
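A quick way to see why both answers use COUNT(DISTINCT C.Id): joining Animal multiplies the Cage rows, so a plain COUNT would count animals, not cages. The sketch below runs the first answer's query on the sample data. Assumptions: Python's built-in sqlite3 stands in for SQL Server, IsHungry is stored as an integer (so no CAST is needed), and the JoinedRows column and the function name zoo_summary are added only for illustration.

```python
import sqlite3

def zoo_summary():
    """Return (ZooId, CageCount, JoinedRows, AnyHungry) per zoo."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE Zoo (Id INTEGER PRIMARY KEY);
        CREATE TABLE Cage (Id INTEGER PRIMARY KEY, ZooId INTEGER);
        CREATE TABLE Animal (Id INTEGER PRIMARY KEY, CageId INTEGER, IsHungry INTEGER);
        INSERT INTO Zoo VALUES (1),(2);
        INSERT INTO Cage VALUES (1,1),(2,1),(3,2),(4,2),(5,2);
        INSERT INTO Animal VALUES
            (1,1,0),(2,1,0),(3,1,0),(4,2,1),(5,3,0),(6,4,0),(7,5,0);
    """)
    rows = con.execute("""
        SELECT z.Id,
               COUNT(DISTINCT c.Id)         AS CageCount,   -- cages, deduplicated
               COUNT(c.Id)                  AS JoinedRows,  -- one per animal row
               COALESCE(MAX(a.IsHungry), 0) AS AnyHungry
        FROM Zoo z
        LEFT JOIN Cage c   ON c.ZooId = z.Id
        LEFT JOIN Animal a ON a.CageId = c.Id
        GROUP BY z.Id
        ORDER BY z.Id
    """).fetchall()
    con.close()
    return rows

print(zoo_summary())
```

For zoo 1 the join produces four rows (three animals in cage 1, one in cage 2) but only two distinct cages, which is the number the question expects.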

Iterate over the rows of a second table to return resultset with cumulative sum

Yesterday, with the help of an SO user on my earlier question, "Iterate over the rows of a second table to return resultset", I was able to build the row combinations with a self-join.
After some modifications, to adapt to my implementation, I faced a new challenge that I'm stuck: how to make an aggregate sum of a third column?
My issue is better explained in the image below:
Based on the code
SELECT
b1.table_a_id,
b1.label_x,
b2.label_y
FROM table_a a
INNER JOIN table_b b1
ON b1.table_a_id = a.table_a_id
INNER JOIN table_b b2
ON b2.table_a_id = b1.table_a_id AND
b2.label_y > b1.label_x
ORDER BY
b1.table_a_id,
b1.label_x,
b2.label_y;
I was able to acquire the combinations.
What should be the next step to get the cumulative sum based on a third column?
I couldn't think of a solution without using a second service, such as python with pandas, using a cumsum function.
To generate the expected resultset, you would need to join the table with itself with an inequality condition on the order column. Then, you can do a window sum:
select
t1.table_a_id,
t1.label_x,
t2.label_y,
sum(t2.value) over(
partition by t1.table_a_id, t1.label_x
order by t1."order", t2."order"
) agg_value
from
table_b t1
inner join table_b t2
on t1.table_a_id = t2.table_a_id
and t2."order" >= t1."order"
order by t1."order", t2."order"
Note: order is a reserved word, so it needs to be quoted; if your actual database column has a different name, you can remove the double quotes.
Demo on DB Fiddle:
TABLE_A_ID | LABEL_X | LABEL_Y | AGG_VALUE
---------: | :------ | :------ | --------:
1 | A | B | 1
1 | A | C | 3
1 | A | D | 6
1 | A | E | 10
1 | A | F | 15
1 | B | C | 2
1 | B | D | 5
1 | B | E | 9
1 | B | F | 14
1 | C | D | 3
1 | C | E | 7
1 | C | F | 12
1 | D | E | 4
1 | D | F | 9
1 | E | F | 5
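The self-join plus window sum can be tried out with a small sketch. Everything about the data here is an assumption, since the real table layout is only shown in the missing image: table_b is taken to hold one row per label with a value and a sort key named ord (to avoid quoting order), the values are picked so the totals line up with the demo output above, and a strict inequality is used so a label is not paired with itself under this layout. It runs on Python's built-in sqlite3, which needs SQLite 3.25 or later for window functions; cumulative_pairs is a made-up name.

```python
import sqlite3

def cumulative_pairs():
    """Return (table_a_id, label_x, label_y, running sum of label_y values)."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        -- Hypothetical layout: one row per label, with a value and a sort key.
        CREATE TABLE table_b (table_a_id INTEGER, label TEXT,
                              value INTEGER, ord INTEGER);
        INSERT INTO table_b VALUES
            (1,'A',0,1),(1,'B',1,2),(1,'C',2,3),
            (1,'D',3,4),(1,'E',4,5),(1,'F',5,6);
    """)
    rows = con.execute("""
        SELECT t1.table_a_id,
               t1.label AS label_x,
               t2.label AS label_y,
               SUM(t2.value) OVER (
                   PARTITION BY t1.table_a_id, t1.label  -- restart per label_x
                   ORDER BY t2.ord                       -- accumulate along label_y
               ) AS agg_value
        FROM table_b t1
        JOIN table_b t2
          ON t2.table_a_id = t1.table_a_id
         AND t2.ord > t1.ord
        ORDER BY t1.ord, t2.ord
    """).fetchall()
    con.close()
    return rows

for row in cumulative_pairs():
    print(row)
```

The partition restarts the running total for each label_x, and the ORDER BY inside OVER is what turns the SUM into a cumulative sum instead of a per-partition total.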
You seem to want a cumulative sum:
SELECT b1.table_a_id, b1.label_x, b2.label_y,
SUM(b1.value) OVER (PARTITION BY b1.table_a_id, b1.label_x
ORDER BY b2.order
) as AGG_VALUE

Finding nth row using sql

select top 20 *
from dbo.DUTs D
inner join dbo.Statuses S on d.StatusID = s.StatusID
where s.Description = 'Active'
The SQL query above returns the top 20 rows. How can I get the nth row from its result? I looked at previous posts on finding the nth row, but it was not clear to me how to apply them to my case.
Thanks.
The row order is arbitrary, so I would add an ORDER BY expression. Then, you can do something like this:
SELECT TOP 1 * FROM (SELECT TOP 20 * FROM ... ORDER BY d.StatusID) AS d ORDER BY d.StatusID DESC
to get the 20th row.
You can also use OFFSET like:
SELECT * FROM ... ORDER BY d.StatusID OFFSET 19 ROWS FETCH NEXT 1 ROWS ONLY
And a third option:
SELECT * FROM (SELECT *, rownum = ROW_NUMBER() OVER (ORDER BY d.StatusID) FROM ...) AS a WHERE rownum = 20
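Here is a small sketch of the offset and ROW_NUMBER() variants that can be run without a SQL Server instance. Assumptions: it uses Python's built-in sqlite3, where the offset form is spelled LIMIT 1 OFFSET n-1 rather than OFFSET ... FETCH; the DutID key, the 30 sample rows, and the function name nth_row are invented for the example. It orders by a unique column, because ordering by a column with ties leaves the nth row undetermined.

```python
import sqlite3

def nth_row(n):
    """Return the nth active DUT (1-based) found two ways, or None if absent."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE Statuses (StatusID INTEGER PRIMARY KEY, Description TEXT);
        CREATE TABLE DUTs (DutID INTEGER PRIMARY KEY, StatusID INTEGER);
        INSERT INTO Statuses VALUES (1,'Active'),(2,'Retired');
    """)
    # 30 hypothetical devices; every third one is retired, leaving 20 active.
    con.executemany(
        "INSERT INTO DUTs VALUES (?, ?)",
        [(i, 2 if i % 3 == 0 else 1) for i in range(1, 31)],
    )
    by_offset = con.execute("""
        SELECT d.DutID
        FROM DUTs d
        JOIN Statuses s ON s.StatusID = d.StatusID
        WHERE s.Description = 'Active'
        ORDER BY d.DutID
        LIMIT 1 OFFSET ?          -- skip n-1 rows, keep one
    """, (n - 1,)).fetchone()
    by_row_number = con.execute("""
        SELECT DutID FROM (
            SELECT d.DutID,
                   ROW_NUMBER() OVER (ORDER BY d.DutID) AS rn
            FROM DUTs d
            JOIN Statuses s ON s.StatusID = d.StatusID
            WHERE s.Description = 'Active'
        )
        WHERE rn = ?
    """, (n,)).fetchone()
    con.close()
    return by_offset, by_row_number

print(nth_row(5))
```

Both forms agree on the same row, and both return nothing when n is past the end of the result.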
I tend to use CTEs with the ROW_NUMBER() function to get my lists numbered in order. As @zambonee said, you'll need an ORDER BY clause either way or SQL can put them in a different order every time. It doesn't usually, but without ordering it yourself, you're not guaranteed to get the same thing twice. Here I'm assuming there's a [DateCreated] field (DATETIME NOT NULL DEFAULT GETDATE()), which is usually a good idea so you know when that record was entered. This says "give me everything in that table and add a row number with the most recent record as #1":
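; WITH AllDUTs
AS (
-- D.* rather than *: selecting * over the join returns StatusID twice,
-- and SQL Server rejects duplicate column names inside a CTE.
SELECT D.*
, DateCreatedRank = ROW_NUMBER() OVER(ORDER BY [DateCreated] DESC)
FROM dbo.DUTs D
INNER JOIN dbo.Statuses S ON D.StatusID = S.StatusID
WHERE S.Description = 'Active'
)
SELECT *
FROM AllDUTs
WHERE AllDUTs.DateCreatedRank = 20;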
SELECT * FROM (SELECT * FROM EMP ORDER BY ROWID DESC) WHERE ROWNUM<11
It's another sample:
SELECT * ,CASE WHEN COUNT(0)OVER() =ROW_NUMBER()OVER(ORDER BY number) THEN 1 ELSE 0 END IsNth
FROM (
select top 10 *
from master.dbo.spt_values AS d
where d.type='P'
) AS t
+------+--------+------+-----+------+--------+-------+
| name | number | type | low | high | status | IsNth |
+------+--------+------+-----+------+--------+-------+
| NULL | 0 | P | 1 | 1 | 0 | 0 |
| NULL | 1 | P | 1 | 2 | 0 | 0 |
| NULL | 2 | P | 1 | 4 | 0 | 0 |
| NULL | 3 | P | 1 | 8 | 0 | 0 |
| NULL | 4 | P | 1 | 16 | 0 | 0 |
| NULL | 5 | P | 1 | 32 | 0 | 0 |
| NULL | 6 | P | 1 | 64 | 0 | 0 |
| NULL | 7 | P | 1 | 128 | 0 | 0 |
| NULL | 8 | P | 2 | 1 | 0 | 0 |
| NULL | 9 | P | 2 | 2 | 0 | 1 |
+------+--------+------+-----+------+--------+-------+

Query returned with an extra column in sql -ms access

I got an interesting suggestion from another developer and am wondering how to implement it. I have two tables that I join in a query, and I want the query result to include an extra column that comes from one of the joined tables.
Example:
#table A: contains rating of players, changes randomly at any date depending
#on drop of form from the players
PID| Rating | DateChange |
1 | 2 | 10-May-2014 |
1 | 4 | 20-May-2015 |
1 | 20 | 1-June-2015 |
2 | 4 | 1-April-2014|
3 | 4 | 5-April-2014|
2 | 3 | 3-May-2015 |
#Table B: contains match sheets. Every player has a different match sheet
#and plays different dates.
MsID | PID | MatchDate | Win |
1 | 2 | 10-May-2014 | No |
2 | 1 | 15-May-2015 | Yes |
3 | 3 | 10-Apr-2014 | No |
4 | 1 | 21-Apr-2015 | Yes |
5 | 1 | 3-June-2015 | Yes |
6 | 2 | 5-May-2015 | No |
#This is what I am trying to achieve with the MS Access query: I want to get
#every player's rating at the time the match was played, not his current
#rating.
MsID | PID | MatchDate | Rating |
1 | 2 | 10-May-2014 | 4 |
2 | 1 | 15-May-2015 | 2 |
3 | 3 | 10-Apr-2014 | 4 |
4 | 1 | 21-Apr-2015 | 4 |
5 | 1 | 3-June-2015 | 20 |
6 | 2 | 5-May-2015 | 3 |
This is what I have tried below:
Select MsID, PID, MatchDate, A-table.rating as Rating from B-table
left Join A-table
on B-table.PID = A-table.PID
where B-table.MatchDate > A-table.Datechange;
any help is appreciated. The solution can be in Vba as long as it returns something like a view/table I can manipulate using other queries or report.
Think of this in terms of sets of data... you need a set that lists the MAX DateChange for each player and match date.
Soo...
SELECT MAX(A.DateChange) AS MDC, A.PID, B.MatchDate
FROM [B-table] B
INNER JOIN [A-table] A
on B.PID = A.PID
and A.DateChange <= B.MatchDate
GROUP BY A.PID, B.MatchDate
Now we take this and join it back to what you've done, limiting the results from tables A and B to ONLY the rows that match that date, player, and MatchDate (my inline table C):
SELECT B.MsID, B.PID, B.MatchDate, A.rating as Rating
FROM [B-table] B
INNER JOIN [A-table] A
on B.PID = A.PID
INNER JOIN (
SELECT MAX(Y.DateChange) MDC, Y.PID, Z.Matchdate
FROM [B-table] Z
INNER Join [A-table] Y
on Z.PID = Y.PID
and Y.DateChange <= Z.MatchDate
GROUP BY Y.PID, Z.Matchdate) C
on C.mdc = A.DateChange
and A.PID = C.PId
and B.MatchDate = C.Matchdate
I didn't create a sample for this using your data so it's untested but I believe the logic is sound...
Now Tested! SQL Fiddle using SQL server though...
My results don't match yours exactly. I think your expected results are wrong for MsID 4, though, given the rules defined.
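For anyone who wants to rerun the logic on the sample rows, here is a minimal sketch of the same "latest DateChange on or before the match" join. Assumptions: it uses Python's built-in sqlite3 rather than MS Access or SQL Server, the dates are rewritten as ISO strings so they compare correctly as text, and RatingHistory, MatchSheet, and ratings_at_match are stand-in names for A-table, B-table, and the query.

```python
import sqlite3

def ratings_at_match():
    """Return (MsID, PID, MatchDate, Rating in force on the match date)."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE RatingHistory (PID INTEGER, Rating INTEGER, DateChange TEXT);
        CREATE TABLE MatchSheet (MsID INTEGER PRIMARY KEY, PID INTEGER, MatchDate TEXT);
        INSERT INTO RatingHistory VALUES
            (1,2,'2014-05-10'),(1,4,'2015-05-20'),(1,20,'2015-06-01'),
            (2,4,'2014-04-01'),(3,4,'2014-04-05'),(2,3,'2015-05-03');
        INSERT INTO MatchSheet VALUES
            (1,2,'2014-05-10'),(2,1,'2015-05-15'),(3,3,'2014-04-10'),
            (4,1,'2015-04-21'),(5,1,'2015-06-03'),(6,2,'2015-05-05');
    """)
    rows = con.execute("""
        SELECT b.MsID, b.PID, b.MatchDate, a.Rating
        FROM MatchSheet b
        JOIN (
            -- latest rating change on or before each player's match date
            SELECT y.PID, z.MatchDate, MAX(y.DateChange) AS mdc
            FROM MatchSheet z
            JOIN RatingHistory y
              ON y.PID = z.PID AND y.DateChange <= z.MatchDate
            GROUP BY y.PID, z.MatchDate
        ) c ON c.PID = b.PID AND c.MatchDate = b.MatchDate
        JOIN RatingHistory a
          ON a.PID = c.PID AND a.DateChange = c.mdc
        ORDER BY b.MsID
    """).fetchall()
    con.close()
    return rows

for row in ratings_at_match():
    print(row)
```

On this data MsID 4 (player 1 on 21-Apr-2015) comes back with rating 2, because the change to 4 is dated 20-May-2015, after the match.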

Left Join on Associative Table

I have three tables
Prospect -- holds prospect information
id
name
projectID
Sample data for Prospect
id | name | projectID
1 | p1 | 1
2 | p2 | 1
3 | p3 | 1
4 | p4 | 2
5 | p5 | 2
6 | p6 | 2
Conjoint -- holds conjoint information
id
title
projectID
Sample data
id | title | projectID
1 | color | 1
2 | size | 1
3 | qual | 1
4 | color | 2
5 | price | 2
6 | weight | 2
There is an associative table that holds the conjoint values for the prospects:
ConjointProspect
id
prospectID
conjointID
value
Sample Data
id | prospectID | conjointID | value
1 | 1 | 1 | 20
2 | 1 | 2 | 30
3 | 1 | 3 | 50
4 | 2 | 1 | 10
5 | 2 | 3 | 40
There are one or more prospects and one or more conjoints in their respective tables. A prospect may or may not have a value for each conjoint.
I'd like an SQL statement that extracts all conjoint values for each prospect of a given project, displaying NULL where the ConjointProspect table has no value for a given conjoint and prospect.
Something along the lines of this for projectID = 1
prospectID | conjoint ID | value
1 | 1 | 20
1 | 2 | 30
1 | 3 | 50
2 | 1 | 10
2 | 2 | NULL
2 | 3 | 40
3 | 1 | NULL
3 | 2 | NULL
3 | 3 | NULL
I've tried using an inner join on the Prospect and Conjoint tables and then a left join on ConjointProspect, but somewhere I'm getting a Cartesian product of prospect/conjoint pairs that doesn't make any sense (to me):
SELECT p.id, p.name, c.id, c.title, cp.value
FROM prospect p
INNER JOIN conjoint c ON p.projectID = c.projectid
LEFT JOIN conjointProspect cp ON cp.prospectID = p.id
WHERE p.projectID = 2
ORDER BY p.id, c.id
prospectID | conjoint ID | value
1 | 1 | 20
1 | 2 | 30
1 | 3 | 50
1 | 1 | 20
1 | 2 | 30
1 | 3 | 50
1 | 1 | 20
1 | 2 | 30
1 | 3 | 50
2 | 1 | 10
2 | 2 | 40
2 | 1 | 10
2 | 2 | 40
2 | 1 | 10
2 | 2 | 40
3 | 1 | NULL
3 | 2 | NULL
3 | 3 | NULL
Guidance is very much appreciated!!
Then this will work for you. Pre-join a Cartesian product of all prospects and conjoints within that project via a SELECT used as your first FROM table. Then left join to ConjointProspect. You can obviously change or eliminate columns from the result, but everything is there, joined the way you want, with exactly the results you are expecting:
SELECT
PJ.*,
CJP.Value
FROM
( -- every prospect/conjoint pair within the project; no ORDER BY here,
-- since sorting a derived table has no guaranteed effect and some engines reject it
SELECT
P.ID ProspectID,
P.Name,
P.ProjectID,
CJ.Title,
CJ.ID ConJointID
FROM
Prospect P,
ConJoint CJ
where
P.ProjectID = 1
AND P.ProjectID = CJ.ProjectID
) PJ
LEFT JOIN conjointProspect cjp
ON PJ.ProspectID = cjp.prospectID
AND PJ.ConjointID = cjp.conjointid
ORDER BY
PJ.ProspectID,
PJ.ConJointID
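The key difference from the query in the question is that the left join matches on both prospectID and conjointID. A minimal sketch on the sample data shows the effect. Assumptions: it runs on Python's built-in sqlite3 since the question does not name a database, it writes the pre-join as an explicit JOIN instead of a derived table (which should be equivalent here), and conjoint_values is a made-up function name.

```python
import sqlite3

def conjoint_values(project_id):
    """Return (prospectID, conjointID, value) for every pair in the project."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE Prospect (id INTEGER PRIMARY KEY, name TEXT, projectID INTEGER);
        CREATE TABLE Conjoint (id INTEGER PRIMARY KEY, title TEXT, projectID INTEGER);
        CREATE TABLE ConjointProspect (id INTEGER PRIMARY KEY, prospectID INTEGER,
                                       conjointID INTEGER, value INTEGER);
        INSERT INTO Prospect VALUES
            (1,'p1',1),(2,'p2',1),(3,'p3',1),(4,'p4',2),(5,'p5',2),(6,'p6',2);
        INSERT INTO Conjoint VALUES
            (1,'color',1),(2,'size',1),(3,'qual',1),
            (4,'color',2),(5,'price',2),(6,'weight',2);
        INSERT INTO ConjointProspect VALUES
            (1,1,1,20),(2,1,2,30),(3,1,3,50),(4,2,1,10),(5,2,3,40);
    """)
    rows = con.execute("""
        SELECT p.id, c.id, cp.value
        FROM Prospect p
        JOIN Conjoint c ON c.projectID = p.projectID   -- all pairs in the project
        LEFT JOIN ConjointProspect cp
               ON cp.prospectID = p.id
              AND cp.conjointID = c.id                 -- match on BOTH keys
        WHERE p.projectID = ?
        ORDER BY p.id, c.id
    """, (project_id,)).fetchall()
    con.close()
    return rows

for row in conjoint_values(1):
    print(row)
```

For project 1 this gives the nine rows the question asks for, with NULL (None in Python) where a prospect has no value for a conjoint.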
Your Cartesian product is a result of joining by project ID: in your sample data there are 3 prospects with a project ID of 1 and 3 conjoints with a project ID of 1. Joining on project ID should then result in 9 rows of data, which is what you're getting. It looks like you really need to join via the ConjointProspect table, as that is what holds the mapping between prospects and conjoints.
What if you try something like:
SELECT p.id, p.name, c.id, c.title, cp.value
FROM prospect p
LEFT JOIN conjointProspect cp ON cp.prospectID = p.id
RIGHT JOIN conjoint c ON cp.conjointID = c.id
WHERE p.projectID = 2
ORDER BY p.id, c.id
Not sure if that will work, but it seems like conjointprospects needs to be at the center of your join in order to correctly map prospects to conjoints.