Find related "ordered pairs" in SQL - sql

Let's say I have a table format that looks exactly like this:
I'd like to write a query that locates the maximum station for a given frame and output case (results are grouped by frame & output case) but also return the ordered P (& eventually V2, V3, T, M2 & M3) that would be associated with the maximum station. The desired query is shown below:
I can't for the life of me figure this out. I've posted a copy of the Access database to my Google Drive: https://drive.google.com/folderview?id=0B9VpkDoFQISJOFcwS2RMSGJ5RVk&usp=sharing

select x.*, t.p
from (select frame, outputcase, max(station) as max_station
from tbl
group by frame, outputcase) x
inner join tbl t
on x.frame = t.frame
and x.outputcase = t.outputcase
and x.max_station = t.station
order by x.frame, x.outputcase;
Just a note to avoid confusion: in the second selected column (t.p), t is the table alias and p is the column name.
The subquery, which I've assigned an alias of x, finds the max(station) for each unique combination of (frame, outputcase). That is what you want, but the problem does not stop there, you also want column p. The reason that couldn't be selected in the same query is because you would have had to group by it, and you don't want the max(station) for each combination of (frame, outputcase, p). You want the max(station) for each combination of (frame, outputcase).
Because we couldn't get column p in that first step, we have to join back to the original table using the value we obtained (which I've assigned an alias, max_station), and the obvious join conditions of frame and outputcase. So we join back to the original table on those 3 things, 2 of which are fields on the actual table, one of which was calculated in the subquery (max_station).
Because we've joined back to the original table, we can then select column p from the original table.
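To see the join-back in action, here is a minimal sketch that runs the same query against an in-memory SQLite database from Python. The table name tbl comes from the query above; the sample rows and the use of SQLite instead of Access are assumptions made purely for illustration.

```python
import sqlite3

# Hypothetical sample data: two frames, two output cases, a few stations each.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tbl (frame INTEGER, outputcase TEXT, station REAL, p REAL)"
)
conn.executemany(
    "INSERT INTO tbl VALUES (?, ?, ?, ?)",
    [
        (1, "DEAD", 0.0, -10.0),
        (1, "DEAD", 5.0, -12.5),
        (1, "LIVE", 0.0, -3.0),
        (1, "LIVE", 5.0, -4.5),
        (2, "DEAD", 0.0, -7.0),
        (2, "DEAD", 8.0, -9.0),
    ],
)

# Step 1 (subquery x): max station per (frame, outputcase).
# Step 2 (join back to tbl): pick up p from the row that holds that station.
query = """
select x.frame, x.outputcase, x.max_station, t.p
from (select frame, outputcase, max(station) as max_station
      from tbl
      group by frame, outputcase) x
inner join tbl t
   on x.frame = t.frame
  and x.outputcase = t.outputcase
  and x.max_station = t.station
order by x.frame, x.outputcase
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Each (frame, outputcase) pair comes back once, carrying the p value from the row at its maximum station. The other columns (V2, V3, T, M2, M3) could be added to the outer select list in the same way.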

It takes a while to run, but the query below returns the desired result:
SELECT t1.*
FROM [Element Forces - Frames] as t1
WHERE t1.Station In (SELECT TOP 1 t2.Station
FROM [Element Forces - Frames] as t2
WHERE t2.Frame = t1.Frame
ORDER BY t2.Station DESC)
ORDER BY t1.Frame ASC, t1.OutputCase ASC;
I still want to thank everyone who posted answers. I'm sure it's just syntax errors on my part that I was struggling with.

Related

Athena/Presto | Can't match ID row on self join

I'm trying to get the bi-grams on a string column.
I've followed the approach here but Athena/Presto is giving me errors at the final steps.
Source code so far
with word_list as (
SELECT
transaction_id,
words,
n,
regexp_extract_all(f70_remittance_info, '([a-zA-Z]+)') as f70,
f70_remittance_info
FROM exploration_transaction
cross join unnest(regexp_extract_all(f70_remittance_info, '([a-zA-Z]+)')) with ordinality AS t (words, n)
where cardinality((regexp_extract_all(f70_remittance_info, '([a-zA-Z]+)'))) > 1
and f70_remittance_info is not null
limit 50 )
select wl1.f70, wl1.n, wl1.words, wl2.f70, wl2.n, wl2.words
from word_list wl1
join word_list wl2
on wl1.transaction_id = wl2.transaction_id
The specific issue I'm having is on the very last line: when I try to self join on the transaction ids, it always returns zero rows. It does work if I join only on wl1.n = wl2.n-1 (the position in the array), which is useless if I can't constrain it to the same id.
Athena doesn't support the ngrams function by presto, so I'm left with this approach.
Any clues why this isn't working?
Thanks!
This is speculation. But I note that your CTE is using limit with no order by. That means that an arbitrary set of rows is being returned.
Although some databases materialize CTEs, many do not. They run the code independently each time it is referenced. My guess is that the code is run independently and the arbitrary set of 50 rows has no transaction ids in common.
One solution would be to add order by transaction_id in the CTE, before the limit, so that both references return the same 50 rows.
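For reference, once both sides of the self join see the same rows, the bi-gram join the question is after looks roughly like this. This is only a sketch: it uses SQLite from Python rather than Athena, tokenizes in Python because SQLite has no unnest, and the two sample transactions are made up.

```python
import re
import sqlite3

# Hypothetical remittance strings keyed by transaction id.
transactions = {101: "invoice payment march", 102: "rent april"}

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE word_list (transaction_id INTEGER, n INTEGER, words TEXT)"
)
for transaction_id, text in transactions.items():
    # Same idea as regexp_extract_all(..., '([a-zA-Z]+)') with ordinality.
    tokens = re.findall(r"[a-zA-Z]+", text)
    conn.executemany(
        "INSERT INTO word_list VALUES (?, ?, ?)",
        [(transaction_id, n, word) for n, word in enumerate(tokens, start=1)],
    )

# Pair each word with its successor inside the same transaction.
bigrams = conn.execute(
    """
    select wl1.transaction_id, wl1.words, wl2.words
    from word_list wl1
    join word_list wl2
      on wl1.transaction_id = wl2.transaction_id
     and wl1.n = wl2.n - 1
    order by wl1.transaction_id, wl1.n
    """
).fetchall()
for row in bigrams:
    print(row)
```

Because word_list is a real table here, both aliases read identical rows, which is what the order by before the limit is meant to guarantee in the original CTE.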

Record count when a certain value is present in one row and a different value is not present in another row

I have been tasked with a query that would tell me how many of a certain value are present when another value is not present in a separate row. The 2 rows do have a common field that will be the same when both are present. I have got to the following.
SELECT inci_no, count(inci_no)
FROM inc_unit
WHERE unit IN ('E08','ms08') and alm_date>='01/01/2013'
GROUP BY inci_no
So this gives me the number of rows for each inci_no. I only need the incidents that have a count of 1 and where the only unit is E08. The inci_no itself does not matter; I have simply been using it to group by.
Thanks for your help.
As Robert said above you could write a self join. If I understand correctly, you want rows where unit is E08 and the second row is not ms08? If so, something like this might work:
Select t1.inci_no, count(t1.inci_no)
From inc_unit as t1 Inner Join inc_unit as t2 On t1.inci_no = t2.inci_no
Where t1.unit = 'E08' and t2.unit <> 'ms08' and t1.alm_date>='01/01/2013'
Group by t1.inci_no
You might have to add something to exclude joining on itself, but this should get you started.
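One thing to watch with the self join: as far as I can tell, a <> condition on the joined row does not by itself drop incidents that also have an ms08 row, because the E08 row still pairs with itself. A NOT EXISTS check is one alternative. The sketch below compares both on made-up data, using SQLite from Python and ISO-formatted dates (both are assumptions, not the asker's setup).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE inc_unit (inci_no INTEGER, unit TEXT, alm_date TEXT)")
conn.executemany(
    "INSERT INTO inc_unit VALUES (?, ?, ?)",
    [
        (1, "E08", "2013-02-01"),   # E08 only
        (2, "E08", "2013-02-02"),   # E08 together with ms08
        (2, "ms08", "2013-02-02"),
        (3, "ms08", "2013-02-03"),  # ms08 only
        (4, "E08", "2013-02-04"),   # E08 with an unrelated unit
        (4, "E01", "2013-02-04"),
    ],
)

# Self join with <>: incident 2 survives because E08 joins to its own row.
join_incidents = [
    r[0]
    for r in conn.execute(
        """
        select distinct t1.inci_no
        from inc_unit t1
        join inc_unit t2 on t1.inci_no = t2.inci_no
        where t1.unit = 'E08' and t2.unit <> 'ms08'
          and t1.alm_date >= '2013-01-01'
        order by t1.inci_no
        """
    )
]

# NOT EXISTS: keep E08 rows only when the incident has no ms08 row at all.
not_exists_incidents = [
    r[0]
    for r in conn.execute(
        """
        select t1.inci_no
        from inc_unit t1
        where t1.unit = 'E08'
          and t1.alm_date >= '2013-01-01'
          and not exists (select 1
                          from inc_unit t2
                          where t2.inci_no = t1.inci_no
                            and t2.unit = 'ms08')
        order by t1.inci_no
        """
    )
]
print(join_incidents, not_exists_incidents)
```

Wrapping the second query in select count(*) would give the single number the question asks for.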

Nested subquery in Access alias causing "enter parameter value"

I'm using Access (I normally use SQL Server) for a little job, and I'm getting "enter parameter value" for Night.NightId in the statement below that has a subquery within a subquery. I expect it would work if I wasn't nesting it two levels deep, but I can't think of a way around it (query ideas welcome).
The scenario is pretty simple, there's a Night table with a one-to-many relationship to a Score table - each night normally has 10 scores. Each score has a bit field IsDouble which is normally true for two of the scores.
I want to list all of the nights, with a number next to each representing how many of the top 2 scores were marked IsDouble (would be 0, 1 or 2).
Here's the SQL, I've tried lots of combinations of adding aliases to the column and the tables, but I've taken them out for simplicity below:
select Night.*
,
( select sum(IIF(IsDouble,1,0)) from
(SELECT top 2 * from Score where NightId=Night.NightId order by Score desc, IsDouble asc, ID)
) as TopTwoMarkedAsDoubles
from Night
This is a bit of speculation. However, some databases have issues with correlation conditions in multiply nested subqueries. MS Access might have this problem.
If so, you can solve this by using aggregation with a where clause that chooses the top two values:
select s.nightid,
sum(IIF(IsDouble, 1, 0)) as TopTwoMarkedAsDoubles
from Score as s
where s.id in (select top 2 s2.id
from score as s2
where s2.nightid = s.nightid
order by s2.score desc, s2.IsDouble asc, s2.id
)
group by s.nightid;
If this works, it is a simple matter to join Night back in to get the additional columns.
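Here is a rough way to check the logic of that query outside Access. The sketch uses SQLite from Python, so TOP 2 becomes LIMIT 2 and IIF becomes a CASE expression; the score rows are invented for the test.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Score (ID INTEGER, NightId INTEGER, Score INTEGER, IsDouble INTEGER)"
)
conn.executemany(
    "INSERT INTO Score VALUES (?, ?, ?, ?)",
    [
        (1, 1, 100, 1), (2, 1, 90, 0), (3, 1, 80, 1),  # top two: one double
        (4, 2, 50, 1), (5, 2, 60, 1), (6, 2, 10, 0),   # top two: both doubles
        (7, 3, 70, 0), (8, 3, 65, 0), (9, 3, 20, 1),   # top two: no doubles
    ],
)

# The correlated subquery sits only one level below the outer query,
# so it can refer to s.NightId directly.
top_two_doubles = conn.execute(
    """
    select s.NightId,
           sum(case when s.IsDouble then 1 else 0 end) as TopTwoMarkedAsDoubles
    from Score as s
    where s.ID in (select s2.ID
                   from Score as s2
                   where s2.NightId = s.NightId
                   order by s2.Score desc, s2.IsDouble asc, s2.ID
                   limit 2)
    group by s.NightId
    order by s.NightId
    """
).fetchall()
print(top_two_doubles)
```

A night with no scores at all would not appear in this result, which is one reason to left join it back onto Night afterwards.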
Your subquery can only see one level above it, so Night.NightId is unknown to it, which is why you are being prompted to enter a value. You can use a Group By to get the value you want for each NightId, then correlate that back to the original Night table.
Select *
From Night
left join (
Select N.NightId
, sum(IIF(S.IsDouble,1,0)) as [Number of Doubles]
from Night N
inner join Score S
on N.NightId = S.NightId
group by N.NightId) NightsWithScores
on Night.NightId = NightsWithScores.NightId
Because of the IIF(S.IsDouble,1,0) I don't see the point in using top.

left join not doing as expected with sum and group by

This is all going to have to be pseudocode, as I am on my phone and have no internet access right now because I have just moved, but it's bugging the crap out of me. This also means I can't do code blocks, so please bear with me: I'll try.
I have a table with amounts in it, and I have a table with labels. I want to sum the amounts in the first table grouped by the labels. The problem is, if there are no records for a label in the amounts table, then I don't get a record in the result set for that label. I need a record there with nulls for the amount table's fields. Here is what some sample data might look like:
Amount_table:
Columns: id, tpa, amt, link_to_label_table
Data:
1, GTL, 2000, 1
2, GTL, 1000, 1
Label_table:
Columns: link_to_amount_table, label_name
Data:
1, Label1
2, Label2
Query:
Select at.tpa, sum(at.amt) as amt, lt.label_name
From Amount_table as at
Left join Label_table lt on lt.link_to_amount_table = at.link_to_label_table
Where at.tpa = 'GTL'
Group by lt.label_name, at.tpa
Now this returns:
GTL, 3000, Label1
I tried selecting from the labels table then left joining the amount table and it still didn't give my desired results which are:
GTL, 3000, Label1
Null, Null, Label2
Is this possible with the sum and group by? The fields being grouped by have to be there otherwise you get an error.
This is in DB2 by the way. Is there any way possible to get this to return the way I need it? I have to get the labels; they are dynamic.
On the face of it, you want to have your labels table as the dominant table and the amounts table as the one that is outer joined.
SELECT a.tpa, sum(a.amt) as amt, l.label_name
FROM Label_table AS l
LEFT JOIN Amount_table AS a
ON l.link_to_amount_table = a.link_to_label_table
GROUP BY l.label_name, a.tpa
You have a condition Amount_table.tpa = 'GTL'; it is not entirely clear why you have that, but presumably it is significant with more data in the tables. There are (at least) two ways you can incorporate that condition into the query (other than the one you chose - which eliminates the rows where a.tpa is null).
SELECT a.tpa, sum(a.amt) as amt, l.label_name
FROM Label_table AS l
LEFT JOIN Amount_table AS a
ON l.link_to_amount_table = a.link_to_label_table
AND a.tpa = 'GTL'
GROUP BY l.label_name, a.tpa
Or:
SELECT a.tpa, sum(a.amt) as amt, l.label_name
FROM Label_table AS l
LEFT JOIN (SELECT *
FROM Amount_table
WHERE tpa = 'GTL') AS a
ON l.link_to_amount_table = a.link_to_label_table
GROUP BY l.label_name, a.tpa
A decent optimizer will produce the same query plan for both, so it probably doesn't matter which you use. There's an argument that suggests the second alternative is cleaner in that the ON clause is primarily for joining conditions, and the filter condition on a.tpa is not a joining condition. There's another argument that says the first alternative avoids a sub-query and is therefore preferable. I'd validate that the query plans are the same and would probably choose the second, but it is a somewhat nebulous decision based on a mild preference.
You were so close on your second try. Change WHERE to AND. This has the effect of applying at.tpa='GTL' to the JOIN instead of applying it to the filter so you don't filter out the NULLs.
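The difference between filtering in WHERE and filtering in the ON clause is easy to see with the sample data from the question. This is a small sketch using SQLite from Python rather than DB2 (an assumption; the outer join behaviour shown is standard SQL), with the labels table as the driving table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Amount_table (id INTEGER, tpa TEXT, amt INTEGER, link_to_label_table INTEGER)"
)
conn.execute("CREATE TABLE Label_table (link_to_amount_table INTEGER, label_name TEXT)")
conn.executemany(
    "INSERT INTO Amount_table VALUES (?, ?, ?, ?)",
    [(1, "GTL", 2000, 1), (2, "GTL", 1000, 1)],
)
conn.executemany(
    "INSERT INTO Label_table VALUES (?, ?)", [(1, "Label1"), (2, "Label2")]
)

base = """
select a.tpa, sum(a.amt) as amt, l.label_name
from Label_table as l
left join Amount_table as a
  on l.link_to_amount_table = a.link_to_label_table
{condition}
group by l.label_name, a.tpa
order by l.label_name
"""

# Filter applied after the join: the all-null row for Label2 is thrown away.
filter_in_where = conn.execute(base.format(condition="where a.tpa = 'GTL'")).fetchall()

# Filter applied as part of the join: Label2 survives with nulls.
filter_in_on = conn.execute(base.format(condition="and a.tpa = 'GTL'")).fetchall()

print(filter_in_where)
print(filter_in_on)
```

With the condition in WHERE only the Label1 row comes back; with the condition in ON, Label2 is kept with null tpa and amt, which is the result asked for.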

SQL conundrum, how to select latest date for part, but only 1 row per part (unique)

I am trying to wrap my head around this one this morning.
I am trying to show inventory status for parts (for our products) and this query only becomes complex if I try to return all parts.
Let me lay it out:
single table inventoryReport
I have a distinct list of X parts I wish to display, the result of which must be X # of rows (1 row per part showing latest inventory entry).
table is made up of dated entries of inventory changes (so I only need the LATEST date entry per part).
all data contained in this single table, so no joins necessary.
Currently for 1 single part, it is fairly simple and I can accomplish this by doing the following sql (to give you some idea):
SELECT TOP (1) ldDate, ptProdLine, inPart, inSite, inAbc, ptUm, inQtyOh + inQtyNonet AS in_qty_oh, inQtyAvail, inQtyNonet, ldCustConsignQty, inSuppConsignQty
FROM inventoryReport
WHERE (ldPart = 'ABC123')
ORDER BY ldDate DESC
That gets me my TOP 1 row, which is simple for a single part. However, I need to show all X parts (let's say 30), so I need 30 rows with that result. Of course the simple solution would be to loop X SQL calls in my code, and that would suffice, but it would be costly. I would love to work this SQL some more to reduce the X calls back to the db down to just 1 query.
From what I can see here I need to keep track of the latest date per item somehow while looking for my result set.
I would ultimately do a
WHERE ldPart in ('ABC123', 'BFD21', 'AA123', etc)
to limit the parts I need. Hopefully I made my question clear enough. Let me know if you have an idea. I cannot do a DISTINCT as the rows are not the same, the date needs to be the latest, and I need a maximum of X rows.
Thoughts? I'm stuck...
SELECT *
FROM (SELECT i.*,
ROW_NUMBER() OVER(PARTITION BY ldPart ORDER BY ldDate DESC) r
FROM inventoryReport i
WHERE ldPart in ('ABC123', 'BFD21', 'AA123', etc)
) x
WHERE r = 1
EDIT: Be sure to test the performance of each solution. As pointed out in this question, the CTE method may outperform using ROW_NUMBER.
;with cteMaxDate as (
select ldPart, max(ldDate) as MaxDate
from inventoryReport
group by ldPart
)
SELECT md.MaxDate, ir.ptProdLine, ir.inPart, ir.inSite, ir.inAbc, ir.ptUm, ir.inQtyOh + ir.inQtyNonet AS in_qty_oh, ir.inQtyAvail, ir.inQtyNonet, ir.ldCustConsignQty, ir.inSuppConsignQty
FROM cteMaxDate md
INNER JOIN inventoryReport ir
on md.ldPart = ir.ldPart
and md.MaxDate = ir.ldDate
You need to join into a Sub-query:
SELECT i.ldPart, x.LastDate, i.inAbc
FROM inventoryReport i
INNER JOIN (Select ldPart, Max(ldDate) As LastDate FROM inventoryReport GROUP BY ldPart) x
on i.ldPart = x.ldPart and i.ldDate = x.LastDate
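To compare the approaches on a small data set, here is a sketch of the ROW_NUMBER variant running in SQLite from Python. It assumes an SQLite build with window function support (3.25 or later), and the inventory rows and the reduced column list are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE inventoryReport (ldPart TEXT, ldDate TEXT, inQtyOh INTEGER)")
conn.executemany(
    "INSERT INTO inventoryReport VALUES (?, ?, ?)",
    [
        ("ABC123", "2013-01-10", 5),
        ("ABC123", "2013-03-01", 7),
        ("BFD21", "2013-02-15", 4),
        ("BFD21", "2013-01-05", 9),
        ("ZZZ999", "2013-04-01", 1),  # not in the requested part list
    ],
)

# Number the rows of each part from newest to oldest, then keep row 1.
latest = conn.execute(
    """
    select ldPart, ldDate, inQtyOh
    from (select i.*,
                 row_number() over (partition by ldPart order by ldDate desc) as r
          from inventoryReport i
          where ldPart in ('ABC123', 'BFD21')) x
    where r = 1
    order by ldPart
    """
).fetchall()
for row in latest:
    print(row)
```

One design difference worth knowing: ROW_NUMBER always yields exactly one row per part, even when two entries share the latest date, whereas the MAX(ldDate) join returns every row tied for that date.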