SQL subquery issue summing subfields

I am trying to get a subquery to work, but I can't get past the error "Subquery returned more than 1 value. This is not permitted." No matter how I rewrite this query, I cannot get the SUM to work. I am fairly new to subqueries, so any assistance you could provide would be great. The query currently stands as:
SELECT dbo.h.DateDelivery, dbo.h.DateInProduction, dbo.h.JobNumber, dbo.h.JobKeyID, dbo.h.QuantityFrames, dbo.h.QuantityGlass, dbo.h.QuantityPanels,
dbo.r.Description, dbo.r.DivisionID, dbo.r.rKeyID, (SELECT SUM(t.QtyPacks) AS packqty
FROM t INNER JOIN
h ON t.JobKeyID = h.JobKeyID
WHERE (t.StageID = 10) OR
(t.StageID = 20) OR
(t.StageID = 28)
GROUP BY t.JobKeyID)
FROM dbo.h INNER JOIN
dbo.r ON dbo.h.rKeyID = dbo.r.rKeyID
WHERE (NOT (dbo.r.rKeyID IN (1, 50, 81, 91)))

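You haven't specified which DBMS you are using, but in general a scalar subquery expression must return exactly one column value from one row (see, for instance, the Oracle documentation).
In your code, the GROUP BY t.JobKeyID makes the subquery return one row per JobKeyID, so it returns more than one row as soon as t holds rows for more than one job. The subquery is also not tied to the outer row: the h inside it is a separate reference to the table, not the dbo.h of the outer query.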
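One way to get a single value per outer row is to correlate the subquery to the outer h row and drop the GROUP BY. Here is a minimal sketch, assuming the goal is one pack total per job. It runs in SQLite through Python's sqlite3 module purely as a stand-in for your DBMS, and the tables are cut down to a few columns with invented sample rows.

```python
import sqlite3

# In-memory SQLite database standing in for the real DBMS (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE h (JobKeyID INTEGER PRIMARY KEY, JobNumber TEXT);
    CREATE TABLE t (JobKeyID INTEGER, StageID INTEGER, QtyPacks INTEGER);
    -- Invented sample data: job 3 has no rows in t at all.
    INSERT INTO h VALUES (1, 'J-001'), (2, 'J-002'), (3, 'J-003');
    INSERT INTO t VALUES (1, 10, 5), (1, 20, 7), (1, 99, 100), (2, 28, 3);
""")

# The subquery references h.JobKeyID from the OUTER query, so it is evaluated
# once per job and can only ever return a single SUM value.
rows = conn.execute("""
    SELECT h.JobNumber,
           (SELECT SUM(t.QtyPacks)
              FROM t
             WHERE t.JobKeyID = h.JobKeyID
               AND t.StageID IN (10, 20, 28)) AS packqty
      FROM h
     ORDER BY h.JobKeyID
""").fetchall()

print(rows)
```

A job with no matching rows in t gets NULL (None in Python) rather than 0; wrap the subquery in COALESCE(..., 0) if you prefer a zero.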

Related

Self joining columns from the same table with calculation on one column not displaying column name

I am fairly new to SQL and am having trouble solving the simple issue below. I have a dataset I am trying to self-join. I am using (b.calendar_year_number -1) as one of the columns to join. I applied the -1 calculation with the goal of matching values from the previous year. However, it is not working, and the resulting column shows as (No column name). How do I change the alias to b.calendar_year_number after the calculation?
Code:
SELECT a.day_within_fiscal_period,
a.calendar_month_name,
a.cost_period_rolling_three_month_start_date,
a.calendar_year_number,
b.day_within_fiscal_period,
b.calendar_month_name,
b.cost_period_rolling_three_month_start_date,
(b.calendar_year_number -1)
FROM [data_mart].[v_dim_date_consumer_complaints] AS a
JOIN [data_mart].[v_dim_date_consumer_complaints] AS b
ON b.day_within_fiscal_period = a.day_within_fiscal_period AND
b.calendar_month_name = a.calendar_month_name AND
b.calendar_year_number = a.calendar_year_number
I am using (b.calendar_year_number -1) as one of the columns to join.
Nope, you're not. Look at your join statement and you'll see the third condition is:
b.calendar_year_number = a.calendar_year_number
So just change that to include the calculation. As far as the 'no column name' issue, you can use colname = somelogic syntax or somelogic as colname. Below, I used the former syntax.
select a.day_within_fiscal_period,
a.calendar_month_name,
a.cost_period_rolling_three_month_start_date,
a.calendar_year_number,
b.day_within_fiscal_period,
b.calendar_month_name,
b.cost_period_rolling_three_month_start_date,
bCalYearNum = b.calendar_year_number
from [data_mart].[v_dim_date_consumer_complaints] a
left join [data_mart].[v_dim_date_consumer_complaints] b
on b.day_within_fiscal_period = a.day_within_fiscal_period
and b.calendar_month_name = a.calendar_month_name
and b.calendar_year_number - 1 = a.calendar_year_number;
You could use the analytical function LAG/LEAD to get your required result, no self-join necessary:
select a.day_within_fiscal_period,
a.calendar_month_name,
a.cost_period_rolling_three_month_start_date,
a.calendar_year_number,
old_cost_period_rolling_three_month_start_date =
LAG(cost_period_rolling_three_month_start_date) OVER
(PARTITION BY calendar_month_name, day_within_fiscal_period
ORDER BY calendar_year_number),
old_CalYearNum = LAG(calendar_year_number) OVER
(PARTITION BY calendar_month_name, day_within_fiscal_period
ORDER BY calendar_year_number)
from [data_mart].[v_dim_date_consumer_complaints] a
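To see what the LAG approach returns, here is a small runnable sketch. It uses SQLite through Python's sqlite3 module as a stand-in for SQL Server (window functions need SQLite 3.25 or later), and the table dim_date, the shortened column start_date, and the three sample rows are invented for illustration.

```python
import sqlite3

# In-memory SQLite database standing in for the real date dimension (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dim_date (
        calendar_year_number     INTEGER,
        calendar_month_name      TEXT,
        day_within_fiscal_period INTEGER,
        start_date               TEXT
    );
    -- Invented sample data: the same month/day in three consecutive years.
    INSERT INTO dim_date VALUES
        (2019, 'January', 1, '2018-10-01'),
        (2020, 'January', 1, '2019-10-01'),
        (2021, 'January', 1, '2020-10-01');
""")

# LAG looks one row back within each (month, day) partition, ordered by year.
rows = conn.execute("""
    SELECT calendar_year_number,
           LAG(calendar_year_number) OVER (
               PARTITION BY calendar_month_name, day_within_fiscal_period
               ORDER BY calendar_year_number) AS old_cal_year_num,
           LAG(start_date) OVER (
               PARTITION BY calendar_month_name, day_within_fiscal_period
               ORDER BY calendar_year_number) AS old_start_date
      FROM dim_date
     ORDER BY calendar_year_number
""").fetchall()

print(rows)
```

Note that LAG returns the previous row in the partition, which is the previous calendar year only if no years are missing from the data. The self-join with an explicit year condition does not have that caveat.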

Nested Query Alternatives in AWS Athena

I am running a query that gives a non-overlapping set of first_party_id's - ids that are associated with one third party but not another. This query does not run in Athena, however, giving the error: Correlated queries not yet supported.
I was looking at the PrestoDB docs, https://prestodb.io/docs/current/sql/select.html (Athena is PrestoDB under the hood), for an alternative to nested queries. The WITH statement example given doesn't seem to translate well to this NOT IN clause. What would the alternative to a nested query be? The query is below.
SELECT
COUNT(DISTINCT i.third_party_id) AS uniques
FROM
db.ids i
WHERE
i.third_party_type = 'cookie_1'
AND i.first_party_id NOT IN (
SELECT
i.first_party_id
WHERE
i.third_party_id = 'cookie_2'
)
There may be a better way to do this - I would be curious to see it too! One way I can think of would be to use an outer join. (I'm not exactly sure about how your data is structured, so forgive the contrived example, but I hope it would translate ok.) How about this?
with
a as (select *
from (values
(1,'cookie_n',10,'cookie_2'),
(2,'cookie_n',11,'cookie_1'),
(3,'cookie_m',12,'cookie_1'),
(4,'cookie_m',12,'cookie_1'),
(5,'cookie_q',13,'cookie_1'),
(6,'cookie_n',13,'cookie_1'),
(7,'cookie_m',14,'cookie_3')
) as db_ids(first_party_id, first_party_type, third_party_id, third_party_type)
),
b as (select first_party_type
from a where third_party_type = 'cookie_2'),
c as (select a.third_party_id, b.first_party_type as exclude_first_party_type
from a left join b on a.first_party_type = b.first_party_type
where a.third_party_type = 'cookie_1')
select count(distinct third_party_id) from c
where exclude_first_party_type is null;
Hope this helps!
You can use an outer join:
SELECT
COUNT(DISTINCT a.third_party_id) AS uniques
FROM
db.ids a
LEFT JOIN
db.ids b
ON a.first_party_id = b.first_party_id
AND b.third_party_id = 'cookie_2'
WHERE
a.third_party_type = 'cookie_1'
AND b.third_party_id is null -- this line means we select only rows where there is no match
You should also use caution when using NOT IN with subqueries that may return NULL values, since the condition can then never be true. If the subquery returns a NULL, the query ends up comparing a.first_party_id to NULL, which is neither true nor false but unknown, so NOT IN never evaluates to true and no rows are returned. Nasty little gotcha.
One way to avoid this is to avoid using NOT IN, or to add a condition to your subquery, i.e. AND first_party_id IS NOT NULL.
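The NULL behaviour of NOT IN and the outer-join alternative can be compared side by side in a small sketch. It runs in SQLite through Python's sqlite3 module as a stand-in for Athena, the four sample rows are invented, and it assumes the 'cookie_1' / 'cookie_2' labels live in third_party_type.

```python
import sqlite3

# In-memory SQLite database standing in for db.ids (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE ids (first_party_id INTEGER, third_party_id INTEGER,
                      third_party_type TEXT);
    -- Invented sample data; note the NULL first_party_id on a cookie_2 row.
    INSERT INTO ids VALUES
        (1,    10, 'cookie_1'),
        (2,    11, 'cookie_1'),
        (2,    20, 'cookie_2'),
        (NULL, 21, 'cookie_2');
""")

# NOT IN: the subquery returns (2, NULL), so the predicate is never true.
not_in_count = conn.execute("""
    SELECT COUNT(DISTINCT a.third_party_id)
      FROM ids a
     WHERE a.third_party_type = 'cookie_1'
       AND a.first_party_id NOT IN (
               SELECT b.first_party_id
                 FROM ids b
                WHERE b.third_party_type = 'cookie_2')
""").fetchone()[0]

# LEFT JOIN anti-join: keep only cookie_1 rows with no cookie_2 match.
anti_join_count = conn.execute("""
    SELECT COUNT(DISTINCT a.third_party_id)
      FROM ids a
      LEFT JOIN ids b
        ON a.first_party_id = b.first_party_id
       AND b.third_party_type = 'cookie_2'
     WHERE a.third_party_type = 'cookie_1'
       AND b.first_party_id IS NULL
""").fetchone()[0]

print(not_in_count, anti_join_count)
```

With this data the NOT IN version counts nothing, while the anti-join counts the one third-party id whose first-party id has no cookie_2 row.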

MS Access SQL Update with Minimum

This SQL is beyond my expertise. I think it should be fairly easy for someone with experience. Here is what I have so far..
SQL is as follows:
UPDATE (Tbl_Stg_Project_Schedule_Dates
INNER JOIN Tbl_Child_ITN ON Tbl_Stg_Project_Schedule_Dates.ms_itn = Tbl_Child_ITN.ITN)
INNER JOIN Tbl_Schedule ON Tbl_Child_ITN.Id = Tbl_Schedule.ID SET Tbl_Schedule.a_construction_start = [Tbl_Stg_Project_Schedule_Dates].[ms_start_date]
WHERE (((Tbl_Stg_Project_Schedule_Dates.ms_tempt_id) In (16,17,18,19,20,21,22,23)));
I want to add one last condition to this being that I only want the minimum of [Tbl_Stg_Project_Schedule_Dates].[ms_start_date] to update the table. I've tried the obvious of wrapping the field in Min, and also tried creating a separate aggregate select statement first (to get the min value with other criteria) that I then tried to create the update query from in new query but no luck.
I believe this is valid Access/Jet SQL. The idea here is to use a subquery to look up the earliest date among all the rows in your subset. I'm not sure if ms_itn was the right column to correlate on but hopefully you get the idea:
UPDATE (Tbl_Stg_Project_Schedule_Dates
INNER JOIN Tbl_Child_ITN ON Tbl_Stg_Project_Schedule_Dates.ms_itn = Tbl_Child_ITN.ITN)
INNER JOIN Tbl_Schedule ON Tbl_Child_ITN.Id = Tbl_Schedule.ID
SET Tbl_Schedule.a_construction_start = [Tbl_Stg_Project_Schedule_Dates].[ms_start_date]
WHERE (((Tbl_Stg_Project_Schedule_Dates.ms_tempt_id) In (16,17,18,19,20,21,22,23)))
and [Tbl_Stg_Project_Schedule_Dates].[ms_start_date] = (
select min(sd.[ms_start_date])
from [Tbl_Stg_Project_Schedule_Dates] as sd
where sd.ms_itn = [Tbl_Stg_Project_Schedule_Dates].ms_itn
)
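For comparison, here is a different formulation of the same goal: compute the minimum date directly in the SET clause with a correlated subquery. This is a sketch in SQLite through Python's sqlite3 module, not Access/Jet syntax, and it has not been tested against Access. The table names are shortened (stg_dates, child_itn, schedule) and the sample rows are invented.

```python
import sqlite3

# In-memory SQLite database with shortened, hypothetical table names.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE stg_dates (ms_itn TEXT, ms_tempt_id INTEGER, ms_start_date TEXT);
    CREATE TABLE child_itn (id INTEGER PRIMARY KEY, itn TEXT);
    CREATE TABLE schedule  (id INTEGER PRIMARY KEY, a_construction_start TEXT);
    -- Invented sample data: ITN-B has no rows with a qualifying ms_tempt_id.
    INSERT INTO child_itn VALUES (1, 'ITN-A'), (2, 'ITN-B');
    INSERT INTO schedule  VALUES (1, NULL), (2, '2000-01-01');
    INSERT INTO stg_dates VALUES
        ('ITN-A', 16, '2021-03-01'),
        ('ITN-A', 17, '2021-02-01'),
        ('ITN-A', 99, '2020-01-01'),
        ('ITN-B', 99, '2019-01-01');
""")

# SET takes the earliest qualifying date for each schedule row; the EXISTS
# guard leaves rows without any qualifying staging date untouched.
conn.execute("""
    UPDATE schedule
       SET a_construction_start = (
               SELECT MIN(sd.ms_start_date)
                 FROM stg_dates sd
                 JOIN child_itn c ON c.itn = sd.ms_itn
                WHERE c.id = schedule.id
                  AND sd.ms_tempt_id IN (16, 17, 18, 19, 20, 21, 22, 23))
     WHERE EXISTS (
               SELECT 1
                 FROM stg_dates sd
                 JOIN child_itn c ON c.itn = sd.ms_itn
                WHERE c.id = schedule.id
                  AND sd.ms_tempt_id IN (16, 17, 18, 19, 20, 21, 22, 23))
""")

rows = conn.execute(
    "SELECT id, a_construction_start FROM schedule ORDER BY id").fetchall()
print(rows)
```

The ms_tempt_id filter sits inside the subquery here, so the minimum is taken only over qualifying rows. In the Access query above the filter is only in the outer WHERE, so an earlier date with a different ms_tempt_id would prevent any row from matching.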

single-row subquery returns more than one row. Query not working with main query

I have to display several cell values in one cell, so I am using this query:
select LISTAGG(fc.DESCRIPTION, ';'||chr(10))WITHIN GROUP (ORDER BY fc.SWITCH_NAME) AS DESCRIP from "ORS".SWITCH_OPERATIONS fc
group by fc.SWITCH_NAME
It is working fine. But when I am merging this with my main(complete) query then I am getting the error as: Error code 1427, SQL state 21000: ORA-01427: single-row subquery returns more than one row
Here is my complete query:
SELECT
TRACK_EVENT.LOCATION,
TRACK_EVENT.ELEMENT_NAME,
(select COUNT(*) from ORS.TRACK_EVENT b where (b.ELEMENT_NAME = sw.SWITCH_NAME)AND (b.ELEMENT_TYPE = 'SWITCH')AND (b.EVENT_TYPE = 'I')AND (b.ELEMENT_STATE = 'NORMAL' OR b.ELEMENT_STATE = 'REVERSE'))as COUNTER,
(select COUNT(*) from ORS.SWITCH_OPERATIONS fc where TRACK_EVENT.ELEMENT_NAME = fc.SWITCH_NAME and fc.NO_CORRESPONDENCE = 1 )as FAIL_COUNT,
(select MAX(cw.COMMAND_TIME) from ORS.SWITCH_OPERATIONS cw where ((TRACK_EVENT.ELEMENT_NAME = cw.SWITCH_NAME) and (cw.NO_CORRESPONDENCE = 1)) group by cw.SWITCH_NAME ) as FAILURE_DATE,
(select LISTAGG(fc.DESCRIPTION, ';'||chr(10))WITHIN GROUP (ORDER BY fc.SWITCH_NAME) AS DESCRIP from "ORS".SWITCH_OPERATIONS fc
group by fc.SWITCH_NAME)
FROM
ORS.SWITCH_OPERATIONS sw,
ORS.TRACK_EVENT TRACK_EVENT
WHERE
sw.SEQUENCE_ID = TRACK_EVENT.SEQUENCE_ID
Not only are subqueries in the SELECT list (or anywhere they're used for a single-value comparison, like <, =, etc.) required to return at most one row, but their use in that context tends to make the database execute them RBAR: row-by-agonizing-row. That is, they're slower and consume more resources than they should.
Generally, unless the result set outside the subquery contains only a few rows, you want to construct subqueries as part of a table reference, i.e. something like:
SELECT m.n, m.z, aliasForSomeTable.a, aliasForSomeTable.bSum
FROM mainTable m
JOIN (SELECT a, SUM(b) AS bSum
FROM someTable
GROUP BY a) aliasForSomeTable
ON aliasForSomeTable.a = m.a
This benefits you in other ways too: it's easier to get multiple columns out of the same table reference, for example.
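The table-reference pattern can be tried out in a small sketch. It runs in SQLite through Python's sqlite3 module as a stand-in for Oracle. The names mainTable and someTable follow the snippet above, while the alias agg, the extra bCount column, and the sample rows are invented to show two aggregates coming out of one derived table.

```python
import sqlite3

# In-memory SQLite database standing in for Oracle (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE mainTable (n TEXT, a INTEGER);
    CREATE TABLE someTable (a INTEGER, b INTEGER);
    -- Invented sample data.
    INSERT INTO mainTable VALUES ('first', 1), ('second', 2);
    INSERT INTO someTable VALUES (1, 10), (1, 5), (2, 7);
""")

# The derived table aggregates someTable once, producing one row per a,
# and the join then picks up both aggregate columns for each outer row.
rows = conn.execute("""
    SELECT m.n, agg.a, agg.bSum, agg.bCount
      FROM mainTable m
      JOIN (SELECT a, SUM(b) AS bSum, COUNT(*) AS bCount
              FROM someTable
             GROUP BY a) agg
        ON agg.a = m.a
     ORDER BY m.n
""").fetchall()

print(rows)
```

Because the GROUP BY lives inside the derived table, it can return many rows without error; the join condition is what narrows it to one row per outer row.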
Assuming that LISTAGG(...) can be included with other aggregate functions, you can change your query to look like this:
SELECT Track_Event.location, Track_Event.element_name,
Counted_Events.counter,
Failure.fail_count, Failure.failure_date, Failure.descrip
FROM ORS.Track_Event
JOIN ORS.Switch_Operations
ON Switch_Operations.sequence_id = Track_Event.sequence_id
LEFT JOIN (SELECT element_name, COUNT(*) AS counter
FROM ORS.Track_Event
WHERE element_type = 'SWITCH'
AND event_type = 'I'
AND element_state IN ('NORMAL', 'REVERSE')
GROUP BY element_name) Counted_Events
ON Counted_Events.element_name = Switch_Operations.switch_name
LEFT JOIN (SELECT switch_name,
COUNT(CASE WHEN no_correspondence = 1 THEN '1' END) AS fail_count,
MAX(CASE WHEN no_correspondence = 1 THEN command_time END) AS failure_date,
LISTAGG(description, ';' || CHR(10)) WITHIN GROUP (ORDER BY command_time) AS descrip
FROM ORS.Switch_Operations
GROUP BY switch_name) Failure
ON Failure.switch_name = Track_Event.element_name
This query was written to (attempt to) preserve the semantics of your original query. I'm not completely sure that's what you actually need, but without sample starting data and desired results, I have no way to tell how else to improve this. For instance, I'm a little suspicious of the need for Switch_Operations in the outer query, and of the fact that LISTAGG(...) is run over rows where no_correspondence <> 1. I did change the ordering of LISTAGG(...), because the original ordering column would not have done anything (the order was the same as the grouping), so it would not have been a stable sort.
Single-row subquery returns more than one row.
This error message is self-descriptive.
A returned field can't hold multiple values, and your subquery returns more than one row.
In your complete query you specify fields to be returned. The last field expects single value from the subquery but gets multiple rows instead.
I have no clue about the data you're working with but either you have to ensure that subquery returns only one row or you have to redesign the wrapping query (possibly using joins when appropriate).

What's wrong with this nested query?

I am trying to write a query to return the id of the latest version of a market index stored in a database.
SELECT miv.market_index_id market_index_id from ref_market_index_version miv
INNER JOIN ref_market_index mi ON miv.market_index_id = mi.id
WHERE mi.short_name='dow30'
AND miv.version_num = (SELECT MAX(m1.version_num) FROM ref_market_index_version m1 INNER JOIN ref_market_index m2 ON m1.market_index_id = m2.id )
The above SQL statement can be (roughly) translated into the form:
SELECT some columns FROM SOME CRITERIA MATCHED TABLES
WHERE mi.short_name='some name'
AND miv.version_num = SOME NUMBER
What I don't understand is that when I supply an actual number (instead of a sub query), the SQL statement works - also, when I test the SUB query used to determine the latest version number, that also works - however, when I attempt to use the result returned by sub query in the outer (parent?) query, it returns 0 rows - what am I doing wrong here?
Incidentally, I also tried an IN CLAUSE instead of the strict equality match i.e.
... AND miv.version_num IN (SUB QUERY)
That also resulted in 0 rows, although as before, when running the parent query with a hard coded version number, I get 1 row returned (as expected).
BTW I am using PostgreSQL, but I would prefer the solution to be DB-agnostic.
The problem is probably that the subquery computes max(version_num) across all indexes, and that version doesn't exist for 'dow30'.
Try restricting the subquery to the same index:
SELECT miv.market_index_id market_index_id
from ref_market_index_version miv INNER JOIN
ref_market_index mi
ON miv.market_index_id = mi.id
WHERE mi.short_name='dow30' AND
miv.version_num = (SELECT MAX(m1.version_num)
FROM ref_market_index_version m1 INNER JOIN
ref_market_index m2
ON m1.market_index_id = m2.id
where m2.short_name = 'dow30'
)
I added the where clause in the subquery.
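The difference between a global maximum and a per-index maximum can be reproduced in a small sketch. It runs in SQLite through Python's sqlite3 module as a stand-in for PostgreSQL, with invented sample rows. The second query uses a correlated variant (the subquery matches on miv.market_index_id instead of repeating the name), which is an alternative to the query above rather than the same statement.

```python
import sqlite3

# In-memory SQLite database standing in for PostgreSQL (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE ref_market_index (id INTEGER PRIMARY KEY, short_name TEXT);
    CREATE TABLE ref_market_index_version (market_index_id INTEGER,
                                           version_num INTEGER);
    -- Invented sample data: 'sp500' has a higher latest version than 'dow30'.
    INSERT INTO ref_market_index VALUES (1, 'dow30'), (2, 'sp500');
    INSERT INTO ref_market_index_version VALUES
        (1, 1), (1, 2), (2, 1), (2, 2), (2, 3);
""")

# Global maximum: version 3 belongs to 'sp500', so nothing matches 'dow30'.
global_max = conn.execute("""
    SELECT miv.market_index_id, miv.version_num
      FROM ref_market_index_version miv
      JOIN ref_market_index mi ON miv.market_index_id = mi.id
     WHERE mi.short_name = 'dow30'
       AND miv.version_num = (SELECT MAX(m1.version_num)
                                FROM ref_market_index_version m1)
""").fetchall()

# Per-index maximum: the subquery is correlated to the outer row's index.
per_index_max = conn.execute("""
    SELECT miv.market_index_id, miv.version_num
      FROM ref_market_index_version miv
      JOIN ref_market_index mi ON miv.market_index_id = mi.id
     WHERE mi.short_name = 'dow30'
       AND miv.version_num = (SELECT MAX(m1.version_num)
                                FROM ref_market_index_version m1
                               WHERE m1.market_index_id = miv.market_index_id)
""").fetchall()

print(global_max, per_index_max)
```

This also explains why the subquery and the hard-coded version number each work on their own: the subquery returns a valid number, just not one that belongs to 'dow30'.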