Agregating a subquery - sql

I try to find what I missed in the code to retrieve the value of "Last_Maintenance" in a table called "Interventions".
I try to understand the order rules of SQL and the particularities of subqueries without success.
Did I missed something, something basic or an important step?
---Interventions with PkState "Schedule_Visit" with the Last_Maintenance aggregation
SELECT Interventions.ID AS Nro_Inter,
--Interventions.PlacesList AS Nro_Place,
MaintenanceContracts.Num AS Nro_Contract,
Interventions.TentativeDate AS Schedule_Visit,
--MaintenanceContracts.NumberOfVisits AS Number_Visits_Contracts,
--Interventions.VisitNumber AS Visit_Number,
(SELECT MAX(Interventions.AssignmentDate)
FROM Interventions
WHERE PkState = 'AE4B42CF-0003-4796-89F2-2881527DFB26' AND PkMaintenanceContract IS NOT NULL) AS Last_Maintenance --PkState "Maintenance Executed"
FROM Interventions
INNER JOIN MaintenanceContracts ON MaintenanceContracts.Pk = Interventions.PkMaintenanceContract
WHERE PkState = 'AE4B42CF-0000-4796-89F2-2881527ABC26' AND PkMaintenanceContract IS NOT NULL --PkState "Schedule_Visit"
GROUP BY Interventions.AssignmentDate,
Interventions.ID,
Interventions.PlacesList,
MaintenanceContracts.Num,
Interventions.TentativeDate,
MaintenanceContracts.NumberOfVisits,
Interventions.VisitNumber
ORDER BY Nro_Contract
I try to use GROUP BY and HAVING clause in a sub query, I did not succeed. Clearly I am lacking some understanding.
Output
The output of "Last_Maintenance" is the last date of entire contracts in the DB, which is not the desirable output. The desirable output is to know the last date the maintenance was executed for each row, meaning, for each "Nro-Contract". Somehow I need to aggregate like I did below.
In opposition of what mention I did succeed in another table.
In the table Contracts I did had success as you can see.
SELECT
MaintenanceContracts.Num AS Nro_Contract,
MAX(Interventions.AssignmentDate) AS Last_Maintenance
--MaintenanceContracts.Name AS Place
--MaintenanceContracts.StartDate,
--MaintenanceContracts.EndDate
FROM MaintenanceContracts
INNER JOIN Interventions ON Interventions.PkMaintenanceContract = MaintenanceContracts.Pk
WHERE MaintenanceContracts.ActiveContract = 2 OR MaintenanceContracts.ActiveContract = 1 --// 2 = Inactive; 1 = Active
GROUP BY MaintenanceContracts.Num, MaintenanceContracts.Name,
MaintenanceContracts.StartDate,
MaintenanceContracts.EndDate
ORDER BY Nro_Contract
I am struggling to understanding how nested queries works and how I can leverage in a simple manner the queries.

I think you're mixed up in how aggregation works. The MAX function will get a single MAX value over the entire dataset. What you're trying to do is get a MAX for each unique ID. For that, you either use derived tables, subqueries or windowed functions. I'm a fan of using the ROW_NUMBER() function to assign a sequence number. If you do it correctly, you can use that row number to get just the most recent record from a dataset. From your description, it sounds like you always want to have the contract and then get some max values for that contract. If that is the case, then you're second query is closer to what you need. Using windowed functions in derived queries has the added benefit of not having to worry about using the GROUP BY clause. Try this:
SELECT
MaintenanceContracts.Num AS Nro_Contract,
--MaintenanceContracts.Name AS Place
--MaintenanceContracts.StartDate,
--MaintenanceContracts.EndDate
i.AssignmentDate as Last_Maintenance
FROM MaintenanceContracts
INNER JOIN (
SELECT *
--This fuction will order the records for each maintenance contract.
--The most recent intervention will have a row_num = 1
, ROW_NUMBER() OVER(PARTITION BY PkMaintenanceContract ORDER BY AssignmentDate DESC) as row_num
FROM Interventions
) as i
ON i.PkMaintenanceContract = MaintenanceContracts.Pk
AND i.row_num = 1 --Used to get the most recent intervention.
WHERE MaintenanceContracts.ActiveContract = 2
OR MaintenanceContracts.ActiveContract = 1 --// 2 = Inactive; 1 = Active
ORDER BY Nro_Contract
;

Related

How to use SUM in this situation?

I have the following tables below and their schema:
INV
id, product code, name, ucost, tcost, desc, type, qoh
1,123,CPASS 700,1.00,5.00,CPASS 700 Lorem, COM,5
2,456,Shelf 5,2.00,6.00,Shelf 5 KJ, BR,3
GRP
id,type,desc
1,COM,COMPASS
2,BR,SHELF
Currently I have a query like this:
SELECT INV.*,GRP.DESCR AS CATEGORY
FROM INV LEFT JOIN GRP ON INV.TYPE = GRP.TYPE
WHERE INV.QOH = 0
There is no problems with that query.
Right now,I want to know the SUM of the TCOST of every INV record where their QOH is 0.
In this situation, does that I mean all I have to do is to write a separate query like the one below:
SELECT SUM(TCOST)
FROM INV
WHERE QOH = 0
Does it make any sense for me to try and combine those two queries as one ?
First understand that SUM is the aggregate function hence either you can run the Query like
(SELECT SUM(TCOST) FROM INV WHERE QOH=0) as total
This will return Sum of TCOST in INV Table for mentioned condition.
Another approach is finding the Sum based on the some column (e.g. Type)
you could write query like
SELECT Type , SUM(TCOST) FROM INV WHERE QOH=0 GROUP BY type ;
Its not clear on what criteria you want to sum . But I think above two approaches would provide you fare idea .
Mmm, you could maybe use a correlated query, though i'm not sure it's the best approach since I'm not sure I understand what your attempting to do:
SELECT INV.*,
GRP.DESCR AS CATEGORY ,
(SELECT SUM(TCOST) FROM INV WHERE QOH=0) as your_sum
FROM INV LEFT JOIN GRP ON INV.TYPE = GRP.TYPE
WHERE INV.QOH = 0
If you want only one value for the sum(), then your query is fine. If you want a new column with the sum, then use window functions:
SELECT INV.*, GRP.DESCR AS CATEGORY,
SUM(INV.TCOST) OVER () as sum_at_zero
FROM INV LEFT JOIN
GRP
ON INV.TYPE = GRP.TYPE
WHERE INV.QOH = 0;
It does not make sense to combine the queries by adding a row to the first one, because the columns are very different. A SQL result set requires that all rows have the same columns.

SQL Filtering duplicate rows due to bad ETL

The database is Postgres but any SQL logic should help.
I am retrieving the set of sales quotations that contain a given product within the bill of materials. I'm doing that in two steps: step 1, retrieve all DISTINCT quote numbers which contain a given product (by product number).
The second step, retrieve the full quote, with all products listed for each unique quote number.
So far, so good. Now the tough bit. Some rows are duplicates, some are not. Those that are duplicates (quote number & quote version & line number) might or might not have maintenance on them. I want to pick the row that has maintenance greater than 0. The duplicate rows I want to exclude are those that have a 0 maintenance. The problem is that some rows, which have no duplicates, have 0 maintenance, so I can't just filter on maintenance.
To make this exciting, the database holds quotes over 20+ years. And the data scientists guys have just admitted that maybe the ETL process has some bugs...
--- step 0
--- cleanup the workspace
SET CLIENT_ENCODING TO 'UTF8';
DROP TABLE IF EXISTS product_quotes;
--- step 1
--- get list of Product Quotes
CREATE TEMPORARY TABLE product_quotes AS (
SELECT DISTINCT master_quote_number
FROM w_quote_line_d
WHERE item_number IN ( << model numbers >> )
);
--- step 2
--- Now join on that list
SELECT
d.quote_line_number,
d.item_number,
d.item_description,
d.item_quantity,
d.unit_of_measure,
f.ref_list_price_amount,
f.quote_amount_entered,
f.negtd_discount,
--- need to calculate discount rate based on list price and negtd discount (%)
CASE
WHEN ref_list_price_amount > 0
THEN 100 - (ref_list_price_amount + negtd_discount) / ref_list_price_amount *100
ELSE 0
END AS discount_percent,
f.warranty_months,
f.master_quote_number,
f.quote_version_number,
f.maintenance_months,
f.territory_wid,
f.district_wid,
f.sales_rep_wid,
f.sales_organization_wid,
f.install_at_customer_wid,
f.ship_to_customer_wid,
f.bill_to_customer_wid,
f.sold_to_customer_wid,
d.net_value,
d.deal_score,
f.transaction_date,
f.reporting_date
FROM w_quote_line_d d
INNER JOIN product_quotes pq ON (pq.master_quote_number = d.master_quote_number)
INNER JOIN w_quote_f f ON
(f.quote_line_number = d.quote_line_number
AND f.master_quote_number = d.master_quote_number
AND f.quote_version_number = d.quote_version_number)
WHERE d.net_value >= 0 AND item_quantity > 0
ORDER BY f.master_quote_number, f.quote_version_number, d.quote_line_number
The logic to filter the duplicate rows is like this:
For each master_quote_number / version_number pair, check to see if there are duplicate line numbers. If so, pick the one with maintenance > 0.
Even in a CASE statement, I'm not sure how to write that.
Thoughts? The database is Postgres but any SQL logic should help.
I think you will want to use Window Functions. They are, in a word, awesome.
Here is a query that would "dedupe" based on your criteria:
select *
from (
select
* -- simplifying here to show the important parts
,row_number() over (
partition by master_quote_number, version_number
order by maintenance desc) as seqnum
from w_quote_line_d d
inner join product_quotes pq
on (pq.master_quote_number = d.master_quote_number)
inner join w_quote_f f
on (f.quote_line_number = d.quote_line_number
and f.master_quote_number = d.master_quote_number
and f.quote_version_number = d.quote_version_number)
) x
where seqnum = 1
The use of row_number() and the chosen partition by and order by criteria guarantee that only ONE row for each combination of quote_number/version_number will get the value of 1, and it will be the one with the highest value in maintenance (if your colleagues are right, there would only be one with a value > 0 anyway).
Can you do something like...
select
*
from
w_quote_line_d d
inner join
(
select
...
,max(maintenance)
from
w_quote_line_d
group by
...
) d1
on
d1.id = d.id
and d1.maintenance = d.maintenance;
Am I understanding your problem correctly?
Edit: Forgot the group by!
I'm not sure, but maybe you could Group By all other columns and use MAX(Maintenance) to get only the greatest.
What do you think?

Assistance with SQL Query (aggregating)

I have a requirement to create a Sales report and I have a sql query:
SELECT --top 1
t.branch_no as TBranchNo,
t.workstation_no as TWorkstation,
t.tender_ref_no as TSaleRefNo,
t.tender_line_no as TLineNo,
t.tender_code as TCode,
T.contribution as TContribution,
l.sale_line_no as SaleLineNo
FROM TENDER_LINES t
LEFT JOIN SALES_TX_LINES l
on t.branch_no = l.branch_no and t.workstation_no = l.workstation_no and t.tender_ref_no = l.sale_tx_no
where l.sale_tx_no = 2000293 OR l.sale_tx_no = 1005246 --OR sale_tx_no = 1005261
order by t.tender_ref_no asc,
l.sale_line_no desc
The results of the query look like the following:
The results I am trying to achieve is:
With only 1 line for transaction 2 either SaleLineNo 1 or 2, while still have=ing both lines for transaction 1 because the TCode is different.
Thanks
I am using SSQL2012.
Not exactly sure on what data you have, but you might want to try
GROUP BY TlineNo, TCode ...
But you have to keep a look on not to group by something that would result in duplicate contribution values.
You can use the ROW_NUMBER function that allows to partition the rows in groups, and number the lines inside each group starting by one. If you choose the right columns to define the partition, and keep only the rows with "row_number = 1`, you have solved the first part of your problem, i.e. discarding the lines that don't have to appear in the report. (See the sample sin the linked documentation, they're quite clear).
Once you have solved this problem, you simply have to repeat what you're doing, but on the result of this data, instead of the original data. You can use a view, a CTE, or a subselect to achieve your result, i.e.
With view:
CREATE VIEW FilteredData AS -- here the rank function query, then selct from the view
SELECT --here your current query --
FROM FilteredData
With CTE
WITH -- here the rank function query
SELECT -- your current querym, from the CTE
With subselect
SELECT -- your current query
FROM (SELECT FROM -- here the rank function query -- )
Appreciate your assistance with my query. After playing around, I have found a solution that works just as I want. It is as below: I did a Group by as hinted by #Yogesh86 on a few fields.
SELECT
MAX(t.branch_no) as TBranchNo,
Max(t.workstation_no) as TWorkstation,
t.tender_ref_no as TSaleRefNo,
Max(t.tender_line_no) as TLineNo,
t.tender_code as TCode,
MAx(T.contribution) as TContribution,
MAX(l.sale_line_no) as SaleLineNo
FROM TENDER_LINES t
LEFT JOIN SALES_TX_LINES l
on t.branch_no = l.branch_no and t.workstation_no = l.workstation_no and t.tender_ref_no = l.sale_tx_no
where l.sale_tx_no = 2000293 OR l.sale_tx_no = 1005246 --OR sale_tx_no = 1005261
GROUP BY
t.tender_ref_no,
t.tender_line_no,
t.tender_code

single-row subquery returns more than one row. Query not working with main query

I hve to display several cell values into one cell. So I am using this query:
select LISTAGG(fc.DESCRIPTION, ';'||chr(10))WITHIN GROUP (ORDER BY fc.SWITCH_NAME) AS DESCRIP from "ORS".SWITCH_OPERATIONS fc
group by fc.SWITCH_NAME
It is working fine. But when I am merging this with my main(complete) query then I am getting the error as: Error code 1427, SQL state 21000: ORA-01427: single-row subquery returns more than one row
Here is my complete query:
SELECT
TRACK_EVENT.LOCATION,
TRACK_EVENT.ELEMENT_NAME,
(select COUNT(*) from ORS.TRACK_EVENT b where (b.ELEMENT_NAME = sw.SWITCH_NAME)AND (b.ELEMENT_TYPE = 'SWITCH')AND (b.EVENT_TYPE = 'I')AND (b.ELEMENT_STATE = 'NORMAL' OR b.ELEMENT_STATE = 'REVERSE'))as COUNTER,
(select COUNT(*) from ORS.SWITCH_OPERATIONS fc where TRACK_EVENT.ELEMENT_NAME = fc.SWITCH_NAME and fc.NO_CORRESPONDENCE = 1 )as FAIL_COUNT,
(select MAX(cw.COMMAND_TIME) from ORS.SWITCH_OPERATIONS cw where ((TRACK_EVENT.ELEMENT_NAME = cw.SWITCH_NAME) and (cw.NO_CORRESPONDENCE = 1)) group by cw.SWITCH_NAME ) as FAILURE_DATE,
(select LISTAGG(fc.DESCRIPTION, ';'||chr(10))WITHIN GROUP (ORDER BY fc.SWITCH_NAME) AS DESCRIP from "ORS".SWITCH_OPERATIONS fc
group by fc.SWITCH_NAME)
FROM
ORS.SWITCH_OPERATIONS sw,
ORS.TRACK_EVENT TRACK_EVENT
WHERE
sw.SEQUENCE_ID = TRACK_EVENT.SEQUENCE_ID
Not only are subqueries in the SELECT list required to return exactly one row (or any time they're used for a singular comparison, like <, =, etc), but their use in that context tends to make the database execute them RBAR - Row-by-agonizing-row. That is, they're slower and consume more resources than they should.
Generally, unless the result set outside the subquery contains only a few rows, you want to construct subqueries as part of a table-reference. Ie, something like:
SELECT m.n, m.z, aliasForSomeTable.a, aliasForSomeTabe.bSum
FROM mainTable m
JOIN (SELECT a, SUM(b) AS bSum
FROM someTable
GROUP BY a) aliasForSomeTable
ON aliasForSomeTable.a = m.a
This benefits you in other ways to - it's easier to get multiple columns out of the same table-reference, for example.
Assuming that LISTAGG(...) can be included with other aggregate functions, you can change your query to look like this:
SELECT Track_Event.location, Track_Event.element_name,
Counted_Events.counter,
Failure.fail_count, Failure.failure_date, Failure.descrip
FROM ORS.Track_Event
JOIN ORS.Switch_Operations
ON Switch_Operations.sequence_id = Track_Event.sequence_id
LEFT JOIN (SELECT element_name, COUNT(*) AS counter
FROM ORS.Track_Event
WHERE element_type = 'SWITCH'
AND event_type = 'I'
AND element_state IN ('NORMAL', 'REVERSE')
GROUP BY element_name) Counted_Events
ON Counted_Events.element_name = Switch_Operations.swicth_name
LEFT JOIN (SELECT switch_name,
COUNT(CASE WHEN no_correspondence = 1 THEN '1' END) AS fail_count,
MAX(CASE WHEN no_correspondence = 1 THEN command_time END) AS failure_date,
LISTAGG(description, ';' || CHAR(10)) WITHIN GROUP (ORDER BY command_time) AS descrip
FROM ORS.Switch_Operations
GROUP BY switch_name) Failure
ON Failure.switch_name = Track_Event.element_name
This query was written to (attempt to) preserve the semantics of your original query. I'm not completely sure that's what you actually need but without sample starting data and desired results, I have no way to tell how else to improve this. For instance, I'm a little suspicious of the need of Switch_Operations in the outer query, and the fact that LISTAGG(...) is run over row where no_correspondence <> 1. I did change the ordering of LISTAGG(...), because the original column would not have done anything (because the order way the same as the grouping), so would not have been a stable sort.
Single-row subquery returns more than one row.
This error message is self descriptive.
Returned field can't have multiple values and your subquery returns more than one row.
In your complete query you specify fields to be returned. The last field expects single value from the subquery but gets multiple rows instead.
I have no clue about the data you're working with but either you have to ensure that subquery returns only one row or you have to redesign the wrapping query (possibly using joins when appropriate).

How to combine this query

In the query
cr is customers,
chh? ise customer_pays,
cari_kod is customer code,
cari_unvan1 is customer name
cha_tarihi is date of pay,
cha_meblag is pay amount
The purpose of query, the get the specisified list of customers and their last date for pay and amount of money...
Actually my manager needs more details but the query is very slow and that is why im using only 3 subquery.
The question is how to combine them ?
I have researched about Cte and "with clause" and "subquery in "where " but without luck.
Can anybody have a proposal.
Operating system is win2003 and sql server version is mssql 2005.
Regards
select cr.cari_kod,cr.cari_unvan1, cr.cari_temsilci_kodu,
(select top 1
chh1.cha_tarihi
from dbo.CARI_HESAP_HAREKETLERI chh1 where chh1.cha_kod=cr.cari_kod order by chh1.cha_RECno) as sontar,
(select top 1
chh2.cha_meblag
from dbo.CARI_HESAP_HAREKETLERI chh2 where chh2.cha_kod=cr.cari_kod order by chh2.cha_RECno) as sontutar
from dbo.CARI_HESAPLAR cr
where (select top 1
chh3.cha_tarihi
from dbo.CARI_HESAP_HAREKETLERI chh3 where chh3.cha_kod=cr.cari_kod order by chh3.cha_RECno) >'20130314'
and
cr.cari_bolge_kodu='322'
or
cr.cari_bolge_kodu='324'
order by cr.cari_kod
You will probably speed up the query by changing your last where clause to:
where (select top 1 chh3.cha_tarihi
from dbo.CARI_HESAP_HAREKETLERI chh3 where chh3.cha_kod=cr.cari_kod
order by chh3.cha_RECno
) >'20130314' and
cr.cari_bolge_kodu in ('322', '324')
order by cr.cari_kod
Assuming that you want both the date condition met and one of the two codes. Your original logic is the (date and code = 322) OR (code = 324).
The overall query can be improved by finding the record in the chh table and then just using that. For this, you want to use the window function row_number(). I think this is the query that you want:
select cari_kod, cari_unvan1, cari_temsilci_kodu,
cha_tarihi, cha_meblag
from (select cr.*, chh.*,
ROW_NUMBER() over (partition by chh.cha_kod order by chh.cha_recno) as seqnum
from dbo.CARI_HESAPLAR cr join
dbo.CARI_HESAP_HAREKETLERI chh
on chh.cha_kod=cr.cari_kod
where cr.cari_bolge_kodu in ('322', '324')
) t
where chh3.cha_tarihi > '20130314' and seqnum = 1
order by cr.cari_kod;
This version assumes the revised logic date/code logic.
The inner subquery select might generate an error if there are two columns with the same name in both tables. If so, then just list the columns instead of using *.