I'm very much new to SQL and I'm trying to use CROSS APPLY, something I know very little about.
I'm trying to pull two SUMs of items, grouped by an ID, from two different tables: one SUM of all items dispensed by a cartridge, and one SUM of all items refilled into a cartridge. The dispenses and refills are in separate tables. In Sample 1 you can see a piece of code that works for one of these two SUMs. Currently it's for the Dispensed SUM, but it also works if I change everything for the Refilled SUM. The point is that I can only do one SUM in this CROSS APPLY, regardless of which one of the two.
It goes wrong when I try to pull both SUMs in this one CROSS APPLY, probably because I don't really know what I'm doing. I try to do this with the code in Sample 2 (which is pretty much the same code).
Some extra context:
There are two IDs here that are important:
The CartridgeRefill.FK_CartridgeRegistration_Id (or ID) is the ID for a cartridge itself. The FK_CartridgeRefill_Id is the ID for a refill. A cartridge can go through multiple refills, and dispenses are registered by the refill they were dispensed from. That's why you can see the same ID multiple times in the output.
Sample 1:
SELECT CartridgeRefill.FK_CartridgeRegistration_Id AS ID, Sums.Dispensed
FROM CartridgeRefillItem
CROSS APPLY (
SELECT SUM(CartridgeDispenseAttempt.Amount) AS Dispensed
FROM CartridgeDispenseAttempt
WHERE CartridgeRefillItem.FK_CartridgeRefill_Id = CartridgeDispenseAttempt.FK_CartridgeRefill_Id
) AS Sums
JOIN CartridgeRefill ON CartridgeRefillItem.FK_CartridgeRefill_Id = CartridgeRefill.FK_CartridgeRefill_Id
Sample 2:
SELECT CartridgeRefill.FK_CartridgeRegistration_Id AS ID, Sums.Dispensed, Sums.Refilled
FROM CartridgeRefillItem
CROSS APPLY (
SELECT SUM(CartridgeDispenseAttempt.Amount) AS Dispensed
,SUM(CartridgeRefillItem.Amount) AS Refilled
FROM CartridgeDispenseAttempt
WHERE CartridgeRefillItem.FK_CartridgeRefill_Id = CartridgeDispenseAttempt.FK_CartridgeRefill_Id
) AS Sums
JOIN CartridgeRefill ON CartridgeRefillItem.FK_CartridgeRefill_Id = CartridgeRefill.FK_CartridgeRefill_Id
When I run sample 1 I get this output:
ID Dispensed
10 95
8 143
6 143
11 70
11 312
11 354
8 19
8 24
8 3
8 33
This output is correct, it displays the number of dispensed items next to the ID it belongs to.
This is the error I get when I run sample 2:
Msg 4101, Level 15, State 1, Line 15
Aggregates on the right side of an APPLY cannot reference columns from the left side.
But what I want to see is:
ID Dispensed Refilled (example)
10 95 143
8 143 12
6 143 etc...
11 70
11 312
11 354
8 19
8 24
8 3
8 33
I think it has something to do with CROSS APPLY running line by line? But again, I still don't exactly know what I'm doing yet. Any help would be really appreciated and please ask whatever you need to know :)
The error is fairly self-explanatory: an aggregate inside the CROSS APPLY cannot reference a column from the outer (left-side) table. You'll need to rewrite your query, either by adding an additional subquery to calculate the SUM or by using a GROUP BY clause. I've quickly sketched this:
SELECT CartridgeRefill.FK_CartridgeRegistration_Id AS ID, Sums.Dispensed, SUM(CartridgeRefillItem.Amount) AS Refilled
FROM CartridgeRefillItem
CROSS APPLY (
SELECT SUM(CartridgeDispenseAttempt.Amount) AS Dispensed
FROM CartridgeDispenseAttempt
WHERE CartridgeRefillItem.FK_CartridgeRefill_Id = CartridgeDispenseAttempt.FK_CartridgeRefill_Id
) AS Sums
JOIN CartridgeRefill ON CartridgeRefillItem.FK_CartridgeRefill_Id = CartridgeRefill.FK_CartridgeRefill_Id
-- Sums.Dispensed is not aggregated in the outer SELECT, so it has to be part of the GROUP BY
GROUP BY CartridgeRefill.FK_CartridgeRegistration_Id, Sums.Dispensed;
Hopefully this works.
You may not want aggregation at all. The number of rows is not being reduced, so this may be what you want:
SELECT cr.FK_CartridgeRegistration_Id AS ID,
d.Dispensed, cr.Amount AS Refilled
FROM CartridgeRefillItem cr CROSS APPLY
(SELECT SUM(cd.Amount) AS Dispensed
FROM CartridgeDispenseAttempt cd
WHERE cr.FK_CartridgeRefill_Id = cd.FK_CartridgeRefill_Id
) d;
I would expect that you want separate totals for each id. If so, then your sample results are not sensible because ids are repeated. But this would seem to do something useful:
select id, sum(refill_amount) as refill_amount,
sum(dispensed_amount) as dispensed_amount
from ((select cr.FK_CartridgeRegistration_Id as id,
cr.Amount as refill_amount,
0 as dispensed_amount
from CartridgeRefillItem cr
) union all
(select cd.FK_CartridgeRegistration_Id as id,
0, cd.Amount
from CartridgeDispenseAttempt cd
)
) c
group by id
I have a query that is used to display information in a queue, and part of that information is the number of child entities (packages and labs) that belong to the parent entity (change). However, instead of showing the individual counts of each type of child, the counts multiply with one another.
In the case below, there are supposed to be 3 labs and 18 packages, but they multiply with one another and the output is 54 of each.
Below is the offending portion of the query.
SELECT cef.ChangeId, COUNT(pac.PackageId) AS 'Packages', COUNT(lab.LabRequestId) AS 'Labs'
FROM dbo.ChangeEvaluationForm cef
LEFT JOIN dbo.Lab
ON cef.ChangeId = Lab.ChangeId
LEFT JOIN dbo.Package pac
ON (cef.ChangeId = pac.ChangeId AND pac.PackageStatus != 6 AND pac.PackageStatus !=7)
WHERE cef.ChangeId = 255
GROUP BY cef.ChangeId
I feel like this is obvious but it's not occurring to me how to fix it so the two counts are independent of one another like to me they should be. There doesn't seem to be a scenario like this in any of my research either. Can anyone guide me in the right direction?
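Each LEFT JOIN multiplies the rows produced by the previous one, so joining two independent child tables to the same parent effectively gives you a cross join between the children. Count each child table in its own OUTER APPLY instead: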
SELECT cef.ChangeId, p.Packages, l.Labs
FROM dbo.ChangeEvaluationForm cef
OUTER APPLY(
SELECT COUNT(*) as Labs
FROM dbo.Lab
WHERE cef.ChangeId = Lab.ChangeId
) l
OUTER APPLY(
SELECT COUNT(*) AS Packages
FROM dbo.Package pac
WHERE (cef.ChangeId = pac.ChangeId AND pac.PackageStatus != 6 AND pac.PackageStatus !=7)
) p
WHERE cef.ChangeId = 255
GROUP BY is not needed here, because each OUTER APPLY already returns exactly one row per ChangeId.
From your question it's difficult to tell what result you expect from your query, so I presume you want the following result:
+----------+----------+------+
| ChangeId | Packages | Labs |
+----------+----------+------+
| 255 | 18 | 3 |
+----------+----------+------+
Try the query below if that is the result you are looking for.
SELECT cef.ChangeId, ISNULL(pac.PacCount, 0) AS 'Packages', ISNULL(Lab.LabCount, 0) AS 'Labs'
FROM dbo.ChangeEvaluationForm cef
LEFT JOIN (SELECT Lab.ChangeId, COUNT(*) LabCount FROM dbo.Lab GROUP BY Lab.ChangeId) Lab
ON cef.ChangeId = Lab.ChangeId
LEFT JOIN (SELECT pac.ChangeId, COUNT(*) PacCount FROM dbo.Package pac WHERE pac.PackageStatus != 6 AND pac.PackageStatus !=7 GROUP BY pac.ChangeId) pac
ON cef.ChangeId = pac.ChangeId
WHERE cef.ChangeId = 255
Query Explanation:
In your query the aggregation happens after both joins, so each COUNT is taken over the Cartesian product of labs and packages, which is why you got 54 (3 x 18).
In this query I group by 'ChangeId' and compute the aggregates before joining the tables, so the 3 labs and 18 packages are counted before the join.
You will also notice that I have moved the PackageStatus filter into the pac subquery, before the GROUP BY, so unwanted records won't affect the count.
You start with a particular ChangeId from the dbo.ChangeEvaluationForm table (ChangeId = 255 from your example), then join to the dbo.Lab table. This join makes your result go from 1 row to 3, considering there are 3 Labs with ChangeId = 255. Your problem is on the next join, you are joining all 3 resulting rows from the previous join with the dbo.Package table, which has 18 rows for ChangeId = 255. The resulting count for columns pac.PackageId and lab.LabRequestId will then be 3 x 18 = 54.
To get what you want, there are 2 easy solutions:
Use COUNT(DISTINCT ...) instead of COUNT. This counts only the different values of pac.PackageId and lab.LabRequestId and ignores the repeated ones.
Split the joins into 2 subqueries, aggregate in each, and join their results (by ChangeId).
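The row multiplication and both fixes can be reproduced on a small scale. The following is only a sketch: it uses Python's sqlite3 module with an in-memory database instead of SQL Server, the table layouts are assumed from the column names in the question, and the rows are made up to match the 3 labs and 18 packages described.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumed minimal schema; only the columns used by the query are modelled.
conn.executescript("""
    CREATE TABLE ChangeEvaluationForm (ChangeId INTEGER);
    CREATE TABLE Lab (LabRequestId INTEGER, ChangeId INTEGER);
    CREATE TABLE Package (PackageId INTEGER, ChangeId INTEGER, PackageStatus INTEGER);
    INSERT INTO ChangeEvaluationForm VALUES (255);
""")
# 3 labs and 18 packages for ChangeId 255 (made-up rows)
conn.executemany("INSERT INTO Lab VALUES (?, 255)", [(i,) for i in range(1, 4)])
conn.executemany("INSERT INTO Package VALUES (?, 255, 1)", [(i,) for i in range(1, 19)])

# Both child tables joined to the parent before aggregating
joins = """
    FROM ChangeEvaluationForm cef
    LEFT JOIN Lab lab ON cef.ChangeId = lab.ChangeId
    LEFT JOIN Package pac ON cef.ChangeId = pac.ChangeId
                         AND pac.PackageStatus NOT IN (6, 7)
    WHERE cef.ChangeId = 255
    GROUP BY cef.ChangeId
"""
# Plain COUNT sees every lab/package combination
naive = conn.execute(
    "SELECT COUNT(pac.PackageId), COUNT(lab.LabRequestId)" + joins
).fetchone()
# Solution 1: COUNT(DISTINCT ...) ignores the repeated ids
distinct = conn.execute(
    "SELECT COUNT(DISTINCT pac.PackageId), COUNT(DISTINCT lab.LabRequestId)" + joins
).fetchone()
# Solution 2: aggregate each child table first, then join the one-row results
pre_agg = conn.execute("""
    SELECT IFNULL(p.n, 0), IFNULL(l.n, 0)
    FROM ChangeEvaluationForm cef
    LEFT JOIN (SELECT ChangeId, COUNT(*) AS n FROM Lab GROUP BY ChangeId) l
           ON l.ChangeId = cef.ChangeId
    LEFT JOIN (SELECT ChangeId, COUNT(*) AS n FROM Package
               WHERE PackageStatus NOT IN (6, 7) GROUP BY ChangeId) p
           ON p.ChangeId = cef.ChangeId
    WHERE cef.ChangeId = 255
""").fetchone()

print(naive, distinct, pre_agg)
```

COUNT(DISTINCT ...) only works here because the ids are unique within each child table; the pre-aggregated version does not depend on that.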
Two tables:
Parts Table:
Part_Number Load_Date TQTY
m-123 19940102 32
1234Cf 20010809 3
wf9-2 20160421 14
Locations Table:
PartNo Condition Location QTY
m-123 U A02 2
1234Cf S A02 3
m-123 U B01 1
wf9-2 S A06 7
m-123 S A18 29
wf9-2 U F16 7
Result:
Part_Number  Load_Date  TQTY  U_LOC    UQTY  S_LOC  SQTY
m-123        19940102   32    A02,B01  3     A18    29
1234Cf       20010809   3                    A02    3
wf9-2        20160421   14    F16      7     A06    7
I am having trouble finding a solution to this with my current DB2 version. I am not completely sure how to find the version, but it is running on an AS400 system, and it seems the version of DB2 is tied to the OS version. The box is running: Operating system: i5/OS, Version: V5R4M0
(I tried some commands to get the DB2 version using these suggestions Here but none of them worked, like most stated).
In regards to concatenating multiple rows of column data into one row I have come across many articles stating to use XMLAGG or xmlserialize, Here and, Here but I get an error stating the command is not recognized.
Not sure where to go from here, as there seem to be solutions, but I can't get those already suggested functions to work.
EDIT:
I used the accepted answer and its explanation, as well as the example HERE, to get a basic idea of recursion with a simple example. It was the "SELECT rownumber() over(partition by category)" statements HERE that really helped pull it all together, once I understood that statement of course.
I also learned to make sure the data used in the recursion is as narrowed down as possible and then joined up with extra data later. This makes for exponentially faster results. <-- This seems pretty obvious, but when trying to figure all of this out, it wasn't obvious and my query was pretty slow. Once I understood what was actually happening better it was easier to make adjustments for really fast results.
This is rather complicated, so I will show all my work:
Table definitions
create table parts
(part_number Varchar(64),
load_date Date,
total_qty Dec(5,0));
create table locations
(part_number Varchar(64),
condition Char(1),
location Char(3),
qty Dec(5,0));
insert into parts
values ('m-123', '1994-01-02', 32),
('1234Cf', '2001-08-09', 3),
('wf9-2', '2016-04-21', 14);
insert into locations
values ('m-123', 'U', 'A02', 2),
('1234Cf', 'S', 'A02', 3),
('m-123', 'U', 'B01', 1),
('wf9-2', 'S', 'A06', 7),
('m-123', 'S', 'A18', 29),
('wf9-2', 'U', 'F16', 7);
The query:
with -- CTE's
-- This collects locations into a comma separated list
tmp (part_number, condition, location, csv, level) as (
select part_number, condition, min(location),
cast(min(location) as varchar(128)), 1
from locations
group by part_number, condition
union all
select a.part_number, a.condition, b.location,
a.csv || ',' || b.location, a.level + 1
from tmp a
join locations b using (part_number, condition)
where a.csv not like '%' || b.location || '%'
and b.location > a.location),
-- This chooses the correct csv list, and adds quantity for the condition
tmp2 (part_number, condition, csv, qty) as (
select t.part_number, t.condition, t.csv,
(select sum(qty) qty
from locations
where part_number = t.part_number
and condition = t.condition)
from tmp t
where level = (select max(level)
from tmp
where part_number = t.part_number
and condition = t.condition))
-- This is the final select that combines the parts file with
-- the second stage CTE and arranges things horizontally by condition
select p.part_number, p.load_date,
(select sum(qty)
from locations
where part_number = p.part_number) as total_qty,
coalesce(u.csv, '') as u_loc,
coalesce(u.qty, 0) as uqty,
coalesce(s.csv, '') as s_loc,
coalesce(s.qty, 0) as sqty
from parts p
left outer join tmp2 u
on u.part_number = p.part_number and u.condition = 'U'
left outer join tmp2 s
on s.part_number = p.part_number and s.condition = 'S'
order by p.load_date;
EDIT: I have had to add some extra bits in here to support more than two locations for a part/condition, and I have made the column naming in the CTEs more consistent.
Ok, so let me explain this a bit. There are 3 parts to this query: 2 CTEs and the main select. You can see the three parts are separated by comments.
The first CTE is a recursive CTE. Its purpose is to produce the comma separated location list. You should be able to run the select by itself to see just what it does. tmp is the table name; part_number, condition, location, csv, and level are the column names. A recursive CTE needs a SELECT to prime the CTE and a UNION ALL with a SELECT that fills in the next details. In this case the priming SELECT retrieves a part number, a condition, and the first location (alphabetically) for that combination. level is set to 1. If you run just the priming select, you will get:
part_number  condition  location  csv  level
-----------  ---------  --------  ---  -----
1234Cf       S          A02       A02  1
m-123        S          A18       A18  1
m-123        U          A02       A02  1
wf9-2        U          F16       F16  1
wf9-2        S          A06       A06  1
Note one line per part/condition. The remainder of the recursive CTE will fill in the remaining locations in csv, but it will actually add additional records so we need to filter the results here and later. So records are processed as they are added. The first rows listed above are joined with the location file on part_number and condition. Note in the priming select I have a cast of the second min(location) to a varchar(128). This leaves room for the CSV column to expand. Without this, it will still expand, but not enough to hold more than 2 locations.
The second select in the recursive CTE concatenates a comma and the next location to the end of csv. The specific bit that does this is a.csv || ',' || b.location. It also increments the level column, which helps us keep track of where we are in the recursion. Eventually, the row with the highest level is the one we want to use.
We also need a way to end the recursion, and some filters to reduce the number of rows added to the temporary result set. If we have 2 locations, A02 and B02, and leave the recursion unchecked, we will get the following rows: A02, A02,A02, A02,B02, A02,A02,A02, A02,B02,A02, A02,A02,B02, A02,B02,B02, ... ad infinitum. The anti-duplication filter where a.csv not like '%' || b.location || '%' is sufficient to end the recursion and minimize rows for two locations: for A02 and B02 we now get only the rows A02 and A02,B02. None of the rows with duplicate locations from the unchecked example are returned.
Adding a third location C02 yields, with the anti-duplication filter, the following rows: A02, A02,B02, A02,C02, A02,B02,C02, A02,C02,B02. No duplicates here, but we do have redundant rows, and it gets worse as you add locations. We need a way to detect these redundant rows. Since we start with the lowest location, we can make sure that each location added to csv is greater than the previously added one. To do that, all we need is a column in the result set that records which location was added last (we could interrogate csv, but that is harder). This is why tmp has the location column. Then we can write the filter b.location > a.location. In the 3 location example above, this filter prevents row A02,C02,B02, leaving just a single row with all three locations.
Adding more than three locations to the locations file will cause the number of rows to expand even more in TMP, but for each part and condition, there will only be one row with all locations, and it will contain all locations in ascending order.
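To see which rows the two filters leave behind, the first CTE can be tried on a toy table. This is a sketch only: it runs on SQLite through Python's sqlite3 module rather than on DB2 for i (so it says WITH RECURSIVE and spells out the join condition instead of USING), and the part p1 with locations A02, B02 and C02 is made-up data matching the three location example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Made-up data: one part/condition with three locations
conn.executescript("""
    create table locations (part_number text, condition text, location text, qty integer);
    insert into locations values
        ('p1', 'U', 'A02', 2),
        ('p1', 'U', 'B02', 1),
        ('p1', 'U', 'C02', 4);
""")

rows = conn.execute("""
    with recursive tmp (part_number, condition, location, csv, level) as (
        -- priming select: lowest location per part/condition
        select part_number, condition, min(location), min(location), 1
        from locations
        group by part_number, condition
        union all
        -- recursive select: append the next location to csv
        select a.part_number, a.condition, b.location,
               a.csv || ',' || b.location, a.level + 1
        from tmp a
        join locations b
          on b.part_number = a.part_number
         and b.condition = a.condition
        where a.csv not like '%' || b.location || '%'  -- anti-duplication filter
          and b.location > a.location                  -- ascending order only
    )
    select csv, level from tmp order by level, csv
""").fetchall()

csv_rows = [r[0] for r in rows]
# The row with the highest level holds the complete list
full_list = max(rows, key=lambda r: r[1])[0]
print(csv_rows, full_list)
```

The intermediate rows A02, A02,B02 and A02,C02 are still produced; only the row with the highest level is complete, which is why a later step has to pick it out.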
The second CTE does two things. First, it filters TMP to drop all but the rows containing all locations for a given part/condition. Second, it accumulates the total quantity for each part/condition.
The bit that performs the filtering is in the where clause:
where level = (select max(level)
from tmp
where part_number = t.part_number
and condition = t.condition))
Pretty straightforward. The bit that accumulates the total quantity for a part/condition is also an easy to understand sub-query:
(select sum(qty) qty
from locations
where part_number = t.part_number
and condition = t.condition)
The final piece of this monster query is the main select. It joins the parts file with the results of the second CTE to form the ultimate result set:
select p.part_number, p.load_date,
(select sum(qty) from locations where part_number = p.part_number) as total_qty,
coalesce(u.csv, '') as u_loc, coalesce(u.qty, 0) as uqty,
coalesce(s.csv, '') as s_loc, coalesce(s.qty, 0) as sqty
from parts p
left outer join tmp2 u
on u.part_number = p.part_number and u.condition = 'U'
left outer join tmp2 s
on s.part_number = p.part_number and s.condition = 'S'
order by p.load_date
Bits of note are the subquery to retrieve the total quantity from the locations table. You could use the tqty field in parts, but that can get out of sync with the actual quantities in the locations table. In addition there are two left outer joins with tmp2, one for condition U, and another for condition S. These construct the horizontal array of Location/Quantity in the result row. The last thing is the coalesce functions. These give null values (when a result from an outer join is missing) a default value.
End of EDIT
The final result is:
part_number  load_date   tqty  u_loc    uqty  s_loc  sqty
-----------  ----------  ----  -------  ----  -----  ----
m-123        1994-01-02  32    A02,B01  3     A18    29
1234Cf       2001-08-09  3              0     A02    3
wf9-2        2016-04-21  14    F16      7     A06    7
Note XMLAGG and XMLSERIALIZE became available at DB2 for i v7.1 and LISTAGG became available at DB2 for i v7.2. Most recent version as of 8/9/2017 is v7.3. As you are on v5r4, it is likely you will need not only a software, but also a hardware upgrade to get current.
No idea what the rules are for UQTY, S_LOC, SQTY but here is the column you asked about ---
SELECT
P.Part_Number,
P.Load_Date,
P.TQTY,
LISTAGG(L.Location, ', ') WITHIN GROUP (ORDER BY L.Location) AS U_LOC
FROM "Parts Table" AS P
LEFT JOIN "Locations Table" AS L ON P.Part_Number = L.Part_Number
GROUP BY P.Part_Number, P.Load_Date, P.TQTY
I'm after a bit of guidance here. I have a table that has all the data I need all mixed together in a single column (attributes) like so:
Device Index Attribute index Attributes
5 59 WS020121
5 83 9C-B6-54-A0-41-40
5 90 GROUP\darkwahm
6 59 WS020122
6 83 9D-B8-54-A0-50-40
6 90 GROUP\darkperm
What I am trying to do is split out the data into multiple columns so it will display as:
Device Mac Address User
WS020121 9C-B6-54-A0-41-40 GROUP\darkwahm
WS020122 9D-B8-54-A0-50-40 GROUP\darkperm
After some research I was advised to use the XML path to get these results so I created a query which is:
SELECT cast (av1.attributeValue as varchar(50)) + ','
--cast( av1.attributeValue as varchar(50)) + ','
FROM [dbo].[DeviceAttributes] Av1
WHERE av1.AttributeIndex=59 or av1.AttributeIndex=83 or av1.AttributeIndex=90
FOR XML PATH ('')
This works in the sense that all my data is together, but it's all in a single row like this:
WS022743,B0-5A-DA-B4-51-01;60-6D-C7-34-86-05,GROUP\darkwahm,WS022871,D0-27-88-92-9B-8A,GROUP\securitypc,ABSE-PEARSOND,00-05-9A-3C-78-00;68-F7-28-92-0E-7A,GROUP\SlavinM
Just wondering how I need to tweak the query to get it split across multiple columns and rows like how I mentioned above.
According to your query, the attribute indexes 59, 83 and 90 are constant and identify the device name, MAC address and user rows. If so, you can try joining the table to itself once per attribute:
SELECT t1.Attributes as device, t2.Attributes as Mac_Address , t3.Attributes as "user" from
(SELECT Device_Index, Attributes FROM [dbo].[DeviceAttributes] WHERE Attribute_index= 59) t1
INNER JOIN
(SELECT Device_Index, Attributes FROM [dbo].[DeviceAttributes] WHERE Attribute_index = 83) t2
ON t1.Device_Index = t2.Device_Index
INNER JOIN
(SELECT Device_Index, Attributes FROM [dbo].[DeviceAttributes] WHERE Attribute_index = 90) t3
ON t1.Device_Index = t3.Device_Index
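The self-join above can be checked against the sample rows from the question. This is a rough sketch on SQLite via Python's sqlite3 module, not SQL Server; the column names Device_Index, Attribute_index and Attributes are taken from the answer's query and are an assumption about the real table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE DeviceAttributes "
    "(Device_Index INTEGER, Attribute_index INTEGER, Attributes TEXT)"
)
# Sample rows from the question; raw strings keep the backslashes literal
conn.executemany(
    "INSERT INTO DeviceAttributes VALUES (?, ?, ?)",
    [
        (5, 59, "WS020121"),
        (5, 83, "9C-B6-54-A0-41-40"),
        (5, 90, r"GROUP\darkwahm"),
        (6, 59, "WS020122"),
        (6, 83, "9D-B8-54-A0-50-40"),
        (6, 90, r"GROUP\darkperm"),
    ],
)

# One derived table per attribute, joined on the device index
devices = conn.execute("""
    SELECT t1.Attributes AS device, t2.Attributes AS mac_address, t3.Attributes AS user_name
    FROM (SELECT Device_Index, Attributes FROM DeviceAttributes WHERE Attribute_index = 59) t1
    INNER JOIN
         (SELECT Device_Index, Attributes FROM DeviceAttributes WHERE Attribute_index = 83) t2
      ON t1.Device_Index = t2.Device_Index
    INNER JOIN
         (SELECT Device_Index, Attributes FROM DeviceAttributes WHERE Attribute_index = 90) t3
      ON t1.Device_Index = t3.Device_Index
    ORDER BY t1.Device_Index
""").fetchall()

for row in devices:
    print(row)
```

Because these are inner joins, a device that is missing any one of the three attributes drops out of the result entirely; LEFT JOINs from t1 would keep it with NULLs.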
I have two tables, and I join them together on the date column. This works great except when one table is missing the date.
I.e., the sales table has no row for 10.10.2016. I would still love that line to appear in the result, since this is a day where I want to show that there has been no activity.
This is for a bar: I have one table where they register the count on the beer tap, and one who keeps track of sold ones.
If they are closed one day, they don't actually sell anything, but they still want the staff to register the number of tapped beers, just in case.
The data from 10.10.2016 would be something like this then:
Table 1 (sales, not open 10.10 = no data stored at all)
Date Sold
10.08 22
10.09 31
10.11 54
Table 2 (Tapped, they count every day = have data 10.10)
Date Tapped
10.08 23
10.09 31
10.10 0
10.11 54
I want the result to show it like this:
Date Tapped Sold Diff
10.08 23 22 1
10.09 31 31 0
10.10 0 0 0
10.11 54 54 0
But I cannot get this to work, because when I join in table two, it can't connect the "sold" and "tapped" ones from 10.10 since I don't have a way to match them.
Is there any way of doing this?
CREATE TABLE #A (DATE NUMERIC(22,6), SOLD INT)
INSERT INTO #A VALUES
(10.08,22),
(10.09,31),
(10.11,54)
CREATE TABLE #B (DATE NUMERIC(22,6), TAPPED INT)
INSERT INTO #B VALUES
(10.08,23),
(10.09,31),
(10.10,0),
(10.11,54)
-- Start from the tapped table (#B) so every counted day appears, even one with no sales
SELECT T.DATE, T.TAPPED, ISNULL(S.SOLD,0) AS SOLD, T.TAPPED-ISNULL(S.SOLD,0) AS DIFFERENCE
FROM #B T LEFT JOIN #A S ON T.DATE=S.DATE
Use a left or right join.
The sample below shows how to use RIGHT JOIN.
SELECT t2.Date,t2.Tapped,ISNULL(t1.sold,0) sold,t2.Tapped-ISNULL(t1.sold,0) as Diff
FROM Table1 t1
RIGHT JOIN Table2 t2
ON t1.Date=t2.Date
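A simple statement (IFNULL is the MySQL/SQLite spelling; SQL Server uses ISNULL). The NULL check is applied to sold before subtracting, otherwise diff would become 0 on every day without sales:
SELECT tapped.date as date, IFNULL(tapped.tapped,0) as tapped, IFNULL(sales.sold,0) as sold, IFNULL(tapped.tapped,0) - IFNULL(sales.sold,0) as diff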
FROM
tapped
LEFT OUTER JOIN sales ON sales.date = tapped.date
ORDER BY
tapped.date ASC
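The behaviour of the outer join on the missing day can be checked with the sample data from the question. A small sketch, assuming SQLite (through Python's sqlite3 module) and dates stored as plain text; the table names sales and tapped follow the last answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Sample data from the question; sales has no row for 10.10
conn.executescript("""
    CREATE TABLE sales (date TEXT, sold INTEGER);
    CREATE TABLE tapped (date TEXT, tapped INTEGER);
    INSERT INTO sales VALUES ('10.08', 22), ('10.09', 31), ('10.11', 54);
    INSERT INTO tapped VALUES ('10.08', 23), ('10.09', 31), ('10.10', 0), ('10.11', 54);
""")

# tapped is the driving table, so every counted day is kept;
# a missing sales row shows up as NULL and is turned into 0
rows = conn.execute("""
    SELECT tapped.date,
           tapped.tapped,
           IFNULL(sales.sold, 0) AS sold,
           tapped.tapped - IFNULL(sales.sold, 0) AS diff
    FROM tapped
    LEFT OUTER JOIN sales ON sales.date = tapped.date
    ORDER BY tapped.date
""").fetchall()

for row in rows:
    print(row)
```

The table that has a row for every day has to be on the preserved side of the join (left for LEFT JOIN, right for RIGHT JOIN); otherwise 10.10 disappears again.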