How do I exclude entries from a recursive CTE? - sql

How can I exclude entries from a recursive CTE with SQLite?
CREATE TABLE GroupMembers (
group_id VARCHAR,
member_id VARCHAR
);
INSERT INTO GroupMembers(group_id, member_id) VALUES
('1', '10'),
('1', '20'),
('1', '30'),
('1', '-50'),
('2', '30'),
('2', '40'),
('3', '1'),
('3', '50'),
('4', '-10'),
('10', '50'),
('10', '60');
I want a query that will give me the list of members (recursively) in the group. However, a member with the first character being '-' means that the id that comes after the minus is NOT in the group.
For example, the members of '1' are '10', '20', '30', and '-50'. '10', however, is a group so we need to add its children '50' and '60'. However, '-50' is already a member so we cannot include '50'. In conclusion the members of '1' are '10', '20', '30', '-50', and '60'.
It seems like this query should work:
WITH RECURSIVE members(id) AS (
VALUES('1')
UNION
SELECT gm.member_id
FROM members m
INNER JOIN GroupMembers gm ON gm.group_id=m.id
LEFT OUTER JOIN members e ON '-' || gm.member_id=e.id
WHERE e.id IS NULL
)
SELECT id FROM members;
But I get the error: multiple references to recursive table: members
How can I fix/rewrite this to do what I want?
Note: it doesn't matter whether the '-50' entry is returned in the result set.

I don't have a SQLite available for testing, but assuming the -50 also means that 50 should be excluded as well, I think you are looking for this:
WITH RECURSIVE members(id) AS (
VALUES('1')
UNION
SELECT gm.member_id
FROM GroupMembers gm
JOIN members m ON gm.group_id=m.id
WHERE member_id not like '-%'
AND not exists (select 1
from groupMembers g2
where g2.member_id = '-'||gm.member_id)
)
SELECT id
FROM members;
(The above works in Postgres)
You usually select from the base table in the recursive part and then join back to the actual CTE. The filtering of unwanted rows is then done with a regular WHERE clause, not by joining the CTE again. A recursive CTE is defined to terminate when the JOIN finds no more rows.
SQLFiddle (Postgres): http://sqlfiddle.com/#!15/04405/1
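Since the query above was only tried on Postgres, here is a minimal sketch for checking it on SQLite through Python's standard sqlite3 module. The function name members_of and the in-memory database are assumptions made for illustration; the table and data are the ones from the question.

```python
import sqlite3

# Sample data copied from the question.
SAMPLE_ROWS = [
    ("1", "10"), ("1", "20"), ("1", "30"), ("1", "-50"),
    ("2", "30"), ("2", "40"),
    ("3", "1"), ("3", "50"),
    ("4", "-10"),
    ("10", "50"), ("10", "60"),
]

# The CTE is referenced only once inside its own definition; the
# exclusion test looks at the base table instead of the CTE.
QUERY = """
WITH RECURSIVE members(id) AS (
    VALUES(?)
    UNION
    SELECT gm.member_id
    FROM GroupMembers gm
    JOIN members m ON gm.group_id = m.id
    WHERE gm.member_id NOT LIKE '-%'
      AND NOT EXISTS (SELECT 1
                      FROM GroupMembers g2
                      WHERE g2.member_id = '-' || gm.member_id)
)
SELECT id FROM members ORDER BY id;
"""


def members_of(group_id):
    """Return the ids reached from group_id (hypothetical helper)."""
    con = sqlite3.connect(":memory:")
    try:
        con.execute("CREATE TABLE GroupMembers (group_id VARCHAR, member_id VARCHAR)")
        con.executemany("INSERT INTO GroupMembers VALUES (?, ?)", SAMPLE_ROWS)
        return [row[0] for row in con.execute(QUERY, (group_id,))]
    finally:
        con.close()


if __name__ == "__main__":
    print(members_of("1"))
```

With the question's data this returns '1', '20' and '30'. The NOT EXISTS test is global, so the '-10' entry in group '4' removes '10' (and with it '60') from group '1' as well, which is not what the question asks for.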
Edit after the requirements have changed (have been detailed):
As you need to exclude the rows based on their position (a detail that you didn't provide in your original question), the filter can only be done outside of the CTE. Again, I can't test this with SQLite, only with Postgres:
WITH RECURSIVE members(id, level) AS (
VALUES('4', 1)
UNION
SELECT gm.member_id, m.level + 1
FROM GroupMembers gm
JOIN members m ON gm.group_id=m.id
)
SELECT m.id, m.level
FROM members m
where id not like '-%'
and not exists (select 1
from members m2
where m2.level < m.level
and m2.id = '-'||m.id);
Updated SQLFiddle: http://sqlfiddle.com/#!15/ec0f9/3
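The level-based query can be tried on SQLite with a small sketch like the following, again using Python's sqlite3 module. The helper name members_with_level is hypothetical, and the starting group is passed as a parameter instead of the hard-coded '4'.

```python
import sqlite3

# Sample data copied from the question.
SAMPLE_ROWS = [
    ("1", "10"), ("1", "20"), ("1", "30"), ("1", "-50"),
    ("2", "30"), ("2", "40"),
    ("3", "1"), ("3", "50"),
    ("4", "-10"),
    ("10", "50"), ("10", "60"),
]

# The recursive part collects every reachable id with its depth.
# The exclusion is applied afterwards, in the outer query, where
# referencing the CTE more than once is allowed.
QUERY = """
WITH RECURSIVE members(id, level) AS (
    VALUES(?, 1)
    UNION
    SELECT gm.member_id, m.level + 1
    FROM GroupMembers gm
    JOIN members m ON gm.group_id = m.id
)
SELECT m.id, m.level
FROM members m
WHERE m.id NOT LIKE '-%'
  AND NOT EXISTS (SELECT 1
                  FROM members m2
                  WHERE m2.level < m.level
                    AND m2.id = '-' || m.id)
ORDER BY m.level, m.id;
"""


def members_with_level(group_id):
    """Return (id, level) pairs for group_id (hypothetical helper)."""
    con = sqlite3.connect(":memory:")
    try:
        con.execute("CREATE TABLE GroupMembers (group_id VARCHAR, member_id VARCHAR)")
        con.executemany("INSERT INTO GroupMembers VALUES (?, ?)", SAMPLE_ROWS)
        return con.execute(QUERY, (group_id,)).fetchall()
    finally:
        con.close()


if __name__ == "__main__":
    for row in members_with_level("1"):
        print(row)
```

For group '1' this yields the starting row '1' plus '10', '20', '30' and '60'. '50' is dropped because '-50' appears at a shallower level. Add a condition such as level > 1 to the outer query if the starting group itself should not be listed.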

Related

Retrieve values from rows other than the previous in a recursive CTE

I am running a recursive CTE in order to calculate the average weighted cost of a product for x given warehouses. In this table, we can see a very simplified version of what the original data looks like:
Simplified original data
The first two rows are the initial values for the warehouses. That is why they have "N/A" in the Movement column.
The AVG_Weighted_Price column is 0 for the remaining rows because that is the value I wish to calculate with the recursive cte.
I have created a recursive cte which intends to calculate the AVG_Weighted_Price column and it does so with the following simplified (and frankly wrong) formula -> (b.Movement * a.AVG_Weighted_Price)/b.Total_Quantity (Having a as the previous row and b as the row being calculated).
In the table, it is clear this will not work because I have to retrieve the most recent value from the same Warehouse, which is not always the previous row. This could be solved simply by using the first two values as anchors and running the recursive cte for the A warehouse parent first, and later for the B warehouse parent.
However, because the AVG_Weighted_Price in one warehouse will affect the other, I have to run the recursion using the field "ID" as the order since it represents the order in which the movements (rows) happened. Nonetheless, the initial values (row 1 and 2) will pass with their original values and will not undergo any calculations (row 1 because it is the anchor and row 2 because it will be hardcoded to do so).
If I could run the recursion in the order of the warehouses and not necessarily in the order of the ID, the following query would be correct (#Sample_Table is the table shown in the picture above):
DROP TABLE IF EXISTS #RS
;WITH cte
AS
(
SELECT *
FROM #Sample_Table
WHERE Warehouse_Order = 1
UNION ALL
SELECT b.Warehouse
,b.Movement
,b.Total_Quantity
,CASE WHEN b.Warehouse_Order = 1 THEN b.AVG_Weighted_Price
ELSE (b.Movement * a.AVG_Weighted_Price) / b.Total_Quantity END AS AVG_Weighted_Price
,b.ID
,b.Warehouse_Order
FROM cte a
INNER JOIN #Sample_Table b
ON b.Warehouse = a.Warehouse AND b.Warehouse_Order = a.Warehouse_Order + 1
)
SELECT *
INTO #RS
FROM cte
This would be the result of this query:
Result from first query
This, however, is incorrect because, as I said before, the recursion must run in the same order as the ID.
For this reason, I tried to apply a LAG that retrieves the most recent value from the same warehouse. However, as far as I am aware, LAG doesn't work on recursive cte's and it always returns a NULL value. Here is the code I tried to use (note the changes in the Anchor WHERE clause and in the JOIN conditions, as well as the LAG present in the calculated field):
DROP TABLE IF EXISTS #RS
;WITH cte
AS
(
SELECT *
FROM #Sample_Table
WHERE ID = 1
UNION ALL
SELECT b.Warehouse
,b.Movement
,b.Total_Quantity
,CASE WHEN b.Warehouse_Order =1 THEN b.AVG_Weighted_Price
ELSE (b.Movement * LAG(b.AVG_Weighted_Price) OVER (PARTITION BY b.Warehouse ORDER BY b.ID)) / b.Total_Quantity END AS AVG_Weighted_Price
,b.ID
,b.Warehouse_Order
FROM cte a
INNER JOIN #Sample_Table b
ON b.ID = a.ID + 1
)
SELECT *
INTO #RS
FROM cte
The result of this query is as follows:
Result from second query
I understand why the LAG returns the NULL values and why we cannot use it here, but I honestly can't seem to find another solution.
The original data has tens of centers and millions of rows, so a WHILE loop to treat these cases one by one would be too time-consuming (already tested).
If anyone could help me solve this issue, I would be forever thankful, as I have been banging my head on this problem for quite some time now. Thank you for your patience, and sorry if I was, at any time, confusing.
Edit: I created an Excel in order to better clarify the issue. I hope this helps:
This looks like it is a fairly simple problem. However, your question is difficult to understand.
Here is what I would want:
a SQL script to recreate a representative sample of your data. like this
declare @test table
(Warehouse varchar(20)
,Movement decimal (19,2)
, Total_Quantity decimal (19,2)
, Avg_Weighted_Price decimal (19,2)
, ID int
, Warehouse_Order int)
insert @test
values
( 'A', null, '100', '10', '1', '1')
,( 'B', null, '30', '5', '2', '1')
,( 'A', '10', '110', '0', '3', '2')
,( 'A', '-5', '105', '0', '4', '3')
,( 'B', '30', '60', '0', '5', '2')
,( 'B', '5', '65', '0', '6', '3')
,( 'B', '-25', '40', '0', '7', '4')
,( 'A', '10', '115', '0', '8', '4')
,( 'B', '10', '50', '0', '9', '5')
,( 'A', '10', '125', '0', '10', '5')
SELECT * FROM @test
Description of your data
So far, as I understand it, the starting value or opening balance of inventory in a warehouse can be seen in rows that have a null value for Movement.
Movement: +ve values are additions/receipts of items; -ve values are reductions.
Total_Quantity shows the current position (opening balance + movement).
What you are trying to do with this data.
As I understand it, update each row with the computed Avg_Weighted_Price.
How is your average weighted price determined?
I think I understand what you are trying to do with LAG and am fairly certain that your approach is wrong. (First clue: there is no cost associated with each receipt.)
Try your formula in a simpler way: use Excel or pen and paper and manually calculate the Avg_Weighted_Price. That might clarify things a bit.
Try explaining the purpose of this exercise. Why do you need this Avg_Weighted_Price on every row?
When the problem is well defined, I expect the solution will be fairly simple.
Edit 1, responding to the Excel sample:
Lag will work only once for you, as you can see here:
SELECT *
, lag(Avg_Weighted_Price, 1, 0) over (partition by Warehouse order by Warehouse_Order, id) as Lagprice
, case when Movement is null then Avg_Weighted_Price
when Total_Quantity <> 0
then Movement * lag(Avg_Weighted_Price, 1, 0) over (partition by Warehouse order by Warehouse_Order, id)/Total_Quantity
else 0 end as ComputedAvgPrice
From @test order by ID
Notice that the Avg_Weighted_Price falls from 10 to 0.909.
Is that the result you want?
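To make the difference from a single LAG concrete, here is a rough procedural sketch in Python of what the question describes: walk the rows in ID order and always use the most recently computed price for the same warehouse. It uses the question's simplified formula (which the asker calls wrong) only as a placeholder, and the names ROWS and fill_prices are assumptions for illustration.

```python
# Rows from the sample script:
# (warehouse, movement, total_quantity, avg_weighted_price, id)
ROWS = [
    ("A", None, 100, 10.0, 1),
    ("B", None, 30, 5.0, 2),
    ("A", 10, 110, 0.0, 3),
    ("A", -5, 105, 0.0, 4),
    ("B", 30, 60, 0.0, 5),
    ("B", 5, 65, 0.0, 6),
    ("B", -25, 40, 0.0, 7),
    ("A", 10, 115, 0.0, 8),
    ("B", 10, 50, 0.0, 9),
    ("A", 10, 125, 0.0, 10),
]


def fill_prices(rows):
    """Return {id: price}, processing rows strictly in ID order."""
    last_price = {}  # most recent computed price per warehouse
    result = {}
    for warehouse, movement, total_qty, price, row_id in sorted(rows, key=lambda r: r[4]):
        if movement is not None:
            # Placeholder formula from the question:
            # movement * previous price of the same warehouse / quantity.
            previous = last_price[warehouse]
            price = movement * previous / total_qty if total_qty else 0.0
        # Opening-balance rows (movement is None) keep their given price.
        last_price[warehouse] = price
        result[row_id] = price
    return result


if __name__ == "__main__":
    for row_id, price in fill_prices(ROWS).items():
        print(row_id, round(price, 4))
```

Unlike LAG over the stored column, each step here sees the value computed for the previous row of the same warehouse, so the price for ID 4 is derived from the 0.909 computed for ID 3 and not from the stored 0.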

sql query to determine the age difference between the oldest member and the youngest member

I'm trying to determine the age difference between the youngest and oldest child members in a household. I'm able to pull all the member data I need, but I do not know how to find the difference between their ages. I just don't know where to go from here:
SELECT Household.Name,Member.RecStatus, Member.FirstName, Member.LastName,
Member.SSN, Member.DOB, DATEDIFF(Year, Member.DOB, GETDATE()), RelationshipCat.RelationshipDesc, FinancialPlanner.LastName AS Expr1
FROM Member AS Member INNER JOIN
Household AS Household ON Member.HouseholdID = Household.HouseholdID INNER JOIN
RelationshipCat AS RelationshipCat ON Member.Relationship = RelationshipCat.Relationship INNER JOIN
FinancialPlanner AS FinancialPlanner ON Household.FinancialPlannerID = FinancialPlanner.FinancialPlannerID
Where member.Relationship in ('2', '14', '47', '69', '55', '12', '70')
You can use MIN and MAX to find the youngest and oldest member:
CREATE TABLE TestAge ( Age INT );
INSERT INTO TestAge VALUES (12), (13), (18), (24), (42), (17);
SELECT MAX(Age) - MIN(Age) AS [Age Diff]
FROM TestAge
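To get one difference per household instead of one for the whole table, the same MAX/MIN idea can be combined with GROUP BY. Below is a small sketch run through Python's sqlite3 module; the table MemberAge, its columns and the sample rows are made up for illustration and are not the asker's schema.

```python
import sqlite3

# Hypothetical data: (HouseholdID, Age) for the child members only.
SAMPLE_ROWS = [
    (1, 12), (1, 13), (1, 18),
    (2, 24), (2, 42), (2, 17),
]

QUERY = """
SELECT HouseholdID,
       MAX(Age) - MIN(Age) AS AgeDiff
FROM MemberAge
GROUP BY HouseholdID
ORDER BY HouseholdID;
"""


def age_diff_per_household(rows):
    """Return (HouseholdID, oldest minus youngest age) pairs."""
    con = sqlite3.connect(":memory:")
    try:
        con.execute("CREATE TABLE MemberAge (HouseholdID INTEGER, Age INTEGER)")
        con.executemany("INSERT INTO MemberAge VALUES (?, ?)", rows)
        return con.execute(QUERY).fetchall()
    finally:
        con.close()


if __name__ == "__main__":
    print(age_diff_per_household(SAMPLE_ROWS))
```

In the original query the age would come from the DATEDIFF expression, and the relationship filter would stay in the WHERE clause so that only child members are aggregated.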

SQL Showing Less information depending on date

I have this code. It returns a list of some clients, but it lists too many, because it lists the same client several times with different dates. I only want to show the latest date and none of the others. I tried GROUP BY Client_Code but it didn't work; it just threw up a "not an aggregate function" error or something similar (I can get the exact message if needed). What I have been asked to get is all of our clients, with all the details listed in the 'as' part, and they all pull through properly. If I take out:
I.DATE_LAST_POSTED as 'Last Posted',
I.DATE_LAST_BILLED as 'Last Billed'
It shows up okay, but I need only the last billed date to appear. Putting these lines in shows the client several times, listing all the different bill dates. I think that is because it is pulling across the different matters in the Matter_Master table. Essentially, I would like to show the client information only for the highest matter, with its last billed date.
Please let me know if this needs clarification; I'm trying to explain as best I can.
SELECT DISTINCT
A.DIWOR as 'ID',
B.Client_alpha_Name as 'Client Name',
A.ClientCODE as 'Client Code',
B.Client_address as 'Client Address',
D.COMM_NO AS 'Contact',
E.Contact_full_name as 'Possible Key Contact',
G.LOBSICDESC as 'LOBSIC Code',
H.EARNERNAME as 'Client Care Parnter',
A.CLIENTCODE + '/' + LTRIM(STR(A.LAST_MATTER_NUM)) as 'Last Matter Code',
I.DATE_LAST_POSTED as 'Last Posted',
I.DATE_LAST_BILLED as 'Last Billed'
FROM CLIENT_MASTER A
JOIN CLIENT_INFO B
ON A.CLIENTCODE=B.CLIENT_CODE
JOIN MATTER_MASTER C
ON A.DIWOR=C.CLIENTDIWOR
JOIN COMMINFO D
ON A.DIWOR=D.DIWOR
JOIN CONTACT E
ON A.CLIENTCODE=E.CLIENTCODE
JOIN VW_CONTACT F
ON E.NAME_DIWOR=F.NAME_DIWOR
JOIN LOBSIC_CODES G
ON A.LOBSICDIWOR=G.DIWOR
JOIN STAFF H
ON A.CLIENTCAREPARTNER=H.DIWOR
JOIN MATTER I
ON C.DIWOR=I.MATTER_DIWOR
WHERE F.COMPANY_FLAG='Y'
AND C.MATTER_MANAGER NOT IN ('78','466','2','104','408','73','51','561','504','101','13','534','16','461','531','144','57','365','83','107','502','514','451')
AND I.DATE_LAST_BILLED > 0
GROUP BY A.ClientCODE
ORDER BY A.DIWOR
Your problem is that you aren't using enough aggregate functions. Which is probably why you're using both the DISTINCT clause and the GROUP BY clause (the recommendation is to use GROUP BY, and not DISTINCT).
So... remove DISTINCT, add the necessary (unique, more or less) list of columns to the GROUP BY clause, and wrap the rest in aggregate functions, constants, or subselects. In the specific case of wanting the largest date, wrap it in a MAX() function.
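As a rough illustration of that advice, here is a sketch run through Python's sqlite3 module: the descriptive columns go into GROUP BY and the date is wrapped in MAX(). The table ClientBills and its three columns are a simplified stand-in assumed for the example, not the real schema.

```python
import sqlite3

# Simplified stand-in for the joined result: one row per bill date.
SAMPLE_ROWS = [
    (1, "address1", "2011-01-01"),
    (1, "address1", "2011-01-02"),
    (1, "address1", "2011-01-03"),
    (1, "address1", "2011-01-04"),
    (2, "address2", "2011-01-07"),
    (2, "address2", "2011-01-08"),
    (2, "address2", "2011-01-09"),
    (2, "address2", "2011-01-10"),
]

# No DISTINCT: every non-aggregated column is listed in GROUP BY,
# and the date column is reduced to its largest value per group.
QUERY = """
SELECT ClientCode,
       ClientAddress,
       MAX(DateLastBilled) AS LastBilled
FROM ClientBills
GROUP BY ClientCode, ClientAddress
ORDER BY ClientCode;
"""


def latest_bill_per_client(rows):
    """Return one (code, address, latest date) row per client."""
    con = sqlite3.connect(":memory:")
    try:
        con.execute(
            "CREATE TABLE ClientBills "
            "(ClientCode INTEGER, ClientAddress TEXT, DateLastBilled TEXT)"
        )
        con.executemany("INSERT INTO ClientBills VALUES (?, ?, ?)", rows)
        return con.execute(QUERY).fetchall()
    finally:
        con.close()


if __name__ == "__main__":
    for row in latest_bill_per_client(SAMPLE_ROWS):
        print(row)
```

The dates are stored as ISO-8601 text here, so MAX() on the strings gives the latest date; with a real date type the query shape stays the same.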
If I understood right:
--=======================
-- sample data - simplified output of your query
--=======================
declare @t table
(
ClientCode int,
ClientAddress varchar(50),
DateLastBilled datetime
-- the rest of fields is skipped
)
insert into @t values (1, 'address1', '2011-01-01')
insert into @t values (1, 'address1', '2011-01-02')
insert into @t values (1, 'address1', '2011-01-03')
insert into @t values (1, 'address1', '2011-01-04')
insert into @t values (2, 'address2', '2011-01-07')
insert into @t values (2, 'address2', '2011-01-08')
insert into @t values (2, 'address2', '2011-01-09')
insert into @t values (2, 'address2', '2011-01-10')
--=======================
-- solution
--=======================
select distinct
ClientCode,
ClientAddress,
DateLastBilled
from
(
select
ClientCode,
ClientAddress,
DateLastBilled,
-- list of remaining fields
MaxDateLastBilled = max(DateLastBilled) over(partition by ClientCode)
from
(
-- here should be your query
select * from @t
) t
) t
where MaxDateLastBilled = DateLastBilled

MySQL INSERT with multiple nested SELECTs

Is a query like this possible? MySQL gives me a syntax error. Multiple insert values with nested selects...
INSERT INTO pv_indices_fields (index_id, veld_id)
VALUES
('1', SELECT id FROM pv_fields WHERE col1='76' AND col2='val1'),
('1', SELECT id FROM pv_fields WHERE col1='76' AND col2='val2')
I've just tested the following (which works):
insert into test (id1, id2) values (1, (select max(id) from test2)), (2, (select max(id) from test2));
I imagine the problem is that you haven't got ()s around your selects as this query would not work without it.
When you have a subquery like that, it has to return one column and one row only. If your subqueries do return only one row, then you need parentheses around them, as @Thor84no noticed.
If they return (or could return) more than one row, try this instead:
INSERT INTO pv_indices_fields (index_id, veld_id)
SELECT '1', id
FROM pv_fields
WHERE col1='76'
AND col2 IN ('val1', 'val2')
or if your conditions are very different:
INSERT INTO pv_indices_fields (index_id, veld_id)
( SELECT '1', id FROM pv_fields WHERE col1='76' AND col2='val1' )
UNION ALL
( SELECT '1', id FROM pv_fields WHERE col1='76' AND col2='val2' )
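The two main forms discussed above can be tried side by side with a quick sketch. It runs on SQLite through Python's sqlite3 module, not on MySQL, so treat it only as an illustration of the statement shapes; the column types and sample rows in pv_fields are assumptions.

```python
import sqlite3


def run_inserts():
    """Insert with scalar subqueries, then with INSERT ... SELECT."""
    con = sqlite3.connect(":memory:")
    try:
        con.execute("CREATE TABLE pv_fields (id INTEGER PRIMARY KEY, col1 TEXT, col2 TEXT)")
        con.execute("CREATE TABLE pv_indices_fields (index_id TEXT, veld_id INTEGER)")
        con.executemany(
            "INSERT INTO pv_fields (id, col1, col2) VALUES (?, ?, ?)",
            [(1, "76", "val1"), (2, "76", "val2"), (3, "77", "val1")],
        )

        # Form 1: each subquery is wrapped in its own parentheses and
        # must return exactly one column and at most one row.
        con.execute("""
            INSERT INTO pv_indices_fields (index_id, veld_id)
            VALUES
              ('1', (SELECT id FROM pv_fields WHERE col1='76' AND col2='val1')),
              ('1', (SELECT id FROM pv_fields WHERE col1='76' AND col2='val2'))
        """)

        # Form 2: INSERT ... SELECT, which also copes with any number
        # of matching rows.
        con.execute("""
            INSERT INTO pv_indices_fields (index_id, veld_id)
            SELECT '2', id
            FROM pv_fields
            WHERE col1='76'
              AND col2 IN ('val1', 'val2')
        """)

        return con.execute(
            "SELECT index_id, veld_id FROM pv_indices_fields ORDER BY index_id, veld_id"
        ).fetchall()
    finally:
        con.close()


if __name__ == "__main__":
    print(run_inserts())
```

Both statements end up inserting the ids 1 and 2; the row with col1 = '77' is ignored by each of them.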

SQL - need to determine implicit end dates for supplied begin dates

Consider the following:
CREATE TABLE Members
(
MemberID CHAR(10)
, GroupID CHAR(10)
, JoinDate DATETIME
)
INSERT Members VALUES ('1', 'A', '2010-01-01')
INSERT Members VALUES ('1', 'C', '2010-09-05')
INSERT Members VALUES ('1', 'B', '2010-04-15')
INSERT Members VALUES ('1', 'B', '2010-10-10')
INSERT Members VALUES ('1', 'A', '2010-06-01')
INSERT Members VALUES ('1', 'D', '2010-11-30')
What would be the best way to select from this table, determining the implied "LeaveDate", producing the following data set:
MemberID  GroupID  JoinDate    LeaveDate
1         A        2010-01-01  2010-04-14
1         B        2010-04-15  2010-05-31
1         A        2010-06-01  2010-09-04
1         C        2010-09-05  2010-10-09
1         B        2010-10-10  2010-11-29
1         D        2010-11-30  NULL
As you can see, a member is assumed to have no lapse in membership. The [LeaveDate] for each member status period is assumed to be the day prior to the next chronological [JoinDate] that can be found for that member in a different group. Of course this is a simplified illustration of my actual problem, which includes a couple more categorization/grouping columns and thousands of different members with [JoinDate] values stored in no particular order.
Something like this perhaps? Self join, and select the minimum joining date that is greater than the joining date for the current row - i.e. the leave date plus one. Subtract one day from it.
You may need to adjust the date arithmetic for your particular RDBMS.
SELECT
m1.*
, MIN( m2.JoinDate ) - INTERVAL 1 DAY AS LeaveDate
FROM
Members m1
LEFT JOIN
Members m2
ON m2.MemberID = m1.MemberID
AND m2.JoinDate > m1.JoinDate
GROUP BY
m1.MemberID
, m1.GroupID
, m1.JoinDate
ORDER BY
m1.MemberID
, m1.JoinDate
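As an example of adjusting the date arithmetic, here is a sketch of the same self join on SQLite, run through Python's sqlite3 module. It assumes the join dates are stored as ISO-8601 text and uses SQLite's date() function with a '-1 day' modifier in place of INTERVAL; the helper name leave_dates is made up.

```python
import sqlite3

# Sample rows from the question, with the dates as ISO-8601 strings.
SAMPLE_ROWS = [
    ("1", "A", "2010-01-01"),
    ("1", "C", "2010-09-05"),
    ("1", "B", "2010-04-15"),
    ("1", "B", "2010-10-10"),
    ("1", "A", "2010-06-01"),
    ("1", "D", "2010-11-30"),
]

# For each row, find the earliest later join date of the same member
# and step back one day. date(NULL, ...) stays NULL for the last row.
QUERY = """
SELECT m1.MemberID,
       m1.GroupID,
       m1.JoinDate,
       date(MIN(m2.JoinDate), '-1 day') AS LeaveDate
FROM Members m1
LEFT JOIN Members m2
       ON m2.MemberID = m1.MemberID
      AND m2.JoinDate > m1.JoinDate
GROUP BY m1.MemberID, m1.GroupID, m1.JoinDate
ORDER BY m1.MemberID, m1.JoinDate;
"""


def leave_dates(rows):
    """Return (MemberID, GroupID, JoinDate, LeaveDate) tuples."""
    con = sqlite3.connect(":memory:")
    try:
        con.execute("CREATE TABLE Members (MemberID TEXT, GroupID TEXT, JoinDate TEXT)")
        con.executemany("INSERT INTO Members VALUES (?, ?, ?)", rows)
        return con.execute(QUERY).fetchall()
    finally:
        con.close()


if __name__ == "__main__":
    for row in leave_dates(SAMPLE_ROWS):
        print(row)
```

On the sample rows this reproduces the requested data set, including the NULL leave date for the most recent membership.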
Standard (ANSI) SQL solution:
SELECT memberid,
groupid,
joindate,
lead(joindate) OVER (PARTITION BY memberid ORDER BY joindate ASC) - INTERVAL '1' DAY AS leave_date
FROM members
ORDER BY joindate ASC