SQL Rollup Hierarchy Closest - sql

I have the following scenario. As you could see Levels (1) – ItemNumber (NULL) is the incoming number, which all the ones listed below are associated to the main one.
Ultimately, what I want to do is only get the following rows
Lines 2 and 4,5,6
Since row 2 doesn’t have any children rows and USDAFoodItemID is not null
And 4,5,6 The parent row 3 does have children, and since row 3 is null,
The children 4,5,6 will be spit out.
If Line 1 was populated with USDAFoodItemID then it would only spit out line 1.
Sorry for my English, I am still working on it. But help is universal, and I would appreciate any help I could get.

Sorry, here is the code to produce the table. I am using MS SQL
DECLARE #HoldTable TABLE
(
Levels INT,
ItemNumber INT,
IngredientNumber INT,
NutritionalLoss FLOAT,
StandardAmountInGrams FLOAT,
SequenceNumber INT,
USDAFoodItemID NVARCHAR(5)
)
INSERT INTO #HoldTable VALUES (1, NULL, 1174258, 1.0000 ,113.3980925 ,1, NULL)
INSERT INTO #HoldTable VALUES (2, 1174258, 1174252 ,1.0000, 113.3980925, 1, 09096)
INSERT INTO #HoldTable VALUES (3, 1174252, 1006967, 1.0000, 56.69904625, 1, 01297)
INSERT INTO #HoldTable VALUES (3, 1174252, 1174251, 1.0000 ,56.69904625, 1, NULL)
INSERT INTO #HoldTable VALUES (4, 1174251, 1007555 ,1.0000 ,1.41747615625, 1 ,12078)
INSERT INTO #HoldTable VALUES (4, 1174251, 1010210, 1.0000, 26.93204696875, 1 ,NULL)
INSERT INTO #HoldTable VALUES (4, 1174251, 1010984, 1.0000, 28.349523125, 1 ,NULL)
SELECT * FROM #HoldTable
Thank you

Related

Find value contained in the HierarchyId at any level

I need to find a particular value contained in the SQL Server HierarchyId column. The value can occur at any level. Here is a sample code to illustrate the issue:
CREATE TABLE mytable
(
Id INT NOT NULL PRIMARY KEY,
TeamName VARCHAR(20) NOT NULL,
MyHierarchyId HIERARCHYID NOT NULL
);
INSERT INTO mytable(Id, TeamName, MyHierarchyId)
VALUES (1, 'Corporate','/1/');
INSERT INTO mytable(Id, TeamName, MyHierarchyId)
VALUES (2, 'Group A','/1/2/');
INSERT INTO mytable(Id, TeamName, MyHierarchyId)
VALUES (3, 'Team X','/1/2/3/');
INSERT INTO mytable(Id, TeamName, MyHierarchyId)
VALUES (4, 'Group B','/1/4/');
INSERT INTO mytable(Id, TeamName, MyHierarchyId)
VALUES (5, 'Team Y','/1/4/5/');
INSERT INTO mytable(Id, TeamName, MyHierarchyId)
VALUES (6, 'Team Z','/1/4/6/');
Now I would like to find all the records, which are associated with the Id = 4. This means records 4, 5 and 6. I could use a brute force methods like this:
SELECT [M].[Id],
[M].[TeamName],
[M].[MyHierarchyId],
[M].[MyHierarchyId].ToString() AS Lineage
FROM [dbo].[mytable] AS [M]
WHERE [M].[MyHierarchyId].ToString() LIKE '%4%'
But I suspect this will be very inefficient. Once again, the problem is that the level of the node I am searching for is not known in advance.
Thank you for any recommendations.
You can use IsDescendantOf()
Select *
from mytable
Where MyHierarchyID.IsDescendantOf( (select MyHierarchyID from mytable where id=4) ) = 1
Results
Id TeamName MyHierarchyId
4 Group B 0x5C20
5 Team Y 0x5C3180
6 Team Z 0x5C3280

SQL Filter rows based on multiple distinct values of a column

Given the following table
DECLARE #YourTable TABLE (id int, PLU int, Siteid int, description varchar(50))
INSERT #YourTable VALUES (1, 8972, 2, 'Beer')
INSERT #YourTable VALUES (2, 8972, 3, 'cider')
INSERT #YourTable VALUES (3, 8972, 4, 'Beer')
INSERT #YourTable VALUES (4, 8973, 2, 'Vodka')
INSERT #YourTable VALUES (5, 8973, 3, 'Vodka')
INSERT #YourTable VALUES (6, 8973, 4, 'Vodka')
I trying to write a query that would give me all rows that have multiple distinct values for a given description value against a plu.
So in the example above I would want to return rows 1,2,3 as they have both a 'cider' value and a 'beer' value for a plu of '8972'.
I thought 'GROUP BY' and 'HAVING' was the way to go but I can't seem to get it to work correctly.
SELECT P.PLU, P.Description
FROM #YourTable P
GROUP BY P.PLU, P.Description
HAVING COUNT(DISTINCT(P.DESCRIPTION)) > 1
Any help appreciated.
You shouldn't GROUP BY the description if you are doing a DISTINCT COUNT on it (then it will always be just 1). Try something like this:
SELECT P2.PLU, P2.Description
FROM #YourTable P2
WHERE P2.PLU in (
SELECT P.PLU
FROM #YourTable P
GROUP BY P.PLU
HAVING COUNT(DISTINCT(P.DESCRIPTION)) > 1
)

sql query to join two tables and a boolean flag to indicate whether it contains any words from third table

I have 3 tables with the following schema
create table main (
main_id int PRIMARY KEY,
secondary_id int NOT NULL
);
create table secondary (
secondary_id int NOT NULL,
tags varchar(100)
);
create table bad_words (
words varchar(100) NOT NULL
);
insert into main values (1, 1001);
insert into main values (2, 1002);
insert into main values (3, 1003);
insert into main values (4, 1004);
insert into secondary values (1001, 'good word');
insert into secondary values (1002, 'bad word');
insert into secondary values (1002, 'good word');
insert into secondary values (1002, 'other word');
insert into secondary values (1003, 'ugly');
insert into secondary values (1003, 'bad word');
insert into secondary values (1004, 'pleasant');
insert into secondary values (1004, 'nice');
insert into bad_words values ('bad word');
insert into bad_words values ('ugly');
insert into bad_words values ('worst');
expected output
----------------
1, 1000, good word, 0 (boolean flag indicating whether the tags contain any one of the words from the bad_words table)
2, 1001, bad word,good word,other word , 1
3, 1002, ugly,bad word, 1
4, 1003, pleasant,nice, 0
I am trying to use case to select 1 or 0 for the last column and use a join to join the main and secondary table, but getting confused and stuck. Can someone please help me with a query ? These tables are stored in redshift and i want query compatible with redshift.
you can use the above schema to try your query in sqlfiddle
EDIT: I have updated the schema and expected output now by removing the PRIMARY KEY in secondary table so that easier to join with the bad_words table.
You can use EXISTS and a regex comparison with \m and \M (markers for beginning and end of a word, respectively):
with
main(main_id, secondary_id) as (values (1, 1000), (2, 1001), (3, 1002), (4, 1003)),
secondary(secondary_id, tags) as (values (1000, 'very good words'), (1001, 'good and bad words'), (1002, 'ugly'),(1003, 'pleasant')),
bad_words(words) as (values ('bad'), ('ugly'), ('worst'))
select *, exists (select 1 from bad_words where s.tags ~* ('\m'||words||'\M'))::int as flag
from main m
join secondary s using (secondary_id)
select main_id, a.secondary_id, tags, case when c.words is not null then 1 else 0 end
from main a
join secondary b on b.secondary_id = a.secondary_id
left outer join bad_words c on c.words like b.tags
SELECT m.main_id, m.secondary_id, t.tags, t.is_bad_word
FROM srini.main m
JOIN (
SELECT st.secondary_id, st.tags, exists (select 1 from srini.bad_words b where st.tags like '%'+b.words+'%') is_bad_word
FROM
( SELECT secondary_id, LISTAGG(tags, ',') as tags
FROM srini.secondary
GROUP BY secondary_id ) st
) t on t.secondary_id = m.secondary_id;
This worked for me in redshift and produced the following output with the above mentioned schema.
1 1001 good word false
3 1003 ugly,bad word true
2 1002 good word,other word,bad word true
4 1004 pleasant,nice false

How many times are the results of this common table expression evaluated?

I am trying to work out a bug we've found during our last iteration of testing. It involves a query which uses a common table expression. The main theme of the query is that it simulates a 'first' aggregate operation (get the first row for this grouping).
The problem is that the query seems to choose rows completely arbitrarily in some circumstances - multiple rows from the same group get returned, some groups simply get eliminated altogether. However, it always picks the correct number of rows.
I have created a minimal example to post here. There are clients and addresses, and a table which defines the relationships between them. This is a much simplified version of the actual query I'm looking at, but I believe it should have the same characteristics, and it is a good example to use to explain what I think is going wrong.
CREATE TABLE [Client] (ClientID int, Name varchar(20))
CREATE TABLE [Address] (AddressID int, Street varchar(20))
CREATE TABLE [ClientAddress] (ClientID int, AddressID int)
INSERT [Client] VALUES (1, 'Adam')
INSERT [Client] VALUES (2, 'Brian')
INSERT [Client] VALUES (3, 'Charles')
INSERT [Client] VALUES (4, 'Dean')
INSERT [Client] VALUES (5, 'Edward')
INSERT [Client] VALUES (6, 'Frank')
INSERT [Client] VALUES (7, 'Gene')
INSERT [Client] VALUES (8, 'Harry')
INSERT [Address] VALUES (1, 'Acorn Street')
INSERT [Address] VALUES (2, 'Birch Road')
INSERT [Address] VALUES (3, 'Cork Avenue')
INSERT [Address] VALUES (4, 'Derby Grove')
INSERT [Address] VALUES (5, 'Evergreen Drive')
INSERT [Address] VALUES (6, 'Fern Close')
INSERT [ClientAddress] VALUES (1, 1)
INSERT [ClientAddress] VALUES (1, 3)
INSERT [ClientAddress] VALUES (2, 2)
INSERT [ClientAddress] VALUES (2, 4)
INSERT [ClientAddress] VALUES (2, 6)
INSERT [ClientAddress] VALUES (3, 3)
INSERT [ClientAddress] VALUES (3, 5)
INSERT [ClientAddress] VALUES (3, 1)
INSERT [ClientAddress] VALUES (4, 4)
INSERT [ClientAddress] VALUES (4, 6)
INSERT [ClientAddress] VALUES (5, 1)
INSERT [ClientAddress] VALUES (6, 3)
INSERT [ClientAddress] VALUES (7, 2)
INSERT [ClientAddress] VALUES (8, 4)
INSERT [ClientAddress] VALUES (5, 6)
INSERT [ClientAddress] VALUES (6, 3)
INSERT [ClientAddress] VALUES (7, 5)
INSERT [ClientAddress] VALUES (8, 1)
INSERT [ClientAddress] VALUES (5, 4)
INSERT [ClientAddress] VALUES (6, 6)
;WITH [Stuff] ([ClientID], [Name], [Street], [RowNo]) AS
(
SELECT
[C].[ClientID],
[C].[Name],
[A].[Street],
ROW_NUMBER() OVER (ORDER BY [A].[AddressID]) AS [RowNo]
FROM
[Client] [C] INNER JOIN
[ClientAddress] [CA] ON
[C].[ClientID] = [CA].[ClientID] INNER JOIN
[Address] [A] ON
[CA].[AddressID] = [A].[AddressID]
)
SELECT
[CTE].[ClientID],
[CTE].[Name],
[CTE].[Street],
[CTE].[RowNo]
FROM
[Stuff] [CTE]
WHERE
[CTE].[RowNo] IN (SELECT MIN([CTE2].[RowNo]) FROM [Stuff] [CTE2] GROUP BY [CTE2].[ClientID])
ORDER BY
[CTE].[Name] ASC,
[CTE].[Street] ASC
DROP TABLE [ClientAddress]
DROP TABLE [Address]
DROP TABLE [Client]
The query is designed to get all clients, and their first address (the address with the lowest ID). This appears to me that it should work.
I have a theory about why it sometimes will not work. The statement that follows the CTE refers to the CTE in two places. If the CTE is non-deterministic, and it gets run more than once, the result of the CTE may be different in the two places it's referenced.
In my example, the CTE's RowNo column uses ROW_NUMBER() with an order by clause that will potentially result in different orderings when run multiple times (we're ordering by address, the clients can be in any order depending on how the query is executed).
Because of this is it possible that CTE and CTE2 can contain different results? Or is the CTE only executed once and do I need to look elsewhere for the problem?
It is not guaranteed in any way.
SQL Server is free to evaluate CTE each time it's accessed or cache the results, depending on the plan.
You may want to read this article:
Generating XML in subqueries
If your CTE is not deterministic, you will have to store its result in a temporary table or a table variable and use it instead of the CTE.
PostgreSQL, on the other hand, always evaluates CTEs only once, caching their results.

Oracle PIVOT, twice?

I have been trying to move away from using DECODE to pivot rows in Oracle 11g, where there is a handy PIVOT function. But I may have found a limitation:
I'm trying to return 2 columns for each value in the base table. Something like:
SELECT somethingId, splitId1, splitName1, splitId2, splitName2
FROM (SELECT somethingId, splitId
FROM SOMETHING JOIN SPLIT ON ... )
PIVOT ( MAX(splitId) FOR displayOrder IN (1 AS splitId1, 2 AS splitId2),
MAX(splitName) FOR displayOrder IN (1 AS splitName1, 2 as splitName2)
)
I can do this with DECODE, but I can't wrestle the syntax to let me do it with PIVOT. Is this even possible? Seems like it wouldn't be too hard for the function to handle.
Edit: is StackOverflow maybe not the right Overflow for SQL questions?
Edit: anyone out there?
From oracle-developer.net it would appear that it can be done like this:
SELECT somethingId, splitId1, splitName1, splitId2, splitName2
FROM (SELECT somethingId, splitId
FROM SOMETHING JOIN SPLIT ON ... )
PIVOT ( MAX(splitId) ,
MAX(splitName)
FOR displayOrder IN (1 AS splitName1, 2 as splitName2)
)
I'm not sure from what you provided what the data looks or what exactly you would like. Perhaps if you posted the decode version of the query that returns the data you are looking for and/or the definition for the source data, we could better answer your question. Something like this would be helpful:
create table something (somethingId Number(3), displayOrder Number(3)
, splitID Number(3));
insert into something values (1, 1, 10);
insert into something values (2, 1, 11);
insert into something values (3, 1, 12);
insert into something values (4, 1, 13);
insert into something values (5, 2, 14);
insert into something values (6, 2, 15);
insert into something values (7, 2, 16);
create table split (SplitID Number(3), SplitName Varchar2(30));
insert into split values (10, 'Bob');
insert into split values (11, 'Carrie');
insert into split values (12, 'Alice');
insert into split values (13, 'Timothy');
insert into split values (14, 'Sue');
insert into split values (15, 'Peter');
insert into split values (16, 'Adam');
SELECT *
FROM (
SELECT somethingID, displayOrder, so.SplitID, sp.splitname
FROM SOMETHING so JOIN SPLIT sp ON so.splitID = sp.SplitID
)
PIVOT ( MAX(splitId) id, MAX(splitName) name
FOR (displayOrder, displayOrder) IN ((1, 1) AS split, (2, 2) as splitname)
);