Homogenous Containers with GROUP BY Part II - sql

I have three tables, one of which I need to update:
CREATE TABLE Containers (
ID int PRIMARY KEY,
FruitType int
)
CREATE TABLE Contents (
ContainerID int,
ContainedFruit int
)
CREATE TABLE Fruit (
FruitID int,
Name VARCHAR(16)
)
INSERT INTO Fruit VALUES ( 0, 'Mixed' )
INSERT INTO Fruit VALUES ( 1, 'Apple' )
INSERT INTO Fruit VALUES ( 2, 'Banana' )
INSERT INTO Fruit VALUES ( 3, 'Cherry' )
INSERT INTO Fruit VALUES ( 4, 'Date' )
INSERT INTO Containers VALUES ( 101, 0 )
INSERT INTO Containers VALUES ( 102, 0 )
INSERT INTO Containers VALUES ( 103, 0 )
INSERT INTO Containers VALUES ( 104, 3 )
INSERT INTO Contents VALUES ( 101, 1 )
INSERT INTO Contents VALUES ( 101, 1 )
INSERT INTO Contents VALUES ( 102, 1 )
INSERT INTO Contents VALUES ( 102, 2 )
INSERT INTO Contents VALUES ( 102, 3 )
INSERT INTO Contents VALUES ( 103, 3 )
INSERT INTO Contents VALUES ( 103, 4 )
INSERT INTO Contents VALUES ( 104, 3 )
Let's assume this is the state of my database. Note that the Fruit table is used twice: once to describe the contents of a container, and once to describe whether the container is meant to hold only one type of fruit or can be mixed. Bad design, IMO, but it's too late to change it.
The problem is that container 101 is incorrectly marked as MIXED when it should really be APPLE. Containers whose contents are all of the same type are still homogeneous containers and should be marked as such.
I know how to do a query that finds the containers that are incorrectly marked as mixed:
SELECT Contents.ContainerID
FROM Contents
INNER JOIN Containers ON
Contents.ContainerID = Containers.ID AND
Containers.FruitType = 0
GROUP BY Contents.ContainerID
HAVING COUNT( DISTINCT Contents.ContainedFruit ) = 1
However, I don't know how to update every row in Containers where this error has been made. That's my question.

This will do:
UPDATE Containers
SET Containers.FruitType = Proper.ProperType
FROM Containers
INNER JOIN
(
-- Mixed containers whose contents are all one fruit type;
-- with a single distinct value, MAX() simply returns that value.
SELECT Contents.ContainerID,
MAX(Contents.ContainedFruit) ProperType
FROM Contents
INNER JOIN Containers ON
Contents.ContainerID = Containers.ID AND
Containers.FruitType = 0
GROUP BY Contents.ContainerID
HAVING COUNT( DISTINCT Contents.ContainedFruit ) = 1
) Proper
ON Containers.ID = Proper.ContainerID;
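The logic can be checked without a SQL Server instance. Below is a minimal sketch that uses Python's built-in sqlite3 module and the sample data from the question. SQLite is only a stand-in here. UPDATE ... FROM arrived in SQLite 3.33, so the join is rewritten as correlated subqueries. That is a swapped-in formulation, not the answer's exact statement.

```python
import sqlite3

# In-memory SQLite database standing in for the SQL Server tables above.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Containers (ID INTEGER PRIMARY KEY, FruitType INTEGER);
CREATE TABLE Contents (ContainerID INTEGER, ContainedFruit INTEGER);
INSERT INTO Containers VALUES (101, 0), (102, 0), (103, 0), (104, 3);
INSERT INTO Contents VALUES
  (101, 1), (101, 1), (102, 1), (102, 2), (102, 3),
  (103, 3), (103, 4), (104, 3);
""")

# Correlated-subquery version of the UPDATE ... FROM above: only containers
# marked Mixed (0) whose contents have exactly one distinct fruit are changed,
# and they get that single fruit type (MAX of one distinct value).
conn.execute("""
UPDATE Containers
SET FruitType = (SELECT MAX(c.ContainedFruit)
                 FROM Contents c
                 WHERE c.ContainerID = Containers.ID)
WHERE FruitType = 0
  AND (SELECT COUNT(DISTINCT c.ContainedFruit)
       FROM Contents c
       WHERE c.ContainerID = Containers.ID) = 1
""")

fruit_types = dict(conn.execute("SELECT ID, FruitType FROM Containers ORDER BY ID"))
print(fruit_types)  # {101: 1, 102: 0, 103: 0, 104: 3}
```

Only container 101 changes. Containers 102 and 103 really are mixed, and 104 was already correct.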

Count of empty values in string array in PostgreSQL

I want to check that the Projects column has the same value for all rows that share the same PartNo and PartName. The Projects column's data type is character varying[].
For example:
PartNo | PartName | Projects
1      | 3        | 6;5
1      | 3        | (empty)
1      | 3        | (empty)
3      | 2        | 5;5
In this case, Projects has different values, (6;5) and (), for the same PartNo (1) and PartName (3).
This is my query, but it does not work when the Projects column holds an empty character varying[]:
SELECT COUNT(*) from (
select c.partno, c.partname
FROM unnest(items) as c
GROUP BY c.partno, c.partname
HAVING COUNT(distinct c.projects) > 1) as xxx
INTO errCount;
IF errCount > 0 THEN
RETURN QUERY
SELECT 0 as status, format('Projects value should be the same for all Codes of the Part No %s and Name %s',c.partno,c.partname) as message
FROM unnest(items) as c
GROUP BY c.partno, c.partname
HAVING COUNT(distinct c.projects) > 1
;
RETURN;
END IF;
It works when the two differing Projects values are both non-empty arrays.
You can use a query like this, with the COALESCE function converting NULL into ARRAY[NULL]:
WITH tt AS (
SELECT
partno,
partname,
COALESCE ( project, ARRAY [null] ) AS pro
FROM
tab1
) SELECT
*,
COUNT ( pro ) AS num
FROM
tt
GROUP BY
partno,
partname,
pro
To create the test table:
CREATE TABLE "tab1" (
"pk" serial4 primary key,
"partno" int4,
"partname" int4,
"project" varchar[]
);
INSERT INTO "tab1" (partno,partname,project) VALUES ( 1, 3, '{6,5}');
INSERT INTO "tab1" (partno,partname,project) VALUES ( 1, 3, NULL);
INSERT INTO "tab1" (partno,partname,project) VALUES ( 1, 3, NULL);
INSERT INTO "tab1" (partno,partname,project) VALUES ( 3, 2, '{5,5}');
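A likely reason the original query misses the empty case is NULL handling, assuming the "empty" Projects values are stored as NULL, as in the test table above. COUNT(DISTINCT ...) ignores NULLs, so a group with one array and some NULLs still counts as one distinct value.

The sketch below shows this with Python's sqlite3 module. SQLite has no array type, so each array is represented by a text value such as '6;5'. In PostgreSQL the corresponding fix would presumably be HAVING COUNT(DISTINCT COALESCE(c.projects, '{}')) > 1, which has not been tested here.

```python
import sqlite3

# Stand-in for the Projects column: SQLite has no array type, so each array
# is stored as text ('6;5') and a missing value as NULL.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE parts (partno INTEGER, partname INTEGER, projects TEXT);
INSERT INTO parts VALUES (1, 3, '6;5'), (1, 3, NULL), (1, 3, NULL), (3, 2, '5;5');
""")

# COUNT(DISTINCT x) skips NULLs, so group (1, 3) looks consistent.
naive = conn.execute("""
SELECT partno, partname FROM parts
GROUP BY partno, partname
HAVING COUNT(DISTINCT projects) > 1
""").fetchall()

# Mapping NULL to a sentinel first makes the missing value count as distinct.
fixed = conn.execute("""
SELECT partno, partname FROM parts
GROUP BY partno, partname
HAVING COUNT(DISTINCT COALESCE(projects, '')) > 1
""").fetchall()

print(naive)  # []
print(fixed)  # [(1, 3)]
```

The sentinel must be a value that cannot occur as a real Projects value. Otherwise a genuine empty value and a NULL would be counted as the same thing.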

Insert Missing Values Into Table Using SQL

Objective: I need to fully populate a table with a matrix of values for each column, grouped by [PropertyId]. Several [PropertyId] groupings have all the necessary values for each column (Table 1); however, many are missing some values (Table 2). Furthermore, not every [PropertyId] needs these values, as some have completely different regional values. Therefore, I need to identify which [PropertyId] groupings both need the values and don't yet have all of them.
Examples:
Table 1. Each identified [PropertyId] grouping should have 23 distinct records for these four columns [ReportingVolumeSettingId],[SpeciesGroupInventoryID],[CropCategoryID],[SortOrder].
Table 2. Here is an example of a PropertyID that is missing a value combination as it only has 22 records:
Both of these example results were queried from the same table, [ReportingVolume]. I have not even been able to identify which record combinations are missing per [PropertyID]. I would like to identify each missing record combination and then insert it into the [ReportingVolume] table.
Problem to solve: the SQL code below is my attempt to 1. identify the correct list of values; 2. identify which properties should have matching values; 3. identify which properties are missing values; 4. identify the missing values per property.
;with CORRECT_LIST as
(
select
SpeciesGroupInventoryName, SpeciesGroupInventoryId, CropCategoryName,CropCategoryID, UnitOfMeasure, SortOrder
--*
from [GIS].[RST].[vPropertyDefaultTimberProductAndUnitOfMeasure]
where PropertyId in (1)
)
,
property_list as
(
select distinct rvs.propertyid as Volume_Property, pd.PropertyName, pd.PropertyId from RMS.GIS.ReportingVolumeSetting rvs
right outer JOIN RMS.GIS.PropertyDetail AS pd ON rvs.PropertyId = pd.PropertyId
left outer JOIN RMS.GIS.SpeciesGroupInventory AS sgi ON rvs.SpeciesGroupInventoryId = sgi.SpeciesGroupInventoryId
where sgi.SpeciesGroupInventoryId in (1,2,3)
or pd.PropertyId = 171
)
, Partial_LISTS as
(
select Count(distinct ReportingVolumeSettingId) as CNT_REPORT, pd.PropertyName, pd.PropertyId
from [GIS].[ReportingVolumeSetting] rvs
right outer JOIN property_list AS pd ON rvs.PropertyId = pd.PropertyId
group by pd.propertyId, pd.PropertyName
)
, Add_Props as
(
select propertyName, propertyId, SUM(CNT_REPORT) as CNT_RECORDS from Partial_LISTS
where CNT_REPORT < 23
group by propertyName, propertyId
)
, RVS_RECORDS_PROPS as
(
select addProps.PropertyName, rvs.* from [GIS].[ReportingVolumeSetting] rvs
join Add_Props addProps on addprops.PropertyId = rvs.PropertyID
where rvs.PropertyId in (select PropertyId from Add_Props)
)
select rp.PropertyName, cl.*, rp.SpeciesGroupInventoryId from correct_list cl
left outer join RVS_Records_Props rp
on rp.SpeciesGroupInventoryId = cl.SpeciesGroupInventoryId
and rp.CropCategoryId = cl.CropCategoryID
and rp.SortOrder = cl.SortOrder
Order by rp.PropertyName
How can I modify the code, or write a new code block, so that it identifies the missing values and inserts them into the table per PropertyId?
I am using SSMS v15.
Thanks so much.
This should identify missing entries. You could simply add an INSERT INTO command on top of this. Keep in mind that because ReportingVolumeSettingId is unique and unknown, it isn't covered here.
SELECT * FROM (SELECT DISTINCT PropertyId FROM ReportingVolume) rv
CROSS APPLY
(
SELECT DISTINCT SpeciesGroupInventoryId
, CropCategoryId
, SortOrder
FROM ReportingVolume
) x
EXCEPT
SELECT PropertyId, SpeciesGroupInventoryId, CropCategoryId, SortOrder FROM ReportingVolume
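Here is a minimal runnable sketch of the same CROSS APPLY / EXCEPT idea. It uses Python's sqlite3 module and a made-up ReportingVolume table with three combinations; the column values are hypothetical.

SQLite has no CROSS APPLY. Because the applied subquery is not correlated, a plain CROSS JOIN is equivalent here. As in the answer, ReportingVolumeSettingId is left out.

```python
import sqlite3

# Toy ReportingVolume: property 1 has all three combinations,
# property 2 is missing (20, 200, 2). The values are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE ReportingVolume (
  PropertyId INTEGER, SpeciesGroupInventoryId INTEGER,
  CropCategoryId INTEGER, SortOrder INTEGER);
INSERT INTO ReportingVolume VALUES
  (1, 10, 100, 1), (1, 20, 200, 2), (1, 30, 300, 3),
  (2, 10, 100, 1), (2, 30, 300, 3);
""")

# Every property paired with every distinct combination (CROSS JOIN plays the
# role of the uncorrelated CROSS APPLY), minus the rows that already exist.
missing_sql = """
SELECT rv.PropertyId, x.SpeciesGroupInventoryId, x.CropCategoryId, x.SortOrder
FROM (SELECT DISTINCT PropertyId FROM ReportingVolume) rv
CROSS JOIN (SELECT DISTINCT SpeciesGroupInventoryId, CropCategoryId, SortOrder
            FROM ReportingVolume) x
EXCEPT
SELECT PropertyId, SpeciesGroupInventoryId, CropCategoryId, SortOrder
FROM ReportingVolume
"""
missing = conn.execute(missing_sql).fetchall()
print(missing)  # [(2, 20, 200, 2)]

# The same query wrapped in INSERT INTO fills the gaps.
conn.execute("INSERT INTO ReportingVolume " + missing_sql)
counts = dict(conn.execute(
    "SELECT PropertyId, COUNT(*) FROM ReportingVolume GROUP BY PropertyId"))
print(counts)  # {1: 3, 2: 3}
```

The question says some properties use completely different regional values. The rv subquery would therefore probably need a WHERE clause limiting it to the properties that should receive these combinations. Otherwise every property gets every combination.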
I don't have access to your data, so I cannot provide an example specific to your environment, but I can provide you a simple example using SQL Server's EXCEPT operator.
Run the following example in SSMS:
-- Create a list of required values.
DECLARE @Required TABLE ( required_id INT, required_val VARCHAR(50) );
INSERT INTO @Required ( required_id, required_val ) VALUES
( 1, 'Required value 1.' ), ( 2, 'Required value 2.' ), ( 3, 'Required value 3.' ), ( 4, 'Required value 4.' ), ( 5, 'Required value 5.' );
-- Create some sample data to compare against.
DECLARE @Data TABLE ( PropertyId INT, RequiredId INT );
INSERT INTO @Data ( PropertyId, RequiredId ) VALUES
( 1, 1 ), ( 1, 2 ), ( 1, 3 ), ( 2, 1 ), ( 2, 2 ), ( 2, 4 ), ( 2, 5 );
-- Set a property id value to query.
DECLARE @PropertyId INT = 1;
-- Preview @Data's rows for the specified @PropertyId.
SELECT * FROM @Data WHERE PropertyId = @PropertyId ORDER BY PropertyId, RequiredId;
At this point, I've created a list of required values (required_id 1 through 5) and some dummy data to check them against. This initial SELECT shows the current result set for the specified @PropertyId:
+------------+------------+
| PropertyId | RequiredId |
+------------+------------+
|          1 |          1 |
|          1 |          2 |
|          1 |          3 |
+------------+------------+
You can see that required_id values 4 and 5 are missing for the current property. Next, we can compare @Required against @Data, INSERT any missing required values using the EXCEPT operator, and then return the corrected result set.
-- Insert any missing required values for @PropertyId.
INSERT INTO @Data ( PropertyId, RequiredId )
SELECT @PropertyId, required_id FROM @Required
EXCEPT
SELECT PropertyId, RequiredId FROM @Data WHERE PropertyId = @PropertyId;
-- Preview @Data's revised rows for @PropertyId.
SELECT * FROM @Data WHERE PropertyId = @PropertyId ORDER BY PropertyId, RequiredId;
The updated result set now looks like the following:
+------------+------------+
| PropertyId | RequiredId |
+------------+------------+
|          1 |          1 |
|          1 |          2 |
|          1 |          3 |
|          1 |          4 |
|          1 |          5 |
+------------+------------+
You can run this again with @PropertyId = 2 to see a different scenario.
Note the order when using EXCEPT. The required rows come first, followed by the EXCEPT operator, and then the current rows to be validated. This is important: EXCEPT returns the rows from @Required that are not in @Data, which is what allows any missing required values to be inserted into @Data.
I know this example doesn't reflect your existing data and its 23-row requirement, but hopefully it will get you moving toward a solution. You can read more about the EXCEPT operator in the SQL Server documentation.
Here is the complete SSMS example:
-- Create a list of required values.
DECLARE @Required TABLE ( required_id INT, required_val VARCHAR(50) );
INSERT INTO @Required ( required_id, required_val ) VALUES
( 1, 'Required value 1.' ), ( 2, 'Required value 2.' ), ( 3, 'Required value 3.' ), ( 4, 'Required value 4.' ), ( 5, 'Required value 5.' );
-- Create some sample data to compare against.
DECLARE @Data TABLE ( PropertyId INT, RequiredId INT );
INSERT INTO @Data ( PropertyId, RequiredId ) VALUES
( 1, 1 ), ( 1, 2 ), ( 1, 3 ), ( 2, 1 ), ( 2, 2 ), ( 2, 4 ), ( 2, 5 );
-- Set a property id value to query.
DECLARE @PropertyId INT = 1;
-- Preview @Data's rows for the specified @PropertyId.
SELECT * FROM @Data WHERE PropertyId = @PropertyId ORDER BY PropertyId, RequiredId;
-- Insert any missing required values for @PropertyId.
INSERT INTO @Data ( PropertyId, RequiredId )
SELECT @PropertyId, required_id FROM @Required
EXCEPT
SELECT PropertyId, RequiredId FROM @Data WHERE PropertyId = @PropertyId;
-- Preview @Data's revised rows for @PropertyId.
SELECT * FROM @Data WHERE PropertyId = @PropertyId ORDER BY PropertyId, RequiredId;
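To see why the operand order matters, here is a small sketch with Python's sqlite3 module. It uses the same Required/Data shape and looks at property 2. The extra row (2, 6) is a hypothetical addition, not part of the original sample, so that the reversed EXCEPT has something to return.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Required (required_id INTEGER);
CREATE TABLE Data (PropertyId INTEGER, RequiredId INTEGER);
INSERT INTO Required VALUES (1), (2), (3), (4), (5);
-- Original sample rows plus a hypothetical extra (2, 6).
INSERT INTO Data VALUES (1, 1), (1, 2), (1, 3), (2, 1), (2, 2), (2, 4), (2, 5), (2, 6);
""")

# Required first: rows that should exist but don't (the ones to insert).
to_insert = conn.execute("""
SELECT required_id FROM Required
EXCEPT
SELECT RequiredId FROM Data WHERE PropertyId = ?
ORDER BY 1
""", (2,)).fetchall()

# Operands swapped: rows that exist but aren't required.
extras = conn.execute("""
SELECT RequiredId FROM Data WHERE PropertyId = ?
EXCEPT
SELECT required_id FROM Required
ORDER BY 1
""", (2,)).fetchall()

print(to_insert)  # [(3,)]
print(extras)     # [(6,)]
```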

Updating thousands of records with different values

I've been given a spreadsheet in the format:
Id | Val
1  | 57
2  | 99
There are approximately 10,000 records. Any ideas for handling the query below for all 10,000 records without tediously writing each CASE branch by hand? Thanks.
update person
SET val = (
case
when Id = 1 then 57
when Id = 2 then 99
end)
where Id in (1, 2)
Quick and dirty? Here you go:
Add a new spreadsheet and call the old one datatable.
In the first row, first column, write
"Update person set val = ("
In the second column, link to the value on the datatable sheet.
In the third column, write
") where ID = ("
In the fourth column, link to the ID on the datatable sheet.
In the fifth column, write
")"
Then select the whole row and drag it down to row 10000.
Copy and paste the result into a query window and execute it.
I think this example can help you:
CREATE TABLE #Person
(PrimaryKey int PRIMARY KEY,
ValueSome varchar(50)
);
GO
CREATE TABLE #MySpreadSheet
(PrimaryKey int PRIMARY KEY,
ValueSpread varchar(50)
);
GO
INSERT INTO #Person
SELECT 1, 'someValue'
INSERT INTO #Person
SELECT 2, 'someValueBeforeUpdate'
INSERT INTO #Person
SELECT 3, ''
INSERT INTO #MySpreadSheet
SELECT 1, '45'
INSERT INTO #MySpreadSheet
SELECT 2, '56'
INSERT INTO #MySpreadSheet
SELECT 3, '34'
SELECT * FROM #Person
SELECT * FROM #MySpreadSheet
UPDATE P SET P.ValueSome = SS.ValueSpread FROM #Person P JOIN #MySpreadSheet SS ON P.PrimaryKey = SS.PrimaryKey
SELECT * FROM #Person
DROP TABLE #Person
DROP TABLE #MySpreadSheet
If anyone's interested, I went with this:
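CREATE TABLE #TempTable(
Id int,
val int
);
INSERT INTO #TempTable (Id, val)
VALUES (1, 57),
(2, 99);
-- Only val needs setting: Id is the join key and already matches
-- (and assigning it would fail if Person.Id is an identity column).
UPDATE p
SET val = tp.val
FROM Person p
INNER JOIN #TempTable AS tp ON tp.Id = p.Id;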
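The same staging-table approach scales to the full spreadsheet. Here is a minimal sketch with Python's sqlite3 module. The 10,000 (Id, Val) pairs are generated rather than read from a real file; in practice they would come from the exported sheet, for example via the csv module.

executemany loads the rows with bound parameters, and a single set-based UPDATE applies them. The correlated subquery stands in for UPDATE ... FROM, which older SQLite versions lack.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (Id INTEGER PRIMARY KEY, val INTEGER)")
conn.executemany("INSERT INTO person VALUES (?, ?)",
                 [(i, 0) for i in range(1, 10_001)])

# Hypothetical spreadsheet contents: 10,000 (Id, Val) rows.
sheet = [(i, i * 2) for i in range(1, 10_001)]

# Load the sheet into a staging table in one batch...
conn.execute("CREATE TEMP TABLE staging (Id INTEGER PRIMARY KEY, val INTEGER)")
conn.executemany("INSERT INTO staging VALUES (?, ?)", sheet)

# ...then apply it with one set-based UPDATE.
conn.execute("""
UPDATE person
SET val = (SELECT s.val FROM staging s WHERE s.Id = person.Id)
WHERE Id IN (SELECT Id FROM staging)
""")

updated = conn.execute("SELECT COUNT(*) FROM person WHERE val = Id * 2").fetchone()[0]
print(updated)  # 10000
```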
create table #example (id int , value int)
insert into #example (id, value) values (1, 10)
insert into #example (id, value) values (2, 20)
select * from #example
id value
1 10
2 20
update #example
set value = case when id = 1 then 100
when id = 2 then 200 end
where id in (1,2)
select * from #example
id value
1 100
2 200

Matching multiple key/value pairs in SQL

I have metadata stored in a key/value table in SQL Server. (I know key/value is bad, but this is free-form metadata supplied by users, so I can't turn the keys into columns.) Users need to be able to give me an arbitrary set of key/value pairs and have me return all DB objects that match all of those criteria.
For example:
Metadata:
Id Key Value
1 a p
1 b q
1 c r
2 a p
2 b p
3 c r
If the user says a=p and b=q, I should return object 1. (Not object 2, even though it also has a=p, because it has b=p.)
The metadata to match is in a table-valued sproc parameter with a simple key/value schema. The closest I have got is:
select * from [Objects] as o
where not exists (
select * from [Metadata] as m
join @data as n on (n.[Key] = m.[Key])
and n.[Value] != m.[Value]
and m.[Id] = o.[Id]
)
My "no rows exist that don't match" is an attempt to implement "all rows match" by forming its contrapositive. This does eliminate objects with mismatching metadata, but it also returns objects with no metadata at all, so no good.
Can anyone point me in the right direction? (Bonus points for performance as well as correctness.)
; WITH Metadata (Id, [Key], Value) AS -- Create sample data
(
SELECT 1, 'a', 'p' UNION ALL
SELECT 1, 'b', 'q' UNION ALL
SELECT 1, 'c', 'r' UNION ALL
SELECT 2, 'a', 'p' UNION ALL
SELECT 2, 'b', 'p' UNION ALL
SELECT 3, 'c', 'r'
),
data ([Key], Value) AS -- sample input
(
SELECT 'a', 'p' UNION ALL
SELECT 'b', 'q'
),
-- here onwards is the actual query
data2 AS
(
-- cnt is to count no of input rows
SELECT [Key], Value, cnt = COUNT(*) OVER()
FROM data
)
SELECT m.Id
FROM Metadata m
INNER JOIN data2 d ON m.[Key] = d.[Key] AND m.Value= d.Value
GROUP BY m.Id
HAVING COUNT(*) = MAX(d.cnt)
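Here is a runnable sketch of this counting approach (relational division) using Python's sqlite3 module and the question's sample data. The table name Criteria is hypothetical and stands in for the table-valued parameter. Object 4 has no metadata, which is the case that tripped up the NOT EXISTS attempt.

COUNT(DISTINCT m."Key") is used instead of COUNT(*). This small variation guards against duplicate metadata rows, and it assumes the criteria keys are unique.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Objects (Id INTEGER PRIMARY KEY);
CREATE TABLE Metadata (Id INTEGER, "Key" TEXT, Value TEXT);
CREATE TABLE Criteria ("Key" TEXT, Value TEXT);
INSERT INTO Objects VALUES (1), (2), (3), (4);   -- 4 has no metadata
INSERT INTO Metadata VALUES
  (1, 'a', 'p'), (1, 'b', 'q'), (1, 'c', 'r'),
  (2, 'a', 'p'), (2, 'b', 'p'), (3, 'c', 'r');
INSERT INTO Criteria VALUES ('a', 'p'), ('b', 'q');
""")

# Keep objects whose matching (key, value) pairs cover every criterion.
# The inner join drops objects with no metadata at all.
matches = conn.execute("""
SELECT m.Id
FROM Metadata m
JOIN Criteria c ON m."Key" = c."Key" AND m.Value = c.Value
GROUP BY m.Id
HAVING COUNT(DISTINCT m."Key") = (SELECT COUNT(*) FROM Criteria)
""").fetchall()
print(matches)  # [(1,)]
```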
The following SQL query produces the result that you require.
SELECT *
FROM @Objects o
WHERE o.Id IN
(
-- Include objects that match the conditions:
SELECT m.Id
FROM @Metadata m
JOIN @data d ON m.[Key] = d.[Key] AND m.Value = d.Value
-- And discount those where there is other metadata not matching the conditions:
EXCEPT
SELECT m.Id
FROM @Metadata m
JOIN @data d ON m.[Key] = d.[Key] AND m.Value <> d.Value
)
Test schema and data I used (run these declarations in the same batch, before the query):
-- Schema
DECLARE @Objects TABLE (Id int);
DECLARE @Metadata TABLE (Id int, [Key] char(1), Value char(2));
DECLARE @data TABLE ([Key] char(1), Value char(1));
-- Data
INSERT INTO @Metadata VALUES
(1, 'a', 'p'),
(1, 'b', 'q'),
(1, 'c', 'r'),
(2, 'a', 'p'),
(2, 'b', 'p'),
(3, 'c', 'r');
INSERT INTO @Objects VALUES
(1),
(2),
(3),
(4); -- Object with no metadata
INSERT INTO @data VALUES
('a','p'),
('b','q');
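For comparison, the EXCEPT query can be run against the same test data with Python's sqlite3 module. The table names are unchanged apart from dropping the table-variable prefix. It returns object 1, as required.

The second half adds a hypothetical object 5 that has a=p but no b key at all. The EXCEPT only removes objects with a conflicting value, so object 5 is also returned. If every requested key must be present, the counting approach in the earlier answer may be the safer choice.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Objects (Id INTEGER);
CREATE TABLE Metadata (Id INTEGER, "Key" TEXT, Value TEXT);
CREATE TABLE data ("Key" TEXT, Value TEXT);
INSERT INTO Metadata VALUES
  (1, 'a', 'p'), (1, 'b', 'q'), (1, 'c', 'r'),
  (2, 'a', 'p'), (2, 'b', 'p'), (3, 'c', 'r');
INSERT INTO Objects VALUES (1), (2), (3), (4);
INSERT INTO data VALUES ('a', 'p'), ('b', 'q');
""")

except_query = """
SELECT o.Id FROM Objects o
WHERE o.Id IN (
  SELECT m.Id FROM Metadata m
  JOIN data d ON m."Key" = d."Key" AND m.Value = d.Value
  EXCEPT
  SELECT m.Id FROM Metadata m
  JOIN data d ON m."Key" = d."Key" AND m.Value <> d.Value
)
ORDER BY o.Id
"""
before = conn.execute(except_query).fetchall()
print(before)  # [(1,)]

# Hypothetical object 5: has a=p but no 'b' key at all.
conn.execute("INSERT INTO Objects VALUES (5)")
conn.execute("INSERT INTO Metadata VALUES (5, 'a', 'p')")
after = conn.execute(except_query).fetchall()
print(after)   # [(1,), (5,)]
```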

SQL Command to get all rows from a specific set of groups

Let's say I have the following table (the ID is auto-incrementing):
ID Name Serial Status
0 Pie A Fail
1 Pie A Fail
2 Pie A Pass
3 Pie B Fail
4 Pie B Pass
5 Pie C Pass
6 Pie C Fail
How can I get all the rows of each (Name, Serial) group whose last row has the status Pass?
This is the result I should get from the query. Serial C is removed because the last entry of its (Name, Serial) group is 'Fail':
ID Name Serial Status
0 Pie A Fail
1 Pie A Fail
2 Pie A Pass
3 Pie B Fail
4 Pie B Pass
Thanks!
I would try something like this (assuming SQL Server):
DECLARE @myTable AS TABLE(
ID INT,
Name VARCHAR(10),
Serial VARCHAR(1),
[Status] VARCHAR(10))
INSERT INTO @myTable VALUES(0, 'Pie', 'A', 'Fail')
INSERT INTO @myTable VALUES(1, 'Pie', 'A', 'Fail')
INSERT INTO @myTable VALUES(2, 'Pie', 'A', 'Pass')
INSERT INTO @myTable VALUES(3, 'Pie', 'B', 'Fail')
INSERT INTO @myTable VALUES(4, 'Pie', 'B', 'Pass')
INSERT INTO @myTable VALUES(5, 'Pie', 'C', 'Pass')
INSERT INTO @myTable VALUES(6, 'Pie', 'C', 'Fail')
SELECT *
FROM @myTable
WHERE Serial NOT IN
(
--Get all Serials that end with a 'Fail'
--(matches on Serial only, which is enough while there is a single Name)
SELECT T1.Serial
FROM @myTable T1
JOIN (
--Get Max ID for a serial
SELECT MAX(ID) as [ID] FROM @myTable GROUP BY Serial
) T2 ON T1.[ID] = T2.[ID]
WHERE T1.[Status] = 'Fail'
)
ORDER BY [ID]
or, if you prefer, NOT EXISTS (which often performs at least as well as NOT IN, and also correlates cleanly on both Name and Serial):
SELECT *
FROM @myTable T
WHERE NOT EXISTS
(
SELECT
T1.Serial
FROM @myTable T1
JOIN (
--Get Max ID for each (Name, Serial) group
SELECT MAX(ID) as [ID] FROM @myTable GROUP BY Name, Serial
) T2 ON T1.[ID] = T2.[ID]
WHERE
T1.[Status] = 'Fail'
AND T1.[Name] = T.[Name]
AND T1.[Serial] = T.[Serial]
)
ORDER BY [ID]
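One practical difference between the two forms is NULL handling. The sketch below shows it with Python's sqlite3 module on hypothetical tables t and bad. If the NOT IN subquery returns a NULL, x NOT IN (...) evaluates to unknown for every non-matching row, so no rows come back. NOT EXISTS is unaffected. With the sample data (no NULL serials), both queries agree.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE t (Serial TEXT);
CREATE TABLE bad (Serial TEXT);
INSERT INTO t VALUES ('A'), ('B');
INSERT INTO bad VALUES ('B'), (NULL);   -- one NULL in the subquery result
""")

# 'A' NOT IN ('B', NULL) is unknown rather than true, so 'A' is filtered out.
not_in = conn.execute(
    "SELECT Serial FROM t WHERE Serial NOT IN (SELECT Serial FROM bad)").fetchall()

# NOT EXISTS only asks whether a matching row exists; the NULL row never matches.
not_exists = conn.execute("""
SELECT Serial FROM t
WHERE NOT EXISTS (SELECT 1 FROM bad b WHERE b.Serial = t.Serial)
""").fetchall()

print(not_in)      # []
print(not_exists)  # [('A',)]
```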
We can use CTEs to improve readability by implementing this as a series of sequential steps:
1. Get the max IDs
2. Get the serials whose max-ID row has status 'Fail'
3. Remove the rows that match those serials
It would look like this:
with maxIds as ( --Get max Ids
SELECT MAX(ID) as [ID] FROM myTable GROUP BY Serial
),
serials as ( -- Get serials for max ids that have status 'Fail'
SELECT T1.Serial FROM myTable T1 JOIN maxIds ON T1.[ID] = maxIds.[ID] WHERE [Status] = 'Fail'
)
select * from myTable where serial not in (select * from serials) -- Remove serials that match
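Another option swaps in a correlated subquery instead of the MAX(ID) join. Each row is kept when the highest-ID row of its (Name, Serial) group has status 'Pass'. The sketch uses Python's sqlite3 module with the question's data. LIMIT 1 is SQLite syntax; in SQL Server the subquery would use TOP (1) instead. Matching on both Name and Serial keeps groups separate even if several Names share a Serial.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE myTable (ID INTEGER PRIMARY KEY, Name TEXT, Serial TEXT, Status TEXT);
INSERT INTO myTable VALUES
  (0, 'Pie', 'A', 'Fail'), (1, 'Pie', 'A', 'Fail'), (2, 'Pie', 'A', 'Pass'),
  (3, 'Pie', 'B', 'Fail'), (4, 'Pie', 'B', 'Pass'),
  (5, 'Pie', 'C', 'Pass'), (6, 'Pie', 'C', 'Fail');
""")

# Keep a row when the last (highest-ID) row of its (Name, Serial) group is 'Pass'.
rows = conn.execute("""
SELECT t.ID, t.Name, t.Serial, t.Status
FROM myTable t
WHERE (SELECT l.Status
       FROM myTable l
       WHERE l.Name = t.Name AND l.Serial = t.Serial
       ORDER BY l.ID DESC
       LIMIT 1) = 'Pass'
ORDER BY t.ID
""").fetchall()

for row in rows:
    print(row)
```

This prints the rows with IDs 0 through 4, matching the expected result; serial C is excluded.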