How to return boolean value in Redshift - sql

I have the following table:
id person type counted expected
1 a A 0 1
2 a A 1 0
3 a B 1 0
4 a B 2 0
5 a B 3 4
6 b C 0 0
First, I'd like to group by person and type and aggregate by summing counted and expected:
person type sum(counted) sum(expected)
a A 1 1
a B 6 4
b C 0 0
Then I'd like to add a boolean column showing whether sum(counted) equals sum(expected):
person type sum(counted) sum(expected) counted=expected
a A 1 1 true
a B 6 4 false
b C 0 0 true
Finally, I'd like to group by person and combine those booleans with a logical AND per person:
person has_false
a false
b true
Is there any way to achieve this?
I got halfway but haven't been able to go further:
select person,type,sum(counted),sum(expected)
from table
group by person,type
If anyone has a suggestion, please let me know.
Thanks

This should work. I've laid it out the way you described, but I don't think you need to sum by person and type; summing by person alone will work (for this example).
drop table if exists test;
create table test (id int, person varchar(1), typ varchar(1), counted int, expected int);
insert into test values
(1, 'a', 'A', 0, 1),
(2, 'a', 'A', 1, 0),
(3, 'a', 'B', 1, 0),
(4, 'a', 'B', 2, 0),
(5, 'a', 'B', 3, 4),
(6, 'b', 'C', 0, 0);
with grouped as (
select person, typ, sum(counted) as scount, sum(expected) as ecount, scount = ecount as equal
from test
group by person,typ
)
select person, bool_and(equal) as has_false
from grouped
group by person;
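The two-step aggregation above can be tried locally with a minimal sketch like the one below. It is an assumption-laden stand-in, not Redshift: it uses Python's built-in sqlite3 module, and because SQLite has no bool_and, it takes MIN over 0/1 flags instead, which gives the same logical AND. The names sums_match and all_match are made up for this illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE test (id int, person text, typ text, counted int, expected int);
INSERT INTO test VALUES
  (1, 'a', 'A', 0, 1), (2, 'a', 'A', 1, 0), (3, 'a', 'B', 1, 0),
  (4, 'a', 'B', 2, 0), (5, 'a', 'B', 3, 4), (6, 'b', 'C', 0, 0);
""")

# Step 1: one row per (person, typ) with a 0/1 flag for "sums match".
grouped = conn.execute("""
    SELECT person, typ, SUM(counted), SUM(expected),
           SUM(counted) = SUM(expected) AS sums_match
    FROM test
    GROUP BY person, typ
    ORDER BY person, typ
""").fetchall()

# Step 2: MIN over 0/1 flags is 1 only when every flag is 1,
# which stands in for bool_and (not available in SQLite).
per_person = conn.execute("""
    WITH grouped AS (
        SELECT person, typ, SUM(counted) = SUM(expected) AS sums_match
        FROM test
        GROUP BY person, typ
    )
    SELECT person, MIN(sums_match) AS all_match
    FROM grouped
    GROUP BY person
    ORDER BY person
""").fetchall()

print(grouped)     # [('a', 'A', 1, 1, 1), ('a', 'B', 6, 4, 0), ('b', 'C', 0, 0, 1)]
print(per_person)  # [('a', 0), ('b', 1)]
```

Here 0 and 1 play the role of false and true, so person a comes out false and person b true, matching the table in the question.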

Related

SQL query to fetch distinct records

Can someone help me with this SQL query on Postgres? I have to write it but just can't come up with it. I have tried my best to simplify the problem, which originally involves 1 million records and more constraints. I know this looks easy, but I am still unable to solve it:
Table_name = t
Column_1_name = id
Column_2_name = st
Column_1_elements = [1,1,1,1,2,2,2,3,3]
Column_2_elements = [a,b,c,d,a,c,d,b,d]
Now I want to print the distinct ids that do not have both 'a' and 'b' among their corresponding st values.
For the example above, the output should be [2,3], as 2 does not have a corresponding 'b' and 3 does not have 'a' (3 does not have 'c' either, but we are not concerned about 'c'). id=1 is not returned because it is related to both 'a' and 'b'.
Let me know if you need more clarity.
Thanks in advance for helping.
edit1: The number of elements for id = 1,2,3 could be anything. I just want those ids whose corresponding st values do not "contain" both 'a' and 'b'.
Say there is an id=4 which has just one st, 'r', and an id=5 which contains 'a','b','c','d','e','f','k','z'.
Then we want id=4 in the output as well, since it does not contain 'a' or 'b'.
You might need to adjust the syntax a little depending on your SQL engine, but this is a working solution in Google BigQuery:
with temp as (
select 1 as id, 'a' as st union all
select 1 as id, 'b' as st union all
select 1 as id, 'c' as st union all
select 1 as id, 'd' as st union all
select 2 as id, 'a' as st union all
select 2 as id, 'c' as st union all
select 2 as id, 'd' as st union all
select 3 as id, 'b' as st union all
select 3 as id, 'd' as st union all
select 4 as id, 'e' as st union all
select 5 as id, 'g' as st union all
select 5 as id, 'h' as st
)
-- add 2 columns for is_a and is_b flags
, temp2 as (
select *
, case when st = 'a' then 1 else 0 end as is_a
, case when st = 'b' then 1 else 0 end as is_b
from temp
)
-- IDs that have both the flags as 1 should be filtered out (like ID = 1)
select id
from temp2
group by 1
having max(is_a) + max(is_b) < 2
This solution takes care of the case you mentioned with ID 4. Let me know if this works for you.
See if this works:
create table t (id integer, st varchar);
insert into t values (1, 'a'), (1, 'b'), (1, 'c'), (1, 'd'), (2, 'a'), (2, 'c'), (2, 'd'), (3, 'b'), (3, 'd'), (4, 'r');
insert into t values (5, 'a'), (5, 'b'), (5, 'c'), (5, 'd'), (5, 'e'), (5, 'f'), (5, 'k'), (5, 'z');
select id, array['a', 'b'] <@ array_agg(st)::text[] as tf from t group by id;
id | tf
----+----
3 | f
5 | t
4 | f
2 | f
1 | t
select * from (select id, array['a', 'b'] <@ array_agg(st)::text[] as tf from t group by id) as agg where agg.tf = 'f';
id | tf
----+----
3 | f
4 | f
2 | f
In the first select query, array_agg(st) aggregates all the st values for an id via the group by id. array['a', 'b'] <@ array_agg(st)::text[] then asks whether a and b are both contained in the aggregated array.
The query is then turned into a sub-query, and the outer query selects those rows where tf is 'f' (false), in other words the ids that did not have both a and b among their aggregated values.
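The containment test described above is a subset check: every element of the left array must appear in the aggregated array. As a rough illustration outside the database, the same logic can be sketched in Python with sets. The variable names (required, st_by_id, missing_one) are invented for this sketch, and the data is the sample from this answer.

```python
from collections import defaultdict

rows = [
    (1, "a"), (1, "b"), (1, "c"), (1, "d"),
    (2, "a"), (2, "c"), (2, "d"),
    (3, "b"), (3, "d"),
    (4, "r"),
    (5, "a"), (5, "b"), (5, "c"), (5, "d"),
    (5, "e"), (5, "f"), (5, "k"), (5, "z"),
]

required = {"a", "b"}

# Counterpart of array_agg(st) ... group by id: collect st values per id.
st_by_id = defaultdict(set)
for id_, st in rows:
    st_by_id[id_].add(st)

# Counterpart of the containment operator: is `required` a subset of the group?
tf = {id_: required <= sts for id_, sts in st_by_id.items()}

# Counterpart of the outer query: keep ids where the check is false.
missing_one = sorted(id_ for id_, ok in tf.items() if not ok)
print(missing_one)  # [2, 3, 4]
```

Ids 1 and 5 hold both values, so only 2, 3, and 4 remain, the same rows as the second query output above.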

Match rows that include one of each at least once in SQL

I have a users table:
ID Name OID TypeID
1 a 1 1
2 b 1 2
3 c 1 3
4 d 2 1
5 e 2 1
6 f 2 2
7 g 3 2
8 h 3 2
9 i 3 2
For this table, I want to filter by OID and TypeID so that I only get the rows whose OID group includes all of 1, 2, and 3 in TypeID.
For example, where OID=1 we have 1, 2, and 3 in TypeID, so those rows should be returned. I shouldn't get the rows with IDs 4-6, because although they share the same OID, their TypeID values do not include all of 1, 2, and 3.
You can do :
select oid
from users t
where typeid in (1,2,3)
group by oid
having count(*) = 3;
If an oid can contain duplicate typeid values, use count(distinct typeid) instead.
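The sample data in the question does contain duplicate TypeID values per OID, so the difference between the two counts is easy to see. The sketch below is only an illustration using Python's sqlite3 module (not the asker's engine), with the table named users as in the question; naive_oids and complete_oids are made-up variable names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (ID int, Name text, OID int, TypeID int);
INSERT INTO users VALUES
  (1, 'a', 1, 1), (2, 'b', 1, 2), (3, 'c', 1, 3),
  (4, 'd', 2, 1), (5, 'e', 2, 1), (6, 'f', 2, 2),
  (7, 'g', 3, 2), (8, 'h', 3, 2), (9, 'i', 3, 2);
""")

# count(*) counts rows, so an OID with three rows of repeated types also passes.
naive_oids = [r[0] for r in conn.execute("""
    SELECT OID FROM users
    WHERE TypeID IN (1, 2, 3)
    GROUP BY OID
    HAVING COUNT(*) = 3
    ORDER BY OID
""")]

# count(distinct ...) counts different types, so only complete groups pass.
complete_oids = [r[0] for r in conn.execute("""
    SELECT OID FROM users
    WHERE TypeID IN (1, 2, 3)
    GROUP BY OID
    HAVING COUNT(DISTINCT TypeID) = 3
    ORDER BY OID
""")]

print(naive_oids)     # [1, 2, 3]
print(complete_oids)  # [1]
```

Every OID here happens to have exactly three rows, which is why the plain count(*) version lets all of them through.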
You could use exists:
select oid from users t1
where exists ( select 1 from users t2 where t1.oid=t2.oid
group by t2.oid
having count(distinct t2.TypeID)=3
)
This assumes TypeID only takes the values 1, 2, and 3.
If you are using SQL Server, you can try this.
DECLARE @SampleData TABLE(ID INT, Name VARCHAR(5), OID INT, TypeID INT)
INSERT INTO @SampleData VALUES
(1 , 'a', 1, 1),
(2 , 'b', 1, 2),
(3 , 'c', 1, 3),
(4 , 'd', 2, 1),
(5 , 'e', 2, 1),
(6 , 'f', 2, 2),
(7 , 'g', 3, 2),
(8 , 'h', 3, 2),
(9 , 'i', 3, 2)
SELECT * FROM @SampleData D
WHERE NOT EXISTS (
SELECT * FROM @SampleData D1
RIGHT JOIN (VALUES (1),(2),(3)) T(TypeID) ON D1.TypeID = T.TypeID
AND D.OID = D1.OID
WHERE D1.TypeID IS NULL
)
Result:
ID Name OID TypeID
----------- ----- ----------- -----------
1 a 1 1
2 b 1 2
3 c 1 3

I want to output a list of every Case_Number with Code 1 that has a higher UniqueID value than its Code 2 counterpart's UniqueID value

I have a table which looks something like this:
Case_Number | Code | UniqueID
a 1 1372
a 2 1352
a 3 1325
b 1 1642
b 2 1651
b 3 1623
c 1 1743
c 2 1739
c 3 1720
... ... ...
From this table I want to output a list of every Case_Number where the UniqueID value of Code 1 is higher than the UniqueID value of Code 2 (ignoring the UniqueID value of Code 3, or any other Code x that might be in the table). This means that if the UniqueID value of Code 2 is higher than that of Code 1, as is the case for Case_Number b in the example above, that case should not show up in the list.
So, querying the above table would result in this:
Case_Number | Code | UniqueID
a 1 1372
c 1 1743
Hmmm . . . You seem to want:
select t.*
from t
where t.code = 1 and
t.uniqueid > (select max(t2.uniqueid)
from t t2
where t2.case_number = t.case_number and t2.code = 2
);
The max() in the subquery is simply to handle the case where there is more than one matching value.
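To see how the correlated subquery behaves, here is a small sketch using Python's sqlite3 module as a stand-in engine. The table and column names follow the answer above; the extra case d, which has a Code 1 row but no Code 2 row, is hypothetical and is not in the question's data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE t (case_number text, code int, uniqueid int);
INSERT INTO t VALUES
  ('a', 1, 1372), ('a', 2, 1352), ('a', 3, 1325),
  ('b', 1, 1642), ('b', 2, 1651), ('b', 3, 1623),
  ('c', 1, 1743), ('c', 2, 1739), ('c', 3, 1720),
  ('d', 1, 1800);  -- hypothetical case with no code 2 row
""")

result = conn.execute("""
    SELECT t.case_number, t.code, t.uniqueid
    FROM t
    WHERE t.code = 1
      AND t.uniqueid > (SELECT MAX(t2.uniqueid)
                        FROM t t2
                        WHERE t2.case_number = t.case_number
                          AND t2.code = 2)
    ORDER BY t.case_number
""").fetchall()

print(result)  # [('a', 1, 1372), ('c', 1, 1743)]
```

Case d drops out because max() over no rows is NULL and a comparison with NULL is not true, so cases without a Code 2 row are excluded by this form of the query.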
The query below gives you the expected result
CREATE TABLE CaseTab
(Case_Number VARCHAR(10),
Code INT,
UniqueID INT);
INSERT INTO CaseTab VALUES ('a', 1, 1372);
INSERT INTO CaseTab VALUES ('a', 2, 1352);
INSERT INTO CaseTab VALUES ('a', 3, 1325);
INSERT INTO CaseTab VALUES ('b', 1, 1642);
INSERT INTO CaseTab VALUES ('b', 2, 1651);
INSERT INTO CaseTab VALUES ('b', 3, 1623);
INSERT INTO CaseTab VALUES ('c', 1, 1743);
INSERT INTO CaseTab VALUES ('c', 2, 1739);
INSERT INTO CaseTab VALUES ('c', 3, 1720);
WITH v_code_gt_1 AS
(SELECT Case_Number, MAX(UniqueID) AS UniqueID
FROM CaseTab
WHERE Code > 1
GROUP BY Case_Number)
SELECT c1.Case_Number, c1.UniqueID
FROM CaseTab c1 JOIN
v_code_gt_1 c2
ON (c1.Case_Number = c2.Case_Number)
WHERE c1.UniqueID > c2.UniqueID
AND c1.Code = 1;
Basically the query gets the max UniqueID for all cases where code is greater than 1 and compares against the Unique ID for Code 1.
You haven't stated whether there can be cases with code = 1, but no other codes. If so, use LEFT JOIN as below.
WITH v_code_gt_1 AS
(SELECT Case_Number, MAX(UniqueID) AS UniqueID
FROM CaseTab
WHERE Code > 1
GROUP BY Case_Number)
SELECT c1.Case_Number, c1.UniqueID
FROM CaseTab c1 LEFT JOIN
v_code_gt_1 c2
ON (c1.Case_Number = c2.Case_Number)
WHERE c1.UniqueID > ISNULL(c2.UniqueID, 0)
AND c1.Code = 1;

SQL aggregation on the latest output per machine for each time

I have the following table:
ID machine app output time
1 1 A 12 1
2 1 B 15 1
3 1 B 8 3
4 1 A 11 4
5 2 C 14 4
6 2 D 17 4
For each app I want to get the latest output available at each point in time, and then aggregate these results per machine using AVG.
So for the table above, the data before aggregation should be:
time machine app latest
1 1 A 12
1 1 B 15
3 1 A 12
3 1 B 8
4 1 A 11
4 1 B 8
4 2 C 14
4 2 D 17
And the aggregated result should be:
time machine avg
1 1 =(12+15)/2
3 1 =(12+8)/2
4 1 =(11+8)/2
4 2 =(14+17)/2
What is the correct way to approach this problem?
It is not as simple as I thought it would be, but I think it works just as you want. I renamed the time column to ts, like this:
CREATE TABLE Table1
(ID int, machine int, app char(1), output int, ts int)
;
INSERT INTO Table1
(ID,machine,app,output, ts)
VALUES
(1, 1, 'A', 12, 1),
(2, 1, 'B', 15, 1),
(3, 1, 'B', 8, 3),
(4, 1, 'A', 11, 4),
(5, 2, 'C', 14, 4),
(6, 2, 'D', 17, 4)
;
And here is the query:
WITH
times as
(
SELECT distinct ts FROM Table1
),
machine_apps as
(
SELECT DISTINCT machine,app FROM Table1
),
grid as
(
SELECT
ts,machine,app
FROM
times
CROSS JOIN machine_apps
),
last_outputs as
(
SELECT
g.ts,
g.app,
g.machine,
max(t.ts) as last_time
FROM
grid g
JOIN Table1 t ON (t.app = g.app AND t.machine = g.machine AND t.ts <= g.ts)
GROUP BY
g.ts,
g.app,
g.machine
)
SELECT
l.ts,
l.machine,
AVG(t.output) as avg
FROM
last_outputs l
LEFT JOIN Table1 t ON (t.app = l.app AND t.machine = l.machine AND t.ts = l.last_time)
GROUP BY
l.ts,
l.machine
ORDER BY
l.ts,
l.machine
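As a cross-check of the approach, the following sketch runs a condensed variant of the same idea with Python's sqlite3 module. It is not the query above verbatim: it assumes the same Table1 data, builds the same time-by-app grid, and swaps the last_outputs step for a correlated subquery that looks up the latest ts at or before each grid time. The alias avg_output is made up for this sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Table1 (ID int, machine int, app text, output int, ts int);
INSERT INTO Table1 VALUES
  (1, 1, 'A', 12, 1), (2, 1, 'B', 15, 1), (3, 1, 'B', 8, 3),
  (4, 1, 'A', 11, 4), (5, 2, 'C', 14, 4), (6, 2, 'D', 17, 4);
""")

rows = conn.execute("""
    WITH grid AS (
        -- every observed time paired with every (machine, app)
        SELECT times.ts, ma.machine, ma.app
        FROM (SELECT DISTINCT ts FROM Table1) AS times
        CROSS JOIN (SELECT DISTINCT machine, app FROM Table1) AS ma
    )
    SELECT g.ts, g.machine, AVG(t.output) AS avg_output
    FROM grid g
    JOIN Table1 t
      ON t.machine = g.machine
     AND t.app = g.app
     -- latest reading for this app at or before the grid time;
     -- NULL (no reading yet) drops the grid cell
     AND t.ts = (SELECT MAX(t2.ts)
                 FROM Table1 t2
                 WHERE t2.machine = g.machine
                   AND t2.app = g.app
                   AND t2.ts <= g.ts)
    GROUP BY g.ts, g.machine
    ORDER BY g.ts, g.machine
""").fetchall()

print(rows)  # [(1, 1, 13.5), (3, 1, 10.0), (4, 1, 9.5), (4, 2, 15.5)]
```

The four averages correspond to (12+15)/2, (12+8)/2, (11+8)/2 and (14+17)/2 from the question.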

Fetch duplicate records as well as Unique record in ssis

I have the table shown below (ID and LOC are the primary keys):
ID LOC RNK NBR1 NBR2
1 2 A 10 b --->
3 4 A 10 b --->
5 6 A 11 C
8 2 A 12 D
6 3 A 10 b --->
So here I have to fetch the duplicate records according to NBR1 and NBR2. It should fetch all of the matching records (marked --->), not only the repeated copies.
If I understood your question correctly, you can do it with a subquery:
CREATE TABLE #Test (ID int, LOC int, RNK char(1), NBR1 int, NBR2 char(1) )
INSERT INTO #Test VALUES
(1, 2, 'A', 10, 'b'),
(3, 4, 'A', 10, 'b'),
(5, 6, 'A', 11, 'C'),
(8, 2, 'A', 12, 'D'),
(6, 3, 'A', 10, 'b')
SELECT *
FROM #Test t1
WHERE EXISTS
(SELECT 1
FROM #Test t2
WHERE t1.NBR1 = t2.NBR1
AND t1.NBR2 = t2.NBR2
GROUP BY NBR1, NBR2
HAVING COUNT(1) > 1)
You can also use this, but it will cost more. Rows where ROWSCOUNT is greater than 1 are duplicates, and rows where it is 1 are unique records.
With Temp As
(
Select ID,LOC,RNK,NBR1,NBR2,COUNT(*) OVER (PARTITION BY NBR1, NBR2) AS ROWSCOUNT FROM <<TABLE_NAME>>
)
Select * from Temp
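The idea of tagging each row with the size of its (NBR1, NBR2) group, so that duplicates and unique rows can be separated afterwards, can be sketched as follows. This is a portable illustration with Python's sqlite3 module that joins against a grouped count rather than relying on window functions; the table name Test and the column group_size are assumptions made for the sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Test (ID int, LOC int, RNK text, NBR1 int, NBR2 text);
INSERT INTO Test VALUES
  (1, 2, 'A', 10, 'b'), (3, 4, 'A', 10, 'b'), (5, 6, 'A', 11, 'C'),
  (8, 2, 'A', 12, 'D'), (6, 3, 'A', 10, 'b');
""")

# Attach to every row the number of rows sharing its (NBR1, NBR2) pair.
rows = conn.execute("""
    SELECT t.ID, t.LOC, t.NBR1, t.NBR2, g.group_size
    FROM Test t
    JOIN (SELECT NBR1, NBR2, COUNT(*) AS group_size
          FROM Test
          GROUP BY NBR1, NBR2) g
      ON g.NBR1 = t.NBR1 AND g.NBR2 = t.NBR2
    ORDER BY t.ID
""").fetchall()

duplicate_ids = [r[0] for r in rows if r[4] > 1]  # rows in a group of 2 or more
unique_ids = [r[0] for r in rows if r[4] == 1]    # rows that stand alone

print(duplicate_ids)  # [1, 3, 6]
print(unique_ids)     # [5, 8]
```

Splitting on group_size gives both sets from one pass over the result, which fits an SSIS flow that routes duplicates and unique rows to different outputs.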