Pivot and Sum in Amazon Redshift - sql

I have the following tables:
table1
id name
1 A
3 B
table2
id label value
1 tag a
1 tag b
1 time 10
1 time 20
1 score 20
2 tag a
2 time 30
2 score 40
3 tag b
3 time 50
3 time 55
3 score 60
First, I'd like to join table2 as follows:
select *
from table1 left join table2 using(id)
where label in ('tag')
id name tag
1 A a
1 A b
3 B b
and then join table2 on id again, pivot the time and score labels into columns, and sum them:
id name tag time score
1 A a 10 20
1 A b 10 20
3 B b 50 60
I guess this is quite complicated. Is there any way to achieve it?
Redshift does not seem to offer a way to pivot them.
Thanks.

This looks to be a pivot query. I think this does what you are looking for:
create table table1 (id int, name varchar(16));
insert into table1 values
(1, 'A'),
(3, 'B')
;
create table table2 (id int, label varchar(16), value varchar(16));
insert into table2 values
(1, 'tag', 'a'),
(1, 'tag', 'b'),
(1, 'time', '10'),
(1, 'score', '20'),
(2, 'tag', 'a'),
(2, 'time', '30'),
(2, 'score', '40'),
(3, 'tag', 'b'),
(3, 'time', '50'),
(3, 'score', '60')
;
select t2.id, a.name, a.tag_value,
sum(decode(label, 'time', value::int)) as total_time,
sum(decode(label, 'score', value::int)) as total_score
from table2 t2
join (
select id, name, value as tag_value
from table1 t1 left join table2 t2 using(id)
where t2.label in ('tag')
) a
on t2.id = a.id
group by 1, 2, 3
order by 1, 2, 3
;
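The pivot above is conditional aggregation: one SUM per label, each counting only the rows that carry that label. As a rough, runnable sketch, the same idea can be tried locally with Python's built-in sqlite3 module. SQLite is only a stand-in for Redshift here, CASE replaces DECODE, CAST replaces the `::int` shorthand, and the variable name pivot_rows is purely illustrative.

```python
import sqlite3

# SQLite is an assumed stand-in for Redshift; the data is the answer's sample data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (id INTEGER, name TEXT);
    INSERT INTO table1 VALUES (1, 'A'), (3, 'B');
    CREATE TABLE table2 (id INTEGER, label TEXT, value TEXT);
    INSERT INTO table2 VALUES
        (1, 'tag', 'a'), (1, 'tag', 'b'), (1, 'time', '10'), (1, 'score', '20'),
        (2, 'tag', 'a'), (2, 'time', '30'), (2, 'score', '40'),
        (3, 'tag', 'b'), (3, 'time', '50'), (3, 'score', '60');
""")

# One SUM(CASE ...) per label turns label rows into columns.
pivot_rows = conn.execute("""
    SELECT a.id, a.name, a.tag_value,
           SUM(CASE WHEN t2.label = 'time'  THEN CAST(t2.value AS INTEGER) END) AS total_time,
           SUM(CASE WHEN t2.label = 'score' THEN CAST(t2.value AS INTEGER) END) AS total_score
    FROM table2 AS t2
    JOIN (
        SELECT t1.id, t1.name, tg.value AS tag_value
        FROM table1 AS t1
        JOIN table2 AS tg ON tg.id = t1.id
        WHERE tg.label = 'tag'
    ) AS a ON t2.id = a.id
    GROUP BY a.id, a.name, a.tag_value
    ORDER BY a.id, a.name, a.tag_value
""").fetchall()

for row in pivot_rows:
    print(row)
```

Each (id, name, tag) group sees every table2 row for that id, so the time and score totals repeat for each tag of the same id.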

Related

SQL query to fetch distinct records

Can someone help me with a Postgres query that I just can't come up with? I have simplified the problem as far as I can (the real table has over a million records and more constraints). I know it looks easy, but I still haven't been able to solve it:
Table_name = t
Column_1_name = id
Column_2_name = st
Column_1_elements = [1,1,1,1,2,2,2,3,3]
Column_2_elements = [a,b,c,d,a,c,d,b,d]
Now I want to print the distinct ids that are missing 'a' or 'b' among their corresponding st values.
For the example above, the output should be [2,3]: 2 has no corresponding 'b' and 3 has no 'a' (3 has no 'c' either, but we are not concerned with 'c'). id=1 is not returned because it is related to both 'a' and 'b'.
Let me know if you need more clarity.
Thanks in advance for helping.
edit1: The number of elements for id = 1,2,3 could be anything. I just want those ids whose corresponding st values do not "contain" 'a' or 'b'.
Say there is an id=4 with a single st, 'r', and an id=5 with 'a','b','c','d','e','f','k','z'.
Then we want id=4 in the output as well, since it does not contain 'a' or 'b'.
You might need to adjust the syntax a little for your SQL engine, but this is a working solution in Google BigQuery:
with temp as (
select 1 as id, 'a' as st union all
select 1 as id, 'b' as st union all
select 1 as id, 'c' as st union all
select 1 as id, 'd' as st union all
select 2 as id, 'a' as st union all
select 2 as id, 'c' as st union all
select 2 as id, 'd' as st union all
select 3 as id, 'b' as st union all
select 3 as id, 'd' as st union all
select 4 as id, 'e' as st union all
select 5 as id, 'g' as st union all
select 5 as id, 'h' as st
)
-- add 2 columns for is_a and is_b flags
, temp2 as (
select *
, case when st = 'a' then 1 else 0 end as is_a
, case when st = 'b' then 1 else 0 end as is_b
from temp
)
-- IDs that have both the flags as 1 should be filtered out (like ID = 1)
select id
from temp2
group by 1
having max(is_a) + max(is_b) < 2
This solution also handles the case you mentioned with ID 4. Let me know if this works for you.
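To check the flag-and-HAVING idea against the data from the question (including the id=4 and id=5 rows from the edit), here is a small sketch using Python's sqlite3 module. SQLite is an assumed stand-in for BigQuery or Postgres, and the names pairs and ids_missing_a_or_b are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER, st TEXT)")

# Sample data from the question, plus id=4 ('r') and id=5 from the edit.
pairs = (
    [(1, s) for s in "abcd"]
    + [(2, s) for s in "acd"]
    + [(3, s) for s in "bd"]
    + [(4, "r")]
    + [(5, s) for s in "abcdefkz"]
)
conn.executemany("INSERT INTO t VALUES (?, ?)", pairs)

# An id is kept unless it has both an 'a' row and a 'b' row.
ids_missing_a_or_b = [
    row[0]
    for row in conn.execute("""
        SELECT id
        FROM t
        GROUP BY id
        HAVING MAX(CASE WHEN st = 'a' THEN 1 ELSE 0 END)
             + MAX(CASE WHEN st = 'b' THEN 1 ELSE 0 END) < 2
        ORDER BY id
    """)
]
print(ids_missing_a_or_b)
```

Folding the two flags into the HAVING clause avoids the intermediate CTE; the logic is otherwise the same as in the answer.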
See if this works:
create table t (id integer, st varchar);
insert into t values (1, 'a'), (1, 'b'), (1, 'c'), (1, 'd'), (2, 'a'), (2, 'c'), (2, 'd'), (3, 'b'), (3, 'd'), (4, 'r');
insert into t values (5, 'a'), (5, 'b'), (5, 'c'), (5, 'd'), (5, 'e'), (5, 'f'), (5, 'k'), (5, 'z');
select id, array['a', 'b'] <@ array_agg(st)::text[] as tf from t group by id;
id | tf
----+----
3 | f
5 | t
4 | f
2 | f
1 | t
select * from (select id, array['a', 'b'] <@ array_agg(st)::text[] as tf from t group by id) as agg where agg.tf = 'f';
id | tf
----+----
3 | f
4 | f
2 | f
In the first select query, array_agg(st) aggregates all the st values for an id via the group by id. array['a', 'b'] <@ array_agg(st)::text[] then asks whether a and b are both in the aggregated array.
The query is then turned into a sub-query, and the outer query selects the rows that were 'f' (false), in other words the ids that did not have both a and b among their aggregated values.
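The containment test described above (PostgreSQL's `<@`, "is contained by") behaves like a subset check. A minimal Python sketch of the same logic, with sets standing in for the aggregated arrays, may make that easier to see. The names st_by_id, has_both and ids_without_both are illustrative only.

```python
from collections import defaultdict

rows = (
    [(1, s) for s in "abcd"]
    + [(2, s) for s in "acd"]
    + [(3, s) for s in "bd"]
    + [(4, "r")]
    + [(5, s) for s in "abcdefkz"]
)

# Equivalent of array_agg(st) ... group by id: collect the st values per id.
st_by_id = defaultdict(set)
for id_, st in rows:
    st_by_id[id_].add(st)

# Equivalent of array['a', 'b'] <@ array_agg(st): is {'a', 'b'} a subset?
required = {"a", "b"}
has_both = {id_: required <= sts for id_, sts in st_by_id.items()}

# Equivalent of the outer query keeping only the 'f' rows.
ids_without_both = sorted(id_ for id_, both in has_both.items() if not both)
print(ids_without_both)
```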

Remove duplicate value in different categories in same table SQL but keep the first category value

Let's say I have a table with id and category like the table below
D_id | D_category
-----------------
1 | A
2 | A
3 | A
1 | B
2 | B
4 | B
5 | B
1 | C
2 | C
4 | C
5 | C
6 | C
The rules are:
values in category A should not appear in category B or category C
values in category B should not appear in category C
The end result should be like this
D_id | D_category
-----------------
1 | A
2 | A
3 | A
4 | B
5 | B
6 | C
Below is a solution that works, but it is not ideal. Can anyone suggest a better one that scales to more categories? With more categories the same rule should apply: values from earlier categories should not appear in any later category.
DECLARE @A TABLE(
D_id INT NOT NULL,
D_category VARCHAR(MAX));
INSERT INTO @A(D_id,D_category)
VALUES (1, 'A'),
(2, 'A'),
(3, 'A'),
(1, 'B'),
(2, 'B'),
(4, 'B'),
(5, 'B'),
(1, 'C'),
(2, 'C'),
(4, 'C'),
(5, 'C'),
(6, 'C')
DELETE t
FROM @A t
WHERE t.D_category = 'B' AND EXISTS (SELECT 1 FROM @A t2 WHERE t2.D_category = 'A' and t.D_id = t2.D_id)
DELETE t
FROM @A t
WHERE t.D_category = 'C' AND EXISTS (SELECT 1 FROM @A t2 WHERE t2.D_category = 'B' and t.D_id = t2.D_id)
DELETE t
FROM @A t
WHERE t.D_category = 'C' AND EXISTS (SELECT 1 FROM @A t2 WHERE t2.D_category = 'A' and t.D_id = t2.D_id)
select * from @A
Just check that the specified record doesn't exist earlier in the sequence.
select *
from @A A1
where not exists (
select 1
from @A A2
where A2.D_id = A1.D_id
and A2.D_category < A1.D_category
)
Or just make use of row_number():
select *
from
(
select *, r = row_number() over (partition by D_id order by D_category)
from @A
) a
where a.r = 1
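For anyone who wants to try the "does not exist earlier in the sequence" check without a SQL Server instance, here is a sketch using Python's sqlite3 module. SQLite is an assumed stand-in, and the table name categories and the variable kept are made up for the example. The query keeps a row only when no row with the same D_id has a smaller D_category, which generalizes to any number of categories as long as they sort in the intended order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE categories (D_id INTEGER NOT NULL, D_category TEXT)")
conn.executemany(
    "INSERT INTO categories VALUES (?, ?)",
    [
        (1, "A"), (2, "A"), (3, "A"),
        (1, "B"), (2, "B"), (4, "B"), (5, "B"),
        (1, "C"), (2, "C"), (4, "C"), (5, "C"), (6, "C"),
    ],
)

# Keep a row only if the same D_id does not appear in an earlier category.
kept = conn.execute("""
    SELECT c1.D_id, c1.D_category
    FROM categories AS c1
    WHERE NOT EXISTS (
        SELECT 1
        FROM categories AS c2
        WHERE c2.D_id = c1.D_id
          AND c2.D_category < c1.D_category
    )
    ORDER BY c1.D_category, c1.D_id
""").fetchall()

for row in kept:
    print(row)
```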
Delete using the join syntax:
delete a
from my_table a
join my_table b on a.D_id = b.D_id
and a.D_category > b.D_category

SQL sort and last record

This is my first post, so pardon me if my question is not in the appropriate place or is poorly titled.
I have a table like this
ID DATE Cat VALUE
-------------------------
1 07/07/2018 A 100
2 07/07/2018 A 200
3 07/07/2018 B 300
4 07/07/2018 B 400
5 07/07/2018 C 500
6 07/07/2018 C 600
7 08/07/2018 A 700
8 08/07/2018 A 800
9 08/07/2018 B 900
10 08/07/2018 B 110
11 08/07/2018 C 120
I would like to return the distinct category, the sum of the values, and the value of the last record in each category, something like this:
Cat sumValue lastrecord
--------------------------
A 1800 800
B 1710 110
C 1220 120
Is it possible to do this in a single query?
Thanks.
I am able to find the sum:
SELECT cat, SUM(value) FROM table GROUP BY cat;
and to find the last ID (autonumber key) using MAX:
SELECT MAX(ID), cat FROM table GROUP BY cat;
but I just can't get the value for the last record.
SELECT
t.cat,
SUM(t.value) as sumValue,
(
SELECT
t3.value
FROM
`table` t3
WHERE
t3.id = MAX(t2.id)
) as lastrecord
FROM
`table` t
JOIN
`table` t2 ON t.id = t2.id
GROUP BY
cat
EDIT shorter Version:
SELECT
t.cat,
SUM(t.value) as sumValue,
(SELECT value FROM `table` t2 WHERE t2.id = MAX(t.id)) lastValue
FROM
`table` t
GROUP BY
t.cat
This should do it
declare @t table (id int, cat char, value int);
insert into @t values
(1, 'A', 100),
(2, 'A', 200),
(3, 'B', 300),
(4, 'B', 400),
(5, 'C', 500),
(6, 'C', 600),
(7, 'A', 700),
(8, 'A', 800),
(9, 'B', 900),
(10, 'B', 110),
(11, 'C', 120);
select cat, value, sum
from
( select *
, sum(value) over (partition by cat) as sum
, ROW_NUMBER() over (partition by cat order by id desc) as rn
from @t
) tt
where tt.rn = 1
I hope you're looking for something like this.
Replace the table name with your own.
SELECT A.id,
A.cat,
A.date,
A.total_value,
A1.value
FROM (SELECT Max(id) AS id,
cat,
Max(date) AS Date,
Sum(value) AS Total_Value
FROM tbl_sof
GROUP BY cat) AS A
INNER JOIN tbl_sof A1
ON A.id = A1.id
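The last approach (aggregate per category, then join back on the maximum id to pick up the last value) can be checked against the sample data with a short sqlite3 sketch in Python. SQLite is an assumed stand-in for the engines used in the answers, the date column is left out because it is not needed for the result, and the names records and summary are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE records (id INTEGER PRIMARY KEY, cat TEXT, value INTEGER)")
conn.executemany(
    "INSERT INTO records VALUES (?, ?, ?)",
    [
        (1, "A", 100), (2, "A", 200), (3, "B", 300), (4, "B", 400),
        (5, "C", 500), (6, "C", 600), (7, "A", 700), (8, "A", 800),
        (9, "B", 900), (10, "B", 110), (11, "C", 120),
    ],
)

# Step 1 (subquery): sum and highest id per category.
# Step 2 (join): look up the value stored on that highest id.
summary = conn.execute("""
    SELECT g.cat, g.sum_value, r.value AS last_value
    FROM (
        SELECT cat, SUM(value) AS sum_value, MAX(id) AS last_id
        FROM records
        GROUP BY cat
    ) AS g
    JOIN records AS r ON r.id = g.last_id
    ORDER BY g.cat
""").fetchall()

for row in summary:
    print(row)
```

This assumes, as the question does, that the highest id in a category marks its last record.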

how to do partitioning on VARCHAR column

DECLARE @Table1 TABLE
(ID int, STATUS varchar(1))
;
INSERT INTO @Table1
(ID, STATUS)
VALUES
(1, 'A'),
(1, 'A'),
(1, 'A'),
(1, 'B'),
(1, 'A'),
(2, 'C'),
(2, 'C')
;
Script:
Select *, ROW_NUMBER() OVER (PARTITION BY STATUS ORDER BY (SELECT NULL)) RN from @Table1
Result set I am getting:
ID STATUS RN
1 A 1
1 A 2
1 A 3
1 A 4
1 B 1
2 C 1
2 C 2
Desired output:
ID STATUS RN
1 A 1
1 A 2
1 A 3
1 B 1
1 A 1
2 C 1
2 C 2
Try this
DECLARE @Table1 TABLE
(ID int, STATUS varchar(1));
INSERT INTO @Table1
(ID, STATUS)
VALUES
(1, 'A'),
(1, 'A'),
(1, 'A'),
(1, 'B'),
(1, 'A'),
(2, 'C'),
(2, 'C');
;WITH Tmp
AS
(
SELECT *, ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) AS RowNumber FROM @Table1
)
SELECT
A.ID ,
A.STATUS ,
ROW_NUMBER() OVER (PARTITION BY A.STATUS, (A.RowNumber - A.RN) ORDER BY (SELECT NULL)) AS RN
FROM
(
Select *, ROW_NUMBER() OVER(PARTITION BY STATUS ORDER BY RowNumber) AS RN from tmp
) A
ORDER BY
A.RowNumber
Output:
ID STATUS RN
----------- ------ ------
1 A 1
1 A 2
1 A 3
1 B 1
1 A 1
2 C 1
2 C 2
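The query above relies on the fact that, within one consecutive run of the same STATUS, the overall row number minus the per-status row number stays constant. A small pure-Python sketch of that difference trick, assuming the rows arrive in insertion order (which the SQL version does not strictly guarantee with ORDER BY (SELECT NULL)), could look like this. All names are illustrative.

```python
from collections import defaultdict

rows = [(1, "A"), (1, "A"), (1, "A"), (1, "B"), (1, "A"), (2, "C"), (2, "C")]

per_status_count = defaultdict(int)   # plays the role of RN partitioned by STATUS
island_count = defaultdict(int)       # rows seen so far in each island

numbered = []
for row_number, (id_, status) in enumerate(rows, start=1):
    per_status_count[status] += 1
    # Constant for every row of one consecutive run of the same status.
    island = (status, row_number - per_status_count[status])
    island_count[island] += 1
    numbered.append((id_, status, island_count[island]))

for row in numbered:
    print(row)
```

The second run of 'A' gets a different difference value than the first, so its numbering restarts at 1.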
First, look at the insert statement you posted: how is row 4 different from rows 1, 2 and 3? If the difference is based on another column, include that column in the PARTITION BY clause of ROW_NUMBER(). Otherwise the 'A' in row 4 and the 'A' in rows 1, 2 and 3 are treated as the same and grouped together.
INSERT INTO @Table1
(ID, STATUS)
VALUES
(1, 'A'), -- 1
(1, 'A'), -- 2
(1, 'A'), -- 3
(1, 'B'),
(1, 'A'), -- 4
(2, 'C'),
(2, 'C')
;

Delete duplicate in sql based on other column value

I want to remove duplicates based on the condition below.
My table contains cross-related data: each value in column 1 also exists in column 2, and vice versa.
sample table
id id1
-------------
1 2
2 1
3 4
4 3
5 6
6 5
7 8
8 7
I want to delete one row from the first two rows, one from the third and fourth, one from the fifth and sixth, and so on.
Can anyone please help?
This way you delete just the second row from each group of two rows:
CREATE TABLE [LIST_ID](
[ID] [NUMERIC](4, 0) NOT NULL,
[ID_1] [NUMERIC](4, 0) NOT NULL
);
INSERT INTO LIST_ID (ID, ID_1)
VALUES
(1, 2),
(2, 1),
(3, 4),
(4, 3),
(5, 6),
(6, 5);
WITH First_Row AS
(
SELECT ROW_NUMBER() OVER (ORDER BY ID ASC) AS Row_Number, *
FROM LIST_ID
)
DELETE FROM First_Row WHERE Row_Number % 2 ='0';
SELECT * FROM LIST_ID;
How about this:
DELETE
FROM myTable
WHERE id IN (
SELECT CASE WHEN id < id1 THEN id ELSE id1 END
FROM myTable
)
Where myTable is the sample table with data.
declare @t table (id1 int, id2 int)
insert into @t (id1, id2)
values
(1, 2),
(2, 1),
(2, 1),
(2, 1),
(3, 4),
(3, 4),
(5, 6),
(7, 8),
(7, 6),
(6, 7),
(5, 0)
delete t2
from @t t1
inner join @t t2 on t2.id1 = t1.id2 and t2.id2 = t1.id1
where t2.id1 > t1.id1
select * from @t order by 1, 2
declare @t table (id1 int, id2 int)
insert into @t (id1, id2)
values
(1, 2),
(2, 1),
(3, 4),
(4, 3),
(5, 6),
(6, 5),
(7, 8),
(8, 7)
;
;with a as (
select
row_number() over (order by id1) rn
,t.id1
,t.id2
from
@t t
)
delete t from
@t t
join (
select
a.id1
,a.id2
from
a a
where
exists(
select
*
from
a b
where
a.id2 = b.id1 and a.id1 = b.id2 and a.rn > b.rn
)
) c on t.id1 = c.id1 and t.id2 = c.id2
;
select * from @t;
/* OUTPUT
id1 id2
1 2
3 4
5 6
7 8
*/
It'll vary a little based on which row you want to keep, but if you really have simple duplicates as in your example, and every pair exists in both orders, this should do it:
DELETE FROM MyTable
WHERE ID > ID1
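Under the assumption stated above (every pair exists in both orders), the one-line delete can be verified with a quick sqlite3 sketch in Python. SQLite is an assumed stand-in for the asker's engine, and the table name pairs and variable remaining are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE pairs (id INTEGER, id1 INTEGER)")
conn.executemany(
    "INSERT INTO pairs VALUES (?, ?)",
    [(1, 2), (2, 1), (3, 4), (4, 3), (5, 6), (6, 5), (7, 8), (8, 7)],
)

# Of each mirrored pair, drop the row whose first column is the larger value.
conn.execute("DELETE FROM pairs WHERE id > id1")

remaining = conn.execute("SELECT id, id1 FROM pairs ORDER BY id").fetchall()
print(remaining)
```

If some pairs exist in only one order, this simple predicate could delete a row that has no mirror, so the join-based variants earlier in the thread are the safer choice in that case.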
As far as I understand, you want to delete the rows whose id appears as an id1 in another row.
delete from TableA as a
where exists(select 1 from TableA as b where a.id = b.id1)