Postgresql - Looping through array_agg - sql

I have a table from which I need to calculate the number of times intent_level changes for each id.
Sample table format:
 id | start_time | intent_level
----+------------+--------------
  1 |          2 | status
  1 |          3 | status
  1 |          1 |
  1 |          4 | category
  2 |          5 | status
  2 |          8 |
  2 |          7 | status
I tried using array_agg (below), but I couldn't figure out how to loop through the resulting array and compare consecutive elements.
select
    id,
    array_agg(intent_level ORDER BY start_time)
FROM temp.chats
GROUP BY id;
which gives output:
 id |          array_agg
----+-----------------------------
  1 | {"",status,status,category}
  2 | {status,status,""}
Desired output is:
 id | changes
----+---------
  1 | 2
  2 | 1
2 for id 1 (the value changes from "" to status (rows 1 to 2) and from status to category (rows 3 to 4))
1 for id 2 (the value changes from status to "" (rows 2 to 3))
CREATE AND INSERT QUERIES:
create table temp.chats (
id varchar(5),
start_time varchar(5),
intent_level varchar(20)
);
insert into temp.chats values
('1', '2', 'status'),
('1', '3', 'status'),
('1', '1', ''),
('1', '4', 'category'),
('2', '5', 'status'),
('2', '8', ''),
('2', '7', 'status');

Use lag() and aggregate:
select id, count(*)
from (select c.*,
             lag(intent_level) over (partition by id order by start_time) as prev_intent_level
      from temp.chats c
     ) c
where prev_intent_level is distinct from intent_level
  and prev_intent_level is not null  -- skip the first row of each id, which has no previous value
group by id;
Arrays seem quite unnecessary for this.
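If you also want ids with zero changes to show up (with a count of 0), a minimal sketch of the same lag() idea using a FILTER clause instead of the WHERE, against the same temp.chats table:
select id,
       count(*) filter (where prev_intent_level is not null
                          and prev_intent_level is distinct from intent_level) as changes
from (select c.*,
             lag(intent_level) over (partition by id order by start_time) as prev_intent_level
      from temp.chats c
     ) c
group by id;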


Grouping by column and rows

I have a table like this:
+----+--------------+--------+----------+
| id | name         | weight | some_key |
+----+--------------+--------+----------+
|  1 | strawberries |     12 |        1 |
|  2 | blueberries  |      7 |        1 |
|  3 | elderberries |      0 |        1 |
|  4 | cranberries  |      8 |        2 |
|  5 | raspberries  |     18 |        2 |
+----+--------------+--------+----------+
I'm looking for a generic query that returns all berries where there are three entries with the same some_key and one of those three entries has weight = 0.
In case of the sample table, the expected output would be:
1 strawberries
2 blueberries
3 elderberries
As you want to include non-grouped columns, I would approach this with window functions:
select id, name
from (
    select id,
           name,
           count(*) over w as key_count,
           count(*) filter (where weight = 0) over w as num_zero_weight
    from fruits
    window w as (partition by some_key)
) x
where x.key_count = 3
  and x.num_zero_weight >= 1
The count(*) over w counts the number of rows in that group (= partition) and the count(*) filter (where weight = 0) over w counts how many of those have a weight of zero.
The window w as ... avoids repeating the same partition by clause for the window functions.
Online example: https://rextester.com/SGWFI49589
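For comparison, without the named window the same query repeats the PARTITION BY for each window function; a sketch of the equivalent form:
select id, name
from (
    select id,
           name,
           count(*) over (partition by some_key) as key_count,
           count(*) filter (where weight = 0) over (partition by some_key) as num_zero_weight
    from fruits
) x
where x.key_count = 3
  and x.num_zero_weight >= 1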
Try this:
SELECT some_key,
       SUM(weight) -- sample aggregation on a column
FROM your_table
GROUP BY some_key
HAVING COUNT(*) = 3 -- if you want at least 3, use >= 3
   AND SUM(CASE WHEN weight = 0 THEN 1 ELSE 0 END) >= 1
As per your edited question, you can try this:
SELECT id, name
FROM your_table
WHERE some_key IN (
    SELECT some_key
    FROM your_table
    GROUP BY some_key
    HAVING COUNT(*) = 3 -- if you want at least 3, use >= 3
       AND SUM(CASE WHEN weight = 0 THEN 1 ELSE 0 END) >= 1
)
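If you are on PostgreSQL (as in the window-function answer above), the CASE expression inside HAVING can also be written with a FILTER clause; a sketch against the same your_table:
SELECT id, name
FROM your_table
WHERE some_key IN (
    SELECT some_key
    FROM your_table
    GROUP BY some_key
    HAVING COUNT(*) = 3
       AND COUNT(*) FILTER (WHERE weight = 0) >= 1
)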
Try doing this.
Table structure and sample data
CREATE TABLE tmp (
id int,
name varchar(50),
weight int,
some_key int
);
INSERT INTO tmp
VALUES
('1', 'strawberries', '12', '1'),
('2', 'blueberries', '7', '1'),
('3', 'elderberries', '0', '1'),
('4', 'cranberries', '8', '2'),
('5', 'raspberries', '18', '2');
Query
SELECT t1.*
FROM tmp t1
INNER JOIN (SELECT some_key
            FROM tmp
            GROUP BY some_key
            HAVING Count(some_key) >= 3
               AND Min(Abs(weight)) = 0) t2
    ON t1.some_key = t2.some_key;
Output
+----+--------------+--------+----------+
| id | name         | weight | some_key |
+----+--------------+--------+----------+
|  1 | strawberries |     12 |        1 |
|  2 | blueberries  |      7 |        1 |
|  3 | elderberries |      0 |        1 |
+----+--------------+--------+----------+
Online Demo: http://sqlfiddle.com/#!15/70cca/26/0
Thank you, @mkRabbani, for reminding me about the negative values.

How to remove rest of the rows with the same ID starting from the first duplicate?

I have the following structure for the table DataTable: every column is of the datatype int, RowID is an identity column and the primary key. LinkID is a foreign key that links to rows of another table.
RowID  LinkID  Order  Data  DataSpecifier
1      120     1      1     1
2      120     2      1     3
3      120     3      1     10
4      120     4      1     13
5      120     5      1     10
6      120     6      1     13
7      371     1      6     2
8      371     2      3     5
9      371     3      8     1
10     371     4      10    1
11     371     5      7     2
12     371     6      3     3
13     371     7      7     2
14     371     8      17    4
.................................
.................................
I'm trying to do a query which alters every LinkID batch in the following way:
Take every row with the same LinkID (e.g. the first batch is the first 6 rows here)
Order them by the Order column
Look at the Data and DataSpecifier columns as one compare unit (they can be thought of as one column, called dataunit):
Keep as many rows from Order 1 onwards, until a duplicate dataunit comes by
Delete every row from that first duplicate onwards for that LinkID
So for the LinkID 120:
Sort the batch (already sorted here, but should still do it)
Start looking from the top (So Order=1 here), go as long as you don't see a duplicate.
Stop at the first duplicate Order = 5 (dataunit 1 10 was already seen).
Delete everything which has the LinkID=120 AND Order>=5
After similar process for LinkID 371 (and every other LinkID in the table), the processed table will look like this:
RowID  LinkID  Order  Data  DataSpecifier
1      120     1      1     1
2      120     2      1     3
3      120     3      1     10
4      120     4      1     13
7      371     1      6     2
8      371     2      3     5
9      371     3      8     1
10     371     4      10    1
11     371     5      7     2
12     371     6      3     3
.................................
.................................
I've done quite a lot of SQL queries, but never something this complicated. I know I need to use a query which is something like this:
DELETE FROM DataTable
WHERE RowID IN (SELECT RowID
FROM DataTable
WHERE -- ?
GROUP BY LinkID
HAVING COUNT(*) > 1 -- ?
ORDER BY [Order]);
But I just can't seem to wrap my head around this and get the query right. I would preferably do this in pure SQL, with one executable (and reusable) query.
We can try using a CTE here to make things easier:
WITH cte AS (
    SELECT *,
           COUNT(*) OVER (PARTITION BY LinkID, Data, DataSpecifier ORDER BY [Order]) - 1 AS cnt
    FROM DataTable
),
cte2 AS (
    SELECT *,
           SUM(cnt) OVER (PARTITION BY LinkID ORDER BY [Order]) AS num
    FROM cte
)
DELETE
FROM cte2
WHERE num > 0;
The logic here is to use COUNT as an analytic function to identify the duplicate records. We use a partition of LinkID along with Data and DataSpecifier. Any record with an Order value greater than or equal to the first record with a non-zero count is then targeted for deletion.
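If you want to check which rows would be removed before actually running the DELETE, a sketch that reuses the same two CTEs as a plain SELECT against the same DataTable:
WITH cte AS (
    SELECT *,
           COUNT(*) OVER (PARTITION BY LinkID, Data, DataSpecifier ORDER BY [Order]) - 1 AS cnt
    FROM DataTable
),
cte2 AS (
    SELECT *,
           SUM(cnt) OVER (PARTITION BY LinkID ORDER BY [Order]) AS num
    FROM cte
)
SELECT RowID, LinkID, [Order], Data, DataSpecifier  -- rows the DELETE would target
FROM cte2
WHERE num > 0
ORDER BY LinkID, [Order];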
You can use the ROW_NUMBER() window function to identify any rows that come after the original. After that you can delete any rows with a matching LinkID and an Order greater than or equal to any Order that has a row number greater than one.
(I originally used a second CTE to get the MIN order, but I realized that it wasn't necessary as long as the join to Order was greater than or equal to any Order where there was a second instance of the dataunit. By removing the MIN, the query plan became quite simple and efficient.)
WITH DataUnitInstances AS (
    SELECT *,
           ROW_NUMBER() OVER
               (PARTITION BY LinkID, [Data], [DataSpecifier] ORDER BY [Order]) AS DataUnitInstanceId
    FROM DataTable
)
DELETE dt
FROM DataTable dt
INNER JOIN DataUnitInstances dup ON dup.LinkID = dt.LinkID
    AND dup.[Order] <= dt.[Order]
    AND dup.DataUnitInstanceId > 1
Here is the output from your sample data which matches your desired result:
+-------+--------+-------+------+---------------+
| RowID | LinkID | Order | Data | DataSpecifier |
+-------+--------+-------+------+---------------+
|     1 |    120 |     1 |    1 |             1 |
|     2 |    120 |     2 |    1 |             3 |
|     3 |    120 |     3 |    1 |            10 |
|     4 |    120 |     4 |    1 |            13 |
|     7 |    371 |     1 |    6 |             2 |
|     8 |    371 |     2 |    3 |             5 |
|     9 |    371 |     3 |    8 |             1 |
|    10 |    371 |     4 |   10 |             1 |
|    11 |    371 |     5 |    7 |             2 |
|    12 |    371 |     6 |    3 |             3 |
+-------+--------+-------+------+---------------+
This solution uses an APPLY to find the minimum order for each LinkID.
Set up:
IF OBJECT_ID('tempdb..#YourData') IS NOT NULL
DROP TABLE #YourData
CREATE TABLE #YourData (
RowID INT,
LinkID INT,
[Order] INT,
Data INT,
DataSpecifier INT)
INSERT INTO #YourData (
RowID,
LinkID,
[Order],
Data,
DataSpecifier)
VALUES
('1', ' 120', '1', '1', ' 1'),
('2', ' 120', '2', '1', ' 3'),
('3', ' 120', '3', '1', ' 10'),
('4', ' 120', '4', '1', ' 13'),
('5', ' 120', '5', '1', ' 10'),
('6', ' 120', '6', '1', ' 13'),
('7', ' 371', '1', '6', ' 2'),
('8', ' 371', '2', '3', ' 5'),
('9', ' 371', '3', '8', ' 1'),
('10', '371', '4', '10', '1'),
('11', '371', '5', '7', ' 2'),
('12', '371', '6', '3', ' 3'),
('13', '371', '7', '7', ' 2'),
('14', '371', '8', '17', '4')
Solution:
;WITH MinOrderToDeleteByLinkID AS
(
    SELECT
        T.LinkID,
        MinOrder = MIN(C.[Order])
    FROM
        #YourData AS T
        OUTER APPLY (
            SELECT TOP 1
                C.*
            FROM
                #YourData AS C
            WHERE
                C.LinkID = T.LinkID AND
                C.Data = T.Data AND
                C.DataSpecifier = T.DataSpecifier AND
                C.[Order] > T.[Order]
            ORDER BY
                C.[Order]) AS C -- earliest later duplicate of this row's dataunit
    GROUP BY
        T.LinkID
)
DELETE Y FROM
-- SELECT Y.* FROM
    #YourData AS Y
    INNER JOIN MinOrderToDeleteByLinkID AS M ON
        Y.LinkID = M.LinkID AND
        Y.[Order] >= M.MinOrder
The rows to be deleted from this are the following:
RowID  LinkID  Order  Data  DataSpecifier
5      120     5      1     10
6      120     6      1     13
13     371     7      7     2
14     371     8      17    4
... which correspond to the point where the Data-DataSpecifier tuple starts to repeat for a particular LinkID.

Rownum order is incorrect after join - SQL Server

http://sqlfiddle.com/#!18/97fbe/1 - fiddle
I have tried to reproduce my real-life scenario as closely as possible.
Tables:
CREATE TABLE [OrderTable]
(
[id] int,
[OrderGroupID] int,
[Total] int,
[fkPerson] int,
[fkitem] int
PRIMARY KEY (id)
)
INSERT INTO [OrderTable] (id, OrderGroupID, Total ,[fkPerson], [fkItem])
VALUES
('1', '1', '20', '1', '1'),
('2', '1', '45', '2', '2'),
('3', '2', '32', '1', '1'),
('4', '2', '30', '2', '2'),
('5', '2', '32', '1', '1'),
('6', '2', '32', '3', '1'),
('7', '2', '32', '4', '1'),
('8', '2', '32', '4', '1'),
('9', '2', '32', '5', '1');
CREATE TABLE [Person]
(
[id] int,
[Name] varchar(32)
PRIMARY KEY (id)
)
INSERT INTO [Person] (id, Name)
VALUES
('1', 'Fred'),
('2', 'Sam'),
('3', 'Ryan'),
('4', 'Tim'),
('5', 'Gary');
CREATE TABLE [Item]
(
[id] int,
[ItemNo] varchar(32),
[Price] int
PRIMARY KEY (id)
)
INSERT INTO [Item] (id, ItemNo, Price)
VALUES
('1', '453', '23'),
('2', '657', '34');
Query:
WITH TABLE1 AS
(
SELECT
-- P.ID AS [PersonID],
-- P.Name,
SUM(OT.[Total]) AS [Total],
i.[id] AS [ItemID],
ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS rownum,
ot.fkperson,
[fkItem]
FROM
OrderTable OT
-- INNER JOIN Person P ON P.ID = OT.fkperson
INNER JOIN
Item I ON I.[id] = OT.[fkItem]
GROUP BY
-- P.ID, P.Name,
i.id, ot.fkperson, [fkItem]
)
SELECT
t1.[fkperson],
P.[Name],
t1.[itemid],
t1.[total],
t1.[rownum]
-- Totalrows = (SELECT MAX(rownum) FROM TABLE1)
FROM
TABLE1 T1
INNER JOIN
Person P ON P.ID = T1.fkperson
INNER JOIN
Item I ON I.[id] = T1.[fkItem]
Result:
| fkperson | Name | itemid | total | rownum |
+----------+------+--------+-------+--------+
|        1 | Fred |      1 |    84 |      1 |
|        3 | Ryan |      1 |    32 |      2 |
|        4 | Tim  |      1 |    64 |      3 |
|        5 | Gary |      1 |    32 |      4 |
|        2 | Sam  |      2 |    75 |      5 |
which is the result I want. However, my real-life example is giving me the row number in a weird order. I know it's an issue with the joins, because when I comment these joins:
INNER JOIN
Person P ON P.ID = T1.fkperson
INNER JOIN
Item I ON I.[id] = T1.[fkItem]
out, it works fine.
| fkperson | Name | itemid | total | rownum |
|----------|------|--------|-------|--------|
|        1 | Fred |      1 |    84 |      4 |
|        3 | Ryan |      1 |    32 |      3 |
|        4 | Tim  |      1 |    64 |      5 |
|        5 | Gary |      1 |    32 |      1 |
|        2 | Sam  |      2 |    75 |      2 |
Has anyone got any advice on how the joins could be causing this weird rownum ordering? Or point me in the right direction. Thanks
Any relational database is inherently UNordered - and you won't get any guaranteed order UNLESS you explicitly ask for it - by means of an ORDER BY clause on your outer query.
You need to add the ORDER BY explicitly - like this:
WITH TABLE1 AS
(
.....
)
SELECT
(list of columns ....)
FROM
TABLE1 T1
INNER JOIN
Person P ON P.ID = T1.fkperson
INNER JOIN
Item I ON I.[id] = T1.[fkItem]
ORDER BY
T1.rownum
You are using order by (select null). That means indeterminate ordering. And the order can change from one invocation of the query to another.
You should not be depending on default ordering, even by an external order by. If you want values in a particular order, specify that ordering explicitly in the order by in the windowing clause.
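For instance, a minimal sketch of the original CTE with a deterministic window ordering and an explicit outer ORDER BY (the ordering columns chosen here, item then person, are only an illustrative choice):
WITH TABLE1 AS
(
    SELECT
        SUM(OT.[Total]) AS [Total],
        I.[id] AS [ItemID],
        -- deterministic ordering instead of (SELECT NULL)
        ROW_NUMBER() OVER (ORDER BY I.[id], OT.fkperson) AS rownum,
        OT.fkperson,
        OT.[fkItem]
    FROM OrderTable OT
    INNER JOIN Item I ON I.[id] = OT.[fkItem]
    GROUP BY I.[id], OT.fkperson, OT.[fkItem]
)
SELECT T1.[fkperson], P.[Name], T1.[ItemID], T1.[Total], T1.[rownum]
FROM TABLE1 T1
INNER JOIN Person P ON P.ID = T1.fkperson
INNER JOIN Item I ON I.[id] = T1.[fkItem]
ORDER BY T1.rownum;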

SQL combine two records based on one value

Update - work done in SQL-92
I work in a SQL reporting tool and am trying to combine two records into one. Let's say there are some duplicates where time got split into two values, hence the duplication. Basically, any values that are not duplicated should be added.
wo---text---time---value
1----test---5------1
1----test---2------a
3----aaaa---3------1
4----bbbb---4------2
Results
wo---text---time----value
1----test---7--------1a
3----aaaa---3--------1
4----bbbb---4--------2
I tried:
SELECT ....
FROM ....
GROUP BY wo SUM (time) but that did not even work.
Set-up:
create table so48345659a
(
wo integer,
text varchar(4),
time integer,
value varchar(2)
);
create table so48345659b
(
wo integer,
text varchar(4),
time integer,
value varchar(2)
);
insert into so48345659a (wo, text, time, value) values (1, 'test', 5, '1');
insert into so48345659a (wo, text, time, value) values (1, 'test', 2, 'a');
insert into so48345659a (wo, text, time, value) values (3, 'aaaa', 3, '1');
insert into so48345659a (wo, text, time, value) values (4, 'bbbb', 4, '2');
insert into so48345659b (wo, text, time, value) values (1, 'test', 7, '1a');
insert into so48345659b (wo, text, time, value) values (3, 'aaaa', 3, '1');
insert into so48345659b (wo, text, time, value) values (4, 'bbbb', 4, '2');
UNION, by default, removes duplicates:
select wo, text, time, value from so48345659a
union
select wo, text, time, value from so48345659b;
Result:
 wo | text | time | value
----+------+------+-------
  1 | test |    7 | 1a
  1 | test |    2 | a
  3 | aaaa |    3 | 1
  1 | test |    5 | 1
  4 | bbbb |    4 | 2
(5 rows)
So now run the sum over the union:
select
    wo,
    sum(time) as total_time
from
(
    select wo, text, time, value from so48345659a
    union
    select wo, text, time, value from so48345659b
) x
group by
    wo;
Result:
wo | total_time
----+------------
3 | 3
1 | 14
4 | 4
(3 rows)
From your supplementary question (22-Jan-2017), I guess you mean that you have one table that contains duplicate rows. Is that right?
If so, it might look like this:
select * from so48345659c;
 wo | text | time | value
----+------+------+-------
  1 | test |    5 | 1
  1 | test |    2 | a
  3 | aaaa |    3 | 1
  4 | bbbb |    4 | 2
  1 | test |    7 | 1a
  3 | aaaa |    3 | 1
  4 | bbbb |    4 | 2
(7 rows)
So then you get the sum of the times, ignoring duplicate rows, like this:
select
    wo,
    sum(time) as total_time
from
(
    select distinct wo, text, time, value from so48345659c
) x
group by
    wo;
wo | total_time
----+------------
3 | 3
1 | 14
4 | 4
(3 rows)
With just two values, you can do:
select wo, text, sum(time) as time, concat(min(value), max(value)) as value
from t
group by wo, text;
This relies on the fact that the string '1' sorts before 'a', so min() returns the numeric part and max() the letter.
Most databases support string aggregation of some sort (group_concat(), listagg(), and string_agg() are typical functions). You can use one of these for a more general solution.
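For instance, in PostgreSQL syntax, a sketch of the more general form using string_agg(), which handles any number of values per group (same table t as above):
-- sum the times and glue all values together in sort order
select wo, text, sum(time) as time,
       string_agg(value, '' order by value) as value
from t
group by wo, text;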

How to limit SQL result with recurrence

I have a SQL Table with recurring tasks. For instance:
+------------------+----+--------------+
| Task             | ID | RecurranceID |
+------------------+----+--------------+
| Take Out Garbage | 1  | 0            |
| Order Pizza      | 2  | 0            |
| Eat Breakfast    | 3  | 1            |
| Eat Breakfast    | 4  | 1            |
| Eat Breakfast    | 5  | 1            |
| Order Pizza      | 6  | 0            |
+------------------+----+--------------+
Anything with a RecurranceID of 0 is not a recurring task, but otherwise it is a recurring task.
How can I show all tasks with a limit of one row on a recurring task?
I would just like the resulting set to show:
+------------------+----+
| Task             | ID |
+------------------+----+
| Take Out Garbage | 1  |
| Order Pizza      | 2  |
| Eat Breakfast    | 3  |
| Order Pizza      | 6  |
+------------------+----+
Using SQL Server 2012
Thank you!
;WITH MyCTE AS
(
    SELECT Task,
           ID,
           ROW_NUMBER() OVER (PARTITION BY Task ORDER BY ID) AS rn
    FROM Tasks
)
SELECT *
FROM MyCTE
WHERE rn = 1
It is not clear from your sample data, but you may also need to include RecurranceID in the PARTITION BY clause, as below:
;WITH MyCTE AS
(
    SELECT Task,
           ID,
           RecurranceID,
           ROW_NUMBER() OVER (PARTITION BY Task, RecurranceID ORDER BY ID) AS rn
    FROM Tasks
)
SELECT *
FROM MyCTE
WHERE rn = 1
   OR RecurranceID = 0
You seem to want all non-recurring tasks returned, along with a single row for each recurring task (whether or not it shares names with a non-recurring task). Are you looking for:
SELECT Task, ID
FROM RecurringTaskTable
WHERE RecurrenceID = 0
UNION ALL
SELECT Task, MIN(ID) AS ID
FROM RecurringTaskTable
WHERE RecurrenceID <> 0
GROUP BY Task
Try this:
select distinct task
, DENSE_RANK () OVER (order by task) dr
from tasks
Hope This Helps,
DECLARE @test TABLE(Task VARCHAR(30),
                    ID INT,
                    RecurranceID INT)
INSERT INTO @test
VALUES('Take Out Garbage', '1', '0'),
      ('Order Pizza', '2', '0'),
      ('Eat Breakfast', '3', '1'),
      ('Eat Breakfast', '4', '1'),
      ('Eat Breakfast', '5', '1'),
      ('Order Pizza', '6', '0')
;WITH cte_OneofEach AS
(
    SELECT *,
           ROW_NUMBER() OVER (PARTITION BY Task ORDER BY ID) AS rn -- keep the earliest row per recurring task
    FROM @test
    WHERE RecurranceID = '1'
)
SELECT Task, ID, RecurranceID
FROM @test
WHERE RecurranceID = '0'
UNION
SELECT Task, ID, RecurranceID
FROM cte_OneofEach
WHERE rn = 1