How to limit SQL results with recurrence

I have a SQL Table with recurring tasks. For instance:
+------------------+----+--------------+
| Task | ID | RecurranceID |
+------------------+----+--------------+
| Take Out Garbage | 1 | 0 |
| Order Pizza | 2 | 0 |
| Eat Breakfast | 3 | 1 |
| Eat Breakfast | 4 | 1 |
| Eat Breakfast | 5 | 1 |
| Order Pizza | 6 | 0 |
+------------------+----+--------------+
Any row with a RecurranceID of 0 is a non-recurring task; any other value marks a recurring task.
How can I show all tasks, but limit each recurring task to a single row?
I would like the result set to show:
+------------------+----+
| Task | ID |
+------------------+----+
| Take Out Garbage | 1 |
| Order Pizza | 2 |
| Eat Breakfast | 3 |
| Order Pizza | 6 |
+------------------+----+
Using SQL Server 2012
Thank you!

;WITH MyCTE AS
(
SELECT Task,
ID,
ROW_NUMBER() OVER (PARTITION BY TASK ORDER BY ID) AS rn
FROM Tasks
)
SELECT *
FROM MyCTE
WHERE rn = 1
Judging by your expected output (both "Order Pizza" rows appear), you also need to apply RecurranceID in the PARTITION BY clause and keep every non-recurring row, as below:
;WITH MyCTE AS
(
SELECT Task,
ID,
RecurranceID,
ROW_NUMBER() OVER (PARTITION BY TASK,RecurranceID ORDER BY ID) AS rn
FROM Tasks
)
SELECT *
FROM MyCTE
WHERE rn = 1
OR RecurranceID = 0
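To check the partitioned query against the sample data without a SQL Server instance, here is a minimal sketch that runs it through Python's built-in sqlite3 module. SQLite is only a stand-in (window functions need SQLite 3.25 or later), and the ORDER BY ID is added here for a deterministic row order; it is not part of the answer above.

```python
import sqlite3

# Sample rows from the question: (Task, ID, RecurranceID).
ROWS = [
    ("Take Out Garbage", 1, 0),
    ("Order Pizza", 2, 0),
    ("Eat Breakfast", 3, 1),
    ("Eat Breakfast", 4, 1),
    ("Eat Breakfast", 5, 1),
    ("Order Pizza", 6, 0),
]

# The second query above, plus ORDER BY ID so the output order is deterministic.
QUERY = """
WITH MyCTE AS (
    SELECT Task, ID, RecurranceID,
           ROW_NUMBER() OVER (PARTITION BY Task, RecurranceID ORDER BY ID) AS rn
    FROM Tasks
)
SELECT Task, ID
FROM MyCTE
WHERE rn = 1 OR RecurranceID = 0
ORDER BY ID
"""


def run_partitioned_query(rows):
    """Load the rows into an in-memory SQLite table and run QUERY."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE Tasks (Task TEXT, ID INTEGER, RecurranceID INTEGER)")
        conn.executemany("INSERT INTO Tasks VALUES (?, ?, ?)", rows)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


print(run_partitioned_query(ROWS))
```

The non-recurring rows survive through the `OR RecurranceID = 0` branch, while the recurring "Eat Breakfast" partition keeps only its lowest ID.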

You seem to want all non-recurring tasks returned, along with a single row for each recurring task (whether or not it shares names with a non-recurring task). Are you looking for:
SELECT Task, ID
FROM RecurringTaskTable
WHERE RecurranceID = 0
UNION ALL
SELECT Task, MIN(ID) AS ID
FROM RecurringTaskTable
WHERE RecurranceID <> 0
GROUP BY Task
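The same in-memory SQLite check works for the UNION ALL version. This sketch assumes the table is named Tasks (the answer calls it RecurringTaskTable) and adds an ORDER BY for a stable result.

```python
import sqlite3

# Sample rows from the question: (Task, ID, RecurranceID).
ROWS = [
    ("Take Out Garbage", 1, 0),
    ("Order Pizza", 2, 0),
    ("Eat Breakfast", 3, 1),
    ("Eat Breakfast", 4, 1),
    ("Eat Breakfast", 5, 1),
    ("Order Pizza", 6, 0),
]

# Non-recurring rows as-is, plus one row (lowest ID) per recurring task.
QUERY = """
SELECT Task, ID FROM Tasks WHERE RecurranceID = 0
UNION ALL
SELECT Task, MIN(ID) AS ID FROM Tasks WHERE RecurranceID <> 0 GROUP BY Task
ORDER BY ID
"""


def run_union_query(rows):
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE Tasks (Task TEXT, ID INTEGER, RecurranceID INTEGER)")
        conn.executemany("INSERT INTO Tasks VALUES (?, ?, ?)", rows)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


print(run_union_query(ROWS))
```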

Try this:
select distinct task
, DENSE_RANK () OVER (order by task) dr
from tasks

Hope This Helps,
DECLARE @test TABLE(Task VARCHAR(30),
ID INT,
RecurranceID INT)
INSERT INTO @test
VALUES('Take Out Garbage', 1, 0),
('Order Pizza', 2, 0),
('Eat Breakfast', 3, 1),
('Eat Breakfast', 4, 1),
('Eat Breakfast', 5, 1),
('Order Pizza', 6, 0)
;WITH cte_OneofEach AS
(
SELECT *,
ROW_NUMBER() OVER (PARTITION BY Task ORDER BY ID) AS rn -- lowest ID first, so Eat Breakfast keeps ID 3
FROM @test
WHERE RecurranceID <> 0
)
SELECT Task,ID,RecurranceID
FROM @test
WHERE RecurranceID = 0
UNION
SELECT Task,ID,RecurranceID
FROM cte_OneofEach
WHERE rn = 1

Related

BigQuery query is too complex after PIVOT

Assume I have the following table and a list of interests (cat, dog, music, soccer, coding):
| userId | user_interest | label |
| -------- | -------------- |----------|
| 12345 | cat | 1 |
| 12345 | dog | 1 |
| 6789 | music | 1 |
| 6789 | soccer | 1 |
I want to transform the user interest into a binary array (i.e. binarization), and the resulting table will be something like
| userId | labels |
| -------- | -------------- |
| 12345 | [1,1,0,0,0] |
| 6789 | [0,0,1,1,0] |
I am able to do it with PIVOT and ARRAY, e.g.
WITH user_interest_pivot AS (
SELECT
*
FROM (
SELECT userId, user_interest, label FROM table
) AS T
PIVOT
(
MAX(label) FOR user_interest IN ('cat', 'dog', 'music', 'soccer', 'coding')
) AS P
)
SELECT
userId,
ARRAY[IFNULL(cat,0), IFNULL(dog,0), IFNULL(music,0), IFNULL(soccer,0), IFNULL(coding,0)] AS labels,
FROM user_interest_pivot
However, in reality I have a very long list of interests, and the above method fails in BigQuery with:
Resources exceeded during query execution: Not enough resources for query planning - too many subqueries or query is too comple
Is there anything I can do to work around this? Thanks!
Depending on your real data you may still run into resource limits, but it is worth trying the following approach, which avoids PIVOT.
First, create an interests table with an additional index column:
+----------+-----+-----------------+
| interest | idx | total_interests |
+----------+-----+-----------------+
| cat | 0 | 5 |
| dog | 1 | 5 |
| music | 2 | 5 |
| soccer | 3 | 5 |
| coding | 4 | 5 |
+----------+-----+-----------------+
Next, look up the idx of each user interest and aggregate them per user as below (this assumes each user's interests are sparse relative to the full list of interests):
SELECT userId, ARRAY_AGG(idx) user_interests
FROM sample_table t JOIN interests i ON t.user_interest = i.interest
GROUP BY 1
Lastly, build the labels vector from the sparse user-interest array and the dimension of the interest space (i.e. total_interests):
ARRAY(SELECT IF(ui IS NULL, 0, 1)
FROM UNNEST(GENERATE_ARRAY(0, total_interests - 1)) i
LEFT JOIN t.user_interests ui ON i = ui
ORDER BY i
) AS labels
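For readers who want to see the three steps above outside BigQuery, here is a plain-Python sketch of the same idea: map each interest to its index, collect a sparse list of indexes per user, then expand it into a dense 0/1 vector of length total_interests. The function name dense_labels is hypothetical; interests missing from the list are skipped, mirroring the inner JOIN.

```python
from collections import defaultdict

# Sample data from the question: the interest list and (userId, user_interest) rows.
INTERESTS = ["cat", "dog", "music", "soccer", "coding"]
ROWS = [("12345", "cat"), ("12345", "dog"), ("6789", "music"), ("6789", "soccer")]


def dense_labels(rows, interests):
    """Return {userId: [0/1, ...]} with one slot per interest, in list order."""
    # Step 1: interest -> idx, like the interests table built WITH OFFSET.
    idx = {name: i for i, name in enumerate(interests)}
    total_interests = len(interests)

    # Step 2: like ARRAY_AGG(idx) ... GROUP BY userId, a sparse list of positions.
    sparse = defaultdict(list)
    for user_id, interest in rows:
        if interest in idx:  # the inner JOIN drops unknown interests
            sparse[user_id].append(idx[interest])

    # Step 3: like GENERATE_ARRAY(0, total_interests - 1) LEFT JOIN the sparse array.
    labels = {}
    for user_id, positions in sparse.items():
        present = set(positions)
        labels[user_id] = [1 if i in present else 0 for i in range(total_interests)]
    return labels


print(dense_labels(ROWS, INTERESTS))
```

The per-user work is proportional to total_interests rather than to the number of PIVOT columns in the query text, which is the point of the rewrite.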
Query
CREATE TEMP TABLE sample_table AS
SELECT '12345' AS userId, 'cat' AS user_interest, 1 AS label UNION ALL
SELECT '12345' AS userId, 'dog' AS user_interest, 1 AS label UNION ALL
SELECT '6789' AS userId, 'music' AS user_interest, 1 AS label UNION ALL
SELECT '6789' AS userId, 'soccer' AS user_interest, 1 AS label;
CREATE TEMP TABLE interests AS
SELECT *, COUNT(1) OVER () AS total_interests
FROM UNNEST(['cat', 'dog', 'music', 'soccer', 'coding']) interest
WITH OFFSET idx
;
SELECT userId,
ARRAY(SELECT IF(ui IS NULL, 0, 1)
FROM UNNEST(GENERATE_ARRAY(0, total_interests - 1)) i
LEFT JOIN t.user_interests ui ON i = ui
ORDER BY i
) AS labels
FROM (
SELECT userId, total_interests, ARRAY_AGG(idx) user_interests
FROM sample_table t JOIN interests i ON t.user_interest = i.interest
GROUP BY 1, 2
) t;
I think the approach below will "survive" any [reasonable] data:
create temp function base10to2(x float64) returns string
language js as r'return x.toString(2);';
with your_table as (
select '12345' as userid, 'cat' as user_interest, 1 as label union all
select '12345' as userid, 'dog' as user_interest, 1 as label union all
select '6789' as userid, 'music' as user_interest, 1 as label union all
select '6789' as userid, 'soccer' as user_interest, 1 as label
), interests as (
select *, pow(2, offset) weight, max(offset + 1) over() as len
from unnest(['cat', 'dog', 'music', 'soccer', 'coding']) user_interest
with offset
)
select userid,
split(rpad(reverse(base10to2(sum(weight))), any_value(len), '0'), '') labels,
from your_table
join interests
using(user_interest)
group by userid
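The answer above encodes each user's interests as a sum of powers of two and then reads the binary digits back out. A minimal Python sketch of that encoding is below (bitmask_labels is a hypothetical name). It also illustrates one caveat worth keeping in mind: the BigQuery version sums FLOAT64 values and passes a float to JavaScript, and a double represents integers exactly only up to 2**53, so with more than about 53 interests low-order bits can be lost. The sum also assumes each (user, interest) pair appears only once; a duplicate would carry into the next bit.

```python
# Sample data from the question.
INTERESTS = ["cat", "dog", "music", "soccer", "coding"]
ROWS = [("12345", "cat"), ("12345", "dog"), ("6789", "music"), ("6789", "soccer")]


def bitmask_labels(rows, interests):
    """Mimic the SUM(pow(2, offset)) plus base-2 string trick from the answer."""
    weight = {name: 2 ** i for i, name in enumerate(interests)}  # pow(2, offset)
    length = len(interests)                                      # max(offset + 1)

    totals = {}
    for user_id, interest in rows:  # JOIN ... USING(user_interest), then SUM per user
        if interest in weight:
            totals[user_id] = totals.get(user_id, 0) + weight[interest]

    # base10to2 -> reverse -> rpad with '0' -> split into single characters
    return {
        user_id: list(format(total, "b")[::-1].ljust(length, "0"))
        for user_id, total in totals.items()
    }


print(bitmask_labels(ROWS, INTERESTS))

# Python ints are exact, so the sketch itself is unaffected; this line only shows
# the float64 limit: 2**53 + 1 and 2**53 become the same double.
print(float(2 ** 53 + 1) == float(2 ** 53))
```

Like the BigQuery `split(..., '')`, the labels come back as strings ('1'/'0'), not integers.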

Grouping by column and rows

I have a table like this:
+----+--------------+--------+----------+
| id | name | weight | some_key |
+----+--------------+--------+----------+
| 1 | strawberries | 12 | 1 |
| 2 | blueberries | 7 | 1 |
| 3 | elderberries | 0 | 1 |
| 4 | cranberries | 8 | 2 |
| 5 | raspberries | 18 | 2 |
+----+--------------+--------+----------+
I'm looking for a generic query that returns all berries where there are three entries with the same some_key and one of those three entries has weight = 0.
For the sample table, the expected output would be:
1 strawberries
2 blueberries
3 elderberries
As you want to include non-grouped columns, I would approach this with window functions:
select id, name
from (
select id,
name,
count(*) over w as key_count,
count(*) filter (where weight = 0) over w as num_zero_weight
from fruits
window w as (partition by some_key)
) x
where x.key_count = 3
and x.num_zero_weight >= 1
The count(*) over w counts the number of rows in that group (= partition) and the count(*) filter (where weight = 0) over w counts how many of those have a weight of zero.
The window w as ... avoids repeating the same partition by clause for the window functions.
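Here is a quick way to try the window-function answer on the sample data, as a sketch using Python's sqlite3 module. It swaps `count(*) filter (where weight = 0)` for the equivalent `SUM(CASE ...)` and repeats the partition instead of using a named window, since older SQLite builds bundled with Python may not support FILTER on window aggregates; the table name fruits comes from the answer.

```python
import sqlite3

# Sample rows from the question: (id, name, weight, some_key).
ROWS = [
    (1, "strawberries", 12, 1),
    (2, "blueberries", 7, 1),
    (3, "elderberries", 0, 1),
    (4, "cranberries", 8, 2),
    (5, "raspberries", 18, 2),
]

QUERY = """
SELECT id, name
FROM (
    SELECT id, name,
           COUNT(*) OVER (PARTITION BY some_key) AS key_count,
           SUM(CASE WHEN weight = 0 THEN 1 ELSE 0 END)
               OVER (PARTITION BY some_key) AS num_zero_weight
    FROM fruits
) AS x
WHERE x.key_count = 3
  AND x.num_zero_weight >= 1
ORDER BY id
"""


def berries_in_matching_groups(rows):
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE fruits (id INTEGER, name TEXT, weight INTEGER, some_key INTEGER)")
        conn.executemany("INSERT INTO fruits VALUES (?, ?, ?, ?)", rows)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


print(berries_in_matching_groups(ROWS))
```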
Online example: https://rextester.com/SGWFI49589
Try this:
SELECT some_key,
SUM(weight) --Sample aggregations on column
FROM your_table
GROUP BY some_key
HAVING COUNT(*) = 3 -- If you want at least 3, use >= 3
AND SUM(CASE WHEN weight = 0 THEN 1 ELSE 0 END) >= 1
As per your edited question, you can try the query below:
SELECT id, name
FROM your_table
WHERE some_key IN (
SELECT some_key
FROM your_table
GROUP BY some_key
HAVING COUNT(*) = 3 -- If you want at least 3, use >= 3
AND SUM(CASE WHEN weight = 0 THEN 1 ELSE 0 END) >= 1
)
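The GROUP BY / HAVING version can be checked the same way. This sketch runs it on the sample rows in SQLite, using your_table, the placeholder name from the answer.

```python
import sqlite3

# Sample rows from the question: (id, name, weight, some_key).
ROWS = [
    (1, "strawberries", 12, 1),
    (2, "blueberries", 7, 1),
    (3, "elderberries", 0, 1),
    (4, "cranberries", 8, 2),
    (5, "raspberries", 18, 2),
]

# The inner query finds qualifying some_key values; the outer one returns their rows.
QUERY = """
SELECT id, name
FROM your_table
WHERE some_key IN (
    SELECT some_key
    FROM your_table
    GROUP BY some_key
    HAVING COUNT(*) = 3
       AND SUM(CASE WHEN weight = 0 THEN 1 ELSE 0 END) >= 1
)
ORDER BY id
"""


def rows_for_qualifying_keys(rows):
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE your_table (id INTEGER, name TEXT, weight INTEGER, some_key INTEGER)")
        conn.executemany("INSERT INTO your_table VALUES (?, ?, ?, ?)", rows)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


print(rows_for_qualifying_keys(ROWS))
```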
Try doing this.
Table structure and sample data
CREATE TABLE tmp (
id int,
name varchar(50),
weight int,
some_key int
);
INSERT INTO tmp
VALUES
('1', 'strawberries', '12', '1'),
('2', 'blueberries', '7', '1'),
('3', 'elderberries', '0', '1'),
('4', 'cranberries', '8', '2'),
('5', 'raspberries', '18', '2');
Query
SELECT t1.*
FROM tmp t1
INNER JOIN (SELECT some_key
FROM tmp
GROUP BY some_key
HAVING Count(some_key) >= 3
AND Min(Abs(weight)) = 0) t2
ON t1.some_key = t2.some_key;
Output
+-----+---------------+---------+----------+
| id | name | weight | some_key |
+-----+---------------+---------+----------+
| 1 | strawberries | 12 | 1 |
| 2 | blueberries | 7 | 1 |
| 3 | elderberries | 0 | 1 |
+-----+---------------+---------+----------+
Online Demo: http://sqlfiddle.com/#!15/70cca/26/0
Thank you, @mkRabbani, for reminding me about negative values.
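To see why the answer uses MIN(ABS(weight)) rather than MIN(weight), this sketch adds a hypothetical group (some_key = 3) that contains both a negative weight and a zero. With plain MIN(weight) the negative value hides the zero; with ABS the group qualifies. The extra rows are invented for illustration only.

```python
import sqlite3

# Sample rows from the question, plus three hypothetical rows for some_key = 3.
ROWS = [
    (1, "strawberries", 12, 1),
    (2, "blueberries", 7, 1),
    (3, "elderberries", 0, 1),
    (4, "cranberries", 8, 2),
    (5, "raspberries", 18, 2),
    (6, "hypothetical_a", -5, 3),
    (7, "hypothetical_b", 0, 3),
    (8, "hypothetical_c", 4, 3),
]


def qualifying_keys(having):
    """Return the some_key values that pass the given HAVING condition.

    `having` is interpolated into the SQL, so only pass trusted constant strings.
    """
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE tmp (id INTEGER, name TEXT, weight INTEGER, some_key INTEGER)")
        conn.executemany("INSERT INTO tmp VALUES (?, ?, ?, ?)", ROWS)
        sql = f"SELECT some_key FROM tmp GROUP BY some_key HAVING {having} ORDER BY some_key"
        return [key for (key,) in conn.execute(sql)]
    finally:
        conn.close()


print(qualifying_keys("COUNT(some_key) >= 3 AND MIN(weight) = 0"))       # misses key 3
print(qualifying_keys("COUNT(some_key) >= 3 AND MIN(ABS(weight)) = 0"))  # includes key 3
```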

Selecting specific distinct column in SQL

I am trying to write a SELECT statement that applies a distinct on specific columns. I want to avoid having the same fruit listed more than once within each id. If there are multiple rows for the same fruit under an id, I would like to keep only one approved row rather than the rotten one. If there is only one row for that fruit under the id, use it.
SELECT id, fruit, fruitweight, status
FROM myfruits
Raw data from current select
id | fruit | fruitweight | status
1 | apple | .2 | approved
1 | apple | .8 | approved
1 | apple | .1 | rotten
1 | orange | .5 | approved
2 | grape | .1 | rotten
2 | orange | .7 | approved
2 | orange | .5 | approved
How it should look after applying the constraint:
id | fruit | fruitweight | status
1 | apple | .2 | approved
1 | orange | .5 | approved
2 | grape | .1 | rotten
2 | orange | .7 | approved
I can do something along the lines of select distinct id,fruit,fruitweight,status from myfruits,
but that will only take out the duplicates if all columns are the same.
CTE with aggregate and row_number.
declare @YourTable table (id int, fruit varchar(64), fruitweight decimal(2,1),status varchar(64))
insert into @YourTable
values
(1,'apple',0.2,'approved'),
(1,'apple',0.8,'approved'),
(1,'apple',0.1,'rotten'),
(1,'orange',0.5,'approved'),
(2,'grape',0.1,'rotten'),
(2,'orange',0.7,'approved'),
(2,'orange',0.5,'approved')
;with cte as(
select
id
,fruit
,fruitweight = min(fruitweight)
,[status]
,RN = row_number() over (partition by id, fruit order by case when status = 'approved' then 1 else 2 end)
from
@YourTable
group by
id,fruit,status)
select
id
,fruit
,fruitweight
,status
from
cte
where RN = 1
Another method, without the aggregate... assuming you want the lowest fruitweight
;with cte as(
select
id
,fruit
,fruitweight
,[status]
,RN = row_number() over (partition by id, fruit order by case when status = 'approved' then 1 else 2 end, fruitweight)
from
@YourTable)
select
id
,fruit
,fruitweight
,status
from
cte
where RN = 1
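A minimal SQLite check of the ROW_NUMBER approach on the sample data, via Python's sqlite3 module. The CASE expression sorts approved rows before everything else and fruitweight breaks ties, so each (id, fruit) keeps its lightest approved row if one exists. Note that the question's expected output does not follow a single weight rule (apple keeps .2 but orange under id 2 keeps .7), so the second sort key is a choice you may need to change.

```python
import sqlite3

# Sample rows from the question: (id, fruit, fruitweight, status).
ROWS = [
    (1, "apple", 0.2, "approved"),
    (1, "apple", 0.8, "approved"),
    (1, "apple", 0.1, "rotten"),
    (1, "orange", 0.5, "approved"),
    (2, "grape", 0.1, "rotten"),
    (2, "orange", 0.7, "approved"),
    (2, "orange", 0.5, "approved"),
]

QUERY = """
SELECT id, fruit, fruitweight, status
FROM (
    SELECT id, fruit, fruitweight, status,
           ROW_NUMBER() OVER (
               PARTITION BY id, fruit
               ORDER BY CASE WHEN status = 'approved' THEN 1 ELSE 2 END, fruitweight
           ) AS rn
    FROM myfruits
) AS x
WHERE rn = 1
ORDER BY id, fruit
"""


def one_row_per_fruit(rows):
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE myfruits (id INTEGER, fruit TEXT, fruitweight REAL, status TEXT)")
        conn.executemany("INSERT INTO myfruits VALUES (?, ?, ?, ?)", rows)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


print(one_row_per_fruit(ROWS))
```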
Another option is using the WITH TIES clause.
Example
Select top 1 with ties *
From YourTable
Order By Row_Number() over (Partition By id,fruit order by status,fruitweight)
A shorter version of scsimon's solution without aggregates.
If you have SQL Server < 2012, you'll have to use case instead of iif.
select
id
,fruit
,fruitweight
,status
from
(
select
id
,fruit
,fruitweight
,status
,rownum = row_number() over(partition by id, fruit order by iif(status = 'approved', 0, 1), fruitweight desc)
from myfruits
) x
where rownum = 1
EDIT: I started writing before scsimon edited his post to include a version without aggregates...

Selecting row with highest ID based on another column

In SQL Server 2008 R2, suppose I have a table layout like this...
+----------+---------+-------------+
| UniqueID | GroupID | Title |
+----------+---------+-------------+
| 1 | 1 | TEST 1 |
| 2 | 1 | TEST 2 |
| 3 | 3 | TEST 3 |
| 4 | 3 | TEST 4 |
| 5 | 5 | TEST 5 |
| 6 | 6 | TEST 6 |
| 7 | 6 | TEST 7 |
| 8 | 6 | TEST 8 |
+----------+---------+-------------+
Is it possible to select, for each GroupID, the row with the highest UniqueID? For the table above, I would expect this:
+----------+---------+-------------+
| UniqueID | GroupID | Title |
+----------+---------+-------------+
| 2 | 1 | TEST 2 |
| 4 | 3 | TEST 4 |
| 5 | 5 | TEST 5 |
| 8 | 6 | TEST 8 |
+----------+---------+-------------+
Been chomping on this for a while, but can't seem to crack it.
Many thanks,
SELECT *
FROM (SELECT uniqueid, groupid, title,
Row_number()
OVER ( partition BY groupid ORDER BY uniqueid DESC) AS rn
FROM tableName) a
WHERE a.rn = 1
With SQL Server as the RDBMS, you can use a ranking function like ROW_NUMBER:
WITH CTE AS
(
SELECT UniqueID, GroupID, Title,
RN = ROW_NUMBER() OVER (PARTITION BY GroupID
ORDER BY UniqueID DESC)
FROM dbo.TableName
)
SELECT UniqueID, GroupID, Title
FROM CTE
WHERE RN = 1
This returns exactly one record for each GroupID, even if several rows share the highest UniqueID (although the column name suggests that cannot happen). If you want all tied rows returned in that case, use DENSE_RANK instead of ROW_NUMBER.
Here you can see all functions and how they work: http://technet.microsoft.com/en-us/library/ms189798.aspx
Since you have not mentioned an RDBMS, the statement below should work on almost any RDBMS. The subquery gets the greatest uniqueID for every GroupID; to get the other columns, its result is joined back to the original table.
SELECT a.*
FROM tableName a
INNER JOIN
(
SELECT GroupID, MAX(uniqueID) uniqueID
FROM tableName
GROUP By GroupID
) b ON a.GroupID = b.GroupID
AND a.uniqueID = b.uniqueID
If your RDBMS supports analytic functions, you can use ROW_NUMBER():
SELECT uniqueid, groupid, title
FROM
(
SELECT uniqueid, groupid, title,
ROW_NUMBER() OVER (PARTITION BY groupid
ORDER BY uniqueid DESC) rn
FROM tableName
) x
WHERE x.rn = 1
TSQL Ranking Functions
ROW_NUMBER() generates a sequential number that you can filter on. Here the numbering restarts for each groupid and is ordered by uniqueid in descending order, so the greatest uniqueid in each group gets rn = 1.
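A small SQLite check of the ROW_NUMBER() approach, using the test2 table and rows from the last answer in this thread (SQLite is just a stand-in for SQL Server here, and the ORDER BY is added for a stable result):

```python
import sqlite3

# Rows from the test2 inserts: (UniqueID, GroupID, Title).
ROWS = [
    (1, 1, "TEST 1"), (2, 1, "TEST 2"), (3, 3, "TEST 3"), (4, 3, "TEST 4"),
    (5, 5, "TEST 5"), (6, 6, "TEST 6"), (7, 6, "TEST 7"), (8, 6, "TEST 8"),
]

# Number rows within each groupid from the highest uniqueid down; keep rn = 1.
QUERY = """
SELECT uniqueid, groupid, title
FROM (
    SELECT uniqueid, groupid, title,
           ROW_NUMBER() OVER (PARTITION BY groupid ORDER BY uniqueid DESC) AS rn
    FROM test2
) AS x
WHERE x.rn = 1
ORDER BY groupid
"""


def latest_per_group(rows):
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE test2 (UniqueID INTEGER, GroupID INTEGER, Title TEXT)")
        conn.executemany("INSERT INTO test2 VALUES (?, ?, ?)", rows)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


print(latest_per_group(ROWS))
```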
SELECT *
FROM the_table tt
WHERE NOT EXISTS (
SELECT *
FROM the_table nx
WHERE nx.GroupID = tt.GroupID
AND nx.UniqueID > tt.UniqueID
)
;
This query:
- should work in any DBMS (no window functions or CTEs are needed)
- is probably faster than a subquery with an aggregate
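Both window-free answers can be checked against the same test2 rows in SQLite. One caveat on the `IN (SELECT MAX(...) ... GROUP BY GroupID)` version: it assumes UniqueID values are unique across the whole table, because a row from another group that happens to share some group's maximum UniqueID would also match. The NOT EXISTS form compares only within a group, so it does not depend on that.

```python
import sqlite3

# Rows from the test2 inserts: (UniqueID, GroupID, Title).
ROWS = [
    (1, 1, "TEST 1"), (2, 1, "TEST 2"), (3, 3, "TEST 3"), (4, 3, "TEST 4"),
    (5, 5, "TEST 5"), (6, 6, "TEST 6"), (7, 6, "TEST 7"), (8, 6, "TEST 8"),
]

# Keep a row only if no row in the same group has a larger UniqueID.
NOT_EXISTS = """
SELECT tt.UniqueID, tt.GroupID, tt.Title
FROM test2 tt
WHERE NOT EXISTS (
    SELECT 1 FROM test2 nx
    WHERE nx.GroupID = tt.GroupID AND nx.UniqueID > tt.UniqueID
)
ORDER BY tt.GroupID
"""

# Keep a row if its UniqueID is the maximum of some group.
IN_MAX = """
SELECT UniqueID, GroupID, Title FROM test2
WHERE UniqueID IN (SELECT MAX(UniqueID) FROM test2 GROUP BY GroupID)
ORDER BY GroupID
"""


def run(sql):
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE test2 (UniqueID INTEGER, GroupID INTEGER, Title TEXT)")
        conn.executemany("INSERT INTO test2 VALUES (?, ?, ?)", ROWS)
        return conn.execute(sql).fetchall()
    finally:
        conn.close()


print(run(NOT_EXISTS))
print(run(IN_MAX))
```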
Keeping it simple:
select * from test2
where UniqueID in (select max(UniqueID) from test2 group by GroupID)
Considering:
create table test2
(
UniqueID numeric,
GroupID numeric,
Title varchar(100)
)
insert into test2 values(1,1,'TEST 1')
insert into test2 values(2,1,'TEST 2')
insert into test2 values(3,3,'TEST 3')
insert into test2 values(4,3,'TEST 4')
insert into test2 values(5,5,'TEST 5')
insert into test2 values(6,6,'TEST 6')
insert into test2 values(7,6,'TEST 7')
insert into test2 values(8,6,'TEST 8')

SQL: select each ID only once, choosing the earliest datetime

I currently have to run a query like the one below for a one-off report at work.
However, each item in the table has multiple associated "messages" that are all saved, so each item is returned multiple times. I'd like to show each item only once, as per the examples and further explanation below.
I realize this is (at least in my opinion) a poor structure, but the report needs to be done and this is how the data is stored :-(
SQL Fiddle: http://sqlfiddle.com/#!6/76fce/8
Query:
SELECT messageId, receiver, createdDate, itemId from messages_0,items WHERE
itemId IN (1, 2, 3)
AND (receiver = '100' OR receiver = '200')
AND messages_0.description LIKE '%'+items.name+'%'
union all
SELECT messageId, receiver, createdDate, itemId from messages_1,items WHERE
itemId IN (1, 2, 3)
AND (receiver = '100' OR receiver = '200')
AND messages_1.description LIKE '%'+items.name+'%'
Note: there are two message tables, hence the union all
Example messages:
messageId | receiver | createdDate | description
--------------
1 | 100 | 2012/11/27 12:00 | The Dog is awesome
2 | 100 | 2012/11/27 13:00 | Now the Dog is boring
4 | 200 | 2012/11/27 11:30 | I have Wood :-)
Example items:
itemID | name
--------------
1 | Dave
2 | Dog
3 | Wood
Result:
messageId | receiver | createdDate | itemId
1 | 100 | 2012/11/27 12:00 | 2
2 | 100 | 2012/11/27 13:00 | 2
4 | 200 | 2012/11/27 11:00 | 3
However, I need to show each item only once, keeping only the oldest row (by createdDate).
Target Result:
messageId | receiver | createdDate | itemId
1 | 100 | 2012/11/27 12:00 | 2
4 | 200 | 2012/11/27 11:00 | 3
How can I do this in SQL (Sybase)?
So far I have been looking at both group by (which would only return an id) and some sort of sub query, but have been unable to get anything to work!
If I understood you right, something like this could be a start.
SELECT
t, messageId, receiver, createdDate, itemId
FROM
(
SELECT
m.messageId, m.receiver, m.createdDate, m.t,
i.itemId
FROM
items i
INNER JOIN (
SELECT description, messageId, receiver, createdDate, 0 t FROM messages_0
UNION
SELECT description, messageId, receiver, createdDate, 1 t FROM messages_1
) m ON m.description LIKE '%' + i.name + '%'
AND m.receiver IN ('100', '200')
WHERE
i.itemId IN (1, 2, 3)
) data
WHERE
createdDate = (
SELECT MIN(m2.createdDate)
FROM items i2
INNER JOIN (
SELECT description, receiver, createdDate FROM messages_0
UNION ALL
SELECT description, receiver, createdDate FROM messages_1
) m2 ON m2.description LIKE '%' + i2.name + '%'
AND m2.receiver IN ('100', '200')
WHERE i2.itemId = data.itemId
)
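Below is a sketch of the "oldest message per item" filter, run in SQLite via Python's sqlite3 module. Assumptions: the question does not say which messages live in which table, so messages 1 and 2 are put in messages_0 and message 4 in messages_1; dates are stored as ISO strings so MIN compares them correctly; SQLite uses || instead of + for string concatenation; and the CTE is only for brevity, since not every Sybase product supports WITH. Message 4 uses the 11:30 timestamp from the example messages table (the question's result tables show 11:00).

```python
import sqlite3

MESSAGES_0 = [
    (1, "100", "2012-11-27 12:00", "The Dog is awesome"),
    (2, "100", "2012-11-27 13:00", "Now the Dog is boring"),
]
MESSAGES_1 = [(4, "200", "2012-11-27 11:30", "I have Wood :-)")]
ITEMS = [(1, "Dave"), (2, "Dog"), (3, "Wood")]

QUERY = """
WITH matches AS (
    SELECT m.messageId, m.receiver, m.createdDate, i.itemId
    FROM items i
    JOIN (
        SELECT messageId, receiver, createdDate, description FROM messages_0
        UNION ALL
        SELECT messageId, receiver, createdDate, description FROM messages_1
    ) m ON m.description LIKE '%' || i.name || '%'
    WHERE i.itemId IN (1, 2, 3) AND m.receiver IN ('100', '200')
)
SELECT x.messageId, x.receiver, x.createdDate, x.itemId
FROM matches x
-- keep a match only if it is the earliest one for the same item
WHERE x.createdDate = (SELECT MIN(y.createdDate) FROM matches y WHERE y.itemId = x.itemId)
ORDER BY x.itemId
"""


def oldest_per_item():
    conn = sqlite3.connect(":memory:")
    try:
        for table, rows in (("messages_0", MESSAGES_0), ("messages_1", MESSAGES_1)):
            conn.execute(f"CREATE TABLE {table} (messageId INTEGER, receiver TEXT, "
                         "createdDate TEXT, description TEXT)")
            conn.executemany(f"INSERT INTO {table} VALUES (?, ?, ?, ?)", rows)
        conn.execute("CREATE TABLE items (itemId INTEGER, name TEXT)")
        conn.executemany("INSERT INTO items VALUES (?, ?)", ITEMS)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


print(oldest_per_item())
```

If two messages for the same item share the earliest createdDate, both are returned; add a tie-breaker (for example the lowest messageId) if that matters for the report.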
I would put indexes on:
- messages_0 / messages_1: (messageId, createdDate) and (receiver, messageId, createdDate, description)
- items: (itemId, name)