Keeping a running list with a specific limit

I have the following sample dataset.
DECLARE _MAX_ACTIVE_ENROLLMENTS INT64 DEFAULT 2;
WITH `activity` AS (
SELECT "2022-01-01" AS `date_time`, "tim" AS `username`, "enrolled" AS `activity` UNION ALL
SELECT "2022-01-02" AS `date_time`, "sarah" AS `username`, "enrolled" AS `activity` UNION ALL
SELECT "2022-01-04" AS `date_time`, "tim" AS `username`, "extended" AS `activity` UNION ALL
SELECT "2022-01-05" AS `date_time`, "ed" AS `username`, "enrolled" AS `activity` UNION ALL
SELECT "2022-01-06" AS `date_time`, "ed" AS `username`, "extended" AS `activity` UNION ALL
SELECT "2022-01-07" AS `date_time`, "tim" AS `username`, "canceled" AS `activity` UNION ALL
SELECT "2022-01-08" AS `date_time`, "ed" AS `username`, "canceled" AS `activity` UNION ALL
SELECT "2022-01-09" AS `date_time`, "lisa" AS `username`, "enrolled" AS `activity` UNION ALL
SELECT "2022-01-10" AS `date_time`, "sarah" AS `username`, "canceled" AS `activity`
)
I would like to analyze this dataset against a specific limit on the number of active enrollments, to get an overview of which enrollments would have been invalid if the limit had been lower than the number of people applying.
The thing that I'm struggling with in this challenge is that, on every row, you need to keep a running list of active_enrollments that contains the usernames. You need to keep this list to know which "canceled" events you should process and which ones you can ignore. That's how I'm trying to solve the challenge, but maybe there is an entirely different way.
This is the outcome that I'm looking for.
-- _MAX_ACTIVE_ENROLLMENTS = 1
| date_time | username | activity | valid_enrollment |
|------------|----------|----------|------------------|
| 2022-01-01 | tim | enrolled | 1 |
| 2022-01-02 | sarah | enrolled | 0 |
| 2022-01-04 | tim | extended | 1 |
| 2022-01-05 | ed | enrolled | 0 |
| 2022-01-06 | ed | extended | 0 |
| 2022-01-07 | tim | canceled | 1 |
| 2022-01-08 | ed | canceled | 0 |
| 2022-01-09 | lisa | enrolled | 1 |
| 2022-01-10 | sarah | canceled | 0 |
-- _MAX_ACTIVE_ENROLLMENTS = 2
| date_time | username | activity | valid_enrollment |
|------------|----------|----------|------------------|
| 2022-01-01 | tim | enrolled | 1 |
| 2022-01-02 | sarah | enrolled | 1 |
| 2022-01-04 | tim | extended | 1 |
| 2022-01-05 | ed | enrolled | 0 |
| 2022-01-06 | ed | extended | 0 |
| 2022-01-07 | tim | canceled | 1 |
| 2022-01-08 | ed | canceled | 0 |
| 2022-01-09 | lisa | enrolled | 1 |
| 2022-01-10 | sarah | canceled | 1 |
Any help or pointers in the right direction are much appreciated!

The idea is to build an array that lists all enrolled people. In the following steps, the people who canceled are removed from that array.
DECLARE _MAX_ACTIVE_ENROLLMENTS INT64 DEFAULT 2;
WITH activity AS (
# SELECT "2021-01-05" AS `date_time`, "ed" AS `username`, "enrolled" AS `activity` UNION ALL
SELECT "2022-01-01" AS `date_time`, "tim" AS `username`, "enrolled" AS activity UNION ALL
SELECT "2022-01-02" AS `date_time`, "sarah" AS `username`, "enrolled" AS `activity` UNION ALL
SELECT "2022-01-04" AS `date_time`, "tim" AS `username`, "extended" AS `activity` UNION ALL
SELECT "2022-01-05" AS `date_time`, "ed" AS `username`, "enrolled" AS `activity` UNION ALL
SELECT "2022-01-06" AS `date_time`, "ed" AS `username`, "extended" AS `activity` UNION ALL
SELECT "2022-01-07" AS `date_time`, "tim" AS `username`, "canceled" AS `activity` UNION ALL
SELECT "2022-01-08" AS `date_time`, "ed" AS `username`, "canceled" AS `activity` UNION ALL
SELECT "2022-01-09" AS `date_time`, "lisa" AS `username`, "enrolled" AS `activity` UNION ALL
SELECT "2022-01-10" AS `date_time`, "sarah" AS `username`, "canceled" AS `activity`
),
help as (select *,
if(activity.activity="enrolled",1,0) as enroll,
if(activity.activity="canceled",1,0) as cancel,
from activity ),
help1 as (select *,
array_agg(if(enroll=1,username,"NULL")) over win as arr1
from help
window win as (order by date_time range between unbounded preceding and current row)
),
help2 as (select *,
(Select array_agg(x order by offset) from (select x,offset from unnest(arr1) x with offset where cancel!=1 or x!=username)) as arr2
from help1) ,
help3 as (select *,
(Select array_agg(x order by offset) from
(Select x,offset from
(select x,min(offset) offset from unnest(arr2) x with offset where x!="NULL" group by x)
qualify row_number() over (order by offset)<= _MAX_ACTIVE_ENROLLMENTS)
) as arr3
from help2)
Select *,
array_length(arr3) as arr3_length,
array_length(arr3)-ifnull((array_length(lag(arr3) over (order by date_time))),0) as delta_enroll,
enroll-cancel as delta_enroll_expected
from
help3
order by date_time
The CTE help adds two new columns, enroll and cancel, each set to 1 when the row is an enrollment or a cancellation, respectively.
The CTE help1 builds an array arr1 of all people enrolled up to that date. For a row that is not an enrollment, the dummy string "NULL" is added instead; these entries are deleted in a later step.
The CTE help2 removes the person who canceled on that row. This is done with a subquery over the unnested array.
The CTE help3 removes the "NULL" dummy strings and any duplicates from people who enrolled several times. It also trims the array to at most _MAX_ACTIVE_ENROLLMENTS entries.
Finally, delta_enroll is the change in the length of the enrollment list. Comparing it with delta_enroll_expected shows whether the enrollment or cancellation succeeded.
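To make the intended logic easier to check, here is a minimal procedural sketch of the "running list" idea from the question, written in Python rather than BigQuery SQL. The function name flag_valid_enrollments is hypothetical, and the handling of a user who enrolls again while already active is an assumption, since the sample data does not cover that case.

```python
# Sample events from the question: (date_time, username, activity).
EVENTS = [
    ("2022-01-01", "tim", "enrolled"),
    ("2022-01-02", "sarah", "enrolled"),
    ("2022-01-04", "tim", "extended"),
    ("2022-01-05", "ed", "enrolled"),
    ("2022-01-06", "ed", "extended"),
    ("2022-01-07", "tim", "canceled"),
    ("2022-01-08", "ed", "canceled"),
    ("2022-01-09", "lisa", "enrolled"),
    ("2022-01-10", "sarah", "canceled"),
]


def flag_valid_enrollments(events, max_active):
    """Return one 0/1 flag per event, processed in date_time order."""
    active = []  # usernames currently holding a slot, oldest first
    flags = []
    for _date_time, username, activity in sorted(events):
        if activity == "enrolled":
            # Assumption: a repeated enrollment of an active user is not valid.
            valid = username not in active and len(active) < max_active
            if valid:
                active.append(username)
        elif activity == "canceled":
            # Only cancellations of users who hold a slot are processed.
            valid = username in active
            if valid:
                active.remove(username)
        else:
            # "extended" (or anything else) only counts for active users.
            valid = username in active
        flags.append(int(valid))
    return flags
```

With the sample data, flag_valid_enrollments(EVENTS, 1) and flag_valid_enrollments(EVENTS, 2) reproduce the two valid_enrollment columns from the question, which can serve as a reference when validating a SQL solution.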

I hope the query below gives you some direction for your problem.
It returns the enrolled usernames, limited to _MAX_ACTIVE_ENROLLMENTS, at each date_time (though the valid_enrollment column in your expected outcome is not clear to me).
SELECT * EXCEPT (enrollments),
ARRAY (
SELECT username FROM (
SELECT DISTINCT username,
LAST_VALUE(IF(activity = 'enrolled', date_time, NULL) IGNORE NULLS) OVER w1 AS last_enrolled
FROM UNNEST(enrollments)
QUALIFY LAST_VALUE(activity) OVER w1 <> 'canceled'
WINDOW w1 AS (PARTITION BY username ORDER BY date_time
ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)
) ORDER BY last_enrolled LIMIT 2 --<-- _MAX_ACTIVE_ENROLLMENTS
) AS valid_enrollment
FROM (
SELECT *, ARRAY_AGG(STRUCT(username, t.activity, date_time)) OVER w0 AS enrollments
FROM `activity` t
WINDOW w0 AS (ORDER BY date_time)
);
Query results
In row 4, ed's enrollment is pending since the queue with limit 2 is full at the moment.
In row 6, since tim cancelled enrollment, sarah and ed become next valid enrolled users based on their enrollment datetime.

Related

Big query query is too complex after pivot

Assume I have the following table and a list of interests (cat, dog, music, soccer, coding)
| userId | user_interest | label |
| -------- | -------------- |----------|
| 12345 | cat | 1 |
| 12345 | dog | 1 |
| 6789 | music | 1 |
| 6789 | soccer | 1 |
I want to transform the user interest into a binary array (i.e. binarization), and the resulting table will be something like
| userId | labels |
| -------- | -------------- |
| 12345 | [1,1,0,0,0] |
| 6789 | [0,0,1,1,0] |
I am able to do it with PIVOT and ARRAY, e.g.
WITH user_interest_pivot AS (
SELECT
*
FROM (
SELECT userId, user_interest, label FROM table
) AS T
PIVOT
(
MAX(label) FOR user_interest IN ('cat', 'dog', 'music', 'soccer', 'coding')
) AS P
)
SELECT
userId,
ARRAY[IFNULL(cat,0), IFNULL(dog,0), IFNULL(music,0), IFNULL(soccer,0), IFNULL(coding,0)] AS labels,
FROM user_interest_pivot
However, in reality I have a very long list of interests, and the above method does not seem to work in BigQuery due to
Resources exceeded during query execution: Not enough resources for query planning - too many subqueries or query is too complex
Please let me know if there is anything I can do to deal with this situation. Thanks!

Depending on your real data you may still run into resource problems, but it is worth trying the following approach without PIVOT.
First, create an interests table with an additional index column:
+----------+-----+-----------------+
| interest | idx | total_interests |
+----------+-----+-----------------+
| cat | 0 | 5 |
| dog | 1 | 5 |
| music | 2 | 5 |
| soccer | 3 | 5 |
| coding | 4 | 5 |
+----------+-----+-----------------+
Then find the idx of each user interest and aggregate them as shown below (assuming that each user's interests are sparse relative to the full list of interests).
SELECT userId, ARRAY_AGG(idx) user_interests
FROM sample_table t JOIN interests i ON t.user_interest = i.interest
GROUP BY 1
Lastly, create labels vector using a sparse user interest array and dimension of interest space (i.e. total_interests) like below
ARRAY(SELECT IF(ui IS NULL, 0, 1)
FROM UNNEST(GENERATE_ARRAY(0, total_interests - 1)) i
LEFT JOIN t.user_interests ui ON i = ui
ORDER BY i
) AS labels
Query
CREATE TEMP TABLE sample_table AS
SELECT '12345' AS userId, 'cat' AS user_interest, 1 AS label UNION ALL
SELECT '12345' AS userId, 'dog' AS user_interest, 1 AS label UNION ALL
SELECT '6789' AS userId, 'music' AS user_interest, 1 AS label UNION ALL
SELECT '6789' AS userId, 'soccer' AS user_interest, 1 AS label;
CREATE TEMP TABLE interests AS
SELECT *, COUNT(1) OVER () AS total_interests
FROM UNNEST(['cat', 'dog', 'music', 'soccer', 'coding']) interest
WITH OFFSET idx
;
SELECT userId,
ARRAY(SELECT IF(ui IS NULL, 0, 1)
FROM UNNEST(GENERATE_ARRAY(0, total_interests - 1)) i
LEFT JOIN t.user_interests ui ON i = ui
ORDER BY i
) AS labels
FROM (
SELECT userId, total_interests, ARRAY_AGG(idx) user_interests
FROM sample_table t JOIN interests i ON t.user_interest = i.interest
GROUP BY 1, 2
) t;
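For readers who want to check the logic outside BigQuery, the same two steps (collect the sparse indexes per user, then expand them into a dense 0/1 vector) can be sketched in Python. The function name binarize is hypothetical; like the join above, the sketch ignores the label column and treats every matching row as a hit.

```python
INTERESTS = ["cat", "dog", "music", "soccer", "coding"]

# (userId, user_interest, label) rows from the question.
ROWS = [
    ("12345", "cat", 1),
    ("12345", "dog", 1),
    ("6789", "music", 1),
    ("6789", "soccer", 1),
]


def binarize(rows, interests):
    """Map each userId to a 0/1 list ordered like `interests`."""
    # Step 1: position of every known interest (the idx column).
    index = {name: i for i, name in enumerate(interests)}

    # Step 2: sparse set of indexes per user (the ARRAY_AGG(idx) step).
    sparse = {}
    for user_id, interest, _label in rows:
        if interest in index:  # unknown interests are dropped, as in the join
            sparse.setdefault(user_id, set()).add(index[interest])

    # Step 3: expand each sparse set into a dense vector.
    return {
        user_id: [1 if i in idxs else 0 for i in range(len(interests))]
        for user_id, idxs in sparse.items()
    }
```

Calling binarize(ROWS, INTERESTS) gives [1, 1, 0, 0, 0] for user 12345 and [0, 0, 1, 1, 0] for user 6789, matching the expected output in the question.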

I think below approach will "survive" any [reasonable] data
create temp function base10to2(x float64) returns string
language js as r'return x.toString(2);';
with your_table as (
select '12345' as userid, 'cat' as user_interest, 1 as label union all
select '12345' as userid, 'dog' as user_interest, 1 as label union all
select '6789' as userid, 'music' as user_interest, 1 as label union all
select '6789' as userid, 'soccer' as user_interest, 1 as label
), interests as (
select *, pow(2, offset) weight, max(offset + 1) over() as len
from unnest(['cat', 'dog', 'music', 'soccer', 'coding']) user_interest
with offset
)
select userid,
split(rpad(reverse(base10to2(sum(weight))), any_value(len), '0'), '') labels,
from your_table
join interests
using(user_interest)
group by userid
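The bitmask trick in this query can be sketched in Python as follows: each interest gets weight 2^offset, the weights are summed, and the binary representation is reversed and right-padded. The function name labels_from_bitmask is hypothetical. One caveat worth stating as my own observation rather than the answerer's: the UDF above takes a FLOAT64, which only holds 53 bits exactly, so with more than 53 interests the sum may lose low-order bits. Python integers do not have that limit.

```python
INTERESTS = ["cat", "dog", "music", "soccer", "coding"]


def labels_from_bitmask(user_interests, interests):
    """Return the labels as a list of '0'/'1' strings, like SPLIT(...) above."""
    # weight = 2 ** offset for each interest
    weight = {name: 1 << offset for offset, name in enumerate(interests)}

    # Sum the weights of the user's distinct interests into one integer mask.
    mask = sum(weight[name] for name in set(user_interests) if name in weight)

    # Binary string, least significant bit first, padded to the full length.
    bits = format(mask, "b")[::-1].ljust(len(interests), "0")
    return list(bits)


# Illustration of the 53-bit limit of a 64-bit float:
LOSES_PRECISION = float(2**60 + 1) == float(2**60)
```

For example, labels_from_bitmask(["music", "soccer"], INTERESTS) yields ['0', '0', '1', '1', '0'].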

Sub query in the where clause

The code below currently works in a VIEW that uses Windows Authentication: users can see all the data they own plus the data of everyone who reports to them, directly or indirectly. Now I need an additional WHERE clause to include the data of the users listed in the Authorize column.
SAMPLE DATA: Table TORGANIZATION_HIERARCHY
ManagerID | ManagerEmail | Email | EmployeeID | Authorize | Level
---------------------------------------------------------------------------------
NULL | NULL | user0##abc.com | 1 | NULL | 0
1 | user0##abc.com | user1##abc.com | 273 | NULL | 1
273 | user1##abc.com | user2##abc.com | 16 | NULL | 2
273 | user1##abc.com | SJiang##abc.com | 274 | NULL | 2
273 | user1##abc.com | SAbbas#abc.com | 285 | user2##abc.com; user3#abc.com | 2
285 | SAbbas#abc.com | LTsoflias#abc.com | 286 | NULL | 3
274 | SJiang##abc.com | MBlythe#abc.com | 275 | NULL | 3
274 | SJiang##abc.com | LMitchell#abc.com | 276 | NULL | 3
16 | JWhite#abc.com | user3#abc.com | 23 | NULL | 3
SAMPLE DATA: Table TRANS
Email | Destination_account | Customer_service_rep_code
-----------------------------------------------------------
SAbbas#abc.com | Philippines | 12646
Junerk#abc.com | Canada | 95862
LTsoflias#abc.com | Italy | 98524
user2##abc.com | Italy | 29185
user3##abc.com | Brazil | 58722
The query at the bottom works when user SAbbas#abc.com (285) logs in: they can see all the data of EmployeeID 285 and 286. I need to add another WHERE condition so that the user also sees the data of the users listed in their Authorize column. So user SAbbas#abc.com should see EmployeeID 285, 286, 16, and 23.
WITH CTE
AS (SELECT OH.employeeid,
OH.managerid,
OH.email AS EMPEMAIL,
1 AS level
FROM TORGANIZATION_HIERARCHY OH
WHERE OH.[email] = (SELECT SYSTEM_USER) --Example SAbbas#abc.com
UNION ALL
SELECT CHIL.employeeid,
CHIL.managerid,
CHIL.email,
level + 1
FROM TORGANIZATION_HIERARCHY CHIL
JOIN CTE PARENT
ON CHIL.[managerid] = PARENT.[employeeid]),
ANOTHERCTE
AS (SELECT
T.[email],
T.[destination_account],
T.[customer_service_rep_code]
FROM [KGFGJK].[DBO].[TRANS] AS T)
SELECT *
FROM ANOTHERCTE
INNER JOIN CTE
ON CTE.empemail = ANOTHERCTE.[email];

This will give you what you need based on column Authorize. The result should be 16 and 23
Select b.employeeid from TORGANIZATION_HIERARCHY a inner join TORGANIZATION_HIERARCHY b
on a.Authorize like '%' + b.Email + '%'
where a.Email = 'SAbbas#abc.com'
Let me know
Complete Solution:
For user3#abc.com to show up, I had to correct the email in the #TRANS table: you wrote user3##abc.com there instead of user3#abc.com (# and not ##).
The code for your tests is below. Afterwards you can replace the temp tables with your own table names.
IF OBJECT_ID('tempdb..#TORGANIZATION_HIERARCHY') IS NOT NULL DROP TABLE #TORGANIZATION_HIERARCHY;
select NULL as ManagerID ,NULL as ManagerEmail ,'user0##abc.com' as Email ,1 as EmployeeID ,NULL as Authorize , 0 as Level into #TORGANIZATION_HIERARCHY
union select 1 ,'user0##abc.com', 'user1##abc.com' ,273 ,NULL , 1
union select 273 ,'user1##abc.com', 'user2##abc.com' ,16 ,NULL , 2
union select 273 ,'user1##abc.com', 'SJiang##abc.com' ,274 ,NULL , 2
union select 273 ,'user1##abc.com', 'SAbbas#abc.com' ,285 ,'user2##abc.com; user3#abc.com' , 2
union select 285 ,'SAbbas#abc.com', 'LTsoflias#abc.com' ,286 ,NULL , 3
union select 274 ,'SJiang##abc.com', 'MBlythe#abc.com' ,275 ,NULL , 3
union select 274 ,'SJiang##abc.com', 'LMitchell#abc.com' ,276 ,NULL , 3
union select 16 ,'JWhite#abc.com', 'user3#abc.com' ,23 ,NULL , 3
--select * from #TORGANIZATION_HIERARCHY
IF OBJECT_ID('tempdb..#TRANS') IS NOT NULL DROP TABLE #TRANS;
select 'SAbbas#abc.com' as Email , 'Philippines' as Destination_account , 12646 as Customer_service_rep_code into #TRANS
union select 'Junerk#abc.com' , 'Canada' , 95862
union select 'LTsoflias#abc.com', 'Italy' , 98524
union select 'user2##abc.com' , 'Italy' , 29185
union select 'user3#abc.com' , 'Brazil' , 58722
;WITH CTE
AS (SELECT OH.employeeid,
OH.managerid,
OH.Authorize,
OH.email AS EMPEMAIL,
1 AS [level]
FROM #TORGANIZATION_HIERARCHY OH
WHERE OH.[email] = (SELECT 'SAbbas#abc.com') --Example
UNION ALL
SELECT CHIL.employeeid,
CHIL.managerid,
CHIL.Authorize,
CHIL.email,
CHIL.[level] + 1
FROM #TORGANIZATION_HIERARCHY CHIL
JOIN CTE PARENT
ON CHIL.[managerid] = PARENT.[employeeid]),
ANOTHERCTE
AS (SELECT
T.[email],
T.[destination_account],
T.[customer_service_rep_code]
FROM #TRANS AS T)
SELECT *
FROM ANOTHERCTE
RIGHT JOIN
(
select a.EmployeeID, a.ManagerID, a.Authorize, a.Email as empemail, a.[level] From CTE INNER JOIN #TORGANIZATION_HIERARCHY a on lower(CTE.Authorize) like '%' + lower(a.Email) + '%'
union
select * From CTE
) CTE
ON CTE.empemail = ANOTHERCTE.[email]
order by [level]
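As a cross-check of the expected result, the visibility rule (own row, all direct and indirect reports, plus everyone named in the Authorize column of those rows) can be sketched in Python. The function name visible_employee_ids is hypothetical. Unlike the LIKE '%...%' match above, this sketch splits the Authorize string on ';' and compares whole addresses, which is an assumption about the column format but avoids matching one address inside a longer one.

```python
# (manager_id, email, employee_id, authorize) rows from the sample hierarchy.
HIERARCHY = [
    (None, "user0##abc.com", 1, None),
    (1, "user1##abc.com", 273, None),
    (273, "user2##abc.com", 16, None),
    (273, "SJiang##abc.com", 274, None),
    (273, "SAbbas#abc.com", 285, "user2##abc.com; user3#abc.com"),
    (285, "LTsoflias#abc.com", 286, None),
    (274, "MBlythe#abc.com", 275, None),
    (274, "LMitchell#abc.com", 276, None),
    (16, "user3#abc.com", 23, None),
]


def visible_employee_ids(login_email, hierarchy):
    """Return the set of EmployeeIDs the logged-in user may see."""
    id_by_email = {email.lower(): emp for _m, email, emp, _a in hierarchy}
    auth_by_id = {emp: auth for _m, _e, emp, auth in hierarchy}
    reports = {}
    for manager, _email, emp, _auth in hierarchy:
        reports.setdefault(manager, []).append(emp)

    # Walk down the hierarchy from the logged-in user (the recursive CTE).
    visible = set()
    stack = [id_by_email[login_email.lower()]]
    while stack:
        emp = stack.pop()
        if emp not in visible:
            visible.add(emp)
            stack.extend(reports.get(emp, []))

    # Add everyone listed in the Authorize column of the rows found so far.
    for emp in list(visible):
        for email in (auth_by_id[emp] or "").split(";"):
            authorized = id_by_email.get(email.strip().lower())
            if authorized is not None:
                visible.add(authorized)
    return visible
```

For the sample data, visible_employee_ids("SAbbas#abc.com", HIERARCHY) returns {285, 286, 16, 23}, the set the question asks for.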

Possible to use a column name in a UDF in SQL?

I have a query in which a series of steps is repeated constantly over different columns, for example:
SELECT DISTINCT
MAX (
CASE
WHEN table_2."GRP1_MINIMUM_DATE" <= cohort."ANCHOR_DATE" THEN 1
ELSE 0
END)
OVER (PARTITION BY cohort."USER_ID")
AS "GRP1_MINIMUM_DATE",
MAX (
CASE
WHEN table_2."GRP2_MINIMUM_DATE" <= cohort."ANCHOR_DATE" THEN 1
ELSE 0
END)
OVER (PARTITION BY cohort."USER_ID")
AS "GRP2_MINIMUM_DATE"
FROM INPUT_COHORT cohort
LEFT JOIN INVOLVE_EVER table_2 ON cohort."USER_ID" = table_2."USER_ID"
I was considering writing a function to accomplish this as doing so would save on space in my query. I have been reading a bit about UDF in SQL but don't yet understand if it is possible to pass a column name in as a parameter (i.e. simply switch out "GRP1_MINIMUM_DATE" for "GRP2_MINIMUM_DATE" etc.). What I would like is a query which looks like this
SELECT DISTINCT
FUNCTION(table_2."GRP1_MINIMUM_DATE") AS "GRP1_MINIMUM_DATE",
FUNCTION(table_2."GRP2_MINIMUM_DATE") AS "GRP2_MINIMUM_DATE",
FUNCTION(table_2."GRP3_MINIMUM_DATE") AS "GRP3_MINIMUM_DATE",
FUNCTION(table_2."GRP4_MINIMUM_DATE") AS "GRP4_MINIMUM_DATE"
FROM INPUT_COHORT cohort
LEFT JOIN INVOLVE_EVER table_2 ON cohort."USER_ID" = table_2."USER_ID"
Can anyone tell me if this is possible/point me to some resource that might help me out here?
Thanks!

There is no direct way to do this, as @Tejash already stated, but it looks like your database model is not ideal: it would be better to have a table with USER_ID and GRP_ID as keys and MINIMUM_DATE as a separate field.
Without changing the table structure, you can use an UNPIVOT query to mimic this design:
WITH INVOLVE_EVER(USER_ID, GRP1_MINIMUM_DATE, GRP2_MINIMUM_DATE, GRP3_MINIMUM_DATE, GRP4_MINIMUM_DATE)
AS (SELECT 1, SYSDATE, SYSDATE, SYSDATE, SYSDATE FROM dual UNION ALL
SELECT 2, SYSDATE-1, SYSDATE-2, SYSDATE-3, SYSDATE-4 FROM dual)
SELECT *
FROM INVOLVE_EVER
unpivot ( minimum_date FOR grp_id IN ( GRP1_MINIMUM_DATE AS 1, GRP2_MINIMUM_DATE AS 2, GRP3_MINIMUM_DATE AS 3, GRP4_MINIMUM_DATE AS 4))
Result:
| USER_ID | GRP_ID | MINIMUM_DATE |
|---------|--------|--------------|
| 1 | 1 | 09/09/19 |
| 1 | 2 | 09/09/19 |
| 1 | 3 | 09/09/19 |
| 1 | 4 | 09/09/19 |
| 2 | 1 | 09/08/19 |
| 2 | 2 | 09/07/19 |
| 2 | 3 | 09/06/19 |
| 2 | 4 | 09/05/19 |
With this you can write your query without further code duplication and, if needed, use the PIVOT syntax to get one row per USER_ID.
The final query could then look like this:
WITH INVOLVE_EVER(USER_ID, GRP1_MINIMUM_DATE, GRP2_MINIMUM_DATE, GRP3_MINIMUM_DATE, GRP4_MINIMUM_DATE)
AS (SELECT 1, SYSDATE, SYSDATE, SYSDATE, SYSDATE FROM dual UNION ALL
SELECT 2, SYSDATE-1, SYSDATE-2, SYSDATE-3, SYSDATE-4 FROM dual)
, INPUT_COHORT(USER_ID, ANCHOR_DATE)
AS (SELECT 1, SYSDATE-1 FROM dual UNION ALL
SELECT 2, SYSDATE-2 FROM dual UNION ALL
SELECT 3, SYSDATE-3 FROM dual)
-- Above is sampledata query starts from here:
, unpiv AS (SELECT *
FROM INVOLVE_EVER
unpivot ( minimum_date FOR grp_id IN ( GRP1_MINIMUM_DATE AS 1, GRP2_MINIMUM_DATE AS 2, GRP3_MINIMUM_DATE AS 3, GRP4_MINIMUM_DATE AS 4)))
SELECT qcsj_c000000001000000 user_id, GRP1_MINIMUM_DATE, GRP2_MINIMUM_DATE, GRP3_MINIMUM_DATE, GRP4_MINIMUM_DATE
FROM INPUT_COHORT cohort
LEFT JOIN unpiv table_2
ON cohort.USER_ID = table_2.USER_ID
pivot (MAX(CASE WHEN minimum_date <= cohort."ANCHOR_DATE" THEN 1 ELSE 0 END) AS MINIMUM_DATE
FOR grp_id IN (1 AS GRP1,2 AS GRP2,3 AS GRP3,4 AS GRP4))
Result:
| USER_ID | GRP1_MINIMUM_DATE | GRP2_MINIMUM_DATE | GRP3_MINIMUM_DATE | GRP4_MINIMUM_DATE |
|---------|-------------------|-------------------|-------------------|-------------------|
| 3 | | | | |
| 1 | 0 | 0 | 0 | 0 |
| 2 | 0 | 1 | 1 | 1 |
This way you only have to write your calculation logic once (see line starting with pivot).
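The "write the logic once, apply it to every group column" idea can also be illustrated outside SQL. The following Python sketch is only an illustration under assumptions: the fixed date D stands in for SYSDATE, and the names on_or_before_anchor and cohort_flags are hypothetical.

```python
from datetime import date, timedelta

D = date(2019, 9, 9)  # assumed stand-in for SYSDATE in the sample data
GROUP_COLUMNS = [
    "GRP1_MINIMUM_DATE",
    "GRP2_MINIMUM_DATE",
    "GRP3_MINIMUM_DATE",
    "GRP4_MINIMUM_DATE",
]

# INVOLVE_EVER: user 1 has SYSDATE everywhere, user 2 has SYSDATE-1 .. SYSDATE-4.
INVOLVE_EVER = {
    1: {col: D for col in GROUP_COLUMNS},
    2: {col: D - timedelta(days=i + 1) for i, col in enumerate(GROUP_COLUMNS)},
}
# INPUT_COHORT: USER_ID -> ANCHOR_DATE.
INPUT_COHORT = {uid: D - timedelta(days=uid) for uid in (1, 2, 3)}


def on_or_before_anchor(minimum_date, anchor_date):
    """The calculation that is written only once."""
    return int(minimum_date <= anchor_date)


def cohort_flags(cohort, involve_ever):
    """Apply the same check to every group column of every cohort user."""
    result = {}
    for user_id, anchor in cohort.items():
        row = involve_ever.get(user_id)  # None mimics the unmatched LEFT JOIN
        result[user_id] = {
            col: on_or_before_anchor(row[col], anchor) if row else None
            for col in GROUP_COLUMNS
        }
    return result
```

The flags for users 1 and 2 match the result table above, and user 3, who has no row in INVOLVE_EVER, gets None in every column, mirroring the empty cells.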

Removing group of results if total is 0

I am using the following table to create a stacked bar chart (the real table is quite a bit larger than this):
ID | Name | foodEaten | total
1 | Sam | Burger | 3
1 | Sam | Pizza | 1
1 | Sam | Kebab | 0
1 | Sam | Cheesecake| 3
1 | Sam | Sandwich | 5
2 | Jeff | Burger | 0
2 | Jeff | Pizza | 0
2 | Jeff | Kebab | 0
2 | Jeff | Cheesecake| 0
2 | Jeff | Sandwich | 0
I need to find a way to remove results like Jeff's, where the total of everything he ate is 0. I can't think of an easy way to achieve this. I've tried grouping the entire result by ID and creating a total, but it's just not working.
If a person has eaten a total of 0 food, they need to be excluded. But if their total is above 0 and they just haven't eaten any kebabs, as in the table above, the zero row needs to stay in the result!
So the output needed is:
ID | Name | foodEaten | total
1 | Sam | Burger | 3
1 | Sam | Pizza | 1
1 | Sam | Kebab | 0
1 | Sam | Cheesecake| 3
1 | Sam | Sandwich | 5

Assuming that you want the rows as they appear, rather than aggregating first and then excluding:
WITH CTE AS (
SELECT ID,
[Name],
foodEaten,
total,
SUM(total) OVER (PARTITION BY [Name]) AS nameTotal
FROM YourTable)
SELECT ID,
[Name],
foodEaten,
total
FROM CTE
WHERE nameTotal > 0;

select id, name, sum(total) as total from <table> group by id, name having sum(total) > 0
Does this work for you?

You can try below -
select id,name
from tablename a
group by id,name
having sum(total)>0
OR
select * from tablename a
where not exists (select 1 from tablename b where a.id=b.id group by id,name
having sum(total)=0)

Try this
;WITH CTE (ID , Name , foodEaten , total)
AS
(
SELECT 1 , 'Sam' , 'Burger' , 3 UNION ALL
SELECT 1 , 'Sam' , 'Pizza' , 1 UNION ALL
SELECT 1 , 'Sam' , 'Kebab' , 2 UNION ALL
SELECT 1 , 'Sam' , 'Cheesecake', 3 UNION ALL
SELECT 1 , 'Sam' , 'Sandwich' , 5 UNION ALL
SELECT 2 , 'Jeff' , 'Burger' , 0 UNION ALL
SELECT 2 , 'Jeff' , 'Pizza' , 0 UNION ALL
SELECT 2 , 'Jeff' , 'Kebab' , 0 UNION ALL
SELECT 2 , 'Jeff' , 'Cheesecake', 0 UNION ALL
SELECT 2 , 'Jeff' , 'Sandwich' , 0
)
SELECT ID , Name ,SUM( total) AS Grandtotal
FROM CTE
GROUP BY ID , Name
HAVING SUM( total) >0
Result
ID Name Grandtotal
----------------------
1 Sam 14

Using DELETE with HAVING SUM(total) = 0 will remove every group of rows whose total is 0:
DELETE FROM TableName
WHERE ID IN (SELECT Id FROM TableName GROUP BY ID HAVING SUM(total) = 0)
Or, if you do not want to delete anything and only want to select the records whose group total is not zero:
SELECT * FROM TableName
WHERE ID NOT IN (SELECT Id FROM TableName GROUP BY ID HAVING SUM(total) = 0)

Assuming total is never negative, then probably the most efficient method is to use exists:
select t.*
from t
where exists (select 1
from t t2
where t2.name = t.name and
t2.total > 0
);
In particular, this can take advantage of an index on (name, total).
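To try the EXISTS approach quickly, here is a small self-contained sketch that runs the same query against an in-memory SQLite database from Python. SQLite is only a stand-in for your real database, and the function name rows_for_people_who_ate is hypothetical.

```python
import sqlite3

# Sample rows from the question: (ID, Name, foodEaten, total).
ROWS = [
    (1, "Sam", "Burger", 3),
    (1, "Sam", "Pizza", 1),
    (1, "Sam", "Kebab", 0),
    (1, "Sam", "Cheesecake", 3),
    (1, "Sam", "Sandwich", 5),
    (2, "Jeff", "Burger", 0),
    (2, "Jeff", "Pizza", 0),
    (2, "Jeff", "Kebab", 0),
    (2, "Jeff", "Cheesecake", 0),
    (2, "Jeff", "Sandwich", 0),
]


def rows_for_people_who_ate(rows):
    """Keep every row of a person who has at least one total > 0."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE t (id INTEGER, name TEXT, foodEaten TEXT, total INTEGER)"
    )
    conn.executemany("INSERT INTO t VALUES (?, ?, ?, ?)", rows)
    result = conn.execute(
        """
        SELECT t.id, t.name, t.foodEaten, t.total
        FROM t
        WHERE EXISTS (SELECT 1
                      FROM t AS t2
                      WHERE t2.name = t.name AND t2.total > 0)
        ORDER BY t.rowid
        """
    ).fetchall()
    conn.close()
    return result
```

With the sample data this returns Sam's five rows, including the Kebab row with total 0, and none of Jeff's rows.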

Self join next timestamp

I am trying to pair timestamps from two different rows, based on the employee and punch card. MAX or a limit does not work in the join condition, and if I only use > I get every subsequent row for the whole day. I want only the next higher value in a self join. I also have to mention that I have to use SQL Server 2008, so LAG and LEAD are not available!
Please help me.
SELECT Det.name
,Det.[time]
,Det2.[time]
,Det.[type]
,det2.type
,Det.[detail]
FROM [detail] Det
join [detail] Det2 on
Det2.name = Det.name
and
Det2.time > Det.time Max 1
where det.type <>3
Table detail
NAME | Time | Type | detail
john | 10:30| 1 | On
steve| 10:32| 1 | On
john | 10:34| 2 | break
paul | 10:35| 1 | On
steve| 10:45| 3 | Off
john | 10:49| 2 | on
paul | 10:55| 3 | Off
john | 11:12| 3 | Off
Wanted result
John | 10:30 | 10:34 | 1 | 2 | On
John | 10:34 | 10:49 | 2 | 1 | Break
John | 10:49 | 11:12 | 1 | 3 | on
Steve| 10:32 | 10:45 | 1 | 3 | on
Paul | 10:35 | 10:55 | 1 | 3 | On
Thank you in advance!

You can do it with cross apply:
SELECT Det.name
,Det.[time]
,ca.[time]
,Det.[type]
,ca.type
,Det.[detail]
FROM [detail] Det
Cross Apply(Select Top 1 * From detail det2 where det.Name = det2.Name And det2.[Time] > det.[Time] Order By det2.[Time]) ca
Where det.Type <> 3

As you said, the LAG and LEAD functions won't work for you, but you could use ROW_NUMBER() OVER (PARTITION BY name ORDER BY time) on both copies of the table and then join each row to the one with the next row number (t2.rn = t.rn + 1).
This is just an idea, but I don't see why it shouldn't work.
Query:
;WITH Data (NAME, TIME, type, detail)
AS (
SELECT 'john', CAST('10:30' AS DATETIME2), 1, 'On'
UNION ALL
SELECT 'steve', '10:32', 1, 'On'
UNION ALL
SELECT 'john', '10:34', 2, 'break'
UNION ALL
SELECT 'paul', '10:35', 1, 'On'
UNION ALL
SELECT 'steve', '10:45', 3, 'Off'
UNION ALL
SELECT 'john', '10:49', 2, 'on'
UNION ALL
SELECT 'paul', '10:55', 3, 'Off'
UNION ALL
SELECT 'john', '11:12', 3, 'Off'
)
SELECT t.NAME, LTRIM(RIGHT(CONVERT(VARCHAR(25), t.TIME, 100), 7)) AS time, LTRIM(RIGHT(CONVERT(VARCHAR(25), t2.TIME, 100), 7)) AS time, t.type, t2.type, t.detail
FROM (
SELECT ROW_NUMBER() OVER (PARTITION BY NAME ORDER BY TIME) rn, *
FROM Data
) AS t
INNER JOIN (
SELECT ROW_NUMBER() OVER (PARTITION BY NAME ORDER BY TIME) rn, *
FROM Data
) AS t2
ON t2.NAME = t.NAME
AND t2.rn = t.rn + 1;
Result:
NAME time time type type detail
----------------------------------------------
john 10:30AM 10:34AM 1 2 On
john 10:34AM 10:49AM 2 2 break
john 10:49AM 11:12AM 2 3 on
paul 10:35AM 10:55AM 1 3 On
steve 10:32AM 10:45AM 1 3 On
Any comments, concerns - let me know. :)
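If you would rather avoid window functions entirely, a correlated subquery that picks the smallest later time per name is another option. The sketch below runs that idea against an in-memory SQLite database from Python purely as a stand-in for SQL Server 2008. The function name pair_with_next_punch is hypothetical, and storing the times as zero-padded 'HH:MM' text is an assumption that makes string comparison match time order.

```python
import sqlite3

# Sample rows from the question: (name, time, type, detail).
DETAIL = [
    ("john", "10:30", 1, "On"),
    ("steve", "10:32", 1, "On"),
    ("john", "10:34", 2, "break"),
    ("paul", "10:35", 1, "On"),
    ("steve", "10:45", 3, "Off"),
    ("john", "10:49", 2, "on"),
    ("paul", "10:55", 3, "Off"),
    ("john", "11:12", 3, "Off"),
]


def pair_with_next_punch(rows):
    """Join every punch to the next later punch of the same person."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE detail (name TEXT, time TEXT, type INTEGER, detail TEXT)"
    )
    conn.executemany("INSERT INTO detail VALUES (?, ?, ?, ?)", rows)
    result = conn.execute(
        """
        SELECT d.name, d.time, n.time, d.type, n.type, d.detail
        FROM detail AS d
        JOIN detail AS n
          ON n.name = d.name
         AND n.time = (SELECT MIN(x.time)
                       FROM detail AS x
                       WHERE x.name = d.name AND x.time > d.time)
        ORDER BY d.name, d.time
        """
    ).fetchall()
    conn.close()
    return result
```

Rows without a later punch (the final "Off" of each person) drop out of the inner join, so the output has the same five pairs as the result table above.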

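As @evaldas-buinauskas said, the OVER clause is the way to go.
The LAG and LEAD functions would do this directly, but they were only introduced in SQL Server 2012, so they will help you only if you can move off SQL Server 2008.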
Here is a similar example:
http://www.databasejournal.com/features/mssql/lead-and-lag-functions-in-sql-server-2012.html
