The following example demonstrates the case:
Following is the sample data:
Following is the output expectation (note that there is more than one entity in the 'entity' column):
is_hit is defined when variable_a is <= 4
variable_a is defined if the total hits from the past days have reached 3
What I have to do is tag whether the entity's cumulative hit count has reached a total of 3. Once the entity is tagged, the hit count should reset to 0 again.
Following this logic, in the demonstration above Entity A is tagged on 4th June and 9th June.
Currently, my issue is applying the is_tagged logic to the query. Is there a way to do this in SQL?
If I understand correctly, you want row_number():
select t.*,
       -- number the hit rows per entity by date and flag every third one
       (case when is_hit and
                  mod(row_number() over (partition by entity, is_hit
                                         order by date
                                        ),
                      3) = 0
             then true
        end) as is_tagged
from t;
Note: This assumes that your columns are booleans. If they are strings, use 'true' instead.
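For the string case, the comparison version might look like this (a sketch; the lowercase 'true' literal is an assumption about how the values are stored):
select t.*,
       (case when is_hit = 'true' and
                  mod(row_number() over (partition by entity, is_hit
                                         order by date
                                        ),
                      3) = 0
             then 'true'  -- string literal instead of a boolean
        end) as is_tagged
from t;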
I have a table where the first column is time, which increases in 1-second increments.
The second column contains the code that started the day in its first row; this can be any value between 1 and 5. A value of 0 (zero) indicates the code didn't change. When the code changes, the row shows the time of the change and the new code (hence, an event), but for as long as the code stays the same, 0 (zero) is shown again.
My intent is to create a new column specifying the code in effect at any given time. So far, I have been doing this in Excel, with the following conditions and results:
Is there a way to apply this Excel condition in a query to create the new column?
I've been testing CASE WHEN statements, and I tried to implement LAG and LEAD functions in them, but so far none of them worked to carry over the value of the previous row when the event is 0 (zero).
If event is always increasing, you can use a window max():
select time, event, max(event) over(order by time) code
from mytable
Otherwise, it is a bit more complicated. One option is to build groups with a window sum:
select time, event, max(event) over(partition by grp order by time) code
from (
    select
        t.*,
        -- each non-zero event starts a new group
        sum(case when event > 0 then 1 else 0 end) over(order by time) grp
    from mytable t
) t
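As a quick illustration, here is a minimal hypothetical setup for the second query (table values are invented for the example):
create table mytable (time int, event int);
insert into mytable values (1, 3), (2, 0), (3, 0), (4, 5), (5, 0), (6, 2), (7, 0);

-- grp increments at each non-zero event: 1, 1, 1, 2, 2, 3, 3
-- so code resolves to:                   3, 3, 3, 5, 5, 2, 2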
How can I write a SUMIF-style calculation that adds up values in one column while the value in another column is "False", but restarts the count whenever it hits a "True", including the value of that first "True" row in the SUM? I would also like it to add up the values in chronological order.
I did some research and I think I need to use OVER (PARTITION BY ...) and make a row number column so I can reset at each row number = 1, but I'm not sure how to do this.
Edit: the sum should also include the "distance" value for the first "True" value it encounters.
Edit 2: Ultimately, I am trying to calculate the average distance each vehicle travels before an alert is triggered to "True", which means it needs to be taken to the shop to be fixed. Perhaps there is a better way to do this than what I was originally thinking?
Sorry for the poor phrasing...
You want to define groups. It sounds like you want the definition to be the number of "trues" up to and including a given row. Then, you can do a cumulative sum within each group. So:
select t.*,
       sum(distance) over (partition by vehicleid, grp
                           order by date
                           rows between unbounded preceding and current row
                          ) as running_distance
from (select t.*,
             -- grp = number of 'True' alerts up to and including this row
             sum(case when alert = 'True' then 1 else 0 end) over
                 (partition by vehicleid
                  order by date
                  rows between unbounded preceding and current row
                 ) as grp
      from t
     ) t;
Here is a db<>fiddle that illustrates that this code works.
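To make the grouping concrete, here is a minimal hypothetical setup (table name and values are assumptions, not the asker's data):
create table t (vehicleid int, date date, alert varchar(5), distance int);
insert into t values
    (1, '2020-01-01', 'False', 10),
    (1, '2020-01-02', 'False', 20),
    (1, '2020-01-03', 'True',   5),
    (1, '2020-01-04', 'False', 15);

-- grp counts the 'True' rows so far: 0, 0, 1, 1
-- the running sum restarts when grp changes: 10, 30, then 5, 20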
You are right in thinking that you can use the SUM analytical function. Something like this will do the cumulative sum for you.
To restart the SUM when the alert is True, you include the alert in the partition window and order by date to achieve the ordering.
SELECT SUM(CASE WHEN alert = 'FALSE'
THEN distance
ELSE 0
END)
OVER(PARTITION BY alert
ORDER BY date) cumm_sum
, date
, alert
FROM Table
I have a "daily changes" table that records when a customer "upgrades" or "downgrades" their membership level. In the table, let's say field 1 is customer ID, field 2 is membership type and field 3 is the date of change. Customers 123 and ABC each have two rows in the table. Values in field 1 (ID) are the same, but values in field 2 (TYPE) and 3 (DATE) are different. I'd like to write a SQL query to tell me how many customers "upgraded" from membership type 1 to membership type 2 how many customers "downgraded" from membership type 2 to membership type 1 in any given time frame.
The table also shows other types of changes. To identify the records with changes in the membership type field, I've created the following code:
SELECT *
FROM member_detail_daily_changes_new
WHERE customer IN (
SELECT customer
FROM member_detail_daily_changes_new
GROUP BY customer
HAVING COUNT(distinct member_type_cd) > 1)
I'd like to see an end report which tells me:
For Fiscal 2018,
X,XXX customers moved from Member Type 1 to Member Type 2 and
X,XXX customers moved from Member Type 2 to Member type 1
Sounds like a good time to use the LEAD() analytical function to look ahead at a given customer's Member_Type_Code, compare it to the current record, evaluate whether that's an upgrade or downgrade, and then sum the results.
WITH CTE AS (
    SELECT case when lead(Member_Type_Code) over (partition by Customer order by date asc) > Member_Type_Code then 1 else 0 end as Upgrade,
           case when lead(Member_Type_Code) over (partition by Customer order by date asc) < Member_Type_Code then 1 else 0 end as DownGrade
    FROM member_detail_daily_changes_new
    WHERE Date between '20190101' and '20190201')
SELECT sum(Upgrade) upgrades, sum(downgrade) downgrades
FROM CTE
Giving us (using my sample data):
+----+----------+------------+
| | upgrades | downgrades |
+----+----------+------------+
| 1 | 3 | 2 |
+----+----------+------------+
I'm not sure if SQL Server Express on Rextester just doesn't support SUM() over the analytic function itself, which is why I had to add the CTE, or if that restriction applies in non-Express versions too.
Some other notes:
I let the system implicitly cast the dates in the where clause.
I assume the Member_Type_Code itself tells me whether it's an upgrade or downgrade, which long term probably isn't right. Say we add membership type 3 and it goes between 1 and 2... now what? So maybe we need a decimal number outside of the Member_Type_Code so we can handle future memberships and classify each change as an upgrade, a downgrade, or a lateral move...
I assumed all upgrades/downgrades are counted, and a user can be counted multiple times if their membership changed that often in the desired time period.
I assume an upgrade/downgrade can't occur on the same date/time; otherwise the sorting for LEAD may not work right (but if it's a timestamp field we shouldn't have an issue).
So how does this work?
We use a common table expression (CTE) to generate the desired upgrade/downgrade evaluations per customer. This could be done in an in-line derived table as well, but I find CTEs easier to read; then we sum it up.
Lead(Member_Type_Code) over (partition by customer order by date asc) does the following
It organizes the data by customer and then sorts it by date in ascending order.
So we end up with all of a customer's records in subsequent rows ordered by date. LEAD(field) then starts on record 1 and looks ahead to record 2 for the same customer, returning record 2's Member_Type_Code on record 1. We can then compare those type codes to determine whether an upgrade or downgrade occurred, and sum the results of the comparison to provide the desired totals.
And now we have a long-winded explanation for a very small query :P
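As a minimal illustration of the look-ahead itself (the rows in the comments are hypothetical):
select Customer,
       Member_Type_Code,
       lead(Member_Type_Code) over (partition by Customer order by date) as next_type
from member_detail_daily_changes_new;
-- For a customer with rows (type 1 on day 1, type 2 on day 2), the first row gets
-- next_type = 2 (next_type > Member_Type_Code, so Upgrade = 1); the customer's last
-- row gets next_type = NULL and counts as neither upgrade nor downgrade.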
You want to use lag() for this, but you need to be careful about the date filtering. So, I think you want:
SELECT prev_membership_type, membership_type,
COUNT(*) as num_changes,
COUNT(DISTINCT member) as num_members
FROM (SELECT mddc.*,
LAG(mddc.membership_type) OVER (PARTITION BY mddc.customer_id ORDER BY mddc.date) as prev_membership_type
FROM member_detail_daily_changes_new mddc
) mddc
WHERE prev_membership_type <> membership_type AND
date >= '2018-01-01' AND
date < '2019-01-01'
GROUP BY membership_type, prev_membership_type;
Notes:
The filtering on date needs to occur after the calculation of lag() (see the sketch below).
This takes into account that members may have a certain type in 2017 and then change to a new type in 2018.
The date filtering is compatible with indexes.
Two values are calculated. One is the overall number of changes. The other counts each member only once for each type of change.
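To see why the filter order matters, here is a sketch of the wrong variant, with the filter pushed into the subquery (for illustration only):
-- Wrong: filtering before LAG() discards each member's 2017 rows, so a member whose
-- last 2017 type was 1 and whose first 2018 type is 2 gets prev_membership_type = NULL,
-- and the 2017 -> 2018 change is never counted.
SELECT prev_membership_type, membership_type
FROM (SELECT mddc.*,
             LAG(mddc.membership_type) OVER (PARTITION BY mddc.customer_id
                                             ORDER BY mddc.date) as prev_membership_type
      FROM member_detail_daily_changes_new mddc
      WHERE mddc.date >= '2018-01-01'   -- too early: the boundary row is lost here
     ) mddc
WHERE prev_membership_type <> membership_type;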
With conditional aggregation after self joining the table:
select
2018 fiscal,
sum(case when m.member_type_cd > t.member_type_cd then 1 else 0 end) upgrades,
sum(case when m.member_type_cd < t.member_type_cd then 1 else 0 end) downgrades
from member_detail_daily_changes_new m inner join member_detail_daily_changes_new t
on
t.customer = m.customer
and
t.changedate = (
select max(changedate) from member_detail_daily_changes_new
where customer = m.customer and changedate < m.changedate
)
where year(m.changedate) = 2018
This will work even if there are more than 2 types of membership level.
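For reference, a minimal hypothetical table this would run against (values are invented):
create table member_detail_daily_changes_new (
    customer       int,
    member_type_cd int,
    changedate     date
);
insert into member_detail_daily_changes_new values
    (123, 1, '2018-03-01'),
    (123, 2, '2018-06-15'),   -- customer 123: 1 -> 2, an upgrade
    (456, 2, '2018-02-01'),
    (456, 1, '2018-09-30');   -- customer 456: 2 -> 1, a downgrade

-- Each 2018 row joins to the customer's immediately preceding row (rows with no
-- predecessor drop out of the inner join), giving 1 upgrade and 1 downgrade.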
I work for a sports film analysis company. We have teams with unique team IDs, and I would like to find the number of consecutive weeks each team has uploaded film to our site, moving backwards from today. Each upload also has its own row in a separate table that I can join on teamid, with a unique date of when it was uploaded. So far I have put together a simple query that pulls each unique DATEDIFF(week) value and groups on teamid.
Select teamid, MAX(weekdiff)
from (Select teamid, DATEDIFF(week, dateuploaded, GETDATE()) as weekdiff
      from leroy_events
      group by teamid, DATEDIFF(week, dateuploaded, GETDATE())) t
group by teamid
What I am given is a list of teamIDs and unique weekly date differences. I would like to then find the max for each teamID without breaking an increment of 1. For example, if my data set is:
Team datediff
11453 0
11453 1
11453 2
11453 5
11453 7
11453 13
I would like the max value for team 11453 to be 2.
Any ideas would be awesome.
I have simplified your example by assuming that I already have a table with a weekdiff column; that is what you're calculating with DATEDIFF.
First, I'm using the LAG() window function to assign the previous value (in the ordered set) of weekdiff to the current row.
Then, using a WHERE condition, I retrieve the max(weekdiff) among rows whose previous value is current_value - 1, i.e. consecutive weekdiffs.
Data:
create table leroy_events ( teamid int, weekdiff int);
insert into leroy_events values (11453,0),(11453,1),(11453,2),(11453,5),(11453,7),(11453,13);
Code:
WITH initial_data AS (
Select
teamid,
weekdiff,
lag(weekdiff,1) over (partition by teamid order by weekdiff) as lag_weekdiff
from
leroy_events
)
SELECT
teamid,
max(weekdiff) AS max_weekdiff_consecutive
FROM
initial_data
WHERE weekdiff = lag_weekdiff + 1 -- this ensures retrieving max() without breaking your consecutive increment
GROUP BY 1
SQLFiddle with your sample data to see how this code works.
Result:
teamid max_weekdiff_consecutive
11453 2
You can use SQL window functions to probe relationships between rows of the table. In this case the lag() function can be used to look at the previous row relative to a given order and grouping. That way you can determine whether a given row is part of a group of consecutive rows.
You still need to aggregate or filter so as to reduce the rows for each group of interest (i.e. each team) to one; it's convenient in this case to aggregate. Overall, it might look like this:
select
team,
case min(datediff)
when 0 then max(datediff)
else -1
end as max_weeks
from (
select
team,
datediff,
case
when (lag(datediff) over (partition by team order by datediff) != datediff - 1)
then 0
else 1
end as is_consec
from diffs
) cd
where is_consec = 1
group by team
The inline view just adds an is_consec column to the data, marking whether each row is part of a group of consecutive rows. The outer query filters on that column (you cannot filter directly on a window function), and chooses the maximum datediff from the remaining rows for each team.
There are a few subtleties there:
The case expression in the inline view is written as it is to exploit the fact that the lag() computed for the first row of each partition will be NULL, which does not evaluate unequal (nor equal) to any value. Thus the first row in each partition is always marked consecutive (demonstrated below).
The case testing min(datediff) in the outer select clause picks up teams that have no record with datediff = 0, and assigns -1 to column max_weeks for them.
It would also have been possible to mark rows non-consecutive if the first in their group did not have datediff = 0, but then you would lose such teams from the results altogether.
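To demonstrate that NULL handling on the first row of each partition, here is a sketch against the same diffs table:
select
  team,
  datediff,
  lag(datediff) over (partition by team order by datediff) as prev_diff
from diffs
-- For team 11453 the first row has prev_diff = NULL; NULL != -1 is unknown rather
-- than true, so the CASE in the inline view falls through to ELSE and marks it 1.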
Let's say I have two columns: Date and Indicator
Usually the indicator goes from 0 to 1 (when the data is sorted by date), and I want to be able to identify when it goes from 1 to 0 instead. Is there an easy way to do this with SQL?
I am already aggregating other fields in the same table. If I can add this as another aggregation (e.g. without using a separate "where" statement or passing over the data a second time), it would be pretty awesome.
This is the phenomena I want to catch:
Date Indicator
1/5/01 0
1/4/01 0
1/3/01 1
1/2/01 1
1/1/01 0
This isn't a Teradata-specific answer, but this can be done in normal SQL.
Assuming that the sequence is already 'complete' and x(n+1) can be derived from x(n), such as when the dates are sequential and all present:
SELECT date -- the 1 on the day following the 0
FROM r curr
JOIN r prev
-- join each day with the previous day
ON curr.date = dateadd(d, 1, prev.date)
WHERE curr.indicator = 1
AND prev.indicator = 0
YMMV on the ability of such a query to use indexes efficiently.
If the sequence is not complete, the same can be applied after making a delegate sequence which is well ordered and similarly 'complete'.
This can also be done using correlated subqueries, each selecting the indicator of the 'previous max', but... ugh.
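For completeness, a sketch of what that correlated-subquery version might look like (assuming, as above, a table r with one row per date):
SELECT curr.date   -- again, the 1 on the day following the 0
FROM r curr
WHERE curr.indicator = 1
  AND 0 = (SELECT prev.indicator
           FROM r prev
           WHERE prev.date = (SELECT MAX(p.date) FROM r p WHERE p.date < curr.date));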
Joining the table against itself is quite generic, but most SQL dialects now support analytical functions. Ideally you could use LAG(), but Teradata seems to support only the absolute minimum of these, so they point you to SUM() combined with a preceding-rows window.
In any regard, this method avoids a potentially costly join and effectively deals with gaps in the data, whilst making maximum use of indexes.
SELECT *
FROM yourTable t
QUALIFY t.indicator < SUM(t.indicator) OVER (PARTITION BY t.somecolumn /* optional */
                                             ORDER BY t.Date
                                             ROWS BETWEEN 1 PRECEDING AND 1 PRECEDING
                                            )
QUALIFY is a bit TeraData specific, but slightly tidier than the alternative...
SELECT *
FROM (
    SELECT t.*,
           SUM(t.indicator) OVER (PARTITION BY t.somecolumn /* optional */
                                  ORDER BY t.Date
                                  ROWS BETWEEN 1 PRECEDING AND 1 PRECEDING
                                 ) AS previous_indicator
    FROM yourTable t
) lagged
WHERE lagged.indicator < lagged.previous_indicator
Supposing you mean that you want to determine whether any row having 1 as its indicator value has an earlier Date than a row in its group having 0 as its indicator value, you can identify groups with that characteristic by including the appropriate extreme dates in your aggregate results:
SELECT
...
MAX(CASE indicator WHEN 0 THEN Date END) AS last_ind_0,
MIN(CASE indicator WHEN 1 THEN Date END) AS first_ind_1,
...
You then test whether first_ind_1 is less than last_ind_0, either in code or as another selection item.
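Put together, such a query might look like this (a sketch; the grouping column some_group and table name are assumptions):
SELECT some_group,
       MAX(CASE indicator WHEN 0 THEN Date END) AS last_ind_0,
       MIN(CASE indicator WHEN 1 THEN Date END) AS first_ind_1,
       CASE WHEN MIN(CASE indicator WHEN 1 THEN Date END)
                 < MAX(CASE indicator WHEN 0 THEN Date END)
            THEN 1 ELSE 0 END AS went_1_to_0
FROM yourTable
GROUP BY some_group;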