How to effective get the max data in sub group and get the min data in among big groups? - sql

Firstly, I hope to get the max date in each sub-groups.
Group A = action 1 & 2
Group B = action 3 & 4
actionName
action
actionBy
actiontime
999
1
Tom
2022-07-15 09:18:00
999
1
Tom
2022-07-15 15:21:00
999
2
Peter
2022-07-15 14:06:00
999
2
Peter
2022-07-15 14:08:00
999
3
Sally
2022-07-15 14:20:00
999
3
Mary
2022-07-15 14:22:00
999
4
Mary
2022-07-15 14:25:00
In this example:
The max time of group A is "1 | Tom | 2022-07-15 15:21:00 "
The max time of group B is " 4 | Mary | 2022-07-15 14:25:00 "
The final answer is "1 | Tom | 2022-07-15 14:25:00 ", which is the minimum data among groups.
I have a method how to get the max date in each group like the following code.
with cte1
as (select actionName,
actiontime,
actionBy,
row_number() over (partition by actionName order by actiontime desc) as rn
from actionDetails
where action in ( '1', '2' )
UNION
select actionName,
actiontime,
actionBy,
row_number() over (partition by actionName order by actiontime desc) as rn
from actionDetails
where action in ( '3', '4' )
)
select *
from cte1
where rn = 1
ActionName is not PK. It would get the max data in each group.
Then, I don't know how to use an effective way to get the minimum data between group A and group B. Would you give me some ideas?
I know one of the methods is self join again. However, I think that is not the best solution.

First of all, you can simplify your query by putting the action groups into the partition clause. Use a case expression to get one group for actions 1 and 2 and another for actions 3 and 4.
Then after getting the maximum dates per actionname and action group you want to get the minimum dates of these per actionname. This means you want a second CTE building up on the first one:
with max_per_group as
(
select top(1) with ties
actionname,
actiontime,
actionby
from actiondetails
where action in (1, 2, 3, 4)
order by row_number()
over (partition by actionname, case when action <= 2 then 1 else 2 end
order by actiontime desc)
)
, min_of_max as
(
select top(1) with ties
actionname,
actiontime,
actionby
from max_per_group
order by row_number() over (partition by actionname order by actiontime)
)
select actionname, actiontime, actionby
from min_of_max
order by actionname;
As you see, instead of computing a row number and then have to limit rows based on that in the next query, I limit the rows right away by putting the row numbering into the ORDER BY clause and applying TOP(1) WITH TIES to get all rows numbered 1. I like this a tad better, because the CTE already produces the rows that I want to work with rather than only marking them in a bigger data set. But that's personal preference I guess.
Discaimer:
In my query I assume that the column action is numeric. If the column is a string instead, because it can hold values that are not numbers, then work with strings:
where action in ('1', '2', '3', '4')
partition by actionname, case when action in ('1', '2') then 1 else 2 end
If on the other hand the column is a string, but there are only numbers in that column, fix your table instead.

Related

How to select rows where values changed for an ID

I have a table that looks like the following
id effective_date number_of_int_customers
123 10/01/19 0
123 02/01/20 3
456 10/01/19 6
456 02/01/20 6
789 10/01/19 5
789 02/01/20 4
999 10/01/19 0
999 02/01/20 1
I want to write a query that looks at each ID to see if the salespeople have newly started working internationally between October 1st and February 1st.
The result I am looking for is the following:
id effective_date number_of_int_customers
123 02/01/20 3
999 02/01/20 1
The result would return only the salespeople who originally had 0 international customers and now have at least 1.
I have seen similar posts here that use nested queries to pull records where the first date and last have different values. But I only want to pull records where the original value was 0. Is there a way to do this in one query in SQL?
In your case, a simple aggregation would do -- assuming that 0 is the earliest value:
select id, max(number_of_int_customers)
from t
where effective_date in ('2019-10-01', '2020-02-01')
group by id
having min(number_of_int_customers) = 0;
Obviously, this is not correct if the values can decrease to zero. But this having clause fixes that problem:
having min(case when number_of_int_customers = 0 then effective_date end) = min(effective_date)
An alternative is to use window functions, such asfirst_value():
select distinct id, last_noic
from (select t.*,
first_value(number_of_int_customers) over (partition by id order by effective_date) as first_noic,
first_value(number_of_int_customers) over (partition by id order by effective_date desc) as last_noic,
from t
where effective_date in ('2019-10-01', '2020-02-01')
) t
where first_noic = 0;
Hmmm, on second thought, I like lag() better:
select id, number_of_int_customers
from (select t.*,
lag(number_of_int_customers) over (partition by id order by effective_date) as prev_noic
from t
where effective_date in ('2019-10-01', '2020-02-01')
) t
where prev_noic = 0;

SQL Query - Design struggle

I am fairly new to SQL Server (2012) but I was assigned the project where I have to use it.
The database consists of one table (counted in millions of rows) which looks mainly like this:
Number (float) Date (datetime) Status (nvarchar(255))
999 2016-01-01 14:00:00.000 Error
999 2016-01-02 14:00:00.000 Error
999 2016-01-03 14:00:00.000 Ok
999 2016-01-04 14:00:00.000 Error
888 2016-01-01 14:00:00.000 Error
888 2016-01-02 14:00:00.000 Ok
888 2016-01-03 14:00:00.000 Error
888 2016-01-04 14:00:00.000 Error
777 2016-01-01 14:00:00.000 Error
777 2016-01-02 14:00:00.000 Error
I have to create a query which will show me only the phone numbers (one number per row so probably Group by number?) that meet the conditions:
Number reappears at least 3 times
Last two times (that has to be based on date; originally records are not sorted by date) has to be an Error
For example, in the table above the phone number that meets the criteria is only 888, beacuse for 999 2nd newest status is Ok and number 777 reoccurs only 2 times.
I will appreciate any kind of help!
Thanks in advance!
You can use row_number() and conditional aggregation:
select number
from (select t.*,
row_number() over (partition by number order by date desc) as seqnum
from t
) t
group by number
having count(*) >= 3 and
max(case when seqnum = 1 then status end) = 'Error' and
max(case when seqnum = 2 then status end) = 'Error';
Note: float is a really, really bad type to use for the "number" column. In particular, two numbers can look the same but differ in low-order bits. They will produce different rows in the group by.
You should probably use varchar() for telephone numbers. That gives you the most flexibility. If you need to store the number as a number, then decimal/numeric is a much, much better choice than float.
select *, ROW_NUMBER() OVER(partition by Number, order by date desc) as times
FROM
(
select Number, Date
From table
where Number in
(
select Number
from table
group by Number
having count (*) >3
) as ABC
WHERE ABC.times in (1,2) and ABC.Status = 'Error'
with CTE as
(
select t1.*, row_number() over(partition by t1.Number order by t1.date desc) as r_ord
from MyTable t1
)
select C1.*
from CTE C1
inner join
(
select Number
from CTE
group by Number
having max(r_ord) >=3
) C2
on C1.Number = C2.Number
where C1.r_ord in (1,2)
and C1.Status = 'Error'

SQL: Take maximum value, but if a field is missing for a particular ID, ignore all values

This is somewhat difficult to explain...(this is using SQL Assistant for Teradata, which I'm not overly familiar with).
ID creation_date completion_date Difference
123 5/9/2016 5/16/2016 7
123 5/14/2016 5/16/2016 2
456 4/26/2016 4/30/2016 4
456 (null) 4/30/2016 (null)
789 3/25/2016 3/31/2016 6
789 3/1/2016 3/31/2016 30
An ID may have more than one creation_date, but it will always have the same completion_date. If the creation_date is populated for all records for an ID, I want to return the record with the most recent creation_date. However, if ANY creation_date for a given ID is missing, I want to ignore all records associated with this ID.
Given the data above, I would want to return:
ID creation_date completion_date Difference
123 5/14/2016 5/16/2016 2
789 3/25/2016 3/31/2016 6
No records are returned for 456 because the second record has a missing creation_date. The record with the most recent creation_date is returned for 123 and 789.
Any help would be greatly appreciated. Thanks!
Depending on your database, here's one option using row_number to get the max date per group. You can then filter those results with not exists to check against null values:
select *
from (
select *,
row_number() over (partition by id order by creation_date desc) rn
from yourtable
) t
where rn = 1 and not exists (
select 1
from yourtable t2
where t2.creationdate is null and t.id = t2.id
)
row_number is a window function that is supported in many databases. mysql doesn't but you can achieve the same result using user-defined variables.
Here is a more generic version using conditional aggregation:
select t.*
from yourtable t
join (select id, max(creation_date) max_creation_date
from yourtable
group by id
having count(case when creation_date is null then 1 end) = 0
) t2 on t.id = t2.id and t.creation_date = t2.max_creation_date
SQL Fiddle Demo

How to add a running count to rows in a 'streak' of consecutive days

Thanks to Mike for the suggestion to add the create/insert statements.
create table test (
pid integer not null,
date date not null,
primary key (pid, date)
);
insert into test values
(1,'2014-10-1')
, (1,'2014-10-2')
, (1,'2014-10-3')
, (1,'2014-10-5')
, (1,'2014-10-7')
, (2,'2014-10-1')
, (2,'2014-10-2')
, (2,'2014-10-3')
, (2,'2014-10-5')
, (2,'2014-10-7');
I want to add a new column that is 'days in current streak'
so the result would look like:
pid | date | in_streak
-------|-----------|----------
1 | 2014-10-1 | 1
1 | 2014-10-2 | 2
1 | 2014-10-3 | 3
1 | 2014-10-5 | 1
1 | 2014-10-7 | 1
2 | 2014-10-2 | 1
2 | 2014-10-3 | 2
2 | 2014-10-4 | 3
2 | 2014-10-6 | 1
I've been trying to use the answers from
PostgreSQL: find number of consecutive days up until now
Return rows of the latest 'streak' of data
but I can't work out how to use the dense_rank() trick with other window functions to get the right result.
Building on this table (not using the SQL keyword "date" as column name.):
CREATE TABLE tbl(
pid int
, the_date date
, PRIMARY KEY (pid, the_date)
);
Query:
SELECT pid, the_date
, row_number() OVER (PARTITION BY pid, grp ORDER BY the_date) AS in_streak
FROM (
SELECT *
, the_date - '2000-01-01'::date
- row_number() OVER (PARTITION BY pid ORDER BY the_date) AS grp
FROM tbl
) sub
ORDER BY pid, the_date;
Subtracting a date from another date yields an integer. Since you are looking for consecutive days, every next row would be greater by one. If we subtract row_number() from that, the whole streak ends up in the same group (grp) per pid. Then it's simple to deal out number per group.
grp is calculated with two subtractions, which should be fastest. An equally fast alternative could be:
the_date - row_number() OVER (PARTITION BY pid ORDER BY the_date) * interval '1d' AS grp
One multiplication, one subtraction. String concatenation and casting is more expensive. Test with EXPLAIN ANALYZE.
Don't forget to partition by pid additionally in both steps, or you'll inadvertently mix groups that should be separated.
Using a subquery, since that is typically faster than a CTE. There is nothing here that a plain subquery couldn't do.
And since you mentioned it: dense_rank() is obviously not necessary here. Basic row_number() does the job.
You'll get more attention if you include CREATE TABLE statements and INSERT statements in your question.
create table test (
pid integer not null,
date date not null,
primary key (pid, date)
);
insert into test values
(1,'2014-10-1'), (1,'2014-10-2'), (1,'2014-10-3'), (1,'2014-10-5'),
(1,'2014-10-7'), (2,'2014-10-1'), (2,'2014-10-2'), (2,'2014-10-3'),
(2,'2014-10-5'), (2,'2014-10-7');
The principle is simple. A streak of distinct, consecutive dates minus row_number() is a constant. You can group by the constant, and take the dense_rank() over that result.
with grouped_dates as (
select pid, date,
(date - (row_number() over (partition by pid order by date) || ' days')::interval)::date as grouping_date
from test
)
select * , dense_rank() over (partition by grouping_date order by date) as in_streak
from grouped_dates
order by pid, date
pid date grouping_date in_streak
--
1 2014-10-01 2014-09-30 1
1 2014-10-02 2014-09-30 2
1 2014-10-03 2014-09-30 3
1 2014-10-05 2014-10-01 1
1 2014-10-07 2014-10-02 1
2 2014-10-01 2014-09-30 1
2 2014-10-02 2014-09-30 2
2 2014-10-03 2014-09-30 3
2 2014-10-05 2014-10-01 1
2 2014-10-07 2014-10-02 1

How to rank partitions by date order when values which I am partitioning on can repeat?

I have a query which looks for the number of different values of a key field over a period of time and assigns a rank to the values in the order they occur.
So, for example I might have:
ID Date Value
1 2010-01-01 125.00
1 2010-02-01 125.00
1 2010-03-01 130.00
1 2010-04-01 131.00
1 2010-05-01 131.00
1 2010-06-01 131.00
1 2010-07-01 126.00
1 2010-08-01 140.00
I am using
ROW_NUMBER() over(partition by [ID] order by [Date]) as [row]
to rank the different values of the Value column in the date order they occur. So I would get something like
Value row
125.00 1
130.00 2
131.00 3
126.00 4
etc
THe problem I am having is that sometimes a value might repeat. So in the above example if the value on 1st August was 125.00 for example. I want to treat this as a seperate occurance but using the ranking function I am using at the moment it obviously gets aggregated into a partition with the other instances of 125.00 when calculating the row number.
What's the easiest way for me to overcome this problem please? Thanks in advance!
This should work:
WITH A
AS
(SELECT ID, [Date], [Value], ROW_NUMBER() over(partition by [ID] order by [Value], [Date]) as [row]
FROM YourTable)
SELECT A.[Date], A.[Value], B.min_row as row
FROM A JOIN (SELECT ID, [Value], MIN([row]) AS min_row
FROM A) AS B
ON A.ID = B.ID AND A.[Value] = B.[Value]