Increase number if row value duped

Sorry, I'm not sure how to explain this in words (which is why I'm having trouble finding an answer).
I need Sequence to increase each time the ID value is duplicated:
ID Sequence Value
1111 0 234324
2222 0 23432
3333 0 324
3333 1 234
3333 2 432234
4444 0 23423
4444 1 234

If you want to start from 0, use this query:
select *, row_number() over (partition by id order by id) - 1 as Sequence from tbl

You want row_number(). Something like this:
select t.*,
row_number() over (partition by id order by id) as seqnum
from t;
Note: the numbers will be incrementing, but the rows will not be in any particular order.
SQL tables represent unordered sets, and your data has no obvious ordering. You can change the order by clause to number the rows in a particular order within each id.
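The row_number() approach from the answers above can be tried out with a small script. This is only a sketch: it runs the query in SQLite (version 3.25 or later is assumed for window functions) through Python's sqlite3 module, the table name tbl comes from the first answer, and ordering by SQLite's implicit rowid is an assumption made here so that the numbering follows insertion order.

```python
import sqlite3

# In-memory database; table and column names are illustrative.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl (id INTEGER, value INTEGER)")
conn.executemany(
    "INSERT INTO tbl (id, value) VALUES (?, ?)",
    [(1111, 234324), (2222, 23432), (3333, 324), (3333, 234),
     (3333, 432234), (4444, 23423), (4444, 234)],
)

# ORDER BY rowid (SQLite's implicit insertion-order key) is an assumption that
# makes the numbering deterministic; ordering by id alone would leave the
# order of rows inside each partition unspecified.
seq_rows = conn.execute(
    """
    SELECT id,
           ROW_NUMBER() OVER (PARTITION BY id ORDER BY rowid) - 1 AS sequence,
           value
    FROM tbl
    ORDER BY id, sequence
    """
).fetchall()

for row in seq_rows:
    print(row)
```

In a real table you would replace rowid with whatever column defines the order you care about, such as a timestamp.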

Related

SQL Server: How to retrieve all record based on recent datetime

First off, apologies if this has been asked elsewhere; I was unable to find a solution. The best I can manage is retrieving the latest 1 record, or the latest 2-3 records. What I'm after is all records that share the most recent Datetime value (the number is dynamic: it could be 1, 2, or 50+). Here is the problem.
I have a table as follows,
APILoadDatetime          RowId  ProjectId  Value
2021-07-13 15:09:14.620  1      Proj-1     101
2021-07-13 15:09:14.620  2      Proj-2     81
2021-07-13 15:09:14.620  3      Proj-3     111
2021-07-13 15:09:14.620  4      Proj-4     125
2021-05-05 04:46:07.913  1      Proj-1     99
2021-05-05 04:46:07.913  2      Proj-2     69
2021-05-05 04:46:07.913  3      Proj-3     105
2021-05-05 04:46:07.913  4      Proj-4     115
...                      ...    ...        ...
What I am looking to do is write a query that gives me all the most recent data based on Datetime, so in this case I should get the following result:
APILoadDatetime          RowId  ProjectId  Value
2021-07-13 15:09:14.620  1      Proj-1     101
2021-07-13 15:09:14.620  2      Proj-2     81
2021-07-13 15:09:14.620  3      Proj-3     111
2021-07-13 15:09:14.620  4      Proj-4     125
RowId (as the name suggests) numbers the rows within a particular Datetime block. The block will not always have 4 rows; it's dynamic based on the data received, so it could be 1, 2, 4 or even 50+.
I hope I was able to convey the question properly. Thank you all for reading, and thanks in advance to those who provide a solution.

You can use the window function rank() to find every row that shares the latest APILoadDatetime:
select * from (
select * , rank() over (order by APILoadDatetime desc) rn
from tablename
) t where rn = 1

select top 1 with ties
*
from
tablename
order by
row_number() over(
partition by RowId
order by APILoadDatetime desc
);
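TOP 1 combined with WITH TIES does the work here.
WITH TIES means that SELECT returns the first row (because of TOP 1) plus every other row whose ORDER BY value ties with it. Here the ORDER BY value is the row number, so every row numbered 1 is returned, that is, the latest row for each RowId.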
Update #1:
If you need the last record by APILoadDatetime plus any other records that have the same APILoadDatetime as that first one, then the query is simpler:
select top 1 with ties
*
from
tablename
order by
APILoadDatetime desc;
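TOP ... WITH TIES is specific to SQL Server, so the sketch below checks the rank() version from the first answer instead. It assumes SQLite 3.25 or later via Python's sqlite3 module, stores the timestamps as ISO-formatted text (which sorts chronologically), and compares the result with a plain MAX() subquery as a cross-check; that subquery is an addition made here, not part of either answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tablename ("
    "APILoadDatetime TEXT, RowId INTEGER, ProjectId TEXT, Value INTEGER)"
)
conn.executemany(
    "INSERT INTO tablename VALUES (?, ?, ?, ?)",
    [
        ("2021-07-13 15:09:14.620", 1, "Proj-1", 101),
        ("2021-07-13 15:09:14.620", 2, "Proj-2", 81),
        ("2021-07-13 15:09:14.620", 3, "Proj-3", 111),
        ("2021-07-13 15:09:14.620", 4, "Proj-4", 125),
        ("2021-05-05 04:46:07.913", 1, "Proj-1", 99),
        ("2021-05-05 04:46:07.913", 2, "Proj-2", 69),
        ("2021-05-05 04:46:07.913", 3, "Proj-3", 105),
        ("2021-05-05 04:46:07.913", 4, "Proj-4", 115),
    ],
)

# rank() gives 1 to every row that shares the newest APILoadDatetime.
latest_rows = conn.execute(
    """
    SELECT APILoadDatetime, RowId, ProjectId, Value
    FROM (SELECT t.*,
                 RANK() OVER (ORDER BY APILoadDatetime DESC) AS rn
          FROM tablename AS t) AS ranked
    WHERE rn = 1
    ORDER BY RowId
    """
).fetchall()

# Cross-check (an assumption of this sketch): same rows via MAX().
max_rows = conn.execute(
    """
    SELECT APILoadDatetime, RowId, ProjectId, Value
    FROM tablename
    WHERE APILoadDatetime = (SELECT MAX(APILoadDatetime) FROM tablename)
    ORDER BY RowId
    """
).fetchall()

for row in latest_rows:
    print(row)
```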

Dense Rank grouping by IDs

I am having trouble getting my DENSE_RANK() function in Oracle to work how I would like. First, my dataset:
ID DATE
1234 01-OCT-2020
1234 01-OCT-2021
1234 01-OCT-2022
2345 01-APR-2020
2345 01-APR-2021
2345 01-APR-2022
I am trying to use the dense rank function to return results with a sequence number based on the DATE field, and grouping by ID. How I want the data to return:
ID DATE SEQ
1234 01-OCT-2020 1
1234 01-OCT-2021 2
1234 01-OCT-2022 3
2345 01-APR-2020 1
2345 01-APR-2021 2
2345 01-APR-2022 3
The query I have so far:
SELECT ID, DATE, DENSE_RANK() Over (order by ID, DATE asc) as SEQ
However, this returns the wrong result: the sequence number goes up to 6, as if it's disregarding my intention to sequence by the DATE field within each ID. If anyone has any insight into how to make this work, it would be very much appreciated!

You want row_number():
select id, date, row_number() over (partition by id order by date) as seq
You could actually use dense_rank() as well, if you want duplicates to get the same number. The key idea is partition by.
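To see the effect of partition by, the sketch below runs both numberings side by side. It assumes SQLite 3.25 or later via Python's sqlite3 module; the table name t and the column name dt (used instead of DATE, with ISO-formatted text values) are placeholders chosen for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER, dt TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?)",
    [
        (1234, "2020-10-01"), (1234, "2021-10-01"), (1234, "2022-10-01"),
        (2345, "2020-04-01"), (2345, "2021-04-01"), (2345, "2022-04-01"),
    ],
)

rank_rows = conn.execute(
    """
    SELECT id, dt,
           -- restarts at 1 for every id
           ROW_NUMBER() OVER (PARTITION BY id ORDER BY dt) AS seq,
           -- no PARTITION BY: one running sequence over the whole table
           DENSE_RANK() OVER (ORDER BY id, dt) AS seq_unpartitioned
    FROM t
    ORDER BY id, dt
    """
).fetchall()

for row in rank_rows:
    print(row)
```

The seq column restarts for each id, while seq_unpartitioned keeps counting up to 6, which is the behaviour described in the question.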

How to select rows where values changed for an ID

I have a table that looks like the following
id effective_date number_of_int_customers
123 10/01/19 0
123 02/01/20 3
456 10/01/19 6
456 02/01/20 6
789 10/01/19 5
789 02/01/20 4
999 10/01/19 0
999 02/01/20 1
I want to write a query that looks at each ID to see if the salespeople have newly started working internationally between October 1st and February 1st.
The result I am looking for is the following:
id effective_date number_of_int_customers
123 02/01/20 3
999 02/01/20 1
The result would return only the salespeople who originally had 0 international customers and now have at least 1.
I have seen similar posts here that use nested queries to pull records where the first date and last have different values. But I only want to pull records where the original value was 0. Is there a way to do this in one query in SQL?

In your case, a simple aggregation would do -- assuming that 0 is the earliest value:
select id, max(number_of_int_customers)
from t
where effective_date in ('2019-10-01', '2020-02-01')
group by id
having min(number_of_int_customers) = 0;
Obviously, this is not correct if the values can decrease to zero. But this having clause fixes that problem:
having min(case when number_of_int_customers = 0 then effective_date end) = min(effective_date)
An alternative is to use window functions, such as first_value():
select distinct id, last_noic
from (select t.*,
first_value(number_of_int_customers) over (partition by id order by effective_date) as first_noic,
first_value(number_of_int_customers) over (partition by id order by effective_date desc) as last_noic
from t
where effective_date in ('2019-10-01', '2020-02-01')
) t
where first_noic = 0;
Hmmm, on second thought, I like lag() better:
select id, number_of_int_customers
from (select t.*,
lag(number_of_int_customers) over (partition by id order by effective_date) as prev_noic
from t
where effective_date in ('2019-10-01', '2020-02-01')
) t
where prev_noic = 0;
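The lag() version can be checked against the sample data from the question. This is a sketch under a few assumptions: SQLite 3.25 or later via Python's sqlite3 module, dates stored as ISO-formatted text, and one extra condition (number_of_int_customers > 0) added here so that an id that stays at 0 is not returned.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE t (id INTEGER, effective_date TEXT, "
    "number_of_int_customers INTEGER)"
)
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [
        (123, "2019-10-01", 0), (123, "2020-02-01", 3),
        (456, "2019-10-01", 6), (456, "2020-02-01", 6),
        (789, "2019-10-01", 5), (789, "2020-02-01", 4),
        (999, "2019-10-01", 0), (999, "2020-02-01", 1),
    ],
)

changed_rows = conn.execute(
    """
    SELECT id, effective_date, number_of_int_customers
    FROM (SELECT t.*,
                 LAG(number_of_int_customers)
                     OVER (PARTITION BY id ORDER BY effective_date) AS prev_noic
          FROM t
          WHERE effective_date IN ('2019-10-01', '2020-02-01')) AS changes
    WHERE prev_noic = 0
      AND number_of_int_customers > 0  -- added in this sketch: skip ids that stay at 0
    ORDER BY id
    """
).fetchall()

for row in changed_rows:
    print(row)
```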

RANK records partitioned by a column in series (Vertica SQL)

I'm trying to use the Vertica rank analytic function to create a rank column partitioned by a column, but only include records that are in a series. For example the query below produces the output below the query
select when_created, status
from tablea
when_created Status
1/1/2015 ACTIVE
3/1/2015 ACTIVE
4/1/2015 INACTIVE
4/6/2015 INACTIVE
6/7/2015 ACTIVE
10/9/2015 INACTIVE
I could modify my query to include a rank column which would produce the following output
select
when_created, status, rank() OVER (PARTITION BY status order by when_created) as rnk
from tablea
when_created Status rnk
1/1/2015 ACTIVE 1
3/1/2015 ACTIVE 2
4/1/2015 INACTIVE 1
4/6/2015 INACTIVE 2
6/7/2015 ACTIVE 3
10/9/2015 INACTIVE 3
However my goal is start over the rank when a series is broken so the desired output is:
when_created Status rnk
1/1/2015 ACTIVE 1
3/1/2015 ACTIVE 2
4/1/2015 INACTIVE 1
4/6/2015 INACTIVE 2
6/7/2015 ACTIVE 1
10/9/2015 INACTIVE 1
Is there a way to accomplish this using the RANK function or is there another way to do it in vertica sql?
Thanks,
Ben

This is a gaps-and-islands problem, where the tricky part is to identify the groups to use for a row_number() calculation. One solution uses a difference of row numbers to identify the different groups:
select a.*,
row_number() over (partition by status, seqnum - seqnum_s order by when_created) as rnk
from (select a.*,
row_number() over (order by when_created) as seqnum,
row_number() over (partition by status order by when_created) as seqnum_s
from tablea a
) a;
The logic behind this is tricky when you first see it. I advise you to run the subquery and understand the two row_number() calculations -- and to observe that the difference is constant for the groups you are interested in.
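The sketch below follows that advice by printing the two row numbers and their difference next to the final rank. It assumes SQLite 3.25 or later via Python's sqlite3 module rather than Vertica, and it stores when_created as ISO-formatted text so that it sorts chronologically.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tablea (when_created TEXT, status TEXT)")
conn.executemany(
    "INSERT INTO tablea VALUES (?, ?)",
    [
        ("2015-01-01", "ACTIVE"), ("2015-03-01", "ACTIVE"),
        ("2015-04-01", "INACTIVE"), ("2015-04-06", "INACTIVE"),
        ("2015-06-07", "ACTIVE"), ("2015-10-09", "INACTIVE"),
    ],
)

island_rows = conn.execute(
    """
    SELECT when_created, status, seqnum, seqnum_s,
           seqnum - seqnum_s AS grp,  -- constant within each consecutive run
           ROW_NUMBER() OVER (PARTITION BY status, seqnum - seqnum_s
                              ORDER BY when_created) AS rnk
    FROM (SELECT a.*,
                 ROW_NUMBER() OVER (ORDER BY when_created) AS seqnum,
                 ROW_NUMBER() OVER (PARTITION BY status
                                    ORDER BY when_created) AS seqnum_s
          FROM tablea AS a) AS numbered
    ORDER BY when_created
    """
).fetchall()

for row in island_rows:
    print(row)
```

Reading the grp column alongside status shows that each (status, grp) pair identifies one unbroken run, which is what the outer row_number() partitions on.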

selecting top N rows for each group in a table

I am facing a very common issue regarding "Selecting top N rows for each group in a table".
Consider a table with id, name, hair_colour, score columns.
I want a result set that gives me, for each hair colour, the names of the top 3 scorers.
I found exactly what I needed to solve this in Rick Osborne's blog post "sql-getting-top-n-rows-for-a-grouped-query".
However, that solution doesn't work as expected when scores are equal.
In the example above, the result is as follows.
id name hair score ranknum
---------------------------------
12 Kit Blonde 10 1
9 Becca Blonde 9 2
8 Katie Blonde 8 3
3 Sarah Brunette 10 1
4 Deborah Brunette 9 2   <-- what if this score is also 10?
1 Kim Brunette 8 3
Consider the row 4 Deborah Brunette 9 2. If Deborah had the same score (10) as Sarah, then ranknum would be 2, 2, 3 for the "Brunette" hair type.
What's the solution to this?

If you're using SQL Server 2005 or newer, you can use the ranking functions and a CTE to achieve this:
;WITH HairColors AS
(SELECT id, name, hair, score,
ROW_NUMBER() OVER(PARTITION BY hair ORDER BY score DESC) AS RowNum
FROM girl -- table name assumed (borrowed from another answer); the question does not name it
)
SELECT id, name, hair, score
FROM HairColors
WHERE RowNum <= 3
This CTE will "partition" your data by the value of the hair column; each partition is then ordered by score (descending) and its rows are numbered, so the highest score in each partition gets 1, the next gets 2, and so on.
So if you want the TOP 3 of each group, select only those rows from the CTE that have a RowNum of 3 or less (1, 2, 3), and there you go!
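Because the question is specifically about tied scores, it may help to see how a ranking function treats them. The sketch below uses RANK() rather than ROW_NUMBER(), so tied rows share a position. It assumes SQLite 3.25 or later via Python's sqlite3 module, a table named girl, Deborah's score changed to 10 to create the tie, and one hypothetical extra row (Zoe) so that the cut-off at 3 is visible.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE girl (id INTEGER, name TEXT, hair TEXT, score INTEGER)")
conn.executemany(
    "INSERT INTO girl VALUES (?, ?, ?, ?)",
    [
        (12, "Kit", "Blonde", 10),
        (9, "Becca", "Blonde", 9),
        (8, "Katie", "Blonde", 8),
        (3, "Sarah", "Brunette", 10),
        (4, "Deborah", "Brunette", 10),  # changed from 9 to create a tie
        (1, "Kim", "Brunette", 8),
        (7, "Zoe", "Brunette", 7),       # hypothetical row, not in the question
    ],
)

top_rows = conn.execute(
    """
    SELECT name, hair, score, rnk
    FROM (SELECT g.*,
                 RANK() OVER (PARTITION BY hair ORDER BY score DESC) AS rnk
          FROM girl AS g) AS ranked
    WHERE rnk <= 3
    ORDER BY hair, rnk, name
    """
).fetchall()

for row in top_rows:
    print(row)
```

With RANK(), Sarah and Deborah both get 1 and Kim gets 3. Note that a tie at position 3 would return more than three rows for that group; ROW_NUMBER() always returns at most three but breaks ties arbitrarily.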

The algorithm generates the rank by counting the number of rows in the cross-product whose score is equal to or greater than that of the girl in question. Hence, in the problem case you're talking about, Sarah's grid would look like
a.name | a.score | b.name | b.score
-------+---------+---------+--------
Sarah | 9 | Sarah | 9
Sarah | 9 | Deborah | 9
and similarly for Deborah, which is why both girls get a rank of 2 here.
The problem is that when there's a tie, this count puts all tied girls at the last position in the tied range (2 and 2), when you'd want them to take the first position (1 and 1) instead. I think a simple change can fix this:
Instead of a greater-than-or-equal comparison, use a strict greater-than comparison to count the number of girls who are strictly better. Then add one to that and you have your rank (which deals with ties appropriately). So the inner select would be:
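-- LEFT JOIN keeps the top scorer, who has no strictly better rows to match
SELECT a.id, COUNT(b.id) + 1 AS ranknum
FROM girl AS a
LEFT JOIN girl AS b ON (a.hair = b.hair) AND (a.score < b.score)
GROUP BY a.id
HAVING COUNT(b.id) < 3 -- fewer than 3 strictly better girls means rank 1 to 3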
Can anyone see any problems with this approach that have escaped my notice?
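One way to look for problems is to run the counting idea against a small table. The sketch below is a variant written for this purpose, not the exact query above: it uses a LEFT JOIN and COUNT(b.id) so that a girl with nobody strictly better still appears with rank 1. It assumes SQLite via Python's sqlite3 module, Deborah's score changed to 10 to create the tie, and one hypothetical extra row (Zoe).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE girl (id INTEGER, name TEXT, hair TEXT, score INTEGER)")
conn.executemany(
    "INSERT INTO girl VALUES (?, ?, ?, ?)",
    [
        (12, "Kit", "Blonde", 10),
        (9, "Becca", "Blonde", 9),
        (8, "Katie", "Blonde", 8),
        (3, "Sarah", "Brunette", 10),
        (4, "Deborah", "Brunette", 10),  # changed from 9 to create a tie
        (1, "Kim", "Brunette", 8),
        (7, "Zoe", "Brunette", 7),       # hypothetical row, not in the question
    ],
)

counted_rows = conn.execute(
    """
    SELECT a.name, a.hair, COUNT(b.id) + 1 AS ranknum
    FROM girl AS a
    LEFT JOIN girl AS b
           ON a.hair = b.hair AND a.score < b.score  -- strictly better only
    GROUP BY a.id, a.name, a.hair
    HAVING COUNT(b.id) < 3  -- fewer than 3 strictly better girls: rank 1 to 3
    ORDER BY a.hair, ranknum, a.name
    """
).fetchall()

for row in counted_rows:
    print(row)
```

Sarah and Deborah both come out with ranknum 1, Kim with 3, and Zoe (three girls strictly better) is filtered out.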

Use this query with a correlated subquery, which handles the OP's problem properly:
SELECT g.* FROM girls as g
WHERE g.score > IFNULL( (SELECT g2.score FROM girls as g2
WHERE g.hair=g2.hair ORDER BY g2.score DESC LIMIT 3,1), 0)
Note that you need IFNULL here to handle the case where the girls table has fewer rows for some hair type than the number you want to return (3 in the OP's case).
