group and return rows with the minimum value - sql

There is a tasks table.
id | name | project_id | created | ...
Tasks can belong to different projects. I need to return one task from each project, namely the one with the minimum creation date. Here is my solution:
SELECT *
FROM tasks a
JOIN (
SELECT project_id, min(created) as created
FROM tasks
GROUP BY project_id
) b
ON a.project_id=b.project_id AND a.created = b.created;
But if a project has several tasks with the same (minimum) creation date, my query returns more than one record for that project.

To ensure that one, and only one, row is returned per project_id, a better method is to use row_number() over(). The partition by inside the over() clause plays the role of what you would otherwise group by, and the order by controls which row within each partition gets the value 1. Here the value 1 goes to the row with the earliest created date, and further columns (e.g. id) can be added as tie-breakers. The remaining rows in the partition get the following integers (2, 3, ...), so exactly one row per partition has the value 1. To limit the final result, wrap the query in a derived table (subquery) and add a where clause that keeps only the first row per partition, i.e. where rn = 1.
SELECT
*
FROM (SELECT *
, row_number() over(partition by project_id order by created, id) as rn
FROM tasks
) AS derived
WHERE rn = 1
nb: to get the most recent row instead, reverse the direction of ordering on the date column (order by created desc).
Not only does this technique ensure that only one row per partition is returned, it also typically needs fewer passes through the data than your original approach (which reads tasks twice: once for the aggregate and once for the join), so it is efficient as well.
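To see the difference, here is a minimal sketch that runs both queries through Python's standard sqlite3 module against an in-memory SQLite database. SQLite is only a stand-in for your engine (it needs SQLite 3.25 or newer for window functions), and the sample rows are invented: two tasks in project 1 share the earliest created date.

```python
import sqlite3

# In-memory stand-in for the tasks table; the rows are invented sample data.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tasks (id INTEGER PRIMARY KEY, name TEXT, project_id INTEGER, created TEXT)"
)
conn.executemany(
    "INSERT INTO tasks (id, name, project_id, created) VALUES (?, ?, ?, ?)",
    [
        (1, "a", 1, "2020-01-02"),
        (2, "b", 1, "2020-01-01"),  # ties with id 3 on the earliest date of project 1
        (3, "c", 1, "2020-01-01"),
        (4, "d", 2, "2020-02-01"),
    ],
)

# Original approach: join back on (project_id, MIN(created)); ties yield extra rows.
join_rows = conn.execute("""
    SELECT a.id, a.project_id
    FROM tasks a
    JOIN (SELECT project_id, MIN(created) AS created
          FROM tasks
          GROUP BY project_id) b
      ON a.project_id = b.project_id AND a.created = b.created
    ORDER BY a.project_id, a.id
""").fetchall()

# row_number() approach: exactly one row per project, id breaks ties on created.
rn_rows = conn.execute("""
    SELECT id, project_id
    FROM (SELECT *,
                 ROW_NUMBER() OVER (PARTITION BY project_id ORDER BY created, id) AS rn
          FROM tasks) AS derived
    WHERE rn = 1
    ORDER BY project_id
""").fetchall()

print(join_rows)  # [(2, 1), (3, 1), (4, 2)]
print(rn_rows)    # [(2, 1), (4, 2)]
```

The join returns both tied tasks for project 1, while the row_number() version keeps only the one with the lower id.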
tip: if you do want more than one row per partition returned when there are ties, use rank() or dense_rank() instead of row_number(). The ranking functions recognize rows of equal rank and give them the same value, i.e. more than one row can get a rank value of 1. For this to work, order by created only; with id as an extra tie-breaker there are no ties left to recognize.
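A small sketch of how the three functions treat a tie, again using Python's sqlite3 with SQLite (3.25+) as a stand-in engine and invented rows. Tasks 1 and 2 share the earliest date, and the window orders by created only.

```python
import sqlite3

# Invented sample: tasks 1 and 2 in project 1 share the earliest created date.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tasks (id INTEGER PRIMARY KEY, project_id INTEGER, created TEXT)")
conn.executemany(
    "INSERT INTO tasks (id, project_id, created) VALUES (?, ?, ?)",
    [(1, 1, "2020-01-01"), (2, 1, "2020-01-01"), (3, 1, "2020-01-03")],
)

# Order by created only, so the ranking functions can see the tie.
ranked = conn.execute("""
    SELECT id,
           RANK()       OVER (PARTITION BY project_id ORDER BY created) AS rnk,
           DENSE_RANK() OVER (PARTITION BY project_id ORDER BY created) AS drnk,
           ROW_NUMBER() OVER (PARTITION BY project_id ORDER BY created) AS rn
    FROM tasks
    ORDER BY id
""").fetchall()

# (id, rank, dense_rank) per row; row_number is left out because which of the
# two tied rows gets 1 is not defined.
rank_values = [row[:3] for row in ranked]
rows_with_rn_1 = sum(1 for row in ranked if row[3] == 1)
rows_with_rank_1 = sum(1 for row in ranked if row[1] == 1)

print(rank_values)       # [(1, 1, 1), (2, 1, 1), (3, 3, 2)]
print(rows_with_rn_1)    # 1
print(rows_with_rank_1)  # 2
```

rank() skips to 3 after the tie while dense_rank() continues with 2; for "all earliest tasks per project" either one filtered on = 1 gives the same rows.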

Related

How to remove duplicate data from a Microsoft SQL Server database (on the result only)

The column code has duplicate values, and I want to remove the duplicates of that row.
For example, I want to remove the duplicates in column code as well as the rows that contain them. It doesn't matter if the other columns have duplicates; I want to base it on the code column. What SQL query can I use? Thank you.
This is the table I am working with.
As you can see, there is an isdeleted column, and some rows have the value 1 in it. I only want the records with the value 0.
Here is a sample record: row 1 has an isdeleted value of 1, which means that this record is deleted, and I only need row 2 for this code.
You could use the window function ROW_NUMBER() to single out the last entry per code, like this:
SELECT code, shortdesc, longdesc, isobsolete, effectivefromdate
FROM (
SELECT ROW_NUMBER() OVER(PARTITION BY code ORDER BY effectivefromdate DESC) AS rn, *
FROM CodingSuite_STG
WHERE isobsolete=1 AND isdeleted=0
) AS cs
WHERE rn=1
ORDER BY effectivefromdate
Explanation:
The core of the operation is a "sub-query": a "table-like" expression formed by a SELECT statement surrounded by parentheses and followed by an alias, like:
( SELECT * FROM CodingSuite_STG WHERE isobsolete=1 ) AS cs
For the outer SELECT it will appear like a table with the name "cs".
Within this sub-query I placed a special function (a "window function") consisting of two parts:
ROW_NUMBER() OVER ( PARTITION BY code ORDER BY effectivefromdate DESC) AS rn
The ROW_NUMBER() function returns a sequential number for a certain "window" of records defined by the immediately following OVER ( ... ) clause. The PARTITION BY inside it defines a group division scheme (similar to GROUP BY), so the row numbers start from 1 for each partitioned group. ORDER BY determines the numbering order within each group. So, for entries sharing the same code value, ROW_NUMBER() supplies the sequence 1, 2, 3, ..., with 1 assigned to the record with the highest value of effectivefromdate because of ORDER BY effectivefromdate DESC.
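All we need to do in the outer SELECT is pick those records from the sub-query cs that have an rn value of 1, and we're done! Because the WHERE clause inside the sub-query is applied before the row numbers are assigned, deleted records (isdeleted=1) are never numbered and so can never be picked.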
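Here is a minimal sketch of the same query run through Python's sqlite3 with SQLite (3.25+) standing in for SQL Server. The table is a hypothetical miniature of CodingSuite_STG (longdesc left out), and the rows are invented: the newest A1 row is deleted, so the next newest one should win.

```python
import sqlite3

# Hypothetical miniature of CodingSuite_STG (longdesc omitted); rows are invented.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE CodingSuite_STG (
        code TEXT, shortdesc TEXT, isobsolete INTEGER,
        isdeleted INTEGER, effectivefromdate TEXT)
""")
conn.executemany(
    "INSERT INTO CodingSuite_STG VALUES (?, ?, ?, ?, ?)",
    [
        ("A1", "deleted", 1, 1, "2020-03-01"),  # newest, but filtered out by isdeleted=0
        ("A1", "current", 1, 0, "2020-02-01"),
        ("A1", "older",   1, 0, "2020-01-01"),
        ("B2", "single",  1, 0, "2020-01-15"),
    ],
)

# Same shape as the answer: number rows per code, newest first, keep rn = 1.
rows = conn.execute("""
    SELECT code, shortdesc, effectivefromdate
    FROM (SELECT ROW_NUMBER() OVER (PARTITION BY code
                                    ORDER BY effectivefromdate DESC) AS rn, *
          FROM CodingSuite_STG
          WHERE isobsolete = 1 AND isdeleted = 0) AS cs
    WHERE rn = 1
    ORDER BY effectivefromdate
""").fetchall()

print(rows)  # [('B2', 'single', '2020-01-15'), ('A1', 'current', '2020-02-01')]
```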

How to get distinct rows of a table based on two or more columns while returning all columns in SQL Server

I want to get distinct rows based on the values of two or more columns of a table, while at the same time returning all columns from the table.
For example, I have this table:
[Image: DB Table]
I want the result filtered on the basis of type and Number only. In the table above, type and Number are the same for the first and second rows, so one of them should be suppressed in the result:
txn | item | Discrip | Category | type | Number | Mode
60 | 2 | Loyalty | L | 6174 | XXXXXXX1390 | 0
60 | 4 | Visa | C | 1600 | XXXXXXXXXXXX4108 | 1
I have tried a subquery but have been unsuccessful so far. Please suggest what to try.
Thanks
You can do what you want with row_number():
select t.*
from (select t.*,
row_number() over (partition by type, number order by item) as seqnum
from t
) t
where seqnum = 1;
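A quick sketch of that query using Python's sqlite3, with SQLite (3.25+) standing in for SQL Server. The table t uses the question's columns and two result rows; the row with item 3 is a hypothetical duplicate on (type, Number), since the original first row is not shown. order by item means the lowest item in each (type, Number) group is kept.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE t (txn INTEGER, item INTEGER, Discrip TEXT, Category TEXT,
                    type INTEGER, Number TEXT, Mode INTEGER)
""")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?, ?, ?, ?, ?)",
    [
        (60, 2, "Loyalty", "L", 6174, "XXXXXXX1390", 0),
        (60, 3, "Loyalty", "L", 6174, "XXXXXXX1390", 0),  # hypothetical duplicate (type, Number)
        (60, 4, "Visa", "C", 1600, "XXXXXXXXXXXX4108", 1),
    ],
)

# One row per (type, number); the lowest item wins within each group.
result = conn.execute("""
    select t.*
    from (select t.*,
                 row_number() over (partition by type, number order by item) as seqnum
          from t
         ) t
    where seqnum = 1
    order by item
""").fetchall()

# Each row keeps all original columns plus seqnum (always 1 here).
print([row[1] for row in result])  # [2, 4]
```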

Group by multiple columns, get group total count and specific column from last two rows in each group

I have an SQL Server table with the following columns:
Notification
===================
Id (int)
UserId (int)
Area (int)
Action (int)
ObjectId (int)
RelatedUserLink (nvarchar(100))
Created (datetime)
The goal is to create a query that groups notifications with the same Area, Action and ObjectId for a specific user (UserId) and returns a single row per group, including the total count for the group and the value of a specific column for the last two rows.
The query will only be executed for one user (UserId) each time.
The problem is that I need the column RelatedUserLink for the last two records (based on Created) of each group. The RelatedUserLink values should be distinct within each group (if the same link occurs more than once, only the latest occurrence should be included and counted).
The result for each group should be represented in one result row. It doesn't matter if the two RelatedUserLink values are concatenated in the same column or separated into two columns as "RelatedUserLink1" and "RelatedUserLink2". If the group consists of only one row, the second RelatedUserLink should simply be null.
Desired result:
UserId | Area | Action | ObjectId | RelatedUserLink1 | RelatedUserLink2 | Created (latest in group) | Count
10 | 1 | 2 | 100 | "userlink1" | "userlink2" | 2016-04-08 | 20
10 | 1 | 3 | 200 | "userlink1" | "userlink2" | 2016-04-09 | 4
The table will be quite large: 100,000-200,000 rows.
(The related User table has approx. 10,000 rows.)
I also have the option to get all notifications for a user and then do the grouping in code, but I hope there is a faster way by letting SQL Server handle it.
Any help is much appreciated!
Thanks!
I would attempt this by using the following WITH clause:
WITH RUL AS (
select
UserId,
Area,
Action,
ObjectId,
RelatedUserLink as RelatedUserLink1,
LAG(RelatedUserLink) OVER (PARTITION BY UserId, Area, Action, ObjectId ORDER BY Created) as RelatedUserLink2,
ROW_NUMBER() OVER (PARTITION BY UserId, Area, Action, ObjectId ORDER BY Created DESC) latest_to_earliest,
MAX(Created) OVER (PARTITION BY UserId, Area, Action, ObjectId) as Created,
COUNT(*) OVER (PARTITION BY UserId, Area, Action, ObjectId) as Count
from
Notification
where UserId = 10
)
select
UserId,
Area,
Action,
ObjectId,
RelatedUserLink1,
RelatedUserLink2,
Created,
Count
from
RUL
where
latest_to_earliest = 1;
The LAG function holds the previous RelatedUserLink value in Created order (or NULL on the first row of a group, so a single-row group ends up with NULL). ROW_NUMBER numbers each group from the latest row to the earliest, so the latest row gets 1. The MAX and COUNT window functions repeat the group's maximum Created and row count on every row, which has the same effect as a GROUP BY but avoids a separate aggregate query joined back to the detail rows.
The SELECT outside the WITH clause just picks up the final row for each group, which should hold the last RelatedUserLink value in RelatedUserLink1 and the penultimate (or NULL) RelatedUserLink value in RelatedUserLink2.
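A sketch of the corrected query (single OVER in the COUNT line) executed through Python's sqlite3, with SQLite (3.25+) standing in for SQL Server and invented notification rows. "Action" is double-quoted here only as a precaution for SQLite's keyword handling; the user id is passed as a parameter instead of the literal 10.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE Notification (
        Id INTEGER PRIMARY KEY, UserId INTEGER, Area INTEGER, "Action" INTEGER,
        ObjectId INTEGER, RelatedUserLink TEXT, Created TEXT)
""")
conn.executemany(
    "INSERT INTO Notification VALUES (?, ?, ?, ?, ?, ?, ?)",
    [
        (1, 10, 1, 2, 100, "userlink3", "2016-04-06"),
        (2, 10, 1, 2, 100, "userlink2", "2016-04-07"),
        (3, 10, 1, 2, 100, "userlink1", "2016-04-08"),
        (4, 10, 1, 3, 200, "userlink1", "2016-04-09"),  # single-row group
        (5, 11, 1, 2, 100, "other", "2016-04-08"),      # different user, filtered out
    ],
)

rows = conn.execute("""
    WITH RUL AS (
        SELECT UserId, Area, "Action", ObjectId,
               RelatedUserLink AS RelatedUserLink1,
               -- previous link in Created order (NULL on a group's first row)
               LAG(RelatedUserLink) OVER (PARTITION BY UserId, Area, "Action", ObjectId
                                          ORDER BY Created) AS RelatedUserLink2,
               ROW_NUMBER() OVER (PARTITION BY UserId, Area, "Action", ObjectId
                                  ORDER BY Created DESC) AS latest_to_earliest,
               MAX(Created) OVER (PARTITION BY UserId, Area, "Action", ObjectId) AS Created,
               COUNT(*) OVER (PARTITION BY UserId, Area, "Action", ObjectId) AS Count
        FROM Notification
        WHERE UserId = ?
    )
    SELECT UserId, Area, "Action", ObjectId, RelatedUserLink1, RelatedUserLink2, Created, Count
    FROM RUL
    WHERE latest_to_earliest = 1
    ORDER BY ObjectId
""", (10,)).fetchall()

for row in rows:
    print(row)
# (10, 1, 2, 100, 'userlink1', 'userlink2', '2016-04-08', 3)
# (10, 1, 3, 200, 'userlink1', None, '2016-04-09', 1)
```

Like the query above, this sketch does not deduplicate repeated RelatedUserLink values within a group.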

Oracle SQL query to fetch from the Tasks table the top Approval Statuses that appear after the first null values

I need an Oracle SQL query that fetches, from the Tasks table, the top run of Approval Statuses: within each task, the Approval_Status column first has some null values, then a sequence of approval statuses, then more null values.
Facts:
I only need the top Approval Status sequence.
The Serial Number for each task ID starts at 1 and continues in sequence: 1, 2, 3, ... and so on.
There are thousands of tasks in the table, from T1 ... Tn.
See the Query Result below: I need to write a query that returns data in that format.
I have heard that an analytic function (i.e. the "PARTITION BY" clause) can be used for this, but I don't know how to use it.
[Image: Tasks table]
[Image: Query Result]
I would really appreciate expert help with this.
Thanks
You can do this with analytic functions, but there is a trick. The idea is to look only at the rows where approval_status is not null. Among these rows, you want the first group of consecutive serial numbers.
The group is identified by the difference between a sequence that enumerates the non-null rows and the existing serial number. Within a run of consecutive serial numbers this difference is constant, and every gap of null rows makes it smaller. To get the first run, use dense_rank() ordered by that difference in descending order. Finally, choose the first run by keeping the rows with a rank equal to 1:
select t.*
from (select t.*,
-- diff drops after every gap, so the first run has the largest diff
dense_rank() over (partition by taskid order by diff desc) as grpnum
from (select t.*,
(row_number() over (partition by taskid order by serial_number) -
serial_number
) as diff
from tasks t
where approval_status is not null
) t
) t
where grpnum = 1;
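To check the trick, here is a sketch that runs the query through Python's sqlite3, with SQLite (3.25+) standing in for Oracle. The column names follow the answer (taskid, serial_number, approval_status); the rows are invented. T1 has a second run after a null gap, which must not be returned.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tasks (taskid TEXT, serial_number INTEGER, approval_status TEXT)")
conn.executemany(
    "INSERT INTO tasks VALUES (?, ?, ?)",
    [
        # T1: nulls, first run (3-5), a null gap, second run (7-8)
        ("T1", 1, None), ("T1", 2, None), ("T1", 3, "A"), ("T1", 4, "B"),
        ("T1", 5, "C"), ("T1", 6, None), ("T1", 7, "D"), ("T1", 8, "E"),
        # T2: the first run starts at serial 1
        ("T2", 1, "X"), ("T2", 2, "Y"), ("T2", 3, None), ("T2", 4, "Z"),
    ],
)

rows = conn.execute("""
    select taskid, serial_number, approval_status
    from (select t.*,
                 -- diff drops after every gap, so the first run has the largest diff
                 dense_rank() over (partition by taskid order by diff desc) as grpnum
          from (select t.*,
                       (row_number() over (partition by taskid order by serial_number)
                        - serial_number) as diff
                from tasks t
                where approval_status is not null
               ) t
         ) t
    where grpnum = 1
    order by taskid, serial_number
""").fetchall()

print(rows)
# [('T1', 3, 'A'), ('T1', 4, 'B'), ('T1', 5, 'C'), ('T2', 1, 'X'), ('T2', 2, 'Y')]
```

For T1 the non-null serials 3, 4, 5, 7, 8 get row numbers 1 to 5, so diff is -2, -2, -2, -3, -3, and the run with diff -2 ranks first.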

How to get the position of a record in a table (SQL Server)

Following problem:
I need to get the position of a record in the table. Let's say I have these records in the table:
Name: john doe, ID: 1
Name: jane doe, ID: 2
Name: Frankie Boy, ID: 4
Name: Johnny, ID: 9
Now, "Frankie Boy" is in the third position in the table. But how do I get this information from SQL Server? I could count IDs, but they are not reliable: Frankie has ID 4 but is in the third position because the record with ID 3 was deleted.
Is there a way? I am aware of ROW_NUMBER(), but it would be costly, because I basically need to select the whole set before I can number the rows.
I am using MS SQL Server 2008 R2.
Tables don't have a 'position'. Rows in a table (= set) are identified by their primary key value. Only result rows have a 'position', and it is deterministic only when an ORDER BY clause on unique columns is present. Assuming that tables (= sets) have a position only leads to problems; it is the wrong mindset.
You can use row_number() to "label" rows. You've got to specify a way to order the rows:
select row_number() over (order by id) as PositionInTable
, *
from YourTable
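A minimal sketch with the question's four rows, run through Python's sqlite3 with SQLite (3.25+) standing in for SQL Server. The lookup by name sits in an outer query on purpose: a WHERE in the same query as row_number() is applied before the numbering, so Frankie would come out as 1.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE YourTable (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO YourTable (id, name) VALUES (?, ?)",
    [(1, "john doe"), (2, "jane doe"), (4, "Frankie Boy"), (9, "Johnny")],
)

# Label every row with its position when ordered by id.
all_rows = conn.execute("""
    SELECT ROW_NUMBER() OVER (ORDER BY id) AS PositionInTable, id, name
    FROM YourTable
    ORDER BY id
""").fetchall()

# Number first, filter afterwards in the outer query.
(position,) = conn.execute("""
    SELECT PositionInTable
    FROM (SELECT ROW_NUMBER() OVER (ORDER BY id) AS PositionInTable, name
          FROM YourTable) AS numbered
    WHERE name = ?
""", ("Frankie Boy",)).fetchone()

print(all_rows)  # [(1, 1, 'john doe'), (2, 2, 'jane doe'), (3, 4, 'Frankie Boy'), (4, 9, 'Johnny')]
print(position)  # 3
```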
If performance is an issue, you could store the position in a new column:
update yt1
set PositionInTable = rn
from YourTable yt1
join (
select row_number() over (order by id) as rn
, id
from YourTable
) yt2
on yt1.id = yt2.id
With an index on PositionInTable, this would be lightning fast. But you would have to rerun the update after each insert or delete on the table.
Tables are [conceptually] without order. Unless you specify ORDER BY in a select statement to order a result set, results may be returned in any order. Repeated executions of the exact same SQL may return the result set in a different order for each execution.
To get the row number in a particular result set, use the row_number() function:
select row = row_number() over( order by id ) , *
from sysobjects
This will assign a row number to each row in sysobjects as if the table were ordered by id.
A simple way to do this without using ROW_NUMBER is to count how many rows in the table have an ID less than or equal to the selected ID; this gives the row's position when the rows are ordered by ID.
SELECT COUNT(*) FROM YourTable WHERE ID <= 4 -- Frankie Boy, Result = 3
This may not be the most efficient way to do it for your particular scenario, but it's a simple way of achieving it.
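A sketch that cross-checks the COUNT(*) approach against row_number() for every row, using Python's sqlite3 with SQLite (3.25+) as a stand-in engine and the question's four rows. position_by_count is a hypothetical helper name; the approach assumes IDs are unique, since duplicate IDs would make positions collide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE YourTable (ID INTEGER PRIMARY KEY, Name TEXT)")
conn.executemany(
    "INSERT INTO YourTable (ID, Name) VALUES (?, ?)",
    [(1, "john doe"), (2, "jane doe"), (4, "Frankie Boy"), (9, "Johnny")],
)

def position_by_count(conn, target_id):
    """Hypothetical helper: position of target_id when rows are ordered by a unique ID."""
    (pos,) = conn.execute(
        "SELECT COUNT(*) FROM YourTable WHERE ID <= ?", (target_id,)
    ).fetchone()
    return pos

# Cross-check against row_number() for every row.
by_row_number = dict(conn.execute(
    "SELECT ID, ROW_NUMBER() OVER (ORDER BY ID) FROM YourTable"
).fetchall())
by_count = {i: position_by_count(conn, i) for i in by_row_number}

print(position_by_count(conn, 4))  # 3
print(by_count == by_row_number)   # True
```

Each call runs one COUNT(*), which the primary-key index on ID can answer with a range scan, so it avoids numbering the whole table just to find one position.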