Group by multiple columns, get group total count and specific column from last two rows in each group - sql

I have an SQL Server table with the following columns:
Notification
===================
Id (int)
UserId (int)
Area (int)
Action (int)
ObjectId (int)
RelatedUserLink (nvarchar(100))
Created (datetime)
The goal is to create a query that groups notifications with the same Area, Action and ObjectId for a specific user (UserId) and returns a single row per group, including the total count for the group and the value of a specific column from the last two rows.
The query will only be executed for one user (UserId) each time.
The problem is that I need the column RelatedUserLink for the last two records (based on Created) of each group. The RelatedUserLink values should be distinct within each group (if the same value occurs more than once, only the latest should be included and counted).
Each group should be represented by one result row. It doesn't matter whether the two RelatedUserLink values are concatenated in one column or split into two columns, "RelatedUserLink1" and "RelatedUserLink2". If the group contains only one row, the second RelatedUserLink should simply be null.
Desired result:
UserId | Area | Action | ObjectId | RelatedUserLink1 | RelatedUserLink2 | Created (latest in group) | Count
10     | 1    | 2      | 100      | "userlink1"      | "userlink2"      | 2016-04-08                | 20
10     | 1    | 3      | 200      | "userlink1"      | "userlink2"      | 2016-04-09                | 4
The table will be quite large, 100,000-200,000 rows.
(The related User table has approx. 10,000 rows.)
I also have the option of fetching all notifications for a user and doing the grouping in code, but I hope it is faster to let SQL Server handle it.
Any help is much appreciated!
Thanks!

I would attempt this by using the following WITH clause:
WITH RUL AS (
    select
        UserId,
        Area,
        Action,
        ObjectId,
        RelatedUserLink as RelatedUserLink1,
        -- previous RelatedUserLink in Created order (NULL on the first row of a group)
        LAG(RelatedUserLink) OVER (PARTITION BY UserId, Area, Action, ObjectId ORDER BY Created) as RelatedUserLink2,
        -- 1 = latest row in the group
        ROW_NUMBER() OVER (PARTITION BY UserId, Area, Action, ObjectId ORDER BY Created DESC) as latest_to_earliest,
        MAX(Created) OVER (PARTITION BY UserId, Area, Action, ObjectId) as Created,
        COUNT(*) OVER (PARTITION BY UserId, Area, Action, ObjectId) as Count
    from
        Notification
    where UserId = 10
)
select
    UserId,
    Area,
    Action,
    ObjectId,
    RelatedUserLink1,
    RelatedUserLink2,
    Created,
    Count
from
    RUL
where
    latest_to_earliest = 1;
The LAG function returns the RelatedUserLink of the previous row in Created order (or NULL on the first row of a group, so a one-row group gets NULL). ROW_NUMBER numbers each group's rows from latest to earliest, so the latest row has the value 1. The MAX and COUNT window functions put the group's maximum Created and its row count on every row, which gives the same values as a GROUP BY without a separate query and join back.
The SELECT outside the WITH clause then picks only the latest row of each group, which holds the last RelatedUserLink value in RelatedUserLink1 and the penultimate (or NULL) value in RelatedUserLink2.
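To check the behaviour on a small data set, here is a minimal sketch that runs the same query shape from Python. It assumes SQLite (3.25 or later, for window functions) as a stand-in for SQL Server; the sample rows, the function name latest_per_group and the aliases LatestCreated and GroupCount are made up for illustration. Note that the query does not deduplicate RelatedUserLink values within a group, so the "distinct" requirement from the question would still need separate handling.

```python
import sqlite3

# Hypothetical sample rows: (UserId, Area, Action, ObjectId, RelatedUserLink, Created)
ROWS = [
    (10, 1, 2, 100, "userlink3", "2016-04-06"),
    (10, 1, 2, 100, "userlink2", "2016-04-07"),
    (10, 1, 2, 100, "userlink1", "2016-04-08"),
    (10, 1, 3, 200, "userlink1", "2016-04-09"),
    (11, 1, 2, 100, "userlink9", "2016-04-10"),  # another user, filtered out
]

# "Action" is quoted because it is a keyword in SQLite; aliases differ from
# the answer (LatestCreated, GroupCount) only to keep the sketch unambiguous.
QUERY = """
WITH RUL AS (
    SELECT UserId, Area, "Action", ObjectId,
           RelatedUserLink AS RelatedUserLink1,
           LAG(RelatedUserLink) OVER (PARTITION BY UserId, Area, "Action", ObjectId
                                      ORDER BY Created) AS RelatedUserLink2,
           ROW_NUMBER() OVER (PARTITION BY UserId, Area, "Action", ObjectId
                              ORDER BY Created DESC) AS latest_to_earliest,
           MAX(Created) OVER (PARTITION BY UserId, Area, "Action", ObjectId) AS LatestCreated,
           COUNT(*) OVER (PARTITION BY UserId, Area, "Action", ObjectId) AS GroupCount
    FROM Notification
    WHERE UserId = ?
)
SELECT UserId, Area, "Action", ObjectId,
       RelatedUserLink1, RelatedUserLink2, LatestCreated, GroupCount
FROM RUL
WHERE latest_to_earliest = 1
ORDER BY Area, "Action", ObjectId
"""


def latest_per_group(user_id):
    """Return one summary row per (Area, Action, ObjectId) group for one user."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        'CREATE TABLE Notification (Id INTEGER PRIMARY KEY, UserId INTEGER, '
        'Area INTEGER, "Action" INTEGER, ObjectId INTEGER, '
        'RelatedUserLink TEXT, Created TEXT)'
    )
    conn.executemany(
        'INSERT INTO Notification (UserId, Area, "Action", ObjectId, '
        'RelatedUserLink, Created) VALUES (?, ?, ?, ?, ?, ?)',
        ROWS,
    )
    rows = conn.execute(QUERY, (user_id,)).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for row in latest_per_group(10):
        print(row)
```

With this data, the three-row group reports its latest and second-latest links, and the one-row group gets NULL (None in Python) for the second link.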

Related

group and return rows with the minimum value

There is a tasks table.
id | name | project_id | created | ...
Tasks can be in different projects. I need to return one task from each project with a minimum creation date. Here is my solution
SELECT *
FROM tasks a
JOIN (
SELECT project_id, min(created) as created
FROM tasks
GROUP BY project_id
) b
ON a.project_id=b.project_id AND a.created = b.created;
But if a project has several tasks with the same creation date, this returns more than one record for that project.
To ensure that 1, and only 1, row is returned per project_id, a better method is row_number() over(). The partition by inside the over() clause plays the role of what you would have grouped by, and the order by controls which row within each partition is given the value 1. In this case the value 1 goes to the row with the earliest created date, and further columns can be added as tie-breakers (e.g. id). Every other row within the partition gets the next integer value, so only one row in each partition can equal 1. To limit the final result, use a derived table (subquery) followed by a where clause that keeps only the first row per partition, i.e. where rn = 1.
SELECT
    *
FROM (SELECT *
           , row_number() over(partition by project_id order by created, id) as rn
      FROM tasks
     ) AS derived
WHERE rn = 1
Note: to get the most recent row instead, reverse the sort direction on the date column (order by created DESC).
Not only does this technique ensure that only 1 row per partition is returned, it also typically requires fewer passes through the data than your original approach, so it is efficient as well.
Tip: if you do want more than 1 row per partition when there are ties, use rank() or dense_rank() instead of row_number() and order by the date column alone. The ranking functions give rows with equal ordering values the same rank, so more than 1 row can get a rank of 1.
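The difference between row_number() and rank() on tied dates can be seen in a small sketch. It assumes SQLite (3.25+) via Python as a stand-in engine; the task rows and the function name first_task_ids are made up for illustration. Project 1 has two tasks sharing the earliest created date.

```python
import sqlite3

# Hypothetical tasks: (id, name, project_id, created)
TASKS = [
    (1, "a", 1, "2020-01-01"),
    (2, "b", 1, "2020-01-01"),  # same date as task 1 -> a tie
    (3, "c", 1, "2020-02-01"),
    (4, "d", 2, "2020-03-01"),
    (5, "e", 2, "2020-01-15"),
]

# row_number() with id as tie-breaker: exactly one row per project gets 1.
ROW_NUMBER_SQL = """
SELECT id FROM (
    SELECT *, row_number() OVER (PARTITION BY project_id ORDER BY created, id) AS rn
    FROM tasks
) AS derived
WHERE rn = 1
ORDER BY id
"""

# rank() ordered by created only: all tied rows share rank 1.
RANK_SQL = """
SELECT id FROM (
    SELECT *, rank() OVER (PARTITION BY project_id ORDER BY created) AS rk
    FROM tasks
) AS derived
WHERE rk = 1
ORDER BY id
"""


def first_task_ids(sql):
    """Run one of the queries above against the sample tasks and return the ids."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE tasks (id INTEGER PRIMARY KEY, name TEXT, "
        "project_id INTEGER, created TEXT)"
    )
    conn.executemany("INSERT INTO tasks VALUES (?, ?, ?, ?)", TASKS)
    ids = [row[0] for row in conn.execute(sql)]
    conn.close()
    return ids


if __name__ == "__main__":
    print("row_number:", first_task_ids(ROW_NUMBER_SQL))
    print("rank:      ", first_task_ids(RANK_SQL))
```

The row_number() variant keeps one task per project, while the rank() variant keeps both tied tasks of project 1.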

field subtraction sql server

I would like to subtract the rows of one table from those of another.
For example, table A has 11 rows described as 'Faktura zakupu' and table B has 5 rows described as 'Faktura zakupu'. I would like to get back 6 'Faktura zakupu' rows (11 - 5 = 6).
I tried the EXCEPT operation, but it does not return the desired result.
What operation do I need to perform?
You can add a row number to each row in both tables. Then SQL Server can determine that the first (Faktura zakupu, Original) in table A is a duplicate of the first (Faktura zakupu, Original) in table B and remove it during the EXCEPT operation:
SELECT Name, StatusReq, ROW_NUMBER() OVER (PARTITION BY Name, StatusReq ORDER BY (SELECT NULL))
FROM a
EXCEPT
SELECT Name, StatusReq, ROW_NUMBER() OVER (PARTITION BY Name, StatusReq ORDER BY (SELECT NULL))
FROM b
This returns the 6 surplus rows from table A: the ones numbered 6 through 11.
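A minimal sketch of why the numbering matters, assuming SQLite (3.25+) via Python as a stand-in for SQL Server. The table contents and function names are hypothetical, and the window here omits ORDER BY (SQLite allows that) instead of using ORDER BY (SELECT NULL). A plain EXCEPT is set-based, so identical rows collapse and nothing is left; with a row number per duplicate, only the first 5 copies are cancelled out.

```python
import sqlite3


def _connect(count_a, count_b):
    """Build in-memory tables a and b filled with identical sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE a (Name TEXT, StatusReq TEXT)")
    conn.execute("CREATE TABLE b (Name TEXT, StatusReq TEXT)")
    row = ("Faktura zakupu", "Original")
    conn.executemany("INSERT INTO a VALUES (?, ?)", [row] * count_a)
    conn.executemany("INSERT INTO b VALUES (?, ?)", [row] * count_b)
    return conn


def plain_except(count_a=11, count_b=5):
    """EXCEPT without numbering: duplicates collapse, so no rows survive."""
    conn = _connect(count_a, count_b)
    rows = conn.execute(
        "SELECT Name, StatusReq FROM a EXCEPT SELECT Name, StatusReq FROM b"
    ).fetchall()
    conn.close()
    return rows


def numbered_except(count_a=11, count_b=5):
    """EXCEPT on numbered duplicates: the surplus copies in a survive."""
    conn = _connect(count_a, count_b)
    rows = conn.execute("""
        SELECT Name, StatusReq,
               ROW_NUMBER() OVER (PARTITION BY Name, StatusReq) AS rn
        FROM a
        EXCEPT
        SELECT Name, StatusReq,
               ROW_NUMBER() OVER (PARTITION BY Name, StatusReq) AS rn
        FROM b
    """).fetchall()
    conn.close()
    # Sort in Python so the result order does not depend on the engine.
    return sorted(rows, key=lambda r: r[2])


if __name__ == "__main__":
    print(plain_except())
    print(numbered_except())
```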

Group by question in SQL Server, migration from MySQL

I failed to find a solution to my problem and would love your help.
(Post has been edited to contain only one question.)
Group by one column while selecting multiple columns.
In MySQL you can group by whatever you want and still select all columns. For example, say I want to select the newest 100 transactions, grouped by Email (only the last transaction for each email).
In MySQL I would do this:
SELECT * FROM db.transactionlog
group by Email
order by TransactionLogId desc
LIMIT 100;
In SQL Server that is not possible. Googling a bit suggested wrapping each column I want in an aggregate as a hack, but couldn't that cause a mix of values (mixing columns between the grouped rows)?
For example:
SELECT TOP(100)
Email,
MAX(ResultCode) as 'ResultCode',
MAX(Amount) as 'Amount',
MAX(TransactionLogId) as 'TransactionLogId'
FROM [db].[dbo].[transactionlog]
group by Email
order by TransactionLogId desc
TransactionLogId is the primary key and an identity column; I order by it to get the last inserted row.
I just want to be sure that the ResultCode and Amount I get from such a query will be those of the last inserted row, and not the highest values among the grouped rows.
~Edit~
Sample data:
row1:
Email: test@email.com
ResultCode: 100
Amount: 27
TransactionLogId: 1
row2:
Email: test@email.com
ResultCode: 50
Amount: 10
TransactionLogId: 2
Using the sample data above, my goal is to get the row details of TransactionLogId = 2.
But what actually happens is that I get mixed values of the two rows: I do get TransactionLogId = 2, but with the ResultCode and Amount of the first row.
How do I avoid that?
Thanks.
You should first find out which is the latest transaction log by each email, then join back against the same table to retrieve the full record:
;WITH MaxTransactionByEmail AS
(
    SELECT
        Email,
        MAX(TransactionLogId) as LatestTransactionLogId
    FROM
        [db].[dbo].[transactionlog]
    group by
        Email
)
SELECT
    T.*
FROM
    [db].[dbo].[transactionlog] AS T
    INNER JOIN MaxTransactionByEmail AS M ON T.TransactionLogId = M.LatestTransactionLogId
You are currently getting mixed results because each aggregate function such as MAX() considers all rows that share a particular value of Email, independently of the other columns. So the MAX() of the Amount column over the values 10 and 27 is 27, even though that row has the lower transaction log id.
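The effect is easy to reproduce with the two sample rows from the question. The sketch below assumes SQLite via Python as a stand-in for SQL Server; the third row (other@email.com) and the function names are hypothetical additions. It runs the per-column MAX() query and the join-back query side by side.

```python
import sqlite3

# (TransactionLogId, Email, ResultCode, Amount); the last row is made up.
ROWS = [
    (1, "test@email.com", 100, 27),
    (2, "test@email.com", 50, 10),
    (3, "other@email.com", 200, 5),
]

# Each MAX() is computed on its own, so values can come from different rows.
MIXED_SQL = """
SELECT Email, MAX(ResultCode), MAX(Amount), MAX(TransactionLogId)
FROM transactionlog
GROUP BY Email
ORDER BY Email
"""

# Find the latest id per Email first, then join back for the full row.
JOIN_BACK_SQL = """
WITH MaxTransactionByEmail AS (
    SELECT Email, MAX(TransactionLogId) AS LatestTransactionLogId
    FROM transactionlog
    GROUP BY Email
)
SELECT T.Email, T.ResultCode, T.Amount, T.TransactionLogId
FROM transactionlog AS T
INNER JOIN MaxTransactionByEmail AS M
    ON T.TransactionLogId = M.LatestTransactionLogId
ORDER BY T.Email
"""


def run(sql):
    """Execute a query against the in-memory sample table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE transactionlog (TransactionLogId INTEGER PRIMARY KEY, "
        "Email TEXT, ResultCode INTEGER, Amount INTEGER)"
    )
    conn.executemany("INSERT INTO transactionlog VALUES (?, ?, ?, ?)", ROWS)
    rows = conn.execute(sql).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    print("per-column MAX:", run(MIXED_SQL))
    print("join back:     ", run(JOIN_BACK_SQL))
```

For test@email.com the first query combines ResultCode 100 and Amount 27 from row 1 with id 2 from row 2, while the join-back query returns row 2 intact.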
Another solution is using a ROW_NUMBER() window function to get a row-ranking by each Email, then just picking the first row:
;WITH TransactionsRanking AS
(
    SELECT
        T.*,
        MostRecentTransactionLogRanking = ROW_NUMBER() OVER (
            PARTITION BY
                T.Email -- Start a different ranking for each different value of Email
            ORDER BY
                T.TransactionLogId DESC) -- Order the rows by the TransactionLogID descending
    FROM
        [db].[dbo].[transactionlog] AS T
)
SELECT
    T.*
FROM
    TransactionsRanking AS T
WHERE
    T.MostRecentTransactionLogRanking = 1

Access 10th through 70th element in STRUCT

I have 3 fields: username, tracking_id, timestamp. One user will have multiple rows (some have more, some have fewer) with different tracking ids and timestamps, one for each action they have taken on my website. I want to group by the username and get the tracking ids of that user's 10th through 70th action. I use standard SQL on BigQuery.
The first problem is that I can't find syntax to access a range in the STRUCT array (only a single row, or a limit to get the first/last 70 rows, for example). Then, I can imagine that once I manage to access a range, the index could be out of bounds, because some users might not have 70 or more actions.
SELECT
username,
ARRAY_AGG(STRUCT(tracking_id,
timestamp)
ORDER BY
timestamp
)[OFFSET (9 to 69)] #??????
FROM
table
The result should be a table with the same 3 fields: username, tracking_id, timestamp, but instead of containing ALL of the user's rows, it should contain only each user's 10th to 70th row.
Below is for BigQuery Standard SQL
#standardSQL
SELECT username,
  ARRAY_AGG(STRUCT(tracking_id, `timestamp`) ORDER BY `timestamp`) AS selected_actions
FROM (
  SELECT * EXCEPT(pos) FROM (
    SELECT *, ROW_NUMBER() OVER(PARTITION BY username ORDER BY `timestamp`) pos
    FROM `project.dataset.table`
  )
  WHERE pos BETWEEN 10 AND 70
)
GROUP BY username
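The row-numbering part of this query also answers the out-of-bounds worry: users with fewer than 70 actions simply contribute the rows they have from position 10 onwards, and users with fewer than 10 contribute nothing. The sketch below shows only that filter, assuming SQLite (3.25+) via Python as a stand-in, since SQLite has no ARRAY_AGG or STRUCT. The table name actions, the column ts and the generated users are hypothetical.

```python
import sqlite3

# Hypothetical users and how many actions each one has.
ACTION_COUNTS = {"alice": 100, "bob": 15, "carol": 5}

RANGE_SQL = """
SELECT username, tracking_id
FROM (
    SELECT *, ROW_NUMBER() OVER (PARTITION BY username ORDER BY ts) AS pos
    FROM actions
) AS numbered
WHERE pos BETWEEN 10 AND 70
ORDER BY username, ts
"""


def actions_10_to_70():
    """Return {username: [tracking_id, ...]} for each user's 10th-70th action."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE actions (username TEXT, tracking_id INTEGER, ts INTEGER)")
    for user, count in ACTION_COUNTS.items():
        # tracking_id and ts both run 1..count, so tracking_id equals the position.
        conn.executemany(
            "INSERT INTO actions VALUES (?, ?, ?)",
            [(user, i, i) for i in range(1, count + 1)],
        )
    result = {}
    for username, tracking_id in conn.execute(RANGE_SQL):
        result.setdefault(username, []).append(tracking_id)
    conn.close()
    return result


if __name__ == "__main__":
    for user, ids in actions_10_to_70().items():
        print(user, len(ids), ids[0], ids[-1])
```

Because the inner query returns plain rows, it already has the flat username / tracking_id / timestamp shape asked for in the question; the ARRAY_AGG step is only needed if one row per user is wanted.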

How to distinguish rows in a database table on the basis of two or more columns while returning all columns in sql server

I want to filter out duplicate rows based on the values of two or more columns of a table, while still returning all columns of the table.
Ex: I have this table
DB Table
I want my result to be displayed as follows, filtering on type and Number only. In the above table, type and Number are the same for the first and second rows, so the duplicate should be suppressed in the result.
txn | item | Discrip | Category | type | Number           | Mode
60  | 2    | Loyalty | L        | 6174 | XXXXXXX1390      | 0
60  | 4    | Visa    | C        | 1600 | XXXXXXXXXXXX4108 | 1
I have tried a subquery, but so far without success. Please suggest what to try.
Thanks
You can do what you want with row_number():
select t.*
from (select t.*,
             row_number() over (partition by type, number order by item) as seqnum
      from t
     ) t
where seqnum = 1;