How to distinguish rows in a database table on the basis of two or more columns while returning all columns in SQL Server

I want to return distinct rows on the basis of two or more column values of the same table, while still returning all columns from the table.
Ex: I have this table
DB Table
I want my result to be filtered on the basis of type and Number only. In the table above, type and Number are the same for the first and second rows, so one of them should be suppressed in the result:
txn | item | Discrip | Category | type | Number           | Mode
60  | 2    | Loyalty | L        | 6174 | XXXXXXX1390      | 0
60  | 4    | Visa    | C        | 1600 | XXXXXXXXXXXX4108 | 1
I have tried a subquery, but without success so far. Please suggest what to try.
Thanks

You can do what you want with row_number():
select t.*
from (select t.*,
             row_number() over (partition by type, number order by item) as seqnum
      from t
     ) t
where seqnum = 1;
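To see the row_number() approach end to end, here is a minimal sketch that runs the same query shape from Python. It assumes SQLite (3.25 or later, for window functions) as a stand-in for SQL Server, and the three sample rows are hypothetical, since the original table image is not available.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE t (txn INT, item INT, discrip TEXT, category TEXT, "
    "type INT, number TEXT, mode INT)"
)
# Hypothetical rows: items 2 and 3 share the same (type, number) pair.
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?, ?, ?, ?, ?)",
    [
        (60, 2, "Loyalty", "L", 6174, "XXXXXXX1390", 0),
        (60, 3, "Loyalty", "L", 6174, "XXXXXXX1390", 0),
        (60, 4, "Visa", "C", 1600, "XXXXXXXXXXXX4108", 1),
    ],
)

# Number the rows inside each (type, number) group and keep only the first.
query = """
SELECT txn, item, discrip, category, type, number, mode
FROM (SELECT t.*,
             ROW_NUMBER() OVER (PARTITION BY type, number ORDER BY item) AS seqnum
      FROM t) AS ranked
WHERE seqnum = 1
ORDER BY item
"""
result = conn.execute(query).fetchall()
kept_items = [row[1] for row in result]
print(kept_items)
```

The ORDER BY inside the window decides which duplicate survives; here the lowest item in each group is kept.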

Related

Limiting output of rows based on count of values in another table?

As a base example, I have a query that effectively produces a table with a list of values (ID numbers), each of which is attached to a specific category. As a simplified example, it would produce something like this (but at a much larger scale):
IDS   | Categories
12345 | type 1
12456 | type 6
77689 | type 3
32456 | type 4
67431 | type 2
13356 | type 2
..... | .....
Using this table, I want to populate another table that gives me a list of ID numbers, with a limit placed on how many of each category are in that list, cross referenced against a sort of range based chart. For instance, if there are 5-15 IDS of type 1 in my first table, I want the new table with the column of IDS to have 3 type 1 IDS in it, if there are 15-30 type 1 IDS in the first table, I want to have 6 type 1 IDS in the new table.
This sort of range based limit would apply to each category, and the IDS would all populate the same column in the new table. The order, or specific IDS that end up in the final table don't matter, as long as the correct number of IDS end up as a part of that final list of ID numbers. This is being used to provide a semi-random sampling of ID numbers based on categories for a sort of QA related process.
If parts of this are unclear I can do my best to explain more. My initial thought was to use a variable in a LIMIT clause, but that isn't possible. I have been trying to work out how to do this with a CASE expression, but I'm not making any headway; I feel like I am at a paper-thin wall I just can't break through.
You can use two window functions:
COUNT to keep track of the number of ids in each category
ROW_NUMBER to uniquely identify each id within its category
Once you have collected this information, it's sufficient to keep the rows that satisfy either of the following conditions:
count of rows less than or equal to 15 >> ranking less than or equal to 3
count of rows greater than 15 and less than or equal to 30 >> ranking less than or equal to 6
WITH cte AS (
    SELECT IDS,
           Categories,
           ROW_NUMBER() OVER(PARTITION BY Categories ORDER BY IDS) AS rn,
           COUNT(IDS) OVER(PARTITION BY Categories) AS cnt
    FROM tab
)
SELECT *
FROM cte
WHERE (cnt <= 15 AND rn <= 3)                -- up to 15 ids: keep 3
   OR (cnt > 15 AND cnt <= 30 AND rn <= 6)   -- 16 to 30 ids: keep 6
Note: If you have concerns regarding a specific ordering, you need to fix the ORDER BY clause inside the ROW_NUMBER window function.
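As a rough check of the COUNT plus ROW_NUMBER idea, the sketch below builds two hypothetical categories (10 and 20 ids) and counts how many rows per category survive the filter. It assumes SQLite 3.25 or later rather than the asker's database, and it writes the two count ranges as mutually exclusive so that a small category cannot also match the larger limit.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tab (IDS INTEGER, Categories TEXT)")
# Hypothetical data: 10 ids of "type 1" and 20 ids of "type 2".
conn.executemany(
    "INSERT INTO tab VALUES (?, ?)",
    [(1000 + i, "type 1") for i in range(10)]
    + [(2000 + i, "type 2") for i in range(20)],
)

query = """
WITH cte AS (
    SELECT IDS,
           Categories,
           ROW_NUMBER() OVER (PARTITION BY Categories ORDER BY IDS) AS rn,
           COUNT(IDS)   OVER (PARTITION BY Categories) AS cnt
    FROM tab
)
SELECT Categories, COUNT(*) AS sampled
FROM cte
WHERE (cnt <= 15 AND rn <= 3)
   OR (cnt > 15 AND cnt <= 30 AND rn <= 6)
GROUP BY Categories
ORDER BY Categories
"""
sample_sizes = dict(conn.execute(query).fetchall())
print(sample_sizes)
```

For a semi-random sample, the ORDER BY inside ROW_NUMBER could order by a random expression instead of IDS; which function provides that depends on the database.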

group and return rows with the minimum value

There is a tasks table.
id | name | project_id | created | ...
Tasks can be in different projects. I need to return one task from each project with a minimum creation date. Here is my solution
SELECT *
FROM tasks a
JOIN (
    SELECT project_id, min(created) as created
    FROM tasks
    GROUP BY project_id
) b
ON a.project_id = b.project_id AND a.created = b.created;
But if a project has several tasks with the same creation date, this returns two (or more) records for that project.
To ensure that 1, and only 1, row is returned per project_id a better method is to use row_number() over() where the partition by within the over() clause is similar to what you would have grouped by and the order by controls which row within each partition is given the value of 1. In this case the value of 1 is given to a row with the earliest created date, and further columns can also be referenced as tie-breakers (e.g. using id). Every other row within the partition is given the next integer value so only one row in each partition can be equal to 1. So to limit the final result, use a derived table (subquery) followed by a where clause that restricts the result to the first row per partition i.e. where rn = 1.
SELECT
    *
FROM (SELECT *
           , row_number() over(partition by project_id order by created, id) as rn
      FROM tasks
     ) AS derived
WHERE rn = 1
nb: to get the most recent row reverse the direction of ordering on the date column
Not only will this technique ensure only 1 row per partition is returned it also requires fewer passes through the data (than your original approach), so it is efficient as well.
tip: if you did want to get more than 1 row per partition returned then use rank() or dense_rank() instead of row_number() - because the ranking functions will recognize rows of equal rank and hence return the same rank value. i.e. more than 1 row could get a rank value of 1
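The difference between row_number() and rank() on tied dates can be seen in a small sketch. It assumes SQLite 3.25 or later and a hypothetical tasks table in which two tasks of project 1 share the earliest creation date.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tasks (id INTEGER, name TEXT, project_id INTEGER, created TEXT)"
)
# Hypothetical rows: tasks 1 and 2 tie on the earliest date in project 1.
conn.executemany(
    "INSERT INTO tasks VALUES (?, ?, ?, ?)",
    [
        (1, "a", 1, "2020-01-01"),
        (2, "b", 1, "2020-01-01"),
        (3, "c", 1, "2020-02-01"),
        (4, "d", 2, "2020-03-01"),
    ],
)

# row_number(): exactly one row per project, with id as the tie-breaker.
first_per_project = [r[0] for r in conn.execute("""
    SELECT id
    FROM (SELECT id,
                 ROW_NUMBER() OVER (PARTITION BY project_id
                                    ORDER BY created, id) AS rn
          FROM tasks) AS derived
    WHERE rn = 1
    ORDER BY id
""")]

# rank(): tied rows share rank 1, so both tied tasks come back.
tied_firsts = [r[0] for r in conn.execute("""
    SELECT id
    FROM (SELECT id,
                 RANK() OVER (PARTITION BY project_id
                              ORDER BY created) AS rk
          FROM tasks) AS derived
    WHERE rk = 1
    ORDER BY id
""")]

print(first_per_project, tied_firsts)
```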

Group by question in SQL Server, migration from MySQL

Failed finding a solution to my problem, would love your help.
~~ Post has been edited to have only one question ~~-
Group by one query while selecting multiple columns.
In MySQL you can simply group by whatever you want, and it will still select all of them, so if for example I wanted to select the newest 100 transactions, grouped by Email (only get the last transaction of a single email)
In MySQL I would do that:
SELECT * FROM db.transactionlog
group by Email
order by TransactionLogId desc
LIMIT 100;
In SQL Server it's not possible. Googling a bit suggested wrapping each column I want in an aggregate as a hack, but couldn't that cause a mix of values (mixing columns between the grouped rows)?
For example:
SELECT TOP(100)
    Email,
    MAX(ResultCode) as 'ResultCode',
    MAX(Amount) as 'Amount',
    MAX(TransactionLogId) as 'TransactionLogId'
FROM [db].[dbo].[transactionlog]
group by Email
order by TransactionLogId desc
TransactionLogId is the primary key, which is an identity column; I order by it to get the last inserted row.
I just want to know whether the ResultCode and Amount I get from such a query will be those of the last inserted row, and not the highest of the grouped rows or whatever.
~Edit~
Sample data -
row1:
Email : test#email.com
ResultCode : 100
Amount : 27
TransactionLogId : 1
row2:
Email: test#email.com
ResultCode:50
Amount: 10
TransactionLogId: 2
Using the sample data above, my goal is to get the row details of
TransactionLogId = 2.
But what actually happens is that I get mixed values of the two: I do get TransactionLogId = 2, but with the ResultCode and Amount of the first row.
How do I avoid that?
Thanks.
You should first find out which is the latest transaction log by each email, then join back against the same table to retrieve the full record:
;WITH MaxTransactionByEmail AS
(
    SELECT
        Email,
        MAX(TransactionLogId) as LatestTransactionLogId
    FROM
        [db].[dbo].[transactionlog]
    group by
        Email
)
SELECT
    T.*
FROM
    [db].[dbo].[transactionlog] AS T
    INNER JOIN MaxTransactionByEmail AS M ON T.TransactionLogId = M.LatestTransactionLogId
You are currently getting mixed results because each aggregate function such as MAX() is evaluated independently over all rows that share a particular value of Email. So the MAX() of the Amount column between the values 10 and 27 is 27, even though that row has the lower transaction log id.
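The mixing effect, and the join-back fix, can be reproduced with the two sample rows from the question. This is a minimal sketch that assumes SQLite in place of SQL Server, so the [db].[dbo] prefix is dropped.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE transactionlog ("
    "TransactionLogId INTEGER PRIMARY KEY, Email TEXT, "
    "ResultCode INTEGER, Amount INTEGER)"
)
# The two sample rows from the question.
conn.executemany(
    "INSERT INTO transactionlog VALUES (?, ?, ?, ?)",
    [(1, "test#email.com", 100, 27), (2, "test#email.com", 50, 10)],
)

# Independent MAX() per column: values come from different rows.
mixed = conn.execute("""
    SELECT Email, MAX(ResultCode), MAX(Amount), MAX(TransactionLogId)
    FROM transactionlog
    GROUP BY Email
""").fetchall()

# Find the latest id per email first, then join back for the full row.
latest = conn.execute("""
    WITH MaxTransactionByEmail AS (
        SELECT Email, MAX(TransactionLogId) AS LatestTransactionLogId
        FROM transactionlog
        GROUP BY Email
    )
    SELECT T.Email, T.ResultCode, T.Amount, T.TransactionLogId
    FROM transactionlog AS T
    INNER JOIN MaxTransactionByEmail AS M
        ON T.TransactionLogId = M.LatestTransactionLogId
""").fetchall()

print(mixed)
print(latest)
```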
Another solution is using a ROW_NUMBER() window function to get a row-ranking by each Email, then just picking the first row:
;WITH TransactionsRanking AS
(
    SELECT
        T.*,
        MostRecentTransactionLogRanking = ROW_NUMBER() OVER (
            PARTITION BY
                T.Email -- Start a different ranking for each different value of Email
            ORDER BY
                T.TransactionLogId DESC) -- Order the rows by the TransactionLogID descending
    FROM
        [db].[dbo].[transactionlog] AS T
)
SELECT
    T.*
FROM
    TransactionsRanking AS T
WHERE
    T.MostRecentTransactionLogRanking = 1

how can I calculate the sum of my top n records in crystal report?

I'm using Report tab -> Group Sort Expert -> Top N to get the top n records, but the sum shown in the report footer covers all records.
I want only the sum of the values of the top n records.
In the image below I have selected the top 3 records, but it gives the sum of all records.
The group sort expert (and the record sort expert too) is applied to your final result after the total summary has been calculated. It cannot filter out rows, in the same way that an ORDER BY clause in SQL cannot affect the result of a SELECT's COUNT (that is a job for the WHERE clause). As a result, your summary will always be computed over all rows of your detail section and, of course, over all your group sums.
If you have a specific way in mind to exclude particular rows so that the appropriate sum appears, then you can use the Select Expert of Crystal Reports to remove rows.
Alternatively (and I believe this is the best way), I would make all the necessary calculations in the SQL command and send only the top 3 group sums to the report (then you can get what you want with a simple total summary of these 3 records).
Something like this:
CREATE TABLE #TEMP
(
    DEP_NAME varchar(50),
    MINVAL int,
    RMAVAL int,
    NETVAL int
)
INSERT INTO #TEMP
SELECT TOP 3
    T.DEP_NAME, T.MINVAL, T.RMAVAL, T.NETVAL
FROM
    (SELECT DEP_NAME AS DEP_NAME, SUM(MINVAL) AS MINVAL, SUM(RMAVAL) AS RMAVAL, SUM(NETVAL) AS NETVAL
     FROM YOURTABLE
     GROUP BY DEP_NAME) AS T
ORDER BY MINVAL DESC
SELECT * FROM #TEMP
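The effect of restricting to the top 3 group sums before totalling can be sketched as follows. This assumes SQLite instead of SQL Server (so LIMIT 3 replaces TOP 3 and no temp table is used), only the MINVAL column is modelled, and the department rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE YOURTABLE (DEP_NAME TEXT, MINVAL INTEGER)")
# Hypothetical detail rows for four departments.
conn.executemany(
    "INSERT INTO YOURTABLE VALUES (?, ?)",
    [("A", 10), ("A", 20), ("B", 5), ("C", 40), ("D", 15), ("D", 1)],
)

# Group sums, keeping only the three largest groups.
top3_sql = """
    SELECT DEP_NAME, SUM(MINVAL) AS TOTAL_MINVAL
    FROM YOURTABLE
    GROUP BY DEP_NAME
    ORDER BY TOTAL_MINVAL DESC
    LIMIT 3
"""
top3 = conn.execute(top3_sql).fetchall()

# Total over the top 3 groups only, versus the total over every row.
top3_total = conn.execute(
    f"SELECT SUM(TOTAL_MINVAL) FROM ({top3_sql}) AS T"
).fetchone()[0]
grand_total = conn.execute("SELECT SUM(MINVAL) FROM YOURTABLE").fetchone()[0]

print(top3, top3_total, grand_total)
```

The grand total corresponds to what the report footer shows when the Top N sort is applied after summarising; the smaller figure is what the report would show if only the three pre-filtered rows were sent to it.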

SQL Server Sum multiple rows into one - no temp table

I would like to see a most concise way to do what is outlined in this SO question: Sum values from multiple rows into one row
that is, combine multiple rows while summing a column.
But how to then delete the duplicates. In other words I have data like this:
Person Value
--------------
1 10
1 20
2 15
And I want to sum the values for any duplicates (on the Person col) into a single row and get rid of the other duplicates on the Person value. So my output would be:
Person Value
-------------
1 30
2 15
And I would like to do this without using a temp table. I think that I'll need to use OVER PARTITION BY but just not sure. Just trying to challenge myself in not doing it the temp table way. Working with SQL Server 2008 R2
Simply put, give me a concise stmt getting from my input to my output in the same table. So if my table name is People if I do a select * from People on it before the operation that I am asking in this question I get the first set above and then when I do a select * from People after the operation, I get the second set of data above.
Not sure why not using Temp table but here's one way to avoid it (tho imho this is an overkill):
UPDATE MyTable SET VALUE = (SELECT SUM(Value) FROM MyTable MT WHERE MT.Person = MyTable.Person);
WITH DUP_TABLE AS
(
    SELECT ROW_NUMBER() OVER (PARTITION BY Person ORDER BY Person) AS ROW_NO
    FROM MyTable
)
DELETE FROM DUP_TABLE WHERE ROW_NO > 1;
First query updates every duplicate person to the summary value. Second query removes duplicate persons.
Demo: http://sqlfiddle.com/#!3/db7aa/11
All you're asking for is a simple SUM() aggregate function and a GROUP BY
SELECT Person, SUM(Value)
FROM myTable
GROUP BY Person
The SUM() by itself would sum up the values in a column, but when you add a secondary column and GROUP BY it, SQL will show distinct values from the secondary column and perform the aggregate function by those distinct categories.
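A quick way to check the GROUP BY behaviour against the sample data from the question is the sketch below. It assumes SQLite rather than SQL Server 2008 R2 and only reads the table; it does not rewrite it in place.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE People (Person INTEGER, Value INTEGER)")
# Sample rows from the question.
conn.executemany("INSERT INTO People VALUES (?, ?)", [(1, 10), (1, 20), (2, 15)])

# One output row per distinct Person, with Value summed inside each group.
totals = conn.execute("""
    SELECT Person, SUM(Value) AS Value
    FROM People
    GROUP BY Person
    ORDER BY Person
""").fetchall()
print(totals)
```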