Keep track of already summed tuples in SQL

If we have a table with columns a and b, is there a way to add up the b values while counting each distinct a only once? For example:
a b
1 2
2 3
2 3
Here the result should be 5 (instead of 8).
Something like this pseudo-query:
select sum(b if unique a)
from table
where ...

The following query selects the lowest value of b for each group a
select min(b) min_b
from mytable
group by a
You can then sum those values by selecting the sum from a derived table
select sum(min_b) from (
select min(b) min_b
from mytable
group by a
) t
http://sqlfiddle.com/#!9/d82c5/1
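As a quick check of the derived-table approach, here is a minimal sketch that runs the same query against the sample rows from the question. It assumes an in-memory SQLite database driven from Python's sqlite3 module (the question does not name an RDBMS), and the variable names are only illustrative.

```python
import sqlite3

# In-memory SQLite database holding the three sample rows from the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (a INTEGER, b INTEGER)")
conn.executemany("INSERT INTO mytable VALUES (?, ?)", [(1, 2), (2, 3), (2, 3)])

# Plain SUM counts the duplicated (2, 3) row twice.
naive_total = conn.execute("SELECT SUM(b) FROM mytable").fetchone()[0]

# Collapse each a to a single b first, then sum the collapsed values.
dedup_total = conn.execute(
    """
    SELECT SUM(min_b) FROM (
        SELECT MIN(b) AS min_b
        FROM mytable
        GROUP BY a
    ) AS t
    """
).fetchone()[0]

print(naive_total, dedup_total)  # 8 5
```

Note that MIN(b) decides which b is kept when rows sharing the same a carry different b values; for true duplicates the choice makes no difference.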

You haven't specified your RDBMS, but if you are using a database that supports window functions, such as SQL Server, you can select the unique rows first with a WITH clause and the ROW_NUMBER() function, and then take the SUM of the result.
;WITH C AS(
SELECT a, b,
ROW_NUMBER() OVER (PARTITION BY a ORDER BY a) AS Rn
FROM Table1
)
SELECT SUM(b) FROM C
WHERE Rn = 1
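The same ROW_NUMBER() idea can be tried outside SQL Server. The sketch below is an assumption-laden test harness, not part of the original answer: it uses Python's sqlite3 module with an in-memory database and needs SQLite 3.25 or newer for window functions.

```python
import sqlite3

# Sample rows from the question, loaded into an in-memory SQLite table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Table1 (a INTEGER, b INTEGER)")
conn.executemany("INSERT INTO Table1 VALUES (?, ?)", [(1, 2), (2, 3), (2, 3)])

# Number the rows inside each a-partition and keep only the first one.
row_number_total = conn.execute(
    """
    WITH C AS (
        SELECT a, b,
               ROW_NUMBER() OVER (PARTITION BY a ORDER BY a) AS Rn
        FROM Table1
    )
    SELECT SUM(b) FROM C
    WHERE Rn = 1
    """
).fetchone()[0]

print(row_number_total)  # 5
```

Because the ordering inside each partition is by a itself, the row that receives Rn = 1 is arbitrary. That is harmless when rows with the same a always carry the same b, as in the sample data.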

Related

Update column as Duplicate

I have a table with three columns: A, B, and status.
First, I filter the table to get only the duplicate values of A, using this query:
SELECT A
FROM Table_1
GROUP BY A
HAVING COUNT(A) >1
In the second step, I need to check whether column B also has duplicate values; if it does, I need to set the status to D.
I tried this query:
UPDATE Table_1
SET status = 'D'
WHERE exists
(SELECT B
FROM Table_1
GROUP BY B
HAVING COUNT(B) >1)
but it updated all the rows.
The following does what you need using row_number to identify any group with a duplicate and an updateable CTE to check for any row that's part of a group with a duplicate:
with d as (
select *, row_number() over(partition by a,b order by a,b) dn
from t
)
update d set d.status='D'
where exists (select * from d d2 where d2.a=d.a and d2.b=d.b and d2.dn>1)
You can do this with an updatable CTE, without any further joins, by using a windowed COUNT:
WITH d AS (
SELECT *,
cnt = COUNT(*) OVER (PARTITION BY a, b)
FROM t
)
UPDATE d
SET status = 'D'
WHERE cnt > 1;
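The reason the query in the question touched every row is that its EXISTS subquery is not correlated with the row being updated: it is either true for all rows or for none. The sketch below illustrates this on made-up sample rows. It uses SQLite through Python's sqlite3 module, and since SQLite has no updatable CTEs, the second statement swaps in a correlated subquery rather than the CTE technique shown above. The helper name mark_duplicates and the data are hypothetical.

```python
import sqlite3

def mark_duplicates(update_sql):
    """Build a fresh sample table, run one UPDATE, and return all rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (a INTEGER, b INTEGER, status TEXT)")
    conn.executemany(
        "INSERT INTO t (a, b) VALUES (?, ?)",
        [(1, 10), (1, 10), (1, 20), (2, 30)],  # only (1, 10) is duplicated
    )
    conn.execute(update_sql)
    return conn.execute("SELECT a, b, status FROM t ORDER BY rowid").fetchall()

# Uncorrelated EXISTS: true as soon as any duplicate b exists anywhere,
# so every row in the table is updated.
uncorrelated = mark_duplicates("""
    UPDATE t SET status = 'D'
    WHERE EXISTS (SELECT b FROM t GROUP BY b HAVING COUNT(b) > 1)
""")

# Correlated subquery: counts rows sharing this row's (a, b) pair,
# so only rows that really belong to a duplicate group are updated.
correlated = mark_duplicates("""
    UPDATE t SET status = 'D'
    WHERE (SELECT COUNT(*) FROM t AS t2
           WHERE t2.a = t.a AND t2.b = t.b) > 1
""")

print(uncorrelated)  # all four rows have status 'D'
print(correlated)    # only the two (1, 10) rows have status 'D'
```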

SQL Server query for all columns with group by and having

I'm wondering whether there is a way to query all columns with GROUP BY and HAVING in SQL Server. For example, I have 6 columns, a, b,…,f, and this is what I want to get:
Select *
From table
Group by table.b, table.c
Having max(table.d)=table.d
This works in Sybase. I'm migrating from Sybase to SQL Server and I'm not sure how to do this in the new environment. Thanks.
Why do you want to group by every column when you don't use any aggregate functions in your select? Just use the following code to get all columns of the table:
select * from table
GROUP BY is only used when you have aggregate functions (e.g. max(), avg(), count(), ...) in your select.
HAVING filters on the aggregated values, whereas WHERE filters on the ordinary columns of the table.
You can use the MIN, MAX, AVG, and COUNT functions with the OVER clause to compute aggregated values alongside every row (imitating GROUP BY for each column), and a common table expression (CTE) to filter the results (imitating HAVING):
;With CTE as
(
SELECT
MIN(a) OVER (PARTITION BY a) AS MinCol_a
, MAX(b) OVER (PARTITION BY b) AS MaxCol_b
, AVG(c) OVER (PARTITION BY c) AS AvgCol_c
, COUNT(e) OVER (PARTITION BY d) AS Counte_PerCol_d
FROM Tbl_Test
)
select MinCol_a, MaxCol_b, AvgCol_c, Counte_PerCol_d
from CTE
-- JOIN ...  : here you can join the CTE results with other tables
-- WHERE ... : any filter condition, similar to a HAVING clause
If what you want is to get the rows with maximum d for each combination of b and c then use NOT EXISTS:
select t.* from tablename t
where not exists (
select 1 from tablename
where b = t.b and c = t.c and d > t.d
)
or with rank() window function:
select t.a, t.b, t.c, t.d, t.e, t.f
from (
select *,
rank() over (partition by b, c order by d desc) rn
from tablename
) t
where t.rn = 1
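Both variants can be compared side by side on a small data set. The sketch below is only an illustration under assumptions: the rows are invented, and it runs on SQLite (3.25 or newer for RANK()) via Python's sqlite3 module instead of SQL Server.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tablename (a, b, c, d, e, f)")
rows = [
    ("r1", 1, 1, 5, "x", "y"),
    ("r2", 1, 1, 9, "x", "y"),  # maximum d for (b, c) = (1, 1)
    ("r3", 1, 2, 4, "x", "y"),  # only row for (1, 2)
    ("r4", 2, 2, 7, "x", "y"),  # tied maximum for (2, 2)
    ("r5", 2, 2, 7, "x", "z"),  # tied maximum for (2, 2)
]
conn.executemany("INSERT INTO tablename VALUES (?, ?, ?, ?, ?, ?)", rows)

# Keep a row only if no other row in the same (b, c) group has a larger d.
not_exists_rows = conn.execute("""
    SELECT t.* FROM tablename t
    WHERE NOT EXISTS (
        SELECT 1 FROM tablename
        WHERE b = t.b AND c = t.c AND d > t.d
    )
    ORDER BY t.a
""").fetchall()

# Rank rows inside each (b, c) group by d descending and keep rank 1.
rank_rows = conn.execute("""
    SELECT t.a, t.b, t.c, t.d, t.e, t.f
    FROM (
        SELECT *, RANK() OVER (PARTITION BY b, c ORDER BY d DESC) AS rn
        FROM tablename
    ) t
    WHERE t.rn = 1
    ORDER BY t.a
""").fetchall()

print(not_exists_rows == rank_rows)  # True
```

Both queries return every row that ties for the maximum d (r4 and r5 here). If exactly one row per group is needed, ROW_NUMBER() with a tie-breaking ORDER BY is the usual substitute for RANK().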
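You can get the maximum d for each combination of b and c without using HAVING, although this returns only b, c, and the maximum d rather than all columns. Try the query below: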
Select table.b, table.c, max(table.d)
From table
Group by table.b, table.c

SQL counting values and selecting Column Name as result

I am new to SQL and a little embarrassed to ask this.
I have a table that contains two columns, A and B:
A B
0 2
1 3
3 1
I want a query that will return
Category | Sum
A 4
B 6
What is the best way to write this query?
select 'A', sum(A) from table
union
select 'B', sum(B) from table
Or this...
SELECT SUM(A) A
, SUM(B) B
FROM #MyTable
If UNPIVOT is supported by the SQL product you are using:
SELECT Category, SUM(Value) AS Sum
FROM atable
UNPIVOT (Value FOR Category IN (A, B)) u
GROUP BY Category
;
In particular, the above syntax works in Oracle and SQL Server.
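Where UNPIVOT is not available, the UNION-style answer gives the same Category/Sum shape. The sketch below checks it against the sample rows from the question. It assumes SQLite via Python's sqlite3 module (SQLite has no UNPIVOT), uses UNION ALL since the two labels can never collide, and names the second column Total purely for illustration.

```python
import sqlite3

# The three sample rows from the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE atable (A INTEGER, B INTEGER)")
conn.executemany("INSERT INTO atable VALUES (?, ?)", [(0, 2), (1, 3), (3, 1)])

# One SELECT per source column, each tagged with the column's name.
category_sums = conn.execute("""
    SELECT 'A' AS Category, SUM(A) AS Total FROM atable
    UNION ALL
    SELECT 'B', SUM(B) FROM atable
    ORDER BY Category
""").fetchall()

print(category_sums)  # [('A', 4), ('B', 6)]
```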

How to select duplicate records without a primary key in SQL Server

If I run this query:
SELECT
a,
b,
c,
...
FROM [DMS].[dbo].[CreditDebitAdjustment]
I get 24197 records.
If I run this query:
SELECT DISTINCT
a,
b,
c,
...
FROM [DMS].[dbo].[CreditDebitAdjustment]
I get 24176 records.
How do I go about selecting only the rows that are identical?
SELECT
a,
b,
c
FROM [DMS].[dbo].[CreditDebitAdjustment]
group by a,b,c
having count(*) > 1
If you want to delete those duplicates, use
;WITH CTE AS
(
SELECT
a, b, c,
RowNum = ROW_NUMBER() OVER(PARTITION BY a,b,c ORDER BY ...(define how to order those rows)..)
FROM
[DMS].[dbo].[CreditDebitAdjustment]
)
DELETE FROM CTE
WHERE RowNum > 1
This "partitions" (groups) all your data by the tuple (a,b,c) and gives each row a number - starting at 1 for each new tuple.
So any row with a RowNum larger than 1 is a duplicate, and it gets deleted.
But really: any serious data table ought to have a proper primary key!
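To see both steps on a small scale, here is a sketch with invented rows in a hypothetical adjustments table. It runs on SQLite through Python's sqlite3 module, and because SQLite cannot delete through a CTE, the delete step swaps in SQLite's implicit rowid instead of the ROW_NUMBER() technique above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE adjustments (a, b, c)")  # no primary key declared
conn.executemany(
    "INSERT INTO adjustments VALUES (?, ?, ?)",
    [(1, "x", 10), (1, "x", 10), (2, "y", 20),
     (3, "z", 30), (3, "z", 30), (3, "z", 30)],
)

# Step 1: list each tuple that occurs more than once, with its copy count.
duplicated = conn.execute("""
    SELECT a, b, c, COUNT(*) AS copies
    FROM adjustments
    GROUP BY a, b, c
    HAVING COUNT(*) > 1
    ORDER BY a
""").fetchall()
print(duplicated)  # [(1, 'x', 10, 2), (3, 'z', 30, 3)]

# Step 2 (SQLite-specific): keep the lowest rowid of each tuple, delete the rest.
conn.execute("""
    DELETE FROM adjustments
    WHERE rowid NOT IN (
        SELECT MIN(rowid) FROM adjustments GROUP BY a, b, c
    )
""")
remaining = conn.execute(
    "SELECT a, b, c FROM adjustments ORDER BY a"
).fetchall()
print(remaining)  # [(1, 'x', 10), (2, 'y', 20), (3, 'z', 30)]
```

The gap between the two counts in the question (24197 rows versus 24176 distinct rows) is the number of surplus copies, 21, which is how many rows a delete like this would remove.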

get subset of a table in SQL

I want to get a subset of a table, here's the example:
1 A
2 A
3 B
4 B
5 C
6 D
7 D
8 D
I want to get the unique record, but with the smallest id:
1 A
3 B
5 C
6 D
How can I write the SQL in SQL Server? Thanks!
Use a common-table expression like this:
;WITH DataCTE AS
(
SELECT ID, OtherCol,
ROW_NUMBER() OVER(PARTITION BY OtherCol ORDER BY ID) AS RowNum
FROM dbo.YourTable
)
SELECT *
FROM DataCTE
WHERE RowNum = 1
This "partitions" your data by the second column you have (A, B, C) and orders by the ID (1, 2, 3) - smallest ID first.
Therefore, for each "partition" (i.e. each value of your second column), the entry with RowNum = 1 is the one with the smallest ID for each value of the second column.
select min(id), othercol
from thetable
group by othercol
and maybe with
order by othercol
... at the end, if that's important.
Try this:
SELECT MIN(Id) AS Id, Name
FROM MyTable
GROUP BY Name
select min(id), column2
from table
group by column2
It helps if you provide the table information in the question - I've just guessed at the column names...
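The window-function answer and the MIN/GROUP BY answers can be checked against the sample data from the question. This is a minimal sketch assuming SQLite 3.25 or newer via Python's sqlite3 module; the table and column names follow the guesses used in the answers above.

```python
import sqlite3

# The eight sample rows from the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE YourTable (ID INTEGER, OtherCol TEXT)")
conn.executemany(
    "INSERT INTO YourTable VALUES (?, ?)",
    [(1, "A"), (2, "A"), (3, "B"), (4, "B"),
     (5, "C"), (6, "D"), (7, "D"), (8, "D")],
)

# Window-function variant: number rows per OtherCol by ascending ID.
via_row_number = conn.execute("""
    WITH DataCTE AS (
        SELECT ID, OtherCol,
               ROW_NUMBER() OVER (PARTITION BY OtherCol ORDER BY ID) AS RowNum
        FROM YourTable
    )
    SELECT ID, OtherCol FROM DataCTE
    WHERE RowNum = 1
    ORDER BY ID
""").fetchall()

# Aggregate variant: smallest ID per OtherCol.
via_group_by = conn.execute("""
    SELECT MIN(ID) AS min_id, OtherCol
    FROM YourTable
    GROUP BY OtherCol
    ORDER BY min_id
""").fetchall()

print(via_row_number)  # [(1, 'A'), (3, 'B'), (5, 'C'), (6, 'D')]
```

With only two columns the two variants agree. The ROW_NUMBER() form becomes the more useful one when the table has further columns that should come from the same row as the smallest ID.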