How can I remove duplicate rows from a table while keeping the sum of a column's values?


Suppose there is a table that has several identical rows. I can copy the distinct rows with:
SELECT DISTINCT * INTO DESTINATIONTABLE FROM SOURCETABLE
But suppose the table also has a column named value and, for simplicity, its value is 1 for one particular item, and that item's row has another 9 duplicates. The sum of the value column for that item is then 10. I want to remove the 9 duplicates (or copy the distinct rows as shown above) so that the item's value shows 10, not 1. How can this be achieved?
item| value
----+----------------
A | 1
A | 1
A | 1
A | 1
B | 1
B | 1
I want the result to look like this:
item| value
----+----------------
A | 4
B | 2
Thanks in advance

You can use SUM with GROUP BY:
SELECT item, SUM(value) AS value
FROM T
GROUP BY item
SQL Fiddle: http://sqlfiddle.com/#!18/fac26/1
Results:
| item | value |
|------|-------|
| A | 4 |
| B | 2 |
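A minimal sketch that runs the query above against the question's sample data. It assumes an in-memory SQLite database through Python's built-in sqlite3 module instead of the SQL Server instance behind the fiddle; the table name T is taken from the answer.

```python
import sqlite3


def summed_items():
    # In-memory SQLite database standing in for the real server (assumption).
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE T (item TEXT, value INTEGER)")
    # Sample data from the question: four A rows and two B rows, each with value 1.
    conn.executemany(
        "INSERT INTO T (item, value) VALUES (?, ?)",
        [("A", 1)] * 4 + [("B", 1)] * 2,
    )
    # One output row per item; SUM adds up the value column within each group.
    rows = conn.execute(
        "SELECT item, SUM(value) AS value FROM T GROUP BY item ORDER BY item"
    ).fetchall()
    conn.close()
    return rows


print(summed_items())  # [('A', 4), ('B', 2)]
```

The ORDER BY is only there to make the output order predictable; GROUP BY on its own does not promise any order.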

Broadly speaking, you can just use SUM with a GROUP BY clause.
Something like:
SELECT column1, SUM(column2) AS Total
FROM SOURCETABLE
GROUP BY column1
Here it is in action: Sum + Group By
Since your table probably isn't just two columns of data, here is a slightly more complex example showing how to do this to a larger table: SQL Fiddle
Note that I've selected my columns individually so that I can access the necessary data, rather than using
SELECT *
This achieves the result without selecting the data into another table.
EDIT 2:
Further to your comments, it sounds like you want to alter the actual data in your table rather than just query it. There may be a more elegant way to do this, but a simple way is to use the above query to populate a temporary table, delete the contents of the existing table, and then move all the data back. To do this in my existing example:
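-- Run everything as one transaction so a failure part-way does not leave MyTable empty.
BEGIN TRANSACTION;
WITH MyQuery AS (
SELECT name, type, colour, price, SUM(number) AS number
FROM MyTable
GROUP BY name, type, colour, price
)
SELECT * INTO MyTable2 FROM MyQuery;
DELETE FROM MyTable;
INSERT INTO MyTable (name, type, colour, price, number)
SELECT name, type, colour, price, number FROM MyTable2;
DROP TABLE MyTable2;
COMMIT;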
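The same collapse-in-place idea can be sketched in SQLite through Python's sqlite3 module. This is an adaptation, not the answer's SQL Server script: SQLite has no SELECT ... INTO, so CREATE TEMP TABLE ... AS stands in for it, and the sample rows are hypothetical.

```python
import sqlite3


def collapse_in_place(conn):
    """Replace MyTable's rows with one summed row per (name, type, colour, price)."""
    # `with conn` commits on success and rolls back the pending changes on error.
    with conn:
        # SQLite has no SELECT ... INTO, so CREATE TEMP TABLE ... AS stands in for it.
        conn.execute(
            """CREATE TEMP TABLE MyTable2 AS
               SELECT name, type, colour, price, SUM(number) AS number
               FROM MyTable
               GROUP BY name, type, colour, price"""
        )
        conn.execute("DELETE FROM MyTable")
        conn.execute(
            """INSERT INTO MyTable (name, type, colour, price, number)
               SELECT name, type, colour, price, number FROM MyTable2"""
        )
        conn.execute("DROP TABLE MyTable2")


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE MyTable (name TEXT, type TEXT, colour TEXT, price REAL, number INTEGER)"
)
# Hypothetical sample rows: the two apple rows differ only in number.
conn.executemany(
    "INSERT INTO MyTable VALUES (?, ?, ?, ?, ?)",
    [
        ("apple", "fruit", "red", 0.5, 3),
        ("apple", "fruit", "red", 0.5, 2),
        ("pear", "fruit", "green", 0.4, 1),
    ],
)
collapse_in_place(conn)
rows = conn.execute("SELECT * FROM MyTable ORDER BY name").fetchall()
print(rows)  # [('apple', 'fruit', 'red', 0.5, 5), ('pear', 'fruit', 'green', 0.4, 1)]
```

Listing the columns explicitly in the INSERT, rather than relying on SELECT *, keeps the copy-back correct even if the two tables' column orders ever drift apart.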
WARNING: If you're going to try this, please use a development environment first (i.e. one you don't mind breaking!) to make sure it does exactly what you want. It's imperative that your initial query captures ALL the data you want to keep.
Here is the SQL Fiddle of this example in action: SQL Fiddle

Related

Is the ordering of a GROUP BY with a MAX aggregate well defined?

Let's assume I run the following in SQLite:
CREATE TABLE my_table
(
id INTEGER PRIMARY KEY,
NAME VARCHAR(20),
date DATE,
num INTEGER,
important VARCHAR(20)
);
INSERT INTO my_table (NAME, date, num, important)
VALUES ('A', '2000-01-01', 10, 'Important 1');
INSERT INTO my_table (NAME, date, num, important)
VALUES ('A', '2000-02-01', 20, 'Important 2');
INSERT INTO my_table (NAME, date, num, important)
VALUES ('A', '1999-12-01', 30, 'Important 3');
The table looks like this:
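+----+------+------------+-----+-------------+
| id | NAME | date       | num | important   |
+----+------+------------+-----+-------------+
| 1  | A    | 2000-01-01 | 10  | Important 1 |
| 2  | A    | 2000-02-01 | 20  | Important 2 |
| 3  | A    | 1999-12-01 | 30  | Important 3 |
+----+------+------------+-----+-------------+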
If I execute:
SELECT id
FROM my_table
GROUP BY NAME;
the results are:
+----+
| id |
+----+
| 1 |
+----+
If I execute:
SELECT id, MAX(date)
FROM my_table
GROUP BY NAME;
The results are:
+----+------------+
| id | max(date) |
+----+------------+
| 2 | 2000-02-01 |
+----+------------+
And if I execute:
SELECT id,
MAX(date),
MAX(num)
FROM my_table
GROUP BY NAME;
The results are:
+----+------------+----------+
| id | max(date) | max(num) |
+----+------------+----------+
| 3 | 2000-02-01 | 30 |
+----+------------+----------+
My question is, is this well defined? Specifically, am I guaranteed to always get id = 2 when doing the second query (with the single Max(date) aggregate), or is this just a side effect of how SQLite is likely ordering the table to grab the Max before grouping?
I ask this because I specifically want id = 2. I will then execute another query that selects the important field for that row. (For my actual problem, the first query would return multiple ids, and I'd select the important fields for all those rows at once.)
Additionally, this is all happening in an iOS Core Data query, so I'm not able to do more complicated subqueries. If I knew that the ordering of a GROUP BY is defined by an aggregate then I'd feel pretty confident my queries wouldn't break (until Apple moves away from SQLite for Core Data).
Thanks!

From the SQLite manual:
2.5. Bare columns in an aggregate query
The usual case is that all column names in an aggregate query are either arguments to aggregate functions or else appear in the GROUP BY clause. A result column which contains a column name that is not within an aggregate function and that does not appear in the GROUP BY clause (if one exists) is called a "bare" column. Example:
SELECT a, b, sum(c) FROM tab1 GROUP BY a;
In the query above, the "a" column is part of the GROUP BY clause and so each row of the output contains one of the distinct values for "a". The "c" column is contained within the sum() aggregate function and so that output column is the sum of all "c" values in rows that have the same value for "a". But what is the result of the bare column "b"? The answer is that the "b" result will be the value for "b" in one of the input rows that form the aggregate. The problem is that you usually do not know which input row is used to compute "b", and so in many cases the value for "b" is undefined.
Special processing occurs when the aggregate function is either min() or max(). Example:
SELECT a, b, max(c) FROM tab1 GROUP BY a;
When the min() or max() aggregate functions are used in an aggregate query, all bare columns in the result set take values from the input row which also contains the minimum or maximum. So in the query above, the value of the "b" column in the output will be the value of the "b" column in the input row that has the largest "c" value. There is still an ambiguity if two or more of the input rows have the same minimum or maximum value or if the query contains more than one min() and/or max() aggregate function. Only the built-in min() and max() functions work this way.
If bare columns appear in an aggregate query that lacks a GROUP BY clause, and the number of input rows is zero, then the values of the bare columns are arbitrary. For example, in this query:
SELECT count(*), b FROM tab1;
If the tab1 table contains no rows (if count(*) evaluates to 0) then the bare column "b" will have an arbitrary and meaningless value.
Most other SQL database engines disallow bare columns. If you include a bare column in a query, other database engines will usually raise an error. The ability to include bare columns in a query is an SQLite-specific extension.
https://www.sqlite.org/lang_select.html

"am I guaranteed to always get id = 2 when doing the second query (with the single Max(date) aggregate), or is this just a side effect of how SQLite is likely ordering the table to grab the Max before grouping?"
Yes, the result is guaranteed, because this behavior is documented in the "Bare columns in an aggregate query" section of the SQLite manual.
The value of id that you get comes from the row that contains the maximum date. This holds only while the query has a single min() or max() aggregate and no two rows tie for the maximum; your third query, with two max() calls, is ambiguous again.
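A sketch that reproduces the second query with Python's built-in sqlite3 module, using the question's own table and rows. It also reads important as a bare column in the same query, which the documented min()/max() rule allows, so a separate follow-up query may not be needed (whether Core Data can express this is not something this sketch addresses).

```python
import sqlite3


def latest_row():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """CREATE TABLE my_table (
               id INTEGER PRIMARY KEY, NAME VARCHAR(20), date DATE,
               num INTEGER, important VARCHAR(20))"""
    )
    conn.executemany(
        "INSERT INTO my_table (NAME, date, num, important) VALUES (?, ?, ?, ?)",
        [
            ("A", "2000-01-01", 10, "Important 1"),
            ("A", "2000-02-01", 20, "Important 2"),
            ("A", "1999-12-01", 30, "Important 3"),
        ],
    )
    # With exactly one max() aggregate, SQLite takes the bare columns (id, important)
    # from the row holding the maximum date. Ties, or a second min()/max(), make the
    # choice ambiguous again.
    row = conn.execute(
        "SELECT id, important, MAX(date) FROM my_table GROUP BY NAME"
    ).fetchone()
    conn.close()
    return row


print(latest_row())  # (2, 'Important 2', '2000-02-01')
```

The ISO-formatted date strings compare correctly as text, which is what makes MAX(date) pick 2000-02-01 here.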

Create a table without knowing its columns in SQL

How can I create a table without knowing in advance how many and what columns it exactly holds?
The idea is that I have a table DATA that has three columns: ID, NAME, and VALUE.
What I need is a way to get multiple values depending on the value of NAME. I can't do it with a simple WHERE or JOIN, because I'll need other values (with other NAME values) later on in my query.
Because of the way this table is constructed I want to PIVOT it in order to transform every distinct value of NAME into a column so it will be easier to get to it in my later search.
What I want now is to somehow save this to a temp table / variable so I can use it later on to join with the result of another query...
For example:
Tables:
CREATE TABLE MainTab
(
id int,
nameMain varchar(max),
notes varchar(max)
);
CREATE TABLE SecondTab
(
id int,
id_mainTab int,
nameSecond varchar(max),
notes varchar(max)
);
CREATE TABLE DATA
(
id int,
id_second int,
name varchar(max),
value varchar(max)
);
Now some example data from the table DATA:
| id | id_second | name | value |
|-------------------------------------------------------|
| 1 | 5550 | number | 111115550 |
| 2 | 6154 | address | 1, First Avenue |
| 3 | 1784 | supervisor | John Smith |
| 4 | 3467 | function | Marketing |
| 5 | 9999 | start_date | 01/01/2000 |
| ... | ... | ... | ... |
Now imagine that 'name' has A LOT of different values, and in one query I'll need to get a lot of different values depending on the value of 'name'...
That's why I pivot it, so that number, address, supervisor, function, start_date, ... become columns.
I do this dynamically because of the number of possible columns: it would take me a while to write all of them in an 'IN' clause, and I don't want to have to remember to add each one manually every time a new 'name' value gets added.
Therefore I followed http://sqlhints.com/2014/03/18/dynamic-pivot-in-sql-server/
The thing is, now I want the result of my execute(#query) to be stored in a temp table or variable, so I can use it later on to join with MainTab.
It would be nice if I could use #cols (which holds the values of DATA.name) but I can't seem to figure out a way to do this.
ADDITIONALLY:
If I use the non-dynamic way (writing all the values manually after 'IN'), I still need to create a column called status. In this column (so far it's NULL everywhere, because that value doesn't exist in my unpivoted table) I want to have 'open' or 'closed', depending on the date. Let's say I have start_date and end_date:
CASE
WHEN end_date < GETDATE() THEN 'closed'
ELSE 'open'
END AS status
Where can I put this statement? Let's say my main query looks like this:
SELECT *
FROM (SELECT id_second, name, value FROM DATA) AS src
PIVOT (MAX(value) FOR name IN ([number], [address], [supervisor], [function], [start_date], [end_date], [status])) AS pivotTab
JOIN SecondTab ON SecondTab.id = pivotTab.id_second
JOIN MainTab ON MainTab.id = SecondTab.id_mainTab
WHERE pivotTab.status = 'closed';

Well, as far as I understand, you have some SELECT statement and just need to "dump" its result into a temporary table. In that case you can use the SELECT ... INTO syntax:
select .....
into #temp_table
from ....
This creates a temporary table whose columns match the SELECT list and populates it with the rows the SELECT returns.
See MSDN for reference.
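A sketch of the whole flow (build the column list from the data, pivot, and store the result in a temp table) in SQLite via Python's sqlite3 module. This is an adaptation under stated assumptions: SQLite has no PIVOT operator, so conditional aggregation (one MAX(CASE ...) per name) replaces it; CREATE TEMP TABLE ... AS replaces SELECT ... INTO; and the rows are hypothetical, with ISO dates so text comparison works. The status CASE is shown in the query that reads the temp table. In SQL Server itself, a #temp table created inside EXEC(...) is dropped when that dynamic batch ends, so the later join has to run in the same dynamic batch.

```python
import sqlite3


def pivot_into_temp(conn):
    """Pivot DATA into a temp table pivotTab with one column per distinct name."""
    # The distinct names decide the columns, so nothing is hard-coded in an IN list.
    names = [r[0] for r in conn.execute("SELECT DISTINCT name FROM DATA ORDER BY name")]

    def literal(s):  # SQL string literal with embedded quotes escaped
        return "'" + s.replace("'", "''") + "'"

    def ident(s):  # SQL identifier with embedded double quotes escaped
        return '"' + s.replace('"', '""') + '"'

    # Conditional aggregation stands in for PIVOT: one MAX(CASE ...) per name.
    cols = ", ".join(
        "MAX(CASE WHEN name = {} THEN value END) AS {}".format(literal(n), ident(n))
        for n in names
    )
    conn.execute("DROP TABLE IF EXISTS temp.pivotTab")
    # CREATE TEMP TABLE ... AS SELECT plays the role of SELECT ... INTO #temp_table.
    conn.execute(
        "CREATE TEMP TABLE pivotTab AS SELECT id_second, {} FROM DATA GROUP BY id_second".format(cols)
    )
    return names


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE DATA (id INTEGER, id_second INTEGER, name TEXT, value TEXT)")
# Hypothetical rows with ISO-formatted dates so that text comparison orders them correctly.
conn.executemany(
    "INSERT INTO DATA VALUES (?, ?, ?, ?)",
    [
        (1, 10, "supervisor", "John Smith"),
        (2, 10, "end_date", "1999-12-31"),
        (3, 20, "supervisor", "Jane Doe"),
        (4, 20, "end_date", "2999-12-31"),
    ],
)
names = pivot_into_temp(conn)
# The derived status column is computed when reading the temp table.
rows = conn.execute(
    """SELECT id_second, supervisor,
              CASE WHEN end_date < date('now') THEN 'closed' ELSE 'open' END AS status
       FROM pivotTab ORDER BY id_second"""
).fetchall()
print(names)  # ['end_date', 'supervisor']
print(rows)   # [(10, 'John Smith', 'closed'), (20, 'Jane Doe', 'open')]
```

Because names are spliced into the SQL text, they are escaped as literals and identifiers rather than trusted as-is.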

Oracle - SQL - Count multiple fields

Using Oracle 10g.
Say, for example, I have a table with three fields, and I'd like one query that selects, for each column, the count of rows where that column is not null. Field names:
----------------------------------
| strTest1 | strTest2 | strTest3 |
----------------------------------
I know how to get the count of each one individually:
select count(*) from tablename where strTest1 is not null
but I'd like to know if it's possible to do this within one query for all 3 fields.
Thanks

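It sounds like you want the following. COUNT(column) counts only the non-null values in that column, so no WHERE clause is needed: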
SELECT COUNT(STRTEST1), COUNT(STRTEST2), COUNT(STRTEST3) FROM YOUR_TABLE
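A quick check of the COUNT(column) behavior, sketched in SQLite through Python's sqlite3 module rather than Oracle 10g; the rows are hypothetical. COUNT(*) is included for contrast.

```python
import sqlite3


def non_null_counts():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE tablename (strTest1 TEXT, strTest2 TEXT, strTest3 TEXT)")
    # Hypothetical rows; None is stored as SQL NULL.
    conn.executemany(
        "INSERT INTO tablename VALUES (?, ?, ?)",
        [("a", None, None), ("b", "x", None), (None, "y", None)],
    )
    # COUNT(column) skips NULLs, while COUNT(*) counts every row.
    row = conn.execute(
        "SELECT COUNT(strTest1), COUNT(strTest2), COUNT(strTest3), COUNT(*) FROM tablename"
    ).fetchone()
    conn.close()
    return row


print(non_null_counts())  # (2, 2, 0, 3)
```

The same rule (aggregates ignore NULL arguments) is standard SQL, so it is not specific to SQLite.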

I DISTINCTly hate MySQL (help building a query)

This is straightforward, I believe:
I have a table with 30,000 rows. When I SELECT DISTINCT `location` FROM myTable it returns 21,000 rows, about what I'd expect, but it only returns that one column.
What I want is to move those to a new table, but with the whole row for each match.
My best guess is something like SELECT * from (SELECT DISTINCT `location` FROM myTable) or something like that, but it gives me a vague syntax error.
Is there a good way to grab the rest of each DISTINCT row and move it to a new table all in one go?

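SELECT * FROM myTable GROUP BY `location`
Or, to copy the rows into another table:
CREATE TABLE foo AS SELECT * FROM myTable GROUP BY `location`
Both statements rely on MySQL's non-standard GROUP BY handling, so they are rejected when the ONLY_FULL_GROUP_BY SQL mode is enabled (the default since MySQL 5.7.5).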

DISTINCT applies to the entire row returned. So you can simply use
SELECT DISTINCT * FROM myTable GROUP BY `location`
Using DISTINCT on a single column doesn't make a lot of sense. Let's say I have the following simple set:
-id- -location-
1 store
2 store
3 home
If there were some sort of query that returned all columns but was distinct only on location, which row would be returned, 1 or 2? Should it just pick one at random? Because of this, DISTINCT applies to all columns in the returned result set.
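A small sketch of that point using the three-row set above, run in SQLite through Python's sqlite3 module (an assumption; the question is about MySQL, but DISTINCT behaves the same way here).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE myTable (id INTEGER, location TEXT)")
conn.executemany(
    "INSERT INTO myTable VALUES (?, ?)",
    [(1, "store"), (2, "store"), (3, "home")],
)

# DISTINCT compares every selected column, so including id keeps all three rows.
with_id = conn.execute("SELECT DISTINCT id, location FROM myTable ORDER BY id").fetchall()
# With location alone, the two "store" rows collapse into one.
location_only = conn.execute(
    "SELECT DISTINCT location FROM myTable ORDER BY location"
).fetchall()
print(with_id)        # [(1, 'store'), (2, 'store'), (3, 'home')]
print(location_only)  # [('home',), ('store',)]
```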

Well, first you need to decide what you really want returned.
The problem is that, presumably, for some of the location values in your table there are different values in the other columns even when the location value is the same:
Location OtherCol StillOtherCol
Place1 1 Fred
Place1 89 Fred
Place1 1 Joe
In that case, which of the three rows do you want to select? When you talk about a DISTINCT Location, you're condensing those three rows of different data into a single row, so there's no meaning to moving the original rows from the original table into a new table: those original rows no longer exist in your DISTINCT result set. (If all the other columns are always the same for a given Location, your problem is easier: just SELECT DISTINCT * FROM YourTable.)
If you don't care which values come from the other columns, you can use a (bad, IMHO) MySQL extension to SQL and do:
SELECT * FROM YourTable GROUP BY Location
which will give a result set with one row per location and values for the other columns derived from the original data in an undefined fashion.

Multiple rows with identical values in all columns don't make sense. Then again, the question might be about correcting exactly that situation.
Consider this table, where id is the primary key:
kram=# select * from foba;
id | no | name
----+----+---------------
2 | 1 | a
3 | 1 | b
4 | 2 | c
5 | 2 | a,b,c,d,e,f,g
You can extract one sample row for each value of no (standing in for location) by grouping on that column and selecting, for example, the row with the minimum primary key:
SELECT * FROM foba WHERE id IN (SELECT MIN(id) FROM foba GROUP BY no);
id | no | name
----+----+------
2 | 1 | a
4 | 2 | c
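Since the asker wants the whole rows in a new table, here is a sketch that wraps the same MIN(id) query in CREATE TABLE ... AS, run in SQLite via Python's sqlite3 module. The target name foba_sample is hypothetical, and the column no is quoted only to be safe, since NO is an SQL keyword.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE foba (id INTEGER PRIMARY KEY, "no" INTEGER, name TEXT)')
conn.executemany(
    "INSERT INTO foba VALUES (?, ?, ?)",
    [(2, 1, "a"), (3, 1, "b"), (4, 2, "c"), (5, 2, "a,b,c,d,e,f,g")],
)
# Keep the lowest-id row per "no" and copy those whole rows into a new table.
conn.execute(
    """CREATE TABLE foba_sample AS
       SELECT * FROM foba
       WHERE id IN (SELECT MIN(id) FROM foba GROUP BY "no")"""
)
rows = conn.execute("SELECT * FROM foba_sample ORDER BY id").fetchall()
print(rows)  # [(2, 1, 'a'), (4, 2, 'c')]
```

Unlike the bare GROUP BY approach, this picks a well-defined row per group, and it is portable to engines that reject non-aggregated columns.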

How to select 10 rows below the result returned by the SQL query?

Here is the SQL table:
KEY | NAME | VALUE
---------------------
13b | Jeffrey | 23.5
F48 | Jonas | 18.2
2G8 | Debby | 21.1
Now, if I type:
SELECT *
FROM table
WHERE VALUE = 23.5
I will get the first row.
What I need to accomplish is to get the first and the next two rows below. Is there a way to do it?
The columns are not sorted, and the WHERE condition doesn't participate in selecting the rows, except for the first one. I just need the two additional rows below the returned one: the ones that were entered after the row returned by the SELECT query.

Without a date column or an auto-increment column, you can't reliably determine the order the records were entered.
The physical order in which rows are stored in the table is not something you can rely on.

You need to define an order to the results to do this. There is no guaranteed order to the data otherwise.
If by "the next 2 rows after" you mean "the next 2 records that were inserted into the table AFTER that particular row", you will need to use an auto incrementing field or a "date create" timestamp field to do this.

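If each row has an ID column that is unique and auto-incrementing, you could do something like:
SELECT * FROM table WHERE id >= (SELECT id FROM table WHERE value = 23.5) ORDER BY id LIMIT 3
(LIMIT is MySQL, PostgreSQL and SQLite syntax; in SQL Server, use SELECT TOP 3 instead.)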
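A sketch of the "start at the matching id" idea in SQLite via Python's sqlite3 module. It assumes a hypothetical auto-incrementing id column that records insertion order, which the question's table does not have, and renames the table to t because table is a reserved word.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical id column recording insertion order; "key" is quoted because KEY is a keyword.
conn.execute(
    'CREATE TABLE t (id INTEGER PRIMARY KEY AUTOINCREMENT, "key" TEXT, name TEXT, value REAL)'
)
conn.executemany(
    'INSERT INTO t ("key", name, value) VALUES (?, ?, ?)',
    [("13b", "Jeffrey", 23.5), ("F48", "Jonas", 18.2), ("2G8", "Debby", 21.1)],
)
# The matching row plus the two inserted after it: start at its id, sort by id, take three.
rows = conn.execute(
    """SELECT name FROM t
       WHERE id >= (SELECT id FROM t WHERE value = 23.5)
       ORDER BY id LIMIT 3"""
).fetchall()
print(rows)  # [('Jeffrey',), ('Jonas',), ('Debby',)]
```

The ORDER BY id is what turns "the rows after it" into something well defined; without it, LIMIT could return any three qualifying rows.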

If I understand correctly, you're looking for something like:
SELECT * FROM table WHERE value <> 23.5

You can obviously write a program to do that, but I am assuming you want a query. What about using a UNION? You would also have to create a new column called value_id, or something along those lines, that is incremented sequentially (probably using a sequence). The idea is that value_id is incremented on every insert, and using it you can write a WHERE clause that returns the remaining two rows you want.
For example:
(SELECT * FROM table WHERE value = 23.5)
UNION
(SELECT * FROM table WHERE value_id > (SELECT value_id FROM table WHERE value = 23.5) ORDER BY value_id LIMIT 2);
LIMIT 2 because the first query already returns the matching row.

You need an order if you want to be able to think in terms of "before" and "after".
Assuming you have one, you can use ROW_NUMBER() (see http://msdn.microsoft.com/en-us/library/ms186734.aspx) and do something like:
WITH MyTable AS
(SELECT ROW_NUMBER() OVER (ORDER BY [key]) AS n, [key], name, value
FROM [table])
SELECT [key], name, value
FROM MyTable
WHERE n BETWEEN (SELECT n FROM MyTable WHERE value = 23.5)
AND (SELECT n FROM MyTable WHERE value = 23.5) + 2
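A sketch of the ROW_NUMBER() approach in SQLite through Python's sqlite3 module (window functions need SQLite 3.25 or later, which recent Python builds bundle). The table name t is hypothetical, and ordering by "key" follows the answer's placeholder. Note that the result follows key order, not insertion order, which is exactly why you must choose the ordering deliberately.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE t ("key" TEXT, name TEXT, value REAL)')
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [("13b", "Jeffrey", 23.5), ("F48", "Jonas", 18.2), ("2G8", "Debby", 21.1)],
)
# Number the rows in a chosen order (here by "key"), then keep the matching
# row's number n and the rows numbered n+1 and n+2.
rows = conn.execute(
    """WITH numbered AS (
           SELECT ROW_NUMBER() OVER (ORDER BY "key") AS n, "key", name, value FROM t
       )
       SELECT "key", name, value FROM numbered
       WHERE n BETWEEN (SELECT n FROM numbered WHERE value = 23.5)
                   AND (SELECT n FROM numbered WHERE value = 23.5) + 2
       ORDER BY n"""
).fetchall()
print(rows)  # [('13b', 'Jeffrey', 23.5), ('2G8', 'Debby', 21.1), ('F48', 'Jonas', 18.2)]
```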
