How to group and sum a result of a query sums - sql

The title is slightly confusing but I am not sure how to phrase the question. Here are some more details:
I have written a query that gives me results such as these:
This is the correct result and I do not want to change this query. What I want to accomplish is to only have two results with the same name, keeping the earliest date and grouping the rest.
I have tried:
select name, SUM(sum) as sum, edate
FROM
(--MY QUERY RESULT--)
group by name, date having edate > '15-Apr-02';
I have also tried using case based on if the record edate is null but I want this to always return two records per name (1 with the earliest edate and group all the others to the next earliest edate)
The result I am looking for with the super simple query above:
Any help/ideas would be fantastic!

You can try something like this:
WITH CTETab (NAME, EDATE)
AS
(
--SELECT The earliest EDATE for each NAME
SELECT
NAME
, MIN(EDATE) AS EDATE
FROM
#Tab
GROUP BY NAME
)
SELECT
T.NAME, [COUNT], T.EDATE
FROM
#Tab AS T
INNER JOIN
CTETab AS CT
ON T.NAME = CT.NAME
AND T.EDATE = CT.EDATE
UNION ALL
SELECT
T.NAME, SUM([COUNT]), MAX(T.EDATE)
FROM
#Tab AS T
INNER JOIN
CTETab AS CT
ON T.NAME = CT.NAME
AND (T.EDATE <> CT.EDATE OR T.EDATE IS NULL)
GROUP BY T.NAME
ORDER BY T.NAME, T.EDATE
It works in SQL Server 2012 (and in 2014 and 2016).
You can replace the WITH clause with a subquery and repeat it in each SELECT statement.
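To see the shape of the result, here is a minimal sketch of the same idea run against SQLite from Python. The table name tab, the column cnt, and the sample rows are made up for illustration, since the asker's actual query result is not shown.

```python
import sqlite3

# Hypothetical sample rows (name, cnt, edate); not the asker's real data.
ROWS = [
    ("A", 1, "2015-01-01"),
    ("A", 2, "2015-02-01"),
    ("A", 3, "2015-03-01"),
    ("A", 4, None),
    ("B", 5, "2015-01-15"),
    ("B", 6, "2015-04-01"),
]

QUERY = """
WITH first_row AS (
    -- earliest edate per name (MIN ignores NULLs)
    SELECT name, MIN(edate) AS edate FROM tab GROUP BY name
)
SELECT t.name AS name, t.cnt AS cnt, t.edate AS edate
FROM tab AS t
JOIN first_row AS f ON t.name = f.name AND t.edate = f.edate
UNION ALL
-- everything that is not the earliest row, collapsed to one row per name
SELECT t.name, SUM(t.cnt), MAX(t.edate)
FROM tab AS t
JOIN first_row AS f ON t.name = f.name
WHERE t.edate <> f.edate OR t.edate IS NULL
GROUP BY t.name
ORDER BY name, edate
"""


def earliest_plus_rest(rows):
    """Return the earliest row per name plus one summed row for the rest."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE tab (name TEXT, cnt INTEGER, edate TEXT)")
    conn.executemany("INSERT INTO tab VALUES (?, ?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


result = earliest_plus_rest(ROWS)
```

Each name comes back with two rows: its earliest row untouched, and a second row carrying the sum of all the others together with their latest edate.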

Related

SQL Query to exclude duplicates

I am having problems writing some code for my workplace to eliminate duplicate records that appear in a query.
The current query outputs:
ID
Name
RelationID
RelationName
RelationDescription
Year
ModifiedDate
ModifiedBy
The problem I am having is that the ModifiedDate(datetime) column sometimes has multiple modifications on the same day, therefore you get a duplicate record displayed when you execute the query.
I have tried SELECT DISTINCT, GROUP BY, and WHERE clauses that filter on year and so on. I have also tried ModifiedDate = convert(varchar(10), ModifiedDate, 102) to split the date and time of ModifiedDate into separate columns (I cannot filter by this, as some modifications were made on the same day at similar times), and different ways of filtering the RelationID column so that it only displays one record, but none of it has worked.
I am wondering if anyone could please help me to filter the column RelationID to only display the latest modified? I have trawled the Internet for days but I just can't get it to work.
My query currently looks like this:
SELECT DISTINCT
ID, Name, RelationID, RelationName, RelationDescription, Year, ModifiedDate, ModifiedBy
FROM table1, table2
WHERE Year = YEAR(GETDATE()) AND ModifiedDate IS NOT NULL
OUTPUT:
123, Dave, 321, Sarah, 2018, 2015-12-01 09:47:36.347
123, Dave, 321, Sarah, 2018, 2015-12-01 09:47:36.347
Table 1 and Table 2 are inner joined by RelationID.
ModifiedDate and ModifiedBy are on Table 1.
Thank you for your patience - please let me know if you need more info.
You can use a CTE with ROW_NUMBER() and keep only the row with the most recent modification time per RelationID.
See the query below:
;with CTE
AS
(SELECT ID, Name, RelationID, RelationName, RelationDescription, [Year], ModifiedDate, ModifiedBy
,ROW_NUMBER() OVER (Partition by RelationID ORDER BY ModifiedDate DESC) RN
FROM Table1
INNER JOIN Table2 ON Table1.RelationID = Table2.RelationID
)
Select * from CTE
where RN = 1
Try this, using a CTE:
WITH TEMP AS
(
SELECT RELATIONID,MAX(MODIFIEDDATE) MDATE FROM TABLENAME
GROUP BY RELATIONID
)
SELECT A.* FROM TEMP T
INNER JOIN TABLENAME A
ON A.RELATIONID=T.RELATIONID AND A.MODIFIEDDATE=T.MDATE
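One thing to keep in mind with the MAX-and-join-back approach: if two rows share the latest ModifiedDate for a RelationID, both come back. A small sketch with SQLite from Python, using invented rows and a cut-down version of the table (only three columns), shows this:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tablename (RelationID INTEGER, ModifiedBy TEXT, ModifiedDate TEXT)"
)
# Hypothetical rows; RelationID 654 has two rows with the same latest timestamp.
conn.executemany(
    "INSERT INTO tablename VALUES (?, ?, ?)",
    [
        (321, "alice", "2015-11-30 08:00:00"),
        (321, "bob", "2015-12-01 09:47:36"),
        (654, "carol", "2015-12-02 10:00:00"),
        (654, "dave", "2015-12-02 10:00:00"),
    ],
)

# Join every row back to the latest ModifiedDate of its RelationID.
latest = conn.execute(
    """
    SELECT a.RelationID, a.ModifiedBy, a.ModifiedDate
    FROM tablename AS a
    JOIN (
        SELECT RelationID, MAX(ModifiedDate) AS mdate
        FROM tablename
        GROUP BY RelationID
    ) AS t ON a.RelationID = t.RelationID AND a.ModifiedDate = t.mdate
    ORDER BY a.RelationID, a.ModifiedBy
    """
).fetchall()
conn.close()
```

RelationID 321 yields one row, while 654 yields two because of the tie. If exactly one row per RelationID is required, a ROW_NUMBER() filter or a TOP 1 with a tie-breaking ORDER BY column is the safer choice.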
The query below displays only the latest modification per RelationID. OUTER APPLY behaves like a LEFT OUTER JOIN against a correlated subquery that is evaluated once per outer row, so the subquery can use TOP 1 with an ORDER BY to keep only the most recently modified record.
SELECT
[ID]
,[Name]
,[RelationID]
,[RelationName]
,[RelationDescription]
,[Year]
,[ModifiedDate]
,[ModifiedBy]
FROM
[table2]
OUTER APPLY (
SELECT
TOP 1
[ModifiedDate]
,[ModifiedBy]
FROM
[table1]
WHERE
[table2].[RelationID] = [table1].[RelationID]
ORDER BY
[ModifiedDate] DESC
) [table1]
You could try this kind of construct:
SELECT ID, Name, RelationID, RelationName, RelationDescription, Year, ModifiedDate, ModifiedBy
FROM Table1
INNER JOIN Table2 ON Table1.RelationID = Table2.RelationID
WHERE ModifiedDate = (
SELECT MAX(ModifiedDate) FROM Table1 AS TableX
WHERE Table1.ID = TableX.ID -- and other columns as necessary
)

Use column defined in a subquery

Sorry if the title is not clear, I'm a beginner and I didn't know exactly how to formulate it...
I have this query working with Oracle :
SELECT
( SELECT COUNT(*)
FROM CATEGORY
) AS NBCATEGORIES,
( SELECT ROUND(AVG(FINANCIALOPERATIONBYPERSON),2)
FROM
(
SELECT SUM(AMOUNT) AS FINANCIALOPERATIONBYPERSON
FROM FINANCIALOPERATION
WHERE PERSONID IS NOT NULL
GROUP BY PERSONID
)
) AS AVERAGELOADAMOUNTBYPERSON
FROM DUAL
I'm looking for the equivalent for Sql Server...
The goal is to have multiple queries in a single query.
So I removed the "FROM DUAL" but I get an error on "FINANCIALOPERATIONBYPERSON" (Invalid column name), certainly because it's defined in the subquery...
How can I modify the query for SQL-Server ?
SQL Server requires an alias for every subquery in the FROM clause (a derived table). So, you can rewrite this as:
SELECT (SELECT COUNT(*)
FROM CATEGORY
) AS NBCATEGORIES,
(SELECT ROUND(AVG(FINANCIALOPERATIONBYPERSON),2)
FROM (SELECT SUM(AMOUNT) AS FINANCIALOPERATIONBYPERSON
FROM FINANCIALOPERATION
WHERE PERSONID IS NOT NULL
GROUP BY PERSONID
) t
) AS AVERAGELOADAMOUNTBYPERSON;
In both databases, though, I would be inclined to write this as:
SELECT c.NBCATEGORIES, ROUND(fo.AVERAGELOADAMOUNTBYPERSON, 2) AS AVERAGELOADAMOUNTBYPERSON
FROM (SELECT COUNT(*) as NBCATEGORIES
FROM CATEGORY c
) c CROSS JOIN
(SELECT SUM(AMOUNT) / COUNT(DISTINCT PERSONID) AS AVERAGELOADAMOUNTBYPERSON
FROM FINANCIALOPERATION fo
WHERE PERSONID IS NOT NULL
) fo;
One note for both these forms: SQL Server does integer arithmetic on integers. So, if AMOUNT is an integer, then you should convert it to an appropriate floating or fixed point numeric type.
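The integer-arithmetic point is easy to check. The sketch below uses SQLite from Python as a stand-in (it also truncates integer division, like SQL Server); the amounts are invented, and multiplying by 1.0 is just one way to force a non-integer division.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE financialoperation (personid INTEGER, amount INTEGER)")
# Hypothetical data: total 47 spread over 3 distinct persons.
conn.executemany(
    "INSERT INTO financialoperation VALUES (?, ?)",
    [(1, 10), (1, 5), (2, 20), (3, 12)],
)

# Integer SUM divided by integer COUNT: the fractional part is dropped.
(truncated,) = conn.execute(
    "SELECT SUM(amount) / COUNT(DISTINCT personid) "
    "FROM financialoperation WHERE personid IS NOT NULL"
).fetchone()

# Converting one operand to a floating-point value keeps the fraction.
(exact,) = conn.execute(
    "SELECT SUM(amount) * 1.0 / COUNT(DISTINCT personid) "
    "FROM financialoperation WHERE personid IS NOT NULL"
).fetchone()
conn.close()
```

With these rows the first query gives 15 and the second roughly 15.67. In SQL Server you would more likely write CAST(AMOUNT AS DECIMAL(18, 2)) or similar rather than the 1.0 trick.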
You need to add a table alias for the subquery.
SELECT
( SELECT COUNT(*)
FROM CATEGORY
) AS NBCATEGORIES,
( SELECT ROUND(AVG(RESULTS.FINANCIALOPERATIONBYPERSON),2)
FROM
(
SELECT SUM(AMOUNT) AS FINANCIALOPERATIONBYPERSON
FROM FINANCIALOPERATION
WHERE PERSONID IS NOT NULL
GROUP BY PERSONID
) RESULTS
) AS AVERAGELOADAMOUNTBYPERSON

SQL Query select optimization

I'm using MS SQL Server 2005.
I have a query that needs to filter by date.
Let's say I have a table containing phone numbers and dates.
I need to provide a count of the phone numbers within a time frame (begin date and end date).
A phone number shouldn't be included in the count if it also appears before the start of that time frame.
I'm doing something like this:
select (phoneNumber) from someTbl
where phoneNumber not in (select phoneNumber from someTbl where date<@startDate)
This doesn't look efficient at all to me (and it takes too long to perform, with some side effects that should probably be presented in a different question).
I have about 300K rows in someTbl that should be checked.
After this check I need to check one more thing.
I have a past database that contains another 30K phone numbers.
So I'm adding
and phoneNumber not in (select pastPhoneNumber from somePastTbl)
and that really is the nail in the coffin, or the straw that breaks the camel's back, or whatever phrase you use to describe a fatal state.
So I'm looking for a better way to perform these two actions.
UPDATE
I have chosen to go with Alexander's solution and ended up with this kind of query:
SELECT t.number
FROM tbl t
WHERE t.Date > @startDate
--this is a filter for different customers
AND t.userId in (
SELECT UserId
FROM Customer INNER JOIN UserToCustomer ON Customer.customerId = UserToCustomer.CustomerId
Where customerName = @customer
)
--this is the filter for past number
AND NOT EXISTS (
SELECT 1
FROM pastTbl t2
WHERE t2.Numbers = t.number
)
-- this is the filter for checking if the number appeared in the table before startdate
AND NOT EXISTS (
SELECT *
FROM tbl t3
WHERE t3.Date<@startDate and t.number=t3.number
)
Thanks Gilad
Since it's a NOT IN, just switch the less-than to a greater-than.
select phoneNumber from someTbl where date > @startDate
Next, to filter out somePastTbl:
select s1.phoneNumber from someTbl s1
LEFT JOIN somePastTbl s2 on s1.phoneNumber = s2.phonenumber
where s1.date > @startDate and s2.phonenumber IS NULL
UPDATE
As per the comment:
For numbers dated less than a month before the start date:
SELECT COUNT(s1.phoneNumber) FROM someTbl s1
LEFT JOIN somePastTbl s2 on s1.phoneNumber = s2.phonenumber
where DATEADD(MONTH,-1,@startDate) < s1.date AND s1.date < @startDate and s2.phonenumber IS NULL
One more option
SELECT t.phoneNumber
FROM SomeTbl t
WHERE t.date > @startDate
AND NOT EXISTS (
SELECT 1
FROM SomePastTbl t2
WHERE t2.phoneNumber = t.phoneNumber
)
Create one simple index:
CREATE NONCLUSTERED INDEX IX_SomeTbl_date_phoneNumber
ON SomeTbl
(
date ASC,
phoneNumber ASC
)
Then:
SELECT phoneNumber FROM SomeTbl WHERE date > @startDate
EXCEPT
SELECT phoneNumber FROM SomePastTbl;
You want phone numbers whose minimum start date is greater than your start date. This suggests aggregation at the phone number level before doing the count (or creating the list).
Here is one way, with the condition in the having clause:
select COUNT(*)
from (select t.phonenumber
from someTble t left outer join
somePastTble pt
on t.phonenumber = pt.phonenumber
where pt.phonenumber is null
group by t.phonenumber
having MIN(t.date) >= @startdate
) t
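As a rough check of the aggregate-then-filter idea, here is a sketch in Python with SQLite. The rows and the start date are invented, and the column pastPhoneNumber follows the name used in the question rather than the one in the answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE someTbl (phoneNumber TEXT, date TEXT);
    CREATE TABLE somePastTbl (pastPhoneNumber TEXT);
    INSERT INTO someTbl VALUES
        ('111', '2012-01-05'),  -- first seen before the start date
        ('111', '2012-03-10'),
        ('222', '2012-03-02'),  -- new number, two rows
        ('222', '2012-03-09'),
        ('333', '2012-03-04'),  -- new here, but listed in the past table
        ('444', '2012-03-20');  -- new number, one row
    INSERT INTO somePastTbl VALUES ('333');
    """
)

start_date = "2012-03-01"  # hypothetical value for the start-date parameter

# One row per phone number; keep it only if its earliest date is on or
# after the start date and it has no match in the past table.
new_numbers = [
    row[0]
    for row in conn.execute(
        """
        SELECT t.phoneNumber
        FROM someTbl AS t
        LEFT JOIN somePastTbl AS pt ON pt.pastPhoneNumber = t.phoneNumber
        WHERE pt.pastPhoneNumber IS NULL
        GROUP BY t.phoneNumber
        HAVING MIN(t.date) >= ?
        ORDER BY t.phoneNumber
        """,
        (start_date,),
    )
]
conn.close()
```

Number 111 drops out because its earliest date precedes the start date, 333 drops out because of the past table, and 222 is counted once even though it has two rows.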
You can also write this using window functions (SQL 2005 or greater). Here is a version using min():
select COUNT(distinct t.phonenumber)
from (select t.*, MIN(date) over (partition by phonenumber) as mindate
from someTble t
) t left outer join
somePastTble pt
on t.phonenumber = pt.phonenumber
where pt.phonenumber is null and mindate >= @startdate

Column is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause

I'm trying to select the latest date and group by name and keep other columns.
For example:
name status date
-----------------------
a l 13/19/04
a n 13/09/05
a dd 13/18/03
b l 13/01/01
b dd 13/01/02
b n 13/01/03
and I want the result like:
name status date
-----------------
a n 13/09/05
b n 13/01/03
Here's my code
SELECT
Name,
MAX(DATE) as Date,
Status
FROM
[ST].[dbo].[PS_RC_STATUS_TBL]
GROUP BY
Name
I know that I should use MAX(Status), because there are many possible values in each group and nothing in the query makes it clear which value to choose for Status. Is there any way to use an inner join instead?
It's not clear to me that you want the max or min status. Rather, it seems to me you want the name and status as of a certain date. That is, you want the rows with the latest date for each name. So ask for that:
select * from PS_RC_STATUS_TBL as T
where exists (
select 1 from PS_RC_STATUS_TBL
where name = T.name
group by name
having max(date) = T.date
)
Another way to think about it is
select T.*
from PS_RC_STATUS_TBL as T
join (
select name, max(date) as date
from PS_RC_STATUS_TBL
group by name
) as D
on T.name = D.name
and T.date = D.date
SQL Server needs to know what to do with the rows that you are not grouping on (it has multiple rows to show on 1 line - so how?). If you have aggregated on them (MIN, MAX, AVG, etc) then you are telling it what to do with these rows. If not it will not know what to do - and will give you an error like the one you are getting.
From what you are saying though - it sounds like you do not want to group by the status. It sounds like you are not interested in that column at all. Let me know If that assumption is wrong.
SELECT
Name,
MAX(Date) AS 'Date'
FROM
PS_RC_STATUS_TBL
GROUP BY
Name
If you really do want the status, but don't want to group on it - try this:
SELECT
MyTable1.Name,
MyTable2.Status,
MyTable1.Date
FROM
(SELECT Name, MAX(Date) AS 'Date' FROM PS_RC_STATUS_TBL GROUP BY Name) MyTable1
INNER JOIN
(SELECT Name, Date, Status FROM PS_RC_STATUS_TBL) MyTable2
ON MyTable1.Name = MyTable2.Name
AND MyTable1.Date = MyTable2.Date
That gives the exact results you've asked for - so does the method below using a CTE.
OR
WITH cte AS (
SELECT Name, MAX(Date) AS Date
FROM PS_RC_STATUS_TBL
GROUP BY Name)
SELECT cte.Name,
tbl.Status,
cte.Date
FROM cte INNER JOIN
PS_RC_STATUS_TBL tbl ON cte.Name = tbl.Name
AND cte.Date = tbl.Date
SQLFiddle example.
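It just means that you need to put all non-aggregated columns in the GROUP BY clause, so in this case you need to add Status as well. Note that grouping by Status returns one row per name and status combination rather than one row per name: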
Select Name ,
MAX(DATE) as Date ,
Status
FROM [ST].[dbo].[PS_RC_STATUS_TBL] PS
Group by Name, Status
This is a common problem with text fields in SQL aggregation scenarios. Using either MAX(Status) or MIN(Status) in your field list is a solution, usually MAX(Status) because of the lexical ordering:
"" < " " < "a"
In cases where you really need a more detailed ordering:
Join to a StatusOrder relation (*Status, OrderSequence) in your main query;
select Max(OrderSequence) in your aggregated query; and
Join back to your StatusOrder relation on OrderSequence to select the correct Status value for display.
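The three steps above can be sketched as follows, using SQLite from Python. The table names ps and status_order, the column order_sequence, and the ranking (dd highest) are assumptions made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE ps (name TEXT, status TEXT);
    CREATE TABLE status_order (status TEXT PRIMARY KEY, order_sequence INTEGER);
    -- Hypothetical business ranking: 'dd' outranks 'n', which outranks 'l'.
    INSERT INTO status_order VALUES ('l', 1), ('n', 2), ('dd', 3);
    INSERT INTO ps VALUES
        ('a', 'l'), ('a', 'n'), ('a', 'dd'),
        ('b', 'l'), ('b', 'dd');
    """
)

# Plain MAX(status) follows lexical order, which is not the ranking we want.
lexical = conn.execute(
    "SELECT name, MAX(status) FROM ps GROUP BY name ORDER BY name"
).fetchall()

# Steps 1-3: join to the ordering table, take MAX(order_sequence) per name,
# then join back to recover the status text for that sequence number.
ranked = conn.execute(
    """
    SELECT agg.name, so.status
    FROM (
        SELECT p.name, MAX(o.order_sequence) AS max_seq
        FROM ps AS p
        JOIN status_order AS o ON o.status = p.status
        GROUP BY p.name
    ) AS agg
    JOIN status_order AS so ON so.order_sequence = agg.max_seq
    ORDER BY agg.name
    """
).fetchall()
conn.close()
```

Here the lexical MAX picks n for a and l for b, while the ordering table picks dd for both, which is the point of adding an explicit sequence.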
Every column you select that is not inside an aggregate function must be listed in the GROUP BY clause.
SELECT
gf.app_id,
ma.name as name,
count(ma.name) as count
FROM [dbo].[geo_fen_notification_table] as gf
inner join dbo.mobile_applications as ma on gf.app_id = ma.id
GROUP BY app_id,name
Here I'm selecting app_id and name, so I need to list them in the GROUP BY clause; otherwise the query throws an error.

Get list of records with multiple entries on the same date

I need to return a list of record id's from a table that may/may not have multiple entries with that record id on the same date. The same date criteria is key - if a record has three entries on 09/10/2008, then I need all three returned. If the record only has one entry on 09/12/2008, then I don't need it.
SELECT id, datefield, count(*) FROM tablename GROUP BY datefield
HAVING count(*) > 1
Since you mentioned needing all three records, I am assuming you want the data as well. If you just need the id's, you can just use the group by query. To return the data, just join to that as a subquery
select * from table
inner join (
select id, date
from table
group by id, date
having count(*) > 1) grouped
on table.id = grouped.id and table.date = grouped.date
The top post (Leigh Caldwell) will not return duplicate records and needs to be down modded. It will only identify the duplicate keys. Furthermore, it will not work if your database doesn't allow the select list to include columns that are missing from the GROUP BY (many do not).
If your date field includes a time stamp then you'll need to truncate that out using one of the methods documented above (I prefer: dateadd(dd,0, datediff(dd,0,@DateTime)) ).
I think Scott Nichols gave the correct answer and here's a script to prove it:
declare @duplicates table (
id int,
datestamp datetime,
ipsum varchar(200))
insert into @duplicates (id,datestamp,ipsum) values (1,'9/12/2008','ipsum primis in faucibus')
insert into @duplicates (id,datestamp,ipsum) values (1,'9/12/2008','Vivamus consectetuer. ')
insert into @duplicates (id,datestamp,ipsum) values (2,'9/12/2008','condimentum posuere, quam.')
insert into @duplicates (id,datestamp,ipsum) values (2,'9/13/2008','Donec eu sapien vel dui')
insert into @duplicates (id,datestamp,ipsum) values (3,'9/12/2008','In velit nulla, faucibus sed')
select a.* from @duplicates a
inner join (select id,datestamp, count(1) as number
from @duplicates
group by id,datestamp
having count(1) > 1) b
on (a.id = b.id and a.datestamp = b.datestamp)
SELECT RecordID
FROM aTable
WHERE SameDate IN
(SELECT SameDate
FROM aTable
GROUP BY SameDate
HAVING COUNT(SameDate) > 1)
If I understand your question correctly you could do something similar to:
select
a.recordID
from
tablewithrecords as a
left join (
select
recordID,
count(recordID) as recordcount
from
tablewithrecords
where
recorddate='9/10/08'
group by
recordID
) as b on a.recordID=b.recordID
where
b.recordcount>1
http://www.sql-server-performance.com/articles/dba/delete_duplicates_p1.aspx will get you going. Also, http://en.allexperts.com/q/MS-SQL-1450/2008/8/SQL-query-fetch-duplicate.htm
I found these by searching Google for 'sql duplicate data'. You'll see this isn't an unusual problem.
SELECT * FROM the_table WHERE ROW(record_id,date) IN
( SELECT record_id, date FROM the_table
GROUP BY record_id, date HAVING COUNT(*) > 1 )
For matching on just the date part of a Datetime:
select * from Table
where id in (
select alias1.id from Table alias1, Table alias2
where alias1.id != alias2.id
and datediff(day, alias1.date, alias2.date) = 0
)
I think. This is based on my assumption that you need them on the same day month and year, but not the same time of day, so I did not use a Group by clause. From the other posts it looks like I could have more cleverly used a Having clause. Can you use a having or group by on a datediff expression?
I'm not sure I understood your question, but maybe you want something like this:
SELECT id, COUNT(*) AS same_date FROM foo GROUP BY id, date HAVING same_date = 3;
This is written from memory and not tested in any way. Read up on the GROUP BY and HAVING clauses. If this is not what you meant, please ignore this answer.
Note that there's some extra processing necessary if you're using a SQL DateTime field. If you've got that extra time data in there, then you can't just use that column as-is. You've got to normalize the DateTime to a single value for all records contained within the day.
In SQL Server here's a little trick to do that:
SELECT CAST(FLOOR(CAST(CURRENT_TIMESTAMP AS float)) AS DATETIME)
You cast the DateTime into a float, which represents the Date as the integer portion and the Time as the fraction of a day that's passed. Chop off that decimal portion, then cast that back to a DateTime, and you've got midnight at the beginning of that day.
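To make the arithmetic concrete, here is a small Python sketch of the same idea outside the database. It assumes the SQL Server convention that a datetime cast to float counts days since 1900-01-01; the function names are invented for the example.

```python
import math
from datetime import datetime, timedelta

# Day zero for SQL Server's datetime-to-float conversion.
EPOCH = datetime(1900, 1, 1)


def to_day_float(dt):
    """Days since the epoch: integer part is the date, fraction is the time."""
    return (dt - EPOCH) / timedelta(days=1)


def truncate_to_midnight(dt):
    """Drop the fractional day and convert back, like CAST(FLOOR(...))."""
    whole_days = math.floor(to_day_float(dt))
    return EPOCH + timedelta(days=whole_days)


stamp = datetime(2008, 9, 12, 14, 30, 15)
midnight = truncate_to_midnight(stamp)
```

Two timestamps on the same calendar day truncate to the same value, which is what makes the result usable in a GROUP BY.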
Without knowing the exact structure of your tables or what type of database you're using it's hard to answer. However if you're using MS SQL and if you have a true date/time field that has different times that the records were entered on the same date then something like this should work:
select record_id,
convert(varchar, date_created, 101) as log_date,
count(distinct date_created) as num_of_entries
from record_log_table
group by convert(varchar, date_created, 101), record_id
having count(distinct date_created) > 1
Hope this helps.
SELECT id, count(*) AS cnt
INTO #tmp
FROM tablename
WHERE date = @date
GROUP BY id
HAVING count(*) > 1
SELECT *
FROM tablename t
WHERE EXISTS (SELECT 1 FROM #tmp WHERE id = t.id)
DROP TABLE #tmp
TrickyNixon writes;
The top post (Leigh Caldwell) will not return duplicate records and needs to be down modded.
Yet the question doesn't ask about duplicate records. It asks about duplicate record-ids on the same date...
GROUP-BY,HAVING seems good to me. I've used it in production before.
Something to watch out for:
SELECT ... FROM ... GROUP BY ... HAVING count(*)>1
Will, on most database systems, run in O(NlogN) time. It's a good solution. (Select is O(N), sort is O(NlogN), group by is O(N), having is O(N) -- Worst case. Best case, date is indexed and the sort operation is more efficient.)
Select ... from ..., .... where a.date = b.date
Granted only idiots do a Cartesian join. But you're looking at O(N^2) time. For some databases, this also creates a "temporary" table. It's all insignificant when your table has only 10 rows. But it's gonna hurt when that table grows!
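A tiny sketch of the difference, in Python with SQLite and a made-up records table: both approaches find the same ids, but the unrestricted join of the table with itself has N*N candidate pairs to consider.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE records (id INTEGER, date TEXT, note TEXT)")
# Hypothetical rows: only id 1 has several entries on the same date.
conn.executemany(
    "INSERT INTO records VALUES (?, ?, ?)",
    [
        (1, "2008-09-10", "a"),
        (1, "2008-09-10", "b"),
        (1, "2008-09-10", "c"),
        (2, "2008-09-12", "d"),
        (3, "2008-09-10", "e"),
        (3, "2008-09-11", "f"),
    ],
)

# GROUP BY / HAVING: one pass over the grouped rows.
grouped = [
    r[0]
    for r in conn.execute(
        "SELECT id FROM records GROUP BY id, date HAVING COUNT(*) > 1 ORDER BY id"
    )
]

# Self-join: pair each row with a different row of the same id and date.
self_joined = [
    r[0]
    for r in conn.execute(
        """
        SELECT DISTINCT a.id
        FROM records AS a
        JOIN records AS b
          ON a.id = b.id AND a.date = b.date AND a.rowid < b.rowid
        ORDER BY a.id
        """
    )
]

# Size of the full Cartesian product the join conceptually starts from.
(pair_count,) = conn.execute("SELECT COUNT(*) FROM records a, records b").fetchone()
conn.close()
```

With six rows the product already has 36 pairs. A real optimizer will usually avoid materializing all of them when a usable index exists, so treat the N*N figure as the worst case rather than a measurement.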
Ob link: http://en.wikipedia.org/wiki/Join_(SQL)
select id from tbl where date in
(select date from tbl group by date having count(*)>1)
GROUP BY with HAVING is your friend:
select id, date, count(*) from records group by id, date having count(*) > 1