Grouping in SQL Statement - sql

I have the following SQL statement:
SELECT TOP 30
a.ClassAdID, -- 0
a.AdTitle, -- 1
a.ClassAdCatID, -- 2
b.ClassAdCat, -- 3
a.Img1, -- 4
e.Domain, -- 5
a.AdText, -- 6
a.RegionID, -- 7
a.IsEvent, -- 8
a.IsCoupon, -- 9
b.ParentID, -- 10
a.MemberID, -- 11
a.AdURL, -- 12
a.Location, -- 13
a.GroupID -- 14
FROM ClassAd a
INNER JOIN ClassAdCat b ON b.ClassAdCatID = a.ClassAdCatID
INNER JOIN Member d ON d.MemberID = a.MemberID
INNER JOIN Region e ON e.RegionID = a.RegionID
WHERE DATEDIFF(d, GETDATE(), a.ExpirationDate) >= 0
AND PostType <> 'CPN'
ORDER BY a.CreateDate DESC
I want to show only one row for each GroupID. How can I adjust the statement to achieve this? I am lost with DISTINCT, GROUP BY, etc.
Any help would be appreciated.
Many thanks,
Paul

You can use the ROW_NUMBER function to partition the result set by GroupID: the counter restarts at 1 for each new GroupID value, and the row numbered 1 is the newest record in its group (ORDER BY a.CreateDate DESC). Then filter to keep only the rows with ROW_NUMBER = 1.
SELECT TOP 30 *
FROM
(
SELECT
a.ClassAdID, -- 0
a.AdTitle, -- 1
a.ClassAdCatID, -- 2
b.ClassAdCat, -- 3
a.Img1, -- 4
e.Domain, -- 5
a.AdText, -- 6
a.RegionID, -- 7
a.IsEvent, -- 8
a.IsCoupon, -- 9
b.ParentID, -- 10
a.MemberID, -- 11
a.AdURL, -- 12
a.Location, -- 13
a.GroupID, -- 14
a.CreateDate, -- 15 (needed for the outer ORDER BY)
ROW_NUMBER() OVER(PARTITION BY a.GroupID ORDER BY a.CreateDate DESC) AS PseudoId
FROM ClassAd a
INNER JOIN ClassAdCat b ON b.ClassAdCatID = a.ClassAdCatID
INNER JOIN Member d ON d.MemberID = a.MemberID
INNER JOIN Region e ON e.RegionID = a.RegionID
WHERE DATEDIFF(d, GETDATE(), a.ExpirationDate) >= 0
AND PostType <> 'CPN'
) q
WHERE q.PseudoId = 1
ORDER BY q.CreateDate DESC; -- without this, TOP 30 returns an arbitrary 30 rows
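To try the ROW_NUMBER idea without a SQL Server instance, here is a minimal sketch using Python's built-in sqlite3 module (it needs SQLite 3.25 or newer for window functions). The cut-down ClassAd table, the sample rows and the function name newest_per_group are assumptions made up for illustration, not part of the original schema.

```python
import sqlite3

def newest_per_group(rows):
    """Return the newest (ClassAdID, GroupID, CreateDate) row of each group."""
    con = sqlite3.connect(":memory:")
    # Hypothetical, cut-down version of the ClassAd table.
    con.execute(
        "CREATE TABLE ClassAd (ClassAdID INTEGER, GroupID INTEGER, CreateDate TEXT)"
    )
    con.executemany("INSERT INTO ClassAd VALUES (?, ?, ?)", rows)
    result = con.execute(
        """
        SELECT ClassAdID, GroupID, CreateDate
        FROM (
            SELECT ClassAdID, GroupID, CreateDate,
                   ROW_NUMBER() OVER (
                       PARTITION BY GroupID ORDER BY CreateDate DESC
                   ) AS PseudoId
            FROM ClassAd
        ) q
        WHERE q.PseudoId = 1          -- keep only the newest row per group
        ORDER BY CreateDate DESC
        """
    ).fetchall()
    con.close()
    return result

sample = [
    (1, 10, "2013-01-01"),
    (2, 10, "2013-03-01"),
    (3, 20, "2013-02-01"),
]
print(newest_per_group(sample))
```

With this sample, group 10 contributes only ad 2 (its newest row) and group 20 contributes ad 3.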

GROUP BY goes with an aggregate function, meaning you want to add up the values in the group, or find the biggest or smallest in the group, etc.
DISTINCT removes duplicate rows.
In your query, you may be getting a bunch of not-so-similar rows that all happen to have the same GroupID. If so, you need to decide which one of those rows you really want to see.
Maybe you want the newest one, or the one with the longest name, or something like that.
For grouping, you would pick a column like CreateDate and put something like MAX(CreateDate) in the select list, then group on every other column in the select list. Rows that match each other on those columns (everything except CreateDate) are returned only once, with the largest CreateDate. Hope that makes sense.
Edit:
A very simple example for group ID and create date (you can keep adding more columns as needed: one in the GROUP BY list for every one in the select list):
SELECT groupid, max( createdate )
FROM ClassAd
GROUP BY groupId
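The GROUP BY above only returns the group ID and its newest date. If you also need the other columns of that newest row, one common option is to join the aggregate back to the table. Below is a rough sketch with Python's sqlite3; the reduced ClassAd table, the data and the name newest_row_per_group are hypothetical. Note that if two rows in a group share the same newest date, both come back.

```python
import sqlite3

def newest_row_per_group(rows):
    """Join MAX(CreateDate) per group back to the table to get the full row."""
    con = sqlite3.connect(":memory:")
    # Hypothetical, cut-down version of the ClassAd table.
    con.execute(
        "CREATE TABLE ClassAd (ClassAdID INTEGER, GroupID INTEGER, CreateDate TEXT)"
    )
    con.executemany("INSERT INTO ClassAd VALUES (?, ?, ?)", rows)
    result = con.execute(
        """
        SELECT a.ClassAdID, a.GroupID, a.CreateDate
        FROM ClassAd a
        INNER JOIN (
            SELECT GroupID, MAX(CreateDate) AS MaxDate
            FROM ClassAd
            GROUP BY GroupID
        ) m ON m.GroupID = a.GroupID AND m.MaxDate = a.CreateDate
        ORDER BY a.GroupID
        """
    ).fetchall()
    con.close()
    return result

ads = [
    (1, 10, "2013-01-01"),
    (2, 10, "2013-03-01"),
    (3, 20, "2013-02-01"),
]
print(newest_row_per_group(ads))
```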

If I understand correctly, you want to get one row from each group (such as GroupID).
I used SQL Server 2005 (Northwind):
SELECT TOP 30 Customers.CompanyName, Orders.ShipCity, Orders.Freight
FROM Customers INNER JOIN
Orders ON Customers.CustomerID = Orders.CustomerID
GROUP BY Customers.CompanyName, Orders.ShipCity, Orders.Freight

Related

Creating a loop containing a union of two tables one of which does not change

I have a union of two tables and I count the number of rows in the union. I have to do this multiple times, where one of the two tables will stay the same, but the other will change.
To be more specific, take the mock query below as an example, the first table (above UNION), stays unchanged, but the conditional in the 2nd table will change for each iteration of the loop I would like to have. For example, e.YearsOfEmployment > 1 AND e.YearsOfEmployment <= 2 for the next iteration of the loop, and e.YearsOfEmployment > 2 AND e.YearsOfEmployment <= 3 for the one after.
I wonder if a loop for what I want to do is possible. If it is, any advice on how to construct it would be much appreciated. Crucially, I wonder if it is possible to construct the loop in a way such that the first table doesn't have to be queried for every iteration since it will stay the same.
SELECT COUNT(*)
FROM (
SELECT e.EmployeeID
FROM HumanResources.Employee AS e
JOIN Sales.SalesPerson AS s
ON e.FullName = s.FullName
UNION
SELECT e.EmployeeID
FROM HumanResources.Employee AS e
WHERE e.YearsOfEmployment > 0 AND e.YearsOfEmployment <= 1
) AS temp
EDIT: Here is a description of what I am trying to accomplish: I want to look for the number of unique items that fulfill (at least) one of the two criteria. The first criterion doesn't change (as in the first table), and the second criterion is whether an item scores within a range. The entire range of scores is 0 - 100, and I would like to find items between every 5-point increment, e.g., between 0 and 5, between 5 and 10, and between 10 and 15... so the outcome I want to achieve is something like below:
# of items fulfilling criterion 1 or score between 0 and 5
# of items fulfilling criterion 1 or score between 5 and 10
# of items fulfilling criterion 1 or score between 10 and 15
# of items fulfilling criterion 1 or score between 15 and 20
.
.
.
# of items fulfilling criterion 1 or score between 95 and 100
This query assigns the years to five-year buckets. Run it and see:
SELECT e.EmployeeID, (e.YearsOfEmployment -1) / 5 as yeargroup
FROM HumanResources.Employee AS e
This query counts how many are in each bucket:
SELECT (e.YearsOfEmployment -1) / 5 as yeargroup, COUNT(*)
FROM HumanResources.Employee AS e
GROUP BY (e.YearsOfEmployment -1) / 5
and this query sticks it all together:
SELECT yeargroup, COUNT(*) EmployeeCount
FROM (
-- first get the unique list of employees - they might be in both tables
-- so we need UNION not UNION ALL
SELECT e.employeeid,
(e.YearsOfEmployment -1) / 5 as yeargroup
FROM HumanResources.Employee AS e
JOIN Sales.SalesPerson AS s
ON e.FullName = s.FullName
UNION
SELECT e.employeeid,
(e.YearsOfEmployment -1) / 5 as yeargroup
FROM HumanResources.Employee AS e
) AS temp
GROUP BY yeargroup
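The bucket expression relies on integer division: (YearsOfEmployment - 1) / 5 maps 1..5 to bucket 0, 6..10 to bucket 1, and so on. Here is a small sketch of that with Python's sqlite3 (SQLite also does integer division on integer operands). The one-table schema and the function name bucket_counts are assumptions for illustration only.

```python
import sqlite3

def bucket_counts(years):
    """Count employees per five-year bucket using integer division."""
    con = sqlite3.connect(":memory:")
    # Hypothetical stand-in for HumanResources.Employee.
    con.execute(
        "CREATE TABLE Employee ("
        "EmployeeID INTEGER PRIMARY KEY, YearsOfEmployment INTEGER)"
    )
    con.executemany(
        "INSERT INTO Employee (YearsOfEmployment) VALUES (?)",
        [(y,) for y in years],
    )
    result = con.execute(
        """
        SELECT (YearsOfEmployment - 1) / 5 AS yeargroup, COUNT(*) AS EmployeeCount
        FROM Employee
        GROUP BY yeargroup
        ORDER BY yeargroup
        """
    ).fetchall()
    con.close()
    return result

# Years 1, 3, 5 fall in bucket 0; 6 and 10 in bucket 1; 11 in bucket 2.
print(bucket_counts([1, 3, 5, 6, 10, 11]))
```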
A classic way to "create" a loop in SQL is to use a table of numbers.
It is simply a table with numbers from 1 to some large enough value.
If you need to loop, say, 20 times, select 20 rows from a table of numbers.
In the query below I generate a table of 1000 numbers on the fly and pick 20 rows from it (CTE_Numbers). In production I use a permanent table with 100K rows.
For each row in CTE_Numbers I run your query using CROSS APPLY, which lets you pass the current row's number into the logic of the query.
WITH
e1 AS
(
SELECT N
FROM (VALUES (1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) AS T(N)
) -- 10
,e2 AS
(
SELECT 1 AS N FROM e1 AS a CROSS JOIN e1 AS b
) -- 10*10
,e3 AS
(
SELECT 1 AS N FROM e1 CROSS JOIN e2
) -- 10*100
,CTE_Numbers
AS
(
SELECT TOP(20)
ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS Number
FROM e3
ORDER BY Number
)
SELECT
CTE_Numbers.Number
,CA.cc
FROM
CTE_Numbers
CROSS APPLY
(
SELECT COUNT(*) AS cc
FROM
(
SELECT e.EmployeeID
FROM
HumanResources.Employee AS e
INNER JOIN Sales.SalesPerson AS s ON e.FullName = s.FullName
UNION
SELECT e.EmployeeID
FROM
HumanResources.Employee AS e
WHERE
e.YearsOfEmployment > CTE_Numbers.Number - 1
AND e.YearsOfEmployment <= CTE_Numbers.Number
) AS Temp
) AS CA
ORDER BY Number;
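SQLite has no CROSS APPLY, so the sketch below is a different formulation of the same numbers-table idea rather than a port of the query above: a recursive CTE generates the numbers, and the UNION is replaced by an OR condition (an employee is counted if it is a salesperson or falls in the current range). It assumes EmployeeID is unique and that salespeople can be matched by EmployeeID instead of FullName; the schema and the name counts_per_range are made up for illustration.

```python
import sqlite3

def counts_per_range(employees, sales_ids, max_n):
    """For n = 1..max_n, count employees that are salespeople
    or have n - 1 < YearsOfEmployment <= n."""
    con = sqlite3.connect(":memory:")
    # Hypothetical simplified tables.
    con.execute(
        "CREATE TABLE Employee ("
        "EmployeeID INTEGER PRIMARY KEY, YearsOfEmployment INTEGER)"
    )
    con.execute("CREATE TABLE SalesPerson (EmployeeID INTEGER)")
    con.executemany("INSERT INTO Employee VALUES (?, ?)", employees)
    con.executemany(
        "INSERT INTO SalesPerson VALUES (?)", [(i,) for i in sales_ids]
    )
    result = con.execute(
        """
        WITH RECURSIVE Numbers(Number) AS (
            SELECT 1
            UNION ALL
            SELECT Number + 1 FROM Numbers WHERE Number < ?
        )
        SELECT n.Number, COUNT(e.EmployeeID) AS cc
        FROM Numbers n
        LEFT JOIN Employee e
               ON e.EmployeeID IN (SELECT EmployeeID FROM SalesPerson)
               OR (e.YearsOfEmployment > n.Number - 1
                   AND e.YearsOfEmployment <= n.Number)
        GROUP BY n.Number
        ORDER BY n.Number
        """,
        (max_n,),
    ).fetchall()
    con.close()
    return result

staff = [(1, 1), (2, 2), (3, 2), (4, 3)]  # (EmployeeID, YearsOfEmployment)
print(counts_per_range(staff, sales_ids=[1], max_n=3))
```

Employee 1 is a salesperson, so it is counted in every range; the other employees are counted only in the range that contains their years of employment.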

Why does SQLite return wrong sums when joining two tables?

SQLite returns inflated sums when I execute this query:
SELECT sum(s.Card_10) ,sum(p.Card_10) FROM Sales_Table s , Purchase_Table p
Answer:
sum(s.Card_10) is 4 and sum(p.Card_10) is 40
But if I execute these queries separately, each returns the correct answer:
Select sum(Card_10) from sales_table
Answer:
1
and
Select sum(Card_10) from Purchase_table
Answer:
40
Why does this error happen with this type of join?
In the query
SELECT sum(s.Card_10) ,sum(p.Card_10) FROM Sales_Table s , Purchase_Table p
a cross join of sales_table and purchase_table is performed. Suppose sales_table has 1 row with a card_10 value of 1, and purchase_table has 4 rows whose card_10 values sum to 40.
The cross join (with some dummy data) would then look like
s.card_10 p.card_10
1 5
1 10
1 8
1 17
Hence you get the incorrect result: the single sales row is repeated once per purchase row, so sum(s.card_10) is 4 instead of 1.
One way of getting the correct sums in a single query is to use a union.
select sum(Card_10) from sales_table
union all
select sum(Card_10) from Purchase_table
or
select max(fromsalestable) as fromsalestable, max(frompurchasetable) as frompurchasetable
from
(
select sum(Card_10) as fromsalestable, null as frompurchasetable from sales_table
union all
select null, sum(Card_10) from Purchase_table
) t
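To see the effect for yourself, here is a small sketch with Python's sqlite3 that runs the cross-join query and, for comparison, a variant that uses two independent scalar subqueries (a swapped-in alternative to the union shown above). The dummy data matches the example; the function name sums is an assumption for illustration.

```python
import sqlite3

def sums(sales, purchases):
    """Return (cross_join_sums, independent_sums) for the two tables."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE Sales_Table (Card_10 INTEGER)")
    con.execute("CREATE TABLE Purchase_Table (Card_10 INTEGER)")
    con.executemany("INSERT INTO Sales_Table VALUES (?)", [(v,) for v in sales])
    con.executemany(
        "INSERT INTO Purchase_Table VALUES (?)", [(v,) for v in purchases]
    )
    # Cross join: every sales row is paired with every purchase row.
    cross = con.execute(
        "SELECT sum(s.Card_10), sum(p.Card_10) "
        "FROM Sales_Table s, Purchase_Table p"
    ).fetchone()
    # Independent scalar subqueries: each table is summed on its own.
    separate = con.execute(
        "SELECT (SELECT sum(Card_10) FROM Sales_Table), "
        "       (SELECT sum(Card_10) FROM Purchase_Table)"
    ).fetchone()
    con.close()
    return cross, separate

cross, separate = sums([1], [5, 10, 8, 17])
print(cross)     # sales value counted once per purchase row
print(separate)  # each table summed independently
```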

Select rows in one table, adding column where MAX(Date) of rows in other, related table

I have a table containing a set of tasks to perform:
Task
ID Name
1 Washing Up
2 Hoovering
3 Dusting
The user can add one or more Notes to a Note table. Each note is associated with a task:
Note
ID ID_Task Completed(%) Date
11 1 25 05/07/2013 14:00
12 1 50 05/07/2013 14:30
13 1 75 05/07/2013 15:00
14 3 20 05/07/2013 16:00
15 3 60 05/07/2013 17:30
I want a query that will select the Task ID, Name and its % complete, which should be zero if there aren't any notes for it. The query should return:
ID Name Completed (%)
1 Washing Up 75
2 Hoovering 0
3 Dusting 60
I've really been struggling with the query for this, which I've read is a "greatest n per group" type problem, of which there are many examples on SO, none of which I can apply to my case (or at least fully understand). My intuition was to start by finding the MAX(Date) for each task in the note table:
SELECT ID_Task,
MAX(Date) AS Date
FROM
Note
GROUP BY
ID_Task
Annoyingly, I can't just add "Complete %" to the above query unless it's contained in a GROUP clause. Argh! I'm not sure how to jump through this hoop in order to somehow get the task table rows with the column appended to it. Here is my pathetic attempt, which fails as it only returns tasks with notes and then duplicates task records at that (one for each note, so it's a complete fail).
SELECT Task.ID,
Task.Name,
Note.Complete
FROM
Task
JOIN
(SELECT ID_Task,
MAX(Date) AS Date
FROM
Note
GROUP BY
ID_Task) AS InnerNote
ON
Task.ID = InnerNote.ID_Task
JOIN
Note
ON
Task.ID = Note.ID_Task
Can anyone help me please?
If we assume that tasks only become more complete, you can do this with a left outer join and aggregation:
select t.ID, t.Name, coalesce(max(n.Completed), 0) as Completed
from Task t left outer join
Note n
on t.ID = n.ID_Task
group by t.ID, t.Name
If tasks can become "less complete" then you want the one with the last date. For this, you can use row_number():
select t.ID, t.Name, coalesce(n.Completed, 0) as Completed
from Task t left outer join
(select n.*, row_number() over (partition by ID_Task order by Date desc) as seqnum
from Note n
) n
on t.ID = n.ID_Task and n.seqnum = 1;
In this case, you don't need a group by, because the seqnum = 1 performs the same role.
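Here is a minimal sketch of the row_number() variant, run against the sample data from the question with Python's sqlite3 (SQLite 3.25 or newer). The dates are stored as ISO strings so they sort correctly as text, which is an assumption about storage, and the function name latest_completion is made up for illustration.

```python
import sqlite3

def latest_completion(tasks, notes):
    """Return (ID, Name, Completed) per task, using each task's latest note."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE Task (ID INTEGER PRIMARY KEY, Name TEXT)")
    con.execute(
        "CREATE TABLE Note ("
        "ID INTEGER PRIMARY KEY, ID_Task INTEGER, Completed INTEGER, Date TEXT)"
    )
    con.executemany("INSERT INTO Task VALUES (?, ?)", tasks)
    con.executemany("INSERT INTO Note VALUES (?, ?, ?, ?)", notes)
    result = con.execute(
        """
        SELECT t.ID, t.Name, COALESCE(n.Completed, 0) AS Completed
        FROM Task t
        LEFT JOIN (
            SELECT ID_Task, Completed,
                   ROW_NUMBER() OVER (
                       PARTITION BY ID_Task ORDER BY Date DESC
                   ) AS seqnum
            FROM Note
        ) n ON n.ID_Task = t.ID AND n.seqnum = 1  -- only the latest note
        ORDER BY t.ID
        """
    ).fetchall()
    con.close()
    return result

tasks = [(1, "Washing Up"), (2, "Hoovering"), (3, "Dusting")]
notes = [
    (11, 1, 25, "2013-07-05 14:00"),
    (12, 1, 50, "2013-07-05 14:30"),
    (13, 1, 75, "2013-07-05 15:00"),
    (14, 3, 20, "2013-07-05 16:00"),
    (15, 3, 60, "2013-07-05 17:30"),
]
print(latest_completion(tasks, notes))
```

Because the filter on seqnum sits in the ON clause rather than in WHERE, tasks without any notes (Hoovering) survive the left join and get 0 from COALESCE.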
How about this: just take the max of completed and group by the task ID.
SELECT t.ID, t.`Name`, MAX(n.Completed) AS completed
FROM `Task` t LEFT JOIN `Note` n ON ( n.ID_Task = t.ID )
GROUP BY t.ID, t.`Name`
Or, to return 0 instead of NULL for tasks without notes:
SELECT t.ID, t.`Name`,
(CASE WHEN MAX(n.Completed) IS NULL THEN 0 ELSE MAX(n.Completed) END) AS completed
FROM `Task` t LEFT JOIN `Note` n ON ( n.ID_Task = t.ID )
GROUP BY t.ID, t.`Name`
select a.ID,
a.Name,
isnull((select completed
from Note
where ID_Task = b.ID_Task
and Date = b.date),0)
from Task a
LEFT OUTER JOIN (select ID_Task,
max(date) date
from Note
group by ID_Task) b
ON a.ID = b.ID_Task;

Consolidate records

I want to consolidate a set of records
(id) / (referencedid)
1 10
1 11
2 11
2 10
3 10
3 11
3 12
The result of query should be
1 10
1 11
3 10
3 11
3 12
So, since id=1 and id=2 have the same set of corresponding referenceids {10,11}, they would be consolidated. But id=3's corresponding referenceids are not the same, hence it wouldn't be consolidated.
What would be a good way to get this done?
Select id, referenceid
From MyTable
Where Id In (
Select Min( Z.Id ) As Id
From (
-- order inside Group_Concat so that equal sets always produce the same signature
Select Z1.id, Group_Concat( Z1.referenceid Order By Z1.referenceid ) As signature
From MyTable As Z1
Group By Z1.id
) As Z
Group By Z.Signature
)
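The signature idea can also be sketched outside the database. The Python below is only an illustration of the same logic (build one signature per id from its set of referenceids, keep the smallest id per signature); the function name consolidate is hypothetical and it is not a replacement for the SQL.

```python
from collections import defaultdict

def consolidate(pairs):
    """Keep the rows of the smallest id among ids sharing the same reference set."""
    refs_by_id = defaultdict(set)
    for id_, ref in pairs:
        refs_by_id[id_].add(ref)

    # Visit ids in ascending order so the first id seen for a signature
    # is the smallest one.
    smallest_id_for_signature = {}
    for id_ in sorted(refs_by_id):
        signature = frozenset(refs_by_id[id_])
        smallest_id_for_signature.setdefault(signature, id_)

    keep = set(smallest_id_for_signature.values())
    return sorted((i, r) for i, r in set(pairs) if i in keep)

records = [(1, 10), (1, 11), (2, 11), (2, 10), (3, 10), (3, 11), (3, 12)]
print(consolidate(records))
```

Using a frozenset as the signature makes the comparison independent of row order, which is the same reason the referenceids need a fixed order before they are concatenated in SQL.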
-- generate count of elements for each distinct id
with Counts as (
select
id,
count(1) as ReferenceCount
from
tblReferences R
group by
R.id
)
-- generate every pairing of two different id's, along with
-- their counts, and how many are equivalent between the two
,Pairings as (
select
R1.id as id1
,R2.id as id2
,C1.ReferenceCount as count1
,C2.ReferenceCount as count2
,sum(case when R1.referenceid = R2.referenceid then 1 else 0 end) as samecount
from
tblReferences R1 join Counts C1 on R1.id = C1.id
cross join
tblReferences R2 join Counts C2 on R2.id = C2.id
where
R1.id < R2.id
group by
R1.id, C1.ReferenceCount, R2.id, C2.ReferenceCount
)
-- generate the list of ids that are safe to remove by picking
-- out any id's that have the same number of matches, and same
-- size of list, which means their reference lists are identical.
-- since id2 > id, we can safely remove id2 as a copy of id, and
-- the smallest id of which all id2 > id are copies will be left
,RemovableIds as (
select
distinct id2 as id
from
Pairings P
where
P.count1 = P.count2 and P.count1 = P.samecount
)
-- validate the results by just selecting to see which id's
-- will be removed. can also include id in the query above
-- to see which id was identified as the copy
select id from RemovableIds R
-- comment out `select` above and uncomment `delete` below to
-- remove the records after verifying they are correct!
--delete from tblReferences where id in (select id from RemovableIds)

How to find "holes" in a table

I recently inherited a database on which one of the tables has the primary key composed of encoded values (Part1*1000 + Part2).
I normalized that column, but I cannot change the old values.
So now I have
select ID from table order by ID
ID
100001
100002
101001
...
I want to find the "holes" in the table (more precisely, the first "hole" after 100000) for new rows.
I'm using the following select, but is there a better way to do that?
select /* top 1 */ ID+1 as newID from table
where ID > 100000 and
ID + 1 not in (select ID from table)
order by ID
newID
100003
101029
...
The database is Microsoft SQL Server 2000. I'm ok with using SQL extensions.
select ID +1 From Table t1
where not exists (select * from Table t2 where t1.id +1 = t2.id);
Not sure whether this version would be faster than the one you mentioned originally.
SELECT (t1.ID+1) FROM table AS t1
LEFT JOIN table as t2
ON t1.ID+1 = t2.ID
WHERE t2.ID IS NULL
This solution should give you the first and last ID values of the "holes" you are seeking. I use this in Firebird 1.5 on a table of 500K records, and although it does take a little while, it gives me what I want.
SELECT l.id + 1 start_id, MIN(fr.id) - 1 stop_id
FROM (table l
LEFT JOIN table r
ON l.id = r.id - 1)
LEFT JOIN table fr
ON l.id < fr.id
WHERE r.id IS NULL AND fr.id IS NOT NULL
GROUP BY l.id, r.id
For example, if your data looks like this:
ID
1001
1002
1005
1006
1007
1009
1011
You would receive this:
start_id stop_id
1003 1004
1008 1008
1010 1010
I wish I could take full credit for this solution, but I found it at Xaprb.
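To check the start/stop query against the sample data above, here is a short sketch using Python's sqlite3. The table name ids and the function name gap_ranges are assumptions for illustration; the query itself follows the same self-join approach, grouped on l.id only.

```python
import sqlite3

def gap_ranges(values):
    """Return (start_id, stop_id) for every gap between existing ids."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE ids (id INTEGER PRIMARY KEY)")
    con.executemany("INSERT INTO ids VALUES (?)", [(v,) for v in values])
    result = con.execute(
        """
        SELECT l.id + 1 AS start_id, MIN(fr.id) - 1 AS stop_id
        FROM ids l
        LEFT JOIN ids r  ON l.id = r.id - 1   -- r: the immediate successor
        LEFT JOIN ids fr ON l.id < fr.id      -- fr: any later id
        WHERE r.id IS NULL AND fr.id IS NOT NULL
        GROUP BY l.id
        ORDER BY start_id
        """
    ).fetchall()
    con.close()
    return result

print(gap_ranges([1001, 1002, 1005, 1006, 1007, 1009, 1011]))
```

The last id (1011) has no successor either, but it also has no later id, so the fr.id IS NOT NULL condition keeps it from being reported as the start of a gap.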
from How do I find a "gap" in running counter with SQL?
select
MIN(ID)
from (
select
100001 ID
union all
select
[YourIdColumn]+1
from
[YourTable]
where
--Filter the rest of your key--
) foo
left join
[YourTable]
on [YourIdColumn]=ID
and --Filter the rest of your key--
where
[YourIdColumn] is null
The best way is to build a temp table with all IDs,
then do a left join.
declare @maxId int
select @maxId = max(YOUR_COLUMN_ID) from YOUR_TABLE_HERE
declare @t table (id int)
declare @i int
set @i = 1
while @i <= @maxId
begin
insert into @t values (@i)
set @i = @i + 1
end
select t.id
from @t t
left join YOUR_TABLE_HERE x on x.YOUR_COLUMN_ID = t.id
where x.YOUR_COLUMN_ID is null
I thought about this question recently, and this looks like the most elegant way to do it:
SELECT TOP(@MaxNumber) ROW_NUMBER() OVER (ORDER BY t1.number)
FROM master..spt_values t1 CROSS JOIN master..spt_values t2
EXCEPT
SELECT Id FROM <your_table>
This solution doesn't give all holes in the table, only the first free ID of each gap plus the first available number after the table's maximum. It works if you want to fill in gaps in IDs, and it still gives you a free ID when there is no gap.
select numb + 1 from temp
minus
select numb from temp;
This will give you the complete picture, where 'Bottom' stands for gap start and 'Top' stands for gap end:
select *
from
(
(select <COL>+1 as id, 'Bottom' AS 'Pos' from <TABLENAME> /*where <CONDITION>*/
except
select <COL>, 'Bottom' AS 'Pos' from <TABLENAME> /*where <CONDITION>*/)
union
(select <COL>-1 as id, 'Top' AS 'Pos' from <TABLENAME> /*where <CONDITION>*/
except
select <COL>, 'Top' AS 'Pos' from <TABLENAME> /*where <CONDITION>*/)
) t
order by t.id, t.Pos
Note: the first and last results are wrong and should be disregarded, but taking them out would make this query a lot more complicated, so this will do for now.
Many of the previous answers are quite good. However, they fail to return the first value of the sequence and/or fail to consider the lower limit 100000. They all return intermediate holes but not the very first one (100001 if missing).
A full solution to the question is the following:
select id + 1 as newid from
(select 100000 as id union select id from tbl) t
where (id + 1 not in (select id from tbl)) and
(id >= 100000)
order by id
limit 1;
The number 100000 is used because the first number of the sequence is 100001 (as in the original question); otherwise, modify it accordingly.
"limit 1" is used in order to get just the first available number instead of the full sequence.
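As a quick check of the sentinel trick, the sketch below runs the same query shape in SQLite through Python's sqlite3. The table name tbl comes from the query above; the function name first_free_id and the floor parameter are assumptions added for illustration.

```python
import sqlite3

def first_free_id(ids, floor=100000):
    """Return the first id above `floor` that is not present in tbl."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE tbl (id INTEGER PRIMARY KEY)")
    con.executemany("INSERT INTO tbl VALUES (?)", [(i,) for i in ids])
    row = con.execute(
        """
        SELECT id + 1 AS newid
        FROM (SELECT ? AS id UNION SELECT id FROM tbl) t  -- floor acts as a sentinel row
        WHERE id + 1 NOT IN (SELECT id FROM tbl)
          AND id >= ?
        ORDER BY id
        LIMIT 1
        """,
        (floor, floor),
    ).fetchone()
    con.close()
    return row[0]

print(first_free_id([100001, 100002, 101001]))  # first hole after a run
print(first_free_id([100002, 100003]))          # very first id is missing
```

The sentinel row is what makes the second call work: without it there is no row whose id + 1 equals 100001, so that hole could never be reported.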
For people using Oracle, the following can be used:
select a, b from (
select ID + 1 a, max(ID) over (order by ID rows between current row and 1 following) - 1 b from MY_TABLE
) where a <= b order by a desc;
The following SQL code works well with SQLite, and it should port to MySQL, MS SQL and so on, though IIF may need to be replaced with CASE where it is not available.
On SQLite this takes only 2 seconds on a table with 1 million rows (and about 100 sparse missing rows).
WITH holes AS (
SELECT
IIF(c2.id IS NULL,c1.id+1,null) as start,
IIF(c3.id IS NULL,c1.id-1,null) AS stop,
ROW_NUMBER () OVER (
ORDER BY c1.id ASC
) AS rowNum
FROM |mytable| AS c1
LEFT JOIN |mytable| AS c2 ON c1.id+1 = c2.id
LEFT JOIN |mytable| AS c3 ON c1.id-1 = c3.id
WHERE c2.id IS NULL OR c3.id IS NULL
)
SELECT h1.start AS start, h2.stop AS stop FROM holes AS h1
LEFT JOIN holes AS h2 ON h1.rowNum+1 = h2.rowNum
WHERE h1.start IS NOT NULL AND h2.stop IS NOT NULL
UNION ALL
SELECT 1 AS start, h1.stop AS stop FROM holes AS h1
WHERE h1.rowNum = 1 AND h1.stop > 0
ORDER BY h1.start ASC