SQL Server 2005 - Select top N plus "Other" - sql

I have a table from which I want to select the top 5 rows by some column A. I also want a 6th row titled 'Other' that sums the values in column A for all but the top 5 rows.
Is there an easy way to do this? I'm starting with:
select top 5
columnB, columnA
from
someTable t
order by
columnA desc

Not tested, but try something like this:
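select columnB, columnA from (
select top 5
columnB, columnA
from
someTable t
order by
columnA desc
) topFive -- TOP/ORDER BY must sit in a derived table to be combined with UNION ALL
union all
select
'Other', sum(columnA) -- assumes columnB is a character column
from
someTable t
where primaryKey not in (
select top 5
primaryKey
from
someTable t
order by
columnA desc
)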

select top 5 columnB, columnA
from someTable
order by columnA desc
select SUM(columnA) as Total
from someTable
Then do the subtraction on the client side: 'Other' is Total minus the sum of the five columnA values returned by the first query.

100% untested, and off the top of my head, but you can give something like this a go. If I have a chance to test tonight I'll update the post, but there's a bottle of wine open for dinner and it's Friday night... :)
WITH CTE AS
(
SELECT
ColumnB,
ColumnA,
ROW_NUMBER() OVER (ORDER BY ColumnA DESC) AS RowNumber
FROM
dbo.SomeTable
)
SELECT
CASE WHEN RowNumber <= 5 THEN ColumnB ELSE 'Other' END AS ColumnB,
SUM(ColumnA) AS ColumnA
FROM
CTE
GROUP BY
CASE WHEN RowNumber <= 5 THEN ColumnB ELSE 'Other' END
ORDER BY
MIN(RowNumber)
EDIT: Looks like this worked after a couple of silly syntax errors. I've corrected those, so it should work as listed above now. I can't speak to performance on a large data set though, but it's worth giving it a shot.
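For anyone who wants to try the ROW_NUMBER idea without a SQL Server instance, here is a rough sketch using Python's built-in sqlite3 module as a stand-in. It is not the answer's exact query: it assumes a SQLite build with window functions (3.25 or later), the function name top_n_plus_other and the sample data are made up, and it groups on the row number rather than on ColumnB so that duplicate labels in the top N are not merged.

```python
import sqlite3

def top_n_plus_other(rows, n=5):
    """Return the top-n (label, value) rows by value, then one 'Other' row summing the rest."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE someTable (columnB TEXT, columnA INTEGER)")
    conn.executemany("INSERT INTO someTable VALUES (?, ?)", rows)
    query = """
        WITH ranked AS (
            SELECT columnB, columnA,
                   ROW_NUMBER() OVER (ORDER BY columnA DESC, columnB) AS rn
            FROM someTable
        )
        SELECT MIN(CASE WHEN rn <= :n THEN columnB ELSE 'Other' END) AS label,
               SUM(columnA) AS total
        FROM ranked
        -- rows ranked 1..n each form their own group; everything else shares group n+1
        GROUP BY CASE WHEN rn <= :n THEN rn ELSE :n + 1 END
        ORDER BY MIN(rn)
    """
    result = conn.execute(query, {"n": n}).fetchall()
    conn.close()
    return result

sample = [("a", 10), ("b", 50), ("c", 30), ("d", 20), ("e", 40), ("f", 5), ("g", 7)]
print(top_n_plus_other(sample))
```

If the table has n rows or fewer, no 'Other' row is produced, which may or may not be what you want.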

This is off the top of my head, and I will guarantee it is horribly inefficient:
SELECT columnB, columnA
FROM (SELECT TOP 5 columnB, columnA FROM someTable ORDER BY columnA DESC) T
UNION ALL
SELECT 'Other', A.Total - B.Total AS Summary
FROM (SELECT SUM(columnA) AS Total FROM someTable) A
CROSS JOIN (SELECT SUM(columnA) AS Total FROM
(SELECT TOP 5 columnA FROM someTable ORDER BY columnA DESC) T5) B

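I have recently used the EXCEPT operator a lot (not tested, but give it a go). Note that EXCEPT compares distinct rows, so duplicate (ColumnB, ColumnA) pairs are collapsed:
select columnB, columnA
from (
select top 5
columnB, columnA
from
someTable t
order by
columnA desc
) topFive
UNION ALL
SELECT 'OTHER' ColumnB, SUM(ColumnA)
FROM
(SELECT ColumnB, ColumnA
FROM someTable t
EXCEPT
SELECT ColumnB, ColumnA
FROM (select top 5 columnB, columnA from someTable t order by columnA desc) topFive
) others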

Related

SQL: Add a SUM(ColumnA) AS ColumnB to a query returning ColumnA

I have a query that returns a number of columns, including ColumnA (which is numerical).
I want to add an additional column to the end of the query that returns the sum of ColumnA:
ColumnA ColumnB
10 37
20 37
5 37
2 37
SELECT
ColumnA,
SUM(ColumnA) AS ColumnB
FROM
Table
The code above doesn't work, but I'm not sure how to create something that will.
I think you need this query:
SELECT ColumnA, (SELECT SUM(ColumnA) FROM table) as ColumnB
FROM table
Something like
SELECT
ColumnA,
ColumnASum
FROM Table
LEFT JOIN (SELECT SUM(ColumnA) AS ColumnASum FROM Table) AS Totals
ON TRUE;
Should work
You could create a variable of the SUM() first.
DECLARE @ColumnB int
SET @ColumnB = (SELECT SUM(ColumnA) FROM Table)
SELECT ColumnA, @ColumnB AS ColumnB
FROM Table
This should give you what you need.
I would use CROSS APPLY.
SELECT Table.ColumnA
,CA.ColumnB
FROM Table
CROSS APPLY (SELECT SUM(ColumnA) ColumnB FROM Table) CA
You basically define a subquery that outputs an aggregate value that you can have as another column.
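Another option, not mentioned in the answers above, is a windowed aggregate: SUM(ColumnA) OVER () repeats the grand total on every row without a second reference to the table. The sketch below is only an illustration under some assumptions: it runs on SQLite (3.25 or later) through Python's sqlite3 module rather than on the asker's database, and the table name t and function name with_column_total are made up.

```python
import sqlite3

def with_column_total(values):
    """Return (ColumnA, ColumnB) pairs where ColumnB is the sum of all ColumnA values."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (ColumnA INTEGER)")
    conn.executemany("INSERT INTO t VALUES (?)", [(v,) for v in values])
    rows = conn.execute(
        # an empty OVER () makes the window the whole result set
        "SELECT ColumnA, SUM(ColumnA) OVER () AS ColumnB FROM t ORDER BY rowid"
    ).fetchall()
    conn.close()
    return rows

print(with_column_total([10, 20, 5, 2]))
```

With the sample values from the question, every row carries 37 in the second column.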

Big Query view (table without duplicate rows)

I need to create a view that is pretty much just like some table with some simple transformations and I want to make sure the values in a particular column are not duplicate.
So let's say the table looks like this:
ID, ColumnA, ColumnB
-------------------
1 cars shirts
2 tvs dogs
1 fingers computers
And the resulting view would look like this:
ID, ColumnA, ColumnB
-------------------
1 cars shirts
2 tvs dogs
So, is there an equivalent to SELECT DISTINCT(ID), ColumnA, ColumnB?
What's the most efficient way to do it?
If you just want an arbitrary row for each ID, use ANY_VALUE:
#standardSQL
WITH Input AS (
SELECT 1 AS ID, 'cars' AS ColumnA, 'shirts' AS ColumnB UNION ALL
SELECT 2 AS ID, 'tvs' AS ColumnA, 'dogs' AS ColumnB UNION ALL
SELECT 1 AS ID, 'fingers' AS ColumnA, 'computers' AS ColumnB
)
SELECT
ANY_VALUE(t).*
FROM Input AS t
GROUP BY t.ID;
Or you can use the ARRAY_AGG trick to select the latest row based on a condition.
Below is for BigQuery Standard SQL
#standardSQL
WITH yourTable AS (
SELECT 1 AS id, 'cars' AS columnA, 'shirts' AS columnB UNION ALL
SELECT 2, 'tvs', 'dogs' UNION ALL
SELECT 1, 'fingers', 'computers'
)
SELECT r.*
FROM (
SELECT ARRAY_AGG(t ORDER BY columnA LIMIT 1)[OFFSET (0)] AS r
FROM yourTable t
GROUP BY id
)
-- ORDER BY id
Note: you need some rule for choosing the 'cars' row over the 'fingers' row!
The version above (as an example) keeps the first row per id when sorted by columnA in ascending order.
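The same "one row per id, chosen by a sort order" idea can also be written with ROW_NUMBER, which is a different technique from ANY_VALUE or ARRAY_AGG but is available in most engines. Here is a small sketch that runs on SQLite (3.25 or later) via Python's sqlite3 module instead of BigQuery; the function name one_row_per_id is made up, and ordering by columnA is just the example rule from the answer above.

```python
import sqlite3

def one_row_per_id(rows):
    """Keep one row per id: the one whose columnA sorts first in ascending order."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE yourTable (id INTEGER, columnA TEXT, columnB TEXT)")
    conn.executemany("INSERT INTO yourTable VALUES (?, ?, ?)", rows)
    query = """
        SELECT id, columnA, columnB
        FROM (
            SELECT id, columnA, columnB,
                   ROW_NUMBER() OVER (PARTITION BY id ORDER BY columnA) AS rn
            FROM yourTable
        ) AS numbered
        WHERE rn = 1          -- the first row of each id partition
        ORDER BY id
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result

data = [(1, "cars", "shirts"), (2, "tvs", "dogs"), (1, "fingers", "computers")]
print(one_row_per_id(data))
```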

Group BY on Condition basis

I have data like this:
ColumnA ColumnB
7675 22838
7675 24907
7675 NULL
I want the results like this:
ColumnA ColumnB
7675 2 (need total count for Not Null value)
7675 0 (need count 0 for NULL value)
SELECT ColumnA, COUNT(ColumnB) ColumnB
FROM YourTable
GROUP BY ColumnA
UNION ALL
SELECT ColumnA, 0
FROM YourTable
WHERE ColumnB IS NULL
GROUP BY ColumnA
You could introduce a calculated column indicating whether ColumnB is null or not and use it as a grouping criterion together with ColumnA:
SELECT
t.ColumnA,
ColumnB = COUNT(t.ColumnB)
FROM
dbo.YourTable AS t
CROSS APPLY
(SELECT CASE WHEN t.ColumnB IS NULL THEN 1 ELSE 0 END) AS x (SubGroup)
GROUP BY
t.ColumnA,
x.SubGroup
ORDER BY
t.ColumnA,
x.SubGroup
;
The COUNT(t.ColumnB) expression would always be 0 for the null subgroup (COUNT ignores NULLs), and for the corresponding non-null subgroup it would return the number of non-null entries.
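To see the subgroup idea in action without SQL Server, here is a sketch on SQLite through Python's sqlite3 module. SQLite has no CROSS APPLY, so the calculated SubGroup column is produced in a derived table instead; that substitution and the function name count_by_null_subgroup are my own assumptions, not part of the answer above.

```python
import sqlite3

def count_by_null_subgroup(rows):
    """Return one row per (ColumnA, null/non-null subgroup) with COUNT(ColumnB)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE YourTable (ColumnA INTEGER, ColumnB INTEGER)")
    conn.executemany("INSERT INTO YourTable VALUES (?, ?)", rows)
    query = """
        SELECT ColumnA, COUNT(ColumnB) AS ColumnB
        FROM (
            SELECT ColumnA, ColumnB,
                   CASE WHEN ColumnB IS NULL THEN 1 ELSE 0 END AS SubGroup
            FROM YourTable
        ) AS x
        GROUP BY ColumnA, SubGroup   -- COUNT(ColumnB) is 0 in the NULL subgroup
        ORDER BY ColumnA, SubGroup
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result

data = [(7675, 22838), (7675, 24907), (7675, None), (8000, 1)]
print(count_by_null_subgroup(data))
```

A ColumnA value with no NULLs at all (8000 in the sample) gets only its non-null row; no zero row is invented for it.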
select columnA,
count(columnB) as non_null_count,
sum(columnB is null) as null_count
from your_table
group by ColumnA
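You could easily do this with a COUNT and a SUM, which may be faster when there are a lot of rows than reading the table twice with a UNION. Note that sum(columnB is null) relies on the engine evaluating a boolean expression to 0 or 1, as MySQL does.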
SELECT columna, columnb, SUM(mycount)
FROM
( SELECT *, COUNT(columnb) as mycount
FROM test
GROUP BY columnb
)t
GROUP BY mycount
ORDER BY CASE WHEN mycount = 0 THEN 1 ELSE 2 END DESC;

How to use order by with union all in sql?

I tried the sql query given below:
SELECT * FROM (SELECT *
FROM TABLE_A ORDER BY COLUMN_1)DUMMY_TABLE
UNION ALL
SELECT * FROM TABLE_B
It results in the following error:
The ORDER BY clause is invalid in views, inline functions, derived
tables, subqueries, and common table expressions, unless TOP or FOR
XML is also specified.
I need to use order by in union all. How do I accomplish this?
SELECT *
FROM
(
SELECT * FROM TABLE_A
UNION ALL
SELECT * FROM TABLE_B
) dum
-- ORDER BY .....
But if you want all records from TABLE_A at the top of the result list, you can add a user-defined value to use for ordering:
SELECT *
FROM
(
SELECT *, 1 sortby FROM TABLE_A
UNION ALL
SELECT *, 2 sortby FROM TABLE_B
) dum
ORDER BY sortby
You don't really need the parentheses. You can sort directly:
SELECT *, 1 AS RN FROM TABLE_A
UNION ALL
SELECT *, 2 AS RN FROM TABLE_B
ORDER BY RN, COLUMN_1
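Here is a quick way to check the sort-key trick end to end. It is only a sketch: it uses SQLite through Python's sqlite3 module rather than SQL Server, single-column tables, and a made-up function name union_all_sorted, but the query has the same shape as the one above.

```python
import sqlite3

def union_all_sorted(table_a, table_b):
    """Return all TABLE_A values (sorted), followed by all TABLE_B values (sorted)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE TABLE_A (COLUMN_1 TEXT)")
    conn.execute("CREATE TABLE TABLE_B (COLUMN_1 TEXT)")
    conn.executemany("INSERT INTO TABLE_A VALUES (?)", [(v,) for v in table_a])
    conn.executemany("INSERT INTO TABLE_B VALUES (?)", [(v,) for v in table_b])
    query = """
        SELECT COLUMN_1, 1 AS sortby FROM TABLE_A
        UNION ALL
        SELECT COLUMN_1, 2 AS sortby FROM TABLE_B
        ORDER BY sortby, COLUMN_1   -- one ORDER BY for the whole UNION ALL
    """
    result = [row[0] for row in conn.execute(query)]
    conn.close()
    return result

print(union_all_sorted(["pear", "apple"], ["zebra", "ant"]))
```

The constant sortby column keeps the two sources apart, and the second sort key orders the rows within each source.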
Not a direct response to the OP, but I thought I would jump in here in response to the OP's error message, which may point you in another direction entirely!
All these answers refer to an overall ORDER BY applied once the whole record set has been retrieved.
What if you want to ORDER BY each portion of the UNION independently, and still have them "joined" in the same SELECT?
SELECT pass1.* FROM
(SELECT TOP 1000 tblA.ID, tblA.CustomerName
FROM TABLE_A AS tblA ORDER BY 2) AS pass1
UNION ALL
SELECT pass2.* FROM
(SELECT TOP 1000 tblB.ID, tblB.CustomerName
FROM TABLE_B AS tblB ORDER BY 2) AS pass2
Note that TOP 1000 is an arbitrary number. Use a number big enough to capture all of the data you require.
There will be times when you need to do something like this: pull the top 5 from table 1 based on one sort and the bottom 5 from table 2 based on another sort, and union these together.
Solution:
select * from (
-- top 5 records
select top 5 col1, col2, col3
from table1
group by col1, col2, col3 -- every selected column must appear in GROUP BY
order by col3 desc ) z
union all
select * from (
-- bottom 5 records
select top 5 col1, col2, col3
from table2
group by col1, col2, col3
order by col3 ) z
This was the only way I was able to get around the error, and it worked fine for me.
SELECT * FROM (SELECT *
FROM TABLE_A ORDER BY COLUMN_1)DUMMY_TABLE
UNION ALL
SELECT * FROM TABLE_B
ORDER BY 2;
Here 2 is the column number. In Oracle SQL you can sort by the position of a column in the select list.
This solved my SELECT statement:
SELECT * FROM
(SELECT id,name FROM TABLE_A
UNION ALL
SELECT id,name FROM TABLE_B ) dum
order by dum.id , dum.name
Here id and name are columns available in both tables; substitute your own columns.
Simply use this; no parentheses or anything else are needed:
SELECT *, id as TABLE_A_ID FROM TABLE_A
UNION ALL
SELECT *, id as TABLE_B_ID FROM TABLE_B
ORDER BY TABLE_A_ID, TABLE_B_ID
ORDER BY after the last UNION should apply to both datasets joined by union.
The solution shown below:
SELECT *,id AS sameColumn1 FROM Locations
UNION ALL
SELECT *,id AS sameColumn2 FROM Cities
ORDER BY sameColumn1,sameColumn2
select CONCAT(Name, '(',substr(occupation, 1, 1), ')') AS f1
from OCCUPATIONS
union
select temp.str AS f1 from
(select count(occupation) AS counts, occupation, concat('There are a total of ' ,count(occupation) ,' ', lower(occupation),'s.') As str from OCCUPATIONS group by occupation order by counts ASC, occupation ASC
) As temp
order by f1

Joining a table on itself

Is there a better way to write this SQL query?
SELECT *, (SELECT TOP 1 columnB FROM mytable WHERE mytable.columnC = T1.columnC ORDER BY columnD) as firstRecordOfColumnB
FROM
(SELECT * FROM mytable WHERE columnA = 'apple') as T1
Notice that columnC is not the primary key.
If keyColumn is really a key column (i.e. unique), then the query can definitely be written more elegantly and efficiently:
SELECT
*, columnB
FROM
mytable
WHERE
columnA = 'apple'
This might be better in terms of performance:
SELECT
*,
(SELECT TOP 1 myLookupTable.columnB FROM mytable AS myLookupTable WHERE myLookupTable.keyColumn = mytable.keyColumn) as firstRecordOfColumnB
FROM
mytable
WHERE
columnA = 'apple'
But for the TOP 1 part I don't know any better solution.
Edit:
If the keyColumn is unique, the data in firstRecordOfColumnB would be the same as in mytable.columnB.
If it's not unique at least you need to sort that data to get a relevant TOP 1, example:
SELECT
*,
(SELECT TOP 1 myLookupTable.columnB FROM mytable AS myLookupTable WHERE myLookupTable.keyColumn = mytable.keyColumn
ORDER BY myLookupTable.sortColumn) as firstRecordOfColumnB
FROM
mytable
WHERE
columnA = 'apple'
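If you want to experiment with the correlated lookup, here is a rough sketch on SQLite through Python's sqlite3 module. It is an approximation of the queries above, not a drop-in replacement: SQLite has no TOP, so LIMIT 1 is used instead, the join column is the question's columnC, and the function name first_b_per_group and the sample rows are invented.

```python
import sqlite3

def first_b_per_group(rows, wanted="apple"):
    """For each row with columnA = wanted, look up columnB of the first row
    (by columnD) that shares its columnC."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE mytable (columnA TEXT, columnB TEXT, columnC INTEGER, columnD INTEGER)"
    )
    conn.executemany("INSERT INTO mytable VALUES (?, ?, ?, ?)", rows)
    query = """
        SELECT m.columnB,
               (SELECT l.columnB
                FROM mytable AS l
                WHERE l.columnC = m.columnC
                ORDER BY l.columnD
                LIMIT 1) AS firstRecordOfColumnB   -- LIMIT 1 stands in for TOP 1
        FROM mytable AS m
        WHERE m.columnA = ?
        ORDER BY m.columnC, m.columnD
    """
    result = conn.execute(query, (wanted,)).fetchall()
    conn.close()
    return result

data = [("apple", "x", 1, 2), ("pear", "y", 1, 1), ("apple", "z", 2, 5)]
print(first_b_per_group(data))
```

In the sample, the first 'apple' row shares columnC = 1 with a 'pear' row that sorts earlier by columnD, so its lookup returns 'y' rather than its own 'x'.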