Pivot multiple rows / columns into 1 row - sql

We need to take multiple rows and multiple columns and transpose them into 1 row per key. I have a pivot query, but it is not working. I get an error about "column ambiguously defined".
Our data looks like this:
SECTOR TICKER COMPANY
-----------------------------------------------------
5 ADNT Adient PLC
5 AUTO Autobytel Inc.
5 THRM Gentherm Inc
5 ALSN Allison Transmission Holdings, Inc.
5 ALV Autoliv, Inc.
12 HES Hess Corporation
12 AM Antero Midstrm
12 PHX Panhandle Royalty Company
12 NBR Nabors Industries Ltd.
12 AMRC Ameresco, Inc.
What we need is 1 row per ID, with each TICKER / COMPANY in a different column. So, output would look like:
5 ADNT Adient PLC AUTO Autobytel Inc. THRM Gentherm Inc........
You get the idea. 1 row per ID, and each other value in its own column. The query I tried is:
SELECT sector, ticker, company_name
FROM (SELECT d.sector, d.ticker, v.company_name, ROW_NUMBER() OVER(PARTITION BY d.sector ORDER BY d.sector) rn
FROM template13_ticker_data d, template13_vw v
WHERE d.m_ticker = v.m_ticker)
PIVOT (MAX(sector) AS sector, MAX(ticker) AS ticker, MAX(company_name) AS company_name
FOR (rn) IN (1 AS sector, 2 AS ticker, 3 AS company_name))
ORDER BY sector;

The first thing to understand about pivots is that you pick a single column in the result set to act as the PIVOT anchor, the hinge that the data is pivoted around. This is specified in the FOR clause.
You can only PIVOT FOR a single column, but you can construct that column in a subquery, or from the joins or views in your source query. OP has used ROW_NUMBER(), but you can use any SQL mechanism you wish, even a CASE expression, to build a custom column to pivot around if there is no natural column in the dataset.
PIVOT makes a column for each value listed in the FOR ... IN (...) clause and gives that column the value of the aggregate function that you specify.
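SQLite (used here through Python's built-in sqlite3 module, purely as a self-contained illustration) has no PIVOT operator, so this sketch spells out what PIVOT (MAX(ticker) FOR rn IN (1, 2, 3)) does as conditional aggregation: one MAX(CASE WHEN rn = n ...) column per value listed in the IN clause. The table name ticker_data and the helper pivot_tickers are hypothetical, rows are numbered by ticker so the result is deterministic, and ROW_NUMBER() needs SQLite 3.25 or later.

```python
import sqlite3


def pivot_tickers(rows, max_cols=3):
    """Emulate PIVOT (MAX(ticker) FOR rn IN (1, 2, ...)) with conditional aggregation."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE ticker_data (sector INTEGER, ticker TEXT)")
    con.executemany("INSERT INTO ticker_data VALUES (?, ?)", rows)
    # One MAX(CASE ...) column per value of the FOR column (rn); the list is fixed up front
    cols = ", ".join(
        f"MAX(CASE WHEN rn = {n} THEN ticker END) AS ticker_{n}"
        for n in range(1, max_cols + 1)
    )
    sql = f"""
        SELECT sector, {cols}
        FROM (SELECT sector, ticker,
                     ROW_NUMBER() OVER (PARTITION BY sector ORDER BY ticker) AS rn
              FROM ticker_data) AS numbered
        GROUP BY sector
        ORDER BY sector"""
    return con.execute(sql).fetchall()


rows = [(5, "ADNT"), (5, "AUTO"), (5, "THRM"), (12, "HES"), (12, "AM")]
print(pivot_tickers(rows))
```

Sector 12 has only two tickers, so its ticker_3 comes back as NULL (None), just as PIVOT leaves unmatched IN values empty.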
It helps to visualise the constructed record set before you apply the pivot. The following SQL recreates the data scenario that OP has presented; I have used table variables here in place of OP's tables and views.
-- template13_ticker_data (with sector_char added)
DECLARE @tickerData TABLE
(
sector INT,
ticker CHAR(4),
m_ticker CHAR(4),
sector_char char(10)
)
-- template13_vw
DECLARE @Company TABLE
(
m_ticker CHAR(4),
ticker CHAR(4),
company_name VARCHAR(100)
)
INSERT INTO @tickerData (sector, ticker)
VALUES (5 ,'ADNT')
,(5 ,'AUTO')
,(5 ,'THRM')
,(5 ,'ALSN')
,(5 ,'ALV')
,(12,'HES')
,(12,'AM')
,(12,'PHX')
,(12,'NBR')
,(12,'AMRC')
INSERT INTO @Company (ticker, company_name)
VALUES ('ADNT','Adient PLC')
,('AUTO','Autobytel Inc.')
,('THRM','Gentherm Inc')
,('ALSN','Allison Transmission Holdings, Inc.')
,('ALV ','Autoliv, Inc.')
,('HES ','Hess Corporation')
,('AM ','Antero Midstrm')
,('PHX ','Panhandle Royalty Company')
,('NBR ','Nabors Industries Ltd.')
,('AMRC','Ameresco, Inc.')
-- Just re-creating a record set that matches the given data and query structure
UPDATE @tickerData SET m_ticker = ticker
UPDATE @Company SET m_ticker = ticker
-- populate 'sector_char' to show multiple aggregates
UPDATE @tickerData SET sector_char = '|' + cast(sector as varchar) + '|'
-- Unpivoted data Proof
SELECT d.sector, d.sector_char, d.ticker, v.company_name, ROW_NUMBER() OVER(PARTITION BY d.sector ORDER BY d.sector) rn
FROM @tickerData d, @Company v
WHERE d.m_ticker = v.m_ticker
The data before the pivot looks like this:
sector sector_char ticker company_name rn
------------------------------------------------------------------------
5 |5| ADNT Adient PLC 1
5 |5| AUTO Autobytel Inc. 2
5 |5| THRM Gentherm Inc 3
5 |5| ALSN Allison Transmission Holdings, Inc. 4
5 |5| ALV Autoliv, Inc. 5
12 |12| HES Hess Corporation 1
12 |12| AM Antero Midstrm 2
12 |12| PHX Panhandle Royalty Company 3
12 |12| NBR Nabors Industries Ltd. 4
12 |12| AMRC Ameresco, Inc. 5
Now visualise a subset of the results that you are expecting. To show the limitations around multiple-column operations, I have created sector_char to include in the final output:
sector sector_char ticker_1 company_1 ticker_2 company_2
-----------------------------------------------------------------------------
5 |5| ADNT Adient PLC AUTO Autobytel Inc.
12 |12| HES Hess Corporation AM Antero Midstrm
Because we want more than one output column from each original row (the ticker and the company), we have to use one of the following techniques:
1. Concatenate the values from multiple columns into a single column.
   Only useful if you can easily split the values again before you need to use them individually, or if you don't need to process them at all and the output is purely for display.
2. Execute multiple PIVOT queries and join the results.
   Necessary when the aggregation logic is different for each column, or when you are not simply transposing a row value into a column value (that is, when multiple rows are aggregated into a single cell).
   In scenarios like this one, where we are just transposing the value (the result of the aggregate matches the original cell value), I regard this as a bit of a hack, although it can need less syntax than the alternative. I say hack because the core PIVOT logic is duplicated, which makes it harder to maintain as the query evolves.
3. Execute a single PIVOT on the unique column, then join to other tables to build out the additional columns.
   This easily allows an unlimited number of additional columns in the output. The PIVOT returns the keys of the rows that hold the values we want to display, and the joins look those values up for the final results.
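Option 1 is not demonstrated further in this answer, so here is a minimal sketch of it under the same assumptions as before (SQLite through Python's sqlite3, a hypothetical data table, and conditional aggregation standing in for PIVOT): ticker and company_name are concatenated into one value per row, and only that single column is pivoted.

```python
import sqlite3


def pivot_concatenated(rows, max_cols=2):
    """Option 1: combine ticker and company into one value, then pivot that single column."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE data (sector INTEGER, ticker TEXT, company_name TEXT)")
    con.executemany("INSERT INTO data VALUES (?, ?, ?)", rows)
    cols = ", ".join(
        f"MAX(CASE WHEN rn = {n} THEN combined END) AS item_{n}"
        for n in range(1, max_cols + 1)
    )
    sql = f"""
        SELECT sector, {cols}
        FROM (SELECT sector,
                     ticker || ' ' || company_name AS combined,  -- the one pivotable column
                     ROW_NUMBER() OVER (PARTITION BY sector ORDER BY ticker) AS rn
              FROM data) AS numbered
        GROUP BY sector
        ORDER BY sector"""
    return con.execute(sql).fetchall()


print(pivot_concatenated([
    (5, "ADNT", "Adient PLC"),
    (5, "AUTO", "Autobytel Inc."),
    (12, "HES", "Hess Corporation"),
]))
```

Splitting item_1 back into ticker and company later relies on a separator that never occurs in the data, which is exactly the weakness described for option 1.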
Let's look at option 3 first, as it demonstrates a single PIVOT and how to include multiple columns for each of the PIVOT results.
In this example I have allowed for up to 8 results for each sector. It is important to note that you MUST specify all of the output columns of the PIVOT; it is not dynamic.
You could use dynamic SQL to find the maximum number of columns you need and generate the following query based on that result.
Also note that in this solution we do not need to join to the template13_vw view within the PIVOT source query; instead we join to the result. That is why the pivot returns m_ticker (which I assume to be the key) instead of the ticker that is displayed in the final result.
-- NOTE: using CTE here, you could use table variables, temporary tables or whatever else you need
;WITH TickersBySector as
(
-- You must specify the fixed number of columns in the output
SELECT sector, sector_char, [1] as [m_ticker_1],[2] as [m_ticker_2],[3] as [m_ticker_3],[4] as [m_ticker_4],[5] as [m_ticker_5],[6] as [m_ticker_6],[7] as [m_ticker_7],[8] as [m_ticker_8]
FROM (
SELECT d.sector, d.sector_char, d.m_ticker, ROW_NUMBER() OVER(PARTITION BY d.sector ORDER BY d.sector) rn
FROM template13_ticker_data d /* OPs Syntax */
-- FROM @tickerData d /* Use this with the proof table variables */
) data
PIVOT (
MAX(m_ticker)
FOR rn IN ( [1],[2],[3],[4],[5],[6],[7],[8])
) as PivotTable
)
-- To use with the proof table variables, replace 'template13_vw' with '@Company'
SELECT sector, sector_char
,c1.[ticker] as [ticker_1], c1.company_name as [company_1]
,c2.[ticker] as [ticker_2], c2.company_name as [company_2]
,c3.[ticker] as [ticker_3], c3.company_name as [company_3]
,c4.[ticker] as [ticker_4], c4.company_name as [company_4]
,c5.[ticker] as [ticker_5], c5.company_name as [company_5]
,c6.[ticker] as [ticker_6], c6.company_name as [company_6]
,c7.[ticker] as [ticker_7], c7.company_name as [company_7]
,c8.[ticker] as [ticker_8], c8.company_name as [company_8]
FROM TickersBySector
LEFT OUTER JOIN template13_vw c1 ON c1.m_ticker = TickersBySector.m_ticker_1
LEFT OUTER JOIN template13_vw c2 ON c2.m_ticker = TickersBySector.m_ticker_2
LEFT OUTER JOIN template13_vw c3 ON c3.m_ticker = TickersBySector.m_ticker_3
LEFT OUTER JOIN template13_vw c4 ON c4.m_ticker = TickersBySector.m_ticker_4
LEFT OUTER JOIN template13_vw c5 ON c5.m_ticker = TickersBySector.m_ticker_5
LEFT OUTER JOIN template13_vw c6 ON c6.m_ticker = TickersBySector.m_ticker_6
LEFT OUTER JOIN template13_vw c7 ON c7.m_ticker = TickersBySector.m_ticker_7
LEFT OUTER JOIN template13_vw c8 ON c8.m_ticker = TickersBySector.m_ticker_8
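The following is the same query, using multiple PIVOT queries joined together.
Notice that in this scenario only one of the PIVOTs needs to bring back the additional common column sector_char, so use this style of syntax when the aggregate or the additional common columns might differ between the result sets. Because the two PIVOTs number their rows independently, each ROW_NUMBER() should order by a unique column (such as m_ticker); otherwise ticker_n and company_n are not guaranteed to come from the same source row.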
;WITH TickersBySector as
(
-- You must specify the fixed number of columns in the output
SELECT sector, sector_char, [1] as [ticker_1],[2] as [ticker_2],[3] as [ticker_3],[4] as [ticker_4],[5] as [ticker_5],[6] as [ticker_6],[7] as [ticker_7],[8] as [ticker_8]
FROM (
SELECT d.sector, d.sector_char, d.m_ticker, ROW_NUMBER() OVER(PARTITION BY d.sector ORDER BY d.m_ticker) rn -- unique ORDER BY keeps ticker_n and company_n aligned
FROM template13_ticker_data d /* OPs Syntax */
-- FROM @tickerData d /* Use this with the proof table variables */
) data
PIVOT (
MAX(m_ticker)
FOR rn IN ( [1],[2],[3],[4],[5],[6],[7],[8])
) as PivotTable
)
, CompanyBySector as
(
-- You must specify the fixed number of columns in the output
SELECT sector,[1] as [company_1],[2] as [company_2],[3] as [company_3],[4] as [company_4],[5] as [company_5],[6] as [company_6],[7] as [company_7],[8] as [company_8]
FROM (
SELECT d.sector, v.company_name, ROW_NUMBER() OVER(PARTITION BY d.sector ORDER BY d.m_ticker) rn -- same ordering as TickersBySector
FROM template13_ticker_data d /* OPs Syntax */
-- FROM @tickerData d /* Use this with the proof table variables */
INNER JOIN template13_vw v /* OPs Syntax */
-- INNER JOIN @Company v /* Use this with the proof table variables */
ON d.m_ticker = v.m_ticker
) data
PIVOT (
MAX(company_name)
FOR rn IN ( [1],[2],[3],[4],[5],[6],[7],[8])
) as PivotTable
)
SELECT TickersBySector.sector, sector_char
,[ticker_1], [company_1]
,[ticker_2], [company_2]
,[ticker_3], [company_3]
,[ticker_4], [company_4]
,[ticker_5], [company_5]
,[ticker_6], [company_6]
,[ticker_7], [company_7]
,[ticker_8], [company_8]
FROM TickersBySector
INNER JOIN CompanyBySector ON TickersBySector.sector = CompanyBySector.sector
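For comparison, option 3 (pivot only the key, then join back once per output slot) can be sketched portably as follows. This again uses SQLite through Python's sqlite3 with conditional aggregation in place of PIVOT; the tables ticker_data and company and the helper pivot_then_join are hypothetical stand-ins for template13_ticker_data and template13_vw, and only two slots are generated to keep it short.

```python
import sqlite3


def pivot_then_join(tickers, companies):
    """Option 3: pivot only the key (m_ticker), then LEFT JOIN the lookup table once per slot."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE ticker_data (sector INTEGER, m_ticker TEXT)")
    con.execute("CREATE TABLE company (m_ticker TEXT PRIMARY KEY, ticker TEXT, company_name TEXT)")
    con.executemany("INSERT INTO ticker_data VALUES (?, ?)", tickers)
    con.executemany("INSERT INTO company VALUES (?, ?, ?)", companies)
    sql = """
        WITH keys AS (
            SELECT sector,
                   MAX(CASE WHEN rn = 1 THEN m_ticker END) AS m_ticker_1,
                   MAX(CASE WHEN rn = 2 THEN m_ticker END) AS m_ticker_2
            FROM (SELECT sector, m_ticker,
                         ROW_NUMBER() OVER (PARTITION BY sector ORDER BY m_ticker) AS rn
                  FROM ticker_data) AS numbered
            GROUP BY sector
        )
        SELECT k.sector,
               c1.ticker, c1.company_name,
               c2.ticker, c2.company_name
        FROM keys k
        LEFT JOIN company c1 ON c1.m_ticker = k.m_ticker_1  -- LEFT JOIN: empty slots stay NULL
        LEFT JOIN company c2 ON c2.m_ticker = k.m_ticker_2
        ORDER BY k.sector"""
    return con.execute(sql).fetchall()


print(pivot_then_join(
    [(5, "ADNT"), (5, "AUTO"), (12, "HES")],
    [("ADNT", "ADNT", "Adient PLC"), ("AUTO", "AUTO", "Autobytel Inc."), ("HES", "HES", "Hess Corporation")],
))
```

Adding another per-slot attribute only means selecting one more column from each join, which is why this style scales to extra output columns without repeating the pivot.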


SQL Server [PATSTAT] query | Multiple charindex values &

Hello Stack Overflow Community.
I am retrieving data with SQL from PATSTAT (patent database from the European Patent Office). I have two issues (see below). For your info, the PATSTAT SQL commands are quite limited.
I. Charindex with multiple values
I am looking for two specific patent groups ["Y02E" and "Y02C"] and want to retrieve data on these. I have found that using the charindex function works if I insert one group:
and charindex ('Y02E', cpc_class_symbol) > 0
But if I add another charindex condition, the query just times out:
and charindex ('Y02E', cpc_class_symbol) > 0 or charindex ('Y02C', cpc_class_symbol) >0
I am an absolute SQL rookie but would really appreciate your help!
II. List values from column in one cell with comma separation
Essentially I want to apply what I found as the "string_agg"-command, however, it does not work for this database. I have entries with a unique ID, which have multiple patent categories. For example:
appln_nr_epodoc | cpc_class_symbol
EP20110185794 | Y02E 10/125
EP20110185794 | Y02E 10/127
I would like to have it like this, however:
appln_nr_epodoc | cpc_class_symbol
EP20110185794 | Y02E 10/125, Y02E 10/127
Again, I am very new to sql, so any help is appreciated! Thank you!
I will also attach the full code here for transparency
SELECT a.appln_nr_epodoc, a.appln_nr_original, psn_name, person_ctry_code, person_name, person_address, appln_auth+appln_nr,
appln_filing_date, cpc_class_symbol
FROM
tls201_appln a
join tls207_pers_appln b on a.appln_id = b.appln_id
join tls206_person c on b.person_id = c.person_id
join tls801_country on c.person_ctry_code= tls801_country.ctry_code
join tls224_appln_cpc on a.appln_id = tls224_appln_cpc.appln_id
WHERE appln_auth = 'EP'
and appln_filing_year between 2005 and 2012
and eu_member = 'Y'
and granted = 'Y'
and psn_sector = 'company'
and charindex ('Y02E', cpc_class_symbol) > 0
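The answers below only address part II. For part I, one likely cause (an assumption; the thread does not confirm it) is operator precedence: AND binds more tightly than OR, so appending "or charindex ('Y02C', cpc_class_symbol) > 0" to this WHERE clause makes it read as (all the other filters AND Y02E) OR Y02C. Every Y02C row then bypasses the other filters, which can make the joins return far more rows. Wrapping the two tests in parentheses restores the intent. The sketch below shows the difference in SQLite through Python's sqlite3, where instr(string, substring) plays the role of CHARINDEX(substring, string); the cpc table is hypothetical.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE cpc (granted TEXT, cpc_class_symbol TEXT)")
con.executemany("INSERT INTO cpc VALUES (?, ?)", [
    ("Y", "Y02E 10/125"),
    ("N", "Y02E 10/127"),
    ("N", "Y02C 20/10"),   # not granted, so it should be excluded
    ("Y", "H01L 31/00"),
])

# Parsed as (granted = 'Y' AND Y02E) OR Y02C: the granted filter is lost for Y02C rows
unparenthesized = con.execute("""
    SELECT cpc_class_symbol FROM cpc
    WHERE granted = 'Y'
      AND instr(cpc_class_symbol, 'Y02E') > 0 OR instr(cpc_class_symbol, 'Y02C') > 0
    ORDER BY cpc_class_symbol""").fetchall()

# Parentheses keep the other filters applied to both groups
parenthesized = con.execute("""
    SELECT cpc_class_symbol FROM cpc
    WHERE granted = 'Y'
      AND (instr(cpc_class_symbol, 'Y02E') > 0 OR instr(cpc_class_symbol, 'Y02C') > 0)
    ORDER BY cpc_class_symbol""").fetchall()

print(unparenthesized)
print(parenthesized)
```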
For part II, here is some sample data I created.
And here is the code. It gives your requested output.
create table #test_1 (
appln_nr_epodoc varchar(20) null
,cpc_class_symbol varchar(20) null
)
insert into #test_1 values
('EP20110185794','Y02E 10/125')
,('EP20110185794','Y02E 10/127')
,('EP20110185795','Y02E 10/130')
,('EP20110185796','Y02E 20/140')
,('EP20110185796','Y02E 21/142')
;with CTE_1 as (select *
from (
select *
,R1_1 = Rank() over(partition by appln_nr_epodoc order by cpc_class_symbol )
from #test_1
) as a
where R1_1 = 1
)
,CTE_2 as (select *
from (
select *
,R1_1 = Rank() over(partition by appln_nr_epodoc order by cpc_class_symbol )
from #test_1
) as a
where R1_1 = 2 )
select a.appln_nr_epodoc
,a.cpc_class_symbol + isnull(', ' + c.cpc_class_symbol, '') as cpc_class_symbol
from CTE_1 a
left join CTE_2 c on c.appln_nr_epodoc = a.appln_nr_epodoc -- left join keeps applications with only one symbol
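The two-CTE query only handles up to two symbols per application. A hedged generalization, sketched below with SQLite through Python's sqlite3 (test_1 mirrors the sample table above; the helper concat_symbols is hypothetical), numbers the symbols with ROW_NUMBER() and folds up to three of them into one string with conditional aggregation. COALESCE(', ' || x, '') drops the separator when a slot is empty; in SQL Server you would write + instead of ||, with ISNULL or COALESCE used the same way.

```python
import sqlite3


def concat_symbols(rows):
    """Fold up to three cpc_class_symbol values per application into one comma-separated string."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE test_1 (appln_nr_epodoc TEXT, cpc_class_symbol TEXT)")
    con.executemany("INSERT INTO test_1 VALUES (?, ?)", rows)
    sql = """
        SELECT appln_nr_epodoc,
               MAX(CASE WHEN rn = 1 THEN cpc_class_symbol END)
               || COALESCE(', ' || MAX(CASE WHEN rn = 2 THEN cpc_class_symbol END), '')
               || COALESCE(', ' || MAX(CASE WHEN rn = 3 THEN cpc_class_symbol END), '')
               AS cpc_class_symbols
        FROM (SELECT appln_nr_epodoc, cpc_class_symbol,
                     ROW_NUMBER() OVER (PARTITION BY appln_nr_epodoc
                                        ORDER BY cpc_class_symbol) AS rn
              FROM test_1) AS numbered
        GROUP BY appln_nr_epodoc
        ORDER BY appln_nr_epodoc"""
    return con.execute(sql).fetchall()


sample = [
    ("EP20110185794", "Y02E 10/125"),
    ("EP20110185794", "Y02E 10/127"),
    ("EP20110185795", "Y02E 10/130"),
    ("EP20110185796", "Y02E 20/140"),
    ("EP20110185796", "Y02E 21/142"),
]
for row in concat_symbols(sample):
    print(row)
```

Symbols beyond the third slot are silently dropped, so the number of slots has to cover the largest group; that fixed limit is the price of avoiding a string aggregate function.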

Teradata SQL stack rows per user

Is there a way to stack/group string/text per user ?
data I have
USER STATES
1 CA
1 AR
1 IN
2 CA
3 CA
3 NY
4 CA
4 AL
4 SD
4 TX
What I need is
USER STATES
1 CA / AR / IN
2 CA
3 CA / NY
4 CA / AL / SD / TX
I tried cross join and then another cross join however but the data spools out. Thanks!
If Teradata's XML-services are installed there's a function named XMLAGG, which returns a similar result: CA, AR, IN
SELECT user,
TRIM(TRAILING ',' FROM (XMLAGG(TRIM(states)|| ',' /* optionally ORDER BY ...*/) (VARCHAR(10000))))
FROM tab
GROUP BY 1
Btw, using recursion will result in huge spool usage, because you keep all the intermediate rows in spool before returning the final row.
I am not an expert but this should work. You may need to modify it a bit per your exact requirement. Hope this helps!
CREATE VOLATILE TABLE temp AS (
SELECT
USER
,STATES
,ROW_NUMBER() OVER (PARTITION BY USER ORDER BY STATES) AS rn
FROM yourtable
) WITH DATA PRIMARY INDEX(USER) ON COMMIT PRESERVE ROWS;
WITH RECURSIVE rec_test(US,ST, LVL)
AS
(
SELECT USER, STATES (VARCHAR(1000)), 1 -- the seed's type sets the maximum width of the concatenated list
FROM temp
WHERE rn = 1
UNION ALL
SELECT USER, TRIM(STATES) || ', ' || ST,LVL+1
FROM temp INNER JOIN rec_test
ON USER = US
AND temp.rn = rec_test.lvl+1
)
SELECT US,ST, LVL
FROM rec_test
QUALIFY RANK() OVER(PARTITION BY US ORDER BY LVL DESC) = 1;
Unfortunately there is no GROUP_CONCAT or other string aggregate function in Teradata (at least none that I'm aware of), so one way to achieve your result would be to use recursion, since you don't know the maximum number of states per user.
For recursion you should use a Volatile Table, as OLAP functions are not allowed in the recursive part. This is untested code (I've got no way of testing it, unfortunately), so there might be several bugs, but it should give you the concept and, with some troubleshooting if needed, the expected result.
Replace yourtable in definition of Volatile Table with your real table name.
CREATE VOLATILE TABLE vt AS (
SELECT
user
, states
, ROW_NUMBER() OVER (PARTITION BY user ORDER BY states) AS rn
, COUNT(*) OVER (PARTITION BY user) AS cnt
FROM yourtable
) WITH DATA
UNIQUE PRIMARY INDEX(user, rn)
ON COMMIT PRESERVE ROWS;
WITH RECURSIVE cte (user, list, rn) AS (
SELECT
user
, CAST(states AS VARCHAR(1000)) -- maximum size based on maximum number of rows * length of states
, rn
FROM vt
WHERE rn = cnt -- start with last states row
UNION ALL
SELECT
vt.user
, cte.list || ',' || vt.states
, vt.rn
FROM vt
JOIN cte ON vt.user = cte.user AND vt.rn = cte.rn - 1 -- append a row that is rn-1 of your rows for a given user
)
SELECT user, list
FROM cte
WHERE rn = 1; -- going from last to first, in this condition there should be entire list
This solution isn't perfect: it forces the engine to store intermediate results in a temporary area (spool) during query processing, so you may encounter a "No more spool space" error.
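The same recursive idea can be tried locally in SQLite through Python's sqlite3, which also supports WITH RECURSIVE. Mirroring the volatile table above, the rows are numbered in a temporary table first, and the recursion then appends one state per level; this sketch walks from rn = 1 up to rn = cnt instead of from last to first, which is equivalent. The table user_states, the helper stack_states and the alphabetical order of states are assumptions, since the sample has no column recording the original order.

```python
import sqlite3


def stack_states(rows):
    """Build 'A / B / C' per user with a recursive CTE over pre-numbered rows."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE user_states (user_id INTEGER, state TEXT)")
    con.executemany("INSERT INTO user_states VALUES (?, ?)", rows)
    # Like the volatile table: number the rows outside the recursive part
    con.execute("""
        CREATE TEMP TABLE vt AS
        SELECT user_id, state,
               ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY state) AS rn,
               COUNT(*) OVER (PARTITION BY user_id) AS cnt
        FROM user_states""")
    sql = """
        WITH RECURSIVE chain(user_id, list, rn, cnt) AS (
            SELECT user_id, state, rn, cnt FROM vt WHERE rn = 1
            UNION ALL
            SELECT vt.user_id, chain.list || ' / ' || vt.state, vt.rn, vt.cnt
            FROM vt JOIN chain ON vt.user_id = chain.user_id AND vt.rn = chain.rn + 1
        )
        SELECT user_id, list FROM chain WHERE rn = cnt  -- the last level holds the full list
        ORDER BY user_id"""
    return con.execute(sql).fetchall()


data = [(1, "CA"), (1, "AR"), (1, "IN"), (2, "CA"), (3, "CA"), (3, "NY"),
        (4, "CA"), (4, "AL"), (4, "SD"), (4, "TX")]
for row in stack_states(data):
    print(row)
```

Every intermediate level is kept until the final filter, which is the same reason the Teradata version consumes so much spool.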

Calculate Sum From Moving 4 Rows in SQL

I have the following data.
WM_Week POS_Store_Count POS_Qty POS_Sales POS_Cost
------ --------------- ------ -------- --------
201541 3965 77722 153904.67 102593.04
201542 3952 77866 154219.66 102783.12
201543 3951 70690 139967.06 94724.60
201544 3958 70773 140131.41 95543.55
201545 3958 76623 151739.31 103441.05
201546 3956 73236 145016.54 98868.60
201547 3939 64317 127368.62 86827.95
201548 3927 60762 120309.32 82028.70
I need to write a SQL query to get the last four weeks of data, and their last four weeks summed for each of the following columns: POS_Store_Count,POS_Qty,POS_Sales, and POS_Cost.
For example, if I wanted 201548's data it would contain 201548, 201547, 201546, and 201545's.
The sum of 201547 would contain 201547, 201546, 201545, and 201544.
The query should return 4 rows when run successfully.
How would I formulate a recursive query to do this? Is there something easier than recursive to do this?
Edit: The version is Azure SQL DW with version number 12.0.2000.
Edit2: Each of the four rows that should be returned would have the sum of the columns from itself and its three earlier weeks.
For example, if I wanted the figures for 201548 it would return the following:
WM_Week POS_Store_Count POS_Qty POS_Sales POS_Cost
------ --------------- ------- -------- --------
201548 15780 274938 544433.79 371166.3
Which is the sum of the four (non-identity) columns from 201548, 201547, 201546, and 201545.
Pretty sure this will get you what you want. I'm using CROSS APPLY after ordering the data to apply the SUMs.
Create Table #WeeklyData (WM_Week Int, POS_Store_Count Int, POS_Qty Int, POS_Sales Money, POS_Cost Money)
Insert #WeeklyData Values
(201541,3965,77722,153904.67,102593.04),
(201542,3952,77866,154219.66,102783.12),
(201543,3951,70690,139967.06,94724.6),
(201544,3958,70773,140131.41,95543.55),
(201545,3958,76623,151739.31,103441.05),
(201546,3956,73236,145016.54,98868.6),
(201547,3939,64317,127368.62,86827.95),
(201548,3927,60762,120309.32,82028.7)
DECLARE @StartWeek INT = 201548;
WITH cte AS
(
SELECT *,
ROW_NUMBER() OVER (ORDER BY [WM_Week] DESC) rn
FROM #WeeklyData
WHERE WM_Week BETWEEN @StartWeek - 9 AND @StartWeek
)
SELECT *
FROM cte c1
CROSS APPLY (SELECT SUM(POS_Store_Count) POS_Store_Count_SUM,
SUM(POS_Qty) POS_Qty_SUM,
SUM(POS_Sales) POS_Sales_SUM,
SUM(POS_Cost) POS_Cost_SUM
FROM cte c2
WHERE c2.rn BETWEEN c1.rn AND (c1.rn + 3)
) ca
WHERE c1.rn <= 4
You can use SUM() in combination with the OVER Clause
Something like:
SELECT WM_Week
, SUM(POS_Store_Count) OVER (ORDER BY WM_Week ROWS BETWEEN 3 PRECEDING AND CURRENT ROW) AS POS_Store_Count_4_Weeks
FROM yourtable
You should be able to use a SQL window function for this.
Add a column to your query like the following:
SUM(POS_Sales) OVER(
ORDER BY WM_Week
ROWS BETWEEN 3 PRECEDING AND CURRENT ROW
) AS POS_Sales_4_Weeks
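Putting the window-function answers together: the sketch below (SQLite through Python's sqlite3, which needs SQLite 3.25+ for window functions; the table and column names are shortened, hypothetical versions of the question's) computes the four-week sums over all rows first and only then keeps the latest four weeks. Filtering to four weeks in a WHERE clause before the window is evaluated would leave the earlier weeks out of the sums. Sales and cost are stored as REAL here for brevity; a DECIMAL or MONEY column avoids floating-point rounding in a real database.

```python
import sqlite3

WEEKLY = [
    (201541, 3965, 77722, 153904.67, 102593.04),
    (201542, 3952, 77866, 154219.66, 102783.12),
    (201543, 3951, 70690, 139967.06, 94724.60),
    (201544, 3958, 70773, 140131.41, 95543.55),
    (201545, 3958, 76623, 151739.31, 103441.05),
    (201546, 3956, 73236, 145016.54, 98868.60),
    (201547, 3939, 64317, 127368.62, 86827.95),
    (201548, 3927, 60762, 120309.32, 82028.70),
]

FRAME = "OVER (ORDER BY wm_week ROWS BETWEEN 3 PRECEDING AND CURRENT ROW)"


def last_four_rolling_sums(rows):
    """Return the latest 4 weeks, each with the sum of itself and the 3 preceding weeks."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE weekly (wm_week INTEGER, store_count INTEGER, "
                "qty INTEGER, sales REAL, cost REAL)")
    con.executemany("INSERT INTO weekly VALUES (?, ?, ?, ?, ?)", rows)
    sql = f"""
        SELECT wm_week, store_4wk, qty_4wk, sales_4wk, cost_4wk
        FROM (SELECT wm_week,
                     SUM(store_count) {FRAME} AS store_4wk,
                     SUM(qty) {FRAME} AS qty_4wk,
                     ROUND(SUM(sales) {FRAME}, 2) AS sales_4wk,
                     ROUND(SUM(cost) {FRAME}, 2) AS cost_4wk
              FROM weekly) AS rolling   -- window evaluated over every week
        ORDER BY wm_week DESC
        LIMIT 4"""
    return con.execute(sql).fetchall()


for row in last_four_rolling_sums(WEEKLY):
    print(row)
```

The first row returned is 201548 with the totals listed in the question's second edit (15780, 274938, 544433.79, 371166.3).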
If I understand correctly, you don't want to return 4 rows, but rather 4 summed columns for each group? If so, here's one option:
select max(WM_Week) as WM_Week,
sum(POS_Store_Count),
sum(POS_Qty),
sum(POS_Sales),
sum(POS_Cost)
from (select top 4 *
from yourtable
where wm_week <= 201548
order by wm_week desc) t
This uses a subquery with top to get the 4 rows you want to aggregate based on the where criteria and order by clause.

SQL percentage of the total

Hi how can I get the percentage of each record over the total?
Let's imagine I have one table with the following:
ID code Points
1 101 2
2 201 3
3 233 4
4 123 1
The percentage for ID 1 is 20%, for 2 it is 30%, and so on.
How do I get it?
There are a couple of approaches to getting that result.
You essentially need the "total" points from the whole table (or whatever subset), and get that repeated on each row. Getting the percentage is a simple matter of arithmetic, the expression you use for that depends on the datatypes, and how you want that formatted.
Here's one way (out of a couple of possible ways) to get the specified result:
SELECT t.id
, t.code
, t.points
-- , s.tot_points
, ROUND(t.points * 100.0 / s.tot_points,1) AS percentage
FROM onetable t
CROSS
JOIN ( SELECT SUM(r.points) AS tot_points
FROM onetable r
) s
ORDER BY t.id
The view query s is run first, that gives a single row. The join operation matches that row with every row from t. And that gives us the values we need to calculate a percentage.
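The cross-join pattern can be tried end to end in SQLite through Python's sqlite3, using the sample table from the question (an illustration, not a session in the asker's database). The last query shows why the 100.0 literal matters: in engines such as SQLite and SQL Server, dividing one integer by another truncates.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE onetable (id INTEGER, code INTEGER, points INTEGER)")
con.executemany("INSERT INTO onetable VALUES (?, ?, ?)",
                [(1, 101, 2), (2, 201, 3), (3, 233, 4), (4, 123, 1)])

# The one-row total from the inline view s is cross joined onto every row of t
percentages = con.execute("""
    SELECT t.id, ROUND(t.points * 100.0 / s.tot_points, 1) AS percentage
    FROM onetable t
    CROSS JOIN (SELECT SUM(r.points) AS tot_points FROM onetable r) s
    ORDER BY t.id""").fetchall()
print(percentages)

# With integers only, 2 / 10 truncates to 0 before the * 100 is applied
print(con.execute("SELECT 2 / 10 * 100").fetchone()[0])
```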
Another way to get this result, without using a join operation, is to use a subquery in the SELECT list to return the total.
Note that the join approach can be extended to get percentage for each "group" of records.
id type points %type
-- ---- ------ -----
1 sold 11 22%
2 sold 4 8%
3 sold 25 50%
4 bought 1 50%
5 bought 1 50%
6 sold 10 20%
To get that result, we can use the same query, but with an inline view s that returns the total GROUP BY r.type; the join operation is then not a CROSS JOIN but a match based on type:
SELECT t.id
, t.type
, t.points
-- , s.tot_points_by_type
, ROUND(t.points * 100.0 / s.tot_points_by_type,1) AS `%type`
FROM onetable t
JOIN ( SELECT r.type
, SUM(r.points) AS tot_points_by_type
FROM onetable r
GROUP BY r.type
) s
ON s.type = t.type
ORDER BY t.id
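The per-group version can be checked the same way (SQLite through Python's sqlite3, with the id/type/points sample shown above). The inline view is aliased tot_points_by_type so that the outer expression refers to a column the view actually returns.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE onetable (id INTEGER, type TEXT, points INTEGER)")
con.executemany("INSERT INTO onetable VALUES (?, ?, ?)", [
    (1, "sold", 11), (2, "sold", 4), (3, "sold", 25),
    (4, "bought", 1), (5, "bought", 1), (6, "sold", 10),
])

# Per-type totals from the inline view s, matched to each row of t on type
by_type = con.execute("""
    SELECT t.id, t.type, ROUND(t.points * 100.0 / s.tot_points_by_type, 1) AS pct_type
    FROM onetable t
    JOIN (SELECT r.type, SUM(r.points) AS tot_points_by_type
          FROM onetable r
          GROUP BY r.type) s
      ON s.type = t.type
    ORDER BY t.id""").fetchall()
print(by_type)
```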
To get that same result with a subquery, it would have to be a correlated subquery, and that subquery is likely to get executed for every row in t.
This is why it's more natural for me to use a join operation, rather than a subquery in the SELECT list... even when a subquery works the same. (The patterns we use for more complex queries, like assigning aliases to tables, qualifying all column references, and formatting the SQL... those patterns just work their way back into simple queries. The rationale for these patterns is kind of lost in simple queries.)
Try it like this:
select id, code, points, (points * 100.0) / (select sum(points) from table1) as percentage from table1
To add to a good list of responses, this should be fast performance-wise, and rather easy to understand:
DECLARE @T TABLE (ID INT, code VARCHAR(256), Points INT)
INSERT INTO @T VALUES (1,'101',2), (2,'201',3),(3,'233',4), (4,'123',1)
;WITH CTE AS
(SELECT * FROM @T)
SELECT C.*, CAST(ROUND((C.Points/B.TOTAL)*100, 2) AS DEC(32,2)) [%_of_TOTAL]
FROM CTE C
JOIN (SELECT CAST(SUM(Points) AS DEC(32,2)) TOTAL FROM CTE) B ON 1=1
Just replace the table variable with your actual table inside the CTE.

Sorting twice on same column

I'm having a bit of a weird question, given to me by a client.
He has a list of data, with a date between parentheses like so:
Foo (14/08/2012)
Bar (15/08/2012)
Bar (16/09/2012)
Xyz (20/10/2012)
However, he wants the list to be displayed as follows:
Foo (14/08/2012)
Bar (16/09/2012)
Bar (15/08/2012)
Xyz (20/10/2012)
(notice that the second Bar has moved up one position)
So, the logic behind it is, that the list has to be sorted by date ascending, EXCEPT when two rows have the same name ('Bar'). If they have the same name, it must be sorted with the LATEST date at the top, while staying in the other sorting order.
Is this even remotely possible? I've experimented with a lot of ORDER BY clauses, but couldn't find the right one. Does anyone have an idea?
I should have specified that this data comes from a table in a sql server database (the Name and the date are in two different columns). So I'm looking for a SQL-query that can do the sorting I want.
(I've dumbed this example down quite a bit, so if you need more context, don't hesitate to ask)
This works, I think
declare @t table (data varchar(50), date datetime)
insert @t
values
('Foo','2012-08-14'),
('Bar','2012-08-15'),
('Bar','2012-09-16'),
('Xyz','2012-10-20')
select t.*
from @t t
inner join (select data, COUNT(*) cg, MAX(date) as mg from @t group by data) tc
on t.data = tc.data
order by case when cg>1 then mg else date end, date desc
produces
data date
---------- -----------------------
Foo 2012-08-14 00:00:00.000
Bar 2012-09-16 00:00:00.000
Bar 2012-08-15 00:00:00.000
Xyz 2012-10-20 00:00:00.000
A way with better performance than any of the other posted answers is to just do it entirely with an ORDER BY and not a JOIN or using CTE:
DECLARE @t TABLE (myData varchar(50), myDate datetime)
INSERT INTO @t VALUES
('Foo','2012-08-14'),
('Bar','2012-08-15'),
('Bar','2012-09-16'),
('Xyz','2012-10-20')
SELECT *
FROM @t t1
ORDER BY (SELECT MIN(t2.myDate) FROM @t t2 WHERE t2.myData = t1.myData), t1.myDate DESC
This does exactly what you request and will work with any indexes and much better with larger amounts of data than any of the other answers.
Additionally it's much more clear what you're actually trying to do here, rather than masking the real logic with the complexity of a join and checking the count of joined items.
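A quick way to check the correlated-subquery ordering is SQLite through Python's sqlite3, with ISO date strings standing in for datetime values (an assumption of this sketch; they sort correctly as text). The extra t1.my_data sort key is an addition in this sketch, not part of the answer: it keeps two different names that share the same earliest date from being interleaved. Note that this approach treats all rows with the same name as one group, whereas the gaps-and-islands answers below only group adjacent rows.

```python
import sqlite3


def sort_by_group_start(rows):
    """Order each name's rows by the name's earliest date, newest first within the name."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE t (my_data TEXT, my_date TEXT)")
    con.executemany("INSERT INTO t VALUES (?, ?)", rows)
    return con.execute("""
        SELECT t1.my_data, t1.my_date
        FROM t t1
        ORDER BY (SELECT MIN(t2.my_date) FROM t t2 WHERE t2.my_data = t1.my_data),
                 t1.my_data,          -- tie-breaker between names with the same first date
                 t1.my_date DESC""").fetchall()


print(sort_by_group_start([
    ("Foo", "2012-08-14"), ("Bar", "2012-08-15"),
    ("Bar", "2012-09-16"), ("Xyz", "2012-10-20"),
]))
```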
This one uses analytic functions to perform the sort, it only requires one SELECT from your table.
The inner query finds gaps, where the name changes. These gaps are used to identify groups in the next query, and the outer query does the final sorting by these groups.
I have tried it here (SQL Fiddle) with extended test-data.
SELECT name, dat
FROM (
SELECT name, dat, SUM(gap) over(ORDER BY dat, name) AS grp
FROM (
SELECT name, dat,
CASE WHEN LAG(name) OVER (ORDER BY dat, name) = name THEN 0 ELSE 1 END AS gap
FROM t
) x
) y
ORDER BY grp, dat DESC
Extended test-data
('Bar','2012-08-12'),
('Bar','2012-08-11'),
('Foo','2012-08-14'),
('Bar','2012-08-15'),
('Bar','2012-08-16'),
('Bar','2012-09-17'),
('Xyz','2012-10-20')
Result
Bar 2012-08-12
Bar 2012-08-11
Foo 2012-08-14
Bar 2012-09-17
Bar 2012-08-16
Bar 2012-08-15
Xyz 2012-10-20
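The LAG-based answer can be reproduced with SQLite 3.25+ through Python's sqlite3 using the extended test data above; the helper sort_islands and the ISO date strings are assumptions of this sketch. A gap of 1 marks the first row of each run of equal names, and the running SUM of gaps turns every run into its own group number.

```python
import sqlite3


def sort_islands(rows):
    """Sort by date, but reverse the order inside each run of adjacent equal names."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE t (name TEXT, dat TEXT)")
    con.executemany("INSERT INTO t VALUES (?, ?)", rows)
    return con.execute("""
        SELECT name, dat
        FROM (SELECT name, dat,
                     SUM(gap) OVER (ORDER BY dat, name) AS grp   -- running count = island id
              FROM (SELECT name, dat,
                           CASE WHEN LAG(name) OVER (ORDER BY dat, name) = name
                                THEN 0 ELSE 1 END AS gap          -- 1 where a new island starts
                    FROM t) AS x) AS y
        ORDER BY grp, dat DESC""").fetchall()


extended = [
    ("Bar", "2012-08-12"), ("Bar", "2012-08-11"), ("Foo", "2012-08-14"),
    ("Bar", "2012-08-15"), ("Bar", "2012-08-16"), ("Bar", "2012-09-17"),
    ("Xyz", "2012-10-20"),
]
for row in sort_islands(extended):
    print(row)
```

On the first row LAG returns NULL, the comparison is not true, and the CASE falls through to 1, so the first island always starts a new group.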
I think that this works, including the case I asked about in the comments:
declare @t table (data varchar(50), [date] datetime)
insert @t
values
('Foo','20120814'),
('Bar','20120815'),
('Bar','20120916'),
('Xyz','20121020')
; With OuterSort as (
select *,ROW_NUMBER() OVER (ORDER BY [date] asc) as rn from @t
)
--Now we need to find contiguous ranges of the same data value, and the min and max row number for such a range
, Islands as (
select data,rn as rnMin,rn as rnMax from OuterSort os where not exists (select * from OuterSort os2 where os2.data = os.data and os2.rn = os.rn - 1)
union all
select i.data,rnMin,os.rn
from
Islands i
inner join
OuterSort os
on
i.data = os.data and
i.rnMax = os.rn-1
), FullIslands as (
select
data,rnMin,MAX(rnMax) as rnMax
from Islands
group by data,rnMin
)
select
*
from
OuterSort os
inner join
FullIslands fi
on
os.rn between fi.rnMin and fi.rnMax
order by
fi.rnMin asc,os.rn desc
It works by first computing the initial ordering in the OuterSort CTE. Then, using two CTEs (Islands and FullIslands), we compute the parts of that ordering in which the same data value appears in adjacent rows. Having done that, we can compute the final ordering by any value that all adjacent values will have (such as the lowest row number of the "island" that they belong to), and then within an "island", we use the reverse of the originally computed sort order.
Note, though, that this may not be very efficient for large data sets. On the sample data, the plan shows 4 table scans of the base table, as well as a spool.
Try something like...
ORDER BY CASE date
WHEN '14/08/2012' THEN 1
WHEN '16/09/2012' THEN 2
WHEN '15/08/2012' THEN 3
WHEN '20/10/2012' THEN 4
END
In MySQL, you can do:
ORDER BY FIELD(date, '14/08/2012', '16/09/2012', '15/08/2012', '20/10/2012')
In Postgres, you can create a function FIELD and do:
CREATE OR REPLACE FUNCTION field(anyelement, anyarray) RETURNS numeric AS $$
SELECT
COALESCE((SELECT i
FROM generate_series(1, array_upper($2, 1)) gs(i)
WHERE $2[i] = $1),
0);
$$ LANGUAGE SQL STABLE;
If you do not want to use the CASE, you can try to find an implementation of the FIELD function for SQL Server.