View to identify grouped values or object - sql

As an example, I have 5 objects. An object is a set of red dots that are bound together, i.e. adjacent to each other: in other words, at X+1, X-1, Y+1 or Y-1.
I need to create an MS SQL VIEW which will contain the first XY coordinate of each object, like:
X,Y
=======
1. 1,1
2. 1,8
3. 4,3
4. 5,7
5. 6,5
I can't figure out how to group it in a VIEW (NOT using a stored procedure). Any ideas would be a great help.
Thanks

The other answer is already pretty long, so I'm leaving it as-is. This answer is much better, simpler and also correct whereas the other one has some edge-cases that will produce a wrong answer - I shall leave that exercise to the reader.
Note: Line breaks are added for clarity. The entire block is a single query
;with Walker(StartX,StartY,X,Y,Visited) as (
select X,Y,X,Y,CAST('('+right(X,3)+','+right(Y,3)+')' as Varchar(Max))
from puzzle
union all
select W.StartX,W.StartY,P.X,P.Y,W.Visited+'('+right(P.X,3)+','+right(P.Y,3)+')'
from Walker W
join Puzzle P on
(W.X=P.X and W.Y=P.Y+1 OR -- these four lines "collect" a cell next to
W.X=P.X and W.Y=P.Y-1 OR -- the current one in any direction
W.X=P.X+1 and W.Y=P.Y OR
W.X=P.X-1 and W.Y=P.Y)
AND W.Visited NOT LIKE '%('+right(P.X,3)+','+right(P.Y,3)+')%'
)
select X, Y, Visited
from
(
select W.X, W.Y, W.Visited, rn=row_number() over (
partition by W.X,W.Y
order by len(W.Visited) desc)
from Walker W
left join Walker Other
on Other.StartX=W.StartX and Other.StartY=W.StartY
and (Other.Y<W.Y or (Other.Y=W.Y and Other.X<W.X))
where Other.X is null
) Z
where rn=1
The first step is to set up a "walker" recursive table expression that will start at every cell and travel as far as it can without retracing any step. The Visited column keeps cells from being revisited: it stores every cell visited so far along the path from a given starting point. In particular, the condition AND W.Visited NOT LIKE '%('+right(P.X,3)+','+right(P.Y,3)+')%' rejects cells that the path has already visited.
To understand how the rest works, you need to look at the result generated by the "Walker" CTE by running "Select * from Walker order by StartX, StartY" after the CTE. A "piece" with 5 cells appears in at least 5 groups, each with a different (StartX,StartY), but each group has all the 5 (X,Y) pieces with different "Visited" paths.
The subquery (Z) uses a LEFT JOIN + IS NULL to weed the groups down to the single row in each group that contains the "first XY coordinate", defined by the condition
Other.StartX=W.StartX and Other.StartY=W.StartY
and (Other.Y<W.Y or (Other.Y=W.Y and Other.X<W.X))
The intention is to compare each cell that can be visited starting from (StartX, StartY) against every other cell in the same group, and to keep the cell for which NO OTHER cell is on a higher row or, if on the same row, further to the left. This still leaves us with too many results, however. Consider just a 2-cell piece at (3,4) and (4,4):
StartX StartY X Y Visited
3 4 3 4 (3,4) ******
3 4 4 4 (3,4)(4,4)
4 4 4 4 (4,4)
4 4 3 4 (4,4)(3,4) ******
2 rows remain with the "first XY coordinate" of (3,4), marked with ******. We only need one row, so we use Row_Number and since we're numbering, we might as well go for the longest Visited path, which would give us as many of the cells within the piece as we can get.
The final outer query simply takes the first rows (RN=1) from each similar (X,Y) group.
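To see the Walker rows for the two-cell piece without the full board, here is a minimal T-SQL sketch. The table variable @Puzzle and its two rows are assumptions made up for this illustration, and the four-direction test is written as a Manhattan distance of 1, which should be equivalent to the four OR'ed conditions in the query above.

```sql
-- Hypothetical two-cell board: the (3,4)/(4,4) piece discussed above
DECLARE @Puzzle TABLE (X int, Y int);
INSERT INTO @Puzzle VALUES (3,4),(4,4);

;WITH Walker(StartX,StartY,X,Y,Visited) AS (
    -- anchor: every cell is a starting point that has visited only itself
    SELECT X,Y,X,Y,
           CAST('('+CAST(X AS varchar(10))+','+CAST(Y AS varchar(10))+')' AS varchar(max))
    FROM @Puzzle
    UNION ALL
    -- step: move to an adjacent cell that is not yet on this path
    SELECT W.StartX,W.StartY,P.X,P.Y,
           W.Visited+'('+CAST(P.X AS varchar(10))+','+CAST(P.Y AS varchar(10))+')'
    FROM Walker W
    JOIN @Puzzle P
      ON ABS(W.X-P.X)+ABS(W.Y-P.Y)=1
     AND W.Visited NOT LIKE '%('+CAST(P.X AS varchar(10))+','+CAST(P.Y AS varchar(10))+')%'
)
SELECT StartX,StartY,X,Y,Visited
FROM Walker
ORDER BY StartX,StartY,LEN(Visited);
```

This should return the same four rows as the small table above: one single-cell path and one two-cell path for each starting point.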
To show ALL the cells of each piece, change the line
select X, Y, Visited
in the middle to
select X, Y, (
select distinct '('+right(StartX,3)+','+right(StartY,3)+')'
from Walker
where X=Z.X and Y=Z.Y
for xml path('')
) PieceCells
which gives this output:
X Y PieceCells
1 1 (1,1)(2,1)(2,2)(3,2)
3 4 (3,4)(4,4)
5 6 (5,6)
7 5 (7,5)(8,5)(9,5)
8 1 (10,1)(8,1)(8,2)(9,1)(9,2)(9,3)

OK, this is a little bit hard, but in any case I'm sure this problem cannot be solved in a simpler way.
So we have this table:
CREATE Table Tbl1(Id int, X int, Y int)
INSERT INTO Tbl1
SELECT 1,1,1 UNION ALL
SELECT 2,1,2 UNION ALL
SELECT 3,1,8 UNION ALL
SELECT 4,1,9 UNION ALL
SELECT 5,1,10 UNION ALL
SELECT 6,2,2 UNION ALL
SELECT 7,2,3 UNION ALL
SELECT 8,2,8 UNION ALL
SELECT 9,2,9 UNION ALL
SELECT 10,3,9 UNION ALL
SELECT 11,4,3 UNION ALL
SELECT 12,4,4 UNION ALL
SELECT 13,5,7 UNION ALL
SELECT 14,5,8 UNION ALL
SELECT 15,5,9 UNION ALL
SELECT 16,6,5
And here is the select query:
with cte1 as
/*at first we make recursion to define groups of filled adjacent cells*/
/*as output of cte we have a lot of strings like <X>cell(1)X</X><Y>cell(1)Y</Y>...<X>cell(n)X</X><Y>cell(n)Y</Y>*/
(
SELECT id,X,Y,CAST('<X>'+CAST(X as varchar(10))+'</X><Y>'+CAST(Y as varchar(10))+'</Y>' as varchar(MAX)) info
FROM Tbl1
UNION ALL
SELECT b.id,a.X,a.Y,CAST(b.info + '<X>'+CAST(a.X as varchar(10))+'</X><Y>'+CAST(a.Y as varchar(10))+'</Y>' as varchar(MAX))
FROM Tbl1 a JOIN cte1 b
ON ((((a.X=b.X+1) OR (a.X=b.X-1)) AND a.Y=b.Y) OR (((a.Y=b.Y+1) OR (a.Y=b.Y-1)) AND a.X=b.X))
AND a.id<>b.id
AND
b.info NOT LIKE
('%'+('<X>'+CAST(a.X as varchar(10))+'</X><Y>'+CAST(a.Y as varchar(10))+'</Y>')+'%')
),
cte2 as
/*In this query, we select only the longest sequence of cell connections (first filter)*/
/*And we convert the string to a new standard (x,y | x,y | x,y |...| x,y) (for further separation)*/
(
SELECT *, ROW_NUMBER()OVER(ORDER BY info) cellGroupId
FROM(
SELECT REPLACE(REPLACE(REPLACE(REPLACE(info,'</Y><X>','|'),'</X><Y>',','),'<X>',''),'</Y>','') info
FROM(
SELECT info, MAX(LEN(info))OVER(PARTITION BY id)maxlen FROM cte1
) AS tmpTbl
WHERE maxlen=LEN(info)
)AS tmpTbl
),
cte3 as
/*In this query, we separated strings like (x,y | x,y | x,y |...| x,y) to many (x,y)*/
(
SELECT cellGroupId, CAST(LEFT(XYInfo,CHARINDEX(',',XYInfo)-1) as int) X, CAST(RIGHT(XYInfo,LEN(XYInfo)-CHARINDEX(',',XYInfo)) as int) Y
FROM(
SELECT cellGroupId, tmpTbl2.n.value('.','varchar(MAX)') XYinfo
FROM
(SELECT CAST('<r><c>' + REPLACE(info,'|','</c><c>')+'</c></r>' as XML) n, cellGroupId FROM cte2) AS tmpTbl1
CROSS APPLY n.nodes('/r/c') tmpTbl2(n)
) AS tmpTbl
),
cte4 as
/*In this query, we finally determined group of individual objects*/
(
SELECT cellGroupId,X,Y
FROM(
SELECT cellGroupId,X,Y,ROW_NUMBER()OVER(PARTITION BY X,Y ORDER BY cellGroupId ASC)rn
FROM(
SELECT *,
MAX(SumOfAdjacentCellsByGroup)OVER(PARTITION BY X,Y) Max_SumOfAdjacentCellsByGroup_ByXY /*calculated max value of <the sum of the cells in the group> by each cell*/
FROM(
SELECT *, SUM(1)OVER(PARTITION BY cellGroupId) SumOfAdjacentCellsByGroup /*calculated the sum of the cells in the group*/
FROM cte3
)AS TmpTbl
)AS TmpTbl
/*We got rid of the subgroups (i.e. [(1,2)(2,2)(2,3)] is a subgroup of [(1,2)(1,1)(2,2)(2,3)])*/
/*this was the second filter*/
WHERE SumOfAdjacentCellsByGroup=Max_SumOfAdjacentCellsByGroup_ByXY
)AS TmpTbl
/*We got rid of the duplicate groups (i.e. [(1,1)(1,2)(2,2)(2,3)] is the same as [(1,2)(1,1)(2,2)(2,3)])*/
/*this was the third filter*/
WHERE rn=1
)
SELECT X,Y /*result*/
FROM(SELECT a.X,a.Y, ROW_NUMBER()OVER(PARTITION BY cellGroupId ORDER BY id)rn
FROM cte4 a JOIN Tbl1 b ON a.X=b.X AND a.Y=b.Y)a /*connect back*/
WHERE rn=1 /*first XY coordinate*/

Let's assume your coordinates are stored in X,Y form, something like this:
CREATE Table Puzzle(
id int identity, Y int, X int)
INSERT INTO Puzzle VALUES
(1,1),(1,2),(1,8),(1,9),(1,10),
(2,2),(2,3),(2,8),(2,9),
(3,9),
(4,3),(4,4),
(5,7),(5,8),(5,9),
(6,5)
This query then shows your Puzzle in board form (run it in TEXT mode in SQL Server Management Studio):
SELECT (
SELECT (
SELECT CASE WHEN EXISTS (SELECT *
FROM Puzzle T
WHERE T.X=X.X and T.Y=Y.Y)
THEN 'X' ELSE '.' END
FROM (values(0),(1),(2),(3),(4),(5),
(6),(7),(8),(9),(10),(11)) X(X)
ORDER BY X.X
FOR XML PATH('')) + Char(13) + Char(10)
FROM (values(0),(1),(2),(3),(4),(5),(6),(7)) Y(Y)
ORDER BY Y.Y
FOR XML PATH(''), ROOT('a'), TYPE
).value('(/a)[1]','varchar(max)')
It gives you this
............
.XX.....XXX.
..XX....XX..
.........X..
...XX.......
.......XXX..
.....X......
............
This query, done in 4 stages, gives you the top-left cell of each piece, if you define it as the leftmost cell of the topmost row.
-- the first table expression joins cells together on the Y-axis
;WITH FlattenOnY(Y,XLeft,XRight) AS (
-- start with all pieces
select Y,X,X
from puzzle
UNION ALL
-- keep connecting rightwards from each cell as far as possible
select B.Y,A.XLeft,B.X
from FlattenOnY A
join puzzle B on A.Y=B.Y and A.XRight+1=B.X
)
-- the second table expression flattens the results from the first, so that
-- it represents ALL the start-end blocks on each row of the Y-axis
,YPieces(Y,XLeft,XRight) as (
--
select Y,XLeft,Max(XRight)
from(
select Y,Min(XLeft)XLeft,XRight
from FlattenOnY
group by XRight,Y)Z
group by XLeft,Y
)
-- here, select * from YPieces will return the "blocks" such as
-- Row 1: 1-2 & 8-10
-- Row 2: 2-3 (equals Y,XLeft,XRight of 2,2,3)
-- etc
-- the third expression repeats the first, except it now combines on the X-axis
,FlattenOnX(Y,XLeft,CurPieceXLeft,CurPieceXRight,CurPieceY) AS (
-- start with all pieces
select Y,XLeft,XLeft,XRight,Y
from YPieces
UNION ALL
-- keep connecting downwards from each block to any overlapping block on the next row
select A.Y,A.XLeft,B.XLeft,B.XRight,B.Y
from FlattenOnX A
join YPieces B on A.CurPieceY+1=B.Y and A.CurPieceXRight>=B.XLeft and B.XRight>=A.CurPieceXLeft
)
-- and again we repeat the 2nd expression as the 4th, for the final pieces
select Y,XLeft X
from (
select *, rn2=row_number() over (
partition by Y,XLeft
order by CurPieceY desc)
from (
select *, rn=row_number() over (
partition by CurPieceXLeft, CurPieceXRight, CurPieceY
order by Y)
from flattenOnX
) Z1
where rn=1) Z2
where rn2=1
The result being
Y X
----------- -----------
1 1
1 8
4 3
5 7
6 5
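The first two stages (FlattenOnY and YPieces) find the horizontal runs on each row. As a side note, the same runs can probably be found without recursion using the common "gaps and islands" trick of subtracting ROW_NUMBER() from X. This is a different technique from the one in the query above, shown only as a sketch; the table variable @Puzzle and its rows (the first two board rows) are assumptions for illustration.

```sql
-- Hypothetical subset of the board: rows 1 and 2 only
DECLARE @Puzzle TABLE (Y int, X int);
INSERT INTO @Puzzle VALUES (1,1),(1,2),(1,8),(1,9),(1,10),(2,2),(2,3);

SELECT Y, MIN(X) AS XLeft, MAX(X) AS XRight
FROM (
    -- consecutive X values on one row share the same X - row_number value
    SELECT Y, X,
           X - ROW_NUMBER() OVER (PARTITION BY Y ORDER BY X) AS grp
    FROM @Puzzle
) runs
GROUP BY Y, grp
ORDER BY Y, XLeft;
-- Row 1: 1-2 and 8-10
-- Row 2: 2-3
```

The output matches the "blocks" listed in the comments of the query above; joining the blocks vertically would still need the recursive third stage.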
Or is your representation in flat form something like this? If it is, give us a shout and I'll redo the solution
create table Puzzle (
row int,
[0] bit, [1] bit, [2] bit, [3] bit, [4] bit, [5] bit,
[6] bit, [7] bit, [8] bit, [9] bit, [10] bit, [11] bit
)
insert Puzzle values
(0,0,0,0,0,0,0,0,0,0,0,0,0),
(1,0,1,1,0,0,0,0,0,1,1,1,0),
(2,0,0,1,1,0,0,0,0,1,1,0,0),
(3,0,0,0,0,0,0,0,0,0,1,0,0),
(4,0,0,0,1,1,0,0,0,0,0,0,0),
(5,0,0,0,0,0,0,0,1,1,1,0,0),
(6,0,0,0,0,0,1,0,0,0,0,0,0),
(7,0,0,0,0,0,0,0,0,0,0,0,0)

Related

How do I break a range into homogeneous sub-ranges in PostgreSQL?

Say I have a table like this
WITH conds(cond) AS (
SELECT '[3, 5)'::int4range
UNION
SELECT '[6, 8)'::int4range
UNION
SELECT '[9, 20)'::int4range
)
SELECT cond FROM conds;
For a given input range, I want to break it into homogeneous sub-ranges which either are entirely contained in some row in conds, or do not overlap with any row in conds. There should be an additional column indicating whether each sub-range is covered by conds.
More concretely, for an input period of '[1, 11)'::int4range, the expected output is
?column? | ?column?
-----------+----------
[1,3) | f
[3,5) | t
[5,6) | f
[6,8) | t
[8,9) | f
[9,11) | t
(6 rows)
Every two rows in conds are guaranteed to be disjoint, but conds may also be empty (in which case the output is just the input range and f), and each cond may overlap with the bound of the input range (as shown in the example above).
Which query can achieve this? This answer tells me how to handle the case where cond only has one row, but it may contain multiple rows for me.
You can use a brute force approach -- expand the desired range into individual elements. Check each of those, and then aggregate back down to ranges:
WITH conds(cond) AS (
SELECT '[3, 5)'::int4range
UNION ALL
SELECT '[6, 8)'::int4range
UNION ALL
SELECT '[9, 20)'::int4range
)
SELECT int4range(min(r.val), max(r.val) + 1), flag
FROM (SELECT gs.val, (c.cond IS NOT NULL) as flag,
ROW_NUMBER() OVER (PARTITION BY c.cond IS NULL ORDER BY gs.val) as seqnum
FROM (VALUES ('[1, 11)'::int4range)) v(range) CROSS JOIN
generate_series(lower(v.range), upper(v.range) - 1, 1) gs(val) LEFT JOIN
conds c
ON gs.val <@ c.cond
) r
GROUP BY flag, r.val - seqnum
ORDER BY min(r.val);
Here is a db<>fiddle.
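The grouping key r.val - seqnum is what turns individual integers back into ranges: within a run of consecutive values, the value and its row number grow in step, so their difference stays constant. A small PostgreSQL sketch with made-up values (the six integers covered by the sample conds inside [1, 11)):

```sql
SELECT val,
       ROW_NUMBER() OVER (ORDER BY val)       AS seqnum,
       val - ROW_NUMBER() OVER (ORDER BY val) AS grp
FROM (VALUES (3), (4), (6), (7), (9), (10)) AS t(val)
ORDER BY val;
-- val: 3 4 6 7 9 10
-- grp: 2 2 3 3 4 4   (one distinct grp per consecutive run)
```

Grouping by grp and taking min(val) and max(val) + 1 then yields [3,5), [6,8) and [9,11).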
You can also generate the covered and uncovered subranges separately, fuse them together with UNION, and put them in the correct order with ORDER BY:
WITH conds(cond) AS (
SELECT '[3, 5)'::int4range
UNION
SELECT '[6, 8)'::int4range
UNION
SELECT '[9, 20)'::int4range
),
intersections(subrange) AS (
SELECT cond * '[1, 11)'::int4range
FROM conds
WHERE cond && '[1, 11)'::int4range
),
fusion(s, covered) AS (
SELECT int4range(LAG(UPPER(subrange), 1, LOWER('[1, 11)'::int4range)) OVER (ORDER BY LOWER(subrange)),
LOWER(subrange)),
false
FROM intersections
UNION
SELECT subrange,
true
FROM intersections
)
SELECT s, covered
FROM fusion
ORDER BY LOWER(s)
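For readers less familiar with PostgreSQL's range operators, here is a short sketch of the ones that matter for this problem: && (overlap), * (intersection) and <@ (element contained in range). The literal ranges are taken from the question.

```sql
SELECT '[3,5)'::int4range && '[1,11)'::int4range AS overlaps,      -- t
       '[9,20)'::int4range * '[1,11)'::int4range AS intersection,  -- [9,11)
       4 <@ '[3,5)'::int4range                   AS contains_4,    -- t
       5 <@ '[3,5)'::int4range                   AS contains_5;    -- f (upper bound is exclusive)
```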

Count Similar Substrings SQL query

I've tried a few scenarios and googled a lot, but still can't find a solution.
I have a table of user names with entries something like the below:
UserName
Cakes420
18Jack01
18Jack04
16Jack22
22Jack16
Mapple7609
Chrom44
chrom22
chrom77
013Cake
016Cake
122Cake
123Cake87
So I need a query that checks for all records that share 4 or more (in sequence) characters in the table.
So I need to return something like :
Characters  Times Used  Names Sharing
----------  ----------  ----------------------------------------------
Cake        5           Cakes420, 013Cake, 016Cake, 122Cake, 123Cake87
Chro        3           Chrom44, chrom22, chrom77
or anything similar as I'd prefer not to repeat patterns, but hey, at this stage if it returns the values properly, I don't mind.
The shared characters can naturally appear in any place in the string, which is what makes this so difficult.
Should you do this in T-SQL? Probably not.
Can you do this in T-SQL? Yes.
Sample data
create table Names
(
Name nvarchar(20)
);
insert into Names (Name) values
('Cakes420'),
('18Jack01'),
('18Jack04'),
('16Jack22'),
('22Jack16'),
('Mapple7609'),
('Chrom44'),
('chrom22'),
('chrom77'),
('013Cake'),
('016Cake'),
('122Cake'),
('123Cake87');
Solution
Using STRING_AGG() for easy concatenation. Available from SQL Server 2017. Alternatives available for older SQL versions (use the search box on this site, there are many examples).
with rcte as
(
select n.Name,
convert(nvarchar(4), substring(n.Name, 1, 4)) as Part,
1 as PartFrom
from Names n
where len(n.Name) >= 4
union all
select r.Name,
convert(nvarchar(4), substring(r.Name, r.PartFrom+1, 4)), -- third argument of substring is a length
r.PartFrom+1
from rcte r
where len(r.Name) >= r.PartFrom+4
),
cte_count as
(
select r.Part,
count(1) as PartCount
from rcte r
where r.Part not like '%[0-9]%' -- exclude parts with numbers in them
group by r.Part
having count(1) > 1
)
select c.Part,
c.PartCount,
string_agg(r.Name, ', ') as Names
from cte_count c
join rcte r
on r.Part = c.Part
group by c.Part,
c.PartCount
order by c.Part;
Result
Part PartCount Names
---- --------- ----------------------------------------------
Cake 5 Cakes420, 123Cake87, 122Cake, 016Cake, 013Cake
Chro 3 Chrom44, chrom22, chrom77
hrom 3 chrom77, chrom22, Chrom44
Jack 4 22Jack16, 16Jack22, 18Jack04, 18Jack01
Fiddle to see it in action with the intermediate CTE results.
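For SQL Server versions before 2017, one commonly used substitute for STRING_AGG() is FOR XML PATH('') combined with STUFF(). A minimal sketch, assuming a hypothetical table variable @Names holding three of the sample names:

```sql
-- Hypothetical sample rows, not the full Names table
DECLARE @Names TABLE (Name nvarchar(20));
INSERT INTO @Names VALUES (N'013Cake'),(N'016Cake'),(N'122Cake');

SELECT STUFF(
         (SELECT ', ' + n.Name
          FROM @Names n
          ORDER BY n.Name
          FOR XML PATH(''), TYPE).value('.', 'nvarchar(max)'),
         1, 2, '') AS Names;   -- STUFF strips the leading ', '
-- Names: 013Cake, 016Cake, 122Cake
```

In the full query this would become a correlated subquery per Part in place of the string_agg() call.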
Let's use Itzik Ben-Gan's Tally Function to break out a list of substrings, then group them. This is called N-Gram, after the more common Trigram which is 3-character substrings.
I've removed one extra cross-join from the function to speed it up slightly, it's now good for up to varchar(65536):
CREATE OR ALTER FUNCTION dbo.GetNums(@num AS BIGINT)
RETURNS TABLE
AS
RETURN
WITH
L0 AS ( SELECT 1 AS c
FROM (VALUES(1),(1),(1),(1),(1),(1),(1),(1),
(1),(1),(1),(1),(1),(1),(1),(1)) AS D(c) ),
L1 AS ( SELECT 1 AS c FROM L0 AS A CROSS JOIN L0 AS B ),
L2 AS ( SELECT 1 AS c FROM L1 AS A CROSS JOIN L1 AS B ),
Nums AS ( SELECT ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) AS rownum
FROM L2 )
SELECT TOP(@num)
rownum AS rn
FROM Nums
ORDER BY rownum;
GO
DECLARE @substringLen int = 4;
SELECT
Characters,
[Times Used] = COUNT(*),
[Names Sharing] = STRING_AGG(Username, ', ')
FROM (
SELECT DISTINCT
-- remove DISTINCT if you want to know about multiple in a single username
t.Username,
Characters = SUBSTRING(t.Username, n.rn, @substringLen)
FROM myTable t
CROSS APPLY dbo.GetNums (LEN(t.UserName) - @substringLen + 1) n
) t
) t
GROUP BY t.Characters
HAVING COUNT(*) > 1
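To make the N-gram step concrete, here is a small sketch that lists the 4-character substrings of a single name. The inline VALUES list stands in for the tally function and the variable @name is an assumption for illustration.

```sql
DECLARE @name nvarchar(20) = N'Cakes420';

SELECT n.rn, SUBSTRING(@name, n.rn, 4) AS Part
FROM (VALUES (1),(2),(3),(4),(5),(6),(7),(8),(9),(10),
             (11),(12),(13),(14),(15),(16),(17)) AS n(rn)
WHERE n.rn <= LEN(@name) - 4 + 1   -- last start position that still leaves 4 characters
ORDER BY n.rn;
-- Part: Cake, akes, kes4, es42, s420
```

A name of length L yields L - 3 such substrings, which is why the queries above skip or return nothing for names shorter than 4 characters.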

SQL unique combinations

I have a table with three columns with an ID, a therapeutic class, and then a generic name. A therapeutic class can be mapped to multiple generic names.
ID therapeutic_class generic_name
1 YG4 insulin
1 CJ6 maleate
1 MG9 glargine
2 C4C diaoxy
2 KR3 supplies
3 YG4 insulin
3 CJ6 maleate
3 MG9 glargine
I need to first look at the individual combinations of therapeutic class and generic name and then want to count how many patients have the same combination. I want my output to have three columns: one being the combo of generic names, the combo of therapeutic classes and the count of the number of patients with the combination like this:
Count Combination_generic combination_therapeutic
2 insulin, maleate, glargine YG4, CJ6, MG9
1 supplies, diaoxy C4C, KR3
One way to match patients by the sets of pairs (therapeutic_class, generic_name) is to create the comma-separated strings in your desired output, and to group by them and count. To do this right, you need a way to identify the pairs. See my Comment under the original question and my Comments to Gordon's Answer to understand some of the issues.
I do this identification in some preliminary work in the solution below. As I mentioned in my Comment, it would be better if the pairs and unique ID's existed already in your data model; I create them on the fly.
Important note: This assumes the comma-separated lists don't become too long. If you exceed 4000 characters (or approx. 32000 characters in Oracle 12, with certain options turned on), you CAN aggregate the strings into CLOBs, but you CAN'T GROUP BY CLOBs (in general, not just in this case), so this approach will fail. A more robust approach is to match the sets of pairs, not some aggregation of them. The solution is more complicated, I will not cover it unless it is needed in your problem.
with
-- Begin simulated data (not part of the solution)
test_data ( id, therapeutic_class, generic_name ) as (
select 1, 'GY6', 'insulin' from dual union all
select 1, 'MH4', 'maleate' from dual union all
select 1, 'KJ*', 'glargine' from dual union all
select 2, 'GY6', 'supplies' from dual union all
select 2, 'C4C', 'diaoxy' from dual union all
select 3, 'GY6', 'insulin' from dual union all
select 3, 'MH4', 'maleate' from dual union all
select 3, 'KJ*', 'glargine' from dual
),
-- End of simulated data (for testing purposes only).
-- SQL query solution continues BELOW THIS LINE
valid_pairs ( pair_id, therapeutic_class, generic_name ) as (
select rownum, therapeutic_class, generic_name
from (
select distinct therapeutic_class, generic_name
from test_data
)
),
first_agg ( id, tc_list, gn_list ) as (
select t.id,
listagg(p.therapeutic_class, ',') within group (order by p.pair_id),
listagg(p.generic_name , ',') within group (order by p.pair_id)
from test_data t join valid_pairs p
on t.therapeutic_class = p.therapeutic_class
and t.generic_name = p.generic_name
group by t.id
)
select count(*) as cnt, tc_list, gn_list
from first_agg
group by tc_list, gn_list
;
Output:
CNT TC_LIST GN_LIST
--- ------------------ ------------------------------
1 GY6,C4C supplies,diaoxy
2 GY6,KJ*,MH4 insulin,glargine,maleate
You are looking for listagg() and then another aggregation. I think:
select therapeutics, generics, count(*)
from (select id, listagg(therapeutic_class, ', ') within group (order by therapeutic_class) as therapeutics,
listagg(generic_name, ', ') within group (order by generic_name) as generics
from t
group by id
) t
group by therapeutics, generics;
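Sorting the two lists independently loses the information about which class goes with which name. One possible variation, shown here only as a sketch, is to aggregate each (therapeutic_class, generic_name) pair as a single string. The test_data rows are made up, and the ':' separator is an assumption that only works if it never occurs in the data.

```sql
with test_data (id, therapeutic_class, generic_name) as (
  select 1, 'YG4', 'insulin' from dual union all
  select 1, 'CJ6', 'maleate' from dual union all
  select 2, 'CJ6', 'maleate' from dual union all
  select 2, 'YG4', 'insulin' from dual
),
per_id (id, pair_list) as (
  -- one string per patient, pairs kept together and sorted deterministically
  select id,
         listagg(therapeutic_class || ':' || generic_name, ',')
           within group (order by therapeutic_class, generic_name)
  from test_data
  group by id
)
select pair_list, count(*) as cnt
from per_id
group by pair_list;
-- CJ6:maleate,YG4:insulin   2
```

The same length limit on LISTAGG results mentioned in the first answer applies here as well.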

Finding nearest dates in SQL

I know that there are some threads on this subject, however, my query is slightly different to what I've seen and the solutions presented before don't seem to be working for me.
I have two tables, X and Y, here simplified to one ID, in fact of course I have multiple IDs. The period category lasts from the Date given to the beginning of the next period.
ID Date Period
A 12/01/2010 1
A 12/03/2010 2
A 15/06/2010 3
A 17/08/2010 4
A 20/10/2010 5
and
ID SampleDate
A 20/01/2010
A 25/01/2010
A 21/11/2010
What I need to get is:
ID SampleDate Period
A 20/01/2010 1
A 25/01/2010 1
A 21/11/2010 5
I've tried this:
with cte as
(
select
Y.ID,
Y.sampleDate,
X.Period,
ROW_NUMBER() over (PARTITION by Y.ID, Y.sampleDate order by DATEDIFF(day,X.Date, Y.sampleDate)) as DaysSince
from X
left join Y
on X.ID=Y.ID
)
select ID,
sampleDate,
Period
from cte
where DaysSince=1
This produces a table of the correct size, but instead of giving the respective periods for the samples, it just prints out the top period number for all of them (for a given ID).
Any idea where I'm making a mistake?
Nothing in your query removes entries with a negative DATEDIFF, so add that condition to the join:
with cte as
(
select
Y.ID,
Y.sampleDate,
X.Period,
ROW_NUMBER() over (PARTITION by Y.ID, Y.sampleDate order by DATEDIFF(day,X.Date, Y.sampleDate)) as DaysSince
from X
left join Y
on X.ID=Y.ID and X.Date < Y.sampleDate /* skip periods after the one we're interested in */
)
select ID,
sampleDate,
Period
from cte
where DaysSince=1
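An alternative way to express "the latest period that started on or before the sample date" is to start from Y and pick one row from X with OUTER APPLY. This is a different technique from the ROW_NUMBER() query above, given as a sketch; the table variables @X and @Y are assumptions that mirror the sample data, with dates written in ISO format.

```sql
DECLARE @X TABLE (ID char(1), [Date] date, Period int);
DECLARE @Y TABLE (ID char(1), SampleDate date);
INSERT INTO @X VALUES ('A','2010-01-12',1),('A','2010-03-12',2),
                      ('A','2010-06-15',3),('A','2010-08-17',4),('A','2010-10-20',5);
INSERT INTO @Y VALUES ('A','2010-01-20'),('A','2010-01-25'),('A','2010-11-21');

SELECT y.ID, y.SampleDate, p.Period
FROM @Y y
OUTER APPLY (
    -- latest period for this ID that began on or before the sample date
    SELECT TOP (1) x.Period
    FROM @X x
    WHERE x.ID = y.ID AND x.[Date] <= y.SampleDate
    ORDER BY x.[Date] DESC
) p
ORDER BY y.SampleDate;
-- Period: 1, 1, 5
```

Samples taken before the first period keep a NULL Period instead of disappearing, which may or may not be what you want.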

Sorting twice on same column

I have a bit of a weird question, given to me by a client.
He has a list of data, with a date between parentheses like so:
Foo (14/08/2012)
Bar (15/08/2012)
Bar (16/09/2012)
Xyz (20/10/2012)
However, he wants the list to be displayed as follows:
Foo (14/08/2012)
Bar (16/09/2012)
Bar (15/08/2012)
Xyz (20/10/2012)
(notice that the second Bar has moved up one position)
So, the logic behind it is, that the list has to be sorted by date ascending, EXCEPT when two rows have the same name ('Bar'). If they have the same name, it must be sorted with the LATEST date at the top, while staying in the other sorting order.
Is this even remotely possible? I've experimented with a lot of ORDER BY clauses, but couldn't find the right one. Does anyone have an idea?
I should have specified that this data comes from a table in a sql server database (the Name and the date are in two different columns). So I'm looking for a SQL-query that can do the sorting I want.
(I've dumbed this example down quite a bit, so if you need more context, don't hesitate to ask)
This works, I think
declare @t table (data varchar(50), date datetime)
insert @t
values
('Foo','2012-08-14'),
('Bar','2012-08-15'),
('Bar','2012-09-16'),
('Xyz','2012-10-20')
select t.*
from @t t
inner join (select data, COUNT(*) cg, MAX(date) as mg from @t group by data) tc
on t.data = tc.data
order by case when cg>1 then mg else date end, date desc
produces
data date
---------- -----------------------
Foo 2012-08-14 00:00:00.000
Bar 2012-09-16 00:00:00.000
Bar 2012-08-15 00:00:00.000
Xyz 2012-10-20 00:00:00.000
A way with better performance than any of the other posted answers is to just do it entirely with an ORDER BY and not a JOIN or using CTE:
DECLARE @t TABLE (myData varchar(50), myDate datetime)
INSERT INTO @t VALUES
('Foo','2012-08-14'),
('Bar','2012-08-15'),
('Bar','2012-09-16'),
('Xyz','2012-10-20')
SELECT *
FROM @t t1
ORDER BY (SELECT MIN(t2.myDate) FROM @t t2 WHERE t2.myData = t1.myData), T1.myDate DESC
This does exactly what you request and will work with any indexes and much better with larger amounts of data than any of the other answers.
Additionally it's much more clear what you're actually trying to do here, rather than masking the real logic with the complexity of a join and checking the count of joined items.
This one uses analytic functions to perform the sort, it only requires one SELECT from your table.
The inner query finds gaps, where the name changes. These gaps are used to identify groups in the next query, and the outer query does the final sorting by these groups.
I have tried it here (SQL Fiddle) with extended test-data.
SELECT name, dat
FROM (
SELECT name, dat, SUM(gap) over(ORDER BY dat, name) AS grp
FROM (
SELECT name, dat,
CASE WHEN LAG(name) OVER (ORDER BY dat, name) = name THEN 0 ELSE 1 END AS gap
FROM t
) x
) y
ORDER BY grp, dat DESC
Extended test-data
('Bar','2012-08-12'),
('Bar','2012-08-11'),
('Foo','2012-08-14'),
('Bar','2012-08-15'),
('Bar','2012-08-16'),
('Bar','2012-09-17'),
('Xyz','2012-10-20')
Result
Bar 2012-08-12
Bar 2012-08-11
Foo 2012-08-14
Bar 2012-09-17
Bar 2012-08-16
Bar 2012-08-15
Xyz 2012-10-20
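To see how the groups come about, it can help to look at the intermediate gap and grp columns. The sketch below assumes SQL Server 2012 or later (for LAG) and uses a hypothetical table variable @t loaded with the extended test data.

```sql
DECLARE @t TABLE (name varchar(50), dat date);
INSERT INTO @t VALUES
('Bar','2012-08-12'),('Bar','2012-08-11'),('Foo','2012-08-14'),
('Bar','2012-08-15'),('Bar','2012-08-16'),('Bar','2012-09-17'),('Xyz','2012-10-20');

SELECT name, dat, gap,
       SUM(gap) OVER (ORDER BY dat, name) AS grp   -- running count of name changes
FROM (
    SELECT name, dat,
           -- 1 whenever the name differs from the previous row in date order
           CASE WHEN LAG(name) OVER (ORDER BY dat, name) = name THEN 0 ELSE 1 END AS gap
    FROM @t
) x
ORDER BY dat, name;
-- name: Bar Bar Foo Bar Bar Bar Xyz
-- gap:  1   0   1   1   0   0   1
-- grp:  1   1   2   3   3   3   4
```

Sorting by grp and then by dat descending reverses the dates inside each run of equal names while leaving the runs themselves in date order.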
I think that this works, including the case I asked about in the comments:
declare @t table (data varchar(50), [date] datetime)
insert @t
values
('Foo','20120814'),
('Bar','20120815'),
('Bar','20120916'),
('Xyz','20121020')
; With OuterSort as (
select *,ROW_NUMBER() OVER (ORDER BY [date] asc) as rn from @t
)
--Now we need to find contiguous ranges of the same data value, and the min and max row number for such a range
, Islands as (
select data,rn as rnMin,rn as rnMax from OuterSort os where not exists (select * from OuterSort os2 where os2.data = os.data and os2.rn = os.rn - 1)
union all
select i.data,rnMin,os.rn
from
Islands i
inner join
OuterSort os
on
i.data = os.data and
i.rnMax = os.rn-1
), FullIslands as (
select
data,rnMin,MAX(rnMax) as rnMax
from Islands
group by data,rnMin
)
select
*
from
OuterSort os
inner join
FullIslands fi
on
os.rn between fi.rnMin and fi.rnMax
order by
fi.rnMin asc,os.rn desc
It works by first computing the initial ordering in the OuterSort CTE. Then, using two CTEs (Islands and FullIslands), we compute the parts of that ordering in which the same data value appears in adjacent rows. Having done that, we can compute the final ordering by any value that all adjacent values will have (such as the lowest row number of the "island" that they belong to), and then within an "island", we use the reverse of the originally computed sort order.
Note that this may, though, not be too efficient for large data sets. On the sample data it shows up as requiring 4 table scans of the base table, as well as a spool.
Try something like...
ORDER BY CASE date
WHEN '14/08/2012' THEN 1
WHEN '16/09/2012' THEN 2
WHEN '15/08/2012' THEN 3
WHEN '20/10/2012' THEN 4
END
In MySQL, you can do:
ORDER BY FIELD(date, '14/08/2012', '16/09/2012', '15/08/2012', '20/10/2012')
In Postgres, you can create a function FIELD and do:
CREATE OR REPLACE FUNCTION field(anyelement, anyarray) RETURNS numeric AS $$
SELECT
COALESCE((SELECT i
FROM generate_series(1, array_upper($2, 1)) gs(i)
WHERE $2[i] = $1),
0);
$$ LANGUAGE SQL STABLE
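On PostgreSQL 9.5 or later, the built-in array_position() can probably play the same role without creating a function. A minimal sketch with the dates inlined as text values (the derived table t and its column d are made up for illustration):

```sql
SELECT d
FROM (VALUES ('14/08/2012'), ('15/08/2012'), ('16/09/2012'), ('20/10/2012')) AS t(d)
ORDER BY array_position(
           ARRAY['14/08/2012', '16/09/2012', '15/08/2012', '20/10/2012'], d);
-- d: 14/08/2012, 16/09/2012, 15/08/2012, 20/10/2012
```

Values missing from the array get NULL, which PostgreSQL sorts last in ascending order by default.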
If you do not want to use CASE, you can try to find an implementation of the FIELD function for SQL Server.