SQL question how do I select rows with column condition values - sql

Say I have this table:
id timeline
---|--------|
1 | BASELINE |
1 | MIDTIME |
1 | ENDTIME |
2 | BASELINE |
2 | MIDTIME |
3 | BASELINE |
4 | BASELINE |
5 | BASELINE |
5 | MIDTIME |
5 | ENDTIME |
How do I get the output so that it looks like this:
id timeline
---|--------|
1 | BASELINE |
1 | MIDTIME |
2 | BASELINE |
2 | MIDTIME |
5 | BASELINE |
5 | MIDTIME |
How do I select the first two terms of each ID which has 2 timeline values?
Many Thanks

Here is a sample. I used ROW_NUMBER() function.
Hope to help, my friend :))
CREATE TABLE TestData(id int, timeline varchar(20))
insert into TestData
values (1, 'BaseLine'),
(1, 'Midtime'),
(1, 'Endtime'),
(2, 'BaseLine'),
(2, 'Midtime'),
(3, 'BaseLine'),
(4, 'BaseLine'),
(5, 'BaseLine'),
(5, 'Midtime'),
(5, 'Endtime')
-----
SELECT id, timeline
FROM (SELECT id, timeline,
ROW_NUMBER() OVER(PARTITION BY id ORDER BY (SELECT NULL)) AS rownum
FROM TestData
WHERE
id IN (
SELECT id FROM TestData group by id having count(id)>=2
)
) as T
WHERE rownum < 3

-- assume your table name is t
with d as (
select id
,timeline
,row_number() over(partition by id
order by case when timeline = 'BASELINE' then 1
when timeline = 'MIDTIME' then 2
when timeline = 'ENDTIME' then 3
end
) seq
,count(0) over(partition by id) cnt
from t
)
select id
,timeline
from d
where cnt >= 2
and seq <= 2
order
by id
,seq
If you run the query against your test data, you will get the exact result as you listed.
If you really want "select the first two terms of each ID which has more than 2 timeline values", you need to change "cnt >= 2" to "cnt > 2".

Related

How to create a field which shows a custom order based on two fields?

I am trying to create a field which shows an order based on two columns. I have one column with a code in and one with a date. There are many dates for each code, but I am trying to pick out the latest date for each code. The table below shows the two columns I have and the order column that I need to create.
code date order column
1 10/04/22 3
1 11/04/22 2
1 14/05/22 1
2 10/04/22 2
2 15/04/22 1
3 11/04/22 1
4 12/04/22 2
4 16/04/22 1
5 15/04/22 2
5 17/04/22 1
As Larnu and Sean have already stated, Row_number is your friend here.
Start with the data:
CREATE TABLE #Table (code int, date date)
INSERT INTO #table
VALUES
(1, '04/10/22')
,(1, '04/11/22')
,(1, '05/14/22')
,(2, '04/10/22')
,(2, '04/15/22')
,(3, '04/11/22')
,(4, '04/12/22')
,(4, '04/16/22')
,(5, '04/15/22')
,(5, '04/17/22');
Then we write the query with the row numbers. The magic here is in the partition by/order by. That partitions your data based on the code, so it takes the three 1s and puts the dates in descending order. It then numbers them 1, 2, 3 with the latest date being number 1. Then it does code 2...
SELECT code
, date
, ROW_NUMBER() OVER (PARTITION BY code ORDER BY date desc) rn
FROM #table
GROUP BY code, date
ORDER BY code asc, date asc;
And that gets us the result you asked for:
|code | date | rn |
|:----|:---------|:----|
| 1 |2022-04-10| 3 |
| 1 |2022-04-11| 2 |
| 1 |2022-05-14| 1 |
| 2 |2022-04-10| 2 |
| 2 |2022-04-15| 1 |
| 3 |2022-04-11| 1 |
| 4 |2022-04-12| 2 |
| 4 |2022-04-16| 1 |
| 5 |2022-04-15| 2 |
| 5 |2022-04-17| 1 |
And then if you only want the max date for each code... keep only the ones where row number equals 1.
WITH CTE AS
(SELECT code
, date
, ROW_NUMBER() OVER (PARTITION BY code ORDER BY date desc) rn
FROM #table
)
SELECT code
, date
FROM CTE
WHERE rn = 1
ORDER BY code
| code | date |
|:-----|:---------|
| 1 |2022-05-14|
| 2 |2022-04-15|
| 3 |2022-04-11|
| 4 |2022-04-16|
| 5 |2022-04-17|

SQL How to filter table with values having more than one unique value of another column

I have data table Customers that looks like this:
ID | Sequence No |
1 | 1 |
1 | 2 |
1 | 3 |
2 | 1 |
2 | 1 |
2 | 1 |
3 | 1 |
3 | 2 |
I would like to filter the table so that only IDs with more than 1 distinct count of Sequence No remain.
Expected output:
ID | Sequence No |
1 | 1 |
1 | 2 |
1 | 3 |
3 | 1 |
3 | 2 |
I tried
select ID, Sequence No
from Customers
where count(distinct Sequence No) > 1
order by ID
but I'm getting error. How to solve this?
You can get the desired result by using the below query. This is similar to what you were trying -
Sample Table & Data
Declare #Data table
(Id int, [Sequence No] int)
Insert into #Data
values
(1 , 1 ),
(1 , 2 ),
(1 , 3 ),
(2 , 1 ),
(2 , 1 ),
(2 , 1 ),
(3 , 1 ),
(3 , 2 )
Query
Select * from #Data
where ID in(
select ID
from #Data
Group by ID
Having count(distinct [Sequence No]) > 1
)
Using analytic functions, we can try:
WITH cte AS (
SELECT *, MIN([Sequence No]) OVER (PARTITION BY ID) min_seq,
MAX([Sequence No]) OVER (PARTITION BY ID) max_seq
FROM Customers
)
SELECT ID, [Sequence No]
FROM cte
WHERE min_seq <> max_seq
ORDER BY ID, [Sequence No];
Demo
We are checking for a distinct count of sequence number by asserting that the minimum and maximum sequence numbers are not the same for a given ID. The above query could benefit from the following index:
CREATE INDEX idx ON Customers (ID, [Sequence No]);
This would let the min and max values be looked up faster.

Grouping by column and rows

I have a table like this:
+----+--------------+--------+----------+
| id | name | weight | some_key |
+----+--------------+--------+----------+
| 1 | strawberries | 12 | 1 |
| 2 | blueberries | 7 | 1 |
| 3 | elderberries | 0 | 1 |
| 4 | cranberries | 8 | 2 |
| 5 | raspberries | 18 | 2 |
+----+--------------+--------+----------+
I'm looking for a generic request that would get me all berries where there are three entries with the same 'some_key' and one of the entries (within those three entries belonging to the same some_key) has the weight = 0
in case of the sample table, expected output would be:
1 strawberries
2 blueberries
3 cranberries
As you want to include non-grouped columns, I would approach this with window functions:
select id, name
from (
select id,
name,
count(*) over w as key_count,
count(*) filter (where weight = 0) over w as num_zero_weight
from fruits
window w as (partition by some_key)
) x
where x.key_count = 3
and x.num_zero_weight >= 1
The count(*) over w counts the number of rows in that group (= partition) and the count(*) filter (where weight = 0) over w counts how many of those have a weight of zero.
The window w as ... avoids repeating the same partition by clause for the window functions.
Online example: https://rextester.com/SGWFI49589
Try this-
SELECT some_key,
SUM(weight) --Sample aggregations on column
FROM your_table
GROUP BY some_key
HAVING COUNT(*) = 3 -- If you wants at least 3 then use >=3
AND SUM(CASE WHEN weight = 0 THEN 1 ELSE 0 END) >= 1
As per your edited question, you can try this below-
SELECT id, name
FROM your_table
WHERE some_key IN (
SELECT some_key
FROM your_table
GROUP BY some_key
HAVING COUNT(*) = 3 -- If you wants at least 3 then use >=3
AND SUM(CASE WHEN weight = 0 THEN 1 ELSE 0 END) >= 1
)
Try doing this.
Table structure and sample data
CREATE TABLE tmp (
id int,
name varchar(50),
weight int,
some_key int
);
INSERT INTO tmp
VALUES
('1', 'strawberries', '12', '1'),
('2', 'blueberries', '7', '1'),
('3', 'elderberries', '0', '1'),
('4', 'cranberries', '8', '2'),
('5', 'raspberries', '18', '2');
Query
SELECT t1.*
FROM tmp t1
INNER JOIN (SELECT some_key
FROM tmp
GROUP BY some_key
HAVING Count(some_key) >= 3
AND Min(Abs(weight)) = 0) t2
ON t1.some_key = t2.some_key;
Output
+-----+---------------+---------+----------+
| id | name | weight | some_key |
+-----+---------------+---------+----------+
| 1 | strawberries | 12 | 1 |
| 2 | blueberries | 7 | 1 |
| 3 | elderberries | 0 | 1 |
+-----+---------------+---------+----------+
Online Demo: http://sqlfiddle.com/#!15/70cca/26/0
Thank you, #mkRabbani for reminding me about the negative values.
Further reading
- ABS() Function - Link01, Link02
- HAVING Clause - Link01, Link02

Average of Days between ordered dates per group

+-------+-------+-----------+
| EmpID | PerID | VisitDate |
+-------+-------+-----------+
| 1 | 22 | 2/24/2017 |
| 1 | 22 | 3/25/2017 |
| 1 | 22 | 4/5/2017 |
| 2 | 33 | 5/6/2017 |
| 2 | 33 | 8/9/2017 |
| 2 | 33 | 6/7/2017 |
+-------+-------+-----------+
I am trying to find the latest visit date and average days between visits per EmpID. For Avg, I'll first have to order the days consecutively and then find the average.
Eg: Avg. days for EmpID=1 and PerID=22 would be [29(Days between 3/25 and 2/24) + 11 (Days between 3/25 and 4/5)/2] = 20 Days.
Desired Output:
+-------+-------+----------+----------+
| EmpID | PerID | MaxVDate | AvgVDays |
+-------+-------+----------+----------+
| 1 | 22 | 4/5/2017 | 20 |
| 2 | 33 | 8/9/2017 | 47.5 |
+-------+-------+----------+----------+
Attempt:
SELECT
EmpID
,PerID
,MAX(VisitDate) AS MaxVDate
,--Dunno how to find average AS AvgVDays
FROM
T1
GROUP BY
EmpID
,PerID
You can use lag to get the previous date and compute the date difference. Then use avg window function to get the average days.
Select distinct empid,perid,maxVdate,avg(diff_with_prev) OVER(Partition by empid) as avgVDays
from (
SELECT EmpID,PerID
,MAX(VisitDate) OVER(Partition BY EmpID) AS MaxVDate
,DATEDIFF(DAY,LAG(VisitDate) OVER(Partition BY EmpID order by VisitDate), VisitDate) as diff_with_prev
FROM T1
) t
Here's an option...
IF OBJECT_ID('tempdb..#TestData', 'U') IS NOT NULL
DROP TABLE #TestData;
CREATE TABLE #TestData (
EmpID INT NOT NULL,
PerID INT NOT NULL,
VisitDate DATE NOT NULL
);
INSERT #TestData (EmpID, PerID, VisitDate) VALUES
(1, 22, '2/24/2017'),
(1, 22, '3/25/2017'),
(1, 22, '4/5/2017'),
(2, 33, '5/6/2017'),
(2, 33, '8/9/2017'),
(2, 33, '6/7/2017');
-- SELECT * FROM #TestData td;
SELECT
db.EmpID,
db.PerID,
AvgDays = AVG(db.DaysBetween * 1.0)
FROM (
SELECT
*,
DaysBetween = DATEDIFF(dd, LAG(td.VisitDate, 1) OVER (PARTITION BY td.EmpID, td.PerID ORDER BY td.VisitDate), td.VisitDate)
FROM
#TestData td
) db
GROUP BY
db.EmpID,
db.PerID;
Results...
EmpID PerID AvgDays
----------- ----------- ---------------------------------------
1 22 20.000000
2 33 47.500000
The task is much easier than you think. You get the average with (last visit - first visit) / (count visits - 1).
select
empid,
perid,
max(VisitDate) as MaxVDate,
datediff(day, min(VisitDate), max(VisitDate)) * 1.0 / (count(*) - 1) as avgvdays
from mytable
group by empid, perid
having count(*) > 1
order by empid, perid;
The multiplication with 1.0 is necessary in order to avoid integer division. (You could also cast to decimal instead.)
As the calcualtion only makes sense for empid/perid pairs with more than one entry (and in order to avoid division by zero), I have applied an according HAVING clause.
Here is a test: http://rextester.com/AIFPA62612

Selecting row with highest ID based on another column

In SQL Server 2008 R2, suppose I have a table layout like this...
+----------+---------+-------------+
| UniqueID | GroupID | Title |
+----------+---------+-------------+
| 1 | 1 | TEST 1 |
| 2 | 1 | TEST 2 |
| 3 | 3 | TEST 3 |
| 4 | 3 | TEST 4 |
| 5 | 5 | TEST 5 |
| 6 | 6 | TEST 6 |
| 7 | 6 | TEST 7 |
| 8 | 6 | TEST 8 |
+----------+---------+-------------+
Is it possible to select every row with the highest UniqueID number, for each GroupID. So according to the table above - if I ran the query, I would expect this...
+----------+---------+-------------+
| UniqueID | GroupID | Title |
+----------+---------+-------------+
| 2 | 1 | TEST 2 |
| 4 | 3 | TEST 4 |
| 5 | 5 | TEST 5 |
| 8 | 6 | TEST 8 |
+----------+---------+-------------+
Been chomping on this for a while, but can't seem to crack it.
Many thanks,
SELECT *
FROM (SELECT uniqueid, groupid, title,
Row_number()
OVER ( partition BY groupid ORDER BY uniqueid DESC) AS rn
FROM table) a
WHERE a.rn = 1
With SQL-Server as rdbms you can use a ranking function like ROW_NUMBER:
WITH CTE AS
(
SELECT UniqueID, GroupID, Title,
RN = ROW_NUMBER() OVER (PARTITON BY GroupID
ORDER BY UniqueID DESC)
FROM dbo.TableName
)
SELECT UniqueID, GroupID, Title
FROM CTE
WHERE RN = 1
This returns exactly one record for each GroupID even if there are multiple rows with the highest UniqueID (the name does not suggest so). If you want to return all rows in then use DENSE_RANK instead of ROW_NUMBER.
Here you can see all functions and how they work: http://technet.microsoft.com/en-us/library/ms189798.aspx
Since you have not mentioned any RDBMS, this statement below will work on almost all RDBMS. The purpose of the subquery is to get the greatest uniqueID for every GROUPID. To be able to get the other columns, the result of the subquery is joined on the original table.
SELECT a.*
FROM tableName a
INNER JOIN
(
SELECT GroupID, MAX(uniqueID) uniqueID
FROM tableName
GROUP By GroupID
) b ON a.GroupID = b.GroupID
AND a.uniqueID = b.uniqueID
In the case that your RDBMS supports Qnalytic functions, you can use ROW_NUMBER()
SELECT uniqueid, groupid, title
FROM
(
SELECT uniqueid, groupid, title,
ROW_NUMBER() OVER (PARTITION BY groupid
ORDER BY uniqueid DESC) rn
FROM tableName
) x
WHERE x.rn = 1
TSQL Ranking Functions
The ROW_NUMBER() generates sequential number which you can filter out. In this case the sequential number is generated on groupid and sorted by uniqueid in descending order. The greatest uniqueid will have a value of 1 in rn.
SELECT *
FROM the_table tt
WHERE NOT EXISTS (
SELECT *
FROM the_table nx
WHERE nx.GroupID = tt.GroupID
AND nx.UniqueID > tt.UniqueID
)
;
Should work in any DBMS (no window functions or CTEs are needed)
is probably faster than a sub query with an aggregate
Keeping it simple:
select * from test2
where UniqueID in (select max(UniqueID) from test2 group by GroupID)
Considering:
create table test2
(
UniqueID numeric,
GroupID numeric,
Title varchar(100)
)
insert into test2 values(1,1,'TEST 1')
insert into test2 values(2,1,'TEST 2')
insert into test2 values(3,3,'TEST 3')
insert into test2 values(4,3,'TEST 4')
insert into test2 values(5,5,'TEST 5')
insert into test2 values(6,6,'TEST 6')
insert into test2 values(7,6,'TEST 7')
insert into test2 values(8,6,'TEST 8')