group rows in plain sql


I have a Table with columns Date and Number, like so:
date Number
1-1-2012 1
1-2-2012 1
1-3-2012 2
1-4-2012 1
I want to write a SQL query that groups rows with the same Number and takes the minimum date. Rows may only be grouped when the value of Number is the same as in the previous/next row. So the result is:
date Number
1-1-2012 1
1-3-2012 2
1-4-2012 1

try this:
WITH CTE AS(
SELECT * ,ROW_NUMBER() OVER (ORDER BY [DATE] ) -
ROW_NUMBER() OVER (PARTITION BY NUMBER ORDER BY [DATE] ) AS ROW_NUM
FROM TABLE1)
SELECT NUMBER,MIN(DATE) AS DATE
FROM CTE
GROUP BY ROW_NUM,NUMBER
ORDER BY DATE
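To see why subtracting the two row numbers works, here is a minimal sketch that runs the same query shape against the sample data. It assumes SQLite 3.25 or later (for window functions) driven from Python, ISO-formatted dates instead of the d-m-yyyy strings in the question, and hypothetical names readings, day and num. Consecutive rows with the same number share the same difference, so that difference identifies each run.

```python
import sqlite3

# Assumptions: SQLite >= 3.25, ISO dates, hypothetical table/column names.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (day TEXT, num INTEGER)")
conn.executemany(
    "INSERT INTO readings VALUES (?, ?)",
    [("2012-01-01", 1), ("2012-01-02", 1), ("2012-01-03", 2), ("2012-01-04", 1)],
)

islands = conn.execute(
    """
    WITH cte AS (
        SELECT day, num,
               -- overall position minus position within the same num:
               -- constant for every row of one consecutive run
               ROW_NUMBER() OVER (ORDER BY day)
             - ROW_NUMBER() OVER (PARTITION BY num ORDER BY day) AS grp
        FROM readings
    )
    SELECT MIN(day) AS first_day, num
    FROM cte
    GROUP BY grp, num
    ORDER BY first_day
    """
).fetchall()

print(islands)
```

On the four sample rows the differences are 0, 0, 2 and 1, which gives three groups and matches the result asked for in the question.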

SELECT Number, MIN(date)
FROM table
GROUP BY Number
ORDER BY Number
Since your requirement is a bit more specific, how about this? I have not tested it myself, but something like this might work given your requirement:
SELECT date, Number FROM (
SELECT Number,
(SELECT MIN(date) FROM table t2 WHERE t1.date <> t2.date AND t1.Number = t2.Number) AS date
FROM table t1
) AS a
GROUP BY number, date

Related

Counting ID's for correct creation date time

I need to get the number of user IDs for each month, but a user should only be counted in a month if that month is the user's earliest (minimum) month.
So if customer A had a min(day) of 04/18, they would be counted for that month and year only.
My table looks like:
monthyear | id
02/18 A32
04/19 T39
05/19 T39
04/19 Y95
01/18 A32
12/19 I99
11/18 OPT
09/19 TT8
I was doing something like:
SELECT day, id
SUM(CASE WHEN month = min(day) THEN 1 ELSE 0)
FROM testtable
GROUP BY 1
But I'm not sure how to specify that for each user ID, so only user ID = 1, when their min(Day) = day
Goal table to be:
monthyear | count
01/18 1
02/18 0
11/18 1
04/19 2
05/19 0
09/19 1
12/19 1

Use window functions. Let me assume that your monthyear is really yearmonth, so it sorts correctly:
SELECT yearmonth, COUNT(*) as numstarts
FROM (SELECT tt.*, ROW_NUMBER() OVER (PARTITION BY id ORDER BY yearmonth) as seqnum
FROM testtable tt
) tt
WHERE seqnum = 1
GROUP BY yearmonth;
If you do have the absurd format of month-year, then you can use string manipulations. These depend on the database, but something like this:
SELECT monthyear, COUNT(*) as numstarts
FROM (SELECT tt.*,
ROW_NUMBER() OVER (PARTITION BY id ORDER BY RIGHT(monthyear, 2), LEFT(monthyear, 2)) as seqnum
FROM testtable tt
) tt
WHERE seqnum = 1
GROUP BY monthyear;
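A minimal sketch of the string-manipulation variant, run against the sample data from the question. It assumes SQLite 3.25 or later driven from Python, where substr() stands in for RIGHT and LEFT; the two-digit year is sorted as text, which only works while all years fall in the same century.

```python
import sqlite3

# Assumptions: SQLite >= 3.25; substr() replaces RIGHT()/LEFT().
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE testtable (monthyear TEXT, id TEXT)")
conn.executemany(
    "INSERT INTO testtable VALUES (?, ?)",
    [("02/18", "A32"), ("04/19", "T39"), ("05/19", "T39"), ("04/19", "Y95"),
     ("01/18", "A32"), ("12/19", "I99"), ("11/18", "OPT"), ("09/19", "TT8")],
)

starts = conn.execute(
    """
    SELECT monthyear, COUNT(*) AS numstarts
    FROM (SELECT tt.*,
                 ROW_NUMBER() OVER (
                     PARTITION BY id
                     -- sort by year (chars 4-5), then month (chars 1-2)
                     ORDER BY substr(monthyear, 4, 2), substr(monthyear, 1, 2)
                 ) AS seqnum
          FROM testtable tt) AS firsts
    WHERE seqnum = 1
    GROUP BY monthyear
    ORDER BY substr(monthyear, 4, 2), substr(monthyear, 1, 2)
    """
).fetchall()

print(starts)
```

Months in which no user starts (02/18 and 05/19 here) do not appear in this result, because only each user's first row survives the seqnum filter.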

I assumed that you have a column that's a date (use of min() is necessary). You can do it by selecting the minimal date (subquery t2) for each id and then counting only the rows that match through the left join; where there is no match you get zeros for those dates (or monthyear values, as in your data).
select
monthyear
,count(t2.id) as cnt
from testtable t1
left join (
select
min(date) as date
,id
from testtable
group by id
) t2
on t2.date = t1.date
and t2.id = t1.id
group by monthyear

You are looking for the number of new users each month, yes?
Here is one way to do it.
Note that I had to use TO_DATE and TO_CHAR to make sure the month/year text strings sorted correctly. If you use real DATE columns that would be unnecessary.
An additional complexity was adding the empty months in (months with zero new users). Optimally that would not be done by using a SELECT DISTINCT on the base table to get all months.
create table x (
monthyear varchar2(20),
id varchar2(10)
);
insert into x values('02/18', 'A32');
insert into x values('04/19', 'T39');
insert into x values('05/19', 'T39');
insert into x values('04/19', 'Y95');
insert into x values('01/18', 'A32');
insert into x values('12/19', 'I99');
insert into x values('11/18', 'OPT');
insert into x values('09/19', 'TT8');
And the query:
with allmonths as(
select distinct monthyear from x
),
firstmonths as(
select id, to_char(min(to_date(monthyear, 'MM/YY')),'MM/YY') monthyear from x group by id
),
firstmonthcounts as(
select monthyear, count(*) cnt
from firstmonths group by monthyear
)
select am.monthyear, nvl(fmc.cnt, 0) as newusers
from allmonths am left join firstmonthcounts fmc on am.monthyear = fmc.monthyear
order by to_date(am.monthyear, 'MM/YY');
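For readers without an Oracle instance, the same idea can be sketched in SQLite from Python. This is an adaptation under stated assumptions, not the answer's exact query: TO_DATE/TO_CHAR are replaced by a sortable text key built with substr(), NVL is replaced by counting the joined id, and the key name yymm is made up for the example.

```python
import sqlite3

# Assumptions: SQLite port of the Oracle query; "yymm" is a hypothetical sort key.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE x (monthyear TEXT, id TEXT)")
conn.executemany(
    "INSERT INTO x VALUES (?, ?)",
    [("02/18", "A32"), ("04/19", "T39"), ("05/19", "T39"), ("04/19", "Y95"),
     ("01/18", "A32"), ("12/19", "I99"), ("11/18", "OPT"), ("09/19", "TT8")],
)

counts = conn.execute(
    """
    WITH allmonths AS (
        SELECT DISTINCT monthyear,
               substr(monthyear, 4, 2) || substr(monthyear, 1, 2) AS yymm
        FROM x
    ),
    firstmonths AS (
        -- earliest month per user, compared on the sortable key
        SELECT id, MIN(substr(monthyear, 4, 2) || substr(monthyear, 1, 2)) AS yymm
        FROM x
        GROUP BY id
    )
    SELECT am.monthyear, COUNT(fm.id) AS newusers
    FROM allmonths am
    LEFT JOIN firstmonths fm ON fm.yymm = am.yymm
    GROUP BY am.monthyear, am.yymm
    ORDER BY am.yymm
    """
).fetchall()

print(counts)
```

COUNT(fm.id) ignores the NULLs produced by the left join, so months with no new users come out as 0, which reproduces the goal table from the question.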

SQL - Filtering by closest date [duplicate]

Let's say I have a table:
SELECT * FROM table;
Ric Charge Date
VOD.L 2 20180601
VOD.L 5 20181002
VOD.L 4.5 20180212
RBS.L 3 20180504
RBS.L 6 20180708
How could I filter, by date, such that it will return ONLY the most recent charge
E.g.
Ric Charge Date
VOD.L 4.5 20180212
RBS.L 6 20180708

I would prefer simple top 1
select top 1 * from t order by date desc
Since you edited the question: you can use a correlated subquery, but your sample output is wrong.
select * from t
where t.date in (select max(t1.date) from t t1
where t1.Ric=t.Ric
)
Demo
Ric Charge Date
VOD.L 5.000 02/10/2018 00:00:00
RBS.L 6.000 08/07/2018 00:00:

Something like
SELECT * FROM table WHERE date = (SELECT MAX(date) FROM table)
does what you ask (and works in multiple databases).
Date is a keyword in SQL and makes for a poor column name. If you can, change it now. If you can't, you might need to quote it according to the database you have.

Order by the date and then use LIMIT 1 to only get one record.
SELECT *
FROM table
ORDER BY date DESC
LIMIT 1;

I would use row_number() with the WITH TIES clause:
select top (1) with ties t.*
from table t
order by row_number() over (partition by ric order by date desc);
If the date has ties then you can use dense_rank() instead of row_number().

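A self left join with an inequality would work. The join must also match on Ric, so that each row is only compared with later rows for the same Ric:
SELECT t1.*
FROM
table t1
LEFT JOIN table t2 ON
t1.Ric = t2.Ric
AND t1.date < t2.date
WHERE
t2.date IS NULL
;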

You can use a window function such as ROW_NUMBER to create a windowed order number, then you can extract only the record with row number = 1 (the latest date):
declare @tmp table (Ric varchar(50), Charge numeric(10,3), [Date] Date)
insert into @tmp values
('VOD.L', 2 , '20180601')
,('VOD.L', 5 , '20181002')
,('VOD.L', 4.5, '20180212')
,('RBS.L', 3 , '20180504')
,('RBS.L', 6 , '20180708')
select t.Ric, t.Charge, t.[Date]
from (
select ric,
Charge,
row_number() over (partition by ric order by [Date] desc) as rn,
[Date]
from @tmp
) t where rn = 1
From your question it is not clear what should happen when there are several rows with the same Ric/date values.

As an alternative, you could use not exists.
SELECT *
FROM TableA AS A
WHERE NOT EXISTS ( SELECT * FROM TableA AS A2
WHERE A.Ric = A2.Ric
AND A2.Date > A.Date )
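A minimal sketch of the NOT EXISTS approach against the sample data, assuming SQLite driven from Python and hypothetical names charges and charge_date (the column is renamed to avoid the Date keyword). A row is kept only if no other row for the same Ric has a later date.

```python
import sqlite3

# Assumptions: SQLite; hypothetical names "charges" and "charge_date".
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE charges (ric TEXT, charge REAL, charge_date TEXT)")
conn.executemany(
    "INSERT INTO charges VALUES (?, ?, ?)",
    [("VOD.L", 2, "20180601"), ("VOD.L", 5, "20181002"), ("VOD.L", 4.5, "20180212"),
     ("RBS.L", 3, "20180504"), ("RBS.L", 6, "20180708")],
)

latest = conn.execute(
    """
    SELECT a.ric, a.charge, a.charge_date
    FROM charges AS a
    WHERE NOT EXISTS (SELECT 1
                      FROM charges AS a2
                      WHERE a2.ric = a.ric
                        AND a2.charge_date > a.charge_date)
    ORDER BY a.ric
    """
).fetchall()

print(latest)
```

The yyyymmdd strings compare correctly as text. If two rows for one Ric share the latest date, both are returned, so a tie-breaker is needed when exactly one row per Ric is required.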

Per year one maximum date row according to previous row date

I have a table with two columns and I want to fetch six years of data according to these rules:
The first row is the row with the maximum date that is earlier than or equal to an input date (which I will pass in).
From the second row to the sixth row, I need the row with the maximum date that is earlier than the previously selected row's date. There should not be two rows for the same year; I need only the latest one relative to the previous row, but not in the same year.
declare @tbl table (id int identity, marketdate date )
insert into @tbl (marketdate)
values('2018-05-31'),
('2017-06-01'),
('2017-05-28'),
('2017-04-28'),
('2016-05-26'),
('2015-04-18'),
('2015-04-20'),
('2015-03-18'),
('2014-05-31'),
('2014-04-18'),
('2013-04-15')
output:
id marketdate
1 2018.05.31
3 2017.05.28
5 2016.05.27
7 2015.04.20
9 2014.04.18
10 2013.04.15

Can't you do this with a simple order by/desc?
SELECT TOP 6 id, max(marketdate) FROM tbl
WHERE tbl.marketdate <= @date
GROUP BY YEAR(marketdate), id, marketdate
ORDER BY YEAR(marketdate) DESC

Based purely on your "Output" given your sample data, I believe the following is what you are after (The max date for each distinct year of data):
SELECT TOP 6
max(marketdate),
Year(marketDate) as marketyear
FROM @tbl AS t
WHERE t.marketdate <= getdate()
GROUP BY YEAR(marketdate)
ORDER BY YEAR(marketdate) DESC;

You can use row_number if you are using SQL Server:
select top 6
id
, t.marketdate
from ( select rn = row_number() over (partition by year(marketdate)order by marketdate desc)
, id
, marketdate
from @tbl) as t
where t.rn = 1
order by t.marketdate desc

The following recursively searches for the next date, which must be at least one year earlier than the previous date.
Your parameterised start position goes where I chose 2018-06-01.
WITH
recursiveSearch AS
(
SELECT
id,
marketDate
FROM
(
SELECT
yourTable.id,
yourTable.marketDate,
ROW_NUMBER() OVER (ORDER BY yourTable.marketDate DESC) AS relative_position
FROM
yourTable
WHERE
yourTable.marketDate <= '2018-06-01'
)
search
WHERE
relative_position = 1
UNION ALL
SELECT
id,
marketDate
FROM
(
SELECT
yourTable.id,
yourTable.marketDate,
ROW_NUMBER() OVER (ORDER BY yourTable.marketDate DESC) AS relative_position
FROM
yourTable
INNER JOIN
recursiveSearch
ON yourTable.marketDate < DATEADD(YEAR, -1, recursiveSearch.marketDate)
)
search
WHERE
relative_position = 1
)
SELECT
*
FROM
recursiveSearch
WHERE
id IS NOT NULL
ORDER BY
recursiveSearch.marketDate DESC
OPTION
(MAXRECURSION 0)
http://sqlfiddle.com/#!18/56246/13
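The "at least one year earlier than the previous pick" rule can also be checked step by step. The sketch below is not the recursive CTE above: it swaps the recursion for a plain Python loop that issues one small query per pick, assuming SQLite with dates stored as ISO text. The function name yearly_picks and its max_rows parameter are made up for the example.

```python
import sqlite3

# Assumptions: SQLite, ISO date strings, ids assigned in insertion order.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl (id INTEGER PRIMARY KEY, marketdate TEXT)")
dates = ["2018-05-31", "2017-06-01", "2017-05-28", "2017-04-28", "2016-05-26",
         "2015-04-18", "2015-04-20", "2015-03-18", "2014-05-31", "2014-04-18",
         "2013-04-15"]
conn.executemany("INSERT INTO tbl (marketdate) VALUES (?)", [(d,) for d in dates])


def yearly_picks(start_date, max_rows=6):
    """Hypothetical helper: latest row on or before start_date, then repeatedly
    the latest row more than one year before the previous pick."""
    picks = []
    row = conn.execute(
        "SELECT id, marketdate FROM tbl WHERE marketdate <= ? "
        "ORDER BY marketdate DESC LIMIT 1",
        (start_date,),
    ).fetchone()
    while row is not None and len(picks) < max_rows:
        picks.append(row)
        row = conn.execute(
            "SELECT id, marketdate FROM tbl WHERE marketdate < date(?, '-1 year') "
            "ORDER BY marketdate DESC LIMIT 1",
            (row[1],),
        ).fetchone()
    return picks


picks = yearly_picks("2018-06-01")
print(picks)
```

With the sample data this selects the dates 2018-05-31, 2017-05-28, 2016-05-26, 2015-04-20, 2014-04-18 and 2013-04-15, which is what the one-year rule implies and which a plain "maximum date per calendar year" query does not produce (it would pick 2017-06-01 and 2014-05-31).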

How to get next minimum date that is not within 30 days and use as reference point in SQL?

I have a subset of records that look like this:
ID DATE
A 2015-09-01
A 2015-10-03
A 2015-10-10
B 2015-09-01
B 2015-09-10
B 2015-10-03
...
For each ID the first minimum date is the first index record. Now I need to exclude cases within 30 days of the index record, and any record with a date greater than 30 days becomes another index record.
For example, for ID A, 2015-09-01 and 2015-10-03 are both index records and would be retained since they are more than 30 days apart. 2015-10-10 would be dropped because it's within 30 days of the 2nd index case.
For ID B, 2015-09-10 would be dropped and would NOT be an index case because it's within 30 days of the 1st index record. 2015-10-03 would be retained because it's greater than 30 days of the 1st index record and would be considered the 2nd index case.
The output should look like this:
ID DATE
A 2015-09-01
A 2015-10-03
B 2015-09-01
B 2015-10-03
How do I do this in SQL server 2012? There's no limit to how many dates an ID can have, could be just 1 to as many as 5 or more. I'm fairly basic with SQL so any help would be greatly appreciated.

This works as in your example; #test is your table with data:
;with cte1
as
(
select
ID, Date,
row_number()over(partition by ID order by Date) groupID
from #test
),
cte2
as
(
select ID, Date, Date as DateTmp, groupID, 1 as getRow from cte1 where groupID=1
union all
select
c1.ID,
c1.Date,
case when datediff(Day, c2.DateTmp, c1.Date) > 30 then c1.Date else c2.DateTmp end as DateTmp,
c1.groupID,
case when datediff(Day, c2.DateTmp, c1.Date) > 30 then 1 else 0 end as getRow
from cte1 c1
inner join cte2 c2 on c2.groupID+1=c1.groupID and c2.ID=c1.ID
)
select ID, Date from cte2 where getRow=1 order by ID, Date
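The same walk can be tried outside SQL Server. The sketch below is an adaptation for SQLite 3.25 or later driven from Python, under these assumptions: julianday() differences stand in for DATEDIFF, the numbered rows are materialized into a temporary table before the recursion, and the names visits, ordered, walk, anchor and keep are made up for the example. Each step carries the current index date forward and replaces it only when a row is more than 30 days later.

```python
import sqlite3

# Assumptions: SQLite >= 3.25; hypothetical table and column names.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE visits (id TEXT, dt TEXT)")
conn.executemany(
    "INSERT INTO visits VALUES (?, ?)",
    [("A", "2015-09-01"), ("A", "2015-10-03"), ("A", "2015-10-10"),
     ("B", "2015-09-01"), ("B", "2015-09-10"), ("B", "2015-10-03")],
)

# Number the rows per id first, so the recursive step joins a plain table.
conn.execute(
    """
    CREATE TEMP TABLE ordered AS
    SELECT id, dt, ROW_NUMBER() OVER (PARTITION BY id ORDER BY dt) AS rn
    FROM visits
    """
)

index_rows = conn.execute(
    """
    WITH RECURSIVE walk(id, dt, anchor, rn, keep) AS (
        SELECT id, dt, dt, rn, 1 FROM ordered WHERE rn = 1
        UNION ALL
        SELECT o.id, o.dt,
               -- move the anchor only when the row is more than 30 days past it
               CASE WHEN julianday(o.dt) - julianday(w.anchor) > 30
                    THEN o.dt ELSE w.anchor END,
               o.rn,
               CASE WHEN julianday(o.dt) - julianday(w.anchor) > 30
                    THEN 1 ELSE 0 END
        FROM ordered AS o
        JOIN walk AS w ON o.id = w.id AND o.rn = w.rn + 1
    )
    SELECT id, dt FROM walk WHERE keep = 1 ORDER BY id, dt
    """
).fetchall()

print(index_rows)
```

For ID B, 2015-09-10 is dropped and does not become the anchor, so 2015-10-03 is measured against 2015-09-01 (32 days) and kept, as the question requires.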

select * from
(
select ID,DATE_, case when DATE_DIFF is null then 1 when date_diff>30 then 1 else 0 end comparison from
(
select ID, DATE_ ,DATE_-LAG(DATE_, 1) OVER (PARTITION BY ID ORDER BY DATE_) date_diff from trial
)
)
where comparison=1 order by ID,DATE_;
Tried in Oracle Database. Similar functions exist in SQL Server too.
I partition by the ID column and, ordering by the DATE_ field, compare the date in the current row with the date in the previous row. The very first row for a given ID returns null, and that first row is required in the output as the first index record. For all other rows, we return 1 when the date difference from the previous row is greater than 30.
Lag function in transact sql
Case function in transact sql

The logic explained in your question is inconsistent: in one place you say to compare against the first index record, and in another you compare against the immediately preceding record.
This works for immediately preceding records:
with cte
as
(
select *, ROW_NUMBER() over (partition by id order by datee) as rownum
from #test
)
select *,datediff(day,beforedate,datee)
from cte t1
cross apply
(Select isnull(max(Datee),t1.datee) as beforedate from cte t2 where t1.id =t2.id and t2.rownum<t1.rownum) b
where datediff(day,beforedate,datee)= 0 or datediff(day,beforedate,datee)>=30
This works for constant base record:
select *,datediff(day,basedate,datee) from #test t1
cross apply
(select min(Datee) as basedate from #test t2 where t1.id=t2.id)b
where datediff(day,basedate,datee)>=30 or datediff(day,basedate,datee)=0

Try this solution.
with diffs as (
select t1.id,t1.dt strtdt,t2.dt enddt,datediff(dd,t1.dt,t2.dt) daysdiff
from t t1
join t t2 on t1.id=t2.id and t1.dt<t2.dt
)
, y as (
select id,strtdt,enddt
from (
select id,strtdt,enddt,row_number() over(partition by id,strtdt order by daysdiff) as rn
from diffs
where daysdiff > 30
) x
where rn=1
)
,z as (
select *,coalesce(lag(enddt) over(partition by id order by strtdt),strtdt) prevend
from y)
select id,strtdt from z where strtdt=prevend
union
select id,enddt from z where strtdt=prevend

Select newest records that have distinct Name column

I did search around and I found this
SQL selecting rows by most recent date with two unique columns
Which is so close to what I want but I can't seem to make it work.
I get an error Column 'ID' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause.
I want the newest row by date for each Distinct Name
Select ID,Name,Price,Date
From table
Group By Name
Order By Date ASC
Here is an example of what I want
Table
ID Name Price Date
0 A 10 2012-05-03
1 B 9 2012-05-02
2 A 8 2012-05-04
3 C 10 2012-05-03
4 B 8 2012-05-01
desired result
ID Name Price Date
2 A 8 2012-05-04
3 C 10 2012-05-03
1 B 9 2012-05-02
I am using Microsoft SQL Server 2008

Select ID,Name, Price,Date
From temp t1
where date = (select max(date) from temp where t1.name =temp.name)
order by date desc
Here is a SQL Fiddle with a demo of the above
Or as Conrad points out you can use an INNER JOIN (another SQL Fiddle with a demo) :
SELECT t1.ID, t1.Name, t1.Price, t1.Date
FROM temp t1
INNER JOIN
(
SELECT Max(date) date, name
FROM temp
GROUP BY name
) AS t2
ON t1.name = t2.name
AND t1.date = t2.date
ORDER BY date DESC

There are a couple of ways to do this. This one uses ROW_NUMBER. Just partition by Name and then order the rows so that the one you want comes first in each partition.
WITH cte
AS (SELECT Row_number() OVER (partition BY NAME ORDER BY date DESC) RN,
id,
name,
price,
date
FROM table1)
SELECT id,
name,
price,
date
FROM cte
WHERE rn = 1
Note you should probably add ID (partition BY NAME ORDER BY date DESC, ID DESC) in your actual query as a tie-breaker for date
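A minimal sketch of the ROW_NUMBER approach with the ID tie-breaker applied, run against the sample data from the question. It assumes SQLite 3.25 or later driven from Python and hypothetical names prices and price_date (the column is renamed to avoid the Date keyword).

```python
import sqlite3

# Assumptions: SQLite >= 3.25; hypothetical names "prices" and "price_date".
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE prices (id INTEGER, name TEXT, price INTEGER, price_date TEXT)")
conn.executemany(
    "INSERT INTO prices VALUES (?, ?, ?, ?)",
    [(0, "A", 10, "2012-05-03"), (1, "B", 9, "2012-05-02"), (2, "A", 8, "2012-05-04"),
     (3, "C", 10, "2012-05-03"), (4, "B", 8, "2012-05-01")],
)

newest = conn.execute(
    """
    WITH cte AS (
        SELECT id, name, price, price_date,
               ROW_NUMBER() OVER (
                   PARTITION BY name
                   ORDER BY price_date DESC, id DESC  -- id breaks ties on date
               ) AS rn
        FROM prices
    )
    SELECT id, name, price, price_date
    FROM cte
    WHERE rn = 1
    ORDER BY price_date DESC
    """
).fetchall()

print(newest)
```

Because ROW_NUMBER never assigns the same number twice within a partition, this returns exactly one row per name even when two rows share the newest date.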

select * from (
Select
ID, Name, Price, Date,
Rank() over (partition by Name order by Date desc) RankOrder
From table
) T
where RankOrder = 1

I have found another memory-efficient (though probably crude) way that has worked for me in PostgreSQL. Order the query by name and then by date descending, then select the first record for each distinct name. The DISTINCT ON expression has to match the leftmost ORDER BY expression.
SELECT distinct on (Name) ID, Name, Price, Date from
table
order by Name, Date desc

Use Distinct instead of Group By
Select Distinct ID,Name,Price,Date
From table
Order By Date ASC
http://technet.microsoft.com/en-us/library/ms187831.aspx
