SQL group by data with row separate - sql

I would like to group by Customer & Date and generate count columns for 2 separate values (Flag=Y and Flag=N). Input table looks like this:
Customer Date Flag
------- ------- -----
001 201201 Y
001 201202 Y
001 201203 Y
001 201204 N
001 201205 N
001 201206 Y
001 201207 Y
001 201208 Y
001 201209 N
002 201201 N
002 201202 Y
002 201203 Y
002 201205 N
The output should look like this:
Customer MinDate MaxDate Count_Y
------- ------ ------- -------
001 201201 201203 3
001 201206 201208 3
002 201202 201203 2
How can I write the SQL query? Any kind of help is appreciated! Thanks!

You want to find consecutive values of "Y". This is a "gaps-and-islands" problem, and there are two basic approaches:
Determine the first "Y" in each group and use this information to define a group of consecutive "Y" values.
Use the difference of row_number() values for the calculation.
The first depends on SQL Server 2012+ and you haven't specified the version. So, the second looks like this:
select customer, min(date) as mindate, max(date) as maxdate,
count(*) as numYs
from (select t.*,
row_number() over (partition by customer order by date) as seqnum_cd,
row_number() over (partition by customer, flag order by date) as seqnum_cfd
from t
) t
where flag = 'Y'
group by customer, (seqnum_cd - seqnum_cfd), flag;
It is a little tricky to explain how this works. In my experience, thought, if you run the subquery, you will see how the seqnum columns are calculated and "get it" by observing the results.
Note: This assumes that there is at most one record per day. If there are more, you can use dense_rank() instead of row_number() for the same effect.

Try with the below query,which will give you exactly what you want.
DROP TABLE [GroupCustomer]
GO
CREATE TABLE [dbo].[GroupCustomer](
Customer VARCHAR(50),
[Date] [datetime] NULL,
Flag VARCHAR(1)
)
INSERT INTO [dbo].[GroupCustomer] (Customer ,[Date],Flag)
VALUES ('001','201201','Y'),('001','201202','Y'),
('001','201203','Y'),('001','201204','N'),
('001','201205','N'),('001','201206','Y'),
('001','201207','Y'),('001','201208','Y'),
('001','201209','N'),('002','201201','N'),
('002','201202','Y'),('002','201203','Y'),
('002','201205','N')
GO
;WITH cte_cnt
AS
(
SELECT Customer,Format(MIN([Date]),'yyMMdd') AS MinDate
,Format(MAX([Date]),'yyMMdd') AS MaxDate
, COUNT('A') AS Count_Y
FROM (
SELECT Customer,Flag,[Date],
ROW_NUMBER() OVER(Partition by customer ORDER BY [Date]) AS ROW_NUMBER,
DATEDIFF(D, ROW_NUMBER() OVER(Partition by customer ORDER BY [Date])
, [Date]) AS Diff
FROM [GroupCustomer]
WHERE Flag='Y') AS dt
GROUP BY Customer,Flag, Diff )
SELECT *
FROM cte_cnt c
ORDER BY Customer
GO

Related

How to get ' COUNT DISTINCT' over moving window

I'm working on a query to compute the distinct users of particular features of an app within a moving window. So, if there's a range from 15-20th October, I want a query to go from 8-15 Oct, 9-16 Oct etc and get the count of distinct users per feature. So for each date, it should have x rows where x is the number of features.
I have a query the following query so far:
WITH V1(edate, code, total) AS
(
SELECT date, featurecode,
DENSE_RANK() OVER ( PARTITION BY (featurecode ORDER BY accountid ASC) + DENSE_RANK() OVER ( PARTITION BY featurecode ORDER By accountid DESC) - 1
FROM....
GROUP BY edate, featurecode, appcode, accountid
HAVING appcode='sample' AND eventdate BETWEEN '15-10-2018' And '20-10-2018'
)
Select distinct date, code, total
from V1
WHERE date between '2018-10-15' AND '2018-10-20'
This returns the same set of values for all the dates. Is there any way to do this efficiently?? It's a DB2 database by the way but I'm looking for insight from postgresql users too.
Present result- All the totals are being repeated.
date code total
10/15/2018 appname-feature1 123
10/15/2018 appname-feature2 234
10/15/2018 appname-feature3 321
10/16/2018 appname-feature1 123
10/16/2018 appname-feature2 234
10/16/2018 appname-feature3 321
Desired result.
date code total
10/15/2018 appname-feature1 123
10/15/2018 appname-feature2 234
10/15/2018 appname-feature3 321
10/16/2018 appname-feature1 212
10/16/2018 appname-feature2 577
10/16/2018 appname-feature3 2345
This is not easy to do efficiently. DISTINCT counts are't incrementally maintainable (unless you go down the route of in-exact DISTINCT counts such as HyperLogLog).
It is easy to code in SQL, and try the usual indexing etc to help.
It is (possibly) not possible, however, to code with OLAP functions.. not least because you can only use RANGE BETWEEN for SUM(), COUNT(), MAX() etc, but not RANK() or DENSE_RANK() ... so just use a traditional co-related sub-select
First some data
CREATE TABLE T(D DATE,F CHAR(1),A CHAR(1));
INSERT INTO T (VALUES
('2018-10-10','X','A')
, ('2018-10-11','X','A')
, ('2018-10-15','X','A')
, ('2018-10-15','X','A')
, ('2018-10-15','X','B')
, ('2018-10-15','Y','A')
, ('2018-10-16','X','C')
, ('2018-10-18','X','A')
, ('2018-10-21','X','B')
)
;
Now a simple select
WITH B AS (
SELECT DISTINCT D, F FROM T
)
SELECT D,F
, (SELECT COUNT(DISTINCT A)
FROM T
WHERE T.F = B.F
AND T.D BETWEEN B.D - 3 DAYS AND B.D + 4 DAYS
) AS DISTINCT_A_MOVING_WEEK
FROM
B
ORDER BY F,D
;
giving, e.g.
D F DISTINCT_A_MOVING_WEEK
---------- - ----------------------
2018-10-10 X 1
2018-10-11 X 2
2018-10-15 X 3
2018-10-16 X 3
2018-10-18 X 3
2018-10-21 X 2
2018-10-15 Y 1

How to give the serial number if data is repeating

if My table has this values i need to generate seqno column
ClientId clinetLocation seqno
001 Abc 1
001 BBc 2
001 ccd 3
002 Abc 1
002 BBc 2
003 ccd 1
You are looking for the row_number() function:
select ClientId, clinetLocation,
row_number() over (partition by ClientId order by clinetLocation) as seqnum
from t;
This is a standard function available in most databases.
One option would be counting the grouped rows with respect to those columns :
select count(1) over ( order by ClientId, ClientLocation ) as seqno,
ClientId, ClientLocation
from tab
group by ClientId, ClientLocation;
where ClientId & ClientLocation combination seems unique.
Rextester Demo

Select Most Recent Entry in SQL

I'm trying to select the most recent non zero entry from my data set in SQL. Most examples of this are satisfied with returning only the date and the group by variables, but I would also like to return the relevant Value. For example:
ID Date Value
----------------------------
001 2014-10-01 32
001 2014-10-05 10
001 2014-10-17 0
002 2014-10-03 17
002 2014-10-20 60
003 2014-09-30 90
003 2014-10-10 7
004 2014-10-06 150
005 2014-10-17 0
005 2014-10-18 9
Using
SELECT ID, MAX(Date) AS MDate FROM Table WHERE Value > 0 GROUP BY ID
Returns:
ID Date
-------------------
001 2014-10-05
002 2014-10-20
003 2014-10-10
004 2014-10-06
005 2014-10-18
But whenever I try to include Value as one of the selected variables, SQLServer results in an error:
"Column 'Value' is invalid in the select list because it is not
contained in either an aggregate function or the GROUP BY clause."
My desired result would be:
ID Date Value
----------------------------
001 2014-10-05 10
002 2014-10-20 60
003 2014-10-10 7
004 2014-10-06 150
005 2014-10-18 9
One solution I have thought of would be to look up the results back in the original Table and return the Value that corresponds to the relevant ID & Date (I have already trimmed down and so I know these are unique), but this seems to me like a messy solution. Any help on this would be appreciated.
NOTE: I do not want to group by Value as this is the result I am trying to pull out in the end (i.e. for each ID, I want the most recent Value). Further Example:
ID Date Value
----------------------------
001 2014-10-05 10
001 2014-10-06 10
001 2014-10-10 10
001 2014-10-12 8
001 2014-10-18 0
Here, I only want the last non zero entry. (001, 2014-10-12, 8)
SELECT ID, MAX(Date) AS MDate, Value FROM Table WHERE Value > 0 GROUP BY ID, Value
Would return:
ID Date Value
----------------------------
001 2014-10-10 10
001 2014-10-12 8
This can also be done using a window function which is very ofter faster than a join on a grouped query:
select id, date, value
from (
select id,
date,
value,
row_number() over (partition by id order by date desc) as rn
from the_table
) t
where rn = 1
order by id;
Assuming you don't have repeated dates for the same ID in the table, this should work:
SELECT A.ID, A.Date, A.Value
FROM
T1 AS A
INNER JOIN (SELECT ID,MAX(Date) AS Date FROM T1 WHERE Value > 0 GROUP BY ID) AS B
ON A.ID = B.ID AND A.Date = B.Date
select a.id, a.date, a.value from Table1 a inner join (
select id, max(date) mydate from table1
where Value>0 group by ID) b on a.ID=b.ID and a.Date=b.mydate
Using Subqry,
SELECT ID, Date AS MDate, VALUE
FROM table t1
where date = (Select max(date)
from table t2
where Value >0
and t1.id = t2.id
)
Answers provided are perfectly adequate, but Using CTE:
;WITH cteTable
AS
(
SELECT
Table.ID [ID], MAX(Date) [MaxDate]
FROM
Table
WHERE
Table.Value > 0
GROUP BY
Table.ID
)
SELECT
cteTable.ID, cteTable.Date, Table.Value
FROM
Table INNER JOIN cteTable ON (Table.ID = cteTable.ID)

Create table with distinct values based on date

I have a table which fills up with lots of transactions monthly, like below.
Name ID Date OtherColumn
_________________________________________________
John Smith 11111 2012-11-29 Somevalue
John Smith 11111 2012-11-30 Somevalue
Adam Gray 22222 2012-12-11 Somevalue
Tim Blue 33333 2012-12-15 Somevalue
John NewName 11111 2013-01-01 Somevalue
Adam Gray 22222 2013-01-02 Somevalue
From this table i want to create a dimension table with the unique names and id's. The problem is that a person can change his/her name, like "John" in the example above. The Id's are otherwise always unique. In those cases I want to only use the newest name (the one with the latest date).
So that I end up with a table like this:
Name ID
______________________
John NewName 11111
Adam Gray 22222
Tim Blue 33333
How do I go about achieving this?
Can I do it in a single query?
Use a CTE for this. It simplifies ranking and window functions.
;WITH CTE as
(SELECT
RN = ROW_NUMBER() OVER (PARTITION BY ID ORDER BY [Date] DESC),
ID,
Name
FROM
YourTable)
SELECT
Name,
ID
FROM
CTE
WHERE
RN = 1
I think creating a table is a bad idea, but this is how you get the most recent name.
select name
from yourtable yt join
(select id, max(date) maxdate
from yourtable
group by id ) temp on temp.id = yt.id and yt.date = maxdate
JNK's CTE solution is an equivalent of the following.
SELECT
Name,
ID
FROM (
SELECT
RN = ROW_NUMBER() OVER (PARTITION BY ID ORDER BY [Date] DESC),
Name,
ID
FROM theTable
)
WHERE RN = 1
Trying to think a way to get rid of the partition function without introducing the possible duplicates.

How to select first 2 rows using group's

I have:
Table1
ID date amt
-------------------
001 21/01/2012 1200
001 25/02/2012 1400
001 24/03/2012 1500
001 21/04/2012 1000
002 21/03/2012 1200
002 01/01/2012 0500
002 08/09/2012 1000
.....
I want to select the first two rows from each group of ID ordered by date DESC from Table1.
Query looks like this:
SELECT TOP 2 DATE, ID, AMT FROM TABLE1 GROUP BY ID, AMT --(NOT WORKING)
Expected output:
ID date amt
-------------------
001 21/01/2012 1200
001 25/02/2012 1400
002 21/03/2012 1200
002 01/01/2012 0500
.....
you can take advantage of using Common table Expression and Window Function
WITH recordList
AS
(
SELECT ID, DATE, Amt,
DENSE_RANK() OVER (PARTITION BY ID ORDER BY DATE ASC) rn
FROM tableName
)
SELECT ID, DATE, Amt
FROM recordList
WHERE rn <= 2
SQLFiddle Demo
based on your desired result above, you are ordering the date by ASCENDING.
Ok, You can either use DENSER_RANK() or ROW_NUMBER() but in my answer, I've used DENSE_RANK() because I'm thinking of the duplicates. Anyway, it's the choice of the OP to use ROW_NUMBER() instead of DENSE_RANK().
TSQL Ranking Functions