How to select extra columns while using group by clause? - sql

I have a table which contains data in this format.
productid filterName boolfilter numericfilter
1 X 1 NULL
1 Y NULL 99inch
1 Z 0 NULL
2 Y NULL 55kg
2 Y NULL 45kg
3 K NULL 20
3 M NULL 35
3 N NULL 25
4 X 1 NULL
4 K 1 NULL
I need data in this format.
Need products where only numeric filters are setup but no boolean filters
productid filterName numericfilter
2 Y 55kg
2 Y 45kg
3 K 20
3 M 35
3 N 25
I have written this query,
SELCT productid
FROM tbl_filters
GROUP BY productid
HAVING SUM(CAST(boolfilter AS INT)) IS NULL
I am getting prouctid 2 and 3, but i need the extra columns also as i have mentioned.
When i am using multiple columns in groupby clause i am not getting the required output.

SELECT t.productid, t.filterName, t.numericfilter
FROM Table_Name t
WHERE t.numericfilter IS NOT NULL
AND NOT EXISTS (SELECT 1
FROM TABLE_NAME
WHERE t.productid = productid
AND boolfilter IS NOT NULL)
Working SQL FIDDLE
| PRODUCTID | FILTERNAME | NUMERICFILTER |
|-----------|------------|---------------|
| 2 | Y | 55kg |
| 2 | Y | 45kg |
| 3 | K | 20 |
| 3 | M | 35 |
| 3 | N | 25 |

Use window functions instead:
SELECT productid, filterName, numericfilter
FROM (SELECT f.*,
MAX(boolfilter) OVER (PARTITION BY productid) as maxbf
FROM tbl_filters f
) f
WHERE maxbf is null;
Fiddle DEMO.
This calculates the maximum of boolfilter for each productid. If it is always NULL, then the result is NULL. Note that you don't need a cast() for this.

Related

SQL: Add values to STDEVP calculation

I have the following table.
Key | Count | Amount
----| ----- | ------
1 | 2 | 10
1 | 2 | 15
2 | 5 | 1
2 | 5 | 2
2 | 5 | 3
2 | 5 | 50
2 | 5 | 20
3 | 3 | 5
3 | 3 | 4
3 | 3 | 5
Sorry I couldn't figure out who to make the above a table.
I'm running this on SQL Server Management Studio 2012.
I'd like the stdevp return of the amount columns but if the number of records is less than some value 'x' (there will never be more than x records for a given key), then I want to add zeros to account for the remainder.
For example, if 'x' is 6:
for key 1, I need stdevp(10,5,0,0,0,0)
for key 2, I need stdevp(1,2,3,50,20,0)
for key 3, I need stdevp(5,4,5,0,0,0)
I just need to be able to add zeros to the calculation. I could insert records to my table, but that seems rather tedious.
This seems complicated -- padding data for each key. Here is one approach:
with xs as (
select 0 as val, 1 as n
union all
select 0, n + 1
from xs
where xs.n < 6
)
select k.key, stdevp(coalesce(t.amount, 0))
from xs cross join
(select distinct key from t) k left join
(select t.*, row_number() over (partition by key order by key) as seqnum
from t
) t
on t.key = k.key and t.seqnum = xs.n
group by k.key;
The idea is that the cross join generates 6 rows for each key. Then the left join brings in available rows, up to the maximum.

Find out the sequence number of a pattern

Consider below table
Table
ActivityId Flag Type
---------- ----- -----
1 N
2 N
3 Y EXT
4 Y
5 Y
6 N
7 Y INT
8 Y
9 N
10 N
11 N
12 Y EXT
13 N
14 N
15 N
16 Y EXT
17 Y
18 Y INT
19 Y
20 Y EXT
21 Y
22 N
23 N
First record has always Flag = N and then any sequence of Flag = Y or Flag = N can exist for next records. Every time flag changes from N to Y, the Type field is either EXT or INT. Next Y records (before next N) might have Type = EXT or INT or NULL and this is not important.
I want to calculate Cycle No for this sequence of N/Y. First cycle starts when Flag = N (always first record has flag = N) and cycle ends when flag changes to Y and Type = EXT. Then, next cycle starts when flag changes to N and ends when flag becomes Y and type = EXT. this is repeated until all records are processed. The result for above table is:
Result
ActivityId Flag Type Cycle No
---------- ----- ----- --------
1 N 1
2 N 1
3 Y EXT 1
4 Y
5 Y
6 N 2
7 Y INT 2
8 Y 2
9 N 2
10 N 2
11 N 2
12 Y EXT 2
13 N 3
14 N 3
15 N 3
16 Y EXT 3
17 Y
18 Y INT
19 Y
20 Y EXT
21 Y
22 N 4
23 N 4
I am using SQL Server 2008 R2 (no LAG/LEAD).
Can you please help me find the SQL query to calculate Cycle No?
I've got a solution, its not pretty, but through stepwise refinement I get to your result:
The solution is in three steps
Isolate the activityid for all the possible start cycle and end
cycle rows.
Filter out all the dud start events
Number the cycles, and find the interval of activityid for each
cycle
First i select out all start and end cycle events:
with tab as
(select * from (values
(1,'N',''),(2,'N',''),(3,'Y','EXT'),(4,'Y','')
,(5,'Y',''),(6,'N',''),(7,'Y','INT'),(8,'Y','')
,(9,'N',''),(10,'N',''),(11,'N',''),(12,'Y','EXT')
,(13,'N',''),(14,'N',''),(15,'N',''),(16,'Y','EXT')
,(17,'Y',''),(18,'Y','INT'),(19,'Y',''),(20,'Y','EXT')
,(21,'Y',''),(22,'N',''),(23,'N','')) a(ActivityId,Flag,[Type]))
,CTE1 as
( select
ROW_NUMBER() over (order by t1.ActivityId) rn
,t1.ActivityId
,case when t1.Flag='N' then 'Start' else 'End' end Cycle
from tab t1
where t1.Flag='N' or (t1.Flag='Y' and t1.[Type]='Ext')
)
select * from cte1
This returns
rn ActivityId Cycle
1 1 Start
2 2 Start
3 3 End
4 6 Start
5 9 Start
6 10 Start
7 11 Start
8 12 End
9 13 Start
10 14 Start
11 15 Start
12 16 End
13 20 End
14 22 Start
15 23 Start
The problem is now that while we are sure of when the cycle ends, that is when Flag is N and Type is Ext, we are not sure when the cycle starts. Row 1 and 2 both denote a possible start event. But luckily we can see that only a start event following an End event is to be counted. Since we do not have lag or lead we must join the CTE with itself:
,CTE2 as
(
select ROW_NUMBER() over (order by a1.activityid) rn
,a1.ActivityId
,a1.Cycle
,a2.Cycle PrevCycle
from CTE1 a1 left join CTE1 a2 on a1.rn=a2.rn+1
where
a2.Cycle is null -- First Cycle
or
( a2.Cycle is not null
and
(
(a1.Cycle='End' and a2.Cycle='Start') -- End of cycle
or
(a1.Cycle='Start'
and a2.Cycle='End') -- next cycles
)
)
)
select * from cte2
This returns
rn ActivityId Cycle PrevCycle
1 1 Start NULL
2 3 End Start
3 6 Start End
4 12 End Start
5 13 Start End
6 16 End Start
7 22 Start End
I Select the first start event - since we always start with one, and then keep the END events that follow a start event.
Finally we only keep the rest of the start events if the previous event was an End event.
Now we can find the start and end of each cycle, and number them:
,cte3 as
(
select ROW_NUMBER() over (order by b1.ActivityId) CycleNumber
,b1.ActivityId StartId,b2.ActivityId EndId
from cte2 b1 left join cte2 b2
on b1.rn=b2.rn-1
where b1.Cycle='Start'
)
select * from cte3
Which gives us what we need:
CycleNumber StartId EndId
1 1 3
2 6 12
3 13 16
4 22 NULL
Now we just need to join this back on our table:
select
a.ActivityId,a.Flag,a.[Type],CycleNumber
from tab a
left join cte3 b on a.ActivityId between b.StartId and isnull(b.EndId,a.ActivityId)
This gives the result you were looking for.
This is just a quick and dirty solution, perhaps with a little TLC you can pretty it up, and reduce the number of steps.
The full solution is here:
with tab as
(select * from (values
(1,'N',''),(2,'N',''),(3,'Y','EXT'),(4,'Y','')
,(5,'Y',''),(6,'N',''),(7,'Y','INT'),(8,'Y','')
,(9,'N',''),(10,'N',''),(11,'N',''),(12,'Y','EXT')
,(13,'N',''),(14,'N',''),(15,'N',''),(16,'Y','EXT')
,(17,'Y',''),(18,'Y','INT'),(19,'Y',''),(20,'Y','EXT')
,(21,'Y',''),(22,'N',''),(23,'N','')) a(ActivityId,Flag,[Type]))
,CTE1 as
( select
ROW_NUMBER() over (order by t1.ActivityId) rn
,t1.ActivityId
,case when t1.Flag='N' then 'Start' else 'End' end Cycle
from tab t1
where t1.Flag='N' or (t1.Flag='Y' and t1.[Type]='Ext')
)
,CTE2 as
(
select ROW_NUMBER() over (order by a1.activityid) rn
,a1.ActivityId
,a1.Cycle
,a2.Cycle PrevCycle
from CTE1 a1 left join CTE1 a2 on a1.rn=a2.rn+1
where
a2.Cycle is null -- First Cycle
or
( a2.Cycle is not null
and
(
a1.Cycle='End' -- End of cycle
or
(a1.Cycle='Start'
and a2.Cycle='End') -- next cycles
)
)
)
,cte3 as
(
select ROW_NUMBER() over (order by b1.ActivityId) CycleNumber
,b1.ActivityId StartId,b2.ActivityId EndId
from cte2 b1 left join cte2 b2
on b1.rn=b2.rn-1
where b1.Cycle='Start'
)
select
a.ActivityId,a.Flag,a.[Type],CycleNumber
from tab a
left join cte3 b on a.ActivityId between b.StartId and isnull(b.EndId,a.ActivityId)
If you are happy with recursion, this can be achieved rather simply with a bit of comparison logic to the preceding row when ordered by your ActivityId:
declare #t table(ActivityId int,Flag nvarchar(1),TypeValue nvarchar(3));
insert into #t values(1 ,'N',null),(2 ,'N',null),(3 ,'Y','EXT'),(4 ,'Y',null),(5 ,'Y',null),(6 ,'N',null),(7 ,'Y','INT'),(8 ,'Y',null),(9 ,'N',null),(10,'N',null),(11,'N',null),(12,'Y','EXT'),(13,'N',null),(14,'N',null),(15,'N',null),(16,'Y','EXT'),(17,'Y',null),(18,'Y','INT'),(19,'Y',null),(20,'Y','EXT'),(21,'Y',null),(22,'N',null),(23,'N',null);
with rn as -- Derived table purely to guarantee incremental row number. If you can guarantee your ActivityId values are incremental start to finish, this isn't required.
( select row_number() over (order by ActivityId) as rn
,ActivityId
,Flag
,TypeValue
from #t
),d as
( select rn -- Recursive CTE that compares the current row to the one previous.
,ActivityId
,Flag
,TypeValue
,cast(1 as decimal(10,5)) as CycleNo
from rn
where rn = 1
union all
select rn.rn
,rn.ActivityId
,rn.Flag
,rn.TypeValue
,cast(
case when d.Flag = 'Y' and d.TypeValue = 'EXT' and d.CycleNo >= 1
then case when rn.Flag = 'N'
then d.CycleNo + 1
else (d.CycleNo + 1) * 0.0001 -- This part keeps track of the cycle number in fractional values, which can be removed by converting the final result to INT.
end
else case when rn.Flag = 'N' and d.CycleNo < 1
then d.CycleNo * 10000
else d.CycleNo
end
end
as decimal(10,5)) as CycleNo
from rn
inner join d
on d.rn = rn.rn - 1
)
select ActivityId
,Flag
,TypeValue
,cast(CycleNo as int) as CycleNo
from d
order by ActivityId;
Output:
+------------+------+-----------+---------+
| ActivityId | Flag | TypeValue | CycleNo |
+------------+------+-----------+---------+
| 1 | N | NULL | 1 |
| 2 | N | NULL | 1 |
| 3 | Y | EXT | 1 |
| 4 | Y | NULL | 0 |
| 5 | Y | NULL | 0 |
| 6 | N | NULL | 2 |
| 7 | Y | INT | 2 |
| 8 | Y | NULL | 2 |
| 9 | N | NULL | 2 |
| 10 | N | NULL | 2 |
| 11 | N | NULL | 2 |
| 12 | Y | EXT | 2 |
| 13 | N | NULL | 3 |
| 14 | N | NULL | 3 |
| 15 | N | NULL | 3 |
| 16 | Y | EXT | 3 |
| 17 | Y | NULL | 0 |
| 18 | Y | INT | 0 |
| 19 | Y | NULL | 0 |
| 20 | Y | EXT | 0 |
| 21 | Y | NULL | 0 |
| 22 | N | NULL | 4 |
| 23 | N | NULL | 4 |
+------------+------+-----------+---------+

Count groups of NULL values - partition or window?

n | g
---------
1 | 1
2 | NULL
3 | 1
4 | 1
5 | 1
6 | 1
7 | NULL
8 | NULL
9 | NULL
10 | 1
11 | 1
12 | 1
13 | 1
14 | 1
15 | 1
16 | 1
17 | NULL
18 | 1
19 | 1
20 | 1
21 | NULL
22 | 1
23 | 1
24 | 1
25 | 1
26 | NULL
27 | NULL
28 | 1
29 | 1
30 | NULL
31 | 1
From the above column g I should get this result:
x|y
---
1|4
2|1
3|1
where
x stands for the count of contiguous NULLs and
y stands for the times a single group of NULLs occurs.
I.e., there is ...
4 groups of only 1 NULL,
1 group of 2 NULLs and
1 group of 3 NULLs
Compute a running count of not-null values with a window function to form groups, then 2 two nested counts ...
SELECT x, count(*) AS y
FROM (
SELECT grp, count(*) FILTER (WHERE g IS NULL) AS x
FROM (
SELECT g, count(g) OVER (ORDER BY n) AS grp
FROM tbl
) sub1
WHERE g IS NULL
GROUP BY grp
) sub2
GROUP BY 1
ORDER BY 1;
count() only counts not null values.
This includes the preceding row with a not null g in the following group (grp) of NULL values - which has to be removed from the count.
I replaced the HAVING clause I had for that in my initial query with WHERE g IS NULL, like #klin uses in his answer), that's simpler.
Related:
Find ā€œnā€ consecutive free numbers from table
Select longest continuous sequence
If n is a gapless sequence of integer numbers, you can simplify further:
SELECT x, count(*) AS y
FROM (
SELECT grp, count(*) AS x
FROM (
SELECT n - row_number() OVER (ORDER BY n) AS grp
FROM tbl
WHERE g IS NULL
) sub1
GROUP BY 1
) sub2
GROUP BY 1
ORDER BY 1;
Eliminate not null values immediately and deduct the row number from n, thereby arriving at (meaningless) group numbers directly ...
While the only possible value in g is 1, sum() is a smart trick (like #klin provided). But that should be a boolean column then, wouldn't make sense as numeric type. So I assume that's just a simplification of the actual problem in the question.
select x, count(x) y
from (
select s, count(s) x
from (
select *, sum(g) over (order by i) as s
from example
) s
where g isnull
group by 1
) s
group by 1
order by 1;
Test it here.

Update existing records based on the order from a different column

I have the following table:
X_ID X_NAME X_TYPE X_SORT_ID
10 BOOK 1 NULL
20 PEN 1 NULL
30 WATCH 2 NULL
5 TENT 3 NULL
What I'm trying to achieve is to populate the X_SORT_ID column with incremented values starting with 1 based on value in X_ID.
So the table would look like this:
X_ID X_NAME X_TYPE X_SORT_ID
10 BOOK 1 2
20 PEN 1 3
30 WATCH 2 4
5 TENT 3 1
I need to update this table only for all existing rows.
The records that will be added in the future will use a sequence that would set the X_SORT_ID field to the next value.
The only query I came up with is not exactly what I need.
UPDATE X_ITEMS
SET X_SORT_ID = (SELECT MAX(X_ID) FROM X_ITEMS) + ROWNUM
WHERE X_SORT_ID IS NULL;
I could use just a rownum, but this would assign value of 4 to the last record with X_ID = 5, which is not what I wanted.
I'd be thankful for any suggestions.
Can use oracle row_number :
update query
update items ot
set X_SORT_ID =
(
select rw from
(
select X_ID, row_number() over ( order by X_ID ) as rw from items
) it
where it.X_ID = ot.X_ID
)
;
result table
+------+--------+--------+-----------+
| X_ID | X_NAME | X_TYPE | X_SORT_ID |
+------+--------+--------+-----------+
| 10 | BOOK | 1 | 2 |
| 20 | PEN | 1 | 3 |
| 30 | WATCH | 2 | 4 |
| 5 | TENT | 3 | 1 |
+------+--------+--------+-----------+
sqlfiddle
Using ROWNUM (a pseudocolumn) instead of ROWNUMBER(an analytic function) as used above.
Read here for difference
X_ID should be defined as primary key.
update Grentley GY
set X_SORT_ID =
(select rno from
(select X_ID,rownum as rno from Grentley GY
order by x_id ) AB
where AB.X_ID= GY.X_ID
) ;
SQL Fiddle
Sample

SQL - how to create a dynamic matrix showing attribution values per item over time (where number of attributes varies per date)

I have:
items which are described by a set of ids (GroupType, ID, Name)
VALUES table which gets populated with factor values on each date so that an item gets only a certain set of factors with values per date.
FACTORS table containing static descriptions of the factors.
Looking for:
I want to create a temporary table with a matrix showing factor values for each item per date so that one could see in user friendly way which Factors were populated on a given date (with corresponding values).
Values
Date GroupType ID Name FactorId Value
01/01/2013 1 1 A 1 10
01/01/2013 1 1 A 2 8
01/01/2013 1 1 A 3 12
01/01/2013 1 2 B 3 5
01/01/2013 1 2 B 4 6
02/01/2013 1 1 A 1 7
02/01/2013 1 1 A 2 6
02/01/2013 1 2 B 3 9
02/01/2013 1 2 B 4 9
Factors
FactorId FactorName
1 Factor1
2 Factor2
3 Factor3
4 Factor4
. .
. .
. .
temporary table Factor Values Matrix
Date Group ID Name Factor1 Factor2 Factor3 Factor4 Factor...
01/01/2013 1 1 A 10 8 12
01/01/2013 1 2 B 5 6
02/01/2013 1 1 A 7 6
02/01/2013 1 2 B 9 9
Any help is greatly appreciated!
This type of data transformation is known as a PIVOT which takes values from rows and converts it into columns.
In SQL Server 2005+, there is a function that will perform this rotation of data.
Static Pivot:
If your values will be set then you can hard-code the FactorNames into the columns by using a static pivot.
select date, grouptype, id, name, Factor1, Factor2, Factor3, Factor4
from
(
select v.date,
v.grouptype,
v.id,
v.name,
f.factorname,
v.value
from [values] v
left join factors f
on v.factorid = f.factorid
-- where v.date between date1 and date2
) src
pivot
(
max(value)
for factorname in (Factor1, Factor2, Factor3, Factor4)
) piv;
See SQL Fiddle with Demo.
Dynamic Pivot:
In your case, you stated that you are going to have an unknown number of values. If so, then you will need to use dynamic SQL to generate a SQL string that will be executed at run-time:
DECLARE #cols AS NVARCHAR(MAX),
#query AS NVARCHAR(MAX)
select #cols = STUFF((SELECT distinct ',' + QUOTENAME(FactorName)
from factors
FOR XML PATH(''), TYPE
).value('.', 'NVARCHAR(MAX)')
,1,1,'')
set #query = 'SELECT date, grouptype, id, name,' + #cols + ' from
(
select v.date,
v.grouptype,
v.id,
v.name,
f.factorname,
v.value
from [values] v
left join factors f
on v.factorid = f.factorid
-- where v.date between date1 and date2
) x
pivot
(
max(value)
for factorname in (' + #cols + ')
) p '
execute(#query)
See SQL Fiddle with Demo.
Both of these versions generate the same result:
| DATE | GROUPTYPE | ID | NAME | FACTOR1 | FACTOR2 | FACTOR3 | FACTOR4 |
------------------------------------------------------------------------------
| 2013-01-01 | 1 | 1 | A | 10 | 8 | 12 | (null) |
| 2013-01-01 | 1 | 2 | B | (null) | (null) | 5 | 6 |
| 2013-02-01 | 1 | 1 | A | 7 | 6 | 11 | (null) |
| 2013-02-01 | 1 | 1 | B | (null) | (null) | 9 | 9 |
If you want to filter the results based on a date range, then you will just need to add a WHERE clause to the above queries.
It looks like you are simply trying to pivot the rows into columns. I think this does what you want:
select Date, Group, ID, Name,
max(case when factorid = 1 then name end) as Factor1,
max(case when factorid = 2 then name end) as Factor2,
max(case when factorid = 3 then name end) as Factor3,
max(case when factorid = 4 then name end) as Factor4
from t
group by Date, Group, ID, Name