I have a table in a SQL Server 2008 database with a number column that I want to arrange on a scale 1 to 10.
Here is an example where the column (Scale) is what I want to accomplish with SQL
Name Count (Scale)
----------------------
A 19 2
B 1 1
C 25 3
D 100 10
E 29 3
F 60 7
In my example above the min and max count is 1 and 100 (this could be different from day to day).
I want to get the scale number that each record belongs to.
1 = 0-9
2 = 10-19
3 = 20-29 and so on...
It has to be dynamic because this data changes every day, so I cannot use a WHERE clause with static numbers like this: WHEN Count BETWEEN 0 AND 10...
Try this, though note that technically the value 100 doesn't fall in the range 90-99 and should therefore probably be classed as 11, which is why the value 60 comes out with a scale of 6 rather than your 7:
SQL Fiddle
MS SQL Server 2008 Schema Setup:
create table #scale
(
Name Varchar(10),
[Count] INT
)
INSERT INTO #scale
VALUES
('A', 19),
('B', 1),
('C', 25),
('D', 100),
('E', 29),
('F', 60)
Query 1:
SELECT name, [COUNT],
CEILING([COUNT] * 10.0 / (SELECT MAX([Count]) - MIN([Count]) + 1 FROM #Scale)) AS [Scale]
FROM #scale
Results:
| NAME | COUNT | SCALE |
|------|-------|-------|
| A | 19 | 2 |
| B | 1 | 1 |
| C | 25 | 3 |
| D | 100 | 10 |
| E | 29 | 3 |
| F | 60 | 6 |
This version gives your expected answer, where 60 becomes 7 (and 100 therefore becomes 11):
SELECT name, [COUNT],
CEILING([COUNT] * 10.0 / (SELECT MAX([Count]) - MIN([Count]) FROM #Scale)) AS [Scale]
FROM #scale
WITH MinMax(Min, Max) AS (SELECT MIN(Count), MAX(Count) FROM Table1)
SELECT Name, Count, 1 + 9 * (Count - Min) / (Max - Min) AS Scale
FROM Table1, MinMax
You can make the Scale column a PERSISTED computed column:
alter table test drop column Scale
ALTER TABLE test ADD
Scale AS (case when Count between 0 and 9 then 1
when Count between 10 and 19 then 2
when Count between 20 and 29 then 3
when Count between 30 and 39 then 4
when Count between 40 and 49 then 5
when Count between 50 and 59 then 6
when Count between 60 and 69 then 7
when Count between 70 and 79 then 8
when Count between 80 and 89 then 9
when Count between 90 and 100 then 10
end
)PERSISTED
GO
DEMO
select ntile(10) over (order by [count])
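For context, a minimal sketch applying this to the #scale sample table above; note that NTILE(10) splits the rows into ten equal-sized groups by rank, which is not the same as bucketing by value (with only six rows, each row lands in its own tile):
SELECT Name, [Count],
       NTILE(10) OVER (ORDER BY [Count]) AS Scale
FROM #scale;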
I'd like to be able to implement a "capped" cumulative sum in BigQuery using SQL.
Here's what I mean: I have a table whose rows have the amount by which a value is increased/decreased each day, but the value cannot go below 0 or above 100. I want to compute the cumulative sum of the changes to keep track of this value.
As an example, consider the following table:
day | change
--------------
1 | 70
2 | 50
3 | 20
4 | -30
5 | 10
6 | -90
7 | 20
I want to make a column that has the capped cumulative sum so that it looks like this:
day | change | capped cumsum
----------------------------
1 | 70 | 70
2 | 50 | 100
3 | 20 | 100
4 | -30 | 70
5 | 10 | 80
6 | -90 | 0
7 | 20 | 20
Simply doing SUM (change) OVER (ORDER BY day) and capping the values at 100 and 0 won't work. I need some sort of recursive loop and I don't know how to implement this in BigQuery.
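To make this concrete, the naive version would be a sketch like the following (using your_table as a placeholder name); it caps only the final total rather than the running state, so for example day 4 comes out as 100 instead of the expected 70:
-- Naive attempt (for illustration only): cap the cumulative sum after the fact.
SELECT day, change,
       LEAST(GREATEST(SUM(change) OVER (ORDER BY day), 0), 100) AS capped_cumsum
FROM your_table
ORDER BY day;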
Eventually I'd also like to do this over partitions, so that if I have something like
day | class | change
--------------
1 | A | 70
1 | B | 12
2 | A | 50
2 | B | 83
3 | A | -30
3 | B | 17
4 | A | 10
5 | A | -90
6 | A | 20
I can do the capped cumulative sum partitioned over each class.
I need some sort of recursive loop and I don't know how to implement this in BigQuery
Super naïve / cursor-based approach:
declare cumulative_change int64 default 0;
create temp table temp_table as (
select * , 0 as capped_cumsum from your_table where false
);
for rec in (select * from your_table order by day)
do
set cumulative_change = cumulative_change + rec.change;
set cumulative_change = case when cumulative_change < 0 then 0 when cumulative_change > 100 then 100 else cumulative_change end;
insert into temp_table (select rec.*, cumulative_change);
end for;
select * from temp_table order by day;
If applied to the sample data in your question, the output matches the expected result.
A slightly modified option, using an array instead of a temp table:
declare cumulative_change int64 default 0;
declare result array<struct<day int64, change int64, capped_cumsum int64>>;
for rec in (select * from your_table order by day)
do
set cumulative_change = cumulative_change + rec.change;
set cumulative_change = case when cumulative_change < 0 then 0 when cumulative_change > 100 then 100 else cumulative_change end;
set result = array(select as struct * from unnest(result) union all select as struct rec.*, cumulative_change);
end for;
select * from unnest(result) order by day;
P.S. I like none of the above options so far :o)
Meanwhile, these approaches might work for relatively small tables / data sets.
Using a RECURSIVE CTE is another option:
DECLARE sample ARRAY<STRUCT<day INT64, change INT64>> DEFAULT [
(1, 70), (2, 50), (3, 20), (4, -30), (5, 10), (6, -90), (7, 20)
];
WITH RECURSIVE ccsum AS (
SELECT 0 AS n, vals[OFFSET(0)] AS change,
CASE
WHEN vals[OFFSET(0)] > 100 THEN 100
WHEN vals[OFFSET(0)] < 0 THEN 0
ELSE vals[OFFSET(0)]
END AS cap_csum
FROM sample
UNION ALL
SELECT n + 1 AS n, vals[OFFSET(n + 1)] AS change,
CASE
WHEN cap_csum + vals[OFFSET(n + 1)] > 100 THEN 100
WHEN cap_csum + vals[OFFSET(n + 1)] < 0 THEN 0
ELSE cap_csum + vals[OFFSET(n + 1)]
END AS cap_csum
FROM ccsum, sample
WHERE n < ARRAY_LENGTH(vals) - 1
),
sample AS (
SELECT ARRAY_AGG(change ORDER BY day) vals FROM UNNEST(sample)
)
SELECT * EXCEPT(n) FROM ccsum ORDER BY n;
The output matches the expected result in the question.
Eventually I'd also like to do this over partitions ...
Consider the solution below:
create temp function cap_value(value int64, lower_boundary int64, upper_boundary int64) as (
least(greatest(value, lower_boundary), upper_boundary)
);
with recursive temp_table as (
select *, row_number() over(partition by class order by day) as n from your_table
), iterations as (
select 1 as n, day, class, change, cap_value(change, 0, 100) as capped_cumsum
from temp_table
where n = 1
union all
select t.n, t.day, t.class, t.change, cap_value(i.capped_cumsum + t.change, 0, 100) as capped_cumsum
from temp_table t
join iterations i
on t.n = i.n + 1
and t.class = i.class
)
select * except(n) from iterations
order by class, day
If applied to the sample data in your question, the output is the capped cumulative sum per class.
The table I am trying to create should look like this:
ID Timeframe Value
1 60 15
1 60 30
1 90 45
2 60 15
2 60 30
2 90 45
3 60 15
3 60 30
3 90 45
So for each ID the values of 60,60,90 and 15,30,45 should be repeated.
Could anyone help me with the code? :)
You are looking for a cross join. The basic idea is something like this:
select i.id, tv.timeframe, tv.value
from (values (1), (2), (3)) i(id) cross join
(values (60, 15), (60, 30), (90, 45)) tv(timeframe, value)
order by i.id, tv.value;
Not all databases support the values() table constructor. In those databases, you would need to use the appropriate syntax.
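For instance, in a database without the VALUES() row constructor, the same cross join could be sketched with derived tables built from SELECT ... UNION ALL:
select i.id, tv.timeframe, tv.value
from (select 1 as id union all select 2 union all select 3) i cross join
     (select 60 as timeframe, 15 as value union all
      select 60, 30 union all
      select 90, 45) tv
order by i.id, tv.value;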
So you have this table: ...
id
1
2
3
and you have this table: ...
timeframe value
60 15
60 30
90 45
Then try this:
WITH
-- the ID table...
id(id) AS (
SELECT 1
UNION ALL SELECT 2
UNION ALL SELECT 3
)
,
-- the values table:
vals(timeframe,value) AS (
SELECT 60,15
UNION ALL SELECT 60,30
UNION ALL SELECT 90,45
)
SELECT
id
, timeframe
, value
FROM id CROSS JOIN vals
ORDER BY id, timeframe;
-- out id | timeframe | value
-- out ----+-----------+-------
-- out 1 | 60 | 30
-- out 1 | 60 | 15
-- out 1 | 90 | 45
-- out 2 | 60 | 30
-- out 2 | 60 | 15
-- out 2 | 90 | 45
-- out 3 | 60 | 30
-- out 3 | 60 | 15
-- out 3 | 90 | 45
-- out (9 rows)
Source:
Seq Amount
1 50
2 48
3 46
4 40
5 45
6 43
7 39
Here is what I want,
When the amount in the current row is larger than the previous one, it should change to the previous value.
For example, in row 5 the amount 45 > 40 in row 4, so change it to 40;
in row 6 the amount 43 > 40 in the updated row 5, so change it to 40.
This is the expected result:
Seq Amount
1 50
2 48
3 46
4 40
5 40
6 40
7 39
I am currently using lag(amount) over (order by seq);
however, the result is not correct. I think I need a loop script, but I am not sure how to write one. Please help.
Thanks!
Here is a more generic version not requiring LAG:
SQL Fiddle
MS SQL Server 2008 Schema Setup:
CREATE TABLE Table1
([Seq] int, [Amount] int)
;
INSERT INTO Table1
([Seq], [Amount])
VALUES
(1, 50),
(2, 48),
(3, 46),
(4, 40),
(5, 45),
(6, 43),
(7, 39)
;
Query 1:
select t1.Seq, case when t1.Amount < t2.Amount or t2.Amount is null then t1.Amount else t2.Amount end as Value
from Table1 t1
left join Table1 t2 on t2.Seq = t1.Seq - 1
order by t1.Seq
Results:
| Seq | Value |
|-----|-------|
| 1 | 50 |
| 2 | 48 |
| 3 | 46 |
| 4 | 40 |
| 5 | 40 |
| 6 | 43 |
| 7 | 39 |
Update the column using the logic provided by RedFilter in a WHILE loop until @@ROWCOUNT is 0.
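A minimal sketch of that loop, assuming the Table1 setup above; each pass pulls the smaller previous Amount forward until no row changes:
WHILE 1 = 1
BEGIN
    -- Copy the previous row's Amount wherever the current Amount is larger.
    UPDATE t1
    SET t1.Amount = t2.Amount
    FROM Table1 t1
    INNER JOIN Table1 t2 ON t2.Seq = t1.Seq - 1
    WHERE t1.Amount > t2.Amount;

    -- Stop once a full pass changes nothing.
    IF @@ROWCOUNT = 0 BREAK;
END;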
Suppose the following table
ID Name RowNumber
2314 YY 1
213 XH 2
421 XD 3
123 AA 4
213 QQQ 5
12 WW 6
312 RR 7
123 GG 8
12 F 9
12 FF 10
312 VV 11
12 BB 12
32 NN 13
43 DD 14
53 DD 15
658 QQQQ 16
768 GGG 17
I want to replace the Name field with an empty string based on these conditions:
The first and last cells' values will not be removed.
The preserved values must not be in contiguous cells.
Only n cells will be preserved.
If n is less than or equal to the number entered by the user, then do nothing.
For example, if the user enters 5, then only 5 values will be preserved and the result should be (or similar):
ID Name RowNumber
2314 YY 1
213 2
421 3
123 AA 4
213 5
12 6
312 7
123 GG 8
12 9
12 10
312 11
12 12
32 NN 13
43 14
53 15
658 16
768 GGG 17
There could be more records than this.
I'm using SQL Server
The following will work in SQL Server 2012+, because it uses a running/cumulative SUM. The query assumes that values in the RowNumber column are sequential from 1 to the total row count without gaps. If your data is not like this, you can use ROW_NUMBER to generate them (see the sketch after the steps below).
Calculate the ratio of the given number N to the total number of rows (CTE_Ratio).
Calculate the running sum of this ratio, truncating the fractional part of the sum (CTE_Groups).
Each integer value of the running sum defines a group of rows; re-number the rows within each group (CTE_Final).
Preserve Name only for the first row of each group.
To understand better how it works, include the intermediate columns (Ratio, GroupNumber, rn) in the output.
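For the ROW_NUMBER remark above, a minimal sketch of regenerating gapless row numbers (reusing the @T sample table defined below; the existing RowNumber only supplies the ordering):
SELECT ID, Name,
       ROW_NUMBER() OVER (ORDER BY RowNumber) AS RowNumber
FROM @T;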
SQL Fiddle
Sample data
DECLARE @T TABLE ([ID] int, [Name] varchar(50), [RowNumber] int);
INSERT INTO @T([ID], [Name], [RowNumber]) VALUES
(2314, 'YY', 1)
,(213, 'XH', 2)
,(421, 'XD', 3)
,(123, 'AA', 4)
,(213, 'QQQ', 5)
,(12, 'WW', 6)
,(312, 'RR', 7)
,(123, 'GG', 8)
,(12, 'F', 9)
,(12, 'FF', 10)
,(312, 'VV', 11)
,(12, 'BB', 12)
,(32, 'NN', 13)
,(43, 'DD', 14)
,(53, 'DD', 15)
,(658, 'QQQQ', 16)
,(768, 'GGG', 17);
DECLARE @N int = 5;
Query
WITH
CTE_Ratio AS
(
SELECT
ID
,Name
,RowNumber
,COUNT(*) OVER() AS TotalRows
,CAST(@N-1 AS float) / CAST(COUNT(*) OVER() AS float) AS Ratio
FROM @T
)
,CTE_Groups AS
(
SELECT
ID
,Name
,RowNumber
,TotalRows
,ROUND(SUM(Ratio) OVER(ORDER BY RowNumber), 0, 1) AS GroupNumber
FROM CTE_Ratio
)
,CTE_Final AS
(
SELECT
ID
,Name
,RowNumber
,TotalRows
,ROW_NUMBER() OVER(PARTITION BY GroupNumber ORDER BY RowNumber) AS rn
FROM CTE_Groups
)
SELECT
ID
,CASE WHEN rn=1 OR RowNumber = TotalRows THEN Name ELSE '' END AS Name
,RowNumber
FROM CTE_Final
ORDER BY RowNumber;
Result
+------+------+-----------+
| ID | Name | RowNumber |
+------+------+-----------+
| 2314 | YY | 1 |
| 213 | | 2 |
| 421 | | 3 |
| 123 | | 4 |
| 213 | QQQ | 5 |
| 12 | | 6 |
| 312 | | 7 |
| 123 | | 8 |
| 12 | F | 9 |
| 12 | | 10 |
| 312 | | 11 |
| 12 | | 12 |
| 32 | NN | 13 |
| 43 | | 14 |
| 53 | | 15 |
| 658 | | 16 |
| 768 | GGG | 17 |
+------+------+-----------+
Try this:
--Number that the user enters
DECLARE @InputNumber INT
DECLARE @WorkingNumber INT
DECLARE @TotalRecords INT
DECLARE @Devider DECIMAL(18,2)
SET @InputNumber = 5
SET @WorkingNumber = @InputNumber - 2
--Assume @InputNumber greater than 2 and @TotalRecords greater than 4
SELECT @TotalRecords = COUNT(*)
FROM Table;
SET @Devider = CONVERT(DECIMAL(18,2), @TotalRecords) / CONVERT(DECIMAL(18,2), @WorkingNumber);
WITH Conditioned (RowNumber)
AS
(
SELECT RowNumber
FROM Table
WHERE RowNumber = 1
UNION ALL
SELECT T.RowNumber
FROM (SELECT TOP 1 RowNumber
FROM Conditioned
ORDER BY RowNumber DESC) AS C
INNER JOIN Table AS T ON CONVERT(INT, CEILING(C.RowNumber + @Devider)) = T.RowNumber
)
SELECT T.Id, CASE WHEN C.RowNumber IS NULL THEN '' ELSE T.Name END, T.RowNumber
FROM Table T
LEFT OUTER JOIN Conditioned C ON T.RowNumber = C.RowNumber
WHERE T.RowNumber != @TotalRecords
UNION
SELECT Id, Name, RowNumber
FROM Table
WHERE RowNumber = @TotalRecords
I created the following SQLite table and populated it with information about the frequency of different colors among the pixels of a set of images that I analyzed. I'd like to select images according to similar colors. I was inspired by a project by Matthew Mueller (http://research.cs.wisc.edu/vision/piximilar/), re-engineered a similar website, and am about to change the search pattern he suggests.
Each image consists of 100 pixels and hence the sum of the columns color1 ... color6 is always 100.
id int | filename text | color1 int | color2 int | color3 int | color4 int | color5 int | color6 int |
------------------------------------------------------------------------------------------------------
1 | 1.bmp | 23 | 25 | 50 | 0 | 0 | 0 |
2 | 2.bmp | 25 | 12 | 11 | 2 | 37 | 13 |
3 | 3.bmp | 15 | 16 | 17 | 18 | 19 | 15 |
4 | 4.bmp | 0 | 100 | 0 | 0 | 0 | 0 |
...
I'm trying to write an SQL query to select all tuples where
a) one of any of the columns has a frequency above a certain threshold.
Example with DB above: threshold = 40 --> rows with ids 1 and 4 are selected.
b) the sum of two of any of the columns is above a certain threshold.
Example with DB above: threshold = 60 --> rows with ids 1, 2 and 4 are returned
c) rows are sorted according to how «close» / «similar» they are to a certain tuple.
Example with DB above: «closeness» to id 2 is the goal:
Resulting order: 2, 3, 1, 4
I would very much appreciate your suggestions for good queries for a), b) and c).
Thanks, Dani
I think your queries will be easier to write if you normalize your tables:
files
file_id, filename
1, 1.bmp
2, 2.bmp
file_colors
file_id, color_id, color_value
1, 1, 23
1, 2, 25
1, 3, 50
1, 4, 0
1, 5, 0
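A minimal sketch of how the file_colors rows could be generated in SQLite, assuming the original table is named images (the question doesn't name it):
-- Unpivot the six color columns into one row per (file, color).
INSERT INTO file_colors (file_id, color_id, color_value)
SELECT id, 1, color1 FROM images
UNION ALL SELECT id, 2, color2 FROM images
UNION ALL SELECT id, 3, color3 FROM images
UNION ALL SELECT id, 4, color4 FROM images
UNION ALL SELECT id, 5, color5 FROM images
UNION ALL SELECT id, 6, color6 FROM images;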
a) Any 1 color above a certain value
select file_id from file_colors
group by file_id
having count(case when color_value >= 40 then 1 end) > 0
b) Any sum of 2 colors above a certain value
select distinct file_id from file_colors t1
join file_colors t2 on t1.file_id = t2.file_id
where t1.color_id <> t2.color_id
and t1.color_value + t2.color_value >= 60
c) You didn't define 'difference'. The query below calculates it as the sum of the absolute distance for each color.
select t1.file_id
from file_colors t1
join file_colors t2 on t2.file_id = 2 and t2.color_id = t1.color_id
group by t1.file_id
order by sum(abs(t1.color_value - t2.color_value))