I have a set of data that consists of periodically collected values. For each element of the set, I want to calculate a median over the current value and its 2 neighbors on the left and on the right.
For example, the set is:
21
22
23
-10
20
22
19
21
100
20
For the first value we pick 21, 22, 23, whose median is 22, so for 21 we get 22. For -10 we pick 22, 23, -10, 20, 22, whose median is 22.
I use this method to get rid of "deviant" values which are abnormal for this set.
I guess I should somehow use the median analytic function. Something like this:
SELECT (SELECT median(d.value)
FROM my_set d
WHERE d.key_val = s.key_val
AND d.order_value BETWEEN s.order_value - 2 AND s.order_value + 2) median_val
,s.key_val
,s.order_value
FROM my_set s
I would be happy to see any other approaches or some improved approaches to solve this question.
You did not specify anything about your table structure so I'm just guessing from your SQL what fields there are and what they're supposed to mean, but consider an attempt like this one:
SELECT s1.key_val, s1.order_value, s1.value, MEDIAN(s2.value) as med
FROM my_set s1
LEFT OUTER JOIN my_set s2
ON s2.key_val = s1.key_val
AND (s1.order_value - 2) <= s2.order_value
AND s2.order_value <= (s1.order_value + 2)
GROUP BY s1.key_val, s1.order_value, s1.value
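To sanity-check the medians such a query should produce, the windowing can be reproduced outside the database. The sketch below is only an illustration in Python, not part of either post: `windowed_median` and `radius` are hypothetical names, and it assumes the rows are consecutive, so "two neighbors on each side" maps to list positions. Windows are clipped at both ends of the list, which matches the question's example for the first value.

```python
from statistics import median

def windowed_median(values, radius=2):
    """Median of each value together with up to `radius` neighbours on each side."""
    result = []
    for i in range(len(values)):
        lo = max(0, i - radius)                 # clip the window at the start
        hi = min(len(values), i + radius + 1)   # clip the window at the end
        result.append(median(values[lo:hi]))
    return result

# Sample set from the question.
sample = [21, 22, 23, -10, 20, 22, 19, 21, 100, 20]
medians = windowed_median(sample)
print(medians)  # [22, 21.5, 21, 22, 20, 20, 21, 21, 20.5, 21]
```

The outliers -10 and 100 are replaced by 22 and 20.5. Where a clipped window holds an even number of values, `statistics.median` averages the two middle ones.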
I have a SQLite table with an Id and an active period, and I am trying to get counts of the number of active rows over a sequence of times.
A vastly simplified version of this table is:
CREATE TABLE Data (
EntityId INTEGER NOT NULL,
Start INTEGER NOT NULL,
Finish INTEGER
);
With some example data
INSERT INTO Data VALUES
(1, 0, 2),
(1, 4, 6),
(1, 8, NULL),
(2, 5, 7),
(2, 9, NULL),
(3, 8, NULL);
And a desired output of something like:
Time  Count
0     1
1     1
2     0
3     0
4     1
5     2
6     1
7     0
8     2
9     3
For which I am querying with:
WITH RECURSIVE Generate_Time(Time) AS (
SELECT 0
UNION ALL
SELECT Time + 1 FROM Generate_Time
WHERE Time + 1 <= (SELECT MAX(Start) FROM Data)
)
SELECT Time, COUNT(EntityId)
FROM Data
JOIN Generate_Time ON Start <= Time AND (Finish > Time OR Finish IS NULL)
GROUP BY Time
There is also some data I need to categorise the counts by (some are on the original table, some are using a join), but I am hitting a performance bottleneck in the order of seconds on even small amounts of data (~25,000 rows) without any of that.
I have added an index on the table covering Start/Finish:
CREATE INDEX Ix_Data ON Data (
Start,
Finish
);
and that helped somewhat but I can't help but feel there's a more elegant & performant way of doing this. Using the CTE to iterate over a range doesn't seem like it will scale very well but I can't think of another way to calculate what I need.
I've been looking at the query plan too, and I think the slow part is the GROUP BY: it can't use an index because the grouping column comes from the CTE, so SQLite generates a temporary B-tree:
3 0 0 MATERIALIZE 3
7 3 0 SETUP
8 7 0 SCAN CONSTANT ROW
21 3 0 RECURSIVE STEP
22 21 0 SCAN TABLE Generate_Time
27 21 0 SCALAR SUBQUERY 2
32 27 0 SEARCH TABLE Data USING COVERING INDEX Ix_Data
57 0 0 SCAN SUBQUERY 3
59 0 0 SEARCH TABLE Data USING INDEX Ix_Data (Start<?)
71 0 0 USE TEMP B-TREE FOR GROUP BY
Any suggestions of a way to speed this query up, or even a better way of storing this data to craft a tighter query would be most welcome!
To get to the desired output as per your question, the following can be done.
For better performance, one option is to use generate_series to generate the rows instead of the recursive CTE, and to limit the number of rows to the maximum value available in the data.
WITH RECURSIVE Generate_Time(Time) AS (
SELECT 0
UNION ALL
SELECT Time + 1 FROM Generate_Time
WHERE Time + 1 <= (SELECT MAX(Start) FROM Data)
)
SELECT gt.Time
,count(d.entityid)
FROM Generate_Time gt
LEFT JOIN Data d
ON d.start <= gt.Time AND (d.finish > gt.Time OR d.finish IS NULL)
GROUP BY gt.Time
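As a check against the desired output in the question, the LEFT JOIN form can be run on the sample data with Python's built-in `sqlite3` module. This is a sketch under assumptions: it uses the half-open interval from the question's own query (`Start <= Time` and `Finish > Time`, with an open end when `Finish` is NULL), and the `ORDER BY` is added only to make the result deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Data (
    EntityId INTEGER NOT NULL,
    Start INTEGER NOT NULL,
    Finish INTEGER
);
INSERT INTO Data VALUES
    (1, 0, 2), (1, 4, 6), (1, 8, NULL),
    (2, 5, 7), (2, 9, NULL), (3, 8, NULL);
""")

query = """
WITH RECURSIVE Generate_Time(Time) AS (
    SELECT 0
    UNION ALL
    SELECT Time + 1 FROM Generate_Time
    WHERE Time + 1 <= (SELECT MAX(Start) FROM Data)
)
SELECT gt.Time, COUNT(d.EntityId)
FROM Generate_Time gt
LEFT JOIN Data d
    -- a row is active from Start up to, but not including, Finish
    ON d.Start <= gt.Time AND (d.Finish > gt.Time OR d.Finish IS NULL)
GROUP BY gt.Time
ORDER BY gt.Time
"""

rows = conn.execute(query).fetchall()
for time, count in rows:
    print(time, count)
```

The LEFT JOIN keeps the times at which nothing is active (2, 3 and 7 here), which an inner join would drop.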
This ended up being simply a case of the result set being too large. In my real data, the result set before grouping was ~19,000,000 records. I was able to do some partitioning on my client side, splitting the queries into smaller discrete chunks which improved performance ~10x, which still wasn't quite as fast as I wanted but was acceptable for my use case.
In SQL there are aggregation operators, like AVG, SUM, COUNT. Why doesn't it have an operator for multiplication? "MUL" or something.
I was wondering, does it exist for Oracle, MSSQL, or MySQL? If not, is there a workaround that would give this behaviour?
By MUL do you mean progressive multiplication of values?
Even with 100 rows of some small size (say 10s), your MUL(column) is going to overflow any data type! With such a high probability of mis/ab-use, and very limited scope for use, it does not need to be a SQL Standard. As others have shown there are mathematical ways of working it out, just as there are many many ways to do tricky calculations in SQL just using standard (and common-use) methods.
Sample data:
Column
1
2
4
8
COUNT : 4 items (1 for each non-null)
SUM : 1 + 2 + 4 + 8 = 15
AVG : 3.75 (SUM/COUNT)
MUL : 1 x 2 x 4 x 8 ? ( =64 )
For completeness, the Oracle, MSSQL, MySQL core implementations *
Oracle : EXP(SUM(LN(column))) or POWER(N,SUM(LOG(column, N)))
MSSQL : EXP(SUM(LOG(column))) or POWER(N,SUM(LOG(column)/LOG(N)))
MySQL : EXP(SUM(LOG(column))) or POW(N,SUM(LOG(N,column)))
Take care when using EXP/LOG in SQL Server and watch the return type: http://msdn.microsoft.com/en-us/library/ms187592.aspx
The POWER form allows for larger numbers (using bases larger than Euler's number). In cases where the result grows too large to turn back using POWER, you can return just the logarithmic value and calculate the actual number outside of the SQL query.
* LOG(0) and LOG(-ve) are undefined. The below shows only how to handle this in SQL Server. Equivalents can be found for the other SQL flavours, using the same concept
create table MUL(data int)
insert MUL select 1 yourColumn union all
select 2 union all
select 4 union all
select 8 union all
select -2 union all
select 0
select CASE WHEN MIN(abs(data)) = 0 then 0 ELSE
EXP(SUM(Log(abs(nullif(data,0))))) -- the base mathematics
* round(0.5-count(nullif(sign(sign(data)+0.5),1))%2,0) -- pairs up negatives
END
from MUL
Ingredients:
Take the abs() of data: if the minimum is 0, multiplying by anything else is futile, because the result is 0.
When data is 0, NULLIF converts it to null. abs() and log() then both return null, so the row is excluded from sum().
If data is not 0, abs() allows us to multiply a negative number using the LOG method; we keep track of the negativity elsewhere.
Working out the final sign
sign(data) returns 1 for >0, 0 for 0 and -1 for <0.
We add another 0.5 and take the sign() again, so we have now classified 0 and 1 both as 1, and only -1 as -1.
again use NULLIF to remove from COUNT() the 1's, since we only need to count up the negatives.
% 2 against the count() of negative numbers returns either
--> 1 if there is an odd number of negative numbers
--> 0 if there is an even number of negative numbers
more mathematical tricks: we take 1 or 0 off 0.5, so that the above becomes
--> (0.5-1=-0.5=>round to -1) if there is an odd number of negative numbers
--> (0.5-0= 0.5=>round to 1) if there is an even number of negative numbers
We multiply this final 1/-1 by the EXP(SUM(LOG())) value to get the real result.
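The same ingredients can be traced step by step outside SQL. The Python sketch below mirrors the logic described above (zero short-circuit, logs of absolute values, parity of the negative count); the function name `product_via_logs` is hypothetical, and `None` stands in for SQL NULL. It is an illustration of the arithmetic, not a replacement for the query.

```python
import math

def product_via_logs(values):
    """Product of the non-NULL values, computed as exp(sum(log(|v|)))."""
    non_null = [v for v in values if v is not None]  # aggregates skip NULLs
    if not non_null:
        return None                                  # nothing to multiply
    if any(v == 0 for v in non_null):
        return 0.0                                   # any zero makes the product 0
    # The base mathematics: logs turn the product into a sum.
    magnitude = math.exp(sum(math.log(abs(v)) for v in non_null))
    # An odd number of negatives flips the sign of the result.
    negatives = sum(1 for v in non_null if v < 0)
    sign = -1.0 if negatives % 2 else 1.0
    return sign * magnitude

print(product_via_logs([1, 2, 4, 8]))         # close to 64
print(product_via_logs([1, 2, 4, 8, -2]))     # close to -128
print(product_via_logs([1, 2, 4, 8, -2, 0]))  # 0.0
```

Because the result passes through floating-point logarithms, it is approximate and may need rounding when the inputs are integers.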
No, but you can use Mathematics :)
if yourColumn is always bigger than zero:
select EXP(SUM(LOG(yourColumn))) As ColumnProduct from yourTable
I see an Oracle answer is still missing, so here it is:
with yourTable as
( select 1 yourColumn from dual union all
  select 2 from dual union all
  select 4 from dual union all
  select 8 from dual
)
select EXP(SUM(LN(yourColumn))) As ColumnProduct from yourTable;

COLUMNPRODUCT
-------------
64

1 row selected.
Regards,
Rob.
With PostgreSQL, you can create your own aggregate functions, see http://www.postgresql.org/docs/8.2/interactive/sql-createaggregate.html
To create an aggregate function on MySQL, you'll need to build an .so (Linux) or .dll (Windows) file. An example is shown here: http://www.codeproject.com/KB/database/mygroupconcat.aspx
I'm not sure about MSSQL and Oracle, but I bet they have options to create custom aggregates as well.
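To show what a user-defined product aggregate looks like in practice, here is a small sketch using SQLite through Python's standard `sqlite3` module, which registers aggregates with `Connection.create_aggregate`. SQLite is not one of the databases asked about, and the aggregate name `mul` and the class `Product` are made up for this illustration; the table and column names follow the other answers.

```python
import sqlite3

class Product:
    """Aggregate that multiplies all non-NULL inputs together."""

    def __init__(self):
        self.result = None          # stays NULL if no rows are seen

    def step(self, value):
        if value is None:           # skip NULLs, like the built-in aggregates
            return
        self.result = value if self.result is None else self.result * value

    def finalize(self):
        return self.result

conn = sqlite3.connect(":memory:")
conn.create_aggregate("mul", 1, Product)   # name, number of arguments, class

conn.execute("CREATE TABLE yourTable (yourColumn INTEGER)")
conn.executemany("INSERT INTO yourTable VALUES (?)", [(1,), (2,), (4,), (8,)])

product = conn.execute("SELECT mul(yourColumn) FROM yourTable").fetchone()[0]
print(product)  # 64
```

The overflow concern raised elsewhere in this thread still applies: the product of many rows quickly exceeds what an integer column can hold.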
You'll break any datatype fairly quickly as numbers mount up.
Using LOG/EXP is tricky because of numbers <= 0 that will fail when using LOG. I wrote a solution in this question that deals with this
Using CTE in MS SQL:
CREATE TABLE Foo(Id int, Val int)
INSERT INTO Foo VALUES(1, 2), (2, 3), (3, 4), (4, 5), (5, 6)
;WITH cte AS
(
SELECT Id, Val AS Multiply, row_number() over (order by Id) as rn
FROM Foo
WHERE Id=1
UNION ALL
SELECT ff.Id, cte.multiply*ff.Val as multiply, ff.rn FROM
(SELECT f.Id, f.Val, (row_number() over (order by f.Id)) as rn
FROM Foo f) ff
INNER JOIN cte
ON ff.rn -1= cte.rn
)
SELECT * FROM cte
Not sure about Oracle or sql-server, but in MySQL you can just use * like you normally would.
mysql> select count(id), count(id)*10 from tablename;
+-----------+--------------+
| count(id) | count(id)*10 |
+-----------+--------------+
| 961 | 9610 |
+-----------+--------------+
1 row in set (0.00 sec)
I have a table with the following fields
ID,Content,QuestionMarks,TypeofQuestion
350, What is the symbol used to represent Bromine?,2,MCQ
758,What is the symbol used to represent Bromine? ,2,MCQ
2425,What is the symbol used to represent Bromine?,3,Essay
2080,A quadrilateral has four sides, four angles ,1,MCQ
2614,A circular cone has a curved surface area of ,2,MCQ
2520,Two triangles have sides 5 cm, 11 cm, 2 cm . ,2,MCQ
2196,Life supporting process mediated by water? ,2,Essay
I would like to get random questions where total marks is an input number.
For example if I say 25, the result should be all the random questions whose Sum(QuestionMarks) is 25(+/-1)
Is this really possible using SQL?
select content,id,questionmarks,sum(questionmarks) from quiz_question
group by content,id,questionmarks;
Expected Input 25
Expected Result (Sum of Question Marks =25)
Update:
How do I ensure I get at least 2 Essay type questions? (This is just an example; I would extend this to other conditions.) Thank you for all the help.
S-Man's cumulative sum is the right approach. For your logic, though, I think you want to get up to the first row that is 24 or more. That logic is:
where total - questionmark < 24
If you have enough questions, then you could get exactly 25 using:
with q25 as (
select *
from (select t.*,
sum(questionmark) over (order by random()) as running_questionmark
from t
) t
where running_questionmark < 25
)
select q.ID, q.Content, q.QuestionMarks, q.TypeofQuestion
from q25 q
union all
(select t.ID, t.Content, t.QuestionMarks, t.TypeofQuestion
from t cross join
(select sum(questionmark) as questionmark_25 from q25) x
where not exists (select 1 from q25 where q25.id = t.id)
order by abs(questionmark - (25 - questionmark_25))
limit 1
)
This selects questions up to 25 but not at 25. It then tries to find one more to make the total 25.
Suppose questionmark is of type integer. Then you want to get some records in random order whose questionmark sum is not more than 25:
You can use the consecutive SUM() window function. The order is random. The consecutive SUM() adds every current value to the previous sum. So, you could filter where SUM() <= <your value>:
SELECT
*
FROM (
SELECT
*,
SUM(questionmark) OVER (ORDER BY random()) as total
FROM
t
)s
WHERE total <= 25
Note:
This returns a list of records, in random order, whose marks total no more than 25. The total can fall short of 25, because the list stops before the first record that would push the running sum past it.
Finding an exact match for your value is a combinatorial problem that shouldn't be solved in a database, especially when there's a random factor. What if your current SUM is 22 and the next randomly chosen value is 4? Would you retry, maybe until infinity, to randomly find a value = 3? Or would you try to remove an already counted record with value = 1?
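The behaviour of the running-sum filter can be traced in a few lines of Python. This is only a sketch of the same idea outside the database: `pick_questions`, the `marks` key and the `seed` parameter are hypothetical names, the data is the sample from the question, and all marks are assumed to be positive.

```python
import random

def pick_questions(questions, target, seed=None):
    """Shuffle, then keep questions while the running total stays <= target."""
    rng = random.Random(seed)
    shuffled = list(questions)
    rng.shuffle(shuffled)            # plays the role of ORDER BY random()
    picked, total = [], 0
    for q in shuffled:
        total += q["marks"]          # running SUM() over the random order
        if total > target:
            break                    # this row and every later one fail total <= target
        picked.append(q)
    return picked

questions = [
    {"id": 350, "marks": 2}, {"id": 758, "marks": 2}, {"id": 2425, "marks": 3},
    {"id": 2080, "marks": 1}, {"id": 2614, "marks": 2}, {"id": 2520, "marks": 2},
    {"id": 2196, "marks": 2},
]

picked = pick_questions(questions, target=5, seed=1)
print([q["id"] for q in picked], sum(q["marks"] for q in picked))
```

The total never exceeds the target, but depending on the shuffle it may stop short of it, which is the limitation described in the note above.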
Table1
ID
12
21
12
21
...
Condition
1)
I need to check that id is either 12 or 21. It should not be any other number.
The query below works:
SELECT distinct ltrim(id) from table1 where ltrim(id) = '12' or ltrim(id) = '21'
2)
I don't want mixed numbers: the ids should always be 12 or always be 21, like
id
12
12
12
or
id
21
21
21
Below query is working
Declare @0_Recorddup int = 0
SELECT @0_Recorddup = Count(id) from (SELECT distinct id from table1) t1
if (@0_Recorddup = 0) or (@0_Recorddup > 1)
begin
''error message
end
How can I merge both queries? Can anyone help me?
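One way the two checks above might be merged is a single aggregate query that counts the distinct ids and the rows outside the allowed pair at the same time. The sketch below runs that query in SQLite through Python's `sqlite3` module; it assumes id is stored as an integer, and `ids_are_valid` is a hypothetical helper written for this illustration. The SQL itself uses only `COUNT(DISTINCT ...)`, `SUM` and `CASE`.

```python
import sqlite3

def ids_are_valid(ids):
    """True when table1 holds exactly one distinct id and it is 12 or 21."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE table1 (id INTEGER)")
    conn.executemany("INSERT INTO table1 VALUES (?)", [(i,) for i in ids])
    distinct_ids, bad_rows = conn.execute("""
        SELECT COUNT(DISTINCT id),                              -- condition 2: no mixing
               SUM(CASE WHEN id IN (12, 21) THEN 0 ELSE 1 END)  -- condition 1: allowed values only
        FROM table1
    """).fetchone()
    conn.close()
    # An empty table gives COUNT = 0 and SUM = NULL, which is reported as invalid.
    return distinct_ids == 1 and bad_rows == 0

print(ids_are_valid([12, 12, 12]))  # True
print(ids_are_valid([12, 21, 12]))  # False: mixed values
print(ids_are_valid([13, 13]))      # False: value not allowed
```

In a stored procedure, the two aggregate results could feed the same IF test that the question already uses for its error message.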
ltrim(id) = '12'
You store id's - a numeric value - as a string? I am sorry, but I hope you program better.
For integers it is simple like in most other languages:
id = 12
and yes, etc. are doable. I would suggest you grab a book about SQL and start learning the basics. Seriously.
I need to check that id is either 12 or 21. It should not be any other number.
Simple. Trivial. Like in any other programming language:
id = 12 OR id = 21
alternative in SQL:
id IN (12, 21)
That is not as nice for 2 numbers, but it comes in handy fast.
"SQL for Dummies" (ISBN 1118607961, available through Amazon etc.) is a decent book for someone at your level. It explains the basics, like how not to compare numbers as strings.
I have a table Patients which looks like this:
PatientName DateOftest Eye L1 L2 L3 L4 L5
----------------------------------------------------------------
Mike 17-02-2009 L 23 25 40 32 30
Mike 17-02-2009 R 25 30 34 35 24
Bill 08-03-2006 L 20 24 30 24 25
Bill 08-03-2006 R 18 25 27 30 24
Now my query below finds the mean:
SELECT
PatientName, DateOfTest,
(MAX(L1) + MAX(L2) + MAX(L3) + MAX(L4) + MAX(L5))/4 as Mean,
SQRT(POW(L1 - Mean, 2) + POW(L2 - Mean, 2) + POW(L3 - Mean, 2) + POW(L4 - Mean, 2) + POW(L5 - Mean, 2)) AS Standard Deviation,
'Binocular' Eye
FROM
Patients
GROUP BY
PatientName, DateOfTest;
The above query is wrong because I have not stored the mean. Is there any way to store the mean so that I can use it to find the standard deviation in my code? I'm asking because I have a very lengthy query and many more records.
To store the mean and reuse it in your query, one option would be to use a Common Table Expression. You can join the CTE to the table to use the calculated mean multiple times.
I'll admit that I didn't understand the following line...
SQRT(POW(L1-Mean,2)+POW(L2-Mean,2)+POW(L3-Mean,2)+POW(L4-Mean,2)+POW(L5-Mean,2))
as Standard Deviation, 'Binocular' Eye
...but the query below shows how you would integrate the calculated mean into that line, which I think might need some additional work as well.
--This is the CTE to calculate the mean
WITH Mean_CTE AS
(
SELECT PatientName, DateOfTest,
(MAX(L1) + MAX(L2) + MAX(L3) + MAX(L4) + MAX(L5))/4 AS [Mean]
FROM Patients
GROUP BY PatientName, DateOfTest
)
--This is the original query
SELECT Patients.PatientName, Patients.DateOfTest, Mean_CTE.Mean AS Mean,
SQRT(POW(L1-Mean_CTE.Mean,2)+POW(L2-Mean_CTE.Mean,2)+POW(L3-Mean_CTE.Mean,2)
+POW(L4-Mean_CTE.Mean,2)+POW(L5-Mean_CTE.Mean,2)) as Standard Deviation,
'Binocular' Eye
FROM Patients
INNER JOIN Mean_CTE --This is where you join the two
ON Patients.PatientName = Mean_CTE.PatientName
AND Patients.DateOfTest = Mean_CTE.DateOfTest
GROUP BY Patients.PatientName, Patients.DateOfTest, Mean_CTE.Mean;
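For comparison, here is how the "compute the mean once, then reuse it" step looks when worked through for Mike's two rows in Python. This is a sketch under stated assumptions, not the query from the thread: it takes the per-column maximum across both eyes as the binocular value, divides by the number of columns (5), and uses the conventional population standard deviation, which divides the sum of squared differences by 5 before taking the square root.

```python
from statistics import mean, pstdev

# Mike, 17-02-2009: L1..L5 for each eye, copied from the table in the question.
left_eye = [23, 25, 40, 32, 30]
right_eye = [25, 30, 34, 35, 24]

# Binocular value per column: the larger of the two eyes, like MAX(L1), MAX(L2), ...
binocular = [max(l, r) for l, r in zip(left_eye, right_eye)]

avg = mean(binocular)         # computed once and reused, like the CTE column
std_dev = pstdev(binocular)   # sqrt(sum((x - avg) ** 2) / n)

print(binocular)              # [25, 30, 40, 35, 30]
print(avg)                    # 32
print(round(std_dev, 3))      # 5.099
```

Whether the sample or the population formula is the right one depends on the analysis; `statistics.stdev` gives the sample version.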
What about adding a computed column to the table that stores the result of the formula?
This is a rather simple concept, and it will store the value of the formula.
http://msdn.microsoft.com/en-us/library/ms191250(v=sql.105).aspx