SQL Server : how to select a fixed amount of rows (select every x-th value) - sql

A short description: I have a table with data that is updated over a certain time period. Now the problem is, that - depending on the nature of the sensor which sends the data - in this time period there could be either 50 data sets or 50.000. As I want to visualize this data (using ASP.NET / c#), for a first preview I would like to SELECT just 1000 values from the table.
I already have an approach doing this: I count the rows in the time period of interest, with a simple "where" clause to specify the sensor-id, save it as a variable in SQL, and then divide the count() by 1000. I've tried it in MS Access, where it works just fine:
set @divider = select count(*) from table where [...]
SELECT (Int([RowNumber]/@divider)), First(Value)
FROM myTable
GROUP BY (Int([RowNumber]/@divider));
The trick in Access was, that I simply have a data field ("RowNumber"), which is my PK/ID, and goes from 0 up. I tried to accomplish that in SQL Server using the ROW_NUMBER() method, which works more or less. I've got the right syntax for the method, but I can not use the GROUP BY statement
Windowed functions can only appear in the SELECT or ORDER BY
clauses.
meaning ROW_NUMBER() can't be in the GROUP BY statement.
Now I'm kinda stuck. I've tried to save the ROW_NUMBER value into a char or a separate column, and GROUP BY it later on, but I couldn't get it done. And somehow I start to think, that my strategy might have its weaknesses ...? :/
To clarify once more: I don't need to SELECT TOP 1000 from my table, because this would just mean that I select the first 1000 values (depending on the sorting). I need to SELECT every x-th value, while I can compute the x (and I could even round it to an INT, if that would help to get it done). I hope I was able to describe the problem understandable ...
This is my first post here on StackOverflow, I hope I didn't forget anything essential or important, if you need any further information (table structure, my queries so far, ...) please don't hesitate to ask. Any help or hint is highly appreciated - thanks in advance! :)
Update: SOLUTION! Big thanks to https://stackoverflow.com/users/52598/lieven!!!
Here is how I did it in the end:
I declare 2 variables - I count my rows and SET it into the first var. Then I use ROUND() on the just assigned variable, and divide it by 1000 (because in the end I want ABOUT 1000 values!). I split this operation into 2 variables, because if I used the value from the COUNT function as basis for my ROUND operation, there were some mistakes.
declare @myvar decimal(10,2)
declare @myvar2 decimal(10,2)
set @myvar = (select COUNT(*)
from value_table
where channelid=135 and myDate >= '2011-01-14 22:00:00.000' and myDate <= '2011-02-14 22:00:00.000'
)
set @myvar2 = ROUND(@myvar/1000, 0)
Now I have the rounded value, which I want to be my step size (take every x-th value, so this is our "x" ;)), stored in @myvar2. Next I subselect the data of the desired timespan and channel, add the ROW_NUMBER() as column "rn", and finally add a WHERE clause to the outer SELECT, where I divide the ROW_NUMBER by @myvar2: when the modulus is 0, the row is selected.
select * from
(
select (ROW_NUMBER() over (order by id desc)) as rn, myValue, myDate
from value_table
where channel_id=135 and myDate >= '2011-01-14 22:00:00.000' and myDate<= '2011-02-14 22:00:00.000'
) d
WHERE rn % @myvar2 = 0
Works like a charm - once again all my thanks to https://stackoverflow.com/users/52598/lieven, see the comment below for the original posting!
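The step-size logic above can also be sketched outside the database. This is a minimal Python illustration, not the T-SQL from the post; the function name `every_xth` and the `target` parameter are made up for the example. It adds one guard the SQL does not have: when the period holds fewer than about half of `target` rows, rounding count/1000 gives 0 and a modulus by 0 fails, so the step is clamped to at least 1.

```python
def every_xth(rows, target=1000):
    """Return roughly `target` evenly spaced rows from `rows`."""
    # Step size x: row count divided by target, rounded half up,
    # but never below 1 (a step of 0 would make the modulus fail).
    step = max(1, (2 * len(rows) + target) // (2 * target))
    # Number the rows from 1, like ROW_NUMBER(), and keep every x-th one.
    return [row for rn, row in enumerate(rows, start=1) if rn % step == 0]


sample = every_xth(list(range(50_000)))  # large sensor: step is 50
small = every_xth(list(range(50)))       # small sensor: step is clamped to 1
```

With 50,000 rows this keeps every 50th row; with only 50 rows it keeps all of them.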

In essence, all you need to do to select the x-th value is retain all rows where the modulus of the rownumber divided by x is 0.
WHERE rn % @x_thValues = 0
Now, to be able to use your ROW_NUMBER's result, you'll need to wrap the entire statement in a subselect
SELECT *
FROM (
SELECT *
, rn = ROW_NUMBER() OVER (ORDER BY Value)
FROM DummyData
) d
WHERE rn % @x_thValues = 0
Combined with a variable holding which x-th values you need, you might use something like this test script
DECLARE @x_thValues INTEGER = 2
;WITH DummyData AS (SELECT * FROM (VALUES (1), (2), (3), (4)) v (Value))
SELECT *
FROM (
SELECT *
, rn = ROW_NUMBER() OVER (ORDER BY Value)
FROM DummyData
) d
WHERE rn % @x_thValues = 0

One more option to consider:
Select Top 1000 *
From dbo.SomeTable
Where ....
Order By NewID()
but to be honest, I like the previous answer more than this one.
The question could be about performance.

Related

How to pull rows from a SQL table until quotas for multiple columns are met?

I've been able to find a few examples of questions similar to this one, but most only involve a single column being checked.
SQL Select until Quantity Met
Select rows until condition met
I have a large table representing facilities, with columns for each type of resource available and the number of those specific resources available per facility. I want this stored procedure to be able to take integer values in as multiple parameters (representing each of these columns) and a Lat/Lon. Then it should iterate over the table sorted by distance, and return all rows (facilities) until the required quantity of available resources (specified by the parameters) are met.
Data source example:
Id   Lat      Long   Resource1   Resource2   ...
1    50.123   4.23   5           12          ...
2    61.234   5.34   0           9           ...
3    50.634   4.67   21          18          ...
Result Wanted:
@latQuery = 50.634
@LongQuery = 4.67
@res1Query = 10
@res2Query = 20
Id   Lat      Long   Resource1   Resource2   ...
3    50.634   4.67   21          18          ...
1    50.123   4.23   5           12          ...
Result includes all rows that meet the queries individually. Result is also sorted by distance to the requested lat/lon
I'm able to sort the results by distance, and sum the total running values as suggested in other threads, but I'm having some trouble with the logic comparing the running values with the quota provided in the params.
First I have some CTEs to get most recent edits, order by distance and then sum the running totals
WITH cte1 AS (SELECT
@origin.STDistance(geography::Point(Facility.Lat, Facility.Long, 4326)) AS distance,
Facility.Resource1 as res1,
Facility.Resource2 as res2
-- ...etc
FROM Facility
),
cte2 AS (SELECT
distance,
res1,
SUM(res1) OVER (ORDER BY distance) AS totRes1,
res2,
SUM(res2) OVER (ORDER BY distance) AS totRes2
-- ...etc, there's 15-20 columns here
FROM cte1
)
Next, with the results of that CTE, I need to pull rows until all quotas are met. Having the issues here, where it works for one row but my logic with all the ANDs isn't exactly right.
SELECT * FROM cte2 WHERE (
(totRes1 <= @res1Query OR (totRes1 > @res1Query AND totRes1- res1 <= @totRes1)) AND
(totRes2 <= @res2Query OR (totRes2 > @res2Query AND totRes2- res2 <= @totRes2)) AND
-- ... I also feel like this method of pulling the next row once it's over may be convoluted as well?
)
As-is right now, it's mostly returning nothing, and I'm guessing it's because it's too strict? Essentially, I want to be able to let the total values go past the required values until they are all past the required values, and then return that list.
Has anyone come across a better method of searching using separate quotas for multiple columns?
See my update in the answers/comments
I think you are massively over-complicating this. This does not need any joins, just some running sum calculations, and the right OR logic.
The key to solving this is that you need all rows, where the running sum up to the previous row is less than the requirement for all requirements. This means that you include all rows where the requirement has not been met, and the first row for which the requirement has been met or exceeded.
To do this you can subtract the current row's value from the running sum.
You could utilize a ROWS specification of ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING. But then you need to deal with NULL on the first row.
In any event, even a regular running sum should always use ROWS UNBOUNDED PRECEDING, because the default is RANGE UNBOUNDED PRECEDING, which is subtly different and can cause incorrect results, as well as being slower.
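To make the ROWS versus RANGE difference concrete, here is a small Python sketch (not SQL) that mimics both framings on made-up data with a tie in the ordering column. The variable names are illustrative only. With RANGE, rows that share the same ORDER BY value are peers and all receive the sum through the last peer; with ROWS, the sum advances one row at a time.

```python
from itertools import accumulate, groupby

# (distance, resource) pairs, already sorted by distance; two rows tie at 2.0.
rows = [(1.0, 5), (2.0, 3), (2.0, 4), (3.0, 1)]

# ROWS UNBOUNDED PRECEDING: a plain row-by-row running sum.
rows_sum = list(accumulate(value for _, value in rows))

# RANGE UNBOUNDED PRECEDING: every row gets the sum up to and including
# all rows that share its ORDER BY value (its peers).
range_sum = []
total = 0
for _, peers in groupby(rows, key=lambda r: r[0]):
    peers = list(peers)
    total += sum(value for _, value in peers)
    range_sum.extend([total] * len(peers))
```

The two lists differ only on the first of the tied rows, which is exactly the kind of discrepancy that can shift where a quota appears to be met.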
You can also factor out the distance calculation into a CROSS APPLY (VALUES ...), avoiding the need for lots of CTEs or derived tables. You now only need one level of derivation.
DECLARE @origin geography = geography::Point(@latQuery, @LongQuery, 4326);
SELECT
f.Id,
f.Lat,
f.Long,
f.Resource1,
f.Resource2
FROM (
SELECT f.*,
SumRes1 = SUM(f.Resource1) OVER (ORDER BY v1.Distance ROWS UNBOUNDED PRECEDING) - f.Resource1,
SumRes2 = SUM(f.Resource2) OVER (ORDER BY v1.Distance ROWS UNBOUNDED PRECEDING) - f.Resource2
FROM Facility f
CROSS APPLY (VALUES(
@origin.STDistance(geography::Point(f.Lat, f.Long, 4326))
)) v1(Distance)
) f
WHERE (
f.SumRes1 < @res1Query
OR f.SumRes2 < @res2Query
);
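The OR logic in this answer can be sketched in plain Python as a sanity check. This is only an illustration under two assumptions: the facilities are already sorted by distance, and resource counts are never negative (so stopping at the first row where every quota is met gives the same result as the WHERE filter). The function name and the dictionary keys are hypothetical.

```python
def facilities_until_quotas_met(facilities, quotas):
    """facilities: dicts sorted by distance; quotas: {resource: needed}."""
    running = {name: 0 for name in quotas}  # sums over the previous rows
    selected = []
    for facility in facilities:
        # Keep the row while at least one quota is still unmet before it.
        if not any(running[name] < needed for name, needed in quotas.items()):
            break
        selected.append(facility)
        for name in quotas:
            running[name] += facility[name]
    return selected


# The question's sample rows, ordered by distance from (50.634, 4.67).
facilities = [
    {"Id": 3, "Resource1": 21, "Resource2": 18},
    {"Id": 1, "Resource1": 5, "Resource2": 12},
    {"Id": 2, "Resource1": 0, "Resource2": 9},
]
result = facilities_until_quotas_met(facilities, {"Resource1": 10, "Resource2": 20})
ids = [f["Id"] for f in result]
```

A quota of 0 counts as met from the start, because 0 < 0 is false, so unused resource types never keep the search going.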
Was able to figure out the problem on my own here. The primary issue I was running into was that I was comparing 25 different columns' running totals versus the 25 stored proc parameters (quotas of resources required by the search).
Changing the lines such as these
(totRes1 <= @res1Query OR (totRes1 > @res1Query AND totRes1- res1 <= @totRes1)) AND --...
to
(totRes1 <= @res1Query OR (totRes1 > @res1Query AND totRes1- res1 <= @totRes1) OR @res1Query = 0) AND --...
(adding in the OR @res1Query = 0) solved my issue.
In other words, the search is often only for one or two columns (types of resources), leaving the others as zero. The way my logic was set up caused it to skip over lots of rows because it was instantly marking them as having met the quota (value less than or equal to the quota). Like @A Neon Tetra suggested, I was pretty close to it already.
Update:
First attempt didn't exactly fix my own issue. Posting the stripped down version of my code that is now working for me.
DECLARE @Lat AS DECIMAL(12,6)
DECLARE @Lon AS DECIMAL(12,6)
DECLARE @res1Query AS INT
DECLARE @res2Query AS INT
-- repeat for Resource 3 through 25, etc...
DECLARE @origin geography = geography::Point(@Lat, @Lon, 4326);
-- CTE to be able to expose distance
WITH cte AS (SELECT TOP(99999) -- this is hacky: it won't let me order by distance unless I'm selecting TOP(x) or some other fn?
dbo.Facility.FacilityID,
dbo.Facility.Lat,
dbo.Facility.Lon,
@origin.STDistance(geography::Point(dbo.Facility.Lat, dbo.Facility.Lon, 4326))
AS distance,
dbo.Facility.Resource1 AS res1,
dbo.Facility.Resource2 AS res2,
-- repeat for Resource 3 through 25, etc...
FROM dbo.Facility
ORDER BY distance),
-- second CTE - has access to distance so we can keep track of a running total ordered by distance
---> have to separate into two since you can't reference the same alias (distance) again within the same SELECT
fullCTE AS (SELECT
FacilityID,
Lat,
Lon,
distance,
res1,
SUM(res1) OVER (ORDER BY distance)AS totRes1,
res2,
SUM(res2) OVER (ORDER BY distance)AS totRes2,
-- repeat for Resource 3 through 25, etc...
FROM cte)
SELECT * -- Customize what you're pulling here for your output as needed
FROM dbo.Facility INNER JOIN fullCTE ON (fullCTE.FacilityID = dbo.Facility.FacilityID)
WHERE EXISTS
(SELECT
FacilityID
FROM fullCTE WHERE (
FacilityID = dbo.Facility.FacilityID AND
-- Keep pulling rows until all conditions are met, as opposed to pulling rows while they're under the quota
NOT (
((totRes1 - res1 >= @res1Query AND @res1Query <> 0) OR (@res1Query = 0)) AND
((totRes2 - res2 >= @res2Query AND @res2Query <> 0) OR (@res2Query = 0)) AND
-- repeat for Resource 3 through 25, etc...
)
)
)

Sql -after group by I need to take rows with newest date

I need to write a query in sql and I can't do it correctly. I have a table with 7 columns 1st_num, 2nd_num, 3rd_num, opening_Date, Amount, code, cancel_Flag.
For every combination of 1st_num, 2nd_num, 3rd_num I want to take only the record with the minimum cancel_flag, and if there is more than one such row, the one with the newest opening_Date.
But when I do group by and choose min and max for the relevant fields, I get a mix of the rows, for example:
1. 12,130,45678,2015-01-01,2005,333,0
2. 12,130,45678,2015-01-09,105,313,0
The result will be:
12,130,45678,2015-01-09,2005,333,0
which mixes the two rows into one.
I'm on Microsoft SQL Server 2008, using SSIS from Visual Studio 2008.
My code is:
SELECT
1st_num,
2nd_num,
3rd_num,
MAX(opening_date),
MAX (Amount),
code,
MIN(cancel_flag)
FROM dbo.tablename
GROUP BY
1st_num,
2nd_num,
3rd_num,
code
HAVING COUNT(*) > 1
How do I take the row with the max date or min cancel flag as it is, without mixing values?
I can't really post my code because of security reasons but I'm sure you can help.
thank you,
Oren
It is difficult to answer without knowing the DBMS, because each one has different syntax.
Anyway, this should work for most DBMSs: use the ROW_NUMBER() function to rank the rows, and take only the first one according to your conditions:
SELECT * FROM (
SELECT t.*,
ROW_NUMBER() OVER ( PARTITION BY t.1st_num,t.2nd_num,t.3rd_num order by t.cancel_flag asc,t.opening_date desc) as row_num
FROM YourTable t ) as tableTempName
WHERE row_num = 1
Use NOT EXISTS to return a row as long as no other row with same 1st_num, 2nd_num, 3rd_num has a lower cancel_flag value, or same cancel_flag but a higher opening_Date.
select *
from tablename t1
where not exists (select 1 from tablename t2
where t2.1st_num = t1.1st_num
and t2.2nd_num = t1.2nd_num
and t2.3rd_num = t1.3rd_num
and (t2.cancel_flag < t1.cancel_flag
or (t2.cancel_flag = t1.cancel_flag and
t2.opening_Date > t1.opening_Date)))
Core ANSI SQL-99, expected to work with (almost) any dbms.
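Both answers pick one whole row per group instead of aggregating column by column. As a rough Python sketch of that rule (lowest cancel_flag first, then newest opening date), using the two sample rows from the question; the tuple layout and variable names are assumptions made for the example:

```python
from datetime import date

rows = [
    # (first_num, second_num, third_num, opening_date, amount, code, cancel_flag)
    (12, 130, 45678, date(2015, 1, 1), 2005, 333, 0),
    (12, 130, 45678, date(2015, 1, 9), 105, 313, 0),
]

best = {}
for row in rows:
    key = row[:3]
    # Lower cancel_flag wins; on a tie, the later opening_date wins.
    rank = (row[6], -row[3].toordinal())
    if key not in best or rank < best[key][0]:
        best[key] = (rank, row)

result = [row for _, row in best.values()]
```

The surviving row keeps its own amount and code (105 and 313), which is the behaviour the GROUP BY with MAX/MIN could not give.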

SQL Server - Multiplying row values for a given column value [duplicate]

I'm looking for something like SELECT PRODUCT(table.price) FROM table GROUP BY table.sale, similar to how SUM works.
Have I missed something in the documentation, or is there really no PRODUCT function?
If so, why not?
Note: I looked for the function in Postgres, MySQL and MSSQL and found none, so I assumed no SQL dialect supports it.
For MSSQL you can use this. It can be adapted for other platforms: it's just maths and aggregates on logarithms.
SELECT
GrpID,
CASE
WHEN MinVal = 0 THEN 0
WHEN Neg % 2 = 1 THEN -1 * EXP(ABSMult)
ELSE EXP(ABSMult)
END
FROM
(
SELECT
GrpID,
--log of +ve row values
SUM(LOG(ABS(NULLIF(Value, 0)))) AS ABSMult,
--count of -ve values. Even = +ve result.
SUM(SIGN(CASE WHEN Value < 0 THEN 1 ELSE 0 END)) AS Neg,
--anything * zero = zero
MIN(ABS(Value)) AS MinVal
FROM
Mytable
GROUP BY
GrpID
) foo
Taken from my answer here: SQL Server Query - groupwise multiplication
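The same three ingredients (sum of logs of absolute values, parity of the negative count, and a zero check) can be sketched in Python to see how they fit together. This is an illustration of the idea rather than a translation of the query; the function name is made up.

```python
import math


def product_via_logs(values):
    """Multiply numbers with the SUM(LOG(ABS(...))) trick; result is a float."""
    if any(v == 0 for v in values):
        return 0.0  # anything * zero = zero
    negatives = sum(1 for v in values if v < 0)
    magnitude = math.exp(sum(math.log(abs(v)) for v in values))
    # An odd number of negative factors makes the product negative.
    return -magnitude if negatives % 2 == 1 else magnitude
```

For example, `product_via_logs([2, 5, 7, 8])` is approximately 560, and `product_via_logs([-2, -3, -5])` is approximately -30; the results are floats, so compare them with a tolerance.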
I don't know why there isn't one, but you can use logs and exponents (taking extra care over negative numbers) to do:
select exp (sum (ln (table.price))) from table ...
There is no PRODUCT set function in the SQL Standard. It would appear to be a worthy candidate, though (unlike, say, a CONCATENATE set function: it's not a good fit for SQL e.g. the resulting data type would involve multivalues and pose a problem as regards first normal form).
The SQL Standards aim to consolidate functionality across SQL products circa 1990 and to provide 'thought leadership' on future development. In short, they document what SQL does and what SQL should do. The absence of a PRODUCT set function suggests that in 1990 no vendor thought it worthy of inclusion and there has been no academic interest in introducing it into the Standard.
Of course, vendors have always sought to add their own functionality, these days usually as extensions to Standards rather than tangentially. I don't recall seeing a PRODUCT set function (or even demand for one) in any of the SQL products I've used.
In any case, the workaround is fairly simple using log and exp scalar functions (and logic to handle negatives) with the SUM set function; see @gbn's answer for some sample code. I've never needed to do this in a business application, though.
In conclusion, my best guess is that there is no demand from SQL end users for a PRODUCT set function; further, that anyone with an academic interest would probably find the workaround acceptable (i.e. would not value the syntactic sugar a PRODUCT set function would provide).
Out of interest, there is indeed demand in SQL Server Land for new set functions but for those of the window function variety (and Standard SQL, too). For more details, including how to get involved in further driving demand, see Itzik Ben-Gan's blog.
You can perform a product aggregate function, but you have to do the maths yourself, like this...
SELECT
Exp(Sum(IIf(Abs([Num])=0,0,Log(Abs([Num])))))*IIf(Min(Abs([Num]))=0,0,1)*(1-2*(Sum(IIf([Num]>=0,0,1)) Mod 2)) AS P
FROM
Table1
Source: http://productfunctionsql.codeplex.com/
There is a neat trick in T-SQL (not sure if it's ANSI) that lets you concatenate string values from a set of rows into one variable. It looks like it works for multiplying as well:
declare @Floats as table (value float)
insert into @Floats values (0.9)
insert into @Floats values (0.9)
insert into @Floats values (0.9)
declare @multiplier float = null
select
@multiplier = isnull(@multiplier, 1) * value
from @Floats
select @multiplier
This can potentially be more numerically stable than the log/exp solution.
I think that is because no numeric type can accommodate the product of many values. As databases are designed for large numbers of records, a product of 1000 numbers would be enormous, and in the case of floating point numbers the propagated error would be huge.
Also note that using log can be a dangerous solution. Although mathematically log(a*b) = log(a)+log(b), this might not hold on computers, as we are not dealing with real numbers. If you calculate exp(log(a)+log(b)) instead of a*b, you may get unexpected results. For example:
SELECT 9999999999*99999999974482, EXP(LOG(9999999999)+LOG(99999999974482))
in Sql Server returns
999999999644820000025518, 9.99999999644812E+23
So my point is: when you are trying to compute the product, do it carefully and test it heavily.
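The same comparison is easy to reproduce in Python, where integers have arbitrary precision and floats are IEEE doubles. This is just a sketch to show the size of the discrepancy, using the two numbers from the example above.

```python
import math

a, b = 9999999999, 99999999974482

exact = a * b                                 # arbitrary-precision integer product
approx = math.exp(math.log(a) + math.log(b))  # float round trip through logs

# How far the log/exp result is from the true product, relative to its size.
relative_error = abs(approx - exact) / exact
```

The relative error is tiny, but on a 24-digit number it still means the low-order digits of the log/exp result cannot be trusted.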
One way to deal with this problem (if you are working in a scripting language) is to use the group_concat function.
For example, SELECT group_concat(table.price) FROM table GROUP BY table.sale
This will return a string with all prices for the same sale value, separated by a comma.
Then with a parser you can get each price, and do a multiplication. (In php you can even use the array_reduce function, in fact in the php.net manual you get a suitable example).
Cheers
Another approach, based on the fact that the cardinality of a Cartesian product is the product of the cardinalities of the individual sets ;-)
⚠ WARNING: This example is just for fun and is rather academic, don't use it in production! (apart from the fact it's just for positive and practically small integers)⚠
with recursive t(c) as (
select unnest(array[2,5,7,8])
), p(a) as (
select array_agg(c) from t
union all
select p.a[2:]
from p
cross join generate_series(1, p.a[1])
)
select count(*) from p where cardinality(a) = 0;
The problem can be solved using modern SQL features such as window functions and CTEs. Everything is standard SQL and, unlike logarithm-based solutions, does not require switching from the integer world to the floating point world or special handling of nonpositive numbers. Just number the rows and evaluate the product in a recursive query until no rows remain:
with recursive t(c) as (
select unnest(array[2,5,7,8])
), r(c,n) as (
select t.c, row_number() over () from t
), p(c,n) as (
select c, n from r where n = 1
union all
select r.c * p.c, r.n from p join r on p.n + 1 = r.n
)
select c from p where n = (select max(n) from p);
As your question involves grouping by the sale column, things get a little more complicated, but it's still solvable:
with recursive t(sale,price) as (
select 'multiplication', 2 union
select 'multiplication', 5 union
select 'multiplication', 7 union
select 'multiplication', 8 union
select 'trivial', 1 union
select 'trivial', 8 union
select 'negatives work', -2 union
select 'negatives work', -3 union
select 'negatives work', -5 union
select 'look ma, zero works too!', 1 union
select 'look ma, zero works too!', 0 union
select 'look ma, zero works too!', 2
), r(sale,price,n,maxn) as (
select t.sale, t.price, row_number() over (partition by sale), count(1) over (partition by sale)
from t
), p(sale,price,n,maxn) as (
select sale, price, n, maxn
from r where n = 1
union all
select p.sale, r.price * p.price, r.n, r.maxn
from p
join r on p.sale = r.sale and p.n + 1 = r.n
)
select sale, price
from p
where n = maxn
order by sale;
Result:
sale,price
"look ma, zero works too!",0
multiplication,560
negatives work,-30
trivial,8
Tested on Postgres.
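For comparison, the same per-group fold can be written in a few lines of Python when the multiplication is done in application code instead. This sketch reuses the test data from the query above and multiplies each group's prices one row at a time; the variable names are illustrative.

```python
from collections import defaultdict
from functools import reduce
from operator import mul

rows = [
    ("multiplication", 2), ("multiplication", 5),
    ("multiplication", 7), ("multiplication", 8),
    ("trivial", 1), ("trivial", 8),
    ("negatives work", -2), ("negatives work", -3), ("negatives work", -5),
    ("look ma, zero works too!", 1), ("look ma, zero works too!", 0),
    ("look ma, zero works too!", 2),
]

prices_by_sale = defaultdict(list)
for sale, price in rows:
    prices_by_sale[sale].append(price)

# Fold each group's prices one row at a time, as the recursive CTE does.
products = {sale: reduce(mul, prices) for sale, prices in prices_by_sale.items()}
```

Because Python integers do not overflow, this stays exact for integer inputs, just like the recursive query stays in the integer world.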
Here is an oracle solution for anyone who needs it
with data(id, val) as(
select 1,1.0 from dual union all
select 2,-2.0 from dual union all
select 3,1.0 from dual union all
select 4,2.0 from dual
),
neg(val , modifier) as(
select exp(sum(ln(abs(val)))), case when mod(count(*),2) = 0 then 1 Else -1 end
from data
where val <0
)
,
pos(val) as (
select exp(sum(ln(val)))
from data
where val >=0
)
select (select val*modifier from neg)*(select val from pos) product from dual

Infinite Scroll with shuffle results

How do I return random results that do not repeat?
For example, I've an infinite scrolling page, every time I get to the bottom it returns ten results, but sometimes the results are repeated.
I'm using this query to get results:
SELECT TOP 10 * FROM table_name ORDER BY NEWID()
Sorry, I don't know if you'll understand.
When you call the query from your application, set the seed for the RAND() function.
SET @rand = RAND(your_seed); -- initialize RAND with the seed.
SELECT * FROM table_name
ORDER BY RAND() -- Calls to RAND should now be based on the seed
OFFSET 0 LIMIT 10 -- use some MsSQL equivalent here ;)
(not tested)
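The idea behind seeding is that the same seed always produces the same shuffled order, so consecutive pages are slices of one fixed permutation and cannot overlap. Here is a minimal Python sketch of that idea, done in application code rather than in SQL; the function name `page` and its parameters are hypothetical.

```python
import random


def page(ids, seed, page_number, page_size=10):
    """Return one page of a shuffle that is fixed by `seed`."""
    order = list(ids)
    random.Random(seed).shuffle(order)  # same seed -> same order every call
    start = page_number * page_size
    return order[start:start + page_size]


ids = list(range(1, 36))
# Four consecutive pages for one visitor, all requested with the same seed.
seen = [i for n in range(4) for i in page(ids, seed=42, page_number=n)]
```

Keeping one seed per visitor or session gives each of them a different order while still avoiding repeats as they scroll.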
Apparently, NEWID() has known distributional problems. Although random, the numbers sometimes cluster together. This would account for what you are seeing. You could try this:
SELECT TOP 10 *
FROM table_name
ORDER BY rand(checksum(NEWID()));
This may give you better results.
The real answer, though, is to use a seeded pseudo-random number generator. Basically, enumerate the rows of the table and store the value in the table. Or calculate it in a deterministic way. Then do simple math to choose a row:
with t as (
select t.*, row_number() over (order by id) as seqnum,
count(*) over () as cnt
from table_name
)
select t.*
from t
where mod(seqnum * 74873, cnt) = 13907;
The numbers are just two prime numbers, which ensure a lack of cycles.
EDIT:
Here is a more complete solution to your problem:
with t as (
select t.*, row_number() over (order by id) as seqnum,
count(*) over () as cnt
from table_name
)
select t.*
from t
where mod(seqnum * 74873 + 13907, cnt) <= 10;
Or whatever the limits are. The idea is that using a large prime number for the multiplicative factor makes it highly likely (but not 100% certain) that cnt and "74873" are what is called "mutually prime" or "coprime". This means that the pseudo-random number generator just described will rearrange the sequence numbers, and you can just use comparisons to get a certain number of rows. This is part of mathematics called Number Theory.
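The coprimality argument can be checked with a short Python sketch: when the multiplier and the row count share no common factor, mapping each sequence number through (seqnum * a + b) mod cnt hits every position exactly once. The function name is made up, and the row count of 1000 is an arbitrary example value.

```python
from math import gcd


def shuffled_position(seqnum, cnt, a=74873, b=13907):
    """Map a row number 1..cnt to a pseudo-random position 0..cnt-1."""
    return (seqnum * a + b) % cnt


cnt = 1000
coprime = gcd(74873, cnt) == 1  # the condition the answer relies on

positions = [shuffled_position(n, cnt) for n in range(1, cnt + 1)]
# Rows whose shuffled position falls in the first ten slots form one page.
first_page = [n for n in range(1, cnt + 1) if shuffled_position(n, cnt) < 10]
```

If `coprime` were false for some row count, several rows would collide on the same position and others would never be reached, so it is worth checking against the actual count.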

how to select lines in Mysql while a condition lasts

I have something like this:
Name.....Value
A...........10
B............9
C............8
Meaning, the values are in descending order. I need to create a new table that will contain the values that make up 60% of the total values. So, this could be a pseudocode:
set Total = sum(value)
set counter = 0
foreach line from table OriginalTable do:
counter = counter + value
if counter > 0.6*Total then break
else insert line into FinalTable
end
As you can see, I'm parsing the sql lines here. I know this can be done using handlers, but I can't get it to work. So, any solution using handlers or something else creative will be great.
It should also be in a reasonable time complexity - the solution how to select values that sum up to 60% of the total
works, but it's slow as hell :(
Thanks!!!!
You'll likely need to use the lead() or lag() window function, possibly with a recursive query to merge the rows together. See this related question:
merge DATE-rows if episodes are in direct succession or overlapping
And in case you're using MySQL, you can work around the lack of window functions by using something like this:
Mysql query problem
I don't know which analytical functions SQL Server (which I assume you are using) supports; for Oracle, you could use something like:
select v.*,
cumulative/overall percent_current,
previous_cumulative/overall percent_previous from (
select
id,
name,
value,
cumulative,
lag(cumulative) over (order by id) as previous_cumulative,
overall
from (
select
id,
name,
value,
sum(value) over (order by id) as cumulative,
(select sum(value) from mytab) overall
from mytab
order by id)
) v
Explanation:
- sum(value) over ... computes a running total for the sum
- lag() gives you the value for the previous row
- you can then combine these to find the first row where percent_current > 0.6 and percent_previous < 0.6
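The running-total idea maps directly onto the pseudocode from the question. As a rough Python sketch of the same loop (assuming the rows are already in the desired order, here descending by value; the function name is illustrative):

```python
from itertools import accumulate


def rows_up_to_share(rows, share=0.6):
    """rows: (name, value) pairs in the desired order."""
    total = sum(value for _, value in rows)
    running = accumulate(value for _, value in rows)
    selected = []
    for row, cumulative in zip(rows, running):
        if cumulative > share * total:  # same test as the pseudocode's break
            break
        selected.append(row)
    return selected


rows = [("A", 10), ("B", 9), ("C", 8)]
final = rows_up_to_share(rows)
```

This is a single pass over the data, which is what the window-function version also achieves inside the database.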