SQL: the most efficient way to get the row number of one element

I have a table of persons:
id | Name | Age
1 | Alex | 18
2 | Peter | 30
3 | Zack | 25
4 | Bim | 30
5 | Ken | 20
And I have the following interval of rows: WHERE ID>1 AND ID<5. I know that this interval contains a person whose id=3. What is the most efficient (fastest) way to get that person's row number within the interval (in my example, rownumber=2)? I don't need any other data. I need only one thing: the row position of the person with id=3 in the interval WHERE ID>1 AND ID<5.
If possible, I would like a general SQL solution rather than a vendor-specific one. If that's not possible, I need a solution for PostgreSQL and H2.

The row number is the number of rows from the first row in the interval up to and including the row you're looking for. Since the ids are integers, ID>1 is the same as ID>=2, so for interval ID>1 AND ID<5 and target row ID=3, this is:
select count(*)
from YourTable
where id between 2 and 3
For interval ID>314 AND ID<1592 and target row ID=1000, you'd use:
where id between 315 and 1000
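The counting idea can be checked with a minimal sketch that runs it against an in-memory SQLite database from Python. SQLite stands in for PostgreSQL/H2 here, and the `persons` table and the helper function names are illustrative assumptions. Writing the bounds as `id > 1 AND id <= 3` gives the same count without assuming integer ids. For comparison, the sketch also computes the position with the standard `ROW_NUMBER()` window function, a different technique that PostgreSQL and H2 also support (SQLite needs version 3.25 or later for it).

```python
import sqlite3


def make_db():
    # In-memory copy of the persons table from the question.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE persons (id INTEGER PRIMARY KEY, name TEXT, age INTEGER)")
    conn.executemany(
        "INSERT INTO persons VALUES (?, ?, ?)",
        [(1, "Alex", 18), (2, "Peter", 30), (3, "Zack", 25), (4, "Bim", 30), (5, "Ken", 20)],
    )
    return conn


def position_by_count(conn, low, high, target):
    # Rows in the open interval (low, high) whose id is <= target,
    # i.e. the 1-based position of target inside the interval.
    (n,) = conn.execute(
        "SELECT COUNT(*) FROM persons WHERE id > ? AND id < ? AND id <= ?",
        (low, high, target),
    ).fetchone()
    return n


def position_by_row_number(conn, low, high, target):
    # Same position via ROW_NUMBER(); returns None if target is not in the interval.
    row = conn.execute(
        """
        SELECT rn FROM (
            SELECT id, ROW_NUMBER() OVER (ORDER BY id) AS rn
            FROM persons
            WHERE id > ? AND id < ?
        ) AS numbered
        WHERE id = ?
        """,
        (low, high, target),
    ).fetchone()
    return row[0] if row else None


conn = make_db()
print(position_by_count(conn, 1, 5, 3))       # 2
print(position_by_row_number(conn, 1, 5, 3))  # 2
```

The COUNT form only needs the rows up to the target, so with an index on id it can be answered by a range scan; the ROW_NUMBER form numbers every row in the interval before filtering.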
To be sure that there is an element with ID=3, use:
select count(*)
from YourTable
where id between 2 and
(
select id
from YourTable
where id = 3
)
This will return 0 if the row doesn't exist.
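The "returns 0" behaviour comes from NULL handling: when no row has the requested id, the scalar subquery yields NULL, `id BETWEEN 2 AND NULL` is never true, and `COUNT(*)` is 0. A minimal sketch against SQLite from Python shows this; the data is hypothetical and deliberately leaves out id 3.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE YourTable (id INTEGER PRIMARY KEY)")
# Hypothetical data: id 3 is intentionally missing.
conn.executemany("INSERT INTO YourTable VALUES (?)", [(i,) for i in (1, 2, 4, 5)])

QUERY = """
SELECT COUNT(*)
FROM YourTable
WHERE id BETWEEN 2 AND (SELECT id FROM YourTable WHERE id = ?)
"""


def position_or_zero(target):
    # A missing target makes the upper bound NULL, so nothing matches.
    return conn.execute(QUERY, (target,)).fetchone()[0]


print(position_or_zero(3))  # 0, because id 3 does not exist
print(position_or_zero(4))  # 2, counting ids 2 and 4
```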


Split a quantity into multiple rows with limit on quantity per row

I have a table of ids and quantities that looks like this:
dbo.Quantity
id | qty
-------
1 | 3
2 | 6
I would like to split the quantity column into multiple rows and number them, but with a set limit (which can be arbitrary) on the maximum quantity allowed in each row.
So for a limit of 2, the expected output should be:
dbo.DesiredResult
id | qty | bucket
---------------
1 | 2 | 1
1 | 1 | 2
2 | 1 | 2
2 | 2 | 3
2 | 2 | 4
2 | 1 | 5
In other words:
Running SELECT id, SUM(qty) AS qty FROM dbo.DesiredResult GROUP BY id should return the original table (dbo.Quantity).
Running
SELECT MIN(id) AS id, SUM(qty) AS qty, bucket FROM dbo.DesiredResult GROUP BY bucket
should give you this table (bucket 2 holds one unit from each of ids 1 and 2, so MIN(id) reports 1):
id | qty | bucket
------------------
1 | 2 | 1
1 | 2 | 2
2 | 2 | 3
2 | 2 | 4
2 | 1 | 5
I feel I could do this imperatively with cursors, looping over each row and keeping a counter that increments and resets as the "max" for each bucket is filled. But this is very "anti-SQL", and I feel there is a better way.
One approach is a recursive CTE, which emulates a cursor going through the rows sequentially.
Another approach that comes to mind is to represent your data as intervals and intersections of intervals.
Represent this:
id | qty
-------
1 | 3
2 | 6
as intervals [0;3), [3;9) with ids being their labels
0123456789
|--|-----|
 1    2    - id
It is easy to generate this set of intervals using a running total, SUM() OVER().
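A minimal sketch of the running-total step, run in SQLite from Python (SQLite 3.25+ for window functions; the `dbo.` schema is dropped). The columns are named `start_pos`/`end_pos` because `END` is a reserved word in most dialects: each interval ends at the running total and starts at the running total minus its own qty.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Quantity (id INTEGER PRIMARY KEY, qty INTEGER)")
conn.executemany("INSERT INTO Quantity VALUES (?, ?)", [(1, 3), (2, 6)])

# Running total = end of each half-open interval [start_pos, end_pos).
rows = conn.execute(
    """
    SELECT id,
           SUM(qty) OVER (ORDER BY id) - qty AS start_pos,
           SUM(qty) OVER (ORDER BY id)       AS end_pos
    FROM Quantity
    ORDER BY id
    """
).fetchall()
print(rows)  # [(1, 0, 3), (2, 3, 9)]
```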
Represent your buckets also as intervals [0;2), [2;4), [4;6), etc. with their own labels
0123456789
|-|-|-|-|-|
1 2 3 4 5 - bucket
It is easy to generate this set of intervals using a table of numbers.
Intersect these two sets of intervals preserving information about their labels.
Working with sets should be possible in a set-based SQL query, rather than a sequential cursor or recursion.
It is a bit too much for me to write down the actual query right now. But it is quite possible that ideas similar to those discussed in Packing Intervals by Itzik Ben-Gan may be useful here.
Actually, once you have your quantities represented as intervals, you can generate the required number of rows/buckets on the fly from the table of numbers using CROSS APPLY.
Imagine we transformed your Quantity table into Intervals:
Start | End | ID
0 | 3 | 1
3 | 9 | 2
And we also have a table of numbers - a table Numbers with column Number with values from 0 to, say, 100K.
For each Start and End of the interval we can calculate the corresponding bucket number by dividing the value by the bucket size and rounding down or up.
Something along these lines:
SELECT
Intervals.ID
,A.qty
,A.Bucket
FROM
Intervals
CROSS APPLY
(
SELECT
Numbers.Number + 1 AS Bucket
,@BucketSize AS qty
-- it is equal to @BucketSize if the bucket is completely within the Start and End boundaries
-- it should be adjusted for the first and last buckets of the interval
FROM Numbers
WHERE
Numbers.Number >= Intervals.Start / @BucketSize
AND Numbers.Number < Intervals.[End] / @BucketSize + 1 -- END is a reserved word, hence the brackets
) AS A
;
You'll need to check the formulas and adjust them for off-by-one errors.
You'll also need some CASE WHEN logic to calculate the correct qty for the buckets that fall on the lower and upper boundaries of the interval.
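One way the boundary logic could be filled in is sketched below, in SQLite via Python rather than T-SQL. Assumptions: SQLite has no CROSS APPLY, so a JOIN on the same range condition is used instead (equivalent here because the subquery is correlated only through its WHERE clause); a small `Numbers` table (0..99) and the alias names `s`/`e` are made up for the example. Each row's qty is the overlap of the interval `[s, e)` with the bucket `[n*size, (n+1)*size)`, and the upper bound uses a ceiling division so that an interval ending exactly on a bucket edge does not produce an empty extra bucket.

```python
import sqlite3

BUCKET_SIZE = 2  # the per-bucket limit from the question

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Quantity (id INTEGER PRIMARY KEY, qty INTEGER)")
conn.executemany("INSERT INTO Quantity VALUES (?, ?)", [(1, 3), (2, 6)])
# Hypothetical numbers table; 0..99 is plenty for this data.
conn.execute("CREATE TABLE Numbers (Number INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO Numbers VALUES (?)", [(n,) for n in range(100)])

SQL = """
WITH Intervals AS (
    SELECT id,
           SUM(qty) OVER (ORDER BY id) - qty AS s,
           SUM(qty) OVER (ORDER BY id)       AS e
    FROM Quantity
)
SELECT i.id,
       -- overlap of [s, e) with the bucket [n*size, (n+1)*size)
       (CASE WHEN i.e < (n.Number + 1) * :size THEN i.e ELSE (n.Number + 1) * :size END)
     - (CASE WHEN i.s > n.Number * :size THEN i.s ELSE n.Number * :size END) AS qty,
       n.Number + 1 AS bucket
FROM Intervals AS i
JOIN Numbers AS n
  -- first bucket: floor(s / size); last bucket: ceil(e / size) - 1
  ON n.Number >= i.s / :size
 AND n.Number < (i.e + :size - 1) / :size
ORDER BY i.id, bucket
"""

rows = conn.execute(SQL, {"size": BUCKET_SIZE}).fetchall()
for row in rows:
    print(row)  # (id, qty, bucket), matching dbo.DesiredResult
```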
Use a recursive CTE:
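with cte as (
-- one row per unit of qty: n = 1 .. qty
select id, 1 as n, qty
from t
union all
select id, n + 1, qty
from cte
where n < qty
)
select id, n
from cte;
-- In SQL Server recursion stops after 100 levels by default; for larger qty values add OPTION (MAXRECURSION 0) to the final SELECT.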
Here is a db<>fiddle.
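A minimal sketch of the recursive CTE run in SQLite from Python. SQLite and PostgreSQL spell it `WITH RECURSIVE`, while SQL Server uses plain `WITH`. Note that this query produces one row per unit of qty; it does not apply the bucket limit by itself.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, qty INTEGER)")
conn.executemany("INSERT INTO t VALUES (?, ?)", [(1, 3), (2, 6)])

rows = conn.execute(
    """
    WITH RECURSIVE cte AS (
        SELECT id, 1 AS n, qty FROM t
        UNION ALL
        SELECT id, n + 1, qty FROM cte
        WHERE n < qty            -- stop once n reaches qty
    )
    SELECT id, n FROM cte ORDER BY id, n
    """
).fetchall()
print(len(rows))  # 9: three rows for id 1, six for id 2
```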

Find highest (max) date query, and then find highest value from results of previous query

Here is a table called packages:
id | packages_sent | date | sent_order
1 | 10 | 2017-02-11 | 1
2 | 25 | 2017-03-15 | 1
3 | 5 | 2017-04-08 | 1
4 | 20 | 2017-05-21 | 1
5 | 25 | 2017-05-21 | 2
6 | 5 | 2017-06-19 | 1
This table shows the number of packages sent on a given date; if there were multiple packages sent on the same date (as is the case with rows 4 and 5), then the sent_order keeps track of the order in which they were sent.
I am trying to make a query that will return sum(packages_sent) under the following conditions: first, return the row with the max(date) (on or before some given date), and second, if there are multiple rows with the same max(date), return the row with the max(sent_order) (the highest sent_order value).
Here is the query I have so far:
SELECT sum(packages_sent)
FROM packages
WHERE date IN
(SELECT max(date)
FROM packages
WHERE date <= '2017-05-29');
This query correctly finds the max date, which is 2017-05-21, but then for the sum it returns 45 because it is adding rows 4 and 5 together.
I want the query to return the max(date), and if there are multiple rows with the same max(date), then return the row with the max(sent_order). Using the example above with the date 2017-05-29, it should only return 25.
I don't see where a sum() comes into play. You seem to only want the last row:
select p.*
from packages p
where p.date <= '2017-05-29'
order by p.date desc, p.sent_order desc
fetch first 1 row only;
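A minimal sketch of the "last row on or before a date" query, run in SQLite from Python with the question's data. SQLite does not support `FETCH FIRST 1 ROW ONLY`, so `LIMIT 1` is used instead; the helper name `latest_sent` is illustrative. ISO date strings compare correctly as text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE packages (id INTEGER PRIMARY KEY, packages_sent INTEGER, date TEXT, sent_order INTEGER)"
)
conn.executemany(
    "INSERT INTO packages VALUES (?, ?, ?, ?)",
    [
        (1, 10, "2017-02-11", 1),
        (2, 25, "2017-03-15", 1),
        (3, 5, "2017-04-08", 1),
        (4, 20, "2017-05-21", 1),
        (5, 25, "2017-05-21", 2),
        (6, 5, "2017-06-19", 1),
    ],
)


def latest_sent(cutoff):
    # Latest date on or before the cutoff, ties broken by the highest sent_order.
    row = conn.execute(
        """
        SELECT packages_sent
        FROM packages
        WHERE date <= ?
        ORDER BY date DESC, sent_order DESC
        LIMIT 1
        """,
        (cutoff,),
    ).fetchone()
    return row[0] if row else None


print(latest_sent("2017-05-29"))  # 25
```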
If your data really is ordered ascending as you show it, then it's easier to use the surrogate key ID field.
SELECT packages_sent
FROM packages
WHERE ID =
(SELECT max(ID)
FROM packages
WHERE date <= '2017-05-29');
Since the ID always increases with date and sent_order, finding its max also finds the max of the other two in one step.
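The MAX(ID) shortcut depends entirely on ids being assigned in (date, sent_order) order. The sketch below (SQLite via Python; function names illustrative) compares it with an explicit ORDER BY, first on the question's data and then after adding a hypothetical back-dated row that was entered late and therefore has a higher id.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE packages (id INTEGER PRIMARY KEY, packages_sent INTEGER, date TEXT, sent_order INTEGER)"
)
conn.executemany(
    "INSERT INTO packages VALUES (?, ?, ?, ?)",
    [
        (1, 10, "2017-02-11", 1),
        (2, 25, "2017-03-15", 1),
        (3, 5, "2017-04-08", 1),
        (4, 20, "2017-05-21", 1),
        (5, 25, "2017-05-21", 2),
        (6, 5, "2017-06-19", 1),
    ],
)


def by_max_id(cutoff):
    # Relies on id increasing together with (date, sent_order).
    return conn.execute(
        """
        SELECT packages_sent FROM packages
        WHERE id = (SELECT MAX(id) FROM packages WHERE date <= ?)
        """,
        (cutoff,),
    ).fetchone()[0]


def by_ordering(cutoff):
    # Does not depend on how ids were assigned.
    return conn.execute(
        """
        SELECT packages_sent FROM packages
        WHERE date <= ?
        ORDER BY date DESC, sent_order DESC
        LIMIT 1
        """,
        (cutoff,),
    ).fetchone()[0]


print(by_max_id("2017-05-29"), by_ordering("2017-05-29"))  # 25 25

# Hypothetical back-dated row, inserted late so it receives a higher id.
conn.execute("INSERT INTO packages VALUES (7, 99, '2017-01-05', 1)")
print(by_max_id("2017-05-29"), by_ordering("2017-05-29"))  # 99 25
```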

Count results in SQL statement additional row

I am trying to get 3% of total membership, which the code below does, but the query returns two rows: one has the % and the other is "0". I'm not sure why, or how to get rid of it...
select
sum(Diabetes_FLAG) * 100 / (select round(count(medicaid_no) * 0.03) as percent
from membership) AS PERCENT_OF_Dia
from
prefinal
group by
Diabetes_Flag
Not sure why it brought back a second row; I only need the %, not the second row.
I'm not sure what I am doing wrong.
Output:
PERCENT_OF_DIA
1 11.1111111111111
2 0
SELECT sum(Diabetes_FLAG)*100 / (SELECT round(count(medicaid_no)*0.03) as percentt
FROM membership) AS PERCENT_OF_Dia
FROM prefinal
WHERE Diabetes_FLAG = 1
-- GROUP BY Diabetes_Flag -- not needed: the WHERE clause already limits the rows to a single flag value
Remove the group by if you want one row:
select sum(Diabetes_FLAG)*100/( SELECT round(count(medicaid_no)*0.03) as percentt
from membership) AS PERCENT_OF_Dia
from prefinal;
When you include group by Diabetes_FLAG, it creates a separate row for each value of Diabetes_FLAG. Based on your results, I'm guessing that it takes on the values 0 and 1.
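The effect of GROUP BY on the row count can be reproduced with a minimal sketch in SQLite from Python. The membership and flag data are hypothetical (300 members, nine rows flagged 1, three flagged 0); only the shape of the results matters: grouping returns one row per distinct flag value, while filtering with WHERE or dropping the GROUP BY returns a single row with the same percentage.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE membership (medicaid_no INTEGER)")
conn.execute("CREATE TABLE prefinal (Diabetes_FLAG INTEGER)")
# Hypothetical data: 300 members; 9 rows flagged 1 and 3 rows flagged 0.
conn.executemany("INSERT INTO membership VALUES (?)", [(i,) for i in range(300)])
conn.executemany("INSERT INTO prefinal VALUES (?)", [(1,)] * 9 + [(0,)] * 3)

BASE = """
SELECT SUM(Diabetes_FLAG) * 100
       / (SELECT ROUND(COUNT(medicaid_no) * 0.03) FROM membership) AS PERCENT_OF_Dia
FROM prefinal
"""

grouped = conn.execute(BASE + "GROUP BY Diabetes_FLAG").fetchall()    # one row per flag value
filtered = conn.execute(BASE + "WHERE Diabetes_FLAG = 1").fetchall()  # only flagged rows
ungrouped = conn.execute(BASE).fetchall()                             # whole table, one row

print(len(grouped), len(filtered), len(ungrouped))  # 2 1 1
```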
Not sure why it brought back a second row
This is how a GROUP BY query works. The GROUP BY clause groups data by a given column: it collects all values of this column, makes a distinct set of these values, and returns one row for each distinct value.
Please consider this simple demo: http://sqlfiddle.com/#!9/3a38df/1
SELECT * FROM prefinal;
| Diabetes_Flag |
|---------------|
| 1 |
| 1 |
| 5 |
Usually the GROUP BY column is listed in the SELECT clause too, like this:
SELECT Diabetes_Flag, sum(Diabetes_Flag)
FROM prefinal
GROUP BY Diabetes_Flag;
| Diabetes_Flag | sum(Diabetes_Flag) |
|---------------|--------------------|
| 1 | 2 |
| 5 | 5 |
As you can see, GROUP BY returns two rows, one for each distinct value of the Diabetes_Flag column.
If you remove the Diabetes_Flag column from the SELECT clause, you get the same result as above, but without that column:
SELECT sum(Diabetes_Flag)
FROM prefinal
GROUP BY Diabetes_Flag;
| sum(Diabetes_Flag) |
|--------------------|
| 2 |
| 5 |
So the reason you get 2 rows is that Diabetes_Flag has 2 distinct values in the table.

Finding the maximum value between a given interval

Let's say I have a table like the one below, where amount is an arbitrary amount of something (like fruit, but we don't care about the type):
row | amount
_______________
1 | 54
2 | 2
3 | 102
4 | 102
5 | 1
And I want to select the rows that have the maximum value within a given interval. For instance, if I only wanted to select from rows 2-5, the result would be
row | amount
_______________
3 | 102
4 | 102
Because they both contain the max value within the interval, which is 102. Or, if I chose to look only at rows 1-2, it would return
row | amount
_______________
1 | 54
Because the maximum value in the interval 1-2 exists only in row 1.
I tried variations of:
amount= (select MAX(amount) FROM arbitraryTable)
But that will only ever return
row | amount
_______________
3 | 102
4 | 102
Because 102 is the absolute max of the table. How can I find the maximum value within a given interval?
I would use rank() or max() as a window function:
select t.row, t.amount
from (select t.*, max(amount) over () as maxamount
from t
where row between 2 and 5
) t
where amount = maxamount;
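A minimal sketch of the window-function answer in SQLite from Python (3.25+ for window functions). Because the window is computed after the WHERE clause, `MAX(amount) OVER ()` is the maximum within the interval only. The column name `row` is double-quoted because ROW is a keyword in several dialects, SQLite included.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE t ("row" INTEGER PRIMARY KEY, amount INTEGER)')
conn.executemany("INSERT INTO t VALUES (?, ?)", [(1, 54), (2, 2), (3, 102), (4, 102), (5, 1)])


def max_rows(lo, hi):
    # The window sees only the rows that survived the interval filter.
    return conn.execute(
        """
        SELECT "row", amount
        FROM (SELECT t.*, MAX(amount) OVER () AS maxamount
              FROM t
              WHERE "row" BETWEEN ? AND ?) AS w
        WHERE amount = maxamount
        ORDER BY "row"
        """,
        (lo, hi),
    ).fetchall()


print(max_rows(2, 5))  # [(3, 102), (4, 102)]
print(max_rows(1, 2))  # [(1, 54)]
```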
You can use a subquery to get the max value and use it in WHERE clause:
SELECT
row,
amount
FROM
arbitraryTable
WHERE
row BETWEEN 2 AND 5 AND
amount = (
SELECT
MAX(amount)
FROM
arbitraryTable
WHERE
row BETWEEN 2 AND 5
);
Just remember to use the same conditions in the main query and the subquery: row BETWEEN 2 AND 5.
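One way to keep the two conditions from drifting apart is to bind the interval once as named parameters that both the outer query and the subquery use. A minimal sketch with Python's sqlite3 (the `:lo`/`:hi` names and the `rows_with_max` helper are illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE arbitraryTable ("row" INTEGER PRIMARY KEY, amount INTEGER)')
conn.executemany(
    "INSERT INTO arbitraryTable VALUES (?, ?)",
    [(1, 54), (2, 2), (3, 102), (4, 102), (5, 1)],
)

# :lo and :hi appear in both the outer query and the subquery,
# so the two interval conditions always match.
SQL = """
SELECT "row", amount
FROM arbitraryTable
WHERE "row" BETWEEN :lo AND :hi
  AND amount = (SELECT MAX(amount)
                FROM arbitraryTable
                WHERE "row" BETWEEN :lo AND :hi)
ORDER BY "row"
"""


def rows_with_max(lo, hi):
    return conn.execute(SQL, {"lo": lo, "hi": hi}).fetchall()


print(rows_with_max(2, 5))  # [(3, 102), (4, 102)]
print(rows_with_max(1, 2))  # [(1, 54)]
```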

Determine on what page a record is

How can I determine on what page a certain record is?
Let's say I display 5 records per page using queries like these:
SELECT * FROM posts ORDER BY date DESC LIMIT 0,5
SELECT * FROM posts ORDER BY date DESC LIMIT 5,5
SELECT * FROM posts ORDER BY date DESC LIMIT 10,5
Sample data:
id | name | date
-----------------------------------------------------
1 | a | 2013-11-07 08:19 page 1
2 | b | 2013-12-02 12:32
3 | c | 2013-12-14 14:11
4 | d | 2013-12-21 09:26
5 | e | 2013-12-22 18:52 _________
6 | f | 2014-01-04 11:20 page 2
7 | g | 2014-01-07 21:09
8 | h | 2014-01-08 13:39
9 | i | 2014-01-08 16:41
10 | j | 2014-01-09 07:45 _________
11 | k | 2014-01-14 22:05 page 3
12 | l | 2014-01-21 17:21
Someone may edit a record, let's say the one with id = 7, or insert a new record (id = 13). How can I determine which page that record is on? The reason is that I want to display the page that contains the record that has just been edited or added.
OK, I guess I could just display the same page if the record is edited. The problem is when a record gets added: the list can be ordered by name, and the new record could be placed anywhere :(
Is there some way I could do a query like SELECT offset WHERE id = 13 ORDER BY date LIMIT 5 that returns 10?
For the sake of this example, let's assume that entry 7 has just been added (and that there could be duplicate names) - the first thing you need to do is find how many entries come before that one (based on name), thus:
SELECT COUNT(*)
FROM Posts
WHERE name < 'g'
OR (name = 'g' AND id < 7)
Here, id is used as a "tiebreaker" column to ensure a stable sort: a row comes before the record if its name sorts earlier, or if it has the same name and a lower id. This also assumes that we know the record's id; since non-key data such as name can be duplicated, you need that kind of tiebreaker.
In any case, this gives us the number of rows preceding this one (6). With some integer division arithmetic (based on the LIMIT), we can now get the relevant information:
(int) (6 / 5) = 1
... this is for a 0-indexed page, though (ie, entries 1 - 5 appear on page "0"); however, in this case it works in our favor. Because the count includes only rows strictly before the record, nothing needs to be subtracted: entry 5 has 4 rows before it, and 4 / 5 = 0, so it correctly lands on the first page.
We now have the page index, but we need to turn it into a row offset. Some simple multiplication does that for us:
1 * 5 = 5
This is the number of rows to skip before the first entry on the page (entry 6), which is the value for OFFSET.
We can now write the query:
SELECT id, name, date
FROM Posts
ORDER BY name, id
LIMIT 5 OFFSET 5
(keep in mind that we require id to guarantee a stable sort for the data, if we assume that name could be a duplicate!).
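The page arithmetic is easy to get off by one, so here is a small Python sketch that checks it at the page boundaries (`page_for` is a hypothetical helper name). With a count of strictly preceding rows, entries 1-5 land on page 0 with OFFSET 0, and entries 6-10 land on page 1 with OFFSET 5.

```python
from typing import Tuple


def page_for(preceding: int, page_size: int) -> Tuple[int, int]:
    """Return (0-based page index, OFFSET) for a row with `preceding` rows before it."""
    # No "- 1" here: `preceding` already excludes the row itself.
    page_index = preceding // page_size
    return page_index, page_index * page_size


# A row at 1-based position p has p - 1 rows before it.
for position in (1, 5, 6, 7, 10, 11):
    print(position, page_for(position - 1, 5))
```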
This is two trips to the database. Surprisingly, SQLite allows LIMIT/OFFSET values to be the results of SQL subqueries (keep in mind that not all RDBMSs even allow them to be host variables, meaning they could only be changed with dynamic SQL, although in at least one case the DB had ROW_NUMBER() to make up for that). I wasn't able to get rid of the repetition of the subqueries, though.
SELECT Posts.id, Posts.name, Posts.date, Pages.pageNumber
FROM Posts
CROSS JOIN (SELECT (COUNT(*) / 5) + 1 AS pageNumber
FROM Posts
WHERE name < 'g'
OR (name = 'g' AND id < 7)) Pages
ORDER BY Posts.name, Posts.id
LIMIT 5 OFFSET (SELECT (COUNT(*) / 5) * 5 AS entryOffset
FROM Posts
WHERE name < 'g'
OR (name = 'g' AND id < 7));
(and the working SQL Fiddle example).
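A minimal sketch of the single-query idea, run in SQLite from Python with the question's sample names. Assumptions: the date column is left out because the list is ordered by name here, the page size and the target record are bound as named parameters, and `page_containing` is an illustrative helper name. The row counts as "before" the target if its name sorts earlier or it has the same name and a lower id; `COUNT(*) / :size` is integer division in SQLite because both operands are integers.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Posts (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO Posts VALUES (?, ?)",
    [(i, name) for i, name in enumerate("abcdefghijkl", start=1)],
)

SQL = """
SELECT Posts.id, Posts.name, Pages.pageNumber
FROM Posts
CROSS JOIN (SELECT COUNT(*) / :size + 1 AS pageNumber
            FROM Posts
            WHERE name < :name OR (name = :name AND id < :id)) AS Pages
ORDER BY Posts.name, Posts.id
LIMIT :size OFFSET (SELECT COUNT(*) / :size * :size
                    FROM Posts
                    WHERE name < :name OR (name = :name AND id < :id))
"""


def page_containing(name, post_id, size=5):
    # Returns the whole page (ordered by name, id) that contains the given post.
    return conn.execute(SQL, {"name": name, "id": post_id, "size": size}).fetchall()


print(page_containing("g", 7))
# [(6, 'f', 2), (7, 'g', 2), (8, 'h', 2), (9, 'i', 2), (10, 'j', 2)]
```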