Query to sort a table in sections and filling in a section from the middle out - sql

I have a table of slots, say numbered 1 to 15. These slots are logically divided in 3 sections of 5 slots each. I would like to arrange them in sections such that the middle ones fill out first and then move out towards the edges, thus the first and last of each section filling out last.
So if a table looks like this:
Slots in the original table (| represents a boundary between segments)
| 1,2,3,4,5 | 6,7,8,9,10 | 11,12,13,14,15|
I would like to run a query that would return the result as such where each segment has the middle slot listed first, then the ones next to the middle, then ones at the end of each section.
| 3,4,2,5,1 | 8,9,7,10,6 | 13,14,12,15,11 |
Is this possible with SQL?
I've tried something like this but it doesn't quite work out:
DECLARE #SegSize INT = 5;
DECLARE #NoSeg INT = 3;
SELECT SlotLoc
FROM Slots
ORDER BY ABS((SlotLoc%#SegSize)-CEILING(#SegSize/2 + 1))

This depends on a lot of assumptions, like fixed number of rows, and also assumes you don't care about the order of slots 2/3 and 4/5. Given this data:
CREATE TABLE #slots(slot tinyint);
INSERT #slots(slot) SELECT TOP (15)
ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
FROM sys.all_objects;
This query gets pretty close to exactly your desired output (except sometimes you listed the higher slot value first, sometimes last):
;WITH x AS
(
SELECT slot, segment = (ROW_NUMBER() OVER (ORDER BY slot)-1)/5 FROM #slots
),
y AS
(
SELECT slot,segment,
rn = ABS(3 - ROW_NUMBER() OVER (PARTITION BY segment ORDER BY slot))
FROM x
)
SELECT slot FROM y ORDER BY segment, rn;
Results:
Slot
----
3
4
2
1
5
8
9
7
10
6
13
14
12
15
11
Cleanup:
DROP TABLE #slots;

Related

Group data series into variable width windows based on first event

I have computational task which can be reduced to the follow problem:
I have a large set of pairs of integers (key, val) which I want to group into windows. The first window starts with the first pair p ordered by key attribute and spans all the pairs where p[i].key belongs to [p[0].key; p[0].key + N), with some arbitrary integer N, positive and common to all windows.
The next window starts with the first pair ordered by key not included in the previous windows and again spans all the pairs from its key to key + N, and so on for the following windows.
The last step is to sum second attribute for each window and display it together with the first key of the window.
For example, given list of records with values:
key
val
1
3
2
7
5
1
6
4
7
1
10
3
13
5
and N=3, the windows would be:
{(1,3),(2,7)},
{(5,1),(6,4),(7,1)},
{(10,3)}
{(13,5)}
The final result:
key
sum_of_values
1
10
5
6
10
3
13
5
This is easy to program with a standard programming language but I have no clue how to solve this with SQL.
Note: If clickhouse doesn't support the RECURSIVE keyword, just remove that keyword from the expression.
Clickhouse seems to use non-standard syntax for the WITH clause. The below uses standard SQL. Adjust as needed.
Sorry. clickhouse may not support this approach. If not, we would need to find another method of walking through the data.
Standard SQL:
There are a few ways. Here's one approach. First assign row numbers to allow recursively stepping through the rows. We could use LEAD as well.
Assign a group (key value) to each row based on the current key and the last group/key value and whether they are within some distance (N = 3, in this case).
The last step is to just SUM these values per group start_key and to use the start_key value as the starting key in each group.
WITH RECURSIVE nrows (xkey, val, n) AS (
SELECT xkey, val, ROW_NUMBER() OVER (ORDER BY xkey) FROM test
)
, cte (xkey, val, n, start_key) AS (
SELECT xkey, val, n, xkey FROM nrows WHERE n = 1
UNION ALL
SELECT t1.xkey, t1.val, t1.n
, CASE WHEN t1.xkey <= t2.start_key + (3-1) THEN t2.start_key ELSE t1.xkey END
FROM nrows AS t1
JOIN cte AS t2
ON t2.n = t1.n-1
)
SELECT start_key
, SUM(val) AS sum_values
FROM cte
GROUP BY start_key
ORDER BY start_key
;
Result:
+-----------+------------+
| start_key | sum_values |
+-----------+------------+
| 1 | 10 |
| 5 | 6 |
| 10 | 3 |
| 13 | 5 |
+-----------+------------+

Seperating a Oracle Query with 1.8 million rows into 40,000 row blocks

I have a project where I am taking Documents from one system and importing them into another.
The first system has the documents and associated keywords stored. I have a query that will return the results which will then be used as the index file to import them into the new system. There are about 1.8 million documents involved so this means 1.8 million rows (One per document).
I need to divide the returned results into blocks of 40,000 to make importing them in batches of 40,000 at a time, rather than one long import.
I have the query to return the results I need. Just need to know how to take that and break it up for easier import. My apologies if I have included to little information. This is my first time here asking for help.
Use the built-in function ORA_HASH to divide the rows into 45 buckets of roughly the same number of rows. For example:
select * from some_table where ora_hash(id, 44) = 0;
select * from some_table where ora_hash(id, 44) = 1;
...
select * from some_table where ora_hash(id, 44) = 44;
The function is deterministic and will always return the same result for the same input. The resulting number starts with 0 - which is normal for a hash, but unusual for Oracle, so the query may look off-by-one at first. The hash works better with more distinct values, so pass in the primary key or another unique value if possible. Don't use a low-cardinality column, like a status column, or the buckets will be lopsided.
This process is in some ways inefficient, since you're re-reading the same table 45 times. But since you're dealing with documents, I assume the table scanning won't be the bottleneck here.
A prefered way to bucketing the ID is to use the NTILE analytic function.
I'll demonstrate this on a simplified example with a table with 18 rows that should be divided in four chunks.
select listagg(id,',') within group (order by id) from tab;
1,2,3,7,8,9,10,15,16,17,18,19,20,21,23,24,25,26
Note, that the IDs are not consecutive, so no arithmetic can be used - the NTILE gets the parameter of the requested number of buckets (4) and calculates the chunk_id
select id,
ntile(4) over (order by ID) as chunk_id
from tab
order by id;
ID CHUNK_ID
---------- ----------
1 1
2 1
3 1
7 1
8 1
9 2
10 2
15 2
16 2
17 2
18 3
19 3
20 3
21 3
23 4
24 4
25 4
26 4
18 rows selected.
All but the last bucket are of the same size, the last one can be smaller.
If you want to calculate the ranges - use simple aggregation
with chunk as (
select id,
ntile(4) over (order by ID) as chunk_id
from tab)
select chunk_id, min(id) ID_from, max(id) id_to
from chunk
group by chunk_id
order by 1;
CHUNK_ID ID_FROM ID_TO
---------- ---------- ----------
1 1 8
2 9 17
3 18 21
4 23 26

How to consolidate blocks of time?

I have a derived table with a list of relative seconds to a foreign key (ID):
CREATE TABLE Times (
ID INT
, TimeFrom INT
, TimeTo INT
);
The table contains mostly non-overlapping data, but there are occasions where I have a TimeTo < TimeFrom of another record:
+----+----------+--------+
| ID | TimeFrom | TimeTo |
+----+----------+--------+
| 10 | 10 | 30 |
| 10 | 50 | 70 |
| 10 | 60 | 150 |
| 10 | 75 | 150 |
| .. | ... | ... |
+----+----------+--------+
The result set is meant to be a flattened linear idle report, but with too many of these overlaps, I end up with negative time in use. I.e. If the window above for ID = 10 was 150 seconds long, and I summed the differences of relative seconds to subtract from the window size, I'd wind up with 150-(20+20+90+75)=-55. This approach I've tried, and is what led me to realizing there were overlaps that needed to be flattened.
So, what I'm looking for is a solution to flatten the overlaps into one set of times:
+----+----------+--------+
| ID | TimeFrom | TimeTo |
+----+----------+--------+
| 10 | 10 | 30 |
| 10 | 50 | 150 |
| .. | ... | ... |
+----+----------+--------+
Considerations: Performance is very important here, as this is part of a larger query that will perform well on it's own, and I'd rather not impact its performance much if I can help it.
On a comment regarding "Which seconds have an interval", this is something I have tried for the end result, and am looking for something with better performance. Adapted to my example:
SELECT SUM(C.N)
FROM (
SELECT A.N, ROW_NUMBER()OVER(ORDER BY A.N) RowID
FROM
(SELECT TOP 60 1 N FROM master..spt_values) A
, (SELECT TOP 720 1 N FROM master..spt_values) B
) C
WHERE EXISTS (
SELECT 1
FROM Times SE
WHERE SE.ID = 10
AND SE.TimeFrom <= C.RowID
AND SE.TimeTo >= C.RowID
AND EXISTS (
SELECT 1
FROM Times2 D
WHERE ID = SE.ID
AND D.TimeFrom <= C.RowID
AND D.TimeTo >= C.RowID
)
GROUP BY SE.ID
)
The problem I have with this solution is I have get a Row Count Spool out of the EXISTS query in the query plan with a number of executions equal to COUNT(C.*). I left the real numbers in that query to illustrate that getting around this approach is for the best. Because even with a Row Count Spool reducing the cost of the query by quite a bit, it's execution count increases the cost of the query as a whole by quite a bit as well.
Further Edit: The end goal is to put this in a procedure, so Table Variables and Temp Tables are also a possible tool to use.
OK. I'm still trying to do this with just one SELECT. But This totally works:
DECLARE #tmp TABLE (ID INT, GroupId INT, TimeFrom INT, TimeTo INT)
INSERT INTO #tmp
SELECT ID, 0, TimeFrom, TimeTo
FROM Times
ORDER BY Id, TimeFrom
DECLARE #timeTo int, #id int, #groupId int
SET #groupId = 0
UPDATE #tmp
SET
#groupId = CASE WHEN id != #id THEN 0
WHEN TimeFrom > #timeTo THEN #groupId + 1
ELSE #groupId END,
GroupId = #groupId,
#timeTo = TimeTo,
#id = id
SELECT Id, MIN(TimeFrom), Max(TimeTo) FROM #tmp
GROUP BY ID, GroupId ORDER BY ID
Left join each row to its successor overlapping row on the same ID value (where such exist).
Now for each row in the result-set of LHS left join RHS the contribution to the elapsed time for the ID is:
isnull(RHS.TimeFrom,LHS.TimeTo) - LHS.TimeFrom as TimeElapsed
Summing these by ID should give you the correct answer.
Note that:
- where there isn't an overlapping successor row the calculation is simply
LHS.TimeTo - LHS.TimeFrom
- where there is an overlapping successor row the calculation will net to
(RHS.TimeFrom - LHS.TimeFrom) + (RHS.TimeTo - RHS.TimeFrom)
which simplifies to
RHS.TimeTo - LHS.TimeFrom
What about something like below (assumes SQL 2008+ due to CTE):
WITH Overlaps
AS
(
SELECT t1.Id,
TimeFrom = MIN(t1.TimeFrom),
TimeTo = MAX(t2.TimeTo)
FROM dbo.Times t1
INNER JOIN dbo.Times t2 ON t2.Id = t1.Id
AND t2.TimeFrom > t1.TimeFrom
AND t2.TimeFrom < t1.TimeTo
GROUP BY t1.Id
)
SELECT o.Id,
o.TimeFrom,
o.TimeTo
FROM Overlaps o
UNION ALL
SELECT t.Id,
t.TimeFrom,
t.TimeTo
FROM dbo.Times t
INNER JOIN Overlaps o ON o.Id = t.Id
AND (o.TimeFrom > t.TimeFrom OR o.TimeTo < t.TimeTo);
I do not have a lot of data to test with but seems decent on the smaller data sets I have.
I also wrapped by head around this issue - and afterall I found, that the problem is your data.
You claim (if i get that right), that these entries should reflect the relative times, when a user goes idle / comes back.
So, you should consider to sanitize your data and refactor your inserts to produce valid data sets.
For instance, the two lines:
+----+----------+--------+
| ID | TimeFrom | TimeTo |
+----+----------+--------+
| 10 | 50 | 70 |
| 10 | 60 | 150 |
how can it be possible that a user is idle until second 70, but goes idle on second 60? This already implies, that he has been back latest at around second 59.
I can only assume that this issue comes from different threads and/or browser windows (tabs) a user might be using your application with. (Each having it's own "idle detection")
So instead of working-around the symptoms - you should fix the cause! Why is this data entry inserted into the table? You could avoid this by simple checking, if the user is already idle before inserting a new row.
Create a unique key constraint on ID and TimeTo
Whenever an idle-event is detected, execute the following query:
INSERT IGNORE INTO Times (ID,TimeFrom,TimeTo)VALUES('10', currentTimeStamp, -1);
-- (If the user is already "idle" - nothing will happen)
Whenever an comeback-event is detected, execute the following query:
UPDATE Times SET TimeTo=currentTimeStamp WHERE ID='10' and TimeTo=-1
-- (If the user is already "back" - nothing will happen)
The fiddle linked here: http://sqlfiddle.com/#!2/dcb17/1 would reproduce the chain of events for your example, but resulting in a clean and logical set of idle-windows:
ID TIMEFROM TIMETO
10 10 30
10 50 70
10 75 150
Note: The Output is slightly different from the output you desired. But I feel that this is more accurate, cause of the reason outlined above: A user cannot go idle on second 70 without returning from it's current idle state before. He either STAYS idle (and a second thread/tab runs into the idle-event) Or he returned in between.
Especially for your need to maximize performance, you should fix the data and not invent a work-around-query. This is maybe 3 ms upon inserts, but could be worth 20 seconds upon select!
Edit: if Multi-Threading / Multiple-Sessions is the cause for the wrong insert, you would also need to implement a check, if most_recent_come_back_time < now() - idleTimeout - otherwhise a user might comeback on tab1, and is recorded idle on tab2 after a few seconds, cause tab2 did run into it's idle timeout, cause the user only refreshed tab1.
I had the 'same' problem once with 'days' (additionaly without counting WE and Holidays)
The word counting gave me the following idea:
create table Seconds ( sec INT);
insert into Seconds values (0),(1),(2),(3),(4),(5),(6),(7),(8),(9), ...
select count(distinct sec) from times t, seconds s
where s.sec between t.timefrom and t.timeto-1
and id=10;
you can cut the start to 0 (I put the '10' here in braces)
select count(distinct sec) from times t, seconds s
where s.sec between t.timefrom- (10) and t.timeto- (10)-1
and id=10;
and finaly
select count(distinct sec) from times t, seconds s,
(select min(timefrom) m from times where id=10) as m
where s.sec between t.timefrom-m.m and t.timeto-m.m-1
and id=10;
additionaly you can "ignore" eg. 10 seconds by dividing you loose some prezition but earn speed
select count(distinct sec)*d from times t, seconds s,
(select min(timefrom) m from times where id=10) as m,
(select 10 d) as d
where s.sec between (t.timefrom-m)/d and (t.timeto-m)/d-1
and id=10;
Sure it depends on the range you have to look at, but a 'day' or two of seconds should work (although i did not test it)
fiddle ...

please help me in building a query for the below mentioned table in sql

i have a table name conversion and i have these below mentioned columns in it i want to multiply Length\width row elements l*w of 'dimension' values and display them in another new table
Please let me know if anything changes for the same logic in ms access
probably it is simple but i dont know exact query to solve the problem waiting for your solutions
ID area length/width dimensions **new column(L*W) here**
1 1 l 3 3*5=15
2 1 w 5
3 2 l 4
4 2 w 8
5 3 l 6
6 3 w 10
7 4 l 12
8 4 w 13
9 4 W 10
waiting for your reply
You could query the table twice: once for lengths and once for widths and then join by area and multiply the values:
select length.area, length.dimension * width.dimension
from
(select area, dimension from conversion where lenwidth = 'l') length
inner join
(select area, dimension from conversion where lenwidth = 'w') width
on length.area = width.area;
Two remarks:
I suppose that it is a typo that you have two width entries for area 4? Otherwise you would have to decide which value to take in above select statement.
It would not be a good idea to keep the old table and have a new table holding the results. What if you change a value? You would have to remember to change the result accordingly every time. So either ditch the old table or use a view instead of a new table.
Try this
select *,
dimensions*(lead(dimensions) over(order by id)) product
from table1;
Or if you want for the set of area then
select *,
case when length_width='l' and (lead(length_width) over(order by id))='w'
then dimensions*(lead(dimensions) over(order by id))
else 0
end as product
from table1;
fiddle

Get a certain number of contiguous rows

I've got a school project, kind of huge, and only a few days left, so here is one problem I'm stuck on, hope you can help.
I've got this table:
places(id, otherInfo)
Simple enough, well I need to make a SQL query or PL/SQL function to retrieve a certain number of contiguous rows. For instance if I call the function using getContiguousPlaces(3); On the table which has the rows:
ID
1
4
5
6
18
19
I want to get rows with ID 4, 5 and 6.
How could I do that?
SELECT p.id, p.prev_id1, p.prev_id2
FROM (
SELECT id, LAG(id, 1) OVER(ORDER BY id) prev_id1, LAG(id, 2) OVER(ORDER BY id) prev_id2
FROM places
) p
WHERE p.prev_id1 = id-1
AND p.prev_id2 = id-2
here you go: http://sqlfiddle.com/#!4/a20e1/1
But I guess you could retrieve the datas differently if you wished.
edit: this case if for the parameter "3". But if you want to adapt it with a number "n", you either have to use it within a dynamic query, or maybe use the partition by clause. I'm going to have a look at this one now...
Try this:
WITH t AS
(SELECT p.*, LEVEL l,
CONNECT_BY_ROOT id AS cbr
FROM places p CONNECT BY
PRIOR id = id-1)
SELECT *
FROM t
WHERE cbr IN
(SELECT cbr
FROM t
WHERE l = 3 )
and l <= 3
ORDER BY cbr,
id
The constant 3 should be a parameter
Here is a fiddle