Interval of consecutive numbers with Oracle SQL and JOINS - sql

I'm trying to display the intervals of consecutive elevations in Iceland, per row, in a form like this:
ELEVATION
0 27
29 33
35
37 40
42 46
48
51 63
For now, I have only managed to track the gaps. I have something in mind like:
If the count of the values in column B until column C = 'GAP' equals
the value of column B where column C = 'GAP', THEN we have a consecutive interval
between the value of column A and column B.
Can anyone give me some tips?
Current code:
with x as (
    SELECT DISTINCT elevation
    FROM CITIES
    WHERE iso = 'IS' AND iso IS NOT NULL
), y as (
    SELECT a.ELEVATION as "A", b.ELEVATION as "B", c.ELEVATION as "C"
    FROM x a
    JOIN x b ON b.ELEVATION > a.ELEVATION
    LEFT JOIN x c ON c.ELEVATION > b.ELEVATION AND c.ELEVATION < b.ELEVATION + 2
)
SELECT y.A, y.B, y.C,
       CASE WHEN y.C IS NULL THEN 'GAP' ELSE ' ' END AS GAPZ
FROM y
ORDER BY 1, 2
output
A B C GAP
------------------
0 1 2
0 2 3
0 3 4
...
...
...
0 25 26
0 26 27
0 27 GAP
0 29 30
0 30 31
0 31 32
0 32 33
0 33 GAP
0 35 GAP
0 37 38
0 38 39
0 39 40
0 40 GAP
0 42 43
0 43 44
0 44 45
0 45 46
0 46 GAP
0 48 GAP
0 51 52
0 52 53
...
...
...
0 61 62
0 62 63
0 63 GAP
0 65 66
0 66 67
0 67 68
0 68 69
0 69 GAP
0 71 72
...
...
...

You're halfway there! You need to flag either end of each gap, then select only those rows.
SELECT x,
       CASE LEAD(x) OVER (ORDER BY x)
            WHEN x + 1 THEN NULL
            ELSE x
       END AS endpoint,
       CASE LAG(x) OVER (ORDER BY x)
            WHEN x - 1 THEN NULL
            ELSE x
       END AS startpoint
FROM your_table
This shows whether a row is a start point, an endpoint, or neither. Let's call that Q1. Now we just select what we need from it:
SELECT q1.startpoint,
       (SELECT MIN(endpoint)
        FROM q1 endp
        WHERE endp.endpoint >= q1.x) AS endpoint
FROM q1
WHERE q1.startpoint IS NOT NULL
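The two steps can be run end to end. Below is a runnable sketch using Python's sqlite3 rather than Oracle (SQLite 3.25+ supports the same LEAD/LAG window functions); the table name `elevations` and the sample values are invented to match the question's expected intervals.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE elevations (x INTEGER)")
# Sample data reconstructed from the expected output: 0-27, 29-33, 35, ...
values = [*range(0, 28), *range(29, 34), 35, *range(37, 41),
          *range(42, 47), 48, *range(51, 64)]
con.executemany("INSERT INTO elevations VALUES (?)", [(v,) for v in values])

rows = con.execute("""
    WITH q1 AS (
        SELECT x,
               CASE LEAD(x) OVER (ORDER BY x)
                    WHEN x + 1 THEN NULL ELSE x END AS endpoint,
               CASE LAG(x) OVER (ORDER BY x)
                    WHEN x - 1 THEN NULL ELSE x END AS startpoint
        FROM elevations
    )
    SELECT q1.startpoint,
           (SELECT MIN(endpoint) FROM q1 endp
            WHERE endp.endpoint >= q1.x) AS endpoint
    FROM q1
    WHERE q1.startpoint IS NOT NULL
    ORDER BY q1.startpoint
""").fetchall()

for start, end in rows:
    print(start, end)
```

Single-value intervals come out as e.g. `35 35` rather than a bare `35`; suppressing the duplicate is a display concern.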

Related

SQL query to extract matching diagonal pairs in SQL Server database

I have a database table (mytable) with 2 columns x and y as shown below, from which I intend to extract rows with matching diagonal pairs of (x, y) and (y, x), e.g., 4 22 and 22 4
x y
86 86
27 27
45 45
95 95
11 11
18 8
85 85
2 2
77 77
91 91
15 15
84 84
51 51
32 32
35 35
8 8
92 92
67 67
62 62
33 33
13 13
15 11
18 18
3 3
38 38
80 80
34 34
6 6
72 72
14 12
44 44
4 22
90 90
47 47
78 78
23 3
42 42
56 56
79 79
55 55
65 65
17 17
64 64
4 4
28 28
19 19
17 9
36 36
25 25
81 81
60 60
48 48
5 5
88 88
7 19
21 21
29 29
52 52
9 17
9 9
13 13
16 10
1 1
31 31
46 46
7 7
58 58
23 23
87 87
83 83
66 66
93 93
24 2
98 98
53 53
20 6
61 61
20 20
96 96
99 99
73 73
2 24
14 14
71 71
5 21
22 4
75 75
6 20
97 97
41 41
26 26
22 22
8 18
74 74
40 40
21 5
94 94
76 76
49 49
11 15
59 59
89 89
68 68
24 24
37 37
12 12
63 63
43 43
16 16
100 100
39 39
25 1
69 69
54 54
50 50
30 30
10 10
I have tried the accepted code from a related Stack Overflow answer on my mytable, which gives me the expected results on Oracle DB.
select least(x, y) as x, greatest(x, y) as y
from mytable
group by least(x, y), greatest(x, y)
having count(*) = 2
union all
select x, y
from mytable
where not exists (select 1 from mytable mytable2 where mytable2.y = mytable.x and mytable2.x = mytable.y)
order by x asc;
Now I need to execute the same query on an MS SQL database, but to my understanding MS SQL does not support the LEAST and GREATEST functions. I have tried CASE expressions instead; for instance, for the first part of the query above, I am considering the code below, but so far I cannot replicate similar results:
select x,y,z
from (
select x, y,
case
when (x < y) then x
when (y > x) then y
end as z
from mytable
group by x, y
) as t
Any suggestions on what I need to consider to complete the query in SQL Server database, so that I produce the final output as below?
It would also be great if somebody has an idea on how I can use SQL's lag() function to assist me in achieving the same result. For instance I am trying something like below.
;with t1 as (
select x as x1, y as y1, lag(x,1) over(order by x asc) as z1
from mytable
),
t2 as (
select x as x2, y as y2, lag(y,1) over(order by x asc) as z2
from mytable
)
select t1.*,t2.*
from t1 full outer join t2 on t1.x1 = t2.x2
Expected output:
x y
2 24
4 22
5 21
6 20
8 18
9 17
11 15
13 13
The equivalent of the functions LEAST() and GREATEST() is to use CASE expressions:
SELECT CASE WHEN x < y THEN x ELSE y END AS x,
CASE WHEN x > y THEN x ELSE y END AS y
FROM mytable
GROUP BY CASE WHEN x < y THEN x ELSE y END,
CASE WHEN x > y THEN x ELSE y END
HAVING COUNT(*) = 2 -- change to COUNT(*) > 1 if each combination may exist more than twice
ORDER BY x, y;
The above query will return a row for a combination of (x, y) that exists twice even if (y, x) does not exist.
If this is not what you want, use a self join and UNION ALL:
SELECT DISTINCT t1.*
FROM mytable t1 INNER JOIN mytable t2
ON t2.x = t1.y AND t2.y = t1.x
WHERE t1.x < t1.y
UNION ALL
SELECT x, y
FROM mytable
WHERE x = y
GROUP BY x, y
HAVING COUNT(*) > 1
ORDER BY x, y;
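For a quick check of the CASE-based rewrite, here is a sketch using Python's sqlite3 with a small invented subset of the question's rows (the CASE expressions are portable, so the same SQL runs on SQL Server):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE mytable (x INTEGER, y INTEGER)")
con.executemany("INSERT INTO mytable VALUES (?, ?)",
                [(4, 22), (22, 4), (9, 17), (17, 9),
                 (13, 13), (13, 13), (86, 86)])

# Normalize each pair to (smaller, larger) with CASE, then keep the
# combinations that occur exactly twice.
rows = con.execute("""
    SELECT CASE WHEN x < y THEN x ELSE y END AS x,
           CASE WHEN x > y THEN x ELSE y END AS y
    FROM mytable
    GROUP BY CASE WHEN x < y THEN x ELSE y END,
             CASE WHEN x > y THEN x ELSE y END
    HAVING COUNT(*) = 2
    ORDER BY x, y
""").fetchall()

print(rows)  # [(4, 22), (9, 17), (13, 13)] -- (86, 86) occurs only once
```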

Foreign key value count for each row in a table

I'm trying to generate a table that returns, for each row, a running count of its foreign key value: the number of preceding rows with the same value, up to and including the current row.
ForeignIDValue is non-nullable.
I've tried table variables and common table expressions but it gets long winded and messy.
Is there a more elegant and concise way of doing it?
So table A
PrimaryKeyValue ForeignIDValue ProgressiveForeignIDValueCount
15 42 NULL
16 42 NULL
17 43 NULL
18 42 NULL
19 42 NULL
20 42 NULL
24 42 NULL
26 42 NULL
27 42 NULL
29 42 NULL
30 42 NULL
31 42 NULL
32 42 NULL
35 42 NULL
36 42 NULL
37 42 NULL
38 42 NULL
39 42 NULL
40 44 NULL
41 45 NULL
42 46 NULL
43 45 NULL
Needs to become Table B
PrimaryKeyValue ForeignIDValue ProgressiveForeignIDValueCount
15 42 1
16 42 2
17 43 1
18 42 3
19 42 4
20 42 5
24 42 6
26 42 7
27 42 8
29 42 9
30 42 10
31 42 11
32 42 12
35 42 13
36 42 14
37 42 15
38 42 16
39 42 17
40 44 1
41 45 1
42 46 1
43 45 2
SELECT PrimaryKeyValue, ForeignIDValue,
ProgressiveForeignIDValueCount = ROW_NUMBER() OVER
(PARTITION BY ForeignIDValue ORDER BY PrimaryKeyValue)
FROM dbo.[your table name]
ORDER BY PrimaryKeyValue, ProgressiveForeignIDValueCount;
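A runnable sketch of this answer, using Python's sqlite3 (which also supports ROW_NUMBER) and a subset of the question's rows:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE A (PrimaryKeyValue INTEGER, ForeignIDValue INTEGER)")
con.executemany("INSERT INTO A VALUES (?, ?)",
                [(15, 42), (16, 42), (17, 43), (18, 42),
                 (40, 44), (41, 45), (43, 45)])

# ROW_NUMBER restarts at 1 for each ForeignIDValue partition, numbering
# rows in PrimaryKeyValue order -- exactly the progressive count wanted.
rows = con.execute("""
    SELECT PrimaryKeyValue, ForeignIDValue,
           ROW_NUMBER() OVER (PARTITION BY ForeignIDValue
                              ORDER BY PrimaryKeyValue) AS ProgressiveForeignIDValueCount
    FROM A
    ORDER BY PrimaryKeyValue
""").fetchall()

for r in rows:
    print(r)
```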

How do I speed up a date range select in SQLite?

Here is my query:
const tags = await db.queryEntries(
"SELECT tag, count(tag) AS count, created_at FROM tags WHERE DATE(created_at) >= DATE('now', '-1 days') GROUP BY tag ORDER BY count DESC LIMIT 100"
);
Here is my schema:
CREATE TABLE tags (
tag TEXT,
url STRING
, created_at TEXT);
CREATE UNIQUE INDEX tag_url ON tags (tag, url)
;
CREATE INDEX idx_tags_created_at ON tags(created_at);
It's still very slow (30+ seconds) when I run the query; there are about 1.5 million records in the db for tags.
Here are the results of EXPLAIN:
addr opcode p1 p2 p3 p4 p5 comment
---- ------------- ---- ---- ---- ------------- -- -------------
0 Init 0 56 0 00 Start at 56
1 OpenEphemeral 1 5 0 k(1,-B) 00 nColumn=5
2 Integer 100 1 0 00 r[1]=100; LIMIT counter
3 Noop 2 2 0 00
4 Integer 0 6 0 00 r[6]=0; clear abort flag
5 Null 0 9 9 00 r[9..9]=NULL
6 Gosub 8 46 0 00
7 OpenRead 0 7 0 3 00 root=7 iDb=0; tags
8 OpenRead 3 3693502 0 k(3,,,) 00 root=3693502 iDb=0; tag_url
9 Rewind 3 28 11 0 00
10 DeferredSeek 3 0 0 00 Move 0 to 3.rowid if needed
11 Column 0 2 12 00 r[12]=tags.created_at
12 Function 0 12 11 date(-1) 00 r[11]=func(r[12])
13 Lt 13 27 11 50 if r[11]<r[13] goto 27
14 Column 3 0 10 00 r[10]=tags.tag
15 Compare 9 10 1 k(1,-B) 00 r[9] <-> r[10]
16 Jump 17 21 17 00
17 Move 10 9 1 00 r[9]=r[10]
18 Gosub 7 32 0 00 output one row
19 IfPos 6 49 0 00 if r[6]>0 then r[6]-=0, goto 49; check abort flag
20 Gosub 8 46 0 00 reset accumulator
21 Column 3 0 11 00 r[11]=tags.tag
22 AggStep 0 11 3 count(1) 01 accum=r[3] step(r[11])
23 If 5 26 0 00
24 Column 3 0 2 00 r[2]=tags.tag
25 Column 0 2 4 00 r[4]=tags.created_at
26 Integer 1 5 0 00 r[5]=1; indicate data in accumulator
27 Next 3 10 0 01
28 Gosub 7 32 0 00 output final row
29 Goto 0 49 0 00
30 Integer 1 6 0 00 r[6]=1; set abort flag
31 Return 7 0 0 00
32 IfPos 5 34 0 00 if r[5]>0 then r[5]-=0, goto 34; Groupby result generator entry point
33 Return 7 0 0 00
34 AggFinal 3 1 0 count(1) 00 accum=r[3] N=1
35 Copy 3 14 0 00 r[14]=r[3]
36 Sequence 1 15 0 00 r[15]=cursor[1].ctr++
37 IfNotZero 1 41 0 00 if r[1]!=0 then r[1]--, goto 41
38 Last 1 0 0 00
39 IdxLE 1 45 14 1 00 key=r[14]
40 Delete 1 0 0 00
41 Copy 2 16 0 00 r[16]=r[2]
42 Copy 4 17 0 00 r[17]=r[4]
43 MakeRecord 14 4 19 00 r[19]=mkrec(r[14..17])
44 IdxInsert 1 19 14 4 00 key=r[19]
45 Return 7 0 0 00 end groupby result generator
46 Null 0 2 4 00 r[2..4]=NULL
47 Integer 0 5 0 00 r[5]=0; indicate accumulator empty
48 Return 8 0 0 00
49 Sort 1 55 0 00
50 Column 1 3 18 00 r[18]=created_at
51 Column 1 0 17 00 r[17]=count
52 Column 1 2 16 00 r[16]=tag
53 ResultRow 16 3 0 00 output=r[16..18]
54 Next 1 50 0 00
55 Halt 0 0 0 00
56 Transaction 0 0 8 0 01 usesStmtJournal=0
57 String8 0 20 0 now 00 r[20]='now'
58 String8 0 21 0 -1 days 00 r[21]='-1 days'
59 Function 3 20 13 date(-1) 00 r[13]=func(r[20])
60 Goto 0 1 0 00
Rewrite the query so that it is sargable (i.e. it can take advantage of an index):
SELECT tag, COUNT(tag) AS count
FROM tags
WHERE created_at >= strftime('%Y-%m-%d', 'now', '-1 days')
GROUP BY tag
ORDER BY count DESC
LIMIT 100;
The above query should benefit from the following index:
CREATE INDEX idx ON tags (created_at, tag);

How to create dynamic row in sql without inserting a value?

I have a requirement to add dynamic rows based on the results fetched by a SQL query. I've written a query that returns something like the below:
Value    Name
1        Test 1
2        Test 2
...      ...
n        n
The above SQL query returns a dynamic number of rows (the number of rows is not fixed).
I want to add rows with values like Parent1, Parent2, and so on, based on the number of rows. Suppose my query returns a total of 300 rows: the first row should be named Parent1 (in both the Value and Name columns), followed by my query's results up to the 150th row, then another generated row with Parent2, and so on, as in the table below.
Value      Name
Parent1    Parent1
1          Test 1
2          Test 2
...        ...
Parent2    Parent2
151        Test 151
...        ...
n          n
Please note: I cannot use DDL or DML commands to achieve this.
Suppose this is your original query:
select
to_char(rownum) value, 'Test '||rownum name
from dual
connect by level <= 6
;
VALUE NAME
---------- ----------
1 Test 1
2 Test 2
3 Test 3
4 Test 4
5 Test 5
6 Test 6
and you want to introduce two Parent header lines.
You may use NTILE to split the original query into two parts, ordering on some column (here VALUE):
NTILE(2) OVER (ORDER BY VALUE) nt
Change the number in NTILE to change the number of splits.
The query below uses the original query as its base, calculates the NTILE for the split, and adds the Parent rows with UNION ALL.
Most importantly, it enforces the correct order using the NTILE number (nt), the source (parent row first, then data) and the value.
with dt as ( /* your original query */
select
to_char(rownum) value, 'Test '||rownum name
from dual
connect by level <= 6
)
select VALUE, NAME,
NTILE(2) OVER (ORDER BY VALUE) nt, /* modify to change split */
1 src
from dt
union all
select
'Parent'||rownum value,
'Parent'||rownum name,
rownum nt, 0 src
from dual connect by level <= 2 /* modify to change split */
order by nt, src, value;
VALUE NAME NT SRC
---------------------------------------------- ---------------------------------------------- ---------- ----------
Parent1 Parent1 1 0
1 Test 1 1 1
2 Test 2 1 1
3 Test 3 1 1
Parent2 Parent2 2 0
4 Test 4 2 1
5 Test 5 2 1
6 Test 6 2 1
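The answer above is Oracle SQL; the following sketch reproduces the same NTILE-plus-UNION ALL idea via Python's sqlite3, using a recursive CTE in place of CONNECT BY (SQLite has no DUAL or CONNECT BY):

```python
import sqlite3

con = sqlite3.connect(":memory:")

rows = con.execute("""
    WITH RECURSIVE nums(n) AS (
        SELECT 1 UNION ALL SELECT n + 1 FROM nums WHERE n < 6
    ),
    dt AS (  -- stands in for the original query
        SELECT n, CAST(n AS TEXT) AS value, 'Test ' || n AS name FROM nums
    )
    SELECT value, name,
           NTILE(2) OVER (ORDER BY n) AS nt,  -- modify to change split
           1 AS src
    FROM dt
    UNION ALL
    SELECT 'Parent' || n, 'Parent' || n, n, 0
    FROM nums WHERE n <= 2                    -- modify to change split
    ORDER BY nt, src, value
""").fetchall()

for r in rows:
    print(r)
```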
The query below generates a list of parents/non-parents using CONNECT BY. Change the 300 to the number of rows you want to generate, and change the 150 to control how often a parent row is generated.
SELECT LEVEL,
CASE
WHEN MOD (LEVEL, 150) = 0 OR LEVEL = 1
THEN
'Parent' || TO_CHAR (TRUNC (LEVEL / 150) + 1)
ELSE
TO_CHAR (LEVEL)
END AS VALUE,
CASE
WHEN MOD (LEVEL, 150) = 0 OR LEVEL = 1
THEN
'Parent' || TO_CHAR (TRUNC (LEVEL / 150) + 1)
ELSE
'Test ' || TO_CHAR (LEVEL)
END AS NAME
FROM DUAL
CONNECT BY LEVEL <= 300;
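A sketch of the same MOD/TRUNC labelling via Python's sqlite3, again with a recursive CTE in place of CONNECT BY, and a split of 3 over 6 rows to keep the output short:

```python
import sqlite3

con = sqlite3.connect(":memory:")

rows = con.execute("""
    WITH RECURSIVE lvl(n) AS (
        SELECT 1 UNION ALL SELECT n + 1 FROM lvl WHERE n < 6
    )
    SELECT n,
           CASE WHEN n % 3 = 0 OR n = 1
                THEN 'Parent' || (n / 3 + 1)  -- integer division, like TRUNC
                ELSE CAST(n AS TEXT)
           END AS value,
           CASE WHEN n % 3 = 0 OR n = 1
                THEN 'Parent' || (n / 3 + 1)
                ELSE 'Test ' || n
           END AS name
    FROM lvl
    ORDER BY n
""").fetchall()

for r in rows:
    print(r)
```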
A similar approach, more dynamic.
col value for 9999
col name for a20
define limit = &1
define split = &2
select level as lvl,
case
when mod (level, &&split) = 0 or level = 1
then
'parent' || to_char (trunc (level / &&split) + 1)
else
to_char (level)
end as name,
case
when mod (level, &&split) = 0 or level = 1
then
'parent' || to_char (trunc (level / &&split) + 1)
else
'test ' || to_char (level)
end as value
from dual
connect by level <= &&limit
/
Executed as a script, you supply two parameters: the total number of rows and the split value.
SQL> #generate.sql 100 50
old 3: when mod (level, &&split) = 0 or level = 1
new 3: when mod (level, 50) = 0 or level = 1
old 5: 'parent' || to_char (trunc (level / &&split) + 1)
new 5: 'parent' || to_char (trunc (level / 50) + 1)
old 10: when mod (level, &&split) = 0 or level = 1
new 10: when mod (level, 50) = 0 or level = 1
old 12: 'parent' || to_char (trunc (level / &&split) + 1)
new 12: 'parent' || to_char (trunc (level / 50) + 1)
old 17: connect by level <= &&limit
new 17: connect by level <= 100
LVL NAME VALUE
---------- -------------------- ----------------------------------------------
1 parent1 parent1
2 2 test 2
3 3 test 3
4 4 test 4
5 5 test 5
6 6 test 6
7 7 test 7
8 8 test 8
9 9 test 9
10 10 test 10
11 11 test 11
LVL NAME VALUE
---------- -------------------- ----------------------------------------------
12 12 test 12
13 13 test 13
14 14 test 14
15 15 test 15
16 16 test 16
17 17 test 17
18 18 test 18
19 19 test 19
20 20 test 20
21 21 test 21
22 22 test 22
LVL NAME VALUE
---------- -------------------- ----------------------------------------------
23 23 test 23
24 24 test 24
25 25 test 25
26 26 test 26
27 27 test 27
28 28 test 28
29 29 test 29
30 30 test 30
31 31 test 31
32 32 test 32
33 33 test 33
LVL NAME VALUE
---------- -------------------- ----------------------------------------------
34 34 test 34
35 35 test 35
36 36 test 36
37 37 test 37
38 38 test 38
39 39 test 39
40 40 test 40
41 41 test 41
42 42 test 42
43 43 test 43
44 44 test 44
LVL NAME VALUE
---------- -------------------- ----------------------------------------------
45 45 test 45
46 46 test 46
47 47 test 47
48 48 test 48
49 49 test 49
50 parent2 parent2
51 51 test 51
52 52 test 52
53 53 test 53
54 54 test 54
55 55 test 55
LVL NAME VALUE
---------- -------------------- ----------------------------------------------
56 56 test 56
57 57 test 57
58 58 test 58
59 59 test 59
60 60 test 60
61 61 test 61
62 62 test 62
63 63 test 63
64 64 test 64
65 65 test 65
66 66 test 66
LVL NAME VALUE
---------- -------------------- ----------------------------------------------
67 67 test 67
68 68 test 68
69 69 test 69
70 70 test 70
71 71 test 71
72 72 test 72
73 73 test 73
74 74 test 74
75 75 test 75
76 76 test 76
77 77 test 77
LVL NAME VALUE
---------- -------------------- ----------------------------------------------
78 78 test 78
79 79 test 79
80 80 test 80
81 81 test 81
82 82 test 82
83 83 test 83
84 84 test 84
85 85 test 85
86 86 test 86
87 87 test 87
88 88 test 88
LVL NAME VALUE
---------- -------------------- ----------------------------------------------
89 89 test 89
90 90 test 90
91 91 test 91
92 92 test 92
93 93 test 93
94 94 test 94
95 95 test 95
96 96 test 96
97 97 test 97
98 98 test 98
99 99 test 99
LVL NAME VALUE
---------- -------------------- ----------------------------------------------
100 parent3 parent3
100 rows selected.

Sqlite subselect much faster than distinct + order by

I'm confused by the drastically different running times of the following two queries, which produce identical output. The queries are running on SQLite 3.7.9, on a table with about 4.5 million rows, and each produces ~50 rows of results.
Here are the queries:
% echo "SELECT DISTINCT acolumn FROM atable ORDER BY acolumn;" | time sqlite3 mydb
sqlite3 mydb 8.87s user 15.06s system 99% cpu 23.980 total
% echo "SELECT acolumn FROM (SELECT DISTINCT acolumn FROM atable) ORDER BY acolumn;" | time sqlite3 mydb
sqlite3 mydb 1.15s user 0.10s system 98% cpu 1.267 total
Shouldn't the performance of the two queries be closer? I understand that it may be the case that the query planner is performing the "sort" and "distinct" operations in different orders, but if so, does it need to? Or should it be able to figure out how to do it fastest?
Edit: as requested here is the output of the "EXPLAIN QUERY PLAN" command for each query.
For the first (monolithic) query:
0|0|0|SCAN TABLE atable (~1000000 rows)
0|0|0|USE TEMP B-TREE FOR DISTINCT
For the second (subquery) query:
1|0|0|SCAN TABLE atable (~1000000 rows)
1|0|0|USE TEMP B-TREE FOR DISTINCT
0|0|0|SCAN SUBQUERY 1 (~1000000 rows)
0|0|0|USE TEMP B-TREE FOR ORDER BY
Your first query orders the records first by inserting all of them into a sorted temporary table, and then implements the DISTINCT by going through them and returning only those that are not identical to the previous one.
(This can be seen in the EXPLAIN output shown below; the DISTINCT actually got converted to a GROUP BY, which behaves the same.)
Your second query is, in theory, identical to the first, but SQLite's query optimizer is rather simple and cannot prove that this conversion would be safe (as explained in the subquery flattening documentation).
Therefore, it is implemented by doing the DISTINCT first, by inserting only any non-duplicates into a temporary table, and then doing the ORDER BY with a second temporary table.
This second step is completely superfluous because the first temp table was already sorted, but this happens to be faster for your data anyway because you have so many duplicates that are never stored in either temp table.
In theory, your first query could be faster, because SQLite has already recognized that the DISTINCT and ORDER BY clauses can be implemented with the same sorted temporary table.
In practice, however, SQLite is not smart enough to remember that the DISTINCT implies that duplicates do not need to be stored in the temp table.
(This particular optimization might be added to SQLite if you ask nicely on the mailing list.)
$ sqlite3 mydb
sqlite> .explain
sqlite> explain SELECT DISTINCT acolumn FROM atable ORDER BY acolumn;
addr opcode p1 p2 p3 p4 p5 comment
---- ------------- ---- ---- ---- ------------- -- -------------
0 Trace 0 0 0 00
1 SorterOpen 1 2 0 keyinfo(1,BINARY) 00
2 Integer 0 3 0 00 clear abort flag
3 Integer 0 2 0 00 indicate accumulator empty
4 Null 0 6 6 00
5 Gosub 5 37 0 00
6 Goto 0 40 0 00
7 OpenRead 0 2 0 1 00 atable
8 Rewind 0 14 0 00
9 Column 0 0 8 00 atable.acolumn
10 Sequence 1 9 0 00
11 MakeRecord 8 2 10 00
12 SorterInsert 1 10 0 00
13 Next 0 9 0 01
14 Close 0 0 0 00
15 OpenPseudo 2 10 2 00
16 SorterSort 1 39 0 00 GROUP BY sort
17 SorterData 1 10 0 00
18 Column 2 0 7 20
19 Compare 6 7 1 keyinfo(1,BINARY) 00
20 Jump 21 25 21 00
21 Move 7 6 0 00
22 Gosub 4 32 0 00 output one row
23 IfPos 3 39 0 00 check abort flag
24 Gosub 5 37 0 00 reset accumulator
25 Column 2 0 1 00
26 Integer 1 2 0 00 indicate data in accumulator
27 SorterNext 1 17 0 00
28 Gosub 4 32 0 00 output final row
29 Goto 0 39 0 00
30 Integer 1 3 0 00 set abort flag
31 Return 4 0 0 00
32 IfPos 2 34 0 00 Groupby result generator entry point
33 Return 4 0 0 00
34 Copy 1 11 0 00
35 ResultRow 11 1 0 00
36 Return 4 0 0 00 end groupby result generator
37 Null 0 1 0 00
38 Return 5 0 0 00
39 Halt 0 0 0 00
40 Transaction 0 0 0 00
41 VerifyCookie 0 2 0 00
42 TableLock 0 2 0 atable 00
43 Goto 0 7 0 00
sqlite> explain SELECT acolumn FROM (SELECT DISTINCT acolumn FROM atable) ORDER BY acolumn;
addr opcode p1 p2 p3 p4 p5 comment
---- ------------- ---- ---- ---- ------------- -- -------------
0 Trace 0 0 0 00
1 Goto 0 39 0 00
2 Goto 0 17 0 00
3 OpenPseudo 0 3 1 01 coroutine for sqlite_subquery_DA7480_
4 Integer 0 2 0 01
5 OpenEphemeral 2 0 0 keyinfo(1,BINARY) 08
6 OpenRead 1 2 0 1 00 atable
7 Rewind 1 14 0 00
8 Column 1 0 3 00 atable.acolumn
9 Found 2 13 3 1 00
10 MakeRecord 3 1 4 00
11 IdxInsert 2 4 0 00
12 Yield 1 0 0 00
13 Next 1 8 0 01
14 Close 1 0 0 00
15 Integer 1 2 0 00
16 Yield 1 0 0 00 end sqlite_subquery_DA7480_
17 SorterOpen 3 3 0 keyinfo(1,BINARY) 00
18 Integer 2 1 0 00
19 Yield 1 0 0 00 next row of co-routine sqlite_subquery_DA7480_
20 If 2 29 0 00
21 Column 0 0 5 00 sqlite_subquery_DA7480_.acolumn
22 MakeRecord 5 1 6 00
23 Column 0 0 7 00 sqlite_subquery_DA7480_.acolumn
24 Sequence 3 8 0 00
25 Move 6 9 0 00
26 MakeRecord 7 3 10 00
27 SorterInsert 3 10 0 00
28 Goto 0 19 0 00
29 OpenPseudo 4 6 1 00
30 OpenPseudo 5 11 3 00
31 SorterSort 3 37 0 00
32 SorterData 3 11 0 00
33 Column 5 2 6 20
34 Column 4 0 5 20
35 ResultRow 5 1 0 00
36 SorterNext 3 32 0 00
37 Close 4 0 0 00
38 Halt 0 0 0 00
39 Transaction 0 0 0 00
40 VerifyCookie 0 2 0 00
41 TableLock 0 2 0 atable 00
42 Goto 0 2 0 00
Inside most DBMSs, SQL statements are translated into relational algebra and then structured into an expression tree.
The DBMS then uses heuristics to optimise queries. One of the main heuristics is "perform selection early" (p. 46). I suppose the SQLite query planner does this as well, hence the difference in execution time.
Since the result of the subquery is much smaller (~50 rows as opposed to 4.5 million), the sort at the end of the expression tree happens much faster. A plain select isn't a very expensive operation; running operations over a multitude of intermediate results is.
I believe this must be because the order and distinct operations are implemented more efficiently when separated by the subselect - which is effectively a simpler way of saying what alexdeloy is saying.
This experiment is not complete. Please also run the following:
% echo "SELECT acolumn FROM (SELECT DISTINCT acolumn FROM atable ORDER BY acolumn) ;" | time sqlite3 mydb
Tell me if this takes longer than the other two on average. Thanks.
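For anyone who wants to reproduce the comparison without a 4.5-million-row database, here is a small harness via Python's sqlite3 with invented data (many duplicates, few distinct values); it only checks that all three query shapes return the same rows, not their timings:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE atable (acolumn INTEGER)")
con.executemany("INSERT INTO atable VALUES (?)",
                [(i % 5,) for i in range(10_000)])  # 10k rows, 5 distinct values

q1 = "SELECT DISTINCT acolumn FROM atable ORDER BY acolumn"
q2 = ("SELECT acolumn FROM (SELECT DISTINCT acolumn FROM atable) "
      "ORDER BY acolumn")
q3 = ("SELECT acolumn FROM "
      "(SELECT DISTINCT acolumn FROM atable ORDER BY acolumn)")

r1 = con.execute(q1).fetchall()
r2 = con.execute(q2).fetchall()
r3 = con.execute(q3).fetchall()  # row order not guaranteed without an outer ORDER BY

assert r1 == r2
assert sorted(r3) == r1
print(r1)  # [(0,), (1,), (2,), (3,), (4,)]
```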