Let's say I have a SQLite database that contains a table:
sqlite> create table person (id integer, firstname varchar, lastname varchar);
Now I want to get every entry which is in the table.
sqlite> select t0.id, t0.firstname, t0.lastname from person t0;
This works fine and this is what I would use. However, I have worked with a framework from Apple (Core Data) that generates SQL. This framework generates a slightly different SQL query:
sqlite> select 0, t0.id, t0.firstname, t0.lastname from person t0;
Every SQL query generated by this framework begins with "select 0,". Why is that?
I tried to use the explain command to see what's going on, but this was inconclusive, at least to me.
sqlite> explain select t0.id, t0.firstname, t0.lastname from person t0;
addr opcode p1 p2 p3 p4 p5 comment
---------- ---------- ---------- ---------- ---------- ---------- ---------- ----------
0 Trace 0 0 0 00 NULL
1 Goto 0 11 0 00 NULL
2 OpenRead 0 2 0 3 00 NULL
3 Rewind 0 9 0 00 NULL
4 Column 0 0 1 00 NULL
5 Column 0 1 2 00 NULL
6 Column 0 2 3 00 NULL
7 ResultRow 1 3 0 00 NULL
8 Next 0 4 0 01 NULL
9 Close 0 0 0 00 NULL
10 Halt 0 0 0 00 NULL
11 Transactio 0 0 0 00 NULL
12 VerifyCook 0 1 0 00 NULL
13 TableLock 0 2 0 person 00 NULL
14 Goto 0 2 0 00 NULL
And the table for the second query looks like this:
sqlite> explain select 0, t0.id, t0.firstname, t0.lastname from person t0;
addr opcode p1 p2 p3 p4 p5 comment
---------- ---------- ---------- ---------- ---------- ---------- ---------- ----------
0 Trace 0 0 0 00 NULL
1 Goto 0 12 0 00 NULL
2 OpenRead 0 2 0 3 00 NULL
3 Rewind 0 10 0 00 NULL
4 Integer 0 1 0 00 NULL
5 Column 0 0 2 00 NULL
6 Column 0 1 3 00 NULL
7 Column 0 2 4 00 NULL
8 ResultRow 1 4 0 00 NULL
9 Next 0 4 0 01 NULL
10 Close 0 0 0 00 NULL
11 Halt 0 0 0 00 NULL
12 Transactio 0 0 0 00 NULL
13 VerifyCook 0 1 0 00 NULL
14 TableLock 0 2 0 person 00 NULL
15 Goto 0 2 0 00 NULL
Some frameworks do this in order to tell, without any doubt, whether a row from that table was returned.
Consider
A B
+---+ +---+------+
| a | | a | b |
+---+ +---+------+
| 0 | | 0 | 1 |
+---+ +---+------+
| 1 | | 1 | NULL |
+---+ +---+------+
| 2 |
+---+
SELECT A.a, B.b
FROM A
LEFT JOIN B
ON B.a = A.a
Results
+---+------+
| a | b |
+---+------+
| 0 | 1 |
+---+------+
| 1 | NULL |
+---+------+
| 2 | NULL |
+---+------+
In this result set, it is not possible to see that a = 1 exists in table B, but a = 2 does not. To get that information, you need to select a non-nullable expression from table b, and the simplest way to do that is to select a simple constant value.
SELECT A.a, B.x, B.b
FROM A
LEFT JOIN (SELECT 0 AS x, B.a, B.b FROM B) AS B
ON B.a = A.a
Results
+---+------+------+
| a | x | b |
+---+------+------+
| 0 | 0 | 1 |
+---+------+------+
| 1 | 0 | NULL |
+---+------+------+
| 2 | NULL | NULL |
+---+------+------+
There are a lot of situations where these constant values are not strictly required, for example when you have no joins, or when you could choose a non-nullable column from b instead, but they don't cause any harm either, so they can just be included unconditionally.
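The effect of the constant column can be checked with a small script. This is only a sketch using Python's built-in sqlite3 module and an in-memory database; the table contents are the ones from the example above, and the function name left_join_with_marker is made up for illustration.

```python
import sqlite3

def left_join_with_marker():
    """Run the LEFT JOIN from the example and return its rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE A (a INTEGER);
        CREATE TABLE B (a INTEGER, b INTEGER);
        INSERT INTO A VALUES (0), (1), (2);
        INSERT INTO B VALUES (0, 1), (1, NULL);
    """)
    rows = conn.execute("""
        SELECT A.a, B.x, B.b
        FROM A
        LEFT JOIN (SELECT 0 AS x, B.a, B.b FROM B) AS B
          ON B.a = A.a
        ORDER BY A.a
    """).fetchall()
    conn.close()
    return rows

for a, x, b in left_join_with_marker():
    # x is the constant marker: NULL (None) only when B had no matching row.
    status = "row found in B" if x is not None else "no row in B"
    print(a, status, b)
```

For a = 1 and a = 2 the column b is NULL in both cases, so only the marker x tells the two situations apart.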
When I have code to dynamically generate a WHERE clause, I usually start the clause with a:
WHERE 1 = 1
Then the loop to add additional conditions always adds each condition in the same format:
AND x = y
without having to put conditional logic in place to check if this is the first condition or not: "if this is the first condition then start with the WHERE keyword, else add the AND keyword".
So I can imagine a framework doing this for similar reasons. If you start the statement with a SELECT 0 then the code to add subsequent columns can be in a loop without any conditional statements. Just add , colx each time without any conditional checking along the lines of "if this is the first column, don't put a comma before the column name, otherwise do".
Example pseudo code:
String query = "SELECT 0";
for (Column col in columnList)
query += ", " + col;
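A runnable version of the same idea, combining the "SELECT 0" prefix with the "WHERE 1 = 1" trick, might look like the sketch below. The function name build_select and its parameters are hypothetical; this is not how Core Data is known to work, just an illustration of why a generator might emit such SQL.

```python
def build_select(table, columns, conditions):
    """Build a SELECT statement without any first-item special cases.

    Assumption: table and column names are trusted identifiers, and the
    conditions use placeholders such as "id = ?" rather than raw values.
    """
    query = "SELECT 0"
    for col in columns:
        # Every column is appended the same way: comma first, then the name.
        query += ", " + col
    query += " FROM " + table + " WHERE 1 = 1"
    for cond in conditions:
        # Every condition is appended the same way: AND first, then the test.
        query += " AND " + cond
    return query

print(build_select("person", ["id", "firstname", "lastname"], ["id = ?"]))
```

Neither loop needs to know whether it is handling the first item, which is the whole point of the dummy column and the dummy condition.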
Only Apple knows … but I see two possibilities:
Inserting a dummy column ensures that the actual output columns are numbered beginning with 1, not 0. If some existing interface already assumed one-based numbering, doing it this way in the SQL backend might have been the easiest solution.
If you make a query for multiple objects using multiple subqueries, a value like this could be used to determine from which subquery a record originates:
SELECT 0, t0.firstname, ... FROM PERSON t0 WHERE t0.id = 123
UNION ALL
SELECT 1, t0.firstname, ... FROM PERSON t0 WHERE t0.id = 456
(I don't know if Core Data actually does this.)
Your EXPLAIN output shows that the only difference is (at address 4) that the second program sets the extra column to zero, so there is only a minimal performance difference.
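To illustrate the second possibility (a leading constant that identifies the originating subquery), here is a small sketch with Python's sqlite3 module. The sample names and the helper fetch_tagged are invented for the example, and nothing here is claimed to match what Core Data really does.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE person (id INTEGER, firstname VARCHAR, lastname VARCHAR);
    -- Made-up sample rows for the two ids used in the answer.
    INSERT INTO person VALUES (123, 'Ada', 'Lovelace'), (456, 'Alan', 'Turing');
""")

def fetch_tagged(conn, first_id, second_id):
    """Run both subqueries in one statement and split the rows by their tag."""
    rows = conn.execute("""
        SELECT 0, t0.firstname FROM person t0 WHERE t0.id = ?
        UNION ALL
        SELECT 1, t0.firstname FROM person t0 WHERE t0.id = ?
    """, (first_id, second_id)).fetchall()
    by_source = {0: [], 1: []}
    for tag, firstname in rows:
        # The first column says which subquery produced the row.
        by_source[tag].append(firstname)
    return by_source

tagged = fetch_tagged(conn, 123, 456)
print(tagged)
```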
I have the following output:
Sno | Value Stream | Duration | Inspection
1 | Test1 | 3 | 1
2 | ON | 14 | 0
3 | Start | 5 | 0
4 | Test1 | 5 | 1
5 | OFF | 0 | 1
6 | Start | 0 | 1
7 | Test2 | 0 | 1
8 | ON | 3 | 1
9 | START | 0 | 1
10 | Test2 | 2 | 2
I want to merge rows with the same Value Stream: when a row is followed by ON and then Start, the matching row after Start should be added to the row before ON. For example, S.no 4 will merge into S.no 1:
1 | Test1 | 8 | 2 |
If the combination is different, the rows must not be merged. We only have to consider ON/Start; if the combination is OFF/Start, merging is not allowed. For example, S.no 5 and 6 are OFF/Start, so S.no 4 and 7 must not be merged.
I think you are talking about summarization not merging:
select [Value Stream],
min(Sno) as First_Sno,
sum(Duration) as total_Duration,
sum(Inspection) as Inspection
from yourtable
group by [Value Stream]
This will give you the summarized result.
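The GROUP BY query above can be tried against the sample data with the following sketch (Python's sqlite3 module, in-memory database). Note that it summarizes every Value Stream unconditionally; it does not implement the ON/Start versus OFF/Start rule from the question. The function name summarize is an invention for this example.

```python
import sqlite3

SAMPLE_ROWS = [
    (1, "Test1", 3, 1), (2, "ON", 14, 0), (3, "Start", 5, 0),
    (4, "Test1", 5, 1), (5, "OFF", 0, 1), (6, "Start", 0, 1),
    (7, "Test2", 0, 1), (8, "ON", 3, 1), (9, "START", 0, 1),
    (10, "Test2", 2, 2),
]

def summarize(rows):
    """Return {value_stream: (first_sno, total_duration, total_inspection)}."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        'CREATE TABLE yourtable '
        '(Sno INTEGER, "Value Stream" TEXT, Duration INTEGER, Inspection INTEGER)'
    )
    conn.executemany("INSERT INTO yourtable VALUES (?, ?, ?, ?)", rows)
    result = conn.execute("""
        SELECT "Value Stream",
               MIN(Sno)        AS First_Sno,
               SUM(Duration)   AS total_Duration,
               SUM(Inspection) AS Inspection
        FROM yourtable
        GROUP BY "Value Stream"
    """).fetchall()
    conn.close()
    return {name: (sno, dur, insp) for name, sno, dur, insp in result}

summary = summarize(SAMPLE_ROWS)
print(summary["Test1"])
```

For Test1 this reproduces the row asked for in the question (first Sno 1, Duration 8, Inspection 2), but Test2 is summed as well, which the question wanted to prevent.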
Let's have a table with 3 columns: key, value, and bitmask (as varchar; of unknown maximum length):
abc | 23 | 101
xyz | 56 | 000101
Is it possible to write a query where the output has one row for every combination of key, value, and 1 in the bitmask, with the index of that 1 as an integer column (it doesn't matter whether it starts from 0 or 1)? So for the example above:
abc | 23 | 1
abc | 23 | 3
xyz | 56 | 4
xyz | 56 | 6
Thanks for any ideas!
I think you might be better off choosing a maximum length for your varchar.
SELECT * FROM
table
INNER JOIN
generate_series(1,1000) s(n)
ON
s.n <= char_length(bitmask) and
substring(bitmask from s.n for 1) = '1'
We generate a list of numbers:
s.n
---
1
2
3
4
...
And join it to the table in a way that causes repeated table rows:
s.n bitmask
--- -------
1 000101
2 000101
3 000101
4 000101
5 000101
6 000101
1 101
2 101
3 101
Then use the s.n to substring the bitmask, and look for being equal to 1:
s.n bitmask substr
--- ------- ------
1 000101 --substring('000101' from 1 for 1) = '1'? no
2 000101 --substring('000101' from 2 for 1) = '1'? no
3 000101 --substring('000101' from 3 for 1) = '1'? no
4 000101 --substring('000101' from 4 for 1) = '1'? yes...
5 000101
6 000101
1 101
2 101
3 101
So the s.n gives us the number in the last column of your desired output, and the join condition keeps only the rows where the substring at that position equals '1'.
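The same join can be sketched in a form that runs without PostgreSQL. The example below uses Python's sqlite3 module, and because generate_series is not guaranteed to be available there, a recursive CTE stands in for it; the CTE stops at the longest bitmask in the table, which also avoids hard-coding 1000. The table name masks and the column names k and v are placeholders chosen for this sketch.

```python
import sqlite3

def one_bit_positions(rows):
    """Return (k, v, position) for every '1' in each row's bitmask (1-based)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE masks (k TEXT, v INTEGER, bitmask TEXT)")
    conn.executemany("INSERT INTO masks VALUES (?, ?, ?)", rows)
    result = conn.execute("""
        WITH RECURSIVE s(n) AS (
            -- Stand-in for generate_series(1, <longest bitmask>).
            SELECT 1
            UNION ALL
            SELECT n + 1 FROM s
            WHERE n < (SELECT MAX(LENGTH(bitmask)) FROM masks)
        )
        SELECT m.k, m.v, s.n
        FROM masks AS m
        INNER JOIN s
           ON s.n <= LENGTH(m.bitmask)
          AND SUBSTR(m.bitmask, s.n, 1) = '1'
        ORDER BY m.k, s.n
    """).fetchall()
    conn.close()
    return result

positions = one_bit_positions([("abc", 23, "101"), ("xyz", 56, "000101")])
for row in positions:
    print(row)
```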
I have a table with a few fields like below:
ID | OpeningBal | A | B | C | D | ClosingBal
Here the opening balance of the current day is the closing balance of the previous day, and the closing balance is calculated with this formula:
OpeningBal + A - B - (C + D) - C
However, the current data in this table is wrong because an incorrect formula was previously applied for the closing balance. I have tried looping through all the records of this table with a cursor and updating the closing balance to the correct value. Within this cursor, I want to update the opening balance of the next row with the closing balance of the current row, but I don't have the ID of the next row. Any thoughts?
Actual Result:
ID OPBal A B C D CLBal
1 0 80 4 6 0 90
2 90 8 6 0 0 104
5 104 5 4 0 9 122
7 122 10 3 5 0 140
expected result:
ID OPBal A B C D CLBal
1 0 80 4 6 0 64
2 64 46 6 0 0 104
5 104 5 4 0 9 96
7 96 10 3 5 0 93
update tablename set openingbalance = a.clcurrentrow from
(select id, LAG(closingbalance) over (order by id) clcurrentrow from tablename) a
where tablename.id = a.id
Thanks all. I got the solution using LAG and I will modify this based on my requirements.
SELECT
ID,
LAG(ClosingBalance,1,0.00) OVER (ORDER BY ID) PrevClosingBalance,
ClosingBalance
FROM Table1
UPDATE Table1
SET OpeningBalance = @PrevClosingBalance,
ClosingBalance = @PrevClosingBalance + A - B - (C + D) - C
WHERE ID = @ID
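Because each closing balance depends on the opening balance of the same row, which in turn depends on the previous row's corrected closing balance, the values have to be carried forward in ID order. The sketch below does that in a plain loop with Python's sqlite3 module instead of a T-SQL cursor; the column names OpeningBalance and ClosingBalance and the helper recompute_balances are assumptions, and the A, B, C, D values are taken from the expected-result table in the question.

```python
import sqlite3

def recompute_balances(conn):
    """Recalculate opening and closing balances row by row, in ID order."""
    rows = conn.execute(
        "SELECT ID, A, B, C, D FROM Table1 ORDER BY ID"
    ).fetchall()
    prev_closing = 0  # the first row opens at 0, as in the question's data
    for row_id, a, b, c, d in rows:
        opening = prev_closing
        closing = opening + a - b - (c + d) - c  # formula from the question
        conn.execute(
            "UPDATE Table1 SET OpeningBalance = ?, ClosingBalance = ? WHERE ID = ?",
            (opening, closing, row_id),
        )
        prev_closing = closing
    conn.commit()

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE Table1 (
        ID INTEGER PRIMARY KEY, OpeningBalance INTEGER,
        A INTEGER, B INTEGER, C INTEGER, D INTEGER, ClosingBalance INTEGER
    )
""")
# Rows start out with the wrong balances and are corrected below.
conn.executemany("INSERT INTO Table1 VALUES (?, ?, ?, ?, ?, ?, ?)", [
    (1, 0, 80, 4, 6, 0, 90),
    (2, 90, 46, 6, 0, 0, 104),
    (5, 104, 5, 4, 0, 9, 122),
    (7, 122, 10, 3, 5, 0, 140),
])
recompute_balances(conn)
balances = conn.execute(
    "SELECT ID, OpeningBalance, ClosingBalance FROM Table1 ORDER BY ID"
).fetchall()
print(balances)
```

Gaps in the ID sequence (1, 2, 5, 7) do not matter here, since the loop never needs to look up the next ID.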
I am implementing a FUSE file system that uses a sqlite3 database for its backend. I do not plan to ever change the database backend as my file system uses sqlite3 as a file format. One of the functions a file system must implement is the readdir function. This function allows the process to iteratively read the contents of a directory by repeatedly calling it and getting the next few directory entries (as many as the buffer can hold). The directory entries may be returned in any order. I want to implement this operation with the following query:
SELECT fileno, name FROM dirents WHERE dirno = ? LIMIT -1 OFFSET ?;
where dirno is the directory I'm reading from and OFFSET ? is the number of entries I've already returned. I want to read as many rows as I can fit into the buffer (I cannot predict the count as these are variable-length records depending on the length of the file name) and then reset the query.
Due to the stateless nature of FUSE, keeping open a query and returning the next few rows until the directory has ended is not an option as I cannot reliably detect if the process closes the directory prematurely.
The dirents table has the following schema:
CREATE TABLE dirents (
dirno INTEGER NOT NULL REFERENCES inodes(inum),
fileno INTEGER NOT NULL REFERENCES inodes(inum),
name TEXT NOT NULL,
PRIMARY KEY (dirno, name)
) WITHOUT ROWID;
Question
In theory, a SELECT statement yields rows in no defined order. In practice, can I assume that when I execute the same prepared SELECT statement multiple times with successively larger OFFSET values, I get the same results as if I read all the data in a single query, i.e. the row order is the same unspecified order each time? An assumption that currently holds is that the database is not modified between queries.
Can I assume that the row order stays reasonably similar when a different query modifies the dirents table in between? Some glitches (e.g. directory entries appearing twice) will of course be observable by the program, but for usability (the main user of readdir is the ls command) it is highly useful if the directory listing is usually mostly correct.
If I cannot make these assumptions, what is a better way to reach the desired result?
I know that I could throw in an ORDER BY clause to make the row order well-defined, but I fear that this might have a strong impact on performance, especially when reading small chunks of a large directory—the directory has to be sorted every time a chunk is read.
The right solution to this problem is to use order by. If you are concerned about the performance of the order by, then use an index on the column used for the order by.
The simplest approach, in my opinion, would be to remove the without rowid option on the table creation. Then you can just access the table as:
SELECT fileno, name
FROM dirents
WHERE dirno = ?
ORDER BY rowid
LIMIT -1 OFFSET ?;
I realize this adds additional bytes to each row, but it is for a good purpose -- making sure that your queries are correct.
Actually, the best index for this table would be on dirno, rowid, fileno, name. Given the where clause, you are doing a full table scan anyway, unless you have an index.
If I add an ORDER BY name clause to the SELECT query, sqlite3 generates almost identical (except for a stray Noop) bytecode for the query but a row order is guaranteed:
sqlite> EXPLAIN SELECT fileno, name FROM dirents WHERE dirno = ? LIMIT -1 OFFSET ?;
addr opcode p1 p2 p3 p4 p5 comment
---- ------------- ---- ---- ---- ------------- -- -------------
0 Init 0 21 0 00
1 Integer -1 1 0 00
2 Variable 2 2 0 00
3 MustBeInt 2 0 0 00
4 SetIfNotPos 2 2 0 00
5 Add 1 2 3 00
6 SetIfNotPos 1 3 -1 00
7 OpenRead 1 3 0 k(2,nil,nil) 02
8 Variable 1 4 0 00
9 IsNull 4 19 0 00
10 Affinity 4 1 0 D 00
11 SeekGE 1 19 4 1 00
12 IdxGT 1 19 4 1 00
13 IfPos 2 18 1 00
14 Column 1 2 5 00
15 Column 1 1 6 00
16 ResultRow 5 2 0 00
17 DecrJumpZero 1 19 0 00
18 Next 1 12 0 00
19 Close 1 0 0 00
20 Halt 0 0 0 00
21 Transaction 0 0 3 0 01
22 TableLock 0 3 0 dirents 00
23 Goto 0 1 0 00
sqlite> EXPLAIN SELECT fileno, name FROM dirents WHERE dirno = ? ORDER BY name LIMIT -1 OFFSET ?;
addr opcode p1 p2 p3 p4 p5 comment
---- ------------- ---- ---- ---- ------------- -- -------------
0 Init 0 22 0 00
1 Noop 0 0 0 00
2 Integer -1 1 0 00
3 Variable 2 2 0 00
4 MustBeInt 2 0 0 00
5 SetIfNotPos 2 2 0 00
6 Add 1 2 3 00
7 SetIfNotPos 1 3 -1 00
8 OpenRead 2 3 0 k(2,nil,nil) 02
9 Variable 1 4 0 00
10 IsNull 4 20 0 00
11 Affinity 4 1 0 D 00
12 SeekGE 2 20 4 1 00
13 IdxGT 2 20 4 1 00
14 IfPos 2 19 1 00
15 Column 2 2 5 00
16 Column 2 1 6 00
17 ResultRow 5 2 0 00
18 DecrJumpZero 1 20 0 00
19 Next 2 13 0 00
20 Close 2 0 0 00
21 Halt 0 0 0 00
22 Transaction 0 0 3 0 01
23 TableLock 0 3 0 dirents 00
24 Goto 0 1 0 00
So I guess I'm going for that ORDER BY name clause.
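A stateless, chunked read with the ORDER BY name query could be sketched as follows, using Python's sqlite3 module rather than the C API a FUSE file system would use. The fixed max_entries stands in for "as many entries as fit into the buffer", the REFERENCES clauses are dropped so the example is self-contained, and the helper names and file names are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE dirents (
        dirno  INTEGER NOT NULL,
        fileno INTEGER NOT NULL,
        name   TEXT NOT NULL,
        PRIMARY KEY (dirno, name)
    ) WITHOUT ROWID
""")
names = ["zeta", "alpha", "mu", "beta", "omega"]
conn.executemany(
    "INSERT INTO dirents VALUES (?, ?, ?)",
    [(1, 10 + i, name) for i, name in enumerate(names)],
)

QUERY = ("SELECT fileno, name FROM dirents "
         "WHERE dirno = ? ORDER BY name LIMIT -1 OFFSET ?")

def read_chunk(conn, dirno, offset, max_entries):
    """One readdir call: run the query, take a few rows, then reset it."""
    cur = conn.execute(QUERY, (dirno, offset))
    chunk = cur.fetchmany(max_entries)
    cur.close()  # nothing is kept open between calls
    return chunk

def list_directory(conn, dirno, max_entries=2):
    """Call read_chunk repeatedly, as a process iterating a directory would."""
    entries, offset = [], 0
    while True:
        chunk = read_chunk(conn, dirno, offset, max_entries)
        if not chunk:
            return entries
        entries.extend(chunk)
        offset += len(chunk)

listing = list_directory(conn, 1)
print(listing)
```

As long as the table is not modified between calls, the chunks concatenate to the same listing a single query would return, because the order is now defined by name.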
Does there exist a Postgres Aggregator such that, when used on the following table:
id | value
----+-----------
1 | 1
2 | 1
3 | 2
4 | 2
5 | 3
6 | 3
7 | 3
8 | 4
9 | 4
10 | 5
in a query such as:
select agg_function(4,value) from mytable where id>5
will return
agg_function
--------------
t
(a boolean true result) because a row or rows with value=4 were selected?
In other words, one argument specifies the value you are looking for, the other argument takes the column specifier, and it returns true if the column value was equal to the specified value for one or more rows?
I have successfully created an aggregate to do just that, but I'm wondering if I have just re-created the wheel...
select sum(case when value = 4 then 1 else 0 end) > 0
from mytable
where id > 5
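The conditional-sum query can be checked against the sample table with the sketch below, which uses Python's sqlite3 module as a stand-in for Postgres; the helper name any_row_has_value is hypothetical. In PostgreSQL itself, the built-in bool_or aggregate should express the same test more directly, for example bool_or(value = 4), though that is not run here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (id INTEGER PRIMARY KEY, value INTEGER)")
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?)",
    list(enumerate([1, 1, 2, 2, 3, 3, 3, 4, 4, 5], start=1)),
)

def any_row_has_value(conn, wanted, min_id):
    """True if at least one row with id > min_id has value == wanted."""
    (flag,) = conn.execute("""
        SELECT SUM(CASE WHEN value = ? THEN 1 ELSE 0 END) > 0
        FROM mytable
        WHERE id > ?
    """, (wanted, min_id)).fetchone()
    # SUM over zero rows is NULL, which is treated as "not found" here.
    return bool(flag)

print(any_row_has_value(conn, 4, 5))
```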