SELECT y.M_id, SUM(t.alarm_sum), y.yarn1, AVG(t.yarn1_in), SUM(t.yarn1_alarm)
FROM '111' t, yarn y
WHERE t.date=date('now') AND y.M_id='111'
ORDER BY t.row_id DESC
LIMIT 5
I want to get data from two tables, '111' and yarn, but I only want the last 5 rows of data from '111'.
I ran it, but the output is not limited to 5 rows from the '111' table.
What do I need to change?
You can use a subquery e.g.
select ...
from (select * from '111' order by row_id desc limit 5) t,
yarn y
[...]
However, it's not clear to me that this makes any sense, and it's not clear what you're trying to achieve either.
What you're doing here is a Cartesian join: functionally, the database selects all rows of t and all rows of y, creates every combination of the two (t's first row with each row of y, t's second row with each row of y, and so on), then filters based on your condition, and finally orders and limits.
I'm surprised you're getting anything other than the very last row from t, though.
(Also, calling a column row_id seems like a bad idea in SQLite, as it'd be easy to confuse with the database's own rowid.)
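Put together, the rewritten query might look like the sketch below. Two things to check against your intent: the date filter now runs after the five rows are picked (move it inside the subquery if you want the last five rows of today instead), and SQLite tolerates mixing aggregates with bare columns, but a GROUP BY makes the intent explicit.
SELECT y.M_id, SUM(t.alarm_sum), y.yarn1, AVG(t.yarn1_in), SUM(t.yarn1_alarm)
FROM (SELECT * FROM '111' ORDER BY row_id DESC LIMIT 5) t,
     yarn y
WHERE t.date = date('now') AND y.M_id = '111'
GROUP BY y.M_id, y.yarn1;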
How can I get distinct values from multiple fields within one table with just one request?
Option 1
SELECT WM_CONCAT(DISTINCT FIELD1) FIELD1S, WM_CONCAT(DISTINCT FIELD2) FIELD2S, .. FIELD10S
FROM TABLE;
but WM_CONCAT is limited (it is undocumented, and its concatenated result is capped in length).
Option 2
select DISTINCT FIELD1 FIELDVALUE, 'FIELD1' FIELDNAME
FROM TABLE
UNION
select DISTINCT FIELD2 FIELDVALUE, 'FIELD2' FIELDNAME
FROM TABLE
... and so on through FIELD10,
but it is just too slow.
If you were only scanning a small range in the data (not full-scanning the whole table), you could use WITH to optimise your query, e.g.:
WITH a AS
(SELECT field1,field2,field3..... FROM TABLE WHERE condition)
SELECT field1 FROM a
UNION
SELECT field2 FROM a
UNION
SELECT field3 FROM a
.....etc
For my problem, I had
WL1   WL2   correlation
A     B     0.8
B     A     0.8
A     C     0.9
C     A     0.9
How do I eliminate the symmetry from this table?
select WL1, WL2, correlation
from table
where least(WL1, WL2) || greatest(WL1, WL2) = WL1 || WL2
order by WL1
this gives
WL1   WL2   correlation
A     B     0.8
A     C     0.9
:)
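As a side note, least(WL1,WL2)||greatest(WL1,WL2) = WL1||WL2 holds exactly when WL1 <= WL2, so (assuming every pair really does appear in both orders, as above) a plain comparison gives the same result and avoids any chance of two different pairs concatenating to the same string:
select WL1, WL2, correlation
from table
where WL1 <= WL2
order by WL1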
The best option in SQL is the UNION, though you may be able to save some performance by taking out the DISTINCT keywords:
select FIELD1 FROM TABLE
UNION
select FIELD2 FROM TABLE
UNION returns the unique set across the two selects, so DISTINCT is redundant in this case. There simply isn't any way to write this query differently to make it perform faster. There's no magic formula that makes searching 200,000+ rows faster. It's got to search every row of the table twice and sort for uniqueness, which is exactly what UNION will do.
The only way you can make it faster is to create separate indexes on the two fields (maybe) or pare down the set of data that you're searching across.
Alternatively, if you're doing this a lot and adding new fields rarely, you could use a materialized view to store the result and only refresh it periodically.
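Both suggestions might look something like this sketch (Oracle syntax, hypothetical names; whether the indexes actually help depends on the optimizer being able to answer each UNION branch from an index alone):
CREATE INDEX ix_mytable_field1 ON mytable (field1);
CREATE INDEX ix_mytable_field2 ON mytable (field2);

CREATE MATERIALIZED VIEW mv_distinct_values
REFRESH COMPLETE ON DEMAND
AS
SELECT field1 AS fieldvalue FROM mytable
UNION
SELECT field2 FROM mytable;

-- refresh periodically, e.g. from a scheduled job:
-- EXEC DBMS_MVIEW.REFRESH('MV_DISTINCT_VALUES')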
Incidentally, your second query doesn't appear to do what you want it to. DISTINCT always applies to all of the columns in the select list, so your constants with the field names will cause the query to always return separate rows for the two columns.
I've come up with another method that, experimentally, seems to be a little faster. In effect, this allows us to trade one full-table scan for a Cartesian join. In most cases, I would still opt to use the UNION as it's much more obvious what the query is doing.
SELECT DISTINCT CASE lvl WHEN 1 THEN field1 ELSE field2 END
FROM table
CROSS JOIN (SELECT LEVEL lvl
FROM DUAL
CONNECT BY LEVEL <= 2);
It's also worthwhile to add that I tested both queries on a table without useful indexes containing 800,000 rows and it took roughly 45 seconds (returning 145,000 rows). However, most of that time was spent actually fetching the records, not running the query (the query took 3-7 seconds). If you're getting a sizable number of rows back, it may simply be the number of rows that is causing the performance issue you're seeing.
When you get distinct values from multiple columns, they won't line up into a single result table. Consider the following data:
Column A   Column B
10         50
30         50
10         50
Taking the distinct values gives 2 rows from the first column but only 1 row from the second column. It simply won't work.
And something like this?
SELECT 'FIELD1',FIELD1, 'FIELD2',FIELD2,...
FROM TABLE
GROUP BY FIELD1,FIELD2,...
For example I have:
create table a (i int);
Assume there are 10k rows.
I want to count 0's in the last 20 rows.
Something like:
select count(*) from (select i from a limit 20) where i = 0;
Is it possible to make it more efficient? Like a single SQL statement or something?
PS. DB is SQLite3 if that matters at all...
UPDATE
PPS. No need to group by anything in this instance; assume the table literally has one column (plus, presumably, the internal DB rowid or something). I'm just curious whether this is possible to do without the nested selects.
You'll need to order by something in order to determine the last 20 rows. When you say last, do you mean by date, by ID, ...?
Something like this should work:
select count(*)
from (
  select i
  from a
  order by j desc  -- j stands for whatever column defines "last" (a date, an id, ...)
  limit 20
) where i = 0;
If you do not remove rows from the table, you may try the following hacky query:
SELECT COUNT(*) AS cnt
FROM A
WHERE ROWID > (SELECT MAX(ROWID) - 20 FROM A)
  AND i = 0;
It operates with ROWIDs only. As the documentation says: Rows are stored in rowid order.
You need to remember to ORDER BY when you use LIMIT; otherwise the result is indeterminate. To get the latest rows added, you need to include a column with the insertion date, and then you can use that. Without this column you cannot guarantee that you will get the latest rows.
To make it efficient you should ensure that there is an index on the column you order by, possibly even a clustered index.
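A minimal sketch of that, with a hypothetical created_at column added to the example table:
CREATE TABLE a (i INT, created_at TEXT);
CREATE INDEX idx_a_created_at ON a(created_at);

SELECT COUNT(*)
FROM (SELECT i FROM a ORDER BY created_at DESC LIMIT 20)
WHERE i = 0;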
I'm afraid that you need a nested select to be able to count and restrict to last X rows at a time, because something like this
SELECT count(*) FROM a GROUP BY i HAVING i = 0
will count 0's, but in ALL table records, because a LIMIT in this query will basically have no effect.
However, you can optimize by using COUNT(i), as it is faster to COUNT only one field than two or more (in this case your table has two fields: i, and the rowid that SQLite creates automatically in tables without an explicit primary key).
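For instance, combining that with the rowid ordering suggested above (a sketch; valid for ordinary rowid tables):
SELECT COUNT(i)
FROM (SELECT i FROM a ORDER BY rowid DESC LIMIT 20)
WHERE i = 0;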
I have a table that I would like to be able to present "ranked X out of Y" data for. In particular, I'd like to be able to present that data for an individual row in a relatively efficient way (i.e. without selecting every row in the table). The ranking itself is quite simple, it's a straight ORDER BY on a single column in the table.
Postgres seems to present some unique challenges in this regard; AFAICT it doesn't have a RANK or ROW_NUMBER or equivalent function (at least in 8.3, which I'm stuck on for the moment). The canonical answer in the mailing list archives seems to be to create a temporary sequence and select from it:
test=> create temporary sequence tmp_seq;
CREATE SEQUENCE
test=*> select nextval('tmp_seq') as row_number, col1, col2 from foo;
It seems like this solution still won't help when I want to select just a single row from the table (and I want to select it by PK, not by rank).
I could denormalize and store the rank in a separate column, which makes presenting the data trivial, but just relocates my problem. UPDATE doesn't support ORDER BY, so I'm not sure how I'd construct an UPDATE query to set the ranks (short of selecting every row and running a separate UPDATE for each row, which seems like way too much DB activity to trigger every time the ranks need updating).
Am I missing something obvious? What's the Right Way to do this?
EDIT: Apparently I wasn't clear enough. I'm aware of OFFSET/LIMIT, but I don't see how it helps solve this problem. I'm not trying to select the Xth-ranked item, I'm trying to select an arbitrary item (by its PK, say), and then be able to display to the user something like "ranked 43rd out of 312."
If you want the rank, do something like
SELECT id,num,rank FROM (
SELECT id,num,rank() OVER (ORDER BY num) FROM foo
) AS bar WHERE id=4
Or if you actually want the row number, use
SELECT id,num,row_number FROM (
SELECT id,num,row_number() OVER (ORDER BY num) FROM foo
) AS bar WHERE id=4
They'll differ when you have equal values somewhere. There is also dense_rank() if you need that.
This requires PostgreSQL 8.4, of course.
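The same window functions also answer the side question about populating a denormalized rank column, since UPDATE can join against a ranked subquery. A sketch, again assuming 8.4 and a rank column added to foo:
UPDATE foo
SET rank = ranked.rnk
FROM (SELECT id, rank() OVER (ORDER BY num) AS rnk FROM foo) AS ranked
WHERE foo.id = ranked.id;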
Isn't it just this:
SELECT *
FROM mytable
ORDER BY
col1
OFFSET X LIMIT 1
Or I am missing something?
Update:
If you want to show the rank, use this:
SELECT q.*, vals[1] AS rank, vals[2] AS total  -- vals[1]: rows ordered before this one; vals[2]: total rows
FROM (
        SELECT mo.*,
               (
               SELECT ARRAY[SUM(((mi.col1, mi.ctid) < (mo.col1, mo.ctid))::INTEGER), COUNT(*)]
               FROM mytable mi
               ) AS vals
        FROM mytable mo
        WHERE mo.id = #myid
     ) q
ROW_NUMBER functionality in PostgreSQL is implemented via LIMIT n OFFSET skip.
Find an overview here.
On the pitfalls of ranking see this SO question.
EDIT: Since you are asking for ROW_NUMBER() instead of simple ranking: row_number() was introduced to PostgreSQL in version 8.4, so you might consider updating. Otherwise this workaround might be helpful.
Previous replies tackle the question "select all rows and get their rank", which is not what you want...
you have a row
you want to know its rank
Just do:
SELECT count(*) FROM table WHERE score > $1
Where $1 is the score of the row you just selected (I suppose you'd like to display it, so you might select it...).
Or do:
SELECT a.*, (SELECT count(*) FROM table b WHERE b.score > a.score) AS rank FROM table AS a WHERE pk = ...
However, if you select a row which is ranked last, then yes, you will need to count all the rows ranked before it, so you'll need to scan the whole table, and it will be very slow.
Solution:
SELECT count(*) FROM (SELECT 1 FROM table WHERE score > $1 LIMIT 30) t
You'll get precise ranking for the 30 best scores, and it will be fast.
Who cares about the losers?
OK, if you really do care about the losers, you'll need to make a histogram:
Suppose score can go from 0 to 100, and you have 1000000 losers with score < 80 and 10 winners with score > 80.
You make a histogram of how many rows have a score of X; it's a simple, small table with 100 rows. Add a trigger to your main table to keep the histogram updated.
Now if you want to rank a loser who has score X, his rank is sum(histo) over the rows where histo_score > X.
Since your score probably isn't between 0 and 100 but (say) between 0 and 1000000000, you'll need to fudge it a bit: enlarge your histogram bins, for instance, so you only need 100 bins max, or use some log-histogram distribution function.
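A minimal sketch of that histogram (hypothetical names; insert-only for brevity, and ON CONFLICT needs PostgreSQL 9.5+, so an older server would use an update-then-insert function instead):
CREATE TABLE score_histo (
    bin integer PRIMARY KEY,       -- score / 10000000 keeps roughly 100 bins
    cnt bigint NOT NULL DEFAULT 0
);

CREATE FUNCTION bump_histo() RETURNS trigger AS $$
BEGIN
    INSERT INTO score_histo (bin, cnt)
    VALUES (NEW.score / 10000000, 1)
    ON CONFLICT (bin) DO UPDATE SET cnt = score_histo.cnt + 1;
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER histo_ins AFTER INSERT ON scores
FOR EACH ROW EXECUTE PROCEDURE bump_histo();

-- approximate rank of a score X: rows in strictly higher bins
SELECT COALESCE(SUM(cnt), 0) AS rank
FROM score_histo
WHERE bin > 123456789 / 10000000;   -- X / bin width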
By the way, Postgres does this when you ANALYZE the table, so if you set the statistics target to 100 or 1000 on score, ANALYZE, and then run:
EXPLAIN SELECT * FROM table WHERE score > $1
you'll get a nice row-count estimate.
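The per-column knob is ALTER TABLE ... SET STATISTICS; a sketch, reusing the hypothetical scores table from above:
ALTER TABLE scores ALTER COLUMN score SET STATISTICS 1000;
ANALYZE scores;
EXPLAIN SELECT * FROM scores WHERE score > 123456789;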
Who needs exact answers?
What is the most efficient way to select the first and last element only, from a column in SQLite?
The first and last element from a row?
SELECT column1, columnN
FROM mytable;
I think you must mean the first and last element from a column:
SELECT MIN(column1) AS First,
MAX(column1) AS Last
FROM mytable;
See http://www.sqlite.org/lang_aggfunc.html for MIN() and MAX().
I'm using First and Last as column aliases.
if it's just one column:
SELECT min(column) as first, max(column) as last FROM table
if you want to select the whole row:
SELECT 'first', t.* FROM (SELECT * FROM table ORDER BY column ASC LIMIT 1) t
UNION
SELECT 'last', t.* FROM (SELECT * FROM table ORDER BY column DESC LIMIT 1) t
(SQLite doesn't accept ORDER BY/LIMIT on the individual arms of a UNION, hence the subqueries; note that ASC goes with 'first' and DESC with 'last'.)
The most efficient way would be to know what those fields were called and simply select them.
SELECT `first_field`, `last_field` FROM `table`;
Probably like this:
SELECT dbo.Table.FirstCol, dbo.Table.LastCol FROM Table
You get minor efficiency enhancements from specifying the table name and schema.
First: MIN() and MAX() on a text column give 'AAAA' and 'TTTT', results which are not the first and last entries in my test table. They are the minimum and maximum values, as mentioned.
I tried this (with .stats on) on my table which has over 94 million records:
select * from
  (select col1 from mitable limit 1)
union
select * from
  (select col1 from mitable limit 1 offset
    (select count(0) from mitable) - 1);
But it uses up a lot of virtual machine steps (281,624,718).
Then I tried this, which is much more straightforward (and works provided the table was not created as WITHOUT ROWID) [SQL keywords are in capitals]:
SELECT col1 FROM mitable
WHERE ROWID = (SELECT MIN(ROWID) FROM mitable)
OR ROWID = (SELECT MAX(ROWID) FROM mitable);
That ran with 55 virtual machine steps on the same table and produced the same answer.
The min()/max() approach is wrong. It is only correct if the values are ascending. I needed something like this for currency rates, which rise and fall randomly.
This is my solution:
select st.*
from stats_ticker st,
(
select min(rowid) as first, max(rowid) as last --here is magic part 1
from stats_ticker
-- next line is just a filter I need in my case.
-- if you want first/last of the whole table leave it out.
where timeutc between datetime('now', '-1 days') and datetime('now')
) firstlast
WHERE
st.rowid = firstlast.first --and these two rows do magic part 2
OR st.rowid = firstlast.last
ORDER BY st.rowid;
Magic part 1: the subselect results in a single row with the columns first and last containing rowids.
Magic part 2: it is then easy to filter on those two rowids.
This is the best solution I've come up with so far. Hope you like it.
We can do that with the help of the SQL aggregate functions MAX and MIN. These are the two aggregate functions that help you get the last and first elements from a data table:
SELECT MAX(column_name), MIN(column_name) FROM table_name;
MAX will give you the maximum value, meaning the last value, and MIN will give you the minimum value, meaning the first value, from the specific table.
Say I have a table with the hypothetical columns foo and bar; bar might have 50-60 distinct values in it. My goal here is to pick, say, up to 5 rows for, say, 6 unique bars. So if the 6 unique bars that get selected out of the 50-60 each happen to have at least 5 rows of data, we'll have 30 rows in total.
What you'd really want to do is:
SELECT *
FROM `sometable`
WHERE `bar` IN (
SELECT DISTINCT `bar`
FROM `sometable`
ORDER BY RAND()
LIMIT 6
)
Unfortunately, you're likely to get this:
ERROR 1235 (42000): This version of MySQL doesn't yet support 'LIMIT & IN/ALL/ANY/SOME subquery'
Possibly your version will be more cooperative. Otherwise, you'll probably need to do it as two queries.
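Before falling back to two queries, it's worth trying the common workaround of wrapping the subquery in a derived table one level deeper, since MySQL does allow LIMIT there (a sketch; behaviour varies by version):
SELECT *
FROM `sometable`
WHERE `bar` IN (
    SELECT `bar` FROM (
        SELECT DISTINCT `bar`
        FROM `sometable`
        ORDER BY RAND()
        LIMIT 6
    ) AS picked
)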
It's been a while since I've worked with MySQL (I've been working with MSSQL lately), but two things come to mind:
Some sort of self join
A Cursor
Self join might look something like:
SELECT t2.*
FROM (SELECT DISTINCT bar FROM table LIMIT 5) AS t1
JOIN table AS t2 ON t1.bar = t2.bar
Again, it's been a while, so this might not be valid MySQL. Also, you'd get all the foos back for the 5 bars, so you'd have to figure out how to trim that down.
I think the easiest way is to use a UNION.
(SELECT * FROM table WHERE bar = 'a' LIMIT 5) UNION (SELECT * FROM table WHERE bar = 'b' LIMIT 5) UNION SEL....... you get the gist, I hope (in MySQL, the parentheses are needed for a per-SELECT LIMIT inside a UNION)
EDIT: not sure if this is what you need - you don't say whether this query also needs to somehow determine the bars, or if they are passed in.
A simple solution that takes 7 queries:
SELECT distinct bar FROM sometable ORDER BY rand() LIMIT 6
Then, for each of the 6 bar values above, do this, substituting each value for {$bar}, of course:
SELECT foo,bar FROM sometable WHERE bar='{$bar}' ORDER BY rand() LIMIT 5
Be careful about using "ORDER BY rand()" because it might cause MySQL to fetch a LOT of rows from your table, and compute the rand() function for all of them, and then sort them. This can take a long time if you have a big table.
If it does take a long time, then for the first query, you can remove the ORDER BY and the LIMIT clauses, and select 6 random values in your program code after the query is done.
For the second query, you can split it in to two steps:
SELECT count(*) FROM sometable WHERE bar='{$bar}'
Then, in your program code, you know how many items there are so you can randomly choose which of them to look at, and use OFFSET and LIMIT:
SELECT foo,bar FROM sometable WHERE bar='{$bar}' LIMIT 1 OFFSET {$offset}
Is this getting called from some program?
If so, perhaps you can just look up the bars and randomly send them into a select statement.
This way your select could simply be select * from table where bar in (?,?), and you can move the randomness problem into code, which is frankly better at dealing with that.