Select rows by index in Amazon Athena - sql

This is a very simple question but I can't seem to find documentation on it. How would one query rows by index (i.e. select the 10th through 20th row in a table)?
I know there's a row_number function but it doesn't seem to do what I want.

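Do not specify any partition, so that the row number is an integer between 1 and the number of records. Note that without an ORDER BY in the window, the order in which rows are numbered is not guaranteed.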
SELECT row_num FROM (
SELECT row_number() over () as row_num
FROM your_table
)
WHERE row_num between 100000 and 100010

I seem to have found a roundabout and clunky way of doing this in Athena, so any better answers are welcome. This approach requires you have some numeric column in your table already, in this case named some_numeric_column:
SELECT some_numeric_column, row_num FROM (
SELECT some_numeric_column,
row_number() over (order by some_numeric_column) as row_num
FROM your_table
)
WHERE row_num between 100000 and 100010
To explain, you first select some numeric column in your data, then create a column (called row_num) of row numbers which is based on the order of your selected numeric column. Then you wrap that all in a select call because Athena doesn't support creating and then conditioning on the row_num column within a single call. If you don't wrap it in a second SELECT call Athena will spit out some errors about not finding a column named row_num.
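The wrap-in-a-subquery pattern can be tried locally. What follows is only a minimal sketch: it uses Python's standard sqlite3 module instead of Athena, assumes SQLite 3.25 or newer (for window functions), and fills your_table with made-up values. The helper name rows_by_position is hypothetical.

```python
import sqlite3

def rows_by_position(conn, first, last):
    """Return (some_numeric_column, row_num) for positions first..last."""
    query = """
        SELECT some_numeric_column, row_num FROM (
            SELECT some_numeric_column,
                   row_number() OVER (ORDER BY some_numeric_column) AS row_num
            FROM your_table
        )
        WHERE row_num BETWEEN ? AND ?
        ORDER BY row_num
    """
    return conn.execute(query, (first, last)).fetchall()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE your_table (some_numeric_column INTEGER)")
# Hypothetical data: 30 values (300, 290, ..., 10) inserted in descending order
conn.executemany("INSERT INTO your_table VALUES (?)",
                 [(v,) for v in range(300, 0, -10)])

selected = rows_by_position(conn, 10, 20)
print(selected)
```

The inner query only assigns the numbers; the outer query is the first place where row_num exists as a column that WHERE can see.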

Related

SQL Equivalent to LAG() to create a computed table

I'm fairly new to SQL and I'm trying to create a computed column in a table that calculates the DateDiff on a column between the current row and the previous row.
Now all is fine and dandy doing a query with Select to display this value:
SELECT *,
Case When INCM<> lag(INCM) over(ORDER BY INCM ASC, Submit_Date ASC) Then 0 else DateDiff(mi,Submit_Date, lag(Submit_Date) over (ORDER BY INCM ASC, Submit_Date ASC)) End As Diff
FROM [OP].[Ticket_Work_Info]
But as I recently found out, when adding a computed column with this logic to the existing table I get an error saying Windowed Functions can only be used with Select or Order By.
I've been searching everywhere for an equivalent to LAG to use to create a computed column on a table.
In the meantime I ended up creating a view with this piece of code, but that's not really what I want to do going forward.
Can someone give me a hand?
Here is some reasoning on why computed columns can only refer to values in the current row (and deterministic functions and constants). Consider a definition such as:
create table t (
t_id int,
a varchar(255),
x int,
prev_x as (lag(x) over (order by t_id))
);
And some sample data:
t_id a x
1 z 6
2 abc 28
3 z 496
This looks fine. But, consider this query:
select t.*
from t
where a <> 'abc';
Should the value of prev_x for the third row be 28 or 6?
I guess no one wanted to make a decision on this. Instead, the idea is that a row is well-defined, so the filtering conditions do not affect the values within a row.
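The ambiguity is easy to reproduce with an ordinary query. The sketch below assumes Python's standard sqlite3 module and SQLite 3.25 or newer (for lag), not SQL Server, and uses the three sample rows from above. It computes the lagged value once with the filter applied before the window function and once with the filter applied afterwards.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (t_id INTEGER, a TEXT, x INTEGER)")
conn.executemany("INSERT INTO t VALUES (?, ?, ?)",
                 [(1, "z", 6), (2, "abc", 28), (3, "z", 496)])

# lag() sees only the filtered rows: row 2 is removed before the window runs
filtered_first = conn.execute("""
    SELECT t_id, lag(x) OVER (ORDER BY t_id) AS prev_x
    FROM t
    WHERE a <> 'abc'
    ORDER BY t_id
""").fetchall()

# lag() sees the whole table; the filter is applied to the result afterwards
filtered_last = conn.execute("""
    SELECT t_id, prev_x FROM (
        SELECT t_id, a, lag(x) OVER (ORDER BY t_id) AS prev_x
        FROM t
    )
    WHERE a <> 'abc'
    ORDER BY t_id
""").fetchall()

print(filtered_first)
print(filtered_last)
```

The third row gets 6 in the first query and 28 in the second, which is exactly the decision a computed column would have to make once and for all.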

Oracle select specific rows

Can we select a specific range of rows in Oracle? For example, I have a table of 100 rows and need to select only rows 10 to 20. Is it possible to do that?
You can do it with an auxiliary step. First number the rows with the row_number() function, then filter on that number:
select * from
(
select row_number() over (order by 0) rn, t.*
from tab t
)
where rn between 10 and 20;
But this is not a stable operation, since SQL tables are unordered sets and ordering by a constant does not fix the row order. Therefore it's better to define a unique identity column and order by it.
Replace the zero in the order by clause with one or more columns of your table to get a deterministic ordering. If a primary key column exists, it is usually enough to put only that column in the order by list.
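Would LIMIT and OFFSET work? For example:
SELECT * FROM table
LIMIT 20
OFFSET 20
will read rows 21 through 40. Is this what you are trying to do? Note that Oracle does not accept LIMIT; from 12c onwards the equivalent is OFFSET 20 ROWS FETCH NEXT 20 ROWS ONLY.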
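The arithmetic behind LIMIT and OFFSET is easy to get off by one: OFFSET n skips n rows, so the first row returned is row n + 1. Here is a small sketch using Python's standard sqlite3 module (SQLite accepts LIMIT/OFFSET; Oracle does not). The helper limit_offset_for_range and the 100-row table are assumptions made for illustration.

```python
import sqlite3

def limit_offset_for_range(first_row, last_row):
    """Translate a 1-based inclusive row range into (LIMIT, OFFSET)."""
    return last_row - first_row + 1, first_row - 1

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tab (id INTEGER PRIMARY KEY)")
# Hypothetical table of 100 rows with ids 1..100
conn.executemany("INSERT INTO tab VALUES (?)", [(i,) for i in range(1, 101)])

limit, offset = limit_offset_for_range(10, 20)
# An explicit ORDER BY makes "row 10" a well-defined position
ids = [row[0] for row in conn.execute(
    "SELECT id FROM tab ORDER BY id LIMIT ? OFFSET ?", (limit, offset))]
print(limit, offset, ids)
```

Rows 10 through 20 therefore need LIMIT 11 OFFSET 9, and the ORDER BY is what keeps the result repeatable.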

Computed column formula ( yyMMdd## )

I need a computed column formula that gives me this yyMMdd##.
I have an identity column (DataID) and a date column (DataDate).
This is what I have so far.
(((right(CONVERT([varchar](4),datepart(year,[DataDate]),0),(2))+
right(CONVERT([varchar](4),datepart(month,[DataDate]),0),(2)))+
right(CONVERT([varchar](4),datepart(day,[DataDate]),0),(2)))+
right('00'+CONVERT([varchar](2),[DataID],0),(2)))
And this gives me:
12111201
12111202
12111303
12111304
12111405
12111406
12111407
12111508
What I want is:
12111201
12111202
12111301
12111302
12111401
12111402
12111403
12111501
I'm assuming you want to have a sequence starting at 1 for each date - right? If not: please explain what you really want / need.
You won't be able to do this with an IDENTITY column and a computed column specification. An IDENTITY column returns constantly increasing numbers.
What you could do is not store those values on disk, but instead use a CTE and the ROW_NUMBER() OVER (PARTITION BY ...) construct to create those numbers on the fly whenever you need to select them. Or have a job that sets those values based on such a CTE on a regular basis (e.g. once every hour or so).
That CTE might look something like this - again, assuming that DataDate is indeed of type DATE (and not DATETIME or something like that) :
;WITH CTE AS
(
SELECT
DataID, DataDate,
RowNum = ROW_NUMBER() OVER (PARTITION BY DataDate ORDER BY DataID)
FROM
dbo.YourTable
)
SELECT
DataID, DataDate, RowNum
FROM
CTE
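To see the per-date numbering produce the yyMMdd## values asked for, here is a rough sketch using Python's standard sqlite3 module rather than SQL Server (assuming SQLite 3.25 or newer). The dates are invented to match the question's sample output, and the string formatting with substr, replace and printf is SQLite-specific, so it would need translating to CONVERT/RIGHT in T-SQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE YourTable (DataID INTEGER PRIMARY KEY, DataDate TEXT)")
# Hypothetical data: 2, 2, 3 and 1 rows on four consecutive days
dates = (["2012-11-12"] * 2 + ["2012-11-13"] * 2 +
         ["2012-11-14"] * 3 + ["2012-11-15"])
conn.executemany("INSERT INTO YourTable (DataDate) VALUES (?)",
                 [(d,) for d in dates])

codes = [row[0] for row in conn.execute("""
    WITH CTE AS (
        SELECT DataID, DataDate,
               ROW_NUMBER() OVER (PARTITION BY DataDate ORDER BY DataID) AS RowNum
        FROM YourTable
    )
    -- '2012-11-12' -> '20121112' -> '121112', then a zero-padded RowNum
    SELECT substr(replace(DataDate, '-', ''), 3) || printf('%02d', RowNum)
    FROM CTE
    ORDER BY DataID
""")]
print(codes)
```

Because the counter restarts for every DataDate partition, the last two digits begin at 01 on each new day instead of following DataID.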

sql resultset records numbering?

I want to number the rows in my result set. Given a result set of 3 records, I want some SQL keyword that generates a column reading 1, 2, 3 for each of the records...
I know I can make a temp table with an auto-increment column, but I wanted to know if there is a way to get this back from a SQL query.
SELECT row_count,
project_name,
project_id
FROM Project
Is there anything available like the "row_count" I am dreaming of?
Ordering by (SELECT 0) gives you an incrementing column and avoids a potentially unnecessary sort, though the order in which rows are numbered is then not guaranteed.
SELECT ROW_NUMBER() over (order by (select 0)) as row_count,
project_name,
project_id
FROM Project
Sounds like you want ROW_NUMBER, an analytic/ranking function. But it's unclear what you want to base the numbering on - this assumes the project_id, starting at the smallest value:
SELECT ROW_NUMBER() OVER(ORDER BY p.project_id) AS row_count,
p.project_name,
p.project_id
FROM PROJECT p

Most efficient way to select 1st and last element, SQLite?

What is the most efficient way to select the first and last element only, from a column in SQLite?
The first and last element from a row?
SELECT column1, columnN
FROM mytable;
I think you must mean the first and last element from a column:
SELECT MIN(column1) AS First,
MAX(column1) AS Last
FROM mytable;
See http://www.sqlite.org/lang_aggfunc.html for MIN() and MAX().
I'm using First and Last as column aliases.
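If it's just one column:
SELECT min(column) as first, max(column) as last FROM table
If you want to select the whole row (each ORDER BY ... LIMIT has to sit in its own subquery, because SQLite does not allow ORDER BY before UNION):
SELECT * FROM (SELECT 'first', * FROM table ORDER BY column ASC LIMIT 1)
UNION ALL
SELECT * FROM (SELECT 'last', * FROM table ORDER BY column DESC LIMIT 1)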
The most efficient way would be to know what those fields were called and simply select them.
SELECT `first_field`, `last_field` FROM `table`;
Probably like this:
SELECT dbo.Table.FirstCol, dbo.Table.LastCol FROM Table
You get minor efficiency enhancements from specifying the table name and schema.
First: MIN() and MAX() on a text column give AAAA and TTTT, which are not the first and last entries in my test table. They are the minimum and maximum values, as mentioned.
I tried this (with .stats on) on my table which has over 94 million records:
select * from
(select col1 from mitable limit 1)
union
select * from
(select col1 from mitable limit 1 offset
(select count(0) from mitable) -1);
But it uses up a lot of virtual machine steps (281,624,718).
Then I tried this, which is much more straightforward (it works as long as the table was not created as a WITHOUT ROWID table) [SQL keywords are in capitals]:
SELECT col1 FROM mitable
WHERE ROWID = (SELECT MIN(ROWID) FROM mitable)
OR ROWID = (SELECT MAX(ROWID) FROM mitable);
That ran with 55 virtual machine steps on the same table and produced the same answer.
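The difference between "first and last row" and "minimum and maximum value" shows up on a tiny table. This is a minimal sketch with Python's standard sqlite3 module. The four values are invented, and it assumes an ordinary rowid table with no deletes or explicit rowids, so that rowid order matches insertion order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mitable (col1 TEXT)")
# Hypothetical values, deliberately not inserted in alphabetical order
conn.executemany("INSERT INTO mitable VALUES (?)",
                 [("MMMM",), ("AAAA",), ("TTTT",), ("CCCC",)])

# First and last rows by rowid (ORDER BY added so the output order is fixed)
first_last = [row[0] for row in conn.execute("""
    SELECT col1 FROM mitable
    WHERE ROWID = (SELECT MIN(ROWID) FROM mitable)
       OR ROWID = (SELECT MAX(ROWID) FROM mitable)
    ORDER BY ROWID
""")]

# Smallest and largest values, regardless of where they sit in the table
min_max = conn.execute("SELECT MIN(col1), MAX(col1) FROM mitable").fetchone()

print(first_last)
print(min_max)
```

The rowid query returns the first and last inserted rows, while MIN()/MAX() return whichever values sort lowest and highest.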
The min()/max() approach is wrong. It is only correct if the values are strictly ascending. I needed something like this for currency rates, which rise and fall randomly.
This is my solution:
select st.*
from stats_ticker st,
(
select min(rowid) as first, max(rowid) as last --here is magic part 1
from stats_ticker
-- next line is just a filter I need in my case.
-- if you want first/last of the whole table leave it out.
where timeutc between datetime('now', '-1 days') and datetime('now')
) firstlast
WHERE
st.rowid = firstlast.first --and these two rows do magic part 2
OR st.rowid = firstlast.last
ORDER BY st.rowid;
Magic part 1: the subselect results in a single row whose columns first and last contain rowids.
Magic part 2: it is then easy to filter on those two rowids.
This is the best solution I've come up with so far. Hope you like it.
We can do that with the help of SQL aggregate functions such as MAX and MIN. These two aggregate functions can give you the last and first elements of a column, provided the values are stored in ascending order.
SELECT MAX(column_name), MIN(column_name) FROM table_name
MAX gives you the largest value and MIN the smallest value in the column; these are the last and first values only when the column is ascending.