GBQ window functions and arithmetic operations - SQL

Does anyone know if it is possible to do any arithmetic operation on a result derived using GBQ window functions?
For example, can I increase row_number by 100 (some number) using pseudocode like this:
SELECT 100 + ROW_NUMBER() OVER (PARTITION BY X ORDER BY x_id DESC) increased_row_num
FROM Table1
...

You will need to use a subquery for that:
SELECT 100 + row_num AS increased_row_num
FROM (
  SELECT ROW_NUMBER() OVER (PARTITION BY X ORDER BY x_id DESC) AS row_num
  FROM Table1
)

but I had hoped that there was another solution
With BigQuery Standard SQL, the expected functionality now works as is:
#standardSQL
SELECT 100 + ROW_NUMBER() OVER (PARTITION BY X ORDER BY x_id DESC) increased_row_num
FROM Table1
See Enabling Standard SQL and Migrating from legacy SQL in the BigQuery documentation.
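In Standard SQL, any expression can wrap the window function result directly. As a quick sketch (reusing the hypothetical Table1, X, and x_id names from the question), the same row number can also be bucketed with MOD:
#standardSQL
SELECT
  100 + ROW_NUMBER() OVER (PARTITION BY X ORDER BY x_id DESC) AS increased_row_num,
  MOD(ROW_NUMBER() OVER (PARTITION BY X ORDER BY x_id DESC), 10) AS bucket_num
FROM Table1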

Related

Presto: Filtering by Row Number

Can anyone help me to translate Teradata SQL QUALIFY ROW_NUMBER() OVER into Presto:
SELECT *
FROM table1
QUALIFY ROW_NUMBER() OVER(ORDER BY id DESC) > 5000000
AND ROW_NUMBER() OVER(ORDER BY id DESC) <= 10000000;
Or provide some suggestions on how to extract large datasets by row filtering.
As far as I understand, there is no direct analog for the QUALIFY clause in PrestoSQL/Trino, and a window function cannot be used directly in the WHERE clause. Instead, compute the row number in a derived table and filter it in the outer query. Something like this:
SELECT *
FROM (
  SELECT *, ROW_NUMBER() OVER (ORDER BY id DESC) AS rn
  FROM table1
) t
WHERE rn BETWEEN 5000001 AND 10000000;
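The same filter can be written with a CTE if that reads more clearly (a sketch assuming the same table1 and id column from the question):
WITH numbered AS (
  SELECT *, ROW_NUMBER() OVER (ORDER BY id DESC) AS rn
  FROM table1
)
SELECT *
FROM numbered
WHERE rn BETWEEN 5000001 AND 10000000;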

SQL Server : using CTE row partition to serialize sequential timestamps

I think I just need a little help with this, but is there a way to incrementally count steps in SQL using some type of CTE row partition? I'm using SQL Server 2008, so I won't be able to use the LAG function.
Below, I am trying to find a way to calculate the Step Number where, for each unique ITEM in my table (in this case G43251), the process Step_Number is derived from the Date (timestamp) and the process type. Rows with the same timestamp and Process_Type should be labeled with the same Step_Number, as there are other fields that can cause the timestamp to repeat.
Right now I am playing around with the query below and seeing how I could fit in a DISTINCT timestamp approach, so that it doesn't count each row as something new.
WITH cte AS
(
SELECT
*,
ROW_NUMBER() OVER (ORDER BY Timestamp_Posted DESC)
- ROW_NUMBER() OVER (PARTITION BY Item ORDER BY Timestamp_Posted DESC) AS rn
FROM
#t1
)
SELECT
*,
ROW_NUMBER() OVER (PARTITION BY Item, rn ORDER BY Timestamp_Posted DESC) rn2
FROM
cte
ORDER BY
Timestamp_Posted DESC
Please use dense_rank() instead of row_number(); it assigns the same rank to rows that tie on the ORDER BY columns, so rows sharing a Timestamp_Posted and Process_Type get the same Step_Number:
SELECT *, dense_rank() OVER (PARTITION BY Item ORDER BY Timestamp_Posted, Process_Type) AS Step_Number
FROM #t1
ORDER BY Timestamp_Posted DESC
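A minimal sketch with made-up sample rows (the timestamps and process types below are hypothetical) showing how ties on Timestamp_Posted and Process_Type receive the same Step_Number:
CREATE TABLE #t1 (Item VARCHAR(10), Timestamp_Posted DATETIME, Process_Type VARCHAR(20));

INSERT INTO #t1 VALUES
('G43251', '2020-01-01T08:00:00', 'Cut'),
('G43251', '2020-01-01T08:00:00', 'Cut'),   -- same timestamp & process type -> same step
('G43251', '2020-01-01T09:30:00', 'Weld'),
('G43251', '2020-01-02T10:15:00', 'Paint');

SELECT *,
       DENSE_RANK() OVER (PARTITION BY Item ORDER BY Timestamp_Posted, Process_Type) AS Step_Number
FROM #t1
ORDER BY Timestamp_Posted DESC;
-- Both 08:00 'Cut' rows get Step_Number = 1, 'Weld' gets 2, 'Paint' gets 3.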

How to get the top N percent (e.g., 50%) of a table in BigQuery (standard SQL)?

I have tried the following approaches, none of which worked:
Using SELECT TOP 50 PERCENT: BigQuery does not have a TOP function.
Using LIMIT (SELECT COUNT(*) FROM tabl)/2: BigQuery does not accept a non-integer value there.
Using SET to set the median value and then using WHERE.
In BigQuery I would use window function percent_rank().
select t.* except (prnk)
from (select t.*, percent_rank() over(order by id) prnk from mytable t) t
where prnk <= 0.5
Note: any answer to your question will require that you provide a column to order your data. I assumed that this column is called id.
One method uses window functions:
select t.* except (seqnum, cnt)
from (select t.*,
             row_number() over (order by ?) as seqnum,
             count(*) over () as cnt
      from t
     ) t
where seqnum <= cnt / 2;
Another possibility would be to limit the data with a WHERE clause instead of LIMIT. Here is an example if you want to filter by an ID:
SELECT * FROM table_name as t
WHERE t.id <= (SELECT COUNT(*) FROM table_name)/2;
And if you want to filter by the row number:
SELECT t.* EXCEPT (rn)
FROM (
  SELECT t.*, ROW_NUMBER() OVER () AS rn
  FROM table_name AS t
) AS t
WHERE t.rn <= (SELECT COUNT(*) FROM table_name)/2;
To scale up, you can use an approx algorithm to find the 50% point:
DECLARE mid_date TIMESTAMP DEFAULT (
SELECT APPROX_QUANTILES(creation_date, 2)[OFFSET(1)] mid_date
FROM `fh-bigquery.stackoverflow_archive.201909_posts_answers` )
;
SELECT mid_date
, COUNTIF(creation_date > mid_date) first_half
, COUNTIF(creation_date < mid_date) second_half
FROM `fh-bigquery.stackoverflow_archive.201909_posts_answers`
Looks like it works well.
Now let's get these records out:
CREATE TABLE `temp.fifty_percent`
AS
SELECT *
FROM `fh-bigquery.stackoverflow_archive.201909_posts_answers`
WHERE creation_date < (
SELECT APPROX_QUANTILES(creation_date, 2)[OFFSET(1)] mid_date
FROM `fh-bigquery.stackoverflow_archive.201909_posts_answers`
)
This method will happily scale, while solutions using OVER(ORDER BY) won't.
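As a quick sanity check (a sketch reusing the same tables), you can compare the row count of the materialized half against the full archive:
SELECT
  (SELECT COUNT(*) FROM `temp.fifty_percent`) AS half_rows,
  (SELECT COUNT(*) FROM `fh-bigquery.stackoverflow_archive.201909_posts_answers`) AS total_rows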

How to Pass Query Answer into Limit Function Impala

I am attempting to sample 20% of a table in Impala. I have heard somewhere that the built-in Impala sampling function has issues.
Is there a way to pass a subquery to the Impala LIMIT clause to sample n percent of the entire table?
I have something like this:
select *
from table_a
order by rand()
limit (
  select round(count(distinct ids) * .2, 0)
  from table_a
)
The subquery gives me 20% of all records.
I'm not sure whether Impala has specific sampling logic (some databases do), but you can use window functions:
select a.*
from (select a.*,
             row_number() over (order by rand()) as seqnum,
             count(*) over () as cnt
      from table_a a
     ) a
where seqnum <= cnt * 0.2;

Get rows from a table using row number in SQL Server

I want to get rows 100-150 from my table in SQL Server 2008. How can I do that? Is there any way to do so? From what I have searched, the LIMIT keyword is available in MySQL, but for SQL Server the usual suggestion is a common table expression technique, which I don't want to use. Is there any other way available, as there is in MySQL?
select * from
(select row_number() over (order by #column) as row,* from Table) as t
where row between 100 and 150
#column is to be replaced by a column from your table which will be used to order the result.
use sql limit
http://php.about.com/od/mysqlcommands/g/Limit_sql.htm
In SQL Server 2005 and above there is a ROW_NUMBER() function. If you need something that works for both MySQL and SQL Server, though, I don't know whether this is available in MySQL, as I've never used it.
http://msdn.microsoft.com/en-us/library/ms186734.aspx
The example given in the linked page that seems most relevant is the following, where the results of a query are ordered by date, and then rows 50 to 60 from that result set are returned.
USE AdventureWorks2012;
GO
WITH OrderedOrders AS
(
SELECT SalesOrderID, OrderDate,
ROW_NUMBER() OVER (ORDER BY OrderDate) AS RowNumber
FROM Sales.SalesOrderHeader
)
SELECT SalesOrderID, OrderDate, RowNumber
FROM OrderedOrders
WHERE RowNumber BETWEEN 50 AND 60;
Actually, the least expensive way to do this is to use TOP, and then ROW_NUMBER():
select *
from (select *, row_number() over (order by (select NULL)) as rownum
      from (select top 150 t.*
            from t
           ) t
     ) t
where rownum >= 100
However, one caution: there is no such thing as rows 100-150 in a relational table, because the rows are inherently unordered. You need to specify the ordering, and for this you need ORDER BY:
select *
from (select *, row_number() over (order by <field>) as rownum
      from (select top 150 t.*
            from t
            order by <field>
           ) t
     ) t
where rownum >= 100