Rewriting Tearadata SQL to BigQuery - google-bigquery

This is a bit abstract, but how could I rewrite the following TD SQL into a bigQuery SQL
SELECT UserID
,Created_DT
,MIN(CREATED_DT) OVER(PARTITION BY UserID
ORDER BY CREATED_DT ROWS BETWEEN 1 FOLLOWING AND 1 FOLLOWING) AS LEAD_CREATED_DT
FROM TABLE

Related

How do I select 1 [oldest] row per group of rows, given multiple groups?

Let's say we have the database table below, called USER_JOBS.
I'd like to write an SQL query that reflects this algorithm:
Divide the whole table in groups of rows defined by a common USER_ID (in the example table, the 2 resulting groups are colored yellow & green)
From each group, select the oldest row (according to SCHEDULE_TIME)
From this example table, the desired SQL query would return these 2 rows:
You can use ranking function (supported in most RDBS):
SELECT *
FROM
(
SELECT *
,ROW_NUMBER() OVER (PARTITION BY USER_ID ORDER BY SCHEDULE_TIME DESC) AS RowID
FROM [table]
)
WHERE RowID = 1
WITH Ranked AS (
SELECT
RANK() OVER (PARTITION BY User_ID ORDER BY ScheduleTime DESC) as Ranking,
*
FROM [table_name]
)
SELECT Status, Sob_Type, User_ID, TimeStamp FROM ranking WHERE Ranks = 1;

How to use Power Query to recreate this SQL OVER PARTITION?

I'm trying to recreate the following logic I create in SQL in Power Query. I'm tried doing it by grouping by the OrderNumber column, but I wasn't able to use a rank function or anything similar. I'm trying to keep the query folding enabled if I can.
SELECT * -- columns
FROM(
SELECT j.*,
rn = ROW_NUMBER() OVER(
PARTITION BY OrderNumber
ORDER BY LineNumber, DateReceived DESC, StatusName DESC
)
FROM [dbo].[Orders] j
) ord
WHERE
ord.rn = 1
Here's a sample of what I'm trying to achieve with this query. I want to rank each order by their LineNumber, in order to select only lowest LineNumber for each order, and in the case of a tie, I use the OrderStatus to determine a winner.

Select only 3 rows per user - SQL Query

These are the columns in my table
id (autogenerated)
created_user
created_date
post_text
This table has lot of values. I wanted to take latest 3 posts of every created_user
I am new to SQL and need help. I ran the below query in my Postgres database and it is not helpful
SELECT * FROM posts WHERE created_date IN
(SELECT MAX(created_date) FROM posts GROUP BY created_date)
You could use the row_number() window function to create an ordered row count per user. After that you can easily filter by this value
demo:db<>fiddle
SELECT
*
FROM (
SELECT
*,
row_number() OVER (PARTITION BY created_user ORDER BY created_date DESC)
FROM
posts
) s
WHERE row_number <= 3
PARTITION BY groups the users
ORDER BY date DESC orders the posts of each user into a descending order to get the most recent as row_count == 1, ...

Query Hive table using ROWNUM

How can I query a Hive table specific to row number.
For example :
Let say I want to print out all records of Hive table from row number 2 to 5.
I actually recently updated the documentation regarding the offset option
... order by ... limit 1,4
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Select#LanguageManualSelect-LIMITClause
This answer seems like what you're asking:
SQL most recent using row_number() over partition
In other words:
SELECT user_id, page_name, recent_click
FROM (
SELECT user_id,
page_name,
row_number() over (partition by session_id order by ts desc) as recent_click
from clicks_data
) T
WHERE recent_click between 2 and 5

Optimizing query with two MAX columns in the same table

I need to optimize below query
SELECT
Id, -- identity
CargoID,
[Status] AS CurrentStatus
FROM
dbo.CargoStatus
WHERE
id IN (SELECT TOP 1 ID
FROM dbo.CargoStatus CS
INNER JOIN STD.StatusMaster S ON CS.ShipStatusID = S.SatusID
WHERE CS.CargoID=CargoStatus.CargoID
ORDER BY YEAR([CS.DATE]) DESC, MONTH([CS.DATE]) DESC,
DAY([CS.DATE]) DESC, S.StatusStageNumber DESC)
There are two tables
CargoStatus, and
StatusMaster
Statusmaster has columns StatusID, StatusName, StatusStageNumber(int)
CargoStatus has columns ID, StatusID (FK StatusMaster StatusID column), Date
Is there any other better way of writing this query.
I want latest status for each cargo (only one entry per cargoID).
Since you seem to be using SQL Server 2005 or newer, you can use a CTE with the ROW_NUMBER() windowing function:
;WITH LatestCargo AS
(
SELECT
cs.Id, -- identity
cs.CargoID,
cs.[Status] AS CurrentStatus
ROW_NUMBER() OVER(PARTITION BY cs.CargoID
ORDER BY cs.[Date], s.StatusStageNumber DESC) AS 'RowNum'
FROM
dbo.CargoStatus cs
INNER JOIN
STD.StatusMaster s ON cs.ShipStatusID = s.[StatusID]
)
SELECT
Id, CargoID, [Status]
FROM
LatestCargo
WHERE
RowNum = 1
This CTE "partitions" your data by CargoID, and for each partition, the ROW_NUMBER function hands out sequential numbers, starting at 1 and ordered by Date DESC - so the latest row gets RowNum = 1 (for each CargoID) which is what I select from the CTE in the SELECT statement after it.