BigQuery: row order from a PARTITION BY query is not preserved when creating a table

I'm trying to run
SELECT t.*,
ROW_NUMBER() OVER (PARTITION BY t.Barcode, t.Country_Code ) AS seqnum_c
FROM t
in BigQuery, which shows the appropriate result. The problem is that when I create a table from the same query, the rows come out jumbled and the order is not preserved.
CREATE OR REPLACE TABLE `test_2` AS
SELECT t.*,
ROW_NUMBER() OVER (PARTITION BY t.Barcode, t.Country_Code ) AS seqnum_c
FROM t
In addition, I tried:
CREATE OR REPLACE TABLE `test_2` AS
SELECT t.*,
ROW_NUMBER() OVER (PARTITION BY t.Barcode, t.Country_Code ORDER BY t.Barcode, t.Country_Code) AS seqnum_c
FROM t
And got the same result.
Have you ever faced the same issue?

Thanks @ken for your response. I think I found my answer:
CREATE OR REPLACE TABLE t
AS (
SELECT t.*,
ROW_NUMBER() over (partition by t.Barcode, t.Country_Code order by Barcode, Country_Code ) as seqnum_c
FROM t
ORDER BY Barcode, Country_Code, seqnum_c);
Best

You need to specify how you want the rows within the partition to be ordered in order for it to be deterministic.
It looks like you attempted to do this in your second example, but you did ORDER BY t.Barcode, t.Country_Code which are exactly your partition columns. That means that within each partition, each row will already have exactly the same barcode and country_code so effectively, there is no ordering happening.
For example, given the following rows
Barcode  Country_Code  Timestamp
111      USA           12345
111      USA           12346
111      JP            12350
You are partitioning by Barcode and Country_code so the first two rows will be a part of the same partition. However, since you don't specify an order, you cannot know which row will get which row number. In the example above, it would make sense to ORDER BY Timestamp, but without knowing your data or your goals it's hard to say what the right logic is for you.
In short, you need to specify an ORDER BY column that is not a part of the PARTITION BY columns in order to deterministically order the rows within each partition.
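As a minimal sketch of the point above, the three sample rows can be loaded into an in-memory SQLite database and numbered with ORDER BY Timestamp inside the window. SQLite standing in for BigQuery is an assumption made only so the example runs locally (window functions need SQLite 3.25 or newer), and the final ORDER BY on the outer query is what fixes the order of the returned rows.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for BigQuery, purely for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (Barcode INTEGER, Country_Code TEXT, Timestamp INTEGER)")
# Rows are inserted out of order on purpose.
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [(111, "USA", 12346), (111, "JP", 12350), (111, "USA", 12345)],
)

rows = conn.execute(
    """
    SELECT Barcode, Country_Code, Timestamp,
           ROW_NUMBER() OVER (
               PARTITION BY Barcode, Country_Code
               ORDER BY Timestamp           -- column outside the partition key
           ) AS seqnum_c
    FROM t
    ORDER BY Barcode, Country_Code, seqnum_c  -- explicit order for the result
    """
).fetchall()

for row in rows:
    print(row)
# (111, 'JP', 12350, 1)
# (111, 'USA', 12345, 1)
# (111, 'USA', 12346, 2)
```

Within the (111, USA) partition the earlier Timestamp always gets seqnum_c = 1, regardless of insertion order.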

Related

How to use RANK OVER PARTITION BY to create rankings based on two columns?

I have duplicate records caused by data inconsistency. I am trying to take only one record for each patient (taking the latest record), who each have dozens of duplicate records due to address changes.
When I run the code below, each record in my table seems to be assigned a rank of 1. How can I assign rankings specific to each Patient ID?
SELECT DISTINCT
PATIENT_ID
,ADDRESS_START_DATE
,ADDRESS_END_DATE
,RANK() OVER (PARTITION BY PATIENT_ID ,ADDRESS_START_DATE ORDER BY ADDRESS_START_DATE DESC) AS Ind
FROM Member_Table
;
You shouldn't partition by the address_start_date if you're ordering by it:
SELECT DISTINCT
PATIENT_ID
,ADDRESS_START_DATE
,ADDRESS_END_DATE
,RANK() OVER (PARTITION BY PATIENT_ID ORDER BY ADDRESS_START_DATE DESC) AS Ind
FROM Member_Table
;
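A small sketch of how the corrected ranking could then be used to keep only the latest record per patient, as the question asks. The sample rows are hypothetical, the outer filter on Ind = 1 is an addition that is not part of the answer above, and SQLite (3.25 or newer) is assumed as a stand-in engine.

```python
import sqlite3

# Assumption: SQLite stands in for the asker's database; the data is made up.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Member_Table ("
    "PATIENT_ID TEXT, ADDRESS_START_DATE TEXT, ADDRESS_END_DATE TEXT)"
)
conn.executemany(
    "INSERT INTO Member_Table VALUES (?, ?, ?)",
    [
        ("P1", "2019-01-01", "2020-01-01"),
        ("P1", "2020-01-02", None),
        ("P2", "2018-05-01", "2018-12-31"),
        ("P2", "2021-03-15", None),
        ("P2", "2019-01-01", "2021-03-14"),
    ],
)

latest = conn.execute(
    """
    SELECT PATIENT_ID, ADDRESS_START_DATE, ADDRESS_END_DATE
    FROM (
        SELECT PATIENT_ID, ADDRESS_START_DATE, ADDRESS_END_DATE,
               RANK() OVER (
                   PARTITION BY PATIENT_ID            -- one ranking per patient
                   ORDER BY ADDRESS_START_DATE DESC   -- newest address first
               ) AS Ind
        FROM Member_Table
    ) AS ranked
    WHERE Ind = 1
    ORDER BY PATIENT_ID
    """
).fetchall()

for row in latest:
    print(row)
# ('P1', '2020-01-02', None)
# ('P2', '2021-03-15', None)
```

If two records for the same patient share the same start date, RANK() gives both a rank of 1, so a tie-breaking column would be needed to keep exactly one row.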

MSSQL: Why won't ROW_NUMBER give me expected results?

I have a table with a datetime field ("time") and an int field ("index")
Please see the query and the picture below. I want ROW_NUMBER to count from 1 when the index changes, also if the index value exists in previous rows. The red text indicates the output that I want to get from the query. How can I modify the query to give me the expected results?
The query:
select rv.[time], rv.[index], ROW_NUMBER() OVER(PARTITION BY rv.[index] ORDER BY rv.[time], rv.[index] ASC) AS Row#
from
tbl
This is a gaps-and-islands problem. You need to identify groups of adjacent rows. In this case, I think the simplest method is the difference of row numbers:
select rv.*,
row_number() over (partition by index, (seqnum - seqnum_2) order by time) as row_num
from (select t.*,
row_number() over (order by time) as seqnum,
row_number() over (partition by index order by time) as seqnum_2
from tbl t
) rv;
Why this works is a little tricky to explain. If you look at the results of the subquery, you will see how the difference between the two row number values identifies adjacent values that are the same.
Also, you should not use names like time and index for columns, because these are keywords in SQL. I have not escaped the names in the above query. I encourage you to give your columns and tables names that do not need to be escaped.
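To make the difference-of-row-numbers idea more concrete, here is a small sketch that prints the intermediate values. The column names event_time and idx are hypothetical replacements for time and index, the six sample rows are invented, and SQLite (3.25 or newer) is assumed in place of SQL Server.

```python
import sqlite3

# Assumption: SQLite stands in for SQL Server; names and data are illustrative.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl (event_time INTEGER, idx INTEGER)")
conn.executemany(
    "INSERT INTO tbl VALUES (?, ?)",
    [(1, 1), (2, 1), (3, 2), (4, 2), (5, 1), (6, 1)],
)

rows = conn.execute(
    """
    SELECT event_time, idx, seqnum, seqnum_2,
           seqnum - seqnum_2 AS grp,   -- constant within a run of equal idx
           ROW_NUMBER() OVER (
               PARTITION BY idx, seqnum - seqnum_2
               ORDER BY event_time
           ) AS row_num
    FROM (
        SELECT t.*,
               ROW_NUMBER() OVER (ORDER BY event_time) AS seqnum,
               ROW_NUMBER() OVER (PARTITION BY idx ORDER BY event_time) AS seqnum_2
        FROM tbl t
    ) rv
    ORDER BY event_time
    """
).fetchall()

for row in rows:
    print(row)
# (1, 1, 1, 1, 0, 1)
# (2, 1, 2, 2, 0, 2)
# (3, 2, 3, 1, 2, 1)
# (4, 2, 4, 2, 2, 2)
# (5, 1, 5, 3, 2, 1)
# (6, 1, 6, 4, 2, 2)
```

The second run of idx = 1 gets grp = 2 instead of 0 because two rows with a different idx sit between the runs, so (idx, grp) separates the two islands and the numbering restarts.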

Skip row count based on the value of one column using ROW_NUMBER

I am using ROW_NUMBER() to organize the number of times a certain Code is used for each VisitID. Below is a modified piece of the query for an example.
SELECT
ROW_NUMBER() OVER (PARTITION BY VisitID ORDER BY EventActualDateTime) AS 'RowNum'
,VisitID
,EventActualDateTime
,Code
,LocationID
FROM
AdmVisitEvents
WHERE
VisitID = '6012227281'
and Code IN ('ENADMIN','TFRADMIN')
I am trying to figure out a way to eliminate a ROW if the LocationID is the same as the previous row.
So my result set should look like:
This could occur earlier in the row count too. For instance if the first TFRADMIN Code had the same LocationID as the ENADMIN* Code I would need to skip that row as well. (*The Codes 'ENADMIN' or 'OBSVTOIN' will always be ROW 1, and ROW 2 on will always be a 'TFRADMIN' Code).
So another example would be:
If this was my result it should only show:
This is untested in the absence of usable sample data (an image isn't usable, as the only way that the volunteers can use it is by transcribing it), however, LAG should help you achieve this:
WITH CTE AS
(SELECT VisitID,
EventActualDateTime,
Code,
LocationID,
LAG(LocationID) OVER (PARTITION BY VisitID ORDER BY EventActualDateTime) AS PreviousLocationID
FROM AdmVisitEvents
WHERE VisitID = '6012227281'
AND Code IN ('ENADMIN', 'TFRADMIN'))
SELECT ROW_NUMBER() OVER (PARTITION BY VisitID ORDER BY EventActualDateTime) AS RowNum,
VisitID,
EventActualDateTime,
Code,
LocationID
FROM CTE
WHERE LocationID != PreviousLocationID OR PreviousLocationID IS NULL;
Note that the PARTITION BY clauses aren't really needed in these queries, due to your WHERE (VisitID = '6012227281'). As VisitID can only have one scalar value, the PARTITION BY will never generate values for another "set".
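Since the query above is described as untested, here is a quick sketch that runs it against hypothetical rows. The timestamps and LocationID values are invented, and SQLite (3.25 or newer) is assumed as a stand-in for SQL Server, so this only checks the LAG logic rather than the asker's real data.

```python
import sqlite3

# Assumption: SQLite stands in for SQL Server; the event rows are made up.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE AdmVisitEvents ("
    "VisitID TEXT, EventActualDateTime TEXT, Code TEXT, LocationID TEXT)"
)
conn.executemany(
    "INSERT INTO AdmVisitEvents VALUES ('6012227281', ?, ?, ?)",
    [
        ("2020-01-01 08:00", "ENADMIN", "ER"),
        ("2020-01-01 10:00", "TFRADMIN", "ER"),    # same location as previous row
        ("2020-01-02 09:00", "TFRADMIN", "ICU"),
        ("2020-01-03 09:00", "TFRADMIN", "ICU"),   # same location as previous row
        ("2020-01-04 09:00", "TFRADMIN", "WARD"),
    ],
)

rows = conn.execute(
    """
    WITH CTE AS (
        SELECT VisitID, EventActualDateTime, Code, LocationID,
               LAG(LocationID) OVER (
                   PARTITION BY VisitID ORDER BY EventActualDateTime
               ) AS PreviousLocationID
        FROM AdmVisitEvents
        WHERE VisitID = '6012227281'
          AND Code IN ('ENADMIN', 'TFRADMIN')
    )
    SELECT ROW_NUMBER() OVER (
               PARTITION BY VisitID ORDER BY EventActualDateTime
           ) AS RowNum,
           Code, LocationID
    FROM CTE
    WHERE LocationID != PreviousLocationID OR PreviousLocationID IS NULL
    ORDER BY EventActualDateTime
    """
).fetchall()

for row in rows:
    print(row)
# (1, 'ENADMIN', 'ER')
# (2, 'TFRADMIN', 'ICU')
# (3, 'TFRADMIN', 'WARD')
```

The rows whose LocationID repeats the previous row are dropped before ROW_NUMBER() is evaluated, so the numbering stays consecutive.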
You can use ROW_NUMBER() in the ORDER BY of a TOP (1) WITH TIES query:
SELECT TOP (1) WITH TIES VisitID, EventActualDateTime, Code, LocationID
FROM AdmVisitEvents
WHERE VisitID = '6012227281' AND
Code IN ('ENADMIN','TFRADMIN')
ORDER BY ROW_NUMBER() OVER (PARTITION BY VisitID, Code, LocationID ORDER BY EventActualDateTime);

SQL - New ID for a partition

I want to have an ID which is the same for all entries within a partition.
Think of a table with columns like this (the existing table has the three columns on the left).
How can I generate the new ID on the right?
I thought about ROW_NUMBER() OVER (PARTITION BY ...), but I could not find a good way to do it.
SELECT *, RANK() OVER (ORDER BY name, attr) as new_id
FROM YourTable
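A short sketch of what this produces, run on hypothetical rows with the name and attr columns from the query above (SQLite 3.25 or newer is assumed as the engine). It also shows DENSE_RANK() next to RANK() as an alternative, which is not part of the answer: both give every row in a group the same ID, but RANK() leaves gaps after groups with more than one row.

```python
import sqlite3

# Assumption: SQLite as a stand-in engine; the rows are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE YourTable (name TEXT, attr TEXT)")
conn.executemany(
    "INSERT INTO YourTable VALUES (?, ?)",
    [("a", "x"), ("a", "x"), ("a", "y"), ("b", "x"), ("b", "x")],
)

rows = conn.execute(
    """
    SELECT name, attr,
           RANK()       OVER (ORDER BY name, attr) AS new_id,        -- gaps possible
           DENSE_RANK() OVER (ORDER BY name, attr) AS new_id_dense   -- consecutive
    FROM YourTable
    ORDER BY name, attr
    """
).fetchall()

for row in rows:
    print(row)
# ('a', 'x', 1, 1)
# ('a', 'x', 1, 1)
# ('a', 'y', 3, 2)
# ('b', 'x', 4, 3)
# ('b', 'x', 4, 3)
```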

Sequence within a partition in SQL server

I have been looking around for 2 days and have not been able to figure this one out. Using the dataset below and SQL Server 2016, I would like to number the rows within each 'id' and 'cat', ordered by 'date' ascending, but restart the sequence whenever a different value appears in the 'cat' column for the same 'id' (see rows in green). Any help would be appreciated.
This is a gaps and islands problem. The simplest solution in this case is probably a difference of row numbers:
select t.*,
row_number() over (partition by id, cat, seqnum - seqnum_c order by date) as row_num
from (select t.*,
row_number() over (partition by id order by date) as seqnum,
row_number() over (partition by id, cat order by date) as seqnum_c
from t
) t;
Why this works is a bit tricky to explain. But, if you look at the sequence numbers in the subquery, you'll see that the difference defines the groups you want to define.
Note: This assumes that the date column provides a stable sort. You seem to have duplicates in the column. If there really are duplicates and you have no secondary column for sorting, then try rank() or dense_rank() instead of row_number().