Select max value of each group using partition by - sql

I have the following code, which takes a long time to execute. I need to select the row with row number 1 after partitioning by three columns (col_1, col_2, col_3), which are also the key columns, and ordering by several columns as shown below. The table has around 90 million records. Is this the best approach, or is there a better one?
with cte as (SELECT
b.*
,ROW_NUMBER() OVER ( PARTITION BY col_1,col_2,col_3
ORDER BY new_col DESC, new_col_2 DESC, new_col_3 DESC ) AS ROW_NUMBER
FROM (
SELECT
*
,CASE
WHEN update_col = ' ' THEN new_update_col
ELSE update_col
END AS new_col_1
FROM schema_name.table_name
) b
)
select top 10 * from cte WHERE ROW_NUMBER=1

Currently you apply CASE to every row in the table, and CASE (a string comparison) is costly.
In the end, you keep only the rows with ROW_NUMBER = 1. If, say, that filter keeps half of your records, the query should run faster if you filter first (generate the row number and keep the rows with RN = 1) and only then apply CASE to the remaining rows. Note that this only works if the ORDER BY columns do not depend on the CASE result.
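A minimal sketch of the "filter first, then CASE" idea, run against an in-memory SQLite database from Python so it can be tried without the real table. The table `t`, the `sort_key` column, and the sample values are all made up for illustration, and it assumes the ordering column does not depend on the CASE result. SQLite has no TOP, so you would use LIMIT there instead.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical stand-in for schema_name.table_name
    CREATE TABLE t (col_1 TEXT, col_2 TEXT, col_3 TEXT,
                    sort_key INTEGER, update_col TEXT, new_update_col TEXT);
    INSERT INTO t VALUES
        ('a', 'x', 'p', 1, ' ',    'fallback_1'),
        ('a', 'x', 'p', 2, 'kept', 'fallback_2'),
        ('b', 'y', 'q', 5, ' ',    'fallback_3');
""")

query = """
WITH ranked AS (
    -- Step 1: number the rows per key, without touching CASE yet
    SELECT t.*,
           ROW_NUMBER() OVER (
               PARTITION BY col_1, col_2, col_3
               ORDER BY sort_key DESC
           ) AS rn
    FROM t
)
-- Step 2: CASE is evaluated only for the rows that survive rn = 1
SELECT col_1, col_2, col_3,
       CASE WHEN update_col = ' ' THEN new_update_col
            ELSE update_col
       END AS new_col_1
FROM ranked
WHERE rn = 1
ORDER BY col_1
"""
rows = conn.execute(query).fetchall()
print(rows)
```

Whether this is actually faster on 90 million rows depends on the engine and its plan, so it is worth comparing both versions on your own data.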

Related

BigQuery - Select only first row in BigQuery

I have a table with data where in Column A I have groups of repeating Data (one after another).
I want to select only first row of each group based on values in column A only (no other criteria). Mind you, I want all corresponding columns selected also for the mentioned new found row (I don't want to exclude them).
Can someone help me with a proper query.
Here is a sample:
SAMPLE
Thanks!
#standardSQL
SELECT row.*
FROM (
SELECT ARRAY_AGG(t LIMIT 1)[OFFSET(0)] row
FROM `project.dataset.table` t
GROUP BY columnA
)
You can try something like this:
#standardSQL
SELECT
* EXCEPT(rn)
FROM (
SELECT
*,
ROW_NUMBER() OVER(PARTITION BY columnA ORDER BY columnA) AS rn
FROM
your_dataset.your_table)
WHERE rn = 1
that will return:
Row columnA col2 ...
1 AC1001 Z_Creation
2 ACO112BISPIC QN
...
Add LIMIT 1 at the end of the query, something like:
SELECT name, year FROM person_table ORDER BY year LIMIT 1
You can now use qualify for a more concise solution:
select
*
from
your_dataset.your_table
where true
qualify ROW_NUMBER() OVER(PARTITION BY columnA ORDER BY columnA) = 1
In BigQuery the physical sequence of rows is not significant. “BigQuery does not guarantee a stable ordering of rows in a table. Only the result of a query with an explicit ORDER BY clause has well-defined ordering.”[1].
First, you need to define which property determines the first row of each group; then you can run Vasily Bronsky's query with that property in the ORDER BY. That means you should either add another column to the table to store the order of the rows, or pick one of the columns you already have.
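To illustrate picking the first row per group with an explicit ordering property, here is a small sketch using SQLite from Python rather than BigQuery. The `seq` column is a hypothetical "order of the rows" column, and the `Z_Update` and `QX` rows are invented sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- seq is an assumed column that records the intended row order
    CREATE TABLE your_table (columnA TEXT, col2 TEXT, seq INTEGER);
    INSERT INTO your_table VALUES
        ('ACO112BISPIC', 'QX',         4),
        ('AC1001',       'Z_Update',   2),
        ('ACO112BISPIC', 'QN',         3),
        ('AC1001',       'Z_Creation', 1);
""")

first_rows = conn.execute("""
    SELECT columnA, col2
    FROM (
        SELECT *,
               -- order inside each group by seq, not by columnA itself
               ROW_NUMBER() OVER (PARTITION BY columnA ORDER BY seq) AS rn
        FROM your_table
    )
    WHERE rn = 1
    ORDER BY columnA
""").fetchall()
print(first_rows)
```

Because the ordering inside each group comes from `seq`, the result no longer depends on the physical order in which the rows were stored.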

Problems with ordering sql server order by clause

Check Image
I have a problem with an ORDER BY in SQL Server. I need to sort the records by the EjeY field, as follows:
Alta/Deficiente
Baja/Optima
Deficiente/Deficiente
Media/Alta
Optima/Deficiente
... and so on.
This is the order shown by the records highlighted in gray. To clarify, the table in the image has 625 records, and the other fields must remain the same.
As long as each value of EjeY has the same number of rows (5), one idea is to order the rows by each row's row number modulo that count.
This could work:
--Save your data in a temp table ordered by the value of EjeY column
SELECT *
INTO #TEMP2
FROM
(
SELECT *
FROM YouTable
) AS TEMP1
ORDER BY TEMP1.EjeY --Here you modify the ordering mode you want every time your EjeY values to be displayed
--For each row of the temp table, compute the row number modulo
--the count of rows per EjeY value
--(which must be 5 in your situation),
--then order the table by this value and then by row number
SELECT
row_col%(select top 1 count(EjeY) from YouTable group by EjeY) AS mod_col,
row_col,
TEMP3.* --your columns
FROM
(SELECT *,
ROW_NUMBER() OVER (ORDER BY (SELECT null))-1 as row_col
FROM #TEMP2) as TEMP3
ORDER BY mod_col,TEMP3.row_col
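The modulo trick can be tried on a tiny data set. This is only a sketch in SQLite via Python, not SQL Server: it numbers the rows with an explicit ORDER BY inside ROW_NUMBER instead of relying on a temp table's order, and the `id` tie-breaker column and the sample rows (three EjeY values, two rows each) are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical sample: every EjeY value has the same number of rows (2)
    CREATE TABLE your_table (id INTEGER, EjeY TEXT);
    INSERT INTO your_table VALUES
        (5, 'Media'), (1, 'Alta'), (3, 'Baja'),
        (2, 'Alta'),  (6, 'Media'), (4, 'Baja');
""")

interleaved = conn.execute("""
    WITH numbered AS (
        -- zero-based row number over the rows sorted by EjeY
        SELECT id, EjeY,
               ROW_NUMBER() OVER (ORDER BY EjeY, id) - 1 AS row_col
        FROM your_table
    )
    SELECT EjeY, id, row_col,
           -- rows per EjeY value, assumed identical for every value
           row_col % (SELECT COUNT(*) FROM your_table
                      GROUP BY EjeY LIMIT 1) AS mod_col
    FROM numbered
    ORDER BY mod_col, row_col
""").fetchall()

order_of_values = [row[0] for row in interleaved]
print(order_of_values)
```

If the groups could have different sizes, ordering by ROW_NUMBER() OVER (PARTITION BY EjeY ...) and then by EjeY would be a different technique that avoids the equal-count assumption.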

NTH row query in Oracle not behaving as expected

Given;
CREATE TABLE T1 (ID INTEGER, DESCRIPTION VARCHAR2(20));
INSERT INTO T1 VALUES (1,'ONE');
INSERT INTO T1 VALUES (2,'TWO');
INSERT INTO T1 VALUES (3,'THREE');
INSERT INTO T1 VALUES (4,'FOUR');
INSERT INTO T1 VALUES (5,'FIVE');
COMMIT;
Why does;
SELECT * FROM
( SELECT ROWNUM, ID, DESCRIPTION
FROM T1)
WHERE MOD(ROWNUM,1)=0;
Return
ROWNUM ID DESCRIPTION
------ -------------------------------------- --------------------
1 1 ONE
2 2 TWO
3 3 THREE
4 4 FOUR
5 5 FIVE
Whereas;
SELECT * FROM
( SELECT ROWNUM, ID, DESCRIPTION
FROM T1)
WHERE MOD(ROWNUM,2)=0;
Return zero rows ???
Confused, expected ROWNUM=(2,4) to be returned...
SELECT B.* FROM
( SELECT ROWNUM a, ID, DESCRIPTION
FROM T1) B
WHERE MOD(A,2)=0;
Reason: Your approach evaluates ROWNUM twice. You don't need to, nor do you really want to. Based on the order of operations, the WHERE clause executes before the outer SELECT, which means the SELECT hasn't determined the values for each row, and the number of rows is not known yet.
Additional:
I would recommend adding an ORDER BY to the inline view so the row numbers are in an expected, specific order as opposed to whatever the engine derives.
You have 2 operations of ROWNUM.
The 1st ROWNUM generates the numbers 1 through 5.
The 2nd ROWNUM doesn't generate anything: the first candidate row gets ROWNUM 1, but since MOD(1,2)=0 is false, the record is not output and ROWNUM is not incremented, so the condition fails again and again.
This query, using an alias, returns exactly what you expected:
SELECT * FROM
( SELECT ROWNUM as rn, ID, DESCRIPTION
FROM T1)
WHERE MOD(rn,2)=0;
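The behaviour described above can be mimicked with a few lines of Python. This is only a toy model of "ROWNUM is incremented only when a row passes the filter", not Oracle's actual implementation; the helper name `filter_with_rownum` is made up.

```python
def filter_with_rownum(rows, predicate):
    """Toy model: a candidate row gets the current ROWNUM, and the
    counter only advances when the row passes the predicate."""
    kept = []
    rownum = 1
    for row in rows:
        if predicate(rownum, row):
            kept.append((rownum, row))
            rownum += 1  # incremented only for rows that are output
    return kept

t1 = [(1, "ONE"), (2, "TWO"), (3, "THREE"), (4, "FOUR"), (5, "FIVE")]

# WHERE MOD(ROWNUM,1)=0: always true, so all five rows come back
mod1 = filter_with_rownum(t1, lambda rn, row: rn % 1 == 0)

# WHERE MOD(ROWNUM,2)=0: ROWNUM stays stuck at 1, so nothing passes
mod2 = filter_with_rownum(t1, lambda rn, row: rn % 2 == 0)

# Aliased version: number all rows in the inner query first,
# then filter on the stored value in the outer query
inner = filter_with_rownum(t1, lambda rn, row: True)
aliased = [(rn, row) for rn, row in inner if rn % 2 == 0]

print(len(mod1), mod2, aliased)
```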
Some facts about the ROWNUM pseudo column in Oracle:
The ROWNUM assigned to each row is determined by the order in which Oracle retrieves the row from the DB.
The order in which rows are returned is non-deterministic: running the query once may return rows in one order, and a second run may return a different order if the base tables have been reorganized or Oracle uses a different query plan.
The order in which ROWNUMs are assigned to rows is not necessarily correlated with that of an ORDER BY clause (the ORDER BY clause may affect the ROWNUM order since it may cause a different query plan to be used, but the ROWNUMs are unlikely to match the sort order).
ROWNUMbers are assigned after the records are filtered by the WHERE clause, so if you filter out ROWNUM 1 you will never return any records.
Filtering a subquery that returns an aliased ROWNUM column works because the entire subquery is returned to the outer query before the outer query filters the rows, but the ROWNUMs will still have a non deterministic order.
To successfully return a top N or Nth row query in a deterministic fashion, you need to assign row numbers in a deterministic way. One such way is to use the `ROW_NUMBER` analytic function in a subquery:
select * from
(select ROW_NUMBER() over (order by ID) rn
, ID
, DESCRIPTION
from T1)
where rn <= 4 -- Top N
or
where rn = 4 -- Nth row
or even
WHERE MOD(rn,2)=0 -- every Nth row
In any case, the ORDER BY clause in the ROW_NUMBER analytic function needs to match the granularity of the data; otherwise ties in the ordering will again be non-deterministic, most likely matching the current ROWNUM ordering.
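For anyone who wants to try the three variants without an Oracle instance, here is a rough equivalent in SQLite via Python, using the T1 data from the question. SQLite is not Oracle, so `%` stands in for MOD, and ID is unique in this sample, which is what makes the ordering deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE T1 (ID INTEGER, DESCRIPTION TEXT);
    INSERT INTO T1 VALUES (1,'ONE'), (2,'TWO'), (3,'THREE'),
                          (4,'FOUR'), (5,'FIVE');
""")

# Inline view that numbers the rows by a unique column
numbered = """
    SELECT ROW_NUMBER() OVER (ORDER BY ID) AS rn, ID, DESCRIPTION
    FROM T1
"""

def ids_where(condition):
    # condition is a fixed string written in this script, not user input
    sql = f"SELECT ID FROM ({numbered}) WHERE {condition} ORDER BY rn"
    return [row[0] for row in conn.execute(sql)]

top_n = ids_where("rn <= 4")         # top N
nth = ids_where("rn = 4")            # Nth row
every_2nd = ids_where("rn % 2 = 0")  # every Nth row

print(top_n, nth, every_2nd)
```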

How to select the rows in original order in Hive?

I want to select a definite number of rows from mytable in their original order.
As we know, the keyword 'limit' selects rows arbitrarily. The rows in mytable are in order, and I just want to select them in that original order. For example, I want to select 10000 rows, meaning row 1 through row 10000.
How can I achieve this?
Thanks.
Try:
SET mapred.reduce.tasks = 1;
SELECT * FROM (
SELECT *, ROW_NUMBER() OVER () AS row_num
FROM mytable ) table1
SORT BY row_num LIMIT 10000;
Rows in your table may be in order but...
Tables are read in parallel, and the results returned from different mappers or reducers do not come back in the original order. That is why you need to know the rule that defines the "original order".
If you know it, you can use row_number() or order by. For example:
select * from table order by ... limit 10000;

Get a row number on select statement while matching entire row

I am trying to get a row number of the row. Since the table doesn't have any id column, I have used ROW_NUMBER() without any order which is shown below.
SELECT
ROW_NUMBER() OVER (ORDER BY (SELECT 1)) AS SNO, *
FROM [table1]
Now the challenge is that I need to find a row matching a condition, which is just a SELECT statement with a WHERE clause, but together with its original row number.
SELECT TOP 1 *
FROM table1
WHERE [Total Sales] = 2555
This statement returns a single record. I have tried to use INTERSECT to combine both statements to get result with row number.
SELECT
ROW_NUMBER() OVER (ORDER BY (SELECT 100)) AS SNO, *
FROM [table1]
INTERSECT
SELECT TOP 1 *
FROM table1
WHERE [Total Sales] = 2555
Of course, this throws errors since the number of columns is different. So what is the correct way to get the actual row number?
When you run this query:
SELECT ROW_NUMBER() OVER (ORDER BY (SELECT 1)) AS SNO, t.*
FROM [table1] t;
The SNO values are unstable. That means that the same query run multiple times might return different numbers. Sorting in SQL is not stable. That means that identical keys can be in an arbitrary order when the query is run multiple times. Why? SQL tables and result sets represent unordered sets. There is nothing to base a stable sort on.
The simplistic answer to your question is to use a subquery:
SELECT t.*
FROM (SELECT ROW_NUMBER() OVER (ORDER BY (SELECT 1)) AS SNO, t.*
FROM [table1] t
) t
WHERE [Total Sales] = 2555;
However, the real answer is that you should be using multiple columns to create a stable sort, if you want to use this value for more than one query.
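As a rough illustration of a stable sort over multiple columns, here is a sketch in SQLite via Python. The `region` and `product` columns and all the data are hypothetical, and `total_sales` stands in for [Total Sales]; the point is only that the combination in the ORDER BY is unique, so SNO comes out the same on every run.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical columns; (region, product) is assumed to be unique
    CREATE TABLE table1 (region TEXT, product TEXT, total_sales INTEGER);
    INSERT INTO table1 VALUES
        ('West', 'Widget',  900),
        ('East', 'Widget', 1200),
        ('East', 'Gadget', 2555);
""")

match = conn.execute("""
    SELECT sno, region, product, total_sales
    FROM (
        -- number the rows by a unique column combination
        SELECT ROW_NUMBER() OVER (ORDER BY region, product) AS sno, *
        FROM table1
    )
    WHERE total_sales = 2555
""").fetchone()
print(match)
```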
SQL does not have an inherent "row number" for the entries. The table order shown is based entirely on the query results. If you want to keep the rows in the order they were put into the DB, then maybe add a timestamp that's generated by a trigger and attached to the row when it's inserted. You can then sort by this timestamp.
What's the primary key if there is no ID?