Select only first row in BigQuery

I have a table in which column A contains groups of repeating values (one after another).
I want to select only the first row of each group, based on the values in column A only (no other criteria). Note that I also want all the other columns of that row returned (I don't want to exclude them).
Can someone help me with a proper query?
Here is a sample:
SAMPLE
Thanks!

#standardSQL
SELECT row.*
FROM (
SELECT ARRAY_AGG(t LIMIT 1)[OFFSET(0)] row
FROM `project.dataset.table` t
GROUP BY columnA
)

You can try something like this:
#standardSQL
SELECT
* EXCEPT(rn)
FROM (
SELECT
*,
ROW_NUMBER() OVER(PARTITION BY columnA ORDER BY columnA) AS rn
FROM
your_dataset.your_table)
WHERE rn = 1
That will return:
Row columnA col2 ...
1 AC1001 Z_Creation
2 ACO112BISPIC QN
...

Add LIMIT 1 at the end of the query, something like:
SELECT name, year FROM person_table ORDER BY year LIMIT 1

You can now use QUALIFY for a more concise solution:
select
*
from
your_dataset.your_table
where true
qualify ROW_NUMBER() OVER(PARTITION BY columnA ORDER BY columnA) = 1

In BigQuery, the physical sequence of rows is not significant: “BigQuery does not guarantee a stable ordering of rows in a table. Only the result of a query with an explicit ORDER BY clause has well-defined ordering.”[1]
First, you need to decide which property determines the first row of each group; then you can run Vasily Bronsky’s query, replacing the column in its ORDER BY with that property. This means you should either add another column to the table to store the order of the rows or pick one of the columns you already have.
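To make the point about an explicit ordering concrete, here is a minimal sketch that runs the same ROW_NUMBER() pattern against an in-memory SQLite database from Python. SQLite only stands in for BigQuery here (it needs SQLite 3.25 or newer for window functions), and the table name t, the seq column, and the sample rows are all assumptions made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table: seq is an assumed column that records the intended row order.
conn.execute("CREATE TABLE t (columnA TEXT, col2 TEXT, seq INTEGER)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [
        ("AC1001", "Z_Creation", 1),
        ("AC1001", "QN", 2),
        ("ACO112BISPIC", "QN", 3),
        ("ACO112BISPIC", "Z_Creation", 4),
    ],
)

# Number the rows inside each columnA group by seq, then keep row 1 of each group.
first_rows = conn.execute(
    """
    SELECT columnA, col2
    FROM (
        SELECT *,
               ROW_NUMBER() OVER (PARTITION BY columnA ORDER BY seq) AS rn
        FROM t
    )
    WHERE rn = 1
    ORDER BY columnA
    """
).fetchall()
print(first_rows)
```

Because the window is ordered by seq rather than by the partitioning column itself, the row that gets rn = 1 is the same on every run.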

Related

How to get the latest date records when the result should be filtered by a unique column in SQL?

I have a table as below:
I want to write a SQL query to get the output below:
The query should select all the records from the table, but when multiple records have the same Id column value, it should take only the one record with the latest Date.
E.g., here Rudolf, Id 1211, is present three times in the input; in the output only the one Rudolf record with date 06-12-2010 is selected. The same goes for James.
I tried to write a query, but it was not successful. So please help me form the query in SQL.
Thanks in advance.
You can partition your data by Id, order each partition by Date descending, and take the first row of each partition:
SELECT A.Id, A.Name, A.Place, A.Date FROM (
SELECT
*,
ROW_NUMBER() OVER (PARTITION BY Id ORDER BY Date DESC) AS rn
FROM [Table]
) A WHERE A.rn = 1
You can use WITH TIES:
SELECT TOP 1 WITH TIES * FROM t
ORDER BY ROW_NUMBER() OVER (PARTITION BY id ORDER BY date DESC)
https://dbfiddle.uk/?rdbms=sqlserver_2017&fiddle=280b7412b5c0c04c208f2914b44c7ce3
As I can see from your example, duplicate rows differ only in Date. If that is the case, a simple GROUP BY with the MAX aggregate function will do the job for you.
SELECT Id, Name, Place, MAX(Date) AS Date
FROM [TABLE_NAME]
GROUP BY Id, Name, Place
Here is a working example: http://sqlfiddle.com/#!18/7025e/2
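The difference between the ROW_NUMBER() and GROUP BY answers only shows up when duplicate rows differ in something other than Date. The sketch below illustrates that with an in-memory SQLite database (used as a stand-in for SQL Server; window functions need SQLite 3.25 or newer). The table name people, the Place values, and the ISO-formatted dates are invented for the example and are not the asker's data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (Id INTEGER, Name TEXT, Place TEXT, Date TEXT)")
# Made-up rows: one Rudolf row has a different Place than the other two.
conn.executemany(
    "INSERT INTO people VALUES (?, ?, ?, ?)",
    [
        (1211, "Rudolf", "PlaceA", "2010-12-04"),
        (1211, "Rudolf", "PlaceA", "2010-12-05"),
        (1211, "Rudolf", "PlaceB", "2010-12-06"),
        (1300, "James", "PlaceC", "2010-12-01"),
        (1300, "James", "PlaceC", "2010-12-03"),
    ],
)

# ROW_NUMBER(): exactly one row per Id, the one with the latest Date.
latest_per_id = conn.execute(
    """
    SELECT Id, Name, Place, Date
    FROM (
        SELECT *,
               ROW_NUMBER() OVER (PARTITION BY Id ORDER BY Date DESC) AS rn
        FROM people
    )
    WHERE rn = 1
    ORDER BY Id
    """
).fetchall()

# GROUP BY + MAX: one row per distinct (Id, Name, Place) combination.
grouped = conn.execute(
    """
    SELECT Id, Name, Place, MAX(Date) AS Date
    FROM people
    GROUP BY Id, Name, Place
    ORDER BY Id, Place
    """
).fetchall()

print(latest_per_id)
print(grouped)
```

With this data the GROUP BY version returns two Rudolf rows, one per Place, so it is only equivalent when the duplicates really do differ in Date alone.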

Select last duplicate row with different id Oracle 11g

I have a table that looks like this:
The problem is I need to get the last record with duplicates in the column "NRODENUNCIA".
You can use MAX(DENUNCIAID), along with GROUP BY... HAVING to find the duplicates and select the row with the largest DENUNCIAID:
SELECT MAX(DENUNCIAID), NRODENUNCIA, FECHAEMISION, ADUANA, MES, NOMBREESTADO
FROM YourTable
GROUP BY NRODENUNCIA, FECHAEMISION, ADUANA, MES, NOMBREESTADO
HAVING COUNT(1) > 1
This will only show rows that have at least one duplicate. If you want to see non-duplicate rows too, just remove the HAVING COUNT(1) > 1 clause.
There are a number of solutions for your problem. One is to use row_number.
Note that I've ordered by DENUNCIAID in the OVER clause. This defines the "last record" as the one that has the largest DENUNCIAID. If you want to define it differently, you need to change the field that is being ordered.
WITH dupes AS (
SELECT
ROW_NUMBER() OVER (PARTITION BY NRODENUNCIA ORDER BY DENUNCIAID DESC) RN,
t.*
FROM
YourTable t
)
SELECT * FROM dupes WHERE rn = 1
This gets only the last record for each NRODENUNCIA value.
If you want to include only records that have dupes, change the WHERE clause to
WHERE rn = 1
and NRODENUNCIA in (select NRODENUNCIA from dupes where rn > 1)
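As a rough check of the two WHERE clauses above, the following sketch runs both variants on an in-memory SQLite database (a stand-in for Oracle; it needs SQLite 3.25 or newer). The table name denuncias, the reduced column set, and the sample values are assumptions for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE denuncias (DENUNCIAID INTEGER, NRODENUNCIA TEXT)")
# Made-up data: A-100 and C-300 are duplicated, B-200 appears once.
conn.executemany(
    "INSERT INTO denuncias VALUES (?, ?)",
    [(1, "A-100"), (2, "A-100"), (3, "B-200"), (4, "C-300"), (5, "C-300"), (6, "C-300")],
)

cte = """
    WITH dupes AS (
        SELECT d.*,
               ROW_NUMBER() OVER (PARTITION BY NRODENUNCIA ORDER BY DENUNCIAID DESC) AS rn
        FROM denuncias d
    )
"""

# Last record per NRODENUNCIA, whether or not it has duplicates.
last_of_each = conn.execute(
    cte + "SELECT DENUNCIAID, NRODENUNCIA FROM dupes WHERE rn = 1 ORDER BY DENUNCIAID"
).fetchall()

# Last record only for values that occur more than once (some row has rn > 1).
last_of_dupes = conn.execute(
    cte
    + """
    SELECT DENUNCIAID, NRODENUNCIA FROM dupes
    WHERE rn = 1
      AND NRODENUNCIA IN (SELECT NRODENUNCIA FROM dupes WHERE rn > 1)
    ORDER BY DENUNCIAID
    """
).fetchall()

print(last_of_each)
print(last_of_dupes)
```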

How to select the rows in original order in Hive?

I want to select a definite number of rows from mytable in their original order.
As we know, the keyword 'limit' selects rows at random. The rows in mytable are in order, and I just want to select them in that original order, for example the first 10000 rows, meaning row 1 to row 10000.
How can I do this?
Thanks.
Try:
SET mapred.reduce.tasks = 1;
SELECT * FROM (
SELECT *, ROW_NUMBER() OVER () AS row_num
FROM mytable) t
SORT BY row_num LIMIT 10000;
The rows in your table may be in order, but...
Tables are read in parallel, and the results returned from different mappers or reducers are not in the original order. That is why you need to know the rule that defines the "original order".
If you know it, you can use row_number() or ORDER BY. For example:
select * from mytable order by ... limit 10000;

How to retrieve specific rows from SQL Server table?

I was wondering, is there a way to retrieve, for example, the 2nd and 5th rows from a SQL table that contains 100 rows?
I saw some solutions with a WHERE clause, but they all assume that the column to which the WHERE clause is applied is sequential, starting at 1.
Is there another way to query a SQL Server table for specific rows when the table doesn't have a column whose values start at 1?
P.S. I know of a solution with temporary tables, where you copy your SELECT statement's output and add a sequential column to the table. I am using T-SQL.
Try this:
SELECT * FROM (
SELECT
*,
ROW_NUMBER() OVER (ORDER BY ColumnName ASC) AS rownumber
FROM TableName
) AS temptablename
WHERE rownumber IN (2, 5)
With SQL Server:
; WITH Base AS (
SELECT *, ROW_NUMBER() OVER (ORDER BY id) RN FROM YourTable
)
SELECT *
FROM Base WHERE RN IN (2, 5)
Replace id with your primary key or ordering column, and YourTable with your table.
This is a CTE (Common Table Expression), so it isn't a temporary table. It is expanded together with your query.
There is no 2nd or 5th row in the table.
There is only the 2nd or 5th result in a resultset that you return, as determined by the order you specify in that query.
If you are on SQL Server 2005 or above, you could use Row_Number() function. Ex:
;With CTE as (
select col1, ..., row_number() over (order by yourOrderingCol) rn
from yourTable
)
select col1,...
from cte
where rn in (2,5)
Please note that yourOrderingCol determines the row number (i.e. rn) each row receives.
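The claim that "the 2nd row" exists only relative to an ordering can be seen in a small sketch. It uses an in-memory SQLite database as a stand-in for SQL Server (SQLite 3.25 or newer for window functions); the items table, its columns, and the pick_rows helper are hypothetical names introduced for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (name TEXT, price INTEGER)")
# Rows are inserted in no particular order on purpose.
conn.executemany(
    "INSERT INTO items VALUES (?, ?)",
    [("e", 50), ("a", 10), ("c", 30), ("b", 20), ("d", 40), ("f", 60)],
)

def pick_rows(order_by):
    """Return the 2nd and 5th rows of the result set under the given ordering."""
    # order_by is a hard-coded fragment in this sketch; never build it from user input.
    sql = f"""
        WITH base AS (
            SELECT name, price, ROW_NUMBER() OVER (ORDER BY {order_by}) AS rn
            FROM items
        )
        SELECT rn, name, price FROM base WHERE rn IN (2, 5) ORDER BY rn
    """
    return conn.execute(sql).fetchall()

ascending = pick_rows("price ASC")
descending = pick_rows("price DESC")
print(ascending)
print(descending)
```

The same table yields different "2nd" and "5th" rows depending on the ORDER BY inside the OVER clause, which is why the ordering column has to be chosen deliberately.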

MSSQL Select statement with incremental integer column... not from a table

I need, if possible, a T-SQL query that returns the values from an arbitrary table and also returns an incremental integer column with value 1 for the first row, 2 for the second, and so on.
This column does not actually reside in any table, and it must be strictly incremental, because the ORDER BY clause could sort the rows of the table and I want the incremental column to stay in perfect shape always.
The solution must run on SQL Server 2000.
For SQL 2005 and up:
SELECT ROW_NUMBER() OVER (ORDER BY SomeColumn) AS 'rownumber', *
FROM YourTable
For 2000 you need to do something like this:
SELECT IDENTITY(INT, 1, 1) AS Rank, SomeColumn
INTO #Ranks FROM YourTable WHERE 1 = 0
INSERT INTO #Ranks (SomeColumn)
SELECT SomeColumn FROM YourTable
ORDER BY SomeColumn
SELECT * FROM #Ranks
ORDER BY Rank
See also: Row Number
You can start with a custom number and increment from there. For example, if you want to add a cheque number for each payment, you can do:
DECLARE @StartChequeNumber INT;
SELECT @StartChequeNumber = 3446;
SELECT
((ROW_NUMBER() OVER (ORDER BY AnyColumn)) + @StartChequeNumber) AS 'ChequeNumber'
, * FROM YourTable
This will give the correct cheque number for each row.
Try ROW_NUMBER()
http://msdn.microsoft.com/en-us/library/ms186734.aspx
Example:
SELECT
col1,
col2,
ROW_NUMBER() OVER (ORDER BY col1) AS rownum
FROM tbl
It is ugly and performs badly, but technically this works on any table with at least one unique field AND works in SQL 2000.
SELECT (SELECT COUNT(*) FROM myTable T1 WHERE T1.UniqueField<=T2.UniqueField) as RowNum, T2.OtherField
FROM myTable T2
ORDER By T2.UniqueField
Note: If you use this approach and add a WHERE clause to the outer SELECT, you have to add it to the inner SELECT as well if you want the numbers to be continuous.
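The note about repeating the WHERE clause is easier to see with data. This sketch runs the correlated COUNT(*) pattern on an in-memory SQLite database (a stand-in for SQL Server 2000; no window functions are needed). The sample rows and the filter UniqueField > 10 are assumptions chosen for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE myTable (UniqueField INTEGER, OtherField TEXT)")
conn.executemany(
    "INSERT INTO myTable VALUES (?, ?)",
    [(10, "x"), (20, "y"), (30, "z"), (40, "w")],
)

# Filter applied only to the outer SELECT: the count still sees the filtered-out row.
outer_only = conn.execute(
    """
    SELECT (SELECT COUNT(*) FROM myTable T1
            WHERE T1.UniqueField <= T2.UniqueField) AS RowNum,
           T2.OtherField
    FROM myTable T2
    WHERE T2.UniqueField > 10
    ORDER BY T2.UniqueField
    """
).fetchall()

# Same filter repeated in the inner SELECT: numbering starts at 1 again.
both = conn.execute(
    """
    SELECT (SELECT COUNT(*) FROM myTable T1
            WHERE T1.UniqueField <= T2.UniqueField
              AND T1.UniqueField > 10) AS RowNum,
           T2.OtherField
    FROM myTable T2
    WHERE T2.UniqueField > 10
    ORDER BY T2.UniqueField
    """
).fetchall()

print(outer_only)
print(both)
```

With the filter in the outer query only, the numbering starts at 2 because the excluded row is still counted; repeating the condition in the subquery restores a sequence that starts at 1.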