sql resultset records numbering? - sql

I want to number the rows in my result set. For a result set of 3 records, I'd like some SQL keyword to generate a column that reads 1, 2, 3 for each of the records.
I know I can make a temp table with an auto-increment column, but I wanted to know if there is a way to get this back directly from a SQL query:
SELECT row_count,
project_name,
project_id
FROM Project
Is anything like the "row_count" I am dreaming of available?

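In SQL Server, ROW_NUMBER() requires an ORDER BY clause. Ordering by (SELECT 0) satisfies that requirement, gives you an incrementing column, and saves a potentially unnecessary sort; the order in which the numbers are assigned is then arbitrary.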
SELECT ROW_NUMBER() over (order by (select 0)) as row_count,
project_name,
project_id
FROM Project

Sounds like you want ROW_NUMBER, an analytic/ranking function. But it's unclear what you want to base the numbering on - this assumes the project_id, starting at the smallest value:
SELECT ROW_NUMBER() OVER(ORDER BY p.project_id) AS row_count,
p.project_name,
p.project_id
FROM PROJECT p
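A minimal runnable sketch of the ROW_NUMBER answer, assuming an in-memory SQLite 3.25+ database (which supports window functions) as a stand-in for the questioner's server, driven from Python's sqlite3 module. The three Project rows are invented and inserted out of order on purpose:

```python
import sqlite3

# Assumption: in-memory SQLite (3.25+) stands in for the real database;
# the Project rows below are made up for illustration.


def make_demo_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Project (project_id INTEGER, project_name TEXT)")
    # Inserted out of order to show that numbering follows the window's ORDER BY.
    conn.executemany(
        "INSERT INTO Project VALUES (?, ?)",
        [(30, "gamma"), (10, "alpha"), (20, "beta")],
    )
    return conn


def numbered_projects(conn: sqlite3.Connection) -> list:
    """Number each project 1, 2, 3, ... in project_id order."""
    return conn.execute(
        """
        SELECT ROW_NUMBER() OVER (ORDER BY p.project_id) AS row_count,
               p.project_name,
               p.project_id
        FROM Project p
        ORDER BY row_count
        """
    ).fetchall()


if __name__ == "__main__":
    for row in numbered_projects(make_demo_db()):
        print(row)
```

Because the window orders by project_id, the numbering follows project_id rather than insertion order.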

Related

What else do I need to add to my SQL query to bring in related information from other columns when using MIN() with GROUP BY?

There is a table with the following column headers: indi_cod, ries_cod, date, time and level. Each ries_cod contains more than one indi_cod, and these indi_cod are random consecutive numbers.
Which SQL query would find the smallest indi_cod for each ries_cod and, at the same time, bring back its related date, time and level?
I tried the following query:
SELECT MIN (indi_cod) AS min_indi_cod
FROM `my-project-01-354113.indi_cod.second_step`
GROUP BY ries_cod
ORDER BY ries_cod
It did return the minimum indi_cod for each ries_cod group, but I couldn't write a query that also brings back the date, time and level values corresponding to each of those indi_cod.
I usually use some kind of ranking for this type of thing. You can use ROW_NUMBER, RANK, or DENSE_RANK depending on your RDBMS. Here is an example:
with t as (
  select a.*,
         row_number() over (partition by ries_cod order by indi_cod) as rn
  from mytable a
)
select * from t where rn = 1
In addition, if you are using Oracle, you can do this in a single aggregate query, without the CTE, by using the KEEP (DENSE_RANK FIRST ...) clause:
https://renenyffenegger.ch/notes/development/databases/SQL/select/group-by/keep-dense_rank/index
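Here is the ranking approach as a runnable sketch, assuming SQLite 3.25+ in place of BigQuery and a hypothetical mytable with the question's columns and invented rows. It uses the corrected syntax (no comma between PARTITION BY and ORDER BY, and the alias a declared in FROM):

```python
import sqlite3

# Keep, for each ries_cod, the whole row with the smallest indi_cod.
QUERY = """
WITH t AS (
    SELECT a.*,
           ROW_NUMBER() OVER (PARTITION BY ries_cod ORDER BY indi_cod) AS rn
    FROM mytable a
)
SELECT ries_cod, indi_cod, date, time, level
FROM t
WHERE rn = 1
ORDER BY ries_cod
"""


def make_demo_db() -> sqlite3.Connection:
    # Hypothetical table and data; date and time are stored as plain text here.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE mytable "
        "(indi_cod INTEGER, ries_cod TEXT, date TEXT, time TEXT, level INTEGER)"
    )
    conn.executemany(
        "INSERT INTO mytable VALUES (?, ?, ?, ?, ?)",
        [
            (7, "A", "2022-01-03", "10:00", 2),
            (5, "A", "2022-01-01", "08:00", 1),
            (9, "B", "2022-01-02", "09:30", 3),
            (8, "B", "2022-01-05", "12:00", 4),
        ],
    )
    return conn


def min_per_group(conn: sqlite3.Connection) -> list:
    return conn.execute(QUERY).fetchall()
```

Unlike MIN() with GROUP BY, the ranking keeps the entire row, so date, time and level come along with the smallest indi_cod.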
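I think you just need to group by the other columns as well. Note that this returns one row per distinct (ries_cod, date, time, level) combination, so it gives one row per ries_cod only if those columns do not vary within a group:
SELECT MIN(indi_cod) AS min_indi_cod, ries_cod, date, time, level
FROM mytable
GROUP BY ries_cod, date, time, level
ORDER BY ries_cod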

Select rows by index in Amazon Athena

This is a very simple question, but I can't seem to find documentation on it. How would one query rows by index (i.e., select the 10th through 20th rows of a table)?
I know there's a row_number function, but it doesn't seem to do what I want.
Do not specify any partition, so that your row number is an integer between 1 and your number of records. Without an ORDER BY inside OVER (), which row receives which number is not guaranteed.
SELECT row_num FROM (
SELECT row_number() over () as row_num
FROM your_table
)
WHERE row_num between 100000 and 100010
I seem to have found a roundabout and clunky way of doing this in Athena, so any better answers are welcome. This approach requires that you already have some numeric column in your table, in this case named some_numeric_column:
SELECT some_numeric_column, row_num FROM (
SELECT some_numeric_column,
row_number() over (order by some_numeric_column) as row_num
FROM your_table
)
WHERE row_num between 100000 and 100010
To explain: you first select some numeric column in your data, then create a column (called row_num) of row numbers based on the order of that column. You then wrap it all in an outer SELECT, because the row_num alias, and window functions in general, cannot be used in the WHERE clause of the same query that defines them. If you don't wrap it in a second SELECT, Athena reports an error about not finding a column named row_num.
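A small sketch of the same wrap-then-filter pattern, assuming SQLite 3.25+ as a stand-in for Athena and reusing the answer's placeholder names your_table and some_numeric_column; the 50 values are invented and inserted in reverse:

```python
import sqlite3

# Assumption: SQLite (3.25+) stands in for Athena; table and column names
# are the placeholders from the answer, and the data is made up.


def make_demo_db(n: int = 50) -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE your_table (some_numeric_column INTEGER)")
    # Values 5000, 4900, ..., 100: reverse order shows that numbering comes
    # from the window's ORDER BY, not from insertion order.
    conn.executemany(
        "INSERT INTO your_table VALUES (?)",
        [(v * 100,) for v in range(n, 0, -1)],
    )
    return conn


def rows_between(conn: sqlite3.Connection, first: int, last: int) -> list:
    """Return rows first..last (1-based, inclusive) by some_numeric_column."""
    return conn.execute(
        """
        SELECT some_numeric_column, row_num FROM (
            SELECT some_numeric_column,
                   row_number() OVER (ORDER BY some_numeric_column) AS row_num
            FROM your_table
        )
        WHERE row_num BETWEEN ? AND ?   -- filter only in the outer query
        ORDER BY row_num
        """,
        (first, last),
    ).fetchall()


if __name__ == "__main__":
    print(rows_between(make_demo_db(), 10, 20))
```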

SQL Eliminate Duplicates with NO ID

I have a table with the following columns:
Node, Date_Time, Market, Price
I would like to delete all but one record for each (Node, Date_Time) pair.
SELECT Node, Date_Time, MAX(Price)
FROM Hourly_Data
Group BY Node, Date_Time
That gets the results I would like to see, but I can't figure out how to remove the other records.
Note: there is no ID column in this table.
Here are steps that are a workaround rather than a single command, but they will work in any relational database:
Create new table that looks just like the one you already have
Insert the data computed by your group-by query to newly created table
Drop the old table
Rename new table to the name the old one used to have
Just remember that this takes locks, so you need some maintenance time to perform it.
There are simpler ways to achieve this, but they are DBMS specific.
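The four steps above can be sketched as follows, assuming SQLite as the "any relational database" and invented sample rows. One extra assumption: Market is taken to be the same for every row of a given (Node, Date_Time), so it is added to the GROUP BY and survives the rebuild; if it varies, you have to decide which Market to keep.

```python
import sqlite3

# Assumption: Market is constant within each (Node, Date_Time).
REBUILD = """
BEGIN;
-- 1. Create a new table that looks just like the old one.
CREATE TABLE Hourly_Data_new (
    Node TEXT, Date_Time TEXT, Market TEXT, Price REAL
);
-- 2. Insert the group-by result into it.
INSERT INTO Hourly_Data_new (Node, Date_Time, Market, Price)
SELECT Node, Date_Time, Market, MAX(Price)
FROM Hourly_Data
GROUP BY Node, Date_Time, Market;
-- 3. Drop the old table.
DROP TABLE Hourly_Data;
-- 4. Rename the new table to the old name.
ALTER TABLE Hourly_Data_new RENAME TO Hourly_Data;
COMMIT;
"""


def make_demo_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE Hourly_Data (Node TEXT, Date_Time TEXT, Market TEXT, Price REAL)"
    )
    conn.executemany(
        "INSERT INTO Hourly_Data VALUES (?, ?, ?, ?)",
        [
            ("N1", "2020-01-01 00:00", "M1", 10.0),
            ("N1", "2020-01-01 00:00", "M1", 12.5),
            ("N2", "2020-01-01 00:00", "M1", 7.0),
        ],
    )
    conn.commit()
    return conn


def rebuild_without_duplicates(conn: sqlite3.Connection) -> list:
    conn.executescript(REBUILD)
    return conn.execute(
        "SELECT * FROM Hourly_Data ORDER BY Node, Date_Time"
    ).fetchall()
```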
Here is an easy SQL Server method that creates a row number within a CTE and deletes through it. Deleting through a CTE like this is SQL Server-specific, though: other databases with window functions, such as PostgreSQL and MySQL, do not accept a CTE as the DELETE target.
;WITH cte AS (
SELECT
*
,RowNum = ROW_NUMBER() OVER (PARTITION BY Node, Date_Time ORDER BY Price DESC)
FROM
Hourly_Data
)
DELETE
FROM
cte
WHERE
RowNum > 1
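For comparison, a sketch of the same "keep the highest Price per (Node, Date_Time)" delete in SQLite. SQLite cannot delete through a CTE either, so this swaps in SQLite's implicit rowid to point at the rows to remove; it assumes SQLite 3.25+ and uses invented rows:

```python
import sqlite3

# Swapped-in technique: identify duplicates by SQLite's implicit rowid.
DELETE_DUPLICATES = """
DELETE FROM Hourly_Data
WHERE rowid IN (
    SELECT rid
    FROM (
        SELECT rowid AS rid,
               ROW_NUMBER() OVER (
                   PARTITION BY Node, Date_Time
                   ORDER BY Price DESC   -- row 1 is the highest price
               ) AS RowNum
        FROM Hourly_Data
    )
    WHERE RowNum > 1
)
"""


def make_demo_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE Hourly_Data (Node TEXT, Date_Time TEXT, Market TEXT, Price REAL)"
    )
    conn.executemany(
        "INSERT INTO Hourly_Data VALUES (?, ?, ?, ?)",
        [
            ("N1", "2020-01-01 00:00", "M1", 10.0),
            ("N1", "2020-01-01 00:00", "M1", 12.5),
            ("N2", "2020-01-01 00:00", "M1", 7.0),
        ],
    )
    return conn


def delete_duplicates(conn: sqlite3.Connection) -> list:
    conn.execute(DELETE_DUPLICATES)
    conn.commit()
    return conn.execute(
        "SELECT * FROM Hourly_Data ORDER BY Node, Date_Time"
    ).fetchall()
```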

Return only the newest rows from a BigQuery table with duplicate items

I have a table with many duplicate items: many rows with the same id, perhaps with the only difference being a requested_at column.
I'd like to do a select * from the table, but only return one row with the same id – the most recently requested.
I've looked into group by id but then I need to do an aggregate for each column. This is easy with requested_at – max(requested_at) as requested_at – but the others are tough.
How do I make sure I get the value for title, etc that corresponds to that most recently updated row?
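I suggest a form similar to the ROW_NUMBER answer that avoids a sort in the window function. If several rows share the latest timestamp for an id, all of them are returned: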
SELECT *
FROM (
SELECT
*,
MAX(<timestamp_column>)
OVER (PARTITION BY <id_column>)
AS max_timestamp,
FROM <table>
)
WHERE <timestamp_column> = max_timestamp
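To try the MAX-over-partition form locally, here is a sketch assuming SQLite 3.25+ instead of BigQuery and a hypothetical requests table whose id and requested_at columns fill the <id_column> and <timestamp_column> placeholders; the rows are invented, and timestamps are ISO-8601 text so they compare chronologically:

```python
import sqlite3

LATEST = """
SELECT id, title, requested_at
FROM (
    SELECT *,
           MAX(requested_at) OVER (PARTITION BY id) AS max_timestamp
    FROM requests
)
WHERE requested_at = max_timestamp   -- keep the newest row(s) per id
ORDER BY id
"""


def make_demo_db() -> sqlite3.Connection:
    # Hypothetical table; ISO-8601 strings sort in chronological order.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE requests (id INTEGER, title TEXT, requested_at TEXT)")
    conn.executemany(
        "INSERT INTO requests VALUES (?, ?, ?)",
        [
            (1, "old title", "2020-01-01T00:00:00"),
            (1, "new title", "2020-02-01T00:00:00"),
            (2, "only title", "2020-01-15T00:00:00"),
        ],
    )
    return conn


def latest_rows(conn: sqlite3.Connection) -> list:
    return conn.execute(LATEST).fetchall()
```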
Try something like this:
SELECT *
FROM (
SELECT
*,
ROW_NUMBER()
OVER (
PARTITION BY <id_column>
ORDER BY <timestamp_column> DESC)
row_number,
FROM <table>
)
WHERE row_number = 1
Note that it will add a row_number column, which you might not want. To avoid this, select individual columns by name in the outer SELECT statement, or, in BigQuery standard SQL, use SELECT * EXCEPT (row_number).
In your case, it sounds like the requested_at column is the one you want to use in the ORDER BY.
If you are using legacy SQL, you will also want to use allow_large_results, set a destination table, and specify no flattening of results (if you have a schema with repeated fields).
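A sketch of the ROW_NUMBER form, assuming SQLite 3.25+ instead of BigQuery and the same kind of hypothetical requests table with invented rows. The helper column is aliased rn here (an illustrative choice), and the outer query names its columns so rn is not returned. Unlike the MAX-over-partition form, a tie on the latest timestamp still yields exactly one row per id, with no guarantee which one:

```python
import sqlite3

NEWEST = """
SELECT id, title, requested_at       -- named columns, so rn is not returned
FROM (
    SELECT *,
           ROW_NUMBER() OVER (
               PARTITION BY id
               ORDER BY requested_at DESC
           ) AS rn
    FROM requests
)
WHERE rn = 1
ORDER BY id
"""


def make_demo_db() -> sqlite3.Connection:
    # Hypothetical table; id 2 has two rows tied on the latest timestamp.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE requests (id INTEGER, title TEXT, requested_at TEXT)")
    conn.executemany(
        "INSERT INTO requests VALUES (?, ?, ?)",
        [
            (1, "old title", "2020-01-01T00:00:00"),
            (1, "new title", "2020-02-01T00:00:00"),
            (2, "tie a", "2020-01-15T00:00:00"),
            (2, "tie b", "2020-01-15T00:00:00"),
        ],
    )
    return conn


def newest_rows(conn: sqlite3.Connection) -> list:
    return conn.execute(NEWEST).fetchall()
```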

Re-indexing a column with either SQL or PL/SQL

I have several tables that use an ID number plus a column called xsequence, which together form the primary key. Currently, I have a bunch of data that looks like this:
ID_NUMBER,XSEQUENCE
001,2
001,5
001,8
002,1
002,6
What I need to end up with is:
ID_NUMBER,XSEQUENCE
001,1
001,2
001,3
002,1
002,2
What is the best way of going about starting this? Every time I try, I just end up spinning my wheels.
Try something like this:
select id_number,
row_number() over (partition by id_number order by xsequence) new_xsequence
from yourtable
That's an analytic function, and it is really handy for this sort of thing. The PARTITION BY clause "resets" the counter at each id_number (so 1, 2, 3 ... then it starts again at 1, 2, 3 ... and so on).
(PARTITION BY in analytic functions behaves very similarly to GROUP BY, except that it does not collapse the rows.)
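A runnable sketch of that query, assuming SQLite 3.25+ rather than Oracle, with the sample data from the question loaded into a hypothetical yourtable (id_number kept as text so the leading zeros survive):

```python
import sqlite3

QUERY = """
SELECT id_number,
       row_number() OVER (PARTITION BY id_number ORDER BY xsequence)
           AS new_xsequence
FROM yourtable
ORDER BY id_number, new_xsequence
"""


def make_demo_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE yourtable (id_number TEXT, xsequence INTEGER)")
    # Sample rows from the question.
    conn.executemany(
        "INSERT INTO yourtable VALUES (?, ?)",
        [("001", 2), ("001", 5), ("001", 8), ("002", 1), ("002", 6)],
    )
    return conn


def new_sequences(conn: sqlite3.Connection) -> list:
    # The counter restarts at 1 for every id_number partition.
    return conn.execute(QUERY).fetchall()
```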
To UPDATE the original table, I actually prefer the MERGE statement: its syntax is a bit simpler, and it seems a bit more intuitive.
MERGE INTO yourtable base
USING (
select rowid rid,
id_number,
row_number() over (partition by id_number order by xsequence) new_xsequence,
xsequence old_xsequence
from yourtable
) new
ON ( base.rowid = new.rid )
WHEN MATCHED THEN UPDATE
SET base.xsequence = new.new_xsequence
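SQLite has no MERGE statement, so this sketch swaps in a two-step equivalent keyed on SQLite's implicit rowid, much as the Oracle MERGE keys on ROWID: compute the new numbers into a temporary table (renum, a hypothetical name), then UPDATE from it. It assumes SQLite 3.25+ and creates yourtable without the composite primary key, because SQLite checks uniqueness row by row during an UPDATE and renumbering in place could otherwise hit a transient duplicate.

```python
import sqlite3

RENUMBER = """
-- Step 1: compute each row's new sequence, keyed by rowid.
CREATE TEMP TABLE renum AS
SELECT rowid AS rid,
       row_number() OVER (PARTITION BY id_number ORDER BY xsequence)
           AS new_xsequence
FROM yourtable;

-- Step 2: write the new numbers back to the matching rows.
UPDATE yourtable
SET xsequence = (SELECT new_xsequence FROM renum
                 WHERE renum.rid = yourtable.rowid);

DROP TABLE renum;
"""


def make_demo_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    # No PRIMARY KEY, to avoid transient uniqueness conflicts during the UPDATE.
    conn.execute("CREATE TABLE yourtable (id_number TEXT, xsequence INTEGER)")
    conn.executemany(
        "INSERT INTO yourtable VALUES (?, ?)",
        [("001", 2), ("001", 5), ("001", 8), ("002", 1), ("002", 6)],
    )
    conn.commit()
    return conn


def renumber(conn: sqlite3.Connection) -> list:
    conn.executescript(RENUMBER)
    return conn.execute(
        "SELECT id_number, xsequence FROM yourtable ORDER BY id_number, xsequence"
    ).fetchall()
```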