How to get an incremental "RowId" column in SELECT using ROW_NUMBER() - sql

I've been trying to update a query on the DataExplorer that we use on our Gaming SE Site for keeping track of tags without excerpts to include an incremental row number in the results to make reading the returned values easier. There are a number of questions on here that discuss how to do this, such as this one and this one which appear to have worked for those users, but I can't seem to get it work for my situation.
To be clear, I would like something like this:
RowId | TagName | Count | Easy List Formatting
----------------------------------------------
1 | Tag1 | 6 | 1. [tag:tag1] (6)
2 | Tag2 | 6 | 1. [tag:tag2] (6)
3 | Tag3 | 5 | 1. [tag:tag3] (5)
4 | Tag4 | 5 | 1. [tag:tag4] (5)
What I've come up with so far is this:
SELECT ROW_NUMBER() OVER(PARTITION BY TagInfo.[Count] ORDER BY TagInfo.TagName ASC) AS RowId, *
FROM
(
SELECT
TagName,
[Count],
concat('1. [tag:',concat(TagName,concat('] (', concat([Count],')')))) AS [Easy List Formatting]
FROM Tags
LEFT JOIN Posts pe on pe.Id = Tags.ExcerptPostId
LEFT JOIN TagSynonyms on SourceTagName = Tags.TagName
WHERE coalesce(len(pe.Body),0) = 0 and ApprovalDate is null
) AS TagInfo
ORDER BY TagInfo.[Count] DESC, TagInfo.TagName
This yields something close to what I want, but not quite. The RowId column increments, but once the Count column changes, it resets (presumably because of the PARTITION BY). But, if I remove the PARTITION BY, the RowId column becomes what appear to be random numbers.
Is what I want to do achievable given the way the tables are structured? If so, what should the SQL be?
To access the forked query, you can use this link. The original query (before my changes) can be found here if it helps in anyway.

Removing the PARTITION BY is exactly what is needed. The reason that your numbers look random is that the ORDER BY of the outer query is different from the ORDER BY of your ROW_NUMBER(). All you have to do is make those the same, and the output of the sequence project will have the monotonically increasing value you expect.
Specifically:
SELECT ROW_NUMBER() OVER (ORDER BY TagInfo.[Count] DESC, TagInfo.TagName) AS RowId, *
FROM
(
...
) AS TagInfo
ORDER BY TagInfo.[Count] DESC, TagInfo.TagName
Now you aren't partitioning, and the two ORDER BY clauses match, so you'll get your expected output.
For what it's worth, you technically don't really even care about having an ORDER BY in the ROW_NUMBER(), you just want the same order as the final result set. In that case, you can trick the query engine like so by providing a meaningless ORDER BY clause in the ROW_NUMBER():
SELECT ROW_NUMBER() OVER (ORDER BY (SELECT 1)) AS RowId
Boom, done!

A little way around i used its, i add a column to the original table that has a increment value by 1 for example
ALTER TABLE ur_table ADD id INT IDENTITY(1,1)
GO
after that you do the query with order by column id
select * from ur_table (query) order by id

Related

Identify duplicate fields in a table

I'm trying to identify specific fields that are duplicated in a table in a mariadb-10.4.20 Joomla database. I would like to identify all rows that have a specific field duplicated, then ultimately be able to remove those duplicates, leaving just the one with the highest ID.
This table contains the IDs, titles and aliases for the articles in a joomla website. The script I'm building (in perl) will use this information to print the primary title alias and create redirects for any others.
I was previously using "group by" but it appears there's been a change recently in how it's used, and now it doesn't work properly. I don't understand the new format, and I'm not even sure it was previously working fully.
Here's a basic query that shows there are two of the same articles with different IDs:
MariaDB [mydb]> select id,alias,title from db1_content where title = "article title";
+--------+---------------+--------------+
| id | alias | title |
+--------+---------------+--------------+
| 299959 | unique-title | Unique Title |
| 300026 | unique-title | Unique Title |
+--------+------------------------------+
Here's an attempt at trying to use "group by" but it returns no results.
MariaDB [mydb]> select id,title,count(title) from db1_content group by id,title having count(title) > 1;
Empty set (0.230 sec)
If I run the same query without the id field, then it does return a list of all titles that are duplicated, along with the number of occurrences of each title.
That's not exactly what I want, though. I need it to print the id, alias and title fields so I can reference them in my perl script to subsequently perform another query to ultimately delete the duplicates and create links to be used in RewriteRules.
What am I doing wrong?
Since MariaDB cannot currently delete from a CTE, you could use a derived table to generate row numbers for each title ordered by id descending, JOIN that to your main table and then delete any row which has a row number greater than 1. For example:
DELETE db1 FROM db1_content db1
JOIN (
SELECT id,
ROW_NUMBER() OVER (PARTITION BY title ORDER BY id DESC) AS rn
FROM db1_content
) dbr ON db1.id = dbr.id
WHERE dbr.rn > 1
If you don't want to actually delete the records using SQL, you can just select the ones that need to be deleted by using a CTE:
WITH rns AS (
SELECT *,
ROW_NUMBER() OVER (PARTITION BY title ORDER BY id DESC) AS rn
FROM db1_content
)
SELECT id, alias, title
FROM rns
WHERE rn > 1
Demo on dbfiddle

sql select record with lowest value of the two

Despite my internet searching, I've not found a solution to what I think is a simple SQL problem.
I have a simple table as such:
zip | location | transit
------------------------
10001 | 1 | 5
10001 | 2 | 2
This table of course has a large number of zip codes, but I'd like to make s simple query by zip code and instead of returning all rows with the zip, return only a single row (with all 3 columns), that contains the lowest transit value.
I've been playing with the aggregate function min(), but haven't gotten it right.
Using Postgres SQL DB 9.6
Thanks!
Use ORDER BY along with LIMIT :
SELECT t.*
FROM mytable t
WHERE t.zipcode = ?
ORDER BY t.transit
LIMIT 1
How about
select * from table where zip = ‘10001’ order by transit limit 1
I would use distinct on:
select distinct on (zip) t.*
from t
order by zip, transit;
This is usually the most efficient method in Postgres, particularly with an index on (zip, transit).
Of course if you have only one zip code that you care about, then where/order by/limit is also totally reasonable.
Assuming that you also want to return the location value associated with the minimum transit value, then here is one possible solution using an inner join:
select t.*
from
yourtable t inner join
(select u.zip, min(u.transit) as mt from yourtable u group by u.zip) v
on t.zip = v.zip and t.transit = v.mt
Change all references to yourtable to the name of your table.

How can I fetch the last N rows, WITHOUT ordering the table

I have tables with multiple million rows and need to fetch the last rows of specific ID's
for example the last row which has device_id = 123 AND the last row which has device_id = 1234
because the tables are so huge and ordering takes so much time, is it possible to select the last 200 without ordering the table and then just order those 200 and fetch the rows I need.
How would I do that?
Thank you in advance for your help!
UPDATE
My PostgreSQL version is 9.2.1
sample data:
time device_id data data ....
"2013-03-23 03:58:00-04" | "001EC60018E36" | 66819.59 | 4.203
"2013-03-23 03:59:00-04" | "001EC60018E37" | 64277.22 | 4.234
"2013-03-23 03:59:00-04" | "001EC60018E23" | 46841.75 | 2.141
"2013-03-23 04:00:00-04" | "001EC60018E21" | 69697.38 | 4.906
"2013-03-23 04:00:00-04" | "001EC600192524"| 69452.69 | 2.844
"2013-03-23 04:01:00-04" | "001EC60018E21" | 69697.47 | 5.156
....
See SQLFiddle of this data
So if device_id = 001EC60018E21
I would want the most recent row with that device_id.
It is a grantee that the last row with that device_id is the row I want, but it may or may not be the last row of the table.
Personally I'd create a composite index on device_id and descending time:
CREATE INDEX table1_deviceid_time ON table1("device_id","time" DESC);
then I'd use a subquery to find the highest time for each device_id and join the subquery results against the main table on device_id and time to find the relevant data, eg:
SELECT t1."device_id", t1."time", t1."data", t1."data1"
FROM Table1 t1
INNER JOIN (
SELECT t1b."device_id", max(t1b."time") FROM Table1 t1b GROUP BY t1b."device_id"
) last_ids("device_id","time")
ON (t1."device_id" = last_ids."device_id"
AND t1."time" = last_ids."time");
See this SQLFiddle.
It might be helpful to maintain a trigger-based materialized view of the highest timestamp for each device ID. However, this will cause concurrency issues if most than one connection can insert data for a given device ID due to the connections fighting for update locks. It's also a pain if you don't know when new device IDs will appear as you have to do an upsert - something that's very inefficient and clumsy. Additionally, the extra write load and autovacuum work created by the summary table may not be worth it; it might be better to just pay the price of the more expensive query.
BTW, time is a terrible name for a column because it's a built-in data type name. Use something more appropriate if you can.
The general way to get the "last" row for each device_id looks like this.
select *
from Table1
inner join (select device_id, max(time) max_time
from Table1
group by device_id) T2
on Table1.device_id = T2.device_id
and Table1.time = T2.max_time;
Getting the "last" 200 device_id numbers without using an ORDER BY isn't really practical, but it's not clear why you might want to do that in the first place. If 200 is an arbitrary number, then you can get better performance by taking a subset of the table that's based on an arbitrary time instead.
select *
from Table1
inner join (select device_id, max(time) max_time
from Table1
where time > '2013-03-23 12:03'
group by device_id) T2
on Table1.device_id = T2.device_id
and Table1.time = T2.max_time;

Getting the last record in SQL in WHERE condition

i have loanTable that contain two field loan_id and status
loan_id status
==============
1 0
2 9
1 6
5 3
4 5
1 4 <-- How do I select this??
4 6
In this Situation i need to show the last Status of loan_id 1 i.e is status 4. Can please help me in this query.
Since the 'last' row for ID 1 is neither the minimum nor the maximum, you are living in a state of mild confusion. Rows in a table have no order. So, you should be providing another column, possibly the date/time when each row is inserted, to provide the sequencing of the data. Another option could be a separate, automatically incremented column which records the sequence in which the rows are inserted. Then the query can be written.
If the extra column is called status_id, then you could write:
SELECT L1.*
FROM LoanTable AS L1
WHERE L1.Status_ID = (SELECT MAX(Status_ID)
FROM LoanTable AS L2
WHERE L2.Loan_ID = 1);
(The table aliases L1 and L2 could be omitted without confusing the DBMS or experienced SQL programmers.)
As it stands, there is no reliable way of knowing which is the last row, so your query is unanswerable.
Does your table happen to have a primary id or a timestamp? If not then what you want is not really possible.
If yes then:
SELECT TOP 1 status
FROM loanTable
WHERE loan_id = 1
ORDER BY primaryId DESC
-- or
-- ORDER BY yourTimestamp DESC
I assume that with "last status" you mean the record that was inserted most recently? AFAIK there is no way to make such a query unless you add timestamp into your table where you store the date and time when the record was added. RDBMS don't keep any internal order of the records.
But if last = last inserted, that's not possible for current schema, until a PK addition:
select top 1 status, loan_id
from loanTable
where loan_id = 1
order by id desc -- PK
Use a data reader. When it exits the while loop it will be on the last row. As the other posters stated unless you put a sort on the query, the row order could change. Even if there is a clustered index on the table it might not return the rows in that order (without a sort on the clustered index).
SqlDataReader rdr = SQLcmd.ExecuteReader();
while (rdr.Read())
{
}
string lastVal = rdr[0].ToString()
rdr.Close();
You could also use a ROW_NUMBER() but that requires a sort and you cannot use ROW_NUMBER() directly in the Where. But you can fool it by creating a derived table. The rdr solution above is faster.
In oracle database this is very simple.
select * from (select * from loanTable order by rownum desc) where rownum=1
Hi if this has not been solved yet.
To get the last record for any field from a table the easiest way would be to add an ID to each record say pID. Also say that in your table you would like to hhet the last record for each 'Name', run the simple query
SELECT Name, MAX(pID) as LastID
INTO [TableName]
FROM [YourTableName]
GROUP BY [Name]/[Any other field you would like your last records to appear by]
You should now have a table containing the Names in one column and the last available ID for that Name.
Now you can use a join to get the other details from your primary table, say this is some price or date then run the following:
SELECT a.*,b.Price/b.date/b.[Whatever other field you want]
FROM [TableName] a LEFT JOIN [YourTableName]
ON a.Name = b.Name and a.LastID = b.pID
This should then give you the last records for each Name, for the first record run the same queries as above just replace the Max by Min above.
This should be easy to follow and should run quicker as well
If you don't have any identifying columns you could use to get the insert order. You can always do it like this. But it's hacky, and not very pretty.
select
t.row1,
t.row2,
ROW_NUMBER() OVER (ORDER BY t.[count]) AS rownum from (
select
tab.row1,
tab.row2,
1 as [count]
from table tab) t
So basically you get the 'natural order' if you can call it that, and add some column with all the same data. This can be used to sort by the 'natural order', giving you an opportunity to place a row number column on the next query.
Personally, if the system you are using hasn't got a time stamp/identity column, and the current users are using the 'natural order', I would quickly add a column and use this query to create some sort of time stamp/incremental key. Rather than risking having some automation mechanism change the 'natural order', breaking the data needed.
I think this code may help you:
WITH cte_Loans
AS
(
SELECT LoanID
,[Status]
,ROW_NUMBER() OVER(ORDER BY (SELECT 1)) AS RN
FROM LoanTable
)
SELECT LoanID
,[Status]
FROM LoanTable L1
WHERE RN = ( SELECT max(RN)
FROM LoanTable L2
WHERE L2.LoanID = L1.LoanID)

Get last record of a table in Postgres

I'm using Postgres and cannot manage to get the last record of my table:
my_query = client.query("SELECT timestamp,value,card from my_table");
How can I do that knowning that timestamp is a unique identifier of the record ?
If under "last record" you mean the record which has the latest timestamp value, then try this:
my_query = client.query("
SELECT TIMESTAMP,
value,
card
FROM my_table
ORDER BY TIMESTAMP DESC
LIMIT 1
");
you can use
SELECT timestamp, value, card
FROM my_table
ORDER BY timestamp DESC
LIMIT 1
assuming you want also to sort by timestamp?
Easy way: ORDER BY in conjunction with LIMIT
SELECT timestamp, value, card
FROM my_table
ORDER BY timestamp DESC
LIMIT 1;
However, LIMIT is not standard and as stated by Wikipedia, The SQL standard's core functionality does not explicitly define a default sort order for Nulls.. Finally, only one row is returned when several records share the maximum timestamp.
Relational way:
The typical way of doing this is to check that no row has a higher timestamp than any row we retrieve.
SELECT timestamp, value, card
FROM my_table t1
WHERE NOT EXISTS (
SELECT *
FROM my_table t2
WHERE t2.timestamp > t1.timestamp
);
It is my favorite solution, and the one I tend to use. The drawback is that our intent is not immediately clear when having a glimpse on this query.
Instructive way: MAX
To circumvent this, one can use MAX in the subquery instead of the correlation.
SELECT timestamp, value, card
FROM my_table
WHERE timestamp = (
SELECT MAX(timestamp)
FROM my_table
);
But without an index, two passes on the data will be necessary whereas the previous query can find the solution with only one scan. That said, we should not take performances into consideration when designing queries unless necessary, as we can expect optimizers to improve over time. However this particular kind of query is quite used.
Show off way: Windowing functions
I don't recommend doing this, but maybe you can make a good impression on your boss or something ;-)
SELECT DISTINCT
first_value(timestamp) OVER w,
first_value(value) OVER w,
first_value(card) OVER w
FROM my_table
WINDOW w AS (ORDER BY timestamp DESC);
Actually this has the virtue of showing that a simple query can be expressed in a wide variety of ways (there are several others I can think of), and that picking one or the other form should be done according to several criteria such as:
portability (Relational/Instructive ways)
efficiency (Relational way)
expressiveness (Easy/Instructive way)
If your table has no Id such as integer auto-increment, and no timestamp, you can still get the last row of a table with the following query.
select * from <tablename> offset ((select count(*) from <tablename>)-1)
For example, that could allow you to search through an updated flat file, find/confirm where the previous version ended, and copy the remaining lines to your table.
The last inserted record can be queried using this assuming you have the "id" as the primary key:
SELECT timestamp,value,card FROM my_table WHERE id=(select max(id) from my_table)
Assuming every new row inserted will use the highest integer value for the table's id.
If you accept a tip, create an id in this table like serial. The default of this field will be:
nextval('table_name_field_seq'::regclass).
So, you use a query to call the last register. Using your example:
pg_query($connection, "SELECT currval('table_name_field_seq') AS id;
I hope this tip helps you.
To get the last row,
Get Last row in the sorted order: In case the table has a column specifying time/primary key,
Using LIMIT clause
SELECT * FROM USERS ORDER BY CREATED_TIME DESC LIMIT 1;
Using FETCH clause - Reference
SELECT * FROM USERS ORDER BY CREATED_TIME FETCH FIRST ROW ONLY;
Get Last row in the rows insertion order: In case the table has no columns specifying time/any unique identifiers
Using CTID system column, where ctid represents the physical location of the row in a table - Reference
SELECT * FROM USERS WHERE CTID = (SELECT MAX(CTID) FROM USERS);
Consider the following table,
userid |username | createdtime |
1 | A | 1535012279455 |
2 | B | 1535042279423 | //as per created time, this is the last row
3 | C | 1535012279443 |
4 | D | 1535012212311 |
5 | E | 1535012254634 | //as per insertion order, this is the last row
The query 1 and 2 returns,
userid |username | createdtime |
2 | B | 1535042279423 |
while 3 returns,
userid |username | createdtime |
5 | E | 1535012254634 |
Note : On updating an old row, it removes the old row and updates the data and inserts as a new row in the table. So using the following query returns the tuple on which the data modification is done at the latest.
Now updating a row, using
UPDATE USERS SET USERNAME = 'Z' WHERE USERID='3'
the table becomes as,
userid |username | createdtime |
1 | A | 1535012279455 |
2 | B | 1535042279423 |
4 | D | 1535012212311 |
5 | E | 1535012254634 |
3 | Z | 1535012279443 |
Now the query 3 returns,
userid |username | createdtime |
3 | Z | 1535012279443 |
Use the following
SELECT timestamp, value, card
FROM my_table
ORDER BY timestamp DESC
LIMIT 1
These are all good answers but if you want an aggregate function to do this to grab the last row in the result set generated by an arbitrary query, there's a standard way to do this (taken from the Postgres wiki, but should work in anything conforming reasonably to the SQL standard as of a decade or more ago):
-- Create a function that always returns the last non-NULL item
CREATE OR REPLACE FUNCTION public.last_agg ( anyelement, anyelement )
RETURNS anyelement LANGUAGE SQL IMMUTABLE STRICT AS $$
SELECT $2;
$$;
-- And then wrap an aggregate around it
CREATE AGGREGATE public.LAST (
sfunc = public.last_agg,
basetype = anyelement,
stype = anyelement
);
It's usually preferable to do select ... limit 1 if you have a reasonable ordering, but this is useful if you need to do this within an aggregate and would prefer to avoid a subquery.
See also this question for a case where this is the natural answer.
The column name plays an important role in the descending order:
select <COLUMN_NAME1, COLUMN_NAME2> from >TABLENAME> ORDER BY <COLUMN_NAME THAT MENTIONS TIME> DESC LIMIT 1;
For example: The below-mentioned table(user_details) consists of the column name 'created_at' that has timestamp for the table.
SELECT userid, username FROM user_details ORDER BY created_at DESC LIMIT 1;
In Oracle SQL,
select * from (select row_number() over (order by rowid desc) rn, emp.* from emp) where rn=1;
select * from table_name LIMIT 1;