How to return record count in PostgreSQL

I have a query with a limit and an offset. For example:
select * from tbl
limit 10 offset 100;
How to keep track of the count of the records, without running a second query like:
select count(*) from tbl;
I think this answers my question, but I need it for PostgreSQL. Any ideas?

I have found a solution and want to share it. I create a temp table from my real table with the filters applied, then select from the temp table with a limit and offset (there are no filters to re-apply, so the performance is good), then select count(*) from the temp table (again without filters), then run the other stuff I need, and finally drop the temp table.
select * into tmp_tbl from tbl where [limitations];
select * from tmp_tbl offset 10 limit 10;
select count(*) from tmp_tbl;
select other_stuff from tmp_tbl;
drop table tmp_tbl;
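The steps above can be sketched end to end. This is only an illustration using Python's built-in sqlite3 module as a stand-in for PostgreSQL; the `category` column, the filter, and the sample data are assumptions made for the example, not part of the original schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl (id INTEGER PRIMARY KEY, category TEXT)")
conn.executemany(
    "INSERT INTO tbl (category) VALUES (?)",
    [("a" if i % 2 == 0 else "b",) for i in range(100)],
)

# Step 1: materialize the filtered rows once (the filter is hypothetical).
conn.execute("CREATE TEMP TABLE tmp_tbl AS SELECT * FROM tbl WHERE category = 'a'")

# Step 2: page through the temp table; the filter is not re-applied.
page = conn.execute(
    "SELECT * FROM tmp_tbl ORDER BY id LIMIT 10 OFFSET 10"
).fetchall()

# Step 3: count every filtered row, not just the current page.
total = conn.execute("SELECT count(*) FROM tmp_tbl").fetchone()[0]

# Step 4: clean up.
conn.execute("DROP TABLE tmp_tbl")

print(len(page), total)
```

The page and the total come from the same materialized set, so they stay consistent with each other even if the base table changes in between.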

I haven't tried this, but from the section titled Obtaining the Result Status in the documentation you can use the GET DIAGNOSTICS command to determine the effect of a command.
GET DIAGNOSTICS number_of_rows = ROW_COUNT;
From the documentation:
This command allows retrieval of system status indicators. Each item
is a key word identifying a state value to be assigned to the
specified variable (which should be of the right data type to receive
it). The currently available status items are ROW_COUNT, the number of
rows processed by the last SQL command sent down to the SQL engine,
and RESULT_OID, the OID of the last row inserted by the most recent
SQL command. Note that RESULT_OID is only useful after an INSERT
command into a table containing OIDs.

Depends if you need it from the psql CLI or if you're accessing the database from something like an HTTP server. I am using postgres from my Node server with node-postgres. The result set is returned as an array called 'rows' on the result object so I can just do
console.log(results.rows.length)
To get the row count.

Related

Explicitly set ROWNUM in column

I'm trying to split what was a large table update into multiple inserts into working tables. One of the queries needs to use the row number. On an INSERT in Oracle, can I explicitly add the ROWNUM as an explicit column? This is a working table ultimately used in a reporting operation with a nasty partition over clause, and having a true row number is helpful.
create table MY_TABLE(KEY number,SOMEVAL varchar2(30),EXPLICIT_ROW_NUMBER NUMBER);
INSERT /*+PARALLEL(AUTO) */ INTO MY_TABLE(KEY,SOMEVAL,EXPLICIT_ROW_NUMBER) (
SELECT /*+PARALLEL(AUTO) */ KEY,SOMEVAL,ROWNUM
FROM PREVIOUS_VERSION_OF_MY_TABLE
);
where PREVIOUS_VERSION_OF_MY_TABLE has both a KEY and SOMEVAL fields.
I'd like it to number the rows in the order that the inner select statement does it. So, the first row in the select, had it been explicitly run, would have a ROWNUM of 1, etc. I don't want it reversed, etc.
The table above has over 80MM records. Originally I used an UPDATE, and when I ran it, I got some ORA error saying that I ran out of UNDO space. I do not have the exact error message at this point anymore.
I'm trying to accomplish the same thing with multiple working tables that I would have done with one or more updates. Apparently it is either hard, impossible, etc to add UNDO space, for this query (our company DB team says), without making me a DBA, or spending about $100 on a hard drive and attaching it to the instance. So I need to write a harder query to get around this limitation. The goal is to have a session id and timestamps within that session, but for each timestamp within a session (except the last timestamp), show the next session. The original query is included below:
update sc_hub_session_activity schat
set session_time_stamp_rank = (
select /*+parallel(AUTO) */ order_number
from (
select /*+parallel(AUTO) */ schat_all.explicit_row_number as explicit_row_number,row_number() over (partition by schat_all.session_key order by schat_all.session_key,schat_all.time_stamp) as order_number
from sc_hub_session_activity schat_all
where schat_all.session_key=schat.session_key
) schat_all_group
where schat.explicit_row_number = schat_all_group.explicit_row_number
);
commit;
update sc_hub_session_activity schat
set session_next_time_stamp = (
select /*+parallel(AUTO) */ time_stamp
from sc_hub_session_activity schat2
where (schat2.session_time_stamp_rank = schat.session_time_stamp_rank+1) and (schat2.session_key = schat.session_key)
);
commit;

Get Id from a conditional INSERT

For a table like this one:
CREATE TABLE Users(
id SERIAL PRIMARY KEY,
name TEXT UNIQUE
);
What would be the correct one-query insert for the following operation:
Given a user name, insert a new record and return the new id. But if the name already exists, just return the id.
I am aware of the new syntax within PostgreSQL 9.5 for ON CONFLICT(column) DO UPDATE/NOTHING, but I can't figure out how, if at all, it can help, given that I need the id to be returned.
It seems that RETURNING id and ON CONFLICT do not belong together.
The UPSERT implementation is hugely complex to be safe against concurrent write access. Take a look at this Postgres Wiki that served as log during initial development. The Postgres hackers decided not to include "excluded" rows in the RETURNING clause for the first release in Postgres 9.5. They might build something in for the next release.
This is the crucial statement in the manual to explain your situation:
The syntax of the RETURNING list is identical to that of the output
list of SELECT. Only rows that were successfully inserted or updated
will be returned. For example, if a row was locked but not updated
because an ON CONFLICT DO UPDATE ... WHERE clause condition was not
satisfied, the row will not be returned.
Bold emphasis mine.
For a single row to insert:
Without concurrent write load on the same table
WITH ins AS (
INSERT INTO users(name)
VALUES ('new_usr_name') -- input value
ON CONFLICT(name) DO NOTHING
RETURNING users.id
)
SELECT id FROM ins
UNION ALL
SELECT id FROM users -- 2nd SELECT never executed if INSERT successful
WHERE name = 'new_usr_name' -- input value a 2nd time
LIMIT 1;
With possible concurrent write load on the table
Consider this instead (for single row INSERT):
Is SELECT or INSERT in a function prone to race conditions?
To insert a set of rows:
How to use RETURNING with ON CONFLICT in PostgreSQL?
How to include excluded rows in RETURNING from INSERT ... ON CONFLICT
All three with very detailed explanation.
For a single row insert and no update:
with i as (
insert into users (name)
select 'the name'
where not exists (
select 1
from users
where name = 'the name'
)
returning id
)
select id
from users
where name = 'the name'
union all
select id from i
The manual about the primary and the with subqueries parts:
The primary query and the WITH queries are all (notionally) executed at the same time
Although that sounds to me "same snapshot" I'm not sure since I don't know what notionally means in that context.
But there is also:
The sub-statements in WITH are executed concurrently with each other and with the main query. Therefore, when using data-modifying statements in WITH, the order in which the specified updates actually happen is unpredictable. All the statements are executed with the same snapshot
If I understand correctly that same snapshot bit prevents a race condition. But again I'm not sure if by all the statements it refers only to the statements in the with subqueries excluding the main query. To avoid any doubt move the select in the previous query to a with subquery:
with s as (
select id
from users
where name = 'the name'
), i as (
insert into users (name)
select 'the name'
where not exists (select 1 from s)
returning id
)
select id from s
union all
select id from i
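The "insert only if absent, then read the id" idea from the queries above can be tried out in a small sketch. It uses Python's sqlite3 module as a stand-in, and it issues two statements rather than one CTE because data-modifying WITH clauses are a PostgreSQL feature. The function name `get_or_create_user_id` is hypothetical, and the sketch makes no claim about safety under concurrent writers.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT UNIQUE)")


def get_or_create_user_id(conn, name):
    """Return the id for name, inserting a row only if none exists yet."""
    # Insert only when no row with this name is present.
    conn.execute(
        "INSERT INTO users (name) "
        "SELECT ? WHERE NOT EXISTS (SELECT 1 FROM users WHERE name = ?)",
        (name, name),
    )
    # Either the pre-existing row or the one just inserted is now visible.
    row = conn.execute("SELECT id FROM users WHERE name = ?", (name,)).fetchone()
    return row[0]


first = get_or_create_user_id(conn, "the name")
second = get_or_create_user_id(conn, "the name")    # no second row is created
other = get_or_create_user_id(conn, "another name")
print(first, second, other)
```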

How to insert generated id into a results table

I have the following query
SELECT q.pol_id
FROM quot q
,fgn_clm_hist fch
WHERE q.quot_id = fch.quot_id
UNION
SELECT q.pol_id
FROM tdb2wccu.quot q
WHERE q.nr_prr_ls_yr_cov IS NOT NULL
For every row in that result set, I want to create a new row in another table (call it table1) and update pol_id in the quot table (from the above result set) with the generated primary key from the inserted row in table1.
table1 has two columns. id and timestamp.
I'm using db2 10.1.
I've tried numerous things and have been unsuccessful for quite a while. Thanks!
Simple solution: create a new table for the result set of your query, which has an identity column in it. Then, after running your query, update the pol_id field with the newly generated ID in your result table.
Alternatively, you can do it more manually by using the ROW_NUMBER() OLAP function, which I have often found convenient for creating IDs. A stored procedure is a convenient place to do the following:
get the maximum old id from Table1 and write it into a variable old_max_id.
after generating the result set, write the row-numbers into the table1, maybe by something like
INSERT INTO TABLE1
SELECT ROW_NUMBER() OVER (ORDER BY <whatever-you-want>) -- no PARTITION BY, so numbering runs across the whole result set
+ OLD_MAX_ID
, CURRENT TIMESTAMP
FROM (<here comes your SQL query>)
Either write the result set into a table or return a cursor to it. Here you should either use the same ROW_NUMBER statement as above or directly use the ID from Table1.
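The stored-procedure steps above can be sketched as follows. This is an illustration only, written with Python's sqlite3 module rather than DB2; the simplified `quot` and `table1` layouts and the sample rows are assumptions, and the row numbering is done with `enumerate` in place of ROW_NUMBER().

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (id INTEGER PRIMARY KEY, ts TEXT);
    CREATE TABLE quot (quot_id INTEGER PRIMARY KEY, pol_id INTEGER);
    INSERT INTO table1 (id, ts) VALUES (1, '2020-01-01'), (2, '2020-01-02');
    INSERT INTO quot (quot_id, pol_id) VALUES (10, 100), (11, 200), (12, 300);
""")

# Step 1: remember the largest id already present in table1.
old_max_id = conn.execute("SELECT COALESCE(MAX(id), 0) FROM table1").fetchone()[0]

# Step 2: the result set that needs new ids (a stand-in for the UNION query).
quot_rows = conn.execute("SELECT quot_id, pol_id FROM quot ORDER BY pol_id").fetchall()
pol_ids = sorted({pol_id for _, pol_id in quot_rows})

# Step 3: row number + old_max_id becomes the new id; keep the mapping.
mapping = {}
for row_number, pol_id in enumerate(pol_ids, start=1):
    new_id = old_max_id + row_number
    conn.execute("INSERT INTO table1 (id, ts) VALUES (?, CURRENT_TIMESTAMP)", (new_id,))
    mapping[pol_id] = new_id

# Step 4: write the generated ids back into quot, keyed by quot_id.
conn.executemany(
    "UPDATE quot SET pol_id = ? WHERE quot_id = ?",
    [(mapping[pol_id], quot_id) for quot_id, pol_id in quot_rows],
)
print(mapping)
```

Updating by `quot_id` rather than by the old `pol_id` avoids accidentally re-updating a row whose new id happens to equal another row's old one.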

Equivalent of Oracle's RowID in SQL Server

What's the equivalent of Oracle's RowID in SQL Server?
From the Oracle docs
ROWID Pseudocolumn
For each row in the database, the ROWID pseudocolumn returns the
address of the row. Oracle Database rowid values contain information
necessary to locate a row:
The data object number of the object
The data block in the datafile in which the row resides
The position of the row in the data block (first row is 0)
The datafile in which the row resides (first file is 1). The file
number is relative to the tablespace.
The closest equivalent to this in SQL Server is the rid which has three components File:Page:Slot.
In SQL Server 2008 it is possible to use the undocumented and unsupported %%physloc%% virtual column to see this. This returns a binary(8) value with the Page ID in the first four bytes, then 2 bytes for File ID, followed by 2 bytes for the slot location on the page.
The scalar function sys.fn_PhysLocFormatter or the sys.fn_PhysLocCracker TVF can be used to convert this into a more readable form
CREATE TABLE T(X INT);
INSERT INTO T VALUES(1),(2)
SELECT %%physloc%% AS [%%physloc%%],
sys.fn_PhysLocFormatter(%%physloc%%) AS [File:Page:Slot]
FROM T
Example Output
+--------------------+----------------+
| %%physloc%% | File:Page:Slot |
+--------------------+----------------+
| 0x2926020001000000 | (1:140841:0) |
| 0x2926020001000100 | (1:140841:1) |
+--------------------+----------------+
Note that this is not leveraged by the query processor. Whilst it is possible to use this in a WHERE clause
SELECT *
FROM T
WHERE %%physloc%% = 0x2926020001000100
SQL Server will not directly seek to the specified row. Instead it will do a full table scan, evaluate %%physloc%% for each row and return the one that matches (if any do).
To reverse the process carried out by the 2 previously mentioned functions and get the binary(8) value corresponding to known File,Page,Slot values the below can be used.
DECLARE @FileId int = 1,
@PageId int = 338,
@Slot int = 3
SELECT CAST(REVERSE(CAST(@PageId AS BINARY(4))) AS BINARY(4)) +
CAST(REVERSE(CAST(@FileId AS BINARY(2))) AS BINARY(2)) +
CAST(REVERSE(CAST(@Slot AS BINARY(2))) AS BINARY(2))
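The byte layout described above (four bytes of page ID, two of file ID, two of slot, each byte-reversed) can be checked outside the database. The following is a small sketch in Python using the standard struct module; the helper names `pack_physloc` and `unpack_physloc` are made up for this example, and the little-endian reading is inferred from the sample output and the REVERSE calls rather than from any official specification.

```python
import struct


def pack_physloc(file_id, page_id, slot):
    """Pack File:Page:Slot into 8 bytes: page (4), file (2), slot (2), little-endian."""
    return struct.pack("<IHH", page_id, file_id, slot)


def unpack_physloc(raw):
    """Return (file_id, page_id, slot) decoded from the 8-byte value."""
    page_id, file_id, slot = struct.unpack("<IHH", raw)
    return file_id, page_id, slot


raw = bytes.fromhex("2926020001000100")   # second row of the example output
print(unpack_physloc(raw))                # (1, 140841, 1)
print(pack_physloc(1, 140841, 1).hex())   # 2926020001000100
```

The decoded triple matches the (1:140841:1) shown by sys.fn_PhysLocFormatter for the same row.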
I have to dedupe a very big table with many columns and speed is important. Thus I use this method which works for any table:
delete T from
(select Row_Number() Over(Partition By BINARY_CHECKSUM(*) order by %%physloc%% ) As RowNumber, * From MyTable) T
Where T.RowNumber > 1
If you want to uniquely identify a row within the table rather than your result set, then you need to look at using something like an IDENTITY column. See "IDENTITY property" in the SQL Server help. SQL Server does not auto-generate an ID for each row in the table as Oracle does, so you have to go to the trouble of creating your own ID column and explicitly fetch it in your query.
EDIT: for dynamic numbering of result set rows see below, but that would probably be an equivalent of Oracle's ROWNUM, and I assume from all the comments on the page that you want the stuff above.
For SQL Server 2005 and later you can use the new ranking functions to achieve dynamic numbering of rows.
For example I do this on a query of mine:
select row_number() over (order by rn_execution_date asc) as 'Row Number', rn_execution_date as 'Execution Date', count(*) as 'Count'
from td.run
where rn_execution_date >= '2009-05-19'
group by rn_execution_date
order by rn_execution_date asc
Will give you:
Row Number Execution Date Count
---------- ----------------- -----
1 2009-05-19 00:00:00.000 280
2 2009-05-20 00:00:00.000 269
3 2009-05-21 00:00:00.000 279
There's also an article on support.microsoft.com on dynamically numbering rows.
Check out the new ROW_NUMBER function. It works like this:
SELECT ROW_NUMBER() OVER (ORDER BY EMPID ASC) AS ROWID, * FROM EMPLOYEE
Several of the answers above will work around the lack of a direct reference to a specific row, but will not work if changes occur to the other rows in a table. That is my criterion for which answers fall technically short.
A common use of Oracle's ROWID is to provide a (somewhat) stable method of selecting rows and later returning to the row to process it (e.g., to UPDATE it). The method of finding a row (complex joins, full-text searching, or browsing row-by-row and applying procedural tests against the data) may not be easily or safely re-used to qualify the UPDATE statement.
The SQL Server RID seems to provide the same functionality, but does not provide the same performance. That is the only issue I see, and unfortunately the purpose of retaining a ROWID is to avoid repeating an expensive operation to find the row in, say, a very large table. Nonetheless, performance for many cases is acceptable. If Microsoft adjusts the optimizer in a future release, the performance issue could be addressed.
It is also possible to simply use FOR UPDATE and keep the CURSOR open in a procedural program. However, this could prove expensive in large or complex batch processing.
Caveat: Even Oracle's ROWID would not be stable if the DBA, between the SELECT and the UPDATE, for example, were to rebuild the database, because it is the physical row identifier. So the ROWID device should only be used within a well-scoped task.
If you want to permanently number the rows in the table, please don't use the RID solution for SQL Server. It will perform worse than Access on an old 386. For SQL Server simply create an IDENTITY column, and use that column as a clustered primary key. This will place a permanent, fast integer B-tree on the table, and more importantly every non-clustered index will use it to locate rows. If you try to develop in SQL Server as if it's Oracle you'll create a poorly performing database. You need to optimize for the engine, not pretend it's a different engine.
Also, please don't use NewID() to populate the primary key with GUIDs; you'll kill insert performance. If you must use GUIDs, use NewSequentialID() as the column default. But INT will still be faster.
If on the other hand you simply want to number the rows that result from a query, use the ROW_NUMBER() OVER() function as one of the query columns.
If you just want basic row numbering for a small dataset, how about something like this?
SELECT row_number() OVER (order by getdate()) as ROWID, * FROM Employees
From http://vyaskn.tripod.com/programming_faq.htm#q17:
Oracle has a rownum to access rows of a table using row number or row id. Is there any equivalent for that in SQL Server? Or how to generate
output with row number in SQL Server?
There is no direct equivalent to Oracle's rownum or row id in SQL
Server. Strictly speaking, in a relational database, rows within a
table are not ordered and a row id won't really make sense. But if you
need that functionality, consider the following three alternatives:
Add an IDENTITY column to your table.
Use the following query to generate a row number for each row. The following query generates a row number for each row in the authors
table of pubs database. For this query to work, the table must have a
unique key.
SELECT (SELECT COUNT(i.au_id)
FROM pubs..authors i
WHERE i.au_id >= o.au_id ) AS RowID,
au_fname + ' ' + au_lname AS 'Author name'
FROM pubs..authors o
ORDER BY RowID
Use a temporary table approach, to store the entire resultset into a temporary table, along with a row id generated by the IDENTITY()
function. Creating a temporary table will be costly, especially when
you are working with large tables. Go for this approach, if you don't
have a unique key in your table.
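The correlated-count query from the second alternative can be tried on a tiny table to see what numbering it produces. This sketch uses Python's sqlite3 module with a made-up three-row `authors` table; the alias is renamed to `row_num` here only to avoid clashing with SQLite's built-in rowid column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE authors (au_id TEXT PRIMARY KEY, au_name TEXT)")
conn.executemany(
    "INSERT INTO authors VALUES (?, ?)",
    [("a1", "Ann"), ("a2", "Bob"), ("a3", "Cy")],
)

# For each outer row, count how many rows have a key >= its own key.
rows = conn.execute("""
    SELECT (SELECT COUNT(i.au_id)
            FROM authors i
            WHERE i.au_id >= o.au_id) AS row_num,
           o.au_name
    FROM authors o
    ORDER BY row_num
""").fetchall()
print(rows)  # [(1, 'Cy'), (2, 'Bob'), (3, 'Ann')]
```

Because the comparison is `>=`, the largest key receives number 1; switching to `<=` would number the rows in ascending key order instead. The count is recomputed per row, which is why the approach gets expensive on large tables.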
ROWID is a hidden column on Oracle tables, so, for SQL Server, build your own. Add a column called ROWID with a default value of NEWID().
How to do that: Add column, with default value, to existing table in SQL Server
Please see http://msdn.microsoft.com/en-us/library/aa260631(v=SQL.80).aspx
In SQL server a timestamp is not the same as a DateTime column. This is used to uniquely identify a row in a database, not just a table but the entire database.
This can be used for optimistic concurrency. For example:
UPDATE [Job] SET [Name]=@Name, [XCustomData]=@XCustomData WHERE ([ModifiedTimeStamp]=@Original_ModifiedTimeStamp AND [GUID]=@Original_GUID)
the ModifiedTimeStamp ensures that you are updating the original data and will fail if another update has occurred to the row.
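The optimistic-concurrency check can be sketched in a few lines. This is an illustration with Python's sqlite3 module, not SQL Server: the `job` table, the integer `version` column, and the `update_name` helper are assumptions, and the version is bumped by hand here whereas a SQL Server timestamp column changes automatically on every update.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE job (id INTEGER PRIMARY KEY, name TEXT, version INTEGER NOT NULL)"
)
conn.execute("INSERT INTO job (id, name, version) VALUES (1, 'original', 1)")


def update_name(conn, job_id, new_name, original_version):
    """Update only if the row still carries the version that was read earlier."""
    cur = conn.execute(
        "UPDATE job SET name = ?, version = version + 1 "
        "WHERE id = ? AND version = ?",
        (new_name, job_id, original_version),
    )
    return cur.rowcount == 1  # False means the row changed since it was read


# Two clients both read version 1, then try to write.
first_ok = update_name(conn, 1, "from client A", 1)
second_ok = update_name(conn, 1, "from client B", 1)
print(first_ok, second_ok)  # True False
```

The second writer matches zero rows, which is the signal to re-read the row and retry or report a conflict.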
I took this example from MS SQL example and you can see the @ID can be interchanged with integer or varchar or whatever. This was the same solution I was looking for, so I am sharing it. Enjoy!!
-- UPDATE statement with CTE references that are correctly matched.
DECLARE @x TABLE (ID int, Stad int, Value int, ison bit);
INSERT @x VALUES (1, 0, 10, 0), (2, 1, 20, 0), (6, 0, 40, 0), (4, 1, 50, 0), (5, 3, 60, 0), (9, 6, 20, 0), (7, 5, 10, 0), (8, 8, 220, 0);
DECLARE @Error int;
DECLARE @id int;
WITH cte AS (SELECT top 1 * FROM @x WHERE Stad=6)
UPDATE x -- cte is referenced by the alias.
SET ison=1, @id=x.ID
FROM cte AS x
SELECT *, @id as 'random' from @x
GO
You can get the ROWID by using the methods given below :
1. Create a new table with an auto-increment field in it.
2. Use the Row_Number analytical function to get the sequence based on your requirement. I would prefer this because it helps in situations where you want the row_id in ascending or descending order of a specific field or combination of fields.
Sample: Row_Number() Over(Partition by Deptno order by sal desc)
The sample above gives you a sequence number within each department, ordered by highest salary. Partition by is optional and you can remove it according to your requirements.
Please try
select NEWID()
Source: https://learn.microsoft.com/en-us/sql/t-sql/data-types/uniqueidentifier-transact-sql

MySQL -- mark all but 1 matching row

This is similar to this question, but it seems like some of the answers there aren't quite compatible with MySQL (or I'm not doing it right), and I'm having a heck of a time figuring out the changes I need. Apparently my SQL is rustier than I thought it was. I'm also looking to change a column value rather than delete, but I think at least that part is simple...
I have a table like:
rowid SERIAL
fingerprint TEXT
duplicate BOOLEAN
contents TEXT
created_date DATETIME
I want to set duplicate=true for all but the first (by created_date) of each group by fingerprint. It's easy to mark all of the rows with duplicate fingerprints as dupes. The part I'm getting stuck on is keeping the first.
One of the apps that populates the table does bulk loads of data, with multiple workers loading data from different sources, and the workers' data isn't necessarily partitioned by date, so it's a pain to try to mark these all as they come in (the first one inserted isn't necessarily the first one by date). Also, I already have a bunch of data in there I'll need to clean up either way. So I'd rather just have a relatively efficient query I can run after a bulk load to clean up than try to build it into that app.
Thanks!
MySQL needs to be explicitly told if the data you are grouping by is larger than 1024 bytes (see this link for details). So if your data in the fingerprint column is larger than 1024 bytes you should set the max_sort_length variable (see this link for details about values allowed, and this link about how to set it) to a larger number so that the group by won't silently use only part of your data for grouping.
Once you're certain that MySQL will group your data properly, the following query will set the duplicate flag so that the first fingerprint record has duplicate set to FALSE/0 and any subsequent fingerprint records have duplicate set to TRUE/1:
UPDATE mytable m1
INNER JOIN (SELECT fingerprint
, MIN(rowid) AS minrow
FROM mytable m2
GROUP BY fingerprint) m3
ON m1.fingerprint = m3.fingerprint
SET m1.duplicate = m3.minrow != m1.rowid;
Please keep in mind that this solution does not take NULLs into account and if it is possible for the fingerprint field to be NULL then you would need additional logic to handle that case.
How about a two-step approach, assuming you can go offline during a data load:
Mark every item as duplicate.
Select the earliest row from each group, and clear the duplicate flag.
Not elegant, but gets the job done.
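The two steps can be sketched as follows, using Python's sqlite3 module as a stand-in for MySQL. The `id` column stands in for the question's rowid, the sample rows are invented, and ties on created_date are broken by the lower id, which is an assumption the answer does not address.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE mytable ("
    "id INTEGER PRIMARY KEY, fingerprint TEXT, duplicate INTEGER, created_date TEXT)"
)
conn.executemany(
    "INSERT INTO mytable (fingerprint, duplicate, created_date) VALUES (?, 0, ?)",
    [
        ("aaa", "2020-01-03"),
        ("aaa", "2020-01-01"),
        ("bbb", "2020-01-02"),
        ("aaa", "2020-01-02"),
    ],
)

# Step 1: mark every row as a duplicate.
conn.execute("UPDATE mytable SET duplicate = 1")

# Step 2: clear the flag on the earliest row of each fingerprint group.
conn.execute("""
    UPDATE mytable SET duplicate = 0
    WHERE id = (SELECT t.id FROM mytable t
                WHERE t.fingerprint = mytable.fingerprint
                ORDER BY t.created_date, t.id
                LIMIT 1)
""")

flags = conn.execute("SELECT id, duplicate FROM mytable ORDER BY id").fetchall()
print(flags)  # [(1, 1), (2, 0), (3, 0), (4, 1)]
```

Between the two steps every row is flagged as a duplicate, which is why the answer assumes the table can go offline (or the steps run in one transaction).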
Here's a funny way to do it:
SET @rowid := 0;
UPDATE mytable
SET duplicate = (rowid = @rowid),
rowid = (@rowid:=rowid)
ORDER BY rowid, created_date;
First set a user variable to zero, assuming this is less than any rowid in your table.
Then use the MySQL UPDATE...ORDER BY feature to ensure that the rows are updated in order by rowid, then by created_date.
For each row, if the current rowid is not equal to the user variable @rowid, set duplicate to 0 (false). This will be true only on the first row encountered with a given value for rowid.
Then add a dummy set of rowid to its own value, setting @rowid to that value as a side effect.
As you UPDATE the next row, if it's a duplicate of the previous row, rowid will be equal to the user variable @rowid, and therefore duplicate will be set to 1 (true).
Edit: Now I have tested this, and I corrected a mistake in the line that sets duplicate.
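The logic of the ordered pass is easier to follow outside SQL. Below is a plain-Python sketch of the same idea: walk the rows in sorted order and compare each grouping key with the one seen just before. The `key` field and the sample rows are hypothetical, and `previous_key` plays the role of the user variable.

```python
rows = [
    {"key": "aaa", "created_date": "2020-01-03"},
    {"key": "bbb", "created_date": "2020-01-02"},
    {"key": "aaa", "created_date": "2020-01-01"},
    {"key": "aaa", "created_date": "2020-01-02"},
]

previous_key = None  # plays the role of the user variable
for row in sorted(rows, key=lambda r: (r["key"], r["created_date"])):
    # Same key as the row just before it in sort order: this one is a duplicate.
    row["duplicate"] = row["key"] == previous_key
    # Remember this row's key for the next iteration (the "dummy set" step).
    previous_key = row["key"]

print([r["duplicate"] for r in rows])  # [True, False, False, True]
```

Only the first row of each key in sort order keeps duplicate = False, which mirrors what the UPDATE does when it processes rows in ORDER BY order.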
Here's another way to do it, using MySQL's multi-table UPDATE syntax:
UPDATE mytable m1
JOIN mytable m2 ON (m1.rowid = m2.rowid AND m1.created_date < m2.created_date)
SET m2.duplicate = 1;
I don't know the MySQL syntax, but in T-SQL you just do:
UPDATE t1
SET duplicate = 1
FROM MyTable t1
WHERE rowid != (
SELECT TOP 1 rowid FROM MyTable t2
WHERE t2.fingerprint = t1.fingerprint ORDER BY created_date ASC -- ASC keeps the earliest row unmarked
)
That may have some syntax errors, as I'm just typing off the cuff/not able to test it, but that's the gist of it.
MySQL version (not tested):
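UPDATE MyTable t1
JOIN (
SELECT fingerprint, MIN(created_date) AS first_created
FROM MyTable
GROUP BY fingerprint
) first_rows ON first_rows.fingerprint = t1.fingerprint
SET t1.duplicate = 1
-- rows that share the earliest created_date all stay unmarked
WHERE t1.created_date > first_rows.first_created;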
Untested...
UPDATE TheAnonymousTable
SET duplicate = TRUE
WHERE rowid NOT IN
(SELECT rowid
FROM (SELECT MIN(created_date) AS created_date, fingerprint
FROM TheAnonymousTable
GROUP BY fingerprint
) AS M,
TheAnonymousTable AS T
WHERE M.created_date = T.created_date
AND M.fingerprint = T.fingerprint
);
The logic is that the innermost query returns the earliest created_date for each distinct fingerprint as table alias M. The middle query determines the rowid value for each of those rows; it is a nuisance to have to do this (but necessary), and the code assumes that you won't get two records for the same fingerprint and timestamp. This gives you the rowid for the earliest record for each separate fingerprint. Then the outer query (the UPDATE) sets the 'duplicate' flag on all those rows where the rowid is not one of the earliest rows.
Some DBMS may be unhappy about doing (nested) sub-queries on the table being updated.