How do I select the latest rows for all users? - sql

I have a table similar to the following:
=> \d table
Table "public.table"
Column | Type | Modifiers
-------------+-----------------------------+-------------------------------
id | integer | not null default nextval( ...
user | bigint | not null
timestamp | timestamp without time zone | not null
field1 | double precision |
As you can see, it contains many field1 values over time for all users. Is there a way to efficiently get the latest field1 value for all users in one query (i.e. one row per user)? I'm thinking I might have to use some combination of group by and select first.

Simplest with DISTINCT ON in Postgres:
SELECT DISTINCT ON ("user")
"user", "timestamp", field1
FROM tbl
ORDER BY "user", "timestamp" DESC;
More details:
https://dba.stackexchange.com/questions/49540/how-do-i-efficiently-get-the-most-recent-corresponding-row/49555#49555
Select first row in each GROUP BY group?
Aside: Don't use timestamp as a column name. It's a reserved word in standard SQL and a basic type name in Postgres. The same goes for user, which is reserved in Postgres and has to be double-quoted.
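DISTINCT ON is PostgreSQL-only. For comparison, here is a minimal sketch of the portable ROW_NUMBER() form of the same query, run through Python's built-in sqlite3 module so it can be tried without a Postgres server. The renamed columns user_id and ts and the sample rows are made up for illustration, and SQLite 3.25 or newer is assumed for window functions.

```python
import sqlite3

# In-memory database; user_id / ts are hypothetical stand-ins for the
# reserved-word column names "user" / "timestamp" from the question.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tbl (id INTEGER PRIMARY KEY, user_id INTEGER NOT NULL, "
    "ts TEXT NOT NULL, field1 REAL)"
)
conn.executemany(
    "INSERT INTO tbl (user_id, ts, field1) VALUES (?, ?, ?)",
    [
        (1, "2024-01-01 10:00", 1.5),
        (1, "2024-01-02 09:00", 2.5),  # latest row for user 1
        (2, "2024-01-01 08:00", 7.0),  # latest row for user 2
        (2, "2023-12-31 23:59", 6.0),
    ],
)

# Number each user's rows from newest to oldest and keep row number 1.
LATEST_PER_USER = """
SELECT user_id, ts, field1
FROM (
    SELECT user_id, ts, field1,
           ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY ts DESC) AS rn
    FROM tbl
) AS ranked
WHERE rn = 1
ORDER BY user_id
"""

rows = conn.execute(LATEST_PER_USER).fetchall()
print(rows)
```

Like DISTINCT ON, ROW_NUMBER() keeps an arbitrary one of several rows that tie on the latest timestamp; add more ORDER BY columns inside OVER (...) to make that choice deterministic.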

Related

Create a table without knowing its columns in SQL

How can I create a table without knowing in advance how many and what columns it exactly holds?
The idea is that I have a table DATA that has 3 columns: ID, NAME, and VALUE.
What I need is a way to get multiple values depending on the value of NAME - I can't do it with simple WHERE or JOIN (because I'll need other values - with other NAME values - later on in my query).
Because of the way this table is constructed I want to PIVOT it in order to transform every distinct value of NAME into a column so it will be easier to get to it in my later search.
What I want now is to somehow save this to a temp table / variable so I can use it later on to join with the result of another query...
For example, here are the tables:
CREATE TABLE MainTab
(
id int,
nameMain varchar(max),
notes varchar(max)
);
CREATE TABLE SecondTab
(
id int,
id_mainTab int,
nameSecond varchar(max),
notes varchar(max)
);
CREATE TABLE DATA
(
id int,
id_second int,
name varchar(max),
value varchar(max)
);
Now some example data from the table DATA:
| id | id_second | name       | value           |
|----|-----------|------------|-----------------|
| 1  | 5550      | number     | 111115550       |
| 2  | 6154      | address    | 1, First Avenue |
| 3  | 1784      | supervisor | John Smith      |
| 4  | 3467      | function   | Marketing       |
| 5  | 9999      | start_date | 01/01/2000      |
| ...| ...       | ...        | ...             |
Now imagine that 'name' has A LOT of different values, and in one query I'll need to get a lot of different values depending on the value of 'name'...
That's why I pivot it so that number, address, supervisor, function, start_date, ... become columns.
I do this dynamically because of the number of possible columns: it would take me a while to write all of them in an 'IN' clause, and I don't want to have to remember to add one manually every time a new 'name' value gets added.
Therefore I followed http://sqlhints.com/2014/03/18/dynamic-pivot-in-sql-server/
The thing is, now I want the result of my EXECUTE(@query) to be stored in a temp table / variable, so I can later join it with MainTab.
It would be nice if I could use @cols (which holds the values of DATA.name), but I can't figure out a way to do this.
ADDITIONALLY:
If I use the non-dynamic way (writing all the values manually after 'IN'), I still need to create a column called status. In this column (so far it's NULL everywhere, because that value doesn't exist in my unpivoted table) I want to have 'open' or 'closed', depending on the date (let's say I have start_date and end_date):
CASE
    WHEN end_date < GETDATE() THEN 'closed'
    ELSE 'open'
END AS status
Where can I put this statement? Let's say my main query looks like this:
SELECT *
FROM (SELECT id_second, name, value FROM DATA) AS src  -- leave id out: PIVOT groups by every non-pivoted column
PIVOT (MAX(value) FOR name IN ([number], [address], [supervisor], [function], [start_date], [end_date], [status])) AS pivotTab
JOIN SecondTab ON SecondTab.id = pivotTab.id_second
JOIN MainTab ON MainTab.id = SecondTab.id_mainTab
WHERE pivotTab.status = 'closed';
As far as I understand, you have a SELECT statement and just need to "dump" its result into a temporary table. In that case you can use the SELECT ... INTO syntax:
select .....
into #temp_table
from ....
This creates a temporary table whose columns match the SELECT list and populates it with the rows the SELECT returns.
See MSDN for reference.
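The pivot itself does not have to use the PIVOT operator. The sketch below swaps in conditional aggregation (one MAX(CASE ...) per name), builds the column list at run time from the distinct names, derives status from end_date in the same statement, and stores the result with SQLite's CREATE TEMP TABLE ... AS SELECT, which plays the role of SELECT ... INTO #temp_table. It runs on Python's sqlite3 module rather than SQL Server; the lowercase table name data, the two quoting helpers and the sample rows are assumptions for illustration.

```python
import sqlite3


def quote_ident(name: str) -> str:
    """Quote an identifier for SQLite, doubling embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def quote_literal(text: str) -> str:
    """Quote a string literal for SQLite, doubling embedded single quotes."""
    return "'" + text.replace("'", "''") + "'"


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE data (id INTEGER, id_second INTEGER, name TEXT, value TEXT)")
conn.executemany(
    "INSERT INTO data VALUES (?, ?, ?, ?)",
    [
        (1, 5550, "number", "111115550"),
        (2, 5550, "end_date", "2001-01-01"),
        (3, 6154, "address", "1, First Avenue"),
        (4, 6154, "end_date", "2999-12-31"),
    ],
)

# 1. Discover the pivot columns at run time instead of hard-coding an IN list.
names = [row[0] for row in conn.execute("SELECT DISTINCT name FROM data ORDER BY name")]

# 2. One MAX(CASE ...) per name, grouped by id_second only
#    (grouping by id as well would keep every source row separate).
pivot_cols = ",\n    ".join(
    f"MAX(CASE WHEN name = {quote_literal(n)} THEN value END) AS {quote_ident(n)}"
    for n in names
)

# 3. Materialise into a temp table and derive status from end_date in the same pass.
conn.execute(f"""
CREATE TEMP TABLE pivot_tab AS
SELECT id_second,
    {pivot_cols},
    CASE WHEN MAX(CASE WHEN name = 'end_date' THEN value END) < date('now')
         THEN 'closed' ELSE 'open' END AS status
FROM data
GROUP BY id_second
""")

rows = conn.execute(
    'SELECT id_second, "number", "address", status FROM pivot_tab ORDER BY id_second'
).fetchall()
print(rows)
```

The temp table can then be joined like any other table in later queries on the same connection.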

Using WITH + DELETE clause in a single query in postgresql

I have the following table structure, for a table named listens with a PRIMARY KEY on (uid, timestamp):
Column | Type | Modifiers
----------------+-----------------------------+------------------------------------------------------
id | integer | not null default nextval('listens_id_seq'::regclass)
uid | character varying | not null
date | timestamp without time zone |
timestamp | integer | not null
artist_msid | uuid |
album_msid | uuid |
recording_msid | uuid |
json | character varying |
For each user (uid), I need to remove all entries older than that user's max timestamp minus a delta: say max is 123456789 (in seconds) and delta is 100000, then all records older than max - 100000.
I have managed to write a query that works when the table contains a single user, but I am unable to make it work for every user in the database, which is what this operation needs to do.
WITH max_table as (
SELECT max(timestamp) - 10000 as max
FROM listens
GROUP BY uid)
DELETE FROM listens
WHERE timestamp < (SELECT max FROM max_table);
Any solutions?
I think all you need is to make this a correlated subquery:
WITH max_table as (
SELECT uid, max(timestamp) - 10000 as mx
FROM listens
GROUP BY uid
)
DELETE FROM listens
WHERE timestamp < (SELECT mx
FROM max_table
where max_table.uid = listens.uid);
Btw: timestamp is a horrible name for a column, especially one that doesn't contain a timestamp value. One reason is because it's also a keyword but more importantly it doesn't document what that column contains. A registration timestamp? An expiration timestamp? A last active timestamp?
Alternatively, you could avoid the MAX() by using an EXISTS()
DELETE FROM listens d
WHERE EXISTS (
SELECT * FROM listens x
WHERE x.uid = d.uid
AND x.timestamp > d.timestamp + 10000
);
BTW: timestamp is an ugly name for a column, since it is also a typename.
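Both answers can be checked side by side. This sketch runs them against SQLite through Python's sqlite3 module (SQLite 3.8.3 or newer is assumed for WITH ... DELETE); the sample uids and timestamps are invented. The EXISTS version uses a strict > so that it deletes exactly the rows the correlated version deletes; with >= it would also remove a row sitting exactly at max - 10000.

```python
import sqlite3


def load(conn: sqlite3.Connection) -> None:
    """Create a cut-down listens table with two users' sample rows."""
    conn.execute(
        "CREATE TABLE listens (uid TEXT NOT NULL, timestamp INTEGER NOT NULL, "
        "PRIMARY KEY (uid, timestamp))"
    )
    conn.executemany(
        "INSERT INTO listens VALUES (?, ?)",
        [("a", 100000), ("a", 95000), ("a", 80000),
         ("b", 50000), ("b", 40000), ("b", 39999)],
    )


def remaining(conn: sqlite3.Connection) -> list:
    return conn.execute("SELECT uid, timestamp FROM listens ORDER BY uid, timestamp").fetchall()


# Version 1: per-user cut-offs in a CTE, correlated on uid.
cte = sqlite3.connect(":memory:")
load(cte)
cte.execute("""
    WITH max_table AS (
        SELECT uid, MAX(timestamp) - 10000 AS mx
        FROM listens
        GROUP BY uid
    )
    DELETE FROM listens
    WHERE timestamp < (SELECT mx FROM max_table WHERE max_table.uid = listens.uid)
""")

# Version 2: delete a row if the same user has a row more than 10000 newer.
ex = sqlite3.connect(":memory:")
load(ex)
ex.execute("""
    DELETE FROM listens
    WHERE EXISTS (
        SELECT 1 FROM listens AS x
        WHERE x.uid = listens.uid
          AND x.timestamp > listens.timestamp + 10000
    )
""")

print(remaining(cte))
print(remaining(ex))
```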

Access SQL unique records with latest date including null dates from single table

I have a table with the following sample structure:
Identifier| Latitude | Longitude |...many columns...|DateWhenStatusObserved|ID|
----------+----------+---------- +------------------+----------------------+--+
2823DC012 | 28.76285 | 23.70195 | ... | 1994/10/28| 1|
2823DC012 | 28.76285 | 23.70195 | ... | 1995/04/05| 2|
2822DD030 | 28.76147 | 22.98270 | ... | NULL| 3|
...
There are many more columns, but they do not need to be evaluated; they should all just be returned by the query.
I would like the SQL query to return only one record per Identifier: the one with the latest date. Unfortunately, there are also records where DateWhenStatusObserved is NULL, and in many instances the only record for an Identifier (geosite) has a NULL date.
There are already many answers for similar SQL questions such as:
How can I include null values in a MIN or MAX?
SELECT only rows with either the MAX date or NULL
http://bytes.com/topic/access/answers/719627-create-query-evaluate-max-date-recognizing-null-high-value
However, these don't explain exactly how to use IIf with the aggregate Max function so that records with a NULL date get through while still returning one record per Identifier (geosite).
Using a subquery and a combination of Max(IIF()), I only get the records with a non-NULL max date. I finally got a reasonable result from a basic subquery without joins that relies on WHERE clauses, but it returns duplicate Identifier records for NULL dates, because I have to use OR instead of AND to get any rows at all.
Here is one of my attempts returning only non-NULL max date records:
SELECT BasicInfoTable.*
FROM Basic_information_WUA AS BasicInfoTable
INNER JOIN
(
SELECT Identifier, MAX (IIF(DateWhenStatusObserved IS NULL, 0, DateWhenStatusObserved)) AS MaxDate
FROM Basic_information_WUA
GROUP BY Identifier
)
AS Table2 ON BasicInfoTable.Identifier = Table2.Identifier AND BasicInfoTable.DateWhenStatusObserved = Table2.MaxDate;
So why is this not working for the NULL date cases?
I would appreciate any help with finding the near-optimal query for this problem.
Thanks
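You need to apply the same (IS NULL) handling to the join condition BasicInfoTable.DateWhenStatusObserved = Table2.MaxDate. NULLs cannot be "compared": for an Identifier whose only date is NULL, MaxDate is 0 but the row's own date is still NULL, and NULL = 0 is never true, so the join drops that row.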
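A sketch of that fix, written for SQLite via Python's sqlite3 module rather than Access: COALESCE stands in for IIf(... IS NULL, ...), and an empty string is the sentinel because the dates here are stored as 'YYYY/MM/DD' text. The shortened table and column names (basic_info, identifier, observed) are hypothetical. The point is that the sentinel is applied on both sides, inside MAX() and in the join condition.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE basic_info (id INTEGER PRIMARY KEY, identifier TEXT, observed TEXT)")
conn.executemany(
    "INSERT INTO basic_info (identifier, observed) VALUES (?, ?)",
    [("2823DC012", "1994/10/28"),
     ("2823DC012", "1995/04/05"),
     ("2822DD030", None)],  # this geosite only has a NULL date
)

# The same sentinel ('' here) is used in MAX() and in the ON clause, so a
# NULL-only identifier matches '' = '' instead of the never-true NULL = ''.
QUERY = """
SELECT b.id, b.identifier, b.observed
FROM basic_info AS b
JOIN (
    SELECT identifier, MAX(COALESCE(observed, '')) AS max_date
    FROM basic_info
    GROUP BY identifier
) AS t2
  ON b.identifier = t2.identifier
 AND COALESCE(b.observed, '') = t2.max_date
ORDER BY b.id
"""
rows = conn.execute(QUERY).fetchall()
print(rows)
```

If an Identifier has two rows sharing the same latest date (or several NULL-only rows), all of them are still returned; that needs a tie-breaker such as the ID column.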

SQL table column values to select query list

So I have a table that has EMAIL and Order ID.
EMAIL | id
--------------
Y#a.com | 1
Y#a.com | 2
X#a.com | 3
And I need to SELECT it so that I'd have email column that is distinct and ids column that is array of int's
EMAIL | ids
--------------
Y#a.com | [1,2]
X#a.com | [3]
I use PostgreSQL 9.3. I looked at aggregate functions, but since I'm not too good with SQL at the moment, I didn't really understand them. My code so far is:
SELECT DISTINCT ON (email) email
FROM order
WHERE date > '2013-01-01';
Use the aggregate function array_agg():
SELECT email, array_agg(id) AS id_array
FROM "order"
WHERE date > '2013-01-01'
GROUP BY email;
Aside: your identifiers ...
Don't use order as a table name; it's a reserved word.
Don't use date as a column name; it's a reserved word in standard SQL and a basic type name in Postgres.
I wouldn't use id as a column name either. It's a common anti-pattern: "id" is not a descriptive name. Once you join a couple of tables you have n columns named "id" and you need to start dealing out column aliases, not to speak of the confusion it may cause.
Instead, use something like this:
CREATE TABLE order_data (
order_data_id serial PRIMARY KEY
, email text
, order_date date
);
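SQLite has no array type or array_agg(), so this sketch (Python's sqlite3 module, using the order_data naming suggested above) substitutes group_concat() and splits the string back into a list of ints on the client. Treat it as an illustration of the GROUP BY idea, not as a replacement for array_agg() in Postgres; the sample rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE order_data (order_data_id INTEGER PRIMARY KEY, email TEXT, order_date TEXT)"
)
conn.executemany(
    "INSERT INTO order_data VALUES (?, ?, ?)",
    [(1, "Y#a.com", "2013-02-01"),
     (2, "Y#a.com", "2013-03-01"),
     (3, "X#a.com", "2013-04-01"),
     (4, "X#a.com", "2012-12-31")],  # excluded by the date filter
)

# One row per email; group_concat() plays the role of array_agg().
QUERY = """
SELECT email, group_concat(order_data_id) AS id_list
FROM order_data
WHERE order_date > '2013-01-01'
GROUP BY email
"""
ids_by_email = {
    # group_concat() order is unspecified, so sort the ids on the client.
    email: sorted(int(i) for i in id_list.split(","))
    for email, id_list in conn.execute(QUERY)
}
print(ids_by_email)
```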

Get last record of a table in Postgres

I'm using Postgres and cannot manage to get the last record of my table:
my_query = client.query("SELECT timestamp,value,card from my_table");
How can I do that, knowing that timestamp is a unique identifier of the record?
If under "last record" you mean the record which has the latest timestamp value, then try this:
my_query = client.query("SELECT timestamp, value, card FROM my_table ORDER BY timestamp DESC LIMIT 1");
You can use
SELECT timestamp, value, card
FROM my_table
ORDER BY timestamp DESC
LIMIT 1
assuming you want to sort by timestamp.
Easy way: ORDER BY in conjunction with LIMIT
SELECT timestamp, value, card
FROM my_table
ORDER BY timestamp DESC
LIMIT 1;
However, LIMIT is not standard SQL, and, as stated by Wikipedia, "the SQL standard's core functionality does not explicitly define a default sort order for Nulls" (PostgreSQL, for instance, puts NULLs first when sorting DESC, so add NULLS LAST if the column can be NULL). Finally, only one row is returned when several records share the maximum timestamp.
Relational way:
The typical way of doing this is to check that no row has a higher timestamp than any row we retrieve.
SELECT timestamp, value, card
FROM my_table t1
WHERE NOT EXISTS (
SELECT *
FROM my_table t2
WHERE t2.timestamp > t1.timestamp
);
It is my favorite solution, and the one I tend to use. The drawback is that our intent is not immediately clear at a glance.
Instructive way: MAX
To circumvent this, one can use MAX in the subquery instead of the correlation.
SELECT timestamp, value, card
FROM my_table
WHERE timestamp = (
SELECT MAX(timestamp)
FROM my_table
);
But without an index, two passes over the data are necessary, whereas the previous query can find the solution in a single scan. That said, we should not take performance into consideration when designing queries unless necessary, as we can expect optimizers to improve over time. Still, this particular kind of query is used quite often.
Show off way: Windowing functions
I don't recommend doing this, but maybe you can make a good impression on your boss or something ;-)
SELECT DISTINCT
first_value(timestamp) OVER w,
first_value(value) OVER w,
first_value(card) OVER w
FROM my_table
WINDOW w AS (ORDER BY timestamp DESC);
Actually this has the virtue of showing that a simple query can be expressed in a wide variety of ways (there are several others I can think of), and that picking one or the other form should be done according to several criteria such as:
portability (Relational/Instructive ways)
efficiency (Relational way)
expressiveness (Easy/Instructive way)
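To see the tie behaviour mentioned under the easy way, here is a sketch that runs the easy, relational and instructive queries against the same small table with a tie on the maximum timestamp. It uses SQLite through Python's sqlite3 module, and the rows are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (timestamp INTEGER, value TEXT, card TEXT)")
conn.executemany(
    "INSERT INTO my_table VALUES (?, ?, ?)",
    [(1, "old", "c1"), (3, "tie-a", "c2"), (3, "tie-b", "c3"), (2, "mid", "c4")],
)

QUERIES = {
    # Easy way: sort and keep one row.
    "easy": "SELECT timestamp, value, card FROM my_table ORDER BY timestamp DESC LIMIT 1",
    # Relational way: no other row has a higher timestamp.
    "relational": """SELECT timestamp, value, card FROM my_table t1
                     WHERE NOT EXISTS (SELECT * FROM my_table t2
                                       WHERE t2.timestamp > t1.timestamp)""",
    # Instructive way: compare against MAX().
    "instructive": """SELECT timestamp, value, card FROM my_table
                      WHERE timestamp = (SELECT MAX(timestamp) FROM my_table)""",
}

results = {name: sorted(conn.execute(sql).fetchall()) for name, sql in QUERIES.items()}
for name, rows in results.items():
    print(name, rows)
```

The LIMIT query returns one of the two tied rows (which one is unspecified), while the NOT EXISTS and MAX() forms return both.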
If your table has no id (such as an integer auto-increment) and no timestamp, you can still get the last row that a plain table scan returns with the following query:
select * from <tablename> offset ((select count(*) from <tablename>)-1)
Without ORDER BY, PostgreSQL guarantees no particular row order, so this is usually, but not necessarily, the most recently inserted row. For example, this could allow you to search through an updated flat file, find/confirm where the previous version ended, and copy the remaining lines to your table.
The last inserted record can be queried like this, assuming "id" is the primary key:
SELECT timestamp, value, card FROM my_table WHERE id = (SELECT max(id) FROM my_table)
This assumes that every newly inserted row gets the highest id value in the table.
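If you accept a tip: create an id column of type serial in this table. Its default will be:
nextval('table_name_field_seq'::regclass).
Then you can query the id of the last record. Using your example:
pg_query($connection, "SELECT currval('table_name_field_seq') AS id");
Note that currval returns the value most recently obtained by nextval in the current session (and raises an error if this session has not called nextval yet), so it gives the id of the last row inserted by your own connection.
I hope this tip helps you.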
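The session scoping of currval() has a close analogue in SQLite: cursor.lastrowid in Python's sqlite3 module reports the rowid of the insert made through that cursor, not the newest row in the table. This sketch opens two connections to one temporary database file to stand in for two client sessions; it illustrates the idea and is not PostgreSQL's currval() itself.

```python
import os
import sqlite3
import tempfile

# Two connections to the same database file play two client sessions.
path = os.path.join(tempfile.mkdtemp(), "demo.db")
session_a = sqlite3.connect(path)
session_b = sqlite3.connect(path)
session_a.execute("CREATE TABLE my_table (id INTEGER PRIMARY KEY, value TEXT)")
session_a.commit()

cur_a = session_a.execute("INSERT INTO my_table (value) VALUES ('from A')")
session_a.commit()
session_b.execute("INSERT INTO my_table (value) VALUES ('from B')")
session_b.commit()

# lastrowid, like currval(), reflects this session's own insert...
own_last_id = cur_a.lastrowid
# ...while MAX(id) sees the newest row written by any session.
table_max_id = session_a.execute("SELECT MAX(id) FROM my_table").fetchone()[0]
print(own_last_id, table_max_id)  # 1 2

session_a.close()
session_b.close()
```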
To get the last row,
Get Last row in the sorted order: In case the table has a column specifying time/primary key,
Using LIMIT clause
SELECT * FROM USERS ORDER BY CREATED_TIME DESC LIMIT 1;
Using FETCH clause
SELECT * FROM USERS ORDER BY CREATED_TIME DESC FETCH FIRST ROW ONLY;
Get Last row in the rows insertion order: In case the table has no columns specifying time/any unique identifiers
Using CTID system column, where ctid represents the physical location of the row in a table
SELECT * FROM USERS WHERE CTID = (SELECT MAX(CTID) FROM USERS);
Consider the following table,
userid |username | createdtime |
1 | A | 1535012279455 |
2 | B | 1535042279423 | //as per created time, this is the last row
3 | C | 1535012279443 |
4 | D | 1535012212311 |
5 | E | 1535012254634 | //as per insertion order, this is the last row
Queries 1 and 2 return,
userid |username | createdtime |
2 | B | 1535042279423 |
while query 3 returns,
userid |username | createdtime |
5 | E | 1535012254634 |
Note: in PostgreSQL an UPDATE does not change a row in place; it marks the old row version as dead and writes the new version as a new tuple with a new ctid. So the ctid query usually returns the tuple that was modified most recently.
Now updating a row, using
UPDATE USERS SET USERNAME = 'Z' WHERE USERID='3'
the table becomes as,
userid |username | createdtime |
1 | A | 1535012279455 |
2 | B | 1535042279423 |
4 | D | 1535012212311 |
5 | E | 1535012254634 |
3 | Z | 1535012279443 |
Now query 3 returns,
userid |username | createdtime |
3 | Z | 1535012279443 |
These are all good answers, but if you want an aggregate function that grabs the last row in the result set generated by an arbitrary query, there's a recipe for this on the Postgres wiki (note that CREATE AGGREGATE and the polymorphic anyelement type are PostgreSQL features, so it will not port to other databases as-is):
-- Create a function that always returns the last non-NULL item
CREATE OR REPLACE FUNCTION public.last_agg ( anyelement, anyelement )
RETURNS anyelement LANGUAGE SQL IMMUTABLE STRICT AS $$
SELECT $2;
$$;
-- And then wrap an aggregate around it
CREATE AGGREGATE public.LAST (
sfunc = public.last_agg,
basetype = anyelement,
stype = anyelement
);
It's usually preferable to do select ... limit 1 if you have a reasonable ordering, but this is useful if you need to do this within an aggregate and would prefer to avoid a subquery.
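SQLite has no CREATE AGGREGATE, but Python's sqlite3 module can register an application-defined aggregate with create_aggregate(). The sketch below mirrors the STRICT last_agg above: NULL inputs are skipped and every non-NULL input replaces the running state. The names LastNonNull, last_non_null and readings are hypothetical.

```python
import sqlite3


class LastNonNull:
    """Aggregate state: keeps the most recent non-NULL value it was fed."""

    def __init__(self):
        self.state = None

    def step(self, value):
        # Mirrors STRICT: NULL inputs leave the state unchanged.
        if value is not None:
            self.state = value

    def finalize(self):
        return self.state


conn = sqlite3.connect(":memory:")
conn.create_aggregate("last_non_null", 1, LastNonNull)
conn.execute("CREATE TABLE readings (sensor TEXT, ts INTEGER, reading REAL)")
conn.executemany(
    "INSERT INTO readings VALUES (?, ?, ?)",
    [("s1", 1, 10.0), ("s1", 2, None), ("s2", 1, None), ("s2", 2, None)],
)
rows = conn.execute(
    "SELECT sensor, last_non_null(reading) FROM readings GROUP BY sensor ORDER BY sensor"
).fetchall()
print(rows)
```

As with the Postgres version, "last" means the last value fed to the aggregate, and without an explicit order that is not guaranteed. In PostgreSQL you can pin the order inside the call, for example LAST(value ORDER BY timestamp).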
See also this question for a case where this is the natural answer.
The column you sort on matters: order descending by the column that records the time:
select <COLUMN_NAME1>, <COLUMN_NAME2> from <TABLENAME> ORDER BY <COLUMN_NAME THAT RECORDS THE TIME> DESC LIMIT 1;
For example, if the table user_details has a column created_at holding each row's creation timestamp:
SELECT userid, username FROM user_details ORDER BY created_at DESC LIMIT 1;
In Oracle SQL,
select * from (select row_number() over (order by rowid desc) rn, emp.* from emp) where rn=1;
select * from table_name LIMIT 1;
Note that without an ORDER BY this returns an arbitrary row, not necessarily the last one.