SQLite - select the newest row with a certain field value - sql

I have an SQLite question which essentially boils down to the following problem.
id | key | data
1 | A | x
2 | A | x
3 | B | x
4 | B | x
5 | A | x
6 | A | x
New data is appended to the end of the table with an auto-incremented id.
Now, I want to create a query which returns the latest row for each key, like this:
id | key | data
4 | B | x
6 | A | x
I've tried some different queries but I have been unsuccessful. How do you select only the latest rows for each "key" value in the table?

Use this SQL query:
select * from tbl where id in (select max(id) from tbl group by key);
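A minimal sketch for checking that query against the sample data from the question, using Python's built-in sqlite3 module with an in-memory database. The table name tbl comes from the answer; the column types are assumptions, and "key" is quoted only as a precaution.

```python
import sqlite3

# In-memory database; the schema is an assumption based on the question's table.
conn = sqlite3.connect(":memory:")
conn.execute(
    'CREATE TABLE tbl (id INTEGER PRIMARY KEY AUTOINCREMENT, "key" TEXT, data TEXT)'
)
conn.executemany(
    'INSERT INTO tbl ("key", data) VALUES (?, ?)',
    [("A", "x"), ("A", "x"), ("B", "x"), ("B", "x"), ("A", "x"), ("A", "x")],
)

# The subquery yields one max(id) per key; the outer query fetches those rows.
latest_rows = conn.execute(
    'SELECT * FROM tbl WHERE id IN (SELECT max(id) FROM tbl GROUP BY "key") ORDER BY id'
).fetchall()
print(latest_rows)
```

With the six sample rows, this should return the rows with ids 4 and 6.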

You could split the main task into two steps.
First, find the id of the latest row for each key (here A and B).
Once you have those ids, it is easy to write a query that fetches the latest row for each key.

SELECT *
FROM mytable
JOIN
( SELECT MAX(id) AS maxid
FROM mytable
GROUP BY "key"
) AS grp
ON grp.maxid = mytable.id
Side note: it's best not to use reserved words like key as identifiers (for tables, fields, etc.)

Without nested SELECTs or JOINs, but only if the field determining "newest" is the primary key (e.g. autoincrement):
SELECT * FROM table GROUP BY key DESC;
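For SQLite specifically, a related single-SELECT variant relies on documented behaviour: when max() is the only aggregate in a query, the bare (non-aggregated) columns take their values from the row that holds the maximum. The sketch below is not the query above but an illustration of that rule; the table name tbl and the distinct data values d1..d6 are assumptions chosen so the selected row is visible.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE tbl (id INTEGER PRIMARY KEY, "key" TEXT, data TEXT)')
# Hypothetical data values (d1..d6) so each row is distinguishable.
conn.executemany(
    'INSERT INTO tbl (id, "key", data) VALUES (?, ?, ?)',
    [(1, "A", "d1"), (2, "A", "d2"), (3, "B", "d3"),
     (4, "B", "d4"), (5, "A", "d5"), (6, "A", "d6")],
)

# SQLite-specific: with max() as the only aggregate, the bare column "data"
# comes from the same row that supplied max(id) in each group.
newest_per_key = conn.execute(
    'SELECT max(id), "key", data FROM tbl GROUP BY "key" ORDER BY max(id)'
).fetchall()
print(newest_per_key)
```

Other database engines either reject bare columns in a grouped query or pick an arbitrary row, so this form is not portable.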

Related

Select and count array keys in athena

I have many rows of data that represent events in my database. Each row has a column "payload" that contains an array of keys and values. I can easily parse for a value by using
Select
payload.keyname
from Database
But I am trying to get a list and count of all the keys that appear in a given day.
| payload |
|{id=a, gameid=x, gametype=1, sponserid=null} |
|{id=b, gameid=y, gametype=2, action=jump, sponserid=null}|
|{id=c, gameid=z, action=jump, sponserid=null} |
Desired Output
| Key |Count|
|id | 3 |
|gameid | 3 |
|gametype | 2 |
|action | 2 |
|sponserid| 2 |
Is there some method to query an array for keys easily? Such as
Select
payload.*, count(*)
from Database
group by payload.*
You can use the map_keys function to extract the keys from payload and then unnest the result.
select key, count(1) as count
from database.table, unnest(map_keys(payload)) as X(key)
group by 1
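To make clear what that query computes, here is a rough plain-Python analogue of unnest(map_keys(payload)) followed by a group-by count. It does not talk to Athena; the payloads list is a hand-copied stand-in for the three sample rows, with null represented as None.

```python
from collections import Counter

# Stand-in for the three sample payload maps from the question.
payloads = [
    {"id": "a", "gameid": "x", "gametype": 1, "sponserid": None},
    {"id": "b", "gameid": "y", "gametype": 2, "action": "jump", "sponserid": None},
    {"id": "c", "gameid": "z", "action": "jump", "sponserid": None},
]

# "Unnest" every map into its keys, then count how often each key occurs.
key_counts = Counter(key for payload in payloads for key in payload)

for key, count in key_counts.items():
    print(key, count)
```

A key is counted whenever it is present in the map, regardless of whether its value is null.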
You can use cross join unnest. The unnest will "unroll" the map and return a row for each map entry with key, value columns. If you want to count occurrences of each key you can group by key. For example
select key, count(*)
from mydb cross join unnest(payload) A(key, value)
group by 1
see the docs for more info.
----- EDIT ----
If your column is already in row format you can do instead:
select payload.keyname, count(*)
from mydb cross join payload
group by 1

Counting the total number of rows with SELECT DISTINCT ON without using a subquery

I am performing some queries using PostgreSQL's SELECT DISTINCT ON syntax. I would like the query to return the total number of rows alongside every result row.
Assume I have a table my_table like the following:
CREATE TABLE my_table(
id int,
my_field text,
id_reference bigint
);
I then have a couple of values:
id | my_field | id_reference
----+----------+--------------
1 | a | 1
1 | b | 2
2 | a | 3
2 | c | 4
3 | x | 5
Basically my_table contains some versioned data. The id_reference is a reference to a global version of the database. Every change to the database will increase the global version number and changes will always add new rows to the tables (instead of updating/deleting values) and they will insert the new version number.
My goal is to perform a query that retrieves only the latest values in the table, along with the total number of rows.
For example, in the above case I would like to retrieve the following output:
| total | id | my_field | id_reference |
+-------+----+----------+--------------+
| 3 | 1 | b | 2 |
+-------+----+----------+--------------+
| 3 | 2 | c | 4 |
+-------+----+----------+--------------+
| 3 | 3 | x | 5 |
+-------+----+----------+--------------+
My attempt is the following:
select distinct on (id)
count(*) over () as total,
*
from my_table
order by id, id_reference desc
This returns almost the correct output, except that total is the number of rows in my_table instead of being the number of rows of the resulting query:
total | id | my_field | id_reference
-------+----+----------+--------------
5 | 1 | b | 2
5 | 2 | c | 4
5 | 3 | x | 5
(3 rows)
As you can see it has 5 instead of the expected 3.
I can fix this by wrapping the DISTINCT ON query in a subquery (here a CTE) and applying count(*) over () to its result:
with my_values as (
select distinct on (id)
*
from my_table
order by id, id_reference desc
)
select count(*) over (), * from my_values
Which produces my expected output.
My question: is there a way to avoid using this subquery and have something similar to count(*) over () return the result I want?
You are looking at my_table in three ways:
to find the latest id_reference for each id
to find my_field for the latest id_reference for each id
to count the distinct number of ids in the table
I therefore prefer this solution:
select
c.id_count as total,
a.id,
a.my_field,
b.max_id_reference
from
my_table a
join
(
select
id,
max(id_reference) as max_id_reference
from
my_table
group by
id
) b
on
a.id = b.id and
a.id_reference = b.max_id_reference
join
(
select
count(distinct id) as id_count
from
my_table
) c
on true;
This is a bit longer (especially given the long, thin way I write SQL), but it makes it clear what is happening. If you come back to it in a few months' time (somebody usually does), it will take less time to understand what is going on.
The "on true" at the end is a deliberate cartesian product because there can only ever be exactly one result from the subquery "c" and you do want a cartesian product with that.
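The join-based query can be tried on the sample data without a PostgreSQL server. The sketch below runs it in SQLite through Python's sqlite3 module, which is an assumption made for convenience; it writes the final join as CROSS JOIN, which is equivalent to "join ... on true" for a one-row subquery.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (id INT, my_field TEXT, id_reference BIGINT)")
conn.executemany(
    "INSERT INTO my_table VALUES (?, ?, ?)",
    [(1, "a", 1), (1, "b", 2), (2, "a", 3), (2, "c", 4), (3, "x", 5)],
)

query = """
SELECT c.id_count AS total, a.id, a.my_field, b.max_id_reference
FROM my_table a
JOIN (
    -- latest id_reference for each id
    SELECT id, max(id_reference) AS max_id_reference
    FROM my_table
    GROUP BY id
) b ON a.id = b.id AND a.id_reference = b.max_id_reference
CROSS JOIN (
    -- single-row subquery: number of distinct ids
    SELECT count(DISTINCT id) AS id_count
    FROM my_table
) c
ORDER BY a.id
"""
latest_with_total = conn.execute(query).fetchall()
print(latest_with_total)
```

Each of the three result rows carries total = 3, matching the output the question asks for.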
There is nothing necessarily wrong with subqueries.

SQL: reverse groupby : EDIT

Is there a built-in function in SQL to reverse the order in which GROUP BY works? I am trying to group by a certain key, but I would like the last inserted record returned, not the first inserted record.
Changing the order with ORDER BY does not affect this behaviour.
Thanks in advance!
EDIT:
this is the sample data:
id|value
-----
1 | A
2 | B
3 | B
4 | C
As a result I want:
1 | A
3 | B
4 | C
not
1 | A
2 | B
4 | C
When using group by, I don't get the result I want.
The question here is how you identify the last inserted row. Based on your example, it looks like it is based on id. If id is auto-generated, or comes from a sequence, then you can definitely do this:
select max(id),value
from your_table
group by value
Ideally, a table design includes a date column holding the time each record was inserted, so it is easy to order by that.
Use Max() as your aggregate function for your id:
SELECT max(id), value FROM <table> GROUP BY value;
This will return:
1 | A
3 | B
4 | C
As for Eloquent, I've not used it, but I think it would look like:
$myData = DB::table('yourtable')
->select('value', DB::raw('max(id) as maxid'))
->groupBy('value')
->get();

How do I select rows where only return keys that don't have '1' in column c

Title is confusing I know, I'm just not sure how to word this. Anyway let me describe with a table:
| key | column b | column c |
|-----|----------|----------|
| a | 13 | 2 |
| a | 14 | 2 |
| a | 15 | 1 |
| b | 16 | 2 |
| b | 17 | 2 |
I'd like to select all keys where column c doesn't equal 1, so the select will result in returning only key 'b'
To clarify, my result set should not contain keys that have a row where column c is set to 1. Therefore I'd like a sql query that would return the keys that satisfy the previous statement.
To make my question as clear as possible. From the table above, what I want returned by some sql statement is a result set containing [{b}] based on the fact that key 'a' has at least one row where column c is equal to 1 whereas key 'b' does not have any rows that contain 1 in column c.
SELECT t.[Key]
FROM TableName t
WHERE NOT EXISTS (SELECT 1
FROM TableName
WHERE t.[key] = [key]
AND ColumnC = 1)
GROUP BY t.[Key]
SELECT KEY
FROM WhateverYourTableNameIs
WHERE c <> '1'
I would do this using group by and aggregation:
select [key]
from table t
group by [key]
having sum(case when c = 1 then 1 else 0 end) = 0;
The having clause counts the number of rows that have c = 1. The = 0 says that there are no such rows for a given key.
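A small sketch comparing the approaches on the sample table, run in SQLite via Python's sqlite3 module. The table name t and the column names b and c are assumptions standing in for "column b" and "column c"; the bracket quoting from the answers is replaced with standard double quotes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE t ("key" TEXT, b INT, c INT)')
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [("a", 13, 2), ("a", 14, 2), ("a", 15, 1), ("b", 16, 2), ("b", 17, 2)],
)

def keys(sql):
    """Run a query and return its first column as a list."""
    return [row[0] for row in conn.execute(sql)]

# Aggregation: keep keys whose groups contain zero rows with c = 1.
having_keys = keys(
    'SELECT "key" FROM t GROUP BY "key" '
    "HAVING sum(CASE WHEN c = 1 THEN 1 ELSE 0 END) = 0"
)

# Correlated NOT EXISTS: keep keys for which no row with c = 1 exists.
not_exists_keys = keys(
    'SELECT DISTINCT o."key" FROM t o WHERE NOT EXISTS '
    '(SELECT 1 FROM t i WHERE i."key" = o."key" AND i.c = 1)'
)

# Plain row filter: drops only the individual rows with c = 1.
row_filter_keys = keys('SELECT DISTINCT "key" FROM t WHERE c <> 1 ORDER BY "key"')

print(having_keys, not_exists_keys, row_filter_keys)
```

The first two queries filter whole keys, while the row filter still reports key a because two of its rows have c = 2.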
Elaboration based on other comments:
You asked for ALL keys where column c doesn't equal 1. That is exactly what the query I suggested will give you. The other part of your question so the SELECT will result in returning only key 'b', is ambiguous. The question as asked will give you results from columns A and B. There is nothing in your question to limit the result set. You either need an additional condition to your WHERE clause, or your question is inherently unanswerable.

select rows satisfying some criteria and with maximum value in a certain column

I have a table of metadata for updates to a software package. The table has columns id, name, version. I want to select all rows where the name is one of some given list of names and the version is maximum of all the rows with that name.
For example, given these records:
+----+------+---------+
| id | name | version |
+----+------+---------+
| 1 | foo | 1 |
| 2 | foo | 2 |
| 3 | bar | 4 |
| 4 | bar | 5 |
+----+------+---------+
And given the task "give me the highest versions of records 'foo' and 'bar'", I want the result to be:
+----+------+---------+
| id | name | version |
+----+------+---------+
| 2 | foo | 2 |
| 4 | bar | 5 |
+----+------+---------+
What I have come up with so far uses nested queries:
SELECT *
FROM updates
WHERE (
id IN (SELECT id
FROM updates
WHERE name = 'foo'
ORDER BY version DESC
LIMIT 1)
) OR (
id IN (SELECT id
FROM updates
WHERE name = 'bar'
ORDER BY version DESC
LIMIT 1)
);
This works, but feels wrong. If I want to filter on more names, I have to replicate the whole subquery multiple times. Is there a better way to do this?
select distinct on (name) id, name, version
from metadata
where name in ('foo', 'bar')
order by name, version desc
NOT EXISTS is a way to exclude the unwanted, suboptimal tuples:
SELECT *
FROM updates uu
WHERE uu.zname IN ('foo', 'bar')
AND NOT EXISTS (
SELECT *
FROM updates nx
WHERE nx.zname = uu.zname
AND nx.version > uu.version
);
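The NOT EXISTS pattern is plain SQL, so it can be sketched outside PostgreSQL as well. The example below runs it in SQLite via Python's sqlite3 module, keeping the original column name name rather than zname; the extra baz row is hypothetical and only there to show that the IN filter excludes other names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE updates (id INTEGER PRIMARY KEY, name TEXT, version INT)")
conn.executemany(
    "INSERT INTO updates VALUES (?, ?, ?)",
    [(1, "foo", 1), (2, "foo", 2), (3, "bar", 4), (4, "bar", 5),
     (5, "baz", 9)],  # hypothetical extra row, not in the question
)

# Keep a row only if no other row with the same name has a higher version.
query = """
SELECT uu.id, uu.name, uu.version
FROM updates uu
WHERE uu.name IN ('foo', 'bar')
  AND NOT EXISTS (
      SELECT 1
      FROM updates nx
      WHERE nx.name = uu.name
        AND nx.version > uu.version
  )
ORDER BY uu.id
"""
highest_versions = conn.execute(query).fetchall()
print(highest_versions)
```

If several rows tie for the highest version of a name, this pattern returns all of them.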
Note: I replaced name with zname, since name is more or less a keyword in PostgreSQL.
Update after rereading the Q:
I want to select all rows where the name is one of some given list
of names and the version is maximum of all the rows with that name.
If there can be ties (multiple rows with the maximum version per name), you could use the window function rank() in a subquery. Requires PostgreSQL 8.4+.
SELECT *
FROM (
SELECT *, rank() OVER (PARTITION BY name ORDER BY version DESC) AS rnk
FROM updates
WHERE name IN ('foo', 'bar')
) AS sub
WHERE rnk = 1;
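To see how rank() treats ties, the sketch below runs the same shape of query in SQLite via Python's sqlite3 module. This assumes the bundled SQLite is 3.25 or newer, where window functions became available. The second bar row with version 5 is hypothetical, added only to create a tie, and the alias sub is likewise illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE updates (id INTEGER PRIMARY KEY, name TEXT, version INT)")
conn.executemany(
    "INSERT INTO updates VALUES (?, ?, ?)",
    [(1, "foo", 1), (2, "foo", 2), (3, "bar", 4), (4, "bar", 5),
     (5, "bar", 5)],  # hypothetical tie for the highest 'bar' version
)

# rank() gives every row sharing the top version within a name the rank 1.
query = """
SELECT id, name, version
FROM (
    SELECT *, rank() OVER (PARTITION BY name ORDER BY version DESC) AS rnk
    FROM updates
    WHERE name IN ('foo', 'bar')
) AS sub
WHERE rnk = 1
ORDER BY id
"""
top_ranked = conn.execute(query).fetchall()
print(top_ranked)
```

Both tied bar rows come back; swapping rank() for row_number() would keep just one row per name, chosen arbitrarily among the tied rows.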