sql: filter on created column - sql

Let's say I have a very simple sql table:
id | step
----+------
1 | 1
2 | 2
3 | 3
I'm trying to create a new column based on a simple operation and filter on that new column. The newly created table before the filter should look like this:
id | step | new
----+------+-----
1 | 1 | 10
2 | 2 | 20
3 | 3 | 30
I thought the following query should work:
select id, step, step*10 as new from event where new = 20
But I'm getting the following error:
ERROR: column "new" does not exist

where is processed before select which is why calculated column new is unknown at that time. Use step*10=20 instead.
Or use a sub-query and filter with the alias.
select *
from (select id, step, jsonb_array_elements(payload::jsonb->'sample_id') as new
from event
) e
where new = --somevalue

If you don't like subqueries, you can use a lateral join:
with event as (
select 1 as id, 2 as step
)
select e.id, e.step, v.new
from event e cross join lateral
(values (e.step * 10)) v(new)
where v.new = 20;
This is convenient when you have multiple expressions that depend on each other. Rather than nesting subqueries or CTEs, you can just add them to the FROM clause.

Related

Select concatenated columns based on criteria list in other table

I have a table1
line
a
b
c
d
e
f
g
h
1
18
2
2
22
0
2
1
2
2
20
2
2
2
0
0
0
2
3
10
2
2
222
0
2
1
2
4
12
2
2
3
0
0
0
0
5
15
2
2
3
0
0
0
0
And a table2
 line
criteria
1
 a,b
2
 b,c,f,h
3
 a,b,e,g,h
4
 c,e
I am using this code to see/select the unique results of concated/joined columns, like concat(c,',',d), concat(b,',',d,',',g) and so on from table1 and is working perfectly:
SELECT DISTINCT(CONCAT(c,',',d))
FROM table1
But, instead of writing manually like concat(c,',',d), I want to refer to table2.criteria to get columns references to be concated/joined from table1 so that i can see the entire unique results against each concated criteria
Tried this, but getting an error:
SELECT DISTINCT(SELECT criteria FROM table2)
FROM table1
ERROR: more than one row returned by a subquery used as an expression
SQL state: 21000
The expected unique result is something like this;
| criteria | result |
| ------------ | ---------- |
| a,b | 15,2 |
| a,b | 10,2 |
| a,b | 20,2 |
| a,b | 12,2 |
| a,b | 18,2 |
| b,c,f,h | 2,2,2,2 |
| b,c,f,h | 2,2,0,2 |
| b,c,f,h | 2,2,0,0 |
| a,b,e,g,h | 20,2,0,0,2 |
| a,b,e,g,h | 12,2,0,0,0 |
| a,b,e,g,h | 15,2,0,0,0 |
| a,b,e,g,h | 10,2,0,1,2 |
| a,b,e,g,h | 18,2,0,1,2 |
| c,e | 2,0 |
SQL does not allow to parameterize identifiers. There are various ways to work around this restriction.
It's unclear from the question, but according to comments you want to concatenate the given pattern for every row in table1.
1. Dynamic SQL
Create a helper function (once!) that concatenates and executes statements dynamically.
Basics:
Define table and column names as arguments in a plpgsql function?
CREATE OR REPLACE FUNCTION f_concat_cols(_cols text)
RETURNS TABLE (result text)
LANGUAGE plpgsql AS
$func$
BEGIN
RETURN QUERY EXECUTE format(
$q$SELECT concat_ws(',', %s) FROM table1 ORDER BY line$q$, _cols);
END
$func$;
It's a set-returning function (a.k.a. "table function"), to return one result row for every row in table1 for each given pattern.
Warning: Converting user input to code like this is a prime opportunity for SQL injection. You must make sure that table1.criteria can only hold valid strings!
To get the full result matrix (with distinct results per row in table2), the query is simple now:
SELECT DISTINCT line AS t2_line, criteria, t1.*
FROM table2, f_concat_cols(criteria) t1
ORDER BY t2_line;
2. Workaround with conversion to JSON
SELECT DISTINCT t2.line AS t2_line, t2.criteria, c.*
FROM table2 t2
CROSS JOIN (SELECT line, to_json(t) AS js FROM table1 t) t1
CROSS JOIN LATERAL (
SELECT string_agg(t1.js->>sub, ',') AS result
FROM unnest(string_to_array(t2.criteria, ',')) sub
) c
ORDER BY t2_line;
After converting rows from t1 to a JSON record, we can access keys (converted from column names) directly.
I unnest the pattern, access each single key, and aggregate the result in LATERAL subquery. See:
What is the difference between a LATERAL JOIN and a subquery in PostgreSQL?
You could encapsulate the logic in a function like in 1., but that's optional in this case.
3. Workaround with conversion to Postgres arrays
SELECT DISTINCT t2.line AS t2_line, t2.criteria, c.*
FROM table2 t2
CROSS JOIN (SELECT line, ARRAY [a,b,c,d,e,f,g,h] AS arr FROM table1 t) t1
CROSS JOIN LATERAL (
SELECT string_agg(t1.arr[idx]::text, ',') AS result
FROM unnest(string_to_array(translate(t2.criteria, 'abcdefgh', '12345678'), ',')::int[]) idx
) c
ORDER BY t2_line;
Similar to the "trick" with JSON, we can avoid dynamic SQL by converting columns to a plain Postgres array. Then project column names to integer array indices. I use translate() for the simple case, but that only works for single letters! Use replace() or regexp_replace() or some other method for longer names.
The rest is like the above.
fiddle - showing all.

Counting the total number of rows with SELECT DISTINCT ON without using a subquery

I have performing some queries using PostgreSQL SELECT DISTINCT ON syntax. I would like to have the query return the total number of rows alongside with every result row.
Assume I have a table my_table like the following:
CREATE TABLE my_table(
id int,
my_field text,
id_reference bigint
);
I then have a couple of values:
id | my_field | id_reference
----+----------+--------------
1 | a | 1
1 | b | 2
2 | a | 3
2 | c | 4
3 | x | 5
Basically my_table contains some versioned data. The id_reference is a reference to a global version of the database. Every change to the database will increase the global version number and changes will always add new rows to the tables (instead of updating/deleting values) and they will insert the new version number.
My goal is to perform a query that will only retrieve the latest values in the table, alongside with the total number of rows.
For example, in the above case I would like to retrieve the following output:
| total | id | my_field | id_reference |
+-------+----+----------+--------------+
| 3 | 1 | b | 2 |
+-------+----+----------+--------------+
| 3 | 2 | c | 4 |
+-------+----+----------+--------------+
| 3 | 3 | x | 5 |
+-------+----+----------+--------------+
My attemp is the following:
select distinct on (id)
count(*) over () as total,
*
from my_table
order by id, id_reference desc
This returns almost the correct output, except that total is the number of rows in my_table instead of being the number of rows of the resulting query:
total | id | my_field | id_reference
-------+----+----------+--------------
5 | 1 | b | 2
5 | 2 | c | 4
5 | 3 | x | 5
(3 rows)
As you can see it has 5 instead of the expected 3.
I can fix this by using a subquery and count as an aggregate function:
with my_values as (
select distinct on (id)
*
from my_table
order by id, id_reference desc
)
select count(*) over (), * from my_values
Which produces my expected output.
My question: is there a way to avoid using this subquery and have something similar to count(*) over () return the result I want?
You are looking at my_table 3 ways:
to find the latest id_reference for each id
to find my_field for the latest id_reference for each id
to count the distinct number of ids in the table
I therefore prefer this solution:
select
c.id_count as total,
a.id,
a.my_field,
b.max_id_reference
from
my_table a
join
(
select
id,
max(id_reference) as max_id_reference
from
my_table
group by
id
) b
on
a.id = b.id and
a.id_reference = b.max_id_reference
join
(
select
count(distinct id) as id_count
from
my_table
) c
on true;
This is a bit longer (especially the long thin way I write SQL) but it makes it clear what is happening. If you come back to it in a few months time (somebody usually does) then it will take less time to understand what is going on.
The "on true" at the end is a deliberate cartesian product because there can only ever be exactly one result from the subquery "c" and you do want a cartesian product with that.
There is nothing necessarily wrong with subqueries.

Handling multiple return values in subquery

I have the following data:
cte
=================
gp_id | m_ids
------|----------
1 | {123}
2 | {432,222}
3 | {123,222}
And a function with a signature like this (which in fact returns not a table but a couple of ids):
FUNCTION foo(m_ids integer[])
RETURNS TABLE (
first_id integer,
second_id integer
)
Now, I've got to iterate over each row and perform some calculations with that function, so I would get something like this:
gp_id | first_id | second_id
------|----------|-----------
1 | 25 | 25
2 | 13 | 24
3 | 25 | 11
To achieve that I tried the following code:
SELECT gp_id,
(
SELECT *
FROM foo(
(
SELECT m_ids
FROM cte c2
WHERE c2.gp_id = c1.gp_id)) limit 1)
FROM cte c1
The problem is in the SELECT * statement. If I use SELECT first_id, everything works well (except for that I have to run two consecutive queries, which I'd like to avoid, obviously), but in the former case I'm getting the error
subquery must return only one column
which is somewhat expected.
So how can I correctly iterate over the table in one single query?
Use the function in a lateral join:
select gp_id, first_id, second_id
from cte,
lateral foo(m_ids);

SQL: reverse groupby : EDIT

Is there a build in function in sql, to reverse the order in which the groupby works? I try to groupby a certain key but i would like to have the last inserted record returned and not the first inserted record.
Changing the order with orderby does not affect this behaviour.
Thanx in advance!
EDIT:
this is the sample data:
id|value
-----
1 | A
2 | B
3 | B
4 | C
as return i want
1 | A
3 | B
4 | C
not
1 | A
2 | B
4 | C
when using group by id don't get the result i want.
Question here is how are you identifying last inserted row. Based on your example, it looks like based on id. If id is auto generated, or a sequence then you can definitely do this.
select max(id),value
from your_table
group by value
Ideally in a table design, people uses a date column which holds the time a particular record was inserted, so it is easy to order by that.
Use Max() as your aggregate function for your id:
SELECT max(id), value FROM <table> GROUP BY value;
This will return:
1 | A
3 | B
4 | C
As for eloquent, I've not used it but I think it would look like:
$myData = DB::table('yourtable')
->select('value', DB::raw('max(id) as maxid'))
->groupBy('value')
->get();

SQLite - select the newest row with a certain field value

I have an SQLite question which essentially boils down to the following problem.
id | key | data
1 | A | x
2 | A | x
3 | B | x
4 | B | x
5 | A | x
6 | A | x
New data is appended to the end of the table with an auto-incremented id.
Now, I want to create a query which returns the latest row for each key, like this:
id | key | data
4 | B | x
6 | A | x
I've tried some different queries but I have been unsuccessful. How do you select only the latest rows for each "key" value in the table?
use this SQL-Query:
select * from tbl where id in (select max(id) from tbl group by key);
You could split the main task into two subroutine.
You could move with the approach first retrieve all id/key value then get the id for the latest value of A and B keys,
Now you could easly write a query to get latest value for A and B because you have value of id's for both A and B keys.
SELECT *
FROM mytable
JOIN
( SELECT MAX(id) AS maxid
FROM mytable
GROUP BY "key"
) AS grp
ON grp.maxid = mytable.id
Side note: it's best not to use reserved words like keyas identifiers (for tables, fields. etc.)
Without nested SELECTs, or JOINs but only if the field determining "newest" is primary key (e.g. autoincrement):
SELECT * FROM table GROUP BY key DESC;