Select and count array keys in athena

Select and count array keys in athena - sql

I have many rows of data that represent events in my database. Each row has a column "payload" that contains an array of keys and values. I can easily parse for a value by using
Select
payload.keyname
from Database
But I am trying to get a list and count of all the keys that appear in a given day.
| payload |
|{id=a, gameid=x, gametype=1, sponserid=null} |
|{id=b, gameid=y, gametype=2, action=jump, sponserid=null}|
|{id=c, gameid=z, action=jump, sponserid=null} |
Desired Output
| Key |Count|
|id | 3 |
|game | 3 |
|gametype | 2 |
|action | 2 |
|sponserid| 2 |
Is there some method to query an array for keys easily? Such as
Select
payload.*, count(*)
from Database
group by payload.*

You can use map_keys function to extract keys from payload and unnest on top of it.
select key, count(1) as count
from database.table, unnest(map_keys(payload)) as X(key)
group by 1

You can use cross join unnest. The unnest will "unroll" the map and return a row for each map entry with key, value columns. If you want to count occurrences of each key you can group by key. For example
select key, count(*)
from mydb cross join unnest(payload) A(key, value)
group by 1
see the docs for more info.
----- EDIT ----
If your column is already in row format you can do instead:
select payload.keyname, count(*)
from mydb cross join payload
group by 1

Related

Select concatenated columns based on criteria list in other table

I have a table1
line
a
b
c
d
e
f
g
h
1
18
2
2
22
0
2
1
2
2
20
2
2
2
0
0
0
2
3
10
2
2
222
0
2
1
2
4
12
2
2
3
0
0
0
0
5
15
2
2
3
0
0
0
0
And a table2
 line
criteria
1
 a,b
2
 b,c,f,h
3
 a,b,e,g,h
4
 c,e
I am using this code to see/select the unique results of concated/joined columns, like concat(c,',',d), concat(b,',',d,',',g) and so on from table1 and is working perfectly:
SELECT DISTINCT(CONCAT(c,',',d))
FROM table1
But, instead of writing manually like concat(c,',',d), I want to refer to table2.criteria to get columns references to be concated/joined from table1 so that i can see the entire unique results against each concated criteria
Tried this, but getting an error:
SELECT DISTINCT(SELECT criteria FROM table2)
FROM table1
ERROR: more than one row returned by a subquery used as an expression
SQL state: 21000
The expected unique result is something like this;
| criteria | result |
| ------------ | ---------- |
| a,b | 15,2 |
| a,b | 10,2 |
| a,b | 20,2 |
| a,b | 12,2 |
| a,b | 18,2 |
| b,c,f,h | 2,2,2,2 |
| b,c,f,h | 2,2,0,2 |
| b,c,f,h | 2,2,0,0 |
| a,b,e,g,h | 20,2,0,0,2 |
| a,b,e,g,h | 12,2,0,0,0 |
| a,b,e,g,h | 15,2,0,0,0 |
| a,b,e,g,h | 10,2,0,1,2 |
| a,b,e,g,h | 18,2,0,1,2 |
| c,e | 2,0 |

SQL does not allow to parameterize identifiers. There are various ways to work around this restriction.
It's unclear from the question, but according to comments you want to concatenate the given pattern for every row in table1.
1. Dynamic SQL
Create a helper function (once!) that concatenates and executes statements dynamically.
Basics:
Define table and column names as arguments in a plpgsql function?
CREATE OR REPLACE FUNCTION f_concat_cols(_cols text)
RETURNS TABLE (result text)
LANGUAGE plpgsql AS
$func$
BEGIN
RETURN QUERY EXECUTE format(
$q$SELECT concat_ws(',', %s) FROM table1 ORDER BY line$q$, _cols);
END
$func$;
It's a set-returning function (a.k.a. "table function"), to return one result row for every row in table1 for each given pattern.
Warning: Converting user input to code like this is a prime opportunity for SQL injection. You must make sure that table1.criteria can only hold valid strings!
To get the full result matrix (with distinct results per row in table2), the query is simple now:
SELECT DISTINCT line AS t2_line, criteria, t1.*
FROM table2, f_concat_cols(criteria) t1
ORDER BY t2_line;
2. Workaround with conversion to JSON
SELECT DISTINCT t2.line AS t2_line, t2.criteria, c.*
FROM table2 t2
CROSS JOIN (SELECT line, to_json(t) AS js FROM table1 t) t1
CROSS JOIN LATERAL (
SELECT string_agg(t1.js->>sub, ',') AS result
FROM unnest(string_to_array(t2.criteria, ',')) sub
) c
ORDER BY t2_line;
After converting rows from t1 to a JSON record, we can access keys (converted from column names) directly.
I unnest the pattern, access each single key, and aggregate the result in LATERAL subquery. See:
What is the difference between a LATERAL JOIN and a subquery in PostgreSQL?
You could encapsulate the logic in a function like in 1., but that's optional in this case.
3. Workaround with conversion to Postgres arrays
SELECT DISTINCT t2.line AS t2_line, t2.criteria, c.*
FROM table2 t2
CROSS JOIN (SELECT line, ARRAY [a,b,c,d,e,f,g,h] AS arr FROM table1 t) t1
CROSS JOIN LATERAL (
SELECT string_agg(t1.arr[idx]::text, ',') AS result
FROM unnest(string_to_array(translate(t2.criteria, 'abcdefgh', '12345678'), ',')::int[]) idx
) c
ORDER BY t2_line;
Similar to the "trick" with JSON, we can avoid dynamic SQL by converting columns to a plain Postgres array. Then project column names to integer array indices. I use translate() for the simple case, but that only works for single letters! Use replace() or regexp_replace() or some other method for longer names.
The rest is like the above.
fiddle - showing all.

How to fetch data while where condition in jsonb in Postgresql

I have a table data_table like this
| id | reciever
| (bigint) |(jsonb)
----------------------------------------------------------------------
| 1 | [{"name":"ABC","email":"abc#gmail.com"},{"name":"ABDFC","email":"ab34c#gmail.com"},...]
| 2 | [{"name":"DEF","email":"deef#gmail.com"},{"name":"AFDBC","email":"a45bc#gmail.com"},...]
| 3 | [{"name":"GHI","email":"ghfi#gmail.com"},{"name":"AEEBC","email":"5gf#gmail.com"},...]
| 4 | [{"name":"LMN","email":"lfmn#gmail.com"},{"name":"EEABC","email":"gfg5#gmail.com"},...]
| 5 | [{"name":"PKL","email":"dfdf#gmail.com"},{"name":"ABREC","email":"a4rbc#gmail.com"},...]
| 6 | [{"name":"ANI","email":"fdffd#gmail.com"},{"name":"ABWC","email":"abrtc#gmail.com"},...]
when i run on pg admin it works fine
I want to fetch row by putting email in where condition like select * from data_table where receiver = 'abc#gmail.com'. there can be more data in array so i have shown "...".
I have tried like where receiver-->>'email'='abc#gmail.com' but it is working in the case {"name":"ABC","email":"abc#gmail.com"} only not in array where i have to chaeck every email in array
Help will be appreciated.

One option is to use exists and jsonb_array_elements():
select t.*
from mytable t
where exists (
select 1
from jsonb_array_elements(t.receiver) x(elt)
where x.elt ->> 'email' = 'abc#gmail.com'
)
This gives you all rows where at least one element in the array has the given email.
If you want to actually exhibit the matching elements, then you can use a lateral join instead (if more than one element in the array has the given email, this duplicates the row):
select t.*, x.elt
from mytable t
cross join lateral jsonb_array_elements(t.receiver) x(elt)
where x.elt ->> email = 'abc#gmail.com'

Postgres jsonb. Heterogenous json fields

If I have a table with a single jsonb column and the table has data like this:
[{"body": {"project-id": "111"}},
{"body": {"my-org.project-id": "222"}},
{"body": {"other-org.project-id": "333"}}]
Basically it stores project-id differently for different rows.
Now I need a query where the data->'body'->'etc'., from different rows would coalesce into a single field 'project-id', how can I do that?
e.g.: if I do something like this:
select data->'body'->'project-id' projectid from mytable
it will return something like:
| projectid |
| 111 |
But I also want project-id's in other rows too, but I don't want additional columns in the results. i.e, I want this:
| projectid |
| 111 |
| 222 |
| 333 |

I understand that each of your rows contains a json object, with a nested object whose key varies over rows, and whose value you want to acquire.
Assuming the 'body' always has a single key, you could do:
select jsonb_extract_path_text(t.js -> 'body', x.k) projectid
from t
cross join lateral jsonb_object_keys(t.js -> 'body') as x(k)
The lateral join on jsonb_object_keys() extracts all keys in the object as rows. Then we use jsonb_extract_path_text() to get the corresponding value.
Demo on DB Fiddle:
with t as (
select '{"body": {"project-id": "111"}}'::jsonb js
union all select '{"body": {"my-org.project-id": "222"}}'::jsonb
union all select '{"body": {"other-org.project-id": "333"}}'::jsonb
)
select jsonb_extract_path_text(t.js -> 'body', x.k) projectid
from t
cross join lateral jsonb_object_keys(t.js -> 'body') as x(k)
| projectid |
| :--------- |
| 111 |
| 222 |
| 333 |

Counting the total number of rows with SELECT DISTINCT ON without using a subquery

I have performing some queries using PostgreSQL SELECT DISTINCT ON syntax. I would like to have the query return the total number of rows alongside with every result row.
Assume I have a table my_table like the following:
CREATE TABLE my_table(
id int,
my_field text,
id_reference bigint
);
I then have a couple of values:
id | my_field | id_reference
----+----------+--------------
1 | a | 1
1 | b | 2
2 | a | 3
2 | c | 4
3 | x | 5
Basically my_table contains some versioned data. The id_reference is a reference to a global version of the database. Every change to the database will increase the global version number and changes will always add new rows to the tables (instead of updating/deleting values) and they will insert the new version number.
My goal is to perform a query that will only retrieve the latest values in the table, alongside with the total number of rows.
For example, in the above case I would like to retrieve the following output:
| total | id | my_field | id_reference |
+-------+----+----------+--------------+
| 3 | 1 | b | 2 |
+-------+----+----------+--------------+
| 3 | 2 | c | 4 |
+-------+----+----------+--------------+
| 3 | 3 | x | 5 |
+-------+----+----------+--------------+
My attemp is the following:
select distinct on (id)
count(*) over () as total,
*
from my_table
order by id, id_reference desc
This returns almost the correct output, except that total is the number of rows in my_table instead of being the number of rows of the resulting query:
total | id | my_field | id_reference
-------+----+----------+--------------
5 | 1 | b | 2
5 | 2 | c | 4
5 | 3 | x | 5
(3 rows)
As you can see it has 5 instead of the expected 3.
I can fix this by using a subquery and count as an aggregate function:
with my_values as (
select distinct on (id)
*
from my_table
order by id, id_reference desc
)
select count(*) over (), * from my_values
Which produces my expected output.
My question: is there a way to avoid using this subquery and have something similar to count(*) over () return the result I want?

You are looking at my_table 3 ways:
to find the latest id_reference for each id
to find my_field for the latest id_reference for each id
to count the distinct number of ids in the table
I therefore prefer this solution:
select
c.id_count as total,
a.id,
a.my_field,
b.max_id_reference
from
my_table a
join
(
select
id,
max(id_reference) as max_id_reference
from
my_table
group by
id
) b
on
a.id = b.id and
a.id_reference = b.max_id_reference
join
(
select
count(distinct id) as id_count
from
my_table
) c
on true;
This is a bit longer (especially the long thin way I write SQL) but it makes it clear what is happening. If you come back to it in a few months time (somebody usually does) then it will take less time to understand what is going on.
The "on true" at the end is a deliberate cartesian product because there can only ever be exactly one result from the subquery "c" and you do want a cartesian product with that.
There is nothing necessarily wrong with subqueries.

SQLite - select the newest row with a certain field value

I have an SQLite question which essentially boils down to the following problem.
id | key | data
1 | A | x
2 | A | x
3 | B | x
4 | B | x
5 | A | x
6 | A | x
New data is appended to the end of the table with an auto-incremented id.
Now, I want to create a query which returns the latest row for each key, like this:
id | key | data
4 | B | x
6 | A | x
I've tried some different queries but I have been unsuccessful. How do you select only the latest rows for each "key" value in the table?

use this SQL-Query:
select * from tbl where id in (select max(id) from tbl group by key);

You could split the main task into two subroutine.
You could move with the approach first retrieve all id/key value then get the id for the latest value of A and B keys,
Now you could easly write a query to get latest value for A and B because you have value of id's for both A and B keys.

SELECT *
FROM mytable
JOIN
( SELECT MAX(id) AS maxid
FROM mytable
GROUP BY "key"
) AS grp
ON grp.maxid = mytable.id
Side note: it's best not to use reserved words like keyas identifiers (for tables, fields. etc.)

Without nested SELECTs, or JOINs but only if the field determining "newest" is primary key (e.g. autoincrement):
SELECT * FROM table GROUP BY key DESC;

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas

Select and count array keys in athena - sql

You can use map_keys function to extract keys from payload and unnest on top of it. select key, count(1) as count from database.table, unnest(map_keys(payload)) as X(key) group by 1

Related

Select concatenated columns based on criteria list in other table

How to fetch data while where condition in jsonb in Postgresql

Postgres jsonb. Heterogenous json fields

Counting the total number of rows with SELECT DISTINCT ON without using a subquery

SQLite - select the newest row with a certain field value

Categories

Resources