How to fetch data with a WHERE condition on a jsonb array in PostgreSQL - sql

I have a table data_table like this:

| id (bigint) | receiver (jsonb)                                                                         |
|-------------|------------------------------------------------------------------------------------------|
| 1           | [{"name":"ABC","email":"abc@gmail.com"},{"name":"ABDFC","email":"ab34c@gmail.com"},...]   |
| 2           | [{"name":"DEF","email":"deef@gmail.com"},{"name":"AFDBC","email":"a45bc@gmail.com"},...]  |
| 3           | [{"name":"GHI","email":"ghfi@gmail.com"},{"name":"AEEBC","email":"5gf@gmail.com"},...]    |
| 4           | [{"name":"LMN","email":"lfmn@gmail.com"},{"name":"EEABC","email":"gfg5@gmail.com"},...]   |
| 5           | [{"name":"PKL","email":"dfdf@gmail.com"},{"name":"ABREC","email":"a4rbc@gmail.com"},...]  |
| 6           | [{"name":"ANI","email":"fdffd@gmail.com"},{"name":"ABWC","email":"abrtc@gmail.com"},...]  |
Querying this table in pgAdmin works fine. I want to fetch rows by putting an email in the WHERE condition, something like select * from data_table where receiver = 'abc@gmail.com'. There can be more data in the array, which is why I have shown "...".
I have tried where receiver ->> 'email' = 'abc@gmail.com', but that only works when the column holds a single object like {"name":"ABC","email":"abc@gmail.com"}, not when it holds an array where I have to check every email.
Help will be appreciated.

One option is to use exists and jsonb_array_elements():
select t.*
from mytable t
where exists (
    select 1
    from jsonb_array_elements(t.receiver) x(elt)
    where x.elt ->> 'email' = 'abc@gmail.com'
)
This gives you all rows where at least one element in the array has the given email.
If you want to actually exhibit the matching elements, then you can use a lateral join instead (if more than one element in the array has the given email, this duplicates the row):
select t.*, x.elt
from mytable t
cross join lateral jsonb_array_elements(t.receiver) x(elt)
where x.elt ->> 'email' = 'abc@gmail.com'
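As a side note, when you only need the matching rows (not the individual elements), the jsonb containment operator @> expresses the same check more compactly, and it can be backed by a GIN index on large tables. A minimal sketch of that variant (the index name is made up):

select *
from mytable
where receiver @> '[{"email": "abc@gmail.com"}]'::jsonb;

-- optional: a GIN index lets the containment check avoid a sequential scan
create index idx_mytable_receiver on mytable using gin (receiver);

Containment matches any row whose receiver array holds an object with that email key/value pair, regardless of the object's other keys.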

Related

Select and count array keys in Athena

I have many rows of data that represent events in my database. Each row has a column "payload" that contains an array of keys and values. I can easily parse out a single value by using:

Select
    payload.keyname
from Database

But I am trying to get a list and count of all the keys that appear in a given day.
| payload                                                    |
|------------------------------------------------------------|
| {id=a, gameid=x, gametype=1, sponserid=null}               |
| {id=b, gameid=y, gametype=2, action=jump, sponserid=null}  |
| {id=c, gameid=z, action=jump, sponserid=null}              |
Desired output:

| Key       | Count |
|-----------|-------|
| id        | 3     |
| gameid    | 3     |
| gametype  | 2     |
| action    | 2     |
| sponserid | 2     |
Is there some method to query an array for keys easily? Such as:

Select
    payload.*, count(*)
from Database
group by payload.*
You can use the map_keys function to extract the keys from payload and unnest on top of it:
select key, count(1) as count
from database.table, unnest(map_keys(payload)) as X(key)
group by 1
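To restrict this to a given day, as the question asks, add your table's date column to a WHERE clause. A sketch of the same query with that filter, assuming a hypothetical event_date column (the column name is made up; substitute your own date or partition column):

select key, count(*) as key_count
from database.table
cross join unnest(map_keys(payload)) as t(key)
where event_date = date '2023-01-01'
group by key
order by key_count desc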
You can use cross join unnest. The unnest will "unroll" the map and return a row for each map entry, with key and value columns. If you want to count occurrences of each key, you can group by key. For example:
select key, count(*)
from mydb cross join unnest(payload) A(key, value)
group by 1
see the docs for more info.
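Since the unnest exposes the value column as well, the same shape also answers value-oriented questions. For instance, a sketch counting how many non-null values each key has (count(value) skips nulls), under the same table assumptions as above:

select key, count(value) as non_null_values
from mydb cross join unnest(payload) A(key, value)
group by 1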
----- EDIT ----
If your column is already in a row (struct) format, you can instead do:

select payload.keyname, count(*)
from mydb
group by 1

How can I summarise this JSON array data as columns on the same row?

I have a PostgreSQL database with a table called items which includes a JSONB field care_ages.
This field contains an array of between one and three objects, which include these keys:
AgeFrom
AgeTo
MaximumNumber
Register
For a one-off audit report I need to run on this table, I need to "unpack" this field into columns on the same row.
I've used jsonb_to_recordset to split it out into rows and columns, which gets me halfway:
SELECT
    items.id,
    items.name,
    care_ages.*
FROM
    items,
    jsonb_to_recordset(items.care_ages) AS care_ages ("AgeFrom" integer, "AgeTo" integer, "Register" text, "MaximumNumber" integer)
This gives me output like:
| id | name         | AgeFrom | AgeTo | Register | MaximumNumber |
|----|--------------|---------|-------|----------|---------------|
| 1  | namey mcname | 0       | 4     | xyz      | 5             |
| 1  | namey mcname | 4       | 8     | abc      | 7             |
Next, I need to combine these rows together, perhaps using GROUP BY, adding extra columns, like this:
| id | name | register_xyz? | xyz_age_from | xyz_age_to | xyz_maximum_number | register_abc? | abc_age_from | abc_age_to | abc_maximum_number |
|----|--------------|---------------|--------------|------------|--------------------|---------------|--------------|------------|--------------------|
| 1 | namey mcname | true | 0 | 4 | 5 | true | 4 | 8 | 7 |
Because I know ahead of time which "registers" there are (there's only three of them), it seems like this should be possible.
I've tried following this example, using CASE to calculate extra columns, but I'm not getting any useful values: all 0s and 5s for some reason.
If you are using Postgres 12 or later, you can use a jsonpath query to first extract the JSON object for each register into a separate column, then use the usual operators to extract the keys. This avoids first expanding into multiple rows just to aggregate them back into a single row later:
select id, name,
    (reg_xyz ->> 'AgeFrom')::int as xyz_age_from,
    (reg_xyz ->> 'AgeTo')::int as xyz_age_to,
    (reg_xyz ->> 'MaximumNumber')::int as xyz_max_num,
    (reg_abc ->> 'AgeFrom')::int as abc_age_from,
    (reg_abc ->> 'AgeTo')::int as abc_age_to,
    (reg_abc ->> 'MaximumNumber')::int as abc_max_num
from (
    select id, name,
        jsonb_path_query_first(care_ages, '$[*] ? (@.Register == "xyz")') as reg_xyz,
        jsonb_path_query_first(care_ages, '$[*] ? (@.Register == "abc")') as reg_abc
    from items
) t
At one point or another you will have to explicitly write out one expression for each column, so jsonb_to_recordset doesn't really buy you that much.
Online example
If you need this a lot, you can easily put this into a view.
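For instance, a minimal sketch of such a view, reusing the query above with just the xyz columns (the view name is made up):

create view items_care_ages_pivot as
select id, name,
    (reg_xyz ->> 'AgeFrom')::int as xyz_age_from,
    (reg_xyz ->> 'AgeTo')::int as xyz_age_to
from (
    select id, name,
        jsonb_path_query_first(care_ages, '$[*] ? (@.Register == "xyz")') as reg_xyz
    from items
) t;

Querying it then looks like any ordinary table, e.g. select * from items_care_ages_pivot where xyz_age_from = 0.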
Try the query below:
select id, name,
    bool_or("Register" = 'xyz') as "register_xyz?",
    max(case when "Register" = 'xyz' then "AgeFrom" end) as xyz_age_from,
    max(case when "Register" = 'xyz' then "AgeTo" end) as xyz_age_to,
    max(case when "Register" = 'xyz' then "MaximumNumber" end) as xyz_maximum_number,
    bool_or("Register" = 'abc') as "register_abc?",
    max(case when "Register" = 'abc' then "AgeFrom" end) as abc_age_from,
    max(case when "Register" = 'abc' then "AgeTo" end) as abc_age_to,
    max(case when "Register" = 'abc' then "MaximumNumber" end) as abc_maximum_number
from (
    SELECT
        items.id,
        items.name,
        care_ages.*
    FROM
        items,
        jsonb_to_recordset(items.care_ages) AS care_ages ("AgeFrom" integer, "AgeTo" integer, "Register" text, "MaximumNumber" integer)
) t
group by id, name
You can use conditional aggregation to pivot your table. This can be done with a CASE expression, as in the solution you already linked, or with the FILTER clause:
demo:db<>fiddle
SELECT
    id,
    name,
    bool_and(true) FILTER (WHERE "Register" = 'xyz') as "register_xyz?",
    MAX("AgeFrom") FILTER (WHERE "Register" = 'xyz') as xyz_age_from,
    MAX("AgeTo") FILTER (WHERE "Register" = 'xyz') as xyz_age_to,
    MAX("MaximumNumber") FILTER (WHERE "Register" = 'xyz') as xyz_maximum_number,
    bool_and(true) FILTER (WHERE "Register" = 'abc') as "register_abc?",
    MAX("AgeFrom") FILTER (WHERE "Register" = 'abc') as abc_age_from,
    MAX("AgeTo") FILTER (WHERE "Register" = 'abc') as abc_age_to,
    MAX("MaximumNumber") FILTER (WHERE "Register" = 'abc') as abc_maximum_number
FROM items,
    jsonb_to_recordset(items.care_ages) AS care_ages ("AgeFrom" integer, "AgeTo" integer, "Register" text, "MaximumNumber" integer)
GROUP BY id, name

Counting the total number of rows with SELECT DISTINCT ON without using a subquery

I am performing some queries using PostgreSQL's SELECT DISTINCT ON syntax. I would like the query to return the total number of rows alongside every result row.
Assume I have a table my_table like the following:
CREATE TABLE my_table(
    id int,
    my_field text,
    id_reference bigint
);
I then have a couple of values:
 id | my_field | id_reference
----+----------+--------------
  1 | a        |            1
  1 | b        |            2
  2 | a        |            3
  2 | c        |            4
  3 | x        |            5
Basically my_table contains some versioned data. The id_reference is a reference to a global version of the database. Every change to the database increases the global version number, and changes always add new rows to the tables (instead of updating or deleting values), inserting the new version number.
My goal is to perform a query that will only retrieve the latest values in the table, alongside with the total number of rows.
For example, in the above case I would like to retrieve the following output:
| total | id | my_field | id_reference |
+-------+----+----------+--------------+
| 3 | 1 | b | 2 |
+-------+----+----------+--------------+
| 3 | 2 | c | 4 |
+-------+----+----------+--------------+
| 3 | 3 | x | 5 |
+-------+----+----------+--------------+
My attempt is the following:
select distinct on (id)
    count(*) over () as total,
    *
from my_table
order by id, id_reference desc
This returns almost the correct output, except that total is the number of rows in my_table instead of the number of rows in the result (window functions like count(*) over () are evaluated before distinct on discards the duplicate rows, so the window still sees all five input rows):
total | id | my_field | id_reference
-------+----+----------+--------------
5 | 1 | b | 2
5 | 2 | c | 4
5 | 3 | x | 5
(3 rows)
As you can see it has 5 instead of the expected 3.
I can fix this by using a subquery and count as an aggregate function:
with my_values as (
select distinct on (id)
*
from my_table
order by id, id_reference desc
)
select count(*) over (), * from my_values
Which produces my expected output.
My question: is there a way to avoid using this subquery and have something similar to count(*) over () return the result I want?
You are looking at my_table 3 ways:
to find the latest id_reference for each id
to find my_field for the latest id_reference for each id
to count the distinct number of ids in the table
I therefore prefer this solution:
select
    c.id_count as total,
    a.id,
    a.my_field,
    b.max_id_reference
from
    my_table a
join
    (
        select
            id,
            max(id_reference) as max_id_reference
        from
            my_table
        group by
            id
    ) b
on
    a.id = b.id and
    a.id_reference = b.max_id_reference
join
    (
        select
            count(distinct id) as id_count
        from
            my_table
    ) c
on true;
This is a bit longer (especially in the long, thin way I write SQL), but it makes it clear what is happening. If you come back to it in a few months' time (somebody usually does), it will take less time to understand what is going on.
The "on true" at the end is a deliberate cartesian product: there can only ever be exactly one row from subquery "c", and you do want a cartesian product with that.
There is nothing necessarily wrong with subqueries.
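For completeness, a sketch of another common pattern that keeps everything in one statement by replacing distinct on with row_number(): the outer count(*) over () runs after the where clause has discarded the non-latest rows, so it counts 3 as desired. It still relies on a derived table, since PostgreSQL has no QUALIFY clause and filtering on a window function always needs a subquery or CTE:

select count(*) over () as total, id, my_field, id_reference
from (
    select *,
           row_number() over (partition by id order by id_reference desc) as rn
    from my_table
) t
where rn = 1;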

Hive for bag of words (word count for each word in the dictionary)

I have a table with this structure:

user_id | message_id | content
--------+------------+--------------
1       | 1          | "I like cats"
1       | 1          | "I like dogs"
And a list of valid words in dictionary.txt (or an external Hive table), for example:

I,like,dogs,cats,lemurs

And my goal is to generate a word-count table for each user:

user_id | "I" | "like" | "dogs" | "cats" | "lemurs"
--------+-----+--------+--------+--------+----------
1       | 2   | 2      | 1      | 1      | 0
This is what I tried so far:
SELECT user_id, word, COUNT(*)
FROM messages LATERAL VIEW explode(split(content, ' ')) lTable as word
GROUP BY user_id,word;
Check this:

select ename,
       length(ename) - length(replace(ename, 'A', '')) A,
       length(ename) - length(replace(ename, 'W', '')) W
FROM EMP;

Else you can define a variable (your search string) and use it in place of 'A', 'W', etc.
I am not very familiar with doing a pivot in Hive, but in Pig it would be possible:

DEFINE GET_WORDCOUNTS com.stackoverflow.pig.GetWordCounts('$dictionary_path');
A = LOAD .... AS user_id, message_id, content;
B = GROUP A BY user_id;
C = FOREACH B GENERATE group, FLATTEN(GET_WORDCOUNTS(A.content));

You will have to write a simple UDF GetWordCounts which tokenizes the input content for each grouped record and checks it against the input dictionary.
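Back in Hive itself: the LATERAL VIEW attempt from the question already produces one row per (user_id, word) pair; what is missing is the pivot into one column per dictionary word. Since the dictionary is known ahead of time, conditional aggregation on top of the exploded rows gets you there. A sketch hard-coding the five dictionary words (backticks because like is a reserved word in Hive):

SELECT user_id,
       SUM(CASE WHEN word = 'I' THEN 1 ELSE 0 END) AS `I`,
       SUM(CASE WHEN word = 'like' THEN 1 ELSE 0 END) AS `like`,
       SUM(CASE WHEN word = 'dogs' THEN 1 ELSE 0 END) AS `dogs`,
       SUM(CASE WHEN word = 'cats' THEN 1 ELSE 0 END) AS `cats`,
       SUM(CASE WHEN word = 'lemurs' THEN 1 ELSE 0 END) AS `lemurs`
FROM (
    SELECT user_id, word
    FROM messages LATERAL VIEW explode(split(content, ' ')) lTable AS word
) w
GROUP BY user_id;

If the dictionary lives in an external table instead, you would join the exploded words against it before aggregating, but the output column list itself still has to be spelled out (or generated) ahead of time.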

Access query to grab 5+ duplicates

I have a little problem with an Access query (don't ask me why, but I cannot use a proper DBMS, only Access).
I have a huge table with around 920k records. I have to loop through all that data and grab the refs that occur more than 5 times on the same date.

table = myTable

| id  | ref       | date       | C_ERR_ANO |
|-----|-----------|------------|-----------|
| 1   | A12345678 | 2012/02/24 | A 4565    |
| 2   | D52245708 | 2011/05/02 | E 5246    |
| ... | ......... | ..../../.. | . ....    |

So to sum it up: I have 900,000+ records, and there are duplicates on the SAME DATE (by the way, the error-number column I mentioned is the one named C_ERR_ANO). I have to loop through all those rows, grab each ref based on date AND error number, and if a ref occurs MORE than 5 times with the same error number, I have to grab those rows and display them in the result.
I ended up using this query:
SELECT DISTINCT Centre.REFERENCE, Centre.DATESE, Centre.C_ERR_ANO
FROM Centre INNER JOIN (SELECT
Centre.[REFERENCE],
COUNT(*) AS `toto`,
Centre.DATESE
FROM Centre
GROUP BY REFERENCE
HAVING COUNT(*) > 5) AS Centre_1
ON Centre.REFERENCE = Centre_1.REFERENCE
AND Centre.DATESE <> Centre_1.DATESE;
But this query isn't good. I then tried:
SELECT DATESE, REFERENCE, C_ERR_ANO, COUNT(REFERENCE) AS TOTAL
FROM (
SELECT *
FROM Centre
WHERE (((Centre.[REFERENCE]) NOT IN (SELECT [REFERENCE]
FROM [Centre] AS Tmp
GROUP BY [REFERENCE],[DATESE],[C_ERR_ANO]
HAVING Count(*)>1 AND [DATESE] = [Centre].[DATESE]
AND [C_ERR_ANO] = [Centre].[C_ERR_ANO]
AND [LIBELLE] = [Centre].[LIBELLE])))
ORDER BY Centre.[REFERENCE], Centre.[DATESE], Centre.[C_ERR_ANO])
GROUP BY REFERENCE, DATESE, C_ERR_ANO
Still not working; I'm struggling.
Your GROUP BY clause needs to include all of the items in your SELECT. Why not use:
select Centre.DATESE, Centre.C_ERR_ANO, Count(*)
from Centre
group by Centre.DATESE, Centre.C_ERR_ANO
having Count(*) > 5
If you need other fields, you can add them, as long as you ensure the same fields appear in the SELECT as in the GROUP BY.
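If you also need to see the individual rows (the refs themselves, not just the offending date/error combinations), one way is to join the table back to that aggregate. A minimal sketch in Access SQL, using the column names from the question (if your Access version complains about the derived table, save the inner query as a named query and join to that instead):

SELECT c.REFERENCE, c.DATESE, c.C_ERR_ANO
FROM Centre AS c
INNER JOIN (
    SELECT DATESE, C_ERR_ANO
    FROM Centre
    GROUP BY DATESE, C_ERR_ANO
    HAVING COUNT(*) > 5
) AS d
ON (c.DATESE = d.DATESE AND c.C_ERR_ANO = d.C_ERR_ANO);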