SQL: Select * where attribute matches all items in array

Working in a Rails App, I have the following table structure (pertinent columns only)
Photos (id: integer)
Taggings (photo_id: integer, tag_id: integer)
Tags (id: integer, name:string)
I have the following SQL query:
SELECT DISTINCT photos.*
FROM "photos" INNER JOIN "taggings" ON "photos"."id" = "taggings"."photo_id"
INNER JOIN "tags" ON "tags"."id" = "taggings"."tag_id"
WHERE "tags"."name" IN ('foo', 'bar')
When I generate this query I'm passing in an array of tags (in this case ["foo","bar"]). The query correctly searches for photos that match ANY of the tags passed in the array.
How can I change this query to select records with ALL of the given tags (i.e. a photo matches only if it is tagged with both "foo" AND "bar"), instead of selecting records with ANY of the given tags?

There may be a better way, but this should do it
SELECT photos.id, max(otherColumn)
FROM "photos"
INNER JOIN "taggings"
ON "photos"."id" = "taggings"."photo_id"
INNER JOIN "tags"
ON "tags"."id" = "taggings"."tag_id"
WHERE "tags"."name" IN ('foo', 'bar')
GROUP BY photos.id
HAVING count(DISTINCT tags.name) = 2 -- 2 is the number of items in your array of tags; DISTINCT guards against duplicate taggings
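As a sanity check, the GROUP BY / HAVING pattern above can be exercised end-to-end with an in-memory SQLite database. This is a minimal sketch with invented sample rows, using COUNT(DISTINCT ...) so duplicate taggings cannot inflate the count:

```python
import sqlite3

# Schema mirroring the question; the rows are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE photos (id INTEGER PRIMARY KEY);
CREATE TABLE tags (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE taggings (photo_id INTEGER, tag_id INTEGER);
INSERT INTO photos VALUES (1), (2), (3);
INSERT INTO tags VALUES (1, 'foo'), (2, 'bar'), (3, 'baz');
-- photo 1 has foo+bar, photo 2 has only foo, photo 3 has foo+baz
INSERT INTO taggings VALUES (1, 1), (1, 2), (2, 1), (3, 1), (3, 3);
""")

wanted = ["foo", "bar"]
placeholders = ",".join("?" * len(wanted))
rows = conn.execute(f"""
    SELECT photos.id
    FROM photos
    JOIN taggings ON photos.id = taggings.photo_id
    JOIN tags ON tags.id = taggings.tag_id
    WHERE tags.name IN ({placeholders})
    GROUP BY photos.id
    HAVING COUNT(DISTINCT tags.name) = ?
""", wanted + [len(wanted)]).fetchall()

print(rows)  # only photo 1 carries both tags: [(1,)]
```

Binding the array length as the HAVING parameter keeps the query correct for any number of tags.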

If you are in Rails you don't need a raw SQL query to do this.
Tags.find(1).taggings should give you an array of all photos with that tag.
You can also use Tags.find_by_name("foo").taggings.
You can similarly iterate over all the tags, collect the resulting arrays, and then intersect them:
[ 1, 1, 3, 5 ] & [ 1, 2, 3 ] #=> [ 1, 3 ]
Basically AND the arrays together and keep the unique photos. This way you get the photos that match all the tags.
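The same intersection idea can be sketched in Python (Ruby's & operator on arrays does the same thing); the tag-to-photo-id lists here are invented for illustration:

```python
# tag -> photo ids carrying that tag; invented sample data
photos_by_tag = {
    "foo": [1, 1, 3, 5],
    "bar": [1, 2, 3],
}

wanted = ["foo", "bar"]
matching = set(photos_by_tag[wanted[0]])
for tag in wanted[1:]:
    matching &= set(photos_by_tag[tag])  # keep only ids present for every tag

print(sorted(matching))  # [1, 3]
```

Note this pulls all the per-tag id lists into application memory, so the GROUP BY / HAVING query is preferable for large tables.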

Related

Postgres - order records based on a property inside an array of json objects

I'm working with a Postgres database and I have a products view like this:
id | name     | product_groups
1  | product1 | [{...}]
2  | product2 | [{...}]
the product_groups field contains an array of json objects with the product groups data that the product belongs to, where each json object has the following structure:
{
  "productGroupId": 1001,
  "productGroupName": "Microphones",
  "orderNo": 1
}
I have a query to get all the products that belong to certain group:
SELECT * FROM products p WHERE p.product_groups @> '[{"productGroupId": 1001}]'
but I want to get all the products ordered by the orderNo property of the group that I'm querying for.
what should I add/modify to my query in order to achieve this?
I am not really sure I understand your question. My assumptions are:
there will only be one match for the condition on product groups
you want to sort the result rows from the products table, not the elements of the array.
If those two assumptions are correct, you can use a JSON path expression to extract the value of orderNo and then sort by it.
SELECT p.*
FROM products p
WHERE p.product_groups @> '[{"productGroupId": 1001}]'
ORDER BY jsonb_path_query_first(p.product_groups, '$[*] ? (@.productGroupId == 1001).orderNo')::int
You have to unnest the array:
SELECT p.*
FROM products AS p
CROSS JOIN LATERAL jsonb_array_elements(p.product_groups) AS arr(elem)
WHERE arr.elem @> '{"productGroupId": 1001}'
ORDER BY CAST(arr.elem ->> 'orderNo' AS bigint);
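Outside the database, the same "filter by group, then sort by that group's orderNo" logic can be sketched in Python; the product rows below are invented for illustration:

```python
import json

# Invented product rows; product_groups holds a JSON-encoded array as in the view.
products = [
    {"id": 1, "name": "product1", "product_groups": json.dumps(
        [{"productGroupId": 1001, "productGroupName": "Microphones", "orderNo": 2}])},
    {"id": 2, "name": "product2", "product_groups": json.dumps(
        [{"productGroupId": 1001, "productGroupName": "Microphones", "orderNo": 1}])},
    {"id": 3, "name": "product3", "product_groups": json.dumps(
        [{"productGroupId": 2002, "productGroupName": "Cables", "orderNo": 1}])},
]

def order_no(row, group_id=1001):
    # like jsonb_path_query_first: orderNo of the first element matching the group id
    for elem in json.loads(row["product_groups"]):
        if elem["productGroupId"] == group_id:
            return elem["orderNo"]
    return None

in_group = [r for r in products if order_no(r) is not None]
in_group.sort(key=order_no)
print([r["name"] for r in in_group])  # ['product2', 'product1']
```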

query for many to many record matching

I have table tag_store like below
I want to filter out the ids that have all of the tags provided in a list.
SELECT st.id FROM public."tag_store" st
INNER JOIN
(SELECT x.tg_type, x.tg_value FROM json_to_recordset
('[{"tg_type":1, "tg_value":"cd"}, {"tg_type":2, "tg_value":"tg"}, {"tg_type":3, "tg_value":"po"}]'::json)
AS x (tg_type int, tg_value text)) ftg
ON st.tg_type = ftg.tg_type
AND st.tg_value = ftg.tg_value ORDER BY st.id;
My desired output should contain only id 1, as it is the only one where all three tg_type and tg_value pairs match.
Please help: what should I change, or is there a better alternative?
Thanks
I would aggregate the values into a JSON array and use the @> operator to filter those that have all of them:
with tags as (
select id, jsonb_agg(jsonb_build_object('tg_id', tag_id, 'tg_value', tag_value)) all_tags
from tag_store
group by id
)
select *
from tags
where all_tags @> '[{"tg_id":1, "tg_value": "cd"},
{"tg_id":2, "tg_value": "tg"},
{"tg_id":3, "tg_value": "po"}]'
;
You can also do that directly in a HAVING clause if you want
select id
from tag_store
group by id
having jsonb_agg(jsonb_build_object('tg_id', tag_id, 'tg_value', tag_value))
@> '[{"tg_id":1, "tg_value": "cd"},
{"tg_id":2, "tg_value": "tg"},
{"tg_id":3, "tg_value": "po"}]'
;
Note that this will return IDs that have additional tags apart from those in the comparison array.
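The aggregate-then-check-containment idea can be sketched in plain Python: an id qualifies when every required tag object appears among its aggregated tags. The sample rows and column names here are invented for illustration:

```python
# Invented tag_store rows (id, tg_id, tg_value) and the required tag set.
tag_store = [
    {"id": 1, "tg_id": 1, "tg_value": "cd"},
    {"id": 1, "tg_id": 2, "tg_value": "tg"},
    {"id": 1, "tg_id": 3, "tg_value": "po"},
    {"id": 2, "tg_id": 1, "tg_value": "cd"},
]
required = [{"tg_id": 1, "tg_value": "cd"},
            {"tg_id": 2, "tg_value": "tg"},
            {"tg_id": 3, "tg_value": "po"}]

# group rows by id, as jsonb_agg(...) GROUP BY id would
by_id = {}
for row in tag_store:
    by_id.setdefault(row["id"], []).append(
        {"tg_id": row["tg_id"], "tg_value": row["tg_value"]})

# containment: every required object must appear among the id's tags
matching = [i for i, tags in by_id.items()
            if all(req in tags for req in required)]
print(matching)  # [1]
```

As with the SQL version, an id with extra tags beyond the required set still matches.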

postgresql Looping through JSONB array and performing SELECTs

I have jsonb in one of my table
the jsonb looks like this
my_data : [
{pid: 1, stock: 500},
{pid: 2, stock: 1000},
...
]
pid refers to products' table id ( which is pid )
EDIT: The table products has following properties: pid (PK), name
I want to loop over my_data[] in my JSONB and fetch each pid's name from the products table.
I need the result to look something like this (including the product names from the second table) ->
my_data : [
{
product_name : "abc",
pid: 1,
stock : 500
},
...
]
How should I go about performing such jsonb inner join?
Edit: I tried S-Man's solution and I'm getting this error:
"invalid reference to FROM-clause entry for table "jc""
Here is the SQL query:
SELECT
jsonb_build_object( -- 5
'my_data',
jsonb_agg( -- 4
elems || jsonb_build_object('product_name', mot.product_name) -- 3
)
)
FROM
mytable,
jsonb_array_elements(mydata -> 'my_data') as elems -- 1
JOIN
my_other_table mot ON (elems ->> 'pid')::int = mot.pid -- 2
Expand the JSON array into one row per element
Join the other table against the current one using the pid values (note the ::int cast, because otherwise it would be a text value)
The new columns from the second table can now be converted into a JSON object, which can be concatenated onto the original one using the || operator
After that, recreate the array from its elements again
Put this array into a my_data element
Another way is to use jsonb_set() instead of step 5, to reset the array within the original object directly:
SELECT
jsonb_set(
mydata,
'{my_data}',
jsonb_agg(
elems || jsonb_build_object('product_name', mot.product_name)
)
)
FROM
mytable,
jsonb_array_elements(mydata -> 'my_data') as elems
JOIN
my_other_table mot ON (elems ->> 'pid')::int = mot.pid
GROUP BY mydata
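A plain-Python sketch of the same join may help: expand the my_data array, look up each pid's name, and merge it back in (the table contents and product names here are invented):

```python
import json

# Invented row and lookup table: my_data as in the question, products maps pid -> name.
mytable_row = {"my_data": [{"pid": 1, "stock": 500},
                           {"pid": 2, "stock": 1000}]}
products = {1: "abc", 2: "xyz"}

merged = {
    "my_data": [
        # like elem || jsonb_build_object('product_name', ...)
        {**elem, "product_name": products[elem["pid"]]}
        for elem in mytable_row["my_data"]
    ]
}
print(json.dumps(merged))
```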

postgreSQL query empty array fields within jsonb column

device_id | device
-----------------------------
9809 | { "name" : "printer", "tags" : [] }
9810 | { "name" : "phone", "tags" : [{"count": 2, "price" : 77}, {"count": 3, "price" : 37} ] }
For the following postgres SQL query on a jsonb column "device" that contains array 'tags':
SELECT t.device_id, elem->>'count', elem->>'price'
FROM tbl t, json_array_elements(t.device->'tags') elem
where t.device_id = 9809
device_id is the primary key.
I have two issues that I don't know how to solve:
tags is an array field that may be empty, in which case I get 0 rows. I want output whether tags is empty or not; dummy values are OK.
If tags contains multiple elements, I get multiple rows for the same device id. How can I aggregate those multiple elements into one row?
Your first problem can be solved by using a left outer join, that will substitute NULL values for missing matches on the right side.
The second problem can be solved with an aggregate function like json_agg, array_agg or string_agg, depending on the desired result type:
SELECT t.device_id,
jsonb_agg(elem->>'count'),
jsonb_agg(elem->>'price')
FROM tbl t
LEFT JOIN LATERAL jsonb_array_elements(t.device->'tags') elem
ON TRUE
GROUP BY t.device_id;
You will get a JSON array containing just null for those rows where the array is empty, I hope that is ok for you.
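In Python terms, the LEFT JOIN LATERAL plus aggregate behaviour looks like this: a device with an empty tags array still yields one output row, with [None] aggregates (the sample rows mirror the question):

```python
# Sample rows mirror the question's table.
devices = {
    9809: {"name": "printer", "tags": []},
    9810: {"name": "phone", "tags": [{"count": 2, "price": 77},
                                     {"count": 3, "price": 37}]},
}

result = {}
for device_id, device in devices.items():
    # LEFT JOIN LATERAL ... ON TRUE: an empty array still contributes one NULL element
    tags = device["tags"] or [None]
    result[device_id] = {
        "counts": [t["count"] if t else None for t in tags],
        "prices": [t["price"] if t else None for t in tags],
    }

print(result[9809])  # {'counts': [None], 'prices': [None]}
```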

Determining membership of an int in a separate relation

This is about determining whether an int value from a tuple in one relation is a member of a column in another relation, in Pig Latin. I'm new to Pig Latin and finding it difficult to wrap my mind around the framework.
At the moment I have two tables, one containing a list of ids against tags with a small domain of values, and another with tuples containing an id and a tag id referring to the other table.
Here's orders.csv:
id, tag
1597, x
999, y
787, a
812, x
And tags.csv:
id, tag_id
11, 55
99, 812
22, 787
I need a method of working out whether the tag_id of each tuple in the orders table is a member of the set of ids in the tags table:
id, has_x
111, 0
99, 1
22, 0
This is what I have so far:
register 's3://bucket/jython_task.py' using jython as task;
tags = load 's3://bucket/tags.csv' USING PigStorage(',') AS (id: long, tag: chararray);
orders = load 's3://bucket/orders.csv' USING PigStorage(',') AS (id: long, tag_id: long);
tags = filter tags by tag == 'x';
x_cases = foreach tags generate tag;
tagged_orders = foreach orders generate id, tag_id, tasks.check_membership(tag_id, x_cases.tag) as is_x:int;
and the udf:
def check_membership(instance, value_list):
    if instance != None:
        for value in value_list:
            if instance == value[0]:
                return 1
    return 0
I get the error:
2012-09-20 23:53:45,377 [main] ERROR org.apache.pig.tools.pigstats.SimplePigStats - ERROR 2997: Unable to recreate exception from backed error: org.apache.pig.backend.executionengine.ExecException: ERROR 0: Scalar has more than one row in the output. 1st : (7995), 2nd :(8028)
What am I doing wrong? is there a better way to be doing what I'm looking to do?
I do not know what the problem in the UDF is, but you can get the result with pure Pig. Use COGROUP and the IsEmpty built-in function.
x_cases = cogroup orders by (tag_id), tags by (id);
tagged_orders = foreach x_cases generate flatten(orders), IsEmpty(tags);
or
tagged_orders = filter x_cases by not IsEmpty(tags);
It might not be the fastest running implementation as it uses Reduce side join, but it all depends on the volumes.
A faster approach could be to use replicated join, which will load the tags table into RAM and will use Map side join, which is faster. The bad thing is that you will lose the records that are not tagged.
tagged_orders = join orders by (tag_id), tags by (id) using 'replicated';
I eventually found a solution to my own problem. It involves a left outer join between the two relations and may have a more elegant formulation; I'm open to any better solutions.
tags = load 's3://bucket/tags.csv' USING PigStorage(',') AS (id: long, tag: chararray);
orders = load 's3://bucket/orders.csv' USING PigStorage(',') AS (id: long, tag_id: long);
tags = filter tags by tag == 'x';
tag_cases = foreach tags generate id, 1 as found_tag:int;
tag_cases = distinct tag_cases;
example = join orders by tag_id left outer, tag_cases by id;
example = foreach example generate orders::id as id, (tag_cases::found_tag is null ? 0 : 1) as has_tag;
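The left-outer-join membership flag collapses to a set lookup; since the schemas in the question are inconsistent, the sample rows in this Python sketch are invented:

```python
# Invented rows: tag_cases holds the tag ids that survived the 'x' filter,
# orders is (id, tag_id).
tag_cases = {787, 812}
orders = [(99, 812), (111, 55), (22, 17)]

# left outer join + null check collapses to a set-membership flag
example = [(order_id, 1 if tag_id in tag_cases else 0)
           for order_id, tag_id in orders]
print(example)  # [(99, 1), (111, 0), (22, 0)]
```

This is also what the replicated join does under the hood: the small relation is held in memory and probed per row.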