Dynamic intersections between groups based on relation - SQL

I have 2 tables:
product_facet_values_facet_value
+-----------+--------------+
| productId | facetValueId |
+-----------+--------------+
|         6 |            1 |
|         6 |           34 |
|         7 |           39 |
|         8 |           34 |
|         8 |            1 |
|         8 |           11 |
|         9 |            1 |
|         9 |           39 |
+-----------+--------------+
facet_value
+--------------+---------+
| facetValueId | facetId |
+--------------+---------+
|            1 |       2 |
|           34 |       6 |
|           39 |       2 |
|           44 |       2 |
|           56 |      11 |
+--------------+---------+
I need to get all productIds matching the facetValueIds I ask for, with one extra step: the facetValueIds must be grouped by their shared facetId, and the per-group results intersected.
For example, if I ask for all product ids with facetValueIds 1, 34 and 39, the result should be the same as this query:
select "productId"
from "product_facet_values_facet_value"
where "facetValueId" in (1, 39)
INTERSECT
select "productId"
from "product_facet_values_facet_value"
where "facetValueId" in (34)
I wrote this query knowing that facetValueIds 1 and 39 share "facetId" = 2, while facetValueId 34 has "facetId" = 6.
I need a query that produces the same result without grouping manually. If, for example, I next ask for all products with facetValueIds 1, 34, 39 and 56, the dynamic query should behave like three INTERSECTs between IN (1, 39), IN (34) and IN (56):
select "productId"
from "product_facet_values_facet_value"
where "facetValueId" in (1, 39)
INTERSECT
select "productId"
from "product_facet_values_facet_value"
where "facetValueId" in (34)
INTERSECT
select "productId"
from "product_facet_values_facet_value"
where "facetValueId" in (56)

https://dbfiddle.uk/?rdbms=postgres_13&fiddle=d06344b4a68c7b97fc1fad46c7437894
This is the same method as #a_horse_with_no_name used, but generalised very slightly.
WITH targets AS
(
    -- filter on the requested facetValueIds; the facet grouping is
    -- handled implicitly by COUNT(DISTINCT facetId) below
    SELECT * FROM facet_value WHERE facetValueId IN (1, 34, 39)
)
SELECT
    map.productId
FROM
    product_facet_values_facet_value AS map
INNER JOIN
    targets AS tgt
    ON tgt.facetValueId = map.facetValueId
GROUP BY
    map.productId
HAVING
    COUNT(DISTINCT tgt.facetId) = (SELECT COUNT(DISTINCT facetId) FROM targets)
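A runnable sketch of this grouped-intersection idea (illustrated here with SQLite through Python; table names and data follow the question):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE product_facet_values_facet_value (productId INT, facetValueId INT);
INSERT INTO product_facet_values_facet_value VALUES
    (6, 1), (6, 34), (7, 39), (8, 34), (8, 1), (8, 11), (9, 1), (9, 39);
CREATE TABLE facet_value (facetValueId INT, facetId INT);
INSERT INTO facet_value VALUES (1, 2), (34, 6), (39, 2), (44, 2), (56, 11);
""")

# One query, driven only by the requested facetValueIds (1, 34, 39):
# a product qualifies when it covers every distinct facetId among the targets.
rows = conn.execute("""
WITH targets AS (
    SELECT * FROM facet_value WHERE facetValueId IN (1, 34, 39)
)
SELECT map.productId
FROM product_facet_values_facet_value AS map
JOIN targets AS tgt ON tgt.facetValueId = map.facetValueId
GROUP BY map.productId
HAVING COUNT(DISTINCT tgt.facetId) = (SELECT COUNT(DISTINCT facetId) FROM targets)
ORDER BY map.productId
""").fetchall()
print([r[0] for r in rows])  # [6, 8] -- same as INTERSECT of IN (1, 39) and IN (34)
```

The HAVING clause is what replaces the manual INTERSECTs: a product passes only if the facetValues it matched span every facetId present in the requested set.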


BigQuery: group by and get all elements except the group-by value

I'm dealing with some transactional history in BigQuery. The table contains two columns:
transaction_number and item_id.
I'm trying to identify two features:
How many (average and std) products are purchased along with a certain item_id in the same transaction?
What are the list of products purchased along with the certain item_id in the same transaction?
For example: if we assume these are the products purchased in the same transaction,
+-----------+---------+
| trans_num | item_id |
+-----------+---------+
|         1 |      34 |
|         1 |      35 |
|         2 |      36 |
|         2 |      37 |
|         2 |      34 |
+-----------+---------+
I want the first output to be
+---------+-----------+
| item_id | feature_1 |
+---------+-----------+
|      34 |       2.5 |
|      35 |         2 |
|      36 |         2 |
|      37 |         2 |
|      38 |         2 |
+---------+-----------+
And feature_2 should contain
+---------+--------------+
| item_id | feature_2    |
+---------+--------------+
|      34 | [35, 36, 37] |
|      35 | [34]         |
|      36 | [37, 34]     |
|      37 | [36, 34]     |
+---------+--------------+
How should I approach this?
Below is for BigQuery Standard SQL:
#standardSQL
with pre_aggregation as (
  select a.trans_num, a.item_id, array_agg(b.item_id) other_items
  from `project.dataset.table` a
  join `project.dataset.table` b
    on a.trans_num = b.trans_num
   and a.item_id != b.item_id
  group by trans_num, item_id
  order by item_id, trans_num
)
select item_id,
  feature_1,
  array(
    select distinct item
    from t.feature_2 item
    order by item
  ) as feature_2
from (
  select item_id,
    avg(array_length(other_items)) as feature_1,
    array_concat_agg(other_items) as feature_2
  from pre_aggregation
  group by item_id
) t
To apply it to the sample data from your question, define the table as a CTE:
`project.dataset.table` as (
  select 1 trans_num, 34 item_id union all
  select 1, 35 union all
  select 2, 36 union all
  select 2, 37 union all
  select 2, 34
)
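The self-join idea behind this answer can be sketched in runnable form (SQLite through Python purely for illustration; group_concat stands in for array_agg, and feature_1 here is the average number of *other* items per transaction, so item 34 averages 1.5 rather than the whole-transaction count of 2.5 shown in the question):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tx (trans_num INT, item_id INT);
INSERT INTO tx VALUES (1, 34), (1, 35), (2, 36), (2, 37), (2, 34);
""")

# Self-join each row to the other items of the same transaction,
# then average the per-transaction co-purchase counts per item.
rows = conn.execute("""
WITH pre_aggregation AS (
    SELECT a.trans_num, a.item_id, COUNT(*) AS n_other,
           GROUP_CONCAT(b.item_id) AS other_items   -- stand-in for array_agg
    FROM tx a
    JOIN tx b ON a.trans_num = b.trans_num AND a.item_id != b.item_id
    GROUP BY a.trans_num, a.item_id
)
SELECT item_id, AVG(n_other) AS feature_1
FROM pre_aggregation
GROUP BY item_id
ORDER BY item_id
""").fetchall()
print(dict(rows))  # {34: 1.5, 35: 1.0, 36: 2.0, 37: 2.0}
```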

How to count how many times a specific value appeared on each columns and group by range

I'm new to Postgres and I have a question:
I have a table with 100 columns. I need to count the values in each column, tally how many times each appears, and group them based on the range they fall into.
I have a table like this (100 columns):
+------+------+------+------+------+-----+--------+
| Name | PRB0 | PRB1 | PRB2 | PRB3 | ... | PRB100 |
+------+------+------+------+------+-----+--------+
| A    |   15 |    6 |   47 |   54 | ... |      8 |
| B    |   25 |   22 |   84 |   86 | ... |     76 |
| C    |   57 |   57 |   96 |   38 | ... |     28 |
+------+------+------+------+------+-----+--------+
And I need the output to be something like this:
+------+---------------+----------------+----------------+----------------+-----+-----------------+
| Name | Count 0 to 20 | Count 21 to 40 | Count 41 to 60 | Count 61 to 70 | ... | Count 81 to 100 |
+------+---------------+----------------+----------------+----------------+-----+-----------------+
| A    |             5 |             46 |             87 |             34 | ... |              98 |
| B    |             5 |              2 |             34 |             56 | ... |              36 |
| C    |             7 |             17 |             56 |             78 | ... |              88 |
+------+---------------+----------------+----------------+----------------+-----+-----------------+
For Name A we have:
5 times a number between 0 and 20 appeared
46 times a number between 21 and 40 appeared
87 times a number between 41 and 60 appeared
Basically I need something like the COUNTIFS function in Excel, where you just specify the range of columns and the condition.
You could unpivot with a lateral join, then aggregate:
select
    name,
    count(*) filter(where prb between 0 and 20) cnt_00_20,
    count(*) filter(where prb between 21 and 40) cnt_21_40,
    ...,
    count(*) filter(where prb between 81 and 100) cnt_81_100
from mytable t
cross join lateral (values (t.prb0), (t.prb1), ..., (t.prb100)) p(prb)
group by name
Note, however, that this still requires you to enumerate all the columns in the values() table constructor. If you want something fully dynamic, you can use json instead. The idea is to turn each record to a json object using to_jsonb(), then to rows with jsonb_each(); you can then do conditional aggregation.
select
    name,
    count(*) filter(where prb::int between 0 and 20) cnt_00_20,
    count(*) filter(where prb::int between 21 and 40) cnt_21_40,
    ...,
    count(*) filter(where prb::int between 81 and 100) cnt_81_100
from mytable t
cross join lateral to_jsonb(t) j(js)
cross join lateral jsonb_each(j.js - 'name') r(col, prb)
group by name
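The unpivot-then-aggregate idea can be sketched in runnable form (SQLite through Python for illustration; json_each over json_array stands in for the lateral values() list, and SUM(CASE ...) for filter(), with a 4-column toy table):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE mytable (name TEXT, prb0 INT, prb1 INT, prb2 INT, prb3 INT);
INSERT INTO mytable VALUES ('A', 15, 6, 47, 54), ('B', 25, 22, 84, 86);
""")

# Unpivot the prb columns into rows, then bucket-count per name.
rows = conn.execute("""
SELECT t.name,
       SUM(CASE WHEN je.value BETWEEN 0  AND 20  THEN 1 ELSE 0 END) AS cnt_00_20,
       SUM(CASE WHEN je.value BETWEEN 21 AND 40  THEN 1 ELSE 0 END) AS cnt_21_40,
       SUM(CASE WHEN je.value BETWEEN 41 AND 60  THEN 1 ELSE 0 END) AS cnt_41_60,
       SUM(CASE WHEN je.value BETWEEN 81 AND 100 THEN 1 ELSE 0 END) AS cnt_81_100
FROM mytable t, json_each(json_array(t.prb0, t.prb1, t.prb2, t.prb3)) je
GROUP BY t.name
ORDER BY t.name
""").fetchall()
print(rows)  # [('A', 2, 0, 2, 0), ('B', 0, 2, 0, 2)]
```

Either way, the key move is turning each wide row into one row per column value, after which ordinary conditional aggregation does the counting.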

PostgreSQL Current count of specific value

I need to achieve a view such as:
+-----------+----------+--------------+----------------+------------------+
| Parent id | Expected | Parent Value | Distinct Value | Distinct Value 2 |
+-----------+----------+--------------+----------------+------------------+
|         1 | 001.001  |            3 | 6/1/2017       |         5,000.00 |
|         1 | 001.002  |            3 | 9/1/2018       |         3,500.00 |
|         1 | 001.003  |            3 | 1/7/2018       |         9,000.00 |
|         2 | 002.001  |            7 | 9/1/2017       |         2,500.00 |
|         3 | 003.001  |            5 | 3/6/2017       |         1,200.00 |
|         3 | 003.002  |            5 | 16/8/2017      |         8,700.00 |
+-----------+----------+--------------+----------------+------------------+
where I get distinct child objects that have the same parents, but I cannot make the "Expected" column work. Those zeros don't really matter; I just need subindexes like "1.1", "1.2" to work. I tried the rank() function but it doesn't really help.
Any help appreciated, thanks in advance.
My initial try looks like this:
SELECT DISTINCT
    parent.parent_id,
    rank() OVER (ORDER BY parent_id) AS expected,
    parent.parent_value,
    ct.distinct_value,
    ct.distinct_value_2
FROM parent
LEFT JOIN (crosstab (...))
    AS ct( ... )
    ON ...
Use partition by parent_id in the window function, and order by another column to define the order within each parent_id group.
with parent(parent_id, another_col) as (
    values (1, 30), (1, 20), (1, 10), (2, 40), (3, 60), (3, 50)
)
select
    parent_id,
    another_col,
    format('%s.%s', parent_id, row_number() over w) as expected
from parent
window w as (partition by parent_id order by another_col);
parent_id | another_col | expected
-----------+-------------+----------
1 | 10 | 1.1
1 | 20 | 1.2
1 | 30 | 1.3
2 | 40 | 2.1
3 | 50 | 3.1
3 | 60 | 3.2
(6 rows)
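The same pattern in runnable form (SQLite through Python for illustration; printf() replaces Postgres's format()):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE parent (parent_id INT, another_col INT);
INSERT INTO parent VALUES (1, 30), (1, 20), (1, 10), (2, 40), (3, 60), (3, 50);
""")

# Number rows within each parent_id, ordered by another_col,
# and build the "parent.child" subindex from that row number.
rows = conn.execute("""
SELECT parent_id, another_col,
       printf('%d.%d', parent_id,
              ROW_NUMBER() OVER (PARTITION BY parent_id ORDER BY another_col)
       ) AS expected
FROM parent
ORDER BY parent_id, another_col
""").fetchall()
print(rows)
# [(1, 10, '1.1'), (1, 20, '1.2'), (1, 30, '1.3'),
#  (2, 40, '2.1'), (3, 50, '3.1'), (3, 60, '3.2')]
```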

SQL order by highest to lowest in one table referencing another table in an UPDATE

Hey all, I have the following tables; I need to get data from one that matches the other, ordered from highest to lowest by the TempVersion int.
UPDATE net_Users
SET net_Users.DefaultId = b.TId
FROM (
    SELECT TOP 1 IndivId, TId
    FROM UTeams
    WHERE UTeams.[Active] = 1
    ORDER BY TempVersion DESC
) AS b
WHERE net_Users.IndivId = b.IndivId
In the above I am trying to order from the highest TempVersion to the lowest.
The query above seems to just update 1 of those records with the TempVersion and stop there. I am needing it to loop to find all associated users with the same IndivId matching.
Anyone able to help me out with this?
sample data
net_Users:
name | DefaultId | IndivId | etc...
--------+-----------+---------+-------
Bob | | 87 | etc...
Jan | | 231 | etc...
Luke | | 8 | etc...
UTeams:
IndivId | TempVersion | etc...
--------+-------------+-------
8 | 44 | etc...
17 | 18 | etc...
8 | 51 | etc...
8 | 2 | etc...
7 | 22 | etc...
8 | 125 | etc...
87 | 10 | etc...
14 | 88 | etc...
8 | 5 | etc...
15 | 54 | etc...
65 | 11 | etc...
87 | 15 | etc...
39 | 104 | etc...
And the output I would be needing is (going to choose IndivId 8):
In net_users:
Name | DefaultId | IndivId | etc...
-----+-----------+---------+-------
Luke | 125 | 8 | etc...
Luke | 51 | 8 | etc...
Luke | 44 | 8 | etc...
Luke | 5 | 8 | etc...
Luke | 2 | 8 | etc...
I think this is what you were trying to do:
update net_Users
set net_Users.DefaultId = coalesce(
    (select top 1 u.TId
     from UTeams as u
     where u.[Active] = 1
       and net_Users.IndivId = u.IndivId
     order by u.TempVersion desc),
    net_Users.DefaultId)
Another way, using cross apply():
update n
set DefaultId = coalesce(x.TId, n.DefaultId)
from net_Users as n
cross apply (
    select top 1 TId
    from UTeams as u
    where u.[Active] = 1
      and n.IndivId = u.IndivId
    order by u.TempVersion desc
) as x
Another way to do it, with a common table expression and row_number():
with cte as (
    select
        n.IndivId,
        n.DefaultId,
        u.TId,
        rn = row_number() over (
            partition by n.IndivId
            order by u.TempVersion desc
        )
    from net_Users as n
    inner join UTeams as u
        on n.IndivId = u.IndivId
    where u.[Active] = 1
)
update cte
set DefaultId = TId
where rn = 1
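The correlated-subquery variant can be sketched in runnable form (SQLite through Python for illustration; the sample UTeams data does not show a TId column, so here TId is invented as equal to TempVersion purely so the effect of the ORDER BY is visible):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE net_Users (name TEXT, DefaultId INT, IndivId INT);
INSERT INTO net_Users VALUES ('Bob', NULL, 87), ('Jan', NULL, 231), ('Luke', NULL, 8);
CREATE TABLE UTeams (IndivId INT, TempVersion INT, TId INT, Active INT);
-- TId = TempVersion here, a made-up mapping for the demo only
INSERT INTO UTeams VALUES
    (8, 44, 44, 1), (17, 18, 18, 1), (8, 51, 51, 1), (8, 2, 2, 1),
    (87, 10, 10, 1), (8, 125, 125, 1), (87, 15, 15, 1);
""")

# Per user, pick the TId of the highest-TempVersion active team row;
# leave DefaultId untouched when no team row matches.
conn.execute("""
UPDATE net_Users
SET DefaultId = COALESCE(
    (SELECT u.TId
     FROM UTeams u
     WHERE u.Active = 1 AND u.IndivId = net_Users.IndivId
     ORDER BY u.TempVersion DESC
     LIMIT 1),
    DefaultId)
""")
rows = conn.execute("SELECT name, DefaultId FROM net_Users ORDER BY name").fetchall()
print(rows)  # [('Bob', 15), ('Jan', None), ('Luke', 125)]
```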

SQL: Complex query with subtraction from different cells

I have two tables and I want to combine their data.
The first table
+------------+-----+------+-------+
| BusinessID | Lat | Long | Stars |
+------------+-----+------+-------+
| abc123     |  32 |   74 |   4.5 |
| abd123     |  32 |   75 |     4 |
| abe123     |  33 |   76 |     3 |
+------------+-----+------+-------+
The second table is:
+------------+-----+------+-------+
| BusinessID | day | time | count |
+------------+-----+------+-------+
| abc123     |   1 |   14 |     5 |
| abc123     |   1 |   15 |     6 |
| abc123     |   2 |   13 |     1 |
| abd123     |   4 |   12 |     4 |
| abd123     |   4 |   13 |     8 |
| abd123     |   5 |   11 |     2 |
+------------+-----+------+-------+
So what I want is to find all the businesses within a specific radius that have more check-ins in the next hour than in the current one.
So the results are:
+------------+
| BusinessID |
+------------+
| abd123     |
| abc123     |
+------------+
because they have more check-ins in the next hour than in the previous one (6 > 5, 8 > 4).
What's more, it would be helpful if the results were ordered by the difference in check-in counts, e.g. 8 - 4 > 6 - 5.
SELECT *
FROM table2 t2
WHERE t2.BusinessID IN (
    SELECT t1.BusinessID
    FROM table1 t1
    WHERE earth_box(ll_to_earth(32, 74), 4000/1.609) #> ll_to_earth(Lat, Long)
    ORDER BY earth_distance(ll_to_earth(32, 74), ll_to_earth(Lat, Long)), stars DESC
) AND checkin_day = 1 AND checkin_time = 14;
With the above query I can find the businesses in a radius and their check-ins at a specified time, e.g. 14. What I need now is the number of check-ins at hour 15 for the same businesses, and to check whether it is greater than in the previous hour.
I think you want something like this:
SELECT
    t1.BusinessID
FROM
    table1 t1
JOIN (
    SELECT
        *,
        "count" - LAG("count") OVER (PARTITION BY BusinessID, "day" ORDER BY "time") AS "grow"
    FROM
        table2
    WHERE
        /* Some condition on table2 */
) t2
    ON t1.BusinessID = t2.BusinessID AND t2.grow > 0
WHERE
    /* Some condition on table1 */
ORDER BY
    t2.grow DESC;
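The LAG step can be sketched in runnable form (SQLite through Python for illustration; the radius filtering on table1 is omitted, and the count column is named cnt since "count" needs quoting):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE checkins (BusinessID TEXT, day INT, time INT, cnt INT);
INSERT INTO checkins VALUES
    ('abc123', 1, 14, 5), ('abc123', 1, 15, 6), ('abc123', 2, 13, 1),
    ('abd123', 4, 12, 4), ('abd123', 4, 13, 8), ('abd123', 5, 11, 2);
""")

# grow = this hour's count minus the previous hour's count within the same
# business and day; keep businesses with any positive grow, largest first.
rows = conn.execute("""
SELECT BusinessID, MAX(grow) AS best_grow
FROM (
    SELECT BusinessID,
           cnt - LAG(cnt) OVER (PARTITION BY BusinessID, day ORDER BY time) AS grow
    FROM checkins
)
WHERE grow > 0
GROUP BY BusinessID
ORDER BY best_grow DESC
""").fetchall()
print(rows)  # [('abd123', 4), ('abc123', 1)]
```

This reproduces the expected ordering from the question: abd123 first (8 - 4 = 4), then abc123 (6 - 5 = 1).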