How to replace subquery with IN statement in BigQuery

I've run into a problem with BigQuery. I have to build a query in which the rows matched by a left join are limited to a subset of ids returned by another query. Unfortunately, BigQuery does not support a subquery in the join condition.
I've been trying to find a solution that lets me place this constraint within the join, but haven't been successful. The solutions I've found usually suggest a cross join, which I haven't managed to make work so far. Here, in a nutshell, are the table structure I have and the query I'm trying to construct:
#standardSQL
WITH User AS (
SELECT 1 AS id, "A" AS items UNION ALL
SELECT 2 AS id, "B" AS items UNION ALL
SELECT 3 AS id, "c" AS items),
Label_User AS (
SELECT 1 AS user_id, 1 AS label_id UNION ALL
SELECT 1 AS user_id, 4 AS label_id UNION ALL
SELECT 1 AS user_id, 3 AS label_id UNION ALL
SELECT 2 AS user_id, 1 AS label_id UNION ALL
SELECT 2 AS user_id, 2 AS label_id),
Labels AS (
SELECT 1 AS id, "Test" AS label UNION ALL
SELECT 2 AS id, "Admin" AS label UNION ALL
SELECT 3 AS id, "Local" AS label UNION ALL
SELECT 4 AS id, "External" AS label)
select * from User left join Label_User on id=user_id and
label_id in (select id from Labels where label = "External" or label ="Local")
-- This works for a single record of label id
-- select * from User left join Label_User on id=user_id and label_id = 1
Any help would be much appreciated.
Edit 1
Thanks to @mikhail-berlyant for his suggestion, but the issue I've found with putting the condition in the WHERE clause is that it filters out some records that I need. The result I'm looking for looks like this:
id items user_id label_id
1 A 1 4
1 A 1 3
2 B null null
3 C null null
But with the filter in the WHERE clause, the output is:
id items user_id label_id
1 A 1 4
1 A 1 3

Below is for BigQuery Standard SQL
#standardSQL
SELECT *
FROM User
LEFT JOIN (
SELECT *
FROM User
LEFT JOIN Label_User
ON id = user_id
WHERE label_id IN (SELECT id FROM Labels WHERE label = "External" OR label ="Local")
)
USING (id, items)
When applied to the sample data from your question, the output is as below:
Row id items user_id label_id
1 1 A 1 4
2 1 A 1 3
3 2 B null null
4 3 C null null
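The same idea (filter first, then LEFT JOIN the filtered rows) can be tried locally. Below is a minimal sketch that runs the pattern in SQLite through Python's sqlite3 module, so it is an illustration rather than BigQuery itself. It is a slightly simpler variant than the query above: it filters Label_User in a derived table instead of joining User twice. The lower-cased table names and the aliases u, lu and f are my own choices.

```python
import sqlite3

# Sample rows from the question (item 'C' capitalized as in the expected output).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER, items TEXT);
    CREATE TABLE label_user (user_id INTEGER, label_id INTEGER);
    CREATE TABLE labels (id INTEGER, label TEXT);
    INSERT INTO users VALUES (1, 'A'), (2, 'B'), (3, 'C');
    INSERT INTO label_user VALUES (1, 1), (1, 4), (1, 3), (2, 1), (2, 2);
    INSERT INTO labels VALUES
        (1, 'Test'), (2, 'Admin'), (3, 'Local'), (4, 'External');
""")

# Filter the label rows in a derived table, then LEFT JOIN that filtered set
# so users without a matching label are still returned with NULLs.
query = """
    SELECT u.id, u.items, f.user_id, f.label_id
    FROM users AS u
    LEFT JOIN (
        SELECT lu.user_id, lu.label_id
        FROM label_user AS lu
        WHERE lu.label_id IN (
            SELECT id FROM labels WHERE label IN ('External', 'Local')
        )
    ) AS f ON f.user_id = u.id
    ORDER BY u.id, f.label_id DESC
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Because the IN subquery lives in the WHERE clause of the derived table and not in the ON clause, the outer join condition stays a plain equality.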

Related

Identify rows containing repeat customers and assign them a new id based on a serial number generated in SQL

I have a table
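| id | repeat customer id | store | date     |
|----|--------------------|-------|----------|
| 1  |                    | A     | 07-19-22 |
| 2  |                    | A     | 07-19-22 |
| 3  |                    | A     | 07-19-22 |

| id | repeat customer id | store | date     |
|----|--------------------|-------|----------|
| 1  |                    | B     | 07-19-22 |
| 2  |                    | B     | 07-19-22 |
| 3  | 1                  | B     | 07-19-22 |
| 4  |                    | B     | 07-19-22 |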
and more tables from other stores.
The problems here are:
all stores start with id 1
repeat customers get a new id in the id column, and their original id is retained in the repeat customer id column
I have to concatenate all the tables and also keep track of repeat customers for analytics. I have joined all the tables using UNION ALL and also created a dummy id column using SELECT ROW_NUMBER() OVER(ORDER BY (SELECT 1)) AS NEW_ID, * FROM CTE, but I have no clue how to capture and assign a value to the repeat customer id so that I get the table below:
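| NEW_ID | id | new_repeat_customer_id | repeat customer id | store | date     |
|--------|----|------------------------|--------------------|-------|----------|
| 1      | 1  |                        |                    | A     | 07-19-22 |
| 2      | 2  |                        |                    | A     | 07-19-22 |
| 3      | 3  |                        |                    | A     | 07-19-22 |
| 4      | 1  |                        |                    | B     | 07-19-22 |
| 5      | 2  |                        |                    | B     | 07-19-22 |
| 6      | 3  | 4                      | 1                  | B     | 07-19-22 |
| 7      | 4  |                        |                    | B     | 07-19-22 |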
The best way to handle this would be to use an alphanumeric string as NEW_ID, built by concatenating STORE and ID, for example A_000000001. That way you can add the same STORE prefix to REPEAT_CUSTOMER_ID as well.
So in this case, instead of NEW_ID=6, you would have NEW_ID=B_000000003, and REPEAT_CUSTOMER_ID would become B_000000001.
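As a rough illustration of the store-prefixed key idea, here is a small Python sketch. The helper name make_key and the 9-digit padding width are assumptions chosen to match the A_000000001 example; the rows are the sample data from the question.

```python
def make_key(store: str, local_id: int, width: int = 9) -> str:
    """Build a store-prefixed key such as 'A_000000001' (hypothetical helper)."""
    return f"{store}_{local_id:0{width}d}"


# Rows as (id, repeat_customer_id, store); None means "not a repeat customer".
rows = [
    (1, None, "A"), (2, None, "A"), (3, None, "A"),
    (1, None, "B"), (2, None, "B"), (3, 1, "B"), (4, None, "B"),
]

# Prefix both the id and the repeat customer id with the same store code,
# so the reference stays valid after the tables are concatenated.
keyed = [
    (
        make_key(store, id_),
        make_key(store, repeat_id) if repeat_id is not None else None,
        store,
    )
    for id_, repeat_id, store in rows
]

for row in keyed:
    print(row)
```

The key stays unique across stores as long as each store's own ids are unique, and no lookup join is needed to resolve the repeat customer.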
But in case that is not possible, you can use a query like the one below to get the output:
DB Fiddle Query
-- Stack the per-store tables
with CTE as
(
    select * from STORE1
    UNION ALL
    select * from STORE2
)
-- Assign a running NEW_ID to every row
,CTE2 as
(
    select ROW_NUMBER() OVER(ORDER BY (SELECT 1)) as NEW_ID, t.* from CTE t
)
-- Rows that refer back to an earlier customer id
,REPEAT_ID as
(
    select NEW_ID, ID, REPEAT_CUSTOMER_ID, STORE from CTE2 where REPEAT_CUSTOMER_ID is not null
)
-- Look up the NEW_ID of the original customer within the same store
,REPEACT_CUSTOMER_ID as
(
    select c.NEW_ID as NEW_REPEAT_CUSTOMER_ID, r.NEW_ID
    from REPEAT_ID r
    left join CTE2 c
        on c.ID = r.REPEAT_CUSTOMER_ID and c.STORE = r.STORE
)
select c.*, n.NEW_REPEAT_CUSTOMER_ID
from CTE2 c
left join REPEACT_CUSTOMER_ID n
    on c.NEW_ID = n.NEW_ID
https://dbfiddle.uk/?rdbms=sqlserver_2014&fiddle=cbe63994b10f9e3b0eff53b0c89d463a
So basically you have to separate out the rows where a repeat customer id is present and join them back to the main query.

One query that matches values with only one condition out of two, one query that matches values with both conditions

I'm having some sort of a blank about how to do this in SQL.
Consider this reprex in R
set.seed(123)
data.frame(ID = (sample(c(1:5), 10, replace = T)),
status = (sample(c("yes", "no"), 10, replace = T)),
amount = (sample(seq(1,50,0.01),10)))
which gives out this table
ID status amount
1 3 no 29.87
2 3 yes 26.66
3 2 yes 15.49
4 2 yes 18.89
5 3 yes 44.06
6 5 no 30.79
7 4 yes 17.13
8 1 yes 6.54
9 2 yes 45.68
10 3 yes 12.66
I need to find two SQL queries:
One where I select the IDs that only have a status of 'no', meaning ID 5,
and
one where I select the IDs that match both conditions, meaning ID 3.
I have a query for both, but I'm almost sure they're not correct, so any lead is more than welcome.
Thanks
One where I select the IDs that only have a status of 'no', meaning ID 5.
select id from your_table
where status = 'no'
  and id not in (select id from your_table where status = 'yes')
One where I select the IDs that match both conditions, meaning ID 3.
select id from your_table
where status = 'no'
  and id in (select id from your_table where status = 'yes')
Finally, I think you may also want the IDs that match neither of these conditions. In that case, UNION both queries and select the IDs from your table that are not in the result:
select id from your_table where id not in (
    select id from your_table where status = 'no' and id not in
        (select id from your_table where status = 'yes')
    union all
    select id from your_table where status = 'no' and id in
        (select id from your_table where status = 'yes')
)
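To check the first two queries against the sample data from the question, here is a small sketch that loads the rows into an in-memory SQLite database from Python. SQLite stands in for whatever database you actually use. The DISTINCT and ORDER BY are my additions, so that an ID with several 'no' rows would be listed only once.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE your_table (id INTEGER, status TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO your_table VALUES (?, ?, ?)",
    [
        (3, "no", 29.87), (3, "yes", 26.66), (2, "yes", 15.49),
        (2, "yes", 18.89), (3, "yes", 44.06), (5, "no", 30.79),
        (4, "yes", 17.13), (1, "yes", 6.54), (2, "yes", 45.68),
        (3, "yes", 12.66),
    ],
)

# IDs that have 'no' rows and no 'yes' rows at all.
only_no = [r[0] for r in conn.execute("""
    SELECT DISTINCT id FROM your_table
    WHERE status = 'no'
      AND id NOT IN (SELECT id FROM your_table WHERE status = 'yes')
    ORDER BY id
""")]

# IDs that have at least one 'no' row and at least one 'yes' row.
both = [r[0] for r in conn.execute("""
    SELECT DISTINCT id FROM your_table
    WHERE status = 'no'
      AND id IN (SELECT id FROM your_table WHERE status = 'yes')
    ORDER BY id
""")]

print(only_no, both)
```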

Postgresql query to filter latest data based on 2 columns

Table Structure First
users table
id
1
2
3
sites table
id
1
2
site_memberships table
site_id user_id created_on
1 1 1
1 1 2
1 1 3
2 1 1
2 1 2
1 2 2
1 2 3
Assume that the higher the created_on number, the more recent the record.
Expected Output
site_id user_id created_on
1 1 3
2 1 2
1 2 3
I need the latest record for each user for each site membership.
I tried the following query, but it does not seem to work.
select * from users inner join
(
SELECT ROW_NUMBER () OVER (
PARTITION BY sm.user_id,
sm.created_on
), sm.*
from site_memberships sm
inner join sites s on sm.site_id=s.id
) site_memberships
ON site_memberships.user_id = users.user_id where row_number=1
I think you have overcomplicated the problem you want to solve.
You seem to want aggregation:
select site_id, user_id, max(created_on)
from site_memberships sm
group by site_id, user_id;
If you had additional columns that you wanted, you could use distinct on instead:
select distinct on (site_id, user_id) sm.*
from site_memberships sm
order by site_id, user_id, created_on desc;
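The aggregation form can be checked against the sample rows from the question. The sketch below uses SQLite through Python rather than PostgreSQL, which is why it only shows the GROUP BY variant: DISTINCT ON is PostgreSQL-specific and is not available in SQLite. The ORDER BY is added only to make the row order deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE site_memberships "
    "(site_id INTEGER, user_id INTEGER, created_on INTEGER)"
)
conn.executemany(
    "INSERT INTO site_memberships VALUES (?, ?, ?)",
    [(1, 1, 1), (1, 1, 2), (1, 1, 3), (2, 1, 1),
     (2, 1, 2), (1, 2, 2), (1, 2, 3)],
)

# One row per (site_id, user_id) pair, keeping the highest created_on.
latest = conn.execute("""
    SELECT site_id, user_id, MAX(created_on) AS created_on
    FROM site_memberships
    GROUP BY site_id, user_id
    ORDER BY user_id, site_id
""").fetchall()

for row in latest:
    print(row)
```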

Select unique subsets

I have a table like in example below.
SQL> select * from test;
ID PARENT_ID NAME
1 1 A
2 1 B
3 2 A
4 2 B
5 3 A
6 3 B
7 3 C
8 4 A
What I need is to get all unique subsets of names, ((A,B), (A,B,C), (A)), that is, to exclude duplicate subsets. You can see that (A,B) appears twice, once for PARENT_ID=1 and once for PARENT_ID=2.
I want to exclude such duplicates:
ID PARENT_ID NAME
1 1 A
2 1 B
5 3 A
6 3 B
7 3 C
8 4 A
You can use DISTINCT to only return different values.
e.g.
SELECT DISTINCT GROUP_CONCAT(NAME SEPARATOR ',') as subsets
FROM TABLE_1
GROUP BY PARENT_ID;
SQL Fiddle
I have used GROUP_CONCAT assuming you are using MySQL. The equivalent function in Oracle is LISTAGG(). You can see it in action in the SQL Fiddle.
Here is the solution:
Select a.*
from test a
inner join
(
    Select nm, min(parent_id) as p_id
    from
    (
        Select Parent_id, group_concat(NAME) as nm
        from test
        group by Parent_ID
    ) a
    group by nm
) b
on a.Parent_id = b.p_id
order by parent_id, name
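The logic of that query (build one name list per parent, then keep the lowest PARENT_ID for each distinct list) can also be sketched outside the database. The Python below is only an illustration on the sample rows from the question, with variable names of my own. It sorts the names in each group so that (A,B) and (B,A) would count as the same subset, which a concatenation without an explicit ordering may not guarantee.

```python
from collections import defaultdict

# (ID, PARENT_ID, NAME) rows from the question.
rows = [
    (1, 1, "A"), (2, 1, "B"), (3, 2, "A"), (4, 2, "B"),
    (5, 3, "A"), (6, 3, "B"), (7, 3, "C"), (8, 4, "A"),
]

# Collect the names that belong to each parent.
names_by_parent = defaultdict(list)
for _id, parent_id, name in rows:
    names_by_parent[parent_id].append(name)

# For every distinct (sorted) name subset, remember the lowest parent id.
first_parent = {}
for parent_id in sorted(names_by_parent):
    subset = tuple(sorted(names_by_parent[parent_id]))
    first_parent.setdefault(subset, parent_id)

# Keep only the rows whose parent was the first one seen for its subset.
keep = set(first_parent.values())
unique_rows = [row for row in rows if row[1] in keep]

for row in unique_rows:
    print(row)
```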

Active Record select 15 records order by date with different field value using

Here I have some articles:
id text group_id source_id
1 t1 1 1
2 t2 1 1
3 t3 2 2
4 t4 3 4
So I want the records in the result ordered by the created_at column (it exists, but I didn't show it in the table) and having distinct group ids, like this:
id text group_id source_id
1 t1 1 1
3 t3 2 2
4 t4 3 4
Also, I should be able to filter the result by source_id.
I've been stuck on this question for two days and don't even know how to start solving the problem.
Assuming you want the minimum values of the non-duplicated columns, try:
select min(id) as id,
min(text) as text,
group_id,
source_id,
min(created_at) as created_at
from articles
where source_id = @your_parameter_value
group by group_id,
source_id
order by 5
Select * from
(Select * from articles
Order by group_id, id) x
Group by group_id