I stumbled on a problem in BigQuery: I need to build a query that limits the ids in a LEFT JOIN to the result of a subquery, but unfortunately BigQuery does not support a subquery in the join condition.
I've been trying to find a way to put this constraint inside the join, without success. The solutions I usually find suggest a CROSS JOIN, but I haven't gotten that to work so far. Here, in a nutshell, is my table structure and the query I'm trying to build:
#standardSQL
WITH User AS (
SELECT 1 AS id, "A" AS items UNION ALL
SELECT 2 AS id, "B" AS items UNION ALL
SELECT 3 AS id, "c" AS items),
Label_User AS (
SELECT 1 AS user_id, 1 AS label_id UNION ALL
SELECT 1 AS user_id, 4 AS label_id UNION ALL
SELECT 1 AS user_id, 3 AS label_id UNION ALL
SELECT 2 AS user_id, 1 AS label_id UNION ALL
SELECT 2 AS user_id, 2 AS label_id),
Labels AS (
SELECT 1 AS id, "Test" AS label UNION ALL
SELECT 2 AS id, "Admin" AS label UNION ALL
SELECT 3 AS id, "Local" AS label UNION ALL
SELECT 4 AS id, "External" AS label)
select * from User left join Label_User on id=user_id and
label_id in (select id from Labels where label = "External" or label ="Local")
-- This works for a single record of label id
-- select * from User left join Label_User on id=user_id and label_id = 1
Any help would be much appreciated.
Edit 1
Thanks to @mikhail-berlyant for his suggestion, but the issue I've found with putting the condition in the WHERE clause is that it filters out some records I need. The result I'm looking for is:
id items user_id label_id
1 A 1 4
1 A 1 3
2 B null null
3 C null null
But putting the filter in the WHERE clause outputs this:
Row id items user_id label_id
1 A 1 4
1 A 1 3
Below is for BigQuery Standard SQL
#standardSQL
SELECT *
FROM User
LEFT JOIN (
SELECT *
FROM User
LEFT JOIN Label_User
ON id = user_id
WHERE label_id IN (SELECT id FROM Labels WHERE label = "External" OR label ="Local")
)
USING (id, items)
When applied to the sample data from your question, the output is as below:
Row id items user_id label_id
1 1 A 1 4
2 1 A 1 3
3 2 B null null
4 3 C null null
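The same idea can be tried locally. Below is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in for BigQuery (so it shows the SQL pattern, not BigQuery behavior), with the User table renamed to a hypothetical Users. Rather than joining User twice, it filters Label_User in a derived table and LEFT JOINs that to the users, a slightly more compact variant of the approach above: the restriction is applied before the outer join, so unmatched users are kept with NULLs.

```python
import sqlite3

# In-memory copy of the question's sample data.
# "Users" is a hypothetical rename of the question's User table.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Users (id INTEGER, items TEXT);
INSERT INTO Users VALUES (1, 'A'), (2, 'B'), (3, 'c');
CREATE TABLE Label_User (user_id INTEGER, label_id INTEGER);
INSERT INTO Label_User VALUES (1, 1), (1, 4), (1, 3), (2, 1), (2, 2);
CREATE TABLE Labels (id INTEGER, label TEXT);
INSERT INTO Labels VALUES (1, 'Test'), (2, 'Admin'), (3, 'Local'), (4, 'External');
""")

# Filter Label_User first, then LEFT JOIN: users with no matching label
# survive with NULLs instead of being removed by a WHERE on the joined rows.
rows = conn.execute("""
SELECT u.id, u.items, lu.user_id, lu.label_id
FROM Users AS u
LEFT JOIN (
    SELECT user_id, label_id
    FROM Label_User
    WHERE label_id IN (SELECT id FROM Labels WHERE label IN ('External', 'Local'))
) AS lu
  ON u.id = lu.user_id
ORDER BY u.id, lu.label_id
""").fetchall()

for row in rows:
    print(row)
```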
I have a table for each store, for example store A:
id repeat_customer_id store date
1 null A 07-19-22
2 null A 07-19-22
3 null A 07-19-22
and store B:
id repeat_customer_id store date
1 null B 07-19-22
2 null B 07-19-22
3 1 B 07-19-22
4 null B 07-19-22
and more tables from other stores.
The problem here is that all stores start with id 1, and a repeat customer gets a new id in the id column while their original id is retained in the repeat customer id column.
I have to concatenate all the tables and also keep track of repeat customers for analytics. I have combined all the tables with UNION ALL and created a dummy id column using SELECT ROW_NUMBER() OVER(ORDER BY (SELECT 1)) AS NEW_ID, * FROM CTE, but I have no idea how to capture the repeat customer id and assign it a value so that I get a table like this:
NEW_ID id new_repeat_customer_id repeat_customer_id store date
1 1 null null A 07-19-22
2 2 null null A 07-19-22
3 3 null null A 07-19-22
4 1 null null B 07-19-22
5 2 null null B 07-19-22
6 3 4 1 B 07-19-22
7 4 null null B 07-19-22
The best way to handle this would be to use an alphanumeric string as NEW_ID, built by concatenating STORE and ID, for example A_000000001. That way you can prefix REPEAT_CUSTOMER_ID with the store in the same way.
So in this case, instead of NEW_ID=6 you would have NEW_ID=B_000000003, and REPEAT_CUSTOMER_ID would become B_000000001.
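A minimal sketch of the alphanumeric key, assuming Python's sqlite3 as the engine and a hypothetical STORE2 table holding the store B rows (date omitted). It uses SQLite's printf() for the zero-padding; SQL Server and other engines have their own string and padding functions.

```python
import sqlite3

# Hypothetical copy of the store B table from the question (date column omitted).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE STORE2 (ID INTEGER, REPEAT_CUSTOMER_ID INTEGER, STORE TEXT);
INSERT INTO STORE2 VALUES (1, NULL, 'B'), (2, NULL, 'B'), (3, 1, 'B'), (4, NULL, 'B');
""")

# NEW_ID = STORE + '_' + ID zero-padded to 9 digits.
# The repeat id gets the same prefix only when it exists, so rows
# without a repeat customer keep NULL there.
rows = conn.execute("""
SELECT STORE || '_' || printf('%09d', ID) AS NEW_ID,
       CASE WHEN REPEAT_CUSTOMER_ID IS NOT NULL
            THEN STORE || '_' || printf('%09d', REPEAT_CUSTOMER_ID)
       END AS NEW_REPEAT_CUSTOMER_ID
FROM STORE2
ORDER BY ID
""").fetchall()

for row in rows:
    print(row)
```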
If that is not possible, you can use a query like the one below to get the output:
DB Fiddle Query
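with CTE as
(
    -- stack all store tables
    select * from STORE1
    UNION ALL
    select * from STORE2
)
,CTE2 as
(
    -- ORDER BY (SELECT 1) numbers rows in an arbitrary order; ordering by
    -- STORE, ID numbers each store's rows together, in id order
    SELECT ROW_NUMBER() OVER(ORDER BY STORE, ID) AS NEW_ID, t.* from CTE t
)
,REPEAT_ID as
(
    -- rows that belong to a repeat customer
    select NEW_ID,ID,REPEAT_CUSTOMER_ID,STORE from CTE2 where REPEAT_CUSTOMER_ID is not null
)
,REPEAT_CUSTOMER_MAP as
(
    -- look up the NEW_ID of the customer's original row in the same store
    select c.NEW_ID as NEW_REPEAT_CUSTOMER_ID, r.NEW_ID
    from REPEAT_ID r
    left join CTE2 c
        on c.ID=r.REPEAT_CUSTOMER_ID and c.STORE=r.STORE
)
select c.*, n.NEW_REPEAT_CUSTOMER_ID
from CTE2 c
left join REPEAT_CUSTOMER_MAP n
    on c.NEW_ID=n.NEW_ID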
https://dbfiddle.uk/?rdbms=sqlserver_2014&fiddle=cbe63994b10f9e3b0eff53b0c89d463a
So basically you separate the rows where a repeat customer id is present and join them back to the main query.
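A condensed sketch of the same self-join idea, run in SQLite through Python's sqlite3 rather than SQL Server (window functions need SQLite 3.25+). STORE1/STORE2 follow the answer; VISIT_DATE is a hypothetical name for the date column. Instead of separate REPEAT_ID and mapping CTEs, it joins the numbered rows to themselves on the same STORE and ID = REPEAT_CUSTOMER_ID, and it numbers rows by STORE, ID so the result does not depend on the order in which UNION ALL happens to return rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE STORE1 (ID INTEGER, REPEAT_CUSTOMER_ID INTEGER, STORE TEXT, VISIT_DATE TEXT);
CREATE TABLE STORE2 (ID INTEGER, REPEAT_CUSTOMER_ID INTEGER, STORE TEXT, VISIT_DATE TEXT);
INSERT INTO STORE1 VALUES (1, NULL, 'A', '07-19-22'), (2, NULL, 'A', '07-19-22'),
                          (3, NULL, 'A', '07-19-22');
INSERT INTO STORE2 VALUES (1, NULL, 'B', '07-19-22'), (2, NULL, 'B', '07-19-22'),
                          (3, 1, 'B', '07-19-22'), (4, NULL, 'B', '07-19-22');
""")

rows = conn.execute("""
WITH ALL_STORES AS (
    SELECT * FROM STORE1
    UNION ALL
    SELECT * FROM STORE2
),
NUMBERED AS (
    -- deterministic numbering: store first, then the per-store id
    SELECT ROW_NUMBER() OVER (ORDER BY STORE, ID) AS NEW_ID, t.*
    FROM ALL_STORES AS t
)
SELECT c.NEW_ID, c.ID,
       o.NEW_ID AS NEW_REPEAT_CUSTOMER_ID,  -- NEW_ID of the original visit
       c.REPEAT_CUSTOMER_ID, c.STORE, c.VISIT_DATE
FROM NUMBERED AS c
LEFT JOIN NUMBERED AS o
       ON o.STORE = c.STORE AND o.ID = c.REPEAT_CUSTOMER_ID
ORDER BY c.NEW_ID
""").fetchall()

for row in rows:
    print(row)
```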
I'm drawing a blank on how to do this in SQL.
Consider this reprex in R:
set.seed(123)
data.frame(ID = (sample(c(1:5), 10, replace = T)),
status = (sample(c("yes", "no"), 10, replace = T)),
amount = (sample(seq(1,50,0.01),10)))
which gives this table:
ID status amount
1 3 no 29.87
2 3 yes 26.66
3 2 yes 15.49
4 2 yes 18.89
5 3 yes 44.06
6 5 no 30.79
7 4 yes 17.13
8 1 yes 6.54
9 2 yes 45.68
10 3 yes 12.66
I need two SQL queries:
one that selects the IDs that only have status 'no', meaning ID 5,
and
one that selects the IDs that match both conditions, meaning ID 3.
I have a query for each, but I'm almost sure they're not correct, so any lead is more than welcome.
Thanks
One where I select the IDs that only have status 'no', meaning ID 5:
select distinct id from your_table
where status='no' and id not in (select id from your_table where status='yes')
One where I select the IDs that match both conditions, meaning ID 3:
select distinct id from your_table
where status='no' and id in (select id from your_table where status='yes')
Finally, if you are also expecting the ids that match neither condition, UNION both queries and select the ids of your table that do not appear in the result:
select distinct id from your_table where id not in (
    select id from your_table where status='no' and id not in
        (select id from your_table where status='yes')
    union all
    select id from your_table where status='no' and id in
        (select id from your_table where status='yes')
)
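The NOT IN / IN subqueries work, but the same three groups can also be computed with a single GROUP BY ... HAVING per query (a swapped-in technique, not the answer's). A minimal sketch in SQLite via Python's sqlite3, loading the ten rows from the R output into the your_table name used above; conditional sums count the 'yes' and 'no' rows per id. Grouping also sidesteps the NOT IN pitfall where a NULL id in the subquery makes the whole predicate return no rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE your_table (id INTEGER, status TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO your_table VALUES (?, ?, ?)",
    [(3, "no", 29.87), (3, "yes", 26.66), (2, "yes", 15.49), (2, "yes", 18.89),
     (3, "yes", 44.06), (5, "no", 30.79), (4, "yes", 17.13), (1, "yes", 6.54),
     (2, "yes", 45.68), (3, "yes", 12.66)],
)

# Per-id counts of each status (fixed SQL fragments, not user input).
YES = "SUM(CASE WHEN status = 'yes' THEN 1 ELSE 0 END)"
NO = "SUM(CASE WHEN status = 'no' THEN 1 ELSE 0 END)"

def ids_having(condition):
    """Return the sorted ids whose grouped rows satisfy the HAVING condition."""
    sql = f"SELECT id FROM your_table GROUP BY id HAVING {condition} ORDER BY id"
    return [r[0] for r in conn.execute(sql)]

only_no = ids_having(f"{NO} > 0 AND {YES} = 0")   # only 'no' rows
both = ids_having(f"{NO} > 0 AND {YES} > 0")      # at least one of each
rest = ids_having(f"{NO} = 0")                    # never 'no', i.e. everything else

print(only_no, both, rest)
```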
Table Structure First
users table
id
1
2
3
sites table
id
1
2
site_memberships table
site_id user_id created_on
1 1 1
1 1 2
1 1 3
2 1 1
2 1 2
1 2 2
1 2 3
Assume that a higher created_on value means a more recent record.
Expected Output
site_id user_id created_on
1 1 3
2 1 2
1 2 3
In other words, I need the latest record for each user in each site.
I tried the following query, but it does not seem to work:
select * from users inner join
(
SELECT ROW_NUMBER () OVER (
PARTITION BY sm.user_id,
sm.created_on
), sm.*
from site_memberships sm
inner join sites s on sm.site_id=s.id
) site_memberships
ON site_memberships.user_id = users.user_id where row_number=1
I think you have overcomplicated the problem you want to solve.
You seem to want aggregation:
select site_id, user_id, max(created_on)
from site_memberships sm
group by site_id, user_id;
If you need additional columns and are using PostgreSQL, you could use DISTINCT ON instead:
select distinct on (site_id, user_id) sm.*
from site_memberships sm
order by site_id, user_id, created_on desc;
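If a ROW_NUMBER() version is still wanted (for example, to keep extra columns on a database without DISTINCT ON), the attempted query needs to partition by (site_id, user_id) and order by created_on descending, rather than partition by (user_id, created_on) with no ordering. A minimal sketch using SQLite through Python's sqlite3 (3.25+ for window functions), checked against the MAX() aggregation above:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE site_memberships (site_id INTEGER, user_id INTEGER, created_on INTEGER)")
conn.executemany(
    "INSERT INTO site_memberships VALUES (?, ?, ?)",
    [(1, 1, 1), (1, 1, 2), (1, 1, 3), (2, 1, 1), (2, 1, 2), (1, 2, 2), (1, 2, 3)],
)

# One numbering per (site, user), newest first; rn = 1 is the latest row.
latest = conn.execute("""
SELECT site_id, user_id, created_on
FROM (
    SELECT sm.*,
           ROW_NUMBER() OVER (
               PARTITION BY sm.site_id, sm.user_id
               ORDER BY sm.created_on DESC
           ) AS rn
    FROM site_memberships AS sm
) AS ranked
WHERE rn = 1
ORDER BY site_id, user_id
""").fetchall()

# The plain aggregation gives the same triples for this data.
agg = conn.execute("""
SELECT site_id, user_id, MAX(created_on)
FROM site_memberships
GROUP BY site_id, user_id
ORDER BY site_id, user_id
""").fetchall()

print(latest)
```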
I have a table like in example below.
SQL> select * from test;
ID PARENT_ID NAME
1 1 A
2 1 B
3 2 A
4 2 B
5 3 A
6 3 B
7 3 C
8 4 A
What I need is to get all unique subsets of names ((A,B), (A,B,C), (A)), i.e. exclude duplicate subsets. You can see that (A,B) appears twice: once for PARENT_ID=1 and once for PARENT_ID=2.
I want to exclude such duplicates:
ID PARENT_ID NAME
1 1 A
2 1 B
5 3 A
6 3 B
7 3 C
8 4 A
You can use DISTINCT to return only distinct values, e.g.:
SELECT DISTINCT GROUP_CONCAT(NAME ORDER BY NAME SEPARATOR ',') as subsets
FROM test
GROUP BY PARENT_ID;
SQL Fiddle
I have used GROUP_CONCAT assuming you are using MySQL. The equivalent function in Oracle is LISTAGG(). You can see it in action in the SQL Fiddle.
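Here is a solution:
Select a.* from
test a
inner join
(
    -- one row per distinct set of names, keeping the smallest parent_id
    Select nm, min(parent_id) as p_id
    from
    (
        -- the names of each parent as a sorted, comma-separated string
        Select Parent_id, group_concat(NAME order by NAME) as nm
        from test
        group by Parent_ID
    ) g
    group by nm
) b
on a.Parent_id=b.p_id
order by parent_id, name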
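The core of the join-based solution is a canonical signature per PARENT_ID (the sorted list of its names) and keeping the smallest PARENT_ID for each signature. The same logic, sketched in plain Python for illustration (drop_duplicate_subsets is a hypothetical helper name):

```python
from collections import defaultdict

def drop_duplicate_subsets(rows):
    """Keep the rows of the smallest PARENT_ID for each distinct list of NAMEs.

    rows: iterable of (id, parent_id, name) tuples.
    """
    rows = list(rows)
    names_by_parent = defaultdict(list)
    for _id, parent_id, name in rows:
        names_by_parent[parent_id].append(name)

    # Sorted names give an order-independent signature, like an ordered
    # GROUP_CONCAT string; duplicates are kept, so (A,A) differs from (A).
    first_parent = {}
    for parent_id in sorted(names_by_parent):
        signature = tuple(sorted(names_by_parent[parent_id]))
        first_parent.setdefault(signature, parent_id)  # smallest parent_id wins

    keep = set(first_parent.values())
    return sorted((r for r in rows if r[1] in keep), key=lambda r: (r[1], r[2]))

test_rows = [(1, 1, "A"), (2, 1, "B"), (3, 2, "A"), (4, 2, "B"),
             (5, 3, "A"), (6, 3, "B"), (7, 3, "C"), (8, 4, "A")]
result = drop_duplicate_subsets(test_rows)
for row in result:
    print(row)
```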
Here I have some articles:
id text group_id source_id
1 t1 1 1
2 t2 1 1
3 t3 2 2
4 t4 3 4
I want the result ordered by the created_at column (it exists, but I didn't show it in the table), with one row per distinct group_id, like this:
id text group_id source_id
1 t1 1 1
3 t3 2 2
4 t4 3 4
Also, I need to be able to filter the result by source_id.
I've been stuck on this for two days and don't even know how to start solving it.
Assuming you want the minimum values of the non-duplicated columns, try:
select min(id) as id,
min(text) as text,
group_id,
source_id,
min(created_at) as created_at
from articles
where source_id = @your_parameter_value
group by group_id,
source_id
order by 5
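In MySQL you can also pick one row per group like this, but it only runs with ONLY_FULL_GROUP_BY disabled (it is enabled by default since MySQL 5.7.5), and MySQL does not guarantee which row of each group is returned; an ORDER BY cannot control that choice:
Select * from
(Select * from articles
Order by group_id, id) x
Group by group_id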
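A window-function alternative keeps whole rows (no mixing of min() values taken from different rows) and applies the source_id filter explicitly: number the rows of each group_id by created_at and keep the first. A minimal sketch in SQLite through Python's sqlite3 (3.25+); the created_at values are hypothetical because the question does not show them, and first_per_group is an illustrative helper name.

```python
import sqlite3

# created_at values are made up; the question does not show them.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE articles (id INTEGER, text TEXT, group_id INTEGER, "
             "source_id INTEGER, created_at TEXT)")
conn.executemany(
    "INSERT INTO articles VALUES (?, ?, ?, ?, ?)",
    [(1, "t1", 1, 1, "2022-01-01"), (2, "t2", 1, 1, "2022-01-02"),
     (3, "t3", 2, 2, "2022-01-03"), (4, "t4", 3, 4, "2022-01-04")],
)

def first_per_group(source_id=None):
    """Earliest article (whole row) per group_id, optionally limited to one source_id."""
    return conn.execute("""
        SELECT id, text, group_id, source_id
        FROM (
            SELECT a.*,
                   ROW_NUMBER() OVER (PARTITION BY group_id
                                      ORDER BY created_at, id) AS rn
            FROM articles AS a
            WHERE ? IS NULL OR source_id = ?   -- no filter when source_id is None
        ) AS ranked
        WHERE rn = 1
        ORDER BY created_at
    """, (source_id, source_id)).fetchall()

print(first_per_group())
print(first_per_group(1))
```

Because the filter is applied before ranking, first_per_group(1) returns the earliest source-1 article in each group; move the condition to the outer query if you want to rank across all sources first and then filter.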