I can't understand why there is the difference in results between two queries:
1) select distinct sum(lot.lot_size) over (partition by lot.detail_id) buying_size, lot.f_detail_id
from buying
join lot on lot.lot_id = buying.lot_id
where exists (select 1
from selling s
join buying b on b.buying_id = s.buying_buying_id
where s.deal_id = 123456
and buying.deal_id = b.deal_id
and s.selling_detailid = buying.buying_detailid)
and buying.buying_status <> 'Canceled'
result:
|buying_size|f_detail_id |
|-----------|------------|
| 105 | 1 |
| 200 | 2 |
| 75 | 3 |
| 225 | 4 |
| 300 | 5 |
2) select distinct *
from (select sum(lot.lot_size) over (partition by lot.detail_id) buying_size, lot.f_detail_id
from buying
join lot on lot.lot_id = buying.lot_id
where exists (select 1
from selling s
join buying b on b.buying_id = s.buying_buying_id
where s.deal_id = 123456
and buying.deal_id = b.deal_id
and s.selling_detailid = buying.buying_detailid)
and buying.buying_status <> 'Canceled')
result:
|buying_size|f_detail_id |
|-----------|------------|
| 105 | 1 |
| 200 | 2 |
| 75 | 3 |
| 225 | 4 |
| 150 | 5 |
I know that this query might be written by another way with using GROUP BY but the main thing I'm interested in is why the result of analytical function depends on DISTINCT.
Related
I have the following three tables:
Permission
| PermissionId | PermissionName |
+--------------+----------------+
| 1 | A |
| 2 | B |
| 3 | C |
| 100 | D |
Group
| GroupId | GroupLevel | GroupName |
+---------+------------+----------------------+
| 1 | 0 | System Administrator |
| 7 | 0 | Test Group 100 |
| 8 | 20 | Test Group 200 |
| 9 | 20 | test |
| 10 | 50 | TestGroup01 |
| 11 | 51 | TestUser02 |
| 12 | 52 | TestUser03 |
GroupPermission
| GroupPermissionId | FkGroupId | FkPermissionId |
+-------------------+-----------+----------------+
| 1 | 1 | 1 |
| 2 | 1 | 2 |
| 3 | 1 | 3 |
| 4 | 1 | 4 |
I need to insert records into GroupPermission table, if table Group, GroupLevel column have 0
then I need to take its GroupId and need to insert values to GroupPermission table as that particular id and 100.
In order to above sample table records, I need to insert the following two records to GroupPermission table,
| FkGroupId | FkPermissionId |
+-----------+----------------+
| 1 | 100 |
| 7 | 100 |
How can I do it
This question is not very clear and I can only assume the value 100 is a static value and that you don't actually have foreign keys as the names of the columns imply. Also, you really should avoid reserved words like "Group" for object names. It makes things more difficult and confusing.
The simple version of your insert might look like this.
insert GroupPermission
(
FkGroupId
, FkPermissionId
)
select g.GroupId
, 100
from [Group] g
where g.GroupLevel = 0
--EDIT--
Since you want to only insert those rows that don't already exist you can use NOT EXISTS like this.
select g.GroupId
, 100
from [Group] g
where g.GroupLevel = 0
AND NOT EXISTS
(
select *
from GroupPermission gp
where gp.FkGroupId = g.GroupId
and g.FkPermissionId = 100
)
Or you could use a left join like this.
select g.GroupId
, 100
from [Group] g
left join GroupPermission gp on gp.FkGroupId = g.GroupId
and gp.FkPermissionId = 100
where g.GroupLevel = 0
and gp.FkGroupId is null
I've 3 tables as under
MERCHANDISE
+-----------+-----------+---------------+
| MERCH_NUM | MERCH_DIV | MERCH_SUB_DIV |
+-----------+-----------+---------------+
| 1 | car | awd |
| 1 | car | awd |
| 2 | bike | 1kcc |
| 3 | cycle | hybrid |
| 3 | cycle | city |
| 4 | moped | fixie |
+-----------+-----------+---------------+
PRIORITY
+----------+-----------+---------+---------+------------+------------+---------------+
| CUST_NUM | SALES_NUM | DOC_NUM | BALANCE | PRIORITY_1 | PRIORITY_2 | PRIORITY_CODE |
+----------+-----------+---------+---------+------------+------------+---------------+
| 90 | 1000 | 10 | 23 | 1 | 6 | NO |
| 91 | 1001 | 20 | 32 | 3 | 7 | PRI |
| 92 | 1002 | 30 | 11 | 2 | 8 | LATE |
| 93 | 1003 | 40 | 22 | 5 | 9 | 1MON |
+----------+-----------+---------+---------+------------+------------+---------------+
ORDER
+----------+-----------+---------+---------+-----------+-----------+
| CUST_NUM | SALES_NUM | DOC_NUM | COUNTRY | MERCH_NUM | MERCH_DIV |
+----------+-----------+---------+---------+-----------+-----------+
| 90 | 1000 | 10 | INDIA | 1 | car |
| 91 | 1001 | 20 | CHINA | 2 | bike |
| 92 | 1002 | 30 | USA | 3 | cycle |
| 93 | 1003 | 40 | UK | 4 | moped |
+----------+-----------+---------+---------+-----------+-----------+
I want to join the left joined table from the last two tables with the first one such that the MERCH_SUB_DIV 'awd' appears only once for each unique combination of merch_num and merch_div
the code I came up with is as under, but I'm not sure how do I eliminate the duplicate row just for the awd
select
ROW#, MERCH.MERCH_NUMBER, ORDPRI.MERCH_NUMBER, ORDPRI.CUST_NUM,
BALANCE, SALES_NUM, ITEM_NUM, RANK, PRIORITY_1
from (
select
ROW_NUMBER() OVER(
PARTITION BY ORD.DOC_NUM, ORD.ITEM_NUM
ORDER BY ORD.DOC_NUM, ORD.ITEM_NUM ASC
) AS Row#,
ORD.CUST_NUM, PRI.CUST_NUM, ORD.MERCH_NUM, ORD.MERCH_DIV, PRI.BALANCE,
pri.DOC_NUM, pri.SALES_NUM, pri.PRIORITY_1, pri.PRIORITY_2
from ORDER as ORD
left join PRIORITY as PRI on ORD.DOC_NUM = PRI.DOC_NUM
and ORD.SALES_NUMBER = PRI.SALES_NUM
where country_name in ('USA', ‘INDIA’)
) as ORDPRI
left join MERCHANDISE as MERCH on ORDPRI.DIV = MERCH.DIV
and ORDPRI.MERCH_NUM = MERCH.MERCH_NUM
You have to use 'DISTINCT' keyword to get unique values, but if your 'Priority table' & 'Order table' contains different values for Same MERCH_NUM then the final result contains the repetation of the 'MERCH_NUM'.
SELECT DISTINCT M.MERCH_NUMBER, O.MERCH_NUMBER, O.CUST_NUM, BALANCE, SALES_NUM,ITEM_NUM,RANK,PRIORITY_1
FROM priority_table P
LEFT JOIN order_table O ON P.CUST_NUM = O.CUST_NUM AND P.SALES_NUM=O.SALES_NUM AND P.DOC_NUM = O.DOC_NUM
LEFT JOIN merchandise_table M ON M.MERCH_NUM = O.MERCH_NUM
A way around can be to add one new Row_Number() in the outermost query having Partition by MERCH_SUB_DIV + all the columns in the final list and then filter final results based on the New Row_Number() . Follows a pseudo code that might help:
select
-- All expected columns in final result except the newRow#
ROW#, MERCH_NUM, CUST_NUM,
BALANCE, SALES_NUM, PRIORITY_1
from (
select
ROW#,
-- the new row number includes all column you want to show in final result
row_number() over ( PARTITION BY MERCH.MERCH_SUB_DIV ,
MERCH.MERCH_NUM, ORDPRI.MERCH_NUM, ORDPRI.CUST_NUM,
BALANCE, SALES_NUM, PRIORITY_1
order by (select 1 )) as newRow# ,
MERCH.MERCH_NUM, ORDPRI.CUST_NUM,
BALANCE, SALES_NUM, PRIORITY_1
from (
-- main query goes here
select
ROW_NUMBER() OVER(
PARTITION BY ORD.DOC_NUM --, ORD.ITEM_NUM
ORDER BY ORD.DOC_NUM ASC --, ORD.ITEM_NUM
) AS Row#,
ORD.CUST_NUM, ORD.MERCH_NUM, ORD.MERCH_DIV as DIV, PRI.BALANCE,
pri.DOC_NUM, pri.SALES_NUM, pri.PRIORITY_1, pri.PRIORITY_2
from #ORDER as ORD
left join #PRIORITY as PRI on ORD.DOC_NUM = PRI.DOC_NUM
and ORD.SALES_NUMBER = PRI.SALES_NUM
where country_name in ('USA', 'INDIA')
) as ORDPRI
left join #MERCHANDISE as MERCH on ORDPRI.DIV = MERCH.DIV
and ORDPRI.MERCH_NUM = MERCH.MERCH_NUM
) as T
-- final filter to get distinct values
where newRow# = 1
Sample code here .. Hope this helps!!
I have this query:
SELECT
tbl_ord.ord.table,
tbl_ord.ord.n_ord,
player.confirm
FROM
tbl_ord
INNER JOIN
tbl_players ON tbl_ord.player_id = tbl_players.player_ids
WHERE
id_shop = 3
ORDER BY
n_ord ASC
result:
+-----------+-------+---------+
| ord_table | n_ord | confirm |
+-----------+-------+---------+
| 10 | 2 | 1 |
| 10 | 2 | 0 |
| 8 | 3 | 1 |
| 8 | 3 | 1 |
| 4 | 5 | 1 |
| 4 | 5 | 1 |
+-----------+-------+---------+
I'd like get only result with all confirmed users by group-by on ord_table
+-----------+-------+---------+
| ord_table | n_ord | confirm |
+-----------+-------+---------+
| 4 | 5 | 1 |
| 8 | 3 | 1 |
+-----------+-------+---------+
Thanks!
You can probably use not exists:
SELECT o.ord_table, o.ord.n_ord,
1 as confirm
FROM tbl_ord o
WHERE NOT EXISTS (SELECT 1
FROM tbl_players p
WHERE o.player_id = p.player_ids AND
?.id_shop = 3 AND -- not sure what table this comes from
p.confirm = 0
)
ORDER BY o.n_ord ASC;
The advantage of this approach is that it avoids aggregating at the outer level. This, in turn, means that it can make better use of indexes, including using an index to potentially avoid sorting.
Use HAVING:
SELECT
tbl_ord.ord.table,
tbl_ord.ord.n_ord,
MIN(player.confirm)
FROM tbl_ord
INNER JOIN tbl_players ON tbl_ord.player_id = tbl_players.player_ids
WHERE id_shop = 3
GROUP BY tbl_ord.ord.table, tbl_ord.ord.n_ord,
HAVING MIN(player.confirm) = 1
ORDER BY n_ord ASC
I have four tables:
mls_category
points_matrix
mls_entry
bonus_points
My first table (mls_category) is like below:
*--------------------------------*
| cat_no | store_id | cat_value |
*--------------------------------*
| 10 | 101 | 1 |
| 11 | 101 | 4 |
*--------------------------------*
My second table (points_matrix) is like below:
*----------------------------------------------------*
| pm_no | store_id | value_per_point | maxpoint |
*----------------------------------------------------*
| 1 | 101 | 1 | 10 |
| 2 | 101 | 2 | 50 |
| 3 | 101 | 3 | 80 |
*----------------------------------------------------*
My third table (mls_entry) is like below:
*-------------------------------------------*
| user_id | category | distance | status |
*-------------------------------------------*
| 1 | 10 | 20 | approved |
| 1 | 10 | 30 | approved |
| 1 | 11 | 40 | approved |
*-------------------------------------------*
My fourth table (bonus_points) is like below:
*--------------------------------------------*
| user_id | store_id | bonus_points | type |
*--------------------------------------------*
| 1 | 101 | 200 | fixed |
| 2 | 102 | 300 | fixed |
| 1 | 103 | 4 | per |
*--------------------------------------------*
Now, I want to add bonus points value into the sum of total distance according to the store_id, user_id and type.
I am using the following code to get total distance:
SELECT MIN(b.value_per_point) * d.total_distance FROM points_matrix b
JOIN
(
SELECT store_id, sum(t1.totald/c.cat_value) as total_distance FROM mls_category c
JOIN
(
SELECT SUM(distance) totald, user_id, category FROM mls_entry
WHERE user_id= 1 AND status = 'approved' GROUP BY user_id, category
) t1 ON c.cat_no = t1.category
) d ON b.store_id = d.store_id AND b.maxpoint >= d.total_distance
The above code is correct to calculate value, now I want to JOIN my fourth table.
This gives me sum (60*3 = 180) as total value. Now, I want (60+200)*3 = 780 for user 1 and store id 101 and value is fixed.
i think your query will be like below
SELECT Max(b.value_per_point)*( max(d.total_distance)+max(bonus_points)) FROM mls_point_matrix b
JOIN
(
SELECT store_id, sum(t1.totald/c.cat_value) as total_distance FROM mls_category c
JOIN
(
SELECT SUM(distance) totald, user_id, category FROM mls_entry
WHERE user_id= 1 AND status = 'approved' GROUP BY user_id, category
) t1 ON c.cat_no = t1.category group by store_id
) d ON b.store_id = d.store_id inner join bonus_points bp on bp.store_id=d.store_id
DEMO fiddle
I have found a few similar questions to this on SO but nothing which applies to my situation.
I have a large dataset with hundreds of millions of rows in Table 1 and am looking for the most efficient way to run the following query. I am using Google BigQuery but I think this is a general SQL question applicable to any DBMS?
I need to apply an owner to every row in Table 1. I want to join in the following priority:
1: if item_id matches an identifier in Table 2
2: if no item_id matches try match on item_name
3: if no item_id or item_name matches try match on item_division
4: if no item_division matches, return null
Table 1 - Datapoints:
| id | item_id | item_name | item_division | units | revenue
|----|---------|-----------|---------------|-------|---------
| 1 | xyz | pen | UK | 10 | 100
| 2 | pqr | cat | US | 15 | 120
| 3 | asd | dog | US | 12 | 105
| 4 | xcv | hat | UK | 11 | 140
| 5 | bnm | cow | UK | 14 | 150
Table 2 - Identifiers:
| id | type | code | owner |
|----|---------|-----------|-------|
| 1 | id | xyz | bob |
| 2 | name | cat | dave |
| 3 | division| UK | alice |
| 4 | name | pen | erica |
| 5 | id | xcv | fred |
Desired output:
| id | item_id | item_name | item_division | units | revenue | owner |
|----|---------|-----------|---------------|-------|---------|-------|
| 1 | xyz | pen | UK | 10 | 100 | bob | <- id
| 2 | pqr | cat | US | 15 | 120 | dave | <- code
| 3 | asd | dog | US | 12 | 105 | null | <- none
| 4 | xcv | hat | UK | 11 | 140 | fred | <- id
| 5 | bnm | cow | UK | 14 | 150 | alice | <- division
My attempts so far have involved multiple joining the table onto itself and I fear it is becoming hugely inefficient.
Any help much appreciated.
Another option for BigQuery Standard SQL
#standardSQL
SELECT ARRAY_AGG(a)[OFFSET(0)].*,
ARRAY_AGG(owner
ORDER BY CASE
WHEN type = 'id' THEN 1
WHEN type = 'name' THEN 2
WHEN type = 'division' THEN 3
END
LIMIT 1
)[OFFSET(0)] owner
FROM Datapoints a
JOIN Identifiers b
ON (a.item_id = b.code AND b.type = 'id')
OR (a.item_name = b.code AND b.type = 'name')
OR (a.item_division = b.code AND b.type = 'division')
GROUP BY a.id
ORDER BY a.id
It leaves out entries which k=have no owners - like in below result (id=3 is out as it has no owner)
Row id item_id item_name item_division units revenue owner
1 1 xyz pen UK 10 100 bob
2 2 pqr cat US 15 120 dave
3 4 xcv hat UK 11 140 fred
4 5 bnm cow UK 14 150 alice
I am using the following query (thanks #Barmar) but want to know if there is a more efficient way in Google BigQuery:
SELECT a.*, COALESCE(b.owner,c.owner,d.owner) owner FROM datapoints a
LEFT JOIN identifiers b on a.item_id = b.code and b.type = 'id'
LEFT JOIN identifiers c on a.item_name = c.code and c.type = 'name'
LEFT JOIN identifiers d on a.item_division = d.code and d.type = 'division'
I'm not sure if BigQuery optimizes today a query like this - but at least you would be writing a query that gives strong hints to not run the subqueries when not needed:
#standardSQL
SELECT COALESCE(
null
, (SELECT MIN(payload)
FROM `githubarchive.year.2016`
WHERE actor.login=a.user)
, (SELECT MIN(payload)
FROM `githubarchive.year.2016`
WHERE actor.id = SAFE_CAST(user AS INT64))
)
FROM (SELECT '15229281' user) a
4.2s elapsed, 683 GB processed
{"action":"started"}
For example, the following query took a long time to run, but BigQuery could optimize its execution massively in the future (depending on how frequently users needed an operation like this):
#standardSQL
SELECT COALESCE(
"hello"
, (SELECT MIN(payload)
FROM `githubarchive.year.2016`
WHERE actor.login=a.user)
, (SELECT MIN(payload)
FROM `githubarchive.year.2016`
WHERE actor.id = SAFE_CAST(user AS INT64))
)
FROM (SELECT actor.login user FROM `githubarchive.year.2016` LIMIT 10) a
114.7s elapsed, 683 GB processed
hello
hello
hello
hello
hello
hello
hello
hello
hello
hello