I am having a problem understanding the logic of a SELECT example I saw recently.
There are two tables: a protein table containing a protein name column, and an interaction table containing the interaction type and the IDs of the two proteins involved in that interaction.
interaction table: interactionType, protID1, protID2
protein table: protID, protName
The SELECT example is supposed to return the names of all the proteins that have a "coiled-coil" interactionType. Here is the example:
SELECT a.protName
FROM protein a,
interaction b
WHERE (a.protID = b.protID1 AND a.protID = b.protID2) OR
(a.protID = b.protID2 AND a.protID = b.protID1) AND b.interactionType = 'coiled-coil';
Why is the condition "(a.protID = b.protID1 AND a.protID = b.protID2)" stated twice, once with the operands swapped? Wouldn't one instance suffice to obtain all the wanted protein names?
We cannot know what the author of the query intended. We see they are using
WHERE (a.protID = b.protID1 AND a.protID = b.protID2) OR
(a.protID = b.protID2 AND a.protID = b.protID1)
But this is merely one condition stated twice (namely that a.protID matches both b.protID1 and b.protID2).
Hence this would only get rows where b.protID1 = b.protID2. I find it unlikely that this was intended, for in that case the author could have made this condition explicit. The interaction table refers to two proteins, and the OR makes me think the query was supposed to return either all proteins that are involved in a coiled-coil interaction, or only those where the interaction involved another protein, too. I.e.:
WHERE (a.protID = b.protID1 OR a.protID = b.protID2)
AND b.interactionType = 'coiled-coil'
or
WHERE
(
(a.protID = b.protID1 AND a.protID <> b.protID2)
OR
(a.protID = b.protID2 AND a.protID <> b.protID1)
)
AND b.interactionType = 'coiled-coil'
Both conditions could easily lead to duplicates in the result. I am just guessing here of course, but the author may have intended this:
SELECT protname
FROM protein p
WHERE EXISTS
(
SELECT null
FROM interaction i
WHERE p.protid in (i.protid1, i.protid2)
AND i.interactionType = 'coiled-coil'
);
or this:
SELECT protname
FROM protein p
WHERE EXISTS
(
SELECT null
FROM interaction i
WHERE p.protid in (i.protid1, i.protid2)
AND i.protid1 <> i.protid2
AND i.interactionType = 'coiled-coil'
);
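A minimal sketch of the duplicate problem, using Python's sqlite3 module with SQLite standing in for the original database; the protein and interaction rows are hypothetical. A protein that takes part in two coiled-coil interactions comes back twice from a join with OR, but only once from the EXISTS form.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE protein (protID INTEGER PRIMARY KEY, protName TEXT);
CREATE TABLE interaction (interactionType TEXT, protID1 INTEGER, protID2 INTEGER);
INSERT INTO protein VALUES (1, 'P1'), (2, 'P2'), (3, 'P3'), (4, 'P4');
-- hypothetical data: P1 takes part in two coiled-coil interactions
INSERT INTO interaction VALUES
  ('coiled-coil', 1, 2),
  ('coiled-coil', 3, 1),
  ('other', 4, 4);
""")

# Join with OR: one result row per matching (protein, interaction) pair
join_rows = con.execute("""
    SELECT p.protName
    FROM protein p
    JOIN interaction i
      ON p.protID = i.protID1 OR p.protID = i.protID2
    WHERE i.interactionType = 'coiled-coil'
    ORDER BY p.protName
""").fetchall()

# EXISTS: each protein is returned at most once
exists_rows = con.execute("""
    SELECT p.protName
    FROM protein p
    WHERE EXISTS (
        SELECT NULL
        FROM interaction i
        WHERE p.protID IN (i.protID1, i.protID2)
          AND i.interactionType = 'coiled-coil'
    )
    ORDER BY p.protName
""").fetchall()

print(join_rows)    # [('P1',), ('P1',), ('P2',), ('P3',)]
print(exists_rows)  # [('P1',), ('P2',), ('P3',)]
```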
No, the second condition is already covered by the first.
So, to make the precedence visible, you could write it more clearly as:
SELECT a.protName FROM protein a, interaction b WHERE
(a.protID = b.protID1 AND a.protID = b.protID2) OR
((a.protID = b.protID2 AND a.protID = b.protID1) AND
b.interactionType = 'coiled-coil');
So the query returns every protein that satisfies (a.protID = b.protID1 AND a.protID = b.protID2), i.e. where both protein IDs of the interaction are the same, and not only the ones with b.interactionType = 'coiled-coil'; see the example.
To get only the coiled-coil ones, you would delete the first condition completely.
On another note, explicit JOIN syntax has been around for 30 years and is standard SQL, so you should always switch to it.
CREATE TABLE interaction(
interactionType varchar(20),
protID1 int,
protID2 int);
CREATE TABLE protein (
protID int,
protName varchar(25));
INSERT INTO interaction VALUES ('coiled-coil', 1,1),('B', 3,3)
INSERT INTO protein VALUES (1,'A1'),(2,'B1'),(3,'C1')
SELECT a.protName FROM protein a, interaction b WHERE
(a.protID = b.protID1 AND a.protID = b.protID2) OR
(a.protID = b.protID2 AND a.protID = b.protID1) AND
b.interactionType = 'coiled-coil';
protName
A1
C1
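The fiddle result can be reproduced with Python's sqlite3 module (SQLite standing in for MySQL; the tables and rows are the ones from the fiddle). Because AND binds tighter than OR, the 'coiled-coil' filter only applies to the second branch, so C1 comes back even though its interaction type is 'B'. Adding parentheses around the OR changes the result.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE interaction (interactionType VARCHAR(20), protID1 INT, protID2 INT);
CREATE TABLE protein (protID INT, protName VARCHAR(25));
INSERT INTO interaction VALUES ('coiled-coil', 1, 1), ('B', 3, 3);
INSERT INTO protein VALUES (1, 'A1'), (2, 'B1'), (3, 'C1');
""")

# Original query: parsed as  cond1 OR (cond1 AND interactionType = 'coiled-coil')
original = con.execute("""
    SELECT a.protName FROM protein a, interaction b
    WHERE (a.protID = b.protID1 AND a.protID = b.protID2) OR
          (a.protID = b.protID2 AND a.protID = b.protID1) AND
          b.interactionType = 'coiled-coil'
""").fetchall()

# Explicit parentheses: the type filter now applies to both branches
parenthesized = con.execute("""
    SELECT a.protName FROM protein a, interaction b
    WHERE ((a.protID = b.protID1 AND a.protID = b.protID2) OR
           (a.protID = b.protID2 AND a.protID = b.protID1)) AND
          b.interactionType = 'coiled-coil'
""").fetchall()

print(sorted(original))       # [('A1',), ('C1',)]
print(sorted(parenthesized))  # [('A1',)]
```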
There are several odd things about that query.
(a.protID = b.protID1 AND a.protID = b.protID2) and (a.protID = b.protID2 AND a.protID = b.protID1) are indeed the same.
If we call the duplicated condition (a.protID = b.protID1 AND a.protID = b.protID2) / (a.protID = b.protID2 AND a.protID = b.protID1) Condition1, and the last condition b.interactionType = 'coiled-coil' Condition2, we can rephrase it, for readability's sake, as:
Condition1 OR Condition1 AND Condition2
Which is equivalent to Condition1 OR (Condition1 AND Condition2), since AND takes precedence over OR.
If Condition1 is false, then Condition1 AND Condition2 is also false (since Condition1 is false), so the whole expression is false.
If Condition1 is true, the whole expression is true, regardless of Condition1 AND Condition2.
Condition1 OR Condition1 AND Condition2 is then equivalent to Condition1, which means that
(a.protID = b.protID1 AND a.protID = b.protID2) OR
(a.protID = b.protID2 AND a.protID = b.protID1) AND
b.interactionType = 'coiled-coil'
Is equivalent to
a.protID = b.protID1 AND a.protID = b.protID2
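The reduction Condition1 OR (Condition1 AND Condition2) = Condition1 also holds under SQL's three-valued logic, where a comparison can yield NULL. A small sketch that asks SQLite (via Python's sqlite3 module) to evaluate both sides for every combination of TRUE, FALSE and NULL:

```python
import itertools
import sqlite3

con = sqlite3.connect(":memory:")
values = [1, 0, None]  # SQLite's TRUE, FALSE and NULL

mismatches = []
for c1, c2 in itertools.product(values, repeat=2):
    # IS treats two NULLs as equal, so this checks that both sides are identical
    (same,) = con.execute(
        "SELECT (:c1 OR (:c1 AND :c2)) IS :c1", {"c1": c1, "c2": c2}
    ).fetchone()
    if not same:
        mismatches.append((c1, c2))

print(mismatches)  # []
```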
The comma-join style SELECT ... FROM table1, table2 WHERE table1.id = table2.table1id is the pre-1992 syntax; SQL-92 introduced explicit JOIN syntax, which is generally preferred.
Prefer using "proper" joins: SELECT ... FROM table1 INNER JOIN table2 ON table1.id = table2.table1id
I would rephrase the query as:
SELECT a.protName
FROM protein a
INNER JOIN interaction b
ON a.protID = b.protID1 AND a.protID = b.protID2
WHERE b.interactionType = 'coiled-coil';
I'm not sure I'm understanding the problem correctly. It seems there are two relationships between those two tables. Namely:
interaction.protID1 -> protein.protID
interaction.protID2 -> protein.protID
If this is the case, then the query should include two joins -- one for each relationship. It should take the form:
select
a.protName as protein1,
b.protName as protein2
from interaction i
join protein a on a.protID = i.protID1
join protein b on b.protID = i.protID2
where i.interactionType = 'coiled-coil'
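A sketch of the two-join shape with hypothetical sample rows, run through Python's sqlite3 module (SQLite standing in for the original database). Each interaction row is joined to the protein table once per foreign key, so both protein names end up on the same result row.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE protein (protID INTEGER PRIMARY KEY, protName TEXT);
CREATE TABLE interaction (interactionType TEXT, protID1 INTEGER, protID2 INTEGER);
INSERT INTO protein VALUES (1, 'P1'), (2, 'P2'), (3, 'P3');
-- hypothetical interactions
INSERT INTO interaction VALUES ('coiled-coil', 1, 2), ('other', 2, 3);
""")

rows = con.execute("""
    SELECT a.protName AS protein1,
           b.protName AS protein2
    FROM interaction i
    JOIN protein a ON a.protID = i.protID1   -- first relationship
    JOIN protein b ON b.protID = i.protID2   -- second relationship
    WHERE i.interactionType = 'coiled-coil'
""").fetchall()

print(rows)  # [('P1', 'P2')]
```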
SELECT a.protName
FROM protein a,
interaction b
WHERE (a.protID = b.protID1 AND a.protID = b.protID2) OR
(a.protID = b.protID2 AND a.protID = b.protID1) AND b.interactionType = 'coiled-coil';
The second line of the WHERE clause is already covered by the first line.
I.e. you can simply do:
SELECT a.protName
FROM protein a,
interaction b
WHERE a.protID = b.protID1 AND a.protID = b.protID2;
Rewritten using modern, explicit JOIN syntax and table aliases that make sense, it becomes:
SELECT p.protName
FROM protein p
JOIN interaction i
ON p.protID = i.protID1 AND p.protID = i.protID2;
https://dbfiddle.uk/yu2Wt3gA
However, this is just the given query written in a less complex way. I'd guess @The Impaler has given the correct query.
I'm trying to make my data set smaller; I'm currently bringing in data from 8 different tables. To do this, I'd like to use the WHERE clause to filter out unnecessary data, but I'm not sure how to do that for all 8 tables. This is my current query:
--GroupA first, to join the hits and sessions tables
SELECT
GroupA_hits.session_id, GroupA_hits.hits_eventInfo_eventCategory, GroupA_hits.hits_eventInfo_eventAction, GroupA_hits.hits_eventInfo_eventLabel, GroupA_hits.cd126_hit_placeholder,
GroupA_sessions.session_id, GroupA_sessions.userId, GroupA_sessions.fullVisitorId, GroupA_sessions.visitNumber, GroupA_sessions.date,
GroupB_hits.session_id, GroupB_hits.hits_eventInfo_eventCategory, GroupB_hits.hits_eventInfo_eventAction, GroupB_hits.hits_eventInfo_eventLabel, GroupB_hits.cd126_hit_placeholder,
GroupB_sessions.session_id, GroupB_sessions.userId, GroupB_sessions.fullVisitorId, GroupB_sessions.visitNumber, GroupB_sessions.date,
GroupC_hits.session_id, GroupC_hits.hits_eventInfo_eventCategory, GroupC_hits.hits_eventInfo_eventAction, GroupC_hits.hits_eventInfo_eventLabel, GroupC_hits.cd126_hit_placeholder,
GroupC_sessions.session_id, GroupC_sessions.userId, GroupC_sessions.fullVisitorId, GroupC_sessions.visitNumber, GroupC_sessions.date,
GroupD_hits.session_id, GroupD_hits.hits_eventInfo_eventCategory, GroupD_hits.hits_eventInfo_eventAction, GroupD_hits.hits_eventInfo_eventLabel, GroupD_hits.cd126_hit_placeholder,
GroupD_sessions.session_id, GroupD_sessions.userId, GroupD_sessions.fullVisitorId, GroupD_sessions.visitNumber, GroupD_sessions.date
FROM `GroupA-bigquery.170369603.ga_flat_hits_202104*` GroupA_hits
LEFT JOIN `GroupA-bigquery.170369603.ga_flat_sessions_202104*` GroupA_sessions
ON (
GroupA_hits.session_id = GroupA_sessions.session_id
)
--Next, join GroupB to GroupA
LEFT JOIN `GroupB-bigquery.170359716.ga_flat_hits_202104*` GroupB_hits
ON (
GroupB_hits.session_id = GroupA_hits.session_id
)
LEFT JOIN `GroupB-bigquery.170359716.ga_flat_sessions_202104*` GroupB_sessions
ON (
GroupB_sessions.session_id = GroupA_sessions.session_id
)
--Now, join GroupC to GroupA
LEFT JOIN `GroupC-bigquery.170726426.ga_flat_hits_202104*` GroupC_hits
ON (
GroupC_hits.session_id = GroupA_hits.session_id
)
LEFT JOIN `GroupC-bigquery.170726426.ga_flat_sessions_202104*` GroupC_sessions
ON (
GroupC_sessions.session_id = GroupA_sessions.session_id
)
--Next, join GroupD to GroupA
LEFT JOIN `GroupD-bigquery.170374765.ga_flat_hits_202104*` GroupD_hits
ON (
GroupD_hits.session_id = GroupA_hits.session_id
)
LEFT JOIN `GroupD-bigquery.170374765.ga_flat_sessions_202104*` GroupD_sessions
ON (
GroupD_sessions.session_id = GroupA_sessions.session_id
)
I would also like to include the clauses below; they use the same column names in each of the different _hits tables. This is what I've tried, but I get "This query returned no results" back. My assumption is that, the way this query is written, BigQuery is looking for a row where all of these conditions hold in one hit, and there won't be any. But I'd like it to look through these four tables and grab all matching rows.
WHERE GroupA_hits.hits_eventInfo_eventCategory = 'rewards'
AND GroupB_hits.hits_eventInfo_eventCategory = 'rewards'
AND GroupC_hits.hits_eventInfo_eventCategory = 'rewards'
AND GroupD_hits.hits_eventInfo_eventCategory = 'rewards'
AND GroupA_hits.hits_eventInfo_eventAction = 'redeem points confirm'
AND GroupB_hits.hits_eventInfo_eventAction = 'redeem points confirm'
AND GroupC_hits.hits_eventInfo_eventAction = 'redeem points confirm'
AND GroupD_hits.hits_eventInfo_eventAction = 'redeem points confirm'
AND GroupA_hits.hits_eventInfo_eventLabel = 'gas savings'
AND GroupB_hits.hits_eventInfo_eventLabel = 'gas savings'
AND GroupC_hits.hits_eventInfo_eventLabel = 'gas savings'
AND GroupD_hits.hits_eventInfo_eventLabel = 'gas savings'
AND GroupA_hits.cd126_hit_placeholder Is Not NULL
AND GroupB_hits.cd126_hit_placeholder Is Not NULL
AND GroupC_hits.cd126_hit_placeholder Is Not NULL
AND GroupD_hits.cd126_hit_placeholder Is Not NULL
Consider moving the WHERE conditions into ON clauses to filter those tables during the LEFT JOIN operations:
...
FROM `GroupA-bigquery.170369603.ga_flat_hits_202104*` GroupA_hits
LEFT JOIN `GroupA-bigquery.170369603.ga_flat_sessions_202104*` GroupA_sessions
ON GroupA_hits.session_id = GroupA_sessions.session_id
AND GroupA_hits.hits_eventInfo_eventCategory = 'rewards'
AND GroupA_hits.hits_eventInfo_eventAction = 'redeem points confirm'
AND GroupA_hits.hits_eventInfo_eventLabel = 'gas savings'
AND GroupA_hits.cd126_hit_placeholder Is Not NULL
--Next, join GroupB to GroupA
LEFT JOIN `GroupB-bigquery.170359716.ga_flat_hits_202104*` GroupB_hits
ON GroupB_hits.session_id = GroupA_hits.session_id
AND GroupB_hits.hits_eventInfo_eventCategory = 'rewards'
AND GroupB_hits.hits_eventInfo_eventAction = 'redeem points confirm'
AND GroupB_hits.hits_eventInfo_eventLabel = 'gas savings'
AND GroupB_hits.cd126_hit_placeholder Is Not NULL
LEFT JOIN `GroupB-bigquery.170359716.ga_flat_sessions_202104*` GroupB_sessions
ON GroupB_sessions.session_id = GroupA_sessions.session_id
--Now, join GroupC to GroupA
LEFT JOIN `GroupC-bigquery.170726426.ga_flat_hits_202104*` GroupC_hits
ON GroupC_hits.session_id = GroupA_hits.session_id
AND GroupC_hits.hits_eventInfo_eventCategory = 'rewards'
AND GroupC_hits.hits_eventInfo_eventAction = 'redeem points confirm'
AND GroupC_hits.hits_eventInfo_eventLabel = 'gas savings'
AND GroupC_hits.cd126_hit_placeholder Is Not NULL
LEFT JOIN `GroupC-bigquery.170726426.ga_flat_sessions_202104*` GroupC_sessions
ON GroupC_sessions.session_id = GroupA_sessions.session_id
--Next, join GroupD to GroupA
LEFT JOIN `GroupD-bigquery.170374765.ga_flat_hits_202104*` GroupD_hits
ON GroupD_hits.session_id = GroupA_hits.session_id
AND GroupD_hits.hits_eventInfo_eventCategory = 'rewards'
AND GroupD_hits.hits_eventInfo_eventAction = 'redeem points confirm'
AND GroupD_hits.hits_eventInfo_eventLabel = 'gas savings'
AND GroupD_hits.cd126_hit_placeholder Is Not NULL
LEFT JOIN `GroupD-bigquery.170374765.ga_flat_sessions_202104*` GroupD_sessions
ON GroupD_sessions.session_id = GroupA_sessions.session_id
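One caveat worth checking: in a LEFT JOIN, a predicate in the ON clause only decides which right-hand rows are matched; it never removes rows from the left-hand (preserved) table. So the GroupA_hits conditions placed in the first ON clause do not filter GroupA_hits itself. A minimal sketch with hypothetical hits/sessions tables, using Python's sqlite3 module as a stand-in for BigQuery:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
-- hypothetical stand-ins for one group's _hits and _sessions tables
CREATE TABLE hits (session_id INTEGER, category TEXT);
CREATE TABLE sessions (session_id INTEGER, userId TEXT);
INSERT INTO hits VALUES (1, 'rewards'), (2, 'other');
INSERT INTO sessions VALUES (1, 'u1'), (2, 'u2');
""")

# Filter on the left table inside ON: both hits survive,
# the non-matching one just gets NULL for the sessions columns
in_on = con.execute("""
    SELECT h.session_id, h.category, s.userId
    FROM hits h
    LEFT JOIN sessions s
      ON h.session_id = s.session_id AND h.category = 'rewards'
    ORDER BY h.session_id
""").fetchall()

# Same filter in WHERE: the left table is actually filtered
in_where = con.execute("""
    SELECT h.session_id, h.category, s.userId
    FROM hits h
    LEFT JOIN sessions s ON h.session_id = s.session_id
    WHERE h.category = 'rewards'
    ORDER BY h.session_id
""").fetchall()

print(in_on)     # [(1, 'rewards', 'u1'), (2, 'other', None)]
print(in_where)  # [(1, 'rewards', 'u1')]
```

Conditions on the right-hand tables (GroupB_hits and so on) do belong in their ON clauses if the GroupA rows without a match should be kept.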
BigQuery is looking for a row where all of these exist in one hit (is my assumption), which, there won't be any.
It sounds like you want each option in an OR group, which could be further simplified like this:
WHERE
'rewards' IN (GroupA_hits.hits_eventInfo_eventCategory, GroupB_hits.hits_eventInfo_eventCategory, GroupC_hits.hits_eventInfo_eventCategory, GroupD_hits.hits_eventInfo_eventCategory)
AND 'redeem points confirm' IN (GroupA_hits.hits_eventInfo_eventAction, GroupB_hits.hits_eventInfo_eventAction, GroupC_hits.hits_eventInfo_eventAction, GroupD_hits.hits_eventInfo_eventAction)
AND 'gas savings' IN (GroupA_hits.hits_eventInfo_eventLabel, GroupB_hits.hits_eventInfo_eventLabel, GroupC_hits.hits_eventInfo_eventLabel, GroupD_hits.hits_eventInfo_eventLabel)
AND COALESCE(GroupA_hits.cd126_hit_placeholder, GroupB_hits.cd126_hit_placeholder, GroupC_hits.cd126_hit_placeholder, GroupD_hits.cd126_hit_placeholder) Is Not NULL
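The two building blocks used here can be checked in isolation. 'x' IN (c1, c2, ...) is true if any listed column equals 'x' (NULL columns, as produced by unmatched LEFT JOINs, simply do not match), and COALESCE(...) IS NOT NULL is true if at least one argument is non-NULL. A sketch using Python's sqlite3 module, with SQLite standing in for BigQuery:

```python
import sqlite3

con = sqlite3.connect(":memory:")

def check(sql, **params):
    """Evaluate a single SQL expression and return 1, 0 or None."""
    return con.execute(f"SELECT {sql}", params).fetchone()[0]

# One of four columns matches, the others are NULL (unmatched LEFT JOINs)
any_match = check("'rewards' IN (:a, :b, :c, :d)",
                  a=None, b="rewards", c=None, d=None)
# No column matches and some are NULL: IN yields NULL, which WHERE treats as false
no_match = check("'rewards' IN (:a, :b, :c, :d)",
                 a="other", b=None, c=None, d=None)
# COALESCE returns the first non-NULL argument
some_not_null = check("COALESCE(:a, :b, :c, :d) IS NOT NULL",
                      a=None, b=None, c="x", d=None)
all_null = check("COALESCE(:a, :b, :c, :d) IS NOT NULL",
                 a=None, b=None, c=None, d=None)

print(any_match, no_match, some_not_null, all_null)  # 1 None 1 0
```

Note that the three IN conditions may each be satisfied by a different table, so this is looser than requiring one table to match all of them.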
Note that I'm making some assumptions about how BigQuery handles ANSI standard SQL, as I'm not a regular BigQuery user.
The code below produces the left block of data
It gives me the entire subset, but I only need the SATNR-VKORG combinations where ALL Variant_Status values are I2. If any Variant_Status is NOT I2, the entire SATNR-VKORG combination should not be shown. My final output should contain only the first two lines of the data below; none of the other rows qualify.
I can't figure out how to do this. My idea is to use a count, as in the right block: concatenate SATNR-VKORG and SATNR-VKORG-Variant_Status and count each unique combination. For the same SATNR-VKORG combination, if the two counts are identical, display it; otherwise, don't. Even so, I don't know how to code it. Can anyone help, or does anyone have a better idea?
SELECT TOP (1000)
MARA.MATNR,
MARA.SATNR,
MARA.ATTYP,
MARA.MTART,
MARA.MSTAE,
MARA.LVORM,
MVKE.VMSTA as Variant_Status,
MVKE.VTWEG, mvke.VKORG,
MVKE2.VMSTA as Generic_Status,
MVKE2.VTWEG, MVKE2.VKORG,
mara.satnr + mvke.vkorg as concated
from [dgSAP_PRD].dbo.MARA AS MARA
JOIN [dgSAP_PRD].dbo.MVKE AS MVKE ON MARA.MATNR = MVKE.MATNR
JOIN [dgSAP_PRD].dbo.MARA AS MARA2 ON MARA.SATNR = MARA2.MATNR
JOIN [dgSAP_PRD].dbo.MVKE AS MVKE2 ON MARA2.MATNR = MVKE2.MATNR
WHERE MARA.MTART != 'ZODE'
AND MARA.ATTYP in (02)
AND MARA.LVORM = ''
AND MVKE2.VTWEG = '34'
AND MVKE.VTWEG = '34'
AND MVKE.VKORG=MVKE2.VKORG
and mvke2.vmsta != 'I2'
ORDER BY MARA.SATNR,MVKE.VKORG,MVKE2.VKORG, MARA.MATNR
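The count-comparison idea from the question can be expressed directly with GROUP BY and HAVING: a SATNR/VKORG group qualifies when the number of I2 variants equals the total number of variants. A minimal sketch with a hypothetical, simplified table (one row per variant, columns MATNR, SATNR, VKORG, Variant_Status), run through Python's sqlite3 module rather than SQL Server:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
-- hypothetical simplified result of the joins above
CREATE TABLE variants (MATNR TEXT, SATNR TEXT, VKORG TEXT, Variant_Status TEXT);
INSERT INTO variants VALUES
  ('M1', 'S1', '1000', 'I2'),
  ('M2', 'S1', '1000', 'I2'),
  ('M3', 'S2', '1000', 'I2'),
  ('M4', 'S2', '1000', 'A1');
""")

groups = con.execute("""
    SELECT SATNR, VKORG
    FROM variants
    GROUP BY SATNR, VKORG
    -- every variant in the group is I2 exactly when both counts agree
    HAVING COUNT(*) = SUM(CASE WHEN Variant_Status = 'I2' THEN 1 ELSE 0 END)
""").fetchall()

print(groups)  # [('S1', '1000')]
```

Joining the qualifying groups back to the detail rows (or filtering the detail rows with IN) then returns the full rows.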
WITH cte AS
(
SELECT TOP (1000)
MARA.MATNR,
MARA.SATNR,
MARA.ATTYP,
MARA.MTART,
MARA.MSTAE,
MARA.LVORM,
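MVKE.VMSTA as Variant_Status,
MVKE.VTWEG, MVKE.VKORG,
MVKE2.VMSTA as Generic_Status,
MVKE2.VTWEG as Generic_VTWEG, MVKE2.VKORG as Generic_VKORG,
MARA.SATNR + MVKE.VKORG as concated,
-- CTE column names must be unique and a column alias cannot be reused in the same SELECT,
-- so the columns are qualified here; marker = number of non-I2 variants per SATNR/VKORG
SUM(CASE WHEN MVKE.VMSTA <> 'I2' THEN 1 ELSE 0 END) OVER (PARTITION BY MARA.SATNR, MVKE.VKORG) marker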
from [dgSAP_PRD].dbo.MARA AS MARA
JOIN [dgSAP_PRD].dbo.MVKE AS MVKE ON MARA.MATNR = MVKE.MATNR
JOIN [dgSAP_PRD].dbo.MARA AS MARA2 ON MARA.SATNR = MARA2.MATNR
JOIN [dgSAP_PRD].dbo.MVKE AS MVKE2 ON MARA2.MATNR = MVKE2.MATNR
WHERE MARA.MTART <> 'ZODE'
AND MARA.ATTYP in (02)
AND MARA.LVORM = ''
AND MVKE2.VTWEG = '34'
AND MVKE.VTWEG = '34'
AND MVKE.VKORG=MVKE2.VKORG
and mvke2.vmsta <> 'I2'
)
SELECT *
FROM cte
WHERE marker = 0
ORDER BY SATNR, VKORG, MATNR
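The window-function approach can be tried on the same kind of hypothetical data. SUM(...) OVER (PARTITION BY ...) counts the non-I2 variants in each SATNR/VKORG group and writes that count onto every row of the group, so filtering on marker = 0 keeps or drops whole groups. A sketch via Python's sqlite3 module (window functions need SQLite 3.25 or later), standing in for SQL Server:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
-- hypothetical simplified rows; S2/1000 has one variant that is not I2
CREATE TABLE variants (MATNR TEXT, SATNR TEXT, VKORG TEXT, Variant_Status TEXT);
INSERT INTO variants VALUES
  ('M1', 'S1', '1000', 'I2'),
  ('M2', 'S1', '1000', 'I2'),
  ('M3', 'S2', '1000', 'I2'),
  ('M4', 'S2', '1000', 'A1');
""")

rows = con.execute("""
    WITH cte AS (
        SELECT MATNR, SATNR, VKORG, Variant_Status,
               SUM(CASE WHEN Variant_Status <> 'I2' THEN 1 ELSE 0 END)
                   OVER (PARTITION BY SATNR, VKORG) AS marker
        FROM variants
    )
    SELECT MATNR, SATNR, VKORG
    FROM cte
    WHERE marker = 0
    ORDER BY SATNR, VKORG, MATNR
""").fetchall()

print(rows)  # [('M1', 'S1', '1000'), ('M2', 'S1', '1000')]
```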
I want to update the table prod_replay_out based on subquery results in Postgres. However, the subquery returns multiple rows for some target rows; I want to skip those and update the table only where the subquery returns a single row.
I have looked at "Subquery returns more than 1 row error", but the max() function will not give my expected results. Could you please suggest how to modify the query? Thank you.
prod_replay_out has the following columns:
seller, buyer, sender_tag, seller_tag, buyer_tag, isin, quantity, in_msg_time, msg_type, cdsx_time
prod_replay_in has the following columns:
seller, buyer, sender_tag, seller_tag, buyer_tag, isin, quantity, msg_type, cdsx_time
What I have tried:
Here is my UPDATE statement:
update prod_replay_out O
set in_msg_id =
(Select id
From prod_replay_in I
Where I.msg_type = 'CDST010'
and I.seller = O.seller
and I.buyer = O.buyer
and I.sender_tag = O.sender_tag
and I.seller_tag = O.seller_tag
and I.buyer_tag = O.buyer_tag
and I.isin = O.isin
and I.quantity = O.quantity
and I.cdsx_time = O.in_msg_time
and I.cdsx_time::text like '2020-05-12%'
)
where O.msg_type = 'CDST01C'
and O.cdsx_time::text like '2020-05-12%';
I have tried the solution below. Is it the correct approach, or is there any loophole?
update prod_replay_out O
set in_msg_id =
(Select id
From prod_replay_in I
Where I.msg_type = 'CDST010'
and I.seller = O.seller
and I.buyer = O.buyer
and I.sender_tag = O.sender_tag
and I.seller_tag = O.seller_tag
and I.buyer_tag = O.buyer_tag
and I.isin = O.isin
and I.quantity = O.quantity
and I.cdsx_time = O.in_msg_time
and I.cdsx_time::text like '2020-05-12%'
and 1 = (Select count(id)
From prod_replay_in I
Where I.msg_type = 'CDST010'
and I.seller = O.seller
and I.buyer = O.buyer
and I.sender_tag = O.sender_tag
and I.seller_tag = O.seller_tag
and I.buyer_tag = O.buyer_tag
and I.isin = O.isin
and I.quantity = O.quantity
and I.cdsx_time = O.in_msg_time
and I.cdsx_time::text like '2020-05-12%'
)
)
where O.msg_type = 'CDST01C'
and O.cdsx_time::text like '2020-05-12%';
Query
Most importantly, don't use a correlated subquery. It's the inferior tool for the job. Use a subquery in the FROM clause.
This only updates where a single matching candidate row is found in the source table (neither none nor multiple), and only where it actually changes the value:
UPDATE prod_replay_out o
SET in_msg_id = i.id
FROM (
SELECT i.id, i.seller, i.buyer, i.sender_tag, i.seller_tag, i.buyer_tag, i.isin, i.quantity, i.cdsx_time
FROM prod_replay_in i
WHERE i.msg_type = 'CDST010'
AND i.cdsx_time >= '2020-05-12' -- ① "sargable" expression
AND i.cdsx_time < '2020-05-13' -- ② don't cast to date, it's a valid timestamp literal
AND NOT EXISTS ( -- ③ EXISTS is typically faster than counting
SELECT FROM prod_replay_in x
WHERE x.id <> i.id -- ④ unique
AND (i.seller, i.buyer, i.sender_tag, i.seller_tag, i.buyer_tag, i.isin, i.quantity, i.cdsx_time) -- ⑤ short syntax
= (x.seller, x.buyer, x.sender_tag, x.seller_tag, x.buyer_tag, x.isin, x.quantity, x.cdsx_time)
)
) i
WHERE o.msg_type = 'CDST01C'
AND (i.seller, i.buyer, i.sender_tag, i.seller_tag, i.buyer_tag, i.isin, i.quantity, i.cdsx_time)
= (o.seller, o.buyer, o.sender_tag, o.seller_tag, o.buyer_tag, o.isin, o.quantity, o.in_msg_time) -- ⑥ o.cdsx_time?
-- AND o.cdsx_time >= '2020-05-12' -- ⑦ redundant
-- AND o.cdsx_time < '2020-05-13'
AND o.in_msg_id IS DISTINCT FROM i.id -- ⑧ avoid empty updates
;
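The "exactly one candidate" rule can be tried out in a scaled-down form. This sketch uses Python's sqlite3 module with hypothetical tables src/tgt and two key columns k1, k2 standing in for the many columns above. It is not the PostgreSQL UPDATE ... FROM form (SQLite only gained UPDATE ... FROM in 3.33); instead the same NOT EXISTS uniqueness test sits inside a correlated subquery, and a WHERE EXISTS guard leaves rows with zero or several candidates untouched.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE src (id INTEGER PRIMARY KEY, k1 TEXT, k2 INTEGER);
CREATE TABLE tgt (k1 TEXT, k2 INTEGER, in_msg_id INTEGER);
-- key ('a', 1) has exactly one source row, ('b', 2) has two, ('c', 3) has none
INSERT INTO src (id, k1, k2) VALUES (10, 'a', 1), (20, 'b', 2), (21, 'b', 2);
INSERT INTO tgt (k1, k2) VALUES ('a', 1), ('b', 2), ('c', 3);
""")

# Source rows matching the target row whose key no other source row shares
unique_src = """
    SELECT s.id FROM src s
    WHERE s.k1 = tgt.k1 AND s.k2 = tgt.k2
      AND NOT EXISTS (SELECT 1 FROM src x
                      WHERE x.id <> s.id AND x.k1 = s.k1 AND x.k2 = s.k2)
"""
con.execute(f"""
    UPDATE tgt
    SET in_msg_id = ({unique_src})
    WHERE EXISTS ({unique_src})
""")

rows = con.execute("SELECT k1, k2, in_msg_id FROM tgt ORDER BY k1").fetchall()
print(rows)  # [('a', 1, 10), ('b', 2, None), ('c', 3, None)]
```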
① Like GMB already suggested, transform this predicate to "sargable" expressions. This is faster, generally, and can use index support.
What does the word “SARGable” really mean?
Calculate number of concurrent events in SQL
② But don't cast to date if cdsx_time is a timestamp column (as seems likely). '2020-05-12' is a perfectly valid timestamp literal, signifying the first instance of the day. See:
Generating time series between two dates in PostgreSQL
If it's a timestamptz column, consider the possible influence of the timezone setting! See:
Ignoring time zones altogether in Rails and PostgreSQL
③ EXISTS is typically (much) more efficient than counting all rows, as it can stop as soon as another row is found. Especially if there can be many peers, and index support is available. See:
Select rows which are not present in other table
④ Assuming id is unique (or PK). Else use the system column ctid for the job. See:
How do I (or can I) SELECT DISTINCT on multiple columns?
⑤ Convenient, equivalent short syntax with ROW values. See:
Enforcing index scan for multicolumn comparison
⑥ Your query has:
and I.cdsx_time = O.in_msg_time -- !?
and I.cdsx_time::text like '2020-05-12%'
... but:
O.cdsx_time::text like '2020-05-12%'
Did you mean to write and I.cdsx_time = O.cdsx_time instead?
⑦ Would be noise. The restriction is already enforced in the subquery. (Doesn't help index support, either.)
⑧ This one is important if some columns may already have the desired value. Then the operation is skipped instead of writing an identical row version at full cost.
If both columns are defined NOT NULL, simplify to o.in_msg_id <> i.id. Again, see:
Update a column of a table with a column of another table in PostgreSQL
Indices
If performance is an issue or you run this repeatedly, consider indices like the following:
For the first (in order of the expected query plan!) step of identifying source row candidates:
CREATE INDEX ON prod_replay_in (msg_type, cdsx_time);
For the second step of ruling out duplicates:
CREATE INDEX ON prod_replay_in (seller, buyer, sender_tag, seller_tag, buyer_tag, isin, quantity, cdsx_time);
Or any small subset that is selective enough. A smaller index on fewer columns is typically more efficient if it includes relatively few additional rows as "false positives" in the index scan. While relatively few, these are eliminated cheaply in the following FILTER step.
For the final step of identifying target rows:
CREATE INDEX ON prod_replay_out (msg_type, in_msg_time);
Again: or any small subset that is selective enough.
You want to update only when the subquery returns one row. One option uses aggregation and HAVING in the subquery:
update prod_replay_out o
set in_msg_id = (
select max(id)
from prod_replay_in i
where
i.msg_type = 'CDST010'
and i.seller = o.seller
and i.buyer = o.buyer
and i.sender_tag = o.sender_tag
and i.seller_tag = o.seller_tag
and i.buyer_tag = o.buyer_tag
and i.isin = o.isin
and i.quantity = o.quantity
and i.cdsx_time = o.in_msg_time
and i.cdsx_time >= '2020-05-12'::date
and i.cdsx_time < '2020-05-13'::date
having count(*) = 1
)
where
o.msg_type = 'CDST01C'
and o.cdsx_time >= '2020-05-12'::date
and o.cdsx_time < '2020-05-13'::date;
Note that I rewrote the date filters to avoid the conversion to text (you can use a half-open interval with date literals instead, which is far more efficient).
Note that this updates in_msg_id to null when the subquery would have returned multiple rows (or no rows at all). If you want to avoid that, you can filter in the where clause:
update prod_replay_out o
set in_msg_id = (
select max(id)
from prod_replay_in i
where
i.msg_type = 'CDST010'
and i.seller = o.seller
and i.buyer = o.buyer
and i.sender_tag = o.sender_tag
and i.seller_tag = o.seller_tag
and i.buyer_tag = o.buyer_tag
and i.isin = o.isin
and i.quantity = o.quantity
and i.cdsx_time = o.in_msg_time
and i.cdsx_time >= '2020-05-12'::date
and i.cdsx_time < '2020-05-13'::date
having count(*) = 1
)
where
o.msg_type = 'CDST01C'
and o.cdsx_time >= '2020-05-12'::date
and o.cdsx_time < '2020-05-13'::date
and (
select count(*)
from prod_replay_in i
where
i.msg_type = 'CDST010'
and i.seller = o.seller
and i.buyer = o.buyer
and i.sender_tag = o.sender_tag
and i.seller_tag = o.seller_tag
and i.buyer_tag = o.buyer_tag
and i.isin = o.isin
and i.quantity = o.quantity
and i.cdsx_time = o.in_msg_time
and i.cdsx_time >= '2020-05-12'::date
and i.cdsx_time < '2020-05-13'::date
) = 1;
I want to subtract two values from the same column like this:
SELECT tblTask.CustomerName,
Sum(Isnull(Cast(Value AS FLOAT), 0)) AS summa
FROM tblExtraFieldData
INNER JOIN tblTask
ON tblExtraFieldData.OwnerId = tblTask.Id
INNER JOIN tblLog
ON tblExtraFieldData.OwnerId = tblLog.OwnerId
WHERE tblLog.Modified BETWEEN '2016-11-25' AND '2016-12-31'
AND tblExtraFieldData.FieldId = '10052'
AND tblTask.Status = '5'
AND tblLog.Text LIKE '%Exporterat%'
AND tblTask.ProjectNr NOT LIKE 'Fastpris'
GROUP BY tblTask.CustomerName
ORDER BY summa
Now I want to subtract from this the value found in tblExtraFieldData.Value where FieldId = '10048' in the table tblExtraFieldData.
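One common way to do this is conditional aggregation: sum each FieldId separately with CASE and subtract the two sums in the same SELECT. A minimal sketch, assuming simplified hypothetical tables and rows (the tblLog join and the other filters are left out) and using Python's sqlite3 module instead of SQL Server; in T-SQL the CAST target would be FLOAT as in the question.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
-- hypothetical, simplified versions of the question's tables
CREATE TABLE tblTask (Id INTEGER, CustomerName TEXT);
CREATE TABLE tblExtraFieldData (OwnerId INTEGER, FieldId TEXT, Value TEXT);
INSERT INTO tblTask VALUES (1, 'Acme'), (2, 'Acme'), (3, 'Beta');
INSERT INTO tblExtraFieldData VALUES
  (1, '10052', '100'), (1, '10048', '30'),
  (2, '10052', '50'),
  (3, '10052', '80'), (3, '10048', '80');
""")

rows = con.execute("""
    SELECT t.CustomerName,
           SUM(CASE WHEN f.FieldId = '10052' THEN CAST(f.Value AS REAL) ELSE 0 END)
         - SUM(CASE WHEN f.FieldId = '10048' THEN CAST(f.Value AS REAL) ELSE 0 END) AS summa
    FROM tblExtraFieldData f
    JOIN tblTask t ON f.OwnerId = t.Id
    WHERE f.FieldId IN ('10052', '10048')   -- both fields are needed now
    GROUP BY t.CustomerName
    ORDER BY summa
""").fetchall()

print(rows)  # [('Beta', 0.0), ('Acme', 120.0)]
```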
I've researched this problem extensively for a long time; so far nothing similar has come up. There is a tl;dr below.
Here's my problem.
I'm trying to create a SELECT statement in SQLite with conditional filtering that works somewhat like a function. Sample pseudo-code below:
SELECT col_date, col_hour FROM table1 JOIN table2
ON table1.col_date = table2.col_date AND table1.col_hour = table2.col_hour AND table1.col_name = table2.col_name
WHERE
IF table2.col_name = "a" THEN {filter these records further such that its table2.col_volume >= 600} AND
IF table2.col_name = "b" THEN {filter these records further such that its table2.col_volume >= 550}
BUT {if any of these two statements are not met completely, do not get any of the col_date, col_hour}
*I know SQLite does not support the IF statement but this is just to demonstrate my intention.
Here's what I've been doing so far. According to this article, it is possible to transform CASE clauses into boolean logic, as you can see here:
SELECT table1.col_date, table1.col_hour FROM table1 INNER JOIN table2
ON table1.col_date = table2.col_date AND table1.col_hour = table2.col_hour AND table1.col_name = table2.col_name
WHERE
((NOT table2.col_name = 'a') OR table2.col_volume >= 600) AND
((NOT table2.col_name = 'b') OR table2.col_volume >= 550)
With this syntax, the problem is that I still get col_date/col_hour values where at least one col_name's col_volume for that specific col_date and col_hour did not meet its requirement. For example, I still get a record with col_date = 2010-12-31 and col_hour = 5, where col_name = "a" has col_volume = 200 while col_name = "b" has col_volume = 900. That date and hour should not appear in the result, because "a" has a volume that is not >= 600, even though "b" meets its requirement of >= 550.
tl;dr
If all this is confusing, here are sample tables with the correct query results, so you can skip everything above and go right ahead:
table1
col_date,col_hour,col_name,extra1,extra2
2010-12-31,4,"a","hi",1
2010-12-31,4,"a","he",1
2010-12-31,4,"a","ho",1
2010-12-31,5,"a","hi",1
2010-12-31,5,"a","he",1
2010-12-31,5,"a","ho",1
2010-12-31,6,"a","hi",1
2010-12-31,6,"a","he",1
2010-12-31,6,"a","ho",1
2010-12-31,4,"b","hi",1
2010-12-31,4,"b","he",1
2010-12-31,4,"b","ho",1
2010-12-31,5,"b","hi",1
2010-12-31,5,"b","he",1
2010-12-31,5,"b","ho",1
2010-12-31,6,"b","hi",1
2010-12-31,6,"b","he",1
2010-12-31,6,"b","ho",1
table2
col_date,col_hour,col_name,col_volume
2010-12-31,4,"a",750
2010-12-31,4,"b",750
2010-12-31,5,"a",200
2010-12-31,5,"b",900
2010-12-31,6,"a",700
2010-12-31,6,"b",800
The correct query results (with col_volume filters: 600 for 'a' and 550 for 'b') should be:
2010-12-31,4
2010-12-31,6
Try this:
SELECT DISTINCT table1.col_date,
table1.col_hour
FROM table1
INNER JOIN table2
ON table1.col_date = table2.col_date
AND table1.col_hour = table2.col_hour
AND table1.col_name = table2.col_name
WHERE EXISTS ( -- here I'm applying the filter logic
select col_date,
col_hour
from table2 sub
where ((col_name = 'a' and col_volume >= 600)
or (col_name = 'b' and col_volume >= 550))
and sub.col_date = table2.col_date
and sub.col_hour = table2.col_hour
group by col_date,
col_hour
having count(1) = 2 -- I assume there could be only two rows:
-- one for 'a' and one for 'b'
)
One last thing: you select the same columns from table1 that you use for the join, but I imagine this is just for the sake of the example.
You can try EXISTS with a correlated subquery that uses CASE for the different conditions in the WHERE clause:
select distinct t1.col_date
, t1.col_hour
from table1 t1
where exists ( select 1
from table2 t2
where t2.col_date = t1.col_date
and t2.col_hour = t1.col_hour
and t2.col_name in ('a', 'b')
group by t2.col_date, t2.col_hour
-- both names present, and every row meets its own threshold
having count(distinct t2.col_name) = 2
and min(case when t2.col_volume >= case when t2.col_name = 'a' then 600 else 550 end
then 1 else 0 end) = 1 )
Your boolean transformation is wrong.
Your IFs imply that you are looking for rows where:
table2.col_name = "a" and col_volume >= 600
table2.col_name = "b" and col_volume >= 550
(implicitly) other values for col_name
So, to translate this to SQL:
SELECT table1.col_date, table1.col_hour
FROM table1
INNER JOIN table2 ON table1.col_date = table2.col_date AND
table1.col_hour = table2.col_hour
WHERE (table2.col_name = 'a' AND table2.col_volume >= 600) OR
(table2.col_name = 'b' AND table2.col_volume >= 550) OR
(table2.col_name NOT IN ('a', 'b'))
I think I have an answer (big thanks to @mucio's HAVING clause; it looks like I have to brush up on that).
The approach is a simple subquery that the outer query joins on. It's a workaround (not really a direct answer to the problem I posted; I had to reorganize my program flow for this approach), but it got the job done.
Here's the sample code:
SELECT table1.col_date, table1.col_hour
FROM table1
INNER JOIN
(
SELECT col_date, col_hour
FROM table2
WHERE
(col_name = 'a' AND col_volume >= 600) OR
(col_name = 'b' AND col_volume >= 550)
GROUP BY col_date, col_hour
HAVING COUNT(1) = 2
) tb2
ON
table1.col_date = tb2.col_date AND
table1.col_hour = tb2.col_hour
GROUP BY table1.col_date, table1.col_hour
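The query can be checked against the sample tables from the question with Python's sqlite3 module; an ORDER BY is added only to make the output order deterministic.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE table1 (col_date TEXT, col_hour INTEGER, col_name TEXT, extra1 TEXT, extra2 INTEGER)")
con.execute("CREATE TABLE table2 (col_date TEXT, col_hour INTEGER, col_name TEXT, col_volume INTEGER)")

# Sample rows from the question: three extra1 values per (date, hour, name)
con.executemany(
    "INSERT INTO table1 VALUES (?, ?, ?, ?, 1)",
    [("2010-12-31", h, n, e) for n in ("a", "b") for h in (4, 5, 6) for e in ("hi", "he", "ho")],
)
con.executemany(
    "INSERT INTO table2 VALUES ('2010-12-31', ?, ?, ?)",
    [(4, "a", 750), (4, "b", 750), (5, "a", 200), (5, "b", 900), (6, "a", 700), (6, "b", 800)],
)

rows = con.execute("""
    SELECT table1.col_date, table1.col_hour
    FROM table1
    INNER JOIN (
        SELECT col_date, col_hour
        FROM table2
        WHERE (col_name = 'a' AND col_volume >= 600)
           OR (col_name = 'b' AND col_volume >= 550)
        GROUP BY col_date, col_hour
        HAVING COUNT(1) = 2   -- both names passed (one table2 row per name assumed)
    ) tb2
      ON table1.col_date = tb2.col_date
     AND table1.col_hour = tb2.col_hour
    GROUP BY table1.col_date, table1.col_hour
    ORDER BY table1.col_date, table1.col_hour
""").fetchall()

print(rows)  # [('2010-12-31', 4), ('2010-12-31', 6)]
```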