SQL function-like WHERE statement - sql

I've researched this problem extensively and for a long time, and nothing similar has come up so far. There's a tl;dr further down.
Here's my problem.
I'm trying to create a SELECT statement in SQLite with conditional filtering that works somewhat like a function. Sample pseudo-code below:
SELECT col_date, col_hour FROM table1 JOIN table2
ON table1.col_date = table2.col_date AND table1.col_hour = table2.col_hour AND table1.col_name = table2.col_name
WHERE
IF table2.col_name = "a" THEN {filter these records further such that its table2.col_volume >= 600} AND
IF table2.col_name = "b" THEN {filter these records further such that its table2.col_volume >= 550}
BUT {if any of these two statements are not met completely, do not get any of the col_date, col_hour}
*I know SQLite does not support the IF statement but this is just to demonstrate my intention.
Here's what I've been doing so far. According to this article, it is possible to transform CASE clauses into boolean logic, as shown here:
SELECT table1.col_date, table1.col_hour FROM table1 INNER JOIN table2
ON table1.col_date = table2.col_date AND table1.col_hour = table2.col_hour AND table1.col_name = table2.col_name
WHERE
((NOT table2.col_name = "a") OR table2.col_volume >= 600) AND
((NOT table2.col_name = "b") OR table2.col_volume >= 550)
In this syntax, the problem is that I still get col_dates and col_hours where at least one col_name's col_volume for that specific col_date and col_hour did not meet its requirement. (e.g. I still get a record entry with col_date = 2010-12-31 and col_hour = 5, but col_name = "a"'s col_volume = 200 while col_name = "b"'s col_volume = 900. This said date and hour should not appear in the query because "a" has a volume which is not >= 600, even if "b" met its volume requirement which is >= 550.)
tl;dr
If all of this is getting confusing, here are sample tables with the correct query results, so you can forget everything above and go right ahead:
table1
col_date,col_hour,col_name,extra1,extra2
2010-12-31,4,"a","hi",1
2010-12-31,4,"a","he",1
2010-12-31,4,"a","ho",1
2010-12-31,5,"a","hi",1
2010-12-31,5,"a","he",1
2010-12-31,5,"a","ho",1
2010-12-31,6,"a","hi",1
2010-12-31,6,"a","he",1
2010-12-31,6,"a","ho",1
2010-12-31,4,"b","hi",1
2010-12-31,4,"b","he",1
2010-12-31,4,"b","ho",1
2010-12-31,5,"b","hi",1
2010-12-31,5,"b","he",1
2010-12-31,5,"b","ho",1
2010-12-31,6,"b","hi",1
2010-12-31,6,"b","he",1
2010-12-31,6,"b","ho",1
table2
col_date,col_hour,col_name,col_volume
2010-12-31,4,"a",750
2010-12-31,4,"b",750
2010-12-31,5,"a",200
2010-12-31,5,"b",900
2010-12-31,6,"a",700
2010-12-31,6,"b",800
The correct query results (with col_volume filters: 600 for 'a' and 550 for 'b') should be:
2010-12-31,4
2010-12-31,6

Try this:
SELECT table1.col_date,
       table1.col_hour
FROM table1
INNER JOIN table2
   ON table1.col_date = table2.col_date
  AND table1.col_hour = table2.col_hour
  AND table1.col_name = table2.col_name
WHERE EXISTS ( -- here I'm applying the filter logic
    select col_date,
           col_hour
    from table2 sub
    -- parentheses are needed because AND binds tighter than OR
    where ((col_name = 'a' and col_volume >= 600)
        or (col_name = 'b' and col_volume >= 550))
      and sub.col_date = table2.col_date
      and sub.col_hour = table2.col_hour
      -- no correlation on col_name: both the 'a' and the 'b' row must be counted
    group by col_date,
             col_hour
    having count(1) = 2 -- I assume there could be only two rows:
                        -- one for 'a' and one for 'b'
)
You can check this demo in SQLfiddle
Last thing, you show the same columns from Table1 that you use for the join, but I imagine this is just for the sake of this example

You can try EXISTS with a correlated subquery, using CASE to pick the threshold for each name:
select distinct t1.col_date
     , t1.col_hour
from table1 t1
where exists ( select 1
               from table2 t2
               where t2.col_date = t1.col_date
                 and t2.col_hour = t1.col_hour
                 and t2.col_name in ('a', 'b')
               group by t2.col_date, t2.col_hour
               -- both names must be present and no row may fall below its threshold
               having count(distinct t2.col_name) = 2
                  and sum(case when t2.col_volume < case when t2.col_name = 'a' then 600 else 550 end then 1 else 0 end) = 0)

Your boolean transformation is wrong.
Your IFs imply that you are looking for rows that match one of the following:
table2.col_name = "a" and col_volume >= 600
table2.col_name = "b" and col_volume >= 550
(implicitly) other values for col_name
So, to translate this to SQL:
SELECT table1.col_date, table1.col_hour
FROM table1
INNER JOIN table2 ON table1.col_date = table2.col_date AND
table1.col_hour = table2.col_hour
WHERE (table2.col_name = 'a' AND table2.col_volume >= 600) OR
(table2.col_name = 'b' AND table2.col_volume >= 550) OR
(table2.col_name NOT IN ('a', 'b'))

I think I have an answer (BIG thanks to @mucio's "HAVING" clause; looks like I have to brush up on that).
The approach turned out to be a simple sub-query that the outer query joins on. It's a work-around (not really a direct answer to the problem I posted; I had to reorganize my program flow around it), but it got the job done.
Here's the sample code:
SELECT table1.col_date, table1.col_hour
FROM table1
INNER JOIN
(
SELECT col_date, col_hour
FROM table2
WHERE
(col_name = 'a' AND col_volume >= 600) OR
(col_name = 'b' AND col_volume >= 550)
GROUP BY col_date, col_hour
HAVING COUNT(1) = 2
) tb2
ON
table1.col_date = tb2.col_date AND
table1.col_hour = tb2.col_hour
GROUP BY table1.col_date, table1.col_hour
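For anyone who wants to check this against the sample tables from the question, here is a minimal sketch using Python's built-in sqlite3 module. It loads the sample rows into an in-memory database, then runs both the row-wise boolean rewrite from the question and the grouped sub-query above. The variable names and the added ORDER BY clauses are my own, only there to make the output deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table1 (col_date TEXT, col_hour INTEGER, col_name TEXT, extra1 TEXT, extra2 INTEGER);
CREATE TABLE table2 (col_date TEXT, col_hour INTEGER, col_name TEXT, col_volume INTEGER);
""")

# Sample rows from the question.
table1_rows = [
    ("2010-12-31", hour, name, extra, 1)
    for name in ("a", "b")
    for hour in (4, 5, 6)
    for extra in ("hi", "he", "ho")
]
table2_rows = [
    ("2010-12-31", 4, "a", 750),
    ("2010-12-31", 4, "b", 750),
    ("2010-12-31", 5, "a", 200),
    ("2010-12-31", 5, "b", 900),
    ("2010-12-31", 6, "a", 700),
    ("2010-12-31", 6, "b", 800),
]
conn.executemany("INSERT INTO table1 VALUES (?, ?, ?, ?, ?)", table1_rows)
conn.executemany("INSERT INTO table2 VALUES (?, ?, ?, ?)", table2_rows)

# Row-wise filter: each joined row is judged on its own, so hour 5 survives
# through its 'b' row even though its 'a' row fails.
ROW_WISE = """
SELECT DISTINCT table1.col_date, table1.col_hour
FROM table1
INNER JOIN table2
   ON table1.col_date = table2.col_date
  AND table1.col_hour = table2.col_hour
  AND table1.col_name = table2.col_name
WHERE ((NOT table2.col_name = 'a') OR table2.col_volume >= 600)
  AND ((NOT table2.col_name = 'b') OR table2.col_volume >= 550)
ORDER BY table1.col_date, table1.col_hour
"""

# Group-wise filter: a (date, hour) pair qualifies only if both names pass.
GROUPED = """
SELECT table1.col_date, table1.col_hour
FROM table1
INNER JOIN (
    SELECT col_date, col_hour
    FROM table2
    WHERE (col_name = 'a' AND col_volume >= 600)
       OR (col_name = 'b' AND col_volume >= 550)
    GROUP BY col_date, col_hour
    HAVING COUNT(1) = 2
) tb2
   ON table1.col_date = tb2.col_date
  AND table1.col_hour = tb2.col_hour
GROUP BY table1.col_date, table1.col_hour
ORDER BY table1.col_date, table1.col_hour
"""

row_wise_rows = conn.execute(ROW_WISE).fetchall()
grouped_rows = conn.execute(GROUPED).fetchall()
print("row-wise:", row_wise_rows)
print("grouped: ", grouped_rows)
```

The row-wise version still returns hour 5, while the grouped version returns only hours 4 and 6. Note that HAVING COUNT(1) = 2 relies on table2 holding at most one row per (col_date, col_hour, col_name), which is true for the sample data but is an assumption in general.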

Related

Using a CASE WHEN statement and an IN (SELECT...FROM) subquery

I'm trying to create a temp table and build out different CASE WHEN logic for two different medications. In short I have two columns of interest for these CASE WHEN statements; procedure_code and ndc_code. There are only 3 procedure codes that I need, but there are about 20 different ndc codes. I created a temp.ndcdrug1 temp table with these ndc codes for medication1 and temp.ndcdrug2 for the ndc codes for medication2 instead of listing out each ndc code individually. My query looks like this:
CREATE TABLE temp.flags AS
SELECT DISTINCT a.userid,
CASE WHEN (procedure_code = 'J7170' OR ndc_code in (select ndc_code from temp.ndcdrug1)) THEN 'Y' ELSE 'N' END AS Drug1,
CASE WHEN (procedure_code = 'J7205' OR procedure_code = 'C9136' OR ndc_code in (select ndc_code from temp.ndcdrug2)) THEN 'Y' ELSE 'N' END AS Drug2,
CASE WHEN (procedure_code = 'J7170' AND procedure_code = 'J7205') THEN 'Y' ELSE 'N' END AS Both
FROM table1 a
LEFT JOIN table2 b
ON a.userid = b.userid
WHERE...
AND...
When I run this, it returns: org.apache.spark.sql.AnalysisException: IN/EXISTS predicate sub-queries can only be used in a Filter.
I could list these ndc_code values out individually, but there are a lot of them, so I wanted a more efficient way of going about this. Is there a way to use a sub-select like this when writing out CASE WHENs?
Try a correlated scalar subquery in place of the IN predicate:
CREATE TABLE temp.flags AS
SELECT DISTINCT a.userid,
CASE WHEN (
procedure_code = 'J7170' OR
(select min('1') from temp.ndcdrug1 m where m.ndc_code = a.ndc_code) = '1'
) THEN 'Y' ELSE 'N' END AS Drug1,
CASE WHEN (
procedure_code = 'J7205' OR
procedure_code = 'C9136' OR
(select min('1') from temp.ndcdrug2 m where m.ndc_code = a.ndc_code) = '1'
) THEN 'Y' ELSE 'N' END AS Drug2,
CASE WHEN (procedure_code = 'J7170' AND procedure_code = 'J7205')
THEN 'Y' ELSE 'N' END AS Both
FROM table1 a
LEFT JOIN table2 b
ON a.userid = b.userid
WHERE...
AND...
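Another shape that is often used when a subquery is not allowed inside CASE is to LEFT JOIN the lookup tables and test whether the join found a match. Below is a rough sketch of that idea. It runs on SQLite through Python's sqlite3 module, not on Spark, so it only illustrates the structure of the rewrite. The claims table, its rows, and the NDC values are all made up for the example, and computing the flags per user with MAX is my assumption about the intent: as written in the question, a single row can never have procedure_code equal to both 'J7170' and 'J7205', so the Both flag would always be 'N'.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical stand-ins for the joined source tables and the two lookup tables.
CREATE TABLE claims (userid INTEGER, procedure_code TEXT, ndc_code TEXT);
CREATE TABLE ndcdrug1 (ndc_code TEXT PRIMARY KEY);
CREATE TABLE ndcdrug2 (ndc_code TEXT PRIMARY KEY);
INSERT INTO ndcdrug1 VALUES ('N100'), ('N101');
INSERT INTO ndcdrug2 VALUES ('N200');
INSERT INTO claims VALUES
    (1, 'J7170', NULL),
    (1, NULL,    'N200'),
    (2, NULL,    'N101'),
    (3, 'J7205', NULL),
    (4, 'X0000', 'N999');
""")

# The lookup tables are joined instead of being queried inside CASE.
# A non-NULL d1.ndc_code / d2.ndc_code means the row's NDC code was found.
# MAX works as "any row is 'Y'" because 'Y' sorts after 'N'.
FLAGS_QUERY = """
SELECT userid, drug1, drug2,
       CASE WHEN drug1 = 'Y' AND drug2 = 'Y' THEN 'Y' ELSE 'N' END AS both_drugs
FROM (
    SELECT c.userid,
           MAX(CASE WHEN c.procedure_code = 'J7170'
                      OR d1.ndc_code IS NOT NULL THEN 'Y' ELSE 'N' END) AS drug1,
           MAX(CASE WHEN c.procedure_code IN ('J7205', 'C9136')
                      OR d2.ndc_code IS NOT NULL THEN 'Y' ELSE 'N' END) AS drug2
    FROM claims c
    LEFT JOIN ndcdrug1 d1 ON d1.ndc_code = c.ndc_code
    LEFT JOIN ndcdrug2 d2 ON d2.ndc_code = c.ndc_code
    GROUP BY c.userid
) per_user
ORDER BY userid
"""

flag_rows = conn.execute(FLAGS_QUERY).fetchall()
for row in flag_rows:
    print(row)
```

The lookup tables need unique ndc_code values for this to be safe; duplicates would multiply the joined rows (harmless for these flags, but not for counts or sums).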

Long Running Query - Recommendations to improve performance in Redshift

SELECT
A.load,
A.sender,
A.latlong,
COUNT(distinct B.load) as load_count,
COUNT(distinct B.sender) as sender_count
FROM TABLE_A A
JOIN TABLE_B B ON
A.sender <> B.sender AND
(
A.latlong = B.latlong
or
(
lower(A.address_line1) = lower(B.address_line1)
and lower(A.city) = lower(B.city)
and lower(A.state) = lower(B.state)
and lower(A.country) = lower(B.country)
)
)
GROUP BY A.load, A.sender, A.latlong ;
I am trying to run a query like the sample above. It runs far longer than expected (approximately 2 hours). I tried splitting the query and combining the parts with UNION, but the result sets do not match.
Can you please help with options to improve this query's performance, or alternative ways to achieve this in AWS?
Approximately 1.5 million records
I would suggest removing the lower() calls and sanitizing the data to lowercase beforehand:
select
A.load, A.sender, A.latlong,
count(distinct B.load) as load_count,
count(distinct B.sender) as sender_count
from
TABLE_A A
join
TABLE_B B
on
A.sender <> B.sender and
(
A.latlong = B.latlong
or
(
A.address_line1 = B.address_line1
and A.city = B.city
and A.state = B.state
and A.country = B.country
))
group by
A.load, A.sender, A.latlong ;
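On the UNION attempt mentioned in the question: one possible reason for mismatched results is aggregating each branch separately and then combining the counts, which double-counts rows that match on both latlong and address. A sketch of a split that should stay equivalent is to UNION the matched pairs first and aggregate afterwards. The example below checks that on a tiny made-up data set using SQLite via Python's sqlite3 module. The rows are invented, the load column is renamed load_id here, and the data is assumed to be already lowercased. Whether this is actually faster on Redshift is not something this sketch can show; the hope is that two plain equality joins are easier for the planner than a single join with OR.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table_a (load_id TEXT, sender TEXT, latlong TEXT,
                      address_line1 TEXT, city TEXT, state TEXT, country TEXT);
CREATE TABLE table_b (load_id TEXT, sender TEXT, latlong TEXT,
                      address_line1 TEXT, city TEXT, state TEXT, country TEXT);
-- Invented rows, already lowercased.
INSERT INTO table_a VALUES
    ('L1', 'S1', '1,1', '1 main st', 'austin', 'tx', 'us'),
    ('L2', 'S2', '2,2', '2 oak ave', 'dallas', 'tx', 'us');
INSERT INTO table_b VALUES
    ('L10', 'S2', '1,1', '9 elm st',  'austin',  'tx', 'us'),
    ('L11', 'S3', '9,9', '1 main st', 'austin',  'tx', 'us'),
    ('L12', 'S3', '1,1', '1 main st', 'austin',  'tx', 'us'),
    ('L13', 'S1', '1,1', '1 main st', 'austin',  'tx', 'us'),
    ('L14', 'S1', '2,2', '5 pine rd', 'houston', 'tx', 'us');
""")

# Original shape: one join with an OR condition.
OR_JOIN = """
SELECT a.load_id, a.sender, a.latlong,
       COUNT(DISTINCT b.load_id) AS load_count,
       COUNT(DISTINCT b.sender)  AS sender_count
FROM table_a a
JOIN table_b b
  ON a.sender <> b.sender
 AND (a.latlong = b.latlong
      OR (a.address_line1 = b.address_line1 AND a.city = b.city
          AND a.state = b.state AND a.country = b.country))
GROUP BY a.load_id, a.sender, a.latlong
ORDER BY a.load_id
"""

# Split shape: collect matched pairs per branch, UNION them, then aggregate once.
UNION_JOIN = """
SELECT load_id, sender, latlong,
       COUNT(DISTINCT b_load_id) AS load_count,
       COUNT(DISTINCT b_sender)  AS sender_count
FROM (
    SELECT a.load_id, a.sender, a.latlong,
           b.load_id AS b_load_id, b.sender AS b_sender
    FROM table_a a
    JOIN table_b b
      ON a.latlong = b.latlong AND a.sender <> b.sender
    UNION
    SELECT a.load_id, a.sender, a.latlong, b.load_id, b.sender
    FROM table_a a
    JOIN table_b b
      ON a.address_line1 = b.address_line1 AND a.city = b.city
     AND a.state = b.state AND a.country = b.country
     AND a.sender <> b.sender
) matched
GROUP BY load_id, sender, latlong
ORDER BY load_id
"""

or_rows = conn.execute(OR_JOIN).fetchall()
union_rows = conn.execute(UNION_JOIN).fetchall()
print(or_rows)
print(union_rows)
```

Row L12 matches L1 on both latlong and address, which is exactly the case where per-branch counts would disagree with the OR version.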

Can anyone spot the error with the WITH() clause?

SQL is not recognizing the variable I am calling in the main SELECT statement. The variable I am calling is the execs.block_order_quantity. I name this variable in the WITH statement at the beginning of the query. The code is below, and the error is attached as a picture. I have run the select statement without the WITH execs as( piece and it works just fine.
WITH execs as(SELECT th.exec_id, SUM(eha.tck_ord_qty) as "BLOCK_ORDER_QUANTITY"
FROM t1 eha
join t2 th
on eha.exec_id = th.exec_id
where th.trdng_desk_sname IN ('NAME')
and th.trd_dt between to_date('20140101', 'YYYYMMDD')
and to_date ('20140301', 'YYYYMMDD')
and exists
(SELECT 1
FROM t2t eth
WHERE eth.TRD_ID = eha.TRD_ID
AND eth.trd_stat_cd = 'EX')
group by th.exec_id)
SELECT DISTINCT
th.trd_dt as "TRADE_DATE",
eah.ord_cap_qty as "CAP_AMOUNT",
execs.block_order_quantity as "BLOCK_ORDER_QUANTITY",
eah.alloc_ovrrd_rsn_cd as "ALLOC_OVRRD_RSN_CD",
CASE --create allocation case id
WHEN(eh.manl_alloc_ind = 'Y'
OR NVL (eah.trdr_drct_qty, 0) > 0
OR NVL (eah.trdr_drct_wgt_ratio, 0) > 0)
THEN
'Y'
ELSE
'N'
END
AS "ALLOCATION_OVRRD_IND",
CASE
WHEN (eh.manl_alloc_ind = 'Y'
OR NVL (eah.trdr_drct_qty, 0) > 0
OR NVL (eah.trdr_drct_wgt_ratio, 0) > 0)
THEN
TH.EXEC_TMSTMP
ELSE
NULL
END
AS "ALLOCATION_OVRRD_TIMESTAMP",
eah.alloc_adj_assets_rt_curr_amt as "FUND_ADJ_NET_ASSETS",
eah.as_alloc_exec_hld_qty as "FUND_HOLDINGS_QUANTITY",
th.as_trd_iv_lname as "SECURITY_NAME",
th.as_trd_fmr_ticker_symb_cd as "TICKER",
CASE
WHEN NVL(th.limit_prc_amt, 0) > 0 THEN 'LIMIT' ELSE NULL END
AS "FUND_ORDER_TYPE"
from t1 eah
join t3 eh
on eah.exec_id = eah.exec_id
join t2 th
on th.trd_id = eah.trd_id
join t4 tk
on tk.tck_id = eah.tck_id
join t5 pm
on eah.pm_person_id_src = pm.person_id_src
where th.trdng_desk_sname IN('NAME')
and th.trd_dt between to_date('20140101', 'YYYYMMDD')
and to_date ('20140301', 'YYYYMMDD')
and rownum < 15
You need to join to the execs common table expression (CTE) name in your main query, e.g.:
...
from t1 eah
join t3 eh
on eah.exec_id = eah.exec_id
join t2 th
on th.trd_id = eah.trd_id
join execs
on execs.exec_id = eh.exec_id
join t4 tk
...
I'm not sure you actually want a CTE here, it looks like you could just do the aggregation in the main query as you're referencing the same tables; but there may be something I'm missing, like duplicates introduced by later joins.
Incidentally, the first on clause there looks wrong, as both sides refer to the same column in the same table; so it should probably be:
...
from t1 eah
join t3 eh
on eh.exec_id = eah.exec_id
join t2 th
Having distinct is sometimes a sign that the joins aren't right and there are duplicates you want to suppress which shouldn't really be there, so you may not need it once the join condition is fixed; and that might allow a simple aggregate to work too if it was giving the wrong result before. (Or there could still be some other reason it's not appropriate.)
Also, the where rownum < 15 will give an indeterminate set of rows as you aren't ordering the result set before applying that filter.
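To see the main point in isolation (a CTE is only visible to the main query once it appears in the FROM clause or a join), here is a small sketch. It uses SQLite through Python's sqlite3 module as a stand-in, so the error text will differ from Oracle's, and the allocations table and its rows are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Invented stand-in for the table holding tck_ord_qty per execution.
CREATE TABLE allocations (exec_id INTEGER, tck_ord_qty INTEGER);
INSERT INTO allocations VALUES (1, 100), (1, 50), (2, 30);
""")

CTE = """
WITH execs AS (
    SELECT exec_id, SUM(tck_ord_qty) AS block_order_quantity
    FROM allocations
    GROUP BY exec_id
)
"""

# Defining the CTE is not enough: referencing execs.* without joining fails.
try:
    conn.execute(CTE + "SELECT a.exec_id, execs.block_order_quantity FROM allocations a")
    unjoined_error = None
except sqlite3.OperationalError as exc:
    unjoined_error = str(exc)

# Once execs is joined, its columns resolve like those of any other table.
joined_rows = conn.execute(CTE + """
SELECT DISTINCT a.exec_id, execs.block_order_quantity
FROM allocations a
JOIN execs ON execs.exec_id = a.exec_id
ORDER BY a.exec_id
""").fetchall()

print("without join:", unjoined_error)
print("with join:   ", joined_rows)
```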

Should a subquery on a join use tables from an outer query in the where clause?

I need to add a subquery to a join, because one payment can have more than one allotment, so I only need to account for the first match (where rownum = 1).
However, I'm not sure if adding pmt from the outer query to the subquery on the allotment join is best.
Should I be doing this differently to avoid performance hits, etc.?
SELECT
pmt.payment_uid,
alt.allotment_uid
FROM
payment pmt
/* HERE: is the reference to pmt.pay_key and pmt.client_id
incorrect in the below subquery? */
INNER JOIN allotment alt ON alt.allotment_uid = (
SELECT
allotment_uid
FROM
allotment
WHERE
pay_key = pmt.pay_key
AND
pay_code = 'xyz'
AND
deleted = 'N'
AND
client_id = pmt.client_id
AND
ROWNUM = 1
)
WHERE
pmt.deleted = 'N'
AND
pmt.date_paid >= TO_DATE('2017-07-01')
AND
pmt.date_paid < TO_DATE('2017-10-01') + 1;
It's difficult to identify the performance issue in your query without seeing the explain plan output. Your query does seem to do an additional SELECT on the allotment table for every record from the main query.
Here is a version which doesn't use a correlated subquery. Obviously I haven't been able to test it. It does a simple join and then filters out all but one allotment per payment. Hope this helps.
WITH v_payment
AS
(
SELECT
pmt.payment_uid,
alt.allotment_uid,
ROW_NUMBER() OVER (PARTITION BY pmt.payment_uid ORDER BY alt.allotment_uid) r_num
FROM
payment pmt JOIN allotment alt
ON (pmt.pay_key = alt.pay_key AND
pmt.client_id = alt.client_id)
WHERE pmt.deleted = 'N' AND
pmt.date_paid >= TO_DATE('2017-07-01') AND
pmt.date_paid < TO_DATE('2017-10-01') + 1 AND
alt.pay_code = 'xyz' AND
alt.deleted = 'N'
)
SELECT payment_uid,
allotment_uid
FROM v_payment
WHERE r_num = 1;
Let us know how this performs!
You can phrase the query that way. I would be more likely to do:
SELECT . . .
FROM payment p INNER JOIN
(SELECT a.*,
ROW_NUMBER() OVER (PARTITION BY pay_key, client_id
ORDER BY allotment_uid
) as seqnum
FROM allotment a
WHERE pay_code = 'xyz' AND deleted = 'N'
) a
ON a.pay_key = p.pay_key AND a.client_id = p.client_id AND
seqnum = 1
WHERE p.deleted = 'N' AND
p.date_paid >= DATE '2017-07-01' AND
p.date_paid < (DATE '2017-10-01') + 1;
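A quick way to sanity-check the ROW_NUMBER() approach is to run it on a few made-up rows. The sketch below does that with SQLite via Python's sqlite3 module (window functions need SQLite 3.25 or newer). The table contents are invented, the date_paid filter is left out to keep it short, and "first" allotment is taken to mean the lowest allotment_uid, which is an assumption since the original ROWNUM = 1 picks an arbitrary row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE payment (payment_uid INTEGER, pay_key TEXT, client_id INTEGER, deleted TEXT);
CREATE TABLE allotment (allotment_uid INTEGER, pay_key TEXT, client_id INTEGER,
                        pay_code TEXT, deleted TEXT);
-- Invented rows: payment 1 has two live 'xyz' allotments, payment 3 is deleted.
INSERT INTO payment VALUES (1, 'k1', 10, 'N'), (2, 'k2', 10, 'N'), (3, 'k3', 11, 'Y');
INSERT INTO allotment VALUES
    (99,  'k1', 10, 'xyz', 'Y'),
    (100, 'k1', 10, 'xyz', 'N'),
    (101, 'k1', 10, 'xyz', 'N'),
    (102, 'k2', 10, 'abc', 'N'),
    (103, 'k2', 10, 'xyz', 'N'),
    (104, 'k3', 11, 'xyz', 'N');
""")

# Number the qualifying allotments within each (pay_key, client_id) group,
# then keep only the first one when joining back to payment.
FIRST_ALLOTMENT = """
SELECT p.payment_uid, a.allotment_uid
FROM payment p
INNER JOIN (
    SELECT al.*,
           ROW_NUMBER() OVER (PARTITION BY pay_key, client_id
                              ORDER BY allotment_uid) AS seqnum
    FROM allotment al
    WHERE pay_code = 'xyz' AND deleted = 'N'
) a
   ON a.pay_key = p.pay_key
  AND a.client_id = p.client_id
  AND a.seqnum = 1
WHERE p.deleted = 'N'
ORDER BY p.payment_uid
"""

first_allotments = conn.execute(FIRST_ALLOTMENT).fetchall()
print(first_allotments)
```

Each live payment comes back exactly once, paired with its lowest-numbered live 'xyz' allotment.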

Oracle SubQuery (Display subquery columns)

I cannot seem to find via searching how I would be able to return some of the columns from the subqueries below, specifically B.TAP_STAT_HSL/C.TAP_STAT_HSL. I'm not sure if I should be joining instead, but any help would be greatly appreciated.
SELECT
A.HSE_KEY_HSE AS HOUSEKEY,
A.DROP_STAT_HSE AS DROPSTATUS,
A.TAP_STAT_HSL AS ITAPSTAT
FROM OPS$SEA.HSE_BASE,OPS$SEA.HSL_LOB,OPS$SEA.OOR_ORDER_OPEN A
WHERE A.HSE_KEY_HSE = A.HSE_KEY_HSL
AND A.HSE_KEY_HSL = A.HSE_KEY_OOR
AND A.DROP_STAT_HSE = '1'
AND A.LOB_IND_HSL = 'I'
AND A.TAP_STAT_HSL IN ('0','2')
AND A.ORD_STAT_OOR <> 'O'
AND EXISTS (SELECT 1
FROM OPS$SEA.HSE_BASE B,OPS$SEA.HSL_LOB B, OPS$SEA.OOR_ORDER_OPEN B
WHERE A.HSE_KEY_HSE = B.HSE_KEY_HSE
AND B.HSE_KEY_HSE = B.HSE_KEY_HSL
AND B.HSE_KEY_HSL = B.HSE_KEY_OOR
AND B.DROP_STAT_HSE = '1'
AND B.LOB_IND_HSL = 'C'
AND B.TAP_STAT_HSL IN ('0','2')
AND B.ORD_STAT_OOR <> 'O')
AND EXISTS (
SELECT 1
FROM OPS$SEA.HSE_BASE C,OPS$SEA.HSL_LOB C, OPS$SEA.OOR_ORDER_OPEN C
WHERE A.HSE_KEY_HSE = C.HSE_KEY_HSE
AND C.HSE_KEY_HSE = C.HSE_KEY_HSL
AND C.HSE_KEY_HSL = C.HSE_KEY_OOR
AND C.DROP_STAT_HSE = '1'
AND C.LOB_IND_HSL = 'T'
AND C.TAP_STAT_HSL IN ('0','2')
AND C.ORD_STAT_OOR <> 'O')
Hmmm....
I believe your query can be re-written as follows:
WITH Allowed_Rows (houseKey, dropStatus, ipApStat, indicator)
  as (SELECT a.HSE_KEY_HSE, a.DROP_STAT_HSE,
             b.TAP_STAT_HSL, b.LOB_IND_HSL
      FROM OPS$SEA.HSE_BASE a
      JOIN OPS$SEA.HSL_LOB b
        ON b.HSE_KEY_HSL = a.HSE_KEY_HSE
       AND b.LOB_IND_HSL IN ('I', 'C', 'T')
       AND b.TAP_STAT_HSL IN ('0', '2')
      JOIN OPS$SEA.OOR_Order_Open c
        ON c.HSE_KEY_OOR = a.HSE_KEY_HSE
       AND c.ORD_STAT_OOR <> 'O'
      WHERE a.DROP_STAT_HSE = '1')
SELECT houseKey, dropStatus, ipApStat
FROM Allowed_Rows a
WHERE a.indicator = 'I'
  AND EXISTS (SELECT '1'
              FROM Allowed_Rows b
              WHERE b.houseKey = a.houseKey
                AND b.indicator = 'C')
  AND EXISTS (SELECT '1'
              FROM Allowed_Rows b
              WHERE b.houseKey = a.houseKey
                AND b.indicator = 'T')
You didn't properly qualify some of your tables, and used the same alias for multiple tables (which I'm surprised didn't generate a syntax error), so I had to make my best guess as to where things actually belong. There are a couple of other variations possible, depending on the other (unlisted) requirements and constraints.
And why and how do you need to return the 'other' values of TAP_STAT_HSL? Do you need all possible combinations? The value of the row for B or C instead of A? What?