Squeryl Select Duplicates - sql

I would like to find overlapping data with a Squeryl query. I can do so by using the method found here with normal SQL, but can't figure out how to do so using Squeryl.
Basically, I need to convert this query, which finds non-distinct rows, to Squeryl:
SELECT *
FROM myTable L1
JOIN(
SELECT myField1,myField2
FROM myTable
GROUP BY myField1,myField2
HAVING COUNT(*) >= 2
) L2
ON L1.myField1 = L2.myField1 AND L1.myField2 = L2.myField2;
EDIT: More importantly, I need to be able to do this dynamically. I have a fairly complex dynamic query whose behavior depends on which options are passed. If an Option is defined, the corresponding clause should apply; otherwise it should be inhibited. But groupBy does not support an inhibitBy method. For a full explanation of my current method, look here.
def getAllJoined(
    hasFallback: Option[String] = None,
    showDuplicates: Option[String] = None): List[(Type1, Type2)] = transaction {
  join(mainTable,
       table2,
       table3,
       table3,
       table4.leftOuter,
       table4.leftOuter,
       table5,
       table6)((main, attr1, attr2, attr3, attr4, attr5, attr6, attr7) =>
    where(
      main.fallBack.isNotNull.inhibitWhen(!hasFallback.isDefined)
    )
    // What to do here to only find duplicates when showDuplicates.isDefined? AKA non-distinct
    select(main, attr1, attr2, attr3, attr4, attr5, attr6, attr7)
    on(
      (main.attr1Col === attr1.id),
      (main.attr2Col === attr2.id),
      (main.attr3Col === attr3.id),
      (main.attr4Col === attr4.map(_.id)),
      (main.attr5Col === attr5.map(_.id)),
      (main.attr6Col === attr6.id),
      (main.attr7Col === attr7.id)
    )
  ).toList
}

Check out this discussion on Google Groups. It looks like they fixed a bug related to an inhibited having clause in 2011, but I'm not sure why it still persists in your case. The same thread also has an example query that uses the having clause.
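For reference, here is a minimal sketch of the underlying SQL pattern from the question, run through Python's sqlite3 module rather than Squeryl. It is not a Squeryl solution: it only shows the join against a GROUP BY ... HAVING COUNT(*) >= 2 subquery being added when an optional flag is defined. The id column, the sample rows, and the find_rows helper are made up for illustration.

```python
import sqlite3

def find_rows(conn, show_duplicates=None):
    """Return all rows, or only rows whose (myField1, myField2) pair repeats."""
    sql = "SELECT L1.id, L1.myField1, L1.myField2 FROM myTable L1"
    if show_duplicates is not None:
        # Add the duplicate-detecting join only when the option is defined.
        sql += """
            JOIN (
                SELECT myField1, myField2
                FROM myTable
                GROUP BY myField1, myField2
                HAVING COUNT(*) >= 2
            ) L2
            ON L1.myField1 = L2.myField1 AND L1.myField2 = L2.myField2"""
    sql += " ORDER BY L1.id"
    return conn.execute(sql).fetchall()

# Hypothetical sample data: rows 1 and 2 share the same field pair.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE myTable (id INTEGER PRIMARY KEY, myField1 TEXT, myField2 TEXT)"
)
conn.executemany(
    "INSERT INTO myTable (id, myField1, myField2) VALUES (?, ?, ?)",
    [(1, "a", "x"), (2, "a", "x"), (3, "b", "y"), (4, "a", "z")],
)

print(find_rows(conn))                         # every row
print(find_rows(conn, show_duplicates="yes"))  # only the non-distinct rows
```

The point of the sketch is that the optional part is a whole join, not a single predicate, which is why an inhibit on one where condition does not map onto it directly.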


How to run sql queries with multiple with clauses(sub-query refactoring)?

I have a code block with 7 or 8 WITH clauses (sub-query refactoring) in its queries. I'm looking for a way to run this query, because I get 'SQL compilation error' messages when I run it in Snowflake. For example:
with valid_Cars_Stock as (
select car_id
from vw_standard.agile_car_issue_dime
where car_stock_expiration_ts is null
and car_type_name in ('hatchback')
and car_id = 1102423975
)
, car_sale_hist as (
select vw.issue_id, vw.delivery_effective_ts, bm.car_id,
lag(bm.sprint_id) over (partition by vw.issue_id order by vw.delivery_effective_ts) as previous_stock_id
from valid_Cars_Stock i
join vw_standard.agile_car_fact vw on vw.car_id = bm.car_id
left join vw_standard.agile_board_stock_bridge b on b.board_stock_bridge_dim_key = vw.issue_board_sprint_bridge_dim_key
order by vw.car_stock_expiration_ts desc
)
,
So how do I run these two queries, separately or together? I'm new to SQL as well, so any help would be appreciated.
So let's just reformat that code as it stands:
with valid_Cars_Stock as (
select
car_id
from vw_standard.agile_car_issue_dime
where car_stock_expiration_ts is null
and car_type_name in ('hatchback')
and car_id = 1102423975
), car_sale_hist as (
select
vw.issue_id,
vw.delivery_effective_ts,
bm.car_id,
lag(bm.sprint_id) over (partition by vw.issue_id order by vw.delivery_effective_ts) as previous_stock_id
from valid_Cars_Stock i
join vw_standard.agile_car_fact vw
on vw.car_id = bm.car_id
left join vw_standard.agile_board_stock_bridge b
on b.board_stock_bridge_dim_key = vw.issue_board_sprint_bridge_dim_key
order by vw.car_stock_expiration_ts desc
),
These are clearly part of a larger block of code.
As an aside on CTEs: you should 100% ignore anything anyone (including me) says about their performance. They are two things: syntactic sugar, and a way to avoid repetition, hence the name Common Table Expression. They CAN perform better than temp tables, AND they CAN perform worse than just repeating the same SQL many times in the same block. There is no one rule. Testing is the only way to find what is "fastest" for your SQL, and it can and does change as updates/releases are made. So, ignoring performance comments:
If I am trying to run a chain like this to debug it, I normally alter the point where I would like to stop, like so:
with valid_Cars_Stock as (
select
car_id
from vw_standard.agile_car_issue_dime
where car_stock_expiration_ts is null
and car_type_name in ('hatchback')
and car_id = 1102423975
)--, car_sale_hist as (
select
vw.issue_id,
vw.delivery_effective_ts,
bm.car_id,
lag(bm.sprint_id) over (partition by vw.issue_id order by vw.delivery_effective_ts) as previous_stock_id
from valid_Cars_Stock i
join vw_standard.agile_car_fact vw
on vw.car_id = bm.car_id
left join vw_standard.agile_board_stock_bridge b
on b.board_stock_bridge_dim_key = vw.issue_board_sprint_bridge_dim_key
order by vw.car_stock_expiration_ts desc
;--), NEXT_AWESOME_CTE_THAT_TOTALLY_MAKES_SENSE as (
-- .....
Now the result of car_sale_hist is returned, because we "completed" the CTE chain by not "starting another", and the ; marks the end of the statement, so nothing after it is treated as part of this SQL block.
Once that step is working nicely, remove the semicolon and the end-of-line comments, and get on with producing real value.
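If you are driving the query from code rather than from a worksheet, a variation on the same idea is to keep the CTE steps in a list and build a statement that stops at whichever step you want to inspect. This is only a sketch under assumptions: it uses SQLite through Python's sqlite3 module instead of Snowflake, the tables and columns are invented, and instead of commenting out a CTE header it keeps every step as a CTE and selects from the one being debugged.

```python
import sqlite3

# Ordered CTE steps as (name, body). Table and column names are illustrative only.
STEPS = [
    ("valid_cars", "SELECT car_id FROM cars WHERE car_type = 'hatchback'"),
    ("car_sales",
     "SELECT s.car_id, s.price FROM sales s "
     "JOIN valid_cars v ON v.car_id = s.car_id"),
    ("totals",
     "SELECT car_id, SUM(price) AS total FROM car_sales GROUP BY car_id"),
]

def query_up_to(steps, stop_at):
    """Build a statement with the CTEs up to `stop_at`, then select from it."""
    names = [name for name, _ in steps]
    idx = names.index(stop_at)  # raises ValueError for an unknown step name
    ctes = ",\n".join(f"{name} AS (\n  {body}\n)" for name, body in steps[: idx + 1])
    return f"WITH {ctes}\nSELECT * FROM {stop_at}"

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE cars (car_id INTEGER, car_type TEXT);
    CREATE TABLE sales (car_id INTEGER, price INTEGER);
    INSERT INTO cars VALUES (1, 'hatchback'), (2, 'sedan'), (3, 'hatchback');
    INSERT INTO sales VALUES (1, 100), (1, 50), (2, 70), (3, 30);
""")

# Inspect each step in turn while debugging.
for name, _ in STEPS:
    print(name, sorted(conn.execute(query_up_to(STEPS, name)).fetchall()))
```

Each step is checked in isolation with exactly the CTEs it depends on, so a compilation error points at the step that introduced it.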

PostgreSQL ALL(A) <@ ANY(B)

The objective is to solve the following use case:
a table contains many numrange[] fields. Let A be one of those fields.
we need to request rows with a parameter B of type numrange[] according to this rule: ALL(A) <@ ANY(B)
A sample request on table dt.t with B = {[1,3],[9,10]} would be:
select * from dt.t where ALL(A) <@ ANY(ARRAY[numrange(1,3),numrange(9,10)])
So it seems feasible. But the ALL operator can only be used on the right side of the condition...
After turning this around for about a day, I still haven't found a clue on how to solve this use case (without using functions, if possible).
The real use case will filter on many fields, so the solution needs to work for multiple fields in the same where clause:
select *
from dt.t
where ALL(A1) <@ ANY(ARRAY[numrange(1,3),numrange(9,10)])
and ALL(A2) <@ ANY(ARRAY[numrange(10,13),numrange(20,20)])
Found this solution:
select *
from dt.t t1
where (
select count(1)
from (
select unnest(A) a
from dt.t t2
where t2.id=t1.id
) t
where t.a <@ ANY(ARRAY['[1,3)'::numrange])) = array_length(A,1);
The idea is:
select unnest(A) a from dt.t t2 where t2.id=t1.id => gives each element of ARRAY field A
t.a <@ ANY(ARRAY['[1,3)'::numrange]) => tests whether this element is included in the parameter => the <@ ANY(B) part
(select count(1) [...]) = array_length(A,1) => checks that all elements of A are valid => the ALL(A) part of the problem
Tried it, it works, and it seems legit. The one really important thing is that B must be the minimal union of itself (there must be no numrange[] equivalent to B with fewer ranges in it).
Apart from that, it seems to work. Thank you all for your help and time.
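To make the counting logic and the "minimal union" caveat concrete, here is a small Python model of the rule. It is only a sketch: ranges are modelled as half-open (lo, hi) tuples, and the contained, all_in_any and merge helpers are names invented for this illustration, not PostgreSQL functions.

```python
def contained(a, b):
    """True if half-open range a = [lo, hi) lies entirely inside range b."""
    return b[0] <= a[0] and a[1] <= b[1]

def all_in_any(A, B):
    """Count the elements of A contained in some element of B, compare to len(A).

    Note: this mirrors count(...) = array_length(A, 1) only for non-empty A;
    an empty A is not treated specially here.
    """
    matching = sum(1 for a in A if any(contained(a, b) for b in B))
    return matching == len(A)

def merge(B):
    """Merge overlapping or adjacent ranges so that B becomes a minimal union."""
    merged = []
    for lo, hi in sorted(B):
        if merged and lo <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], hi))
        else:
            merged.append((lo, hi))
    return merged

A = [(1, 3)]
B = [(1, 2), (2, 3)]   # covers [1, 3) overall, but is not a minimal union

print(all_in_any(A, B))         # no single element of B contains (1, 3)
print(merge(B))                 # the two adjacent ranges collapse into one
print(all_in_any(A, merge(B)))  # now the element of A is contained
```

This is why B has to be merged first: the test is per element of B, so a range of A that straddles two adjacent ranges of B is rejected even though their union covers it.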

The below select statement takes a long time to run

This select statement takes a long time to run. After investigating, I found that the problem is in the subquery (stored procedure). I would appreciate your help.
SELECT DISTINCT
COKE_CHQ_NUMBER,
COKE_PAY_SUPPLIER
FROM
apps.Q_COKE_AP_CHECKS_SIGN_STATUS_V
WHERE
plan_id = 40192
AND COKE_SIGNATURE__A = 'YES'
AND COKE_SIGNATURE__B = 'YES'
AND COKE_AUDIT = 'YES'
AND COKE_CHQ_NUMBER NOT IN (SELECT DISTINCT COKE_CHQ_NUMBER_DELIVER
FROM apps.Q_COKE_AP_CHECKS_DELIVERY_ST_V
WHERE UPPER(COKE_CHQ_NUMBER_DELIVER_STATUS) <> 'DELIVERED')
AND COKE_CHQ_NUMBER NOT IN (SELECT COKE_CHQ_NUMBER_DELIVER
FROM apps.Q_COKE_AP_CHECKS_DELIVERY_ST_V)
Well there are a few issues with your SELECT statement that you should address:
First let's look at this condition:
COKE_CHQ_NUMBER NOT IN (SELECT DISTINCT COKE_CHQ_NUMBER_DELIVER
FROM apps.Q_COKE_AP_CHECKS_DELIVERY_ST_V
WHERE UPPER(COKE_CHQ_NUMBER_DELIVER_STATUS) <> 'DELIVERED')
First you select the DISTINCT cheque numbers with a not-delivered status, and then you say you don't want them. Rather than saying "I don't want the non-delivered ones", it is much more readable to say "I want the delivered ones". This is not really a performance issue, but it would make your SELECT easier to read and understand.
Second let's look at your second cheque condition:
COKE_CHQ_NUMBER NOT IN (SELECT COKE_CHQ_NUMBER_DELIVER
FROM apps.Q_COKE_AP_CHECKS_DELIVERY_ST_V)
Here you want to exclude all cheques that have an entry in Q_COKE_AP_CHECKS_DELIVERY_ST_V. This makes your first NOT IN condition (the one with DISTINCT) redundant, since every cheque number it returns would also be rejected by this second condition. I don't know whether the Oracle SQL engine is clever enough to work out this redundancy, but it could contribute to the slowness, as SELECT DISTINCT can take longer to run.
In addition to this if you don't have them already I would recommend adding the following indexes:
CREATE INDEX index_1 ON q_coke_ap_checks_sign_status_v(coke_chq_number, coke_pay_supplier);
CREATE INDEX index_2 ON q_coke_ap_checks_sign_status_v(plan_id, coke_signature__a, coke_signature__b, coke_audit);
CREATE INDEX index_3 ON q_coke_ap_checks_delivery_st_v(coke_chq_number_deliver);
I named them index_1, index_2 and index_3 to keep this easy to read; obviously that is not a good naming convention.
With these in place, your select should retrieve your data with acceptable performance. But of course it all depends on the size and distribution of your data, which is hard to judge without performing specific data analysis.
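The redundancy described above can be checked on a toy dataset. The sketch below uses SQLite through Python's sqlite3 module rather than Oracle, with two small invented tables (sign_status and delivery) standing in for the real views. The columns are declared NOT NULL so that the well-known NULL behaviour of NOT IN does not get in the way.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical stand-ins for the two views in the question.
    CREATE TABLE sign_status (chq_number INTEGER NOT NULL);
    CREATE TABLE delivery (chq_number INTEGER NOT NULL, status TEXT NOT NULL);
    INSERT INTO sign_status VALUES (1), (2), (3), (4);
    INSERT INTO delivery VALUES (1, 'Delivered'), (2, 'Pending');
""")

# Both NOT IN conditions, as in the original statement.
BOTH = """
    SELECT chq_number FROM sign_status
    WHERE chq_number NOT IN (SELECT DISTINCT chq_number FROM delivery
                             WHERE UPPER(status) <> 'DELIVERED')
      AND chq_number NOT IN (SELECT chq_number FROM delivery)
    ORDER BY chq_number
"""

# Only the second NOT IN condition.
SECOND_ONLY = """
    SELECT chq_number FROM sign_status
    WHERE chq_number NOT IN (SELECT chq_number FROM delivery)
    ORDER BY chq_number
"""

both = conn.execute(BOTH).fetchall()
second_only = conn.execute(SECOND_ONLY).fetchall()
print(both, second_only, both == second_only)
```

Any cheque number returned by the first subquery is by definition also in the delivery table, so dropping the first condition cannot change the result.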
Looking at your code, it seems you have a redundant where condition: the second NOT IN implies the first, so you could drop the first.
You could also turn your NOT IN clause into a MINUS: subtract from the main query the same query inner-joined to your NOT IN subquery.
Lastly, make sure you have a proper composite index on table
Q_COKE_AP_CHECKS_SIGN_STATUS_V
cols (plan_id, COKE_SIGNATURE__A, COKE_SIGNATURE__B, COKE_AUDIT, COKE_CHQ_NUMBER, COKE_PAY_SUPPLIER)
SELECT DISTINCT
COKE_CHQ_NUMBER,
COKE_PAY_SUPPLIER
FROM
apps.Q_COKE_AP_CHECKS_SIGN_STATUS_V
WHERE
plan_id = 40192
AND COKE_SIGNATURE__A = 'YES'
AND COKE_SIGNATURE__B = 'YES'
AND COKE_AUDIT = 'YES'
MINUS
SELECT DISTINCT
COKE_CHQ_NUMBER,
COKE_PAY_SUPPLIER
FROM apps.Q_COKE_AP_CHECKS_SIGN_STATUS_V
INNER JOIN (
SELECT COKE_CHQ_NUMBER_DELIVER
FROM apps.Q_COKE_AP_CHECKS_DELIVERY_ST_V
) T ON T.COKE_CHQ_NUMBER_DELIVER = apps.Q_COKE_AP_CHECKS_SIGN_STATUS_V.COKE_CHQ_NUMBER
WHERE
plan_id = 40192
AND COKE_SIGNATURE__A = 'YES'
AND COKE_SIGNATURE__B = 'YES'
AND COKE_AUDIT = 'YES'

SQL Column Parameter Bind Variable with either or logic

I apologize if the title is misleading.
I'm trying to avoid using two different queries. With that in mind,
I have the following sample query
SELECT COUNT (*) COUNT,
SUM (AMT) AS DED_AMT,
SUM (SURCOST),
SUM (DEALSUM),
NVL (TO_CHAR (SUM (RETAIL)), 'N/A') AS RETAIL,
MNFCID
FROM (SELECT B.ID, B.CD, A.*
FROM OUTPUTS_A A JOIN OUTPUTS_B B ON A.ID = B.ID
WHERE B.ID = :ID AND B.CD = UPPER (:CD))
GROUP BY ID;
that returns a result you can see in the first screenshot.
Notice, I'm passing two bind variables in the query, :ID, :CD. They have to go together and that's why I'm using AND operator there.
Sometimes, I have MFCID only and not :ID and :CD.
This is the logic I'm thinking about.
I would like to modify the query in such a way that I can pass MFCID as a bind variable. Let's say :mfcid is the variable I'm passing.
If I have the values for :ID and :CD handy, I will pass those values and pass nothing for :mfcid. (Nothing in a sense that I won't pass anything. This field CAN'T be null)
If I only have the value for :mfcid handy, I will pass that value and pass nothing for :ID and :CD. (Nothing in a sense that I won't pass anything. This field CAN'T be null)
Either way it should return me the same result.
I have tried putting it this way: AND B.MNFCID = NVL(:MNFCID, B.MNFCID), but it takes forever because the condition is always true if I don't pass anything.
Does this do what you want?
SELECT COUNT(*) COUNT, SUM(AMT) AS DED_AMT, SUM(SURCOST),
SUM(DEALSUM), NVL(TO_CHAR(SUM(RETAIL)), 'N/A') AS RETAIL,
MNFCID
FROM OUTPUTS_A A JOIN
OUTPUTS_B B
ON A.ID = B.ID
WHERE (B.ID = :ID OR :ID IS NULL) AND
(B.CD = UPPER(:CD) OR :CD IS NULL)
GROUP BY B.ID;
If this doesn't work -- and it might not -- you may want to use dynamic SQL so the appropriate indexes can more readily be used.
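Here is a minimal sketch of that "OR :param IS NULL" pattern, run against SQLite through Python's sqlite3 module instead of Oracle. The table contents are invented, the summarize helper is hypothetical, and the third predicate on mnfcid is an assumption added to cover the "only :mfcid is known" case from the question; it is not part of the query above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical, cut-down versions of OUTPUTS_A and OUTPUTS_B.
    CREATE TABLE outputs_b (id INTEGER, cd TEXT, mnfcid TEXT);
    CREATE TABLE outputs_a (id INTEGER, amt INTEGER);
    INSERT INTO outputs_b VALUES (1, 'AA', 'M1'), (2, 'BB', 'M2');
    INSERT INTO outputs_a VALUES (1, 10), (1, 5), (2, 7);
""")

# Each filter is skipped when its bind value is NULL (None in Python).
QUERY = """
    SELECT COUNT(*), SUM(a.amt), b.mnfcid
    FROM outputs_a a JOIN outputs_b b ON a.id = b.id
    WHERE (:id IS NULL OR b.id = :id)
      AND (:cd IS NULL OR b.cd = UPPER(:cd))
      AND (:mnfcid IS NULL OR b.mnfcid = :mnfcid)
    GROUP BY b.id, b.mnfcid
"""

def summarize(row_id=None, cd=None, mnfcid=None):
    """Run the query with whichever identifying values are available."""
    params = {"id": row_id, "cd": cd, "mnfcid": mnfcid}
    return conn.execute(QUERY, params).fetchall()

print(summarize(row_id=1, cd="aa"))  # filter by id and cd
print(summarize(mnfcid="M1"))        # filter by mnfcid only
```

Both calls select the same group, which is the "either way it should return the same result" behaviour asked for. Whether the optimizer can still use indexes with predicates written this way is a separate matter, hence the dynamic SQL suggestion.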

How to use a variable AS a where clause?

I have one where clause which I have to use multiple times. I am quite new to Oracle SQL, so please forgive my newbie mistakes :). I have read this website, but could not find the answer :(. Here's the SQL statement:
var condition varchar2(100)
exec :condition := 'column 1 = 1 AND column2 = 2, etc.'
Select a.content, b.content
from
(Select (DBMS_LOB.SUBSTR(ost_bama_vrij_veld.inhoud,3)) as content
from table_name
where category = X AND :condition
group by (DBMS_LOB.SUBSTR(ost_bama_vrij_veld.inhoud,3))
) A
,
(Select (DBMS_LOB.SUBSTR(ost_bama_vrij_veld.inhoud,100)) as content
from table_name
where category = Y AND :condition
group by (DBMS_LOB.SUBSTR(ost_bama_vrij_veld.inhoud,100))) B
GROUP BY
a.content, b.content
The content field is a CLOB field and unfortunately all values needed are in the same column. My query does not work, of course.
You can't use a bind variable for that much of a where clause, only for specific values. You could use a substitution variable if you're running this in SQL*Plus or SQL Developer (and maybe some other clients):
define condition = 'column 1 = 1 AND column2 = 2, etc.'
Select a.content, b.content
from
(Select (DBMS_LOB.SUBSTR(ost_bama_vrij_veld.inhoud,3)) as content
from table_name
where category = X AND &condition
...
From other places, including JDBC and OCI, you'd need to have the condition as a variable and build the query string using that, so it's repeated in the code that the parser sees. From PL/SQL you could use dynamic SQL to achieve the same thing. I'm not sure why just repeating the conditions is a problem though, binding arguments if values are going to change. Certainly with two clauses like this it seems a bit pointless.
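As a rough illustration of building the query string in client code, here is a sketch using Python's sqlite3 module in place of JDBC or OCI. The condition text is kept in one constant and spliced into the statement twice, while the values themselves stay as bind parameters. The my_table layout, the column names and the sample rows are invented, and SQLite's substr stands in for DBMS_LOB.SUBSTR.

```python
import sqlite3

# The shared condition lives in one place. It must be a trusted constant in
# the code, never user input; only the values are passed as bind parameters.
CONDITION = "column1 = :c1 AND column2 = :c2"

QUERY = f"""
    SELECT a.content, b.content
    FROM (SELECT DISTINCT substr(inhoud, 1, 3) AS content
          FROM my_table
          WHERE category = 'X' AND {CONDITION}) a,
         (SELECT DISTINCT substr(inhoud, 1, 100) AS content
          FROM my_table
          WHERE category = 'Y' AND {CONDITION}) b
"""

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE my_table (category TEXT, column1 INTEGER, column2 INTEGER, inhoud TEXT)"
)
conn.executemany(
    "INSERT INTO my_table VALUES (?, ?, ?, ?)",
    [
        ("X", 1, 2, "abcdef"),
        ("Y", 1, 2, "hello world"),
        ("X", 9, 9, "zzz"),  # filtered out by the shared condition
    ],
)

rows = conn.execute(QUERY, {"c1": 1, "c2": 2}).fetchall()
print(rows)
```

The parser still sees the condition written out twice; the repetition has only moved out of the source text of the query and into the code that assembles it.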
But maybe you could approach this from a different angle and remove the need to repeat the where clause. Querying the table twice might not be efficient anyway. You could apply your condition once as a subquery, but without knowing your indexes or the selectivity of the conditions this could be worse:
with sub_table as (
select category, content
from my_table
where category in (X, Y)
and column 1 = 1 AND column2 = 2, etc.
)
Select a.content, b.content
from
(Select (DBMS_LOB.SUBSTR(ost_bama_vrij_veld.inhoud,3)) as content
from sub_table
where category = X
group by (DBMS_LOB.SUBSTR(ost_bama_vrij_veld.inhoud,3))
) A
,
(Select (DBMS_LOB.SUBSTR(ost_bama_vrij_veld.inhoud,100)) as content
from sub_table
where category = Y
group by (DBMS_LOB.SUBSTR(ost_bama_vrij_veld.inhoud,100))) B
GROUP BY
a.content, b.content
I'm not sure what the grouping is for - to eliminate duplicates? This only really makes sense if you have a single X and Y record matching the other conditions, doesn't it? Maybe I'm not following it properly.
You could also use a case statement:
select max(content_x), max(content_y)
from (
select
case when category = X
then DBMS_LOB.SUBSTR(ost_bama_vrij_veld.inhoud,3) end as content_x,
case when category = Y
then DBMS_LOB.SUBSTR(ost_bama_vrij_veld.inhoud,100) end as content_y
from my_table
where category in (X, Y)
and column 1 = 1 AND column2 = 2, etc.
)
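A runnable sketch of the same case-based approach, again with SQLite through Python's sqlite3 module rather than Oracle. The my_table layout and rows are invented, substr replaces DBMS_LOB.SUBSTR, and it assumes a single X row and a single Y row match the other conditions, as discussed above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE my_table (category TEXT, column1 INTEGER, column2 INTEGER, inhoud TEXT)"
)
conn.executemany(
    "INSERT INTO my_table VALUES (?, ?, ?, ?)",
    [
        ("X", 1, 2, "abcdef"),
        ("Y", 1, 2, "hello world"),
        ("Z", 1, 2, "ignored"),  # wrong category
        ("X", 9, 9, "zzz"),      # fails the shared condition
    ],
)

# One pass over the table: the condition appears once, and each CASE yields
# NULL for rows of the other category, which MAX then ignores.
QUERY = """
    SELECT MAX(content_x), MAX(content_y)
    FROM (
        SELECT CASE WHEN category = 'X' THEN substr(inhoud, 1, 3) END AS content_x,
               CASE WHEN category = 'Y' THEN substr(inhoud, 1, 100) END AS content_y
        FROM my_table
        WHERE category IN ('X', 'Y')
          AND column1 = :c1 AND column2 = :c2
    )
"""

result = conn.execute(QUERY, {"c1": 1, "c2": 2}).fetchone()
print(result)
```

If more than one X or Y row can match, MAX silently picks one value per column, so this shape only fits the single-match case.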