Single-column row-set exists in another table or a function returns positive value - sql

I have the following table structure: http://sqlfiddle.com/#!4/952e7/1
Now I am looking for a solution for the following problem:
Given an input set of date-times (see below), the SQL statement should return the IDs of all businesses with a given business name for which every single date-time of the input set is either present in the ORDERS table or an additional function's condition is true (these two conditions are to be checked separately for each input date-time).
An example of what the input date-time set looks like:
WITH DATES_TO_CHECK(DATETIME) AS(SELECT DATE '2021-01-03' FROM DUAL UNION ALL SELECT DATE '2020-04-08' FROM DUAL UNION ALL SELECT DATE '2020-05-07' FROM DUAL)
To keep it simple, the "additional function" here is just a random number check: if the value is greater than 0.5 the condition is true, otherwise false, so the check is dbms_random.value > 0.5.
For one given date-time it would look like:
SELECT BN.NAME, BD.ID
FROM BUSINESS_DATA BD, BUSINESS_NAME BN
WHERE BD.NAME_ID=BN.ID AND
BN.NAME='B1' AND
(TO_DATE('2021-01-03', 'YYYY-MM-DD') IN (SELECT OD.ORDERDATE FROM ORDERS OD WHERE OD.BUSINESS_ID=BD.ID)
OR dbms_random.value > 0.5)
ORDER BY BD.ID
Please help me apply this solution to the input date-time row-set above AND the specified name.

I don't see any difference from the question you just deleted.
This is the list of businesses named B1 for which the number of order dates that match the input dates equals the number of input dates, or for which dbms_random.value > 0.5.
see SQL Fiddle
WITH DATES_TO_CHECK(DATETIME) AS(
SELECT DATE '2021-01-03' FROM DUAL
UNION ALL SELECT DATE '2020-04-08' FROM DUAL
UNION ALL SELECT DATE '2020-05-07' FROM DUAL
),
businesses_that_match as (
select
od.BUSINESS_ID, count(distinct OD.ORDERDATE)
from DATES_TO_CHECK dtc
left join ORDERS od on OD.ORDERDATE = dtc.datetime
group by od.BUSINESS_ID
having count(distinct OD.ORDERDATE) = (select count(distinct DATETIME) from DATES_TO_CHECK)
)
SELECT
BN.NAME, BD.ID
FROM BUSINESS_DATA BD
inner join BUSINESS_NAME BN on BD.NAME_ID=BN.ID
left join businesses_that_match btm on btm.BUSINESS_ID = bd.id
where bn.name = 'B1'
AND (btm.BUSINESS_ID is not null
OR dbms_random.value > 0.5
)
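Note that the check above evaluates dbms_random.value only once per business, while the question asks for the two conditions to be checked separately for each input date-time. If a per-date check is really needed, a hedged sketch using NOT EXISTS (table and column names taken from the question, not verified against the fiddle) could look like this:
WITH DATES_TO_CHECK(DATETIME) AS(
SELECT DATE '2021-01-03' FROM DUAL
UNION ALL SELECT DATE '2020-04-08' FROM DUAL
UNION ALL SELECT DATE '2020-05-07' FROM DUAL
)
SELECT BN.NAME, BD.ID
FROM BUSINESS_DATA BD
INNER JOIN BUSINESS_NAME BN ON BD.NAME_ID = BN.ID
WHERE BN.NAME = 'B1'
AND NOT EXISTS (
  -- a business is rejected if some input date is neither found in its ORDERS
  -- nor passed by the additional (random) check
  SELECT 1
  FROM DATES_TO_CHECK DTC
  WHERE DBMS_RANDOM.VALUE <= 0.5
  AND NOT EXISTS (
    SELECT 1 FROM ORDERS OD
    WHERE OD.BUSINESS_ID = BD.ID AND OD.ORDERDATE = DTC.DATETIME
  )
)
ORDER BY BD.ID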

Related

SQL how to count records from date

I would like to create a NUMBER column in which the records for each date are counted: for example, how many NRBs there are in 2021-10. However, when I use COUNT I get the result shown below; SUM cannot be used because these are not numbers but identification numbers.
Here is my result:
Here is my code:
PROC SQL; /* FIRST QUERY */
create table PolisyEnd as
select distinct
datepart(t1.data_danych) as DATA_DANYCH format yymmdd10.
,(t4.spr_NRB) as NRB
,datepart(t1.PRP_END_DATE) as PRP_END_DATE format yymmdd10.
,datepart(t1.PRP_END_DATE) as POLICY_VINTAGE format yymmd7.,
case
when datepart(t1.PRP_END_DATE) IS NOT NULL and datepart(t1.PRP_END_DATE) - &gv_date_dly. < 0 THEN 'WYGASLA'
when datepart(t1.PRP_END_DATE) IS NOT NULL and datepart(t1.PRP_END_DATE) - &gv_date_dly. >= 0 and datepart(t1.PRP_END_DATE) - &gv_date_dly. <=7 THEN 'UWAGA'
when datepart(t1.PRP_END_DATE) IS NOT NULL and datepart(t1.PRP_END_DATE) - &gv_date_dly. >= 30 THEN 'AKTYWNA'
when datepart(t1.PRP_END_DATE) IS NULL THEN 'BRAK INFORMACJI O POLISIE'
end as POLISA_INFORMACJA
from
cmz.WMDTZDP_BH t1
left join
(select distinct kontr_id,obj_oid from cmz.BH_D_ZAB_X_ALOK_&thismonth) t2
on t2.obj_oid = t1.obj_oid
left join
(select distinct data_danych, kontr_id, kre_nrb from dm.BH_WMDTKRE_&thismonth) t3
on t3.kontr_id = t2.kontr_id
left join
(select distinct spr_NRB, spr_STATUS from _mart.mart_kred) t4
on t4.spr_NRB = t3.kre_nrb
where datepart(t1.data_danych) between '5Aug2019'd and &gv_date_dly. and t1.Actual = "T"
and t4.spr_STATUS ="A"
; /* SECOND QUERY, BUILT FROM THE FIRST */
create table PolisyEnd1 as
select distinct
DATE_
,(POLICY_VINTAGE)
,count(NRB) as NUMBER
,POLISA_INFORMACJA
from PolisyEnd
where INFORMATION ="U"
;
Quit;
EDIT 1:
I got the result, but how do I make it so that there is one row for 2021-11 that sums up all the records for that period?
Rather than using a DISTINCT here, what you really want is a GROUP BY.
PROC SQL;
create table PolisyEnd1 as
select
DATE_
,(POLICY_VINTAGE)
,count(NRB) as NUMBER
,POLISA_INFORMACJA
from PolisyEnd
where INFORMATION ="U"
group by DATE_, (POLICY_VINTAGE), POLISA_INFORMACJA
;
Quit;
You can use GROUP BY.
If you want to count just based on the DATE_ column, here is an example:
select DATE_, count(NRB) as NUMBER
from PolisyEnd
where INFORMATION ="U"
group by DATE_
Otherwise, you can also add other columns to the GROUP BY and SELECT clauses.
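For example, to break the counts down by POLISA_INFORMACJA as well, a sketch reusing the question's own column names and filter:
select DATE_, POLISA_INFORMACJA, count(NRB) as NUMBER
from PolisyEnd
where INFORMATION ="U"
group by DATE_, POLISA_INFORMACJA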
For EDIT 1:
For each month you can use this:
select POLICY_VINTAGE, SUM(NUMBER) as NUMBER
from Your_Table
group by POLICY_VINTAGE
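If the monthly total is wanted in a single step straight from PolisyEnd rather than from the intermediate table, a hedged sketch (reusing the question's own column names and filter, which I have not verified against the real data) would be:
PROC SQL;
create table PolisyEndMonthly as
select
POLICY_VINTAGE
,count(NRB) as NUMBER /* one row per month, counting every record in that month */
from PolisyEnd
where INFORMATION ="U"
group by POLICY_VINTAGE
;
Quit;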

SQL - Group values by range

I have the following query:
SELECT
polutionmm2 AS metric,
sum(cnt) as value
FROM polutiondistributionstatistic as p inner join crates as c on p.crateid = c.id
WHERE
c.name = '154'
and to_timestamp(startts) >= '2021/01/20 00:00:00' group by polutionmm2
this query returns these values:
"metric","value"
50,580
100,8262
150,1548
200,6358
250,869
300,3780
350,505
400,2248
450,318
500,1674
550,312
600,7420
650,1304
700,2445
750,486
800,985
850,139
900,661
950,99
1000,550
I would need to edit the query so that it groups the metrics together in ranges of 100, starting from 0. Everything with a metric value between 0 and 99 should be one row, with the value being the sum of those rows, like this:
"metric","value"
0,580
100,9810
200,7227
300,4285
400,2556
500,1986
600,8724
700,2931
800,1124
900,760
1000,550
The query will run over about 500,000 rows. Can this be done in the query itself? Is it efficient?
EDIT:
There can be up to 500 ranges, so an automatic way of grouping them would be great.
You can use generate_series() and a range type to generate the ranges you want, e.g.:
select int4range(x.start, case when x.start = 1000 then null else x.start + 100 end, '[)') as range
from generate_series(0,1000,100) as x(start)
This generates the ranges [0,100), [100,200) and so on up until [1000,).
You can adjust the width and the number of ranges by using different parameters for generate_series() and by adjusting the expression that evaluates the last range.
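For example, with a width of 50 and an upper limit of 500, the same pattern with only the parameters changed produces [0,50), [50,100) and so on up until [500,):
select int4range(x.start, case when x.start = 500 then null else x.start + 50 end, '[)') as range
from generate_series(0,500,50) as x(start)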
This can be used in an outer join to aggregate the values per range:
with ranges as (
select int4range(x.start, case when x.start = 1000 then null else x.start + 100 end, '[)') as range
from generate_series(0,1000,100) as x(start)
)
select r.range as metric,
sum(t.value)
from ranges r
left join the_table t on r.range #> t.metric
group by range;
The expression r.range #> t.metric tests whether the metric value falls into the (generated) range.
Online example
You can create a pseudo table with the interval you like and join against it.
I'll use a recursive CTE for this case.
WITH RECURSIVE cte AS(
select 0 St, 99 Ed
UNION ALL
select St + 100, Ed + 100 from cte where St <= 1000
)
select cte.st as metric,sum(tb.value) as value from cte
inner join [tableName] tb --with OP query result
on tb.metric between cte.St and cte.Ed
group by cte.st
order by st
Here is a DB<>fiddle with some pseudo data.
Use conditional aggregation:
SELECT
case when polutionmm2>=0 and polutionmm2<100 then '0'
when polutionmm2>=100 and polutionmm2<200 then '100'
........
when polutionmm2>=900 and polutionmm2<1000 then '900'
when polutionmm2>=1000 then '1000'
end AS metric,
sum(cnt) as value
FROM polutiondistributionstatistic as p inner join crates as c on p.crateid = c.id
WHERE
c.name = '154'
and to_timestamp(startts) >= '2021/01/20 00:00:00'
group by case when polutionmm2>=0 and polutionmm2<100 then '0'
when polutionmm2>=100 and polutionmm2<200 then '100'
........
when polutionmm2>=900 and polutionmm2<1000 then '900'
when polutionmm2>=1000 then '1000'
end
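Since the edit mentions there can be up to 500 ranges, spelling out every CASE branch by hand gets unwieldy. Because all buckets have the same width, a hedged alternative sketch (plain PostgreSQL, reusing the question's tables and assuming polutionmm2 is an integer column; wrap it in floor() otherwise) derives each bucket's lower bound with integer division:
SELECT (polutionmm2 / 100) * 100 AS metric, -- 0, 100, 200, ... (bucket lower bound)
sum(cnt) as value
FROM polutiondistributionstatistic as p inner join crates as c on p.crateid = c.id
WHERE c.name = '154'
and to_timestamp(startts) >= '2021/01/20 00:00:00'
group by (polutionmm2 / 100) * 100
order by metric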

select statement with subqueries against two databases

I have the below code to show what I am "trying" to accomplish in a stored procedure:
select * from
(
select to_char(sum(aa.amount))
from additional_amount aa, status st
where aa.int_tran_id = st.int_tran_id
and st.stage in ('ACHPayment_Confirmed')
and aa.entry_timestamp > (
select to_date(trunc(last_day(add_months(sysdate,-1))+1), 'DD-MON-RR') AS "day 1"
from dual
)
)
UNION ALL
(
select distinct it.debit_acct as "debit_accounts"
from internal_transactions it
where it.debit_acct IN ( select texe_cnasupro
from service.kndtexe, service.kndtctc
where texe_cncclipu = tctc_cncclipu
and tctc_cntipcli = 'C'
)
)
union all
(select distinct it.credit_acct as "credit_account"
from internal_transactions it
where it.credit_acct IN (select texe_cnasupro
from service.kndtexe, service.kndtctc
where texe_cncclipu = tctc_cncclipu
and tctc_cntipcli = 'C'
)
)
;
Output:
TO_CHAR(SUM(AA.AMOUNT))
----------------------------------------
130250292.22
6710654504
0000050334
2535814905
0007049560
5 rows selected
The top row of the output is what I need as the SP's output, based on the two queries below it, which I am guessing need to become sub-queries against the top select statement.
The top select sums the amount from a table joined against another table for filtering (output: 130250292.22).
The second and third selects are actually there to check that the accounts in the internal_transactions table are signed up in the corresponding two tables in the service db, which is a different db on the same server (owned by the same application).
The tables in the "service" db do not share the same primary keys as the tables in the first select, which runs against the same database.
Thank you for your help!
I don't understand your question, but I do know you can simplify this bit:
to_date(trunc(last_day(add_months(sysdate,-1))+1), 'DD-MON-RR') AS "day 1"
to this
trunc (sysdate, 'mm')
and you don't need a SELECT from DUAL to do that either.
and aa.entry_timestamp > trunc (sysdate, 'mm')
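As a quick illustrative check, the simplified expression can be run on its own; it should return midnight on the first day of the current month:
select trunc(sysdate, 'mm') as first_of_month
from dual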

Find all intersections of all sets of ranges in PostgreSQL

I'm looking for an efficient way to find all the intersections between sets of timestamp ranges. It needs to work with PostgreSQL 9.2.
Let's say the ranges represent the times when a person is available to meet. Each person may have one or more ranges of times when they are available. I want to find all the time periods when a meeting can take place (ie. during which all people are available).
This is what I've got so far. It seems to work, but I don't think it's very efficient, since it considers one person's availability at a time.
WITH RECURSIVE td AS
(
-- Test data. Returns:
-- ["2014-01-20 00:00:00","2014-01-31 00:00:00")
-- ["2014-02-01 00:00:00","2014-02-20 00:00:00")
-- ["2014-04-15 00:00:00","2014-04-20 00:00:00")
SELECT 1 AS entity_id, '2014-01-01'::timestamp AS begin_time, '2014-01-31'::timestamp AS end_time
UNION SELECT 1, '2014-02-01', '2014-02-28'
UNION SELECT 1, '2014-04-01', '2014-04-30'
UNION SELECT 2, '2014-01-15', '2014-02-20'
UNION SELECT 2, '2014-04-15', '2014-05-05'
UNION SELECT 3, '2014-01-20', '2014-04-20'
)
, ranges AS
(
-- Convert to tsrange type
SELECT entity_id, tsrange(begin_time, end_time) AS the_range
FROM td
)
, min_max AS
(
SELECT MIN(entity_id), MAX(entity_id)
FROM td
)
, inter AS
(
-- Ranges for the lowest ID
SELECT entity_id AS last_id, the_range
FROM ranges r
WHERE r.entity_id = (SELECT min FROM min_max)
UNION ALL
-- Iteratively intersect with ranges for the next higher ID
SELECT entity_id, r.the_range * i.the_range
FROM ranges r
JOIN inter i ON r.the_range && i.the_range
WHERE r.entity_id > i.last_id
AND NOT EXISTS
(
SELECT *
FROM ranges r2
WHERE r2.entity_id < r.entity_id AND r2.entity_id > i.last_id
)
)
-- Take the final set of intersections
SELECT *
FROM inter
WHERE last_id = (SELECT max FROM min_max)
ORDER BY the_range;
I created the tsrange_interception_agg aggregate
create function tsrange_interception (
internal_state tsrange, next_data_values tsrange
) returns tsrange as $$
select internal_state * next_data_values;
$$ language sql;
create aggregate tsrange_interception_agg (tsrange) (
sfunc = tsrange_interception,
stype = tsrange,
initcond = $$[-infinity, infinity]$$
);
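As a minimal, illustrative sanity check of the aggregate (the two ranges below are made up), intersecting two overlapping ranges should return their common part:
select tsrange_interception_agg(r) as common_part
from (values
(tsrange('2014-01-01'::timestamp, '2014-01-31'::timestamp)),
(tsrange('2014-01-15'::timestamp, '2014-02-20'::timestamp))
) as v(r);
-- expected: ["2014-01-15 00:00:00","2014-01-31 00:00:00")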
Then this query
with td (id, begin_time, end_time) as
(
values
(1, '2014-01-01'::timestamp, '2014-01-31'::timestamp),
(1, '2014-02-01', '2014-02-28'),
(1, '2014-04-01', '2014-04-30'),
(2, '2014-01-15', '2014-02-20'),
(2, '2014-04-15', '2014-05-05'),
(3, '2014-01-20', '2014-04-20')
), ranges as (
select
id,
row_number() over(partition by id) as rn,
tsrange(begin_time, end_time) as tr
from td
), cr as (
select r0.tr tr0, r1.tr as tr1
from ranges r0 cross join ranges r1
where
r0.id < r1.id and
r0.tr && r1.tr and
r0.id = (select min(id) from td)
)
select tr0 * tsrange_interception_agg(tr1) as interseptions
from cr
group by tr0
having count(*) = (select count(distinct id) from td) - 1
;
interseptions
-----------------------------------------------
["2014-02-01 00:00:00","2014-02-20 00:00:00")
["2014-01-20 00:00:00","2014-01-31 00:00:00")
["2014-04-15 00:00:00","2014-04-20 00:00:00")
If you have a fixed number of entities you want to cross reference, you can use a cross join for each of them, and build the intersection (using the * operator on ranges).
Using a cross join like this is probably less efficient, though. The following example is mainly here to help explain the more complex example further below.
WITH td AS
(
SELECT 1 AS entity_id, '2014-01-01'::timestamp AS begin_time, '2014-01-31'::timestamp AS end_time
UNION SELECT 1, '2014-02-01', '2014-02-28'
UNION SELECT 1, '2014-04-01', '2014-04-30'
UNION SELECT 2, '2014-01-15', '2014-02-20'
UNION SELECT 2, '2014-04-15', '2014-05-05'
UNION SELECT 4, '2014-01-20', '2014-04-20'
)
,ranges AS
(
-- Convert to tsrange type
SELECT entity_id, tsrange(begin_time, end_time) AS the_range
FROM td
)
SELECT r1.the_range * r2.the_range * r3.the_range AS r
FROM ranges r1
CROSS JOIN ranges r2
CROSS JOIN ranges r3
WHERE r1.entity_id=1 AND r2.entity_id=2 AND r3.entity_id=4
AND NOT isempty(r1.the_range * r2.the_range * r3.the_range)
ORDER BY r
In this case a multiple cross join is probably less efficient because you don't actually need all the possible combinations of every range: isempty(r1.the_range * r2.the_range) alone is enough to make isempty(r1.the_range * r2.the_range * r3.the_range) true.
I don't think you can avoid going through each person's availability one at a time, since you want all of them to be able to meet anyway.
What may help is to build the set of intersections incrementally, by cross joining each person's availability with the previously calculated subset using another recursive CTE (intersections in the example below). You then build the intersections incrementally and get rid of the empty ranges, both stored in arrays:
WITH RECURSIVE td AS
(
SELECT 1 AS entity_id, '2014-01-01'::timestamp AS begin_time, '2014-01-31'::timestamp AS end_time
UNION SELECT 1, '2014-02-01', '2014-02-28'
UNION SELECT 1, '2014-04-01', '2014-04-30'
UNION SELECT 2, '2014-01-15', '2014-02-20'
UNION SELECT 2, '2014-04-15', '2014-05-05'
UNION SELECT 4, '2014-01-20', '2014-04-20'
)
,ranges AS
(
-- Convert to tsrange type
SELECT entity_id, tsrange(begin_time, end_time) AS the_range
FROM td
)
,ranges_arrays AS (
-- Prepare an array of all possible intervals per entity
SELECT entity_id, array_agg(the_range) AS ranges_arr
FROM ranges
GROUP BY entity_id
)
,numbered_ranges_arrays AS (
-- We'll join using pos+1 next, so we want continuous integers
-- I've changed the example entity_id from 3 to 4 to demonstrate this.
SELECT ROW_NUMBER() OVER () AS pos, entity_id, ranges_arr
FROM ranges_arrays
)
,intersections (pos, subranges) AS (
-- We start off with the infinite range.
SELECT 0::bigint, ARRAY['[,)'::tsrange]
UNION ALL
-- Then, we unnest the previous intermediate result,
-- cross join it against the array of ranges from the
-- next row in numbered_ranges_arrays (joined via pos+1).
-- We take the intersection and remove the empty array.
SELECT r.pos,
ARRAY(SELECT x * y FROM unnest(r.ranges_arr) x CROSS JOIN unnest(i.subranges) y WHERE NOT isempty(x * y))
FROM numbered_ranges_arrays r
INNER JOIN intersections i ON r.pos=i.pos+1
)
,last_intersections AS (
-- We just really want the result from the last operation (with the max pos).
SELECT subranges FROM intersections ORDER BY pos DESC LIMIT 1
)
SELECT unnest(subranges) r FROM last_intersections ORDER BY r
I'm not sure whether this is likely to perform better, unfortunately. You'd probably need a larger dataset to have meaningful benchmarks.
OK, I wrote and tested this in T-SQL, but it should run, or at least be close enough for you to translate back; it's all fairly vanilla constructs, except maybe the BETWEEN, which can be broken into a < clause and a > clause (thanks @Horse).
WITH cteSched AS ( --Schedule for everyone
-- Test data. Returns:
-- ["2014-01-20 00:00:00","2014-01-31 00:00:00")
-- ["2014-02-01 00:00:00","2014-02-20 00:00:00")
-- ["2014-04-15 00:00:00","2014-04-20 00:00:00")
SELECT 1 AS entity_id, '2014-01-01' AS begin_time, '2014-01-31' AS end_time
UNION SELECT 1, '2014-02-01', '2014-02-28'
UNION SELECT 1, '2014-04-01', '2014-04-30'
UNION SELECT 2, '2014-01-15', '2014-02-20'
UNION SELECT 2, '2014-04-15', '2014-05-05'
UNION SELECT 3, '2014-01-20', '2014-04-20'
), cteReq as ( --List of people to schedule (or is everyone in Sched required? Not clear, doesn't hurt)
SELECT 1 as entity_id UNION SELECT 2 UNION SELECT 3
), cteBegins as (
SELECT distinct begin_time FROM cteSched as T
WHERE NOT EXISTS (SELECT entity_id FROM cteReq as R
WHERE NOT EXISTS (SELECT * FROM cteSched as X
WHERE X.entity_id = R.entity_id
AND T.begin_time BETWEEN X.begin_time AND X.end_time ))
) SELECT B.begin_time, MIN(S.end_time ) as end_time
FROM cteBegins as B cross join cteSched as S
WHERE B.begin_time between S.begin_time and S.end_time
GROUP BY B.begin_time
-- NOTE: This assume users do not have schedules that overlap with themselves! That is, nothing like
-- John is available 2014-01-01 to 2014-01-15 and 2014-01-10 to 2014-01-20.
EDIT: Added the output from the above (when executed on SQL Server 2008 R2):
begin_time end_time
2014-01-20 2014-01-31
2014-02-01 2014-02-20
2014-04-15 2014-04-20

Query which gives list of dates between two date ranges

I am sorry for this, but my previous question was not properly framed, so I am creating another post.
My question is similar to the following question:
http://asktom.oracle.com/pls/asktom/f?p=100:11:0::::P11_QUESTION_ID:14582643282111
I need to write an inner query which will give a list of dates between the two date ranges to the outer query.
My inner query returns following 2 rows:
SELECT request.REQ_DATE, request.DUE_DATE FROM myTable where id = 100
REQ_DATE DUE_DATE
3/19/2013 3/21/2013
3/8/2013 3/8/2013
So I need inner query which will return following dates to outer query:
3/19/2013
3/20/2013
3/21/2013
3/8/2013
The answer in the above post has the start date and end date hard-coded, whereas in my case they come from another table. So I am trying to write a query like this, which does not work:
 
Select * from outerTable where my_date in
(
select to_date(r.REQ_DATE) + rownum -1 from all_objects,
(
SELECT REQ_DATE, DUE_DATE
FROM myTable where id = 100
) r
where rownum <= to_date(r.DUE_DATE,'dd-mon-yyyy')-to_date(r.REQ_DATE,'dd-mon-yyyy')+1;
)
with
T_from_to as (
select
trunc(REQ_DATE) as d_from,
trunc(DUE_DATE) as d_to
FROM myTable
where id = 100
),
T_seq as (
select level-1 as delta
from dual
connect by level-1 <= (select max(d_to-d_from) from T_from_to)
)
select distinct d_from + delta
from T_from_to, T_seq
where d_from + delta <= d_to
order by 1
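For completeness, a hedged sketch of how a date-expansion subquery like this might be plugged back into the outer query from the question (outerTable and my_date are the question's own placeholders; the 1000-row generator is an arbitrary cap on the longest range length in days and would need adjusting):
select o.*
from outerTable o
where o.my_date in
(
  select t.d_from + s.delta
  from (select trunc(REQ_DATE) as d_from, trunc(DUE_DATE) as d_to
        from myTable
        where id = 100) t,
       (select level - 1 as delta
        from dual
        connect by level <= 1000) s -- assumed upper bound on range length in days
  where t.d_from + s.delta <= t.d_to
)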