BQ command line: Query too large - google-bigquery

I am running a query from the command line to store the result in a new table. The query contains several subqueries, each accessing multiple tables via TABLE_DATE_RANGE.
For each table stub there is one table per day. There are 4 subqueries, each accessing 180 tables (90 days in each of two TABLE_DATE_RANGE calls). This amounts to 720 tables total, so I should not be hitting the 1,000-table limit.
I have hit the 1,000-table limit before and got an error along the lines of "too many tables".
This query, however, fails with the error "Query too large". As you can see below, I do allow large results. Does anyone know a solution to this?
bq query -n0 --allow_large_results --replace --destination_table="cdate-prod:crm_adhoc.tmp_email_details_event_date" 'select event_date
,contact_id
,message_name
,message_name_join
,message_id
,email
,REGEXP_EXTRACT(email,r'([^#]*$)') as email_domain
,REGEXP_EXTRACT(REGEXP_EXTRACT(email,r'([^#]*$)'),r'(^[^\.]*)') as email_provider
,sent
,sent_unique_hlp
,open
,open_unique_hlp
,open_unique_msg_hlp
,click
,click_unique_hlp
,click_unique_msg_hlp
,soft_bounce
,medium_bounce
,hard_bounce
,activity
,type
,case when type = 1 then 'PPM'
when type = 2 then 'NPM'
when type = 3 then 'PENDING'
when type = 4 then 'CB'
when type = 5 then 'REDEBIT'
when type = 6 then 'INTCO'
when type = 7 then 'EXTCO'
else 'XX'
end as type_str
from
(select send_date as event_date
,contact_id
,message_name
,substr(message_name,7) as message_name_join
,message_id
,email
, 1 as sent
, contact_id as sent_unique_hlp
, 0 as open
, string('') as open_unique_hlp
, string('') as open_unique_msg_hlp
, 0 as click
, string('') as click_unique_hlp
, string('') as click_unique_msg_hlp
, 0 as soft_bounce
, 0 as medium_bounce
, 0 as hard_bounce
, IFNULL(activity,0) as activity
, IFNULL(type,0) as type
from TABLE_DATE_RANGE(crm_data.campaign_messages,date_add(CURRENT_DATE(),-90,"day"),date_add(CURRENT_DATE(),-1,"day")),
TABLE_DATE_RANGE(crm_data.interface_messages,date_add(CURRENT_DATE(),-90,"day"),date_add(CURRENT_DATE(),-1,"day"))) ms,
(select open_date as event_date
,contact_id
,message_name
,substr(message_name,7) as message_name_join
,message_id
,email
, 0 as sent
, string('')as sent_unique_hlp
, 1 as open
, contact_id as open_unique_hlp
, concat(contact_id,string(TIMESTAMP_TO_MSEC(send_date))) open_unique_msg_hlp
, 0 as click
, string('') as click_unique_hlp
, string('') as click_unique_msg_hlp
, 0 as soft_bounce
, 0 as medium_bounce
, 0 as hard_bounce
, IFNULL(activity,0) as activity
, IFNULL(type,0) as type
from TABLE_DATE_RANGE(crm_data.interface_openings,date_add(CURRENT_DATE(),-90,"day"),date_add(CURRENT_DATE(),-1,"day")),
TABLE_DATE_RANGE(crm_data.campaign_openings,date_add(CURRENT_DATE(),-90,"day"),date_add(CURRENT_DATE(),-1,"day"))) op,
(select click_date as event_date
,contact_id
,message_name
,substr(message_name,7) as message_name_join
,message_id
,email
, 0 as sent
, string('')as sent_unique_hlp
, 0 as open
, string('') as open_unique_hlp
, string('') as open_unique_msg_hlp
, 1 as click
, contact_id as click_unique_hlp
, concat(contact_id,string(TIMESTAMP_TO_MSEC(send_date))) click_unique_msg_hlp
, 0 as soft_bounce
, 0 as medium_bounce
, 0 as hard_bounce
, IFNULL(activity,0) as activity
, IFNULL(type,0) as type
from TABLE_DATE_RANGE(crm_data.interface_clicks,date_add(CURRENT_DATE(),-90,"day"),date_add(CURRENT_DATE(),-1,"day")),
TABLE_DATE_RANGE(crm_data.campaign_clicks,date_add(CURRENT_DATE(),-90,"day"),date_add(CURRENT_DATE(),-1,"day"))) cl,
(select bounce_date as event_date
,contact_id
,message_name
,substr(message_name,7) as message_name_join
,message_id
,email
, 0 as sent
, string('')as sent_unique_hlp
, 0 as open
, string('') as open_unique_hlp
, string('') as open_unique_msg_hlp
, 0 as click
, string('') as click_unique_hlp
, string('') as click_unique_msg_hlp
,case when bounce_category = 1 then 1 end soft_bounce
,case when bounce_category = 2 then 1 end medium_bounce
,case when bounce_category in (3,4,5) then 1 end hard_bounce
, IFNULL(activity,0) as activity
, IFNULL(type,0) as type
from TABLE_DATE_RANGE(crm_data.interface_bounces,date_add(CURRENT_DATE(),-90,"day"),date_add(CURRENT_DATE(),-1,"day")),
TABLE_DATE_RANGE(crm_data.campaign_bounces,date_add(CURRENT_DATE(),-90,"day"),date_add(CURRENT_DATE(),-1,"day"))) bo'
Waiting on bqjob_r71fbdcc95fa950e5_0000014f82f52d18_1 ... (0s) Current status: RUNNING
Waiting on bqjob_r71fbdcc95fa950e5_0000014f82f52d18_1 ... (1s) Current status: RUNNING
Waiting on bqjob_r71fbdcc95fa950e5_0000014f82f52d18_1 ... (1s) Current status: DONE
Error in query string: Error processing job
'cdate-prod:bqjob_r71fbdcc95fa950e5_0000014f82f52d18_1': Query too large

My guess, from what I know:
A query can only be up to x characters long. The query presented here is shorter than that, but...
TABLE_DATE_RANGE works by internally expanding the query to explicitly list all the in-range table names. This is usually fine, but...
This query refers to 720 tables, so the expansion adds roughly 720 * length(table_name) characters. That pushes it over the limit.
Suggestion: could you union older tables into monthly tables instead of daily ones?
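A rough back-of-the-envelope check of that guess in Python. The table name and the per-reference overhead are illustrative assumptions, not BigQuery's actual internal accounting:

```python
# Sketch: TABLE_DATE_RANGE expands to one explicit table reference per
# day, so the effective query text grows linearly with the table count.
# The name below is a hypothetical daily table, not taken from the post.
table_name = "cdate-prod:crm_data.campaign_messages20150901"
days = 90                  # days per TABLE_DATE_RANGE call
ranges = 8                 # two calls in each of four subqueries
total_tables = days * ranges
# Assume each reference costs its full name plus a separator or two.
expansion_chars = total_tables * (len(table_name) + 2)
print(total_tables, expansion_chars)
```

With names this long, the expansion alone adds tens of thousands of characters to the query text, which is consistent with hitting a length limit even though the literal query is short.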

Related

Total customer per reporting date without union

I would like to run a report showing the total number of customers per reporting date. Here is how I need the data to look:
My original dataset looks like this (please see query). To calculate the number of customers, I need to use the start and end date: if Start_Date>reporting_date and End_Date<=reporting_date, then count it as a customer.
I was able to develop a script, but it only gives me the total number of customers for a single reporting date.
select '2022-10-31' reporting_date, count(case when Start_Date>'2022-10-31' and End_Date<='2022-10-31' then Customer_ID end)
from (values ('2022-10-14','2022-8-19','0010Y654012P6KuQAK')
, ('2022-3-15','2022-9-14','0011v65402PoSpVAAV')
, ('2021-1-11','2022-10-11','0010Y654012P6DuQAK')
, ('2022-12-1','2022-5-14','0011v65402u7muLAAQ')
, ('2021-1-30','2022-3-14','0010Y654012P6DuQAK')
, ('2022-10-31','2022-2-14','0010Y654012P6PJQA0')
, ('2021-10-31','US','0010Y654012P6PJQA0')
, ('2021-5-31','2022-5-14','0011v65402x8cjqAAA')
, ('2022-6-2','2022-1-13','0010Y654016OqkJQAS')
, ('2022-1-1','2022-11-11','0010Y654016OqIaQAK')
) a(Start_Date ,End_Date ,Customer_ID)
Is there a way to amend the code with a cross join or another workaround to get the total customers per reporting date without writing many unions?
select '2022-10-31' reporting_date, count(case when Start_Date>'2022-10-31' and End_Date<='2022-10-31' then Customer_ID end)
from (values ('2022-10-14','2022-8-19','0010Y654012P6KuQAK')
, ('2022-3-15','2022-9-14','0011v65402PoSpVAAV')
, ('2021-1-11','2022-10-11','0010Y654012P6DuQAK')
, ('2022-12-1','2022-5-14','0011v65402u7muLAAQ')
, ('2021-1-30','2022-3-14','0010Y654012P6DuQAK')
, ('2022-10-31','2022-2-14','0010Y654012P6PJQA0')
, ('2021-10-31','US','0010Y654012P6PJQA0')
, ('2021-5-31','2022-5-14','0011v65402x8cjqAAA')
, ('2022-6-2','2022-1-13','0010Y654016OqkJQAS')
, ('2022-1-1','2022-11-11','0010Y654016OqIaQAK')
) a(Start_Date ,End_Date ,Customer_ID)
UNION ALL
select '2022-9-30' reporting_date, count(case when Start_Date>'2022-9-30' and End_Date<='2022-9-30' then Customer_ID end)
from (values ('2022-10-14','2022-8-19','0010Y654012P6KuQAK')
, ('2022-3-15','2022-9-14','0011v65402PoSpVAAV')
, ('2021-1-11','2022-10-11','0010Y654012P6DuQAK')
, ('2022-12-1','2022-5-14','0011v65402u7muLAAQ')
, ('2021-1-30','2022-3-14','0010Y654012P6DuQAK')
, ('2022-10-31','2022-2-14','0010Y654012P6PJQA0')
, ('2021-10-31','US','0010Y654012P6PJQA0')
, ('2021-5-31','2022-5-14','0011v65402x8cjqAAA')
, ('2022-6-2','2022-1-13','0010Y654016OqkJQAS')
, ('2022-1-1','2022-11-11','0010Y654016OqIaQAK')
) a(Start_Date ,End_Date ,Customer_ID)
It is possible to provide date ranges as a separate table/subquery, join to the actual data and perform grouping:
select s.start_d, s.end_d, COUNT(Customer_ID) AS total
FROM (SELECT '2022-10-31'::DATE, '2022-10-31'::DATE
UNION SELECT '2022-09-30', '2022-09-30')
AS s(start_d, end_d)
LEFT JOIN (values ('2022-10-14','2022-8-19','0010Y654012P6KuQAK')
, ('2022-3-15','2022-9-14','0011v65402PoSpVAAV')
, ('2021-1-11','2022-10-11','0010Y654012P6DuQAK')
, ('2022-12-1','2022-5-14','0011v65402u7muLAAQ')
, ('2021-1-30','2022-3-14','0010Y654012P6DuQAK')
, ('2022-10-31','2022-2-14','0010Y654012P6PJQA0')
, ('2021-10-31','2021-10-31','0010Y654012P6PJQA0')
, ('2021-5-31','2022-5-14','0011v65402x8cjqAAA')
, ('2022-6-2','2022-1-13','0010Y654016OqkJQAS')
, ('2022-1-1','2022-11-11','0010Y654016OqIaQAK')
) a(Start_Date ,End_Date ,Customer_ID)
ON a.Start_Date>s.start_d and a.End_Date<=s.end_d
GROUP BY s.start_d, s.end_d;
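The same join-then-group pattern can be sketched against SQLite through Python's sqlite3 module (a neutral stand-in for the asker's engine; the sample dates are zero-padded here so plain string comparison behaves like date comparison, which the question's mixed-format dates would not):

```python
import sqlite3

con = sqlite3.connect(":memory:")
# Reporting dates as a small derived table, LEFT JOINed to the data with
# the question's condition: Start_Date > reporting_date AND
# End_Date <= reporting_date. Sample rows are illustrative, not the full set.
rows = con.execute("""
    WITH customers(start_d, end_d, id) AS (VALUES
        ('2022-10-14', '2022-08-19', 'C1'),
        ('2022-03-15', '2022-09-14', 'C2'),
        ('2022-12-01', '2022-05-14', 'C3')),
    reporting(reporting_date) AS (VALUES ('2022-10-31'), ('2022-09-30'))
    SELECT r.reporting_date, COUNT(c.id) AS total
    FROM reporting r
    LEFT JOIN customers c
      ON c.start_d > r.reporting_date AND c.end_d <= r.reporting_date
    GROUP BY r.reporting_date
    ORDER BY r.reporting_date
""").fetchall()
print(rows)  # one row per reporting date, no UNION ALL needed
```

Each reporting date produces its own group, so adding another date is one more row in the derived table rather than another copy of the whole query.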
Output:

CTE with paging returning random value from subquery

I have a table with contact information (contacts) and another relationship table with key/value data for the contact table (customdata). The custom data rows are not unique; they can repeat, and they have a creation date as well.
I have a CTE querying the contacts, pretty simple, but I also want to return a subquery column for a particular key. The value from this query happens to contain a date, stored as varchar, and since this table does not contain unique rows I'm using TOP 1 and sorting by the row creation date.
The issue I'm having is that the value from the custom data table comes back as seemingly random values, and on top of that it does not sort correctly when cast to a date.
WITH Customers_CTE AS(
SELECT row_number() over (ORDER BY (SELECT TOP 1 CONVERT(DATETIME, data_value, 101)
FROM CustomData
WHERE (data_cust_id = Customers.cust_id AND data_key = 'Sign Date' AND ISDATE(data_value) = 1) ORDER BY data_created DESC) DESC) AS rowNum,
COUNT(*) OVER (PARTITION BY NULL) AS [RowCount],
Customers.FirstName,
Customers.LastName,
(SELECT TOP 1 CONVERT(DATETIME, data_value, 101)
FROM CustomData
WHERE (data_cust_id = Customers.cust_id AND data_key = 'Sign Date' AND ISDATE(data_value) = 1) ORDER BY data_created DESC) AS DateSigned
FROM Customers)
SELECT * FROM Customers_CTE
WHERE rowNum >= 0 and rowNum < 10
Data sample
CUSTOMERS
cust_id, cust_firstname, cust_lastname
--------------------------------------
1 , john , doe
2 , jane , mary
CUSTOM DATA
data_created, data_cust_id, data_key , data_value
------------------------------------------------------
2018-04-06 , 1 , 'Sign Date' , '2018-03-17'
2018-04-06 , 1 , 'Agreed' , 'Yes'
2019-03-12 , 1 , 'Renew Date' , '2019-01-25'
2020-04-11 , 2 , 'Sign Date' , '2020-03-28'
2020-04-11 , 2 , 'Agreed' , 'Yes'
2020-06-07 , 1 , 'Sign Date' , '2020-05-13'
2020-10-21 , 2 , 'Sign Date' , '2020-09-15'
RESULT
FirstName , LastName , DateSigned
-------------------------------------
jane , mary , 2020-09-15
john , doe , 2020-05-13
I'm struggling to see what's going on with your CTE or what you are trying to do; it's riddled with syntax errors and issues.
If this helps, to get your desired output from your sample data you just need the following. If I've understood your sample data, you just want the maximum date value for each customer id for a particular "key" value:
select c.cust_firstname, c.cust_lastname, d.DateSigned
from customers c
outer apply (
select Max(Try_Convert(date,data_value)) DateSigned
from customdata cd
where cd.data_cust_id=c.cust_id and cd.data_key='Sign date'
)d
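SQLite has no OUTER APPLY, but the same "max converted date per customer" idea can be sketched with a correlated scalar subquery (run here through Python's sqlite3; ISO-formatted date strings make MAX behave like a date maximum, standing in for MAX(TRY_CONVERT(date, ...))):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE customers(cust_id INTEGER, cust_firstname TEXT, cust_lastname TEXT);
    CREATE TABLE customdata(data_created TEXT, data_cust_id INTEGER,
                            data_key TEXT, data_value TEXT);
    INSERT INTO customers VALUES (1, 'john', 'doe'), (2, 'jane', 'mary');
    INSERT INTO customdata VALUES
        ('2018-04-06', 1, 'Sign Date', '2018-03-17'),
        ('2020-06-07', 1, 'Sign Date', '2020-05-13'),
        ('2020-04-11', 2, 'Sign Date', '2020-03-28'),
        ('2020-10-21', 2, 'Sign Date', '2020-09-15');
""")
# Correlated subquery: the maximum 'Sign Date' value per customer.
rows = con.execute("""
    SELECT c.cust_firstname,
           (SELECT MAX(d.data_value)
            FROM customdata d
            WHERE d.data_cust_id = c.cust_id
              AND d.data_key = 'Sign Date') AS DateSigned
    FROM customers c
    ORDER BY c.cust_id
""").fetchall()
print(rows)
```

This reproduces the question's desired result (latest sign date per customer) without TOP 1 or the fragile ORDER BY inside a subquery.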

How make Excel arbitrary pivoting in MS SQL 2017 (cross join + loops)

Could you please help me solve the task below in SQL (MS SQL Server 2017)? It is simple in Excel, but seems very complicated in SQL.
There is a table with clients and their activities split by days:
client 1may 2may 3may 4may 5may other days
client1 0 0 0 0 0 ...
client2 0 0 0 0 0 ...
client3 0 0 0 0 0 ...
client4 1 1 1 1 1 ...
client5 1 1 1 0 0 ...
It is necessary to create the same table (the same number of rows and columns), but transform the values according to this rule:
Current day value =
A) If all daily values during the week up to and including the current day = 1, then 1
B) If all daily values during the week up to and including the current day = 0, then 0
C) If the values are mixed, then keep the status of the previous day (if the status of the previous day is not known, for example the client is new, then 0)
In Excel, I do this using the formula: = IF (AND (AF2 = AE2; AE2 = AD2; AD2 = AC2; AC2 = AB2; AB2 = AA2; AA2 = Z2); current_day_value; IF (previous_day_value = ""; 0; previous_day_value )).
An example Excel file is attached.
Thank you very much.
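The A/B/C rule above can be stated procedurally; here is a minimal Python sketch (the handling of partial weeks, falling through to rule C and defaulting to 0, is my assumption):

```python
def weekly_status(values):
    """Derive the daily status per the A/B/C rule.

    values: list of 0/1 daily activity flags, oldest first.
    Assumption: with fewer than 7 known days, rule C applies (default 0).
    """
    out = []
    for i in range(len(values)):
        week = values[max(0, i - 6):i + 1]     # the 7 days ending today
        if len(week) == 7 and all(v == 1 for v in week):
            out.append(1)                      # rule A: a full week of 1s
        elif len(week) == 7 and all(v == 0 for v in week):
            out.append(0)                      # rule B: a full week of 0s
        else:
            out.append(out[-1] if out else 0)  # rule C: carry yesterday
    return out

print(weekly_status([1, 1, 1, 1, 1, 1, 1, 0]))  # last day: mixed week, carries 1
```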
First things first: it's NEVER a good idea to have dates as columns.
So step #1: transpose your columns to rows. In other words, build a table with three columns:
```
client date Value
client1 May1 0
client1 May2 0
client1 May3 0
.... ... ..
client4 May1 1
client4 May2 1
client4 May3 1
.... ... ..
```
Step #2: perform all the calculations you need using the date field.
Basically you always carry over the status of the previous day, in any case (except null).
So I would do something like this (Oracle syntax, works in SQL Server too), assuming the first column is 1may:
Insert into newTable (client, 1may, 2may, ....) select client, 0, coalesce(1may,0), coalesce(2may,0), .... from oldTable;
Anyway, I too believe it is not good practice to put days as columns of a relational table.
You're going to struggle with this because most SQL dialects don't allow "arbitrary pivoting"; that is, you need to specify the columns you want displayed in a pivot, whereas Excel will just do this for you. SQL can do it, but it requires dynamic SQL, which gets complicated and annoying pretty fast.
I would suggest you use SQL just to construct the data, and then Excel or SSRS (as you're in T-SQL) to actually do the visualization.
Anyway. I think this does what you want:
WITH Data AS (
SELECT * FROM (VALUES
('Client 1',CONVERT(DATE, '2020-05-04'),1)
, ('Client 1',CONVERT(DATE, '2020-05-05'),1)
, ('Client 1',CONVERT(DATE, '2020-05-06'),1)
, ('Client 1',CONVERT(DATE, '2020-05-07'),0)
, ('Client 1',CONVERT(DATE, '2020-05-08'),0)
, ('Client 1',CONVERT(DATE, '2020-05-09'),0)
, ('Client 1',CONVERT(DATE, '2020-05-10'),1)
, ('Client 1',CONVERT(DATE, '2020-05-11'),1)
, ('Client 1',CONVERT(DATE, '2020-05-12'),1)
, ('Client 2',CONVERT(DATE, '2020-05-04'),1)
, ('Client 2',CONVERT(DATE, '2020-05-05'),0)
, ('Client 2',CONVERT(DATE, '2020-05-06'),0)
, ('Client 2',CONVERT(DATE, '2020-05-07'),1)
, ('Client 2',CONVERT(DATE, '2020-05-08'),0)
, ('Client 2',CONVERT(DATE, '2020-05-09'),1)
, ('Client 2',CONVERT(DATE, '2020-05-10'),0)
, ('Client 2',CONVERT(DATE, '2020-05-11'),1)
) x (Client, RowDate, Value)
)
SELECT
Client
, RowDate
, Value
, CASE
WHEN OnesBefore = DaysInWeek THEN 1
WHEN ZerosBefore = DaysInWeek THEN 0
ELSE PreviousDayValue
END As FinalCalculation
FROM (
-- This set uses windowing to calculate the intermediate values
SELECT
*
-- The count of days present in the data; part of the week may be missing, so we can't assume 7
-- We only count up to this day, so it's in line with the other parts of the calculation
, COUNT(RowDate) OVER (PARTITION BY Client, WeekCommencing ORDER BY RowDate) AS DaysInWeek
-- Count up the 1's for this client and week, in date order, up to (and including) this date
, COUNT(IIF(Value = 1, 1, NULL)) OVER (PARTITION BY Client, WeekCommencing ORDER BY RowDate) AS OnesBefore
-- Count up the 0's for this client and week, in date order, up to (and including) this date
, COUNT(IIF(Value = 0, 1, NULL)) OVER (PARTITION BY Client, WeekCommencing ORDER BY RowDate) AS ZerosBefore
-- get the previous day's value, or 0 if there isn't one
, COALESCE(LAG(Value) OVER (PARTITION BY Client, WeekCommencing ORDER BY RowDate), 0) AS PreviousDayValue
FROM (
-- This set adds a few simple values in that we can leverage later
SELECT
*
, DATEADD(DAY, -DATEPART(DW, RowDate) + 1, RowDate) As WeekCommencing
FROM Data
) AS DataWithExtras
) AS DataWithCalculations
As you haven't specified your table layout, I don't know what table and field names to use in my example. Hopefully, if this is correct, you can figure out how to slot it in with what you have; if not, leave a comment.
I will note as well that I've made this purposely verbose. If you don't know what the OVER clause is, you'll need to do some reading: https://www.sqlshack.com/use-window-functions-sql-server/. The gist is that window functions do aggregations without actually collapsing the rows together.
Edit: Adjusted the calculation to be able to account for an arbitrary number of days in the week
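The windowed running counts and the LAG fallback at the heart of that query can be spot-checked in SQLite (3.25+) via Python's sqlite3, with CASE standing in for T-SQL's IIF:

```python
import sqlite3

con = sqlite3.connect(":memory:")
# Three sample days for one client: running count of 1s so far, plus the
# previous day's value defaulted to 0 (the rule-C fallback).
rows = con.execute("""
    WITH data(client, d, value) AS (VALUES
        ('C1', '2020-05-04', 1),
        ('C1', '2020-05-05', 1),
        ('C1', '2020-05-06', 0))
    SELECT client, d, value,
           COUNT(CASE WHEN value = 1 THEN 1 END)
               OVER (PARTITION BY client ORDER BY d) AS ones_so_far,
           COALESCE(LAG(value) OVER (PARTITION BY client ORDER BY d), 0) AS prev
    FROM data
""").fetchall()
print(rows)
```

The default window frame (unbounded preceding to current row) is exactly the "up to and including this date" behaviour the answer's comments describe.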
Thank you so much to everyone, especially David and Massimo, who prompted me to restructure the data.
--we join clients and dates each with each and label clients with 'active' or 'inactive'
with a as (
select client, dates
from (select distinct client from dbo.clients) a
cross join (select dates from dates) b
)
, b as (
select dates
,1 as active
,client
from clients a
join dbo.dates b on a.id = b.id
)
select client
,a.dates
,isnull(b.active, 0) active
into #tmp2
from a
left join b on a.client= b.client and a.dates = b.dates
--declare variables - for date start and for loop
declare #min_date date = (select min(dates) from #tmp2);
declare #n int = 1
declare #row int = (select count(distinct dates) from #tmp2) --number of the loop iterations
--delete data from the final results
delete from final_results
--fill the table with final results
--run the loop (each iteration = analyse of each 1-week range)
while #n<=#row
begin
with a as (
--run the loop
select client
,max(dates) dates
,sum (case when active = 1 then 1 else null end) sum_active
,sum (case when active = 0 then 1 else null end) sum_inactive
from #tmp2
where dates between dateadd(day, -7 + #n, #min_date) and dateadd(day, -1 + #n, #min_date)
group by client
)
INSERT INTO [dbo].[final_results]
(client
,[dates]
,[final_result])
select client
,dates
,case when sum_active = 7 then 1 --rule A
when sum_inactive = 7 then 0 -- rule B
else
(case when isnull(sum_active, 0) + isnull(sum_inactive, 0) < 7 then 0
else
(select final_result
from final_results b
where b.dates = dateadd(day, -1, a.dates)
and a.client= b.client) end
) end
from a
set #n=#n+1
end
if object_id(N'tempdb..#tmp2', 'U') is not null drop table #tmp2

Bigquery similar query different output

I have 2 standard SQL queries in Bigquery. They are:
Query1:
select sfcase.case_id
, sfuser.user_id
, sfcase_create_date
, sfcase_status
, sfcase_origin
, sfcategory_category1
, sfcategory_category2
, sfcase_priority
, sftime_elapsedmin
, sftime_targetmin
, sfcase_sla_closemin
, if(count(sfcomment.parentid)=0,"0"
,if(count(sfcomment.parentid)=1,"1"
,if(count(sfcomment.parentid)=2,"2"
,"3"))) as comment_response
from(
select id as case_id
, timestamp_add(createddate, interval 7 hour) as sfcase_create_date
, status as sfcase_status
, origin as sfcase_origin
, priority as sfcase_priority
, case when status = 'Closed' then timestamp_diff(timestamp_add(closeddate, interval 7 hour),timestamp_add(createddate, interval 7 hour),minute)
end as sfcase_sla_closemin
, case_category__c
from `some_of_my_dataset.cs_case`
) sfcase
left join(
select upper(x1st_category__c) as sfcategory_category1
, upper(x2nd_category__c) as sfcategory_category2
, id
from `some_of_my_dataset.cs_case_category`
) sfcategory
on sfcategory.id = sfcase.case_category__c
left join(
select parentid as parentid
from `some_of_my_dataset.cs_case_comment`
) sfcomment
on sfcase.case_id = sfcomment.parentid
left join(
select ELAPSEDTIMEINMINS as sftime_elapsedmin
, TARGETRESPONSEINMINS as sftime_targetmin
, caseid
from `some_of_my_dataset.cs_case_milestone`
)sftime
on sfcase.case_id = sftime.caseid
left join(
select id as user_id
, createddate
from `some_of_my_dataset.cs_user`
)sfuser
on date(sfuser.createddate) = date(sfcase.sfcase_create_date)
group by 1
, 2
, 3
, 4
, 5
, 6
, 7
, 8
, 9
, 10
, 11
Query2:
select sfcase.id as case_id
, sfuser.id as user_id
, timestamp_add(sfcase.createddate, interval 7 hour) as sf_create_date
, sfcase.status as sf_status
, sfcase.origin as sf_origin
, upper(sfcategory.x1st_category__c) as sf_category1
, sfcategory.x2nd_category__c as sf_category2
, sfcase.priority as sf_priority
, sftime.ELAPSEDTIMEINMINS as sf_elapsedresponsemin
, sftime.TARGETRESPONSEINMINS as sf_targetresponsemin
, case when sfcase.status = 'Closed' then timestamp_diff(timestamp_add(sfcase.closeddate, interval 7 hour),timestamp_add(sfcase.createddate, interval 7 hour),minute)
end as sla_closemin
, if(count(sfcomment.parentid)=0,"0"
,if(count(sfcomment.parentid)=1,"1"
,if(count(sfcomment.parentid)=2,"2"
,"3"))) as comment_response
from `some_of_my_dataset.cs_case` as sfcase
left join `some_of_my_dataset.cs_case_category` as sfcategory
on sfcategory.id = sfcase.case_category__c
left join `some_of_my_dataset.cs_case_comment` as sfcomment
on sfcase.id = sfcomment.parentid
left join `some_of_my_dataset.cs_case_milestone` as sftime
on sfcase.id = sftime.caseid
left join `some_of_my_dataset.cs_user` as sfuser
on date(sfuser.createddate) = date(sfcase.createddate)
group by 1
, 2
, 3
, 4
, 5
, 6
, 7
, 8
, 9
, 10
, 11
I tried to run them at the same time. Query1 runs faster and returns fewer rows, while Query2 runs longer and returns more rows. Both Query1 and Query2 have 12 columns.
Why do they return different results?
Which query should I use?
Update: renamed my dataset.

Oracle SQL- Flag records based on record's date vs history

This is my first post on the forum. Usually I am able to find what I need, but to tell the truth I am not really sure how to phrase the question for this issue. Therefore, please accept my apologies if there is already an answer on the forum and I missed it.
I am running the following code in an Oracle database via Benthic Software:
SELECT
T1."REGION"
, T1."COUNTRY"
, T1."IDNum"
, T1."CUSTOMER"
, T1."BUSSINESS"
, T3."FISCALYEARMONTH"
, T3."FISCALYEAR"
, SUM(T4."VALUE")
,"HISTORICAL_PURCHASE_FLAG"
FROM
"DATABASE"."SALES" T4
, "DATABASE"."CUSTOMER" T1
, "DATABASE"."PRODUCT" T2
, "DATABASE"."TIME" T3
WHERE
T4."CUSTOMERID" = T1."CUSTOMERID"
AND T4."PRODUCTID" = T2."PRODUCTID"
AND T4."DATEID" = T3."DATEID"
AND T3."FISCALYEAR" IN ('2016')
AND T1."COUNTRY" IN ('ENGLAND', 'France')
GROUP BY
T1."REGION"
, T1."COUNTRY"
, T1."IDNum"
, T1."CUSTOMER"
, T1."BUSSINESS"
, T3."FISCALYEARMONTH"
, T3."FISCALYEAR"
;
This query provides me with information on transactions. As you can see above, I would like to add a column named "HISTORICAL_PURCHASE_FLAG".
I would like the query to take CUSTOMER and FISCALYEARMONTH. Then, I would like to check if there are any transactions registered for the CUSTOMER, up to 2 years in the past.
So let's say I get the following result:
LineNum REGION COUNTRY IDNum CUSTOMER BUSSINESS FISCALYEARMONTH FISCALYEAR VALUE HISTORICAL_PURCHASE_FLAG
1 Europe ENGLAND 255 Abraxo Cleaner Co. Chemicals 201605 2016 34,567.00
2 Europe FRANCE 123 Metal Trade Heavy 201602 2016 12,500.00
3 Europe ENGLAND 255 Abraxo Cleaner Co. Chemicals 201601 2016 8,400.00
LineNum 1 shows a transaction for Abraxo Cleaner Co. registered in 201605, and LineNum 3 is also for Abraxo Cleaner Co. but registered in 201601. What I need the query to do is flag LineNum 1 as 'Existing', because a previous transaction was registered.
On the other hand, LineNum 3 was the first time a transaction was registered for Abraxo Cleaner Co., so that line would be flagged as 'New'.
To sum up, I would like each row of data to be treated individually, checking whether there are any earlier records for the CUSTOMER between FISCALYEARMONTH and FISCALYEARMONTH - 24 months.
Thank you in advance for the help.
You can use the LAG function:
SELECT
"REGION"
, "COUNTRY"
, "IDNum"
, "CUSTOMER"
, "BUSSINESS"
, "FISCALYEARMONTH"
, "FISCALYEAR"
, SUM("VALUE")
, MAX(CASE WHEN to_date(prev_fym,'YYYYMM') >= ADD_MONTHS (to_date("FISCALYEARMONTH",'YYYYMM'), -24) THEN 'Existing'
ELSE NULL END) "HISTORICAL_PURCHASE_FLAG"
FROM
(
SELECT
T1."REGION"
, T1."COUNTRY"
, T1."IDNum"
, T1."CUSTOMER"
, T1."BUSSINESS"
, T3."FISCALYEARMONTH"
, T3."FISCALYEAR"
, T4."VALUE"
, LAG ("FISCALYEARMONTH", 1) OVER (PARTITION BY T1."IDNum" ORDER BY T3."FISCALYEARMONTH" DESC) prev_fym
FROM
"DATABASE"."SALES" T4
, "DATABASE"."CUSTOMER" T1
, "DATABASE"."PRODUCT" T2
, "DATABASE"."TIME" T3
WHERE
T4."CUSTOMERID" = T1."CUSTOMERID"
AND T4."PRODUCTID" = T2."PRODUCTID"
AND T4."DATEID" = T3."DATEID"
AND T1."COUNTRY" IN ('ENGLAND', 'France')
AND T3."FISCALYEAR" IN ('2014','2015','2016')
)
WHERE "FISCALYEAR" IN ('2016')
GROUP BY
"REGION"
, "COUNTRY"
, "IDNum"
, "CUSTOMER"
, "BUSSINESS"
, "FISCALYEARMONTH"
, "FISCALYEAR"
;
Using a simplified "input" table... You can use the LAG() analytic function and a comparison condition to populate your last column. I assume your fiscalyearmonth is a number; if it is a character field, wrap fiscalyearmonth in TO_NUMBER(). (It would be much better if you stored these as true Oracle dates, perhaps DATE '2016-06-01' instead of 201606, but I worked with what you have currently, and took advantage of the fact that in numeric YYYYMM format, "24 months ago" simply means "subtract 200".)
with inputs (linenum, idnum, fiscalyearmonth) as (
select 1, 255, 201605 from dual union all
select 2, 123, 201602 from dual union all
select 3, 255, 201601 from dual union all
select 4, 255, 201210 from dual
)
select linenum, idnum, fiscalyearmonth,
case when fiscalyearmonth
- lag(fiscalyearmonth)
over (partition by idnum order by fiscalyearmonth) < 200
then 'Existing' else 'New' end as flag
from inputs
order by linenum;
LINENUM IDNUM FISCALYEARMONTH FLAG
---------- ---------- --------------- --------
1 255 201605 Existing
2 123 201602 New
3 255 201601 New
4 255 201210 New
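For what it's worth, the same LAG comparison runs essentially unchanged in SQLite (driven from Python here) and reproduces the flags above:

```python
import sqlite3

con = sqlite3.connect(":memory:")
# Same inputs and logic as the Oracle example: a gap of less than 200 in
# numeric YYYYMM terms means a prior purchase within 24 months.
rows = con.execute("""
    WITH inputs(linenum, idnum, fiscalyearmonth) AS (VALUES
        (1, 255, 201605), (2, 123, 201602), (3, 255, 201601), (4, 255, 201210))
    SELECT linenum,
           CASE WHEN fiscalyearmonth
                     - LAG(fiscalyearmonth)
                       OVER (PARTITION BY idnum ORDER BY fiscalyearmonth) < 200
                THEN 'Existing' ELSE 'New' END AS flag
    FROM inputs
    ORDER BY linenum
""").fetchall()
print(rows)
```

Note that the first row per customer gets a NULL from LAG; NULL < 200 is not true, so it correctly falls through to 'New'.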
Another solution might be to outer join "DATABASE"."SALES" T4 a second time as T5, filtering the fiscal year via WHERE to < T4."FISCALYEAR" - 2. If the column is NULL, the record is new; if the outer join yields a value, the record is historic.
You can achieve this using the ROW_NUMBER() function as below; modify as per your need. I assumed 2 years means the previous 24 months from sysdate.
You can run the sub-queries separately to check how they work.
Select
"REGION"
,"COUNTRY"
,"IDNum"
,"CUSTOMER"
,"BUSSINESS"
,"FISCALYEARMONTH"
,"FISCALYEAR"
,"VALUE"
, ( case when ( TXNNO = 1 or TOTAL_TXN_LAST24MTH = 0 ) then 'New' else 'Existing' end ) as "HISTORICAL_PURCHASE_FLAG" -- if no txn in last 24 month or its first txn then 'new' else 'existing'
from
(
select
SubQry."REGION"
, SubQry."COUNTRY"
, SubQry."IDNum"
, SubQry."CUSTOMER"
, SubQry."BUSSINESS"
, SubQry."FISCALYEARMONTH"
, SUBQRY."FISCALYEAR"
, SUBQRY."VALUE"
, ROW_NUMBER() over (partition by SUBQRY."REGION",SUBQRY."COUNTRY",SUBQRY."IDNum",SUBQRY."CUSTOMER",SUBQRY."BUSSINESS" order by SUBQRY."FISCALYEARMONTH") as TXNNO
, SUM(case when (TO_NUMBER(TO_CHAR(sysdate,'YYYYMM')) - SUBQRY."FISCALYEARMONTH") < 24 then 1 else 0 end) over (partition by SUBQRY."REGION",SUBQRY."COUNTRY",SUBQRY."IDNum",SUBQRY."CUSTOMER",SUBQRY."BUSSINESS") as TOTAL_TXN_LAST24MTH
From
(
SELECT
T1."REGION"
, T1."COUNTRY"
, T1."IDNum"
, T1."CUSTOMER"
, T1."BUSSINESS"
, T3."FISCALYEARMONTH"
, T3."FISCALYEAR"
, SUM(T4."VALUE") as VALUE
FROM
"DATABASE"."SALES" T4
, "DATABASE"."CUSTOMER" T1
, "DATABASE"."PRODUCT" T2
, "DATABASE"."TIME" T3
WHERE
T4."CUSTOMERID" = T1."CUSTOMERID"
AND T4."PRODUCTID" = T2."PRODUCTID"
AND T4."DATEID" = T3."DATEID"
AND T3."FISCALYEAR" IN ('2016')
AND T1."COUNTRY" IN ('ENGLAND', 'France')
GROUP BY
T1."REGION"
, T1."COUNTRY"
, T1."IDNum"
, T1."CUSTOMER"
, T1."BUSSINESS"
, T3."FISCALYEARMONTH"
, T3."FISCALYEAR"
) SUBQRY
);