How can I shorten my SQL query in BigQuery? - sql

I want to fetch data from a BigQuery database, but I get this error:
=> The query is too large. The maximum query length is 256.000K characters, including comments and white space characters.
Below is one part of the query, which I repeat 21 times:
WITH data AS
(
SELECT
IFNULL(department, 'UNKNOWN_DEPARTMENT') AS dept,
'C7s'
AS campus,
COUNTIF(task.taskRaised.raisedAt.milliSeconds BETWEEN 1542565800000 AND 1543170599999) AS taskCount_0,
COUNTIF(task.taskRaised.raisedAt.milliSeconds BETWEEN 1542565800000 AND 1543170599999
AND IF (task.deadline.currentEscalationLevel NOT IN
(
'ESC_ACKNOWLEDGEMENT'
)
, task.deadline.currentEscalationLevel, 'NOT_ESCALATED') NOT IN
(
'NOT_ESCALATED'
)
) AS escCount_0,
COUNTIF(task.taskRaised.raisedAt.milliSeconds BETWEEN 1541961000000 AND 1542565799999) AS taskCount_1,
COUNTIF(task.taskRaised.raisedAt.milliSeconds BETWEEN 1541961000000 AND 1542565799999
AND IF (task.deadline.currentEscalationLevel NOT IN
(
'ESC_ACKNOWLEDGEMENT'
)
, task.deadline.currentEscalationLevel, 'NOT_ESCALATED') NOT IN
(
'NOT_ESCALATED'
)
) AS escCount_1,
COUNTIF(task.taskRaised.raisedAt.milliSeconds BETWEEN 1541356200000 AND 1541960999999) AS taskCount_2,
COUNTIF(task.taskRaised.raisedAt.milliSeconds BETWEEN 1541356200000 AND 1541960999999
AND IF (task.deadline.currentEscalationLevel NOT IN
(
'ESC_ACKNOWLEDGEMENT'
)
, task.deadline.currentEscalationLevel, 'NOT_ESCALATED') NOT IN
(
'NOT_ESCALATED'
)
) AS escCount_2
FROM
`nsimplbigquery.TaskManagement.C7s_*`
WHERE
_TABLE_SUFFIX IN
(
'2018_47_11',
'2018_45_11',
'2018_46_11'
)
AND IFNULL(department, 'UNKNOWN_DEPARTMENT') IN
(
'ENGG_AND_MAINT_DEPARTMENT',
'FNB_DEPARTMENT',
'TELECOM_DEPARTMENT',
'IT_DEPARTMENT',
'BILLING_AND_INSURANCE',
'HOUSEKEEPING_DEPARTMENT'
)
AND task.taskRaised.raisedAt.milliSeconds BETWEEN 1541356200000 AND 1543170599999
GROUP BY
dept
)
,
mainQuery AS
(
SELECT
dept,
campus,
SUM(taskCount_0) AS taskCount_0,
SUM(escCount_0) AS escCount_0,
CAST(SAFE_DIVIDE(SUM(escCount_0), SUM(taskCount_0)) * 10000 AS INT64) AS escPerc_0,
SUM(taskCount_1) AS taskCount_1,
SUM(escCount_1) AS escCount_1,
CAST(SAFE_DIVIDE(SUM(escCount_1), SUM(taskCount_1)) * 10000 AS INT64) AS escPerc_1,
SUM(taskCount_2) AS taskCount_2,
SUM(escCount_2) AS escCount_2,
CAST(SAFE_DIVIDE(SUM(escCount_2), SUM(taskCount_2)) * 10000 AS INT64) AS escPerc_2
FROM
data
GROUP BY
ROLLUP (campus, dept)
)
SELECT
dept,
campus,
taskCount_0,
escCount_0,
escPerc_0,
taskCount_1,
escCount_1,
escPerc_1,
taskCount_2,
escCount_2,
escPerc_2
FROM
mainQuery
WHERE
campus IS NOT NULL
ORDER BY
CASE
WHEN
dept IS NULL
THEN
1
ELSE
0
END
ASC, dept ASC, campus ASC;
I repeat this query once per campus ID, because I have many IDs. In each copy, C7s is replaced with one of the following IDs:
C7z,
C7u,
H0B,
IDp,
ITR,
C7i,
C7j,
C7k,
C7l,
C7m,
C7o,
C71,
C7t,
F6qZ,
C7w,
GIui,
Fs,
C70,
C7p,
C7r
The table name in the query, nsimplbigquery.TaskManagement.C7s_*, also changes in each copy. For example, the next copy of the query reads from
nsimplbigquery.TaskManagement.C7z_*

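Instead of repeating your whole SELECT statement 21 times, use the approach below. You will have 3x21=63 entries in the list for _TABLE_SUFFIX, but you will get around the query length limit. Note that campus can then no longer be a hard-coded literal such as 'C7s'; it has to be derived from _TABLE_SUFFIX (for example, the part before the first underscore) and included in the GROUP BY.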
FROM `nsimplbigquery.TaskManagement.*`
WHERE _TABLE_SUFFIX IN (
'C7s_2018_47_11',
'C7s_2018_45_11',
'C7s_2018_46_11',
'C7z_2018_47_11',
'C7z_2018_45_11',
'C7z_2018_46_11',
'C7u_2018_47_11',
'C7u_2018_45_11',
'C7u_2018_46_11',
...
...
...
'C7r_2018_47_11',
'C7r_2018_45_11',
'C7r_2018_46_11'
)
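Typing 63 suffixes by hand is error-prone. As a minimal sketch, assuming the campus IDs and week suffixes listed in the question, the filter could be generated with a small script. Python is used here, and the name build_suffix_filter is made up for illustration:

```python
# Campus IDs and week suffixes are copied from the question.
CAMPUS_IDS = [
    "C7s", "C7z", "C7u", "H0B", "IDp", "ITR", "C7i", "C7j", "C7k", "C7l",
    "C7m", "C7o", "C71", "C7t", "F6qZ", "C7w", "GIui", "Fs", "C70", "C7p",
    "C7r",
]
WEEK_SUFFIXES = ["2018_47_11", "2018_45_11", "2018_46_11"]


def build_suffix_filter(campus_ids, week_suffixes):
    """Return the list of table suffixes and a ready-to-paste IN filter."""
    suffixes = [f"{campus}_{week}" for campus in campus_ids for week in week_suffixes]
    quoted = ",\n".join(f"'{suffix}'" for suffix in suffixes)
    return suffixes, f"_TABLE_SUFFIX IN (\n{quoted}\n)"


suffixes, where_clause = build_suffix_filter(CAMPUS_IDS, WEEK_SUFFIXES)
print(len(suffixes))   # number of generated suffixes
print(where_clause)    # paste this into the WHERE clause
```

The printed filter replaces the hand-written list in the WHERE clause of the single wildcard query.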

Related

how to delete repeated values in UNION query using count

Hello, I have been trying to remove a repeated value from the following UNION query. How can I filter out the row with LE_ID=8232 and AantalLln=0? If the first query already returns a row with AantalLln>0 for an ID, the second part of the union should not insert that ID again. Thanks.
With LESEENHEIDLOOPBAAN as (
SELECT
LE_AGENDA_FK,
LE_CODE,
LE_ID,
LE_KLAS_FK,
LE_KLASPARTITIE_FK,
LE_OMSCHRIJVING,
LE_VERANDERDDOOR,
LE_VERANDERDOP,
Count(LH_ID) As AantalLln
FROM
LESEENHEID
INNER JOIN LOOPBAANLESEENHEID on (LH_LESEENHEID_FK = LE_ID)
INNER JOIN LOOPBAAN ON (LH_LOOPBAAN_FK = LB_ID)
WHERE
(
'2022/09/28' BETWEEN LB_VAN
AND LB_TOT
)
AND (
LE_ID in (8277, 8276, 8232)
)
GROUP BY
LE_AGENDA_FK,
LE_CODE,
LE_ID,
LE_KLAS_FK,
LE_KLASPARTITIE_FK,
LE_OMSCHRIJVING,
LE_VERANDERDDOOR,
LE_VERANDERDOP
),
LESEENHEIDLOOPBAANNULL AS (
SELECT
LE_AGENDA_FK,
LE_CODE,
LE_ID,
LE_KLAS_FK,
LE_KLASPARTITIE_FK,
LE_OMSCHRIJVING,
LE_VERANDERDDOOR,
LE_VERANDERDOP,
0 As AantalLln
FROM
LESEENHEID
where
LE_ID in (8277, 8276, 8232)
and EXISTS (
SELECT
*
FROM
LESEENHEIDLOOPBAAN
)
)
SELECT
*
FROM
LESEENHEIDLOOPBAAN
UNION
SELECT
*
FROM
LESEENHEIDLOOPBAANNULL ROWS 1000
Try this out using ROW_NUMBER (the two CTE definitions from your query stay in front of it):
SELECT * FROM (
SELECT
ROW_NUMBER() OVER (PARTITION BY LE_ID ORDER BY AantalLln DESC) AS RN,
u.*
FROM
(
SELECT * FROM
LESEENHEIDLOOPBAAN
UNION
SELECT
*
FROM
LESEENHEIDLOOPBAANNULL
) u
) ranked
WHERE RN = 1
ROWS 1000
This keeps only the row with the highest AantalLln for each LE_ID, which eliminates the duplicates.
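To see the ROW_NUMBER idea in isolation, here is a small sketch that runs it in SQLite through Python's sqlite3 module (an assumption: your SQLite build is 3.25 or newer, which is needed for window functions). The two tables and their contents are made-up stand-ins for the two CTEs in the question, reduced to two columns:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical stand-ins for LESEENHEIDLOOPBAAN and LESEENHEIDLOOPBAANNULL.
CREATE TABLE counted (le_id INTEGER, aantal INTEGER);
CREATE TABLE zero_filled (le_id INTEGER, aantal INTEGER);
INSERT INTO counted VALUES (8232, 5);
INSERT INTO zero_filled VALUES (8232, 0), (8276, 0), (8277, 0);
""")

# Rank the rows per le_id, highest count first, and keep only rank 1.
rows = conn.execute("""
SELECT le_id, aantal
FROM (
    SELECT u.*,
           ROW_NUMBER() OVER (PARTITION BY le_id ORDER BY aantal DESC) AS rn
    FROM (SELECT * FROM counted UNION SELECT * FROM zero_filled) AS u
) AS ranked
WHERE rn = 1
ORDER BY le_id
""").fetchall()

print(rows)
```

The zero-count row for 8232 is dropped because the counted row ranks first within that ID, while IDs that only appear with a zero count are kept.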

Select columns maximum and minimum value for all records

I have a table as below. For every record, I want to get the names of the columns holding the maximum and minimum values, excluding the population column (which would of course always have the maximum value).
State Population age_below_18 age_18_to_50 age_50_above
1 1000 250 600 150
2 4200 400 300 3500
Result :
State Population Maximum_group Minimum_group Max_value Min_value
1 1000 age_18_to_50 age_50_above 600 150
2 4200 age_50_above age_18_to_50 3500 300
Assuming none of the values are NULL, you can use greatest() and least():
select state, population,
(case when age_below_18 = greatest(age_below_18, age_18_to_50, age_50_above)
then 'age_below_18'
when age_18_to_50 = greatest(age_below_18, age_18_to_50, age_50_above)
then 'age_18_to_50'
when age_50_above = greatest(age_below_18, age_18_to_50, age_50_above)
then 'age_50_above'
end) as maximum_group,
(case when age_below_18 = least(age_below_18, age_18_to_50, age_50_above)
then 'age_below_18'
when age_18_to_50 = least(age_below_18, age_18_to_50, age_50_above)
then 'age_18_to_50'
when age_50_above = least(age_below_18, age_18_to_50, age_50_above)
then 'age_50_above'
end) as minimum_group,
greatest(age_below_18, age_18_to_50, age_50_above) as maximum_value,
least(age_below_18, age_18_to_50, age_50_above) as minimum_value
from t;
If your result set is actually being generated by a query, there is likely a better approach.
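The greatest()/least() logic is easy to sanity-check outside the database. The following is only an illustrative sketch in Python, using the two sample rows from the question; the helper name extreme_groups is hypothetical:

```python
# Sample rows copied from the question's table.
rows = [
    {"state": 1, "population": 1000,
     "age_below_18": 250, "age_18_to_50": 600, "age_50_above": 150},
    {"state": 2, "population": 4200,
     "age_below_18": 400, "age_18_to_50": 300, "age_50_above": 3500},
]
AGE_COLUMNS = ["age_below_18", "age_18_to_50", "age_50_above"]


def extreme_groups(row, columns=AGE_COLUMNS):
    """Return the names and values of the largest and smallest age columns.

    On ties, max()/min() return the first matching column, which mirrors
    the first matching WHEN branch of the CASE expression.
    """
    max_col = max(columns, key=lambda col: row[col])
    min_col = min(columns, key=lambda col: row[col])
    return {
        "state": row["state"],
        "population": row["population"],
        "maximum_group": max_col,
        "minimum_group": min_col,
        "maximum_value": row[max_col],
        "minimum_value": row[min_col],
    }


results = [extreme_groups(row) for row in rows]
for result in results:
    print(result)
```

For the sample data this reproduces the result table from the question: age_18_to_50 / age_50_above for state 1, and age_50_above / age_18_to_50 for state 2.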
An alternative method "unpivots" the data and then reaggregates:
select state, population,
max(which) keep (dense_rank first order by val desc) as maximum_group,
max(which) keep (dense_rank first order by val asc) as minimum_group,
max(val) as maximum_value,
min(val) as minimum_value
from ((select state, population, 'age_below_18' as which, age_below_18 as val
from t
) union all
(select state, population, 'age_18_to_50' as which, age_18_to_50 as val
from t
) union all
(select state, population, 'age_50_above' as which, age_50_above as val
from t
)
) t
group by state, population;
This approach would likely perform worse than the first, although it is perhaps easier to maintain as the number of columns increases. However, Oracle 12c supports lateral joins, where a similar approach would have competitive performance.
with CTE as (
select T.*
--step 2: rank the values within each State
,RANK() OVER (PARTITION BY "State", "Population" order by "value") "rk"
from (
--step 1: union the three columns into one column
select
"State", "Population",
'age_below_18' as GroupName,
"age_below_18" as "value"
from TestTable
union all
select
"State", "Population",
'age_18_to_50' as GroupName,
"age_18_to_50" as "value"
from TestTable
union all
select
"State", "Population",
'age_50_above' as GroupName,
"age_50_above" as "value"
from TestTable
) T
)
select T1."State", T1."Population"
,T3.GroupName Maximum_group
,T4.GroupName Minimum_group
,T3."value" Max_value
,T4."value" Min_value
--step 3: the max rank gives the max value, the min rank gives the min value
from (select "State", "Population",max( "rk") as Max_rank from CTE group by "State", "Population") T1
left join (select "State", "Population",min( "rk") as Min_rank from CTE group by "State", "Population") T2
on T1."State" = T2."State" and T1."Population" = T2."Population"
left join CTE T3 on T3."State" = T1."State" and T3."Population" = T1."Population" and T1.Max_rank = T3."rk"
left join CTE T4 on T4."State" = T2."State" and T4."Population" = T2."Population" and T2.Min_rank = T4."rk"
Hope it helps you :)
Another option: use a combination of UNPIVOT, which "rotates columns into rows" (see the Oracle documentation), and analytic functions, which "compute an aggregate value based on a group of rows" (also in the documentation), e.g.
Test data
select * from T ;
STATE POPULATION YOUNGERTHAN18 BETWEEN18AND50 OVER50
1 1000 250 600 150
2 4200 400 300 3500
UNPIVOT
select *
from T
unpivot (
quantity for agegroup in (
youngerthan18 as 'youngest'
, between18and50 as 'middleaged'
, over50 as 'oldest'
)
);
-- result
STATE POPULATION AGEGROUP QUANTITY
1 1000 youngest 250
1 1000 middleaged 600
1 1000 oldest 150
2 4200 youngest 400
2 4200 middleaged 300
2 4200 oldest 3500
Include Analytic Functions
select distinct
state
, population
, max( quantity ) over ( partition by state ) maxq
, min( quantity ) over ( partition by state ) minq
, first_value ( agegroup ) over ( partition by state order by quantity desc ) biggest_group
, first_value ( agegroup ) over ( partition by state order by quantity ) smallest_group
from T
unpivot (
quantity for agegroup in (
youngerthan18 as 'youngest'
, between18and50 as 'middleaged'
, over50 as 'oldest'
)
)
;
-- result
STATE POPULATION MAXQ MINQ BIGGEST_GROUP SMALLEST_GROUP
1 1000 600 150 middleaged oldest
2 4200 3500 300 oldest middleaged
Example tested w/ Oracle 11g (see dbfiddle) and Oracle 12c.
Caution: {1} The column headings need adjusting according to your requirements. {2} If there are NULLs in your original table, you should adjust the query, e.g. by using NVL().
An advantage of the described approach is: the code will remain rather clear, even if more 'categories' are used. Eg when working with 11 age groups, the query may look something like ...
select distinct
state
, population
, max( quantity ) over ( partition by state ) maxq
, min( quantity ) over ( partition by state ) minq
, first_value ( agegroup ) over ( partition by state order by quantity desc ) biggest_group
, first_value ( agegroup ) over ( partition by state order by quantity ) smallest_group
from T
unpivot (
quantity for agegroup in (
y10 as 'youngerthan10'
, b10_20 as 'between10and20'
, b20_30 as 'between20and30'
, b30_40 as 'between30and40'
, b40_50 as 'between40and50'
, b50_60 as 'between50and60'
, b60_70 as 'between60and70'
, b70_80 as 'between70and80'
, b80_90 as 'between80and90'
, b90_100 as 'between90and100'
, o100 as 'over100'
)
)
order by state
;
See dbfiddle.
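For readers without an Oracle instance at hand, the same idea can be tried in SQLite through Python's sqlite3 module. This is a sketch under two assumptions: the SQLite build supports window functions (3.25 or newer), and, because SQLite has no UNPIVOT, the rotation is swapped for a UNION ALL. The table and column names follow the test data above:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE t (state INTEGER, population INTEGER,
                youngerthan18 INTEGER, between18and50 INTEGER, over50 INTEGER);
INSERT INTO t VALUES (1, 1000, 250, 600, 150), (2, 4200, 400, 300, 3500);
""")

rows = conn.execute("""
WITH unpivoted AS (
    -- UNION ALL stands in for Oracle's UNPIVOT here.
    SELECT state, population, 'youngest' AS agegroup, youngerthan18 AS quantity FROM t
    UNION ALL
    SELECT state, population, 'middleaged', between18and50 FROM t
    UNION ALL
    SELECT state, population, 'oldest', over50 FROM t
)
SELECT DISTINCT
       state,
       population,
       MAX(quantity) OVER (PARTITION BY state) AS maxq,
       MIN(quantity) OVER (PARTITION BY state) AS minq,
       FIRST_VALUE(agegroup) OVER (PARTITION BY state ORDER BY quantity DESC) AS biggest_group,
       FIRST_VALUE(agegroup) OVER (PARTITION BY state ORDER BY quantity) AS smallest_group
FROM unpivoted
ORDER BY state
""").fetchall()

for row in rows:
    print(row)
```

With the sample data, the rows match the Oracle result shown earlier in this answer.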

Syntax error in FROM clause - MS ACCESS

I am working with a tool that extracts some data from an Access database, so I am writing a query to get this data.
Below is the code I am currently working on.
I am getting an error: Syntax error in FROM clause
I can't seem to find where the query is going wrong. I would appreciate any help! Thank you.
EDIT: here is my actual query
SELECT table_freq.*, IIF(table_freq.txn_ctr > (table_ave_freq.ave_freq * 3), "T", "F") as suspicious_flag
FROM
(
SELECT tbl_TransactionHistory.client_num, tbl_TransactionHistory.client_name,
tbl_TransactionHistory.transaction_date, Count(tbl_TransactionHistory.client_num) AS txn_ctr
FROM tbl_TransactionHistory
GROUP BY tbl_TransactionHistory.client_num, tbl_TransactionHistory.client_name,
tbl_TransactionHistory.transaction_date
) AS table_freq
INNER JOIN
(
SELECT table_total_freq.client_num, total_txn_ctr as TotalTransactionFrequency, total_no_days as TotalTransactionDays,
(table_total_freq.total_txn_ctr)/(table_no_of_days.total_no_days) AS ave_freq
FROM
(
(
SELECT client_num, SUM(txn_ctr) AS total_txn_ctr
FROM
(
SELECT client_num, client_name, transaction_date, COUNT(client_num) AS txn_ctr
FROM tbl_TransactionHistory
GROUP BY client_num, client_name, transaction_date
) AS tabFreq
GROUP BY client_num
) AS table_total_freq
INNER JOIN
(
SELECT client_num, COUNT(txn_date) as total_no_days
FROM
(
SELECT DISTINCT(transaction_date) as txn_date, client_num
FROM tbl_TransactionHistory
ORDER BY client_num
) AS table1
GROUP BY client_num
) AS table_no_of_days
ON table_total_freq.client_num = table_no_of_days.client_num
)
) AS table_ave_freq
ON table_freq.client_num = table_ave_freq.client_num

SQL query top 2 columns of joined table?

I am having no luck attempting to get the top (x number) of rows from a joined table. I want the top 2 resources (ordered by name) which in this case should be Katie and Simon and regardless of what I've tried, I can't seem to get it right. You can see below what I've commented out - and what looks like it should work (but doesn't). I cannot use a union. Any ideas?
select distinct
RTRESOURCE.RNAME as Resource,
RTTASK.TASK as taskname, SUM(distinct SOTRAN.QTY2BILL) AS quantitytobill from SOTRAN AS SOTRAN INNER JOIN RTTASK AS RTTASK ON sotran.taskid = rttask.taskid
left outer JOIN RTRESOURCE AS RTRESOURCE ON rtresource.keyno=sotran.resid
WHERE sotran.phantom<>'y' and sotran.pgroup = 'L' and sotran.timesheet = 'y' and sotran.taskid >0 AND RTRESOURCE.KEYNO in ('193','159','200') AND ( SOTRAN.ADDDATE>='8/15/2015 12:00:00 AM' AND SOTRAN.ADDDATE<'9/3/2015 11:59:59 PM' )
//and RTRESOURCE.RNAME in ( select distinct top 2 RTRESOURCE.RNAME from RTRESOURCE order by RTRESOURCE.RNAME)
//and ( select count(*) from RTRESOURCE RTRESOURCE2 where RTRESOURCE2.RNAME = RTRESOURCE.RNAME ) <= 2
GROUP BY RTRESOURCE.rname,RTTASK.task,RTTASK.taskid,RTTASK.mdsstring ORDER BY Resource,taskname
You should provide a schema.
But let's assume your query works. You can wrap it in a CTE:
WITH yourQuery as (
SELECT *
FROM <your big join query>
), maxBill as (
SELECT Resource, Max(quantitytobill) as Bill
FROM yourQuery
GROUP BY Resource
)
SELECT top 2 *
FROM maxBill
ORDER BY Bill DESC
If you want the top 2 alphabetically:
WITH yourQuery as (
SELECT *
FROM <your big join query>
), Names as (
SELECT distinct Resource
FROM yourQuery
)
SELECT top 2 *
FROM Names
ORDER BY Resource
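To get the task rows back for just those two names, the names CTE can be joined to the detail rows. The sketch below runs in SQLite via Python's sqlite3 module, so it uses LIMIT 2 where SQL Server would use TOP 2. The billed table and its rows are invented purely for illustration ("Tom" is a hypothetical third resource):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical stand-in for the result of the big join query.
CREATE TABLE billed (resource TEXT, taskname TEXT, quantitytobill REAL);
INSERT INTO billed VALUES
    ('Simon', 'Design', 2.0),
    ('Katie', 'Testing', 1.5),
    ('Tom', 'Design', 3.0),
    ('Katie', 'Design', 4.0);
""")

rows = conn.execute("""
WITH names AS (
    -- First two distinct resource names in alphabetical order.
    SELECT DISTINCT resource FROM billed ORDER BY resource LIMIT 2
)
SELECT b.resource, b.taskname, b.quantitytobill
FROM billed AS b
JOIN names AS n ON n.resource = b.resource
ORDER BY b.resource, b.taskname
""").fetchall()

for row in rows:
    print(row)
```

Only the Katie and Simon rows survive the join; every task for each of them is kept.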

Sum of working days with date ranges from multiple records (overlapping)

suppose there are records as follows:
Employee_id, work_start_date, work_end_date
1, 01-jan-2014, 07-jan-2014
1, 03-jan-2014, 12-jan-2014
1, 23-jan-2014, 25-jan-2014
2, 15-jan-2014, 25-jan-2014
2, 07-jan-2014, 15-jan-2014
2, 09-jan-2014, 12-jan-2014
The requirement is to write an SQL select statement which summarizes the work days grouped by employee_id but excludes the overlapped periods (meaning: count them only once).
The desired output would be:
Employee_id, worked_days
1, 13
2, 18
The calculations for working days in the date range are done like this:
If work_start_date = 5 and work_end_date = 9 then worked_days = 4 (9 - 5).
I could write a pl/sql function which solves this (manually iterating over the records and doing the calculation), but I'm sure it can be done using SQL for better performance.
Can someone please point me in the right direction?
Thanks!
This is a slightly modified query from a similar question:
compute sum of values associated with overlapping date ranges
SELECT "Employee_id",
SUM( "work_end_date" - "work_start_date" )
FROM(
SELECT "Employee_id",
"work_start_date" ,
lead( "work_start_date" )
over (Partition by "Employee_id"
Order by "Employee_id", "work_start_date" )
As "work_end_date"
FROM (
SELECT "Employee_id", "work_start_date"
FROM Table1
UNION
SELECT "Employee_id","work_end_date"
FROM Table1
) x
) x
WHERE EXISTS (
SELECT 1 FROM Table1 t
WHERE t."Employee_id" = x."Employee_id"
AND t."work_start_date" <= x."work_start_date"
AND t."work_end_date" >= x."work_end_date"
)
GROUP BY "Employee_id"
;
Demo: http://sqlfiddle.com/#!4/4fcce/2
This is a tricky problem. For instance, you can't use lag(), because the overlapping period may not be the "previous" one, and different periods can start and/or stop on the same day.
The idea is to reconstruct the periods. How to do this? Find the records where a period starts -- that is, records whose start date is not covered by any record that starts earlier. Then use this as a flag and sum the flag cumulatively to number the groups of overlapping records. Getting the working days is then just aggregation:
with ps as (
select e.*,
(case when exists (select 1
from emps e2
where e2.employee_id = e.employee_id and
e2.work_start_date < e.work_start_date and
e2.work_end_date >= e.work_start_date
)
then 0 else 1
end) as IsPeriodStart
from emps e
)
select employee_id, sum(work_end_date - work_start_date) as Days_Worked
from (select employee_id, min(work_start_date) as work_start_date,
max(work_end_date) as work_end_date
from (select ps.*,
sum(IsPeriodStart) over (partition by employee_id
order by work_start_date
) as grp
from ps
) ps
group by employee_id, grp
) ps
group by employee_id;
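The "reconstruct the periods" idea can be checked against the question's expected output with a few lines of procedural code. This is a sketch only, written in Python with the sample rows from the question; worked_days is a hypothetical helper name, and days are counted as end minus start, as the question specifies:

```python
from datetime import date


def worked_days(periods):
    """Merge overlapping (start, end) periods and sum end - start in days."""
    total = 0
    current_start = current_end = None
    for start, end in sorted(periods):
        if current_end is None or start > current_end:
            # A new, non-overlapping period begins: close the previous one.
            if current_end is not None:
                total += (current_end - current_start).days
            current_start, current_end = start, end
        else:
            # Overlaps (or touches) the current period: extend it.
            current_end = max(current_end, end)
    if current_end is not None:
        total += (current_end - current_start).days
    return total


# Sample data copied from the question.
employee_1 = [
    (date(2014, 1, 1), date(2014, 1, 7)),
    (date(2014, 1, 3), date(2014, 1, 12)),
    (date(2014, 1, 23), date(2014, 1, 25)),
]
employee_2 = [
    (date(2014, 1, 15), date(2014, 1, 25)),
    (date(2014, 1, 7), date(2014, 1, 15)),
    (date(2014, 1, 9), date(2014, 1, 12)),
]

print(worked_days(employee_1))  # 13
print(worked_days(employee_2))  # 18
```

Both totals agree with the desired output in the question, which makes this a handy reference when testing the SQL versions.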
First, a date_tbl type:
create or replace package RG_TYPE is
type date_tbl is table of date;
end;
Then a pipelined function that returns the dates between its two parameters as a table:
create or replace function dates
(
p_from date,
p_to date
) return rg_type.date_tbl pipelined
is
l_idx date:=p_from;
begin
loop
if l_idx>nvl(p_to,p_from) then
exit;
end if;
pipe row(l_idx);
l_idx:=l_idx+1;
end loop;
return;
end;
SQL:
select employee_id,sum(c)
from
(select e.employee_id,d.column_value,count(distinct w.employee_id) as c
from (select distinct employee_id from works) e,
table(dates((select min(work_start_date) as a from works),(select max(work_end_date) as b from works))) d,
works w
where e.employee_id=w.employee_id
and d.column_value>=w.work_start_date
and d.column_value<w.work_end_date
group by e.employee_id,d.column_value) Sub
group by employee_id
order by 1,2
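The query above counts each calendar day at most once per employee. The same day-by-day counting can be sketched in Python as a cross-check; worked_days_by_enumeration is a made-up name, and a day counts when start <= day < end, matching the conditions in the WHERE clause:

```python
from datetime import date, timedelta


def worked_days_by_enumeration(periods):
    """Count the distinct days d with start <= d < end over all periods."""
    covered = set()
    for start, end in periods:
        day = start
        while day < end:
            covered.add(day)  # a set keeps each overlapped day only once
            day += timedelta(days=1)
    return len(covered)


# Sample data copied from the question.
employee_1 = [
    (date(2014, 1, 1), date(2014, 1, 7)),
    (date(2014, 1, 3), date(2014, 1, 12)),
    (date(2014, 1, 23), date(2014, 1, 25)),
]
employee_2 = [
    (date(2014, 1, 15), date(2014, 1, 25)),
    (date(2014, 1, 7), date(2014, 1, 15)),
    (date(2014, 1, 9), date(2014, 1, 12)),
]

print(worked_days_by_enumeration(employee_1))  # 13
print(worked_days_by_enumeration(employee_2))  # 18
```

Enumerating days is simple but does work proportional to the length of the date range, so it tends to suit short ranges better than long ones.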