Query Optimization, Case and Concatenation - sql

I am new to SQL and I need to optimize a query that takes about 15 minutes to run on Hadoop. My boss told me to start with the CASE expression:
SELECT
CONCAT (ZSYTMM_AE1.NUMNF, 'A', ZSYTMM_AE1.SERIE) AS receipt,
CONCAT (ZSYTMM_AE1.NUMNF, 'A', ZSYTMM_AE1.SERIE) AS external_receipt,
zsytsd_pa_plant.plant as plant,
ZSYTMM_AE1.LIFNR as suppliercode,
(CASE
WHEN SUBSTRING ( ZSYTMM_AE1.LIFNR, 1 , 2 ) = 10 THEN 'VI'
WHEN SUBSTRING ( ZSYTMM_AE1.LIFNR, 1 , 2 ) = 11 THEN 'VI'
WHEN SUBSTRING ( ZSYTMM_AE1.LIFNR, 1 , 2 ) = 12 THEN 'VI'
WHEN SUBSTRING ( ZSYTMM_AE1.LIFNR, 1 , 2 ) = 13 THEN 'MI'
WHEN SUBSTRING ( ZSYTMM_AE1.LIFNR, 1 , 2 ) = 14 THEN 'MI'
END) as reference_type
FROM dl_edge_base_sapprdmerc_11324_base_prd_sapr3.ZSYTMM_AE1 ZSYTMM_AE1
inner join dl_edge_base_sapprdmerc_11324_base_prd_sapr3.zsytsd_pa_plant zsytsd_pa_plant
ON zsytmm_ae1.werks = zsytsd_pa_plant.plant
Any ideas on how to improve the CASE or the concatenation?
Thx
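One possible tidy-up is sketched below. It is run in SQLite through Python's built-in sqlite3 module only so it can be executed anywhere, and it assumes LIFNR is stored as text. The prefix is extracted once in a subquery and compared with string literals (avoiding the implicit string-to-number comparison in `SUBSTRING(...) = 10`), the five WHEN branches collapse into two IN lists, and the receipt string is built once and reused. The sample rows are made up, and `||` stands in for Hive's CONCAT. These changes mainly help readability; the CASE expression itself is probably not where most of the 15 minutes goes, since scanning and joining the two tables usually dominates.

```python
import sqlite3

# Hypothetical sample rows; the real tables live in Hive under
# dl_edge_base_sapprdmerc_11324_base_prd_sapr3.
AE1_ROWS = [
    ("0001", "A1", "1000042", "P1"),  # LIFNR prefix '10' -> 'VI'
    ("0002", "B2", "1300007", "P1"),  # LIFNR prefix '13' -> 'MI'
    ("0003", "C3", "9900001", "P2"),  # no matching prefix -> NULL
]
PLANT_ROWS = [("P1",), ("P2",)]

QUERY = """
SELECT a.receipt,
       a.receipt AS external_receipt,            -- reuse instead of concatenating twice
       p.plant,
       a.lifnr AS suppliercode,
       CASE
           WHEN a.prefix IN ('10', '11', '12') THEN 'VI'   -- text compared with text
           WHEN a.prefix IN ('13', '14') THEN 'MI'
       END AS reference_type
FROM (
    SELECT numnf || 'A' || serie AS receipt,     -- Hive: CONCAT(numnf, 'A', serie)
           lifnr,
           werks,
           substr(lifnr, 1, 2) AS prefix         -- prefix written once
    FROM zsytmm_ae1
) AS a
JOIN zsytsd_pa_plant AS p ON a.werks = p.plant
ORDER BY a.receipt
"""


def run_query():
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE zsytmm_ae1 (numnf TEXT, serie TEXT, lifnr TEXT, werks TEXT)")
    con.execute("CREATE TABLE zsytsd_pa_plant (plant TEXT)")
    con.executemany("INSERT INTO zsytmm_ae1 VALUES (?, ?, ?, ?)", AE1_ROWS)
    con.executemany("INSERT INTO zsytsd_pa_plant VALUES (?)", PLANT_ROWS)
    rows = con.execute(QUERY).fetchall()
    con.close()
    return rows


if __name__ == "__main__":
    for row in run_query():
        print(row)
```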

Related

How to put a grouping variable into columns in SQL?

I have the following dataset
and want to get this.
How can I do it?
In SQL Server, you can use PIVOT, for example:
SELECT Time, [a],[b],[c]
FROM
(
SELECT time, [group],value
FROM dataset) d
PIVOT
(
SUM(value)
FOR [group] IN ([a],[b],[c])
) AS pvt
You can try it on the following fiddle.
I changed the column names so they don't conflict with reserved words. Otherwise you would have to put them in double quotes (single quotes denote string literals, not identifiers).
WITH
-- the input
indata(grp,tm,val) AS (
SELECT 'a',1,44
UNION ALL SELECT 'a',2,22
UNION ALL SELECT 'a',3, 1
UNION ALL SELECT 'b',1, 1
UNION ALL SELECT 'b',2, 5
UNION ALL SELECT 'b',3, 6
UNION ALL SELECT 'c',1, 7
UNION ALL SELECT 'c',2, 8
UNION ALL SELECT 'c',3, 9
)
SELECT tm
, SUM(CASE grp WHEN 'a' THEN val END) AS a
, SUM(CASE grp WHEN 'b' THEN val END) AS b
, SUM(CASE grp WHEN 'c' THEN val END) AS c
FROM indata
GROUP BY tm
;
tm | a | b | c
----+----+---+---
1 | 44 | 1 | 7
2 | 22 | 5 | 8
3 | 1 | 6 | 9
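select * from
(
    -- no GROUP BY here: PIVOT does the aggregation, and grouping by value
    -- first would collapse duplicate rows before they are summed
    select time, [group], [value]
    from yourTable
) as src
pivot
(
    sum([value])
    for [group] in ([a],[b],[c])
) as p
order by time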
This is the result
For Vertica:
SELECT time
, SUM(value) FILTER (WHERE "group" = 'a') AS a
, SUM(value) FILTER (WHERE "group" = 'b') AS b
, SUM(value) FILTER (WHERE "group" = 'c') AS c
FROM yourTable
GROUP BY time
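The FILTER variant can be tried out in SQLite (3.30 or newer supports FILTER on aggregates) through Python's sqlite3 module. This is a sketch using the same sample data as the CTE answer above, with the table name `yourTable` taken from the answers. Two details matter: `group` is a reserved word, so it must be double-quoted as an identifier, and the compared values must be string literals ('a'), otherwise `a` is read as a column name. Whether your Vertica version accepts FILTER is worth checking; the SUM(CASE ...) form above is the portable fallback.

```python
import sqlite3

# Same sample data as the CTE answer (grp/tm/val renamed to group/time/value).
ROWS = [
    (1, "a", 44), (2, "a", 22), (3, "a", 1),
    (1, "b", 1), (2, "b", 5), (3, "b", 6),
    (1, "c", 7), (2, "c", 8), (3, "c", 9),
]


def pivot_with_filter():
    con = sqlite3.connect(":memory:")
    # "group" is reserved, so it is double-quoted wherever it is used as a column
    con.execute('CREATE TABLE yourTable ("time" INTEGER, "group" TEXT, value INTEGER)')
    con.executemany("INSERT INTO yourTable VALUES (?, ?, ?)", ROWS)
    rows = con.execute("""
        SELECT "time",
               SUM(value) FILTER (WHERE "group" = 'a') AS a,
               SUM(value) FILTER (WHERE "group" = 'b') AS b,
               SUM(value) FILTER (WHERE "group" = 'c') AS c
        FROM yourTable
        GROUP BY "time"
        ORDER BY "time"
    """).fetchall()
    con.close()
    return rows
```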

Query Optimization - To repeat a pattern in Oracle SQL

I can do this in MS Excel in about a minute, but I'm trying to do it in Oracle SQL.
Here is my code:
SELECT A.*, (CASE WHEN A.r = 1 then 'X1' when A.r = 2 then 'X2' when A.r = 3 then 'X3' when A.r = 4
then 'X4' when A.r = 5 then 'X2' when A.r = 6 then 'X6' end) X FROM
(
Select Rownum r
From dual
Connect By Rownum <= 6 ) A
This is the output:
Now, what if I have to do it for 25000 numbers, i.e. with rownum <= 25000? Currently I have it only for 6. Is there a better way to do this without a CASE expression?
If you want to repeat this pattern of 6 rows for the remaining rows, then you can do:
select t.*,
(case when mod(rownum, 6) = 5 then 'X2'
else 'X' || (mod(rownum - 1, 6) + 1)
end) as x
from t;
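The answer reads from a table t that already has enough rows. If no such table exists, the rows can be generated; in Oracle, the question's own `select rownum r from dual connect by rownum <= 25000` works as that row source. Below is a sketch of the same idea in SQLite via Python's sqlite3 module, where a recursive CTE replaces CONNECT BY and `%` replaces MOD. The function name `repeat_pattern` is hypothetical.

```python
import sqlite3

# Recursive CTE yields 1..n (stands in for Oracle's CONNECT BY ROWNUM <= n);
# the label depends only on the row's position within each block of 6.
QUERY = """
WITH RECURSIVE nums(r) AS (
    SELECT 1
    UNION ALL
    SELECT r + 1 FROM nums WHERE r < ?
)
SELECT r,
       CASE WHEN r % 6 = 5 THEN 'X2'          -- 5th slot repeats X2, as in the question
            ELSE 'X' || ((r - 1) % 6 + 1)     -- otherwise X1..X6 by position
       END AS x
FROM nums
ORDER BY r
"""


def repeat_pattern(n):
    con = sqlite3.connect(":memory:")
    rows = con.execute(QUERY, (n,)).fetchall()
    con.close()
    return rows
```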

I need help converting a T-SQL query to an Oracle-compatible query

I am running this query in SQL Server and it works fine, but when I run it in Oracle it does not give the same results.
My attached photo shows the data of one customer, who has codes 1, 2, 4 and 8. He should get 0.70 for having codes 1, 2, 4 and 0.75 for having code 8, so after multiplication the query should return 0.52 (0.70 * 0.75 = 0.525). I tried it in Oracle by replacing ISNULL with NVL, but it returned 1 instead of 0.52. Please help me convert this into an Oracle query that returns the same results.
Here is my query
SELECT [id] ,[name],r = isnull(nullif(
max(CASE WHEN [code] IN (1,2,4) then 0.70 else 0 end)
,0),1)
* isnull(nullif(
min(CASE WHEN [code] IN (1,2) then 0 else 1 end)
* max(CASE WHEN [code] IN (4) then 0.20 else 0 end)
,0),1)
* isnull(nullif(
max(CASE WHEN [code] IN (8) then 0.75 else 0 end)
,0),1)
FROM (values (1, 'ali',4)
,(1, 'ali',1)
,(1, 'ali',8)
,(1, 'ali',2)
,(2, 'sunny',1)
,(4, 'arslan',4)) as t(id, name,code)
GROUP BY id, name;
Since you are multiplying scores, we first need to decide what the score is if none of the codes match. I suppose it should be 0.
Next, we split all possible codes into independent groups, i.e. groups whose result does not depend on the members of other groups. Here they are (1,2,4) and (8). Then we define the rule for each group.
So
SELECT [id] ,[name],r =
-- At least one of values needed to get score > 0
MAX(CASE WHEN code IN (1,2,4, 8) THEN 1.0 ELSE 0.0 END) *
-- Now rules for every independent set of codes. Rule should return score if matched or 1.0 if not matched
-- (1,2,4)
coalesce(MAX(CASE WHEN [code] IN (1,2,4) THEN 0.70 END), 1.0 ) *
-- (8)
coalesce(MAX(CASE WHEN [code] IN (8) THEN 0.75 END), 1.0)
-- more ?
FROM (values (1, 'ali',4)
,(1, 'ali',1)
,(1, 'ali',8)
,(1, 'ali',2)
,(2, 'sunny',1)
,(4, 'arslan',4)) as t(id, name,code)
GROUP BY id, name;
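Because the grouped version uses only COALESCE, MAX and CASE, it is close to portable SQL. The sketch below runs it in SQLite through Python's sqlite3 module with the question's data; the only changes are dropping the [] brackets, using the standard `AS r` alias, and declaring the inline VALUES rows in a CTE (SQLite does not accept a column list on a derived-table alias). The function name `scores` is hypothetical.

```python
import sqlite3

QUERY = """
WITH t(id, name, code) AS (
    VALUES (1, 'ali', 4), (1, 'ali', 1), (1, 'ali', 8), (1, 'ali', 2),
           (2, 'sunny', 1), (4, 'arslan', 4)
)
SELECT id, name,
       MAX(CASE WHEN code IN (1, 2, 4, 8) THEN 1.0 ELSE 0.0 END)        -- 0.0 if no known code
       * COALESCE(MAX(CASE WHEN code IN (1, 2, 4) THEN 0.70 END), 1.0)  -- group (1,2,4)
       * COALESCE(MAX(CASE WHEN code = 8 THEN 0.75 END), 1.0) AS r      -- group (8)
FROM t
GROUP BY id, name
ORDER BY id
"""


def scores():
    con = sqlite3.connect(":memory:")
    rows = con.execute(QUERY).fetchall()
    con.close()
    return rows
```

Each independent group contributes its score when matched and a neutral 1.0 otherwise, so adding another group is just one more COALESCE factor.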
There are some SQL Server things in the query that are not standard SQL:
[] around column names - remove them; you don't need them here (otherwise you would use standard SQL quotes "")
r = expression - for an alias name. Change this to standard SQL expression AS r
ISNULL(expression, value) - Change this to standard SQL COALESCE(expression, value) or Oracle's NVL(expression, value)
NULLIF(expression, value) - this you can keep; Oracle supports it, too
values (), (), ... - replace with a SELECT FROM DUAL UNION ALL subquery
You get:
select
id,
name,
coalesce(nullif( max(case when code in (1,2,4) then 0.70 else 0 end), 0), 1) *
coalesce(nullif( min(case when code in (1,2) then 0 else 1 end) *
max(case when code in (4) then 0.20 else 0 end) , 0), 1) *
coalesce(nullif( max(case when code in (8) then 0.75 else 0 end), 0), 1) as r
from
(
select 1 as id, 'ali' as name, 4 as code from dual
union all
select 1 as id, 'ali' as name, 1 as code from dual
union all
select 1 as id, 'ali' as name, 8 as code from dual
union all
select 1 as id, 'ali' as name, 2 as code from dual
union all
select 2 as id, 'sunny' as name, 1 as code from dual
union all
select 4 as id, 'arslan' as name, 4 as code from dual
)
group by id, name;
The calculation, however, is unnecessarily complicated:
coalesce(nullif( max(case when code in (1,2,4) then 0.70 else 0 end), 0), 1)
means: if there is at least one match, the result is 0.70; otherwise it is 0, which NULLIF turns into NULL and COALESCE then turns into 1. So it is the same as
min(case when code in (1,2,4) then 0.70 else 1 end)
So, if I am not mistaken, the whole calculation becomes:
case when max(case when code in (1,2) then 1 end) = 1
then 0.7 else min(case when code = 4 then 0.14 else 1 end) end *
min(case when code = 8 then 0.75 else 1 end) as r
or
case when max(case when code in (1,2) then 1 end) = 1 then 0.7
when max(case when code = 4 then 1 end) = 1 then 0.14
else 1
end *
min(case when code = 8 then 0.75 else 1 end) as r
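To check that the simplified CASE form really matches the COALESCE/NULLIF translation, both can be evaluated side by side. This sketch does that in SQLite via Python's sqlite3 module (both expressions use only functions SQLite also has). Besides the question's rows, it adds a hypothetical customer 5 with codes 4 and 8, which exercises the branch where code 4 counts only because neither 1 nor 2 is present. Results are compared with a tolerance because the two forms multiply floating-point factors in a different order.

```python
import sqlite3

# The question's rows plus a hypothetical id 5 that has codes 4 and 8 but not 1 or 2.
ROWS = [
    (1, "ali", 4), (1, "ali", 1), (1, "ali", 8), (1, "ali", 2),
    (2, "sunny", 1), (4, "arslan", 4),
    (5, "test", 4), (5, "test", 8),
]

ORIGINAL = """
    coalesce(nullif(max(case when code in (1,2,4) then 0.70 else 0 end), 0), 1)
  * coalesce(nullif(min(case when code in (1,2) then 0 else 1 end)
                    * max(case when code in (4) then 0.20 else 0 end), 0), 1)
  * coalesce(nullif(max(case when code in (8) then 0.75 else 0 end), 0), 1)"""

SIMPLIFIED = """
    case when max(case when code in (1,2) then 1 end) = 1 then 0.7
         when max(case when code = 4 then 1 end) = 1 then 0.14
         else 1
    end * min(case when code = 8 then 0.75 else 1 end)"""


def compare():
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE t (id INTEGER, name TEXT, code INTEGER)")
    con.executemany("INSERT INTO t VALUES (?, ?, ?)", ROWS)
    sql = (f"SELECT id, {ORIGINAL} AS r_original, {SIMPLIFIED} AS r_simplified "
           "FROM t GROUP BY id, name ORDER BY id")
    rows = con.execute(sql).fetchall()
    con.close()
    return rows
```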
Well, there are many ways to write this.
The code below should give you the answer you expect:
CREATE TABLE #TestData (ID int, Name varchar(10), Code int)
INSERT INTO #TestData (ID, Name, Code)
VALUES
(1,'ali',4)
,(1,'ali',1)
,(1,'ali',8)
,(1,'ali',2)
,(2,'sunny',1)
,(4,'arslan',4)
SELECT DISTINCT
a.id
,a.Name
,COALESCE(b.HasCode1, b.HasCode2, b.HasCode4,1) * COALESCE(b.HasCode8,1) Result
FROM (SELECT ID, Name FROM #TestData GROUP BY ID, Name) a
LEFT JOIN
(
SELECT
ID
,Name
,SUM(CASE WHEN CODE = 1 THEN 0.7 END) HasCode1
,SUM(CASE WHEN CODE = 2 THEN 0.7 END) HasCode2
,SUM(CASE WHEN CODE = 4 THEN 0.7 END) HasCode4
,SUM(CASE WHEN CODE = 8 THEN 0.75 END) HasCode8
FROM #TestData
GROUP BY
ID
,Name
) b
ON a.ID = b.ID
AND a.Name = b.Name
DROP TABLE #TestData
If I understand what you want (i.e. for each of the cases, the id/name combination needs to have all the specified codes), then this will probably do it. You may want to apply TRUNC, FLOOR or ROUND to the val column if you need the answer to 2 decimal places, though:
with t as (select 1 id, 'ali' name, 4 code from dual union all
select 1 id, 'ali' name, 1 code from dual union all
select 1 id, 'ali' name, 8 code from dual union all
select 1 id, 'ali' name, 2 code from dual union all
select 2 id, 'ali' name, 4 code from dual union all
select 2 id, 'ali' name, 8 code from dual union all
select 3 id, 'bob' name, 1 code from dual union all
select 3 id, 'bob' name, 2 code from dual union all
select 3 id, 'bob' name, 8 code from dual),
res as (select id,
name,
case when count(distinct case when code in (1, 2, 4) then code end) = 3 then 0.7
when count(distinct case when code in (1, 2) then code end) = 2 then 0.5
else 1
end case_1_2_and_poss_4,
case when count(distinct case when code = 8 then code end) = 1 then 0.75 else 1 end case_8
from t
group by id, name)
select id,
name,
case_1_2_and_poss_4 * case_8 val
from res;
ID NAME VAL
---------- ---- ----------
1 ali 0.525
2 ali 0.75
3 bob 0.375

SQL query - sum of values by status for date interval

One query is driving me crazy. I have a table like the following, and I want to get the sum of Value by Status for every date in an interval.
Table
Id Name Value Date Status
1 pro1 2 01.04.14 0
2 pro1 8 02.04.14 1
3 pro2 6 02.04.14 1
4 pro3 0 03.04.14 0
5 pro4 7 03.04.14 0
6 pro4 2 03.04.14 0
7 pro4 4 03.04.14 1
8 pro4 6 04.04.14 1
9 pro4 1 04.04.14 1
For example,
Input: Name = pro4, minDate = 01.02.14, maxDate = 04.09.14
Output:
Date     | Sum for Status 0 | Sum for Status 1
---------+------------------+------------------
01.04.14 | 0                | 0
02.04.14 | 0                | 0
03.04.14 | 9 (= 7 + 2)      | 4 (only 4 exists)
04.04.14 | 0                | 7 (= 6 + 1)
On 01.02.14 and 02.04.14, pro4 has no values for either status, but I still want to show those rows because I need all dates in the interval. Can anyone help me create this query?
Edit:
I cannot change the structure; the table already exists with data. Every day appears in the table many times (at least once).
Thanks in advance.
Assuming you have a row for each date in the table, use conditional aggregation:
select date,
sum(Case when name = 'pro4' and status = 0 then Value else 0 end) as values_0,
sum(case when name = 'pro4' and status = 1 then Value else 0 end) as values_1
from Table t
where date >= '2014-04-01' and date <= '2014-04-09'
group by date
order by date;
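Here is a runnable sketch of that conditional aggregation in SQLite via Python's sqlite3 module, using the question's sample rows. Assumptions: the dd.mm.yy dates are rewritten as ISO 'YYYY-MM-DD' strings so they sort and compare correctly, the table is called `mytable` (hypothetical, since `Table` is a reserved word), and the name and date bounds are passed as parameters. The name filter sits inside the CASE, so dates on which pro4 has no rows still appear with zeros, as long as some other product has a row on that date.

```python
import sqlite3

# The question's sample table, dates rewritten from dd.mm.yy to ISO format.
ROWS = [
    (1, "pro1", 2, "2014-04-01", 0),
    (2, "pro1", 8, "2014-04-02", 1),
    (3, "pro2", 6, "2014-04-02", 1),
    (4, "pro3", 0, "2014-04-03", 0),
    (5, "pro4", 7, "2014-04-03", 0),
    (6, "pro4", 2, "2014-04-03", 0),
    (7, "pro4", 4, "2014-04-03", 1),
    (8, "pro4", 6, "2014-04-04", 1),
    (9, "pro4", 1, "2014-04-04", 1),
]


def sums_by_status(name, min_date, max_date):
    con = sqlite3.connect(":memory:")
    con.execute('CREATE TABLE mytable (id INTEGER, name TEXT, value INTEGER, "date" TEXT, status INTEGER)')
    con.executemany("INSERT INTO mytable VALUES (?, ?, ?, ?, ?)", ROWS)
    rows = con.execute("""
        SELECT "date",
               SUM(CASE WHEN name = ? AND status = 0 THEN value ELSE 0 END) AS values_0,
               SUM(CASE WHEN name = ? AND status = 1 THEN value ELSE 0 END) AS values_1
        FROM mytable
        WHERE "date" BETWEEN ? AND ?
        GROUP BY "date"
        ORDER BY "date"
    """, (name, name, min_date, max_date)).fetchall()
    con.close()
    return rows
```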
If you don't have this list of dates, you can take this approach instead:
with dates as (
select cast('2014-04-01' as date) as thedate
union all
select dateadd(day, 1, thedate)
from dates
where thedate < '2014-04-09'
)
select dates.thedate,
sum(Case when status = 0 then Value else 0 end) as values_0,
sum(case when status = 1 then Value else 0 end) as values_1
from dates left outer join
table t
on t.date = dates.thedate and t.name = 'pro4'
group by dates.thedate;
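The same date-series approach can be sketched in SQLite via Python's sqlite3 module, where the recursive CTE needs the RECURSIVE keyword and `date(thedate, '+1 day')` replaces DATEADD. The table name `mytable` and the ISO-formatted copy of the question's rows are assumptions. Note that in SQL Server a recursive CTE stops after 100 levels by default, so a range longer than about 100 days needs OPTION (MAXRECURSION 0) at the end of the T-SQL query.

```python
import sqlite3

# The question's sample table, dates rewritten from dd.mm.yy to ISO format.
ROWS = [
    (1, "pro1", 2, "2014-04-01", 0), (2, "pro1", 8, "2014-04-02", 1),
    (3, "pro2", 6, "2014-04-02", 1), (4, "pro3", 0, "2014-04-03", 0),
    (5, "pro4", 7, "2014-04-03", 0), (6, "pro4", 2, "2014-04-03", 0),
    (7, "pro4", 4, "2014-04-03", 1), (8, "pro4", 6, "2014-04-04", 1),
    (9, "pro4", 1, "2014-04-04", 1),
]


def sums_with_all_dates(name, min_date, max_date):
    con = sqlite3.connect(":memory:")
    con.execute('CREATE TABLE mytable (id INTEGER, name TEXT, value INTEGER, "date" TEXT, status INTEGER)')
    con.executemany("INSERT INTO mytable VALUES (?, ?, ?, ?, ?)", ROWS)
    rows = con.execute("""
        WITH RECURSIVE dates(thedate) AS (
            SELECT date(?)
            UNION ALL
            SELECT date(thedate, '+1 day') FROM dates WHERE thedate < ?
        )
        SELECT dates.thedate,
               SUM(CASE WHEN t.status = 0 THEN t.value ELSE 0 END) AS values_0,
               SUM(CASE WHEN t.status = 1 THEN t.value ELSE 0 END) AS values_1
        FROM dates
        LEFT JOIN mytable AS t
               ON t."date" = dates.thedate AND t.name = ?  -- name filter in ON keeps empty dates
        GROUP BY dates.thedate
        ORDER BY dates.thedate
    """, (min_date, max_date, name)).fetchall()
    con.close()
    return rows
```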
Just a guess at the query:
select date,
sum(case when status = 0 then value else 0 end) as Status0,
sum(case when status = 1 then value else 0 end) as Status1
from table
group by date
To expand on my comment, the complete query is:
WITH [counter](N) AS
(SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1)
, days(N) AS (
SELECT row_number() over (ORDER BY (SELECT NULL)) FROM [counter])
, months (N) AS (
SELECT N - 1 FROM days WHERE N < 13)
, calendar ([date]) AS (
SELECT DISTINCT cast(dateadd(DAY, days.n
, dateadd(MONTH, months.n, '20131231')) AS date)
FROM months
CROSS JOIN days
)
SELECT a.Name
, c.Date
, [Sum of 0] = SUM(CASE Status WHEN 0 THEN Value ELSE 0 END)
, [Sum of 1] = SUM(CASE Status WHEN 1 THEN Value ELSE 0 END)
FROM Calendar c
LEFT JOIN myTable a ON c.Date = a.Date AND a.name = 'pro4'
WHERE c.date BETWEEN '20140201' AND '20140904'
GROUP BY c.Date, a.Name
ORDER BY c.Date
Note that the condition on the name needs to be in the JOIN; in the WHERE clause it would discard the unmatched calendar rows, and you would get only the dates on which pro4 has rows in your table.
If you need multiple years, just add another CTE for the year count and a DATEADD(YEAR, ...) in the calendar CTE.
This is not the exact query, but I think you can get there with a query like:
select date, status, sum(value) from table
where (date between mindate and maxdate) and name = product_name
group by date, status;
this page gives more info.
EDIT
The query above gives only part of the answer the OP needs. A LEFT OUTER JOIN of the original table with the result of that query on the date and status fields fills in the missing rows, as long as date and status are taken from the original table and missing sums are replaced with 0.
For example:
select distinct y.date, y.status, coalesce(x.sum_of_values, 0) as sum_of_values
from table as y
left outer join
(select date, status, sum(value) as sum_of_values
from table
where (date between mindate and maxdate) and name = product_name
group by date, status) as x
on y.date = x.date and y.status = x.status
where y.date between mindate and maxdate
order by y.date;

Group the result from DATEDIFF function in SQL

I have this query:
SELECT DATEDIFF(DAY,wj_date,wj_donedate) AS tmpDay
FROM wssjobm , sysbrxces WHERE wj_br = zu_br AND zu_user = 'mbs'
AND wj_date >= '2013/04/01' AND wj_date <= '2013/04/30'
AND wj_status = 'D' AND wj_donedate <= '2013/04/30'
Example result:
tmpDay
1
11
5
1
7
2
12
10
2
2
How can I group by the result and count it in the query, to get something like this:
tmpDay count
1 2
2 3
5 1
7 1
10 1
11 1
12 1
thanks!
Extra question:
Can I get the result like this?
tmpDayGroup Count
1-4 5
5-8 2
9-12 3
Try:
SELECT
tmpDay,
COUNT(*) [Count]
FROM
YourTable
Group By tmpDay
For the given example, the query would be:
SELECT
DATEDIFF(DAY,wj_date,wj_donedate) AS tmpDay,
COUNT(*) [Count]
FROM
wssjobm , sysbrxces
WHERE
wj_br = zu_br AND
zu_user = 'mbs' AND
wj_date >= '2013/04/01' AND
wj_date <= '2013/04/30' AND
wj_status = 'D' AND
wj_donedate <= '2013/04/30'
GROUP BY DATEDIFF(DAY,wj_date,wj_donedate)
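The key point is that SQL Server does not accept a column alias in GROUP BY, so the DATEDIFF expression is repeated there. A sketch of the same grouping in SQLite via Python's sqlite3 module follows; SQLite has no DATEDIFF, so the difference of julianday() values stands in for it (equivalent for date-only values). The job rows are hypothetical, built so that their durations reproduce the question's example tmpDay values, and the join with sysbrxces is left out.

```python
import sqlite3
from datetime import date, timedelta

# Hypothetical jobs whose durations reproduce the question's tmpDay values.
DURATIONS = [1, 11, 5, 1, 7, 2, 12, 10, 2, 2]


def count_by_duration():
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE wssjobm (wj_date TEXT, wj_donedate TEXT)")
    start = date(2013, 4, 1)
    con.executemany(
        "INSERT INTO wssjobm VALUES (?, ?)",
        [(start.isoformat(), (start + timedelta(days=d)).isoformat()) for d in DURATIONS],
    )
    rows = con.execute("""
        SELECT CAST(julianday(wj_donedate) - julianday(wj_date) AS INTEGER) AS tmpDay,
               COUNT(*) AS cnt
        FROM wssjobm
        -- repeat the expression, as SQL Server requires
        GROUP BY CAST(julianday(wj_donedate) - julianday(wj_date) AS INTEGER)
        ORDER BY tmpDay
    """).fetchall()
    con.close()
    return rows
```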
For grouping the results into ranges, try:
;with T as(
select 1 FrmD, 4 ToD union all
select 5 FrmD, 8 ToD union all
select 9 FrmD, 12 ToD
)
select
cast(T.FrmD as varchar(10)) + '-' + cast(T.ToD as varchar(10)) tmpDayGroup,
COUNT(T2.tmpDay) [Count] -- counts only matched rows, so an empty range shows 0
from T Left Join (
SELECT
DATEDIFF(DAY,wj_date,wj_donedate) AS tmpDay
FROM
wssjobm , sysbrxces
WHERE
wj_br = zu_br AND
zu_user = 'mbs' AND
wj_date >= '2013/04/01' AND
wj_date <= '2013/04/30' AND
wj_status = 'D' AND
wj_donedate <= '2013/04/30'
) T2 on T2.tmpDay between T.FrmD and T.ToD
group by T.FrmD, T.ToD
order by T.FrmD
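The range-table technique can be sketched in SQLite via Python's sqlite3 module, using the example tmpDay values directly. The ranges are the answer's, plus a hypothetical empty range 13-16 added only to show why the count should use a column from the joined side: with a LEFT JOIN, COUNT(*) also counts the unmatched row and would report 1 for an empty range, while COUNT(d.tmpDay) skips the NULL and reports 0. Keeping FrmD and ToD as integers also makes BETWEEN and ORDER BY numeric rather than textual.

```python
import sqlite3

TMP_DAYS = [1, 11, 5, 1, 7, 2, 12, 10, 2, 2]
# Ranges from the answer, plus a hypothetical 13-16 range to show an empty bucket.
RANGES = [(1, 4), (5, 8), (9, 12), (13, 16)]


def count_by_range():
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE durations (tmpDay INTEGER)")
    con.execute("CREATE TABLE ranges (FrmD INTEGER, ToD INTEGER)")
    con.executemany("INSERT INTO durations VALUES (?)", [(d,) for d in TMP_DAYS])
    con.executemany("INSERT INTO ranges VALUES (?, ?)", RANGES)
    rows = con.execute("""
        SELECT r.FrmD || '-' || r.ToD AS tmpDayGroup,
               COUNT(d.tmpDay) AS cnt   -- NULLs from unmatched ranges are not counted
        FROM ranges AS r
        LEFT JOIN durations AS d ON d.tmpDay BETWEEN r.FrmD AND r.ToD
        GROUP BY r.FrmD, r.ToD
        ORDER BY r.FrmD
    """).fetchall()
    con.close()
    return rows
```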