How can I get relational data from another table and calculate the average in QlikView? - qlikview

I have a relationship between two tables:
Table 1 - Process
Table 2 - Process History
Here the relationship is Id(Process table) and ProcessId(Process history table)
I want to calculate the average number of net working days across all the processes.
For eg:
nwd = 0;
count = 0;
if(Process.Id = ProcessHistory.ProcessId && ProcessHistory.Status='Status 3') {
nwd += NWD(Process.CreatedOn, ProcessHistory.CreatedOn);
count++;
}
Expected result AverageNWD = nwd/count;
How can we achieve this?

In the script:
The script below adds a new field, NetWorkingDays, to the data model (in a separate NetWorkingDaysData table that is linked to Process by Id). The field holds the number of working days for each process (Id). With this field in the dataset, it is easier to calculate the average in the UI, with something like sum(NetWorkingDays) / count(distinct Id).
Process:
Load * Inline [
Id, Name , CretedOn
1, Process1, 2019-04-02
2, Process2, 2019-04-05
3, Process3, 2019-05-02
4, Process4, 2019-06-02
];
ProcessHistory:
Load
Id as ProcessHistoryId,
ProcessId as Id,
Status,
CreatedOn as ProcessHistoryCreatedOn
;
Load * Inline [
Id, ProcessId, Status , CreatedOn
1, 1, Status 1, 2019-04-02
2, 1, Status 2, 2019-04-02
3, 1, Status 3, 2019-04-04
4, 2, Status 1, 2019-04-05
5, 2, Status 3, 2019-04-06
6, 3, Status 1, 2019-05-07
7, 3, Status 3, 2019-05-09
8, 4, Status 1, 2019-06-02
9, 4, Status 2, 2019-06-04
10, 4, Status 3, 2019-06-07
];
TempTable:
Load
Id,
min(CretedOn) as MinCreatedOn
Resident
Process
Group By
Id
;
join (TempTable)
Load
Id,
max(ProcessHistoryCreatedOn) as MaxCreatedOn
Resident
ProcessHistory
Where
Status = 'Status 3'
Group By
Id
;
NetWorkingDaysData:
Load
Id,
NetWorkDays(MinCreatedOn, MaxCreatedOn) as NetWorkingDays
Resident
TempTable
;
Drop Table TempTable;
The last part of the script (from inside out):
Create a temporary table that holds min(CretedOn) from the Process table and max(ProcessHistoryCreatedOn) from the ProcessHistory table. ProcessHistory is also filtered to include only records where Status = 'Status 3'. Both tables are aggregated per Id.
TempTable:
Load
Id,
min(CretedOn) as MinCreatedOn
Resident
Process
Group By
Id
;
join (TempTable)
Load
Id,
max(ProcessHistoryCreatedOn) as MaxCreatedOn
Resident
ProcessHistory
Where
Status = 'Status 3'
Group By
Id
;
Once the temp table is created, we can create the final table, in which we calculate the number of net working days using the NetWorkDays function. The NetWorkingDaysData table will have only two fields: Id and NetWorkingDays.
NetWorkingDaysData:
Load
Id,
NetWorkDays(MinCreatedOn, MaxCreatedOn) as NetWorkingDays
Resident
TempTable
;
The final step is to drop TempTable, since it's no longer required.
In the UI:
The same result can be achieved in the UI using the expression below. Bear in mind that the UI approach can consume more resources, since all the calculations are done on the fly (how much depends on the size of your dataset).
avg(
Aggr(
NetWorkDays( min(CretedOn) , max( {< Status = {'Status 3'} >} ProcessHistoryCreatedOn) )
, Id)
)
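As a cross-check on the sample data, here is a minimal Python sketch of the same calculation. It is not QlikView code: the helper net_work_days is a hypothetical stand-in that assumes NetWorkDays counts Monday to Friday dates with both end dates included and no holiday list, and the dictionaries simply restate the inline tables from the script.

```python
from datetime import date, timedelta

def net_work_days(start, end):
    """Count Monday-Friday dates from start to end, both inclusive (no holidays)."""
    days = (end - start).days + 1
    return sum(1 for i in range(max(days, 0))
               if (start + timedelta(days=i)).weekday() < 5)

# Process table: Id -> creation date (copied from the inline load).
process_created = {
    1: date(2019, 4, 2),
    2: date(2019, 4, 5),
    3: date(2019, 5, 2),
    4: date(2019, 6, 2),
}

# ProcessHistory table: (ProcessId, Status, CreatedOn).
history = [
    (1, "Status 1", date(2019, 4, 2)),
    (1, "Status 2", date(2019, 4, 2)),
    (1, "Status 3", date(2019, 4, 4)),
    (2, "Status 1", date(2019, 4, 5)),
    (2, "Status 3", date(2019, 4, 6)),
    (3, "Status 1", date(2019, 5, 7)),
    (3, "Status 3", date(2019, 5, 9)),
    (4, "Status 1", date(2019, 6, 2)),
    (4, "Status 2", date(2019, 6, 4)),
    (4, "Status 3", date(2019, 6, 7)),
]

# Latest 'Status 3' date per process (the filtered max in the script).
last_status3 = {}
for pid, status, created in history:
    if status == "Status 3":
        last_status3[pid] = max(created, last_status3.get(pid, created))

# Working days per process, then the average over processes.
nwd_per_process = {pid: net_work_days(process_created[pid], end)
                   for pid, end in last_status3.items()}
average_nwd = sum(nwd_per_process.values()) / len(nwd_per_process)

print(nwd_per_process)  # {1: 3, 2: 1, 3: 6, 4: 5}
print(average_nwd)      # 3.75
```

Under those assumptions, both the script approach and the UI expression should give 3.75 for this dataset.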

Related

1 distinct row having max value

This is the data I have
I need Unique ID(1 row) with max(Price). So, the output would be:
I have tried the following
select * from table a
join (select b.id,max(b.price) from table b
group by b.id) c on c.id=a.id;
This returns the same rows as in the question, because there is no key to join on. I also tried the other where condition, which returns the original table as well.
You could try something like this in SQL Server:
Table
create table ex1 (
id int,
item char(1),
price int,
qty int,
usr char(2)
);
Data
insert into ex1 values
(1, 'a', 7, 1, 'ab'),
(1, 'a', 7, 2, 'ac'),
(2, 'b', 6, 1, 'ab'),
(2, 'b', 6, 1, 'av'),
(2, 'b', 5, 1, 'ab'),
(3, 'c', 5, 2, 'ab'),
(4, 'd', 4, 2, 'ac'),
(4, 'd', 3, 1, 'av');
Query
select a.* from ex1 a
join (
select id, max(price) as maxprice, min(usr) as minuser
from ex1
group by id
) c
on c.id = a.id
and a.price = c.maxprice
and a.usr = c.minuser
order by a.id, a.usr;
Result
id item price qty usr
1 a 7 1 ab
2 b 6 1 ab
3 c 5 2 ab
4 d 4 2 ac
Explanation
In your dataset, ID 1 has 2 records with the same price. You have to make a decision which one you want. So, in the above example, I am showing a single record for the user whose name is lowest alphabetically.
Alternate method
SQL Server also has the ranking function row_number() over(), which can be used as well:
select * from (
select row_number() over( partition by id order by id, price desc, usr) as sr, *
from ex1
) c where sr = 1;
The subquery says - give me all records from the table and give each row a serial number starting with 1 unique to each ID. The rows should be sorted by ID first, then price descending and then usr. The outer query picks out records with sr number 1.
Example here: https://rextester.com/KZCZ25396
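If you want to try the row_number() idea without a SQL Server instance, the following sketch runs it against an in-memory SQLite database from Python. SQLite is only a stand-in here (it needs version 3.25 or later for window functions), and the column types are simplified.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table ex1 (id int, item text, price int, qty int, usr text)")
conn.executemany(
    "insert into ex1 values (?, ?, ?, ?, ?)",
    [
        (1, "a", 7, 1, "ab"),
        (1, "a", 7, 2, "ac"),
        (2, "b", 6, 1, "ab"),
        (2, "b", 6, 1, "av"),
        (2, "b", 5, 1, "ab"),
        (3, "c", 5, 2, "ab"),
        (4, "d", 4, 2, "ac"),
        (4, "d", 3, 1, "av"),
    ],
)

# Number the rows inside each id: highest price first, ties broken by usr.
# Keeping only row number 1 leaves exactly one row per id.
best_rows = conn.execute(
    """
    select id, item, price, qty, usr
    from (
        select *, row_number() over (partition by id order by price desc, usr) as sr
        from ex1
    )
    where sr = 1
    order by id
    """
).fetchall()

for row in best_rows:
    print(row)
```

The tie-break column (usr here) is the part you would change if you prefer a different row when several share the maximum price.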

How would you find the 'GOOD' ID when cancellation is involved?

Suppose you have the following schema:
CREATE TABLE Data
(
ID INT,
CXL INT
)
INSERT INTO Data (ID, CXL)
SELECT 1, NULL
UNION
SELECT 2, 1
UNION
SELECT 3, 2
UNION
SELECT 5, 3
UNION
SELECT 6, NULL
UNION
SELECT 7, NULL
UNION
SELECT 8, 7
The column CXL is the ID that cancels a particular ID. So, for example, the first row in the table with ID:1 was good until it was cancelled by ID:2 (CXL column). ID:2 was good until it was cancelled by ID:3. ID:3 was good until it was cancelled by ID:5 so in this sequence the last "GOOD" ID was ID:5.
I would like to find all the "GOOD" IDs. In this example, that would be:
Latest GOOD ID
5
6
8
Here's a fiddle if you want to play with this:
http://sqlfiddle.com/#!6/68ac48/1
SELECT D.ID
FROM Data D
WHERE NOT EXISTS(SELECT 1
FROM Data WHERE D.ID = CXL)
select Id
from data
where Id not in (select cxl from data where cxl is not null)
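The two queries above can be compared side by side with a small Python sketch that uses an in-memory SQLite database as a stand-in for SQL Server. It also shows why the NOT IN version needs the "is not null" filter: without it, the NULL values in CXL make the comparison unknown for every row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table Data (ID int, CXL int)")
conn.executemany(
    "insert into Data values (?, ?)",
    [(1, None), (2, 1), (3, 2), (5, 3), (6, None), (7, None), (8, 7)],
)

def ids(sql):
    """Run a query and return its first column as a list."""
    return [row[0] for row in conn.execute(sql)]

# An ID is "GOOD" when no row names it in the CXL column.
not_exists_ids = ids(
    "select D.ID from Data D "
    "where not exists (select 1 from Data C where C.CXL = D.ID) "
    "order by D.ID"
)

not_in_ids = ids(
    "select ID from Data "
    "where ID not in (select CXL from Data where CXL is not null) "
    "order by ID"
)

# Same query without the NULL filter: NOT IN against a set containing NULL
# never evaluates to true, so no rows come back.
not_in_unfiltered = ids("select ID from Data where ID not in (select CXL from Data)")

print(not_exists_ids)     # [5, 6, 8]
print(not_in_ids)         # [5, 6, 8]
print(not_in_unfiltered)  # []
```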

SQL Server 2008, how to check if multi records exist in the DB?

I have 3 tables:
recipe:
id, name
ingredient:
id, name
recipeingredient:
id, recipeId, ingredientId, quantity
Every time a customer creates a new recipe, I need to check the recipeingredient table to verify whether this recipe already exists. If the ingredientId and quantity values are exactly the same, I will tell the customer the recipe already exists. Since I need to check multiple rows, I need help writing this query.
Knowing your ingredients and quantities, you can do something like this:
select recipeId as ExistingRecipeID
from recipeingredient
where (ingredientId = 1 and quantity = 1)
or (ingredientId = 8 and quantity = 1)
or (ingredientId = 13 and quantity = 1)
group by recipeId
having count(*) = 3 --must match # of ingredients in WHERE clause
I originally thought that the following query would find pairs of recipes that have exactly the same ingredients:
select ri1.recipeId, ri2.recipeId
from RecipeIngredient ri1 full outer join
RecipeIngredient ri2
on ri1.ingredientId = ri2.ingredientId and
ri1.quantity = ri2.quantity and
ri1.recipeId < ri2.recipeId
group by ri1.recipeId, ri2.recipeId
having count(ri1.id) = count(ri2.id) and -- same number of ingredients
count(ri1.id) = count(*) and -- all r1 ingredients are present
count(*) = count(ri2.id) -- all r2 ingredients are present
However, this query doesn't count things correctly, because the mismatches don't have the right pairs of ids. Alas.
The following does do the correct comparison. It counts the ingredients in each recipe before the join, so this value can just be compared on all matching rows.
select ri1.recipeId, ri2.recipeId
from (select ri.*, COUNT(*) over (partition by recipeid) as numingredients
from #RecipeIngredient ri
) ri1 full outer join
(select ri.*, COUNT(*) over (partition by recipeid) as numingredients
from #RecipeIngredient ri
) ri2
on ri1.ingredientId = ri2.ingredientId and
ri1.quantity = ri2.quantity and
ri1.recipeId < ri2.recipeId
group by ri1.recipeId, ri2.recipeId
having max(ri1.numingredients) = max(ri2.numingredients) and
max(ri1.numingredients) = count(*)
The having clause guarantees that each recipe has the same number of ingredients, and that the number of matching ingredients equals that total. This time, I've tested it on the following data:
insert into #recipeingredient select 1, 1, 1
insert into #recipeingredient select 1, 2, 10
insert into #recipeingredient select 2, 1, 1
insert into #recipeingredient select 2, 2, 10
insert into #recipeingredient select 2, 3, 10
insert into #recipeingredient select 3, 1, 1
insert into #recipeingredient select 4, 1, 1
insert into #recipeingredient select 4, 3, 10
insert into #recipeingredient select 5, 1, 1
insert into #recipeingredient select 5, 2, 10
If you have a new recipe, you can modify this query to just look for the recipe in one of the tables (say ri1) using an additional condition on the on clause.
If you place the ingredients in a temporary table, you can substitute one of these tables, say ri1, with the new table.
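The temporary-table idea can be sketched roughly as follows. This is an illustration in Python with SQLite standing in for SQL Server, not the query from the answer: the function find_exact_matches and the new_recipe table are hypothetical names, and the sketch assumes each ingredient appears at most once per recipe.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table recipeingredient ("
    "id integer primary key, recipeId int, ingredientId int, quantity int)"
)
# Same test data as above: (recipeId, ingredientId, quantity).
conn.executemany(
    "insert into recipeingredient (recipeId, ingredientId, quantity) values (?, ?, ?)",
    [(1, 1, 1), (1, 2, 10), (2, 1, 1), (2, 2, 10), (2, 3, 10),
     (3, 1, 1), (4, 1, 1), (4, 3, 10), (5, 1, 1), (5, 2, 10)],
)

def find_exact_matches(conn, new_recipe):
    """Return ids of stored recipes with exactly the given {ingredientId: quantity}."""
    conn.execute(
        "create temp table if not exists new_recipe (ingredientId int, quantity int)"
    )
    conn.execute("delete from new_recipe")
    conn.executemany("insert into new_recipe values (?, ?)", new_recipe.items())
    rows = conn.execute(
        """
        select ri.recipeId
        from recipeingredient ri
        left join new_recipe n
          on n.ingredientId = ri.ingredientId
         and n.quantity = ri.quantity
        group by ri.recipeId
        having count(*) = (select count(*) from new_recipe)  -- same number of ingredients
           and count(n.ingredientId) = count(*)              -- every stored row matched
        order by ri.recipeId
        """
    ).fetchall()
    return [row[0] for row in rows]

print(find_exact_matches(conn, {1: 1, 2: 10}))  # [1, 5]
print(find_exact_matches(conn, {1: 1, 3: 10}))  # [4]
print(find_exact_matches(conn, {2: 10}))        # []
```

Checking both the row count and the matched count is what excludes recipe 2, which contains the requested ingredients plus one more.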
You might try something like this to find if you have a duplicate:
-- Setup test data
create table #recipeingredient (
id int not null primary key identity
, recipeId int not null
, ingredientId int not null
, quantity int not null
)
insert into #recipeingredient select 1, 1, 1
insert into #recipeingredient select 1, 2, 10
insert into #recipeingredient select 2, 1, 1
insert into #recipeingredient select 2, 2, 10
-- Actual Query
if exists (
select *
from #recipeingredient old
full outer join #recipeingredient new
on old.recipeId != new.recipeId -- Different recipes
and old.ingredientId = new.ingredientId -- but same ingredients
and old.quantity = new.quantity -- and same quantities
where old.id is null -- Match not found
or new.id is null -- Match not found
)
begin
select cast(0 as bit) as IsDuplicateRecipe
end
else begin
select cast(1 as bit) as IsDuplicateRecipe
end
Since this is really only searching for a duplicate, you might want to substitute a temp table or pass a table variable for the "new" table. This way you wouldn't have to insert the new records before doing your search. You could also insert into the base tables, wrap the whole thing in a transaction and rollback based upon the results.

Most efficient way to select record if a value has changed

What would be the most efficient way to select a record when one of its values has changed?
Ex:
I have an account history table like the one below, where a record is created whenever the account changes:
Id AcctNb Active Created
8 123456 1 01/03/2012
6 123456 0 01/01/2012
I would like to find an efficient way to return the records where the active status has changed since the previous entry.
UPDATE
The query I am using at the moment works, but inefficiently:
select d1.acctNb,d1.active, d2.active
from d044 d1 , d044 d2
where d1.created = '2012-04-14'
and d1.acctNb = d2.acctNb
and d2.created = (select max(d.created) from d044 d where d.acctNb = d2.acctNb and d.id != d1.id)
and (d1.active != d2.active)
Try this:
create table log
(
log_id int identity(1,1) primary key,
acct_id int not null,
active bit not null,
created datetime not null
);
insert into log(acct_id, active,created)
values
(1,1,'January 1, 2012'),
(1,1,'January 2, 2012'),
(1,0,'January 3, 2012'),
(1,0,'January 4, 2012'),
(1,1,'January 5, 2012'),
(2,0,'February 1, 2012'),
(2,1,'February 2, 2012'),
(2,0,'February 3, 2012'),
(2,1,'February 4, 2012'),
(2,1,'February 5, 2012');
The solution:
with serialize as
(
select row_number()
over(partition by acct_id order by created) rx,
*
from log
)
select ds.acct_id,
ds.active ds_active,
pr.active pr_active,
ds.created
from serialize ds -- detect second row
join serialize pr -- previous row
on pr.acct_id = ds.acct_id
and ds.rx = pr.rx + 1
where ds.rx >= 2 and
pr.active <> ds.active
Query output: January 3, January 5, February 2, February 3, February 4
Those are the dates on which a change in active was detected.
The logic is: starting from the second row of each account, we look at the previous row; if the two active values differ (via WHERE pr.active <> ds.active), the row is included in the results.
Live test: http://sqlfiddle.com/#!3/68136/4
2 ways
1)
add a column
update_tsmp timestamp
put a trigger on the table that runs after update or insert
-- checks the Active field
-- if it has changed update update_tsmp to the current timestamp
now you must define "since the last entry" to determine whether you want to return the record
2)
create a history table
Id AcctNb Active Created change_tsmp updating_user delete_flag
put a trigger on the table that runs before update or delete
-- copies the record to the history table, checking the delete flag as appropriate
If a future version of SQL Server adds the LAG window function, you can simplify the comparison of the previous row with the current one by using LAG.
This already works on PostgreSQL (since 8.4), which has both the LAG and LEAD window functions:
create table log
(
log_id serial primary key,
acct_id int not null,
active boolean not null,
created timestamp not null
);
insert into log(acct_id, active,created)
values
(1,true,'January 1, 2012'),
(1,true,'January 2, 2012'),
(1,false,'January 3, 2012'),
(1,false,'January 4, 2012'),
(1,true,'January 5, 2012'),
(2,false,'February 1, 2012'),
(2,true,'February 2, 2012'),
(2,false,'February 3, 2012'),
(2,true,'February 4, 2012'),
(2,true,'February 5, 2012');
The LAG approach is elegantly simpler than the ROW_NUMBER and JOIN combination:
with merge_prev as
(
select
acct_id, created,
lag(active) over(partition by acct_id order by created) pr_active, -- previous row's active
active sr_active -- second row's active
from log
)
select *
from merge_prev
where
pr_active <> sr_active
Live test: http://sqlfiddle.com/#!1/b1eb0/25
EDIT
LAG is already available on SQL Server 2012: http://sqlfiddle.com/#!6/d17c0/1
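For a quick local check of the LAG approach, the sketch below runs an equivalent query from Python against an in-memory SQLite database. SQLite is an assumption made for convenience (window functions need version 3.25 or later); dates are stored as ISO strings and active as 0/1.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table log ("
    "log_id integer primary key, acct_id int not null, "
    "active int not null, created text not null)"
)
conn.executemany(
    "insert into log (acct_id, active, created) values (?, ?, ?)",
    [
        (1, 1, "2012-01-01"), (1, 1, "2012-01-02"), (1, 0, "2012-01-03"),
        (1, 0, "2012-01-04"), (1, 1, "2012-01-05"),
        (2, 0, "2012-02-01"), (2, 1, "2012-02-02"), (2, 0, "2012-02-03"),
        (2, 1, "2012-02-04"), (2, 1, "2012-02-05"),
    ],
)

# lag(active) fetches the previous row's value within the same account.
# It is NULL on each account's first row, so that row never passes the filter.
changes = conn.execute(
    """
    with merge_prev as (
        select acct_id, created, active,
               lag(active) over (partition by acct_id order by created) as pr_active
        from log
    )
    select acct_id, created
    from merge_prev
    where pr_active <> active
    order by acct_id, created
    """
).fetchall()

for acct_id, created in changes:
    print(acct_id, created)
```

The dates returned match the output listed for the ROW_NUMBER version: January 3 and 5 for account 1, February 2, 3 and 4 for account 2.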

AVG and COUNT in SQL Server

I have a rating system in which any person may rate another. A person can be rated by the same person more than once. When calculating averages, I would like to include only the most recent ratings.
Is this possible with SQL?
Person 1 rates Person 2 with 5 on 1.2.2011 <- ignored because there is a newer rating of person 1
Person 1 rates Person 2 with 2 on 1.3.2011
Person 2 rates Person 1 with 6 on 1.2.2011 <-- ignored as well
Person 2 rates Person 1 with 3 on 1.3.2011
Person 3 rates Person 1 with 5 on 1.5.2011
Result:
The Average for Person 2 is 2.
The Average for Person 1 is 4.
The table may look like this: evaluator, evaluatee, rating, date.
Kind Regards
Michael
It's perfectly possible.
Let's assume your table structure looks like this:
CREATE TABLE [dbo].[Ratings](
[Evaluator] varchar(10),
[Evaluatee] varchar(10),
[Rating] int,
[Date] datetime
);
and the values like this:
INSERT INTO Ratings
SELECT 'Person 1', 'Person 2', 5, '2011-02-01' UNION
SELECT 'Person 1', 'Person 2', 2, '2011-03-01' UNION
SELECT 'Person 2', 'Person 1', 6, '2011-02-01' UNION
SELECT 'Person 2', 'Person 1', 3, '2011-03-01' UNION
SELECT 'Person 3', 'Person 1', 5, '2011-05-01'
Then the average rating for Person 1 is:
SELECT AVG(Rating) FROM Ratings r1
WHERE Evaluatee='Person 1' and not exists
(SELECT 1 FROM Ratings r2
WHERE r1.Evaluatee = r2.Evaluatee AND
r1.evaluator=r2.evaluator AND
r1.date < r2.date)
Result:
4
Or for all Evaluatee's, grouped by Evaluatee:
SELECT Evaluatee, AVG(Rating) FROM Ratings r1
WHERE not exists
(SELECT 1 FROM Ratings r2
WHERE r1.Evaluatee = r2.Evaluatee AND
r1.evaluator = r2.evaluator AND
r1.date < r2.date)
GROUP BY Evaluatee
Result:
Person 1 4
Person 2 2
This might look like it relies on an implicit assumption that no two entries have the same date,
but that is not actually a problem: if such entries can exist, you cannot decide which of them was made later anyway; you could only choose arbitrarily between them. As shown here, both are included and averaged, which might be the best solution you can get for that edge case (although it slightly favors that person, giving them two votes).
To avoid this problem altogether, you could simply make Date part of the primary key or a unique index - the obvious primary key choice here being the columns (Evaluator, Evaluatee, Date).
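Here is a small Python sketch of both points, using an in-memory SQLite database as a stand-in for SQL Server: the NOT EXISTS query grouped by Evaluatee, and a unique index on (Evaluator, Evaluatee, Date) that rejects a second rating with the same date. The index name ux_ratings is made up for the example, and note that SQLite's avg() returns a floating-point value.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table Ratings (Evaluator text, Evaluatee text, Rating int, Date text)"
)
# One rating per evaluator, evaluatee and date.
conn.execute(
    "create unique index ux_ratings on Ratings (Evaluator, Evaluatee, Date)"
)
conn.executemany(
    "insert into Ratings values (?, ?, ?, ?)",
    [
        ("Person 1", "Person 2", 5, "2011-02-01"),
        ("Person 1", "Person 2", 2, "2011-03-01"),
        ("Person 2", "Person 1", 6, "2011-02-01"),
        ("Person 2", "Person 1", 3, "2011-03-01"),
        ("Person 3", "Person 1", 5, "2011-05-01"),
    ],
)

# Keep a rating only if the same evaluator has no later rating for that evaluatee.
averages = conn.execute(
    """
    select Evaluatee, avg(Rating)
    from Ratings r1
    where not exists (
        select 1 from Ratings r2
        where r2.Evaluatee = r1.Evaluatee
          and r2.Evaluator = r1.Evaluator
          and r2.Date > r1.Date
    )
    group by Evaluatee
    order by Evaluatee
    """
).fetchall()
print(averages)  # [('Person 1', 4.0), ('Person 2', 2.0)]

# A second rating by the same evaluator on the same date violates the index.
try:
    conn.execute("insert into Ratings values ('Person 3', 'Person 1', 1, '2011-05-01')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True
print(duplicate_rejected)  # True
```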
create table #T
(
evaluator int,
evaluatee int,
rating int,
ratedate date
)
insert into #T values
(1, 2, 5, '20110102'),
(1, 2, 2, '20110103'),
(2, 1, 6, '20110102'),
(2, 1, 3, '20110103'),
(3, 1, 5, '20110105')
select evaluatee,
avg(rating) as avgrating
from (
select evaluatee,
rating,
row_number() over(partition by evaluatee, evaluator
order by ratedate desc) as rn
from #T
) as T
where T.rn = 1
group by evaluatee
Result:
evaluatee avgrating
----------- -----------
1 4
2 2
This is possible to do, but it can get REALLY hairy: SQL was not designed to compare rows, only columns. I would strongly recommend you keep an additional table containing only the most recent data, and store the rest in an archive table.
If you must do it this way, then I'll need a full table structure to try to write a query for this. In particular I need to know which are the unique indexes.