Let's suppose a simple scenario: a fact table F with two dimensions D1 and D2.
F D1 D2
10 A B
15 B C
In this scenario I define a new calculated member C1 using an expression similar to this one:
with member measures.C1 as
sum(
descendants( [D1].[Ds].currentMember, , leaves ),
[myMeasure]
)
select
measures.C1 on 0,
[D2].[Ds].AllMembers on 1
from [MyCube]
How can I modify C1 so that the expression always includes all D2 members?
I get these results:
C1 D2
10 B
15 C
and I'm looking for this:
C1 D2
35 B
35 C
(Of course, this is a simplification of the real problem. Please don't try to fix the C1 expression; only add code to get the expected results. I have tried with:
sum(
{ descendants( [D1].[Ds].currentMember, , leaves ),
[D2].[Ds].AllMembers },
[myMeasure]
unsuccessfully)
regards.
For this specific example, change your member statement to the following.
WITH MEMBER [Measures].[C1] AS
SUM([D1].[Ds].[All], [myMeasure])
This gives you everything in that dimension for your measure. That value should then be repeated for each attribute in your D2 dimension.
Based on the title of the question and some of your text, this is only a small example. You may need to investigate SCOPE. It is pretty powerful and you can do some neat things with it.
I am so green in SQL that I don't even know how to properly phrase my question or look for an existing answer on Stack Overflow or anywhere else. Sorry!
Assume I have 3 columns: one is an ID, and the other two are data columns A and B. A single ID can have multiple entries. I would like to remove all entries where A and B are the same for a given ID. Here is an example:
| ID | A | B |
| -- | - | - |
| 01 | x | y |
| 01 | x | y |
| 01 | x | y |
| 02 | x | y |
| 02 | x | z |
| 02 | x | y |
In this table I would like to remove all 3 entries that belong to ID 01 as A as well as B are all x and y, respectively. For ID 02, however, column B differs for the first and second entry. Therefore I like to keep ID 02. I hope this illustrates the idea sufficiently :-).
I am looking for a 'scalable' solution, as I am not only looking at two data columns A and B, but actually 4 different columns.
Does anyone know how to set a proper filter in SQL to remove those entries according to my needs?
Many thanks.
Benjamin
It basically doesn't matter how many columns you actually have; just list all of them in the SELECT DISTINCT.
The query below returns the IDs whose rows are not all identical, and it can be used as the joining basis for a DELETE:
WITH CTE AS
(SELECT DISTINCT "ID", "A", "B" FROM tab1),
CTE2 AS (SELECT "ID", COUNT(*) count_ FROM CTE GROUP BY "ID" HAVING COUNT(*) >1)
SELECT "ID" FROM CTE2
| ID |
| -: |
| 2 |
In our project we have to periodically calculate aggregates and further calculations based on the input data received.
One frequent requirement is to calculate time differences between certain rows in our input data stream.
For example, this is my input datastream:
Timestamp Event Value
2017-05-21 11:33 e1 17
2017-05-21 11:37 e2 18
2017-05-21 11:38 e3 18
2017-05-21 11:39 e1 19
2017-05-21 11:42 e2 19
I now want to calculate all the timespans between e2 events and the last received e1 event (ordered by timestamp).
I would expect the result to be:
3 (minutes)
4 (minutes)
A similar requirement would be to calculate the timespans between events of the same type (i.e. all differences between e1 events), where I would expect this result:
6 (minutes)
My attempts so far:
This sort of analysis could be achieved fairly easily using the LAG function in conjunction with a WHEN clause, but unfortunately the WHEN clause is missing in U-SQL.
In T-SQL it would also be possible to solve this with sub-selects in the SELECT clause of the statement, but unfortunately that is not possible in U-SQL either.
Do you have any suggestions or sample scripts on how to solve this issue?
Thank you very much for your help!
In U-SQL, you can use C# methods for simple date arithmetic. If your data is as simple as you describe, you could just rank the e1 and e2 events and then join them, something like this:
@data =
    EXTRACT Timestamp DateTime,
            Event string,
            Value int
    FROM "/input/input58.csv"
    USING Extractors.Csv();
//@data = SELECT *
// FROM (
// VALUES
// ( "2017-05-21 11:33", "e1", 17 ),
// ( "2017-05-21 11:37", "e2", 18 ),
// ( "2017-05-21 11:38", "e3", 18 ),
// ( "2017-05-21 11:39", "e1", 19 ),
// ( "2017-05-21 11:42", "e2", 19 )
// ) AS T(Timestamp, Event, Value);
// Rank the e1 events by time
@e1 =
    SELECT ROW_NUMBER() OVER(ORDER BY Timestamp) AS rn,
           *
    FROM @data
    WHERE Event == "e1";
// Rank the e2 events by time
@e2 =
    SELECT ROW_NUMBER() OVER(ORDER BY Timestamp) AS rn,
           *
    FROM @data
    WHERE Event == "e2";
// Pair the n-th e1 with the n-th e2 and compute the difference
@working =
    SELECT
        (e2.Timestamp - e1.Timestamp).TotalSeconds AS diff_sec,
        (e2.Timestamp - e1.Timestamp).ToString() AS diff_hhmmss,
        e1.Timestamp AS ts1,
        e2.Timestamp AS ts2
    FROM @e1 AS e1
         INNER JOIN @e2 AS e2 ON e1.rn == e2.rn;
OUTPUT @working TO "/output/output.csv"
USING Outputters.Csv(quoting:false);
My results show 4 and 3 minutes for the sample data.
Will that work for you? If not, please provide a more realistic data sample.
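As a quick way to sanity-check the rank-and-join idea outside of U-SQL, the same pairing can be sketched in plain Python. This is only an illustration on the sample data; the helper name `times_of` is made up for the sketch, and `zip` stands in for the inner join on the row number.

```python
from datetime import datetime

rows = [
    ("2017-05-21 11:33", "e1", 17),
    ("2017-05-21 11:37", "e2", 18),
    ("2017-05-21 11:38", "e3", 18),
    ("2017-05-21 11:39", "e1", 19),
    ("2017-05-21 11:42", "e2", 19),
]
events = [(datetime.strptime(ts, "%Y-%m-%d %H:%M"), ev, val) for ts, ev, val in rows]


def times_of(events, name):
    """Timestamps of one event type in ascending order (position = row number)."""
    return sorted(ts for ts, ev, _ in events if ev == name)


e1_times = times_of(events, "e1")
e2_times = times_of(events, "e2")

# zip pairs the n-th e1 with the n-th e2 and drops unmatched rows,
# like the INNER JOIN on rn.
diff_minutes = [(t2 - t1).total_seconds() / 60 for t1, t2 in zip(e1_times, e2_times)]
print(diff_minutes)
```

Note that this pairing only gives "time since the last e1" when e1 and e2 events strictly alternate, which is why a more realistic data sample matters.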
@data =
    SELECT
        LAST_VALUE(Event == "e1" ? Timestamp : (DateTime?)null) OVER (ORDER BY Timestamp) AS E1Time
        // MAX(Event == "e1" ? Timestamp : DateTime.MinValue) OVER (ORDER BY Timestamp) AS E1Time
        , Timestamp AS E2Time
    FROM @events
    HAVING Event == "e2"
    ;
This works because aggregates and window functions ignore null (at least they should; the U-SQL documentation for LAST_VALUE doesn't say, so this needs verification). That allows emulation of conditional behavior such as WHEN. Similar behavior can be obtained with MAX/MIN and an appropriate default.
That said, you should spec the input data and expected result in detail, which may alter the solution. Namely, can aberrant data sequences occur and what behavior is expected (or at least tolerated for the sake of simplicity) if they do:
e1, e1, e2 - Above code ignores earlier e1
e1, e2, e2 - Above code computes 2 values wrt the same e1
e1, e1, e2, e2 - Above code doesn't recognize nesting, same as case 2.
e2 - Above code may crash (null) or throw results off by using DateTime.MinValue.
etc. At some point of complexity you'd probably have to defer to a custom reducer via REDUCE ALL (this is a last resort!), but that would restrict the size of data that can be processed.
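The "carry the last e1 forward" behavior and the edge cases listed above can be sketched in Python. This is an illustration of the intended semantics, not of U-SQL itself; the function name `minutes_since_last_e1` is made up, and returning `None` for an e2 with no preceding e1 is one possible choice among several.

```python
from datetime import datetime


def minutes_since_last_e1(events):
    """For each e2 (in time order), minutes since the most recent e1, or None."""
    last_e1 = None
    result = []
    for ts, ev in sorted(events):
        if ev == "e1":
            last_e1 = ts  # a later e1 replaces an earlier one
        elif ev == "e2":
            result.append(None if last_e1 is None else (ts - last_e1).total_seconds() / 60)
    return result


def at(hour, minute):
    return datetime(2017, 5, 21, hour, minute)


sample = [(at(11, 33), "e1"), (at(11, 37), "e2"), (at(11, 38), "e3"),
          (at(11, 39), "e1"), (at(11, 42), "e2")]
print(minutes_since_last_e1(sample))

# e1, e2, e2: both e2 events are measured against the same e1.
print(minutes_since_last_e1([(at(11, 0), "e1"), (at(11, 2), "e2"), (at(11, 5), "e2")]))

# A lone e2 has no e1 to refer to.
print(minutes_since_last_e1([(at(11, 0), "e2")]))
```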
Consider a relation S with attributes A , B , C , and D . Write an SQL query that returns an empty answer if and only if the functional dependency AB-->C holds on relation S . (It is not important what your query returns when the functional dependency does not hold on S , as long as the query result is not empty in this case.) Assume that no NULL values are present.
My question is how to return an empty answer and how to correct my part if it's wrong.
Select
From S AS S1, S As S2,
Where (S1.C!=S2.C) AND (S1.A=S2.A) AND (S1.B=S2.B)
... iff each value of (a,b) is associated with exactly one value of (c).
The check that is needed is whether any value of the tuple (a,b) is related to more than one value of c.
To demonstrate that the functional dependency does not hold, we would need to demonstrate a counterexample.
Here are a couple of simple examples.
Functional dependency (a,b)->(c) holds
a b c d
-- -- -- --
2 3 5 42
2 3 5 42
Functional dependency does not hold
a b c d
-- -- -- --
2 3 7 42
2 3 11 42
If the functional dependency does not hold, that requires that the same value of (a,b) correspond to different values of (c).
Several queries are possible. Here is one example:
SELECT s.a
, s.b
FROM s
GROUP
BY s.a
, s.b
HAVING NOT ( MIN(s.c) <=> MAX(s.c) )
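To try this against the two example tables, here is a small sketch using SQLite through Python's sqlite3 module. The `<=>` operator is MySQL-specific, so the sketch swaps in a plain `<>` comparison, which is equivalent here only because the question guarantees there are no NULL values. The helper name `fd_violations` is made up for illustration.

```python
import sqlite3

# Returns one row per (a, b) group that maps to more than one value of c.
FD_VIOLATIONS = """
SELECT s.a, s.b
FROM s
GROUP BY s.a, s.b
HAVING MIN(s.c) <> MAX(s.c)
"""


def fd_violations(rows):
    """Load rows into a fresh table s and return the (a, b) groups violating AB -> C."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE s (a INTEGER, b INTEGER, c INTEGER, d INTEGER)")
    conn.executemany("INSERT INTO s VALUES (?, ?, ?, ?)", rows)
    return conn.execute(FD_VIOLATIONS).fetchall()


holds = fd_violations([(2, 3, 5, 42), (2, 3, 5, 42)])
violated = fd_violations([(2, 3, 7, 42), (2, 3, 11, 42)])
print(holds)
print(violated)
```

The first call returns an empty result because the dependency holds; the second returns the offending (a, b) pair.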
I have a table whose columns are dynamic, except one column:A. The table also has some null values (0n) in it. How do I add another column that shows total of each row and either ignores the column that has "0n" in that particular row or takes 0 in its place.
Here is my code, it fails on sum and also does not ignore nulls.
addTotalCol:{[]
table:flip`A`B`C`D!4 4#til 9;
colsToSum: string (cols table) except `A; / don't sum A
table: update Total: sum (colsToSum) from table; / type error here. Also check for nulls
:table;
}
I think it is better to use functional update in your case:
addTotalCol:{[]
table:flip`A`B`C`D!4 4#til 9;
colsToSum:cols[table] except `A; / don't sum A
table:![table;();0b;enlist[`Total]!enlist(sum;enlist,colsToSum)];
:table;
}
The reason it is not working is that your fourth line is parsed as:
table: update Total: sum (enlist"B";enlist"C";enlist"D") from table;
Since sum only works with numbers, it returns a 'type error because your inputs are strings.
Another solution to use colsToSum as string input:
addTotalCol:{[]
table:flip`A`B`C`D!4 4#til 9;
colsToSum:string cols[table] except `A; / don't sum A
table:get"update Total:sum(",sv[";";colsToSum],") from table";
:table;
}
Basically this will build the query in string before it is executed in q.
Still, functional update is preferred though.
EDIT: Full answer to sum 0n:
addTotalCol:{[]
table:flip`A`B`C`D!4 4#0n,til 9;
colsToSum:cols[table] except `A; / don't sum A
table:![table;();0b;enlist[`Total]!enlist(sum;(^;0;enlist,colsToSum))];
:table;
}
I think there is a cleaner version here without a functional form.
q)//let us build a table where our first col is symbols and the rest are numerics,
/// we will exclude first from row sums
q)t:flip `c0`c1`c2`c3!(`a`b`c`d;1 2 3 0N;0n 4 5 6f;1 2 3 0Nh)
q)//columns for sum
q)sc:cols[t] except `c0
q)///now let us make sure we fill in each column with zero,
/// add across rows and append as a new column
q)show t1:t,'flip enlist[`sumRows]!enlist sum each flip 0^t sc
c0 c1 c2 c3 sumRows
-------------------
a  1     1  2
b  2  4  2  8
c  3  5  3  11
d     6     6
q)meta t1
c | t f a
-------| -----
c0 | s
c1 | i
c2 | f
c3 | h
sumRows| f
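For readers less familiar with q, the same "fill nulls with zero, then sum across each row" logic can be sketched in Python. This is only an illustration of the idea, with the table modelled as a dict of column lists and `None` standing in for q nulls; the function name `add_row_totals` is made up.

```python
def add_row_totals(table, skip=("c0",), total_name="sumRows"):
    """Return a copy of the table with a row-total column; None counts as 0."""
    sum_cols = [name for name in table if name not in skip]
    n_rows = len(table[sum_cols[0]])
    totals = [
        float(sum(0 if table[name][i] is None else table[name][i] for name in sum_cols))
        for i in range(n_rows)
    ]
    return {**table, total_name: totals}


t = {
    "c0": ["a", "b", "c", "d"],
    "c1": [1, 2, 3, None],
    "c2": [None, 4.0, 5.0, 6.0],
    "c3": [1, 2, 3, None],
}
t1 = add_row_totals(t)
print(t1["sumRows"])
```

Because the columns to sum are derived from the table itself, the sketch copes with a dynamic set of columns in the same way as `cols[t] except `c0` does in the q version.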
I'm working on a project that is examining colorectal cancer, and I have some data that I want to filter with an SQL query. The problem is that some of the experiments failed, so some of the samples are in another folder marked with _2 or _3 if it is the second or third time the experiment was tried.
My data looks as follows:
So I want to check whether a Sentrix_ID exists more than once; if so, the query should take the latest version (_2, _3, and so on). Is there a query that will do that for me?
And the raw_data:
| ID | Sample_Name | Sample_Code | Sample | Sentrix_ID | NorTum | Pool_ID | Sample_Group | Sentrix_Position | Folderdate | opmerkingen |
| -- | ----------- | ----------- | ------ | ---------- | ------ | ------- | ------------ | ---------------- | ---------- | ----------- |
| 1835 | 99-02872T2 | 99-02872T2 | 99-02872 | 1495455 | T2 | GS0006564-OPA | HNPCC_UV | R001_C010 | Exp060501 | MLH1-UV |
| 1836 | 97-5332T1 | 97-5332T1 | 97-5332 | 1495455 | T1 | GS0006564-OPA | MUTYH | R001_C011 | Exp060501 | 1105delC_G382D |
| 1827 | R02-81709N | R02-81709N | R02-81709 | 1495455 | N | GS0006563-OPA | HNPCC_UV | R001_C002 | Exp060501 | MSH2 |
| 492 | t03-32120 N | t03-32120 N | t03-32120 | 1495455_2 | N | GS0006563-OPA | TEM_TME_LOH | R004_C005 | Exp060920 | |
| 484 | t02-27628 N | t02-27628 N | t02-27628 | 1495455_2 | N | GS0006563-OPA | TEM_TME_LOH | R004_C004 | Exp060920 | |
| 478 | t03-06297 B2 | t03-06297 B2 | t03-06297 | 1495455_2 | B2 | GS0006563-OPA | TEM_TME_LOH | R006_C003 | Exp060920 | |
| 479 | t03-06297 B3 | t03-06297 B3 | t03-06297 | 1495455_2 | B3 | GS0006563-OPA | TEM_TME_LOH | R007_C003 | Exp060920 | |
Thanks in advance
So you want to parse the value in your sentrix_id column into a prefix and suffix? You can use the instr() function to identify the position of the underscore, if any, unless you know the length of the prefix is always the same.
Try something like
select *,
left(sentrix_id,7) as prefix,
iif(len(sentrix_id) = 7 ,"",mid(sentrix_id,9)) as suffix
from table
If you create a new query with equivalent SQL, you should see two additional columns in the result, one for the ID prefix and one for the version suffix.
Then, using those query results in a second new query, you can group by prefix and return the max(suffix).
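The two-step idea (split into prefix and suffix, then keep the highest suffix per prefix) can be sketched end to end as follows. This assumes SQLite through Python's sqlite3 module instead of the asker's database, uses `instr` and `substr` so the prefix length does not have to be fixed, and treats an ID without an underscore as version 1. The table name `samples` and the reduced column list are made up for the sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE samples (ID INTEGER, Sentrix_ID TEXT)")
conn.executemany(
    "INSERT INTO samples VALUES (?, ?)",
    [
        (1835, "1495455"), (1836, "1495455"), (1827, "1495455"),
        (492, "1495455_2"), (484, "1495455_2"),
        (478, "1495455_2"), (479, "1495455_2"),
    ],
)

LATEST = """
WITH parsed AS (
    SELECT ID, Sentrix_ID,
           CASE WHEN instr(Sentrix_ID, '_') = 0 THEN Sentrix_ID
                ELSE substr(Sentrix_ID, 1, instr(Sentrix_ID, '_') - 1) END AS prefix,
           -- no suffix means the first attempt
           CASE WHEN instr(Sentrix_ID, '_') = 0 THEN 1
                ELSE CAST(substr(Sentrix_ID, instr(Sentrix_ID, '_') + 1) AS INTEGER) END AS version
    FROM samples
)
SELECT p.ID, p.Sentrix_ID
FROM parsed AS p
JOIN (SELECT prefix, MAX(version) AS latest FROM parsed GROUP BY prefix) AS m
  ON m.prefix = p.prefix AND m.latest = p.version
ORDER BY p.ID
"""

latest_rows = conn.execute(LATEST).fetchall()
print(latest_rows)
```

Casting the suffix to an integer matters once versions reach two digits, since a text comparison would rank "_9" above "_10".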
SELECT TOP 1 * FROM [YourTable] WHERE [Sentrix_ID] LIKE '<value>%' ORDER BY [Sentrix_ID] DESC
Would that work for your problem?
SELECT *
FROM (
SELECT count(Sentrix_ID) AS senId
FROM tableName
GROUP BY Sentrix_ID
) AS m
WHERE senId = 1;
This query only returns a result for a Sentrix_ID that occurs exactly once. For the data in the question it returns an empty table, because both Sentrix_ID values (1495455 and 1495455_2) occur more than once. If the table contained a single _3 entry, that entry would be returned.