How to get the latest row per value? - kql

I have the following kusto query to display columns from a SQL table for the past 7 days as follows:
customEvents
| where name == "TrackNullColumns"
| project timestamp,
Column = tostring(customDimensions["Column"]),
CurrentNullColumns = toint(customDimensions["CurrentNull"]),
PreviousNullColumns = toint(customDimensions["PreviousNull"]),
CurrentRows = toint(customDimensions["CurrentRows"]),
PreviousRows = toint(customDimensions["PreviousRows"])
| distinct timestamp, Column, CurrentNullColumns, PreviousNullColumns, CurrentRows, PreviousRows
| extend Delta = CurrentNullColumns - PreviousNullColumns
| project timestamp, Column, CurrentNullColumns, PreviousNullColumns, Delta
| order by Delta desc
Here is the result:
What I noticed is that the same column is repeated multiple times, once for each distinct set of values. How do I modify this query to display each column only once, with the values from its latest timestamp?

Use arg_max() inside summarize to keep, for each Column, only the row with the latest timestamp:
customEvents
| where name == "TrackNullColumns"
| project timestamp,
Column = tostring(customDimensions["Column"]),
CurrentNullColumns = toint(customDimensions["CurrentNull"]),
PreviousNullColumns = toint(customDimensions["PreviousNull"]),
CurrentRows = toint(customDimensions["CurrentRows"]),
PreviousRows = toint(customDimensions["PreviousRows"])
| summarize arg_max(timestamp, *) by Column
| extend Delta = CurrentNullColumns - PreviousNullColumns
| project timestamp, Column, CurrentNullColumns, PreviousNullColumns, Delta
| order by Delta desc
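
For readers who want to see what summarize arg_max(timestamp, *) by Column does to the rows, here is a minimal Python sketch of the same idea: for each key, keep the row with the largest timestamp. The sample rows and the latest_per_key helper are made up for illustration and are not part of Kusto.

```python
from typing import Any


def latest_per_key(rows: list[dict[str, Any]], key: str, order_by: str) -> list[dict[str, Any]]:
    """Keep, for each distinct row[key], the row with the largest row[order_by]."""
    best: dict[Any, dict[str, Any]] = {}
    for row in rows:
        current = best.get(row[key])
        if current is None or row[order_by] > current[order_by]:
            best[row[key]] = row
    return list(best.values())


# Hypothetical sample data; ISO-8601 strings in the same format sort correctly as text.
rows = [
    {"timestamp": "2021-03-01T10:00:00Z", "Column": "Email", "CurrentNull": 5, "PreviousNull": 3},
    {"timestamp": "2021-03-02T10:00:00Z", "Column": "Email", "CurrentNull": 8, "PreviousNull": 5},
    {"timestamp": "2021-03-01T10:00:00Z", "Column": "Phone", "CurrentNull": 2, "PreviousNull": 2},
]

latest = latest_per_key(rows, key="Column", order_by="timestamp")
for row in latest:
    # Delta is computed after the reduction, as in the query above.
    row["Delta"] = row["CurrentNull"] - row["PreviousNull"]
```

Each Column value now appears once, carrying the values from its most recent event.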

Related

PostgreSQL query to select records which a specific value doesn't include in text array

I have a table like this
| id | data |
|---------------|---------------------|
| org:abc:basic | {org,org:abc:basic} |
| org:xyz:basic | {org,basic} |
| org:efg:basic | {org} |
I need to write a query that selects all the rows that don't have the id inside the data column.
Or at least I need to query all the records that don't have a text starting with org: and ending with :basic within data.
Currently I try to run
SELECT * FROM t_permission WHERE 'org:%:basic' NOT LIKE ANY (data)
but this query returns everything, even the first row.
You can use the <> operator with ALL against the array:
select *
from the_table
where id <> all(data);
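
As a rough illustration of the difference, id <> ALL(data) is true only when the id differs from every element of the array, whereas NOT LIKE ANY is already true as soon as a single element fails the comparison (and in the original attempt the array elements end up on the pattern side of LIKE). The Python sketch below mirrors both the accepted condition and the "no element looks like org:...:basic" variant on the sample rows; the function names are hypothetical, and fnmatch's * wildcard stands in for SQL's %.

```python
from fnmatch import fnmatchcase

# Sample rows from the question: (id, data)
rows = [
    ("org:abc:basic", ["org", "org:abc:basic"]),
    ("org:xyz:basic", ["org", "basic"]),
    ("org:efg:basic", ["org"]),
]


def id_not_in_data(row_id: str, data: list[str]) -> bool:
    """Equivalent of: id <> ALL(data)."""
    return all(row_id != element for element in data)


def no_element_matches(data: list[str], pattern: str = "org:*:basic") -> bool:
    """True when no array element matches the pattern (the asker's fallback requirement)."""
    return not any(fnmatchcase(element, pattern) for element in data)


missing_own_id = [row_id for row_id, data in rows if id_not_in_data(row_id, data)]
missing_pattern = [row_id for row_id, data in rows if no_element_matches(data)]
```

On this sample both filters drop the first row and keep the other two.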

Aggregate multiple column result into one column by taking the first non-empty value

I have a query like so:
Events
| where EventType == 'test' | take 1000 | project DeviceId | distinct DeviceId
| join kind=inner (Events | where EventType == 'test2' | take 1000 | project DeviceId | distinct DeviceId) on DeviceId
| join kind=inner (Events | where EventType == 'test3' | take 1000 | project DeviceId | distinct DeviceId) on DeviceId
| project-rename DeviceId0 = DeviceId
Which will return the following result:
There may be multiple columns, and depending on the join type (inner/outer), a column may or may not have a value.
My question is: does Kusto provide a way to aggregate the result into a single DeviceId column that contains the first non-empty value?
For example, if I have 3 columns DeviceId0, DeviceId1, DeviceId2 with these values:
(d1, d1, d1) => return d1
(d1, null, null) => return d1
The columns will never hold different values, and at least one of them will have a value (because of the join on the DeviceId column).
coalesce() does exactly what you want (see the documentation).
So in your case, with the column names from your example, you should use:
| project DeviceId = coalesce(DeviceId0, DeviceId1, DeviceId2)
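
To make the behaviour concrete, here is a small Python sketch of what a coalesce-style function does: it walks its arguments in order and returns the first one that is not missing. Treating both None and the empty string as missing is an assumption meant to mimic how Kusto handles string columns; this is an illustration, not Kusto's implementation.

```python
from typing import Optional


def coalesce(*values: Optional[str]) -> Optional[str]:
    """Return the first argument that is neither None nor an empty string."""
    for value in values:
        if value is not None and value != "":
            return value
    return None


# The two cases from the question, plus one where only the last column is set.
all_set = coalesce("d1", "d1", "d1")
first_only = coalesce("d1", None, None)
last_only = coalesce(None, "", "d2")
```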

Sqlite query by timestamp and value

I have a Sqlite table with the following rows:
id: int PK autoincrement
timestamp: int value NOT NULL. Timestamp of the DB insertion
value: int value NOT NULL. Possible values [0-4].
I want to check whether all the registers (rows) whose timestamp falls within the 60 seconds before a given timestamp have the same value. For instance:
id | timestamp | value
1 | 1594575090 | 1
2 | 1594575097 | 1
3 | 1594575100 | 1
4 | 1594575141 | 2
5 | 1594575145 | 2
6 | 1594575055 | 3
7 | 1594575060 | 4
In this case, if I run the query for the registers within the 60 seconds before register 3 (including register 3), it should check whether the values of registers [1, 2, 3] are the same, which should return 1.
On the other hand, if the query is run for register 7, it will compare the values of registers [4, 5, 6, 7] and should return 0, because the values are not all the same.
Any ideas on how I can perform this query?
I think that you want this:
select count(distinct value) = 1 result
from tablename
where id <= ?
and (select timestamp from tablename where id = ?) - timestamp <= 60;
Replace the ? placeholder with the id that you want the results for.
Maybe you want the absolute value of the difference of the timestamps to be less than 60, so if this is the case then change to:
and abs(timestamp - (select timestamp from tablename where id = ?)) <= 60;
Hmmm . . . I think the logic you are describing is:
select ( min(value) = max(value) ) as all_same
from t cross join
(select t.*
from t
where t.id = ?
) tt
where t.timestamp <= tt.timestamp and
t.timestamp >= tt.timestamp - 60
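
To try the idea end to end, the sketch below loads the sample rows from the question into an in-memory SQLite database using Python's sqlite3 module and runs a variant of the count(distinct value) = 1 query. It assumes the window means "rows with id up to the given one whose timestamp is at most 60 seconds older than the reference row"; the table name tablename comes from the first answer, and the all_same helper is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tablename ("
    "id INTEGER PRIMARY KEY AUTOINCREMENT, "
    "timestamp INTEGER NOT NULL, "
    "value INTEGER NOT NULL)"
)
conn.executemany(
    "INSERT INTO tablename (id, timestamp, value) VALUES (?, ?, ?)",
    [
        (1, 1594575090, 1),
        (2, 1594575097, 1),
        (3, 1594575100, 1),
        (4, 1594575141, 2),
        (5, 1594575145, 2),
        (6, 1594575055, 3),
        (7, 1594575060, 4),
    ],
)

# Assumption: only rows with id <= :id and a timestamp in [ref - 60, ref] are compared.
QUERY = """
SELECT COUNT(DISTINCT value) = 1 AS result
FROM tablename
WHERE id <= :id
  AND timestamp BETWEEN (SELECT timestamp FROM tablename WHERE id = :id) - 60
                    AND (SELECT timestamp FROM tablename WHERE id = :id)
"""


def all_same(register_id: int) -> int:
    """Return 1 when every row in the window shares one value, else 0."""
    return conn.execute(QUERY, {"id": register_id}).fetchone()[0]


result_for_3 = all_same(3)
result_for_7 = all_same(7)
```

Using a named parameter (:id) avoids having to pass the same id several times, which the positional ? form would require.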

Compare one row of a table to every row of a second table

I am trying to retrieve the number of days between a random date and the next known date for a holiday. Let's say my first table looks like this :
date | is_holiday | zone
9/11/18 | 0 | A
22/12/18 | 1 | A
and my holidays table looks like this
start_date | end_date | zone
20/12/18 | 04/01/18 | A
21/12/18 | 04/01/18 | B
...
I want to be able to know how many days are between an entry that is not a holiday in the first table and the next holiday date.
I have tried to get the next row with a later date in a join clause but the join isn't the tool for this task. I also have tried grouping by date and comparing the date with the next row but I can have multiple entries with the same date in the first table so it doesn't work.
This is the join clause I have tried :
SELECT mai.*, vac.start_date, datediff(vac.start_date, mai.date)
FROM (SELECT *
FROM MAIN
WHERE is_holiday = 0
) mai LEFT JOIN
(SELECT start_date, zone
FROM VACATIONS_UPDATED
ORDER BY start_date
) vac
ON mai.date < vac.start_date AND mai.zone = vac.zone
I expect to get a table looking like this :
date | is_holiday | zone | next_holiday
9/11/18 | 0 | A | 11
22/12/18 | 1 | A | 0
Any lead on how to achieve this ?
This can get messy in SQL, but if you are open to doing it in code (here with Spark's Java API), it could look like the following. You basically need a crossJoin:
Dataset<Row> table1 = <readData>
Dataset<Row> holidays = <readData>
// then cache the small table to get the best performance
// aliases let the filter refer to each side by name; datediff assumes date-typed columns
table1.as("table1").crossJoin(holidays.as("holidays"))
.filter("table1.zone == holidays.zone AND table1.date < holidays.start_date")
.select("table1.*", "holidays.start_date")
.withColumn("nextHoliday", functions.datediff(functions.col("start_date"), functions.col("date")));
In scenarios where one row from table1 matches multiple holidays, then you can add an id column to table1 and then group the crossJoin.
// add unique id to the rows
table1 = table1.withColumn("id", functions.monotonically_increasing_id() )
Some details on crossJoins:
http://kirillpavlov.com/blog/2016/04/23/beyond-traditional-join-with-apache-spark/
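
The same "pair every row with every holiday, filter, then keep the nearest" logic can be sketched in plain Python on the sample rows from the question. This is only an illustration under two assumptions: the dates are read as dd/mm/yy, and rows already flagged as holidays get 0, as in the expected output. The days_to_next_holiday helper is hypothetical.

```python
from datetime import date
from typing import Optional

# (date, is_holiday, zone) rows from the first table, dates read as dd/mm/yy (assumption)
main = [
    (date(2018, 11, 9), 0, "A"),
    (date(2018, 12, 22), 1, "A"),
]

# (start_date, zone) pairs from the holidays table
holidays = [
    (date(2018, 12, 20), "A"),
    (date(2018, 12, 21), "B"),
]


def days_to_next_holiday(day: date, is_holiday: int, zone: str) -> Optional[int]:
    """Days until the nearest later holiday start in the same zone; 0 on a holiday."""
    if is_holiday:
        return 0
    # Equivalent of the cross join plus the zone/date filter.
    upcoming = [start for start, h_zone in holidays if h_zone == zone and start > day]
    # Grouping by the original row and taking the minimum keeps one result per row.
    return (min(upcoming) - day).days if upcoming else None


result = [(d, h, z, days_to_next_holiday(d, h, z)) for d, h, z in main]
```

Because the minimum is taken per input row, duplicate dates in the first table are not a problem.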

CASE...WHEN in WHERE clause in Postgresql

My query looks like:
SELECT *
FROM table
WHERE t1.id_status_notatka_1 = ANY (selected_type)
AND t1.id_status_notatka_2 = ANY (selected_place)
here I would like to add CASE WHEN
so my query is:
SELECT *
FROM table
WHERE t1.id_status_notatka_1 = ANY (selected_type)
AND t1.id_status_notatka_2 = ANY (selected_place)
AND CASE
WHEN t2.id_bank = 12 THEN t1.id_status_notatka_4 = ANY (selected_effect)
END
but it doesn't work. The syntax is valid, but the query finds nothing. So my question is: how do I use CASE WHEN in a WHERE clause? Short example: if a=0, then add some condition to WHERE (AND condition); if it's not, then don't add it.
No need for a CASE expression; simply use OR with parentheses:
AND (t2.id_bank <> 12 OR t1.id_status_notatka_4 = ANY (selected_effect))
For those who want to use CASE in the WHERE clause: adding ELSE true to the CASE block above should make the query work as expected. In the OP, the CASE resolves to NULL whenever t2.id_bank is not 12, so the WHERE clause effectively becomes WHERE ... AND NULL, which rejects those rows.
SELECT *
FROM table
WHERE t1.id_status_notatka_1 = ANY (selected_type)
AND t1.id_status_notatka_2 = ANY (selected_place)
AND CASE
WHEN t2.id_bank = 12 THEN t1.id_status_notatka_4 = ANY (selected_effect)
ELSE true
END
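
The NULL behaviour is easy to check on a toy table. The sketch below uses Python's sqlite3 module as a stand-in for PostgreSQL (both drop rows whose WHERE condition evaluates to NULL); the notes table, its three rows, and the literal 5 standing in for ANY (selected_effect) are all made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE notes (id INTEGER, id_bank INTEGER, status_4 INTEGER)")
conn.executemany(
    "INSERT INTO notes VALUES (?, ?, ?)",
    [
        (1, 12, 5),  # bank 12, effect matches
        (2, 12, 7),  # bank 12, effect does not match
        (3, 99, 7),  # other bank, the extra condition should not apply
    ],
)


def ids(where: str) -> list[int]:
    """Return the ids selected by the given WHERE condition."""
    sql = f"SELECT id FROM notes WHERE {where} ORDER BY id"
    return [row[0] for row in conn.execute(sql)]


# No ELSE: rows with id_bank <> 12 evaluate to NULL and are filtered out.
without_else = ids("CASE WHEN id_bank = 12 THEN status_4 = 5 END")
# ELSE 1 (true): rows with id_bank <> 12 pass through untouched.
with_else = ids("CASE WHEN id_bank = 12 THEN status_4 = 5 ELSE 1 END")
# The OR form from the first answer selects the same rows.
with_or = ids("id_bank <> 12 OR status_4 = 5")
```

Only the version without ELSE loses row 3; the ELSE and OR forms agree with each other.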
The accepted answer works, but I'd like to share input for those who are looking for a different answer. Thanks to sagi, I've come up with the following query, but I'd like to give a test case as well.
Let us assume this is the structure of our table
tbl
id | type | status
-----------------------
1 | Student | t
2 | Employee | f
3 | Employee | t
4 | Student | f
and we want to select all Student rows that have status = 't', while also retrieving all Employee rows regardless of their status.
If we perform SELECT * FROM tbl WHERE type = 'Student' AND status = 't', we only get the following result and no Employee rows:
tbl
id | type | status
-----------------------
1 | Student | t
Performing SELECT * FROM tbl WHERE status = 't' returns one Employee row, but the Employee rows with status 'f' are missing. One could argue that IN might work, but SELECT * FROM tbl WHERE type IN('Student', 'Employee') AND status = 't' gives the same result set:
tbl
id | type | status
-----------------------
1 | Student | t
3 | Employee | t
Remember, we want to retrieve all Employee rows regardless of their status. To do that, we run:
SELECT * FROM tbl WHERE (type = 'Student' AND status = 't') OR (type = 'Employee')
The result will be:
tbl
id | type | status
-----------------------
1 | Student | t
2 | Employee | f
3 | Employee | t
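
For anyone who wants to reproduce this test case, here is a minimal sketch that builds the same tbl in an in-memory SQLite database through Python's sqlite3 module and runs the final query. SQLite is only a convenient stand-in for PostgreSQL here, and storing status as the text values 't' and 'f' is an assumption made to match the tables above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl (id INTEGER PRIMARY KEY, type TEXT, status TEXT)")
conn.executemany(
    "INSERT INTO tbl (id, type, status) VALUES (?, ?, ?)",
    [
        (1, "Student", "t"),
        (2, "Employee", "f"),
        (3, "Employee", "t"),
        (4, "Student", "f"),
    ],
)

# Students must have status 't'; Employees are returned regardless of status.
query = (
    "SELECT id, type, status FROM tbl "
    "WHERE (type = 'Student' AND status = 't') OR (type = 'Employee') "
    "ORDER BY id"
)
rows = conn.execute(query).fetchall()
```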