I just started using the Kusto query language and I'm still trying to grasp all of it.
I have a query that gets some sign-in events with a timestamp, but I'm only interested in the unique values with the most recent date.
Distinct is not an option, because the timestamp makes every row different.
The query also returns too many results to be processed.
The query to get all the logs is:
SigninLogs
| project TimeGenerated, Identity, UserPrincipalName, Location, DeviceDetail
You should be able to use the arg_max() aggregation function: https://learn.microsoft.com/en-us/azure/data-explorer/kusto/query/arg-max-aggfunction.
And if you do this frequently, consider creating a materialized view with that logic: https://learn.microsoft.com/en-us/azure/data-explorer/kusto/management/materialized-views/materialized-view-overview
For example:
datatable(a:int, b:int, c:string, d:datetime)
[
    1, 2, "3", datetime(2021-09-08 15:05:10),
    4, 5, "6", datetime(2021-09-08 15:05:17),
    4, 5, "6", datetime(2021-09-08 15:05:43),
    1, 2, "3", datetime(2021-09-08 15:05:27),
    1, 2, "4", datetime(2021-09-08 15:05:53),
]
| summarize arg_max(d, *) by a, b, c
a   b   c   d
---------------------------------------
1   2   3   2021-09-08 15:05:27.0000000
4   5   6   2021-09-08 15:05:43.0000000
1   2   4   2021-09-08 15:05:53.0000000
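Applied to your query, a minimal sketch would look like this (assuming UserPrincipalName is the column that uniquely identifies a user; use whichever columns define "unique" for you):

SigninLogs
| summarize arg_max(TimeGenerated, *) by UserPrincipalName
| project TimeGenerated, Identity, UserPrincipalName, Location, DeviceDetail

And if this runs frequently, the materialized view wraps the same summarize (the view name SigninLogsDedup is made up):

.create materialized-view SigninLogsDedup on table SigninLogs
{
    SigninLogs
    | summarize arg_max(TimeGenerated, *) by UserPrincipalName
}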
I need to build a sales pipeline with one query in SQL (BigQuery).
The table has columns:
-timestamp (event time)
-id (user id)
-event
Each event is a number from 1 to 8, and I need to calculate how many unique users there were at each step.
A step is counted only if the previous steps have been completed.
The steps don't have to follow one another directly; the main thing is that before each step n, step n-1 was taken earlier.
If you sort the table by timestamp, you often get sequences like this for one id in one day:
4, 4, 1, 1, 3, 6, 5, 5, 6, 5, 6, 7, 8, 1, 2, 5, 3, 4.
In this example, the longest valid sequence is 1, 2, 3, 4.
The sequence is only counted within one day!
I failed to solve the problem with the max/min/lag/lead window functions. I even tried a CASE with sequential comparisons against lag+n values.
I wasted two days on this task :(
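One common approach is to chain each step to the previous one: per user and day, find the earliest timestamp of step 1, then the earliest step-2 event after it, and so on, and finally count users per step. A minimal BigQuery sketch under those assumptions (the table name events is made up; only steps 1-3 are written out, steps 4-8 repeat the pattern):

-- step1: earliest event 1 per user per day
WITH step1 AS (
  SELECT id, DATE(timestamp) AS day, MIN(timestamp) AS ts
  FROM events
  WHERE event = 1
  GROUP BY id, day
),
-- step2: earliest event 2 after the user's qualifying event 1 on the same day
step2 AS (
  SELECT e.id, s.day, MIN(e.timestamp) AS ts
  FROM events e
  JOIN step1 s
    ON e.id = s.id AND DATE(e.timestamp) = s.day AND e.timestamp > s.ts
  WHERE e.event = 2
  GROUP BY e.id, s.day
),
-- step3 (steps 4-8 follow the same pattern, each joining the previous step)
step3 AS (
  SELECT e.id, s.day, MIN(e.timestamp) AS ts
  FROM events e
  JOIN step2 s
    ON e.id = s.id AND DATE(e.timestamp) = s.day AND e.timestamp > s.ts
  WHERE e.event = 3
  GROUP BY e.id, s.day
)
SELECT 1 AS step, COUNT(DISTINCT id) AS users FROM step1
UNION ALL SELECT 2, COUNT(DISTINCT id) FROM step2
UNION ALL SELECT 3, COUNT(DISTINCT id) FROM step3;

Taking the earliest qualifying timestamp at every step is safe here: an earlier step-(n-1) time can only leave more room for step n to follow it.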
I have a column in a table which contains, for each row, a JSON array. I need to extract the same elements from it for each row; however, since it is an array, the order of the elements inside it is not always the same, and I can't call the elements by their names. Is there a way to do a for loop or something similar that goes through every index of the array and breaks when it doesn't return null?
An extension to Lukasz's great answer:
Using a CTE with a couple of rows of (id, json), we can see how FLATTEN pulls them apart:
WITH fake_data(id, json) AS (
    SELECT column1, PARSE_JSON(column2) FROM VALUES
        (1, '[1,2,3]'),
        (2, '{"4":4, "5":5}')
)
SELECT t.*
      ,f.*
FROM fake_data AS t
    ,LATERAL FLATTEN(INPUT => t.json) f
ID | JSON               | SEQ | KEY  | PATH  | INDEX | VALUE | THIS
--------------------------------------------------------------------
1  | [ 1, 2, 3 ]        | 1   | null | [0]   | 0     | 1     | [ 1, 2, 3 ]
1  | [ 1, 2, 3 ]        | 1   | null | [1]   | 1     | 2     | [ 1, 2, 3 ]
1  | [ 1, 2, 3 ]        | 1   | null | [2]   | 2     | 3     | [ 1, 2, 3 ]
2  | { "4": 4, "5": 5 } | 2   | 4    | ['4'] | null  | 4     | { "4": 4, "5": 5 }
2  | { "4": 4, "5": 5 } | 2   | 5    | ['5'] | null  | 5     | { "4": 4, "5": 5 }
FLATTEN gives seq, key, path, index, value and this:
Seq: the row number of the input, which is super useful if you are pulling rows apart and want to merge them back together without mixing up different rows.
Key: the name of the property, if the thing being FLATTEN'ed was an object, which is the case for the second row.
Path: how that value can be accessed, e.g. t.json[2] would give you 3.
Index: the position within the thing being flattened, if it's an array.
Value: the value itself.
This: the thing being looped over, useful for getting things like the next element, etc.
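To illustrate the point about seq, here is a small sketch (reusing the same fake_data values) that splits the rows apart and stitches each one back together by grouping on seq:

WITH fake_data(id, json) AS (
    SELECT column1, PARSE_JSON(column2) FROM VALUES
        (1, '[1,2,3]'),
        (2, '{"4":4, "5":5}')
)
SELECT f.seq
      ,ARRAY_AGG(f.value) WITHIN GROUP (ORDER BY f.index) AS rebuilt
FROM fake_data AS t
    ,LATERAL FLATTEN(INPUT => t.json) f
GROUP BY f.seq;

Row 1 comes back as [1,2,3]; row 2 comes back as [4,5], the values of the object.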
There is no need to know the size of the array:
CREATE OR REPLACE TABLE tab_name
AS
SELECT 1 AS id, PARSE_JSON('[1,2,3]') AS col_array
UNION ALL
SELECT 2 AS id, PARSE_JSON('[1]') AS col_array;
Query:
SELECT t.id
,f.INDEX
,f.VALUE
FROM tab_name t
, LATERAL FLATTEN(INPUT => t.col_array) f
-- WHERE f.VALUE::INT = 1;
Output:

ID | INDEX | VALUE
------------------
1  | 0     | 1
1  | 1     | 2
1  | 2     | 3
2  | 0     | 1
Lateral flatten can help extract the fields of a JSON object and is a very good alternative to extracting them one by one by name. However, sometimes the JSON object is nested, and extracting those nested objects normally requires knowing their names.
Here is an article that might help you to DYNAMICALLY EXTRACT THE FIELDS OF A MULTI-LEVEL JSON OBJECT USING LATERAL FLATTEN
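For the nested case, one sketch worth knowing is FLATTEN's RECURSIVE argument, which walks every level of the object without needing any field names (the inline PARSE_JSON literal is just a stand-in for your column):

SELECT f.path, f.key, f.value
FROM (SELECT PARSE_JSON('{"a": {"b": 1, "c": [2, 3]}}') AS json) t
    ,LATERAL FLATTEN(INPUT => t.json, RECURSIVE => TRUE) f;

This returns one row per node at every depth (paths a, a.b, a.c, a.c[0], a.c[1]), which you can then filter on value or path.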
I have a Rails 4.2.5.x project running Postgres. I have a table with a structure similar to this:
id, contact_id, date, domain, f1, f2, f3, etc
1, ABC, 01-01-16, abc.com, 1, 2, 3, ...
2, ABC, 01-01-15, abc.com, 1, 2, 3, ...
3, ABC, 01-01-14, abc.com, 1, 2, 3, ...
4, DEF, 01-01-15, abc.com, 1, 2, 3, ...
5, DEF, 01-01-14, abc.com, 1, 2, 3, ...
6, GHI, 01-11-16, abc.com, 1, 2, 3, ...
7, GHI, 01-01-16, abc.com, 1, 2, 3, ...
8, GHI, 01-01-15, abc.com, 1, 2, 3, ...
9, GHI, 01-01-14, abc.com, 1, 2, 3, ...
...
...
99, ZZZ, 01-01-16, xyz.com, 1, 2, 3, ...
I need a query to find:
The most recent rows by date
filtered by domain
for each distinct contact_id (grouped by?)
with a row-limited result. In this example I'm not adding this complication, but it needs to be factored in: if there are 50 distinct contacts, I am only interested in the top 3 by date.
ID is the primary key.
There are indexes on the other columns.
The fX columns indicate other data in the model that is needed (such as contact email, for example).
In MySQL, this would be a simple SELECT * FROM table WHERE domain='abc.com' GROUP BY contact_id ORDER BY date DESC; however, Postgres complains in this case that:
ActiveRecord::StatementInvalid: PG::GroupingError: ERROR: column "table.id" must appear in the GROUP BY clause or be used in an aggregate function
I expect to get back 3 rows: 1, 4 and 6. Ideally, I'd like to get the full rows back in a single query... but I accept that I may need one query to get the IDs first, then another to find the items I want.
This is the closest I have got:
ExampleContacts
.select(:contact_id, 'max(date) AS max_date')
.where(domain: 'abc.com')
.group(:contact_id)
.order('max_date desc')
.limit(3)
However... this returns the contact_id, not the id, and I cannot add the id for the row without breaking the grouping.
EDIT:
Essentially, I need to get back the primary key of each row while grouping on a non-primary-key column and sorting by another field.
If you want the rows, you don't need grouping. It's simply Contact.select('DISTINCT ON (contact_id) *').where(domain: 'abc.com').order(:contact_id, date: :desc).limit(3)
Just to clarify #murad-yusufov's accepted answer, I ended up doing this:
subquery = ExampleContacts.select('DISTINCT ON (contact_id) *')
                          .where(domain: 'abc.com')
                          .order(:contact_id)
                          .order(date: :desc)
ExampleContacts.from("(#{subquery.to_sql}) example_contacts")
               .order(date: :desc)
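For reference, the raw SQL this builds is the classic Postgres DISTINCT ON pattern; roughly (with the question's LIMIT 3 added back in, and example_contacts standing in for the real table name):

SELECT * FROM (
    SELECT DISTINCT ON (contact_id) *
    FROM example_contacts
    WHERE domain = 'abc.com'
    ORDER BY contact_id, date DESC
) example_contacts
ORDER BY date DESC
LIMIT 3;

The inner ORDER BY contact_id, date DESC makes DISTINCT ON keep the newest row per contact; the outer ORDER BY and LIMIT then pick the top 3 of those by date.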
I am a fairly experienced SQL Server developer, but this problem has me REALLY stumped.
I have a FUNCTION. The function references a table that looks something like this:
PERFORMANCE_ID, JUDGE_ID, JUDGING_CRITERIA, SCORE
--------------------------------------------------
101, 1, 'JUMP_HEIGHT', 8
101, 1, 'DEXTERITY', 7
101, 1, 'SYNCHRONIZATION', 6
101, 1, 'SPEED', 9
101, 2, 'JUMP_HEIGHT', 6
101, 2, 'DEXTERITY', 5
101, 2, 'SYNCHRONIZATION', 8
101, 2, 'SPEED', 9
101, 3, 'JUMP_HEIGHT', 9
101, 3, 'DEXTERITY', 6
101, 3, 'SYNCHRONIZATION', 7
101, 3, 'SPEED', 8
101, 4, 'JUMP_HEIGHT', 7
101, 4, 'DEXTERITY', 6
101, 4, 'SYNCHRONIZATION', 5
101, 4, 'SPEED', 8
In this example there are 4 judges (with IDs 1, 2, 3, and 4) judging a performance (101) against 4 different criteria (JUMP_HEIGHT, DEXTERITY, SYNCHRONIZATION, SPEED).
(Please keep in mind that in my real data there are 10+ criteria and at least 6 judges.)
I want to aggregate the scores into an average BY JUDGING_CRITERIA and then aggregate those into a final score by summing, something like this:
SELECT SUM(Avgs) FROM
    (SELECT AVG(SCORE) Avgs
     FROM PERFORMANCE_SCORES
     WHERE PERFORMANCE_ID = 101
     GROUP BY JUDGING_CRITERIA) result
BUT... that is not quite what I want, in that I want to EXCLUDE from the AVG the highest and lowest values for each JUDGING_CRITERIA grouping. That is the part I can't figure out. The AVG should be applied only to the MIDDLE values of the grouping for each JUDGING_CRITERIA: the HI and LO values for JUMP_HEIGHT should not be included in its average, the HI and LO values for DEXTERITY should not be included in its average, etc.
I know this could be accomplished with a cursor that sets the hi and lo values for each criterion to NULL, but this is a FUNCTION and needs to be extremely fast.
I am wondering: is there a way to do this as a set operation but still automatically exclude the HI and LO values from the aggregation?
Thanks for your help. I have a feeling it can probably be done with some advanced SQL syntax but I don't know it.
One last thing. This example is actually a simplification of the problem I am trying to solve. I have other constraints not mentioned here for the sake of simplicity.
Seth
EDIT: Moved the WHERE clause inside the CTE and removed JUDGE_ID from the partition.
This would be my approach
;WITH Agg1 AS
(
SELECT PERFORMANCE_ID
,JUDGE_ID
,JUDGING_CRITERIA
,SCORE
,MinFind = ROW_NUMBER() OVER ( PARTITION BY PERFORMANCE_ID
,JUDGING_CRITERIA
ORDER BY SCORE ASC )
,MaxFind = ROW_NUMBER() OVER ( PARTITION BY PERFORMANCE_ID
,JUDGING_CRITERIA
ORDER BY SCORE DESC )
FROM PERFORMANCE_SCORES
WHERE PERFORMANCE_ID=101
)
SELECT AVG(Score)
FROM Agg1
WHERE MinFind > 1
AND MaxFind > 1
GROUP BY JUDGING_CRITERIA
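To get the single final score the question asks for (the sum of the trimmed per-criteria averages), you can wrap the same CTE; a sketch along those lines, with a 1.0 * cast added so integer scores don't get truncated by integer AVG:

;WITH Agg1 AS
(
    SELECT PERFORMANCE_ID
          ,JUDGE_ID
          ,JUDGING_CRITERIA
          ,SCORE
          ,MinFind = ROW_NUMBER() OVER ( PARTITION BY PERFORMANCE_ID, JUDGING_CRITERIA ORDER BY SCORE ASC )
          ,MaxFind = ROW_NUMBER() OVER ( PARTITION BY PERFORMANCE_ID, JUDGING_CRITERIA ORDER BY SCORE DESC )
    FROM PERFORMANCE_SCORES
    WHERE PERFORMANCE_ID = 101
)
SELECT SUM(AvgScore) AS FinalScore
FROM (
    SELECT AVG(1.0 * SCORE) AS AvgScore   -- trimmed mean per criterion
    FROM Agg1
    WHERE MinFind > 1
      AND MaxFind > 1
    GROUP BY JUDGING_CRITERIA
) a;

Note that ROW_NUMBER drops exactly one high row and one low row per criterion even when scores tie, which is the usual Olympic-style trimming.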
I have some data I am querying. The table is composed of two columns - a unique ID, and a value. I would like to count the number of times each unique value appears (which can easily be done with a COUNT and GROUP BY), but I then want to be able to count that. So, I would like to see how many items appear twice, three times, etc.
So for the following data (ID, val)...
1, 2
2, 2
3, 1
4, 2
5, 1
6, 7
7, 1
The intermediate step would be (val, count)...
1, 3
2, 3
7, 1
And I would like to have (count_from_above, new_count)...
3, 2 -- since the count 3 appears twice in the previous table
1, 1 -- since the count 1 appears once in the previous table
Is there any query which can do that? If it helps, I'm working with Postgres. Thanks!
Try something like this:
select
    times,
    count(*) as new_count
from ( select
           val,
           count(*) as times     -- how many times each value appears
       from my_table             -- your table name here
       group by val ) a
group by times
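Against the sample data in the question, the inner query produces the intermediate (val, times) rows (1, 3), (2, 3), (7, 1), and the outer query then yields:

times | new_count
-----------------
3     | 2
1     | 1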