merge rows with the same timestamp into new column values - kql

I've got a bunch of log data from a KQL Table that I want to plot. Here's the simplified query:
MyDataTable
| where ['TimeGenerated'] >= ago(30m)
| summarize count=count() by bin(TimeGenerated, 15m), log_level
That gets me a table like this:
"TimeGenerated [UTC]"
"log_level"
count
"10/19/2022, 11:00:00.000 PM"
info
3527
"10/19/2022, 11:00:00.000 PM"
warn
33
"10/19/2022, 11:00:00.000 PM"
error
2
"10/19/2022, 11:15:00.000 PM"
info
5274
"10/19/2022, 11:15:00.000 PM"
warn
42
"10/19/2022, 11:15:00.000 PM"
error
5
"10/19/2022, 11:30:00.000 PM"
info
1553
"10/19/2022, 11:30:00.000 PM"
warn
15
"10/19/2022, 11:30:00.000 PM"
error
1
But I want to combine the entries with the same timestamp and put the count into different columns based on log_level. Essentially, I want to end up with this:
"TimeGenerated [UTC]"
info
warn
error
"10/19/2022, 11:00:00.000 PM"
3527
33
2
"10/19/2022, 11:15:00.000 PM"
5274
42
5
"10/19/2022, 11:30:00.000 PM"
1533
15
1
Any tips on how to do that transformation?
PS: An ideal solution would create the new columns dynamically based on the distinct values of log_level, but if I have to hard-code info/warn/error in the query, that's still better than the current situation.

Seems like you want a pivot:
datatable(TimeGenerated:datetime, LogLevel:string, Count:long)
[
datetime(2022-10-19 23:00:00), "info", 3527,
datetime(2022-10-19 23:00:00), "warn", 33,
datetime(2022-10-19 23:00:00), "error", 2,
datetime(2022-10-19 23:15:00), "info", 5274,
datetime(2022-10-19 23:15:00), "warn", 42,
datetime(2022-10-19 23:15:00), "error", 5,
datetime(2022-10-19 23:30:00), "info", 1553,
datetime(2022-10-19 23:30:00), "warn", 15,
datetime(2022-10-19 23:30:00), "error", 1
]
| evaluate pivot(LogLevel, sum(Count))

The solution below uses bag_pack(), make_bag(), and bag_unpack():
datatable(TimeGenerated:datetime, LogLevel:string, Count:long)
[
datetime(2022-10-19 23:00:00), "info", 3527,
datetime(2022-10-19 23:00:00), "warn", 33,
datetime(2022-10-19 23:00:00), "error", 2,
datetime(2022-10-19 23:15:00), "info", 5274,
datetime(2022-10-19 23:15:00), "warn", 42,
datetime(2022-10-19 23:15:00), "error", 5,
datetime(2022-10-19 23:30:00), "info", 1553,
datetime(2022-10-19 23:30:00), "warn", 15,
datetime(2022-10-19 23:30:00), "error", 1
]
| extend p = bag_pack(LogLevel, Count)       // one property bag per row, e.g. {"info": 3527}
| summarize b = make_bag(p) by TimeGenerated // merge all bags sharing a timestamp into one bag
| evaluate bag_unpack(b)                     // expand the bag keys into columns
TimeGenerated               | error | info | warn
----------------------------+-------+------+-----
2022-10-19 23:00:00.0000000 |     2 | 3527 |   33
2022-10-19 23:15:00.0000000 |     5 | 5274 |   42
2022-10-19 23:30:00.0000000 |     1 | 1553 |   15

Related

How to filter a dataframe given a specific daily hour?

Given the two data frames:
df1:
datetime v
2020-10-01 12:00:00 15
2020-10-02 4
2020-10-03 07:00:00 3
2020-10-03 08:01:00 51
2020-10-03 09:00:00 9
df2:
datetime p
2020-10-01 11:00:00 1
2020-10-01 12:00:00 2
2020-10-02 13:00:00 14
2020-10-02 13:01:00 5
2020-10-03 20:00:00 12
2020-10-03 02:01:00 30
2020-10-03 07:00:00 7
I want to merge these two dataframes into one, taking from each the value nearest to 08:00 on each day. The final result should be:
datetime v p
2020-10-01 08:00:00 15 1
2020-10-02 08:00:00 4 14
2020-10-03 08:00:00 51 7
How can I implement this?
Given the following dataframes:
import pandas as pd

df1 = pd.DataFrame(
    {
        "datetime": [
            "2020-10-01 12:00:00",
            "2020-10-02",
            "2020-10-03 07:00:00",
            "2020-10-03 08:01:00",
            "2020-10-03 09:00:00",
        ],
        "v": [15, 4, 3, 51, 9],
    }
)
df2 = pd.DataFrame(
    {
        "datetime": [
            "2020-10-01 11:00:00",
            "2020-10-01 12:00:00",
            "2020-10-02 13:00:00",
            "2020-10-02 13:01:00",
            "2020-10-03 20:00:00",
            "2020-10-03 02:01:00",
            "2020-10-03 07:00:00",
        ],
        "p": [1, 2, 14, 5, 12, 30, 7],
    }
)
You can define a helper function:
def align(df):
    # Set proper type
    df["datetime"] = pd.to_datetime(df["datetime"])
    # Slice df by day
    dfs = [
        df.copy().loc[df["datetime"].dt.date == item, :]
        for item in df["datetime"].dt.date.unique()
    ]
    # Evaluate distance in seconds between given hour and 08:00:00 and filter on min
    for i, df in enumerate(dfs):
        df["target"] = pd.to_datetime(df["datetime"].dt.date.astype(str) + " 08:00:00")
        df["distance"] = (
            df["target"].map(lambda x: x.hour * 3600 + x.minute * 60 + x.second)
            - df["datetime"].map(lambda x: x.hour * 3600 + x.minute * 60 + x.second)
        ).abs()
        dfs[i] = df.loc[df["distance"].idxmin(), :]
    # Concatenate filtered dataframes
    df = (
        pd.concat(dfs, axis=1)
        .T.drop(columns=["datetime", "distance"])
        .rename(columns={"target": "datetime"})
        .set_index("datetime")
    )
    return df
To apply on df1 and df2 and then merge:
df = pd.merge(
    right=align(df1), left=align(df2), how="outer", right_index=True, left_index=True
).reindex(columns=["v", "p"])
print(df)
# Output
                      v   p
datetime
2020-10-01 08:00:00  15   1
2020-10-02 08:00:00   4  14
2020-10-03 08:00:00  51   7
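For comparison, a more compact approach is possible with pd.merge_asof; this is only a sketch, assuming the same df1/df2 as above, and the helper name nearest_to_8am is made up for illustration:
import pandas as pd

def nearest_to_8am(df, value_col):
    # Parse timestamps and sort; merge_asof requires sorted keys
    df = df.assign(datetime=pd.to_datetime(df["datetime"])).sort_values("datetime")
    # Build one 08:00 target per calendar day present in the data
    targets = pd.DataFrame(
        {"datetime": pd.to_datetime(sorted(df["datetime"].dt.date.unique()))
                     + pd.Timedelta(hours=8)}
    )
    # For each target, take the row whose timestamp is closest in either direction
    nearest = pd.merge_asof(targets, df, on="datetime", direction="nearest")
    return nearest.set_index("datetime")[[value_col]]

df = nearest_to_8am(df1, "v").join(nearest_to_8am(df2, "p"), how="outer")
print(df)
Note that direction="nearest" picks the globally nearest timestamp, so with very sparse data it could match a row from an adjacent day, unlike the per-day slicing in align() above; for this sample data it gives the same result.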

Postgres find unique values in json

I am using PostgreSQL and have a table with id, sender (jsonb), and last login date, as follows:
id | sender | last login date
----+-----------------------------------------------------------------------------------+----------------------------------
1 | {"firstName": "Ivan", "lastName": "Ivanov", "middleName": "Ivanovich", } | 2021-04-10 14:49:36.234504 +00:00
2 | {"firstName": "Ivan", "lastName": "Ivanov", "middleName": "Ivanovich", } | 2021-04-09 14:49:36.234504 +00:00
3 | {"firstName": "Ivan", "lastName": "Ivanov", "middleName": "Ivanovich", } | 2021-04-11 14:49:36.234504 +00:00
4 | {"firstName": "Nickolai","lastName": "Nickov", "middleName": "Nikovich", } | 2021-04-30 14:49:36.234504 +00:00
5 | {"firstName": "Nickolai","lastName": "Nickov", "middleName": "Nikovich", } | 2021-04-29 14:49:36.234504 +00:00
6 | {"firstName": "Vladimir","lastName": "Vladimirovich","middleName": "Putout", } | 2021-04-15 14:49:36.234504 +00:00
7 | {"firstName": "Petr", "lastName": "Petrov", "middleName": "Petrovich", } | 2021-04-10 14:49:36.234504 +00:00
8 | {"firstName": "Ivan", "lastName": "Ivanov", "middleName": "Ivanovich", } | 2021-04-01 14:49:36.234504 +00:00
9 | {"firstName": "Ignat", "lastName": "Ignatov", "middleName": "Ignatovich", }| 2021-04-06 14:49:36.234504 +00:00
10| {"firstName": "Vladimir","lastName": "Vladimirovich","middleName": "Putout", } | 2021-04-17 14:49:36.234504 +00:00
11| {"firstName": "Ivan", "lastName": "Ivanov", "middleName": "Ivanovich", } | 2021-04-12 14:49:36.234504 +00:00
P.S. There may be other information in the "sender" column, but uniqueness only needs to be determined by "firstName", "lastName", and "middleName".
I need to return one row per unique name, keeping the latest date. In particular, I want to get this result:
id | sender | last login date
----+-----------------------------------------------------------------------------------+----------------------------------
4 | {"firstName": "Nickolai","lastName": "Nickov", "middleName": "Nikovich", } | 2021-04-30 14:49:36.234504 +00:00
10| {"firstName": "Vladimir","lastName": "Vladimirovich","middleName": "Putout", } | 2021-04-17 14:49:36.234504 +00:00
11| {"firstName": "Ivan", "lastName": "Ivanov", "middleName": "Ivanovich", } | 2021-04-12 14:49:36.234504 +00:00
7 | {"firstName": "Petr", "lastName": "Petrov", "middleName": "Petrovich", } | 2021-04-10 14:49:36.234504 +00:00
9 | {"firstName": "Ignat", "lastName": "Ignatov", "middleName": "Ignatovich", }| 2021-04-06 14:49:36.234504 +00:00
Everything is complicated by the fact that JSON is used. I thought about concatenating the name fields and then doing a GROUP BY with sorting, but unfortunately I could not get it to work.
You can use distinct on() to do this:
select distinct on (firstname, lastname) id, sender, last_login_date
from (
select id, sender, last_login_date,
sender ->> 'firstName' as firstname,
sender ->> 'lastName' as lastname
from the_table
) t
order by firstname, lastname, last_login_date desc
You can also do it using a window function:
select * from
(
select * ,rank() over (partition by sender->> 'firstName',sender->> 'lastName' order by last_login_date desc) rn
from yourtable
) t
where rn = 1
order by last_login_date desc
db<>fiddle here

sort dataframe by other dataframe

Given two dataframes:
knn_df
0 1 2 3
0 1.1565523 1.902790 1.927971 1.1530536
1 1.927971 1.1565523 1.815097 1.1530536
2 1.902790 1.1565523 1.815097 1.927971
3 1.815097 1.927971 1.902790 1.1530536
4 1.902790 1.1565523 1.815097 1.1530536
dates_df
0 1 2 3
0 2011-11-14 02:30:00.601 2003-08-12 00:00:00.000 2003-11-30 23:00:00.000 2011-10-25 12:00:00.000
1 2003-11-30 23:00:00.000 2011-11-14 02:30:00.601 2002-08-06 00:00:00.000 2011-10-25 12:00:00.000
2 2003-08-12 00:00:00.000 2011-11-14 02:30:00.601 2002-08-06 00:00:00.000 2003-11-30 23:00:00.000
3 2002-08-06 00:00:00.000 2003-11-30 23:00:00.000 2003-08-12 00:00:00.000 2011-10-25 12:00:00.000
4 2003-08-12 00:00:00.000 2011-11-14 02:30:00.601 2002-08-06 00:00:00.000 2011-10-25 12:00:00.000
I have to sort the values of knn_df by the dates in dates_df. Each row in dates_df corresponds to the same row in knn_df.
I tried to get the sort order like this:
np.argsort(dates_df.values,axis=1)[:,::-1]
array([[0, 3, 2, 1],
[1, 3, 0, 2],
[1, 3, 0, 2],
[3, 1, 2, 0],
[1, 3, 0, 2]])
That gives the right order of the columns for each row, but when I try to reorder:
Sorted_knn = (knn_df.values[np.arange(len(knn_df)),
np.argsort(dates_df.values,axis=1)[:,::-1]])
I get an error
IndexError: shape mismatch: indexing arrays could not be broadcast together with shapes (5,) (5,4)
I am missing something...
Add [:, None] to make the row indices a two-dimensional 5x1 array so they broadcast correctly against the (5, 4) column-index array:
a = np.argsort(dates_df.values,axis=1)[:,::-1]
b = knn_df.values[np.arange(len(knn_df))[:, None], a]
print (b)
[[1.1565523 1.1530536 1.927971 1.90279 ]
[1.1565523 1.1530536 1.927971 1.815097 ]
[1.1565523 1.927971 1.90279 1.815097 ]
[1.1530536 1.927971 1.90279 1.815097 ]
[1.1565523 1.1530536 1.90279 1.815097 ]]
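As a side note, np.take_along_axis does the same row-wise gathering without building the row-index array by hand; a small sketch, assuming the same knn_df and dates_df as above:
import numpy as np

# Descending sort order of each row's dates
order = np.argsort(dates_df.values, axis=1)[:, ::-1]
# Gather knn_df's values along axis 1 using that per-row order
b = np.take_along_axis(knn_df.values, order, axis=1)
print(b)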

How to build SQL to get data grouped by date

I have data in a Postgres database like this:
| id | name   | start_date          | end_date            |
|----+--------+---------------------+---------------------|
| 1  | Event1 | 2018-09-14 14:22:00 | 2018-09-15 14:22:00 |
| 2  | Event2 | 2018-09-15 14:22:00 | 2018-09-15 15:22:00 |
I need SQL that returns the data grouped by date, and if an event's duration (start_date to end_date) spans two days, it should be returned in both days' arrays. Everything should be ordered by date. So the response should look like this:
{
"2018-09-14": [
{
"id": 1,
"name": "Event1",
"start_date": "2018-09-14 14:22:00",
"end_date": "2018-09-15 14:22:00",
}],
"2018-09-15": [{
"id": 1,
"name": "Event1",
"start_date": "2018-09-14 14:22:00",
"end_date": "2018-09-15 14:22:00",
},
{
"id": 2,
"name": "Event2",
"start_date": "2018-09-15 14:22:00",
"end_date": "2018-09-15 15:22:00",
}]
}
Could you help me with this SQL?
demo: db<>fiddle
SELECT
jsonb_object_agg(dates, data_array)
FROM (
SELECT
dates,
jsonb_agg(data) as data_array
FROM (
SELECT DISTINCT
unnest(ARRAY[start_date::date, end_date::date]) as dates,
row_to_json(events)::jsonb as data
FROM
events
)s
GROUP BY dates
) s
- Convert each row into a JSON object with row_to_json().
- Aggregate both dates into one array with ARRAY[].
- unnest() expands the data, producing one row per date.
The result so far:
dates data
2018-09-14 {"id": 1, "name": "Event1", "end_date": "2018-09-15 14:22:00", "start_date": "2018-09-14 14:22:00"}
2018-09-15 {"id": 1, "name": "Event1", "end_date": "2018-09-15 14:22:00", "start_date": "2018-09-14 14:22:00"}
2018-09-15 {"id": 2, "name": "Event2", "end_date": "2018-09-15 15:22:00", "start_date": "2018-09-15 14:22:00"}
2018-09-15 {"id": 2, "name": "Event2", "end_date": "2018-09-15 15:22:00", "start_date": "2018-09-15 14:22:00"}
- DISTINCT eliminates the duplicate rows that occur when start_date and end_date fall on the same day.
- Group by the dates, aggregating the JSON elements into a JSON array (jsonb_agg).
- Finally, aggregate everything into one JSON object (jsonb_object_agg) with key == date and value == JSON array.
If you just want rows, you only need these steps:
- Aggregate both dates into one array with ARRAY[].
- unnest() expands the data, producing one row per date.
- DISTINCT eliminates the duplicate rows that occur when start_date and end_date fall on the same day.
Query:
SELECT DISTINCT
unnest(ARRAY[start_date::date, end_date::date]) as dates,
*
FROM
events
Result:
dates id name start_date end_date
2018-09-14 1 Event1 2018-09-14 14:22:00 2018-09-15 14:22:00
2018-09-15 1 Event1 2018-09-14 14:22:00 2018-09-15 14:22:00
2018-09-15 2 Event2 2018-09-15 14:22:00 2018-09-15 15:22:00
demo:db<>fiddle

Rendering multi series line chart with time on x axis in AnyChart

I am trying to render an AnyChart line chart with multiple series and date-time values on the x-axis, but I can't get it to render correctly. It draws series 1 with the given data, but for the second series it appends that series' date-time values to the x-axis after the first series' values and then plots series 2 against those.
The data looks like this:
"data": [
{"x": "10/2/2016 01:00:00 AM", "value": "128.14"},
{"x": "10/2/2016 01:10:00 AM", "value": "112.61"}
]
},{
// second series data
"data": [
{"x": "10/2/2016 01:01:00 AM", "value": "90.54"},
{"x": "10/2/2016 01:02:00 AM", "value": "104.19"},
{"x": "10/2/2016 01:11:00 AM", "value": "150.67"}
]
It should plot the x-axis in this order: 10/2/2016 01:00:00 AM, 10/2/2016 01:01:00 AM, 10/2/2016 01:02:00 AM, 10/2/2016 01:10:00 AM, 10/2/2016 01:11:00 AM,
but it plots: 10/2/2016 01:00:00 AM, 10/2/2016 01:10:00 AM, 10/2/2016 01:01:00 AM, 10/2/2016 01:02:00 AM, 10/2/2016 01:11:00 AM.
Update: here is the code:
anychart.onDocumentReady(function() {
// JSON data
var json = {
// chart settings
"chart": {
// chart type
"type": "line",
// chart title
"title": "Axes settings from JSON",
// series settings
"series": [{
// first series data
"data": [
{"x": "10/2/2016 01:00:00 AM", "value": 128.14},
{"x": "10/2/2016 01:10:00 AM", "value": 112.61},
{"x": "10/3/2016 01:00:00 AM", "value": 12.14},
{"x": "10/3/2016 01:10:00 AM", "value": 152.61},
]},{
"data": [
{"x": "10/2/2016 01:09:00 AM", "value": 28.14},
{"x": "10/2/2016 01:11:00 AM", "value": 12.61},
{"x": "10/3/2016 01:01:00 AM", "value": 1.14},
{"x": "10/3/2016 01:12:00 AM", "value": 15.61},
]
}],
// x scale settings
"xScale": {
ticks:
{scale: "DateTime"}
},
xAxes: [{
title: "Basic X Axis"
}],
// chart container
"container": "container"
}
};
// get JSON data
var chart = anychart.fromJson(json);
// draw chart
chart.draw();
});
With this type of data you need to use a scatter chart: http://docs.anychart.com/7.12.0/Basic_Charts_Types/Scatter_Chart
The datetime scale should be set like this in JSON:
"xScale": {
type: "datetime",
minimum: "10/02/2016 00:00:00",
maximum: "10/03/2016 12:00:00",
}
Here is a sample: https://jsfiddle.net/3ewcnp5j/102/