merge rows with the same timestamp into new column values - kql

I've got a bunch of log data from a KQL Table that I want to plot. Here's the simplified query:
MyDataTable
| where ['TimeGenerated'] >= ago(30m)
| summarize count=count() by bin(TimeGenerated, 15m), log_level
That gets me a table like this:
"TimeGenerated [UTC]"
"log_level"
count
"10/19/2022, 11:00:00.000 PM"
info
3527
"10/19/2022, 11:00:00.000 PM"
warn
33
"10/19/2022, 11:00:00.000 PM"
error
2
"10/19/2022, 11:15:00.000 PM"
info
5274
"10/19/2022, 11:15:00.000 PM"
warn
42
"10/19/2022, 11:15:00.000 PM"
error
5
"10/19/2022, 11:30:00.000 PM"
info
1553
"10/19/2022, 11:30:00.000 PM"
warn
15
"10/19/2022, 11:30:00.000 PM"
error
1
But I want to combine the entries with the same timestamp and put the count into different columns based on log_level. Essentially, I want to end up with this:
"TimeGenerated [UTC]"
info
warn
error
"10/19/2022, 11:00:00.000 PM"
3527
33
2
"10/19/2022, 11:15:00.000 PM"
5274
42
5
"10/19/2022, 11:30:00.000 PM"
1533
15
1
Any tips on how to do that transformation?
PS: An ideal solution would create the new columns dynamically based on the distinct values of log_level, but if I have to hard-code info/warn/error in the query, that's still better than the current situation.

Seems like you want a pivot:
datatable(TimeGenerated:datetime, LogLevel:string, Count:long)
[
datetime(2022-10-19 23:00:00), "info", 3527,
datetime(2022-10-19 23:00:00), "warn", 33,
datetime(2022-10-19 23:00:00), "error", 2,
datetime(2022-10-19 23:15:00), "info", 5274,
datetime(2022-10-19 23:15:00), "warn", 42,
datetime(2022-10-19 23:15:00), "error", 5,
datetime(2022-10-19 23:30:00), "info", 1553,
datetime(2022-10-19 23:30:00), "warn", 15,
datetime(2022-10-19 23:30:00), "error", 1
]
| evaluate pivot(LogLevel, sum(Count))

The solution below uses bag_pack(), make_bag(), and bag_unpack():
datatable(TimeGenerated:datetime, LogLevel:string, Count:long)
[
datetime(2022-10-19 23:00:00), "info", 3527,
datetime(2022-10-19 23:00:00), "warn", 33,
datetime(2022-10-19 23:00:00), "error", 2,
datetime(2022-10-19 23:15:00), "info", 5274,
datetime(2022-10-19 23:15:00), "warn", 42,
datetime(2022-10-19 23:15:00), "error", 5,
datetime(2022-10-19 23:30:00), "info", 1553,
datetime(2022-10-19 23:30:00), "warn", 15,
datetime(2022-10-19 23:30:00), "error", 1
]
| extend p = bag_pack(LogLevel, Count)       // one property bag per row, e.g. {"info": 3527}
| summarize b = make_bag(p) by TimeGenerated // merge all bags sharing a timestamp into one bag
| evaluate bag_unpack(b)                     // expand the bag keys into columns
TimeGenerated               | error | info | warn
----------------------------+-------+------+-----
2022-10-19 23:00:00.0000000 |     2 | 3527 |   33
2022-10-19 23:15:00.0000000 |     5 | 5274 |   42
2022-10-19 23:30:00.0000000 |     1 | 1553 |   15

Related

How to filter a dataframe given a specific daily hour?

Given the two data frames:
df1:
datetime v
2020-10-01 12:00:00 15
2020-10-02 4
2020-10-03 07:00:00 3
2020-10-03 08:01:00 51
2020-10-03 09:00:00 9
df2:
datetime p
2020-10-01 11:00:00 1
2020-10-01 12:00:00 2
2020-10-02 13:00:00 14
2020-10-02 13:01:00 5
2020-10-03 20:00:00 12
2020-10-03 02:01:00 30
2020-10-03 07:00:00 7
I want to merge these two dataframes into one, taking from each the value nearest to 08:00 on each day. The final result should be:
datetime v p
2020-10-01 08:00:00 15 1
2020-10-02 08:00:00 4 14
2020-10-03 08:00:00 51 7
How can I implement this?
Given the following dataframes:
import pandas as pd

df1 = pd.DataFrame(
    {
        "datetime": [
            "2020-10-01 12:00:00",
            "2020-10-02",
            "2020-10-03 07:00:00",
            "2020-10-03 08:01:00",
            "2020-10-03 09:00:00",
        ],
        "v": [15, 4, 3, 51, 9],
    }
)
df2 = pd.DataFrame(
    {
        "datetime": [
            "2020-10-01 11:00:00",
            "2020-10-01 12:00:00",
            "2020-10-02 13:00:00",
            "2020-10-02 13:01:00",
            "2020-10-03 20:00:00",
            "2020-10-03 02:01:00",
            "2020-10-03 07:00:00",
        ],
        "p": [1, 2, 14, 5, 12, 30, 7],
    }
)
You can define a helper function:
def align(df):
    # Set proper type
    df["datetime"] = pd.to_datetime(df["datetime"])
    # Slice df by day
    dfs = [
        df.copy().loc[df["datetime"].dt.date == item, :]
        for item in df["datetime"].dt.date.unique()
    ]
    # Evaluate distance in seconds between given hour and 08:00:00 and filter on min
    for i, df in enumerate(dfs):
        df["target"] = pd.to_datetime(df["datetime"].dt.date.astype(str) + " 08:00:00")
        df["distance"] = (
            df["target"].map(lambda x: x.hour * 3600 + x.minute * 60 + x.second)
            - df["datetime"].map(lambda x: x.hour * 3600 + x.minute * 60 + x.second)
        ).abs()
        dfs[i] = df.loc[df["distance"].idxmin(), :]
    # Concatenate filtered dataframes
    df = (
        pd.concat(dfs, axis=1)
        .T.drop(columns=["datetime", "distance"])
        .rename(columns={"target": "datetime"})
        .set_index("datetime")
    )
    return df
To apply on df1 and df2 and then merge:
df = pd.merge(
    right=align(df1), left=align(df2), how="outer", right_index=True, left_index=True
).reindex(columns=["v", "p"])
print(df)
# Output
                      v   p
datetime
2020-10-01 08:00:00  15   1
2020-10-02 08:00:00   4  14
2020-10-03 08:00:00  51   7
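For comparison, a more compact approach is possible with pd.merge_asof; this is only a sketch, assuming the same df1/df2 as above, and the helper name nearest_to_8am is made up for illustration:
import pandas as pd

def nearest_to_8am(df, value_col):
    # Parse timestamps and sort; merge_asof requires sorted keys
    df = df.assign(datetime=pd.to_datetime(df["datetime"])).sort_values("datetime")
    # Build one 08:00 target per calendar day present in the data
    targets = pd.DataFrame(
        {"datetime": pd.to_datetime(sorted(df["datetime"].dt.date.unique()))
                     + pd.Timedelta(hours=8)}
    )
    # For each target, take the row whose timestamp is closest in either direction
    nearest = pd.merge_asof(targets, df, on="datetime", direction="nearest")
    return nearest.set_index("datetime")[[value_col]]

df = nearest_to_8am(df1, "v").join(nearest_to_8am(df2, "p"), how="outer")
print(df)
Note that direction="nearest" picks the globally nearest timestamp, so with very sparse data it could match a row from an adjacent day, unlike the per-day slicing in align() above; for this sample data it gives the same result.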

Postgres find unique values in json

I am using PostgreSQL and have a table with id, sender (jsonb), and last login date, as follows:
id | sender | last login date
----+-----------------------------------------------------------------------------------+----------------------------------
1 | {"firstName": "Ivan", "lastName": "Ivanov", "middleName": "Ivanovich", } | 2021-04-10 14:49:36.234504 +00:00
2 | {"firstName": "Ivan", "lastName": "Ivanov", "middleName": "Ivanovich", } | 2021-04-09 14:49:36.234504 +00:00
3 | {"firstName": "Ivan", "lastName": "Ivanov", "middleName": "Ivanovich", } | 2021-04-11 14:49:36.234504 +00:00
4 | {"firstName": "Nickolai","lastName": "Nickov", "middleName": "Nikovich", } | 2021-04-30 14:49:36.234504 +00:00
5 | {"firstName": "Nickolai","lastName": "Nickov", "middleName": "Nikovich", } | 2021-04-29 14:49:36.234504 +00:00
6 | {"firstName": "Vladimir","lastName": "Vladimirovich","middleName": "Putout", } | 2021-04-15 14:49:36.234504 +00:00
7 | {"firstName": "Petr", "lastName": "Petrov", "middleName": "Petrovich", } | 2021-04-10 14:49:36.234504 +00:00
8 | {"firstName": "Ivan", "lastName": "Ivanov", "middleName": "Ivanovich", } | 2021-04-01 14:49:36.234504 +00:00
9 | {"firstName": "Ignat", "lastName": "Ignatov", "middleName": "Ignatovich", }| 2021-04-06 14:49:36.234504 +00:00
10| {"firstName": "Vladimir","lastName": "Vladimirovich","middleName": "Putout", } | 2021-04-17 14:49:36.234504 +00:00
11| {"firstName": "Ivan", "lastName": "Ivanov", "middleName": "Ivanovich", } | 2021-04-12 14:49:36.234504 +00:00
P.S. There may be other information in the "sender" column, but uniqueness only needs to be determined by "firstName", "lastName", and "middleName".
I need to return one row per unique name, keeping the latest date. In particular, I want to get this result:
id | sender | last login date
----+-----------------------------------------------------------------------------------+----------------------------------
4 | {"firstName": "Nickolai","lastName": "Nickov", "middleName": "Nikovich", } | 2021-04-30 14:49:36.234504 +00:00
10| {"firstName": "Vladimir","lastName": "Vladimirovich","middleName": "Putout", } | 2021-04-17 14:49:36.234504 +00:00
11| {"firstName": "Ivan", "lastName": "Ivanov", "middleName": "Ivanovich", } | 2021-04-12 14:49:36.234504 +00:00
7 | {"firstName": "Petr", "lastName": "Petrov", "middleName": "Petrovich", } | 2021-04-10 14:49:36.234504 +00:00
9 | {"firstName": "Ignat", "lastName": "Ignatov", "middleName": "Ignatovich", }| 2021-04-06 14:49:36.234504 +00:00
Everything is complicated by the fact that JSON is used. I thought about concatenating the name fields and then doing a GROUP BY with sorting, but unfortunately I could not get it to work.
You can use distinct on() to do this:
select distinct on (firstname, lastname) id, sender, last_login_date
from (
select id, sender, last_login_date,
sender ->> 'firstName' as firstname,
sender ->> 'lastName' as lastname
from the_table
) t
order by firstname, lastname, last_login_date desc
You can also do it using a window function:
select * from
(
select * ,rank() over (partition by sender->> 'firstName',sender->> 'lastName' order by last_login_date desc) rn
from yourtable
) t
where rn = 1
order by last_login_date desc
db<>fiddle here

sort dataframe by other dataframe

Given two dataframes:
knn_df
0 1 2 3
0 1.1565523 1.902790 1.927971 1.1530536
1 1.927971 1.1565523 1.815097 1.1530536
2 1.902790 1.1565523 1.815097 1.927971
3 1.815097 1.927971 1.902790 1.1530536
4 1.902790 1.1565523 1.815097 1.1530536
dates_df
0 1 2 3
0 2011-11-14 02:30:00.601 2003-08-12 00:00:00.000 2003-11-30 23:00:00.000 2011-10-25 12:00:00.000
1 2003-11-30 23:00:00.000 2011-11-14 02:30:00.601 2002-08-06 00:00:00.000 2011-10-25 12:00:00.000
2 2003-08-12 00:00:00.000 2011-11-14 02:30:00.601 2002-08-06 00:00:00.000 2003-11-30 23:00:00.000
3 2002-08-06 00:00:00.000 2003-11-30 23:00:00.000 2003-08-12 00:00:00.000 2011-10-25 12:00:00.000
4 2003-08-12 00:00:00.000 2011-11-14 02:30:00.601 2002-08-06 00:00:00.000 2011-10-25 12:00:00.000
I have to sort the values of knn_df by the dates in dates_df. Each row in dates_df corresponds to the same row in knn_df.
I tried to get the sort order like this:
np.argsort(dates_df.values,axis=1)[:,::-1]
array([[0, 3, 2, 1],
[1, 3, 0, 2],
[1, 3, 0, 2],
[3, 1, 2, 0],
[1, 3, 0, 2]])
That gives the right order of the columns for each row, but when I try to reorder:
Sorted_knn = (knn_df.values[np.arange(len(knn_df)),
np.argsort(dates_df.values,axis=1)[:,::-1]])
I get an error
IndexError: shape mismatch: indexing arrays could not be broadcast together with shapes (5,) (5,4)
I am missing something...
Add [:, None] to make the row indices a two-dimensional 5x1 array so they broadcast correctly against the (5, 4) column-index array:
a = np.argsort(dates_df.values,axis=1)[:,::-1]
b = knn_df.values[np.arange(len(knn_df))[:, None], a]
print (b)
[[1.1565523 1.1530536 1.927971 1.90279 ]
[1.1565523 1.1530536 1.927971 1.815097 ]
[1.1565523 1.927971 1.90279 1.815097 ]
[1.1530536 1.927971 1.90279 1.815097 ]
[1.1565523 1.1530536 1.90279 1.815097 ]]
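As a side note, np.take_along_axis does the same row-wise gathering without building the row-index array by hand; a small sketch, assuming the same knn_df and dates_df as above:
import numpy as np

# Descending sort order of each row's dates
order = np.argsort(dates_df.values, axis=1)[:, ::-1]
# Gather knn_df's values along axis 1 using that per-row order
b = np.take_along_axis(knn_df.values, order, axis=1)
print(b)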

How to build SQL to get data grouped by date

I have data in a Postgres database like this:
| id | name   | start_date          | end_date            |
|----+--------+---------------------+---------------------|
| 1  | Event1 | 2018-09-14 14:22:00 | 2018-09-15 14:22:00 |
| 2  | Event2 | 2018-09-15 14:22:00 | 2018-09-15 15:22:00 |
I need SQL that returns the data grouped by date, and if an event's duration (start_date to end_date) spans two days, it should be returned in both days' arrays. Everything should be ordered by date. So the response should look like this:
{
"2018-09-14": [
{
"id": 1,
"name": "Event1",
"start_date": "2018-09-14 14:22:00",
"end_date": "2018-09-15 14:22:00",
}],
"2018-09-15": [{
"id": 1,
"name": "Event1",
"start_date": "2018-09-14 14:22:00",
"end_date": "2018-09-15 14:22:00",
},
{
"id": 2,
"name": "Event2",
"start_date": "2018-09-15 14:22:00",
"end_date": "2018-09-15 15:22:00",
}]
}
Could you help me with this SQL?
demo: db<>fiddle
SELECT
jsonb_object_agg(dates, data_array)
FROM (
SELECT
dates,
jsonb_agg(data) as data_array
FROM (
SELECT DISTINCT
unnest(ARRAY[start_date::date, end_date::date]) as dates,
row_to_json(events)::jsonb as data
FROM
events
)s
GROUP BY dates
) s
- Convert each row into a JSON object with row_to_json().
- Aggregate both dates into one array with ARRAY[].
- unnest() expands the data, producing one row per date.
The result so far:
dates data
2018-09-14 {"id": 1, "name": "Event1", "end_date": "2018-09-15 14:22:00", "start_date": "2018-09-14 14:22:00"}
2018-09-15 {"id": 1, "name": "Event1", "end_date": "2018-09-15 14:22:00", "start_date": "2018-09-14 14:22:00"}
2018-09-15 {"id": 2, "name": "Event2", "end_date": "2018-09-15 15:22:00", "start_date": "2018-09-15 14:22:00"}
2018-09-15 {"id": 2, "name": "Event2", "end_date": "2018-09-15 15:22:00", "start_date": "2018-09-15 14:22:00"}
- DISTINCT eliminates the duplicate rows that occur when start_date and end_date fall on the same day.
- Group by the dates, aggregating the JSON elements into a JSON array (jsonb_agg).
- Finally, aggregate everything into one JSON object (jsonb_object_agg) with key == date and value == JSON array.
If you just want rows, you only need these steps:
- Aggregate both dates into one array with ARRAY[].
- unnest() expands the data, producing one row per date.
- DISTINCT eliminates the duplicate rows that occur when start_date and end_date fall on the same day.
Query:
SELECT DISTINCT
unnest(ARRAY[start_date::date, end_date::date]) as dates,
*
FROM
events
Result:
dates id name start_date end_date
2018-09-14 1 Event1 2018-09-14 14:22:00 2018-09-15 14:22:00
2018-09-15 1 Event1 2018-09-14 14:22:00 2018-09-15 14:22:00
2018-09-15 2 Event2 2018-09-15 14:22:00 2018-09-15 15:22:00
demo:db<>fiddle

Rendering multi series line chart with time on x axis in AnyChart

I am trying to render an AnyChart line chart with multiple series and date-time values on the x-axis, but I can't get it to render correctly. It draws series 1 with the given data, but for the second series it appends that series' date-time values to the x-axis after the first series' values and then plots series 2 against those.
The data looks like this:
"data": [
{"x": "10/2/2016 01:00:00 AM", "value": "128.14"},
{"x": "10/2/2016 01:10:00 AM", "value": "112.61"}
]
},{
// second series data
"data": [
{"x": "10/2/2016 01:01:00 AM", "value": "90.54"},
{"x": "10/2/2016 01:02:00 AM", "value": "104.19"},
{"x": "10/2/2016 01:11:00 AM", "value": "150.67"}
]
It should plot the x-axis in this order: 10/2/2016 01:00:00 AM, 10/2/2016 01:01:00 AM, 10/2/2016 01:02:00 AM, 10/2/2016 01:10:00 AM, 10/2/2016 01:11:00 AM,
but it plots: 10/2/2016 01:00:00 AM, 10/2/2016 01:10:00 AM, 10/2/2016 01:01:00 AM, 10/2/2016 01:02:00 AM, 10/2/2016 01:11:00 AM.
Update: here is the code:
anychart.onDocumentReady(function() {
// JSON data
var json = {
// chart settings
"chart": {
// chart type
"type": "line",
// chart title
"title": "Axes settings from JSON",
// series settings
"series": [{
// first series data
"data": [
{"x": "10/2/2016 01:00:00 AM", "value": 128.14},
{"x": "10/2/2016 01:10:00 AM", "value": 112.61},
{"x": "10/3/2016 01:00:00 AM", "value": 12.14},
{"x": "10/3/2016 01:10:00 AM", "value": 152.61},
]},{
"data": [
{"x": "10/2/2016 01:09:00 AM", "value": 28.14},
{"x": "10/2/2016 01:11:00 AM", "value": 12.61},
{"x": "10/3/2016 01:01:00 AM", "value": 1.14},
{"x": "10/3/2016 01:12:00 AM", "value": 15.61},
]
}],
// x scale settings
"xScale": {
ticks:
{scale: "DateTime"}
},
xAxes: [{
title: "Basic X Axis"
}],
// chart container
"container": "container"
}
};
// get JSON data
var chart = anychart.fromJson(json);
// draw chart
chart.draw();
});
With this type of data you need to use a scatter chart: http://docs.anychart.com/7.12.0/Basic_Charts_Types/Scatter_Chart
The datetime scale should be set like this in JSON:
"xScale": {
type: "datetime",
minimum: "10/02/2016 00:00:00",
maximum: "10/03/2016 12:00:00",
}
Here is a sample: https://jsfiddle.net/3ewcnp5j/102/