Using different time periods in one Azure log query - azure-log-analytics

So I have an Azure log query (KQL) that takes in a date parameter, i.e. check log entries for the last X days. In this query I look up values from two different logs, and I would like the two lookups to use different date ranges for their respective logs. To get an idea of what I'm looking for, see the query below, which is almost what I have now, with a bit of pseudo code where I can't quite figure out how to structure it.
let usernames = LogNumberOne
| where TimeGenerated > {timeperiod:start} and TimeGenerated < {timeperiod:end}
| ... ; // bla bla bla, lots of stuff
let computernames = LogNumberTwo
| where TimeGenerated > {timeperiod:start} - 2d
| ... ; // bla bla bla, lots of stuff
usernames
| join kind=innerunique (computernames) on session_id
| ... // some logic to display the table
So from LogNumberOne I want the values within the specified time period, but from LogNumberTwo I want the values from the specified time period plus the 2 days before it. Is this possible, or do I need another parameter? I have tried the query above with {timeperiod:start} - 2d, but that doesn't seem to work; it just uses the {timeperiod:start} value without subtracting the 2 days.
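For reference, the datetime arithmetic itself is fine in KQL once the boundaries are actual datetime values. A minimal sketch of the intended query, with hypothetical periodStart/periodEnd/lookback bindings standing in for the {timeperiod} parameter and the placeholder filters omitted:
// Hedged sketch: bind the time range once, then shift the start for the second log.
let periodStart = datetime(2020-05-01);   // stands in for {timeperiod:start}
let periodEnd = datetime(2020-05-15);     // stands in for {timeperiod:end}
let lookback = 2d;                        // extra history wanted for LogNumberTwo
let usernames = LogNumberOne
| where TimeGenerated > periodStart and TimeGenerated < periodEnd;
let computernames = LogNumberTwo
| where TimeGenerated > periodStart - lookback and TimeGenerated < periodEnd;
usernames
| join kind=innerunique (computernames) on session_id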

One variant is to join on the key first and then filter on the time difference afterwards:
let usernames = datatable(col1:string, session_id:string, Timestamp:datetime )
[
'user1', '1', datetime(2020-05-14 16:00:00),
'user2', '2', datetime(2020-05-14 16:05:30),
];
let computernames =
datatable(session_id:string, ComputerName:string, Timestamp:datetime )
[
'1', 'Computer1', datetime(2020-05-14 16:00:30),
'2', 'Computer2', datetime(2020-05-14 16:06:20),
];
usernames
| join kind=inner (
computernames
| project-rename ComputerTime = Timestamp
) on session_id
| where Timestamp between ((ComputerTime - 2d) .. ComputerTime)
If large data sets are involved in the join, use the time-window join technique described in the following article:
https://learn.microsoft.com/en-us/azure/data-explorer/kusto/query/join-timewindow
let window = 2d;
let usernames = datatable(col1:string, session_id:string, Timestamp:datetime )
[
'user1', '1', datetime(2020-05-13 16:00:00),
'user2', '2', datetime(2020-05-12 16:05:30),
];
let computernames =
datatable(session_id:string, ComputerName:string, Timestamp:datetime )
[
'1', 'Computer1', datetime(2020-05-14 16:00:30),
'2', 'Computer2', datetime(2020-05-14 16:06:20),
];
usernames
| extend _timeKey = range(bin(Timestamp, 1d), bin(Timestamp, 1d)+window, 1d)
| mv-expand _timeKey to typeof(datetime)
| join kind=inner (
computernames
| project-rename ComputerTime = Timestamp
| extend _timeKey = bin(ComputerTime, 1d)
) on session_id, _timeKey
| where Timestamp between ((ComputerTime - window) .. ComputerTime)

Related

Convert triple-table to object-table in SQLite JSON1

I have an SQLite table, triples, that contains triple information in { id, rel, tgt } format [1]. I would like to create a view that exposes this triple-format data in "object format", which is more easily consumed by applications reading from this database. In theory SQLite's JSON1 extension would allow me to construct such objects, but I'm struggling.
My current query
select distinct json_object(
'id', id,
rel, json_group_array(distinct tgt)
) as entity from content
group by src, rel, tgt
order by src, rel, tgt
does not work correctly. It produces objects like
{ id: 'a', 'is': ['b'] }
{ id: 'a', 'is': ['c'] }
Rather than
{ id: 'a', 'is': ['b', 'c'] }
It also produces duplicate keys like
{ id: 'a', id: ['a'] }
Edit
This is closer, but it does not handle IDs correctly: it constructs an array rather than a string.
create view if not exists entity as
select distinct json_group_object(
rel, json_array(distinct tgt)
) as entity from content
group by src
I think iif might help.
Question:
Can you help me adjust my query to produce the correct output (see below)? Please comment if anything needs disambiguation or clarification.
Desired Output
Input:
Triple Format:
id | rel | tgt
-----------------------
Bob | is | Bob
Bob | is | Person
Bob | age | 20
Bob | likes | cake
Bob | likes | chocolate
Alice | id | Alice
Alice | is | Person
Alice | hates | chocolate
Output:
Object Format [2]:
{
id: Bob,
is: [ Person ],
age: [ 20 ],
likes: [ cake, chocolate ]
}
{
id: Alice,
is: [ Person ],
hates: [ chocolate ]
}
Details
[1] This dataset has unpredictable structure; I can assume no prior knowledge of what 'rel' keys exist beyond id. A triple <src> id <src> will exist for every src parameter.
[2] The objects should have the following format. id must not be overwritten.
{
id: <id>
<distinct rel>: [
< tgt >
]
}
Relevant Information
https://www.sqlite.org/json1.html
CREATE TABLE content (
id VARCHAR(32),
rel VARCHAR(32),
tgt VARCHAR(32)
);
INSERT INTO
content
VALUES
('Bob' , 'id' , 'Bob'),
('Bob' , 'is' , 'Person'),
('Bob' , 'age' , '20'),
('Bob' , 'likes', 'cake'),
('Bob' , 'likes', 'chocolate'),
('Alice', 'id' , 'Alice'),
('Alice', 'is' , 'Person'),
('Alice', 'hates', 'chocolate');
WITH
id_rel AS
(
SELECT
id,
rel,
JSON_GROUP_ARRAY(tgt) AS tgt
FROM
content
GROUP BY
id,
rel
)
SELECT
JSON_GROUP_OBJECT(
rel,
CASE WHEN rel='id'
THEN JSON(tgt)->0
ELSE JSON(tgt)
END
)
AS entity
FROM
id_rel
GROUP BY
id
ORDER BY
id;
Result:
entity
{"hates":["chocolate"],"id":"Alice","is":["Person"]}
{"age":["20"],"id":"Bob","is":["Person"],"likes":["cake","chocolate"]}
You must aggregate in two steps, as your edited code doesn't combine cake and chocolate into a single array of two elements.
Fiddle: https://dbfiddle.uk/9ptyuhuj
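Since the original goal was a view, the same two-step aggregation can be wrapped in one. A minimal sketch, assuming the content table above (the -> operator needs SQLite 3.38 or later):
-- Hedged sketch: expose the two-step aggregation above as a view.
CREATE VIEW IF NOT EXISTS entity AS
WITH id_rel AS (
    SELECT id, rel, JSON_GROUP_ARRAY(tgt) AS tgt
    FROM content
    GROUP BY id, rel
)
SELECT JSON_GROUP_OBJECT(
           rel,
           CASE WHEN rel = 'id' THEN JSON(tgt)->0 ELSE JSON(tgt) END
       ) AS entity
FROM id_rel
GROUP BY id;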

BigQuery select multiple tables with different column names

Consider the following BigQuery tables schemas in my dataset my_dataset:
Table_0001: NAME (string); NUMBER (string)
Table_0002: NAME(string); NUMBER (string)
Table_0003: NAME(string); NUMBER (string)
...
Table_0865: NAME (string); CODE (string)
Table_0866: NAME(string); CODE (string)
...
I now want to union all tables using :
select * from `my_dataset.*`
However, this will not yield the CODE column from the second set of tables. From my understanding, the schema of the first table in the dataset is adopted instead.
So the result will be something like:
| NAME | NUMBER |
__________________
| John | 123456 |
| Mary | 123478 |
| ... | ...... |
| Abdul | null |
| Ariel | null |
I tried to tap into the INFORMATION_SCHEMA so as to select the two sets of tables separately and then union them:
with t_code as (
select
table_name
from my_dataset.INFORMATION_SCHEMA.COLUMNS
where column_name = 'CODE'
)
select t.NAME, t.CODE as NUMBER from `my_dataset.*` as t
where _TABLE_SUFFIX in (select * from t_code)
However, the script still takes its schema from the first table of my_dataset and returns: Error Running Query: Name CODE not found inside t.
So now I'm at a loss: how can I union all my tables without having to union them one by one? I.e., how do I select CODE as NUMBER in the second set of tables?
Note: Although a similar question was asked over here, the accepted answer did not seem to actually answer it (as far as I can tell).
The trick I see is to first gather all the codes by running:
create table `my_another_dataset.codes` as
select * from `my_dataset.*` where not code is null
Then do a simple fake update of just one table that has the NUMBER column; this will make the NUMBER-column schema the default. So now you can gather all the numbers:
create table `my_another_dataset.numbers` as
select * from `my_dataset.*` where not number is null
Finally, you can do a simple union:
select * from `my_another_dataset.numbers` union all
select * from `my_another_dataset.codes`
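If you prefer to keep the column mapping explicit rather than relying on positional matching in UNION ALL, a minimal sketch (staging tables as above; column names assumed from the schemas in the question):
-- Hedged sketch: align both schema variants on a single NUMBER column.
select NAME, NUMBER from `my_another_dataset.numbers`
union all
select NAME, CODE as NUMBER from `my_another_dataset.codes`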
Note: see also my comment below your question

How do I return columns in timeseries data as arrays

I am using postgres and timescaledb to record data that will be used for dashboards/charting.
I have no issues getting the data I need; I'm just not sure I'm doing it the most efficient way.
Say I have this query
SELECT time, queued_calls, active_calls
FROM call_data
ORDER BY time DESC LIMIT 100;
My front end receives this data for charting as an array of row objects.
I feel like this is very inefficient, since the column name is repeated for each value.
Would it be better to send the data in a more compact form, with each column as an array of values, like this:
{
time: [...],
queued_calls: [...],
active_calls: [...]
}
I guess my question is: should I be restructuring my query so the column data is returned as arrays somehow, or is this something I should do after the query, on the server, before sending it to the client?
-- Update -- Additional Information
I'm using Node.js with Express and Sequelize as the ORM; however, in this case I'm just executing a raw query via Sequelize.
The charting library I'm using on the front end also takes the series data as arrays so I was trying to kill two birds with one stone.
Frontend chart data format:
xaxis:{
categories: [...time]
}
series:[
{name: "Queued Calls", data: [...queued_calls]},
{name: "Active Calls", data: [...active_calls]}
]
Backend code:
async function getLocationData(locationId) {
return await db.sequelize.query(
'SELECT time, queued_calls, active_calls FROM location_data WHERE location_id = :locationId ORDER BY time DESC LIMIT 100;',
{
replacements: { locationId },
type: QueryTypes.SELECT,
}
);
}
...
app.get('/locationData/:locationId', async (req, res) => {
try {
const { locationId } = req.params;
const results = await getLocationData(parseInt(locationId));
res.send(results);
} catch (e) {
console.log('Error getting data', e);
}
});
Maybe you're looking for array aggregation?
Let me see if I follow the idea:
create table call_data (time timestamp, queued_calls integer, active_calls integer);
CREATE TABLE
I skipped the location_id just to simplify here.
Inserting some data:
tsdb=> insert into call_data values ('2021-03-03 01:00', 10, 20);
INSERT 0 1
tsdb=> insert into call_data values ('2021-03-03 02:00', 11, 22);
INSERT 0 1
tsdb=> insert into call_data values ('2021-03-03 03:00', 12, 25);
INSERT 0 1
And now check the data:
SELECT time, queued_calls, active_calls FROM call_data;
time | queued_calls | active_calls
---------------------+--------------+--------------
2021-03-03 01:00:00 | 10 | 20
2021-03-03 02:00:00 | 11 | 22
2021-03-03 03:00:00 | 12 | 25
(3 rows)
And if you want to get only the dates you'll have:
SELECT time::date, queued_calls, active_calls FROM call_data;
time | queued_calls | active_calls
------------+--------------+--------------
2021-03-03 | 10 | 20
2021-03-03 | 11 | 22
2021-03-03 | 12 | 25
(3 rows)
but that's still not grouped, so you can use GROUP BY combined with array_agg:
SELECT time::date, array_agg(queued_calls), array_agg(active_calls) FROM call_data group by time::date;
time | array_agg | array_agg
------------+------------+------------
2021-03-03 | {10,11,12} | {20,22,25}
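If the goal is the shape from the question (one array per column over the latest 100 rows rather than per day), the same aggregation works without the date grouping. A minimal sketch using the column names from the question:
-- Hedged sketch: turn each column of the latest 100 rows into a single array, oldest value first.
SELECT array_agg(time ORDER BY time) AS time,
       array_agg(queued_calls ORDER BY time) AS queued_calls,
       array_agg(active_calls ORDER BY time) AS active_calls
FROM (
  SELECT time, queued_calls, active_calls
  FROM call_data
  ORDER BY time DESC
  LIMIT 100
) AS latest;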
If the server compresses data as it sends it, the array structure you're thinking of won't make much difference at the network layer.
If you use that array structure you also lose one of the benefits of JSON: keeping the structure with the data. You might gain some speed, but to look up the active calls for a given time you'd have to rely on the correct index position, which opens up the possibility of indexing errors.
I recommend leaving the data as it is.

Creating custom event schedules. Should I use "LIKE"?

I'm creating a campaign event scheduler that allows for frequencies such as "Every Monday", "May 6th through 10th", "Every day except Sunday", etc.
I've come up with a solution that I believe will work fine (not yet implemented), however, it uses "LIKE" in the queries, which I've never been too fond of. If anyone else has a suggestion that can achieve the same result with a cleaner method, please suggest it!
+----------------------+
| Campaign Table |
+----------------------+
| id:int |
| event_id:foreign_key |
| start_at:datetime |
| end_at:datetime |
+----------------------+
+-----------------------------+
| Event Table |
+-----------------------------+
| id:int |
| valid_days_of_week:string | < * = ALL. 345 = Tue, Wed, Thur. etc.
| valid_weeks_of_month:string | < * = ALL. 25 = 2nd and 5th weeks of a month.
| valid_day_numbers:string | < * = ALL. L = last. 2,7,17,29 = 2nd day, 7th, 17th, 29th,. etc.
+-----------------------------+
A sample event schedule would look like this:
valid_days_of_week = '1357' (Sun, Tue, Thu, Sat)
valid_weeks_of_month = '*' (All weeks)
valid_day_numbers = ',1,2,5,6,8,9,25,30,'
Using today's date (6/25/15) as an example, we have the following information to query with:
Day of week: 5 (Thursday)
Week of month: 4 (4th week in June)
Day number: 25
Therefore, to fetch all of the events for today, the query would look something like this:
SELECT c.*
FROM campaigns AS c
LEFT JOIN events AS e
ON c.event_id = e.id
WHERE
( e.valid_days_of_week = '*' OR e.valid_days_of_week LIKE '%5%' )
AND ( e.valid_weeks_of_month = '*' OR e.valid_weeks_of_month LIKE '%4%' )
AND ( e.valid_day_numbers = '*' OR e.valid_day_numbers LIKE '%,25,%' )
That (untested) query would ideally return the example event above. The "LIKE" queries are what have me worried. I want these queries to be fast.
By the way, I'm using PostgreSQL
Looking forward to excellent replies!
Use arrays:
CREATE TABLE events (id INT NOT NULL, dow INT[], wom INT[], dn INT[]);
CREATE INDEX ix_events_dow ON events USING GIST(dow);
CREATE INDEX ix_events_wom ON events USING GIST(wom);
CREATE INDEX ix_events_dn ON events USING GIST(dn);
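-- Note (aside): a GiST index on int[] requires the intarray extension; a GIN index (e.g. USING GIN(dow)) works on integer arrays without any extension.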
INSERT
INTO events
VALUES (1, '{1,3,5,7}', '{0}', '{1,2,5,6,8,9,25,30}'); -- 0 means any
Then query:
SELECT *
FROM events
WHERE dow && '{0, 5}'::INT[]
AND wom && '{0, 4}'::INT[]
AND dn && '{0, 25}'::INT[]
This will allow using the indexes to filter the data.
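The three lookup values can also be derived from the current date instead of being hard-coded. A minimal sketch under the same conventions as above (1 = Sunday through 7 = Saturday, 0 = the "any" marker):
-- Hedged sketch: compute today's day-of-week, week-of-month and day number, then overlap.
SELECT e.*
FROM events AS e
WHERE e.dow && ARRAY[0, EXTRACT(DOW FROM current_date)::int + 1]
  AND e.wom && ARRAY[0, (EXTRACT(DAY FROM current_date)::int - 1) / 7 + 1]
  AND e.dn  && ARRAY[0, EXTRACT(DAY FROM current_date)::int];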

Kusto -- generate data diff / delta --

I created a custom log type to store some configuration of an external product. So each day I send the configuration of this specific product / service (multiple rows, but with an identical data model) to the Log Analytics data store.
Is there a possibility to show which rows get added or removed between days? The data structure is always the same, for example:
MyCustomData_CL
| project Guid_g , Name_s, URL_s
I would like to see which records get added / removed, and at which time. So basically, compare every day with the previous day.
How can I accomplish this with the Kusto query language?
Best regards,
Jens
The next query uses a full-outer join to compare two sets (one from the previous day and one from the current day). If you don't have a datetime column in your table, you can try using the ingestion_time() function instead (it reveals the time the data was ingested into the table).
let MyCustomData_CL =
datatable (dt:datetime,Guid_g:string , Name_s:string, URL_s:string)
[
datetime(2018-11-27), '111', 'name1', 'url1',
datetime(2018-11-27), '222', 'name2', 'url2',
//
datetime(2018-11-28), '222', 'name2', 'url2',
datetime(2018-11-28), '333', 'name3', 'url3',
];
let data_prev = MyCustomData_CL | where dt between( datetime(2018-11-27) .. (1d-1tick));
let data_new = MyCustomData_CL | where dt between( datetime(2018-11-28) .. (1d-1tick));
data_prev
| join kind=fullouter (data_new) on Guid_g , Name_s , URL_s
| extend diff=case(isnull(dt), 'Added', isnull(dt1), 'Removed', 'No change')
Result:
dt         | Guid_g | Name_s | URL_s | dt1        | Guid_g1 | Name_s1 | URL_s1 | diff
           |        |        |       | 2018-11-28 | 333     | name3   | url3   | Added
2018-11-27 | 111    | name1  | url1  |            |         |         |        | Removed
2018-11-27 | 222    | name2  | url2  | 2018-11-28 | 222     | name2   | url2   | No change
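To run the same comparison for an arbitrary day instead of hard-coding the two dates, the boundaries can be derived from a single value. A minimal sketch, reusing the MyCustomData_CL datatable defined above (the day value is illustrative):
// Hedged sketch: compare any given day with the day before it.
let day = datetime(2018-11-28);
let data_prev = MyCustomData_CL | where dt between ((day - 1d) .. (day - 1tick));
let data_new  = MyCustomData_CL | where dt between (day .. (day + 1d - 1tick));
data_prev
| join kind=fullouter (data_new) on Guid_g, Name_s, URL_s
| extend diff = case(isnull(dt), 'Added', isnull(dt1), 'Removed', 'No change')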