QlikView: Build logic/KPI to count distinct devices using loops/set analysis

Please help me build logic for the scenario below.
I have data in which there are many devices (say A/B/C...), each with a server up/down status of 1/0, and corresponding dates (24-hour).
What I want is to count the number of distinct devices in the dataset that were up at least once in the entire day. That is, if a device reports status 1 at least once in a day, it counts as 1; each remaining device is checked and counted the same way, and finally the total number of devices reported up is shown. Vice versa for the devices that were down the whole day.
I am sorry if I am posting this again, but I didn't find any existing post about it.
I am not sure which function/loop would give the correct logic. Can this be done with a loop, or can set analysis handle it?
Thanks in advance!

Should be pretty simple in set analysis. Something like
count({<Status={1}>} distinct DeviceID)
to get all machines that have been up at all.
There's probably a clever way to count the devices that were down all day, but the simplest I can think of is the complement:
count(distinct DeviceID) - count({<Status={1}>} distinct DeviceID)
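The counting logic is easy to sanity-check outside QlikView. Here is a minimal Python sketch with made-up device/status rows (DeviceID/Status mirror the field names above); it illustrates the logic only, not QlikView syntax:

```python
# Hypothetical one-day sample: (DeviceID, Status) rows, 1 = up, 0 = down.
rows = [
    ("A", 0), ("A", 1), ("A", 0),  # A was up at least once
    ("B", 0), ("B", 0),            # B was down the whole day
    ("C", 1), ("C", 1),            # C was up at least once
]

# count({<Status={1}>} distinct DeviceID): devices seen up at least once.
up_at_least_once = {device for device, status in rows if status == 1}

# Down all day = all devices minus those ever seen up (the complement trick).
all_devices = {device for device, _ in rows}
down_all_day = all_devices - up_at_least_once

print(len(up_at_least_once))  # 2  (A and C)
print(len(down_all_day))      # 1  (B)
```

A device counts as "down all day" exactly when it never reports status 1, which is why subtracting the up-count from the total distinct count gives the same number.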

Related

Query to get the latest treatment for each machine

Let me start by saying that I don't care which dialect of SQL the answer uses. In reality I am writing my query in Kusto, but the Kusto tag on Stack Overflow is dead most of the time. This is just to give me an idea of how I could do this so I can then translate it into Kusto somehow.
I have a database called "MachineData" that looks something like this (but with hundreds of thousands of records)
What I want to do is get for each Machine the latest treatment that the machine has done. In other words, I want for each machine to get the most recent StartTime.
I thought about doing something like "ORDER BY SerialNumber, StartTime", but with hundreds of thousands of records my system can't do that without crashing, and that approach would still show me all records for each machine when I only want the latest StartTime.
The other thing I thought about doing is something like this,
MachineData
| top 1 by SerialNumber, StartTime
but the "top" command on Kusto only accepts one parameter to order by.
Probably you're looking for GROUP BY and max():
SELECT SerialNumber, max(StartTime) as MostRecentStartTime
FROM MachineData
GROUP BY SerialNumber;
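A quick way to see the GROUP BY + max() behaviour is an in-memory sqlite3 run; the table and column names follow the question, but the extra Treatment column and all data are invented for illustration:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE MachineData (SerialNumber TEXT, StartTime TEXT, Treatment TEXT)")
con.executemany(
    "INSERT INTO MachineData VALUES (?, ?, ?)",
    [
        ("M1", "2023-01-01 08:00", "clean"),
        ("M1", "2023-01-02 09:00", "calibrate"),  # M1's latest treatment
        ("M2", "2023-01-01 10:00", "clean"),
    ],
)

# One row per machine with its most recent StartTime.
latest = con.execute(
    "SELECT SerialNumber, max(StartTime) AS MostRecentStartTime "
    "FROM MachineData GROUP BY SerialNumber ORDER BY SerialNumber"
).fetchall()
print(latest)  # [('M1', '2023-01-02 09:00'), ('M2', '2023-01-01 10:00')]
```

If you also need the other columns of the latest row (the treatment itself, say), the usual SQL approach is ROW_NUMBER() OVER (PARTITION BY SerialNumber ORDER BY StartTime DESC) filtered to row 1; in Kusto, `summarize arg_max(StartTime, *) by SerialNumber` does the same in one step.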

BigQuery query extremely slow when adding JOIN

This is my first post here, so please let me know if I've done anything wrong when posting my question.
I started learning SQL from scratch about three weeks ago, so I'm fairly new to the whole concept and community, and I've probably made a lot of mistakes in my code, but here goes.
I'm struggling with a query that I'm writing in BigQuery. BigQuery's validator has accepted the code, so on paper it seems good, but it takes forever to run; I end up stopping it once it has passed an hour. I've been looking into streamlining my SQL so that the process runs smoother and faster, but I've hit a wall and run out of questions that could provide me with a useful answer.
(Edit)
What I want from this query is a dataset that can help me build a visualisation: a timeline based on the dates/timestamps that read_started_at provides.
On this timeline I want a distinct count of reader_ids per day (DATE_TRUNC of the timestamp). Google Data Studio can make a distinct count of the reader_ids itself, so I'm in doubt whether making the distinct count in my query will slow down or speed up the process in the long run.
Lastly, I want to divide the reader_ids into two groups (dimensions) based on whether they are on a monthly or yearly subscription, to see if one group is more represented at the given read_started_at times, and therefore more active on the website, than the other. This division is provided by chargebee_plan_id, where multiple subscriptions are available, hence the condition 'yearly' or 'monthly'. The reader_id and membership_id columns contain the same data and are therefore joined on.
(Edit end)
I really hope that somebody here can help me out. Any advice is appreciated.
My query is the following:
WITH memberships AS (
  SELECT im.chargebee_plan_id, im.membership_id
  FROM postgres.internal_reporting_memberships AS im
  WHERE (im.chargebee_plan_id LIKE 'yearly' OR im.chargebee_plan_id LIKE 'monthly')
    AND im.started_at >= TIMESTAMP_SUB(CURRENT_TIMESTAMP, INTERVAL 365 day)
),
readers AS (
  SELECT ip.reader_id, DATE_TRUNC(CAST(ip.read_started_at AS DATE), DAY) AS read_start
  FROM postgres.internal_reporting_read_progresses AS ip
  WHERE ip.reader_id LIKE '%|%'
    AND ip.read_started_at >= TIMESTAMP_SUB(CURRENT_TIMESTAMP, INTERVAL 365 day)
)
SELECT r.reader_id, r.read_start, m.chargebee_plan_id
FROM readers AS r
JOIN memberships AS m
  ON r.reader_id LIKE m.membership_id
Cheers
Reposting my comment as an answer, since it solved the problem:
Use an = instead of a LIKE for the join condition.
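Why this helps: with an equality predicate the engine can hash or index the join key, while a LIKE join generally has to treat the right side as a pattern and compare candidate row pairs one by one. A small sqlite3 sketch of the corrected join (names echo the query above; the data is invented):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE memberships (membership_id TEXT, chargebee_plan_id TEXT)")
con.execute("CREATE TABLE readers (reader_id TEXT, read_start TEXT)")
con.executemany("INSERT INTO memberships VALUES (?, ?)",
                [("u|1", "yearly"), ("u|2", "monthly")])
con.executemany("INSERT INTO readers VALUES (?, ?)",
                [("u|1", "2023-01-01"), ("u|2", "2023-01-02")])

# Equality join on the shared key -- the fix suggested above.
eq = con.execute("""
    SELECT r.reader_id, r.read_start, m.chargebee_plan_id
    FROM readers AS r
    JOIN memberships AS m ON r.reader_id = m.membership_id
    ORDER BY r.reader_id
""").fetchall()
print(eq)  # [('u|1', '2023-01-01', 'yearly'), ('u|2', '2023-01-02', 'monthly')]
```

Since membership_id contains no % or _ wildcards, LIKE and = match the same rows here; = simply lets the optimizer do its job.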

Finding statistical outliers in timestamp intervals with SQL Server

We have a bunch of devices in the field (various customer sites) that "call home" at regular intervals, configurable at the device but defaulting to 4 hours.
I have a view in SQL Server that displays the following information in descending chronological order:
DeviceInstanceId uniqueidentifier not null
AccountId int not null
CheckinTimestamp datetimeoffset(7) not null
SoftwareVersion string not null
Each time the device checks in, it will report its id and current software version which we store in a SQL Server db.
Some of these devices are in places with flaky network connectivity, which obviously prevents them from operating properly. There are also a bunch in datacenters where administrators regularly forget about it and change firewall/ proxy settings, accidentally preventing outbound communication for the device. We need to proactively identify this bad connectivity so we can start investigating the issue before finding out from an unhappy customer... because even if the problem is 99% certainly on their end, they tend to feel (and as far as we are concerned, correctly) that we should know about it and be bringing it to their attention rather than vice-versa.
I am trying to come up with a way to query all distinct DeviceInstanceId that have currently not checked in for a period of 150% their normal check-in interval. For example, let's say device 87C92D22-6C31-4091-8985-AA6877AD9B40 has, for the last 1000 checkins, checked in every 4 hours or so (give or take a few seconds)... but the last time it checked in was just a little over 6 hours ago now. This is information I would like to highlight for immediate review, along with device E117C276-9DF8-431F-A1D2-7EB7812A8350 which normally checks in every 2 hours, but it's been a little over 3 hours since the last check-in.
It seems relatively straightforward to brute-force this: loop through all the devices, examine the average interval between check-ins, see when the last check-in was, compare that to the current time, etc. But there are thousands of these devices, and the count grows larger every day. I need an efficient query that can generate this list of uncommunicative devices at least every hour... I just can't picture how to write that query.
Can someone help me with this? Maybe point me in the right direction? Thanks.
I am trying to come up with a way to query all distinct DeviceInstanceId that have currently not checked in for a period of 150% their normal check-in interval.
I think you can do:
select *
from (
    select DeviceInstanceId,
           datediff(second, min(CheckinTimestamp), max(CheckinTimestamp)) / nullif(count(*) - 1, 0) as avg_secs,
           max(CheckinTimestamp) as max_CheckinTimestamp
    from t
    group by DeviceInstanceId
) t
where max_CheckinTimestamp < dateadd(second, -avg_secs * 1.5, getdate());
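The query's core idea, average gap = total span divided by the number of gaps, then flag any device whose current silence exceeds 1.5x its own average, can be sketched in Python with invented check-in histories:

```python
from datetime import datetime

# Made-up check-in history per DeviceInstanceId.
checkins = {
    "dev-A": [datetime(2023, 1, 1, h) for h in (0, 4, 8, 12)],  # every 4 h
    "dev-B": [datetime(2023, 1, 1, h) for h in (0, 2, 4, 6)],   # every 2 h
}
now = datetime(2023, 1, 1, 17)  # pretend "current time"

overdue = []
for device, times in checkins.items():
    # Average interval = (last - first) / (count - 1), as in the subquery above.
    avg_secs = (max(times) - min(times)).total_seconds() / (len(times) - 1)
    silence = (now - max(times)).total_seconds()
    if silence > avg_secs * 1.5:
        overdue.append(device)

print(overdue)  # ['dev-B']: 11 h silent against a 2 h average; dev-A (5 h vs 4 h) is still within 1.5x
```

Note that deriving the average from only min, max, and count is what makes the SQL version a single cheap aggregation per device instead of a per-row scan of every gap.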

tableUnavailable dependent upon size of search

I'm experiencing something rather strange with some queries that I'm performing in BigQuery.
Firstly, I'm using an externally backed table (csv.gz) with about 35 columns. The total data in the location is around 5 GB, with an average file size of 350 MB. The reason I'm doing this is that I continually add data to and remove data from the table on a rolling basis, to give me a view of the last 7 days of our activity.
When querying, if I perform something simple like:
select * from X limit 10
everything works fine. It continues to work fine if you increase the limit up to 1 million rows. As soon as you up the limit to ten million:
select * from X limit 10000000
I end up with a tableUnavailable error "Something went wrong with the table you queried. Contact the table owner for assistance. (error code: tableUnavailable)"
Now, according to any literature on this, this usually results from using some externally owned table (I'm not). I can't find any other enlightening information for this error code.
Basically, if I do anything slightly complex on the data, I get the same result. There's a column called eventType that has maybe a couple hundred different values in the entire dataset. If I perform the following:
select eventType, count(1) from X group by eventType
I get the same error.
I'm getting the feeling that this might be related to limits on external tables? Can anybody clarify or shed any light on this?
Thanks in advance!
Doug

Efficient way to compute accumulating value in sqlite3

I have an sqlite3 table that tells when I gain/lose points in a game. Sample/query result:
SELECT time,p2 FROM events WHERE p1='barrycarter' AND action='points'
ORDER BY time;
1280622305|-22
1280625580|-9
1280627919|20
1280688964|21
1280694395|-11
1280698006|28
1280705461|-14
1280706788|-13
[etc]
I now want my running point total. Given that I start with 1000 points, here's one way to do it:
SELECT DISTINCT(time),
       (SELECT 1000 + SUM(p2)
        FROM events e
        WHERE p1 = 'barrycarter' AND action = 'points'
          AND e.time <= e2.time) AS points
FROM events e2
WHERE p1 = 'barrycarter' AND action = 'points'
ORDER BY time
but this is highly inefficient. What's a better way to write this?
MySQL has @variables, so you can do things like:
SELECT time, @tot := @tot + points ...
but I'm using sqlite3, and the above isn't ANSI-standard SQL anyway.
More info on the db if anyone needs it: http://ccgames.db.94y.info/
EDIT: Thanks for the answers! My dilemma: I let anyone run any
single SELECT query on "http://ccgames.db.94y.info/". I want to give
them useful access to my data, but not to the point of allowing
scripting or allowing multiple queries with state. So I need a single
SQL query that can do accumulation. See also:
Existing solution to share database data usefully but safely?
SQLite is meant to be a small embedded database, so it is not unreasonable to find many limitations in it. The task at hand is not solvable in SQLite alone, or rather it will be terribly slow, as you have found: the query you have written is a triangular cross join and will scale badly.
The most efficient way to tackle the problem is in the program that is using SQLite; e.g. if you were using Web SQL in HTML5, you could easily accumulate in JavaScript.
There is a discussion about this problem on the sqlite mailing list.
Your two options are:
Iterate through all the rows with a cursor and calculate the running sum on the client.
Store sums instead of, or as well as storing points. (if you only store sums you can get the points by doing sum(n) - sum(n-1) which is fast).
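Worth adding: this answer predates window functions, which SQLite gained in version 3.25 (2018). On a modern SQLite the running total is a single pass over the data, shown here from Python with a few of the sample rows above:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE events (time INTEGER, p1 TEXT, action TEXT, p2 INTEGER)")
con.executemany(
    "INSERT INTO events VALUES (?, 'barrycarter', 'points', ?)",
    [(1280622305, -22), (1280625580, -9), (1280627919, 20)],
)

# Running total: 1000 starting points plus a cumulative sum ordered by time.
rows = con.execute("""
    SELECT time, 1000 + SUM(p2) OVER (ORDER BY time) AS points
    FROM events
    WHERE p1 = 'barrycarter' AND action = 'points'
    ORDER BY time
""").fetchall()
print(rows)  # [(1280622305, 978), (1280625580, 969), (1280627919, 989)]
```

SUM(...) OVER (ORDER BY ...) is standard SQL, so the same query also runs on Postgres, SQL Server, and MySQL 8+.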