(I could not think of a better title for this question. Suggestions welcome.)
(In case versions matter, I'm using SQLAlchemy 1.4.4 and Postgresql 13.1.)
I have a table ('test') holding multiple boolean test results (pass or fail) per person, and I want a query that returns the pass/fail ratio for each person.
I.e., for this table:
 id | person | passed
----+--------+--------
  1 | p1     | t
  2 | p1     | f
  3 | p1     | f
  4 | p2     | t
  5 | p2     | t
  6 | p2     | t
  7 | p2     | t
  8 | p2     | t
  9 | p2     | f
 10 | p2     | f
 11 | p2     | f
the query should return:
person | pass_fail_ratio
-------+-------------------
p1     | 0.5
p2     | 1.6666666666666667
Here is the solution I have been able to come up with so far. (I'm appending a complete MWE to the end.)
results_count = (
    sa.select(
        test.person,
        test.passed,
        sa.func.count(test.passed).label('count')
    ).group_by(test.person).group_by(test.passed)
).subquery()
pass_count = (
    sa.select(results_count.c.person, results_count.c.count)
    .filter(results_count.c.passed == True)  # noqa
).subquery()
fail_count = (
    sa.select(results_count.c.person, results_count.c.count)
    .filter(results_count.c.passed == False)  # noqa
).subquery()
pass_fail_ratio = (
    sa.select(
        pass_count.c.person,
        (
            sa.cast(pass_count.c.count, sa.Float)
            / sa.cast(fail_count.c.count, sa.Float)
        ).label('success_failure_ratio')
    )
).filter(fail_count.c.person == pass_count.c.person)
To me, this looks overly complicated for what would seem to be a conceptually rather simple thing. Is there a better solution?
MWE:
# To change database name, modify 'dbname'.
# Expected output:
# ('p1', 0.5)
# ('p2', 1.6666666666666667)
# Lots of constraints and checks omitted for brevity.
# To view generated SQL, uncomment the line containing "echo" below.
import sqlalchemy as sa
import sqlalchemy.orm as orm
import sqlalchemy.types as types

dbname = 'test'

base = orm.declarative_base()


class test(base):
    __tablename__ = 'test'
    id = sa.Column(sa.Integer, primary_key=True)
    person = sa.Column(sa.String)
    passed = sa.Column(types.Boolean)
    pass


engine = sa.create_engine(
    f"postgresql://localhost:5432/{dbname}", future=True
)
base.metadata.drop_all(engine)
base.metadata.create_all(engine)

session = orm.Session(engine)

# Add some data.
session.add(test(person='p1', passed=True))
session.add(test(person='p1', passed=False))
session.add(test(person='p1', passed=False))
session.add(test(person='p2', passed=True))
session.add(test(person='p2', passed=True))
session.add(test(person='p2', passed=True))
session.add(test(person='p2', passed=True))
session.add(test(person='p2', passed=True))
session.add(test(person='p2', passed=False))
session.add(test(person='p2', passed=False))
session.add(test(person='p2', passed=False))
session.commit()

results_count = (
    sa.select(
        test.person,
        test.passed,
        sa.func.count(test.passed).label('count')
    ).group_by(test.person).group_by(test.passed)
).subquery()
pass_count = (
    sa.select(results_count.c.person, results_count.c.count)
    .filter(results_count.c.passed == True)  # noqa
).subquery()
fail_count = (
    sa.select(results_count.c.person, results_count.c.count)
    .filter(results_count.c.passed == False)  # noqa
).subquery()
pass_fail_ratio = (
    sa.select(
        pass_count.c.person,
        (
            sa.cast(pass_count.c.count, sa.Float)
            / sa.cast(fail_count.c.count, sa.Float)
        ).label('success_failure_ratio')
    )
).filter(fail_count.c.person == pass_count.c.person)

# engine.echo = True
with orm.Session(engine) as session:
    res = session.execute(pass_fail_ratio)
    for row in res:
        print(row)
        pass
    pass
pass
That is soooo complicated. I wouldn't use subqueries. One method is:
select person,
       count(*) filter (where passed) * 1.0 / count(*) filter (where not passed)
from test t
group by person;
You might find it more convenient to express this "in the old-fashioned way" without filter:
select person,
       sum( passed::int ) * 1.0 / sum( (not passed)::int )
from test t
group by person;
Note that the pass ratio is more commonly used than the ratio of passes to fails. That is simply:
select person,
       avg( passed::int ) as pass_ratio
from test t
group by person;
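To sanity-check the arithmetic these queries perform, here is a small pure-Python sketch (the in-memory `rows` list and the `ratios` helper are hypothetical stand-ins for the 'test' table; they are not part of the SQL solution):

```python
# In-memory stand-in for the 'test' table: (person, passed) pairs.
rows = [
    ('p1', True), ('p1', False), ('p1', False),
    ('p2', True), ('p2', True), ('p2', True), ('p2', True), ('p2', True),
    ('p2', False), ('p2', False), ('p2', False),
]


def ratios(rows):
    """Return {person: (pass/fail ratio, pass ratio)} for the given rows."""
    by_person = {}
    for person, passed in rows:
        p, f = by_person.get(person, (0, 0))
        by_person[person] = (p + passed, f + (not passed))
    # pass/fail ratio mirrors the filter/sum queries; pass ratio mirrors
    # the avg(passed::int) variant.
    return {
        person: (p / f, p / (p + f))
        for person, (p, f) in by_person.items()
    }
```

For the sample data this yields 0.5 and 5/3 as pass/fail ratios, matching the expected output above.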
Got Gordon Linoff's answer working in SQLAlchemy. Here is my final solution:
import sqlalchemy as sa

pass_fail_ratio_query = sa.select(
    test.person,
    (
        sa.cast(
            sa.funcfilter(sa.func.count(), test.passed == True),  # noqa
            sa.Float
        )
        / sa.cast(
            sa.funcfilter(sa.func.count(), test.passed == False),  # noqa
            sa.Float
        )
    )
).group_by(test.person)
I have created a Kusto query that allows me to return all our database park. The query only takes 10 lines of code:
Resources
| join kind=inner (
resourcecontainers
| where type == 'microsoft.resources/subscriptions'
| project subscriptionId, subscriptionName = name)
on subscriptionId
| where subscriptionName in~ ('Subscription1','Subscription2')
| where type =~ 'microsoft.sql/servers/databases'
| where name != 'master'
| project subscriptionName, resourceGroup, name, type, location,sku.tier, properties.requestedServiceObjectiveName, tags.customerCode
By contract we are supposed to provide only 4 Azure SQL Databases per customer, but sometimes developers take a copy of one and rename it _old or _backup, and suddenly a customer can have 5 or 6 databases.
This increases the overall cost of the cloud, and I would like a list of all customers that have more than 4 databases.
To do so I can use the tag tags.customerCode, which holds the 3-letter identifier for each customer.
The code should work like this: if a customer is called ABC and there are 4 Azure SQL Databases with tags.customerCode ABC, the query should return nothing. If there are 5 or 6 databases with tags.customerCode ABC, the query should return all of them.
Not sure if Kusto can be that flexible.
Here is a possible solution.
Note that Azure Resource Graph supports only a limited subset of KQL.
resourcecontainers
| where type == 'microsoft.resources/subscriptions'
  //and name in~ ('Subscription1','Subscription2')
| project subscriptionId, subscriptionName = name
| join kind=inner
  (
      resources
      | where type =~ 'microsoft.sql/servers/databases'
          and name != 'master'
  )
  on subscriptionId
| project subscriptionId, subscriptionName, resourceGroup, name, type, location
          ,tier = sku.tier
          ,requestedServiceObjectiveName = properties.requestedServiceObjectiveName
          ,customerCode = tostring(tags.customerCode)
| summarize dbs = count(), details = make_list(pack_all()) by customerCode
| where dbs > 4
| mv-expand with_itemindex=db_seq ['details']
| project customerCode
          ,dbs
          ,db_seq = db_seq + 1
          ,subscriptionId = details.subscriptionId
          ,subscriptionName = details.subscriptionName
          ,resourceGroup = details.resourceGroup
          ,name = details.name
          ,type = details.type
          ,location = details.location
          ,tier = details.tier
          ,requestedServiceObjectiveName = details.requestedServiceObjectiveName
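The summarize/where core of the KQL above boils down to "count databases per customer, keep only customers over the limit, then list their databases". A hedged Python sketch of that logic, with a made-up resource list standing in for the Resource Graph result:

```python
from collections import Counter

# Hypothetical flattened resource list (only the fields that matter here).
databases = [
    {'name': 'db1', 'customerCode': 'ABC'},
    {'name': 'db2', 'customerCode': 'ABC'},
    {'name': 'db3', 'customerCode': 'ABC'},
    {'name': 'db4', 'customerCode': 'ABC'},
    {'name': 'db4_old', 'customerCode': 'ABC'},
    {'name': 'db1', 'customerCode': 'XYZ'},
]

# summarize dbs = count() by customerCode
counts = Counter(db['customerCode'] for db in databases)

# where dbs > 4, then expand back to the individual databases
over_limit = [db for db in databases if counts[db['customerCode']] > 4]
```

With 5 ABC databases and 1 XYZ database, only the ABC rows survive, which is exactly the behavior the question asks for.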
I have a dashboard populated with a number of Kusto KQL Queries.
Sometimes, my query below returns zero results (for instance if by miracle, there are no failures in the last 24 hours).
//my dashboard query
let failureResults = exceptions | where blahblahblah;
failureResults;
When there are no items that match the filters, my dashboard is filled with
'The query returned no Results'.
How could I go about checking whether this result is empty and then doing a different operation? For instance, if it's empty, I would just issue print "No Failures for today, awesome!"; instead.
I have tried iff() statements and isempty(failures| distinct Outcome) and the like, but to no avail. For example, here is another one which didn't work:
failures | project column_ifexists(tostring(Outcome),"No failures where reported!")
Just thought of an improved solution, based on pack_all() and the bag_unpack plugin:
let p_threshold = ... ;// set value
let failureResults = datatable(exception_id:int,exception_val:int,exception_text:string)[1,100,"Hello" ,2,200,"World"];
failureResults
| where exception_val > p_threshold
| as t1
| project result = pack_all()
| union kind=outer (print msg = 'No Failures for today, awesome!' | where toscalar(t1 | take 1 | count) == 0 | project result = pack_all())
| evaluate bag_unpack(result)
With let p_threshold = 0;:

exception_id | exception_text | exception_val
-------------+----------------+--------------
1            | Hello          | 100
2            | World          | 200

With let p_threshold = 300;:

msg
-------------------------------
No Failures for today, awesome!
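The trick in the query above is "emit the real rows if any survive the filter, otherwise a single message row". A minimal Python sketch of that fallback pattern (the sample rows and the threshold are hypothetical):

```python
# Hypothetical rows standing in for the failureResults datatable.
rows = [
    {'exception_id': 1, 'exception_val': 100, 'exception_text': 'Hello'},
    {'exception_id': 2, 'exception_val': 200, 'exception_text': 'World'},
]


def with_fallback(rows, threshold):
    """Return filtered rows, or a one-row message when nothing matches."""
    hits = [r for r in rows if r['exception_val'] > threshold]
    if hits:
        return hits
    return [{'msg': 'No Failures for today, awesome!'}]
```

A threshold of 0 returns both data rows; a threshold of 300 returns only the message row, mirroring the two fiddle outputs.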
Well... Kind of...
let p_threshold = ... ;// set value
let failureResults = datatable(exception_id:int,exception_val:int,exception_text:string)[1,100,"Hello" ,2,200,"World"];
failureResults
| where exception_val > p_threshold
| as t1
| union kind=outer (print msg = 'No Failures for today, awesome!' | where toscalar(t1 | take 1 | count) == 0)
| project-reorder msg
With let p_threshold = 0;:

msg | exception_id | exception_val | exception_text
----+--------------+---------------+---------------
    | 1            | 100           | Hello
    | 2            | 200           | World

With let p_threshold = 300;:

msg                             | exception_id | exception_val | exception_text
--------------------------------+--------------+---------------+---------------
No Failures for today, awesome! |              |               |
I'm trying to merge multiple tables in Azure Log Analytics. Each table has a unique column and a common column. Merging them with join is inefficient because I can only do two tables at a time. union seems to be the correct function, but when I merge my tables I end up with duplicate rows in the common column.
Example:
// CPU usage
let CPU_table=VPN_Metrics_CL | extend timestamp = (todatetime(ts_s)+7h)
| where metric_s == "system/cpmCPUTotal1Min.rrd"
| extend region = substring(host_s,0,4)
| summarize maxCPU = max(val_d) by region
| extend score_CPU = case(maxCPU <= 59, 0,
maxCPU <= 79, 1,
3)
| project score_CPU, region;
// Memory usage
let Memory_table=VPN_Metrics_CL| extend timestamp = todatetime(ts_s)+7h
| where metric_s in ("hw_mem_used_pct") and val_d >= 0 and host_s contains "vpn"
| extend region = substring(host_s,0,4)
| summarize maxMemory = max(val_d) by region
| extend score_mem = case(maxMemory <= 59, 0,
maxMemory <= 79, 1,
3)
| project score_mem, region;
union CPU_table, Memory_table
I plan on having a total of 10+ tables.
Here is the result:
score_mem | score_CPU | region
----------+-----------+-------
0         |           | USA
          | 0         | USA
etc. etc.
How can I merge rows based on a key? The key being the region.
Thanks
If the source is the same table, the most efficient way is to use conditional aggregates:
let isCpuMetric = (metric_s:string) {metric_s == "system/cpmCPUTotal1Min.rrd"};
let isMemoryMetric = (metric_s:string, val_d:double, host_s:string) {metric_s in ("hw_mem_used_pct") and val_d >= 0 and host_s contains "vpn"};
VPN_Metrics_CL
| extend timestamp = (todatetime(ts_s)+7h)
| extend region = substring(host_s,0,4)
| where isCpuMetric(metric_s) or isMemoryMetric(metric_s, val_d, host_s)
| summarize maxCPU = maxif(val_d, isCpuMetric(metric_s)), maxMemory=maxif(val_d, isMemoryMetric(metric_s, val_d, host_s)) by region
| extend score_mem = case(maxMemory <= 59, 0, maxMemory <= 79, 1, 3),
score_CPU = case(maxCPU <= 59, 0, maxCPU <= 79, 1, 3)
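The maxif calls above compute one conditional maximum per metric in a single pass over the rows. A pure-Python sketch of that idea, with hypothetical sample data and a small helper mirroring the case() scoring:

```python
# Hypothetical rows: each carries a region, which metric it is, and a value.
rows = [
    {'region': 'USA1', 'metric': 'cpu', 'val': 40.0},
    {'region': 'USA1', 'metric': 'mem', 'val': 85.0},
    {'region': 'USA1', 'metric': 'cpu', 'val': 65.0},
]

# One pass, one conditional max per metric, grouped by region (maxif).
scores = {}
for r in rows:
    entry = scores.setdefault(r['region'], {'cpu': None, 'mem': None})
    key = r['metric']
    if entry[key] is None or r['val'] > entry[key]:
        entry[key] = r['val']


def score(v):
    """Mirror of case(v <= 59, 0, v <= 79, 1, 3)."""
    return 0 if v <= 59 else (1 if v <= 79 else 3)
```

For the sample data USA1 ends with maxCPU 65.0 (score 1) and maxMemory 85.0 (score 3).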
If the sources are different, you can still use the join or lookup operators. If you have results R1 .. RN coming from sub-queries:
R1
| lookup R2 on Region
| lookup R3 on Region
...
Docs for lookup operator: https://learn.microsoft.com/en-us/azure/data-explorer/kusto/query/lookupoperator
I found it easier to give every category's score column the same name: "score"
Then with Union, I merge all the tables and summarize a total score.
union CPU_table, Memory_table, AAA_table, bw_data, more_tables.....
| summarize score_total = sum(score) by region, bin(timestamp, $__interval)
| project score_total, region, timestamp
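The approach above amounts to "rename every score column to 'score', union all the tables, then sum per region". A minimal Python sketch of that aggregation (the three sample tables are hypothetical):

```python
# Hypothetical per-metric tables, each already projected to a common
# 'score' column plus the region key.
cpu_table = [{'region': 'USA', 'score': 0}]
mem_table = [{'region': 'USA', 'score': 1}]
aaa_table = [{'region': 'USA', 'score': 3}]

# union ... | summarize score_total = sum(score) by region
totals = {}
for row in cpu_table + mem_table + aaa_table:
    totals[row['region']] = totals.get(row['region'], 0) + row['score']
```

Because every table contributes through the same 'score' column, adding a new metric table is just one more operand in the union.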
I am trying to query two different columns from two different tables, counting matching rows in each. I can do it in two separate queries, but I think that would cause a slowdown in the future. What I want is a single query that displays both counts together.
I tried using two separate SELECT statements, but I don't think that is good; I also tried UNION ALL, but the results are not what I expected.
upload_monitoring (12 Columns)
upm_FileName | upm_Status
----------------+--------------
Monitoring_0608 | Distributed
Monitoring_0607 | Distributed
Monitoring_0606 | Distributed
Monitoring_0605 | Uploaded
(100 rows)
distribute_monitoring (7 Columns)
dist_ProductName | dist_Status
-------------------+--------------
Monitoring_0608 | Pending
Monitoring_0607 | Pending
Monitoring_0606 | Pending
Monitoring_0605 | Touched
(100 rows)
I tried with these:
$query2 = "
    SELECT
        COUNT(upm_Status) AS total_DistItems
    FROM
        upload_monitoring
    WHERE
        upm_Status = 'Distributed'
    AND
        upm_FileName = '$upm_FileName'
";
$result2 = mysqli_query($connection, $query2);
$fetchResult2 = mysqli_fetch_assoc($result2);
$total_DistItems = $fetchResult2['total_DistItems'];

$query3 = "
    SELECT
        COUNT(dist_Status) AS total_PendItems
    FROM
        distribute_monitoring
    WHERE
        dist_Status = 'Pending'
    AND
        dist_Product = '$upm_FileName'
";
$result3 = mysqli_query($connection, $query3);
$fetchResult3 = mysqli_fetch_assoc($result3);
$total_PendItems = $fetchResult3['total_PendItems'];
I also tried with these one
$query2 = "
    SELECT
        upm_Status,
        COUNT(upm_Status) AS total_DistItems
    FROM
        upload_monitoring
    WHERE
        upm_Status = 'Distributed'
    AND
        upm_FileName = '$upm_FileName'
    UNION ALL
    SELECT
        dist_Status,
        COUNT(dist_Status) AS total_PendItems
    FROM
        distribute_monitoring
    WHERE
        dist_Status = 'Pending'
    AND
        dist_Product = '$upm_FileName'
";
$result2 = mysqli_query($connection, $query2);
However, the result is:
upm_Status | total_DistItems
------------+--------------
Distributed | 34
Pending | 12
What I expect the result to be is like this one.
upm_Status | total_DistItems | dist_Status | total_PendItems
------------+-----------------+-------------+-----------------
Distributed | 34 | Pending | 12
Here is one method:
SELECT u.*, d.*
FROM (SELECT 'Distributed' as upm_Status,
             COUNT(*) AS total_DistItems
      FROM upload_monitoring
      WHERE upm_Status = 'Distributed' AND
            upm_FileName = ?
     ) u CROSS JOIN
     (SELECT 'Pending' as dist_Status, COUNT(dist_Status) AS total_PendItems
      FROM distribute_monitoring
      WHERE dist_Status = 'Pending' AND
            dist_Product = ?
     ) d;
Note that I replaced the $upm_filename with ?. This indicates that you should be using parameters to pass values into the query, rather than munging the query string with such values. Your method puts you at risk for unexpected syntax errors and SQL injection attacks.
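The CROSS JOIN here simply glues two one-row aggregates into a single row; the same combination in a pure-Python sketch (the table contents and filename are made up for illustration):

```python
# Hypothetical (filename, status) rows standing in for the two tables.
upload_monitoring = [
    ('Monitoring_0608', 'Distributed'),
    ('Monitoring_0608', 'Distributed'),
    ('Monitoring_0608', 'Uploaded'),
]
distribute_monitoring = [
    ('Monitoring_0608', 'Pending'),
]

filename = 'Monitoring_0608'

# Each subquery is an independent filtered count ...
total_dist = sum(1 for f, s in upload_monitoring
                 if s == 'Distributed' and f == filename)
total_pend = sum(1 for f, s in distribute_monitoring
                 if s == 'Pending' and f == filename)

# ... and the CROSS JOIN places both counts side by side in one row.
row = ('Distributed', total_dist, 'Pending', total_pend)
```

Since each subquery returns exactly one row, the cross join can never multiply rows, which is what makes this safe here.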
I have a table with two fields:
id (UUID), the primary key, and
description (varchar(255)).
I want to insert random data with SQL sentence.
I would like that description would be something random.
PS: I am using PostgreSQL.
I don't know exactly if this fits the requirement for a "random description", and it's not clear if you want to generate the full data, but, for example, this generates 10 records with consecutive ids and random texts:
test=# SELECT generate_series(1,10) AS id, md5(random()::text) AS descr;
id | descr
----+----------------------------------
1 | 65c141ee1fdeb269d2e393cb1d3e1c09
2 | 269638b9061149e9228d1b2718cb035e
3 | 020bce01ba6a6623702c4da1bc6d556e
4 | 18fad4813efe3dcdb388d7d8c4b6d3b4
5 | a7859b3bcf7ff11f921ceef58dc1e5b5
6 | 63691d4a20f7f23843503349c32aa08c
7 | ca317278d40f2f3ac81224f6996d1c57
8 | bb4a284e1c53775a02ebd6ec91bbb847
9 | b444b5ea7966cd76174a618ec0bb9901
10 | 800495c53976f60641fb4d486be61dc6
(10 rows)
The following worked for me:
create table t_random as select s, md5(random()::text) from generate_Series(1,5) s;
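For comparison, the md5(random()::text) trick can be mimicked client-side in Python; a minimal sketch (the five-row range is arbitrary):

```python
import hashlib
import random

# Equivalent of: select s, md5(random()::text) from generate_series(1, 5) s
rows = [
    (i, hashlib.md5(str(random.random()).encode()).hexdigest())
    for i in range(1, 6)
]
```

Each description is a 32-character lowercase hex string, just like the Postgres output above.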
Here is a more elegant way using the latest features. I will use the Unix dictionary (/usr/share/dict/words) and copy it into my PostgreSQL data directory:
cp /usr/share/dict/words data/pg95/words.list
Then you can easily create a ton of nonsense, but searchable, descriptions using dictionary words with the following steps:
1) Create the table and function. getNArrayS takes an array of elements and the number of random elements it should concatenate.
CREATE TABLE randomTable(id serial PRIMARY KEY, description text);
CREATE OR REPLACE FUNCTION getNArrayS(el text[], count int) RETURNS text AS $$
SELECT string_agg(el[random()*(array_length(el,1)-1)+1], ' ') FROM generate_series(1,count) g(i)
$$
VOLATILE
LANGUAGE SQL;
Once you have all in place, run the insert using CTE:
WITH t(ray) AS(
SELECT (string_to_array(pg_read_file('words.list')::text,E'\n'))
)
INSERT INTO randomTable(description)
SELECT getNArrayS(T.ray, 3) FROM T, generate_series(1,10000);
And now, select as usual:
postgres=# select * from randomtable limit 3;
id | description
----+---------------------------------------------
1 | ultracentenarian splenodiagnosis manurially
2 | insequent monopolarity funipendulous
3 | ruminate geodic unconcludable
(3 rows)
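The core of getNArrayS is just "pick count random words and join them with spaces"; a small Python sketch of the same idea (the word list is a hypothetical stand-in for /usr/share/dict/words):

```python
import random

# Hypothetical word list; the original reads /usr/share/dict/words.
words = ['ruminate', 'geodic', 'unconcludable', 'insequent', 'manurially']


def get_n_words(words, count):
    """Concatenate `count` randomly chosen words, like getNArrayS."""
    return ' '.join(random.choice(words) for _ in range(count))
```

Calling get_n_words(words, 3) produces three-word descriptions like the sample rows above.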
I assume sentence == statement? You could use Perl or PL/Perl, as Perl has some good random-data generators. Check out the CPAN module Data::Random to start.
Here's a sample Perl script generating some different random stuff, taken from CPAN.
use Data::Random qw(:all);

my @random_words = rand_words( size => 10 );
my @random_chars = rand_chars( set => 'all', min => 5, max => 8 );
my @random_set = rand_set( set => \@set, size => 5 );
my $random_enum = rand_enum( set => \@set );
my $random_date = rand_date();
my $random_time = rand_time();
my $random_datetime = rand_datetime();

open(FILE, ">rand_image.png") or die $!;
binmode(FILE);
print FILE rand_image( bgcolor => [0, 0, 0] );
close(FILE);