presto sql query for getting the fill rate of the table

I want a generic query to get the fill rate of all columns in a table. The query should work irrespective of the number of columns. I have to implement this using Presto SQL. I have tried searching for a method, but nothing seems to be working.
Input
A     B     C     D
----  ----  ----  ----
1     null  null  1
2     2     3     4
Null  Null  Null  5
Output
A     B     C     D
----  ----  ----  ----
0.66  0.33  0.33  1.0
Explanation:
A Col contains 3 rows with 2 non null values so 2/3
B and C Cols contain 2 null value and one non null value so 1/3
D col there is no null values so 3/3
Thanks in advance

AFAIK Presto/Trino does not provide dynamic query execution capabilities (i.e. something like EXEC in T-SQL), so the only option (unless you are ready to go down the user-defined function road) is to write a query that enumerates all the needed columns. If you are calling Presto from a client in another language, you can build the query dynamically using the information_schema.columns metadata:
with dataset(A, B, C, D) as (
values (1, null, null, 1),
(2, 2, 3, 4),
(Null, Null, Null, 5)
)
select 1.00 * count_if(a is not null) / count(*) a,
1.00 * count_if(b is not null) / count(*) b,
1.00 * count_if(c is not null) / count(*) c,
1.00 * count_if(d is not null) / count(*) d
from dataset;
Output:
a     b     c     d
----  ----  ----  ----
0.67  0.33  0.33  1.00
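The "build the query dynamically" suggestion above could look roughly like the following in Python. This is a minimal sketch, not part of the original answer: the function name `build_fill_rate_query` and the table name `dataset` are hypothetical, and the column list is hard-coded here where a real client would read it from information_schema.columns.

```python
def build_fill_rate_query(table: str, columns: list[str]) -> str:
    """Build a Presto/Trino query returning the non-null fraction of each column."""
    if not columns:
        raise ValueError("at least one column is required")

    def quote(identifier: str) -> str:
        # Double-quote identifiers and escape any embedded double quotes.
        return '"' + identifier.replace('"', '""') + '"'

    # One count_if(...) / count(*) expression per column, as in the answer's query.
    select_list = ",\n       ".join(
        f"1.00 * count_if({quote(c)} is not null) / count(*) as {quote(c)}"
        for c in columns
    )
    # Assumption: `table` is a trusted, already-qualified table name.
    return f"select {select_list}\nfrom {table}"


# Assumption: in a real client these names would be fetched first, e.g. with
#   select column_name from information_schema.columns where table_name = ...
# They are hard-coded so the sketch runs without a database connection.
query = build_fill_rate_query("dataset", ["a", "b", "c", "d"])
print(query)
```

The generated text is then sent to Presto as an ordinary query, so no server-side dynamic SQL is needed.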

Related

How do you allocate an amount in blank to the rest proportionately?

I have a table with a list of markets and corresponding amounts related to those markets
Market   Amount
-------  ------
A        10
B        30
C        50
D        10
(blank)  10
I would like this $10 in the blank market to be allocated to the rest of the markets proportionately based on amounts excluding the blank market (ex. amount(A)/sum(A+B+C+D))
The desired output is:
Market  Amount
------  ------
A       11
B       33
C       55
D       11
I think I can query it using multiple CTEs, but wanted to see if it's possible to allocate using as few CTEs as possible or not using CTE at all.

So with this CTE just for data:
with data(market, amount) as (
select * from values
('A', 10),
('B', 30),
('C', 50),
('D', 10),
(null, 10)
)
we can:
select d.*
,sum(iff(d.market is null, d.amount,null)) over() as to_spread
,sum(iff(d.market is not null, d.amount,null)) over() as total
,div0(d.amount, total) as part
,part * to_spread as bump
,d.amount + bump as result
from data as d
qualify market is not null
to get:
MARKET  AMOUNT  TO_SPREAD  TOTAL  PART  BUMP  RESULT
------  ------  ---------  -----  ----  ----  ------
A       10      10         100    0.1   1     11
B       30      10         100    0.3   3     33
C       50      10         100    0.5   5     55
D       10      10         100    0.1   1     11
We can then fold a few of those steps up:
select d.*
,d.amount + div0(d.amount, sum(iff(d.market is not null, d.amount,null)) over()) * sum(iff(d.market is null, d.amount,null)) over() as result
from data as d
qualify market is not null
MARKET  AMOUNT  RESULT
------  ------  ------
A       10      11
B       30      33
C       50      55
D       10      11
It seems these results are fixed-point numbers, so truncation in the division can lose small "amounts". Those could be spread fairly, but that might require a second pass.
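One way to picture the "second pass" is the largest-remainder method, sketched here in Python on whole units. This is an illustration under assumptions, not the answer's method: the function name `allocate` is hypothetical, amounts are assumed to be non-negative integers, and ties are broken by input order.

```python
def allocate(amounts: dict[str, int], to_spread: int) -> dict[str, int]:
    """Spread `to_spread` over `amounts` proportionally, in whole units."""
    total = sum(amounts.values())
    if total == 0:
        raise ValueError("cannot allocate proportionally to a zero total")

    # First pass: truncated proportional share, keeping the remainder of each division.
    shares = {}
    remainders = {}
    for market, amount in amounts.items():
        shares[market], remainders[market] = divmod(amount * to_spread, total)

    # Second pass: hand out the units lost to truncation, largest remainder first.
    leftover = to_spread - sum(shares.values())
    for market in sorted(remainders, key=remainders.get, reverse=True)[:leftover]:
        shares[market] += 1

    return {market: amounts[market] + shares[market] for market in amounts}


exact = allocate({"A": 10, "B": 30, "C": 50, "D": 10}, 10)
uneven = allocate({"A": 1, "B": 1, "C": 1}, 10)
print(exact)
print(uneven)
```

With the question's data nothing is lost, so the second pass does nothing; in the uneven case one leftover unit is handed out so that the total still adds up.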

SQL Return rows with mix of nulls and non nulls in certain columns

If I have the following table
id a b c time
-----------------------------
0 1 4 "ca" 23
1 NULL NULL NULL 18
2 NULL 1 "pn" 13
3 6 NULL "ar" 27
4 1 2 NULL 24
I want to return all rows with at least one null and one non-null in columns a, b, and c. So I want to return:
id a b c time
-----------------------------
2 NULL 1 "pn" 13
3 6 NULL "ar" 27
4 1 2 NULL 24
I know I can write
select *
from table
where ((a is null and (b is not null or c is not null))
or (a is not null and (b is null or c is null)))
But what happens if I need to consider 4 columns or more? It becomes a mess. Note that the table could have 20 or more columns, of which I am only considering a small subset of columns for null/non-null analysis. Is there a concise way of doing this? Thanks

One method would be to unpivot your data, and COUNT the NULL and non-NULL values, and filter on that:
SELECT V.ID,
V.a,
V.b,
V.c,
V.time
FROM (VALUES(0,1,4,'"ca"',23),
(1,NULL,NULL,NULL,18),
(2,NULL,1,'"pn"',13),
(3,6,NULL,'"ar"',27),
(4,1,2,NULL,24))V(ID,a,b,c,time)
CROSS APPLY (SELECT COUNT(UP.V) AS NonNull,
COUNT(CASE WHEN UP.V IS NULL THEN 1 END) AS IsNull
FROM (VALUES(CONVERT(varchar(1),V.a)),
(CONVERT(varchar(1),V.b)),
(CONVERT(varchar(1),V.c)))UP(V))C
WHERE C.[IsNull] > 0
AND C.NonNull > 0;
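If the column subset changes often, the predicate can also be generated rather than written by hand. The sketch below uses a different technique from the unpivot above: it sums one CASE expression per column and keeps rows where the NULL count is between 1 and n-1. The helper name `mixed_null_predicate` and the table name `t` are hypothetical, and SQLite is used only so the example runs without a server.

```python
import sqlite3


def mixed_null_predicate(columns: list[str]) -> str:
    """Return a WHERE predicate that is true when some, but not all, columns are NULL."""
    if len(columns) < 2:
        raise ValueError("need at least two columns to have a mix")
    # Assumption: column names are trusted identifiers, not user input.
    null_count = " + ".join(
        f"(case when {c} is null then 1 else 0 end)" for c in columns
    )
    return f"({null_count}) between 1 and {len(columns) - 1}"


conn = sqlite3.connect(":memory:")
conn.execute("create table t (id integer, a integer, b integer, c text, time integer)")
conn.executemany(
    "insert into t values (?, ?, ?, ?, ?)",
    [
        (0, 1, 4, "ca", 23),
        (1, None, None, None, 18),
        (2, None, 1, "pn", 13),
        (3, 6, None, "ar", 27),
        (4, 1, 2, None, 24),
    ],
)
sql = f"select id from t where {mixed_null_predicate(['a', 'b', 'c'])} order by id"
mixed_ids = [row[0] for row in conn.execute(sql)]
print(mixed_ids)  # [2, 3, 4]
```

Adding a fourth column then only means adding its name to the list.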

Matching multiple rows in where clause for filter

I have two tables as the below:
Table 1 : Product_Information
Information_ID  Product_Name
--------------  ------------
1               A
2               B
3               C
4               D
5               E
Table 2 : Discriptor_Values
Information_ID  Descriptor_ID  Descriptor_Value
--------------  -------------  ----------------
1               1              98
1               2              142
1               3              29.66
2               1              50
2               2              11
2               3              14
3               1              17
3               2              76
3               3              85
4               1              59
4               2              48
4               3              35
5               1              48
5               2              12
5               3              19
Using the above tables, I am creating a filter page like the one on any online shopping site. For a mobile phone, for example, the descriptors would be price and internal storage, each with a min and max range of values.
Likewise, I will select descriptors and give min and max values for each, and the matching products will be the result.
If I pass any filter range, the filtered list of products should be shown; otherwise all the records should be shown.
I am trying the query below but not getting the correct output. I am getting the union of the rows that match any of the passed filter rows (#tblFilter).
CREATE TABLE #tblFilter(
[descriptor_id] [int] NULL,
[min_value] [decimal](18, 0) NULL,
[max_value] [decimal](18, 0) NULL
)
insert into #tblFilter values (1, 40.33, 70.33)
insert into #tblFilter values (2, 100.33, 150.33)
insert into #tblFilter values (3, 10, 60)
select p.*
from Product_Information p
inner join Discriptor_Values dv on p.Information_ID = dv.Information_ID
left join #tblFilter t1 on t1.descriptor_id = dv.Descriptor_id
WHERE ((dv.Descriptor_ID = t1.descriptor_id
and convert(decimal, dv.Descriptor_Value)
between CONVERT(decimal, t1.min_value) and CONVERT(decimal, t1.max_value))
or not exists (select 1 from #tblFilter))
drop TABLE #tblFilter
Please help me narrow the result list by the filters, and show all records if there is no row in the filter table (#tblFilter).

I believe you want:
select p.*
from Product_Information p join
Discriptor_Values dv
on p.Information_ID = dv.Information_ID left join
#tblFilter t1
on t1.descriptor_id = dv.Descriptor_id
where dv.Descriptor_Value between t1.min_value and t1.max_value or
dv.Descriptor_id is null;
I removed the conversions to decimals. You might actually need them, but in the question the values look like numbers and the question doesn't specify that they are stored as strings.

Populate column based on row values BigQuery Standard SQL

I have a table, let's say:
Name A B C D
------- --- --- --- ---
alpha 0 1 0 0.6
beta 0.6 0 0 0.1
gama 0 0 0 0.4
Now I want to populate two columns (Result and Class) based on the A, B, C, D values.
The condition is: if the value in any of the fields (A, B, C, D) is > .5, then the Result column should have "F", else it should have "P". Also, the names of the columns whose value is > .5 should go in Class, for example "A,D".
For better understanding, here is the result I want:
Name A B C D Result Class
------- --- --- --- --- -------- -------
alpha 0 1 0 0.6 F B,D
beta 0.6 0 0 0.1 F A
gama 0 0 0 0.4 P NULL
I am new to BigQuery and need help. What would be a workaround?
This is what I have done so far:
SELECT *, CASE WHEN (A > .5 OR B > .5 OR C > .5 OR D >.5)
THEN 'F'
ELSE 'P' END AS Result AND Class....//here i am stuck
FROM table1
Actually, I have no idea how to build this exact script. I was able to achieve the first part, populating the Result column with "F" and "P", but could not get the Class column to hold the column names.

Since you are analysing each column, I assume you do not have a large number of columns. Therefore, I created a simple JavaScript User Defined Function (UDF) to check each row's values and return the column's name if the condition is met.
I have used the provided sample data to test the query below.
#javaScript UDF
CREATE TEMP FUNCTION class(A FLOAT64, B FLOAT64, C FLOAT64, D FLOAT64)
RETURNS String
LANGUAGE js AS """
var class_array=[];
if(A > 0.5){class_array.push("A");}
if(B > 0.5){class_array.push("B");}
if(C > 0.5){class_array.push("C");}
if(D > 0.5){class_array.push("D");}
return class_array;
""";
#sample data
WITH data as (
SELECT "alpha" as Name, 0 as A, 1 as B, 0 as C, 0.6 as D UNION ALL
SELECT "beta", 0.6, 0, 0, 0.1 UNION ALL
SELECT "gama", 0, 0, 0, 0.4
)
Select name, A,B,C,D,
CASE WHEN (A > .5 OR B > .5 OR C > .5 OR D >.5) THEN "F" ELSE "P" END AS Result,
IF(class(A,B,C,D) is null , null, class(A,B,C,D)) as Class from data
And the output,
Row name A B C D Result Class
1 alpha 0 1 0 0.6 F B,D
2 beta 0.6 0 0 0.1 F A
3 gama 0 0 0 0.4 P
As shown in the UDF, each row's values are analysed and, if the condition is met, the column's name is manually added to an array of strings. In addition, note that the JS UDF returns a String, not an array: the previously created array is automatically converted to a String.
Lastly, I should point out that it is not possible to retrieve the column name within a query in this context, although you can retrieve it in other scenarios using INFORMATION_SCHEMA.

Below is for BigQuery Standard SQL
Using a JavaScript UDF helps in many cases, but it should be avoided if the problem can be solved with SQL, as in the example below:
#standardSQL
SELECT *,
( SELECT IF(LOGICAL_OR(val > 0.5), 'F', 'P')
FROM UNNEST([A,B,C,D]) val
) AS Result,
( SELECT STRING_AGG(['A','B','C','D'][OFFSET(pos)])
FROM UNNEST([A,B,C,D]) val WITH OFFSET pos
WHERE val > 0.5
) AS Class
FROM `project.dataset.table`
You can test and play with the above using the sample data from your question, as in the example below:
#standardSQL
WITH `project.dataset.table` AS (
SELECT 'alpha' name, 0 A, 1 B, 0 C, 0.6 D UNION ALL
SELECT 'beta', 0.6, 0, 0, 0.1 UNION ALL
SELECT 'gamma', 0, 0, 0, 0.4
)
SELECT *,
( SELECT IF(LOGICAL_OR(val > 0.5), 'F', 'P')
FROM UNNEST([A,B,C,D]) val
) AS Result,
( SELECT STRING_AGG(['A','B','C','D'][OFFSET(pos)])
FROM UNNEST([A,B,C,D]) val WITH OFFSET pos
WHERE val > 0.5
) AS Class
FROM `project.dataset.table`
with output as
Row name A B C D Result Class
1 alpha 0.0 1 0 0.6 F B,D
2 beta 0.6 0 0 0.1 F A
3 gamma 0.0 0 0 0.4 P null
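To cross-check the expected Result and Class values outside BigQuery, the per-row rule from the question can be written in a few lines of Python. This is only a sketch of the logic, not a replacement for either query; the function name `classify` and its parameters are hypothetical.

```python
def classify(row, columns=("A", "B", "C", "D"), threshold=0.5):
    """Return (Result, Class) for one row, following the rule in the question."""
    # Names of the columns whose value exceeds the threshold, in column order.
    flagged = [name for name in columns if row[name] > threshold]
    result = "F" if flagged else "P"
    class_value = ",".join(flagged) if flagged else None
    return result, class_value


sample = {
    "alpha": {"A": 0, "B": 1, "C": 0, "D": 0.6},
    "beta": {"A": 0.6, "B": 0, "C": 0, "D": 0.1},
    "gamma": {"A": 0, "B": 0, "C": 0, "D": 0.4},
}
classified = {name: classify(row) for name, row in sample.items()}
print(classified)
```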

How to only SELECT rows with non-zero and non-null columns efficiently in Big Query?

I have a table with a large number of columns in BigQuery.
The table has a lot of rows where some column values are 0, 0.0 or null.
For example
Row A B C D E F
1 "abc" 0 null "xyz" 0 0.0
2 "bcd" 1 5 "wed" 4 65.5
I need to select only those rows which have non zero Integer, Float and non NULL values. Basically, I need just Row 2 in the above table
I know I can do this by using this query for each of the columns
SELECT * FROM table WHERE (B IS NOT NULL AND B != 0) AND
.
.
.
But I have lot of columns and writing query like this for each of the columns would be difficult. Is there any better approach to handle this?

Below example for BigQuery Standard SQL
#standardSQL
WITH `project.dataset.table` AS (
SELECT "abc" a, 0 b, NULL c, "xyz" d, 0 e, 0.0 f UNION ALL
SELECT "bcd", 1, 5, "wed", 4, 65.5
)
SELECT *
FROM `project.dataset.table` t
WHERE NOT REGEXP_CONTAINS(TO_JSON_STRING(t), r':0[,}]|null[,}]')
with output
Row a b c d e f
1 bcd 1 5 wed 4 65.5
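The regular expression can be tried locally before running it against a large table. The Python sketch below applies the same pattern to compact JSON produced by `json.dumps`; treating that output as equivalent to TO_JSON_STRING is an assumption, and the third row is a hypothetical addition that is not in the question.

```python
import json
import re

# Same pattern as in the query above.
ZERO_OR_NULL = re.compile(r':0[,}]|null[,}]')


def to_json_string(row: dict) -> str:
    # Compact separators give output with no spaces after ':' and ','.
    # Assumption: this is close enough to BigQuery's TO_JSON_STRING for a local check.
    return json.dumps(row, separators=(",", ":"))


rows = [
    {"a": "abc", "b": 0, "c": None, "d": "xyz", "e": 0, "f": 0.0},
    {"a": "bcd", "b": 1, "c": 5, "d": "wed", "e": 4, "f": 65.5},
    # Hypothetical extra row: the only "empty" value is a float zero.
    {"a": "cde", "b": 2, "c": 7, "d": "thu", "e": 9, "f": 0.0},
]
kept = [r["a"] for r in rows if not ZERO_OR_NULL.search(to_json_string(r))]
print(kept)
```

In this local mimic the hypothetical third row is kept, because a float zero is written as `0.0` and `:0[,}]` does not match it. If BigQuery serializes float zeros the same way, the pattern may need an extra alternative for that case.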
