PostgreSQL row by row comparison of values in two columns - SQL

I have a Postgres DB where, after a GROUP BY, I have to evaluate entries approximated by this table:
I then try to find the MIN and MAX of each column and count the number of rows where there isn't a zero in either column 'A' or 'B'. In this case I count one row because in row '4' there are non-zero values in both column 'A' and 'B'. Getting MIN and MAX is straightforward but I can't figure out how to do the last step.
SELECT MIN(A) as "minA",
       MAX(A) as "maxA",
       MIN(B) as "minB",
       MAX(B) as "maxB",
       COUNT(????) as "num_full"
FROM bigDB
GROUP BY inlet
I thought maybe I could do a sum on each row and test whether the result was equal to the value of 'A' or 'B', i.e. if A or B is zero then the row sum equals the other column. But SUM() works by column, not by row. Is there a way to do sums by row, or is there a better way to do what I want to do?

Use FILTER:
COUNT(*) FILTER (WHERE A <> 0 AND B <> 0) as num_full
There is no need to enclose column aliases in double quotes unless you want to preserve mixed case; unquoted identifiers are folded to lowercase in Postgres.
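For reference, combined with the query from the question (same table and column names), it would look something like this:
SELECT MIN(A) as "minA",
       MAX(A) as "maxA",
       MIN(B) as "minB",
       MAX(B) as "maxB",
       COUNT(*) FILTER (WHERE A <> 0 AND B <> 0) as num_full
FROM bigDB
GROUP BY inlet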

Related

Calculating the mode/median/most frequent observation in categorical variables in SQL Impala

I would like to calculate the mode/median, or rather, the most frequent observation of a categorical variable within my query.
E.g., if the variable has the following string values:
dog, dog, dog, cat, cat, then I want to get dog since it's 3 vs 2.
Is there any function that does that? I tried APPX_MEDIAN() but it only returns the first 10 characters as the median and I do not want that.
Also, I would like to get the most frequent observation with respect to date if there is a tie.
Thank you!
The most frequent observation is the mode, and you can calculate it like this.
A single-valued mode can be calculated on a value column by getting the counts and picking the row with the maximum count:
select count(*), value
from mytable
group by value
order by 1 desc
limit 1
Now, in case you have multiple modes, you need to join back to the main table to find all matches:
select orig.v as value
from (select count(*) c, value v from mytable group by value) orig
join (select count(*) cmode from mytable group by value order by 1 desc limit 1) cmode
  on orig.c = cmode.cmode
This gets the count for every value and then matches those counts against the maximum count. If only one value's count matches the maximum you will get one row, if two values' counts match the maximum you will get two rows, and so on.
Calculating the median is a little trickier, and it gives you the middle value, which is not the most frequent one.
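For the date tie-break mentioned in the question, one possible sketch is to order by the count first and the latest observation date second. Here obs_date is only a hypothetical column name standing in for whatever date column the table actually has:
-- most frequent value, ties broken by the most recent observation date
select value,
       count(*)      as c,
       max(obs_date) as latest   -- obs_date is a placeholder for your date column
from mytable
group by value
order by c desc, latest desc
limit 1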

Counting number of null values in a row to divide by (unknown) number of columns

I'm using SQL Server 14 and I need to count the number of null values in a row to create a new column where a "% of completeness" for each row will be stored. For example, if 9 out of 10 columns contain values for a given row, the % for that row would be 90%.
I know this can be done via a number of Case expressions, but the thing is, this data will be used for a live dashboard and won't be under my supervision after completion.
I would like this % to be calculated every time a function (or procedure? I'm not sure what is used in this case) is run. To do that, I need to know the number of columns that exist in my table so I can count the null values in a row and then divide by the number of columns to find the "% of completeness".
Any help is greatly appreciated!
Thank you
One method uses cross apply to unpivot the columns to rows and compute the ratio of non-null values.
Assuming that your table has columns col1 to col4, you would write this as:
select t.*, x.*
from mytable t
cross apply (
    select avg(case when col is not null then 1.0 else 0 end) as completeness_ratio
    from (values (t.col1), (t.col2), (t.col3), (t.col4)) v(col)
) x
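If you specifically want the null count and a percentage, as phrased in the question, a small variation on the same idea (still assuming columns col1 to col4) could be:
select t.*, x.null_count, x.completeness_pct
from mytable t
cross apply (
    -- count(col) ignores nulls, count(*) counts every unpivoted column
    select count(*) - count(col)         as null_count,
           100.0 * count(col) / count(*) as completeness_pct
    from (values (t.col1), (t.col2), (t.col3), (t.col4)) v(col)
) x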

Checking which of two column values is closest to a calculated value

Absolute rookie at SQL, so apologies upfront if this is not possible or absurd.
Single table in SQLite.
First of all, I want to filter the table to only return rows where the difference between the decimal in column A and the decimal in column B is more than 3.
Then, for each row, I want to subtract the integer in column C from the integer in column D to give result E. Then I want to know whether the decimal in column A or the decimal in column B is closer to result E.
Thanks!
The code below basically uses a subquery to keep all the needed values handy, a CASE expression to make the decision, and the ABS() function to determine absolute distance.
select A, B, C, D, E,
       case when abs(A - E) < abs(B - E) then 'A' else 'B' end as [Closer_Value]
from (
    select A, B, C, D, (D - C) as [E]
    from YourTable
    where abs(A - B) > 3
) as Temp
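If you want the closer value itself rather than just the label 'A' or 'B', the same CASE expression can return the column instead (same assumptions as above):
select A, B, C, D, E,
       case when abs(A - E) < abs(B - E) then A else B end as [Closest_Value]
from (
    select A, B, C, D, (D - C) as [E]
    from YourTable
    where abs(A - B) > 3
) as Temp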

Selective Summation inside the case statement

I am using the SUM function inside a CASE statement. I want to do a selective summation of rows.
case when
type in ('A','B')
then nvl(SUM(Amount_1),Amount)
else Amount
There are two columns, Amount and Amount_1, and for the Amount_1 column I want to remove a few rows based on some condition C1. For example, if I have 10 rows of type A and condition C1 leaves only 8 of them, then I want to sum the Amount_1 column over those 8 rows only.
I think you have the basic logic understood correctly, but your syntax is a bit off. You should be using the CASE expression inside the SUM() function, not the other way around.
SELECT type,
       SUM(CASE WHEN type IN ('A', 'B')
                THEN COALESCE(Amount_1, Amount)
                ELSE Amount END) AS type_sum -- ELSE 0 ?
FROM yourTable
GROUP BY type
This will compute the sum of COALESCE(Amount_1, Amount) for those records where the type is either A or B, otherwise it will use Amount in the sum. If you intended to not count non-matching records at all, then modify my query by using ELSE 0 in the CASE expression.
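To also apply the extra condition C1 from the question, one option is to fold it into the CASE so that rows failing it fall through to the ELSE branch (or to ELSE 0 if they should not contribute at all). In the sketch below, status = 'VALID' is only a hypothetical stand-in for whatever condition C1 actually is:
SELECT type,
       SUM(CASE WHEN type IN ('A', 'B')
                 AND status = 'VALID'   -- stand-in for condition C1
                THEN COALESCE(Amount_1, Amount)
                ELSE Amount END) AS type_sum
FROM yourTable
GROUP BY type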

SQL to return one row for each distinct value of a column (do not mind which row)

I have a table with a column named X. X contains numbers from 0 to 99, but there are duplicates (e.g. 0 is there multiple times!).
Now I need a query that gives me one of the rows for each of 0, 1, 2, 3 ... 99, meaning I get 100 results with one query. I don't care which of the x == 0, x == 1, ... rows I get, just one of each!
Is there such a thing in SQL?
select distinct x
from your_table
To get a complete record you can group by the X column. But you have to tell the DB which of the duplicate values of the other columns you want.
select x, min(y) as y
from your_table
group by x
If you group by X, then this value will be distinct. For the other columns you need a so-called aggregate function, for example min(), which tells the DB to pick the minimum Y of every X group.
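If your database supports window functions, another common way to get one complete, arbitrary row per x without writing an aggregate for every column is ROW_NUMBER(); here y and z stand in for whatever other columns your table has:
select x, y, z
from (
    select x, y, z,
           row_number() over (partition by x order by x) as rn
    from your_table
) numbered
where rn = 1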