PostgreSQL: Query to know which fraction of the values are larger/smaller - sql

I would like to query my database to know which fraction/percentage of the elements of a table are larger/smaller than a given value.
For instance, let's say I have a table shopping_list with the following schema:
id integer
name text
price double precision
with contents:
id name price
1 banana 1
2 book 20
3 chicken 5
4 chocolate 3
I am now going to buy a new item with price 4, and I would like to know where this new item will rank in the shopping list. In this case, the new item's price will be greater than that of 50% of the existing items.
I know I can run two queries and count the number of elements, e.g.:
-- returns = 4
SELECT COUNT(*)
FROM shopping_list;
-- returns = 2
SELECT COUNT(*)
FROM shopping_list
WHERE price > 4;
But I would like to do it with a single query to avoid post-processing the results.

If you just want both counts from a single query, use UNION:
SELECT COUNT(*), 'total'
FROM shopping_list
UNION
SELECT COUNT(*),'greater'
FROM shopping_list
WHERE price > 4;

The simplest way is to use avg(). PostgreSQL has no direct cast from boolean to float, so cast the comparison to int first:
SELECT AVG((price > 4)::int)
FROM shopping_list;
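To try the averaging idea without a PostgreSQL server, here is a minimal sketch using Python's built-in sqlite3 module. It assumes SQLite rather than PostgreSQL, so the comparison is averaged directly (SQLite evaluates it to 0 or 1 and has no :: cast syntax). The table contents are the ones from the question.

```python
import sqlite3

# In-memory stand-in for the shopping_list table from the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE shopping_list (id INTEGER, name TEXT, price REAL)")
conn.executemany(
    "INSERT INTO shopping_list VALUES (?, ?, ?)",
    [(1, "banana", 1), (2, "book", 20), (3, "chicken", 5), (4, "chocolate", 3)],
)

# SQLite evaluates (price > 4) to 0 or 1, so no cast is needed here;
# the average of those 0/1 values is the fraction of rows above the threshold.
fraction_greater = conn.execute(
    "SELECT AVG(price > 4) FROM shopping_list"
).fetchone()[0]
print(fraction_greater)
```

Two of the four prices (5 and 20) are above 4, so the result is the fraction one half.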

One way to get both results is as follows:
select count(*) as total,
(select count(*) from shopping_list where price > 4) as greater
from shopping_list
It will get both results in a single row, with the names you specified. It does, however, involve a query within a query.

I found the hypothetical-set aggregate function PERCENT_RANK, which does exactly what I wanted:
SELECT PERCENT_RANK(4) WITHIN GROUP (ORDER BY price)
FROM shopping_list;
-- returns 0.5
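For intuition about what that number means, here is a small Python sketch of the same calculation. It is not PostgreSQL's implementation, and the function name is made up for illustration. It assumes an ascending sort and a non-empty list: the hypothetical value's rank is one more than the number of existing values strictly below it, and (rank - 1) / (total rows - 1), with the hypothetical row counted in the total, reduces to "values below" divided by "existing values".

```python
def hypothetical_percent_rank(value, existing):
    """Relative rank `value` would get if it were added to `existing`.

    Hypothetical helper for illustration; assumes `existing` is non-empty.
    """
    # Rows strictly smaller than the hypothetical value sort before it.
    smaller = sum(1 for x in existing if x < value)
    # (rank - 1) / (total rows - 1), where the total includes the
    # hypothetical row, simplifies to smaller / len(existing).
    return smaller / len(existing)


prices = [1, 20, 5, 3]  # prices from the question's shopping_list
print(hypothetical_percent_rank(4, prices))
```

With the question's prices, two of the four existing values (1 and 3) are below 4, which matches the 0.5 reported above.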

Related

SQL Query to create a column to determine count from historical data

Given: columns "basket" & "Fruit".
Output: Column "Count present in all the previous basket"
How can I check whether a fruit in a basket is present in the preceding baskets, and get the total count of its occurrences there?
For example, basket 2 contains Berry, Banana and Orange, so I need to check basket 1 to determine the count of these fruits. In the same way, for the fruits in basket 3, baskets 1 and 2 are checked.
How can I do this using an SQL query? Currently I'm doing this on the application side using loops, rowfilter etc., which takes a lot of time as I have more than a million rows.
I think you can also use a window function. I subtract 1 so that the row itself is excluded from the running count for each fruit. Maybe someone can provide a more elegant solution.
select *,
(count(*) over (partition by fruit order by basket) - 1)
from t
order by basket, fruit;
It appears you need a simple correlated subquery, such as:
select *, (
select Count(*) from t t2
where t2.basket < t.basket
and t2.fruit = t.fruit
) "Count in prev baskets"
from t;
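A quick way to check the correlated subquery is to run it against a tiny in-memory SQLite table from Python. The basket contents below are made up for illustration (the question only states what basket 2 contains), and SQLite stands in for whatever database is actually used.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (basket INTEGER, fruit TEXT)")
# Hypothetical sample data; only basket 2's contents come from the question.
conn.executemany(
    "INSERT INTO t VALUES (?, ?)",
    [
        (1, "Apple"), (1, "Berry"),
        (2, "Berry"), (2, "Banana"), (2, "Orange"),
        (3, "Berry"), (3, "Orange"), (3, "Apple"),
    ],
)

# For every row, count rows with the same fruit in earlier baskets.
counts = conn.execute(
    """
    SELECT basket, fruit,
           (SELECT COUNT(*) FROM t t2
            WHERE t2.basket < t.basket
              AND t2.fruit = t.fruit) AS count_in_prev_baskets
    FROM t
    ORDER BY basket, fruit
    """
).fetchall()

for row in counts:
    print(row)
```

With this sample, Berry in basket 3 gets a count of 2 because it also appears in baskets 1 and 2, while Banana in basket 2 gets 0.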

Using a WITH as an aggregate value

I am querying a Presto table where I want to calculate what percentage of the total a certain subset of the rows account for.
Consider a table like this:
id m
1 5
1 7
2 9
3 8
I want a query that reports how much of the total measure (m) is contributed by each id. In this example, the total of the measure column is 29, which I can find with a query like:
SELECT SUM("m") FROM t;
output:
sqlite> SELECT SUM("m") FROM t;
29
Then I want to subtotal by id for some of the ids like
SELECT "id", SUM("m") AS "sub_total" FROM t WHERE "id" IN ('1','3') GROUP BY id;
output:
sqlite> SELECT "id", SUM("m") AS "sub_total" FROM t WHERE "id" IN ('1','3') GROUP BY id;
1|12
3|8
Now I want to add a third column where the subtotals are divided by the grand total (29) to get the percentage for each selected id.
I tried:
sqlite>
WITH a AS (
SELECT SUM("m") AS g FROM t )
SELECT "id", SUM("m") AS "sub_total", SUM(m)*100/"a"."g"
FROM a, t
WHERE "t"."id" IN ('1','3') GROUP BY "t"."id";
output:
1|12|41
3|8|27
Which is all good in SQLite3! But when I translate this to my actual Presto DB (with the right tables and columns), I get this error:
presto error: line 10:5: 'a.g' must be an aggregate expression or appear in GROUP BY clause
I can't understand what I'm missing here or why this would be different in Presto.
When you have a GROUP BY in your query, every expression the query returns must be either:
an expression you are grouping by,
or an aggregate function.
For example, if you do GROUP BY id, the query returns one row per id. You cannot just use m, because for id = 1 there are two values, 5 and 7, so which one should be returned? The first value, the last, the sum, the average? You need to say which by using an aggregate function like sum(m).
The same goes for a.g: you need to add it to GROUP BY.
WITH a AS (
SELECT SUM("m") AS g FROM t )
SELECT "id", SUM("m") AS "sub_total", SUM(m)*100/"a"."g"
FROM a, t
WHERE "t"."id" IN ('1','3') GROUP BY "t"."id", "a"."g";
There's nothing special about PrestoDB here; it's rather SQLite that is less strict. Most other database engines would complain about your query.
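As a sanity check that adding a.g to GROUP BY does not change the result, here is a minimal sketch that runs the stricter form of the query against the question's sample data in an in-memory SQLite database from Python. It assumes integer ids (so the IN list uses 1 and 3 rather than quoted strings), and the pct alias and ORDER BY are additions for readable output.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER, m INTEGER)")
conn.executemany("INSERT INTO t VALUES (?, ?)", [(1, 5), (1, 7), (2, 9), (3, 8)])

# a.g is listed in GROUP BY, as stricter engines such as Presto require.
rows = conn.execute(
    """
    WITH a AS (SELECT SUM(m) AS g FROM t)
    SELECT t.id, SUM(t.m) AS sub_total, SUM(t.m) * 100 / a.g AS pct
    FROM a, t
    WHERE t.id IN (1, 3)
    GROUP BY t.id, a.g
    ORDER BY t.id
    """
).fetchall()

for row in rows:
    print(row)
```

Because a contains a single row, grouping by a.g as well does not split any group; the subtotals and integer percentages are the same as in the question's SQLite output.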

SAS Proc SQL - ranking top nth (3rd) highest for a group of say universities and their price? (HW to be honest)

(this is homework, not going to lie)
I have an ANSI SQL query I wrote that produces the required 3rd highest prices correctly (a sample of the table is given further down):
select unique uni, price
from
(
(
select unique uni, price
from
(
select unique uni, price
from table1
group by uni
having price < max(price)
)
group by uni
having price < max(price)
)
group by uni
having price < max(price)
)
Now I need to list the 1st, 2nd and 3rd highest in one table, but in such a way that it can be reused for any nth value.
example:
Col1 Col2
uni1 10
uni1 20
uni2 20
uni2 10
uni3 30
uni3 20
uni1 30
(Sorry for the formatting, I haven't been here for a very long time. I appreciate any assistance. I will supply a link to the uni; I asked the tutor whether I could do so, and he said yes, but not the whole code, something like 10%.)
In SAS you can use the proprietary option OUTOBS to restrict how many rows of a result set are output.
Example:
Use OUTOBS=3 to create top 3 table. Then use that table in a subsequent query.
data have;
input x @@; datalines;
10 9 8 7 6 5 4 3 2 1 0
;
proc sql;
reset outobs=3;
create table top3x as
select * from have
order by x descending;
reset outobs=max;
* another query;
quit;
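OUTOBS limits the whole result set, whereas the question asks for the top n per university. Purely to illustrate that per-group logic outside SAS, here is a small Python sketch; it is not PROC SQL and not the answer's method. The function name is hypothetical, the rows are the sample from the question, and duplicate prices within a university are collapsed, mirroring the question's "select unique".

```python
from collections import defaultdict
import heapq


def top_n_per_group(rows, n):
    """Return the n highest distinct prices for each university."""
    grouped = defaultdict(set)  # a set drops duplicate prices per university
    for uni, price in rows:
        grouped[uni].add(price)
    # nlargest returns at most n values, sorted from highest to lowest.
    return {uni: heapq.nlargest(n, prices) for uni, prices in grouped.items()}


rows = [
    ("uni1", 10), ("uni1", 20), ("uni2", 20), ("uni2", 10),
    ("uni3", 30), ("uni3", 20), ("uni1", 30),
]
top3 = top_n_per_group(rows, 3)
print(top3)
```

Changing the second argument gives the top 1, top 2, or any other n without rewriting the logic, which is the reuse the question asks about.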

Filtering Rows in SQL

My data looks like this: Number(String), Number2(String), Transaction Type(String), Cost(Integer)
For number 1, Cost 10 and -10 cancel out so the remaining cost is 100
For number 2, Cost 50 and -50 cancel out, Cost 87 and -87 cancel out
For number 3, Cost remains 274
For number 4, Cost 316 and -316 cancel out, 313 remains as the cost
The output I am looking for looks like this:
How do I do this in SQL?
I have tried "sum(price)" and group by "number", but Oracle doesn't let me get results because of the other columns.
https://datascience.stackexchange.com/questions/47572/filtering-unique-row-values-in-sql
When you're doing an aggregate query, you have to pick one value for each column - either by including it in the group by, or wrapping it in an aggregate function.
It's not clear what you want to display for columns 2 and 3 in your output, but from your example data it looks like you're taking the MAX, so that's what I did here.
select number, max(number2), max(transaction_type), sum(cost)
from my_data
group by number
having sum(cost) <> 0;
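The grouping part of that query can be checked against the costs listed in the question with a short Python sketch using an in-memory SQLite table. Only the number and cost columns are modelled here, since the values of the other two columns are not recoverable from the question, and SQLite stands in for Oracle.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_data (number TEXT, cost INTEGER)")
# Costs as described in the question; the other columns are omitted.
conn.executemany(
    "INSERT INTO my_data VALUES (?, ?)",
    [
        ("1", 10), ("1", -10), ("1", 100),
        ("2", 50), ("2", -50), ("2", 87), ("2", -87),
        ("3", 274),
        ("4", 316), ("4", -316), ("4", 313),
    ],
)

# Sum the costs per number and drop numbers whose costs cancel out entirely.
remaining = conn.execute(
    """
    SELECT number, SUM(cost) AS cost
    FROM my_data
    GROUP BY number
    HAVING SUM(cost) <> 0
    ORDER BY number
    """
).fetchall()

for row in remaining:
    print(row)
```

Number 2 disappears because its costs sum to zero, while numbers 1, 3 and 4 keep 100, 274 and 313 respectively.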
Oracle has very nice functionality equivalent to first(), but the syntax is a little cumbersome:
select number,
max(number2) keep (dense_rank first order by cost desc) as number2,
max(transaction_type) keep (dense_rank first order by cost desc) as transaction_type,
max(cost) as cost
from t
group by number;
In my experience, keep has good performance characteristics.
You're almost there. You'll need to get the sum for each number without the other columns, and then join back to your table.
select * from table t
join
(select number, sum(cost) as total_cost
from table
group by number) sums on sums.number = t.number;
You can use a correlated subquery:
select t.*
from table t
where t.cost = (select sum(t1.cost) from table t1 where t1.number = t.number);

SQL : how to distinguish between different rows with same value in some field and have a separate function applied to another field

I have a query output showing a list of orders. Some orders might occupy more than one record in the query output if those orders consist of sub-orders. Each sub-order occupies a separate line in the output. There is the OrderID column, which has the same value for all sub-orders in the output:
OrderID Sub-Order Price
1 1 100
1 2 50
2 1 30
3 1 50
I need to add a column "Discount" to the output and fill it according to the following rules:
If an order has one sub-order, the discount is 10% of the Price.
If an order has more than one sub-order, the discount is 20% on all of its sub-orders.
My query is a UNION of two SELECTs.
I use MS SQL Server with SQL Server Management Studio.
Use CASE with the COUNT window function:
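SELECT OrderID, [Sub-Order], Price,
-- this yields the discounted price; use 0.2 and 0.1 to get the discount amount instead
CASE WHEN (count(*) OVER (PARTITION BY OrderID)) > 1
THEN Price * 0.8
ELSE Price * 0.9
END AS DiscountedPrice
FROM ( table or <query> )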
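The rule behind the CASE expression can also be sketched in plain Python, which may help when checking the expected numbers. This is only an illustration of the logic with the sample rows from the question, not T-SQL. It computes the discount amount (10% or 20% of the price) as the question describes, and the helper name is made up.

```python
from collections import Counter

# (OrderID, Sub-Order, Price) rows from the question.
orders = [(1, 1, 100), (1, 2, 50), (2, 1, 30), (3, 1, 50)]

# Equivalent of COUNT(*) OVER (PARTITION BY OrderID): rows per order.
sub_orders_per_order = Counter(order_id for order_id, _, _ in orders)


def discount_percent(order_id):
    """20% when the order has several sub-orders, otherwise 10%."""
    return 20 if sub_orders_per_order[order_id] > 1 else 10


discounts = [
    (order_id, sub_order, price * discount_percent(order_id) / 100)
    for order_id, sub_order, price in orders
]

for row in discounts:
    print(row)
```

Order 1 has two sub-orders, so both of its rows get 20%; orders 2 and 3 have a single sub-order each and get 10%.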