Postgres Distinct or Group by - sql

I have a table with the fields bhk, size, and price. I am using DISTINCT to get unique rows with the following query:
Query 1
select distinct(bhk,size,perprice),bhk,size,price from project_units;
I am also querying with:
Query 2
select bhk, array_agg(size) as size from project_units where project_id = '12' and bhk is not null and not bhk = '1bhk' group by bhk
As a result I get:
[
{
bhk:"1bhk",
size:{123,121,231}
},
{
bhk:"2bhk",
size:{223,321,131}
}
]
With Query 2 I also want to retrieve the price. Alternatively, is there another way to get distinct combinations of bhk, size, and price with Query 2?

Which price would you want there? The queries you write need to fit all the data, not just a given set. So let's imagine you have
bhk='1bhk'
size='123'
price='1'
and
bhk='1bhk'
size='321'
price=2
so for a distinct bhk there are 2 possible prices.
If you know which price you want (min, max, average, sum), then you can add it to the query; it just needs to be an aggregate expression.
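As a minimal sketch of that idea, the snippet below picks the minimum and maximum price per bhk. It uses Python's sqlite3 module as a stand-in for Postgres, and the rows are hypothetical: the two from the example above plus one extra bhk.

```python
import sqlite3

# In-memory SQLite database standing in for the Postgres table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE project_units (bhk TEXT, size INTEGER, price INTEGER)")
# Hypothetical rows: two prices for '1bhk', one for '2bhk'.
conn.executemany(
    "INSERT INTO project_units VALUES (?, ?, ?)",
    [("1bhk", 123, 1), ("1bhk", 321, 2), ("2bhk", 223, 5)],
)

# Each non-grouped column is wrapped in an aggregate, so one row per bhk is well defined.
rows = conn.execute(
    """
    SELECT bhk, MIN(price) AS min_price, MAX(price) AS max_price
    FROM project_units
    WHERE bhk IS NOT NULL
    GROUP BY bhk
    ORDER BY bhk
    """
).fetchall()

for row in rows:
    print(row)
```

In Postgres the same MIN(price) or MAX(price) expression could sit next to array_agg(size) in Query 2.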

Related

Using a WITH as an aggregate value

I am querying a Presto table where I want to calculate what percentage of the total a certain subset of the rows account for.
Consider a table like this:
id m
1 5
1 7
2 9
3 8
I want a query that reports how much of the total measure (m) is contributed by each id. In this example, the total of the measure column is 29, which I can find with a query like:
SELECT SUM("m") FROM t;
output:
sqlite> SELECT SUM("m") FROM t;
29
Then I want to subtotal by id for some of the ids, like:
SELECT "id", SUM("m") AS "sub_total" FROM t WHERE "id" IN ('1','3') GROUP BY id;
output:
sqlite> SELECT "id", SUM("m") AS "sub_total" FROM t WHERE "id" IN ('1','3') GROUP BY id;
1|12
3|8
Now I want to add a third column where the subtotals are divided by the grand total (29) to get the percentage for each selected id.
I tried:
sqlite>
WITH a AS (
SELECT SUM("m") AS g FROM t )
SELECT "id", SUM("m") AS "sub_total", SUM(m)*100/"a"."g"
FROM a, t
WHERE "t"."id" IN ('1','3') GROUP BY "t"."id";
output:
1|12|41
3|8|27
Which is all good in SQLite3! But when I translate this to my actual Presto DB (with the right tables and columns), I get this error:
presto error: line 10:5: 'a.g' must be an aggregate expression or appear in GROUP BY clause
I can't understand what I'm missing here or why this would be different in Presto.
When you have a GROUP BY in your query, every expression that the query returns must be either:
an expression you are grouping by
or an aggregate function
For example, if you do GROUP BY id, the resulting query will return one row per id. You cannot just use m, because for id = 1 there are two values, 5 and 7, so which should be returned? The first value, the last, the sum, the average? You need to tell it by using an aggregate function like sum(m).
The same goes for a.g: you need to add it to GROUP BY.
WITH a AS (
SELECT SUM("m") AS g FROM t )
SELECT "id", SUM("m") AS "sub_total", SUM(m)*100/"a"."g"
FROM a, t
WHERE "t"."id" IN ('1','3') GROUP BY "t"."id", "a"."g";
There's nothing special about PrestoDB here; it's rather SQLite that is less strict. Most other database engines would complain about your query as well.
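A small runnable check of the corrected query, sketched with Python's sqlite3 module and the sample table from the question. The ids are stored as integers here, which is an assumption; the question compares them to string literals.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER, m INTEGER)")
conn.executemany("INSERT INTO t VALUES (?, ?)", [(1, 5), (1, 7), (2, 9), (3, 8)])

# a.g is listed in GROUP BY, so every selected expression is either
# grouped or aggregated, which is what stricter engines require.
pct_rows = conn.execute(
    """
    WITH a AS (SELECT SUM(m) AS g FROM t)
    SELECT t.id, SUM(t.m) AS sub_total, SUM(t.m) * 100 / a.g AS pct
    FROM a, t
    WHERE t.id IN (1, 3)
    GROUP BY t.id, a.g
    ORDER BY t.id
    """
).fetchall()

print(pct_rows)  # integer division, so the percentages are truncated
```

Because a has exactly one row, adding a.g to GROUP BY does not change the number of groups.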

Filter with SQL Server by Group ID

I have two tables, and I need to filter the data by filter ID, depending on how each filter ID relates to a filter group ID.
For example, I have these two tables:
Table 1:
ItemID FilterID
3 122
3 123
3 4
17 123
Table 2:
FilterID FilterGroupID
122 5
123 5
4 1
If I search by filter id = 123, then all item ids with this filter need to be returned.
If I search for two or more filter ids that belong to different group ids, I need to get only the item ids that have a matching filter in every one of those groups.
Desired output:
first input: 123 -> return item id = 3 and item id = 17
second input: 123,4 -> return item id = 3, because filter id 123 belongs to group id 5, filter id 4 belongs to group id 1, and item id 3 is the only one that has both filters.
third input: 122,123 -> return item id = 3 and item id = 17, because both filter ids belong to the same group.
I am getting a little lost with this query and would be glad to get some help.
I'll try to simplify it: let's say we have a filter group for size and a filter group for color. If I filter by size S or M, then I need to get all items with these sizes. If I then add a color like blue, the result is narrowed to items with size S or M and color blue. So a filter from a different group may cut some results.
It seems that you want to get every ItemID which has at least one matching filter from each FilterGroupID within your filter input. So within each group you have OR logic, and between groups you have AND logic.
If you store your input in a table variable or table-valued parameter, then you can just use normal relational division techniques.
This then becomes a question of Relational Division With Remainder, with multiple divisors.
There are many ways to slice this cake. Here is one option:
Join the filter input to the groups, to get each filter's group ID.
Use a combination of DENSE_RANK and MAX to get the total number of distinct groups (you can't use COUNT(DISTINCT ...) in a window function, so we need to hack it).
You can change this step to use a subquery instead of window functions. It may be faster or slower.
Join the main table, and filter out any ItemIDs whose count of distinct groups differs from the overall total.
SELECT
    t1.ItemID
FROM (
    SELECT *,
        TotalGroups = MAX(dr) OVER ()
    FROM (
        SELECT
            fi.FilterID,
            t2.FilterGroupID,
            dr = DENSE_RANK() OVER (ORDER BY t2.FilterGroupID)
        FROM #Filters fi
        JOIN Table2 t2 ON t2.FilterID = fi.FilterID
    ) fi
) fi
JOIN Table1 t1 ON t1.FilterID = fi.FilterID
GROUP BY
    t1.ItemID
HAVING COUNT(DISTINCT fi.FilterGroupID) = MAX(fi.TotalGroups);
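The subquery variant mentioned in the steps can be sketched as follows. This is not the SQL Server query above: it runs on SQLite through Python's sqlite3 module, uses an ordinary table named Filters as a stand-in for #Filters, and the helper items_matching is a hypothetical name introduced only for this illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Table1 (ItemID INTEGER, FilterID INTEGER);
    CREATE TABLE Table2 (FilterID INTEGER, FilterGroupID INTEGER);
    CREATE TABLE Filters (FilterID INTEGER);  -- stand-in for #Filters
    INSERT INTO Table1 VALUES (3, 122), (3, 123), (3, 4), (17, 123);
    INSERT INTO Table2 VALUES (122, 5), (123, 5), (4, 1);
    """
)

# Keep an item only if the number of distinct groups it matches equals
# the number of distinct groups present in the filter input.
QUERY = """
SELECT t1.ItemID
FROM Filters fi
JOIN Table2 t2 ON t2.FilterID = fi.FilterID
JOIN Table1 t1 ON t1.FilterID = fi.FilterID
GROUP BY t1.ItemID
HAVING COUNT(DISTINCT t2.FilterGroupID) = (
    SELECT COUNT(DISTINCT g.FilterGroupID)
    FROM Filters f
    JOIN Table2 g ON g.FilterID = f.FilterID
)
ORDER BY t1.ItemID
"""


def items_matching(filter_ids):
    """Load the filter input, then return the matching ItemIDs in ascending order."""
    conn.execute("DELETE FROM Filters")
    conn.executemany("INSERT INTO Filters VALUES (?)", [(f,) for f in filter_ids])
    return [row[0] for row in conn.execute(QUERY)]


print(items_matching([123]))
print(items_matching([123, 4]))
print(items_matching([122, 123]))
```

The three calls correspond to the three inputs listed in the question.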

Postgresql: Query to know which fraction of the values are larger/smaller

I would like to query my database to know which fraction/percentage of the elements of a table are larger/smaller than a given value.
For instance, let's say I have a table shopping_list with the following schema:
id integer
name text
price double precision
with contents:
id name price
1 banana 1
2 book 20
3 chicken 5
4 chocolate 3
I am now going to buy a new item with price 4, and I would like to know where this new item will be ranked in the shopping list. In this case the element will be greater than 50% of the elements.
I know I can run two queries and count the number of elements, e.g.:
-- returns = 4
SELECT COUNT(*)
FROM shopping_list;
-- returns = 2
SELECT COUNT(*)
FROM shopping_list
WHERE price > 4;
But I would like to do it with a single query to avoid post-processing the results.
If you just want them in a single query, use UNION:
SELECT COUNT(*), 'total'
FROM shopping_list
UNION
SELECT COUNT(*),'greater'
FROM shopping_list
WHERE price > 4;
The simplest way is to use avg():
SELECT AVG( (price > 4)::float)
FROM shopping_list;
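A runnable sketch of the same single-pass idea, using Python's sqlite3 module as a stand-in for Postgres. The CASE expression is a portable spelling of the boolean-to-number cast, and the table contents are the ones from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE shopping_list (id INTEGER, name TEXT, price REAL)")
conn.executemany(
    "INSERT INTO shopping_list VALUES (?, ?, ?)",
    [(1, "banana", 1), (2, "book", 20), (3, "chicken", 5), (4, "chocolate", 3)],
)

# One scan of the table yields the total, the matching count, and the fraction.
total, greater, fraction = conn.execute(
    """
    SELECT COUNT(*),
           SUM(CASE WHEN price > 4 THEN 1 ELSE 0 END),
           AVG(CASE WHEN price > 4 THEN 1.0 ELSE 0.0 END)
    FROM shopping_list
    """
).fetchone()

print(total, greater, fraction)
```

Averaging a 0/1 flag gives the fraction of rows for which the condition holds, which is why AVG works here.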
One way to get both results is as follows:
select count(*) as total,
(select count(*) from shopping_list where price > 4) as greater
from shopping_list
It will get both results in a single row, with the names you specified. It does, however, involve a query within a query.
I found the aggregate function PERCENT_RANK which does exactly what I wanted:
SELECT PERCENT_RANK(4) WITHIN GROUP (ORDER BY price)
FROM shopping_list;
-- returns 0.5

Counting Results in SQL

I'm having trouble using COUNT in SQL. The following query returns two rows, but the raps column comes back as 137 in each. So I believe it's counting the total number of operation_id values in the dataset instead of in the results.
Is there any way to make it count only the values in the results, so that raps returns 1 in each row? I would then use PHP to add them together.
//Query
SELECT DISTINCT hrap_id,
operation_id,
COUNT (operation_id) AS raps,
operation_type
FROM view_rappels
WHERE year = '2013' AND crew_id = '4'
GROUP BY hrap_id, operation_type, operation_id
//Results
10.00 702020000.00 137.00 operational
1.00 702020000.00 137.00 operational
You need to put DISTINCT inside the COUNT function, like so:
COUNT(DISTINCT operation_id) AS raps
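To see the difference between the two counts, here is a small sketch using Python's sqlite3 module. The data is hypothetical, and view_rappels is created as a plain table standing in for the real view, with only the columns needed for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE view_rappels (hrap_id INTEGER, operation_id INTEGER, operation_type TEXT)"
)
# Hypothetical data: hrap 10 has three rows and hrap 1 has two, all for one operation.
conn.executemany(
    "INSERT INTO view_rappels VALUES (?, ?, ?)",
    [(10, 7020, "operational")] * 3 + [(1, 7020, "operational")] * 2,
)

# COUNT(col) counts the rows in each group; COUNT(DISTINCT col) counts distinct values.
count_rows = conn.execute(
    """
    SELECT hrap_id,
           COUNT(operation_id) AS row_count,
           COUNT(DISTINCT operation_id) AS raps
    FROM view_rappels
    GROUP BY hrap_id, operation_type, operation_id
    ORDER BY hrap_id
    """
).fetchall()

print(count_rows)
```

Since operation_id is itself a grouping column, the distinct count is 1 in every group, while the plain count reflects how many underlying rows fell into the group.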

SQL query for child table summary and generalization

I have 4 tables, with the diagram below.
I want a summary query for the Institution table, where the result contains only:
InstitutionType ProductName Quantity
For example, sample data of the Institution table:
Id Name Address InstitionTypeId
1 aaa ny132 1001
2 bbb dx23 1001
3 ccc bn33 1002
And the InstitionProduct table looks like this:
Id ProductId Quantity InstitionId
1 1000 120 1
2 1000 100 2
3 1000 50 3
Then I want a query that outputs the total quantity of a given product per institution type. The sample output would look like this:
InstitutionTypeId productId quantity
1001 1000 220
1002 1000 50
So I want to group the institutions by type and aggregate the product quantity over each institution-type group.
I tried to use the GROUP BY clause, but with the product quantity not being a grouping element it results in an error.
SELECT
Institution.InstitutionTypeID,
InstitutionProduct.ProductID,
SUM(InstitutionProduct.Quantity)
FROM
Institution
LEFT JOIN
InstitutionProduct
ON InstitutionProduct.InstitutionID = Institution.ID
GROUP BY
Institution.InstitutionTypeID,
InstitutionProduct.ProductID
If you are querying with GROUP BY, you need to either use aggregate functions or group by all included fields. The reason is that GROUP BY returns exactly one row per group, so an ungrouped field would be ambiguous whenever it has more than one value within a group. Even though this might not be the case for your dataset, the query engine cannot know this, and raises an error.
The solution is to introduce aggregates for all non-grouping fields, with the aggregates being (among others): average (avg), sum (sum), minimum (min) and maximum (max). This would lead to something like
SELECT i.InstitutionTypeID, ip.ProductID, SUM(ip.Quantity) AS Quantity
FROM Institution i LEFT JOIN InstitutionProduct ip
ON ip.InstitutionID = i.ID
GROUP BY i.InstitutionTypeID, ip.ProductID
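The grouping by institution type and product can be checked against the sample data from the question. This is a sketch using Python's sqlite3 module; the column names follow the spelling used in the queries (InstitutionTypeID, InstitutionID), which is an assumption since the sample tables spell them differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Institution (
        ID INTEGER, Name TEXT, Address TEXT, InstitutionTypeID INTEGER
    );
    CREATE TABLE InstitutionProduct (
        ID INTEGER, ProductID INTEGER, Quantity INTEGER, InstitutionID INTEGER
    );
    INSERT INTO Institution VALUES
        (1, 'aaa', 'ny132', 1001), (2, 'bbb', 'dx23', 1001), (3, 'ccc', 'bn33', 1002);
    INSERT INTO InstitutionProduct VALUES
        (1, 1000, 120, 1), (2, 1000, 100, 2), (3, 1000, 50, 3);
    """
)

# Group by type and product; Quantity is the only non-grouped column, so it is summed.
summary = conn.execute(
    """
    SELECT i.InstitutionTypeID, ip.ProductID, SUM(ip.Quantity) AS Quantity
    FROM Institution i
    LEFT JOIN InstitutionProduct ip ON ip.InstitutionID = i.ID
    GROUP BY i.InstitutionTypeID, ip.ProductID
    ORDER BY i.InstitutionTypeID, ip.ProductID
    """
).fetchall()

for row in summary:
    print(row)
```

With the LEFT JOIN, an institution that has no products would still appear, with a NULL ProductID and a NULL quantity.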