SQL - SELECT all households by last value

I'm facing a problem that I can't wrap my head around, so maybe you can help me solve it.
I have one table:
id | datetime            | property | house_id | household_id | plug_id | value
---+---------------------+----------+----------+--------------+---------+------
 1 | 2013-08-31 22:00:01 |        0 |        1 |            1 |       1 |    15
 2 | 2013-08-31 22:00:01 |        0 |        1 |            1 |       3 |     3
 3 | 2013-08-31 22:00:01 |        0 |        1 |            2 |       1 |    21
 4 | 2013-08-31 22:00:01 |        0 |        1 |            2 |       2 |     1
 5 | 2013-08-31 22:00:01 |        0 |        2 |            1 |       3 |    53
 6 | 2013-08-31 22:00:02 |        0 |        2 |            2 |       4 |    34
 7 | 2013-08-31 22:00:02 |        0 |        1 |            1 |       1 |    16
...
The table holds per-second electricity consumption measurements for multiple houses, each containing multiple households (apartments). Each household has multiple electricity plugs. Houses and households do not have globally unique ids; a household is identified by the combination of house_id and household_id.
1) I need a SQL query that can give me a list of all the unique households.
2) I want to use the list from 1) to create a SQL query that gives me the highest value for each household (the value is cumulative, so the latest datetime holds the highest value). I need a total value (SUM) for each household (the sum of all the plugs in that household), i.e. a list of households with their total electricity consumption.
Is this even possible? I'm using SQL Server 2012 and the table has 100,000,000 rows.

If I understand correctly, you want, for each household, the sum of the highest value reached by each of its plugs. This may do what you want:
select house_id, household_id, sum(maxvalue)
from (select house_id, household_id, plug_id, max(value) as maxvalue
      from consumption
      group by house_id, household_id, plug_id
     ) c
group by house_id, household_id;
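For part 1, a plain DISTINCT over the identifying pair should be enough (a minimal sketch, reusing the consumption table name from the query above):
select distinct house_id, household_id
from consumption;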

According to your description, I think you can use this query:
select house_id, household_id, max(value), sum(value)
from your_table_name
group by house_id, household_id;

Related

Summing all values with the same ID in a column gives me duplicated values in SQL?

I am trying to sum all the columns for rows that have the same ID number in a specified date range, but it always gives me duplicated values:
select pr.product_sku,
pr.product_name,
pr.brand,
pr.category_name,
pr.subcategory_name,
a.stock_on_hand,
sum(pr.pageviews) as page_views,
sum(acquired_subscriptions) as acquired_subs,
sum(acquired_subscription_value) as asv_value
from dwh.product_reporting pr
join dm_product.product_data_livefeed a
on pr.product_sku = a.product_sku
where pr.fact_day between '2022-05-01' and '2022-05-30'
  and pr.pageviews > '0'
  and pr.acquired_subscription_value > '0'
  and store_id = 1
group by pr.product_sku,
pr.product_name,
pr.brand,
pr.category_name,
pr.subcategory_name,
a.stock_on_hand;
This is supposed to give me:
Sum of all KPI values for a distinct product SKU
Example table:
| Date       | product_sku | page_views | number_of_subs |
|------------|-------------|------------|----------------|
| 2022-01-01 | 1           | 110        | 50             |
| 2022-01-25 | 2           | 1000       | 40             |
| 2022-01-20 | 3           | 2000       | 10             |
| 2022-01-01 | 1           | 110        | 50             |
| 2022-01-25 | 2           | 1000       | 40             |
| 2022-01-20 | 3           | 2000       | 10             |
Expected Output:
| product_sku | page_views | number_of_subs |
|-------------|------------|----------------|
| 1           | 220        | 100            |
| 2           | 2000       | 80             |
| 3           | 4000       | 20             |
Sorry I had to edit to add the table examples
Since you're not listing the dupes (assuming they are truly appearing as duplicate rows, and not just multiple rows with different values), I'll offer that there may be something else at play here. I would suggest applying TRIM(UPPER()) to every string value in your result set that's part of the GROUP BY clause, as you might be dealing with case differences or trailing blanks that are treated as unique values in the query.
Assuming all the columns are character based:
select trim(upper(pr.product_sku))      as product_sku,
       trim(upper(pr.product_name))     as product_name,
       trim(upper(pr.brand))            as brand,
       trim(upper(pr.category_name))    as category_name,
       trim(upper(pr.subcategory_name)) as subcategory_name,
       sum(pr.pageviews)                as page_views,
       sum(acquired_subscriptions)      as acquired_subs,
       sum(acquired_subscription_value) as asv_value
from dwh.product_reporting pr
where pr.fact_day between '2022-05-01' and '2022-05-30'
  and pr.pageviews > '0'
  and pr.acquired_subscription_value > '0'
  and store_id = 1
group by trim(upper(pr.product_sku)),
         trim(upper(pr.product_name)),
         trim(upper(pr.brand)),
         trim(upper(pr.category_name)),
         trim(upper(pr.subcategory_name));
Thank you guys for all your help, I found out where the problem was. It was mainly in the GROUP BY: when I removed all the other column names and left only the product_sku column, it worked as required.
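For reference, a minimal sketch of the reduced grouping described above; the MAX() aggregates on the descriptive columns are an assumption to keep them selectable once they leave the GROUP BY (valid if they are functionally dependent on product_sku):
select pr.product_sku,
       max(pr.product_name)                as product_name,   -- assumed constant per SKU
       max(pr.brand)                       as brand,
       max(pr.category_name)               as category_name,
       max(pr.subcategory_name)            as subcategory_name,
       sum(pr.pageviews)                   as page_views,
       sum(pr.acquired_subscriptions)      as acquired_subs,
       sum(pr.acquired_subscription_value) as asv_value
from dwh.product_reporting pr
where pr.fact_day between '2022-05-01' and '2022-05-30'
  and store_id = 1
group by pr.product_sku;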

How to order ids using a subtotal from another column in PostgreSQL

I have a table returned by a select query. Example :
| id | day      | count |
|----|----------|-------|
| 1  | 71       | 3     |
| 1  | 70       | 2     |
| 1  | Subtotal | 5     |
| 2  | 70       | 5     |
| 2  | 71       | 2     |
| 2  | 69       | 2     |
| 2  | Subtotal | 9     |
| 3  | 69       | 1     |
| 3  | 70       | 1     |
| 3  | Subtotal | 2     |
The day column contains text values (so varchar).
Subtotal is the sum of the counts for an id (e.g. id 2 has a subtotal of 5 + 2 + 2 = 9).
I now want to order this table so the ids with the lowest subtotal come first, and within each id the rows are ordered by day with the subtotal at the end (like before).
Expected output:
| id | day      | count |
|----|----------|-------|
| 3  | 69       | 1     |
| 3  | 70       | 1     |
| 3  | Subtotal | 2     |
| 1  | 70       | 2     |
| 1  | 71       | 3     |
| 1  | Subtotal | 5     |
| 2  | 69       | 2     |
| 2  | 70       | 5     |
| 2  | 71       | 2     |
| 2  | Subtotal | 9     |
I can't figure out how to order based on the subtotal only.
I've tried multiple ORDER BY variations (e.g. ORDER BY day = 'Subtotal', and a mix of others) and window functions, but none are helping. Cheers!
Not sure if it's directly applicable to your source query (since you haven't included it), but the ordering you require on the sample data can be done with:
order by max(count) over (partition by id), day
Note: ordering by day works with your sample data, but as it's a string it will not honour numeric ordering; this should really be ordered by the source of the numerical value. Since we don't have your actual query I can't suggest anything more specific, but I'm sure you can substitute the correct column/expression.
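If the day values do need true numeric ordering, one possible sketch (PostgreSQL; it assumes every non-subtotal day is numeric text, and subtotals is a stand-in for your source query):
select id, day, count
from subtotals
order by max(count) over (partition by id),  -- ids with the lowest subtotal first
         day = 'Subtotal',                   -- data rows (false) before the subtotal row (true)
         nullif(day, 'Subtotal')::int;       -- numeric ordering for the data rows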
I just created a table with 3 columns and tried to reproduce your expected result. I assumed there might be a problem with ordering by day (the subtotal could end up on top), but it seems to be a working solution.
create table test
(
    id    int,
    day   varchar(15),
    count int
);

insert into test
values
    (1, '71', 3),
    (1, '70', 2),
    (2, '70', 5),
    (2, '71', 2),
    (2, '69', 2),
    (3, '69', 1),
    (3, '70', 1);

select id, coalesce(day, 'Subtotal') as day, count  -- rollup emits NULL on the subtotal rows
from
(
    select id, day, sum(count) as count
    from test
    group by id, rollup(day)
) as t
order by max(count) over (partition by id), day;

Join logic between tables without an obvious join condition

I've got 2 tables, one with an area, actions, and quantities, and the other with prices. The goal is to combine the two in a view.
table1
areaid         integer
bananaunits    integer
kilometers_ran integer
dogecoins      integer

areaid | bananaunits | kilometers_ran | dogecoins
-------+-------------+----------------+----------
     1 |           0 |              1 |        10
     2 |           4 |              2 |       100
table2
rateid        integer
description   text
cost_per_unit numeric

rateid | description              | cost_per_unit
-------+--------------------------+--------------
     1 | price per banana         | 0.5
     2 | price per kilometers run | 2
     3 | price per doge           | 1
The intended outcome is a view with the following fields:
areaid, rateid, description, cost_per_unit, units, total_cost
areaid | rateid | description              | cost_per_unit | units | total_cost
-------+--------+--------------------------+---------------+-------+-----------
     1 |      1 | price per banana         |           0.5 |     0 |          0
     1 |      2 | price per kilometers run |             2 |     1 |          2
     1 |      3 | price per doge           |             1 |    10 |         10
     2 |      1 | price per banana         |           0.5 |     4 |          2
     2 |      2 | price per kilometers run |             2 |     2 |          4
     2 |      3 | price per doge           |             1 |   100 |        100
In other words, I need to present all the rates per area in individual rows. How can I achieve this?
Edit: my current query, which doesn't work:
select areaid, rateid, description, cost_per_unit, units, combined_cost from table1,table2
Since you don't have a joining key and you want a row for each combination of area and rate, you're basically looking for a CROSS JOIN, also called a Cartesian product.
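A sketch of such a view; the CASE mapping from rateid to the matching quantity column (and the view name) are assumptions based on the sample data:
create view area_costs as
select t1.areaid,
       t2.rateid,
       t2.description,
       t2.cost_per_unit,
       case t2.rateid                        -- assumed rateid-to-column mapping
           when 1 then t1.bananaunits
           when 2 then t1.kilometers_ran
           when 3 then t1.dogecoins
       end as units,
       t2.cost_per_unit * case t2.rateid
           when 1 then t1.bananaunits
           when 2 then t1.kilometers_ran
           when 3 then t1.dogecoins
       end as total_cost
from table1 t1
cross join table2 t2;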

Count total items and sold items (in another table, referenced by id), grouped by serial number

I have a table of items in the shop. An item may have different entries with the same serial number (sn) but different ids, if the same item was bought again later at a different price (price here is how much a single item cost the shop):
id | sn | amount | price
----+------+--------+-------
1 | AP01 | 100 | 7
2 | AP01 | 50 | 8
3 | X2P0 | 200 | 12
4 | X2P0 | 30 | 18
5 | STT0 | 20 | 20
6 | PLX1 | 200 | 10
and a table of transactions
id | item_id | price
----+---------+-------
1 | 1 | 10
2 | 1 | 9
3 | 1 | 10
4 | 2 | 11
5 | 3 | 15
6 | 3 | 15
7 | 3 | 15
8 | 4 | 18
9 | 5 | 22
10 | 5 | 22
11 | 5 | 22
12 | 5 | 22
and transactions.item_id references items(id).
I want to group items by serial number (sn), get their sum(amount) and avg(price), and join in a sold column that counts the number of transactions referencing each item's id.
I did the first with
select i.sn, sum(i.amount), avg(i.price) from items i group by i.sn;
sn | sum | avg
------+-----+---------------------
STT0 | 20 | 20.0000000000000000
PLX1 | 200 | 10.0000000000000000
AP01 | 150 | 7.5000000000000000
X2P0 | 230 | 15.0000000000000000
Then when I tried to join it with transactions, I got strange results:
select i.sn, sum(i.amount), avg(i.price) avg_cost,
       count(t.item_id) sold, sum(t.price) profit
from items i
left join transactions t on (i.id = t.item_id)
group by i.sn;
sn | sum | avg_cost | sold | profit
------+-----+---------------------+------+--------
STT0 | 80 | 20.0000000000000000 | 4 | 88
PLX1 | 200 | 10.0000000000000000 | 0 | (null)
AP01 | 350 | 7.2500000000000000 | 4 | 40
X2P0 | 630 | 13.5000000000000000 | 4 | 63
As you can see, only the sold and profit columns show correct results; the sum and avg differ from the expected values.
I can't simply separate the statements, because I am not sure how to add the count to the sn group once item_id is no longer available as its id:
select
    j.sn,
    j.sum,
    j.avg,
    count(item_id)
from (
    select
        i.sn,
        sum(i.amount),
        avg(i.price)
    from items i
    group by i.sn
) j
left join transactions t
    on (j.id???=t.item_id);
There are multiple matches in both tables, so the join multiplies the rows (and eventually produces wrong results). I would recommend counting the transactions per item first, then aggregating:
select
    sn,
    sum(amount)          as total_amount,
    avg(price)           as avg_price,
    sum(no_transactions) as no_transactions
from (
    select
        i.*,
        (
            select count(*)
            from transactions t
            where t.item_id = i.id
        ) as no_transactions
    from items i
) t
group by sn;
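An equivalent sketch that pre-aggregates the transactions to one row per item and joins that back; it avoids the row multiplication the same way, and also recovers the profit column from the original attempt:
select i.sn,
       sum(i.amount)            as total_amount,
       avg(i.price)             as avg_price,
       coalesce(sum(t.sold), 0) as sold,     -- items with no transactions count as 0
       sum(t.profit)            as profit    -- stays NULL when nothing was sold
from items i
left join (
    select item_id, count(*) as sold, sum(price) as profit
    from transactions
    group by item_id
) t on t.item_id = i.id
group by i.sn;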

Select dynamic couples of lines in SQL (PostgreSQL)

My objective is to build dynamic groups of rows (of products, by TYPE and COLOR in fact).
I don't know if it's possible with just one select query.
But: I want to create groups of rows (a PRODUCT is a TYPE and a COLOR) according to the NB_PER_GROUP column, and I want to do this grouping in date order (ORDER BY DATE).
A leftover product that cannot fill a complete group (e.g. a single remaining product when NB_PER_GROUP is 2) is excluded from the final result.
Table :
-----------------------------------------------
NUM | TYPE | COLOR | NB_PER_GROUP | DATE
-----------------------------------------------
0 | 1 | 1 | 2 | ...
1 | 1 | 1 | 2 |
2 | 1 | 2 | 2 |
3 | 1 | 2 | 2 |
4 | 1 | 1 | 2 |
5 | 1 | 1 | 2 |
6 | 4 | 1 | 3 |
7 | 1 | 1 | 2 |
8 | 4 | 1 | 3 |
9 | 4 | 1 | 3 |
10 | 5 | 1 | 2 |
Results :
------------------------
GROUP_NUMBER | NUM |
------------------------
0 | 0 |
0 | 1 |
~~~~~~~~~~~~~~~~~~~~~~~~
1 | 2 |
1 | 3 |
~~~~~~~~~~~~~~~~~~~~~~~~
2 | 4 |
2 | 5 |
~~~~~~~~~~~~~~~~~~~~~~~~
3 | 6 |
3 | 8 |
3 | 9 |
If you have another way to solve this problem, I will accept it.
What about something like this?
select max(gn.group_number) as group_number, ip.num
from products ip
join (
    select date, type, color, row_number() over (order by date) - 1 as group_number
    from (
        select op.num, op.type, op.color, op.nb_per_group, op.date,
               (row_number() over (partition by op.type, op.color order by op.date) - 1)
                   % op.nb_per_group as group_order
        from products op
    ) sq
    where sq.group_order = 0
) gn
    on ip.type = gn.type
   and ip.color = gn.color
   and ip.date >= gn.date
group by ip.num
order by group_number, ip.num;
This may only work if your nb_per_group values are the same for each combination of type and color. It may also require unique dates, but that could probably be worked around if required.
The innermost subquery partitions the rows by type and color, orders them by date, then calculates the row numbers modulo nb_per_group; this forms a 0-based count for the group that resets to 0 each time nb_per_group is exceeded.
The next-level subquery finds all of the 0 values we mapped in the lower subquery and assigns group numbers to them.
Finally, the outermost query ties each row in the products table to a group number, calculated as the highest group number that split off before this product's date.
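For completeness, an alternative sketch that numbers the groups with integer division instead of a self-join, and also drops incomplete groups; it assumes nb_per_group is constant per (type, color) and that group start dates are distinct:
with numbered as (
    select num, type, color, nb_per_group, date,
           (row_number() over (partition by type, color order by date) - 1)
               / nb_per_group as grp_within   -- 0, 0, 1, 1, ... within each type/color
    from products
),
sized as (
    select *,
           count(*) over (partition by type, color, grp_within) as grp_size,
           min(date) over (partition by type, color, grp_within) as grp_start
    from numbered
)
select dense_rank() over (order by grp_start) - 1 as group_number, num
from sized
where grp_size = nb_per_group                  -- exclude incomplete groups
order by group_number, num;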