Summing all values with the same ID in a column gives me duplicated values in SQL?


I am trying to sum the KPI columns for all rows that have the same ID number within a specified date range, but the result always contains duplicated values:
select pr.product_sku,
pr.product_name,
pr.brand,
pr.category_name,
pr.subcategory_name,
a.stock_on_hand,
sum(pr.pageviews) as page_views,
sum(acquired_subscriptions) as acquired_subs,
sum(acquired_subscription_value) as asv_value
from dwh.product_reporting pr
join dm_product.product_data_livefeed a
on pr.product_sku = a.product_sku
where pr.fact_day between '2022-05-01' and '2022-05-30' and pr.pageviews > '0' and pr.acquired_subscription_value > '0' and store_id = 1
group by pr.product_sku,
pr.product_name,
pr.brand,
pr.category_name,
pr.subcategory_name,
a.stock_on_hand;
This is supposed to give me the sum of all KPI values for each distinct product SKU.
Example table:
| Date | product_sku | page_views | number_of_subs |
|------------|-------------|----------|--------------|
| 2022-01-01 | 1 | 110 | 50 |
| 2022-01-25 | 2 | 1000 | 40 |
| 2022-01-20 | 3 | 2000 | 10 |
| 2022-01-01 | 1 | 110 | 50 |
| 2022-01-25 | 2 | 1000 | 40 |
| 2022-01-20 | 3 | 2000 | 10 |
Expected Output:
| product_sku | page_views | number_of_subs |
|-------------|----------|--------------|
| 1 | 220 | 100 |
| 2 | 2000 | 80 |
| 3 | 4000 | 20 |
Sorry I had to edit to add the table examples

Since you're not listing the dupes (assuming they truly appear as duplicate rows, and not just as multiple rows with different values), something else may be at play here. I would suggest applying TRIM(UPPER()) to every string value in your result set that is part of the GROUP BY clause, as you might be dealing with case differences or trailing blanks that the query treats as distinct values.
Assuming all the columns are character based:
select trim(upper(pr.product_sku)),
trim(upper(pr.product_name)),
trim(upper(pr.brand)),
trim(upper(pr.category_name)),
trim(upper(pr.subcategory_name)),
sum(pr.pageviews) as page_views,
sum(acquired_subscriptions) as acquired_subs,
sum(acquired_subscription_value) as asv_value
from dwh.product_reporting pr
where pr.fact_day between '2022-05-01' and '2022-05-30' and pr.pageviews > '0' and pr.acquired_subscription_value > '0' and store_id = 1
group by trim(upper(pr.product_sku)),
trim(upper(pr.product_name)),
trim(upper(pr.brand)),
trim(upper(pr.category_name)),
trim(upper(pr.subcategory_name));

Thank you all for your help, I found out where the problem was. It was mainly in the GROUP BY: when I removed all the other column names and left only the product_sku column, it worked as required.
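
The effect described above can be reproduced on a small scale. The following is a minimal sketch using Python's built-in sqlite3 module; the table contents and the differing product names are invented for illustration and are not the asker's data. It shows that every extra column in the GROUP BY splits a SKU into more groups as soon as that column varies within the SKU.

```python
import sqlite3

# Hypothetical miniature of dwh.product_reporting (made-up rows).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE product_reporting "
    "(product_sku TEXT, product_name TEXT, pageviews INTEGER)"
)
conn.executemany(
    "INSERT INTO product_reporting VALUES (?, ?, ?)",
    [
        ("1", "Phone", 110),
        ("1", "Phone (2022)", 110),  # same SKU, but the name differs
        ("2", "Laptop", 1000),
        ("2", "Laptop", 1000),
    ],
)

# Grouping by SKU and name: SKU 1 appears twice because its names differ.
by_sku_and_name = conn.execute(
    "SELECT product_sku, product_name, SUM(pageviews) "
    "FROM product_reporting "
    "GROUP BY product_sku, product_name "
    "ORDER BY product_sku, product_name"
).fetchall()

# Grouping by SKU only: exactly one row per SKU.
by_sku = conn.execute(
    "SELECT product_sku, SUM(pageviews) "
    "FROM product_reporting "
    "GROUP BY product_sku "
    "ORDER BY product_sku"
).fetchall()

print(by_sku_and_name)
print(by_sku)
```

If the descriptive columns are still needed in the output, one common option is to aggregate by SKU first and join the descriptive columns back afterwards.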

Related

How to select the latest date for each group by number?

I've been stuck on this question for a while, and I was wondering if the community could point me in the right direction.
I have some tag IDs that need to be grouped, with exceptions (column: deleted) that need to be retained in the results. Then, for each grouped tag ID, I need to select the row with the latest date. How can I do this? An example below:
ID | TAG_ID | DATE | DELETED
1 | 300 | 05/01/20 | null
2 | 300 | 03/01/20 | 04/01/20
3 | 400 | 06/01/20 | null
4 | 400 | 05/01/20 | null
5 | 400 | 04/01/20 | null
6 | 500 | 03/01/20 | null
7 | 500 | 02/01/20 | null
I am trying to reach this outcome:
ID | TAG_ID | DATE | DELETED
1 | 300 | 05/01/20 | null
2 | 300 | 03/01/20 | 04/01/20
3 | 400 | 06/01/20 | null
6 | 500 | 03/01/20 | null
So, firstly if there is a date in the "DELETED" column, I would like the row to be present. Secondly, for each unique tag ID, I would like the row with the latest "DATE" to be present.
Hopefully this question is clear. Would appreciate your feedback and help! A big thanks in advance.

Your results seem to be something like this:
select t.*
from (select t.*,
row_number() over (partition by tag_id, deleted order by date desc) as seqnum
from t
) t
where seqnum = 1 or deleted is not null;
This takes one row where deleted is null -- the most recent row. It also keeps each row where deleted is not null.

You need 2 conditions combined with OR in the WHERE clause:
the 1st is that deleted is not null, or
the 2nd is that there is no other row with the same tag_id and a later date than the current row's date, meaning that the current row's date is the latest:
select t.* from tablename t
where t.deleted is not null
or not exists (
select 1 from tablename
where tag_id = t.tag_id and date > t.date
)
Results:
| id | tag_id | date | deleted |
| --- | ------ | ---------- | -------- |
| 1 | 300 | 2020-05-01 | |
| 2 | 300 | 2020-03-01 | 04/01/20 |
| 3 | 400 | 2020-06-01 | |
| 6 | 500 | 2020-03-01 | |
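
Both answers can be checked against the sample rows. This is a small sketch using Python's built-in sqlite3 module (window functions need SQLite 3.25 or newer); the table name t, the ISO date strings, and the reading of the sample dates as month/day/year are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE t (id INTEGER, tag_id INTEGER, date TEXT, deleted TEXT)"
)
# Sample rows from the question, with dates rewritten as ISO strings
# (assumed month/day/year) so that string comparison orders them correctly.
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?, ?)",
    [
        (1, 300, "2020-05-01", None),
        (2, 300, "2020-03-01", "2020-04-01"),
        (3, 400, "2020-06-01", None),
        (4, 400, "2020-05-01", None),
        (5, 400, "2020-04-01", None),
        (6, 500, "2020-03-01", None),
        (7, 500, "2020-02-01", None),
    ],
)

# Approach 1: number the rows per (tag_id, deleted), newest first.
window_ids = [row[0] for row in conn.execute("""
    SELECT id
    FROM (SELECT t.*,
                 ROW_NUMBER() OVER (PARTITION BY tag_id, deleted
                                    ORDER BY date DESC) AS seqnum
          FROM t) AS ranked
    WHERE seqnum = 1 OR deleted IS NOT NULL
    ORDER BY id
""")]

# Approach 2: keep deleted rows, plus rows with no newer row for the tag.
not_exists_ids = [row[0] for row in conn.execute("""
    SELECT id
    FROM t
    WHERE deleted IS NOT NULL
       OR NOT EXISTS (SELECT 1
                      FROM t AS newer
                      WHERE newer.tag_id = t.tag_id
                        AND newer.date > t.date)
    ORDER BY id
""")]

print(window_ids)
print(not_exists_ids)
```

On this sample both queries return the same rows. They can differ on other data: the window version keeps the latest non-deleted row of each tag even when a deleted row of that tag has a later date, while the NOT EXISTS version compares against every row of the tag.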

SQL group column where other column is equal

I'm trying to select some information from a database.
I have a table with columns like:
Ident,Name,Length,Width,Quantity,Planned
The table data is as follows:
+-----------+-----------+---------+---------+------------+---------+
| Ident | Name | Length | Width | Quantity | Planned |
+-----------+-----------+---------+---------+------------+---------+
| 12345 | Name1 | 1500 | 1000 | 20 | 5 |
| 23456 | Name1 | 1500 | 1000 | 30 | 13 |
| 34567 | Name1 | 2500 | 1000 | 10 | 2 |
| 45678 | Name1 | 2500 | 1000 | 10 | 4 |
| 56789 | Name1 | 1500 | 1200 | 20 | 3 |
+-----------+-----------+---------+---------+------------+---------+
My desired result would be to group rows where Name, Length and Width are equal, sum the Quantity, and reduce it by the sum of Planned,
e.g:
- Name1,1500,1000,32 --- (32 because (20+30)-(5+13))
- Name1,2500,1000,14 --- (14 because (10+10)-(2+4)))
- Name1,1500,1200,17
Now I have trouble working out how to group or join this information to get the desired select. Maybe some of you can help me. If further information is required, please write it in a comment.

You can achieve this by grouping the table and subtracting the sum of Planned from the sum of Quantity.
select
Name
,Length
,Width
,sum(Quantity) - sum(Planned)
from yourTable
group by Name,Length,Width
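
As a quick check of the grouped query against the sample rows, here is a sketch using Python's built-in sqlite3 module. The column alias Available and the ORDER BY are additions for the example, not part of the answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE yourTable (Ident INTEGER, Name TEXT, Length INTEGER, "
    "Width INTEGER, Quantity INTEGER, Planned INTEGER)"
)
# Sample rows from the question.
conn.executemany(
    "INSERT INTO yourTable VALUES (?, ?, ?, ?, ?, ?)",
    [
        (12345, "Name1", 1500, 1000, 20, 5),
        (23456, "Name1", 1500, 1000, 30, 13),
        (34567, "Name1", 2500, 1000, 10, 2),
        (45678, "Name1", 2500, 1000, 10, 4),
        (56789, "Name1", 1500, 1200, 20, 3),
    ],
)

# One row per (Name, Length, Width); "Available" is a hypothetical alias.
result = conn.execute("""
    SELECT Name, Length, Width,
           SUM(Quantity) - SUM(Planned) AS Available
    FROM yourTable
    GROUP BY Name, Length, Width
    ORDER BY Length, Width
""").fetchall()

for row in result:
    print(row)
```

The three groups come out as 32, 17 and 14, matching the values listed in the question.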

select
A1.Name,A1.Length,A1.Width,((A1.Quantity + A2.Quantity) -(A1.Planned+A2.Planned))
from `Table` AS A1, `Table` AS A2
where A1.Name = A2.Name and A1.Length = A2.Length and A1.Width = A2.Width
group by (whatever)
So you are comparing these columns from the same table?

SQL - SELECT all households by last value

I'm facing a problem that I can't wrap my head around, so maybe you can help me solve it.
I have one table:
id | datetime | property | house_id | household_id | plug_id | value
---+--------------------+----------+----------+--------------+---------+--------
1 |2013-08-31 22:00:01 | 0 | 1 | 1 | 1 | 15
2 |2013-08-31 22:00:01 | 0 | 1 | 1 | 3 | 3
3 |2013-08-31 22:00:01 | 0 | 1 | 2 | 1 | 21
4 |2013-08-31 22:00:01 | 0 | 1 | 2 | 2 | 1
5 |2013-08-31 22:00:01 | 0 | 2 | 1 | 3 | 53
6 |2013-08-31 22:00:02 | 0 | 2 | 2 | 4 | 34
7 |2013-08-31 22:00:02 | 0 | 1 | 1 | 1 | 16
...
The table holds electricity consumption measurements per second for multiple houses that have multiple households (apartments) in them. Each household has multiple electricity plugs. None of the houses or households have a unique id but are identified by a combination of house_id and household_id.
1) I need a SQL query that can give me a list of all the unique households.
2) I want to use the list from 1) to create a SQL query that gives me a list of the highest value for each household (the value is cumulative, so the latest datetime holds the highest value). I need a total value (SUM) for each household (sum of all the plugs in that household), i.e. a list of households with their total electricity consumption.
Is this even possible? I'm using SQL Server 2012 and the table has 100.000.000 rows.

If I understand correctly, you want the sum of the highest values of value, for house/household/plug combinations. This may do what you want:
select house_id, household_id, sum(maxvalue)
from (select house_id, household_id, plug_id, max(value) as maxvalue
from consumption
group by house_id, household_id, plug_id
) c
group by house_id, household_id;

According to your description, I think you can use this query:
select house_id,household_id, max(value), sum(value) from your_table_name group by house_id,household_id

SQL Group By Having Where Statements

I have an MS Access table tracking quantities of products at month end, as below.
I need to generate the latest quantity for a specified ProductId at a specified date e.g.
The Quantity for ProductId 1 on 15-Feb-12 is 100, The Quantity for ProductId 1 on 15-Mar-12 is 150.
ProductId | ReportingDate | Quantity|
1 | 31-Jan-12 | 100 |
2 | 31-Jan-12 | 200 |
1 | 28-Feb-12 | 150 |
2 | 28-Feb-12 | 250 |
1 | 31-Mar-12 | 180 |
2 | 31-Mar-12 | 280 |
My SQL statement below brings back all previous values instead of only the latest one. Could anyone help me troubleshoot the query?
SELECT Sheet1.ProductId, Max(Sheet1.ReportingDate) AS MaxOfReportingDate, Sheet1.Quantity
FROM Sheet1
GROUP BY Sheet1.ProductId, Sheet1.Quantity, Sheet1.ReportingDate, Sheet1.ProductId
HAVING (((Sheet1.ReportingDate)<#3/15/2012#) AND ((Sheet1.ProductId)=1))

Here's @naveen's idea:
SELECT TOP 1 Sheet1.ProductId, Sheet1.ReportingDate AS MaxOfReportingDate, Sheet1.Quantity
FROM Sheet1
WHERE (Sheet1.ProductId = 1)
AND (Sheet1.ReportingDate < #2012/03/15#)
ORDER BY Sheet1.ReportingDate DESC
Note, though, that MS Access selects TOP with ties, so this won't return a single row if you have more than one row per ReportingDate, ProductId combination. (At the same time, that would mean the data isn't deterministic anyway.)
Edit - I meant that if you have a contradiction in your data like the one below, you'll get 2 rows back.
ProductId | ReportingDate | Quantity|
1 | 31-Jan-12 | 100
1 | 31-Jan-12 | 200
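
The same "latest row before a date" idea can be tried outside Access. This sketch uses Python's built-in sqlite3 module, so LIMIT 1 stands in for Access's TOP 1 and the dates are stored as ISO strings; the helper name latest_quantity is hypothetical. Unlike Access, LIMIT 1 returns a single row even when there are ties.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Sheet1 (ProductId INTEGER, ReportingDate TEXT, Quantity INTEGER)"
)
# Sample rows from the question, with dates rewritten as ISO strings.
conn.executemany(
    "INSERT INTO Sheet1 VALUES (?, ?, ?)",
    [
        (1, "2012-01-31", 100),
        (2, "2012-01-31", 200),
        (1, "2012-02-28", 150),
        (2, "2012-02-28", 250),
        (1, "2012-03-31", 180),
        (2, "2012-03-31", 280),
    ],
)


def latest_quantity(product_id, as_of):
    """Return the most recent row reported before as_of, or None."""
    return conn.execute(
        "SELECT ProductId, ReportingDate, Quantity "
        "FROM Sheet1 "
        "WHERE ProductId = ? AND ReportingDate < ? "
        "ORDER BY ReportingDate DESC "
        "LIMIT 1",  # SQLite counterpart of TOP 1, without ties
        (product_id, as_of),
    ).fetchone()


print(latest_quantity(1, "2012-02-15"))
print(latest_quantity(1, "2012-03-15"))
```

The two calls reproduce the values stated in the question: 100 as of 15-Feb-12 and 150 as of 15-Mar-12.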

SQL Combine two tables with two parameters

I searched the forum for an hour and didn't find anything similar.
I have this problem: I want to compare two columns, ID and DATE. If they are the same in both tables, I want to put the number from table 2 next to the row. If they are not the same, I want to fill in the yearly quota that applies on that date. I am working in Access.
table1
id|date|state_on_date
1|30.12.2013|23
1|31.12.2013|25
1|1.1.2014|35
1|2.1.2014|12
2|30.12.2013|34
2|31.12.2013|65
2|1.1.2014|43
table2
id|date|year_quantity
1|31.12.2013|100
1|31.12.2014|150
2|31.12.2013|200
2|31.12.2014|300
I want to get:
table 3
id|date|state_on_date|year_quantity
1|30.12.2013|23|100
1|31.12.2013|25|100
1|1.1.2014|35|150
1|2.1.2014|12|150
2|30.12.2013|34|200
2|31.12.2013|65|200
2|1.1.2014|43|300
I tried joins and read the forums but didn't find a solution.

Are you looking for this?
SELECT id, date, state_on_date,
(
SELECT TOP 1 year_quantity
FROM table2
WHERE id = t.id
AND date >= t.date
ORDER BY date
) AS year_quantity
FROM table1 t
Output:
| ID | DATE | STATE_ON_DATE | YEAR_QUANTITY |
|----|------------|---------------|---------------|
| 1 | 2013-12-30 | 23 | 100 |
| 1 | 2013-12-31 | 25 | 100 |
| 1 | 2014-01-01 | 35 | 150 |
| 1 | 2014-01-02 | 12 | 150 |
| 2 | 2013-12-30 | 34 | 200 |
| 2 | 2013-12-31 | 65 | 200 |
| 2 | 2014-01-01 | 43 | 300 |
Here is a SQLFiddle demo. It's for SQL Server but should work just fine in MS Access.
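
The correlated subquery can also be checked with the sample tables. This is a sketch using Python's built-in sqlite3 module, where LIMIT 1 replaces TOP 1 and the dates are stored as ISO strings so that comparison and ordering work on text; the alias q for table2 is an addition for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (id INTEGER, date TEXT, state_on_date INTEGER)")
conn.execute("CREATE TABLE table2 (id INTEGER, date TEXT, year_quantity INTEGER)")
# Sample rows from the question, with dates rewritten as ISO strings.
conn.executemany(
    "INSERT INTO table1 VALUES (?, ?, ?)",
    [
        (1, "2013-12-30", 23),
        (1, "2013-12-31", 25),
        (1, "2014-01-01", 35),
        (1, "2014-01-02", 12),
        (2, "2013-12-30", 34),
        (2, "2013-12-31", 65),
        (2, "2014-01-01", 43),
    ],
)
conn.executemany(
    "INSERT INTO table2 VALUES (?, ?, ?)",
    [
        (1, "2013-12-31", 100),
        (1, "2014-12-31", 150),
        (2, "2013-12-31", 200),
        (2, "2014-12-31", 300),
    ],
)

# For each table1 row, pick the first table2 quota dated on or after it.
rows = conn.execute("""
    SELECT t.id, t.date, t.state_on_date,
           (SELECT q.year_quantity
            FROM table2 AS q
            WHERE q.id = t.id AND q.date >= t.date
            ORDER BY q.date
            LIMIT 1) AS year_quantity
    FROM table1 AS t
    ORDER BY t.id, t.date
""").fetchall()

for row in rows:
    print(row)
```

A table1 row dated after the last quota date for its id would get NULL in year_quantity, since the subquery finds no matching row.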
