How to aggregate based on various conditions


Let's say I have a table that stores ItemID, Date and Total_shipped over a period of time:
ItemID | Date | Total_shipped
__________________________________
1 | 1/20/2000 | 2
2 | 1/20/2000 | 3
1 | 1/21/2000 | 5
2 | 1/21/2000 | 4
1 | 1/22/2000 | 1
2 | 1/22/2000 | 7
1 | 1/23/2000 | 5
2 | 1/23/2000 | 6
Now I want to aggregate based on several periods of time. For example, I want to know how many of each item was shipped every two days and in total. So the desired output should look something like:
ItemID | Jan20-Jan21 | Jan22-Jan23 | Jan20-Jan23
_____________________________________________
1 | 7 | 6 | 13
2 | 7 | 13 | 20
How do I do that in the most efficient way?
I know I can write three different subqueries, but I think there should be a better way. My real data is large and there are several different time periods to consider, i.e., in my real problem I want the shipped items for current_week, last_week, two_weeks_ago, three_weeks_ago, last_month, two_months_ago and three_months_ago, so I do not think writing 7 different subqueries would be a good idea.
Here is the general idea of what I can already run, but it is very expensive for the database:
WITH
sq1 AS ( -- Jan 20 to Jan 21
SELECT ItemID, SUM(Total_shipped) AS sum1
FROM my_table
WHERE Date BETWEEN '1/20/2000' AND '1/21/2000'
GROUP BY ItemID),
sq2 AS ( -- Jan 22 to Jan 23
SELECT ItemID, SUM(Total_shipped) AS sum2
FROM my_table
WHERE Date BETWEEN '1/22/2000' AND '1/23/2000'
GROUP BY ItemID),
sq3 AS ( -- all dates
SELECT ItemID, SUM(Total_shipped) AS sum3
FROM my_table
GROUP BY ItemID)
-- start from sq3 (one row per item); joining the base table would repeat each item once per date
SELECT sq3.ItemID, sq1.sum1, sq2.sum2, sq3.sum3
FROM sq3
LEFT JOIN sq1 ON sq3.ItemID = sq1.ItemID
LEFT JOIN sq2 ON sq3.ItemID = sq2.ItemID

I don't know why you have tagged this question with multiple databases.
Anyway, you can use conditional aggregation as follows in Oracle:
select
item_id,
sum(case when "date" between date'2000-01-20' and date'2000-01-21' then total_shipped end) as "Jan20-Jan21",
sum(case when "date" between date'2000-01-22' and date'2000-01-23' then total_shipped end) as "Jan22-Jan23",
sum(case when "date" between date'2000-01-20' and date'2000-01-23' then total_shipped end) as "Jan20-Jan23"
from my_table
group by item_id
Cheers!!
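To check the conditional-aggregation idea end to end, here is a small sketch that runs the same SUM(CASE ...) pattern against the question's sample rows using Python's built-in sqlite3 module rather than Oracle. It assumes the columns are named item_id, ship_date and total_shipped (ship_date is a hypothetical rename of Date, which is reserved in several databases) and that dates are stored as ISO-8601 text so BETWEEN compares them correctly.

```python
import sqlite3

# Sample rows from the question, with dates stored as ISO-8601 text (assumption).
ROWS = [
    (1, "2000-01-20", 2), (2, "2000-01-20", 3),
    (1, "2000-01-21", 5), (2, "2000-01-21", 4),
    (1, "2000-01-22", 1), (2, "2000-01-22", 7),
    (1, "2000-01-23", 5), (2, "2000-01-23", 6),
]

# One pass over the table: each CASE routes a row's quantity into the period
# columns it belongs to; rows outside a period yield NULL, which SUM ignores.
QUERY = """
SELECT item_id,
       SUM(CASE WHEN ship_date BETWEEN '2000-01-20' AND '2000-01-21'
                THEN total_shipped END) AS jan20_jan21,
       SUM(CASE WHEN ship_date BETWEEN '2000-01-22' AND '2000-01-23'
                THEN total_shipped END) AS jan22_jan23,
       SUM(CASE WHEN ship_date BETWEEN '2000-01-20' AND '2000-01-23'
                THEN total_shipped END) AS jan20_jan23
FROM my_table
GROUP BY item_id
ORDER BY item_id
"""

def period_totals():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE my_table (item_id INTEGER, ship_date TEXT, total_shipped INTEGER)")
    conn.executemany("INSERT INTO my_table VALUES (?, ?, ?)", ROWS)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result

if __name__ == "__main__":
    for row in period_totals():
        print(row)
```

Unlike the three-CTE version in the question, this reads the table once and needs no joins.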

Use FILTER (standard SQL, supported by PostgreSQL):
select
item_id,
sum(total_shipped) filter (where date between '2000-01-20' and '2000-01-21') as "Jan20-Jan21",
sum(total_shipped) filter (where date between '2000-01-22' and '2000-01-23') as "Jan22-Jan23",
sum(total_shipped) filter (where date between '2000-01-20' and '2000-01-23') as "Jan20-Jan23"
from my_table
group by 1
item_id | Jan20-Jan21 | Jan22-Jan23 | Jan20-Jan23
---------+-------------+-------------+-------------
1 | 7 | 6 | 13
2 | 7 | 13 | 20
(2 rows)
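FILTER (WHERE ...) is shorthand for the same per-column condition that SUM(CASE ...) expresses, so either form works. The question's real requirement is several relative periods (current_week, last_week, ...), and those columns can be generated rather than hand-written. The sketch below does this with Python's sqlite3 and uses CASE for portability. The period names, their day offsets, the reference date, and the item_id/ship_date/total_shipped column names are all hypothetical; adjust them to your own calendar rules.

```python
import sqlite3
from datetime import date, timedelta

# Hypothetical periods: name -> (days back to start, days back to end),
# both inclusive and relative to a reference date.
PERIODS = {
    "current_week": (6, 0),
    "last_week": (13, 7),
    "two_weeks_ago": (20, 14),
}

def build_query(ref, periods):
    """Return (sql, params) with one conditional SUM column per period."""
    cols, params = [], []
    for name, (start_back, end_back) in periods.items():
        # Column names come from the trusted PERIODS dict, never from user input.
        cols.append(f"SUM(CASE WHEN ship_date BETWEEN ? AND ? "
                    f"THEN total_shipped ELSE 0 END) AS {name}")
        params += [(ref - timedelta(days=start_back)).isoformat(),
                   (ref - timedelta(days=end_back)).isoformat()]
    # Restrict the single scan to the overall window covered by all periods.
    oldest = max(start for start, _ in periods.values())
    params += [(ref - timedelta(days=oldest)).isoformat(), ref.isoformat()]
    sql = ("SELECT item_id, " + ", ".join(cols) +
           " FROM my_table WHERE ship_date BETWEEN ? AND ?"
           " GROUP BY item_id ORDER BY item_id")
    return sql, params

def period_totals(rows, ref, periods=PERIODS):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE my_table (item_id INTEGER, ship_date TEXT, total_shipped INTEGER)")
    conn.executemany("INSERT INTO my_table VALUES (?, ?, ?)", rows)
    sql, params = build_query(ref, periods)
    result = conn.execute(sql, params).fetchall()
    conn.close()
    return result

if __name__ == "__main__":
    sample = [(1, "2000-01-20", 2), (1, "2000-01-10", 5), (1, "2000-01-03", 1),
              (1, "1999-12-20", 100), (2, "2000-01-21", 4), (2, "2000-01-14", 3)]
    print(period_totals(sample, date(2000, 1, 21)))
```

Adding a seventh period is then one more dictionary entry instead of another subquery. The outer WHERE bound can also let an index on the date column limit how much of a large table is read.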

Related

Stop SQL Select After Sum Reached

My database is Db2 for IBM i.
I have read-only access, so my query must use only basic SQL select commands.
==============================================================
Goal:
I want to select every record in the table until the sum of the amount column exceeds the predetermined limit.
Example:
I want to match every item down the table until the sum of matched values in the "price" column >= $9.00.
The desired result:
Is this possible?

You can use the SUM analytic (window) function to calculate a running total of price and then filter on its value:
with a as (
select
t.*,
sum(price) over(order by salesid asc) as price_rsum
from t
)
select *
from a
where price_rsum <= 9
SALESID | PRICE | PRICE_RSUM
------: | ----: | ---------:
1001 | 5 | 5
1002 | 3 | 8
1003 | 1 | 9
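One detail worth checking: `price_rsum <= 9` drops the row that pushes the total past the limit, while the question says "until the sum ... >= $9.00". The following sketch uses Python's sqlite3 to compare that filter with a variant that keeps the crossing row by testing the total *before* each row (`price_rsum - price`). The first three sales rows mirror the output above. Rows 1004 and 1005 and the `rows_up_to` helper are hypothetical.

```python
import sqlite3

# Hypothetical sales rows; the first three match the answer's output.
SALES = [(1001, 5), (1002, 3), (1003, 1), (1004, 4), (1005, 2)]

def rows_up_to(limit, include_crossing=False):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (salesid INTEGER PRIMARY KEY, price INTEGER)")
    conn.executemany("INSERT INTO t VALUES (?, ?)", SALES)
    # price_rsum - price is the running total *before* the current row, so the
    # second condition also keeps the row that first reaches or passes the limit.
    condition = "price_rsum - price < ?" if include_crossing else "price_rsum <= ?"
    sql = f"""
        WITH a AS (
            SELECT t.*, SUM(price) OVER (ORDER BY salesid) AS price_rsum
            FROM t
        )
        SELECT salesid, price, price_rsum FROM a
        WHERE {condition}
        ORDER BY salesid
    """
    result = conn.execute(sql, (limit,)).fetchall()
    conn.close()
    return result

if __name__ == "__main__":
    print(rows_up_to(7))
    print(rows_up_to(7, include_crossing=True))
```

With a limit of 7, the plain filter returns only sale 1001. The crossing variant also returns 1002, whose running total of 8 is the first to pass the limit.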

Counting distinct stores SQL

I am fairly new to SQL and was wondering if anyone could help with my code.
I am trying to count the distinct number of stores that are tied to a certain Warehouse which is tied to a purchase order.
Example: if there are 100 stores with this PO that came from Warehouse #2, #5, etc., then I would like:
COUNT_STORE | WH_LOCATION
100 | 2
25 | 5
56 | 1
My Code:
select count(distinct Store_ID) as Count_Store, WH_Location
from alc_Loc
where alloc_PO = 11345
group by Store_ID, WH_Location
When I run this I get a 1 for "count_store" and it shows me the WH_Location multiple times. I feel as if something is not tying in correctly.
Any help is appreciated!

Just remove store_id from the group by:
select count(distinct Store_ID) as Count_Store, WH_Location
from alc_Loc
where alloc_PO = 11345
group by WH_Location;
When you include Store_ID in the group by, you get a separate row for each Store_ID/WH_Location combination. The distinct count for each row is then obviously 1 (or 0 if the store id is NULL).
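The effect of the extra GROUP BY column is easy to see on a few rows. This sketch uses Python's sqlite3 with hypothetical allocation data (table and column names follow the question, lower-cased). It runs the same COUNT(DISTINCT ...) query with both GROUP BY clauses. The `run` helper splices the GROUP BY text in directly, which is only acceptable because it is a fixed demo string.

```python
import sqlite3

# Hypothetical rows: (alloc_po, store_id, wh_location)
ROWS = [
    (11345, 1, 2), (11345, 2, 2), (11345, 2, 2), (11345, 3, 2),
    (11345, 4, 5), (11345, 5, 5),
    (99999, 6, 2),  # different PO, filtered out by the WHERE clause
]

def run(group_by):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE alc_loc (alloc_po INTEGER, store_id INTEGER, wh_location INTEGER)")
    conn.executemany("INSERT INTO alc_loc VALUES (?, ?, ?)", ROWS)
    sql = f"""
        SELECT COUNT(DISTINCT store_id) AS count_store, wh_location
        FROM alc_loc
        WHERE alloc_po = 11345
        GROUP BY {group_by}
        ORDER BY wh_location, count_store
    """
    result = conn.execute(sql).fetchall()
    conn.close()
    return result

if __name__ == "__main__":
    print(run("store_id, wh_location"))  # one row per store, each counting 1
    print(run("wh_location"))            # one row per warehouse
```

Grouping by store_id and wh_location gives five rows with a count of 1. Grouping by wh_location alone gives one row per warehouse: 3 stores for warehouse 2 and 2 for warehouse 5.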

PostgreSQL: How to select the first row of each group?

With this query:
WITH responsesNew AS
(
SELECT DISTINCT responses."studentId", notation, responses."givenHeart",
SUM(notation + responses."givenHeart") OVER (partition BY responses."studentId"
ORDER BY responses."createdAt") AS total, responses."createdAt"
FROM responses
)
SELECT responsesNew."studentId", notation, responsesNew."givenHeart", total,
responsesNew."createdAt"
FROM responsesNew
WHERE total = 3
GROUP BY responsesNew."studentId", notation, responsesNew."givenHeart", total,
responsesNew."createdAt"
ORDER BY responsesNew."studentId" ASC
I get this data table:
studentId | notation | givenHeart | total | createdAt |
----------+----------+------------+-------+--------------------+
374 | 1 | 0 | 3 | 2017-02-13 12:43:03
374 | null | 0 | 3 | 2017-02-15 22:22:17
639 | 1 | 2 | 3 | 2017-04-03 17:21:30
790 | 1 | 0 | 3 | 2017-02-12 21:12:23
...
My goal is to keep only the earliest row of each group in my data table, as shown below:
studentId | notation | givenHeart | total | createdAt |
----------+----------+------------+-------+--------------------+
374 | 1 | 0 | 3 | 2017-02-13 12:43:03
639 | 1 | 2 | 3 | 2017-04-03 17:21:30
790 | 1 | 0 | 3 | 2017-02-12 21:12:23
...
How can I get there?
I've read many topics here, but nothing I've tried with DISTINCT, DISTINCT ON, subqueries in WHERE, LIMIT, etc. has worked for me (surely due to my poor understanding). I've run into errors related to window functions, a missing column in ORDER BY, and a few others I can't remember.

You can do this with distinct on. The query would look like this:
WITH responsesNew AS (
SELECT DISTINCT r."studentId", notation, r."givenHeart",
SUM(notation + r."givenHeart") OVER (partition BY r."studentId"
ORDER BY r."createdAt") AS total,
r."createdAt"
FROM responses r
)
SELECT DISTINCT ON (r."studentId") r."studentId", notation, r."givenHeart", total,
r."createdAt"
FROM responsesNew r
WHERE total = 3
ORDER BY r."studentId" ASC, r."createdAt";
I'm pretty sure this can be simplified. I just don't understand the purpose of the CTE. Using SELECT DISTINCT in this way is very curious.
If you want a simplified query, ask another question with sample data, desired results, and explanation of what you are doing and include the query or a link to this question.

Use the ROW_NUMBER() window function to number the rows in each partition and then keep only row 1.
There is no need to fully qualify names if only one table is involved, and use aliases when qualifying to improve readability.
WITH responsesNew AS
(
SELECT "studentId"
, notation
, "givenHeart"
, SUM(notation + "givenHeart") OVER (PARTITION BY "studentId" ORDER BY "createdAt") AS total
, "createdAt"
FROM responses
),
numbered AS
(
-- number only the rows where total = 3, earliest first within each student
SELECT RN.*
, Row_number() OVER (PARTITION BY RN."studentId" ORDER BY RN."createdAt") AS RNum
FROM responsesNew RN
WHERE RN.total = 3
)
SELECT N."studentId"
, N.notation
, N."givenHeart"
, N.total
, N."createdAt"
FROM numbered N
WHERE N.RNum = 1
ORDER BY N."studentId" ASC
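SQLite has no DISTINCT ON, so the sketch below runs only the ROW_NUMBER() approach, using Python's sqlite3. The responses rows are hypothetical. They were chosen so that the running totals reproduce the question's output, including the NULL notation row, which SUM skips and which therefore keeps a total of 3. Note that the total = 3 filter is applied *before* numbering. Numbering all rows first and then requiring both RNum = 1 and total = 3 would only match students whose very first response already totals 3.

```python
import sqlite3

# Hypothetical responses: (studentId, notation, givenHeart, createdAt)
RESPONSES = [
    (374, 1, 1, "2017-02-10 09:00:00"),
    (374, 1, 0, "2017-02-13 12:43:03"),
    (374, None, 0, "2017-02-15 22:22:17"),
    (639, 1, 2, "2017-04-03 17:21:30"),
    (790, 1, 1, "2017-02-11 08:00:00"),
    (790, 1, 0, "2017-02-12 21:12:23"),
]

QUERY = """
WITH totals AS (
    SELECT "studentId", notation, "givenHeart", "createdAt",
           SUM(notation + "givenHeart") OVER (
               PARTITION BY "studentId" ORDER BY "createdAt") AS total
    FROM responses
),
numbered AS (
    -- number only the qualifying rows, earliest first per student
    SELECT totals.*,
           ROW_NUMBER() OVER (
               PARTITION BY "studentId" ORDER BY "createdAt") AS rnum
    FROM totals
    WHERE total = 3
)
SELECT "studentId", notation, "givenHeart", total, "createdAt"
FROM numbered
WHERE rnum = 1
ORDER BY "studentId"
"""

def first_rows():
    conn = sqlite3.connect(":memory:")
    conn.execute('CREATE TABLE responses ("studentId" INTEGER, notation INTEGER, '
                 '"givenHeart" INTEGER, "createdAt" TEXT)')
    conn.executemany("INSERT INTO responses VALUES (?, ?, ?, ?)", RESPONSES)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result

if __name__ == "__main__":
    for row in first_rows():
        print(row)
```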

How to get daily profit from an SQL table

I'm stuck on the problem of finding daily profits from a database (MS Access) table. The difference from other tips I found online is that my table doesn't have one field for "Price" and another for "Cost"; instead, a "Type" field distinguishes whether a row is a revenue ("S") or a cost ("C").
This is the table "Record":
| Date | Price | Quantity | Type |
-----------------------------------
|01/02 | 20 | 2 | C |
|01/02 | 10 | 1 | S |
|01/02 | 3 | 10 | S |
|01/02 | 5 | 2 | C |
|03/04 | 12 | 3 | C |
|03/03 | 200 | 1 | S |
|03/03 | 120 | 2 | C |
So far I have tried different solutions, such as:
SELECT
(SELECT SUM (RS.Price* RS.Quantity)
FROM Record RS WHERE RS.Type='S' GROUP BY RS.Date
) as totalSales,
(SELECT SUM (RC.Price*RC.Quantity)
FROM Record RC WHERE RC.Type='C' GROUP BY RC.Date
) as totalLosses,
ROUND(totalSales-totalLosses,2) as NetTotal,
R.Date
FROM RECORD R
In my mind it could work, but obviously it doesn't,
and
SELECT RC.Date, ROUND(SUM (RC.Price*RC.Quantity),2) as DailyLoss
INTO #DailyLosses
FROM Record RC
WHERE RC.Type='C' GROUP BY RC.Date
SELECT RS.Date, ROUND(SUM (RS.Price*RS.Quantity),2) as DailyRevenue
INTO #DailyRevenues
FROM Record RS
WHERE RS.Type='S' GROUP BY RS.Date
SELECT Date, DailyRevenue - DailyLoss as DailyProfit
FROM #DailyLosses dlos, #DailyRevenues drev
WHERE dlos.Date = drev.Date
Beyond the correct syntax, my problem is the general approach to this kind of problem.

You can use grouping and conditional summing. Try this:
SELECT data.[Date], data.Income - data.Cost as Profit
FROM (
SELECT Record.[Date] as [Date],
SUM(IIF(Record.Type = 'S', Record.Price * Record.Quantity, 0)) as Income,
SUM(IIF(Record.Type = 'C', Record.Price * Record.Quantity, 0)) as Cost
FROM Record
GROUP BY Record.[Date]
) data
In this case you first create a sub-query to get separate fields for Income and Cost, and then your outer query uses subtraction to get actual profit.
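Here is the same two-level query run against the question's "Record" rows, using Python's sqlite3 as a stand-in for Access. The sketch swaps IIF for CASE WHEN, which SQLite has always supported, while Access SQL uses IIF instead. The alias d and the helper daily_profit are hypothetical.

```python
import sqlite3

# Rows from the question's "Record" table: (Date, Price, Quantity, Type)
RECORDS = [
    ("01/02", 20, 2, "C"), ("01/02", 10, 1, "S"), ("01/02", 3, 10, "S"),
    ("01/02", 5, 2, "C"), ("03/04", 12, 3, "C"), ("03/03", 200, 1, "S"),
    ("03/03", 120, 2, "C"),
]

# Inner query: one row per day with income and cost as separate columns.
# Outer query: subtract them to get the daily profit.
QUERY = """
SELECT d, income - cost AS profit
FROM (
    SELECT "Date" AS d,
           SUM(CASE WHEN Type = 'S' THEN Price * Quantity ELSE 0 END) AS income,
           SUM(CASE WHEN Type = 'C' THEN Price * Quantity ELSE 0 END) AS cost
    FROM Record
    GROUP BY "Date"
) AS daily
ORDER BY d
"""

def daily_profit():
    conn = sqlite3.connect(":memory:")
    conn.execute('CREATE TABLE Record ("Date" TEXT, Price INTEGER, Quantity INTEGER, Type TEXT)')
    conn.executemany("INSERT INTO Record VALUES (?, ?, ?, ?)", RECORDS)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result

if __name__ == "__main__":
    for day, profit in daily_profit():
        print(day, profit)
```

For 01/02, for example, income is 10×1 + 3×10 = 40 and cost is 20×2 + 5×2 = 50, so the profit is −10.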

SQL: Find rows where field value differs

I have a database table structured like this (irrelevant fields omitted for brevity):
rankings
------------------
(PK) indicator_id
(PK) alternative_id
(PK) analysis_id
rank
All fields are integers; the first three (labeled "(PK)") are a composite primary key. A given "analysis" has multiple "alternatives", each of which will have a "rank" for each of many "indicators".
I'm looking for an efficient way to compare an arbitrary number of analyses and find the alternative/indicator combinations whose ranks differ between them. So, for example, if we have this data:
analysis_id | alternative_id | indicator_id | rank
----------------------------------------------------
1 | 1 | 1 | 4
1 | 1 | 2 | 6
1 | 2 | 1 | 3
1 | 2 | 2 | 9
2 | 1 | 1 | 4
2 | 1 | 2 | 7
2 | 2 | 1 | 4
2 | 2 | 2 | 9
...then the ideal method would identify the following differences:
analysis_id | alternative_id | indicator_id | rank
----------------------------------------------------
1 | 1 | 2 | 6
2 | 1 | 2 | 7
1 | 2 | 1 | 3
2 | 2 | 1 | 4
I came up with a query that does what I want for 2 analysis IDs, but I'm having trouble generalizing it to find differences between an arbitrary number of analysis IDs (i.e. the user might want to compare 2, or 5, or 9, or whatever, and find any rows where at least one analysis differs from any of the others). My query is:
declare @analysisId1 int, @analysisId2 int;
select @analysisId1 = 1, @analysisId2 = 2;
select
r1.indicator_id,
r1.alternative_id,
r1.[rank] as Analysis1Rank,
r2.[rank] as Analysis2Rank
from rankings r1
inner join rankings r2
on r1.indicator_id = r2.indicator_id
and r1.alternative_id = r2.alternative_id
and r2.analysis_id = @analysisId2
where
r1.analysis_id = @analysisId1
and r1.[rank] != r2.[rank]
(It puts the analysis values into additional fields instead of rows. I think either way would work.)
How can I generalize this query to handle many analysis ids? (Or, alternatively, come up with a different, better query to do the job?) I'm using SQL Server 2005, in case it matters.
If necessary, I can always pull all the data out of the table and look for differences in code, but a SQL solution would be preferable since often I'll only care about a few rows out of thousands and there's no point in transferring them all if I can avoid it. (However, if you have a compelling reason not to do this in SQL, say so; I'd consider that a good answer too!)

This will return your desired data set. Now you just need a way to pass the required analysis IDs to the query, or potentially just filter this data inside your application.
select r.* from rankings r
inner join
(
select alternative_id, indicator_id
from rankings
group by alternative_id, indicator_id
having count(distinct rank) > 1
) differ on r.alternative_id = differ.alternative_id
and r.indicator_id = differ.indicator_id
order by r.alternative_id, r.indicator_id, r.analysis_id, r.rank
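Passing the analysis IDs to the query means applying the same IN list in two places. The inner GROUP BY must compare only the chosen analyses, and the outer query must return only their rows. Here is a sketch using Python's sqlite3 with one placeholder per ID. The table uses the question's sample rows plus a hypothetical analysis 3, and the differing_rows helper is illustrative, not part of the original answer.

```python
import sqlite3

# Question's rows plus a hypothetical analysis 3:
# (analysis_id, alternative_id, indicator_id, rank)
RANKINGS = [
    (1, 1, 1, 4), (1, 1, 2, 6), (1, 2, 1, 3), (1, 2, 2, 9),
    (2, 1, 1, 4), (2, 1, 2, 7), (2, 2, 1, 4), (2, 2, 2, 9),
    (3, 1, 1, 5), (3, 1, 2, 6), (3, 2, 1, 3), (3, 2, 2, 9),
]

def differing_rows(analysis_ids):
    ids = list(analysis_ids)
    marks = ", ".join("?" for _ in ids)  # one placeholder per requested id
    sql = f"""
        SELECT r.analysis_id, r.alternative_id, r.indicator_id, r."rank"
        FROM rankings r
        JOIN (
            SELECT alternative_id, indicator_id
            FROM rankings
            WHERE analysis_id IN ({marks})
            GROUP BY alternative_id, indicator_id
            HAVING COUNT(DISTINCT "rank") > 1
        ) differ
          ON r.alternative_id = differ.alternative_id
         AND r.indicator_id = differ.indicator_id
        WHERE r.analysis_id IN ({marks})
        ORDER BY r.alternative_id, r.indicator_id, r.analysis_id
    """
    conn = sqlite3.connect(":memory:")
    conn.execute('CREATE TABLE rankings (analysis_id INTEGER, alternative_id INTEGER, '
                 'indicator_id INTEGER, "rank" INTEGER, '
                 'PRIMARY KEY (indicator_id, alternative_id, analysis_id))')
    conn.executemany("INSERT INTO rankings VALUES (?, ?, ?, ?)", RANKINGS)
    # Inner IN list comes first in the SQL text, then the outer one.
    result = conn.execute(sql, ids + ids).fetchall()
    conn.close()
    return result

if __name__ == "__main__":
    print(differing_rows([1, 2]))
    print(differing_rows([1, 3]))
```

Comparing analyses 1 and 2 returns exactly the four rows listed in the question. Comparing 1 and 3 returns only alternative 1 / indicator 1, where their ranks are 4 and 5.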

I don't know which database you are using; in SQL Server I would do it like this:
-- STEP 1, create temporary table with all the alternative_id , indicator_id combinations with more than one rank:
select alternative_id , indicator_id
into #results
from rankings
group by alternative_id , indicator_id
having count (distinct rank)>1
-- STEP 2, retrieve the data
select a.* from rankings a, #results b
where a.alternative_id = b.alternative_id
and a.indicator_id = b.indicator_id
order by a.alternative_id, a.indicator_id, a.analysis_id
BTW, the other answers given here need count(distinct rank)!

I think this is what you're trying to do:
select
r.analysis_id,
r.alternative_id,
rm.indicator_id_max,
rm.rank_max
from rankings r
join (
select
analysis_id,
alternative_id,
max(indicator_id) as indicator_id_max,
max(rank) as rank_max
from rankings
group by analysis_id,
alternative_id
having count(*) > 1
) as rm
on r.analysis_id = rm.analysis_id
and r.alternative_id = rm.alternative_id

Your example differences seem wrong. You say you want analyses whose ranks for any alternative/indicator combination differ, but example rows 3 and 4 don't satisfy this criterion. A correct result according to your requirement is:
analysis_id | alternative_id | indicator_id | rank
----------------------------------------------------
1 | 1 | 2 | 6
2 | 1 | 2 | 7
1 | 2 | 1 | 3
2 | 2 | 1 | 4
One query you could try is this:
with distinct_ranks as (
select alternative_id
, indicator_id
, rank
, count (*) as count
from rankings
group by alternative_id
, indicator_id
, rank
having count(*) = 1)
select r.analysis_id
, r.alternative_id
, r.indicator_id
, r.rank
from rankings r
join distinct_ranks d on r.alternative_id = d.alternative_id
and r.indicator_id = d.indicator_id
and r.rank = d.rank
You have to realize that with multiple analyses your criterion is ambiguous. What if analyses 1, 2 and 3 have rank 1 and analyses 4, 5 and 6 have rank 2 for alternative/indicator 1/1? The set (1,2,3) is 'different' from the set (4,5,6), but inside each set there is no difference. What behavior do you want in that case: should they show up or not? My query finds all records that have a different rank for the same alternative/indicator from all other analyses, but it is not clear whether this matches your requirement.
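The ambiguity can be made concrete by running both interpretations on the case just described. The sketch uses Python's sqlite3 with hypothetical rows for alternative/indicator 1/1. "Any difference" is the COUNT(DISTINCT rank) > 1 approach from the other answers, and "unique rank" is the COUNT(*) = 1 query above.

```python
import sqlite3

# Hypothetical rows for alternative/indicator 1/1: analyses 1-3 agree on rank 1,
# analyses 4-6 agree on rank 2.
RANKINGS = [(a, 1, 1, 1) for a in (1, 2, 3)] + [(a, 1, 1, 2) for a in (4, 5, 6)]

ANY_DIFFERENCE = """
SELECT r.analysis_id FROM rankings r
JOIN (SELECT alternative_id, indicator_id FROM rankings
      GROUP BY alternative_id, indicator_id
      HAVING COUNT(DISTINCT "rank") > 1) d
  ON r.alternative_id = d.alternative_id AND r.indicator_id = d.indicator_id
ORDER BY r.analysis_id
"""

UNIQUE_RANK = """
WITH distinct_ranks AS (
    SELECT alternative_id, indicator_id, "rank"
    FROM rankings
    GROUP BY alternative_id, indicator_id, "rank"
    HAVING COUNT(*) = 1
)
SELECT r.analysis_id FROM rankings r
JOIN distinct_ranks d
  ON r.alternative_id = d.alternative_id
 AND r.indicator_id = d.indicator_id
 AND r."rank" = d."rank"
ORDER BY r.analysis_id
"""

def compare():
    conn = sqlite3.connect(":memory:")
    conn.execute('CREATE TABLE rankings (analysis_id INTEGER, alternative_id INTEGER, '
                 'indicator_id INTEGER, "rank" INTEGER)')
    conn.executemany("INSERT INTO rankings VALUES (?, ?, ?, ?)", RANKINGS)
    any_diff = [row[0] for row in conn.execute(ANY_DIFFERENCE)]
    unique = [row[0] for row in conn.execute(UNIQUE_RANK)]
    conn.close()
    return any_diff, unique

if __name__ == "__main__":
    print(compare())
```

The first interpretation reports all six analyses. The second reports none, because no analysis holds a rank that nobody else shares.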
