PostgreSQL: sum the quantity of child items

I have a subscription service that delivers many items.
Subscribers add items to a delivery by creating a row in delivery_items.
Until recently, subscribers could add only one of each item to a delivery, but I have now added a quantity column to my delivery_items table.
Given this schema and an outdated query (on SQL Fiddle), how can I select the total quantity of each item I will need for each day's deliveries?
This query produced a table of days and the items being delivered each day, but it doesn't account for quantity:
SELECT
d.date,
sum((di.item_id = 1)::int) as "Bread",
sum((di.item_id = 2)::int) as "Eggs",
sum((di.item_id = 3)::int) as "Coffee"
FROM deliveries d
JOIN users u ON u.id = d.user_id
JOIN delivery_items di ON di.delivery_id = d.id
GROUP BY d.date
ORDER BY d.date
Ideally, my query would be agnostic to the specifics of the items, like the id/name.
Thanks
Edit to add schema:
deliveries (TABLE)
id int4(10)
date timestamp(29)
user_id int4(10)
delivery_items (TABLE)
delivery_id int4(10)
item_id int4(10)
quantity int4(10)
items (TABLE)
id int4(10)
name varchar(10)
users (TABLE)
id int4(10)
name varchar(10)

You don't need to JOIN your users table, because you're neither getting any data from it nor using it as your joining condition.
A conditional sum() over di.quantity returns the quantity of each item that needs to be delivered on each date:
SELECT
d.date,
sum(CASE WHEN di.item_id = 1 THEN di.quantity ELSE 0 END) as "Bread",
sum(CASE WHEN di.item_id = 2 THEN di.quantity ELSE 0 END) as "Eggs",
sum(CASE WHEN di.item_id = 3 THEN di.quantity ELSE 0 END) as "Coffee"
FROM deliveries d
JOIN delivery_items di ON di.delivery_id = d.id
GROUP BY d.date
ORDER BY d.date
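A quick way to sanity-check the answer is to run the same conditional SUM against a throwaway database. This sketch uses Python's built-in sqlite3, with SQLite standing in for PostgreSQL and made-up rows (item ids 1, 2 and 3 assumed to be Bread, Eggs and Coffee). Summing di.quantity rather than counting rows is what makes a delivery of 12 eggs count as 12.

```python
import sqlite3

# In-memory stand-in for the question's schema (SQLite, not PostgreSQL).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE deliveries (id INTEGER PRIMARY KEY, date TEXT, user_id INTEGER);
CREATE TABLE delivery_items (delivery_id INTEGER, item_id INTEGER, quantity INTEGER);
INSERT INTO deliveries VALUES (1, '2024-01-01', 10), (2, '2024-01-01', 11), (3, '2024-01-02', 10);
-- Hypothetical sample data: 1 = Bread, 2 = Eggs, 3 = Coffee
INSERT INTO delivery_items VALUES (1, 1, 2), (1, 2, 12), (2, 1, 1), (3, 3, 1);
""")

# Same conditional SUM as the answer: each column only adds quantities for its item.
rows = conn.execute("""
SELECT d.date,
       SUM(CASE WHEN di.item_id = 1 THEN di.quantity ELSE 0 END) AS "Bread",
       SUM(CASE WHEN di.item_id = 2 THEN di.quantity ELSE 0 END) AS "Eggs",
       SUM(CASE WHEN di.item_id = 3 THEN di.quantity ELSE 0 END) AS "Coffee"
FROM deliveries d
JOIN delivery_items di ON di.delivery_id = d.id
GROUP BY d.date
ORDER BY d.date
""").fetchall()

for row in rows:
    print(row)
```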
You could also look into the crosstab(text, text) function from the tablefunc extension. The result would be the same, but the second argument lets you specify a query that produces the set of categories.
However, if you want the output to adapt automatically when rows are added to your items table, you would need to wrap this in a function that builds the output column names and types, because:
The crosstab function is declared to return setof record, so the actual names and types of the output columns must be defined in the FROM clause of the calling SELECT statement.
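If the columns have to follow the items table, one common alternative to a dynamic crosstab wrapper (a different technique from the one above) is to let the query return one row per date and item, which needs no knowledge of item ids or names, and do the pivot in application code. A minimal Python/SQLite sketch with made-up rows:

```python
import sqlite3
from collections import defaultdict

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE deliveries (id INTEGER PRIMARY KEY, date TEXT, user_id INTEGER);
CREATE TABLE delivery_items (delivery_id INTEGER, item_id INTEGER, quantity INTEGER);
CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT);
INSERT INTO items VALUES (1, 'Bread'), (2, 'Eggs'), (3, 'Coffee');
INSERT INTO deliveries VALUES (1, '2024-01-01', 10), (2, '2024-01-01', 11), (3, '2024-01-02', 10);
INSERT INTO delivery_items VALUES (1, 1, 2), (1, 2, 12), (2, 1, 1), (3, 3, 1);
""")

# One row per (date, item): no item ids or names are hard-coded in the SQL.
long_rows = conn.execute("""
SELECT d.date, i.name, SUM(di.quantity) AS total
FROM deliveries d
JOIN delivery_items di ON di.delivery_id = d.id
JOIN items i ON i.id = di.item_id
GROUP BY d.date, i.name
ORDER BY d.date, i.name
""").fetchall()

# Column list comes from the items table, so new items appear automatically.
item_names = [name for (name,) in conn.execute("SELECT name FROM items ORDER BY id")]

# Pivot in Python: every date gets every item, defaulting to 0.
pivot = defaultdict(lambda: dict.fromkeys(item_names, 0))
for day, name, total in long_rows:
    pivot[day][name] = total
```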


SQL Aggregating data based on condition containing the key fields for aggregation

I am new to SQL (Oracle SQL, if it makes a difference), but it so happens I have to use it. I need to aggregate data by some key fields (CustId, AppId). I also have AppDate, PDate and Amount columns.
[Image: initial data]
For each key field combination, I need to aggregate the data from other rows that meet the following conditions:
CustId = CustId, i.e. take only rows for this CustId.
AppId != AppId, i.e. take only rows for applications other than the current one.
AppDate >= PDate, i.e. take only information available at the time of application.
After a quick look at the SQL language, my approach was:
select CustId, AppId, Sum(case when
custid=custid and Appid!=Appid and AppDate >= PDate then Amount else 0 end) as SumAmount
From Table
Group by CustId, AppId
Unfortunately, SumAmount comes back as 0 for every row. My guess is that it is because of the last two conditions.
The results I want to get from the example table are:
[Image: expected results]
I would also probably add a condition that excludes from the aggregated amounts any other application whose AppDate is more than 6 months before the current AppDate.
P.S. I am really sorry for the substandard formatting and probably bad code; I am not very experienced with this.
Edit: I've found a solution as follows:
select distinct a.CustId, a.AppId, a.AppDate, b.PDate, b.Amount
from table a
inner join (select CustId, AppId, Amount, PDate from Table) b
on a.CustId = b.CustId and a.AppId != b.AppId
where a.AppDate >= b.PDate
After that I aggregate by AppId summing the amount.
Basically, I join the table to itself on these conditions, and since that produces a lot of exact duplicate rows, I remove them with DISTINCT.
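Why the first attempt returns 0: within a single row, Appid!=Appid compares the row's AppId with itself, so it is never true (and custid=custid is always true), which means the CASE always falls through to else 0. Conditions that relate one application to another need a second copy of the table, which is what the self-join provides. The sketch below does the join and the per-application sum in one query. It assumes each AppId has a single AppDate; deduplicating only the current-application side avoids a final DISTINCT, which would also merge two genuinely different rows of other applications that happen to share the same PDate and Amount. The LEFT JOIN keeps applications with nothing to sum (they get 0 instead of disappearing). SQLite stands in for Oracle, and the table name loans and the data are hypothetical.

```python
import sqlite3

# "loans" is a hypothetical name for the question's "Table"; the rows are made up.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE loans (CustId INTEGER, AppId INTEGER, AppDate TEXT, PDate TEXT, Amount INTEGER);
INSERT INTO loans VALUES
  (1, 10, '2020-01-10', '2020-01-05', 100),
  (1, 11, '2020-03-01', '2020-02-20', 200),
  (1, 12, '2020-05-01', '2020-04-15', 300),
  (2, 20, '2020-02-01', '2020-01-15', 50);
""")

# The original single-table attempt: AppId <> AppId compares a row with itself,
# so the CASE never takes the THEN branch and every sum is 0.
zeros = conn.execute("""
SELECT CustId, AppId,
       SUM(CASE WHEN CustId = CustId AND AppId <> AppId AND AppDate >= PDate
                THEN Amount ELSE 0 END) AS SumAmount
FROM loans
GROUP BY CustId, AppId
ORDER BY CustId, AppId
""").fetchall()

# Self-join: "a" is the current application (one row per CustId/AppId/AppDate,
# assuming each AppId has a single AppDate); "b" is every row of the same
# customer's other applications that was already known (PDate <= AppDate).
rows = conn.execute("""
SELECT a.CustId, a.AppId, COALESCE(SUM(b.Amount), 0) AS SumAmount
FROM (SELECT DISTINCT CustId, AppId, AppDate FROM loans) a
LEFT JOIN loans b
  ON  b.CustId = a.CustId
  AND b.AppId <> a.AppId
  AND b.PDate <= a.AppDate
GROUP BY a.CustId, a.AppId
ORDER BY a.CustId, a.AppId
""").fetchall()
```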

Does it matter to filter results when doing aggregation?

I want to get my sales for each day from my orders_summary table.
orders_summary table columns: id, date, amount, sku_id
products table columns: id, sku
Currently I'm getting my daily sales like this:
SELECT
MAX(CASE WHEN os.date = '01/01/2022' THEN COALESCE(amount,0)::INT ELSE 0 END) AS orders_1,
MAX(CASE WHEN os.date = '01/02/2022' THEN COALESCE(amount,0)::INT ELSE 0 END) AS orders_2
FROM products AS p
LEFT JOIN orders_summary AS os ON p.id = os.sku_id
WHERE p.id = '1'
GROUP BY p.id;
Is it important to add AND date BETWEEN '01/01/2022' AND '01/02/2022' to my WHERE clause?
Yes, absolutely. Imagine having 10 years' worth of data in the table when you're only interested in two days of it. A date filter restricts the number of rows (to about 0.05% in this case: 2 of roughly 3,650 days) before the GROUP BY runs, instead of aggregating every row and discarding almost all of it in the CASE expressions.
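One detail is worth checking before adding the filter: because products is LEFT JOINed to orders_summary, a condition on os.date in the WHERE clause removes every product that has no orders in the range (its os.date is NULL), so such a product returns no row rather than a row of zeros. Putting the same condition in the ON clause still limits which order rows are joined but keeps those products. A minimal SQLite sketch with made-up data, using ISO date strings instead of '01/01/2022':

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE products (id INTEGER PRIMARY KEY, sku TEXT);
CREATE TABLE orders_summary (id INTEGER PRIMARY KEY, date TEXT, amount INTEGER, sku_id INTEGER);
INSERT INTO products VALUES (1, 'A'), (2, 'B');
INSERT INTO orders_summary (date, amount, sku_id) VALUES
  ('2022-01-01', 5, 1), ('2022-01-02', 7, 1), ('2021-06-01', 9, 2);
""")

# str.format only splices fixed SQL fragments for this demo (no user input).
PIVOT = """
SELECT p.id,
       MAX(CASE WHEN os.date = '2022-01-01' THEN COALESCE(os.amount, 0) ELSE 0 END) AS orders_1,
       MAX(CASE WHEN os.date = '2022-01-02' THEN COALESCE(os.amount, 0) ELSE 0 END) AS orders_2
FROM products AS p
LEFT JOIN orders_summary AS os ON p.id = os.sku_id {on_filter}
{where_filter}
GROUP BY p.id
ORDER BY p.id
"""
RANGE = "os.date BETWEEN '2022-01-01' AND '2022-01-02'"

# Filter in WHERE: product 2 has no orders in range, so it disappears.
in_where = conn.execute(PIVOT.format(on_filter="", where_filter="WHERE " + RANGE)).fetchall()
# Filter in ON: product 2 is kept with zeros.
in_on = conn.execute(PIVOT.format(on_filter="AND " + RANGE, where_filter="")).fetchall()
```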

SQL Pivot column values

I have tried following two other posts (one of them a SQL Server-specific solution), but they were not helpful.
I have two tables, Product and Sale, and I want to find how many of each product are sold each day. I want to pivot the result so that the columns are the product names and each row contains the number of each product sold on one day, ordered by day.
The simplified schema is as follows:
CREATE TABLE product (
id integer,
name varchar(40),
price float(2)
);
CREATE TABLE sale(
id integer,
product_id integer,
transaction_time timestamp
);
This is what I want: [Image: expected output]
I only managed to aggregate the total sales per day per product, but I am not able to pivot the product names.
select date(sale.transaction_time)
, product.id
, product.name
, count(product_id)
from sale inner join
product on sale.product_id = product.id
group by date(sale.transaction_time)
, product.id
, product.name
This is the situation so far: [Image: current output]
Please suggest.
You need pivoting logic (conditional aggregation), e.g.:
select
s.transaction_time::date,
count(case when p.name = 'intelligent_rubber_clock' then 1 end) as intelligent_rubber_clock,
count(case when p.name = 'intelligent_iron_wallet' then 1 end) as intelligent_iron_wallet,
count(case when p.name = 'practical_marble_car' then 1 end) as practical_marble_car
from sale s
inner join product p
on s.product_id = p.id
group by
s.transaction_time::date;
Since your expected output aggregates by date alone, only the transaction date should be in the GROUP BY clause. The trick used here is to take the count of a CASE expression that returns 1 when the record is for a given product and NULL otherwise (there is no ELSE branch). COUNT ignores NULLs, so this produces a conditional count for each product, each in its own column. To add more columns, just add more conditional counts.
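A quick SQLite check of the conditional-count pivot with made-up rows; date() stands in for PostgreSQL's ::date cast. The extra wrong_count column adds ELSE 0 to the CASE to show why that branch is left out: 0 is not NULL, so COUNT counts every sale on that day.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE product (id INTEGER, name TEXT, price REAL);
CREATE TABLE sale (id INTEGER, product_id INTEGER, transaction_time TEXT);
INSERT INTO product VALUES (1, 'intelligent_rubber_clock', 9.5), (2, 'practical_marble_car', 20.0);
INSERT INTO sale VALUES
  (1, 1, '2019-03-01 10:00:00'), (2, 1, '2019-03-01 12:30:00'),
  (3, 2, '2019-03-01 15:00:00'), (4, 2, '2019-03-02 09:00:00');
""")

rows = conn.execute("""
SELECT date(s.transaction_time) AS day,
       COUNT(CASE WHEN p.name = 'intelligent_rubber_clock' THEN 1 END) AS intelligent_rubber_clock,
       COUNT(CASE WHEN p.name = 'practical_marble_car' THEN 1 END) AS practical_marble_car,
       -- With ELSE 0 the CASE is never NULL, so COUNT counts every sale that day.
       COUNT(CASE WHEN p.name = 'practical_marble_car' THEN 1 ELSE 0 END) AS wrong_count
FROM sale s
JOIN product p ON s.product_id = p.id
GROUP BY date(s.transaction_time)
ORDER BY day
""").fetchall()
```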

Suggest most optimized way using hive or pig

Problem Statement
Assume there is one text file of logs. Below are the fields in the file.
Log File
userID
productID
action
where action is one of:
Browse, Click, AddToCart, Purchase, LogOut
Select users who performed AddToCart action but did not perform Purchase action.
('1001','101','201','Browse'),
('1002','102','202','Click'),
('1001','101','201','AddToCart'),
('1001','101','201','Purchase'),
('1002','102','202','AddToCart')
Can anyone suggest how to get this info using Hive or Pig with optimised performance?
This can be done in a single table scan using sum() or the analytic sum() over(), depending on the exact requirements. For example, what if a user added two products to the cart but purchased only one?
For User+Product:
select userID, productID
from
(
select
userID,
productID,
sum(case when action='AddToCart' then 1 else 0 end) addToCart_cnt,
sum(case when action='Purchase' then 1 else 0 end) Purchase_cnt
from table
group by userID, productID
)s
where addToCart_cnt>0 and Purchase_cnt=0
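Here is a minimal run of the per-user-and-product query on the sample rows, using SQLite in place of Hive. Assumptions: the table is called logs (hypothetical, since table is a reserved word), the second value of each sample row is productID, and the unnamed third value is dropped. User 1003 is an extra made-up user who added two products to the cart but bought only one, the case raised above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE logs (userID TEXT, productID TEXT, action TEXT);
INSERT INTO logs VALUES
  ('1001', '101', 'Browse'), ('1002', '102', 'Click'),
  ('1001', '101', 'AddToCart'), ('1001', '101', 'Purchase'),
  ('1002', '102', 'AddToCart'),
  -- hypothetical extra user: two products added, one purchased
  ('1003', '103', 'AddToCart'), ('1003', '104', 'AddToCart'), ('1003', '103', 'Purchase');
""")

# One pass over the table: count both actions per (user, product), then filter.
rows = conn.execute("""
SELECT userID, productID
FROM (
  SELECT userID, productID,
         SUM(CASE WHEN action = 'AddToCart' THEN 1 ELSE 0 END) AS addToCart_cnt,
         SUM(CASE WHEN action = 'Purchase' THEN 1 ELSE 0 END) AS Purchase_cnt
  FROM logs
  GROUP BY userID, productID
) s
WHERE addToCart_cnt > 0 AND Purchase_cnt = 0
ORDER BY userID, productID
""").fetchall()
```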
Hive: use NOT IN.
select * from table
where action='AddToCart' and
userID not in (select distinct userID from table where action='Purchase')
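One caveat with NOT IN, shown here in SQLite (standard SQL three-valued logic; check how your Hive version behaves): if the subquery returns even one NULL userID, x NOT IN (...) is never true, so the query silently returns nothing. A correlated NOT EXISTS does not have that problem. The table name logs and the data are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE logs (userID TEXT, productID TEXT, action TEXT);
INSERT INTO logs VALUES
  ('1001', '101', 'AddToCart'), ('1001', '101', 'Purchase'),
  ('1002', '102', 'AddToCart');
""")

NOT_IN = """
SELECT userID FROM logs
WHERE action = 'AddToCart'
  AND userID NOT IN (SELECT userID FROM logs WHERE action = 'Purchase')
"""
before = conn.execute(NOT_IN).fetchall()   # [('1002',)]

# A single Purchase row with a NULL userID makes NOT IN unknown for every user.
conn.execute("INSERT INTO logs VALUES (NULL, '103', 'Purchase')")
after = conn.execute(NOT_IN).fetchall()    # []

# NOT EXISTS only looks for a matching purchase, so NULLs don't interfere.
NOT_EXISTS = """
SELECT l.userID FROM logs l
WHERE l.action = 'AddToCart'
  AND NOT EXISTS (SELECT 1 FROM logs p
                  WHERE p.action = 'Purchase' AND p.userID = l.userID)
"""
safe = conn.execute(NOT_EXISTS).fetchall()  # [('1002',)]
```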
Pig: filter the rows by action, do a left outer join, and keep the rows where the joined userID is null.
A = LOAD '\path\file.txt' USING PigStorage(',') AS (userID:int,b:int,c:int,action:chararray); -- Note: I am assuming the first 3 columns are ints. You will have to figure out loading the values without the quotes.
B = FILTER A BY action == 'AddToCart';
C = FILTER A BY action == 'Purchase';
D = JOIN B BY userID LEFT OUTER, C BY userID;
E = FILTER D BY C::userID IS NULL;
DUMP E;

Using a stored procedure in Teradata to build a summary history table

I am using Teradata SQL Assistant connected to an enterprise DW. I have written the query below to show an inventory of outstanding items as of a specific point in time. The referenced table loads and stores new records by load date as their state changes (and does not delete historical records). My query outputs one row for the specified date. Can I create a stored procedure or a recursive query of some sort to build a history of these summary rows (with one new row per day)? I have not used such features before; links to relevant previously answered questions, or suggestions on how to get on the right track researching other possible solutions, are totally fine. I'm just trying to bridge this gap in my knowledge.
SELECT
'2017-10-02' as Dt
,COUNT(DISTINCT A.RECORD_NBR) as Pending_Records
,SUM(A.PAY_AMT) AS Total_Pending_Payments
FROM DB.RECORD_HISTORY A
INNER JOIN
(SELECT MAX(LOAD_DT) AS LOAD_DT
,RECORD_NBR
FROM DB.RECORD_HISTORY
WHERE LOAD_DT <= '2017-10-02'
GROUP BY RECORD_NBR
) B
ON A.RECORD_NBR = B.RECORD_NBR
AND A.LOAD_DT = B.LOAD_DT
WHERE
A.RECORD_ORDER =1 AND Final_DT Is Null
GROUP BY Dt
ORDER BY 1 desc
Here is my interpretation of your query:
For the most recent load_dt (up until 2017-10-02) for record_order #1, return:
1) the number of different pending records
2) the total amount of pending payments
Is this correct? If you want this info but with one row for each load_dt, you just need to remove that INNER JOIN:
SELECT
load_Dt,
COUNT(DISTINCT record_nbr) AS Pending_Records,
SUM(pay_amt) AS Total_Pending_Payments
FROM DB.record_history
WHERE record_order = 1
AND final_Dt IS NULL
GROUP BY load_Dt
ORDER BY 1 DESC
If you want to get the summary info per record_order, just add record_order as a grouping column:
SELECT
load_Dt,
record_order,
COUNT(DISTINCT record_nbr) AS Pending_Records,
SUM(pay_amt) AS Total_Pending_Payments
FROM DB.record_history
WHERE final_Dt IS NULL
GROUP BY load_Dt, record_order
ORDER BY 1,2 DESC
If you want one row per calendar day (including calendar days with no corresponding load_dt), you can SELECT from the sys_calendar.calendar view and LEFT JOIN the query above on the load_dt field:
SELECT cal.calendar_date, src.Pending_Records, src.Total_Pending_Payments
FROM sys_calendar.calendar cal
LEFT JOIN (
SELECT
load_Dt,
COUNT(DISTINCT record_nbr) AS Pending_Records,
SUM(pay_amt) AS Total_Pending_Payments
FROM DB.record_history
WHERE record_order = 1
AND final_Dt IS NULL
GROUP BY load_Dt
) src ON cal.calendar_date = src.load_Dt
WHERE cal.calendar_date BETWEEN <start_date> AND <end_date>
ORDER BY 1 DESC
I don't have access to a TD system, so you may get syntax errors. Let me know whether that works or if you're looking for something else.
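Note that removing the INNER JOIN changes what each row means: a row for a given load_dt then summarizes only the records loaded on that day, not everything outstanding as of that day. If you need the original query's as-of snapshot repeated for every day, one approach is to join a list of days to the history with LOAD_DT <= day, take the latest load per record and day, and then aggregate. The sketch below does this in SQLite, where a recursive CTE generates the days; on Teradata the sys_calendar.calendar view mentioned above could play that role. It has not been tested on Teradata, and the sample rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE record_history (record_nbr TEXT, load_dt TEXT, record_order INTEGER,
                             pay_amt INTEGER, final_dt TEXT);
INSERT INTO record_history VALUES
  ('R1', '2017-10-01', 1, 100, NULL),
  ('R1', '2017-10-03', 1, 100, '2017-10-03'),  -- R1 is closed on 10-03
  ('R2', '2017-10-02', 1, 50, NULL);
""")

SNAPSHOT_SQL = """
WITH RECURSIVE cal(d) AS (              -- one row per day (stand-in for a calendar view)
    SELECT :first_day
    UNION ALL
    SELECT date(d, '+1 day') FROM cal WHERE d < :last_day
),
latest AS (                             -- latest load of each record as of each day
    SELECT cal.d, h.record_nbr, MAX(h.load_dt) AS load_dt
    FROM cal
    JOIN record_history h ON h.load_dt <= cal.d
    GROUP BY cal.d, h.record_nbr
),
snap AS (                               -- the original single-date query, per day
    SELECT l.d,
           COUNT(DISTINCT a.record_nbr) AS pending_records,
           SUM(a.pay_amt) AS total_pending_payments
    FROM latest l
    JOIN record_history a
      ON a.record_nbr = l.record_nbr AND a.load_dt = l.load_dt
    WHERE a.record_order = 1 AND a.final_dt IS NULL
    GROUP BY l.d
)
SELECT cal.d,
       COALESCE(snap.pending_records, 0),
       COALESCE(snap.total_pending_payments, 0)
FROM cal
LEFT JOIN snap ON snap.d = cal.d        -- keep days with nothing pending
ORDER BY cal.d
"""

rows = conn.execute(SNAPSHOT_SQL, {"first_day": "2017-10-01", "last_day": "2017-10-03"}).fetchall()
```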