I have a table like this:
id | activity | pay | parent
1 | pay all | - | null
2 | pay tax | 10 $ | 1
3 | pay water bills | - | 1
4 | fix house | - | null
5 | fix roof | 1 $ | 4
6 | pay drinking water | 1 $ | 3
I want to get a table like this:
id | activity | pay | parent | matriks
1 | pay all | {11 $} | null | 1 (pay tax + pay water bills)
2 | pay tax | 10 $ | 1 | 1-2
3 | pay water bills | {1 $} | 1 | 1-3 (pay drinking water)
4 | fix house | {1 $} | null | 4 (fix roof)
5 | fix roof | 1 $ | 4 | 4-5
6 | pay drinking water | 1 $ | 3 | 1-3-6
The pay values should be summed from child to parent.
The problem is that as long as pay water bills has not been computed from pay drinking water, pay all cannot be computed either, because it needs a pay value for both pay tax and pay water bills.
I tried this on our PostgreSQL database (version 8.4.22), since the fiddle was a bit slow for my taste, but the SQL can be pasted in there and it works for PostgreSQL.
There is also a fiddle demo; it takes about 20 seconds the first time but is faster after that.
Here's what produces the calculated results for me. (I didn't format it according to your requirements, because in my mind the main exercise was the calculation.) This assumes your table is called activity:
with recursive rekmatriks as (
    select id, activity, pay, parent, id::text as matriks, 0 as lev
    from activity
    where parent is null
    union all
    select activity.id, activity.activity, activity.pay, activity.parent,
           rekmatriks.matriks || '-' || activity.id::text as matriks,
           rekmatriks.lev+1 as lev
    from activity inner join rekmatriks on activity.parent = rekmatriks.id
)
, reksum as (
    select id, activity, pay, parent, matriks, lev, coalesce(pay,0) as subsum
    from rekmatriks
    where not exists(select id from rekmatriks rmi where rmi.parent=rekmatriks.id)
    union all
    select rekmatriks.*, reksum.subsum+coalesce(rekmatriks.pay, 0) as subsum
    from rekmatriks inner join reksum on rekmatriks.id = reksum.parent
)
select id, activity, pay, parent, matriks, sum(subsum) as amount, lev
from reksum
group by id, activity, pay, parent, matriks, lev
order by id
As a bonus, this delivers the nesting depth of an id: 0 for a top-level parent, 1 for the first sublevel, and so on. It uses two recursive WITH queries to achieve what you want. The calculated value you need is in the amount column.
The first one (rekmatriks) processes the IDs in the table top to bottom, starting with any ids that have a parent of NULL. The recursive part simply takes the parent's matriks value and appends its own id to it, to build your matriks tree representation field.
The second one (reksum) works bottom to top and starts with all rows that have no child elements. The recursive part of this query selects the parent row for each child row selected so far, and computes the sum of pay and subsum for each line. This produces multiple rows per id, since one parent can have multiple children.
All that's left now is the final select statement. It uses GROUP BY and SUM to aggregate the multiple possible child sum values into one row.
This does work for your particular example. It may fail for cases not shown in the sample data, for example, if an item that has children also carries a value of its own that needs to be added.
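For the case just mentioned, where a parent row carries its own value, one possible variation is to build an ancestor/descendant list and sum over it, so each row's pay is counted exactly once per ancestor. The following is only a minimal sketch, run through Python's sqlite3 module rather than PostgreSQL. It assumes pay is stored as a plain number (NULL where the sample shows "-"), and the CTE names paths and closure are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE activity (id INTEGER PRIMARY KEY, activity TEXT,
                           pay INTEGER, parent INTEGER);
    -- Sample rows from the question; '-' is stored as NULL (assumption).
    INSERT INTO activity VALUES
        (1, 'pay all',            NULL, NULL),
        (2, 'pay tax',            10,   1),
        (3, 'pay water bills',    NULL, 1),
        (4, 'fix house',          NULL, NULL),
        (5, 'fix roof',           1,    4),
        (6, 'pay drinking water', 1,    3);
""")

SUBTREE_TOTALS = """
WITH RECURSIVE
paths(id, matriks) AS (
    -- top-down: build the '1-3-6' style path for every row
    SELECT id, CAST(id AS TEXT) FROM activity WHERE parent IS NULL
    UNION ALL
    SELECT a.id, p.matriks || '-' || a.id
    FROM activity a JOIN paths p ON a.parent = p.id
),
closure(ancestor, descendant) AS (
    -- every row is its own ancestor, then walk down to all descendants
    SELECT id, id FROM activity
    UNION ALL
    SELECT c.ancestor, a.id
    FROM closure c JOIN activity a ON a.parent = c.descendant
)
SELECT c.ancestor AS id, p.matriks, SUM(COALESCE(a.pay, 0)) AS amount
FROM closure c
JOIN activity a ON a.id = c.descendant
JOIN paths p ON p.id = c.ancestor
GROUP BY c.ancestor, p.matriks
ORDER BY c.ancestor
"""

rows = conn.execute(SUBTREE_TOTALS).fetchall()
for row in rows:
    print(row)
```

Because each (ancestor, descendant) pair appears once, a value stored on a parent row would be added once to that parent and once to each of its ancestors, instead of once per child.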
I am having some trouble with a count function. The problem comes from a left join that I am not sure I am doing correctly.
Variables are:
Customer_name (buyer)
Product_code (what the customer buys)
Store (where the customer buys)
The datasets are:
Customer_df (list of customers and product codes of their purchases)
Store1_df (list of product codes per week, for Store 1)
Store2_df (list of product codes per day, for Store 2)
Final output desired:
I would like to have a table with:
col1: Customer_name;
col2: Count of items purchased in store 1;
col3: Count of items purchased in store 2;
Filters: date range
My query looks like this:
SELECT
DISTINCT
C.customer_name,
C.product_code,
COUNT(S1.product_code) AS s1_sales,
COUNT(S2.product_code) AS s2_sales
FROM customer_df C
LEFT JOIN store1_df S1 USING(product_code)
LEFT JOIN store2_df S2 USING(product_code)
GROUP BY
customer_name, product_code
HAVING
S1_sales > 0
OR S2_sales > 0
The output I expect is something like this:
Customer_name | Product_code | Store1_weekly_sales | Store2_weekly_sales
Luigi | 120012 | 4 | 8
James | 100022 | 6 | 10
But instead, I get:
Customer_name | Product_code | Store1_weekly_sales | Store2_weekly_sales
Luigi | 120012 | 290 | 60
James | 100022 | 290 | 60
It works when I use COUNT(DISTINCT product_code) instead of COUNT(product_code), but I would like to avoid that because I want to be able to aggregate over different timespans (e.g. if I use count distinct and take into account more than 1 week of data, I will not get the right numbers).
My hypotheses are:
I am joining the tables in the wrong way
There is a problem when joining two datasets with different time aggregations
What am I doing wrong?
The reason, as Philipxy indicated, is a common one. You are getting a Cartesian result from your data, which bloats your numbers. To simplify, let's consider a single customer purchasing one item from two stores. The first store has 3 purchases, the second store has 5 purchases. Your total count is 3 * 5 = 15, because each entry in the first store is joined to every entry with the same customer id in the second. So the 1st purchase is joined to second-store purchases 1-5, then the 2nd purchase is joined to second-store purchases 1-5, and so on; you can see the bloat. By having each store pre-query its aggregates per customer, you get AT MOST one record per customer per store (and per product, as per your desired outcome).
select
c.customer_name,
AllCustProducts.Product_Code,
coalesce( PQStore1.SalesEntries, 0 ) Store1SalesEntries,
coalesce( PQStore2.SalesEntries, 0 ) Store2SalesEntries
from
customer_df c
-- now, we need all possible UNIQUE instances of
-- a given customer and product to prevent duplicates
-- for subsequent queries of sales per customer and store
JOIN
( select distinct customerid, product_code
from store1_df
union
select distinct customerid, product_code
from store2_df ) AllCustProducts
on c.customerid = AllCustProducts.customerid
-- NOW, we can join to a pre-query of sales at store 1
-- by customer id and product code. You may also want to
-- get sum( SalesDollars ) if available, just add respectively
-- to each sub-query below.
LEFT JOIN
( select
s1.customerid,
s1.product_code,
count(*) as SalesEntries
from
store1_df s1
group by
s1.customerid,
s1.product_code ) PQStore1
on AllCustProducts.customerid = PQStore1.customerid
AND AllCustProducts.product_code = PQStore1.product_code
-- now, same pre-aggregation to store 2
LEFT JOIN
( select
s2.customerid,
s2.product_code,
count(*) as SalesEntries
from
store2_df s2
group by
s2.customerid,
s2.product_code ) PQStore2
on AllCustProducts.customerid = PQStore2.customerid
AND AllCustProducts.product_code = PQStore2.product_code
There is no need for a GROUP BY or HAVING in the outer query, since each pre-aggregate yields at most 1 record per unique combination. As for filtering by date range, I would just add a WHERE clause inside each of the AllCustProducts, PQStore1, and PQStore2 subqueries.
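The 3 * 5 fan-out described above can be reproduced in miniature. This is a rough sketch using Python's sqlite3 module with a cut-down, assumed schema: one customer row keyed by product_code as in the question, and no customerid or date columns. It is not the asker's real data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer_df (customer_name TEXT, product_code INTEGER);
    CREATE TABLE store1_df (product_code INTEGER);
    CREATE TABLE store2_df (product_code INTEGER);
    INSERT INTO customer_df VALUES ('Luigi', 120012);
""")
# 3 purchases in store 1 and 5 in store 2, as in the explanation above.
conn.executemany("INSERT INTO store1_df VALUES (?)", [(120012,)] * 3)
conn.executemany("INSERT INTO store2_df VALUES (?)", [(120012,)] * 5)

# Joining both detail tables first multiplies the rows: 3 * 5 = 15.
NAIVE = """
SELECT c.customer_name,
       COUNT(s1.product_code) AS s1_sales,
       COUNT(s2.product_code) AS s2_sales
FROM customer_df c
LEFT JOIN store1_df s1 ON s1.product_code = c.product_code
LEFT JOIN store2_df s2 ON s2.product_code = c.product_code
GROUP BY c.customer_name
"""

# Aggregating each store on its own first leaves one row per product to join.
PRE_AGGREGATED = """
SELECT c.customer_name,
       COALESCE(p1.sales_entries, 0) AS s1_sales,
       COALESCE(p2.sales_entries, 0) AS s2_sales
FROM customer_df c
LEFT JOIN (SELECT product_code, COUNT(*) AS sales_entries
           FROM store1_df GROUP BY product_code) p1
       ON p1.product_code = c.product_code
LEFT JOIN (SELECT product_code, COUNT(*) AS sales_entries
           FROM store2_df GROUP BY product_code) p2
       ON p2.product_code = c.product_code
"""

naive = conn.execute(NAIVE).fetchall()
fixed = conn.execute(PRE_AGGREGATED).fetchall()
print("naive:", naive)
print("pre-aggregated:", fixed)
```

In the naive form both counts come out as the product of the two row counts, which is the same pattern as the identical inflated numbers in the question.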
I have a table that looks something like this:
sender | receiver | amount
1 | 2 | 10
2 | 1 | 20
3 | 2 | 20
1 | 3 | 30
The desired output should be:
user | Trans_Change
1 | -20
2 | 10
3 | 10
I can't find a way to write a query for it in SQL.
The logic behind the desired output is this:
user 1 sends an amount of 10 to user 2, so now 1 has -10 and 2 has +10, and so on.
Best Guess given known info:
We simply give all senders negative transaction amounts, UNION ALL that with the receivers as positive amounts, and then group the data, summing the transactions.
With CTE AS (
SELECT sender as aUser, (-1 * amount) as Trans_Change -- Senders lose money
FROM table
UNION ALL
SELECT Receiver as aUser, amount -- receivers get money
FROM Table)
SELECT aUser, sum(Trans_Change) as Trans_Change -- aggregate transaction totals by user
FROM CTE
GROUP BY aUser
Part of addressing this is acknowledging that each amount is used twice: once for the sender as a negative, and once for the receiver as a positive (or credit/debit if you prefer). Realizing this, I knew I needed to get that value on two separate rows. Using two selects and a UNION ALL gets the value twice, and then it's a simple aggregation.
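To check the approach against the sample rows, here is a small sketch using Python's sqlite3 module. The table name transfers and the CTE name movements are placeholders chosen for illustration (the original table name is not given), and the column is spelled receiver.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE transfers (sender INTEGER, receiver INTEGER, amount INTEGER);
    -- Sample rows from the question.
    INSERT INTO transfers VALUES (1, 2, 10), (2, 1, 20), (3, 2, 20), (1, 3, 30);
""")

NET_CHANGE = """
WITH movements AS (
    SELECT sender AS a_user, -amount AS trans_change   -- senders lose money
    FROM transfers
    UNION ALL
    SELECT receiver AS a_user, amount AS trans_change  -- receivers gain money
    FROM transfers
)
SELECT a_user, SUM(trans_change) AS trans_change
FROM movements
GROUP BY a_user
ORDER BY a_user
"""

net = conn.execute(NET_CHANGE).fetchall()
for user, change in net:
    print(user, change)
```

UNION ALL matters here: a plain UNION would drop duplicate rows, so two identical transfers between the same users would be counted only once.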
I have a query that collects many different columns, and I want to include a column that sums the price of every component in an order. Right now, I already have a column that simply shows the price of every component of an order, but I am not sure how to create this new column.
I would think that the code would go something like this, but I am not really clear on what an aggregate function is or why I get an error regarding the aggregate function when I try to run this code.
SELECT ID, Location, Price, (SUM(PriceDescription) FROM table GROUP BY ID WHERE PriceDescription LIKE 'Cost.%' AS Summary)
FROM table
When I say each component, I mean that every ID I have has many different items that make up the general price. I only want to find out how much money I spend on the supplies I need for my pressure washers, which is why I wrote `WHERE PriceDescription LIKE 'Cost.%'`.
To explain further, I have receipts for every customer I've worked with, and in these receipts I write down my cost for the soap that I use and the tools for the pressure washer that I rent. I label all of these with 'Cost.' so it looks like (Cost.Water), (Cost.Soap), (Cost.Gas), (Cost.Tools). I would like a column that sums all the Cost._ prices for Order 1, all the Cost._ prices for Order 2, and so on. I should also mention that the orders do not all have the same number of costs (sometimes when I use my power washer I might not have to buy gas, and occasionally soap).
I hope this makes sense, if not please let me know how I can explain further.
`ID Location Price PriceDescription
1 Park 10 Cost.Water
1 Park 8 Cost.Gas
1 Park 11 Cost.Soap
2 Tom 20 Cost.Water
2 Tom 6 Cost.Soap
3 Matt 15 Cost.Tools
3 Matt 15 Cost.Gas
3 Matt 21 Cost.Tools
4 College 32 Cost.Gas
4 College 22 Cost.Water
4 College 11 Cost.Tools`
I would like for my query to create a column like such
`ID Location Price Summary
1 Park 10 29
1 Park 8
1 Park 11
2 Tom 20 26
2 Tom 6
3 Matt 15 51
3 Matt 15
3 Matt 21
4 College 32 65
4 College 22
4 College 11 `
But if the 'Summary' was printed on every line instead of just at the top one, that would be okay too.
You just need SUM(Price) OVER (PARTITION BY Location), which gives the total per location, as below:
SELECT ID, Location, Price, SUM(Price) over(Partition by Location) AS Summed_Price
FROM yourtable
WHERE PriceDescription LIKE 'Cost.%'
First, if your Price column really contains values that match 'Cost.%', then you cannot apply SUM() over it. SUM() expects a number (e.g. INT, FLOAT, REAL or DECIMAL). If it is text, then you need to explicitly convert it to a number by adding a CAST or CONVERT clause inside the SUM() call.
Second, your query syntax is wrong: you need GROUP BY, and the SELECT fields are not specified correctly. And you want to SUM() the Price field, not the PriceDescription field (which you can't even sum, as I explained).
Assuming that Price is numeric (see my first remark), then this is how it can be done:
SELECT ID
, Location
, Price
, (SELECT SUM(Price)
FROM table
WHERE ID = T1.ID AND Location = T1.Location
) AS Summed_Price
FROM table AS T1
To get the exact result posted in the question:
SELECT
    T.ID,
    T.Location,
    T.Price,
    CASE WHEN R = 1 THEN RN ELSE NULL END AS Summary
FROM (
    SELECT
        ID,
        Location,
        Price,
        SUM(Price) OVER (PARTITION BY Location) AS RN,
        ROW_NUMBER() OVER (PARTITION BY Location ORDER BY ID) AS R
    FROM Table
) T
ORDER BY T.ID
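As a way to try the window-function approach on the sample rows, here is a hedged sketch using Python's sqlite3 module (window functions need SQLite 3.25 or newer). The table name orders is a placeholder. The sketch partitions by ID rather than Location, and uses SQLite's implicit rowid to pick a stable "first" row per order; both are assumptions about what is wanted, not something the answers above specify.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (ID INTEGER, Location TEXT, Price INTEGER,
                         PriceDescription TEXT);
    -- Sample rows from the question, in the order they were listed.
    INSERT INTO orders VALUES
        (1, 'Park',    10, 'Cost.Water'),
        (1, 'Park',     8, 'Cost.Gas'),
        (1, 'Park',    11, 'Cost.Soap'),
        (2, 'Tom',     20, 'Cost.Water'),
        (2, 'Tom',      6, 'Cost.Soap'),
        (3, 'Matt',    15, 'Cost.Tools'),
        (3, 'Matt',    15, 'Cost.Gas'),
        (3, 'Matt',    21, 'Cost.Tools'),
        (4, 'College', 32, 'Cost.Gas'),
        (4, 'College', 22, 'Cost.Water'),
        (4, 'College', 11, 'Cost.Tools');
""")

SUMMARY_ON_FIRST_ROW = """
SELECT ID, Location, Price,
       -- show the per-order total only on the first row of each order
       CASE WHEN ROW_NUMBER() OVER (PARTITION BY ID ORDER BY rowid) = 1
            THEN SUM(Price) OVER (PARTITION BY ID)
       END AS Summary
FROM orders
WHERE PriceDescription LIKE 'Cost.%'
ORDER BY ID, rowid
"""

summary = conn.execute(SUMMARY_ON_FIRST_ROW).fetchall()
for row in summary:
    print(row)
```

Dropping the CASE and keeping only SUM(Price) OVER (PARTITION BY ID) would print the total on every line, which the question says is also acceptable.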
Situation: I have three tables of parts: Raw Material, Individual Parts, and Assembled Parts. I have created a union query to list all the part numbers as well as their minimum levels of inventory and opening levels of inventory. I also have an inventory table that uses all the part numbers. I then used the union query in another query to find the current inventory and a current balance. When I attempt to open this query, I get an input box asking for CurrentInventory.
Question: How do I get the input box to stop appearing?
Code:
Tables:
Raw Material, Individual Parts, and Assembled Parts all have similar formats that begin with the following
PartNum | Min | Open
1 50 100
Inventory:
PartNum | Year | Week | In | Out
1 2015 31 20 10
Queries
Union Query:
SELECT PartNum, Open, Min
FROM [Raw Material]
UNION
SELECT PartNum, Open, Min
FROM [Individual Parts]
UNION
SELECT PartNum, Open, Min
FROM [Assembled Parts];
Which results in:
PartNum | Min | Open
1 50 100
etc.
Current Inventory:
SELECT AllParts.PartNum, AllParts.Open, Sum(Inventory.[In]) AS SumOfIn,
Sum(Inventory.Out) AS SumOfOut,
[Open]+[SumOfIn]-[SumOfOut] AS CurrentInventory,
AllParts.Min, [CurrentInventory]-[Min] AS CurrentBalance
FROM AllParts
INNER JOIN Inventory ON AllParts.PartNum = Inventory.PartNum
GROUP BY AllParts.PartNum, AllParts.Open, AllParts.Min,
[CurrentInventory]-[Min], [Open]+[In]-[Out];
When I attempt to run this, I get the input box for CurrentInventory. If I don't enter anything, it doesn't affect the results. However, when I run the report I generate from this query, the column shows what I entered and not the actual value.
Even though you are aliasing a calculated result as "CurrentInventory", you can't reference that calculation by its alias in the same query.
Every time you have "CurrentInventory" (except after the "AS"), you need to replace it with [Open]+[SumOfIn]-[SumOfOut].
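Another way to avoid repeating the expression is to compute the sums in an inner query and do the arithmetic on its output columns in an outer query. The following is only a sketch in SQLite via Python's sqlite3 module, not Access SQL: it uses double quotes where Access would use square brackets, AllParts is created as a plain table instead of a union query, and the second Inventory row is invented purely to give SUM something to add up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE AllParts (PartNum INTEGER, "Min" INTEGER, "Open" INTEGER);
    CREATE TABLE Inventory (PartNum INTEGER, "Year" INTEGER, "Week" INTEGER,
                            "In" INTEGER, "Out" INTEGER);
    INSERT INTO AllParts VALUES (1, 50, 100);
    INSERT INTO Inventory VALUES (1, 2015, 31, 20, 10);
    -- Hypothetical extra week, not from the question.
    INSERT INTO Inventory VALUES (1, 2015, 32, 5, 15);
""")

CURRENT_INVENTORY = """
SELECT t.PartNum, t."Open", t.SumOfIn, t.SumOfOut,
       t."Open" + t.SumOfIn - t.SumOfOut AS CurrentInventory,
       t."Min",
       t."Open" + t.SumOfIn - t.SumOfOut - t."Min" AS CurrentBalance
FROM (
    -- inner query: one row per part with its summed movements
    SELECT p.PartNum, p."Open", p."Min",
           SUM(i."In") AS SumOfIn,
           SUM(i."Out") AS SumOfOut
    FROM AllParts p
    JOIN Inventory i ON i.PartNum = p.PartNum
    GROUP BY p.PartNum, p."Open", p."Min"
) AS t
"""

inventory_rows = conn.execute(CURRENT_INVENTORY).fetchall()
for row in inventory_rows:
    print(row)
```

With this shape the GROUP BY only lists real columns, so no calculated alias has to appear in it.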
I have a list of inventory entries. Each entry has a date, item name, and volumes. What I'm doing now is selecting the top 10 items based on the most recent-date volumes, and then tracking the volumes of these items over the past 5 days in my table. The one piece I'm missing is that I would like to order the resulting table based on the most recent-date volume order of the items, i.e.
Date Item Volumes
1/20 Dog 5
1/20 Bird 4
1/20 Cat 2
1/19 Dog 3
1/19 Bird 6
1/19 Cat 10
1/18 Dog 0
1/18 Bird 2
1/18 Cat 0
Below is a scrubbed version of the sql code I'm running. As of now the second sort I'm doing after sorting on the date is just sorting alphabetically on the item name.
SELECT
TOP_VOLUMES.NAME,
DATA.VOLUMES,
DATA.TIMESTAMP
FROM DATA
RIGHT JOIN
(SELECT TOP 10 NAME
FROM DATA
WHERE TIMESTAMP = (SELECT MAX(TIMESTAMP) FROM DATA)
ORDER BY VOLUMES DESC, NAME) AS TOP_VOLUMES
ON TOP_VOLUMES.NAME = DATA.NAME
WHERE ((SELECT MAX(TIMESTAMP) FROM DATA) - DATA.TIMESTAMP < 5)
ORDER BY DATA.TIMESTAMP DESC , DATA.NAME;
I would really like to avoid creating any temp tables for this. Is there any way to do it within the select statement within the join? Any help would be greatly appreciated!
I came across a link that may contain the answer you are looking for:
Access SQL how to make an increment in SELECT query
Hope this helps you!
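One possible approach that needs no temp table is to carry the most recent day's volume out of the top-10 subquery and use it as the second sort key. This is a minimal sketch in SQLite via Python's sqlite3 module, so it uses LIMIT where the original uses TOP. The lowercase table name data, the integer day column (20 standing for 1/20), and the alias latest are simplifications made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE data (day INTEGER, name TEXT, volumes INTEGER);
    -- Sample rows from the question; day 20 stands for 1/20, and so on.
    INSERT INTO data VALUES
        (20, 'Dog', 5), (20, 'Bird', 4), (20, 'Cat', 2),
        (19, 'Dog', 3), (19, 'Bird', 6), (19, 'Cat', 10),
        (18, 'Dog', 0), (18, 'Bird', 2), (18, 'Cat', 0);
""")

ORDER_BY_LATEST_VOLUME = """
SELECT d.day, d.name, d.volumes
FROM data d
JOIN (
    -- top 10 items on the most recent day, keeping that day's volume
    SELECT name, volumes AS latest_volumes
    FROM data
    WHERE day = (SELECT MAX(day) FROM data)
    ORDER BY volumes DESC, name
    LIMIT 10
) AS latest ON latest.name = d.name
WHERE (SELECT MAX(day) FROM data) - d.day < 5
ORDER BY d.day DESC, latest.latest_volumes DESC, d.name
"""

ordered = conn.execute(ORDER_BY_LATEST_VOLUME).fetchall()
for row in ordered:
    print(row)
```

Every day is then listed in the same item order, taken from the latest day's ranking, with the item name only breaking ties.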