I have a system to track orders and related expenditures. This is a Rails app running on PostgreSQL. 99% of my app gets by with plain old Rails Active Record calls. This one is ugly.
The expenditures table looks like this:
+----+----------+-----------+------------------------+
| id | category | parent_id | note |
+----+----------+-----------+------------------------+
| 1 | order | nil | order with no invoices |
+----+----------+-----------+------------------------+
| 2 | order | nil | order with invoices |
+----+----------+-----------+------------------------+
| 3 | invoice | 2 | invoice for order 2 |
+----+----------+-----------+------------------------+
| 4 | invoice | 2 | invoice for order 2 |
+----+----------+-----------+------------------------+
Each expenditure has many expenditure_items, and orders can be parents to invoices. The expenditure_items table looks like this:
+----+----------------+-------------+-------+---------+
| id | expenditure_id | cbs_item_id | total | note |
+----+----------------+-------------+-------+---------+
| 1 | 1 | 1 | 5 | Fuit |
+----+----------------+-------------+-------+---------+
| 2 | 1 | 2 | 15 | Veggies |
+----+----------------+-------------+-------+---------+
| 3 | 2 | 1 | 123 | Fuit |
+----+----------------+-------------+-------+---------+
| 4 | 2 | 2 | 456 | Veggies |
+----+----------------+-------------+-------+---------+
| 5 | 3 | 1 | 34 | Fuit |
+----+----------------+-------------+-------+---------+
| 6 | 3 | 2 | 76 | Veggies |
+----+----------------+-------------+-------+---------+
| 7 | 4 | 1 | 26 | Fuit |
+----+----------------+-------------+-------+---------+
| 8 | 4 | 2 | 98 | Veggies |
+----+----------------+-------------+-------+---------+
I need to track a few things:
amounts left to be invoiced on orders (that's easy)
the above, but rolled up for each cbs_item_id (this is the ugly part)
The cbs_item_id is basically an accounting code to categorize the money spent etc. I have visualized what my end result would look like:
+-------------+----------------+-------------+---------------------------+-----------+
| cbs_item_id | expenditure_id | order_total | invoice_total | remaining |
+-------------+----------------+-------------+---------------------------+-----------+
| 1 | 1 | 5 | 0 | 5 |
+-------------+----------------+-------------+---------------------------+-----------+
| 1 | 2 | 123 | 60 | 63 |
+-------------+----------------+-------------+---------------------------+-----------+
| | | | Rollup for cbs_item_id: 1 | 68 |
+-------------+----------------+-------------+---------------------------+-----------+
| 2 | 1 | 15 | 0 | 15 |
+-------------+----------------+-------------+---------------------------+-----------+
| 2 | 2 | 456 | 174 | 282 |
+-------------+----------------+-------------+---------------------------+-----------+
| | | | Rollup for cbs_item_id: 2 | 297 |
+-------------+----------------+-------------+---------------------------+-----------+
order_total is the sum of total for all the expenditure_items of the given order (category = 'order'). invoice_total is the sum of total for all the expenditure_items of invoices whose parent_id = expenditures.id. remaining is calculated as the difference (but never less than 0). In real terms, the idea is that you place an order for $1000 and $750 of invoices come in. I need to calculate the $250 left on the order (remaining), broken down by category (cbs_item_id). Then I need the roll-up of all the remaining values grouped by cbs_item_id.
So for each cbs_item_id I need to group by each order, find the total for the order, find the total invoiced against the order, then subtract the two (the result also can't be negative). It has to be on a per-order basis; the overall aggregate difference will not return the expected results.
In the end I am looking for a result something like this:
+-------------+-----------+
| cbs_item_id | remaining |
+-------------+-----------+
| 1 | 68 |
+-------------+-----------+
| 2 | 297 |
+-------------+-----------+
I am guessing this might be a combination of GROUP BY and perhaps a sub query or even CTE (voodoo to me). My SQL skills are not that great and this is WAY above my pay grade.
Here is a fiddle for the data above:
http://sqlfiddle.com/#!17/2fe3a
Alternate fiddle:
https://dbfiddle.uk/?rdbms=postgres_11&fiddle=e9528042874206477efbe0f0e86326fb
This query produces the result you are looking for:
SELECT cbs_item_id, sum(order_total - invoice_total) AS remaining
FROM (
SELECT cbs_item_id
, COALESCE(e.parent_id, e.id) AS expenditure_id -- ①
, COALESCE(sum(total) FILTER (WHERE e.category = 'order' ), 0) AS order_total -- ②
, COALESCE(sum(total) FILTER (WHERE e.category = 'invoice'), 0) AS invoice_total
FROM expenditures e
JOIN expenditure_items i ON i.expenditure_id = e.id
GROUP BY 1, 2 -- ③
) sub
GROUP BY 1
ORDER BY 1;
① Note how I assume a saner table definition with expenditures.parent_id being integer, and true NULL instead of the string 'nil'. This allows the simple use of COALESCE.
② About the aggregate FILTER clause:
Aggregate columns with additional (distinct) filters
③ Using short syntax with ordinal numbers of SELECT list items. Example:
Select first row in each GROUP BY group?
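The question also asks that remaining never go negative per order, which the query above does not enforce. Below is a minimal sketch of one way to add that clamp. It runs through Python's sqlite3 module only so that it is self-contained. The SQLite dialect, the CASE expressions standing in for FILTER, and the scalar MAX(x, 0) standing in for PostgreSQL's GREATEST(x, 0) are my substitutions, not part of the original answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE expenditures (id INTEGER PRIMARY KEY, category TEXT, parent_id INTEGER);
CREATE TABLE expenditure_items (id INTEGER PRIMARY KEY, expenditure_id INTEGER,
                                cbs_item_id INTEGER, total INTEGER);
-- Sample data from the question, with a true NULL for parent_id.
INSERT INTO expenditures VALUES
  (1, 'order', NULL), (2, 'order', NULL), (3, 'invoice', 2), (4, 'invoice', 2);
INSERT INTO expenditure_items VALUES
  (1, 1, 1, 5), (2, 1, 2, 15), (3, 2, 1, 123), (4, 2, 2, 456),
  (5, 3, 1, 34), (6, 3, 2, 76), (7, 4, 1, 26), (8, 4, 2, 98);
""")

query = """
SELECT cbs_item_id,
       SUM(MAX(order_total - invoice_total, 0)) AS remaining  -- clamp per order
FROM (
    SELECT i.cbs_item_id AS cbs_item_id,
           COALESCE(e.parent_id, e.id) AS order_id,
           SUM(CASE WHEN e.category = 'order'   THEN i.total ELSE 0 END) AS order_total,
           SUM(CASE WHEN e.category = 'invoice' THEN i.total ELSE 0 END) AS invoice_total
    FROM expenditures e
    JOIN expenditure_items i ON i.expenditure_id = e.id
    GROUP BY i.cbs_item_id, COALESCE(e.parent_id, e.id)
) sub
GROUP BY cbs_item_id
ORDER BY cbs_item_id
"""
rows = conn.execute(query).fetchall()
print(rows)  # [(1, 68), (2, 297)]
```

With the sample data no order is over-invoiced, so the clamp changes nothing and the totals match the expected 68 and 297. In PostgreSQL the same idea would presumably read sum(GREATEST(order_total - invoice_total, 0)).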
Can I get the total of all the remaining values for all rows, or do I need to wrap that into another subselect?
There is a very concise option with GROUPING SETS:
...
GROUP BY GROUPING SETS ((1), ()) -- that's all :)
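For intuition, GROUPING SETS ((1), ()) returns the per-cbs_item_id rows plus one grand-total row whose cbs_item_id is NULL. SQLite has no GROUPING SETS, so this sketch emulates the same shape of output with UNION ALL. The per_order table is a hypothetical stand-in for the inner subquery, pre-filled with the four per-order remaining values from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical helper table: one row per (cbs_item_id, order) with its remaining amount.
CREATE TABLE per_order (cbs_item_id INTEGER, order_id INTEGER, remaining INTEGER);
INSERT INTO per_order VALUES (1, 1, 5), (1, 2, 63), (2, 1, 15), (2, 2, 282);
""")

query = """
SELECT cbs_item_id, remaining
FROM (
    SELECT cbs_item_id, SUM(remaining) AS remaining
    FROM per_order
    GROUP BY cbs_item_id
    UNION ALL
    SELECT NULL, SUM(remaining)          -- the "()" grouping set: grand total
    FROM per_order
) AS t
ORDER BY cbs_item_id IS NULL, cbs_item_id  -- keep the total row last
"""
rows = conn.execute(query).fetchall()
print(rows)  # [(1, 68), (2, 297), (None, 365)]
```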
Related:
Converting rows to columns
I've got a rather complicated riddle to solve, and so far I've been unlucky.
I have 3 tables which I need to join to get the result.
Most importantly, I need the highest h_id per p_id. h_id is a unique entry in the log history, and I need the newest one for a given point (p_id -> num).
Apart from that I need ext and name as well.
history
+----------------+---------+--------+
| h_id | p_id | str_id |
+----------------+---------+--------+
| 1 | 1 | 11 |
| 2 | 5 | 15 |
| 3 | 5 | 23 |
| 4 | 1 | 62 |
+----------------+---------+--------+
point
+----------------+---------+
| p_id | num |
+----------------+---------+
| 1 | 4564 |
| 5 | 3453 |
+----------------+---------+
street
+----------------+---------+-------------+
| str_id | ext | name |
+----------------+---------+-------------+
| 15 | | Mein st. 33 | - bad name
| 11 | | eck st. 42 | - bad name
| 62 | abc | Main st. 33 |
| 23 | efg | Back st. 42 |
+----------------+---------+-------------+
EXPECTED RESULT
+----------------+---------+-------------+-----+
| num | ext | name |h_id |
+----------------+---------+-------------+-----+
| 3453 | efg | Back st. 42 | 3 |
| 4564 | abc | Main st. 33 | 4 |
+----------------+---------+-------------+-----+
I'm using Oracle SQL. I tried the query below, but the result is not correct.
SELECT num, max(name), max(ext), MAX(h_id) maxm FROM history
INNER JOIN street on street.str_id = history.str_id
INNER JOIN point on point.p_id = history.p_id
GROUP BY point.num
In Oracle, you can use keep:
SELECT p.num,
MAX(h.h_id) as maxm,
MAX(s.name) KEEP (DENSE_RANK FIRST ORDER BY h.h_id DESC) as name,
MAX(s.ext) KEEP (DENSE_RANK FIRST ORDER BY h.h_id DESC) as ext
FROM history h INNER JOIN
street s
ON s.str_id = h.str_id INNER JOIN
point p
ON p.p_id = h.p_id
GROUP BY p.num;
The keep syntax allows you to do "first()" and "last()" for aggregations.
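To cross-check the expected result without Oracle, the same "newest history row per point" logic can be written with a correlated subquery on MAX(h_id). This is a sketch run through Python's sqlite3 module, so the dialect is SQLite rather than Oracle and KEEP is not used. It filters per p_id instead of grouping by num, which gives the same rows as long as each p_id has exactly one num.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE history (h_id INTEGER, p_id INTEGER, str_id INTEGER);
CREATE TABLE point   (p_id INTEGER, num INTEGER);
CREATE TABLE street  (str_id INTEGER, ext TEXT, name TEXT);
-- Sample data from the question.
INSERT INTO history VALUES (1, 1, 11), (2, 5, 15), (3, 5, 23), (4, 1, 62);
INSERT INTO point   VALUES (1, 4564), (5, 3453);
INSERT INTO street  VALUES (15, NULL, 'Mein st. 33'), (11, NULL, 'eck st. 42'),
                           (62, 'abc', 'Main st. 33'), (23, 'efg', 'Back st. 42');
""")

query = """
SELECT p.num, s.ext, s.name, h.h_id
FROM history h
JOIN street s ON s.str_id = h.str_id
JOIN point  p ON p.p_id   = h.p_id
-- Keep only the newest history entry of each point.
WHERE h.h_id = (SELECT MAX(h2.h_id) FROM history h2 WHERE h2.p_id = h.p_id)
ORDER BY p.num
"""
rows = conn.execute(query).fetchall()
print(rows)  # [(3453, 'efg', 'Back st. 42', 3), (4564, 'abc', 'Main st. 33', 4)]
```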
I'm building a directory of users, where:
each user can have an account on one or more external services, and
each of these accounts can have one or more email addresses.
What I want to know is, how can I aggregate these accounts into single identities through common email addresses?
For example, let's say I have two services, A and B. For each service, I have a table that relates an account to one or more email addresses.
So if service A has these account email addresses:
account_id | email_address
-----------|--------------
1 | a@foo.com
1 | b@foo.com
2 | c@foo.com
and service B has these account email addresses:
account_id | email_address
-----------|--------------
3 | a@foo.com
3 | a@bar.com
4 | d@foo.com
I'd like to create a table that aggregates the email addresses of these accounts into a single user identity:
user_id | email_address
--------|--------------
X | a@foo.com
X | b@foo.com
X | a@bar.com
Y | c@foo.com
Z | d@foo.com
As you can see, account 1 from service A and account 3 from service B have been merged into a common user X, based on the common email address a@foo.com.
The closest answer I could find is this one, and I suspect the solution is a recursive CTE, but given the inputs and engine are different I'm having trouble implementing it.
Clarification: I'm looking for a solution that handles an arbitrary number of services, so perhaps the input table might be better off as:
service_id | account_id | email_address
-----------|------------|--------------
A | 1 | a@foo.com
A | 1 | b@foo.com
A | 2 | c@foo.com
B | 3 | a@foo.com
B | 3 | a@bar.com
B | 4 | d@foo.com
WITH combined AS (
SELECT
a.email as a_email,
b.email as b_email,
array_remove(ARRAY[a.id, b.id], NULL) as ids
FROM
a
FULL OUTER JOIN b ON (a.email = b.email)
), clustered AS (
SELECT DISTINCT
ids
FROM (
SELECT DISTINCT ON (unnest_ids)
*,
unnest(ids) as unnest_ids
FROM combined
ORDER BY unnest_ids, array_length(ids, 1) DESC
) s
)
SELECT DISTINCT
new_id,
unnest(array_cat) as email
FROM (
SELECT
array_cat(
array_agg(a_email) FILTER (WHERE a_email IS NOT NULL),
array_agg(b_email) FILTER (WHERE b_email IS NOT NULL)
),
row_number() OVER () as new_id
FROM combined co
JOIN clustered cl
ON co.ids <@ cl.ids
GROUP BY cl.ids
) s
Step by step explanation:
For explanation I'll take this dataset. This is a little bit more complex than yours. It can illustrate my steps better. Some problems don't occur in your smaller set. Think about the characters as variables for email addresses.
Table A:
| id | email |
|----|-------|
| 1 | a |
| 1 | b |
| 2 | c |
| 5 | e |
Table B
| id | email |
|----|-------|
| 3 | a |
| 3 | d |
| 4 | e |
| 4 | f |
| 3 | b |
CTE combined:
A FULL OUTER JOIN of both tables on equal email addresses gives the touch points. The ids of accounts that share an address are collected into one array:
| a_email | b_email | ids |
|---------|---------|-----|
| a | a | 1,3 |
| b | b | 1,3 |
| c | (null) | 2 |
| (null) | d | 3 |
| e | e | 5,4 |
| (null) | f | 4 |
CTE clustered (sorry for the names...):
The goal is to have every id appear in exactly one array. In combined you can see, for example, that there is currently more than one array containing the element 4: {5,4} and {4}.
First, order the rows by the length of their ids arrays, because the DISTINCT ON later should pick the longest array (the one holding the touch point, {5,4} rather than {4}).
Then unnest the ids arrays to get a basis for filtering. This results in:
| a_email | b_email | ids | unnest_ids |
|---------|---------|-----|------------|
| b | b | 1,3 | 1 |
| a | a | 1,3 | 1 |
| c | (null) | 2 | 2 |
| b | b | 1,3 | 3 |
| a | a | 1,3 | 3 |
| (null) | d | 3 | 3 |
| e | e | 5,4 | 4 |
| (null) | f | 4 | 4 |
| e | e | 5,4 | 5 |
After filtering with DISTINCT ON
| a_email | b_email | ids | unnest_ids |
|---------|---------|-----|------------|
| b | b | 1,3 | 1 |
| c | (null) | 2 | 2 |
| b | b | 1,3 | 3 |
| e | e | 5,4 | 4 |
| e | e | 5,4 | 5 |
We are only interested in the ids column with the generated unique id clusters. So we need all of them only once. This is the job of the last DISTINCT. So CTE clustered results in
| ids |
|-----|
| 2 |
| 1,3 |
| 5,4 |
Now we know which ids are combined and should share their data. Next we join the clustered ids against the original tables. Since we have already done this in the CTE combined, we can reuse that part (which is why it was factored out into its own CTE: we do not need another join of both tables in this step). The JOIN operator <@ says: join if the "touch point" array of combined is a subset of the id cluster of clustered. This yields:
| a_email | b_email | ids | ids |
|---------|---------|-----|-----|
| c | (null) | 2 | 2 |
| a | a | 1,3 | 1,3 |
| b | b | 1,3 | 1,3 |
| (null) | d | 3 | 1,3 |
| e | e | 5,4 | 5,4 |
| (null) | f | 4 | 5,4 |
Now we are able to group the email addresses by using the clustered ids (rightmost column).
array_agg aggregates the mails of one column, array_cat concatenates the email arrays of both columns into one big email array.
Since there are rows where one of the email columns is NULL, we filter these values out before aggregating with the FILTER (WHERE...) clause.
Result so far:
| array_cat |
|-----------|
| c |
| a,b,a,b,d |
| e,e,f |
Now all email addresses are grouped per cluster, and we have to generate new unique ids. That's what the window function row_number is for. It simply adds a row count to the table:
| array_cat | new_id |
|-----------|--------|
| c | 1 |
| a,b,a,b,d | 2 |
| e,e,f | 3 |
Last step is to unnest the array to get a row per email address. Since in the array are still some duplicates we can eliminate them in this step with a DISTINCT as well:
| new_id | email |
|--------|-------|
| 1 | c |
| 2 | a |
| 2 | b |
| 2 | d |
| 3 | e |
| 3 | f |
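As a cross-check of the result above, and as one possible way to handle an arbitrary number of services, the same grouping can be computed outside the database with a union-find (disjoint-set) structure. This is a sketch in Python, not part of the SQL answer. The function name merge_identities and the (service_id, account_id, email_address) row layout are assumptions for illustration, and the data is the explanation dataset from above with table A and table B as the two services.

```python
from collections import defaultdict


def merge_identities(rows):
    """Group email addresses that are linked through shared accounts.

    rows: iterable of (service_id, account_id, email_address) tuples.
    Returns a sorted list of sorted email lists, one list per identity.
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    # Each row links one account node to one email node.
    for service_id, account_id, email in rows:
        union(("account", service_id, account_id), ("email", email))

    groups = defaultdict(set)
    for node in list(parent):
        if node[0] == "email":
            groups[find(node)].add(node[1])
    return sorted(sorted(group) for group in groups.values())


rows = [
    ("A", 1, "a"), ("A", 1, "b"), ("A", 2, "c"), ("A", 5, "e"),
    ("B", 3, "a"), ("B", 3, "d"), ("B", 4, "e"), ("B", 4, "f"), ("B", 3, "b"),
]
identities = merge_identities(rows)
print(identities)  # [['a', 'b', 'd'], ['c'], ['e', 'f']]
```

The three groups match the three new_id values in the final table. Unlike the two-table join, this also follows chains of any length, for example account 1 linked to account 3 by one address and account 3 linked to a third account by another.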
OK, provided you only have two 'services', and assuming that to begin with you are not overly concerned with how to best represent the new key (I've used text as the easiest to hand), then please try the below query. This works for me on Postgres 9.6:
WITH shared_addr AS
(
SELECT foo.account_a, foo.account_b, row_number() OVER (ORDER BY foo.account_a) AS shared_id
FROM (
SELECT
a.account_id as account_a
, b.account_id as account_b
FROM
service_a a
JOIN
service_b b
ON
a.email_address = b.email_address
GROUP BY a.account_id, b.account_id
) foo
)
SELECT
bar.account_id,
bar.email_address
FROM
(
SELECT
'A-' || service_a.account_id::text AS account_id,
service_a.email_address
FROM service_a
LEFT OUTER JOIN
shared_addr
ON
shared_addr.account_a = service_a.account_id
WHERE shared_addr.account_b IS NULL
UNION ALL
SELECT
'B-' ||service_b.account_id::text,
service_b.email_address FROM service_b
LEFT OUTER JOIN
shared_addr
ON
shared_addr.account_b = service_b.account_id
WHERE shared_addr.account_a IS NULL
UNION ALL
(
SELECT
'shared-' || shared_addr.shared_id::text,
service_b.email_address
FROM service_b
JOIN
shared_addr
ON
shared_addr.account_b = service_b.account_id
UNION
SELECT
'shared-' || shared_addr.shared_id::text,
service_a.email_address
FROM service_a
JOIN
shared_addr
ON
shared_addr.account_a = service_a.account_id
)
) bar
;
I have a team of people who are scored on up to three metrics: Sales, Leads and Hours.
I have a table (tblScores) in MS Access which holds these scores, but only if there are any (e.g. if someone had no sales, there would be no Sales entry for them).
| USERID | Metric | Score |
----------------------------------
| 20511 | Sales | 12 |
| 20511 | Leads | 9 |
| 20511 | Hours | 8 |
| 20694 | Sales | 10 |
| 20694 | Hours | 7.5 |
I am trying to create an SQL query that will output three records (each possible metric) for each User in the above table including null values where they don't have an entry for that metric. e.g
| USERID | Metric | Score |
----------------------------------
| 20511 | Sales | 12 |
| 20511 | Leads | 9 |
| 20511 | Hours | 8 |
| 20694 | Sales | 10 |
| 20694 | Leads | Null |
| 20694 | Hours | 7.5 |
I have set up another table (tblMetrics) with just these 3 metrics
| Metric |
---------------
| Sales |
| Leads |
| Hours |
and tried to do a left join on the metric table against the score table
SELECT tblMetrics.*, TblScores.UserID, TblScores.Score
FROM tblMetrics LEFT JOIN TblScores ON tblMetrics.Metric = TblScores.Metric;
but it is still not giving the desired output. Does anyone know if this is possible?
You need to do a CROSS JOIN first to generate all user/metric combinations, then do the LEFT JOIN to find which ones are missing and get NULL for them.
I checked the Access syntax, and the CROSS JOIN should be written like this:
SELECT DISTINCT M.Metric, S.USERID
FROM tblMetrics M, tblScores S
And the LEFT JOIN should be:
SELECT userMetric.*, S.Score
FROM ( SELECT DISTINCT M.Metric, S.USERID
FROM tblMetrics M, tblScores S
) userMetric
LEFT JOIN tblScores S
ON ( userMetric.USERID = S.USERID
AND userMetric.Metric = S.Metric )
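The two steps can be tried against the sample data with the sketch below. It uses Python's sqlite3 module instead of Access, so the dialect is SQLite and may differ from Access in details such as alias syntax. The alias um and the ORDER BY are additions for readable output.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tblScores  (USERID INTEGER, Metric TEXT, Score REAL);
CREATE TABLE tblMetrics (Metric TEXT);
-- Sample data from the question: user 20694 has no Leads row.
INSERT INTO tblScores VALUES
  (20511, 'Sales', 12), (20511, 'Leads', 9), (20511, 'Hours', 8),
  (20694, 'Sales', 10), (20694, 'Hours', 7.5);
INSERT INTO tblMetrics VALUES ('Sales'), ('Leads'), ('Hours');
""")

query = """
SELECT um.USERID, um.Metric, s.Score
FROM (
    -- Cross join: every known user paired with every possible metric.
    SELECT DISTINCT m.Metric, u.USERID
    FROM tblMetrics m, tblScores u
) AS um
LEFT JOIN tblScores s
       ON um.USERID = s.USERID
      AND um.Metric = s.Metric
ORDER BY um.USERID, um.Metric
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
# The row for user 20694 and metric 'Leads' has Score None (NULL).
```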
I have a Clients table in PostgreSQL (version 9.1.11), and I would like to write a query to filter that table. The query should return only clients which meet one of the following conditions:
--The client's last order (based on orders.created_at) has a fulfill_by_date in the past.
OR
--The client has no orders at all
I've looked for around 2 months, on and off, for a solution.
I've looked at custom last aggregate functions in Postgres, but could not get them to work, and feel there must be a built-in way to do this.
I've also looked at Postgres last_value window functions, but most of the examples are of a single table, not of a query joining multiple tables.
Any help would be greatly appreciated! Here is a sample of what I am going for:
Clients table:
| client_id | client_name |
----------------------------
| 1 | FirstClient |
| 2 | SecondClient |
| 3 | ThirdClient |
Orders table:
| order_id | client_id | fulfill_by_date | created_at |
-------------------------------------------------------
| 1 | 1 | 3000-01-01 | 2013-01-01 |
| 2 | 1 | 1999-01-01 | 2013-01-02 |
| 3 | 2 | 1999-01-01 | 2013-01-01 |
| 4 | 2 | 3000-01-01 | 2013-01-02 |
Desired query result:
| client_id | client_name |
----------------------------
| 1 | FirstClient |
| 3 | ThirdClient |
Try it this way
SELECT c.client_id, c.client_name
FROM clients c LEFT JOIN
(
SELECT *, ROW_NUMBER() OVER (PARTITION BY client_id ORDER BY created_at DESC) rnum
FROM orders
) o
ON c.client_id = o.client_id
AND o.rnum = 1
WHERE o.fulfill_by_date < CURRENT_DATE
OR o.order_id IS NULL
Output:
| CLIENT_ID | CLIENT_NAME |
|-----------|-------------|
| 1 | FirstClient |
| 3 | ThirdClient |
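A self-contained way to try this query against the sample data is sketched below. It runs on SQLite through Python's sqlite3 module rather than PostgreSQL, stores the dates as ISO-8601 text (which compares correctly as strings), and needs an SQLite build with window function support (3.25 or later). The inner alias ord and the final ORDER BY are additions for clarity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE clients (client_id INTEGER, client_name TEXT);
CREATE TABLE orders  (order_id INTEGER, client_id INTEGER,
                      fulfill_by_date TEXT, created_at TEXT);
-- Sample data from the question; ThirdClient has no orders.
INSERT INTO clients VALUES (1, 'FirstClient'), (2, 'SecondClient'), (3, 'ThirdClient');
INSERT INTO orders VALUES
  (1, 1, '3000-01-01', '2013-01-01'),
  (2, 1, '1999-01-01', '2013-01-02'),
  (3, 2, '1999-01-01', '2013-01-01'),
  (4, 2, '3000-01-01', '2013-01-02');
""")

query = """
SELECT c.client_id, c.client_name
FROM clients c
LEFT JOIN (
    -- Number each client's orders from newest to oldest.
    SELECT ord.*,
           ROW_NUMBER() OVER (PARTITION BY client_id ORDER BY created_at DESC) AS rnum
    FROM orders ord
) o
  ON c.client_id = o.client_id
 AND o.rnum = 1                          -- only the latest order per client
WHERE o.fulfill_by_date < CURRENT_DATE   -- latest order is past due
   OR o.order_id IS NULL                 -- or the client has no orders
ORDER BY c.client_id
"""
rows = conn.execute(query).fetchall()
print(rows)  # [(1, 'FirstClient'), (3, 'ThirdClient')]
```

Putting rnum = 1 in the ON clause rather than the WHERE clause is what keeps clients without orders in the result: the LEFT JOIN still produces their row, with all order columns NULL.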