Remove Duplicate Result on Query - sql

Could someone help me solve this duplication problem? My query returns more than one row for the same record. I want only one row for each id, with only the last history entry of each record.
My Query:
SELECT DISTINCT ON(tickets.ticket_id,ticket_histories.created_at)
ticket.id AS ticket_id,
tickets.priority,
tickets.title,
tickets.company,
tickets.ticket_statuse,
tickets.created_at AS created_ticket,
group_user.id AS group_id,
group_user.name AS user_group,
ch_history.description AS ch_description,
ch_history.created_at AS ch_history
FROM
tickets
INNER JOIN company ON (company.id = tickets.company_id)
INNER JOIN (SELECT id,
tickets_id,
description,
user_id,
MAX(tickets.created_at) AS created_ticket
FROM
ch_history
GROUP BY id,
created_at,
ticket_id,
user_id,
description
ORDER BY created_at DESC LIMIT 1) AS ch_history ON (ch_history.ticket_id = ticket.id)
INNER JOIN users ON (users.id = ch_history.user_id)
INNER JOIN group_users ON (group_users.id = users.group_user_id)
WHERE company = 15
GROUP BY
tickets.id,
ch_history.created_at DESC;
This is the result of my query. It returns 3 or 5 rows with the same id and different histories.
I want only one row per ticket, with only the last recorded history of each ticket.
ticket_id | priority | title | company_id | ticket_statuse | created_ticket | company | user_group | group_id | ch_description | ch_history
-----------+------------+--------------------------------------+------------+-----------------+----------------------------+------------------------------------------------------+-----------------+----------+------------------------+----------------------------
49713 | 2 | REMOVE DATA | 1 | t | 2019-12-09 17:50:35.724485 | SAME COMPANY | people | 5 | TEST 1 | 2019-12-10 09:31:45.780667
49706 | 2 | INCLUDE DATA | 1 | f | 2019-12-09 09:16:35.320708 | SAME COMPANY | people | 5 | TEST 2 | 2019-12-10 09:38:52.769515
49706 | 2 | ANY TITLE | 1 | f | 2019-12-09 09:16:35.320708 | SAME COMPANY | people | 5 | TEST 3 | 2019-12-10 09:39:22.779473
49706 | 2 | NOTING ELSE MAT | 1 | f | 2019-12-09 09:16:35.320708 | SAME COMPANY | people | 5 | TESTE 4 | 2019-12-10 09:42:59.50332
49706 | 2 | WHITESTRIPES | 1 | f | 2019-12-09 09:16:35.320708 | SAME COMPANY | people | 5 | TEST 5 | 2019-12-10 09:44:30.675434
I want it to return the following:
ticket_id | priority | title | company_id | ticket_statuse | created_ticket | company | user_group | group_id | ch_description | ch_history
-----------+------------+--------------------------------------+------------+-----------------+----------------------------+------------------------------------------------------+-----------------+----------+------------------------+----------------------------
49713 | 2 | REMOVE DATA | 1 | t | 2019-12-09 17:50:10.724485 | SAME COMPANY | people | 5 | TEST 1 | 2020-01-01 18:31:45.780667
49707 | 2 | INCLUDE DATA | 1 | f | 2019-12-11 19:22:21.320701 | SAME COMPANY | people | 5 | TEST 2 | 2020-02-05 16:38:52.769515
49708 | 2 | ANY TITLE | 1 | f | 2019-12-15 07:15:57.320950 | SAME COMPANY | people | 5 | TEST 3 | 2020-02-06 07:39:22.779473
49709 | 2 | NOTING ELSE MAT | 1 | f | 2019-12-16 08:30:28.320881 | SAME COMPANY | people | 5 | TESTE 4 | 2020-01-07 11:42:59.50332
49701 | 2 | WHITESTRIPES | 1 | f | 2019-12-21 11:04:00.320450 | SAME COMPANY | people | 5 | TEST 5 | 2020-01-04 10:44:30.675434
In the desired output above, ch_description and ch_history hold only the most recent history record for each ticket, and no ticket is listed more than once. Could you help me get this result?

Two things jump out at me:
You have listed "created at" as part of your "distinct on," which is going to inherently give you multiple rows per ticket id (unless there happens to be only one)
The distinct on should make the subquery on the ticket history unnecessary... and even if you chose to do it this way, you again are going on the "created at" column, which will give you multiple results. The ideal subquery, should you choose this approach, would have been to group by ticket_id and only ticket_id.
Slightly related:
An alternative approach to the subquery would be an analytic function (windowing function), but I'll save that for another day.
I think the query you want, which will give you one row per ticket_id, based on the history table's created_at field would be something like this:
select distinct on (t.id)
<your fields here>
from
tickets t
join company c on t.company_id = c.id
join ch_history ch on ch.ticket_id = t.id
join users u on ch.user_id = u.id
join group_users g on u.group_user_id = g.id
where
company = 15
order by
t.id, ch.created_at desc -- this is what tells distinct on which record to choose; desc keeps the latest history row
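To see how DISTINCT ON picks a single row per ticket, here is a small self-contained sketch for PostgreSQL. The inline rows are made up for illustration and only loosely follow the question's sample; the real tables have more columns. Within each t.id, DISTINCT ON keeps the first row in ORDER BY order, so the history timestamp has to sort descending to get the latest entry.

```sql
-- Illustrative data only; column lists are trimmed to what the example needs.
WITH tickets (id, title) AS (
    VALUES (49706, 'INCLUDE DATA'),
           (49713, 'REMOVE DATA')
),
ch_history (ticket_id, description, created_at) AS (
    VALUES (49706, 'TEST 2', timestamp '2019-12-10 09:38:52'),
           (49706, 'TEST 5', timestamp '2019-12-10 09:44:30'),
           (49713, 'TEST 1', timestamp '2019-12-10 09:31:45')
)
SELECT DISTINCT ON (t.id)
       t.id           AS ticket_id,
       t.title,
       ch.description AS ch_description,
       ch.created_at  AS ch_history
FROM tickets t
JOIN ch_history ch ON ch.ticket_id = t.id
ORDER BY t.id, ch.created_at DESC;  -- newest history row sorts first within each ticket
```

With these rows the query returns one row for ticket 49706 (description TEST 5) and one for ticket 49713 (description TEST 1).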

Related

Complex nested aggregations to get order totals

I have a system to track orders and related expenditures. This is a Rails app running on PostgreSQL. 99% of my app gets by with plain old Rails Active Record calls. This one is ugly.
The expenditures table looks like this:
+----+----------+-----------+------------------------+
| id | category | parent_id | note |
+----+----------+-----------+------------------------+
| 1 | order | nil | order with no invoices |
+----+----------+-----------+------------------------+
| 2 | order | nil | order with invoices |
+----+----------+-----------+------------------------+
| 3 | invoice | 2 | invoice for order 2 |
+----+----------+-----------+------------------------+
| 4 | invoice | 2 | invoice for order 2 |
+----+----------+-----------+------------------------+
Each expenditure has many expenditure_items, and orders can be parents to invoices. The expenditure_items table looks like this:
+----+----------------+-------------+-------+---------+
| id | expenditure_id | cbs_item_id | total | note |
+----+----------------+-------------+-------+---------+
| 1 | 1 | 1 | 5 | Fuit |
+----+----------------+-------------+-------+---------+
| 2 | 1 | 2 | 15 | Veggies |
+----+----------------+-------------+-------+---------+
| 3 | 2 | 1 | 123 | Fuit |
+----+----------------+-------------+-------+---------+
| 4 | 2 | 2 | 456 | Veggies |
+----+----------------+-------------+-------+---------+
| 5 | 3 | 1 | 34 | Fuit |
+----+----------------+-------------+-------+---------+
| 6 | 3 | 2 | 76 | Veggies |
+----+----------------+-------------+-------+---------+
| 7 | 4 | 1 | 26 | Fuit |
+----+----------------+-------------+-------+---------+
| 8 | 4 | 2 | 98 | Veggies |
+----+----------------+-------------+-------+---------+
I need to track a few things:
amounts left to be invoiced on orders (that's easy)
above but rolled up for each cbs_item_id (this is the ugly part)
The cbs_item_id is basically an accounting code to categorize the money spent etc. I have visualized what my end result would look like:
+-------------+----------------+-------------+---------------------------+-----------+
| cbs_item_id | expenditure_id | order_total | invoice_total | remaining |
+-------------+----------------+-------------+---------------------------+-----------+
| 1 | 1 | 5 | 0 | 5 |
+-------------+----------------+-------------+---------------------------+-----------+
| 1 | 2 | 123 | 60 | 63 |
+-------------+----------------+-------------+---------------------------+-----------+
| | | | Rollup for cbs_item_id: 1 | 68 |
+-------------+----------------+-------------+---------------------------+-----------+
| 2 | 1 | 15 | 0 | 15 |
+-------------+----------------+-------------+---------------------------+-----------+
| 2 | 2 | 456 | 174 | 282 |
+-------------+----------------+-------------+---------------------------+-----------+
| | | | Rollup for cbs_item_id: 2 | 297 |
+-------------+----------------+-------------+---------------------------+-----------+
order_total is the sum of total for all the expenditure_items of the given order (category = 'order'). invoice_total is the sum of total for all the expenditure_items with parent_id = expenditures.id. Remaining is calculated as the difference (but not less than 0). In real terms the idea here is that you place an order for $1000 and $750 of invoices come in. I need to calculate the $250 left on the order (remaining), broken down into each category (cbs_item_id). Then I need the roll-up of all the remaining values grouped by cbs_item_id.
So for each cbs_item_id I need to group by each order, find the total for the order, find the total invoiced against the order, then subtract the two (the result also can't be negative). It has to be on a per-order basis; the overall aggregate difference will not return the expected results.
In the end I'm looking for a result something like this:
+-------------+-----------+
| cbs_item_id | remaining |
+-------------+-----------+
| 1 | 68 |
+-------------+-----------+
| 2 | 297 |
+-------------+-----------+
I am guessing this might be a combination of GROUP BY and perhaps a sub query or even CTE (voodoo to me). My SQL skills are not that great and this is WAY above my pay grade.
Here is a fiddle for the data above:
http://sqlfiddle.com/#!17/2fe3a
Alternate fiddle:
https://dbfiddle.uk/?rdbms=postgres_11&fiddle=e9528042874206477efbe0f0e86326fb
This query produces the result you are looking for:
SELECT cbs_item_id, sum(order_total - invoice_total) AS remaining
FROM (
SELECT cbs_item_id
, COALESCE(e.parent_id, e.id) AS expenditure_id -- ①
, COALESCE(sum(total) FILTER (WHERE e.category = 'order' ), 0) AS order_total -- ②
, COALESCE(sum(total) FILTER (WHERE e.category = 'invoice'), 0) AS invoice_total
FROM expenditures e
JOIN expenditure_items i ON i.expenditure_id = e.id
GROUP BY 1, 2 -- ③
) sub
GROUP BY 1
ORDER BY 1;
db<>fiddle here
① Note how I assume a saner table definition with expenditures.parent_id being integer, and true NULL instead of the string 'nil'. This allows the simple use of COALESCE.
② About the aggregate FILTER clause:
Aggregate columns with additional (distinct) filters
③ Using short syntax with ordinal numbers of SELECT list items. Example:
Select first row in each GROUP BY group?
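The question also asks that the remaining amount per order never go negative. One way to cover that, as a sketch rather than a tested drop-in, is to clamp each per-order difference with GREATEST before summing. The inline rows below copy the sample data from the question, with parent_id as a true NULL; the subquery alias per_order and the column alias order_id are names chosen here for illustration.

```sql
WITH expenditures (id, category, parent_id) AS (
    VALUES (1, 'order',   NULL::int),
           (2, 'order',   NULL),
           (3, 'invoice', 2),
           (4, 'invoice', 2)
),
expenditure_items (expenditure_id, cbs_item_id, total) AS (
    VALUES (1, 1, 5),   (1, 2, 15),
           (2, 1, 123), (2, 2, 456),
           (3, 1, 34),  (3, 2, 76),
           (4, 1, 26),  (4, 2, 98)
)
SELECT cbs_item_id,
       -- clamp per order first, then roll up per cbs_item_id
       sum(GREATEST(order_total - invoice_total, 0)) AS remaining
FROM (
    SELECT i.cbs_item_id,
           COALESCE(e.parent_id, e.id) AS order_id,  -- invoices are counted under their parent order
           COALESCE(sum(i.total) FILTER (WHERE e.category = 'order'),   0) AS order_total,
           COALESCE(sum(i.total) FILTER (WHERE e.category = 'invoice'), 0) AS invoice_total
    FROM expenditures e
    JOIN expenditure_items i ON i.expenditure_id = e.id
    GROUP BY i.cbs_item_id, COALESCE(e.parent_id, e.id)
) per_order
GROUP BY cbs_item_id
ORDER BY cbs_item_id;
```

On the sample data no order is over-invoiced, so the clamp changes nothing and the result is 68 for cbs_item_id 1 and 297 for cbs_item_id 2. It only matters once invoices exceed an order's total for some cbs_item_id.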
Can I get the total of all the remaining values across all rows, or do I need to wrap that in another subselect?
There is a very concise option with GROUPING SETS:
...
GROUP BY GROUPING SETS ((1), ()) -- that's all :)
db<>fiddle here
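As a minimal illustration of what the empty grouping set adds, here is a sketch that assumes the per-order remaining values are already computed; the CTE per_order and its rows are stand-ins taken from the expected output in the question. GROUPING SETS requires PostgreSQL 9.5 or later.

```sql
WITH per_order (cbs_item_id, remaining) AS (
    VALUES (1, 5), (1, 63), (2, 15), (2, 282)
)
SELECT cbs_item_id, sum(remaining) AS remaining
FROM per_order
GROUP BY GROUPING SETS ((cbs_item_id), ())  -- one row per cbs_item_id, plus one grand-total row
ORDER BY cbs_item_id;                       -- NULL sorts last in ascending order, so the total comes at the end
```

This returns 68 and 297 for the two cbs_item_id values, followed by a row with a NULL cbs_item_id and the overall total 365.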
Related:
Converting rows to columns

SQL - joining 3 tables and choosing newest logged entry per id

I have a rather complicated riddle to solve, and so far I've been unlucky.
I have 3 tables which I need to join to get the result.
Most important is that I need the highest h_id per p_id. h_id is a unique entry in the log history, and I need the newest one for a given point (p_id -> num).
Apart from that I need ext and name as well.
history
+----------------+---------+--------+
| h_id | p_id | str_id |
+----------------+---------+--------+
| 1 | 1 | 11 |
| 2 | 5 | 15 |
| 3 | 5 | 23 |
| 4 | 1 | 62 |
+----------------+---------+--------+
point
+----------------+---------+
| p_id | num |
+----------------+---------+
| 1 | 4564 |
| 5 | 3453 |
+----------------+---------+
street
+----------------+---------+-------------+
| str_id | ext | name |
+----------------+---------+-------------+
| 15 | | Mein st. 33 | - bad name
| 11 | | eck st. 42 | - bad name
| 62 | abc | Main st. 33 |
| 23 | efg | Back st. 42 |
+----------------+---------+-------------+
EXPECTED RESULT
+----------------+---------+-------------+-----+
| num | ext | name |h_id |
+----------------+---------+-------------+-----+
| 3453 | efg | Back st. 42 | 3 |
| 4564 | abc | Main st. 33 | 4 |
+----------------+---------+-------------+-----+
I'm using Oracle SQL. I tried the query below, but the result is not correct.
SELECT num, max(name), max(ext), MAX(h_id) maxm FROM history
INNER JOIN street on street.str_id = history.str_id
INNER JOIN point on point.p_id = history.p_id
GROUP BY point.num
In Oracle, you can use keep:
SELECT p.num,
MAX(h.h_id) as maxm,
MAX(s.name) KEEP (DENSE_RANK FIRST ORDER BY h.h_id DESC) as name,
MAX(s.ext) KEEP (DENSE_RANK FIRST ORDER BY h.h_id DESC) as ext
FROM history h INNER JOIN
street s
ON s.str_id = h.str_id INNER JOIN
point p
ON p.p_id = h.p_id
GROUP BY p.num;
The keep syntax allows you to do "first()" and "last()" for aggregations.
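For anyone who wants to try KEEP without creating tables, here is a self-contained Oracle sketch. The inline rows copy the sample data from the question, and the VARCHAR2(10) cast on the empty ext values is an assumption made only so the UNION ALL branches have matching types.

```sql
WITH history AS (
    SELECT 1 AS h_id, 1 AS p_id, 11 AS str_id FROM dual UNION ALL
    SELECT 2, 5, 15 FROM dual UNION ALL
    SELECT 3, 5, 23 FROM dual UNION ALL
    SELECT 4, 1, 62 FROM dual
),
point AS (
    SELECT 1 AS p_id, 4564 AS num FROM dual UNION ALL
    SELECT 5, 3453 FROM dual
),
street AS (
    SELECT 15 AS str_id, CAST(NULL AS VARCHAR2(10)) AS ext, 'Mein st. 33' AS name FROM dual UNION ALL
    SELECT 11, NULL,  'eck st. 42'  FROM dual UNION ALL
    SELECT 62, 'abc', 'Main st. 33' FROM dual UNION ALL
    SELECT 23, 'efg', 'Back st. 42' FROM dual
)
SELECT p.num,
       -- take ext and name from the row with the highest h_id in each group
       MAX(s.ext)  KEEP (DENSE_RANK FIRST ORDER BY h.h_id DESC) AS ext,
       MAX(s.name) KEEP (DENSE_RANK FIRST ORDER BY h.h_id DESC) AS name,
       MAX(h.h_id) AS h_id
FROM history h
JOIN street s ON s.str_id = h.str_id
JOIN point  p ON p.p_id   = h.p_id
GROUP BY p.num
ORDER BY p.num;
```

Because h_id is unique, exactly one row ranks first in each group, so the outer MAX only unwraps that single value. The result should match the expected table in the question: 3453 with efg / Back st. 42 / 3, and 4564 with abc / Main st. 33 / 4.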

Postgres: Aggregate accounts into a single identity by common email address

I'm building a directory of users, where:
each user can have an account on one or more external services, and
each of these accounts can have one or more email addresses.
What I want to know is, how can I aggregate these accounts into single identities through common email addresses?
For example, let's say I have two services, A and B. For each service, I have a table that relates an account to one or more email addresses.
So if service A has these account email addresses:
account_id | email_address
-----------|--------------
1 | a@foo.com
1 | b@foo.com
2 | c@foo.com
and service B has these account email addresses:
account_id | email_address
-----------|--------------
3 | a@foo.com
3 | a@bar.com
4 | d@foo.com
I'd like to create a table that aggregates the email addresses of these accounts into a single user identity:
user_id | email_address
--------|--------------
X | a@foo.com
X | b@foo.com
X | a@bar.com
Y | c@foo.com
Z | d@foo.com
As you can see, account 1 from service A and account 3 from service B have been merged into a common user X, based on the common email address a@foo.com.
The closest answer I could find is this one, and I suspect the solution is a recursive CTE, but given the inputs and engine are different I'm having trouble implementing it.
Clarification: I'm looking for a solution that handles an arbitrary number of services, so perhaps the input table might be better off as:
service_id | account_id | email_address
-----------|------------|--------------
A | 1 | a@foo.com
A | 1 | b@foo.com
A | 2 | c@foo.com
B | 3 | a@foo.com
B | 3 | a@bar.com
B | 4 | d@foo.com
demo1:db<>fiddle, demo2:db<>fiddle
WITH combined AS (
SELECT
a.email as a_email,
b.email as b_email,
array_remove(ARRAY[a.id, b.id], NULL) as ids
FROM
a
FULL OUTER JOIN b ON (a.email = b.email)
), clustered AS (
SELECT DISTINCT
ids
FROM (
SELECT DISTINCT ON (unnest_ids)
*,
unnest(ids) as unnest_ids
FROM combined
ORDER BY unnest_ids, array_length(ids, 1) DESC
) s
)
SELECT DISTINCT
new_id,
unnest(array_cat) as email
FROM (
SELECT
array_cat(
array_agg(a_email) FILTER (WHERE a_email IS NOT NULL),
array_agg(b_email) FILTER (WHERE b_email IS NOT NULL)
),
row_number() OVER () as new_id
FROM combined co
JOIN clustered cl
ON co.ids <@ cl.ids
GROUP BY cl.ids
) s
Step by step explanation:
For explanation I'll take this dataset. This is a little bit more complex than yours. It can illustrate my steps better. Some problems don't occur in your smaller set. Think about the characters as variables for email addresses.
Table A:
| id | email |
|----|-------|
| 1 | a |
| 1 | b |
| 2 | c |
| 5 | e |
Table B
| id | email |
|----|-------|
| 3 | a |
| 3 | d |
| 4 | e |
| 4 | f |
| 3 | b |
CTE combined:
JOIN of both tables on equal email addresses to get a touch point. The ids that share an email address are collected into one array:
| a_email | b_email | ids |
|-----------|-----------|-----|
| (null) | a@bar.com | 3 |
| a@foo.com | a@foo.com | 1,3 |
| b@foo.com | (null) | 1 |
| c@foo.com | (null) | 2 |
| (null) | d@foo.com | 4 |
CTE clustered (sorry for the names...):
The goal is to have every element appear in exactly one array. In combined you can see, for example, that there is currently more than one array with the element 4: {5,4} and {4}.
First we order the rows by the length of their ids arrays, because the DISTINCT later should take the longest array (the one holding the touch point, {5,4} instead of {4}).
Then unnest the ids arrays to get a basis for filtering. This ends in:
| a_email | b_email | ids | unnest_ids |
|---------|---------|-----|------------|
| b | b | 1,3 | 1 |
| a | a | 1,3 | 1 |
| c | (null) | 2 | 2 |
| b | b | 1,3 | 3 |
| a | a | 1,3 | 3 |
| (null) | d | 3 | 3 |
| e | e | 5,4 | 4 |
| (null) | f | 4 | 4 |
| e | e | 5,4 | 5 |
After filtering with DISTINCT ON
| a_email | b_email | ids | unnest_ids |
|---------|---------|-----|------------|
| b | b | 1,3 | 1 |
| c | (null) | 2 | 2 |
| b | b | 1,3 | 3 |
| e | e | 5,4 | 4 |
| e | e | 5,4 | 5 |
We are only interested in the ids column with the generated unique id clusters. So we need all of them only once. This is the job of the last DISTINCT. So CTE clustered results in
| ids |
|-----|
| 2 |
| 1,3 |
| 5,4 |
Now we know which ids are combined and should share their data. Next we join the clustered ids against the origin tables. Since we have already done this in the CTE combined, we can reuse that part (that's the reason it was moved into its own CTE, by the way: we do not need another join of both tables in this step). The JOIN operator <@ says: JOIN if the "touch point" array of combined is a subset of the id cluster of clustered. This yields:
| a_email | b_email | ids | ids |
|---------|---------|-----|-----|
| c | (null) | 2 | 2 |
| a | a | 1,3 | 1,3 |
| b | b | 1,3 | 1,3 |
| (null) | d | 3 | 1,3 |
| e | e | 5,4 | 5,4 |
| (null) | f | 4 | 5,4 |
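The containment test used in that join can be tried in isolation. In PostgreSQL the "is contained by" operator for arrays is written <@; the column aliases below are made up for the example.

```sql
SELECT ARRAY[3]   <@ ARRAY[1,3] AS three_in_cluster,  -- true: 3 is an element of {1,3}
       ARRAY[1,3] <@ ARRAY[1,3] AS same_array,        -- true: an array is contained by itself
       ARRAY[4]   <@ ARRAY[1,3] AS four_in_cluster;   -- false: 4 is not in {1,3}
```

This is why the row with ids {3} attaches to the cluster {1,3} in the table above, while it does not attach to {5,4}.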
Now we are able to group the email addresses by using the clustered ids (rightmost column).
array_agg aggregates the mails of one column, array_cat concatenates the email arrays of both columns into one big email array.
Since there are columns where email is NULL we can filter these values out before clustering with the FILTER (WHERE...) clause.
Result so far:
| array_cat |
|-----------|
| c |
| a,b,a,b,d |
| e,e,f |
Now we group all email addresses for one single id. We have to generate new unique ids. That's what the window function row_number is for. It simply adds a row count to the table:
| array_cat | new_id |
|-----------|--------|
| c | 1 |
| a,b,a,b,d | 2 |
| e,e,f | 3 |
The last step is to unnest the array to get one row per email address. Since the array still contains some duplicates, we can eliminate them in this step with a DISTINCT as well:
| new_id | email |
|--------|-------|
| 1 | c |
| 2 | a |
| 2 | b |
| 2 | d |
| 3 | e |
| 3 | f |
OK, provided you only have two 'services', and assuming that to begin with you are not overly concerned with how to best represent the new key (I've used text as the easiest to hand), then please try the below query. This works for me on Postgres 9.6:
WITH shared_addr AS
(
SELECT foo.account_a, foo.account_b, row_number() OVER (ORDER BY foo.account_a) AS shared_id
FROM (
SELECT
a.account_id as account_a
, b.account_id as account_b
FROM
service_a a
JOIN
service_b b
ON
a.email_address = b.email_address
GROUP BY a.account_id, b.account_id
) foo
)
SELECT
bar.account_id,
bar.email_address
FROM
(
SELECT
'A-' || service_a.account_id::text AS account_id,
service_a.email_address
FROM service_a
LEFT OUTER JOIN
shared_addr
ON
shared_addr.account_a = service_a.account_id
WHERE shared_addr.account_b IS NULL
UNION ALL
SELECT
'B-' ||service_b.account_id::text,
service_b.email_address FROM service_b
LEFT OUTER JOIN
shared_addr
ON
shared_addr.account_b = service_b.account_id
WHERE shared_addr.account_a IS NULL
UNION ALL
(
SELECT
'shared-' || shared_addr.shared_id::text,
service_b.email_address
FROM service_b
JOIN
shared_addr
ON
shared_addr.account_b = service_b.account_id
UNION
SELECT
'shared-' || shared_addr.shared_id::text,
service_a.email_address
FROM service_a
JOIN
shared_addr
ON
shared_addr.account_a = service_a.account_id
)
) bar
;

Repeat all rows in left table for each unique ID in other table

I have a team of people who are scored on up to three metrics: sales, leads and hours.
I have a table (tblScores) in MS Access which holds these scores, but only where a score exists (e.g. if someone had no sales there would be no entry for them for sales).
| USERID | Metric | Score |
----------------------------------
| 20511 | Sales | 12 |
| 20511 | Leads | 9 |
| 20511 | Hours | 8 |
| 20694 | Sales | 10 |
| 20694 | Hours | 7.5 |
I am trying to create an SQL query that will output three records (each possible metric) for each User in the above table including null values where they don't have an entry for that metric. e.g
| USERID | Metric | Score |
----------------------------------
| 20511 | Sales | 12 |
| 20511 | Leads | 9 |
| 20511 | Hours | 8 |
| 20694 | Sales | 10 |
| 20694 | Leads | Null |
| 20694 | Hours | 7.5 |
I have set up another table (tblMetrics) with just these 3 metrics
| Metric |
---------------
| Sales |
| Leads |
| Hours |
and tried to do a left join on the metric table against the score table
SELECT tblMetrics.*, TblScores.UserID, TblScores.Score
FROM tblMetrics LEFT JOIN TblScores ON tblMetrics.Metric = TblScores.Metric;
but it is still not giving the desired output. Does anyone know if this is possible?
You need to do a CROSS JOIN first to generate all combinations, then do the LEFT JOIN to find which ones are missing and assign NULL.
I checked the Access syntax, and the CROSS JOIN should be written like this:
SELECT DISTINCT M.Metric, S.USERID
FROM tblMetrics M, tblScores S
And the LEFT JOIN should be:
SELECT userMetric.*, S.Score
FROM ( SELECT DISTINCT M.Metric, S.USERID
FROM tblMetrics M, tblScores S
) userMetric
LEFT JOIN tblScores S
ON ( userMetric.USERID = S.USERID
AND userMetric.Metric = S.Metric )
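For comparison, the same two-step pattern can be written with an explicit CROSS JOIN in databases that support it. The sketch below uses PostgreSQL syntax, not Access, and builds the sample rows inline; the CTE name users is introduced here for illustration.

```sql
WITH tblMetrics (Metric) AS (
    VALUES ('Sales'), ('Leads'), ('Hours')
),
tblScores (UserID, Metric, Score) AS (
    VALUES (20511, 'Sales', 12.0),
           (20511, 'Leads', 9.0),
           (20511, 'Hours', 8.0),
           (20694, 'Sales', 10.0),
           (20694, 'Hours', 7.5)
),
users AS (
    SELECT DISTINCT UserID FROM tblScores   -- every user that has at least one score
)
SELECT u.UserID, m.Metric, s.Score
FROM users u
CROSS JOIN tblMetrics m                     -- all user/metric combinations
LEFT JOIN tblScores s                       -- attach a score where one exists
       ON s.UserID = u.UserID
      AND s.Metric = m.Metric
ORDER BY u.UserID, m.Metric;
```

This yields six rows, three per user, with a NULL Score for user 20694 on Leads.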

Filter by value in last row of LEFT OUTER JOIN table

I have a Clients table in PostgreSQL (version 9.1.11), and I would like to write a query to filter that table. The query should return only clients which meet one of the following conditions:
--The client's last order (based on orders.created_at) has a fulfill_by_date in the past.
OR
--The client has no orders at all
I've looked for around 2 months, on and off, for a solution.
I've looked at custom last aggregate functions in Postgres, but could not get them to work, and feel there must be a built-in way to do this.
I've also looked at Postgres last_value window functions, but most of the examples are of a single table, not of a query joining multiple tables.
Any help would be greatly appreciated! Here is a sample of what I am going for:
Clients table:
| client_id | client_name |
----------------------------
| 1 | FirstClient |
| 2 | SecondClient |
| 3 | ThirdClient |
Orders table:
| order_id | client_id | fulfill_by_date | created_at |
-------------------------------------------------------
| 1 | 1 | 3000-01-01 | 2013-01-01 |
| 2 | 1 | 1999-01-01 | 2013-01-02 |
| 3 | 2 | 1999-01-01 | 2013-01-01 |
| 4 | 2 | 3000-01-01 | 2013-01-02 |
Desired query result:
| client_id | client_name |
----------------------------
| 1 | FirstClient |
| 3 | ThirdClient |
Try it this way
SELECT c.client_id, c.client_name
FROM clients c LEFT JOIN
(
SELECT *, ROW_NUMBER() OVER (PARTITION BY client_id ORDER BY created_at DESC) rnum
FROM orders
) o
ON c.client_id = o.client_id
AND o.rnum = 1
WHERE o.fulfill_by_date < CURRENT_DATE
OR o.order_id IS NULL
Output:
| CLIENT_ID | CLIENT_NAME |
|-----------|-------------|
| 1 | FirstClient |
| 3 | ThirdClient |
Here is SQLFiddle demo
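If window functions are not wanted, a different technique that should give the same answer is a correlated subquery that fetches the fulfill_by_date of the newest order, combined with NOT EXISTS for clients without orders. This is a sketch on the sample data from the question, not the approach used in the answer above.

```sql
WITH clients (client_id, client_name) AS (
    VALUES (1, 'FirstClient'), (2, 'SecondClient'), (3, 'ThirdClient')
),
orders (order_id, client_id, fulfill_by_date, created_at) AS (
    VALUES (1, 1, date '3000-01-01', date '2013-01-01'),
           (2, 1, date '1999-01-01', date '2013-01-02'),
           (3, 2, date '1999-01-01', date '2013-01-01'),
           (4, 2, date '3000-01-01', date '2013-01-02')
)
SELECT c.client_id, c.client_name
FROM clients c
WHERE NOT EXISTS (SELECT 1 FROM orders o WHERE o.client_id = c.client_id)  -- no orders at all
   OR (SELECT o.fulfill_by_date                                            -- fulfill_by_date of the newest order
       FROM orders o
       WHERE o.client_id = c.client_id
       ORDER BY o.created_at DESC
       LIMIT 1) < CURRENT_DATE
ORDER BY c.client_id;
```

The NOT EXISTS branch is needed because the scalar subquery returns NULL for a client with no orders, and a NULL comparison does not pass the WHERE clause. On the sample rows this returns FirstClient and ThirdClient.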