A better way to aggregate into a default value


For this example I have three tables (individual, business, and ind_to_business). Individual has information on people. Business has information on businesses. And ind_to_business has information on which people are linked to which business. Here are their DDL:
CREATE TABLE individual
(
ID INTEGER PRIMARY KEY,
NAME VARCHAR2(100) NOT NULL,
ENTERPRISE_ID VARCHAR2(25) NOT NULL UNIQUE
);
CREATE TABLE business
(
ID INTEGER PRIMARY KEY,
NAME VARCHAR2(100) NOT NULL,
ENTERPRISE_ID VARCHAR2(25) NOT NULL UNIQUE
);
CREATE TABLE ind_to_business
(
ID INTEGER PRIMARY KEY,
IND_ID REFERENCES individual(id),
BUS_ID REFERENCES business(id),
START_DT DATE NOT NULL,
END_DT DATE
);
I'm looking for the best way to display one row for each person. If they are linked to one business, I want to display the business's ENTERPRISE_ID. If they are linked to more than one business, I want to display the default value 'Multiple'. They will always be linked to a business, so no LEFT JOIN is necessary. They can also be linked to the same business more than once (leaving and coming back). Multiple records for the same business should be aggregated into one.
So for the following sample data:
Individual:
+----+------------+---------------+
| ID | NAME | ENTERPRISE_ID |
+----+------------+---------------+
| 1 | John Smith | 53a23B7 |
| 2 | Jane Doe | 63f2a35 |
+----+------------+---------------+
Business:
+----+----------+---------------+
| ID | NAME | ENTERPRISE_ID |
+----+----------+---------------+
| 3 | ABC Corp | 2a34d9b |
| 4 | XYZ Inc | 34bf21e |
+----+----------+---------------+
ind_to_business
+----+--------+--------+-------------+-------------+
| ID | IND_ID | BUS_ID | START_DT | END_DT |
+----+--------+--------+-------------+-------------+
| 5 | 1 | 3 | 01-JAN-2000 | 31-DEC-2002 |
| 6 | 1 | 3 | 01-JAN-2015 | |
| 7 | 2 | 3 | 01-JAN-2000 | |
| 8 | 2 | 4 | 01-MAR-2006 | 05-JUN-2010 |
| 9 | 2 | 4 | 15-DEC-2019 | |
+----+--------+--------+-------------+-------------+
I would expect the following output:
+---------+------------+------------+
| IND_ID | NAME | LINKED_BUS |
+---------+------------+------------+
| 53a23B7 | John Smith | 2a34d9b |
| 63f2a35 | Jane Doe | Multiple |
+---------+------------+------------+
Here is my current query:
SELECT DISTINCT
sub.ind_id,
sub.name,
DECODE(sub.bus_count, 1, sub.bus_id, 'Multiple') AS LINKED_BUS
FROM (SELECT i.enterprise_id AS IND_ID,
i.name,
b.enterprise_id AS BUS_ID,
COUNT(DISTINCT b.enterprise_id) OVER (PARTITION BY i.id) AS BUS_COUNT
FROM individual i
INNER JOIN ind_to_business i2b ON i.id = i2b.ind_id
INNER JOIN business b ON i2b.bus_id = b.id) sub;
My query works, but it runs on a large dataset and takes a long time. I'm wondering if anyone has ideas on how to improve it so that there isn't so much wasted processing (i.e. needing a DISTINCT on the final result, or doing COUNT(DISTINCT) in the inline view only to use that value in the DECODE above).
I've also created a DBFiddle for this question. (Link)
Thanks in advance for any input.

You could try a correlated subquery, which removes the need for the outer DISTINCT:
SELECT
i.enterprise_id ind_id,
i.name,
(
SELECT DECODE(COUNT(DISTINCT b.enterprise_id), 1, MIN(b.enterprise_id), 'Multiple')
FROM ind_to_business i2b
INNER JOIN business b ON i2b.bus_id = b.id
WHERE i2b.ind_id = i.id
) linked_bus
FROM individual i;

You can join individual to ind_to_business aggregated per individual. One way to do this:
select i.enterprise_id as ind_id, i.name, coalesce(b.enterprise_id, 'Multiple') as linked_bus
from individual i
join
(
select
ind_id,
case when min(bus_id) = max(bus_id) then min(bus_id) else null end as bus_id
from ind_to_business
group by ind_id
) ib on ib.ind_id = i.id
left join business b on b.id = ib.bus_id
order by i.id;

First, use a subquery to get the distinct individual/business pairs, then do the final aggregation with a CASE expression.
select
ind_id,
name,
case
when count(*) > 1 then 'Multiple'
else min(bus_id)
end as linked_bus
from
(
select
distinct i.enterprise_id as ind_id,
i.name,
b.enterprise_id as bus_id
from individual i
join ind_to_business i2b
on i.id = i2b.ind_id
join business b
on i2b.bus_id = b.id
) vals
group by
ind_id,
name
order by
ind_id

There is no need to use DISTINCT twice. You could use subquery factoring: put the inline view in a WITH clause and make the data set DISTINCT there, so the outer query only has to count rows per individual.
WITH data AS
(
SELECT distinct
i.enterprise_id AS IND_ID,
i.name,
b.enterprise_id AS BUS_ID
FROM individual i
JOIN ind_to_business i2b ON i.id = i2b.ind_id
JOIN business b ON i2b.bus_id = b.id
)
SELECT ind_id,
name,
case
when count(*) = 1 then MIN(bus_id)
else 'Multiple'
end AS LINKED_BUS
FROM data
GROUP BY ind_id, name;
IND_ID NAME LINKED_BUS
---------- ---------- -------------------------
53a23B7 John Smith 2a34d9b
63f2a35 Jane Doe Multiple
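The answers above share one idea: group per individual, count the distinct businesses, and fall back to 'Multiple' when the count is not 1. The sketch below checks that idea against the question's sample data, folded into a single GROUP BY with no DISTINCT subquery. It is an assumption-laden stand-in: it runs on an in-memory SQLite database through Python's sqlite3 module rather than on Oracle, the date columns are stored as text, and it says nothing about how the query would perform on a large Oracle data set.

```python
import sqlite3

# In-memory SQLite stand-in for the Oracle schema (types simplified, dates as text).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE individual (id INTEGER PRIMARY KEY, name TEXT NOT NULL, enterprise_id TEXT NOT NULL UNIQUE);
CREATE TABLE business (id INTEGER PRIMARY KEY, name TEXT NOT NULL, enterprise_id TEXT NOT NULL UNIQUE);
CREATE TABLE ind_to_business (
    id INTEGER PRIMARY KEY,
    ind_id INTEGER REFERENCES individual(id),
    bus_id INTEGER REFERENCES business(id),
    start_dt TEXT NOT NULL,
    end_dt TEXT
);
INSERT INTO individual VALUES (1, 'John Smith', '53a23B7'), (2, 'Jane Doe', '63f2a35');
INSERT INTO business VALUES (3, 'ABC Corp', '2a34d9b'), (4, 'XYZ Inc', '34bf21e');
INSERT INTO ind_to_business VALUES
    (5, 1, 3, '2000-01-01', '2002-12-31'),
    (6, 1, 3, '2015-01-01', NULL),
    (7, 2, 3, '2000-01-01', NULL),
    (8, 2, 4, '2006-03-01', '2010-06-05'),
    (9, 2, 4, '2019-12-15', NULL);
""")

# One GROUP BY per individual: a single distinct business yields its
# enterprise_id, anything else yields the default 'Multiple'.
rows = conn.execute("""
SELECT i.enterprise_id AS ind_id,
       i.name,
       CASE WHEN COUNT(DISTINCT b.enterprise_id) = 1
            THEN MIN(b.enterprise_id)
            ELSE 'Multiple'
       END AS linked_bus
FROM individual i
JOIN ind_to_business i2b ON i2b.ind_id = i.id
JOIN business b ON b.id = i2b.bus_id
GROUP BY i.id, i.enterprise_id, i.name
ORDER BY i.id
""").fetchall()

for row in rows:
    print(row)
```

MIN is only there to satisfy the GROUP BY: when the distinct count is 1, every value in the group is the same, so MIN simply returns it.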

Related

PostgreSQL join tables on json field

Let's say we have 2 tables-
Employee:
id integer (pk)
name char
code char
Status:
id integer (pk)
key char
data jsonb
Now here is the sample data of above tables:
Employee
+----+--------+------+
| id | name | code |
+----+--------+------+
| 1 | Brian | BR1 |
| 2 | Andrew | AN1 |
| 3 | Anil | AN2 |
| 4 | Kethi | KE1 |
| 5 | Smith | SM1 |
+----+--------+------+
Status
+----+---------+---------------------------------------+
| id | key | data |
+----+---------+---------------------------------------+
| 1 | Admin | {'BR1':true, 'AN1':true,'KE1':false} |
| 2 | Staff | {'SM1':true, 'AN2':true,'KE1':false} |
| 3 | Member | {'AN2':false, 'AN1':true,'KE1':false} |
| 4 | Parking | {'BR1':true, 'AN1':true,'KE1':false, |
| | | 'AN2':true,'SM1':true} |
| 5 | System | {'AN2':false, 'AN1':true,'KE1':true} |
| 6 | Ticket | {'AN2':false, 'AN1':true,'KE1':false} |
+----+---------+---------------------------------------+
Now my goal is to get, for each employee code, the overall status and the names of the keys that failed.
I am not an expert in complex SQL queries, so any help is much appreciated.
Note: Above are just sample tables (name and data changed), but design is similar to original tables.

You can use the function jsonb_each_text to get the key and value of each entry in a jsonb column, and combine that with ordinary SQL. The following query is an example for your case:
select employee.code,
case
when dat2.count is null then 'TRUE'
else
'FALSE'
end as status,
case
when dat2.count is null then 0
else
dat2.count
end as failures, string_agg as key from employee left join
(
select key, count(*), string_agg(code,',') from (
select key code , (jsonb_each_text(data)).key,(jsonb_each_text(data)).value
from status) as dat
where value='false'
group by 1 ) dat2 on (employee.code=dat2.key)

As far as I can tell, you don't need the employee table for this (because you only want the code column which is also present in the JSON values of the status table). It's enough to unnest the JSON value and then aggregate on the code from that.
select d.code,
bool_and(d.flag::boolean) as status_flag,
count(*) filter (where not d.flag::boolean) as failures,
coalesce(string_agg(key, ', ') filter (where not d.flag::boolean), 'N/A') as keys
from status st
join lateral jsonb_each_text(st.data) as d(code, flag) on true
group by d.code
order by d.code;
The filter() option is used to only include rows in the aggregate that comply with the where condition. In this case those where the value for the code is false.
bool_and is an aggregate function for boolean values that returns true if all input values are true (and false otherwise)
Online example: https://rextester.com/PEKCZ52605
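To make the "unnest, then aggregate per code" step concrete, here is a small pure-Python sketch of the same logic, not the PostgreSQL query itself. The `summarize` function and the `status_rows` list are illustrative names, and the sample JSON is rewritten with double quotes so that Python's `json` module accepts it.

```python
import json
from collections import defaultdict

# Sample rows of the Status table as (key, data) pairs, data being JSON text.
status_rows = [
    ("Admin", '{"BR1": true, "AN1": true, "KE1": false}'),
    ("Staff", '{"SM1": true, "AN2": true, "KE1": false}'),
    ("Member", '{"AN2": false, "AN1": true, "KE1": false}'),
    ("Parking", '{"BR1": true, "AN1": true, "KE1": false, "AN2": true, "SM1": true}'),
    ("System", '{"AN2": false, "AN1": true, "KE1": true}'),
    ("Ticket", '{"AN2": false, "AN1": true, "KE1": false}'),
]


def summarize(rows):
    """Return {code: (all_true, failure_count, failed_keys)} per employee code."""
    flags = defaultdict(list)  # code -> list of (status key, flag)
    for key, data in rows:
        # Counterpart of jsonb_each_text: one (code, flag) pair per JSON entry.
        for code, flag in json.loads(data).items():
            flags[code].append((key, flag))

    result = {}
    for code in sorted(flags):
        # Counterpart of the FILTER clause: keep only the false entries.
        failed = [key for key, flag in flags[code] if not flag]
        # 'not failed' plays the role of bool_and; 'N/A' mirrors the coalesce.
        result[code] = (not failed, len(failed), ", ".join(failed) or "N/A")
    return result


summary = summarize(status_rows)
for code, values in summary.items():
    print(code, values)
```

Unlike string_agg without an ORDER BY, the Python version always lists the failed keys in input order.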

Use a left-join query between those tables, and apply jsonb_each_text() function for jsonb type column.
The trick is to use conditionals as case when (js).value = 'false' then .. else .. end for the aggregated columns :
select e.id, e.code,
min(case when (js).value = 'false' then 'FALSE' else 'TRUE' end ) as status,
count(case when (js).value = 'false' then 1 end) as failures,
coalesce(
string_agg(case when (js).value = 'false' then s.key end, ',' ORDER BY s.id),'NA'
) as key
from Employee e
left join
(
select *, jsonb_each_text(data) as js
from Status
) s on e.code = (js).key
group by e.id, e.code
order by e.id;
where (js).value is extracted from jsonb type Status.data column

Multiple select from CTE with different number of rows in a StoredProcedure

How do I run two SELECTs with joins against the same CTEs and get both results back separately?
I tried UNION ALL, but that appends everything into one list, with no way to tell the two results apart for further use.
WITH campus AS
(SELECT DISTINCT CampusName, DistrictName
FROM dbo.file
),creditAcceptance AS
(SELECT CampusName, EligibilityStatusFinal, CollegeCreditAcceptedFinal, COUNT(id) AS N
FROM dbo.file
WHERE (EligibilityStatusFinal LIKE 'Eligible%') AND (CollegeCreditEarnedFinal = 'Yes') AND (CollegeCreditAcceptedFinal = 'Yes')
GROUP BY CampusName, EligibilityStatusFinal, CollegeCreditAcceptedFinal
),eligibility AS
(SELECT CampusName, EligibilityStatusFinal, COUNT(id) AS N, CollegeCreditAcceptedFinal
FROM dbo.file
WHERE (EligibilityStatusFinal LIKE 'Eligible%')
GROUP BY CampusName, EligibilityStatusFinal, CollegeCreditAcceptedFinal
)
SELECT a.CampusName, c.[EligibilityStatusFinal], SUM(c.N) AS creditacceptCount
FROM campus as a FULL OUTER JOIN creditAcceptance as c ON a.CampusName=c.CampusName
WHERE (a.DistrictName = 'xy')
group by a.CampusName ,c.EligibilityStatusFinal
Union ALL
SELECT a.CampusName , b.[EligibilityStatusFinal], SUM(b.N) AS eligible
From Campus as a FULL OUTER JOIN eligibility as b ON a.CampusName = b.CampusName
WHERE (a.DistrictName = 'xy')
group by a.CampusName,b.EligibilityStatusFinal
Expected output:
+------------+------------------------+--------------------+
| CampusName | EligibilityStatusFinal | creditacceptCount |
+------------+------------------------+--------------------+
| M | G | 1 |
| E | NULL | NULL |
| A | G | 4 |
| B | G | 8 |
+------------+------------------------+--------------------+
+------------+------------------------+----------+
| CampusName | EligibilityStatusFinal | eligible |
+------------+------------------------+----------+
| A | G | 8 |
| C | G | 9 |
| A | T | 9 |
+------------+------------------------+----------+

A CTE can be used in a single statement only, so you can't produce the two expected result sets from one WITH clause.
Here is an excerpt from Microsoft docs:
A CTE must be followed by a single SELECT, INSERT, UPDATE, or DELETE
statement that references some or all the CTE columns. A CTE can also
be specified in a CREATE VIEW statement as part of the defining SELECT
statement of the view.
You can use table variables (declare @campus table(...)) or temp tables (create table #campus (...)) instead.
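The temp-table suggestion can be sketched as follows. This is a simplified analogue rather than the T-SQL from the question: it uses SQLite through Python's sqlite3 module, `file_rows` is a hypothetical stand-in for dbo.file with made-up rows, and the CollegeCreditEarnedFinal filter is left out for brevity. The point is only that a temp table, once materialized, can feed two separate SELECT statements, each with its own result set.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical stand-in for dbo.file with a few made-up rows.
CREATE TABLE file_rows (
    id INTEGER PRIMARY KEY,
    CampusName TEXT,
    DistrictName TEXT,
    EligibilityStatusFinal TEXT,
    CollegeCreditAcceptedFinal TEXT
);
INSERT INTO file_rows VALUES
    (1, 'A', 'xy', 'Eligible G', 'Yes'),
    (2, 'A', 'xy', 'Eligible G', 'No'),
    (3, 'B', 'xy', 'Eligible G', 'Yes'),
    (4, 'C', 'zz', 'Eligible G', 'Yes'),
    (5, 'E', 'xy', 'Not eligible', 'No');

-- Materialize the shared campus list once, instead of defining it as a CTE.
CREATE TEMP TABLE campus AS
SELECT DISTINCT CampusName, DistrictName
FROM file_rows
WHERE DistrictName = 'xy';
""")

# First statement: accepted-credit counts per campus.
accepted = conn.execute("""
SELECT c.CampusName, COUNT(f.id) AS creditacceptCount
FROM campus c
LEFT JOIN file_rows f
       ON f.CampusName = c.CampusName
      AND f.EligibilityStatusFinal LIKE 'Eligible%'
      AND f.CollegeCreditAcceptedFinal = 'Yes'
GROUP BY c.CampusName
ORDER BY c.CampusName
""").fetchall()

# Second, separate statement reusing the same temp table: eligible counts.
eligible = conn.execute("""
SELECT c.CampusName, COUNT(f.id) AS eligible
FROM campus c
LEFT JOIN file_rows f
       ON f.CampusName = c.CampusName
      AND f.EligibilityStatusFinal LIKE 'Eligible%'
GROUP BY c.CampusName
ORDER BY c.CampusName
""").fetchall()

print(accepted)
print(eligible)
```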

How do I join to another table and return only the most recent matching row?

I have a table that stores the lines on a contract. Each contract line has its own unique ID, and it also has the ID of its parent contract. Example:
+-------------+---------+
| contract_id | line_id |
+-------------+---------+
| 1111 | 100 |
| 1111 | 101 |
| 1111 | 102 |
+-------------+---------+
I have another table that stores the historical changes to contract lines. For example, every time the number of units on a contract line is changed a new row is added to the table. Example:
+-------------+---------+--------------+-------+
| contract_id | line_id | date_changed | units |
+-------------+---------+--------------+-------+
| 1111 | 100 | 2016-01-01 | 1 |
| 1111 | 100 | 2016-02-01 | 2 |
| 1111 | 100 | 2016-03-01 | 3 |
+-------------+---------+--------------+-------+
As you can see the contract line with ID 100 belonging to the contract with ID 1111 has been edited 3 times over 3 months. The current value is 3 units.
I'm running a query against the contract lines table to select all data. I want to join to the historical data table and select the most recent row for each contract line and show the units in my results. How do I do this?
Expected results (there would be single results for 101 and 102 as well):
+-------------+---------+-------+
| contract_id | line_id | units |
+-------------+---------+-------+
| 1111 | 100 | 3 |
+-------------+---------+-------+
I've tried the query below with a left join but it returns 3 rows instead of 1.
Query:
SELECT *, T1.units
FROM contract_lines
LEFT JOIN (
SELECT contract_id, line_id, units, MAX(date_changed) AS maxdate
FROM contract_history
GROUP BY contract_id, line_id, units) AS T1
ON contract_lines.contract_id = T1.contract_id
AND contract_lines.line_id = T1.line_id
Actual results:
+-------------+---------+-------+
| contract_id | line_id | units |
+-------------+---------+-------+
| 1111 | 100 | 1 |
| 1111 | 100 | 2 |
| 1111 | 100 | 3 |
+-------------+---------+-------+

An extra join to contract_history along with maxdate will work
SELECT contract_lines.*,T2.units
FROM contract_lines
LEFT JOIN (
SELECT contract_id, line_id, MAX(date_changed) AS maxdate
FROM contract_history
GROUP BY contract_id, line_id) AS T1
JOIN contract_history T2 ON
T1.contract_id=T2.contract_id and
T1.line_id= T2.line_id and
T1.maxdate=T2.date_changed
ON contract_lines.contract_id = T1.contract_id
AND contract_lines.line_id = T1.line_id
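As a rough check of the max-date join, the sketch below runs the same idea on the question's sample data in an in-memory SQLite database via Python's sqlite3 module (an assumption, since the question targets SQL Server). The nested join is flattened into two LEFT JOINs, and the dates are stored as ISO text so that MAX compares them correctly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE contract_lines (contract_id INTEGER, line_id INTEGER);
CREATE TABLE contract_history (
    contract_id INTEGER,
    line_id INTEGER,
    date_changed TEXT,  -- ISO dates, so text comparison matches date order
    units INTEGER
);
INSERT INTO contract_lines VALUES (1111, 100), (1111, 101), (1111, 102);
INSERT INTO contract_history VALUES
    (1111, 100, '2016-01-01', 1),
    (1111, 100, '2016-02-01', 2),
    (1111, 100, '2016-03-01', 3);
""")

# Step 1 (subquery m): latest change date per contract line.
# Step 2 (join h): fetch the history row that carries that date.
latest_units = conn.execute("""
SELECT cl.contract_id, cl.line_id, h.units
FROM contract_lines cl
LEFT JOIN (
    SELECT contract_id, line_id, MAX(date_changed) AS maxdate
    FROM contract_history
    GROUP BY contract_id, line_id
) m ON m.contract_id = cl.contract_id
   AND m.line_id = cl.line_id
LEFT JOIN contract_history h
       ON h.contract_id = m.contract_id
      AND h.line_id = m.line_id
      AND h.date_changed = m.maxdate
ORDER BY cl.line_id
""").fetchall()

for row in latest_units:
    print(row)
```

Lines 101 and 102 have no history rows in the sample data, so their units come back as NULL (None in Python). If two history rows for a line share the same latest date, this approach returns both.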

This is my preferred style because it doesn't require self joining and cleanly expresses your intent. Also, it competes very well with the ROW_NUMBER() method in terms of performance.
select a.*
, b.units
from contract_lines as a
join (
select a.contract_id
, a.line_id
, a.units
, a.date_changed
, Max(a.date_changed) over(partition by a.contract_id, a.line_id) as max_date_changed
from contract_history as a
) as b
on a.contract_id = b.contract_id
and a.line_id = b.line_id
and b.date_changed = b.max_date_changed;

Another possible solution: this one uses RANK to order and filter the history rows. It is similar to what you did, just a different tack.
SELECT contract_lines.*, T1.units
FROM contract_lines
LEFT JOIN (
SELECT contract_id, line_id, units,
RANK() OVER (PARTITION BY contract_id, line_id ORDER BY date_changed DESC) AS [rank]
FROM contract_history) AS T1
ON contract_lines.contract_id = T1.contract_id
AND contract_lines.line_id = T1.line_id
AND T1.[rank] = 1
WHERE T1.units IS NOT NULL
You could change this to an INNER JOIN and remove the IS NOT NULL condition from the WHERE clause if you expect history data to be present all the time.
Glad you figured it out!

Try this simple query:
SELECT TOP 1 T1.*
FROM contract_lines T0
INNER JOIN contract_history T1
ON T0.contract_id = T1.contract_id and
T0.line_id = T1.line_id
ORDER BY date_changed DESC

As always seems to be the way, after spending an hour looking at it (and shouting at StackOverflow for having a rare period of maintenance), I solved my own problem not long after posting the question.
In an effort to help anyone else who's stuck I'll show what I found. It might not be an efficient way to achieve this so if someone has a better suggestion I'm all ears.
I adapted the answer from here: T-SQL Subquery Max(Date) and Joins
SELECT *,
Units = (SELECT TOP 1 units
FROM contract_history
WHERE contract_lines.contract_id = contract_history.contract_id
AND contract_lines.line_id = contract_history.line_id
ORDER BY date_changed DESC
)
FROM ....

Count how many times a value appears in tables SQL

Here's the situation:
So, in my database, a person is "responsible" for job X and "linked" to job Y. What I want is a query that returns the person's name, their ID, and the number of jobs they are linked to or responsible for. So far I've got this:
select id_job, count(id_job) number_jobs
from
(
select responsible.id
from responsible
union all
select linked.id
from linked
GROUP BY id
) id_job
GROUP BY id_job
It returns a table with the id in the first column and the number of occurrences in the second. What I can't do is associate the person's name with that result. When I add the name to the outer SELECT, I get all the possible combinations. How can I solve this? Thanks in advance!
Example data and desirable output:
| Person |
id | name
1 | John
2 | Francis
3 | Chuck
4 | Anthony
| Responsible |
process_no | id
100 | 2
200 | 2
300 | 1
400 | 4
| Linked |
process_no | id
101 | 4
201 | 1
301 | 1
401 | 2
OUTPUT:
| OUTPUT |
id | name | number_jobs
1 | John | 3
2 | Francis | 3
3 | Chuck | 0
4 | Anthony | 2

Try this way
select prs.id, prs.name, count(*) from Person prs
join(select process_no, id
from Responsible res
Union all
select process_no, id
from Linked lin ) a on a.id=prs.id
group by prs.id, prs.name

I would recommend aggregating each of the tables by the person and then joining the results back to the person table:
select p.*, coalesce(r.cnt, 0) + coalesce(l.cnt, 0) as numjobs
from person p left join
(select id, count(*) as cnt
from responsible
group by id
) r
on r.id = p.id left join
(select id, count(*) as cnt
from linked
group by id
) l
on l.id = p.id;
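To check the aggregate-then-join approach against the question's sample data, here is a small sketch using Python's sqlite3 module with an in-memory database (the question does not name its database, so SQLite is an assumption). Each table is counted per person first, and the LEFT JOINs plus COALESCE keep people with no jobs, such as Chuck, in the result with a count of 0.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE responsible (process_no INTEGER, id INTEGER);
CREATE TABLE linked (process_no INTEGER, id INTEGER);
INSERT INTO person VALUES (1, 'John'), (2, 'Francis'), (3, 'Chuck'), (4, 'Anthony');
INSERT INTO responsible VALUES (100, 2), (200, 2), (300, 1), (400, 4);
INSERT INTO linked VALUES (101, 4), (201, 1), (301, 1), (401, 2);
""")

# Count each table per person first, then join the two counts to person.
job_counts = conn.execute("""
SELECT p.id, p.name, COALESCE(r.cnt, 0) + COALESCE(l.cnt, 0) AS number_jobs
FROM person p
LEFT JOIN (SELECT id, COUNT(*) AS cnt FROM responsible GROUP BY id) r ON r.id = p.id
LEFT JOIN (SELECT id, COUNT(*) AS cnt FROM linked GROUP BY id) l ON l.id = p.id
ORDER BY p.id
""").fetchall()

for row in job_counts:
    print(row)
```

Aggregating before joining avoids the row multiplication you would get by joining person to both detail tables at once.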

select id, name, count(process_no) FROM (
select pr.id, pr.name, res.process_no from Person pr
LEFT JOIN Responsible res on pr.id = res.id
UNION
select pr.id, pr.name, lin.process_no from Person pr
LEFT JOIN Linked lin on pr.id = lin.id) src
group by id, name
order by id
Query ain't tested, give it a shot, but this is the way you want to go

Mysql4: SQL for selecting one or zero record

Table layout:
CREATE TABLE t_order (id INT, custId INT, order DATE)
I'm looking for a SQL command that selects at most one row per customer (the customer who owns an order is identified by a field named custId).
I want to select ONE of the customer's orders (doesn't matter which one, say sorted by id) if there is no order date given for any of the rows.
I want to retrieve an empty Resultset for the customerId, if there is already a record with given order date.
Here is an example. Per customer there should be one order at most (one without a date given). Orders that have already a date value should not appear at all.
+---------------------------------------------------------+
|id | custId | date |
+---------------------------------------------------------+
| 1 10 NULL |
| 2 11 2008-11-11 |
| 3 12 2008-10-23 |
| 4 11 NULL |
| 5 13 NULL |
| 6 13 NULL |
+---------------------------------------------------------+
|
|
| Result
\ | /
\ /
+---------------------------------------------------------+
|id | custId | date |
+---------------------------------------------------------+
| 1 10 NULL |
| |
| |
| |
| 5 13 NULL |
| |
+---------------------------------------------------------+
powered be JavE
Edit:
I've chosen glavić's answer as the correct one, because it provides
the correct result with slightly modified data:
+---------------------------------------------------------+
|id | custId | date |
+---------------------------------------------------------+
| 1 10 NULL |
| 2 11 2008-11-11 |
| 3 12 2008-10-23 |
| 4 11 NULL |
| 5 13 NULL |
| 6 13 NULL |
| 7 11 NULL |
+---------------------------------------------------------+
Sfossen's answer will not work when customers appear more than twice because of its where clause constraint a.id != b.id.
Quassnoi's answer does not work for me, as I run server version 4.0.24 which yields the following error:
alt text http://img25.imageshack.us/img25/8186/picture1vyj.png

For a specific customer it's:
SELECT *
FROM t_order
WHERE date IS NULL AND custId=? LIMIT 1
For all customers it's:
SELECT a.*
FROM t_order a
LEFT JOIN t_order b ON a.custId=b.custID and a.id != b.id
WHERE a.date IS NULL AND b.date IS NULL
GROUP BY custId;

Try this:
SELECT to1.*
FROM t_order AS to1
WHERE
to1.date IS NULL AND
to1.custId NOT IN (
SELECT to2.custId
FROM t_order AS to2
WHERE to2.date IS NOT NULL
GROUP BY to2.custId
)
GROUP BY to1.custId
For MySQL 4:
SELECT to1.*
FROM t_order AS to1
LEFT JOIN t_order AS to2 ON
to2.custId = to1.custId AND
to2.date IS NOT NULL
WHERE
to1.date IS NULL AND
to2.id IS NULL
GROUP BY to1.custId

This query will use one pass over index on custId.
For each distinct custId it will use one subquery over same index.
No GROUP BY, no TEMPORARY and no FILESORT — efficient, if your table is large.
SELECT VERSION()
--------
'4.1.22-standard'
CREATE INDEX ix_order_cust_id ON t_order(custId)
SELECT id, custId, order_date
FROM (
SELECT o.*,
CASE
WHEN custId <> @c THEN
(
SELECT 1
FROM t_order oi
WHERE oi.custId = o.custId
AND order_date IS NOT NULL
LIMIT 1
)
END AS n,
@c <> custId AS f,
@c := custId
FROM
(
SELECT @c := -1
) r,
t_order o
) r,
t_order o
ORDER BY custId
) oo
WHERE n IS NULL AND f
---------
1, 10, ''
5, 13, ''

First filter out rows with dates, then filter out any row that has a similar row with a lower id. This should work because the matching record with the least id is unique if id is unique.
select * from t_order o1
where date is null
and not exists (select * from t_order o2
where o2.date is null
and o1.custId = o2.custId
and o1.id > o2.id)
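The two conditions from the question (the customer has no dated order at all, and only one undated row is kept per customer) can be combined in a single NOT EXISTS. The sketch below tries that on the question's modified sample data, using an in-memory SQLite database through Python's sqlite3 module. That is an assumption made for runnability: the question targets MySQL 4.0, which does not support subqueries, so there the LEFT JOIN form would still be needed. The date column is named order_date here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE t_order (id INTEGER PRIMARY KEY, custId INTEGER, order_date TEXT);
INSERT INTO t_order VALUES
    (1, 10, NULL),
    (2, 11, '2008-11-11'),
    (3, 12, '2008-10-23'),
    (4, 11, NULL),
    (5, 13, NULL),
    (6, 13, NULL),
    (7, 11, NULL);
""")

# Keep an undated row only if the same customer has
#   (a) no row with a date at all, and
#   (b) no other row with a lower id (so exactly one row survives).
open_orders = conn.execute("""
SELECT o1.id, o1.custId, o1.order_date
FROM t_order o1
WHERE o1.order_date IS NULL
  AND NOT EXISTS (
        SELECT 1
        FROM t_order o2
        WHERE o2.custId = o1.custId
          AND (o2.order_date IS NOT NULL OR o2.id < o1.id)
      )
ORDER BY o1.id
""").fetchall()

for row in open_orders:
    print(row)
```

Customer 11 drops out entirely because of its dated order (id 2), even though it has two undated rows.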
