An SQL query that combines aggregate and non-aggregate values in one row

The following query gives me the information that I need, but I want to take it just a step further. In the table at the bottom (which shows only a subset of the fields), I want to group by cust_line in an unusual way (at least, it's unusual to me).
Let's look at the items with a cust_line of 2 as an example. I would like these to be represented by one line, not 5. For this line, I would like to select all the fields from the row where cust_part = "GROUPINV", except for the price field. For the total field I would like 'sum(total) as new_total', and for the price I would like new_total / qty_invoiced, where qty_invoiced is the value on the line where cust_part = "GROUPINV".
Is what I am asking for completely ridiculous? Is it even possible? I'm not advanced at SQL so it may also be easy and I just don't know how to approach it. I thought of using 'partition by' but I couldn't imagine how I would get it to work as I figured it would still return 5 rows where I only want 1.
I've also looked at these questions with similar titles, but they're not quite what I am looking for:
SQL query that returns aggregate AND non aggregate results
Combined aggregated and non-aggregate query in SQL
SELECT L.CUST_LINE, I.LINE_NO, I.ORDER_NO, I.STAGE, I.ORDER_LINE_POS, I.CUST_PART,
I.LINE_ITEM_NO, I.QTY_INVOICED, I.CUST_DESC, I.DESCRIPTION, I.SALE_UNIT_PRICE, I.PRICE_TOTAL,
I.INVOICE_NO, I.CUSTOMER_PO_NO, I.ORDER_NO, I.CUSTOMER_NO, I.CATALOG_DESC, I.ORDER_LINE_NOTES
FROM
(SELECT CUST_LINE, ORDER_NO, LINE_NO
FROM CUSTOMER_ORDER_LINE
GROUP BY CUST_LINE, ORDER_NO, LINE_NO
) L
INNER JOIN CUSTOMER_ORDER_IVC_REP I
ON I.ORDER_NO = L.ORDER_NO
WHERE RESULT_KEY = 999999
AND I.LINE_NO = L.LINE_NO
ORDER BY L.CUST_LINE;
| cust_line | line_no | cust_part | qty_invoiced | cust_desc | price | total |
| 1 | 4 | ... | 1 | ... | 55 | 55 |
| 2 | 1 | GROUPINV | 1 | some part | 0 | 0 |
| 2 | 6 | ... | 3 | ... | 0 | 0 |
| 2 | 2 | ... | 1 | ... | 0 | 0 |
| 2 | 3 | ... | 1 | ... | 0 | 0 |
| 2 | 7 | ... | 2 | ... | 10 | 20 |
| 3 | 7 | ... | 1 | ... | 67 | 67 |

You can use an analytic function to calculate a total over multiple rows of a result set, then filter out the rows you don't want.
Leaving out all the extra columns for sake of brevity:
SELECT cust_line, qty_invoiced, order_total AS new_total, order_total/qty_invoiced AS price
FROM (
SELECT l.cust_line, i.cust_part, qty_invoiced,  -- cust_part is needed by the outer WHERE
SUM(total) OVER (PARTITION BY l.cust_line) AS order_total,
COUNT(cust_line) OVER (PARTITION BY l.cust_line) AS group_count
FROM
(SELECT CUST_LINE, ORDER_NO, LINE_NO
FROM CUSTOMER_ORDER_LINE
GROUP BY CUST_LINE, ORDER_NO, LINE_NO
) L
INNER JOIN CUSTOMER_ORDER_IVC_REP I
ON I.ORDER_NO = L.ORDER_NO
WHERE RESULT_KEY = 999999
AND I.LINE_NO = L.LINE_NO
)
-- keep the GROUPINV row of each group, or the only row of a single-line group
WHERE ( cust_part = 'GROUPINV' OR group_count = 1 )
ORDER BY cust_line
I am guessing at what you want in the PARTITION BY clause; it works like a GROUP BY that applies only to the window functions (SUM and COUNT here), except that it does not collapse the rows. You may also want order_no in the partition.
The trick is to select all the rows in the inner query, applying SUM across them; then, in the outer query, filter out the rows you are not interested in.
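To try the window-sum-then-filter idea without the real tables, here is a minimal sketch that runs the SQL through Python's built-in sqlite3 module (SQLite 3.25 or later is assumed, for window functions). The table name invoice_lines, the reduced column set and the part names other than GROUPINV are hypothetical stand-ins for the sample rows in the question. The inner query selects cust_part because the outer WHERE filters on it, and the price is multiplied by 1.0 because SQLite would otherwise do integer division.

```python
import sqlite3

# Hypothetical table mirroring the sample rows in the question:
# (cust_line, line_no, cust_part, qty_invoiced, price, total).
# Part names other than GROUPINV are made up for illustration.
ROWS = [
    (1, 4, "PART-A", 1, 55, 55),
    (2, 1, "GROUPINV", 1, 0, 0),
    (2, 6, "PART-B", 3, 0, 0),
    (2, 2, "PART-C", 1, 0, 0),
    (2, 3, "PART-D", 1, 0, 0),
    (2, 7, "PART-E", 2, 10, 20),
    (3, 7, "PART-F", 1, 67, 67),
]

QUERY = """
SELECT cust_line, qty_invoiced,
       1.0 * order_total / qty_invoiced AS price,  -- 1.0 * avoids integer division
       order_total AS new_total
FROM (
    SELECT cust_line, cust_part, qty_invoiced,
           SUM(total) OVER (PARTITION BY cust_line) AS order_total,
           COUNT(*)   OVER (PARTITION BY cust_line) AS group_count
    FROM invoice_lines
) AS t
WHERE cust_part = 'GROUPINV' OR group_count = 1
ORDER BY cust_line
"""


def collapse_grouped_lines():
    """Return one row per cust_line: (cust_line, qty_invoiced, price, new_total)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE invoice_lines (cust_line INTEGER, line_no INTEGER, "
        "cust_part TEXT, qty_invoiced INTEGER, price INTEGER, total INTEGER)"
    )
    conn.executemany("INSERT INTO invoice_lines VALUES (?, ?, ?, ?, ?, ?)", ROWS)
    rows = conn.execute(QUERY).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for row in collapse_grouped_lines():
        print(row)
```

Lines 1 and 3 pass through because each is alone in its group; line 2 keeps only its GROUPINV row, with new_total = 20 and price = 20 / 1.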

Related

Can I generate a map showing that a particular row was in a particular group in SQLite?

Say I have the following data:
+--------+-------+
| Group | Data |
+--------+-------+
| 1 | row 1 |
| 1 | row 2 |
| 1 | row 3 |
| 20 | row 1 |
| 20 | row 3 |
| 10 | row 1 |
| 10 | row A |
| 10 | row 2 |
| 10 | row 3 |
+--------+-------+
Is it possible to draw a map that shows which groups have which rows? Groups may not be contiguous, so they can be placed into a separate table, with each group's row index used as the character position in the string instead. Something like this:
+-------+
| Group |
+-------+
| 1 |
| 20 |
| 10 |
+-------+
+-------+----------------+
| Data | Found in group |
+-------+----------------+
| row 1 | 111 |
| row A |   1 |
| row 2 | 1 1 |
| row 3 | 111 |
+-------+----------------+
Here, the first character represents Group 1, the second Group 20 and the third Group 10.
Ordering of the Group rows isn't critical so long as I can reference which row goes with which character.
I only ask this because I saw this crazy example in the documentation generating a fractal, but I can't quite get my head around it.
Is this doable?
To find the missing values, first prepare a dataset that contains every possible combination of Data and Grp, which you can do with a CROSS JOIN.
Then compare that dataset with the actual data using a LEFT JOIN; a combination with no match is a missing group.
Assuming the characters should follow the order of the Grp column, you can use the query below.
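SELECT
a.Data,
-- '.' marks a missing group; single quotes are string literals in SQLite
group_concat(case when base.Grp is null then '.' else '1' end,'') as Found_In_Group
,group_concat(b.Grp) as Group_Order
FROM
(SELECT Data FROM yourtable Group By Data)a
CROSS JOIN
-- the character order relies on this ORDER BY being carried through, which SQL does not guarantee
(SELECT Grp FROM yourtable Group By Grp Order by Grp)b
LEFT JOIN yourtable base
ON b.Grp=base.Grp
AND a.Data=base.Data
GROUP BY a.Data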
Note: I used . instead of a blank to represent a missing group, for better visibility.
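+-------+----------------+-------------+
| Data  | Found_In_Group | Group_Order |
+-------+----------------+-------------+
| row 1 | 111            | 1,10,20     |
| row 2 | 11.            | 1,10,20     |
| row 3 | 111            | 1,10,20     |
| row A | .1.            | 1,10,20     |
+-------+----------------+-------------+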
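SQLite documents the order of values concatenated by a plain group_concat as arbitrary, so the character positions above depend on the subquery's ORDER BY happening to survive. A minimal sketch of a variant that makes the order explicit, run through Python's built-in sqlite3 module: it keeps the answer's CROSS JOIN plus LEFT JOIN, but uses group_concat as a window function ordered by grp (SQLite 3.25 or later assumed). The table name yourtable comes from the answer; the column name grp is used because GROUP is a keyword.

```python
import sqlite3

# Sample rows from the question as (grp, data); yourtable is a placeholder name.
ROWS = [
    (1, "row 1"), (1, "row 2"), (1, "row 3"),
    (20, "row 1"), (20, "row 3"),
    (10, "row 1"), (10, "row A"), (10, "row 2"), (10, "row 3"),
]

QUERY = """
SELECT data, found_in_group
FROM (
    SELECT a.data,
           -- One flag per group, concatenated in grp order over the whole partition
           group_concat(CASE WHEN base.grp IS NULL THEN '.' ELSE '1' END, '')
               OVER (PARTITION BY a.data ORDER BY b.grp
                     ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)
               AS found_in_group,
           row_number() OVER (PARTITION BY a.data ORDER BY b.grp) AS rn
    FROM (SELECT DISTINCT data FROM yourtable) AS a
    CROSS JOIN (SELECT DISTINCT grp FROM yourtable) AS b
    LEFT JOIN yourtable AS base
           ON base.grp = b.grp AND base.data = a.data
) AS t
WHERE rn = 1  -- keep a single row per data value
ORDER BY data
"""


def build_group_map():
    """Return (data, flags) pairs; flag positions follow grp ascending (1, 10, 20)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE yourtable (grp INTEGER, data TEXT)")
    conn.executemany("INSERT INTO yourtable VALUES (?, ?)", ROWS)
    rows = conn.execute(QUERY).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for data, flags in build_group_map():
        print(data, flags)
```

The output matches the answer's table: the positions correspond to groups 1, 10 and 20, in that order.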
SELECT Data, group_concat("Group") AS "Found in group"
FROM yourtable
GROUP BY Data
will give you a comma-separated list of the groups that each Data value appears in.

Oracle SQL: Counting how often an attribute occurs for a given entry and choosing the attribute with the maximum number of occurrences

I have a table that has a number column and an attribute column like this:
1.
+-----+-----+
| num | att |
+-----+-----+
| 1   | a   |
| 1   | b   |
| 1   | a   |
| 2   | a   |
| 2   | b   |
| 2   | b   |
+-----+-----+
I want to make the number unique, with the attribute being whichever attribute occurred most often for that number, like this (this is the end product I'm interested in):
2.
+-----+-----+
| num | att |
+-----+-----+
| 1   | a   |
| 2   | b   |
+-----+-----+
I have been working on this for a while and managed to write myself a query that looks up how many times an attribute occurs for a given number like this:
3.
+-----+-----+-------+
| num | att | count |
+-----+-----+-------+
| 1   | a   | 2     |
| 1   | b   | 1     |
| 2   | a   | 1     |
| 2   | b   | 2     |
+-----+-----+-------+
But I can't think of a way to only select those rows from the above table where the count is the highest (for each number of course).
So basically, what I am asking is: given table 3, how do I select only the rows with the highest count for each number? (Of course, an answer that gets from table 1 to table 2 directly also works.)
You can use aggregation and window functions:
select num, att
from (
select num, att, row_number() over(partition by num order by count(*) desc, att) rn
from mytable
group by num, att
) t
where rn = 1
For each num, this returns the most frequent att; if there are ties, the smaller att is retained.
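The same idea can be checked locally with Python's built-in sqlite3 module. This minimal sketch uses an equivalent two-step form that computes the counts (table 3) in a subquery first and then ranks them, rather than ordering the window by count(*) directly; SQLite 3.25 or later is assumed. The two rows for num = 3 are hypothetical additions that tie, to show the tie-break on att.

```python
import sqlite3

# Table 1 from the question, plus two hypothetical tied rows for num = 3.
ROWS = [
    (1, "a"), (1, "b"), (1, "a"),
    (2, "a"), (2, "b"), (2, "b"),
    (3, "y"), (3, "x"),
]

QUERY = """
SELECT num, att
FROM (
    SELECT num, att,
           -- Most frequent att first; ties broken by the smaller att
           ROW_NUMBER() OVER (PARTITION BY num ORDER BY cnt DESC, att) AS rn
    FROM (
        SELECT num, att, COUNT(*) AS cnt  -- this step produces table 3
        FROM mytable
        GROUP BY num, att
    ) AS counts
) AS ranked
WHERE rn = 1
ORDER BY num
"""


def most_frequent_att():
    """Return (num, att) pairs with the modal att for each num."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE mytable (num INTEGER, att TEXT)")
    conn.executemany("INSERT INTO mytable VALUES (?, ?)", ROWS)
    rows = conn.execute(QUERY).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    print(most_frequent_att())
```

For num = 3 both attributes occur once, so the ORDER BY on att keeps x.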
Oracle has an aggregate function that does this, STATS_MODE():
select num, stats_mode(att)
from t
group by num;
In statistics, the most common value is called the mode -- hence the name of the function.
You can use group by and count as below; note that this only produces the per-attribute counts (table 3), not the final result:
select id, col, count(col) as count
from
df_b_sql
group by id, col

Getting a distinct value from one column if all rows matches a certain criteria

I'm trying to find a performant and easy-to-read query that returns the distinct values of one column for which all of the related rows match a certain criterion.
I have a table that tracks e-commerce orders and whether they were delivered on time; its contents and schema are as follows:
> select * from orders;
+----+--------------------+-------------+
| id | delivered_on_time | customer_id |
+----+--------------------+-------------+
| 1 | 1 | 9 |
| 2 | 0 | 9 |
| 3 | 1 | 10 |
| 4 | 1 | 10 |
| 5 | 0 | 11 |
+----+--------------------+-------------+
I would like to get all distinct customer_ids whose orders were all delivered on time, i.e. an output like this:
+-------------+
| customer_id |
+-------------+
| 10 |
+-------------+
What's the best way to do this?
I've found a solution, but it's a bit hard to read and I doubt it's the most efficient way to do it (it uses two CTEs):
> with hits_all as (
select memberid,count(*) as count from orders group by memberid
),
hits_true as
(select memberid,count(*) as count from orders where hit = true group by memberid)
select
*
from
hits_true
inner join
hits_all on
hits_all.memberid = hits_true.memberid
and hits_all.count = hits_true.count;
+----------+-------+----------+-------+
| memberid | count | memberid | count |
+----------+-------+----------+-------+
| 10 | 2 | 10 | 2 |
+----------+-------+----------+-------+
You can use group by and having as follows:
select customer_id
from orders
group by customer_id
having sum(delivered_on_time) = count(*)
This works because an on-time delivery is identified by delivered_on_time = 1, so you only need to check that the sum of delivered_on_time equals the number of records for the customer (assuming the column only ever holds 0 or 1).
You can use aggregation and having:
select customer_id
from orders
group by customer_id
having min(delivered_on_time) = 1;  -- min = max alone would also match customers whose orders were all late
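A minimal sketch that runs the HAVING variants against the sample orders through Python's built-in sqlite3 module. It compares SUM = COUNT, MIN = 1 and a bare MIN = MAX check; the last one also returns customer 11, whose only order was late. All variants assume delivered_on_time holds only 0 or 1 and is never NULL.

```python
import sqlite3

# Sample rows from the question: (id, delivered_on_time, customer_id).
ORDERS = [(1, 1, 9), (2, 0, 9), (3, 1, 10), (4, 1, 10), (5, 0, 11)]

# Each variant assumes delivered_on_time is 0 or 1 and never NULL.
SUM_EQUALS_COUNT = "HAVING SUM(delivered_on_time) = COUNT(*)"
MIN_IS_ONE = "HAVING MIN(delivered_on_time) = 1"
# Pitfall: true for "all on time" AND for "all late".
MIN_EQUALS_MAX = "HAVING MIN(delivered_on_time) = MAX(delivered_on_time)"


def customers(having_clause):
    """Return the customer_ids whose orders satisfy the given HAVING clause."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE orders (id INTEGER, delivered_on_time INTEGER, customer_id INTEGER)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", ORDERS)
    query = (
        "SELECT customer_id FROM orders GROUP BY customer_id "
        + having_clause
        + " ORDER BY customer_id"
    )
    ids = [row[0] for row in conn.execute(query)]
    conn.close()
    return ids


if __name__ == "__main__":
    for clause in (SUM_EQUALS_COUNT, MIN_IS_ONE, MIN_EQUALS_MAX):
        print(clause, "->", customers(clause))
```

Of the three, MIN = 1 is probably the easiest to read, and like SUM = COUNT it returns only customer 10.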

Find the number of rows identical on some columns but different on another column

Say I have the following table:
CREATE TABLE data (
PROJECT_ID VARCHAR,
TASK_ID VARCHAR,
REF_ID VARCHAR,
REF_VALUE VARCHAR
);
I want to identify rows where PROJECT_ID, REF_ID and REF_VALUE are the same but TASK_ID is different.
The desired output is a list of TASK_ID_1, TASK_ID_2 and COUNT(*) of such conflicts. So, for example,
DATA
+------------+---------+--------+-----------+
| PROJECT_ID | TASK_ID | REF_ID | REF_VALUE |
+------------+---------+--------+-----------+
| 1 | 1 | 1 | 1 |
| 1 | 1 | 1 | 2 |
| 1 | 2 | 1 | 1 |
| 1 | 2 | 1 | 2 |
+------------+---------+--------+-----------+
OUTPUT
+-----------+-----------+----------+
| TASK_ID_1 | TASK_ID_2 | COUNT(*) |
+-----------+-----------+----------+
| 1 | 2 | 2 |
| 2 | 1 | 2 |
+-----------+-----------+----------+
would mean that there are two entries with TASK_ID == 1 and two entries with TASK_ID == 2 that share the same values for the other three columns. The inherent symmetry in the output is fine.
How would I go about finding this information? I've tried joining the table onto itself and grouping, but this turned up more results for a single task than the table had rows altogether, so it's clearly wrong.
The database used is PostgreSQL, though a solution that applies to most common SQL systems would be preferable.
You want a self join and aggregation:
select d1.task_id as task_id_1, d2.task_id as task_id_2, count(*)
from data d1 join
data d2
on d1.project_id = d2.project_id and
d1.ref_id = d2.ref_id and
d1.ref_value = d2.ref_value and
d1.task_id <> d2.task_id
group by d1.task_id, d2.task_id;
Notes:
Add the condition d1.task_id < d2.task_id if you want each pair to occur only once in the result set.
This does not match NULL values, because NULL = NULL is not true, although that is easy enough to handle: use is not distinct from instead of =.
You can also simplify this a bit with the using clause:
select d1.task_id as task_id_1, d2.task_id as task_id_2, count(*)
from data d1 join
data d2
using (project_id, ref_id, ref_value)
where d1.task_id <> d2.task_id
group by d1.task_id, d2.task_id;
You can get an idea of how many rows might be returned by using:
select d.project_id, d.ref_id, d.ref_value, count(distinct d.task_id), count(*)
from data d
group by d.project_id, d.ref_id, d.ref_value;
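A minimal sketch that runs the USING version of the self join and the sizing query through Python's built-in sqlite3 module, reusing the CREATE TABLE and sample rows from the question. An ORDER BY is added (an assumption, for deterministic output); SQLite accepts VARCHAR without a length.

```python
import sqlite3

SCHEMA = """
CREATE TABLE data (
    PROJECT_ID VARCHAR,
    TASK_ID VARCHAR,
    REF_ID VARCHAR,
    REF_VALUE VARCHAR
)
"""

# Sample rows from the question: (project_id, task_id, ref_id, ref_value).
ROWS = [("1", "1", "1", "1"), ("1", "1", "1", "2"),
        ("1", "2", "1", "1"), ("1", "2", "1", "2")]

# Pairs of different tasks that share project_id, ref_id and ref_value.
CONFLICTS = """
SELECT d1.task_id AS task_id_1, d2.task_id AS task_id_2, COUNT(*) AS cnt
FROM data d1
JOIN data d2 USING (project_id, ref_id, ref_value)
WHERE d1.task_id <> d2.task_id
GROUP BY d1.task_id, d2.task_id
ORDER BY task_id_1, task_id_2
"""

# How many tasks and rows fall into each combination.
SIZES = """
SELECT project_id, ref_id, ref_value, COUNT(DISTINCT task_id), COUNT(*)
FROM data
GROUP BY project_id, ref_id, ref_value
ORDER BY project_id, ref_id, ref_value
"""


def run(query):
    """Load the sample rows into an in-memory database and run one query."""
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    conn.executemany("INSERT INTO data VALUES (?, ?, ?, ?)", ROWS)
    rows = conn.execute(query).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    print(run(CONFLICTS))
    print(run(SIZES))
```

The conflict query reproduces the symmetric OUTPUT table from the question; adding d1.task_id < d2.task_id to the WHERE clause would keep only the (1, 2) pair.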
This is how I understand your question. This assumes there are only two tasks for each combination.
SELECT "PROJECT_ID", "REF_ID", "REF_VALUE",
MIN("TASK_ID") as TASK_ID_1,
MAX("TASK_ID") as TASK_ID_2,
COUNT(*) as cnt
FROM Table1
GROUP BY "PROJECT_ID", "REF_ID", "REF_VALUE"
HAVING MIN("TASK_ID") != MAX("TASK_ID")
-- COUNT(*) > 1 also works, provided a task never repeats the same combination
OUTPUT
I added more columns to make clear which values are shared:
| PROJECT_ID | REF_ID | REF_VALUE | task_id_1 | task_id_2 | cnt |
|------------|--------|-----------|-----------|-----------|-----|
| 1 | 1 | 2 | 1 | 2 | 2 |
| 1 | 1 | 1 | 1 | 2 | 2 |

SQL Change Rank based on any value in group of values

I'm not looking for the answer so much as for what to search for, as I think this is possible. I have a query that returns ID and CODE, and I want to base the rank on the code so that I get these results:
| ID | CODE | RANK |
| 1 | A | 1 |
| 1 | B | 1 |
| 2 | A | 1 |
| 2 | C | 1 |
| 3 | B | 2 |
| 3 | C | 2 |
| 4 | C | 3 |
Basically, within each ID, if any of the CODEs equals a certain value, I want to adjust the rank so that I can order by rank first and then by other columns. I'm never sure how to phrase things in SQL.
I tried
CASE WHEN CODE = 'A' THEN 1 WHEN CODE = 'B' THEN 2 ELSE 3 END rank
ORDER BY rank DESC
But I want to keep the IDs together rather than have their rows broken apart. If I can't solve it another way, I was thinking of giving all rows of an ID the same rank, based on the highest one.
Any thoughts on an SQL function to look at?
You could use the MIN() OVER() analytic function to get the minimum rank value per group, and then order by that:
WITH cte AS (
SELECT id, code,
MIN(CASE WHEN code='A' THEN 1 WHEN code='B' THEN 2 ELSE 3 END)
OVER (PARTITION BY id) rank
FROM mytable
)
SELECT * FROM cte
ORDER BY rank, id, code
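A minimal sketch of the MIN() OVER (PARTITION BY id) approach, run through Python's built-in sqlite3 module (SQLite 3.25 or later assumed) on the ID/CODE pairs from the question. The table name mytable comes from the answer; the alias grp_rank is used instead of rank (an assumption) so it cannot clash with the RANK window function in databases that reserve that word.

```python
import sqlite3

# (id, code) pairs from the question's desired output, without the rank.
ROWS = [(1, "A"), (1, "B"), (2, "A"), (2, "C"), (3, "B"), (3, "C"), (4, "C")]

QUERY = """
WITH cte AS (
    SELECT id, code,
           -- Smallest per-row rank within each id, so every row of an id shares it
           MIN(CASE WHEN code = 'A' THEN 1 WHEN code = 'B' THEN 2 ELSE 3 END)
               OVER (PARTITION BY id) AS grp_rank
    FROM mytable
)
SELECT id, code, grp_rank FROM cte
ORDER BY grp_rank, id, code
"""


def ranked_rows():
    """Return (id, code, grp_rank) rows with each id's rows kept together."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE mytable (id INTEGER, code TEXT)")
    conn.executemany("INSERT INTO mytable VALUES (?, ?)", ROWS)
    rows = conn.execute(QUERY).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for row in ranked_rows():
        print(row)
```

Because every row of an ID gets the same rank, ordering by the rank and then by id keeps each ID's rows adjacent, which reproduces the desired table.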