tSQL aggregate functions and group bys - sql

So, I can't seem to find a way to get this to work. But, what I need is as follow.
I have a table that has lets say types.
Type_ID, Type_Description
Then I have a table of Items. [Item Type is fk to type table]
Item_ID, Item_Type
Then I have a Results Table. [Item_ID is fk to Item table]
Result_ID, Item_ID, Cost
So what i am needing for output is Grouped by the Type_ID
- The Count of Items(can be 0),
- The Count of Results(Can be 0),
- and the sum of Cost(can be 0)
I dont have direct access to these tables. I am having to build the sql and send it to an api so I dont get to know the error simply the results if successful and error 500 if not.
Seems to be older tSQL. As List and STRING_AGG dont seem to be available.
EDIT: As requested - Sample Data
+---------+------------------+
| Type_ID | Type_Description |
+---------+------------------+
| 1 | Example 1 |
+---------+------------------+
| 2 | Example 2 |
+---------+------------------+
+---------+---------------+
| ITEM_ID | ITEM_TYPE |
+---------+---------------+
| 1 | 1 |
+---------+---------------+
| 2 | 1 |
+---------+---------------+
| 3 | 1 |
+---------+---------------+
| 4 | 2 |
+---------+---------------+
| 5 | 2 |
+---------+---------------+
+-----------+---------+------+
| Result_ID | Item_ID | Cost |
+-----------+---------+------+
| 1 | 1 | 10 |
+-----------+---------+------+
| 2 | 1 | 20 |
+-----------+---------+------+
| 3 | 2 | 5 |
+-----------+---------+------+
| 4 | 5 | 100 |
+-----------+---------+------+
Desired Output
+---------+------------+--------------+------+
| Type_ID | Item_Count | Result_Count | Cost |
+---------+------------+--------------+------+
| 1 | 3 | 3 | 35 |
+---------+------------+--------------+------+
| 2 | 2 | 1 | 100 |
+---------+------------+--------------+------+

I think GMB's answer was quite good, but in case there is a type with no items (something in your requirements), it will not be displayed.
So first of all let's create the input data:
select 1 as Type_ID, 'Example 1' as Type_Description into #type
union all
select 2, 'Example 2'
union all
select 3, 'Example 3'
select 1 Result_ID, 1 Item_ID, 10 Cost into #item
union all
select 2, 1, 20 Cost
union all
select 3, 2, 5 Cost
union all
select 4, 5, 100 Cost
select 1 Item_ID, 1 Item_Type INTO #item_type
union all
select 2, 1
union all
select 3, 1
union all
select 4, 2
union all
select 5, 2
Note I added also a type 3 with no items to test the no item case.
And then the query you need:
SELECT
t.Type_ID,
COUNT(DISTINCT it.Item_ID) Item_Count,
COUNT(DISTINCT i.Result_ID) Result_Count,
SUM(ISNULL(Cost, 0)) Cost
FROM #type t
LEFT JOIN #item_type it on it.Item_Type = t.Type_ID
LEFT JOIN #item i on i.Item_ID = it.Item_ID
GROUP BY t.Type_ID
I think it is pretty straightforward and doesn't need much explanation, but feel free to ask in the comments if necessary.
The results are just like you requested, with also a line for type 3:
+---------+------------+--------------+------+
| Type_ID | Item_Count | Result_Count | Cost |
+---------+------------+--------------+------+
| 1 | 3 | 3 | 35 |
+---------+------------+--------------+------+
| 2 | 2 | 1 | 100 |
+---------+------------+--------------+------+
| 3 | 0 | 0 | 0 |
+---------+------------+--------------+------+
You mentioned that the count of items could be 0 and also the count of results. But aren't both values always either 0, or both > 0? Only in case your type-item many-to-many table doesn't have a FK, you could have that scenario. For example, if I add:
insert into #item_type
select 6, 3
Then the last row is:
+---------+------------+--------------+------+
| 3 | 1 | 0 | 0 |
+---------+------------+--------------+------+
I am not sure if that makes sense in your scenario, but as your post implies that items and results can be 0 independently, that confused me a bit.

You can join and aggregate. Assuming that your tables are called types, types_costs and costs, that would be:
select
t.type_id,
count(distinct tc.item_id) item_count,
count(distinct c.result_id) result_count,
sum(c.cost) cost
from types t
inner join types_costs tc on tc.item_type = t.type_id
left costs c on c.item_id = tc.item_id
group by t.type_id
An important thing is to use a left join to bring the costs table so item_ids that do not exist in costs are not eliminated before you get a change to count them. Depending on your actual use case, you might also want a left join on table types_costs.

Related

Postgresql hierarchical (tree) query

I found few topics about it but none fits my expected results.
I have levels of categories stored in the table, just want to display it as tree structure.
All answers are kind of following query:
DB FIDDLE
WITH RECURSIVE cte AS (
SELECT category_id, category_name, parent_category, 1 AS level
FROM category
WHERE level = 1
UNION ALL
SELECT c.category_id, c.category_name, c.parent_category, ct.level + 1
FROM cte ct
JOIN category c ON c.parent_category = ct.category_id
)
SELECT *
FROM cte;
But the results are like
level
1
1
2
2
2
3
3
3
3
3
What I want to achieve is
level
1
2
3
3
2
3
3
1
2
3
3
2
3
3
You would typically keek track of the path to each node and use that for ordering. In Postgres, arrays come handy for this:
with recursive cte as (
select category_id, category_name, parent_category, 1 as level, array[category_id] path
from category
where parent_category is null
union all
select c.category_id, c.category_name, c.parent_category, ct.level + 1, ct.path || c.category_id
from cte ct
join category c on c.parent_category = ct.category_id
)
select *
from cte
order by path
Note that there is no need to store the level in the table; you can compute the information on the fly as you iterate. To identify the root nodes, you can filter on rows whose parent is null.
In your db fiddle, the query returns:
category_id | category_name | parent_category | level | path
----------: | :------------ | --------------: | ----: | :-------
1 | cat1 | null | 1 | {1}
3 | cat3 | 1 | 2 | {1,3}
8 | cat8 | 3 | 3 | {1,3,8}
9 | cat9 | 3 | 3 | {1,3,9}
4 | cat4 | 1 | 2 | {1,4}
6 | cat6 | 4 | 3 | {1,4,6}
7 | cat7 | 4 | 3 | {1,4,7}
5 | cat5 | 1 | 2 | {1,5}
10 | cat10 | 5 | 3 | {1,5,10}
11 | cat11 | 5 | 3 | {1,5,11}
2 | cat2 | null | 1 | {2}
You can keep track of the hierarchy as an array and use that for ordering:
WITH RECURSIVE cte AS (
SELECT category_id, category_name, parent_category, 1 AS level, array[category_id] as categories
FROM category
WHERE level = 1
UNION ALL
SELECT c.category_id, c.category_name, c.parent_category, ct.level + 1, ct.categories || c.category_id
FROM cte ct JOIN
category c
ON c.parent_category = ct.category_id
)
SELECT *
FROM cte
ORDER BY categories;
Here is a db<>fiddle.

Find number of rows identical one some, but different on another column

Say I have the following table:
CREATE TABLE data (
PROJECT_ID VARCHAR,
TASK_ID VARCHAR,
REF_ID VARCHAR,
REF_VALUE VARCHAR
);
I want to identify rows where
PROJECT_ID, REF_ID, REF_VALUE are the same
but TASK_ID are different.
The desired output is a list of TASK_ID_1, TASK_ID_2 and COUNT(*) of such conflicts. So, for example,
DATA
+------------+---------+--------+-----------+
| PROJECT_ID | TASK_ID | REF_ID | REF_VALUE |
+------------+---------+--------+-----------+
| 1 | 1 | 1 | 1 |
| 1 | 1 | 1 | 2 |
| 1 | 2 | 1 | 1 |
| 1 | 2 | 1 | 2 |
+------------+---------+--------+-----------+
OUTPUT
+-----------+-----------+----------+
| TASK_ID_1 | TASK_ID_2 | COUNT(*) |
+-----------+-----------+----------+
| 1 | 2 | 2 |
| 2 | 1 | 2 |
+-----------+-----------+----------+
would mean that there are two entries with TASK_ID == 1 and two entries with TASK_ID == 2 that share the same values for the other three columns. The inherent symmetry in the output is fine.
How would I go about finding this information? I've tried joining the table onto itself and grouping, but this turned up more results for a single task than the table had rows altogether, so it's clearly wrong.
The database used is PostgreSQL, though a solution that applies to most common SQL systems would be preferable.
You want a self join and aggregation:
select d1.task_id as task_id_1, d2.task_id as task_id_2, count(*)
from data d1 join
data d2
on d1.project_id = d2.project_id and
d1.ref_id = d2.ref_id and
d1.ref_value = d2.ref_value and
d1.task_id <> d2.task_id
group by d1.task_id, d2.task_id;
Notes:
Add the condition d1.task_id < d2.task_id if you want each pair to occur only once in the result set.
This does not handle NULL values, although that is easy enough to handle. Use is not distinct from instead of =.
You can also simplify this a bit with the using clause:
select d1.task_id as task_id_1, d2.task_id as task_id_2, count(*)
from data d1 join
data d2
using (project_id, ref_id, ref_value)
where d1.task_id <> d2.task_id
group by d1.task_id, d2.task_id;
You can get an idea of how many rows might be returned by using:
select d.project_id, d.ref_id, d.ref_value, count(distinct d.task_id), count(*)
from data d
group by d.project_id, d.ref_id, d.ref_value;
This is how I understand your question. This assume there are only two task for the same combination.
SQL DEMO
SELECT "PROJECT_ID", "REF_ID", "REF_VALUE",
MIN("TASK_ID") as TASK_ID_1,
MAX("TASK_ID") as TASK_ID_2,
COUNT(*) as cnt
FROM Table1
GROUP BY "PROJECT_ID", "REF_ID", "REF_VALUE"
HAVING MIN("TASK_ID") != MAX("TASK_ID")
-- COUNT(*) > 1 also should work
OUTPUT
I add more column to make clear what are the same elements:
| PROJECT_ID | REF_ID | REF_VALUE | task_id_1 | task_id_2 | cnt |
|------------|--------|-----------|-----------|-----------|-----|
| 1 | 1 | 2 | 1 | 2 | 2 |
| 1 | 1 | 1 | 1 | 2 | 2 |

how to bake in a record count in a sql query

I have a query that looks like this:
select id, extension, count(distinct(id)) from publicids group by id,extension;
This is what the results looks like:
id | extension | count
-------------+-------------------------+-------
18459154909 | 12333 | 1
18459154909 | 9891114 | 1
18459154919 | 43244 | 1
18459154919 | 8776232 | 1
18766145025 | 12311 | 1
18766145025 | 1122111 | 1
18766145201 | 12422 | 1
18766145201 | 14141 | 1
But what I really want is for the results to look like this:
id | extension | count
-------------+-------------------------+-------
18459154909 | 12333 | 2
18459154909 | 9891114 | 2
18459154919 | 43244 | 2
18459154919 | 8776232 | 2
18766145025 | 12311 | 2
18766145025 | 1122111 | 2
18766145201 | 12422 | 2
18766145201 | 14141 | 2
I'm trying to get the count field to show the total number of records that have the same id.
Any suggestions would be appreciated
I think you want to count distincts extentions, not ids.
Run this query:
select id
, extension
(select count(*) from publicids p1 where p.id = p1.id ) distinct_id_count
from publicids p
group by id,extension;
This is more or less the same as Pastor's answer. Depending on what the optimizer does it might be faster with higher record count source tables.
select p.id, p.extension, p2.id_count
from publicids p
inner join (
select id, count(*) as id_count
from publicids group by id
) as p2 on p.id = p2.id

An SQL query that combines aggregate and non-aggregate values in one row

The following query gives me the information that I need but I want it to take it just a step further. In the table at the bottom (only showing a subset of the fields), I want to group by cust_line in an unusual way (at least to me it's unusual).
Let's look at the items with a cust_line of 2 as an example. I would like these to be represented by one line not 5. For this line, I would like to select all the fields except for the price field where the cust_part = "GROUPINVC". For the total field I would like it to be 'sum(total) as new_total' and for the price, I would like it to be new_total / qty_invoiced, where qty_invoiced is the value on the line where cust_part = "GROUPINV".
Is what I am asking for completely ridiculous? Is it even possible? I'm not advanced at SQL so it may also be easy and I just don't know how to approach it. I thought of using 'partition by' but I couldn't imagine how I would get it to work as I figured it would still return 5 rows where I only want 1.
I've also looked at these questions with similar titles but not really what I am looking for:
SQL query that returns aggregate AND non aggregate results
Combined aggregated and non-aggregate query in SQL
SELECT L.CUST_LINE, I.LINE_NO, I.ORDER_NO, I.STAGE, I.ORDER_LINE_POS, I.CUST_PART,
I.LINE_ITEM_NO, I.QTY_INVOICED, I.CUST_DESC, I.DESCRIPTION, I.SALE_UNIT_PRICE, I.PRICE_TOTAL,
I.INVOICE_NO, I.CUSTOMER_PO_NO, I.ORDER_NO, I.CUSTOMER_NO, I.CATALOG_DESC, I.ORDER_LINE_NOTES
FROM
(SELECT CUST_LINE, ORDER_NO, LINE_NO
FROM CUSTOMER_ORDER_LINE
GROUP BY CUST_LINE, ORDER_NO, LINE_NO
) L
INNER JOIN CUSTOMER_ORDER_IVC_REP I
ON I.ORDER_NO = L.ORDER_NO
WHERE RESULT_KEY = 999999
AND I.LINE_NO = L.LINE_NO
ORDER BY L.CUST_LINE;
| cust_line | line_no | cust_part | qty_invoiced | cust_desc | price | total |
| 1 | 4 | ... | 1 | ... | 55 | 55 |
| 2 | 1 | GROUPINV | 1 | some part | 0 | 0 |
| 2 | 6 | ... | 3 | ... | 0 | 0 |
| 2 | 2 | ... | 1 | ... | 0 | 0 |
| 2 | 3 | ... | 1 | ... | 0 | 0 |
| 2 | 7 | ... | 2 | ... | 10 | 20 |
| 3 | 7 | ... | 1 | ... | 67 | 67 |
You can use an analytic function to calculate a total over multiple rows of a result set, then filter out the rows you don't want.
Leaving out all the extra columns for sake of brevity:
SELECT cust_line, qty_invoiced, order_total/qty_invoiced AS price
FROM (
SELECT l.cust_line, qty_invoiced,
SUM(total) OVER (PARTITION BY l.cust_line) AS order_total,
COUNT(cust_line) OVER (PARTITION BY l.cust_line) AS group_count
FROM
(SELECT CUST_LINE, ORDER_NO, LINE_NO
FROM CUSTOMER_ORDER_LINE
GROUP BY CUST_LINE, ORDER_NO, LINE_NO
) L
INNER JOIN CUSTOMER_ORDER_IVC_REP I
ON I.ORDER_NO = L.ORDER_NO
WHERE RESULT_KEY = 999999
AND I.LINE_NO = L.LINE_NO
)
WHERE ( cust_part = 'GROUPINV' OR group_count = 1 )
ORDER BY cust_line
I am guessing on what you want in the PARTITION BY clause; this is essentially a GROUP BY that applies only to the SUM function. Not sure if you might also want order_no in the partition.
The trick is to select all the rows in the inner query, applying SUM across them all; then filter out the rows you are not interested in in the outermost query.

Doing a market basket analysis on the order details

I have a table that looks (abbreviated) like:
| order_id | item_id | amount | qty | date |
|---------- |--------- |-------- |----- |------------ |
| 1 | 1 | 10 | 1 | 10-10-2014 |
| 1 | 2 | 20 | 2 | 10-10-2014 |
| 2 | 1 | 10 | 1 | 10-12-2014 |
| 2 | 2 | 20 | 1 | 10-12-2014 |
| 2 | 3 | 45 | 1 | 10-12-2014 |
| 3 | 1 | 10 | 1 | 9-9-2014 |
| 3 | 3 | 45 | 1 | 9-9-2014 |
| 4 | 2 | 20 | 1 | 11-11-2014 |
I would like to run a query that would calculate the list of items
that most frequently occur together.
In this case the result would be:
|items|frequency|
|-----|---------|
|1,2, |2 |
|1,3 |1 |
|2,3 |1 |
|2 |1 |
Ideally, first presenting orders with more than one items, then presenting
the most frequently ordered single items.
Could anyone please provide an example for how to structure this SQL?
This query generate all of the requested output, in the cases where 2 items occur together. It doesn't include the last item of the requested output since a single value (2) technically doesn't occur together with anything... although you could easily add a UNION query to include values that happen alone.
This is written for PostgreSQL 9.3
create table orders(
order_id int,
item_id int,
amount int,
qty int,
date timestamp
);
INSERT INTO ORDERS VALUES(1,1,10,1,'10-10-2014');
INSERT INTO ORDERS VALUES(1,2,20,1,'10-10-2014');
INSERT INTO ORDERS VALUES(2,1,10,1,'10-12-2014');
INSERT INTO ORDERS VALUES(2,2,20,1,'10-12-2014');
INSERT INTO ORDERS VALUES(2,3,45,1,'10-12-2014');
INSERT INTO ORDERS VALUES(3,1,10,1,'9-9-2014');
INSERT INTO ORDERS VALUES(3,3,45,1,'9-9-2014');
INSERT INTO ORDERS VALUES(4,2,10,1,'11-11-2014');
with order_pairs as (
select (pg1.item_id, pg2.item_id) as items, pg1.date
from
(select distinct item_id, date
from orders) as pg1
join
(select distinct item_id, date
from orders) as pg2
ON
(
pg1.date = pg2.date AND
pg1.item_id != pg2.item_id AND
pg1.item_id < pg2.item_id
)
)
SELECT items, count(*) as frequency
FROM order_pairs
GROUP by items
ORDER by items;
output
items | frequency
-------+-----------
(1,2) | 2
(1,3) | 2
(2,3) | 1
(3 rows)
Market Basket Analysis with Join.
Join on order_id and compare if item_id < self.item_id. So for every item_id you get its associated items sold. And then group by items and count the number of rows for each combinations.
select items,count(*) as 'Freq' from
(select concat(x.item_id,',',y.item_id) as items from orders x
JOIN orders y ON x.order_id = y.order_id and
x.item_id != y.item_id and x.item_id < y.item_id) A
group by A.items order by A.items;