Query Totaling Multiple columns and grouping by another

I am trying to total multiple columns across records, grouped by Location. There are various locations, and each location has different species with their own totals at that location.
Short of doing something really ugly, I have no idea of the best approach. Any suggestions?
I have a table:
Record
RecordID, Location, Species1, Species2, Species3
1, Loc1, 3, NULL,1
2, Loc2, NULL, 12, NULL
3, Loc2, 2, 2, 2
4, Loc1, 1, 2, 3
5, Loc4, 3, NULL, NULL
I need to get the following data like this:
Location | Species1 | Species2 | Species3
Loc1 | 4 | 2 | 4
Loc2 | 2 | 14 | 2
Loc4 | 3 | NULL | NULL

SELECT Location,
       SUM(ISNULL(Species1, 0)) AS Species1,
       SUM(ISNULL(Species2, 0)) AS Species2,
       SUM(ISNULL(Species3, 0)) AS Species3
FROM Record
GROUP BY Location
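One detail worth checking: SUM already skips NULLs, so wrapping each column in ISNULL(..., 0) only changes the result for groups where every value is NULL (0 instead of NULL). The requested output shows NULL for Loc4, so a plain SUM may be closer to what was asked. The sketch below is an illustration only; it assumes SQLite through Python's sqlite3 module, and uses COALESCE as a stand-in for ISNULL, which SQLite does not have.

```python
import sqlite3

# Assumption: SQLite in memory stands in for the asker's database.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Record (RecordID INTEGER, Location TEXT, "
    "Species1 INTEGER, Species2 INTEGER, Species3 INTEGER)"
)
conn.executemany(
    "INSERT INTO Record VALUES (?, ?, ?, ?, ?)",
    [
        (1, "Loc1", 3, None, 1),
        (2, "Loc2", None, 12, None),
        (3, "Loc2", 2, 2, 2),
        (4, "Loc1", 1, 2, 3),
        (5, "Loc4", 3, None, None),
    ],
)

# Plain SUM ignores NULLs and yields NULL only when every input is NULL.
plain = conn.execute(
    "SELECT Location, SUM(Species1), SUM(Species2), SUM(Species3) "
    "FROM Record GROUP BY Location ORDER BY Location"
).fetchall()

# COALESCE(..., 0) turns all-NULL groups into 0 instead of NULL.
zeroed = conn.execute(
    "SELECT Location, SUM(COALESCE(Species1, 0)), SUM(COALESCE(Species2, 0)), "
    "SUM(COALESCE(Species3, 0)) FROM Record GROUP BY Location ORDER BY Location"
).fetchall()

for row in plain:
    print(row)
for row in zeroed:
    print(row)
```

Which variant is preferable depends on whether a missing count should be reported as "unknown" (NULL) or as zero.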

Related

(Presto) SQL: Group by on columns "A" and "B" and count column "C", but also include count of "C" grouped only by "A"

The title of the question feels a bit weird so if you can imagine a better one please feel free to help.
Hello,
imagine a situation like this - there's a "Sales" table with 3 columns: date, store and sale_price, each row indicates a single item sale:
date | store | sale_price
---------------+---------+------------
2021-09-01 | foo | 15
2021-09-01 | foo | 10
2021-09-01 | foo | 10
2021-09-01 | bar | 5
2021-09-02 | foo | 30
2021-09-02 | bar | 40
2021-09-02 | bar | 20
etc...
What I'm trying to do is create a query that groups by date and store, and counts how many items have been sold by each store in each day (so, disregarding the price). So far it's very easy, but for visualization purposes, I'm also trying to add an extra row, that per day also includes the aggregate of sale counts.
Here's the end result I'm looking for:
date | store | sales_count
---------------+-------------+------------
2021-09-01 | foo | 3
2021-09-01 | bar | 1
2021-09-01 | aggregate | 4
2021-09-02 | foo | 1
2021-09-02 | bar | 2
2021-09-02 | aggregate | 3
etc...
I know I can create this by doing a UNION ALL, but it is not super efficient because it scans the original table twice:
SELECT date,
       store,
       count(sale_price) AS sales_count
FROM sales
GROUP BY 1, 2
UNION ALL
SELECT date,
       'aggregate' AS store,
       count(sale_price) AS sales_count
FROM sales
GROUP BY 1
I also know that I can create an extra column using over() clause, and avoid scanning "sales" twice, but then I would have two different columns instead of just one like I'm looking for:
SELECT date,
       store,
       count(sale_price) AS sales_count,
       sum(count(sale_price)) over(PARTITION BY date) AS sales_per_day
FROM sales
GROUP BY 1, 2
--->
date | store | sales_count | sales_per_day
---------------+-------------+--------------+-----------------
2021-09-01 | foo | 3 | 4
2021-09-01 | bar | 1 | 4
2021-09-02 | foo | 1 | 3
2021-09-02 | bar | 2 | 3
etc...
Is it even possible to achieve what I'm trying to do without scanning twice? Can the last two columns (sales_count and sales_per_day) be somehow merged?
Thanks in advance for your help.

You can use GROUPING SETS, CUBE and ROLLUP to aggregate at different levels within the same query. You can also use the GROUPING operation to determine which columns were considered in the group for a given output row:
WITH data(day, store, sale_price) AS (
VALUES
(DATE '2021-09-01', 'foo', 15),
(DATE '2021-09-01', 'foo', 10),
(DATE '2021-09-01', 'foo', 10),
(DATE '2021-09-01', 'bar', 5),
(DATE '2021-09-02', 'foo', 30),
(DATE '2021-09-02', 'bar', 40),
(DATE '2021-09-02', 'bar', 20)
)
SELECT day,
       if(grouping(store) = 1, '<aggregate>', store) AS store,
       count(sale_price) AS sales_count
FROM data
GROUP BY GROUPING SETS ((day), (day, store))
ORDER BY day, grouping(store)
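To make it concrete what the two grouping sets produce, here is a rough Python sketch that accumulates both levels in a single pass over the rows. It only illustrates the result shape and is not a description of how Presto executes GROUPING SETS; the '<aggregate>' label and the sort key are choices made for this example.

```python
from collections import Counter
from datetime import date

sales = [
    (date(2021, 9, 1), "foo", 15),
    (date(2021, 9, 1), "foo", 10),
    (date(2021, 9, 1), "foo", 10),
    (date(2021, 9, 1), "bar", 5),
    (date(2021, 9, 2), "foo", 30),
    (date(2021, 9, 2), "bar", 40),
    (date(2021, 9, 2), "bar", 20),
]

AGGREGATE = "<aggregate>"

counts = Counter()
for day, store, price in sales:
    if price is None:
        continue  # count(sale_price) skips NULL prices
    counts[(day, store)] += 1      # grouping set (day, store)
    counts[(day, AGGREGATE)] += 1  # grouping set (day)

# Per-store rows first, aggregate row last within each day.
rows = sorted(
    ((day, store, n) for (day, store), n in counts.items()),
    key=lambda r: (r[0], r[1] == AGGREGATE, r[1]),
)

for row in rows:
    print(row)
```

Each input row is read once and contributes to both levels, which is the property the question was after.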

How to pivot column data into a row where a maximum qty total cannot be exceeded?

Introduction:
I have come across an unexpected challenge. I'm hoping someone can help, and I am interested in the best method for manipulating the data for this problem.
Scenario:
I need to combine column data associated with two different ID columns. Each row that I have associates an item_id with the quantity for that item_id. Please see below for an example.
+-------+-------+-------+---+
|cust_id|pack_id|item_id|qty|
+-------+-------+-------+---+
| 1 | A | 1 | 1 |
| 1 | A | 2 | 1 |
| 1 | A | 3 | 4 |
| 1 | A | 4 | 0 |
| 1 | A | 5 | 0 |
+-------+-------+-------+---+
I need to manipulate the data shown above so that 24 rows (for 24 item_ids) are combined into a single row. In the example above I have chosen 5 items to make things easier. The selection format I wish to get, assuming 5 item_ids, can be seen below.
+---------+---------+---+---+---+---+---+
| cust_id | pack_id | 1 | 2 | 3 | 4 | 5 |
+---------+---------+---+---+---+---+---+
| 1 | A | 1 | 1 | 4 | 0 | 0 |
+---------+---------+---+---+---+---+---+
However, here's the condition that is making this troublesome. The total quantity for each row must not exceed 5. If the total quantity exceeds 5, a new row associated with the same cust_id and pack_id must be created for the rest of the item_id quantities. Please see below for the desired output.
+---------+---------+---+---+---+---+---+
| cust_id | pack_id | 1 | 2 | 3 | 4 | 5 |
+---------+---------+---+---+---+---+---+
| 1 | A | 1 | 1 | 3 | 0 | 0 |
| 1 | A | 0 | 0 | 1 | 0 | 0 |
+---------+---------+---+---+---+---+---+
Notice how the quantities of item_ids 1, 2 and 3 summed together equal 6. This exceeds the maximum total quantity of 5 for each row, so the remainder is carried over to a second row. In this case only item_id 3 has a quantity of 1 remaining.
Note, if a 2nd row needs to be created, the total quantity displayed in that row also cannot exceed 5. There is a known item_id limit of 24, but there is no known limit on the quantity associated with each item_id.

Here's an approach that comes a bit from left field.
One approach would have been to do a recursive CTE, building the rows one by one.
Instead, I've taken an approach where I:
1. Create a new (virtual) table with 1 row per unit of quantity (so if the quantities add up to 6 items, there will be 6 rows)
2. Group those items into groups of 5 (I've called these rn_batches)
3. Pivot those (based on counts per item per rn_batch)
The processing for each of these steps is relatively simple.
Creating one row per item is done using INNER JOIN to a numbers table with n <= the relevant quantity.
The grouping then just assigns rn_batch = 1 for the first 5 items, rn_batch = 2 for the next 5 items, etc., until there are no more items left for that order (based on cust_id/pack_id).
Here is the code:
/* Data setup */
CREATE TABLE #Order (cust_id int, pack_id varchar(1), item_id int, qty int, PRIMARY KEY (cust_id, pack_id, item_id))
INSERT INTO #Order (cust_id, pack_id, item_id, qty) VALUES
(1, 'A', 1, 1),
(1, 'A', 2, 1),
(1, 'A', 3, 4),
(1, 'A', 4, 0),
(1, 'A', 5, 0);
/* Pivot results */
WITH Nums(n) AS
    (SELECT (c * 100) + (b * 10) + (a) + 1 AS n
     FROM (VALUES (0),(1),(2),(3),(4),(5),(6),(7),(8),(9)) A(a)
     CROSS JOIN (VALUES (0),(1),(2),(3),(4),(5),(6),(7),(8),(9)) B(b)
     CROSS JOIN (VALUES (0),(1),(2),(3),(4),(5),(6),(7),(8),(9)) C(c)
    ),
ItemBatches AS
    (SELECT cust_id, pack_id, item_id,
            FLOOR((ROW_NUMBER() OVER (PARTITION BY cust_id, pack_id ORDER BY item_id, N.n) - 1) / 5) + 1 AS rn_batch
     FROM #Order O
     INNER JOIN Nums N ON N.n <= O.qty
    )
SELECT *
FROM (SELECT cust_id, pack_id, rn_batch, 'Item_' + LTRIM(STR(item_id)) AS item_desc
      FROM ItemBatches
     ) src
PIVOT
    (COUNT(item_desc) FOR item_desc IN ([Item_1], [Item_2], [Item_3], [Item_4], [Item_5])) pvt
ORDER BY cust_id, pack_id, rn_batch;
And here are results
cust_id pack_id rn_batch Item_1 Item_2 Item_3 Item_4 Item_5
1 A 1 1 1 3 0 0
1 A 2 0 0 1 0 0
Here's a db<>fiddle with additional data in the #Order table, the answer above, and also the processing with each step separated.
Notes
This approach (with the virtual numbers table) assumes a maximum quantity of 1,000 for a given item in an order. If you need more, you can easily extend that numbers table by adding additional CROSS JOINs.
While I am in awe of the coders who made SQL Server and how it determines execution plans in milliseconds, for larger datasets I give SQL Server zero chance of accurately predicting how many rows will be in each step. As such, for performance, it may work better to split the code up into parts (including temp tables), similar to the db<>fiddle example.
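The same three steps (expand each quantity into unit rows, cut the units into batches of 5, count per item within each batch) can be sketched outside SQL to check the logic. This is a minimal Python illustration under the sample data above; batch_rows and MAX_PER_ROW are names invented for the example, and items that do not appear in a batch are simply absent from the dictionary (read them as 0).

```python
from collections import Counter, defaultdict

MAX_PER_ROW = 5  # hypothetical name for the per-row quantity cap

# (cust_id, pack_id, item_id, qty), same sample rows as the #Order table
orders = [
    (1, "A", 1, 1),
    (1, "A", 2, 1),
    (1, "A", 3, 4),
    (1, "A", 4, 0),
    (1, "A", 5, 0),
]

def batch_rows(orders, max_per_row=MAX_PER_ROW):
    # Step 1: one entry per unit of quantity, ordered by item_id.
    units = defaultdict(list)
    for cust_id, pack_id, item_id, qty in sorted(orders):
        units[(cust_id, pack_id)].extend([item_id] * qty)

    rows = []
    for (cust_id, pack_id), items in sorted(units.items()):
        # Step 2: consecutive slices of max_per_row units form a batch.
        for start in range(0, len(items), max_per_row):
            # Step 3: "pivot" by counting units per item in the batch.
            per_item = dict(Counter(items[start:start + max_per_row]))
            rn_batch = start // max_per_row + 1
            rows.append((cust_id, pack_id, rn_batch, per_item))
    return rows

result = batch_rows(orders)
for row in result:
    print(row)
```

For the sample order this gives batch 1 with items 1, 2 and 3 at quantities 1, 1 and 3, and batch 2 with the remaining unit of item 3, matching the results table above.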

How to sum up the sums of grouped results with SQL?

I am new to SQL and have no clue how to solve my problem. I've got a column of names ([name]), a column of integer values that I wanna sum up ([Values]) and another column of integer values ([Day]).
I want to sum up the values grouped by name for each day. So for example if there is a name "Chris" with value 4 on day 1 and there is another entry "Chris" with value 2 on day 3, I want to show the sum of Chris on day 1 (4) and on day 3 (4+2=6).
So far I've only worked it out to sum up the values in total (see code below).
select Name,
Sum(Values) AS SumValues
from X1
Group by Name;
but as in the example above ("chris") I wanna sum them up, showing the sum for each name on each day (the sum from day 1 until day x).

With sum() window function:
select name, day,
sum(value) over (partition by name order by day) total
from tablename
For this table:
create table tablename(name varchar(10), day int, value int);
insert into tablename(name, day, value) values
('Chris', 1, 2), ('Chris', 2, 4), ('Chris', 3, 8),
('Alice', 1, 5), ('Alice', 2, 10), ('Alice', 3, 20);
the results are:
name  | day | total
:---- | --: | ----:
Alice |   1 |     5
Alice |   2 |    15
Alice |   3 |    35
Chris |   1 |     2
Chris |   2 |     6
Chris |   3 |    14
See the demo.
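If a name can have several entries on the same day, one option is to aggregate per name and day first and then take the running total over those daily sums, so each name/day pair appears once. The sketch below assumes SQLite (3.25 or later, for window functions) via Python's sqlite3; the table name entries and the extra same-day row are made up for the illustration.

```python
import sqlite3

# Assumption: SQLite >= 3.25 so that window functions are available.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE entries (name TEXT, day INTEGER, value INTEGER)")
conn.executemany(
    "INSERT INTO entries VALUES (?, ?, ?)",
    [
        ("Chris", 1, 4),
        ("Chris", 3, 2),
        ("Chris", 3, 1),  # second entry on the same day
        ("Alice", 1, 5),
    ],
)

running = conn.execute(
    """
    SELECT name, day,
           SUM(day_total) OVER (PARTITION BY name ORDER BY day) AS total
    FROM (SELECT name, day, SUM(value) AS day_total
          FROM entries
          GROUP BY name, day) AS daily
    ORDER BY name, day
    """
).fetchall()

for row in running:
    print(row)
```

Without the inner GROUP BY, the window version would return one row per entry, and with the default frame both same-day rows would show the same running total.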

Access sql to retrieve counts of values meeting a condition

I'm trying to write a query in Access that will return a count of values for each site in a table where the value exceeds a specified level, but also, for sites that have no values exceeding that level, return a specified value, such as "NA".
I've tried Iif, Switch, Union, sub queries, querying a different query, but no luck. I can get all the counts exceeding the level, or all sites with "NA" correct but showing total count for the rest, not just count above the level.
For example, in the table below, assuming level > 10, Houston = "NA", Detroit = 2, Pittsburgh PA = 3. I just can't get both sides of the query to work.
Apologize in advance for poor formatting.
+------------+-------+
| Site       | Value |
+------------+-------+
| Houston    | 10    |
| Houston    | 3     |
| Houston    | 0     |
| Detroit    | 15    |
| Detroit    | 7     |
| Detroit    | 4     |
| Detroit    | 12    |
| Pittsburgh | 23    |
| Pittsburgh | 2     |
| Pittsburgh | 18    |
| Pittsburgh | 12    |
+------------+-------+

Another solution is to use conditional aggregation, as follows:
SELECT site, SUM(IIf(value > 10, 1, 0)) AS value
FROM mytable
GROUP BY site
This approach should be more efficient than self-joining the table, since it scans the table only once.
The SUM(IIf ...) is a handy construct to count how many records satisfy a given condition.
NB: it is generally not a good idea to return two different data types in the same column (in your use case, either a number or the string 'NA'). Most RDBMS do not allow that. So I provided a query that returns 0 when there are no matches, instead of NA. If you really want 'NA', you can try:
IIF(
SUM(IIf(value > 10, 1, 0)) = 0,
'NA',
STR(SUM(IIf(value > 10, 1, 0)))
) AS value
This demo on DB Fiddle, with your sample data returns :
site | value
:--------- | ----:
Detroit | 2
Houston | 0
Pittsburgh | 3
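For a quick check of the conditional-aggregation idea against the question's sample data, here is a small sketch. It assumes SQLite via Python's sqlite3 rather than Access, so IIf is written as a CASE expression and STR as CAST(... AS TEXT); the column aliases n_over and label are invented for the example.

```python
import sqlite3

# Assumption: SQLite stands in for Access; CASE replaces IIf.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (site TEXT, value INTEGER)")
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?)",
    [
        ("Houston", 10), ("Houston", 3), ("Houston", 0),
        ("Detroit", 15), ("Detroit", 7), ("Detroit", 4), ("Detroit", 12),
        ("Pittsburgh", 23), ("Pittsburgh", 2), ("Pittsburgh", 18),
        ("Pittsburgh", 12),
    ],
)

counts = conn.execute(
    """
    SELECT site,
           SUM(CASE WHEN value > 10 THEN 1 ELSE 0 END) AS n_over,
           CASE WHEN SUM(CASE WHEN value > 10 THEN 1 ELSE 0 END) = 0
                THEN 'NA'
                ELSE CAST(SUM(CASE WHEN value > 10 THEN 1 ELSE 0 END) AS TEXT)
           END AS label
    FROM mytable
    GROUP BY site
    ORDER BY site
    """
).fetchall()

for row in counts:
    print(row)
```

Keeping the numeric count and the text label as separate columns avoids mixing data types in one column while still letting the report show 'NA'.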

1. Get a list of all sites independent of the counts (the SiteList derived table below).
2. LEFT JOIN this back to your base table (SiteValues) to get the rows for each site that meet the threshold. Note: this should join on a key, and I'm not sure what the key is for this table; site alone isn't enough.
3. Count the values from the SiteValues dataset. NULLs from unmatched rows are not counted, so sites with no match get 0.
WORKING DEMO:
SELECT SiteList.Site, Count(Sitevalues.Site)
FROM (SELECT site, value
FROM TableName) SiteList
LEFT JOIN TableName SiteValues
on SiteList.Site = SiteValues.Site
and SiteValues.Value > 10
and SiteValues.Value = SiteList.value
GROUP BY SiteList.Site
GIVING US:
+----+------------+------------------+
| | Site | (No column name) |
+----+------------+------------------+
| 1 | Detroit | 2 |
| 2 | Houston | 0 |
| 3 | Pittsburgh | 3 |
+----+------------+------------------+
Or if you need the NA you have to cast the count to a varchar
SELECT SiteList.Site, case when Count(Sitevalues.Site) = 0 then 'NA' else cast(count(Sitevalues.site) as varchar(10)) end as SitesMeetingThreshold
FROM (SELECT site, value
FROM TableName) SiteList
LEFT JOIN TableName SiteValues
on SiteList.Site = SiteValues.Site
and SiteValues.Value > 10
and SiteValues.Value = SiteList.value
GROUP BY SiteList.Site

Just use conditional aggregation:
select site,
       sum(iif(value > 10, 1, 0)) as cnt_11plus
from t
group by site;
I think 0 is better than N/A. But if you want that, you'll need to convert the results to a string.
select site,
       iif(sum(iif(value > 10, 1, 0)) > 0,
           str(sum(iif(value > 10, 1, 0))),
           "N/A"
          ) as cnt_11plus
from t
group by site;

You can use UNION like this:
SELECT site, count(value) AS counter
FROM sites
WHERE value > 10
GROUP BY site
UNION
SELECT s.site, 'NA' AS counter
FROM sites AS s
WHERE value <= 10
AND NOT EXISTS (
SELECT 1 FROM sites WHERE site = s.site AND value > 10
)
GROUP BY site
Results:
site counter
Detroit 2
Houston NA
Pittsburgh 3
There is no need to convert the integer counter to Text, because Access does this implicitly for you.

How to aggregate values from different rows in sql (HANA)?

I have a table of shipments defined like so (the table is stored in a HANA database, if relevant):
CREATE COLUMN TABLE SHIPMENTS (
ShipmentID INT PRIMARY KEY,
Received INT,
Facility NVARCHAR(10),
Item NVARCHAR(20)
);
Here, the 'Received' column denotes the point in time at which each shipment is received, Facility is where the shipment is received and Item is the content of the shipment.
I have filled it with data like so:
INSERT INTO SHIPMENTS VALUES (1, 0, 'Factory', 'Production machine');
INSERT INTO SHIPMENTS VALUES (2, 0, 'Office', 'Printer');
INSERT INTO SHIPMENTS VALUES (3, 0, 'Factory', 'Coffee maker');
INSERT INTO SHIPMENTS VALUES (4, 1, 'Office', 'Coffee maker');
INSERT INTO SHIPMENTS VALUES (5, 1, 'Factory', 'Fax Machine');
INSERT INTO SHIPMENTS VALUES (6, 2, 'Office', 'Computers');
INSERT INTO SHIPMENTS VALUES (7, 2, 'Factory', 'Fridge');
INSERT INTO SHIPMENTS VALUES (8, 2, 'Factory', 'Freezer');
INSERT INTO SHIPMENTS VALUES (9, 2, 'Office', 'Fax Machine');
I would like to query the database to find, at each point in time, which items have been received up until that point. Based on an answer from another thread, I start by doing this:
SELECT Facility, Received, STRING_AGG (Item, ';') as Items
FROM (
SELECT * FROM SHIPMENTS
ORDER BY Facility, Received
)
GROUP BY Facility, Received
ORDER BY Facility, Received;
which results in
| FACILITY | RECEIVED | ITEMS
---------------------------------------------------------
1 | Factory | 0 | Production machine;Coffee maker
2 | Factory | 1 | Fax Machine
3 | Factory | 2 | Fridge;Freezer
4 | Office | 0 | Printer
5 | Office | 1 | Coffee maker
6 | Office | 2 | Computers;Fax Machine
However, I would like this
| FACILITY | RECEIVED | ITEMS
---------------------------------------------------------
1 | Factory | 0 | Production machine;Coffee maker
2 | Factory | 1 | Production machine;Coffee maker;Fax Machine
3 | Factory | 2 | Production machine;Coffee maker;Fax Machine;Fridge;Freezer
4 | Office | 0 | Printer
5 | Office | 1 | Printer;Coffee maker
6 | Office | 2 | Printer;Coffee maker;Computers;Fax Machine
I.e, each row displays what is received at that point, and everything that has already been received. Is there a nice way to do this in SQL?

You can try using a correlated subquery in the select clause to generate the delimited list you want:
SELECT
    Facility,
    Received,
    (SELECT STRING_AGG (s2.Item, ';') FROM SHIPMENTS s2
     WHERE s2.Facility = s1.Facility AND s2.Received <= s1.Received
     GROUP BY s2.Facility) AS ITEMS
FROM SHIPMENTS s1
GROUP BY
    Facility,
    Received
ORDER BY
    Facility,
    Received;

It may be a good idea to use an ORDER BY clause inside the String_Agg function to make sure that the concatenation happens in the desired order:
select
    distinct Facility, Received,
    (
        select string_agg(s.Item, ';' order by Received, ShipmentID)
        from Shipments s
        where
            s.Facility = t.Facility and
            s.Received <= t.Received
    ) as Items
from Shipments t
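To double-check what the correlated subquery is expected to return, the cumulative list can be reproduced with a short Python sketch over the sample shipments. It is only a reference for the expected result, not HANA code; cumulative_items is a name made up for the example, and items are ordered by Received and then ShipmentID, as in the ORDER BY variant above.

```python
from itertools import groupby

# (ShipmentID, Received, Facility, Item), same rows as the INSERT statements
shipments = [
    (1, 0, "Factory", "Production machine"),
    (2, 0, "Office", "Printer"),
    (3, 0, "Factory", "Coffee maker"),
    (4, 1, "Office", "Coffee maker"),
    (5, 1, "Factory", "Fax Machine"),
    (6, 2, "Office", "Computers"),
    (7, 2, "Factory", "Fridge"),
    (8, 2, "Factory", "Freezer"),
    (9, 2, "Office", "Fax Machine"),
]

def cumulative_items(shipments, sep=";"):
    # Sort by Facility, then Received, then ShipmentID for a stable order.
    ordered = sorted(shipments, key=lambda s: (s[2], s[1], s[0]))
    result = []
    for facility, rows in groupby(ordered, key=lambda s: s[2]):
        seen = []  # everything received at this facility so far
        for received, same_time in groupby(rows, key=lambda s: s[1]):
            seen.extend(item for _id, _rec, _fac, item in same_time)
            result.append((facility, received, sep.join(seen)))
    return result

cumulative = cumulative_items(shipments)
for row in cumulative:
    print(row)
```

Comparing a query's output against this list is an easy way to spot ordering differences between STRING_AGG with and without an explicit ORDER BY.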
