How to sum up the sums of grouped results with SQL?

I am new to SQL and have no clue how to solve my problem. I've got a column of names ([Name]), a column of integer values that I want to sum up ([Values]), and another column of integer values ([Day]).
I want to sum up the values grouped by name for each day, as a running total. So for example, if there is a name "Chris" with value 4 on day 1 and another entry "Chris" with value 2 on day 3, I want to show the sum for Chris on day 1 (4) and on day 3 (4+2=6).
So far I've only worked out how to sum up the values in total (see code below):
select Name,
       Sum([Values]) AS SumValues
from X1
Group by Name;
but as in the example above ("Chris"), I want to sum them up showing the running sum for each name on each day (the sum from day 1 up to day x).

With the sum() window function:
select name, day,
       sum(value) over (partition by name order by day) as total
from tablename;
For this table:
create table tablename(name varchar(10), day int, value int);
insert into tablename(name, day, value) values
('Chris', 1, 2), ('Chris', 2, 4), ('Chris', 3, 8),
('Alice', 1, 5), ('Alice', 2, 10), ('Alice', 3, 20);
the results are:
name  | day | total
------+-----+------
Alice |   1 |     5
Alice |   2 |    15
Alice |   3 |    35
Chris |   1 |     2
Chris |   2 |     6
Chris |   3 |    14
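If a name can have several rows on the same day and you want a single output row per name and day, aggregate first and then apply the window function over the aggregate; a minimal sketch against the same tablename:

select name, day,
       sum(sum(value)) over (partition by name order by day) as total
from tablename
group by name, day;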

How to exclude data based on the weeks in oracle

CREATE TABLE states_tab (
id NUMBER(10),
states VARCHAR2(50),
action VARCHAR2(50),
schedule_time DATE,
CONSTRAINT pk_states_tab PRIMARY KEY ( id )
);
INSERT INTO states_tab VALUES(1,'Albania','Rejected','07-03-22');
INSERT INTO states_tab VALUES(2,'Albania','Approved','07-03-22');
INSERT INTO states_tab VALUES(3,'Albania','Rejected','28-02-22');
INSERT INTO states_tab VALUES(4,'Albania','Approved','21-02-22');
INSERT INTO states_tab VALUES(5,'Albania','Reviewed','14-02-22');
INSERT INTO states_tab VALUES(6,'Albania','Reviewed','14-02-22');
INSERT INTO states_tab VALUES(7,'Albania','Reviewed','07-02-22');
commit;
Hi Team,
Above is some sample data from which I need to extract records based on dates. For example, in my sample data I have data from 7th Feb till 7th March, but I need only the past 4 weeks' data, i.e. till 14th Feb. So I need to exclude the data that comes after the 4th week. Below is my attempt.
SELECT
states,
schedule_time,
SUM(decode(action, 'Rejected', 1, 0)) reject_count,
SUM(decode(action, 'Approved', 1, 0)) approve_count,
SUM(decode(action, 'Reviewed', 1, 0)) review_count
FROM
states_tab
GROUP BY
states,
schedule_time
ORDER BY schedule_time DESC ;
From the above query, I am getting all the records but I need to restrict the records which are beyond the 4th week from 7th Mar 2022.
Expected Output:
+---------+----------------+---------------+----------------+---------------+
| States | Schedule_TIME | REJECT_COUNT | APPROVE_COUNT | REVIEW_COUNT |
+---------+----------------+---------------+----------------+---------------+
| Albania | 07-03-22 | 1 | 1 | 0 |
| Albania | 28-02-22 | 1 | 0 | 0 |
| Albania | 21-02-22 | 0 | 1 | 0 |
| Albania | 14-02-22 | 0 | 0 | 2 |
+---------+----------------+---------------+----------------+---------------+
I just need to exclude the 4th week's records from my existing query. The rest of the output is as expected from my attempt, but how will I be able to exclude the data after the 4th week?
Note: from the current date I need the past 4 weeks' data only; no matter how many records are present, I need data just till the 4th week. This is what I am not able to achieve.
Tool used: Oracle SQL Developer (18c)
Sorry to inform you, but what you actually got is 4 weeks back; the result you expect is 3 weeks back.
SQL> select sysdate,
2 trunc(sysdate) - (4 * 7) four_weeks,
3 trunc(sysdate) - (3 * 7) three_Weeks
4 from dual;
SYSDATE FOUR_WEEKS THREE_WEEK
---------- ---------- ----------
07.03.2022 07.02.2022 14.02.2022
Anyway, whichever period you need, you'll get its data if you filter it, and that's done with the WHERE clause (see line #8):
SQL> SELECT
2 states,
3 schedule_time,
4 SUM(decode(action, 'Rejected', 1, 0)) reject_count,
5 SUM(decode(action, 'Approved', 1, 0)) approve_count,
6 SUM(decode(action, 'Reviewed', 1, 0)) review_count
7 FROM states_tab
8 where schedule_time >= trunc(sysdate) - (3 * 7)
9 GROUP BY
10 states,
11 schedule_time
12 ORDER BY schedule_time DESC ;
STATES SCHEDULE_T REJECT_COUNT APPROVE_COUNT REVIEW_COUNT
--------------- ---------- ------------ ------------- ------------
Albania 07.03.2022 1 1 0
Albania 28.02.2022 1 0 0
Albania 21.02.2022 0 1 0
Albania 14.02.2022 0 0 2
SQL>
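The cut-off is easy to adjust or parametrize; a minimal variant of the filter on line #8, assuming the same 3-week window (an INTERVAL literal works just as well as day arithmetic, and :weeks is a hypothetical bind variable):

where schedule_time >= trunc(sysdate) - interval '21' day
-- or, parametrized by number of weeks:
where schedule_time >= trunc(sysdate) - (:weeks * 7)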

(Presto) SQL: Group by on columns "A" and "B" and count column "C", but also include count of "C" grouped only by "A"

The title of the question feels a bit weird, so if you can imagine a better one please feel free to help.
Hello,
Imagine a situation like this: there's a "Sales" table with 3 columns: date, store and sale_price, where each row records a single item sale:
date | store | sale_price
---------------+---------+------------
2021-09-01 | foo | 15
2021-09-01 | foo | 10
2021-09-01 | foo | 10
2021-09-01 | bar | 5
2021-09-02 | foo | 30
2021-09-02 | bar | 40
2021-09-02 | bar | 20
etc...
What I'm trying to do is create a query that groups by date and store and counts how many items were sold by each store on each day (so, disregarding the price). So far it's very easy, but for visualization purposes I'm also trying to add an extra row per day that includes the aggregate of the sale counts.
Here's the end result I'm looking for:
date | store | sales_count
---------------+-------------+------------
2021-09-01 | foo | 3
2021-09-01 | bar | 1
2021-09-01 | aggregate | 4
2021-09-02 | foo | 1
2021-09-02 | bar | 2
2021-09-02 | aggregate | 3
etc...
I know I can create this by doing a UNION ALL, but it is not super efficient because it scans the original table twice:
SELECT date,
store,
count(sale_price) AS sales_count
FROM sales
GROUP BY 1, 2
UNION ALL
SELECT date,
'aggregate' AS store,
count(sale_price) AS sales_count
FROM sales
GROUP BY 1
I also know that I can create an extra column using the over() clause and avoid scanning "sales" twice, but then I would have two different columns instead of just the one I'm looking for:
SELECT date,
store,
count(sale_price) AS sales_count,
sum(count(sale_price)) over(PARTITION BY date) AS sales_per_day
FROM sales
GROUP BY 1, 2
--->
date | store | sales_count | sales_per_day
---------------+-------------+--------------+-----------------
2021-09-01 | foo | 3 | 4
2021-09-01 | bar | 1 | 4
2021-09-02 | foo | 1 | 3
2021-09-02 | bar | 2 | 3
etc...
Is it even possible to achieve what I'm trying to do without scanning twice? Can the last two columns (sales_count and sales_per_day) be somehow merged?
Thanks in advance for your help.
You can use GROUPING SETS, CUBE and ROLLUP to aggregate at different levels within the same query. You can also use the grouping() operation to determine which columns were considered in the group for a given output row:
WITH data(day, store, sale_price) AS (
VALUES
(DATE '2021-09-01', 'foo', 15),
(DATE '2021-09-01', 'foo', 10),
(DATE '2021-09-01', 'foo', 10),
(DATE '2021-09-01', 'bar', 5),
(DATE '2021-09-02', 'foo', 30),
(DATE '2021-09-02', 'bar', 40),
(DATE '2021-09-02', 'bar', 20)
)
SELECT day,
if(grouping(store) = 1, '<aggregate>', store) AS store,
count(sale_price) as sales_count
FROM data
GROUP BY GROUPING SETS ((day), (day, store))
ORDER BY day, grouping(store)
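With the sample rows above, this returns the single sales_count column the question asks for while scanning the data only once (the relative order of the per-store rows within a day is not pinned down by the ORDER BY):

    day     |    store    | sales_count
------------+-------------+-------------
 2021-09-01 | bar         |           1
 2021-09-01 | foo         |           3
 2021-09-01 | <aggregate> |           4
 2021-09-02 | bar         |           2
 2021-09-02 | foo         |           1
 2021-09-02 | <aggregate> |           3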

How to pivot column data into a row where a maximum qty total cannot be exceeded?

Introduction:
I have come across an unexpected challenge. I'm hoping someone can help, and I am interested in the best method to go about manipulating the data in accordance with this problem.
Scenario:
I need to combine column data associated to two different ID columns. Each row that I have associates an item_id and the quantity for this item_id. Please see below for an example.
+-------+-------+-------+---+
|cust_id|pack_id|item_id|qty|
+-------+-------+-------+---+
| 1 | A | 1 | 1 |
| 1 | A | 2 | 1 |
| 1 | A | 3 | 4 |
| 1 | A | 4 | 0 |
| 1 | A | 5 | 0 |
+-------+-------+-------+---+
I need to manipulate the data shown above so that 24 rows (for 24 item_ids) are combined into a single row. In the example above I have chosen 5 items to make things easier. The selection format I wish to get, assuming 5 item_ids, can be seen below.
+---------+---------+---+---+---+---+---+
| cust_id | pack_id | 1 | 2 | 3 | 4 | 5 |
+---------+---------+---+---+---+---+---+
| 1 | A | 1 | 1 | 4 | 0 | 0 |
+---------+---------+---+---+---+---+---+
However, here's the condition that is making this troublesome. The maximum total quantity for each row must not exceed 5. If the total quantity exceeds 5 a new row associated to the cust_id and pack_id must be created for the rest of the item_id quantities. Please see below for the desired output.
+---------+---------+---+---+---+---+---+
| cust_id | pack_id | 1 | 2 | 3 | 4 | 5 |
+---------+---------+---+---+---+---+---+
| 1 | A | 1 | 1 | 3 | 0 | 0 |
| 1 | A | 0 | 0 | 1 | 0 | 0 |
+---------+---------+---+---+---+---+---+
Notice how the quantities of item_ids 1, 2 and 3 summed together equal 6. This exceeds the maximum total quantity of 5 for each row, so the remainder is carried to a second row. In this case only item_id 3 has a single quantity remaining.
Note: if a 2nd row needs to be created, the total quantity displayed in that row also cannot exceed 5. There is a known item_id limit of 24, but there is no known limit on the quantity associated with each item_id.
Here's an approach which comes a bit out of left field.
One approach would have been to do a recursive CTE, building the rows one by one.
Instead, I've taken an approach where I:
1. Create a new (virtual) table with 1 row per unit of quantity (so if there are 6 units, there will be 6 rows)
2. Group those rows into batches of 5 (I've called these rn_batches)
3. Pivot those (based on counts per item per rn_batch)
For these, the processing is relatively simple:
Creating one row per unit is done using an INNER JOIN to a numbers table with n <= the relevant quantity.
The grouping then just assigns rn_batch = 1 to the first 5 units, rn_batch = 2 to the next 5 units, etc., until there are no more units left for that order (based on cust_id/pack_id).
Here is the code:
/* Data setup */
CREATE TABLE #Order (cust_id int, pack_id varchar(1), item_id int, qty int, PRIMARY KEY (cust_id, pack_id, item_id))
INSERT INTO #Order (cust_id, pack_id, item_id, qty) VALUES
(1, 'A', 1, 1),
(1, 'A', 2, 1),
(1, 'A', 3, 4),
(1, 'A', 4, 0),
(1, 'A', 5, 0);
/* Pivot results */
WITH Nums(n) AS
(SELECT (c * 100) + (b * 10) + (a) + 1 AS n
FROM (VALUES (0),(1),(2),(3),(4),(5),(6),(7),(8),(9)) A(a)
CROSS JOIN (VALUES (0),(1),(2),(3),(4),(5),(6),(7),(8),(9)) B(b)
CROSS JOIN (VALUES (0),(1),(2),(3),(4),(5),(6),(7),(8),(9)) C(c)
),
ItemBatches AS
(SELECT cust_id, pack_id, item_id,
FLOOR((ROW_NUMBER() OVER (PARTITION BY cust_id, pack_id ORDER BY item_id, N.n)-1) / 5) + 1 AS rn_batch
FROM #Order O
INNER JOIN Nums N ON N.n <= O.qty
)
SELECT *
FROM (SELECT cust_id, pack_id, rn_batch, item_id,
             'Item_' + LTRIM(STR(item_id)) AS item_desc
      FROM ItemBatches
     ) src
PIVOT
-- count item_id rather than item_desc: PIVOT cannot aggregate the same column it spreads on
(COUNT(item_id) FOR item_desc IN ([Item_1], [Item_2], [Item_3], [Item_4], [Item_5])) pvt
ORDER BY cust_id, pack_id, rn_batch;
And here are the results:
cust_id  pack_id  rn_batch  Item_1  Item_2  Item_3  Item_4  Item_5
1        A        1         1       1       3       0       0
1        A        2         0       0       1       0       0
Here's a db<>fiddle with additional data in the #Order table, the answer above, and also the processing with each step separated.
Notes
This approach (with the virtual numbers table) assumes a maximum quantity of 1,000 for a given item in an order. If you need more, you can easily extend the numbers table by adding additional CROSS JOINs.
While I am in awe of the coders who made SQL Server and how it determines execution plans in milliseconds, for larger datasets I give SQL Server zero chance of accurately predicting how many rows will be in each step. As such, for performance it may work better to split the code up into parts (including temp tables), similar to the db<>fiddle example.
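A minimal sketch of that split, assuming the same #Order table as above (#Nums and #ItemBatches are intermediate temp tables introduced here for illustration):

/* Materialize a 1..1000 numbers table so the optimizer sees real row counts */
SELECT TOP (1000) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS n
INTO #Nums
FROM sys.all_objects;

/* Materialize one row per unit of quantity, batched in groups of 5 */
SELECT cust_id, pack_id, item_id,
       FLOOR((ROW_NUMBER() OVER (PARTITION BY cust_id, pack_id
                                 ORDER BY item_id, N.n) - 1) / 5) + 1 AS rn_batch
INTO #ItemBatches
FROM #Order O
INNER JOIN #Nums N ON N.n <= O.qty;

/* The PIVOT query above can then read FROM #ItemBatches instead of the CTE */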

Calculate difference of counter data in Hive

I have counter data stored in a Hive table. The counter increments over time and is sometimes reset to zero.
I want to calculate the difference between consecutive rows, but in case of a counter reset the difference is negative. Example data and the expected output are here:
data: 1, 3, 6, 7, 1, 4
difference: 2, 3, 1, -6, 3, NA
expected: 2, 3, 1, 1, 3, NA
Usually such an operation is done by calculating a lag and subtracting it from the data. In case of a negative difference, we should just take the post-reset value itself. Here is an example of a function which does this in R/dplyr:
diff_counter <- function(x) {
  # difference between consecutive measurements
  prev <- dplyr::lag(x)
  dx <- x - prev
  # a negative difference means the counter was reset;
  # take the post-reset value itself as the increment
  reset_idx <- dx < 0 & !is.na(dx)
  dx[reset_idx] <- x[reset_idx]
  return(dx)
}
Can I do something similar in Hive?
Regards
Paweł
Assuming that t is your datetime column and the counter gets incremented in that order, you may use a CASE expression with the LEAD function, like this:
SELECT x,
       CASE
           WHEN LEAD(x) OVER (ORDER BY t) - x > 0
               THEN LEAD(x) OVER (ORDER BY t) - x
           ELSE LEAD(x) OVER (ORDER BY t)
       END AS diff
FROM yourtable;
| X | DIFF |
|---|--------|
| 1 | 2 |
| 3 | 3 |
| 6 | 1 |
| 7 | 1 |
| 1 | 3 |
| 4 | (null) |
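Since the same LEAD expression appears three times, you may prefer to compute it once in a derived table; a sketch assuming the same yourtable, x and t names as above:

SELECT x,
       CASE WHEN next_x - x > 0 THEN next_x - x ELSE next_x END AS diff
FROM (
    SELECT x, t, LEAD(x) OVER (ORDER BY t) AS next_x
    FROM yourtable
) s;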

How to aggregate values from different rows in sql (HANA)?

I have a table of shipments defined like so (the table is stored in a HANA database, if relevant):
CREATE COLUMN TABLE SHIPMENTS (
ShipmentID INT PRIMARY KEY,
Received INT,
Facility NVARCHAR(10),
Item NVARCHAR(20)
);
Here, the 'Received' column denotes the point in time at which each shipment is received, Facility is where the shipment is received and Item is the content of the shipment.
I have filled it with data like so:
INSERT INTO SHIPMENTS VALUES (1, 0, 'Factory', 'Production machine');
INSERT INTO SHIPMENTS VALUES (2, 0, 'Office', 'Printer');
INSERT INTO SHIPMENTS VALUES (3, 0, 'Factory', 'Coffee maker');
INSERT INTO SHIPMENTS VALUES (4, 1, 'Office', 'Coffee maker');
INSERT INTO SHIPMENTS VALUES (5, 1, 'Factory', 'Fax Machine');
INSERT INTO SHIPMENTS VALUES (6, 2, 'Office', 'Computers');
INSERT INTO SHIPMENTS VALUES (7, 2, 'Factory', 'Fridge');
INSERT INTO SHIPMENTS VALUES (8, 2, 'Factory', 'Freezer');
INSERT INTO SHIPMENTS VALUES (9, 2, 'Office', 'Fax Machine');
I would like to query the database to find, at each point in time, which items have been received up until that point. Based on an answer from another thread, I start by doing this:
SELECT Facility, Received, STRING_AGG (Item, ';') as Items
FROM (
SELECT * FROM SHIPMENTS
ORDER BY Facility, Received
)
GROUP BY Facility, Received
ORDER BY Facility, Received;
which results in
  | FACILITY | RECEIVED | ITEMS
--+----------+----------+---------------------------------
1 | Factory  |        0 | Production Machine;Coffee maker
2 | Factory  |        1 | Fax Machine
3 | Factory  |        2 | Fridge;Freezer
4 | Office   |        0 | Printer
5 | Office   |        1 | Coffee maker
6 | Office   |        2 | Computers;Fax Machine
However, I would like this:
  | FACILITY | RECEIVED | ITEMS
--+----------+----------+------------------------------------------------------------
1 | Factory  |        0 | Production Machine;Coffee maker
2 | Factory  |        1 | Production Machine;Coffee maker;Fax Machine
3 | Factory  |        2 | Production Machine;Coffee maker;Fax Machine;Fridge;Freezer
4 | Office   |        0 | Printer
5 | Office   |        1 | Printer;Coffee maker
6 | Office   |        2 | Printer;Coffee maker;Computers;Fax Machine
I.e., each row displays what is received at that point, plus everything that has already been received. Is there a nice way to do this in SQL?
You can try using a correlated subquery in the select clause to generate the CSV data you want:
SELECT
Facility,
Received,
(SELECT STRING_AGG (s2.Item, ';') FROM SHIPMENTS s2
WHERE s2.Facility = s1.Facility AND s2.Received <= s1.Received
GROUP BY s2.Facility) AS ITEMS
FROM SHIPMENTS s1
GROUP BY
Facility,
Received
ORDER BY
Facility;
Maybe it could be a good idea to use the ORDER BY clause with the STRING_AGG function, to make sure that the concatenation will be in the desired order:
select
distinct Facility, Received,
(
select string_agg(s.Item, ';' order by Received, ShipmentID)
from Shipments s
where
s.Facility = t.Facility and
s.Received <= t.Received
) as Items
from Shipments t
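For completeness, the same cumulative aggregation can also be phrased as a self-join plus GROUP BY instead of a subquery in the select list; a sketch against the same SHIPMENTS table:

select s1.Facility, s1.Received,
       string_agg(s2.Item, ';' order by s2.Received, s2.ShipmentID) as Items
from (select distinct Facility, Received from Shipments) s1
join Shipments s2
  on s2.Facility = s1.Facility and s2.Received <= s1.Received
group by s1.Facility, s1.Received
order by s1.Facility, s1.Received;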