Generating Rows Based on Column Value - sql

One of my tables in my database contains rows with requisition numbers and other related info. I am trying to create a second table (populated with an INSERT INTO statement) that duplicates these rows and adds a series value based on the value in the QuantityOrdered column.
For example, the first table is shown below:
+-------------+----------+
| Requisition | Quantity |
+-------------+----------+
| 10001_01_AD | 4 |
+-------------+----------+
and I would like the output to be as follows:
+-------------+----------+----------+
| Requisition | Quantity | Series |
+-------------+----------+----------+
| 10001_01_AD | 4 | 1 |
| 10001_01_AD | 4 | 2 |
| 10001_01_AD | 4 | 3 |
| 10001_01_AD | 4 | 4 |
+-------------+----------+----------+
I've been attempting to use Row_Number() to sequence the values but it's numbering rows based on instances of Requisition values, not based on the Quantity value.

Non-recursive way:
SELECT *
FROM tab t
CROSS APPLY (SELECT n
FROM (SELECT ROW_NUMBER() OVER(ORDER BY 1/0) AS n
FROM master..spt_values s1) AS sub
WHERE sub.n <= t.Quantity) AS s2(Series);
db<>fiddle demo

You need recursive way :
with t as (
select Requisition, 1 as start, Quantity
from table
union all
select Requisition, start + 1, Quantity
from t
where start < Quantity
)
select Requisition, Quantity, start as Series
from t;
However, by default it has limited to only 100 Quantities, if you have a more then you need to specify the query hint by using option (maxrecursion 0).

A simple method uses recursive CTEs:
with cte as (
select requsition, quantity, 1 as series
from t
union all
select requsition, quantity, 1 + series
from t
where lev < quantity
)
select requsition, quantity, series
from cte;
With default setting, this works up to a quantity of 100. For larger quantities, you can add option (maxrecursion 0) to the query.

Related

Stop SQL Select After Sum Reached

My database is Db2 for IBM i.
I have read-only access, so my query must use only basic SQL select commands.
==============================================================
Goal:
I want to select every record in the table until the sum of the amount column exceeds the predetermined limit.
Example:
I want to match every item down the table until the sum of matched values in the "price" column >= $9.00.
The desired result:
Is this possible?
You may use sum analytic function to calculate running total of price and then filter by its value:
with a as (
select
t.*,
sum(price) over(order by salesid asc) as price_rsum
from t
)
select *
from a
where price_rsum <= 9
SALESID | PRICE | PRICE_RSUM
------: | ----: | ---------:
1001 | 5 | 5
1002 | 3 | 8
1003 | 1 | 9
db<>fiddle here

How to pivot column data into a row where a maximum qty total cannot be exceeded?

Introduction:
I have come across an unexpected challenge. I'm hoping someone can help and I am interested in the best method to go about manipulating the data in accordance to this problem.
Scenario:
I need to combine column data associated to two different ID columns. Each row that I have associates an item_id and the quantity for this item_id. Please see below for an example.
+-------+-------+-------+---+
|cust_id|pack_id|item_id|qty|
+-------+-------+-------+---+
| 1 | A | 1 | 1 |
| 1 | A | 2 | 1 |
| 1 | A | 3 | 4 |
| 1 | A | 4 | 0 |
| 1 | A | 5 | 0 |
+-------+-------+-------+---+
I need to manipulate the data shown above so that 24 rows (for 24 item_ids) is combined into a single row. In the example above I have chosen 5 items to make things easier. The selection format I wish to get, assuming 5 item_ids, can be seen below.
+---------+---------+---+---+---+---+---+
| cust_id | pack_id | 1 | 2 | 3 | 4 | 5 |
+---------+---------+---+---+---+---+---+
| 1 | A | 1 | 1 | 4 | 0 | 0 |
+---------+---------+---+---+---+---+---+
However, here's the condition that is making this troublesome. The maximum total quantity for each row must not exceed 5. If the total quantity exceeds 5 a new row associated to the cust_id and pack_id must be created for the rest of the item_id quantities. Please see below for the desired output.
+---------+---------+---+---+---+---+---+
| cust_id | pack_id | 1 | 2 | 3 | 4 | 5 |
+---------+---------+---+---+---+---+---+
| 1 | A | 1 | 1 | 3 | 0 | 0 |
| 1 | A | 0 | 0 | 1 | 0 | 0 |
+---------+---------+---+---+---+---+---+
Notice how the quantities of item_ids 1, 2 and 3 summed together equal 6. This exceeds the maximum total quantity of 5 for each row. For the second row the difference is created. In this case only item_id 3 has a single quantity remaining.
Note, if a 2nd row needs to be created that total quantity displayed in that row also cannot exceed 5. There is a known item_id limit of 24. But, there is no known limit of the quantity associated for each item_id.
Here's an approach which goes from left-field a bit.
One approach would have been to do a recursive CTE, building the rows one-by-one.
Instead, I've taken an approach where I
Create a new (virtual) table with 1 row per item (so if there are 6 items, there will be 6 rows)
Group those items into groups of 5 (I've called these rn_batches)
Pivot those (based on counts per item per rn_batch)
For these, processing is relatively simple
Creating one row per item is done using INNER JOIN to a numbers table with n <= the relevant quantity.
The grouping then just assigns rn_batch = 1 for the first 5 items, rn_batch = 2 for the next 5 items, etc - until there are no more items left for that order (based on cust_id/pack_id).
Here is the code
/* Data setup */
CREATE TABLE #Order (cust_id int, pack_id varchar(1), item_id int, qty int, PRIMARY KEY (cust_id, pack_id, item_id))
INSERT INTO #Order (cust_id, pack_id, item_id, qty) VALUES
(1, 'A', 1, 1),
(1, 'A', 2, 1),
(1, 'A', 3, 4),
(1, 'A', 4, 0),
(1, 'A', 5, 0);
/* Pivot results */
WITH Nums(n) AS
(SELECT (c * 100) + (b * 10) + (a) + 1 AS n
FROM (VALUES (0),(1),(2),(3),(4),(5),(6),(7),(8),(9)) A(a)
CROSS JOIN (VALUES (0),(1),(2),(3),(4),(5),(6),(7),(8),(9)) B(b)
CROSS JOIN (VALUES (0),(1),(2),(3),(4),(5),(6),(7),(8),(9)) C(c)
),
ItemBatches AS
(SELECT cust_id, pack_id, item_id,
FLOOR((ROW_NUMBER() OVER (PARTITION BY cust_id, pack_id ORDER BY item_id, N.n)-1) / 5) + 1 AS rn_batch
FROM #Order O
INNER JOIN Nums N ON N.n <= O.qty
)
SELECT *
FROM (SELECT cust_id, pack_id, rn_batch, 'Item_' + LTRIM(STR(item_id)) AS item_desc
FROM ItemBatches
) src
PIVOT
(COUNT(item_desc) FOR item_desc IN ([Item_1], [Item_2], [Item_3], [Item_4], [Item_5])) pvt
ORDER BY cust_id, pack_id, rn_batch;
And here are results
cust_id pack_id rn_batch Item_1 Item_2 Item_3 Item_4 Item_5
1 A 1 1 1 3 0 0
1 A 2 0 0 1 0 0
Here's a db<>fiddle with
additional data in the #Orders table
the answer above, and also the processing with each step separated.
Notes
This approach (with the virtual numbers table) assumes a maximum of 1,000 for a given item in an order. If you need more, you can easily extend that numbers table by adding additional CROSS JOINs.
While I am in awe of the coders who made SQL Server and how it determines execution plans in millisends, for larger datasets I give SQL Server 0 chance to accurately predict how many rows will be in each step. As such, for performance, it may work better to split the code up into parts (including temp tables) similar to the db<>fiddle example.

Counting the total number of rows with SELECT DISTINCT ON without using a subquery

I have performing some queries using PostgreSQL SELECT DISTINCT ON syntax. I would like to have the query return the total number of rows alongside with every result row.
Assume I have a table my_table like the following:
CREATE TABLE my_table(
id int,
my_field text,
id_reference bigint
);
I then have a couple of values:
id | my_field | id_reference
----+----------+--------------
1 | a | 1
1 | b | 2
2 | a | 3
2 | c | 4
3 | x | 5
Basically my_table contains some versioned data. The id_reference is a reference to a global version of the database. Every change to the database will increase the global version number and changes will always add new rows to the tables (instead of updating/deleting values) and they will insert the new version number.
My goal is to perform a query that will only retrieve the latest values in the table, alongside with the total number of rows.
For example, in the above case I would like to retrieve the following output:
| total | id | my_field | id_reference |
+-------+----+----------+--------------+
| 3 | 1 | b | 2 |
+-------+----+----------+--------------+
| 3 | 2 | c | 4 |
+-------+----+----------+--------------+
| 3 | 3 | x | 5 |
+-------+----+----------+--------------+
My attemp is the following:
select distinct on (id)
count(*) over () as total,
*
from my_table
order by id, id_reference desc
This returns almost the correct output, except that total is the number of rows in my_table instead of being the number of rows of the resulting query:
total | id | my_field | id_reference
-------+----+----------+--------------
5 | 1 | b | 2
5 | 2 | c | 4
5 | 3 | x | 5
(3 rows)
As you can see it has 5 instead of the expected 3.
I can fix this by using a subquery and count as an aggregate function:
with my_values as (
select distinct on (id)
*
from my_table
order by id, id_reference desc
)
select count(*) over (), * from my_values
Which produces my expected output.
My question: is there a way to avoid using this subquery and have something similar to count(*) over () return the result I want?
You are looking at my_table 3 ways:
to find the latest id_reference for each id
to find my_field for the latest id_reference for each id
to count the distinct number of ids in the table
I therefore prefer this solution:
select
c.id_count as total,
a.id,
a.my_field,
b.max_id_reference
from
my_table a
join
(
select
id,
max(id_reference) as max_id_reference
from
my_table
group by
id
) b
on
a.id = b.id and
a.id_reference = b.max_id_reference
join
(
select
count(distinct id) as id_count
from
my_table
) c
on true;
This is a bit longer (especially the long thin way I write SQL) but it makes it clear what is happening. If you come back to it in a few months time (somebody usually does) then it will take less time to understand what is going on.
The "on true" at the end is a deliberate cartesian product because there can only ever be exactly one result from the subquery "c" and you do want a cartesian product with that.
There is nothing necessarily wrong with subqueries.

PostgreSQL return multiple rows with DISTINCT though only latest date per second column

Lets says I have the following database table (date truncated for example only, two 'id_' preix columns join with other tables)...
+-----------+---------+------+--------------------+-------+
| id_table1 | id_tab2 | date | description | price |
+-----------+---------+------+--------------------+-------+
| 1 | 11 | 2014 | man-eating-waffles | 1.46 |
+-----------+---------+------+--------------------+-------+
| 2 | 22 | 2014 | Flying Shoes | 8.99 |
+-----------+---------+------+--------------------+-------+
| 3 | 44 | 2015 | Flying Shoes | 12.99 |
+-----------+---------+------+--------------------+-------+
...and I have a query like the following...
SELECT id, date, description FROM inventory ORDER BY date ASC;
How do I SELECT all the descriptions, but only once each while simultaneously only the latest year for that description? So I need the database query to return the first and last row from the sample data above; the second it not returned because the last row has a later date.
Postgres has something called distinct on. This is usually more efficient than using window functions. So, an alternative method would be:
SELECT distinct on (description) id, date, description
FROM inventory
ORDER BY description, date desc;
The row_number window function should do the trick:
SELECT id, date, description
FROM (SELECT id, date, description,
ROW_NUMBER() OVER (PARTITION BY description
ORDER BY date DESC) AS rn
FROM inventory) t
WHERE rn = 1
ORDER BY date ASC;

SQL Query to select bottom 2 from each category

In Mysql, I want to select the bottom 2 items from each category
Category Value
1 1.3
1 4.8
1 3.7
1 1.6
2 9.5
2 9.9
2 9.2
2 10.3
3 4
3 8
3 16
Giving me:
Category Value
1 1.3
1 1.6
2 9.5
2 9.2
3 4
3 8
Before I migrated from sqlite3 I had to first select a lowest from each category, then excluding anything that joined to that, I had to again select the lowest from each category. Then anything equal to that new lowest or less in a category won. This would also pick more than 2 in case of a tie, which was annoying... It also had a really long runtime.
My ultimate goal is to count the number of times an individual is in one of the lowest 2 of a category (there is also a name field) and this is the one part I don't know how to do.
Thanks
SELECT c1.category, c1.value
FROM catvals c1
LEFT OUTER JOIN catvals c2
ON (c1.category = c2.category AND c1.value > c2.value)
GROUP BY c1.category, c1.value
HAVING COUNT(*) < 2;
Tested on MySQL 5.1.41 with your test data. Output:
+----------+-------+
| category | value |
+----------+-------+
| 1 | 1.30 |
| 1 | 1.60 |
| 2 | 9.20 |
| 2 | 9.50 |
| 3 | 4.00 |
| 3 | 8.00 |
+----------+-------+
(The extra decimal places are because I declared the value column as NUMERIC(9,2).)
Like other solutions, this produces more than 2 rows per category if there are ties. There are ways to construct the join condition to resolve that, but we'd need to use a primary key or unique key in your table, and we'd also have to know how you intend ties to be resolved.
You could try this:
SELECT * FROM (
SELECT c.*,
(SELECT COUNT(*)
FROM user_category c2
WHERE c2.category = c.category
AND c2.value < c.value) cnt
FROM user_category c ) uc
WHERE cnt < 2
It should give you the desired results, but check if performance is ok.
Here's a solution that handles duplicates properly. Table name is 'zzz' and columns are int and float
select
smallest.category category, min(smallest.value) value
from
zzz smallest
group by smallest.category
union
select
second_smallest.category category, min(second_smallest.value) value
from
zzz second_smallest
where
concat(second_smallest.category,'x',second_smallest.value)
not in ( -- recreate the results from the first half of the union
select concat(c.category,'x',min(c.value))
from zzz c
group by c.category
)
group by second_smallest.category
order by category
Caveats:
If there is only one value for a given category, then only that single entry is returned.
If there was a unique recordID for each row you wouldn't need all the concats to simulate a unique key.
Your mileage may vary,
--Mark
A union should work. I'm not sure of the performance compared to Peter's solution.
SELECT smallest.category, MIN(smallest.value)
FROM categories smallest
GROUP BY smallest.category
UNION
SELECT second_smallest.category, MIN(second_smallest.value)
FROM categories second_smallest
WHERE second_smallest.value > (SELECT MIN(smallest.value) FROM categories smallest WHERE second.category = second_smallest.category)
GROUP BY second_smallest.category
Here is a very generalized solution, that would work for selecting first n rows for each Category. This will work even if there are duplicates in value.
/* creating temporary variables */
mysql> set #cnt = 0;
mysql> set #trk = 0;
/* query */
mysql> select Category, Value
from (select *,
#cnt:=if(#trk = Category, #cnt+1, 0) cnt,
#trk:=Category
from user_categories
order by Category, Value ) c1
where c1.cnt < 2;
Here is the result.
+----------+-------+
| Category | Value |
+----------+-------+
| 1 | 1.3 |
| 1 | 1.6 |
| 2 | 9.2 |
| 2 | 9.5 |
| 3 | 4 |
| 3 | 8 |
+----------+-------+
This is tested on MySQL 5.0.88
Note that initial value of #trk variable should be not the least value of Category field.