Limiting output of rows based on count of values in another table? - sql

As a base example, I have a query that effectively produces a table with a list of values (ID numbers), each of which is attached to a specific category. As a simplified example, it would produce something like this (but at a much larger scale):
IDS
Categories
12345
type 1
12456
type 6
77689
type 3
32456
type 4
67431
type 2
13356
type 2
.....
.....
Using this table, I want to populate another table that gives me a list of ID numbers, with a limit placed on how many of each category are in that list, cross referenced against a sort of range based chart. For instance, if there are 5-15 IDS of type 1 in my first table, I want the new table with the column of IDS to have 3 type 1 IDS in it, if there are 15-30 type 1 IDS in the first table, I want to have 6 type 1 IDS in the new table.
This sort of range based limit would apply to each category, and the IDS would all populate the same column in the new table. The order, or specific IDS that end up in the final table don't matter, as long as the correct number of IDS end up as a part of that final list of ID numbers. This is being used to provide a semi-random sampling of ID numbers based on categories for a sort of QA related process.
If parts of this are unclear I can do my best to explain more. My initial thought was using a variable for a limit clause, but that isnt possible. I have been trying to sort out how to do this with a case statement but I'm really just not making any headway there but I feel like I am at this sort of paper thin wall I just can't break through.

You can use two window functions:
COUNT to keep track of the amount of ids for each category
ROW_NUMBER to uniquely identify each id within each category
Once you have collected these information, it's sufficient to keep all those rows that satisfy either of the following conditions:
count of rows less or equal to 30 >> ranking less or equal to 6
count of rows less or equal to 15 >> ranking less or equal to 3
WITH cte AS (
SELECT IDS,
Categories,
ROW_NUMBER() OVER(ORDER BY IDS PARTITION BY Categories) AS rn
COUNT(IDS) OVER(PARTITION BY Categories) AS cnt
FROM tab
)
SELECT *
FROM cte
WHERE (rn <= 3 AND cnt <= 15)
OR (rn <= 6 AND cnt <= 30)
Note: If you have concerns regarding a specific ordering, you need to fix the ORDER BY clause inside the ROW_NUMBER window function.

Related

Reduce the table fragmentation Postgresql

I have a table with n elements with ids from 1 to n. Then I delete nearly all the data from the table, leaving only m elements, m << n, including element with id=n. Before deleting the difference between ids of adjacent rows was 1, now it could be much larger. I need to update the table such a way that remaining element ids were from 1 to m and difference between ids of adjacent rows was 1. How I can do this? I use PostgreSQL.
You may find a better option to not update the primary key (especially in heavy multiuser einvironment) but to delegate the removal of the gaps to the accessing view.
Simple Example
create table test as
select id from generate_series(1,5) id;
delete from test
where id in (2,3,4);
The view assigns teh new continuous sequence using the row_number function.
create view test_gapless as
select
id,
row_number() over (order by id) gapless_id
from test;
select * from test_gapless;
id, gapless_id
1 1
5 2

How to distinguish rows in a database table on the basis of two or more columns while returning all columns in sql server

I want to distinguish Rows on the basis of two or more columns value of the same table at the same time returns all columns from the table.
Ex: I have this table
DB Table
I want my result to be displayed as: filter on the basis of type and Number only. As in abover table type and Number for first and second Row is same so it should be suppressed in result.
txn item Discrip Category type Number Mode
60 2 Loyalty L 6174 XXXXXXX1390 0
60 4 Visa C 1600 XXXXXXXXXXXX4108 1
I have tried with sub query but yet unsuccessful. Please suggest what to try.
Thanks
You can do what you want with row_number():
select t.*
from (select t.*,
row_number() over (partition by type, number order by item) as seqnum
from t
) t
where seqnum = 1;

how can I calculate the sum of my top n records in crystal report?

I m using report tab -> group sort expert-> top n to get top n record but i m getting sum of value in report footer for all records
I want only sum of value of top n records...
In below image i have select top 3 records but it gives sum of all records.
The group sort expert (and the record sort expert too) intervenes in your final result after the total summary is calculated. It is unable to filter and remove rows, in the same way an ORDER BY clause of SQL cannot effect the SELECT's count result (this is a job for WHERE clause). As a result, your summary will always be computed for all rows of your detail section and, of course, for all your group sums.
If you have in mind a specific way to exlude specific rows in order to appear the appropriate sum the you can use the Select Expert of Crystal Reports to remove rows.
Alternatively (and I believe this is the best way), I would make all the necessary calculations in the SQL command and I would sent to the report only the Top 3 group sums (then you can get what you want with a simple total summary of these 3 records)
Something like that
CREATE TABLE #TEMP
(
DEP_NAME varchar(50),
MINVAL int,
RMAVAL int,
NETVAL int
)
INSERT INTO #TEMP
SELECT TOP 3
T.DEP_NAME ,T.MINVAL,T.RMAVAL,T.NETVAL
FROM
(SELECT DEP_NAME AS DEP_NAME,SUM(MINVAL) AS MINVAL,SUM(RMAVAL) AS
RMAVAL,SUM(NETVAL) AS NETVAL
FROM YOURTABLE
GROUP BY DEP_NAME) AS T
ORDER BY MINVAL DESC
SELECT * FROM #TEMP

Query to find duplicate values for two fields

Sorry for the Title, But didn't know how to explain.
I have a table that have 2 fields A and B.
I want find all rows in the table that have duplicate A (more than one record) but at the same time A will consider as a duplicate only if B is different in both rows.
Example:
FIELD A Field B
10 10
10 10 // This is not duplicate
10 10
10 5 // this is a duplicate
How to to this in a single query
Let's break this down into how you would go about constructing such a query. You don't make it clear whether you're looking for all values of A or all rows but let's assume all values of A initially.
The first step therefore is to create a list of all values of A. This can be done two ways, DISTINCT or GROUP BY. I'm going to use GROUP BY because of what else you want to do:
select a
from your_table
group by a
This returns a single column that is unique on A. Now, how can you change this to give you the unique values? The most obvious thing to use is the HAVING clause, which allows you to restrict on aggregated values. For instance the following will give you all values of A which only appear once in the table
select a
from your_table
group by a
having count(*) = 1
That is the count of all values of A inside the group is 1. You don't want this of course, you want to do this with the column B. You need there to exist more than one value of B in order for the situation you want to identify to be possible (if there's only one value of B then it's impossible). This gets us to
select a
from your_table
group by a
having count(b) > 1
This still isn't enough as you want two different values of B. The above just counts the number of records with the column B. Inside an aggregate function you use the DISTINCT keyword to determine unique values; bringing us to:
select a
from your_table
group by a
having count(distinct b) > 1
To transcribe this into English this means select all unique values of A from YOUR_TABLE that have more than one values of B in the group.
You can use this method, or something similar, to build up your own queries as you create them. Determine what you want to achieve and slowly build up to it.
select FIELD from your_table group by FIELD having count(b) > 1
take in consideration that this will return count of all duplicate
example
if you have values
1
1
2
1
it will return 3 for value 1 not 2

Approach to a Bin Packing sql problem

I have a problem in sql where I need to generate a packing list from a list of transactions.
Data Model
The transactions are stored in a table that contains:
transaction id
item id
item quantity
Each transaction can have multiple items (and coincidentally multiple rows with the same transaction id). Each item then has a quantity from 1 to N.
Business Problem
The business requires that we create a packing list, where each line item in the packing list contains the count of each item in the box.
Each box can only contain 160 items (they all happen to be the same size/weight). Based on the total count of the order we need to split items into different boxes (sometimes splitting even the individual item's collection into two boxes)
So the challenge is to take that data schema and come up with the result set that includes how many of each item belong in each box.
I am currently brute forcing this in some not so pretty ways and wondering if anyone has an elegant/simple solution that I've overlooked.
Example In/Out
We really need to isolate how many of each item end up in each box...for example:
Order 1:
100 of item A100 of item B140 of item C
This should result in three rows in the result set:
Box 1: A (100), B (60) Box 2: B(40), C (120) Box 3: C(20)
Ideally the query would be smart enough to put all of C together, but at this point - we're not too concerned with that.
How about something like
SELECT SUM([Item quantity]) as totalItems
, SUM([Item quantity]) / 160 as totalBoxes
, MOD(SUM([Item Quantity), 160) amountInLastBox
FROM [Transactions]
GROUP BY [Transaction Id]
Let me know what fields in the resultset you're looking for and I could come up with a better one
I was looking for something similar and all I could achieve was expanding the rows to the number of item counts in a transaction, and grouping them into bins. Not very elegant though.. Moreover, because string aggregation is still very cumbersome in SQL Server (Oracle, i miss you!), I have to leave the last part out. I mean putting the counts in one single row..
My solution is as follows:
Example transactions table:
INSERT INTO transactions
(trans_id, item, cnt) VALUES
('1','A','50'),
('2','A','140'),
('3','B','100'),
('4','C','80');
GO
Create a dummy sequence table, which contains numbers from 1 to 1000 (I assume that maximum number allowed for an item in a single transaction is 1000):
CREATE TABLE numseq (n INT NOT NULL IDENTITY) ;
GO
INSERT numseq DEFAULT VALUES ;
WHILE SCOPE_IDENTITY() < 1000 INSERT numseq DEFAULT VALUES ;
GO
Now we can generate a temporary table from transactions table, in which each transaction and item exist "cnt" times in a subquery, and then give numbers to the bins using division, and group by bin number:
SELECT bin_nr, item, count(*) count_in_bin
INTO result
FROM (
SELECT t.item, ((row_number() over (order by t.item, s.n) - 1) / 160) + 1 as bin_nr
FROM transactions t
INNER JOIN numseq s
ON t.cnt >= s.n -- join conditionally to repeat transaction rows "cnt" times
) a
GROUP BY bin_id, item
ORDER BY bin_id, item
GO
Result is:
bin_id item count_in_bin
1 A 160
2 A 30
2 B 100
2 C 30
3 C 50
In Oracle, the last step would be as simple as that:
SELECT bin_id, WM_CONCAT(CONCAT(item,'(',count_in_bin,')')) contents
FROM result
GROUP BY bin_id
This isn't the prettiest answer but I am using a similar method to keep track of stock items through an order process, and it is easy to understand, and may lead to you developing a better method than I have.
I would create a table called "PackedItem" or something similar. The columns would be:
packed_item_id (int) - Primary Key, Identity column
trans_id (int)
item_id (int)
box_number (int)
Each record in this table represents 1 physical unit you will ship.
Lets say someone adds a line to transaction 4 with 20 of item 12, I would add 20 records to the PackedItem table, all with the transaction ID, the Item ID, and a NULL box number. If a line is updated, you need to add or remove records from the PackedItem table so that there is always a 1:1 correlation.
When the time comes to ship, you can simply
SELECT TOP 160 FROM PackedItem WHERE trans_id = 4 AND box_number IS NULL
and set the box_number on those records to the next available box number, until no records remain where the box_number is NULL. This is possible using one fairly complicated UPDATE statement inside a WHILE loop - which I don't have the time to construct fully.
You can now easily get your desired packing list by querying this table as follows:
SELECT box_number, item_id, COUNT(*) AS Qty
FROM PackedItem
WHERE trans_id = 4
GROUP BY box_number, item_id
Advantages - easy to understand, fairly easy to implement.
Pitfalls - if the table gets out of sync with the lines on the Transaction, the final result can be wrong; This table will get many records in it and will be extra work for the server. Will need each ID field to be indexed to keep performance good.