How to Count the number of occurrences for a group of rows and remove duplicated rows? - vba

I have a report where I need to find duplicated groups of rows, where a group is defined by the column named Cut No. Each time a group of rows repeats, I have to count the number of its occurrences as 1, 2, 3, until any column's data changes, as follows:
Cut No Length BBS No BM Quantity Occurrences
1 3300 2453-04-ST-RF-HA-FO-0411-02 39 2
1 3200 2453-04-ST-RF-HA-FO-0411-02 952 1
1 2125 2453-04-ST-RF-HA-FO-0411-02 77 1 N
2 3300 2453-04-ST-RF-HA-FO-0411-02 39 2
2 3200 2453-04-ST-RF-HA-FO-0411-02 952 1
2 2125 2453-04-ST-RF-HA-FO-0411-02 77 1 N + 1
3 3300 2453-04-ST-RF-HA-FO-0411-02 77 1
3 3200 2453-04-ST-RF-HA-FO-0411-02 952 1
3 2125 **2453-04-ST-RF-HA-FO-0412-02** 77 1 N + 1
The problem is that all the code I have found works at the row level, but here I am working with groups of rows defined by Cut No, where the other four columns together should be unique.
When I use Remove Duplicates including the Cut No column, it says there are no duplicates.
When I use Remove Duplicates without the Cut No column, I get the following:
Length BBS No BM Quantity
3300 2453-04-ST-RF-HA-FO-0411-02 39 2
3200 2453-04-ST-RF-HA-FO-0411-02 952 1
2125 2453-04-ST-RF-HA-FO-0411-02 77 1
2125 **2453-04-ST-RF-HA-FO-0412-02** 77 1
It gives unique rows, but what I want is unique groups of rows, as explained above.
So how can I do this, so that when one group of rows matches the next, the number of occurrences is increased by 1?
Thanks for your reply.
Moheb Labib

Use subtotals. Select your table, go to Data -> Subtotal, and select the columns you want subtotals of:

If (and only if) I understand you correctly, then a quick solution to your problem might be to simply convert each group of rows into a single row with three times as many columns, e.g. in an additional sheet. Then you can work on rows instead of groups of rows.

Related

Convert This SQL Query to ANSI SQL

I would like to convert this SQL query into ANSI SQL. I am having trouble wrapping my head around the logic of this query.
I use Snowflake Data Warehouse, but it does not understand this query because of the 'delete' statement right before the join, so I am trying to break it down. From my understanding, the row_number column gives the order from 1 to N based on the timestamp and places it in C. Then C is joined against itself on the rows other than the first row (based on id) and placed in C1. Then C1 is deleted from the overall data, which leaves only the first row.
I may be understanding the logic incorrectly, but I am not used to seeing the 'delete' statement right before a join. Let me know if I got the logic right, or point me in the right direction.
This query was copy/pasted from THIS stackoverflow question which has the exact situation I am trying to solve, but on a much larger scale.
with C as
(
select ID,
row_number() over(order by DT) as rn
from YourTable
)
delete C1
from C as C1
inner join C as C2
on C1.rn = C2.rn-1 and
C1.ID = C2.ID
The specific problem I am trying to solve is this. Let's assume I have this table. I need to partition the rows by primary key combinations (primKey 1 & 2) while maintaining timestamp order.
ID primKey1 primKey2 checkVar1 checkVar2 theTimestamp
100 1 2 302 423 2001-07-13
101 3 6 506 236 2005-10-25
100 1 2 302 423 2002-08-15
101 3 6 506 236 2008-12-05
101 3 6 300 100 2010-06-10
100 1 2 407 309 2005-09-05
100 1 2 302 423 2012-05-09
100 1 2 302 423 2003-07-24
Once the rows are partitioned and the timestamp is ordered within each partition, I need to delete the duplicate checkVar combination (checkVar 1 & 2) rows until the next change. Thus leaving me with the earliest unique row. The rows with asterisks are the ones which need to be removed since they are duplicates.
ID primKey1 primKey2 checkVar1 checkVar2 theTimestamp
100 1 2 302 423 2001-07-13
*100 1 2 302 423 2002-08-15
*100 1 2 302 423 2003-07-24
100 1 2 407 309 2005-09-05
100 1 2 302 423 2012-05-09
101 3 6 506 236 2005-10-25
*101 3 6 506 236 2008-12-05
101 3 6 300 100 2010-06-10
This is the final result. As you can see for ID=100, even though the 1st and 3rd record are the same, the checkVar combination changed in between, which is fine. I am only removing the duplicates until the values change.
ID primKey1 primKey2 checkVar1 checkVar2 theTimestamp
100 1 2 302 423 2001-07-13
100 1 2 407 309 2005-09-05
100 1 2 302 423 2012-05-09
101 3 6 506 236 2005-10-25
101 3 6 300 100 2010-06-10
If you want to keep the earliest row for each id, then you can use:
delete from yourtable yt
where yt.dt > (select min(yt2.dt)
               from yourtable yt2
               where yt2.id = yt.id
              );
Your query would not do this, if that is your intent.
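For the consecutive-duplicate case described above (keeping only the earliest row of each run of identical checkVar values within a primKey partition, ordered by timestamp), one ANSI-style approach is to select the rows to keep rather than delete the rest. This is only a sketch, using the column names from the example and a placeholder table name, and it assumes the LAG window function, which Snowflake supports:
with ordered as
(
  select ID, primKey1, primKey2, checkVar1, checkVar2, theTimestamp,
         -- check values from the previous row in the same partition, ordered by timestamp
         lag(checkVar1) over (partition by primKey1, primKey2 order by theTimestamp) as prevCheckVar1,
         lag(checkVar2) over (partition by primKey1, primKey2 order by theTimestamp) as prevCheckVar2
  from YourTable
)
select ID, primKey1, primKey2, checkVar1, checkVar2, theTimestamp
from ordered
-- keep the first row of each partition and any row where the check values changed
where prevCheckVar1 is null
   or checkVar1 <> prevCheckVar1
   or checkVar2 <> prevCheckVar2;
Rows whose check values match the previous row in the same partition are exactly the asterisked duplicates from the example, and they are simply filtered out.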

Combine multiple rows using SUM that share the same column value but have different other column values

I thought this would be a very simple query but for some reason, I can't seem to get the results I'm looking for. I have a table that has this structure. I just want a single entry for each account while summing the charges. I don't really care which date I keep, just one of them.
Account Charges Charges2 Date
1 100 50 1/1/2015
1 50 0 1/2/2015
2 50 0 2/4/2015
2 70 30 2/19/2015
3 100 0 1/12/2014
4 0 20 4/3/2015
4 40 20 4/9/2015
The result I want is:
Account Charges Charges2 Date
1 150 50 1/1/2015
2 120 30 2/4/2015
3 100 0 1/12/2014
4 40 40 4/3/2015
The result I currently get is:
Account Charges Charges2 Date
1 100 50 1/1/2015
2 70 30 2/19/2015
3 100 0 1/12/2014
4 40 40 4/9/2015
I thought this would be very simple and tried the query below, but it doesn't sum them up; it just seems to return the rows where Charges2 is NOT 0.
SELECT Account, SUM(Charges) As TotCharges, SUM(Charges2) AS TotCharges2
FROM TABLE
GROUP BY Account
ORDER BY Account
You can apply the min() aggregate function to the date to limit the number of rows returned to one per account:
SELECT
Account,
SUM(Charges) AS TotCharges,
SUM(Charges2) AS TotCharges2,
MIN(Date) AS Date
FROM TABLE
GROUP BY Account
ORDER BY Account
Sample SQL Fiddle
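If you would rather keep the latest date for each account instead of the earliest, MAX(Date) can be substituted for MIN(Date); the grouping and the sums stay the same.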

select previous row value for same user (multiple records)

I have a query in Access 2010 (have also tried on 2013, same result) that is working but not perfectly for all records. I'm wondering if anyone knows what is causing the error.
Here is the query (adapted from http://allenbrowne.com/subquery-01.html#AnotherRecord):
SELECT t_test_table.individ, t_test_table.test_date, t_test_table.score1, (SELECT top 1 Dupe.score1
FROM t_test_table AS Dupe
WHERE Dupe.individ = t_test_table.individ
AND Dupe.test_date < t_test_table.test_date
ORDER BY Dupe.primary DESC, Dupe.individ
) AS PriorValue, [score1]-[priorvalue] AS scorechange
FROM t_test_table;
The way the data is set up, an individual has multiple records in the file (designated by individ) representing different dates a test was taken. A date AND individ combination is unique - you can only take a test once on a given date. [primary] refers to the primary key column; I created it because the individ field is not a primary key, since multiple records per individual are possible (I'm not including it here due to space).
The goal of the above code was to create the following:
individ test_date score1 PriorValue scorechange
1 3/1/2013 40
1 6/4/2013 51 40 11
1 7/25/2013 55 51 4
1 12/13/2013 59 55 4
5 8/29/2009 39
5 12/9/2009 47 39 8
5 6/1/2010 58 47 11
5 8/28/2010 42 58 -16
5 12/15/2010 51 42 9
Here is what I actually got. You can see that for individ 1, it winds up taking the first score rather than the previous score for each subsequent record. For individ 5, it kind of works, but the final priorvalue should be 42 and not 58.
individ test_date score1 PriorValue scorechange
1 3/1/2012 40
1 6/4/2012 51 40 11
1 7/25/2012 55 40 15
1 12/13/2012 59 40 19
5 8/29/2005 39
5 12/9/2005 47 39 8
5 6/1/2006 58 47 11
5 8/28/2006 42 58 -16
5 12/15/2006 51 58 -7
Does anyone have any ideas about what went wrong here? For other records it works perfectly, but I can't determine what causes some records to fail to take the previous value. Any help is appreciated, and let me know if you require additional information.
To get the most recent test for a given individ, you'll need to include a sort by date. In your inner query, replace
ORDER BY Dupe.primary DESC, Dupe.individ
with
ORDER BY Dupe.test_date DESC
It's hard to say exactly what effect sorting by primary has, since you haven't told us how you're generating the values of primary. If the combination of individ and test_date is guaranteed to be unique, you might want to consider making the two of them into your primary key instead of creating a new thing. The Dupe.individ in the ORDER BY line has no effect, since your WHERE clause already limited the results of the inner query to one individ.
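For reference, here is the query from the question with just that one change applied:
SELECT t_test_table.individ, t_test_table.test_date, t_test_table.score1,
       (SELECT TOP 1 Dupe.score1
        FROM t_test_table AS Dupe
        WHERE Dupe.individ = t_test_table.individ
        AND Dupe.test_date < t_test_table.test_date
        ORDER BY Dupe.test_date DESC
       ) AS PriorValue,
       [score1]-[PriorValue] AS scorechange
FROM t_test_table;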

How do I loop through a table until condition reached

I have a table with columns product, pick_qty, shortfall, location, loc_qty:
Product Picked Qty Shortfall Location Location Qty
1742 4 58 1 15
1742 4 58 2 20
1742 4 58 3 15
1742 4 58 4 20
1742 4 58 5 20
1742 4 58 6 20
1742 4 58 7 15
1742 4 58 8 15
1742 4 58 9 15
1742 4 58 10 20
I want a report to loop around and show the number of locations and the quantity I need to drop to fulfil the shortfall for replenishment. So the report would look like this.
Product Picked Qty Shortfall Location Location Qty
1742 4 58 1 15
1742 4 58 2 20
1742 4 58 3 15
1742 4 58 4 20
Note that it is best not to think about SQL "looping through a table" and instead to think about it as operating on some subset of the rows in a table.
What it sounds like you need to do is create a running total that tells how many of the item you would have if you took all of them from a location and from every location before it, and then check whether that would give you enough of the item to fulfil the shortfall.
Based on your example data, the following query would work, though if Locations aren't actually numeric then you would need to add a row number column and tweak the query a bit to use the row number instead of the Location Number; it would still be very similar to the query below.
SELECT
Totals.Product, Totals.PickedQty, Totals.ShortFall, Totals.Location, Totals.LocationQty
FROM (
SELECT
TheTable.Product, TheTable.PickedQty, TheTable.ShortFall,
TheTable.Location, TheTable.LocationQty, SUM(ForRunningTotal.LocationQty) AS RunningTotal
FROM TheTable
JOIN TheTable ForRunningTotal ON TheTable.Product = ForRunningTotal.Product
AND TheTable.Location >= ForRunningTotal.Location
GROUP BY TheTable.Product, TheTable.PickedQty, TheTable.ShortFall, TheTable.Location, TheTable.LocationQty
) Totals
-- Note you could also change the join above so the running total is actually the count of only the rows above,
-- not including the current row; Then the WHERE clause below could be "Totals.RunningTotal < Totals.ShortFall".
-- I liked RunningTotal as the sum of this row and all prior, it seems more appropriate to me.
WHERE Totals.RunningTotal - Totals.LocationQty <= Totals.ShortFall
AND Totals.LocationQty > 0
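On a database that supports window functions, the same running total can also be computed without the self-join. A minimal alternative sketch, assuming SUM() OVER is available and using the same column names as above:
SELECT Product, PickedQty, ShortFall, Location, LocationQty
FROM (
    SELECT Product, PickedQty, ShortFall, Location, LocationQty,
           -- quantity at this location plus all earlier locations for the same product
           SUM(LocationQty) OVER (PARTITION BY Product ORDER BY Location) AS RunningTotal
    FROM TheTable
) Totals
WHERE Totals.RunningTotal - Totals.LocationQty <= Totals.ShortFall
AND Totals.LocationQty > 0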
Also - as long as you are reading my answer, an unrelated side note: based on the data you showed above, your database schema isn't as normalized as it could be. It seems like the Picked Quantity and the ShortFall actually depend only on the Product, so that would be a table of its own, and then the Location Quantity depends on the Product and Location, so that would be a table of its own. I'm pointing it out because if your data contained different Picked Quantities/ShortFalls for a single product, then the above query would break; this situation would be impossible with the normalized tables I mentioned.

How to generate a column with a series of numbers based on a min and max value

I have a table structured as so:
fake_id start end misc_data
------------------------------------------------------
1 101 105 ab
1 101 105 cd
1 101 105 ef
2 117 123 gh
2 117 123 ij
2 117 123 kl
2 117 123 mn
3 51 53 op
3 51 53 qr
Notice that the fake_id field is not really a primary key, but is repeated a number of times equal to the number of distinct odd numbers in the range specified by start and end. The real id for each record is one of the odd numbers in that range. I need to write a query that returns fake_id, misc_data, and another column that contains those odd numbers to produce a real id, as follows:
fake_id real_id misc_data
------------------------------------------
1 101 ab
1 103 cd
1 105 ef
2 117 gh
2 119 ij
2 121 kl
2 123 mn
3 51 op
3 53 qr
As far as I know, there is no guarantee that there will be no gaps in the sequence (for example, there might be no records for range 21-31). How do I tell the query (or procedure, but query is preferable) that for each record with a particular fake_id, it should return the next odd number between start and end?
Also, is there a way to make the values for misc_data belong to a particular real_id? Using the second table as an example, how could I tell the query that "ab" belongs to real_id 101 instead of 103?
Thanks in advance.
Guessing here that you plan to sort on misc_data:
SELECT "fake_id",
((ROW_NUMBER()OVER(PARTITION BY "start"
ORDER BY "misc_data")-1)*2)+"start" AS "real_id",
"misc_data"
FROM t
ORDER BY "misc_data";
http://www.sqlfiddle.com/#!4/ae23c/23
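To see how the arithmetic produces the odd sequence: within the group starting at 101, the rows are numbered 1, 2, 3 by misc_data, which becomes ((1-1)*2)+101 = 101, ((2-1)*2)+101 = 103, and ((3-1)*2)+101 = 105. Ordering by "misc_data" inside each partition is also what ties "ab" to 101 rather than 103, which addresses the second part of the question, provided the desired assignment follows the misc_data sort order as the answer assumes.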
Apologies for not answering sooner or replying to the individual comments. @John Dewey, I believe when I tried your script it did not correctly keep the gaps between the start-end series, but I was motivated to learn more about the PARTITION keyword and I think I am more enlightened now.
Since this was for an ETL task, I ended up writing code to generate the real IDs in a loop on the extract (I guess it would also count as a transform) side.