I have four sets of data representing a softball schedule. Looks like this:
Day  Team 1  Team 2
M    A Team  B Team
T    C Team  D Team
...
but four times over. I want to be able to change the schedule and have it automatically tally how many times a team plays on a given day. Ideas?
You would use something like this:
=COUNTIF(2:2,"A Team")
Edit:
You can use a SUMPRODUCT() function with the math operators * and +:
=SUMPRODUCT(($A$2:$A$43=H$1)*(($B$2:$B$43=$G2)+($C$2:$C$43=$G2)))
So how it works:
TRUE/FALSE are Boolean values that reduce to 1/0 respectively, so using the * and + operators is like AND and OR respectively.
SUMPRODUCT iterates through the range and tests each criterion inside the parentheses. It first tests whether the cell in column A is equal to H1, returning a 1 if so and a 0 if not. The next part sets up the OR: if the team name is found in either team column of the same row, it also returns a 1, and 1 * 1 = 1. SUMPRODUCT keeps track of all the 1s and 0s and adds them together, so you get the count.
If there are other columns that contain team names, just add comparisons for those columns as extra + terms inside the parentheses.
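If it helps to see the same Boolean arithmetic another way, here is the equivalent tally written in SQL terms (a hypothetical Schedule table with Day, Team1 and Team2 columns; the names are assumptions for illustration only, not part of the question):

-- Count how often a team plays on a given day.
-- The AND mirrors the * and the OR mirrors the + in the SUMPRODUCT above.
SELECT COUNT(*)
FROM Schedule
WHERE Day = 'M'
  AND (Team1 = 'A Team' OR Team2 = 'A Team');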
Ok, so let's start by making your table a real table via "Home > Format as Table" and calling your table "data". Then you have three columns called data[Day], data[Team 1] and data[Team 2]. For instance this:
Day        Team 1  Team 2
Monday     A Team  B Team
Tuesday    C Team  D Team
Wednesday  C Team  A Team
Monday     B Team  C Team
Now comes the ugly part. You need a matrix of 7*10 (days * teams)
(Cell E1) Team 1 Team 2 Team 3 Team 4 ...
Monday *1
Tuesday
Wednesday
...
Formula *1
=SUMPRODUCT((data[Day]=$E2)*((data[Team 1]=F$1)+(data[Team 2]=F$1)))
Now drag down that formula till Sunday and then copy it to the other teams (when I tried dragging it to the other teams, Excel messed up the column names!).
This will automatically fill the matrix and tell you which team plays how often on a specific day.
What does it do? Basically, SUMPRODUCT can not only build products, it can also evaluate Boolean conditions. So if Team A plays on Monday, the first data row would evaluate to (for Team A / Monday):
1*(1+0)
SUMPRODUCT does that for each row of the data table and then sums up the results.
I have two tables (SQL Server), as shown below:
locations

id  cubicfeet  order
--------------------
1   5          1
2   10         1
3   6          1

items

id  cubicfeet  order
--------------------
1   6          1
2   6          1
3   6          1
I need a query to tell me whether all the items will fit into all the locations (for a given order). If all the items won't fit, I need to create a new location for that order, and then move into the new location any items that DID fit into the existing locations (as many as fit). The new location will also only hold a certain volume, say 17 cubic feet. In this example, a simple SUM won't work: the items total 18 cubic feet, which is less than the locations' total of 5 + 10 + 6 = 21, yet the location with volume 5 can't fit any of the items, since each is 6 cubic feet.
The only way I can think to do it is to create temp tables in my stored procedure and use a WHILE loop to go through them, updating the locations one at a time to see whether more items still fit, as sketched below.
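For what it's worth, here is a rough sketch of that loop idea (the variable and temp table names are assumptions; this greedy "largest item into the roomiest location" approach is only a heuristic, since fitting items into locations is a bin-packing problem with no simple exact SQL solution):

-- Sketch only: greedily place each unplaced item (largest first)
-- into the location with the most remaining space.
DECLARE @orderId int = 1;
DECLARE @itemId int, @itemSize int, @locId int;

SELECT id, cubicfeet, CAST(0 AS bit) AS placed
INTO #items
FROM items
WHERE [order] = @orderId;

SELECT id, cubicfeet AS remaining
INTO #locs
FROM locations
WHERE [order] = @orderId;

WHILE EXISTS (SELECT 1 FROM #items WHERE placed = 0)
BEGIN
    SELECT TOP 1 @itemId = id, @itemSize = cubicfeet
    FROM #items
    WHERE placed = 0
    ORDER BY cubicfeet DESC;

    SET @locId = NULL;
    SELECT TOP 1 @locId = id
    FROM #locs
    WHERE remaining >= @itemSize
    ORDER BY remaining DESC;

    IF @locId IS NULL
        BREAK;  -- nothing fits: a new location is needed for this order

    UPDATE #locs  SET remaining = remaining - @itemSize WHERE id = @locId;
    UPDATE #items SET placed = 1 WHERE id = @itemId;
END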
I have a set of time-phased data in an Access (2010) table. There are 3 levels: Account (1), Package (2), Element (3). Each row has the Account, Package, and Element along with a time Period and a dollar amount. I want to be able to roll this up so I can see the current-period and total dollars at each level (one output for Account, one for Package, and one for Element) and save those different levels as their own tables (or just output back to Excel).
So if I have this data:
Account  Package  Element  Period  Dollars
A        11       X        2010    5
A        11       O        2010    5
A        11       X        2011    5
B        44       X        2010    5
B        52       O        2010    5
B        44       L        2011    5
C        24       X        2011    5
C        14       L        2011    5
C        14       L        2011    5
C        14       L        2010    5
I want to roll it up by element to get this table (if current is 2010):
Account  Package  Element  Current  Total
A        11       X        5        5
A        11       O        5        0
B        44       X        5        5
B        52       O        5        0
C        24       X        0        5
C        14       L        5        10
and then roll it up by package to get this:
Account  Package  Current  Total
A        11       10       5
B        44       5        5
B        52       5        0
C        24       0        5
C        14       5        10
An obvious problem is that the one table isn't normalized, but I'm importing this data from an Excel file given by a customer. I did build this successfully in Excel using a lot of SUMIFs, but I'm close to 500k rows and it just starts locking up on me.
I thought I'd see if Access would work quicker. So, with just the one table, I tried looping through Account, then Package, then Element, comparing Period to Current and calculating sums.
Is there a better way than opening a bunch of recordsets, perhaps using creative SQL queries?
Simply run aggregate GROUP BY queries on the one table. The only challenge is that the other descriptive columns need to be removed or wrapped in an aggregate; in the examples below I used Max().
By Element
SELECT Max(Account) As MaxOfAccount, Max(Package) As MaxOfPackage,
Element, Sum(IIF(Period=2010, Dollars, 0)) As Current, Sum(Dollars) As TotalDollars
FROM TimePhasedData
GROUP BY Element
By Element for only 2010:
SELECT Max(Account) As MaxOfAccount, Max(Package) As MaxOfPackage,
Element, Count(Period) As Current, Sum(Dollars) As TotalDollars
FROM TimePhasedData
WHERE Period = 2010
GROUP BY Element
Purely by Element
SELECT Element, Sum(IIF(Period=2010, Dollars, 0)) As Current, Sum(Dollars) As TotalDollars
FROM TimePhasedData
GROUP BY Element
By Account
SELECT Account, Max(Package) As MaxOfPackage, Max(Element) As MaxOfElement,
Sum(IIF(Period=2010, Dollars, 0)) As Current, Sum(Dollars) As TotalDollars
FROM TimePhasedData
GROUP BY Account
By Package
SELECT Max(Account) As MaxOfAccount, Package, Max(Element) As MaxOfElement,
Sum(IIF(Period=2010, Dollars, 0)) As Current, Sum(Dollars) As TotalDollars
FROM TimePhasedData
GROUP BY Package
Finally, many Excel functions have SQL counterparts, including SumIf(), CountIf(), VLookup(), Index(), and Match(). And with 500K rows, Access's SQL engine should prove more robust than worksheet formulas.
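For example, a VLookup() corresponds to a JOIN. A minimal sketch, with made-up table and column names (not from the question):

-- VLookup(key, table, 2, FALSE) in SQL terms: match on the key column
-- and pull the desired column from the lookup table.
SELECT t.AccountKey, l.Description
FROM MainTable AS t
INNER JOIN LookupTable AS l
    ON l.AccountKey = t.AccountKey;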
I have a data set with a document property that selects "items"; each "item" has a particular "usage days" value. I want to calculate a "Moving Average" output for one or more selected items. The data for the moving average lives under a column named "usage days".
How do I calculate this, taking into account a selected date of my choice and a rolling-average window (number of days) of my choice?
Do you have any ideas for how I can perform the calculation, i.e. in a calculated column or a text field?
Car  Trip   Start Date  End Date    Days on trip
1    AB123  2           6/07/2013
1    AB234  29/07/2013  6/09/2013   42
1    AB345  6/09/2013   28/09/2013  22
1    AB456  29/09/2013  21/10/2013  23
2    AB567  26/10/2013  12/11/2013  22
2    AB678  12/11/2013  8/12/2013   26
(The rows above have an example of the problem; sorry, I couldn't paste an image because I'm new.) I want to calculate the % usage of the car, or cars, for a selected range of time: e.g. select the date range July to August, then compute (# of days on trip for cars 1 and 2) / (# of days in that period) / 2 * 100.
As phiver said, it is still difficult to see what you expect as a result... but I think I have something that might work. First, I slightly altered the dataset you provided, like so:
car trip startDate endDate daysOnTrip
1 AB123 7/6/2013 7/29/2013 23
1 AB234 7/29/2013 9/6/2013 42
1 AB345 9/6/2013 9/28/2013 22
1 AB456 9/29/2013 10/21/2013 23
2 AB567 10/26/2013 11/12/2013 22
2 AB678 11/12/2013 12/8/2013 26
I then added 2 document properties, "DateRangeFirst" and "DateRangeLast", to allow the user to select beginning and ending dates, and made input box property controls for each of them in a text area so the user can alter the date range.

Next I added a data table visualization with a "Limit data using expression:" of

[startDate] >= Date(${DateRangeFirst}) and [endDate] <= Date(${DateRangeLast})

so we could see the trips selected. Finally, to get the average you appear to be looking for, I used a bar chart set to % of total (daysOnTrip) / car, with the same data-limiting expression as above. The screenshot below should have everything you need to reproduce my results. I hope this gives you what you need.
NOTE: With this method, if you select a date in the middle of a trip, that entire row, and all of the days on that trip, will be ignored.
I have a SQL Server database containing real-time stock quotes.
There is a Quotes table containing what you would expect: a sequence number, ticker symbol, time, price, bid, bid size, ask, ask size, etc.
The sequence number corresponds to a message that was received containing data for a set of ticker symbols being tracked. A new message (with a new, incrementing sequence number) is received whenever anything changes for any of the symbols being tracked. The message contains data for all symbols (even for those where nothing changed).
When the data was put into the database, a record was inserted for every symbol in each message, even for symbols where nothing changed since the prior message. So a lot of records contain redundant information (only the sequence number changed) and I want to remove these redundant records.
This is not the same as removing all but one record from the entire database for a combination of identical columns (already answered). Rather, I want to compress each contiguous block of identical records (identical except for sequence number) into a single record. When finished, there may be duplicate records but with differing records between them.
My approach was to find contiguous ranges of records (for a ticker symbol) where everything is the same except the sequence number.
In the following sample data I simplify things by showing only Sequence, Symbol, and Price. The compound primary key would be Sequence+Symbol (each symbol appears only once in a message). I want to remove records where Price is the same as the prior record (for a given ticker symbol). For ticker X it means I want to remove the range [1, 6], and for ticker Y I want to remove the ranges [1, 2], [4, 5] and [7, 7]:
Before:
Sequence Symbol Price
0 X $10
0 Y $ 5
1 X $10
1 Y $ 5
2 X $10
2 Y $ 5
3 X $10
3 Y $ 6
4 X $10
4 Y $ 6
5 X $10
5 Y $ 6
6 X $10
6 Y $ 5
7 X $11
7 Y $ 5
After:
Sequence Symbol Price
0 X $10
0 Y $ 5
3 Y $ 6
6 Y $ 5
7 X $11
Note that (Y, $5) appears twice but with (Y, $6) between.
The following generates the ranges I need. The left outer join ensures I select the first group of records (where there is no earlier record that is different), and the BETWEEN is intended to reduce the number of records that need to be searched to find the next-earlier different record (the results are the same without the BETWEEN, but slower). I would need only to add something like "DELETE FROM Quotes WHERE Sequence BETWEEN StartOfRange AND EndOfRange".
SELECT
GroupsOfIdenticalRecords.Symbol,
MIN(GroupsOfIdenticalRecords.Sequence)+1 AS StartOfRange,
MAX(GroupsOfIdenticalRecords.Sequence) AS EndOfRange
FROM
(
SELECT
Q1.Symbol,
Q1.Sequence,
MAX(Q2.Sequence) AS ClosestEarlierDifferentRecord
FROM
Quotes AS Q1
LEFT OUTER JOIN
Quotes AS Q2
ON
Q2.Sequence BETWEEN Q1.Sequence-100 AND Q1.Sequence-1
AND Q2.Symbol=Q1.Symbol
AND Q2.Price<>Q1.Price
GROUP BY
Q1.Sequence,
Q1.Symbol
) AS GroupsOfIdenticalRecords
GROUP BY
GroupsOfIdenticalRecords.Symbol,
GroupsOfIdenticalRecords.ClosestEarlierDifferentRecord
The problem is that this is way too slow and runs out of memory (crashing SSMS, remarkably) for the 2+ million records in the database. Even if I change "-100" to "-2" it is still slow and runs out of memory. I expected the "ON" clause of the LEFT OUTER JOIN to limit the processing and memory usage (2 million iterations, processing about 100 records each, which should be tractable), but it seems like SQL Server may first be generating all combinations of the two instances of the table, Q1 and Q2 (about 4e12 combinations), before selecting based on the criteria specified in the ON clause.
If I run the query on a smaller subset of the data (for example, by using "(SELECT TOP 100000 * FROM Quotes) AS Q1", and similarly for Q2), it completes in a reasonable amount of time. I was trying to figure out how to automatically run this 20 or so times using "WHERE Sequence BETWEEN 0 AND 99999", then "...BETWEEN 100000 AND 199999", etc. (actually I would use overlapping ranges such as [0, 99999], [99900, 199999], etc., to handle ranges that span block boundaries).
The following generates the sets of ranges to split the data into 100000-record blocks ([0, 99999], [100000, 199999], etc.). But how do I apply the above query repeatedly, once for each range? I keep getting stuck because you can't group these using "BETWEEN" without applying an aggregate function. So instead of selecting blocks of records, I only know how to get MIN(), MAX(), etc. (single values), which does not work with the above query (as Q1 and Q2). Is there a way to do this? Is there a totally different (and better) approach to the problem?
SELECT
CONVERT(INTEGER, Sequence / 100000)*100000 AS BlockStart,
MIN(((1+CONVERT(INTEGER, Sequence / 100000))*100000)-1) AS BlockEnd
FROM
Quotes
GROUP BY
CONVERT(INTEGER, Sequence / 100000)*100000
You can do this with a nice little trick. The groups that you want can be defined as the difference between two sequences of numbers. One is assigned for each symbol, in order by sequence. The other is assigned for each symbol and price. This is what it looks like for your data:
Sequence Symbol Price seq1 seq2 diff
0 X $10 1 1 0
0 Y $ 5 1 1 0
1 X $10 2 2 0
1 Y $ 5 2 2 0
2 X $10 3 3 0
2 Y $ 5 3 3 0
3 X $10 4 4 0
3 Y $ 6 4 1 3
4 X $10 5 5 0
4 Y $ 6 5 2 3
5 X $10 6 6 0
5 Y $ 6 6 3 3
6 X $10 7 7 0
6 Y $ 5 7 4 3
7 X $11 8 1 7
7 Y $ 5 8 5 3
You can stare at this and figure out that the combination of symbol, diff, and price define each group.
The following puts this into a SQL query to return the data you want:
select min(q.sequence) as sequence, symbol, price
from (select q.*,
(row_number() over (partition by symbol order by sequence) -
row_number() over (partition by symbol, price order by sequence)
) as grp
from quotes q
) q
group by symbol, grp, price;
If you want to replace the data in the original table, I would suggest that you store the results of the query in a temporary table, truncate the original table, and then re-insert the values from the temporary table.
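For reference, that swap might look something like the following sketch (it assumes Quotes has only the three columns shown here; #keep is a hypothetical temp table name):

-- Keep only the first row of each contiguous group, then reload the table.
select min(q.sequence) as sequence, symbol, price
into #keep
from (select q.*,
             (row_number() over (partition by symbol order by sequence) -
              row_number() over (partition by symbol, price order by sequence)
             ) as grp
      from quotes q
     ) q
group by symbol, grp, price;

truncate table quotes;

insert into quotes (sequence, symbol, price)
select sequence, symbol, price
from #keep;

drop table #keep;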
Answering my own question. I want to add some additional comments to complement the excellent answer by Gordon Linoff.
You're right. It is a nice little trick. I had to stare at it for a while to understand how it works. Here are my thoughts, for the benefit of others.
The numbering by Sequence/Symbol (seq1) always increases, whereas the numbering by Symbol/Price (seq2) only increases sometimes (within each group, only when a record for Symbol contains the group's Price). Therefore seq1 either remains in lock step with seq2 (i.e., diff remains constant until either Symbol or Price changes), or seq1 "runs away" from seq2 while it is busy "counting" the symbol's rows at other Prices, which increases the difference between seq1 and seq2 for a given Symbol and Price. Once seq2 falls behind, it can never "catch up" to seq1, so a given value of diff is never seen again once diff moves to the next larger value (for a given Price). By taking the minimum value within each Symbol/Price group, you get the first record in each contiguous block, which is exactly what I needed.
I don't use SQL a lot, so I wasn't familiar with the OVER clause. I just took it on faith that the first clause generates seq1 and the second generates seq2. I can kind of see how it works, but that's not the interesting part.
My data contained more than just Price. It was a simple thing to add the other fields (Bid, Ask, etc.) to the second OVER clause and the final GROUP BY:
row_number() over (partition by Symbol, Price, Bid, BidSize, Ask, AskSize, Change, Volume, DayLow, DayHigh, Time order by Sequence)
group by Symbol, grp, price, Bid, BidSize, Ask, AskSize, Change, Volume, DayLow, DayHigh, Time
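Put together, the extended query would look something like this (a sketch using my column names; yours may differ):

-- Same gaps-and-islands trick, with all quote fields in the second
-- partition so a row only "matches" when nothing but Sequence changed.
select min(q.Sequence) as Sequence, Symbol, Price, Bid, BidSize, Ask, AskSize,
       Change, Volume, DayLow, DayHigh, Time
from (select q.*,
             (row_number() over (partition by Symbol order by Sequence) -
              row_number() over (partition by Symbol, Price, Bid, BidSize, Ask, AskSize,
                                              Change, Volume, DayLow, DayHigh, Time
                                 order by Sequence)
             ) as grp
      from Quotes q
     ) q
group by Symbol, grp, Price, Bid, BidSize, Ask, AskSize, Change, Volume, DayLow, DayHigh, Time;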
Also, I was able to use >MIN(...) and <=MAX(...) to define ranges of records to delete.
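For instance, the delete can be expressed along these lines (a sketch; it relies on each group's sequence range being contiguous per symbol, as in the data above, and keeps the row at each group's minimum sequence):

-- Delete everything after the first row of each contiguous group,
-- using the group's MIN/MAX sequence as the range bounds.
DELETE q
FROM Quotes AS q
INNER JOIN (
    SELECT Symbol,
           MIN(Sequence) AS FirstSeq,
           MAX(Sequence) AS LastSeq
    FROM (SELECT Symbol, Sequence, Price,
                 row_number() over (partition by Symbol order by Sequence) -
                 row_number() over (partition by Symbol, Price order by Sequence) AS grp
          FROM Quotes) AS g
    GROUP BY Symbol, grp, Price
) AS r
  ON q.Symbol = r.Symbol
 AND q.Sequence >  r.FirstSeq
 AND q.Sequence <= r.LastSeq;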