Counting Distinct Values with multiple criteria

I have an Excel file with the below data example:
IssueNumber  Client
100          Client 1
100          Client 1
101          Client 1
102          Client 2
102          Client 2
I want to count the number of unique IssueNumbers for each client. So the end result would be:
Client    Count
Client 1  2
Client 2  1
I have a list of the clients in a separate tab from the main data, and am trying to look up using COUNTIFS, and passing the Client name as one of the criteria, but I am struggling to figure out how to count the unique issues.
This must be really simple, but it is Friday afternoon and my brain has given up!

It's Friday morning here, so I am not so burnt out yet:
=SUMPRODUCT(($B$2:$B$6=D2)*1/COUNTIF($A$2:$A$6,$A$2:$A$6))

A few more options:
=SUM(IF(((MATCH($A$2:$A$6&$B$2:$B$6,$A$2:$A$6&$B$2:$B$6,0))>=(ROW($A$2:$A$6)-(MIN(ROW($A$2:$A$6))-1)))*($B$2:$B$6=D2)=1,1,0))
and
=SUM(IF(FREQUENCY(IF($B$2:$B$6=D2,MATCH($A$2:$A$6&"_"&$B$2:$B$6,$A$2:$A$6&"_"&$B$2:$B$6,0)),ROW($A$2:$A$6)-ROW($A$2)+1),1))
Both of the above are array formulas, so they must be entered by pressing Ctrl+Shift+Enter.
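
For readers who want to check the logic outside Excel, here is a minimal Python sketch of the same count. It is not part of the answers above: the function names are made up for illustration, and the rows are the sample data from the question. The first function collects issue numbers into a set per client; the second mirrors the SUMPRODUCT formula, where every matching row contributes 1/COUNTIF of its issue number.

```python
from collections import Counter, defaultdict
from fractions import Fraction

# Sample rows from the question: (IssueNumber, Client)
rows = [
    (100, "Client 1"),
    (100, "Client 1"),
    (101, "Client 1"),
    (102, "Client 2"),
    (102, "Client 2"),
]


def distinct_issues_per_client(rows):
    """Return {client: number of distinct issue numbers} using one set per client."""
    issues_by_client = defaultdict(set)
    for issue, client in rows:
        issues_by_client[client].add(issue)
    return {client: len(issues) for client, issues in issues_by_client.items()}


def sumproduct_style(rows, client):
    """Mirror the SUMPRODUCT formula: each matching row adds 1/COUNTIF(issue)."""
    # COUNTIF($A$2:$A$6,$A$2:$A$6): occurrences of each issue number in the whole column
    occurrences = Counter(issue for issue, _ in rows)
    return sum(Fraction(1, occurrences[issue]) for issue, c in rows if c == client)


print(distinct_issues_per_client(rows))
print(sumproduct_style(rows, "Client 1"), sumproduct_style(rows, "Client 2"))
```

One thing the sketch makes visible: the 1/COUNTIF approach counts occurrences across the whole column, so it only gives whole numbers when an IssueNumber never appears under more than one client. The set-based version does not depend on that.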

Related

SQL different null values in different rows

I have a quick question regarding writing a SQL query to obtain a complete entry from two or more entries where the data is missing in different columns.
This is the example, suppose I have this table:
Client Id | Name | Email
1234 | John | (null)
1244 | (null) | john#example.com
Would it be possible to write a query that would return the following?
Client Id | Name | Email
1234 | John | john#example.com
I am finding this particularly hard because these are two entries in the same table.
I apologize if this is trivial. I am still learning SQL and wasn't able to come up with a solution, and although I've tried looking online, I couldn't phrase the question well enough to find the answer I was after.
Many thanks in advance for the help!

Yes, but actually no.
It is possible to write a query that works with your example data.
But only under the assumption that the part of the email before the '#' is always equal to the name.
SELECT clients.id,clients.name,bclients.email FROM clients
JOIN clients bclients ON upper(clients.name) = upper(substring(bclients.email from 0 for position('#' in bclients.email)));
Explanation:
We join the table to itself to get the information into one row.
To do this, we first find the position of the '#' in the email, then take the substring that starts at position 0 and runs for that many characters (the result of position). Because string positions start at 1, this yields everything before the '#'.
To avoid case mismatches, the name and the substring are both converted to uppercase for the comparison.
(Lowercase would work just as well.)
The design is flawed
How can a client have multiple IDs, each holding different information about the same user?
I think you want to split the table into clients and users, so that a user can have multiple clients.
I recommend reading up on database normalization, as it provides the knowledge necessary for successful database design.
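
To try the self-join idea without a database server, here is a small sketch using Python's built-in sqlite3 module. This is an adaptation, not the answer's exact query: SQLite has no `substring(... from ... for ...)` or `position(...)`, so the sketch uses `substr` and `instr` instead, and the table layout (`id`, `name`, `email`) is assumed from the query above. The rows are the two from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE clients (id INTEGER, name TEXT, email TEXT)")
conn.executemany(
    "INSERT INTO clients VALUES (?, ?, ?)",
    [
        (1234, "John", None),                # name known, email missing
        (1244, None, "john#example.com"),    # email known, name missing
    ],
)

# Self-join: pair a row that has a name with a row whose email prefix
# (everything before the '#') equals that name, ignoring case.
query = """
SELECT a.id, a.name, b.email
FROM clients AS a
JOIN clients AS b
  ON upper(a.name) = upper(substr(b.email, 1, instr(b.email, '#') - 1))
"""
merged = conn.execute(query).fetchall()
print(merged)
```

Rows with a NULL name or NULL email drop out of the join on their own, because a comparison with NULL is never true. The same caveat as above applies: this only works while the email prefix really matches the name.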

Is it possible to match the "next" unmatched record in a SQL query where there is no strictly unique common field between tables?

Using Access 2010 and its version of SQL, I am trying to find a way to relate two tables in a query where I do not have strict, unique values in each table, using concatenated fields that are mostly unique, then matching each unmatched next record (measured by a date field or the record id) in each table.
My business receives checks that we do not cash ourselves, but rather forward to a client for processing. I am trying to build a query that will match the checks that we forward to the client with a static report that we receive from the client indicating when checks were cashed. I have no control over what the client reports back to us.
When we receive a check, we record the name of the payor, the date that we received the check, the client's account number, the amount of the check, and some other details in a table called "Checks". We add a matching field which comes as close as we can get to a unique identifier to match against the client reports (more on that in a minute).
Checks:
ID  Name  Acct  Amt    Our_Date  Match
__  ____  ____  _____  ________  __________
1   Dave  1001  10.51  2/14/14   1001*10.51
2   Joe   1002  12.14  2/28/14   1002*12.14
3   Sam   1003  50.00  3/01/14   1003*50.00
4   Sam   1003  50.00  4/01/14   1003*50.00
5   Sam   1003  50.00  5/01/14   1003*50.00
The client does not report back to us the date that WE received the check, the check number, or anything else useful for making unique matches. They report the name, account number, amount, and the date of deposit. The client's report comes weekly. We take that weekly report and append the records to make a second table out of it.
Return:
ID   Name  Acct  Amt    Their_Date  Unique1
___  ____  ____  _____  __________  __________
355  Dave  1001  10.51  3/25/14     1001*10.51
378  Joe   1002  12.14  4/04/14     1002*12.14
433  Sam   1003  50.00  3/08/14     1003*50.00
599  Sam   1003  50.00  5/11/14     1003*50.00
Instead of giving us back the date we received the check, we get back the date that they processed it. There is no way to make a rule to compare the two dates, because the deposit dates vary wildly. So the closest thing I can get for a unique identifier is a concatenated field of the account number and the amount.
I am trying to match the records on these two tables so that I know when the checks we forward get deposited. If I do a simple join using the two concatenated fields, it works most of the time, but we run into a problem with payors like Sam, above, who is making regular monthly payments of the same amount. In a simple join, if one of Sam's payments appears in the Return table, it matches to all of the records in the Checks table.
To limit that behavior and match the first Sam entry on the Return table to the first Sam entry on the Checks table, I wrote the following query:
SELECT return.*, checks.*
FROM return, checks
WHERE checks.id = (SELECT TOP 1 id
                   FROM checks
                   WHERE match = return.unique1
                   ORDER BY [our_date]);
This works when there is only one of Sam's records in the Return table. The problem comes when the second entry for Sam hits the Return table (Return.ID 599) as the client's weekly reports are added to the table. When that happens, the query appropriately (for my purposes) only lists that two of Sam's checks have been processed, but uses the "Top 1 ID" record to supply the row's details from the Return table:
Checks_Return_query:
Checks.ID  Name  Acct  Amt    Our_Date  Their_Date  Return.ID
_________  ____  ____  _____  ________  __________  _________
1          Dave  1001  10.51  2/14/14   3/25/14     355
2          Joe   1002  12.14  2/28/14   4/04/14     378
3          Sam   1003  50.00  3/01/14   3/08/14     433
4          Sam   1003  50.00  4/01/14   3/08/14     433
In other words, the query repeats the Return table info for record Return.ID 433 instead of matching Return.ID 599, which is I guess what I should expect from the TOP 1 operator.
So I am trying to figure out how I can get the query to take the two concatenated fields in Checks and Return, compare them to find matching sets, then select the next unmatched record in Checks (with "next" being measured either by the ID or Our_Date) with the next unmatched record in Return (again, with "next" being measured either by the ID or Their_Date).
I spent many hours in a dark room turning the query into various joins, and back again, looking at functions like WHERE NOT IN, WHERE NOT EXISTS, FIRST() NEXT() MIN() MAX(). I am afraid I am way over my head.
I am beginning to think that I may have a structural problem, and may need to write the "matched" records in this query to another table of completed transactions, so that I can differentiate between "matched" and "unmatched" records better. But that still wouldn't help me if two of Sam's transactions are on the same weekly report I get from my client.
Are there any suggestions as to query functions I should look into for further research, or confirmation that I am barking up the wrong tree?
Thanks in advance.

I'd say that you really do need another table of completed transactions; it could be a temporary table.
Regarding your fear that "two of Sam's transactions are on the same weekly report", you can use a cursor to write the records one by one instead of in a single set-based operation.
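
As a rough illustration of the one-by-one idea, here is a Python sketch (not Access SQL, and not something from the answer itself) that walks the checks in date order and hands each one the next unused return record with the same match key. The function name and tuple layout are assumptions for the example; the data is taken from the Checks and Return tables in the question.

```python
from collections import defaultdict
from datetime import date

# (ID, match key, Our_Date) from the Checks table in the question
checks = [
    (1, "1001*10.51", date(2014, 2, 14)),
    (2, "1002*12.14", date(2014, 2, 28)),
    (3, "1003*50.00", date(2014, 3, 1)),
    (4, "1003*50.00", date(2014, 4, 1)),
    (5, "1003*50.00", date(2014, 5, 1)),
]

# (ID, match key, Their_Date) from the Return table in the question
returns = [
    (355, "1001*10.51", date(2014, 3, 25)),
    (378, "1002*12.14", date(2014, 4, 4)),
    (433, "1003*50.00", date(2014, 3, 8)),
    (599, "1003*50.00", date(2014, 5, 11)),
]


def pair_in_order(checks, returns):
    """Pair the k-th check with the k-th return for each match key.

    Returns a list of (check_id, return_id); return_id is None while a
    check has no return record yet.
    """
    # Queue the return IDs per key, oldest deposit first.
    returns_by_key = defaultdict(list)
    for ret_id, key, their_date in sorted(returns, key=lambda r: (r[2], r[0])):
        returns_by_key[key].append(ret_id)

    used = defaultdict(int)  # how many returns have been consumed per key
    pairs = []
    for check_id, key, our_date in sorted(checks, key=lambda c: (c[2], c[0])):
        candidates = returns_by_key.get(key, [])
        ret_id = candidates[used[key]] if used[key] < len(candidates) else None
        used[key] += 1
        pairs.append((check_id, ret_id))
    return pairs


for check_id, ret_id in pair_in_order(checks, returns):
    print(check_id, ret_id)
```

Because each return record is consumed at most once, two of Sam's deposits arriving in the same weekly report are simply matched to his two oldest open checks. Whether oldest-to-oldest is the right pairing rule is a business assumption, not something the data can confirm.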

Counting number of occurrences of tuples in an m:n relationship

I'd like to know if there's an efficient way to count the number of occurrences of a permutation of entities from one side of an m:n relationship. Hopefully the following example will illustrate what I mean:
Let's imagine a database with people and events of some sort. People can organize multiple events, and events can be organized by more than one person. What I'd like to know is whether a certain tuple of people has already organized an event or whether it's their first time. My first idea is to add an attribute to the m:n relationship:
PeopleID | EventID | TimesOrganized
100      | 1       | 1
200      | 1       | 1
300      | 2       | 1
400      | 3       | 1
Now, there's an event no. 4 that's again organized by persons 200 and 100 (let's say they should be added in that order). The new table should look like:
PeopleID | EventID | TimesOrganized
100      | 1       | 2
200      | 1       | 2
300      | 2       | 1
400      | 3       | 1
200      | 4       | 2
100      | 4       | 2
Now, if I added an event organized by persons 200 and 300 it would look like this:
PeopleID | EventID | TimesOrganized
100      | 1       | 2
200      | 1       | 2
300      | 2       | 1
400      | 3       | 1
200      | 4       | 2
100      | 4       | 2
200      | 5       | 1
300      | 5       | 1
How would I go about keeping the third column updated properly and what are my options?
I should also add that this is part of a larger project we have for one of our classes. We'll be implementing an application that uses the database in some way, so I might as well move this to application logic if there's no easy way.

I wouldn't recommend tracking a TimesOrganized column as you suggest.
You can simply query it as needed using COUNT(EventID) ... GROUP BY PeopleID.
If you do feel you need to maintain the value somewhere it probably is better normalized to the (presumed) table People. Something like People.TimesOrganized. But then you have to increment it as you go instead of just recalculating as needed.

If you want to count how many times someone has organized an event, the problem is not m:n but 1:m. Just count the events grouped by person; you don't need that column in the table unless the value is needed very often.
That said, I find your tables a little confusing: detail and aggregation are mixed, and the third one is downright wrong, since PeopleID 200 has organized 3 events and PeopleID 300 has organized 2.
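
Since the question mentions moving this into application logic, here is a short Python sketch of both counts computed on demand from the plain PeopleID/EventID pairs, with no stored TimesOrganized column. The variable and function names are illustrative only; the pairs are the ones from the question's final table. The per-person count corresponds to COUNT ... GROUP BY PeopleID, while the per-group count treats the set of organizers of each event as the key, which is one possible reading of "tuple of people".

```python
from collections import Counter, defaultdict

# (PeopleID, EventID) pairs from the last table in the question
organizes = [
    (100, 1), (200, 1),
    (300, 2),
    (400, 3),
    (200, 4), (100, 4),
    (200, 5), (300, 5),
]

# Equivalent of COUNT(EventID) ... GROUP BY PeopleID
events_per_person = Counter(person for person, _ in organizes)


def times_each_group_organized(pairs):
    """Count events per exact set of organizers (order of people ignored)."""
    people_by_event = defaultdict(set)
    for person, event in pairs:
        people_by_event[event].add(person)
    return Counter(frozenset(people) for people in people_by_event.values())


group_counts = times_each_group_organized(organizes)
print(events_per_person)
print(group_counts[frozenset({100, 200})])
```

Using a frozenset means persons 200 and 100 count as the same group regardless of insertion order. If order matters in your model, a sorted tuple or an explicit position column would be needed instead.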

Jasper / DynamicJasper: How to invert Column Headers

I have a Jasper report with the following output format:
Item | Price | Quantity
----------------------------
1      100     5
2      150     8
3      200     11
How do I convert that table to this format:
Item      1    2    3
Price     100  150  200
Quantity  5    8    11
The column headers have now become row headers.
I'm actually using DynamicJasper, but of course it still relies on Jasper.
What special setting or property should I set to achieve the format I'm looking for?
Also, what do you call this format? Inverted headers? Inverted columns? It's hard to Google this issue since the keywords I'm using don't seem to be correct. Google always gives me a different answer.

Please check whether Crosstabs serve your purpose.

As suggested above, either check out crosstabs, or look at the CrosstabBuilder/LayoutManager classes and override or extend some of them to adapt them to your needs.

Sort Excel Grouped Rows

I have a spreadsheet that has information in groups. The header rows contain company names and information, and the grouped rows beneath them contain the names of people in the company.
Company Name | Number of Employees | Revenue |
Employee Name | Email | Phone
Is there any way to sort by the number of employees and/or revenue and keep the grouped employee information below the company it belongs to?
Normally when I try it, it sorts the company information but keeps the employee information in the order in which it was entered.

If I understand your question correctly, I have a way you can accomplish what you want (I don't know if there is a more efficient method).
Write code which will, for each company header row, copy the number of employees and the revenue into two unused columns of your choice. The data needs to be copied into those columns for both the company header row and its employee detail rows.
In a third unused column, assign a sequence number. This keeps the data together and in order when sorting by employees or revenue.
Now you can sort by the newly created number-of-employees and/or revenue columns (along with the sequence column to maintain ordering within each company).
After the sort you can delete the extra helper columns.
So if your data looked like this to start with...
A               B                  C
Penetrode       200                750000
Micheal Bolton  mbolton#pene.com   555-555-3333
Samir N         samirn#pene.com
Initech         500                500000
Bill Lumbergh   umumyeah#init.com  555-555-1212
Peter Gibbons   pgibbons#init.com  555-555-2222
Your code would then copy the employee count and revenue data and number the rows, using three unused columns.
A               B                  C             D    E       F
Penetrode       200                750000        200  750000  1
Micheal Bolton  mbolton#pene.com   555-555-3333  200  750000  2
Samir N         samirn#pene.com    555-555-3334  200  750000  3
Initech         500                500000        500  500000  4
Bill Lumbergh   umumyeah#init.com  555-555-1212  500  500000  5
Peter Gibbons   pgibbons#init.com  555-555-2222  500  500000  6
Then you can code a sort on any of the column combos: (D,F), (E,F), (D,E,F), or (E,D,F).
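
The helper-column steps above can be sketched in Python to show why the groups stay together. This is only an illustration of the idea, not VBA you can paste into Excel: the function names are invented, and it assumes a company header row can be recognised by having a number in column B, which may not hold for your sheet. The rows are the sample data from the six-column table above.

```python
rows = [
    ("Penetrode", 200, 750000),
    ("Micheal Bolton", "mbolton#pene.com", "555-555-3333"),
    ("Samir N", "samirn#pene.com", "555-555-3334"),
    ("Initech", 500, 500000),
    ("Bill Lumbergh", "umumyeah#init.com", "555-555-1212"),
    ("Peter Gibbons", "pgibbons#init.com", "555-555-2222"),
]


def add_helper_columns(rows):
    """Append columns D (employees), E (revenue) and F (sequence) to every row."""
    decorated = []
    employees = revenue = None
    for seq, row in enumerate(rows, start=1):
        # Assumption: a numeric value in column B marks a company header row.
        if isinstance(row[1], int):
            employees, revenue = row[1], row[2]
        decorated.append(row + (employees, revenue, seq))
    return decorated


def sort_grouped(rows, by="employees"):
    """Sort by (D, F) or (E, F), then drop the helper columns again."""
    helper = 3 if by == "employees" else 4
    decorated = add_helper_columns(rows)
    decorated.sort(key=lambda r: (r[helper], r[5]))
    return [r[:3] for r in decorated]


for row in sort_grouped(rows, by="revenue"):
    print(row)
```

The sequence column does the real work: every row in a group shares the same D and E values, so the sort falls back to F, which preserves the header-then-employees order inside each company.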

Better late than never, I suppose, but I feel my LAselect plugin would have solved your problem. I created this plugin because I do much non-standard 'stuff' with my data and needed a tool to handle it. LAselect can produce your 'group' output too and you would not need hidden columns or anything. I mean, you would not need to change the screens you are used to to sort them in whatever way you wanted.
