What I have: A datasource with a string column, let's call it "name".
There are more, but those are not relevant to the question.
The "name" column in the context of a concrete query contains only 2 distinct values:
""
"SomeName"
Either value may occur any number of times, but there will only ever be those two.
Now, what I need is: in the context of a summarize statement, I need a column holding the two distinct values concatenated (strcat-style), so I end up with just "SomeName".
What I have does not meet this requirement, and I have not been able to find a solution:
datatable(name:string)["","SomeName","SomeName"] // just to give a minimal reproducible example
| summarize Name = strcat_array(make_list(name), "")
which gives me
| Name
> SomeNameSomeName
but I need just
| Name
> SomeName
I am aware that I need to do some sort of "distinct" somehow and somewhere or maybe there is a completely different solution to get to the same result?
So, my question is: What do I need to change in the shown query to fulfill my requirement?
take_any()
When the function is provided with a single column reference, it will
attempt to return a non-null/non-empty value, if such value is
present.
datatable(name:string)["","SomeName","SomeName", ""]
| summarize take_any(name)
name
SomeName
Wow, just as I posted the question, I found an answer:
datatable(name:string)["","SomeName","SomeName", ""]
| summarize Name = max(name)
I have no idea why this works for a string column, but here I am.
This results in my desired outcome:
| Name
> SomeName
...which I suppose is probably less efficient than David's answer. So I'll prefer his one.
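On why max() works here: string comparison is lexicographic, and the empty string sorts before every non-empty string, so with only "" and "SomeName" present the maximum is the non-empty value. The following is a small Python sketch of that idea (not Kusto), next to a "distinct, then concatenate" variant; the helper name concat_distinct is hypothetical and only used for illustration.

```python
def concat_distinct(values):
    """Join the distinct values in first-seen order."""
    # dict.fromkeys drops duplicates while keeping insertion order
    return "".join(dict.fromkeys(values))


names = ["", "SomeName", "SomeName", ""]

# Distinct values are "" and "SomeName"; joining them yields "SomeName".
distinct_concat = concat_distinct(names)

# The empty string compares lower than any non-empty string,
# so max() picks the non-empty value.
largest = max(names)

print(distinct_concat)
print(largest)
```

Note that the max() shortcut only matches the concatenation because one of the two values is empty; with two non-empty values the results would differ.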
I have the following question!
I have a table like this:
Data Source
I want to create a field (I suppose it's a field) that lets me pick out the apl_ids
whose service_offered values are exactly the ones I want.
For example, from the table above: suppose I want the apl_ids that have ONLY the service_offered values
Pending 1, Pending 2 and Pending 7.
In that case, I want to get apl_id = "13", since apl_id = "12" has one more service that I don't need.
Which is the best way to get that?
Thank you in advance!
Add a calculated field that gives 1 for the desired values and 0 for all other values. Add another calculated field with a FIXED LOD on apl_id that sums the first field. Then keep only the ids where that sum equals 3. I think that should work.
Otherwise, tell me and I will post screenshots.
You can create a set based on the field apl_id, defined by the condition
max([service_offered]="Pending 1") and
max([service_offered]="Pending 2") and
max([service_offered]="Pending 7") and
min([service_offered]="Pending 1" or [service_offered]="Pending 2" or [service_offered]="Pending 7")
This set will contain those apl_ids that have at least one record where service_offered is "Pending 1", at least one record with "Pending 2", at least one with "Pending 7", and where every record has a service of 1, 2 or 7 (i.e. no others).
The key is to realize that Tableau treats True as greater than False, so min() and max() for boolean expressions correspond to every() and any().
Once you have a set of apl_ids, you can use it on shelves and in calculated fields in many different ways.
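The min()/max() trick can be sketched outside Tableau as well. The Python below is only an illustration of the logic, not Tableau syntax: the sample rows, the "Pending 3" extra service and the function name ids_with_exactly are assumptions, since the original table is not shown. Python also orders True above False, so max() and min() over booleans behave like any() and all().

```python
from collections import defaultdict

# Hypothetical sample rows (apl_id, service_offered).
rows = [
    ("12", "Pending 1"), ("12", "Pending 2"), ("12", "Pending 7"), ("12", "Pending 3"),
    ("13", "Pending 1"), ("13", "Pending 2"), ("13", "Pending 7"),
    ("14", "Pending 1"), ("14", "Pending 2"),
]
wanted = {"Pending 1", "Pending 2", "Pending 7"}


def ids_with_exactly(rows, wanted):
    """Return ids that have every wanted service and no other service."""
    by_id = defaultdict(list)
    for apl_id, service in rows:
        by_id[apl_id].append(service)

    selected = []
    for apl_id, services in by_id.items():
        # max(...) over booleans acts like any(): at least one row per wanted service
        has_each = all(max(s == w for s in services) for w in wanted)
        # min(...) over booleans acts like all(): no row outside the wanted services
        only_wanted = min(s in wanted for s in services)
        if has_each and only_wanted:
            selected.append(apl_id)
    return sorted(selected)


print(ids_with_exactly(rows, wanted))
```

With this sample data, id 12 is rejected by the min() term (it has an extra service) and id 14 by a max() term (it lacks "Pending 7").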
I have tried each of the following formulas to get the highest number when I have a duplicate record. They both give me what appears to be the same output, but I know of at least one ID# where the response for both is "6" when I am expecting "7".
F2 = ID# to look for
PSStatus!$A = Hour; data ranges from 1 to 7; column is formatted as a number.
PSStatus!$F = ID#s
=INDEX(QUERY(PSStatus!$A$2:$A,,), MATCH(MAX($F2), (QUERY(PSStatus!$F$2:$F,,)),0))
=MAX(INDEX(QUERY(PSStatus!$A$2:$A,,), MATCH($F2, (QUERY(PSStatus!$F$2:$F,,)),0)))
I'll assume that your data table looks like the following one. Please, forgive me if I am mistaken.
If my assumption is correct, you can use the formula =MAX(FILTER({DATA TABLE RANGE}, {ID COLUMN FROM DATA TABLE}={ID})). That formula will first use FILTER to pick only the requested ID and then MAX will pick the highest one. In my example above, the formula should be =MAX(FILTER(Sheet1!$A$2:$B$26, Sheet1!$A$2:$A$26=A2)) for the first row. This is the end result:
Please, ask me anything if you need further help.
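For comparison, here is the filter-then-max idea as a minimal Python sketch. The rows and the function name max_hour_for_id are hypothetical. It also shows why the INDEX/MATCH attempts fall short: MATCH with 0 as its last argument returns the position of the first match only, so later duplicates with a higher hour are never considered.

```python
# Hypothetical (id, hour) rows; "A1" appears twice with different hours.
rows = [("A1", 6), ("B2", 3), ("A1", 7), ("C3", 5)]


def first_hour_for_id(rows, wanted_id):
    """Mimics INDEX/MATCH: stops at the first row with a matching id."""
    for row_id, hour in rows:
        if row_id == wanted_id:
            return hour
    return None


def max_hour_for_id(rows, wanted_id):
    """Mimics MAX(FILTER(...)): keeps every matching row, then takes the max."""
    hours = [hour for row_id, hour in rows if row_id == wanted_id]
    return max(hours) if hours else None


print(first_hour_for_id(rows, "A1"))
print(max_hour_for_id(rows, "A1"))
```

The sketch takes the maximum over the hour values only, not over the id column.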
Hello Stackoverflow community!
I am currently faced with writing something that automatically cleans up a database according to certain requirements. Here is what I have to do:
If a cell in one table contains a certain substring (let's say "Mrs."), a toggle box in another table is supposed to be set accordingly (Yes if "Mrs." is contained and No if not).
I have already worked out something that sets the toggle boxes in that column to -1, but I still have to include a check for whether a row has to be changed at all, and which one. For that I wanted to use an If construct and a variable that would contain the ID of the first table's row, to determine which rows in the second table have to be changed.
So the thought is like this:
If 1st_Table contains "Mrs." Then
CurrentDb.Execute "UPDATE 2nd_Table SET gender = -1 WHERE foo = 1st_Table.ID_Var"
1st_Table.ID_Var would contain the ID of the row of the first table
Now the more or less obvious question: How do I accomplish that?
This is basically my first time working with VBA/Access AND SQL, so I have no further ideas on how to do that.
Thanks in advance
Greetings
geisterfurz007
P.S. The variable names will be changed in the final version; they are just to visualize. Just saw that they were colored by SO.
Try this (SQL update query):
UPDATE 2nd_Table INNER JOIN 1st_Table ON 2nd_Table.ID = 1st_Table.ID
SET 2nd_Table.gender = -1
WHERE 1st_Table.PersonName LIKE '*Mrs.*'
ID -> reference between both tables
PersonName -> column in 1st table containing Mrs.
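To try the idea without Access, here is a rough Python sketch using the standard sqlite3 module. It is an adaptation under stated assumptions, not the Access query itself: the table names first_table and second_table and the sample rows are hypothetical, SQLite's LIKE uses % where Access uses * as the wildcard, and the join is expressed as an IN subquery instead of UPDATE ... INNER JOIN.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE first_table (ID INTEGER PRIMARY KEY, PersonName TEXT);
    CREATE TABLE second_table (ID INTEGER PRIMARY KEY, gender INTEGER DEFAULT 0);
    INSERT INTO first_table VALUES (1, 'Mrs. Smith'), (2, 'Mr. Jones'), (3, 'Dr. Mrs. Lee');
    INSERT INTO second_table (ID) VALUES (1), (2), (3);
""")

# Set the flag only for rows whose matching first_table row contains 'Mrs.'.
# No If construct or ID variable is needed: the WHERE clause selects the rows.
conn.execute("""
    UPDATE second_table
    SET gender = -1
    WHERE ID IN (SELECT ID FROM first_table WHERE PersonName LIKE '%Mrs.%')
""")

flags = dict(conn.execute("SELECT ID, gender FROM second_table ORDER BY ID"))
print(flags)
```

Rows 1 and 3 end up with gender = -1, while row 2 keeps its default of 0.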
First of all, even though the thread "SQL: How do you select only groups that do not contain a certain value?" is almost identical to my problem, it doesn't fully clear up my confusion about the problem.
Let's have a table "Contacts" like this one:
+------------+-----------+
| Department | FirstName |
+------------+-----------+
| 100        | Thomas    |
| 200        | Peter     |
| 100        | Jerry     |
+------------+-----------+
First, I want to group the rows by the department number and show number of rows in each displayed group. This, I believe, can be easily done by the following query.
SELECT Department, Count(*) As "Rows_in_group"
FROM Contacts
GROUP BY Department
This outputs 2 groups: the first, department 100, contains 2 rows; the second, department 200, contains only one row.
But then I want to extend the query to exclude any group that doesn't contain a certain value in a certain column (e.g. Thomas in FirstName). Here are my questions:
1) Reading the above-mentioned thread I was able to come up with this, which seems to work correctly:
SELECT Department, Count(*) As "Rows_in_group"
FROM Contacts
WHERE Department IN (SELECT Department FROM Contacts WHERE FirstName = "Thomas")
GROUP BY Department
Q: How does this work? I understand the "WHERE Department IN" part, but then I'd expect a value; instead another nested query is included, which doesn't make much sense to me as I'm only a beginner with SQL.
2) By accident I was able to come up with another query that also seems to work, but feels weird and I also don't understand its workings.
SELECT Department, Count(*) As "Rows_in_group"
FROM Contacts
GROUP BY Department
HAVING NOT SUM(FirstName = "Thomas") = 0
Q: How does this work? Why doesn't the alternative HAVING SUM(FirstName = "Thomas") > 0 work?
3) Q: Is there any simple and correct way to do this using the HAVING clause?
I expected that a simple "HAVING FirstName='Thomas'" after the GROUP BY would do the trick, as it reads like ordinary language, but it does not.
Note that I want whole groups to be chosen by the query, so "WHERE FirstName='Thomas'" isn't a solution for my problem, as it excludes all the rows that don't satisfy the condition before the grouping takes place (at least the way I understand it).
Q: How does this work? I understand the "WHERE Department IN" part,
but then I'd expect a value, but instead another nested query is
included, which to me doesn't make much sense as I'm only beginner
with SQL.
The nested query returns a list of values, and IN checks whether Department matches any of them.
2) By accident I was able to come up with another query that also
seems to work, but feels weird and I also don't understand its
workings.
HAVING NOT SUM(FirstName = "Thomas") = 0
"Feels weird" because, well, it is. This is not a place for the SUM function.
EDIT: Why does this work?
The expression FirstName = "Thomas" is evaluated as true or false (known as a Boolean expression). Numerically, true is equal to 1 and false converts to 0 (zero). By wrapping it in SUM you total those values per group, so zero (still) means false and "not zero" means true. Then, to make it weird(er), you added NOT, which negates the comparison with 0, so the condition ends up true exactly for the groups where the sum is not zero, i.e. where at least one row matched!
EDIT: I think what could be more helpful to you is consideration of when to use WHERE and when to use HAVING (instead of the Boolean magic taking place).
From this answer:
WHERE clause introduces a condition on individual rows; HAVING clause introduces a condition on aggregations, i.e. results of selection where a single result, such as count, average, min, max, or sum, has been produced from multiple rows.
WHERE was appropriate for your example because first you want to "only return rows WHERE Department IN (100)" and then you want to "group those rows by Department" and get a COUNT of how many rows had been selected.
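The queries discussed above can be tried with Python's built-in sqlite3 module. This is a sketch under assumptions: the question does not name its database engine, and treating a comparison as 0/1 inside SUM is engine-specific (SQLite and MySQL allow it; others may reject it). The CASE variant is the more portable way to write the same HAVING condition, and the variable names are only for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Contacts (Department INTEGER, FirstName TEXT);
    INSERT INTO Contacts VALUES (100, 'Thomas'), (200, 'Peter'), (100, 'Jerry');
""")

# 1) WHERE ... IN: the subquery yields the departments that contain Thomas.
where_in = conn.execute("""
    SELECT Department, COUNT(*) FROM Contacts
    WHERE Department IN (SELECT Department FROM Contacts WHERE FirstName = 'Thomas')
    GROUP BY Department
""").fetchall()

# 2) HAVING with the comparison summed as 0/1 (engine-specific behaviour).
having_sum = conn.execute("""
    SELECT Department, COUNT(*) FROM Contacts
    GROUP BY Department
    HAVING SUM(FirstName = 'Thomas') > 0
""").fetchall()

# 3) Portable HAVING: spell out the 1/0 with CASE.
having_case = conn.execute("""
    SELECT Department, COUNT(*) FROM Contacts
    GROUP BY Department
    HAVING SUM(CASE WHEN FirstName = 'Thomas' THEN 1 ELSE 0 END) > 0
""").fetchall()

print(where_in, having_sum, having_case)
```

All three return the single group for department 100 with its full row count of 2, because the condition is applied to the group as a whole and not to individual rows.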
It seems like there are a lot of answers out there, but I can't seem to relate them to my specific issue. I want to get the breakdown of yes/no within each specific Group, not the percentage of yes answers for the entire population of data.
I have tried the following code in the "What I'm Getting" % of Total cell =
=FormatPercent(Count(Fields!SessionID.Value)/Count((Fields!SessionID.Value), "Tablix1"),)
=FormatPercent(Count(Fields!Value.Value)/Count((Fields!SessionID.Value), "Value"),)
It should just be a case of changing the Scope in your expression to make sure the denominator is the total for the group, not the entire Dataset or Tablix, i.e. something like:
=Count(Fields!SessionID.Value) / Count(Fields!SessionID.Value, "MyGroup")
Where MyGroup is the name of the group, i.e. something like:
If this is still not clear, your best option would be to add a few sample rows, and your desired result for these, to the question so we can replicate your exact issue.
Edit after more info added
Thanks for adding more details. I have created a Dataset based on your example:
And I've created a table based on this:
The group is based on the Group field:
The Group % expression is:
=Fields!YesNoCount.Value / Sum(Fields!YesNoCount.Value, "MyGroup")
This is taking the YesNoCount value of each row and comparing it to the total YesNoCount value in that particular group (i.e. the MyGroup scope).
Note that I'm using Sum here, not Count as in your example expression - that seems to be the appropriate aggregate for your data and the required value.
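The scoping idea, dividing each row's value by the total of its own group and not by the grand total, can be sketched in Python. The rows and the function name group_percentages are hypothetical; only the arithmetic mirrors the expression above.

```python
from collections import defaultdict

# Hypothetical rows: (Group, answer, YesNoCount)
rows = [
    ("A", "Yes", 3),
    ("A", "No", 1),
    ("B", "Yes", 1),
    ("B", "No", 1),
]


def group_percentages(rows):
    """Return each row's share of its own group's total count."""
    totals = defaultdict(int)
    for group, _answer, count in rows:
        totals[group] += count  # plays the role of Sum(..., "MyGroup")
    return [(group, answer, count / totals[group]) for group, answer, count in rows]


for group, answer, share in group_percentages(rows):
    print(f"{group} {answer}: {share:.0%}")
```

Dividing by the sum over all rows (6 here) would give the share of the entire dataset, which is the result the question was trying to avoid.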
Results look OK to me: