How to combine rows in BigQuery that share a similar name - google-bigquery

I'm having trouble creating a query that groups responses from multiple rows that share a similar name and counts a specific response value in them.
The table I currently have looks like this:
test_control | values
test | selected
control | selected
test us | not selected
control us | selected
test mom | not selected
control mom | selected
What I'd like is an output like the one below, which counts only the "selected" responses and groups together the rows that have either "control" or "test" in the name:
test_control | values
test | 3
control | 1
The query I have below is wrong, as it returns no output at all. The GROUP BY section is where I'm lost, as I'm not sure how to do this. I tried to Google it but couldn't find anything. I appreciate any help in advance!
SELECT distinct(test_control), values FROM `total_union`
where test_control="%test%" and values="selected"
group by test_control, values

Use the query below:
SELECT
REGEXP_EXTRACT(test_control, r'^(test|control)') AS test_control,
COUNTIF(values = 'selected') AS values
FROM `total_union`
GROUP BY 1

As mentioned by @Mikhail Berlyant, you can use REGEXP_EXTRACT to pull out the matching prefix and COUNTIF to count the rows that satisfy the given condition. Try the code below to get the expected output:
Code
SELECT
REGEXP_EXTRACT(test_control, r'^(test|control)') AS test_control,
COUNTIF(values = "selected") AS values
FROM `project.dataset.testvalues`
group by 1
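The same grouping idea can be tried locally. The following is only a sketch, not BigQuery itself: it uses Python's built-in sqlite3 module, registers a hypothetical `regexp_extract` helper to stand in for BigQuery's REGEXP_EXTRACT, and replaces COUNTIF with SUM over a CASE expression, since SQLite has neither function. The rows are the six sample rows from the question.

```python
import re
import sqlite3


def regexp_extract(text, pattern):
    """Hypothetical stand-in for BigQuery's REGEXP_EXTRACT: first capture group, or None."""
    if text is None:
        return None
    match = re.search(pattern, text)
    return match.group(1) if match else None


conn = sqlite3.connect(":memory:")
conn.create_function("REGEXP_EXTRACT", 2, regexp_extract)

# "values" is a keyword in SQLite, so the column name is quoted.
conn.execute('CREATE TABLE total_union (test_control TEXT, "values" TEXT)')
conn.executemany(
    "INSERT INTO total_union VALUES (?, ?)",
    [
        ("test", "selected"),
        ("control", "selected"),
        ("test us", "not selected"),
        ("control us", "selected"),
        ("test mom", "not selected"),
        ("control mom", "selected"),
    ],
)

rows = conn.execute(
    """
    SELECT
      REGEXP_EXTRACT(test_control, '^(test|control)') AS grp,
      SUM(CASE WHEN "values" = 'selected' THEN 1 ELSE 0 END) AS selected_count
    FROM total_union
    GROUP BY 1
    ORDER BY 1
    """
).fetchall()

print(rows)  # [('control', 3), ('test', 1)]
```

With the sample rows exactly as listed in the question, the counts come out as 3 for control and 1 for test.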

Retrieving Columns with count greater than 1 - Google Sheet Query

I'm using Google sheets, and I want to get the data from one sheet to another where I want only the columns with count > 1.
Let's say we have 3 columns A, B, and C. I tried the following (the first sheet name is "Form Responses 1"):
I thought about using a query in the second sheet as: =query('Form Responses 1'!A1:Z, "Select A having count (A) >1 union select B having count (B) >1 union select C having count (C) > 1"). But I got a parse error where it seems that union and having are not supported in google sheets query.
How can I achieve this (whether it's using query or any other Google sheets function that can work)?
More details:
The first sheet contains info about exercises conducted during a lecture and it gets its data from a Google Form (so the responses are fed in this sheet). Here is a screenshot of it:
Please note that the form is divided into sections. When the user selects the course, the attendance, the participation, and adds a comment, then they go to the next section, the next section will be based on the selected course, the newly opened section will have the exercise name and rating questions (the exercise name is a dropdown list with items that are prefilled and specific to the selected course). That's why, you can see that "exercise name" and "rate the exercise" columns are repeated because we have 2 sections in this form.
The second sheet should contain the data of a selected course only (either mobile dev or web dev) which can be achieved easily through a query with a where clause. But, in addition to that, it shouldn't contain the empty columns of "exercise name" and "rate the exercise" as they correspond to another section. So, it should have only one exercise name column and one rating column that correspond to the selected course. Here is a screenshot if we only use a query with where clause without removing the extra name and rating columns:
Here is a screenshot with the desired result:
Thanks.
Why not just use:
=QUERY('Form Responses 1'!A1:Z, "select A,B,C,D,E,F,G where F is not null", 1)
Use an "OR" condition, e.g.:
QUERY(Data!A:R,"select A, N, P where N>0 or P>0")
where column A holds the country and columns N and P hold population values.
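To make the desired result concrete, here is a small plain-Python sketch (not a Sheets formula) of the logic the question describes: keep only the rows for one course, then drop every column that is empty for all of those rows. The header and sample responses are hypothetical, and `rows_for_course` is an illustrative helper name.

```python
# Hypothetical form responses: the exercise columns repeat once per form section.
header = ["course", "exercise name", "rate the exercise",
          "exercise name", "rate the exercise"]
responses = [
    ["mobile dev", "Layouts", "4", "", ""],
    ["web dev", "", "", "Forms", "5"],
    ["mobile dev", "Intents", "3", "", ""],
]


def rows_for_course(header, rows, course):
    """Filter rows by course, then keep only columns with at least one non-empty cell."""
    selected = [row for row in rows if row[0] == course]
    keep = [i for i in range(len(header))
            if any(row[i] != "" for row in selected)]
    return [[header[i] for i in keep]] + [[row[i] for i in keep] for row in selected]


mobile = rows_for_course(header, responses, "mobile dev")
web = rows_for_course(header, responses, "web dev")

for line in mobile:
    print(line)
```

Each course ends up with a single exercise name column and a single rating column, which is the shape the question asks for.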

Countif query in access

I am trying to run a query that calculates a count with a COUNTIF-style function, but I am having trouble with it. I have used the Count and IIf functions in the builder, but I think something weird is going on. I am trying to count the number of times each value occurs in a column, so I would rather not hard-code a specific value to compare against, if that's possible.
Thanks!
To count the number of times a value appears, you can use something like the query below (without a sample of your data, I've used a table in the database I'm working on).
If you want to know how many times each value appears, just omit the WHERE clause.
SELECT ProcessID,
COUNT(ProcessID)
FROM tbl_PrimaryData_Step1
WHERE ProcessID = 4
GROUP BY ProcessID
If you need just the count, you can use:
SELECT COUNT(ProcessID)
FROM tbl_PrimaryData_Step1
WHERE ProcessID = 4
GROUP BY ProcessID
Another way is:
SELECT DCOUNT("ProcessID","tbl_PrimaryData_Step1","ProcessID = 4")
Edit:
In reply to your comment on your original post this SQL will give the result you're after:
SELECT Concatenate,
COUNT(Concatenate)
FROM MyTable
GROUP BY Concatenate
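The two counting patterns above can be tried outside Access. This is a sketch using Python's built-in sqlite3 module rather than the Access engine, with made-up ProcessID values. The conditional count is written as SUM over a CASE expression; in Access the same idea is commonly written as Sum(IIf(condition, 1, 0)).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl_PrimaryData_Step1 (ProcessID INTEGER)")
# Hypothetical sample values, for illustration only.
conn.executemany(
    "INSERT INTO tbl_PrimaryData_Step1 VALUES (?)",
    [(4,), (4,), (7,), (4,), (9,), (7,)],
)

# How many times each value appears (no WHERE clause).
per_value = conn.execute(
    """
    SELECT ProcessID, COUNT(ProcessID)
    FROM tbl_PrimaryData_Step1
    GROUP BY ProcessID
    ORDER BY ProcessID
    """
).fetchall()

# COUNTIF-style conditional count for a single value.
count_of_4 = conn.execute(
    """
    SELECT SUM(CASE WHEN ProcessID = 4 THEN 1 ELSE 0 END)
    FROM tbl_PrimaryData_Step1
    """
).fetchone()[0]

print(per_value)   # [(4, 3), (7, 2), (9, 1)]
print(count_of_4)  # 3
```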

Percent of Group, not total

It seems like there are a lot of answers out there but I can't seem to relate it to my specific issue. I want to get the breakdown of yes/no for the specific Group. Not get the percent of the yes for the entire population of data.
I have tried the following code in the "What I'm Getting" % of Total cell =
=FormatPercent(Count(Fields!SessionID.Value)/Count((Fields!SessionID.Value), "Tablix1"),)
=FormatPercent(Count(Fields!Value.Value)/Count((Fields!SessionID.Value), "Value"),)
It should just be a case of changing the Scope in your expression to make sure the denominator is the total for the group, not the entire Dataset or Tablix, i.e. something like:
=Count(Fields!SessionID.Value) / Count(Fields!SessionID.Value, "MyGroup")
Where MyGroup is the name of the group, i.e. something like:
If this is still not clear, your best option would be to add a few sample rows, and your desired result for these, to the question so we can replicate your exact issue.
Edit after more info added
Thanks for adding more details. I have created a Dataset based on your example:
And I've created a table based on this:
The group is based on the Group field:
The Group % expression is:
=Fields!YesNoCount.Value / Sum(Fields!YesNoCount.Value, "MyGroup")
This is taking the YesNoCount value of each row and comparing it to the total YesNoCount value in that particular group (i.e. the MyGroup scope).
Note that I'm using Sum here, not Count as in your example expression - that seems to be the appropriate aggregate for your data and the required value.
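The arithmetic behind that scoped expression can be sketched in plain Python. This is not SSRS code; the rows are hypothetical and only illustrate dividing each row's YesNoCount by the total for its own group rather than by the grand total.

```python
from collections import defaultdict

# Hypothetical dataset rows: (Group, Value, YesNoCount).
rows = [
    ("A", "Yes", 3),
    ("A", "No", 1),
    ("B", "Yes", 2),
    ("B", "No", 2),
]

# Equivalent of Sum(Fields!YesNoCount.Value, "MyGroup"): one total per group.
group_totals = defaultdict(int)
for group, _, count in rows:
    group_totals[group] += count

# Each row's share of its own group.
group_pct = [(group, value, count / group_totals[group])
             for group, value, count in rows]

for group, value, pct in group_pct:
    print(f"{group} {value}: {pct:.0%}")
```

Within each group the percentages add up to 100%, which is the difference from dividing by the total of the whole dataset.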
Results look OK to me:

SQL - How to insert a subquery into a query in order to retrieve unique values

I am writing reports using Report Builder 3, and I need some help with an sql query to get unique values.
Using the following sample data:
I need to be able to get one single value for feeBudRec returned for each feeRef. The value of each feeBudRec is always the same for each individual feeRef (eg for every data row for feeRef LR01 will have a feeBudRec of 1177).
The reason why I need to get a single feeBudRec value for each feeRef is that I need to be able to total the feeBudRec value for each feeRef in a feePin (eg for feePin LEE, I need to total the feeBudRec values for LR01 and PS01, which should be 1177 + 1957 to get a total of 3134; but if I don't have unique values for feeBudRec, it will add the values for each row, which would bring back a total of 11756 for the 8 LEE rows).
My experience with writing SQL queries is very limited, but from searching the internet, it looks like I'll need to put in a subquery into my SQL query in order to get a single unique feeBudRec figure for each feeRef, and that a subquery that gets a minimum feeBudRec value for each feeRef should work for me.
Based on examples I've found, I think the following subquery should work:
SELECT a.feeRef, a.feeBudRec
FROM (
SELECT uvw_EarnerInfo.feeRef, Min(uvw_EarnerInfo.feeBudRec) as AvailableTime
FROM uvw_EarnerInfo
GROUP BY
uvw_EarnerInfo.feeRef
) as x INNER JOIN uvw_EarnerInfo as a ON a.feeRef = x.feeRef AND a.feeBudRec = x.AvailableTime;
The problem is that I have no idea how to insert that subquery into the query I'm using to produce the report (as follows):
SELECT
uvw_EarnerInfo.feeRef
,uvw_EarnerInfo.PersonName
,uvw_EarnerInfo.PersonSurname
,uvw_EarnerInfo.feePin
,uvw_RB_TimeLedger.TimeDate
,uvw_RB_TimeLedger.matRef
,uvw_RB_TimeLedger.TimeTypeCode
,uvw_RB_TimeLedger.TimeCharge
,uvw_RB_TimeLedger.TimeElapsed
,uvw_WoffTimeByTime.WoffMins
,uvw_WoffTimeByTime.WoffCharge
,uvw_EarnerInfo.feeBudRec
,uvw_EarnerInfo.personOccupation
FROM
uvw_RB_TimeLedger
LEFT OUTER JOIN uvw_WoffTimeByTime
ON uvw_RB_TimeLedger.TimeId = uvw_WoffTimeByTime.TimeId
RIGHT OUTER JOIN uvw_EarnerInfo
ON uvw_EarnerInfo.feeRef = uvw_RB_TimeLedger.feeRef
WHERE
uvw_RB_TimeLedger.TimeDate >= @TimeDate
AND uvw_RB_TimeLedger.TimeDate <= @TimeDate2
If that subquery will get the correct results, can anyone please help me with inserting it into my report query. Otherwise, can anyone let me know what I will need to do to get a unique feeBudRec value for each feeRef?
It depends on the exact schema, but assuming uvw_EarnerInfo lists the Pin, Ref, and Rec without duplicates, try adding an extra column (after personOccupation) at the end of your query, such as:
feeBudRecSum = (Select SUM(FeeBudRec) From uvw_EarnerInfo x
where x.feePin = uvw_EarnerInfo.feePin
Group By x.FeePin)
Note that you would not Sum these values in your report. This column should have the total you are looking for.
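If the view does contain repeated rows per feeRef, which the answer above assumes it does not, one variation is to reduce the data to one row per feeRef before summing. The sketch below uses Python's built-in sqlite3 module rather than SQL Server, and a simplified three-column stand-in for uvw_EarnerInfo. The row counts (five LR01 rows and three PS01 rows) are chosen to reproduce the totals quoted in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Simplified, hypothetical stand-in for the uvw_EarnerInfo view.
conn.execute(
    "CREATE TABLE uvw_EarnerInfo (feePin TEXT, feeRef TEXT, feeBudRec INTEGER)"
)
conn.executemany(
    "INSERT INTO uvw_EarnerInfo VALUES (?, ?, ?)",
    [("LEE", "LR01", 1177)] * 5 + [("LEE", "PS01", 1957)] * 3,
)

# Summing every row counts each feeBudRec once per duplicate row.
naive_total = conn.execute(
    "SELECT SUM(feeBudRec) FROM uvw_EarnerInfo WHERE feePin = 'LEE'"
).fetchone()[0]

# Collapse to one row per feeRef first, then total per feePin.
distinct_totals = conn.execute(
    """
    SELECT feePin, SUM(feeBudRec)
    FROM (SELECT DISTINCT feePin, feeRef, feeBudRec FROM uvw_EarnerInfo) AS one_per_ref
    GROUP BY feePin
    """
).fetchall()

print(naive_total)      # 11756
print(distinct_totals)  # [('LEE', 3134)]
```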
The key to Report Builder is to get your query correct from the outset and then let the wizard structure your report for you. It takes all the hard work out of structuring your report manually.
I haven't used Report Builder for a while now, but in the report's query builder, which displays a graphical representation of your query, you should be able to drag and drop columns in and out of the query set. Dragging a column upwards and out of the box (showing your columns) would cause your report to break on this column.
If you restructure this way you will probably have to run the report generator again to regenerate the report and restructure it.
Once you are happy with the structure you can then begin to add the summary columns.

SQL Multiple IN statements on one column

Okay, I'm using WordPress, but this pertains to the SQL side.
I have a query in which I need to filter out posts using three different categories, but they're all terms in the post.
For example:
In my three categories, I select the following: (Academia,Webdevelopment) (Fulltime,Parttime) (Earlycareer).
Now what I want to do is make sure when I query that the post has AT LEAST ONE of each of those terms.
CORRECT RESULT: A post with tags Academia, Fulltime, Earlycareer
INCORRECT RESULT: A post with tags Academia, Earlycareer (doesn't have fulltime or parttime)
Currently, my query looks something like this:
SELECT * FROM $wpdb->posts WHERE
(
$wpdb->terms.slug IN (list of selected from category 1) AND
$wpdb->terms.slug IN (list of selected from category 2) AND
$wpdb->terms.slug IN (list of selected from category 3)
)
AND $wpdb->term_taxonomy.taxonomy = 'jobtype' AND .......
When using this query, it returns no results when I select across the different categories (that is, I can choose 4 things from category 1 and it has results, but I can't choose anything from category 2 or 3. And vice versa)
I'm not sure if this is something to do with using IN more than once on the same column.
Thanks in advance for any help!
Your query seems to be correct. SQL has no limitation on using IN on the same column multiple times.
But make sure the lists of selected terms for categories 1, 2, and 3 contain no NULL values. Even a single NULL in one of these lists can make the whole WHERE condition evaluate to NULL, in which case you get no rows back.
If this doesn't help, then it must be a WordPress issue.
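One more thing worth checking, assuming each joined row carries exactly one slug: a single slug cannot belong to three non-overlapping lists at once, so ANDing the three IN conditions on the same row would match nothing. The sketch below uses Python's built-in sqlite3 module and a hypothetical, simplified post_terms table (not the real WordPress schema) to compare that row-level filter with one EXISTS check per category.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical simplified table: one row per (post, term slug) pair.
conn.execute("CREATE TABLE post_terms (post_id INTEGER, slug TEXT)")
conn.executemany(
    "INSERT INTO post_terms VALUES (?, ?)",
    [
        (1, "academia"), (1, "fulltime"), (1, "earlycareer"),
        (2, "academia"), (2, "earlycareer"),
    ],
)

# Row-level AND: one slug would have to be in all three lists at once.
row_level = conn.execute(
    """
    SELECT DISTINCT post_id FROM post_terms
    WHERE slug IN ('academia', 'webdevelopment')
      AND slug IN ('fulltime', 'parttime')
      AND slug IN ('earlycareer')
    """
).fetchall()

# One EXISTS per category: the post needs at least one term from each list.
per_category = conn.execute(
    """
    SELECT DISTINCT p.post_id FROM post_terms AS p
    WHERE EXISTS (SELECT 1 FROM post_terms AS t WHERE t.post_id = p.post_id
                  AND t.slug IN ('academia', 'webdevelopment'))
      AND EXISTS (SELECT 1 FROM post_terms AS t WHERE t.post_id = p.post_id
                  AND t.slug IN ('fulltime', 'parttime'))
      AND EXISTS (SELECT 1 FROM post_terms AS t WHERE t.post_id = p.post_id
                  AND t.slug IN ('earlycareer'))
    ORDER BY p.post_id
    """
).fetchall()

print(row_level)     # []
print(per_category)  # [(1,)]
```

Post 2 is excluded because it has neither fulltime nor parttime, matching the "incorrect result" example in the question.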