Using SQL in Domo to apply multiple groups or filters

I am trying to filter for unique UserIDs and then see which IDs have a value of >1 in another column. I have tried pretty much every example from "SQL - Selecting unique values from one column then filtering based on another" and at least one or two other sites, but I can't find the appropriate links for those, as they turned out to be a little unrelated anyway.
An example of the relevant columns of my dataset is as follows:
+---------+-----------+
| UserIDs | EventType |
+---------+-----------+
| 100     | Start     |
| 100     | Start     |
| 100     | Finish    |
| 100     | Finish    |
| 200     | Start     |
| 200     | Start     |
| 200     | Start     |
| 200     | Finish    |
| 200     | Finish    |
| 200     | Finish    |
| 300     | Start     |
| 400     | Start     |
| 400     | Finish    |
+---------+-----------+
What I am trying to figure out is how many users triggered the EventType "Finish" more than once. The result I would want from the example above is:
+--------------------------------------------------+
| Total # of students that battled more than once  |
+--------------------------------------------------+
| 2                                                |
+--------------------------------------------------+
None of the GROUP BY examples seems right, because wouldn't it just compress the other rows into each other?
For the record, I am very new to SQL and programming in general, so try not to be too technical in your answer. Any time I get remotely close to thinking I have solved it, I run it and it gives me syntax errors, so I have no idea where else to turn.
Sorry guys, I wrote the output incorrectly; I am actually just looking for the number of students who triggered Finish more than once. Also, what would I use in place of a table name, e.g. FROM "TABLE", since Domo has a kind of strange way of breaking up the hierarchy? For context, my table is called "Metrics Data", so trying to type that in as the table name generally converts the data into SQL code.
Attempt at answer

Try this:
SELECT UserID, COUNT(1) AS [# of Finishes]
FROM Your_Table
WHERE EventType = 'Finish'
GROUP BY UserID
HAVING COUNT(1) > 1
Hope it helps.
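Since the desired output is a single number (how many users finished more than once), you can wrap a query like the one above in an outer count. A minimal sketch, assuming a MySQL-style Domo dataflow (MySQL quotes names containing spaces with backticks) and the "Metrics Data" table from the question:

SELECT COUNT(*) AS `Total # of students that battled more than once`
FROM (
    SELECT UserIDs
    FROM `Metrics Data`
    WHERE EventType = 'Finish'
    GROUP BY UserIDs
    HAVING COUNT(*) > 1
) AS multi_finishers;

(The derived-table alias multi_finishers is required by MySQL and otherwise arbitrary.) Against the sample data this returns 2: users 100 and 200 each have more than one Finish row, while 400 has only one.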

I would do some conditional aggregation with the help of CASE expressions:
SELECT
    UserIDs AS [UserID],
    SUM(CASE EventType WHEN 'Finish' THEN 1 ELSE 0 END) AS [# of Finishes]
FROM <table>
GROUP BY UserIDs
HAVING SUM(CASE EventType WHEN 'Finish' THEN 1 ELSE 0 END) > 1
Result:
UserID | # of Finishes
   100 |             2
   200 |             3

Related

Access text count in query design

I am new to Access and am trying to develop a query that will allow me to count the number of occurrences of one word in each field from a table with 15 fields.
The table simply stores test results for employees. There is one table that stores the employee identification - id, name, etc.
The second table has 15 fields - A1 through A15 with the words correct or incorrect in each field. I need the total number of incorrect occurrences for each field, not for the entire table.
Is there an answer through Query Design, or is code required?
The solution, whether Query Design, or code, would be greatly appreciated!
Firstly, one of the reasons that you are struggling to obtain the desired result for what should be a relatively straightforward request is that your data does not follow database normalisation rules; consequently, you are working against the natural operation of an RDBMS when querying your data.
From your description, I assume that the fields A1 through A15 are answers to questions on a test.
By representing these as separate fields in your database, aside from the inherent difficulty in querying the resulting data (as you have discovered), you would be forced to restructure your entire database if you ever wanted to add or remove a question to/from the test!
Instead, I would suggest structuring your table in the following way:
Results
+------------+------------+-----------+
| EmployeeID | QuestionID | Result    |
+------------+------------+-----------+
| 1          | 1          | correct   |
| 1          | 2          | incorrect |
| ...        | ...        | ...       |
| 1          | 15         | correct   |
| 2          | 1          | correct   |
| 2          | 2          | correct   |
| ...        | ...        | ...       |
+------------+------------+-----------+
This table would be a junction table (a.k.a. linking / cross-reference table) in your database, supporting a many-to-many relationship between the tables Employees & Questions, which might look like the following:
Employees
+--------+-----------+-----------+------------+------------+-----+
| Emp_ID | Emp_FName | Emp_LName | Emp_DOB    | Emp_Gender | ... |
+--------+-----------+-----------+------------+------------+-----+
| 1      | Joe       | Bloggs    | 01/01/1969 | M          | ... |
| ...    | ...       | ...       | ...        | ...        | ... |
+--------+-----------+-----------+------------+------------+-----+
Questions
+-------+------------------------------------------------------------+--------+
| Qu_ID | Qu_Desc                                                    | Qu_Ans |
+-------+------------------------------------------------------------+--------+
| 1     | What is the meaning of life, the universe, and everything? | 42     |
| ...   | ...                                                        | ...    |
+-------+------------------------------------------------------------+--------+
With this structure, if ever you wish to add or remove a question from the test, you can simply add or remove records from the table, without needing to restructure your database or rewrite any of the queries, forms, or reports which depend upon the existing structure.
Furthermore, since the result of an answer is likely to be a binary correct or incorrect, then this would be better (and far more efficiently) represented using a Boolean True/False data type, e.g.:
Results
+------------+------------+--------+
| EmployeeID | QuestionID | Result |
+------------+------------+--------+
| 1          | 1          | True   |
| 1          | 2          | False  |
| ...        | ...        | ...    |
| 1          | 15         | True   |
| 2          | 1          | True   |
| 2          | 2          | True   |
| ...        | ...        | ...    |
+------------+------------+--------+
Not only does this consume less memory in your database, but this may be indexed far more efficiently (yielding faster queries), and removes all ambiguity and potential for error surrounding typos & case sensitivity.
With this new structure, if you wanted to see the number of correct answers for each employee, the query can be something as simple as:
select results.employeeid, count(*)
from results
where results.result = true
group by results.employeeid
Alternatively, if you wanted to view the number of employees answering each question correctly (for example, to understand which questions most employees got wrong), you might use something like:
select results.questionid, count(*)
from results
where results.result = true
group by results.questionid
The above are obviously very basic example queries, and you would likely want to join the Results table to an Employees table and a Questions table to obtain richer information about the results.
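For instance, a hedged sketch in Access SQL, using the field names from the structures above, joining Results to Employees to report correct answers per named employee:

select e.Emp_FName, e.Emp_LName, count(*) as CorrectAnswers
from Results as r
inner join Employees as e on e.Emp_ID = r.EmployeeID
where r.Result = True
group by e.Emp_ID, e.Emp_FName, e.Emp_LName;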
Contrast the above with your current database structure -
Per your original question:
The second table has 15 fields - A1 through A15 with the words correct or incorrect in each field. I need the total number of incorrect occurrences for each field, not for the entire table.
Assuming that you want to view the number of incorrect answers by employee, you are forced to use an incredibly messy query such as the following:
select
    employeeid,
    iif(A1='incorrect',1,0)+
    iif(A2='incorrect',1,0)+
    iif(A3='incorrect',1,0)+
    iif(A4='incorrect',1,0)+
    iif(A5='incorrect',1,0)+
    iif(A6='incorrect',1,0)+
    iif(A7='incorrect',1,0)+
    iif(A8='incorrect',1,0)+
    iif(A9='incorrect',1,0)+
    iif(A10='incorrect',1,0)+
    iif(A11='incorrect',1,0)+
    iif(A12='incorrect',1,0)+
    iif(A13='incorrect',1,0)+
    iif(A14='incorrect',1,0)+
    iif(A15='incorrect',1,0) as IncorrectAnswers
from
    YourTable
Here, notice that the answer numbers are also hard-coded into the query, meaning that if you decide to add a new question or remove an existing question, not only would you need to restructure your entire database, but queries such as the above would also need to be rewritten.
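The count the original question actually asks for (incorrect occurrences per field, across all employees) is just as awkward under the current structure: one aggregate per column. A minimal sketch against the same assumed table:

select
    sum(iif(A1='incorrect',1,0)) as A1_Incorrect,
    sum(iif(A2='incorrect',1,0)) as A2_Incorrect,
    sum(iif(A3='incorrect',1,0)) as A3_Incorrect,
    sum(iif(A4='incorrect',1,0)) as A4_Incorrect,
    sum(iif(A5='incorrect',1,0)) as A5_Incorrect,
    sum(iif(A6='incorrect',1,0)) as A6_Incorrect,
    sum(iif(A7='incorrect',1,0)) as A7_Incorrect,
    sum(iif(A8='incorrect',1,0)) as A8_Incorrect,
    sum(iif(A9='incorrect',1,0)) as A9_Incorrect,
    sum(iif(A10='incorrect',1,0)) as A10_Incorrect,
    sum(iif(A11='incorrect',1,0)) as A11_Incorrect,
    sum(iif(A12='incorrect',1,0)) as A12_Incorrect,
    sum(iif(A13='incorrect',1,0)) as A13_Incorrect,
    sum(iif(A14='incorrect',1,0)) as A14_Incorrect,
    sum(iif(A15='incorrect',1,0)) as A15_Incorrect
from
    YourTable

Under the normalised Results structure, the same report is simply the per-question query shown earlier with results.result = false, and it keeps working unchanged however many questions you add.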

Database design for partially changing data points, with history and snapshot functionality?

I'm looking for a best practice or solution, on a conceptual level, to a problem I'm working on.
I have a collection of data points (around 500) which are partially changed, by a user, over time. It is important to be able to tell which values were changed at what point in time. The data might look like this:
Data changed over time:
+------------+-------------+-------------+-------------+-------+---------------+
| Date       | Value no. 1 | Value no. 2 | Value no. 3 | ...   | Value no. 500 |
|------------+-------------+-------------+-------------+-------+---------------|
| 1/1/2018   |             |             | 2           |       | 1             |
| 1/3/2018   | 2           | 1           |             |       |               |
| 1/7/2018   |             |             | 4           |       | 8             |
| 1/12/2018  | 5           | 3           |             |       |               |
+------------+-------------+-------------+-------------+-------+---------------+
...
It must be possible to take a snapshot at a certain point in time, to get a complete set of data points, that were valid for that particular point in time, like this:
Snapshot taken 1/3/2018 will yield:
+---------+---------+---------+-------+-----------+
| Value 1 | Value 2 | Value 3 | ...   | Value 500 |
|---------+---------+---------+-------+-----------|
| 2       | 1       | 2       | 0     | 1         |
Snapshot taken 1/9/2018 will yield:
+---------+---------+---------+-------+-----------+
| Value 1 | Value 2 | Value 3 | ...   | Value 500 |
|---------+---------+---------+-------+-----------|
| 2       | 1       | 4       | 0     | 8         |
Snapshot taken 1/13/2018 will yield:
+---------+---------+---------+-------+-----------+
| Value 1 | Value 2 | Value 3 | ...   | Value 500 |
|---------+---------+---------+-------+-----------|
| 5       | 3       | 4       | 0     | 8         |
and so on...
I'm not bound by a particular database technology, so either SQL or NoSQL will do. It is probably not possible to satisfy all the requirements in the DB-domain - some will probably have to be addressed in code. But my main question is what database technology is best suited for this task?
I'm not quite sure this fits a time-series database (TSDB), since only a portion of the values are changed at a given time, and it is important to know which values changed. Maybe I'm wrong?
/Chris
My suggestion would be to model this in a sparse format, something like:
CREATE TABLE DataPoint (
    DataID     int,        /* 1 to 500 in your example, or whatever you need to identify it */
    ValidFrom  timestamp,  /* default value 01/01/1970-00:00:00 or a suitable "Epoch" */
    ValidUntil timestamp,  /* default value 31/12/3999-00:00:00 or again something that is in the far future for your case */
    Value      Number(7,5) /* again, this may be any data type, or even more than one field if needed, like Price & Currency */
);
What we have just defined is a set of data points and the "interval" in which each one has a specific value, so if you measured data point 1 yesterday and got a value of 89.768 you would insert:
DataId=1
ValidFrom=26/11/2018-14:52:41
ValidUntil=31/12/3999-00:00:00
Value=89.768
Then you measure it again tomorrow and get:
DataId=1
ValidFrom=28/11/2018-14:51:23
ValidUntil=31/12/3999-00:00:00
Value=89.443
(Let's assume that you also have logic so that, when you record a new value, you update the current record and set ValidUntil=28/11/2018-14:51:23; this is not strictly needed, but it will make the example query simpler.)
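A hedged sketch of that bookkeeping, using the table and values from this example (the timestamp literals and their format are illustrative; adjust to your database):

-- Close the currently-open interval for data point 1...
UPDATE DataPoint
   SET ValidUntil = '28/11/2018-14:51:23'
 WHERE DataID = 1
   AND ValidUntil = '31/12/3999-00:00:00';
-- ...then open a new interval carrying the newly measured value
INSERT INTO DataPoint (DataID, ValidFrom, ValidUntil, Value)
VALUES (1, '28/11/2018-14:51:23', '31/12/3999-00:00:00', 89.443);

Running both statements in a single transaction ensures a reader never sees zero or two "current" rows for the same data point.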
One month from now you will have accumulated more measurements for data point #1, and likewise, at different moments, for data points #2 to #500.
You now want to find out what the values were at noon today (i.e. one month "ago" from that future vantage point), i.e. at 27/11/2018-12:00:00:
Select DataID, Value
from DataPoint
where ValidFrom <= '27/11/2018-12:00:00'
  and ValidUntil > '27/11/2018-12:00:00'
This will return:
001,89.768
002,45.678
...,...
500,112.809
Regarding logging who did this, or for what reason, you can either log it separately (saving for example DataPoint Id, Timestamp, UserId...) or make it part of the original table, so that whenever you register a new datapoint you also log who measured it.
Have a look at the SQL Server temporal tables engine, which may be a solution in your case. This approach allows you to run the queries mentioned in the question, for example:
SELECT *
FROM my_data
FOR SYSTEM_TIME AS OF '2018-01-01'
However, the table in the example seems to be very wide (maybe denormalized). I would suggest grouping columns by some technical or functional characteristic (vertical partitioning) to avoid maintenance drawbacks later.
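For reference, a minimal sketch of declaring such a system-versioned table in SQL Server (the my_data table and its columns here are assumptions for illustration, not from the question):

CREATE TABLE my_data (
    id INT NOT NULL PRIMARY KEY,
    value DECIMAL(10, 5),
    SysStartTime DATETIME2 GENERATED ALWAYS AS ROW START NOT NULL,
    SysEndTime   DATETIME2 GENERATED ALWAYS AS ROW END NOT NULL,
    PERIOD FOR SYSTEM_TIME (SysStartTime, SysEndTime)
)
WITH (SYSTEM_VERSIONING = ON (HISTORY_TABLE = dbo.my_data_history));

SQL Server then maintains the history table automatically, and FOR SYSTEM_TIME AS OF queries like the one above reconstruct any past snapshot without extra application code.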

How can I get the activated services for a user in the SQL table?

I have a table in a SQL database into which I insert the latest status of each service activation for users.
 logid |  userid   | serviceid | status
-------+-----------+-----------+--------
   1   | 123456789 | a         |   0
   2   | 123456789 | b         |   1
   3   | 123456789 | b         |   0
   4   | 123456789 | a         |   1
   5   | 123456789 | c         |   1
   6   | 123456789 | a         |   0
   7   | 123456789 | d         |   1
1) How can I write a select query that returns the active services for a user?
For example: in the above table, I need to get c, d for userid = 123456789.
2) Is it OK to add another field to the table to store the current status, and write a trigger to update the current status for all records on row insert? (It works, but takes a long time on millions of records.)
3) Is there a query to read all records one by one and update the current status to the last status, which I can run after all records are inserted?
Thanks
An answer to your first question:
select t.serviceid
from Log t
where t.status = 1 and t.userid = 123456789
group by t.logid, t.serviceid
having t.logid = (select max(tt.logid)
                  from Log tt
                  where tt.serviceid = t.serviceid
                    and tt.userid = t.userid)
SqlFiddle: http://sqlfiddle.com/#!6/9ff99/5
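A hedged alternative without the GROUP BY, filtering each service directly down to its latest log row (same Log table and columns as above):

select t.serviceid
from Log t
where t.userid = 123456789
  and t.status = 1
  and t.logid = (select max(tt.logid)
                 from Log tt
                 where tt.userid = t.userid
                   and tt.serviceid = t.serviceid);

Against the sample data this returns c and d: the latest rows for services a and b have status 0, so they drop out.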
1: Using SQL. The simple SQL you learn in the first 20 pages of a book about SQL - which I would say you should read.
select userid, serviceid from whateveryourtablename where status = 1
I get the idea that this likely is not what you asked for, but hey - you never explain the business rules clearly enough.
2: Yes and no. Why would a user logging in update millions of rows? THAT would be a problem. A user/service table containing active services that is maintained by a trigger is doable, and a user login will NOT update millions of rows.
3: Not clear what you are asking. But yes, such a query exists. It is trivial to make the update if you get the last status, and getting that is trivial too - see the sketch below.
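For question 3, a hedged sketch of such a one-off backfill, assuming a CurrentStatus column has been added to the log table (generic syntax; MySQL in particular disallows selecting from the table being updated, so there you would rewrite this with a self-join):

update Log
set CurrentStatus = (select tt.status
                     from Log tt
                     where tt.userid = Log.userid
                       and tt.serviceid = Log.serviceid
                       and tt.logid = (select max(t2.logid)
                                       from Log t2
                                       where t2.userid = Log.userid
                                         and t2.serviceid = Log.serviceid))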
For what it's worth, I would have a separate table with the service status:
 statusid |  userid   | serviceid | status
----------+-----------+-----------+--------
    1     | 123456789 | a         |   0
    2     | 123456789 | b         |   0
    3     | 123456789 | c         |   1
    4     | 123456789 | d         |   1
I would have a unique constraint on (userid, serviceid), or alternatively make these two columns the primary key instead of the statusid column; if appropriate, the statusid column can be removed in that case. I would also put a trigger on the log table to update the status in this new status table. Note that this only ever updates a single record.
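A hedged sketch of that trigger in MySQL syntax (ServiceStatus is an assumed name for the new status table, and the ON DUPLICATE KEY clause relies on the unique constraint on (userid, serviceid) suggested above):

CREATE TRIGGER trg_log_after_insert
AFTER INSERT ON Log
FOR EACH ROW
    INSERT INTO ServiceStatus (userid, serviceid, status)
    VALUES (NEW.userid, NEW.serviceid, NEW.status)
    ON DUPLICATE KEY UPDATE status = NEW.status;

Each inserted log row then touches exactly one status row, so the cost stays constant regardless of how large the log grows.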

Is it possible to construct dynamic aggregate columns in an ARel query that uses a join?

Here's a bit of sample context for my question below to help clarify what I'm asking...
The Schema
Users
- id
- name
Answers
- id
- user_id
- topic_id
- was_correct
Topics
- id
- name
The Data
Users
id | name
 1 | Gabe
 2 | John
Topics
id | name
 1 | Math
 2 | English
Answers
id | user_id | topic_id | was_correct
 1 |       1 |        1 |           0
 2 |       1 |        1 |           1
 3 |       1 |        2 |           1
 4 |       2 |        1 |           0
 5 |       2 |        2 |           0
What I'd like to have, in a result set, is a table with one row per user, and two columns per topic, one that shows the sum of correct answers for the topic, and one that shows the sum of the incorrect answers for that topic. For the sample data above, this result set would look like:
My desired result
users.id | users.name | topic_1_correct_sum | topic_1_incorrect_sum | topic_2_correct_sum | topic_2_incorrect_sum
       1 | Gabe       |                   1 |                     1 |                   1 |                     0
       2 | John       |                   0 |                     1 |                   0 |                     1
Obviously, if there were more topics in the Topics table, I'd like this query to include new correct_sum and incorrect_sums for each topic that exists, so I'm looking for a way to write this without hard-coding topic_ids into the sum functions of my select clause.
Is there a smart way to magic this sort of thing with ARel?
Gabe,
What you're looking for here is a crosstab query. There are many approaches to writing this, unfortunately none generic enough to be portable SQL. AFAIK each database handles crosstabs differently. Another way of looking at this is as a "cube", something typically found in OLAP-type databases (as opposed to OLTP).
It's easily writable in SQL, but it will likely involve some functions native to the database you're using. What DB are you using?
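In the meantime, a hedged, mostly portable sketch using conditional aggregation - with the caveat that the topic ids are hard-coded, so a truly dynamic version would have to generate this SQL (or the equivalent ARel) at runtime from the rows in topics:

SELECT u.id, u.name,
       SUM(CASE WHEN a.topic_id = 1 AND a.was_correct = 1 THEN 1 ELSE 0 END) AS topic_1_correct_sum,
       SUM(CASE WHEN a.topic_id = 1 AND a.was_correct = 0 THEN 1 ELSE 0 END) AS topic_1_incorrect_sum,
       SUM(CASE WHEN a.topic_id = 2 AND a.was_correct = 1 THEN 1 ELSE 0 END) AS topic_2_correct_sum,
       SUM(CASE WHEN a.topic_id = 2 AND a.was_correct = 0 THEN 1 ELSE 0 END) AS topic_2_incorrect_sum
FROM users u
LEFT JOIN answers a ON a.user_id = u.id
GROUP BY u.id, u.name;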
Your answers table looks like it needs to have 1,2,3,4,5 and not 1,1,1,1,1 as ids...

Retrieve comma delimited data from a field

I've created a form in PHP that collects basic information. I have a list box that allows multiple items to be selected (i.e. housing, rent, food, water). If multiple items are selected, they are stored in a field called Needs, separated by commas.
I have created a report ordered by the person's needs. The people who only have one need are sorted correctly, but the people who have multiple needs are sorted exactly as the string passed to the database (i.e. "housing, rent, food, water") - which is not what I want.
Is there a way to separate the multiple values in this field using SQL, counting each need instance/occurrence as 1, so that no comma-delimited strings show up in the results?
Your database is not in the first normal form. A non-normalized database will be very problematic to use and to query, as you are actually experiencing.
In general, you should be using at least the following structure. It can still be normalized further, but I hope this gets you going in the right direction:
CREATE TABLE users (
    user_id int,
    name varchar(100)
);
CREATE TABLE users_needs (
    need varchar(100),
    user_id int
);
Then you should store the data as follows:
-- TABLE: users
+---------+-------+
| user_id | name  |
+---------+-------+
| 1       | joe   |
| 2       | peter |
| 3       | steve |
| 4       | clint |
+---------+-------+
-- TABLE: users_needs
+---------+---------+
| need    | user_id |
+---------+---------+
| housing | 1       |
| water   | 1       |
| food    | 1       |
| housing | 2       |
| rent    | 2       |
| water   | 2       |
| housing | 3       |
+---------+---------+
Note how the users_needs table is defining the relationship between one user and one or many needs (or none at all, as for user number 4.)
To normalise your database further, you should also use another table called needs, and as follows:
-- TABLE: needs
+---------+---------+
| need_id | name    |
+---------+---------+
| 1       | housing |
| 2       | water   |
| 3       | food    |
| 4       | rent    |
+---------+---------+
Then the users_needs table should just refer to a candidate key of the needs table instead of repeating the text.
-- TABLE: users_needs (instead of the previous one)
+---------+---------+
| need_id | user_id |
+---------+---------+
| 1       | 1       |
| 2       | 1       |
| 3       | 1       |
| 1       | 2       |
| 4       | 2       |
| 2       | 2       |
| 1       | 3       |
+---------+---------+
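In DDL terms, a hedged sketch of this final shape, replacing the earlier users_needs definition (the keys and REFERENCES clauses are my additions for completeness):

CREATE TABLE needs (
    need_id int PRIMARY KEY,
    name varchar(100)
);
CREATE TABLE users_needs (
    need_id int REFERENCES needs (need_id),
    user_id int REFERENCES users (user_id),
    PRIMARY KEY (need_id, user_id)
);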
You may also be interested in checking out the following Wikipedia article for further reading about repeating values inside columns:
Wikipedia: First normal form - Repeating groups within columns
UPDATE:
To fully answer your question: if you follow the above guidelines, sorting, counting and aggregating the data should then become straightforward.
To sort the result-set by needs, you would be able to do the following:
SELECT users.name, needs.name
FROM users
INNER JOIN users_needs ON (users_needs.user_id = users.user_id)
INNER JOIN needs ON (needs.need_id = users_needs.need_id)
ORDER BY needs.name;
You would also be able to count how many needs each user has selected, for example:
SELECT users.name, COUNT(users_needs.need_id) AS number_of_needs
FROM users
LEFT JOIN users_needs ON (users_needs.user_id = users.user_id)
GROUP BY users.user_id, users.name
ORDER BY number_of_needs;
I'm a little confused by the goal. Is this a UI problem or are you just having trouble determining who has multiple needs?
The number of needs follows from the difference in length with and without the commas:
Len([Needs]) - Len(Replace([Needs],',','')) + 1
For example, "housing, rent, food, water" contains three commas, so the expression yields 3 + 1 = 4 needs.
Can you provide more information about the Sort you're trying to accomplish?
UPDATE:
I think these Oracle-based posts may have what you're looking for: post and post. The only difference is that you would probably be better off using the method I list above to find the number of comma-delimited pieces rather than doing the translate(...) that the author suggests. Hope this helps - it's Oracle-based, but I don't see .