Create a SQL column based on list of values - sql

I am trying to create a column dependent on whether a certain value exists in a list regardless of when it occurs.
In my table currently I have:
Attendee No - unique number for every attendance
Tracking Activity - Description of Activity
Tracking Date/Time - Date & Time when Activity took place
Activity Type - <<<< I need to calculate this column based on specific logic
[Attendee No] can have multiple [Tracking Activity] & associated [Tracking Date/Time]
Table example. Tracker
+-------------+-------------------+--------------------+---------------+
| Attendee_No | Tracking Activity | Tracking Date/Time | Activity Type |
+-------------+-------------------+--------------------+---------------+
| 3623        | Eat               | 05/04/2020 16:28   | Physical      |
| 3623        | Music             | 05/04/2020 07:16   | Physical      |
| 3623        | Run               | 05/04/2020 03:52   | Physical      |
| 3623        | Booked in         | 05/04/2020 03:42   | Physical      |
| 3624        | Sleep             | 05/04/2020 15:47   | Physical      |
| 3624        | Walk              | 05/04/2020 11:55   | Physical      |
| 3624        | TV                | 05/04/2020 11:54   | Physical      |
| 3624        | Booked in         | 05/04/2020 11:52   | Physical      |
+-------------+-------------------+--------------------+---------------+
Using the example above, what I'm looking to do is:
For every Attendee No, if the Tracking Activity = "Run", "Walk", "Jog" or "Gym", regardless of when it occurred, the Activity Type should = "Physical"
I'm a SQL noob and have no real idea what I'm doing, so your help will be very gratefully appreciated!

For every Attendee No if the Tracker Activity = "Run", "Walk", "Jog", "Gym" regardless when it occurred the Activity Type should = "Physical"
SELECT
CASE
WHEN activity IN('Run','Walk','Jog','Gym') THEN 'Physical'
--WHEN .... other boolean test .. THEN ... other output value...
--ELSE .. catch all value...
END as ActivityType
FROM
...
The lines starting with -- are comments. I put them in to show you how to add more cases and how to add an ELSE. If you want to use them, uncomment them and modify them.
A CASE expression evaluates its WHEN tests in order and returns a single value: the one for the first test that is true. It cannot return multiple values.
CASE WHEN can also be written like:
CASE activity
WHEN 'Walk' THEN 'Physical'
WHEN 'Jog' THEN 'Physical'
...
--ELSE ...
In this format, you can only supply a single value after each WHEN, and it is compared using =. You can't write CASE column WHEN > 0 THEN ... or CASE column WHEN value AND othercolumn = blah THEN ...
If you want that kind of complexity, you have to use the CASE WHEN column > 0 AND othercolumn = blah THEN ... form.
If you get lots of variations, or you expect to be able to add more in the future without modifying the sql, create another table that has two columns:
Activity,ActivityType
------------
Run,Physical
Walk,Physical
Jog,Physical
Gym,Physical
Study,Mental
...
And join it in:
SELECT * FROM
table t
INNER JOIN activityTypes a ON a.Activity = t.Activity
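To try the lookup-table idea end to end, here is a minimal sketch using Python's built-in sqlite3 module. The table and column names (tracker, activity_types, attendee_no and so on) are made up for the illustration, and it uses a LEFT JOIN instead of the INNER JOIN above so that activities with no entry in the lookup table still show up, with a NULL type, rather than disappearing from the result.

```python
import sqlite3

# In-memory database with hypothetical table and column names.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tracker (attendee_no INTEGER, activity TEXT);
    CREATE TABLE activity_types (activity TEXT PRIMARY KEY, activity_type TEXT);
    INSERT INTO tracker VALUES
        (3623, 'Eat'), (3623, 'Run'), (3624, 'Walk'), (3624, 'TV');
    INSERT INTO activity_types VALUES
        ('Run', 'Physical'), ('Walk', 'Physical'), ('Jog', 'Physical'),
        ('Gym', 'Physical'), ('Study', 'Mental');
""")

# LEFT JOIN keeps tracker rows that have no matching lookup entry.
rows = conn.execute("""
    SELECT t.attendee_no, t.activity, a.activity_type
    FROM tracker t
    LEFT JOIN activity_types a ON a.activity = t.activity
    ORDER BY t.attendee_no, t.activity
""").fetchall()

for row in rows:
    print(row)
```

Adding a new activity then only needs an INSERT into the lookup table; the query itself stays the same.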

You can use CASE logic to calculate the Activity Type.
As a general best practice, when naming columns, avoid the things below. They will lead to issues one way or another.
Names with spaces
Keywords
Names with -
Data type names
SELECT [Attendee No], [Tracking Activity],[Tracking Date/Time],
CASE WHEN [Tracking Activity] IN ( 'Run', 'Walk', 'Jog', 'Gym') THEN 'Physical'
ELSE 'NOT Physical' END AS [Activity Type]
FROM Tracker

I started off with the CASE statement, but I'm getting blanks when the Tracking Activity is not in the list. I need ALL the Activity Type values to say "Physical" if the attendee has 'Run', 'Walk', 'Jog' or 'Gym' at any time under that Attendee_No.
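One way to get that per-attendee behaviour is a correlated EXISTS subquery: each row asks whether any row for the same attendee has an activity in the list. The sketch below runs it against SQLite through Python's sqlite3 module, so treat it as an illustration rather than a tested SQL Server answer. The underscore column names and attendee 3625 (who has no physical activity) are assumptions added for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Tracker (Attendee_No INTEGER, Tracking_Activity TEXT);
    INSERT INTO Tracker VALUES
        (3623, 'Eat'), (3623, 'Music'), (3623, 'Run'), (3623, 'Booked in'),
        (3625, 'Sleep'), (3625, 'TV');  -- 3625 is a made-up attendee
""")

# Every row of an attendee is 'Physical' if ANY row of that attendee
# has an activity in the list, regardless of when it occurred.
rows = conn.execute("""
    SELECT t.Attendee_No, t.Tracking_Activity,
           CASE WHEN EXISTS (
                    SELECT 1
                    FROM Tracker p
                    WHERE p.Attendee_No = t.Attendee_No
                      AND p.Tracking_Activity IN ('Run', 'Walk', 'Jog', 'Gym')
                )
                THEN 'Physical'
                ELSE 'Not Physical'
           END AS Activity_Type
    FROM Tracker t
    ORDER BY t.Attendee_No, t.Tracking_Activity
""").fetchall()

for row in rows:
    print(row)
```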

You can use SQL CASE syntax to achieve it.
See SQL CASE on W3SCHOOLS
Modify your SQL to something like the following and it should do the trick.
SQL
SELECT Attendee_No, Tracking_date,
CASE
WHEN Tracking_activity = 'Run' THEN 'Physical'
WHEN Tracking_activity = 'Walk' THEN 'Physical'
WHEN Tracking_activity = 'Jog' THEN 'Physical'
WHEN Tracking_activity = 'Gym' THEN 'Physical'
ELSE 'Other'
END AS Activity_type
FROM Tracker

Issue displaying empty value of repeated columns in Google Data Studio

I've got an issue when trying to visualize some information from a denormalized table in Google Data Studio.
Context: I want to gather all the contacts of a company and their related orders in a table in BigQuery. Contacts can have no orders or multiple orders. Following BigQuery best practice, this table is denormalized and all the orders for a client are in arrays of structs. It looks like this:
Fields Examples:
+-------+------------+-------------+-----------+
| Row # | Contact_Id | Orders.date | Orders.id |
+-------+------------+-------------+-----------+
|- 1 | 23 | 2019-02-05 | CB1 |
| | | 2020-03-02 | CB293 |
|- 2 | 2321 | - | - |
|- 3 | 77 | 2010-09-03 | AX3 |
+-------+------------+-------------+-----------+
The issue is when I want to use this table as a data source in Data Studio.
For instance, if I build a table with Contact_Id as a dimension, everything is fine and I can see all my contacts. However, if I add any dimension from the Orders struct, all info from contacts with no orders is no longer displayed. For instance, all info for Contact_Id 2321 is removed from the table.
Have you found any workaround to visualize these empty arrays (for instance as null values)?
The only solution I've found is to build an intermediary table with the orders unnested.
The way I've just discovered to work around this is to add an extra field in my DS-> BQ connector:
ARRAY_LENGTH(fields.orders) AS numberoforders
This will return zero if the array is empty - you can then create calculated fields within DataStudio - using the "numberoforders" field to force values to NULL or zero.
You can fix this behaviour by slightly changing your query in the BigQuery connector.
Instead of doing this:
SELECT
Contact_id,
Orders
FROM myproject.mydataset.mytable
try this:
SELECT
Contact_id,
IF(ARRAY_LENGTH(Orders) > 0, Orders, [STRUCT(CAST(NULL AS DATE) AS date, CAST(NULL AS STRING) AS id)]) AS Orders
FROM myproject.mydataset.mytable
This way you are forcing your repeated field to contain at least one element, a struct of NULL values, and hence Data Studio will display those missing values.
Also, if you want to create new calculated fields using one of the nested fields, you should first check whether the value is NULL, to avoid filling in all the NULL values. For example, if you have a repeated and nested field which can be 1 or 0, and you want to create a calculated field swapping the value, you should do:
IF(myfield.key IS NOT NULL, IF(myfield.key = 1, 0, 1), NULL)
Here you can see what happens if you check before swapping and if you don't:
Original value | No check | Check
---------------+----------+------
1              | 0        | 0
0              | 1        | 1
NULL           | 1        | NULL
1              | 0        | 0
NULL           | 1        | NULL
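The same effect can be reproduced outside Data Studio. The short Python sketch below is only an analogy: None stands in for NULL, and the function names are made up. Like the unchecked IF, the unchecked function sends a missing value to the else branch because the comparison with 1 is not true.

```python
from typing import Optional


def swap_unchecked(key: Optional[int]) -> int:
    # Mirrors IF(key = 1, 0, 1): a missing value is not equal to 1,
    # so it falls through to the else branch and becomes 1.
    return 0 if key == 1 else 1


def swap_checked(key: Optional[int]) -> Optional[int]:
    # Mirrors IF(key IS NOT NULL, IF(key = 1, 0, 1), NULL).
    return None if key is None else swap_unchecked(key)


values = [1, 0, None, 1, None]
unchecked = [swap_unchecked(v) for v in values]
checked = [swap_checked(v) for v in values]

print(unchecked)
print(checked)
```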

Header will depend on the change of the operation name

I have one table containing banking operations and another table containing the amounts of the operations.
Operation Id | name operation
-------------+----------------
0 | transfer
1 | registration
2 | BLIK
Operation Id | amount
-------------+--------
0 | 15,000
1 | 53,000
2 | 200
etc.
I was supposed to write a query that shows the names of the operations as columns, together with the amounts. I wrote something like this:
select
case id_operacji
when 0 then amount
end as 'transfer',
case id_operacji
when 1 then amount
end as 'registration',
case id_operacji
when 2 then amount
end as 'BLIK'
from ...
In response to the above solution, I was told that the main problem is that the header should depend on the operation name, so that it changes when the name changes. Could someone help me with how to do that?
As far as I can tell, you are looking for JOIN between the two tables:
select a.amount, o.name_operation
from operations o
join amounts a on o.operation_id = a.operation_id;
I had to guess the table and column names as you did not disclose the real table structures.
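If the operation names really do have to appear as column headers, the column list has to be built from the data, because a plain SQL statement has a fixed set of output columns. A minimal sketch of that idea, using Python's sqlite3 module and the same guessed table and column names as the query above (operations, amounts, operation_id, name_operation), could look like this. The quote_identifier helper is hypothetical and only does basic double-quote escaping.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE operations (operation_id INTEGER PRIMARY KEY, name_operation TEXT);
    CREATE TABLE amounts (operation_id INTEGER, amount INTEGER);
    INSERT INTO operations VALUES (0, 'transfer'), (1, 'registration'), (2, 'BLIK');
    INSERT INTO amounts VALUES (0, 15000), (1, 53000), (2, 200);
""")


def quote_identifier(name: str) -> str:
    # Hypothetical helper: quote a name so it can be used as a column alias.
    return '"' + name.replace('"', '""') + '"'


# Step 1: read the current operation names.
ops = conn.execute(
    "SELECT operation_id, name_operation FROM operations ORDER BY operation_id"
).fetchall()

# Step 2: build one CASE column per operation, aliased with its name.
columns = ", ".join(
    f"SUM(CASE WHEN operation_id = {int(op_id)} THEN amount END) AS {quote_identifier(name)}"
    for op_id, name in ops
)

# Step 3: run the generated pivot query.
cursor = conn.execute(f"SELECT {columns} FROM amounts")
headers = [col[0] for col in cursor.description]
row = cursor.fetchone()

print(headers)
print(row)
```

Renaming an operation in the operations table then changes the header the next time the query is generated, without editing the SQL by hand.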

SQL - Multiple select filter: Combine filter conditions to get proper results

I'm working on a filter where the user can choose different conditions for the end output. Right now I'm doing the construction of the SQL query, but whenever more conditions are selected, it doesn't work.
Example of the advalues table.
+----+-----------+---------------+------------+
| id | listingId | value | identifier |
+----+-----------+---------------+------------+
| 1 | 1a | Alaskan Husky | race |
+----+-----------+---------------+------------+
| 2 | 1a | Højt | activity |
+----+-----------+---------------+------------+
| 3 | 1c | Akita | race |
+----+-----------+---------------+------------+
| 4 | 1c | Mellem | activity |
+----+-----------+---------------+------------+
As you can see, there's a different row for each advalue.
The outcome I expect
Let's say the user has checked/ticked the checkbox for the race where it says "Alaskan Husky", then it should return the listingId for the match (once). If the user has selected both "Alaskan Husky" and activity level to "Low" then it should return nothing, if the activity level is either "Mellem" or "Højt" (medium, high), then it should return the listingId for where the race is "Alaskan Husky" only, not "Akita". I hope you understand what I'm trying to accomplish.
I tried something like this, which returns nothing.
SELECT * FROM advalues WHERE (identifier="activity" AND value IN("Mellem","Højt")) AND (identifier="race" AND value IN("Alaskan Husky"))
By the way, I want to select distinct listingId as well, so it only returns unique listingId's.
I will continue to search around for solutions, which I've been doing for the past few hours, but wanted to post here too, since I haven't been able to find anything that helped me yet. Thanks!
You can split the restrictions on identifier across two copies of the table, one for each type. Then you join on listingId to obtain the listingIds which have both types of identifier.
SELECT ad.listingId
FROM advalues ad
JOIN advalues ad2
ON ad.listingId = ad2.listingId
WHERE ( ad.identifier = 'activity' AND ad.value IN( 'Mellem', 'Højt' ) )
AND ( ad2.identifier = 'race' AND ad2.value IN( 'Alaskan Husky' ) )
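To check the self-join against the sample rows, here is a small sketch using Python's sqlite3 module. The matching_listings function is a made-up wrapper, DISTINCT is added because the question asks for unique listingIds, and the selected values are passed as bound parameters rather than pasted into the SQL string.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE advalues (id INTEGER PRIMARY KEY, listingId TEXT, value TEXT, identifier TEXT);
    INSERT INTO advalues VALUES
        (1, '1a', 'Alaskan Husky', 'race'),
        (2, '1a', 'Højt', 'activity'),
        (3, '1c', 'Akita', 'race'),
        (4, '1c', 'Mellem', 'activity');
""")


def matching_listings(races, activities):
    # One "?" placeholder per selected value; the values are bound, not interpolated.
    race_marks = ", ".join("?" for _ in races)
    activity_marks = ", ".join("?" for _ in activities)
    sql = f"""
        SELECT DISTINCT ad.listingId
        FROM advalues ad
        JOIN advalues ad2 ON ad.listingId = ad2.listingId
        WHERE ad.identifier = 'activity' AND ad.value IN ({activity_marks})
          AND ad2.identifier = 'race' AND ad2.value IN ({race_marks})
        ORDER BY ad.listingId
    """
    # Parameter order follows placeholder order: activities first, then races.
    return [row[0] for row in conn.execute(sql, [*activities, *races])]


high_or_medium = matching_listings(["Alaskan Husky"], ["Mellem", "Højt"])
low = matching_listings(["Alaskan Husky"], ["Low"])

print(high_or_medium)
print(low)
```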
The question isn't exactly clear, but I think you want this:
WHERE (identifier="activity" AND value IN("Mellem","Højt")) OR (identifier="race" AND value IN("Alaskan Husky"))
If I got you right you are trying to fetch data with different "filters".
Your Query
SELECT listingId FROM advalues
WHERE identifier="activity"
AND value IN("Mellem","Højt")
AND identifier="race"
AND value IN("Alaskan Husky")
Will always return 0 results as you are asking for identifier = "activity" AND identifier = "race"
I think you wanted to do something like this instead:
SELECT listingId FROM advalues
WHERE
(identifier="activity" AND value IN("Mellem","Højt"))
OR
(identifier="race" AND value IN("Alaskan Husky"))
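Note that the OR filter on its own matches a listing as soon as either condition holds, so on the sample data it also returns 1c (an Akita with activity "Mellem"). One way to require both conditions, sketched here with Python's sqlite3 module, is to keep the OR filter and then group by listingId, keeping only the groups that matched both identifiers. The HAVING step is an addition for illustration, not part of the answer above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE advalues (id INTEGER PRIMARY KEY, listingId TEXT, value TEXT, identifier TEXT);
    INSERT INTO advalues VALUES
        (1, '1a', 'Alaskan Husky', 'race'),
        (2, '1a', 'Højt', 'activity'),
        (3, '1c', 'Akita', 'race'),
        (4, '1c', 'Mellem', 'activity');
""")

where_clause = """
    (identifier = 'activity' AND value IN ('Mellem', 'Højt'))
    OR
    (identifier = 'race' AND value IN ('Alaskan Husky'))
"""

# OR alone: any listing that satisfies at least one of the filters.
either = [r[0] for r in conn.execute(
    f"SELECT DISTINCT listingId FROM advalues WHERE {where_clause} ORDER BY listingId"
)]

# OR plus HAVING: only listings that matched both identifiers (2 active filters).
both = [r[0] for r in conn.execute(f"""
    SELECT listingId
    FROM advalues
    WHERE {where_clause}
    GROUP BY listingId
    HAVING COUNT(DISTINCT identifier) = 2
    ORDER BY listingId
""")]

print(either)
print(both)
```

The number compared in HAVING would have to equal the number of identifier filters the user has ticked.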

Single record buffering in SAP ABAP

My table is stud.
+-----+------+-------+
| no | name | grade |
+-----+------+-------+
| 101 | naga | A |
| 102 | raj | A |
| 103 | john | A |
+-----+------+-------+
The query I'm using is:
SELECT * FROM stud WHERE no = 101 AND grade = 'A'.
If I am using single record buffering, how much data is stored in the buffer area?
This query doesn't do anything. There is no INTO clause, meaning it won't store anything it selects.
You are probably looking to do something like this....
SELECT * FROM stud into wa_stud WHERE no = 101 AND grade = 'A'.
"processing of each single row is performed here
endselect.
or perhaps something like this, where only one row (the first row, ordered by primary key) is selected...
select single * from stud into wa_stud where no = 101 and grade = 'A' .
or perhaps you want everything brought into an internal table, in case no and grade do not make up the full primary key.
select * from stud into table it_stud where no = 101 and grade = 'A'.
this is from ABAP Keyword documentation in SE38:
SAP Buffer - Single Record Buffering
Only those rows in the table are buffered that are actually accessed.
This requires less space in the buffer than when using generic or full
buffering. On the other hand, more administration work is required and
significantly more direct database accesses.
So, since your query returns a single record (based on the data you displayed), it should fetch just one row and hold it in the buffer.
I'd suggest looking at SAP Help and Google. Also have a look at SELECT SINGLE with incompletely specified keys: there used to be a problem with the buffer being bypassed in some situations, so it is worth a read for reference.
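The documentation quote ("only those rows in the table are buffered that are actually accessed") can be pictured with a toy model. The Python sketch below is not how the SAP buffer is implemented; it is just a cache keyed by primary key, with the class name, the db_reads counter and the assumption that no is the full primary key all invented for the illustration.

```python
from typing import Dict, Optional, Tuple

Row = Tuple[int, str, str]  # (no, name, grade)

# Stand-in for the database table stud from the question.
STUD: Dict[int, Row] = {
    101: (101, "naga", "A"),
    102: (102, "raj", "A"),
    103: (103, "john", "A"),
}


class SingleRecordBuffer:
    """Toy model: rows enter the buffer only when they are accessed by key."""

    def __init__(self, table: Dict[int, Row]) -> None:
        self._table = table
        self.rows: Dict[int, Optional[Row]] = {}
        self.db_reads = 0

    def select_single(self, no: int) -> Optional[Row]:
        if no not in self.rows:
            # Buffer miss: go to the "database" and remember the result.
            self.db_reads += 1
            self.rows[no] = self._table.get(no)
        return self.rows[no]


buffer = SingleRecordBuffer(STUD)
first = buffer.select_single(101)
second = buffer.select_single(101)  # served from the buffer

print(first, len(buffer.rows), buffer.db_reads)
```

In this model, reading row 101 twice leaves exactly one row in the buffer and costs one database read; rows 102 and 103 are never loaded.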

Items getting double-counted in SQL Server, dependent counting logic not working right

I am counting the number of RFIs (requests for info) from various agencies. Some of these agencies are also part of a task force (committee). Currently this SQL combines the agencies and task forces into one list and counts the RFIs for each. The problem is, if the RFI belongs to a task force (which is also assigned to an agency), I only want it to count for the task force and not for the agency. However, if the agency does not have a task force assigned to the RFI, I want it to still count for the agency. The RFIs are linked to various agencies through a _LinkEnd table, but that logic works just fine. Here is the logic thus far:
SELECT t.Submitting_Agency, COUNT(DISTINCT t.Count) AS RFICount
FROM (
SELECT RFI_.Submitting_Agency, RFI_.Unique_ID, _LinkEnd.EntityType_ID1, _LinkEnd.Link_ID as Count
FROM RFI_
JOIN _LinkEnd ON RFI_.Unique_ID=_LinkEnd.Entity_ID1
WHERE _LinkEnd.Link_ID LIKE 'CAS%' AND RFI_.Date_Submitted BETWEEN '20110430' AND '20110630'
UNION ALL
SELECT RFI_.Task_Force__Initiative AS Submitting_Agency, RFI_.Unique_ID, _LinkEnd.EntityType_ID1, _LinkEnd.Link_ID as Count
FROM RFI_
JOIN _LinkEnd ON RFI_.Unique_ID=_LinkEnd.Entity_ID1
WHERE _LinkEnd.Link_ID LIKE 'CAS%' AND RFI_.Date_Submitted BETWEEN '20110430' AND '20110630' AND RFI_.Task_Force__Initiative IS NOT NULL) t
GROUP BY t.Submitting_Agency
How can I get it to only count an RFI one time, even though the two fields are combined? For instance, here are sample records from the RFI_ table:
---------------------------------------------------------------------------
| Unique_ID | Submitting_Agency | Task_Force__Initiative | Date_Submitted |
---------------------------------------------------------------------------
| 1 | Social Service | Flood Relief TF | 2011-05-08 |
---------------------------------------------------------------------------
| 2 | Faith-Based Init. | Homeless Shelter Min. | 2011-06-08 |
---------------------------------------------------------------------------
| 3 | Psychology Group | | 2011-05-04 |
---------------------------------------------------------------------------
| 4 | Attorneys at Law | | 2011-05-05 |
---------------------------------------------------------------------------
| 5 | Social Service | | 2011-05-10 |
---------------------------------------------------------------------------
So assuming only one link existed to one RFI for each of these, the count should be as follows:
Social Service 1
Faith-Based Init. 0
Psychology Group 1
Attorneys at Law 1
Flood Relief TF 1
Homeless Shelter Min. 1
Note that if both an agency and a task force are in one record, then the task force gets the count, not the agency. But it is possible for the agency to have a record without a task force, in which case the agency gets the count. How could I get this to work in this fashion so that RFIs are not double-counted? As it stands both the agency and the task force get counted, which I do not want to happen. The task force always gets the count, unless that field is blank, then the agency gets it.
I guess a simple COALESCE() would do the trick?
SELECT COALESCE(Task_Force__Initiative, Submitting_Agency) AS Submitting_Agency, COUNT(DISTINCT _LinkEnd.Link_ID) AS RFICount
FROM RFI_
JOIN _LinkEnd ON RFI_.Unique_ID=_LinkEnd.Entity_ID1
WHERE _LinkEnd.Link_ID LIKE 'CAS%' AND RFI_.Date_Submitted BETWEEN '20110430' AND '20110630'
GROUP BY COALESCE(Task_Force__Initiative, Submitting_Agency);
Rather than:
SELECT t.Submitting_Agency ...
Try
SELECT
CASE
WHEN t.[Task_Force__Initiative] IS NULL THEN -- Or whatever value constitutes "empty"; note that CASE x WHEN NULL never matches
t.[Submitting_Agency]
ELSE
t.[Task_Force__Initiative]
END ...
and then GROUP BY the same.
http://msdn.microsoft.com/en-us/library/ms181765.aspx
The result will be that your count will aggregate from the proper specified grouping point, rather than from the single agency column.
EDIT: From your example it looks like you don't use NULL for the empty field but maybe a blank string? In that case you'll want to replace the NULL in the CASE above with the proper "blank" value. If it is NULL then you can COALESCE as suggested in the other answer.
EDIT: Based on what I think your schema is... and your WHERE criteria
SELECT
COALESCE(RFI_.[Task_Force__Initiative], RFI_.[Submitting_Agency]),
COUNT(*)
FROM
RFI_
JOIN _LinkEnd
ON RFI_.[Unique_ID]=_LinkEnd.[Entity_ID1]
WHERE
_LinkEnd.[Link_ID] LIKE 'CAS%'
AND RFI_.[Date_Submitted] BETWEEN '20110430' AND '20110630'
GROUP BY
COALESCE(RFI_.[Task_Force__Initiative], RFI_.[Submitting_Agency])
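Here is a small sketch that runs the COALESCE grouping against the sample records, using Python's sqlite3 module rather than SQL Server. Several things are assumptions for the example: the _LinkEnd rows (one made-up CAS link per RFI), the ISO-formatted date bounds (SQLite compares the dates as text), and NULLIF(..., ''), which is added so that a blank string is treated the same as NULL. Record 4 is stored with a blank task force on purpose to show that case.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE RFI_ (
        Unique_ID INTEGER PRIMARY KEY,
        Submitting_Agency TEXT,
        Task_Force__Initiative TEXT,
        Date_Submitted TEXT
    );
    CREATE TABLE _LinkEnd (Link_ID TEXT, Entity_ID1 INTEGER);
    INSERT INTO RFI_ VALUES
        (1, 'Social Service',    'Flood Relief TF',       '2011-05-08'),
        (2, 'Faith-Based Init.', 'Homeless Shelter Min.', '2011-06-08'),
        (3, 'Psychology Group',  NULL,                    '2011-05-04'),
        (4, 'Attorneys at Law',  '',                      '2011-05-05'),
        (5, 'Social Service',    NULL,                    '2011-05-10');
    -- Hypothetical links: exactly one per RFI.
    INSERT INTO _LinkEnd VALUES
        ('CAS1', 1), ('CAS2', 2), ('CAS3', 3), ('CAS4', 4), ('CAS5', 5);
""")

# The task force gets the count when present; otherwise the agency does.
counts = dict(conn.execute("""
    SELECT COALESCE(NULLIF(r.Task_Force__Initiative, ''), r.Submitting_Agency),
           COUNT(DISTINCT l.Link_ID)
    FROM RFI_ r
    JOIN _LinkEnd l ON r.Unique_ID = l.Entity_ID1
    WHERE l.Link_ID LIKE 'CAS%'
      AND r.Date_Submitted BETWEEN '2011-04-30' AND '2011-06-30'
    GROUP BY COALESCE(NULLIF(r.Task_Force__Initiative, ''), r.Submitting_Agency)
""").fetchall())

for name, rfi_count in sorted(counts.items()):
    print(name, rfi_count)
```

An agency whose RFIs all went to a task force simply does not appear in the result, instead of being listed with a count of 0.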