SAS: group rows into different datasets by condition - sql

I need to create 7 datasets (local, web, call, local&call, local&web, call&web, all) depending on if the customer has used a channel from the below sample data.
| customer | call | local | web |
|----------|------|-------|-----|
| 1 | 1 | 1 | 1 |
| 1 | | 1 | 1 |
| 1 | | 1 | |
| 2 | 1 | | 1 |
| 2 | | 1 | |
| 2 | 1 | | |
| 3 | | | 1 |
| 3 | 1 | 1 | |
Please see this picture for more details on the sample table.
So if a customer has used all three channels in one instance and only some of them in another instance, then that customer (e.g. Customer=1) should go to the 'all' dataset. Similarly for 3, if he has used local and web in one instance and just web in another instance, then he should go to the local&web dataset.
Customer IDs should not be duplicated across datasets, i.e. customer 1 can belong to only one of the datasets.
I am stuck with this. Can anyone give me a snippet of either SAS or SQL code to proceed further?
Thanks !

If all three go to "all", then use aggregation:
select customer,
(case when max(call) > 0 and max(local) > 0 and max(web) > 0 then 'all'
else concat_ws('&', (case when max(call) > 0 then 'call' end),
(case when max(local) > 0 then 'local' end),
(case when max(web) > 0 then 'web' end)
)
end) as grp
from t
group by customer;
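
For anyone who wants to try the aggregation idea without a MySQL/Postgres server, here is a minimal sketch using Python's built-in sqlite3 module. The table name t follows the answer; customer 4 is a hypothetical extra customer added only to show a non-'all' group, and the label is assembled in Python because concat_ws is not available in older SQLite versions. Note that with the question's sample rows, customers 1, 2 and 3 have each used all three channels at some point, so all of them land in 'all'.

```python
import sqlite3

# (customer, call, local, web); None means the channel was not used in that row.
SAMPLE_ROWS = [
    (1, 1, 1, 1), (1, None, 1, 1), (1, None, 1, None),
    (2, 1, None, 1), (2, None, 1, None), (2, 1, None, None),
    (3, None, None, 1), (3, 1, 1, None),
    # Hypothetical customer, not in the question: only local and web.
    (4, None, 1, 1), (4, None, None, 1),
]


def channel_groups(rows):
    """Return {customer: group label}, exactly one label per customer."""
    con = sqlite3.connect(":memory:")
    con.execute(
        'CREATE TABLE t (customer INTEGER, "call" INTEGER, "local" INTEGER, web INTEGER)'
    )
    con.executemany("INSERT INTO t VALUES (?, ?, ?, ?)", rows)
    # MAX() skips NULLs, so each flag is 1 if the customer ever used the channel.
    flags = con.execute(
        'SELECT customer, MAX("call"), MAX("local"), MAX(web) '
        "FROM t GROUP BY customer"
    ).fetchall()
    con.close()

    groups = {}
    for customer, call, local, web in flags:
        used = [
            name
            for name, flag in (("call", call), ("local", local), ("web", web))
            if flag
        ]
        groups[customer] = "all" if len(used) == 3 else "&".join(used)
    return groups


if __name__ == "__main__":
    for customer, label in sorted(channel_groups(SAMPLE_ROWS).items()):
        print(customer, label)
```

Once every customer has a single label, splitting into the 7 datasets is just a filter on that label, so no customer can end up in two datasets.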

Related

SQL Pivot on Conditional Count

I have a table of vulnerabilities using SQL server, when I perform the following query
select * from table
The output looks like so.
| Name | HostName | Week |
| ------------- |------------| -------|
| java | Hosta | 1 |
| java | Hostb | 1 |
| java | Hostb | 2 |
| Ansible | Hosta | 1 |
| Ansible | Hosta | 2 |
| Ansible | Hosta | 3 |
| Ansible | Hostb | 3 |
My aim is to generate an output that pivots the weeks into columns, with the values being a count of hosts for a given vulnerability in that week.
| Vulnerability | Week 1 | Week 2 | Week 3 |
| ------------- |--------| -------| -------|
| java | 2 | 1 | 0 |
| Ansible | 1 | 1 | 2 |
My initial attempt was to do
select * from table
PIVOT(
count(HostName)
For week in ([1],[2],[3])
) AS OUT
The output had the correct layout but incorrect data, as if it was only counting the first occurrence.
Is an amendment to the count term required or is my approach the wrong one?
Conditional aggregation is simpler:
select name as vulnerability,
sum(case when week = 1 then 1 else 0 end) as week_1,
sum(case when week = 2 then 1 else 0 end) as week_2,
sum(case when week = 3 then 1 else 0 end) as week_3
from t
group by name;
Not only is pivot bespoke syntax, but it is also sensitive to which columns are in the table. Extra columns are interpreted as "group by" criteria, affecting the results of the query.
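
To check the conditional aggregation against the sample data in the question, here is a small sketch that runs it in an in-memory SQLite database from Python. The table name vulns and the function name are assumptions made for this example; the question uses SQL Server, but the SUM(CASE ...) pattern is the same.

```python
import sqlite3

# (name, hostname, week) rows copied from the question.
VULN_ROWS = [
    ("java", "Hosta", 1), ("java", "Hostb", 1), ("java", "Hostb", 2),
    ("Ansible", "Hosta", 1), ("Ansible", "Hosta", 2),
    ("Ansible", "Hosta", 3), ("Ansible", "Hostb", 3),
]


def hosts_per_week(rows):
    """Return one (name, week_1, week_2, week_3) tuple per vulnerability."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE vulns (name TEXT, hostname TEXT, week INTEGER)")
    con.executemany("INSERT INTO vulns VALUES (?, ?, ?)", rows)
    # Each SUM adds 1 for rows in its week and 0 otherwise.
    result = con.execute(
        """
        SELECT name,
               SUM(CASE WHEN week = 1 THEN 1 ELSE 0 END) AS week_1,
               SUM(CASE WHEN week = 2 THEN 1 ELSE 0 END) AS week_2,
               SUM(CASE WHEN week = 3 THEN 1 ELSE 0 END) AS week_3
        FROM vulns
        GROUP BY name
        ORDER BY name
        """
    ).fetchall()
    con.close()
    return result


if __name__ == "__main__":
    for row in hosts_per_week(VULN_ROWS):
        print(row)
```

Because only name appears in the GROUP BY, the hostname column cannot silently become an extra grouping key, which is the likely cause of the odd PIVOT output.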

SQL combine multiple records of one table into multiple columns of another table

I am working in SQL and have a destination table:
Event ID (key) | Road | total count | motorcycles | cars | trucks | bus
And I have a record-like table:
Event ID | mode of transport | count
1 | bus | 3
1 | cars | 20
1 | trucks | 2
1 | motorcycles | 5
2 | bus | 1
2 | cars | 12
2 | motorcycles | 1
(the combination of Event ID and mode of transport is unique)
How do I combine the data from the second table into the first easily as result:
Event ID (key) | Road | total count | motorcycles | cars | trucks | bus
1 | ... | ... | 5 | 20 | 2 | 3
2 | ... | ... | 1 | 12 | | 1
I am looking for a way that can incorporate the record data from the second table in one SQL structure / statement. Thank you!
You can use conditional aggregation:
select eventid, road,
sum(count) as total_count,
sum(case when mode = 'motorcycles' then count end) as cnt_motorcycles,
sum(case when mode = 'cars' then count end) as cnt_car,
sum(case when mode = 'bus' then count end) as cnt_bus,
sum(case when mode = 'trucks' then count end) as cnt_truck
from t
group by eventid, road;
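
Here is a rough, runnable sketch of the same conditional aggregation using Python's sqlite3 module and the sample records from the question. The table name records and the column names event_id, mode and cnt are assumptions for this example, and road is left out because the record table in the question does not contain it.

```python
import sqlite3

# (event_id, mode, cnt) rows copied from the question's record table.
EVENT_ROWS = [
    (1, "bus", 3), (1, "cars", 20), (1, "trucks", 2), (1, "motorcycles", 5),
    (2, "bus", 1), (2, "cars", 12), (2, "motorcycles", 1),
]


def pivot_events(rows):
    """Return (event_id, total, motorcycles, cars, trucks, bus) per event."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE records (event_id INTEGER, mode TEXT, cnt INTEGER)")
    con.executemany("INSERT INTO records VALUES (?, ?, ?)", rows)
    # A CASE without ELSE yields NULL, so a mode with no rows sums to NULL (None).
    result = con.execute(
        """
        SELECT event_id,
               SUM(cnt) AS total_count,
               SUM(CASE WHEN mode = 'motorcycles' THEN cnt END) AS motorcycles,
               SUM(CASE WHEN mode = 'cars' THEN cnt END) AS cars,
               SUM(CASE WHEN mode = 'trucks' THEN cnt END) AS trucks,
               SUM(CASE WHEN mode = 'bus' THEN cnt END) AS bus
        FROM records
        GROUP BY event_id
        ORDER BY event_id
        """
    ).fetchall()
    con.close()
    return result


if __name__ == "__main__":
    for row in pivot_events(EVENT_ROWS):
        print(row)
```

The empty trucks cell for event 2 in the desired output corresponds to the NULL that SUM returns when no row matches; add ELSE 0 if a zero is preferred.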

What Clause would most optimally create this query?

So I don't have much experience with SQL, and am trying to learn. An interview question I came across had this question. I'm trying to learn more SQL but maybe I'm missing a piece of info to solve this? Or maybe I'm approaching the problem wrong.
This is the question:
We have the following two tables; below is their info:
POLICY (id as int, policy_content as varchar2)
POLICY_VOTES (vote as boolean, policy_id as int)
Write a single query that returns the policy_id, number of yes(true) votes and number of no(false) votes with a row for each policy up for a vote stored.
My first thought when approaching this was to use a WITH clause to get the policy_ids and use an inner join to get the votes for yes and no but I can't find a way to make it work, which is what leads me to believe that there's another clause in SQL I'm not aware of or couldn't find that would make it easier. Either that or I'm thinking of the problem in the wrong way.
Good question.
I cannot answer too specifically, since you did not specify a DBMS, but what you will want to do is count or situationally sum based on criteria. When you use an aggregate function like that, you also need GROUP BY.
Here are two example tables I made with test data:
policy
| id | policy_content |
|----|----------------|
| 1 | foo |
| 2 | foo |
| 3 | foo |
| 4 | foo |
| 5 | foo |
policy votes
| vote | policy_id |
|------|-----------|
| yes | 1 |
| no | 1 |
| yes | 2 |
| yes | 2 |
| no | 3 |
| no | 3 |
| no | 4 |
| yes | 4 |
| yes | 5 |
| yes | 5 |
Using the below query:
SELECT
policy_votes.policy_id,
SUM(CASE WHEN vote = 'yes' THEN 1 ELSE 0 END) AS yes_votes,
SUM(CASE WHEN vote = 'no' THEN 1 ELSE 0 END) AS no_votes
FROM
policy_votes
GROUP BY
policy_votes.policy_id
You get:
| POLICY_ID | YES_VOTES | NO_VOTES |
|-----------|-----------|----------|
| 1 | 1 | 1 |
| 2 | 2 | 0 |
| 4 | 1 | 1 |
| 5 | 2 | 0 |
| 3 | 0 | 2 |
Here is an SQL Fiddle for you to try it out.
Try this:
select p.id, p.policy_content,
Count(case when pv.vote='true' then 1 end) as number_of_yes,
Count(case when pv.vote='false' then 1 end) as number_of_no
From policy p join policy_votes pv
On(p.id = pv.policy_id)
Group by p.id, p.policy_content
Cheers!!
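
One thing worth checking with either answer is what happens to a policy that has no votes yet. Below is a minimal sketch in Python with sqlite3 that uses a LEFT JOIN so such a policy still gets a row with zero counts. Storing vote as 0/1 is an assumption (SQLite has no boolean type), and policy 6 is a hypothetical policy without votes that is not part of the test data above.

```python
import sqlite3

# Policies 1-5 mirror the test data above; 6 is hypothetical and has no votes.
POLICIES = [(1, "foo"), (2, "foo"), (3, "foo"), (4, "foo"), (5, "foo"), (6, "foo")]
# (vote, policy_id) with 1 = yes and 0 = no.
VOTES = [
    (1, 1), (0, 1), (1, 2), (1, 2), (0, 3),
    (0, 3), (0, 4), (1, 4), (1, 5), (1, 5),
]


def vote_counts(policies, votes):
    """Return (policy_id, yes_votes, no_votes) for every policy."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE policy (id INTEGER, policy_content TEXT)")
    con.execute("CREATE TABLE policy_votes (vote INTEGER, policy_id INTEGER)")
    con.executemany("INSERT INTO policy VALUES (?, ?)", policies)
    con.executemany("INSERT INTO policy_votes VALUES (?, ?)", votes)
    # COUNT ignores NULLs, so each CASE counts only the matching votes;
    # the LEFT JOIN keeps policies that have no rows in policy_votes.
    result = con.execute(
        """
        SELECT p.id,
               COUNT(CASE WHEN pv.vote = 1 THEN 1 END) AS yes_votes,
               COUNT(CASE WHEN pv.vote = 0 THEN 1 END) AS no_votes
        FROM policy p
        LEFT JOIN policy_votes pv ON pv.policy_id = p.id
        GROUP BY p.id
        ORDER BY p.id
        """
    ).fetchall()
    con.close()
    return result


if __name__ == "__main__":
    for row in vote_counts(POLICIES, VOTES):
        print(row)
```

With an inner join, or when grouping policy_votes alone, policy 6 would simply be missing from the result.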

Count function with multiple conditions

I'm trying to do an overall count function on a set of data with multiple conditions but am having trouble with it. I'm a beginner and tried using a simple count function but am having no luck. I looked into using case when but am having trouble with it. Does anyone know how I should go about this code?
Here is an example of my table:
Name | Date | Status | Candy | Soda | Water
Nancy | 10/19/16 | active | 2 | 0 | 1
Lindsy| 10/20/15 | active | 0 | 1 | 0
Erica | 10/20/13 | active | 0 | 2 | 3
Lane | 10/19/14 | active | 0 | 0 | 4
Alexa | 10/19/16 | notactive | 0 | 5 | 1
Jenn | 10/19/16 | active | 0 | 0 | 0
I'm looking to do an overall count of the names under the conditions that: either candy, soda, or water are anything other than zero(doesn't matter what column or how many, just if one of those three are not zero), the account is active and also when the date falls within the last two years, 10/2014 - 10/2016.
I would want the query to tell me that the count total was 3 and also show me:
Name | Date | Status | Candy | Soda | Water
Nancy | 10/19/16 | active | 2 | 0 | 1
Lindsy| 10/20/15 | active | 0 | 1 | 0
Lane | 10/19/14 | active | 0 | 0 | 4
These are two different questions. The basic idea to get the rows is:
select t.*
from t
where greatest(candy, soda, water) > 0 and
status = 'active' and
date >= curdate() - interval 2 year;
(In Oracle, you could use sysdate rather than curdate().)
To get the count, you would use count(*) rather than * in the select. SQL queries only return one result set . . . so you either get all the rows or a single count.
SELECT *
FROM yourTable
WHERE (Candy > 0 OR Soda > 0 OR Water > 0) AND
Status = 'active' AND
Date BETWEEN '2014-10-01' AND SYSDATE
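
To see both the filtered rows and the count side by side, here is a small sketch in Python with sqlite3 using the sample table from the question. The table name accounts, the column name visit_date, and the fixed window of 2014-10-01 to 2016-10-31 are assumptions chosen so the example gives the same result no matter when it is run; dates are stored as ISO strings so that BETWEEN compares them correctly.

```python
import sqlite3

# (name, visit_date, status, candy, soda, water) from the question, ISO dates.
ACCOUNT_ROWS = [
    ("Nancy", "2016-10-19", "active", 2, 0, 1),
    ("Lindsy", "2015-10-20", "active", 0, 1, 0),
    ("Erica", "2013-10-20", "active", 0, 2, 3),
    ("Lane", "2014-10-19", "active", 0, 0, 4),
    ("Alexa", "2016-10-19", "notactive", 0, 5, 1),
    ("Jenn", "2016-10-19", "active", 0, 0, 0),
]


def matching_accounts(rows, start="2014-10-01", end="2016-10-31"):
    """Return names of active accounts in the window with any non-zero item."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE accounts (name TEXT, visit_date TEXT, status TEXT, "
        "candy INTEGER, soda INTEGER, water INTEGER)"
    )
    con.executemany("INSERT INTO accounts VALUES (?, ?, ?, ?, ?, ?)", rows)
    names = [
        row[0]
        for row in con.execute(
            """
            SELECT name
            FROM accounts
            WHERE (candy > 0 OR soda > 0 OR water > 0)
              AND status = 'active'
              AND visit_date BETWEEN ? AND ?
            ORDER BY name
            """,
            (start, end),
        )
    ]
    con.close()
    return names


if __name__ == "__main__":
    names = matching_accounts(ACCOUNT_ROWS)
    print(len(names), names)
```

The count is just the number of rows returned, so a client program can get both from one query; in pure SQL it would be a separate count(*) query with the same WHERE clause.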

SQL ratio between rows

I have a SQL table with the following format:
+-------------+------------+---------+
| function_id | event_type | counter |
+-------------+------------+---------+
| 1 | fail | 1000 |
| 1 | started | 5000 |
| 2 | fail | 800 |
| 2 | started | 4500 |
| ... | ... | ... |
+-------------+------------+---------+
I want to run a query over this that will group the results by function_id, by giving a ratio of the number of 'fail' events vs the number of 'started' events, as well as maintaining the number of failures. I.e. I want to run a query that will give something that looks like the following:
+-------------+------------+----------+
| function_id | fail_ratio | failures |
+-------------+------------+----------+
| 1 | 20% | 1000 |
| 2 | 17.78% | 800 |
| ... | ... | |
+-------------+------------+----------+
I've tried a few approaches but have been unsuccessful so far. I'm using Apache Drill SQL at the moment, as this data is being pulled from flat files.
Any help would be greatly appreciated! :)
This is all conditional aggregation:
select function_id,
sum(case when event_type = 'fail' then counter*1.0 end) / sum(case when event_type = 'started' then counter end) as fail_start_ratio,
sum(case when event_type = 'fail' then counter end) as failures
from t
group by function_id
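
The query above can be tried out with the sample rows from the question using this minimal sketch in Python with sqlite3. SQLite stands in for Apache Drill here, which is an assumption; the table name events and the function name are made up for the example. The ratio comes back as a fraction (0.2 rather than 20%), so it is multiplied by 100 only when printing.

```python
import sqlite3

# (function_id, event_type, counter) rows copied from the question.
EVENT_COUNTERS = [
    (1, "fail", 1000), (1, "started", 5000),
    (2, "fail", 800), (2, "started", 4500),
]


def fail_ratios(rows):
    """Return (function_id, fail_ratio, failures) per function."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE events (function_id INTEGER, event_type TEXT, counter INTEGER)"
    )
    con.executemany("INSERT INTO events VALUES (?, ?, ?)", rows)
    # Multiplying by 1.0 forces floating-point division instead of integer division.
    result = con.execute(
        """
        SELECT function_id,
               SUM(CASE WHEN event_type = 'fail' THEN counter * 1.0 END)
                 / SUM(CASE WHEN event_type = 'started' THEN counter END)
                 AS fail_ratio,
               SUM(CASE WHEN event_type = 'fail' THEN counter END) AS failures
        FROM events
        GROUP BY function_id
        ORDER BY function_id
        """
    ).fetchall()
    con.close()
    return result


if __name__ == "__main__":
    for function_id, ratio, failures in fail_ratios(EVENT_COUNTERS):
        print(function_id, f"{ratio * 100:.2f}%", failures)
```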