SQL Pivot on Conditional Count - sql

I have a table of vulnerabilities in SQL Server. When I run the following query:
select * from table
The output looks like so.
| Name | HostName | Week |
| ------------- |------------| -------|
| java | Hosta | 1 |
| java | Hostb | 1 |
| java | Hostb | 2 |
| Ansible | Hosta | 1 |
| Ansible | Hosta | 2 |
| Ansible | Hosta | 3 |
| Ansible | Hostb | 3 |
My aim is to generate an output that pivots the weeks into columns, with each value being the count of hosts for a given vulnerability in that week.
| Vulnerability | Week 1 | Week 2 | Week 3 |
| ------------- |--------| -------| -------|
| java | 2 | 1 | 0 |
| Ansible | 1 | 1 | 2 |
My initial attempt was to do
select * from table
PIVOT(
count(HostName)
For week in ([1],[2],[3])
) AS OUT
The output had the correct layout but incorrect data, as if it were only counting the first occurrence.
Is an amendment to the count term required or is my approach the wrong one?

Conditional aggregation is simpler:
select name as vulnerability,
       sum(case when week = 1 then 1 else 0 end) as week_1,
       sum(case when week = 2 then 1 else 0 end) as week_2,
       sum(case when week = 3 then 1 else 0 end) as week_3
from t
group by name;
Not only is pivot bespoke syntax, it is also sensitive to which columns are in the table. Any column that is not the pivot column or the aggregated column is treated as a "group by" key, which affects the results of the query.
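If you want to check the conditional aggregation against the sample data without a SQL Server instance, here is a minimal sketch using Python's built-in sqlite3 module. The table name `vulns` is a placeholder I made up; the column names are the ones shown in the question.

```python
import sqlite3

# Sample rows from the question: (Name, HostName, Week).
ROWS = [
    ("java", "Hosta", 1),
    ("java", "Hostb", 1),
    ("java", "Hostb", 2),
    ("Ansible", "Hosta", 1),
    ("Ansible", "Hosta", 2),
    ("Ansible", "Hosta", 3),
    ("Ansible", "Hostb", 3),
]


def weekly_host_counts(rows):
    """Return one row per vulnerability with a row count for weeks 1 to 3."""
    conn = sqlite3.connect(":memory:")
    # "vulns" is a hypothetical table name standing in for the asker's table.
    conn.execute("CREATE TABLE vulns (Name TEXT, HostName TEXT, Week INTEGER)")
    conn.executemany("INSERT INTO vulns VALUES (?, ?, ?)", rows)
    result = conn.execute(
        """
        SELECT Name AS Vulnerability,
               SUM(CASE WHEN Week = 1 THEN 1 ELSE 0 END) AS week_1,
               SUM(CASE WHEN Week = 2 THEN 1 ELSE 0 END) AS week_2,
               SUM(CASE WHEN Week = 3 THEN 1 ELSE 0 END) AS week_3
        FROM vulns
        GROUP BY Name
        ORDER BY Name
        """
    ).fetchall()
    conn.close()
    return result


counts = weekly_host_counts(ROWS)
for row in counts:
    print(row)
```

Because only Name appears in the GROUP BY, any extra columns in the table cannot change the grouping, which is the point made above about pivot.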

Related

What Clause would most optimally create this query?

I don't have much experience with SQL and am trying to learn. I came across this in an interview, and maybe I'm missing a piece of info needed to solve it, or maybe I'm approaching the problem wrong.
This is the question:
We have the following two tables:
POLICY (id as int, policy_content as varchar2)
POLICY_VOTES (vote as boolean, policy_id as int)
Write a single query that returns the policy_id, number of yes(true) votes and number of no(false) votes with a row for each policy up for a vote stored
My first thought when approaching this was to use a WITH clause to get the policy_ids and use an inner join to get the votes for yes and no but I can't find a way to make it work, which is what leads me to believe that there's another clause in SQL I'm not aware of or couldn't find that would make it easier. Either that or I'm thinking of the problem in the wrong way.
Good question.
I cannot answer too specifically, since you did not specify a DBMS, but what you will want to do is count or situationally sum based on criteria. When you use an aggregate function like that, you also need GROUP BY.
Here are two example tables I made with test data:
policy
| id | policy_content |
|----|----------------|
| 1 | foo |
| 2 | foo |
| 3 | foo |
| 4 | foo |
| 5 | foo |
policy votes
| vote | policy_id |
|------|-----------|
| yes | 1 |
| no | 1 |
| yes | 2 |
| yes | 2 |
| no | 3 |
| no | 3 |
| no | 4 |
| yes | 4 |
| yes | 5 |
| yes | 5 |
Using the below query:
SELECT
policy_votes.policy_id,
SUM(CASE WHEN vote = 'yes' THEN 1 ELSE 0 END) AS yes_votes,
SUM(CASE WHEN vote = 'no' THEN 1 ELSE 0 END) AS no_votes
FROM
policy_votes
GROUP BY
policy_votes.policy_id
You get:
| POLICY_ID | YES_VOTES | NO_VOTES |
|-----------|-----------|----------|
| 1 | 1 | 1 |
| 2 | 2 | 0 |
| 4 | 1 | 1 |
| 5 | 2 | 0 |
| 3 | 0 | 2 |
Here is an SQL Fiddle for you to try it out.
Try this:
select p.id, p.policy_content,
Count(case when pv.vote='true' then 1 end) as number_of_yes,
Count(case when pv.vote='false' then 1 end) as number_of_no
From policy p join policy_votes pv
On(p.id = pv.policy_id)
Group by p.id, p.policy_content
Cheers!!
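One thing to watch with both queries above: a policy that has no votes yet will not show up at all. If "a row for each policy" is meant to include those, a LEFT JOIN from policy should do it. Here is a rough sketch using Python's sqlite3 module so it can be run as is. SQLite has no boolean type, so I am assuming vote is stored as 1 for yes and 0 for no, and policy 6 is a made-up row with no votes added purely to show the difference.

```python
import sqlite3

# Test data from the first answer, plus a hypothetical policy 6 with no votes.
POLICIES = [(1, "foo"), (2, "foo"), (3, "foo"), (4, "foo"), (5, "foo"), (6, "foo")]
# (vote, policy_id); 1 = yes, 0 = no (an assumption, since SQLite lacks booleans).
VOTES = [(1, 1), (0, 1), (1, 2), (1, 2), (0, 3), (0, 3), (0, 4), (1, 4), (1, 5), (1, 5)]


def vote_totals(policies, votes):
    """Return (policy_id, yes_votes, no_votes) for every policy, voted on or not."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE policy (id INTEGER, policy_content TEXT)")
    conn.execute("CREATE TABLE policy_votes (vote INTEGER, policy_id INTEGER)")
    conn.executemany("INSERT INTO policy VALUES (?, ?)", policies)
    conn.executemany("INSERT INTO policy_votes VALUES (?, ?)", votes)
    rows = conn.execute(
        """
        SELECT p.id,
               COUNT(CASE WHEN pv.vote = 1 THEN 1 END) AS yes_votes,
               COUNT(CASE WHEN pv.vote = 0 THEN 1 END) AS no_votes
        FROM policy p
        LEFT JOIN policy_votes pv ON pv.policy_id = p.id
        GROUP BY p.id
        ORDER BY p.id
        """
    ).fetchall()
    conn.close()
    return rows


totals = vote_totals(POLICIES, VOTES)
for row in totals:
    print(row)
```

COUNT skips NULLs, so the unmatched row that the LEFT JOIN produces for policy 6 counts as zero in both columns.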

SQL: tricky question for finding lockout dates

Hope you can help. We have a table with two columns Customer_ID and Trip_Date. The customer receives 15% off on their first visit and on every visit where they haven't received the 15% off offer in the past thirty days. How do I write a single SQL query that finds all days where a customer received 15% off?
The table looks like this
+-------------+----------+
| Customer_ID | date     |
+-------------+----------+
| 1           | 01-01-17 |
| 1           | 01-17-17 |
| 1           | 02-04-17 |
| 1           | 03-01-17 |
| 1           | 03-15-17 |
| 1           | 04-29-17 |
| 1           | 05-18-17 |
+-------------+----------+
The desired output would look like this:
+-------------+----------+-------------------+
| Customer_ID | date     | received_discount |
+-------------+----------+-------------------+
| 1           | 01-01-17 | 1                 |
| 1           | 01-17-17 | 0                 |
| 1           | 02-04-17 | 1                 |
| 1           | 03-01-17 | 0                 |
| 1           | 03-15-17 | 1                 |
| 1           | 04-29-17 | 1                 |
| 1           | 05-18-17 | 0                 |
+-------------+----------+-------------------+
We are doing this work in Netezza. I can't think of a way using just window functions, only using recursion and looping. Is there some clever trick that I'm missing?
Thanks in advance,
GF
You didn't tell us what your backend is, nor did you give sample data, expected output, or a sensible schema :( This is an example based on a guess at the schema, using PostgreSQL as the backend (it would be too messy as a comment):
(I think you have Customer_Id, Trip_Date and LocationId in trips table?)
select * from trips t1
where not exists (
select * from trips t2
where t1.Customer_id = t2.Customer_id and
t1.Trip_Date > t2.Trip_Date
and t1.Trip_date - t2.Trip_Date < 30
);
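Note that the `not exists` query checks for an earlier trip inside the window, while the rule in the question depends on the last discounted trip. On the sample data, 02-04-17 is 18 days after the 01-17-17 visit, so that query drops it, yet the desired output gives it the discount because the last discount was on 01-01-17. Each row depends on the outcome of earlier rows, which is why this looks like a looping or recursive problem. Here is a minimal sketch of that sequential logic in Python rather than Netezza SQL. I am assuming the dates are MM-DD-YY and that a gap of 30 days or more since the last discount makes a visit eligible again; the function name is made up.

```python
from datetime import date


def flag_discounts(trip_dates, lockout_days=30):
    """Return (trip_date, received_discount) pairs for one customer.

    A trip earns the discount if the customer has never had one, or if the
    last discounted trip was at least lockout_days ago (assumed boundary).
    """
    flagged = []
    last_discount = None
    for trip in sorted(trip_dates):
        eligible = (
            last_discount is None
            or (trip - last_discount).days >= lockout_days
        )
        if eligible:
            # Only discounted trips restart the lockout window.
            last_discount = trip
        flagged.append((trip, 1 if eligible else 0))
    return flagged


# Customer 1 from the question, read as MM-DD-YY.
trips = [
    date(2017, 1, 1),
    date(2017, 1, 17),
    date(2017, 2, 4),
    date(2017, 3, 1),
    date(2017, 3, 15),
    date(2017, 4, 29),
    date(2017, 5, 18),
]
flags = [flag for _, flag in flag_discounts(trips)]
print(flags)
```

For several customers you would run this once per Customer_ID. In SQL the same idea is usually written as a recursive query that carries the last discount date forward from one row to the next.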

Count function with multiple conditions

I'm trying to do an overall count function on a set of data with multiple conditions but am having trouble with it. I'm a beginner and tried using a simple count function but am having no luck. I looked into using case when but am having trouble with it. Does anyone know how I should go about this code?
Here is an example of my table:
Name | Date | Status | Candy | Soda | Water
Nancy | 10/19/16 | active | 2 | 0 | 1
Lindsy| 10/20/15 | active | 0 | 1 | 0
Erica | 10/20/13 | active | 0 | 2 | 3
Lane | 10/19/14 | active | 0 | 0 | 4
Alexa | 10/19/16 | notactive | 0 | 5 | 1
Jenn | 10/19/16 | active | 0 | 0 | 0
I'm looking to do an overall count of the names under the conditions that: either candy, soda, or water are anything other than zero(doesn't matter what column or how many, just if one of those three are not zero), the account is active and also when the date falls within the last two years, 10/2014 - 10/2016.
I would want the query to tell me that the count total was 3 and also show me:
Name | Date | Status | Candy | Soda | Water
Nancy | 10/19/16 | active | 2 | 0 | 1
Lindsy| 10/20/15 | active | 0 | 1 | 0
Lane | 10/19/14 | active | 0 | 0 | 4
These are two different questions. The basic idea to get the rows is:
select t.*
from t
where greatest(candy, soda, water) > 0 and
status = 'active' and
date >= curdate() - interval 2 year;
(In Oracle, you could use sysdate rather than curdate().)
To get the count, you would use count(*) rather than * in the select. SQL queries only return one result set . . . so you either get all the rows or a single count.
SELECT *
FROM yourTable
WHERE (Candy > 0 OR Soda > 0 OR Water > 0) AND
Status = 'active' AND
Date BETWEEN '2014-10-01' AND SYSDATE
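To see the rows and the count side by side on the sample data, here is a small sketch using Python's sqlite3 module. The table name `accounts` is made up, the dates are rewritten as ISO strings so they compare correctly as text, and the window is fixed to October 2014 through October 2016 (instead of being computed from today's date) so the result matches the question.

```python
import sqlite3

# Sample rows from the question, with dates rewritten as YYYY-MM-DD.
ACCOUNTS = [
    ("Nancy", "2016-10-19", "active", 2, 0, 1),
    ("Lindsy", "2015-10-20", "active", 0, 1, 0),
    ("Erica", "2013-10-20", "active", 0, 2, 3),
    ("Lane", "2014-10-19", "active", 0, 0, 4),
    ("Alexa", "2016-10-19", "notactive", 0, 5, 1),
    ("Jenn", "2016-10-19", "active", 0, 0, 0),
]

# The same WHERE clause serves both the row query and the count query.
FILTER = """
    (Candy > 0 OR Soda > 0 OR Water > 0)
    AND Status = 'active'
    AND "Date" BETWEEN '2014-10-01' AND '2016-10-31'
"""


def matching_accounts(rows):
    """Return (names of matching rows, total count) using two queries."""
    conn = sqlite3.connect(":memory:")
    # "accounts" is a hypothetical table name.
    conn.execute(
        'CREATE TABLE accounts (Name TEXT, "Date" TEXT, Status TEXT, '
        "Candy INTEGER, Soda INTEGER, Water INTEGER)"
    )
    conn.executemany("INSERT INTO accounts VALUES (?, ?, ?, ?, ?, ?)", rows)
    names = [
        r[0]
        for r in conn.execute(
            f"SELECT Name FROM accounts WHERE {FILTER} ORDER BY rowid"
        )
    ]
    total = conn.execute(
        f"SELECT COUNT(*) FROM accounts WHERE {FILTER}"
    ).fetchone()[0]
    conn.close()
    return names, total


names, total = matching_accounts(ACCOUNTS)
print(names, total)
```

The rows and the count come from two separate queries that share one filter, which is the "two different questions" point made in the first answer.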

SQL ratio between rows

I have a SQL table with the following format:
+------------------------------------+
| function_id | event_type | counter |
+-------------+------------+---------+
| 1 | fail | 1000 |
| 1 | started | 5000 |
| 2 | fail | 800 |
| 2 | started | 4500 |
| ... | ... | ... |
+-------------+------------+---------+
I want to run a query over this that will group the results by function_id, by giving a ratio of the number of 'fail' events vs the number of 'started' events, as well as maintaining the number of failures. I.e. I want to run a query that will give something that looks like the following:
+-------------------------------------+
| function_id | fail_ratio | failures |
+-------------+------------+----------+
| 1 | 20% | 1000 |
| 2 | 17.78% | 800 |
| ... | ... | |
+-------------+------------+----------+
I've tried a few approaches but have been unsuccessful so far. I'm using Apache Drill SQL at the moment, as this data is being pulled from flat files.
Any help would be greatly appreciated! :)
This is all conditional aggregation:
select function_id,
sum(case when event_type = 'fail' then counter*1.0 end) / sum(case when event_type = 'started' then counter end) as fail_start_ratio,
sum(case when event_type = 'fail' then counter end) as failures
from t
group by function_id
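Here is a quick way to check that query against the sample rows, sketched with Python's sqlite3 module rather than Apache Drill, so treat it as an illustration of the technique only. The table name `events` is made up, and I multiply by 100.0 and round to two decimals to get the percentage shown in the question.

```python
import sqlite3

# Sample rows from the question: (function_id, event_type, counter).
EVENTS = [
    (1, "fail", 1000),
    (1, "started", 5000),
    (2, "fail", 800),
    (2, "started", 4500),
]


def fail_ratios(rows):
    """Return (function_id, fail percentage, failures) per function."""
    conn = sqlite3.connect(":memory:")
    # "events" is a hypothetical table name.
    conn.execute(
        "CREATE TABLE events (function_id INTEGER, event_type TEXT, counter INTEGER)"
    )
    conn.executemany("INSERT INTO events VALUES (?, ?, ?)", rows)
    result = conn.execute(
        """
        SELECT function_id,
               ROUND(100.0 * SUM(CASE WHEN event_type = 'fail' THEN counter END)
                           / SUM(CASE WHEN event_type = 'started' THEN counter END),
                     2) AS fail_pct,
               SUM(CASE WHEN event_type = 'fail' THEN counter END) AS failures
        FROM events
        GROUP BY function_id
        ORDER BY function_id
        """
    ).fetchall()
    conn.close()
    return result


ratios = fail_ratios(EVENTS)
for row in ratios:
    print(row)
```

The 100.0 plays the same role as the `*1.0` in the answer: it forces floating-point division so the ratio is not truncated to an integer.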

SAS: group rows into different datasets by condition

I need to create 7 datasets (local, web, call, local&call, local&web, call&web, all) depending on if the customer has used a channel from the below sample data.
| customer | call | local | web |
|----------|------|-------|-----|
| 1 | 1 | 1 | 1 |
| 1 | | 1 | 1 |
| 1 | | 1 | |
| 2 | 1 | | 1 |
| 2 | | 1 | |
| 2 | 1 | | |
| 3 | | | 1 |
| 3 | 1 | 1 | |
please see this picture for more details on the sample table
So if a customer has used all three channels in one instance and in the other instance he just uses either of them, then that row with Customer=1 should go to the 'all' dataset. Similarly for 3, if he has used local and web in one instance and just web in another instance, then it should go to the local&web dataset.
Customer IDs should not be duplicated across datasets, i.e. customer 1 can belong to only one of the datasets.
I am stuck with this. Can anyone give me a snippet of either SAS or SQL code to proceed further?
Thanks !
If all three go to "all", then use aggregation:
select customer,
(case when max(call) > 0 and max(local) > 0 and max(web) > 0 then 'all'
else concat_ws('&', (case when max(call) > 0 then 'call' end),
(case when max(local) > 0 then 'local' end),
(case when max(web) > 0 then 'web' end)
)
end) as grp
from t
group by customer;
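The same idea can be tried out with Python's sqlite3 module: take the max of each flag per customer in SQL, then build the dataset name from the flags that are set. This is only a sketch. The table name `usage` is made up, blanks in the sample are stored as NULL, and customers 4 and 5 are hypothetical rows I added because every customer in the sample as shown comes out as 'all'. The label is assembled in Python because concat_ws may not be available in every SQLite build.

```python
import sqlite3

# (customer, call, local, web); None stands for a blank cell in the sample.
USAGE = [
    (1, 1, 1, 1), (1, None, 1, 1), (1, None, 1, None),
    (2, 1, None, 1), (2, None, 1, None), (2, 1, None, None),
    (3, None, None, 1), (3, 1, 1, None),
    # Hypothetical customers, added only to show other groups.
    (4, None, 1, None), (4, None, None, 1),
    (5, None, None, 1),
]


def channel_group(call, local, web):
    """Name the dataset for one customer from its three per-customer flags."""
    used = [name for name, flag in (("local", local), ("call", call), ("web", web)) if flag]
    # A customer with no channels at all would get an empty string.
    return "all" if len(used) == 3 else "&".join(used)


def assign_groups(rows):
    """Return {customer: dataset name}, one entry per customer."""
    conn = sqlite3.connect(":memory:")
    # "usage" is a hypothetical table name; column names are quoted defensively.
    conn.execute(
        'CREATE TABLE usage (customer INTEGER, "call" INTEGER, "local" INTEGER, "web" INTEGER)'
    )
    conn.executemany("INSERT INTO usage VALUES (?, ?, ?, ?)", rows)
    flags = conn.execute(
        """
        SELECT customer,
               COALESCE(MAX("call"), 0),
               COALESCE(MAX("local"), 0),
               COALESCE(MAX("web"), 0)
        FROM usage
        GROUP BY customer
        ORDER BY customer
        """
    ).fetchall()
    conn.close()
    return {cust: channel_group(c, l, w) for cust, c, l, w in flags}


groups = assign_groups(USAGE)
print(groups)
```

Since there is exactly one aggregated row per customer, each customer lands in exactly one dataset, which covers the no-duplicates requirement. From there you can split the result into the seven datasets by filtering on the group name.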