Table not aggregating properly - sql

I am trying to create a list of percentages from a dataset of transactional data using SAS/SQL to understand how a specific department contributes to the overall sales count for a given quarter. For example, if there were 100 sales at Store ID 234980 and 20 of those were in department A in Q4 of 2006, then the list should output:
Store ID 234980, 20%.
This is the code I am using to achieve this result.
data testdata;
set work.dataset;
format PostingDate yyq.;
run;
PROC SQL;
CREATE TABLE aggregatedata AS
SELECT DISTINCT testdata.ID,
SUM(CASE
WHEN testdata.Store='A' THEN 1 ELSE 0
END)/COUNT(Store) as PERCENT,
PostingDate
FROM work.testdata
group by testdata.ID, testdata.PostingDate;
QUIT;
However, the output I am receiving is more like this:
StoreID DepartmentA Quarter
100 1 2014Q1
100 0 2014Q2
100 1 2014Q2
100 0 2014Q2
100 0 2014Q2
100 0 2014Q2
101 1 2015Q3
101 0 2015Q3
101 0 2015Q4
Why does my code not aggregate to the store level?

If you want to group by quarter, you need to transform your date values into quarter values. A format only changes how a value is displayed, and GROUP BY still works on the underlying date, so '01JAN2017'd and '01FEB2017'd are seen as two distinct values even though both display as 2017Q1 with the YYQ. format.
proc sql;
create table aggregatedata as
select id
, intnx('qtr',postingdate,0,'b') as postingdate format=yyq.
, sum(store='A')/count(store) as percent
from work.testdata
group by 1,2
;
quit;
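To see why the grouping key matters, here is a minimal sketch in Python, using the standard sqlite3 module as a stand-in for SAS PROC SQL. SQLite has no INTNX function or YYQ. format, so the quarter label is built with strftime instead; the sample rows and the dept column are hypothetical. Grouping by the raw date returns one row per posting date, while grouping by the derived quarter returns a single row for the store.

```python
import sqlite3

# Hypothetical sample data: store 234980 with five sales spread over
# three different dates in the same quarter (2006 Q4).
rows = [
    (234980, "A", "2006-10-02"),
    (234980, "B", "2006-10-02"),
    (234980, "A", "2006-11-15"),
    (234980, "B", "2006-12-01"),
    (234980, "B", "2006-12-01"),
]

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE testdata (id INTEGER, dept TEXT, postingdate TEXT)")
con.executemany("INSERT INTO testdata VALUES (?, ?, ?)", rows)

# SQLite stand-in for INTNX('qtr', postingdate, 0, 'b') with the YYQ. format:
# every date in a quarter maps to the same 'YYYYQn' label.
quarter = ("strftime('%Y', postingdate) || 'Q' || "
           "((CAST(strftime('%m', postingdate) AS INTEGER) + 2) / 3)")

# Grouping by the raw date: one row per distinct posting date.
by_date = con.execute(
    "SELECT id, postingdate, 100.0 * SUM(dept = 'A') / COUNT(*) "
    "FROM testdata GROUP BY id, postingdate ORDER BY postingdate"
).fetchall()

# Grouping by the derived quarter: one row per store and quarter.
by_quarter = con.execute(
    f"SELECT id, {quarter} AS qtr, 100.0 * SUM(dept = 'A') / COUNT(*) "
    "FROM testdata GROUP BY id, qtr"
).fetchall()

print(by_date)     # three rows: 50.0, 100.0 and 0.0 percent
print(by_quarter)  # [(234980, '2006Q4', 40.0)]
```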

You do not want to use both DISTINCT and GROUP BY; GROUP BY already returns one row per group.
Perhaps try:
select t.testingdate
      ,t.StoreID
      ,t.Department
      -- scalar subquery: all rows for this store and date, any department
      ,100.0 * count(*) / (select count(*)
                           from testdata t2
                           where t.testingdate = t2.testingdate
                             and t.StoreID = t2.StoreID) AS Percentage
from testdata t
group by t.testingdate
        ,t.StoreID
        ,t.Department
Alternatively, you could use a left join, which may be more efficient, but the nested select that counts all records regardless of department may be clearer to read.
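A minimal sketch of both variants, using Python's sqlite3 module as a stand-in for the real database (the sample rows are hypothetical; the column names follow the answer). The left-join version computes the per-store, per-period totals once in a derived table instead of running a subquery for every group. Multiplying by 100.0 first avoids integer division, which SQLite and several other engines apply when both operands are integers.

```python
import sqlite3

# Hypothetical sample rows (testingdate, StoreID, Department).
rows = [
    ("2006Q4", 234980, "A"), ("2006Q4", 234980, "A"),
    ("2006Q4", 234980, "B"), ("2006Q4", 234980, "C"),
    ("2007Q1", 234980, "B"),
]

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE testdata (testingdate TEXT, StoreID INTEGER, Department TEXT)")
con.executemany("INSERT INTO testdata VALUES (?, ?, ?)", rows)

# Nested-select version: each department group is divided by the count of
# all rows for the same store and period.
nested = """
SELECT t.testingdate, t.StoreID, t.Department,
       100.0 * COUNT(*) / (SELECT COUNT(*) FROM testdata t2
                           WHERE t2.testingdate = t.testingdate
                             AND t2.StoreID = t.StoreID) AS Percentage
FROM testdata t
GROUP BY t.testingdate, t.StoreID, t.Department
ORDER BY 1, 2, 3
"""

# Left-join version: per-store, per-period totals computed once and joined.
joined = """
SELECT t.testingdate, t.StoreID, t.Department,
       100.0 * COUNT(*) / tot.n AS Percentage
FROM testdata t
LEFT JOIN (SELECT testingdate, StoreID, COUNT(*) AS n
           FROM testdata GROUP BY testingdate, StoreID) tot
  ON tot.testingdate = t.testingdate AND tot.StoreID = t.StoreID
GROUP BY t.testingdate, t.StoreID, t.Department, tot.n
ORDER BY 1, 2, 3
"""

nested_rows = con.execute(nested).fetchall()
joined_rows = con.execute(joined).fetchall()
print(nested_rows)
# [('2006Q4', 234980, 'A', 50.0), ('2006Q4', 234980, 'B', 25.0),
#  ('2006Q4', 234980, 'C', 25.0), ('2007Q1', 234980, 'B', 100.0)]
print(nested_rows == joined_rows)  # True
```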

Related

SQL aggregate rows with same id, specific value in secondary column

I'm looking to filter out rows in the database (PostgreSQL) depending on the values in the status column. The idea is to sum the amount column only for references whose rows all have status 1. The query should not select a reference at all if it also has a status of 2, or any other status for that matter. status refers to the state of the transaction.
Current data table:
reference | amount | status
1 100 1
2 120 1
2 -120 2
3 200 1
3 -200 2
4 450 1
Result:
amount | status
550 1
I've simplified the data example but I think it gives a good idea of what I'm looking for.
I haven't managed to select only the references whose rows all have status 1.
I've tried sub-queries, using the HAVING clause and other methods without success.
Thanks
Here's a way using NOT EXISTS: sum all rows where the status is 1, provided that no other row with the same reference has a status other than 1.
select sum(amount) from mytable t1
where status = 1
and not exists (
select 1 from mytable t2
where t2.reference = t1.reference
and t2.status <> 1
)
SELECT SUM(amount)
FROM mytable
WHERE reference NOT IN (
SELECT reference
FROM mytable
WHERE status<>1
)
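The subquery selects all references that must be excluded, and the main query sums everything except them. Be aware that if the subquery can return a NULL reference, NOT IN matches no rows at all; NOT EXISTS does not have this problem.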
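The NOT IN and NOT EXISTS forms are not interchangeable when the subquery can return NULL. This sketch uses Python's sqlite3 module as a stand-in for PostgreSQL (both follow the same SQL NULL rule here) and starts from the question's sample data, then adds a hypothetical row with a NULL reference and status 2: NOT IN then matches nothing and SUM returns NULL, while NOT EXISTS still returns 550.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE mytable (reference INTEGER, amount INTEGER, status INTEGER)")
con.executemany(
    "INSERT INTO mytable VALUES (?, ?, ?)",
    [(1, 100, 1), (2, 120, 1), (2, -120, 2), (3, 200, 1), (3, -200, 2), (4, 450, 1)],
)

not_in = """SELECT SUM(amount) FROM mytable
            WHERE reference NOT IN (SELECT reference FROM mytable WHERE status <> 1)"""
not_exists = """SELECT SUM(amount) FROM mytable t1
                WHERE status = 1 AND NOT EXISTS (
                    SELECT 1 FROM mytable t2
                    WHERE t2.reference = t1.reference AND t2.status <> 1)"""

before = (con.execute(not_in).fetchone()[0], con.execute(not_exists).fetchone()[0])
print(before)  # (550, 550)

# Hypothetical row with a NULL reference and a non-1 status: the NOT IN
# subquery now contains NULL, so "reference NOT IN (...)" is never true.
con.execute("INSERT INTO mytable VALUES (NULL, 10, 2)")
after = (con.execute(not_in).fetchone()[0], con.execute(not_exists).fetchone()[0])
print(after)  # (None, 550)
```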
select sum (amount) as amount
from (
select sum(amount) as amount
from t
group by reference
having not bool_or(status <> 1)
) s;
amount
--------
550
You could use window functions to count the occurrences of a status other than 1 within each reference:
SELECT SUM(amount) AS amount
FROM (SELECT *,COUNT(*) FILTER(WHERE status<>1) OVER(PARTITION BY reference) cnt
FROM tc) AS sub
WHERE cnt = 0;
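For a quick check that the grouping and window approaches agree, here is a sketch using Python's sqlite3 module with the question's sample data. SQLite has no bool_or, and the FILTER clause is only available in recent versions, so two substitutions are assumed: MAX(status <> 1) = 0 stands in for NOT bool_or(status <> 1), and SUM(CASE ...) OVER (PARTITION BY reference) stands in for COUNT(*) FILTER (...) OVER (...). Window functions need SQLite 3.25 or later.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE tc (reference INTEGER, amount INTEGER, status INTEGER)")
con.executemany(
    "INSERT INTO tc VALUES (?, ?, ?)",
    [(1, 100, 1), (2, 120, 1), (2, -120, 2), (3, 200, 1), (3, -200, 2), (4, 450, 1)],
)

# HAVING version: a group is kept only if no row has status <> 1,
# i.e. the maximum of the boolean (0/1) expression is 0.
having = """SELECT SUM(amount) AS amount FROM (
              SELECT SUM(amount) AS amount FROM tc
              GROUP BY reference
              HAVING MAX(status <> 1) = 0) s"""

# Window version: count the non-1 statuses per reference on every row,
# then keep only rows whose reference has none.
window = """SELECT SUM(amount) AS amount FROM (
              SELECT *, SUM(CASE WHEN status <> 1 THEN 1 ELSE 0 END)
                          OVER (PARTITION BY reference) AS cnt
              FROM tc) AS sub
            WHERE cnt = 0"""

having_total = con.execute(having).fetchone()[0]
window_total = con.execute(window).fetchone()[0]
print(having_total, window_total)  # 550 550
```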

Calculate percentages of columns in Oracle SQL

I have three columns, all consisting of 1s and 0s. For each of these columns, how can I calculate in Oracle SQL the percentage of people (one person is one row/id) who have a 1 in the first column and a 1 in the second or third column?
For instance:
id marketing_campaign personal_campaign sales
1 1 0 0
2 1 1 0
1 0 1 1
4 0 0 1
So in this case, of all the people who were subjected to a marketing_campaign, 50 percent were subjected to a personal campaign as well, but zero percent show up in sales (none of them bought anything).
Ultimately, I want to find out the order in which people reach the sales moment. Do they first go from a marketing campaign to a personal campaign and then to sales, or do they buy anyway regardless of these channels?
This is a fictional example, so I realize there are many other ways to do this, but I hope someone can help!
The outcome that I'm looking for is something like this:
percentage marketing_campaign / personal_campaign = 50%
percentage marketing_campaign / sales = 0%
etc. (for all three column combinations)
Use COUNT, SUM and CASE expressions, together with the basic arithmetic operators +, / and *:
COUNT(*) gives the total number of people in the table
SUM(column) gives the number of 1s in the given column
CASE expressions make it possible to implement more complex conditions
The common pattern is X / COUNT(*) * 100, which calculates the percentage for a given value (val / total * 100%).
An example:
SELECT
-- percentage of people that have 1 in marketing_campaign column
SUM( marketing_campaign ) / COUNT(*) * 100 As marketing_campaign_percent,
-- percentage of people that have 1 in sales column
SUM( sales ) / COUNT(*) * 100 As sales_percent,
-- complex condition:
-- percentage of people (one person is one row/ id) who have a 1
-- in the first column and a 1 in the second or third column
COUNT(
CASE WHEN marketing_campaign = 1
AND ( personal_campaign = 1 OR sales = 1 )
THEN 1 END
) / COUNT(*) * 100 As complex_condition_percent
FROM the_table;
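A minimal sketch running the same query against the question's sample rows, using Python's sqlite3 module as a stand-in for Oracle. One change is needed: SQLite performs integer division when both operands are integers, so the sketch multiplies by 100.0 first, whereas Oracle's NUMBER division already keeps the fraction. Each row counts as one person here, so id 1 is counted twice.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE the_table (id INTEGER, marketing_campaign INTEGER, "
            "personal_campaign INTEGER, sales INTEGER)")
con.executemany("INSERT INTO the_table VALUES (?, ?, ?, ?)",
                [(1, 1, 0, 0), (2, 1, 1, 0), (1, 0, 1, 1), (4, 0, 0, 1)])

# 100.0 * ... first, so the division is done in floating point.
query = """
SELECT 100.0 * SUM(marketing_campaign) / COUNT(*) AS marketing_campaign_percent,
       100.0 * SUM(sales) / COUNT(*)              AS sales_percent,
       -- COUNT ignores the NULLs produced when the CASE condition fails
       100.0 * COUNT(CASE WHEN marketing_campaign = 1
                           AND (personal_campaign = 1 OR sales = 1)
                          THEN 1 END) / COUNT(*)  AS complex_condition_percent
FROM the_table
"""
result = con.execute(query).fetchone()
print(result)  # (50.0, 50.0, 25.0)
```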
You can get your percentages like this:
SELECT COUNT(*),
ROUND(100*(SUM(personal_campaign) / sum(count(*)) over ()),2) perc_personal_campaign,
ROUND(100*(SUM(sales) / sum(count(*)) over ()),2) perc_sales
FROM (
SELECT ID,
CASE
WHEN SUM(personal_campaign) > 0 THEN 1
ELSE 0
end AS personal_campaign,
CASE
WHEN SUM(sales) > 0 THEN 1
ELSE 0
end AS sales
FROM the_table
WHERE ID IN
(SELECT ID FROM the_table WHERE marketing_campaign = 1)
GROUP BY ID
)
I have overcomplicated things a bit because your data is still unclear to me. The subquery cleans up duplicates so that each person who was in a marketing campaign has a single 1 or 0 for personal_campaign and for sales.
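The deduplicating query can be tried from Python with the sqlite3 module, as a sketch. To keep it portable, it replaces the Oracle analytic expression sum(count(*)) over () with plain COUNT(*), which yields the same number here because the outer query has no GROUP BY and therefore forms a single group. With the question's sample data, people 1 and 2 were in a marketing campaign; both had a personal campaign, and only person 1 bought something.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE the_table (id INTEGER, marketing_campaign INTEGER, "
            "personal_campaign INTEGER, sales INTEGER)")
con.executemany("INSERT INTO the_table VALUES (?, ?, ?, ?)",
                [(1, 1, 0, 0), (2, 1, 1, 0), (1, 0, 1, 1), (4, 0, 0, 1)])

query = """
SELECT COUNT(*) AS n,
       ROUND(100.0 * SUM(personal_campaign) / COUNT(*), 2) AS perc_personal_campaign,
       ROUND(100.0 * SUM(sales) / COUNT(*), 2)             AS perc_sales
FROM (
    -- one row per person reached by a marketing campaign, with 0/1 flags
    SELECT id,
           CASE WHEN SUM(personal_campaign) > 0 THEN 1 ELSE 0 END AS personal_campaign,
           CASE WHEN SUM(sales) > 0 THEN 1 ELSE 0 END             AS sales
    FROM the_table
    WHERE id IN (SELECT id FROM the_table WHERE marketing_campaign = 1)
    GROUP BY id
) AS per_person
"""
result = con.execute(query).fetchone()
print(result)  # (2, 100.0, 50.0)
```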
About your second question:
Ultimately, I want to find out the order in which people get to the sales moment. Do they first go from marketing campaign to a personal campaign and then to sales, or do they buy anyway regardless of these channels.
This is impossible in the current state because your table has neither:
a unique row identifier that would preserve the order in which the rows were inserted, nor
a timestamp column that would tell when the rows were inserted.
Without one of these, the order of rows returned from your table is unpredictable and effectively arbitrary: SQL only guarantees a row order when the query has an ORDER BY.
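To illustrate what a timestamp would make possible, here is a sketch that assumes a hypothetical events table with one row per contact and an event_time column, neither of which exists in the question. Ordering by that column gives each person's path through the channels; Python's sqlite3 module stands in for Oracle.

```python
import sqlite3

# Hypothetical event log: one row per contact, with an event_time column
# that the original table does not have.
events = [
    (1, "marketing_campaign", "2017-03-01 09:00"),
    (1, "sales",              "2017-03-09 14:30"),
    (1, "personal_campaign",  "2017-03-05 11:15"),
    (2, "personal_campaign",  "2017-03-02 10:00"),
    (2, "marketing_campaign", "2017-03-04 16:45"),
]

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE events (id INTEGER, channel TEXT, event_time TEXT)")
con.executemany("INSERT INTO events VALUES (?, ?, ?)", events)

# The ORDER BY on the timestamp is what makes each sequence well defined
# (ISO-style timestamps sort correctly as text).
paths = {}
for person, channel in con.execute(
        "SELECT id, channel FROM events ORDER BY id, event_time"):
    paths.setdefault(person, []).append(channel)

print(paths)
# {1: ['marketing_campaign', 'personal_campaign', 'sales'],
#  2: ['personal_campaign', 'marketing_campaign']}
```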

Compare Multiple Values SQL

I am creating the 2 temporary tables below. I need to create a flag that says whether all of the weekly_sales values are less than the single average in Table 1, for each customer. What is the best way of doing this?
As an example, here is table 1:
Table 1
cust_nbr avg_sales
1234 200
And here is table 2:
Table 2
cust_nbr weekly_sales week
1234 222 1
1234 211 2
1234 121 4
Try this: it should select each customer, and flag them if their maximum weekly sales figure is still below the average set for them in Table1.
SELECT
    A.Cust_Nbr,
    A.Avg_Sales,
    CASE WHEN B.MaxSale < A.Avg_Sales THEN 1 ELSE 0 END AS IsAlwaysBelowAverage
FROM
    Table1 A
LEFT JOIN
(
    SELECT
        Cust_Nbr,
        MAX(Weekly_Sales) AS MaxSale
    FROM Table2
    GROUP BY Cust_Nbr -- one maximum per customer
) B ON
    A.Cust_Nbr = B.Cust_Nbr
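A minimal sketch of the corrected query, run through Python's sqlite3 module as a stand-in for SQL Server. Customer 1234 comes from the question; customer 5678 and its rows are hypothetical, added so that one customer is flagged. The GROUP BY inside the derived table is what produces one maximum per customer.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE Table1 (Cust_Nbr INTEGER, Avg_Sales INTEGER)")
con.execute("CREATE TABLE Table2 (Cust_Nbr INTEGER, Weekly_Sales INTEGER, Week INTEGER)")
# Customer 1234 is from the question; customer 5678 is hypothetical.
con.executemany("INSERT INTO Table1 VALUES (?, ?)", [(1234, 200), (5678, 300)])
con.executemany("INSERT INTO Table2 VALUES (?, ?, ?)",
                [(1234, 222, 1), (1234, 211, 2), (1234, 121, 4),
                 (5678, 250, 1), (5678, 120, 2)])

# If even the maximum weekly figure is below the average, every week is.
query = """
SELECT A.Cust_Nbr, A.Avg_Sales,
       CASE WHEN B.MaxSale < A.Avg_Sales THEN 1 ELSE 0 END AS IsAlwaysBelowAverage
FROM Table1 A
LEFT JOIN (SELECT Cust_Nbr, MAX(Weekly_Sales) AS MaxSale
           FROM Table2
           GROUP BY Cust_Nbr) B
  ON A.Cust_Nbr = B.Cust_Nbr
ORDER BY A.Cust_Nbr
"""
flags = con.execute(query).fetchall()
print(flags)  # [(1234, 200, 0), (5678, 300, 1)]
```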
To check whether a single value is greater than all of the weekly_sales values, you can do something like this (T-SQL, with @avg as a variable holding that value):
CASE
WHEN @avg > (
SELECT MAX(weekly_sales)
FROM [Table 2]
) THEN
1
ELSE
0
END
If you wrap that in a scalar function, you can add a computed column on [Table 1] that calls the function with avg_sales.

How can I combine 3 queries into one query so that the result looks like a schedule table?

I have three SELECT queries:
The first returns the headings of my table (like: select id, name from cars).
The second returns the left side of my schedule table, which shows the sale dates (for example: select date from dates inner join car on date.carid = car.carid where date.date1 > XXX/XX/XX).
The third returns the data inside the table: the price of each car on each date.
But I don't know how to combine them.
I guess you need something like this, shown here for SQL Server.
You need one of the following:
Pivot feature of SQL Server
Aggregate function with group-by
Query: Pivot feature of SQL Server
SELECT *
FROM
(
SELECT [SALE_DATE], [CAR_NAME], [COST]
FROM CARS_SALES
) AS source
PIVOT
(
MAX(COST)
FOR [CAR_NAME] IN ([BENZ] , [BMW], [RENAULT])
) as pvt;
Query: Aggregate function with group-by
SELECT SALE_DATE,
MAX(CASE WHEN CAR_NAME = 'BENZ' THEN COST ELSE NULL END) [BENZ],
MAX(CASE WHEN CAR_NAME = 'BMW' THEN COST ELSE NULL END) [BMW],
MAX(CASE WHEN CAR_NAME = 'RENAULT' THEN COST ELSE NULL END) [RENAULT]
FROM CARS_SALES
GROUP BY SALE_DATE
Both queries give the following output:
SALE_DATE BENZ BMW RENAULT
09/07/2014 (null) (null) 900
09/08/2014 100 200 300
09/09/2014 400 600 (null)
09/10/2014 700 500 800
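SQLite has no PIVOT operator, but the conditional-aggregation query runs there with only the aliases written as AS BENZ and so on. This sketch uses Python's sqlite3 module, with input rows reconstructed from the output table above (the CARS_SALES layout is the answer's own assumption), and produces the same grid with None for the empty cells.

```python
import sqlite3

# Input rows reconstructed from the output table in the answer.
rows = [
    ("09/07/2014", "RENAULT", 900),
    ("09/08/2014", "BENZ", 100), ("09/08/2014", "BMW", 200), ("09/08/2014", "RENAULT", 300),
    ("09/09/2014", "BENZ", 400), ("09/09/2014", "BMW", 600),
    ("09/10/2014", "BENZ", 700), ("09/10/2014", "BMW", 500), ("09/10/2014", "RENAULT", 800),
]

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE CARS_SALES (SALE_DATE TEXT, CAR_NAME TEXT, COST INTEGER)")
con.executemany("INSERT INTO CARS_SALES VALUES (?, ?, ?)", rows)

# One output column per car; CASE yields NULL for other cars, and MAX
# ignores NULLs, so each cell holds that car's cost on that date (or NULL).
query = """
SELECT SALE_DATE,
       MAX(CASE WHEN CAR_NAME = 'BENZ'    THEN COST END) AS BENZ,
       MAX(CASE WHEN CAR_NAME = 'BMW'     THEN COST END) AS BMW,
       MAX(CASE WHEN CAR_NAME = 'RENAULT' THEN COST END) AS RENAULT
FROM CARS_SALES
GROUP BY SALE_DATE
ORDER BY SALE_DATE
"""
table = con.execute(query).fetchall()
for row in table:
    print(row)
# ('09/07/2014', None, None, 900)
# ('09/08/2014', 100, 200, 300)
# ('09/09/2014', 400, 600, None)
# ('09/10/2014', 700, 500, 800)
```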
It's really unclear, but based on what you've posted, the solution would be something like this:
select cars.name, dates.date, dates.price
from dates
left join cars on (cars.carid=dates.carid)
order by cars.name, dates.date;
This gets the car's name, price and the date in one query. But I don't understand what your third query is for. If you provide more information I'll update this answer.

SQL Query Help: Returning distinct values from Count subquery

I've been stuck for quite a while now trying to get this query to work.
Here's the setup:
I have a [Notes] table that contains a non-unique (Number) column and a non-unique (Result) column. I'm looking to create a SELECT statement that displays each distinct (Number) value for which the number of rows with Result = 'NA' is greater than 25.
Number | Result
100 | 'NA'
100 | 'TT'
101 | 'NA'
102 | 'AM'
100 | 'TT'
200 | 'NA'
200 | 'NA'
201 | 'NA'
Basically, we have an autodialer that calls a number and returns a code depending on the result of the call. We want to ignore numbers that have had an 'NA' (no answer) code returned more than 25 times.
My basic attempts so far have been similar to:
SELECT DISTINCT n1.Number
FROM Notes n1
WHERE (SELECT COUNT(*) FROM Notes n2
WHERE n1.Number = n2.Number and n1.Result = 'NA') > 25
I know this query isn't correct, but in general I'm not sure how to relate the DISTINCT n1.Number from the initial select to the Number used in the subquery COUNT. Most examples I see aren't actually doing this by adding a condition to the COUNT returned. I haven't had to touch too much SQL in the past half decade, so I'm quite rusty.
You can do it like this:
SELECT Number
FROM Notes
WHERE Result = 'NA'
GROUP BY Number
HAVING COUNT(Result) > 25
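To check the HAVING approach, including the boundary at exactly 25, here is a sketch using Python's sqlite3 module as a stand-in for the real database. The call log is hypothetical: number 100 has 26 'NA' results, 200 has exactly 25, and 300 has only 10, so only 100 should be returned.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE Notes (Number INTEGER, Result TEXT)")
# Hypothetical call log built by repeating (Number, Result) tuples.
rows = ([(100, "NA")] * 26 + [(100, "TT")] * 3
        + [(200, "NA")] * 25
        + [(300, "NA")] * 10 + [(300, "AM")] * 20)
con.executemany("INSERT INTO Notes VALUES (?, ?)", rows)

query = """
SELECT Number
FROM Notes
WHERE Result = 'NA'        -- keep only no-answer rows before grouping
GROUP BY Number
HAVING COUNT(Result) > 25  -- strictly more than 25 no-answers
"""
to_ignore = [n for (n,) in con.execute(query)]
print(to_ignore)  # [100]
```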
Try this:
SELECT Number
FROM (
SELECT Number, Count(Result) as CountNA
FROM Notes
WHERE Result = 'NA'
GROUP BY Number
)
WHERE CountNA > 25
EDIT: depending on the SQL product, you may need to give the derived table a correlation name, e.g.:
SELECT DT1.Number
FROM (
SELECT Number, Count(Result) as CountNA
FROM Notes
WHERE Result = 'NA'
GROUP BY Number
) AS DT1 (Number, CountNA)
WHERE DT1.CountNA > 25;