sas/sql logic needed - sql

I have a dataset with SSN and Open_date and need to determine whether a customer has opened 2 or more accounts within 120 days, based on the open_date field. I know the INTCK/INTNX functions, but they require 2 date fields, and I'm not sure how to apply the same logic to a single field for the same customer. Please suggest.
SSN account Open_date
xyz 000123 12/01/2015
xyz 112344 11/22/2015
xyz 893944 04/05/2016
abc 992343 01/10/2016
abc 999999 03/05/2016
123 111123 07/16/2015
123 445324 10/12/2015

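You can use exists or join. The subquery has to skip the row being tested, otherwise every account matches itself:
proc sql;
select distinct SSN
from t
where exists (select 1
from t t2
where t2.SSN = t.SSN and
t2.account <> t.account and
t2.open_date between t.open_date and t.open_date + 120
);
quit;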

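I'd do it using JOIN, excluding self-matches and counting only forward differences of up to 120 days:
proc sql;
create table want as
select *
from have
where SSN in
(select a.SSN
from have a
inner join have b
on a.SSN=b.SSN
where a.account ne b.account
and intck('day', a.Open_date, b.Open_date) between 0 and 120)
;
quit;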

Just a slightly different solution here: use the DIF function, which returns the difference from the previous row's value, here the number of days between consecutive account openings.
proc sort data=have;
by ssn open_date;
run;
data want;
set have;
by ssn;
days_between_open = dif(open_date);
if first.ssn then days_between_open = .;
*if 0 < days_between_open < 120 then output;
run;
Then you can filter the table above as required. I've left it commented out at this point because you haven't specified how you want your output table.
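To check the 120-day logic against the sample rows outside SAS, here is a minimal sketch using Python's built-in sqlite3 module. It assumes the dates are stored as ISO strings, and it uses SQLite's julianday() in place of INTCK; the table name have is borrowed from the answers above. It lists each pair of accounts for the same SSN opened 0 to 120 days apart.

```python
import sqlite3

# Sample rows from the question, with dates rewritten as ISO strings (assumption).
rows = [
    ("xyz", "000123", "2015-12-01"),
    ("xyz", "112344", "2015-11-22"),
    ("xyz", "893944", "2016-04-05"),
    ("abc", "992343", "2016-01-10"),
    ("abc", "999999", "2016-03-05"),
    ("123", "111123", "2015-07-16"),
    ("123", "445324", "2015-10-12"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE have (ssn TEXT, account TEXT, open_date TEXT)")
conn.executemany("INSERT INTO have VALUES (?, ?, ?)", rows)

# Self-join on SSN; a.account <> b.account prevents a row matching itself,
# and the BETWEEN keeps only pairs where b was opened 0..120 days after a.
pairs = conn.execute(
    """
    SELECT a.ssn, a.account, b.account,
           CAST(julianday(b.open_date) - julianday(a.open_date) AS INTEGER)
    FROM have a
    JOIN have b
      ON a.ssn = b.ssn
     AND a.account <> b.account
    WHERE julianday(b.open_date) - julianday(a.open_date) BETWEEN 0 AND 120
    ORDER BY a.ssn, a.open_date
    """
).fetchall()

for pair in pairs:
    print(pair)
```

With this data every SSN has one qualifying pair, while account 893944 (opened 126 days after the previous xyz account) appears in none. Two accounts opened on the same day would be listed twice, once in each order.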

Related

Table not aggregating properly

I am trying to create a list of percentages from a dataset of transactional data using SAS/SQL to understand how a specific department contributes to overall sales count for a given quarter. For example, if there were 100 sales of Store ID 234980 and 20 of those were in department a in Q4 of 2006, then the list should output:
Store ID 234980 , 20%.
This is the code I am using to achieve this result.
data testdata;
set work.dataset;
format PostingDate yyq.;
run;
PROC SQL;
CREATE TABLE aggregatedata AS
SELECT DISTINCT testdata.ID,
SUM(CASE
WHEN testdata.Store='A' THEN 1 ELSE 0
END)/COUNT(Store) as PERCENT,
PostingDate
FROM work.testdata
group by testdata.ID, testdata.PostingDate;
QUIT;
However, the output I am receiving is more like this:
StoreID DepartmentA Quarter
100 1 2014Q1
100 0 2014Q2
100 1 2014Q2
100 0 2014Q2
100 0 2014Q2
100 0 2014Q2
101 1 2015Q3
101 0 2015Q3
101 0 2015Q4
Why does my code not aggregate to the store level?
If you want to group by QTR then you need to transform your date values into quarter values. Otherwise '01JAN2017'd and '01FEB2017'd would be seen as two distinct values even though they would both display the same using the YYQ. format.
proc sql;
create table aggregatedata as
select id
, intnx('qtr',postingdate,0,'b') as postingdate format=yyq.
, sum(store='A')/count(store) as percent
from work.testdata
group by 1,2
;
quit;
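The effect of grouping on a derived quarter value rather than on the raw date can be reproduced with Python's sqlite3 module. This is only a sketch: SQLite has no INTNX or YYQ. format, so the quarter label is built with strftime(), and the six rows are made-up illustration data, not the asker's dataset.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE testdata (id INTEGER, store TEXT, postingdate TEXT)")
conn.executemany(
    "INSERT INTO testdata VALUES (?, ?, ?)",
    [
        # Hypothetical rows: two in 2014Q1, four in 2014Q2.
        (100, "A", "2014-01-05"),
        (100, "B", "2014-02-10"),
        (100, "A", "2014-04-01"),
        (100, "B", "2014-05-01"),
        (100, "B", "2014-06-01"),
        (100, "B", "2014-06-15"),
    ],
)

# The inner query turns every date into a quarter label such as '2014Q1',
# so all dates in the same quarter collapse into one group.
result = conn.execute(
    """
    SELECT id, qtr, AVG(is_a) AS pct
    FROM (
        SELECT id,
               strftime('%Y', postingdate) || 'Q' ||
                   ((CAST(strftime('%m', postingdate) AS INTEGER) + 2) / 3) AS qtr,
               store = 'A' AS is_a
        FROM testdata
    )
    GROUP BY id, qtr
    ORDER BY id, qtr
    """
).fetchall()

for row in result:
    print(row)
```

Grouping on postingdate directly would instead return one row per distinct date, which is the behaviour the question describes.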
You do not want to use both DISTINCT and GROUP BY.
Perhaps try:
select t.testingdate
,t.StoreID
,t.Department
,count(*) / (select count(*)
from testdata t2
where t.testingdate = t2.testingdate
and t.StoreID = t2.StoreID) AS Percentage
from testdata t
group by t.testingdate
,t.StoreID
,t.Department
Alternatively, you could use a left join, which may be more efficient, though the nested select that counts all records regardless of department may be clearer to read.

update multiple records with different where conditions

I have two tables which are used to deal with identifier changes.
So the table below is where identifiers are logged.
tblNewIds
DateFrom OldId NewId
2017-06-02 ABC ABB
2017-04-21 XYZ JHG
The next table is where all the daily sales are stored.
tblSales
DateSale Id
2017-01-01 ABC
2017-01-01 XYZ
2017-01-02 ABC
2017-01-02 XYZ
...
2017-06-20 ABC
2017-06-20 XYZ
I want a query to update tblSales such that from 2017-04-21 any Id equal to XYZ changes to JHG, and from 2017-06-02 any Id equal to ABC changes to ABB.
I know how I can do this for one record at a time with the update statement below but I would like to know how to do both at once?
update tblSales
set Id = 'ABB'
where Id = 'ABC' and DateSale >= '2017-06-02'
Assuming that ids are not chained, then you can do:
update s
set id = ni.NewId
from tblSales s join
tblNewIds ni
on s.id = ni.oldId and s.DateSale >= ni.DateFrom;
I would be cautious about making the change in the data, though. Losing the information about the original id could have unexpected side-effects.
If the ids can change more than once, I would suggest just running the update until there are no more changes. Although you can construct the correct id at a given point in time using a recursive CTE, it is a lot more work for a one-time effort.
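The mapping-table update can be tried out with Python's sqlite3 module. This sketch makes two assumptions: it uses a correlated subquery instead of UPDATE ... FROM (which older SQLite builds lack), and the sales rows are a small made-up subset chosen to sit on either side of each DateFrom.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE tblNewIds (DateFrom TEXT, OldId TEXT, NewId TEXT);
    CREATE TABLE tblSales (DateSale TEXT, Id TEXT);
    INSERT INTO tblNewIds VALUES
        ('2017-06-02', 'ABC', 'ABB'),
        ('2017-04-21', 'XYZ', 'JHG');
    -- Illustrative sales rows, not the full table from the question.
    INSERT INTO tblSales VALUES
        ('2017-01-01', 'ABC'), ('2017-01-01', 'XYZ'),
        ('2017-04-21', 'XYZ'), ('2017-06-01', 'ABC'),
        ('2017-06-20', 'ABC'), ('2017-06-20', 'XYZ');
    """
)

# Each sale looks up its replacement id; the WHERE EXISTS guard keeps rows
# without a matching mapping from being overwritten with NULL.
conn.execute(
    """
    UPDATE tblSales
    SET Id = (SELECT n.NewId FROM tblNewIds n
              WHERE n.OldId = tblSales.Id
                AND tblSales.DateSale >= n.DateFrom)
    WHERE EXISTS (SELECT 1 FROM tblNewIds n
                  WHERE n.OldId = tblSales.Id
                    AND tblSales.DateSale >= n.DateFrom)
    """
)

sales = conn.execute("SELECT DateSale, Id FROM tblSales ORDER BY rowid").fetchall()
for sale in sales:
    print(sale)
```

Because the rules live in tblNewIds, adding a third identifier change needs a new row in that table rather than a change to the statement. Like the join version, this assumes ids are not chained.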
You might be able to slightly modify your current update to use a CASE expression which can cover both types of update in a single statement.
update tblSales
set Id = case when Id = 'ABC' and DateSale >= '2017-06-02' then 'ABB'
when Id = 'XYZ' and DateSale >= '2017-04-21' then 'JHG' END
where (Id = 'ABC' and DateSale >= '2017-06-02') or
(Id = 'XYZ' and DateSale >= '2017-04-21')
UPDATE tblSales
SET id= CASE
WHEN (Id = 'ABC' and DateSale >= '2017-06-02') THEN 'ABB'
WHEN (Id = 'XYZ' and DateSale >= '2017-04-21') THEN 'JHG'
ELSE Id -- keep all other rows unchanged instead of setting them to NULL
END ;

How to add a sum column with respect to a group by

I have a table1 :
ZP age Sexe Count
A 40 0 5
A 40 1 3
C 55 1 2
And I want to add a column which sums the Count column, grouped by the first two variables:
ZP age Sexe Count Sum
A 40 0 5 8
A 40 1 3 8
C 55 1 2 2
This is what I do:
CREATE TABLE table2 AS SELECT zp, age, SUM(count) FROM table1 GROUP BY zp, age
then:
CREATE TABLE table3 AS SELECT * FROM table1 NATURAL JOIN table2
But I have a feeling this is a sloppy way to do it. Do you know any better ways? For example, with no intermediate tables.
Edit: I am using SQL through PROC SQL in SAS.
I'm not quite sure if there is a method for a single select statement but below will work without multiple create table statements:
data have;
length ZP $3 age 3 Sexe $3 Count 3;
input ZP $ age Sexe $ Count;
datalines;
A 40 0 5
A 40 1 3
C 55 1 2
;
run;
proc sql noprint;
create table WANT as
select a.*, b.SUM
from
HAVE a,
(select ZP, age, sum(COUNT) as SUM from HAVE group by ZP, age) b
where a.ZP = b.ZP and a.age = b.age;
quit;
PROC SQL does not support SQL window functions such as SUM() OVER (PARTITION BY ...).
But it looks like you want to include summarized data and detail rows at the same time? If that is the question, then PROC SQL will do that for you automatically. If the select list includes variables that are neither GROUP BY variables nor summary statistics, SAS automatically remerges the summary statistics with the detail rows to produce the table you want.
proc sql;
SELECT zp, age, sexe, count, SUM(count)
FROM table1
group by zp, age
;
quit;
You can use SUM as follows with standard SQL:2003 syntax (I don't know if SAS accepts it):
SELECT zp, age, sexe, count, SUM(count) OVER (PARTITION BY zp, age)
FROM table1;
data have;
input ZP $ age Sexe Count;
datalines;
A 40 0 5
A 40 1 3
C 55 1 2
;
run;
proc sql;
create table want as select
*, sum(count) as sum
from have
group by zp, age;
quit;
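Outside SAS there is no automatic remerge, so the join has to be written out. The sketch below, using Python's sqlite3 module, shows what that looks like without any intermediate table. The column is named cnt here instead of Count purely to avoid clashing with the aggregate function name, and total is a made-up name for the new column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (zp TEXT, age INTEGER, sexe INTEGER, cnt INTEGER)")
conn.executemany(
    "INSERT INTO table1 VALUES (?, ?, ?, ?)",
    [("A", 40, 0, 5), ("A", 40, 1, 3), ("C", 55, 1, 2)],
)

# The derived table g holds one total per (zp, age); joining it back to the
# detail rows repeats that total on every row of the group.
rows = conn.execute(
    """
    SELECT d.zp, d.age, d.sexe, d.cnt, g.total
    FROM table1 d
    JOIN (SELECT zp, age, SUM(cnt) AS total
          FROM table1
          GROUP BY zp, age) g
      ON g.zp = d.zp AND g.age = d.age
    ORDER BY d.zp, d.age, d.sexe
    """
).fetchall()

for row in rows:
    print(row)
```

Joining on both zp and age matters: with zp alone, two age groups sharing a zp value would each pick up the other's total.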

Adding in missing dates from results in SQL

I have a database that currently looks like this
Date | valid_entry | profile
1/6/2015 | 1 | 1
3/6/2015 | 2 | 1
3/6/2015 | 2 | 2
5/6/2015 | 4 | 4
I am trying to grab the dates, but I also need the query to display dates that do not exist in the list, such as 2/6/2015.
This is a sample of what I need it to be:
Date | valid_entry
1/6/2015 | 1
2/6/2015 | 0
3/6/2015 | 2
3/6/2015 | 2
4/6/2015 | 0
5/6/2015 | 4
My query:
select date, count(valid_entry)
from database
where profile = 1
group by 1;
This query will only display the dates that exist in the table. Is there a way to populate the results with dates that do not exist there?
You can generate a list of all dates that are between the start and end date from your source table using generate_series(). These dates can then be used in an outer join to sum the values for all dates.
with all_dates (date) as (
select dt::date
from generate_series( (select min(date) from some_table), (select max(date) from some_table), interval '1' day) as x(dt)
)
select ad.date, sum(coalesce(st.valid_entry,0))
from all_dates ad
left join some_table st on ad.date = st.date
group by ad.date, st.profile
order by ad.date;
some_table is your table with the sample data you have provided.
Based on your sample output, you also seem to want group by date and profile, otherwise there can't be two rows with 2015-06-03. You also don't seem to want where profile = 1 because that as well wouldn't generate two rows with 2015-06-03 as shown in your sample output.
SQLFiddle example: http://sqlfiddle.com/#!15/b0b2a/2
Unrelated, but: I hope that the column names are only made up. date is a horrible name for a column. For one because it is also a keyword, but more importantly it does not document what this date is for. A start date? An end date? A due date? A modification date?
You have to use a calendar table for this purpose. In this case you can create an in-line table with the dates required, then LEFT JOIN your table to it:
select t.d, count(db.valid_entry)
from (
SELECT DATE '2015-06-01' AS d UNION ALL SELECT DATE '2015-06-02' UNION ALL SELECT DATE '2015-06-03' UNION ALL
SELECT DATE '2015-06-04' UNION ALL SELECT DATE '2015-06-05' UNION ALL SELECT DATE '2015-06-06') AS t
left join database AS db on t.d = db."date" and db.profile = 1
group by t.d;
Note: Predicate profile = 1 should be applied in the ON clause of the LEFT JOIN operation. If it is placed in the WHERE clause instead then LEFT JOIN essentially becomes an INNER JOIN.
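The difference between filtering in ON and filtering in WHERE is easy to see on the sample data. The sketch below uses Python's sqlite3 module and, since SQLite has no generate_series() by default, a recursive CTE as the calendar. The table name entries and column name entry_date are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE entries (entry_date TEXT, valid_entry INTEGER, profile INTEGER)")
conn.executemany(
    "INSERT INTO entries VALUES (?, ?, ?)",
    [
        ("2015-06-01", 1, 1),
        ("2015-06-03", 2, 1),
        ("2015-06-03", 2, 2),
        ("2015-06-05", 4, 4),
    ],
)

# Calendar of every day between the earliest and latest entry.
CALENDAR = """
WITH RECURSIVE all_dates(d) AS (
    SELECT MIN(entry_date) FROM entries
    UNION ALL
    SELECT date(d, '+1 day') FROM all_dates
    WHERE d < (SELECT MAX(entry_date) FROM entries)
)
"""

# Filter in the ON clause: unmatched calendar days survive with a count of 0.
filter_in_on = conn.execute(CALENDAR + """
SELECT ad.d, COUNT(e.valid_entry)
FROM all_dates ad
LEFT JOIN entries e ON e.entry_date = ad.d AND e.profile = 1
GROUP BY ad.d
ORDER BY ad.d
""").fetchall()

# Filter in the WHERE clause: rows with a NULL profile are discarded,
# so the LEFT JOIN behaves like an INNER JOIN.
filter_in_where = conn.execute(CALENDAR + """
SELECT ad.d, COUNT(e.valid_entry)
FROM all_dates ad
LEFT JOIN entries e ON e.entry_date = ad.d
WHERE e.profile = 1
GROUP BY ad.d
ORDER BY ad.d
""").fetchall()

print(filter_in_on)
print(filter_in_where)
```

The first query returns all five days, three of them with a count of 0. The second returns only the two days on which profile 1 has a row.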

Show data from table even if there is no data!! Oracle

I have a query which shows count of messages received based on dates.
For Eg:
1 | 1-May-2012
3 | 3-May-2012
4 | 6-May-2012
7 | 7-May-2012
9 | 9-May-2012
5 | 10-May-2012
1 | 12-May-2012
As you can see on some dates there are no messages received. What I want is it should show all the dates and if there are no messages received it should show 0 like this
1 | 1-May-2012
0 | 2-May-2012
3 | 3-May-2012
0 | 4-May-2012
0 | 5-May-2012
4 | 6-May-2012
7 | 7-May-2012
0 | 8-May-2012
9 | 9-May-2012
5 | 10-May-2012
0 | 11-May-2012
1 | 12-May-2012
How can I achieve this when there are no rows in the table?
First, it sounds like your application would benefit from a calendar table. A calendar table is a list of dates and information about the dates.
Second, you can do this without using temporary tables. Here is the approach:
with constants as (select min(<thedate>) as firstdate from <table>),
dates as (select (seq.firstdate + seq.n - 1) as thedate
from (select rownum as n, constants.firstdate
from <table> cross join constants
where rownum < sysdate - constants.firstdate + 1
) seq
)
select dates.thedate, count(t.date)
from dates left outer join
<table> t
on t.date = dates.thedate
group by dates.thedate
Here is the idea. The alias constants records the earliest date in your table. The alias dates then creates a sequence of dates. The inner subquery calculates a sequence of integers, using rownum, and then adds these to the first date. Note this assumes that you have on average at least one transaction per date. If not, you can use a bigger table.
The final part is the join that is used to bring back information about the dates. Note the use of count(t.date) instead of count(*). This counts the number of records in your table, which should be 0 for dates with no data.
You don't need a separate table for this, you can create what you need in the query. This works for May:
WITH month_may AS (
select to_date('2012-05-01', 'yyyy-mm-dd') + level - 1 AS the_date
from dual
connect by level <= 31 -- 31 rows: May 1 through May 31
)
SELECT *
FROM month_may mm
LEFT JOIN mytable t ON t.some_date = mm.the_date
The date range will depend on how exactly you want to do this and what your range is.
You could achieve this with a left outer join IF you had another table to join to that contains all possible dates.
One option might be to generate the dates in a temp table and join that to your query.
Something like this (SQL Server syntax) might do the trick.
CREATE TABLE #TempA (Col1 DateTime)
DECLARE @start DATETIME = convert(datetime, convert(nvarchar(10), getdate(), 121))
SELECT @start
DECLARE @counter INT = 0
WHILE @counter < 50
BEGIN
INSERT INTO #TempA (Col1) VALUES (@start)
SET @start = DATEADD(DAY, 1, @start)
SET @counter = @counter+1
END
That will create a TempTable to hold the dates... I've just generated 50 of them starting from today.
SELECT
a.Col1,
COUNT(b.MessageID)
FROM
#TempA a
LEFT OUTER JOIN YOUR_MESSAGE_TABLE b
ON a.Col1 = b.DateColumn
GROUP BY
a.Col1
Then you can left join your message counts to that.
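If the counts have already been fetched into application code, the same gap filling can be done there instead of in the database. The following is a minimal Python sketch of that alternative, using the counts from the question. The function name fill_missing_days is made up, and the range is assumed to run from the first to the last date that has data.

```python
from datetime import date, timedelta

# Message counts per day, as listed in the question.
received = {
    date(2012, 5, 1): 1,
    date(2012, 5, 3): 3,
    date(2012, 5, 6): 4,
    date(2012, 5, 7): 7,
    date(2012, 5, 9): 9,
    date(2012, 5, 10): 5,
    date(2012, 5, 12): 1,
}


def fill_missing_days(counts):
    """Return (count, day) for every day from the first to the last key,
    using 0 for days that have no entry."""
    first, last = min(counts), max(counts)
    n_days = (last - first).days + 1
    days = (first + timedelta(days=i) for i in range(n_days))
    return [(counts.get(day, 0), day) for day in days]


filled = fill_missing_days(received)
for count, day in filled:
    print(f"{count} | {day.isoformat()}")
```

This plays the same role as the calendar side of the left join: the generated sequence of days drives the output, and the real data only supplies counts where it has them.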