I have two data sets.
The first one is big (~1000k rows). It contains patient observation data grouped by observation year, from, let's say, 2000 to 2005. Some patients in this set have observations for every year in the sequence, and some have observations for only some years, for example 2002-2003.
The second set contains only the sequence of years from 2000 to 2005, 6 rows.
What I want is a table with the data from set 1 for each patient, extended so that every patient has a row for each year in set 2. If set 1 has no observation for a particular year, a row should be added with the data column left empty (or, better, set to "-").
For example set 1 could be:
patient_id | obs_year | data
a 2000 10
a 2001 12
a 2002 13
a 2003 9
a 2004 1
a 2005 6
bb 2002 100
bb 2003 110
Set 2 is like:
year |
2000
2001
2002
2003
2004
2005
So what I want in result ideally would be like this:
patient_id | obs_year | data
a 2000 10
a 2001 12
a 2002 13
a 2003 9
a 2004 1
a 2005 6
bb 2000 -
bb 2001 -
bb 2002 100
bb 2003 110
bb 2004 -
bb 2005 -
I should also mention that I do this job in SAS, so an SQL query or a SAS script (or both) would be welcome.
Dedup the patient_id values from set 1 with a sort. Combine that list with set 2 so that every patient_id is paired with every year (a cross join), then merge the result back onto set 1 by patient_id and year to get your output. Wherever a patient_id and year combination has no match in set 1, data will be blank, as in your desired output.
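The steps above can be sketched in SQL. This is only an illustration run through Python's built-in sqlite3 module rather than SAS; the table names set1 and set2 are assumptions taken from the question's wording, and the same SELECT should be adaptable to PROC SQL.

```python
import sqlite3

# In-memory database holding the two sample sets from the question.
# Table names set1/set2 are assumed for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE set1 (patient_id TEXT, obs_year INTEGER, data INTEGER);
    CREATE TABLE set2 (year INTEGER);
""")
conn.executemany("INSERT INTO set1 VALUES (?, ?, ?)", [
    ("a", 2000, 10), ("a", 2001, 12), ("a", 2002, 13),
    ("a", 2003, 9), ("a", 2004, 1), ("a", 2005, 6),
    ("bb", 2002, 100), ("bb", 2003, 110),
])
conn.executemany("INSERT INTO set2 VALUES (?)", [(y,) for y in range(2000, 2006)])

rows = conn.execute("""
    SELECT p.patient_id,
           y.year AS obs_year,
           COALESCE(CAST(s.data AS TEXT), '-') AS data  -- '-' where nothing matched
    FROM (SELECT DISTINCT patient_id FROM set1) AS p    -- step 1: dedup patients
    CROSS JOIN set2 AS y                                -- step 2: every patient x every year
    LEFT JOIN set1 AS s                                 -- step 3: merge back onto set 1
           ON s.patient_id = p.patient_id
          AND s.obs_year = y.year
    ORDER BY p.patient_id, y.year
""").fetchall()

for row in rows:
    print(*row)
```

Dropping the COALESCE leaves the unmatched rows as NULL (missing) instead of "-", which is usually easier to work with in later steps.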
Another option is PROC FREQ with sparse, which produces a line for every possible combination whether they appear or not. This works if you don't have any legitimate zeroes in the data; if you do and care that they're different from missing, this won't work.
proc freq data=have noprint;
  weight data;
  tables patient_id*obs_year/missing sparse out=want(rename=count=data keep=count patient_id obs_year);
run;
Then you need to convert 0 back to missing, if you care about the difference (presumably in the next step, if there is one).
A similar approach that is closer to the desired results is proc tabulate with printmiss, which works similarly to sparse:
proc tabulate data=have out=want(keep=patient_id obs_year data_sum rename=data_sum=data);
  class patient_id obs_year;
  var data;
  tables patient_id,obs_year*data*sum='data'/printmiss misstext='.';
run;
That actually does get you missing values properly.
Related
I'm working in Access with some crosstab columns that are generated dynamically based on the reports received (e.g. based on the parameter year: 2009, 2010, 2011). I'd like to calculate the difference between each pair of consecutive years for all the generated columns (e.g. 2010-2009, 2011-2010, ...). Is there any way to do it?
Eg:
Default crosstab:
2009 2010 2011
A 30 20 5
B 30 20 5
Expected output:
2009 2010 2011 2010-2009 2011-2010
A 30 20 5 -10 -15
B 30 20 5 -10 -15
I can easily create calculated fields and do this manually, but the data changes and additional years keep arriving, so I want to make it dynamic.
Is there a way to do this in MS Access?
In Pentaho's PRD, I am working with an object datasource (i.e. I do not have a SQL query I may edit to group the data). To realize the required report, I must group the data within the PRD (OK) and only show these grouped values (OK). How can I sum the group values in the group headers to generate totals (MY PROBLEM) when there are multiple records per group? Here is a simplified example:
Assume the dataset I provide to the PRD is:
X 42 1
X 42 2
X 42 3
Y 10 12
Y 10 7
Z 8 22
Z 8 92
So, I need to display groups based upon column 1 and 2 only.
Column 3 is excluded; but, I can't remove it from the dataset.
Then, I must provide a total for the 2nd column, as follows:
X 42
Y 10
Z 8
---------
Total 60
I have an Excel document that is a dumped output of all service tickets (with statuses, assigned to, submitted by, etc.) from our ticket tracking software. There is one row per ticket.
I am trying to make a flexible report generator in VBA that takes in the ticket dump and outputs a report with a copy of the data in one sheet, a summary in another sheet, and a line graph in another sheet.
I feel like a pivot table is the perfect approach for this. The only problem is the summary.
The data from the ticket dump looks something like this:
| Submitted_On | Priority | Title | Status | Closed_On |
10/10/2016 1 Ticket 1 New
10/11/2016 1 Ticket 2 Fixed 11/10/2016
10/12/2016 3 Ticket 3 Rejected 11/9/2016
10/15/2016 1 Ticket 4 In Review
The problem is the way I want the summary to look. Basically, the summary should show how many tickets were open and how many were closed as of exactly midnight on the first of every month within the past three years. In other words, if this report were a time machine, at that exact time X tickets would be open and Y would be closed. Furthermore, the summary table should break that down by priority.
The hard part is that these simulated report dates (first of every month within the last 3 years) are extraneous values and are not within each data row.
So the report would be like this:
| Open | Closed |
| Reporting Date | P1 | P2 | P3 | P1 | P2 | P3 |
1-Oct-2016 6 10 0 3 2 0
1-Nov-2016 4 10 0 5 2 0
1-Dec-2016 6 3 0 5 9 0
Basically the formula for the Open section would be something like:
priority=1 AND Submitted_On<Reporting Date AND (Closed_On>Reporting Date OR Closed_On="")
and the formula for the closed section would be something like:
priority=1 AND Submitted_On<Reporting Date AND Closed_On<Reporting Date
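To make the two formulas concrete, here is a minimal sketch in Python (not VBA) that applies them to the four sample rows above. The function name snapshot and the tuple layout are made up for illustration, and the dates are read as month/day/year.

```python
from datetime import date

# Sample rows from the dump: (submitted_on, priority, title, status, closed_on).
# closed_on is None when the Closed_On cell is empty.
tickets = [
    (date(2016, 10, 10), 1, "Ticket 1", "New", None),
    (date(2016, 10, 11), 1, "Ticket 2", "Fixed", date(2016, 11, 10)),
    (date(2016, 10, 12), 3, "Ticket 3", "Rejected", date(2016, 11, 9)),
    (date(2016, 10, 15), 1, "Ticket 4", "In Review", None),
]


def snapshot(tickets, reporting_date, priorities=(1, 2, 3)):
    """Count open and closed tickets per priority as of reporting_date."""
    open_counts = {p: 0 for p in priorities}
    closed_counts = {p: 0 for p in priorities}
    for submitted_on, priority, _title, _status, closed_on in tickets:
        # Both formulas require Submitted_On < Reporting Date.
        if priority not in open_counts or not submitted_on < reporting_date:
            continue
        if closed_on is None or closed_on > reporting_date:
            open_counts[priority] += 1
        elif closed_on < reporting_date:
            closed_counts[priority] += 1
        # A ticket closed exactly on the reporting date matches neither
        # formula as written, so it is counted in neither bucket.
    return open_counts, closed_counts


for reporting_date in (date(2016, 11, 1), date(2016, 12, 1)):
    print(reporting_date, *snapshot(tickets, reporting_date))
```

Filtering by assignee or status would amount to filtering the ticket list before calling the function, so the reporting dates never need to exist as a field in the data.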
I also need to be able to filter the data, for example to only tickets assigned to X or only tickets with certain statuses, which is why I don't think a regular sheet with formulas would work.
I thought pivot tables would work but Reporting Date isn't a field.
Do you have any advice as to what I can do to make this report work and be very flexible as far as filtering goes?
Thank you!
P.S. I am using Excel 2010, so I do not have access to queries.
This question is posted following a suggestion in this thread.
I'm using SQLite/Database browser and my data looks like this:
data.csv
company year value
A 2000 15
A 2001 12
A 2002 20
B 2000 25
B 2001 20
B 2002 10
C 2000 18
C 2001 14
C 2002 22
etc..............
What I want to do is get all companies which have a value of <= 20 for all years in the data set. Using the data above, this means I want the query to return:
result.csv
company year value
A 2000 15
A 2001 12
A 2002 20
Thus excluding company C due to value > 20 in 2002 and company B for value > 20 in 2000.
You want all companies whose maximum value is no larger than 20:
SELECT *
FROM Data
WHERE company IN (SELECT company
                  FROM Data
                  GROUP BY company
                  HAVING max(value) <= 20)
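As a quick check, the query can be run against the sample rows from the question with Python's built-in sqlite3 module. This is just a sketch; the ORDER BY is added only to make the printed order deterministic.

```python
import sqlite3

# Load the sample rows from data.csv into an in-memory table named Data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Data (company TEXT, year INTEGER, value INTEGER)")
conn.executemany("INSERT INTO Data VALUES (?, ?, ?)", [
    ("A", 2000, 15), ("A", 2001, 12), ("A", 2002, 20),
    ("B", 2000, 25), ("B", 2001, 20), ("B", 2002, 10),
    ("C", 2000, 18), ("C", 2001, 14), ("C", 2002, 22),
])

# Keep only companies whose largest value never exceeds 20.
result = conn.execute("""
    SELECT *
    FROM Data
    WHERE company IN (SELECT company
                      FROM Data
                      GROUP BY company
                      HAVING max(value) <= 20)
    ORDER BY company, year
""").fetchall()

for row in result:
    print(*row)
```

Only the three rows for company A come back: B is dropped because of its 25 in 2000 and C because of its 22 in 2002.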
Not sure if there are better solutions, but I think this will work:
select company
     , sum(case when value <= 20 then 1 else 0 end) s
     , count(*) c
from data
where year in (2000, 2001, 2002)
group by company
having s = c
For each company, it checks whether the total number of rows equals the number of rows where the value is at most 20.
I am using DB2 and have the following table (Table A - 3 Columns) :
EmpNum YearMonth Value
100 201201 2
100 201207 1
100 201206 7
102 201201 8
102 201205 15
102 201207 4
… etc
I would like to produce a new Table B with one row per employee, and a column for each YearMonth.
I am hoping that I can generate the Table B 'YearMonth' column name dynamically from the data as there will be 120 columns.
The value would then be put in the cell with the associated YearMonth heading to give a table like this :
EmpNum 201201 201202 201203 201204 201205 201206 201207 etc ….
100 2 7 1
102 8 15 4
I have tried looking up 'Stored Procedures' and 'Dynamic Column Names' but cannot find anything quite like this.
I have two questions:
Is this possible in DB2?
What should I look up for more information?
Thanks in anticipation !
Ross
What you are looking for is called Pivot. Unfortunately, DB2 doesn't implement the PIVOT statement (unlike Oracle). So it will not be possible to create a query that creates a dynamic number of columns.
Have a look at "Poor Man's SQL Pivot. List Questions as Columns and Answers per User in one row". That's the closest you can get.
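The "poor man's pivot" idea is one MAX(CASE ...) column per YearMonth, grouped by employee. Below is a minimal sketch of it, with the column list generated from the data by a small Python script. It runs against SQLite through the sqlite3 module, not DB2; the table name TableA is an assumption based on the question. The generated statement uses only CASE, MAX and GROUP BY, so it should port, but in DB2 it would still have to be built and executed as dynamic SQL.

```python
import sqlite3

# Sample rows from the question; TableA is an assumed table name.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE TableA (EmpNum INTEGER, YearMonth INTEGER, Value INTEGER)")
conn.executemany("INSERT INTO TableA VALUES (?, ?, ?)", [
    (100, 201201, 2), (100, 201207, 1), (100, 201206, 7),
    (102, 201201, 8), (102, 201205, 15), (102, 201207, 4),
])

# Step 1: read the distinct YearMonth values that will become columns.
months = [r[0] for r in conn.execute(
    "SELECT DISTINCT YearMonth FROM TableA ORDER BY YearMonth")]

# Step 2: generate one MAX(CASE ...) expression per YearMonth.
# int() ensures only numeric values are formatted into the SQL text.
columns = ",\n       ".join(
    f'MAX(CASE WHEN YearMonth = {int(m)} THEN Value END) AS "{int(m)}"'
    for m in months
)
pivot_sql = (
    f"SELECT EmpNum,\n       {columns}\n"
    "FROM TableA\nGROUP BY EmpNum\nORDER BY EmpNum"
)

# Step 3: run the generated statement.
print(pivot_sql)
pivot = conn.execute(pivot_sql).fetchall()
for row in pivot:
    print(row)
```

Months that never appear in the data (201202 to 201204 here) get no column. To force all 120 columns, the list of months would have to come from a calendar table or a generated range instead of the data itself.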
This is a generic procedure that lets you pivot a table: https://github.com/angoca/db2tools/blob/master/pivot.sql