Splitting data into new columns - dataframe

I was wondering if anyone could advise me on a problem I am having in R.
I am working on a drug trial of sorts, with data from three experimental conditions.
For every participant, six data points were collected, so each participant is represented by one row with six columns of data points.
However, I am now trying to build a three-level longitudinal model, and to do so I need to reshape my data: instead of each participant having one row with six columns, there should be six rows per participant with two columns, one indicating the round of data collection (1-6) and the other giving the actual value.
Is there a straightforward way to do this? I am hoping I won't be stuck manually reformatting my data, as that would cost me a lot of time that I'd rather invest in analysing the results.
I will also need to combine all three data frames into one main data frame to run the model, but I assume this will be an easy step once I figure out the first problem.

Related

Compare Two Rows and Update Start and End Dates

I need some help, and I know I am not the only one dealing with this issue, but I am wondering if you might have some ideas on how to handle comparing two rows of data and filling out start and end dates.
To give you some context, we have a huge hierarchy (approximately 8,000 rows and about 12 columns wide) that is updated each year. Sometimes the values change and sometimes they don't. When the values don't change, I don't need to adjust the dates. When the values do change and a new row is added, I need to change the data.
I have attached some fake data to illustrate the situation. I am building this in MS Access, so I think this is more of a DBA-type question, and the data will probably be manipulated via a recordset-type method.
In my example I have two tables: Old Table and New Table. In each table there is a routing code field that serves as my join field and the primary key for the table.
The Old Table represents the existing data (tblMain); the New Table represents the data to be appended (tblTemp).
To append the data, I have an append query set up in Access. I perform a left join between the Old and New tables, joining on every field, and append the rows that are null in the Old table. That part works fine and is not where my issue is.
What is causing me trouble is how to fill out the start and end dates.
As you can see from my tables, we are running a zoo. Let's say, for the sake of argument, that our zoo started off pretty simple and has become more sophisticated. We now want our hierarchy to expand and become a bit more detailed, as we are now capturing the type of animal (Level 4) and the native location (Level 5).
As you can see when comparing one table to the other, the routing codes are the same, so the append query has to join on each field. When you do this, you return the Result Table, which is essentially the Old and New tables stacked on top of each other. You might think about a union query, but that would give me duplicates, and I don't want that.
If you notice, the Result Table has Start and End Date columns. Let's just say I get the start and end dates via a message box that pops up when the data is imported, and the value is held in a variable. I think there are dates in my real data, but I am still trying to verify this.
So how do I do the comparison? Pseudocode for the logic I need:
• For each routing code:
    Compare Levels 1-5
    If the routing code is the same but Levels 1-5 are not the same:
        Fill out the end date of the old record
        Fill out the start date of the new record
This idea of comparing two records and filling out a date is quite prevalent in my organization, but I haven't found a way of creating logic that consistently works, so any help or suggestions would be appreciated.
(Attached as images: Old Table, New Table, and Result Table.)
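Not a full answer, but here is a rough sketch of the "close out the old record" step as an Access update query, under a few assumptions: the field names are RoutingCode, Level1 through Level5, StartDate, and EndDate (adjust to your real names), the date captured from the message box has been stored in a parameter called dtImport, and only records with no end date yet should be closed. Nz() is used so that a level that is Null in one table but filled in the other still counts as a change:

UPDATE tblMain INNER JOIN tblTemp
    ON tblMain.RoutingCode = tblTemp.RoutingCode
SET tblMain.EndDate = [dtImport]
WHERE tblMain.EndDate Is Null
  AND (Nz(tblMain.Level1,'') <> Nz(tblTemp.Level1,'')
    OR Nz(tblMain.Level2,'') <> Nz(tblTemp.Level2,'')
    OR Nz(tblMain.Level3,'') <> Nz(tblTemp.Level3,'')
    OR Nz(tblMain.Level4,'') <> Nz(tblTemp.Level4,'')
    OR Nz(tblMain.Level5,'') <> Nz(tblTemp.Level5,''));

The start date of the new record can then be filled in by the append query itself, by supplying the same dtImport value for StartDate in its field list.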

Dynamically creating a pivot table using fuzzy matching

So, I'm constantly being given data in new and different formats. I'm on a crusade to get my workplace to standardize data for easy use, and if I manage to convince the powers that be to standardize, this problem becomes entirely moot. Until then, I have the following problem:
I get data in a variety of ways. Sometimes my gross sales column is called total sales; sometimes it's gross sales before discounts, total sales before discounts, Gross_Sales, etc. Discounts, deductions, exempt amounts, and so on form another column, and so forth. I'd like to be able to do the following:
1) Figure out what columns I want,
2) Turn those columns into a pivot table.
For part 1, I have two options, and I'm wondering if there are any more. The first is to use Microsoft's fuzzy-matching add-in to help me match; I'd have a separate tab dedicated to fuzzy matching each column I need. The second is to just generate a long list of all the variants, test each one until I find a hit, assign it, and move on to testing the next one.
The second part is turning all of this into a pivot table. The resources I have so far are https://www.thespreadsheetguru.com/blog/2014/9/27/vba-guide-excel-pivot-tables and How to Create a Pivot Table in VBA.
Is there a better method? Is there another way?
Edit: Slightly better method: grab the data columns, place them into a table, and pivot everything off of that table. It removes the need to re-create pivot tables; I just need to move the data over.
Having the same problem, I use a mix of your two methods.
My data consists of a bunch of logs for rejected x-ray images, where the reject reason is a free-text field. My solution was to create a table where the first column contains my desired output categories, and each subsequent column contains a different variation of that category.
For example, a row might contain (with column one, the output, as the first entry):
Positioning, POS, Positioning Error, Patient Positioning
Note that these are all fairly different from each other. This is where the fuzzy matching comes in: it is used to capture all the smaller differences and misspellings around those other columns. When the fuzzy matching section decides a given reason matches a column's entry, that reason is replaced with the appropriate desired output from column 1 of the table. In my example, a reason of 'Possitioning Err' [sic] would match column 3 (Positioning Error) and then get converted to Positioning.
Then wash, rinse, repeat over the rest of your data as needed. This approach was super useful and fairly flexible in helping standardize my data. It was also computationally more expensive, but you'd only need to run the matching portion once, I guess.
As for the actual mechanics of doing this: I use Excel 2010, so there's no built-in functionality. I run the fuzzy-matching code on a temporary worksheet until the best percentage matches are found, and then overwrite the actual source data afterwards.

One column per report page

I have a table in SQL with 4 columns and 20 rows. I want to split this data into sets of four pages: Page 1 has the first 10 rows of the first column, Page 2 has the first 10 rows of the second column, etc.
After every four pages, this pattern repeats, showing the next 10 rows from the first column, and so on. How can I arrange this?
I could arrange the data into another temporary table with just one column in its schema, and then read that single column into my report. But can I instead do this directly, without an intermediary table?
The intermediary table sounds like the best solution to me. I'd just write a custom SQL command in Crystal's Database Expert to arrange the data as you see fit.
You could in theory pull this off with repeating subreports in some manner of repeating header, but it would be much less work to have SQL properly format the incoming data for you.
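For what it's worth, here is a rough sketch of what that custom SQL command could look like, assuming SQL Server-style syntax, a source table called MyTable with columns Col1 through Col4, and a RowNum column that numbers the 20 rows from 1 (if there isn't one, ROW_NUMBER() over whatever ordering you use will do). The PageNo expression cycles through the four columns for each block of 10 rows:

SELECT ((u.RowNum - 1) / 10) * 4 + u.ColNo AS PageNo,
       u.RowNum,
       u.Val
FROM (
    SELECT RowNum, 1 AS ColNo, Col1 AS Val FROM MyTable
    UNION ALL SELECT RowNum, 2, Col2 FROM MyTable
    UNION ALL SELECT RowNum, 3, Col3 FROM MyTable
    UNION ALL SELECT RowNum, 4, Col4 FROM MyTable
) AS u
ORDER BY PageNo, u.RowNum;

Rows 1-10 of Col1 get PageNo 1, rows 1-10 of Col2 get PageNo 2, and so on, with rows 11-20 of Col1 starting again at PageNo 5. In the report you would then group on PageNo, put Val in the details, and set "New Page After" on the group footer so each group starts its own page.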

How to apply a single section across multiple columns in Business Intelligence

I do a lot of reporting out of our Electronic Health Record using a Business Objects product, and one thing I run into frequently is records for which most of the columns are the same, but a few may have multiple different values.
For instance, a report I'm working on has 8 columns: mostly static information about the patient/encounter, some lab values, and a column for the consulting physician. All the columns will have only a single value per patient/encounter, except for consulting physician, which may have multiple. I'd like to somehow set the table to show only a single row for the data that is unchanged, so readers don't end up seeing the FIN, MRN, and lab values over and over.
However, as far as I've been able to tell with my fiddling around, I can only apply a section or break to a single column. Creating multiple sections or breaks nests them. Does anybody know of a way to treat multiple columns as sort of a composite section?
edit: I did try pulling the consulting physician column out into its own table and then setting the room number as a section, but it still caused repeated rows of the other data for any encounter that had multiple consulting physicians.
Additional edit: As requested, here's a mockup of approximately what I'd like to see. This is mostly how it looks already when I tell BO to use the room number (the number in blue at the top left of each row) as a section; however, in the case of the third room, it would repeat the information in the first 5 columns for each consulting physician listed.
There are a couple of ways to do it, but putting breaks on each column is what I would do.
So, starting from "FIN" and working to "Attending", add a break on each column. It will add a summary row for each, so it will look like:
Then select the summary rows, right-click, and Delete:

Summing different parts of a column in SQL

I have a database extract in Excel and want to create a custom value in Tableau using their calculated field feature, which I believe is SQL-based.
Basically I have a large number of feeds, which all show up a different number of times in a column. For example:
feed 1
feed 1
feed 2
feed 3
feed 4
feed 4
feed 4
And I want to have a sum for feed 1, feed 2, and feed 4. But in my actual DB there are about 100 feeds, all with different numbers of appearances. I'm having trouble finding a good way to do this, if there even is one. Any help or direction would be appreciated!
I'm assuming that your list is a single column and you need a count of the number of occurrences of each feed. For the sake of example, since column and table names were not supplied, let's call them colname and tablename.
select colname, count(*) as Ct from tablename group by colname
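If what you actually need is a total of an amount column per feed rather than a count of rows, the same grouping works with sum; here amountcol is just a placeholder for whatever your amount column is called:

select colname, sum(amountcol) as Total from tablename group by colname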
It would be easier to give an exact answer if you posted a small simplified subset of your spreadsheet, but assuming you have a column called "feed_name" which takes on values like "feed 1", "feed 2", etc. depending on the row, the feed_name column should be a discrete dimension in Tableau.
Then just put the feed_name pill on a shelf, say the row shelf. And put the "Number of Records" field on another shelf, say the column shelf.
You don't need to write SQL to do this (or most tasks) in Tableau. It helps to understand SQL concepts, and it's very helpful to drop down to the SQL level when needed to solve tricky issues. But for most situations, you can just interactively explore the data by moving fields around and writing some simple calculations, and let Tableau take care of generating the SQL necessary to retrieve the data needed to build the visualization you requested.
Tableau supports SQL and some NoSQL data sources, along with some cubes too. It does that quite well and in multiple ways. You can just work more quickly and efficiently by using Tableau's visual manipulations in most cases, and then drop to the lower level of detail when needed. It just takes getting used to how Tableau operates.