Adding multiple columns of a pandas data frame into a new column

I ran some code that produces a 1 if something happened and a 0 if it did not. The results are stored in 8 separate columns (600_COUNT, 800_COUNT, etc.), which Visual Studio shows as a list type. I then added the 8 columns together and stored the result in a new column titled SUM_OVER90th. However, the value in SUM_OVER90th appears as 11111.0 instead of 5. I tried a few different scripts, all of which give a similar result, and I'm not sure why I can't add these columns together. Thanks for any advice on what I might be doing wrong!
COLS_TO_ADD = ['600_COUNT','800_COUNT','1000_COUNT','1200_COUNT','1400_COUNT','1600_COUNT','1800_COUNT','2000_COUNT']
Q_ETL_sub['SUM_OVER90th'] = Q_ETL_sub[COLS_TO_ADD].sum(axis=1)

One possible reason is that the columns you are adding are strings, not numbers. You can verify this with
Q_ETL_sub.dtypes
and if they are not numeric, try converting them to integers with
Q_ETL_sub[COLS_TO_ADD] = Q_ETL_sub[COLS_TO_ADD].astype(int)
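A small reproduction of the check and the fix, assuming the flags are stored as strings (the sample data below is made up; only the column names come from the question). It uses pd.to_numeric as an alternative to astype(int), which may help if some values look like '1.0' rather than '1'.

```python
import pandas as pd

# Hypothetical sample data: 0/1 flags stored as strings ('object' dtype),
# which is one way the 8 count columns can end up non-numeric.
COLS_TO_ADD = ['600_COUNT', '800_COUNT', '1000_COUNT', '1200_COUNT',
               '1400_COUNT', '1600_COUNT', '1800_COUNT', '2000_COUNT']
Q_ETL_sub = pd.DataFrame(
    [['1', '1', '1', '1', '1', '0', '0', '0'],
     ['0', '1', '0', '1', '0', '0', '0', '0']],
    columns=COLS_TO_ADD,
)

# Step 1: inspect the types; 'object' here means the cells hold strings.
print(Q_ETL_sub[COLS_TO_ADD].dtypes)

# Step 2: convert each column to a numeric dtype.
# pd.to_numeric raises an error on values it cannot parse.
Q_ETL_sub[COLS_TO_ADD] = Q_ETL_sub[COLS_TO_ADD].apply(pd.to_numeric)

# Step 3: the row-wise sum now adds the flags instead of joining strings.
Q_ETL_sub['SUM_OVER90th'] = Q_ETL_sub[COLS_TO_ADD].sum(axis=1)
print(Q_ETL_sub['SUM_OVER90th'].tolist())  # [5, 2]
```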

Related

Is there a way to use variables in VBA to identify MS Access report fields?

I am not a programmer, but have been tasked with doing this anyway! We are working on a research project that involves testing properties of different samples. I am trying to create a form that will generate a custom report based on what the user chooses. I have multiple text boxes and check boxes that let the user define the query parameters (e.g. the composition of the sample must contain at least 5% component A) and choose what data they want to see in the report (e.g. show pH and color, but not melting point).
I have successfully written code that generates the query and then a report based on that query, but the report defaults to column widths that are generally too big (for example, the pH column is 3 inches wide when it only needs to be about 1 inch). I would like to fix this, but have not been able to figure out how. Some of these fields also contain numbers that are averages of multiple test results, so I would like to limit the number of digits shown and display them as percentages where appropriate. I started by just fixing the column width issue:
I have tried to make a collection of the fields that are included, then loop through the collection and set column widths, but cannot figure out how to identify a field with a variable:
If I know the field name I can do this:
Reports("ReportName")!FieldID.Width = 200
But if I have a collection of names, FieldNames, or a string VariableName, none of these work, giving me an error that FieldNames or VariableName is not a valid field in the report:
Reports("ReportName")!FieldNames(1).Width = 200
Reports("ReportName")![FieldNames(1)].Width = 200
Reports("ReportName")![VariableName].Width = 200
Is there a way to reference a field name with a variable?
Alternatively, I thought there might be a way to loop through all fields and set widths - this would involve looking up a column width for each field, which I thought to do by adding a key to a collection of column widths. But I cannot find a way to do that, something like:
For each Field in Reports("Report")
Field.Width = ColumnWidthCollection(Field)
Next
This hangs up on the Field.Width line, with "invalid procedure call or argument", which brings me back to how to reference a field name with a variable.
Any help would be greatly appreciated!
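Try using the name as an index into the report:
Reports("ReportName")(VariableName).Width = 200
This works because Controls is the default collection of a report, so the line above is shorthand for Reports("ReportName").Controls(VariableName).Width = 200. The same form accepts an item from your collection, e.g. Reports("ReportName")(FieldNames(1)).Width = 200. Note that Width is measured in twips (1440 twips = 1 inch), so a 1-inch column needs a width of 1440.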

Postgres - split number and letter doesn't fill column

I have received help with splitting a column that contains a number and a letter.
In the SQL script it all works perfectly: it runs to completion with no errors.
But the columns themselves don't get filled.
I have tried creating the columns in advance, both as text and as integer, but they still don't get filled. The SQL query itself runs fine, but in reality the columns stay empty. What is wrong?
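Your question is not completely clear, but it sounds like you are trying to take a value from one column of a table, split it, and use the result to update two other columns in the same table.
If that is the case, you should use the SQL UPDATE command instead of SELECT; a SELECT only returns the split values and never writes them to the table.
UPDATE d1_plz_whatever
SET nr = SUBSTRING(hn FROM '^[0-9]+'),
zusatz = SUBSTRING(hn FROM '[a-zA-Z]+$');
If nr was created as an integer column, cast the extracted text, e.g. nr = SUBSTRING(hn FROM '^[0-9]+')::integer, because SUBSTRING returns text.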
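To check what the two patterns extract before running the UPDATE, here is a small sketch of the same logic using Python's re module (the sample house numbers are made up). For these simple patterns, Python's regex syntax behaves the same as PostgreSQL's POSIX regular expressions.

```python
import re

# The same two patterns as the UPDATE statement.
NR_PATTERN = re.compile(r'^[0-9]+')         # leading digits   -> nr
ZUSATZ_PATTERN = re.compile(r'[a-zA-Z]+$')  # trailing letters -> zusatz


def split_house_number(hn):
    """Return (nr, zusatz) for a house number such as '12a'.

    A part with no match comes back as None, mirroring SUBSTRING(... FROM ...)
    returning NULL in Postgres when the pattern does not match.
    """
    nr = NR_PATTERN.search(hn)
    zusatz = ZUSATZ_PATTERN.search(hn)
    return (nr.group(0) if nr else None,
            zusatz.group(0) if zusatz else None)


for hn in ['12a', '7', '104bc']:
    print(hn, split_house_number(hn))
```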

Searching for a value in a linked table in PowerPivot

I have a PowerPivot table that has a column of IDs and a linked table that contains a set of specific IDs that I want to use to create an indicator variable which I can use to sort on in existing tables and charts. Essentially I want:
If the value in column EpisodeID is found anywhere in LostEpisodes[LostID], then return the value "1", otherwise "0".
LostEpisodes is the linked table and LostID is the column that contains the subset of IDs I want to be able to sort on.
I have tried using =IF(VALUES(LostEpisodes[LostID])=[EpisodeID],1,0) but got an error. Is my syntax wrong or should I be using a different approach? Seems simple enough, but I am new to PowerPivot and DAX.
Thanks
OK, so I have found an answer that works and wanted to share it. Others may have more elegant solutions, but this worked. This is where I miss Excel's MATCH function.
I have a linked table called LostEpisodes which contains 2 columns, EpisodeID and Lost (Lost always contains the value 1, since every row is a lost episode). For my purposes I am entering the episode IDs manually, as there are only a few. EpisodeID is also in the main table and is the column I am matching on.
I started with a new column named LostLookup, using the following formula:
=LOOKUPVALUE(LostEpisodes[Lost],LostEpisodes[EpisodeID],[EpisodeID])
I then created a new column with the following formula:
=if(ISBLANK([LostLookup]),"NotLost","Lost")
This creates the indicator variable I can now use in pivot tables and charts. I have tested it and it works great.
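For readers who think in pandas rather than DAX, the same two steps (look up the Lost flag, then treat a blank lookup as "NotLost") can be sketched as a left join followed by a missing-value check. This is a pandas analogue, not DAX, and the tables and IDs below are hypothetical.

```python
import pandas as pd

# Hypothetical data: the main table and the small linked table of lost episodes.
episodes = pd.DataFrame({'EpisodeID': [101, 102, 103, 104]})
lost_episodes = pd.DataFrame({'EpisodeID': [102, 104], 'Lost': [1, 1]})

# Like LOOKUPVALUE: bring over the Lost flag by EpisodeID.
# A left join keeps every episode; unmatched rows get NaN (the "blank").
merged = episodes.merge(lost_episodes, on='EpisodeID', how='left')

# Like IF(ISBLANK(...)): a missing lookup value means the episode is not lost.
merged['LostIndicator'] = merged['Lost'].isna().map({True: 'NotLost', False: 'Lost'})

print(merged[['EpisodeID', 'LostIndicator']])
```

A merge duplicates rows when an ID appears more than once in the lookup table, so keep the IDs there unique; if only membership matters, episodes['EpisodeID'].isin(lost_episodes['EpisodeID']) gives the same True/False test directly.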
Hope this makes sense!

Retrieve results from a batch of SQL queries in Pentaho or Postgres?

I'm still relatively new to SQL and Pentaho.
I've pulled a table with two different IDs and need to run a query for each specific instance.
For example,
SELECT *
FROM Table
WHERE RecordA = 'value in column A'
AND RecordB = 'value in column B'
I need the results back, either appended to new columns in the original table or part of their own text file output.
I was initially looking at using a formula for this inside Pentaho, but couldn't quite figure it out. Since I have the query written, I put it into Excel and generated the concatenated results (a string of 350 or so queries that I need to run). I'm just not sure how to run them: I tried the Execute SQL Script step inside Pentaho, but it doesn't seem to produce any output.
Any direction would be useful. I've searched a little but have come up short so far, possibly because I am still pretty new to this platform.
You can accomplish this in a lot of ways, with a "Database Lookup" step for example, but I usually do it in a fairly simple way. Here is an example for your tests; I hope it helps.
The idea here is to have two Table input steps. The first one fetches the IDs we want to look at; for example, you may use a SQL query similar to the note on the left. The result will be a one-column stream of rows.
Next we have a second Table input that reads the rows it receives and executes its query once for each row (the "Insert data from step" and "Execute for each row?" options in its dialog). I'll add a screenshot with the options that I selected.
What it does is replace each '?' placeholder with the incoming data. If you need two columns, use two '?' placeholders, but remember that the first one is replaced with the first column and the second one with the second column.
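Outside Pentaho, the same per-row, positional '?' substitution can be sketched with Python's built-in sqlite3 module, which also uses '?' placeholders. The table name records, its columns and the sample rows are hypothetical; this illustrates the idea, not the Table input step itself.

```python
import sqlite3

# Hypothetical stand-in for the source database table.
conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE records (RecordA TEXT, RecordB TEXT, detail TEXT)')
conn.executemany('INSERT INTO records VALUES (?, ?, ?)', [
    ('a1', 'b1', 'first'),
    ('a1', 'b2', 'second'),
    ('a2', 'b1', 'third'),
])

# Plays the role of the first Table input: a stream of (RecordA, RecordB) pairs.
id_pairs = [('a1', 'b1'), ('a2', 'b1'), ('a9', 'b9')]

# Plays the role of the second Table input: one parameterized query run per row.
# The '?' markers are filled positionally: first column, then second column.
query = 'SELECT detail FROM records WHERE RecordA = ? AND RecordB = ?'
results = []
for a, b in id_pairs:
    details = [row[0] for row in conn.execute(query, (a, b))]
    results.append((a, b, details))

for row in results:
    print(row)
conn.close()
```

Compared with concatenating 350 queries in Excel, this keeps a single statement and passes the values separately, so quotes inside the values cannot break the SQL.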
And you are good to go. Test it a couple of times and good luck.
And the config for the second table input.

Reporting Services - Two filters on the same chart Category Group?

I have sales data that I'd like to plot on my chart. However, at a specific point in time, we had a change taking place I'd like to ensure is clearly visible in the chart, preferably by dividing the sales data (which is stored in a single SQL Server column) into two different chunks, which would allow me to then treat them as different data series.
I used to solve this in Excel by storing the post-event data in a different column (simply by dragging it there), which let me treat it as a separate series (the blue and green lines in the chart below; the red and orange lines are the pre-event and post-event averages):
I'd like to reproduce this effect in SSRS, but am not sure how to tackle it. I've tried using an approach where I added two category groups, both pointing to the date-time column, and applying filters to them (one <= the cutoff date, the other >=).
I then added my sales data twice, with the idea I could somehow connect them to the individual category groups, but that does not seem possible.
Has anyone tried anything like this before, or would have a different approach to achieve what I'm trying to get?
Thanks!
I managed to get this to work, and figured I'd share how to do it.
My dataset contains a field called DATEKEY, which stores the date in the format YYYYMMDD. It's possible to use this field in an expression and evaluate the date for each row. If the expression evaluates to true, we display the value; if not, we display a blank string.
To show the values up to and including the cutoff date, the expression would be:
=IIF(Fields!DATEKEY.Value <= 20130601, Avg(Fields!My_NUMBER.Value), "")
The second series can then be made by reversing the comparison:
=IIF(Fields!DATEKEY.Value >= 20130601, Avg(Fields!My_NUMBER.Value), "")
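What the two IIF expressions produce per row can be sketched in Python (the rows are hypothetical, and None stands in for the blank value). Because DATEKEY is a YYYYMMDD integer, numeric comparison matches date order; and since one expression uses <= and the other >=, the cutoff row appears in both series, so the two lines meet at that point.

```python
CUTOFF = 20130601  # DATEKEY of the change, YYYYMMDD as in the expressions above

# Hypothetical (DATEKEY, value) rows.
rows = [(20130501, 10.0), (20130601, 12.0), (20130701, 15.0)]


def split_series(rows, cutoff):
    """Return two aligned lists; None marks a point a series should skip."""
    before = [value if datekey <= cutoff else None for datekey, value in rows]
    after = [value if datekey >= cutoff else None for datekey, value in rows]
    return before, after


before, after = split_series(rows, CUTOFF)
print(before)  # [10.0, 12.0, None]
print(after)   # [None, 12.0, 15.0]
```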
The graph then looks like this: