How can I add the family name from a second data frame to the matching rows of another data frame, so I don't have to enter it one by one?

I want to add the family name to each scientific name in a data frame in which every scientific name is repeated several times, but the family names are in a separate file. I want to join the two so that the family name is filled in for every scientific name automatically.
https://i.stack.imgur.com/Chw5U.png
https://i.stack.imgur.com/Fb2Dc.png
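The question does not say which language is in use, so the following is only a minimal sketch assuming Python with pandas. The column names (`scientific_name`, `family`, `count`) and the sample rows are hypothetical, since the real ones are only visible in the screenshots. It shows a left join that copies the family onto every repeated scientific name.

```python
import pandas as pd

# Hypothetical sample data; the real column names may differ.
records = pd.DataFrame({
    "scientific_name": ["Quercus robur", "Pinus sylvestris", "Quercus robur", "Rosa canina"],
    "count": [3, 5, 2, 7],
})
families = pd.DataFrame({
    "scientific_name": ["Quercus robur", "Pinus sylvestris"],
    "family": ["Fagaceae", "Pinaceae"],
})

# A left join keeps every row of `records` and copies the matching family
# onto each repeat. Names missing from the lookup table get NaN.
with_family = records.merge(families, on="scientific_name", how="left")
print(with_family)
```

Rows whose scientific name is absent from the lookup table are kept with an empty family, which makes gaps in the family file easy to spot.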

Related

How to make a dynamic tableau x-axis that gets changed when a different dimension is selected as a filter?

I am trying to create a report for my stakeholders in which we want to check the distribution by values of column1, column2 and so on. While I could make a separate tab for each column, I have 55 columns (all numeric, and I will be binning them) that the users want to toggle between to compare distributions and arrive at insights such as "column 43 is more skewed than column 19".
Pet example:
1. Suppose I have data in the format below:
2. And I want to create a Tableau filter that lets the user toggle between continents and countries and get their individual distributions in the same view. I want to keep the measure consistent (population in this example). Something like below:
3. The view should show the distribution by continent when continent is selected and by country when country is selected.
I know that if I wanted to toggle between different measures for the same dimension I would need to create parameters, but this is the opposite (same measure, different dimensions), and I am not able to figure it out.
Any comment/help will be much appreciated.
I'm afraid you'd still need to use parameters. There is no equivalent of Measure Names for dimensions so you'll have to create a parameter (I called it Dimension Switch) with Continent, Country, etc. as options and a calculated field that will look something like this:
CASE [Dimension Switch]
WHEN "Continent" THEN [Continent]
WHEN "Country" THEN [Country]
END
This calculated field will take on the values of whichever column you have selected in the Dimension Switch parameter.
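To illustrate what the calculated field does outside Tableau, here is a minimal Python sketch. The rows, the population figures and the function name `population_by` are made up for illustration. The switch value picks which column the measure is grouped by, just as the CASE expression does.

```python
from collections import defaultdict

# Hypothetical rows standing in for the data source; figures are invented.
rows = [
    {"Continent": "Asia", "Country": "India", "Population": 1380},
    {"Continent": "Asia", "Country": "Japan", "Population": 126},
    {"Continent": "Europe", "Country": "France", "Population": 67},
]

def population_by(rows, dimension_switch):
    """Sum Population grouped by whichever column the switch names."""
    totals = defaultdict(int)
    for row in rows:
        # Same role as the CASE expression: pick the column named by the parameter.
        totals[row[dimension_switch]] += row["Population"]
    return dict(totals)

print(population_by(rows, "Continent"))
print(population_by(rows, "Country"))
```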

Single Column name splitting to multiple columns with data

I am analyzing inverter data from a power plant. There are more than 10 inverters, and each inverter has 3 parameters that need to be analyzed: energy generated per interval, AC power (P_AC) and DC power (P_DC). The inverters are numbered like 17.02, 22.03, etc. The data is recorded at a time step of 5 minutes. After downloading the data as a CSV file, there is only one column in the file. The column name contains the numbers of all the inverters and their parameter names separated by ';', and the data at each time step is likewise in a single cell separated by ';'. I want to analyse all the parameters of all the inverters, with each parameter of every inverter in a separate column. Can somebody help me split this up? I also want the columns sorted in increasing order of inverter number. Here is the link to the actual CSV file: https://drive.google.com/file/d/1Rp54DEarzFUGm2oU5Bfkl3karbUYYwcd/view?usp=sharing
https://drive.google.com/file/d/12InL3N-ZMMODGWVUYn_8nTwPgAQtSBzq/view?usp=sharing
In the data frame above, you can see that every column name has the project code 'SM10046 Akadyr Ext', then the inverter number 'INV 17.02', then the parameter name 'Energy generated per interval [kWh]', and lastly the parameter code 'E_INT'. I want the project code removed so that only the inverter number and the parameter code remain in the column name. Also, the inverters should appear in serial order.
Essentially you have a multitude of columns, and from your description you need to sort and analyze the data from each plant?
If you need permanent storage of data, I would use SQLite or similar, and convert each plant into a row with a key holding plant ID.
Like this:
2020-07-28 13:33:09;A1;A2;A3;B1;B2;B3
turned into something like this (now in a database, 5 fields per record):
2020-07-28 13:33:09;A;A1;A2;A3
2020-07-28 13:33:09;B;B1;B2;B3
My go-to tool for this would be a scripting language like AutoIT3, Perl or Python, which makes splitting lines and connecting to SQLite trivial.
If you just need real-time sorting, reporting, etc., AWK is a perfect tool for this, since you can create sorted arrays very easily. (Perl and Python are of course alternatives here as well.)
It would be useful if you could provide an actual (trivial) example of what you expect the output to be.
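As a rough illustration of the Python route, here is a minimal sketch using only the standard library. The sample text, the exact header layout (project code, `INV <number>`, parameter name, parameter code separated by spaces) and the helper names `short_name` and `sort_key` are assumptions, since the real file is only linked. The sketch splits on ';', shortens each header to inverter number plus parameter code, and reorders the columns by inverter number.

```python
import csv
import io
import re

# Hypothetical sample mimicking the described export:
# one ';'-separated header line, then one line per time step.
raw = (
    "Time;"
    "SM10046 Akadyr Ext INV 22.03 AC Power [kW] P_AC;"
    "SM10046 Akadyr Ext INV 17.02 Energy generated per interval [kWh] E_INT;"
    "SM10046 Akadyr Ext INV 17.02 AC Power [kW] P_AC\n"
    "2020-07-28 13:30:00;41.0;3.4;40.2\n"
    "2020-07-28 13:35:00;42.5;3.5;41.9\n"
)

def short_name(header):
    """Reduce a full header to 'INV <number> <code>'; leave other headers unchanged."""
    match = re.search(r"INV\s+(\d+\.\d+).*?(\w+)\s*$", header)
    return f"INV {match.group(1)} {match.group(2)}" if match else header

def sort_key(name):
    """Order by inverter number, then parameter code; other columns (e.g. Time) come first."""
    match = re.match(r"INV (\d+)\.(\d+) (\w+)$", name)
    if not match:
        return (0, 0, 0, name)
    return (1, int(match.group(1)), int(match.group(2)), match.group(3))

reader = csv.reader(io.StringIO(raw), delimiter=";")
header = [short_name(h) for h in next(reader)]

# Compute the column order once, then apply it to the header and every data row.
order = sorted(range(len(header)), key=lambda i: sort_key(header[i]))
columns = [header[i] for i in order]
table = [[row[i] for i in order] for row in reader]

print(columns)
print(table[0])
```

For the real file, `io.StringIO(raw)` would be replaced by an opened file handle, and the regular expressions may need adjusting to the actual header layout.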

Is there any way to exclude columns from a source file/table in Pentaho using "like" or any other function?

I have a CSV file with more than 700 columns. I want only 175 of them to be inserted into an RDBMS table or a flat file using Pentaho (PDI). The source CSV file has variable columns, i.e. columns can be added or removed over time, but certain keywords in the column names remain constant throughout. I have the list of keywords that identify the column names to be excluded, e.g. starts_with("avgbal_"), starts_with("emi_"), starts_with("delinq_prin_"), starts_with("total_utilization_"), starts_with("min_overdue_"), starts_with("payment_received_")
Any column whose name contains one of the above keywords has to be excluded and should not be passed on to my RDBMS table or flat file. Is there any way to remove these columns by writing an SQL query in PDI? Selecting the specific 175 columns is not possible, as they are variable in nature.
I think your case is a good fit for metadata injection; you can refer to the example below:
https://help.pentaho.com/Documentation/7.1/0L0/0Y0/0K0/ETL_Metadata_Injection
There are two things you need to be careful about:
Maintain the list of columns you need to push in.
Because the column names change, you may also run into problems with the valid columns you want to import or work with. To avoid this, regenerate the metadata file on every run so that you are sure which column names you want to push out from the flat file.
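One way to regenerate that column list on every run is a small script outside PDI. The following is only a sketch in Python, not a PDI feature. The sample header and the function name `columns_to_keep` are hypothetical, while the prefixes are the ones listed in the question. The resulting list could then feed the metadata injection step.

```python
import csv
import io

# Prefixes taken from the question; str.startswith accepts a tuple of prefixes.
EXCLUDED_PREFIXES = (
    "avgbal_", "emi_", "delinq_prin_", "total_utilization_",
    "min_overdue_", "payment_received_",
)

def columns_to_keep(header):
    """Return the column names that do not start with any excluded prefix."""
    return [name for name in header if not name.startswith(EXCLUDED_PREFIXES)]

# Hypothetical header row; in practice, open the real CSV file instead.
sample = io.StringIO("customer_id,avgbal_3m,emi_amount,city,min_overdue_6m\n1,10,20,Pune,0\n")
header = next(csv.reader(sample))
keep = columns_to_keep(header)
print(keep)
```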

Sheet.js how does encode_col and decode_col work?

How do decode_col and encode_col know which sheet to target if you never pass one in?
encode_col / decode_col convert between 0-indexed columns and column names.
If I give it a column name like "foobar", and that column exists in different sheets or in wholly different files I'm processing, how will these two functions know where that column is?
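Reading "column names" as A1-style column letters ("A", "B", ..., "AA"), the conversion described above is pure arithmetic and needs no sheet at all. The following Python sketch re-implements that arithmetic for illustration only. It is not SheetJS's source, and it assumes the 0-indexed convention stated in the question.

```python
def encode_col(index):
    """0-indexed column number -> A1-style letters (0 -> 'A', 25 -> 'Z', 26 -> 'AA')."""
    if index < 0:
        raise ValueError("column index must be non-negative")
    letters = ""
    n = index + 1  # work in 1-based bijective base 26
    while n > 0:
        n, rem = divmod(n - 1, 26)
        letters = chr(ord("A") + rem) + letters
    return letters

def decode_col(letters):
    """A1-style letters -> 0-indexed column number ('A' -> 0, 'AA' -> 26)."""
    n = 0
    for ch in letters.upper():
        n = n * 26 + (ord(ch) - ord("A") + 1)
    return n - 1

print(encode_col(0), encode_col(26), decode_col("AA"))
```

Under this reading, a string such as "foobar" would be treated as column letters rather than looked up as a header in any particular sheet.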

how to display parent record and children from multiple tables

I have three tables: 1 parent table with 2 child tables in 1-to-many relationships.
table 1: People
table 2: Phone numbers (any number of records)
table 3: email addresses (any number of records)
I would like a report looking like this:
Person1:
First_Name-------Last_Name
phone1:------12345----Home
phone2:------54321----work
mail1:-------first#mail.com----work
mail2:-------first#mail.com----work
mail3:-------first#mail.com----work
Person2:
-------First_Name-------Last_Name
phone1:------12345----Home
phone2:------54321----work
mail1:-------first#mail.com----work
mail2:-------first#mail.com----work
mail3:-------first#mail.com----work
I would very much like to do this using reporting services.
Edit: I know how to link all the tables with left joins; I just have no idea how to get this layout done in a report.
OK, here's a sort of summary of how you need to approach it. I can't give you a step-by-step because it would be impossibly long, but this should get you started. (Also note that I'm doing this from memory):
1. You will need three datasets, one each for people, phone numbers, and email addresses. I am going to assume that the phone numbers and email addresses have some kind of ID field in common, some way to tie people to phones and to email addresses. You will need that field in your datasets.
2. You don't need to do any joins in your SQL: the linking between the three datasets will be done within the report.
3. Put a Matrix on your report. Have it draw from the People dataset. Put the first and last name in the detail row of the Matrix.
4. Add a new detail row to your Matrix, under your first and last name.
5. Merge the cells in your new detail row so that you have one very wide cell.
6. Embed (drag and drop) a new Matrix in the single wide cell in your new detail row.
7. Link your matrix to the Phones dataset. Set up a record filter in the matrix to limit the records to those that match the ID field in the parent matrix.
8. Set up the Phones as you want in the embedded child Matrix.
9. Repeat steps 3-7 with the Email Addresses dataset.
You don't actually need to do any grouping (unless you want to include subtotals of some kind), because you're not actually grouping anything: you're just displaying a list with embedded sub-lists.
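The "list with embedded sub-lists" idea can be sketched outside Reporting Services as well. The Python below is only a conceptual illustration, not SSRS configuration. The sample records, the field names (`id`, `person_id`, `kind`) and the helper names are hypothetical. Each child dataset is filtered by the parent's ID, mirroring the record filter in the embedded matrix.

```python
# Three separate "datasets", with no SQL join between them.
people = [
    {"id": 1, "first_name": "Ann", "last_name": "Lee"},
    {"id": 2, "first_name": "Bob", "last_name": "Ray"},
]
phones = [
    {"person_id": 1, "number": "12345", "kind": "Home"},
    {"person_id": 1, "number": "54321", "kind": "Work"},
    {"person_id": 2, "number": "67890", "kind": "Home"},
]
emails = [
    {"person_id": 1, "address": "ann@example.com", "kind": "Work"},
]

def children_of(rows, person_id):
    """Keep only the child rows whose ID matches the parent row."""
    return [row for row in rows if row["person_id"] == person_id]

def render_report(people, phones, emails):
    """Build one block per person with the phone and email sub-lists nested beneath."""
    lines = []
    for person in people:
        lines.append(f"{person['first_name']} {person['last_name']}")
        for n, phone in enumerate(children_of(phones, person["id"]), start=1):
            lines.append(f"  phone{n}: {phone['number']} {phone['kind']}")
        for n, email in enumerate(children_of(emails, person["id"]), start=1):
            lines.append(f"  mail{n}: {email['address']} {email['kind']}")
    return lines

report = render_report(people, phones, emails)
print("\n".join(report))
```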