I have 2 sheets, A and B.
A includes rows with a given slug, and a few 'metrics' columns.
Slug | Views | Shares
aaa | 10 | 5
bbb | 25 | 2
ccc | 5 | 0
Sheet B has a huge list of slugs (some of which might not be contained in sheet A).
I basically want to "search" sheet A by slug value, and "fill in" those corresponding values.
Here's a sample output of sheet B:
Slug | Views | Shares
xxx | - | -
bbb | 25 | 2
aaa | 10 | 5
ddd | - | -
eee | - | -
I will eventually perform some math on them when moving them over, but this is the start I need before I can do that.
Use VLOOKUP.
For example, to get the 25 for bbb in sheet B, the formula would be:
=VLOOKUP(A2,A!$A$2:$C$4,2,FALSE)
The first argument A2 is bbb in sheet B, the second argument is the table in sheet A, the third argument says to return column 2 (the Views column), and the final argument indicates exact match.
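As a rough illustration of what the exact-match lookup does, here is a minimal Python sketch of the same logic. It is not Excel code; the names `sheet_a`, `sheet_b_slugs` and `lookup` are made up for this example, and the "-" placeholder for missing slugs is taken from the sample output in the question.

```python
# Sheet A as rows of (slug, views, shares); data copied from the question.
sheet_a = [("aaa", 10, 5), ("bbb", 25, 2), ("ccc", 5, 0)]

# Slugs listed in sheet B, some of which do not exist in sheet A.
sheet_b_slugs = ["xxx", "bbb", "aaa", "ddd", "eee"]

# Index sheet A by slug so each lookup is an exact match, like FALSE in VLOOKUP.
lookup = {slug: (views, shares) for slug, views, shares in sheet_a}

# Fill in sheet B; slugs missing from sheet A get "-" in both metric columns.
sheet_b = [(slug, *lookup.get(slug, ("-", "-"))) for slug in sheet_b_slugs]

for row in sheet_b:
    print(row)
```

Note that in Excel an unmatched VLOOKUP returns #N/A rather than "-", so the placeholder here is only a convenience of the sketch.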
Given the following dataframe, I would like to add a fifth column that contains a list of column headers when a certain condition is met on a row, but only for a range of dynamically selected columns (i.e. a subset of the dataframe).
| North | South | East | West |
|-------|-------|------|------|
| 8 | 1 | 8 | 6 |
| 4 | 4 | 8 | 4 |
| 1 | 1 | 1 | 2 |
| 7 | 3 | 7 | 8 |
For instance, given that the inner two columns ('South', 'East') are selected and that column headers are to be returned when the row contains the value of one (1), the expected output would look like this:
| Headers       |
|---------------|
| [South]       |
|               |
| [South, East] |
|               |
The following one-liner manages to return column headers for the entire dataframe.
df['Headers'] = df.apply(lambda x: df.columns[x==1].tolist(),axis=1)
I tried adding the dynamic column range condition by using iloc but to no avail. What am I missing?
For reference, these are my two failed attempts (N1 and N2 being column range variables here)
df['Headers'] = df.iloc[N1:N2].apply(lambda x: df.columns[x==1].tolist(),axis=1)
df['Headers'] = df.apply(lambda x: df.iloc[N1:N2].columns[x==1].tolist(),axis=1)
This works:
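import pandas as pd

df = pd.DataFrame({'North': [8, 4, 1, 7], 'South': [1, 4, 1, 3],
                   'East': [8, 8, 1, 7], 'West': [6, 4, 2, 8]})
# Reshape to long format, keeping the original row index
df1 = df.melt(ignore_index=False)
condition1 = df1['variable'] == 'South'
condition2 = df1['variable'] == 'East'
condition3 = df1['value'] == 1
# Keep only the selected columns where the value is 1
df1 = df1.loc[(condition1 | condition2) & condition3]
# Collect the matching column names per original row
df1 = df1.groupby(df1.index)['variable'].apply(list)
# Rows with no match get NaN in the joined 'variable' column
df = df.join(df1)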
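As for what the two failed attempts are missing: df.iloc[N1:N2] slices rows, not columns. Selecting a column range by position needs df.iloc[:, N1:N2]. A minimal sketch of that more direct route, assuming N1=1 and N2=3 pick out 'South' and 'East' (the names `subset` and `mask` are only for illustration):

```python
import pandas as pd

df = pd.DataFrame({'North': [8, 4, 1, 7], 'South': [1, 4, 1, 3],
                   'East': [8, 8, 1, 7], 'West': [6, 4, 2, 8]})

# Assumed column range: positions 1 and 2, i.e. 'South' and 'East'.
N1, N2 = 1, 3

# Note the leading ':' so that iloc slices columns instead of rows.
subset = df.iloc[:, N1:N2]

# Boolean array marking cells equal to 1 within the selected columns.
mask = subset.eq(1).to_numpy()

# For every row, list the headers of the selected columns that matched.
df['Headers'] = [list(subset.columns[row_mask]) for row_mask in mask]

print(df)
```

Unlike the melt version, rows without a match get an empty list here rather than NaN.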
What is the most efficient way to store a variable amount of columns in MS-SQL?
I have a requirement to store a large number (several million) records into a Microsoft SQL server (via c#). Most columns are standard, but certain groups of users will need to add their own custom columns, and record data in them.
The data in each custom column field will not be large, but the number of records with a certain set of custom columns will be in the millions.
I do not know ahead of time what these columns might be (in terms of name or datatype), but I'll need to pull reports based on these columns as efficiently as possible.
What is the most efficient way of storing the new varying columns and data?
1) Entity-Attribute-Value model?
Cons: Efficiency if there's a large number of custom columns (= large number of rows)?
2) An extra table "CustomColumns"?
Storing columnName, Data, Datatype each time an entry has a custom column, for each column.
Cons: A table with a large number of records, perhaps not the most efficient storage.
3) Serialise the extra columns for each record into a single field
Cons: Lookup efficiency, and stored procedures become complicated when running reports based on a custom field.
Any other?
Edit: I think I may be confusing options (1) and (2). What I actually meant was: is the following the best approach?
Entity (User Groups)
id | name | description
-- | ---- | ------------
1 | user group 1 | user group 1
2 | user group 2 | user group 2
Attribute
id | name | type | entityids
-- | ---- | ---- | ---------
1 | att1 | string | 1,2
2 | att2 | int | 2
3 | att3 | string | 1
4 | att4 | numeric | 2
5 | att5 | string | 1
(entityids: what is the best way to do this for 2 user groups using the same attribute?)
Value
id | entityId| attributeId | value
-- | --------| ----------- | -----
1 | 1 | 1 | a
2 | 1 | 2 | 1
3 | 1 | 3 | b
4 | 1 | 3 | c
5 | 1 | 3 | d
6 | 1 | 3 | 75
7 | 1 | 5 | Inches
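To make the three-table layout above concrete, here is a small sketch using Python's built-in sqlite3 module rather than MS-SQL, so the DDL is only indicative. The `entity_attribute` junction table is a hypothetical replacement for the comma-separated entityids column, and all table and column names are assumptions for illustration, not a recommendation.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE entity (id INTEGER PRIMARY KEY, name TEXT, description TEXT);
CREATE TABLE attribute (id INTEGER PRIMARY KEY, name TEXT, type TEXT);
-- Hypothetical junction table: one row per (user group, attribute) pair,
-- instead of a comma-separated "entityids" column.
CREATE TABLE entity_attribute (
    entity_id INTEGER REFERENCES entity(id),
    attribute_id INTEGER REFERENCES attribute(id),
    PRIMARY KEY (entity_id, attribute_id)
);
CREATE TABLE attribute_value (
    id INTEGER PRIMARY KEY,
    entity_id INTEGER REFERENCES entity(id),
    attribute_id INTEGER REFERENCES attribute(id),
    value TEXT
);
""")

# Sample rows taken from the tables in the question.
conn.executemany("INSERT INTO entity VALUES (?, ?, ?)",
                 [(1, "user group 1", "user group 1"),
                  (2, "user group 2", "user group 2")])
conn.executemany("INSERT INTO attribute VALUES (?, ?, ?)",
                 [(1, "att1", "string"), (2, "att2", "int"), (3, "att3", "string")])
conn.executemany("INSERT INTO entity_attribute VALUES (?, ?)",
                 [(1, 1), (2, 1), (2, 2), (1, 3)])
conn.executemany("INSERT INTO attribute_value VALUES (?, ?, ?, ?)",
                 [(1, 1, 1, "a"), (3, 1, 3, "b"), (4, 1, 3, "c"), (5, 1, 3, "d")])

# Report-style query: all values of attribute 'att3' for user group 1.
rows = conn.execute("""
    SELECT v.value
    FROM attribute_value AS v
    JOIN attribute AS a ON a.id = v.attribute_id
    WHERE v.entity_id = ? AND a.name = ?
    ORDER BY v.id
""", (1, "att3")).fetchall()
values = [r[0] for r in rows]
print(values)

# Which user groups share attribute 'att1'?
groups = [r[0] for r in conn.execute("""
    SELECT ea.entity_id
    FROM entity_attribute AS ea
    JOIN attribute AS a ON a.id = ea.attribute_id
    WHERE a.name = 'att1'
    ORDER BY ea.entity_id
""")]
print(groups)
```

Every value is stored as text in this sketch, so typed reporting (sums, ranges) would need a cast driven by attribute.type. That is one of the efficiency costs the question is asking about.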
I am working on an Excel sheet where I need to show the formula that is used in another cell. I have 2 tables. Table one contains:
+-----------+-------+-------+------+
| Parameter | Short | Value | Unit |
| Name | | | |
+-----------+-------+-------+------+
| Diameter | D | 50 | mm |
+-----------+-------+-------+------+
| Wanddikte | T | 5 | mm |
+-----------+-------+-------+------+
| Lengte | L | 200 | mm |
+-----------+-------+-------+------+
And the second table:
+----------------------+-------+-------------+------+-----------------+
| Name | Short | output | Unit | Formula |
+----------------------+-------+-------------+------+-----------------+
| Doorsnede oppervlakt | A1 | 1963,495408 | mm | =0,25*PI()*C3^2 |
+----------------------+-------+-------------+------+-----------------+
| Binnendiameter | ID | 40 | mm | =C3-2*C4 |
+----------------------+-------+-------------+------+-----------------+
| Verfoppervlakt | Averf | 31415,92654 | mm2 | =PI()*C3*C5 |
+----------------------+-------+-------------+------+-----------------+
Now I want to change the last column of the second table. There you see the cell references: C3, C4 and C5.
Those refer to cells in the first table (Value column). But instead of showing C3 (value= 50 in table1) I want to show D (Short in table 1).
The last column in table 2 contains the Excel formula =FORMULATEXT(...), which refers to the output calculation in table 2.
How do I replace the cell references with values from the Short column in the last column of the second table?
1) You could use named ranges.
For example: C3 would be a named range called D. Then in your formula you would write =0,25*PI()*D^2 and FORMULATEXT would show it as requested.
C4 would be a named range called T and C5 a named range called L.
To create the named range, click on the cell you want to name (e.g. C3), then go to the Name Box at the top left and enter the name (e.g. D).
See here: Named ranges
2) Consider having a helper column where you put the following:
'=0,25*PI()*D^2 . Hide the column where you have the FORMULATEXT result and leave the helper column visible. The ' at the start means Excel will not try to evaluate the cell contents.
I think this might appear confusing if you use a simple letter such as D. This is not descriptive of what D actually is and can be confused as a partial cell reference.
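The substitution the question asks for can also be sketched outside Excel. The following Python snippet is only an illustration of the text replacement, not an Excel feature. The mapping `short_names` is built by hand from table 1, and the regular expression is a simplifying assumption that covers plain references like C3 but not absolute ones like $C$3.

```python
import re

# Cell reference -> Short name, taken from table 1 in the question.
short_names = {"C3": "D", "C4": "T", "C5": "L"}

# Simplified pattern for A1-style references: 1-3 capital letters, then digits.
CELL_REF = re.compile(r"\b[A-Z]{1,3}[0-9]+\b")

def label_formula(formula_text, names):
    """Replace known cell references in a formula string with their short names."""
    # References that are not in the mapping are left unchanged.
    return CELL_REF.sub(lambda m: names.get(m.group(0), m.group(0)), formula_text)

for formula in ["=0,25*PI()*C3^2", "=C3-2*C4", "=PI()*C3*C5"]:
    print(formula, "->", label_formula(formula, short_names))
```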
I have a table with an unknown number of columns, and I need to combine all columns after a certain point. Consider the following:
| A | B | C | D | E |
|----|----|---|---|---|
| 24 | 25 | 7 | | |
| 12 | 3 | 4 | | |
| 5 | 5 | 5 | 5 | |
Columns A-C are known, and the information in them correct. But column D to N (an unknown number of columns starting with D) needs to be combined as they are all parts of the same string. How can I combine an unknown number of columns in OpenRefine?
As some columns may have empty cells (the string may be of various lengths) I also need to disregard empty cells.
There is a two step approach to this that should work for you.
From the first column you want to merge (Col D in this case) choose Transpose->Transpose cells across columns into rows
You will be asked to set some options. You'll want to choose 'From Column' D and 'To Column' N. Then choose to transpose into One Column, assign a name to that column, and make sure the option to 'Ignore Blank Cells' is checked (it should be checked by default). Then click Transpose.
You'll get the values that were previously in cols D-N appearing in rows. e.g.
| A | B | C | D | E | F |
|----|----|---|---|---|---|
| 1 | 2 | 3 | 4 | 5 | 6 |
Transposes to:
| A | B | C | new |
|----|----|---|-----|
| 1 | 2 | 3 | 4 |
| | | | 5 |
| | | | 6 |
You can then use the dropdown menu from the head of the 'new' column to choose
Edit cells->Join multi-value cells
You'll be asked what character you want to use to separate the characters in the joined cell. Probably in your use case you can delete the joining character and combine the cells without any joining characters. This will give you:
| A | B | C | new |
|----|----|---|-----|
| 1 | 2 | 3 | 456 |
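The net effect of the two steps (transpose, then join with an empty separator) can be sketched in plain Python. This is not OpenRefine code; the function name `merge_trailing` and the list-of-lists representation of the table are assumptions made for the example.

```python
def merge_trailing(row, start, sep=""):
    """Keep row[:start] and join the non-blank cells of row[start:] into one cell."""
    # Blank cells (empty string or None) are skipped, like 'Ignore Blank Cells'.
    tail = [str(cell) for cell in row[start:] if cell not in ("", None)]
    return row[:start] + [sep.join(tail)]

rows = [
    [1, 2, 3, 4, 5, 6],    # the example row from the answer
    [5, 5, 5, 5, ""],      # one value after column C, one blank
    [24, 25, 7, "", ""],   # nothing after column C
]

# Columns A-C are indexes 0-2, so merging starts at index 3 (column D).
merged = [merge_trailing(row, 3) for row in rows]

for row in merged:
    print(row)
```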
I'm looking for the most efficient way to compare two sets of two columns, thus:
Set 1:
A | B | C |
11_22 | 10 | |
33_44 | 20 | |
55_66 | 30 | |
77_88 | 40 | |
99_00 | 50 | |
Set 2:
J | K |
33_44 | 19 |
99_00 | 47 |
77_88 | 40 |
For each match between column A and J (in this case 33_44, 99_00, and 77_88), column C should display the difference between the adjacent cells in B and K, respectively, with the full amount in column B if no match exists in J:
A | B | C
11_22 | 10 | 10
33_44 | 20 | 1
55_66 | 30 | 30
77_88 | 40 | 0
99_00 | 50 | 3
I'm thinking of creating two multi-dimensional arrays containing values in the ranges (A, B) and (J, K), with a nested loop, but am not sure how to get the result back into column C when a match occurs. Creating a third "result array" and outputting that on a fresh sheet would work too.
It is possible to do a lot with ADO, for example: Excel VBA to match and line up rows
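Independently of ADO or VBA, the matching logic itself can avoid a nested loop by indexing set 2 by key. A minimal Python sketch of that idea, using the data from the question (the names `set1`, `set2` and `result` are only for illustration):

```python
# Set 1: (key in column A, amount in column B), in sheet order.
set1 = [("11_22", 10), ("33_44", 20), ("55_66", 30), ("77_88", 40), ("99_00", 50)]

# Set 2: column J -> column K, stored as a dictionary for direct lookup.
set2 = {"33_44": 19, "99_00": 47, "77_88": 40}

# Column C: B minus K where the key exists in J, otherwise the full amount in B.
result = [(key, b, b - set2[key] if key in set2 else b) for key, b in set1]

for row in result:
    print(row)
```

In VBA the same role could be played by a Scripting.Dictionary keyed on column J, with the third element of each tuple written back to column C.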