Merge tables with same headers but in different order in Excel - vba

I have two tables in Excel with the same table headers but in different orders.
How can I merge the rows into 1 table but in such way that all the values are still in the correct column based on the column headers?
I am afraid I will have to use VBA, but I just wanted to make sure that there are no other clever options.
Maybe Excel is not even the best software to use for this task?

Be careful with the use of $: you have to lock everything with it, except:
The row of the ID you test in the destination table
The column of the header you test in the destination table
That way, you'll be able to drag ("extend") the formula across the whole row and down to the last ID for that table.
Then do the same for the second table! Et voilà! ;)
For the 1st table:
=INDEX($B$2:$C$3;MATCH($I2;$A$2:$A$3;0);MATCH(J$1;$B$1:$C$1;0))
For the 2nd table (adjust the ranges to wherever your second table sits):
=INDEX($B$2:$C$3;MATCH($I2;$A$2:$A$3;0);MATCH(J$1;$B$1:$C$1;0))
Screenshots (I'm on the French version, so EQUIV = MATCH in English):
1st Table:
2nd Table:

Related

One column per report page

I have a table in SQL with 4 columns and 20 rows. I want to split this data into four report pages. Page 1 has the first 10 rows of the first column, Page 2 has the first 10 rows of the second column, etc.
After every four pages, this pattern repeats, showing the next 10 rows from the first column, and so on. How can I arrange this?
I could arrange the data into another temporary table with just one column in its schema, and then read that single column into my report. But can I instead do this directly, without an intermediary table?
The intermediary table sounds like the best solution to me. I'd just write a custom SQL command in Crystal's Database Expert to arrange the data as you see fit.
You could in theory pull this off with repeating subreports in some manner of repeating header, but it would be much less work to have SQL properly format the incoming data for you.
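To make the custom-command idea concrete, here is a minimal sketch of the kind of SQL that could feed the report, assuming a source table SourceTable with an ordering column RowNum (1-20) and data columns Col1..Col4; all of these names are hypothetical:
-- Sketch only: SourceTable, RowNum and Col1..Col4 are assumed names.
-- Flatten the four columns into one, assigning a page number so that
-- pages 1-4 show rows 1-10 of columns 1-4, pages 5-8 show rows 11-20, etc.
SELECT Col1 AS PageValue,
       ((RowNum - 1) / 10) * 4 + 1 AS PageNo,
       RowNum
FROM   SourceTable
UNION ALL
SELECT Col2, ((RowNum - 1) / 10) * 4 + 2, RowNum FROM SourceTable
UNION ALL
SELECT Col3, ((RowNum - 1) / 10) * 4 + 3, RowNum FROM SourceTable
UNION ALL
SELECT Col4, ((RowNum - 1) / 10) * 4 + 4, RowNum FROM SourceTable
ORDER BY PageNo, RowNum;
In the report you would then group on PageNo and start a new page after each group.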

Searching for value in a linked table in power pivot

I have a PowerPivot table that has a column of IDs and a linked table that contains a set of specific IDs that I want to use to create an indicator variable which I can use to sort on in existing tables and charts. Essentially I want:
If the value in column EpisodeID is found anywhere in LostEpisodes[LostID], then return the value "1", otherwise "0".
LostEpisodes is the linked table and LostID is the column that contains the subset of IDs I want to be able to sort on.
I have tried using =IF(VALUES(LostEpisodes[LostID])=[EpisodeID],1,0) but got an error. Is my syntax wrong or should I be using a different approach? Seems simple enough, but I am new to PowerPivot and DAX.
Thanks
OK - So I have found an answer that works and wanted to share. Others may have more elegant solutions, but this worked. This is where I miss MATCH.
I have a linked table called LostEpisodes which contains 2 columns, EpisodeID and Lost (Lost is 1 for every row, since they are all lost episodes). For my purposes I am manually entering the episode IDs as there are only a few. EpisodeID is also in the main table and is the column I am matching on.
I started with one new column labeled LostLookup with the following formula:
=LOOKUPVALUE(LostEpisodes[Lost],LostEpisodes[EpisodeID],[EpisodeID])
I then created a new column with the following formula:
=IF(ISBLANK([LostLookup]),"NotLost","Lost")
This creates the indicator variable I can now use in pivot tables and charts. I have tested it and it works great.
Hope this makes sense!

BigQuery - remove unused column from schema

I accidentally added a wrong column to my BigQuery table schema.
Instead of reloading the complete table (millions of rows), I would like to know if the following is possible:
remove the bad rows (rows that have values in the wrong column) by running a "SELECT *" query on the table with some kind of filter, and saving the result to the same table.
removing the (now) unused column.
Is this functionality (or similar) supported?
Perhaps the "save result to table" functionality could have a "compact schema" option.
According to the documentation, the quickest way to remove a column from BigQuery is:
ALTER TABLE [table_name] DROP COLUMN IF EXISTS [column_name]
If your table does not contain record/repeated type fields, your simplest option is:
Select valid columns while filtering out bad records into new temp table
SELECT < list of original columns >
FROM YourTable
WHERE < filter to remove bad entries here >
Write the above to a temp table - YourTable_Temp
Make a backup copy of the "broken" table - YourTable_Backup
Delete YourTable
Copy YourTable_Temp to YourTable
Check if all looks as expected and if so - get rid of temp and backup tables
Please note: the cost of step 1 above is exactly the same as the action in the first bullet of your question. The rest of the actions (the copies) are free.
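For reference, a rough sketch of that plan in today's BigQuery standard SQL (this DDL postdates the original answer; the dataset, table and column names and the filter are placeholders):
-- Sketch only: mydataset, YourTable, the column list and the filter are placeholders.
-- Step 1: keep only the valid columns/rows (billed like the SELECT in your question)
CREATE TABLE mydataset.YourTable_Temp AS
SELECT col_a, col_b, col_c          -- < list of original columns >
FROM mydataset.YourTable
WHERE bad_column IS NULL;           -- < filter to remove bad entries >
-- Steps 2-4: back up the broken table, then rebuild it from the temp table (table copies)
CREATE TABLE mydataset.YourTable_Backup COPY mydataset.YourTable;
DROP TABLE mydataset.YourTable;
CREATE TABLE mydataset.YourTable COPY mydataset.YourTable_Temp;
-- Step 5: once everything looks as expected, clean up
DROP TABLE mydataset.YourTable_Temp;
DROP TABLE mydataset.YourTable_Backup;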
If you have repeated/record fields, you can still execute the plan above, but in step 1 you will need to use BigQuery user-defined functions to get the proper schema in the output.
You can see the links below for examples. Of course this will require some extra development, but if you are in a critical situation, this should work for you:
Create a table with Record type column
create a table with a column type RECORD
I hope that at some point the Google BigQuery team will add better support for cases like yours, where you need to manipulate and output repeated/record data, but for now this is the best workaround I have found, at least for myself.
Below is the code to do it. Let's say c is the column that you want to delete.
CREATE OR REPLACE TABLE transactions.test_table AS
SELECT * EXCEPT (c) FROM transactions.test_table;
A second method, and my favorite, is to follow the steps below.
Write a SELECT query that excludes the column(s) you want to remove.
Go to Query Settings
In the Destination settings, enable "Set destination table for query results" and enter the project name, dataset name, and table name exactly the same as the table you queried in step 1.
For the destination table write preference, select "Overwrite table".
Save the query settings and run the query.
"Save results to table" is the way to go. Try it on the big table with only the columns you are interested in, and you can apply a LIMIT to keep the result small.

SSIS Check Excel source rows redirect rows to another table on 'x' number of field matches

I work in a sales based environment and our data consists of 'leads'.
Let's say we record CompanyName, PhoneNumber, Address1 & PostCode (ZIP). These rows are seeded with a unique ID in the schema.
The leads come in from various sources and are compiled onto a spreadsheet and then imported into SQL 2012 using SSIS.
After a validation check to see if a file exists we then use a simple data flow which consists of an Excel source, Derived Column, Data Conversion and finally an OLE DB Destination.
My requirement, I'm sure, has a relatively simple solution; understanding what I need to achieve is the first step. I need to take a sample of data from the last rolling two months, and if 2 or more fields in the source Excel file match the corresponding fields in the destination SQL table, I want to redirect that row to another table.
I am unsure which combination of components I could use to achieve this. I believe Fuzzy Lookup may not be what I am looking for, as I need exact field matches; I have looked at the Lookup component, but I am unsure if it is the way to go.
Could anyone please provide some advice on how I can best achieve this as simply as possible?
You can use the Lookup to check for matches in your existing table. However, it will be fairly complicated to implement the requirement of checking for any two or more fields matching. Your expression would be long and complex basically consisting of:
(using pseudo code for readability)
IIF((src.a = dest.a AND src.b = dest.b) OR (src.a = dest.a AND src.c = dest.c) OR (src.b = dest.b AND src.c = dest.c) OR ... and so on,
for every combination of two columns you want to test)
I would do this by importing the entire spreadsheet to a staging table, and doing the existing rows check in a SQL stored proc that moves the data to the desired destination table.
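As a hedged T-SQL sketch of what that stored procedure's matching logic could look like (the staging table LeadsStaging, the redirect table LeadsDuplicates, the destination table Leads and its CreatedDate column used for the two-month window are all assumptions, not the poster's actual schema):
-- Sketch only: all table and column names here are assumed.
-- Redirect a staged row when 2 or more of its fields match a lead
-- created in the last rolling two months.
INSERT INTO dbo.LeadsDuplicates (CompanyName, PhoneNumber, Address1, PostCode)
SELECT s.CompanyName, s.PhoneNumber, s.Address1, s.PostCode
FROM dbo.LeadsStaging AS s
WHERE EXISTS (
    SELECT 1
    FROM dbo.Leads AS l
    WHERE l.CreatedDate >= DATEADD(MONTH, -2, GETDATE())
      AND (CASE WHEN l.CompanyName = s.CompanyName THEN 1 ELSE 0 END
         + CASE WHEN l.PhoneNumber = s.PhoneNumber THEN 1 ELSE 0 END
         + CASE WHEN l.Address1    = s.Address1    THEN 1 ELSE 0 END
         + CASE WHEN l.PostCode    = s.PostCode    THEN 1 ELSE 0 END) >= 2
);
-- The remaining staged rows (NOT EXISTS with the same condition) would be
-- inserted into the main Leads table instead.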

How to remove duplicate rows from an output table using Pentaho DI?

I am creating a transformation that takes input from a CSV file and outputs to a table. It runs correctly, but the problem is that if I run the transformation more than once, the output table contains duplicate rows again and again.
Now I want to remove all duplicate rows from the output table.
And if I run the transformation repeatedly, it should not affect the output table unless there are new rows.
How can I solve this?
Two solutions come to my mind:
Use an Insert / Update step instead of a Table output step to store data into the output table. It will try to find a row in the output table that matches the incoming stream row according to the key fields you define (all fields/columns, in your case). It works like this:
If the row can't be found, it inserts the row. If it can be found and the fields to update are the same, nothing is done. If they are not all the same, the row in the table is updated.
Use the following parameters:
The keys to look up the values: tableField1 = streamField1; tableField2 = streamField2; tableField3 = streamField3; and so on..
Update fields: tableField1, streamField1, N; tableField2, streamField2, N; tableField3, streamField3, N; and so on..
If duplicate values have already been stored in the output table, you can remove the duplicates using this approach:
Use an Execute SQL step where you define SQL that removes the duplicate entries and keeps only unique rows. You can take inspiration from here to create such SQL: How can I remove duplicate rows?
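For example, a minimal sketch of such a cleanup statement in SQL Server syntax (output_table and the key columns are placeholders; adapt it to your database):
-- Sketch only: output_table and col1..col3 are placeholders.
-- Keep one row per unique key combination and delete the rest.
WITH ranked AS (
    SELECT *,
           ROW_NUMBER() OVER (
               PARTITION BY col1, col2, col3   -- the columns that define a duplicate
               ORDER BY col1                   -- any deterministic order
           ) AS rn
    FROM output_table
)
DELETE FROM ranked
WHERE rn > 1;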
Another way is to use the Merge rows (diff) step, followed by a Synchronize after merge step.
As long as the number of rows in your CSV that differ from your target table is below 20-25% of the total, this is usually the most performance-friendly option.
Merge rows (diff) takes two input streams that must be sorted on their key fields (by a compatible collation), and generates the union of the two inputs with each row marked as "new", "changed", "deleted", or "identical". This means you'll have to put Sort rows steps on the CSV input, and possibly on the input from the target table if you can't use an ORDER BY clause. Mark the CSV input as the "Compare" row origin and the target table as the "Reference".
The Synchronize after merge step then applies the changes marked in the rows to the target table. Note that Synchronize after merge is the only step in PDI (I believe) that requires input be entered in the Advanced tab. There you set the flag field and the values that identify the row operation. After applying the changes the target table will contain the exact same data as the input CSV.
Note also that you can use a Switch/Case or Filter Rows step to do things like remove deletes or updates if you want. I often flow off the "identical" rows and write the rest to a text file so I can examine only the changes.
I looked for visual answers, but the existing answers were text-only, so I'm adding this visual answer for any Kettle newbie like me.
Case
user-updateslog.csv (has duplicate values) ---> users_table, storing only the latest user detail.
Solution
Step 1: Connect the CSV input to an Insert/Update step, as in the transformation below.
Step 2: In Insert/Update, add conditions comparing the keys to find the candidate row, and set the fields you want to update to "Y".