In Excel, how can I get the difference between 2 tables?
I have 2 tables: A and B.
B is a subset of A. In other words, all rows/records of B are included in A, but not vice versa.
I would like to get
A - B
i.e. I want an output which gives me only the records which are in A but not in B.
Also, more generally, if B were not a subset, how would I get
A∪B - A∩B?
If this only needs to be done once for a dataset, I usually use VLOOKUP. Write a VLOOKUP formula on the larger table; the rows that are not in the smaller table return an #N/A error. Filter for those error rows and you are left with the rows of A that are not in B.
This is also achievable with Power Query, which is a cleaner way IMHO. For Excel 2010 and 2013 you need to download and install it as an add-in. From Excel 2016 onward Power Query is included in Excel natively.
I am able to explain the process for Office 365 since I have that version; for previous versions slight changes may apply.
First, get your tables into Power Query using the Data / From Table/Range menu.
When both tables are in Power Query, right-click on a blank space in the Queries pane on the left and go to New Query / Combine / Merge Queries as New.
In the Merge dialog, select your tables (the larger table first), Ctrl-click the fields to compare in each table, and select Left Anti as the join kind in the bottom drop-down. When you click OK you will have a new table containing the rows of the first table that have no match in the second.
* Select Close & Load in the Home menu and your new table will be available in a new sheet in Excel.
* When there is a change made in the original sheets, just press Data / Refresh in Excel and your generated table will be refreshed accordingly.
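For readers who want to see the logic outside Excel, here is a minimal Python sketch of what the Left Anti merge computes, plus the more general A∪B - A∩B case from the question. The sample rows and the helper name anti_join are made up for illustration; this is not how Power Query is implemented.

```python
# Hypothetical sample data: each row is a tuple of the fields being compared.
table_a = [("alice", 1), ("bob", 2), ("carol", 3)]
table_b = [("bob", 2), ("dave", 4)]


def anti_join(left, right):
    """Return the rows of `left` that have no match in `right` (a left anti join)."""
    right_rows = set(right)
    return [row for row in left if row not in right_rows]


# A - B: rows that are in A but not in B.
a_minus_b = anti_join(table_a, table_b)

# (A ∪ B) - (A ∩ B): rows found in exactly one of the two tables,
# built from two anti joins, one in each direction.
symmetric_difference = anti_join(table_a, table_b) + anti_join(table_b, table_a)

print(a_minus_b)
print(symmetric_difference)
```

In Power Query terms, the second result corresponds to appending a Left Anti merge of A against B to a Left Anti merge of B against A.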
I'm trying to use MS Excel Power Query to get a value from a SQL database based on the item in each row.
My Excel Table has the following in A1, B1:
Date: =TODAY()
A2 and B2 contain the following headers, and the fruit list starts in A3. C2 to F2 contain other information, so the val needs to be populated in column B.
Fruit, Val
Apple
Orange
Banana
The SQL query looks like below:
select val
from MY_TABLE
WHERE fruit = ?
AND date = ?
The ? is the parameter and it links to cell $B$1 (date), $A3 (the first item in the fruit list)
I am using the ODBC data connection where I input my query and insert the final parameter as ?
Then from editing the connections > properties, I change the parameters under the 'Definition' tab, selecting the appropriate cells.
But when I drag this to the next cell, it doesn't update. I tried changing $A3 to $A4, but once again the value is returned in cell B3 only.
Any idea how I can update this for each row?
I know I could use the MS SQL data connection where I can use a query like
SELECT val
FROM MY_TABLE
WHERE fruit IN (
'Apple',
'Orange',
'Banana'
)
But the Excel sheet is used by many people, and the fruits list is updated at regular intervals, so a static query is not ideal.
What I'm trying to achieve is that whenever the fruits list gets updated, the user can fill down to the next cell, which updates column B by referencing the corresponding cell in column A.
I was not able to reproduce the exact problem you are facing with creating dynamic SQL query parameters in this way: for some reason the Parameters button under the Definition tab is greyed out (I am using Excel for Microsoft 365 on Windows 10). Anyway, if you were to succeed in doing this, wouldn't you end up with a separate query for each cell? I would imagine that would hurt performance when clicking on Data > Refresh All.
In any case, I believe one of the reasons for using Power Query is to have it write SQL queries for you: Power Query Editor > Query Settings pane on the right > Applied Steps, right-click on the last one and click on View Native Query to see the SQL query being sent to the server. As you further process the data, this underlying SQL query will be automatically edited depending on the statements supported by query folding. Of course, the connector needs to support this, so I suggest using the MS SQL Server connector. Note that sometimes the View Native Query option is greyed out but query folding is still taking place; the only way to know for sure is to use a profiling tool on the database.
Here is a way to use Power Query so that the whole Val column gets updated in a single data refresh.
Click on cell B1 and name it cellDate using the Name Box to the left of the formula bar, then right-click on cell B1 > Get Data from Table/Range... to open the Power Query Editor.
Replace the content of the Power Query Editor formula bar with this:
= Date.From(Excel.CurrentWorkbook(){[Name="cellDate"]}[Content][Column1]{0})
You now have a query that returns the date from cell B1. Now click on the query that contains the table you are importing from the database (named Fruits in this example). Filter the Date column using the drop-down list and select any random date.
In the formula bar, replace the date literal generated by this filter step (#date(2021, 9, 10) in this example) with cellDate. Now every time you change the date in cell B1 and refresh the data, this filter will be updated. If you are ignoring Privacy Level settings or using a Public Privacy Level for your workbook, this filter step should be folded to the data source.
Close and load these queries as connections only.
Select the range of cells containing the fruit names, create a Table, name it listFruits and right-click > Get Data from Table/Range... to open the Power Query Editor.
In the Query List on the left, right-click on listFruits > Duplicate. Rename it as listFruitsValues. On the Home tab > Merge Queries. Select Fruits as the second table and click on the Fruit column in each table. Select as Join Kind: Left Outer (all from first, matching from second), then click on OK. Note that from this step onwards, the query is not folded back to the data source.
Click on the expand button of the Fruits column, select only the Val column, uncheck Use original column name as prefix, then OK. Remove the Fruit column.
This is what the Power Query Editor window should look like at this stage.
Now you can load the listFruitsValues query in the worksheet next to the Fruit table. Here is what that looks like with the default table formatting.
Now if any edit is made to the date and/or the list of fruits, clicking on Data > Refresh All will update the Val column accordingly.
On a final note, I would suggest considering a different approach if the source table filtered for the date (i.e. Fruits in this example) is not too large. The issue with the approach presented above is that the users need to click on the Refresh All button after every edit of the fruit list. This can be avoided by simply loading the Fruits query in a separate worksheet and using the following formula to populate the Val column:
=XLOOKUP(A4,Fruits[Fruit],Fruits[Val])
By creating a single Table with the Fruit and Val columns, the values are instantaneously updated when changes are made to the list of fruits and the Fruits query only needs to be refreshed when the date is changed.
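To make the data flow of this answer concrete, here is a rough Python sketch of the two steps: filter the source by the date in B1, then look up each listed fruit (a left outer join, which is also what the XLOOKUP formula does row by row). The sample rows and values are invented for illustration; only the names cellDate, listFruits and listFruitsValues echo the steps above.

```python
from datetime import date

# Hypothetical rows standing in for MY_TABLE: (fruit, date, val).
my_table = [
    ("Apple", date(2021, 9, 10), 5),
    ("Orange", date(2021, 9, 10), 7),
    ("Apple", date(2021, 9, 11), 6),
]

cell_date = date(2021, 9, 10)                 # plays the role of cell B1 / cellDate
list_fruits = ["Apple", "Orange", "Banana"]   # plays the role of the listFruits table

# Step 1: filter the source by date (the step that may be folded to the server).
fruits = {fruit: val for fruit, row_date, val in my_table if row_date == cell_date}

# Step 2: left outer join. Every listed fruit is kept; fruits with no match get None.
list_fruits_values = [(fruit, fruits.get(fruit)) for fruit in list_fruits]

print(list_fruits_values)
```

Step 1 only has to be redone when the date changes, while step 2 is cheap to redo whenever the fruit list changes, which mirrors the split between refreshing the query and recalculating the formula.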
I am trying to import data from SQL Server into Power BI. The advanced options include a section called SQL statement.
I know that the SQL statement for what I require is:
Select TOP 1000 * from [Table]
How do I write this in Power BI at the time of data source / import, so that it runs this statement for each of the tables I plan to import?
You can try this at the time of importing SQL Server data.
After loading the data, you can keep or remove rows using Keep Rows on the Home tab of the Power Query Editor (for example, Keep Rows > Keep Top Rows).
If all the tables you want are on the same database, then you can navigate to that database as the first step in your query.
From there, filter down to select just the tables you want.
(You can see the preview of the cell selected in the bottom pane.)
Now that you've got the tables you want, you can apply a TopN function to the entire column (I chose top 3).
Table.TransformColumns(#"Filtered Rows",{{"Data", each Table.FirstN(_,3), type table}})
A quick way to add this step is to do a transformation on a text column and then just replace the column and the function applied. For example, if you format the Schema column to UPPERCASE using the GUI, it will add the step
Table.TransformColumns(#"Filtered Rows",{{"Schema", Text.Upper, type text}})
from which you can swap out the column, function, and type for what you actually want (see previous).
At this point, your tables are all trimmed to the top N rows and you can load each one to its own query by right-clicking on the table cell and choosing "Add as New Query". Alternatively, you can right-click on the Database query in the left pane (see the first image) and choose "Reference". This creates a new query from which you can simply click on the Table you want and it will return just that one.
Note: The former method will automatically name the new query after the table you expanded but the latter would work better if you wanted to change your N value since it doesn't recreate the whole query.
Either way, if you right-click on the last applied step in each of these new tables, you can choose "View Native Query" and you can see that the statement passed back to the server is a simple select top 3.
select top 3
[$Table].[DealSpecificKey] as [DealSpecificKey],
[$Table].[DateInvestment] as [DateInvestment],
[$Table].[DateInvestmentKey] as [DateInvestmentKey],
[$Table].[DateRedemption] as [DateRedemption],
[$Table].[DateRedemptionKey] as [DateRedemptionKey]
from [dbo].[AuxDaysInvested] as [$Table]
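As a rough illustration of the same idea outside Power Query, the sketch below applies "first N rows" to every table in a list. It uses Python's built-in sqlite3 module with an in-memory database as a stand-in, so the SQL uses LIMIT where SQL Server would use TOP. The table names, the sample rows and the helper first_n_per_table are all hypothetical.

```python
import sqlite3

# In-memory SQLite database standing in for the real server (an assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (id INTEGER);
    CREATE TABLE customers (id INTEGER);
    INSERT INTO orders VALUES (1), (2), (3), (4), (5);
    INSERT INTO customers VALUES (10), (20);
""")


def first_n_per_table(connection, table_names, n):
    """Return {table name: first n rows}, one limited query per table."""
    result = {}
    for name in table_names:
        # Table names cannot be bound as parameters, so they are interpolated;
        # here they come from a trusted, hard-coded list.
        cursor = connection.execute(f'SELECT * FROM "{name}" LIMIT ?', (n,))
        result[name] = cursor.fetchall()
    return result


top_rows = first_n_per_table(conn, ["orders", "customers"], 3)
print({name: len(rows) for name, rows in top_rows.items()})
```

Changing n in one place changes the limit for every table, which is the same convenience the "Reference" approach above gives you.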
I've been searching the internet for hours trying to figure out if the following is even possible:
To choose the AS400 query records directly from Excel.
I haven't found any solution or description of how this could be achieved, which makes me guess that it's simply not possible. However, I haven't seen anyone confirm that it is impossible.
So my question is: Is this possible? And if it is, could you point me in the right direction in order for me to start learning how to do it?
I know it's possible to run a query from Excel and then add parameters via SQL statements, but in my case this presents several problems that could be avoided by choosing the records before the query is executed.
Example:
I have a query with a column (let's call it ColVal) that can hold the values 1 and/or 2. In the AS400 program, under the menu "Work with queries" and then "Choose records", I can specify which records the query should contain after it has run, based on the value in ColVal. This means I can get three different situations (A, B and C) when I run the query:
A) The query only contains records where the value in ColVal is 1
B) The query only contains records where the value in ColVal is 2
C) The query contains records where the value in ColVal is either 1 or 2
The goal is to be able to choose which situation I want from Excel in order to circumvent opening and using the AS400 program.
However, using situation C and then editing the query in Excel with an SQL statement to mimic situation A or B is not an option, as this means the query still contains undesired records.
This whole thing boils down to the following: Is it even possible to run the query from Excel, essentially changing the data it contains and not just outputting it to Excel? If this is possible, is it then possible to pass a parameter to the AS400 system and use it to create situation A, B or C?
I hope this example makes sense.
Edit - New example
Say I have two customers, A and B. I can open the AS400 program and run a query in which I have specified that I only want data on customer A. I can then open Excel and use filters (as Hambone described) on the query to determine which records I want to output. However, if I want to work with data from customer B, I have to open the AS400 again and run the query with different parameters. I would like to be able to "change" my dataset from customer A to B from Excel, without having to include both in my recordset and then filter out one of them.
I imagine this is doable if you can pass a parameter to the AS400. The AS400 then runs the query using this parameter as the criterion for which records should be stored in the query. This means that if the parameter is customer B, there is no way to access data from customer A without running the query through the AS400 again.
Any ideas are greatly appreciated :)
Follow up to my comment, here is a quick primer on how to run an ODBC query directly in MS Excel using Microsoft Query. This is very different than Power Query, which you referenced, in that MS Query is standard with Excel -- it's not a plug-in. This is relevant because it means everyone has it. If you are deploying a solution to others, that's an important consideration.
To start an MS Query in Excel, go to the data tab, select "From Other Sources" -> "Microsoft Query."
A list of your ODBC connections will come up. Pick the one that you want and select "OK."
It may or may not ask you for a login (depending on which ODBC connection you use and how it's configured).
The next part is important. MS Query is going to try to have you use its builder to create the query. If you have the SQL, skip this part. It's horrible. Click "Cancel" on the query wizard, and then click the "SQL" button to enter your own SQL. If you can, make sure the result set is small (like use where 1 = 2 in the query).
When MS Query returns results, click the button next to the SQL Button to have it return the results to the spreadsheet. It looks like a little door.
From here, any time you want to refresh the query, you can simply right-click the data table in Excel and select "refresh." Alternatively you can go to the data tab on the ribbon and select "Refresh."
By the way if you have linked pivot tables and charts, the "Refresh All" option will refresh those as well, in the correct order.
To edit your query at any time, right-click on the table in Excel and go to Table > External Data Properties.
Then click on the Connection Properties icon.
Click on the second tab (Definition) and edit the SQL directly.
Parameters can be declared simply by inserting a bare "?" in place of your literal.
In other words, if your query looks like this:
select *
from users
where user_id = 'hambone'
Just change it to:
select *
from users
where user_id = ?
Excel will prompt you for a user id before it runs the query. From here, you also have the option of putting the parameter value in a cell within the spreadsheet and having the query read it from there. You'll see these when you right-click the table and go to the "Parameters" menu option.
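The bare "?" placeholder is not specific to MS Query; many database drivers accept the same style. As a small sketch, here is the same query run through Python's built-in sqlite3 module against an in-memory table. The table contents and the helper find_user are invented for the example, and SQLite stands in for whatever your ODBC connection points to.

```python
import sqlite3

# In-memory SQLite table standing in for the real `users` table (an assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (user_id TEXT, full_name TEXT)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?)",
    [("hambone", "Ham Bone"), ("other", "Someone Else")],
)


def find_user(connection, user_id):
    """Run the parameterized query; the driver substitutes the value for '?'."""
    cursor = connection.execute(
        "select * from users where user_id = ?", (user_id,)
    )
    return cursor.fetchall()


rows = find_user(conn, "hambone")
print(rows)
```

The value is supplied separately from the SQL text, which is the same role the prompt or the linked cell plays in Excel.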
Let me know if this helps or is unclear.
-- EDIT 7/23/2018 --
To follow up on your latest edit, it is possible to handle the scenario you describe, where you want to be able to filter on a value, or if none is given, then not have a filter. You see this a lot when you present multiple filter options to the user and you want a blank to mean "no filter," which is obviously counter to the way SQL works.
However, you can hack SQL to still make it work:
select * from activities
where
(activity = ? or ? is null) and
(energy = ? or ? is null)
In this example you have to declare four parameters instead of two, two for each.
You might also have to play with datatypes, depending on the RDBMS (for example for numerics you might have to say ? = 0 instead of ? is null or even ? = '' for text).
Here is a working example where a single filter was applied on the query above and you can clearly see the second one did not have an impact.
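The behaviour of the "(column = ? or ? is null)" pattern can be checked with a small script. This is only a sketch using Python's built-in sqlite3 module and made-up rows. The helper filter_activities is hypothetical, and an order by clause is added purely to make the output deterministic.

```python
import sqlite3

# In-memory SQLite table standing in for the real `activities` table (an assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE activities (activity TEXT, energy TEXT)")
conn.executemany(
    "INSERT INTO activities VALUES (?, ?)",
    [("run", "high"), ("walk", "low"), ("swim", "high")],
)

QUERY = """
select * from activities
where
  (activity = ? or ? is null) and
  (energy = ? or ? is null)
order by activity
"""


def filter_activities(connection, activity=None, energy=None):
    """Each filter value is bound twice; None (NULL) means 'no filter'."""
    params = (activity, activity, energy, energy)
    return connection.execute(QUERY, params).fetchall()


only_high = filter_activities(conn, energy="high")
everything = filter_activities(conn)
print(only_high)
print(everything)
```

Passing None for a filter makes its "? is null" branch true for every row, so that condition drops out, which is exactly the "blank means no filter" behaviour described above.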
Yes, it's possible. You need to use an ODBC driver to connect to the AS400 and retrieve the data. The driver and documentation are here
I have an Excel sheet with around 50k rows of data and 10 columns. The sheet is about wholesalers and their products. In the current version there are around 30 unique wholesalers, each with between 1000 and 3000 different products (I have queried this information from the database). What I want to do is extract the distinct wholesalers, put them in another sheet, and then for each wholesaler find the total count of products that they offer. I was able to get a distinct list of the wholesalers (via a macro), but now I am not sure how to use it to get the total count of their products. Something like, for each wholesaler:
SELECT COUNT(*)
FROM worksheet s
WHERE s.wholesaler = 'one of the values from the list'
And in general, my question is: what is the best way to query a worksheet with loads of data (macros, pivot tables or some other Excel magic)?
If you have a SQL query then use it :). Excel allows you to run SQL queries. See the Data ribbon, External Data > From Other Sources > Microsoft Query. Or check out my SQL extension for Excel: http://blog.tkacprow.pl/?page_id=130
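For reference, the per-wholesaler count is a single grouped aggregation (the equivalent of GROUP BY wholesaler in SQL), so there is no need to loop over the distinct list and run one query per wholesaler. A minimal Python sketch of that logic, with made-up rows standing in for the worksheet:

```python
from collections import Counter

# Hypothetical rows standing in for the worksheet: (wholesaler, product).
rows = [
    ("Acme", "bolt"),
    ("Acme", "nut"),
    ("Globex", "gear"),
    ("Acme", "washer"),
]

# One pass over the data yields both the distinct wholesalers and their counts.
product_counts = Counter(wholesaler for wholesaler, _product in rows)

for wholesaler, count in sorted(product_counts.items()):
    print(wholesaler, count)
```

The keys of product_counts are the distinct wholesalers, so the separate macro step for building that list becomes unnecessary in this approach.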
I have the following query (sample)
SELECT *
FROM Table_1 t1 INNER JOIN
Table_2 t2 ON t1.C1 = t2.C1 INNER JOIN
Table_3 t3 ON t2.C2 = t3.C2 INNER JOIN
Table_4 t4 ON t3.C3 = t4.C3
The output is of 10+ columns.
When I hover over * (after "SELECT") I get a tooltip with all different column names from those 4 tables.
Is there a way to easily switch from * to those column names instead of typing each one of them after SELECT?
Thank you
I'm assuming you're working in Management Studio. If so, go to the Object Explorer and open up your Table. Left click on the Columns folder and drag it to your query window. All the columns for that table will be listed.
From the Object Explorer of SQL Server Management Studio, you can expand a table (so you see the Columns, Keys, Indexes, etc folder breakdown). Dragging the Columns folder to your query window will give you a comma separated list of the column names.
Please note: If there are duplicate column names among your four tables, you will have to qualify these columns with their table name or alias.
I see your query references 4 tables.
To avoid having to locate and expand the 4 different objects in Object Explorer, you can also select the query text in Management Studio, right-click and choose "Design Query in Editor", then copy the column names out of the expanded list.
Copying the column names rather than simply pressing OK avoids the designer messing up your formatting and possibly your query as well.
Expanding wildcards is part of the functionality of Redgate SQL Prompt if you have this need frequently.
If you use DataGrip for writing SQL, you can press Alt+Enter -> Expand column list
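If none of these tools is at hand, most database drivers can also report the column names that a SELECT * expands to. The sketch below shows the idea with Python's built-in sqlite3 module and two made-up tables; SQLite is only a stand-in here, and the table definitions and aliases are assumptions for the example.

```python
import sqlite3

# In-memory SQLite tables standing in for the real ones (an assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Table_1 (C1 INTEGER, A TEXT);
    CREATE TABLE Table_2 (C1 INTEGER, C2 INTEGER);
""")

cursor = conn.execute(
    "SELECT * FROM Table_1 t1 INNER JOIN Table_2 t2 ON t1.C1 = t2.C1"
)

# cursor.description holds one entry per result column; item 0 is its name.
column_names = [column[0] for column in cursor.description]

print(", ".join(column_names))
```

Note that C1 appears twice in the list, once per table, which is the duplicate-name situation where the columns have to be qualified by hand.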