How to run a SQL query on an Excel table? - sql

I'm trying to create a sub-table from another table of all the last name fields sorted A-Z which have a phone number field that isn't null. I could do this pretty easy with SQL, but I have no clue how to go about running a SQL query within Excel. I'm tempted to import the data into postgresql and just query it there, but that seems a little excessive.
For what I'm trying to do, the SQL query SELECT lastname, firstname, phonenumber WHERE phonenumber IS NOT NULL ORDER BY lastname would do the trick. It seems too simple for it to be something that Excel can't do natively. How can I run a SQL query like this from within Excel?

There are many fine ways to get this done, which others have already suggested. Following along the "get Excel data via SQL" track, here are some pointers.
Excel has the "Data Connection Wizard" which allows you to import or link from another data source or even within the very same Excel file.
As part of Microsoft Office (and OS's) are two providers of interest: the old "Microsoft.Jet.OLEDB", and the latest "Microsoft.ACE.OLEDB". Look for them when setting up a connection (such as with the Data Connection Wizard).
Once connected to an Excel workbook, a worksheet or range is the equivalent of a table or view. The table name of a worksheet is the name of the worksheet with a dollar sign ("$") appended to it, and surrounded with square brackets ("[" and "]"); of a range, it is simply the name of the range. To specify an unnamed range of cells as your recordsource, append standard Excel row/column notation to the end of the sheet name in the square brackets.
The native SQL will (more or less) be the SQL of Microsoft Access. (In the past, it was called JET SQL; however, Access SQL has evolved, and I believe JET is deprecated old tech.)
Example, reading a worksheet: SELECT * FROM [Sheet1$]
Example, reading a range: SELECT * FROM MyRange
Example, reading an unnamed range of cells: SELECT * FROM [Sheet1$A1:B10]
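The naming rules above can be sketched as a small helper. This is only an illustration in Python: the function name excel_table_name and its parameters are hypothetical, and it assumes simple sheet names without special characters.

```python
import re


def excel_table_name(sheet=None, cell_range=None, named_range=None):
    """Return the name to use after FROM for a worksheet, range, or named range.

    Hypothetical helper; it only mirrors the naming rules described above.
    """
    if named_range is not None:
        # A named range is referenced by its bare name.
        return named_range
    if sheet is None:
        raise ValueError("give either a sheet or a named range")
    if cell_range is None:
        # Whole worksheet: sheet name plus "$", wrapped in square brackets.
        return f"[{sheet}$]"
    # Unnamed range: append A1-style notation after the "$".
    if not re.fullmatch(r"[A-Za-z]+[0-9]+:[A-Za-z]+[0-9]+", cell_range):
        raise ValueError(f"not an A1-style range: {cell_range!r}")
    return f"[{sheet}${cell_range}]"


print("SELECT * FROM " + excel_table_name(sheet="Sheet1"))
print("SELECT * FROM " + excel_table_name(named_range="MyRange"))
print("SELECT * FROM " + excel_table_name(sheet="Sheet1", cell_range="A1:B10"))
```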
There are many many many books and web sites available to help you work through the particulars.
Further notes
By default, it is assumed that the first row of your Excel data source contains column headings that can be used as field names. If this is not the case, you must turn this setting off, or your first row of data "disappears" to be used as field names. This is done by adding the optional HDR= setting to the Extended Properties of the connection string. The default, which does not need to be specified, is HDR=Yes. If you do not have column headings, you need to specify HDR=No; the provider names your fields F1, F2, etc.
A caution about specifying worksheets: The provider assumes that your table of data begins with the upper-most, left-most, non-blank cell on the specified worksheet. In other words, your table of data can begin in Row 3, Column C without a problem. However, you cannot, for example, type a worksheet title above and to the left of the data in cell A1.
A caution about specifying ranges: When you specify a worksheet as your recordsource, the provider adds new records below existing records in the worksheet as space allows. When you specify a range (named or unnamed), Jet also adds new records below the existing records in the range as space allows. However, if you requery on the original range, the resulting recordset does not include the newly added records outside the range.
Data types (worth trying) for CREATE TABLE: Short, Long, Single, Double, Currency, DateTime, Bit, Byte, GUID, BigBinary, LongBinary, VarBinary, LongText, VarChar, Decimal.
Connecting to "old tech" Excel (files with the xls extension): Provider=Microsoft.Jet.OLEDB.4.0;Data Source=C:\MyFolder\MyWorkbook.xls;Extended Properties=Excel 8.0;. Use the Excel 5.0 source database type for Microsoft Excel 5.0 and 7.0 (95) workbooks and use the Excel 8.0 source database type for Microsoft Excel 8.0 (97), 9.0 (2000) and 10.0 (2002) workbooks.
Connecting to "latest" Excel (files with the xlsx file extension): Provider=Microsoft.ACE.OLEDB.12.0;Data Source=Excel2007file.xlsx;Extended Properties="Excel 12.0 Xml;HDR=YES;"
Treating data as text: the IMEX=1 setting tells the provider to read columns with intermixed data types as text. Provider=Microsoft.ACE.OLEDB.12.0;Data Source=Excel2007file.xlsx;Extended Properties="Excel 12.0 Xml;HDR=YES;IMEX=1";
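As a rough sketch, the connection strings above can be assembled by a helper that picks the provider from the file extension. The function excel_connection_string and its arguments are hypothetical names for illustration; it only builds the text and does not open a connection.

```python
from pathlib import PureWindowsPath


def excel_connection_string(path, header=True, imex=False):
    """Build an OLE DB connection string for an .xls or .xlsx workbook.

    Hypothetical helper that mirrors the example strings above.
    """
    ext = PureWindowsPath(path).suffix.lower()
    if ext == ".xls":
        provider, excel_type = "Microsoft.Jet.OLEDB.4.0", "Excel 8.0"
    elif ext == ".xlsx":
        provider, excel_type = "Microsoft.ACE.OLEDB.12.0", "Excel 12.0 Xml"
    else:
        raise ValueError(f"unsupported extension: {ext!r}")

    # HDR says whether the first row holds column headings.
    props = [excel_type, "HDR=YES" if header else "HDR=NO"]
    if imex:
        props.append("IMEX=1")
    extended = ";".join(props)
    # Extended Properties is quoted because it contains semicolons itself.
    return f'Provider={provider};Data Source={path};Extended Properties="{extended}";'


print(excel_connection_string("Excel2007file.xlsx", header=True, imex=True))
```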
(More details at http://www.connectionstrings.com/excel)
More information at http://msdn.microsoft.com/en-US/library/ms141683(v=sql.90).aspx, and at http://support.microsoft.com/kb/316934
Connecting to Excel via ADODB via VBA detailed at http://support.microsoft.com/kb/257819
Microsoft JET 4 details at http://support.microsoft.com/kb/275561

tl;dr: Excel does all of this natively - use filters and/or tables
(http://office.microsoft.com/en-gb/excel-help/filter-data-in-an-excel-table-HA102840028.aspx)
You can open Excel programmatically through an OLE DB connection and execute SQL on the tables within the worksheet.
But you can do everything you are asking to do with no formulas, just filters.
click anywhere within the data you are looking at
go to data on the ribbon bar
select "Filter"; it's about the middle and looks like a funnel
you will now have arrows on the right-hand side of each cell in the first row of your table
click the arrow on phone number and de-select blanks (last option)
click the arrow on last name and select a-z ordering (top option)
Have a play around. Some things to note:
you can select the filtered rows and paste them somewhere else
in the status bar on the left you will see how many rows meet your filter criteria out of the total number of rows (e.g. 308 of 313 records found)
you can filter by color in Excel 2010 onwards
Sometimes I create calculated columns that give statuses or cleaned versions of data; you can then filter or sort by these too (e.g. like the formulae in the other answers).
Do it with filters unless you are going to do it a lot or you want to automate importing data somewhere, but for completeness:
A C# option:
OleDbConnection ExcelFile = new OleDbConnection( String.Format( "Provider=Microsoft.ACE.OLEDB.12.0;Data Source={0};Extended Properties=\"Excel 12.0;HDR=YES\"", filename));
ExcelFile.Open();
A handy place to start is to take a look at the schema, as there may be more there than you think:
// Read the schema; each row describes one table (worksheet or named range).
DataTable dt = ExcelFile.GetOleDbSchemaTable(OleDbSchemaGuid.Tables, null);
List<String> excelSheets = new List<string>();
// Add the sheet names (entries ending in '$') to the list.
foreach (DataRow row in dt.Rows) {
    string temp = row["TABLE_NAME"].ToString();
    if (temp[temp.Length - 1] == '$') {
        excelSheets.Add(temp);
    }
}
then when you want to query a sheet:
OleDbDataAdapter da = new OleDbDataAdapter("select * from [" + sheet + "]", ExcelFile);
dt = new DataTable();
da.Fill(dt);
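The sheet-name filter used on the schema can be illustrated outside C# as well. The sketch below is Python with made-up sample names; worksheet_tables is a hypothetical helper, and the handling of single-quoted names (which the provider appears to use for sheet names containing spaces) is an assumption rather than something the C# above does.

```python
def worksheet_tables(table_names):
    """Keep only the schema entries that name whole worksheets.

    Hypothetical helper: a worksheet entry ends in "$", optionally wrapped
    in single quotes (assumed here for sheet names that contain spaces).
    """
    sheets = []
    for name in table_names:
        bare = name.strip("'")
        if bare.endswith("$"):
            sheets.append(name)
    return sheets


# Made-up TABLE_NAME values of the kind a workbook schema might list.
sample = ["Sheet1$", "MyRange", "'My Sheet$'", "Sheet1$_xlnm#_FilterDatabase"]
print(worksheet_tables(sample))
```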
NOTE - Use Tables in Excel!
Excel has "tables" functionality that makes data behave more like a table. This gives you some great benefits but is not going to let you do every type of query.
http://office.microsoft.com/en-gb/excel-help/overview-of-excel-tables-HA010048546.aspx
For tabular data in Excel this is my default. The first thing I do is click into the data, then select "Format as Table" from the Home section on the ribbon. This gives you filtering and sorting by default and allows you to access the table and fields by name (e.g. table[fieldname]). It also allows aggregate functions on columns, e.g. max and average.

Might I suggest giving QueryStorm a try - it's a plugin for Excel that makes it quite convenient to use SQL in Excel.
In the SQL scripts Excel tables are visible as if they were regular database tables.
All four SQL data operations are supported: select/update/insert/delete.
The engine that executes the queries is SQLite so you can use joins, common table expressions, window functions, etc... And you get the fancy stuff like code completion, auto-formatting, symbol tooltips etc...
It has a completely free community edition for use by individuals and small companies. If you're in a company that has more than 5 employees or more than $1M in yearly revenue, you'll need a paid license but you can use a free trial key for evaluation purposes.
This blog post describes the SQL functionality of the plugin in much more detail.
Disclaimer: I'm the author.
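Since the engine mentioned here is SQLite, the query from the question can be tried against an in-memory SQLite table using Python's built-in sqlite3 module. This sketch does not use QueryStorm or Excel at all; the table name people and the sample rows are made up for illustration.

```python
import sqlite3


def non_null_phones(rows):
    """Run the question's query over (lastname, firstname, phonenumber) rows."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE people (lastname TEXT, firstname TEXT, phonenumber TEXT)"
        )
        conn.executemany("INSERT INTO people VALUES (?, ?, ?)", rows)
        cur = conn.execute(
            "SELECT lastname, firstname, phonenumber FROM people "
            "WHERE phonenumber IS NOT NULL ORDER BY lastname"
        )
        return cur.fetchall()
    finally:
        conn.close()


# Hypothetical sample data; None plays the role of an empty phone cell.
sample = [
    ("Smith", "Ann", "555-0100"),
    ("Adams", "Bob", None),
    ("Baker", "Cy", "555-0101"),
]
print(non_null_phones(sample))
```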

You can do this natively as follows:
Select the table and use Excel to sort it on Last Name.
Create a 2-row by 1-column advanced filter criteria range, say in E1 and E2, where E1 is empty and E2 contains the formula =C6<>"", where C6 is the first data cell of the phone number column.
Select the table and use advanced filter, copy to a range, using the criteria range in E1:E2, and specify where you want to copy the output to.
If you want to do this programmatically I suggest you use the Macro Recorder to record the above steps and look at the code.

The accepted answers here are old technology and shouldn't be attempted.
Back when this question was written, Power Query wasn't a well known option and wasn't available unless you were on the latest version of Office and installed it as a separate Add-in.
Now, Power Query is included in Excel and used by default to get data. It is the right way to do this. It is simple, fast and effective.
Here is the answer to the question in Power Query. Search on "getting started with Power Query" if you need help replicating this. Once you get started with Power Query, you'll see this is very basic and easy to do with the Advanced Editor:
let
Source = Excel.CurrentWorkbook(){[Name="Names"]}[Content],
#"Changed Type" = Table.TransformColumnTypes(Source,{{"lastname", type text}, {"firstname", type text}, {"phonenumber", type text}}),
#"Filtered Rows" = Table.SelectRows(#"Changed Type", each ([phonenumber] <> null)),
#"Removed Other Columns" = Table.SelectColumns(#"Filtered Rows",{"lastname", "firstname", "phonenumber"}),
#"Sorted Rows" = Table.Sort(#"Removed Other Columns",{{"lastname", Order.Ascending}})
in
#"Sorted Rows"

You can use SQL in Excel. It is only well hidden.
See this tutorial:
http://smallbusiness.chron.com/use-sql-statements-ms-excel-41193.html

If you need to do this once just follow Charles' descriptions, but it is also possible to do this with Excel formulas and helper columns in case you want to make the filter dynamic.
Let's assume your data is on the sheet DataSheet and starts in row 2 of the following columns:
A: lastname
B: firstname
C: phonenumber
You need two helper columns on this sheet.
D2: =if(C2 <> "", 1, 0), this is the filter column, corresponding to your where condition (phone number not blank)
E2: =if(D2 <> 1, "", sumifs(D$2:D$1048576, A$2:A$1048576, "<"&A2) + sumifs(D$2:D2, A$2:A2, A2)), this corresponds to the order by
Copy down these formulas as far as your data goes.
On the sheet which should display your result create the following columns.
A: A sequence of numbers starting with 1 in row 2; this limits the total number of rows you can get (kind of like a LIMIT in SQL)
B2: =match(A2, DataSheet!$E$2:$E$1048576, 0), this is the row of the corresponding data
C2: =iferror(index(DataSheet!A$2:A$1048576, $B2), ""), this is the actual data or empty if no data exists
Copy down the formulas in B2 and C2 and copy-paste column C to D and E.
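To see what the helper columns compute, here is a rough Python sketch of the same idea. It assumes the filter flag marks rows whose phone number is non-blank (the question's WHERE condition), compares last names case-insensitively, and uses made-up sample rows; ranked_rows is a hypothetical name.

```python
def ranked_rows(rows):
    """Mimic the helper columns for (lastname, firstname, phonenumber) rows."""
    # Column D: 1 when the row passes the filter, otherwise 0.
    flags = [1 if phone not in (None, "") else 0 for _, _, phone in rows]
    keys = [last.casefold() for last, _, _ in rows]

    # Column E: rank = flagged rows with a smaller last name
    #           + flagged rows with the same last name up to this row.
    ranks = []
    for i, key in enumerate(keys):
        if flags[i] != 1:
            ranks.append(None)
            continue
        smaller = sum(f for f, k in zip(flags, keys) if k < key)
        ties_so_far = sum(
            f for f, k in zip(flags[: i + 1], keys[: i + 1]) if k == key
        )
        ranks.append(smaller + ties_so_far)

    # Result sheet: look up sequence numbers 1, 2, 3, ... in column E.
    result = []
    for n in range(1, len(rows) + 1):
        if n in ranks:
            result.append(rows[ranks.index(n)])
    return result


sample = [
    ("Smith", "Ann", "555-0100"),
    ("Adams", "Bob", ""),
    ("Baker", "Cy", "555-0101"),
    ("Baker", "Al", "555-0102"),
]
for row in ranked_rows(sample):
    print(row)
```

Rows with the same last name keep their original relative order, which is what the second SUMIFS term achieves in the worksheet.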

If you have GDAL/OGR compiled against the Expat library, you can use the XLSX driver to read .xlsx files, and run SQL expressions from a command prompt. For example, from an OSGeo4W shell in the same directory as the spreadsheet, use the ogrinfo utility:
ogrinfo -dialect sqlite -sql "SELECT name, count(*) FROM sheet1 GROUP BY name" Book1.xlsx
will run a SQLite query on sheet1, and output the query result in an unusual form:
INFO: Open of `Book1.xlsx'
using driver `XLSX' successful.
Layer name: SELECT
Geometry: None
Feature Count: 36
Layer SRS WKT:
(unknown)
name: String (0.0)
count(*): Integer (0.0)
OGRFeature(SELECT):0
name (String) = Red
count(*) (Integer) = 849
OGRFeature(SELECT):1
name (String) = Green
count(*) (Integer) = 265
...
Or run the same query using ogr2ogr to make a simple CSV file:
$ ogr2ogr -f CSV out.csv -dialect sqlite \
-sql "SELECT name, count(*) FROM sheet1 GROUP BY name" Book1.xlsx
$ cat out.csv
name,count(*)
Red,849
Green,265
...
To do similar with older .xls files, you would need the XLS driver, built against the FreeXL library, which is not really common (e.g. not from OSGeo4w).

You can experiment with the native DB driver for Excel in language/platform of your choice. In Java world, you can try with http://code.google.com/p/sqlsheet/ which provides a JDBC driver for working with Excel sheets directly. Similarly, you can get drivers for the DB technology for other platforms.
However, I can guarantee that you will soon hit a wall with the number of features these wrapper libraries provide. A better way would be to use Apache HSSF/POI or a library at a similar level, but it will need more coding effort.

Microsoft Access and LibreOffice Base can open a spreadsheet as a source and run sql queries on it. That would be the easiest way to run all kinds of queries, and avoid the mess of running macros or writing code.
Excel also has autofilters and data sorting that will accomplish a lot of simple queries like your example. If you need help with those features, Google would be a better source for tutorials than me.

I might be misunderstanding, but isn't this exactly what a pivot table does? Do you have the data in a table or just a filtered list? If it's not a table, make it one (Ctrl+L). If it is, simply activate any cell in the table and insert a pivot table on another sheet. Then add the columns lastname, firstname and phonenumber to the rows section. Then add phonenumber to the filter section and filter out the null values. Now sort like normal.

Related

Excel cell Value as SQL query where statement

I am very new to SQL, I want to import data from SQL Server to Excel using this query
SELECT
Model, Factory, TargetTime, TotalEvalMins
FROM
AMSView
WHERE
WeekNumber = 45 AND WeekYear = 2021
I want to change the week number & year dynamically by taking user input from a cell.
Can anyone please suggest how to change the query?
Let's say the user enters the week & year in cells A1 and A2 of the worksheet "sample"; how can I write that query?
Since the amount of data is huge I must apply where while querying the data instead of applying filters in Excel.
Sorry for my bad English
Name each of your cells that you will use as parameters. This page describes the process.
Name a cell
1. Select a cell.
2. In the Name Box, type a name.
3. Press Enter.
For each cell containing a parameter for your query:
Select the cell
Use Data>Get & Transform Data>From Table/Range. This will open the Power Query Editor.
Right-click the cell in row 1 in the grid in the Power Query Editor and select 'Drill Down'. This converts the query on your parameter cell to a named value which can be used in other queries.
Now in Excel, use Data>Get Data and create your query from the database. I created a sample table in a local SQL Server database called AMSView, then connected to it with the query text in your post. When finishing the query connection, select 'Transform' so the query opens in the PowerQuery Editor.
Now, use Home>Advanced Editor and edit as follows by replacing the fixed values in the WHERE clause with concatenated names of your parameter cells, converted to text. For brevity, I have only used one parameter. If you've used capital letters in your cell names, remember, the M language is case-sensitive, so the concatenated parameter name must have identical casing to the named value.
let
Source = Sql.Database("localhost", "StackOverflowTest", [Query="SELECT #(lf) Model, Factory, TargetTime, TotalEvalMins #(lf)FROM #(lf) AMSView #(lf)WHERE #(lf) WeekNumber = " & Number.ToText(week_number)])
in
Source
Once your query is finished, use Home>Close & Load to load the results to the workbook. Now, when your parameter cells change, you need only refresh the query (right-click, refresh) and the data will be filtered as required.
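The M snippet above concatenates only the week number into the query text. As a language-neutral illustration of the same concatenation with both parameters, here is a small Python sketch; build_week_query is a hypothetical helper, and converting the inputs with int() is an added safeguard so that only numbers can end up in the SQL text.

```python
def build_week_query(week_number, week_year):
    """Build the AMSView query text for the given week and year."""
    # int() rejects anything that is not a plain number.
    week = int(week_number)
    year = int(week_year)
    return (
        "SELECT Model, Factory, TargetTime, TotalEvalMins "
        "FROM AMSView "
        f"WHERE WeekNumber = {week} AND WeekYear = {year}"
    )


print(build_week_query(45, 2021))
```

Where the database driver supports parameter placeholders, passing the values as parameters is generally preferable to building SQL text.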

Handling Blanks in Yes/No columns

I have an excel file with one column that has either Yes/No or Blanks. When I import this into Access, the column data type is automatically set to Yes/No. When I open the table it mirrors what was in the excel file, which is fine, example below:
Tasked
Yes
No
No
When I use this table as part of make-table build – that includes several outer joins – the Tasked column reads -1/0 instead. The problem with this is that the blanks are set to 0, so I can't tell what job has yet to be Tasked.
I understand you can't have blanks in Yes/No column in Access, but in reality this doesn't help.
I have read several forums with advice on this, but nothing is working.

Exporting fields of a highlighted row in Access Database to a Word document

I have an Access database and I want to export fields from ONE highlighted row to a Word document and email it to a recipient.
From the Access database I want to export the following fields:
Initials (character string), HospNo (Character and Number string), date, comment (character string)
and I want to export these fields from the row of my choice to a word document, c:\test.docx, with 4 MERGEFIELD's bookmarked as Inits, HosNumber, ScanDate, Diagnosis, respectively.
I think MailMerge is the solution and that's why I used MERGEFIELDs in Word. But I know very little VBA and don't know where to start.
I have Office 2010 on my PC.
Is that information sufficient to explain the problem?
From a very high level, you're probably going to need to create a recordset in VBA that contains just the one record you want to export. You can then use that recordset as your source for your mailmerge.
I've never done mailmerges, but this should get you started:
Dim db as Database
Dim rec as Recordset
Set db = CurrentDB
Set rec = db.OpenRecordset("SELECT Initials, HospNo, [date], comment FROM MyTableName WHERE SomeFilterCriteria")
'Mailmerge based on "rec"
Obviously you need to change MyTableName and SomeFilterCriteria based on your specific info; you didn't give us the table name or how you want to determine which record of data to mailmerge.
Either that or you can build a query, set up the mailmerge from the query, and then put filters in the query that point to your form. In the Criteria line in a query (if you open it in Design View), you would put something like
[Forms]![MyFormName].[MyFieldName]
Also, if you have the ability to do so, change your date field name. The word "date" is a reserved word, which means you have to enclose it in brackets so Access doesn't think it's a built-in command. Change the field name to scandate or something to avoid future problems.
If you are already working with MailMerge, you can just select the line in Access and use the Word Merge feature under the external data link.
If you are looking for a more automated procedure, I believe you will need to use code.

SSRS report builder 2.0 store STATIC DATA to use to query results

Does anyone know if there is a way to import a spreadsheet into Report Builder 2.0 and then use my dataset to make calculations against it?
This might seem like a novice question as my limited experience of report builder does not help.
The reason I want to do this is so that my main dataset doesn't have to run the query working out averages over hundreds of thousands of records, as it takes ages to run. With the benchmark average data static, I would run my query and do the calculations in Report Builder, which would make it 100 times faster.
Thank you for your time in advance
You may be able to overcome this by adding Calculated Fields to your dataset (DS). I am assuming that your static data can be related to your dataset by using at least one existing field. Using the Switch function, you can populate your calculated fields. Switch “evaluates a list of expressions and returns an Object value corresponding to the first expression in the list that is True.”
You can use the function like this:
=Switch(Fields!DsField1.Value = 2, "Your Value1", Fields!DsField1.Value = 5, "Your Value2", Fields!DsField1.Value = 10, "Your Value3", ...)
If you have any condition that needs to be checked, you can add it before the Switch statement like this:
=IIF(Fields!DsField20.Value <> 1000, Switch(Fields!DsField1.Value = 2, "Your Value1", Fields!DsField1.Value = 5, "Your Value2", Fields!DsField1.Value = 10, "Your Value3", ...), Nothing)
You can have your values in an Excel sheet to make the creation of the formula easier. Simply create your formula in the first row, copy the row down to extend your formula, and cover all your values. Then from Excel simply copy and paste the column of data into your calculated field(s).
Here's an example of my Excel formula. This is the best I could do as I could not paste the sample here. You can copy and paste these and replace them with your own values.
In Cell-A2 2
In Cell-B2 YourValue1
In Cell-C2 YourOtherValue1
In Cell-D2 YourOtherOtherValue1
In Cell-E2 YourOtherOtherOtherValue1
In Cell-F2 ="Fields!DsField1.Value ="&A2&","&""""&B2&""""&","
In Cell-G2 ="Fields!DsField1.Value ="&$A2&","&""""&C2&""""&","
In Cell-H2 ="Fields!DsField1.Value ="&$A2&","&""""&D2&""""&","
In Cell-I2 ="Fields!DsField1.Value ="&$A2&","&""""&E2&""""&","
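The same expression text can also be generated with a short script instead of spreadsheet formulas. This is a minimal Python sketch: build_switch is a hypothetical helper, the field name and values are placeholders, and doubling embedded quotes assumes the usual Visual Basic string-literal rule that report expressions follow.

```python
def build_switch(field, pairs):
    """Return a Switch expression mapping numeric field values to labels."""
    parts = []
    for key, label in pairs:
        # Double any embedded quotes so the label stays one string literal.
        escaped = str(label).replace('"', '""')
        parts.append(f'Fields!{field}.Value = {key}, "{escaped}"')
    return "=Switch(" + ", ".join(parts) + ")"


print(build_switch("DsField1", [(2, "Your Value1"), (5, "Your Value2")]))
```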
Sorry if there is anything I have missed; I did this in a rush.
Report Builder doesn't have anywhere you can 'import' the spreadsheet to, except one of the databases you are querying from. Excel isn't a supported data source for SSRS; however, it might be possible to add a report data source that uses an ODBC DSN pointing to the appropriate Excel file. (I haven't tried it.)
But, I can foresee some problems with this approach - it may get upset under multiple users and I expect you may find the file gets locked so you can't update it very easily.
An approach that might work could be to upload the static data into an Access database (as that is supported via the OLE DB Jet provider) and reference that as a data source; but the best approach is always going to be importing the static data into a table in your main database and using that.

Transform and load a large CSV to multiple worksheets in one Excel file

Back Story:
NEW PROJECT FROM MANAGEMENT: I have been given a soft project from my boss to evaluate one of our current ETL plans to look for room for improvement in the process, and I am looking for guidance.
MOTIVE: Excel is currently being used and crashes quite often during the process due to file size.
TASK: Every month an analyst receives a large CSV file from a survey vendor, containing up to 750 columns (not all with unique names) and over 15,000 rows. The task is simply to transform it into an Excel file with seven worksheets, broken up based on the column headings in the CSV. Details of how it is broken up are below.
My question: is transforming one large CSV into an edited Excel file with multiple worksheets any easier or quicker using VB.NET and VS2010 (or VBA, for that matter), or would continuing to use Excel be the simplest way? I am an expert Excel user, but I am still very much a beginner to intermediate at coding in VBA, VB.NET or any other language.
Detailed Question:
I am open to using free or open source software, but I am most familiar with VB.NET, Excel and Excel VBA. I have played around a bit coding a simple Windows Forms application to load the CSV into a DataTable using similar TextFieldParser code found here. I have thought of loading it into an array or even a 2D array to more easily edit the column headings and find the duplicate column headings. The DataTable option still leaves me with more questions than answers, because I need unique column headings and I'm not sure if I should bother with a DataTable if I'm going to just write an Excel file right away. I tried CSVReader from CodeProject, but it won't work on files with duplicate header names. I feel as though I have writer's block, as I am not sure which direction I should take to handle such a process. Any input you can provide will be much appreciated, and I apologize if this question does not have a single and clear best answer. Thanks.
Current Analyst tasks using excel
The current analytical plan has said analyst to open the csv in excel, insert a row above row 1 and use a vlookup to replace the 'New' column names with the 'Old' column names based on a simple two column lookup table on a separate worksheet. For example
New becomes Old
"org-name" becomes "org_name" or
"item_1_Vendor" becomes "item_1" or
"date-created_Survey" becomes "date_created"
etc...checking all sent "New" columns against the list of all possible 750 columns.
Then they paste values of the first row and then delete the 2nd row which contained the New headings we want to change.
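The header-renaming step could be sketched in code roughly as follows. This is Python using only the standard csv module rather than VB.NET; rename_headers is a hypothetical helper, the mapping holds just the three example pairs above, and the sample CSV text is made up. It also reports duplicate headings after renaming, since those are mentioned as a problem.

```python
import csv
import io
from collections import Counter


def rename_headers(csv_text, new_to_old):
    """Rename the header row via a lookup table and report duplicate names."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    # Columns missing from the lookup table keep their original name.
    renamed = [new_to_old.get(name, name) for name in header]
    duplicates = sorted(name for name, n in Counter(renamed).items() if n > 1)
    return renamed, duplicates, list(reader)


lookup = {
    "org-name": "org_name",
    "item_1_Vendor": "item_1",
    "date-created_Survey": "date_created",
}
text = "org-name,item_1_Vendor,date-created_Survey,item_1\nAcme,5,2021-01-01,4\n"
renamed, duplicates, rows = rename_headers(text, lookup)
print(renamed)
print(duplicates)
```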
Then the analyst has to fix the primary key on the file which is called "sid".
The Survey ID field (sid) should have a number for each row of the data file. Sometimes the sid shows up under the sid_HCAHPS or the sid_CGCAHPS fields instead.
The analyst would insert a column next to the "sid" field and put a formula in it like this, for example:
=IF(BE2<>"",BE2,IF(RD2<>"",RD2,IF(UH2<>"",UH2,"")))
Actual cell references would change but in the example excel formula,
"sid"=Range("BE2")
"sid_HCAHPS"=Range("RD2")
"sid_CGCAHPS"=Range("UH2")
Once the newly created primary key column is made and filled without blanks, we can delete the original "sid" column.
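The nested IF formula amounts to taking the first non-blank value among the three fields. A minimal Python sketch of that logic, assuming each row is a dictionary keyed by column name (for example from csv.DictReader) and using made-up sample values:

```python
def coalesce_sid(row):
    """Return the first non-blank survey ID among the three candidate fields."""
    for field in ("sid", "sid_HCAHPS", "sid_CGCAHPS"):
        value = row.get(field)
        if value not in (None, ""):
            return value
    # Mirrors the formula's final "" when all three are blank.
    return ""


print(coalesce_sid({"sid": "", "sid_HCAHPS": "1002", "sid_CGCAHPS": ""}))
```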
The next step is to check the columns because there may be a redundant HCAHPS section of columns (due to a second survey being sent and then returned- coded as Wave 2), delete second set of columns "sid_HCAHPS" through "language"
Next is the largest alteration because we have setup a system where we send this information to our database admins in the form of a seven worksheet excel file to be loaded by an MS Access Query that creates a table from each sheet that gets loaded into our proprietary business intelligence software. All Done!!
Is your question "can VB.NET automate our current analyst tasks?" If so, then yes.
You could use the streamreader class to get data from your csv
(http://msdn.microsoft.com/en-us/library/system.io.streamreader.aspx)
Then store it either in an array as you mentioned or use the List class
(http://msdn.microsoft.com/en-us/library/6sh2ey19.aspx)
Once you've got all your data stored you'll need to automate Excel. This is quite straightforward, but here's a link to get you started with that as well: http://support.microsoft.com/kb/301982/en-gb
With the list class you can create a list of custom objects using either classes or structures. eg.
We define a structure:
Structure rowOfData
Public intPrimaryKey as Integer
Public strIceCreamName as String
Public decPrice as Decimal
End Structure
We can then create a rowOfData and set its fields:
Dim iceCream1 as rowOfData
iceCream1.intPrimaryKey = 1
iceCream1.strIceCreamName = "Mr Whippy"
iceCream1.decPrice = 0.99D 'the D suffix makes this a Decimal literal
We create a list with:
Dim listOfIceCreams as New List(of rowOfData)
And add to it like this:
listOfIceCreams.Add(iceCream1)
listOfIceCreams.Add(iceCream2)
etc.
And access the members of the list like this:
listOfIceCreams(0).decPrice 'gives us the price of the ice Cream that was added to the list first.
There are also a lot of other useful methods that lists have which arrays don't. You could have a look through that msdn list class link to see if anything jumps out at you that you might need