Tableau - carry out join on IP Numbers SSMS - sql

I am trying to bring two tables into Tableau. One has IP addresses in dotted format, e.g. 192.168.32.1, and the other has IP numbers corresponding to cities, postcodes, etc. that I want to make available for visualisation.
The idea is to carry out the steps here (http://kb.tableau.com/articles/howto/mapping-ip-address-geocode-data) to do a join on the two tables, where the join converts the IP address in one table into a number that can then be compared to the number in the other table.
However, when I followed the steps in that guide, the join ran for 40 minutes and then crashed.
Can anyone shed any light on this?
My tables are in Microsoft SQL Server Management Studio. I have also looked into using computed columns to do the same thing, but with no luck so far (I am very new to SQL and cannot work out how to save and then apply a function, as suggested here: https://www.stev.org/post/mssqlconvertiptobigint).

Preface: I'd suggest trying to run the below query to see if it converts correctly and quickly (try to stick to under 30 seconds as a good rule of thumb) and go from there. That can tell you whether you're better off investing more time in SQL or in Tableau.
There are many approaches one could take; this is just my suggestion. What you could consider is writing a query that creates another table with the data already formatted. A stored procedure set to run in a job (or just a job) that adds to the table every few minutes (or nightly, whatever you think is appropriate) would give you the base data in SQL. Then you could use Tableau to do the joins.
select [IP Address],
    --add as many columns as you want from the base table to take the place of one of the tables you join to
    [CodeForIPAddressIntegerFromYourHelpSite] as IPINT
    --converts the IP address to an integer - use the code from your help site
into [IPIntegerConversion]
    --SELECT ... INTO creates the permanent table automatically
from YourTableWithIPAddress
This method would get you a table with both the IP address and the IP integer, which would let you link the two (you should be able to paste the code from their site over [CodeForIPAddressIntegerFromYourHelpSite]). Then you could set this up to run automatically in SQL Agent (which is actually very easy). If the query itself isn't expensive, you can paste it straight into the job. If you store the data already computed, the join may be more efficient.

I think this should get you close:
PARSENAME is a SQL Server function for splitting four-part object names (server.database.schema.object), which conveniently also splits dotted IP addresses. I am not an expert in IP matters and got the basics from about five minutes of Google searching. You may have to reverse the order of the octets, but this is the basic query structure and it should be pretty fast.
select ip.ip
    -- cast to bigint so the multiplication does not overflow int
    ,cast(parsename(ip.ip, 4) as bigint) * 16777216 -- 2^24
    + cast(parsename(ip.ip, 3) as bigint) * 65536   -- 2^16
    + cast(parsename(ip.ip, 2) as bigint) * 256     -- 2^8
    + cast(parsename(ip.ip, 1) as bigint) as ip4
    ,ipv4.*
from tableWithYourIPs ip
left join ipv4
    on cast(parsename(ip.ip, 4) as bigint) * 16777216
     + cast(parsename(ip.ip, 3) as bigint) * 65536
     + cast(parsename(ip.ip, 2) as bigint) * 256
     + cast(parsename(ip.ip, 1) as bigint)
       between ipv4.[start] and ipv4.[end]
Make sure you apply the indexes the site recommends:
CREATE INDEX [ip_from] ON [ip2location].[dbo].[ip2location_db9]([ip_from]) ON [PRIMARY]
GO
CREATE INDEX [ip_to] ON [ip2location].[dbo].[ip2location_db9]([ip_to]) ON [PRIMARY]
GO
The conversion follows this logic:
http://blogs.lessthandot.com/index.php/datamgmt/datadesign/how-to-convert-ip-addresses-between-bigi/
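The dotted-quad arithmetic is easy to sanity-check outside SQL. Here is a minimal Python sketch of the same conversion (the function name is my own, not from the linked article):

```python
def ip_to_int(ip: str) -> int:
    """Convert a dotted-quad IPv4 address to its 32-bit integer form."""
    a, b, c, d = (int(octet) for octet in ip.split("."))
    # Same weights as the SQL query: 2^24, 2^16, 2^8, 2^0
    return a * 16777216 + b * 65536 + c * 256 + d

print(ip_to_int("192.168.32.1"))  # 3232243713
```

If the SQL conversion disagrees with this for some address, the octet order in the PARSENAME calls is the first thing to check.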

Access Unmatched or similar query where a column does not contain or is not like another column

I want to design a query that basically does a mass amount of "Not Like "*x*", except all of the things I would not like the query to contain are in another column.
I know I can do this one at a time by just using the criteria and specifying "Not like "*x*", but I have no idea how to do a not like for a whole column of data.
So, the long version is that I have a bunch of cameras hosted on several different servers on a corporate network. Each of these cameras is on the same subnet, and everything but the last octet of the IP address matches the server. Now, I have already created a field in a query that trims off the last octet of my IP, so I basically have a pre-made IP range of where the cameras could possibly be. However, I do not have an exact inventory of the cameras, and there's not really a fast way to get one.
I have a list of issues that I'm working on and I've noticed some of the cameras coming up in the list of issues (basically a table that includes a bunch of IP addresses). I'd like to remove all possible instances of the cameras from appearing in the report.
I've seen designs where people have been able to compare like columns, but I want to do the opposite. I want to generate a query where it does not contain anything like what's in the camera column.
For the sake of this, I'll call the query where I have the camera ranges Camera Ranges and the field Camera Range.
Is there a way I can accomplish this?
I'm open to designing a query or even changing up my table to make it easier to do the query.
Similar to the answer I provided here: rather than using a negative selection, testing whether the value held by a record is not like any record in another dataset, the easier approach is to match those which are like the dataset and return the records with no match.
To accomplish this, you can use a left join coupled with an is null condition in the where clause, for example:
select
MainData.*
from
MainData left join ExclusionData on
MainData.TargetField like ExclusionData.Pattern
where
ExclusionData.Pattern is null
Or, if the pattern field does not already contain wildcard operators:
select
MainData.*
from
MainData left join ExclusionData on
MainData.TargetField like '*' & ExclusionData.Pattern & '*'
where
ExclusionData.Pattern is null
Note that MS Access will not be able to represent such calculated joins in the Query Designer, but the JET database engine used by MS Access will still be able to interpret and execute the valid SQL.
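The anti-join idea is independent of Access. A small Python sketch (names and data are illustrative, not from the question) shows the same "match, then keep the unmatched" logic:

```python
from fnmatch import fnmatch

def exclude_matching(rows, patterns):
    """Keep only rows matching none of the wildcard patterns -
    the equivalent of the left join + IS NULL anti-join above."""
    return [r for r in rows
            if not any(fnmatch(r, "*" + p + "*") for p in patterns)]

ips = ["10.1.2.15", "10.1.3.40", "10.2.9.77"]
camera_ranges = ["10.1.2.", "10.1.3."]  # trimmed camera subnets
print(exclude_matching(ips, camera_ranges))  # ['10.2.9.77']
```

Only the addresses outside the camera ranges survive, which is exactly what the `is null` test achieves in the SQL.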

ms access query (ms access freezes)

I have this report and need to add totals for each person (the red circle)
existing report
new report
I cannot change the existing report, so I export data from MS SQL to MS Access and create a new report there. I got it working for one employee but am having trouble with a query that would work for multiple employees.
This query extracts the data used as input:
SELECT [TIME].[RCD_NUM], [TIME].[EMP_ID], [TIME].[PPERIOD], [TIME].[PRUN], [TIME].[TDATE], [TIME].[PC], [TIME].[RATE], [TIME].[HOURS], [TIME].[AMOUNT], [TIME].[JOB_ID], [TIME].[UPDATED], [TIME].[UPDATED_BY], [TIME].[LOG_DATE], [TIME].[ORIGINAL_REC_NUM]
FROM [TIME]
WHERE ((([TIME].[EMP_ID])=376) And (([TIME].[TDATE])<=#12/31/2006# And ([TIME].[TDATE])>=#1/1/2006#) And (([TIME].[PC])<599));
This query populates the report:
SELECT *
FROM TIME1
WHERE RCD_NUM = (SELECT Max(RCD_NUM) FROM [TIME1] UQ WHERE UQ.PPERIOD = [TIME1].PPERIOD AND UQ.PC = [TIME1].PC);
The problem is that if I remove EMP_ID from the first query, like this:
SELECT [TIME].[RCD_NUM], [TIME].[EMP_ID], [TIME].[PPERIOD], [TIME].[PRUN], [TIME].[TDATE], [TIME].[PC], [TIME].[RATE], [TIME].[HOURS], [TIME].[AMOUNT], [TIME].[JOB_ID], [TIME].[UPDATED], [TIME].[UPDATED_BY], [TIME].[LOG_DATE], [TIME].[ORIGINAL_REC_NUM]
FROM [TIME]
WHERE ((([TIME].[TDATE])<=#12/31/2006# And ([TIME].[TDATE])>=#1/1/2006#) And (([TIME].[PC])<599));
then the second query doesn't work, and MS Access freezes when running it.
Any help or ideas, please?
Caveat: I won't pretend to know the precise cause of the problem, but I have had to repeatedly refactor queries in Access to get them working even though the original SQL statements are completely valid in regards to syntax and logic. Sometimes I've had to convolute a sequence of queries just to avoid bugs in Access. Access is often rather dumb and will simply (re)execute queries and subqueries exactly as given without optimization. At other times Access will attempt to combine queries by performing some internal optimizations, but sometimes those introduce frustrating bugs. Something as simple as a name change or column reordering can be the difference between a functioning query and one that crashes or freezes Access.
First consider:
Can you leave the data on SQL Server and link to the results in Access (rather than export/importing it into Access)? Even if you need or prefer to use Access for creating the actual report, you could use all the power of SQL Server for querying the data--it is likely less buggy and more efficient.
Common best practice is to create SQL Server stored procedures that return just what data you need in Access. A pass-through query is created in Access to retrieve the data, but all data operations are performed on the server.
Perhaps this is just a performance issue where limiting the set by [EMP_ID] selects a small subset, but the full table is large enough to "freeze" Access.
How long have you let Access remain frozen before killing the process? Be patient... like many, many minutes (or hours). Start it in the morning and check after lunch. :) It might eventually return a result set. This does not imply it is tolerable or that there is no other solution, but it can be useful to know if it eventually returns data or not.
How many possible records are there?
Are the imported data properly indexed? Add indexes to all key fields and those which are used in WHERE clauses.
Is the database located on a network share or is it local? Try copying the database to a local drive.
Other hints:
Try the BETWEEN operator for dates in the WHERE clause.
Try refactoring the "second" query by performing a join in the FROM clause rather than the WHERE clause. In doing this, you may also want to save the subquery as a named query (just as [TIME1] is saved). Whether or not a query is saved or embedded in another statement CAN change the behavior of Access (see caveat) even though the results should be identical.
Here's a version with the embedded aggregate query. Notice how all column references are qualified with the source; some of the original query's columns do not have a source alias prefixing the column name. Remember the caveat: such picky details can affect Access behavior.
SELECT TIME1.*
FROM TIME1 INNER JOIN
(SELECT UQ.PPERIOD, UQ.PC, Max(UQ.RCD_NUM) As Max_RCD_NUM
FROM [TIME1] UQ
GROUP BY UQ.PPERIOD, UQ.PC) As TIMEAGG
ON (TIME1.PPERIOD = TIMEAGG.PPERIOD) And (TIME1.PC = TIMEAGG.PC)
AND (TIME1.RCD_NUM = TIMEAGG.Max_RCD_NUM)
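The aggregate-then-join shape is easy to verify on toy data. A hypothetical Python sketch of "latest RCD_NUM per (PPERIOD, PC)" (column names borrowed from the question):

```python
def latest_per_group(rows):
    """rows: dicts with PPERIOD, PC, RCD_NUM. Return the row with the
    max RCD_NUM for each (PPERIOD, PC) pair, mirroring the SQL's
    GROUP BY / MAX subquery joined back to the base table."""
    best = {}
    for r in rows:
        key = (r["PPERIOD"], r["PC"])
        if key not in best or r["RCD_NUM"] > best[key]["RCD_NUM"]:
            best[key] = r
    return list(best.values())

rows = [
    {"PPERIOD": 1, "PC": 100, "RCD_NUM": 5},
    {"PPERIOD": 1, "PC": 100, "RCD_NUM": 9},  # supersedes RCD_NUM 5
    {"PPERIOD": 1, "PC": 200, "RCD_NUM": 3},
]
print([r["RCD_NUM"] for r in latest_per_group(rows)])  # [9, 3]
```

Checking the logic on a tiny sample like this first makes it easier to tell an Access bug apart from a query-logic bug.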

Create a Global Persistent List of Strings as Variable

I'm using SQL Server 2008 R2, in MS SQL Server Management Studio. I've only ever done selecting and all the standard stuff, but I find myself frequently using the same lists of strings in different queries, and I'd like to be able to build a variable that holds them. I don't have the access rights to create a new table, otherwise I would just do that. Is this even possible?
Let's say I have a bunch of client numbers that I want to use to include only their client account data with a query, example:
SELECT * FROM SALES
WHERE CLIENTNUMBER IN ('123','456','789')
Is there a way to create a variable that will hold those 3 values, so that I can instead just say
SELECT * FROM SALES
WHERE CLIENTNUMBER IN #CLOTHING_CLIENTS
The list is longer than 3 client numbers of course. And there are different categories etc. I think it would be MUCH simpler to do as a separate table but of course I don't have the ability to create new tables. I could do JOINs and the like too but that's getting even more work than just putting in the client numbers each time.
I'm trying to simplify things and make it more readable for other people, not make it more efficient for the database or more "correct".
There are a couple of ways you can do this involving temp tables or table variables (note that table variables are declared with @, not #). Try something like this:
declare @CLOTHING_CLIENTS table (ClientNumber varchar(20) not null);
-- Your list of values goes here
insert into @CLOTHING_CLIENTS (ClientNumber)
values ('123')
,('456')
,('789');
select * from Sales
where ClientNumber in (select ClientNumber from @CLOTHING_CLIENTS);
The @CLOTHING_CLIENTS variable can be used again anywhere in the same batch in which it was declared. This post does a good job explaining the scope of table variables.
Good news: there are global objects in T-SQL!
An extension of Jeff Lewis's answer that makes things a bit more 'global' is to use a ## table.
Assuming you have permission to create them, a ##table is a global temporary table that can be accessed by other connections and even from other databases. Just make sure you're on the same server.
So you can do this:
CREATE TABLE ##MyValues(A INT)
INSERT INTO ##MyValues(A) VALUES (1)
Once done you can go anywhere and do
SELECT * FROM ##MyValues
Now all you need to do is update your snippets to do things like
SELECT * FROM SALES AS S
INNER JOIN ##MyClientIDTable AS MCIT ON MCIT.CLIENTNUMBER = S.CLIENTNUMBER
Just make sure your ##MyClientIDTable has a CLIENTNUMBER column and the correct data.
Hope this helps a bit.

Get last few query results in SQL

I frequently do a static analysis of SQL databases, during which I have the luxury of nobody being able to change the data except me.
However, I have not found a way to 'tell' this to SQL in order to prevent running the same query multiple times.
Here is what I would like to do, first I start with a complicated query that has a very small output.
SELECT * FROM MYTABLE WHERE MYPROPERTY = 1234
Then I run a simple query from the same window (Mostly using SQL server studio if that is relevant)
SELECT 1
Now I suddenly realize that I forgot to save the results from my first complicated (slow) query.
As I know the underlying data did not change (or even if it did) I would like to look one step back and simply get the result. However at the moment I don't know any trick to do this and I have to run the entire query again.
So the question summary is: How can I (automatically store/)get the results from recently executed queries.
I am particularly interested in simple select queries, and would be happy to allocate, say, 100MB of memory for automated result storage. I'd prefer a solution that works in SQL Server Management Studio with T-SQL, but other SQL solutions are also welcome.
EDIT: I am not looking for a way to manually prevent this from happening. In the cases where I can anticipate the problem it will not happen.
This can't be done in Microsoft SQL Server. SQL Server does not cache results, instead it caches data pages that were accessed by your query. This should make your query go a lot faster the second time around so it won't be as painful to re-run it.
In other databases, such as Oracle and MySQL, they do have a query caching mechanism that will allow you to retrieve the results directly the second time around.
I run into this frequently, I often just throw the results of longer-running queries into a temp table:
SELECT *
INTO #results1
FROM MYTABLE WHERE MYPROPERTY = 1234
SELECT *
FROM #results1
If the query is very long-running I might use a 'real' table. It's a good way to save on re-run time.
The downside is that it adds an extra step to your query.
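If you want the "store results automatically" behavior on the client side instead, here is a rough Python sketch of the idea, using an in-memory SQLite database as a stand-in for the real server (table and function names are mine, purely illustrative):

```python
import sqlite3

_cache = {}

def run_cached(conn, sql):
    """Return cached rows for a query we've already run; otherwise
    execute it and remember the result. Only safe while you know the
    underlying data is not changing - exactly the asker's situation."""
    if sql not in _cache:
        _cache[sql] = conn.execute(sql).fetchall()
    return _cache[sql]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (myproperty INTEGER, val TEXT)")
conn.execute("INSERT INTO mytable VALUES (1234, 'a'), (5678, 'b')")

rows = run_cached(conn, "SELECT val FROM mytable WHERE myproperty = 1234")
print(rows)  # [('a',)]
# The second call returns instantly from the cache, without re-running:
rows_again = run_cached(conn, "SELECT val FROM mytable WHERE myproperty = 1234")
```

The same keyed-by-query-text caching could be wrapped around any database driver, at the cost of the memory the result sets occupy.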
You can also send query results to a file in SSMS, info on formatting the output is here: SSMS Results to File
The easiest way to do this is to run each query in its own SSMS window, the results will stay there until you close it, or run out of memory - besides that, I am not sure there is a way to accomplish what you want.
Once you close the SSMS window, I don't believe there is a way to get back 'cached' results.
This isn't a technical answer to your question. Having written queries and looking at results for many years, I am in the habit of saving the results in Excel, regardless of the database/query tool I'm using.
The format in Excel is rather methodical:
Each worksheet has the date. (Called something like "1 Jul".)
Each spreadsheet contains one month. (Typically with the month name like "work-201307".)
In the "B" column I copy and paste the query.
Underneath, in the "C" column, I copy and paste the results.
The next query starts a few lines after, one after the other.
I put the queries in the "B" column, so I can go to the "A" column and jump straight to the first row; I put the results in the "C" column, so I can go to the "B" column and jump between queries.
I mostly do this so I can go back and see the work I did many months ago. For instance, someone sends an email from February and says "do this again". I can go back to the February spreadsheet, go to the day it was created, and see what I was doing at that time.
In your question, though, I realize that I now instinctively solve this problem, because the "right click on the grid, copy with column headers, alt-tab to Excel, alt-V" sequence comes quite naturally to me.
I was going to suggest running each query from a script with a counter (stored in a table) that is incremented on each execution (i.e. i++), storing each result in a temp table called "tmpTable" + i, but that sounds very complicated to manage. Am I right?
Then I googled and I've found this Tool Pack: I didn't try it but you could take a look:
http://www.ssmstoolspack.com/Features
Hope it helps.
EDIT: added the following link. There's an option to output as an XML file, and they mention SQL Server Integration Services as a possible solution too.
http://michaeljswart.com/2012/03/sending-query-results-to-others/#method5
SECOND EDIT: There's this DBMS-Independent tool too, it sounds interesting:
http://www.sql-workbench.net/
I am not sure this is what you want. Anyway, check my answer.
In SQL Server Management Studio you can open multiple tabs for executing queries. Open a new tab for each query; the result of each executed query will then be available under its tab.
After executing one query in a tab, don't reuse that tab for a new query - open a new tab for that job.
Have you considered using some kind of offline SQL client such as Excel? Specifically, Excel will retrieve the results into the spread sheet (using the Data ribbon/menus) where they are stored pretty much permanently as results. It will prompt you to refresh when necessary or you can do it on demand.
As to whether it can be done in T-SQL or in other databases: that depends on the database and its results cache, and even then those are options the query processor may use, not guarantees for any individual query.

Oracle9i: Filter Expression Fails to Exclude Data at Runtime

I have a relatively simple select statement in a VB6 program that I have to maintain. (Suppress your natural tendency to shudder; I inherited the thing, I didn't write it.)
The statement is straightforward (reformatted for clarity):
select distinct
b.ip_address
from
code_table a,
location b
where
a.code_item = b.which_id and
a.location_type_code = '15' and
a.code_status = 'R'
The query returns a list of IP addresses from the database. The key column is code_status. Some time ago, we realized that one of the IP addresses was no longer valid, so we changed its status to I (invalid) to exclude it from the query's results.
When you execute the query above in SQL Plus, or in SQL Developer, everything is fine. But when you execute it from VB6, the check against code_status is ignored, and the invalid IP address appears in the result set.
My first guess was that the results were cached somewhere. But, not being an Oracle expert, I have no idea where to look.
This is ancient VB6 code. The SQL is embedded in the application. At the moment, I don't have time to rewrite it as a stored procedure. (I will some day, given the chance.) But, I need to know what would cause this disparity in behavior and how to eliminate it. If it's happening here, it's likely happening somewhere else.
If anyone can suggest a good place to look, I'd be very appreciative.
Some random ideas:
Are you sure you committed the changes that invalidate the ip-address? Can someone else (using another db connection / user) see the changed code_status?
Are you sure that the results are not modified after they are returned from the database?
Are you sure that you are using the "same" database connection in SQLPlus as in the code (database, user etc.)?
Are you sure that that is indeed the SQL sent to the database? (You may check by tracing on the Oracle server or by debugging the VB code). Reformatting may have changed "something".
Off the top of my head I can't think of any "caching" that might "re-insert" the unwanted ip. Hope something from the above gives you some ideas on where to look at.
In addition to the suggestions that IronGoofy has made, have you tried swapping round the last two clauses?
where
a.code_item = b.which_id and
a.code_status = 'R' and
a.location_type_code = '15'
If you get a different set of results then this might point to some sort of wrangling going on that results in dodgy SQL actually be sent to the database.
There are Oracle bugs that result in incorrect answers. This surely isn't one of those times. Usually they involve some bizarre combination of views and functions and dblinks and lunar phases...
It's not cached anywhere. Oracle doesn't cache results until version 11g, and even then it knows to invalidate the cache when the answer may change.
I would guess this is a data issue. You have a DISTINCT on the IP address in the query, why? If there's no unique constraint, there may be more than one copy of your IP address and you only fixed one of them.
And your Code_status is in a completely different table from your IP addresses. You set the status to "I" in the code table and you get the list of IPs from the Location table.
Stop thinking zebras and start thinking horses. This is almost certainly just data you do not fully understand.
Run this
select
a.location_type_code,
a.code_status
from
code_table a,
location b
where
a.code_item = b.which_id and
b.ip_address = <the one you think you fixed>
I bet you get one row with an 'I' and another row with an 'R'
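The "duplicate rows hiding behind DISTINCT" theory is easy to illustrate on toy data. A Python sketch (column names from the question, data invented for illustration):

```python
# Two code_table rows point at the same IP: one was set to 'I',
# but the other still has status 'R', so the DISTINCT IP keeps appearing.
code_table = [
    {"code_item": 1, "code_status": "I", "location_type_code": "15"},
    {"code_item": 2, "code_status": "R", "location_type_code": "15"},
]
location = [
    {"which_id": 1, "ip_address": "10.0.0.5"},
    {"which_id": 2, "ip_address": "10.0.0.5"},  # same IP, different row
]

# Equivalent of: select distinct b.ip_address ... where code_status = 'R'
result = {loc["ip_address"]
          for a in code_table
          for loc in location
          if a["code_item"] == loc["which_id"]
          and a["code_status"] == "R"
          and a["location_type_code"] == "15"}
print(result)  # {'10.0.0.5'}
```

The invalidated row is excluded, yet the IP still shows up via its duplicate - which is why the diagnostic query above is worth running before blaming caching.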
I'd suggest you have a look at the V$SQL system view to confirm that the query you believe the VB6 code is running is actually the query it is running.
Something along the lines of
select sql_text, fetches
from v$sql
where sql_text like '%ip_address%'
Verify that the SQL_TEXT is the one you expect and that the FETCHES count goes up as you execute the code.