I am building a T-SQL query to parse through an FTP log from FileZilla. I am trying to figure out whether there is a way to get information from the line following the current one.
For example,
I have parsed out the following command: "STOR file.exe"
With FileZilla, the log doesn't say whether the STOR was successful until the next line, so I want to check the next line and see whether the STOR succeeded or failed.
Also, people could try to STOR a file multiple times, so I want to get the status of the most recent attempt.
Example info from the log file:
(000005) 4/10/2010 14:55:30 PM - ftp_login_name (IP Address)> STOR file.exe
(000005) 4/10/2010 14:55:30 PM - ftp_login_name (IP Address)> 150 Opening data for transfer.
(000005) 4/10/2010 14:55:30 PM - ftp_login_name (IP Address)> 226 Transfer OK
I want to add a column in my query that says that the STOR was successful or unsuccessful.
Thanks!
Assuming you have parsed these lines into actual columns and you have SQL Server 2005 or greater, you can use CROSS APPLY. Example query below (untested). I hope this helps.
select o.*, prev.*
from FTPLog o
cross apply
(
    -- for each row, grab the most recent row logged before it
    select top 1 *
    from FTPLog p
    where p.LogDate < o.LogDate
    order by p.LogDate desc
) prev
James has the right idea, though there may be some issues if you ever have log dates that are exactly the same (and from your sample it looks like you might). You may be able to add an identity column to force an order at the time the data is inserted, then you can use James' concept on the identity column.
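For example, a minimal sketch of that idea, assuming rows are loaded in log order into a table that has the identity column (the names LogId and Line are made up for illustration). Since the status arrives on the line after the STOR, this version grabs the next row rather than the previous one:
create table FTPLog (LogId int identity(1,1), LogDate datetime, Line varchar(4000));

select o.*, nxt.Line as StatusLine
from FTPLog o
cross apply
(
    -- the row logged immediately after the STOR carries its status (150, 226, 550, ...)
    select top 1 n.Line
    from FTPLog n
    where n.LogId > o.LogId
    order by n.LogId asc
) nxt
where o.Line like '%STOR%'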
More than that, though, T-SQL may not be the best choice for this project, at least not by itself. While there are techniques you can use to make it iterate sequentially, it is not as good at that as certain other languages are. You may want to consider parsing your files in a language that is better at text processing and at working through data sequentially, such as Python, Perl, or even C#.
I am trying to put two tables into Tableau. One has IP addresses in dotted format, i.e. 192.168.32.1, and the other has IP numbers corresponding to cities, postcodes, etc. that I want to make available for visualisation.
The idea is to carry out the steps here (http://kb.tableau.com/articles/howto/mapping-ip-address-geocode-data) to do a join on the two tables, where the join converts the IP address in one table into a number that can then be compared to the number in the other table.
However, when I followed the steps in that guide, it ran for 40 minutes and then crashed.
Can anyone shed any light on this?
My tables are in Microsoft SQL Server (I work with them through Management Studio). I have also looked into using computed columns to do the same thing, but with no luck so far (I am very new to SQL and cannot work out how to save and then apply a function, as suggested here: https://www.stev.org/post/mssqlconvertiptobigint).
Preface: I'd suggest trying to run the below query to see if it converts correctly and quickly (try to stick to under 30 seconds as a good rule of thumb) and go from there. That can tell you whether you're better off investing more time in SQL or in Tableau.
There are many approaches one could take; this is just my suggestion. What you could consider is writing a query that creates another table with the data already formatted. A stored procedure set to run in a job (or just a job) that adds to the table every few minutes (or nightly, whatever you think is appropriate) would give you the base data in SQL. Then you could use Tableau to do the joins.
select [IP Address],
       -- add as many columns as you want from the base table
       -- to take the place of one of the tables you join to
       [CodeForIPAddressIntegerFromYourHelpSite] as IPINT
       -- converts the IP address to an integer; use the code from your help site
into [IPIntegerConversion] -- SELECT ... INTO creates the permanent table automatically
from YourTableWithIPAddress
This method would get you a table that has both the IP address and the IP integer, which would allow you to link between the two (you should be able to paste the code from their site over [CodeForIPAddressIntegerFromYourHelpSite]). Then you could set this up to run automatically in SQL Agent (which is very easy, actually). If the query itself isn't expensive, you can paste it straight into the job. If you pass the data along already computed, it may be more efficient.
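As an aside, the computed-column route mentioned in the question can be done in a single statement rather than a job. A hedged sketch, reusing the placeholder names from above and the PARSENAME conversion shown in the next answer:
alter table YourTableWithIPAddress
add IPINT as
(
      cast(parsename([IP Address],4) as bigint)*16777216
    + cast(parsename([IP Address],3) as bigint)*65536
    + cast(parsename([IP Address],2) as bigint)*256
    + cast(parsename([IP Address],1) as bigint)
) persisted -- PERSISTED stores the computed value so it can be indexed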
I think this should get you close:
PARSENAME is a SQL Server function that splits a dot-separated name into its parts, which happens to work nicely on IPv4 addresses. I am not an expert in IP stuff and got the basics from about 5 minutes of Google searching. You may have to reverse the order, but this is the basic query structure and it should be pretty fast.
select ip.ip
      ,cast(parsename(ip.ip,4) as bigint)*16777216 -- 2^24
      +cast(parsename(ip.ip,3) as bigint)*65536    -- 2^16
      +cast(parsename(ip.ip,2) as bigint)*256      -- 2^8
      +cast(parsename(ip.ip,1) as bigint) as ip4   -- cast to bigint so the math cannot overflow int
      ,ipv4.*
from tableWYourIPs ip
left join ipv4
    on cast(parsename(ip.ip,4) as bigint)*16777216
     + cast(parsename(ip.ip,3) as bigint)*65536
     + cast(parsename(ip.ip,2) as bigint)*256
     + cast(parsename(ip.ip,1) as bigint) between ipv4.[start] and ipv4.[end] -- ip_from / ip_to in the ip2location schema below
Make sure you apply the indexes the site recommends:
CREATE INDEX [ip_from] ON [ip2location].[dbo].[ip2location_db9]([ip_from]) ON [PRIMARY]
GO
CREATE INDEX [ip_to] ON [ip2location].[dbo].[ip2location_db9]([ip_to]) ON [PRIMARY]
GO
The conversion follows this logic:
http://blogs.lessthandot.com/index.php/datamgmt/datadesign/how-to-convert-ip-addresses-between-bigi/
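As a quick sanity check on that logic, the example address from the question, 192.168.32.1, works out to 192*2^24 + 168*2^16 + 32*2^8 + 1 = 3232243713:
select cast(parsename('192.168.32.1',4) as bigint)*16777216
     + cast(parsename('192.168.32.1',3) as bigint)*65536
     + cast(parsename('192.168.32.1',2) as bigint)*256
     + cast(parsename('192.168.32.1',1) as bigint) as ip_as_bigint -- returns 3232243713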
As of now, the Hive Terminal is showing only column headers after a CREATE TABLE statement is run. What settings should I change to make the Hive Terminal also show a few rows, say the first 100?
The code I am using to create table t2 from table t1, which resides in the database (I don't know how t1 was created):
create table t2 as
select *
from t1
limit 100;
For now, during development, I am writing select * from t2 limit 100; after each create table statement to get the rows with headers.
You cannot
The Hive Create Table documentation does not mention anything about showing records. This, combined with my experience with Hive, makes me quite confident that you cannot achieve this through regular config changes alone.
Of course you could tap into the code of Hive itself, but that is not something to be attempted lightly.
And you should not want to
Changing the create command could lead to all kinds of problems, especially because, unlike the select command, it is in fact an operation on metadata followed by an insert. Neither of those would normally show you anything.
If you created a huge table, it would be problematic to show everything; if you chose to always show just the first 100 rows, that would be inconsistent.
There are ways
Now, there are some things you could do:
Change Hive itself (not easy, probably not desirable)
Do it in 2 steps (what you currently do)
Write a wrapper:
If you want to automate things and don't like code duplication, you can look into writing a small wrapper function to call the create and select based on just the input of source (and limit) and destination.
This kind of wrapper could be written in Bash, Python, or whatever you choose.
However, note that if you like executing the commands ad hoc or manually, this may not be suitable, as you will need to start a Hive JVM each time you run such a program, so response times will be slow.
All in all you are probably best off just doing the create first and select second.
The command below is indeed the correct way to show the first 100 rows:
select * from <created_table> limit 100;
Pasting the code you have written to create the table would help in diagnosing the issue at hand!
Nevertheless, check whether you have correctly specified the delimiters for the fields, key-value pairs, collection items, etc. while creating the table.
If you have not defined them correctly, you might end up with only the first row (the header) being shown.
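For illustration, a minimal sketch of a CREATE TABLE with the delimiters spelled out (the column layout here is made up; adjust it to your data):
create table t2 (
    id    int,
    tags  array<string>,
    attrs map<string,string>
)
row format delimited
fields terminated by ','
collection items terminated by '|'
map keys terminated by ':'
lines terminated by '\n'
stored as textfile;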
I frequently do a static analysis of SQL databases, during which I have the luxury of nobody being able to change the data except me.
However, I have not found a way to 'tell' this to SQL in order to prevent running the same query multiple times.
Here is what I would like to do. First I run a complicated query that has a very small output:
SELECT * FROM MYTABLE WHERE MYPROPERTY = 1234
Then I run a simple query from the same window (mostly using SQL Server Management Studio, if that is relevant):
SELECT 1
Now I suddenly realize that I forgot to save the results from my first complicated (slow) query.
Since I know the underlying data did not change (and even if it did), I would like to look one step back and simply retrieve the result. At the moment, however, I don't know of any trick to do this, and I have to run the entire query again.
So the question in summary: how can I (automatically store and) retrieve the results of recently executed queries?
I am particularly interested in simple select queries, and would be happy to allocate, say, 100 MB of memory for automated result storage. I would prefer a solution that works in SQL Server Management Studio with T-SQL, but other SQL solutions are also welcome.
EDIT: I am not looking for a way to manually prevent this from happening. In the cases where I can anticipate the problem it will not happen.
This can't be done in Microsoft SQL Server. SQL Server does not cache results; instead it caches the data pages that your query accessed. This should make the query run a lot faster the second time around, so re-running it won't be as painful.
Other databases, such as Oracle and MySQL, do have a query cache mechanism that will let you retrieve the results directly the second time around.
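For example, in MySQL (5.7 and earlier; the query cache was removed in 8.0) you can ask for a result to be cached explicitly. Table and column names here are placeholders:
SET SESSION query_cache_type = ON; -- requires the server to have the query cache enabled (query_cache_size > 0)
SELECT SQL_CACHE * FROM mytable WHERE myproperty = 1234;
-- an identical statement re-run later is answered straight from the cache,
-- until any of the underlying tables change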
I run into this frequently; I often just throw the results of longer-running queries into a temp table:
SELECT *
INTO #results1
FROM MYTABLE WHERE MYPROPERTY = 1234
SELECT *
FROM #results1
If the query is very long-running, I might use a 'real' table instead. It's a good way to save on re-run time.
The downside is that it adds to your query.
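A sketch of the 'real' table variant, using the same example query (the table name is just an example):
SELECT *
INTO dbo.results_20130701 -- a permanent table instead of #results1
FROM MYTABLE WHERE MYPROPERTY = 1234

-- and later, once the results are no longer needed:
DROP TABLE dbo.results_20130701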
You can also send query results to a file in SSMS; info on formatting the output is here: SSMS Results to File
The easiest way to do this is to run each query in its own SSMS window; the results will stay there until you close the window or run out of memory. Besides that, I am not sure there is a way to accomplish what you want.
Once you close the SSMS window, I don't believe there is a way to get back 'cached' results.
This isn't a technical answer to your question. Having written queries and looked at results for many years, I am in the habit of saving the results in Excel, regardless of the database/query tool I'm using.
The format in Excel is rather methodical:
Each worksheet has the date. (Called something like "1 Jul".)
Each spreadsheet contains one month. (Typically named with the month, like "work-201307".)
In the "B" column I copy and paste the query.
Underneath, in the "C" column, I copy and paste the results.
The next query starts a few lines below, and so on, one after the other.
I put the queries in the "B" column so I can jump down the "A" column to reach the first row of each, and the results in the "C" column so I can jump down the "B" column to move between queries.
I mostly do this so I can go back and see the work I did many months ago. For instance, someone sends an email from February and says "do this again". I can go back to the February spreadsheet, go to the day it was created, and see what I was doing at that time.
Reading your question, though, I realize that I now solve this problem instinctively, because the "right-click on the grid, copy with column headers, alt-tab to Excel, alt-V" routine comes quite naturally to me.
I was going to suggest running each query through a script with a counter (stored in a table) that is incremented each time a query executes (i.e. i++), storing each result in a temp table called "tmpTable" + i, but that sounds very complicated to manage. Am I right?
So then I googled and found this tool pack. I haven't tried it, but you could take a look:
http://www.ssmstoolspack.com/Features
Hope it helps.
EDIT: added the following link. There's an option to output to an XML file, and they mention SQL Server Integration Services as a possible solution too.
http://michaeljswart.com/2012/03/sending-query-results-to-others/#method5
SECOND EDIT: there's also this DBMS-independent tool, which sounds interesting:
http://www.sql-workbench.net/
I am not sure this is what you want, but anyway, check my answer.
In SQL Server Management Studio you can open multiple tabs for executing queries. Open a new tab for each query; the results of an executed query will then remain available under its tab.
After executing one query in a tab, don't reuse that tab for a new query; open a new tab for that job.
Have you considered using some kind of offline SQL client, such as Excel? Specifically, Excel will retrieve the results into the spreadsheet (using the Data ribbon/menus), where they are stored pretty much permanently as results. It will prompt you to refresh when necessary, or you can do it on demand.
As to whether this can be done in T-SQL or other databases: it depends on the database and its results cache, and even then those are options the query processor may use, not guarantees for an individual query.
Edit: the example outputs include no file names or trailing slashes.
I have a database with potentially thousands of records (we're talking a 2 MB result string if it were just SELECT * FROM xxx in a standard use case).
Now, for security reasons, this result cannot be held anywhere for much further processing.
There is a path field, and I want to extract all records at each level of the folder structure.
So, running the query one way, I get every record in the root:
C:\
Querying again another way, I get every record at the first folder level:
C:\a\
C:\b\
etc
Then of course I will GROUP somehow in order to return
C:\a\
C:\b\
and not
C:\a\
C:\a\
C:\b\
C:\b\
Hopefully you get the idea?
I will be grateful for any answers that at least move me in the right direction. I really am stumped as to where to start with this, since downloading every record and processing it locally (which is what we do now) is far from the ideal solution in my context.
SAMPLE DATA
C:\a\b\c\d
C:\a\b\c
C:\
C:\a\b
C:\g
D:\x
D:\x\y
Sample output 1
C:\
D:\
Sample output 2
C:\a
C:\g
D:\x
Sample output 3
C:\a\b
D:\x\y
Sample output 4
C:\a\b\c
Sample output 5
C:\a\b\c\d
If you have only folders, you could do: SELECT DISTINCT path FROM table WHERE LENGTH(path) - LENGTH(REPLACE(path,'\','')) = N
If you also have file names, then it depends on whether your RDBMS provides an INSTR function (or some regexp substitution function). In all cases, it depends on the string functions that are available.
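To make that concrete against the sample data above (note that SQL Server spells the length function LEN rather than LENGTH; the table name paths is made up):
-- N = 1 backslash matches rows such as C:\g and D:\x
-- N = 3 backslashes matches rows such as C:\a\b\c
select distinct path
from paths
where len(path) - len(replace(path, '\', '')) = 3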
I have questions regarding MySQL date and querying with it.
First:
SELECT * FROM post WHERE DATE(Post_Date)="2009-03-25"
returns 0 results
SELECT * FROM post WHERE Post_Date="2009-03-25"
returns 71 results
SELECT * FROM post WHERE Post_Date>="2009-03-25"
returns 379 results
I understand that the second query returns 71 results because it matches only posts with 2009-03-25 00:00:00 as the Post_Date, and that the third query shows everything from that date onwards. BUT why does the first query SHOW 0 RESULTS? Please help! I checked the MySQL cnf and date_format is set to %Y-%m-%d.
Second:
SELECT * FROM post WHERE DATE(Post_Date)="2009-03-25"
RETURNS results on WINDOWS!
SELECT * FROM post WHERE DATE(Post_Date)="2009-03-25"
NO RESULTS in Linux!
Any pointers will be helpful!
Is there a configuration file that I need to change to make this work in Linux?
Diagnostic step: run the query SELECT DATE('2009-03-25 08:30:00') on each system. The result will probably tell you what's going on. (Likely a version issue.)
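If DATE() turns out to be the culprit (it only exists in newer MySQL versions; if I remember correctly it was added in 4.1.1), an equivalent range predicate works on any version and can also use an index on Post_Date:
SELECT * FROM post
WHERE Post_Date >= '2009-03-25'
  AND Post_Date < '2009-03-26';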
Not sure what to say about your first part, but as for the second: have you checked to make sure that both your Windows and Linux servers have the same data in their respective databases? If you are sure they do, you may want to check whether the Linux database gives any results for that year or year-month rather than only the specific year-month-day.