Sorry for the generic title of the question, but I didn't know how else to put it. So here goes:
I have a single table that holds the following information:
computerName | userName | date                    | logOn | startUp
-------------+----------+-------------------------+-------+--------
ID_000000001 | NULL     | 2012-08-14 08:00:00.000 | NULL  | 1
ID_000000001 | NULL     | 2012-08-15 09:00:00.000 | NULL  | 0
ID_000000003 | user02   | 2012-08-15 19:00:00.000 | 1     | NULL
ID_000000004 | user02   | 2012-08-16 20:00:00.000 | 0     | NULL
computerName and userName are self-explanatory, I suppose.
logOn is 1 when the user logged on at the machine and 0 when he logged off.
startUp is 1 when the machine was turned on and 0 when it got turned off.
In each row, the other of these two columns is always NULL, since we can't log in and start up at the exact same time.
Now my task is: find out which computers have been turned on for the least amount of time over the last month (or any given period, but for now let's say one month). Is this even possible with SQL? Careful: I don't need to know how many times a PC was turned on, but how many hours/minutes each computer was turned on over the given time span.
There are two small problems as well:
We cannot say that the first entry of each computer is a 1 in the startUp column, since the script that logs those events was installed recently and a computer may already have been running when logging started.
We cannot assume that if we order by date and only show the startUp column, the entries will all be alternating 1's and 0's, because if the computer is forced to shut down (by pulling the plug, for example) there won't be a log entry for the shutdown and there could be two 1's in a row.
EDIT: userName is of course NULL when startUp has a value, since turning on/shutting down doesn't show which user did that.
You can do this in a stored procedure, with cursors and fetch loops,
using a temp table that stores the uptime per computer name.
I'll give you the main plan and let you look up the details in the T-SQL guide.
Another link: a good example with a T-SQL cursor.
DECLARE @total_hour_by_computername int
DECLARE @computer_name varchar(255)
DECLARE @current_date datetime
DECLARE @previousDate datetime
-- The ComputerNameList cursor holds all the distinct computer names:
DECLARE ComputerNameList CURSOR FOR
SELECT DISTINCT computerName FROM myTable
-- Open the cursor
OPEN ComputerNameList
-- Begin the loop over each computer name:
FETCH NEXT FROM ComputerNameList
INTO @computer_name
WHILE @@FETCH_STATUS = 0
BEGIN
SET @total_hour_by_computername = 0;
-- Second loop: all shutdown dates (startUp = 0) of this computer, ordered by date.
DECLARE DateList CURSOR LOCAL FOR
SELECT [date] FROM myTable WHERE startUp = 0 AND computerName = @computer_name ORDER BY [date]
OPEN DateList
FETCH NEXT FROM DateList INTO @current_date
WHILE @@FETCH_STATUS = 0
BEGIN
-- This query gives the most recent startup before that shutdown
SET @previousDate = NULL
SELECT TOP(1) @previousDate = [date] FROM myTable
WHERE computerName = @computer_name AND [date] < @current_date AND startUp = 1
ORDER BY [date] DESC
-- If it comes back NULL, the shutdown is not taken into account.
IF @previousDate IS NOT NULL
SET @total_hour_by_computername = @total_hour_by_computername + DATEDIFF(hour, @previousDate, @current_date);
FETCH NEXT FROM DateList INTO @current_date
END
CLOSE DateList
DEALLOCATE DateList
-- Once all dates have been processed, store the result for this computer name in the temp table
INSERT INTO TEMPTABLE(Computername, uptime) VALUES (@computer_name, @total_hour_by_computername)
-- End of the @computer_name loop
FETCH NEXT FROM ComputerNameList
INTO @computer_name
END
CLOSE ComputerNameList
DEALLOCATE ComputerNameList
After that, you only need a SELECT on your temp table to determine which of the computers has been up the most (or the least) time.
You could group by computer, and use where to filter for startups in a particular month:
select computerName
, count(*)
from YourTable
where '2012-08-01' <= [date] and [date] < '2012-09-01'
and startup = 1
group by
computerName
order by
count(*) desc
As RoadWarrior pointed out, an accurate report is not possible when shutdown messages are dropped. But here is an attempt to generate something useful. I'm going to assume the table name is computers:
SELECT c1.computerName,
timediff(MIN(c2.date), c1.date) as upTime
FROM computers as c1, computers as c2
WHERE c1.computerName=c2.computerName
AND c1.startUp=1 AND c2.startUp=0
AND c2.date >= c1.date
GROUP BY c1.computerName, c1.date
ORDER BY c1.date;
This will generate a list of all the periods a computer was on. To generate your requested report you can use the above query as a subquery:
SELECT
c3.computerName,
SEC_TO_TIME(SUM(TIME_TO_SEC(c3.upTime))) AS totalUpTime
FROM
(SELECT c1.computerName,
timediff(MIN(c2.date), c1.date) AS upTime
FROM computers AS c1, computers AS c2
WHERE c1.computerName=c2.computerName
AND c1.startUp=1 AND c2.startUp=0
AND c2.date >= c1.date
GROUP BY c1.computerName, c1.date
ORDER BY c1.date
) AS c3
GROUP BY c3.computerName
ORDER BY totalUpTime;
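The pairing idea above can be tried end to end on a toy data set. What follows is only a sketch under stated assumptions: it runs on Python's built-in sqlite3 module rather than MySQL, the sample rows are invented, and a startup is counted only when the very next startup/shutdown event of the same computer is a shutdown. A startup whose shutdown was never logged therefore contributes nothing, instead of being paired with some later shutdown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE computers (computerName TEXT, date TEXT, startUp INTEGER)")
conn.executemany("INSERT INTO computers VALUES (?, ?, ?)", [
    ("PC1", "2012-08-14 08:00:00", 1),
    ("PC1", "2012-08-14 10:00:00", 1),  # plug was pulled: no shutdown logged before this startup
    ("PC1", "2012-08-14 18:00:00", 0),
    ("PC2", "2012-08-14 09:00:00", 0),  # already running when logging began
    ("PC2", "2012-08-15 07:00:00", 1),
    ("PC2", "2012-08-15 07:30:00", 0),
])

# For every startup row s, e is the next startup/shutdown event of the same
# computer; the pair only counts when that next event is a shutdown.
query = """
SELECT s.computerName,
       SUM(strftime('%s', e.date) - strftime('%s', s.date)) / 60 AS minutes_up
FROM computers AS s
JOIN computers AS e
  ON e.computerName = s.computerName
 AND e.date = (SELECT MIN(n.date)
               FROM computers AS n
               WHERE n.computerName = s.computerName
                 AND n.startUp IS NOT NULL
                 AND n.date > s.date)
WHERE s.startUp = 1 AND e.startUp = 0
GROUP BY s.computerName
ORDER BY minutes_up
"""
rows = conn.execute(query).fetchall()
print(rows)
```

Computers that never produce a complete startup/shutdown pair do not appear in the result at all, so the figures are a lower bound on the real uptime.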
Try this query (replace table_name with the name of your table):
SELECT computerName, SUM(startUp) AS startupTimes
FROM table_name
GROUP BY computerName
ORDER BY startupTimes
This will output the number of times each computer has been started. To get just the first row (the computer with the fewest startups) you can append LIMIT 1 to the query.
If (per your last paragraph) you aren't recording all shutdown events, then you don't have the information available to generate a report showing the amount of time each computer has been switched on. Because you aren't recording all instances of computer shutdown, it doesn't matter what SQL query you use.
FWIW, this schema isn't 3NF. A more common approach would be to have a single column recording each event, for example:
ComputerId:UserId:EventId:EventDate
The first three columns are each a foreign key into another table where the details are stored. Although even with this schema, the UserID would be null for startup/shutdown events.
I have a table named events that looks like this:
timestamp | intvalue | hostname | attributes
2019-03-13 14:43:05.437| 257 | room04 | Success 000
2019-03-13 14:43:05.317| 257 | room03 | Success 000
2019-03-13 14:43:03.450| 2049 | room05 | Error 108
2019-03-13 14:43:03.393| 0 | room05 | TicketNumber=3
2019-03-13 14:43:02.347| 0 | room04 | TicketNumber=2
2019-03-13 14:43:02.257| 0 | room03 | TicketNumber=1
The above is a sample of a table containing thousands of rows like this.
I'll explain in a few words what you see in this table. The timestamp column gives the date and time of when each event happened. In the intvalue column, 257 means successful entry, 2049 means error and 0 means a ticket made a request. The hostname gives the name of the card/ticket reader that reads each ticket and the attributes column gives some details like the number of the ticket (1, 2, 3 etc) or the type of error (i.e 108 or 109) and if the event is successful.
In this situation there is a pattern: if a valid ticket requests entry at a time such as 14:43:02.257, then the message for the successful entry is written to the database (as a new event) at most 6 seconds after the ticket was read by the ticket reader (that is, by 14:43:08.257 at the latest).
If the ticket fails to enter, then after a time margin of 100 ms the error message will be written in the database.
So in this example what I want to do is create a table like the one below:
timestamp | intvalue | hostname | result | ticketnumber
2019-03-13 14:43:05.437| 257 | room04 | Success 000 | TicketNumber=2
2019-03-13 14:43:05.317| 257 | room03 | Success 000 | TicketNumber=1
2019-03-13 14:43:03.450| 2049 | room05 | Error 108 | TicketNumber=3
As you can see, the ticket with TicketNumber=3 is matched with the result Error 108 because, in the initial table, the two rows are less than 100 ms apart. The other two tickets are matched 1-to-1 with their respective results because the time difference is less than 6 seconds (and more than 100 ms). You can also see that the hostnames help with the matching: the row with the attribute TicketNumber=3 has the hostname room05, just like the next row, which has the attribute Error 108.
I've been trying to self join this table or join it with a CTE. I've used cross apply and I have also tried methods using datediff, but I've failed miserably and I'm stuck.
Can anyone help me and show me a correct way of achieving the desired outcome?
Thank you very much for your time.
The time lags don't really seem to make a difference, unless somehow a single room could be interleaved with both success and failure messages. Assuming that two requests do not happen in a row with no intervening event, then you can use lag():
select e.*
from (select timestamp, intvalue, hostname, attributes,
lag(attributes) over (partition by hostname order by timestamp) as ticketnumber
from events
) e
where intvalue > 0
order by timestamp
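To see the lag() idea produce the requested table, here is a small sketch that runs on Python's built-in sqlite3 module (SQLite 3.25 or newer is needed for window functions) instead of SQL Server. The column name ts and the extra checks are assumptions added for illustration: a result row is kept only if the previous event on the same host was a ticket request (intvalue = 0) and happened at most 6 seconds earlier.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (ts TEXT, intvalue INTEGER, hostname TEXT, attributes TEXT)")
conn.executemany("INSERT INTO events VALUES (?, ?, ?, ?)", [
    ("2019-03-13 14:43:05.437", 257, "room04", "Success 000"),
    ("2019-03-13 14:43:05.317", 257, "room03", "Success 000"),
    ("2019-03-13 14:43:03.450", 2049, "room05", "Error 108"),
    ("2019-03-13 14:43:03.393", 0, "room05", "TicketNumber=3"),
    ("2019-03-13 14:43:02.347", 0, "room04", "TicketNumber=2"),
    ("2019-03-13 14:43:02.257", 0, "room03", "TicketNumber=1"),
])

# Look one event back per hostname, then keep only result rows whose
# predecessor is a ticket request that is recent enough.
query = """
SELECT ts, intvalue, hostname, attributes AS result, prev_attributes AS ticketnumber
FROM (
    SELECT ts, intvalue, hostname, attributes,
           LAG(attributes) OVER (PARTITION BY hostname ORDER BY ts) AS prev_attributes,
           LAG(intvalue)   OVER (PARTITION BY hostname ORDER BY ts) AS prev_intvalue,
           LAG(ts)         OVER (PARTITION BY hostname ORDER BY ts) AS prev_ts
    FROM events
)
WHERE intvalue > 0
  AND prev_intvalue = 0
  AND (julianday(ts) - julianday(prev_ts)) * 86400.0 <= 6.0
ORDER BY ts DESC
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Looking back a single row is only safe if a reader never logs two ticket requests before the first result arrives; interleaved requests on one host would need a different matching rule.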
OK...here is the result you asked for based on the data you provided. This is just an example of how to write a self join to get the results in your example. I hope this pushes you in the right direction.
IF OBJECT_ID('tempdb..#t') IS NOT NULL
BEGIN
DROP TABLE #t
END
CREATE TABLE #t
(
[timestamp] DATETIME,
intValue INT,
hostName VARCHAR(50),
attributes VARCHAR(50)
)
INSERT INTO #t([timestamp], intValue, hostName, attributes)
VALUES ('2019-03-13 14:43:05.437', 257, 'room04', 'Success 000'),
('2019-03-13 14:43:05.317',257, 'room03','Success 000'),
('2019-03-13 14:43:03.450',2049, 'room05','Error 108'),
('2019-03-13 14:43:03.393',0, 'room05','TicketNumber=3'),
('2019-03-13 14:43:02.347',0, 'room04','TicketNumber=2'),
('2019-03-13 14:43:02.257',0, 'room03','TicketNumber=1')
SELECT x.[timestamp], x.intValue, x.hostName, x.attributes result, y.attributes
ticketnumber
FROM (SELECT * FROM #t WHERE intValue > 0) AS x
INNER JOIN #t y
ON x.hostName = y.hostName AND y.intValue = 0
GROUP BY x.[timestamp], x.intValue, x.hostName, x.attributes, y.attributes
ORDER BY x.[timestamp] DESC
I would not try to copy this into your project and use it; it is just an example of how to use the join. I would need far more information about what you want to accomplish before posting a full-blown solution, as there are much better ways to produce reports for large data sets.
- Bill
Since you're using SQL 2017, you can make use of lead/lag.
with evt(timestamp,intvalue,hostname,attributes) as
(
select cast('2019-03-13 14:43:05.437' as datetime), 257 , 'room04','Success 000' union all
select cast('2019-03-13 14:43:05.317' as datetime), 257 , 'room03','Success 000' union all
select cast('2019-03-13 14:43:03.450' as datetime), 2049 , 'room05','Error 108' union all
select cast('2019-03-13 14:43:03.393' as datetime), 0 , 'room05','TicketNumber=3' union all
select cast('2019-03-13 14:43:02.347' as datetime), 0 , 'room04','TicketNumber=2' union all
select cast('2019-03-13 14:43:02.257' as datetime), 0 , 'room03','TicketNumber=1'
)
select [timestamp], intvalue, hostname, attributes,
lag(attributes) over (partition by hostname order by timestamp) ticketnumber,
datediff(ss, lag([timestamp]) over (partition by hostname order by timestamp), [timestamp]) lapse
from evt
order by timestamp
I've got a postgres database that contains a table with IP, User, and time fields. I need a query to give me the complete set of all IPs that have only a single user active on them over a defined time period (i.e. I need to filter out IPs with multiple or no users, and should only have one row per IP). The user field contains some null values, that I can filter out. I'm using Pandas' read_sql() method to get a dataframe directly.
I can get the full dataframe of data from the defined time period easily with:
SELECT ip, user FROM table WHERE user IS NOT NULL AND time >= start AND time <= end
I can then take this data and wrangle the information I need out of it easily using pandas with groupby and filter operations. However, I would like to be able to get what I need using a single SQL query. Unfortunately, my SQL chops ain't too hot. My first attempt below isn't great; the dataframe I end up with isn't the same as when I create the dataframe manually using the original query above and some pandas wrangling.
SELECT DISTINCT ip, user FROM table WHERE user IS NOT NULL AND ip IN (SELECT ip FROM table WHERE user IS NOT NULL AND time >= start AND time <= end GROUP BY ip HAVING COUNT(DISTINCT user) = 1)
Can anyone point me in the right direction here? Thanks.
edit: I neglected to mention that there are multiple entries for each user/ip combination. The source is network authentication traffic, and users authenticate on IPs very frequently.
Sample table head:
---------------------------------
ip | user | time
---------------------------------
172.18.0.0 | jbloggs | 1531987000
172.18.0.0 | jbloggs | 1531987100
172.18.0.1 | jsmith | 1531987200
172.18.0.1 | jbloggs | 1531987300
172.18.0.2 | odin | 1531987400
If I were to query this example table for the time range 1531987000 to 1531987400 I would like the following output:
---------------------
ip | user
--------------------
172.18.0.0 | jbloggs
172.18.0.2 | odin
This should work
SELECT ip
FROM table
WHERE user IS NOT NULL AND time >= start AND time <= end
GROUP BY ip
HAVING COUNT(ip) = 1
Explanation:
SELECT ip FROM table WHERE user IS NOT NULL AND time >= start AND time <= end - filtering out the nulls and time periods
...GROUP BY ip HAVING COUNT(ip) = 1 - If an ip has multiple users, the count (the number of rows with that ip) would be greater than 1.
If by "single user" you mean that there could be multiple rows with only one user, then:
SELECT ip
FROM table
WHERE user IS NOT NULL AND time >= start AND time <= end
GROUP BY ip
HAVING MIN(user) = MAX(user) AND COUNT(user) = COUNT(*);
I have figured out a query that gets me what I want:
SELECT DISTINCT ip, user
FROM table
WHERE user IS NOT NULL AND time >= start AND time <= end AND ip IN
(SELECT ip FROM table
WHERE user IS NOT NULL AND time >= start AND time <= end
GROUP BY ip HAVING COUNT(DISTINCT user) = 1)
Explanation:
The inner select gets me all IPs that have only one user across the specified time range. I then need to select the distinct ip/user pairs from the main table where the IPs are in the nested select.
It seems messy that I have to do the same filtering (of time range and non-null user fields) twice though, is there a better way to do this?
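One way to avoid repeating the filter is to let the grouped query return the user as well: once HAVING guarantees exactly one distinct user per IP, MIN(user) is that user. The sketch below runs on Python's built-in sqlite3 module rather than Postgres, and the names auth_log, username and ts are hypothetical stand-ins for the real table and columns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE auth_log (ip TEXT, username TEXT, ts INTEGER)")
conn.executemany("INSERT INTO auth_log VALUES (?, ?, ?)", [
    ("172.18.0.0", "jbloggs", 1531987000),
    ("172.18.0.0", "jbloggs", 1531987100),
    ("172.18.0.1", "jsmith", 1531987200),
    ("172.18.0.1", "jbloggs", 1531987300),
    ("172.18.0.2", "odin", 1531987400),
    ("172.18.0.3", None, 1531987400),  # NULL user, filtered out
])

# One pass: filter once, group by ip, keep groups with a single distinct user.
query = """
SELECT ip, MIN(username) AS username
FROM auth_log
WHERE username IS NOT NULL AND ts >= ? AND ts <= ?
GROUP BY ip
HAVING COUNT(DISTINCT username) = 1
ORDER BY ip
"""
rows = conn.execute(query, (1531987000, 1531987400)).fetchall()
print(rows)
```

The same SELECT string can be passed to pandas' read_sql() to get the dataframe directly.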
I have the following tables:
DOCUMENT(iddoc,doctype,title,publishingdate,validTillDate)
USERS(iduser,fname,lname)
TRANSACTION(idtrans,iduser,iddoc,transdate,schedulereturndate)
I'm asked to indicate, for a given document, whether it is available or not and, if it is borrowed, by whom and when it should be returned. How can I express these conditions in my query?
My code would look something like this:
if(d.validTillDate < SYSDATE){
SELECT u.iduser t.schedulereturndate
FROM USERS u, TRANSACTION t
WHERE u.iduser=t.iduser
}
So I want to know how I can code this IF.
The query for a borrowed document would be something like this:
SELECT d.iddoc, u.iduser, t.schedulereturndate, 'Borrowed'
FROM document d
, USERS u
, TRANSACTION t
WHERE u.iduser = t.iduser
AND t.iddoc = d.iddoc
AND d.validTillDate < SYSDATE
UNION
SELECT d.iddoc, NULL, NULL, 'Not borrowed'
FROM document d
WHERE d.validTillDate IS NULL
OR d.validTillDate >= SYSDATE
Edit: added a union for the not-borrowed documents.
It's hard for me to understand your question.
If I guessed right, then:
SELECT d.iddoc, t.iduser, t.schedulereturndate
FROM
document d
LEFT JOIN transaction t ON
(d.iddoc=t.iddoc)
-- join by iddoc field
AND (SYSDATE BETWEEN t.transdate AND t.schedulereturndate)
-- I assume we need only current transactions, not past (or future?)
WHERE
(SYSDATE<=d.validTillDate)
-- I assume we need only documents whose validity date not passed yet
Assuming there are no active transactions for iddoc=1, one active transaction for iddoc=2 and two active transactions for iddoc=3, the result will look like:
iddoc | iduser | schedulereturndate
------+--------+-------------------
1 NULL NULL
2 534 2017-09-08
3 54334 2016-03-02
3 2433 2016-07-01
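The LEFT JOIN approach can be extended to report a status and the borrower's name for one given document. This is only a sketch, run on Python's built-in sqlite3 module rather than Oracle. It assumes, like the query above, that a loan is active when today's date lies between transdate and schedulereturndate; the table is renamed trans because TRANSACTION is a keyword in SQLite, and the sample rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE document (iddoc INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE users (iduser INTEGER PRIMARY KEY, fname TEXT, lname TEXT);
    CREATE TABLE trans (idtrans INTEGER PRIMARY KEY, iduser INTEGER, iddoc INTEGER,
                        transdate TEXT, schedulereturndate TEXT);
    INSERT INTO document VALUES (1, 'SQL Basics'), (2, 'Oracle Tuning');
    INSERT INTO users VALUES (534, 'Jane', 'Doe');
    INSERT INTO trans VALUES (1, 534, 2, '2017-09-01', '2017-09-08');
""")

def availability(iddoc, today):
    """Return status, borrower and due date for one document on a given day."""
    return conn.execute("""
        SELECT d.iddoc,
               CASE WHEN t.idtrans IS NULL THEN 'Available' ELSE 'Borrowed' END AS status,
               u.fname, u.lname, t.schedulereturndate
        FROM document d
        LEFT JOIN trans t ON t.iddoc = d.iddoc
                         AND :today BETWEEN t.transdate AND t.schedulereturndate
        LEFT JOIN users u ON u.iduser = t.iduser
        WHERE d.iddoc = :iddoc
    """, {"iddoc": iddoc, "today": today}).fetchall()

print(availability(1, "2017-09-05"))
print(availability(2, "2017-09-05"))
```

In Oracle the :today parameter would simply be SYSDATE, and the CASE expression replaces the procedural IF from the question.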
I stumbled upon a very strange behaviour while working on some T-SQL Code.
I am working on a SQL Server 2008 R2 SP2 (build nr.: 10.50.4000).
My question to you guys is if anybody has seen such a behaviour before or if anybody might be able to explain it to me.
So,
What's the situation?
We have a table, which looks like that:
product_number | id_object | position_in_product
---------------------------------------------------
1 | 101 | 1
1 | 102 | 1
1 | 103 | 1
2 | 201 | 1
2 | 202 | 1
2 | 203 | 1
Multiple object ids are allocated to one product number. The order should be defined by the position_in_product column. The funny part lies exactly in establishing that order.
Of course, after doing that the table should look like this:
product_number | id_object | position_in_product
---------------------------------------------------
1 | 101 | 1
1 | 102 | 2
1 | 103 | 3
2 | 201 | 1
2 | 202 | 2
2 | 203 | 3
What's going on?
To update the order column we create a cursor with the following statement:
DECLARE
table_runner CURSOR LOCAL FORWARD_ONLY FOR
SELECT id_object, product_number
FROM table
WHERE ident = @ident
ORDER BY product_number
By using this cursor and counting the rows with the same product_number we should be able to update the position_in_product column. (This has worked in every installation until now)
To move the cursor to the next row we use this:
FETCH NEXT FROM table_runner
INTO @table_runner$id_object, @table_runner$product_number
The whole function looks like this:
OPEN table_runner
FETCH NEXT FROM table_runner
INTO @table_runner$id_object, @table_runner$product_number
WHILE @@FETCH_STATUS = 0
BEGIN
/* update_logic */
FETCH NEXT FROM table_runner
INTO @table_runner$id_object, @table_runner$product_number
END
CLOSE table_runner
And that is the part, that does not work as expected.
The fetch does not give me the next row. I always get the same result row.
The while loop never ends; the fetch status is always 0, but the result stays the same.
The Workaround
After searching the web for quite a while without any results, I decided to try a more pragmatic way and put another FETCH statement in.
I know that the id_object variable is unique and has to change in every loop cycle,
so I remembered the last fetched id and put this under the loop's fetch statement:
IF @id_object_memory = @table_runner$id_object
BEGIN
FETCH NEXT FROM table_runner
INTO @table_runner$id_object, @table_runner$product_number
SET @id_object_memory = @table_runner$id_object
END
ELSE
SET @id_object_memory = @table_runner$id_object
With that the loop works as expected, the column in question is updated as it should and the cursor will reach the end of the result set.
The big ?
Does anyone have an explanation for that?
There are more cursors defined in the same procedure and they all work as expected.
I have absolutely no clue how to explain this.
So, thanks for reading ;)
I can't help with the cursor issue, I've never seen this before, but should point out you don't need a cursor at all to do this update. You can simply use:
WITH CTE AS
( SELECT Product_Number,
ID_Object,
Position_in_Product,
RowNumber = ROW_NUMBER() OVER(PARTITION BY Product_Number
ORDER BY id_object)
FROM T
WHERE ident = @ident
)
UPDATE CTE
SET Position_in_Product = RowNumber;
Example on SQL Fiddle
You possibly don't even need to store this column, and can just use ROW_NUMBER in a query where the position_in_product is required.
Cursors are so 2000 ;-)
Seriously though; avoid cursors at all costs. Set-based operations > looping.
Just create a view with the following:
CREATE VIEW your_view
AS
SELECT product_number
, id_object
, Row_Number() OVER (PARTITION BY product_number ORDER BY id_object) As position_in_product
FROM your_table
;
No need to ever perform the update; the row numbers will "automatically" recalculate.
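The "recalculates automatically" claim is easy to check on a toy table. This sketch runs on Python's built-in sqlite3 module (SQLite 3.25 or newer for ROW_NUMBER) rather than SQL Server; your_table and your_view are the placeholder names from the view above, and the extra row with id_object 100 is invented for the demonstration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE your_table (product_number INTEGER, id_object INTEGER)")
conn.executemany("INSERT INTO your_table VALUES (?, ?)",
                 [(1, 101), (1, 102), (1, 103), (2, 201), (2, 202), (2, 203)])

# The position is computed on read, so it is never stored or updated.
conn.execute("""
    CREATE VIEW your_view AS
    SELECT product_number, id_object,
           ROW_NUMBER() OVER (PARTITION BY product_number
                              ORDER BY id_object) AS position_in_product
    FROM your_table
""")

select_view = ("SELECT product_number, id_object, position_in_product "
               "FROM your_view ORDER BY product_number, id_object")
before = conn.execute(select_view).fetchall()

# A new object that sorts first in product 1 shifts the other positions.
conn.execute("INSERT INTO your_table VALUES (1, 100)")
after = conn.execute(select_view).fetchall()

print(before)
print(after)
```

The trade-off is that the numbering is recomputed on every read, which is usually cheap for small partitions but worth measuring on large tables.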
Yeah, so I'm filling out a requirements document for a new client project and they're asking for growth trends and performance expectations calculated from existing data within our database.
The best source of data for something like this would be our logs table as we pretty much log every single transaction that occurs within our application.
Now, here's the issue: I don't have a whole lot of experience with MySQL when it comes to calculating cumulative sums and running averages. I've thrown together the following query, which kind of makes sense to me, but it just keeps locking up the command console. The thing takes forever to execute and there are only 80k records within the test sample.
So, given the following basic table structure:
id | action | date_created
1 | 'merp' | 2007-06-20 17:17:00
2 | 'foo' | 2007-06-21 09:54:48
3 | 'bar' | 2007-06-21 12:47:30
... thousands of records ...
3545 | 'stab' | 2007-07-05 11:28:36
How would I go about calculating the average number of records created for each given day of the week?
day_of_week | average_records_created
1 | 234
2 | 23
3 | 5
4 | 67
5 | 234
6 | 12
7 | 36
I have the following query which makes me want to murderdeathkill myself by casting my body down an elevator shaft... and onto some bullets:
SELECT
DISTINCT(DAYOFWEEK(DATE(t1.datetime_entry))) AS t1.day_of_week,
AVG((SELECT COUNT(*) FROM VMS_LOGS t2 WHERE DAYOFWEEK(DATE(t2.date_time_entry)) = t1.day_of_week)) AS average_records_created
FROM VMS_LOGS t1
GROUP BY t1.day_of_week;
Halps? Please, don't make me cut myself again. :'(
How far back do you need to go when sampling this information? This solution works as long as it's less than a year.
Because day of week and week number are constant for a record, create a companion table that has the ID, WeekNumber, and DayOfWeek. Whenever you want to run this statistic, just generate the "missing" records from your master table.
Then, your report can be something along the lines of:
select
DayOfWeek
, count(*)/count(distinct(WeekNumber)) as Average
from
MyCompanionTable
group by
DayOfWeek
Of course if the table is too large, then you can instead pre-summarize the data on a daily basis and just use that, and add in "today's" data from your master table when running the report.
I rewrote your query as:
SELECT x.day_of_week,
AVG(x.count) 'average_records_created'
FROM (SELECT DAYOFWEEK(t.datetime_entry) 'day_of_week',
COUNT(*) 'count'
FROM VMS_LOGS t
-- count per calendar date first, so that AVG sees one row per day
GROUP BY DATE(t.datetime_entry), DAYOFWEEK(t.datetime_entry)) x
GROUP BY x.day_of_week
The reason your query takes so long is the inner select: you are essentially running 6,400,000,000 queries. With a query like this, your best solution may be to develop a timed reporting system, where the user receives an email when the query is done and the report is constructed, or the user logs in and checks the report afterwards.
Even with the optimization written by OMG Ponies (below) you are still looking at around the same number of queries.
SELECT x.day_of_week,
AVG(x.count) 'average_records_created'
FROM (SELECT DAYOFWEEK(t.datetime_entry) 'day_of_week',
COUNT(*) 'count'
FROM VMS_LOGS t
-- count per calendar date first, so that AVG sees one row per day
GROUP BY DATE(t.datetime_entry), DAYOFWEEK(t.datetime_entry)) x
GROUP BY x.day_of_week
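The two-level aggregation (count per calendar date, then average those counts per weekday) can be checked on a handful of rows. This is a sketch on Python's built-in sqlite3 module rather than MySQL. The table name logs and the sample rows are invented, and strftime('%w', ...) numbers the weekdays 0 (Sunday) to 6, unlike MySQL's DAYOFWEEK, which runs from 1 (Sunday) to 7.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE logs (id INTEGER PRIMARY KEY, action TEXT, date_created TEXT)")
conn.executemany("INSERT INTO logs (action, date_created) VALUES (?, ?)", [
    ("merp", "2007-06-20 17:17:00"),  # Wednesday, 3 records
    ("foo", "2007-06-20 18:00:00"),
    ("bar", "2007-06-20 19:30:00"),
    ("foo", "2007-06-27 09:54:48"),   # the following Wednesday, 1 record
    ("bar", "2007-06-21 12:47:30"),   # Thursday, 5 records
    ("bar", "2007-06-21 12:48:30"),
    ("bar", "2007-06-21 12:49:30"),
    ("bar", "2007-06-21 12:50:30"),
    ("stab", "2007-06-21 12:51:30"),
])

# Inner query: one row per calendar date with that date's record count.
# Outer query: average of those daily counts per day of the week.
query = """
SELECT day_of_week, AVG(cnt) AS average_records_created
FROM (
    SELECT strftime('%w', date_created) AS day_of_week,
           date(date_created) AS day,
           COUNT(*) AS cnt
    FROM logs
    GROUP BY day, day_of_week
)
GROUP BY day_of_week
ORDER BY day_of_week
"""
rows = conn.execute(query).fetchall()
print(rows)
```

Dates with no records at all produce no inner row, so they do not pull the average down; if empty days should count as zero, the inner query would need a calendar table to join against.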