I have 2 tables in SQL: Event and Swimstyle.
The Event table has a value SwimstyleId which refers to Swimstyle.id.
The Swimstyle table has 3 values: distance, relaycount and strokeid.
Normally there would be somewhere between 30 and 50 rows in the Swimstyle table, holding all possible values (these are swimming distances like 50 (distance), 1 (relaycount), FREE (strokeid)).
However, due to a programming mistake the lookup for existing values didn't work, and the importer of new results created a new Swimstyle entry for each event added...
My Swimstyle table now consists of almost 200k rows, which of course is not the best idea performance-wise ;)
To fix this I want to go through all Events, get the swimstyle values that are attached, look up the first existing row in Swimstyle that has the same distance, relaycount and strokeid values, and update Event.SwimstyleId with that value.
When this is all done I can delete all orphaned Swimstyle rows, leaving a table with only 30-50 rows.
I have been trying to write a query that does this, but I'm not getting anywhere. Can anyone point me in the right direction?
These 2 statements should fix the problem, if I've read it right. N.B. I haven't been able to try this out anywhere, and I've made a few assumptions about the table structure.
UPDATE event e
SET swimstyle_id = (SELECT MIN(s_min.id)
                    FROM swimstyle s_min, swimstyle s_cur
                    WHERE s_min.distance = s_cur.distance
                      AND s_min.relaycount = s_cur.relaycount
                      AND s_min.strokeid = s_cur.strokeid
                      AND s_cur.id = e.swimstyle_id);

DELETE FROM swimstyle s
WHERE NOT EXISTS (SELECT 1
                  FROM event e
                  WHERE e.swimstyle_id = s.id);
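Since the correlated subquery has to find the matching Swimstyle rows for every Event, an index on the three lookup columns may help while the cleanup runs; a hedged sketch (the index name is an assumption):

-- Speeds up the distance/relaycount/strokeid lookups done per Event row
CREATE INDEX idx_swimstyle_lookup ON swimstyle (distance, relaycount, strokeid);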
I am new to SQL and this is probably fairly easy, but I cannot figure out how to make it work.
I am trying to fill a column with data pulled from another column in the same table. In this case it's a basketball database with box scores, and I am trying to fill the opponent points column (opp_team_pts) to match what the opponent for that game scored. Each game is matched by season_id and game_id.
The whole table is about 700 rows with a few hundred games and about 40 teams, but a sample is below. This is an example of one game where the score was 84-81, and I want to fill opp_team_pts with the appropriate score:
season_id  game_id  team_id   team_pts  opp_team_pts
U2018      140      U2018_19  84.0
U2018      140      U2018_23  81.0
I have tried the following, but have only been able to fill the whole opp_team_pts column with 84, which is obviously incorrect:
UPDATE box_scores
SET opp_team_pts = (SELECT box_scores.team_pts
FROM box_scores
WHERE box_scores.season_id=box_scores.season_id AND box_scores.game_id=box_scores.game_id);
I'm sure the code is probably redundant, but that is as far as I got. I understand why it filled the way it did, but I can't seem to figure out how to fix it. I may be on the wrong track, but hopefully I can get a bit of help.
Assuming that each game has exactly two teams, you can use a correlated subquery:
UPDATE box_scores
SET opp_team_pts = (SELECT bs2.team_pts
FROM box_scores bs2
WHERE bs2.season_id = box_scores.season_id AND
bs2.game_id = box_scores.game_id AND
bs2.team_id <> box_scores.team_id
);
SQLite does not support FROM in the UPDATE statement.
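With the sample rows from the question, the correlated subquery picks up the other team's row for the same season_id and game_id, so the table would end up as:

season_id  game_id  team_id   team_pts  opp_team_pts
U2018      140      U2018_19  84.0      81.0
U2018      140      U2018_23  81.0      84.0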
My data is Microsoft Office 365 Mailbox audit logs.
I am working with 14 columns, incorporating names, timestamps, IP addresses, etc.
I have two tables, let's call them EXISTING and NEW. The column definition, order and count are identical in the two tables.
The data in Existing is (very close to!) Distinct.
The data in New is drawn from multiple overlapping searches and is not Distinct.
There are millions of rows in Existing and hundreds of thousands in New.
Data is being written to New all the time, 24x7, with about 1 million rows a day being added.
~95% of the rows in New are already present in Existing and are therefore unwanted duplicates. However, the data in New has many gaps; there are many recent rows in Existing that are NOT present in New.
I want to select all rows from New that are not present in Existing, using Invoke-SqlCmd in PowerShell.
Then I want to delete all the processed rows from New so it doesn't grow uncontrollably.
My approach so far has been:
Add a [Processed] column to New.
Set [Processed] to 0 for all existing data for selection purposes. New rows that continue to be added will have [Processed] = NULL, and will be left alone.
SELECT DISTINCT all data with [Processed] = 0 from New and copy it to a temporary table called Staging. Find the oldest timestamp ([LastAccessed]) in this data. Then delete all rows from New with [Processed] = 0.
Copy all data from Existing with [LastAccessed] equal to or later than the above timestamp across to Staging, adding the column [Processed] = 1.
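A rough sketch of the bookkeeping around those steps, using the table and column names described above (the column-by-column copy into Staging is left out for brevity):

-- Claim the rows that exist right now; rows added later stay NULL and are left alone
UPDATE dbo.New SET [Processed] = 0 WHERE [Processed] IS NULL;

-- Remember the oldest timestamp in this batch for the later copy from Existing
DECLARE @OldestBatch datetime2;
SELECT @OldestBatch = MIN([LastAccessed]) FROM dbo.New WHERE [Processed] = 0;

-- (SELECT DISTINCT the claimed rows into dbo.Office365Staging here)

-- Keep New from growing uncontrollably
DELETE FROM dbo.New WHERE [Processed] = 0;

-- Then copy rows from dbo.Existing with [LastAccessed] >= @OldestBatch into Staging, marked [Processed] = 1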
Now I want all data in Staging where [Processed] = 0 and there is no duplicate.
Nearest concept I can come up with is:
SELECT MailboxOwnerUPN
,MailboxResolvedOwnerName
,LastAccessed
,ClientIPAddress
,ClientInfoString
,MailboxGuid
,Operation
,OperationResult
,LogonType
,ExternalAccess
,InternalLogonType
,LogonUserDisplayName
,OriginatingServer
FROM dbo.Office365Staging
GROUP BY MailboxOwnerUPN
,MailboxResolvedOwnerName
,LastAccessed
,ClientIPAddress
,ClientInfoString
,MailboxGuid
,Operation
,OperationResult
,LogonType
,ExternalAccess
,InternalLogonType
,LogonUserDisplayName
,OriginatingServer
HAVING Count(1) = 1 and Processed = 0;
Of course I can't do this, because [Processed] isn't part of the SELECT or GROUP BY. If I add the column [Processed], then all lines are unique and there are no duplicates. I have tried a variety of joins and other techniques, without success thus far.
Initially without [Processed] = 0, the query worked, but returned unwanted unique lines from Existing. I only want unique lines from New.
Clearly, due to the size of these structures, efficiency is a consideration. This process will be happening regularly, ideally every 15 minutes.
Identifying these new lines then starts another process of Geo-IP lookups, reputation checks, alerting, etc. in PowerShell...
I thought the performance of the following would be horrid, but it is OK at ~27 seconds:
SELECT [MailboxOwnerUPN]
,[MailboxResolvedOwnerName]
,[LastAccessed]
,[ClientIPAddress]
,[ClientInfoString]
,[MailboxGuid]
,[Operation]
,[OperationResult]
,[LogonType]
,[ExternalAccess]
,[InternalLogonType]
,[LogonUserDisplayName]
,[OriginatingServer]
FROM dbo.New
WHERE [Processed] = 1 and
NOT EXISTS (Select * From dbo.Existing
Where New.LastAccessed = Existing.LastAccessed and
New.ClientIPAddress = Existing.ClientIPAddress and
New.ClientInfoString = Existing.ClientInfoString and
New.MailboxGuid = Existing.MailboxGuid)
GO
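If this needs to run every 15 minutes, an index on the comparison columns of Existing may help keep the NOT EXISTS probe cheap; a hedged sketch (the index name and column order are assumptions):

CREATE INDEX IX_Existing_DupCheck
    ON dbo.Existing (MailboxGuid, LastAccessed, ClientIPAddress)
    INCLUDE (ClientInfoString);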
I have two tables that I use in a merge. Something is wrong with my query and I cannot find any information as to why. What happens is: I run the query and it works fine, and my target table populates with the information I want. I then run the query again (immediately after) and it changes (38 rows affected). I run it again and it adds the rows back in, and again... rows are deleted. There are over 100 rows, but only the same rows seem to be affected. Nothing changes in the source table.
I suspect that I am performing a DELETE on WHEN MATCHED - I can only find information for DELETE on WHEN NOT MATCHED - but I don't know, as this only seems to affect the same 38 rows every time.
The tables and (hopefully) information you need are here in my code:
MERGE QA.dbo.RMA AS target
USING Touchstn02.dbo.RMA_Detail AS source
    ON (target.RMANUM_52 = source.RMANUM_52)
WHEN MATCHED AND (source.STATUS_52 > 3) THEN
    DELETE
WHEN NOT MATCHED AND (source.STATUS_52 < 4) AND (source.RECQTY_52 > 0) THEN
    INSERT (RMANUM_52, RMADATE_52, CUSTID_52, RETNUM_52, RETQTY_52,
            SERIAL_52, REPAIR_52, RPLACE_52, CREDIT_52, WRKNUM_52,
            KEYWRD_52, RECQTY_52, RECDTE_52, STATUS_52, REM1_52,
            REM2_52, REM3_52, Comment, CMPDTE)
    VALUES (source.RMANUM_52, source.RMADTE_52, source.CUSTID_52,
            source.RETNUM_52, source.RETQTY_52, source.SERIAL_52,
            source.REPAIR_52, source.RPLACE_52, source.CREDIT_52,
            source.WRKNUM_52, source.KEYWRD_52, source.RECQTY_52,
            source.RECDTE_52, source.STATUS_52, source.REM1_52,
            source.REM2_52, source.REM3_52, source.REM4_52,
            source.CMPDTE_52);
As always, I appreciate any help/input
I think I have got it!
The 38 rows being deleted have at least two STATUS_52 values in the source table for these RMANUM_52 key values.
For a given RMANUM_52 key value:
One of the STATUS_52 values will be < 4 and have a Qty > 0.
The other row Status value will be > 3.
So, on the first run through, a row is inserted with a STATUS_52 value < 4.
Then on the next run through, the DELETE is triggered because the matching is by RMANUM_52 and the STATUS_52 in the SOURCE table. The tricky thing here is that we are looking back at the status values of all rows in the source table (including rows that were not part of the first insert). And there is a different row in the source table that matches on RMANUM_52 with a STATUS_52 > 3. So the DELETE logic matches and the delete happens.
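A quick way to confirm this against the source table, as a hedged sketch (it just lists the RMANUM_52 values that have rows on both sides of the status boundary):

SELECT RMANUM_52
FROM Touchstn02.dbo.RMA_Detail
GROUP BY RMANUM_52
HAVING SUM(CASE WHEN STATUS_52 < 4 THEN 1 ELSE 0 END) > 0
   AND SUM(CASE WHEN STATUS_52 > 3 THEN 1 ELSE 0 END) > 0;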
I can't tell from your example exactly what your requirements are, so I'm not going to hazard a guess at a fix.
By user sorting I mean that as a user on the site you see a bunch of items, and you are supposed to be able to reorder them (I'm using jQuery UI).
The user only sees 20 items on each page, but the total number of items can be thousands.
I assume I need to add another column in the table for custom ordering.
If the user sees items from 41-60, and he sorts them like:
41 = 2nd
42 = 1st
43 = 5th
etc.
I can't just set the ordering column to 2,1,5.
I would need to go through the entire table and change each record.
Is there any way to avoid this and somehow sort only the current selection?
Add another column to store the custom order, just as you suggested yourself. You can avoid the problem of having to reassign all rows' values by using a REAL-typed column: for new rows, you still use an increasing integer sequence for the column's value. But if a user reorders a row, the decimal data type allows you to use the formula ½ (previous row's value + next row's value) to update the column of the single row that was moved. You have two special cases to take care of, namely when a user moves a row to the very beginning or end of the list. In that case, just use min - 1 or max + 1, respectively.
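A minimal sketch of that midpoint update, assuming a hypothetical items table with an id column and a REAL sort_order column:

-- Move the item with id 42 between neighbours whose current values are 41.0 and 42.0
UPDATE items
SET sort_order = (41.0 + 42.0) / 2.0   -- midpoint of the two neighbouring rows
WHERE id = 42;

-- Move an item to the very beginning of the list
UPDATE items
SET sort_order = (SELECT MIN(sort_order) - 1 FROM items)
WHERE id = 42;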
This approach is the simplest I can think of, but it also has some downsides. First, it has a theoretical limitation because the data type only has double precision: after a finite number of reorderings, the values become too close together for their average to be a different number. But that's really only a theoretical limit you should never reach in practical applications. Also, the column will use 8 bytes of memory per row, which is probably much more than you actually need.
If your application might scale to the point where those 8 bytes matter or where you might have users that overeagerly reorder rows, you should instead stick to the INTEGER column and use multiples of a constant number as the default values (e.g. 100, 200, 300, ..). You still use the update formula from above, but whenever two values become too close together, you reassign all values. By tweaking the constant multiplier to the average table size / user behaviour, you can control how often this expensive operation has to be done.
There are a couple of ways I can think of to do this. One would be to use a SELECT FROM SELECT style statement, something like this:
SELECT *
FROM (
SELECT col1, col2, col3...
FROM ...
WHERE ...
LIMIT n,m
) as Table_A
ORDER BY ...
The second option would be to use temp tables such as:
INSERT INTO temp_table_A SELECT ... FROM ... WHERE ... LIMIT n,m;
SELECT * FROM temp_table_A ORDER BY ...
Another option to look at would be a jQuery plugin like DataTables.
One way I can think of is:
Add a new column (if feasible) or create a new table for holding the order of the items.
On any page you will show around 20 items based on the initial ordering.
Using jQuery's Draggable you can send updates to this table.
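If a separate table is used for the ordering, a minimal sketch could look like this (table and column names are assumptions):

CREATE TABLE item_order (
    item_id    INTEGER PRIMARY KEY,   -- references the item being ordered
    sort_order INTEGER NOT NULL       -- position chosen by the user
);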
I think you can do this with an extra column.
First, you could prepopulate this new column with a default sort order and then allow the user to interactively modify it with the drag and drop of jquery-ui.
Let's say this user has 100 items in the table. You set the values in the order column to [1,2,3,...,99,100]. I suggest that you run a script on the original table to set all items to a default sort order.
Now going back to your example where the user is presented with items 41-60: the initial presentation in their browser would rank those at orders [41,42,43,...,59,60]. You might also need to save the lowest order that appears in this subset, in this case 41. Or better yet, save the entire array of rankings and restore the exact same numbers in the new order. This covers the case where they select a set of records that are not already consecutively ordered, perhaps because they belong to someone else.
To demonstrate what I mean: when they reorder them in the page, your javascript reassigns those same numbers back to the subset in the new order. Like this:
item A : 41
item B : 45
item C : 46
item D : 47
item E : 51
item F : 54
item G : 57
Then the user changes them to this order, and you reassign the numbers like this:
item D : 41
item F : 45
item E : 46
item A : 47
item C : 51
item B : 54
item G : 57
This should also work if the subset is consecutive.
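A hedged sketch of that reassignment, assuming a hypothetical items table with an id and an order_col column, using the values from the example above:

-- Reassign the saved order values (41, 45, 46, 47, 51, 54, 57)
-- to the subset in the order the user chose: D, F, E, A, C, B, G
UPDATE items
SET order_col = CASE id
                    WHEN 'D' THEN 41
                    WHEN 'F' THEN 45
                    WHEN 'E' THEN 46
                    WHEN 'A' THEN 47
                    WHEN 'C' THEN 51
                    WHEN 'B' THEN 54
                    WHEN 'G' THEN 57
                END
WHERE id IN ('D', 'F', 'E', 'A', 'C', 'B', 'G');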
I'd like to ask about one thing. I have a table in my DB. It has 2 columns and looks like this:
Name   bilance
Jane   +3
Jane   -5
Jane   0
Jane   -8
Jane   -2
Paul   -1
Paul   2
Paul   9
Paul   1
...
I have to walk through this table, and if I find a record with a different "name" (than was on the previous row), I process all rows with the previous "name". (If I step onto the first Paul row, I process all the Jane rows.)
The processing goes like this:
Now I work only with Jane records and walk through them one by one. On each record I stop and compare it with all previous Jane rows one by one.
The task is to summarize the "bilance" column (in the scope of the actual person) if they have different signs.
Summary:
I loop through this table on 3 levels simultaneously (nested loops):
1st level = search for changes of "name" column
2nd level = if change was found, get all rows with previous "name" and walk through them
3rd level = on each row stop and walk through all previous rows with current "name"
Can this be solved only using CURSOR and FETCHING, or is there some smoother solution?
My real table has 30,000 rows and 1,500 people, and if I do the logic in PHP, it takes many minutes and then times out. So I would like to rewrite it in MS SQL 2000 (no other DB is allowed). Are cursors a fast solution, or is it better to use something else?
Thank you for your opinions.
UPDATE:
There are lots of questions about my "summarization". The problem is a little more difficult than I explained; I simplified it just to describe my algorithm.
Each row of my table contains many more columns. The most important is month; that's why there are several rows for each person, one for each month.
"Bilances" are workers' overtime and arrear hours, and I need to summarize the + and - bilances to neutralize them using values from previous months. I want to end up with as many zeroes as possible. The whole table must stay as it is; just the bilances must be changed to zeroes.
Example:
Row (Jane -5) will be summarized with row (Jane +3): instead of +3 I will get 0, and instead of -5 I will get -2, because I used the -5 to reduce the +3.
The next row (Jane 0) won't be affected.
The next row (Jane -8) cannot be used, because all the previous bilances are negative.
etc.
You can sum all the values per name using a single SQL statement:
select
name,
sum(bilance) as bilance_sum
from
my_table
group by
name
order by
name
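For the sample rows in the question, this would return one summed row per name:

name   bilance_sum
Jane   -12
Paul   11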
On the face of it, it sounds like this should do what you want:
select Name, sum(bilance)
from my_table
group by Name
order by Name
If not, you might need to elaborate on how the Names are sorted and what you mean by "summarize".
I'm not sure what you mean by this line... "The task is to summarize the 'bilance' column (in the scope of the actual person) if they have different signs".
But, it may be possible to use a group by query to get a lot of what you need.
select name,
       case when bilance < 0 then 'negative' when bilance >= 0 then 'positive' end as sign,
       count(*) as cnt
from my_table
group by name,
         case when bilance < 0 then 'negative' when bilance >= 0 then 'positive' end
That might not be perfect syntax for the case statement, but it should get you really close.
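With the sample rows from the question, that grouping would give:

name   sign      cnt
Jane   negative  3
Jane   positive  2
Paul   negative  1
Paul   positive  3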