Updating one column with the value from another, based on another common column - SQL

I have a large (3 million rows) table of transactional data, which can be simplified thus:
ID   File    DOB
--------------------------
1    File1   01/01/1900
2    File1   03/10/1978
3    File1   03/10/1978
4    File2   15/07/1997
5    File2   01/01/1900
6    File2   15/07/1997
In some cases there is no date. I would like to update the date field so it is the same as the other records for a file which has a date. So record 1's DOB would become 03/10/1978, because records 2 and 3 for that file have that date. Likewise record 5 would become 15/07/1997.
What is the most efficient way to achieve this?
Thanks.

Supposing your table is called "Files", then this will work:
UPDATE f1
SET f1.DOB = f2.MaxDOB
FROM files f1
JOIN (SELECT [File], MAX(DOB) AS MaxDOB FROM files GROUP BY [File]) f2
  ON f2.[File] = f1.[File];
As far as performance is concerned, it probably won't get much more efficient than this, but you do need to ensure there is an index on the (File, DOB) column set. Three million records is a lot, and this query will also update records that do not need it, but filtering those out would require a more complex join. In any case, check the query plan.
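To make that concrete, here is a hedged sketch of the supporting index plus a variant of the update that only touches placeholder rows (the index name is an assumption, and it assumes 01/01/1900 is a fixed placeholder rather than a genuine date):

CREATE INDEX IX_files_File_DOB ON files ([File], DOB);

UPDATE f1
SET f1.DOB = f2.MaxDOB
FROM files f1
JOIN (SELECT [File], MAX(DOB) AS MaxDOB FROM files GROUP BY [File]) f2
  ON f2.[File] = f1.[File]
WHERE f1.DOB = '19000101';   -- skip rows that already hold a real date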

I don't know about the most efficient way, but I can think of one solution: create a temp table with the following query. I am not sure about the exact keywords for SQL Server 2008, so this might work as-is, or you may need to change keywords like to_date and its format.
create table new_table as (
  select file,
         min(DOB) as default_date,
         max(DOB) as fixed_date
  from three_million_table
  group by file
  having min(dob) = to_date('01/01/1900', 'dd/mm/yyyy')
)
So your new table will have:
column headers: file, default_date, fixed_date
values: File1, 01/01/1900, 03/10/1978
Now, it may not be wise to run an update on three_million_table, but if you think it is OK then:
update T1
SET T1.DOB = T2.fixed_date
FROM three_million_table T1
INNER JOIN new_table T2
ON T1.file = T2.file

Hope this helps. With 3 million records, updating the table by scanning each record will surely take its toll.
;WITH testCTE ([name], dobir, number)
AS (
    SELECT [File], DOB,
           ROW_NUMBER() OVER (PARTITION BY [File], DOB ORDER BY (SELECT 0)) AS RowNumber
    FROM test
)
UPDATE test
SET DOB = tcte.dobir
FROM testCTE AS tcte
LEFT JOIN test t ON tcte.[name] = t.[File]
WHERE tcte.number > 1 AND [File] = tcte.[name]
SQL Fiddle

Related

multiple sets of conditions

I'd like to run a query on a table that matches any of the following sets of conditions:
SELECT
id,
time
FROM
TABLE
WHERE
<condition1 is True> OR
<condition2 is True> OR
<condition3 is True> OR
...
Each condition might look like:
id = 'id1' AND t > 20 AND t < 40
The values for each WHERE condition (id, 20, 40 above) are rows in a pandas dataframe that is 20k rows long. I see two options that would technically work:
make 20k queries to the database, one for each condition, and concat the result
generate a (very long) query as above and submit
My question: what would be an idiomatic/performant way to accomplish this?
I suspect neither of the above are appropriate approaches and this problem is somewhat difficult to google.
I think it would be better to create a temporary table with columns id, t1, and t2 and put your 20k rows in there. Then just join to this temporary table:
SELECT DISTINCT TABLE.id, TABLE.time
FROM TABLE
JOIN TEMP_TABLE tmp
  ON TABLE.id = tmp.id AND TABLE.time > tmp.t1 AND TABLE.time < tmp.t2;
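For illustration, a hedged sketch of that approach in SQL Server-style syntax (names such as #conditions and events are assumptions, temp-table syntax varies by database, and the 20k condition rows would be bulk-loaded from the dataframe):

CREATE TABLE #conditions (
    id  VARCHAR(50) NOT NULL,
    t1  INT NOT NULL,
    t2  INT NOT NULL
);
-- load the 20k (id, t1, t2) rows here, e.g. with a bulk insert from the dataframe
CREATE INDEX ix_conditions ON #conditions (id, t1, t2);

SELECT DISTINCT e.id, e.time
FROM events e
JOIN #conditions c
  ON e.id = c.id AND e.time > c.t1 AND e.time < c.t2;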

SQL Combining two different tables

(P.S. I am still learning SQL and you can consider me a newbie)
I have 2 sample tables as follows:
Table 1
|Profile_ID| |Img_Path|
Table 2
|Profile_ID| |UName| |Default_Title|
My scenario is: from the 2nd table, I need to fetch all the records that contain a certain word, for which I have the following query:
Select Profile_Id, UName
from Table2
Where Contains(Default_Title, 'Test')
ORDER BY Profile_Id
OFFSET 5 ROWS
FETCH NEXT 20 ROWS ONLY
(Note that I am setting the OFFSET due to requirements.)
Now, the scenario is, as soon as I retrieve a record from the 2nd table, I need to fetch the record from the 1st table based on the Profile_Id.
So, I need to return the following 2 results in one single statement:
|Profile_Id| |Img_Path|
|Profile_Id| |UName|
And I need to return the results in side-by-side columns, like:
|Profile_Id| |Img_Path| |UName|
(Note I had to merge the 2 Profile_Id columns into one as they both contain the same data.)
I am still learning SQL, and I am learning about UNION, JOIN etc., but I am a bit confused as to which way to go.
You can use join:
select t1.*, t2.UName
from table1 t1 join
(select Profile_Id, UName
from Table2
where Contains(Default_Title, 'Test')
order by Profile_Id
offset 5 rows fetch next 20 rows only
) t2
on t2.profile_id = t1.profile_id
SELECT a.Profile_Id, a.Img_Path, b.UName
FROM table1 a INNER JOIN table2 b ON a.Profile_Id=b.Profile_Id
WHERE b.Default_Title = 'Test'

SQL: Can I use CHARINDEX to return the best match not just the first match?

http://sqlfiddle.com/#!6/5ac78/1
Not sure if that fiddle will work. I want to return code 2 from the join on CHARINDEX.
As another example, I have a Description table (dt) that looks like this:
ID    Description                    Code
158   INTEREST                       199
159   INTEREST PAID                  383
160   INTEREST PAYABLE ON ACCOUNT    384
And a master table (mt) with entries like this:
ID    Narrative        Code
1     INTEREST PAID    NULL
I need to set the Code on the master table to 383. When I do an INSERT based on a JOIN using CHARINDEX(dt.Description, mt.Description) > 0, it sets the mt.Code to 199 every time.
How can I update the master table to pull the Code from the Description table with the best match, not just the first matching instance?
Thanks!
You could just use a simple JOIN to find a match, with a LEFT JOIN to eliminate all but the longest match:
UPDATE t1
SET t1.codeA = t2_1.codeB
FROM table1 t1
JOIN table2 t2_1
ON CHARINDEX(t2_1.colB, t1.colA) > 0
LEFT JOIN table2 t2_2
ON CHARINDEX(t2_2.colB, t1.colA) > 0
AND t2_1.codeB <> t2_2.codeB
AND LEN(t2_2.colB) > LEN(t2_1.colB)
WHERE t2_2.colB IS NULL;
An SQLfiddle to test with.
Note that it's (probably) not possible to make a CHARINDEX query like this one (or your original query) use indexes, so the query may be very slow for large amounts of data.
Also, always test first before running SQL updates from random people on the Internet on your production data :)
This is awkward, but it seems to work:
update table1
set codeA = (
select max(codeB)
from table2
where charindex(colB, colA) > 0
)
where exists (
select 1
from table2
where charindex(colB, colA) > 0
);
Revised fiddle is here: http://sqlfiddle.com/#!6/5ac78/12
The problem is knowing which is the "best" value to return. Here I have assumed the maximum code value is the one you want.

comparing 2 consecutive rows in a recordset

Currently, I have this objective to meet. I need to query the database for certain results. After doing so, I will need to compare the records:
For example: the query returns 10 rows of records. I then need to compare row 1 with 2, row 2 with 3, row 3 with 4 ... row 9 with 10.
The final result that I wish to have is 10 or fewer rows of records.
I have one approach currently. I do this within a function, and have variables called "previous" and "current". In a loop I always compare previous and current, which I populate from the record set using a cursor.
After I get each row of the filtered result, I then insert it into a physical temporary table.
After all the results are in this temporary table, I do a query on this table, insert the result into a cursor and then return the cursor.
The problem is: how can I avoid using a temporary table? I've searched online about using nested tables, but somehow I just could not get it working.
How can I replace the temp table with something else? Or is there another approach I can use to compare one row's columns with those of other rows?
EDIT
So sorry, maybe I am not clear with my question. Here is a sample of the result that I am trying to achieve.
TABLE X
A     B     C    D
100   300   99   T1
100   300   98   T2
100   300   97   T3
100   100   97   T4
100   300   97   T5
101   11    11   T6
Column A is the primary key of the table being audited; it has duplicates here because table X is an audit table that keeps track of all changes. Column D acts as the timestamp for each record.
For my query, I am only interested in changes in columns A, B and D. After the query I would like to get the result below:
A     B     D
100   300   T1
100   100   T4
100   300   T5
101   11    T6
I think analytics might do what you want:
select col1, col2, lag(col1) over (order by col1, col2) as prev_col1
from table1
This way, prev_col1 will contain the value of col1 from the previous row, which you can directly compare to the col1 of the current row.
See this URL for more info: http://www.orafaq.com/node/55
SELECT ROW_NUMBER() OVER (ORDER BY <Some column name>) rn,
       Column1, <Some column name>, CompareColumn,
       LAG(CompareColumn) OVER (ORDER BY <Some column name>) PreviousValue,
       LEAD(CompareColumn) OVER (ORDER BY <Some column name>) NextValue,
       CASE
         WHEN CompareColumn != LEAD(CompareColumn) OVER (ORDER BY <Some column name>)
           THEN CompareColumn || '-->' || LEAD(CompareColumn) OVER (ORDER BY <Some column name>)
         WHEN CompareColumn = LAG(CompareColumn) OVER (ORDER BY <Some column name>)
           THEN 'NO CHANGE'
         ELSE 'false'
       END
FROM <table name>
You can use this logic in a loop to change behaviour.
Hi, it's not very clear what exactly you want to accomplish. But maybe you can fetch the results of the original query into a PL/SQL collection and use that to do your comparison.
What exactly are you doing the row comparison for? Are you looking to eliminate duplicates, or are you transforming the data into another form and then returning that?
To eliminate duplicates, look to use GROUP BY or DISTINCT functionality in your SELECT.
If you are iterating over the initial data and transforming it in some way, then it is hard to do without using a temporary table - but what exactly is your problem with the temp table? If you are concerned about the performance of a cursor, then maybe you could do one outer SELECT that compares the results of two inner SELECTs - the trick is that the second SELECT is offset by one row, so you achieve the requirement of comparing row 1 against row 2, etc., as sketched below.
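To make the offset idea concrete, here is a hedged sketch against the sample table X from the question (the table name x and column names a, b, d are assumptions taken from the example; ROW_NUMBER lines each row up with the previous one):

WITH numbered AS (
    SELECT a, b, d, ROW_NUMBER() OVER (ORDER BY d) AS rn
    FROM x
)
SELECT cur.a, cur.b, cur.d
FROM numbered cur
LEFT JOIN numbered prev
  ON prev.rn = cur.rn - 1
WHERE prev.rn IS NULL        -- the first row has nothing to compare against
   OR cur.a <> prev.a
   OR cur.b <> prev.b;       -- keep only rows where A or B changed

Against the sample data this returns the T1, T4, T5 and T6 rows, which matches the desired output.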
I think you are complicating things with the temp table.
It can be done using a cursor and two temporary variables.
Here is the pseudo code:
declare
  v_temp_a  xyz%rowtype;
  v_temp_b  xyz%rowtype;
  i         number;
  cursor my_cursor is select * from xyz;
begin
  i := 1;
  for my_row in my_cursor loop
    if i = 1 then
      v_temp_a := my_row;
    else
      v_temp_b := v_temp_a;
      v_temp_a := my_row;
      /* at this point v_temp_b has the previous row and v_temp_a has the current row;
         compare them and apply whatever logic you want */
    end if;
    i := i + 1;
  end loop;
end;

How to compare string data to table data in SQL Server - I need to know if a value in a string doesn't exist in a column

I have two tables, one an import table, the other a FK constraint on the table the import table will eventually be put into. In the import table a user can provide a list of semicolon separated values that correspond to values in the 2nd table.
So we're looking at something like this:
TABLE 1
ID | Column1
1 | A; B; C; D
TABLE 2
ID | Column2
1 | A
2 | B
3 | D
4 | E
The requirement is:
Rows in TABLE 1 with a value not in TABLE 2 (C in our example) should be marked as invalid for manual cleanup by the user. Rows where all values are valid are handled by another script that already works.
In production we'll be dealing with 6 columns that need to be checked and imports of AT LEAST 100k rows at a time. As a result I'd like to do all the work in the DB, not in another app.
BTW, it's SQL2008.
I'm stuck. Does anyone have any ideas? Thanks!
Seems to me you could pass ID & Column1 values from Table1 to a Table-Valued function (or a temp table in-line) which would parse the ;-delimited list, returning individual values per record.
Here are a couple options:
T-SQL: Parse a delimited string
Quick T-Sql to parse a delimited string
The result (ID, value) from the function could be used to compare (unmatched query) against values in Table 2.
SELECT tmp.ID
FROM tmp
LEFT JOIN Table2 ON Table2.id = tmp.ID
WHERE Table2.id is null
The ID results of the comparison would then be used to flag records in Table 1.
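As a rough, hedged sketch of that idea (the splitter name dbo.SplitList and the IsInvalid flag column are assumptions; the loop-based splitter is only illustrative, not tuned for 100k-row imports, and the CREATE FUNCTION should run in its own batch):

CREATE FUNCTION dbo.SplitList (@list NVARCHAR(MAX), @delim NCHAR(1))
RETURNS @items TABLE (Item NVARCHAR(4000))
AS
BEGIN
    DECLARE @pos INT = CHARINDEX(@delim, @list);
    WHILE @pos > 0
    BEGIN
        INSERT INTO @items (Item) VALUES (LTRIM(RTRIM(LEFT(@list, @pos - 1))));
        SET @list = SUBSTRING(@list, @pos + 1, LEN(@list));
        SET @pos = CHARINDEX(@delim, @list);
    END;
    INSERT INTO @items (Item) VALUES (LTRIM(RTRIM(@list)));
    RETURN;
END;

-- flag import rows that contain at least one value missing from TABLE 2
UPDATE t1
SET t1.IsInvalid = 1
FROM [TABLE 1] t1
CROSS APPLY dbo.SplitList(t1.Column1, ';') s
LEFT JOIN [TABLE 2] t2 ON t2.Column2 = s.Item
WHERE t2.ID IS NULL;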
Perhaps inserting those composite values into 'TABLE 1' may have seemed like the most convenient solution at one time. However, unless your users are using SQL Server Management Studio or something similar to enter the values directly into the table, I assume there must be a software layer between the UI and the database. If so, you're going to save yourself a lot of headaches, both now and in the long run, by investing a little time in altering your code to split the semicolon-delimited inputs into discrete values before inserting them into the database. This will result in 'TABLE 1' looking something like this:
TABLE 1
ID | Column1
1 | A
1 | B
1 | C
1 | D
It's then trivial to write the SQL to find those IDs which are invalid.
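For example, a minimal sketch using the table and column names from the question: once the values are one per row, the invalid IDs fall out of a simple unmatched join.

SELECT DISTINCT t1.ID
FROM [TABLE 1] t1
LEFT JOIN [TABLE 2] t2 ON t2.Column2 = t1.Column1
WHERE t2.ID IS NULL;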
If it is possible, try putting the values in separate rows when importing (instead of storing it as ; separated).
This might help.
Here is an easy and straightforward solution for finding the IDs of the invalid rows, despite its lack of performance due to the string manipulation.
select T1.ID
from [TABLE 1] T1
left join [TABLE 2] T2
  on ('; ' + T1.COLUMN1 + '; ') like ('%; ' + T2.COLUMN2 + '; %')
where T1.COLUMN1 is not null
group by T1.ID, T1.COLUMN1
having count(T2.COLUMN2) < len(T1.COLUMN1) - len(replace(T1.COLUMN1, ';', '')) + 1
There are two assumptions:
The semicolon-separated list does not contain duplicates
TABLE 2 does not contain duplicates in COLUMN2.
The second assumption can easily be fixed by using (select distinct COLUMN2 from [TABLE 2]) rather than [TABLE 2].