I have the following code:
UPDATE tableOne
SET columnOne = CASE
WHEN tableOne.columnTwo LIKE '%-02-%' OR tableOne.columnTwo LIKE '%-03-%' OR
tableOne.columnTwo LIKE '%-04-%' OR
tableOne.columnTwo LIKE '%-05-%' OR
tableOne.columnTwo LIKE '%-06-%' OR
tableOne.columnTwo LIKE '%-07-%' OR tableOne.columnTwo LIKE '%-08-%' OR
tableOne.columnTwo LIKE '%-09-%'
THEN tableTwo.columnOne :: text
ELSE tableOne.columnOne
END
FROM tableTwo
WHERE tableTwo.tableId = tableOne.tableId
I have two tables. tableOne has 100 million rows (and 40 columns) and tableTwo has 90 million rows. The above query has been running for more than 2 days, and I am not sure it will ever finish. Is there a way to optimize the query?
In case it helps, the LIKE conditions do the following:
Check whether the string (e.g. 2018-06-30 08:20:17) contains one of the listed months. If yes, pick the value from tableTwo (and CAST it to type text); otherwise keep the existing value (already type text).
Move the case condition to the where clause:
UPDATE tableOne
SET columnOne = tableTwo.columnOne::text
FROM tableTwo
WHERE tableTwo.tableId = tableOne.tableId AND
tableOne.columnTwo ~ '-0[2-9]-' and
tableOne.columnOne is distinct from tableTwo.columnOne::text;
Regular expressions are not really that much faster than a bunch of likes. The win here is in not updating rows that don't need to be updated. If the format of tableOne.columnTwo is a known format, you could use substring operations instead.
What about only updating if the month is between 02 and 09?
UPDATE tableOne
SET columnOne = tableTwo.columnOne :: text
FROM tableTwo
WHERE tableTwo.tableId = tableOne.tableId
AND SUBSTRING(tableOne.columnTwo FROM 6 FOR 2) BETWEEN '02' AND '09'
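To see the effect of moving the month test into the WHERE clause, here is a minimal sketch using Python's built-in sqlite3 module and a tiny made-up data set (the four rows and their values are assumptions for illustration only). SQLite is not PostgreSQL: it uses substr and IS NOT where PostgreSQL has SUBSTRING ... FROM and IS DISTINCT FROM, and the sketch uses a correlated subquery instead of UPDATE ... FROM so that it also runs on older SQLite versions. The only point is that the number of rows written drops to the rows that actually change.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tableOne (tableId INTEGER PRIMARY KEY, columnOne TEXT, columnTwo TEXT);
CREATE TABLE tableTwo (tableId INTEGER PRIMARY KEY, columnOne INTEGER);
-- Hypothetical sample rows, not taken from the question.
INSERT INTO tableOne VALUES
  (1, 'old', '2018-06-30 08:20:17'),  -- month 06, value differs: should be updated
  (2, 'old', '2018-01-15 10:00:00'),  -- month 01: outside the range
  (3, '30',  '2018-03-01 09:00:00'),  -- month 03, but value already matches
  (4, 'old', '2018-11-02 12:00:00');  -- month 11: outside the range
INSERT INTO tableTwo VALUES (1, 10), (2, 20), (3, 30), (4, 40);
""")

cur = conn.execute("""
UPDATE tableOne
SET columnOne = (SELECT CAST(t2.columnOne AS TEXT)
                 FROM tableTwo t2
                 WHERE t2.tableId = tableOne.tableId)
WHERE substr(columnTwo, 6, 2) BETWEEN '02' AND '09'
  AND EXISTS (SELECT 1
              FROM tableTwo t2
              WHERE t2.tableId = tableOne.tableId
                AND tableOne.columnOne IS NOT CAST(t2.columnOne AS TEXT))
""")
updated_rows = cur.rowcount  # rows actually written by the UPDATE

rows = conn.execute(
    "SELECT tableId, columnOne FROM tableOne ORDER BY tableId"
).fetchall()
print(updated_rows, rows)
```

Without the WHERE conditions, the same statement would rewrite all four rows even though only one value changes; on a 100-million-row table that difference is where most of the time goes.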
The query structure: Helper-select in "with" clause - selects most recent entry using 'top 1 transaction_date'. Then does many joins. It takes too much time to run - what am I doing wrong?
CREATE VIEW [IRWSMCMaterialization].[FactInventoryItemOnHandDailyView] AS
WITH TempTBLFactIvnItmDaily AS (
SELECT TOP 20
ITEM_NUMBER AS [InventoryItemNumber]
,CAST(FORMAT(TRANSACTION_DATE, 'yyyyMMdd') AS INT) AS [DateKey]
,BRANCH_PLANT_FHK AS [BranchPlantKey]
,BRANCH_PLANT_CODE AS [BranchPlantCode]
,CAST(QUANTITY_ON_HAND AS BIGINT) AS [QuantityOnHand]
,TRANSACTION_DATE AS [Date]
,WAREHOUSE_LOCATION_FHK AS [WarehouseLocationKey]
,WAREHOUSE_LOCATION_CODE AS [WarehouseLocationCode]
,WAREHOUSE_LOT_NUMBER_CODE AS [WarehouseLotNumber]
,WAREHOUSE_LOT_NUMBER_FHK AS [WarehouseLotNumberKey]
,UNIT_OF_MEASURE AS [UnitOfMeasureName]
,UNIT_OF_MEASURE_PHK AS [UnitOfMeasureKey]
FROM dbo.RS_INV_ITEM_ON_HAND
-- below is where clause, choose only most recent entry
WHERE TRANSACTION_DATE = (SELECT TOP 1 TRANSACTION_DATE FROM dbo.RS_INV_ITEM_ON_HAND ORDER BY TRANSACTION_DATE DESC)
)
SELECT [InventoryItemNumber],
[DateKey],
[Date],
[BranchPlantCode] AS [BP],
[WarehouseLocationCode] AS [Location],
[QuantityOnHand],
[UnitOfMeasureName] AS [UoM],
CASE [WarehouseLotNumber]
WHEN 'Not Assigned' THEN NULL
ELSE [WarehouseLotNumber]
END
AS [Lot]
FROM TempTBLFactIvnItmDaily iioh
JOIN DWH.DimBranchPlant bp ON iioh.BranchPlantKey = bp.BRANCH_PLANT_PHK
JOIN DWH.DimWarehouseLocation wloc ON iioh.WarehouseLocationKey = wloc.WAREHOUSE_LOCATION_PHK
JOIN DWH.DimWarehouseLotNumber wlot ON iioh.WarehouseLotNumberKey = wlot.WarehouseLotNumber_PHK
JOIN DWH.DimUnitOfMeasure uom ON CAST(iioh.UnitOfMeasureKey AS VARCHAR(100)) = uom.UNIT_OF_MEASURE_PHK
where bp.BRANCH_PLANT_CODE = '96100'
AND iioh.QuantityOnHand > 0
AND (wloc.WAREHOUSE_LOCATION_CODE like '6000W01%' OR wloc.WAREHOUSE_LOCATION_CODE like 'BL%')
GO
A lot of things here do not look good. First of all, your base query can be a lot simpler. Something like this:
SELECT iioh.ITEM_NUMBER AS [InventoryItemNumber],
CAST(FORMAT(iioh.TRANSACTION_DATE, 'yyyyMMdd') AS INT) AS [DateKey],
iioh.TRANSACTION_DATE AS [Date],
iioh.BRANCH_PLANT_CODE AS [BP],
iioh.WAREHOUSE_LOCATION_CODE AS [Location],
CAST(iioh.QUANTITY_ON_HAND AS BIGINT) AS [QuantityOnHand],
iioh.UNIT_OF_MEASURE AS [UoM],
NULLIF(iioh.WAREHOUSE_LOT_NUMBER_CODE, 'Not Assigned') AS [Lot]
FROM dbo.RS_INV_ITEM_ON_HAND iioh
JOIN DWH.DimBranchPlant bp
ON iioh.BRANCH_PLANT_FHK = bp.BRANCH_PLANT_PHK
JOIN DWH.DimWarehouseLocation wloc
ON iioh.WAREHOUSE_LOCATION_FHK = wloc.WAREHOUSE_LOCATION_PHK
JOIN DWH.DimUnitOfMeasure uom
ON CAST(iioh.UNIT_OF_MEASURE_PHK AS VARCHAR(100)) = uom.UNIT_OF_MEASURE_PHK
where bp.BRANCH_PLANT_CODE = '96100'
AND iioh.QUANTITY_ON_HAND > 0
AND (wloc.WAREHOUSE_LOCATION_CODE like '6000W01%' OR wloc.WAREHOUSE_LOCATION_CODE like 'BL%')
AND iioh.TRANSACTION_DATE = @TRANSACTION_DATE
For example, you are joining DWH.DimWarehouseLotNumber but not selecting any columns from it. Do you really need it? There are also other columns that are not returned by the view. Why query them?
Also, you are first filtering by date and taking the TOP 20, and only then filtering by the other fields, so some of your TOP 20 records may be removed by the later conditions. Is this the behavior you want?
Also, do you really want this cast?
ON CAST(iioh.UnitOfMeasureKey AS VARCHAR(100)) = uom.UNIT_OF_MEASURE_PHK
For performance, it is better to use CONVERT rather than FORMAT. Also, why not save/materialize the TRANSACTION_DATE as an INT (for example using a persisted computed column, or just on CRUD operations) instead of calculating this value on each read?
Filtering by location code using a LIKE clause can hurt performance, too. Why not add a new column WareHouseLocationCodeType and set the same value for all locations satisfying this condition:
(wloc.WAREHOUSE_LOCATION_CODE like '6000W01%' OR wloc.WAREHOUSE_LOCATION_CODE like 'BL%')
Then you can filter by this column in the view, since this is very important for you. You can also create a filtered index on this column to improve performance further.
Also, you may want to create an inline table-valued function instead of a view and pass the date as a parameter:
CREATE OR ALTER FUNCTION [IRWSMCMaterialization].[FactInventoryItemOnHandDailyView]
(
@TRANSACTION_DATE datetime
)
RETURNS TABLE
AS
RETURN
(
SELECT iioh.ITEM_NUMBER AS [InventoryItemNumber],
CAST(FORMAT(iioh.TRANSACTION_DATE, 'yyyyMMdd') AS INT) AS [DateKey],
iioh.TRANSACTION_DATE AS [Date],
iioh.BRANCH_PLANT_CODE AS [BP],
iioh.WAREHOUSE_LOCATION_CODE AS [Location],
CAST(iioh.QUANTITY_ON_HAND AS BIGINT) AS [QuantityOnHand],
iioh.UNIT_OF_MEASURE AS [UoM],
NULLIF(iioh.WAREHOUSE_LOT_NUMBER_CODE, 'Not Assigned') AS [Lot]
,iioh.TRANSACTION_DATE
FROM dbo.RS_INV_ITEM_ON_HAND iioh
JOIN DWH.DimBranchPlant bp
ON iioh.BRANCH_PLANT_FHK = bp.BRANCH_PLANT_PHK
JOIN DWH.DimWarehouseLocation wloc
ON iioh.WAREHOUSE_LOCATION_FHK = wloc.WAREHOUSE_LOCATION_PHK
JOIN DWH.DimUnitOfMeasure uom
ON CAST(iioh.UNIT_OF_MEASURE_PHK AS VARCHAR(100)) = uom.UNIT_OF_MEASURE_PHK
where bp.BRANCH_PLANT_CODE = '96100'
AND iioh.QUANTITY_ON_HAND > 0
AND (wloc.WAREHOUSE_LOCATION_CODE like '6000W01%' OR wloc.WAREHOUSE_LOCATION_CODE like 'BL%')
AND iioh.TRANSACTION_DATE = @TRANSACTION_DATE
)
Then call it like this:
SELECT TOP 20 *
FROM [IRWSMCMaterialization].[FactInventoryItemOnHandDailyView] ('2020-12-04')
ORDER BY [TRANSACTION_DATE] DESC
Query optimization is a science these days. If you want to find the bottlenecks in your query, you can follow some of these steps:
As the first step, enable statistics with these commands:
SET STATISTICS TIME ON;
SET STATISTICS IO ON;
Once you have executed these commands in a query window, execute your query in the same window. When the query finishes, switch to the Messages tab and you will see a lot of useful information, such as execution time, parse and compile time and, maybe most interesting, I/O reads.
As the second step, try to understand which table has a lot of reads. For example, if you are expecting 10 rows from the query but some tables show 10k or 100k logical reads, something is wrong: to return 10 rows, the query reads 10k pages from one table. Most likely you are missing an index on this table, so try to find which index you need.
If you have static values in the WHERE clause like the following, think about a filtered index:
bp.BRANCH_PLANT_CODE = '96100' AND iioh.QuantityOnHand > 0
Not always, but in some cases a conversion can defeat your indexes: if you cast a column or wrap it in some other function in the WHERE clause or a join condition, as in the following, the query optimizer may not be able to use an index on that column even if you have one:
CAST(iioh.UnitOfMeasureKey AS VARCHAR(100))
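The effect of wrapping an indexed column in a function can be observed on a small scale. The sketch below uses Python's built-in sqlite3 module rather than SQL Server, so the plan output and optimizer differ from what you will see in Management Studio; the table item_on_hand, its columns and the index name are hypothetical. It only illustrates the general principle that a predicate on the bare column can seek the index, while the same predicate on CAST(column) forces a scan.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical table and index, for illustration only.
CREATE TABLE item_on_hand (id INTEGER PRIMARY KEY, uom_key INTEGER, note TEXT);
CREATE INDEX ix_item_on_hand_uom ON item_on_hand (uom_key);
""")

def plan(sql):
    """Return the 'detail' strings of SQLite's EXPLAIN QUERY PLAN output."""
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]

# Predicate on a CAST of the column: the index on uom_key cannot be used.
cast_plan = plan("SELECT * FROM item_on_hand WHERE CAST(uom_key AS TEXT) = '5'")

# Predicate on the bare column: the index can be searched directly.
plain_plan = plan("SELECT * FROM item_on_hand WHERE uom_key = 5")

print("with CAST:   ", cast_plan)
print("without CAST:", plain_plan)
```

In SQL Server the equivalent check is to compare the actual execution plans (index seek versus scan) for the two forms of the predicate. Storing both sides of the join with the same data type removes the need for the cast altogether.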
Last, if you have an OR logical operator in your query, try executing each part of the OR separately and compare the performance. This operator can really kill your query, and this is one example:
AND (wloc.WAREHOUSE_LOCATION_CODE like '6000W01%' OR wloc.WAREHOUSE_LOCATION_CODE like 'BL%')
Once you have determined that none of these is an issue, you can dig further.
I have a column that contains 0 or 1. I would like to set up the following:
If it is 0, use the data from the Region_table (here I have regions like EMEA, AP, LA with finished goods only), and when it is 1, use the data from the Plant_table (here I have plants with non-finished goods).
I tried to write it as 2 separate statements, but it does not work:
,Case
when [FG_NFG_Selektion] = '0' Then 'AC_region'
End as 'AC_region'
,Case
when [FG_NFG_Selektion] = '1' Then 'AC_plant'
End as 'AC_plant'
I'm not 100% clear on what you're looking for, but if you want to get data from different tables based on the value in the [FG_NFG_Selektion] field, you can do something like this:
SELECT
CASE
WHEN [FG_NFG_Selektion] = '0' THEN r.some_col -- If 0, use value from "region" table
WHEN [FG_NFG_Selektion] = '1' THEN p.some_col -- If 1, use value from "plant" table
END AS new_field
FROM MyTable t
LEFT JOIN AC_region r ON t.pk_col = r.pk_col -- get data from "AC_region" table
LEFT JOIN AC_plant p ON t.pk_col = p.pk_col -- get data from "AC_plant" table
;
If [FG_NFG_Selektion] is a numeric field, then you should remove the single quotes: [FG_NFG_Selektion] = 0.
I would strongly recommend putting the conditions in the ON clauses:
SELECT COALESCE(r.some_col, p.some_col) as some_col
FROM t LEFT JOIN
AC_region r
ON t.pk_col = r.pk_col AND
t.FG_NFG_Selektion = '0' LEFT JOIN
AC_plant p
ON t.pk_col = p.pk_col AND
t.FG_NFG_Selektion = '1';
Why do I recommend this? First, this works correctly if there are multiple matches in either table. That is probably not an issue in this case, but it could be in others. You don't want to figure out where extra rows come from.
Second, putting the conditions in the ON clause allows the optimizer/execution engine to take advantage of them. For instance, it is more likely to use FG_NFG_Selektion in an index.
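Here is a minimal sketch of the ON-clause version, run through Python's built-in sqlite3 module so it can be tried without a server. The table and column names follow the placeholder names used above (t, pk_col, some_col); the sample rows are made up. Both lookup tables deliberately contain a row for every key, to show that only the table selected by FG_NFG_Selektion contributes a value.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE t (pk_col INTEGER PRIMARY KEY, FG_NFG_Selektion TEXT);
CREATE TABLE AC_region (pk_col INTEGER, some_col TEXT);
CREATE TABLE AC_plant (pk_col INTEGER, some_col TEXT);
-- Hypothetical sample data.
INSERT INTO t VALUES (1, '0'), (2, '1');
INSERT INTO AC_region VALUES (1, 'EMEA'), (2, 'AP');
INSERT INTO AC_plant VALUES (1, 'Plant A'), (2, 'Plant B');
""")

rows = conn.execute("""
SELECT t.pk_col, COALESCE(r.some_col, p.some_col) AS some_col
FROM t
LEFT JOIN AC_region r
       ON t.pk_col = r.pk_col AND t.FG_NFG_Selektion = '0'
LEFT JOIN AC_plant p
       ON t.pk_col = p.pk_col AND t.FG_NFG_Selektion = '1'
ORDER BY t.pk_col
""").fetchall()

print(rows)
```

Row 1 has the flag '0', so only the region join matches; row 2 has the flag '1', so only the plant join matches. Because the other join produces NULLs, COALESCE picks the single non-NULL value.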
I am trying to make a filter to find all the stuffs made of various substances.
In the database, there is:
a stuffs table
a substances table
a stuffs_substances join table.
Now, I want to find only all the stuffs that are made of gold AND silver (not all the stuffs that contain gold and all stuffs that contain silver).
One last thing: the end user can type only a part of the substance name in the filter form field. For example he will type silv and it will show up all the stuffs made of silver.
So I made this query (not working):
select "stuffs".*
from "stuffs"
inner join "stuffs_substances" as "substances_join"
on "substances_join"."stuff_id" = "stuffs"."id"
inner join "substances"
on "substances_join"."substance_id" = "substances"."id"
where ("substances"."name" like '%silv%')
and ("substances"."name" like '%gold%')
It returns an empty array. What am I doing wrong here?
Basically, you just want aggregation:
select st.*
from "stuffs" st join
"stuffs_substances" ss
on ss."stuff_id" = st."id" join
"substances" s
on ss."substance_id" = s."id"
where s."name" like '%silv%' or
s."name" like '%gold%'
group by st.id
having count(*) filter (where s."name" like '%silv%') > 0 and
count(*) filter (where s."name" like '%gold%') > 0;
Note that this works assuming that stuffs.id is the primary key of stuffs.
I don't understand your fascination with double quotes and long table aliases. To me, those things just make the query harder to write and to read.
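The aggregation approach can be tried out with a small sketch like the one below, using Python's built-in sqlite3 module and made-up rows. Two things differ from the query above and are my assumptions: the helper name find_stuffs_with_all is hypothetical, and the HAVING clause uses SUM(CASE ...) instead of count(*) filter (...), which is an equivalent form that also runs on databases without FILTER support. The search terms are passed as bound parameters, and the helper expects at least one term.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE stuffs (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE substances (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE stuffs_substances (stuff_id INTEGER, substance_id INTEGER);
-- Hypothetical sample data: ring = gold + silver, spoon = silver, coin = gold + copper.
INSERT INTO stuffs VALUES (1, 'ring'), (2, 'spoon'), (3, 'coin');
INSERT INTO substances VALUES (1, 'gold'), (2, 'silver'), (3, 'copper');
INSERT INTO stuffs_substances VALUES (1, 1), (1, 2), (2, 2), (3, 1), (3, 3);
""")

def find_stuffs_with_all(conn, terms):
    """Return stuffs having, for every term, at least one matching substance."""
    patterns = ["%" + term + "%" for term in terms]
    # WHERE keeps only rows matching any term; HAVING requires every term.
    where = " OR ".join("s.name LIKE ?" for _ in patterns)
    having = " AND ".join(
        "SUM(CASE WHEN s.name LIKE ? THEN 1 ELSE 0 END) > 0" for _ in patterns
    )
    sql = (
        "SELECT st.id, st.name "
        "FROM stuffs st "
        "JOIN stuffs_substances ss ON ss.stuff_id = st.id "
        "JOIN substances s ON ss.substance_id = s.id "
        "WHERE " + where + " "
        "GROUP BY st.id, st.name "
        "HAVING " + having + " "
        "ORDER BY st.id"
    )
    # Placeholders appear first in WHERE, then in HAVING, hence the doubling.
    return conn.execute(sql, patterns + patterns).fetchall()

both = find_stuffs_with_all(conn, ["silv", "gold"])
silver_only = find_stuffs_with_all(conn, ["silv"])
print(both, silver_only)
```

The original query returned nothing because a single substances row can never have a name that matches both '%silv%' and '%gold%'; the grouped version tests the two conditions across all substance rows of the same stuff.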
If you want to search by part of a word, re-run the query each time the user types a letter. The filter part of the query, in Oracle SQL style, looks like this.
To match only the start of the name:
where name like :what_user_write || '%'
Or to match any part of the name:
where name like '%' || :what_user_write || '%'
You can also use CAB, so that the user can search with capital or small letters.
OK, you asked about the join. I tested this in MySQL, and it works fine for finding stuff made from gold and silver, or silver, and so on. Hope this helps.
select sf.id, ss.code, sf.stuff_name from stuffs sf, stuffs_substances ss , substances s
where sf.id = ss.id
and s.code = ss.code
and s.sub_name like 'gol%'
I have the following sql data:
ID Company Name Customer Address 1 City State Zip Date
0108500 AAA Test Mish~Sara Newa Claims Chtiana CO 123 06FE0046
0108500 AAA.Test Mish~Sara Newa Claims Chtiana CO 123 06FE0046
1802600 AAA Test Company Ban, Adj.~Gorge PO Box 83 MouLaurel CA 153 09JS0025
1210600 AAA Test Company Biwel~Brce 97kehst ve Jacn CA 153 04JS0190
AAA Test, AAA.Test and AAA Test Company are considered as one company.
Since their data is messy I'm thinking either to do this:
Is there a way to search all the records in the DB wherein it will search the company name with almost the same name then re-name it to the longest name?
In this case, the AAA Test and AAA.Test will be AAA Test Company.
OR Is there a way to filter only record with company name that are almost the same then they can have option to change it?
If there's no way to do it via sql query, what are your suggestions so that we can clean-up the records? There are almost 1 million records in the database and it's hard to clean it up manually.
Thank you in advance.
You could use String matching algorithm like Jaro-Winkler. I've written an SQL version that is used daily to deduplicate People's names that have been typed in differently. It can take awhile but it does work well for the fuzzy match you're looking for.
Something like a self join? || is ANSI SQL concat, some products have a concat function instead.
select *
from tablename t1
join tablename t2 on t1.companyname like '%' || t2.companyname || '%'
Depending on datatype you may have to remove blanks from the t2.companyname, use TRIM(t2.companyname) in that case.
And, as Miguel suggests, use REPLACE to remove commas and dots etc.
Use case-insensitive collation. SOUNDEX can be used etc etc.
I think most database servers support full-text search, and if so there are some full-text functions that support proximity.
For example, there is a NEAR function in SQL Server; here is its documentation: https://msdn.microsoft.com/en-us/library/ms142568.aspx
You can do the clean-up in several stages.
Create new columns
Convert everything to upper case, remove punctuation & whitespace, then match on the first 6 to 10 characters (using self join). Assuming your table is called "vendor": add two columns, "status", "dupstr", then update as follows
/** Populate dupstr column for fuzzy match **/
update vendor v
set v.dupstr = left(upper(regex_replace(regex_replace(v.companyname,'[.]',''),' ','')),6) -- '[.]' matches a literal dot; a bare '.' would match every character
;
Identify duplicate records
Add an index on the dupstr column, then do an update like this to identify "good" records:
/** Mark the good duplicates **/
update vendor v
set v.status = 'keep' --indicate keeper record
where
--dupes to clean up
exists ( select 1 from vendor v1 where v.dupstr = v1.dupstr
and v.id != v1.id )
and
( --keeper has longest name
length(v.companyname) =
( select max(length(v2.companyname)) from vendor v2
where v.dupstr = v2.dupstr
)
or
--keeper has latest record (assuming ID is sequential)
v.id =
( select max(v3.id) from vendor v3
where v.dupstr = v3.dupstr
)
)
;
The above SQL can be refined to add "dupe" status to other records , or you can do a separate update.
Clean Up Stragglers
Report any remaining partial matches to be reviewed by a human (i.e. dupe records without a keeper record)
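The staged clean-up can also be prototyped outside the database before running it against a million rows. The sketch below is plain Python, under the same assumptions as the steps above: names are upper-cased, everything except letters and digits is removed, and the first 6 characters form the match key. The function names dup_key and pick_keepers are hypothetical, and 'Beta Corp' is a made-up extra row to show a group with no duplicates.

```python
import re
from collections import defaultdict

def dup_key(name, length=6):
    """Upper-case the name, drop punctuation and whitespace, keep a short prefix."""
    return re.sub(r"[^A-Z0-9]", "", name.upper())[:length]

def pick_keepers(names):
    """Group names by dup_key and keep the longest name of each group."""
    groups = defaultdict(list)
    for name in names:
        groups[dup_key(name)].append(name)
    return {key: max(members, key=len) for key, members in groups.items()}

company_names = ["AAA Test", "AAA.Test", "AAA Test Company", "Beta Corp"]
keepers = pick_keepers(company_names)
print(keepers)
```

A 6-character prefix is a blunt instrument: unrelated companies that share a prefix end up in the same group, so the result is best treated as a list of candidates for human review rather than as an automatic rename.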
You can use a SQL query with SOUNDEX or DIFFERENCE.
For example:
SELECT DIFFERENCE ('AAA Test','AAA Test Company')
DIFFERENCE returns 0 to 4 (4 = almost the same, 0 = totally different).
See also: https://learn.microsoft.com/en-us/sql/t-sql/functions/difference-transact-sql?view=sql-server-2017
Below is an existing ms sql server 2008 report query.
SELECT
number, batchtype, customer, systemmonth, systemyear, entered, comment, totalpaid
FROM
payhistory LEFT OUTER JOIN agency ON
payhistory.SendingID = agency.agencyid
WHERE
payhistory.batchtype LIKE 'p%' AND
payhistory.entered >= '2011-08-01 00:00:00.00' AND
payhistory.entered < '2011-08-15 00:00:00.00' AND
payhistory.systemmonth = 8 AND
payhistory.systemyear = 2011 AND
payhistory.comment NOT LIKE 'Elit%'
Results will look like this:
number batchtype customer systemmonth systemyear entered comment totalpaid
6255756 PC EMC1106 8 2011 12:00:00 AM DP From - NO CASH 33
5575317 PA ERS002 8 2011 12:00:00 AM MO-0051381526 7/31 20
6227031 PA FTS1104 8 2011 12:00:00 AM MO-10422682168 7/30 25
6232589 PC FTS1104 8 2011 12:00:00 AM DP From - NO CASH 103
2548281 PC WAP1001 8 2011 12:00:00 AM NCO DP $1,445.41 89.41
4544785 PCR WAP1001 8 2011 12:00:00 AM NCO DP $1,445.41 39
What I am trying to do is modify the query that will exclude records where the customer is like 'FTS%' and 'EMC%' and batchtype = 'PC'. As you can see in the result set there are records where customer is like FTS% and batchtype = 'PA'. I would like to keep these records in the results. I would appreciate any ideas offered.
Your query contains a mix of upper and lower string comparison targets. As far as I'm aware, SQL Server is not by default case-sensitive; is it possible this is what is tripping your query up? Check collation per this answer.
EDIT: Based on your updated question, can you not just use an AND clause that uses a NOT on the front?
In other words, add a 'AND not (x)' clause, where 'x' is the conditions that define the records you want to exclude? You'd need to nest the customer test, because it's an OR.
e.g.:
... payhistory.comment NOT LIKE 'Elit%'
AND not ((customer like 'FTS%' or customer like 'EMC%') AND batchtype = 'PC')
As a side note, I believe that a LIKE clause may imply an inefficient table scan in some (but not all) cases, so if this query will be used in a performance-sensitive role you may want to check the query plan, and optimise the table to suit.
$sql="select * from builder_property where builder_pro_name LIKE '%%' OR builder_pro_name LIKE '%za%' AND status='Active'";
This will return all the builder property names in the table whose name ends like plaza or complex.
It may be because your server is case-sensitive. In that case, the query below would work.
SELECT
table1.number, table1.btype, table1.cust, table1.comment, table2.ACode
FROM
table1 LEFT OUTER JOIN table2 ON table1.1ID = table2.2ID
WHERE
lower(table1.btype) LIKE 'p%' AND
lower(table1.comment) NOT LIKE 'yyy%' AND
lower(table1.cust) NOT LIKE 'abc%' AND
lower(table1.cust) NOT LIKE 'xyz%' AND
lower(table1.btype) <> 'pc'
Add this condition to the WHERE clause:
NOT((customer LIKE 'FTS%' OR customer LIKE 'EMC%') AND batchtype='PC')
Assuming your other results are OK and you just want to filter those out, the whole query would be
SELECT
number, batchtype, customer, systemmonth, systemyear, entered, comment, totalpaid
FROM
payhistory
LEFT OUTER JOIN agency ON
payhistory.SendingID = agency.agencyid
WHERE
payhistory.batchtype LIKE 'p%' AND
payhistory.entered >= '2011-08-01 00:00:00.00' AND
payhistory.entered < '2011-08-15 00:00:00.00' AND
payhistory.systemmonth = 8 AND
payhistory.systemyear = 2011 AND
payhistory.comment NOT LIKE 'Elit%' AND
NOT((payhistory.customer LIKE 'FTS%' OR payhistory.customer LIKE 'EMC%') AND payhistory.batchtype='PC')
Hope that works for you.
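As a quick check of the NOT (...) condition, the sketch below loads the six sample rows from the question into an in-memory SQLite database using Python's built-in sqlite3 module (only the number, batchtype and customer columns are kept; the date and comment filters are left out for brevity). It also runs the De Morgan expansion of the same condition to show that the two forms select the same rows. This assumes customer and batchtype are never NULL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE payhistory (number INTEGER, batchtype TEXT, customer TEXT);
INSERT INTO payhistory VALUES
  (6255756, 'PC',  'EMC1106'),
  (5575317, 'PA',  'ERS002'),
  (6227031, 'PA',  'FTS1104'),
  (6232589, 'PC',  'FTS1104'),
  (2548281, 'PC',  'WAP1001'),
  (4544785, 'PCR', 'WAP1001');
""")

# Negate the whole combination: (FTS or EMC customer) AND batchtype PC.
kept = [row[0] for row in conn.execute("""
SELECT number FROM payhistory
WHERE batchtype LIKE 'p%'
  AND NOT ((customer LIKE 'FTS%' OR customer LIKE 'EMC%') AND batchtype = 'PC')
ORDER BY number
""")]

# Same condition after applying De Morgan's laws.
kept_expanded = [row[0] for row in conn.execute("""
SELECT number FROM payhistory
WHERE batchtype LIKE 'p%'
  AND ((customer NOT LIKE 'FTS%' AND customer NOT LIKE 'EMC%') OR batchtype <> 'PC')
ORDER BY number
""")]

print(kept, kept_expanded)
```

The FTS1104 row with batchtype PA is kept, while the FTS1104 and EMC1106 rows with batchtype PC are dropped, which is the behavior asked for in the question.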
When building complex WHERE clauses it is a good idea to use parentheses to keep everything straight. To exclude rows that match a combination of conditions, group the whole combination in parentheses and negate it once with AND NOT (...). OR-ing separate NOT LIKE tests together would let almost every row through, because a row only has to pass one of them. Like this...
WHERE
(payhistory.batchtype LIKE 'p%')
AND (payhistory.entered >= '2011-08-01 00:00:00.00')
AND (payhistory.entered < '2011-08-15 00:00:00.00')
AND (payhistory.systemmonth = 8 )
AND (payhistory.systemyear = 2011)
AND (payhistory.comment NOT LIKE 'Elit%')
AND NOT ( -- BEGIN EXCLUSION
(payhistory.batchtype = 'PC') AND
(
(payhistory.customer LIKE 'EMC%') OR
(payhistory.customer LIKE 'FTS%')
)
) -- END EXCLUSION