I have an update statement which is designed to manage "jobs". I process one job first whilst the others wait in line until the first one is complete, after which they are sent for processing (far faster, as I can reuse calculations from the first run).
To quote my own comment:
-- Set job status to 2 for where there is saved results file
-- Where there is no saved results file:
-- --Set job status to 2 for one instance of each unique param combination
-- --Set job status to 8 for all others
PseudoCode:
UPDATE #Jobs
SET [JobStatusId] = CASE
WHEN ISNULL([PreCalculated].[FileCount], 0) > 0 THEN 2
WHEN ISNULL([PreCalculated].[FileCount], 0) = 0 AND [GroupedOrder] = 1 THEN 2
ELSE 8 END
FROM #NewJobsGrouped [NewJobsGrouped]
LEFT JOIN (
SELECT COUNT([Id]) [FileCount],
[ResultsFile].[ParamsId]
FROM #ResultsFile [ResultsFile]
WHERE [ResultsFile].[IsActive] = 1
GROUP BY [ResultsFile].[ParamsId]
) [PreCalculated]
ON [PreCalculated].[ParamsId] = [NewJobsGrouped].[ParamsId]
where #NewJobsGrouped looks like:
Job ID   GroupedOrder   ParamsId
1460     1              807
1461     2              807
1462     3              807
This does not work. Every job is being set to status 2. However:
SELECT CASE
WHEN ISNULL([PreCalculated].[FileCount], 0) > 0 THEN 2
WHEN ISNULL([PreCalculated].[FileCount], 0) = 0 AND [GroupedOrder] = 1 THEN 2
ELSE 8 END [JobStatusId]
etc
Works exactly as I am expecting.
Why would these two CASE expressions give different results? Is there something obvious I am missing? I honestly can't explain what I'm seeing. I can probably work around it by using another temp table to hold the output of the SELECT and then running a simpler UPDATE, but I'd like to understand what's going on.
Related
Generating a report of 1 million rows with the script below takes almost 2 days, so I would really appreciate it if somebody could help me with a different script that can generate the report within 10-15 minutes.
The requirements of the report are as follows:
Table “cover” contains 5 million rows and 6 columns of data; likewise, table “data” contains 500,000 rows and 6 columns.
Each row in table cover has to be compared against every row in table data, and the maximum number of matching columns has to be returned.
For instance, as shown in the tables below, there could be 3 matches with row #1, 2 matches with row #2 and 5 matches with row #3, so the script has to select the maximum, which is 5 for row #3.
Sample table
UPDATE public.cover_sheet AS fc
SET maxmatch = (SELECT MAX(tmp.mtch)
FROM (
SELECT (SELECT CASE WHEN fc.a=drwo.a THEN 1 ELSE 0 END) +
(SELECT CASE WHEN fc.b=drwo.b THEN 1 ELSE 0 END) +
(SELECT CASE WHEN fc.c=drwo.c THEN 1 ELSE 0 END) +
(SELECT CASE WHEN fc.d=drwo.d THEN 1 ELSE 0 END) +
(SELECT CASE WHEN fc.e=drwo.e THEN 1 ELSE 0 END) +
(SELECT CASE WHEN fc.f=drwo.f THEN 1 ELSE 0 END) AS mtch
FROM public.data AS drwo
) AS tmp)
WHERE fc.code>0;
SELECT *
FROM public.cover_sheet AS fc
WHERE fc.maxmatch>0;
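The requirement described above (for each cover row, the highest number of matching columns over all data rows) can be sketched as a single grouped query. This is only an illustration run through Python's built-in sqlite3 module rather than PostgreSQL; the sample rows are made up, and treating code as the key of cover_sheet is an assumption. It still compares every pair of rows, so it shows the logic and is not a claim about the running time on the real tables.

```python
import sqlite3

# Hypothetical miniature tables; only the column names come from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE cover_sheet (code INTEGER PRIMARY KEY,
                              a INT, b INT, c INT, d INT, e INT, f INT,
                              maxmatch INT);
    CREATE TABLE data (a INT, b INT, c INT, d INT, e INT, f INT);
    INSERT INTO cover_sheet (code, a, b, c, d, e, f) VALUES
        (1, 1, 2, 3, 4, 5, 6),
        (2, 9, 9, 9, 9, 9, 9);
    INSERT INTO data VALUES
        (1, 2, 3, 0, 0, 0),   -- 3 matches with cover row 1
        (1, 2, 0, 0, 0, 0),   -- 2 matches with cover row 1
        (1, 2, 3, 4, 5, 0);   -- 5 matches with cover row 1
""")

# Score every (cover row, data row) pair once and keep the best score per cover row.
best = conn.execute("""
    SELECT fc.code,
           MAX(  CASE WHEN fc.a = d.a THEN 1 ELSE 0 END
               + CASE WHEN fc.b = d.b THEN 1 ELSE 0 END
               + CASE WHEN fc.c = d.c THEN 1 ELSE 0 END
               + CASE WHEN fc.d = d.d THEN 1 ELSE 0 END
               + CASE WHEN fc.e = d.e THEN 1 ELSE 0 END
               + CASE WHEN fc.f = d.f THEN 1 ELSE 0 END) AS maxmatch
    FROM cover_sheet AS fc
    CROSS JOIN data AS d
    WHERE fc.code > 0
    GROUP BY fc.code
""").fetchall()

# Write the per-row maximum back to the cover table.
conn.executemany("UPDATE cover_sheet SET maxmatch = ? WHERE code = ?",
                 [(m, code) for code, m in best])

result = conn.execute(
    "SELECT code, maxmatch FROM cover_sheet ORDER BY code").fetchall()
print(result)
```

Here cover row 1 gets 5 (its best data row) and cover row 2 gets 0, because none of its columns match any data row.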
As @a_horse_with_no_name mentioned in a comment on the question, your question is not clear...
It seems you want to count the records in which all 6 fields are equal in both tables.
I'd suggest that you:
reduce the number of SELECT statements, which will speed up query execution,
split your query into a few smaller ones (good practice) to check your logic,
use a join to get the equal data, see: Visual Representation of SQL Joins,
use a subquery or CTE to get the result set from which you can update the table.
I think you want a result like the following:
SELECT COUNT(*) mtch
FROM public.cover_sheet AS fc INNER JOIN public.data AS drwo ON
fc.a=drwo.a AND fc.b=drwo.b AND fc.c=drwo.c AND fc.d=drwo.d AND fc.e=drwo.e AND fc.f=drwo.f
If I'm not wrong and the above query is correct, its execution time should drop to about 1-2 minutes.
Finally, the update query may look like:
WITH qry AS
(
-- proper select statement here
)
UPDATE public.cover_sheet AS fc
SET maxmatch = qry.<fieldname>
FROM qry
WHERE fc.code>0 AND fc.<key> = qry.<key>;
Note:
I cannot see your data and I know nothing about its structure, relationships, etc., so you will have to adapt the above query to your needs.
I'm fairly new to SQL, so I want to ask how to write a query that shows the desired results.
There is a table into which data arrives continuously from one main PLC, so everything is gathered in one table. There is a lot of data per shift (about half a million rows). The column "Op" holds the machine IDs (Op = ID of the machine); the machines are numbered from 1 to 218. Every machine has its cycle times divided into process start (T1), duration (T2) and process end (T3), plus a Result. The "Result" is interpreted as 0 = OK, 1 = not OK, 2 = empty pallet, 3 = free flow of pallet. These are the results that the PLCs report directly into the database table.
I have tried basic statements that count these results for one specific result state (0, 1, 2, 3) and one specific machine. That works, but it is not the desired goal.
SELECT Count(*) as Result0
FROM PalletOperations
where Op = 1 and Result = 0
The expected result is a full list of all machines from 1 to 218 showing how many results were counted as 0, 1, 2 and 3 for each. The other columns are not relevant for now. Each machine should have its own row: with 218 machines, I need 218 rows, one per Op (1, 2, 3, 4 ... 218), with columns holding the counts of result states 0, 1, 2 and 3 as described above.
Any advice is welcome
I think you want conditional aggregation:
SELECT op,
SUM(CASE WHEN Result = 0 THEN 1 ELSE 0 END) as result_0,
SUM(CASE WHEN Result = 1 THEN 1 ELSE 0 END) as result_1,
SUM(CASE WHEN Result = 2 THEN 1 ELSE 0 END) as result_2,
SUM(CASE WHEN Result = 3 THEN 1 ELSE 0 END) as result_3
FROM PalletOperations
GROUP BY op;
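To see what the conditional aggregation returns, here is a small sketch run through Python's built-in sqlite3 module. The sample rows are invented and the table is reduced to the two columns that matter (Op and Result); everything else about the real table is unknown.

```python
import sqlite3

# Hypothetical sample: machine 1 has four rows, machine 2 has three.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE PalletOperations (Op INT, Result INT);
    INSERT INTO PalletOperations VALUES
        (1, 0), (1, 0), (1, 1), (1, 3),
        (2, 0), (2, 2), (2, 2);
""")

# Each SUM counts only the rows whose Result equals the given state,
# so every machine ends up as one row with four counters.
counts = conn.execute("""
    SELECT Op,
           SUM(CASE WHEN Result = 0 THEN 1 ELSE 0 END) AS result_0,
           SUM(CASE WHEN Result = 1 THEN 1 ELSE 0 END) AS result_1,
           SUM(CASE WHEN Result = 2 THEN 1 ELSE 0 END) AS result_2,
           SUM(CASE WHEN Result = 3 THEN 1 ELSE 0 END) AS result_3
    FROM PalletOperations
    GROUP BY Op
    ORDER BY Op
""").fetchall()

for row in counts:
    print(row)
```

Note that a machine with no rows at all in the table does not appear in the output; if all 218 machines must be listed regardless, the query would need to be driven from a list of machine IDs.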
You can use the SQL statement below to get the count of results per machine and per result value:
SELECT op,Result, count(*)
FROM PalletOperations
GROUP BY op,Result;
I'm not good at all with IF statements. I currently have a schedule that looks like this:
Lot(Int)  PartNum(Varchar50)  Amount(Int)  IsPainted(Bool)  IsInspected(Bool)  Finished(Bool)
1         xxx-0191            500          1                1                  0
2         xxx-0191            700          1                0                  0
I assume this will have to be handled by an IF statement, but I'm certainly open to using whatever works best. What I'm trying to accomplish is a query that gives me the following:
Lot  PartNum   Amount  Status
1    xxx-0191  500     Inspected
2    xxx-0191  700     Painted
What I need it to do is find the last boolean column that is "True" (1) and display that stage in the "Status" column of the query result.
Use this code:
select
Lot,
PartNum,
Amount,
case when IsInspected = 1 then 'Inspected' else 'Painted' end as Status
from
your_table -- replace with your actual table name
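Use case. Something like this:
select lot, partnum, amount,
(case when Finished = 1 then 'Finished'
when IsInspected = 1 then 'Inspected'
when IsPainted = 1 then 'Painted'
end) as status
from your_table; -- replace with your actual table name
A CASE expression returns the first branch whose condition is true, so testing the flags from the last stage to the first picks the latest completed stage as the status.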
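A quick way to check the branch ordering is to run it against the two sample lots from the question. The sketch below uses Python's built-in sqlite3 module; the table name schedule is an assumption, since the question never names the table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- 'schedule' is a hypothetical table name; columns follow the question.
    CREATE TABLE schedule (Lot INT, PartNum TEXT, Amount INT,
                           IsPainted INT, IsInspected INT, Finished INT);
    INSERT INTO schedule VALUES
        (1, 'xxx-0191', 500, 1, 1, 0),
        (2, 'xxx-0191', 700, 1, 0, 0);
""")

# Branches are tested top to bottom, so the latest stage is listed first.
statuses = conn.execute("""
    SELECT Lot, PartNum, Amount,
           CASE WHEN Finished = 1    THEN 'Finished'
                WHEN IsInspected = 1 THEN 'Inspected'
                WHEN IsPainted = 1   THEN 'Painted'
           END AS Status
    FROM schedule
    ORDER BY Lot
""").fetchall()

for row in statuses:
    print(row)
```

Lot 1 is both painted and inspected, and it comes out as Inspected because that branch is tested before the Painted one. A lot with no flag set would get NULL, since the CASE has no ELSE.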
I have a table with 3 columns: PID, LOCID and ISMGR. Currently, a person is flagged ISMGR=true only for specific location IDs.
Under the new requirement, ISMGR must be set to true on every row of any person who has at least one row with ISMGR=true (that is, if he is a manager for any one location, he should be a manager for all locations).
Table Data before running the script:
PID|LOCID|ISMGR
1 1 1
1 2 0
1 3 0
2 1 0
2 2 1
Table Data after running the script:
PID|LOCID|ISMGR
1 1 1
1 2 1
1 3 1
2 1 1
2 2 1
Any help will be highly appreciated..
Thanks in advance.
I would be inclined to write this using exists:
update t
set ismgr = 1
where ismgr = 0 and
exists (select 1 from t t2 where t2.pid = t.pid and t2.ismgr = 1);
exists should be more efficient than doing a subquery with an aggregation.
This will work best with indexes on t(pid, ismgr) and t(ismgr).
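The EXISTS update can be tried on the sample data from the question. This is a sketch using Python's built-in sqlite3 module instead of Oracle; the extra row for pid 3 is made up, to show that a person with no manager row is left alone.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t (pid INT, locid INT, ismgr INT);
    INSERT INTO t VALUES
        (1, 1, 1), (1, 2, 0), (1, 3, 0),
        (2, 1, 0), (2, 2, 1),
        (3, 1, 0);   -- hypothetical person who manages nothing
""")

# Flip a 0 to 1 only when the same pid already has a row with ismgr = 1.
cur = conn.execute("""
    UPDATE t
    SET ismgr = 1
    WHERE ismgr = 0
      AND EXISTS (SELECT 1 FROM t AS t2
                  WHERE t2.pid = t.pid AND t2.ismgr = 1)
""")
updated = cur.rowcount

rows = conn.execute(
    "SELECT pid, locid, ismgr FROM t ORDER BY pid, locid").fetchall()
print(updated, rows)
```

Three rows are changed (two for pid 1, one for pid 2), which matches the "after" table in the question, and the pid 3 row keeps ismgr = 0.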
This is not an answer but a test of the two solutions offered so far - I will call them the "EXISTS" and the "AGGREGATE" solutions or approaches.
Details of the tests are below, but here are two overall conclusions:
Both approaches have comparable execution times; on average the AGGREGATE approach worked a little faster than the EXISTS approach, but by a very small margin (smaller than the differences between running times from one trial to the next). Without indexes on any columns, the run times were as follows (the first number is for the EXISTS approach and the second for AGGREGATE):
Trial 1: 8.19s 8.08s
Trial 2: 8.98s 8.22s
Trial 3: 9.46s 9.55s
Note - Estimated optimizer costs should be used only to compare different execution plans for the same statement, not for different solutions using different approaches. Even so, someone will inevitably ask; so - for the EXISTS approach the lowest cost the Optimizer found was 4766; for AGGREGATE, 2665. Again, though, this is completely meaningless.
If a lot of rows need to be updated, indexes will hurt performance much more than they help it, because when rows are updated the indexes must be updated as well. If only a small number of rows must be updated, then the indexes will help, because most of the time is spent finding the rows that must be updated and only a little time is spent on the updates themselves. In my example almost 25% of the rows had to be updated, so with the indexes in place the AGGREGATE solution took 51.2 seconds and the EXISTS solution took 59.3 seconds!
RECOMMENDATION: If you expect that a large number of rows may need to be updated, and you already have indexes on the table, you may be better off DROPPING them and re-creating them after the updates! Or, perhaps there are other solutions to this problem; I am not an expert (keep that in mind!)
To test properly, after I created the test table and committed, I ran each solution by itself, then I rolled back and, logged in as SYS (in a different session), I ran alter system flush buffer_cache to make sure performance is not randomly helped by cache hits or hurt by misses. In all cases everything is done from disk storage.
I created a table with id's from 1 to 1.2 million and a random integer between 1 and 3, with probabilities 40%, 40% and 20% respectively (see the use of dbms_random below). Then from this prep data I created the test table: each pid was included one, two or three times based on this random integer; and a random 0 or 1 was added as ismgr (with 50-50 probability) in each row. I also added a random integer between 1 and 4 as locid just to simulate the actual data; I didn't worry about duplicate locid since that column plays no role in the problem.
Of the 1.2 million pids, approximately 480,000 (40%) appear just once in the test table, another ~480,000 appear twice and ~240,000 three times. Total rows should be about 2,160,000. That's the cardinality of the base table (in reality it ended up being 2,160,546). Then: none of the ~480,000 rows with unique pid need to be changed; half of the 480,000 pids with a count of 2 will have the same ismgr (so no change) and the other half will be split, so we will need to change 240,000 rows from these; and a simple combinatorial argument shows that 3/8, or 270,000 rows, of the 720,000 rows for pids that appear three times in the table must be changed. So we should expect that 510,000 rows should be changed. In fact the update statements resulted in 510,132 rows updated (same for both solutions). These sanity checks show that the test was probably set up correctly. Below I show also a small sample from the base table, also as a sanity check.
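The expected counts above can be re-derived by brute force: enumerate every 0/1 combination of ismgr for a pid with one, two or three rows and count the zeros that sit in a group containing at least one 1. This is only a check of the arithmetic in plain Python; the variable and function names are mine.

```python
from fractions import Fraction
from itertools import product

N_PIDS = 1_200_000
# Probability that a pid appears 1, 2 or 3 times (40%, 40%, 20%).
SHARE = {1: Fraction(2, 5), 2: Fraction(2, 5), 3: Fraction(1, 5)}

def expected_changes(k):
    """Average number of rows flipped 0 -> 1 for a pid with k rows,
    each row having ismgr 0 or 1 with equal probability."""
    combos = list(product((0, 1), repeat=k))
    # Zeros are flipped only if the group contains at least one 1.
    flipped = sum(c.count(0) for c in combos if 1 in c)
    return Fraction(flipped, len(combos))

expected_rows = sum(N_PIDS * p * k for k, p in SHARE.items())
expected_updates = sum(N_PIDS * p * expected_changes(k)
                       for k, p in SHARE.items())

print(int(expected_rows), int(expected_updates))
```

The per-pid averages come out as 0, 1/2 and 9/8 rows for one, two and three appearances, which gives the 240,000 and 270,000 figures quoted above and 510,000 in total.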
CREATE TABLE statement:
create table tbl as
with prep ( pid, dup ) as (
select level,
round( dbms_random.value(0.5, 3) ) as dup
from dual
connect by level <= 1200000
)
select pid,
round( dbms_random.value(0.5, 4.5) ) as locid,
round( dbms_random.value(0, 1) ) as ismgr
from prep
connect by level <= dup
and prior pid = pid
and prior sys_guid() is not null
;
commit;
Sanity checks:
select count(*) from tbl;
COUNT(*)
----------
2160546
select * from tbl where pid between 324720 and 324730;
PID LOCID ISMGR
---------- ---------- ----------
324720 4 1
324721 1 0
324721 4 1
324722 3 0
324723 1 0
324723 3 0
324723 3 1
324724 3 1
324724 2 0
324725 4 1
324725 2 0
324726 2 0
324726 1 0
324727 3 0
324728 4 1
324729 1 0
324730 3 1
324730 3 1
324730 2 0
19 rows selected
UPDATE statements:
update tbl t
set ismgr = 1
where ismgr = 0 and
exists (select 1 from tbl t2 where t2.pid = t.pid and t2.ismgr = 1);
rollback;
update tbl
set ismgr = 1
where ismgr = 0
and pid in ( select pid
from tbl
group by pid
having max(ismgr) = 1);
rollback;
-- statements to create indexes, used in separate testing:
create index pid_ismgr_idx on tbl(pid, ismgr);
create index ismgr_ids on tbl(ismgr);
Why PL/SQL? All you need is a plain SQL statement. For example:
update your_table t -- enter your actual table name here
set ismgr = 1
where ismgr = 0
and pid in ( select pid
from your_table
group by pid
having max(ismgr) = 1)
;
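As a sanity check that this statement and the EXISTS version change the same rows, both can be run on identical copies of the question's sample data. The sketch uses Python's built-in sqlite3 module rather than Oracle, and the helper name run_update is made up for the illustration.

```python
import sqlite3

SAMPLE = """
    CREATE TABLE t (pid INT, locid INT, ismgr INT);
    INSERT INTO t VALUES
        (1, 1, 1), (1, 2, 0), (1, 3, 0),
        (2, 1, 0), (2, 2, 1);
"""

AGGREGATE_SQL = """
    UPDATE t
    SET ismgr = 1
    WHERE ismgr = 0
      AND pid IN (SELECT pid FROM t GROUP BY pid HAVING MAX(ismgr) = 1)
"""

EXISTS_SQL = """
    UPDATE t
    SET ismgr = 1
    WHERE ismgr = 0
      AND EXISTS (SELECT 1 FROM t AS t2
                  WHERE t2.pid = t.pid AND t2.ismgr = 1)
"""

def run_update(update_sql):
    """Load a fresh copy of the sample, apply one UPDATE, return all rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SAMPLE)
    conn.execute(update_sql)
    return conn.execute(
        "SELECT pid, locid, ismgr FROM t ORDER BY pid, locid").fetchall()

aggregate_rows = run_update(AGGREGATE_SQL)
exists_rows = run_update(EXISTS_SQL)
print(aggregate_rows == exists_rows, aggregate_rows)
```

Both statements leave every row with ismgr = 1, matching the "after" table in the question.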
The existing solutions are perfectly fine, but I prefer to use merge any time I'm updating rows from a correlated sub-query. I find it more readable, and the performance is typically comparable to that of the exists method.
MERGE INTO t
USING (SELECT DISTINCT pid
FROM t
WHERE ismgr = 1) src
ON (t.pid = src.pid)
WHEN MATCHED THEN
UPDATE SET ismgr = 1
WHERE ismgr = 0;
As @mathguy pointed out, in this case using group by and having is more efficient than distinct. To use that with merge is just a matter of changing the sub-query:
MERGE INTO t
USING (SELECT pid
FROM t
GROUP BY pid
HAVING MAX(ismgr) = 1) src
ON (t.pid = src.pid)
WHEN MATCHED THEN
UPDATE SET ismgr = 1
WHERE ismgr = 0;
I have 2 queries. From them I need to work out when all threads have completed processing and raise an alert in the form of a printed "Complete". The idea: if the output of the current-processing query matches the output of the total-put-through query, all threads in that batch are completed. I need to create a stored proc that would PRINT Complete. What would be the best way of doing this?
--How Many To Process - Which is put through first (but not processed)
SELECT COUNT(IsProcessed) as 'Current Put through'
FROM [dbo].[threads]
-- Current Processed -- This incrementally goes up by the Isprocessed flag changing 1 by 1 to the value of 1
SELECT COUNT(IsProcessed) 'Current_Processing'
FROM [dbo].[threads] as count
where IsProcessed=1
Can you do something like the following?
IF NOT EXISTS (SELECT 1 FROM dbo.Threads WHERE IsProcessed != 1)
BEGIN
PRINT 'Complete'
END
ELSE
BEGIN
PRINT 'Ongoing'
END
select *,
case when cntAll = cntProcessed then 'Complete' else 'Ongoing' end as Status
from (
select count(case when IsProcessed = 1 then 1 else NULL end) as cntProcessed, -- rows already processed
count(1) as cntAll -- all rows put through
from [dbo].[threads]) A
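The count-comparison idea can be tried out on a toy table. The sketch below uses Python's built-in sqlite3 module instead of SQL Server, so it shows only the query logic and not the stored procedure or PRINT; the id column and the three sample rows are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical stand-in for dbo.threads: two processed rows, one pending.
    CREATE TABLE threads (id INTEGER PRIMARY KEY, IsProcessed INT);
    INSERT INTO threads (IsProcessed) VALUES (1), (1), (0);
""")

STATUS_SQL = """
    SELECT cntProcessed, cntAll,
           CASE WHEN cntAll = cntProcessed THEN 'Complete'
                ELSE 'Ongoing' END AS Status
    FROM (SELECT COUNT(CASE WHEN IsProcessed = 1 THEN 1 END) AS cntProcessed,
                 COUNT(*) AS cntAll
          FROM threads) AS A
"""

before = conn.execute(STATUS_SQL).fetchone()

# Simulate the remaining thread finishing.
conn.execute("UPDATE threads SET IsProcessed = 1 WHERE IsProcessed = 0")
after = conn.execute(STATUS_SQL).fetchone()

print(before, after)
```

While one row is still pending the status is Ongoing; once every row has IsProcessed = 1 the two counts agree and the status becomes Complete.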