Update Column if Null from previous values - sql

I am using DB2 SQL. I have a table named ITEMATRIX which looks like this:
ITNBR         LBRCST  MFGOH  STDUC  YRMNT
RM-013        0       0      499.6  2010-02
H-178         0       0      164.5  2010-02
FP9-003       0       0      6      2010-02
FP9-059       0       0      2      2010-02
A94-103B-M    0       0      0      2010-02
140-07-1012C  0       0      10     2010-05
140-07-1012C  0       0      0      2010-06
After the update, the rows for that item should become:
ITNBR         LBRCST  MFGOH  STDUC  YRMNT
140-07-1012C  0       0      10     2010-05
140-07-1012C  0       0      10     2010-06
and so on.
I want to update the STDUC field, where its value is 0 or null, with the value from the nearest earlier month. Let's say for ITNBR 140-07-1012C the STDUC is 0 for 2010-06; then first I have to find whether that item number has a standard cost in the year 2010 for any month, and if so, copy the value from the last month that has one, which is 2010-05. There are many records with the same item number, which I am transposing later. Can anyone give me some ideas of how to go about this?
Thanks
Varun

You can use the LAG() function if you are using DB2 9.7: https://www.ibm.com/developerworks/mydeveloperworks/blogs/dbtrue/entry/lag_lead_first_value_last_value_totally_new_in_db21?lang=en

I think the following is the logic that you want (note the extra yrmnt predicate, so that only earlier months are considered):
update itematrix
set stduc = (select im2.stduc
             from itematrix im2
             where im2.stduc <> 0 and im2.stduc is not null
               and im2.itnbr = itematrix.itnbr
               and im2.yrmnt < itematrix.yrmnt
             order by im2.yrmnt desc
             fetch first 1 row only
            )
where stduc is null or stduc = 0
I haven't tested this in DB2. There may be a problem using fetch first on the same table as the one used in the update. I find the documentation ambiguous.
If you don't need an actual update, but just want to see the value, then try:
select itematrix.*,
       (select im2.stduc
        from itematrix im2
        where im2.stduc <> 0 and im2.stduc is not null
          and im2.itnbr = itematrix.itnbr
          and im2.yrmnt < itematrix.yrmnt
        order by im2.yrmnt desc
        fetch first 1 row only
       )
from itematrix

I shifted the process to Excel and could do it easily using IF and VLOOKUP on single cost values, starting from the earliest month, comparing each value against the previous month and changing it accordingly, and then creating the UPDATE statements using &.
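For anyone who wants to stay in SQL, here is a minimal sketch of the correlated-subquery approach, run against SQLite via Python rather than DB2 (so LIMIT stands in for FETCH FIRST 1 ROW ONLY); table and column names follow the question, the sample rows are reduced to the ones that matter:

```python
import sqlite3

# Toy stand-in for the DB2 ITEMATRIX table.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE itematrix (itnbr TEXT, stduc REAL, yrmnt TEXT);
INSERT INTO itematrix VALUES
  ('140-07-1012C', 10, '2010-05'),
  ('140-07-1012C', 0,  '2010-06'),
  ('RM-013', 499.6, '2010-02');
""")
# Fill a 0/NULL STDUC with the nearest earlier month's non-zero value.
conn.execute("""
UPDATE itematrix
SET stduc = (SELECT im2.stduc
             FROM itematrix im2
             WHERE im2.itnbr = itematrix.itnbr
               AND im2.stduc <> 0 AND im2.stduc IS NOT NULL
               AND im2.yrmnt < itematrix.yrmnt
             ORDER BY im2.yrmnt DESC
             LIMIT 1)
WHERE stduc IS NULL OR stduc = 0
""")
rows = conn.execute("""
SELECT stduc FROM itematrix
WHERE itnbr = '140-07-1012C' ORDER BY yrmnt
""").fetchall()
print(rows)  # the 2010-06 row now carries the 2010-05 value
```

The YRMNT values sort correctly as strings because of the zero-padded YYYY-MM format, which is what makes the ORDER BY yrmnt DESC trick work.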

Related

Misleading count of 1 on JOIN in Postgres 11.7

I've run into a subtlety around count(*) and join, and am hoping to get some confirmation that I've figured out what's going on correctly. For background, we commonly convert continuous timeline data into discrete bins, such as hours. And since we don't want gaps for bins with no content, we'll use generate_series to synthesize the buckets we want values for. If there's no entry for, say, 10AM, fine, we still get a result. However, I noticed that I'm sometimes getting 1 instead of 0. Here's what I'm trying to confirm:
The count is 1 if you count the "grid" series, and 0 if you count the data table.
This only has to do with count, and no other aggregate.
The code below sets up some sample data to show what I'm talking about:
DROP TABLE IF EXISTS analytics.measurement_table CASCADE;
CREATE TABLE IF NOT EXISTS analytics.measurement_table (
hour smallint NOT NULL,
measurement smallint NOT NULL
);
INSERT INTO analytics.measurement_table (hour, measurement)
VALUES ( 0, 1),
( 1, 1), ( 1, 1),
(10, 2), (10, 3), (10, 5);
Here are the goal results for the query. I'm using 12 hours to keep the example results shorter.
Hour Count sum
0 1 1
1 2 2
2 0 0
3 0 0
4 0 0
5 0 0
6 0 0
7 0 0
8 0 0
9 0 0
10 3 10
11 0 0
12 0 0
This works correctly:
WITH hour_series AS (
select * from generate_series (0,12) AS hour
)
SELECT hour_series.hour,
count(measurement_table.hour) AS frequency,
COALESCE(sum(measurement_table.measurement), 0) AS total
FROM hour_series
LEFT JOIN measurement_table ON (measurement_table.hour = hour_series.hour)
GROUP BY 1
ORDER BY 1
This returns misleading 1's on the match:
WITH hour_series AS (
select * from generate_series (0,12) AS hour
)
SELECT hour_series.hour,
count(*) AS frequency,
COALESCE(sum(measurement_table.measurement), 0) AS total
FROM hour_series
LEFT JOIN measurement_table ON (hour_series.hour = measurement_table.hour)
GROUP BY 1
ORDER BY 1
0 1 1
1 2 2
2 1 0
3 1 0
4 1 0
5 1 0
6 1 0
7 1 0
8 1 0
9 1 0
10 3 10
11 1 0
12 1 0
The only difference between these two examples is the count term:
count(*) -- A result of 1 on no match, and a correct count otherwise.
count(joined to table field) -- 0 on no match, correct count otherwise.
That seems to be it: you've got to make it explicit that you're counting the data table. Otherwise, you get a count of 1, since the series row matches once. Is this a nuance of joining, or a nuance of count in Postgres?
Does this impact any other aggregate? It seems like it shouldn't.
P.S. generate_series is just about the best thing ever.
You figured out the problem correctly: count() behaves differently depending on the argument it is given.
count(*) counts how many rows belong to the group. This just cannot be 0 since there is always at least one row in a group (otherwise, there would be no group).
On the other hand, when given a column name or expression as its argument, count() takes into account any non-null value and ignores null values. For your query, this lets you distinguish groups that have no match in the left-joined table from groups where there are matches.
Note that this behavior is not Postgres-specific; it belongs to the ANSI SQL standard (every database I know of conforms to it).
Bottom line:
in general cases, use count(*); it is more efficient, since the database does not need to check for nulls (and it makes clear to the reader of the query that you just want to know how many rows belong to the group)
in specific cases such as yours, put the relevant expression in the count()
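Since the behavior is standard, it can be reproduced outside Postgres too. Here is a quick sketch in Python against SQLite (which has no generate_series by default, so a small hour_series table stands in for it, trimmed to four buckets):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE hour_series (hour INTEGER);
INSERT INTO hour_series VALUES (0),(1),(2),(10);
CREATE TABLE measurement_table (hour INTEGER, measurement INTEGER);
INSERT INTO measurement_table VALUES (0,1),(1,1),(1,1),(10,2),(10,3),(10,5);
""")
rows = conn.execute("""
SELECT hs.hour,
       COUNT(*)       AS star,    -- never 0: every group has at least one row
       COUNT(mt.hour) AS matched  -- 0 when the LEFT JOIN found no match
FROM hour_series hs
LEFT JOIN measurement_table mt ON mt.hour = hs.hour
GROUP BY hs.hour
ORDER BY hs.hour
""").fetchall()
print(rows)
```

Hour 2 has no measurements, so it comes back as (2, 1, 0): count(*) reports the single all-NULL joined row, while count(mt.hour) correctly reports 0.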

Give first duplicate a 1 and the rest 0

I have data which contains 1000+ lines, including errors people make. I have added an extra column and would like to find all duplicate Rev Names, give the first one a 1, and all remaining duplicates a 0. When there is no duplicate, it should be a 1. The outcome should look like this:
RevName ErrorCount Duplicate
Rev5588 23 1
Rev5588 67 0
Rev5588 7 0
Rev5588 45 0
Rev7895 6 1
Rev9065 4 1
Rev5588 1 1
I have tried CASE WHEN, but it's not giving the first one a 1; it's giving them all zeros.
Thanks guys, I am pulling out my hair here trying to get this done.
You could use a case expression over the row_number window function:
SELECT RevName,
       ErrorCount,
       CASE ROW_NUMBER() OVER (PARTITION BY RevName
                               ORDER BY (SELECT 1))
            WHEN 1 THEN 1 ELSE 0 END AS Duplicate
FROM mytable
SQL tables represent unordered sets. There is no "first" of anything, unless a column specifies the ordering.
Your logic suggests lag():
select t.*,
(case when lag(revname) over (order by ??) = revname then 0
else 1
end) as is_duplicate
from t;
The ?? is for the column that specifies the ordering.
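A runnable sketch of the row_number variant (SQLite via Python; ErrorCount descending is an arbitrary but explicit choice for the ordering, since the question does not say which duplicate counts as "first"):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE mytable (revname TEXT, errorcount INTEGER);
INSERT INTO mytable VALUES
  ('Rev5588', 23), ('Rev5588', 67), ('Rev5588', 7),
  ('Rev7895', 6), ('Rev9065', 4);
""")
# Row 1 of each revname partition gets 1, every later duplicate gets 0.
rows = conn.execute("""
SELECT revname, errorcount,
       CASE ROW_NUMBER() OVER (PARTITION BY revname
                               ORDER BY errorcount DESC)
            WHEN 1 THEN 1 ELSE 0 END AS duplicate
FROM mytable
ORDER BY revname, errorcount DESC
""").fetchall()
print(rows)
```

Singleton revnames (Rev7895, Rev9065) are row 1 of their own partition, so they get a 1 automatically, which matches the "no duplicate should be a 1" requirement.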

Converting Column Headers to Row elements

I have 2 tables I am combining, and that works, but I think I designed the second table wrong, as I have a column for each item of what really is a multiple-choice question. The query is this:
select Count(n.ID) as MemCount,
       u.Pay1Click, u.PayMailCC, u.PayMailCheck, u.PayPhoneACH, u.PayPhoneCC, u.PayWuFoo
from name as n
inner join UD_Demo_ORG as u on n.ID = u.ID
where n.MEMBER_TYPE like 'ORG_%' and n.CATEGORY not like '%_2'
  and (u.Pay1Click = '1' or u.PayMailCC = '1' or u.PayMailCheck = '1'
       or u.PayPhoneACH = '1' or u.PayPhoneCC = '1' or u.PayWuFoo = '1')
group by u.Pay1Click, u.PayMailCC, u.PayMailCheck, u.PayPhoneACH, u.PayPhoneCC, u.PayWuFoo
The results come up like this:
Count Pay1Click PayMailCC PayMailCheck PayPhoneACH PayPhoneCC PayWuFoo
8 0 0 0 0 0 1
25 0 0 0 0 1 0
8 0 0 0 1 0 0
99 0 0 1 0 0 0
11 0 1 0 0 0 0
So the question is, how can I get this to 2 columns, Count and then the headers of the next 6 headers so the results look like this:
Count PaymentType
8 PayWuFoo
25 PayPhoneCC
8 PayPhoneACH
99 PayMailCheck
11 PayMailCC
Thanks.
Try this one:
SELECT Count,
       CASE WHEN Pay1Click = 1 THEN 'Pay1Click'
            WHEN PayMailCC = 1 THEN 'PayMailCC'
            WHEN PayMailCheck = 1 THEN 'PayMailCheck'
            WHEN PayPhoneACH = 1 THEN 'PayPhoneACH'
            WHEN PayPhoneCC = 1 THEN 'PayPhoneCC'
            WHEN PayWuFoo = 1 THEN 'PayWuFoo'
       END AS PaymentType
FROM ......
I think you did indeed make a mistake in the structure of the second table. Instead of creating a column for each alternative of the multiple-choice question, I would suggest transforming all those columns into a single 'answer' column, so the actual name of the chosen alternative becomes the record in that column.
But for this, you have to change the structure of your tables and the way they are populated: you should get the name of the alternative checked and put it into your table.
Beyond that, watch out for repetitive data in your table: writing the same string over and over again makes the table grow larger.
If there is other information tied to the answer, other fields in the UD_Demo_ORG table, then you can normalize the table by creating a payment_dimension table or something like it, giving your alternatives an ID such as:
ID PaymentType OtherInfo(description, etc)...
1 PayWuFoo ...
2 PayPhoneCC ...
3 PayPhoneACH ...
4 PayMailCheck ...
5 PayMailCC ...
This is called a dimension table, and then in your records, you would have the ID of the payment type, and not the information you don't need.
So instead of a big result set, maybe you could simplify by much your query and have just
Count PaymentId
8 1
25 2
8 3
99 4
11 5
as a result set. It would make the query faster too, and if you need other information, you can then join the table and get it.
BUT if the only field you would have is the name, perhaps you could use the PaymentType itself as the "id" in this case; just consider it. It is more scalable if you separate it into a dimension table.
Some references for further reading:
http://beginnersbook.com/2015/05/normalization-in-dbms/ "Normalization in DBMS"
http://searchdatamanagement.techtarget.com/answer/What-are-the-differences-between-fact-tables-and-dimension-tables-in-star-schemas "Differences between fact tables and dimensions tables"
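A minimal sketch of the CASE-based unpivot from the first answer (SQLite via Python; only three of the six payment columns are included to keep it short, and the sample rows are invented for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE ud_demo_org (id INTEGER, pay1click INTEGER,
                          paymailcc INTEGER, paymailcheck INTEGER);
INSERT INTO ud_demo_org VALUES
  (1, 0, 1, 0), (2, 0, 1, 0), (3, 1, 0, 0), (4, 0, 0, 1);
""")
# Collapse the one-column-per-choice layout into a single label column,
# then count members per label.
rows = conn.execute("""
SELECT COUNT(*) AS memcount,
       CASE WHEN pay1click = 1 THEN 'Pay1Click'
            WHEN paymailcc = 1 THEN 'PayMailCC'
            WHEN paymailcheck = 1 THEN 'PayMailCheck'
       END AS paymenttype
FROM ud_demo_org
GROUP BY paymenttype
ORDER BY paymenttype
""").fetchall()
print(rows)
```

Note that a row with more than one flag set would be counted only under the first WHEN branch that matches; that is exactly the ambiguity the dimension-table design in the second answer avoids.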

Separation of data rows based on a flag and timestamp

I want to delete/ignore/separate logs that are useful from logs that are not useful. Logs that are useful occur before or at a time that is known by a flag. Logs that are not useful occur after the first flagged log.
Data looks like this. Each UID seen at time t:
UID t flag PCP
'0000' 1 0 0
'0000' 2 1 0
'0000' 3 1 0
'0000' 4 0 0
'1111' 11 1 0
'1111' 12 0 0
'1111' 13 0 0
'2222' 1 0 0
'2222' 2 0 0
'2222' 3 0 0
Is there a query to input a 0/1 value in PCP so I can get
UID t flag PCP
'0000' 1 0 1
'0000' 2 1 1
'0000' 3 1 0
'0000' 4 0 0
'1111' 11 1 1
'1111' 12 0 0
'1111' 13 0 0
'2222' 1 0 0
'2222' 2 0 0
'2222' 3 0 0
Note: in actuality flag is in {0,1,2}, and I want PCP to reflect flag = 2, so an incremental sum() won't work.
Edit: this question is similar (different goal, and I don't know SQL well enough to get the output I want from it): Flag dates occurring after an event within individuals
Another edit: in SQLite you can compare strings and ints in >/= operations, and I think in SQL you cannot. My table is all text, but comparing with integers has been going well enough, and the question above breaks because of typing in SQL. See http://sqlfiddle.com/#!3/00448/3
I'm basing this answer off of the SQL Fiddle you posted. If UserID and PCP are actual TEXT datatypes, then this should work. If they are actually varchar, then you can replace the LIKE with an = sign.
You simply need to use an EXISTS clause to look for any record with the same userID that has conversiontagid = 2 and check the time:
UPDATE logs
SET PCP = '1'
WHERE EXISTS (
    SELECT 1
    FROM logs sub
    WHERE logs.userid LIKE sub.userid
      AND sub.conversiontagid = 2
      AND sub.t >= logs.t
)
I made some assumptions using your SQL Fiddle because it's not exactly clear from your question above. But userID 4 has three records that all occurred at the same time, so I assumed that all three should have PCP set to 1.
Here is the SQL Fiddle showing the same query used in a select statement instead of an update statement.
SQL Fiddle Example
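Here is a sketch of an alternative that reproduces the exact desired output from the question (SQLite via Python): a row is useful only if it is at or before the first flagged row for its UID, found with MIN(t). The sample uses flag = 1 as in the question's data; against the real table the test would be flag = 2:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE logs (uid TEXT, t INTEGER, flag INTEGER, pcp INTEGER DEFAULT 0);
INSERT INTO logs (uid, t, flag) VALUES
  ('0000', 1, 0), ('0000', 2, 1), ('0000', 3, 1), ('0000', 4, 0),
  ('1111', 11, 1), ('1111', 12, 0), ('1111', 13, 0),
  ('2222', 1, 0), ('2222', 2, 0), ('2222', 3, 0);
""")
# Mark pcp = 1 only for rows at or before the FIRST flagged row per uid;
# uids with no flagged row at all keep pcp = 0 (the MIN is NULL, so the
# comparison never matches).
conn.execute("""
UPDATE logs SET pcp = 1
WHERE t <= (SELECT MIN(sub.t) FROM logs sub
            WHERE sub.uid = logs.uid AND sub.flag = 1)
""")
rows = conn.execute("SELECT uid, t, pcp FROM logs ORDER BY uid, t").fetchall()
print(rows)
```

Using MIN(t) rather than a plain EXISTS with t >= is what keeps the second flagged row ('0000' at t=3) out of the useful set, matching the question's desired output.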

How to set variable to 0 instead of 1 for every nth row

New to Stack Overflow and SQL, so please be gentle. I'm attempting to create a stored proc that will change a variable's value from 1 to 0 for every nth row of the identity column in a table:
SET #randomBit = IF(SELECT ID FROM [dbo].[js_xxxx]
WHERE ID%5 AND ID > 0) THEN SET #randomBit = 0 ELSE 1 END
The #randomBit value is to be used to set a bit field, which will then be combined with other fields; I then use a WHILE to loop 50 times and insert into a table.
Below would be the output:
Uni
1
1
1
1
0
1
1
SELECT id,
       CASE WHEN id % 5 = 0 THEN 0 ELSE 1 END AS [RandomBit]
FROM [dbo].[js_xxxx]
This will give you every 5th ID as 0.
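A quick check of that expression (SQLite via Python, with the CASE oriented so every 5th row gets 0, as the question's sample output shows):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE js_xxxx (id INTEGER PRIMARY KEY);
INSERT INTO js_xxxx (id) VALUES (1),(2),(3),(4),(5),(6),(7);
""")
# Every row whose id is a multiple of 5 gets 0; all others get 1.
rows = conn.execute("""
SELECT id, CASE WHEN id % 5 = 0 THEN 0 ELSE 1 END AS randombit
FROM js_xxxx ORDER BY id
""").fetchall()
bits = [bit for _, bit in rows]
print(bits)
```

Computing the bit from the identity column like this avoids the variable-and-loop approach entirely: one set-based SELECT replaces the 50-iteration WHILE.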