I want to delete/ignore/separate logs that are useful from logs that are not useful. Logs that are useful occur before or at a time that is known by a flag. Logs that are not useful occur after the first flagged log.
Data looks like this. Each UID seen at time t:
UID t flag PCP
'0000' 1 0 0
'0000' 2 1 0
'0000' 3 1 0
'0000' 4 0 0
'1111' 11 1 0
'1111' 12 0 0
'1111' 13 0 0
'2222' 1 0 0
'2222' 2 0 0
'2222' 3 0 0
Is there a query to fill in a 0/1 value for PCP so I can get
UID t flag PCP
'0000' 1 0 1
'0000' 2 1 1
'0000' 3 1 0
'0000' 4 0 0
'1111' 11 1 1
'1111' 12 0 0
'1111' 13 0 0
'2222' 1 0 0
'2222' 2 0 0
'2222' 3 0 0
Note: in actuality flag is in {0,1,2} and I want PCP to reflect flag = 2, so an incremental sum() won't work.
Edit: this question is similar (different goal, and I'm not good enough at SQL to know how to get the output I want from it): Flag dates occurring after an event within individuals
Another edit: in SQLite you can compare strings and ints with >/= operations, and I think in other SQL dialects you cannot. My table is all text, but comparing with integers has been going well enough, and the question above breaks because of typing in SQL. See http://sqlfiddle.com/#!3/00448/3
I'm basing this answer on the SQL Fiddle you posted. If UserID and PCP are actual TEXT datatypes, then this should work. If they are actually varchar, then you can replace the LIKE with an = sign.
You simply need to use an EXISTS clause to look for any record with the same userID that has a conversiontagid = 2, and check the time:
UPDATE logs
SET PCP = '1'
WHERE EXISTS (
    SELECT 1
    FROM logs sub
    WHERE logs.userid LIKE sub.userid
      AND sub.conversiontagid = 2
      AND sub.t >= logs.t
)
I made some assumptions using your SQL fiddle because it's not exactly clear based on your question above. But userID 4 has three records that all occurred at the same time, so I assumed that they should all three have PCP set to 1.
Here is the SQL Fiddle showing the same query used in a select statement instead of an update statement.
SQL Fiddle Example
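For anyone without a fiddle handy, here is a minimal runnable sketch of the idea, using Python's sqlite3 module (an assumption for portability) and the question's own sample data and column names. It marks rows at or before the *first* flagged time per UID, which is what the expected output in the question shows; with the real 0/1/2 flag, `flag = 1` in the subquery becomes `flag = 2`.

```python
import sqlite3

# Sample data from the question: each UID seen at time t, with a 0/1 flag.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE logs (uid TEXT, t INTEGER, flag INTEGER, pcp INTEGER DEFAULT 0);
INSERT INTO logs VALUES
  ('0000', 1, 0, 0), ('0000', 2, 1, 0), ('0000', 3, 1, 0), ('0000', 4, 0, 0),
  ('1111', 11, 1, 0), ('1111', 12, 0, 0), ('1111', 13, 0, 0),
  ('2222', 1, 0, 0), ('2222', 2, 0, 0), ('2222', 3, 0, 0);
""")

# Set pcp = 1 for every row at or before the UID's first flagged time.
conn.execute("""
UPDATE logs
SET pcp = 1
WHERE t <= (SELECT MIN(sub.t) FROM logs sub
            WHERE sub.uid = logs.uid AND sub.flag = 1)
""")

for row in conn.execute("SELECT uid, t, flag, pcp FROM logs ORDER BY uid, t"):
    print(row)
```

Note that UIDs with no flagged log at all (like '2222') are left untouched, because `MIN(sub.t)` returns NULL for them and `t <= NULL` is never true.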
Related
I've run into a subtlety around count(*) and join, and am hoping to get confirmation that I've figured out what's going on correctly. For background, we commonly convert continuous timeline data into discrete bins, such as hours. And since we don't want gaps for bins with no content, we'll use generate_series to synthesize the buckets we want values for. If there's no entry for, say, 10AM, fine, we still get a result. However, I noticed that I'm sometimes getting 1 instead of 0. Here's what I'm trying to confirm:
The count is 1 if you count the "grid" series, and 0 if you count the data table.
This only has to do with count, and no other aggregate.
The code below sets up some sample data to show what I'm talking about:
DROP TABLE IF EXISTS measurement_table CASCADE;
CREATE TABLE IF NOT EXISTS measurement_table (
    hour smallint NOT NULL,
    measurement smallint NOT NULL
);
INSERT INTO measurement_table (hour, measurement)
VALUES ( 0, 1),
( 1, 1), ( 1, 1),
(10, 2), (10, 3), (10, 5);
Here are the goal results for the query. I'm using 12 hours to keep the example results shorter.
Hour Count sum
0 1 1
1 2 2
2 0 0
3 0 0
4 0 0
5 0 0
6 0 0
7 0 0
8 0 0
9 0 0
10 3 10
11 0 0
12 0 0
This works correctly:
WITH hour_series AS (
select * from generate_series (0,12) AS hour
)
SELECT hour_series.hour,
count(measurement_table.hour) AS frequency,
COALESCE(sum(measurement_table.measurement), 0) AS total
FROM hour_series
LEFT JOIN measurement_table ON (measurement_table.hour = hour_series.hour)
GROUP BY 1
ORDER BY 1
This returns misleading 1's on the match:
WITH hour_series AS (
select * from generate_series (0,12) AS hour
)
SELECT hour_series.hour,
count(*) AS frequency,
COALESCE(sum(measurement_table.measurement), 0) AS total
FROM hour_series
LEFT JOIN measurement_table ON (hour_series.hour = measurement_table.hour)
GROUP BY 1
ORDER BY 1
0 1 1
1 2 2
2 1 0
3 1 0
4 1 0
5 1 0
6 1 0
7 1 0
8 1 0
9 1 0
10 3 10
11 1 0
12 1 0
The only difference between these two examples is the count term:
count(*) -- A result of 1 on no match, and a correct count otherwise.
count(joined to table field) -- 0 on no match, correct count otherwise.
That seems to be it: you've got to make it explicit that you're counting the data table. Otherwise, you get a count of 1, since the series row matches once. Is this a nuance of joining, or a nuance of count in Postgres?
Does this impact any other aggregate? It seems like it shouldn't.
P.S. generate_series is just about the best thing ever.
You figured out the problem correctly: count() behaves differently depending on the argument it is given.
count(*) counts how many rows belong to the group. This just cannot be 0 since there is always at least one row in a group (otherwise, there would be no group).
On the other hand, when given a column name or expression as argument, count() takes into account any non-null value and ignores null values. For your query, this lets you distinguish groups that have no match in the left-joined table from groups where there are matches.
Note that this behavior is not Postgres-specific; it belongs to the ANSI SQL standard (all databases that I know of conform to it).
Bottom line:
In general, use count(*); it is more efficient, since the database does not need to check for nulls (and it makes clear to the reader of the query that you just want to know how many rows belong to the group)
In specific cases such as yours, put the relevant column or expression inside count()
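To make the distinction easy to replay, here is a small runnable sketch. It uses Python's sqlite3 rather than Postgres (an assumption for portability), with a recursive CTE standing in for generate_series(0, 12); the count() behavior is the same, since, as noted, it is standard SQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE measurement_table (hour INTEGER NOT NULL, measurement INTEGER NOT NULL);
INSERT INTO measurement_table (hour, measurement)
VALUES (0, 1), (1, 1), (1, 1), (10, 2), (10, 3), (10, 5);
""")

# Same query with two different count() arguments.
query = """
WITH RECURSIVE hour_series(hour) AS (
    SELECT 0 UNION ALL SELECT hour + 1 FROM hour_series WHERE hour < 12
)
SELECT hour_series.hour,
       COUNT({count_arg})                              AS frequency,
       COALESCE(SUM(measurement_table.measurement), 0) AS total
FROM hour_series
LEFT JOIN measurement_table ON measurement_table.hour = hour_series.hour
GROUP BY hour_series.hour
ORDER BY hour_series.hour
"""

# count(column): NULLs from the unmatched side are ignored -> 0 for empty bins.
good = list(conn.execute(query.format(count_arg="measurement_table.hour")))
# count(*): every group has at least one row -> misleading 1 for empty bins.
bad = list(conn.execute(query.format(count_arg="*")))

print(good[2])  # hour 2, no data: (2, 0, 0)
print(bad[2])   # hour 2, no data: (2, 1, 0)
```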
I have 2 tables I am combining, and that works, but I think I designed the second table wrong: I have a column for each item of what is really a multiple-choice question. The query is this:
select Count(n.ID) as MemCount,
       u.Pay1Click, u.PayMailCC, u.PayMailCheck, u.PayPhoneACH, u.PayPhoneCC, u.PayWuFoo
from name as n
inner join UD_Demo_ORG as u on n.ID = u.ID
where n.MEMBER_TYPE like 'ORG_%' and n.CATEGORY not like '%_2'
  and (u.Pay1Click = '1' or u.PayMailCC = '1' or u.PayMailCheck = '1'
       or u.PayPhoneACH = '1' or u.PayPhoneCC = '1' or u.PayWuFoo = '1')
group by u.Pay1Click, u.PayMailCC, u.PayMailCheck, u.PayPhoneACH, u.PayPhoneCC, u.PayWuFoo
The results come up like this:
Count Pay1Click PayMailCC PayMailCheck PayPhoneACH PayPhoneCC PayWuFoo
8 0 0 0 0 0 1
25 0 0 0 0 1 0
8 0 0 0 1 0 0
99 0 0 1 0 0 0
11 0 1 0 0 0 0
So the question is, how can I get this to 2 columns, Count and then the headers of the next 6 headers so the results look like this:
Count PaymentType
8 PayWuFoo
25 PayPhoneCC
8 PayPhoneACH
99 PayMailCheck
11 PayMailCC
Thanks.
Try this one
Select Count,
       CASE WHEN Pay1Click = 1 THEN 'Pay1Click'
            WHEN PayMailCC = 1 THEN 'PayMailCC'
            WHEN PayMailCheck = 1 THEN 'PayMailCheck'
            WHEN PayPhoneACH = 1 THEN 'PayPhoneACH'
            WHEN PayPhoneCC = 1 THEN 'PayPhoneCC'
            WHEN PayWuFoo = 1 THEN 'PayWuFoo'
       END as PaymentType
FROM ......
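As a sketch of how the CASE-based "unpivot" behaves, here is a runnable version in SQLite via Python. The payments table and its three rows are invented for illustration, and the name/UD_Demo_ORG join from the original query is elided; only the six 0/1 columns come from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE payments (Pay1Click INT, PayMailCC INT, PayMailCheck INT,
                       PayPhoneACH INT, PayPhoneCC INT, PayWuFoo INT);
-- Mock data: two WuFoo payers and one phone-CC payer.
INSERT INTO payments VALUES (0,0,0,0,0,1), (0,0,0,0,0,1), (0,0,0,0,1,0);
""")

# Collapse the six 0/1 columns into one labelled PaymentType column,
# then count members per payment type.
rows = list(conn.execute("""
SELECT COUNT(*) AS MemCount,
       CASE WHEN Pay1Click    = 1 THEN 'Pay1Click'
            WHEN PayMailCC    = 1 THEN 'PayMailCC'
            WHEN PayMailCheck = 1 THEN 'PayMailCheck'
            WHEN PayPhoneACH  = 1 THEN 'PayPhoneACH'
            WHEN PayPhoneCC   = 1 THEN 'PayPhoneCC'
            WHEN PayWuFoo     = 1 THEN 'PayWuFoo'
       END AS PaymentType
FROM payments
GROUP BY PaymentType
"""))
print(sorted(rows))
```

Note that a CASE expression returns the first matching WHEN branch, so a row that somehow had two columns set to 1 would be counted only under the first one listed.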
I think indeed you made a mistake in the structure of the second table. Instead of creating a column for each alternative of the multiple-choice question, I would suggest transforming all those columns into a single 'answer' column, so you would have the actual name of the alternative as the record in that column.
But for this, you have to change the structure of your tables and the way they are populated: you should get the name of the alternative checked and put it into your table.
Beyond this, you should watch out for repetitive data in your table, since writing the same string over and over again could make your table grow larger.
If there are other things tied to the answer (other information in the UD_Demo_ORG table), then you can normalize the table, creating a payment_dimension table or something like it, and give your alternatives an ID such as
ID PaymentType OtherInfo(description, etc)...
1 PayWuFoo ...
2 PayPhoneCC ...
3 PayPhoneACH ...
4 PayMailCheck ...
5 PayMailCC ...
This is called a dimension table, and then in your records, you would have the ID of the payment type, and not the information you don't need.
So instead of a big result set, maybe you could simplify your query considerably and have just
Count PaymentId
8 1
25 2
8 3
99 4
11 5
as a result set. It would make the query faster too, and if you need other information, you can then join the table and get it.
But if the only field you would have is the name, perhaps you could use the PaymentType itself as the "id" in this case... just consider it. Separating it into a dimension table is what scales.
Some references for further reading:
http://beginnersbook.com/2015/05/normalization-in-dbms/ "Normalization in DBMS"
http://searchdatamanagement.techtarget.com/answer/What-are-the-differences-between-fact-tables-and-dimension-tables-in-star-schemas "Differences between fact tables and dimensions tables"
How do I output the data stored in my table so that, within each account number, rows with the same CPT are grouped together, while the ones that do not match fall to the bottom of the list for that account?
I have one table: select * from CPTCounts, and this is what it displays.
Format (relevant fields only):
account OriginalCPT Count ModifiedCPT Count
11 0 71010 1
11 71010 1 0
2 0 71010 1
2 0 71020 9
2 0 73130 1
2 0 77800 1
2 71010 1 0
2 71020 8 0
2 73130 1 0
2 73610 1 0
2 G0202 4 0
31 99010 1 0
31 0 99010 4
31 0 99700 2
What I want the results to be grouped like is below, displayed like this or similar.
Account OriginalCPT Count ModifiedCPT Count
11 71010 1 71010 1
2 71010 1 71010 1
2 71020 8 0
2 73130 1 0
2 73610 1 0
2 G0202 4 0
31 99010 1 99010 4
31 0 99700 2
I have one table with the values above;
Select * from #CPTCounts
The grouping I am looking for pairs OriginalCPT = ModifiedCPT; sometimes I will not have a value on one side or the other, but most of the time I will have a match. I would like to place all of the unmatched ones at the bottom of each account.
Any suggestions?
I was thinking of creating a second table and joining the two on the account, but how do I return each value?
select cpt1.account, cpt1.originalCPT, cpt1.count, cpt2.modifiedcpt, cpt2.count
from #cptcounts cpt1
join #cptcounts cpt2 on cpt1.account = cpt2.account
but am having trouble with that solution.
I'm not sure I have an exact solution, but perhaps some food for thought at least. The fact that you need either the "original" or the "modified" set of columns makes me think that you need a full outer join rather than a left join. You don't mention which database you are using. In MySQL, for example, full joins can be emulated by means of a union of a left and a right join, as in the following:
select cpt1.account, cpt1.originalCPT, cpt1.countO, cpt2.modifiedcpt, cpt2.countM
from CPTCounts cpt1
left outer join CPTCounts cpt2
on cpt1.account = cpt2.account
and cpt1.originalCPT=cpt2.modifiedCPT
where cpt1.account is not null
and (cpt1.originalCPT is not null or cpt2.modifiedCPT is not null)
union
select cpt1.account, cpt1.originalCPT, cpt1.countO, cpt2.modifiedcpt, cpt2.countM
from CPTCounts cpt1
right outer join CPTCounts cpt2
on cpt1.account = cpt2.account
and cpt1.originalCPT=cpt2.modifiedCPT
where cpt2.account is not null
and (cpt1.originalCPT is not null or cpt2.modifiedCPT is not null)
order by originalCPT, modifiedCPT, account
The ordering brings the non-matching rows to the top, but that seemed a lesser problem than getting the matching to work.
(Your output data is a bit confusing, because the CPT 71020, for example, occurs in both original and modified columns, but you haven't shown it as one of the matching ones in your result set. I'm presuming this is because it is just an example... but if I'm wrong, then I am missing some part of your intention.)
You can play around in this SQL Fiddle.
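Here is a hedged, runnable sketch of the same left-join-UNION-right-join idea, written for SQLite via Python (an assumption; the right join is expressed as a left join with the tables swapped, since older SQLite versions lack RIGHT and FULL JOIN). The five sample rows are a simplified subset of the question's data, with NULL standing in for the missing side.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE CPTCounts (account INT, originalCPT TEXT, countO INT,
                        modifiedCPT TEXT, countM INT);
INSERT INTO CPTCounts VALUES
  (11, '71010', 1, NULL, NULL),
  (11, NULL, NULL, '71010', 1),
  (31, '99010', 1, NULL, NULL),
  (31, NULL, NULL, '99010', 4),
  (31, NULL, NULL, '99700', 2);
""")

# Full outer join emulated as: left join UNION (left join with roles swapped).
# ORDER BY 2 DESC exploits the fact that SQLite sorts NULLs last under DESC,
# pushing unmatched rows to the bottom of each account.
rows = list(conn.execute("""
SELECT cpt1.account, cpt1.originalCPT, cpt1.countO, cpt2.modifiedCPT, cpt2.countM
FROM CPTCounts cpt1
LEFT JOIN CPTCounts cpt2
  ON cpt1.account = cpt2.account AND cpt1.originalCPT = cpt2.modifiedCPT
WHERE cpt1.originalCPT IS NOT NULL
UNION
SELECT cpt2.account, cpt1.originalCPT, cpt1.countO, cpt2.modifiedCPT, cpt2.countM
FROM CPTCounts cpt2
LEFT JOIN CPTCounts cpt1
  ON cpt1.account = cpt2.account AND cpt1.originalCPT = cpt2.modifiedCPT
WHERE cpt2.modifiedCPT IS NOT NULL
ORDER BY 1, 2 DESC
"""))
for row in rows:
    print(row)
```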
I have an update statement which is designed to manage "jobs". I process one job first whilst the others wait in line until the first one is complete, after which they will be sent to process (far faster, as I can leverage calcs from the first run).
To quote my own comment:
-- Set job status to 2 for where there is saved results file
-- Where there is no saved results file:
-- --Set job status to 2 for one instance of each unique param combination
-- --Set job status to 8 for all others
PseudoCode:
UPDATE #Jobs
SET [JobStatusId] = CASE
WHEN ISNULL([PreCalculated].[FileCount], 0) > 0 THEN 2
WHEN ISNULL([PreCalculated].[FileCount], 0) = 0 AND [GroupedOrder] = 1 THEN 2
ELSE 8 END
FROM #NewJobsGrouped [NewJobsGrouped]
LEFT JOIN (
SELECT COUNT([Id]) [FileCount],
[ResultsFile].[ParamsId]
FROM #ResultsFile [ResultsFile]
WHERE [ResultsFile].[IsActive] = 1
GROUP BY [ResultsFile].[ParamsId]
) [PreCalculated]
ON [PreCalculated].[ParamsId] = [NewJobsGrouped].[ParamsId]
where #NewJobsGrouped looks like:
Job ID || GroupedOrder || ParamsId
1460 1 807
1461 2 807
1462 3 807
This does not work. Every job is being set to status 2. However:
SELECT CASE
WHEN ISNULL([PreCalculated].[FileCount], 0) > 0 THEN 2
WHEN ISNULL([PreCalculated].[FileCount], 0) = 0 AND [GroupedOrder] = 1 THEN 2
ELSE 8 END [JobStatusId]
etc
Works exactly as I am expecting.
Why would these two CASE expressions give different results? Is there something obvious I am missing? I honestly can't explain what I'm seeing. I could probably use another temp table to hold the output from the SELECT and do a simpler update, but I'd like to understand what's going on.
I am using DB2 SQL. I have a table named ITEMATRIX which looks like this:
ITNBR LBRCST MFGOH STDUC YRMNT
RM-013 0 0 499.6 2010-02
H-178 0 0 164.5 2010-02
FP9-003 0 0 6 2010-02
FP9-059 0 0 2 2010-02
A94-103B-M 0 0 0 2010-02
140-07-1012C 0 0 10 2010-05
140-07-1012C 0 0 0 2010-06
and after the update it should become:
ITNBR LBRCST MFGOH STDUC YRMNT
140-07-1012C 0 0 10 2010-05
140-07-1012C 0 0 **10** 2010-06
etc etc......
I want to update the STDUC field, where the value is 0 or null, to the value present for the nearest month. Let's say for ITNBR 140-07-1012C the STDUC is 0 for 2010-05; then first I have to find if that item number has a standard cost in the year 2010 for any month, and if yes, copy the value of the last month, which is 2010-04. There are many records with the same item number, which I am transposing later. Can anyone give me some ideas of how to go about this?
Thanks
Varun
You can use the LAG() function if using DB2 9.7: https://www.ibm.com/developerworks/mydeveloperworks/blogs/dbtrue/entry/lag_lead_first_value_last_value_totally_new_in_db21?lang=en
I think the following is the logic you want:
update itematrix
set stduc = (select stduc
from itematrix im2
where im2.stduc <> 0 and im2.stduc is not null and im2.itnbr = itematrix.itnbr
order by yrmnt desc
fetch first 1 row only
)
where stduc is null or stduc = 0
I haven't tested this in DB2. There may be a problem using fetch first on the same table as the one used in the update. I find the documentation ambiguous.
If you don't need an actual update, but just want to see the value, then try:
select itematrix.*,
(select stduc
from itematrix im2
where im2.stduc <> 0 and im2.stduc is not null and im2.itnbr = itematrix.itnbr
order by yrmnt desc
fetch first 1 row only
)
from itematrix
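Since the FETCH FIRST question is hard to settle without a DB2 instance, here is the same correlated-subquery pattern sketched in SQLite via Python (an assumption; LIMIT 1 plays the role of DB2's FETCH FIRST 1 ROW ONLY). The three sample rows are taken from the question's table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE itematrix (itnbr TEXT, lbrcst REAL, mfgoh REAL, stduc REAL, yrmnt TEXT);
INSERT INTO itematrix VALUES
  ('140-07-1012C', 0, 0, 10,    '2010-05'),
  ('140-07-1012C', 0, 0, 0,     '2010-06'),
  ('RM-013',       0, 0, 499.6, '2010-02');
""")

# For each zero/null STDUC, copy the latest non-zero STDUC
# recorded for the same item number.
conn.execute("""
UPDATE itematrix
SET stduc = (SELECT im2.stduc
             FROM itematrix im2
             WHERE im2.stduc <> 0 AND im2.stduc IS NOT NULL
               AND im2.itnbr = itematrix.itnbr
             ORDER BY im2.yrmnt DESC
             LIMIT 1)
WHERE stduc IS NULL OR stduc = 0
""")

for row in conn.execute("SELECT itnbr, stduc, yrmnt FROM itematrix ORDER BY itnbr, yrmnt"):
    print(row)
```

In SQLite the correlated subquery on the table being updated runs without complaint, and the 2010-06 row for 140-07-1012C picks up the 10 from 2010-05, matching the question's desired output.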
I shifted the process to Excel and could do it easily using IF and VLOOKUP on single cost values, starting from the earliest month, comparing each month's value with the previous month's and changing accordingly, and then creating the UPDATE statements using &.