Split row in two based on date range of following row? - sql

I have a view that updates based on changes in the underlying data. There are about 20 columns; most are static, but the boolean columns change. Changes can be reverted in the underlying system, so a boolean column can go TRUE->FALSE->TRUE while the other columns remain the same.
We capture the status every day (hashing and comparing). If nothing has changed, we update the datetime field ExportDate to the current timestamp. If the data has changed, ExportDate stays the same and a new row is inserted with LoadDate set to the current datetime.
The problem: if the same boolean column is changed back (TRUE->FALSE->TRUE), the 3rd hash is the same as the 1st hash, so the ExportDate of the 1st row is updated with the current datetime and no new row is created, even though one should be. Is there any way to use the current (faulty) view and modify it to correctly show the distinct changes? The example below is for one item (a chainsaw).
Serial_No | Reserved | In_Stock | Blocked | Disposed | LoadDate                | ExportDate
245586    | TRUE     | TRUE     | FALSE   | FALSE    | 2022-06-01 04:28:51.587 | 2022-06-02 02:57:00.000
245586    | FALSE    | TRUE     | FALSE   | FALSE    | 2022-06-03 04:33:05.452 | 2023-01-16 03:54:00.000
245586    | TRUE     | TRUE     | FALSE   | FALSE    | 2022-07-05 04:33:32.551 | 2022-12-22 03:53:00.000
So in essence, can I create a 4th row with LoadDate = 2022-12-23 hh:mm:ss.sss and ExportDate = 2023-01-16 03:54:00.000, and modify the 2nd row so that LoadDate = 2022-06-03 04:33:05.452 and ExportDate = 2022-07-04 hh:mm:ss.sss? The result would look like this:
Serial_No | Reserved | In_Stock | Blocked | Disposed | LoadDate                | ExportDate
245586    | TRUE     | TRUE     | FALSE   | FALSE    | 2022-06-01 04:28:51.587 | 2022-06-02 02:57:00.000
245586    | FALSE    | TRUE     | FALSE   | FALSE    | 2022-06-03 04:33:05.452 | 2022-07-04 hh:mm:ss.sss
245586    | TRUE     | TRUE     | FALSE   | FALSE    | 2022-07-05 04:33:32.551 | 2022-12-22 03:53:00.000
245586    | FALSE    | TRUE     | FALSE   | FALSE    | 2022-12-23 hh:mm:ss.sss | 2023-01-16 03:54:00.000
Is there any possibility to actually compare and alter data in the way that I'm looking for? I'm using Snowflake.

It's a bit messy, but with the LEAD() and LAG() functions you can work out the logic to pick the right dates. LEAST() and COALESCE() then ensure we don't skip anything; that part reduces readability, which you can squarely blame on your source system.
For generating the new record, I'd do this in two steps: 1) fix the records you have, then 2) write a separate query to generate the new row (using similar techniques) and union them together.
Nasty, nasty ... good luck :-)
WITH CTE AS (
    SELECT 245586 SERIAL_NO, TRUE RESERVED, TRUE INSTOCK, FALSE BLOCKED, FALSE DISPOSED, '2022-06-01 04:28:51.587'::TIMESTAMP LOAD_DATE, '2022-06-02 02:57:00.000'::TIMESTAMP EXPORT_DATE
    UNION ALL SELECT 245586 SERIAL_NO, FALSE RESERVED, TRUE INSTOCK, FALSE BLOCKED, FALSE DISPOSED, '2022-06-03 04:33:05.452'::TIMESTAMP LOAD_DATE, '2023-01-16 03:54:00.000'::TIMESTAMP EXPORT_DATE
    UNION ALL SELECT 245586 SERIAL_NO, TRUE RESERVED, TRUE INSTOCK, FALSE BLOCKED, FALSE DISPOSED, '2022-07-05 04:33:32.551'::TIMESTAMP LOAD_DATE, '2022-12-22 03:53:00.000'::TIMESTAMP EXPORT_DATE
)
SELECT *
    -- Cap each export date at the next row's load date; the last row keeps its own export date
    , LEAST(COALESCE(LEAD(LOAD_DATE) OVER (PARTITION BY SERIAL_NO ORDER BY LOAD_DATE), EXPORT_DATE), EXPORT_DATE) AS FIXED_EXPORT
    -- Export date of the previous row when ordered by EXPORT_DATE
    , LAG(EXPORT_DATE) OVER (PARTITION BY SERIAL_NO ORDER BY EXPORT_DATE) AS PREV_EXPORT
FROM CTE;
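To make the two steps easier to follow outside of SQL, here is a rough Python sketch of the same idea on the three sample rows. Step 1 mirrors the LEAD/COALESCE/LEAST expression. Step 2 is only my assumption about how the extra row could be derived: the interrupted state is re-opened at the export date of the last row that interrupted it, because the real reload time is not recorded anywhere. The helper names (ts, fix_exports, resumed_rows) are made up for this illustration, and it handles a single Serial_No only.

```python
from datetime import datetime


def ts(text):
    """Parse a timestamp in the format used in the question."""
    return datetime.strptime(text, "%Y-%m-%d %H:%M:%S.%f")


# The three rows from the question (only the Reserved flag differs).
ROWS = [
    {"reserved": True, "load": ts("2022-06-01 04:28:51.587"), "export": ts("2022-06-02 02:57:00.000")},
    {"reserved": False, "load": ts("2022-06-03 04:33:05.452"), "export": ts("2023-01-16 03:54:00.000")},
    {"reserved": True, "load": ts("2022-07-05 04:33:32.551"), "export": ts("2022-12-22 03:53:00.000")},
]


def fix_exports(rows):
    """Step 1: cap each export date at the next row's load date (LEAD + LEAST)."""
    ordered = sorted(rows, key=lambda r: r["load"])
    fixed = []
    for i, row in enumerate(ordered):
        has_next = i + 1 < len(ordered)
        # COALESCE: the last row has no successor, so fall back to its own export date.
        next_load = ordered[i + 1]["load"] if has_next else row["export"]
        fixed.append({**row, "fixed_export": min(next_load, row["export"])})
    return fixed


def resumed_rows(fixed):
    """Step 2 (assumption): re-open a state once the rows that interrupted it end."""
    extra = []
    for i, row in enumerate(fixed):
        if row["fixed_export"] < row["export"]:
            # Rows loaded later than this one but before its original export date.
            interrupting = [r for r in fixed[i + 1:] if r["load"] < row["export"]]
            resume_at = max(r["export"] for r in interrupting)
            if resume_at < row["export"]:
                extra.append({**row, "load": resume_at, "fixed_export": row["export"]})
    return extra


fixed = fix_exports(ROWS)
extra = resumed_rows(fixed)
for r in sorted(fixed + extra, key=lambda r: r["load"]):
    print(r["reserved"], r["load"], r["fixed_export"])
```

Note that the capped export date of the 2nd row comes out as the 3rd row's load date (2022-07-05 04:33:32.551), not the day before as in the desired output; subtract whatever offset suits you if the ranges must not touch.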

Related

Branches coverage in Mockk Kotlin

I am using the mockk library in Kotlin and trying to get full branch coverage. I am new to testing. Can someone tell me how to cover all branches? The example below has two objects: id, which is a String, and another one (name hidden) which is a List. Thanks
Can someone explain what the 12 branches are for this?
I can only count 9, but I might be missing something:
!id.isNullOrEmpty() | Reason                       | !xxx.isNullOrEmpty() | Reason
TRUE                | id is not null and not empty | TRUE                 | xxx is not null and not empty
TRUE                | id is not null and not empty | FALSE                | xxx is empty
TRUE                | id is not null and not empty | FALSE                | xxx is null
FALSE               | id is null                   | TRUE                 | xxx is not null and not empty
FALSE               | id is empty                  | TRUE                 | xxx is not null and not empty
FALSE               | id is null                   | FALSE                | xxx is null
FALSE               | id is null                   | FALSE                | xxx is empty
FALSE               | id is empty                  | FALSE                | xxx is null
FALSE               | id is empty                  | FALSE                | xxx is empty
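The count of 9 can be checked mechanically. The sketch below is in Python rather than Kotlin and does not explain the 12 branches the coverage tool reports; it only enumerates the three states (null, empty, non-empty) of each value. The function not_null_or_empty is a hypothetical stand-in for Kotlin's !x.isNullOrEmpty(), and the sample values are arbitrary.

```python
from itertools import product


def not_null_or_empty(value):
    """Python stand-in for Kotlin's !value.isNullOrEmpty()."""
    return value is not None and len(value) > 0


# Three possible states for the String id and for the hidden List.
ID_STATES = [None, "", "abc"]
LIST_STATES = [None, [], ["item"]]

# One (id check, list check) pair per input combination.
cases = [
    (not_null_or_empty(i), not_null_or_empty(x))
    for i, x in product(ID_STATES, LIST_STATES)
]

print(len(cases))                 # number of input combinations
print(cases.count((True, True)))  # combinations where both checks pass
```

Each entry in cases corresponds to one row of the table above, so a test per entry would exercise every combination listed there.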

In SQL, how do I count the number of rows showing every boolean setup possibility?

I currently have this dataset, intended to show every line as a potential boolean combination:
Column_A | Column_B
True     | True
True     | False
False    | True
False    | False
I want this:
Column_A | Column_B | Column_C
True     | True     | 100
True     | False    | 50
False    | True     | 40
False    | False    | 10
Where Column_C is a COUNT(*) FROM the initial table, with conditions.
The thing is, if I use a subquery like
(SELECT COUNT(*) FROM table WHERE condition) as Column_C
I just get the total count regardless of the boolean combinations.
I get this:
Column_A | Column_B | Column_C
True     | True     | 200
True     | False    | 200
False    | True     | 200
False    | False    | 200
Any idea on how to solve this?
Thanks in advance!
I think you want aggregation:
select a, b, count(*)
from t
group by a, b;
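To try the aggregation without a database server, here is a small sketch using Python's built-in sqlite3 module. The table name t and columns a, b follow the query above; the sample rows are invented, and booleans are stored as 0/1 because SQLite has no separate boolean type.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (a INTEGER, b INTEGER)")

# Invented sample data: 3 x (True, True), 2 x (True, False), 1 each of the rest.
sample = [(1, 1)] * 3 + [(1, 0)] * 2 + [(0, 1)] + [(0, 0)]
conn.executemany("INSERT INTO t VALUES (?, ?)", sample)

# One output row per distinct (a, b) combination, with its own count.
counts = conn.execute(
    "SELECT a, b, COUNT(*) FROM t GROUP BY a, b ORDER BY a DESC, b DESC"
).fetchall()

for a, b, n in counts:
    print(bool(a), bool(b), n)
```

Keep in mind that GROUP BY only returns combinations that actually occur in the data; a combination with no rows will simply be missing rather than shown with a count of 0.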

Calculated Column to Determine if Value Appears in Column of Any Row Filtered on Non-Unique ID

I have a set of data in PowerPivot that has non-unique IDs on which I group items for reporting purposes. Each row is a unique item called a task and multiple tasks may be associated with one item called a review. Each task may require action. As such, the table looks something like this (without the ReviewAction column):
TaskID  Action  ReviewID  ReviewAction
--------------------------------------
1       True    1         True
2       False   1         True
3       False   2         False
4       True    3         True
5       False   4         False
6       False   4         False
7       False   5         True
8       True    5         True
9       False   5         True
Is there a way to produce ReviewAction as a calculated column (Display True if any tasks associated with a review require action)? For example, Review 1 contains Tasks 1 and 2. Task 1 requires action so ReviewAction is set to True for any row associated with Review 1. Likewise, Review 5 contains Tasks 7, 8, and 9. Only Task 8 requires action, but I want ReviewAction to display True for all rows associated with Review 5.
I have used the following function to count if Review IDs are duplicates, and if so, how many duplicates there are:
=CALCULATE(COUNTROWS('TableName'), FILTER('TableName', [ReviewID]=EARLIER([ReviewID])))
I haven't been able to figure out a way to use this same filtering technique to produce the ReviewAction column, however.
The reason I'm trying to produce this column is so that I can create a chart that counts the Review items (just with a distinct count) and includes a slicer to filter by reviews that require action or not. In order to create a slicer, I need that "ReviewAction" value to exist as a column.
Hi Jimmy,
Very interesting case, I was dealing with something similar and found an elegant (not sure how performance-smart) way to do this using X-functions. In this case, MAXX will help:
=
MAXX (
    FILTER ( 'Table', [Review ID] = EARLIER ( [Review ID] ) ),
    INT ( [Action] )
)
For each row, it takes the sub-table of rows that share the same Review ID and calculates the maximum for that group, using the integer value of the True/False column. Then, just wrap it inside an IF clause:
=
IF (
    MAXX (
        FILTER ( 'Table', [Review ID] = EARLIER ( [Review ID] ) ),
        INT ( [Action] )
    ) = 1,
    TRUE (),
    FALSE ()
)
And you will get what you need. The source Excel 2013 file can be downloaded here.
Hope this helps.
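If it helps to see the logic outside of DAX, this is roughly what the MAXX/FILTER/EARLIER combination computes, sketched in Python on the sample rows from the question. The function name review_action and the tuple layout are assumptions made for this illustration only.

```python
# (TaskID, Action, ReviewID) rows taken from the table in the question.
TASKS = [
    (1, True, 1), (2, False, 1), (3, False, 2),
    (4, True, 3), (5, False, 4), (6, False, 4),
    (7, False, 5), (8, True, 5), (9, False, 5),
]


def review_action(tasks):
    """Return one ReviewAction flag per task row."""
    # Equivalent of MAXX over the rows sharing a ReviewID: max of INT(Action).
    max_by_review = {}
    for _task_id, action, review_id in tasks:
        current = max_by_review.get(review_id, 0)
        max_by_review[review_id] = max(current, int(action))
    # Equivalent of the IF wrapper: a maximum of 1 means some task needs action.
    return [max_by_review[review_id] == 1 for _t, _a, review_id in tasks]


flags = review_action(TASKS)
for (task_id, _action, review_id), flag in zip(TASKS, flags):
    print(task_id, review_id, flag)
```

The resulting flags match the ReviewAction column shown in the question.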

Get the "OR" result of all rows of a bit column

I have a table like this:
ID | read  | edit  | delete
1  | true  | false | false
2  | false | false | true
3  | true  | false | true
4  | true  | false | false
I want to "OR" the rows and end up with a single row that contains the OR result for each column. Is there any way to do it without a for loop? What is the best way? (There may be many rows, and I think a for loop would be slow.)
You could just cast the bits to integers and use MAX to get the biggest value;
SELECT MAX(CAST([read] AS INT)) [read],
MAX(CAST([edit] AS INT)) [edit],
MAX(CAST([delete] AS INT)) [delete]
FROM mytable;
Try this:
select
cast(max(cast([read] as int)) as bit) as [overall_read],
cast(max(cast([edit] as int)) as bit) as [overall_edit],
cast(max(cast([delete] as int)) as bit) as [overall_delete]
from tbl
a OR b is true when at least one of a or b is true, and false otherwise. So you can reduce this directly to taking the maximum value of each column, as @Joachim has also pointed out.
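A quick way to convince yourself that MAX behaves like OR here is to run it on the sample table. The sketch below uses Python's built-in sqlite3 module instead of SQL Server, so the bit columns are plain 0/1 integers and no CAST is needed; the table name mytable is taken from the first answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Column names are quoted because "delete" is an SQL keyword.
conn.execute(
    'CREATE TABLE mytable (id INTEGER, "read" INTEGER, "edit" INTEGER, "delete" INTEGER)'
)

# The four rows from the question, with true/false stored as 1/0.
rows = [(1, 1, 0, 0), (2, 0, 0, 1), (3, 1, 0, 1), (4, 1, 0, 0)]
conn.executemany("INSERT INTO mytable VALUES (?, ?, ?, ?)", rows)

# MAX over 0/1 values is 1 as soon as any row holds a 1, which is exactly OR.
result = conn.execute(
    'SELECT MAX("read"), MAX("edit"), MAX("delete") FROM mytable'
).fetchone()

print(result)
```

For the sample data this gives read = 1, edit = 0 and delete = 1, done in a single set-based pass with no loop.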

Return true if all column values are true

Is there a faster way in PostgreSQL to essentially do an if on several rows?
Say I have a table
ticket | row | archived
1 | 1 | true
1 | 2 | true
1 | 3 | true
2 | 1 | false
2 | 2 | true
Is there any way I could do an if statement down the whole column where ticket = ?
So that where ticket = 1 would be true because
true && true && true = true
and where ticket = 2 would be false because
false && true = false
Or should I just stick with
SELECT ( (SELECT COUNT(*) FROM table WHERE ticket = 1)
= (SELECT COUNT(*) FROM table WHERE ticket = 1 AND archived = true) )
Aggregate function bool_and()
Simple, short, clear:
SELECT bool_and(archived)
FROM tbl
WHERE ticket = 1;
The manual:
true if all input values are true, otherwise false
Subquery expression EXISTS
Assuming archived is defined NOT NULL. Faster, but you have to additionally check whether any rows with ticket = 1 exist at all, or you'll get incorrect results for non-existing tickets:
SELECT EXISTS (SELECT FROM tbl WHERE ticket=1)
AND NOT
EXISTS (SELECT FROM tbl WHERE ticket=1 AND NOT archived);
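The difference between the two forms for a ticket that has no rows can be seen in a small experiment. This sketch uses Python's built-in sqlite3 module, not PostgreSQL: SQLite has no bool_and(), so MIN over 0/1 values stands in for it here (an assumption of this illustration), and the helper names all_archived_min and all_archived_exists are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl (ticket INTEGER, row_no INTEGER, archived INTEGER NOT NULL)")
# Sample rows from the question, with true/false stored as 1/0.
conn.executemany(
    "INSERT INTO tbl VALUES (?, ?, ?)",
    [(1, 1, 1), (1, 2, 1), (1, 3, 1), (2, 1, 0), (2, 2, 1)],
)


def all_archived_min(ticket):
    """Aggregate form: MIN over 0/1 stands in for bool_and; None when no rows match."""
    value = conn.execute(
        "SELECT MIN(archived) FROM tbl WHERE ticket = ?", (ticket,)
    ).fetchone()[0]
    return None if value is None else bool(value)


def all_archived_exists(ticket):
    """EXISTS form: the ticket must exist and must have no unarchived row."""
    sql = """
        SELECT EXISTS (SELECT 1 FROM tbl WHERE ticket = :t)
           AND NOT EXISTS (SELECT 1 FROM tbl WHERE ticket = :t AND NOT archived)
    """
    return bool(conn.execute(sql, {"t": ticket}).fetchone()[0])


for ticket in (1, 2, 99):
    print(ticket, all_archived_min(ticket), all_archived_exists(ticket))
```

Ticket 99 does not exist: the aggregate form yields NULL (None in Python), while the EXISTS form yields false thanks to the extra existence check.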
Indices
Both forms can use an index like:
CREATE INDEX tbl_ticket_idx ON tbl (ticket);
.. which makes both fast, but the EXISTS query faster, because this form can stop scanning as soon as the first matching row is found. It hardly matters for only a few rows per ticket, but it matters for many.
To make use of index-only scans you need a multi-column index of the form:
CREATE INDEX tbl_ticket_archived_idx ON tbl (ticket, archived);
This one is better in most cases and any version of PostgreSQL. Due to data alignment, adding a boolean to the integer in the index will not make the index grow at all. Added benefit for hardly any cost.
Update: this changes in Postgres 13 with index deduplication. See:
Is a composite index also good for queries on the first field?
However, indexed columns prevent HOT (Heap Only Tuple) updates. Say, an UPDATE changes only the column archived. If the column isn't used by any index (in any way), the row can be HOT updated. Else, this shortcut cannot be taken. More on HOT updates:
Redundant data in update statements
It all depends on your actual workload.
How about something like:
select not exists (select 1 from table where ticket=1 and not archived)
I think this might be advantageous over comparing the counts, as a count may or may not use an index and really all you need to know is if any FALSE rows exist for that ticket. I think just creating a partial index on ticket could be incredibly fast.
select not false = any (
select archived
from foo
where ticket = 1
)