Postgresql query to check string break

Postgresql query to check string break - sql

I have a problem to perform a search for numerical sequence break in postgresql, I need that, for example, if my seq column is out of sequence I will be notified by email, for example my sequence is: 340,341,342 if it is: 340 342,343 this with a sequence failure.

You can use next query to find anomalies:
select count(*) from (
select id - (lag(id) over (order by id)) diff from tbl
) t where diff is not null and diff <> 1;
When result of the query not 0 anomaly found
SQL online editor

Related

Sequence generation based on a conditon using Informatica

I need to achieve following data transformation using Informatica,
The first picture is the sample input data
The data after transformations should look like as below,
Here for the different type I have the sequence staring from 200 in this case bfd should be considered as one group and (klm,kln) together as other group. The new id is id+the sequence number
Should I do this using UDF or by creating a procedure.. or using some sets of transformations..
I am new to informatica and am confused about what approach should I follow..
Thanks for the help in advance!

you can do this using two separate sequence transformation.
first one will start from 0 and increment by 1.
second one will start from 200 and increment by 1.
Then use an IIF condition to generate new_id column. if you have few different type of sequence, you can do it with IIF. But if you have hundreds, probably you need to use some UDF etc.
seq_1 = attach NEXTVAL from sequence generator 1
seq_2 = attach NEXTVAL from sequence generator 2
new_id = TO_INTEGER ( IIF( group ='bfd', id || seq_1,
IIF( group = 'klm' or group ='kln', id || seq_2)
)
)

You can write a query to do this:
select t.*,
(case when group = 'bfd'
then id * 10 + row_number() over (partition by group order by id)
when group in ('klm', 'kln')
then id * 100 + row_number() over (partition by case when group in ('klm', 'kln') then 1 else 2 end order by id)
end) as new_id
from t;

Sql -after group by I need to take rows with newest date

I need to write a query in sql and I can't do it correctly. I have a table with 7 columns 1st_num, 2nd_num, 3rd_num, opening_Date, Amount, code, cancel_Flag.
For every 1st_num, 2nd_num, 3rd_num I want to take only the record with the min (cancel_flag), and if there's more then 1 row so take the the newest opening Date.
But when I do group by and choose min and max for the relevant fields, I get a mix of the rows, for example:
1. 12,130,45678,2015-01-01,2005,333,0
2. 12,130,45678,2015-01-09,105,313,0
The result will be
:12,130,45678,2015-01-09,2005,333,0
and that mixes the rows into one
Microsoft sql server 2008 . using ssis by visual studio 2008
my code is :
SELECT
1st_num,
2nd_num,
3rd_num,
MAX(opening_date),
MAX (Amount),
code,
MIN(cancel_flag)
FROM do. tablename
GROUP BY
1st_num,
2nd_num,
3rd_num,
code
HAVING COUNT(*) > 1
How do I take the row with the max date or.min cancel flag as it is without mixing values?
I can't really post my code because of security reasons but I'm sure you can help.
thank you,
Oren

It is very difficult like this to answer, because every DBMS has different syntax.
Anyways, for most dbms this should work. Using row_number() function to rank the rows, and take only the first one by our definition (all your conditions):
SELECT * FROM (
SELECT t.*,
ROW_NUMBER() OVER ( PARTITION BY t.1st_num,t.2nd_num,t.3rd_num order by t.cancel_flag asc,t.opening_date desc) as row_num
FROM YourTable t ) as tableTempName
WHERE row_num = 1

Use NOT EXISTS to return a row as long as no other row with same 1st_num, 2nd_num, 3rd_num has a lower cancel_flag value, or same cancel_flag but a higher opening_Date.
select *
from tablename t1
where not exists (select 1 from tablename t2
where t2.1st_num = t1.1st_num
and t2.2nd_num = t1.2nd_num
and t2.3rd_num = t1.3rd_num
and (t2.cancel_flag < t1.cancel_flag
or (t2.cancel_flag = t1.cancel_flag and
t2.opening_Date > t1.opening_Date)))
Core ANSI SQL-99, expected to work with (almost) any dbms.

Select finishes where athlete didn't finish first for the past 3 events

Suppose I have a database of athletic meeting results with a schema as follows
DATE,NAME,FINISH_POS
I wish to do a query to select all rows where an athlete has competed in at least three events without winning. For example with the following sample data
2013-06-22,Johnson,2
2013-06-21,Johnson,1
2013-06-20,Johnson,4
2013-06-19,Johnson,2
2013-06-18,Johnson,3
2013-06-17,Johnson,4
2013-06-16,Johnson,3
2013-06-15,Johnson,1
The following rows:
2013-06-20,Johnson,4
2013-06-19,Johnson,2
Would be matched. I have only managed to get started at the following stub:
select date,name FROM table WHERE ...;
I've been trying to wrap my head around the where clause but I can't even get a start

I think this can be even simpler / faster:
SELECT day, place, athlete
FROM (
SELECT *, min(place) OVER (PARTITION BY athlete
ORDER BY day
ROWS 3 PRECEDING) AS best
FROM t
) sub
WHERE best > 1
->SQLfiddle
Uses the aggregate function min() as window function to get the minimum place of the last three rows plus the current one.
The then trivial check for "no win" (best > 1) has to be done on the next query level since window functions are applied after the WHERE clause. So you need at least one CTE of sub-select for a condition on the result of a window function.
Details about window function calls in the manual here. In particular:
If frame_end is omitted it defaults to CURRENT ROW.
If place (finishing_pos) can be NULL, use this instead:
WHERE best IS DISTINCT FROM 1
min() ignores NULL values, but if all rows in the frame are NULL, the result is NULL.
Don't use type names and reserved words as identifiers, I substituted day for your date.
This assumes at most 1 competition per day, else you have to define how to deal with peers in the time line or use timestamp instead of date.
#Craig already mentioned the index to make this fast.

Here's an alternative formulation that does the work in two scans without subqueries:
SELECT
"date", athlete, place
FROM (
SELECT
"date",
place,
athlete,
1 <> ALL (array_agg(place) OVER w) AS include_row
FROM Table1
WINDOW w AS (PARTITION BY athlete ORDER BY "date" ASC ROWS BETWEEN 3 PRECEDING AND CURRENT ROW)
) AS history
WHERE include_row;
See: http://sqlfiddle.com/#!1/fa3a4/34
The logic here is pretty much a literal translation of the question. Get the last four placements - current and the previous 3 - and return any rows in which the athlete didn't finish first in any of them.
Because the window frame is the only place where the number of rows of history to consider is defined, you can parameterise this variant unlike my previous effort (obsolete, http://sqlfiddle.com/#!1/fa3a4/31), so it works for the last n for any n. It's also a lot more efficient than the last try.
I'd be really interested in the relative efficiency of this vs #Andomar's query when executed on a dataset of non-trivial size. They're pretty much exactly the same on this tiny dataset. An index on Table1(athlete, "date") would be required for this to perform optimally on a large data set.

; with CTE as
(
select row_number() over (partition by athlete order by date) rn
, *
from Table1
)
select *
from CTE cur
where not exists
(
select *
from CTE prev
where prev.place = 1
and prev.athlete = cur.athlete
and prev.rn between cur.rn - 3 and cur.rn
)
Live example at SQL Fiddle.

SQL Server : how to select a fixed amount of rows (select every x-th value)

A short description: I have a table with data that is updated over a certain time period. Now the problem is, that - depending on the nature of the sensor which sends the data - in this time period there could be either 50 data sets or 50.000. As I want to visualize this data (using ASP.NET / c#), for a first preview I would like to SELECT just 1000 values from the table.
I already have an approach doing this: I count the rows in the time period of interest, with a simple "where" clause to specify the sensor-id, save it as a variable in SQL, and then divide the count() by 1000. I've tried it in MS Access, where it works just fine:
set #divider = select count(*) from table where [...]
SELECT (Int([RowNumber]/#divider)), First(Value)
FROM myTable
GROUP BY (Int([RowNumber]/#divider));
The trick in Access was, that I simply have a data field ("RowNumber"), which is my PK/ID, and goes from 0 up. I tried to accomplish that in SQL Server using the ROW_NUMBER() method, which works more or less. I've got the right syntax for the method, but I can not use the GROUP BY statement
Windowed functions can only appear in the SELECT or ORDER BY
clauses.
meaning ROW_NUMBER() can't be in the GROUP BY statement.
Now I'm kinda stuck. I've tried to save the ROW_NUMBER value into a char or a separate column, and GROUP BY it later on, but I couldn't get it done. And somehow I start to think, that my strategy might have its weaknesses ...? :/
To clarify once more: I don't need to SELECT TOP 1000 from my table, because this would just mean that I select the first 1000 values (depending on the sorting). I need to SELECT every x-th value, while I can compute the x (and I could even round it to an INT, if that would help to get it done). I hope I was able to describe the problem understandable ...
This is my first post here on StackOverflow, I hope I didn't forget anything essential or important, if you need any further information (table structure, my queries so far, ...) please don't hesitate to ask. Any help or hint is highly appreciated - thanks in advance! :)
Update: SOLUTION! Big thanks to https://stackoverflow.com/users/52598/lieven!!!
Here is how I did it in the end:
I declare 2 variables - I count my rows and SET it into the first var. Then I use ROUND() on the just assigned variable, and divide it by 1000 (because in the end I want ABOUT 1000 values!). I split this operation into 2 variables, because if I used the value from the COUNT function as basis for my ROUND operation, there were some mistakes.
declare #myvar decimal(10,2)
declare #myvar2 decimal(10,2)
set #myvar = (select COUNT(*)
from value_table
where channelid=135 and myDate >= '2011-01-14 22:00:00.000' and myDate <= '2011-02-14 22:00:00.000'
)
set #myvar2 = ROUND(#myvar/1000, 0)
Now I have the rounded value, which I want to be my step-size (take every x-th value -> this is our "x" ;)) stored in #myvar2. Next I will subselect the data of the desired timespan and channel, and add the ROW_NUMBER() as column "rn", and finally add a WHERE-clause to the outer SELECT, where I divide the ROW_NUMBER through #myvar2 - when the modulus is 0, the row will be SELECTed.
select * from
(
select (ROW_NUMBER() over (order by id desc)) as rn, myValue, myDate
from value_table
where channel_id=135 and myDate >= '2011-01-14 22:00:00.000' and myDate<= '2011-02-14 22:00:00.000'
) d
WHERE rn % #myvar2 = 0
Works like a charm - once again all my thanks to https://stackoverflow.com/users/52598/lieven, see the comment below for the original posting!

In essence, all you need to do to select the x-th value is retain all rows where the modulus of the rownumber divided by x is 0.
WHERE rn % #x_thValues = 0
Now to be able to use your ROW_NUMBER's result, you'll need to wrap the entire statement into in a subselect
SELECT *
FROM (
SELECT *
, rn = ROW_NUMBER() OVER (ORDER BY Value)
FROM DummyData
) d
WHERE rn % #x_thValues = 0
Combined with a variable to what x-th values you need, you might use something like this testscript
DECLARE #x_thValues INTEGER = 2
;WITH DummyData AS (SELECT * FROM (VALUES (1), (2), (3), (4)) v (Value))
SELECT *
FROM (
SELECT *
, rn = ROW_NUMBER() OVER (ORDER BY Value)
FROM DummyData
) d
WHERE rn % #x_thValues = 0

One more option to consider:
Select Top 1000 *
From dbo.SomeTable
Where ....
Order By NewID()
but to be honest- like the previous answer more than this one.
The question could be about performance..

how to select lines in Mysql while a condition lasts

I have something like this:
Name.....Value
A...........10
B............9
C............8
Meaning, the values are in descending order. I need to create a new table that will contain the values that make up 60% of the total values. So, this could be a pseudocode:
set Total = sum(value)
set counter = 0
foreach line from table OriginalTable do:
counter = counter + value
if counter > 0.6*Total then break
else insert line into FinalTable
end
As you can see, I'm parsing the sql lines here. I know this can be done using handlers, but I can't get it to work. So, any solution using handlers or something else creative will be great.
It should also be in a reasonable time complexity - the solution how to select values that sum up to 60% of the total
works, but it's slow as hell :(
Thanks!!!!

You'll likely need to use the lead() or lag() window function, possibly with a recursive query to merge the rows together. See this related question:
merge DATE-rows if episodes are in direct succession or overlapping
And in case you're using MySQL, you can work around the lack of window functions by using something like this:
Mysql query problem

I don't know which analytical functions SQL Server (which I assume you are using) supports; for Oracle, you could use something like:
select v.*,
cumulative/overall percent_current,
previous_cumulative/overall percent_previous from (
select
id,
name,
value,
cumulative,
lag(cumulative) over (order by id) as previous_cumulative,
overall
from (
select
id,
name,
value,
sum(value) over (order by id) as cumulative,
(select sum(value) from mytab) overall
from mytab
order by id)
) v
Explanation:
- sum(value) over ... computes a running total for the sum
- lag() gives you the value for the previous row
- you can then combine these to find the first row where percent_current > 0.6 and percent_previous < 0.6

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas

Postgresql query to check string break - sql

I have a problem to perform a search for numerical sequence break in postgresql, I need that, for example, if my seq column is out of sequence I will be notified by email, for example my sequence is: 340,341,342 if it is: 340 342,343 this with a sequence failure.

You can use next query to find anomalies: select count(*) from ( select id - (lag(id) over (order by id)) diff from tbl ) t where diff is not null and diff <> 1; When result of the query not 0 anomaly found SQL online editor

Related

Sequence generation based on a conditon using Informatica

Sql -after group by I need to take rows with newest date

Select finishes where athlete didn't finish first for the past 3 events

SQL Server : how to select a fixed amount of rows (select every x-th value)

how to select lines in Mysql while a condition lasts

Categories

Resources