How to compare ordered datasets with the dataset before? - sql

I have the following query:
select * from events order by Source, DateReceived
This gives me something like this:
I would like to get the rows I marked blue: those where two or more equal ErrorNr entries follow each other from the same Source.
So I have to compare every row with the row before it. How can I achieve that?
This is what I want to get:

Number the rows of your table per source with ROW_NUMBER() OVER (PARTITION BY ...):
SELECT
    ROW_NUMBER() OVER (PARTITION BY Source ORDER BY DateReceived) AS Row,
    *
FROM events
Either run a MAX(...) HAVING ... > 1 check on the result set's row number or, if you need the details, run the same query a second time with the row number reduced by 1.
Then join the two result sets on Source and the row numbers; wherever ErrorNr is the same on both sides, you have a hit.
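The join described above can be sketched as follows. This is only an illustration run through Python's sqlite3 module (it assumes a bundled SQLite of 3.25 or later for window functions); the sample rows are made up, and only the column names Source, DateReceived and ErrorNr come from the question.

```python
import sqlite3

# Hypothetical sample data; only the column names are taken from the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (Source TEXT, DateReceived TEXT, ErrorNr INTEGER)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?, ?)",
    [
        ("A", "2020-01-01", 1),
        ("A", "2020-01-02", 2),
        ("A", "2020-01-03", 2),  # repeats the previous ErrorNr of source A
        ("A", "2020-01-04", 3),
        ("B", "2020-01-01", 3),  # same ErrorNr as A's last row, but another source
        ("B", "2020-01-02", 4),
        ("B", "2020-01-03", 4),  # repeats the previous ErrorNr of source B
    ],
)

query = """
WITH numbered AS (
    SELECT Source, DateReceived, ErrorNr,
           ROW_NUMBER() OVER (PARTITION BY Source ORDER BY DateReceived) AS rn
    FROM events
)
SELECT cur.Source, cur.DateReceived, cur.ErrorNr
FROM numbered AS cur
JOIN numbered AS prev
  ON prev.Source = cur.Source      -- same source
 AND prev.rn = cur.rn - 1          -- the row directly before
 AND prev.ErrorNr = cur.ErrorNr    -- with the same error number
ORDER BY cur.Source, cur.DateReceived
"""
repeated = conn.execute(query).fetchall()
print(repeated)
```

As written, the query returns the second and later rows of each run; joining in the other direction as well (prev.rn = cur.rn + 1) would also pick up the first row of a run.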

You can use PARTITION BY as below.
select *
from (
    select *,
           row_number() over (partition by Source, ErrorNr order by DateReceived) as r
    from [yourtable]
) t
where r > 1
-- Note: r > 1 flags every repeat of an ErrorNr within a Source, whether or not the rows are adjacent.
You can specify your column names in the outer select.

How to get first row of 3 specific values of a column using Oracle SQL?

I have a table which has ID, FAMILY, ENV_XML_PATH and CREATED_DATE columns.
ID        FAMILY  ENV_XML_PATH  CREATED_DATE
15826841  CRM     path1.xml     03-09-22 6:50:34AM
15826856  SCM     path3.xml     03-10-22 7:12:20AM
15826786  IC      path4.xml     02-10-22 12:50:52AM
15825965  CRM     path5.xml     02-10-22 1:50:52AM
15653951  null    path6.xml     04-10-22 12:50:52AM
15826840  FIN     path7.xml     03-10-22 2:34:09AM
15826841  SCM     path8.xml     02-10-22 8:40:52AM
15223450  IC      path9.xml     03-09-22 5:34:09AM
15026853  SCM     path10.xml    05-10-22 4:40:59AM
Now there are 18 DISTINCT values in FAMILY column and each value has multiple rows associated (as you can see from the above image).
What I want is to get the first row of 3 specific values (CRM, SCM and IC) in FAMILY column.
Something like this:
ID        FAMILY  ENV_XML_PATH  CREATED_DATE
15826841  CRM     path1.xml     date1
15826856  SCM     path3.xml     date2
15826786  IC      path4.xml     date3
I am new to this. I understand the logic, but I am not sure how to implement it. Kindly help. Thanks.
You can use RANK for that. Something like this:
WITH groupedData AS (
    SELECT id, family, env_xml_path, created_date,
           RANK() OVER (PARTITION BY family ORDER BY id) AS r_num
    FROM yourtable
    GROUP BY id, family, env_xml_path, created_date
)
SELECT id, family, env_xml_path, created_date
FROM groupedData
WHERE r_num = 1
ORDER BY id;
Thus, within the first query, your data is partitioned by family and ranked by the column you want (in my example, by id).
After that, the second query takes only the first row of each family.
Add a WHERE clause to the first query if you need to apply further restrictions on the result set.
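One point worth checking when choosing the ORDER BY column is how ties are handled. The sketch below, run through Python's sqlite3 module with made-up rows (SQLite 3.25 or later assumed), suggests that RANK gives every tied row the rank 1, while ROW_NUMBER keeps exactly one row per family. The helper first_rows is hypothetical and exists only for this comparison.

```python
import sqlite3

# Hypothetical rows: ids 1 and 2 share the same created_date within CRM.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE yourtable (id INTEGER, family TEXT, created_date TEXT)")
conn.executemany(
    "INSERT INTO yourtable VALUES (?, ?, ?)",
    [
        (1, "CRM", "2022-10-02"),
        (2, "CRM", "2022-10-02"),  # ties with id 1 on created_date
        (3, "SCM", "2022-10-03"),
        (4, "SCM", "2022-10-05"),
    ],
)

def first_rows(window_function):
    # window_function is only ever one of the two fixed names used below.
    sql = f"""
        SELECT id, family
        FROM (
            SELECT id, family,
                   {window_function}() OVER (PARTITION BY family
                                             ORDER BY created_date) AS r_num
            FROM yourtable
        )
        WHERE r_num = 1
        ORDER BY id
    """
    return conn.execute(sql).fetchall()

with_rank = first_rows("RANK")              # both tied CRM rows get rank 1
with_row_number = first_rows("ROW_NUMBER")  # exactly one row per family
print(with_rank)
print(with_row_number)
```

With ROW_NUMBER, which of the tied rows is kept is not defined unless the ORDER BY includes a unique tie-breaker such as id.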
See here a working example: db<>fiddle
You could use a window function to number the rows of each family partition in created_date order, and then filter by the three families you are interested in:
with row_window as (
    select
        id,
        family,
        env_xml_path,
        created_date,
        row_number() over (partition by family order by created_date asc) as rn
    from <your_table>
    where family in ('CRM', 'SCM', 'IC')
)
select
    id,
    family,
    env_xml_path,
    created_date
from row_window
where rn = 1
Output:
ID        FAMILY  ENV_XML_PATH  CREATED_DATE
15826841  CRM     path1.xml     03-09-22 6:50:34
15826856  SCM     path3.xml     03-10-22 7:12:20
15826786  IC      path4.xml     02-10-22 12:50:52
The question doesn't really specify what 'first' means, but I assume it means the first to be added to the table, i.e. the row whose date is the oldest. Try this code:
SELECT DISTINCT * FROM (yourTable)
WHERE Family = 'CRM' OR Family = 'SCM' OR Family = 'IC'
ORDER BY Created_Date ASC
FETCH FIRST (number) ROWS ONLY;
What it does:
DISTINCT - removes rows that are exact duplicates of each other in every column. It does not limit the result to one row per family.
WHERE - checks whether a condition is true for each row.
OR - a row is kept when it matches any of the listed conditions, here any of the three family names.
ORDER BY - sorts the rows by the given column. If 'first' means the oldest, ordering by date with ASC puts the oldest (smallest) date at the top.
FETCH FIRST (number) ROWS ONLY - keeps only the first (number) rows of the sorted result. For example, FETCH FIRST 3 ROWS ONLY returns the 3 oldest matching rows overall, which is one row per family only if the three oldest rows happen to belong to three different families.
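A quick way to see what this kind of query returns is to run it on a few made-up rows. The sketch below uses Python's sqlite3 module; SQLite has no FETCH FIRST, so LIMIT stands in for it here, and the table name and data are assumptions for illustration only.

```python
import sqlite3

# Hypothetical rows: the two oldest rows both belong to the CRM family.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE yourTable (id INTEGER, family TEXT, created_date TEXT)")
conn.executemany(
    "INSERT INTO yourTable VALUES (?, ?, ?)",
    [
        (1, "CRM", "2022-10-01"),
        (2, "CRM", "2022-10-02"),
        (3, "SCM", "2022-10-03"),
        (4, "IC", "2022-10-04"),
        (5, "FIN", "2022-09-01"),  # filtered out by the WHERE clause
    ],
)

# LIMIT 3 plays the role of FETCH FIRST 3 ROWS ONLY.
query = """
SELECT DISTINCT *
FROM yourTable
WHERE family = 'CRM' OR family = 'SCM' OR family = 'IC'
ORDER BY created_date ASC
LIMIT 3
"""
families = [family for _, family, _ in conn.execute(query)]
print(families)
```

On this data the result contains CRM twice and no IC row, so a per-family row number is still needed when exactly one row per family is required.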

How to determine the order of the result from my postgres query?

I have the following query:
SELECT
time as "time",
case
when tag = 'KEB1.DB_BP.01.STATUS.SOC' THEN 'SOC'
when tag = 'KEB1.DB_BP.01.STATUS.SOH' THEN 'SOH'
end as "tag",
value as "value"
FROM metrics
WHERE
("time" BETWEEN '2021-07-02T10:39:47.266Z' AND '2021-07-09T10:39:47.266Z') AND
(container = '1234') AND
(tag = 'KEB1.DB_BP.01.STATUS.SOC' OR tag = 'KEB1.DB_BP.01.STATUS.SOH')
GROUP BY 1, 2, 3
ORDER BY time desc
LIMIT 2
This is giving me the result:
Sometimes the order of the result changes from SOH -> SOC or from SOC -> SOH. I'm trying to modify my query so that I always get SOH first and then SOC. How can I achieve this?
You have two times that are identical. The order by is only using time as a key. When the key values are identical, the resulting order for those keys is arbitrary and indeterminate. It can change from one execution to the next.
To prevent this, add an additional column to the order by so that each row's sort key is unique. In this case that would seem to be tag; sorting it in descending order puts SOH before SOC:
order by "time" desc, tag desc
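The effect of a tie-breaker column can be sketched on a tiny table. The example below runs through Python's sqlite3 module with made-up values; it assumes the newest rows are wanted (time descending, as in the question's query) and that tag is sorted descending so SOH comes before SOC.

```python
import sqlite3

# Hypothetical metrics: the two newest rows share one timestamp.
conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE metrics ("time" TEXT, tag TEXT, value REAL)')
conn.executemany(
    "INSERT INTO metrics VALUES (?, ?, ?)",
    [
        ("2021-07-09T10:00:00Z", "SOC", 80.0),
        ("2021-07-09T10:00:00Z", "SOH", 99.0),  # same timestamp as the SOC row
        ("2021-07-08T10:00:00Z", "SOC", 75.0),  # older row, cut off by LIMIT 2
    ],
)

# "time" alone leaves the order of the tied rows open; tag DESC settles it.
rows = conn.execute(
    'SELECT "time", tag, value FROM metrics '
    'ORDER BY "time" DESC, tag DESC LIMIT 2'
).fetchall()
tags = [tag for _, tag, _ in rows]
print(tags)
```

If the two newest rows can carry different timestamps, this ordering still sorts by time first, so wrapping the query in an outer ORDER BY on tag may be the better fit.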
You want to show the two most recent rows. In your example these have the same date/time but they can probably also differ. In order to find the two most recent rows you had to apply an ORDER BY clause.
You want to show the two rows in another order, however, so you must place an additional ORDER BY in your query. This is done by selecting from your query result (i.e. putting your query into a subquery):
select *
from ( <your query here> ) myquery
order by tag desc;
Try this:
order by 1 desc, 2 desc
(order by the first column descending, then by the second column descending, so that SOH comes before SOC)

How to implement lag function in teradata.

Input :
Output :
I want the output as shown in the image below.
In the output image, 4 in 'behind' is evaluated as tot_cnt-tot and the subsequent numbers in 'behind', for eg: 2 is evaluated as lag(behind)-tot & as long as the 'rank' remains same, even 'behind' should remain same.
Can anyone please help me implement this in teradata?
You appear to want:
select *, (select count(*)
from table t1
where t1.rank > t.rank
) as behind
from table t;
I would summarize the data and do:
select id, max(tot_cnt), max(tot),
(max(tot_cnt) -
sum(max(tot)) over (order by id rows between unbounded preceding and current row)
) as diff
from t
group by id;
This provides one row per id, which makes a lot more sense to me. If you want the original data rows (which are all duplicates anyway), you can join this back to your table.
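The running-total idea can be sketched with made-up numbers. The example below runs on SQLite through Python's sqlite3 module rather than on Teradata, and it splits the aggregation and the window step into a CTE for portability; the values for id, tot_cnt and tot are assumptions, since the question's data is only available as an image.

```python
import sqlite3

# Hypothetical data: duplicate detail rows per id, one shared tot_cnt.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER, tot_cnt INTEGER, tot INTEGER)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [
        (1, 10, 6), (1, 10, 6),  # duplicate rows for id 1
        (2, 10, 2),
        (3, 10, 1),
    ],
)

query = """
WITH per_id AS (
    -- collapse the duplicate rows to one row per id
    SELECT id, MAX(tot_cnt) AS tot_cnt, MAX(tot) AS tot
    FROM t
    GROUP BY id
)
SELECT id, tot_cnt, tot,
       tot_cnt - SUM(tot) OVER (ORDER BY id
                                ROWS BETWEEN UNBOUNDED PRECEDING
                                         AND CURRENT ROW) AS diff
FROM per_id
ORDER BY id
"""
result = conn.execute(query).fetchall()
print(result)
```

Each diff is the total count minus everything accounted for up to and including that id, which is the same as taking the previous diff and subtracting the current tot.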

How to Order only first 20 records in a resultset using SQL?

My requirement is to list Diagnosis entries ordered by how often they are used. To achieve that, I added a column named DiagnosisCounter to the tblDiagnosisMst table, which increases by 1 each time a user selects that Diagnosis. My query looks like this:
select DiagnosisID,DiagnosisCode,Name from tblDiagnosisMst
where GroupName = 'Common' and RecStatus = 'A' order by DiagnosisCounter desc,
Name asc
This query returns the list of Diagnosis entries in descending order of DiagnosisCounter and then alphabetically by Name. But now my client wants only the 20 most used Diagnosis names shown at the top, followed by all the other names in alphabetical order. Unfortunately I am stuck at this point. I would appreciate any advice on this problem.
This should do the trick:
;With Ordered as (
select DiagnosisID,DiagnosisCode,Name,
ROW_NUMBER() OVER (ORDER BY DiagnosisCounter desc) as rn
from tblDiagnosisMst
where GroupName = 'Common' and RecStatus = 'A'
)
select * from Ordered
order by CASE WHEN rn <= 20 THEN rn ELSE 21 END,
Name asc
We use ROW_NUMBER to assign the numbers 1-x to each of the rows, based on the DiagnosisCounter. We then use that value for the first ORDER BY condition if it's in 1-20, and all other rows sort equally in position 21. The second condition is then used as a tie-breaker to sort those remaining rows by name.
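The same ordering can be tried on a handful of rows. The sketch below runs through Python's sqlite3 module with made-up diagnosis names and a top_n of 2 instead of 20 so the effect is visible; top_n and the sample data are assumptions, and only the table and column names follow the question.

```python
import sqlite3

# Hypothetical diagnoses with usage counters.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tblDiagnosisMst (Name TEXT, DiagnosisCounter INTEGER)")
conn.executemany(
    "INSERT INTO tblDiagnosisMst VALUES (?, ?)",
    [("Cold", 10), ("Flu", 50), ("Zoster", 1), ("Asthma", 40), ("Bronchitis", 5)],
)

top_n = 2  # stands in for the 20 of the original query

query = """
WITH Ordered AS (
    SELECT Name,
           ROW_NUMBER() OVER (ORDER BY DiagnosisCounter DESC) AS rn
    FROM tblDiagnosisMst
)
SELECT Name
FROM Ordered
-- rows 1..top_n keep their rank; every other row shares position top_n + 1
ORDER BY CASE WHEN rn <= ? THEN rn ELSE ? END, Name ASC
"""
names = [name for (name,) in conn.execute(query, (top_n, top_n + 1))]
print(names)
```

The two most used names come first in usage order, and the remaining three follow alphabetically.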
Try this:
SELECT TOP 20 *
FROM tblDiagnosisMst
ORDER BY DiagnosisCounter DESC;

Select finishes where athlete didn't finish first for the past 3 events

Suppose I have a database of athletic meeting results with a schema as follows
DATE,NAME,FINISH_POS
I wish to do a query to select all rows where an athlete has competed in at least three events without winning. For example with the following sample data
2013-06-22,Johnson,2
2013-06-21,Johnson,1
2013-06-20,Johnson,4
2013-06-19,Johnson,2
2013-06-18,Johnson,3
2013-06-17,Johnson,4
2013-06-16,Johnson,3
2013-06-15,Johnson,1
The following rows:
2013-06-20,Johnson,4
2013-06-19,Johnson,2
Would be matched. I have only managed to get started at the following stub:
select date,name FROM table WHERE ...;
I've been trying to wrap my head around the WHERE clause, but I can't even get started.
I think this can be even simpler / faster:
SELECT day, place, athlete
FROM (
    SELECT *, min(place) OVER (PARTITION BY athlete
                               ORDER BY day
                               ROWS 3 PRECEDING) AS best
    FROM t
) sub
WHERE best > 1
->SQLfiddle
Uses the aggregate function min() as window function to get the minimum place of the last three rows plus the current one.
The then-trivial check for "no win" (best > 1) has to be done on the next query level, since window functions are applied after the WHERE clause. So you need at least one CTE or sub-select for a condition on the result of a window function.
Details about window function calls in the manual here. In particular:
If frame_end is omitted it defaults to CURRENT ROW.
If place (finishing_pos) can be NULL, use this instead:
WHERE best IS DISTINCT FROM 1
min() ignores NULL values, but if all rows in the frame are NULL, the result is NULL.
Don't use type names and reserved words as identifiers, I substituted day for your date.
This assumes at most 1 competition per day, else you have to define how to deal with peers in the time line or use timestamp instead of date.
@Craig already mentioned the index to make this fast.
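The window approach can be checked against the sample data from the question. The sketch below runs on SQLite through Python's sqlite3 module (3.25 or later assumed) rather than on Postgres, spells the frame out as BETWEEN 3 PRECEDING AND CURRENT ROW, and uses the column names day, athlete and place from the query above.

```python
import sqlite3

# Sample data from the question, with date renamed to day.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (day TEXT, athlete TEXT, place INTEGER)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [
        ("2013-06-22", "Johnson", 2),
        ("2013-06-21", "Johnson", 1),
        ("2013-06-20", "Johnson", 4),
        ("2013-06-19", "Johnson", 2),
        ("2013-06-18", "Johnson", 3),
        ("2013-06-17", "Johnson", 4),
        ("2013-06-16", "Johnson", 3),
        ("2013-06-15", "Johnson", 1),
    ],
)

query = """
SELECT day, athlete, place
FROM (
    SELECT day, athlete, place,
           -- best place over the current row and the three rows before it
           MIN(place) OVER (PARTITION BY athlete
                            ORDER BY day
                            ROWS BETWEEN 3 PRECEDING AND CURRENT ROW) AS best
    FROM t
) AS sub
WHERE best > 1
ORDER BY day DESC
"""
no_recent_win = conn.execute(query).fetchall()
print(no_recent_win)
```

On this data the query returns the rows for 2013-06-20 and 2013-06-19, the two rows the question expects. Rows with fewer than three predecessors are judged on the shorter history they have.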
Here's an alternative formulation that does the work in two scans without subqueries:
SELECT
"date", athlete, place
FROM (
SELECT
"date",
place,
athlete,
1 <> ALL (array_agg(place) OVER w) AS include_row
FROM Table1
WINDOW w AS (PARTITION BY athlete ORDER BY "date" ASC ROWS BETWEEN 3 PRECEDING AND CURRENT ROW)
) AS history
WHERE include_row;
See: http://sqlfiddle.com/#!1/fa3a4/34
The logic here is pretty much a literal translation of the question. Get the last four placements - current and the previous 3 - and return any rows in which the athlete didn't finish first in any of them.
Because the window frame is the only place where the number of rows of history to consider is defined, you can parameterise this variant unlike my previous effort (obsolete, http://sqlfiddle.com/#!1/fa3a4/31), so it works for the last n for any n. It's also a lot more efficient than the last try.
I'd be really interested in the relative efficiency of this vs @Andomar's query when executed on a dataset of non-trivial size. They're pretty much exactly the same on this tiny dataset. An index on Table1(athlete, "date") would be required for this to perform optimally on a large data set.
; with CTE as
(
select row_number() over (partition by athlete order by date) rn
, *
from Table1
)
select *
from CTE cur
where not exists
(
select *
from CTE prev
where prev.place = 1
and prev.athlete = cur.athlete
and prev.rn between cur.rn - 3 and cur.rn
)
Live example at SQL Fiddle.