I am trying to write a query in which I update a counter based on other conditions. For example:
with table 1 as (select *, count from table1)
select box_type,
case when box_type = lag(box_type) over (order by time)
then
count, update table1 set count = count + 1
else
count
end as identifier
Here's the basic gist of what I'm trying to do. I want a table that looks like this:
box_type  identifier
--------------------
small     1
small     1
small     1
medium    2
medium    2
large     3
large     3
small     4
I just want to increment that identifier value every time the box_type changes.
Thank you!
Your question only makes sense if you have a column that specifies the ordering. Let me assume such a column exists -- based on your code, I'll call it time.
Then, you can use lag() and a cumulative sum:
select t1.*,
count(*) filter (where box_type is distinct from prev_box_type) over (order by time) as count
from (select t1.*,
lag(box_type) over (order by time) as prev_box_type
from table1 t1
) t1
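As a rough check of the lag() plus cumulative-sum idea, here is a minimal sketch that runs the same logic against an in-memory SQLite database from Python. SQLite is only a stand-in for the real database (an assumption), so two portable substitutions are made: `is not` replaces `is distinct from`, and `sum(case ...)` replaces `count(*) filter (...)`. The sample rows and the integer time column are made up for illustration, and window functions need SQLite 3.25 or later.

```python
import sqlite3

# In-memory stand-in for the real database (assumption: SQLite >= 3.25).
conn = sqlite3.connect(":memory:")
conn.execute("create table table1 (time integer, box_type text)")

# Hypothetical sample data matching the table in the question.
boxes = ["small", "small", "small", "medium", "medium", "large", "large", "small"]
conn.executemany("insert into table1 values (?, ?)", list(enumerate(boxes, start=1)))

query = """
select box_type,
       -- running total of "box_type changed" flags, in time order
       sum(case when box_type is not prev_box_type then 1 else 0 end)
           over (order by time) as identifier
from (select t1.*,
             lag(box_type) over (order by time) as prev_box_type
      from table1 t1
     ) t1
order by time
"""
rows = conn.execute(query).fetchall()
identifiers = [identifier for _, identifier in rows]

for box_type, identifier in rows:
    print(box_type, identifier)
```

The first row has no previous value, so its comparison against NULL counts as a change and the identifier starts at 1.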
New to SQL here - I am trying to get one row from a table matching a particular criterion.
Typically this would look like
SELECT TOP 1 *
FROM myTable
WHERE id = 'abc'
The output may look like
value id
--------------
1 abc
The table has many entries for an 'id', and I am trying to get one entry per 'id'. Now I have a list of 'id's. How would I execute something like
SELECT TOP 1 *
FROM myTable
FOR EACH id
WHERE id IN ('abc', 'edf', 'fgh')
Expecting result like
value id
--------------
1 abc
10 edf
12 fgh
I do not know if it is some sort of union or concat operation, but would like to learn. I am working on Azure SQL Server.
The table has many entries for an 'id', and I am trying to get one entry per 'id'. Now I have a list of 'id's.
A typical method is row_number():
select t.*
from (select t.*,
row_number() over (partition by id order by id) as seqnum
from mytable t
) t
where seqnum = 1;
Note: you can filter on particular ids, if you want. It is unclear if that is really required for your question.
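To make the row_number() approach concrete, here is a small sketch run against an in-memory SQLite database from Python. SQLite stands in for Azure SQL Server here (an assumption, and it needs SQLite 3.25 or later for window functions). The sample rows are invented, and ordering by value inside each partition is my own choice so that the result is deterministic. The original query orders by id, which picks an arbitrary row per id.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table mytable (value integer, id text)")

# Hypothetical data: several rows per id, plus one id we do not ask for.
conn.executemany(
    "insert into mytable values (?, ?)",
    [(1, "abc"), (2, "abc"), (10, "edf"), (11, "edf"), (12, "fgh"), (99, "xyz")],
)

query = """
select t.value, t.id
from (select t.*,
             -- number the rows within each id; lowest value gets 1 (assumed tie-breaker)
             row_number() over (partition by id order by value) as seqnum
      from mytable t
      where id in ('abc', 'edf', 'fgh')
     ) t
where seqnum = 1
order by t.id
"""
rows = conn.execute(query).fetchall()
print(rows)
```

Filtering on the id list inside the subquery keeps the window function from numbering rows that would be discarded anyway.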
If you happen to be using SQL Server (as select top suggests), you can use the more concise, but somewhat less performant:
select top (1) with ties t.*
from mytable t
order by row_number() over (partition by id order by (select null));
I'm looking for an efficient approach where I can assign numbers in sequence to each group.
Record Group GroupSequence
-------|---------|--------------
1 Car 1
2 Car 2
3 Bike 1
4 Bus 1
5 Bus 2
6 Bus 3
I came across this question: How to add sequence number for groups in a SQL query without temp tables. But my use case is slightly different from it. Any ideas on how to accomplish this with a single query?
You are looking for row_number():
select t.*, row_number() over (partition by "group" order by record) as group_sequence
from t;
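A quick way to try this out is an in-memory SQLite database driven from Python, sketched below. SQLite is an assumed stand-in for the actual database (window functions need version 3.25 or later), and the table and column names simply mirror the question. Because group is a reserved word, the column is written as "group" in double quotes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# "group" is a reserved word, so the column name must be quoted.
conn.execute('create table t (record integer, "group" text)')
conn.executemany(
    "insert into t values (?, ?)",
    [(1, "Car"), (2, "Car"), (3, "Bike"), (4, "Bus"), (5, "Bus"), (6, "Bus")],
)

query = """
select t.record,
       t."group",
       -- restart the numbering at 1 for every distinct group
       row_number() over (partition by "group" order by record) as group_sequence
from t
order by t.record
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```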
You can calculate this when you need it, so I see no reason to store it. However, you can update the values if you like:
update t
set group_sequence = tt.new_group_sequence
from (select t.*,
row_number() over (partition by "group" order by record) as new_group_sequence
from t
) tt
where tt.record = t.record;
I have a large data set with about 100 million rows. I want to 'compress' it by taking a 1% sample of the entire data set while preserving the relative proportions in the data.
How can such a query be implemented?
Step 1: create the helper table
You can use aggregation to group records by visit_number, and CROSS JOIN with a query that computes the total number of records in the table to get each group's share of the total:
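CREATE TABLE my_helper AS
SELECT
    t.visit_number,
    COUNT(*) visit_count,
    SUM(t.purchase_id) sum_purchase,
    -- multiply by 1.0 so that integer division does not truncate the share to 0
    1.0 * COUNT(*) / total.cnt distribution
FROM
    mytable t
    CROSS JOIN (SELECT COUNT(*) cnt FROM mytable) total
-- total.cnt is a constant, but strict dialects require it in GROUP BY
GROUP BY t.visit_number, total.cnt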
Step 2: sample the main table using the helper table
Within a subquery, you can use ROW_NUMBER() OVER(PARTITION BY visit_number ORDER BY RANDOM()) to assign a random rank to each record within groups of records sharing the same visit_number. Then, in the outer query, you can join on the helper table to select the correct number of records for each visit_number:
SELECT x.*
FROM (
SELECT
t.*,
ROW_NUMBER() OVER(PARTITION BY visit_number ORDER BY RANDOM()) rn
FROM mytable t
) x
INNER JOIN my_helper h ON h.visit_number = x.visit_number
WHERE x.rn <= 1000000 * h.distribution
Side notes:
this only works if there are indeed more than 1 million records in the source table
the exact number of records in the output might be slightly below or above 1 million (depending on the distribution in the original table)
it should be possible to combine both queries into a single one, which would avoid the need to use a helper table
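One possible way to fold both steps into a single query is sketched below. It is a variation on the approach above and not the same statement. Since the target size times a group's share of the total equals 1% of that group's own size, a per-group COUNT(*) window can replace the helper table. The sketch runs on an in-memory SQLite database from Python (an assumed stand-in that needs SQLite 3.25 or later), uses a made-up table of 10,000 rows, and compares with integer arithmetic to avoid floating-point edge cases.

```python
import sqlite3
from collections import Counter

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table mytable (purchase_id integer primary key, visit_number integer)"
)

# Hypothetical distribution: 70% / 20% / 10% across three visit numbers.
visits = [1] * 7000 + [2] * 2000 + [3] * 1000
conn.executemany(
    "insert into mytable (visit_number) values (?)", [(v,) for v in visits]
)

query = """
select x.*
from (select t.*,
             -- random rank of each row inside its visit_number group
             row_number() over (partition by visit_number order by random()) as rn,
             -- size of the group the row belongs to
             count(*) over (partition by visit_number) as group_cnt
      from mytable t
     ) x
-- keep the first 1% of each group: rn <= group_cnt / 100, written without division
where x.rn * 100 <= x.group_cnt
"""
sample = conn.execute(query).fetchall()
sample_counts = Counter(row[1] for row in sample)  # row[1] is visit_number
print(len(sample), dict(sample_counts))
```

Which rows are picked changes from run to run, but the number taken from each group stays proportional to the group's size.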
This is doable. A quick way is to take every nth record only.
1) order by a random column (probably ID)
2) apply a rownum() attribute
3) apply a mod on rownum = 0 for whatever percent makes sense (e.g. 1% would be mod(rownum, 100) = 0)
You may need steps 1/2 in a sub query and step 3 on the outside.
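The three steps can be sketched roughly as follows, using an in-memory SQLite database from Python. SQLite is an assumed stand-in (version 3.25 or later), ROW_NUMBER() plays the role of rownum, and the 1,000-row table is invented for the example. Steps 1 and 2 sit in the subquery and step 3 in the outer query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table mytable (id integer primary key)")
conn.executemany("insert into mytable values (?)", [(i,) for i in range(1, 1001)])

query = """
select id
from (select t.*,
             -- steps 1 and 2: order by id and number the rows
             row_number() over (order by id) as rn
      from mytable t
     ) x
-- step 3: keep every 100th row, i.e. a 1% sample
where rn % 100 = 0
order by id
"""
sampled_ids = [row[0] for row in conn.execute(query)]
print(sampled_ids)
```

Note that this takes evenly spaced rows, so the sample is only as random as the ordering column is unrelated to the data.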
Enjoy and good luck!
Hi guys, I have a Postgres table with a column for event and a column for sequence. Every event may have multiple sequences. For example:
Event | Sequence
a | 1
a | 4
a | 5
b | 1
b | 2
Now I know that select min(sequence) group by event gives me the minimum sequence. How do I get the very next value after the min value? I hope that makes sense. Thanks in advance.
I'm using Postgres 9.3.
You can use ROW_NUMBER(), partitioning by Event and ordering by Sequence, to get the second lowest sequence number per Event:
SELECT Event, Sequence
FROM (
SELECT Event, Sequence,
ROW_NUMBER() OVER (PARTITION BY Event ORDER BY Sequence) rn
FROM Table1
) z
WHERE rn = 2;
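For a quick local check without a Postgres instance, the query can be run against an in-memory SQLite database from Python, as in this sketch. SQLite is an assumed stand-in (version 3.25 or later), and the rows are the ones from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table Table1 (Event text, Sequence integer)")
conn.executemany(
    "insert into Table1 values (?, ?)",
    [("a", 1), ("a", 4), ("a", 5), ("b", 1), ("b", 2)],
)

query = """
SELECT Event, Sequence
FROM (
    SELECT Event, Sequence,
           -- 1 = lowest sequence of the event, 2 = next one up
           ROW_NUMBER() OVER (PARTITION BY Event ORDER BY Sequence) rn
    FROM Table1
) z
WHERE rn = 2
ORDER BY Event
"""
second_lowest = conn.execute(query).fetchall()
print(second_lowest)
```

An event with only one sequence has no row with rn = 2, so it simply drops out of the result.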
An SQLfiddle to test with.
EDIT: A bit more complicated, but if you need a query that doesn't rely on ROW_NUMBER(), use a subquery with a self-join to exclude rows with the minimum sequence for each event:
SELECT outer_query.Event, MIN(outer_query.Sequence) AS SecondMinSeq
FROM Table1 as outer_query
INNER JOIN (
SELECT Table1.Event, MIN(Sequence) AS MinSeq
FROM Table1
GROUP BY Table1.Event
) AS min_sequences
ON outer_query.Event = min_sequences.Event AND outer_query.Sequence <> min_sequences.MinSeq
GROUP BY outer_query.Event
SQL Fiddle: http://sqlfiddle.com/#!15/4438b/7
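One difference between the two queries is worth knowing: they disagree when the minimum sequence appears more than once for an event. The sketch below shows this on an in-memory SQLite database from Python (an assumed stand-in, version 3.25 or later), with made-up data where event a has the sequence 1 twice. ROW_NUMBER() returns the duplicate minimum as the "second" row, while the self-join skips every copy of the minimum.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table Table1 (Event text, Sequence integer)")
# Hypothetical data: event 'a' has its minimum sequence (1) twice.
conn.executemany(
    "insert into Table1 values (?, ?)",
    [("a", 1), ("a", 1), ("a", 4), ("b", 1), ("b", 2)],
)

row_number_query = """
SELECT Event, Sequence
FROM (
    SELECT Event, Sequence,
           ROW_NUMBER() OVER (PARTITION BY Event ORDER BY Sequence) rn
    FROM Table1
) z
WHERE rn = 2
ORDER BY Event
"""

self_join_query = """
SELECT outer_query.Event, MIN(outer_query.Sequence) AS SecondMinSeq
FROM Table1 AS outer_query
INNER JOIN (
    SELECT Table1.Event, MIN(Sequence) AS MinSeq
    FROM Table1
    GROUP BY Table1.Event
) AS min_sequences
ON outer_query.Event = min_sequences.Event
   AND outer_query.Sequence <> min_sequences.MinSeq
GROUP BY outer_query.Event
ORDER BY outer_query.Event
"""

by_row_number = conn.execute(row_number_query).fetchall()
by_self_join = conn.execute(self_join_query).fetchall()
print(by_row_number)  # second row per event, duplicates included
print(by_self_join)   # smallest value strictly above the minimum
```

If the second distinct value is what you want from the window-function version, DENSE_RANK() in place of ROW_NUMBER() is the usual adjustment, although it can then return several rows per event.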
Suppose I have a table filled with the data below. What SQL function or query should I use in DB2 to retrieve all rows having the FIRST field FLD_A with value A, the FIRST field FLD_A with value B, and so on?
ID FLD_A FLD_B
1 A 10
2 A 20
3 A 30
4 B 10
5 A 20
6 C 30
I am expecting a table like the one below. I am aware of grouping with GROUP BY, but how can I limit the query to return only the very first row of each group?
Essentially, I would like to have the information about the very first row where a new value for FLD_A appears for the first time.
ID FLD_A FLD_B
1 A 10
4 B 10
6 C 30
Try this; it works in SQL:
SELECT * FROM Table1
WHERE ID IN (SELECT MIN(ID) FROM Table1 GROUP BY FLD_A)
A good way to approach this problem is with window functions and row_number() in particular:
select t.*
from (select t.*,
row_number() over (partition by fld_a order by id) as seqnum
from table1 t
) t
where seqnum = 1;
(This is assuming that "first" means "minimum id".)
If you use t.* in the outer query, this will add one extra column (seqnum) to the output. To avoid this, just list the columns you want.
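Here is a small sketch that runs the row_number() query with an explicit column list, and checks it against the MIN(ID) subquery from the earlier answer. It uses an in-memory SQLite database from Python as an assumed stand-in for DB2 (SQLite 3.25 or later), loaded with the rows from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table table1 (id integer, fld_a text, fld_b integer)")
conn.executemany(
    "insert into table1 values (?, ?, ?)",
    [(1, "A", 10), (2, "A", 20), (3, "A", 30), (4, "B", 10), (5, "A", 20), (6, "C", 30)],
)

# Window-function version; listing columns keeps seqnum out of the output.
window_query = """
select t.id, t.fld_a, t.fld_b
from (select t.*,
             row_number() over (partition by fld_a order by id) as seqnum
      from table1 t
     ) t
where seqnum = 1
order by t.id
"""

# Subquery version: keep the rows whose id is the smallest in its fld_a group.
min_id_query = """
select *
from table1
where id in (select min(id) from table1 group by fld_a)
order by id
"""

first_rows = conn.execute(window_query).fetchall()
first_rows_min_id = conn.execute(min_id_query).fetchall()
print(first_rows)
```

Both queries agree here because "first" is taken to mean the minimum id in each FLD_A group.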