Select records for batch processing in loop - sql

I need to select records in batches. The example below has around 20 records; with a batch size of 10 there would be two loops. The problem is that if I use TOP 10, the value 555 is split across two batches, because it sits at positions 10 and 11. Both 555 rows should therefore be included in the first batch. How can I achieve this? This is only an example: in the real scenario I have 900 million records to process and my batch size will be 2 million.
ID
-------
111
111
111
222
222
333
333
444
444
555
555
666
666
777
777
888
888

You can use TOP ... WITH TIES. It may return more records than requested, but it will not split rows with the same ID across different batches:
Create and populate a sample table (please save us this step in your future questions):
DECLARE @T AS TABLE
(ID int)
INSERT INTO @T VALUES
(111),(111),(111),
(222),(222),
(333),(333),
(444),(444),
(555),(555),
(666),(666),
(777),(777),
(888),(888)
The select statement:
SELECT TOP 10 WITH TIES ID
FROM @T
ORDER BY ID
Results:
row ID
1 111
2 111
3 111
4 222
5 222
6 333
7 333
8 444
9 444
10 555
11 555

While selecting the records, you can group them by id prior to limiting their number.
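To make the "group by ID first, then limit" idea concrete, here is a minimal sketch in Python rather than SQL. The function name `batches_with_ties` is made up for illustration, and the sketch assumes the IDs are already sorted and fit in memory, which will not be true for 900 million rows. It only shows the batching rule: a batch is closed at a boundary between two different IDs, so equal IDs always stay together.

```python
from itertools import groupby


def batches_with_ties(sorted_ids, batch_size):
    """Yield lists of ids, closing a batch only between two different ids.

    Every batch except possibly the last holds at least batch_size ids,
    and equal ids are never split across two batches.
    """
    batch = []
    for _, run in groupby(sorted_ids):
        # A new id starts here: close the batch if it is already full.
        if len(batch) >= batch_size:
            yield batch
            batch = []
        batch.extend(run)
    if batch:
        yield batch


# Sample data from the question (17 rows).
ids = [111, 111, 111, 222, 222, 333, 333, 444, 444,
       555, 555, 666, 666, 777, 777, 888, 888]

result = list(batches_with_ties(ids, 10))
for number, batch in enumerate(result, start=1):
    print(number, len(batch), batch[0], batch[-1])
```

With a batch size of 10 the first batch contains 11 rows (111 through both 555 rows), which matches the TOP 10 WITH TIES result above, and the second batch contains the remaining 6 rows. On a real table the same loop would presumably be driven from SQL, for example by remembering the largest ID of the previous batch and adding WHERE ID > that value to the query; that variant is an assumption and is not part of the answers above.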

Related

How find duplicates of the same id but different date with SQL (Oracle)

I have a table like this:
id line datedt
123 1 01/01/2021
123 2 01/01/2021
123 3 01/01/2021
777 1 13/04/2020
777 2 13/04/2020
123 1 12/04/2021
123 2 12/04/2021
888 1 01/07/2020
888 2 01/07/2020
452 1 05/01/2020
888 1 02/05/2021
888 2 02/05/2021
I'd like to obtain a result like the following, i.e. the number of different dates for each id.
For example, 123 has 2 different dates, and so does 888, but 777 and 452 have only one date each.
id nb
123 2
777 1
888 2
452 1
How could I get that?
Hope it's clear :)
Thank you
select id , count(distinct datedt) as nb
from table
group by id
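As a quick check of the COUNT(DISTINCT ...) query, here is a small sketch that runs it against the sample rows. It uses Python's built-in sqlite3 module instead of Oracle, and the table name `t` is a placeholder, so treat it as an illustration of the aggregate only. The dates are stored as plain text, which is enough for counting distinct values.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER, line INTEGER, datedt TEXT)")

# Sample rows from the question.
rows = [
    (123, 1, "01/01/2021"), (123, 2, "01/01/2021"), (123, 3, "01/01/2021"),
    (777, 1, "13/04/2020"), (777, 2, "13/04/2020"),
    (123, 1, "12/04/2021"), (123, 2, "12/04/2021"),
    (888, 1, "01/07/2020"), (888, 2, "01/07/2020"),
    (452, 1, "05/01/2020"),
    (888, 1, "02/05/2021"), (888, 2, "02/05/2021"),
]
conn.executemany("INSERT INTO t VALUES (?, ?, ?)", rows)

# One output row per id, counting each date only once.
counts = dict(conn.execute(
    "SELECT id, COUNT(DISTINCT datedt) AS nb FROM t GROUP BY id"
).fetchall())
print(counts)
```

The resulting mapping gives 2 for ids 123 and 888 and 1 for ids 777 and 452, which is the result asked for.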

Break values in one column into multiple rows

I am trying to explode values belonging to one id into multiple rows.
category_id subcategory_ids
123         111
            123
            333
465         444
            555
The result I am trying to achieve should look like this:
category_id subcategory_ids
123 111
123 123
123 333
465 444
465 555
Below is for BigQuery Standard SQL
#standardSQL
SELECT category_id, subcategory_ids
FROM `project.dataset.table`,
UNNEST(subcategory_ids) subcategory_ids
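For readers who want to see what the comma join with UNNEST does to each row, here is a rough Python equivalent. It assumes that subcategory_ids is an array column, which is how the sample in the question appears to be laid out; the dictionaries below are just a stand-in for the table.

```python
# Stand-in for the table: one row per category, with an array of ids.
rows = [
    {"category_id": 123, "subcategory_ids": [111, 123, 333]},
    {"category_id": 465, "subcategory_ids": [444, 555]},
]

# Pair every row with each element of its own array, which is what
# "FROM table, UNNEST(subcategory_ids)" does for each row.
flattened = [
    (row["category_id"], sub_id)
    for row in rows
    for sub_id in row["subcategory_ids"]
]

for category_id, sub_id in flattened:
    print(category_id, sub_id)
```

Note that in this sketch a row with an empty array produces no output rows at all, so categories without subcategories would disappear from the result.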

DB2:how to get top

I have a table having data like
pin id name
3 33 jjj
2 22 bbb
1 111 aaaa
1 112 aa
1 113 aaa
4 44 kkk
I want to print rows of the table as follows: if count(*) grouped by pin = 1 (i.e. the pin has a single entry in the table), print that row;
if count(*) grouped by pin > 2, print only the first two rows.
So my output should be
pin id name
3 33 jjj
2 22 bbb
1 111 aaaa
1 112 aa
4 44 kkk
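Use row_number() OVER (PARTITION BY pin ORDER BY id) AS rownum and keep only the rows where rownum < 3. Because a window function cannot be referenced in the WHERE clause of the same query level, compute it in a subquery and filter in the outer query.
As @Clockwork-Muse said, you need to define an order, because you have to say which rows you want to see when there are more than 2 rows for a particular pin.
This will generate your desired output.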
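The answer describes the query only in words, so here is one way it could be written out. This is a sketch run through Python's sqlite3 module (window functions need SQLite 3.25 or newer) rather than DB2, and the table name `t` and alias `rn` are placeholders.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (pin INTEGER, id INTEGER, name TEXT)")
conn.executemany("INSERT INTO t VALUES (?, ?, ?)", [
    (3, 33, "jjj"), (2, 22, "bbb"), (1, 111, "aaaa"),
    (1, 112, "aa"), (1, 113, "aaa"), (4, 44, "kkk"),
])

# Number the rows inside each pin by id, then keep the first two per pin.
query = """
SELECT pin, id, name
FROM (
    SELECT pin, id, name,
           ROW_NUMBER() OVER (PARTITION BY pin ORDER BY id) AS rn
    FROM t
) AS numbered
WHERE rn < 3
ORDER BY pin, id
"""
top_two = conn.execute(query).fetchall()
for row in top_two:
    print(row)
```

The row (1, 113, 'aaa') is dropped because it is the third row for pin 1; every other row is kept. The final ORDER BY sorts by pin, so the rows come back in a different order than in the question's expected output.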

Using sequences to create group ID

I'm attempting to create group_ids based on a set of item_ids. The only indication that the item_ids are part of a single group is the fact that item_ids are sequential. For example, based on the first two columns below, the output I want is the third:
item item_id group_id
ABC 282 2
ABC 283 2
ABC 284 2
ABC 285 2
ABC 051 3
ABC 052 3
ABC 189 4
ABC 231 5
ABC 232 5
ABC 233 5
ABC 234 5
ABC 247 6
ABC 248 6
ABC 249 6
ABC 250 6
ABC 091 7
ABC 092 7
The group_id doesn't necessarily have to be sequential itself, it only has to be unique. I attempted this with the following code:
create sequence seq
start with 1
minvalue 1
increment by 1
cache 20;
select seq.nextval from dual; --to initialize the sequence
select
item,
item_id,
case when diff = 1 then seq.currval else seq.nextval end group_id
from
(
select
item,
item_id,
item_id - lag(item_id, 1, 0) over (order by 1) diff
from
(
select
item,
item_id
from
table
)
);
But get the following output:
item item_id group_id
ABC 282 2
ABC 283 3
ABC 284 4
ABC 285 5
ABC 051 6
ABC 052 7
ABC 189 8
ABC 231 9
ABC 232 10
ABC 233 11
ABC 234 12
ABC 247 13
ABC 248 14
ABC 249 15
ABC 250 16
ABC 091 17
ABC 092 18
When looking for the cause of the problem, I found an excellent explanation by user ShannonSeverance that details why my solution won't work. However, it didn't provide any suggestions on how to move forward.
Does anyone have any ideas?
You have a problem, because SQL tables are inherently unordered. The following "should" logically work, although it won't in practice:
select ii.*, (item_id - rownum) as grp_id
from item_ids ii;
A sequence of item_ids in order minus the row number is constant. You can use that for a group, at least for a given item. To handle multiple items, concatenate the values together:
select ii.*, item||'-'||(item_id - rownum) as grp_id
from item_ids ii;
To really make this work, you need to add an order by -- this guarantees the ordering of the results from the select. This might work, assuming that there are "holes" between the groups:
select ii.*, item||'-'||(item_id - rownum) as grp_id
from item_ids ii
order by item, item_id;
Otherwise, you need some other column to determine the proper ordering for the items.
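The "item_id minus row number is constant" idea can be tried out with a small sketch. Note the swap: instead of rownum it uses ROW_NUMBER() OVER (PARTITION BY item ORDER BY item_id), so the numbering follows an explicit ordering (Oracle assigns ROWNUM before the ORDER BY is applied). It runs on Python's sqlite3 module (SQLite 3.25 or newer) rather than Oracle, and it assumes item_id can be treated as an integer, so 051 becomes 51.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE item_ids (item TEXT, item_id INTEGER)")

# item_id values from the question, in the order they appear there.
sample = [282, 283, 284, 285, 51, 52, 189, 231, 232, 233, 234,
          247, 248, 249, 250, 91, 92]
conn.executemany("INSERT INTO item_ids VALUES ('ABC', ?)", [(i,) for i in sample])

# Within a run of consecutive item_ids, item_id and the row number grow
# together, so their difference stays the same for the whole run.
query = """
SELECT item, item_id,
       item || '-' || (item_id - ROW_NUMBER() OVER (
           PARTITION BY item ORDER BY item_id)) AS grp_id
FROM item_ids
ORDER BY item, item_id
"""
groups = {item_id: grp_id for _, item_id, grp_id in conn.execute(query)}
for item_id, grp_id in groups.items():
    print(item_id, grp_id)
```

For the sample data this yields six distinct grp_id values, one per run of consecutive item_ids. The values are not sequential, but the question only requires them to be unique per group.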

Combine rows adding specific columns

I have a table similar to the following:
employee_id | totalWorkHours | projectID
1 20 123
1 20 321
2 15 222
2 25 333
3 10 434
3 12 343
Is it possible to combine rows based on employee_id, but add totalWorkHours into an actual total for an employee and present in a result set without modifying the table?
So the results would be something like:
employee_id | actualTotalWorkHours
1 40
2 40
3 22
Or is this something better done with the raw result set?
Any help is much appreciated.
Select employee_id, Sum(totalWorkHours) As actualWorkHours
From YourTableName
Group By employee_id
Order By employee_id
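To confirm that the query returns the totals from the question without modifying the table, here is a short sketch using Python's sqlite3 module. The table name `work` is a placeholder for YourTableName, and the alias follows the column name used in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE work (employee_id INTEGER, totalWorkHours INTEGER, projectID INTEGER)"
)
conn.executemany("INSERT INTO work VALUES (?, ?, ?)", [
    (1, 20, 123), (1, 20, 321),
    (2, 15, 222), (2, 25, 333),
    (3, 10, 434), (3, 12, 343),
])

# One row per employee; the stored rows themselves are left untouched.
totals = conn.execute("""
    SELECT employee_id, SUM(totalWorkHours) AS actualTotalWorkHours
    FROM work
    GROUP BY employee_id
    ORDER BY employee_id
""").fetchall()
for row in totals:
    print(row)
```

The totals come out as 40, 40 and 22 for employees 1, 2 and 3, and the table still holds its original six rows afterwards.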