I'm attempting to provide a grouping id to a set of items using NTILE(). Basically, every 4 items should be grouped together with the same GroupID. The problem is that the total number of rows is different per id. Is this possible?
SELECT
ProductDescription AS LabelType1,
NTILE(FLOOR(COUNT(bc.Groupings) / 4)) OVER (ORDER BY s.OrderId) AS GroupNumber,
Barcode AS Barcode1
FROM
dbo.table1 s
INNER JOIN
#BoxCounts bc ON s.OrderId = bc.OrderId
This is an elaboration on Ben Thul's comment (because he has not answered the question).
NTILE() divides a set of rows into n almost equal-sized groups. The n is a constant.
You want to assign a grouping id to a fixed number of rows. That is a different problem and easily handled with row_number() or rank().
So, one method is:
SELECT ProductDescription AS LabelType1,
(ROW_NUMBER() OVER (ORDER BY s.OrderId) - 1) / 4 as GroupNumber,
Barcode AS Barcode1
FROM dbo.table1 s INNER JOIN
#BoxCounts bc
ON s.OrderId = bc.OrderId;
Note the - 1 in the calculation, so the first group has four elements. Also, SQL Server does integer division, so you don't have to worry about additional decimal places.
If you could have ties and want all rows with the same OrderId to be in the same group, then use dense_rank() (if you want all groups to have four different order ids) or rank() (if you want all groups to have approximately four order ids).
Related
I have a table which has ID, FAMILY, ENV_XML_PATH and CREATED_DATE columns.
ID
FAMILY
ENV_XML_PATH
CREATED_DATE
15826841
CRM
path1.xml
03-09-22 6:50:34AM
15826856
SCM
path3.xml
03-10-22 7:12:20AM
15826786
IC
path4.xml
02-10-22 12:50:52AM
15825965
CRM
path5.xml
02-10-22 1:50:52AM
15653951
null
path6.xml
04-10-22 12:50:52AM
15826840
FIN
path7.xml
03-10-22 2:34:09AM
15826841
SCM
path8.xml
02-10-22 8:40:52AM
15223450
IC
path9.xml
03-09-22 5:34:09AM
15026853
SCM
path10.xml
05-10-22 4:40:59AM
Now there are 18 DISTINCT values in FAMILY column and each value has multiple rows associated (as you can see from the above image).
What I want is to get the first row of 3 specific values (CRM, SCM and IC) in FAMILY column.
Something like this:
ID
FAMILY
ENV_XML_PATH
CREATED_DATE
15826841
CRM
path1.xml
date1
15826856
SCM
path3.xml
date2
15826786
IC
path4.xml
date3
I am new to this, though I understand the logic but I am not sure how to implement it. Kindly help. Thanks.
You can use RANK for that. Something like this:
WITH groupedData AS
(SELECT id, family, env_xml_path, created_date,
RANK () OVER (PARTITION BY family ORDER BY id) AS r_num
FROM yourtable
GROUP BY id, family, env_xml_path, created_date)
SELECT id, family, env_xml_path, created_date
FROM groupedData
WHERE r_num = 1
ORDER BY id;
Thus, within the first query, your data will be grouped by family and sorted by the column you want (in my example, it will be sorted by id).
After that, you will use the second query to only take the first row of each family.
Add a WHERE clause to the first query if you need to apply further restrictions on the result set.
See here a working example: db<>fiddle
You could use a window function to get to know the row number of each partition in family ordered by the created_date, and then filter by the the three families you are interested in:
with row_window as (
select
id,
family,
env_xml_path,
created_date,
row_number() over (partition by family order by created_date asc) as rn
from <your_table>
where family in ('CRM', 'SCM', 'IC')
)
select
id,
family,
env_xml_path,
created_date
from row_window
where rn = 1
Output:
ID
FAMILY
ENV_XML_PATH
CREATED_DATE
15826841
CRM
path1.xml
03-09-22 6:50:34
15826856
SCM
path3.xml
03-10-22 7:12:20
15826786
IC
path4.xml
02-10-22 12:50:52
The question doesn't really specify what 'first' means, but I assume it means the first to be added in the table, aka the person whose date is the oldest. Try this code:
SELECT DISTINCT * FROM (yourTable) WHERE Family = 'CRM' OR
Family = 'SCM' OR Family = 'IC' ORDER BY Created_Date ASC FETCH FIRST (number) ROWS ONLY;
What it does:
Distinct - It selects different rows, which means you won't get same type of rows at the top.
Where - checks if certain condition is true
OR - it means that the select should choose rows that match those requirements. In the current situation the distinct clause means that same rows won't repeat, so you won't be getting 2 different 'CRM' family names, so it will find the first 'CRM' then the first 'SCM' and so on.
ORDER BY - orders the column in specified order. In the current one, if first rows mean the oldest, then by ordering them by date and using ASC the oldest(aka smallest date) will be at the top.
FETCH FIRST (number) ROWS ONLY - It selects only the very first couple of rows you want. For example if you need 3 different 'first' rows you need to get FETCH FIRST 3 ROWS ONLY. Combined with the distinct word it will only show 3 different rows.
I have a table in Postgres with about half a million rows and an integer primary key.
I'd like to split its entire PK space into N ranges of approximately same size for independent processing. How do I best do it?
I apparently can do it by fetching all PK values to a client and remember every N-th value. This does a full scan and a fetch of all the values, while I only want no more than N+1 of them.
I can select min and max values and cut the range, but if the PKs are not distributed quite evenly, it may give me some ranges of seriously different sizes.
I want ranges for index-based access later on, so any modulo-based tricks do mot apply.
Is there any nice SQL-based solution that does not involve fetching all the keys to a client? Writing an N-specific query, e.g. with N clauses, if fine.
An example:
IDs in a range, say, from 1234 to 567890, N = 4.
I'd like to get 4 numbers, say 127123, 254789, 379860, so than there are approximately 125k records in each of the ranges of IDs [1234, 127123], [127123, 254789], [254789, 379860], [379860, 567890].
Update:
I've come up with a solution like this:
select
percentile_disc(0.25) within group (order by c.id) over() as pct_25
,percentile_disc(0.50) within group (order by c.id) over() as pct_50
,percentile_disc(0.75) within group (order by c.id) over() as pct_75
from customer c
limit 1
;
It does a decent job of giving me the exact range boundaries, and runs only a few seconds, which is fine for my purposes.
What bothers me is that I have to add the limit 1 clause to get just one row. Without it, I receive identical rows, one per record in the table. Is there a better way to get just a one row of the percentiles?
I think you can use row_number() for this purpose. Something like this:
select t.*,
floor((seqnum * N) / cnt) as range
from (select t.*,
row_number() over (order by pk) - 1 as seqnum,
count(*) over () as cnt
from t
) t;
This assumes by range that you mean ranges on pk values. You can also move the range expression to a where clause to just select one particular range.
I have multiple tables that contain the name of a company/attribute and a ranking.
I would like to write a piece of code which allows a range of Scores to be placed into specific Groups based on the percentile of the score in relationship to tables Score total. I provided a very easy use case to demonstrate what I am looking for, splitting a group of 10 companies into 5 groups, but I would like to scales this in order to apply the 5 groups to data sets with many rows WITHOUT having to specify values in a CASE statement.
You can use NTILE to divide the data into 5 buckets based on score. However, if the data can't be divided into equal number of bins or if there are ties, one of the groups will have more members.
SELECT t.*, NTILE(5) OVER(ORDER BY score) as grp
FROM tablename t
Read more about NTILE here
NTILE(5) OVER(ORDER BY score) might actually put rows with the same value into different quantiles (This is probably not what you want, at least I never liked that).
It's quite similar to
5 * (row_number() over (order by score) - 1) / count(*) over ()
but if the number of rows can't be evenly divided the remainder rows are added to the first quantiles when using NTILE and randomly for ROW_NUMBER.
To assign all the rows with the same value to the same quantile you need to do your own calculation:
5 * (rank() over (order by score) - 1) / count(*) over ()
You can try using ROW_NUMBER() and CEILING() :
SELECT t.name,t.score,
CEILING(ROW_NUMBER() OVER(ORDER BY t.score)/2) as group
FROM YourTable t
This will divide each group of two into a single group, using the ROW_NUMBER() result .
I have a table like following :
Orderserialno SKU Units
1234-6789 2x3 5
1234-6789 4x5 7
1334-8905 4x5 2
1334-8905 6x10 2
I need to get the count of distinct orderserialno where Units are not equal within a orderserialno. There could be more combinations of Sku's in an order than what I have mentioned but the eventual goal is to get those orders where units corresponding to various SKUs (in that order) are not equal.
In the above case I should get answer as 1 as orderserialno 1234-6789 has different units.
Thanks
This is a relatively simple GROUP BY query:
SELECT Orderserialno, Units
FROM MyTable
GROUP BY Orderserialno, Units
HAVING COUNT(1) > 1
This would give you all pairs (Orderserialno, Units). To project out the Units, nest this query inside a DISTINCT, like this:
SELECT DICTINCT(Orderserialno) FROM (
SELECT Orderserialno, Units
FROM MyTable
GROUP BY Orderserialno, Units
HAVING COUNT(1) > 1
)
If you need only the total count of Orderserialnos with multiple units, replace DICTINCT(Orderserialno) with COUNT(DICTINCT Orderserialno).
To get the list of such order numbers, use an aggregation query:
select OrderSerialNo
from t
group by OrderSerialNo
having min(Units) <> max(Units)
This uses a trick to see if the units value changes. You can use count(distinct), but that usually incurs a performance overhead. Instead, just compare the minimum and maximum values. If they are different, then the value is not constant.
To get the count, use this as a subquery:
select count(*)
from (select OrderSerialNo
from t
group by OrderSerialNo
having min(Units) <> max(Units)
) t
I have a database table that Stores Maximum and Minimum Price Breaks for a Product.
Does anyone know of the SQL which say if I have a break from one Max to the Min of the next item. E.g. 1-10 12-20 I would like it to return me either the numbers that are missing or at the very least a count or bool if it can detect a break from the Absolute Min and the Absolute Max by going through each range.
SQL Server (MSSQL) 2008
For a database that supports window functions, like Oracle:
SELECT t.*
, CASE LAG(maxq+1, 1, minq) OVER (PARTITION BY prod ORDER BY minq)
WHEN minq
THEN 0
ELSE 1
END AS is_gap
FROM tbl t
;
This will produce is_gap = 1 for a row that forms a gap with the previous row (ordered by minq). If your quantity ranges can overlap, the required logic would need to be provided.
http://sqlfiddle.com/#!4/f609e/4
Something like this, giving max quantities that aren't the overall max for the product and don't have a min quantity following them:
select prev.tbProduct_Id,prev.MaxQuantity
from yourtable prev
left join (select tbProduct_ID, max(MaxQuantity) MaxQuantity from yourtable group by tbProduct_id) maxes
on maxes.tbProduct_ID=prev.tbProduct_Id and maxes.MaxQuantity=prev.MaxQuantity
left join yourtable next
on next.tbProduct_Id=prev.tbProduct_Id and next.MinQuantity=prev.MaxQuantity+1
where maxes.tbProduct_Id is null and next.tbProduct_Id is null;
This would fail on your sample data, though, because it would expect a row with MinQuantity 21, not 20.