complex sorting sql - sql

I have the following table
Priority Time
100 1
86 3
85 2
I want to sort it first by priority and then by time. However, priorities that differ by less than 20 points are treated as the same, e.g. 100 and 85 are considered the same priority level.
so the result will be:
Priority Time
100 1
85 2
86 3
Thanks,

Try this (assuming that priority is an integer)
select *
from foobar
order by (priority / 20), -- 0-19 yields 0, 20-39 yields 1, etc.
         time
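One thing worth checking before relying on this: with fixed buckets of 20, the sample values 100 and 85 land in different buckets (5 and 4), so the result differs from the ordering the question asks for. The sketch below runs the same ORDER BY against the sample rows using Python's built-in sqlite3 module. It assumes SQLite's integer division behaves like the integer division the answer relies on; the table name foobar comes from the answer, and the column types are an assumption.

```python
import sqlite3

# In-memory table holding the three sample rows from the question.
conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE foobar (priority INTEGER, "time" INTEGER)')
conn.executemany("INSERT INTO foobar VALUES (?, ?)", [(100, 1), (86, 3), (85, 2)])

# Same ORDER BY as the answer: integer division puts 85 and 86 in
# bucket 4 and 100 in bucket 5.
rows = conn.execute(
    'SELECT priority, "time" FROM foobar ORDER BY (priority / 20), "time"'
).fetchall()
buckets = [priority // 20 for priority, _ in rows]
conn.close()

print(rows)     # rows in bucket order, then time order
print(buckets)  # bucket number of each returned row
```

Because "within 20 points of each other" is not the same as "in the same fixed bucket", the two definitions only agree when the data happens to fall inside one bucket.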

Related

width_bucket not returning buckets of equal width

I'm using Postgres version 9.6.9 and attempting to use width_bucket() to generate a histogram with buckets consisting of equal widths. However, the query I'm using is not returning buckets of equal widths.
As you can see in the example below, the values in the buckets span varying widths, e.g. bucket 1 has a min of 7 and a max of 23, a width of 16, while bucket 3 has a min of 52 and a max of 55, a width of 3.
How can I adjust my query to ensure that each bucket has the same width?
Here's what the data looks like:
value
-------
7
7
15
17
18
22
23
25
29
42
52
52
55
60
74
85
90
90
92
95
(20 rows)
Here's the query and resulting histogram:
WITH min_max AS (
    SELECT
        min(value) AS min_val,
        max(value) AS max_val
    FROM table
)
SELECT
    min(value),
    max(value),
    count(*),
    width_bucket(value, min_val, max_val, 5) AS bucket
FROM table, min_max
GROUP BY bucket
ORDER BY bucket;
 min | max | count | bucket
-----+-----+-------+--------
   7 |  23 |     7 |      1
  25 |  42 |     3 |      2
  52 |  55 |     3 |      3
  60 |  74 |     2 |      4
  85 |  92 |     4 |      5
  95 |  95 |     1 |      6
(6 rows)
From https://prestodb.io/docs/current/functions/window.html
Have a look at ntile():
ntile(n) → bigint
Divides the rows for each window partition into n buckets ranging from 1 to at most n. Bucket values will differ by at most 1. If the number of rows in the partition does not divide evenly into the number of buckets, then the remainder values are distributed one per bucket, starting with the first bucket.
For example, with 6 rows and 4 buckets, the bucket values would be as follows: 1 1 2 2 3 4
Or, say, to rank each runner's 100m race times and find their personal best out of their 10 races:
SELECT
    NTILE(10) OVER (PARTITION BY runners ORDER BY racetimes)
FROM
    table
Your buckets are the same size. You just don't have data that accurately represents the end-points.
For instance, would 24 be in the first or second bucket? This is more notable for the ranges without any data, such as 75-83.
From https://www.oreilly.com/library/view/sql-in-a/9780596155322/re91.html
WIDTH_BUCKET( expression, min, max, buckets)
The buckets argument specifies the number of buckets to create over the range defined by min through max. min is inclusive, whereas max is not.
The maximum is not included, so a value equal to max falls into an extra overflow bucket. To avoid that, set
WIDTH_BUCKET( expression, min, max + 1, buckets)
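To see why the sixth bucket appears and why max + 1 removes it, here is a rough Python sketch of the bucket arithmetic for integer inputs. The function width_bucket below is a hand-written stand-in for illustration, not the database function itself; it assumes the documented behaviour that values below the lower bound map to 0 and values at or above the upper bound map to buckets + 1.

```python
from collections import Counter

def width_bucket(value, low, high, count):
    """Stand-in for SQL width_bucket on integers: [low, high) split into count equal buckets."""
    if value < low:
        return 0              # underflow bucket
    if value >= high:
        return count + 1      # overflow bucket (the upper bound is exclusive)
    return (value - low) * count // (high - low) + 1

# The 20 values from the question.
values = [7, 7, 15, 17, 18, 22, 23, 25, 29, 42,
          52, 52, 55, 60, 74, 85, 90, 90, 92, 95]
lo, hi = min(values), max(values)

# Upper bound = max(value): the single value 95 overflows into bucket 6.
original = dict(Counter(width_bucket(v, lo, hi, 5) for v in values))

# Upper bound = max(value) + 1: every value lands in buckets 1..5.
adjusted = dict(Counter(width_bucket(v, lo, hi + 1, 5) for v in values))

print(original)
print(adjusted)
```

Each bucket covers the same range of values in both cases (17.6 and 17.8 wide respectively); the min and max per bucket only look uneven because they report the data that happens to fall inside each range.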

How to do conditional count based on row value in SAS/SQL?

Re-uploading since there were some problems with my last post, and I did not know that we were supposed to post sample data. I'm fairly new to SAS, and I have a problem that I know how to solve in Excel but not in SAS. However, the dataset is too large to reasonably use in Excel.
I have four variables: id, year_start, groupname, test_score.
Sample data:
id year_start group_name test_score
1 19931231 Red 90
1 19941230 Red 89
1 19951231 Red 91
1 19961231 Red 92
2 19930630 Red 85
2 19940629 Red 87
2 19950630 Red 95
3 19950931 Blue 90
3 19960931 Blue 90
4 19930331 Red 95
4 19940331 Red 97
4 19950330 Red 98
4 19960331 Red 95
5 19931231 Red 96
5 19941231 Red 97
My goal is to produce a fractional ranked list by test_score for each year. I hoped to achieve this with PROC RANK and its FRACTION option, which would order the observations by test_score (highest is 1, 2nd highest is 2, and so on) and then divide by the total number of observations to give a fractional rank. Unfortunately, year_start differs widely from row to row. For each id/year combination, I want to look back one year from year_start and rank that observation against all other ids that have a year_start in that one-year range. I'm not interested in comparing by calendar year; the rank of each id should be relative to its own year_start. As a further complication, I would like the rank to be computed within each groupname.
PROC SQL is totally fine if someone has a SQL solution.
Using the above data, the ranks would be like this:
id year_start group_name test_score rank
1 19931231 Red 90 0.75
1 19941230 Red 89 0.8
1 19951231 Red 91 1
1 19961231 Red 92 1
2 19930630 Red 85 1
2 19940629 Red 87 0.8
2 19950630 Red 95 0.75
3 19950931 Blue 90 1
3 19960931 Blue 90 1
4 19930331 Red 95 1
4 19940331 Red 97 0.2
4 19950330 Red 98 0.2
4 19960331 Red 95 0.333
5 19931231 Red 96 0.25
5 19941231 Red 97 0.667
In order to calculate the rank for row 1,
we first exclude blue observations.
Then we count the number of observations that fall within a year before that year_start, 19931231 (so we have 4 observations).
We count how many of these observations have a higher test_score, and then add 1 to find the order of the current observation (So it is the 3rd highest).
Then, we divide the order by the total number to get the rank (3/4= 0.75).
In Excel, the formula for this variable would look something like this. Assume formula is for row 1 and there are 100 rows. id=A, year_start=B, groupname=C, and test_score=D:
=(1+countifs(D1:D100,">"&D1,
B1:B100,"<="&B1,
B1:B100,">"&B1-365.25,
C1:C100, C1))/
countifs(B1:B100,"<="&B1,
B1:B100,">"&B1-365.25,
C1:C100, C1)
Thanks so much for the help!
ahammond428
Your example isn't correct if I'm reading it correctly, so it's hard to know exactly what you're trying to do. But try the following and see if it works. You may need to tweak inequalities to be open or closed depending on whether you want to include one year to the date. Note that your year_start column needs to be imported in a SAS date format for this to work. Otherwise you can change it over with input(year_start, yymmdd8.).
proc sql;
    select distinct
        a.id,
        a.year_start,
        a.group_name,
        a.test_score,
        1+sum(case when b.test_score > a.test_score then 1 else 0 end) as rank_num,
        count(b.id) as rank_denom,
        calculated rank_num / calculated rank_denom as rank
    from testdata a left join testdata b
        on a.group_name = b.group_name
        and intnx('year',a.year_start,-1,'s') le b.year_start le a.year_start
    group by a.id, a.year_start, a.group_name, a.test_score
    order by id, year_start;
quit;
Note that I changed dates of 9/31 to 9/30 (since there is no 9/31), but left 3/30, 6/29, and 12/30 alone since perhaps that was intended, though the other dates seem to be quarter-end.
Consider correlated count subqueries in SQL:
DATA
data ranktable;
infile datalines missover;
input id year_start group_name $ test_score;
datalines;
1 19931231 Red 90
1 19941230 Red 89
1 19951231 Red 91
1 19961231 Red 92
2 19930630 Red 85
2 19940629 Red 87
2 19950630 Red 95
3 19950930 Blue 90
3 19960930 Blue 90
4 19930331 Red 95
4 19940331 Red 97
4 19950330 Red 98
4 19960331 Red 95
5 19931231 Red 96
5 19941231 Red 97
;
run;
data ranktable;
set ranktable;
format year_start date9.;
year_start = input(put(year_start,z8.),yymmdd8.);
run;
PROC SQL
Additional fields included for your review
proc sql;
    select r.id, r.year_start, r.group_name, r.test_score,
        put(intnx('year', r.year_start, -1, 's'), yymmdd10.) as year_ago,
        /* rows in the look-back window with a score at least as high (includes the row itself) */
        (select count(*) from ranktable sub
            where sub.test_score >= r.test_score
            and sub.group_name = r.group_name
            and sub.year_start <= r.year_start
            and sub.year_start >= intnx('year', r.year_start, -1, 's')) as num_rank,
        /* all rows in the look-back window for the same group */
        (select count(*) from ranktable sub
            where sub.group_name = r.group_name
            and sub.year_start <= r.year_start
            and sub.year_start >= intnx('year', r.year_start, -1, 's')) as denom_rank,
        calculated num_rank / calculated denom_rank as rank
    from ranktable r;
quit;
OUTPUT
You will notice a slight difference from your expected results. This may be due to the 365.25-day year you apply to every row, whereas SAS's intnx steps back one full calendar year, whose length in days changes from year to year.
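As a cross-check of the look-back logic described in the question (filter to the same group, keep rows whose year_start falls within the preceding year, count higher scores, add 1, divide by the window size), here is a small Python sketch over the sample data. The helper names are made up for illustration, the two 9/31 dates are entered as 9/30, and the window is assumed to be exclusive at the lower bound and inclusive at the upper bound, as in the Excel formula.

```python
from datetime import date

# Sample rows from the question: (id, year_start, group_name, test_score).
rows = [
    (1, date(1993, 12, 31), "Red", 90), (1, date(1994, 12, 30), "Red", 89),
    (1, date(1995, 12, 31), "Red", 91), (1, date(1996, 12, 31), "Red", 92),
    (2, date(1993, 6, 30), "Red", 85), (2, date(1994, 6, 29), "Red", 87),
    (2, date(1995, 6, 30), "Red", 95),
    (3, date(1995, 9, 30), "Blue", 90), (3, date(1996, 9, 30), "Blue", 90),
    (4, date(1993, 3, 31), "Red", 95), (4, date(1994, 3, 31), "Red", 97),
    (4, date(1995, 3, 30), "Red", 98), (4, date(1996, 3, 31), "Red", 95),
    (5, date(1993, 12, 31), "Red", 96), (5, date(1994, 12, 31), "Red", 97),
]

def one_year_before(d):
    """Same calendar date one year earlier (Feb 29 falls back to Feb 28)."""
    try:
        return d.replace(year=d.year - 1)
    except ValueError:
        return d.replace(year=d.year - 1, day=28)

def fractional_rank(row):
    """(1 + number of higher scores in the window) / window size."""
    _, start, group, score = row
    lower = one_year_before(start)
    # Assumption: lower bound exclusive, upper bound inclusive.
    window = [r for r in rows if r[2] == group and lower < r[1] <= start]
    higher = sum(1 for r in window if r[3] > score)
    return (1 + higher) / len(window)

for row in rows:
    print(row[0], row[1].isoformat(), row[2], row[3], round(fractional_rank(row), 3))
```

Changing the comparison operators in the window filter is the place to experiment if you want the one-year boundary itself included, which is the same tweak the first answer mentions for the SQL inequalities.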

Retrieve value from different fields for each record of an Access table

I would be more than appreciative for some help here, as I have been having some serious problems with this.
Background:
I have a list of unique records. For each record I have a monotonically increasing pattern (either A, B or C), and a development position (1 to 5) assigned to it.
So each of the 3 patterns is set out in five fields representing the development period.
Problem:
I need to retrieve the percentages relating to the relevant development periods, from different fields for each row. It should be in a single column called "Output".
Example:
Apologies, not sure how to attach a table here, but the fields are below, the table is a transpose of these fields.
ID - (1,2,3,4,5)
Pattern - (A, B, C, A, C)
Dev - (1,5,3,4,2)
1 - (20%, 15%, 25%, 20%, 25%)
2 - (40%, 35%, 40%, 40%, 40%)
3 - (60%, 65%, 60%, 60%, 60%)
4 - (80%, 85%, 65%, 80%, 65%)
5 - (100%, 100%, 100%, 100%, 100%)
Output - (20%, 100%, 60%, 80%, 40%)
In MS Excel, I could simply use a HLOOKUP or OFFSET function to do this. But how do I do this in Access? The best I have come up with so far is Output: Eval([Category]) but this doesn't seem to achieve what I want which is to select the "Dev" field, and treat this as a field when building an expression.
In practice, I have more than 100 development periods to play with, and over 800 different patterns, so "switch" methods can't work here I think.
Thanks in advance,
alch84
Assuming that
[ID] is a unique column (primary key), and
the source column for [Output] only depends on the value of [Dev]
then this seems to work:
UPDATE tblAlvo SET Output = DLOOKUP("[" & Dev & "]", "tblAlvo", "ID=" & ID)
Before:
ID Pattern Dev 1 2 3 4 5 Output
-- ------- --- -- -- -- -- --- ------
1 A 1 20 40 60 80 100
2 B 5 15 35 65 85 100
3 C 3 25 40 60 65 100
4 A 4 20 40 60 80 100
5 C 2 25 40 60 65 100
After:
ID Pattern Dev 1 2 3 4 5 Output
-- ------- --- -- -- -- -- --- ------
1 A 1 20 40 60 80 100 20
2 B 5 15 35 65 85 100 100
3 C 3 25 40 60 65 100 60
4 A 4 20 40 60 80 100 80
5 C 2 25 40 60 65 100 40
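The idea behind the DLookup call is that the field name is built per row from the value of [Dev]. A minimal Python sketch of that same per-row lookup, using the five example records, may make the mechanism easier to see. The dictionary layout is only an illustration of the table, not how Access stores it.

```python
# The five example records; the keys "1".."5" stand for the development-period fields.
rows = [
    {"ID": 1, "Pattern": "A", "Dev": 1, "1": 20, "2": 40, "3": 60, "4": 80, "5": 100},
    {"ID": 2, "Pattern": "B", "Dev": 5, "1": 15, "2": 35, "3": 65, "4": 85, "5": 100},
    {"ID": 3, "Pattern": "C", "Dev": 3, "1": 25, "2": 40, "3": 60, "4": 65, "5": 100},
    {"ID": 4, "Pattern": "A", "Dev": 4, "1": 20, "2": 40, "3": 60, "4": 80, "5": 100},
    {"ID": 5, "Pattern": "C", "Dev": 2, "1": 25, "2": 40, "3": 60, "4": 65, "5": 100},
]

# For each record, treat the value of Dev as the name of the field to read,
# which is what "[" & Dev & "]" does in the DLookup expression.
for row in rows:
    row["Output"] = row[str(row["Dev"])]

outputs = [row["Output"] for row in rows]
print(outputs)
```

Because the field name is computed from the data, this scales to 100+ development periods without a Switch or nested IIf expression.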

Select every ten steps SQL

I have the following table:
----------------------------------------------
oNumber oValue1
----------------------------------------------
1 54
2 44
3 89
4 65
ff.
10 33
11 22
ff.
20 43
21 76
ff.
100 45
I want to select every 10th value in oNumber. So the result should be:
----------------------------------------------
oNumber oValue1
----------------------------------------------
10 33
20 43
ff.
100 45
Also, oNumber is not a sequence number. It's just a value. Even though it isn't a sequence number, 10, 20, 30 and so on will always appear in the oNumber field.
Does anyone know the T-SQL for this case?
Thank you.
select * from table where oNumber % 10 = 0
https://msdn.microsoft.com/en-us/library/ms190279.aspx
Use the "Modulo" operator - %. So in this case, the answer would be something like:
SELECT * FROM table WHERE oNumber % 10 = 0
This returns only the rows where oNumber is divisible by ten (and therefore has a remainder of zero).
In the case you simply want multiples of 10, then just use the modulo operator as stated by Daniel and Ian.
select *
from table
where oNumber % 10 = 0;
However, I felt that you could be alluding to wanting every 10th item in your list. If that's the case (it may not be), you would number the rows in oNumber order and apply the modulo operator to that row number instead.
select *
from (
    select *,
        RowNum = row_number() over (order by oNumber)
    from table) a
where RowNum % 10 = 0;
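The two queries only differ when oNumber has gaps. A short Python sketch with made-up oNumber values (1 to 40 with 4, 7 and 15 missing, purely hypothetical data) shows the difference between filtering on the value and filtering on the row position.

```python
# Hypothetical oNumber values with gaps: 1..40 without 4, 7 and 15.
o_numbers = [n for n in range(1, 41) if n not in (4, 7, 15)]

# Equivalent of: WHERE oNumber % 10 = 0 (filter on the value itself).
multiples_of_ten = [n for n in o_numbers if n % 10 == 0]

# Equivalent of: row_number() over (order by oNumber), then RowNum % 10 = 0
# (filter on the position in the sorted list).
every_tenth_row = [
    n for row_num, n in enumerate(sorted(o_numbers), start=1)
    if row_num % 10 == 0
]

print(multiples_of_ten)
print(every_tenth_row)
```

Since the question says 10, 20, 30 and so on always appear in oNumber, the plain modulo filter on the value is the one that matches the expected result.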

why Order by does not sort?

I have a query below, I want it to sort the data by id, but it doesn't sort at all.
Select distinct ec.category,ec.id
from print ec
order by ec.id asc
What could be the reason?
this is the output :
Looking at your data, the column data type is a varchar, aka 'text'.
If it is text, it sorts like text, according to the place the character occurs in the character set used.
So each value is compared on its first character, then the second, and so on. That is why 2 comes after 11.
Either make the column a numeric data type, like number, or use to_number in the sorting:
select distinct ec.category,ec.id
from print ec
order by to_number(ec.id)
The difference lies in the way varchar and number are sorted. In your case, since you have used a varchar data type to store numbers, the sorting is done character by character on the ASCII values.
NUMBERS when sorted as STRING
SQL> WITH DATA AS(
2 SELECT LEVEL rn FROM dual CONNECT BY LEVEL <= 11
3 )
4 SELECT rn, ascii(rn) FROM DATA
5 order by ascii(rn)
6 /
RN ASCII(RN)
---------- ----------
1 49
11 49
10 49
2 50
3 51
4 52
5 53
6 54
7 55
8 56
9 57
11 rows selected.
SQL>
As you can see, the sorting is based on the ASCII values.
NUMBER when sorted as a NUMBER itself
SQL> WITH DATA AS(
2 SELECT LEVEL rn FROM dual CONNECT BY LEVEL <= 11
3 )
4 SELECT rn, ascii(rn) FROM DATA
5 ORDER BY rn
6 /
RN ASCII(RN)
---------- ----------
1 49
2 50
3 51
4 52
5 53
6 54
7 55
8 56
9 57
10 49
11 49
11 rows selected.
SQL>
How to fix the issue?
Change the data type to NUMBER. As a workaround, you could use to_number.
Using to_number -
SQL> WITH DATA AS(
2 SELECT to_char(LEVEL) rn FROM dual CONNECT BY LEVEL <= 11
3 )
4 SELECT rn, ascii(rn) FROM DATA
5 ORDER BY to_number(rn)
6 /
RN ASCII(RN)
--- ----------
1 49
2 50
3 51
4 52
5 53
6 54
7 55
8 56
9 57
10 49
11 49
11 rows selected.
SQL>
Make sure the type of your "id" column is int (integer = number).
Right now it is probably text, char or varchar (types for text and strings).
Text columns are sorted alphabetically, not numerically, which is what you are seeing now.
When you sort a string column that holds integer values, it produces a result like:
1
10
11
2
21 etc...
Hence, change your id data type to int/bigint.
Alternatively, you can just cast/convert the data type in the query:
Select distinct ec.category,ec.id
from print ec
order by cast(ec.id as int) asc
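The effect is easy to reproduce outside Oracle. The sketch below uses Python's built-in sqlite3 module with a text id column; the table and column names follow the question, while the sample ids and category values are made up for illustration.

```python
import sqlite3

# In-memory table where id is stored as text, as suspected in the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE print (category TEXT, id TEXT)")
conn.executemany(
    "INSERT INTO print VALUES (?, ?)",
    [("a", "1"), ("b", "2"), ("c", "10"), ("d", "11"), ("e", "21")],
)

# Sorting the text column compares character by character.
as_text = [r[0] for r in conn.execute("SELECT id FROM print ORDER BY id")]

# Casting to an integer in the ORDER BY restores numeric order.
as_number = [
    r[0] for r in conn.execute("SELECT id FROM print ORDER BY CAST(id AS INTEGER)")
]
conn.close()

print(as_text)
print(as_number)
```

The cast only changes the sort key; the selected id values are still returned as text, which matches the to_number and cast workarounds suggested above.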