Data sorting and grouping in Oracle - sql

Someone mistakenly entered negative values into a unique key column long ago, and now I have to group the data, selecting the max of ID per category, to extract a report. The ID column now has both positive and negative values.
The MAX(ID) function does not return the ID I need when negative values are involved.
ID Category
45678 A
234567 B
-4578 A
-45798 A
-7890 C
-8904 C
-7654 C
Expected output is:
ID Category
45678 A
234567 B
-8904 C

"So ID with largest negative values will have latest data before 2010
and id with positive values are created after 2010"
That means in case there are positive IDs for a category you want the maximum (e.g. 45678 for category A) and otherwise the minimum (e.g. -8904 for category C). You can use Oracle's KEEP FIRST/LAST for this:
select
category,
max(id) keep (dense_rank last order by sign(id), abs(id)) as id
from mytable
group by category
order by category;
This sorts your IDs by sign (negative before positive, so positive IDs win whenever a category has any) and then by absolute value (so within that sign the ID with the largest magnitude comes last). KEEP LAST then picks that last row.
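To make the ordering concrete, here is a minimal Python sketch that emulates the same selection on the sample rows from the question. It is not Oracle code; the helper names `sign` and `latest_id_per_category` are made up for this illustration.

```python
from collections import defaultdict

# Sample rows copied from the question: (id, category).
rows = [
    (45678, "A"), (234567, "B"), (-4578, "A"), (-45798, "A"),
    (-7890, "C"), (-8904, "C"), (-7654, "C"),
]


def sign(n):
    """Mirror SQL SIGN(): returns -1, 0 or 1."""
    return (n > 0) - (n < 0)


def latest_id_per_category(rows):
    """Per category, pick the ID that sorts last by (sign, absolute value)."""
    by_category = defaultdict(list)
    for id_, category in rows:
        by_category[category].append(id_)
    # max() with this key returns the element that would sort last, which is
    # the row KEEP (DENSE_RANK LAST ...) selects here because IDs are unique.
    return {
        category: max(ids, key=lambda i: (sign(i), abs(i)))
        for category, ids in sorted(by_category.items())
    }


result = latest_id_per_category(rows)
print(result)
```

For category A the positive ID wins over both negative ones; for category C, which has only negative IDs, the one with the largest magnitude is chosen.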

Related

Calculating the mode/median/most frequent observation in categorical variables in SQL impala

I would like to calculate the mode/median or better, most frequent observation of a categorical variable within my query.
E.g., if the variable has the following string values: dog, dog, dog, cat, cat, I want to get dog since it's 3 vs. 2.
Is there any function that does that? I tried APPX_MEDIAN() but it only returns the first 10 characters as median and I do not want that.
Also, if there is a tie, I would like to break it by date when picking the most frequent observation.
Thank you!
The most frequent observation is the mode.
A single-value mode can be calculated on a value column by counting each value and picking the row with the highest count:
select count(*),value from mytable group by value order by 1 desc limit 1
Now, in case you have multiple modes, you need to join the per-value counts back to the highest count to find all matches.
select orig.v as value from
(select count(*) c, value v from mytable group by value) orig
join (select count(*) cmode from mytable group by value order by 1 desc limit 1) cmode
ON orig.c = cmode.cmode
This computes the count of every value and then matches each one against the highest count. If only one value reaches the highest count, you get 1 row; if two values reach it, you get 2 rows, and so on.
Calculating the median is a little trickier. It gives you the middle value, which is not necessarily the most frequent one.
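The tie-aware mode query can be tried out with a small sketch like the following. It uses Python's built-in sqlite3 module rather than Impala, and the sample rows are invented so that two values tie; only the table and column names (`mytable`, `value`) come from the answer.

```python
import sqlite3

# Hypothetical sample data: "dog" and "cat" tie with two rows each.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (value TEXT)")
conn.executemany(
    "INSERT INTO mytable (value) VALUES (?)",
    [("dog",), ("dog",), ("cat",), ("cat",), ("bird",)],
)

# Count every value, then keep the values whose count equals the top count.
ALL_MODES_SQL = """
SELECT orig.v
FROM (SELECT COUNT(*) AS c, value AS v FROM mytable GROUP BY value) AS orig
JOIN (SELECT COUNT(*) AS cmode FROM mytable GROUP BY value
      ORDER BY 1 DESC LIMIT 1) AS top
  ON orig.c = top.cmode
ORDER BY orig.v
"""

modes = [row[0] for row in conn.execute(ALL_MODES_SQL)]
print(modes)
```

With a single most frequent value the same query returns one row, so it can stand in for the simpler LIMIT 1 version as well.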

SQL Query for the closest value (HELP!!)

I have created a table in the database into which I have loaded a list of calibers with their corresponding prices.
I need a query which, when you enter a caliber (which is not necessarily in the caliber table), displays the price corresponding to the table key nearest to the one that was entered.
The table of calibers by price:
For example, based on this table of calibers: if I enter the value 1.47 as the caliber, it must return the price corresponding to the 1.5 caliber, and if I enter the value 1.41, it must return the price corresponding to the 1.4 caliber.
I would consider something like the following:
SELECT *
FROM
(
SELECT mt.*, RANK() OVER (ORDER BY ABS(caliber-1.41)) rn
FROM mytable mt
)
WHERE rn = 1
This calculates the difference between caliber and 1.41 using ABS for absolute value (to get closest without caring whether it is bigger or smaller). The WHERE rn = 1 then limits to the rows with the smallest difference.
Note that this assumes that if there are two rows that are equally far from your number, you want to return them both. If you want to arbitrarily pick one in the event of a tie I would replace RANK with ROW_NUMBER.
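As a rough check of the idea, the sketch below runs the same RANK() pattern in SQLite through Python's sqlite3 module (window functions need SQLite 3.25 or newer). The original price table is not shown in the question, so the calibers and prices here are invented, and `closest` is a hypothetical helper name.

```python
import sqlite3

# Invented sample data; the question's actual price table is not available.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (caliber REAL, price REAL)")
conn.executemany(
    "INSERT INTO mytable (caliber, price) VALUES (?, ?)",
    [(1.3, 10.0), (1.4, 12.0), (1.5, 15.0), (1.6, 19.0)],
)

# Rank rows by distance to the target; rank 1 is the nearest caliber.
CLOSEST_SQL = """
SELECT caliber, price
FROM (
    SELECT mt.*, RANK() OVER (ORDER BY ABS(caliber - :target)) AS rn
    FROM mytable mt
) AS ranked
WHERE rn = 1
"""


def closest(target):
    """Return every (caliber, price) row at the smallest distance to target."""
    return conn.execute(CLOSEST_SQL, {"target": target}).fetchall()


print(closest(1.47))
print(closest(1.41))
```

Swapping RANK() for ROW_NUMBER() in `CLOSEST_SQL` would return a single arbitrary row when two calibers are equally close.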

SQL: Apply sequence number to a column based on nth occurrence of each distinct value

I have a table with a column of values where each value occurs a variable number of times (i.e., one value may occur 1 time, and another value may occur 3 times). I need to add a column that identifies the occurrence sequence # of its corresponding value.
Input Table
SOURCE_VAL
a
a
b
c
c
c
Output table
SEQUENCE_VAL SOURCE_VAL
1 a
2 a
1 b
1 c
2 c
3 c
What would the SQL for this be to generate the SEQUENCE_VAL column based on SOURCE_VAL?
You are looking for row_number(). Without an ordering column, you can use:
select t.*,
row_number() over (partition by source_val order by source_val) as sequence_val
from t
order by source_val, sequence_val;
Note: This assumes that you do not care about the ordering of the value. If you have another column that does specify the ordering for each source_val, then use that in the order by.
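A quick way to see the numbering is to run the query against the sample input. This sketch uses Python's sqlite3 module (SQLite 3.25+ for window functions) as a stand-in for whatever database the question targets; the table name `t` follows the answer.

```python
import sqlite3

# Load the question's input column: a, a, b, c, c, c.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (source_val TEXT)")
conn.executemany("INSERT INTO t (source_val) VALUES (?)", [(v,) for v in "aabccc"])

# Number the rows within each source_val partition, restarting at 1 per value.
rows = conn.execute(
    """
    SELECT ROW_NUMBER() OVER (PARTITION BY source_val ORDER BY source_val)
               AS sequence_val,
           source_val
    FROM t
    ORDER BY source_val, sequence_val
    """
).fetchall()

for sequence_val, source_val in rows:
    print(sequence_val, source_val)
```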

How can I reduce complexity? Data preparation, SQL + Tableau

I need to prepare some data to connect to tableau, and I'm struggling because the size of the data is too much for tableau to handle, so I'm looking for ideas to code this efficiently in SQL.
Setup:
I have 2 million users
There are 30 different categories, and each user can fall into many. For example:
User 1 - Category A, B and C
User 2 - Category F
User 3 - Category A, B
What I want:
Select three categories and assign priority 1, priority 2 and priority 3
This selection is not static, so today I may choose A, B, C but tomorrow those categories can be D, G, A
So if I have:
Priority 1: A
Priority 2: B
Priority 3: C
I want the number of users who fall into category A
I want the number of users who fall into category B AND are not in category A
I want the number of users who fall into category C AND are not in category A or B
My original idea was to create a table with one row per user and one yes/no column per category, and then aggregate, but still the size of the final table is too huge for tableau to handle.
Any ideas?
Update: My idea is to prepare a table with aggregated numbers and a few thousand rows max, so that it can be processed with tableau
You can assign each of the 30 categories a unique position from 1 to 30. Each user is then assigned a 30-digit binary number with a 1 in the position of every category the user falls into. This binary number can be converted to a decimal number, the greatest of which is 2^30-1 = 1,073,741,823, i.e. a 10-digit number that can be stored without exponential notation.
To see which categories a user falls into, apply the reverse conversion, i.e. decimal to binary, and then to a string left-padded with zeros. In this string you can look for 1s at the desired positions.
I think you can try this methodology.
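One possible reading of this encoding, sketched in Python: each category gets one bit, users are reduced to counts per distinct bitmask, and the priority counts are derived from that small aggregate. The category labels beyond those in the question, users 4 and 5, and the function names are assumptions made for the illustration. Bit tests replace the string conversion described above, which gives the same information.

```python
import string
from collections import Counter

# 30 hypothetical category labels; only their bit positions (0..29) matter.
CATEGORIES = list(string.ascii_uppercase) + ["AA", "AB", "AC", "AD"]
BIT = {name: 1 << pos for pos, name in enumerate(CATEGORIES)}


def encode(categories):
    """Turn a user's category list into one integer bitmask."""
    mask = 0
    for name in categories:
        mask |= BIT[name]
    return mask


def priority_counts(mask_counts, priorities):
    """Credit each mask's users to the first priority category it contains."""
    result = {name: 0 for name in priorities}
    for mask, users in mask_counts.items():
        for name in priorities:
            if mask & BIT[name]:
                result[name] += users
                break  # already counted under a higher priority
    return result


# Users 1-3 come from the question; users 4 and 5 are made up.
users = {1: ["A", "B", "C"], 2: ["F"], 3: ["A", "B"], 4: ["B"], 5: ["C", "F"]}

# The aggregate: one entry per distinct category combination actually present.
mask_counts = Counter(encode(cats) for cats in users.values())

print(priority_counts(mask_counts, ["A", "B", "C"]))
```

Because the priorities are applied to the aggregated masks rather than to individual users, changing the selection (say to D, G, A) only requires rerunning `priority_counts`, not rebuilding the table.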

Generate random records from the table tblFruit based on the field Type

I need your help to generate random records from the table tblFruit based on the field Type (without duplication).
As per the above table.
There are 4 types of fruit, numbered 1, 2, 3, 4.
I want to generate x records dynamically from the table tblFruit (e.g. 7 records).
Let's say I need to get 7 random fruit records.
My result should contain fruit of the different types. However, we need to ensure that the result contains only 7 records.
i.e
2 records of type 1,
2 records of type 2,
2 records of type 3,
1 record of type 4
e.g
Note: If I want to generate 10 records (without duplication),
then I will get 2 records of each type and the two remaining records randomly from any type.
Much grateful for your help.
I might suggest:
select top (7) f.*
from tblfruit f
order by row_number() over (partition by type order by newid());
This will actually produce a result with approximately the same number of rows of each type (well, off by 1), but that meets your needs.
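The round-robin effect of that ORDER BY can be checked with a sketch along these lines. It runs on SQLite via Python's sqlite3 module (3.25+ for window functions), so RANDOM() and LIMIT stand in for NEWID() and TOP, and the row number is computed in a subquery. The fruit rows and the helper names `sample_fruit` and `type_counts` are invented for the example.

```python
import sqlite3
from collections import Counter

# Invented data: 5 fruits for each of the 4 types.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tblFruit (id INTEGER PRIMARY KEY, name TEXT, type INTEGER)"
)
conn.executemany(
    "INSERT INTO tblFruit (name, type) VALUES (?, ?)",
    [(f"fruit_{t}_{n}", t) for t in (1, 2, 3, 4) for n in range(5)],
)

# Shuffle within each type, then take all first picks, all second picks, etc.
SAMPLE_SQL = """
SELECT id, name, type
FROM (
    SELECT f.*,
           ROW_NUMBER() OVER (PARTITION BY type ORDER BY RANDOM()) AS rn
    FROM tblFruit f
) AS numbered
ORDER BY rn, RANDOM()
LIMIT ?
"""


def sample_fruit(limit):
    """Return `limit` distinct random rows spread evenly across types."""
    return conn.execute(SAMPLE_SQL, (limit,)).fetchall()


def type_counts(rows):
    """Sorted list of how many sampled rows each type received."""
    return sorted(Counter(row[2] for row in rows).values())


picked = sample_fruit(7)
print(picked)
print(type_counts(picked))
```

The secondary `RANDOM()` in the outer ORDER BY is an addition in this sketch: it makes the choice of which types receive the leftover rows random instead of leaving it to the engine's tie order.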