SELECT DISTINCT is not working - sql

Let's say I have a table name TableA with the below partial data:
LOOKUP_VALUE LOOKUPS_CODE LOOKUPS_ID
------------ ------------ ----------
5% 120 1001
5% 121 1002
5% 123 1003
2% 130 2001
2% 131 2002
I wanted to select only 1 row of 5% and 1 row of 2% as a view using DISTINCT but it fail, my query is:
SELECT DISTINCT lookup_value, lookups_code
FROM TableA;
The above query give me the result as shown below.
LOOKUP_VALUE LOOKUPS_CODE
------------ ------------
5% 120
5% 121
5% 123
2% 130
2% 131
But that is not my expected result, mt expected result is shown below:
LOOKUP_VALUE LOOKUPS_CODE
------------ ------------
5% 120
2% 130
May I know how can I achieve this without specifying any WHERE clause?
Thank you!

I think you're misunderstanding the scope of DISTINCT: it will give your distinct rows, not just distinct on the first field.
If you want one row for each distinct LOOKUP_VALUE, you either need a WHERE clause that will work out which one of them to show, or an aggregation strategy with a GROUP BY clause plus logic in the SELECT that tells the query how to aggregate the other columns (e.g. AVG, MAX, MIN)

Here's my guess at your problem - when you say
"The above query give me the result as shown in the data table above."
this is simply not true - please try it and update your question accordingly.
I am speculating here: I think you are trying to use "Distinct" but also output the other fields. If you run:
select distinct Field1, Field2, Field3 ...
Then your output will be "one row per distinct combination" of the 3 fields.
Try GROUP BY instead - this will let you select the Max, Min, Sum of other fields while still yielding "one row per unique combined values" for fields included in GROUP BY
example below uses your table to return one row per LOOKUP_VALUE and then the max and min of the remaining fields and the count of total records using your data:
select
LOOKUP_VALUE, min( LOOKUPS_CODE) LOOKUPS_CODE_min, max( LOOKUPS_CODE) LOOKUPS_CODE_max, min( LOOKUPS_ID) LOOKUPS_ID_min, max( LOOKUPS_ID) LOOKUPS_ID_max, Count(*) Record_Count
From TableA
Group by LOOKUP_VALUE

I wanted to select only 1 row of 5% and 1 row of 2%
This will get the lowest value lookups_code for each lookup_value:
SELECT lookup_value,
lookups_code
FROM (
SELECT lookup_value,
lookups_code,
ROW_NUMBER() OVER ( PARTITION BY lookup_value ORDER BY lookups_code ) AS rn
FROM TableA
)
WHERE rn = 1
You could also use GROUP BY:
SELECT lookup_value,
MIN( lookups_code ) AS lookups_code
FROM TableA
GROUP BY lookup_value

How about the MIN() function
I believe this works for your desired output, but am currently not able to test it.
SELECT Lookup_Value, MIN(LOOKUPS_CODE)
FROM TableA
GROUP BY Lookup_Value;

I'm going to take a total shot in the dark on this one, but because of the way you have named your fields it implies you are attempting to mimic the vlookup function within Microsoft Excel. If this is the case, the behavior when there are multiple matches is to pick the first match. As arbitrary as that sounds, it's the way it works.
If this is what you want, AND the first value is not necessarily the lowest (or highest, or best looking, or whatever), then the row_number aggregate function would probably suit your needs.
I give you a caveat that my ordering criteria is based on the database row number, which could conceivably be different than what you think. If, however, you insert them into a clean table (with a reset high water mark), then I think it's a pretty safe bet it will behave the way you want. If not, then you are better off including a field explicitly to tell it what order you want the choice to occur.
with cte as (
select
vlookup_value,
vlookups_code,
row_number() over (partition by vlookup_value order by rownum) as rn
from
TableA
)
select
vlookup_value, vlookups_code
from cte
where rn = 1

Related

Filtering Rows in SQL

My data looks like this: Number(String), Number2(String), Transaction Type(String), Cost(Integer)
enter image description here
For number 1, Cost 10 and -10 cancel out so the remaining cost is 100
For number 2, Cost 50 and -50 cancel out, Cost 87 and -87 cancel out
For number 3, Cost remains 274
For number 4, Cost 316 and -316 cancel out, 313 remains as the cost
The output I am looking for Looks like this:
How do I do this in SQL?
I have tried "sum(price)" and group by "number", but oracle doesn't let me get results because of other columns
https://datascience.stackexchange.com/questions/47572/filtering-unique-row-values-in-sql
When you're doing an aggregate query, you have to pick one value for each column - either by including it in the group by, or wrapping it in an aggregate function.
It's not clear what you want to display for columns 2 and 3 in your output, but from your example data it looks like you're taking the MAX, so that's what I did here.
select number, max(number2), max(transaction_type), sum(cost)
from my_data
group by number
having sum(cost) <> 0;
Oracle has very nice functionality equivalent toe first() . . . but the syntax is a little cumbersome:
select number,
max(number2) keep (dense_rank first order by cost desc) as number2,
max(transaction_type) keep (dense_rank first order by cost desc) as transaction_type,
max(cost) as cost
from t
group by number;
In my experience, keep has good performance characteristics.
You're almost there... you'll need to get the sum for each number without the other columns and then join back to your table.
select * from table t
join
(select number,sum(cost)
from table
group by number) sums on sums.number=t.number
You can use correlated subquery :
select t.*
from table t
where t.cost = (select sum(t1.cost) from table t1 where t1.number = t.number);

How to get the biggest column value between duplicated rows id?

I am working on an Oracle 11g database query that needs to retrieve a list of the highest NUM value between duplicated rows in a table.
Here is an example of my context:
ID | NUM
------------
1 | 1111
1 | 2222
2 | 3333
2 | 4444
3 | 5555
3 | 6666
And here is the result I am expecting after the query is executed:
NUM
----
2222
4444
6666
I know how to get the GREATEST value in a list of numbers, but I have absolutely no guess on how to group two lines, fetch the biggest column value between them IF they have the same ID.
Programmaticaly it is something quite easy to achieve, but using SQL it tends to be a litle bit less intuitive for me. Any suggestion or advise is welcomed as I don't even know which function could help me doing this in Oracle.
Thank you !
This is the typical use case for a GROUP BY. Assuming your Num field can be compared:
SELECT ID, MAX(NUM) as Max
FROM myTable
GROUP BY ID
If you don't want to select the ID (as in the output you provided), you can run
SELECT Max
FROM (
SELECT ID, MAX(NUM) as Max
FROM myTable
GROUP BY ID
) results
And here is the SQL fiddle
Edit : if NUM is, as you mentioned later, VARCHAR2, then you have to cast it to an Int. See this question.
The most efficient way I would suggest is
SELECT ids,
value
FROM (SELECT ids,
value,
max(value)
over (
PARTITION BY ids) max_value
FROM test)
WHERE value = max_value;
This requires that the query maintain a single value per id of the maximum value encountered so far. If a new maximum is found then the existing value is modified, otherwise the new value is discarded. The total number of elements that have to be held in memory is related to the number of ids, not the number of rows scanned.
See this SQLFIDDLE

sql statement to select previous rows to a search param

Im after an sql statement (if it exists) or how to set up a method using several sql statements to achieve the following.
I have a listbox and a search text box.
in the search box, user would enter a surname e.g. smith.
i then want to query the database for the search with something like this :
select * FROM customer where surname LIKE searchparam
This would give me all the results for customers with surname containing : SMITH . Simple, right?
What i need to do is limit the results returned. This statement could give me 1000's of rows if the search param was just S.
What i want is the result, limited to the first 20 matches AND the 10 rows prior to the 1st match.
For example, SMI search:
Sives
Skimmings
Skinner
Skipper
Slater
Sloan
Slow
Small
Smallwood
Smetain
Smith ----------- This is the first match of my query. But i want the previous 10 and following 20.
Smith
Smith
Smith
Smith
Smoday
Smyth
Snedden
Snell
Snow
Sohn
Solis
Solomon
Solway
Sommer
Sommers
Soper
Sorace
Spears
Spedding
Is there anyway to do this?
As few sql statements as possible.
Reason? I am creating an app for users with slow internet connections.
I am using POSTGRESQL v9
Thanks
Andrew
WITH ranked AS (
SELECT *, ROW_NUMBER() over (ORDER BY surname) AS rowNumber FROM customer
)
SELECT ranked.*
FROM ranked, (SELECT MIN(rowNumber) target FROM ranked WHERE surname LIKE searchparam) found
WHERE ranked.rowNumber BETWEEN found.target - 10 AND found.target + 20
ORDER BY ranked.rowNumber
SQL Fiddle here. Note that the fiddle uses the example data, and I modified the range to 3 entries before and 6 entries past.
I'm assuming that you're looking for a general algorithm ...
It sounds like you're looking for a combination of finding the matches "greater than or equal to smith", and "less than smith".
For the former you'd order by surname and limit the result to 20, and for the latter you'd order by surname descending and limit to 10.
The two result sets can then be added together as arrays and reordered.
I think you need to use ROW_NUMBER() (see this link).
WITH cust1 AS (
SELECT *, ROW_NUMBER() OVER (ORDER BY surname) as numRow FROM customer
)
SELECT c1.surname, c1.numRow, x.flag
FROM cust1 c1, (SELECT *,
case when numRow = (SELECT MIN(numRow) FROM cust1 WHERE surname='Smith') then 1 else 0 end as flag
FROM cust1) x
WHERE x.flag = 1 and c1.numRow BETWEEN x.numRow - 1 AND x.numRow + 1
ORDER BY c1.numRow
SQLFiddle here.
This works, but the flag finally isn't necessary and it would be a query like PinnyM posts.
A variation on #PinnyM's solution:
WITH ranked AS (
SELECT
*,
ROW_NUMBER() over (ORDER BY surname) AS rowNumber
FROM customer
),
minrank AS (
SELECT
*,
MIN(CASE WHEN surname LIKE searchparam THEN rowNumber END) OVER () AS target
FROM ranked
)
SELECT
surname
FROM minrank
WHERE rowNumber BETWEEN target - 10 AND target + 20
;
Instead of two separate calls to the ranked CTE, one to get the first match's row number and the other to read the results from, another CTE is introduced to serve both purposes. Can't speak for PostgreSQL but in SQL Server this might result in a better execution plan for the query, although in either case the real efficiency would still need to be verified by proper testing.

Adding a percent column to MS Access Query

I'm trying to add a column which calculates percentages of different products in MS Access Query. Basically, this is the structure of the query that I'm trying to reach:
Product |
Total |
Percentage
Prod1 |
15 |
21.13%
Prod2 |
23 |
32.39%
Prod3 |
33 |
46.48%
Product |
71 |
100%
The formula for finding the percent I use is: ([Total Q of a Product]/[Totals of all Products])*100, but when I try to use the expression builder (since my SQL skills are basic) in MS Access to calculate it..
= [CountOfProcuts] / Sum([CountOfProducts])
..I receive an error message "Cannot have aggregate function in GROUP BY clause.. (and the expression goes here)". I also tried the option with two queries: one that calculates only the totals and another that use the first one to calculate the percentages, but the result was the same.
I'll be grateful if someone can help me with this.
You can get all but the last row of your desired output with this query.
SELECT
y.Product,
y.Total,
Format((y.Total/sub.SumOfTotal),'#.##%') AS Percentage
FROM
YourTable AS y,
(
SELECT Sum(Total) AS SumOfTotal
FROM YourTable
) AS sub;
Since that query does not include a JOIN or WHERE condition, it returns a cross join between the table and the single row of the subquery.
If you need the last row from your question example, you can UNION the query with another which returns the fabricated row you want. In this example, I used a custom Dual table which is designed to always contain one and only one row. But you could substitute another table or query which returns a single row.
SELECT
y.Product,
y.Total,
Format((y.Total/sub.SumOfTotal),'#.##%') AS Percentage
FROM
YourTable AS y,
(
SELECT Sum(Total) AS SumOfTotal
FROM YourTable
) AS sub
UNION ALL
SELECT
'Product',
DSum('Total', 'YourTable'),
'100%'
FROM Dual;

How to get a value from previous result row of a SELECT statement?

If we have a table called FollowUp and has rows [ ID(int) , Value(Money) ]
and we have some rows in it, for example
ID --Value
1------70
2------100
3------150
8------200
20-----250
45-----280
and we want to make one SQL Query that get each row ID,Value and the previous Row Value in which data appear as follow
ID --- Value ---Prev_Value
1 ----- 70 ---------- 0
2 ----- 100 -------- 70
3 ----- 150 -------- 100
8 ----- 200 -------- 150
20 ---- 250 -------- 200
45 ---- 280 -------- 250
i make the following query but i think it's so bad in performance in huge amount of data
SELECT FollowUp.ID, FollowUp.Value,
(
SELECT F1.Value
FROM FollowUp as F1 where
F1.ID =
(
SELECT Max(F2.ID)
FROM FollowUp as F2 where F2.ID < FollowUp.ID
)
) AS Prev_Value
FROM FollowUp
So can anyone help me to get the best solution for such a problem ?
This sql should perform better then the one you have above, although these type of queries tend to be a little performance intensive... so anything you can put in them to limit the size of the dataset you are looking at will help tremendously. For example if you are looking at a specific date range, put that in.
SELECT followup.value,
( SELECT TOP 1 f1.VALUE
FROM followup as f1
WHERE f1.id<followup.id
ORDER BY f1.id DESC
) AS Prev_Value
FROM followup
HTH
You can use the OVER statement to generate nicely increasing row numbers.
select
rownr = row_number() over (order by id)
, value
from your_table
With the numbers, you can easily look up the previous row:
with numbered_rows
as (
select
rownr = row_number() over (order by id)
, value
from your_table
)
select
cur.value
, IsNull(prev.value,0)
from numbered_rows cur
left join numbered_rows prev on cur.rownr = prev.rownr + 1
Hope this is useful.
This is not an answer to your actual question.
Instead, I feel that you are approaching the problem from a wrong direction:
In properly normalized relational databases the tuples ("rows") of each table should contain references to other db items instead of the actual values. Maintaining these relations between tuples belongs to the data insertion part of the codebase.
That is, if containing the value of a tuple with closest, smaller id number really belongs into your data model.
If the requirement to know the previous value comes from the view part of the application - that is, a single view into the data that needs to format it in certain way - you should pull the contents out, sorted by id, and handle the requirement in view specific code.
In your case, I would assume that knowing the previous tuples' value really would belong in the view code instead of the database.
EDIT: You did mention that you store them separately and just want to make a query for it. Even still, application code would probably be the more logical place to do this combining.
What about pulling the lines into your application and computing the previous value there?
Create a stored procedure and use a cursor to iterate and produce rows.
You could use the function 'LAG'.
SELECT ID,
Value,
LAG(value) OVER(ORDER BY ID) AS Prev_Value
FROM FOLLOWUP;