query for column that are within a variable + or 1 of another column - sql

I have a table that has 2 columns, and I am trying to determine a way to select the records where the two columns are CLOSE to one another. Maybe based on standard deviation if i can think about how to do that. But for now, this is what my table looks like:
ID| PCT | RETURN
1 | 20 | 1.20
2 | 15 | 0.90
3 | 0 | 3.00
The values in the pct field is a percent number (for example 20%). The value in the return field is a not fully calculated % number (so its supposed to be 20% above what the initial value was). The query I am working with so far is this:
select * from TABLE1 where ((pct = ((return - 1)* 100)));
What I'd like to end up with are the rows where both are within a set value of each other. For example If they are within 5 points of each other, then the row would be returned and the output would be:
ID| PCT | RETURN
1 | 20 | 1.20
2 | 15 | 0.90
In the above, ID 1 should work out to be PCT = 20 and Return = 20, and ID 2, is PCT = 15 and RETURN = 10. Because it was within 5 points of each other, it was returned.
ID 3 was not returned because 0 and 200 are way above the 5 point threshold.
Is there any way to set a variable that would return a +- 5 when comparing the two values from the above attributes? Thanks.

RexTester Example:
Use Lead() over (Order by PCT) to look ahead and LAG() to look back to the next row do the math and evaluate results...
WITH CTE (ID, PCT , RETURN) as (
SELECT 1 , 20 , 1.20 FROM DUAL UNION ALL
SELECT 2 , 15 , 0.90 FROM DUAL UNION ALL
SELECT 3 , 0 , 3.00 FROM DUAL),
CTE2 as (SELECT A.*, LEAD(PCT) Over (ORDER BY PCT) LEADPCT, LAG(PCT) Over (order by PCT) LAGPCT
FROM CTE A)
SELECT * FROM CTE2
WHERE LEADPCT-PCT <=5 OR PCT-LAGPCT <=5
Order by ID
Giving us:
+----+----+-----+--------+---------+--------+
| | ID | PCT | RETURN | LEADPCT | LAGPCT |
+----+----+-----+--------+---------+--------+
| 1 | 1 | 20 | 1,20 | NULL | 15 |
| 2 | 2 | 15 | 0,90 | 20 | 0 |
+----+----+-----+--------+---------+--------+
or use the return value instead of PCT... just depends on what you're after. But maybe I don't fully understand the question..

Related

query SQL table for the same data in column for 3 times in a row

I have a table
Id, Response
1, Yes
2, Yes
3, No
4, No
5, Yes
6, No
7, No
8, No
I would like to be able to query the table and check for the response of No and if it occurs 3 times in a row return a value.
So I am trying
select count(response) where response = no
order by id
Basically, the theory goes, if there are 3 responses of No, I want to trigger something else to happen. So I need to query the table each time an entry is made, and if the last 3 entries are no then return value.
I only want to know if the latest values are 3 no. for example if the last 4 entries were no, no, no, yes - I don't care as there is a yes value
so the last 3 values have to be no
I don't know which RDBMS you use, but you can try something like that:
select count(*)
from
(select id,
response
from your_table
order by id desc
limit 3) t
where t.response = 'No';
Here is a solution in Bigquery. You may need to tweak the syntax for you SQL base:
SELECT
* ,
SUM( CASE WHEN response ="No" THEN 1 ELSE 0 END )
OVER (ORDER BY id RANGE BETWEEN 2 PRECEDING AND CURRENT ROW)
FROM dataset
It returns output like this:
Which I think is what you want.
The key part is the window functions using RANGE BETWEEN 2 PRECEDING AND CURRENT ROW. The case statement is checking if the current row and the 2 before are "No". If they are return a 1. So when three in a row occur this will SUM to 3.
I would use two lag()s:
select t.*
from (select t.*,
lag(id, 2) over (order by id) as prev2_id,
lag(id, 2) over (order by id) as prev2_id_response
from t
) t
where response = 'no' and prev2_id = prev2_id_response;
The first lag() determines the id "2 back". The second determines the id "2 back" for the same response. If the response is the same for those three rows, then these are the same.
This returns each occurrence of "no" where this occurs. You can use exists if you just want to know if this ever occurs.
This can be done with window functions and a derived table or CTE term. The following takes you through how it can be done, step by step:
Full Example with data
WITH cte1 AS (
SELECT x.*
, CASE WHEN COALESCE(LAG(response) OVER (ORDER BY id), 'NA') <> response THEN 1 ELSE 0 END AS edge
FROM xlogs AS x
)
, cte2 AS (
SELECT x.*
, SUM(edge) OVER (ORDER BY id) AS xgroup
FROM cte1 AS x
)
, cte3 AS (
SELECT x.*
, ROW_NUMBER() OVER (PARTITION BY xgroup ORDER BY id) AS xposition
FROM cte2 AS x
)
, cte4 AS (
SELECT x.*
, CASE WHEN xposition >= 3 AND response = 'No' THEN 1 END AS xtrigger
FROM cte3 AS x
)
, cte5 AS (
SELECT x.*
FROM cte4 AS x
ORDER BY id DESC
LIMIT 1
)
SELECT *
FROM cte5
WHERE response = 'No'
;
The result of cte4 provides useful detail about the logic:
+----+----------+------+--------+-----------+----------+
| id | response | edge | xgroup | xposition | xtrigger |
+----+----------+------+--------+-----------+----------+
| 1 | Yes | 1 | 1 | 1 | NULL |
| 2 | Yes | 0 | 1 | 2 | NULL |
| 3 | No | 1 | 2 | 1 | NULL |
| 4 | No | 0 | 2 | 2 | NULL |
| 5 | Yes | 1 | 3 | 1 | NULL |
| 6 | No | 1 | 4 | 1 | NULL |
| 7 | No | 0 | 4 | 2 | NULL |
| 8 | No | 0 | 4 | 3 | 1 |
+----+----------+------+--------+-----------+----------+

Calculating percentages based on a few columns in a subgroup

I have a table (let's call it Table A) like this:
ID Device Clicks
1 A 10
1 B 10
2 A 1
2 C 19
I would like to build a table (let's call it Table B) from A above like this:
ID Device Clicks Percentage
1 A 10 50
1 B 10 50
2 A 1 5
2 C 19 95
Again from Table B, I'd like to derive Table C where the Updated Device column for each of the ID will carry the name from the Device column only if the Percentage is >=95%. If the percentage split between Devices for each ID is anything else, we'll simply set UpdatedDevice to Others. For example, using the data in Table B, we'll get a Table C like below:
ID Device Clicks Percentage UpdatedDevice
1 A 10 50 Others
1 B 10 50 Others
2 A 1 5 C
2 C 19 95 C
I am wondering if there's a way to do this with advanced SQL windowing/analytical functions in one shot instead of generating intermediate tables.
select
Id
, Device
, Clicks
, Percentage
, UpdatedDevice = isnull(max(UpdatedDevice) over (partition by Id),'Others')
from (
select *
, Percentage = convert(int,(clicks / sum(clicks+.0) over (partition by Id))*100)
, UpdatedDevice = case
when (clicks / sum(clicks+.0) over (partition by Id)) >= .95
then Device
end
from t
) as cte
test setup: http://rextester.com/XKBNO39353
returns:
+----+--------+--------+------------+---------------+
| Id | Device | Clicks | Percentage | UpdatedDevice |
+----+--------+--------+------------+---------------+
| 1 | A | 10 | 50 | Others |
| 1 | B | 10 | 50 | Others |
| 2 | A | 1 | 5 | C |
| 2 | C | 19 | 95 | C |
+----+--------+--------+------------+---------------+
Just for kicks, you can do this without a subquery:
select t.*,
clicks * 1.0 / sum(clicks) over (partition by id) as ratio, -- you an convert to a percentage
(case when max(clicks) over (partition by id) >= 0.95 * sum(clicks) over (partition by device)
then first_value(device) over (partition by id order by clicks desc)
else 'Others'
end) as UpdatedDevice
from t;
The key idea for calculating UpdatedDevice is that the maximum would be the device that satisfies the 95% rule. And first_value() of course.

subtract data from single column

I have a database table with 2 columns naming piece and diff and type.
Here's what the table looks like
id | piece | diff | type
1 | 20 | NULL | cake
2 | 15 | NULL | cake
3 | 10 | NULL | cake
I want like 20 - 15 = 5 then 15 -10 = 5 , then so on so fort with type as where.
Result will be like this
id | piece | diff | type
1 | 20 | 0 | cake
2 | 15 | 5 | cake
3 | 10 | 5 | cake
Here's the code I have so far but i dont think I'm on the right track
SELECT
tableblabla.id,
(tableblabla.cast(pieces as decimal(7, 2)) - t.cast(pieces as decimal(7, 2))) as diff
FROM
tableblabla
INNER JOIN
tableblablaas t ON tableblabla.id = t.id + 1
Thanks for the help
Use LAG/LEAD window function.
Considering that you want to find Difference per type else remove Partition by from window functions
select id, piece,
Isnull(lag(piece)over(partition by type order by id) - piece,0) as Diff,
type
From yourtable
If you are using Sql Server prior to 2012 use this.
;WITH cte
AS (SELECT Row_number()OVER(partition by type ORDER BY id) RN,*
FROM Yourtable)
SELECT a.id,
a.piece,
Isnull(b.piece - a.piece, 0) AS diff,
a.type
FROM cte a
LEFT JOIN cte b
ON a.rn = b.rn + 1

How to calculate the value of a previous row from the count of another column

I want to create an additional column which calculates the value of a row from count column with its predecessor row from the sum column. Below is the query. I tried using ROLLUP but it does not serve the purpose.
select to_char(register_date,'YYYY-MM') as "registered_in_month"
,count(*) as Total_count
from CMSS.USERS_PROFILE a
where a.pcms_db != '*'
group by (to_char(register_date,'YYYY-MM'))
order by to_char(register_date,'YYYY-MM')
This is what i get
registered_in_month TOTAL_COUNT
-------------------------------------
2005-01 1
2005-02 3
2005-04 8
2005-06 4
But what I would like to display is below, including the months which have count as 0
registered_in_month TOTAL_COUNT SUM
------------------------------------------
2005-01 1 1
2005-02 3 4
2005-03 0 4
2005-04 8 12
2005-05 0 12
2005-06 4 16
To include missing months in your result, first you need to have complete list of months. To do that you should find the earliest and latest month and then use heirarchial
query to generate the complete list.
SQL Fiddle
with x(min_date, max_date) as (
select min(trunc(register_date,'month')),
max(trunc(register_date,'month'))
from users_profile
)
select add_months(min_date,level-1)
from x
connect by add_months(min_date,level-1) <= max_date;
Once you have all the months, you can outer join it to your table. To get the cumulative sum, simply add up the count using SUM as analytical function.
with x(min_date, max_date) as (
select min(trunc(register_date,'month')),
max(trunc(register_date,'month'))
from users_profile
),
y(all_months) as (
select add_months(min_date,level-1)
from x
connect by add_months(min_date,level-1) <= max_date
)
select to_char(a.all_months,'yyyy-mm') registered_in_month,
count(b.register_date) total_count,
sum(count(b.register_date)) over (order by a.all_months) "sum"
from y a left outer join users_profile b
on a.all_months = trunc(b.register_date,'month')
group by a.all_months
order by a.all_months;
Output:
| REGISTERED_IN_MONTH | TOTAL_COUNT | SUM |
|---------------------|-------------|-----|
| 2005-01 | 1 | 1 |
| 2005-02 | 3 | 4 |
| 2005-03 | 0 | 4 |
| 2005-04 | 8 | 12 |
| 2005-05 | 0 | 12 |
| 2005-06 | 4 | 16 |

SQL - min() gets the lowest value, max() the highest, what if I want the 2nd (or 5th or nth) lowest value?

The problem I'm trying to solve is that I have a table like this:
a and b refer to point on a different table. distance is the distance between the points.
| id | a_id | b_id | distance | delete |
| 1 | 1 | 1 | 1 | 0 |
| 2 | 1 | 2 | 0.2345 | 0 |
| 3 | 1 | 3 | 100 | 0 |
| 4 | 2 | 1 | 1343.2 | 0 |
| 5 | 2 | 2 | 0.45 | 0 |
| 6 | 2 | 3 | 110 | 0 |
....
The important column I'm looking is a_id. If I wanted to keep the closet b for each a, I could do something like this:
update mytable set delete = 1 from (select a_id, min(distance) as dist from table group by a_id) as x where a_gid = a_gid and distance > dist;
delete from mytable where delete = 1;
Which would give me a result table like this:
| id | a_id | b_id | distance | delete |
| 1 | 1 | 1 | 1 | 0 |
| 5 | 2 | 2 | 0.45 | 0 |
....
i.e. I need one row for each value of a_id, and that row should have the lowest value of distance for each a_id.
However I want to keep the 10 closest points for each a_gid. I could do this with a plpgsql function but I'm curious if there is a more SQL-y way.
min() and max() return the smallest and largest, if there was an aggregate function like nth(), which'd return the nth largest/smallest value then I could do this in similar manner to the above.
I'm using PostgeSQL.
Try this:
SELECT *
FROM (
SELECT a_id, (
SELECT b_id
FROM mytable mib
WHERE mib.a_id = ma.a_id
ORDER BY
dist DESC
LIMIT 1 OFFSET s
) AS b_id
FROM (
SELECT DISTINCT a_id
FROM mytable mia
) ma, generate_series (1, 10) s
) ab
WHERE b_id IS NOT NULL
Checked on PostgreSQL 8.3
I love postgres, so it took it as a challenge the second I saw this question.
So, for the table:
Table "pg_temp_29.foo"
Column | Type | Modifiers
--------+---------+-----------
value | integer |
With the values:
SELECT value FROM foo ORDER BY value;
value
-------
0
1
2
3
4
5
6
7
8
9
14
20
32
(13 rows)
You can do a:
SELECT value FROM foo ORDER BY value DESC LIMIT 1 OFFSET X
Where X = 0 for the highest value, 1 for the second highest, 2... And so forth.
This can be further embedded in a subquery to retrieve the value needed. So, to use the dataset provided in the original question we can get the a_ids with the top ten lowest distances by doing:
SELECT a_id, distance FROM mytable
WHERE id IN
(SELECT id FROM mytable WHERE t1.a_id = t2.a_id
ORDER BY distance LIMIT 10);
ORDER BY a_id, distance;
a_id | distance
------+----------
1 | 0.2345
1 | 1
1 | 100
2 | 0.45
2 | 110
2 | 1342.2
Does PostgreSQL have the analytic function rank()? If so try:
select a_id, b_id, distance
from
( select a_id, b_id, distance, rank() over (partition by a_id order by distance) rnk
from mytable
) where rnk <= 10;
This SQL should find you the Nth lowest salary should work in SQL Server, MySQL, DB2, Oracle, Teradata, and almost any other RDBMS: (note: low performance because of subquery)
SELECT * /*This is the outer query part */
FROM mytable tbl1
WHERE (N-1) = ( /* Subquery starts here */
SELECT COUNT(DISTINCT(tbl2.distance))
FROM mytable tbl2
WHERE tbl2.distance < tbl1.distance)
The most important thing to understand in the query above is that the subquery is evaluated each and every time a row is processed by the outer query. In other words, the inner query can not be processed independently of the outer query since the inner query uses the tbl1 value as well.
In order to find the Nth lowest value, we just find the value that has exactly N-1 values lower than itself.