Debugging a SQL Query - sql

I have a table structure like the one below. I need to select the row where User_Id = 100 and User_sub_id = 1, where time_used is the minimum of all matching rows, and where Timestamp is the highest. My query should return:
US;1365510103204;NY;1365510103;100;1;678;
My query looks like this.
select *
from my_table
where CODE='US'
and User_Id = 100
and User_sub_id = 1
and time_used = (select min(time_used)
                 from my_table
                 where CODE='US'
                 and User_Id=100
                 and User_sub_id= 1);
This returns all 4 rows with the minimum time_used. I need only one: the one with the highest timestamp.
Many thanks
CODE;Timestamp;Location;Time_recorded;User_Id;User_sub_Id;time_used
US;1365510102420;NY;1365510102;100;1;1078;
US;1365510102719;NY;1365510102;100;1;978;
US;1365510103204;NY;1365510103;100;1;878;
US;1365510102232;NY;1365510102;100;1;678;
US;1365510102420;NY;1365510102;100;1;678;
US;1365510102719;NY;1365510102;100;1;678;
US;1365510103204;NY;1365510103;100;1;678;
US;1365510102420;NY;1365510102;101;1;678;
US;1365510102719;NY;1365510102;101;1;638;
US;1365510103204;NY;1365510103;101;1;638;

Another possibly faster solution is using window functions:
select *
from (
  select code,
         timestamp,
         -- rank by smallest time_used first, then by newest timestamp;
         -- ordering by timestamp alone could give rn = 1 to a row that
         -- shares the top timestamp but does not have the minimum time_used
         row_number() over (partition by user_id, user_sub_id
                            order by time_used, timestamp desc) as rn,
         time_used,
         user_id,
         user_sub_id
  from my_table
  where CODE='US'
  and User_Id = 100
  and User_sub_id = 1
) t
where rn = 1;
This needs to scan the table only once, instead of twice as your solution with the sub-select does.
I would strongly recommend renaming the column timestamp.
First, it is a reserved word, and using reserved words as identifiers is not recommended.
Second, it doesn't document anything; it's a horrible name as such. time_used is much better, and you should find something similar for timestamp. Is that the "recording time", the "expiration time", the "due time" or something completely different?

Then try this:
select *
from my_table
where CODE='US'
and User_Id=100
and User_sub_id=1
and time_used=(
  select min(time_used)
  from my_table
  where CODE='US'
  and User_Id=100 and User_sub_id=1
)
order by "timestamp" desc -- <-- this adds sorting
limit 1; -- <-- this retrieves only one row

Add the following clauses to the end of your query:
ORDER BY Timestamp DESC LIMIT 1
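The two requirements (minimum time_used, then highest timestamp) can also be folded into a single sort. The following is only a sketch that loads the sample rows from the question into an in-memory SQLite database through Python's sqlite3 module; the column name ts is a hypothetical rename of Timestamp, and SQLite stands in for whatever database the question actually uses.

```python
import sqlite3

# In-memory SQLite stand-in for my_table; "ts" is a hypothetical rename of "Timestamp".
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE my_table (code TEXT, ts INTEGER, location TEXT, "
    "time_recorded INTEGER, user_id INTEGER, user_sub_id INTEGER, time_used INTEGER)"
)
rows = [
    ("US", 1365510102420, "NY", 1365510102, 100, 1, 1078),
    ("US", 1365510102719, "NY", 1365510102, 100, 1, 978),
    ("US", 1365510103204, "NY", 1365510103, 100, 1, 878),
    ("US", 1365510102232, "NY", 1365510102, 100, 1, 678),
    ("US", 1365510102420, "NY", 1365510102, 100, 1, 678),
    ("US", 1365510102719, "NY", 1365510102, 100, 1, 678),
    ("US", 1365510103204, "NY", 1365510103, 100, 1, 678),
    ("US", 1365510102420, "NY", 1365510102, 101, 1, 678),
    ("US", 1365510102719, "NY", 1365510102, 101, 1, 638),
    ("US", 1365510103204, "NY", 1365510103, 101, 1, 638),
]
conn.executemany("INSERT INTO my_table VALUES (?, ?, ?, ?, ?, ?, ?)", rows)

# Smallest time_used first, newest timestamp breaks ties, keep one row.
best = conn.execute(
    "SELECT * FROM my_table "
    "WHERE code = 'US' AND user_id = 100 AND user_sub_id = 1 "
    "ORDER BY time_used ASC, ts DESC "
    "LIMIT 1"
).fetchone()
print(best)
```

Sorting by time_used first means no sub-select for the minimum is needed; whether that is faster than the sub-select depends on the database and its indexes.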

Related

Bigquery Query: Adding a specific value to previous rows in BigQuery

I want to fill in the rows that have a null final_date with the next non-null final_date that follows them. It's difficult to explain, but it should be easier to understand from the desired output:
This is my actual table:
DATESTAMP______________pressure__________final_date
2021-02-19T21:19:35_______10.12_____________null
2021-02-19T22:19:35_______11.13_____________null
2021-02-19T23:19:35_______10.43_____________null
2021-02-20T00:19:35_______11.98_____________null
2021-02-20T01:19:35_______10.21_____________null
2021-02-20T01:40:10_______20.21_____________2021-02-20
2021-02-24T23:11:00_______10.42_____________null
2021-02-25T00:11:00_______10.51_____________null
2021-02-25T00:11:00_______20.51_____________2021-02-25
2021-02-28T11:11:12_______10.51_____________null
2021-02-28T12:11:12_______10.52_____________null
This is my desired table after running the query:
DATESTAMP______________pressure__________final_date
2021-02-19T21:19:35_______10.12_____________2021-02-20
2021-02-19T22:19:35_______11.13_____________2021-02-20
2021-02-19T23:19:35_______10.43_____________2021-02-20
2021-02-20T00:19:35_______11.98_____________2021-02-20
2021-02-20T01:19:35_______10.21_____________2021-02-20
2021-02-20T01:40:10_______20.21_____________2021-02-20
2021-02-24T23:11:00_______10.42_____________2021-02-25
2021-02-25T00:11:00_______10.51_____________2021-02-25
2021-02-25T00:11:00_______20.51_____________2021-02-25
2021-02-28T11:11:12_______10.51_____________null
2021-02-28T12:11:12_______10.52_____________null
It doesn't matter if I have to create a new column.
This is my query:
SELECT *, IF(final_date is null, LAG(final_date ) OVER (ORDER BY DATESTAMP DESC), final_date ) AS preceding FROM(
SELECT
* FROM my_table
ORDER BY DATESTAMP ASC)
ORDER BY DATESTAMP ASC
And this is the result I got from that query:
DATESTAMP______________pressure_________final_date_______preceding
2021-02-19T21:19:35_______10.12_____________null_____________null
2021-02-19T22:19:35_______11.13_____________null_____________null
2021-02-19T23:19:35_______10.43_____________null_____________null
2021-02-20T00:19:35_______11.98_____________null_____________null
2021-02-20T01:19:35_______10.21_____________null_____________2021-02-20
2021-02-20T01:40:10_______20.21_____________2021-02-20_______2021-02-20
2021-02-24T23:11:00_______10.42_____________null_____________null
2021-02-25T00:11:00_______10.51_____________null_____________2021-02-25
2021-02-25T00:11:00_______20.51_____________2021-02-25_______2021-02-25
2021-02-28T11:11:12_______10.51_____________null_____________null
2021-02-28T12:11:12_______10.52_____________null_____________null
Can someone help me?
Thanks!
This looks like a cumulative minimum:
SELECT t.*,
       MIN(final_date) OVER (ORDER BY DATESTAMP DESC) as imputed_final_date
FROM my_table t
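To see why a cumulative minimum taken in descending DATESTAMP order fills the gaps, here is a small sketch run on SQLite through Python's sqlite3 module instead of BigQuery. It assumes SQLite 3.25 or newer (for window functions), uses a shortened subset of the question's rows, and relies on final_date values never decreasing over time, as in the sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (datestamp TEXT, pressure REAL, final_date TEXT)")
conn.executemany(
    "INSERT INTO my_table VALUES (?, ?, ?)",
    [
        ("2021-02-19T21:19:35", 10.12, None),
        ("2021-02-20T01:19:35", 10.21, None),
        ("2021-02-20T01:40:10", 20.21, "2021-02-20"),
        ("2021-02-24T23:11:00", 10.42, None),
        ("2021-02-25T00:11:00", 20.51, "2021-02-25"),
        ("2021-02-28T11:11:12", 10.51, None),
    ],
)

# Walking from newest to oldest, the running MIN skips NULLs and therefore
# holds the closest final_date at or after each row.
filled = conn.execute(
    "SELECT datestamp, "
    "       MIN(final_date) OVER (ORDER BY datestamp DESC) AS imputed_final_date "
    "FROM my_table "
    "ORDER BY datestamp"
).fetchall()

for row in filled:
    print(row)
```

The trailing row stays null because no later final_date exists for it, which matches the desired table in the question.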

SQL - When result is duplicated on 2 fields remove all

When I run this query
SELECT
DT.CONTRACT_NUMBER,
DT.ROLE,
DT.TAX_ID,
DT.EFFECTIVE_DATE
FROM DATA_TABLE DT
I get this result.
I'd like to remove results where the TAX_ID appears more than once for each contract.
That is, every row for such a TAX_ID would be gone, whether it appears 2 times or 3 times.
I think window functions might be the way to go:
SELECT DT.CONTRACT_NUMBER, DT.ROLE, DT.TAX_ID, DT.EFFECTIVE_DATE
FROM (SELECT DT.CONTRACT_NUMBER, DT.ROLE, DT.TAX_ID, DT.EFFECTIVE_DATE,
COUNT(*) OVER (PARTITION BY TAX_ID) as cnt
FROM DATA_TABLE DT
WHERE DT.CONTRACT_NUMBER = '551000280'
) DT
WHERE CNT = 1;
If you actually want to keep one row per tax id, then use row_number() instead of count(*) and filter on the row number being 1.
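As a rough illustration of the COUNT(*) OVER idea, the sketch below runs on SQLite (3.25 or newer) through Python's sqlite3 module. The rows and role names are invented for the example, since the original result set is not shown, and the partition includes CONTRACT_NUMBER so that no filter on a single contract is needed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE data_table "
    "(contract_number TEXT, role TEXT, tax_id TEXT, effective_date TEXT)"
)
# Hypothetical sample rows: tax_id 111 appears twice on contract 551000280.
conn.executemany(
    "INSERT INTO data_table VALUES (?, ?, ?, ?)",
    [
        ("551000280", "OWNER", "111", "2020-01-01"),
        ("551000280", "PAYER", "111", "2020-01-01"),
        ("551000280", "INSURED", "222", "2020-01-01"),
        ("551000281", "OWNER", "222", "2020-02-01"),
    ],
)

# Count rows per (contract, tax id) and keep only the combinations seen once.
unique_rows = conn.execute(
    "SELECT contract_number, role, tax_id, effective_date "
    "FROM (SELECT dt.*, "
    "             COUNT(*) OVER (PARTITION BY contract_number, tax_id) AS cnt "
    "      FROM data_table dt) "
    "WHERE cnt = 1 "
    "ORDER BY contract_number, tax_id"
).fetchall()

for row in unique_rows:
    print(row)
```

Both rows for tax id 111 disappear, while tax id 222 survives on each contract because it occurs only once per contract.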

Finding the Difference of Two Results

I have two results with two different dates (a recent one and the previous one). The numbers below are the results, 250 being the most recent and 300 being the previous one:
250
300
The code I use is here:
SELECT TOP 2
MY_FIELD as bmi
FROM
MY_TABLE
ORDER BY
THE_DATE_FIELD DESC
Within this same query, I want to find the difference between those two numbers and have that appear instead of the two numbers.
I have tried a few things, such as skipping N rows, but now I don't know what else to try.
I think you want something like this:
declare @firstBmiRes int
declare @secondBmiRes int
SET @firstBmiRes = 250 /* insert your query */
SET @secondBmiRes = 300 /* insert your query */
SELECT @secondBmiRes - @firstBmiRes
If you want to continue to use the calculated result, you can store the value in another variable like this:
declare @bmi int
SET @bmi = @secondBmiRes - @firstBmiRes
SELECT @bmi
2nd Approach:
Since we don't have very much information to work with, you could try something like this, but I'm assuming a lot about your data structure here.
declare @bmiScore int
declare @firstBmiRes int
declare @secondBmiRes int
SET @firstBmiRes = (SELECT TOP 1 MY_FIELD
FROM MY_TABLE
ORDER BY DATE_FIELD DESC)
SET @secondBmiRes = (SELECT MY_FIELD
FROM MY_TABLE
ORDER BY DATE_FIELD DESC
OFFSET 1 ROW
FETCH NEXT 1 ROW ONLY)
SET @bmiScore = @secondBmiRes - @firstBmiRes
SELECT @bmiScore
SELECT
MYFIELD - LAG (MYFIELD,1) OVER (ORDER BY MYDATE) AS BMI
FROM
MYTABLE
ORDER BY MYDATE DESC;
Using a LEAD function, if you want your existing code to be part of the new code for some reason:
select TOP 1 (bmi - lead(bmi) over (order by date_field)) as result
from( SELECT TOP 2 my_field as bmi
, date_field
FROM my_table
ORDER BY date_field DESC) A
ORDER BY date_field -- without this, TOP 1 could return the row whose LEAD is NULL
Or by LAG :
select TOP 1 (lag(my_field) over (order by date_field) - my_field ) as result
FROM my_table
ORDER BY date_field DESC;
You can use LEAD/ LAG if your version of SQL Server supports these functions. If you are on an older version then you can use a windowed function to apply an order to the rows.
Here's your data going into a temporary table variable:
DECLARE @MY_TABLE TABLE (THE_DATE_FIELD DATE, MY_FIELD INT);
INSERT INTO @MY_TABLE SELECT '20200114', 300 UNION ALL SELECT '20200113', 250;
...and here's a query to perform the calculation you needed:
WITH x AS (
SELECT TOP 2
THE_DATE_FIELD,
MY_FIELD AS bmi,
ROW_NUMBER() OVER (ORDER BY THE_DATE_FIELD DESC) AS order_id
FROM
@MY_TABLE
ORDER BY
THE_DATE_FIELD DESC)
SELECT
MAX(CASE WHEN order_id = 1 THEN bmi END) - MAX(CASE WHEN order_id = 2 THEN bmi END) AS difference_bmi
FROM
x;
If I peek at the data from the CTE then I see this (and this is why I included the date field, which is redundant, and could otherwise be removed):
THE_DATE_FIELD bmi order_id
2020-01-14 300 1
2020-01-13 250 2
Now it's simply a case of picking the two values, as one has an order_id = 1 and one has an order_id = 2.
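For readers without SQL Server at hand, the LAG idea can be tried on SQLite (3.25 or newer) through Python's sqlite3 module. This is only a sketch: the table and column names follow the placeholders used above, the dates are invented, and LIMIT 1 plays the role of TOP 1.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (date_field TEXT, my_field INTEGER)")
# Hypothetical dates; 250 is the most recent value and 300 the previous one,
# as in the question. The oldest row shows that only the last two rows matter.
conn.executemany(
    "INSERT INTO my_table VALUES (?, ?)",
    [("2020-01-12", 320), ("2020-01-13", 300), ("2020-01-14", 250)],
)

# LAG fetches the previous row's value (by date); the outer ORDER BY ... LIMIT 1
# keeps only the most recent row, so the result is previous minus most recent.
difference = conn.execute(
    "SELECT LAG(my_field) OVER (ORDER BY date_field) - my_field AS result "
    "FROM my_table "
    "ORDER BY date_field DESC "
    "LIMIT 1"
).fetchone()[0]

print(difference)
```

Swapping the operands gives most recent minus previous, if that is the direction you need.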

select only last try

Consider the following table structure for an imaginary table named score:
player_name |player_lastname |try |score
primary key: (player_name,player_lastname,try)
(Please don't discuss the table schema; it's just an example.)
This table holds the scores of all players - every player should be able to play either one OR two times. Now, how could I fetch data about every player's last try only (i.e. first tries should be ignored for those who played more than once)?
An example of what I'm trying to achieve:
player_name,player_lastname,try,score
=====================================
bart, simpson,1,250
lisa,simpson,1,150
lisa,simpson,2,250
homer,simpson,1,300
homer,simpson,2,350
maggi,simpson,1,50
The result should be:
player_name,player_lastname,try,score
=====================================
bart, simpson,1,250
lisa,simpson,2,250
homer,simpson,2,350
maggi,simpson,1,50
One option is to JOIN the table to itself using a subquery with MAX:
select s.*
from score s
join (
select max(try) maxtry, player_name, player_lastname
from score
group by player_name, player_lastname
) s2 on s.player_name = s2.player_name
and s.player_lastname = s2.player_lastname
and s.try = s2.maxtry
Depending on your database, you may be able to take advantage of analytic functions such as ROW_NUMBER(), which would make this easier.
Since you are using PostgreSQL, you should be able to use the analytic ROW_NUMBER() function. This should work as well:
select *
from (
select try, player_name, player_lastname, score,
Row_Number() Over (Partition By player_name, player_lastname order by try desc) rn
from score
) s
where rn = 1
BTW -- I'd consider adding a player_id as a primary key.
This will probably have the best performance (DISTINCT ON is PostgreSQL-specific):
select distinct on (player_name, player_lastname)
player_name, player_lastname, try, score
from score
order by 1, 2, 3 desc
A Rank function can solve this:
SELECT player_name,player_lastname,TRY,score
FROM (SELECT player_name,player_lastname,TRY,score,RANK() OVER (PARTITION BY player_name, Player_Lastname ORDER BY TRY DESC)AS try_rank
FROM score
)sub
WHERE try_rank = 1
I'm assuming 'try' is a number that can be 1 or 2.
Edit: I forgot the PARTITION BY.
SELECT player_name,player_lastname,try,score
FROM score sc
WHERE NOT EXISTS (
SELECT *
FROM score nx
WHERE nx.player_name = sc.player_name
AND nx.player_lastname = sc.player_lastname
AND nx.try > sc.try
);
Try this out:
select player_name,
player_lastname,
try,
score
from score where try = 2 or
(try = 1 and
(player_name,player_lastname) not in
(select player_name,player_lastname from score where try=2));
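The self-join on MAX(try) is the most portable of the approaches above, so it is easy to check outside PostgreSQL. Below is a minimal sketch using Python's sqlite3 module with the sample rows from the question; SQLite is only a stand-in here, and the final ORDER BY is added just to make the output order predictable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE score (player_name TEXT, player_lastname TEXT, try INTEGER, "
    "score INTEGER, PRIMARY KEY (player_name, player_lastname, try))"
)
conn.executemany(
    "INSERT INTO score VALUES (?, ?, ?, ?)",
    [
        ("bart", "simpson", 1, 250),
        ("lisa", "simpson", 1, 150),
        ("lisa", "simpson", 2, 250),
        ("homer", "simpson", 1, 300),
        ("homer", "simpson", 2, 350),
        ("maggi", "simpson", 1, 50),
    ],
)

# Find each player's highest try, then join back to fetch the full row.
last_tries = conn.execute(
    "SELECT s.player_name, s.player_lastname, s.try, s.score "
    "FROM score s "
    "JOIN (SELECT player_name, player_lastname, MAX(try) AS maxtry "
    "      FROM score "
    "      GROUP BY player_name, player_lastname) s2 "
    "  ON s.player_name = s2.player_name "
    " AND s.player_lastname = s2.player_lastname "
    " AND s.try = s2.maxtry "
    "ORDER BY s.player_name"
).fetchall()

for row in last_tries:
    print(row)
```

Players with a single try keep their only row, and players with two tries keep the second one.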

MySQL "ORDER BY" the amount of rows with the same value for a certain column?

I have a table called trends_points, this table has the following columns:
id (the unique id of the row)
userId (the id of the user that has entered this in the table)
term (a word)
time (a unix timestamp)
Now, I'm trying to run a query on this table which will get the rows in a specific time frame ordered by how many times the column term appears in the table during the specific timeframe...So for example if the table has the following rows:
id | userId | term | time
------------------------------------
1 28 new year 1262231638
2 37 new year 1262231658
3 1 christmas 1262231666
4 34 new year 1262231665
5 12 christmas 1262231667
6 52 twitter 1262231669
I'd like the rows to come out ordered like this:
new year
christmas
twitter
This is because "new year" exists three times in the timeframe, "christmas" exists twice and "twitter" is only in one row.
So far I've assumed it's a simple WHERE for the timeframe part of the query and a GROUP BY to stop the same term from coming up twice in the list.
This makes the following query:
SELECT *
FROM `trends_points`
WHERE ( time >= <time-period_start>
AND time <= <time-period_end> )
GROUP BY `term`
Does anyone know how I'd do the final part of the query? (Ordering the query's results by how many rows contain the same "term" column value..).
Use:
SELECT tp.term,
       COUNT(*) AS term_count
FROM trends_points tp
WHERE tp.time BETWEEN <time-period_start> AND <time-period_end>
GROUP BY tp.term
ORDER BY term_count DESC, tp.term
See this question about why to use BETWEEN vs using the >=/<= operators.
Keep in mind there can be ties. In this query the ORDER BY falls back to sorting alphabetically by term when that happens, but you could use other criteria.
Also, if you want to additionally limit the number of rows/terms coming back you can add the LIMIT clause to the end of the query. For example, this query will return the top five terms:
SELECT tp.term,
       COUNT(*) AS term_count
FROM trends_points tp
WHERE tp.time BETWEEN <time-period_start> AND <time-period_end>
GROUP BY tp.term
ORDER BY term_count DESC, tp.term
LIMIT 5
Quick answer:
SELECT
term, count(*) as thecount
FROM
mytable
WHERE
(...)
GROUP BY
term
ORDER BY
thecount DESC
SELECT t.term
FROM trends_points t
WHERE t.time >= <time-period_start> AND t.time <= <time-period_end>
GROUP BY t.term
ORDER BY COUNT(t.term) DESC
COUNT() will give you the number of rows in the group, so just order by that.
SELECT `term` FROM `trends_points`
WHERE ( `time` >= <time-period_start> AND `time` <= <time-period_end> )
GROUP BY `term`
ORDER BY COUNT(`term`) DESC
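The GROUP BY plus ORDER BY COUNT pattern can be checked against the sample rows from the question. The sketch below uses Python's sqlite3 module as a stand-in for MySQL; the start and end values are simply the smallest and largest timestamps in the sample, chosen here as an assumption so that every row falls inside the timeframe.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE trends_points (id INTEGER PRIMARY KEY, userId INTEGER, "
    "term TEXT, time INTEGER)"
)
conn.executemany(
    "INSERT INTO trends_points VALUES (?, ?, ?, ?)",
    [
        (1, 28, "new year", 1262231638),
        (2, 37, "new year", 1262231658),
        (3, 1, "christmas", 1262231666),
        (4, 34, "new year", 1262231665),
        (5, 12, "christmas", 1262231667),
        (6, 52, "twitter", 1262231669),
    ],
)

# Assumed timeframe covering all sample rows.
start, end = 1262231638, 1262231669

# Group by term, count rows per group, most frequent first; ties sort by term.
ranked = conn.execute(
    "SELECT term, COUNT(*) AS term_count "
    "FROM trends_points "
    "WHERE time BETWEEN ? AND ? "
    "GROUP BY term "
    "ORDER BY term_count DESC, term",
    (start, end),
).fetchall()

for term, term_count in ranked:
    print(term, term_count)
```

Note that GROUP BY has to come before ORDER BY in the statement, and appending LIMIT 5 would cap the list at the five most frequent terms.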