Pick a record based on a given value in postgres

I have a table in postgres like below,
alg_campaignid | alg_score | cp | sum
----------------+-----------+---------+----------
9829 | 30.44056 | 12.4000 | 12.4000
9880 | 29.59280 | 12.0600 | 24.4600
9882 | 29.59280 | 12.0600 | 36.5200
9827 | 29.27504 | 11.9300 | 48.4500
9821 | 29.14840 | 11.8800 | 60.3300
9881 | 29.14840 | 11.8800 | 72.2100
9883 | 29.14840 | 11.8800 | 84.0900
10026 | 28.79280 | 11.7300 | 95.8200
10680 | 10.31504 | 4.1800 | 100.0000
From this table I have to select a record based on a randomly generated number from 0 to 100, i.e. the first record should be returned if the random number is between 0 and 12.4000, the second if it is between 12.4000 and 24.4600, and so on, up to the last record if it is between 95.8200 and 100.0000.
For example:
if the random number picked is 8, then the first record should be returned
or
if the random number picked is 48, then the fourth record should be returned.
Is it possible to do this in Postgres? If so, kindly recommend a solution.

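Yes, you can do this in Postgres. If you want to generate the number in the database, pick the first row whose running total reaches the random number:
-- your_table is a placeholder for the actual table name
with r as (
select random() * 100 as r
)
select t.*
from your_table t cross join r
where t.sum >= r.r
order by t.sum
limit 1;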
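
The selection rule from the question (return the first row whose cumulative sum reaches the random number) can also be sketched outside the database. The snippet below is only an illustration in Python using the sample rows from the question; the names `ROWS`, `SUMS` and `pick` are made up for this sketch.

```python
import bisect
import random

# Rows from the question: (alg_campaignid, cumulative sum), ordered by sum.
ROWS = [
    (9829, 12.40), (9880, 24.46), (9882, 36.52),
    (9827, 48.45), (9821, 60.33), (9881, 72.21),
    (9883, 84.09), (10026, 95.82), (10680, 100.00),
]
SUMS = [s for _, s in ROWS]

def pick(r):
    """Return the campaign id of the first row whose cumulative sum is >= r."""
    i = bisect.bisect_left(SUMS, r)
    return ROWS[i][0]

print(pick(8))    # 9829 (first record)
print(pick(48))   # 9827 (fourth record)
print(pick(random.random() * 100))  # a weighted random pick
```

Because the sums are sorted, the lookup is a binary search; this assumes the random number never exceeds the last cumulative sum (100 here).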

Related

join two views and detect missing entries where the matching condition is in the next row of the other view/table (using SQLITE)

I am running a science test and logging my data in two SQLite tables.
I have selected the data I need into two separate and independent views (the RX and TX views).
Now I need to analyze the measurements and create a third view with the results, with the following points in mind:
1- For each test on the TX side (Table 1) there might be a corresponding entry on the RX side (Table 2).
2- If the timestamp on the RX side is less than the timestamp in the next row of the TX view,
we consider them to be associated with one record in the third view and calculate the time difference; otherwise it is a miss.
Question: How should I write the SQL query in SQLite to produce the analysis and test result given in Table 3?
Thanks a lot in advance.
TX View - Table (1)
id | time | measurement
------------------------
1 | 09:40:10.221 | 100
2 | 09:40:15.340 | 60
3 | 09:40:21.100 | 80
4 | 09:40:25.123 | 90
5 | 09:40:29.221 | 45
RX View -Table (2)
time | measurement
------------------------
09:40:15.7 | 65
09:40:21.560 | 80
09:40:30.414 | 50
Test Result View - Table (3)
id |TxTime |RxTime | delta_time(s)| delta_value
------------------------------------------------------------------------
1 | 09:40:10.221 | NULL |NULL | NULL (i.e. missed)
2 | 09:40:15.340 | 09:40:15.7 |0.360 | 5
3 | 09:40:21.100 | 09:40:21.560 |0.460 | 0
4 | 09:40:25.123 | NULL |NULL | NULL (i.e. missed)
5 | 09:40:29.221 | 09:40:30.414 |1.193 | 5

Use the window function LEAD() to get the time of the next row for each row in TX, then join the views on your conditions:
SELECT t.id, t.time TxTime, r.time RxTime,
ROUND((julianday(r.time) - julianday(t.time)) * 24 * 60 *60, 3) [delta_time(s)],
r.measurement - t.measurement delta_value
FROM (
SELECT *, LEAD(time) OVER (ORDER BY Time) next
FROM TX
) t
LEFT JOIN RX r ON r.time >= t.time AND (r.time < t.next OR t.next IS NULL)
Results:
> id | TxTime | RxTime | delta_time(s) | delta_value
> -: | :----------- | :----------- | :------------ | :----------
> 1 | 09:40:10.221 | null | null | null
> 2 | 09:40:15.340 | 09:40:15.7 | 0.36 | 5
> 3 | 09:40:21.100 | 09:40:21.560 | 0.46 | 0
> 4 | 09:40:25.123 | null | null | null
> 5 | 09:40:29.221 | 09:40:30.414 | 1.193 | 5
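
The matching rule (an RX row belongs to a TX row when its time falls between that TX time and the next TX time) can be checked with a small Python sketch on the sample data. This is an illustration of the logic only, not a replacement for the SQLite query; the helper names `parse` and `match` are hypothetical.

```python
from datetime import datetime

TX = [(1, "09:40:10.221", 100), (2, "09:40:15.340", 60), (3, "09:40:21.100", 80),
      (4, "09:40:25.123", 90), (5, "09:40:29.221", 45)]
RX = [("09:40:15.7", 65), ("09:40:21.560", 80), ("09:40:30.414", 50)]

def parse(t):
    # %f accepts one to six fractional digits, so "15.7" parses as 15.700000
    return datetime.strptime(t, "%H:%M:%S.%f")

def match(tx_rows, rx_rows):
    """Pair each TX row with RX rows whose time is in [tx_time, next_tx_time)."""
    tx_rows = sorted(tx_rows, key=lambda row: parse(row[1]))
    results = []
    for i, (tx_id, tx_time, tx_value) in enumerate(tx_rows):
        start = parse(tx_time)
        # The "next" time plays the role of LEAD(time); None for the last row.
        end = parse(tx_rows[i + 1][1]) if i + 1 < len(tx_rows) else None
        hits = [(t, v) for t, v in rx_rows
                if parse(t) >= start and (end is None or parse(t) < end)]
        if not hits:
            results.append((tx_id, tx_time, None, None, None))  # missed
        for rx_time, rx_value in hits:
            delta = round((parse(rx_time) - start).total_seconds(), 3)
            results.append((tx_id, tx_time, rx_time, delta, rx_value - tx_value))
    return results

for row in match(TX, RX):
    print(row)
```

As in the LEFT JOIN, a TX row with several RX rows inside its window would produce several output rows.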

Returning singular row/value from joined table date based on closest date

I have a Production Table and a Standing Data table. The relationship of Production to Standing Data is actually Many-To-Many which is different to how this relationship is usually represented (Many-to-One).
The standing data table holds a list of tasks and the score each task is worth. Tasks can appear multiple times with different "ValidFrom" dates for changing the score at different points in time. What I am trying to do is query the Production Table so that the TaskID is looked up in the table and uses the date it was logged to check what score it should return.
Here's an example of how I want the data to look:
Production Table:
+----------+------------+-------+-----------+--------+-------+
| RecordID | Date | EmpID | Reference | TaskID | Score |
+----------+------------+-------+-----------+--------+-------+
| 1 | 27/02/2020 | 1 | 123 | 1 | 1.5 |
| 2 | 27/02/2020 | 1 | 123 | 1 | 1.5 |
| 3 | 30/02/2020 | 1 | 123 | 1 | 2 |
| 4 | 31/02/2020 | 1 | 123 | 1 | 2 |
+----------+------------+-------+-----------+--------+-------+
Standing Data
+----------+--------+----------------+-------+
| RecordID | TaskID | DateActiveFrom | Score |
+----------+--------+----------------+-------+
| 1 | 1 | 01/02/2020 | 1.5 |
| 2 | 1 | 28/02/2020 | 2 |
+----------+--------+----------------+-------+
I have tried the below code but unfortunately due to multiple records meeting the criteria, the production data duplicates with two different scores per record:
SELECT p.[RecordID],
p.[Date],
p.[EmpID],
p.[Reference],
p.[TaskID],
s.[Score]
FROM ProductionTable as p
LEFT JOIN StandingDataTable as s
ON s.[TaskID] = p.[TaskID]
AND s.[DateActiveFrom] <= p.[Date];
What is the correct way to return the correct and singular/scalar Score value for this record based on the date?

You can use OUTER APPLY with TOP (1) to fetch only the most recent matching score:
SELECT p.[RecordID], p.[Date], p.[EmpID], p.[Reference], p.[TaskID], s.[Score]
FROM ProductionTable as p OUTER APPLY
( SELECT TOP (1) s.[Score]
FROM StandingDataTable AS s
WHERE s.[TaskID] = p.[TaskID] AND
s.[DateActiveFrom] <= p.[Date]
ORDER BY S.DateActiveFrom DESC
) s;
If you want the score matched at the record level instead, change the WHERE clause inside the APPLY.
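
The idea of keeping only the latest DateActiveFrom on or before the production date can be sketched in Python as follows. The dates are hypothetical ISO-style values chosen for illustration (they are not the rows from the question), and `STANDING` and `score_on` are made-up names.

```python
import bisect
from datetime import date

# Hypothetical standing data: TaskID -> list of (DateActiveFrom, Score), sorted by date.
STANDING = {1: [(date(2020, 2, 1), 1.5), (date(2020, 2, 28), 2.0)]}

def score_on(task_id, logged):
    """Return the score whose DateActiveFrom is the latest one on or before `logged`."""
    history = STANDING.get(task_id, [])
    starts = [start for start, _ in history]
    i = bisect.bisect_right(starts, logged)
    # No earlier entry means no score, like the NULL an OUTER APPLY would return.
    return history[i - 1][1] if i else None

print(score_on(1, date(2020, 2, 27)))  # 1.5
print(score_on(1, date(2020, 2, 29)))  # 2.0
print(score_on(1, date(2020, 1, 15)))  # None
```

Each production record gets exactly one score, which is what removes the duplicates produced by the plain LEFT JOIN.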

ORDER BY FIELD LIST - Subquery returns more than 1 row

What I want to do is quite simple:
write an SQL query that returns a set of records and orders them by a list of ids taken from the FIELD() list in my SQL.
TABLE SAMPLE
lessons
+----+----------------------+
| id | name |
+----+----------------------+
| 9 | Greedy algorithms |
| 5 | Maya civilization |
| 3 | eFront Beginner |
| 2 | eFront Intermediate |
+----+----------------------+
mod_comp_rule
+----+-----------+---------+
| id | lesson_id | comp_id |
+----+-----------+---------+
| 1 | 3 | 1 |
| 2 | 2 | 1 |
| 3 | 9 | 2 |
+----+-----------+---------+
WHAT I WANT TO GET FROM MY QUERY
SELECT * FROM lessons ORDER BY FIELD(id,'3','2','9') ASC;
MY SQL
SELECT ls.id, ls.name
FROM lessons ls
ORDER BY FIELD(ls.id,
(SELECT mcr.lesson_id FROM mod_comp_rule mcr
INNER JOIN lessons ls ON ls.id = mcr.lesson_id))
My SQL query returned the following error:
MySQL said: #1242 - Subquery returns more than 1 row
So how can I make my SQL behave like FIELD(id,'3','2','9') without triggering the "more than 1 row" error?

I don't see why FIELD() is needed for this. A correlated query will do what you want:
SELECT ls.id, ls.name
FROM lessons ls
ORDER BY (SELECT mcr.id FROM mod_comp_rule mcr WHERE ls.id = mcr.lesson_id);
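
To see what ordering the correlated subquery produces, here is a small Python sketch over the sample tables. It assumes each lesson has at most one row in mod_comp_rule; with more than one, the scalar subquery would raise the same #1242 error. The names `rule_id_by_lesson` and `sort_key` are hypothetical.

```python
# Sample rows from the question.
lessons = [(9, "Greedy algorithms"), (5, "Maya civilization"),
           (3, "eFront Beginner"), (2, "eFront Intermediate")]
rules = [(1, 3, 1), (2, 2, 1), (3, 9, 2)]  # (id, lesson_id, comp_id)

# Assumes at most one rule per lesson, like the scalar subquery does.
rule_id_by_lesson = {lesson_id: rule_id for rule_id, lesson_id, _ in rules}

def sort_key(lesson):
    rule_id = rule_id_by_lesson.get(lesson[0])
    # MySQL sorts NULL first in ascending order, so lessons without a rule come first.
    return (rule_id is not None, rule_id or 0)

ordered = sorted(lessons, key=sort_key)
print([lesson_id for lesson_id, _ in ordered])  # [5, 3, 2, 9]
```

Lesson 5 has no rule, so it sorts ahead of 3, 2, 9, which matches what FIELD(id,'3','2','9') would do with an id that is not in its list.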

SQL deleting rows with duplicate dates conditional upon values in two columns

I have data on approx 1000 individuals, where each individual can have multiple rows, with multiple dates and where the columns indicate the program admitted to and a code number.
I need each row to contain a distinct date, so I need to delete the rows of duplicate dates from my table. Where there are multiple rows with the same date, I need to keep the row that has the lowest code number. In the case of more than one row having both the same date and the same lowest code, then I need to keep the row that also has been in program (prog) B. For example;
| ID | DATE | CODE | PROG|
--------------------------------
| 1 | 1996-08-16 | 24 | A |
| 1 | 1997-06-02 | 123 | A |
| 1 | 1997-06-02 | 123 | B |
| 1 | 1997-06-02 | 211 | B |
| 1 | 1997-08-19 | 67 | A |
| 1 | 1997-08-19 | 23 | A |
So my desired output would look like this;
| ID | DATE | CODE | PROG|
--------------------------------
| 1 | 1996-08-16 | 24 | A |
| 1 | 1997-06-02 | 123 | B |
| 1 | 1997-08-19 | 23 | A |
I'm struggling to come up with a solution to this, so any help greatly appreciated!
Microsoft SQL Server 2012 (X64)

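The following works with your test data. Note that MIN(code) and MAX(prog) are computed independently, so on other data they can come from different rows. (your_table is a placeholder for the actual table name.)
SELECT ID, date, MIN(code) AS code, MAX(prog) AS prog FROM your_table
GROUP BY ID, date
You can then use the results of this query to create or populate a new table, or to delete all records not returned by this query.
SQLFiddle http://sqlfiddle.com/#!9/0ebb5/5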

You can use the min() function (your_table is a placeholder for the actual table name):
select ID, DATE, min(CODE) as CODE, max(PROG) as PROG
from your_table
group by ID, DATE
I assume that your table has a valid primary key. However, I would recommend using ID as the primary key. Hope this helps.
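
Since MIN(CODE) and MAX(PROG) are aggregated independently, a group containing (10, A) and (20, B) would yield (10, B), a row that does not exist. Selecting one whole row per (ID, DATE) avoids that; in SQL Server this is commonly done with ROW_NUMBER() over a partition. The Python sketch below illustrates the ranking rule on the sample data; `dedupe` is a made-up name.

```python
# Sample rows from the question: (ID, DATE, CODE, PROG)
rows = [
    (1, "1996-08-16", 24, "A"),
    (1, "1997-06-02", 123, "A"),
    (1, "1997-06-02", 123, "B"),
    (1, "1997-06-02", 211, "B"),
    (1, "1997-08-19", 67, "A"),
    (1, "1997-08-19", 23, "A"),
]

def dedupe(rows):
    """Keep one whole row per (ID, DATE): lowest CODE, ties broken in favour of PROG 'B'."""
    best = {}
    for row in rows:
        key = (row[0], row[1])
        rank = (row[2], row[3] != "B")  # lower code first, then 'B' before others
        if key not in best or rank < best[key][0]:
            best[key] = (rank, row)
    # Return the surviving rows ordered by (ID, DATE).
    return [row for _, row in sorted(best.values(), key=lambda item: item[1][:2])]

for row in dedupe(rows):
    print(row)
```

On the sample data this keeps (24, A), (123, B) and (23, A), which is the desired output from the question.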

Eliminate full table scan due to BETWEEN (and GROUP BY)

Description
According to the explain command, there is a range that is causing a query to perform a full table scan (160k rows). How do I keep the range condition and reduce the scanning? I expect the culprit to be:
Y.YEAR BETWEEN 1900 AND 2009 AND
Code
Here is the code that has the range condition (the STATION_DISTRICT is likely superfluous).
SELECT
COUNT(1) as MEASUREMENTS,
AVG(D.AMOUNT) as AMOUNT,
Y.YEAR as YEAR,
MAKEDATE(Y.YEAR,1) as AMOUNT_DATE
FROM
CITY C,
STATION S,
STATION_DISTRICT SD,
YEAR_REF Y FORCE INDEX(YEAR_IDX),
MONTH_REF M,
DAILY D
WHERE
-- For a specific city ...
--
C.ID = 10663 AND
-- Find all the stations within a specific unit radius ...
--
6371.009 *
SQRT(
POW(RADIANS(C.LATITUDE_DECIMAL - S.LATITUDE_DECIMAL), 2) +
(COS(RADIANS(C.LATITUDE_DECIMAL + S.LATITUDE_DECIMAL) / 2) *
POW(RADIANS(C.LONGITUDE_DECIMAL - S.LONGITUDE_DECIMAL), 2)) ) <= 50 AND
-- Get the station district identification for the matching station.
--
S.STATION_DISTRICT_ID = SD.ID AND
-- Gather all known years for that station ...
--
Y.STATION_DISTRICT_ID = SD.ID AND
-- The data before 1900 is shaky; insufficient after 2009.
--
Y.YEAR BETWEEN 1900 AND 2009 AND
-- Filtered by all known months ...
--
M.YEAR_REF_ID = Y.ID AND
-- Whittled down by category ...
--
M.CATEGORY_ID = '003' AND
-- Into the valid daily climate data.
--
M.ID = D.MONTH_REF_ID AND
D.DAILY_FLAG_ID <> 'M'
GROUP BY
Y.YEAR
Update
The SQL is performing a full table scan, which results in MySQL performing a "copy to tmp table", as shown here:
+----+-------------+-------+--------+-----------------------------------+--------------+---------+-------------------------------+--------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+--------+-----------------------------------+--------------+---------+-------------------------------+--------+-------------+
| 1 | SIMPLE | C | const | PRIMARY | PRIMARY | 4 | const | 1 | |
| 1 | SIMPLE | Y | range | YEAR_IDX | YEAR_IDX | 4 | NULL | 160422 | Using where |
| 1 | SIMPLE | SD | eq_ref | PRIMARY | PRIMARY | 4 | climate.Y.STATION_DISTRICT_ID | 1 | Using index |
| 1 | SIMPLE | S | eq_ref | PRIMARY | PRIMARY | 4 | climate.SD.ID | 1 | Using where |
| 1 | SIMPLE | M | ref | PRIMARY,YEAR_REF_IDX,CATEGORY_IDX | YEAR_REF_IDX | 8 | climate.Y.ID | 54 | Using where |
| 1 | SIMPLE | D | ref | INDEX | INDEX | 8 | climate.M.ID | 11 | Using where |
+----+-------------+-------+--------+-----------------------------------+--------------+---------+-------------------------------+--------+-------------+
Answer
After using the STRAIGHT_JOIN:
+----+-------------+-------+--------+-----------------------------------+---------------+---------+-------------------------------+------+---------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+--------+-----------------------------------+---------------+---------+-------------------------------+------+---------------------------------+
| 1 | SIMPLE | C | const | PRIMARY | PRIMARY | 4 | const | 1 | Using temporary; Using filesort |
| 1 | SIMPLE | S | ALL | PRIMARY | NULL | NULL | NULL | 7795 | Using where |
| 1 | SIMPLE | SD | eq_ref | PRIMARY | PRIMARY | 4 | climate.S.STATION_DISTRICT_ID | 1 | Using index |
| 1 | SIMPLE | Y | ref | PRIMARY,STAT_YEAR_IDX | STAT_YEAR_IDX | 4 | climate.S.STATION_DISTRICT_ID | 1650 | Using where |
| 1 | SIMPLE | M | ref | PRIMARY,YEAR_REF_IDX,CATEGORY_IDX | YEAR_REF_IDX | 8 | climate.Y.ID | 54 | Using where |
| 1 | SIMPLE | D | ref | INDEX | INDEX | 8 | climate.M.ID | 11 | Using where |
+----+-------------+-------+--------+-----------------------------------+---------------+---------+-------------------------------+------+---------------------------------+
Related
http://dev.mysql.com/doc/refman/5.0/en/how-to-avoid-table-scan.html
http://dev.mysql.com/doc/refman/5.0/en/where-optimizations.html
Optimize SQL that uses between clause
Thank you!

One request: it looks like you know your data. Add the keyword STRAIGHT_JOIN and see the results:
SELECT STRAIGHT_JOIN ... the rest of your query...
STRAIGHT_JOIN tells MySQL to join the tables in the order they are listed. Your CITY table is first in the FROM list, which indicates you expect it to be the driving table, and the WHERE condition on CITY is the immediate filter. With that in place, the rest of the query will probably run quickly.
Hope it helps. It has worked for me with government data of millions of records queried and joined to 10+ lookup tables, where MySQL was trying to think for me.

To run BETWEEN queries efficiently, you will want a B-tree index on your YEAR column. For example:
CREATE INDEX id_index USING BTREE ON YEAR_REF (YEAR);
B-tree indexes allow efficient range queries. If this is in fact the root problem, an index like this should eliminate the full table scan so that only the part of the table inside the range is scanned.
However, as with any optimisation advice, you should measure to make sure that you don't do more harm than good.

Can you change from searching within a radius to search in a bounding box?
You know the city so you can calculate a bounding box in your application.
Perhaps this
S.LATITUDE_DECIMAL >= latitude_lower and
S.LATITUDE_DECIMAL <= latitude_upper and
S.LONGITUDE_DECIMAL >= longitude_lower and
S.LONGITUDE_DECIMAL <= longitude_upper
could be a little faster?
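
A rough way to compute those four bounds in the application is sketched below in Python. It is an approximation under simple assumptions (spherical Earth, city not near the poles or the 180° meridian); the function name `bounding_box` and the sample coordinates are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.009  # same constant as in the question's distance formula

def bounding_box(lat_deg, lon_deg, radius_km):
    """Return (lat_lower, lat_upper, lon_lower, lon_upper) in degrees."""
    dlat = math.degrees(radius_km / EARTH_RADIUS_KM)
    # Longitude degrees shrink with latitude, so widen the box accordingly.
    dlon = dlat / math.cos(math.radians(lat_deg))
    return (lat_deg - dlat, lat_deg + dlat, lon_deg - dlon, lon_deg + dlon)

# Hypothetical city coordinates, for illustration only.
print(bounding_box(49.25, -123.1, 50))
```

The box only narrows the candidate stations cheaply; the radius expression from the original query can stay as a final filter if an exact circle is required.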
