I am running PostgreSQL 9.1.9 x64 with PostGIS 2.0.3 under Windows Server 2008 R2.
I have a table:
CREATE TABLE field_data.trench_samples (
pgid SERIAL NOT NULL,
trench_id TEXT,
sample_id TEXT,
from_m INTEGER
);
With some data in it:
INSERT INTO field_data.trench_samples (
trench_id, sample_id, from_m
)
VALUES
('TR01', '1000001', 0),
('TR01', '1000002', 5),
('TR01', '1000003', 10),
('TR01', '1000004', 15),
('TR02', '1000005', 0),
('TR02', '1000006', 3),
('TR02', '1000007', 9),
('TR02', '1000008', 14);
Now, what I am interested in is finding the difference (distance in metres in this example) between a record's "from_m" and the "next" "from_m" for that trench_id.
So, based on the data above, I'd like to end up with a query that produces the following table:
pgid, trench_id, sample_id, from_m, to_m, interval
1, 'TR01', '1000001', 0, 5, 5
2, 'TR01', '1000002', 5, 10, 5
3, 'TR01', '1000003', 10, 15, 5
4, 'TR01', '1000004', 15, 20, 5
5, 'TR02', '1000005', 0, 3, 3
6, 'TR02', '1000006', 3, 9, 6
7, 'TR02', '1000007', 9, 14, 5
8, 'TR02', '1000008', 14, 19, 5
Now, you are likely saying: "Wait, how do we infer an interval length for the last sample in each line, since there is no 'next' from_m to compare to?"
For the "ends" of lines (sample_id 1000004 and 1000008) I would like to use the identical interval length of the previous two samples.
Of course, I have no idea how to tackle this in my current environment. Your help is very much appreciated.
Here is how you get the difference, reusing the previous interval for the last sample in each trench (as shown in the expected output, though not explained explicitly in the text).
The logic here is repeated application of lead() and lag(). First apply lead() to calculate each interval. Then apply lag() to fill in the interval at the end of each trench, by reusing the previous interval.
The rest is basically just arithmetic:
select pgid, trench_id, sample_id, from_m,
       coalesce(to_m,
                from_m + lag(interval) over (partition by trench_id order by sample_id)
               ) as to_m,
       coalesce(interval,
                lag(interval) over (partition by trench_id order by sample_id)
               ) as interval
from (select t.*,
             lead(from_m) over (partition by trench_id order by sample_id) as to_m,
             (lead(from_m) over (partition by trench_id order by sample_id) -
              from_m
             ) as interval
      from field_data.trench_samples t
     ) t
order by pgid
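To see the arithmetic for TR02: the inner subquery produces (from_m, to_m, interval) = (0, 3, 3), (3, 9, 6), (9, 14, 5) and (14, NULL, NULL); for the last row the outer coalesce fills in to_m = 14 + 5 = 19 and interval = 5 by reusing the previous interval.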
Here is the SQLFiddle showing it working.
Related
How to know if two intervals coincide at some point?
I have two tables that store two intervals where the values mean meters.
The first interval corresponds to geological codes (VBv, P4, etc).
The second interval corresponds to samples.
They are connected through a field called Hole ID.
CREATE TABLE codes (
code VARCHAR (10),
depth_from INT,
depth_to INT,
hole_id INT
);
INSERT INTO codes VALUES ('P4', 1, 2, 100);
INSERT INTO codes VALUES ('VBv', 2, 6, 100);
INSERT INTO codes VALUES ('P4', 6, 10, 100);
CREATE TABLE samples (
sample VARCHAR (50),
depth_from INT,
depth_to INT,
hole_id INT
);
INSERT INTO samples VALUES ('OP0051780', 1, 3, 100);
INSERT INTO samples VALUES ('OP0051781', 3, 9, 100);
INSERT INTO samples VALUES ('OP0051780', 9, 10, 100);
I need all the sample ranges that match the code ranges, with a certain code passed as a parameter.
What I have tried: I built a query that checks whether the "from" or "to" values match. I also check whether either interval is contained in the other.
SELECT * FROM codes INNER JOIN samples ON codes.hole_id = samples.hole_id
WHERE codes.code = 'VBv' AND
(
-- Possibility 1: From or to match
(samples.depth_from = codes.depth_from or samples.depth_to = codes.depth_to)
-- Possibility 2: Some interval contained in another.
or (samples.depth_from >= codes.depth_from and samples.depth_to <= codes.depth_to)
or (codes.depth_from >= samples.depth_from and codes.depth_to <= samples.depth_to)
)
This works for the situations where an endpoint matches or one interval is contained in the other. But when neither the "from" nor the "to" matches and neither interval is contained in the other (a partial overlap), I don't know how to solve it.
Two intervals overlap exactly when each one starts at or before the point where the other ends, so a single predicate covers all of these cases, including partial overlaps:
SELECT * FROM codes INNER JOIN samples ON codes.hole_id = samples.hole_id
WHERE codes.code = 'VBv' AND
(samples.depth_from <= codes.depth_to AND samples.depth_to >= codes.depth_from)
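For example, for code 'VBv' (depth 2 to 6), the predicate evaluates against the sample data as follows:
OP0051780 (1 to 3):  1 <= 6 AND 3 >= 2 → match (overlaps the start)
OP0051781 (3 to 9):  3 <= 6 AND 9 >= 2 → match (the partial overlap the original query misses)
OP0051780 (9 to 10): 9 <= 6 is false   → no match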
I am trying to add summary statistics (just total and average) to a table with 21 columns and 7 rows of data; I would like the two rows of summary statistics to start at row 8. I've been trying a query along these lines, without any luck:
SELECT *
FROM
( SELECT 1,
weekday, summer_member_total, summer_member_avg_duration, summer_casual_total, summer_casual_avg_duration,
fall_member_total, fall_member_avg_duration, fall_casual_total, fall_casual_avg_duration,
winter_member_total, winter_member_avg_duration, winter_casual_total, winter_casual_avg_duration,
spring_member_total, spring_member_avg_duration, spring_casual_total, spring_casual_avg_duration,
member_total, member_avg_duration, casual_total, casual_avg_duration,
FROM `case-study-319921.2020_2021_Trip_Data.2020_2021_Summary_Stats`
UNION ALL
SELECT 8,
'TOTAL',
SUM(summer_member_total),
SUM(summer_member_avg_duration),
SUM(summer_casual_total),
SUM(summer_casual_avg_duration),
SUM(fall_member_total),
SUM(fall_member_avg_duration),
SUM(fall_casual_total),
SUM(fall_casual_avg_duration),
SUM(winter_member_total),
SUM(winter_member_avg_duration),
SUM(winter_casual_total),
SUM(winter_casual_avg_duration),
SUM(spring_member_total),
SUM(spring_member_avg_duration),
SUM(spring_casual_total),
SUM(spring_casual_avg_duration),
SUM(member_total),
SUM(member_avg_duration),
SUM(casual_total),
SUM(casual_avg_duration),
FROM `case-study-319921.2020_2021_Trip_Data.2020_2021_Summary_Stats`
UNION ALL
SELECT 9,
'AVG',
AVG(summer_member_total),
AVG(summer_member_avg_duration),
AVG(summer_casual_total),
AVG(summer_casual_avg_duration),
AVG(fall_member_total),
AVG(fall_member_avg_duration),
AVG(fall_casual_total),
AVG(fall_casual_avg_duration),
AVG(winter_member_total),
AVG(winter_member_avg_duration),
AVG(winter_casual_total),
AVG(winter_casual_avg_duration),
AVG(spring_member_total),
AVG(spring_member_avg_duration),
AVG(spring_casual_total),
AVG(spring_casual_avg_duration),
AVG(member_total),
AVG(member_avg_duration),
AVG(casual_total),
AVG(casual_avg_duration),
FROM `case-study-319921.2020_2021_Trip_Data.2020_2021_Summary_Stats` )
ORDER BY 1
Any ideas on how to approach this?
As an option to fix your issue, replace
SELECT 1,
weekday, summer_
with
SELECT 1,
CAST(weekday AS STRING) weekday , summer_
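For context, the reason the CAST helps (assuming weekday is not already a STRING): UNION ALL in BigQuery requires every branch to produce the same column types, and the second and third branches put the string literals 'TOTAL' and 'AVG' in the weekday position. A trimmed-down sketch of the fix, using just one measure column:
SELECT 1 AS row_order, CAST(weekday AS STRING) AS weekday, member_total
FROM `case-study-319921.2020_2021_Trip_Data.2020_2021_Summary_Stats`
UNION ALL
SELECT 8, 'TOTAL', SUM(member_total)
FROM `case-study-319921.2020_2021_Trip_Data.2020_2021_Summary_Stats`
ORDER BY 1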
My requirement is to sort the rows month-wise, i.e. Jan, Feb, Mar, etc.; right now they are not sorted.
I have tried writing a query in the query window with ORDER BY CASE WHEN...
I have also tried writing a query in the expression of the month field, i.e.:
IIF(Fields!Month_Y.Value = "Feb-19", 2,
IIF(Fields!Month_Y.Value = "Mar-19", 3,
IIF(Fields!Month_Y.Value = "Apr-19", 4,
IIF(Fields!Month_Y.Value = "May-19", 5,
IIF(Fields!Month_Y.Value = "Jun-19", 6,
IIF(Fields!Month_Y.Value = "Jul-19", 7,
IIF(Fields!Month_Y.Value = "Aug-19", 8,
IIF(Fields!Month_Y.Value = "Sep-19", 9,
IIF(Fields!Month_Y.Value = "Oct-19", 10,
IIF(Fields!Month_Y.Value = "Nov-19", 11, 12)))))))))))
I have written the same IIF expression in the Tablix sorting expression field too, but it is still not sorting the report month-wise. If anyone can have a look and suggest a solution, please do.
Thank you in advance.
I understand from your question that you tried to use CASE in the dataset query:
I have tried writing query under query window order by case when...
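A dataset-side sort of that kind would look roughly like this (a sketch only; the table and measure column names are hypothetical):
SELECT Month_Y, Total_Sales
FROM monthly_report
ORDER BY CASE Month_Y
           WHEN 'Jan-19' THEN 1
           WHEN 'Feb-19' THEN 2
           WHEN 'Mar-19' THEN 3
           WHEN 'Apr-19' THEN 4
           WHEN 'May-19' THEN 5
           WHEN 'Jun-19' THEN 6
           WHEN 'Jul-19' THEN 7
           WHEN 'Aug-19' THEN 8
           WHEN 'Sep-19' THEN 9
           WHEN 'Oct-19' THEN 10
           WHEN 'Nov-19' THEN 11
           ELSE 12
         END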
So the next thing that you need to do is:
Right-click on the column.
Go to "Shorting".
Press "Add".
Choose your "number" column(that you mapped - "Feb-19" etc.).
Choose your order method "Z-A" or "A-Z".
I simulated your case and it worked for me.
I have the following (simplified) table layout:
TABLE blocks (id)
TABLE content (id, blockId, order, data, type)
content.blockId is a foreign key to blocks.id. The idea is that in the content table you have many content entries with different types for one block.
I am now looking for a query that provides an aggregation for a given blockId, where the content entries of the 3 different types are concatenated and placed into respective columns.
I have already started and found the listagg function, which works well; the following statement lists all the content entries in one column:
SELECT listagg(c.data, ',') WITHIN GROUP (ORDER BY c.order) FROM content c WHERE c.blockId = 330;
However, the concatenated string contains all the data elements of the block in a single column. What I would like is to have them split into separate columns based on the type. For example, given the following rows in content:
1, 1, 0, "content1", "FRAGMENT"
2, 1, 1, "content2", "BULK"
3, 1, 3, "content4", "FRAGMENT"
4, 1, 2, "content3", "FRAGMENT"
Now I want the output as 2 columns, one FRAGMENT and one BULK, where FRAGMENT contains "content1,content3,content4" and BULK contains "content2".
Is there an efficient way of achieving this?
You can use case:
SELECT listagg(CASE WHEN c.type = 'FRAGMENT' THEN c.data END, ',') WITHIN GROUP (ORDER BY c.order) as fragments,
       listagg(CASE WHEN c.type = 'BULK' THEN c.data END, ',') WITHIN GROUP (ORDER BY c.order) as bulks
FROM content c
WHERE c.blockId = 330;
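With the four sample rows above, this should return:
FRAGMENTS                   | BULKS
----------------------------+---------
content1,content3,content4  | content2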
As an alternative, if you want it more dynamic, you could pivot the outcome.
Note that this only works on Oracle 11g R2 and later. Here's an example of how it could look:
select *
from (
  with dataSet as (
    select 1 idV, 1 blockId, 0 orderV, 'content1' dataV, 'FRAGMENT' typeV from dual union
    select 2, 1, 1, 'content2', 'BULK' from dual union
    select 3, 1, 3, 'content4', 'FRAGMENT' from dual union
    select 4, 1, 2, 'content3', 'FRAGMENT' from dual
  )
  select typeV,
         listagg(dataSet.dataV, ',') WITHIN GROUP (ORDER BY orderV) OVER (PARTITION BY typeV) dataV
  from dataSet
)
pivot
(
  max(dataV)
  for typeV in ('BULK', 'FRAGMENT')
)
O/P:
BULK     | FRAGMENT
---------+----------------------------
content2 | content1,content3,content4
The important things here:
OVER (PARTITION BY typeV): this acts like a GROUP BY for the listagg, concatenating everything that has the same typeV.
for typeV in ('BULK', 'FRAGMENT'): this gathers the data for BULK and FRAGMENT and produces a separate column for each.
max(dataV): simply there to provide an aggregate function; pivot won't work without one.
I have a problem with filtering by datetime columns.
I tried these two methods:
datefield < '2013-03-15 17:17:55.179'
datefield < CAST('2013-03-15 17:17:55.179' AS datetime)
I have a large database with over 3,000,000 main objects, so I need to improve the performance of my datetime filtering. I have been reading about UNIX timestamps (converting all datetimes to UNIX timestamps and then filtering on that field).
I think that would be better than filtering by datetime, but if anyone knows another way, I would appreciate it.
My query is:
SELECT TOP (100) ev.Title as Event_name, po.Name as POI_name,
po.Address, po.City, po.Region, po.Country, po.Latitude, po.Longitude, ev.Start_time,
(Select ID_Category FROM SubCategory s where ev.ID_SubCategory = s.ID_SubCategory) as ID_Category,
ev.ID_SubCategory, ev.ID_Event, ev.ID_Channel, IDChanelEvent,
ev.FavoriteCount, po.gmtOffset, v.IsFavorite, v1.IsFavorite
FROM Events ev
JOIN POI po ON ev.ID_POI = po.ID_POI
JOIN (SELECT et.id_event as joinIdEv FROM EventTagLink et, tags t
WHERE t.id_tag = et.id_tag
AND ( t.Title = N'music' )
) as joinEvents
ON joinEvents.joinIdEv = ev.ID_Event
LEFT JOIN Viewed v ON v.ID_Event = ev.ID_Event AND v.ID_User = 1 AND v.IsFavorite = 1 LEFT join Viewed v1 ON v1.ID_Event = ev.ID_Event AND v1.ID_User = 1 AND v1.IsFavorite = 0
WHERE
--ev.GmtStop_time > '2013-03-15 14:17:55.188' AND
po.Latitude > 41.31423 AND po.Latitude < 61.60511
AND po.Longitude > -6.676602 AND po.Longitude < 17.04498
AND ev.ID_SubCategory in (3, 12, 21, 4, 30, 13, 22, 6, 14, 40, 23, 7, 32, 15, 41, 8, 50, 33, 16, 42, 25, 9, 34, 17, 35, 18, 44, 27, 36, 19, 45, 28, 37, 46, 29, 38, 47, 39, 48, 49, 10, 1, 11, 2, 20)
--AND ev.GmtStart_time< '2013-03-15 17:17:55.179'
AND v1.IsFavorite is null
The datetime filters are the lines I have commented out.
If I turn these filters off, the request takes a few seconds; if I turn them on, it takes over 25 seconds.
Execution plan with filtering datetime
Execution plan without datetime filter
So there is a lot of discussion about execution plans, indexes and so on. But what about the UNIX timestamp, which is the main reason why I posted the question? Would it improve performance for datetime filtering?
Just a suggestion when it comes to indexes on datetime columns in MSSQL: the index footprint impacts search times (yes, this seems obvious, but please read onward).
The point is that when indexing a full datetime such as '2015-06-05 22:47:20.102', the index has to account for every position within the datetime, which makes it large and bulky. A successful approach I have leveraged is to create a new datetime column, populate it by rounding the time to the hour, and build the index on that new column. For example, '2015-06-05 22:47:20.102' becomes '2015-06-05 22:00:00.000'. With this approach we leave the detailed data alone (we can still display it), while searching on the new column, which in my experience returns results at least ~10x faster, because the index does not have to account for the minutes, seconds and milliseconds.
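A minimal sketch of that approach against the Events table from the question (the new column and index names are hypothetical):
-- New column holding the start time truncated to the hour
ALTER TABLE Events ADD GmtStart_hour DATETIME;

-- Populate it by rounding down to the hour (classic T-SQL idiom)
UPDATE Events
SET GmtStart_hour = DATEADD(HOUR, DATEDIFF(HOUR, 0, GmtStart_time), 0);

-- Index the coarser column and filter on it instead of the detailed datetime
CREATE INDEX IX_Events_GmtStart_hour ON Events (GmtStart_hour);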
You need to look at your execution plan first to see what SQL Server is doing. More than likely, you just need to add an index. Little conversions like this are almost never the reason a query is slow; indexes are a good first stop for fixing queries.
You don't need to make it the clustered index. Making it the clustered index means you don't need a lookup, but for only 100 rows a lookup is very fast. I would put the datetime and subcategory into a nonclustered index, in that order.
If you are ordering, you should also make sure the ordering column is in an index. Since the query will typically use only one index per table, you'll need all the relevant columns in the same index, in the right order.
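For example, a sketch of such a nonclustered index (the name is hypothetical; adjust the columns to your schema):
CREATE NONCLUSTERED INDEX IX_Events_Start_SubCategory
ON Events (GmtStart_time, ID_SubCategory);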
But first, get your actual execution plan!
For better performance I suggest you create new indexes:
CREATE INDEX x1 ON LiveCity.dbo.Tags(Title) INCLUDE(ID_Tag)
CREATE INDEX x2 ON LiveCity.dbo.Events(ID_Event, GmtStart_time, GmtStop_time)
INCLUDE(
FavoriteCount,
ID_Channel,
ID_POI,
ID_SubCategory,
IDChanelEvent,
Start_time,
Title
)
CREATE INDEX x ON LiveCity.dbo.POI(ID_POI, Latitude, Longitude)
INCLUDE(
Address,
City,
Country,
gmtOffset,
Name,
Region
)
This will help you avoid RID lookup operations and improve the overall performance of the query.
Try this one -
;WITH cte AS (
SELECT IsFavorite, ID_Event
FROM Viewed
WHERE ID_User = 1
)
SELECT TOP (100)
Event_name = ev.Title
, POI_name = po.Name
, po.[address]
, po.City
, po.Region
, po.Country
, po.Latitude
, po.Longitude
, ev.start_time
, s.ID_Category
, ev.ID_SubCategory
, ev.ID_Event
, ev.ID_Channel
, IDChanelEvent
, ev.FavoriteCount
, po.gmtOffset
, v.IsFavorite
, IsFavorite = NULL
FROM [events] ev
JOIN POI po ON ev.ID_POI = po.ID_POI
LEFT JOIN SubCategory s ON ev.ID_SubCategory = s.ID_SubCategory
LEFT JOIN cte v ON v.ID_Event = ev.ID_Event AND v.IsFavorite = 1
WHERE po.Latitude BETWEEN 41.31423 AND 61.60511
AND po.Longitude BETWEEN -6.676602 AND 17.04498
AND ev.ID_SubCategory IN (3, 12, 21, 4, 30, 13, 22, 6, 14, 40, 23, 7, 32, 15, 41, 8, 50, 33, 16, 42, 25, 9, 34, 17, 35, 18, 44, 27, 36, 19, 45, 28, 37, 46, 29, 38, 47, 39, 48, 49, 10, 1, 11, 2, 20)
AND EXISTS (
    SELECT 1
    FROM EventTagLink et
    JOIN Tags t ON t.id_tag = et.id_tag
    WHERE t.Title = N'music'
        AND et.id_event = ev.ID_Event
)
AND NOT EXISTS (
SELECT *
FROM cte v1
WHERE v1.ID_Event = ev.ID_Event AND v1.IsFavorite = 0
)
Create a clustered index on the datetime field; it will definitely help. We faced the same problem earlier and solved it by creating an index on the datetime column.
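For example (a sketch; the index name is hypothetical, and this assumes the table does not already have a clustered index on another column):
CREATE CLUSTERED INDEX IX_Events_GmtStart_time ON Events (GmtStart_time);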