I have a table of roads that stores the start and end mileage of every road.
I need a query that returns the same data plus extra rows for the gaps between roads. Each extra row should hold the start and end mileage of a gap and have the name column set to 'gap'.
Initial table:
id name kmstart kmend
1 road1 0 150
2 road2 150 200
3 road3 220 257
4 road4 260 290
Result query:
id name kmstart kmend
1 road1 0 150
2 road2 150 200
null gap 200 220
3 road3 220 257
null gap 257 260
4 road4 260 290
Try this query:
SELECT NULL AS id, 'gap' AS name, previous_kmend AS kmstart, kmstart AS kmend
FROM (
SELECT id, name, kmstart, kmend, LAG(kmend) OVER (ORDER BY kmstart, kmend) AS previous_kmend
FROM roads
)
WHERE previous_kmend < kmstart
UNION ALL
SELECT id, name, kmstart, kmend
FROM roads
ORDER BY kmstart, kmend
I just put up a quick test and it works for me.
It uses the LAG function to get the kmend value of the previous row (ordered by kmstart, kmend), and returns a "gap" row whenever that value is less than the current row's kmstart. I've written an article about the LAG function recently, so it was easy to remember.
Is this what you're after?
Also, as the other commenters have mentioned, "name" isn't a good column name as it's a reserved word. I've left it here in the code so it's consistent with your question though.
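To try the LAG approach without a database server, here is a minimal sketch using Python's built-in sqlite3 module. It assumes SQLite 3.25 or later (needed for window functions) as a stand-in for the asker's database; the inner alias cur_kmstart is introduced only for this illustration.

```python
import sqlite3

# Assumption: an in-memory SQLite database stands in for the real one.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE roads (id INTEGER, name TEXT, kmstart INTEGER, kmend INTEGER)"
)
conn.executemany(
    "INSERT INTO roads VALUES (?, ?, ?, ?)",
    [(1, "road1", 0, 150), (2, "road2", 150, 200),
     (3, "road3", 220, 257), (4, "road4", 260, 290)],
)

query = """
SELECT NULL AS id, 'gap' AS name, previous_kmend AS kmstart, cur_kmstart AS kmend
FROM (
    -- LAG fetches kmend from the previous road in mileage order
    SELECT kmstart AS cur_kmstart,
           LAG(kmend) OVER (ORDER BY kmstart, kmend) AS previous_kmend
    FROM roads
) AS r
WHERE previous_kmend < cur_kmstart   -- a gap exists before this road
UNION ALL
SELECT id, name, kmstart, kmend
FROM roads
ORDER BY kmstart, kmend
"""

rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

The first road has no predecessor, so LAG returns NULL there and the comparison filters that row out of the gap branch.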
I want to compare two VARCHAR2 values and have my function return their similarity as a percentage, together with the ID of the record from the table.
I have a table (SYMPTOMS) with a field Symptom_Descr (VARCHAR2) and a variable v_symptom (VARCHAR2), and I want to compare the variable with that field.
For example, this is my table:
The variable that I want to compare is:
'flashes the room lights 5 times'
I want this as a result:
id  percentage
1   0%
2   0%
3   90%
Another example if the variable is 'washes her hands 7 times':
id  percentage
1   80%
2   0%
3   10%
The above percentages are not exact.
If the above cannot be done, then what can I do to find the similarities?
You can use the UTL_MATCH package:
SELECT id,
UTL_MATCH.EDIT_DISTANCE_SIMILARITY(
symptom_descr,
'flashes the room lights 5 times'
) AS ed_similarity,
UTL_MATCH.JARO_WINKLER_SIMILARITY(
symptom_descr,
'flashes the room lights 5 times'
) AS jw_similarity
FROM symptoms;
Which, for the sample data:
CREATE TABLE symptoms (id, symptom_descr) AS
SELECT 1, 'washes his hands constantly' FROM DUAL UNION ALL
SELECT 2, 'checks several times if he turned off the water heater' FROM DUAL UNION ALL
SELECT 3, 'flashes the room lights too many times' FROM DUAL;
Outputs:
ID  ED_SIMILARITY  JW_SIMILARITY
1   30             62
2   25             62
3   79             93
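To show roughly what an edit-distance similarity measures, here is a small Python sketch. It is not Oracle's implementation: the normalisation (100 × (1 − distance / longer length), rounded) is an assumption, and both function names are made up for this example. Under that assumption it gives 79 for the third sample row, which agrees with the ED_SIMILARITY value above.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance using a two-row dynamic-programming table."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete a character from a
                current[j - 1] + 1,      # insert a character into a
                previous[j - 1] + cost,  # substitute (or keep) a character
            ))
        previous = current
    return previous[-1]


def edit_distance_similarity(a: str, b: str) -> int:
    """Assumed normalisation: 100 * (1 - distance / longer length), rounded."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 100  # two empty strings are treated as identical
    return round(100 * (1 - edit_distance(a, b) / longest))


score = edit_distance_similarity(
    "flashes the room lights too many times",
    "flashes the room lights 5 times",
)
print(score)
```

Character-level measures like this reward shared spelling rather than shared meaning, which is why unrelated sentences can still score well above zero.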
I have a line (segmented road) dataset where each road has a unique code but the road may be segmented. I want to combine the unique code (Location) and a sequential number (based on the start chainage of the segment) in a new field (Segment_ID).
I need it in the QGIS flavour of SQL, which only allows the commands listed at https://sqlite.org/lang.html
The example below shows the field I want to end up with in the last (Segment_ID) column:
fid RoadMntnc Location Segments Start_Chainage Segment_ID
640 Albatross_Cl 3 1 0 3.1
606 Allamanda_St 4 1 0 4.1
620 Barrbal_Dr 25 5 0 25.1
624 Barrbal_Dr 25 5 50 25.2
628 Barrbal_Dr 25 5 130 25.3
1092 Barrbal_Dr 25 5 180 25.4
1093 Barrbal_Dr 25 5 250 25.5
600 Bayil_Dr 27 2 120 27.2
601 Bayil_Dr 27 2 0 27.1
We would need to group by Location, then for each group sort by Start_Chainage in ascending order and append 1 to x to the Location value to get Location.Segment#.
Is this possible purely in SQL or do I need to use Python?
==== UPDATED code example based on QuestionGuyBob's suggestions
select ROW_NUMBER () OVER (
PARTITION BY Location
ORDER BY Start_Chainage
) RowNum, Location, RoadMntnc, Segments,
CAST(Location as VARCHAR(30))+ '.'+Cast(RowNum as VARCHAR(30)) AS Segment_ID
from test_simple_roads
This gives an error:
Query preparation error on PRAGMA table_info(_tview): no such column: RowNum
If I change RowNum to another field, it doesn't concatenate but adds the two integers.
If I change it to concat it works, but I still can't use RowNum or Row_Number as I get the same error (no such column).
concat(Location, '.',RowNum) AS Segment_ID
It looks like you will want to use ROW_NUMBER() windowing function. I looked at the documentation and it does support it.
https://www.sqlite.org/windowfunctions.html#built_in_window_functions
What you will most likely want to do is compute "ROW_NUMBER() OVER (PARTITION BY Location ORDER BY Start_Chainage) AS RowNum" in a sub-select, because a column alias cannot be referenced in the same SELECT list that defines it. Then concatenate it to Location in the outer query. Note that SQLite's string concatenation operator is ||; with + the operands are added as numbers. Something like:
SELECT
*
,CAST(Location AS TEXT) || '.' || CAST(RowNum AS TEXT) AS Segment_ID
FROM
(
SELECT
ROW_NUMBER () OVER (
PARTITION BY Location
ORDER BY Start_Chainage
) RowNum, Location, RoadMntnc, Segments
from test_simple_roads) AS TEST
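As a quick check outside QGIS, the sub-select idea can be run against an in-memory SQLite database from Python. This is a sketch under some assumptions: SQLite 3.25 or later, a subset of the sample rows, and SQLite's || operator for concatenation; the alias numbered is made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE test_simple_roads (
           fid INTEGER, RoadMntnc TEXT, Location INTEGER,
           Segments INTEGER, Start_Chainage INTEGER)"""
)
conn.executemany(
    "INSERT INTO test_simple_roads VALUES (?, ?, ?, ?, ?)",
    [(640, "Albatross_Cl", 3, 1, 0),
     (620, "Barrbal_Dr", 25, 5, 0),
     (624, "Barrbal_Dr", 25, 5, 50),
     (628, "Barrbal_Dr", 25, 5, 130),
     (600, "Bayil_Dr", 27, 2, 120),
     (601, "Bayil_Dr", 27, 2, 0)],
)

query = """
SELECT fid, Location || '.' || RowNum AS Segment_ID
FROM (
    -- number the segments of each road by ascending start chainage
    SELECT fid, Location,
           ROW_NUMBER() OVER (
               PARTITION BY Location ORDER BY Start_Chainage
           ) AS RowNum
    FROM test_simple_roads
) AS numbered
ORDER BY fid
"""

segment_ids = dict(conn.execute(query).fetchall())
print(segment_ids)
```

The row number is defined in the inner query and only referenced in the outer one, which avoids the "no such column" error.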
I would like to select the last row for each id.
Here is an example of what I have:
id money
1 200
1 150
1 500
3 50
4 40
4 300
5 110
Here is what I would like:
1 500
3 50
4 300
5 110
As you can see, I took the last row of each id and the money that corresponds to it.
I tried a GROUP BY id with ORDER BY id descending and LIMIT 1, but LIMIT is not available in PROC SQL in SAS, so it doesn't work.
Thanks in advance
Unlike SAS datasets, SQL tables represent unordered sets. In your case, it looks like you want the maximum value in the second column, in which case you can use aggregation:
proc sql;
select id, max(money)
from t
group by id;
quit;
If you actually mean the last row per id based on the ordering in the SAS dataset, I would suggest using a data step instead.
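The two readings of the question ("maximum per id" versus "last row per id in input order") can be compared with a short Python sketch. It uses sqlite3 for the aggregation and a plain dictionary for the order-dependent version; neither is SAS code, and the variable names are only illustrative.

```python
import sqlite3

rows = [(1, 200), (1, 150), (1, 500), (3, 50), (4, 40), (4, 300), (5, 110)]

# Reading 1: maximum money per id, which needs no row order at all.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER, money INTEGER)")
conn.executemany("INSERT INTO t VALUES (?, ?)", rows)
max_per_id = conn.execute(
    "SELECT id, MAX(money) FROM t GROUP BY id ORDER BY id"
).fetchall()

# Reading 2: last row per id in input order; later rows overwrite earlier ones.
last_per_id = {}
for id_, money in rows:
    last_per_id[id_] = money

print(max_per_id)
print(last_per_id)
```

For this sample the two readings happen to agree, because the last row of each id also holds its largest value; they would differ as soon as that is no longer true.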
Assume I have this data:
player_id stats
100 [{"position":"offense","wins":35},{"position":"defense","wins":17}]
200 [{"position":"offense","wins":85},{"position":"defense","wins":52}]
300 [{"position":"offense","wins":12},{"position":"defense","wins":98}]
And I want to display it as such:
player_id offense_wins defense_wins
100 35 17
200 85 52
300 12 98
The original data above is currently thrown into an ORC table using:
SELECT p.player_id
, s.position
, s.wins
FROM player_stats p
LATERAL VIEW EXPLODE(p.stats) sTable as s
Which gets me:
player_id position wins
100 offense 35
100 defense 17
200 offense 85
200 defense 52
300 offense 12
300 defense 98
From this point, in MySQL I can just GROUP BY the player_id, then use a CASE on the position to pull the associated wins value into its own column when it = 'offense' or 'defense', and wrap each CASE in a COALESCE() to prevent the nulls from coming through. Super fast.
In Hive, instead of COALESCE I have to use MIN or MAX, but the result will be the same regardless.
Here would be the primary way this data is queried:
SELECT player_id
, max(case when position = 'offense' then wins end) as offense_wins
, max(case when position = 'defense' then wins end) as defense_wins
FROM orctable
WHERE player_id = 100
GROUP BY player_id
Which would result in :
player_id offense_wins defense_wins
100 35 17
Now, in my real world situation the original dataset has six instances of that 'stats' array, each containing a map of 3-5 pairs. Because of this, the ORC table has player_id listed some 700 times from the repeated lateral views.
The entire table is 300k rows, and the player_id in the real world example is duplicated on this table just over 700 times.
Question 1 - is this the only and/or proper way to transform the data into the desired end result?
Question 2 - should this query be taking between 5 and 10 seconds to complete? The same dataset on a small MySQL server would do this in milliseconds.
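For reference, the conditional-aggregation pivot described above can be reproduced on a small scale with Python's sqlite3 module. This is only a sketch of the query shape on the three sample players; it says nothing about Hive's performance, and SQLite here is a stand-in for the ORC table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orctable (player_id INTEGER, position TEXT, wins INTEGER)"
)
conn.executemany(
    "INSERT INTO orctable VALUES (?, ?, ?)",
    [(100, "offense", 35), (100, "defense", 17),
     (200, "offense", 85), (200, "defense", 52),
     (300, "offense", 12), (300, "defense", 98)],
)

query = """
SELECT player_id,
       -- CASE yields NULL for non-matching rows; MAX ignores those NULLs
       MAX(CASE WHEN position = 'offense' THEN wins END) AS offense_wins,
       MAX(CASE WHEN position = 'defense' THEN wins END) AS defense_wins
FROM orctable
GROUP BY player_id
ORDER BY player_id
"""

pivot = conn.execute(query).fetchall()
print(pivot)
```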
I have a query that collects many different columns, and I want to include a column that sums the price of every component in an order. Right now, I already have a column that simply shows the price of every component of an order, but I am not sure how to create this new column.
I would think that the code would go something like this, but I am not really clear on what an aggregate function is or why I get an error regarding the aggregate function when I try to run this code.
SELECT ID, Location, Price, (SUM(PriceDescription) FROM table GROUP BY ID WHERE PriceDescription LIKE 'Cost.%' AS Summary)
FROM table
When I say each component, I mean that every ID I have has many different items that make up the general price. I only want to find out how much money I spend on the supplies I need for my pressure washers, which is why I wrote `WHERE PriceDescription LIKE 'Cost.%'`.
To explain further, I have receipts for every customer I've worked with, and in these receipts I write down my cost for the soap that I use and the tools for the pressure washer that I rent. I label all of these with 'Cost.' so they look like (Cost.Water), (Cost.Soap), (Cost.Gas), (Cost.Tools). I would like a column that sums all the Cost._ prices for Order 1, all the Cost._ prices for Order 2, and so on. I should also mention that each order does not have the same number of costs (sometimes when I use my power washer I might not have to buy gas, and occasionally soap).
I hope this makes sense, if not please let me know how I can explain further.
ID  Location  Price  PriceDescription
1   Park      10     Cost.Water
1   Park      8      Cost.Gas
1   Park      11     Cost.Soap
2   Tom       20     Cost.Water
2   Tom       6      Cost.Soap
3   Matt      15     Cost.Tools
3   Matt      15     Cost.Gas
3   Matt      21     Cost.Tools
4   College   32     Cost.Gas
4   College   22     Cost.Water
4   College   11     Cost.Tools
I would like my query to create a column like this:
ID  Location  Price  Summary
1   Park      10     29
1   Park      8
1   Park      11
2   Tom       20     26
2   Tom       6
3   Matt      15     51
3   Matt      15
3   Matt      21
4   College   32     65
4   College   22
4   College   11
But if the 'Summary' were printed on every line instead of just the top one, that would be okay too.
You just need SUM(Price) OVER (PARTITION BY Location), which gives the total on every row, as below:
SELECT ID, Location, Price, SUM(Price) over(Partition by Location) AS Summed_Price
FROM yourtable
WHERE PriceDescription LIKE 'Cost.%'
First, if your Price column really contains values that match 'Cost.%', then you can not apply SUM() over it. SUM() expects a number (e.g. INT, FLOAT, REAL or DECIMAL). If it is text then you need to explicitly convert it to a number by adding a CAST or CONVERT clause inside the SUM() call.
Second, your query syntax is wrong: you need GROUP BY, and the SELECT fields are not specified correctly. And you want to SUM() the Price field, not the PriceDescription field (which you can't even sum as I explained)
Assuming that Price is numeric (see my first remark), then this is how it can be done:
SELECT ID
, Location
, Price
, (SELECT SUM(Price)
FROM table
WHERE ID = T1.ID AND Location = T1.Location
) AS Summed_Price
FROM table AS T1
To get the exact result posted in the question, show the total only on the first row of each group:
SELECT
T.ID,
T.Location,
T.Price,
CASE WHEN T.R = 1 THEN T.Total END AS Summary
FROM (
SELECT
ID,
Location,
Price,
SUM(Price) OVER (PARTITION BY Location) AS Total,
ROW_NUMBER() OVER (PARTITION BY Location ORDER BY ID) AS R
FROM Table
) T
ORDER BY T.ID, T.R
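The "total on the first row only" pattern can be checked with a minimal sqlite3 sketch. Several things here are assumptions for illustration: the table name receipts, partitioning by ID rather than Location, and using SQLite's implicit rowid to keep the original row order inside each order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE receipts (
           ID INTEGER, Location TEXT, Price INTEGER, PriceDescription TEXT)"""
)
conn.executemany(
    "INSERT INTO receipts VALUES (?, ?, ?, ?)",
    [(1, "Park", 10, "Cost.Water"), (1, "Park", 8, "Cost.Gas"),
     (1, "Park", 11, "Cost.Soap"), (2, "Tom", 20, "Cost.Water"),
     (2, "Tom", 6, "Cost.Soap"), (3, "Matt", 15, "Cost.Tools"),
     (3, "Matt", 15, "Cost.Gas"), (3, "Matt", 21, "Cost.Tools"),
     (4, "College", 32, "Cost.Gas"), (4, "College", 22, "Cost.Water"),
     (4, "College", 11, "Cost.Tools")],
)

query = """
SELECT ID, Location, Price,
       CASE WHEN rn = 1 THEN total END AS Summary  -- NULL on all other rows
FROM (
    SELECT ID, Location, Price,
           SUM(Price) OVER (PARTITION BY ID) AS total,
           ROW_NUMBER() OVER (PARTITION BY ID ORDER BY rowid) AS rn
    FROM receipts
    WHERE PriceDescription LIKE 'Cost.%'
) AS t
ORDER BY ID, rn
"""

result = conn.execute(query).fetchall()
summaries = [(row[0], row[3]) for row in result if row[3] is not None]
print(summaries)
```

Sorting the outer query by the row number as well as the ID keeps the row that carries the total at the top of each group.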