Postgres - How to convert a row with an int range into intermediate rows from the individual values in that range?

Initial input:
CREATE TABLE TEST_TABLE(
start_i INT,
end_i INT,
v REAL
);
INSERT INTO TEST_TABLE (start_i, end_i, v)
VALUES (300,305,0.5),
(313,316,0.25);
start_i  end_i  v
300      305    0.5
313      316    0.25
Desired outcome:
Basically, I want to create intermediate rows with an additional column containing each value in the ranges shown in the initial table.
i    start_i  end_i  v
300  300      305    0.5
301  300      305    0.5
302  300      305    0.5
303  300      305    0.5
304  300      305    0.5
305  300      305    0.5
313  313      316    0.25
314  313      316    0.25
315  313      316    0.25
316  313      316    0.25
I have checked this post, but it's for SQL Server, while I am interested in Postgres. In addition, I am not using a date column type, but an integer instead.

Use generate_series():
select gs.i, t.*
from test_table t cross join lateral
     generate_series(start_i, end_i, 1) gs(i);
Strictly speaking, the lateral is not needed, but it does help explain what is going on. I should also note that you can also do:
select generate_series(start_i, end_i) as i, t.*
from test_table t;
However, generate_series() affects the number of rows returned by the query, and I am uncomfortable with having such side effects in the SELECT clause.
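The expansion that generate_series() performs can be sketched outside the database. This is a hypothetical Python equivalent of the query above, using the sample data from the question, not part of the original answer:

```python
# Expand each (start_i, end_i, v) row into one row per integer in the range,
# mirroring what generate_series(start_i, end_i, 1) does in Postgres.
rows = [(300, 305, 0.5), (313, 316, 0.25)]

expanded = [
    (i, start_i, end_i, v)
    for (start_i, end_i, v) in rows
    for i in range(start_i, end_i + 1)  # generate_series includes both endpoints
]

for row in expanded:
    print(row)
```

Each input row fans out into (end_i - start_i + 1) output rows, exactly matching the desired outcome table.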

Related

Convert This SQL Query to ANSI SQL

I would like to convert this SQL query into ANSI SQL. I am having trouble wrapping my head around the logic of this query.
I use Snowflake Data Warehouse, but it does not understand this query because of the 'delete' statement right before join, so I am trying to break it down. From my understanding the row number column is giving me the order from 1 to N based on timestamp and placing it in C. Then C is joined against itself on the rows other than the first row (based on id) and placed in C1. Then C1 is deleted from the overall data, which leaves only the first row.
I may be understanding the logic incorrectly, but I am not used to seeing the 'delete' statement right before a join. Let me know if I got the logic right, or point me in the right direction.
This query was copy/pasted from THIS stackoverflow question which has the exact situation I am trying to solve, but on a much larger scale.
with C as
(
select ID,
row_number() over(order by DT) as rn
from YourTable
)
delete C1
from C as C1
inner join C as C2
on C1.rn = C2.rn-1 and
C1.ID = C2.ID
The specific problem I am trying to solve is this. Let's assume I have this table. I need to partition the rows by primary key combinations (primKey 1 & 2) while maintaining timestamp order.
ID primKey1 primKey2 checkVar1 checkVar2 theTimestamp
100 1 2 302 423 2001-07-13
101 3 6 506 236 2005-10-25
100 1 2 302 423 2002-08-15
101 3 6 506 236 2008-12-05
101 3 6 300 100 2010-06-10
100 1 2 407 309 2005-09-05
100 1 2 302 423 2012-05-09
100 1 2 302 423 2003-07-24
Once the rows are partitioned and the timestamp is ordered within each partition, I need to delete the duplicate checkVar combination (checkVar 1 & 2) rows until the next change. Thus leaving me with the earliest unique row. The rows with asterisks are the ones which need to be removed since they are duplicates.
ID primKey1 primKey2 checkVar1 checkVar2 theTimestamp
100 1 2 302 423 2001-07-13
*100 1 2 302 423 2002-08-15
*100 1 2 302 423 2003-07-24
100 1 2 407 309 2005-09-05
100 1 2 302 423 2012-05-09
101 3 6 506 236 2005-10-25
*101 3 6 506 236 2008-12-05
101 3 6 300 100 2010-06-10
This is the final result. As you can see for ID=100, even though the 1st and 3rd record are the same, the checkVar combination changed in between, which is fine. I am only removing the duplicates until the values change.
ID primKey1 primKey2 checkVar1 checkVar2 theTimestamp
100 1 2 302 423 2001-07-13
100 1 2 407 309 2005-09-05
100 1 2 302 423 2012-05-09
101 3 6 506 236 2005-10-25
101 3 6 300 100 2010-06-10
If you want to keep the earliest row for each id, then you can use:
delete from yourtable yt
where yt.dt > (select min(yt2.dt)
               from yourtable yt2
               where yt2.id = yt.id
);
Your query would not do this, if that is your intent.
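The de-duplication the question actually describes (drop repeated checkVar combinations only until they change, within each primKey partition ordered by timestamp) can be sketched procedurally. This is an illustration in Python using the question's sample rows, not a Snowflake solution:

```python
# Each row: (primKey1, primKey2, checkVar1, checkVar2, theTimestamp)
rows = [
    (1, 2, 302, 423, "2001-07-13"),
    (3, 6, 506, 236, "2005-10-25"),
    (1, 2, 302, 423, "2002-08-15"),
    (3, 6, 506, 236, "2008-12-05"),
    (3, 6, 300, 100, "2010-06-10"),
    (1, 2, 407, 309, "2005-09-05"),
    (1, 2, 302, 423, "2012-05-09"),
    (1, 2, 302, 423, "2003-07-24"),
]

# Sort by partition key, then timestamp, so repeats become adjacent.
rows.sort(key=lambda r: (r[0], r[1], r[4]))

kept = []
prev_key = None    # current (primKey1, primKey2) partition
prev_check = None  # last seen (checkVar1, checkVar2) in that partition
for r in rows:
    key, check = (r[0], r[1]), (r[2], r[3])
    if key != prev_key or check != prev_check:
        kept.append(r)  # new partition, or the checkVar combination changed
    prev_key, prev_check = key, check

for r in kept:
    print(r)
```

This keeps the earliest row of each run of identical checkVar values, so a combination that reappears after a change (302/423 for ID 100) survives, matching the question's final result.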

How can I filter dataframe rows based on a quantile value of a column using groupby?

(There is probably a better way of asking the question, but hopefully this description will make it more clear)
A simplified view of my dataframe, showing 10 random rows, is:
Duration starting_station_id ending_station_id
5163 420 3077 3018
113379 240 3019 3056
9730 240 3047 3074
104058 900 3034 3042
93110 240 3055 3029
93144 240 3016 3014
48999 780 3005 3024
30905 360 3019 3025
88132 300 3022 3048
12673 240 3075 3031
What I want to do is group by starting_station_id and ending_station_id and then filter out the rows where the value in the Duration column for a group falls above the .99 quantile.
To do the groupby and quantile computation, I do:
df.groupby( ['starting_station_id', 'ending_station_id'] )[ 'Duration' ].quantile([.99])
and some partial output is:
3005 3006 0.99 3825.6
3007 0.99 1134.0
3008 0.99 5968.8
3009 0.99 9420.0
3010 0.99 1740.0
3011 0.99 41856.0
3014 0.99 22629.6
3016 0.99 1793.4
3018 0.99 37466.4
What I believe this is telling me is that for the group (3005, 3006), values >= 3825.6 fall at or above the .99 quantile. So, I want to filter out the rows where the Duration value for that group is >= 3825.6. (And then do the same for all of the other groups.)
What is the best way to do this?
Try this:
thresholds = df.groupby(['starting_station_id', 'ending_station_id'])['Duration'].quantile(.99)
keys = [(x, y) for x, y in zip(df['starting_station_id'], df['ending_station_id'])]
mask = df['Duration'].values <= thresholds[keys].values
out = df[mask]
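An equivalent approach, assuming pandas is available, is to use groupby().transform() to broadcast each group's quantile back onto its rows; the toy DataFrame below is invented for illustration and is not the asker's data:

```python
import pandas as pd

df = pd.DataFrame({
    'starting_station_id': [3005, 3005, 3005, 3007, 3007],
    'ending_station_id':   [3006, 3006, 3006, 3008, 3008],
    'Duration':            [240, 300, 99999, 120, 120],
})

# Broadcast each group's .99 quantile onto every row of that group,
# then keep only the rows at or below it.
q99 = (df.groupby(['starting_station_id', 'ending_station_id'])['Duration']
         .transform(lambda s: s.quantile(0.99)))
out = df[df['Duration'] <= q99]
print(out)
```

transform() returns a Series aligned with the original index, so no reindexing by group key is needed before comparing.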

Why can't I convert this varchar to numeric?

I have a table with values pasted in, but they initially start as varchar. I need to convert them to numeric, so I did
convert(decimal(10,3), cw.col7)
But this is returning Error 8114: Error converting data type varchar to numeric. The reason I made this question is because it does not give this error for a similar data set. Are there sometimes strange anomalies when using convert() or decimal()? Or should I maybe convert to float first?
The data:
col7
490.440
2
934
28,108.000
33,226.000
17,347.000
1,561.000
57
0
421.350
64
1,100.000
0
0
3,584
202.432
0
3,280
672.109
1,150
0
104
411.032
18,016
40
510,648
443,934.000
18,705
322,254
301
9,217
18,075
16,100
395
706,269
418,313
7,170
40,450
2,423
1,300
2,311
94,000.000
17,463
0
228
884
557
153
13
0
0
212.878
45,000.000
152
24,400
3,675
11,750
987
23,725
268,071
4,520.835
286,000
112,912.480
9,000
1,316
1,020
215,244
123,967
6,911
1,088.750
138,644
16,924
7,848
33,017
464,463
618
72,391
9,367
507,635.950
588,087
92,890
17,266
0
1,414,547
89,080
664
101,635
1,552,992
175
356
7,000
0
0
445
507,381
24,016
469,983
0
0
147,737
3,521
88,210
18,433.000
21,775
3,607
34,774
7,642
42,680
1,255
10,880
350,409.800
19,394.520
2,476,257.400
778.480
1,670.440
9,710
24,931.600
3,381.800
2,900
18,000
4,121
3,750
62,200
952
29.935
17.795
11.940
902
36,303
1,240
1,020
617
817
620
92,648
70,925
82,924
19,162.200
1,213.720
2,871
3,180
91,600
645
607
155,100
6
840
1,395
112
6,721
3,850
40
4,032
5,912
1,040
872
56
1,856
179
Try_Convert(money, ...) will handle the comma, while Try_Convert(decimal(10, 3), ...) will return null.
Example:
Select col7
      ,AsMoney   = Try_Convert(money, col7)
      ,AsDecimal = Try_Convert(decimal(10, 3), col7)
 from  YourTable
Try using cast and remove the comma:
SELECT CAST(replace(cw.col7, ',', '') AS DECIMAL(10,3))
from your_table
And as suggested by Jhon Cappelleti, you may need more than 3 decimal places, so you should use:
SELECT CAST(replace(cw.col7, ',', '') AS DECIMAL(12,4))
from your_table
Run this query:
select cw.col7
from cw
where try_convert(decimal(10, 3), cw.col7) is null;
This will show you the values that do not convert successfully. (If cw.col7 could be NULL then add and cw.col7 is not null to make the output more meaningful.)
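The failure mode is easy to reproduce outside SQL Server: a string with thousands-separator commas is not a valid numeric literal until the commas are stripped. A hypothetical Python illustration of the same REPLACE-then-convert idea:

```python
# Values like '28,108.000' fail a direct numeric conversion because of the
# thousands separators; stripping the commas first makes them parseable.
raw = ['490.440', '2', '28,108.000', '1,561.000', '0']

def to_number(s):
    try:
        return float(s)                       # fails on '28,108.000'
    except ValueError:
        return float(s.replace(',', ''))      # same idea as REPLACE(col7, ',', '')

converted = [to_number(s) for s in raw]
print(converted)
```

The plain values convert directly; only the comma-grouped ones need the cleanup pass, which is exactly why the error appears on some data sets and not on others.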

select columns from a pivot query

I have the following pivot query:
select *
from
(
select order_id,unit_price,quantity,sum(unit_price*quantity)
over (partition by order_id) as Total
from DEMO_ORDER_ITEMS
) tbla
pivot
(
sum(unit_price*quantity) as unit_totals
for unit_price in(30,50,60,80,110,120,125,150)
) tblb
order by order_id;
producing following result:
ORDER_ID TOTAL 30_UNIT_TOTALS 50_UNIT_TOTALS 60_UNIT_TOTALS 80_UNIT_TOTALS 110_UNIT_TOTALS 120_UNIT_TOTALS 125_UNIT_TOTALS 150_UNIT_TOTALS
1 1890 500 640 750
2 2380 60 250 180 480 220 240 500 450
3 1640 100 240 320 480 500
4 1090 180 200 220 240 250
5 950 150 180 320 300
6 1515 330 360 375 450
7 905 90 250 120 320 125
8 1060 160 330 120 450
9 730 240 240 250
10 870 250 320 300
I would like to change the order of the columns, ending with TOTAL. How can I select the columns in my preferred order?
This works: select tblb.* ...., but select tblb.30_UNIT_TOTALS fails.
You have to quote identifiers if they don't start with an alphabetic character. In addition, using quotes makes the identifier case sensitive. So you have to write:
tblb."30_UNIT_TOTALS"
From the documentation
Nonquoted identifiers must begin with an alphabetic character from your database character set. Quoted identifiers can begin with any character.
[...]
Nonquoted identifiers are not case sensitive. Oracle interprets them as uppercase. Quoted identifiers are case sensitive.

How to sort a sql result based on values in previous row?

I'm trying to sort a sql data selection by values in columns of the result set. The data looks like:
(This data is not sorted correctly, just an example)
ID projectID testName objectBefore objectAfter
=======================================================================================
13147 280 CDM-710 Generic TP-0000120 TOC~~#~~ -1 13148
1145 280 3.2 Quadrature/Carrier Null 25 Deg C 4940 1146
1146 280 3.2 Quadrature/Carrier Null 0 Deg C 1145 1147
1147 280 3.3 External Frequency Reference 1146 1148
1148 280 3.4 Phase Noise 50 Deg C 1147 1149
1149 280 3.4 Phase Noise 25 Deg C 1148 1150
1150 280 3.4 Phase Noise 0 Deg C 1149 1151
1151 280 3.5 Output Spurious 50 Deg C 1150 1152
1152 280 3.5 Output Spurious 25 Deg C 1151 1153
1153 280 3.5 Output Spurious 0 Deg C 1152 1154
............
18196 280 IP Regression Suite 18195 -1
The order of the data is based on the objectBefore and the objectAfter columns. The first row will always be when objectBefore = -1 and the last row will be when objectAfter = -1. In the above example, the second row would be ID 13148 as that is what row 1 objectAfter is equal to. Is there any way to write a query that would order the data in this manner?
This is actually sorting a linked list:
WITH SortedList (Id, objectBefore, projectID, testName, Level)
AS
(
    SELECT Id, objectBefore, projectID, testName, 0 as Level
    FROM YourTable
    WHERE objectBefore = -1

    UNION ALL

    SELECT ll.Id, ll.objectBefore, ll.projectID, ll.testName, s.Level + 1 as Level
    FROM YourTable ll
    INNER JOIN SortedList as s
        ON ll.objectBefore = s.Id
)
SELECT Id, objectBefore, projectID, testName
FROM SortedList
ORDER BY Level
You can find more details in this post
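The same linked-list walk can be sketched procedurally: start at the row whose objectBefore is -1 and follow the objectAfter pointers. This hypothetical Python version reuses some IDs from the question's sample, but the chain itself is invented for illustration:

```python
# Minimal rows: (ID, objectBefore, objectAfter); -1 marks the ends of the chain.
rows = [
    (1146, 1145, 1147),
    (1145, 4940, 1146),
    (13147, -1, 13148),   # head: objectBefore = -1
    (13148, 13147, 4940),
    (4940, 13148, 1145),
    (1147, 1146, -1),     # tail: objectAfter = -1
]

by_id = {r[0]: r for r in rows}

# Start at the head row, then repeatedly follow objectAfter.
current = next(r for r in rows if r[1] == -1)
ordered = []
while current is not None:
    ordered.append(current[0])
    nxt = current[2]
    current = by_id.get(nxt) if nxt != -1 else None

print(ordered)
```

The recursive CTE does the same traversal declaratively: the anchor member finds the head, each recursive step follows one pointer, and Level records the position in the chain.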