I have a table that has no unique identifier and is structured like this:
timestamp  category  value
0          a         0.12
1          a         0.231
0          b         0.23
2          c         0.01
I am trying to use a table-valued function that takes in only a timestamp and value:
timestamp  value
0          0.12
1          0.231
0          0.23
2          0.01
The function adds some columns x, y, z... to the table above. I'd like to use the function to perform this analysis on every category and produce an output like:
timestamp  category  value  x    y    z
0          a         0.12   ...  ...  ...
1          a         0.231  ...  ...  ...
0          b         0.23   ...  ...  ...
2          c         0.01   ...  ...  ...
Intuitively, I'd like to take a single value of category, say category a, ignore the rest, use this table-valued function and join this result with my original table.
WITH input_table AS
(
  SELECT timestamp, value
  FROM table
  WHERE category = 'a'
)
SELECT x, y, z
FROM tvf(TABLE input_table);
-- Join this result with the original table
However, I'd like to do this for every value of category, not just 'a'. Is there a way to do this in SQL? I suspect a CROSS APPLY operation could help me, but I'm not sure how to use it.
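For what it's worth, the shape I imagine is something like the sketch below. This is only a sketch: tvf_per_category is a hypothetical wrapper (SQL Server style) that filters the base table to one category internally and then runs the analysis, and mytable stands in for my original table.

-- Hypothetical wrapper TVF: filters to one category, runs the analysis,
-- and returns timestamp, value, x, y, z for that category.
SELECT cat.category, f.timestamp, f.value, f.x, f.y, f.z
FROM (SELECT DISTINCT category FROM mytable) cat
CROSS APPLY dbo.tvf_per_category(cat.category) f;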
I have a dataframe that looks like this, where "Date" is set as the index:
A B C D E
Date
1999-01-01 1 2 3 4 5
1999-01-02 1 2 3 4 5
1999-01-03 1 2 3 4 5
1999-01-04 1 2 3 4 5
I'm trying to compare the percent difference between two pairs of dates. I think I can do the first bit:
start_1 = "1999-01-02"
end_1 = "1999-01-03"
start_2 = "1999-01-03"
end_2 = "1999-01-04"
Obs_1 = df.loc[end_1] / df.loc[start_1] - 1
Obs_2 = df.loc[end_2] / df.loc[start_2] - 1
The output I get from, e.g., Obs_1 looks like this:
A 0.011197
B 0.007933
C 0.012850
D 0.016678
E 0.007330
dtype: float64
I'm looking to build some correlations between Obs_1 and Obs_2. I think I need to create a new dataframe with the labels A-E as one column (or as the index), and then the data series from Obs_1 and Obs_2 as adjacent columns.
But I'm struggling! I can't 'see' what Obs_1 and Obs_2 'are': have I created a list? A Series? How can I tell? And what would be the best way of combining the two into a single dataframe, say df_1?
I'm sure the answer is staring me in the face, but I'm going mental trying to figure it out, and because I'm not quite sure what Obs_1 and Obs_2 'are', it's hard to search the SO archive for help.
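Edit: to make the setup concrete, here is a self-contained sketch of what I think I'm after, with toy numbers, assuming pd.concat is the right tool:

import pandas as pd

# Toy frame standing in for df above (numbers invented for illustration)
df = pd.DataFrame(
    [[1.0, 2.0, 3.0, 4.0, 5.0],
     [1.1, 2.1, 3.2, 4.3, 5.1],
     [1.2, 2.0, 3.3, 4.5, 5.2],
     [1.3, 2.2, 3.1, 4.4, 5.3]],
    columns=list("ABCDE"),
    index=pd.to_datetime(["1999-01-01", "1999-01-02",
                          "1999-01-03", "1999-01-04"]),
)
df.index.name = "Date"

Obs_1 = df.loc["1999-01-03"] / df.loc["1999-01-02"] - 1
Obs_2 = df.loc["1999-01-04"] / df.loc["1999-01-03"] - 1

print(type(Obs_1))  # <class 'pandas.core.series.Series'>

# Put the two Series side by side; the labels A-E become the shared index
df_1 = pd.concat([Obs_1, Obs_2], axis=1, keys=["Obs_1", "Obs_2"])
print(df_1)
print(df_1["Obs_1"].corr(df_1["Obs_2"]))  # correlation between the two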
Thanks in advance
I was thinking about a simple way to reorder rows in a relational database table.
I would like to avoid the method described here:
How can I reorder rows in sql database
My simple idea was to use a ListOrder column of type double-precision 64-bit IEEE 754 floating point.
When inserting a row between two existing rows, we calculate its listOrder value as the average of the two siblings' values.
Example:
1. Starting state:
value, listOrder
a 1
b 2
c 3
d 4
e 5
f 6
2. Moving "e" two rows up
One simple SQL update on the e row: update mytable set listorder = 2.5 where value = 'e'
value, listOrder
a 1
b 2
e 2.5
c 3
d 4
f 6
3. Moving "a" one position down
value, listOrder
b 2
a 2.25
e 2.5
c 3
d 4
f 6
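For completeness, the midpoint insert itself is a single statement; a sketch with my names mytable/listorder and a hypothetical new value 'x':

-- Insert a new row 'x' between 'c' (3) and 'd' (4): its listorder
-- becomes the average of the two neighbours, 3.5.
INSERT INTO mytable (value, listorder)
SELECT 'x', (c.listorder + d.listorder) / 2.0
FROM mytable c, mytable d
WHERE c.value = 'c' AND d.value = 'd';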
My question: how many insertions can I perform (in the worst case, always into the same gap) before the list can no longer be kept properly ordered?
For a 64-bit integer there are fewer than 64 insertions into the same place.
Do floating-point types allow more insertions?
Are there other problems with the described approach?
Do you see any patches/adjustments to make this idea safe and usable in applications?
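A quick empirical check I sketched in Python, counting how often the same gap can be halved (IEEE 754 doubles carry a 52-bit mantissa, so 52 is the number I'd expect between two values with the same exponent):

# Repeatedly "insert" between lo and hi by averaging, always reusing the
# new midpoint as the upper neighbour, until no representable float is left.
lo, hi = 1.0, 2.0
count = 0
while True:
    mid = (lo + hi) / 2.0
    if mid == lo or mid == hi:  # gap exhausted: nothing strictly between
        break
    hi = mid
    count += 1
print(count)  # 52 -- one successful insertion per mantissa bit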
This is similar to a lexical order, which can also be done with varchar columns:
A
B
C
D
E
F
becomes
A
B
BM
C
D
F
becomes
B
BF
BM
C
D
F
I prefer the two-step process, where you update every row after the insertion point to be one larger. SQL is efficient about this; updating the rows following a change is not as bad as it seems. You preserve something that is more human-readable, the storage size for your ordinal value scales linearly with your data size, and you don't risk reaching a point where you don't have enough precision to fit an item between two values.
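A minimal sketch of that two-step move, with assumed names mytable/listorder, moving row 'e' to position 3:

-- Step 1: open a gap by shifting listorder up by one for every row
-- at or beyond the target position.
UPDATE mytable SET listorder = listorder + 1 WHERE listorder >= 3;
-- Step 2: drop the moved row into the gap.
UPDATE mytable SET listorder = 3 WHERE value = 'e';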
I tried to find solutions for this, and it is fairly easy to solve when the number of records is small. But...
I have an original list with 81,590 records.
Id Loc Sales LatLong
1 a 100 ...
2 b 110 ...
3 c 105 ...
4 d 125 ...
5 e 123 ...
6 f 35 ...
.
.
.
81,590 ... ... ...
I need to compare all items in the list against each other.
Id L1 L2 Dist
1 a a 0 --> Not needed. Self comparison.
2 a b 26
3 a c 150 --> Not needed. Distance >100.
4 a d 58
5 b a 26 --> Not needed. Repeated record.
6 b b 0 --> Not needed. Self comparison.
7 b c 15
8 b d 151 --> Not needed. Distance >100.
9 c a 150 --> Not needed. Repeated record.
10 c b 15 --> Not needed. Repeated record.
11 c c 0 --> Not needed. Self comparison.
12 c d 75
13 d a 58 --> Not needed. Repeated record.
14 d b 151 --> Not needed. Repeated record.
15 d c 75 --> Not needed. Repeated record.
16 d d 0 --> Not needed. Self comparison.
But as shown next to the records above, the end result needs to be a list that:
1) Compares records against each other ONLY when they are located at a certain distance, say <100 miles.
2) Does not contain duplicates in the sense that comparing Loc1 to Loc2 is the same as comparing Loc2 to Loc1.
3) And the obvious one, no need to compare Loc1 to itself.
The end result would be:
Id L1 L2 Dist
2 a b 26
4 a d 58
7 b c 15
12 c d 75
Approach:
In theory, the total number of records after comparing all items against themselves is 81,590 ^ 2 = 6,656,928,100 records.
Subtracting repeated comparisons (LocA-LocB = LocB-LocA) and the 81,590 self-comparisons (LocA-LocA) leaves (6,656,928,100 - 81,590) / 2 = 3,328,423,255 distinct pairs; the self-comparisons have to come out before halving, since they appear only once each.
Then I could get rid of all records with distance > 100 miles.
This is highly inefficient: I'd be building a table with over 6bn records, then deleting half of them, etc. etc. etc.
Is there an approach to arrive at the end product in a much more efficient way (fewer steps, fewer select/delete/update operations)?
What is the SELECT statement needed to insert the final data set into the destination?
It sounds to me like this needs a join of the table with itself and filtering by key to avoid repeated pairs, but this is where I am stuck.
What algorithm are you using to calculate the distance between two points? Simple “the world is flat” Cartesian math, or that trigonometry-laden “the world is an oblate spheroid” one? This can turn into serious CPU requirements.
It’s probably best to generate a table of “locations that are within distance X of this location” once and store it permanently; barring major events like earthquakes, it’s just not going to change.
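For reference, a sketch of what such a distanceFunction could look like: haversine great-circle distance in miles, written as a T-SQL scalar function. The separate latitude/longitude arguments are an assumption; the queries below pass a single LatLong value instead.

CREATE FUNCTION dbo.distanceFunction
    (@lat1 FLOAT, @lon1 FLOAT, @lat2 FLOAT, @lon2 FLOAT)
RETURNS FLOAT
AS
BEGIN
    -- Haversine formula; 3958.8 is the mean Earth radius in miles.
    DECLARE @dlat FLOAT = RADIANS(@lat2 - @lat1);
    DECLARE @dlon FLOAT = RADIANS(@lon2 - @lon1);
    DECLARE @a FLOAT =
        SIN(@dlat / 2) * SIN(@dlat / 2)
        + COS(RADIANS(@lat1)) * COS(RADIANS(@lat2))
          * SIN(@dlon / 2) * SIN(@dlon / 2);
    RETURN 2 * 3958.8 * ASIN(SQRT(@a));
END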
Query-wise, the base join is trivial:
SELECT
t1.Loc L1
,t2.Loc L2
from MyTable t1
inner join MyTable t2
on t2.Loc > t1.Loc
If you have the distance formula in, say, a function named “distanceFunction”, it might look something like:
WITH cteCalc as (
    select
        t1.Loc L1
        ,t2.Loc L2
        ,dbo.distanceFunction(t1.LatLong, t2.LatLong) Dist
    from MyTable t1
    inner join MyTable t2
        on t2.Loc > t1.Loc
    where dbo.distanceFunction(t1.LatLong, t2.LatLong) < @MaxDistance)
INSERT TargetTable (L1, L2, Dist)
SELECT
    L1
    ,L2
    ,Dist
FROM cteCalc
This, of course, may break your system, if only because the transaction log will grow too big while you’re writing a few billion rows to the target table. I'd say build a loop, processing each location in turn, with the final query like:
WITH cteCalc as (
    select
        t1.Loc L1
        ,t2.Loc L2
        ,dbo.distanceFunction(t1.LatLong, t2.LatLong) Dist
    from MyTable t1
    inner join MyTable t2
        on t2.Loc > t1.Loc
    where dbo.distanceFunction(t1.LatLong, t2.LatLong) < @MaxDistance
        and t1.Loc = @ThisIterationLoc)
INSERT TargetTable (L1, L2, Dist)
SELECT
    L1
    ,L2
    ,Dist
FROM cteCalc
The first pass returns 81,589 rows, less whichever are too far away; the second pass has 81,588 to process, and so forth.
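The surrounding loop might look something like this cursor-based sketch (variable size and names assumed); the INSERT above goes where the comment sits:

DECLARE @ThisIterationLoc varchar(10);
DECLARE locs CURSOR FOR SELECT Loc FROM MyTable ORDER BY Loc;
OPEN locs;
FETCH NEXT FROM locs INTO @ThisIterationLoc;
WHILE @@FETCH_STATUS = 0
BEGIN
    -- run the WITH cteCalc ... INSERT TargetTable query from above,
    -- filtered on @ThisIterationLoc
    FETCH NEXT FROM locs INTO @ThisIterationLoc;
END
CLOSE locs;
DEALLOCATE locs;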
Here is an outline of how I would solve this problem:
Put indexes on latitude and longitude
Do the math to turn your distance into a delta of lat and long, which gives you a range (box) around each point. Anything within your distance is inside this box, and anything outside the box cannot be within your distance. This constrains the problem considerably.
For example, if the box for your distance is 10 units on a side, then for a location at (100,100) the box would be defined by the corners (95,95) and (105,105).
Write a query that looks at each element (from the lowest id) and searches for other elements (with a greater id, to avoid duplicates) within the delta of lat and long, and save this to a temporary table (see the sketch after this list).
Iterate over that table and do the full calculation to check whether each pair is within the circle (not just the box) of your distance.
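A sketch of that box-prefilter step in T-SQL; the separate Lat/Long columns, the delta variables, and the #Candidates temp table are assumptions:

-- Box prefilter: cheap comparisons only, no trigonometry yet.
SELECT
    t1.Id AS Id1
    ,t2.Id AS Id2
INTO #Candidates
FROM MyTable t1
INNER JOIN MyTable t2
    ON t2.Id > t1.Id                          -- greater id avoids duplicates
   AND t2.Lat  BETWEEN t1.Lat  - @dLat  AND t1.Lat  + @dLat
   AND t2.Long BETWEEN t1.Long - @dLong AND t1.Long + @dLong;
-- Second pass: run the exact distance check only on #Candidates.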
Total newbie here, regarding sqlite, so don't flame too hard :)
I have a table:
index name length L breadth B height H
1 M-1234 10 5 2
2 M-2345 20 10 3
3 ....
How do I put some tabular data (let's say ten x,y values) corresponding to index 1, then another set corresponding to index 2, and so on? In short, I want a table of x and y values that is "connected" to the first row, then another that is connected to the second row.
I'm reading some tutorials on sqlite3 (which I'm using), but am having trouble finding this. If anyone knows a good newbie tutorial or a book dealing with sqlite3 (CLI) I'm all ears for that too :)
You are just looking for information on joins and the concept of a foreign key which, although SQLite3 doesn't enforce it by default, is what you need. You can go without it, anyway.
In your situation you can either add two columns to your table, one for x and another for y, or create a new table with three columns: foreign_index, x and y. Which one to use depends on what you are trying to accomplish, performance and maintainability.
If you go the linked table route, you'd end up with two tables, like this:
MyTable
index name length L breadth B height H
1 M-1234 10 5 2
2 M-2345 20 10 3
3 ....
XandY
foreign_index x y
1 12 9
2 8 7
3 ...
When you want the x and y values of your element, you just use something like SELECT x, y FROM XandY WHERE foreign_index = $idx;
To get all the related attributes, you just do a JOIN:
SELECT MyTable."index", name, length, breadth, height, x, y FROM MyTable INNER JOIN XandY ON MyTable."index" = XandY.foreign_index; (note that index is a keyword in SQLite, so it has to be quoted when used as a column name).
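If it helps, here is a minimal sketch of the whole setup in SQLite (names taken from above; the FOREIGN KEY is only enforced after PRAGMA foreign_keys = ON):

PRAGMA foreign_keys = ON;

CREATE TABLE MyTable (
    "index"  INTEGER PRIMARY KEY,   -- "index" is a keyword, hence the quotes
    name     TEXT,
    length   REAL,
    breadth  REAL,
    height   REAL
);

CREATE TABLE XandY (
    foreign_index INTEGER REFERENCES MyTable("index"),
    x REAL,
    y REAL
);

INSERT INTO MyTable VALUES (1, 'M-1234', 10, 5, 2);
INSERT INTO XandY VALUES (1, 12, 9);   -- ten such rows can all point at index 1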