Using a SQL TVF on every value of a column

I have a table that has no unique identifier and is structured like this:
timestamp  category  value
0          a         0.12
1          a         0.231
0          b         0.23
2          c         0.01
I am trying to use a table-valued function that only takes in a timestamp and a value:
timestamp  value
0          0.12
1          0.231
0          0.23
2          0.01
The function adds some columns x, y, z... to the table above. I'd like to use the function to perform this analysis on every category and produce an output like:
timestamp  category  value  x    y    z
0          a         0.12   ...  ...  ...
1          a         0.231  ...  ...  ...
0          b         0.23   ...  ...  ...
2          c         0.01   ...  ...  ...
Intuitively, I'd like to take a single value of category, say category a, ignore the rest, use this table-valued function and join this result with my original table.
WITH input_table AS
(
    SELECT timestamp, value
    FROM table
    WHERE category = 'a'
)
SELECT x, y, z
FROM tvf(Table input_table);
-- Join this result with the original table
However, I'd like to do this for every value of category, not just 'a'. Is there a way to do this in SQL? I suspect a CROSS APPLY operation could help me, but I'm not sure how to use it.
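For reference, a minimal sketch of the CROSS APPLY idea in SQL Server-style syntax. Here my_table stands in for the original table, and dbo.tvf_per_category is a hypothetical wrapper (not the original TVF) that filters the base table to one category and feeds the (timestamp, value) pairs to the function:

-- Hypothetical sketch: run the TVF once per distinct category and keep the category column.
SELECT c.category, f.timestamp, f.value, f.x, f.y, f.z
FROM (SELECT DISTINCT category FROM my_table) AS c
CROSS APPLY dbo.tvf_per_category(c.category) AS f;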

Related

PostgreSQL data transformation - Turn rows into columns

I have a table whose structure looks like the following:
k | i | p | v
Notice that the key (k) is not unique, there are no keys, nothing. Each key can have multiple attributes (i = 0, 1, 2, ...) which can be of different types (p) and have different values (v). One attribute type may also appear multiple times (p(i-1) = p(i)).
What I want to do is pick certain attribute types and their corresponding values and place them in the same row. For example I want to have:
k | attr_name1 | attr_name2
I have managed to make a query that does this and works for all keys (k) for which attr_name1 and attr_name2 appear in the column p of the initial table:
SELECT DISTINCT ON (key) fn.k AS key, fn.v AS attr_name1, a.v AS attr_name2
FROM Table fn
LEFT JOIN Table a ON fn.k = a.k
AND a.p = 'attr_name2'
WHERE fn.p = 'attr_name1'
I would like, however, to take into account the case where a certain key has no attribute named attr_name1 and insert a NULL value into the corresponding column of the new table. I am not sure how to achieve that. I have no issue using multiple queries or intermediate tables etc, but there are quite a lot of rows in the table and I need something that scales to millions of rows.
Any help would be appreciated.
Example:
k i p v
1 0 a 10
1 1 b 12
1 2 c 34
1 3 d 44
1 4 e 09
2 0 a 11
2 1 b 13
2 2 d 22
2 3 f 34
Would turn into (assuming I am only interested in columns a, b, c):
k a b c
1 10 12 34
2 11 13 NULL
I would use conditional aggregation. That is, an aggregate function around a CASE expression.
SELECT
    k,
    MAX(CASE WHEN p = 'a' THEN v END) AS a,
    MAX(CASE WHEN p = 'b' THEN v END) AS b,
    MAX(CASE WHEN p = 'c' THEN v END) AS c
FROM your_table
GROUP BY k
This presumes that (k, p) is unique. If there are duplicate (k, p) pairs, it will simply return the single highest v for each of them.
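Since this is PostgreSQL, the same conditional aggregation can also be written with the FILTER clause (available since 9.4); a minimal sketch assuming the same your_table layout:

SELECT
    k,
    MAX(v) FILTER (WHERE p = 'a') AS a,
    MAX(v) FILTER (WHERE p = 'b') AS b,
    MAX(v) FILTER (WHERE p = 'c') AS c
FROM your_table
GROUP BY k;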
As a general rule this kind of pivoting makes the data harder to process in SQL. This is often done for display purposes because humans find this easier to read. However, from a software engineering perspective, such formatting should not be done in the data layer; be careful that by doing this you don't actually make your future life harder.

Adding a column to an SQL table and exploding the rows with a set of fixed values for that column

I would like to add a column to an SQL table with unknown columns and explode the entries in that table by a set of fixed values for that column. E.g. Turn
unknown col 1  ...  unknown col x
1              ...  foo
2              ...  bar
into
unknown col 1  ...  unknown col x  new col
1              ...  foo            1
2              ...  bar            1
1              ...  foo            2
2              ...  bar            2
The number of unknown columns is also unknown. I know the query to turn the original table into
unknown col 1  ...  unknown col x  new col
1              ...  foo            1
2              ...  bar            1
I don't know the INSERT query that would turn it into the desired table further above. The table is on Google BigQuery.
P.S.: I can think of workarounds, e.g. multiply the number of rows in the original table by n, where n is the number of values the new column can take, then add the column and set its value based on the row number (which is not trivial to set) for each row. I am looking for a cleaner way.
add a column to an SQL table with unknown columns and explode the entries in that table by a set of fixed values for that column.
Below should do the "trick" - example
with new_col_values as (
select [1, 2, 3, 4] values
)
select t.*, val
from `project.dataset.your_table` t,
new_col_values, unnest(values) val
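For clarity, the comma between the table and unnest() above is an implicit cross join; the same idea written out explicitly (using the same `project.dataset.your_table` placeholder) would be:

select t.*, val as new_col
from `project.dataset.your_table` t
cross join unnest([1, 2, 3, 4]) val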

Conditional formula in Qlik

I want to create a conditional formula for some charts in Qlik Sense.
I want to calculate the average of a KPI, ATD, but only where a condition on another column holds, namely W = 1. So for example:
Class W ATD
A 1 1
A 1 3
A 0 1
B 1 1
For class A this should lead to: Condi.Avg = 2
In general, the result should then be a new table (for W = 1):
Class Condi.Avg
A 2
B 1
Right now I have:
Avg({<W= {1}> ATD)
which just leads to a column showing "-" in my charts.
How can I change this?
I think there is a typo in your expression.
Avg({<W = {'1'}>} ATD)
This should provide some result.
Edit (from the author):
Avg({<[W]={'1'}>} ATD)
is working.
As promised, I tried making my own table, here are my results.
Here is my load script:
LOAD * INLINE [
Class, W, ATD
A, 1, 1
A, 1, 3
A, 0, 1
B, 1, 1
];
Then I added a table object with 1 dimension with the field Class, and 1 measure with the expression:
Avg({<W={'1'}>}ATD)
This results in the following table, which is exactly the same as your expected result:
Class Condi.Avg
A 2
B 1
It might be the case that one of your other dimensions is interfering with your measure.

Access SQL - Select same column twice for different criteria

I've been struggling with the following table for a while now. Hopefully someone can help me out.
Item Type Value
A X 2
B X 3
C X 4
D X 5
A Y 0.1
B Y 0.3
C Y 0.4
D Y 0.6
The result I would like to see is this:
Item X Y
A 2 0.1
B 3 0.3
C 4 0.4
D 5 0.6
Is it possible to fix this in one query?
I tried Union queries and IIF statements, but none of them gives me the desired result. Another option might be to split it up into multiple queries; however, I would rather have it done at once.
Looking forward to any answer.
Many thanks!
Best,
Mathijs
That's a job for a Crosstab query.
TRANSFORM Max(Table1.Valu) AS MaxOfValu
SELECT Table1.item
FROM Table1
GROUP BY Table1.item
PIVOT Table1.type;
P.S.: Value is a reserved word and cannot be used as a field name (hence Valu in the query above). And I would never use Type or Item either.
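If a plain select query is preferred over a crosstab, the IIF approach mentioned in the question also works as conditional aggregation; a sketch assuming the value field has been renamed to Valu as in the query above:

SELECT Item,
       Max(IIF([Type] = 'X', [Valu], Null)) AS X,
       Max(IIF([Type] = 'Y', [Valu], Null)) AS Y
FROM Table1
GROUP BY Item;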

MSSQL 2008R2 Avoid Repeated Items in Matrix Like Table

I tried to find solutions for this, and it is somewhat easy to solve when the number of records is small. But...
I have an original list with 81,590 records.
Id Loc Sales LatLong
1 a 100 ...
2 b 110 ...
3 c 105 ...
4 d 125 ...
5 e 123 ...
6 f 35 ...
.
.
.
81,590 ... ... ...
I need to compare all items in the list against each other.
Id L1 L2 Dist
1 a a 0 --> Not needed. Self comparison.
2 a b 26
3 a c 150 --> Not needed. Distance >100.
4 a d 58
5 b a 26 --> Not needed. Repeated record.
6 b b 0 --> Not needed. Self comparison.
7 b c 15
8 b d 151 --> Not needed. Distance >100.
9 c a 150 --> Not needed. Repeated record.
10 c b 15 --> Not needed. Repeated record.
11 c c 0 --> Not needed. Self comparison.
12 c d 75
13 d a 58 --> Not needed. Repeated record.
14 d b 151 --> Not needed. Repeated record.
15 d c 75 --> Not needed. Repeated record.
16 d d 0 --> Not needed. Self comparison.
But as shown next to the records above, the end result needs to be a list that:
1) Compares records against each other ONLY when they are located at a certain distance, say <100 miles.
2) Does not contain duplicates in the sense that comparing Loc1 to Loc2 is the same as comparing Loc2 to Loc1.
3) And the obvious one, no need to compare Loc1 to itself.
The end result would be:
Id L1 L2 Dist
2 a b 26
4 a d 58
7 b c 15
12 c d 75
Approach:
In theory, the total number of records after comparing all items against themselves is 81,590 ^ 2 = 6,656,928,100 records.
Subtracting repeated iterations (LocA-LocB = LocB-LocA) would mean 6,656,928,100 / 2 = 3,328,464,050.
Further cleaning by getting rid of self-repeating iterations (LocA-LocA), should be 3,328,464,050 - 81,590 = 3,328,382,460.
Then I could get rid of all records with distance > 100 miles.
This is highly inefficient: I'd be building a table with roughly 6.6 billion records and then deleting half of it, and so on.
Is there an approach to arrive to the end product in a much more efficient (less steps, less select/delete/update) way?
What is the select statement needed to insert the final data-set into destination?
It sounds to me like this needs a join of the table with itself and filtering on the key, but that is where I am stuck.
What algorithm are you using to calculate the distance between two points? Simple “the world is flat” Cartesian math, or the trigonometry-laden “the world is an oblate spheroid” one? This can turn into a serious CPU requirement.
It’s probably best to generate a table of “locations that are within distance X of this location” once and store it permanently; barring major events like earthquakes, it’s just not going to change.
Query-wise, the base join is trivial:
SELECT
t1.Loc L1
,t2.Loc L2
from MyTable t1
inner join MyTable t2
on t2.Loc > t1.Loc
If you have the distance formula in, say, a function named “distanceFunction”, it might look something like:
WITH cteCalc as (
select
t1.Loc L1
,t2.Loc L2
,dbo.distanceFunction(t1.LatLong, t2.LatLong) Dist
from MyTable t1
inner join MyTable t2
on t2.Loc > t1.Loc
where dbo.distanceFunction(t1.LatLong, t2.LatLong) < @MaxDistance)
INSERT TargetTable (L1, L2, Dist)
SELECT
L1
,L2
,Dist
FROM cteCalc
where Dist <= @MaxDistance
This, of course, may break your system, if only because the transaction log will grow too big while you’re writing a few billion rows to the target table. I'd say build a loop, processing each location in turn, with the final query like:
WITH cteCalc as (
select
t1.Loc L1
,t2.Loc L2
,dbo.distanceFunction(t1.LatLong, t2.LatLong) Dist
from MyTable t1
inner join MyTable t2
on t2.Loc > t1.Loc
where dbo.distanceFunction(t1.LatLong, t2.LatLong) < @MaxDistance
and t1.Loc = @ThisIterationLoc)
INSERT TargetTable (L1, L2, Dist)
SELECT
L1
,L2
,Dist
FROM cteCalc
where Dist <= @MaxDistance
The first pass returns 81,589 rows (less whichever are too far away), the second pass has 81,588 to process, and so forth.
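A hypothetical skeleton of that loop, driving @ThisIterationLoc from the Id column; the variable names and the assumption that Id runs from 1 upward are mine, not part of the original answer:

DECLARE @MaxDistance FLOAT = 100;
DECLARE @ThisIterationLoc VARCHAR(50);
DECLARE @Id INT = 1, @MaxId INT;
SELECT @MaxId = MAX(Id) FROM MyTable;

WHILE @Id <= @MaxId
BEGIN
    -- pick the location for this iteration
    SELECT @ThisIterationLoc = Loc FROM MyTable WHERE Id = @Id;

    INSERT TargetTable (L1, L2, Dist)
    SELECT t1.Loc, t2.Loc, dbo.distanceFunction(t1.LatLong, t2.LatLong)
    FROM MyTable t1
    INNER JOIN MyTable t2
        ON t2.Loc > t1.Loc
    WHERE t1.Loc = @ThisIterationLoc
      AND dbo.distanceFunction(t1.LatLong, t2.LatLong) < @MaxDistance;

    SET @Id = @Id + 1;
END;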
Here is an outline of how I would solve this problem:
Put indexes on latitude and longitude.
Do the math to convert your distance into a range (box) of lat and long. Then you know that your distance (as a box, not a circle) is contained in this delta, and that anything outside of this delta is too far away. This constrains the problem considerably.
For example, if the change in lat and long is 10 for your distance, then for a location at (100,100) your box would be defined by the (95,95) and (105,105) values for lat and long.
Write a query that looks at each element (starting from the lowest id) and searches for other elements (with greater id, to avoid duplicates) within the delta of lat and long, and save the result to a temporary table.
Iterate over that table and do a full calculation to see whether each pair is within the circle (not the box) of your distance.
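A rough SQL sketch of that box prefilter plus the exact circle check. This assumes latitude and longitude are stored as separate numeric Lat/Long columns (the original table shows a single LatLong column, so that is an assumption), and it reuses the hypothetical dbo.distanceFunction from the previous answer:

DECLARE @LatDelta FLOAT = 1.45;  -- assumed: roughly 100 miles in degrees of latitude
DECLARE @LongDelta FLOAT = 1.9;  -- assumed: roughly 100 miles in degrees of longitude at mid latitudes

SELECT t1.Loc AS L1,
       t2.Loc AS L2,
       dbo.distanceFunction(t1.LatLong, t2.LatLong) AS Dist
FROM MyTable t1
INNER JOIN MyTable t2
    ON t2.Id > t1.Id                                                  -- greater id avoids self-pairs and duplicates
   AND t2.Lat BETWEEN t1.Lat - @LatDelta AND t1.Lat + @LatDelta       -- cheap box prefilter (index-friendly)
   AND t2.Long BETWEEN t1.Long - @LongDelta AND t1.Long + @LongDelta
WHERE dbo.distanceFunction(t1.LatLong, t2.LatLong) <= 100;            -- exact circle check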