How to find the row and column number of a specific cell in SQL?

I have a table in SQL database and I want to find the location of a cell like a coordinate and vice versa. Here is an example:
0 1 2 3
1 a b c
2 g h i
3 n o j
When I ask for i, I want to get row=2 and column=3. When I ask for a cell of row=2 and column=3, I want to get i.

You need to store your matrix in a table specifying the columns and rows like this
create table matrix (
  row_num int,          -- ROW and COLUMN are reserved words in most databases, so use other names
  col_num int,
  value   varchar2(20)
);
Then you insert your data like this
insert into matrix values (1, 1, 'a');
insert into matrix values (1, 2, 'b');
-- and so on.
And then you can simply find what you need using two queries
select row_num, col_num from matrix where value = 'i';
select value from matrix where row_num = 2 and col_num = 3;

In Oracle, you would do:
select "3"
from t
where "0" = 2;
Naming columns as numbers is not recommended. Your whole data model is strange for SQL. A better representation would be:
row col val
1 1 a
1 2 b
1 3 c
2 1 g
. . .
Then you could do:
select val
from grid
where row = 2 and col = 3;
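For the reverse lookup (getting the coordinates of a value), a query along the same lines against the same grid table works; note that row may need quoting or renaming in databases where it is a reserved word:
select row, col
from grid
where val = 'i';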

Create a primary key column such as 'id' and then select the related column, for example 'col':
select col from db where id = 2;
This returns a specific cell (col, 2).
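A minimal sketch of that idea, assuming one table column per matrix column (the names db, id and col1..col3 are only illustrative):
create table db (
  id   int primary key,  -- the row number
  col1 varchar(20),
  col2 varchar(20),
  col3 varchar(20)
);
-- the cell at row 2, column 3:
select col3 from db where id = 2;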

Related

PostgreSQL data transformation - Turn rows into columns

I have a table whose structure looks like the following:
k | i | p | v
Notice that the key (k) is not unique, there are no keys, nothing. Each key can have multiple attributes (i = 0, 1, 2, ...) which can be of different types (p) and have different values (v). One attribute type may also appear multiple times (p(i-1) = p(i)).
What I want to do is pick certain attribute types and their corresponding values and place them in the same row. For example I want to have:
k | attr_name1 | attr_name2
I have managed to make a query that does this and works for all keys (k) for which attr_name1 and attr_name2 appear in the column p of the initial table:
SELECT DISTINCT ON (key) fn.k AS key, fn.v AS attr_name1, a.v AS attr_name2
FROM Table fn
LEFT JOIN Table a ON fn.k = a.k
AND a.p = 'attr_name2'
WHERE fn.p = 'attr_name1'
I would like, however, to take into account the case where a certain key has no attribute named attr_name1 and insert a NULL value into the corresponding column of the new table. I am not sure how to achieve that. I have no issue using multiple queries or intermediate tables etc, but there are quite a lot of rows in the table and I need something that scales to millions of rows.
Any help would be appreciated.
Example:
k i p v
1 0 a 10
1 1 b 12
1 2 c 34
1 3 d 44
1 4 e 09
2 0 a 11
2 1 b 13
2 2 d 22
2 3 f 34
Would turn into (assuming I am only interested in columns a, b, c):
k a b c
1 10 12 34
2 11 13 NULL
I would use conditional aggregation. That is, an aggregate function around a CASE expression.
SELECT
    k,
    MAX(CASE WHEN p='a' THEN v END) AS a,
    MAX(CASE WHEN p='b' THEN v END) AS b,
    MAX(CASE WHEN p='c' THEN v END) AS c
FROM
    your_table
GROUP BY
    k
This presumes that (k, p) is unique. If there are duplicates, it will silently pick the single v with the highest value for each (k, p).
As a general rule this kind of pivoting makes the data harder to process in SQL. This is often done for display purposes because humans find this easier to read. However, from a software engineering perspective, such formatting should not be done in the data layer; be careful that by doing this you don't actually make your future life harder.
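Since the question is about PostgreSQL, the same conditional aggregation can also be written with the FILTER clause (available since PostgreSQL 9.4), which some find easier to read; a minimal sketch using the same table and attribute names as above:
SELECT k,
       MAX(v) FILTER (WHERE p = 'a') AS a,
       MAX(v) FILTER (WHERE p = 'b') AS b,
       MAX(v) FILTER (WHERE p = 'c') AS c
FROM your_table
GROUP BY k;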

Adding a column to an SQL table and exploding the rows with a set of fixed values for that column

I would like to add a column to an SQL table with unknown columns and explode the entries in that table by a set of fixed values for that column. E.g. Turn
unknown col 1   ...   unknown col x
1               ...   foo
2               ...   bar
into
unknown col 1   ...   unknown col x   new col
1               ...   foo             1
2               ...   bar             1
1               ...   foo             2
2               ...   bar             2
The number of unknown columns is also unknown. I know the query to turn the original table into
unknown col 1   ...   unknown col x   new col
1               ...   foo             1
2               ...   bar             1
I don't know the INSERT query that would turn it into the desired table further above. The table is on Google BigQuery.
P.S.: I can think of workarounds, e.g. multiplying the number of rows in the original table by n, where n is the number of values the new column can take, then adding the column and setting its value based on the row number (which is not trivial to set) for each row. I am looking for a cleaner way.
add a column to an SQL table with unknown columns and explode the entries in that table by a set of fixed values for that column.
Below should do the "trick" - example
with new_col_values as (
select [1, 2, 3, 4] values
)
select t.*, val
from `project.dataset.your_table` t,
new_col_values, unnest(values) val
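If you want to materialize the exploded result rather than just select it, one option is to wrap the same query in a CREATE OR REPLACE TABLE statement. This is only a sketch, assuming it is acceptable to rewrite `project.dataset.your_table` in place (otherwise write to a new table name) and aliasing the new value column as new_col:
create or replace table `project.dataset.your_table` as
with new_col_values as (
  select [1, 2, 3, 4] values  -- the fixed set of values for the new column
)
select t.*, val as new_col
from `project.dataset.your_table` t,
     new_col_values, unnest(values) val;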

Combine data from two or more fields into one field in SAS Data Integration Studio

I would like to combine data from multiple cells into one cell using SAS Data Integration Studio.
My data is divided into three different tables, as follows:
Table 1
Col 1    Col 2    Col 3
City A   City B   City C
Table 2
Col 1     Col 2     Col 3
State A   State B   State C
Table 3
Variable 1   Variable 2
x            y
Desired final table:
Variable 1   Variable 2   States                      Cities
x            y            State A, State B, State C   City A, City B, City C
That is, I want to create a final table in which the data from Table 1 is combined into just one field and the data from Table 2 into another field, with the respective values separated by commas.
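One way to sketch this in PROC SQL (which SAS Data Integration Studio can run via a SQL join or user-written code transformation) is to cross join the three single-row tables and build each combined field with CATX; the table names table1, table2, table3 and the column names variable1, variable2, col1..col3 below are assumptions based on the layout above:
proc sql;
  create table final as
  select t3.variable1,
         t3.variable2,
         catx(', ', t2.col1, t2.col2, t2.col3) as states,  /* State A, State B, State C */
         catx(', ', t1.col1, t1.col2, t1.col3) as cities   /* City A, City B, City C */
  from table3 t3, table1 t1, table2 t2;                    /* each table has a single row */
quit;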

Query function to pull data based on multiple values within one cell

I would like to ask how to use the QUERY function where one of the cells contains multiple comma-separated values.
For example, to pull Col 2 values when Col 1 is B, it would be straightforward.
Col 1   Col 2
A       1
B       2
C       3
D       4
=QUERY(A1:B3,"select B where A = '"&D3&"' ",0)
The D3 cell value is B.
Similar to how we have an IN clause in SQL, I would like to pull data from Col 2 when the Col 1 values are B, C, D.
Would it be possible to concatenate the results in one row as well?
Try this one:
=join(", ", QUERY(A1:B4,"select B where "&CONCATENATE("A = '", join("' or A = '", SPLIT(D4, ",")), "'"),0))
Where D4 = B,C,D
Note:
The where condition built by the inner part of the formula (for D4 = B,C,D) is A = 'B' or A = 'C' or A = 'D', produced by:
CONCATENATE("A = '", join("' or A = '", SPLIT(D4, ",")), "'")
Since there is no in statement in the Sheets QUERY language (not that I have encountered), what I did was split the comma-separated values in D4 and format them as multiple or conditions, which are equivalent to a single in statement. This is a workaround and should do what you wish to achieve.

How to combine certain column values together in Python and make values in the other column be the means of the values combined?

I have a Pandas dataframe where one of the columns ('sequence') is a sequence of numbers, many of them repeating, and the other column ('binary variable') holds values that are either 1 or 0.
I have grouped by the matching values in the sequence column and made the binary variable column the percentage of entries that are non-zero in each group.
I now want to combine certain entries of the 'sequence' column together and make the 'binary variable' values the mean of the values of the groups that were combined.
So my data frame looks like this:
df = pd.DataFrame({'sequence': [1, 1, 4, 4, 4, 6], 'binary variable': [1, 0, 0, 1, 0, 1]})
I have then used this code to group together the same values in sequence:
df.groupby('sequence')['binary variable'].apply(lambda s: (s != 0).sum() / s.count() * 100)
I am left with the sequence column containing non-repeating values and the binary variable column now being the percentage of non-zeros.
But now I want to group some of the sequence values together (so for this toy example the 1 and 4 values), and have the binary variable column hold the mean of the percentages for those grouped values, say the values for 1 and 4.
This isn't terribly well worded, as I'm finding it awkward to describe, but I've looked online and had many failed attempts with code of my own that just is not working. Any help would be greatly appreciated.
It seems like you want to group the table twice and take the mean each time. For the second grouping, you need to create a new column to indicate the group.
Try this code:
import pandas as pd
# sequence groups for the final average
grps = {(1, 4): [1, 4],
        (5, 6): [5, 6]}
# initial data
df = pd.DataFrame({'sequence': [1, 1, 4, 4, 4, 5, 5, 6], 'binvar': [1, 0, 0, 1, 0, 1, 0, 1]})
# first grouping: mean of binvar per sequence value
gb = df.groupby(["sequence"])['binvar'].mean().reset_index()
def getgrp(x):  # find the group a sequence value belongs to
    for k in grps:
        if x in grps[k]:
            return k
print(df.to_string(index=False))
gb['group'] = gb.apply(lambda r: getgrp(r['sequence']), axis=1)
gb = gb.reset_index()
print(gb.to_string(index=False))
# second grouping: mean of binvar per group
gb = gb[['group', 'binvar']].groupby("group")['binvar'].mean().reset_index()
print(gb.to_string(index=False))
Output
sequence binvar
1 1
1 0
4 0
4 1
4 0
5 1
5 0
6 1
index sequence binvar group
0 1 0.500000 (1, 4)
1 4 0.333333 (1, 4)
2 5 0.500000 (5, 6)
3 6 1.000000 (5, 6)
group binvar
(1, 4) 0.416667
(5, 6) 0.750000