I have a single table with 3 columns that holds a number of data streams.
val_name is the name of the data stream, val_sequence is the incrementing sequence number, and val contains the data. Combined, name and sequence act like a composite index.
The existing val_name streams in this table are a and b. I would like users to be able to request stream c (which is not in the table) and have the database dynamically return a*b.
a*b in this case would be like taking two tables, one containing only val_name a, the other only val_name b, joining them on val_sequence, and multiplying the values (much like multiplying indexed Python pandas Series).
So the results would be:

val_sequence | val_name | val
-------------+----------+----
0            | c        | 80
1            | c        | 5
The idea is that users should be able to request a or b or c and receive data, without needing to know that a and b hold data while c only holds references. It's possible that some sequence numbers are missing, either in the middle or at either end.
I haven't been able to figure out a good way to provide this kind of flexibility. Are SQL views flexible enough for this? If so, could you give me a simple example? If not, what might be a workable alternative? Any database engine of your choice is fine.
For convenience, I am providing SQL code that creates the table, inserts the above values, and creates two views. This doesn't do what I need it to do, but it's a start for those who want to give it a try.
CREATE TABLE IF NOT EXISTS valdb (
  val_name VARCHAR(255),
  val INT,
  val_sequence INT
);

INSERT INTO valdb (val_name, val, val_sequence) VALUES ("a", 10, 0);
INSERT INTO valdb (val_name, val, val_sequence) VALUES ("a", 1, 1);
INSERT INTO valdb (val_name, val, val_sequence) VALUES ("b", 8, 0);
INSERT INTO valdb (val_name, val, val_sequence) VALUES ("b", 5, 1);

CREATE VIEW `a` AS SELECT val_name, val, val_sequence FROM valdb WHERE val_name = "a";
CREATE VIEW `b` AS SELECT val_name, val, val_sequence FROM valdb WHERE val_name = "b";
If, as your sample suggests, the sequence numbers are guaranteed to be the same for both streams, i.e. a sequence number n exists for a if and only if it also exists for b, that would be an inner join on the sequence number plus the * operator to do the multiplication.
CREATE VIEW c AS
SELECT 'c' AS val_name,
       a.val_sequence,
       a.val * b.val AS val
FROM a
INNER JOIN b
  ON a.val_sequence = b.val_sequence;
db<>fiddle (assuming MySQL from the syntax of the code you provided)
If the assumption doesn't hold, you'd need to define what should happen in the cases where it fails: e.g. whether to take the next available sequence number (and whether a or b provides the "leading" sequence number), or whether val should be assumed to be 0 for missing sequence numbers, etc.
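For the last option, one approach is to collect every sequence number that appears in either stream and LEFT JOIN both views against that list. A minimal sketch in MySQL, assuming a missing value should count as 0; the view names seq and c_zero_fill are made up (the helper view avoids a derived table, which older MySQL versions don't allow in view definitions):

-- All sequence numbers present in any stream.
CREATE VIEW seq AS SELECT DISTINCT val_sequence FROM valdb;

-- Multiply a and b, treating a missing value as 0.
CREATE VIEW c_zero_fill AS
SELECT 'c' AS val_name,
       s.val_sequence,
       COALESCE(a.val, 0) * COALESCE(b.val, 0) AS val
FROM seq s
LEFT JOIN a ON a.val_sequence = s.val_sequence
LEFT JOIN b ON b.val_sequence = s.val_sequence;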
I'm currently trying to import BQ tables full of arrays into a third-party visualization tool that does not support them. I'm more of a Node/NoSQL guy, and this BQ step is somehow the complex exception within the project, so I believe I'm not approaching the problem correctly to begin with.
A table looks like this:

Entry ID (primary) | User ID | ...more metadata (finite) | Field ID 1 (dynamic)   | Field ID 2 (dynamic)   | ...more fields (dynamic)
-------------------+---------+---------------------------+------------------------+------------------------+--------------------------
K1                 | U1      | strings & numbers         | [Value ID1, Value ID2] | [Value ID5, Value ID6] | ...more arrays of values
K2                 | U1      | strings & numbers         | [Value ID2]            | [Value ID5, Value ID6] | ...more arrays of values
K3                 | U2      | strings & numbers         | [Value ID1]            | [Value ID4, Value ID6] | ...more arrays of values
Some more context:
- our system follows a simple pattern: 1 org = 1 dataset = many users
- the datasets are organized the exact same way across orgs (when it comes to the number of tables and their IDs)
- from now on I'll focus on one given table per org (let's call it "the Data table"): the one shared above
- that Data table only shares half of its schema across orgs (primary key, user ID, and some more columns with other metadata; it's finite and known); the second part of the schema (all the "Field .." columns) varies from one org to another (both the number of columns and the column names)
- everything we're discussing will be handled by a Node process that iterates over the org datasets, so it must be generic enough to handle all of them
- any intermediary step, like running another pre-process to create intermediary tables or views, is acceptable
- although I used JS notation for the arrays, the BQ schema of the "fields" is string/repeated, but it is possible to alter the way tables are exported to BigQuery if necessary
What I've tried:
- flattening the table by parsing the arrays to strings within Node the moment those tables are exported to BigQuery => the third party doesn't support custom logic on cells, so in the end the visualization can't correctly interpret the value
- doing everything in the "What I believe I should do" section beneath, but through Node only: i.e. reading BQ, parsing and mapping, then creating the 2 views => it screams inefficiency, as I believe Node should only handle the automation part and simply send the query to BQ
- doing that through SQL, but even though I can read it and run simple queries, as soon as I try to mix UNNEST, JOIN and a dynamic number of unknown columns, I'm kinda lost
What I believe I should do:
The third party allows creating a Data Model and relations before visualizing, so I could have a view with one row per "values group", and another view that looks like the initial table, except the arrays of values are replaced by a string referencing the "primary key" of that "values group" view.
The 2 outputs would look like this:

Refs

Ref ID | Value 1 (index 0) | Value 2   | Value 3 | ...values
-------+-------------------+-----------+---------+----------
Ref1   | Value ID1         | Value ID2 |         |
Ref2   | Value ID5         | Value ID6 |         |
Ref3   | Value ID2         |           |         |
Ref4   | Value ID1         |           |         |
Ref5   | Value ID4         | Value ID6 |         |
Map

Entry ID (primary) | User ID | ...more metadata (finite) | Field ID 1 (dynamic) | Field ID 2 (dynamic) | ...more fields (dynamic)
-------------------+---------+---------------------------+----------------------+----------------------+-------------------------
K1                 | U1      | strings & numbers         | Ref1                 | Ref2                 | ...more refs
K2                 | U1      | strings & numbers         | Ref3                 | Ref2                 | ...more refs
K3                 | U2      | strings & numbers         | Ref4                 | Ref5                 | ...more refs
The questions:
- does it sound logical (from a data analysis standpoint) and doable (from a BQ query standpoint)?
- I keep thinking "1 process > 1 read > 2 outputs" for efficiency because of Node, but I should actually have one query from the Data table to UNNEST into the Refs view, and then another query from Data & Refs to generate the Map view, right?
- should I use GENERATE_UUID() to handle the Ref ID generation, or is there something else more suited?

Thanks for making it this far, I'll gladly take any input at this point.
You want to bring the nested table back to a relational data structure. This is possible and, depending on the requirements, a good choice. Please be aware that the following query has only been tested on a small dataset.
WITH tbl AS (
  SELECT "K1" AS EntryID, "U1" AS UserID, "strings & numbers" AS metadata,
         ["Value ID1", "Value ID2"] AS ID1, ["Value ID5", "Value ID6"] AS ID2
  UNION ALL SELECT "K2", "U1", "strings & numbers", ["Value ID2"], ["Value ID5", "Value ID6"]
  UNION ALL SELECT "K3", "U2", "strings & numbers", ["Value ID1"], ["Value ID4", "Value ID6"]
),
ref AS (
  SELECT *, ROW_NUMBER() OVER (ORDER BY ref_name) AS ref_id
  FROM (
    SELECT DISTINCT FORMAT("%T", ID1) AS ref_name,
           ID1[SAFE_OFFSET(0)] AS Value1,
           ID1[SAFE_OFFSET(1)] AS Value2,
           ID1[SAFE_OFFSET(2)] AS Value3,
           ID1[SAFE_OFFSET(3)] AS Value4
    FROM (SELECT ID1 FROM tbl UNION ALL SELECT ID2 FROM tbl)
  )
)
SELECT T.* EXCEPT (ID1, ID2),
       A.ref_id AS Field_ID1,
       B.ref_id AS Field_ID2
FROM tbl T
LEFT JOIN ref A ON FORMAT("%T", T.ID1) = A.ref_name
LEFT JOIN ref B ON FORMAT("%T", T.ID2) = B.ref_name
First we generate your sample table tbl.

Table ref
- The columns ID1 and ID2 are combined (UNION ALL).
- Each array is converted into a string, FORMAT("%T", ID1), in the column ref_name.
- For each entry of the array, we generate a column: ID1[SAFE_OFFSET(0)] AS Value1.
- The SELECT DISTINCT keeps only unique items.
- Finally, we generate a ROW_NUMBER() as a unique reference ID.

This is put in the ref CTE. However, you should save it as a real table with CREATE OR REPLACE TABLE yourdataset.ref_table (see the sketch below).

Table Map
- We query the tbl table without the array columns.
- We convert the array column ID1 to a string and join the reference ID from the ref table: FORMAT("%T", ID1) = A.ref_name.
- The same is done for ID2.
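As a concrete illustration of saving the lookup, the ref query can be wrapped in a DDL statement so it survives across queries. A minimal sketch, reusing the sample arrays from above; yourdataset is a placeholder for your actual dataset name:

-- Persist the reference table (BigQuery Standard SQL).
CREATE OR REPLACE TABLE yourdataset.ref_table AS
WITH tbl AS (
  SELECT ["Value ID1", "Value ID2"] AS ID1, ["Value ID5", "Value ID6"] AS ID2
  UNION ALL SELECT ["Value ID2"], ["Value ID5", "Value ID6"]
  UNION ALL SELECT ["Value ID1"], ["Value ID4", "Value ID6"]
)
SELECT *, ROW_NUMBER() OVER (ORDER BY ref_name) AS ref_id
FROM (
  -- one row per distinct array, one column per array slot
  SELECT DISTINCT FORMAT("%T", ID1) AS ref_name,
         ID1[SAFE_OFFSET(0)] AS Value1,
         ID1[SAFE_OFFSET(1)] AS Value2,
         ID1[SAFE_OFFSET(2)] AS Value3
  FROM (SELECT ID1 FROM tbl UNION ALL SELECT ID2 FROM tbl)
);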
Just wondering, is it possible to extract data from 2 different tables at one time in PostgreSQL?
I have the following:
Blocks table: created as follows in order to fit a schema, so the JSON information has all been stored in an Information column, each containing 36 polygons.

UUID (UUID)                          | Name (TEXT) | Type (TEXT) | Information (TEXT)
-------------------------------------+-------------+-------------+-------------------
815b2ce7-ce99-4d6c-b41a-bec512173f53 | C2          | Block       | 'stored JSON info'
7a9a03fc-8be6-47ca-b743-43715ebb5610 | D2          | Block       | 'stored JSON info'
9136dcda-2a55-4084-87c1-68ccde23aed8 | E3          | Block       | 'stored JSON info'
For a later query, I need to know the geometries of each of the polygons, so I created another table using code which parsed them out:

CREATE TABLE blockc2_ AS
SELECT geom
FROM (
  SELECT elem->>'type' AS type,
         elem->'properties' AS prop,
         elem->'geometry' AS geom
  FROM (SELECT json_array_elements(data) AS elem FROM block) f1
) f2;
A final table is created to show just the geometries, which will be associated with the already-created UUIDs, like below:

new_table

UUID (UUID)                          | Geometry (Geometry)
-------------------------------------+--------------------
815b2ce7-ce99-4d6c-b41a-bec512173f53 | 01030000000100000005000000972E05A56D6851C084D91C434C6C32401C05D4886B6851C086D974FA4D6C324078F4DA916D6851C036BF7504766C3240F31D0CAE6F6851C035BF1D4D746C3240972E05A56D6851C084D91C434C6C3240
7a9a03fc-8be6-47ca-b743-43715ebb5610 | 01030000000100000005000000BB05694F726851C0CB2A87A8486C32403EDC3733706851C0CD2ADF5F4A6C32409ACB3E3C726851C07E10E069726C324017F56F58746851C07C1088B2706C3240BB05694F726851C0CB2A87A8486C3240
9136dcda-2a55-4084-87c1-68ccde23aed8 | 1030000000100000005000000972E05A56D6851C084D91C434C6C32401C05D4886B6851C086D974FA4D6C324078F4DA916D6851C036BF7504766C3240F31D0CAE6F6851C035BF1D4D746C3240972E05A56D6851C084D91C434C6C3240
Ideally, I need code like the below (if it's possible), because if I insert them separately they don't associate with each other. Instead of 3 rows of info, it will be 6 (3 UUIDs and 3 Geometries):
INSERT INTO new_table (uuid, geometry) SELECT UUID FROM blocks WHERE Name='C2' AND SELECT geometry FROM second_table WHERE Name='C2'
Is something like this possible?
create table C (select * from table B union all select * from table A)
This sounds like a join:

INSERT INTO new_table (uuid, geometry)
SELECT b.UUID, g.geometry
FROM blocks b
JOIN geometry g USING (name)
WHERE name = 'C2';
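If the goal is to create the combined table in one shot rather than insert into an existing one, the same join can feed a CREATE TABLE ... AS, along the lines of the "create table C" idea above. A sketch; the table and column names follow the question and answer:

-- Materialize the joined result as a new table (PostgreSQL).
CREATE TABLE new_table AS
SELECT b.UUID, g.geometry
FROM blocks b
JOIN geometry g USING (name);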
I have two columns A and B, and would like to get a list of items (and their counts) in column B, grouped by the items in column A, and create a new table with that information. So the new table will look something like:
newCol1 | newCol2
--------+--------
a1, | b1:3,b4:1,b7:11
a2, | b2:1,b3:5,b4:3,b8:2
...and so forth. (delimiters can be anything, though. If concatenating item and count is not possible, I could also have one column with a list of items and another column with a list of counts separated by a delimiter.)
I can do this in Java by first getting all the items and storing them in a map with count updates, and then updating the new table, but I was wondering if there's any way to do this in PostgreSQL (perhaps by writing a function).
I've looked at the array functions in PostgreSQL but didn't get far. Any pointers, as well as suggestions for storing such data, would be appreciated.
a and b are of type text, I assume.
SELECT a, array_agg(bs) AS b_list
FROM (
SELECT a, b || ':' || count(*) AS bs -- coerced to text automatically
FROM tbl
GROUP BY a, b
ORDER BY a, b -- to sort b_list in the result
) x
GROUP BY a;
Or use string_agg() as @a_horse demonstrates to get a string instead of an array as the result.
You didn't supply a table definition or input data (that should yield your output), so this is just a shot in the dark:

select a, string_agg(b || ':' || b_count, ',') as b_list
from (
  select distinct
         a,
         b,
         count(*) over (partition by a, b) as b_count
  from the_unknown_table
) t
group by a;
Currently struggling to find a way to validate 2 tables (efficiently; Table A has lots of rows).
I have two tables
Table A

ID
--
A
B
C

Table matched

ID | Number
---+-------
A  | 1
A  | 2
A  | 9
B  | 1
B  | 9
C  | 2
I am trying to write a SQL Server query that basically checks to make sure that, for every value in Table A, there exists a row in Table matched for each value in a variable set (1, 2, 9).
The example above is incorrect because it should have, for every record in Table A, a corresponding record in Table matched for each value in (1, 2, 9). The end goal is:
Table matched

ID | Number
---+-------
A  | 1
A  | 2
A  | 9
B  | 1
B  | 2
B  | 9
C  | 1
C  | 2
C  | 9
I know it's confusing, but in general: for every X in (some set), there should be a corresponding record in Table matched. I have obviously simplified things.
Please let me know if you all need clarification.
Use:
SELECT a.id
FROM TABLE_A a
JOIN TABLE_B b ON b.id = a.id
WHERE b.number IN (1, 2, 9)
GROUP BY a.id
HAVING COUNT(DISTINCT b.number) = 3
The DISTINCT in the COUNT prevents duplicates (i.e. A having two records in TABLE_B with the value "2") from being falsely considered a correct record. It can be omitted if the number column has either a unique or a primary key constraint on it.
The HAVING COUNT(...) must equal the number of values provided in the IN clause.
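If the task is not just to detect the gaps but to fill them, the required (ID, Number) pairs can be generated with a cross join against the value set and anti-joined against the existing rows. A sketch in SQL Server syntax; the names TableA and TableMatched mirror the question and are assumptions:

-- Build every required (ID, Number) pair, keep only the missing ones, insert them.
INSERT INTO TableMatched (ID, Number)
SELECT a.ID, v.Number
FROM TableA a
CROSS JOIN (VALUES (1), (2), (9)) AS v(Number)
WHERE NOT EXISTS (
    SELECT 1
    FROM TableMatched m
    WHERE m.ID = a.ID
      AND m.Number = v.Number
);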
Create a temp table of the (ID, Number) combinations you want. You can do this dynamically if the values 1, 2 and 9 are in some table you can query.
Then: SELECT * FROM tempTable t WHERE NOT EXISTS (SELECT 1 FROM TableMatched m WHERE m.ID = t.ID AND m.Number = t.Number) to find what's missing.
I had this situation one time. My solution was as follows.
In addition to TableA and TableMatched, there was a table that defined the rows that should exist in TableMatched for each row in TableA. Let’s call it TableMatchedDomain.
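A minimal sketch of what that domain table might contain, using the value set from the question (names are hypothetical):

-- One row per Number that must exist for every ID in TableA.
CREATE TABLE TableMatchedDomain (Number int PRIMARY KEY);
INSERT INTO TableMatchedDomain (Number) VALUES (1), (2), (9);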
The application then accessed TableMatched through a view that controlled the returned rows, like this:
create view TableMatchedView as
select a.ID,
       d.Number,
       m.OtherValues
from TableA a
cross join TableMatchedDomain d
left join TableMatched m on m.ID = a.ID and m.Number = d.Number;
This way, the rows returned were always correct. If rows were missing from TableMatched, the Numbers were still returned, but with OtherValues as null. If there were extra values in TableMatched, they were not returned at all, as though they didn't exist. By changing the rows in TableMatchedDomain, this behavior could be controlled very easily: if a value were removed from TableMatchedDomain, it would disappear from the view; if it were added back again in the future, the corresponding OtherValues would appear again as they were before.
The reason I designed it this way was that I felt that establishing an invariant on the row configuration in TableMatched was too brittle and, even worse, introduced redundancy. So I removed the restriction from groups of rows (in TableMatched) and instead made the entire contents of another table (TableMatchedDomain) define the correct form of the data.
I have a table defined by:
create table apple (
  A number,
  B number
);
Now, I need to get values in the table such as the following:
A | B
--+-------------
1 | 4 (max of A)
2 | 4 (max of A)
3 | 4 (max of A)
4 | 4 (max of A)
How can I insert these rows, making B the maximum value of A?
Welp, first you want to insert 1-4 into your table:
insert into apple (a) values (1);
insert into apple (a) values (2);
insert into apple (a) values (3);
insert into apple (a) values (4);
Next, you're going to want to update your table to set b:
update apple set b = (select max(a) from apple)
As you can see, it's a two-part process. You can't get the max of a until you've inserted those rows!
And of course, if you want a select statement that grabs that other field, use the OVER clause:

SELECT a, MAX(a) OVER() AS b
FROM apple;
Edited:
And for an existing table you can do:
UPDATE t
SET b = maxcnt
FROM (
  SELECT *, MAX(a) OVER() AS maxcnt
  FROM apple
) t;
(This UPDATE ... FROM form is definitely fine in MS-SQL; Oracle doesn't support it, so there you'd use the plain subquery UPDATE shown above.)
Rob
Since version 11g you can use virtual columns (their values are calculated in real time).
So you would change your column definition as follows:
create table apple (
A number,
B number GENERATED ALWAYS AS ( max(A) ) VIRTUAL
);
I don't have Oracle 11g for testing, so I can't check, but it should work.
You could also use a user-defined function for the virtual column.
See http://www.oracle-base.com/articles/11g/VirtualColumns_11gR1.php for more examples and info!
Official docs for Create table in 11g:
http://download.oracle.com/docs/cd/B28359_01/server.111/b28286/statements_7002.htm