Postgres - Optimized dynamic headers in separate table - sql

I have 2 tables in PostgreSQL:
CREATE TABLE contacts (
id bigint NOT NULL,
header_1 text NOT NULL,
header_2 text,
header_3 text );
CREATE TABLE headers (
id bigint NOT NULL,
name character varying,
header_type text NOT NULL,
organization_id bigint );
INSERT INTO contacts
(id, header_1, header_2, header_3)
VALUES
(1,'bob1@hotmail.com','Bob1','lol'),
(2,'bob2@hotmail.com','Bob2','lol'),
(3,'bob3@hotmail.com','Bob3','lol'),
(4,'bob4@hotmail.com','Bob4','lol'),
(5,'bob5@hotmail.com','Bob5','lol');
INSERT INTO headers
(id, name, header_type, organization_id)
VALUES
(1,'Email','email', 1),
(2,'First Name','first_name', 1),
(3,'Last Name','last_name', 1);
I want to end up with the structure below. The tricky part is that the headers are dynamic, meaning there can be any number of them: the "contacts" columns will always start with 'header_', and the column suffix will always match an id in the "headers" table.
Email            | First Name | Last Name
-----------------|------------|----------
bob1@hotmail.com | Bob1       | lol
bob2@hotmail.com | Bob2       | lol
bob3@hotmail.com | Bob3       | lol
bob4@hotmail.com | Bob4       | lol
bob5@hotmail.com | Bob5       | lol
Optimized queries are preferred.
EDIT: Just to clarify
1.- There can be any number of contact tables (contact1, contact2, etc.)
2.- There can be any number of rows in both the header and contact tables.
3.- You can assume the data will always be integral: if table "contacts24" has a column named "header_57", you can assume there is going to be a row in the headers table with id 57.

In SQL, a table cannot have a different number of columns for each row, so you cannot have a dynamic count of headers per row in your contacts table.
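What you can do is build the column list dynamically from the headers table and run the generated statement in a second step. A minimal sketch (my own suggestion, not part of the original answer), assuming the schema above:
SELECT 'SELECT '
    || string_agg(format('header_%s AS %I', id, name), ', ' ORDER BY id)
    || ' FROM contacts;' AS generated_query
FROM headers
WHERE organization_id = 1;
-- yields: SELECT header_1 AS "Email", header_2 AS "First Name", header_3 AS "Last Name" FROM contacts;
The generated text can then be executed from the application, or via EXECUTE inside a PL/pgSQL function.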

Business logic for identify SQL column update

I have a SQL table called contacts which has n rows, where n is more than 10 lakh (1 million).
Below is the table structure with dummy data
+---------------+------------+---------+-----------+---------+------------------------+
| email         | department | title   | state     | country | cleansing_verification |
+---------------+------------+---------+-----------+---------+------------------------+
| xyz@email.com | h.r.       | sr. Exe | telangana | Ind     | -                      |
+---------------+------------+---------+-----------+---------+------------------------+
So, I have 4 schedulers to cleanse the data present in the above table, i.e.:
department cleanser
title cleanser
state cleanser
country cleanser
Each cleanser will update the data of its respective column. I have added one more column called cleansing_verification to identify which columns were updated, but I am not able to use it properly.
One email can be touched by any of the cleansers, which means all 4 can update the value, or any 3, any 2, or only 1.
So, the problem I'm facing is how to identify which emails were touched and which were not, so that I can send an email notification for the remaining ones.
If anything more is needed, let me know and I will add it to the question.
Thanks in advance.
So normally we don't do this in the world of database design, but you could use bitfields. So your cleansing_verification is a BIT(4) type column, and each cleanser gets a bit they can set:
department = B'1000'
title = B'0100'
state = B'0010'
country = B'0001'
When running, e.g., the state cleanser, you would then:
UPDATE contacts
SET cleansing_verification = cleansing_verification | B'0010'
WHERE -- whatever conditions you want to apply
If you wanted to check which rows were updated by a given cleanser, you check if the bit is set, e.g. for state:
SELECT * FROM contacts WHERE cleansing_verification & B'0010' = B'0010'
Working example on dbfiddle
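To find the contacts that no cleanser has touched yet (the ones you still need to notify), you can check for an all-zero bitfield. A small sketch under the same BIT(4) assumption, and assuming the column defaults to B'0000' rather than NULL:
-- contacts never touched by any cleanser
SELECT * FROM contacts WHERE cleansing_verification = B'0000';
-- contacts not yet touched by every cleanser
SELECT * FROM contacts WHERE cleansing_verification <> B'1111';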
Actually, the proper way to do it would be to introduce a new table with a foreign key back to the contacts table and a column for a cleanser, like (quick'n'dirty example):
CREATE TABLE contacts_verification
(
contact_id int references contacts(id),
cleanser int
)
Then if you want to mark a record you just insert the contact id and some sort of cleanser identification (1, 2, 3, 4), or you can use a text field and meaningful names if you really want:
INSERT INTO contacts_verification (contact_id, cleanser) VALUES (21386, 1)
Then just use JOIN to get back the records marked by a cleanser:
SELECT c.*
FROM contacts c
JOIN contacts_verification dep_verify
ON dep_verify.contact_id = c.id
AND dep_verify.cleanser = 1
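With this design, the contacts that still need a notification are simply the ones with no verification row at all. A quick sketch, assuming contacts has the id column referenced above:
SELECT c.*
FROM contacts c
LEFT JOIN contacts_verification v
ON v.contact_id = c.id
WHERE v.contact_id IS NULL -- no cleanser has marked this contact yet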

INSERT SELECT with differing table/column structure

I am trying to create an INSERT SELECT statement which inserts and converts data from Imported_table to Destination_table.
Imported_table
+------------------+-----------------------+
| Id (varchar(10)) | genre (varchar(4000)) |
+------------------+-----------------------+
| 6 | Comedy |
+------------------+-----------------------+
| 5 | Comedy |
+------------------+-----------------------+
| 1 | Action |
+------------------+-----------------------+
Destination_table (How it should be looking)
+-----------------------------+----------------------------+
| genre_name (PK,varchar(50)) | description (varchar(255)) |
+-----------------------------+----------------------------+
| Comedy | Description of Comedy |
+-----------------------------+----------------------------+
| Action | Description of Action |
+-----------------------------+----------------------------+
Imported_table.Id isn't used at all but is still in this (old) table
Destination_table.genre_name is a primary key and should be unique (distinct)
Destination_table.description is compiled with CONCAT('Description of ',genre)
My best try
INSERT INTO testdb.dbo.Destination_table (genre_name, description)
SELECT DISTINCT Genre,
LEFT(Genre,50) AS genre_name,
CAST(CONCAT('Description of ',Genre) AS varchar(255)) AS description
FROM MYIMDB.dbo.Imported_table
Gives the error: The select list for the INSERT statement contains more items than the insert list. The number of SELECT values must match the number of INSERT columns.
Thanks in advance.
The largest error in your query is that you are trying to insert 3 columns into a destination table having only two columns. That being said, I would just use LEFT for both inserted values and take as much space as the new table can hold:
INSERT INTO testdb.dbo.Destination_table (genre_name, description)
SELECT DISTINCT
LEFT(Genre, 50),
'Description of ' + LEFT(Genre, 240) -- 240 + 15 = 255
FROM MYIMDB.dbo.Imported_table;
As a side note, the original genre field is 4000 characters wide, and your new table structure runs the risk of throwing away a lot of information. It is not clear whether you are concerned with this, but it is worth pointing out.
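If you want to check up front how many rows would actually be affected by that truncation, a quick count does the job (a side sketch, not part of the original answer):
SELECT COUNT(*) AS genres_longer_than_50
FROM MYIMDB.dbo.Imported_table
WHERE LEN(Genre) > 50;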
This means your SELECT (genre, genre_name,description) and INSERT (genre_name, description) lists don't match. You need to SELECT the same number of fields as you are specifying in your INSERT.
Try this:
INSERT INTO testdb.dbo.Destination_table (genre_name, description)
SELECT DISTINCT Genre,
CAST(CONCAT('Description of ',Genre) AS varchar(255)) AS description
FROM MYIMDB.dbo.Imported_table
You have 3 columns in your SELECT, try:
INSERT INTO testdb.dbo.Destination_table (genre_name, description)
SELECT DISTINCT LEFT(Genre,50) AS genre_name,
CAST(CONCAT('Description of ',Genre) AS varchar(255)) AS description
FROM MYIMDB.dbo.Imported_table

Create a table without knowing its columns in SQL

How can I create a table without knowing in advance how many and what columns it exactly holds?
The idea is that I have a table DATA that has 3 columns : ID, NAME, and VALUE
What I need is a way to get multiple values depending on the value of NAME - I can't do it with simple WHERE or JOIN (because I'll need other values - with other NAME values - later on in my query).
Because of the way this table is constructed I want to PIVOT it in order to transform every distinct value of NAME into a column so it will be easier to get to it in my later search.
What I want now is to somehow save this to a temp table / variable so I can use it later on to join with the result of another query...
So example:
Columns:
CREATE TABLE MainTab
(
id int,
nameMain varchar(max),
notes varchar(max)
);
CREATE TABLE SecondTab
(
id int,
id_mainTab int,
nameSecond varchar(max),
notes varchar(max)
);
CREATE TABLE DATA
(
id int,
id_second int,
name varchar(max),
value varchar(max)
);
Now some example data from the table DATA:
| id | id_second | name       | value           |
|----|-----------|------------|-----------------|
| 1  | 5550      | number     | 111115550       |
| 2  | 6154      | address    | 1, First Avenue |
| 3  | 1784      | supervisor | John Smith      |
| 4  | 3467      | function   | Marketing       |
| 5  | 9999      | start_date | 01/01/2000      |
...
Now imagine that 'name' has A LOT of different values, and in one query I'll need to get a lot of different values depending on the value of 'name'...
That's why I pivot it so that number, address, supervisor, function, start_date, ... become columns.
This I do dynamically because of the amount of possible columns - it would take me a while to write all of them in an 'IN' statement - and I don't want to have to remember to add it manually every time a new 'name' value gets added...
Therefore I followed http://sqlhints.com/2014/03/18/dynamic-pivot-in-sql-server/
The thing now is that I want the result of my execute(@query) to be stored in a temp table / variable. I want to use it later on to join it with mainTab...
It would be nice if I could use @cols (which holds the values of DATA.name) but I can't seem to figure out a way to do this.
ADDITIONALLY:
If I use the non-dynamic way (writing down all the values manually after 'IN') I still need to create a column called status. In this column (so far it's NULL everywhere because that value doesn't exist in my unpivoted table) I want to have 'open' or 'closed', depending on the date (let's say I have start_date and end_date):
CASE
WHEN end_date < GETDATE() THEN 'closed'
ELSE 'open'
END AS status
Where can I put this statement? Let's say my main query looks like this:
SELECT * FROM
(SELECT id_second, name, value, id FROM TABLE_DATA) src
PIVOT (MAX(value) FOR name IN ([number], [address], [supervisor], [function], [start_date], [end_date], [status])) AS pivotTab
JOIN SecondTab ON SecondTab.id = pivotTab.id_second
JOIN MainTab ON MainTab.id = SecondTab.id_mainTab
WHERE pivotTab.status = 'closed';
Well, as far as I can understand - you have some select statement and just need to "dump" its result to some temporary table. In this case you can use select into syntax like:
select .....
into #temp_table
from ....
This will create a temporary table matching the columns of the select statement and populate it with the data returned by that select statement.
See MSDN for reference.
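One caveat for your dynamic case: a local #temp table created inside execute(@query) disappears when that dynamic batch ends. A sketch of one workaround (hypothetical names, assuming @cols holds the bracketed column list from the dynamic-pivot article you linked): put the INTO clause inside the dynamic string and use a global temp table instead.
-- @cols is assumed to be declared and populated as in the linked article
DECLARE @query nvarchar(max);
SET @query = N'SELECT id_second, ' + @cols + N'
INTO ##pivotTab
FROM (SELECT id_second, name, value FROM TABLE_DATA) src
PIVOT (MAX(value) FOR name IN (' + @cols + N')) AS p;';
EXEC (@query);

-- ##pivotTab survives the dynamic batch, so it can be joined afterwards
SELECT m.*, p.*
FROM ##pivotTab p
JOIN SecondTab s ON s.id = p.id_second
JOIN MainTab m ON m.id = s.id_mainTab;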

Cassandra: Update multiple rows with different values

Hi, I have a table similar to this in Cassandra:
CREATE TABLE TestTable( id text,
group text,
date text,
user text,
dept text,
orderby int,
files list<text>,
users list<text>,
family_members list<frozen <member>>,
PRIMARY KEY ((id)));
CREATE INDEX on TestTable (user);
CREATE INDEX on TestTable (dept);
CREATE INDEX on TestTable (group);
CREATE INDEX on TestTable (date);
Id  | OrderBy
----|--------
101 | 1
102 | 2
105 | 3
I want to change the existing orderby for the following ids 105, 102, 103 in that same order, i.e., (105, 1) (102, 2) (103, 3). I'm new to Cassandra, please help me. I think it is possible in SQL by using rownum and a join.
I'm new to Cassandra
I can tell. The first clue was the order of your results. With id as your sole PRIMARY KEY (making it your partition key), your results would never come back sorted like that. This is how they should be sorted:
aploetz@cqlsh:stackoverflow> SELECT id,orderby,token(id) FROM testtable ;
id | orderby | system.token(id)
-----+---------+---------------------
102 | 2 | -963541259029995480
105 | 3 | 2376737131193407616
101 | 1 | 4965004472028601333
(3 rows)
Unbound queries always return results sorted by the hashed token value of the partition key. I have run the token() function on your partition key (id) to show this.
I want to change existing order by for following ids 105,102,103 in same order. i.e., (105, 1) (102, 2) (103, 3).
If all you need to do is change the values in the orderby column, that's easy:
aploetz@cqlsh:stackoverflow> INSERT INTO testtable(id,orderby) VALUES ('101',3);
aploetz@cqlsh:stackoverflow> INSERT INTO testtable(id,orderby) VALUES ('102',2);
aploetz@cqlsh:stackoverflow> INSERT INTO testtable(id,orderby) VALUES ('105',1);
aploetz@cqlsh:stackoverflow> SELECT id,orderby,token(id) FROM testtable ;
id | orderby | system.token(id)
-----+---------+---------------------
102 | 2 | -963541259029995480
105 | 1 | 2376737131193407616
101 | 3 | 4965004472028601333
(3 rows)
As Cassandra PRIMARY KEYs are unique, simply INSERTing a new non-key column value for that key changes orderby.
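Equivalently, since writes in Cassandra are upserts, an UPDATE on the same key does the same thing:
UPDATE testtable SET orderby = 3 WHERE id = '101';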
Now if you want to actually be able to sort your results by the orderby column, that's another issue entirely, and cannot be solved with your current model.
If that's what you really want to do, then you'll need a new table with a different PRIMARY KEY definition. So I'll create the same table with two changes: I'll name it testtable_by_group, and I'll use a composite PRIMARY KEY of (group, orderby, id). Now I can query for a specific group "group1" and see the results sorted.
aploetz@cqlsh:stackoverflow> CREATE TABLE testtable_by_group (group text,id text,orderby int,PRIMARY KEY (group,orderby,id));
aploetz@cqlsh:stackoverflow> INSERT INTO testtable_by_group(group,id,orderby) VALUES ('group1','101',3);
aploetz@cqlsh:stackoverflow> INSERT INTO testtable_by_group(group,id,orderby) VALUES ('group1','102',2);
aploetz@cqlsh:stackoverflow> INSERT INTO testtable_by_group(group,id,orderby) VALUES ('group1','105',1);
aploetz@cqlsh:stackoverflow> SELECT group,id,orderby,token(group) FROM testtable_by_group WHERE group='group1';
group | id | orderby | system.token(group)
--------+-----+---------+----------------------
group1 | 105 | 1 | -2413872665919611707
group1 | 102 | 2 | -2413872665919611707
group1 | 101 | 3 | -2413872665919611707
(3 rows)
In this way, group is the new partition key. orderby is the first clustering key, so your rows within group are automatically sorted by it. id is on the end to ensure uniqueness, if any two rows have the same orderby.
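If you ever need the reverse order, you can also flip the sort direction on the clustering column at query time (a small follow-up sketch):
SELECT group,id,orderby FROM testtable_by_group WHERE group='group1' ORDER BY orderby DESC;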
Note that I left the token() function in the result set, but that I ran it on the new partition key (group). As you can see, the key of group1 is hashed to the same token for all 3 rows, which means that in a multi-node environment all 3 rows will be stored together. This can create a "hotspot" in your cluster, where some nodes have more data than others. That's why a good PRIMARY KEY definition ensures both query satisfaction and data distribution.
I wrote an article for DataStax on this topic a while back. Give it a read, and it should help you out: http://www.datastax.com/dev/blog/we-shall-have-order

Tricky PostgreSQL join and order query

I've got four tables in a PostgreSQL 9.3.6 database:
sections
fields (child of sections)
entries (child of sections)
data (child of entries)
CREATE TABLE section (
id serial PRIMARY KEY,
title text,
"group" integer
);
CREATE TABLE fields (
id serial PRIMARY KEY,
title text,
section integer,
type text,
"default" json
);
CREATE TABLE entries (
id serial PRIMARY KEY,
section integer
);
CREATE TABLE data (
id serial PRIMARY KEY,
data json,
field integer,
entry integer
);
I'm trying to generate a page that looks like this:
section title
field 1 title | field 2 title | field 3 title
entry 1 | data 'as' json | data 1 json | data 3 json <-- table
entry 2 | data 'df' json | data 5 json | data 6 json
entry 3 | data 'gh' json | data 8 json | data 9 json
The way I have it set up right now each piece of 'data' has an entry it's linked to, a corresponding field (that field has columns that determine how the data's json field should be interpreted), a json field to store different types of data, and an id (1-9 here in the table).
In this example there are 3 entries, and 3 fields and there is a data piece for each of the cells in between.
It's set up like this because one section can have different field types and quantity than another section and therefore different quantities and types of data.
Challenge 1:
I'm trying to join the table together in a way that it's sortable by any of the columns (contents of the data for that field's json column). For example I want to be able to sort field 3 (the third column) in reverse order, the table would look like this:
section title
field 1 title | field 2 title | field 3 title
entry 3 | data 'gh' json | data 8 json | data 9 json
entry 2 | data 'df' json | data 5 json | data 6 json
entry 1 | data 'as' json | data 1 json | data 3 json <-- table
I'm open to doing it another way too if there's a better one.
Challenge 2:
Each field has a 'default value' column - Ideally I only have to create 'data' entries when they have a value that isn't that default value. So the table might actually look like this if field 2's default value was 'asdf':
section title
field 1 title | field 2 title | field 3 title
entry 3 | data 'gh' json | data 8 json | data 9 json
entry 2 | data 'df' json | 'asdf' | data 6 json
entry 1 | data 'as' json | 'asdf' | data 3 json <-- table
The key to writing this query is understanding that you just need to fetch all the data for a single section, and the rest you just join. With your schema you also can't filter data directly by section, so you'll need to join entries just for that:
SELECT d.* FROM data d JOIN entries e ON (d.entry = e.id)
WHERE e.section = ?
You can then join field to each row to get defaults, types and titles:
SELECT d.*, f.title, f.type, f."default"
FROM data d JOIN entries e ON (d.entry = e.id)
JOIN fields f ON (d.field = f.id)
WHERE e.section = ?
Or you can select fields in a separate query to save some network traffic.
So that was the answer; here come the bonuses:
Use foreign keys instead of plain integers to refer to other tables; it will make the database check consistency for you.
Relations (tables) should be called in singular by convention, so it's section, entry and field.
Referring fields are called <name>_id, e.g. field_id or section_id also by convention.
The whole point of JSON fields is to store a collection of data that is not statically defined, so it would make much more sense not to use the entries and data tables, but a single table with JSON containing all the fields instead.
Like this:
CREATE TABLE "row" ( -- "row" is a reserved word, so it needs quoting; a less generic name would be even better
id int primary key,
section_id int references section (id),
data json
)
With data fields containing something like:
{
"title": "iPhone 6",
"price": 650,
"available": true,
...
}
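A hedged usage sketch for that single-table design (my own illustration, not from the original answer): you can filter and sort on keys inside the json column, casting the text that ->> returns as needed.
SELECT id, data->>'title' AS title
FROM "row"
WHERE section_id = 1
AND (data->>'available')::boolean
ORDER BY (data->>'price')::numeric DESC;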
@Suor has provided good advice, some of which you already accepted. I am building on the updated schema.
Schema
CREATE TABLE section (
section_id serial PRIMARY KEY,
title text,
grp integer
);
CREATE TABLE field (
field_id serial PRIMARY KEY,
section_id integer REFERENCES section,
title text,
type text,
default_val json
);
CREATE TABLE entry (
entry_id serial PRIMARY KEY,
section_id integer REFERENCES section
);
CREATE TABLE data (
data_id serial PRIMARY KEY,
field_id integer REFERENCES field,
entry_id integer REFERENCES entry,
data json
);
I changed two more details:
section_id instead of id, etc. "id" as column name is an anti-pattern that's gotten popular since a couple of ORMs use it. Don't. Descriptive names are much better. Identical names for identical content is a helpful guideline. It also allows you to use the shortcut USING in join clauses:
Don't use reserved words as identifiers. Use legal, lower-case, unquoted names exclusively to make your life easier.
Are PostgreSQL column names case-sensitive?
Referential integrity?
There is another inherent weakness in your design. What stops entries in data from referencing a field and an entry that don't go together? Closely related question on dba.SE
Enforcing constraints “two tables away”
Query
Not sure if you need the complex design at all. But to answer the question, this is the base query:
SELECT entry_id, field_id, COALESCE(d.data, f.default_val) AS data
FROM entry e
JOIN field f USING (section_id)
LEFT JOIN data d USING (field_id, entry_id) -- can be missing
WHERE e.section_id = 1
ORDER BY 1, 2;
The LEFT JOIN is crucial to allow for missing data entries and use the default instead.
SQL Fiddle.
crosstab()
The final step is cross tabulation. Cannot show this in SQL Fiddle since the additional module tablefunc is not installed.
Basics for crosstab():
PostgreSQL Crosstab Query
SELECT * FROM crosstab(
$$
SELECT entry_id, field_id, COALESCE(d.data, f.default_val) AS data
FROM entry e
JOIN field f USING (section_id)
LEFT JOIN data d USING (field_id, entry_id) -- can be missing
WHERE e.section_id = 1
ORDER BY 1, 2
$$
,$$SELECT field_id FROM field WHERE section_id = 1 ORDER BY field_id$$
) AS ct (entry int, f1 json, f2 json, f3 json) -- static
ORDER BY f3->>'a'; -- static
The tricky part here is the return type of the function. I provided a static type for 3 fields, but you really want that dynamic. Also, I am referencing a field in the json type that may or may not be there ...
So build that query dynamically and execute it in a second call.
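For illustration, here is one way such a statement could be generated for section 1 (a sketch of my own, assuming the schema above); the generated text is then executed in that second call:
SELECT format(
$f$SELECT * FROM crosstab(
   $q$SELECT entry_id, field_id, COALESCE(d.data, f.default_val) AS data
      FROM entry e
      JOIN field f USING (section_id)
      LEFT JOIN data d USING (field_id, entry_id)
      WHERE e.section_id = 1
      ORDER BY 1, 2$q$
  ,$c$SELECT field_id FROM field WHERE section_id = 1 ORDER BY field_id$c$
   ) AS ct (entry_id int, %s)$f$
 , string_agg(format('%I json', 'f' || field_id), ', ' ORDER BY field_id)
) AS generated_query
FROM field
WHERE section_id = 1;
-- string_agg() builds the column definition list (f1 json, f2 json, ...) from the actual fields of the section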
More about that:
Dynamic alternative to pivot with CASE and GROUP BY