Hi, I have a similar table in Cassandra:
CREATE TABLE TestTable (
id text,
group text,
date text,
user text,
dept text,
orderby int,
files list<text>,
users list<text>,
family_members list<frozen<member>>,
PRIMARY KEY ((id)));
CREATE INDEX on TestTable (user);
CREATE INDEX on TestTable (dept);
CREATE INDEX on TestTable (group);
CREATE INDEX on TestTable (date);
Id  | OrderBy
----+--------
101 | 1
102 | 2
105 | 3
I want to change the existing orderby values for the following ids 105, 102, 103 in the same order, i.e., (105, 1) (102, 2) (103, 3). I'm new to Cassandra, please help me. I think this is possible in SQL by using ROWNUM and a join.
I'm new to Cassandra
I can tell. The first clue was the order of your results. With id as your sole PRIMARY KEY (making it your partition key), your results would never come back sorted like that. This is how they should be sorted:
aploetz@cqlsh:stackoverflow> SELECT id,orderby,token(id) FROM testtable;
id | orderby | system.token(id)
-----+---------+---------------------
102 | 2 | -963541259029995480
105 | 3 | 2376737131193407616
101 | 1 | 4965004472028601333
(3 rows)
Unbound queries always return results sorted by the hashed token value of the partition key. I have run the token() function on your partition key (id) to show this.
I want to change existing order by for following ids 105,102,103 in same order. i.e., (105, 1) (102, 2) (103, 3).
If all you need to do is change the values in the orderby column, that's easy:
aploetz@cqlsh:stackoverflow> INSERT INTO testtable(id,orderby) VALUES ('101',3);
aploetz@cqlsh:stackoverflow> INSERT INTO testtable(id,orderby) VALUES ('102',2);
aploetz@cqlsh:stackoverflow> INSERT INTO testtable(id,orderby) VALUES ('105',1);
aploetz@cqlsh:stackoverflow> SELECT id,orderby,token(id) FROM testtable;
id | orderby | system.token(id)
-----+---------+---------------------
102 | 2 | -963541259029995480
105 | 1 | 2376737131193407616
101 | 3 | 4965004472028601333
(3 rows)
Because Cassandra PRIMARY KEYs are unique, simply INSERTing a row with an existing key upserts it, so writing a new non-key column value for that key changes orderby.
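An UPDATE would do the same thing, as UPDATEs in Cassandra are also upserts; a minimal sketch against the same table:
UPDATE testtable SET orderby = 1 WHERE id = '105';
UPDATE testtable SET orderby = 2 WHERE id = '102';
UPDATE testtable SET orderby = 3 WHERE id = '101';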
Now if you want to actually be able to sort your results by the orderby column, that's another issue entirely, and cannot be solved with your current model.
If that's what you really want to do, then you'll need a new table with a different PRIMARY KEY definition. So I'll create the same table with two changes: I'll name it testtable_by_group, and I'll use a composite PRIMARY KEY (group,orderby,id). Now I can query for a specific group "group1" and see the results sorted.
aploetz@cqlsh:stackoverflow> CREATE TABLE testtable_by_group (group text,id text,orderby int,PRIMARY KEY (group,orderby,id));
aploetz@cqlsh:stackoverflow> INSERT INTO testtable_by_group(group,id,orderby) VALUES ('group1','101',3);
aploetz@cqlsh:stackoverflow> INSERT INTO testtable_by_group(group,id,orderby) VALUES ('group1','102',2);
aploetz@cqlsh:stackoverflow> INSERT INTO testtable_by_group(group,id,orderby) VALUES ('group1','105',1);
aploetz@cqlsh:stackoverflow> SELECT group,id,orderby,token(group) FROM testtable_by_group WHERE group='group1';
group | id | orderby | system.token(group)
--------+-----+---------+----------------------
group1 | 105 | 1 | -2413872665919611707
group1 | 102 | 2 | -2413872665919611707
group1 | 101 | 3 | -2413872665919611707
(3 rows)
In this way, group is the new partition key. orderby is the first clustering key, so rows within each group partition are automatically sorted by it. id is on the end to ensure uniqueness in case any two rows have the same orderby.
Note that I left the token() function in the result set, but that I ran it on the new partition key (group). As you can see, the key of group1 is hashed to the same token for all 3 rows, which means that in a multi-node environment all 3 rows will be stored together. This can create a "hotspot" in your cluster, where some nodes have more data than others. That's why a good PRIMARY KEY definition ensures both query satisfaction and data distribution.
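If you ever need the reverse order, you can either add ORDER BY orderby DESC to a query within a partition, or bake the direction into the table definition with CLUSTERING ORDER BY. A sketch of the latter (the _desc table name is made up):
CREATE TABLE testtable_by_group_desc (group text,id text,orderby int,
PRIMARY KEY (group,orderby,id))
WITH CLUSTERING ORDER BY (orderby DESC, id ASC);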
I wrote an article for DataStax on this topic a while back. Give it a read, and it should help you out: http://www.datastax.com/dev/blog/we-shall-have-order
Related
I have a table of customer contacts and their role. Simplified example below.
customer | role | userid
----------------------------
1 | Support | 123
1 | Support | 456
1 | Procurement | 567
...
desired output
customer | Support1 | Support2 | Support3 | Support4 | Procurement1 | Procurement2
-----------------------------------------------------------------------------------
1 | 123 | 456 | null | null | 567 | null
2 | 123 | 456 | 12333 | 45776 | 888 | 56723
So I want to dynamically create the number of required columns based on how many users are in that role. It's a small number of roles. Also, I can assume a max of 5 users in the same role, which means in the worst case I need to generate 5 columns for each role. The userids don't need to be in any particular order.
My current approach is getting 1 userid per role/customer. Then a second query pulls another id that wasn't part of the first result set. And so on. But that way I have to statically create 5 queries. It works, but I was wondering whether there is a more efficient way of dynamically creating the needed columns?
Example of pulling one user per role:
SELECT customer,role,
(SELECT top 1 userid
FROM temp as tmp1
where tmp1.customer=tmp2.customer and tmp1.role=tmp2.role
) as userid
FROM temp as tmp2
group by customer,role
order by customer,role
SQL create with dummy data
create table temp
(
customer int,
role nvarchar(20),
userid int
)
insert into temp values (1,'Support',123)
insert into temp values (1,'Support',456)
insert into temp values (1,'Procurement',567)
insert into temp values (2,'Support',123)
insert into temp values (2,'Support',456)
insert into temp values (2,'Procurement',888)
insert into temp values (2,'Support',12333)
insert into temp values (2,'Support',45776)
insert into temp values (2,'Procurement',56723)
You may need to adapt your approach slightly if you want to avoid getting into the realm of programming user-defined table functions (which is what you would need in order to generate columns dynamically). You don't mention which SQL database variant you are using (SQL Server, PostgreSQL, ?), so I'm going to assume that it supports some form of string-aggregation feature (they pretty much all do); the syntax varies, though, so you will probably have to adjust the code to your circumstances. You mention that the number of roles is small (5-ish?). The proposed solution generates a comma-separated list of user ids, one per role, using common table expressions (CTEs) and the LISTAGG function (variously named STRING_AGG, GROUP_CONCAT, etc. in other databases).
WITH tsupport
AS (SELECT customer,
Listagg(userid, ',') AS "Support"
FROM temp
WHERE ROLE = 'Support'
GROUP BY customer),
tprocurement
AS (SELECT customer,
Listagg(userid, ',') AS "Procurement"
FROM temp
WHERE ROLE = 'Procurement'
GROUP BY customer)
--> tnextrole...
--> AS (SELECT ... for additional roles
--> Listagg...
SELECT a.customer,
"Support",
"Procurement"
--> "Next Role" etc.
FROM tsupport a
JOIN tprocurement b
ON a.customer = b.customer
--> JOIN tNextRole ...
Fiddle is here, with a result that appears as below based on your dummy data (the ids within each list may come out in any order):
customer | Support             | Procurement
---------+---------------------+------------
       1 | 123,456             | 567
       2 | 123,456,12333,45776 | 888,56723
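If your database turns out to be PostgreSQL or SQL Server 2017+, the same aggregation uses STRING_AGG instead; a sketch of one CTE body (the cast is there because some dialects require text input):
SELECT customer,
       STRING_AGG(CAST(userid AS varchar(20)), ',') AS "Support"
FROM temp
WHERE role = 'Support'
GROUP BY customer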
I have 2 tables in PostgreSQL:
CREATE TABLE contacts (
id bigint NOT NULL,
header_1 text NOT NULL,
header_2 text,
header_3 text );
CREATE TABLE headers (
id bigint NOT NULL,
name character varying,
header_type text NOT NULL,
organization_id bigint );
INSERT INTO contacts
(id, header_1, header_2, header_3)
VALUES
(1,'bob1@hotmail.com','Bob1','lol'),
(2,'bob2@hotmail.com','Bob2','lol'),
(3,'bob3@hotmail.com','Bob3','lol'),
(4,'bob4@hotmail.com','Bob4','lol'),
(5,'bob5@hotmail.com','Bob5','lol');
INSERT INTO headers
(id, name, header_type, organization_id)
VALUES
(1,'Email','email', 1),
(2,'First Name','first_name', 1),
(3,'Last Name','last_name', 1);
I want to end up with this structure. The tricky part is that the headers are dynamic, meaning there can be n headers; the "contacts" columns will always start with 'header_', and each header_N column will always match the "headers" row whose id is N:
Email | First Name | Last Name
------------------|------------|-----------
bob1#hotmail.com | Bob1 | lol
bob2#hotmail.com | Bob2 | lol
bob3#hotmail.com | Bob3 | lol
bob4#hotmail.com | Bob4 | lol
bob5#hotmail.com | Bob5 | lol
Optimized queries are preferred.
EDIT: Just to clarify:
1. There can be n contact tables (contact1, contact2, etc.)
2. There can be n rows in both the header and contact tables.
3. You can assume the data will always be integral: if table "contacts24" has a column named "header_57", you can assume there's going to be a row in the headers table with id 57.
In SQL, a table cannot have a different number of columns for each row. So it means you cannot have a dynamic count of headers for each row in your contacts table.
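What you can do is generate the statement text from the headers table and then run the generated SQL (for example via EXECUTE in a PL/pgSQL function). A minimal sketch, assuming the header_N/id correspondence from point 3 of the question:
SELECT 'SELECT '
       || string_agg(format('header_%s AS %I', id, name), ', ' ORDER BY id)
       || ' FROM contacts;'
FROM headers;
-- produces: SELECT header_1 AS "Email", header_2 AS "First Name", header_3 AS "Last Name" FROM contacts;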
The architecture of my DB involves records in a Tags table. Each record in the Tags table has a string Name and a foreign key to the PrimaryIDs of records in another Worker table.
Records in the Worker table have tags. Every time we create a Tag for a worker, we add a new row to the Tags table with the inputted Name and a foreign key to the worker's PrimaryID. Therefore, we can have multiple Tags with different names for the same worker.
Worker Table
ID | Worker Name | Other Information
__________________________________________________________________
1 | Worker1 | ..........................
2 | Worker2 | ..........................
3 | Worker3 | ..........................
4 | Worker4 | ..........................
Tags Table
ID |Foreign Key(WorkerID) | Name
__________________________________________________________________
1 | 1 | foo
2 | 1 | bar
3 | 2 | foo
5 | 3 | foo
6 | 3 | bar
7 | 3 | baz
8 | 1 | qux
My goal is to filter WorkerIDs based on an inputted table of strings. I want to get the set of WorkerIDs that have all of the inputted tags. For example, if the inputted strings are foo and bar, I would like to return WorkerIDs 1 and 3. Any idea how to do this? I was thinking something to do with GROUP BY or JOINing tables. I am new to SQL and can't seem to figure it out.
This is a variant of relational division. Here's one attempt:
select workerid
from tags
where name in ('foo', 'bar')
group by workerid
having count(distinct name) = 2
You can use the following:
select WorkerID
from tags where name in ('foo', 'bar')
group by WorkerID
having count(*) = 2
and this will retrieve your desired result.
Regards.
This article is an excellent resource on the subject.
While the answer from @Lennart works fine in Query Analyzer, you're not going to be able to duplicate that in a stored procedure or from a consuming application without opening yourself up to SQL injection attacks. To extend the solution, you'll want to look into passing your list of tags as a table-valued parameter, since SQL Server doesn't support arrays.
Essentially, you create a custom type in the database that mimics a table with only one column:
CREATE TYPE list_of_tags AS TABLE (t varchar(50) NOT NULL PRIMARY KEY)
Then you populate an instance of that type in memory:
DECLARE @mylist list_of_tags
INSERT @mylist (t) VALUES('foo'),('bar')
Then you can select against that as a join using the GROUP BY/HAVING described in the previous answers:
select workerid
from tags inner join @mylist on name = t
group by workerid
having count(distinct name) = 2
*Note: I'm not at a computer where I can test the query. If someone sees a flaw in my query, please let me know and I'll happily correct it and thank them.
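For completeness, the same query wrapped in a stored procedure would look roughly like this (a sketch; the procedure name is made up, and note that table-valued parameters must be declared READONLY):
CREATE PROCEDURE FindWorkersWithAllTags
    @taglist list_of_tags READONLY
AS
BEGIN
    SELECT workerid
    FROM tags INNER JOIN @taglist AS l ON tags.name = l.t
    GROUP BY workerid
    HAVING COUNT(DISTINCT name) = (SELECT COUNT(*) FROM @taglist)
END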
I am attempting to duplicate an entry. That part isn't hard. The tricky part is: there are n entries connected with a foreign key, and for each of those entries, there are n entries connected to it. I did it manually, using a lookup to duplicate and cross-reference the foreign keys.
Is there some subroutine or method to duplicate an entry and search for and duplicate the foreign entries? Perhaps there is a name for this type of replication I haven't stumbled on yet; is there a specific database-related term for this type of operation?
PostgreSQL 8.4.13
main entry (uid is serial)
uid | title
-----+-------
1 | stuff
department (departmentid is serial, uidref is foreign key for uid above)
departmentid | uidref | title
--------------+--------+-------
100 | 1 | Foo
101 | 1 | Bar
sub_category of department (textid is serial, departmentref is foreign for departmentid above)
textid | departmentref | title
-------+---------------+----------------
1000 | 100 | Text for Foo 1
1001 | 100 | Text for Foo 2
1002 | 101 | Text for Bar 1
You can do it all in a single statement using data-modifying CTEs (requires Postgres 9.1 or later).
Your primary keys being serial columns makes it easier:
WITH m AS (
INSERT INTO main (<all columns except pk>)
SELECT <all columns except pk>
FROM main
WHERE uid = 1
RETURNING uid AS uidref -- returns new uid
)
, d AS (
INSERT INTO department (<all columns except pk>)
SELECT <all columns except pk>
FROM m
JOIN department d USING (uidref)
RETURNING departmentid AS departmentref -- returns new departmentids
)
INSERT INTO sub_category (<all columns except pk>)
SELECT <all columns except pk>
FROM d
JOIN sub_category s USING (departmentref);
Replace <all columns except pk> with your actual columns. pk is for primary key, like main.uid.
The query returns nothing. You can return pretty much anything; you just didn't specify anything to return.
You wouldn't call that "replication". That term is usually applied to keeping multiple database instances or objects in sync. You are just duplicating an entry, and its dependent rows, recursively.
Aside about naming conventions:
It would get even simpler with a naming convention that labels all columns signifying "ID of table foo" with the same (descriptive) name, like foo_id. There are other naming conventions floating around, but this is the best for writing queries, IMO.
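For example (sketching hypothetical main_id/department_id column names), the joins then collapse to the terse USING syntax:
SELECT *
FROM main
JOIN department USING (main_id)
JOIN sub_category USING (department_id);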
Does PostgreSQL 9.2+ provide any functionality to make it possible to generate a sequence that is namespaced to a particular value? For example:
.. | user_id | seq_id | body | ...
----------------------------------
- | 4 | 1 | "abc...."
- | 4 | 2 | "def...."
- | 5 | 1 | "ghi...."
- | 5 | 2 | "xyz...."
- | 5 | 3 | "123...."
This would be useful to generate custom urls for the user:
domain.me/username_4/posts/1
domain.me/username_4/posts/2
domain.me/username_5/posts/1
domain.me/username_5/posts/2
domain.me/username_5/posts/3
I did not find anything in the PG docs (regarding sequence and sequence functions) to do this. Are sub-queries in the INSERT statement or with custom PG functions the only other options?
You can use a subquery in the INSERT statement, like @Clodoaldo demonstrates. However, this defeats the nature of a sequence as being safe to use in concurrent transactions; it will result in race conditions and eventually in duplicate key violations.
You should rather rethink your approach. Just one plain sequence for your table and combine it with user_id to get the sort order you want.
You can always generate the custom URLs with the desired numbers using row_number() with a simple query like:
SELECT format('domain.me/username_%s/posts/%s'
, user_id
, row_number() OVER (PARTITION BY user_id ORDER BY seq_id)
)
FROM tbl;
db<>fiddle here
Old sqlfiddle
Maybe this answer is a little off-piste, but I would consider partitioning the data and giving each user their own partitioned table for posts.
There's a bit of overhead to the setup, as you will need triggers for managing the DDL statements for the partitions, but it would effectively result in each user having their own table of posts, along with their own sequence, with the benefit of being able to treat all posts as one big table as well.
General gist of the concept...
psql# CREATE TABLE posts (user_id integer, seq_id integer);
CREATE TABLE
psql# CREATE TABLE posts_001 (seq_id serial) INHERITS (posts);
CREATE TABLE
psql# CREATE TABLE posts_002 (seq_id serial) INHERITS (posts);
CREATE TABLE
psql# INSERT INTO posts_001 VALUES (1);
INSERT 0 1
psql# INSERT INTO posts_001 VALUES (1);
INSERT 0 1
psql# INSERT INTO posts_002 VALUES (2);
INSERT 0 1
psql# INSERT INTO posts_002 VALUES (2);
INSERT 0 1
psql# select * from posts;
user_id | seq_id
---------+--------
1 | 1
1 | 2
2 | 1
2 | 2
(4 rows)
I left out some rather important CHECK constraints in the above setup; make sure you read the docs for how these kinds of setups are used.
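For reference, the omitted constraints would look something like this (assuming user 1 lives in posts_001 and user 2 in posts_002):
ALTER TABLE posts_001 ADD CHECK (user_id = 1);
ALTER TABLE posts_002 ADD CHECK (user_id = 2);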
insert into t (user_id, seq_id) values
(4, (select coalesce(max(seq_id), 0) + 1 from t where user_id = 4))
Check for a duplicate primary key error in the front end and retry if needed.
Update
Although @Erwin's advice is sensible (that is, a single sequence with the ordering done in the select query), it can be expensive.
If you don't use a sequence, there is no "nature of the sequence" to defeat. Nor will it result in a duplicate key violation. To demonstrate, I created a table and wrote a Python script to insert into it. I launched 3 parallel instances of the script, inserting as fast as possible. And it just works.
The table must have a primary key on those columns:
create table t (
user_id int,
seq_id int,
primary key (user_id, seq_id)
);
The python script:
#!/usr/bin/env python
import psycopg2, psycopg2.extensions
query = """
begin;
insert into t (user_id, seq_id) values
(4, (select coalesce(max(seq_id), 0) + 1 from t where user_id = 4));
commit;
"""
conn = psycopg2.connect('dbname=cpn user=cpn')
conn.set_isolation_level(psycopg2.extensions.ISOLATION_LEVEL_SERIALIZABLE)
cursor = conn.cursor()
for i in range(0, 1000):
    while True:
        try:
            cursor.execute(query)
            break
        except psycopg2.IntegrityError as e:
            print(e.pgerror)
            cursor.execute("rollback;")
cursor.close()
conn.close()
After the parallel run:
select count(*), max(seq_id) from t;
count | max
-------+------
3000 | 3000
Just as expected. I developed at least two applications using that logic, and one of them is more than 13 years old and has never failed. I concede that if you are Facebook or some other giant then you could have a problem.
Yes:
CREATE TABLE your_table
(
column type DEFAULT nextval('sequence_name'),
...
);
More details here:
http://www.postgresql.org/docs/9.2/static/ddl-default.html
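A concrete sketch of that (the table and sequence names are made up):
CREATE SEQUENCE posts_seq;
CREATE TABLE posts
(
    user_id integer,
    seq_id integer DEFAULT nextval('posts_seq'),
    body text
);
Keep in mind this is a single global sequence: seq_id will be unique and increasing across all users, not restarting at 1 for each user_id.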