Can I count the occurrences for a Postgres array field? - sql

I have a Postgres table that uses an array column. It allows some magic, making it possible to avoid having more tables, but the non-standard nature of this makes it more difficult for a beginner to work with.
I would like to get some summary data out of it.
Sample content:
CREATE TABLE public.cts (
id serial NOT NULL,
day timestamp NULL,
ct varchar[] NULL,
CONSTRAINT ctrlcts_pkey PRIMARY KEY (id)
);
INSERT INTO public.cts
(id, day, ct)
VALUES(29, '2015-01-24 00:00:00.000', '{ct286,ct281}');
INSERT INTO public.cts
(id, day, ct)
VALUES(30, '2015-01-25 00:00:00.000', '{ct286,ct281}');
INSERT INTO public.cts
(id, day, ct)
VALUES(31, '2015-01-26 00:00:00.000', '{ct286,ct277,ct281}');
I would like to get the total number of occurrences of each array member, with an output like this for example:
name | value
ct286 | 3
ct281 | 3
ct277 | 1

Use the Postgres array function unnest():
SELECT name, COUNT(*) cnt
FROM cts, unnest(ct) as u(name)
GROUP BY name
Demo on DB Fiddle:
| name | cnt |
| ----- | --- |
| ct277 | 1 |
| ct281 | 3 |
| ct286 | 3 |
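The logic behind the query can be sketched outside the database as well. The following is a minimal illustration in plain Python (not Postgres), assuming the three sample rows from the question; the names rows and count_members are made up for this sketch. Flattening the arrays plays the role of unnest(), and the counter plays the role of GROUP BY with COUNT(*).

```python
from collections import Counter

# Rows mirroring the sample cts table: (id, ct array)
rows = [
    (29, ["ct286", "ct281"]),
    (30, ["ct286", "ct281"]),
    (31, ["ct286", "ct277", "ct281"]),
]


def count_members(rows):
    """Flatten every array (the unnest step), then count each member (the GROUP BY step)."""
    return Counter(name for _id, ct in rows for name in ct)


counts = count_members(rows)
for name, cnt in sorted(counts.items()):
    print(name, cnt)
```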

Related

How to combine two SQL queries in MySQL with different columns without combining their resulting rows

Context
I'm trying to create a "feed" system on my website where users can go to their feed and see all their new notifications across the different things they can do on the website. For example, in the "feed" section, users are able to see if the users they follow have created articles and if the users have commented on articles. My current feed system simply uses two separate queries to obtain this information. However, I want to combine these two queries into one so that the user can view the activity of those they follow chronologically. The way my system works now, I get five articles from each person the user follows and put it in the "feed" section and then get five article comments and post it in the same area in the "feed" section. Instead of the queries being separate, I want to combine them so that, instead of seeing five article posts in a row and then five article comments in a row, they will see the feed posts that happened in chronological order, whether the other users created an article first, then commented, then created another article, or whatever the order is, instead of always seeing the same order of notifications.
Question
First, let me show you my code for table creation if you would like to recreate this. The first thing to do is to create a users table, which my articles and articlecomments tables reference:
CREATE TABLE users (
idUsers int(11) AUTO_INCREMENT PRIMARY KEY NOT NULL,
uidUsers TINYTEXT NOT NULL,
emailUsers VARCHAR(100) NOT NULL,
pwdUsers LONGTEXT NOT NULL,
created DATETIME NOT NULL,
UNIQUE (emailUsers),
FULLTEXT(uidUsers)
) ENGINE=InnoDB;
Next, let's create the articles table:
CREATE TABLE articles (
article_id INT(11) UNSIGNED NOT NULL AUTO_INCREMENT,
title TEXT NOT NULL,
article TEXT NOT NULL,
date DATETIME NOT NULL,
idUsers int(11) NOT NULL,
topic VARCHAR(50) NOT NULL,
published VARCHAR(50) NOT NULL,
PRIMARY KEY (article_id),
FULLTEXT(title, article),
FOREIGN KEY (idUsers) REFERENCES users (idUsers) ON DELETE CASCADE ON UPDATE
CASCADE
) ENGINE=InnoDB;
Finally, we need the articlecomments table:
CREATE TABLE articlecomments (
comment_id INT(11) AUTO_INCREMENT PRIMARY KEY NOT NULL,
message TEXT NOT NULL,
date DATETIME NOT NULL,
article_id INT(11) UNSIGNED NOT NULL,
idUsers INT(11) NOT NULL,
seen TINYTEXT NOT NULL,
FOREIGN KEY (article_id) REFERENCES articles (article_id) ON DELETE CASCADE ON UPDATE CASCADE,
FOREIGN KEY (idUsers) REFERENCES users (idUsers) ON DELETE CASCADE ON UPDATE CASCADE
) ENGINE=InnoDB;
To populate the tables sufficiently for this example, we will use these statements:
INSERT INTO users (uidUsers, emailUsers, pwdUsers, created) VALUES ('genericUser', 'genericUser@hotmail.com', 'password', NOW());
INSERT INTO articles (title, article, date, idUsers, topic, published) VALUES ('first article', 'first article contents', NOW(), '1', 'other', 'yes');
INSERT INTO articles (title, article, date, idUsers, topic, published) VALUES ('second article', 'second article contents', NOW(), '1', 'other', 'yes');
INSERT INTO articles (title, article, date, idUsers, topic, published) VALUES ('third article', 'third article contents', NOW(), '1', 'other', 'yes');
INSERT INTO articlecomments (message, date, article_id, idUsers, seen) VALUES ('first message', NOW(), '1', '1', 'false');
INSERT INTO articlecomments (message, date, article_id, idUsers, seen) VALUES ('second message', NOW(), '1', '1', 'false');
INSERT INTO articlecomments (message, date, article_id, idUsers, seen) VALUES ('third message', NOW(), '1', '1', 'false');
The two queries that I'm using to obtain data from the articles and articlecomments tables are below:
Query 1:
SELECT
articles.article_id, articles.title, articles.date,
articles.idUsers, users.uidUsers
FROM articles
JOIN users ON articles.idUsers = users.idUsers
WHERE articles.idUsers = '1' AND articles.published = 'yes'
ORDER BY articles.date DESC
LIMIT 5
Query 2:
SELECT
articlecomments.comment_id, articlecomments.message,
articlecomments.date, articlecomments.article_id, users.uidUsers
FROM articlecomments
JOIN users ON articlecomments.idUsers = users.idUsers
WHERE articlecomments.idUsers = '1'
ORDER BY articlecomments.date DESC
LIMIT 5
How would I combine these two queries that contain different information and columns so that they are ordered based on the date of creation (articles.date and articlecomments.date, respectively)? I want them to be in separate rows, not the same row. So, it should be like I queried them separately and simply combined the resulting rows together. If there are three articles and three article comments, I want there to be six total returned rows.
Here's what I want this to look like. Given there are three articles and three article comments, and the comments were created after the articles, this is what the result should look like after combining the queries above (I'm not sure if this portrayal is possible given the different column names but I'm wondering if something similar could be accomplished):
+-------------------------------+-------------------+---------------------+----------------------------------------------------------------+---------+-------------+
| id (article_id or comment_id) | title/message | date | article_id (because it is referenced in articlecomments table) | idUsers | uidUsers |
+-------------------------------+-------------------+---------------------+----------------------------------------------------------------+---------+-------------+
| 1 | first message | 2020-07-07 11:27:15 | 1 | 1 | genericUser |
| 2 | second message | 2020-07-07 11:27:15 | 1 | 1 | genericUser |
| 3 | third message | 2020-07-07 11:27:15 | 1 | 1 | genericUser |
| 2 | second article | 2020-07-07 10:47:35 | 2 | 1 | genericUser |
| 3 | third article | 2020-07-07 10:47:35 | 3 | 1 | genericUser |
| 1 | first article | 2020-07-07 10:46:51 | 1 | 1 | genericUser |
+-------------------------------+-------------------+---------------------+----------------------------------------------------------------+---------+-------------+
Things I have Tried
I have read that this might involve JOIN or UNION operators, but I'm unsure of how to implement them in this situation. I did try combining the two queries by simply using (Query 1) UNION (Query 2), which at first told me that the number of columns were different in my two queries, so I had to remove the idUsers column from my articlecomments query. This actually got me kind of close, but it wasn't formatted correctly:
+------------+-------------------+---------------------+---------+-------------+
| article_id | title | date | idUsers | uidUsers |
+------------+-------------------+---------------------+---------+-------------+
| 2 | first message | 2020-07-07 10:47:35 | 1 | genericUser |
| 3 | third article | 2020-07-07 10:47:35 | 1 | genericUser |
| 1 | first article | 2020-07-07 10:46:51 | 1 | genericUser |
| 1 | second article | 2020-07-07 11:27:15 | 1 | genericUser |
| 2 | third article | 2020-07-07 11:27:15 | 1 | genericUser |
| 3 | first article | 2020-07-07 11:27:15 | 1 | genericUser |
+------------+-------------------+---------------------+---------+-------------+
Any ideas? Let me know if there is any confusion. Thanks.
Server type: MariaDB
Server version: 10.4.8-MariaDB - mariadb.org binary distribution
This seems like MySQL. You could do something like this:
select * from (
SELECT
articles.article_id as id_article_comment,
articles.title as title_message,
articles.date as created,
'article' AS contenttype,
articles.article_id as article_id,
articles.idUsers, users.uidUsers
FROM articles
JOIN users ON articles.idUsers = users.idUsers
WHERE articles.idUsers = '1' AND articles.published = 'yes'
ORDER BY articles.date DESC
LIMIT 5
) a
union all
select * from (
SELECT
articlecomments.comment_id, articlecomments.message,
articlecomments.date,
'article comment' AS contenttype,
articlecomments.article_id,
articlecomments.idUsers, users.uidUsers
FROM articlecomments
JOIN users ON articlecomments.idUsers = users.idUsers
WHERE articlecomments.idUsers = '1'
ORDER BY articlecomments.date DESC
LIMIT 5
) b
order by created DESC
See example here: https://dbfiddle.uk/?rdbms=mysql_5.7&fiddle=26280a9c1c5f62fc33d00d93ab84adf3
Result like this:
id_article_comment | title_message | created | article_id | uidUsers
-----------------: | :------------- | :------------------ | ---------: | :----------
1 | first article | 2020-07-09 05:59:18 | 1 | genericUser
2 | second article | 2020-07-09 05:59:18 | 1 | genericUser
3 | third article | 2020-07-09 05:59:18 | 1 | genericUser
1 | first message | 2020-07-09 05:59:18 | 1 | genericUser
2 | second message | 2020-07-09 05:59:18 | 1 | genericUser
3 | third message | 2020-07-09 05:59:18 | 1 | genericUser
Explanation
Since we want to use order by and limit in each part, we wrap the first query in a subquery and select all columns from it. We name each field the way we want it in the output.
We do the same thing with the 2nd query and add a union all clause between them. Then, we apply ordering based on created date (which was an alias in the first query) to get the results you desired in the order you desired.
If you use union, duplicate rows will be eliminated from the result. If you use union all, duplicate rows - if they exist - will be retained. union all is faster since it simply combines the 2 datasets (as long as the columns are the same in both queries). union additionally has to look for duplicate rows and remove them from the result.
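To make the pattern concrete, here is a small self-contained sketch using Python's built-in sqlite3 module. SQLite stands in for MariaDB here purely so the example can run anywhere; the trimmed-down tables, the distinct timestamps and the variable name feed are assumptions made for illustration. Each source is limited in its own subquery, tagged with a contenttype, and the combined rows are ordered once at the end.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE articles (article_id INTEGER PRIMARY KEY, title TEXT, date TEXT);
CREATE TABLE articlecomments (comment_id INTEGER PRIMARY KEY, message TEXT,
                              date TEXT, article_id INTEGER);
INSERT INTO articles VALUES (1, 'first article',  '2020-07-07 10:46:51');
INSERT INTO articles VALUES (2, 'second article', '2020-07-07 11:00:00');
INSERT INTO articlecomments VALUES (1, 'first message',  '2020-07-07 10:50:00', 1);
INSERT INTO articlecomments VALUES (2, 'second message', '2020-07-07 11:27:15', 1);
""")

# Each branch applies its own ORDER BY / LIMIT inside a subquery;
# the final ORDER BY sorts the combined rows by the shared alias "created".
feed = conn.execute("""
SELECT id, title_message, created, contenttype FROM (
    SELECT article_id AS id, title AS title_message, date AS created,
           'article' AS contenttype
    FROM articles ORDER BY date DESC LIMIT 5
)
UNION ALL
SELECT id, title_message, created, contenttype FROM (
    SELECT comment_id AS id, message AS title_message, date AS created,
           'article comment' AS contenttype
    FROM articlecomments ORDER BY date DESC LIMIT 5
)
ORDER BY created DESC
""").fetchall()

for row in feed:
    print(row)
```

Because the sample timestamps differ, articles and comments come back interleaved by date rather than grouped by source.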
You don't mention the version of MySQL you are using, so I'll assume it's a modern one (MySQL 8.x). You can produce a row number on each subset using ROW_NUMBER() and then a plain UNION ALL will do the trick.
I fail to understand the exact order you want, and what the fourth column article_id (because it is referenced in articlecomments table) means. If you elaborate I can tweak this answer accordingly.
The query that produces the result set you want is:
select *
from ( (
SELECT
a.article_id as id, a.title, a.date,
a.article_id, u.uidUsers,
row_number() over(ORDER BY a.date DESC) as rn
FROM articles a
JOIN users u ON a.idUsers = u.idUsers
WHERE a.idUsers = '1' AND a.published = 'yes'
ORDER BY a.date DESC
LIMIT 5
) union all (
SELECT
c.comment_id, c.message, c.date,
c.article_id, u.uidUsers,
5 + row_number() over(ORDER BY c.date DESC) as rn
FROM articlecomments c
JOIN users u ON c.idUsers = u.idUsers
WHERE c.idUsers = '1'
ORDER BY c.date DESC
LIMIT 5
)
) x
order by rn
Result:
id title date article_id uidUsers rn
-- -------------- ------------------- ---------- ----------- --
1 first article 2020-07-10 10:37:00 1 genericUser 1
2 second article 2020-07-10 10:37:00 2 genericUser 2
3 third article 2020-07-10 10:37:00 3 genericUser 3
1 first message 2020-07-10 10:37:00 1 genericUser 6
2 second message 2020-07-10 10:37:00 1 genericUser 7
3 third message 2020-07-10 10:37:00 1 genericUser 8
See running example in db<>fiddle.
You can cross join like this:
select *
from (
select (1) as job_flag
FROM [job] WITH (NOLOCK)
WHERE MemberCode = 'pay'
AND CampaignID = '2'
) j
cross join (
select (1) as product_flag
FROM [product] WITH (NOLOCK)
WHERE MemberCode = 'pay'
AND CampaignID = '2'
) p

Postgres: insert data from another table into array type columns

I have two tables on Postgres 11 like so, with some ARRAY type columns.
CREATE TABLE test (
id INT UNIQUE,
category TEXT NOT NULL,
quantitie NUMERIC,
quantities INT[],
dates INT[]
);
INSERT INTO test (id, category, quantitie, quantities, dates) VALUES (1, 'cat1', 33, ARRAY[66], ARRAY[123678]);
INSERT INTO test (id, category, quantitie, quantities, dates) VALUES (2, 'cat2', 99, ARRAY[22], ARRAY[879889]);
CREATE TABLE test2 (
idweb INT UNIQUE,
quantities INT[],
dates INT[]
);
INSERT INTO test2 (idweb, quantities, dates) VALUES (1, ARRAY[34], ARRAY[8776]);
INSERT INTO test2 (idweb, quantities, dates) VALUES (3, ARRAY[67], ARRAY[5443]);
I'm trying to update table test with data from table test2, only on rows with the same id, adding to the arrays in table test while keeping the original values.
I use INSERT ... ON CONFLICT.
How do I update only the 2 columns quantities and dates?
Running the SQL below, I also get an error whose origin I don't understand.
Schema Error: error: column "quantitie" is of type numeric but expression is of type integer[]
INSERT INTO test (SELECT * FROM test2 WHERE idweb IN (SELECT id FROM test))
ON CONFLICT (id)
DO UPDATE
SET
quantities = array_cat(EXCLUDED.quantities, test.quantities),
dates = array_cat(EXCLUDED.dates, test.dates);
https://www.db-fiddle.com/f/rs8BpjDUCciyZVwu5efNJE/0
Is there a better way to update table test from table test2, or what am I missing in the SQL?
Update to show the result needed on table test:
**Schema (PostgreSQL v11)**
| id | quantitie | quantities | dates | category |
| --- | --------- | ---------- | ----------- | --------- |
| 2 | 99 | 22 | 879889 | cat2 |
| 1 | 33 | 34,66 | 8776,123678 | cat1 |
Basically, your query fails because the structures of the tables do not match - so you cannot insert into test select * from test2.
You could work around this by adding "fake" columns to the select list, like so:
insert into test
select idweb, 'foo', 0, quantities, dates from test2 where idweb in (select id from test)
on conflict (id)
do update set
quantities = array_cat(excluded.quantities, test.quantities),
dates = array_cat(excluded.dates, test.dates);
But this looks much more convoluted than needed. Essentially, you want an update statement, so I would just recommend:
update test
set
dates = test2.dates || test.dates,
quantities = test2.quantities || test.quantities
from test2
where test.id = test2.idweb
Note that this uses the || concatenation operator instead of array_cat() - it is shorter to write.
Demo on DB Fiddle:
id | category | quantitie | quantities | dates
-: | :------- | --------: | :--------- | :------------
2 | cat2 | 99 | {22} | {879889}
1 | cat1 | 33 | {34,66} | {8776,123678}

Find specific values and truncate

How do I find grouped values?
For example, I have the following SKUs:
SkuId Description
VN0A46ZERWV113000M CLASSIC
VN0A46ZERWV112000M CLASSIC
VN0A46ZERWV111500M CLASSIC
VN0A3WCVAZ31XXL Modern
VN0A3WCVAZ310XL Modern
VN0A3WCVAZ3100S Modern
VN0A3WCVAZ3100M Modern
VN0A3TE3RCO113000M Not Classic
VN0A3TE3RCO112000M Not Classic
VN0A3TE3RCO111500M Not Classic
How to describe...:) So, I need to find all SKUs with the same description, find the part of the SKU they have in common, and add a new row after every group. In general, the common part is the first 12 characters.
Example in Result:
SkuId Description
VN0A46ZERWV113000M CLASSIC
VN0A46ZERWV112000M CLASSIC
VN0A46ZERWV111500M CLASSIC
VN0A46ZERWV1 NEW
VN0A3WCVAZ31XXL Modern
VN0A3WCVAZ310XL Modern
VN0A3WCVAZ3100S Modern
VN0A3WCVAZ3100M Modern
VN0A3WCVAZ31 NEW
VN0A3TE3RCO113000M Not Classic
VN0A3TE3RCO112000M Not Classic
VN0A3TE3RCO111500M Not Classic
VN0A3TE3RCO1 NEW
If I understand correctly, you can try to use UNION ALL and the substring function to do it.
Use substring in a subquery to get the first 12 characters of the SkuId column, use distinct to remove duplicate prefixes, then UNION ALL the two result sets.
CREATE TABLE T(
SkuId VARCHAR(100),
Description VARCHAR(100)
);
INSERT INTO T VALUES ('VN0A46ZERWV113000M' ,'CLASSIC');
INSERT INTO T VALUES ('VN0A46ZERWV112000M' ,'CLASSIC');
INSERT INTO T VALUES ('VN0A46ZERWV111500M' ,'CLASSIC');
INSERT INTO T VALUES ('VN0A3WCVAZ31XXL' ,'Modern');
INSERT INTO T VALUES ('VN0A3WCVAZ310XL' ,'Modern');
INSERT INTO T VALUES ('VN0A3WCVAZ3100S' ,'Modern');
INSERT INTO T VALUES ('VN0A3WCVAZ3100M' ,'Modern');
INSERT INTO T VALUES ('VN0A3TE3RCO113000M' ,'Not Classic');
INSERT INTO T VALUES ('VN0A3TE3RCO112000M' ,'Not Classic');
INSERT INTO T VALUES ('VN0A3TE3RCO111500M' ,'Not Classic');
Query 1:
SELECT * FROM (
select SkuId,Description
from T
UNION ALL
SELECT distinct substring(SkuId,1,12) ,'New'
FROM T
) t1
order by SkuId desc
Results:
| SkuId | Description |
|--------------------|-------------|
| VN0A46ZERWV113000M | CLASSIC |
| VN0A46ZERWV112000M | CLASSIC |
| VN0A46ZERWV111500M | CLASSIC |
| VN0A46ZERWV1 | New |
| VN0A3WCVAZ31XXL | Modern |
| VN0A3WCVAZ310XL | Modern |
| VN0A3WCVAZ3100S | Modern |
| VN0A3WCVAZ3100M | Modern |
| VN0A3WCVAZ31 | New |
| VN0A3TE3RCO113000M | Not Classic |
| VN0A3TE3RCO112000M | Not Classic |
| VN0A3TE3RCO111500M | Not Classic |
| VN0A3TE3RCO1 | New |
I think the additional rows you want are:
select skuid, 'NEW'
from (select distinct left(skuid, 12) as skuid, description
from skus
) t;
For your data and probably for your problem, this will probably do:
select distinct left(skuid, 12) as skuid, 'New'
from skus;
If you specifically want to exclude "names" that have different descriptions:
select left(skuid, 12) as skuid, 'New'
from skus
group by left(skuid, 12)
having min(description) = max(description);
You can add these into the table using insert:
insert into skus (skuid, description)
select distinct left(skuid, 12) as skuid, 'New'
from skus;
If you just want a result set, then use union and the correct order by:
select skuid, description
from ((select skuid, description, 1 as priority
from skus
) union all
(select distinct left(skuid, 12) as skuid, 'New', 2
from skus
)
) sd
order by left(skuid, 12), priority, skuid;
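As a runnable illustration of placing one extra row after each group, here is a sketch using Python's built-in sqlite3 module. SQLite is used only for convenience, so substr() replaces left(); the shortened sample data and the helper column grp are assumptions for this example. Sorting by the 12-character group key first and the priority second keeps each 'New' row after the rows it summarizes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE skus (skuid TEXT, description TEXT)")
conn.executemany("INSERT INTO skus VALUES (?, ?)", [
    ("VN0A46ZERWV113000M", "CLASSIC"),
    ("VN0A46ZERWV112000M", "CLASSIC"),
    ("VN0A3WCVAZ31XXL", "Modern"),
    ("VN0A3WCVAZ3100M", "Modern"),
])

# grp is the shared 12-character prefix; priority 1 = original row, 2 = added row.
rows = conn.execute("""
SELECT skuid, description
FROM (
    SELECT skuid, description, substr(skuid, 1, 12) AS grp, 1 AS priority
    FROM skus
    UNION ALL
    SELECT DISTINCT substr(skuid, 1, 12), 'New', substr(skuid, 1, 12), 2
    FROM skus
) sd
ORDER BY grp, priority, skuid
""").fetchall()

for row in rows:
    print(row)
```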

In SQL (PSQL), how to group by partitions of rows (how to do nested group by)?

Wording of the question needs improvement, I'm not sure how to accurately describe it.
Given a table foo, count how many languages each person can speak, grouped by format. Example:
name | format | language
------+----------+------------
abe | compiled | golang
abe | compiled | c
abe | scripted | javascript
jon | scripted | ruby
jon | scripted | javascript
wut | spoken | english
(6 rows)
Result:
name | format | count
------+----------+------------
abe | compiled | 2
abe | scripted | 1
jon | scripted | 2
wut | spoken | 1
Example data can be created using:
create table foo
(
name varchar(40) not null,
format varchar(40) not null,
language varchar(40) not null
);
insert into foo
values
( 'abe', 'compiled', 'golang' ),
( 'abe', 'compiled', 'c' ),
( 'abe', 'scripted', 'javascript' ),
( 'jon', 'scripted', 'ruby' ),
( 'jon', 'scripted', 'javascript' ),
( 'wut', 'spoken', 'english' )
;
I've tried using windowing functions count(*) over (partition by format) but it doesn't squash rows, and it would require a nested window by name, and then by format, whereas count(*) ... group by name used on its own would squash the result into one row per name.
Use a group by clause on both columns:
select name, format, count(*)
from foo
group by name, format;
However, if you want to go with a window function, you can also do that:
select distinct name, format,
count(*) over (partition by name, format)
from foo f;
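A quick way to check the two-column grouping is to run it against the sample data. The sketch below uses Python's built-in sqlite3 module instead of Postgres, which is an assumption made only so that it is self-contained; the ORDER BY is added here to make the output order deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE foo (name TEXT NOT NULL, format TEXT NOT NULL, language TEXT NOT NULL)")
conn.executemany("INSERT INTO foo VALUES (?, ?, ?)", [
    ("abe", "compiled", "golang"),
    ("abe", "compiled", "c"),
    ("abe", "scripted", "javascript"),
    ("jon", "scripted", "ruby"),
    ("jon", "scripted", "javascript"),
    ("wut", "spoken", "english"),
])

# One output row per distinct (name, format) pair, with the number of languages in it.
rows = conn.execute("""
SELECT name, format, count(*)
FROM foo
GROUP BY name, format
ORDER BY name, format
""").fetchall()

for row in rows:
    print(row)
```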

Write SQL script to insert data

In a database that contains many tables, I need to write a SQL script that inserts data only if it does not already exist.
Table currency
| id | Code | lastupdate | rate |
+--------+---------+------------+-----------+
| 1 | USD | 05-11-2012 | 2 |
| 2 | EUR | 05-11-2012 | 3 |
Table client
| id | name | createdate | currencyId|
+--------+---------+------------+-----------+
| 4 | tony | 11-24-2010 | 1 |
| 5 | john | 09-14-2010 | 2 |
Table: account
| id | number | createdate | clientId |
+--------+---------+------------+-----------+
| 7 | 1234 | 12-24-2010 | 4 |
| 8 | 5648 | 12-14-2010 | 5 |
I need to insert to:
currency (id=3, Code=JPY, lastupdate=today, rate=4)
client (id=6, name=Joe, createdate=today, currencyId=Currency with Code 'USD')
account (id=9, number=0910, createdate=today, clientId=Client with name 'Joe')
Problem:
The script must check whether a row exists before inserting new data.
The script must allow a foreign key in the new row to refer to a row already in the database (such as currencyId in the client table).
The script must allow the current datetime to be written to a column in the insert statement (such as createdate in the client table).
The script must allow a foreign key in the new row to refer to a row inserted by the same script (such as clientId in the account table).
Note: I tried the following SQL statement but it solved only the first problem
INSERT INTO Client (id, name, createdate, currencyId)
SELECT 6, 'Joe', '05-11-2012', 1
WHERE not exists (SELECT * FROM Client where id=6);
This query runs without any error, but as you can see I wrote createdate and currencyId manually. I need to take the currency id from a select statement with a where clause (I tried to substitute a select statement for the 1, but the query failed).
This is an example about what I need, in my database, I need this script to insert more than 30 rows in more than 10 tables.
Any help?
You wrote
I tried to substitute 1 by select statement but query failed
But I wonder why did it fail? What did you try? This should work:
INSERT INTO Client (id, name, createdate, currencyId)
SELECT
6,
'Joe',
current_date,
(select c.id from currency as c where c.code = 'USD') as currencyId
WHERE not exists (SELECT * FROM Client where id=6);
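The same idea extends to the other requirements. Below is a minimal sketch using Python's built-in sqlite3 module; SQLite, its date('now') function, the ISO date strings and the name SEED are assumptions chosen so the example runs as-is, and your database may spell the current-date function differently. Each insert is guarded by NOT EXISTS and looks up its foreign key with a scalar subquery, so the account row can reference the client row inserted a statement earlier, and running the script twice adds nothing the second time.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE currency (id INTEGER PRIMARY KEY, code TEXT, lastupdate TEXT, rate INTEGER);
CREATE TABLE client (id INTEGER PRIMARY KEY, name TEXT, createdate TEXT,
                     currencyId INTEGER REFERENCES currency(id));
CREATE TABLE account (id INTEGER PRIMARY KEY, number TEXT, createdate TEXT,
                      clientId INTEGER REFERENCES client(id));
INSERT INTO currency VALUES (1, 'USD', '2012-05-11', 2), (2, 'EUR', '2012-05-11', 3);
""")

# Guarded inserts: each row is added only if its id is not present yet.
SEED = """
INSERT INTO client (id, name, createdate, currencyId)
SELECT 6, 'Joe', date('now'), (SELECT id FROM currency WHERE code = 'USD')
WHERE NOT EXISTS (SELECT 1 FROM client WHERE id = 6);

INSERT INTO account (id, number, createdate, clientId)
SELECT 9, '0910', date('now'), (SELECT id FROM client WHERE name = 'Joe')
WHERE NOT EXISTS (SELECT 1 FROM account WHERE id = 9);
"""

# Running the script twice must not create duplicates.
for _ in range(2):
    conn.executescript(SEED)

clients = conn.execute("SELECT id, name, currencyId FROM client").fetchall()
accounts = conn.execute("SELECT id, number, clientId FROM account").fetchall()
print(clients, accounts)
```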
It looks like you can work out if the data exists.
Here is a quick bit of code written in SQL Server / Sybase that I think answers your basic questions:
create table currency(
id numeric(16,0) identity primary key,
code varchar(3) not null,
lastupdated datetime not null,
rate smallint
);
create table client(
id numeric(16,0) identity primary key,
createddate datetime not null,
currencyid numeric(16,0) foreign key references currency(id)
);
insert into currency (code, lastupdated, rate)
values('EUR',GETDATE(),3)
--inserts the date and last allocated identity into client
insert into client(createddate, currencyid)
values(GETDATE(), @@IDENTITY)
go