I have the following table called table_persons in Hive:
+--------+------+------------+
| people | type | date |
+--------+------+------------+
| lisa | bot | 19-04-2022 |
| wayne | per | 19-04-2022 |
+--------+------+------------+
If type is "bot", I have to add two rows to the table d1_info; if type is "per", I only have to add one row. The result should be the following:
+---------+------+------------+
| db_type | info | date |
+---------+------+------------+
| x_bot | x | 19-04-2022 |
| x_bnt | x | 19-04-2022 |
| x_per | b | 19-04-2022 |
+---------+------+------------+
How can I add two rows when this condition is met?
With a CASE WHEN, maybe?
You may try using a UNION ALL to duplicate the rows with type 'bot'. The following example combines a first query, which selects all records, with a second query that selects only the 'bot' records.
Edit
In response to the edited question, I have added an additional flag column named original (storing 1 for the original row and 0 for the duplicate) to tell the duplicated 'bot' entries apart:
SELECT
p1.*,
1 as original
FROM
table_persons p1
UNION ALL
SELECT
p1.*,
0 as original
FROM
table_persons p1
WHERE p1.type='bot'
You may then insert this into your other table d1_info, using the above query as a subquery or CTE and applying the desired transformations with CASE expressions, e.g.:
INSERT INTO d1_info
(`db_type`, `info`, `date`)
WITH merged_data AS (
SELECT
p1.*,
1 as original
FROM
table_persons p1
UNION ALL
SELECT
p1.*,
0 as original
FROM
table_persons p1
WHERE p1.type='bot'
)
SELECT
CONCAT('x_',CASE
WHEN m1.type='per' THEN m1.type
WHEN m1.original=1 AND m1.type='bot' THEN m1.type
ELSE 'bnt'
END) as db_type,
CASE
WHEN m1.type='per' THEN 'b'
ELSE 'x'
END as info,
m1.date
FROM
merged_data m1
ORDER BY m1.people,m1.date;
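To check the UNION ALL + CASE logic end to end, here is a minimal sketch that runs it with Python's built-in sqlite3 module. SQLite is only a stand-in for Hive here (an assumption: nothing Hive-specific is exercised), so CONCAT is written with SQLite's || operator and the CTE is placed before INSERT, a form SQLite supports.

```python
import sqlite3

# In-memory SQLite database standing in for Hive (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table_persons (people TEXT, type TEXT, date TEXT);
INSERT INTO table_persons VALUES
  ('lisa', 'bot', '19-04-2022'),
  ('wayne', 'per', '19-04-2022');
CREATE TABLE d1_info (db_type TEXT, info TEXT, date TEXT);
""")

# Duplicate every 'bot' row (original = 0 marks the copy), then map each
# row to its db_type/info with CASE expressions.
conn.execute("""
WITH merged_data AS (
  SELECT p1.*, 1 AS original FROM table_persons p1
  UNION ALL
  SELECT p1.*, 0 AS original FROM table_persons p1 WHERE p1.type = 'bot'
)
INSERT INTO d1_info (db_type, info, date)
SELECT
  'x_' || CASE
            WHEN m1.type = 'per' THEN m1.type
            WHEN m1.original = 1 AND m1.type = 'bot' THEN m1.type
            ELSE 'bnt'
          END,
  CASE WHEN m1.type = 'per' THEN 'b' ELSE 'x' END,
  m1.date
FROM merged_data m1
""")

rows = conn.execute(
    "SELECT db_type, info, date FROM d1_info ORDER BY db_type").fetchall()
print(rows)
# [('x_bnt', 'x', '19-04-2022'), ('x_bot', 'x', '19-04-2022'), ('x_per', 'b', '19-04-2022')]
```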
I think what you want is a new table that captures your logic. This simplifies your query and lets you add new types without having to edit the logic of a CASE statement. It also makes the logic easier to read later.
CREATE TABLE table_persons (
`people` VARCHAR(5),
`type` VARCHAR(3),
`date` VARCHAR(10)
);
INSERT INTO table_persons
VALUES
('lisa', 'bot', '19-04-2022'),
('wayne', 'per', '19-04-2022');
CREATE TABLE info (
`type` VARCHAR(5),
`db_type` VARCHAR(5),
`info` VARCHAR(1)
);
insert into info
values
('bot', 'x_bot', 'x'),
('bot', 'x_bnt', 'x'),
('per','x_per','b');
Then you can simply join the two tables:
select
info.db_type,
info.info,
persons.date date
from
table_persons persons inner join info
on
info.type = persons.type
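A minimal sketch of the lookup-table approach, again using Python's sqlite3 as a stand-in for Hive (an assumption). Because 'bot' has two rows in info, the inner join fans lisa out to two output rows, while wayne gets one.

```python
import sqlite3

# In-memory SQLite database standing in for Hive (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table_persons (people TEXT, type TEXT, date TEXT);
INSERT INTO table_persons VALUES
  ('lisa', 'bot', '19-04-2022'),
  ('wayne', 'per', '19-04-2022');
-- One row per output row wanted for each type: 'bot' maps to two rows.
CREATE TABLE info (type TEXT, db_type TEXT, info TEXT);
INSERT INTO info VALUES
  ('bot', 'x_bot', 'x'),
  ('bot', 'x_bnt', 'x'),
  ('per', 'x_per', 'b');
""")

# Each person is joined to every info row of its type.
rows = conn.execute("""
SELECT info.db_type, info.info, persons.date
FROM table_persons persons
INNER JOIN info ON info.type = persons.type
ORDER BY info.db_type
""").fetchall()
print(rows)
# [('x_bnt', 'x', '19-04-2022'), ('x_bot', 'x', '19-04-2022'), ('x_per', 'b', '19-04-2022')]
```

Supporting another type later means inserting rows into info; the query itself stays the same.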
I've got three tables:
Lessons:
CREATE TABLE lessons (
id SERIAL PRIMARY KEY,
title text NOT NULL,
description text NOT NULL,
vocab_count integer NOT NULL
);
+----+------------+------------------+-------------+
| id | title | description | vocab_count |
+----+------------+------------------+-------------+
| 1 | lesson_one | this is a lesson | 3 |
| 2 | lesson_two | another lesson | 2 |
+----+------------+------------------+-------------+
Lesson_vocabulary:
CREATE TABLE lesson_vocabulary (
lesson_id integer REFERENCES lessons(id),
vocabulary_id integer REFERENCES vocabulary(id)
);
+-----------+---------------+
| lesson_id | vocabulary_id |
+-----------+---------------+
| 1 | 1 |
| 1 | 2 |
| 1 | 3 |
| 2 | 2 |
| 2 | 4 |
+-----------+---------------+
Vocabulary:
CREATE TABLE vocabulary (
id integer PRIMARY KEY,
hiragana text NOT NULL,
reading text NOT NULL,
meaning text[] NOT NULL
);
Each lesson contains multiple vocabulary, and each vocabulary can be included in multiple lessons.
How can I get the vocab_count column of the lessons table to be calculated and updated whenever I add more rows to the lesson_vocabulary table? Is this possible, and how would I go about doing this?
Thanks
You can use a SQL trigger for this, similar to a MySQL AFTER INSERT trigger that updates another table's column.
The trigger would look roughly like this. I am using Oracle SQL; other databases need some adaptation (in PostgreSQL, for instance, the body goes into a separate trigger function).
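CREATE OR REPLACE TRIGGER vocab_trigger
AFTER INSERT ON lesson_vocabulary
BEGIN
  -- Statement-level trigger (no FOR EACH ROW): a row-level trigger that
  -- queries lesson_vocabulary itself fails with ORA-04091 (mutating table).
  -- Recount the vocabulary of every lesson after each INSERT statement.
  FOR lesson_cur IN (SELECT lesson_id, COUNT(vocabulary_id) AS voc_cnt
                     FROM lesson_vocabulary
                     GROUP BY lesson_id) LOOP
    UPDATE lessons
    SET vocab_count = lesson_cur.voc_cnt
    WHERE id = lesson_cur.lesson_id;
  END LOOP;
END;
/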
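For comparison, a minimal sketch of the trigger idea in SQLite via Python's sqlite3 (an assumption: SQLite rather than Oracle or PostgreSQL). SQLite has no mutating-table restriction, so a row-level trigger can recount just the lesson referenced by NEW.lesson_id. The description column is omitted for brevity, and only inserts are handled; deleting from lesson_vocabulary would need a matching AFTER DELETE trigger.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE lessons (
  id INTEGER PRIMARY KEY,
  title TEXT NOT NULL,
  vocab_count INTEGER NOT NULL DEFAULT 0
);
CREATE TABLE lesson_vocabulary (
  lesson_id INTEGER REFERENCES lessons(id),
  vocabulary_id INTEGER
);
-- Row-level trigger: recount only the lesson that just received a row.
CREATE TRIGGER vocab_trigger
AFTER INSERT ON lesson_vocabulary
FOR EACH ROW
BEGIN
  UPDATE lessons
  SET vocab_count = (SELECT COUNT(*) FROM lesson_vocabulary
                     WHERE lesson_id = NEW.lesson_id)
  WHERE id = NEW.lesson_id;
END;
INSERT INTO lessons (id, title) VALUES (1, 'lesson_one'), (2, 'lesson_two');
INSERT INTO lesson_vocabulary VALUES (1, 1), (1, 2), (1, 3), (2, 2), (2, 4);
""")

counts = conn.execute("SELECT id, vocab_count FROM lessons ORDER BY id").fetchall()
print(counts)  # [(1, 3), (2, 2)]
```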
It's better to create a view that calculates that (and get rid of the column in the lessons table):
select l.*, lv.vocab_count
from lessons l
left join (
select lesson_id, count(*)
from lesson_vocabulary
group by lesson_id
) as lv(lesson_id, vocab_count) on l.id = lv.lesson_id
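A runnable sketch of the view-based approach with Python's sqlite3 (an assumption: SQLite instead of PostgreSQL). SQLite does not accept the lv(lesson_id, vocab_count) column-alias list, so the aliases move inside the subquery, and COALESCE turns the NULL for a lesson without vocabulary into 0. The third lesson is hypothetical test data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE lessons (id INTEGER PRIMARY KEY, title TEXT NOT NULL);
CREATE TABLE lesson_vocabulary (lesson_id INTEGER, vocabulary_id INTEGER);
INSERT INTO lessons VALUES (1, 'lesson_one'), (2, 'lesson_two'), (3, 'lesson_three');
INSERT INTO lesson_vocabulary VALUES (1, 1), (1, 2), (1, 3), (2, 2), (2, 4);

-- The count is derived on every read, so it can never go stale.
CREATE VIEW lessons_with_count AS
SELECT l.*, COALESCE(lv.vocab_count, 0) AS vocab_count
FROM lessons l
LEFT JOIN (
  SELECT lesson_id, COUNT(*) AS vocab_count
  FROM lesson_vocabulary
  GROUP BY lesson_id
) AS lv ON l.id = lv.lesson_id;
""")

rows = conn.execute(
    "SELECT id, vocab_count FROM lessons_with_count ORDER BY id").fetchall()
print(rows)  # [(1, 3), (2, 2), (3, 0)]

# A new link row is reflected immediately, with no trigger involved.
conn.execute("INSERT INTO lesson_vocabulary VALUES (3, 5)")
after = conn.execute(
    "SELECT vocab_count FROM lessons_with_count WHERE id = 3").fetchone()
print(after)  # (1,)
```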
If you really want to update the lessons table each time the lesson_vocabulary changes, you can run an UPDATE statement like this in a trigger:
update lessons l
set vocab_count = t.cnt
from (
select lesson_id, count(*) as cnt
from lesson_vocabulary
group by lesson_id
) t
where t.lesson_id = l.id;
I would recommend using a query for this information:
select l.*,
(select count(*)
from lesson_vocabulary lv
where lv.lesson_id = l.id
) as vocabulary_cnt
from lessons l;
With an index on lesson_vocabulary(lesson_id), this should be quite fast.
I recommend this over an update, because the data remains correct.
I recommend this over a trigger, because it is simpler.
I recommend this over a join to an aggregated subquery because it should be faster, particularly if you are filtering on the lessons table.
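A minimal sketch of the correlated-subquery approach in Python's sqlite3 (a stand-in for PostgreSQL; the index name lesson_vocabulary_lesson_id is made up). EXPLAIN QUERY PLAN is SQLite's way to check whether the index is used for the per-lesson count; its exact output varies between SQLite versions, so it is only printed here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE lessons (id INTEGER PRIMARY KEY, title TEXT NOT NULL);
CREATE TABLE lesson_vocabulary (lesson_id INTEGER, vocabulary_id INTEGER);
-- Lets the correlated COUNT(*) look up one lesson's rows directly.
CREATE INDEX lesson_vocabulary_lesson_id ON lesson_vocabulary (lesson_id);
INSERT INTO lessons VALUES (1, 'lesson_one'), (2, 'lesson_two');
INSERT INTO lesson_vocabulary VALUES (1, 1), (1, 2), (1, 3), (2, 2), (2, 4);
""")

query = """
SELECT l.id,
       (SELECT COUNT(*) FROM lesson_vocabulary lv
        WHERE lv.lesson_id = l.id) AS vocabulary_cnt
FROM lessons l
ORDER BY l.id
"""
rows = conn.execute(query).fetchall()
print(rows)  # [(1, 3), (2, 2)]

# Show the plan SQLite picks for the query (format is version-dependent).
for step in conn.execute("EXPLAIN QUERY PLAN " + query):
    print(step)
```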
Inspired by this StackOverflow question:
Find mutual element in different facts in swi-prolog
We have the following
Problem statement
Given a database of "actors starring in movies"
(starsin is the relation linking actor "bob" to movie "a" for example)
starsin(a,bob).
starsin(c,bob).
starsin(a,maria).
starsin(b,maria).
starsin(c,maria).
starsin(a,george).
starsin(b,george).
starsin(c,george).
starsin(d,george).
And given a set of movies M, find the actors that starred in all the movies of M.
The question was initially for Prolog.
Prolog solution
In Prolog, an elegant solution involves the predicate setof/3,
which collects the possible instantiations of a variable into a set (which is really a
sorted list without duplicate values):
actors_appearing_in_movies(MovIn,ActOut) :-
setof(
Ax,
MovAx^(setof(Mx,starsin(Mx,Ax),MovAx), subset(MovIn,MovAx)),
ActOut
).
I won't go into details about this, but let's look at the test code, which is of interest here.
Here are five test cases:
actors_appearing_in_movies([],ActOut),permutation([bob, george, maria],ActOut),!.
actors_appearing_in_movies([a],ActOut),permutation([bob, george, maria],ActOut),!.
actors_appearing_in_movies([a,b],ActOut),permutation([george, maria],ActOut),!.
actors_appearing_in_movies([a,b,c],ActOut),permutation([george, maria],ActOut),!.
actors_appearing_in_movies([a,b,c,d],ActOut),permutation([george],ActOut),!.
A test is a call to the predicate actors_appearing_in_movies/2, which is given
the input list of movies (e.g. [a,b]) and which captures the resulting list of
actors in ActOut.
Subsequently, we just need to test whether ActOut is a permutation of the expected
set of actors, for example:
permutation([george, maria],ActOut)
i.e. "Is ActOut a list that is a permutation of the list [george,maria]?"
If that call succeeds (i.e. doesn't fail), the test passes.
The terminal ! is the cut operator; it tells the Prolog engine not to
look for further solutions, because we are done at that point.
Note that for the empty set of movies, we get all the actors. This is arguably correct:
every actor stars in all the movies of the empty set (vacuous truth).
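To make the intended semantics concrete outside Prolog, here is a small Python sketch (an illustration, not part of the original): build each actor's set of movies and keep the actors whose set contains M. Python's <= on sets is the subset test, and the empty set is a subset of every set, so the vacuous-truth case falls out without special handling.

```python
from collections import defaultdict

# The starsin facts as (movie, actor) pairs.
STARSIN = [
    ("a", "bob"), ("c", "bob"),
    ("a", "maria"), ("b", "maria"), ("c", "maria"),
    ("a", "george"), ("b", "george"), ("c", "george"), ("d", "george"),
]

def actors_appearing_in_movies(movies_in):
    """Return the set of actors who starred in every movie of movies_in."""
    movies_of = defaultdict(set)
    for movie, actor in STARSIN:
        movies_of[actor].add(movie)
    wanted = set(movies_in)
    # wanted <= movies is the subset test; set() <= anything is True.
    return {actor for actor, movies in movies_of.items() if wanted <= movies}

# The five test cases as data; comparing sets replaces permutation/2.
CASES = [
    ([], {"bob", "george", "maria"}),
    (["a"], {"bob", "george", "maria"}),
    (["a", "b"], {"george", "maria"}),
    (["a", "b", "c"], {"george", "maria"}),
    (["a", "b", "c", "d"], {"george"}),
]
for movies_in, expected in CASES:
    assert actors_appearing_in_movies(movies_in) == expected, movies_in
print("all cases pass")
```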
Now in SQL.
This problem is squarely in the domain of relational algebra (it is an instance of relational division),
and SQL is built for that, so let's have a go at it. Here, I'm using MySQL.
First, set up the facts.
DROP TABLE IF EXISTS starsin;
CREATE TABLE starsin (movie CHAR(20) NOT NULL, actor CHAR(20) NOT NULL);
INSERT INTO starsin VALUES
( "a" , "bob" ),
( "c" , "bob" ),
( "a" , "maria" ),
( "b" , "maria" ),
( "c" , "maria" ),
( "a" , "george" ),
( "b" , "george" ),
( "c" , "george" ),
( "d", "george" );
Regarding the set of movies given as input, giving them in the form of a
(temporary) table sounds natural. In MySQL, "temporary tables" are local to the session. Good.
DROP TABLE IF EXISTS movies_in;
CREATE TEMPORARY TABLE movies_in (movie CHAR(20) NOT NULL);
INSERT INTO movies_in VALUES ("a"), ("b");
Approach:
The results can now be obtained by taking, for each actor, the intersection of the set of
movies in movies_in and the set of movies in which the actor ever appeared
(built for each actor via the inner join), then checking (for each actor) whether the
resulting set has at least as many entries as movies_in.
Wrap the query in a procedure for convenience.
The delimiter has to be changed so that the semicolons inside the procedure body do not end the CREATE PROCEDURE statement:
DROP PROCEDURE IF EXISTS actors_appearing_in_movies;
DELIMITER $$
CREATE PROCEDURE actors_appearing_in_movies()
BEGIN
SELECT
d.actor
FROM
starsin d, movies_in q
WHERE
d.movie = q.movie
GROUP BY
actor
HAVING
COUNT(*) >= (SELECT COUNT(*) FROM movies_in);
END$$
DELIMITER ;
Run it!
Problem A appears:
Is there a better way than editing and copy-pasting the table creation code,
issuing a CALL, and checking the results "by hand"?
DROP TABLE IF EXISTS movies_in;
CREATE TEMPORARY TABLE movies_in (movie CHAR(20) NOT NULL);
CALL actors_appearing_in_movies();
Empty set!
Problem B appears:
The above is not what I want: I want "all actors", as in the Prolog solution.
As I do not want to tack a weird edge case exception onto the code, my approach must
be wrong. Is there one which naturally covers this case but doesn't become too complex?
T-SQL and PostgreSQL one-liners are fine too!
The other test cases yield expected data:
DROP TABLE IF EXISTS movies_in;
CREATE TEMPORARY TABLE movies_in (movie CHAR(20) NOT NULL);
INSERT INTO movies_in VALUES ("a"), ("b");
CALL actors_appearing_in_movies();
+--------+
| actor |
+--------+
| george |
| maria |
+--------+
DROP TABLE IF EXISTS movies_in;
CREATE TEMPORARY TABLE movies_in (movie CHAR(20) NOT NULL);
INSERT INTO movies_in VALUES ("a"), ("b"), ("c");
CALL actors_appearing_in_movies();
+--------+
| actor |
+--------+
| george |
| maria |
+--------+
DROP TABLE IF EXISTS movies_in;
CREATE TEMPORARY TABLE movies_in (movie CHAR(20) NOT NULL);
INSERT INTO movies_in VALUES ("a"), ("b"), ("c"), ("d");
CALL actors_appearing_in_movies();
+--------+
| actor |
+--------+
| george |
+--------+
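One way to address Problem A: keep the test cases as data and assert on the results instead of inspecting CALL output by hand. The sketch below assumes Python's sqlite3 as a stand-in for MySQL and runs the body of actors_appearing_in_movies() directly rather than as a stored procedure (SQLite has none). It reproduces Problem B: of the five cases, only the empty input fails.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE starsin (movie TEXT NOT NULL, actor TEXT NOT NULL);
INSERT INTO starsin VALUES
  ('a', 'bob'), ('c', 'bob'),
  ('a', 'maria'), ('b', 'maria'), ('c', 'maria'),
  ('a', 'george'), ('b', 'george'), ('c', 'george'), ('d', 'george');
CREATE TEMP TABLE movies_in (movie TEXT NOT NULL);
""")

# Body of the actors_appearing_in_movies() procedure.
INNER_JOIN_QUERY = """
SELECT d.actor
FROM starsin d, movies_in q
WHERE d.movie = q.movie
GROUP BY d.actor
HAVING COUNT(*) >= (SELECT COUNT(*) FROM movies_in)
"""

def run(query, movies):
    """Refill movies_in with the given movies, run query, return actor set."""
    conn.execute("DELETE FROM movies_in")
    conn.executemany("INSERT INTO movies_in VALUES (?)", [(m,) for m in movies])
    return {row[0] for row in conn.execute(query)}

# The five test cases: (input movies, expected actors).
CASES = [
    ([], {"bob", "george", "maria"}),
    (["a"], {"bob", "george", "maria"}),
    (["a", "b"], {"george", "maria"}),
    (["a", "b", "c"], {"george", "maria"}),
    (["a", "b", "c", "d"], {"george"}),
]

failures = [movies for movies, expected in CASES
            if run(INNER_JOIN_QUERY, movies) != expected]
print(failures)  # [[]]: only the empty input fails (Problem B)
```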
And given set of movies M, find those actors that starred in all the movies of M.
I would use:
select si.actor
from starsin si
where si.movie in (<M>)
group by si.actor
having count(*) = <n>;
If you have to deal with an empty set, then you need a left join:
select a.actor
from actors a left join
starsin si
on a.actor = si.actor and si.movie in (<M>)
group by a.actor
having count(si.movie) = <n>;
<n> here is the number of movies in <M>.
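A runnable sketch of the LEFT JOIN variant, using Python's sqlite3 as a stand-in (an assumption). The actors table is derived from starsin, and <M> and <n> are replaced by subqueries on a movies_in table, so the empty case needs no special syntax (MySQL, for example, rejects an empty IN () list). Like the original, it assumes no duplicate rows in starsin or movies_in.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE starsin (movie TEXT NOT NULL, actor TEXT NOT NULL);
INSERT INTO starsin VALUES
  ('a', 'bob'), ('c', 'bob'),
  ('a', 'maria'), ('b', 'maria'), ('c', 'maria'),
  ('a', 'george'), ('b', 'george'), ('c', 'george'), ('d', 'george');
CREATE TABLE actors AS SELECT DISTINCT actor FROM starsin;
CREATE TEMP TABLE movies_in (movie TEXT NOT NULL);
""")

# Every actor survives the LEFT JOIN; COUNT(si.movie) counts only real matches,
# so with an empty movies_in every actor has 0 = 0 and is selected.
LEFT_JOIN_QUERY = """
SELECT a.actor
FROM actors a
LEFT JOIN starsin si
  ON a.actor = si.actor AND si.movie IN (SELECT movie FROM movies_in)
GROUP BY a.actor
HAVING COUNT(si.movie) = (SELECT COUNT(*) FROM movies_in)
"""

def actors_for(movies):
    """Refill movies_in and return the matching actors, sorted."""
    conn.execute("DELETE FROM movies_in")
    conn.executemany("INSERT INTO movies_in VALUES (?)", [(m,) for m in movies])
    return sorted(row[0] for row in conn.execute(LEFT_JOIN_QUERY))

print(actors_for([]))          # ['bob', 'george', 'maria']
print(actors_for(["a", "b"]))  # ['george', 'maria']
```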
Update: The second approach in extended form
create or replace temporary table
actor (actor char(20) primary key)
as select distinct actor from starsin;
select
a.actor,
si.actor,si.movie -- left in for illustration; needs ONLY_FULL_GROUP_BY disabled
from
actor a left join starsin si
on a.actor = si.actor
and si.movie in (select * from movies_in)
group
by a.actor
having
count(si.movie) = (select count(*) from movies_in);
Then for empty movies_in:
+--------+-------+-------+
| actor | actor | movie |
+--------+-------+-------+
| bob | NULL | NULL |
| george | NULL | NULL |
| maria | NULL | NULL |
+--------+-------+-------+
and for this movies_in for example:
+-------+
| movie |
+-------+
| a |
| b |
+-------+
Here, movie merely shows the value taken from one row of each group (typically the first):
+--------+--------+-------+
| actor | actor | movie |
+--------+--------+-------+
| george | george | a |
| maria | maria | a |
+--------+--------+-------+
The following solution involves counting and an UPDATE.
Writeup here: A Simple Relational Database Operation
We are using MariaDB/MySQL SQL.
T-SQL and PL/SQL offer more complete procedural features.
Manual page for CREATE TABLE
Manual page for CREATE PROCEDURE
Manual page for data types in MariaDB
Note that MariaDB has no array (vector) data types that could be passed to procedures, so we have to work without them.
Enter facts as table:
CREATE OR REPLACE TABLE starsin
(movie CHAR(20) NOT NULL, actor CHAR(20) NOT NULL,
PRIMARY KEY (movie, actor));
INSERT INTO starsin VALUES
( "a" , "bob" ),
( "c" , "bob" ),
( "a" , "maria" ),
( "b" , "maria" ),
( "c" , "maria" ),
( "a" , "george" ),
( "b" , "george" ),
( "c" , "george" ),
( "d", "george" );
Enter a procedure that computes the solution and prints it out.
DELIMITER $$
CREATE OR REPLACE PROCEDURE actors_appearing_in_movies()
BEGIN
-- collect all the actors
CREATE OR REPLACE TEMPORARY TABLE tmp_actor (actor CHAR(20) PRIMARY KEY)
AS SELECT DISTINCT actor from starsin;
-- table of "all actors x (input movies + '--' placeholder)"
-- (combinations that are needed for an actor to show up in the result)
-- and a flag indicating whether that combination shows up for real
CREATE OR REPLACE TEMPORARY TABLE tmp_needed
(actor CHAR(20),
movie CHAR(20),
actual TINYINT NOT NULL DEFAULT 0,
PRIMARY KEY (actor, movie))
AS
(SELECT ta.actor, mi.movie FROM tmp_actor ta, movies_in mi)
UNION
(SELECT ta.actor, "--" FROM tmp_actor ta);
-- SELECT * FROM tmp_needed;
-- Mark those (actor, movie) combinations which actually exist
-- with a numeric 1
UPDATE tmp_needed tn SET actual = 1 WHERE EXISTS
(SELECT * FROM starsin si WHERE
si.actor = tn.actor AND si.movie = tn.movie);
-- SELECT * FROM tmp_needed;
-- The result is the set of actors in "tmp_needed" which have as many
-- entries flagged "actual" as there are entries in "movies_in"
SELECT actor FROM tmp_needed GROUP BY actor
HAVING SUM(actual) = (SELECT COUNT(*) FROM movies_in);
END$$
DELIMITER ;
Testing
There is no ready-to-use unit testing framework for MariaDB at hand, so we
"test by hand": we write a procedure whose output we check manually.
Procedures accept neither variadic arguments nor array parameters, so
let's accept up to 4 movies as input (NULL for unused slots) and check the result manually.
DELIMITER $$
CREATE OR REPLACE PROCEDURE
test_movies(IN m1 CHAR(20),IN m2 CHAR(20),IN m3 CHAR(20),IN m4 CHAR(20))
BEGIN
CREATE OR REPLACE TEMPORARY TABLE movies_in (movie CHAR(20) PRIMARY KEY);
CREATE OR REPLACE TEMPORARY TABLE args (movie CHAR(20));
INSERT INTO args VALUES (m1),(m2),(m3),(m4); -- contains duplicates and NULLs
INSERT INTO movies_in (SELECT DISTINCT movie FROM args WHERE movie IS NOT NULL); -- clean
DROP TABLE args;
CALL actors_appearing_in_movies();
END$$
DELIMITER ;
The above passes all the manual tests, in particular:
CALL test_movies(NULL,NULL,NULL,NULL);
+--------+
| actor |
+--------+
| bob |
| george |
| maria |
+--------+
3 rows in set (0.003 sec)
For example, for CALL test_movies("a","b",NULL,NULL):
first set up the table with all actors against all the movies in the input set, plus the
"doesn't exist" movie represented by the placeholder --.
+--------+--------+-------+
| actual | actor | movie |
+--------+--------+-------+
| 0 | bob | -- |
| 0 | bob | a |
| 0 | bob | b |
| 0 | george | -- |
| 0 | george | a |
| 0 | george | b |
| 0 | maria | -- |
| 0 | maria | a |
| 0 | maria | b |
+--------+--------+-------+
Then mark those rows with a 1 where the actor-movie combination actually exists in starsin.
+--------+--------+-------+
| actual | actor | movie |
+--------+--------+-------+
| 0 | bob | -- |
| 1 | bob | a |
| 0 | bob | b |
| 0 | george | -- |
| 1 | george | a |
| 1 | george | b |
| 0 | maria | -- |
| 1 | maria | a |
| 1 | maria | b |
+--------+--------+-------+
Finally, select an actor for inclusion in the solution if SUM(actual) is equal to the
number of entries in the input movies table (it cannot be larger), as that means that the
actor indeed appears in all movies of the input movies table. In the special case where that
table is empty, the actor-movie combination table will only contain
+--------+--------+-------+
| actual | actor | movie |
+--------+--------+-------+
| 0 | bob | -- |
| 0 | george | -- |
| 0 | maria | -- |
+--------+--------+-------+
and thus all actors will be selected, which is what we want.
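The same placeholder-and-flag idea, sketched with Python's sqlite3 (an assumption: SQLite instead of MariaDB). SQLite has no CREATE ... AS with column definitions and no stored procedures, so tmp_needed is created first and filled with INSERT ... SELECT, and a Python function (hypothetical name run_test_movies) plays the role of test_movies(), with None standing in for NULL. It also assumes no real movie is named --.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE starsin (movie TEXT NOT NULL, actor TEXT NOT NULL,
                      PRIMARY KEY (movie, actor));
INSERT INTO starsin VALUES
  ('a', 'bob'), ('c', 'bob'),
  ('a', 'maria'), ('b', 'maria'), ('c', 'maria'),
  ('a', 'george'), ('b', 'george'), ('c', 'george'), ('d', 'george');
CREATE TEMP TABLE movies_in (movie TEXT PRIMARY KEY);
""")

BUILD_TMP_NEEDED = """
DROP TABLE IF EXISTS tmp_needed;
CREATE TEMP TABLE tmp_needed (
  actor TEXT, movie TEXT, actual INTEGER NOT NULL DEFAULT 0,
  PRIMARY KEY (actor, movie));
-- all actors x (input movies + '--' placeholder); actual defaults to 0
INSERT INTO tmp_needed (actor, movie)
  SELECT ta.actor, mi.movie
  FROM (SELECT DISTINCT actor FROM starsin) ta, movies_in mi
  UNION
  SELECT DISTINCT actor, '--' FROM starsin;
-- flag the (actor, movie) combinations that actually exist in starsin
UPDATE tmp_needed SET actual = 1 WHERE EXISTS (
  SELECT 1 FROM starsin si
  WHERE si.actor = tmp_needed.actor AND si.movie = tmp_needed.movie);
"""

def run_test_movies(*movies):
    """Mimic test_movies(): drop None (NULL) and duplicates, then solve."""
    conn.execute("DELETE FROM movies_in")
    cleaned = {m for m in movies if m is not None}
    conn.executemany("INSERT INTO movies_in VALUES (?)", [(m,) for m in cleaned])
    conn.executescript(BUILD_TMP_NEEDED)
    rows = conn.execute("""
        SELECT actor FROM tmp_needed GROUP BY actor
        HAVING SUM(actual) = (SELECT COUNT(*) FROM movies_in)""")
    return sorted(row[0] for row in rows)

print(run_test_movies(None, None, None, None))  # ['bob', 'george', 'maria']
print(run_test_movies("a", "b", None, None))    # ['george', 'maria']
```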
I have a table:
id integer
status_id integer
children integer[]
How do I write a query that joins the table to itself on children and finds all records where all child items have status_id 2 and the item itself has status_id 1?
In addition, should children be indexed, and with what index?
Edit: Based on krokodilko's answer, I think the query may be:
SELECT id
FROM (
SELECT p.id, p.status_id, every(c.status_id = 2) AS all_children_status_2
FROM positions p
JOIN positions c
ON c.id = ANY (p.children)
GROUP BY p.id, p.status_id
) sub
WHERE all_children_status_2 IS TRUE AND status_id = 1;
Edit 2:
Please note that, from my reading, array columns should use a GIN or GiST index. Unfortunately, PostgreSQL does not use these indexes for = ANY(...). This means that while the above query works, it is very slow.
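To pin down what the query in the first edit computes, here is a small pure-Python model of it (an illustration only, not PostgreSQL; the sample rows are hypothetical). It makes one consequence of the inner join visible: a parent whose children array is empty, or refers to no existing rows, produces no joined rows and so never appears in the result, even though "all children have status 2" is vacuously true for it.

```python
# Hypothetical rows of positions: id -> (status_id, children)
positions = {
    1: (1, [2, 3]),  # status 1, all children have status 2 -> selected
    2: (2, []),
    3: (2, []),
    4: (1, [2, 5]),  # child 5 has status 1 -> not selected
    5: (1, []),      # no children -> no joined rows -> not selected
}

def parents_with_all_children_status_2(rows):
    result = []
    for pid, (status, children) in rows.items():
        # JOIN positions c ON c.id = ANY(p.children): keep existing children
        child_statuses = [rows[c][0] for c in children if c in rows]
        if not child_statuses:
            continue  # the inner join produced no rows for this parent
        # every(c.status_id = 2) ... WHERE status_id = 1
        if status == 1 and all(s == 2 for s in child_statuses):
            result.append(pid)
    return result

print(parents_with_all_children_status_2(positions))  # [1]
```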
Use the ANY operator:
Demo: http://www.sqlfiddle.com/#!17/2540d/1
CREATE TABLE parent(
id int,
child int[]
);
INSERT INTO parent VALUES(1, '{2,4,5}');
CREATE TABLE child(
id int
);
INSERT INTO child VALUES (1),(2),(3),(4),(5);
SELECT *
FROM parent p
JOIN child c
ON c.id = any( p.child );
| id | child | id |
|----|-------|----|
| 1 | 2,4,5 | 2 |
| 1 | 2,4,5 | 4 |
| 1 | 2,4,5 | 5 |
In addition should children be indexed and with what index?
Yes, if the child table is large (more than a few hundred or a few thousand rows):
CREATE INDEX some_name ON child( id );
I have a table called Index which has the columns id and value, where id is an auto-increment bigint and value is a varchar containing an English word.
I have a table called Search which has relationships to the table Index. For each search you can define which indexes it should search in a table called Article.
The table Article also has relationships to the table Index.
The tables which define the relationships are:
Searches_Indexes with columns id_search and id_index.
Articles_Indexes with columns id_article and id_index.
I would like to find all Articles that contain the same indexes of Search.
For example: I have a Search with indexes laptop and dell, I would like to retrieve all Articles which contain both indexes, not just one.
So far I have this:
SELECT ai.id_article
FROM articles_indexes AS ai
INNER JOIN searches_indexes AS si
ON si.id_index = ai.id_index
WHERE si.id_search = 1
How do I make my SQL only return the Articles with all the Indexes of a Search?
Edit:
Sample Data:
Article:
id | name | description | ...
1 | 'Dell Laptop' | 'New Dell Laptop...' | ...
2 | 'HP Laptop' | 'Unused HP Laptop...' | ...
...
Search:
id | name | id_user | ...
1 | 'Dell Laptop Search' | 5 | ...
Index:
id | value
1 | 'dell'
2 | 'laptop'
3 | 'hp'
4 | 'new'
5 | 'unused'
...
Articles_Indexes:
Article with id 1 (the dell laptop) has the Indexes 'dell', 'laptop', 'new'.
Article with id 2 (the hp laptop) has the Indexes 'laptop', 'hp', 'unused'.
id_article | id_index
1 | 1
1 | 2
1 | 4
...
2 | 2
2 | 3
2 | 5
...
Searches_Indexes:
Search with id 1 only contains 2 Indexes, 'dell' and 'laptop':
id_search | id_index
1 | 1
1 | 2
Required output:
id_article
1
If I understand correctly, you want aggregation and a HAVING clause. Assuming there are no duplicate entries in the indexes tables:
SELECT ai.id_article
FROM articles_indexes ai INNER JOIN
searches_indexes si
ON si.id_index = ai.id_index
WHERE si.id_search = 1
GROUP BY ai.id_article
HAVING COUNT(*) = (SELECT COUNT(*) FROM searches_indexes si2 WHERE si2.id_search = 1);
This counts the number of matches and makes sure it matches the number you are looking for.
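A minimal runnable check of this query with the sample data from the question, using Python's sqlite3 as a stand-in database (an assumption). Search 1 has two indexes, and only article 1 matches both.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE articles_indexes (id_article INTEGER, id_index INTEGER);
CREATE TABLE searches_indexes (id_search INTEGER, id_index INTEGER);
-- Sample data from the question: article 1 = dell, laptop, new;
-- article 2 = laptop, hp, unused; search 1 = dell, laptop.
INSERT INTO articles_indexes VALUES (1, 1), (1, 2), (1, 4), (2, 2), (2, 3), (2, 5);
INSERT INTO searches_indexes VALUES (1, 1), (1, 2);
""")

search_id = 1
rows = conn.execute("""
SELECT ai.id_article
FROM articles_indexes ai
INNER JOIN searches_indexes si ON si.id_index = ai.id_index
WHERE si.id_search = ?
GROUP BY ai.id_article
HAVING COUNT(*) = (SELECT COUNT(*) FROM searches_indexes si2
                   WHERE si2.id_search = ?)
""", (search_id, search_id)).fetchall()
print(rows)  # [(1,)]
```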
I should add this. If you wanted to look for all searches at the same time, I'd be inclined to write this as:
SELECT si.id_search, ai.id_article
FROM articles_indexes ai INNER JOIN
(SELECT si.*, COUNT(*) OVER (PARTITION BY si.id_search) as cnt
FROM searches_indexes si
) si
ON si.id_index = ai.id_index
GROUP BY si.id_search, ai.id_article, si.cnt
HAVING COUNT(*) = si.cnt;
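A sketch of the all-searches variant in Python's sqlite3 (an assumption). It swaps the window function for a join to a per-search count subquery, which yields the same per-search total as COUNT(*) OVER (PARTITION BY si.id_search) and also runs on SQLite builds without window-function support. Search 2 (laptop only) is hypothetical extra data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE articles_indexes (id_article INTEGER, id_index INTEGER);
CREATE TABLE searches_indexes (id_search INTEGER, id_index INTEGER);
INSERT INTO articles_indexes VALUES (1, 1), (1, 2), (1, 4), (2, 2), (2, 3), (2, 5);
-- search 1 = dell + laptop (from the question); search 2 = laptop (hypothetical)
INSERT INTO searches_indexes VALUES (1, 1), (1, 2), (2, 2);
""")

# n.cnt is the number of indexes each search requires.
rows = conn.execute("""
SELECT si.id_search, ai.id_article
FROM articles_indexes ai
INNER JOIN searches_indexes si ON si.id_index = ai.id_index
INNER JOIN (
  SELECT id_search, COUNT(*) AS cnt
  FROM searches_indexes
  GROUP BY id_search
) n ON n.id_search = si.id_search
GROUP BY si.id_search, ai.id_article, n.cnt
HAVING COUNT(*) = n.cnt
ORDER BY si.id_search, ai.id_article
""").fetchall()
print(rows)  # [(1, 1), (2, 1), (2, 2)]
```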
You can compare arrays. Here is an example:
create table article_index(id_article int, id_index int);
create table search_index(id_search int, id_index int);
insert into article_index
select generate_series(1,2), generate_series(1,10);
insert into search_index
select generate_series(1,2), generate_series(1,4);
select
id_article
from article_index
group by id_article
having array_agg(id_index) @> (select array_agg(id_index) from search_index where id_search = 2);
Learn more about arrays in postgres.