Performing a complex self-referential query using the Django ORM - sql

I have the following model:
from django.db.models import Model, URLField, EmailField, BooleanField

class Message(Model):
    url = URLField("URL")
    email = EmailField("E-Mail")
    contacted = BooleanField("Contacted", default=False)
With example data like:
| url | email           | contacted |
+-----+-----------------+-----------+
| foo | foo@example.com | N         |
| bar | bar@example.com | N         |
| baz | foo@example.com | Y         |
I would like to select all distinct rows (by e-mail address) whose e-mail addresses have never been contacted. With this example data, the bar@example.com row would be the only one returned.

This will return the records you want:
not_contacted = Message.objects.exclude(
    email__in=Message.objects.filter(contacted=True).values('email')
)
This has the advantage of running only one query, which will look something like this:
SELECT
    messages_message.id, messages_message.url, messages_message.email, messages_message.contacted
FROM
    messages_message
WHERE NOT
    (messages_message.email IN
        ( SELECT U0.email FROM messages_message U0 WHERE U0.contacted = True )
    )
Note that for many, many records this query may not be optimal, but it will probably work for most uses.
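To check the logic of that subquery without a Django project, here is a minimal sketch using Python's built-in sqlite3 module. SQLite only stands in for whatever database Django is configured with, and the helper name never_contacted is an assumption made for illustration, not part of the original answer.

```python
import sqlite3

def never_contacted(rows):
    """Return (url, email) rows whose e-mail never appears on a contacted row."""
    conn = sqlite3.connect(":memory:")
    # Table name mirrors the SQL shown above; the schema itself is an assumption.
    conn.execute(
        "CREATE TABLE messages_message ("
        "id INTEGER PRIMARY KEY, url TEXT, email TEXT, contacted INTEGER)"
    )
    conn.executemany(
        "INSERT INTO messages_message (url, email, contacted) VALUES (?, ?, ?)",
        rows,
    )
    result = conn.execute(
        "SELECT url, email FROM messages_message "
        "WHERE NOT (email IN ("
        "  SELECT U0.email FROM messages_message U0 WHERE U0.contacted = 1)) "
        "ORDER BY id"
    ).fetchall()
    conn.close()
    return result

sample = [
    ("foo", "foo@example.com", 0),
    ("bar", "bar@example.com", 0),
    ("baz", "foo@example.com", 1),
]
print(never_contacted(sample))  # [('bar', 'bar@example.com')]
```

The foo row is dropped even though it was not itself contacted, because another row with the same e-mail was.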

DROP SCHEMA tmp CASCADE;
CREATE SCHEMA tmp ;
SET search_path=tmp;
CREATE TABLE massage
( zurl varchar NOT NULL
, zemail varchar NOT NULL
, contacted boolean
);
INSERT INTO massage(zurl, zemail, contacted) VALUES
( 'foo', 'foo@example.com', False)
,( 'bar', 'bar@example.com', False)
,( 'baz', 'foo@example.com', True)
;
SELECT
DISTINCT zemail AS zemail
, MIN(zurl) AS zurl
FROM massage m
WHERE NOT EXISTS (
SELECT *
FROM massage nx
WHERE nx.zemail = m.zemail
AND nx.contacted = True
)
GROUP BY zemail;
If there are multiple records for a given email address, the query above picks the one with the "lowest" URL. If you want them all, the query is even simpler:
SELECT m.zurl, m.zemail
FROM massage m
WHERE NOT EXISTS (
SELECT *
FROM massage nx
WHERE nx.zemail = m.zemail
AND nx.contacted = True
) ;
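The difference between the two queries only shows up when an uncontacted address has more than one row. The sketch below runs both against an in-memory SQLite database (a stand-in for PostgreSQL); the extra 'abc' row is an assumption added purely to make that difference visible.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE massage (zurl TEXT NOT NULL, zemail TEXT NOT NULL, contacted INTEGER)"
)
conn.executemany("INSERT INTO massage VALUES (?, ?, ?)", [
    ("foo", "foo@example.com", 0),
    ("bar", "bar@example.com", 0),
    ("baz", "foo@example.com", 1),
    ("abc", "bar@example.com", 0),  # hypothetical extra row, not in the sample data
])

# Shared anti-join: keep rows whose address has no contacted row at all.
NOT_CONTACTED = """
    FROM massage m
    WHERE NOT EXISTS (
        SELECT 1 FROM massage nx
        WHERE nx.zemail = m.zemail AND nx.contacted = 1
    )
"""

# One row per address, keeping the "lowest" URL.
one_per_email = conn.execute(
    "SELECT m.zemail, MIN(m.zurl) " + NOT_CONTACTED + " GROUP BY m.zemail"
).fetchall()

# Every qualifying row.
all_rows = conn.execute(
    "SELECT m.zurl, m.zemail " + NOT_CONTACTED + " ORDER BY m.zurl"
).fetchall()

print(one_per_email)  # [('bar@example.com', 'abc')]
print(all_rows)       # [('abc', 'bar@example.com'), ('bar', 'bar@example.com')]
```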

Related

Get Ids from constant list for which there are no rows in corresponding table

Let's say I have a table Vehicles(Id, Name) with the values below:
1 Car
2 Bike
3 Bus
and a constant list of Ids:
1, 2, 3, 4, 5
I want to write a query that returns the Ids from the above list for which there are no rows in the Vehicles table. In the above example it should return:
4, 5
But when I add a new row to the Vehicles table:
4 Plane
It should return only:
5
And similarly, when I remove the third row (3, Bus) from the first version of the Vehicles table, my query should return:
3, 4, 5
I tried the EXISTS operator, but it doesn't give me the correct results:
select top v.Id from Vehicle v where Not Exists ( select v2.Id from Vehicle v2 where v.id = v2.id and v2.id in ( 1, 2, 3, 4, 5 ))
You need to treat your "list" as a dataset, and then use the EXISTS:
SELECT V.I
FROM (VALUES(1),(2),(3),(4),(5))V(I) --Presumably this would be a table (type parameter),
--or a delimited string split into rows
WHERE NOT EXISTS (SELECT 1
FROM dbo.YourTable YT
WHERE YT.YourColumn = V.I);
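A rough way to try the "list as a dataset" idea from application code is sketched below with Python's sqlite3 module standing in for SQL Server. The helper missing_ids and the table layout are assumptions for illustration; the candidate Ids are bound as parameters into a VALUES list inside a CTE.

```python
import sqlite3

def missing_ids(conn, candidate_ids):
    """Return the candidate Ids that have no row in Vehicles."""
    candidate_ids = list(candidate_ids)
    if not candidate_ids:
        return []  # an empty VALUES list would not be valid SQL
    placeholders = ", ".join("(?)" for _ in candidate_ids)
    sql = (
        f"WITH wanted(i) AS (VALUES {placeholders}) "
        "SELECT w.i FROM wanted w "
        "WHERE NOT EXISTS (SELECT 1 FROM Vehicles v WHERE v.Id = w.i) "
        "ORDER BY w.i"
    )
    return [row[0] for row in conn.execute(sql, candidate_ids)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Vehicles (Id INTEGER PRIMARY KEY, Name TEXT)")
conn.executemany("INSERT INTO Vehicles VALUES (?, ?)",
                 [(1, "Car"), (2, "Bike"), (3, "Bus")])

before = missing_ids(conn, [1, 2, 3, 4, 5])
conn.execute("INSERT INTO Vehicles VALUES (4, 'Plane')")
after = missing_ids(conn, [1, 2, 3, 4, 5])

print(before)  # [4, 5]
print(after)   # [5]
```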
Please try the following solution, which uses the EXCEPT set operator.
See: Set Operators - EXCEPT and INTERSECT (Transact-SQL)
-- DDL and sample data population, start
DECLARE @Vehicles TABLE (ID INT PRIMARY KEY, vehicleType VARCHAR(30));
INSERT INTO @Vehicles (ID, vehicleType) VALUES
(1, 'Car'),
(2, 'Bike'),
(3, 'Bus');
-- DDL and sample data population, end
DECLARE @vehicleList VARCHAR(20) = '1, 2, 3, 4, 5'
, @separator CHAR(1) = ',';
SELECT TRIM(value) AS missingID
FROM STRING_SPLIT(@vehicleList, @separator)
EXCEPT
SELECT ID FROM @Vehicles;
Output
+-----------+
| missingID |
+-----------+
| 4 |
| 5 |
+-----------+
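Conceptually, EXCEPT is a set difference over the split list. A minimal Python sketch of the same idea, assuming the list arrives as a delimited string and the existing Ids are already in memory (the function name ids_not_in_table is hypothetical):

```python
def ids_not_in_table(id_list, existing_ids, separator=","):
    """Split a delimited Id string and return the Ids absent from existing_ids."""
    # Mirrors STRING_SPLIT + TRIM: split, strip whitespace, skip empty pieces.
    wanted = {int(part.strip()) for part in id_list.split(separator) if part.strip()}
    # Mirrors EXCEPT: set difference, sorted for stable output.
    return sorted(wanted - set(existing_ids))

vehicles = {1: "Car", 2: "Bike", 3: "Bus"}
print(ids_not_in_table("1, 2, 3, 4, 5", vehicles))  # [4, 5]
```

Like EXCEPT, the set difference also removes duplicates from the input list.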
In SQL we store our values in tables, so we store your list in a table too.
It is then simple to work with, and we can easily find the information we want.
I fully agree that other functions could solve the problem, but it is better to design the database so that basic SQL is enough. It will run faster, be easier to maintain, and scale to a table of a million rows without any problems. When we add the 4th mode of transport, we don't have to modify anything else.
CREATE TABLE vehicules(
id int, name varchar(25));
INSERT INTO vehicules VALUES
(1 ,'Car'),
(2 ,'Bike'),
(3 ,'Bus');
CREATE TABLE ids (iid int);
INSERT INTO ids VALUES
(1),(2),(3),(4),(5);
CREATE VIEW unknownIds AS
SELECT iid unknown_id FROM ids
LEFT JOIN vehicules
ON iid = id
WHERE id IS NULL;
SELECT * FROM unknownIds;
| unknown_id |
| ---------: |
| 4 |
| 5 |
INSERT INTO vehicules VALUES (4,'Plane');
SELECT * FROM unknownIds;
| unknown_id |
| ---------: |
| 5 |

SQL get top level object from joins

I'm working on a query where we want to understand which business is referring the most downstream orders for us. I've put together a very basic table for demonstration purposes here, with 4 businesses listed. Bar and Donut were both ultimately referred by Foo, and I want to be able to show that Foo as a business has generated X number of orders. Obviously, getting the single referral for Foo (from Bar) and Bar (from Donut) are simple joins. But how do you go from Bar to get back to Foo?
I'll add that I've done some more googling this morning and found a few very similar questions about finding the top-level parent, and most of the responses suggest a recursive CTE. It's been a while since I've dug deep into SQL, but 8 years ago I know these were not overly popular. Is there another way around this? Would it perhaps be better to just store that parent ID on the order table at the time of the order?
+----+--------+--------------------+
| Id | Name | ReferralBusinessId |
+----+--------+--------------------+
| 1 | Foo | |
| 2 | Bar | 1 |
| 3 | Donut | 2 |
| 4 | Coffee | |
+----+--------+--------------------+
WITH RECURSIVE entity_hierarchy AS (
SELECT id, name, parent FROM entities WHERE name = 'Donut'
UNION
SELECT e.id, e.name, e.parent FROM entities e INNER JOIN entity_hierarchy eh on e.id = eh.parent
)
SELECT id, name, parent FROM entity_hierarchy;
Assuming you're using SQL Server, you could use a query like the one below to generate a hierarchical Id path for a particular business.
declare @tbl as table (Id int, Name varchar(30), ReferralBusinessId int)
insert into @tbl (id, Name, ReferralBusinessId) values
(1, 'Foo', null),
(2, 'Bar', 1),
(3, 'Donut', 2),
(4, 'Coffee', null);
;WITH business AS (
SELECT Id, Name, ReferralBusinessId
, 0 AS Level
, CAST(Id AS VARCHAR(255)) AS Path
FROM @tbl
UNION ALL
SELECT R.Id, R.Name, R.ReferralBusinessId
, Level + 1
, CAST(Path + '.' + CAST(R.Id AS VARCHAR(255)) AS VARCHAR(255))
FROM @tbl R
INNER JOIN business b ON b.Id = R.ReferralBusinessId
)
SELECT * FROM business ORDER BY Path
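To get from any business straight to its top-level referrer, one option is to start the recursion at the businesses that have no referrer and carry the root's name down the chain. The sketch below tries this in SQLite via Python's sqlite3 module; it is a variation on the recursive CTEs above rather than either answer's exact query, and the table name business and CTE name chain are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE business (Id INTEGER PRIMARY KEY, Name TEXT, ReferralBusinessId INTEGER)"
)
conn.executemany("INSERT INTO business VALUES (?, ?, ?)", [
    (1, "Foo", None),
    (2, "Bar", 1),
    (3, "Donut", 2),
    (4, "Coffee", None),
])

roots = conn.execute("""
    WITH RECURSIVE chain(id, name, root_name) AS (
        -- anchor: businesses nobody referred are their own root
        SELECT Id, Name, Name FROM business WHERE ReferralBusinessId IS NULL
        UNION ALL
        -- step: a referred business inherits the root of its referrer
        SELECT b.Id, b.Name, c.root_name
        FROM business b JOIN chain c ON b.ReferralBusinessId = c.id
    )
    SELECT name, root_name FROM chain ORDER BY id
""").fetchall()

print(roots)
# [('Foo', 'Foo'), ('Bar', 'Foo'), ('Donut', 'Foo'), ('Coffee', 'Coffee')]
```

Joining this mapping to an orders table and grouping by root_name would then attribute every downstream order to the top-level business.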

How can I join tables with IN() in the ON clause?

I have two tables:
User
id | name | category
1 | test | [2,4]
Category
id | name
1 | first
2 | second
3 | third
4 | fourth
Now I need to join both tables and get data like:
name | category
test | second, fourth
I tried this:
select u.name as name, c.name as category
from user
INNER JOIN category on(c.id in (u.category))
but it's not working.
As others have suggested, if you have any control whatsoever over the design of this database, don't store multiple values in user.category, but instead have a bridging table between the two which maps one or more category values to each user record.
However, if you are not in a position to be able to redesign the database, here's a way to get the result you're looking for. First, let's create some test data:
create table [user]
(
id int,
[name] varchar(50),
category varchar(50) -- I'm assuming this is a string type
)
create table category
(
id int,
[name] varchar(50)
)
insert into [user] values
(1,'test','[2,4]'),
(2,'another test','[1,2,4]'),
(3,'more test','[1,3,2,4]')
insert into category values
(1,'first'),
(2,'second'),
(3,'third'),
(4,'fourth');
Then you can use a CTE with string_split to pull apart the individual category values, join them to their names, then recombine them into a single comma-separated value with for xml:
with r as
(
select
u.[name] as username,
cat.id,
cat.[name] as categoryname
from [user] u
outer apply
(
select value from string_split(substring(u.category,2,len(u.category)-2),',')
) c
left join category cat on c.value = cat.id
)
select
r.username,
stuff(
(select ',' + categoryname
from r r2
where r.username = r2.username
order by r2.id
for xml path ('')), 1, 1, '') as categories
from r
group by r.username
which gives the desired output:
/-----------------------------------------\
| username | categories |
|-------------|---------------------------|
|another test | first,second,fourth |
|more test | first,second,third,fourth |
|test | second,fourth |
\-----------------------------------------/
I'm making a couple of assumptions here:
You're using MS SQL Server
The category values always begin with [, end with ] and contain nothing but a comma-delimited list of valid category ids
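If the lookup can be done in application code instead, the same split, join and recombine steps are short in Python. This is only a sketch under the assumptions listed above, plus one more: that the stored text (such as '[2,4]') is valid JSON, so json.loads can parse it. The function name category_names is hypothetical.

```python
import json

def category_names(raw_category, categories):
    """Turn a stored string such as '[2,4]' into 'second, fourth'."""
    ids = json.loads(raw_category)  # '[2,4]' -> [2, 4]
    # Order by id, as the SQL does, and skip ids with no matching category.
    return ", ".join(categories[i] for i in sorted(ids) if i in categories)

categories = {1: "first", 2: "second", 3: "third", 4: "fourth"}
users = [
    ("test", "[2,4]"),
    ("another test", "[1,2,4]"),
    ("more test", "[1,3,2,4]"),
]

result = {name: category_names(raw, categories) for name, raw in users}
print(result["test"])       # second, fourth
print(result["more test"])  # first, second, third, fourth
```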

Convert an array into a Map

I have a table with a column like
[{"key":"e","value":["253","203","204"]},{"key":"st","value":["mi"]},{"key":"k2","value":["1","2"]}]
Which is of the format array<struct<key:string,value:array<string>>>
I want to convert the column into the format below:
{"e":["253","203","204"],"st":["mi"],"k2":["1","2"]}
which is of the type map<string,array<string>>
I have tried exploding the array, but that does not work. Any ideas how I can do this in Hive?
Without external libraries this is impossible. Please refer to brickhouse or create your own UDAF.
Note: the code below provides snippets that reproduce the problem and then solve the part of it that Hive's built-in functions can handle, i.e. producing map<string,string>, not map<string,array<string>>.
-- reproducing the problem
CREATE TABLE test_table(id INT, input ARRAY<STRUCT<key:STRING,value:ARRAY<STRING>>>);
INSERT INTO TABLE test_table
SELECT
1 AS id,
ARRAY(
named_struct("key","e", "value", ARRAY("253","203","204")),
named_struct("key","st", "value", ARRAY("mi")),
named_struct("key","k2", "value", ARRAY("1", "2"))
) AS input;
SELECT id, input FROM test_table;
+-----+-------------------------------------------------------------------------------------------------------+--+
| id | input |
+-----+-------------------------------------------------------------------------------------------------------+--+
| 1 | [{"key":"e","value":["253","203","204"]},{"key":"st","value":["mi"]},{"key":"k2","value":["1","2"]}] |
+-----+-------------------------------------------------------------------------------------------------------+--+
With exploding and using STRUCT features, we can split the keys and values.
SELECT id, exploded_input.key, exploded_input.value
FROM (
SELECT id, exploded_input
FROM test_table LATERAL VIEW explode(input) d AS exploded_input
) x;
+-----+------+----------------------+--+
| id | key | value |
+-----+------+----------------------+--+
| 1 | e | ["253","203","204"] |
| 1 | st | ["mi"] |
| 1 | k2 | ["1","2"] |
+-----+------+----------------------+--+
The idea is to use your UDAF to "collect" a map while aggregating on id.
What Hive can do with built-in functions is generate a map<string,string>: convert each row to a string using one special delimiter, aggregate the rows using another special delimiter, and then call the built-in str_to_map function with those delimiters.
SELECT
id,
str_to_map(
-- outputs: e:253,203,204#st:mi#k2:1,2 with delimiters between aggregated rows
concat_ws('#', collect_list(list_to_string)),
'#', -- first delimiter
':' -- second delimiter
) mapped_output
FROM (
SELECT
id,
-- outputs 3 rows: (e:253,203,204), (st:mi), (k2:1,2)
CONCAT(exploded_input.key,':' , CONCAT_WS(',', exploded_input.value)) as list_to_string
FROM (
SELECT id, exploded_input
FROM test_table LATERAL VIEW explode(input) d AS exploded_input
) x
) y
GROUP BY id;
Which outputs a string to string map like:
+-----+-------------------------------------------+--+
| id | mapped_output |
+-----+-------------------------------------------+--+
| 1 | {"e":"253,203,204","st":"mi","k2":"1,2"} |
+-----+-------------------------------------------+--+
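The two-delimiter encoding can be easier to follow outside Hive. The Python sketch below imitates it: str_to_map here is a rough stand-in written for illustration, not Hive's implementation, and it assumes that neither keys nor values contain the '#', ':' or ',' delimiters.

```python
def str_to_map(text, pair_delim, kv_delim):
    """Rough stand-in for Hive's str_to_map: split into pairs, then key/value."""
    return dict(pair.split(kv_delim, 1) for pair in text.split(pair_delim))

rows = [("e", ["253", "203", "204"]), ("st", ["mi"]), ("k2", ["1", "2"])]

# Step 1: one string per row (key:comma-joined values), rows joined with '#'.
encoded = "#".join(f"{key}:{','.join(values)}" for key, values in rows)
print(encoded)  # e:253,203,204#st:mi#k2:1,2

# Step 2: decode into a string-to-string map.
mapped = str_to_map(encoded, "#", ":")
print(mapped)  # {'e': '253,203,204', 'st': 'mi', 'k2': '1,2'}

# Optional step 3: a consumer can split the values again to recover the arrays.
restored = {key: value.split(",") for key, value in mapped.items()}
```

The restored dictionary matches the target shape, which shows the information survives the encoding even though the Hive type stays map<string,string>.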
with input_set as (
select array(named_struct('key','e','value',array('253','203','204')),named_struct('key','st','value',array('mi')),named_struct('key','k2','value',array('1','2'))) as input_array
), break_input_set as (
select y.col_num as y_col_num,y.col_value as y_col_value from input_set lateral view posexplode(input_set.input_array) y as col_num, col_value
), create_map as (
select map(y_col_value.key,y_col_value.value) as final_map from break_input_set
)
select * from create_map;
var arr = [{"key":"e","value":["253","203","204"]},{"key":"st","value":["mi"]},{"key":"k2","value":["1","2"]}];
var obj = {};
for (var i = 0; i < arr.length; i++) {
    // use each element's key as the property name and its value array as the value
    obj[arr[i].key] = arr[i].value;
}
After the loop, obj is in the required format.

PostgreSQL query on text array value

I have a table where one column has an array - but stored in a text format:
mytable
id ids
-- -------
1 '[3,4]'
2 '[3,5]'
3 '[3]'
etc ...
I want to find all records that have the value 5 as an array element in the ids column.
I was trying to achieve this by using the "string to array" function and removing the [ symbols with the translate function, but couldn't find a way.
You can do this: http://www.sqlfiddle.com/#!1/5c148/12
select *
from tbl
where translate(ids, '[]','{}')::int[] && array[5];
Output:
| ID | IDS |
--------------
| 2 | [3,5] |
You can also use bool_or: http://www.sqlfiddle.com/#!1/5c148/11
with a as
(
select id, unnest(translate(ids, '[]','{}')::int[]) as elem
from tbl
)
select id
from a
group by id
having bool_or(elem = 5);
To see the original elements:
with a as
(
select id, unnest(translate(ids, '[]','{}')::int[]) as elem
from tbl
)
select id, '[' || array_to_string(array_agg(elem), ',') || ']' as ids
from a
group by id
having bool_or(elem = 5);
Output:
| ID | IDS |
--------------
| 2 | [3,5] |
PostgreSQL DDL is atomic, so if it is not too late in your project, convert your stringly-typed array to a real array: http://www.sqlfiddle.com/#!1/6e18c/2
alter table tbl
add column id_array int[];
update tbl set id_array = translate(ids,'[]','{}')::int[];
alter table tbl drop column ids;
Query:
select *
from tbl
where id_array && array[5]
Output:
| ID | ID_ARRAY |
-----------------
| 2 | 3,5 |
You can also use the contains operator: http://www.sqlfiddle.com/#!1/6e18c/6
select *
from tbl
where id_array @> array[5];
I prefer the && syntax, though, because it directly connotes intersection: it reflects that you are testing whether two sets intersect (treating the array as a set).
http://www.postgresql.org/docs/8.2/static/functions-array.html
If you store the string representation of your arrays slightly differently, you can cast to array of integer directly:
INSERT INTO mytable
VALUES
(1, '{3,4}')
,(2, '{3,5}')
,(3, '{3}');
SELECT id, ids::int[]
FROM mytable;
Else, you have to put in one more step:
SELECT (translate(ids, '[]','{}'))::int[]
FROM mytable
I would consider making the column an array type to begin with.
Either way, you can find your row like this:
SELECT id, ids
FROM (
SELECT id, ids, unnest(ids::int[]) AS elem
FROM mytable
) x
WHERE elem = 5
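For comparison, here is what the two steps (rewriting the brackets, then testing membership) look like in plain Python. This is a sketch for checking expectations, not a replacement for the SQL; the function names are hypothetical, and it assumes the stored text is valid JSON such as '[3,5]'.

```python
import json

def to_pg_array_literal(ids_text):
    """Same idea as translate(ids, '[]', '{}'): '[3,5]' -> '{3,5}'."""
    return ids_text.translate(str.maketrans("[]", "{}"))

def rows_containing(rows, wanted):
    """Keep the rows whose text-encoded array contains the wanted element."""
    return [
        (row_id, ids_text)
        for row_id, ids_text in rows
        if wanted in json.loads(ids_text)
    ]

mytable = [(1, "[3,4]"), (2, "[3,5]"), (3, "[3]")]

print(to_pg_array_literal("[3,5]"))  # {3,5}
print(rows_containing(mytable, 5))   # [(2, '[3,5]')]
```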