Filter 2-dimensional array - sql

I have this array:
1:0, 2:0, 3:0, 4:0, 5:0, 6:0, 7:0, 8:0, 9:0, 10:0,11:0,12:0,13:0,14:0,15:0,16:0
17:0,18:0,19:0,20:0,21:0,22:0,23:0,24:0,25:0,26:0,27:0,28:0,29:0,30:0,31:0,32:0,
49:0,33:0,34:0,35:0,36:0,37:0,38:0,39:0,40:0,41:0,42:0,43:0,44:0,45:0,46:0,47:0,
48:0,50:0,51:0,52:0,53:0,54:0,55:0,56:0,57:0,58:0,59:0,60:0,61:0,62:0,63:9,64:0,
65:0,66:0,67:0,68:0,69:0,70:0,71:0,72:0,73:0,74:0,75:0,76:0,77:0,78:0,79:0,80:0,
81:0,82:0,83:0,84:0,85:0,86:0,87:0,88:0,89:0,90:0,91:0,92:0,93:0,94:0,95:0,96:0,
97:0,98:0,99:0,100:0
I want to filter out all entries like *:0 so that I only get this result:
63:9
I think I have to describe it better:
I have a table users with a field user_skill.
This field contains a string like 1:0, 2:0, 3:0, 4:3, 5:8, 6:9, 7:0, 8:0, 9:0 with the syntax skill_id:prio,skill_id:prio,skill_id:prio,...
Now I want to join the users table with the skills table like this:
SELECT skill_name
FROM users
inner join skills on skills.skill_id = ANY (string_to_array(regexp_replace(user_skill,':[0-9]*','','g'),',')::int[])
where user_id = 16
order by skill_name
That works well but I only want to see skill_name where the user has prio <> 0.

Proper solution
You might want to familiarize yourself with normalization and implement this as a proper n:m relation between the tables users and skills with an additional attribute prio in the user_skill table. Here is a complete recipe:
How to implement a many-to-many relationship in PostgreSQL?
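For illustration, a minimal sketch of what that normalized schema could look like (names and types are assumptions chosen to match the query below):
CREATE TABLE users (
   user_id   serial PRIMARY KEY
 , user_name text
);

CREATE TABLE skills (
   skill_id   serial PRIMARY KEY
 , skill_name text NOT NULL
);

-- the n:m link table carrying the extra attribute prio
CREATE TABLE user_skill (
   user_id  int NOT NULL REFERENCES users
 , skill_id int NOT NULL REFERENCES skills
 , prio     int NOT NULL DEFAULT 0
 , PRIMARY KEY (user_id, skill_id)
);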
Then your query can be very simple:
SELECT s.skill_name
FROM user_skill uk
JOIN skills s USING (skill_id)
WHERE uk.user_id = 16
AND uk.prio <> 0
ORDER BY s.skill_name;
It can (and should) be backed up with indices and will be faster by several orders of magnitude than what you have right now.
It will need some more space on disk.
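For example, indexes along these lines (a sketch; the names are made up, and the primary key of user_skill already covers lookups by user_id):
-- supports the join on skill_id and reverse lookups (which users have a given skill)
CREATE INDEX user_skill_skill_id_idx ON user_skill (skill_id);

-- optional: if most rows have prio = 0, a partial index keeps the relevant part small
CREATE INDEX user_skill_prio_idx ON user_skill (user_id) WHERE prio <> 0;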
Solution for the dark side
While you are locked into this unfortunate situation, you can help yourself with this query. However, this assumes at least Postgres 8.4 (for unnest()).
SELECT s.skill_name
FROM  (
   SELECT split_part(us_item, ':', 1)::int AS skill_id  -- cast, assuming skills.skill_id is integer as in your query
   FROM  (
      SELECT trim(unnest(string_to_array(user_skill, ','))) AS us_item
      FROM   users
      WHERE  user_id = 16 -- enter user_id here
      ) x
   WHERE split_part(us_item, ':', 2) <> '0'
   ) u
JOIN   skills s USING (skill_id)
ORDER  BY 1;
Demo with example:
SELECT split_part(us_item, ':', 1) AS skill_id
FROM  (
   SELECT trim(unnest(string_to_array(
      '1:0, 2:0, 3:0, 4:0, 5:0, 6:0, 7:0, 8:0, 9:0, 10:0,11:0,12:0,13:0,14:0,15:0,16:0,'
      '17:0,18:0,19:0,20:0,21:0,22:0,23:0,24:0,25:0,26:0,27:0,28:0,29:0,30:0,31:0,32:0,'
      '49:0,33:0,34:0,35:0,36:0,37:0,38:0,39:0,40:0,41:0,42:0,43:0,44:0,45:0,46:0,47:0,'
      '48:0,50:0,51:0,52:0,53:0,54:0,55:0,56:0,57:0,58:0,59:0,60:0,61:0,62:0,63:9,64:0,'
      '65:0,66:0,67:0,68:0,69:0,70:0,71:0,72:0,73:0,74:0,75:0,76:0,77:0,78:0,79:0,80:0,'
      '81:0,82:0,83:0,84:0,85:0,86:0,87:0,88:0,89:0,90:0,91:0,92:0,93:0,94:0,95:0,96:0,'
      '97:0,98:0,99:0,100:0', ','))) AS us_item
   ) x
WHERE split_part(us_item, ':', 2) <> '0';
trim() deals with leading and trailing spaces, like the ones in your example. Those may just be artifacts of the sloppily formatted question, though.
I fixed a missing ,.
BTW, the SQL standard allows entering a string literal in several adjacent pieces, like I demonstrate above. Weird, but sometimes useful.
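A minimal illustration (the two pieces must be separated by at least a newline):
SELECT 'foo'
       'bar' AS joined;   -- yields 'foobar'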

Related

Count of how many times id occurs in table SQL regexp

Hi, I have a Redshift table of articles that has a field that can contain many accounts, so there is a one-to-many relationship between articles and accounts.
However, I want to create a new view that lists the partner IDs in one column and, in another column, a count of how many times each partner ID appears in the articles table.
I've attempted to do this using regex and created a new Redshift view, but I'm getting weird results where it doesn't always build properly: one day it will say a partner appears 15 times, then the next 17, then the next 15, when the partner ID count hasn't actually changed.
Any help would be greatly appreciated.
SELECT partner_id,
COUNT(DISTINCT id)
FROM (SELECT id,
partner_ids,
SPLIT_PART(partner_ids,',',i) partner_id
FROM positron_articles a
LEFT JOIN util.seq_0_to_500 s
ON s.i < regexp_count (partner_ids,',') + 2
OR s.i = 1
WHERE i > 0
AND regexp_count (partner_ids,',') = 0
ORDER BY id)
GROUP BY 1;
Let's start with some of the more obvious things and see if we can start to glean other information.
First, GROUP BY 1 in your outer query is clearer written as GROUP BY partner_id.
Next, you don't need an ORDER BY in your inner query, and the database engine will probably do a better job optimizing without it, so remove ORDER BY id.
If you want your final results ordered, add an ORDER BY partner_id (or similar) clause after the GROUP BY of your outer query.
It also looks like there are problems with how you are splitting a partner_id out of partner_ids, but I am not positive about that; I would need to understand your view and the data it provides to know how that affects your record count per partner_id.
On your LEFT JOIN against util.seq_0_to_500, I am pretty sure you can drop the s.i = 1 condition, because the first condition already covers it (2 is greater than 1). However, your left join really acts more like an inner join, because you then exclude any rows from positron_articles that don't have s.i > 0.
Oddly, your entire join and inner query then get more or less discarded, because you only keep articles that have no commas at all in their partner_ids: regexp_count(partner_ids, ',') = 0.
I would suggest posting the code for your util.seq_0_to_500, and if you have a partner table, let us know about that as well, because you can probably get your answer a lot more easily with that additional table, depending on how regexp_count works. I suspect regexp_count(partner_ids, partner_id) -- for example regexp_count('12345,678', '1234') -- will return greater than 0, at which point you have no choice but to split the delimited strings into another table before counting, or to build a new matching function.
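For illustration, a sketch of that split-then-count approach, assuming util.seq_0_to_500 is a numbers table whose column i runs from 0 to 500 and partner_ids is a comma-separated list like '123,456,789':
SELECT partner_id,
       COUNT(DISTINCT id) AS article_count
FROM  (SELECT a.id,
              SPLIT_PART(a.partner_ids, ',', s.i) AS partner_id
       FROM   positron_articles a
       JOIN   util.seq_0_to_500 s
              ON  s.i >= 1
              AND s.i <= regexp_count(a.partner_ids, ',') + 1) x
WHERE  partner_id <> ''
GROUP  BY partner_id;
This counts each article at most once per partner, which should keep the numbers stable between runs.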
If regexp_count only matches exactly between commas and you have a partner table, your query could be as simple as this:
SELECT
    p.partner_id
   ,COUNT(a.id) AS ArticlesAppearedIn
FROM
    positron_articles a
    LEFT JOIN PARTNERTABLE p
    ON regexp_count(a.partner_ids, p.partner_id) > 0
GROUP BY
    p.partner_id
Actually, I will correct myself: I just thought of a way to join a partner table without regexp_count. So if you have a partner table, this might work for you; if not, you will need to split the strings. It basically tests whether the partner_id is the entire partner_ids string, at its beginning, in its middle, or at its end. If one of those conditions is met, the record is returned.
SELECT
    p.partner_id
   ,COUNT(a.id) AS ArticlesAppearedIn
FROM
    PARTNERTABLE p
    INNER JOIN positron_articles a
    ON
    (
        CASE
            -- '||' is Redshift's string concatenation operator ('+' is not)
            WHEN a.partner_ids = CAST(p.partner_id AS VARCHAR(100)) THEN 1
            WHEN a.partner_ids LIKE CAST(p.partner_id AS VARCHAR(100)) || ',%' THEN 1
            WHEN a.partner_ids LIKE '%,' || CAST(p.partner_id AS VARCHAR(100)) || ',%' THEN 1
            WHEN a.partner_ids LIKE '%,' || CAST(p.partner_id AS VARCHAR(100)) THEN 1
            ELSE 0
        END
    ) = 1
GROUP BY
    p.partner_id

Select first or random row in group by

I have this query using PostgreSQL 9.1 (9.2 as soon as our hosting platform upgrades):
SELECT
media_files.album,
media_files.artist,
ARRAY_AGG(media_files.id) AS media_file_ids
FROM
media_files
INNER JOIN playlist_media_files ON media_files.id = playlist_media_files.media_file_id
WHERE
playlist_media_files.playlist_id = 1
GROUP BY
media_files.album,
media_files.artist
ORDER BY
media_files.album ASC
and it's working fine: the goal was to extract album/artist combinations and have, in the result set, an array of media file IDs for each particular combo.
The problem is that I have another column in media_files, which is artwork.
artwork is unique for each media file (even within the same album), but in the result set I need to return just the first one of the set.
So, for an album that has 10 media files, I also have 10 corresponding artworks, but I would like to return just the first one (or a randomly picked one for that collection).
Is that possible with only SQL/window functions (first_value() OVER ...)?
Yes, it's possible. First, let's tweak your query by adding aliases and explicit column qualifiers so it's clear what comes from where - assuming I've guessed correctly, since I can't be sure without the table definitions:
SELECT
mf.album,
mf.artist,
ARRAY_AGG (mf.id) AS media_file_ids
FROM
"media_files" mf
INNER JOIN "playlist_media_files" pmf ON mf.id = pmf.media_file_id
WHERE
pmf.playlist_id = 1
GROUP BY
mf.album,
mf.artist
ORDER BY
mf.album ASC
Now you can either use a subquery in the SELECT list or maybe use DISTINCT ON, though it looks like any solution based on DISTINCT ON will be so convoluted as not to be worth it.
What you really want is something like a pick_arbitrary_value_agg aggregate that just picks the first value it sees and throws the rest away. There is no such aggregate, and it isn't really worth implementing one for this job. You could use min(artwork) or max(artwork), and you may find that this actually performs better than the solutions below.
To use a subquery, leave the ORDER BY as it is and add the following as an extra column in your SELECT list:
(SELECT mf2.artwork
FROM media_files mf2
WHERE mf2.artist = mf.artist
AND mf2.album = mf.album
LIMIT 1) AS picked_artwork
You can, at a performance cost, randomize the selected artwork by adding ORDER BY random() before the LIMIT 1 above.
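That is, the extra column becomes (same correlated subquery as above, just with the randomized pick):
(SELECT mf2.artwork
 FROM   media_files mf2
 WHERE  mf2.artist = mf.artist
 AND    mf2.album  = mf.album
 ORDER  BY random()
 LIMIT  1) AS picked_artwork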
Alternately, here's a quick and dirty way to implement selection of a random row in-line:
(array_agg(artwork))[width_bucket(random(),0,1,count(artwork)::integer)]
Since there's no sample data I can't test these modifications. Let me know if there's an issue.
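For context, here is how that in-line expression could slot into the grouped query (a sketch with the same untested caveat; it also assumes artwork is never NULL, since array_agg() keeps NULLs while count(artwork) ignores them):
SELECT mf.album,
       mf.artist,
       array_agg(mf.id) AS media_file_ids,
       (array_agg(mf.artwork))[width_bucket(random(), 0, 1, count(mf.artwork)::integer)]
           AS picked_artwork
FROM   media_files mf
JOIN   playlist_media_files pmf ON mf.id = pmf.media_file_id
WHERE  pmf.playlist_id = 1
GROUP  BY mf.album, mf.artist
ORDER  BY mf.album;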
"First" pick
Wouldn't it be simpler / cheaper to just use min():
SELECT m.album
,m.artist
,array_agg(m.id) AS media_file_ids
,min(m.artwork) AS artwork
FROM playlist_media_files p
JOIN media_files m ON m.id = p.media_file_id
WHERE p.playlist_id = 1
GROUP BY m.album, m.artist
ORDER BY m.album, m.artist;
Arbitrary / random pick
If you are looking for a random selection, @Craig already provided a solution with truly random picks.
You could also use a CTE to avoid additional scans on the (possibly big) base table and then run two separate (cheap) subqueries on the small result set.
For arbitrary selection - not truly random - the result will depend on the physical order of rows in the table and on implementation specifics:
WITH x AS (
   SELECT m.album, m.artist, m.id, m.artwork
   FROM   playlist_media_files p
   JOIN   media_files m ON m.id = p.media_file_id
   WHERE  p.playlist_id = 1
   )
SELECT a.album, a.artist, a.media_file_ids, b.artwork
FROM  (
   SELECT album, artist, array_agg(id) AS media_file_ids
   FROM   x
   GROUP  BY album, artist
   ) a
JOIN  (
   SELECT DISTINCT ON (1, 2) album, artist, artwork
   FROM   x
   ) b USING (album, artist);
For truly random results, you can add an ORDER BY .. random() like this to subquery b:
JOIN (
SELECT DISTINCT ON (1, 2) album, artist, artwork
FROM x
ORDER BY 1, 2, random()
) b USING (album, artist);

SQL query on a condition

I'm writing a query to retrieve translated content. I want it so that if there isn't a translation for the given language id, it automatically returns the translation for the default language, with Id 1.
select Translation.Title
,Translation.Summary
from Translation
where Translation.FkLanguageId = 3
-- If there is no LanguageId of 3, select the record with LanguageId of 1.
I'm working in MS SQL but I think the theory is not DBMS-specific.
Thanks in advance.
This assumes only one row per language in Translation, based on how you phrased the question. If you have multiple rows per FkLanguageId and I've misunderstood, please let us know; the query then becomes more complex, of course.
select TOP 1
Translation.Title
,Translation.Summary
from
Translation
where
Translation.FkLanguageId IN (1, 3)
ORDER BY
FkLanguageId DESC
You'd use LIMIT instead of TOP in another RDBMS.
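For example, in PostgreSQL or MySQL the same query would read (a sketch):
SELECT Title,
       Summary
FROM   Translation
WHERE  FkLanguageId IN (1, 3)
ORDER  BY FkLanguageId DESC
LIMIT  1;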
Assuming the table contains different phrases grouped by PhraseId:
WITH Trans As
(
select Translation.Title
,Translation.Summary
,ROW_NUMBER() OVER (PARTITION BY PhraseId ORDER BY FkLanguageId DESC) RN
from Translation
where Translation.FkLanguageId IN (1,3)
)
SELECT *
FROM Trans WHERE RN=1
This assumes the existence of a TranslationKey that associates one "topic" with several different translation languages:
SELECT
isnull(tX.Title, t1.Title) Title
,isnull(tX.Summary, t1.Summary) Summary
from Translation t1
left outer join Translation tX
on tx.TranslationKey = t1.Translationkey
and tx.FkLanguageId = #TargetLanguageId
where t1.FkLanguageId = 1 -- "Default"
Maybe this is a dirty solution, but it can help you:
if not exists(select t.Title ,t.Summary from Translation t where t.FkLanguageId = 3)
select t.Title ,t.Summary from Translation t where t.FkLanguageId = 1
else
select t.Title ,t.Summary from Translation t where t.FkLanguageId = 3
Since your reference to pastie.org shows that you're looking up phrases or specific menu item names in a table, I'm going to assume that there is a phrase ID to identify the phrases in question.
SELECT ISNULL(forn_lang.Title, default_lang.Title) Title,
ISNULL(forn_lang.Summary, default_lang.Summary) Summary
FROM Translation default_lang
LEFT OUTER JOIN Translation forn_lang ON default_lang.PhraseID = forn_lang.PhraseID AND forn_lang.FkLanguageId = 3
WHERE default_lang.FkLanguageId = 1

How to write a query returning non-chosen records

I have written a psychological testing application, in which the user is presented with a list of words, and s/he has to choose ten words which very much describe himself, then choose words which partially describe himself, and words which do not describe himself. The application itself works fine, but I was interested in exploring the meta-data possibilities: which words have been most frequently chosen in the first category, and which words have never been chosen in the first category. The first query was not a problem, but the second (which words have never been chosen) leaves me stumped.
The table structure is as follows:
table words: id, name
table choices: pid (person id), wid (word id), class (value between 1-6)
Presumably the answer involves a left join between words and choices, but there has to be a modifying statement - where choices.class = 1 - and this is causing me problems. Writing something like
select words.name
from words left join choices
on words.id = choices.wid
where choices.class = 1
and choices.pid = null
causes the database manager to go on a long trip to nowhere. I am using Delphi 7 and Firebird 1.5.
TIA,
No'am
Maybe this is a bit faster:
SELECT w.name
FROM words w
WHERE NOT EXISTS
(SELECT 1
FROM choices c
WHERE c.class = 1 and c.wid = w.id)
Something like this should do the trick:
SELECT name
FROM words
WHERE id NOT IN
(SELECT DISTINCT wid -- DISTINCT is actually redundant
FROM choices
WHERE class = 1)
SELECT words.name
FROM
words
LEFT JOIN choices ON words.id = choices.wid AND choices.class = 1
WHERE choices.pid IS NULL
Make sure you have an index on choices (class, wid).
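For example (the index name is just a placeholder):
CREATE INDEX idx_choices_class_wid ON choices (class, wid);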

Selecting elements that don't exist

I am working on an application that has to assign numeric codes to elements. These codes are not consecutive, and my idea is to not insert them in the database until I have the related element, but I would like to find, in SQL, the codes that are not assigned, and I don't know how to do it.
Any ideas?
Thanks!!!
Edit 1
The table can be so simple:
code | element
-----------------
3 | three
7 | seven
2 | two
And I would like something like this: 1, 4, 5, 6. Without any other table.
Edit 2
Thanks for the feedback, your answers have been very helpful.
This will return NULL if a code is not assigned:
SELECT assigned_codes.code
FROM codes
LEFT JOIN
assigned_codes
ON assigned_codes.code = codes.code
WHERE codes.code = #code
This will return all non-assigned codes:
SELECT codes.code
FROM codes
LEFT JOIN
assigned_codes
ON assigned_codes.code = codes.code
WHERE assigned_codes.code IS NULL
There is no pure SQL way to do exactly the thing you want.
In Oracle, you can do the following:
SELECT lvl
FROM (
SELECT level AS lvl
FROM dual
CONNECT BY
level <=
(
SELECT MAX(code)
FROM elements
)
)
LEFT OUTER JOIN
elements
ON code = lvl
WHERE code IS NULL
In PostgreSQL, you can do the following:
SELECT lvl
FROM generate_series(
1,
(
SELECT MAX(code)
FROM elements
)) lvl
LEFT OUTER JOIN
elements
ON code = lvl
WHERE code IS NULL
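A quick sketch applying that to the sample data from Edit 1 (the table name elements carries over from the answer above):
SELECT lvl
FROM   generate_series(1, (SELECT MAX(code) FROM elements)) lvl
LEFT   JOIN elements ON elements.code = lvl
WHERE  elements.code IS NULL
ORDER  BY lvl;
-- with the rows (3, 'three'), (7, 'seven'), (2, 'two') this returns 1, 4, 5, 6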
Contrary to the assertion that this cannot be done using pure SQL, here is a counterexample showing how it can be done. (Note that I didn't say it was easy - it is, however, possible.) Assume the table's name is value_list with columns code and element as shown in the edits (why does everyone forget to include the table name in the question?):
SELECT b.bottom, t.top
FROM (SELECT l1.code - 1 AS top
FROM value_list l1
WHERE NOT EXISTS (SELECT * FROM value_list l2
WHERE l2.code = l1.code - 1)) AS t,
(SELECT l1.code + 1 AS bottom
FROM value_list l1
WHERE NOT EXISTS (SELECT * FROM value_list l2
WHERE l2.code = l1.code + 1)) AS b
WHERE b.bottom <= t.top
AND NOT EXISTS (SELECT * FROM value_list l2
WHERE l2.code >= b.bottom AND l2.code <= t.top);
The two parallel queries in the from clause generate values that are respectively at the top and bottom of a gap in the range of values in the table. The cross-product of these two lists is then restricted so that the bottom is not greater than the top, and such that there is no value in the original list in between the bottom and top.
On the sample data, this produces the range 4-6. When I added an extra row (9, 'nine'), it also generated the range 8-8. Clearly, you also have two other possible ranges for a suitable definition of 'infinity':
-infinity .. MIN(code)-1
MAX(code)+1 .. +infinity
Note that:
If you are using this routinely, there will generally not be many gaps in your lists.
Gaps can only appear when you delete rows from the table (or you ignore the ranges returned by this query or its relatives when inserting data).
It is usually a bad idea to reuse identifiers, so in fact this effort is probably misguided.
However, if you want to do it, here is one way to do so.
This is the same idea that Quassnoi has published.
I just linked all the ideas together in T-SQL-like code.
DECLARE @series TABLE (n int)
DECLARE
    @max_n int,
    @i     int
SET @i = 1
-- max value in the elements table
SELECT @max_n = MAX(code) FROM elements
-- fill the @series table with numbers from 1 to @max_n
WHILE @i <= @max_n BEGIN
    INSERT INTO @series (n) VALUES (@i)
    SET @i = @i + 1
END
-- unassigned codes -- those without a matching row in the elements table
SELECT n
FROM   @series AS series
LEFT   JOIN elements ON elements.code = series.n
WHERE  elements.code IS NULL
EDIT:
This is, of course, not an ideal solution. If you have a lot of elements, or you check for non-existing codes often, this could cause performance issues.