How to put together data from different tables without duplicating - sql

I'm feeling a bit stupid now, but I can't seem to make it happen. I have some tables with data, and the problem with one SELECT is that the data is sometimes duplicated. (Sorry, English is not my first language; ask if anything is unclear.)
SELECT (IM_FAKTUROR.FAKT_NUMMER || ' ' || IM_FAKTURA_GRUPPER.FAKT_TYP) AS 'ProjektNrNamn',
But sometimes those two tables/columns have exactly the same data and in those cases I only want the data from one of them, not both. How to?
If there is different data in the two I want all info.

Use a case expression. If the two columns have the same value, just return one of them. Else return both of them:
SELECT case when IM_FAKTUROR.FAKT_NUMMER = IM_FAKTURA_GRUPPER.FAKT_TYP
then IM_FAKTUROR.FAKT_NUMMER
else (IM_FAKTUROR.FAKT_NUMMER || ' ' || IM_FAKTURA_GRUPPER.FAKT_TYP)
end AS 'ProjektNrNamn',
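A minimal sketch of the case expression in action, run through Python's built-in sqlite3 module. The single table `demo` and its sample rows are made up for illustration; the real query would select from the joined IM_FAKTUROR and IM_FAKTURA_GRUPPER tables.

```python
import sqlite3


def project_labels(rows):
    """Return one label per row, collapsing the two columns when they are equal."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical stand-in for the joined tables from the question.
    conn.execute("CREATE TABLE demo (fakt_nummer TEXT, fakt_typ TEXT)")
    conn.executemany("INSERT INTO demo VALUES (?, ?)", rows)
    query = """
        SELECT CASE WHEN fakt_nummer = fakt_typ
                    THEN fakt_nummer
                    ELSE fakt_nummer || ' ' || fakt_typ
               END AS ProjektNrNamn
        FROM demo
        ORDER BY rowid
    """
    labels = [row[0] for row in conn.execute(query)]
    conn.close()
    return labels


labels = project_labels([("1001", "1001"), ("1002", "Bygg")])
print(labels)  # ['1001', '1002 Bygg']
```

If either column can be NULL, the comparison is not true and the ELSE branch runs; in SQLite the concatenation then yields NULL, so wrapping the columns in COALESCE may be needed.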

Try this:
SELECT DISTINCT ProjektNrNamn FROM
(
SELECT (IM_FAKTUROR.FAKT_NUMMER || ' ' || IM_FAKTURA_GRUPPER.FAKT_TYP) AS 'ProjektNrNamn'
) as t

Try SELECT DISTINCT:
SELECT DISTINCT
IM_FAKTUROR.FAKT_NUMMER || ' ' || IM_FAKTURA_GRUPPER.FAKT_TYP AS 'ProjektNrNamn'
FROM ...
But this answer assumes that the project name is the only thing in your select list. If you have other columns, it gets more complicated.

SQL How to make nested query with substring/case statement/ trim

I'm new to SQL and trying to understand nested queries and how to use them. I have a substring, a case statement, and a trim statement that I'm trying to put together, but I'm unsure how. The substring has to be done first, then the case statement, then the trim. This is what I have at the moment, but I'm unsure how to get it working. The code uses random names/tables as an example:
SELECT dtXYZ.*
FROM
(
SELECT dt,
SUBSTRING_INDEX(SUBSTRING_INDEX(dt, ..................... ) as lioness,
SUBSTRING_INDEX(SUBSTRING_INDEX(dt, .....................) as tiger,
SUBSTRING_INDEX(dt, .................) as bear
FROM Animaltab
) dtXYZ
SELECT
CASE WHEN length(bear) = 4 THEN bear
ELSE concat('0', bear)
END AS bear_corr,
CASE WHEN length(lion) = 7 THEN lioness
ELSE concat('0', lioness)
END AS lion_corr
trim(lion_corr) || '_' || trim(tiger) || '_' || trim(bear_corr) as new_imp_animal
Spark supports CTEs: https://spark.apache.org/docs/latest/sql-ref-syntax-qry-select-cte.html
This also works on Databricks; see "Common Table Expressions (CTEs) in Databricks and Spark".
You can chain them like this:
WITH dtXYZ(dt,lioness,tiger,bear) AS ( SELECT dt,
SUBSTRING_INDEX(SUBSTRING_INDEX(dt, ..................... ) as lioness,
SUBSTRING_INDEX(SUBSTRING_INDEX(dt, .....................) as tiger,
SUBSTRING_INDEX(dt, .................) as bear
FROM Animaltab),
dtcorrected (dt,bear_corr,lion_corr,tiger) as (
SELECT
dt,
CASE WHEN length(bear) = 4 THEN bear
ELSE concat('0', bear)
END AS bear_corr,
CASE WHEN length(lioness) = 7 THEN lioness
ELSE concat('0', lioness)
END AS lion_corr
,tiger
FROM dtXYZ)
SELECT
dt,
trim(lion_corr) || '_' || trim(tiger) || '_' || trim(bear_corr) as new_imp_animal FROM dtcorrected
Order of operations can be tricky in SQL if you're used to ordering things in a procedure. As nbk commented, CTEs (Common Table Expressions) are your best bet. CTEs are defined with the 'with' keyword and are very similar to nested subqueries (you could write the same query nested if you wanted), but they are better suited to cases like this one, where the nesting structure of the code doesn't mirror the nesting of the data. I always use CTEs if I'm joining tables that each need independent grouping or filtering. The SQL in the parentheses essentially creates a view, and the outer SQL is a second, higher-order select statement that creates the result set. If I'm working with hierarchical data (parent, child, grandchild), I'll nest the query to follow that path, but usually a CTE makes it easier to organize your ideas. Here's how that would work:
with dtXYZ as
(
SELECT dt,
SUBSTRING_INDEX(SUBSTRING_INDEX(dt, ..................... ) as lioness,
SUBSTRING_INDEX(SUBSTRING_INDEX(dt, .....................) as tiger,
SUBSTRING_INDEX(dt, .................) as bear
FROM Animaltab
)
SELECT
CASE WHEN length(bear) = 4 THEN bear
ELSE concat('0', bear)
END AS bear_corr,
CASE WHEN length(lioness) = 7 THEN lioness
ELSE concat('0', lioness)
END AS lion_corr,
trim(lion_corr) || '_' || trim(tiger) || '_' || trim(bear_corr) as new_imp_animal
from
dtXYZ
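As for 'order of operations': in dialects that support lateral column aliases (recent Spark and Databricks SQL versions, for example), case expressions and functions in a select list can be referenced by later parts of the same select list as inputs. Many other databases do not allow this, and there you need one more CTE or subquery level before you can use lion_corr and bear_corr. Things can get hairy when you use 'if' logic that resolves to illogical or error-causing conditions, but otherwise I've had no issues with having many parts of a select refer to each other. It's an excellent way to test out nesting functions.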
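The substring, then case, then trim ordering can be sketched as a chain of CTEs, one per step. This is only an illustration run on SQLite through Python's sqlite3 module, not on Spark: SQLite has no SUBSTRING_INDEX, so substr/instr are swapped in, and the '|'-delimited layout of dt and the sample rows are assumptions, since the original expressions are elided.

```python
import sqlite3

QUERY = """
WITH split1 AS (              -- step 1a: cut off the first field
    SELECT dt,
           substr(dt, 1, instr(dt, '|') - 1) AS lioness,
           substr(dt, instr(dt, '|') + 1)    AS rest
    FROM Animaltab
),
dtXYZ AS (                    -- step 1b: split the remainder in two
    SELECT dt, lioness,
           substr(rest, 1, instr(rest, '|') - 1) AS tiger,
           substr(rest, instr(rest, '|') + 1)    AS bear
    FROM split1
),
dtcorrected AS (              -- step 2: pad short values with a leading zero
    SELECT dt, tiger,
           CASE WHEN length(bear) = 4 THEN bear
                ELSE '0' || bear END AS bear_corr,
           CASE WHEN length(lioness) = 7 THEN lioness
                ELSE '0' || lioness END AS lion_corr
    FROM dtXYZ
)
SELECT dt,                    -- step 3: trim and glue the parts together
       trim(lion_corr) || '_' || trim(tiger) || '_' || trim(bear_corr)
           AS new_imp_animal
FROM dtcorrected
ORDER BY dt
"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Animaltab (dt TEXT)")
# Hypothetical sample rows in an assumed 'lioness|tiger|bear' layout.
conn.executemany(
    "INSERT INTO Animaltab VALUES (?)",
    [("123456| tig |123",), ("7654321|cat|4321",)],
)
animals = [row[1] for row in conn.execute(QUERY)]
conn.close()
print(animals)  # ['0123456_tig_0123', '7654321_cat_4321']
```

Each CTE only reads columns produced by the one before it, so no step depends on an alias defined in the same select list.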

DB2 - IN clause with multiple values

The following code works fine when the column generated_key returns a single value:
WHERE code IN ( SELECT generated_key FROM List_agg )
CODE|generated_key
----|-------------
EU00100ST10000016|EU00100ST10000016
But when the column generated_key contains more than one value, it returns 0 rows:
CODE|generated_key
----|-------------
EU00100ST10000016|EU00100ST10000016, EU00100ST10000017
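If you need to compare against a delimited list, use delimited comparisons:
WHERE EXISTS (SELECT 1
FROM list_agg
WHERE ',' || generated_key || ',' LIKE '%,' || code || ',%'
)
If the list is separated by a comma plus a space, as in the sample data, strip the spaces first (for example with REPLACE) or include the space in the delimiters.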
The "list_agg" name suggests that you are aggregating values from another query. If so, you might be able to use IN with no aggregation. But your question doesn't have enough details to know if that is really the case.
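To see why plain IN returns nothing while the delimited comparison matches, here is a small sketch using Python's sqlite3 module rather than DB2. The `items` table is a hypothetical stand-in for whatever table owns `code`, and `replace()` is added on the assumption that the list uses ", " as separator, as in the sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema: "items" stands in for the table that owns "code".
conn.execute("CREATE TABLE items (code TEXT)")
conn.execute("CREATE TABLE list_agg (generated_key TEXT)")
conn.executemany(
    "INSERT INTO items VALUES (?)",
    [("EU00100ST10000016",), ("EU00100ST10000017",), ("EU00100ST10000018",)],
)
conn.execute(
    "INSERT INTO list_agg VALUES ('EU00100ST10000016, EU00100ST10000017')"
)

# Plain IN compares code against the whole list string, so nothing matches.
in_rows = conn.execute(
    "SELECT code FROM items WHERE code IN (SELECT generated_key FROM list_agg)"
).fetchall()

# Delimited comparison: wrap both sides in commas so only whole entries match.
# replace() drops the space that follows each comma in the sample data.
like_rows = conn.execute(
    """
    SELECT code
    FROM items
    WHERE EXISTS (SELECT 1
                  FROM list_agg
                  WHERE ',' || replace(generated_key, ' ', '') || ','
                        LIKE '%,' || code || ',%')
    ORDER BY code
    """
).fetchall()
conn.close()

print(in_rows)    # []
print(like_rows)  # [('EU00100ST10000016',), ('EU00100ST10000017',)]
```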

SQL: count occurrences for each row

I have an SQL table called Codes with a primary column code of type String.
I also have another table called Items with a column codestring also of type String. This entry always contains a string with some of the codes of the above table separated by spaces.
Now I want to get all codes and their number of Items containing the respective code. Can I do that?
Codes:
code|...
----|---
"A0A"|
"A0B"|
...|
Items:
...|codestring
----|---------
|"A0A C2B F1K"
|"A0C D2S H3K"
|...
Output:
Codes:
code|...|noOfItems
----|---|---------
"A0A"|...|5
"A0B"|...|10
...|...|...
Assuming all the codes are distinct (or at least no code is a substring of another code), you can use the LIKE operator:
(untested)
SELECT codes.code, count(*)
FROM codes LEFT JOIN items ON items.codestring LIKE '%' + codes.code + '%'
GROUP BY codes.code;
You have a horrible data format and should fix it. The mapping to codes should have a separate row for each code.
If the codes are all three characters, then L Scott Johnson's answer is close enough. Otherwise, you should take the delimiter into consideration:
SELECT c.code, count(i.codestring)
FROM codes c LEFT JOIN
items i
ON ' ' || i.codestring || ' ' LIKE '% ' || c.code || ' %'
GROUP BY c.code;
Note other fixes to the code:
The concatenation operator is the standard ||.
The count() will return 0 if there are no matches.
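As a quick check of the delimiter-aware join, the following sketch runs it on SQLite through Python's sqlite3 module. The sample codes and items are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE codes (code TEXT PRIMARY KEY)")
conn.execute("CREATE TABLE items (codestring TEXT)")
# Invented sample data: A0B appears in no item.
conn.executemany("INSERT INTO codes VALUES (?)", [("A0A",), ("A0B",), ("C2B",)])
conn.executemany(
    "INSERT INTO items VALUES (?)",
    [("A0A C2B F1K",), ("A0C D2S H3K",), ("C2B A0A",)],
)

counts = conn.execute(
    """
    SELECT c.code, count(i.codestring) AS noOfItems
    FROM codes c LEFT JOIN
         items i
         ON ' ' || i.codestring || ' ' LIKE '% ' || c.code || ' %'
    GROUP BY c.code
    ORDER BY c.code
    """
).fetchall()
conn.close()

print(counts)  # [('A0A', 2), ('A0B', 0), ('C2B', 2)]
```

Counting `i.codestring` instead of `*` is what makes the unmatched code report 0: the LEFT JOIN still produces one row for it, but with a NULL codestring, which count() skips.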

How to add a title and substring

I need to get a list of names as per the following format
"Mr."+first name initial+last name+"."
There is only one table for this
salesperson (f_name, l_name)
What I have been trying is:
SELECT 'Mr.' ||' ' || SUBSTRING(f_name,1,1) || ' ' || l_name ||’.’||
FROM salesperson;
It works without the substring or left, but not if I include them.
Use CONCAT instead of the || operator to concatenate strings in MySQL. By default, MySQL interprets || as a logical OR (unless the PIPES_AS_CONCAT SQL mode is enabled), so your query does not do what you expect.
SELECT CONCAT('Mr.',' ',SUBSTRING(f_name,1,1),' ',l_name,'.')
FROM salesperson;
Oracle solution
SELECT 'Mr.'||' '||SUBSTR(f_name,1,1)||' '||l_name||'.'
FROM salesperson;
It is better practice anyway to fetch the name data in full and then format it in the view layer of your application, using a language better suited to string manipulation. This also makes your code more reusable.
That being said, use this:
SELECT CONCAT("Mr. ",SUBSTRING( f_name, 1, 1 ) ," ",l_name,".") FROM salesperson
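The suggestion to format in the application can be sketched as follows, assuming a Python application and using SQLite (via sqlite3) as a stand-in database. The helper `format_salesperson` and the sample names are hypothetical; the SQL variant is included only to show that both approaches produce the same strings.

```python
import sqlite3


def format_salesperson(f_name, l_name):
    """Build 'Mr. <first initial> <last name>.' from the raw name parts."""
    return f"Mr. {f_name[:1]} {l_name}."


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE salesperson (f_name TEXT, l_name TEXT)")
conn.executemany(
    "INSERT INTO salesperson VALUES (?, ?)",
    [("John", "Smith"), ("Anders", "Berg")],
)

# Option 1: fetch the raw columns and format them in the application.
app_names = [
    format_salesperson(f_name, l_name)
    for f_name, l_name in conn.execute(
        "SELECT f_name, l_name FROM salesperson ORDER BY l_name"
    )
]

# Option 2: let the database build the string (SQLite uses || and substr).
sql_names = [
    row[0]
    for row in conn.execute(
        "SELECT 'Mr. ' || substr(f_name, 1, 1) || ' ' || l_name || '.' "
        "FROM salesperson ORDER BY l_name"
    )
]
conn.close()

print(app_names)  # ['Mr. A Berg.', 'Mr. J Smith.']
```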

How to join tables on regex

Say I have two tables msg for messages and mnc for mobile network codes.
They share no relations. But I want to join them
SELECT msg.message,
msg.src_addr,
msg.dst_addr,
mnc.name,
FROM "msg"
JOIN "mnc"
ON array_to_string(regexp_matches(msg.src_addr || '+' || msg.dst_addr, '38(...)'), '') = mnc.code
But query fails with error:
psql:marketing.sql:28: ERROR: argument of JOIN/ON must not return a set
LINE 12: ON array_to_string(regexp_matches(msg.src_addr || '+' || msg...
Is there a way to do such join? Or am I moving wrong way?
A very odd way to join. Every match on one side is combined with every row from the other table ...
regexp_matches() is probably the wrong function for your purpose. You want a simple regular expression match (~). Actually, the LIKE operator will be faster:
Presumably fastest with LIKE
SELECT msg.message
, msg.src_addr
, msg.dst_addr
, mnc.name
FROM mnc
JOIN msg ON msg.src_addr LIKE ('%38' || mnc.code || '%')
OR msg.dst_addr LIKE ('%38' || mnc.code || '%')
WHERE length(mnc.code) = 3;
In addition, you only want mnc.code of exactly 3 characters.
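A small sketch of the LIKE-based join, run on SQLite through Python's sqlite3 module instead of PostgreSQL. The rows in `msg` and `mnc` are invented; the short code '06' is there to show the effect of the length filter.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE msg (message TEXT, src_addr TEXT, dst_addr TEXT)")
conn.execute("CREATE TABLE mnc (code TEXT, name TEXT)")
# Invented sample data.
conn.executemany(
    "INSERT INTO msg VALUES (?, ?, ?)",
    [
        ("hello", "380501234567", "380671234567"),
        ("bye", "441234567890", "380679999999"),
    ],
)
conn.executemany(
    "INSERT INTO mnc VALUES (?, ?)",
    [("050", "Operator A"), ("067", "Operator B"), ("06", "Short code")],
)

joined = conn.execute(
    """
    SELECT msg.message, mnc.name
    FROM mnc
    JOIN msg ON msg.src_addr LIKE ('%38' || mnc.code || '%')
             OR msg.dst_addr LIKE ('%38' || mnc.code || '%')
    WHERE length(mnc.code) = 3
    ORDER BY msg.message, mnc.name
    """
).fetchall()
conn.close()

print(joined)
# [('bye', 'Operator B'), ('hello', 'Operator A'), ('hello', 'Operator B')]
```

A message whose source and destination belong to different networks shows up once per matching network, which is the "every match is combined" behaviour mentioned at the top of the answer.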
With regexp match
You could write the same with regular expressions but it will most definitely be slower. Here is a working example close to your original:
SELECT msg.message
, msg.src_addr
, msg.dst_addr
, mnc.name
FROM mnc
JOIN msg ON (msg.src_addr || '+' || msg.dst_addr) ~ ('38' || mnc.code)
AND length(mnc.code) = 3;
This also requires msg.src_addr and msg.dst_addr to be NOT NULL.
The second query demonstrates how the additional check length(mnc.code) = 3 can go into the JOIN condition or a WHERE clause. Same effect here.
With regexp_matches()
You could make this work with regexp_matches():
SELECT msg.message
, msg.src_addr
, msg.dst_addr
, mnc.name
FROM mnc
JOIN msg ON EXISTS (
SELECT *
FROM regexp_matches(msg.src_addr ||'+'|| msg.dst_addr, '38(...)', 'g') x(y)
WHERE y[1] = mnc.code
);
But it will be slow in comparison.
Explanation:
Your regexp_matches() expression just returns an array of all captured substrings of the first match. As you only capture one substring (one pair of brackets in your pattern), you will exclusively get arrays with one element.
You get all matches with the additional "globally" switch 'g' - but in multiple rows. So you need a sub-select to test them all (or aggregate). Put that in an EXISTS - semi-join and you arrive at what you wanted.
Maybe you can report back with a performance test of all three?
Use EXPLAIN ANALYZE for that.
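The difference between "captures of the first match" and "all matches" from the explanation above can be illustrated outside the database. The sketch below uses Python's re module as an analogy only; it is not PostgreSQL's regexp_matches(), and the sample addresses are invented.

```python
import re

PATTERN = re.compile(r"38(...)")


def first_match_codes(addresses):
    """Captured groups of the first match only (like regexp_matches without 'g')."""
    match = PATTERN.search(addresses)
    return list(match.groups()) if match else []


def all_match_codes(addresses):
    """Captured group of every non-overlapping match (like the 'g' switch)."""
    return PATTERN.findall(addresses)


# Same concatenation as in the query: src_addr || '+' || dst_addr
addresses = "380501234567" + "+" + "380671234567"

print(first_match_codes(addresses))  # ['050']
print(all_match_codes(addresses))    # ['050', '067']
```

With only the first match, the code from the destination address ('067' here) is never compared against mnc.code, which is why the EXISTS variant needs the 'g' switch.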
Your immediate problem is that regexp_matches could return one or more rows.
Try using "substring" instead, which extracts a substring given a regex pattern.
SELECT msg.message,
msg.src_addr,
msg.dst_addr,
mnc.name
FROM "msg"
JOIN "mnc"
ON substring(msg.src_addr || '+' || msg.dst_addr from '38(...)') = mnc.code