Teradata column not sorting correctly - sql

I'm trying to join two tables by a column, and sort a table by the same column.
Here is some example data from the two tables:
table.x
state
00039
01156
table.y
state
39
1156
How do I join and sort the tables in SQL Assistant?

The simplest solution would be to cast both sides to integer, as @Andrew mentioned. You could use a plain cast, or trycast(...), which returns NULL instead of raising an error when a value cannot be converted:
select *
from x
inner join y on
trycast(x.state as integer) = trycast(y.state as integer)
order by y.state
Old answer (leaving this here for sake of future readers and what you can / can't do):
If you have a recent version of Teradata (you didn't specify it), you also have the LPAD function. Assuming y.state is a number rather than text, it must be cast first, because lpad takes a string argument. If it is already text, omit the cast(...):
select *
from x
inner join y on
x.state = lpad(cast(y.state as varchar(5)), 5, '0')
order by y.state
If you don't have an LPAD function, then some dirty code with substring might come in handy:
select *
from x
inner join y on
x.state = substring('00000' from char_length(cast(y.state as varchar(5))) + 1) || cast(y.state as varchar(5))
order by y.state
The above assumes the numbers have at most 5 digits, as in your sample data. If they can be longer, adjust the padding string and the varchar length accordingly.
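The two ideas above (compare as integers, or pad to a fixed width) can be tried out without a Teradata system. The following is only a sketch: it uses Python's built-in sqlite3 module as a stand-in, so CAST replaces trycast and str.zfill replaces LPAD; the table and column names are taken from the question.

```python
import sqlite3

# In-memory SQLite stand-in for the two Teradata tables (an assumption of
# this sketch; table and column names follow the question).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE x (state TEXT)")
conn.execute("CREATE TABLE y (state TEXT)")
conn.executemany("INSERT INTO x VALUES (?)", [("01156",), ("00039",)])
conn.executemany("INSERT INTO y VALUES (?)", [("1156",), ("39",)])

# Cast both sides to integer so '00039' matches '39', and sort on the
# integer value so 39 comes before 1156 (a text sort would reverse them).
rows = conn.execute(
    """
    SELECT x.state, y.state
    FROM x
    INNER JOIN y ON CAST(x.state AS INTEGER) = CAST(y.state AS INTEGER)
    ORDER BY CAST(y.state AS INTEGER)
    """
).fetchall()

# Plain-Python equivalent of the LPAD variant: left-pad to five characters.
padded = [s.zfill(5) for s in ("39", "1156")]

print(rows)
print(padded)
```

Sorting on the cast value rather than on the raw text is what keeps 39 ahead of 1156 when y.state is stored as a string.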

Related

Postgresql ARRAY_AGG on array only returns first value

In Postgres 10 I'm having an issue converting an integer to a weekday name and grouping all record values via ARRAY_AGG to form a string.
The following subquery only returns the first value in the arrays indexed by timetable_periods.day (which is an integer)
SELECT ARRAY_TO_STRING(ARRAY_AGG((ARRAY['Mon','Tue','Wed','Thu','Fri','Sat','Sun'])[timetable_periods.day]), '-')
FROM timetable_periods
WHERE courses.id = timetable_periods.course_id
GROUP BY timetable_periods.course_id
whereas this shows all days concatenated in a string, as expected:
SELECT ARRAY_TO_STRING(ARRAY_AGG(timetable_periods.day), ', ')
FROM timetable_periods
WHERE courses.id = timetable_periods.course_id
GROUP BY timetable_periods.course_id
E.g. a course has 2 timetable_periods, with day values 0 and 2 (i.e. Monday and Wednesday).
The first query only returns "Tue" instead of "Mon, Wed" (so both an indexing issue and only returning the first day).
The second query returns "0, 2" as expected.
Am I doing something wrong in the use of ARRAY[...] with the weekday names?
Thanks
Update: The queries above are subqueries, with the courses table in the main query's FROM
You should post correct SQL statements. I suspect a JOIN of courses and timetable_periods, but courses is missing in the FROM clause. Furthermore, both queries contain AND followed by GROUP BY - this will not work.
From your writings I guess you want something like:
select
c.id,
string_agg((array['Mon','Tue','Wed','Thu','Fri','Sat','Sun'])[tp.day + 1], ', ') as day_names
from
courses c
inner join timetable_periods tp on c.id = tp.course_id
group by
c.id
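Your attempt to index the day names array was nearly correct, but Postgres arrays are 1-based by default. With day values 0 and 2, index 0 yields NULL (which array_to_string skips) and index 2 yields 'Tue', hence the + 1 above. Text values can be concatenated directly with string_agg.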
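To see why the original query produced only "Tue", here is a small Python sketch that mimics the relevant Postgres behaviour. The helper names pg_index and join_skip_nulls are made up for this illustration; they only imitate a 1-based array lookup and array_to_string without a null replacement string.

```python
DAY_NAMES = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]


def pg_index(values, i):
    """Imitate a default Postgres array lookup: 1-based, None (NULL) when out of range."""
    return values[i - 1] if 1 <= i <= len(values) else None


def join_skip_nulls(items, sep):
    """Imitate array_to_string without a null replacement: NULL entries are dropped."""
    return sep.join(x for x in items if x is not None)


days = [0, 2]  # day values from the question (0 = Monday, 2 = Wednesday)

# Original query: index with the raw day value.
original = join_skip_nulls([pg_index(DAY_NAMES, d) for d in days], "-")

# Corrected query: shift the 0-based day value to a 1-based index.
fixed = join_skip_nulls([pg_index(DAY_NAMES, d + 1) for d in days], ", ")

print(original)
print(fixed)
```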

Get row if a number is inside a string range from a column

My table (with simplified columns) has this structure:
brand  period     fuel
Audi   2008-2016  G
BWM    2018-      D
The user will give me the matriculation year of the car and I want to look inside all the periods and find if the given year is inside the range.
For example 2010 would return me the Audi and 2020 should return the BWM.
My current query looks like the following:
SELECT *
FROM cars
WHERE brand='Audi' AND fuel='G' AND
<userIntroducedYear> BETWEEN <year1> AND <year2>;
My guess is that I should be able to get this with some subquery instead of the BETWEEN but I'm a bit lost doing it.
Thanks in advance.
You should fix your data model so you are:
Storing number values as numbers (years are numbers, not strings).
Storing only one value in a scalar column.
Postgres offers a range data type as well, which does exactly what you want.
For this, though, I'll just parse period into the period start and end years using split_part() and a lateral join to do the work in the FROM clause:
select c.*
from cars c cross join lateral
(values (split_part(c.period, '-', 1)::int, nullif(split_part(c.period, '-', 2), '')::int)
) v(period_start, period_end)
where c.brand = 'Audi' and c.fuel = 'G' and
v.period_start <= :userIntroducedYear and
(v.period_end >= :userIntroducedYear or v.period_end is null);
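The parsing logic in that query can be sketched outside the database as well. The Python below is only an illustration of the same rule (split on the dash, treat a missing end year as open-ended); the function names parse_period and year_in_period are hypothetical.

```python
from typing import Optional, Tuple


def parse_period(period: str) -> Tuple[int, Optional[int]]:
    """Split 'YYYY-YYYY' or 'YYYY-' into (start, end); end is None when open-ended."""
    start, _, end = period.partition("-")
    return int(start), (int(end) if end else None)


def year_in_period(year: int, period: str) -> bool:
    """Same test as the WHERE clause: start <= year and (end is NULL or year <= end)."""
    start, end = parse_period(period)
    return start <= year and (end is None or year <= end)


print(year_in_period(2010, "2008-2016"))
print(year_in_period(2020, "2018-"))
```

Storing the start and end years in their own numeric columns, as suggested above, would make this parsing step unnecessary.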

How to check if a float is between multiple ranges in Postgres?

I'm trying to write a query like this:
SELECT * FROM table t
WHERE ((long_expression BETWEEN -5 AND -2) OR
(long_expression BETWEEN 0 AND 2) OR
(long_expression BETWEEN 4 and 6))
Where long_expression is approximately equal to this:
(((t.s <#> (SELECT s FROM user WHERE user.user_id = $1)) / (SELECT COUNT(DISTINCT cluster_id) FROM cluster) * -1) + 1)
t.s and s are the CUBE datatypes and <#> is the indexed distance operator.
I could just repeat this long expression multiple times in the body, but this would be extremely verbose. An alternative might be to save it in a variable somehow (with a CTE?), but I think this might remove the possibility of using an index in the WHERE clause?
I also found int4range and numrange, but I don't believe they would work here either, because the distance operator returns float8's, not integer or numerics.
You can use a lateral join:
SELECT t.*
FROM table t CROSS JOIN LATERAL
(VALUES (long_expression)) v(x)
WHERE ((v.x BETWEEN -5 AND -2) OR
(v.x BETWEEN 0 AND 2) OR
(v.x BETWEEN 4 and 6)
);
Of course, a CTE or subquery could be used as well; I like lateral joins because they make it easy to define multiple expressions that depend on previous values.
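The underlying idea, computing the long expression once under a name and then testing that name against each range, looks like this as a Python sketch. The RANGES list and the function in_any_range are names invented for the illustration; the bounds are those from the question and are inclusive, as with BETWEEN.

```python
# Inclusive (low, high) bounds from the question's three BETWEEN clauses.
RANGES = [(-5, -2), (0, 2), (4, 6)]


def in_any_range(x: float, ranges=RANGES) -> bool:
    """True when x falls inside at least one inclusive range."""
    return any(low <= x <= high for low, high in ranges)


# The value is computed once (here just a literal) and reused for every range.
score = 1.5
print(in_any_range(score))
```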

SQL group by number and replace characters

I have data stored in my database for mobile numbers.
I want to group by the column number in the database.
For example, some numbers may show 44123456789 and 0123456789 which is the same number. How can I group these together?
SELECT DIGITS(column_name) FROM table_name
You could apply this function to the column in the database, assign the result to a variable, and then match its digits against the other numbers.
Not sure it really suits you, but you could build this kind of subquery:
SELECT ta.`phone_nbr`,
COALESCE(list.`normalized_nbr`,ta.`phone_nbr`) AS nbr
FROM (
SELECT
t.`phone_nbr`,
SUBSTRING(t.`phone_nbr`,2) AS normalized_nbr
FROM `your_table` t
WHERE LEFT(t.`phone_nbr`,1) = '0'
UNION
SELECT
t.`phone_nbr`,
sub.`filter_nbr` AS normalized_nbr
FROM `your_table` t,
( SELECT
SUBSTRING(t2.`phone_nbr`,2) AS filter_nbr
FROM `your_table` t2
WHERE LEFT(t2.`phone_nbr`,1) = '0') sub
WHERE LEFT(t.`phone_nbr`,1) != '0'
AND t.`phone_nbr` LIKE CONCAT('%',sub.`filter_nbr`)
) list
RIGHT OUTER JOIN `your_table` ta
ON ta.`phone_nbr` = list.`phone_nbr`
It will return you a list of phone numbers with their "normalized" number, i.e. with the 0 or international prefix removed if there is a duplicate match, and the raw number otherwise.
You can then use a GROUP BY clause on the nbr field and join on phone_nbr for the rest of your query.
It has some limits, as it can unfortunately group similar stripped numbers. +49123456789, +44123456789 and 0123456789 will unfortunately have the same normalized number.
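The matching rule and its limitation are easier to see on a handful of values. The following Python sketch imitates the subquery's logic: numbers with a leading 0 define the stripped forms, and other numbers adopt a stripped form when they end with it. The function names are hypothetical, and unlike the SQL (which would return one row per match) the sketch keeps only the first matching suffix.

```python
from collections import defaultdict


def normalize(numbers):
    """Map each raw number to its normalized form using suffix matching."""
    # Numbers with a leading 0 define the stripped forms (inner subquery).
    stripped = {n[1:] for n in numbers if n.startswith("0")}
    result = {}
    for n in numbers:
        if n.startswith("0"):
            result[n] = n[1:]
        else:
            # Suffix match, like LIKE CONCAT('%', filter_nbr); keep the raw
            # number when nothing matches (the COALESCE fallback).
            match = next((s for s in sorted(stripped) if n.endswith(s)), None)
            result[n] = match if match is not None else n
    return result


def group_numbers(numbers):
    """Group raw numbers by their normalized form."""
    groups = defaultdict(list)
    for raw, norm in normalize(numbers).items():
        groups[norm].append(raw)
    return dict(groups)


print(group_numbers(["44123456789", "0123456789", "+49123456789", "33999"]))
```

Both 44123456789 and +49123456789 end up in the same group as 0123456789, which is the over-grouping described above.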

SQL return exactly one row or null in a select sub-query

In Oracle, is it possible to have a sub-query within a select statement that returns a column if exactly one row is returned by the sub-query and null if none or more than one row is returned by the sub-query?
Example:
SELECT X,
Y,
Z,
(SELECT W FROM TABLE2 WHERE X = TABLE1.X) /* but return null if 0 or more than 1 rows is returned */
FROM TABLE1;
Thanks!
How about going about it in a different way? A simple LEFT OUTER JOIN with a subquery should do what you want:
SELECT T1.X
,T1.Y
,T1.Z
,T2.W
FROM TABLE1 T1
LEFT OUTER JOIN (
SELECT X
,MIN(W) AS W
FROM TABLE2
GROUP BY X
HAVING COUNT(X) = 1
) T2 ON T2.X = T1.X;
This will only return items that have exactly 1 instance of X, and LEFT OUTER JOIN it back to the table when appropriate (leaving the non-matches NULL).
This also uses only ANSI-standard constructs, so it should be portable, and the derived table is typically aggregated once rather than once per outer row.
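The "exactly one row, otherwise NULL" behaviour can be checked on a toy data set. This is a sketch only: it uses Python's sqlite3 module rather than Oracle, invents three sample rows, and groups on X alone with MIN(W) so that an X with two different W values counts as two rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE TABLE1 (X INTEGER, Y TEXT, Z TEXT);
    CREATE TABLE TABLE2 (X INTEGER, W TEXT);
    INSERT INTO TABLE1 VALUES (1, 'y1', 'z1'), (2, 'y2', 'z2'), (3, 'y3', 'z3');
    -- X=1 has exactly one match, X=2 has two, X=3 has none.
    INSERT INTO TABLE2 VALUES (1, 'only'), (2, 'first'), (2, 'second');
    """
)

# The derived table keeps only X values that occur exactly once in TABLE2;
# the LEFT OUTER JOIN leaves W as NULL (None) for every other row of TABLE1.
rows = conn.execute(
    """
    SELECT T1.X, T2.W
    FROM TABLE1 T1
    LEFT OUTER JOIN (
        SELECT X, MIN(W) AS W
        FROM TABLE2
        GROUP BY X
        HAVING COUNT(*) = 1
    ) T2 ON T2.X = T1.X
    ORDER BY T1.X
    """
).fetchall()

print(rows)
```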
Besides a CASE solution or rewriting the inline subquery as an outer join, this will work, if you can apply an aggregate function (MIN or MAX) on the W column:
SELECT X,
Y,
Z,
(SELECT MIN(W) FROM TABLE2 WHERE X = TABLE1.X HAVING COUNT(*) = 1) AS W
FROM TABLE1;
SELECT
X, Y, Z, (SELECT MAX(W) FROM TABLE2 WHERE X = TABLE1.X HAVING COUNT(*) = 1) AS W
FROM
TABLE1;
My answer is: don't use subselects (unless you are sure ...).
There is no need for a subselect here, and, as PlantTheIdea mentioned, it is not a good idea, for two reasons.
Explanation:
A subselect means:
one select for each row of the primary select's result set, i.e. if you get 1000 rows, you also get 1000 (small) select statements in your DB system (ignoring the optimizer here)
and(!)
with a subselect you have a good chance of hiding (or overriding) a serious database or query problem. That is: you are expecting either no row (NULL) or exactly one row (both easily handled with a [left outer] join). If the subselect returns more than one row, something is wrong, and the SQL error points that out.
The "HAVING COUNT(X) = 1" is of course correct, but it leaves a small (or not so small) question open: "why is there more than one row in the first place?"
I have spent hours of my life finding workarounds like this, only to end up at "don't do it unless you are really sure ...".
I see that as different from a "having" like this
...
HAVING date=max(date) -- depends on sql dialect
or
where date = (select max(date) from same_table)
With this last example I again want to point out: if you get more than one row here (e.g. two rows both from today ;.) you have a DB problem; you could use a timestamp instead, for example.