Oracle SQL - NOT NULL filter is not working

I am trying to retrieve the monetary amount associated with project IDs; however, I only want rows where a project ID exists (is not blank).
When I run my SQL code below...
SELECT project_id, monetary_amount, journal_line_date
FROM PS_JRNL_LN
where project_id is not null
and journal_line_date BETWEEN to_date ('2020/01/01','yyyy/mm/dd')
AND TO_DATE ('2020/03/04','yyyy/mm/dd')
This query runs; however, I am still getting blank values in my result.

You don't have NULLs but blank spaces. Filter on the trimmed value in your query, as below (in Oracle, TRIM of a string that is all spaces yields NULL):
SELECT project_id, monetary_amount, journal_line_date
FROM PS_JRNL_LN
WHERE project_id IS NOT NULL
  AND TRIM(project_id) IS NOT NULL
  AND journal_line_date BETWEEN TO_DATE('2020/01/01', 'yyyy/mm/dd')
                            AND TO_DATE('2020/03/04', 'yyyy/mm/dd')

Here is something that can help you find out what is happening in the project_id column. (Most likely, a bunch of ' ' values, meaning a non-empty string consisting of a single space.)
select project_id, dump(project_id)
from ps_jrnl_ln
where ltrim(project_id, chr(32) || chr(9)) is null
and project_id is not null
;
DUMP shows you exactly what is stored in your table. 32 is the ASCII code for a single space; 9 (or 09) is the code for horizontal tab. I expect you will get rows where the DUMP column shows a single character, with code 32. But - who knows; you may find other things as well.
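For instance, a hedged illustration of what such a row might look like (the value shown is made up; Typ=1 means VARCHAR2, while a CHAR column would show Typ=96):
PROJECT_ID DUMP(PROJECT_ID)
---------- --------------------
           Typ=1 Len=1: 32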
That will help you understand what's in the column. (You may also check describe ps_jrnl_ln - you may find out that the column is declared not null!!!)
If you find a bunch of rows where the project id is a single space, of course, in your actual query you will have to change
where project_id is not null
to
where ltrim(project_id, chr(32) || chr(9)) is not null
Or, perhaps, if indeed a single space is used as a placeholder for null:
where project_id != ' '

A few things you can implement in your table design to prevent the problem in the first place, rather than struggling with the data afterwards (see the sketch after this list):
Add a NOT NULL constraint to the column.
Add a CHECK constraint to prevent unwanted characters like whitespace and only allow the data you want to load.
If you don't want a check constraint, then handle it while loading the data using TRIM.
If appropriate, make the PROJECT_ID column the primary key; that would implicitly disallow NULL values. Usually an ID column in a table suggests a primary key, but that could vary in your use case.
If you are not allowed to alter the design in any of these ways, then at least handle the data at insertion time at the application level, where you might be taking it as input.
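A minimal sketch of the first two points, assuming the table can be altered and the existing blank rows have been cleaned up first (the constraint name is made up):
ALTER TABLE ps_jrnl_ln MODIFY (project_id NOT NULL);
ALTER TABLE ps_jrnl_ln
  ADD CONSTRAINT chk_project_id_not_blank
  CHECK (TRIM(project_id) IS NOT NULL);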

Inner join that journal table to the source of truth for project IDs. Assuming there are no "blank" IDs in that table, you won't get "blanks" in your result.
e.g.
SELECT j.project_id, j.monetary_amount, j.journal_line_date
FROM PS_JRNL_LN J
INNER JOIN PROJECT_MASTER P ON j.project_id = p.id /* should remove "blanks" */
where j.journal_line_date >= to_date ('2020/01/01','yyyy/mm/dd')
and j.journal_line_date < TO_DATE ('2020/03/05','yyyy/mm/dd')
Note also that I never use BETWEEN for date ranges; the above pattern using >= and < is more reliable (as it works regardless of the time precision of the data).

Try using the filter condition:
trim(project_id) is not null
(In Oracle, '' is the same as NULL, so a comparison like ltrim(rtrim(project_id)) <> '' would never be true.)

Related

SQL Concatenate column values and store in an extra column

I am using SQL Server 2019 (v15.0.2080.9) and I've got a simple task but somehow I am not able to solve it...
I have a little database with one table containing a first name and a last name column
CREATE TABLE [dbo].[person]
(
[first_name] [nchar](200) NULL,
[last_name] [nchar](200) NULL,
[display_name] [nchar](400) NULL
) ON [PRIMARY]
GO
and I want to store the combination of first name and last name, with an extra whitespace in between, in the third column (yes, I really have to do that...).
So I thought I might use the CONCAT function
UPDATE [dbo].[person]
SET display_name = CONCAT(first_name, ' ', last_name)
But my display_name column is only showing me the first name... so what's wrong with my SQL?
Kind regards
Sven
Your method should work and does work. The issue, though, is that the data types are nchar() instead of nvarchar(). That means they are padded with spaces, and the last name starts at position 201 in the string.
Just fix the data type.
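A hedged sketch of that fix (assuming the trailing padding that nchar has already stored should simply be removed; ALTER COLUMN keeps the stored spaces, so an explicit RTRIM pass is needed afterwards):
ALTER TABLE [dbo].[person] ALTER COLUMN [first_name] [nvarchar](200) NULL;
ALTER TABLE [dbo].[person] ALTER COLUMN [last_name] [nvarchar](200) NULL;
ALTER TABLE [dbo].[person] ALTER COLUMN [display_name] [nvarchar](400) NULL;
UPDATE [dbo].[person]
SET first_name = RTRIM(first_name),
    last_name = RTRIM(last_name);
After that, the original UPDATE with CONCAT produces the expected result.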
In addition, I would suggest that you use a computed column (dropping the existing display_name column first, since your table already declares it):
alter table person drop column display_name;
alter table person add display_name as (concat(first_name, ' ', last_name));
This ensures that the display_name is always up-to-date -- even when the names change.
Here is a db<>fiddle.
As a note: char() and nchar() are almost never appropriate. The one exception is when you have fixed length strings, such as state or country abbreviations or account codes. In almost all cases, you want varchar() or nvarchar().

How to get unique values from each column based on a condition?

I have been trying to find an optimal solution to select unique values from each column. My problem is that I don't know the column names in advance, since different tables have different numbers of columns. So first I have to find the column names, and I could use the query below to do it:
select column_name from information_schema.columns
where table_name='m0301010000_ds' and column_name like 'c%'
Sample output for column names:
c1, c2a, c2b, c2c, c2d, c2e, c2f, c2g, c2h, c2i, c2j, c2k, ...
Then I would use the returned column names to get the unique/distinct values in each column, and not just distinct rows.
I know the simplest (and lousy) way is to write select distinct column_name from table where column_name = 'something' for every single column (around 20-50 times), and it's very time-consuming too. Since I can't use more than one DISTINCT per column_name, I am stuck with this old-school solution.
I am sure there is a faster and more elegant way to achieve this; I just couldn't figure out how. I will really appreciate any help on this.
You can't just return rows, since distinct values don't go together any more.
You could return arrays, which can be had simpler than you may have expected:
SELECT array_agg(DISTINCT c1) AS c1_arr
,array_agg(DISTINCT c2a) AS c2a_arr
,array_agg(DISTINCT c2b) AS c2b_arr
, ...
FROM m0301010000_ds;
This returns distinct values per column. One array (possibly big) for each column. All connections between values in columns (what used to be in the same row) are lost in the output.
Build SQL automatically
CREATE OR REPLACE FUNCTION f_build_sql_for_dist_vals(_tbl regclass)
  RETURNS text AS
$func$
SELECT 'SELECT '
    || string_agg(format('array_agg(DISTINCT %1$I) AS %1$I_arr', attname)
                , E'\n      ,' ORDER BY attnum)
    || E'\nFROM ' || _tbl
FROM   pg_attribute
WHERE  attrelid = _tbl      -- valid, visible table name
AND    attnum >= 1          -- exclude tableoid & friends
AND    NOT attisdropped     -- exclude dropped columns
$func$ LANGUAGE sql;
Call:
SELECT f_build_sql_for_dist_vals('public.m0301010000_ds');
Returns an SQL string as displayed above.
I use the system catalog pg_attribute instead of the information schema. And the object identifier type regclass for the table name. More explanation in this related answer:
PLpgSQL function to find columns with only NULL values in a given table
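If you are on psql 9.6 or later, a minimal way to actually run the generated statement is \gexec, which executes the query text returned by the previous statement:
SELECT f_build_sql_for_dist_vals('public.m0301010000_ds') \gexec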
If you need this in "real time", you won't be able to achieve it with a query that needs a full table scan.
I would advise you to create a separate table containing the distinct values for each column (initialized with the SQL from @Erwin Brandstetter ;) and maintain it using a trigger on the original table.
Your new table will have one column per field. The number of rows will equal the maximum number of distinct values for one field.
On insert: for each field to maintain, check whether that value is already there or not. If not, add it.
On update: for each field to maintain whose old value != new value, check whether the new value is already there or not. If not, add it. Regarding the old value, check if any other row has that value, and if not, remove it from the list (set the field to null).
On delete: for each field to maintain, check if any other row has that value, and if not, remove it from the list (set the value to null).
This way the load is mostly moved to the trigger, and the SQL on the value-list table will be super fast. (A minimal trigger sketch follows below.)
P.S.: Make sure to run all the SQL from the trigger through EXPLAIN to make sure it uses the best index and execution plan possible. For update/delete, just check if the old value exists (limit 1).
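A minimal sketch of the insert case only, assuming PostgreSQL 9.5+ for INSERT ... ON CONFLICT; the companion table and trigger names are made up, and only one tracked column is shown (the update and delete cases described above would follow the same shape):
CREATE TABLE m0301010000_ds_dist (
    c1 text UNIQUE
);
CREATE OR REPLACE FUNCTION trg_track_dist_vals()
  RETURNS trigger AS
$func$
BEGIN
   -- record the new value only if it is not tracked yet
   INSERT INTO m0301010000_ds_dist (c1)
   VALUES (NEW.c1)
   ON CONFLICT (c1) DO NOTHING;
   RETURN NEW;
END
$func$ LANGUAGE plpgsql;
CREATE TRIGGER track_dist_vals
AFTER INSERT ON m0301010000_ds
FOR EACH ROW EXECUTE PROCEDURE trg_track_dist_vals();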

Combining concatenation with ORDER BY

I am having trouble combining concatenation with ORDER BY in PostgreSQL (9.1.9).
Let's say, I have a table borders with 3 fields:
Table "borders"
Column | Type | Modifiers
---------------+----------------------+-----------
country1 | character varying(4) | not null
country2 | character varying(4) | not null
length | numeric |
The first two fields are codes of the countries and the third one is the length of the border among those countries.
The primary key is defined on the first two fields.
I need to compose a select of a column that would have unique values for the whole table; in addition, this column should be selected in decreasing order.
For this I concatenate the key fields with a separator character; otherwise two different rows might give the same result, like (AB, C) and (A, BC).
So I run the following query:
select country1||'_'||country2 from borders order by 1;
However, in the result I see that the '_' character is omitted from the sorting.
The result looks like this:
?column?
----------
A_CH
A_CZ
A_D
AFG_IR
AFG_PK
AFG_TAD
AFG_TJ
AFG_TM
AFG_UZB
A_FL
A_H
A_I
.
.
You can see that the result is sorted as if the '_' character doesn't exist in the strings.
If I use a letter (say 'x') as a separator, the order is correct. But I must use some special character that doesn't appear in the country1 and country2 fields, to avoid collisions.
What should I do to make the '_' character be taken into account during the sorting?
EDIT
It turned out that the concatenation has nothing to do with the problem. The problem is that the order by simply ignores the '_' character.
select country1 || '_' || country2 collate "C" as a
from borders
order by 1
sql fiddle demo
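With COLLATE "C", characters compare by byte value, and '_' (ASCII 95) sorts after the uppercase letters (65-90); derived by hand rather than from running the query, the sample data above would come back as:
AFG_IR
AFG_PK
AFG_TAD
AFG_TJ
AFG_TM
AFG_UZB
A_CH
A_CZ
A_D
A_FL
A_H
A_I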
Notes according to discussion in comments:
1.) COLLATE "C" applies in the ORDER BY clause as long as it references the expression in the SELECT clause by positional parameter or alias. If you repeat the expression in ORDER BY you also need to repeat the COLLATE clause if you want to affect the sort order accordingly.
sql fiddle demo
2.) In collations where _ does not influence the sort order, it is more efficient to use fog's query, even more so because that one makes use of the existing index (primary key is defined on the first two fields).
However, if _ has an influence, one needs to sort on the combined expression:
sql fiddle demo
Query performance (tested in Postgres 9.2):
sql fiddle demo
PostgreSQL Collation Support in the manual.
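As an illustration of note 1.), a sketch where the expression is repeated in the ORDER BY and therefore needs its own COLLATE clause:
select country1 || '_' || country2 as a
from borders
order by country1 || '_' || country2 collate "C";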
Just order by the two columns:
SELECT country1||'_'||country2 FROM borders ORDER BY country1, country2;
Unless you use aggregates or window functions, PostgreSQL allows ordering by columns even if you don't include them in the SELECT list.
As suggested in another answer you can also change the collation of the combined column but, if you can, sorting on plain columns is faster, especially if you have an index on them.
What happens when you do the following?
select country1||'_'||country2 from borders order by country1||'_'||country2
To my knowledge, order by 1 only does an ordinal sort; it won't do anything on concatenated columns. Granted, I'm speaking from SQL Server knowledge, so let me know if I'm way off base.
Edited: Ok; just saw Parado's post as I posted mine. Maybe you could create a view from this query (give it a column name) and then requery the view, order by that column? Or do the following:
select country_group from (
select country1||'_'||country2 as country_group from borders
) a
order by country_group

Easy way to transfer/update a list of numbers to be used in the SQL 'in' command?

I'm always being given a large list of IDs which I need to search for in our database. I have been manually putting them into a SQL statement like the following, which can take a while (putting single quotes around each number, followed by a comma). I was hoping someone has an easy way of doing this for me? Or am I just being a bit lazy...
select * from blah where idblah in ('1234-A', '1235-A', '1236-A' ................)
You can use the world's simplest code generator.
Just paste in the list of values, setup the pattern and voila... you have a set of quoted values.
I have also used Excel in the past, using the CONCAT function with smart paste.
I would set aside a table to hold the values and have my queries JOIN against that table. Set up a simple import script (don't forget to clear out the table at the start) and something like this is a breeze. Run the import, run the query. You never have to touch the query again or regenerate any code.
As an example:
CREATE TABLE Search_ID_List (
id VARCHAR(20) NOT NULL,
CONSTRAINT PK_Search_ID_List PRIMARY KEY CLUSTERED (id)
)
and:
SELECT
<column list>
FROM
Search_ID_List SIL
INNER JOIN Blah B ON
B.id = SIL.id
If you want to be able to save past search criteria or have multiple searches available to you at the same time then you can just add an identifying column which gets filled in by your import. It can be the file from where the ids came, some descriptive code/name, or whatever. Then just add that to the WHERE clause of your query and you're all set.
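As a minimal sketch of one search run (assuming SQL Server 2008+ or any DBMS with multi-row VALUES support; the IDs are the ones from the question):
TRUNCATE TABLE Search_ID_List;
INSERT INTO Search_ID_List (id)
VALUES ('1234-A'), ('1235-A'), ('1236-A');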
You could do something like this.
select * from blah where ',' + '1234-A,1235-A,1236-A' + ',' LIKE '%,' + idblah + ',%'
Wrapping both the list and the searched value in commas makes every value match only as a whole: ',1234-A,1235-A,1236-A,' contains ',1235-A,' but not ',1235,', so partial IDs don't produce false positives.
This pattern is super useful when you're being passed a comma-delimited list of values to filter by, but I think it would be applicable here as well.

Invalid Number Error! Can't seem to get around it

Oracle 10g DB. I have a table called s_contact. This table has a field called person_uid. This person_uid field is a varchar2 but contains valid numbers for some rows and invalid numbers for other rows. For instance, one row might have a person_uid of '2-lkjsdf' and another might be 1234567890.
I want to return just the rows with valid numbers in person_uid. The SQL I am trying is...
select person_uid
from s_contact
where decode(trim(translate(person_uid, '1234567890', ' ')), null, 'n', 'c') = 'n'
The translate replaces all numbers with spaces so that a trim will result in null if the field only contained numbers. Then I use a decode statement to set a little code to filter on. n=number, c=char.
This seems to work when I run just a preview, but I get an 'invalid number' error when I add a filter of...
and person_uid = 100
-- or
and to_number(person_uid) = 100
I just don't understand what is happening! It should be filtering out all the records that are invalid numbers and 100 is obviously a number...
Any ideas anyone? Greatly Appreciated!
Unfortunately, the various subquery approaches that have been proposed are not guaranteed to work. Oracle is allowed to push the predicate into the subquery and then evaluate the conditions in whatever order it deems appropriate. If it happens to evaluate the PERSON_UID condition before filtering out the non-numeric rows, you'll get an error. Jonathan Gennick has an excellent article Subquery Madness that discusses this issue in quite a bit of detail.
That leaves you with a few options
1) Rework the data model. It's generally not a good idea to store numbers in anything other than a NUMBER column. In addition to causing this sort of issue, it has a tendency to screw up the optimizer's cardinality estimates which leads to less than ideal query plans.
2) Change the condition to specify a string value rather than a number. If PERSON_UID is supposed to be a string, your filter condition could be PERSON_UID = '100'. That avoids the need to perform the implicit conversion.
3) Write a custom function that does the string-to-number conversion and ignores any errors, and use that in your code, i.e.
CREATE OR REPLACE FUNCTION my_to_number( p_arg IN VARCHAR2 )
  RETURN NUMBER
IS
BEGIN
  RETURN to_number( p_arg );
EXCEPTION
  WHEN others THEN
    RETURN NULL;
END;
and then use my_to_number(PERSON_UID) = 100
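A hedged usage sketch against the original table (the function swallows conversion errors, so non-numeric rows compare as NULL and simply drop out):
select person_uid
from s_contact
where my_to_number(person_uid) = 100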
4) Use a subquery that prevents the predicate from being pushed. This can be done in a few different ways. I personally prefer throwing a ROWNUM into the subquery, i.e. building on OMG Ponies' solution
WITH valid_persons AS (
  SELECT TO_NUMBER(c.person_uid) AS person_uid,
         ROWNUM AS rn
    FROM s_contact c
   WHERE REGEXP_LIKE(c.person_uid, '^[[:digit:]]+$'))
SELECT *
  FROM valid_persons vp
 WHERE vp.person_uid = 100
Oracle can't push the vp.person_uid = 100 predicate into the subquery here because doing so would change the results. You could also use hints to force the subquery to be materialized or to prevent predicate pushing.
Another alternative is to combine the predicates:
where case when trim(translate(person_uid, '1234567890', ' ')) is null
           then to_number(person_uid) end = 100
When you add those numbers to the WHERE clause it's still doing those checks. You can't guarantee the ordering within the WHERE clause. So, it still tries to compare 100 to '2-lkjsdf'.
Can you use '100'?
Another option is to apply a subselect
SELECT * FROM (
select person_uid
from s_contact
where decode(trim(translate(person_uid, '1234567890', ' ')), null, 'n', 'c') = 'n'
)
WHERE TO_NUMBER(PERSON_UID) = 100
Regular expressions to the rescue!
where regexp_like (person_uid, '^[0-9]+$')
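Per the predicate-pushing caveat above, though, combining this with to_number(person_uid) = 100 in the same WHERE clause is still not guaranteed to be safe. A hedged sketch that keeps the comparison on the string side avoids the implicit conversion entirely:
select person_uid
from s_contact
where regexp_like(person_uid, '^[0-9]+$')
and person_uid = '100'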
Use the first part of your query to generate a temp table, then query the temp table based on person_uid = 100 or whatever.
The problem is that Oracle tries to convert each person_uid to a number as it gets to it, due to the additional AND condition in your WHERE clause. This behavior may or may not show up in the preview depending on which records were picked.