This is the table I have:
For every unique TID there are 2 records. For a given TID, if a field is populated in both records, I want the name of that field. For example, for T01, Field2 and Field4 are populated in both records.
My current approach is to create a column of comma-separated field names:
INSERT INTO TEMP
SELECT *,
(CASE WHEN COUNT(IIF(Field1 IS NOT NULL,1,NULL)) = 2 THEN 'FIELD1' ELSE 'NO' END) + ',' +
(CASE WHEN COUNT(IIF(Field2 IS NOT NULL,1,NULL)) = 2 THEN 'FIELD2' ELSE 'NO' END) + ',' +
(CASE WHEN COUNT(IIF(Field3 IS NOT NULL,1,NULL)) = 2 THEN 'FIELD3' ELSE 'NO' END) + ',' +
(CASE WHEN COUNT(IIF(Field4 IS NOT NULL,1,NULL)) = 2 THEN 'FIELD4' ELSE 'NO' END) AS ATTR
FROM ORIGINAL_TABLE;
I then convert the comma separated column into multiple records :
SELECT *, S.ITEMS as ATTRIBUTES
FROM TEMP
CROSS APPLY DBO.SPLIT(ATTR, ',') S
WHERE S.ITEMS NOT LIKE '%NO%'
Consider T101 in the result of the above command. This gives me the output:
Edit: Apologies. It should be Field2 instead of Field1.
This does give me the fields that satisfy the condition for every unique TID, but I want it to be more specific. I run this on very large data with over 100 columns, so this approach is slow.
Is there a way to display just the fields that satisfy the condition, together with their values for both records in T101?
Edit : Apologies. It should be Field2 instead of Field1 in the table.
I am fairly new to SQL, any help would be much appreciated!
Your question is rather complicated, and I'm not 100% sure what you really want. But based on:
For a unique TID if both records in a field is populated I want the name of the field.
You can unpivot and aggregate. Assuming that your columns all have a similar data type, you can use:
SELECT t.tId, v.fieldname
FROM ORIGINAL_TABLE t CROSS APPLY
(VALUES ('Field1', Field1),
('Field2', Field2),
('Field3', Field3),
('Field4', Field4)
) v(fieldname, val)
GROUP BY t.tID, v.fieldname
HAVING COUNT(*) = COUNT(v.val) -- all populated
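To see how the unpivot-and-aggregate idea behaves, here is a minimal sketch using Python's standard-library sqlite3 module. The sample rows are invented for illustration (the original table was not included in the text), and because SQLite has no CROSS APPLY, the unpivot step is written with UNION ALL rather than VALUES. The HAVING clause is the same as in the query above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE original_table (tid TEXT, field1 TEXT, field2 TEXT, field3 TEXT, field4 TEXT);
-- Hypothetical sample data: two records per TID.
INSERT INTO original_table VALUES
  ('T01', 'a',  'b',  NULL, 'd'),
  ('T01', NULL, 'c',  NULL, 'e'),
  ('T02', 'x',  NULL, 'y',  NULL),
  ('T02', 'z',  NULL, NULL, NULL);
""")

query = """
SELECT tid, fieldname
FROM (
    -- Unpivot: one row per (tid, field name, value).
    SELECT tid, 'Field1' AS fieldname, field1 AS val FROM original_table
    UNION ALL
    SELECT tid, 'Field2', field2 FROM original_table
    UNION ALL
    SELECT tid, 'Field3', field3 FROM original_table
    UNION ALL
    SELECT tid, 'Field4', field4 FROM original_table
) AS v
GROUP BY tid, fieldname
HAVING COUNT(*) = COUNT(val)   -- COUNT(val) skips NULLs, so this keeps fully populated fields
ORDER BY tid, fieldname
"""
populated = conn.execute(query).fetchall()
print(populated)
```

With this sample data the query keeps Field2 and Field4 for T01 and Field1 for T02.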
I have to write a select query to create a view with specific criteria.
I have multiple tables which contain many columns and rows.
However, I have extracted a value to use as my key (e.g. id). I have 7000+ of those unique keys, which I extracted from all my tables with UNION to avoid duplicates.
Now, I want to add a column INDICATOR_1 which is assigned the value YES or NO based on criteria.
This is where I struggle.
I need to find the row in those tables that contains the id. After that, I'd like to check, in that same row, whether the field XYZ contains the value 'N' (for example). If it does, assign 'YES' to INDICATOR_1; otherwise assign 'NO'.
In pseudo-code, what I want to do looks like this:
CASE
WHEN id = (id from table_1) AND (if table_1.xyz = 'N')
THEN 'YES'
ELSE 'NO'
END AS INDICATOR_1
I don't know if I'm clear enough, but your help will be greatly appreciated.
If I understand correctly, you want a separate indicator for each table. Something like this:
select i.*,
(case when exists (select 1
from table1 t1
where t1.id = i.id and t1.xyz = 'N'
)
then 'YES' else 'NO'
end) as indicator_1,
(case when exists (select 1
from table2 t2
where t2.id = i.id and t2.xyz = 'N'
)
then 'YES' else 'NO'
end) as indicator_2,
. . .
from (<your id list here>) i
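A minimal runnable sketch of the EXISTS approach, using Python's sqlite3 module. The two tables and their rows are hypothetical, and the id list is built with the same UNION the question describes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical tables; only the id and xyz columns matter here.
CREATE TABLE table1 (id INTEGER, xyz TEXT);
CREATE TABLE table2 (id INTEGER, xyz TEXT);
INSERT INTO table1 VALUES (1, 'N'), (2, 'Y');
INSERT INTO table2 VALUES (2, 'N'), (3, 'Y');
""")

query = """
SELECT i.id,
       CASE WHEN EXISTS (SELECT 1 FROM table1 t1
                         WHERE t1.id = i.id AND t1.xyz = 'N')
            THEN 'YES' ELSE 'NO' END AS indicator_1,
       CASE WHEN EXISTS (SELECT 1 FROM table2 t2
                         WHERE t2.id = i.id AND t2.xyz = 'N')
            THEN 'YES' ELSE 'NO' END AS indicator_2
FROM (SELECT id FROM table1 UNION SELECT id FROM table2) AS i
ORDER BY i.id
"""
indicators = conn.execute(query).fetchall()
print(indicators)
```

Each id appears exactly once in the result, no matter how many matching rows the individual tables hold, because EXISTS only tests for the presence of a row.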
I think you should fix this in the union, where you have all the data you need. You probably have something like:
SELECT Id
FROM table_1
UNION
SELECT Id
FROM table_2
How about selecting the information you want as well (I use distinct here to clarify):
SELECT DISTINCT Id
, CASE WHEN table_1.xyz = 'N' THEN 'N'
ELSE 'Y'
END INDICATOR_1
FROM table_1
This can lead to more records than you had, if an id can have records of both flavours. We can fix that with a row number in an outer query. You end up with something like:
SELECT Id
, INDICATOR_1
FROM (
SELECT Id
, INDICATOR_1
, ROW_NUMBER()OVER(PARTITION BY ID ORDER BY CASE WHEN INDICATOR_1 ='N' THEN 0 ELSE 1 END) RN
FROM (
SELECT Id
, CASE WHEN table_1.xyz = 'N' THEN 'N'
ELSE 'Y'
END INDICATOR_1
FROM table_1
UNION
...
) T
) S
WHERE S.RN = 1
You can in fact shorten that by using the innermost case expression directly in the ROW_NUMBER expression.
Here is my query with the output below the syntax.
SELECT DISTINCT CASE WHEN id = 'RUS0261431' THEN value END AS sr_type,
COUNT(CASE WHEN id in ('RUS0290788') AND value in ('1','2','3','4') THEN respondentid END) AS sub_ces,
COUNT(CASE WHEN id IN ('RUS0290788') AND value in ('5','6','7') THEN respondentid END) AS pos_ces,
COUNT(*) as total_ces
FROM `some_table`
WHERE id in ( 'RUS0261431') AND id <> '' AND value IS NOT NULL
GROUP BY 1
As you can see in the attached table, I'm unable to group the values for id RUS0290788 by the distinct values that map to RUS0261431. Is there any way to pivot, by altering my CASE WHEN statements, so that I can group sub_ces and pos_ces by sr_type? Thanks in advance.
You can simplify your WHERE condition to WHERE id = ('RUS0261431'). Only records with this value will be selected so you do not have to repeat this in the CASE statements.
QUERY:
select ws_path from workpaths where
(
(ws_path like '%R_%') or
(ws_path like '%PB_%' ) or
(ws_path like '%ST_%')
)
OUTPUT:
/x/eng/users/ST_3609843_ijti4689_3609843_1601272247
/x/eng/users/ST_3610020_zozt5229_3610020_1601282033
/x/eng/users/ST_3611181_zozt5229_3611181_1601282032
/x/eng/users/ST_3611226_zozt5229_3611226_1601282033
/x/eng/users-random/john/N_3582168_3551186_1601040805
/x/eng/users-random/james/N_3582619_3551186_1601041405
/x/eng/users-random/jimmy/N_3582791_3551186_1601042005
/x/eng/users/R_3606462_3606462_1601251334
/x/eng/users/R_3611775_3612090_1601290909
/x/eng/users/R_3612813_3613016_1601292252
Is there a way to group partially by ST_, N_ and R_?
That is, GROUP BY ws_path won't work, because every path is distinct.
I need to look only at the last item in the path (split by '/') and then at the part before the first '_'.
You can use regexp_substr to extract the pattern being searched for, then group by the extracted prefix and count the rows in each group.
select regexp_substr(ws_path,'\/R_|\/PB_|\/ST_'), count(*)
from workpaths
group by regexp_substr(ws_path,'\/R_|\/PB_|\/ST_')
Regexp is a good solution but can be expensive. A simpler substring might be cheaper and faster:
CREATE TABLE tbl (field1 VARCHAR(100));
INSERT INTO dbo.tbl
( field1 )
VALUES
('/x/eng/users/ST_3609843_ijti4689_3609843_1601272247'),
('/x/eng/users/ST_3610020_zozt5229_3610020_1601282033'),
('/x/eng/users/ST_3611181_zozt5229_3611181_1601282032'),
('/x/eng/users/ST_3611226_zozt5229_3611226_1601282033'),
('/x/eng/users-random/john/N_3582168_3551186_1601040805'),
('/x/eng/users-random/james/N_3582619_3551186_1601041405'),
('/x/eng/users-random/jimmy/N_3582791_3551186_1601042005'),
('/x/eng/users/R_3606462_3606462_1601251334'),
('/x/eng/users/R_3611775_3612090_1601290909'),
('/x/eng/users/R_3612813_3613016_1601292252');
SELECT
COUNT(CASE WHEN field1 LIKE '%/ST_%' THEN 1 ELSE NULL END) AS 'st_count',
COUNT(CASE WHEN field1 LIKE '%/N_%' THEN 1 ELSE NULL END) AS 'n_count',
COUNT(CASE WHEN field1 LIKE '%/R_%' THEN 1 ELSE NULL END) AS 'r_count'
FROM dbo.tbl
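One detail worth keeping in mind with the LIKE approach: `_` is itself a single-character wildcard in LIKE, so `'%/ST_%'` also matches `/STX...`. The sketch below, run through Python's sqlite3 module, escapes the underscore and folds the three counts into one GROUP BY. The escape character `!` and the `prefix` alias are choices made for this example, and SQLite stands in for the original database.

```python
import sqlite3

paths = [
    "/x/eng/users/ST_3609843_ijti4689_3609843_1601272247",
    "/x/eng/users/ST_3610020_zozt5229_3610020_1601282033",
    "/x/eng/users/ST_3611181_zozt5229_3611181_1601282032",
    "/x/eng/users/ST_3611226_zozt5229_3611226_1601282033",
    "/x/eng/users-random/john/N_3582168_3551186_1601040805",
    "/x/eng/users-random/james/N_3582619_3551186_1601041405",
    "/x/eng/users-random/jimmy/N_3582791_3551186_1601042005",
    "/x/eng/users/R_3606462_3606462_1601251334",
    "/x/eng/users/R_3611775_3612090_1601290909",
    "/x/eng/users/R_3612813_3613016_1601292252",
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl (field1 TEXT)")
conn.executemany("INSERT INTO tbl (field1) VALUES (?)", [(p,) for p in paths])

query = """
SELECT CASE WHEN field1 LIKE '%/ST!_%' ESCAPE '!' THEN 'ST'
            WHEN field1 LIKE '%/N!_%'  ESCAPE '!' THEN 'N'
            WHEN field1 LIKE '%/R!_%'  ESCAPE '!' THEN 'R'
       END AS prefix,
       COUNT(*) AS cnt
FROM tbl
GROUP BY prefix
ORDER BY prefix
"""
prefix_counts = conn.execute(query).fetchall()
print(prefix_counts)
```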
I have a table with a bunch of boolean columns. I'd like to rank these columns by the count of true values each one has.
I found a way to count the number of true values in a column using:
SELECT count(CASE WHEN col1 THEN 1 ELSE null END) as col1,
count(CASE WHEN col2 THEN 1 ELSE null END) as col2
....
FROM my_table;
but this approach has two problems:
I have to manually type the names of the columns
I have to then transpose the result and order by value
Is there a way to do the whole operation in one query?
This is not actually a crosstab job (or "pivot" in other RDBMS), but the reverse operation, "unpivot" if you will. One elegant technique is a VALUES expression in a LATERAL join.
The basic query can look like this, which takes care of:
I have to then transpose the result and order by value
SELECT c.col, c.ct
FROM (
SELECT count(col1 OR NULL) AS col1
, count(col2 OR NULL) AS col2
-- etc.
FROM tbl
) t
, LATERAL (
VALUES
('col1', col1)
, ('col2', col2)
-- etc.
) c(col, ct)
ORDER BY 2;
That was the simple part. Your other request is harder:
I have to manually type the names of the columns
This function takes your table name and retrieves meta data from the system catalog pg_attribute. It's a dynamic implementation of the above query, safe against SQL injection:
CREATE OR REPLACE FUNCTION f_true_ct(_tbl regclass)
RETURNS TABLE (col text, ct bigint)
LANGUAGE plpgsql AS
$func$
BEGIN
RETURN QUERY EXECUTE (
SELECT format('
SELECT c.col, c.ct
FROM (SELECT %s FROM %s) t
, LATERAL (VALUES %s) c(col, ct)
ORDER BY 2 DESC'
, string_agg (format('count(%1$I OR NULL) AS %1$I', attname), ', ')
, _tbl -- regclass is rendered as a safely quoted table name
, string_agg (format('(%1$L, %1$I)', attname), ', ')
)
FROM pg_attribute
WHERE attrelid = _tbl -- valid, visible, legal table name
AND attnum >= 1 -- exclude tableoid & friends
AND NOT attisdropped -- exclude dropped columns
AND atttypid = 'bool'::regtype -- only boolean columns
);
END
$func$;
Call:
SELECT * FROM f_true_ct('tbl'); -- table name optionally schema-qualified
Result:
col | ct
------+---
col1 | 3
col3 | 2
col2 | 1
Works for any table to rank all boolean columns by their count of true values.
To understand the function parameter, read this:
Table name as a PostgreSQL function parameter
Related answers with more explanation:
Check whether empty strings are present in character-type columns
Replace empty strings with null values
If I understand correctly, you can do this with a giant union all:
select c.*
from ((select 'col1' as which, sum(case when col1 then 1 else 0 end) as cnt from t
) union all
(select 'col2' as which, sum(case when col2 then 1 else 0 end) as cnt from t
) union all
. . .
) c
order by cnt desc;
Although you still need to type the column names, this does sidestep the transposition.
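If typing the column names is the main pain point, the UNION ALL query can be generated instead of written by hand. The following is a rough sketch in Python with sqlite3, not a PostgreSQL solution: `true_counts`, `quote_ident`, and the sample table are hypothetical names, and it assumes the columns were declared with the type name BOOLEAN and store 0/1.

```python
import sqlite3

def quote_ident(name):
    # Double-quote an identifier, doubling any embedded quotes.
    return '"' + name.replace('"', '""') + '"'

def true_counts(conn, table):
    # Look up the boolean columns instead of typing them by hand.
    cols = [name for name, decl_type in conn.execute(
                "SELECT name, type FROM pragma_table_info(?)", (table,))
            if decl_type.upper() == "BOOLEAN"]
    if not cols:
        return []
    # One SELECT per column, glued together with UNION ALL.
    parts = [
        f"SELECT ? AS col, SUM(CASE WHEN {quote_ident(c)} THEN 1 ELSE 0 END) AS ct "
        f"FROM {quote_ident(table)}"
        for c in cols
    ]
    sql = " UNION ALL ".join(parts) + " ORDER BY ct DESC, col"
    return conn.execute(sql, cols).fetchall()

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE my_table (name TEXT, col1 BOOLEAN, col2 BOOLEAN, col3 BOOLEAN);
INSERT INTO my_table VALUES ('a', 1, 1, 1), ('b', 1, 0, 1), ('c', 1, 0, 0), ('d', 0, 0, 0);
""")
ranking = true_counts(conn, "my_table")
print(ranking)
```

The non-boolean `name` column is skipped, and the column names used as labels are passed as bound parameters rather than spliced into the SQL text.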
So basically there is one question and one problem:
1. Question: when I have around 100 columns in a table (and no key or unique index is set) and I want to join or subselect that table with itself, do I really have to write out every column name?
2. Problem: the example below shows question 1 and my actual SQL statement problem.
Example:
SELECT
A.FIELD1,
(SELECT CASE WHEN B.FIELD2 = 1 THEN B.FIELD3 ELSE null END FROM TABLE B WHERE A.* = B.*) AS CASEFIELD1,
(SELECT CASE WHEN B.FIELD2 = 2 THEN B.FIELD4 ELSE null END FROM TABLE B WHERE A.* = B.*) AS CASEFIELD2
FROM TABLE A
GROUP BY A.FIELD1
The story is: if I don't put the CASE into its own SELECT statement, I have to put the actual column name into the GROUP BY, and the GROUP BY then groups by the actual value of the column rather than by the NULL produced by the CASE. Because of that I would have to either join or subselect on all columns, since there is no key and no unique index, or find another solution.
DBServer is DB2.
So now to describing it just with words and no SQL:
I have "order items" which can be divided into "ZD" and "EK" (1 = ZD, 2 = EK) and can be grouped by "distributor". Even though "order items" can have one of two different "departements"(ZD, EK), the fields/rows for "ZD" and "EK" are always both filled. I need the grouping to consider the "departement" and only if the designated "departement" (ZD or EK) is changing, then I want a new group to be created.
SELECT
(CASE WHEN TABLE.DEPARTEMENT = 1 THEN TABLE.ZD ELSE null END) AS ZD,
(CASE WHEN TABLE.DEPARTEMENT = 2 THEN TABLE.EK ELSE null END) AS EK,
TABLE.DISTRIBUTOR,
sum(TABLE.SOMETHING) AS SOMETHING
FROM TABLE
GROUP BY
ZD,
EK,
TABLE.DISTRIBUTOR,
TABLE.DEPARTEMENT
This worked with the CASE expressions in the SELECT and ZD, EK in the GROUP BY. The only problem was that even if EK was not the designated DEPARTEMENT, a change in EK still opened a new group, because the GROUP BY used the real EK value and not the NULL from the CASE, as I explained above.
And here, ladies and gentlemen, is the solution to the problem:
SELECT
(CASE WHEN TABLE.DEPARTEMENT = 1 THEN TABLE.ZD ELSE null END) AS ZD,
(CASE WHEN TABLE.DEPARTEMENT = 2 THEN TABLE.EK ELSE null END) AS EK,
TABLE.DISTRIBUTOR,
sum(TABLE.SOMETHING) AS SOMETHING
FROM TABLE
GROUP BY
(CASE WHEN TABLE.DEPARTEMENT = 1 THEN TABLE.ZD ELSE null END),
(CASE WHEN TABLE.DEPARTEMENT = 2 THEN TABLE.EK ELSE null END),
TABLE.DISTRIBUTOR,
TABLE.DEPARTEMENT
@t-clausen.dk: Thank you!
@others: ...
Actually there is a wildcard equality test.
I am not sure why you would group by field1, that would seem impossible in your example. I tried to fit it into your question:
SELECT FIELD1,
CASE WHEN FIELD2 = 1 THEN FIELD3 END AS CASEFIELD1,
CASE WHEN FIELD2 = 2 THEN FIELD4 END AS CASEFIELD2
FROM
(
SELECT * FROM A
INTERSECT
SELECT * FROM B
) C
UNION -- results in a distinct
SELECT
FIELD1,
null,
null
FROM
(
SELECT * FROM A
EXCEPT
SELECT * FROM B
) C
This will fail for data types that are not comparable.
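A small sketch of the INTERSECT / EXCEPT idea, run through Python's sqlite3 module rather than DB2. The tables `a` and `b` and their rows are invented; the point is only that whole rows are compared without listing the columns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical tables with identical column lists.
CREATE TABLE a (field1 INTEGER, field2 INTEGER, field3 TEXT, field4 TEXT);
CREATE TABLE b (field1 INTEGER, field2 INTEGER, field3 TEXT, field4 TEXT);
INSERT INTO a VALUES (1, 1, 'x', 'y'), (2, 2, 'p', 'q'), (3, 1, 'm', 'n');
INSERT INTO b VALUES (1, 1, 'x', 'y'), (2, 2, 'p', 'q'), (3, 1, 'm', 'DIFFERENT');
""")

query = """
SELECT field1,
       CASE WHEN field2 = 1 THEN field3 END AS casefield1,
       CASE WHEN field2 = 2 THEN field4 END AS casefield2
FROM (SELECT * FROM a INTERSECT SELECT * FROM b) AS c   -- rows identical in both tables
UNION
SELECT field1, NULL, NULL
FROM (SELECT * FROM a EXCEPT SELECT * FROM b) AS c      -- rows of a with no exact match in b
ORDER BY 1
"""
matched = conn.execute(query).fetchall()
print(matched)
```

Row 3 differs in one column only, so it falls into the EXCEPT branch and gets NULL in both case fields.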
No, there's no wildcard equality test. You'd have to list every field you want tested individually. If you don't want to test each individual field, you could use a hack such as concatenating all the fields, e.g.
WHERE (a.foo + a.bar + a.baz) = (b.foo + b.bar + b.baz)
but either way, you're listing all of the fields.
I might tend to solve it something like this:
WITH q as
(SELECT
DEPARTEMENT
, (CASE WHEN DEPARTEMENT = 1 THEN ZD
WHEN DEPARTEMENT = 2 THEN EK
ELSE null
END) AS GRP
, DISTRIBUTOR
, SOMETHING
FROM mytable
)
SELECT
DEPARTEMENT
, Grp
, Distributor
, sum(SOMETHING) AS SumTHING
FROM q
GROUP BY
DEPARTEMENT
, GRP
, DISTRIBUTOR
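The effect of collapsing ZD and EK into a single grouping column can be checked with a small sketch in Python's sqlite3 module (SQLite instead of DB2; the rows are invented). Rows whose non-designated column changes stay in the same group.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE mytable (departement INTEGER, zd TEXT, ek TEXT, distributor TEXT, something INTEGER);
-- Hypothetical order items: departement 1 = ZD, 2 = EK.
INSERT INTO mytable VALUES
  (1, 'Z1', 'E1', 'D1', 10),
  (1, 'Z1', 'E2', 'D1', 5),   -- EK differs, but ZD is the designated column
  (2, 'Z1', 'E1', 'D1', 7),
  (2, 'Z9', 'E1', 'D1', 3),   -- ZD differs, but EK is the designated column
  (1, 'Z2', 'E1', 'D1', 1);
""")

query = """
WITH q AS (
    SELECT departement,
           CASE WHEN departement = 1 THEN zd
                WHEN departement = 2 THEN ek
           END AS grp,
           distributor,
           something
    FROM mytable
)
SELECT departement, grp, distributor, SUM(something) AS sumthing
FROM q
GROUP BY departement, grp, distributor
ORDER BY departement, grp
"""
groups = conn.execute(query).fetchall()
print(groups)
```

Because the CASE is evaluated inside the CTE, the outer GROUP BY only ever sees the designated value, which is the behaviour the question asks for.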
If you need to find all rows in TableA that match in TableB, how about INTERSECT or INTERSECT DISTINCT?
select * from A
INTERSECT DISTINCT
select * from B
However, if you only want rows from A where the entire row matches the values in a row from B, then why does your sample code take some values from A and others from B? If the row matches on all columns, then that would seem pointless. (Perhaps your question could be explained a bit more fully?)