Exclude rows where ID is duplicated but 3rd column has "Unknown" - SQL

I have a dataset with rows that share a PersonID but have different PCPGrouper values. I only want to exclude an "Unknown" row when there is more than one row for the same PersonID and one of them is "Unknown". If the ID has only one row, and it is "Unknown", then keep that row.
Name         | PersonID | PCPGrouper
-------------|----------|-----------
ABBAN, AVRIL | 1094893  | Unknown
ABBIS,CHLOE  | 1114294  | T Docs
ABBIS,CHLOE  | 1114294  | Unknown

We can try the following query, which should work in most SQL databases:
SELECT Name, PersonID, PCPGrouper
FROM yourTable t1
WHERE PCPGrouper != 'Unknown' OR
      NOT EXISTS (
          SELECT 1
          FROM yourTable t2
          WHERE t2.PersonID = t1.PersonID AND
                t2.PCPGrouper != 'Unknown'
      );
The logic above keeps every row whose grouper value is not 'Unknown', plus any 'Unknown' row whose PersonID has no other, non-'Unknown' value.
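To sanity-check that logic, here is a minimal sketch that runs the same NOT EXISTS query against an in-memory SQLite database through Python's standard sqlite3 module. SQLite is only an assumption to keep the example self-contained, the function name exclude_redundant_unknowns is hypothetical, and the three sample rows come from the question. It writes `<>` instead of `!=`; both mean "not equal" here.

```python
import sqlite3

def exclude_redundant_unknowns(rows):
    """Drop 'Unknown' rows only when the same PersonID also has a real grouper value."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE yourTable (Name TEXT, PersonID INTEGER, PCPGrouper TEXT)")
    conn.executemany("INSERT INTO yourTable VALUES (?, ?, ?)", rows)
    result = conn.execute("""
        SELECT Name, PersonID, PCPGrouper
        FROM yourTable t1
        WHERE PCPGrouper <> 'Unknown'          -- keep every real value
           OR NOT EXISTS (                     -- keep 'Unknown' only if nothing better exists
                SELECT 1
                FROM yourTable t2
                WHERE t2.PersonID = t1.PersonID
                  AND t2.PCPGrouper <> 'Unknown'
           )
        ORDER BY PersonID, PCPGrouper
    """).fetchall()
    conn.close()
    return result

sample = [
    ("ABBAN, AVRIL", 1094893, "Unknown"),
    ("ABBIS,CHLOE", 1114294, "T Docs"),
    ("ABBIS,CHLOE", 1114294, "Unknown"),
]
for row in exclude_redundant_unknowns(sample):
    print(row)
```

Note that a PersonID with several rows that are all 'Unknown' keeps all of them, since none of its rows has another value.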

Related

SQL join of multiple queries using CTE

WITH group1 AS
(
SELECT
[column1],
[column2]
FROM
table1
),
Group2 AS
(
SELECT
(column3),
COUNT(column3)
FROM
table 2 AS Count
WHERE
(year (date_value) = 2018 and month(Date_vaLue) = 2)
GROUP BY
column2
)
SELECT *
FROM group1
JOIN group2 ON group1. table1 = group2.table2;
I get an error:
No column name was specified for column 2 of 'group2'
Since Group2 isn't a column but just an identifier, I am confused why it treats the code (Group2 AS (Select (column3 ),) as a column.
I am new to SQL, so this might just be a silly error.
column1 is a name and column2 is a unique key for that name.
column2 and column3 contain exactly the same data, and I am simply trying to count how many times each value occurs in the column3 table, including 0, and relate it back to column1.
Every value in column3 comes from column2.
Thanks in advance!
There are so many errors in that query that I don't know where to start.
In a CTE, each column must have a name. SELECT columnname gives the resulting column the name columnname. An aggregate function such as COUNT does not set a column name, so the second column of your second CTE has no name, as the error states. Give it an alias:
SELECT column, count(othercolumn) AS ctcol ...
You can't put a column in your select list unless you either group by it or wrap it in an aggregate function. Your second CTE selects column3 but groups by column2; I suppose that's only a typo, and you meant something like:
SELECT column2, COUNT(column3) AS ctcol
FROM tablexy
...
GROUP BY column2
Your CTEs don't have any columns named table1 or table2, so your join won't work. Join on the column names the CTEs actually define:
SELECT * FROM group1 JOIN group2 ON group1.column2 = group2.column2
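Putting those fixes together, here is a minimal sketch run against SQLite from Python. It assumes hypothetical tables table1(column1, column2) and table2(column3, date_value) (the question's "table 2" without the space), groups table2 by column3 because that is the column it holds, and replaces YEAR()/MONTH() with a date-range filter because SQLite has no such functions. A LEFT JOIN plus COALESCE keeps keys with no matching rows so they show a count of 0, which the question asks for.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (column1 TEXT, column2 INTEGER);
    CREATE TABLE table2 (column3 INTEGER, date_value TEXT);
    INSERT INTO table1 VALUES ('alpha', 1), ('beta', 2), ('gamma', 3);
    INSERT INTO table2 VALUES
        (1, '2018-02-03'), (1, '2018-02-20'),
        (2, '2018-02-11'),
        (3, '2018-03-01');  -- outside February 2018, so not counted
""")

query = """
WITH group1 AS (
    SELECT column1, column2
    FROM table1
),
group2 AS (
    SELECT column3, COUNT(column3) AS ctcol   -- the alias names the column
    FROM table2
    WHERE date_value >= '2018-02-01' AND date_value < '2018-03-01'
    GROUP BY column3                          -- group by the selected column
)
SELECT g1.column1, g1.column2, COALESCE(g2.ctcol, 0) AS ctcol
FROM group1 AS g1
LEFT JOIN group2 AS g2 ON g1.column2 = g2.column3  -- keep keys with no matches
ORDER BY g1.column2
"""
counts = conn.execute(query).fetchall()
for row in counts:
    print(row)
```

The date range keeps the filter independent of dialect-specific date functions; with a plain inner JOIN, 'gamma' would disappear instead of showing 0.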
I think you need to give COUNT(column3) a column alias (and group by the column you select), so:
Group2 AS (
SELECT column3,
COUNT(column3) AS cntr
FROM table2
WHERE (year(date_value) = 2018 AND month(date_value) = 2)
GROUP BY column3
)

hive scan and select in one query

I have a hive table, say emp_details(name, dept).
In this table, I need to check whether any records exist with dept = 'a' and, if so, select those records. Only if no such record is found should I choose the records with dept = 'b'. The source data has either 'a' or 'b' as the dept value, and my result set should contain either the 'a' rows or the 'b' rows, not both.
The problem is that I am restricted to a single Hive query.
Calculate an a_exist flag with a window function and use it for filtering:
select name, dept
from
(
    select name,
           dept,
           (count(case when dept='a' then 1 end) over() > 0) as a_exist
    from emp_details
) a
where (a_exist and dept='a')          --only a if exists
   or ((NOT a_exist) and dept='b')    --return b if a not exists
;
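The same flag-and-filter idea can be tried outside Hive. This sketch runs the query in SQLite through Python, assuming an SQLite build with window functions (3.25 or later) and a hypothetical helper name pick_dept; the empty OVER () window makes COUNT see the whole table, so every row carries the same a_exist flag.

```python
import sqlite3

def pick_dept(rows):
    """Return dept 'a' rows if any exist, otherwise dept 'b' rows, in one query."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE emp_details (name TEXT, dept TEXT)")
    conn.executemany("INSERT INTO emp_details VALUES (?, ?)", rows)
    result = conn.execute("""
        SELECT name, dept
        FROM (
            SELECT name,
                   dept,
                   -- 1 on every row if at least one 'a' row exists, else 0
                   (COUNT(CASE WHEN dept = 'a' THEN 1 END) OVER () > 0) AS a_exist
            FROM emp_details
        ) AS a
        WHERE (a_exist AND dept = 'a')
           OR ((NOT a_exist) AND dept = 'b')
        ORDER BY name
    """).fetchall()
    conn.close()
    return result

print(pick_dept([("x", "a"), ("y", "b"), ("z", "a")]))
print(pick_dept([("y", "b"), ("w", "b")]))
```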

How to select an ID where a column is null in all rows for that ID (ID is not unique)

I have a table in the following format, and I want to get the LotId only if Value1 is null in all of its rows.
If I run Select * from Table1 where Value1 IS null, I get back one row.
But I want nothing to be returned, because two rows for that LotId have a value.
I thought of a self join, but a LotId can have any number of rows.
Id | LotId   | Value1
---|---------|----------
1  | LOt0065 | NULL
2  | LOt0065 | SomeValue
3  | LOt0065 | SomeValue
I think you'll need to use a NOT EXISTS subquery here:
SELECT DISTINCT a.lotid
FROM table1 a
WHERE NOT EXISTS (
    SELECT 1
    FROM table1 b
    WHERE b.lotid = a.lotid
      AND b.value1 IS NOT NULL
);
If my syntax is right, this will show you every lotid for which no row has a non-NULL value1, i.e. the lots where value1 is NULL in every row.
It uses SELECT 1 because the subquery doesn't need to return any value; it only needs to find a matching row for the outer query.
You correlate the table in the inner query with the table in the outer query by matching on the common field you're looking at (lotid in this case).
This could also be done with a NOT IN clause.
Does this give you the result you want?
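A minimal sketch checking this in SQLite through Python. It uses the question's three rows plus a hypothetical lot 'LOt0099' whose value1 is always NULL, selects DISTINCT so each qualifying lot is listed once, and shows an aggregate alternative (a swapped-in technique, not part of the answer above): COUNT(value1) skips NULLs, so a count of 0 means every value1 in the lot is NULL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (id INTEGER, lotid TEXT, value1 TEXT);
    INSERT INTO table1 VALUES
        (1, 'LOt0065', NULL), (2, 'LOt0065', 'SomeValue'), (3, 'LOt0065', 'SomeValue'),
        (4, 'LOt0099', NULL), (5, 'LOt0099', NULL);  -- hypothetical all-NULL lot
""")

# NOT EXISTS: lots that have no row with a non-NULL value1.
not_exists = conn.execute("""
    SELECT DISTINCT a.lotid
    FROM table1 a
    WHERE NOT EXISTS (
        SELECT 1 FROM table1 b
        WHERE b.lotid = a.lotid AND b.value1 IS NOT NULL
    )
""").fetchall()

# Aggregate alternative: COUNT(value1) ignores NULLs.
having = conn.execute("""
    SELECT lotid
    FROM table1
    GROUP BY lotid
    HAVING COUNT(value1) = 0
""").fetchall()

print(not_exists, having)
```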

Check for same character value in column

I'm trying to verify whether a column has the same value in every row of a group. Here is a sample of my table data:
Using this data, for example, I want to check whether Status is the same for every row with the same TPID. TPID 60210 should result in true, since both of its items have a Status of A. However, TPID 60061 should result in false, since two of its Line_Item rows show A and the rest show P.
I intend to update a different table with this information, setting its status with a CASE expression, but I'm at a loss as to how to check this column to find the values I want.
;WITH CTE_Count
AS
(
SELECT TPID, COUNT(DISTINCT Status) CNT
FROM TableName
GROUP BY TPID
)
UPDATE AnotherTableName
SET ColumnName = (
CASE WHEN CTE_Count.CNT = 1 -- all rows have the same status
THEN SomeValue
ELSE SomeOtherValue END
)
FROM AnotherTableName
INNER JOIN CTE_Count ON ...
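To see the COUNT(DISTINCT Status) check on its own, here is a sketch in SQLite through Python that only computes the true/false flag per TPID rather than running the UPDATE. The rows are hypothetical, shaped after the question's description (the exact number of P rows for TPID 60061 is an assumption).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE TableName (TPID INTEGER, Line_Item INTEGER, Status TEXT);
    -- hypothetical rows: 60210 is all A, 60061 mixes A and P
    INSERT INTO TableName VALUES
        (60210, 1, 'A'), (60210, 2, 'A'),
        (60061, 1, 'A'), (60061, 2, 'A'), (60061, 3, 'P'), (60061, 4, 'P');
""")

rows = conn.execute("""
    WITH CTE_Count AS (
        SELECT TPID, COUNT(DISTINCT Status) AS CNT
        FROM TableName
        GROUP BY TPID
    )
    SELECT TPID,
           CASE WHEN CNT = 1 THEN 'true' ELSE 'false' END AS all_same
    FROM CTE_Count
    ORDER BY TPID
""").fetchall()
print(rows)
```

COUNT(DISTINCT ...) ignores NULLs, so a TPID whose rows are 'A' and NULL would still count as "all the same"; handle NULL explicitly if that matters.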

In postgresql, how can I fill in missing values within a column?

I'm trying to figure out how to fill in missing values in one column with the non-missing values from other rows that share the same value in another column. For instance, in the example below, I'd want every row with ID 1 to have the name Bob and every row with ID 2 to have the name John.
ID # | Name
-------|-----
1 | Bob
1 | (null)
1 | (null)
2 | John
2 | (null)
2 | (null)
EDIT: One caveat is that I'm using PostgreSQL 8.4 with Greenplum, so correlated subqueries are not supported.
CREATE TABLE bobjohn
( ID INTEGER NOT NULL
, zname varchar
);
INSERT INTO bobjohn(id, zname) VALUES
(1,'Bob') ,(1, NULL) ,(1, NULL)
,(2,'John') ,(2, NULL) ,(2, NULL)
;
UPDATE bobjohn dst
SET zname = src.zname
FROM bobjohn src
WHERE dst.id = src.id
AND dst.zname IS NULL
AND src.zname IS NOT NULL
;
SELECT * FROM bobjohn;
NOTE: this query is unsafe if more than one non-null name exists for a given id: depending on the database it either fails or applies an arbitrary one (plain PostgreSQL uses an unpredictable one of the matching source rows). It also won't touch records for which no non-null name exists.
If you are on PostgreSQL 9.1 or later, you can use a CTE to fetch the source tuples (this is equivalent to a subquery, but easier to write and read, IMHO). The CTE also tackles the duplicate-values problem, in a rather crude way:
--
-- CTEs don't work in UPDATE queries before PostgreSQL 9.1
--
WITH uniq AS (
SELECT id
-- if a given id has more than one name: pick the lowest
, min(zname) as zname
FROM bobjohn
WHERE zname IS NOT NULL
GROUP BY id
)
UPDATE bobjohn dst
SET zname = src.zname
FROM uniq src
WHERE dst.id = src.id
AND dst.zname IS NULL
;
SELECT * FROM bobjohn;
UPDATE tbl
SET name = x.name
FROM (
SELECT DISTINCT ON (id) id, name
FROM tbl
WHERE name IS NOT NULL
ORDER BY id, name
) x
WHERE x.id = tbl.id
AND tbl.name IS NULL;
DISTINCT ON does the job alone; there is no need for additional aggregation.
If there are multiple values for name, the alphabetically first one (according to the collation of the current locale) is picked; that's what the ORDER BY id, name is for. If name is unambiguous per id, you can omit that line.
Also, if there is at least one non-null value per id, you can omit WHERE name IS NOT NULL: NULLs sort last in ascending order, so ORDER BY id, name still puts a non-null name first.
If you know for a fact that there are no conflicting values (multiple rows with the same ID but different, non-null names) then something like this will update the table appropriately:
UPDATE some_table AS t1
SET name = (
SELECT name
FROM some_table AS t2
WHERE t1.id = t2.id
AND name IS NOT NULL
LIMIT 1
)
WHERE name IS NULL;
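Here is a minimal sketch of that correlated-subquery UPDATE, run in SQLite through Python with the question's Bob/John rows. As an assumption for portability, it drops the t1 alias on the UPDATE target and refers to the outer row as some_table.id. Like the original, it relies on a correlated subquery, which the asker's Greenplum setup reportedly does not support.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE some_table (id INTEGER NOT NULL, name TEXT);
    INSERT INTO some_table VALUES
        (1, 'Bob'), (1, NULL), (1, NULL),
        (2, 'John'), (2, NULL), (2, NULL);
""")

conn.execute("""
    UPDATE some_table
    SET name = (
        SELECT t2.name                    -- any non-null name for the same id
        FROM some_table AS t2
        WHERE t2.id = some_table.id       -- correlate with the row being updated
          AND t2.name IS NOT NULL
        LIMIT 1
    )
    WHERE name IS NULL                    -- only fill the gaps
""")

result = conn.execute("SELECT id, name FROM some_table ORDER BY id, name").fetchall()
print(result)
```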
If you only want to query the table and have this information filled in on the fly, you can use a similar query:
SELECT
t1.id,
(
SELECT name
FROM some_table AS t2
WHERE t1.id = t2.id
AND name IS NOT NULL
LIMIT 1
) AS name
FROM some_table AS t1;
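Since the asker cannot use correlated subqueries, a window function is one possible alternative for filling the names in on the fly (a swapped-in technique, not one of the answers above). PostgreSQL has supported window functions since 8.4, but whether a given Greenplum build accepts this is an assumption to verify. The sketch runs in SQLite 3.25 or later through Python: MAX() ignores NULLs, so MAX(name) OVER (PARTITION BY id) yields each id's non-null name, and COALESCE leaves existing names untouched.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE some_table (id INTEGER NOT NULL, name TEXT);
    INSERT INTO some_table VALUES
        (1, 'Bob'), (1, NULL), (1, NULL),
        (2, 'John'), (2, NULL), (2, NULL);
""")

filled = conn.execute("""
    SELECT id,
           -- keep an existing name, otherwise borrow the id's non-null name
           COALESCE(name, MAX(name) OVER (PARTITION BY id)) AS name
    FROM some_table
    ORDER BY id
""").fetchall()
print(filled)
```

If an id has several different names, the NULL rows receive the greatest one, mirroring the "pick one" behavior of the min() and DISTINCT ON answers.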