Combine all rows in a column where most rows are null - sql

I am writing a query trying to match true account IDs to incorrect account IDs across two tables using the following query:
SELECT DISTINCT
p.visitor_id,
CASE WHEN p.visitor_id = c.username then accountId else null end as correct_account_id,
CASE WHEN c.accountId is null then p.account_id else null end as incorrect_account_id
FROM `a_table` p
LEFT JOIN `another_table` c
ON p.account_id = c.accountID
and am getting this result (subset for a single visitor_id):
visitor_id  correct_account_id  incorrect_account_id
1           null                id
1           id                  null
1           null                null
I would like to create one row per visitor_id where there are no null values and just the two ids are listed.

It sounds like you want:
SELECT p.visitor_id,
MAX(CASE WHEN p.visitor_id = c.username then accountId else null end) as correct_account_id,
MAX(CASE WHEN c.accountId is null then p.account_id else null end) as incorrect_account_id
FROM a_table p LEFT JOIN another_table c ON p.account_id = c.accountID
GROUP BY p.visitor_id
I removed the distinct and added the group by. Group by p.visitor_id means you want one row per visitor_id.
I wrapped your two CASE expressions in MAX, which means we want the maximum value (within the visitor_id grouping) there. Aggregate functions such as MAX and MIN ignore nulls, so MIN would work just as well. Either way, the intention here is to find a non-null value for the column.
The big assumption here is that for each visitor_id, you have at most one correct_account_id and at most one incorrect_account_id that you care about. If you have more than one for the same visitor_id, this will only get one of them (the max/min). (Given that you explicitly say you want one row per visitor_id, this seems like a safe assumption.)
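To see the collapsing effect concretely, here is a minimal sketch using Python's built-in sqlite3 module. The sample rows ('A1', 'B2', 'C3') are hypothetical and are chosen only to reproduce the three-row shape shown in the question; the real tables will differ.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a_table (visitor_id INTEGER, account_id TEXT);
    CREATE TABLE another_table (username INTEGER, accountId TEXT);
    -- Hypothetical data: 'A1' matches visitor 1, 'B2' has no match at all,
    -- and 'C3' matches an account that belongs to a different username.
    INSERT INTO a_table VALUES (1, 'A1'), (1, 'B2'), (1, 'C3');
    INSERT INTO another_table VALUES (1, 'A1'), (2, 'C3');
""")

def collapse(agg):
    """Run the grouped query with the given aggregate (MAX or MIN)."""
    query = f"""
        SELECT p.visitor_id,
               {agg}(CASE WHEN p.visitor_id = c.username THEN c.accountId END)
                   AS correct_account_id,
               {agg}(CASE WHEN c.accountId IS NULL THEN p.account_id END)
                   AS incorrect_account_id
        FROM a_table p
        LEFT JOIN another_table c ON p.account_id = c.accountId
        GROUP BY p.visitor_id
    """
    return conn.execute(query).fetchall()

rows_max = collapse("MAX")
rows_min = collapse("MIN")
print(rows_max)  # one row per visitor_id, nulls collapsed away
```

With at most one non-null value per column in each group, MAX and MIN return the same row, since both skip nulls.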

Related

SQL - Count new entries based on last date

I have a table with the following structure:
ID ReportDate Object_id
What I need to know is the count of new and the count of old Object IDs.
For example: If I have the data below:
I want the following output grouped by ReportDate:
I thought of doing it with a WHERE clause based on date; however, I need the data for all the dates I have in the table, to see the count of what already existed in a previous report and what is new in that report. Any ideas?
Edit: New/Old definition- New would be the records that never appeared before that report run date and appeared on this one, whereas old is the number of records that had at least one match in previous dates. I'll edit the post to include this info.
I managed to do it using a left join. Below is my solution in case it helps anyone in the future :)
SELECT table.ReportRunDate,
-1*sum(table.ReportRunDate = new_table.init_date) as count_new,
-1*sum(table.ReportRunDate <> new_table.init_date) as count_old,
count(*) as count_total
FROM table LEFT JOIN
((SELECT Object_ID, min(ReportRunDate) as init_date
FROM table
GROUP By OBJECT_ID) as new_table)
ON table.Object_ID = new_table.Object_ID
GROUP BY ReportRunDate
This would work in Oracle, not sure about ms-access:
SELECT ReportDate
,COUNT(CASE WHEN rnk = 1 THEN 1 ELSE NULL END) count_of_new
,COUNT(CASE WHEN rnk <> 1 THEN 1 ELSE NULL END) count_of_old
FROM (SELECT ID
,ReportDate
,Object_id
,RANK() OVER (PARTITION BY Object_id ORDER BY ReportDate) rnk
FROM table_name)
GROUP BY ReportDate
The inner query ranks each occurrence of object_id by ReportDate, so the first occurrence of a given object_id has rank = 1, the next one rank = 2, and so on.
The outer query then counts how many records in each group have a rank equal to 1 (new) and how many have a rank not equal to 1 (old).
I assumed that an object_id can appear only once within each ReportDate.
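Both answers rest on the same idea: find the first report date for each object, then compare every row's date against it. Here is a minimal sketch of that idea in Python's built-in sqlite3, written with portable CASE expressions and a join to the per-object minimum date. The table name `reports` and the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE reports (ID INTEGER, ReportDate TEXT, Object_id TEXT)")
# Hypothetical data: A and B first appear in January, C in February, D in March.
conn.executemany(
    "INSERT INTO reports VALUES (?, ?, ?)",
    [
        (1, "2020-01-01", "A"), (2, "2020-01-01", "B"),
        (3, "2020-02-01", "A"), (4, "2020-02-01", "C"),
        (5, "2020-03-01", "A"), (6, "2020-03-01", "B"),
        (7, "2020-03-01", "C"), (8, "2020-03-01", "D"),
    ],
)

query = """
    SELECT r.ReportDate,
           -- new: this report date is the object's first appearance
           SUM(CASE WHEN r.ReportDate = f.first_date THEN 1 ELSE 0 END) AS count_new,
           -- old: the object already appeared on an earlier date
           SUM(CASE WHEN r.ReportDate <> f.first_date THEN 1 ELSE 0 END) AS count_old
    FROM reports r
    JOIN (SELECT Object_id, MIN(ReportDate) AS first_date
          FROM reports
          GROUP BY Object_id) f
      ON f.Object_id = r.Object_id
    GROUP BY r.ReportDate
    ORDER BY r.ReportDate
"""
counts = conn.execute(query).fetchall()
for row in counts:
    print(row)
```

Like the RANK version, this assumes an object appears at most once per report date; duplicates on the first date would all be counted as new.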

How can I find out the relationship between two columns in database?

I have a view defined in SQL Server database and it has two columns A and B, both of which have the type of INT. I want to find out the relationship between these two, 1 to 1 or 1 to many or many to many. Is there a SQL statement I can use to find out?
For the relationship, it means for a given value of A, how many values of B maps to this value. If there is only one value, then it is 1 to 1 mapping.
You could use CTEs to generate COUNTs of how many distinct A values were associated with each B value and vice versa, then take the MAX of those values to determine if the relationship is 1 or many on each side. For example:
WITH CTEA AS (
SELECT COUNT(DISTINCT B) ac
FROM t
GROUP BY A
),
CTEB AS (
SELECT COUNT(DISTINCT A) bc
FROM t
GROUP BY B
)
SELECT CONCAT(
CASE WHEN MAX(bc) = 1 THEN '1' ELSE 'many' END,
' to ',
CASE WHEN MAX(ac) = 1 THEN '1' ELSE 'many' END
) AS [A to B]
FROM CTEA
CROSS JOIN CTEB
Note that any time a relationship is listed as 1, it may actually be many but just not showing that because of limited data in the table.
Demo on dbfiddle
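As a rough way to try the CTE approach without SQL Server, here is a sketch against Python's built-in sqlite3. It uses the `||` operator in place of CONCAT and drops the bracketed alias, since those parts are SQL Server specific; the helper name `relationship` and the sample rows are hypothetical.

```python
import sqlite3

QUERY = """
    WITH CTEA AS (
        SELECT COUNT(DISTINCT B) AS ac FROM t GROUP BY A
    ),
    CTEB AS (
        SELECT COUNT(DISTINCT A) AS bc FROM t GROUP BY B
    )
    SELECT CASE WHEN MAX(bc) = 1 THEN '1' ELSE 'many' END
           || ' to ' ||
           CASE WHEN MAX(ac) = 1 THEN '1' ELSE 'many' END
    FROM CTEA
    CROSS JOIN CTEB
"""

def relationship(rows):
    """Classify the A-to-B relationship for a list of (A, B) pairs."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (A INTEGER, B INTEGER)")
    conn.executemany("INSERT INTO t VALUES (?, ?)", rows)
    result = conn.execute(QUERY).fetchone()[0]
    conn.close()
    return result

one_to_one = relationship([(1, 10), (2, 11)])
one_to_many = relationship([(1, 10), (1, 11), (2, 12)])
many_to_one = relationship([(1, 10), (2, 10)])
many_to_many = relationship([(1, 10), (1, 11), (2, 10)])
print(one_to_one, one_to_many, many_to_one, many_to_many, sep=" | ")
```

The classification only reflects the rows present, so a result of '1' on either side is a lower bound rather than a guarantee about the schema.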
Assuming you have no NULL values:
select (case when count(*) = count(distinct a) and
count(*) = count(distinct b)
then '1-1'
when count(*) = count(distinct a) or
count(*) = count(distinct b)
then '1-many'
else 'many-many'
end)
from t;
Note: This does not distinguish between 1-many for a-->b or b-->a.
You would use count and group by to get this information.
-- This gives, for every value of a, the number of distinct values of b that map to it. If at least one row has a count greater than 1, the mapping from a to b is one-to-many.
select a,count( distinct b)
from table
group by a
If every row has a count equal to one, then the mapping from a to b is one-to-one.
A caveat: null values in b are ignored in count expressions, so rows where b is null do not contribute to the count.
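The null caveat is easy to check with a small sketch in Python's built-in sqlite3. The table `t` and its rows are hypothetical; the point is only that COUNT(DISTINCT b) skips the null while COUNT(*) does not.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (a INTEGER, b INTEGER)")
# Hypothetical data: a = 1 maps to one real value and one null.
conn.executemany("INSERT INTO t VALUES (?, ?)", [(1, 10), (1, None), (2, 20)])

distinct_counts = conn.execute(
    "SELECT a, COUNT(DISTINCT b) FROM t GROUP BY a ORDER BY a"
).fetchall()
row_counts = conn.execute(
    "SELECT a, COUNT(*) FROM t GROUP BY a ORDER BY a"
).fetchall()

print(distinct_counts)  # the null for a = 1 is not counted
print(row_counts)       # COUNT(*) counts every row, nulls included
```

So a = 1 looks like a one-to-one mapping by the distinct count even though it has two rows; whether the null should count as a second value is a decision for whoever owns the data.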

Clean up 'duplicate' data while preserving most recent entry

I want to display each crew member, basic info, and the most recent start date from their contracts. With my basic query, it returns a row for each contract, duplicating the basic info with a distinct start and end date.
I only need one row per person, with the latest start date (or null if they have never yet had a start date).
I have limited understanding of group by and partition functions. Queries I have reverse engineered for similar data use partition and create temp tables that they select from. Ultimately I could reuse that, but it seems more convoluted than what we need.
select
Case when P01.EMPLOYMENTENDDATE < getdate() then 'Y'
else ''
end as "Deactivate",
concat(p01.FIRSTNAME,' ',p01.MIDDLENAME) as "First and Middle",
p01.LASTNAME,
p01.PIN,
(select top 1 TELENO FROM PW001P0T WHERE PIN = P01.PIN and TELETYPE = 6 ORDER BY TELEPRIORITY) as "EmailAddress",
org.NAME AS Vessel,
case
WHEN c02.CODECATEGORY= '20' then 'MARINE'
WHEN c02.CODECATEGORY= '10' then 'MARINE'
ELSE 'HOTEL' end as "Department",
c02.name as RankName,
c02.Alternative RankCode,
convert(varchar, ACT.DATEFROM,101) EmbarkDate,
convert(varchar,(case when ACT.DATEFROM is null then p03.TODATEESTIMATED else ACT.DATEFROM end),101) DebarkDate
FROM PW001P01 p01
JOIN PW001P03 p03
ON p03.PIN = p01.PIN
LEFT JOIN PW001C02 c02
ON c02.CODE = p03.RANK
/*LEFT JOIN PW001C02 CCIRankTbl
ON CCIRankTbl.CODE = p01.RANK*/
LEFT JOIN PWORG org
ON org.NUMORGID = dbo.ad_scanorgtree(p03.NUMORGID, 3)
LEFT JOIN PWORGVESACT ACT
ON ACT.numorgid=dbo.ad_scanorgtree(p03.numorgid,3)
where P01.EMPLOYMENTENDDATE > getdate()-10 or P01.EMPLOYMENTENDDATE is null
I only need to show one row per person. The first 5 columns will always be the same. The last columns depend on the contract, and we just need data from the most recent one.
<table><tbody><tr><th>Deactivate</th><th>First and Middle</th><th>Lastname</th><th>PIN</th><th>Email</th><th>Vessel</th><th>Department</th><th>Rank</th><th>RankCode</th><th>Embark</th><th>Debark</th></tr><tr><td> </td><td>Martin</td><td>Smith</td><td>123</td><td>msmith#fake.com</td><td>Ship1</td><td>Marine</td><td>ViceCaptain</td><td>VICE</td><td>9/1/2008</td><td>9/20/2008</td></tr><tr><td> </td><td>Matin</td><td>Smith</td><td>123</td><td>msmith#fake.com</td><td>Ship2</td><td>Marine</td><td>Captain</td><td>CAP</td><td>12/1/2008</td><td>12/20/2008</td></tr><tr><td> </td><td>Steve Mark</td><td>Dude</td><td>98765</td><td>sdude#fake.com</td><td>Ship1</td><td>Hotel</td><td>Chef</td><td>CHEF</td><td>5/1/2009</td><td>8/1/2009</td></tr><tr><td> </td><td>Steve Mark</td><td>Dude</td><td>98765</td><td>sdude#fake.com</td><td>Ship3</td><td>Hotel</td><td>Chef</td><td>CHEF</td><td>10/1/2010</td><td>12/20/2010</td></tr></tbody></table>
Change your query to a SELECT DISTINCT on the main query and use a sub-select for DebarkDate column:
(SELECT TOP 1 A.DATEFROM FROM PWORGVESACT A WHERE A.numorgid = ACT.numorgid ORDER BY A.DATEFROM DESC) AS DebarkDate
You can do whatever conversions on the date you need to from the result of that sub-query.
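The sub-select idea can be sketched on a much smaller schema. The example below uses Python's built-in sqlite3, so it swaps SQL Server's TOP 1 for SQLite's LIMIT 1; the tables `crew` and `contracts`, their columns, and the sample rows are all hypothetical stand-ins for the real PW001 tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE crew (pin INTEGER, lastname TEXT);
    CREATE TABLE contracts (pin INTEGER, start_date TEXT);
    -- Hypothetical data: two contracts each for 123 and 98765, none for 555.
    INSERT INTO crew VALUES (123, 'Smith'), (98765, 'Dude'), (555, 'Nocontract');
    INSERT INTO contracts VALUES
        (123, '2008-09-01'), (123, '2008-12-01'),
        (98765, '2009-05-01'), (98765, '2010-10-01');
""")

query = """
    SELECT c.pin,
           c.lastname,
           -- Correlated sub-select: newest start date for this person only.
           (SELECT k.start_date
            FROM contracts k
            WHERE k.pin = c.pin
            ORDER BY k.start_date DESC
            LIMIT 1) AS latest_start
    FROM crew c
    ORDER BY c.pin
"""
latest = conn.execute(query).fetchall()
for row in latest:
    print(row)
```

Because the contracts table is never joined in the outer query, each person yields exactly one row, and someone with no contract gets a null start date rather than disappearing.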

SQL null spaces in calculated columns

I have created a calculated column but it is giving me a row with null value. If I add another calculated field, it adds 2 null rows, and so on.
My objective is to get a single row with a single value. No nulls.
The code:
SELECT
CLIENT_CODE,
( CASE WHEN CLITBP.TBPCODIGO=101 THEN COALESCE( CLITBP.TBPDESC2,0) ELSE NULL END) TAB101
FROM
CLIENT
GROUP BY 1,2
the wrong output
the intended output
If you want one row per client code, then you should have only one key in the GROUP BY. Perhaps this is what you want:
SELECT CLIENT_CODE,
MAX(CASE WHEN CLITBP.TBPCODIGO = 101 THEN COALESCE(CLITBP.TBPDESC2, 0) END) as TAB101
FROM CLIENT
GROUP BY CLIENT_CODE;

Need a count of one column if i am having another column values multiple

I have this query:
select qos.orgname, qos.org, qos.suborg, qos.Archive, qos.location, count(c.coe) AS DEPT, c.coe AS DEP,
qos.siteid, qos.admin as sitelead,
CASE When qos.Archive = 0 THEN 'Active'
when qos.Archive is null THEN '-'
ELSE 'Archived'
END AS STATUS
from qryOrgsite qos WITH (NOLOCK)
LEFT JOIN ltbcoe c WITH (NOLOCK) on qos.orgname = c.orgname and qos.location= c.location
group by qos.orgname, qos.location, qos.org, qos.suborg, qos.Archive, c.coe,
qos.siteid, qos.ADMIN
This gives me some records as follows:
So I want the count of the "Dept" column for the rows which are active. I mean it should return only one row, with Organization B and Dept as 7; e.g. here the Dept column should be 7.
That means I want the count of the c.coe column.
The problem here is your GROUP BY is too inclusive. What the query is asking for is a count, but the results have to be unique by all of the columns in your GROUP BY. If you only want a count per orgname, you will need to do
SELECT qos.orgname, COUNT(*)
FROM qryOrgsite qos
GROUP BY qos.orgname
This essentially says that you want to count all rows by the orgname. Each column you add to the group by creates unique combinations for your COUNT. For example, if you grouped by orgname and location it would give you a roll up count for each combination of those two columns. Based on the data you show above this would result in
OrganizationB Demo-Fixe 1
OrganizationB GE CapitalP 3
OrganizationB Hadasa Plant 1
OrganizationB Mostoles Plant 1
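The effect of each extra GROUP BY column can be seen in a small sketch using Python's built-in sqlite3. The table `sites` and its rows are hypothetical and are not the data from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sites (orgname TEXT, location TEXT)")
# Hypothetical data: OrgB has three rows spread over two locations.
conn.executemany(
    "INSERT INTO sites VALUES (?, ?)",
    [("OrgB", "Plant1"), ("OrgB", "Plant2"), ("OrgB", "Plant2"), ("OrgA", "Plant1")],
)

# One key in the GROUP BY: one row (and one count) per orgname.
by_org = conn.execute(
    "SELECT orgname, COUNT(*) FROM sites GROUP BY orgname ORDER BY orgname"
).fetchall()

# Two keys: one row per (orgname, location) combination, so the counts split.
by_org_location = conn.execute(
    """SELECT orgname, location, COUNT(*)
       FROM sites
       GROUP BY orgname, location
       ORDER BY orgname, location"""
).fetchall()

print(by_org)
print(by_org_location)
```

The per-location counts for an organization always add up to its single-key count, which is why removing columns from the GROUP BY is what produces the one total the question asks for.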
You can wrap your query in another:
select orgname, count(*)
from (
select qos.orgname, qos.org, qos.suborg, qos.Archive, qos.location, count(c.coe) AS DEPT, c.coe AS DEP,
qos.siteid, qos.admin as sitelead,
CASE When qos.Archive = 0 THEN 'Active'
when qos.Archive is null THEN '-'
ELSE 'Archived'
END AS STATUS
from qryOrgsite qos WITH (NOLOCK)
LEFT JOIN ltbcoe c WITH (NOLOCK) on qos.orgname = c.orgname and qos.location= c.location
group by qos.orgname, qos.location, qos.org, qos.suborg, qos.Archive, c.coe,
qos.siteid, qos.ADMIN) t1
where t1.orgname = 'Organization B' and t1.STATUS = 'Active'
group by t1.orgname
Guys, I have got the answer.
I had to remove the department name from the GROUP BY and the SELECT,
because the count(c.coe) didn't have any effect.
Thanks for all your help.