Transpose only certain data in SQL - sql

My data looks like this:
Company  Year        Total  Comment
Comp A   01-01-2000  5,000  Checked
Comp A   01-01-2001  6,000  Checked
Comp B   05-05-2007  3,000  Not checked completely
Comp B   05-05-2008  4,000  Checked
Comp C   18-01-2003  1,500  Not checked completely
Comp C   18-01-2002  3,500  Not checked completely
I've been asked to transpose certain data so that it looks like this, but I'm not sure this can be done in SQL (Server):
Company  Base Date   Base Date-1  Comment Base Date       Comment Base Date-1
Comp A   01-01-2001  01-01-2000   Checked                 Checked
Comp B   05-05-2008  05-05-2007   Checked                 Not completely checked
Comp C   18-01-2003  18-01-2002   Not completely checked  Not completely checked
I have never built anything like this. Would Excel be a better alternative? How should I tackle this?
Is it possible using SELECT MAX(Base Date) and MIN(Base Date)? And how would I then handle the comment strings?

You can use a self join to do this. However, you should think about dates like February 29 as they only occur in leap years.
select t1.company,t1.year as basedate,t2.year as basedate_1,
t1.comment as comment_basedate,t2.comment as comment_basedate_1
from t t1
left join t t2 on t1.company=t2.company and dateadd(year,1,t2.year)=t1.year
Change the left join to an inner join if you only need results where both the date values exist for a company. This solution assumes there can only be one comment per day.

I'd assign a row number to each record, partitioned by company and ordered by year descending, through an analytic function in a common table expression, then use a left self join on the row number + 1 and company.
This assumes you only want 1 record per company using the 2 most recent years, and that if only 1 record exists for a company, null values are acceptable for the second year. If not, we can change the left join to an inner join, which drops companies that have only one record.
We use a common table expression (though an inline view would work as well) to assign a row number to each record. That value is then made available in our self join so we don't have to worry about different dates and max values. We then use our RowNumber (RN) and company to join the 2 desired records together. To save on some performance we limit 1 table to RN 1 and the second table to RN 2.
WITH CTE AS (
SELECT *, Row_Number() over (Partition by Company Order by Year Desc) RN FROM TABLE)
SELECT A.Company
, A.Year as Base_Date
, B.Year as Base_Date1
, A.comment as Base_Date_Comment
, B.Comment as Base_Date1_Comment
FROM CTE A
LEFT JOIN CTE B
on A.RN+1 = B.RN
and A.Company = B.Company
and B.RN = 2
WHERE A.RN = 1
Note the limit on RN=2 must be on the join since it's an outer join or we would eliminate the companies without 2 years. (in essence making the left join an inner)
This approach makes all columns of the data available for each row.
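To check the row-number approach against the sample data from the question, here is a minimal sketch that runs the same kind of query through Python's built-in sqlite3 module. The table name company_totals, the ISO date strings, and the use of SQLite (3.25 or later, for window functions) instead of SQL Server are all assumptions made for illustration.

```python
import sqlite3

# Sample rows from the question. Dates are rewritten as ISO strings (an
# assumption) so that plain text ordering matches chronological ordering.
ROWS = [
    ("Comp A", "2000-01-01", 5000, "Checked"),
    ("Comp A", "2001-01-01", 6000, "Checked"),
    ("Comp B", "2007-05-05", 3000, "Not checked completely"),
    ("Comp B", "2008-05-05", 4000, "Checked"),
    ("Comp C", "2003-01-18", 1500, "Not checked completely"),
    ("Comp C", "2002-01-18", 3500, "Not checked completely"),
]

# company_totals is a hypothetical table name standing in for the real table.
QUERY = """
WITH cte AS (
    SELECT company, year, comment,
           ROW_NUMBER() OVER (PARTITION BY company ORDER BY year DESC) AS rn
    FROM company_totals
)
SELECT a.company, a.year, b.year, a.comment, b.comment
FROM cte a
LEFT JOIN cte b
       ON b.company = a.company
      AND b.rn = 2          -- kept in the ON clause so the join stays an outer join
WHERE a.rn = 1
ORDER BY a.company
"""


def transpose_latest_two(rows):
    """Return one row per company: latest date, previous date, and both comments."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE company_totals "
        "(company TEXT, year TEXT, total INTEGER, comment TEXT)"
    )
    conn.executemany("INSERT INTO company_totals VALUES (?, ?, ?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


result = transpose_latest_two(ROWS)
for row in result:
    print(row)
```

A company with a single record still appears, with NULL (None in Python) in the second date and comment columns, which is the behaviour the left join is meant to give.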

If there are only two rows per company, then that's pretty simple. If there are more than two rows, you could do something like this: essentially join all rows, then make sure A represents the latest row (the base date) and B represents the earliest row.
SELECT A.Company, A.Year AS [Base Date], B.Year AS [Base Date 1],
A.Comment AS [Comment Base Date], B.Comment AS [Comment Base Date 1]
FROM MyTable A
INNER JOIN MyTable B ON A.Company = B.Company
WHERE A.Year = (SELECT MAX(C.YEAR) FROM MyTable C WHERE C.Company = A.Company)
AND B.Year = (SELECT MIN(C.YEAR) FROM MyTable C WHERE C.Company = B.Company)
There might be a more efficient way to do this with Row_Number or something.

Related

LEFT JOIN by closer value condition

I have this query
SELECT
loc.proceso,
loc.codigo_municipio,
loc.codigo_concejo,
loc.concejo,
(CASE
WHEN loc.poblacion IS NOT NULL THEN loc.poblacion
ELSE pob.valor
END) AS poblacion
FROM develop.031401_elecciones_dimension_localizacion_electoral AS loc
LEFT JOIN develop.031401_elecciones_dimension_proceso_electoral AS proc
ON loc.proceso = proc.proceso
LEFT JOIN develop.020101_t05 AS pob
ON loc.codigo_municipio = CAST(pob.cmun AS INT) AND pob.year = proc.anno_eleccion
In the second LEFT JOIN, I would like to change the second condition pob.year = proc.anno_eleccion so that it does not only search for the exact year when joining. Instead, I would like to get the closest year stored in my pob table. For example, the first year stored in pob is 2003, so I want all the entries in loc whose year is lower than 2003 to be matched with that value when performing the join. Conversely, the last year stored in pob is 2020, so I want those entries in loc whose year is 2021 (or even greater) to be matched with the 2020 row from my pob table. When the exact year is contained in the pob table, it should be used for the join.
1. If you want the nearest year to NOW
I can't think of a direct join, but you can use the ROW_NUMBER() function to sort the data by year and pick the first result to join:
(WHERE rn = 1 keeps only the first row per municipality, so it prevents duplicates)
LEFT JOIN
(SELECT T.* FROM
(SELECT ROW_NUMBER() OVER (PARTITION BY t05.cmun ORDER BY t05.year DESC) AS rn,
t05.*
FROM develop.020101_t05 AS t05) AS T
WHERE rn = 1) AS pob
ON loc.codigo_municipio = CAST(pob.cmun AS INT) -- no year condition: pob now holds only the latest year per municipality
2. If you want the nearest year to your data
Even though it's not best practice, you can join your data using comparison operators in the join condition. Then take the difference between the two years, sort the difference ascending, and pick the first result using the ROW_NUMBER() function. See example:
SELECT * FROM (
SELECT ROW_NUMBER() OVER (PARTITION BY a.Id ORDER BY a.Year - b.Year) AS RowNumber,
a.Id,
a.Year,
b.Year,
a.Year - b.Year AS YearDiff
FROM a
LEFT JOIN b ON a.Id = b.Id AND a.Year >= b.Year) AS T
WHERE RowNumber = 1
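The example above only looks at earlier years (a.Year >= b.Year). A possible variant that covers both directions, as the question asks, is to join on the id alone and rank by the absolute year difference. The sketch below runs that idea through Python's sqlite3 module; the tables elections and pob, their columns, and the tie-break on the earlier year are assumptions for illustration, and SQLite 3.25 or later is assumed for window functions.

```python
import sqlite3

# elections(id, year) and pob(id, year, valor) are hypothetical stand-ins
# for the loc/proc and pob tables in the question.
QUERY = """
SELECT id, election_year, pob_year, valor
FROM (
    SELECT e.id   AS id,
           e.year AS election_year,
           p.year AS pob_year,
           p.valor AS valor,
           ROW_NUMBER() OVER (
               PARTITION BY e.id, e.year
               ORDER BY ABS(e.year - p.year), p.year  -- closest year first
           ) AS rn
    FROM elections e
    LEFT JOIN pob p ON p.id = e.id
) t
WHERE rn = 1
ORDER BY id, election_year
"""


def nearest_year_join(election_rows, pob_rows):
    """Match every (id, election year) with the pob row whose year is closest."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE elections (id INTEGER, year INTEGER)")
    conn.execute("CREATE TABLE pob (id INTEGER, year INTEGER, valor INTEGER)")
    conn.executemany("INSERT INTO elections VALUES (?, ?)", election_rows)
    conn.executemany("INSERT INTO pob VALUES (?, ?, ?)", pob_rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


nearest = nearest_year_join(
    election_rows=[(1, 1999), (1, 2010), (1, 2014), (1, 2023), (2, 2015)],
    pob_rows=[(1, 2003, 100), (1, 2010, 110), (1, 2020, 120)],
)
for row in nearest:
    print(row)
```

Years before the first stored year fall back to the first row, years after the last stored year fall back to the last row, and an exact match wins because its difference is 0. Ids with no pob rows at all are kept with NULL values because of the left join.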

Oracle sql query that returns customers who are coming for the first time in property

I have 3 tables: customer, property and stays. Table customer contains all the data about customers (customer_id, name, surname, email...). Table property contains the list of all the properties (property_id, property_name...) and table stays contains all the earlier stays of customers (customer_id, property_id, stay_id, arrival_date, departure_date...).
Some customers stayed multiple times in more than one property, and some customers are coming for the first time.
Can someone please explain an Oracle SQL query that returns only the customers who are staying in any of the properties for the first time?
Sorry guys for answering late..
This is what I got so far.
Tables:
Customer $A$
Stays $B$
Property $C$
Customer_Fragments $D$
SELECT a.RIID_,
sub.CUSTOMER_ID_,
sub.PROPERTY_ID,
b.ARRIVAL_DATE,
c.PROPERTY_SEGMENT,
ROW_NUMBER() OVER (
PARTITION BY sub.CUSTOMER_ID_,
sub.PROPERTY_ID
ORDER BY
sub.CUSTOMER_ID_ asc
) RN
FROM
(
SELECT
b.CUSTOMER_ID_,
b.PROPERTY_ID
FROM
$B$ b
GROUP BY
b.CUSTOMER_ID_,
b.PROPERTY_ID
HAVING
COUNT(*)= 1
) sub
INNER JOIN $A$ a ON sub.CUSTOMER_ID_ = a.CUSTOMER_ID_
INNER JOIN $B$ b ON sub.CUSTOMER_ID_ = b.CUSTOMER_ID_
INNER JOIN $C$ c ON sub.PROPERTY_ID = c.PROPERTY_ID
LEFT JOIN $D$ d ON a.RIID_ = d.RIID_
WHERE
b.ARRIVAL_DATE = TRUNC(SYSDATE + 7)
AND c.PROPERTY_DESTINATION = 'Destination1'
AND lower(c.NAME_) NOT LIKE ('unknown%')
AND a.NWL_PERMISSION_STATUS = 'I'
AND a.EMAIL_DOMAIN_ NOT IN ('abuse.com', 'guest.booking.com')
AND (d.BLACKLISTED != 'Y'
or d.BLACKLISTED is null
)
I want to select all customers who will come to Destination1 7 days from today, to inform them about some activities. Customers can book several properties in Destination1 and have the same arrival date (example: I can book a room in property1 for me and my wife and also book a room in property2 for my friends, and we all come to Destination1 on the same arrival date).
When this is the case I want to send just one info email to a customer and not two emails. The above SQL query returns two rows when this is the case and I want it to return just one row (one row = one email).
It is always required to post the code you have already tried to write, so that we can then help you with any mistakes.
That being said, I'll try to help you nevertheless by writing the full code.
Try this:
select cus.*
from customer cus
join stays st
on st.customer_id = cus.customer_id
where st.arrival_date >= YOUR_DATE --for example SYSDATE or TRUNC(SYSDATE)
and 1 = (select count(*)
from stays st2
where st2.customer_id = cus.customer_id)
You haven't specified it, but I GUESS that you are interested in getting the first-time-customers whose arrival date will be at or after some specified date. I've written that into my code in WHERE clause, where you should input such a date.
If you remove that part of the WHERE clause, you'll get all the customers that stayed just once (even if that one and only stay was 10 years ago, for example). What's more, if you remove that part of the code, you can then also remove the join on the stays st table from the query, as the only reason for that join was the need to have access to the arrival date.
If you need explanations for some other parts of the code too, ask in the comments for this answer.
I hope I helped!
WITH first_stays (customer_id, property_id, first_time_arrival_date, stay_no) AS
(
SELECT s.customer_id
, s.property_id
, s.arrival_date AS first_time_arrival_date
, ROW_NUMBER() OVER (PARTITION BY s.customer_id, s.property_id ORDER BY s.arrival_date) stay_no
FROM stays s
)
SELECT c.surname AS customer_surname
, c.name AS customer_name
, p.property_name
, s.first_time_arrival_date
FROM customer c
INNER
JOIN first_stays s
ON s.customer_id = c.customer_id
AND s.stay_no = 1
INNER
JOIN property p
ON p.property_id = s.property_id
The first part WITH first_stays is a CTE (Common Table Expression, called subquery factoring in Oracle) that will number the stays for each pair (customer_id, property_id) ordered by the arrival date using a window function ROW_NUMBER(). Then just join those values to the customer and property tables to get their names or whatever, and apply stay_no = 1 (first stay) condition.
If I understand the question correctly, this is not a complicated query:
select c.*
from customer c join
(select s.customer_id
from stays s
group by s.customer_id
having count(*) = 1
) s
on s.customer_id = c.customer_id;
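For the follow-up requirement in the question (one row, and so one email, per customer even when two first-time bookings share an arrival date), one possible approach is to number the qualifying rows per customer and keep only the first. The sketch below shows the idea with Python's sqlite3 module; the reduced stays table, the arrival date passed as a parameter instead of TRUNC(SYSDATE + 7), and the choice to keep the lowest property_id are assumptions for illustration, not the asker's real schema.

```python
import sqlite3

QUERY = """
WITH first_visits AS (
    -- (customer, property) pairs with exactly one stay on record
    SELECT customer_id, property_id, MIN(arrival_date) AS arrival_date
    FROM stays
    GROUP BY customer_id, property_id
    HAVING COUNT(*) = 1
),
ranked AS (
    SELECT customer_id, property_id, arrival_date,
           ROW_NUMBER() OVER (
               PARTITION BY customer_id   -- one numbering per customer, not per property
               ORDER BY property_id
           ) AS rn
    FROM first_visits
    WHERE arrival_date = :arrival
)
SELECT customer_id, property_id, arrival_date
FROM ranked
WHERE rn = 1
ORDER BY customer_id
"""


def one_row_per_customer(stay_rows, arrival):
    """Return at most one first-time stay per customer for the given arrival date."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE stays (customer_id INTEGER, property_id INTEGER, arrival_date TEXT)"
    )
    conn.executemany("INSERT INTO stays VALUES (?, ?, ?)", stay_rows)
    result = conn.execute(QUERY, {"arrival": arrival}).fetchall()
    conn.close()
    return result


emails = one_row_per_customer(
    stay_rows=[
        (1, 10, "2024-06-08"),  # customer 1 books two properties, same arrival date
        (1, 20, "2024-06-08"),
        (2, 10, "2023-01-01"),  # customer 2 has stayed at property 10 before
        (2, 10, "2024-06-08"),
        (3, 20, "2024-06-08"),  # customer 3 is a plain first-time guest
        (4, 10, "2024-07-01"),  # customer 4 arrives on a different date
    ],
    arrival="2024-06-08",
)
for row in emails:
    print(row)
```

The key difference from the query in the question is that the row number is partitioned by customer only, so filtering on rn = 1 leaves a single row per customer.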

Why is my Left Not pulling all the dates even though they exist on the other table SQL

I have a table without dates and wish to join it to a table with dates. I am doing a left join on id and bn_number. An id can have more than one date, and I want the latest date from the other table. I am not sure how to get all the dates; if I could at least get them all, I would be able to choose the latest one.
select Reg_Property_id,a.Bnd_nbr,account_balance,abs(account_balance) as Bond_Balance,a.Bnd_regDate
into #Jan2014ValidFin
from #Jan2014Valid aa
left join Pr_analytics..bond a
on aa.Reg_Property_id=a.Prop_id
and aa.bnd_nbr=a.Bnd_nbr
where aa.reg_property_id is not null
Please assist.
Use the ROW_NUMBER() window function to get the most recent date:
SELECT c.*
FROM (
SELECT a.cols, b.cols, ROW_NUMBER() OVER (PARTITION BY a.col1, a.col2 ORDER BY b.theDate DESC) AS rn
FROM a
LEFT OUTER JOIN b ON a.col1 = b.col1
AND a.col2 = b.col2
) c
WHERE c.rn = 1
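A minimal runnable sketch of this pattern, using Python's sqlite3 module, is shown below. The tables valid and bond and their columns are simplified, hypothetical stand-ins for #Jan2014Valid and Pr_analytics..bond. The sketch partitions by the left table's keys, which is an assumption on my part, so that left rows with no match each stay in their own partition instead of being grouped together under NULL keys. SQLite 3.25 or later is assumed for window functions.

```python
import sqlite3

QUERY = """
SELECT prop_id, bnd_nbr, balance, reg_date
FROM (
    SELECT v.prop_id  AS prop_id,
           v.bnd_nbr  AS bnd_nbr,
           v.balance  AS balance,
           b.reg_date AS reg_date,
           ROW_NUMBER() OVER (
               PARTITION BY v.prop_id, v.bnd_nbr   -- keys of the left table
               ORDER BY b.reg_date DESC            -- latest date first
           ) AS rn
    FROM valid v
    LEFT JOIN bond b
           ON b.prop_id = v.prop_id
          AND b.bnd_nbr = v.bnd_nbr
) t
WHERE rn = 1
ORDER BY prop_id
"""


def latest_bond_dates(valid_rows, bond_rows):
    """Return every left-table row with the most recent matching date, if any."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE valid (prop_id INTEGER, bnd_nbr TEXT, balance INTEGER)")
    conn.execute("CREATE TABLE bond (prop_id INTEGER, bnd_nbr TEXT, reg_date TEXT)")
    conn.executemany("INSERT INTO valid VALUES (?, ?, ?)", valid_rows)
    conn.executemany("INSERT INTO bond VALUES (?, ?, ?)", bond_rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


latest = latest_bond_dates(
    valid_rows=[(1, "B1", -500), (2, "B2", 300), (3, "B3", 0)],
    bond_rows=[(1, "B1", "2013-01-01"), (1, "B1", "2014-01-15")],
)
for row in latest:
    print(row)
```

Property 1 gets its most recent date, while properties 2 and 3, which have no rows in bond, are both kept with a NULL date.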
A simple group by should do the trick:
SELECT
Reg_Property_id -- What table is this from?
,a.Bnd_nbr
,account_balance -- What table is this from?
,abs(account_balance) as Bond_Balance -- What table is this from?
,max(a.Bnd_regDate) as Bnd_regDate
into #Jan2014ValidFin
from #Jan2014Valid aa
left join Pr_analytics..bond a
on aa.Reg_Property_id = a.Prop_id
and aa.bnd_nbr = a.Bnd_nbr
where aa.reg_property_id is not null
group by
Reg_Property_id
,a.Bnd_nbr
,account_balance
,abs(account_balance)
Note that if there are no dates (a.Bnd_regDate), you will get NULL
Note also that if any of the values marked "what table is this from" are found in #Jan2014Valid, you will need to either aggregate them (max, sum, etc.) or include them in the group by clause--I can't tell which, from the information provided.

SQL Get aggregate as 0 for non existing row using inner joins

I am using SQL Server to query these three tables that look like (there are some extra columns but not that relevant):
Customers -> Id, Name
Addresses -> Id, Street, StreetNo, CustomerId
Sales -> AddressId, Week, Total
And I would like to get the total sales per week and customer (showing at the same time the address details). I have come up with this query
SELECT a.Name, b.Street, b.StreetNo, c.Week, SUM (c.Total) as Total
FROM Customers a
INNER JOIN Addresses b ON a.Id = b.CustomerId
INNER JOIN Sales c ON b.Id = c.AddressId
GROUP BY a.Name, c.Week, b.Street, b.StreetNo
and even if my SQL skills are close to none, it looks like it's doing its job. But now I would like to show 0 whenever a customer doesn't have sales for a particular week (weeks are just integers). I wonder if I should somehow get the distinct values of the weeks in the Sales table and then loop through them (not sure how).
Any help?
Thanks
Use CROSS JOIN to generate the rows for all customers and weeks. Then use LEFT JOIN to bring in the data that is available:
SELECT c.Name, a.Street, a.StreetNo, w.Week,
COALESCE(SUM(s.Total), 0) as Total
FROM Customers c CROSS JOIN
(SELECT DISTINCT s.Week FROM sales s) w LEFT JOIN
Addresses a
ON c.Id = a.CustomerId LEFT JOIN
Sales s
ON s.week = w.week AND s.AddressId = a.Id
GROUP BY c.Name, a.Street, a.StreetNo, w.Week;
Using table aliases is good, but the aliases should be abbreviations for the table names. So, a for Addresses not Customers.
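To see the CROSS JOIN plus LEFT JOIN idea produce zero rows for missing weeks, here is a small sketch run through Python's sqlite3 module. The column names follow the schema in the question (Customers.Id, Addresses.CustomerId, Sales.AddressId); the sample data and the use of SQLite instead of SQL Server are assumptions for illustration.

```python
import sqlite3

QUERY = """
SELECT c.name, a.street, w.week, COALESCE(SUM(s.total), 0) AS total
FROM customers c
CROSS JOIN (SELECT DISTINCT week FROM sales) w   -- every customer x every week
LEFT JOIN addresses a ON a.customer_id = c.id
LEFT JOIN sales s     ON s.week = w.week AND s.address_id = a.id
GROUP BY c.name, a.street, w.week
ORDER BY c.name, w.week
"""


def weekly_totals(customers, addresses, sales):
    """Return one row per customer address and week, with 0 where no sales exist."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customers (id INTEGER, name TEXT)")
    conn.execute("CREATE TABLE addresses (id INTEGER, street TEXT, customer_id INTEGER)")
    conn.execute("CREATE TABLE sales (address_id INTEGER, week INTEGER, total INTEGER)")
    conn.executemany("INSERT INTO customers VALUES (?, ?)", customers)
    conn.executemany("INSERT INTO addresses VALUES (?, ?, ?)", addresses)
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", sales)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


totals = weekly_totals(
    customers=[(1, "Alice"), (2, "Bob")],
    addresses=[(1, "Main St", 1), (2, "Oak Ave", 2)],
    sales=[(1, 1, 10), (1, 1, 5), (2, 2, 7)],  # Alice sells in week 1, Bob in week 2
)
for row in totals:
    print(row)
```

Because the week list comes from SELECT DISTINCT on the sales table, a week in which nobody sold anything would still be missing from the output.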
You should generate the week numbers rather than using DISTINCT. This is better in terms of performance and reliability. Then use a LEFT JOIN on the Sales table instead of an INNER JOIN:
SELECT a.Name
,b.Street
,b.StreetNo
,weeks.[Week]
,COALESCE(SUM(c.Total),0) as Total
FROM Customers a
INNER JOIN Addresses b ON a.Id = b.CustomerId
CROSS JOIN (
-- Generate a sequence of 52 integers (13 x 4)
SELECT ROW_NUMBER() OVER (ORDER BY a.x) AS [Week]
FROM (VALUES(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) a(x)
CROSS JOIN (SELECT x FROM (VALUES(1),(1),(1),(1)) b(x)) b
) weeks
LEFT JOIN Sales c ON b.Id = c.AddressId AND c.[Week] = weeks.[Week]
GROUP BY a.Name
,b.Street
,b.StreetNo
,weeks.[Week]
Please try the following...
SELECT Name,
Street,
StreetNo,
Week,
SUM( CASE
WHEN Total IS NULL THEN
0
ELSE
Total
END ) AS Total
FROM Customers a
JOIN Addresses b ON a.Id = b.CustomerId
RIGHT JOIN Sales c ON b.Id = c.AddressId
GROUP BY a.Name,
c.Week,
b.Street,
b.StreetNo;
I have modified your statement in three places. The first is I changed your join to Sales to a RIGHT JOIN. This will join as it would with an INNER JOIN, but it will also keep the records from the table on the right side of the JOIN that do not have a matching record or group of records on the left, placing NULL values in the resulting dataset's fields that would have come from the left of the JOIN. A LEFT JOIN works in the same way, but with any extra records in the table on the left being retained.
I have removed the word INNER from your surviving INNER JOIN. Where JOIN is not preceded by a join type, an INNER JOIN is performed. Both JOIN and INNER JOIN are considered correct, but the prevailing protocol seems to be to leave the INNER out, where the RDBMS allows it to be left out (which SQL-Server does). Which you go with is still entirely up to you - I have left it out here for illustrative purposes.
The third change is that I have added a CASE statement that tests to see if the Total field contains a NULL value, which it will if there were no sales for that Customer for that Week. If it does then SUM() would return a NULL, so the CASE statement returns a 0 instead. If Total does not contain a NULL value, then the SUM() of all values of Total for that grouping is performed.
Please note that I am assuming that Total will not have any NULL values other than from the RIGHT JOIN. Please advise me if this assumption is incorrect.
Please also note that I have assumed that either there will be no missing Weeks for a Customer in the Sales table or that you are not interested in listing them if there are. Again, please advise me if this assumption is incorrect.
If you have any questions or comments, then please feel free to post a Comment accordingly.

Joining 3 tables on 2 columns?

I've created 3 views with identical columns- Quantity, Year, and Variety. I want to join all three tables on year and variety in order to do some calculations with quantities.
The problem is that a particular year/variety combo does not occur on every view.
I've tried queries like :
SELECT
*
FROM
a
left outer join
b
on a.variety = b.variety
left outer join
c
on a.variety = c.variety or b.variety = c.variety
WHERE
a.year = '2015'
and b.year = '2015'
and a.year= '2015'
Obviously this isn't the right solution. Ideally I'd like to join on both year and variety and not use a where statement at all.
The desired output would be put all quantities of matching year and variety on the same line, regardless of null values on a table.
I really appreciate the help, thanks.
You want a full outer join, not a left join, like so:
Select coalesce(a.year, b.year, c.year) as Year
, coalesce(a.variety, b.variety, c.variety) as Variety
, a.Quantity, b.Quantity, c.Quantity
from tableA a
full outer join tableB b
on a.variety = b.variety
and a.year = b.year
full outer join tableC c
on isnull(a.variety, b.variety) = c.variety
and isnull(a.year, b.year) = c.year
where coalesce(a.year, b.year, c.year) = 2015
The left join you are using won't pick up values from b or c that aren't in a. Additionally, your where clause is dropping rows that don't have values in all three tables (because the year in those rows is null, which is not equal to 2015). The full outer join will grab rows from either table in the join, regardless of whether the other table contains a match.
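The same result shape can be checked with a small runnable sketch in Python's sqlite3 module. Since FULL OUTER JOIN is only available in newer SQLite versions, the sketch swaps in a different technique: it collects every (year, variety) key with UNION and then LEFT JOINs each view to that key list. This is not the query above, only an equivalent for illustration, and it assumes each view has at most one row per year and variety. The names view_a, view_b, view_c, and all_keys are hypothetical.

```python
import sqlite3

QUERY = """
WITH all_keys AS (
    -- every (year, variety) pair that appears in at least one view
    SELECT year, variety FROM view_a
    UNION
    SELECT year, variety FROM view_b
    UNION
    SELECT year, variety FROM view_c
)
SELECT k.year, k.variety, a.quantity, b.quantity, c.quantity
FROM all_keys k
LEFT JOIN view_a a ON a.year = k.year AND a.variety = k.variety
LEFT JOIN view_b b ON b.year = k.year AND b.variety = k.variety
LEFT JOIN view_c c ON c.year = k.year AND c.variety = k.variety
WHERE k.year = 2015
ORDER BY k.variety
"""


def combine_views(rows_a, rows_b, rows_c):
    """Put the quantities of all three views on one line per (year, variety)."""
    conn = sqlite3.connect(":memory:")
    for name, rows in (("view_a", rows_a), ("view_b", rows_b), ("view_c", rows_c)):
        # Plain tables stand in for the three views from the question.
        conn.execute(f"CREATE TABLE {name} (quantity INTEGER, year INTEGER, variety TEXT)")
        conn.executemany(f"INSERT INTO {name} VALUES (?, ?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


combined = combine_views(
    rows_a=[(10, 2015, "Fuji"), (4, 2014, "Fuji")],
    rows_b=[(20, 2015, "Gala")],
    rows_c=[(30, 2015, "Fuji"), (5, 2015, "Gala")],
)
for row in combined:
    print(row)
```

Filtering on the key list's year instead of each view's own year keeps rows where one or two of the views have no match, which is the same reason the answer filters on the COALESCE expression.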