How to find missing data in table Sql - sql

This is similar to How to find missing data rows using SQL? and How to find missing rows (dates) in a mysql table? but a bit more complex, so I'm hitting a wall.
I have a data table with the noted Primary key:
country_id (PK)
product_id (PK)
history_date (PK)
amount
I have a products table with all products, a countries table, and a calendar table with all valid dates.
I'd like to find all countries, dates and products for which there are missing products, with this wrinkle:
I only care about dates for which there are entries for a country for at least one product (i.e. if the country has NOTHING on that day, I don't need to find it) - so, by definition, there is an entry in the history table for every country and date I care about.
I know it's going to involve some joins maybe a cross join, but I'm hitting a real wall in finding missing data.
I tried this (pretty sure it wouldn't work):
SELECT h.history_date, h.product_id, h.country_id, h.amount
FROM products p
LEFT JOIN history h ON (p.product_id = h.product_id)
WHERE h.product_id IS NULL
No Joy.
I tried this too:
WITH allData AS (SELECT h1.country_id, p.product_id, h1.history_date
FROM products p
CROSS JOIN (SELECT DISTINCT country_id, history_date FROM history) h1)
SELECT f.history_date, f.product_id, f.country_id
FROM allData f
LEFT OUTER JOIN history h ON (f.country_id = h.country_id AND f.history_date = h.history_date AND f.product_id = h.product_id)
WHERE h.product_id IS NULL
AND h.country_id IS NOT NULL
AND h.history_date IS NOT null
also no luck. The CTE does get me every product on every date that there is also data, but the rest returns nothing.

I only care about dates for which there are entries for a country for
at least one product (i.e. if the country has NOTHING on that day, I
don't need to find it)
So we care about this combination:
from (select distinct country_id, history_date from history) country_date
cross join products p
Then it's just a matter of checking for existence:
select *
from (select distinct country_id, history_date from history) country_date
cross join products p
where not exists (select null
from history h
where country_date.country_id = h.country_id
and country_date.history_date = h.history_date
and p.product_id = h.product_id
)

Related

SQL - Create complete matrix with all variables even if null

please provide some assistance/guidance to solve the following:
I have 1 main table which indicates sales volumes by sales person per different product type.
Where a salesperson did not sell a particular product on a particular day, there is no record.
The intention is to create null value records for salesmen that did not sell a product on a specific day. The query must be dynamic as there are many more salesmen with sales over many days.
Thanks in advance
Just generate records for all sales persons, days, and products using cross join and then bring in the existing data:
select p.salesperson, d.salesdate, st.salestype,
coalesce(t.sales_volume, 0)
from (select distinct salesperson from t) p cross join
(select distinct salesdate from t) d cross join
(select distinct salestype from t) st left join
t
on t.salesperson = p.salesperson and
t.salesdate = d.salesdate and
t.salestype = st.salestype;
Note: You may have other tables that have lists of sales people, dates, and types -- and those can be used instead of the select distinct queries.

Oracle sql query that returns customers who are coming for the first time in property

I have 3 tables: customer, property and stays. Table customer contains all the data about customers (customer_id, name, surname, email...). Table property contains the list of all the properties (property_id, property_name...) and table stays contains all the earlier stays of customers (customer_id, property_id, stay_id, arrival_date, departure_date...).
Some customers stayed multiple times in more than one properties and some customers are coming for the first time.
Can someone please explain oracle sql query which returns only the customers who are stying in any of the properties for the first time.
Sorry guys for answering late..
This is what I got so far.
Tables:
Customer $A$
Stays $B$
Property $C$
Customer_Fragments $D$
SELECT a.RIID_,
sub.CUSTOMER_ID_,
sub.PROPERTY_ID,
b.ARRIVAL_DATE,
c.PROPERTY_SEGMENT,
ROW_NUMBER() OVER (
PARTITION BY sub.CUSTOMER_ID_,
sub.PROPERTY_ID
ORDER BY
sub.CUSTOMER_ID_ asc
) RN
FROM
(
SELECT
b.CUSTOMER_ID_,
b.PROPERTY_ID
FROM
$B$ b
GROUP BY
b.CUSTOMER_ID_,
b.PROPERTY_ID
HAVING
COUNT(*)= 1
) sub
INNER JOIN $A$ a ON sub.CUSTOMER_ID_ = a.CUSTOMER_ID_
INNER JOIN $B$ b ON sub.CUSTOMER_ID_ = b.CUSTOMER_ID_
INNER JOIN $C$ c ON sub.PROPERTY_ID = c.PROPERTY_ID
LEFT JOIN $D$ d ON a.RIID_ = d.RIID_
WHERE
b.ARRIVAL_DATE = TRUNC(SYSDATE + 7)
AND c.PROPERTY_DESTINATION = 'Destination1'
AND lower(c.NAME_) NOT LIKE ('unknown%')
AND a.NWL_PERMISSION_STATUS = 'I'
AND a.EMAIL_DOMAIN_ NOT IN ('abuse.com', 'guest.booking.com')
AND (d.BLACKLISTED != 'Y'
or d.BLACKLISTED is null
)
I want to select all customers who will come to Destination1, 7days from today to inform them about some activities. Customers can book several
properties in Destination1 and have the same arrival date (example: I can book a room in property1 for me and my wife and also book a room in property2 for my friends.. and we all come to destination1 on the same arrival date).
When this is the case I want to send just one info email to a customer and not two emails. The above SQL query returns two rows when this is the case and I want it to return just one row (one row = one email).
It is always required to post a code you have already tried to write so that we can than help you with eventual mistakes.
After that being said, I'll try to help you nevertheless, writing the full code.
Try this:
select cus.*
from customers cus
join stays st
on st.customer_id = cus.customer_id
where st.arrival_date >= YOUR_DATE --for example SYSDATE or TRUNC(SYSDATE)
and 1 = (select count(*)
from stays st2
where st2.customer_id = cus.customer_id)
You haven't specified it, but I GUESS that you are interested in getting the first-time-customers whose arrival date will be at or after some specified date. I've written that into my code in WHERE clause, where you should input such a date.
If you remove that part of the WHERE clause, you'll get all the customers that stayed just once (even if that one and only stay was 10 years ago, for example). What's more, if you remove that part of the code, you can than also remove the join on stays st table from the query too, as the only reason for that join was the need to have access to arrival date.
If you need explanations for some other parts of the code too, ask in the comments for this answer.
I hope I helped!
WITH first_stays (customer_id, property_id, first_time_arrival_date, stay_no) AS
(
SELECT s.customer_id
, s.property_id
, s.arrival_date AS first_time_arrival_date
, ROW_NUMBER() OVER (PARTITION BY s.customer_id, s.property_id ORDER BY s.arrival_date) stay_no
FROM stays s
)
SELECT c.surname AS customer_surname
, c.name AS customer_name
, p.property_name
, s.first_time_arrival_date
FROM customer c
INNER
JOIN first_stays s
ON s.customer_id = c.customer_id
AND s.stay_no = 1
INNER
JOIN property p
ON p.property_id = s.property_id
The first part WITH first_stays is a CTE (Common Table Expression, called subquery factoring in Oracle) that will number the stays for each pair (customer_id, property_id) ordered by the arrival date using a window function ROW_NUMBER(). Then just join those values to the customer and property tables to get their names or whatever, and apply stay_no = 1 (first stay) condition.
If I understand the question correctly, this is not a complicated query:
select c.*
from c join
(select s.customer_id
from stays s
group by s.customer_id
having count(*) = 1
) s
on s.customer_id = c.customer_id;

SQL LEFT JOIN - Inner select not returning columns

I have two tables called 'Customers' and 'Orders'. Tables column names are as follow:
Customers: id, name, address
Orders: id, person_id, product, price
The desired outcome is to query all customers with one of their latest purchases. I have a lot of duplicates in 'Orders' table whereby two records with same time-stamp due to some bug.
I have written the following code but the issue is that the query does not return table 2(Orders) column values. Can anyone advise what the issue is?
SELECT C.Id,C.Name, O.item, O.price, O.product
FROM Customers C
LEFT JOIN
(
SELECT TOP 1 person_id
FROM Orders
WHERE status = 'Pending'
) O ON C.ID = O.person_id
Results: O.item, O.price, O.product values are all null
Edit: Sample Data
ID/ NAME/ ADDRESS/
1/ A/ Ad1/
2/ B/ Ad2/
3/ C/ Ad3/
ID/ Person ID/ PRODUCT PRICE/ Created Date
ID-1234/ 1/ Book/ $5/ 26-2-2017
ID-1235/ 1/ Book/ $5/ 26-2-2017
ID-1236/ 2/ Calendar/ $10/ 4-2-2017
ID-1238/ 1/ Pen/ $2/ 1-1-2016
Assuming that the id column in Orders is a primary key autoincrement, then the following should work:
SELECT c.id,
c.name,
COALESCE(t1.price, 0.0) AS price,
COALESCE(t1.product, 'NA') AS product
FROM Customers c
LEFT JOIN Orders t1
ON c.id = t1.person_id
LEFT JOIN
(
SELECT person_id, MAX(CAST(SUBSTRING(id, 4, LEN(id)) AS INT)) AS max_id
FROM Orders
GROUP BY person_id
) t2
ON t1.person_id = t2.person_id AND
t2.max_id = CAST(SUBSTRING(t1.id, 4, LEN(t1.id)) AS INT)
This answer assumes that taking the greatest order ID per customer will yield the most recent purchase. Ideally you should have a timestamp column which captures when a transaction took place. Note that even in the query above, we still have no way of knowing when the most recent transaction took place.
So where is the timestamp column? It's not mentioned in your table schema. But your description does not mention the status column either, and that is clearly in there.
Is orders.id unique? Is it the key for the Orders table?> If it is, then your schema has no way to identify "duplicate" records. You cannot mean to imply that only one order per customer is allowed, so if there are multiple orders for a single customer, how do we identify the duplicates? By the unmentioned timestamp column?
If there IS a `timestamp column, and that's how you would identify dupes, then use it.
SELECT C.Id,C.Name, O.item, O.price, O.product
FROM Customers C LEFT JOIN Orders o
on o.id = (Select Min(id) from orders
where person_id = c.Id
and timestamp = o.timestamp
and status = 'Pending')

Excluding multiple results in specific column (SQL JOIN)

I'm taking my first steps in terms of practical SQL use in real life.
I have a few tables with contractual and financial information and the query works exactly as I need - to a certain point. It looks more or less like that:
SELECT /some columns/ from CONTRACTS
Linked 3 extra tables with INNER JOIN to add things like department names, product information etc. This all works but they all have simplish one-to-one relationship (one contract related to single department in Department table, one product information entry in the corresponding table etc).
Now this is my challenge:
I also need to add contract invoicing information doing something like:
inner join INVOICES on CONTRACTS.contnoC = INVOICES.contnoI
(and selecting also the Invoice number linked to the Contract number, although that's partly optional)
The problem I'm facing is that unlike with other tables where there's always one-to-one relationship when joining tables, INVOICES table can have multiple (or none at all) entries that correspond to a single contract no. The result is that I will get multiple query results for a single contract no (with different invoice numbers presented), needlessly crowding the query results.
Essentially I'm looking to add INVOICES table to a query to just identify if the contract no is present in the INVOICES table (contract has been invoiced or not). Invoice number itself could be presented (it is with INNER JOIN), however it's not critical as long it's somehow marked. Invoice number fields remains blank in the result with the INNER JOIN function, which is also necessary (i.e. to have the row presented even if the match is not found in INVOICES table).
SELECT DISTINCT would look to do what I need, but I seemed to face the problem that I need to levy DISTINCT criteria only for column representing contract numbers, NOT any other column (there can be same values presented, but all those should be presented).
Unfortunately I'm not totally aware of what database system I am using.
Seems like the question is still getting some attention and in an effort to provide some explanation here are a few techniques.
If you just want any contract with details from the 1 to 1 tables you can do it similarily to what you have described. the key being NOT to include any column from Invoices table in the column list.
SELECT
DISTINCT Contract, Department, ProductId .....(nothing from Invoices Table!!!)
FROM
Contracts c
INNER JOIN Departments D
ON c.departmentId = d.Department
INNER JOIN Product p
ON c.ProductId = p.ProductId
INNER JOIN Invoices i
ON c.contnoC = i.contnoI
Perhaps a Little cleaner would be to use IN or EXISTS like so:
SELECT
Contract, Department, ProductId .....(nothing from Invoices Table!!!)
FROM
Contracts c
INNER JOIN Departments D
ON c.departmentId = d.Department
INNER JOIN Product p
ON c.ProductId = p.ProductId
WHERE
EXISTS (SELECT 1 FROM Invoices i WHERE i.contnoI = c.contnoC )
SELECT
Contract, Department, ProductId .....(nothing from Invoices Table!!!)
FROM
Contracts c
INNER JOIN Departments D
ON c.departmentId = d.Department
INNER JOIN Product p
ON c.ProductId = p.ProductId
WHERE
contnoC IN (SELECT contnoI FROM Invoices)
Don't use IN if the SELECT ... list can return a NULL!!!
If you Actually want all of the contracts and just know if a contract has been invoiced you can use aggregation and a case expression:
SELECT
Contract, Department, ProductId, CASE WHEN COUNT(i.contnoI) = 0 THEN 0 ELSE 1 END as Invoiced
FROM
Contracts c
INNER JOIN Departments D
ON c.departmentId = d.Department
INNER JOIN Product p
ON c.ProductId = p.ProductId
LEFT JOIN Invoices i
ON c.contnoC = i.contnoI
GROUP BY
Contract, Department, ProductId
Then if you actually want to return details about a particular invoice you can use a technique similar to that of cybercentic87 if your RDBMS supports or you could use a calculated column with TOP or LIMIT depending on your system.
SELECT
Contract, Department, ProductId, (SELECT TOP 1 InvoiceNo FROM invoices i WHERE c.contnoC = i.contnoI ORDER BY CreateDate DESC) as LastestInvoiceNo
FROM
Contracts c
INNER JOIN Departments D
ON c.departmentId = d.Department
INNER JOIN Product p
ON c.ProductId = p.ProductId
GROUP BY
Contract, Department, ProductId
I would do it this way:
with mainquery as(
<<here goes you main query>>
),
invoices_rn as(
select *,
ROW_NUMBER() OVER (PARTITION BY contnoI order by
<<some column to decide which invoice you want to take eg. date>>) as rn
)
invoices as (
select * from invoices_rn where rn = 1
)
select * from mainquery
left join invoices i on contnoC = i.contnoI
This gives you an ability to get all of the invoice details to your query, also it gives you full control of which invoice you want see in your main query. Please read more about CTEs; they are pretty handy and much easier to understand / read than nested selects.
I still don't know what database you are using. If ROW_NUMBER is not available, I will figure out something else :)
Also with a left join you should use COALESCE function for example:
COALESCE(i.invoice_number,'0')
Of course this gives you some more possibilities, you could for example in your main select do:
CASE WHEN i.invoicenumber is null then 'NOT INVOICED'
else 'INVOICED'
END as isInvoiced
You can use
SELECT ..., invoiced = 'YES' ... where exists ...
union
SELECT ..., invoiced = 'NO' ... where not exists ...
or you can use a column like "invoiced" with a subquery into invoices to set it's value depending on whether you get a hit or not

SQL Beginner: Getting items from 2 tables (+grouping+ordering)

I have an e-commerce website (using VirtueMart) and I sell products that consist child products. When a product is a parent, it doesn't have ParentID, while it's children refer to it. I know, not the best logic but I didn't create it.
My SQL is very basic and I believe I ask for something quite easy to achieve
Select products that have children.
Sort results by prices (ASC/DSC).
SELECT * FROM Products INNER JOIN Prices ON Products.ProductID = Prices.ProductID ORDER BY Products.Price [ASC/DSC]
Explanation:
SELECT - Select (Get/Retrieve)
* - ALL
FROM Products - Get them from a DB Table named "Products".
INNER JOIN Prices - Selects all rows from both tables as long as there is a match between the columns in both tables. Rather, JOIN DB Table "Products" with DB Table "Prices".
ON - Like WHERE, this defines which rows will be checked for matches.
Products.ProductID = Prices.ProductID - Your match criteria. Get the rows where "ProductID" exists in both DB Tables "Products" and "Prices".
ORDER BY Products.Price [ASC/DSC] - Sorting. Use ASC for Ascending, DSC for Descending.
This table design is subpar for a number of reasons. First, it appears that the value 0 is being used to indicate lack of a parent (as there's no 0 ID for products). Typically this will be a NULL value instead.
If it were a NULL value, the SQL statement to get everything without a parent would be as simple as this:
SELECT * FROM Products WHERE ParentID IS NULL
However, we can't do that. If we make the assumption that 0 = no parent, we can do this:
SELECT * FROM Products WHERE ParentID = 0
However, that's a dangerous assumption to make. Thus, the correct way to do this (given your schema above), would be to compare the two tables and ensure that the parentID exists as a ProductID:
SELECT a.*
FROM Products AS a
WHERE EXISTS (SELECT * FROM Products AS b WHERE a.ID = b.ParentID)
Next, to get the pricing, we have to join those two tables together on a common ID. As the Prices table seems to reference a ProductID, we can use that like so:
SELECT p.ProductID, p.ProductName, pr.Price
FROM Products AS p INNER JOIN Prices AS pr ON p.ProductID = pr.ProductID
WHERE EXISTS (SELECT * FROM Products AS b WHERE p.ID = b.ParentID)
ORDER BY pr.Price
That might be sufficient per the data you've shown, but usually that type of table structure indicates that it's possible to have more than one price associated with a product (we're unable to tell whether this is true based on the quick snapshot).
That should get you close... if you need something more, we'll need more detail.
use the below script if you are using ssms.
SELECT pd.ProductId,ProductName,Price
FROM product pd
LEFT JOIN price pr ON pd.ProductId=pr.ProductID
WHERE EXISTS (SELECT 1 FROM product pd1 WHERE pd.productID=pd1.ParentID)
ORDER BY pr.Price ASC
Note :neither of your parent product have price in price table. If you want the sum of price of their child product use the below script.
SELECT pd.ProductId,pd.ProductName,SUM(ISNULL(pr.Price,0)) SUM_ChildPrice
FROM product pd
LEFT JOIN product pd1 ON pd.productID=pd1.ParentID
LEFT JOIN price pr ON pd1.ProductId=pr.ProductID
GROUP BY pd.ProductId,pd.ProductName
ORDER BY pr.Price ASC
You will have to use self-join:
For example:
SELECT * FROM products parent
JOIN products children ON parent.id = children.parent_id
JOIN prices ON prices.product_id = children.id
ORDER BY prices.price
Because we are using JOIN it will filter out all entries that don't have any children.
I haven't tested it, I hope it would work.