SQL Query Advice - Most recent item - sql

I have a table where I store customer sales (on periodicals, like newspaper) data. The product is stored by issue. Example
custid prodid issue qty datesold
1 123 2 12 01052008
2 234 1 5 01022008
1 123 1 5 01012008
2 444 2 3 02052008
How can I retrieve (whats a faster way) the get last issue for all products, for a specific customer? Can I have samples for both SQL Server 2000 and 2005? Please note, the table is over 500k rows.
Thanks

Assuming that "latest" is determined by date (rather than by issue number), this method is usually pretty fast, assuming decent indexes:
SELECT
T1.prodid,
T1.issue
FROM
Sales T1
LEFT OUTER JOIN dbo.Sales T2 ON
T2.custid = T1.custid AND
T2.prodid = T1.prodid AND
T2.datesold > T1.datesold
WHERE
T1.custid = #custid AND
T2.custid IS NULL
Handling 500k rows is something that a laptop can probably handle without trouble, let alone a real server, so I'd stay clear of denormalizing your database for "performance". Don't add extra maintenance, inaccuracy, and most of all headaches by tracking a "last sold" somewhere else.
EDIT: I forgot to mention... this doesn't specifically handle cases where two issues have the same exact datesold. You might need to tweak it based on your business rules for that situation.

Generic SQL; SQL Server's syntax shouldn't be much different:
SELECT prodid, max(issue) FROM sales WHERE custid = ? GROUP BY prodid;

Is this a new project? If so, I would be wary of setting up your database like this and read up a bit on normalization, so that you might end up with something like this:
CustID LastName FirstName
------ -------- ---------
1 Woman Test
2 Man Test
ProdID ProdName
------ --------
123 NY Times
234 Boston Globe
ProdID IssueID PublishDate
------ ------- -----------
123 1 12/05/2008
123 2 12/06/2008
CustID OrderID OrderDate
------ ------- ---------
1 1 12/04/2008
OrderID ProdID IssueID Quantity
------- ------ ------- --------
1 123 1 5
2 123 2 12
I'd have to know your database better to come up with a better schema, but it sound like you're building too many things into a flat table, which will cause lots of issues down the road.

If you're looking for most recent sale by date maybe that's what you need:
SELECT prodid, issue
FROM Sales
WHERE custid = #custid
AND datesold = SELECT MAX(datesold)
FROM Sales s
WHERE s.prodid = Sales.prodid
AND s.issue = Sales.issue
AND s.custid = #custid

To query on existing growing historical table is way too slow!
Strongly suggest you create a new table tblCustomerSalesLatest which stores the last issue data of each customer. and select from there.

Related

POSTGRESQL - Finding specific product when

I've attempted to write a query but I've not managed to get it working correctly.
I'm attempting to retrieve where a specific product has been bought but where it also has been bought with other products. In the case below, I want to find where product A01 has been bought but also when it was bought with other products.
Data (extracted from tables for illustration):
Order | Product
123456 | A01
123457 | A01
123457 | B02
123458 | C03
123459 | A01
123459 | C03
Query which will return all orders with product A01 without showing other products:
SELECT
O.NUMBER
O.DATE
P.NUMBER
FROM
ORDERS O
JOIN PRODUCTS P on P.ID = O.ID
WHERE
P.NUMBER = 'A01'
I've tried to create a sub query which brings back just orders of product A01 but I don't know how to place it in the query for it to return all orders containing product A01 as well as any other product ordered with it.
Any help on this would be very grateful.
Thanks in advance.
You can use conditional SUM to detect if one ORDER group have one ore more 'A01'
CREATE TABLE orders
("Order" int, "Product" varchar(3))
;
INSERT INTO orders
("Order", "Product")
VALUES
(123456, 'A01'),
(123457, 'A01'),
(123457, 'B02'),
(123458, 'C03'),
(123459, 'A01'),
(123459, 'C03')
;
SELECT "Order"
FROM orders
GROUP BY "Order"
HAVING SUM(CASE WHEN "Product" = 'A01' THEN 1 ELSE 0 END) > 0
I appreciated Juan's including the DDL to create the database on my system. By the time I saw it, I'd already done all the same work, except that I got around the reserved word problem by naming that field Order1.
Sadly, I didn't consider that either of the offered queries worked on my system. I used MySQL.
The first one returned the A01 lines of the two orders on which other products were ordered too. I took Alex's purpose to include seeing all items of all orders that included A01. (Perhaps he wants to tell future customers what other products other customers have ordered with A01, and generate sales that way.)
The second one returned the three A01 lines.
Maybe Alex wants:
select *
from orders
where Order1 in (select Order1
from orders
where Product = 'A01')
It outputs all lines of all orders that include A01. The subquery makes a list of all orders with A01. The first query returns all lines of those orders.
In a big database, you might not want to run two queries, but this is the only way I see to get the result I understood Alex wanted. If that is what he wanted, he would have to run a second query once armed with output from the queries offered, so there's no real gain.
Good discussion. Thanks to all!
Use GROUP BY clause along with HAVING like
select "order", Product
from data
group by "order"
having count(distinct product) > 1;

SQL Determining differences in near-identical rows

If I have a table of correct data I need to check with my actual table to make sure the data is correct and I have some rows like the following:
Data_Check_Table
FRUIT ------- PRICE ------- WEEKS_FRESH ------- SUPPLIER
Apple $1 1 Big Co.
Banana $1 1 Super Co.
and the actual table with this info:
Data_Table
FRUIT ------- PRICE ------- WEEKS_FRESH ------- SUPPLIER
Apple $2 1 Big Co.
Banana $1 1 Super Co.
...and assume there are many other rows, some match up fine and others have inconsistencies in certain areas (Maybe the wrong price? Or wrong supplier? Maybe even both.) How would I do a select to find these rows that are inconsistent with the actual data?
Select dt.Fruit,dt.Price, dt.Weeks_Fresh,dtc.Fruit,dtc.Price, dtc.Weeks_Fresh,...
From DataTable dt
FULL OUTER JOIN
DataTable_Check dtc
ON dt.Fruit = dtc.Fruit
AND dt.Price = dtc.Price
.....
Where dt.Fruit IS NULL OR dtc.Fruit IS NULL
The full join includes records from each table regardless of whether there is a match, so if either side is null then you know there is a mismatch.
The following to find actual records not matching correct records:
select *
from Data_Table
minus
select *
from Data_Check_Table

Count Unique Occurrences PowerPivot

I am new to PowerPivot and DAX formulas. I assume that what I am trying to do is very basic and it has probably been answered somewhere, I just don't know what to search on it find it.
I am trying to determine the percent of sales people who had a sale in a given quarters. I have two tables, one that lists the sales people and one that list all the sales for a quarter. For example
Employee ID
123
456
789
Sales ID - Emp ID - Amount
135645 ---- 123 ----- $50
876531 ---- 123 ----- $127
258546 ---- 123 ----- $37
516589 ---- 789 ----- $128
998513 ---- 789 ----- $79
As a result, the pivot table would look like this:
Emp ID - % w/ sales
123 -------- 100%
456 -------- 0%
789 -------- 100%
Total ------- 66%
If you can point me to a post where this has been addressed or let me know the best way to address this I would appreciate it. Thank you.
Here's a simple way of doing this (assuming table names emps and sales):
=IF (DISTINCTCOUNT ( sales[Emp ID] ) = BLANK (),
0,
DISTINCTCOUNT ( sales[Emp ID] )
)
/ COUNTROWS ( emps )
The IF() is only required to ensure that people who haven't made a sale appear in the Pivot. All the actual formula is doing is dividing the number of sales rows by the number of employee rows.
Jacob
You'll need to remove the text that begins with --. I wanted to describe what the DAX is doing. This may not do what you want because it only factors the employees in the context. E.x.: If the user filtered out all employees that didn't have sales, should the grand total be 100% or 66%? For the former, you'll need to use the ALL DAX function and the below DAX does the latter. I'm very new to DAX so I'm sure there's a better way to do what you want.
=IF
(
-- are we processing 1 employee or multiple employees? (E.x.: The grand total processes multiple employees...)
COUNTROWS(VALUES(employee[Employee ID])) > 1,
--If Processing multiple employees do X / Y
-- Where X = The number of employees that have sales
-- Where Y = The number of employees selected by the user
COUNTROWS(FILTER(employee, NOT(ISBLANK(CALCULATE(COUNT(sales[Sales ID])))))) / COUNTROWS(employee),
-- If processing single employee, return 0 if they have no sales, and 1 if they have sales
IF
(
ISBLANK(CALCULATE(COUNT(sales[Sales ID]))),
0,
1
)
)

Shopping basket: Products with options, how to check if a combination exists?

I have a bit of a SQL problem that I'm hoping someone can help.
On my website someone can order a product and then options for that products, e.g. they buy a car, and options such as tyres, stereo system. The customer can add multiple items to their basket for the same main item (a car), but then different options for it, such as:
car, pirelli, b&o
car, michelin, b&o
I have this DB structure to achieve this:
orders
----------
order_ref
total
orders_parts
------------
id
order_ref
part_id
quantity
orders_parts_options
--------------------
id
option
orders
------
12345 1000.01
orders_parts
------------
1001 12345 Audi 1
1002 12345 Audi 1
orders_parts_options
--------------------
1001 michelin
1001 b&o
1002 pirelli
1002 b&o
So here you can see I have two Audis in my shopping basket, one with michelin, one with pirelli, both with B&O audio systems. My question; let's say another call is made to add an item to the shopping basket for this order, e.g. another Audi with Michelin and B&O, what SQL would I need to get orders_parts.id of 1001?
I came up with this bit of rubbish:
SELECT op.id FROM orders_parts op
INNER JOIN orders_parts_options opo ON (op.id = opo.id)
WHERE op.order_ref = 12345 AND (opo.option = 'michelin' OR opo.option = 'b&o')
But I get this result
1001
1001
1002
from that. I'm guessing I need to aggregate it and have a having count = 2 in there, but just cannot work it out. Anyone smarter out there who can help me?
(just to add, the DB is normalized in real life, but for clarity I've but full text values in there).
SELECT op.id, count(op.id) as n FROM orders_parts op
INNER JOIN orders_parts_options opo ON (op.id = opo.id)
WHERE op.order_ref = 12345 AND (opo.option = 'michelin' OR opo.option = 'b&o')
GROUP BY op.id;
This query retrieves all parts (and their number) for the order_ref=12345 where option = 'michelin' or option = 'b&o'

How to do SQL join effectively?

I have two tables.
Order
Replication.
A single order record can have multiple Replication records. I want to join these two tables, such that i always retrieve a single record out of the join even if multiple records exist.
Sample data
Replication table:
ORDID | STATUS | ID | ERRORMSG | HTTPSTATUS | DELIVERYCNT
=========================================================
1717410307 1 JBM-9e92ae0c NULL 200 1
----------
1717410307 1 JBM-9fb59af1 NULL 400 -99
----------
1717410308 1 JBM-0764b091 NULL 403 1
----------
1717410308 1 JBM-0764b091 NULL 200 1
Order Table:
ORDID | ORDTYPE | DATE
----------
1717410307 CAR 22-SEP-2011
1717410308 BUS 23-SEP-2011
How can i make a join effectively so as , i will get as many records in order table and a replication table that should be dynamically selected on a priority basis.
The priority can be defined as :
Any record with a delivery count of -99
HTTPSTATUS != 200
Please guide me how can i proceed with this joining?
Please let me know if you need any clarification.
Your help is much appreciated!
Is it possible to use ORDER BY clause based on the HTTPSTATUS and DELIVERYCNT?
In that case you can write a specific ORDER BY and getting the TOP 1 from it (don't know which RDBMS do you use) or getting ROW_NUMBER() OVER (ORDER BY ... ) AS RowN WHERE RowN = 1
But this is the ugly (yet quick) solution.
The other option is to make a subquery where you add a new column which will make the priority calculation.
To make the query effective you should consider indexing (or using RDBMS specific solutions like included columns)