Duplicate Values in SQL - sql

I'm using this query with SELECT DISTINCT to ensure no duplicates are pulled from the database.
However, in my QTD column the number is sometimes twice the proper amount.
Is this probably an error with the server, or is my query incorrect?
SELECT DISTINCT ad.eid, MAX(u1.email) as ops,MAX(u2.email) as rep,
(SUM(ad.cost)) as qtd_spend,
Sum(case when day < current_date AND day >='2015-01-01' then cost else 0 end) as MTD,
AVG(case when day < current_date AND day >= current_date-7 then cost else null end) as weekly_spend
FROM adcube as ad
inner JOIN advertisables as a on ad.eid = a.eid
LEFT JOIN organizations as o on o.id = a.id
LEFT outer JOIN users as u1 on o.ops_organization_id = u1.organization_id
LEFT outer JOIN users as u2 on o.sales_organization_id = u2.organization_id
WHERE day >='2015-01-01' and day < current_date
GROUP BY eid

You need GROUP BY whenever you select plain columns alongside aggregate functions (such as SUM or MAX); with GROUP BY in place, the DISTINCT is redundant.
The problem is most likely in your JOINs.
I am not familiar with your data structure, but I am assuming that your advertisables table contains (or CAN contain) more than one entry with the same "eid". Is this correct, or do you have a constraint?
If so, then even if you have only ONE matching entry in the "adcube" table, joining it to the multiple entries in the "advertisables" table produces TWO rows (or however many match), and the aggregates in the SELECT list then sum BOTH (or more) rows.
So you should either remove the duplicates from the joined tables or account for them in the query.
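To see the fan-out concretely, here is a minimal sketch using Python's built-in sqlite3 module. Only the table and column names (adcube, advertisables, eid, cost) come from the question; the rows are made up for illustration. A single duplicate eid in advertisables is enough to double the sum.

```python
import sqlite3

def qtd_spend(duplicate_advertisable: bool) -> float:
    """Return SUM(cost) for eid 1 after joining adcube to advertisables."""
    con = sqlite3.connect(":memory:")
    # Hypothetical sample data: two cost rows for eid 1, totalling 15.0.
    con.executescript("""
        CREATE TABLE adcube (eid INTEGER, day TEXT, cost REAL);
        CREATE TABLE advertisables (eid INTEGER, id INTEGER);
        INSERT INTO adcube VALUES (1, '2015-01-05', 10.0), (1, '2015-01-06', 5.0);
        INSERT INTO advertisables VALUES (1, 100);
    """)
    if duplicate_advertisable:
        # A second advertisables row with the same eid causes the fan-out.
        con.execute("INSERT INTO advertisables VALUES (1, 101)")
    row = con.execute("""
        SELECT SUM(ad.cost)
        FROM adcube AS ad
        INNER JOIN advertisables AS a ON ad.eid = a.eid
        GROUP BY ad.eid
    """).fetchone()
    con.close()
    return row[0]

print(qtd_spend(duplicate_advertisable=False))  # each cost row counted once
print(qtd_spend(duplicate_advertisable=True))   # each cost row counted twice
```

Note that DISTINCT cannot help here: it is applied after the aggregation, when the inflated sum has already been computed.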
EDIT:
Ok, well glad to know that is the problem. You will not fix it by INNER JOINING either. You will have to do an inline select statement.
The best way to solve this, from what I understand you are trying to do, is do the following:
SELECT ad.eid
, (
select max(u1.email)
from advertisables as a
LEFT JOIN organizations as o on o.id = a.id
LEFT outer JOIN users as u1 on o.ops_organization_id = u1.organization_id
LEFT outer JOIN users as u2 on o.sales_organization_id = u2.organization_id
where a.eid = ad.eid
) as ops
, (
select max(u2.email)
from advertisables as a
LEFT JOIN organizations as o on o.id = a.id
LEFT outer JOIN users as u1 on o.ops_organization_id = u1.organization_id
LEFT outer JOIN users as u2 on o.sales_organization_id = u2.organization_id
where a.eid = ad.eid
) as rep
, (SUM(ad.cost)) as qtd_spend
, Sum(case when day < current_date AND day >='2015-01-01' then cost else 0 end) as MTD
, AVG(case when day < current_date AND day >= current_date-7 then cost else null end) as weekly_spend
FROM adcube as ad
WHERE day >='2015-01-01' and day < current_date
GROUP BY eid
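As a rough check of the inline-select idea, the sketch below runs a simplified version in SQLite through Python's sqlite3 module. The schema is an assumption made for brevity: the organizations table is skipped and users is joined straight to advertisables.id, and all rows are invented. Because the email lookup lives in a correlated subquery, the duplicate eid in advertisables no longer multiplies the cost rows.

```python
import sqlite3

def spend_per_eid() -> list:
    """Return (eid, ops email, qtd_spend) without join fan-out."""
    con = sqlite3.connect(":memory:")
    # Hypothetical, simplified schema: users link directly to advertisables.id.
    con.executescript("""
        CREATE TABLE adcube (eid INTEGER, cost REAL);
        CREATE TABLE advertisables (eid INTEGER, id INTEGER);
        CREATE TABLE users (organization_id INTEGER, email TEXT);
        INSERT INTO adcube VALUES (1, 10.0), (1, 5.0);
        INSERT INTO advertisables VALUES (1, 100), (1, 101);  -- duplicate eid
        INSERT INTO users VALUES (100, 'a@example.com'), (101, 'b@example.com');
    """)
    rows = con.execute("""
        SELECT ad.eid,
               (SELECT MAX(u.email)
                FROM advertisables AS a
                LEFT JOIN users AS u ON u.organization_id = a.id
                WHERE a.eid = ad.eid) AS ops,
               SUM(ad.cost) AS qtd_spend
        FROM adcube AS ad
        GROUP BY ad.eid
    """).fetchall()
    con.close()
    return rows

print(spend_per_eid())
```

The outer query only ever reads adcube, so SUM(ad.cost) sees each cost row exactly once regardless of how many advertisables rows share the eid.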

Related

Get all tables where there is no booking on this time or date

So basically I have a tables table. And a bookings table. A table can be assigned to a booking via the table_no column. The booking also has a reservation_time and reservation_date columns. What I'd like my query to do, is to return all tables that aren't linked to a booking on a certain time or date. It's really bugging me.
Here is what my query looks like as of now
select t.id, t.number
FROM tables t JOIN
bookings b
ON b.table_no = t.number JOIN
reservation_time_data r
ON r.id = b.reservation_time
WHERE t.number != b.table_no AND b.reservation_date != '2020-07-22' AND 45 NOT BETWEEN r.start_time AND r.end_time
You seem to want not exists. Based on your sample query, I think this is:
select t.id, t.number
from tables t
where not exists (select 1
from bookings b join
reservation_time_data r
on r.id = b.reservation_time
where b.table_no = t.number and
b.reservation_date = '2020-07-22' and
45 >= r.start_time and
45 <= r.end_time
);
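I think you can also get it with a LEFT JOIN, as long as the date and time conditions go in the join conditions rather than in the WHERE clause:
select t.id, t.number FROM tables t
LEFT JOIN (bookings b JOIN reservation_time_data r ON r.id = b.reservation_time AND 45 BETWEEN r.start_time AND r.end_time)
ON b.table_no = t.number AND b.reservation_date = '2020-07-22'
WHERE b.table_no IS NULL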
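The NOT EXISTS approach can be tried out with a small sketch in Python's sqlite3 module. The table and column names follow the question; the sample rows, and the assumption that start_time and end_time are plain integers, are made up for illustration.

```python
import sqlite3

def free_tables(reservation_date: str, minute: int) -> list:
    """Return (id, number) of tables with no booking covering the date and time."""
    con = sqlite3.connect(":memory:")
    # Hypothetical sample data.
    con.executescript("""
        CREATE TABLE tables (id INTEGER, number INTEGER);
        CREATE TABLE reservation_time_data (id INTEGER, start_time INTEGER, end_time INTEGER);
        CREATE TABLE bookings (table_no INTEGER, reservation_date TEXT, reservation_time INTEGER);
        INSERT INTO tables VALUES (1, 1), (2, 2), (3, 3);
        INSERT INTO reservation_time_data VALUES (1, 30, 60), (2, 90, 120);
        INSERT INTO bookings VALUES
            (1, '2020-07-22', 1),   -- table 1, slot 30..60
            (2, '2020-07-22', 2),   -- table 2, slot 90..120
            (2, '2020-07-23', 1);   -- table 2, a different day
    """)
    rows = con.execute("""
        SELECT t.id, t.number
        FROM tables t
        WHERE NOT EXISTS (
            SELECT 1
            FROM bookings b
            JOIN reservation_time_data r ON r.id = b.reservation_time
            WHERE b.table_no = t.number
              AND b.reservation_date = ?
              AND ? BETWEEN r.start_time AND r.end_time
        )
        ORDER BY t.id
    """, (reservation_date, minute)).fetchall()
    con.close()
    return rows

print(free_tables("2020-07-22", 45))
```

The conditions inside the subquery are the positive ones (same table, same date, time inside the slot); NOT EXISTS does the negation once, which is what the original query's chain of != tests could not express.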

Switching to Vertica from MySql, aggregate in where clause not working

Recently we have switched to Vertica from MySQL. I am lost on how to re-create the <=30 check inside the where clause in the query below. This currently does not work in Vertica, but does in MySQL.
Essentially, a user owns cars and cars have parts. I want to total the amount of cars and car parts in a timeframe, but only for users who have less than or equal to 30 cars.
select
count(distinct cr.id) as 'Cars',
count(distinct cp.id) as 'Car Parts'
from
users u
inner join
user_emails ue on u.id = ue.user_id
inner join
cars cr on cr.user_id = u.id
inner join
car_parts cp on cp.car_id = cr.id
where
(
select count(*) from cars where cars.user_id=u.id
) <=30
and
ue.is_real = true and ue.is_main = true
and
cr.created_at >= '2017-01-01 00:00:00' and cr.created_at <= '2017-02-17 23:59:59'
Any help or guidance is greatly appreciated!
Before my mouse flies away and my monitors go blank, I get this error:
ERROR: Correlated subquery with aggregate function COUNT is not supported
You would not use a subquery this way. You would use a window function:
select count(distinct cr.id) as Cars,
count(distinct cp.id) as CarParts
from users u join
user_emails ue
on u.id = ue.user_id join
(select cr.*, count(*) over (partition by user_id) as cnt
from cars cr
) cr
on cr.user_id = u.id join
car_parts cp
on cp.car_id = cr.id
where cr.cnt <= 30 and
ue.is_real = true and ue.is_main = true and
cr.created_at >= '2017-01-01' and
cr.created_at < '2017-02-18';
Notes:
Don't enclose column aliases in single quotes. That is a bug waiting to happen. Only use single quotes for string and date constants.
You can simplify the date logic. Using < is better than <= to capture everything that happens on a particular day.
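A minimal sketch of the window-function filter, run in SQLite via Python's sqlite3 module rather than Vertica (window functions need SQLite 3.25 or newer). The users and user_emails joins and the date filter are left out, the rows are invented, and the threshold is a parameter so that a small data set can exercise it.

```python
import sqlite3

def car_totals(max_cars: int) -> tuple:
    """Count distinct cars and parts for users owning at most max_cars cars."""
    con = sqlite3.connect(":memory:")
    # Hypothetical sample data: user 1 owns three cars, user 2 owns two.
    con.executescript("""
        CREATE TABLE cars (id INTEGER, user_id INTEGER);
        CREATE TABLE car_parts (id INTEGER, car_id INTEGER);
        INSERT INTO cars VALUES (1, 1), (2, 1), (3, 1), (4, 2), (5, 2);
        INSERT INTO car_parts VALUES (10, 4), (11, 4), (12, 5), (13, 1);
    """)
    row = con.execute("""
        SELECT COUNT(DISTINCT cr.id), COUNT(DISTINCT cp.id)
        FROM (SELECT c.*, COUNT(*) OVER (PARTITION BY c.user_id) AS cnt
              FROM cars c) cr
        JOIN car_parts cp ON cp.car_id = cr.id
        WHERE cr.cnt <= ?
    """, (max_cars,)).fetchone()
    con.close()
    return row

print(car_totals(2))  # only user 2 qualifies
print(car_totals(3))  # both users qualify
```

The per-user count is computed inside the derived table, before any other filter, so it reflects all of a user's cars, just like the correlated subquery in the original MySQL query.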

Show row in a series even if the data is missing from the table

I need a SQL query to return a row for every month in years 2015 and 2016 for every company that pays dues. The resulting dataset should show the months in which a company didn't pay dues with a null value. The problem is that if they didn't pay dues, they won't have an entry in the database, so no row will appear for that month. Here is the query:
SELECT
case when n.co_id <>'' then n.co_id else n.ID end ID
,su.CONTINUOUS_SINCE
,n.COMPANY
,a.EFFECTIVE_DATE
, a.AMOUNT
FROM dbo.Name n
LEFT OUTER JOIN dbo.Activity a ON n.ID = a.ID
inner JOIN dbo.Loc_Info l ON n.ID = l.ID
inner JOIN dbo.Segment_Categories s ON l.CURRENT_SEGMENT = s.CODE
inner JOIN dbo.Subscriptions su on su.id=n.id
WHERE   a.PRODUCT_CODE='rental' and n.MEMBER_TYPE in ('rb','rl') and a.EFFECTIVE_DATE Between '2015-07-01' And GetDate() AND a.ACTIVITY_TYPE='dues'
order by case when n.co_id <>'' then n.co_id else n.ID end, EFFECTIVE_DATE asc
If the company has paid every month, it works out fine, but the point is to find the companies that haven't paid. Suppose Company XYZ paid every month in 2015 except June: I need a row for June for Company XYZ that has a NULL value, a zero, or some other indicator that they missed a payment. As it stands now, the row is simply omitted because the data isn't there, and it is hard to find a missing row among thousands of rows.
I realize it is probably a different type of join or something but I am just not getting it to work out.
You can create a dummy table for the months and left join dbo.Activity to it; that way you'll get all the months. Then join that to dbo.Name.
1) Generate all the months from 1 to 12 with a recursive cte.
2) Get all months years and companies combinations with a cross join.
3) left join on this result-set to show missing months.
with months as (select 1 mth
union all
select mth+1 from months where mth<12)
,yearmonthscompanies as (select *
from months m
cross join (select 2015 yr union all select 2016 yr) y
cross join (select distinct id,co_id,company from name) c
)
SELECT
case when ymc.co_id <>'' then ymc.co_id else ymc.ID end ID
,su.CONTINUOUS_SINCE
,ymc.COMPANY
,coalesce(a.effective_date,datefromparts(ymc.yr,ymc.mth,1)) as effective_date
,coalesce(a.AMOUNT,0) amount
FROM yearmonthscompanies ymc
LEFT JOIN dbo.Name n ON n.co_id=ymc.co_id and n.id=ymc.id and n.company=ymc.company
LEFT JOIN dbo.Activity a ON n.ID = a.ID and a.PRODUCT_CODE='rental'
and n.MEMBER_TYPE in ('rb','rl') and a.EFFECTIVE_DATE Between '2015-07-01' and GetDate()
and a.ACTIVITY_TYPE='dues'
and year(a.effective_date) = ymc.yr and month(a.effective_date) = ymc.mth
inner JOIN dbo.Loc_Info l ON n.ID = l.ID
inner JOIN dbo.Segment_Categories s ON l.CURRENT_SEGMENT = s.CODE
inner JOIN dbo.Subscriptions su on su.id=n.id
order by case when ymc.co_id <>'' then ymc.co_id else ymc.ID end
,effective_date
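The three steps (generate months, cross join with companies, left join the payments) can be sketched end to end with Python's sqlite3 module. This is an illustration under assumptions, not the poster's schema: the companies and payments tables, their columns, and the rows are hypothetical, and SQLite's strftime stands in for SQL Server's year() and month().

```python
import sqlite3

def monthly_dues(year: int) -> list:
    """Return one (company, month, amount) row per company and month."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE companies (id INTEGER, company TEXT);
        CREATE TABLE payments (company_id INTEGER, paid_on TEXT, amount REAL);
        INSERT INTO companies VALUES (1, 'ABC'), (2, 'XYZ');
    """)
    # ABC pays every month; XYZ skips June.
    payments = [(1, f"{year}-{m:02d}-01", 100.0) for m in range(1, 13)]
    payments += [(2, f"{year}-{m:02d}-01", 100.0) for m in range(1, 13) if m != 6]
    con.executemany("INSERT INTO payments VALUES (?, ?, ?)", payments)
    rows = con.execute("""
        WITH RECURSIVE months(mth) AS (
            SELECT 1
            UNION ALL
            SELECT mth + 1 FROM months WHERE mth < 12
        )
        SELECT c.company, m.mth, COALESCE(SUM(p.amount), 0) AS amount
        FROM months m
        CROSS JOIN companies c
        LEFT JOIN payments p
               ON p.company_id = c.id
              AND CAST(strftime('%Y', p.paid_on) AS INTEGER) = :yr
              AND CAST(strftime('%m', p.paid_on) AS INTEGER) = m.mth
        GROUP BY c.company, m.mth
        ORDER BY c.company, m.mth
    """, {"yr": year}).fetchall()
    con.close()
    return rows

# Months with no payment show up as rows with amount 0 instead of vanishing.
missing = [(company, mth) for company, mth, amount in monthly_dues(2015) if amount == 0]
print(missing)
```

The important detail is that every filter on the payments side sits in the ON clause of the LEFT JOIN; putting it in WHERE would throw the empty months away again.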

Group by year month sql, when no entry write 0

I have a SQL query, and I want it to also show months that have no entries. Right now it only shows months that have at least one entry. Here is the code:
SELECT YEAR(T0.[Recontact]) AS 'Év', MONTH(T0.[Recontact]) AS 'Hónap',
T1.[SlpName], COUNT(T0.[ClgCode]) AS 'Tárgyalások'
FROM OCLG T0
INNER JOIN OSLP T1 ON T0.[SlpCode] = T1.[SlpCode]
WHERE T0.[Action] = 'M' AND
T0.[Recontact] >= 'date' AND
T0.[Recontact] <= 'date2' AND
T1.[SlpName] = 'user name'
GROUP BY YEAR(T0.[Recontact]), MONTH(T0.[Recontact]), T1.[SlpName]
ORDER BY 1,2
If the year + month is totally missing from your data, you'll need to construct an empty row somewhere that can be shown in the place. You can create either a calendar table (one row per day) or a month table (one row per month). That can also be a "virtual" tally table constructed in a CTE or similar.
Once you have that, you can do something like this:
select
M.Year, M.Month, X.SlpName, isnull(X.CODES,0) as CODES
from
months M
left join (
SELECT
YEAR(T0.[Recontact]) as Year,
MONTH(T0.[Recontact]) AS Month,
T1.[SlpName],
COUNT(T0.[ClgCode]) AS CODES
FROM OCLG T0
INNER JOIN OSLP T1 ON T0.[SlpCode] = T1.[SlpCode]
WHERE T0.[Action] = 'M' AND
T0.[Recontact] >= 'date' AND
T0.[Recontact] <= 'date2' AND
T1.[SlpName] = 'user name'
GROUP BY YEAR(T0.[Recontact]), MONTH(T0.[Recontact]), T1.[SlpName]
) X on X.Year = M.Year and X.Month = M.Month
where M.MONTHDATE >= 'date' and M.MONTHDATE <= 'date2'
ORDER BY 1,2
This was with an imaginary month table that has year, month and monthdate columns, and the date is the first of the month -- you'll still have to check that the range you're fetching is correct.
I haven't tested this, but it should work.
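For the month table itself, one option is to populate it from application code. The sketch below is an assumption-laden illustration using Python's sqlite3 module: month_starts is a hypothetical helper, the meetings table is a stand-in for OCLG with invented rows, and the months table has the year, month, and monthdate columns described above.

```python
import sqlite3
from datetime import date

def month_starts(first: date, last: date):
    """Yield the first day of every month from first's month to last's month."""
    y, m = first.year, first.month
    while (y, m) <= (last.year, last.month):
        yield date(y, m, 1)
        y, m = (y + 1, 1) if m == 12 else (y, m + 1)

def meetings_per_month(first: date, last: date) -> list:
    """Return (year, month, count) for every month in range, zero-filled."""
    con = sqlite3.connect(":memory:")
    # Hypothetical stand-in for OCLG with three sample rows.
    con.executescript("""
        CREATE TABLE months (year INTEGER, month INTEGER, monthdate TEXT);
        CREATE TABLE meetings (recontact TEXT);
        INSERT INTO meetings VALUES ('2016-11-15'), ('2016-11-20'), ('2017-01-03');
    """)
    con.executemany(
        "INSERT INTO months VALUES (?, ?, ?)",
        [(d.year, d.month, d.isoformat()) for d in month_starts(first, last)],
    )
    rows = con.execute("""
        SELECT m.year, m.month, COALESCE(x.codes, 0)
        FROM months m
        LEFT JOIN (
            SELECT CAST(strftime('%Y', recontact) AS INTEGER) AS yr,
                   CAST(strftime('%m', recontact) AS INTEGER) AS mon,
                   COUNT(*) AS codes
            FROM meetings
            GROUP BY yr, mon
        ) x ON x.yr = m.year AND x.mon = m.month
        ORDER BY m.monthdate
    """).fetchall()
    con.close()
    return rows

print(meetings_per_month(date(2016, 11, 1), date(2017, 2, 28)))
```

December 2016 and February 2017 have no meetings in the sample data, so they come back with a count of 0 instead of being dropped.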
Replace INNER JOIN with LEFT JOIN in your request to get NULL results.
SELECT YEAR(T0.[Recontact]) AS 'Év', MONTH(T0.[Recontact]) AS 'Hónap',
T1.[SlpName], COUNT(T0.[ClgCode]) AS 'Tárgyalások'
FROM OCLG T0
LEFT JOIN OSLP T1 ON (T0.[SlpCode] = T1.[SlpCode]
AND T1.[SlpName] = 'user name')
WHERE T0.[Action] = 'M'
AND T0.[Recontact] >= 'date'
AND T0.[Recontact] <= 'date2'
GROUP BY YEAR(T0.[Recontact]), MONTH(T0.[Recontact]), T1.[SlpName]
ORDER BY 1,2
To learn more, visit:
http://academy.comingweek.com/sql-groupby-clause/

Count columns of joined table

I am writing a query to summarize the data in a Postgres database:
SELECT products.id,
products.NAME,
product_types.type_name AS product_type,
delivery_types.delivery,
products.required_selections,
Count(s.id) AS selections_count,
Sum(CASE
WHEN ss.status = 'WARNING' THEN 1
ELSE 0
END) AS warning_count
FROM products
JOIN product_types
ON product_types.id = products.product_type_id
JOIN delivery_types
ON delivery_types.id = products.delivery_type_id
LEFT JOIN selections_products sp
ON products.id = sp.product_id
LEFT JOIN selections s
ON s.id = sp.selection_id
LEFT JOIN selection_statuses ss
ON ss.id = s.selection_status_id
LEFT JOIN listings l
ON ( s.listing_id = l.id
AND l.local_date_time BETWEEN
To_timestamp('2014/12/01', 'YYYY/mm/DD'
) AND
To_timestamp('2014/12/30', 'YYYY/mm/DD') )
GROUP BY products.id,
product_types.type_name,
delivery_types.delivery
Basically we have a product with selections, these selections have listings and the listings have a local_date. I need a list of all products and how many listings they have between the two dates. No matter what I do, I get a count of all selections (a total). I feel like I'm overlooking something. The same concept goes for warning_count. Also, I don't really understand why Postgres requires me to add a group by here.
The schema looks like this (the parts you would care about anyway):
products
    name:string
    product_type:fk
    required_selections:integer
    deliver_type:fk
selections_products
    product_id:fk
    selection_id:fk
selections
    selection_status_id:fk
    listing_id:fk
selection_status
    status:string
listing
    local_date:datetime
The way you have it, you LEFT JOIN to all selections regardless of listings.local_date_time.
There is room for interpretation; we would need to see actual table definitions with all constraints and data types to be sure. Going out on a limb, my educated guess is that you can fix your query by using parentheses in the FROM clause to prioritize joins:
SELECT p.id
, p.name
, pt.type_name AS product_type
, dt.delivery
, p.required_selections
, count(s.id) AS selections_count
, sum(CASE WHEN ss.status = 'WARNING' THEN 1 ELSE 0 END) AS warning_count
FROM products p
JOIN product_types pt ON pt.id = p.product_type_id
JOIN delivery_types dt ON dt.id = p.delivery_type_id
LEFT JOIN ( -- LEFT JOIN!
selections_products sp
JOIN selections s ON s.id = sp.selection_id -- INNER JOIN!
JOIN listings l ON l.id = s.listing_id -- INNER JOIN!
AND l.local_date_time >= '2014-12-01'
AND l.local_date_time < '2014-12-31'
LEFT JOIN selection_statuses ss ON ss.id = s.selection_status_id
) ON sp.product_id = p.id
GROUP BY p.id, pt.type_name, dt.delivery;
This way, you first eliminate all selections outside the given time frame with [INNER] JOIN before you LEFT JOIN to products, thus keeping all products in the result, including those that aren't in any applicable selection.
Related:
Join four tables involving LEFT JOIN without duplicates
While selecting all or most products, this can be rewritten to be faster:
SELECT p.id
, p.name
, pt.type_name AS product_type
, dt.delivery
, p.required_selections
, COALESCE(s.selections_count, 0) AS selections_count
, COALESCE(s.warning_count, 0) AS warning_count
FROM products p
JOIN product_types pt ON pt.id = p.product_type_id
JOIN delivery_types dt ON dt.id = p.delivery_type_id
LEFT JOIN (
SELECT sp.product_id
, count(*) AS selections_count
, count(*) FILTER (WHERE ss.status = 'WARNING') AS warning_count
FROM selections_products sp
JOIN selections s ON s.id = sp.selection_id
JOIN listings l ON l.id = s.listing_id
LEFT JOIN selection_statuses ss ON ss.id = s.selection_status_id
WHERE l.local_date_time >= '2014-12-01'
AND l.local_date_time < '2014-12-31'
GROUP BY 1
) s ON s.product_id = p.id;
It's cheaper to aggregate and count selections and warnings per product_id first, and then join to products. (Unless you only retrieve a small selection of products, then it's cheaper to reduce related rows first.)
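The aggregate-first pattern can be sketched with Python's sqlite3 module. The schema here is deliberately flattened and hypothetical (selections carries product_id, status, and local_date_time directly, with no link or status tables), and SUM(CASE ...) is used in place of the FILTER clause so the example also runs on older SQLite versions.

```python
import sqlite3

def product_summary() -> list:
    """Return (id, name, selections_count, warning_count) for every product."""
    con = sqlite3.connect(":memory:")
    # Hypothetical flattened schema and sample rows.
    con.executescript("""
        CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE selections (id INTEGER, product_id INTEGER,
                                 status TEXT, local_date_time TEXT);
        INSERT INTO products VALUES (1, 'A'), (2, 'B');
        INSERT INTO selections VALUES
            (1, 1, 'OK',      '2014-12-05 10:00:00'),
            (2, 1, 'WARNING', '2014-12-30 18:00:00'),
            (3, 1, 'WARNING', '2015-01-02 09:00:00'),  -- outside the range
            (4, 2, 'OK',      '2014-11-20 12:00:00');  -- outside the range
    """)
    rows = con.execute("""
        SELECT p.id, p.name,
               COALESCE(s.selections_count, 0),
               COALESCE(s.warning_count, 0)
        FROM products p
        LEFT JOIN (
            SELECT product_id,
                   COUNT(*) AS selections_count,
                   SUM(CASE WHEN status = 'WARNING' THEN 1 ELSE 0 END) AS warning_count
            FROM selections
            WHERE local_date_time >= '2014-12-01'
              AND local_date_time <  '2014-12-31'
            GROUP BY product_id
        ) s ON s.product_id = p.id
        ORDER BY p.id
    """).fetchall()
    con.close()
    return rows

print(product_summary())
```

Product B has no selection inside the date range, yet it still appears with zero counts, because the date filter is applied inside the derived table and the outer LEFT JOIN keeps every product.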
Related:
Why does the following join increase the query time significantly?
"Also, I don't really understand why Postgres requires me to add a group by here."
Since Postgres 9.1, the PK column in GROUP BY covers all columns of the same table. That does not cover columns of other tables, even if they are functionally dependent. You need to list those explicitly in GROUP BY if you don't want to aggregate them.
My second query avoids this problem from the outset by aggregating before the join.
Aside: chances are, this doesn't do what you want:
l.local_date_time BETWEEN To_timestamp('2014/12/01', 'YYYY/mm/DD')
AND To_timestamp('2014/12/30', 'YYYY/mm/DD')
Since local_date_time seems to be of type timestamp (not timestamptz!), you would include '2014-12-30 00:00', but exclude the rest of the day '2014-12-30'. And it's always better to use ISO 8601 format for dates and timestamps, which means the same with every locale and datestyle setting. Hence:
WHERE l.local_date_time >= '2014-12-01'
AND l.local_date_time < '2014-12-31'
This includes all of '2014-12-30', and nothing else. No idea why you chose to exclude '2014-12-31'. Maybe you really want to include all of Dec. 2014?
WHERE l.local_date_time >= '2014-12-01'
AND l.local_date_time < '2015-01-01'
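The difference between BETWEEN and a half-open range is easy to check outside the database. This small Python sketch mirrors the two comparisons with datetime values; the helper names in_between and in_half_open are made up for the illustration.

```python
from datetime import datetime

def in_between(ts: datetime, lo: datetime, hi: datetime) -> bool:
    """Mirror SQL's BETWEEN: both bounds are inclusive."""
    return lo <= ts <= hi

def in_half_open(ts: datetime, lo: datetime, hi_exclusive: datetime) -> bool:
    """Mirror 'ts >= lo AND ts < hi_exclusive'."""
    return lo <= ts < hi_exclusive

afternoon = datetime(2014, 12, 30, 15, 30)  # falls on the last intended day
start = datetime(2014, 12, 1)

# The upper bound '2014-12-30' means midnight at the start of that day,
# so BETWEEN misses everything after 00:00 on Dec 30.
missed = in_between(afternoon, start, datetime(2014, 12, 30))

# The half-open range with the next day as an exclusive bound covers it.
covered = in_half_open(afternoon, start, datetime(2014, 12, 31))

print(missed, covered)
```

The half-open form also avoids having to spell out a "last instant of the day" such as 23:59:59, which silently drops fractional seconds.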