MSSQL Sum of values referenced by another table - sql

I'm attempting to create a report on the total money spent per day.
In the database are these two tables. They are matched using "UID" made at creation.
I've created this query but it results in duplicate dates.
Select LEFT(f.timestamp, 10) timestamp, sum(s.Total) Total
FROM dbo.purchasing AS f
Join (SELECT uid,SUM(CONVERT(DECIMAL(18,2), (CONVERT(DECIMAL(18,4), qty) * price))) Total
FROM dbo.purchasingitems
GROUP BY uid)
AS s ON f.uid = s.uid
GROUP BY TIMESTAMP
purchasing:
+--+---------+------------+--------+---+
|ID| UID | timestamp | contact|...|
+--+---------+------------+--------+---+
| 1|abr92nas9| 01/01/2018 | ROB |...|
| 2|nsa93m187| 02/02/2018 | ROB |...|
+--+---------+------------+--------+---+
purchasingitems:
+--+---------+-----+--------+---+
|ID| UID | QTY | Price |...|
+--+---------+-----+--------+---+
| 1|abr92nas9| 20 | 0.2435 |...|
| 2|abr92nas9| 5 | 0.5 |...|
| 3|nsa93m187| 1 | 100 |...|
| 4|nsa93m187| 4 | 15.5 |...|
+--+---------+-----+--------+---+

You need to group by the expression:
SELECT LEFT(f.timestamp, 10) as timestamp, sum(s.Total) as Total
FROM dbo.purchasing f JOIN
(SELECT uid, SUM(CONVERT(DECIMAL(18,2), (CONVERT(DECIMAL(18,4), qty) * price))) as Total
FROM dbo.purchasingitems
GROUP BY uid
) s
ON f.uid = s.uid
GROUP BY LEFT(f.timestamp, 10);
Notes:
You should not be storing date/time values as strings (unless you have a really good reason). If timestamp is a date, you should use cast(timestamp as date).
You should not be using string functions on date/times.
timestamp is a keyword in SQL Server (although not reserved), so it is not a good choice for a column name.
Your problem is that you think that GROUP BY timestamp refers to the expression in the SELECT. SQL Server does not support column aliases in GROUP BY, so it can only refer to the column of that name.
I don't see a reason to convert to decimal for the multiplication. You might have a good reason.
You probably want order by as well, to ensure that the result set is in a sensible order.
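The difference between grouping by the raw column and grouping by the selected expression can be reproduced in a small sketch. This is only an illustration under assumptions: SQLite (through Python's sqlite3 module) stands in for SQL Server, substr() plays the role of LEFT(), the column is renamed to ts, the DECIMAL conversions are dropped, and the time-of-day values are hypothetical, added so that two purchases fall on the same day.

```python
import sqlite3

# Assumption: SQLite stands in for SQL Server; substr() replaces LEFT().
# The time-of-day parts are hypothetical, so that two purchases share one day.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE purchasing (uid TEXT, ts TEXT);
    CREATE TABLE purchasingitems (uid TEXT, qty INTEGER, price REAL);
    INSERT INTO purchasing VALUES
        ('abr92nas9', '2018-01-01 09:00:00'),
        ('nsa93m187', '2018-01-01 17:30:00');
    INSERT INTO purchasingitems VALUES
        ('abr92nas9', 20, 0.2435),
        ('abr92nas9', 5, 0.5),
        ('nsa93m187', 1, 100),
        ('nsa93m187', 4, 15.5);
""")

QUERY = """
    SELECT substr(f.ts, 1, 10) AS day, SUM(s.total) AS total
    FROM purchasing AS f
    JOIN (SELECT uid, SUM(qty * price) AS total
          FROM purchasingitems
          GROUP BY uid) AS s ON f.uid = s.uid
    GROUP BY {group_key}
    ORDER BY {group_key}
"""

# Grouping by the raw column keeps one group per distinct timestamp,
# so the same day shows up twice in the output.
by_column = conn.execute(QUERY.format(group_key="f.ts")).fetchall()

# Grouping by the same expression that is selected gives one row per day.
by_expression = conn.execute(
    QUERY.format(group_key="substr(f.ts, 1, 10)")
).fetchall()

print(by_column)
print(by_expression)
```

With this data the first query returns two rows for 2018-01-01 and the second returns a single row for that day.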

The data you posted does NOT produce duplicates.
There is no reason for the subquery:
Select LEFT(f.timestamp, 10) timestamp,
SUM(CONVERT(DECIMAL(18,2), (CONVERT(DECIMAL(18,4), s.qty) * s.price))) Total
FROM dbo.purchasing AS f
join dbo.purchasingitems s
ON f.uid = s.uid
GROUP BY LEFT(f.timestamp, 10)

Related

sql dw count(*) with selected columns doesn't aggregate

This is rather strange, and I have had this query work on numerous databases but here I am stumped.
I know that my Synapse table has duplicates
SELECT nmiandnmisuffixkey, ReadingDate, IntervalNumber
FROM [dbo].[factMeterDataDetail]
where nmiandnmisuffixkey = 'XXXXXXXXXX'
and readingdate = '2020-10-08'
and IntervalNumber = 12
produces
+--------------------+-------------+----------------+
| nmiandnmisuffixkey | ReadingDate | IntervalNumber |
+--------------------+-------------+----------------+
| XXXXXXXXXX | 2020-10-08 | 12 |
| XXXXXXXXXX | 2020-10-08 | 12 |
+--------------------+-------------+----------------+
but when I try to run following
SELECT nmiandnmisuffixkey, ReadingDate, IntervalNumber, count(*) as cnt
FROM [dbo].[factMeterDataDetail]
where nmiandnmisuffixkey = 'XXXXXXXXXX'
and readingdate = '2020-10-08'
and IntervalNumber = 12
group by nmiandnmisuffixkey, ReadingDate, IntervalNumber
I get the following:-
+--------------------+-------------+----------------+-----+
| nmiandnmisuffixkey | ReadingDate | IntervalNumber | cnt |
+--------------------+-------------+----------------+-----+
| XXXXXXXXXX | 2020-10-08 | 12 | 1 |
| XXXXXXXXXX | 2020-10-08 | 12 | 1 |
+--------------------+-------------+----------------+-----+
why does the count not aggregate up?
Some possibilities -
The dates may differ in their time or millisecond part, so try removing the time part and running the GROUP BY again.
The string column (the key) may have white space at the beginning or end. Use LTRIM/RTRIM and run the GROUP BY again. A client tool will display such values as identical, as in your output; trimming the spaces gives a true comparison.
In the OP's case it was the second one: using LTRIM/RTRIM resolved the aggregation issue.
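A rough sketch of the diagnosis and the fix, under stated assumptions: SQLite (through Python's sqlite3 module) stands in for Synapse, the table name meter and the column name nmikey are hypothetical, and the trailing space is invented for illustration. Engines differ in how they compare trailing spaces, so this only shows the general idea: compare lengths to find the hidden difference, then trim before grouping.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE meter (nmikey TEXT, reading_date TEXT, interval_number INTEGER)"
)
conn.executemany(
    "INSERT INTO meter VALUES (?, ?, ?)",
    [
        ("XXXXXXXXXX", "2020-10-08", 12),
        ("XXXXXXXXXX ", "2020-10-08", 12),  # hypothetical trailing space
    ],
)

# length() exposes the difference that a result grid hides.
lengths = conn.execute("SELECT length(nmikey) FROM meter ORDER BY 1").fetchall()

# Grouping on the raw key keeps the two values apart.
raw_groups = conn.execute(
    "SELECT nmikey, COUNT(*) FROM meter GROUP BY nmikey ORDER BY nmikey"
).fetchall()

# Grouping on the trimmed key merges them into one group.
trimmed_groups = conn.execute(
    "SELECT trim(nmikey) AS k, COUNT(*) FROM meter GROUP BY trim(nmikey)"
).fetchall()

print(lengths, raw_groups, trimmed_groups)
```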
This is your query:
select nmiandnmisuffixkey, ReadingDate, IntervalNumber, count(*) as cnt
from [dbo].[factMeterDataDetail]
where nmiandnmisuffixkey = 'XXXXXXXXXX' and
readingdate = '2020-10-08' and
IntervalNumber = 12
group by nmiandnmisuffixkey, ReadingDate, IntervalNumber
The query is filtering on specific values for each of the columns used in the group by. And yet, you are getting multiple rows when aggregating on them.
So, your question is really: "When does an equality comparison not match the concept of "equality" for aggregation?"
I'm sure this is not a comprehensive list.
One possibility is that IntervalNumber is really a string. The = converts the values to a number, so '012' and '12' are the same for equality, but not for aggregation.
In other words, type conversion can cause this discrepancy.
This might occur with strings and collations. Normally, I would expect a collation conflict error. But you might check if the string columns have an explicit collation different from the database default (which would be used for the string constant).
I don't think there is an equivalent difference for your date comparison.
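The type-conversion case can be sketched as follows. This is an illustration under assumptions, not a reproduction of Synapse behaviour: SQLite (through Python's sqlite3 module) does not convert a text column implicitly the way the answer describes, so the conversion is written out as an explicit CAST, and the table name readings is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (interval_number TEXT)")
conn.executemany("INSERT INTO readings VALUES (?)", [("12",), ("012",)])

# Assumption: the explicit CAST imitates the implicit string-to-number
# conversion that the answer attributes to the = comparison.
FILTER = "CAST(interval_number AS INTEGER) = 12"

# Both rows pass the filter, but grouping on the raw strings keeps them apart.
raw = conn.execute(
    f"SELECT interval_number, COUNT(*) FROM readings WHERE {FILTER} "
    "GROUP BY interval_number ORDER BY interval_number"
).fetchall()

# Grouping on the converted value puts them in the same group.
converted = conn.execute(
    f"SELECT CAST(interval_number AS INTEGER), COUNT(*) FROM readings "
    f"WHERE {FILTER} GROUP BY CAST(interval_number AS INTEGER)"
).fetchall()

print(raw, converted)
```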
I should also note a workaround for this use-case:
select max(nmiandnmisuffixkey), max(ReadingDate), max(IntervalNumber), count(*) as cnt
from [dbo].[factMeterDataDetail]
where nmiandnmisuffixkey = 'XXXXXXXXXX' and
readingdate = '2020-10-08' and
IntervalNumber = 12;
That is, just use an aggregation query with no group by. It is guaranteed to return one row.
Your table definitely has an ID. When you do count (*) it includes the ID. First, put the desired fields in a temp table, then group by.
In this way:
SELECT
nmiandnmisuffixkey, ReadingDate, IntervalNumber
Into
#tmp FROM [dbo].[factMeterDataDetail]
where
nmiandnmisuffixkey = 'XXXXXXXXXX' and readingdate = '2020-10-08' and IntervalNumber = 12
Select
nmiandnmisuffixkey, ReadingDate, IntervalNumber,count (*)as cnt
from
#tmp
Group by
nmiandnmisuffixkey, ReadingDate, IntervalNumber

Count and name content from a SQL Server table

I have a table which is structured like this:
+-----+-------------+-------------------------+
| id | name | timestamp |
+-----+-------------+-------------------------+
| 1 | someName | 2016-04-20 09:41:41.213 |
| 2 | someName | 2016-04-20 09:42:41.213 |
| 3 | anotherName | 2016-04-20 09:43:41.213 |
| ... | ... | ... |
+-----+-------------+-------------------------+
Now, I am trying to create a query, which selects all timestamps since time x and count the amount of times the same name occurs in the result.
As an example, if we would apply this query to the table above, with 2016-04-20 09:40:41.213 as the date from which on it should be counted, the result should look like this:
+-------------+-------+
| name | count |
+-------------+-------+
| someName | 2 |
| anotherName | 1 |
+-------------+-------+
What I have accomplished so far is the following query, which gives me the names, but not their count:
WITH screenshots AS
(
SELECT * FROM SavedScreenshotsLog
WHERE timestamp > '2016-04-20 09:40:41.213'
)
SELECT s.name
FROM SavedScreenshotsLog s
INNER JOIN screenshots sc ON sc.name = s.name AND sc.timestamp = s.timestamp
ORDER BY s.name
I have browsed through stackoverflow but was not able to find a solution which fits my needs and as I am not very experienced with SQL, I am out of ideas.
You mention one table in your question, and then show a query with two tables. That makes it hard to follow the question.
What you are asking for is a simple aggregation:
SELECT name, COUNT(*)
FROM SavedScreenshotsLog
WHERE timestamp > '2016-04-20 09:40:41.213'
GROUP BY name
ORDER BY COUNT(*) DESC;
EDIT:
If you want "0" values, you can use conditional aggregation:
SELECT name,
SUM(CASE WHEN timestamp > '2016-04-20 09:40:41.213' THEN 1 ELSE 0 END) as cnt
FROM SavedScreenshotsLog
GROUP BY name
ORDER BY cnt DESC;
Note that this will run slower because there is no filter on the dates prior to aggregation.
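The two variants can be compared side by side in a small sketch. Assumptions: SQLite (through Python's sqlite3 module) stands in for SQL Server, timestamps are stored as text in a sortable format, and the row for oldName is hypothetical, added only so that one name falls entirely before the cutoff.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE SavedScreenshotsLog (id INTEGER, name TEXT, timestamp TEXT)"
)
conn.executemany(
    "INSERT INTO SavedScreenshotsLog VALUES (?, ?, ?)",
    [
        (1, "someName", "2016-04-20 09:41:41.213"),
        (2, "someName", "2016-04-20 09:42:41.213"),
        (3, "anotherName", "2016-04-20 09:43:41.213"),
        (4, "oldName", "2016-04-19 08:00:00.000"),  # hypothetical, before cutoff
    ],
)
CUTOFF = "2016-04-20 09:40:41.213"

# Filter first, then aggregate: names with no recent rows disappear.
filtered = conn.execute(
    "SELECT name, COUNT(*) AS cnt FROM SavedScreenshotsLog "
    "WHERE timestamp > ? GROUP BY name ORDER BY cnt DESC",
    (CUTOFF,),
).fetchall()

# Conditional aggregation: every name is kept, with 0 where nothing matches.
conditional = conn.execute(
    "SELECT name, SUM(CASE WHEN timestamp > ? THEN 1 ELSE 0 END) AS cnt "
    "FROM SavedScreenshotsLog GROUP BY name ORDER BY cnt DESC",
    (CUTOFF,),
).fetchall()

print(filtered)
print(conditional)
```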
CREATE TABLE #TEST (name varchar(100), dt datetime)
INSERT INTO #TEST VALUES ('someName','2016-04-20 09:41:41.213')
INSERT INTO #TEST VALUES ('someName','2016-04-20 09:41:41.213')
INSERT INTO #TEST VALUES ('anotherName','2016-04-20 09:43:41.213')
declare @YourDatetime datetime = '2016-04-20 09:41:41.213'
SELECT name, count(dt)
FROM #TEST
WHERE dt >= @YourDatetime
GROUP BY name
GROUP BY name
I've posted this answer because the queries above can fail when converting the string in the WHERE clause to a datetime, depending on the format of the datetime string.
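One way to keep the cutoff out of the SQL text is to build it as a typed value in the calling code and pass it as a parameter. The following is a minimal sketch of that idea, assuming Python's sqlite3 module as a stand-in for SQL Server, a table named test in place of #TEST, and timestamps stored as text in the same format that the parameter is rendered in.

```python
import sqlite3
from datetime import datetime

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test (name TEXT, dt TEXT)")
conn.executemany(
    "INSERT INTO test VALUES (?, ?)",
    [
        ("someName", "2016-04-20 09:41:41.213"),
        ("someName", "2016-04-20 09:41:41.213"),
        ("anotherName", "2016-04-20 09:43:41.213"),
    ],
)

# Build the cutoff as a real datetime, so an impossible value fails here
# rather than inside the query, then render it in the stored format.
your_datetime = datetime(2016, 4, 20, 9, 41, 41, 213000)
cutoff = your_datetime.isoformat(sep=" ", timespec="milliseconds")

# The cutoff travels as a bound parameter, not as part of the SQL string.
counts = conn.execute(
    "SELECT name, COUNT(dt) FROM test WHERE dt >= ? GROUP BY name ORDER BY name",
    (cutoff,),
).fetchall()

print(cutoff, counts)
```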

SQL to find the date when the price last changed

Input:
Date Price
12/27 5
12/21 5
12/20 4
12/19 4
12/15 5
Required Output:
The earliest date when the price was set in comparison to the current price.
For e.g., price has been 5 since 12/21.
The answer cannot be 12/15 as we are interested in finding the earliest date where the price was the same as the current price without changing in value(on 12/20, the price has been changed to 4)
This should be about right. You didn't provide table structures or names, so...
DECLARE @CurrentPrice MONEY
SELECT TOP 1 @CurrentPrice=Price FROM Table ORDER BY Date DESC
SELECT MIN(Date) FROM Table WHERE Price=@CurrentPrice AND Date>(
SELECT MAX(Date) FROM Table WHERE Price<>@CurrentPrice
)
In one query:
SELECT MIN(Date)
FROM Table
WHERE Date >
( SELECT MAX(Date)
FROM Table
WHERE Price <>
( SELECT TOP 1 Price
FROM Table
ORDER BY Date DESC
)
)
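The single-query form can be checked against the sample input from the question. This is a sketch under assumptions: SQLite (through Python's sqlite3 module) stands in for the unnamed database, LIMIT 1 replaces TOP 1, the table name prices and column name d are hypothetical, and the dates are kept as the MM/DD strings from the question, which only sort correctly here because they are zero-padded and fall in the same month.

```python
import sqlite3

QUERY = """
    SELECT MIN(d)
    FROM prices
    WHERE d > (SELECT MAX(d)
               FROM prices
               WHERE price <> (SELECT price
                               FROM prices
                               ORDER BY d DESC
                               LIMIT 1))
"""


def last_change(rows):
    """Return the first date of the current price run, per the query above."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE prices (d TEXT, price INTEGER)")
    conn.executemany("INSERT INTO prices VALUES (?, ?)", rows)
    return conn.execute(QUERY).fetchone()[0]


sample = [("12/27", 5), ("12/21", 5), ("12/20", 4), ("12/19", 4), ("12/15", 5)]
changed = last_change(sample)

# Edge case: if the price never differed, the inner MAX() is NULL,
# the comparison matches nothing, and the query returns NULL.
never_changed = last_change([("12/27", 5), ("12/21", 5)])

print(changed, never_changed)
```

For the sample input this yields 12/21. When the price has never changed the result is NULL, so a caller would need to handle that case separately.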
This question kind of makes no sense, so I'm not 100% sure what you are after.
Create four columns: old_price, new_price, old_date, new_date.
If old_price equals new_price, simply print the old_date.
What database server are you using? If it was Oracle, I would use their windowing function. Anyway, here is a quick version that works in mysql:
Here is the sample data:
+------------+------------+---------------+
| date | product_id | price_on_date |
+------------+------------+---------------+
| 2011-01-01 | 1 | 5 |
| 2011-01-03 | 1 | 4 |
| 2011-01-05 | 1 | 6 |
+------------+------------+---------------+
Here is the query (it only works if you have 1 product - will have to add a "and product_id = ..." condition on the where clause if otherwise).
SELECT p.date as last_price_change_date
FROM test.prices p
left join test.prices p2 on p.product_id = p2.product_id and p.date < p2.date
where p.price_on_date - p2.price_on_date <> 0
order by p.date desc
limit 1
In this case, it will return "2011-01-03".
Not a perfect solution, but I believe it works. Have not tested on a larger dataset, though.
Make sure to create indexes on date and product_id, as it will otherwise bring your database server to its knees and beg for mercy.
Bernardo.

SQL - Select unique rows from a group of results

I have racked my brain on this problem for quite some time. I've also reviewed other questions but was unsuccessful.
The problem I have is, I have a list of results/table that has multiple rows with columns
| REGISTRATION | ID | DATE | UNITTYPE
| 005DTHGP | 172 | 2007-09-11 | MBio
| 005DTHGP | 1966 | 2006-09-12 | Tracker
| 013DTHGP | 2281 | 2006-11-01 | Tracker
| 013DTHGP | 2712 | 2008-05-30 | MBio
| 017DTNGP | 2404 | 2006-10-20 | Tracker
| 017DTNGP | 508 | 2007-11-10 | MBio
I am trying to select rows with unique REGISTRATIONS and where the DATE is max (the latest). The IDs are not proportional to the DATE, meaning the ID could be a low value yet the DATE is higher than the other matching row and vice versa. Therefore I can't use MAX() on both the DATE and ID and grouping just doesn't seem to work.
The results I want are as follows;
| REGISTRATION | ID | DATE | UNITTYPE
| 005DTHGP | 172 | 2007-09-11 | MBio
| 013DTHGP | 2712 | 2008-05-30 | MBio
| 017DTNGP | 508 | 2007-11-10 | MBio
PLEASE HELP!!!?!?!?!?!?!?
You want embedded queries, which not all SQLs support. In t-sql you'd have something like
select r.registration, r.recent, t.id, t.unittype
from (
select registration, max([date]) recent
from #tmp
group by
registration
) r
left outer join
#tmp t
on r.recent = t.[date]
and r.registration = t.registration
TSQL:
declare @R table
(
Registration varchar(16),
ID int,
Date datetime,
UnitType varchar(16)
)
insert into @R values ('A','1','20090824','A')
insert into @R values ('A','2','20090825','B')
select R.Registration,R.ID,R.UnitType,R.Date from @R R
inner join
(select Registration,Max(Date) as Date from @R group by Registration) M
on R.Registration = M.Registration and R.Date = M.Date
This can be inefficient if you have thousands of rows in your table depending upon how the query is executed (i.e. if it is a rowscan and then a select per row).
In PostgreSQL, and assuming your data is indexed so that a sort isn't needed (or there are so few rows you don't mind a sort):
select distinct on (registration) * from whatever order by registration,"date" desc;
Taking each row in registration and descending date order, you will get the latest date for each registration first. DISTINCT throws away the duplicate registrations that follow.
select registration,ID,date,unittype
from your_table
where (registration, date) IN (select registration,max(date)
from your_table
group by registration)
This should work in MySQL:
SELECT registration, id, date, unittype FROM table_name,
(SELECT registration AS temp_reg, MAX(date) as temp_date
FROM table_name GROUP BY registration) AS temp_table
WHERE registration=temp_reg and date=temp_date
The idea is to use a subquery in a FROM clause which throws up a single row containing the correct date and registration (the fields subjected to a group); then use the correct date and registration in a WHERE clause to fetch the other fields of the same row.
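The join-to-the-maximum approach that several of the answers share can be run against the sample rows from the question. This is a sketch, assuming SQLite (through Python's sqlite3 module) as a stand-in engine and a hypothetical table name units; dates are stored as ISO text, which sorts chronologically.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE units (registration TEXT, id INTEGER, date TEXT, unittype TEXT)"
)
conn.executemany(
    "INSERT INTO units VALUES (?, ?, ?, ?)",
    [
        ("005DTHGP", 172, "2007-09-11", "MBio"),
        ("005DTHGP", 1966, "2006-09-12", "Tracker"),
        ("013DTHGP", 2281, "2006-11-01", "Tracker"),
        ("013DTHGP", 2712, "2008-05-30", "MBio"),
        ("017DTNGP", 2404, "2006-10-20", "Tracker"),
        ("017DTNGP", 508, "2007-11-10", "MBio"),
    ],
)

# The derived table finds the latest date per registration; joining back
# on (registration, date) recovers the remaining columns of that row.
latest = conn.execute("""
    SELECT u.registration, u.id, u.date, u.unittype
    FROM units AS u
    JOIN (SELECT registration, MAX(date) AS max_date
          FROM units
          GROUP BY registration) AS m
      ON u.registration = m.registration AND u.date = m.max_date
    ORDER BY u.registration
""").fetchall()

for row in latest:
    print(row)
```

Note that if two rows of the same registration shared the latest date, this approach would return both of them.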

postgres - partial column in SELECT/GROUP BY - column must appear in the GROUP BY clause or be used in an aggregate function

Both the following two statements produce an error in Postgres:
SELECT substring(start_time,1,8) AS date, count(*) as total from cdrs group by date;
SELECT substring(start_time,1,8) AS date, count(*) as total from cdrs group by substring(start_time,1,8);
The error is:
column "cdrs.start_time" must appear in the GROUP BY clause or be used
in an aggregate function
My reading of postgres docs is that both SELECT and GROUP BY can use an expression
postgres 8.3 SELECT
The start_time field is a string and has a date/time in form ccyymmddHHMMSS. In mySQL they both produce desired and expected results:
+----------+-------+
| date | total |
+----------+-------+
| 20091028 | 9 |
| 20091029 | 110 |
| 20091120 | 14 |
| 20091121 | 4 |
+----------+-------+
4 rows in set (0.00 sec)
I need to stick with Postgres (heroku). Any suggestions?
p.s. there is lots of other discussion around that talks about missing items in GROUP BY and why mySQL accepts this, why others don't ... strict adherence to SQL spec etc etc, but I think this is sufficiently different to 1062158/converting-mysql-select-to-postgresql and 1769361/postgresql-group-by-different-from-mysql to warrant a separate question.
You did something else that you didn't describe in the question, as both of your queries work just fine. Tested on 8.5 and 8.3.8:
# create table cdrs (start_time text);
CREATE TABLE
# insert into cdrs (start_time) values ('20090101121212'),('20090101131313'),('20090510040603');
INSERT 0 3
# SELECT substring(start_time,1,8) AS date, count(*) as total from cdrs group by date;
date | total
----------+-------
20090510 | 1
20090101 | 2
(2 rows)
# SELECT substring(start_time,1,8) AS date, count(*) as total from cdrs group by substring(start_time,1,8);
date | total
----------+-------
20090510 | 1
20090101 | 2
(2 rows)
Just to summarise, error
column "cdrs.start_time" must appear in the GROUP BY clause or be used in an aggregate function
was caused (in this case) by ORDER BY start_time clause. Full statement needed to be either:
SELECT substring(start_time,1,8) AS date, count(*) as total FROM cdrs GROUP BY substring(start_time,1,8) ORDER BY substring(start_time,1,8);
or
SELECT substring(start_time,1,8) AS date, count(*) as total FROM cdrs GROUP BY date ORDER BY date;
Two simple things you might try:
Upgrade to postgres 8.4.1
Both queries Work Just Fine For Me(tm) under pg841
Group by ordinal position
That is, GROUP BY 1 in this case.
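Grouping by the full expression and grouping by ordinal position can be compared on the test rows from the answer above. This is a sketch under assumptions: SQLite (through Python's sqlite3 module) stands in for Postgres, substr() replaces substring(), and the alias is renamed to day. SQLite is more lenient than Postgres, so it would not raise the error from the question; the sketch only shows that the two GROUP BY forms give the same result.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cdrs (start_time TEXT)")
conn.executemany(
    "INSERT INTO cdrs VALUES (?)",
    [("20090101121212",), ("20090101131313",), ("20090510040603",)],
)

# Group and order by the same expression that appears in the SELECT list.
by_expression = conn.execute(
    "SELECT substr(start_time, 1, 8) AS day, COUNT(*) AS total "
    "FROM cdrs "
    "GROUP BY substr(start_time, 1, 8) "
    "ORDER BY substr(start_time, 1, 8)"
).fetchall()

# Group and order by ordinal position: 1 refers to the first SELECT column.
by_ordinal = conn.execute(
    "SELECT substr(start_time, 1, 8) AS day, COUNT(*) AS total "
    "FROM cdrs GROUP BY 1 ORDER BY 1"
).fetchall()

print(by_expression)
print(by_ordinal)
```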