SQL query giving wrong sum - sql

I'm using the rather old Microsoft Query that comes with Excel to query an ODBC database. However, it gives me the wrong sum when I join two tables.
This works fine:
SELECT accountcode, SUM(tr_amount)
FROM deb_trans deb_trans
WHERE (today() > dr_tr_due_date + 14)
GROUP BY accountcode
However, this does not:
SELECT deb_trans.accountcode, Sum(deb_trans.tr_amount)
FROM deb_trans deb_trans, mailer_master mailer_master
WHERE (today()>dr_tr_due_date+14) AND (mailer_master.accountcode=deb_trans.accountcode)
GROUP BY deb_trans.accountcode
The tables are joined on the accountcode field.
The field tr_amount originates from the deb_trans table. It is not present in mailer_master.
Any ideas? Thanks guys!

When you join the tables, you get one row for every combination of deb_trans and mailer_master rows that satisfies the WHERE clause, and the grouping happens only after that. If an accountcode matches more than one row in mailer_master, each of its deb_trans rows is repeated once per match, so its tr_amount is added to the sum several times. To get a valid sum, do not join another table in a way that changes the number of rows before grouping.
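A minimal sketch of this effect, using Python's built-in sqlite3 module rather than Microsoft Query. The table and column names follow the question, but the rows are made up, the date filter is left out, and the assumption that account A1 appears twice in mailer_master is hypothetical. The EXISTS variant is one possible way to filter without multiplying rows; whether the ODBC source supports it is not known from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE deb_trans (accountcode TEXT, tr_amount REAL);
    CREATE TABLE mailer_master (accountcode TEXT, contact TEXT);
    INSERT INTO deb_trans VALUES ('A1', 100), ('A1', 50), ('B2', 30);
    -- Hypothetical data: account A1 appears twice in mailer_master.
    INSERT INTO mailer_master VALUES ('A1', 'x'), ('A1', 'y'), ('B2', 'z');
""")

# Sum over deb_trans alone: every transaction is counted once.
plain = conn.execute("""
    SELECT accountcode, SUM(tr_amount) FROM deb_trans
    GROUP BY accountcode ORDER BY accountcode
""").fetchall()

# Same sum after the join: each A1 transaction is repeated per matching
# mailer_master row, so the A1 total doubles.
joined = conn.execute("""
    SELECT d.accountcode, SUM(d.tr_amount)
    FROM deb_trans d, mailer_master m
    WHERE m.accountcode = d.accountcode
    GROUP BY d.accountcode ORDER BY d.accountcode
""").fetchall()

# Filtering on existence keeps one row per transaction.
safe = conn.execute("""
    SELECT d.accountcode, SUM(d.tr_amount)
    FROM deb_trans d
    WHERE EXISTS (SELECT 1 FROM mailer_master m
                  WHERE m.accountcode = d.accountcode)
    GROUP BY d.accountcode ORDER BY d.accountcode
""").fetchall()

print(plain)   # [('A1', 150.0), ('B2', 30.0)]
print(joined)  # [('A1', 300.0), ('B2', 30.0)]
print(safe)    # [('A1', 150.0), ('B2', 30.0)]
```

Comparing a plain COUNT(*) on deb_trans with the COUNT(*) of the joined query is a quick way to check whether a join is multiplying rows.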

Related

Selecting a single row from a column that has multiple rows

I'm a SQL newbie so bear with me.
I am writing a select statement that pulls data from multiple tables, which works. However, when I select one specific column I get duplicates, because that column can legitimately have multiple rows. What I want is to pick the most appropriate row and select only that.
My code so far:
Select
a.[StudentId], a.[Name], a.[StartDT], a.[EndDT],
b.[ClassID], b.[Module], b.[ModStart], b.[ModEnd]
from
[Data].[StudentTbl] a
left join
[Data].[ClassTbl] b on a.[StudentId] = b.[Student_ID]
When I select b.[Module] I get multiple rows, because there can be a number of modules per class. However, I want to select the b.[Module] that the student completed before leaving.
Essentially, if a.[EndDT] is equal to b.[ModEnd], I need that specific row. The MAX function doesn't always work, because there are DQ issues in ClassTbl: when a student has left, a row saying N/A is inserted after the last module.
What I'm currently getting is this:
What I want to get eventually:

Oracle SQL Developer(4.0.0.12)

First time posting here; I hope it goes well.
I am trying to write a query in Oracle SQL Developer that returns a customer_ID from one table and the time of the payment from another. I'm pretty sure the problem lies in my logic (it has been a long time since I used SQL, back in school, so I'm a bit rusty). I wanted to list the IDs as DISTINCT and ORDER BY the dates ascending, so that only the first date would show up.
However, the returned table contains the same IDs twice, or even more often in some cases. I even found the same ID with the same date a few times while scrolling through it.
If you would like to know more, please ask!
SELECT DISTINCT
FIRM.customer.CUSTOMER_ID,
FIRM.account_recharge.X__INSDATE FELTOLTES
FROM
FIRM.customer
INNER JOIN FIRM.account
ON FIRM.customer.CUSTOMER_ID = FIRM.account.CUSTOMER
INNER JOIN FIRM.account_recharge
ON FIRM.account.ACCOUNT_ID = FIRM.account_recharge.ACCOUNT
WHERE
FIRM.account_recharge.X__INSDATE BETWEEN TO_DATE('14-01-01', 'YY-MM-DD') AND TO_DATE('14-12-31', 'YY-MM-DD')
ORDER BY FELTOLTES
Your SELECT behaves this way because a CUSTOMER_ID really does have more than one X__INSDATE, so each (CUSTOMER_ID, X__INSDATE) pair is a distinct record in the result. Rows that appear to share the same date can still differ in the time component, which Oracle DATE values carry. If you need only the first date, don't use DISTINCT and ORDER BY; instead select MIN(X__INSDATE) and GROUP BY CUSTOMER_ID.
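A small sketch of the MIN/GROUP BY approach, run against SQLite through Python's sqlite3 module instead of Oracle. The single recharge table and its rows are hypothetical stand-ins for the three joined FIRM tables, and the dates are stored as ISO text so that MIN compares them correctly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical flattened stand-in for customer/account/account_recharge.
    CREATE TABLE recharge (customer_id INTEGER, insdate TEXT);
    INSERT INTO recharge VALUES
        (1, '2014-03-05'), (1, '2014-01-20'), (1, '2014-01-20'),
        (2, '2014-07-01'), (2, '2014-02-11');
""")

# DISTINCT removes only exact duplicate (customer_id, insdate) pairs.
distinct_rows = conn.execute("""
    SELECT DISTINCT customer_id, insdate FROM recharge
    ORDER BY insdate
""").fetchall()

# One row per customer, carrying that customer's earliest date.
first_dates = conn.execute("""
    SELECT customer_id, MIN(insdate) AS first_date
    FROM recharge
    GROUP BY customer_id
    ORDER BY first_date
""").fetchall()

print(len(distinct_rows))  # 4
print(first_dates)         # [(1, '2014-01-20'), (2, '2014-02-11')]
```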
SELECT DISTINCT FIRM.customer.CUSTOMER_ID,
FIRM.account_recharge.X__INSDATE FELTOLTES
Distinct is applied to both the columns together, which means you will get a distinct ROW for the set of values from the two columns. So, basically the distinct refers to all the columns in the select list.
It is equivalent to a SELECT without DISTINCT but with a GROUP BY clause over the same columns.
It means,
select distinct a, b....
is equivalent to,
select a, b...group by a, b
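Concatenating the columns does not help here: DISTINCT over the concatenated value still returns one row per distinct combination of ID and date. To get a single row per CUSTOMER_ID, aggregate the date instead, for example with MIN(X__INSDATE) and GROUP BY CUSTOMER_ID.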
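The equivalence between select distinct a, b and select a, b ... group by a, b described above can be checked with a short sketch. It uses SQLite via Python's sqlite3 module, and the table t with its rows is made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t (a INTEGER, b TEXT);
    INSERT INTO t VALUES (1, 'x'), (1, 'x'), (1, 'y'), (2, 'x');
""")

with_distinct = conn.execute(
    "SELECT DISTINCT a, b FROM t ORDER BY a, b").fetchall()
with_group_by = conn.execute(
    "SELECT a, b FROM t GROUP BY a, b ORDER BY a, b").fetchall()

# Both forms collapse only the exact duplicate row (1, 'x').
print(with_distinct)  # [(1, 'x'), (1, 'y'), (2, 'x')]
print(with_distinct == with_group_by)  # True
```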

How to write two counts and the newest record in Hive

My data format is like this:
query guid result time
I want to write SQL like this:
select
query,
count(query),
count(distinct guid),
result
from
table
group by
query
The second column is the number of rows with the same query, the third column is the number of distinct guids, and the fourth column is the newest result: the same query may have several results, and we choose the newest one by time. Since the logic is a little complex, how can I write SQL that does all of this?
select a.md5, a.cnt, a.wide, b.main_level
from (select md5, count(md5) cnt, count(distinct guid) wide, max(time) maxtime from hive group by md5) a
join hive b on a.md5 = b.md5 and a.maxtime = b.time;
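A sketch of the same pattern, run on SQLite through Python's sqlite3 module rather than on Hive. The table name logs, the column names query_text and event_time, and the rows are hypothetical. It also shows why the join back to the table should match on the grouping key as well as on the timestamp: two different queries can share the same newest time.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical names standing in for: query, guid, result, time.
    CREATE TABLE logs (query_text TEXT, guid TEXT, result TEXT, event_time INTEGER);
    INSERT INTO logs VALUES
        ('cats', 'g1', 'r1', 10),
        ('cats', 'g1', 'r2', 20),
        ('cats', 'g2', 'r3', 30),
        ('dogs', 'g3', 'r4', 30),
        ('dogs', 'g3', 'r5', 15);
""")

aggregate = """
    SELECT query_text, COUNT(query_text) AS cnt,
           COUNT(DISTINCT guid) AS wide, MAX(event_time) AS maxtime
    FROM logs GROUP BY query_text
"""

# Join on the key and the newest timestamp: one row per query.
rows = conn.execute(f"""
    SELECT a.query_text, a.cnt, a.wide, b.result
    FROM ({aggregate}) a
    JOIN logs b ON a.query_text = b.query_text AND a.maxtime = b.event_time
    ORDER BY a.query_text
""").fetchall()

# Join on the timestamp only: 'cats' and 'dogs' both peak at 30,
# so each picks up the other's row as well.
time_only = conn.execute(f"""
    SELECT a.query_text, b.result
    FROM ({aggregate}) a
    JOIN logs b ON a.maxtime = b.event_time
""").fetchall()

print(rows)            # [('cats', 3, 2, 'r3'), ('dogs', 2, 1, 'r4')]
print(len(time_only))  # 4
```

If two rows of the same query share the newest timestamp, this join still returns both of them, so a tie-breaker would be needed in that case.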

Complex sql query involving timestamp to timestamp periods and joins and sums - is it even possible?

I am trying to create a database query that selects rows from one table, builds periods from those rows (using the lag window function), and joins them with rows from a different table, summing the value(s) of one column for each row of the first table.
Table A:
id,
created_at,
object_id
Table B:
id,
end_time,
value,
object_id
The rows that the query yields should consist of columns something like:
lag(tablea.created_at) over(tablea.object_id, tablea.created_at),
tablea.created_at,
tablea.object_id
sum(tableb.value) where it sums the tableb.value from matching period
I tried putting the window function into the WHERE clause, only to get an error. I also tried putting the period into the JOIN ... ON clause, but that also raised an error.
It is no problem if it is not possible. I just want to know whether it is possible and, if so, how. If it is not, I will try to come up with an alternative approach.
Edit: link to example sqlfiddle: http://sqlfiddle.com/#!1/c7878
Edit2: The SQL I tried was something like:
SELECT lag(a.created_at), a.created_at, a.object_id, sum(b.value) from tablea a left join tableb b on (something) order by a.object_id, a.created_at
But obviously that did not work, because I could not use a window function in the ON clause. That's where I got stuck.
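One possible way around the restriction, sketched here as an assumption rather than a confirmed answer: compute the lag in a subquery first, so the period start becomes an ordinary column that the ON clause can use. The sketch runs on SQLite through Python's sqlite3 module (window functions need SQLite 3.25 or newer), uses integers as stand-in timestamps, and treats each period as the half-open interval (previous created_at, created_at]. The rows and the interval convention are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tablea (id INTEGER, created_at INTEGER, object_id INTEGER);
    CREATE TABLE tableb (id INTEGER, end_time INTEGER, value INTEGER, object_id INTEGER);
    INSERT INTO tablea VALUES (1, 10, 1), (2, 20, 1), (3, 30, 1);
    INSERT INTO tableb VALUES (1, 5, 1, 1), (2, 12, 2, 1), (3, 18, 4, 1), (4, 25, 8, 1);
""")

rows = conn.execute("""
    SELECT p.period_start, p.created_at, p.object_id, SUM(b.value) AS total
    FROM (
        -- The window function is evaluated here, before the join.
        SELECT created_at, object_id,
               LAG(created_at) OVER (PARTITION BY object_id
                                     ORDER BY created_at) AS period_start
        FROM tablea
    ) p
    LEFT JOIN tableb b
           ON b.object_id = p.object_id
          AND b.end_time <= p.created_at
          -- The first period of an object has no start, so it is open-ended.
          AND (p.period_start IS NULL OR b.end_time > p.period_start)
    GROUP BY p.object_id, p.created_at, p.period_start
    ORDER BY p.object_id, p.created_at
""").fetchall()

print(rows)  # [(None, 10, 1, 1), (10, 20, 1, 6), (20, 30, 1, 8)]
```

Because of the LEFT JOIN, a period with no matching tableb rows still appears, with a NULL total.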

Why does the number of rows increase in a SELECT statement with INNER JOIN when a second column is selected?

I am writing some queries with self-joins in SQL Server. When I have only one column in the SELECT clause, the query returns a certain number of rows. When I add another column, from the second instance of the table, to the SELECT clause, the results increase by 1000 rows!
How is this possible?
Thanks.
EDIT:
I have a subquery in the FROM clause, which is also a self-join on the same table.
How is this possible?
The only thing I can think of is that you have SELECT DISTINCT, and the additional column makes some rows distinct that were not distinct before.
For example, I would expect the second query below to return many more rows:
SELECT DISTINCT First_name From Table
vs
SELECT DISTINCT First_name, Last_name From Table
But if we had the actual SQL, something else might come to mind.
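A quick sketch of that effect, using SQLite through Python's sqlite3 module instead of SQL Server. The people table and its rows are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE people (first_name TEXT, last_name TEXT);
    INSERT INTO people VALUES
        ('Ann', 'Lee'), ('Ann', 'Kim'), ('Ann', 'Lee'), ('Bob', 'Ray');
""")

one_col = conn.execute(
    "SELECT DISTINCT first_name FROM people").fetchall()
two_cols = conn.execute(
    "SELECT DISTINCT first_name, last_name FROM people").fetchall()

# Adding last_name splits 'Ann' into two distinct rows.
print(len(one_col), len(two_cols))  # 2 3
```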